JP2000293194A - Voice interactive device

Info

Publication number: JP2000293194A
Application number: JP11101628A
Authority: JP
Inventors: Keisuke Watanabe; 圭輔渡邉; Akito Nagai; 明人永井; Yasushi Ishikawa; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-04-08
Filing date: 1999-04-08
Publication date: 2000-10-20
Anticipated expiration: 2019-04-08
Also published as: JP3933813B2

Abstract

PROBLEM TO BE SOLVED: To obtain a voice interactive device that can determine the interactive procedure by which an interactive objective is achieved most efficiently for each user. SOLUTION: The device is provided with a voice recognition section 1 which outputs a voice recognition result; an interactive procedure storage section 2 which holds interactive procedures that define, for each interactive state, a voice recognition object vocabulary and the transition destination interactive states corresponding to the voice recognition result and the number of misrecognitions; a voice recognition correct/error number storage section 3 which holds the numbers of correct and erroneous voice recognitions; a transition destination interactive state determining section 4 which determines and outputs a transition destination interactive state by referring to the interactive procedures held in the section 2; and an interactive control section 5 which transitions the interactive state to the transition destination interactive state outputted by the section 4. Thus, the interactive procedure by which the interactive objective is achieved most efficiently can be determined in accordance with the user.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice dialogue device used in a natural-language man-machine interface.

[0002]

2. Description of the Related Art

Voice dialogue devices, through which a user obtains the information he or she needs by conversing with the device by voice, are becoming increasingly important. In such a device it is important to control the dialogue so that the user obtains the needed information efficiently. For this purpose, a method has been proposed that estimates the average number of voice dialogue turns and sets the dialogue procedure on the basis of that estimate.

[0003] A conventional voice dialogue device will be described with reference to the drawings. FIG. 18 is a diagram showing the configuration of a conventional voice dialogue procedure generation device disclosed in, for example, Japanese Patent Application Laid-Open No. 10-091188.

[0004] In the conventional voice dialogue procedure generation device configured in this way, each whole-dialogue repetition count evaluation unit operates as follows. A basic dialogue decomposition unit decomposes the dialogue procedure into basic dialogues. A basic dialogue repetition count evaluation unit evaluates the number of repetitions of each basic dialogue using an estimated recognition rate obtained from a phoneme misrecognition matrix and the vocabulary. A basic dialogue repetition count summing unit totals the repetition counts of the basic dialogues and outputs the sum. A minimum selection output unit then selects the minimum among the outputs of the whole-dialogue repetition count evaluation units and thereby determines the dialogue procedure.
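The selection rule of paragraph [0004] can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and modelling the repetition count of one basic dialogue as 1/p for an estimated recognition rate p (a simple retry-until-recognized model) is an assumption made here, not a formula stated in this text.

```python
def expected_repetitions(estimated_rate: float) -> float:
    """Assumed model: the question is repeated until recognized, mean 1 / p."""
    if not 0.0 < estimated_rate <= 1.0:
        raise ValueError("estimated_rate must be in (0, 1]")
    return 1.0 / estimated_rate


def total_repetitions(basic_dialogue_rates: list[float]) -> float:
    """Sum the repetition counts of the basic dialogues of one procedure."""
    return sum(expected_repetitions(p) for p in basic_dialogue_rates)


def select_procedure(candidates: dict[str, list[float]]) -> str:
    """Return the candidate procedure with the smallest total repetition count."""
    return min(candidates, key=lambda name: total_repetitions(candidates[name]))


# Hypothetical candidates: one large-vocabulary question vs. two smaller ones.
candidates = {
    "one_step": [0.5],       # a single basic dialogue with a low estimated rate
    "two_step": [0.8, 0.8],  # two basic dialogues with higher estimated rates
}
print(select_procedure(candidates))
```

Note that the rates fed into such a rule are fixed in advance, which is exactly the limitation the following paragraphs address.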

[0005]

However, in the conventional voice dialogue procedure generation device described above, the estimated recognition rate used to estimate the number of dialogue repetitions is obtained from a phoneme misrecognition matrix, computed in advance from actual utterances, and a predetermined vocabulary. It does not represent the recognition rate of the user who is actually speaking to the device. The estimated number of repetitions therefore does not reflect the speech recognition rate of the particular user, and the dialogue procedure that is determined is not necessarily the one with which that user achieves the dialogue objective most efficiently.

SUMMARY OF THE INVENTION

[0006] The present invention has been made to solve the problem described above, and its object is to obtain a voice dialogue device that can determine, for each user, the dialogue procedure that achieves the dialogue objective most efficiently.

[0007]

A voice dialogue device according to claim 1 of the present invention comprises: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and the number of misrecognitions; a speech recognition correct/incorrect count storage unit that holds the numbers of correct and incorrect speech recognitions; a transition destination dialogue state determination unit that, based on the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit and the speech recognition result output by the speech recognition unit, refers to the dialogue procedure held in the dialogue procedure storage unit to determine and output a transition destination dialogue state; and a dialogue management unit that outputs a correct/incorrect judgment for the speech recognition result output by the speech recognition unit and transitions the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit.

[0008] A voice dialogue device according to claim 2 of the present invention comprises: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and an assumed recognition rate; a speech recognition correct/incorrect count storage unit that holds the numbers of correct and incorrect speech recognitions; an assumed speech recognition rate test unit that, based on the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit, tests the assumed recognition rates defined for the current dialogue state and outputs every assumed recognition rate that is not rejected; a transition destination dialogue state determination unit that refers to the dialogue procedure held in the dialogue procedure storage unit and, from among the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines a single transition destination dialogue state and outputs it; and a dialogue management unit that outputs a correct/incorrect judgment for the speech recognition result output by the speech recognition unit and transitions the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit.

[0009] In a voice dialogue device according to claim 3 of the present invention, when the transition destination dialogue state output by the transition destination dialogue state determination unit is a dialogue end state and the user's dialogue objective has not been achieved, the dialogue management unit terminates the dialogue with the user and switches to an operator.

[0010] In a voice dialogue device according to claim 4 of the present invention, the dialogue procedure storage unit holds a dialogue procedure that defines, for each dialogue state, the average number of dialogue turns to the end dialogue state. The transition destination dialogue state determination unit refers to the dialogue procedure held in the dialogue procedure storage unit and, from among the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines a single transition destination dialogue state based on the average number of dialogue turns to the end dialogue state and outputs it.

[0011] In a voice dialogue device according to claim 5 of the present invention, the dialogue procedure storage unit holds a dialogue procedure defining a speech recognition rate distribution for each dialogue state, and the device further comprises: a speech recognition rate estimation unit that estimates and outputs the user's speech recognition rate up to the current dialogue state using the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit; and a speech recognition success possibility determination unit that, based on the speech recognition rate output by the speech recognition rate estimation unit and the speech recognition rate distribution for the current dialogue state, judges the likelihood that the user's input will be recognized correctly and outputs the judgment. Based on the judgment of the speech recognition success possibility determination unit, the dialogue management unit terminates the dialogue with the user and switches to an operator.

[0012] A voice dialogue device according to claim 6 of the present invention further comprises: a speech recognition correct/incorrect history accumulation unit that accumulates, for each dialogue state, a history of the user's estimated speech recognition rate up to that dialogue state and of whether the speech recognition result in that dialogue state was correct; and a speech recognition rate distribution update unit that refers to the speech recognition correct/incorrect history accumulation unit, calculates the speech recognition rate distribution for each dialogue state, and updates the speech recognition rate distribution held in the dialogue procedure storage unit.

[0013]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiment 1. A voice dialogue device according to Embodiment 1 of the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing the configuration of the voice dialogue device according to Embodiment 1 of the present invention. In the drawings, the same reference numerals denote the same or corresponding parts.

[0014] In FIG. 1, reference numeral 1 denotes a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; 2 denotes a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and the number of misrecognitions; 3 denotes a speech recognition correct/incorrect count storage unit that holds the numbers of correct and incorrect speech recognitions; 4 denotes a transition destination dialogue state determination unit that, based on the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit 3 and the speech recognition result output by the speech recognition unit 1, refers to the dialogue procedure held in the dialogue procedure storage unit 2 to determine and output a transition destination dialogue state; and 5 denotes a dialogue management unit that outputs a correct/incorrect judgment for the recognition result output by the speech recognition unit 1 and transitions the dialogue state to the dialogue state output by the transition destination dialogue state determination unit 4.

[0015] Next, the operation of the voice dialogue device according to Embodiment 1 will be described with reference to the drawings. FIGS. 2 and 3 are diagrams showing an example of the dialogue procedure held in the dialogue procedure storage unit of the voice dialogue device according to Embodiment 1 of the present invention.

[0016] The operation is described below concretely for the case where the voice dialogue device is used for telephone directory assistance. In a directory assistance voice dialogue device, the user converses with the device by voice to enter the items needed for directory assistance, such as an address and the name of the party sought. The device searches for the telephone number based on the entered items and gives the number to the user.

[0017] For example, in the dialogue state S₁₀ shown in the upper part of FIG. 2, the speech recognition target vocabulary V₁₀ is defined as the names of all prefectures in Japan, together with a table T₁₀ of transition destination dialogue states corresponding to the speech recognition result and the number of misrecognitions. The table T₁₀ shows that when the speech recognition result is, for example, "Kanagawa", the transition destination dialogue state is S₃₅ regardless of the number of misrecognitions.

[0018] The table T₃₅ of transition destination dialogue states shown in the lower part of FIG. 2 shows that, for example, when the speech recognition result is "yes" and the number of misrecognitions is two or fewer, the transition destination dialogue state is S₁₂₀, and when the speech recognition result is "yes" and the number of misrecognitions is three or more and five or fewer, the transition destination dialogue state is S₁₂₁.
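The tables T₁₀ and T₃₅ described in [0017] and [0018] can be pictured as a lookup keyed on the recognition result and a range of misrecognition counts. The sketch below is a minimal illustration under assumptions: the tuple layout, the names `TRANSITIONS` and `next_state`, and the use of `None` for "any count" are hypothetical, and only the rows quoted in the text are filled in.

```python
from typing import Optional

# Rule layout (assumed): (recognition result, min errors, max errors, next state).
# A max of None means the rule applies regardless of the misrecognition count.
Rule = tuple[str, int, Optional[int], str]

TRANSITIONS: dict[str, list[Rule]] = {
    "S10": [("Kanagawa", 0, None, "S35")],  # T10: any misrecognition count
    "S35": [
        ("yes", 0, 2, "S120"),  # T35: two misrecognitions or fewer
        ("yes", 3, 5, "S121"),  # T35: three to five misrecognitions
    ],
}


def next_state(state: str, result: str, error_count: int) -> Optional[str]:
    """Return the transition destination, or None if no rule matches."""
    for word, low, high, target in TRANSITIONS.get(state, []):
        in_range = low <= error_count and (high is None or error_count <= high)
        if word == result and in_range:
            return target
    return None


print(next_state("S35", "yes", 2))
print(next_state("S35", "yes", 3))
```

Rows not quoted in the text (for example, counts above five) are deliberately left undefined here, so the lookup returns `None` for them.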

[0019] Each dialogue state can also describe dialogue control information other than the speech recognition target vocabulary and the transition destination dialogue states. For example, in the dialogue state S₁₀ in the upper part of FIG. 2, the response sentence A₁₀, "Please enter the prefecture name", is defined as the response to the user.

[0020] FIG. 4 shows an example of the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit 3. It indicates that, from the start of the dialogue with the user up to the current dialogue state, the speech recognition result was correct "7" times and incorrect "2" times.

[0021] The following describes the operation when a user whose correct/incorrect counts held in the speech recognition correct/incorrect count storage unit 3 are those shown in FIG. 4 reaches the dialogue state S₁₀.

[0022] When the dialogue state S₁₀ is reached, the dialogue management unit 5 refers to the dialogue procedure for the dialogue state S₁₀ shown in FIG. 2, held in the dialogue procedure storage unit 2, and responds to the user with "Please enter the prefecture name". When the user says "Kanagawa", the speech recognition unit 1 performs speech recognition on the input speech and outputs the recognition result "Kanagawa".

[0023] The transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states for the dialogue state S₁₀ shown in FIG. 2, held in the dialogue procedure storage unit 2. From the speech recognition result "Kanagawa" output by the speech recognition unit 1 and the misrecognition count "2" held in the speech recognition correct/incorrect count storage unit 3, it determines the transition destination dialogue state to be S₃₅ and outputs it.

[0024] The dialogue management unit 5 transitions the current dialogue state to the transition destination dialogue state S₃₅ output by the transition destination dialogue state determination unit 4, refers to the dialogue procedure for the dialogue state S₃₅ shown in the lower part of FIG. 2, held in the dialogue procedure storage unit 2, and responds to the user with "Kanagawa, is that right?".

[0025] When the user says "yes", the speech recognition unit 1 performs speech recognition on the input speech and outputs the speech recognition result "yes".

[0026] Based on the speech recognition result "yes" given in reply to the confirmation "Kanagawa, is that right?", the dialogue management unit 5 judges that the recognition result "Kanagawa" was correct and notifies the speech recognition correct/incorrect count storage unit 3 that a correct recognition has occurred. The correct recognition count held in the speech recognition correct/incorrect count storage unit 3 is updated to "8".

[0027] The transition destination dialogue state determination unit 4 refers to the table T₃₅ of transition destination dialogue states for the dialogue state S₃₅ shown in the lower part of FIG. 2, held in the dialogue procedure storage unit 2. From the speech recognition result "yes" output by the speech recognition unit 1 and the misrecognition count "2" held in the speech recognition correct/incorrect count storage unit 3, it determines the transition destination dialogue state to be S₁₂₀ and outputs it.

[0028] The dialogue management unit 5 transitions the current dialogue state to the transition destination dialogue state S₁₂₀ output by the transition destination dialogue state determination unit 4, refers to the dialogue procedure for the dialogue state S₁₂₀ shown in the middle part of FIG. 3, held in the dialogue procedure storage unit 2, and responds to the user with "Please give the address below the prefecture level". The user then says, for example, "Ofuna in Kamakura City", and the dialogue continues.

[0029] Next, consider the case where a user whose correct/incorrect counts held in the speech recognition correct/incorrect count storage unit 3 are those shown in FIG. 4 says "Kanagawa" in the dialogue state S₁₀ and the speech recognition unit 1 misrecognizes it as "Kagawa".

[0030] The transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states for the dialogue state S₁₀ shown in the upper part of FIG. 2, held in the dialogue procedure storage unit 2. From the speech recognition result "Kagawa" output by the speech recognition unit 1 and the misrecognition count "2" held in the speech recognition correct/incorrect count storage unit 3, it determines the transition destination dialogue state to be S₅₃ and outputs it.

[0031] The dialogue management unit 5 transitions the current dialogue state to the transition destination dialogue state S₅₃ output by the transition destination dialogue state determination unit 4, refers to the dialogue procedure for the dialogue state S₅₃ shown in the upper part of FIG. 3, held in the dialogue procedure storage unit 2, and responds to the user with "Kagawa, is that right?".

[0032] When the user says "no", the speech recognition unit 1 performs speech recognition on the input speech and outputs the speech recognition result "no".

[0033] Based on the speech recognition result "no" given in reply to the confirmation "Kagawa, is that right?", the dialogue management unit 5 judges that the recognition result "Kagawa" was a recognition error and notifies the speech recognition correct/incorrect count storage unit 3 that a misrecognition has occurred. The misrecognition count held in the speech recognition correct/incorrect count storage unit 3 is updated to "3".
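The bookkeeping in [0026] and [0033], where the reply to a confirmation prompt decides which counter is incremented, can be sketched as below. The class and method names are hypothetical, and treating any reply other than "yes" or "no" as an error is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RecognitionCounts:
    """Running totals since the start of the dialogue (cf. FIG. 4)."""
    correct: int = 0
    incorrect: int = 0

    def record(self, confirmation_reply: str) -> None:
        """Update the counts from the user's reply to a confirmation prompt."""
        if confirmation_reply == "yes":
            self.correct += 1      # previous recognition result was right
        elif confirmation_reply == "no":
            self.incorrect += 1    # previous recognition result was wrong
        else:
            raise ValueError(f"unexpected reply: {confirmation_reply!r}")


# Starting point from FIG. 4: 7 correct and 2 incorrect recognitions.
counts = RecognitionCounts(correct=7, incorrect=2)
counts.record("no")   # "Kagawa, is that right?" answered with "no"
print(counts)
```

Keeping both totals, rather than only the error count, matters for the rate-based test used in Embodiment 2.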

[0034] The transition destination dialogue state determination unit 4 refers to the table T₅₃ of transition destination dialogue states for the dialogue state S₅₃ shown in the upper part of FIG. 3, held in the dialogue procedure storage unit 2. From the speech recognition result "no" output by the speech recognition unit 1 and the misrecognition count "3" held in the speech recognition correct/incorrect count storage unit 3, it determines the transition destination dialogue state to be S₁₀ and outputs it.

[0035] When, in the dialogue state S₁₀, the user again says "Kanagawa" as the prefecture name and the speech recognition unit 1 correctly recognizes it as "Kanagawa", the transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states for the dialogue state S₁₀ and, from the speech recognition result "Kanagawa" and the misrecognition count "3", determines the transition destination dialogue state to be S₃₅ and outputs it.

[0036] The dialogue management unit 5 transitions the current dialogue state to the transition destination dialogue state S₃₅, refers to the dialogue procedure for the dialogue state S₃₅, and responds to the user with "Kanagawa, is that right?". When the user says "yes", the speech recognition unit 1 outputs the speech recognition result "yes".

[0037] Based on the speech recognition result "yes" given in reply to the confirmation "Kanagawa, is that right?", the dialogue management unit 5 judges that the recognition result "Kanagawa" was correct and notifies the speech recognition correct/incorrect count storage unit 3 that a correct recognition has occurred. The correct recognition count held in the speech recognition correct/incorrect count storage unit 3 is updated to "8".

[0038] The transition destination dialogue state determination unit 4 refers to the table T₃₅ of transition destination dialogue states for the dialogue state S₃₅. From the speech recognition result "yes" output by the speech recognition unit 1 and the misrecognition count "3" held in the speech recognition correct/incorrect count storage unit 3, it determines the transition destination dialogue state to be S₁₂₁ and outputs it.

[0039] The dialogue management unit 5 transitions the current dialogue state from S₃₅ to S₁₂₁, refers to the dialogue procedure for the dialogue state S₁₂₁ shown in the lower part of FIG. 3, and responds to the user with "Please enter the city or county name". The user then says, for example, "Kamakura", and the dialogue continues.

[0040] Through the above operation, for a user who causes few misrecognitions, a dialogue procedure such as the dialogue state S₁₂₀ can be selected, which enlarges the recognition target vocabulary and reduces the number of dialogue turns. For a user who causes many misrecognitions, a dialogue procedure such as the dialogue state S₁₂₁ can be selected, which increases the number of dialogue turns but reduces misrecognitions by making the recognition target vocabulary smaller. Because the optimal dialogue procedure can thus be selected according to the user's speech recognition rate, the dialogue objective can be achieved most efficiently for each user.

[0041] Embodiment 2. A voice dialogue device according to Embodiment 2 of the present invention will be described with reference to the drawings. FIG. 5 is a diagram showing the configuration of the voice dialogue device according to Embodiment 2 of the present invention.

[0042] In FIG. 5, reference numeral 1 denotes a speech recognition unit, 2 a dialogue procedure storage unit, 3 a speech recognition correct/incorrect count storage unit, 4 a transition destination dialogue state determination unit, 5 a dialogue management unit, and 6 an assumed speech recognition rate test unit.

[0043] Next, the operation of the voice dialogue device according to Embodiment 2 will be described with reference to the drawings. FIGS. 6 and 7 are diagrams showing an example of the dialogue procedure of the voice dialogue device according to Embodiment 2 of the present invention.

[0044] The operations of the dialogue procedure storage unit 2, the transition destination dialogue state determination unit 4, and the assumed speech recognition rate test unit 6 are described below. The operations of the speech recognition unit 1, the speech recognition correct/incorrect count storage unit 3, and the dialogue management unit 5 are the same as in Embodiment 1 and are therefore not described again.

[0045] For example, in the dialogue state S₁₀ shown in the upper part of FIG. 6, the speech recognition target vocabulary V₁₀ is defined as the names of all prefectures in Japan, together with a table T₁₀ of transition destination dialogue states corresponding to the speech recognition result and the assumed recognition rate. The table T₁₀ shows that when the speech recognition result is "Kanagawa", the transition destination dialogue state is S₃₅ regardless of the assumed recognition rate. The table T₃₅ of transition destination dialogue states shown in the lower part of FIG. 6 shows that when the speech recognition result is "yes" and the assumed recognition rate for the user is 90%, the transition destination dialogue state is S₁₂₀, and when the speech recognition result is "yes" and the assumed recognition rate for the user is 80%, the transition destination dialogue state is S₁₂₁.
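In Embodiment 2 the tables are keyed on an assumed recognition rate rather than a misrecognition count. A minimal sketch of that lookup follows; the data layout and the names `TRANSITIONS_BY_RATE` and `candidate_states` are hypothetical, `None` stands for "regardless of the assumed rate", and `accepted_rates` stands for whichever assumed rates the test unit 6 did not reject.

```python
from typing import Optional

# Rule layout (assumed): (recognition result, assumed rate or None, next state).
RateRule = tuple[str, Optional[float], str]

TRANSITIONS_BY_RATE: dict[str, list[RateRule]] = {
    "S10": [("Kanagawa", None, "S35")],  # T10: any assumed rate
    "S35": [
        ("yes", 0.90, "S120"),  # T35: assumed recognition rate 90%
        ("yes", 0.80, "S121"),  # T35: assumed recognition rate 80%
    ],
}


def candidate_states(state: str, result: str, accepted_rates: set[float]) -> list[str]:
    """List destinations whose assumed rate is unconditional or was not rejected."""
    return [
        target
        for word, rate, target in TRANSITIONS_BY_RATE.get(state, [])
        if word == result and (rate is None or rate in accepted_rates)
    ]


print(candidate_states("S35", "yes", {0.90, 0.80}))
```

When more than one candidate survives, a single destination still has to be chosen; claim 4 in [0010] bases that choice on the average number of dialogue turns to the end state, which this sketch does not model.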

[0046] The following describes the operation when a user whose correct/incorrect counts held in the speech recognition correct/incorrect count storage unit 3 are those shown in FIG. 4 reaches the dialogue state S₁₀.

[0047] When the dialogue state S₁₀ is reached, the dialogue management unit 5 refers to the dialogue procedure for the dialogue state S₁₀ shown in the upper part of FIG. 6, held in the dialogue procedure storage unit 2, and responds to the user with "Please enter the prefecture name". When the user says "Kanagawa", the speech recognition unit 1 performs speech recognition on the input speech and outputs the recognition result "Kanagawa".

【００４８】想定音声認識率検定部６は、音声認識結果
「神奈川」に対する想定認識率が任意なので検定は行わ
ない。Because any assumed recognition rate is allowed for the speech recognition result "Kanagawa", the assumed speech recognition rate test unit 6 performs no test.

【００４９】遷移先対話状態決定部４は、対話手順記憶
部２に保持された図６の上段に示す対話状態Ｓ₁₀での遷
移先対話状態のテーブルＴ₁₀を参照して、音声認識部１
が出力する音声認識結果「神奈川」から遷移先対話状態
をＳ₃₅と決定して出力する。The transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states for the dialogue state S₁₀ shown in the upper part of FIG. 6, which is held in the dialogue procedure storage unit 2, determines from the speech recognition result "Kanagawa" output by the speech recognition unit 1 that the transition destination dialogue state is S₃₅, and outputs it.

【００５０】図６の下段に示す対話状態Ｓ₃₅での応答
「神奈川ですね」に対し、利用者が「はい」と入力する
と、対話管理部５は正解認識が生じたことを音声認識正
誤回数記憶部３に出力し、音声認識正誤回数記憶部３に
保持された正解認識回数は「８」に更新される。[0050] When the user inputs "yes" in reply to the response "Kanagawa, is that right?" in the dialogue state S₃₅ shown in the lower part of FIG. 6, the dialogue management unit 5 notifies the speech recognition correct/incorrect count storage unit 3 that a correct recognition has occurred, and the correct recognition count held there is updated to "8".

【００５１】想定音声認識率検定部６は、対話状態Ｓ₃₅
での対話手順を参照して想定認識率９０％、８０％を仮
説として、音声認識正誤回数記憶部３に保持された音声
認識正誤回数に対して予め定められた危険率で仮説検定
を行う。The assumed speech recognition rate test unit 6 refers to the dialogue procedure for the dialogue state S₃₅, takes the assumed recognition rates of 90% and 80% as hypotheses, and performs a hypothesis test at a predetermined risk rate (significance level) on the correct/incorrect speech recognition counts held in the speech recognition correct/incorrect count storage unit 3.

【００５２】仮説検定には、図８に示すような式により
観測値に対するｕ求め、危険率に対するｕ₀を正規分布
表を用いて得て、ｕとｕ₀との比較により仮説の棄却を
判断する公知の手段があるので、それを用いる。なお、
図８において、ｐは仮説、ｋは正解認識回数、ｎは総音
声認識回数すなわち正解認識回数と誤認識回数の和であ
る。[0052] For the hypothesis test, a known method is used: the statistic u for the observed values is computed with the formula shown in FIG. 8, the critical value u₀ for the risk rate is obtained from a normal distribution table, and whether to reject the hypothesis is decided by comparing u with u₀. In FIG. 8, p is the hypothesis, k is the number of correct recognitions, and n is the total number of speech recognitions, that is, the sum of the numbers of correct recognitions and misrecognitions.
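
FIG. 8 is not reproduced in this text. A statistic that reproduces the worked values in the paragraphs that follow is the normal approximation to the binomial distribution, written here with the symbols p, k, and n defined above. The absolute value and the one-sided comparison are assumptions inferred from those numbers, not taken from the figure.

```latex
u = \frac{\lvert k - np \rvert}{\sqrt{np(1-p)}},
\qquad \text{reject the hypothesis } p \text{ if } u > u_0
```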

【００５３】総認識回数が１０回、正解認識回数が８回
について、危険率１０％で仮説９０％に対して検定を行
うと、ｕ＝１．０５４、ｕ₀＝１．２８２であるから、
ｕ＜ｕ₀となり仮説は棄却されない。仮説８０％に対し
て検定を行うとｕ＝０であるからｕ＜ｕ₀となり仮説は
棄却されない。したがって、想定音声認識率検定部６
は、検定結果として９０％と８０％を出力する。When the total number of recognitions is 10 and the number of correct recognitions is 8, testing the 90% hypothesis at a risk rate of 10% gives u = 1.054 and u₀ = 1.282, so u < u₀ and the hypothesis is not rejected. Testing the 80% hypothesis gives u = 0, so u < u₀ and that hypothesis is not rejected either. The assumed speech recognition rate test unit 6 therefore outputs 90% and 80% as the test results.
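
The test for 10 recognitions with 8 correct can be checked numerically. The sketch below assumes the normal-approximation statistic |k − np| / √(np(1 − p)), which matches the values quoted in the text but is not confirmed by FIG. 8; the function names `u_statistic` and `surviving_rates` are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def u_statistic(k: int, n: int, p: float) -> float:
    """Assumed statistic for k correct recognitions out of n under hypothesis p."""
    return abs(k - n * p) / sqrt(n * p * (1 - p))


def surviving_rates(k: int, n: int, rates: list[float], risk: float = 0.10) -> list[float]:
    """Return the assumed rates that are not rejected at the given risk rate."""
    u0 = NormalDist().inv_cdf(1 - risk)  # about 1.282 for a 10% risk rate
    return [p for p in rates if u_statistic(k, n, p) <= u0]


print(round(u_statistic(8, 10, 0.9), 3))   # 1.054
print(surviving_rates(8, 10, [0.9, 0.8]))  # [0.9, 0.8]
```

Taking u₀ from `NormalDist().inv_cdf` replaces the normal distribution table lookup mentioned in the text.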

【００５４】遷移先対話状態決定部４は、想定音声認識
率検定部６が出力する想定認識率９０％と８０％に対し
て例えば最も大きい９０％を選択する。選択の基準は、
利用者をできるかぎり認識率の良い利用者として想定
し、音声入力をなるべく限定せずに少ない対話回数で対
話を完了させるために最も大きい想定認識率を選択す
る、など設計者が予め定める。From the assumed recognition rates of 90% and 80% output by the assumed speech recognition rate test unit 6, the transition destination dialogue state determination unit 4 selects, for example, the largest, 90%. The selection criterion is predetermined by the designer; for example, the largest assumed recognition rate is selected so that the user is assumed to have as high a recognition rate as possible and the dialogue is completed in few turns with as little restriction on speech input as possible.

【００５５】遷移先対話状態決定部４は、対話手順記憶
部２に保持された図６の下段に示す対話状態Ｓ₃₅での遷
移先対話状態のテーブルＴ₃₅を参照して、音声認識部１
が出力する音声認識結果「はい」と、決定した想定認識
率９０％から、遷移先対話状態をＳ₁₂₀と決定して出力
する。The transition destination dialogue state determination unit 4 refers to the table T₃₅ of transition destination dialogue states for the dialogue state S₃₅ shown in the lower part of FIG. 6, which is held in the dialogue procedure storage unit 2, determines from the speech recognition result "yes" output by the speech recognition unit 1 and the selected assumed recognition rate of 90% that the transition destination dialogue state is S₁₂₀, and outputs it.

【００５６】対話管理部５は、遷移先対話状態決定部４
が出力する遷移先対話状態Ｓ₁₂₀へ現在の対話状態を遷
移させ、対話手順記憶部２に保持された図７の中段に示
す対話状態Ｓ₁₂₀での対話手順を参照して、利用者に対
して「県名以下の住所をどうぞ」と応答する。これに対
し利用者は、例えば「鎌倉市の大船です」と入力し対話
を継続する。The dialogue management unit 5 changes the current dialogue state to the transition destination dialogue state S₁₂₀ output by the transition destination dialogue state determination unit 4, refers to the dialogue procedure for the dialogue state S₁₂₀ shown in the middle part of FIG. 7, which is held in the dialogue procedure storage unit 2, and responds to the user with "Please give the address below the prefecture name". The user then inputs, for example, "Ofuna, Kamakura City" and continues the dialogue.

【００５７】一方、音声認識正誤回数記憶部３に保持さ
れる音声認識の正誤回数が図４に示す回数である利用者
が、対話状態Ｓ₁₀において「神奈川」と入力し、音声認
識部１によって「香川」と誤認識された場合について説
明する。Next, the case is described where a user whose correct/incorrect speech recognition counts, held in the speech recognition correct/incorrect count storage unit 3, are those shown in FIG. 4 inputs "Kanagawa" in the dialogue state S₁₀ and the speech recognition unit 1 misrecognizes it as "Kagawa".

【００５８】上記の実施の形態１と同様に、対話状態Ｓ
₁₀において再び利用者が県名として「神奈川」を入力
し、音声認識部１は正しく「神奈川」と認識した場合、
遷移先対話状態決定部４は、対話状態Ｓ₁₀での遷移先対
話状態のテーブルＴ₁₀を参照して、音声認識結果「神奈
川」から遷移先対話状態をＳ₃₅と決定し、対話管理部５
は、遷移先対話状態Ｓ₃₅へ現在の対話状態を遷移させ、
利用者に対して「神奈川ですね」と応答し、利用者が
「はい」と入力すると、音声認識部１は音声認識結果
「はい」を出力する。As in Embodiment 1, when the user inputs "Kanagawa" again as the prefecture name in the dialogue state S₁₀ and the speech recognition unit 1 correctly recognizes "Kanagawa", the transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states for the dialogue state S₁₀ and determines from the speech recognition result "Kanagawa" that the transition destination dialogue state is S₃₅. The dialogue management unit 5 changes the current dialogue state to S₃₅ and responds to the user with "Kanagawa, is that right?". When the user inputs "yes", the speech recognition unit 1 outputs the speech recognition result "yes".

【００５９】対話管理部５は、確認応答「神奈川です
ね」に対する音声認識結果「はい」に基づき、認識結果
「神奈川」は正しい認識結果と判断し、正解認識が生じ
たことを音声認識正誤回数記憶部３に出力し、音声認識
正誤回数記憶部３に保持された正解認識回数は「８」に
更新される。なお、この時点で誤認識回数は「３」であ
る。Based on the speech recognition result "yes" for the confirmation response "Kanagawa, is that right?", the dialogue management unit 5 judges that the recognition result "Kanagawa" was correct and notifies the speech recognition correct/incorrect count storage unit 3 that a correct recognition has occurred; the correct recognition count held there is updated to "8". At this point the misrecognition count is "3".

【００６０】想定音声認識率検定部６は、総認識回数が
１１回、正解認識回数が８回について、危険率１０％で
仮説９０％および８０％に対して検定を行う。９０％に
対しては、ｕ＝１．９１０＞ｕ₀＝１．２８２であり仮
説は棄却される。８０％に対しては、ｕ＝０．６＜ｕ₀
＝１．２８２であり仮説は棄却されない。したがって、
想定音声認識率検定部６は検定結果として８０％を出力
する。For a total of 11 recognitions with 8 correct recognitions, the assumed speech recognition rate test unit 6 tests the 90% and 80% hypotheses at a risk rate of 10%. For 90%, u = 1.910 > u₀ = 1.282, so the hypothesis is rejected. For 80%, u = 0.6 < u₀ = 1.282, so the hypothesis is not rejected. The assumed speech recognition rate test unit 6 therefore outputs 80% as the test result.

【００６１】遷移先対話状態決定部４は、対話手順記憶
部２に保持された図６の下段に示す対話状態Ｓ₃₅での遷
移先対話状態のテーブルＴ₃₅を参照して、音声認識部１
が出力する音声認識結果「はい」と、決定した想定認識
率８０％から、遷移先対話状態をＳ₁₂₁と決定して出力
する。The transition destination dialogue state determination unit 4 refers to the table T₃₅ of transition destination dialogue states for the dialogue state S₃₅ shown in the lower part of FIG. 6, which is held in the dialogue procedure storage unit 2, determines from the speech recognition result "yes" output by the speech recognition unit 1 and the selected assumed recognition rate of 80% that the transition destination dialogue state is S₁₂₁, and outputs it.

【００６２】対話管理部５は、現在の対話状態をＳ₃₅か
らＳ₁₂₁へ遷移させ、図７の下段に示す対話状態Ｓ₁₂₁で
の対話手順を参照して、利用者に対して「市あるいは郡
名を入力してください」と応答する。これに対し利用者
は、例えば「鎌倉」と入力し対話を継続する。[0062] The dialogue management unit 5 changes the current dialogue state from S₃₅ to S₁₂₁, refers to the dialogue procedure for the dialogue state S₁₂₁ shown in the lower part of FIG. 7, and responds to the user with "Please enter a city or county name". The user then inputs, for example, "Kamakura" and continues the dialogue.

【００６３】以上の動作により、利用者の音声認識正誤
回数に基づいた想定音声認識の検定結果に基づいて対話
手順を変更するため、想定認識率が良い利用者に対して
は、認識対象語彙を大きくして対話回数が少なくなる対
話状態Ｓ₁₂₀のような対話手順を選択でき、想定認識率
が悪い利用者に対しては、対話回数は多くなるが認識対
象語彙を小さくすることで誤認識を少なくする対話状態
Ｓ₁₂₁のような対話手順を選択できる。したがって、利
用者の音声認識率に応じた最適な対話手順を選択できる
ため、利用者に応じて最も効率よく対話目的を達成する
ことができる。Through the above operation, the dialogue procedure is changed according to the result of testing the assumed speech recognition rates against the user's correct/incorrect speech recognition counts. For a user with a high assumed recognition rate, a dialogue procedure such as the dialogue state S₁₂₀ can be selected, which enlarges the recognition target vocabulary and reduces the number of dialogue turns. For a user with a low assumed recognition rate, a dialogue procedure such as the dialogue state S₁₂₁ can be selected, which increases the number of turns but reduces misrecognition by shrinking the recognition target vocabulary. Because the optimal dialogue procedure for the user's speech recognition rate can thus be selected, the dialogue objective can be achieved most efficiently for each user.

【００６４】実施の形態３．この発明の実施の形態３に
係る音声対話装置について図面を参照しながら説明す
る。図９は、この発明の実施の形態３に係る音声対話装
置の構成を示す図である。Embodiment 3. A voice interactive device according to Embodiment 3 of the present invention will be described with reference to the drawings. FIG. 9 is a diagram showing the configuration of the voice interactive device according to Embodiment 3 of the present invention.

【００６５】図９において、１は音声認識部、２は対話
手順記憶部、３は音声認識正誤回数記憶部、４は遷移先
対話状態決定部、５は対話管理部である。In FIG. 9, 1 is a speech recognition unit, 2 is a dialogue procedure storage unit, 3 is a speech recognition correct/incorrect count storage unit, 4 is a transition destination dialogue state determination unit, and 5 is a dialogue management unit.

【００６６】つぎに、この実施の形態３に係る音声対話
装置の動作について図面を参照しながら説明する。Next, the operation of the voice interactive device according to Embodiment 3 will be described with reference to the drawings.

【００６７】対話管理部５の動作について説明する。な
お、音声認識部１、対話手順記憶部２、音声認識正誤回
数記憶部３、及び遷移先対話状態決定部４の動作は、上
記の実施の形態１と同じなので省略する。The operation of the dialogue management unit 5 is described below. The operations of the speech recognition unit 1, the dialogue procedure storage unit 2, the speech recognition correct/incorrect count storage unit 3, and the transition destination dialogue state determination unit 4 are the same as in Embodiment 1 and are therefore omitted.

【００６８】音声認識正誤回数記憶部３に保持される音
声認識の正誤回数が、正解認識回数１０回、誤認識回数
７回である場合に、利用者が図２上段に示す対話状態Ｓ
₁₀に到達し、実施の形態１と同様に「県名を入力してく
ださい」に対し利用者が「神奈川」と入力した場合、音
声認識部１が「香川」と誤認識した場合の動作を説明す
る。The operation is described for the case where the speech recognition correct/incorrect count storage unit 3 holds 10 correct recognitions and 7 misrecognitions, the user reaches the dialogue state S₁₀ shown in the upper part of FIG. 2 and, as in Embodiment 1, inputs "Kanagawa" in reply to "Please enter a prefecture name", and the speech recognition unit 1 misrecognizes it as "Kagawa".

【００６９】遷移先対話状態決定部４が遷移先対話状態
のテーブルＴ₁₀を参照して、音声認識結果「香川」から
遷移先対話状態をＳ₅₃と決定して出力し、対話管理部５
が対話状態をＳ₅₃へ遷移させ「香川ですね」と応答する
と、利用者は「いいえ」と入力する。[0069] The transition destination dialogue state determination unit 4 refers to the table T₁₀ of transition destination dialogue states, determines from the speech recognition result "Kagawa" that the transition destination dialogue state is S₅₃, and outputs it. When the dialogue management unit 5 changes the dialogue state to S₅₃ and responds with "Kagawa, is that right?", the user inputs "no".

【００７０】対話管理部５は誤認識が生じたことを出力
し、音声認識正誤回数記憶部３に保持された誤認識回数
は「８」に更新される。The dialogue management unit 5 reports that a misrecognition has occurred, and the misrecognition count held in the speech recognition correct/incorrect count storage unit 3 is updated to "8".

【００７１】遷移先対話状態決定部４は、図３の上段に
示す遷移先対話状態のテーブルＴ₅₃を参照して、音声認
識結果「いいえ」と音声認識正誤回数記憶部３に保持さ
れた誤認識回数「８」に基づいて、遷移先対話状態を終
了対話状態であるＳ_endと決定して出力する。[0071] The transition destination dialogue state determination unit 4 refers to the table T₅₃ of transition destination dialogue states shown in the upper part of FIG. 3 and, based on the speech recognition result "no" and the misrecognition count "8" held in the speech recognition correct/incorrect count storage unit 3, determines that the transition destination dialogue state is the end dialogue state S_end, and outputs it.

【００７２】対話管理部５は、遷移先対話状態決定部４
から対話状態Ｓ_endが入力されると、利用者に対して電
話番号を案内したか否かを調べ、案内していないならば
装置との対話を打ち切りオペレータへ対話を切り替え
る。When the dialogue state S_end is input from the transition destination dialogue state determination unit 4, the dialogue management unit 5 checks whether a telephone number has been announced to the user and, if not, terminates the dialogue with the device and switches the dialogue to an operator.

【００７３】電話番号を案内したか否かは、例えば対話
管理部５内に、初期値として「０」を与えておき、案内
応答を実行した場合に値を「１」に変更するカウンタを
１つ設けておき、該カウンタを調べればよい。Whether a telephone number has been announced can be checked, for example, by providing in the dialogue management unit 5 a single counter that is given the initial value "0" and changed to "1" when a guidance response is executed, and then examining that counter.
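
The counter check described above can be sketched as follows. This is a hypothetical fragment: the class name, method names, and return strings are invented for illustration, and the "finish" branch (what happens when a number was already announced) is an assumption, since the text only specifies the operator handover.

```python
class DialogueManager:
    """Hypothetical fragment of the dialogue management unit 5."""

    def __init__(self) -> None:
        # Counter from the text: 0 until a guidance response has been executed.
        self.guidance_given = 0

    def announce_number(self, number: str) -> str:
        """Execute a guidance response and mark that guidance was given."""
        self.guidance_given = 1
        return f"The number is {number}"

    def on_end_state(self) -> str:
        """Handle arrival at S_end: hand over only if no number was announced."""
        if self.guidance_given == 0:
            return "switch_to_operator"
        return "finish"  # assumed behaviour, not stated in the text


manager = DialogueManager()
print(manager.on_end_state())  # switch_to_operator
```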

【００７４】以上の動作により、認識率が低く対話目的
達成の見込みがない利用者に対しては、対話をオペレー
タへ切り替えることができ、利用者は効率よく対話目的
を達成することができる。With the above operation, for a user who has a low recognition rate and is unlikely to achieve the dialogue purpose, the dialogue can be switched to the operator, and the user can efficiently achieve the dialogue purpose.

【００７５】実施の形態４．この発明の実施の形態４に
係る音声対話装置について図面を参照しながら説明す
る。図１０は、この発明の実施の形態４に係る音声対話
装置の構成を示す図である。Embodiment 4. A voice interactive device according to Embodiment 4 of the present invention will be described with reference to the drawings. FIG. 10 is a diagram showing the configuration of the voice interactive device according to Embodiment 4 of the present invention.

【００７６】図１０において、１は音声認識部、２は対
話手順記憶部、３は音声認識正誤回数記憶部、４は遷移
先対話状態決定部、５は対話管理部、６は想定音声認識
率検定部である。In FIG. 10, 1 is a speech recognition unit, 2 is a dialogue procedure storage unit, 3 is a speech recognition correct/incorrect count storage unit, 4 is a transition destination dialogue state determination unit, 5 is a dialogue management unit, and 6 is an assumed speech recognition rate test unit.

【００７７】つぎに、この実施の形態４に係る音声対話
装置の動作について図面を参照しながら説明する。図１
１は、この発明の実施の形態４に係る音声対話装置の対
話手順の一例を示す図である。Next, the operation of the voice interactive device according to Embodiment 4 will be described with reference to the drawings. FIG. 11 is a diagram showing an example of a dialogue procedure of the voice interactive device according to Embodiment 4 of the present invention.

【００７８】対話手順記憶部２及び遷移先対話状態決定
部４の動作について説明する。なお、音声認識部１、音
声認識正誤回数記憶部３、対話管理部５及び想定音声認
識率検定部６の動作は、実施の形態２と同じなので省略
する。The operations of the dialogue procedure storage unit 2 and the transition destination dialogue state determination unit 4 are described below. The operations of the speech recognition unit 1, the speech recognition correct/incorrect count storage unit 3, the dialogue management unit 5, and the assumed speech recognition rate test unit 6 are the same as in Embodiment 2 and are therefore omitted.

【００７９】例えば、図１１の上段に示す対話状態Ｓ₁₀
においては、音声認識対象語彙Ｖ₁₀として日本の全ての
県名、音声認識結果および想定認識率に応じた遷移先対
話状態のテーブルＴ₁₀、終了対話状態までの平均対話回
数の想定音声認識率ごとのテーブルＮ₁₀が規定されてい
る。For example, the dialogue state S₁₀ shown in the upper part of FIG. 11 defines all prefecture names in Japan as the speech recognition target vocabulary V₁₀, a table T₁₀ of transition destination dialogue states that depend on the speech recognition result and the assumed recognition rate, and a table N₁₀ giving, for each assumed speech recognition rate, the average number of dialogue turns up to an end dialogue state.

【００８０】対話状態Ｓ₁₀における終了対話状態までの
平均対話回数としては、例えば、想定音声認識率が一定
で、誤認識が生じないと仮定した場合に、対話状態Ｓ₁₀
から到達可能な全ての終了対話状態までの状態遷移回数
の平均値を近似的に用いる。As the average number of dialogue turns from the dialogue state S₁₀ to an end dialogue state, an approximation is used: for example, the average of the numbers of state transitions from the dialogue state S₁₀ to every reachable end dialogue state, assuming that the assumed speech recognition rate is constant and that no misrecognition occurs.
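
One way to compute such an approximate average is to walk the dialogue state graph from S₁₀ and average the transition counts to each reachable end state. The sketch below assumes shortest paths are counted, which the text does not specify; the graph, the state names other than S10, S35, S120, and S121, and the function name `average_turns` are invented for illustration.

```python
from collections import deque
from statistics import fmean


def average_turns(graph: dict[str, list[str]], start: str, end_states: set[str]) -> float:
    """Mean number of transitions from start to each reachable end state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search gives shortest transition counts
        state = queue.popleft()
        for nxt in graph.get(state, []):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return fmean([dist[s] for s in end_states if s in dist])


# Hypothetical dialogue graph; END_A, END_B, and S200 are invented states.
graph = {
    "S10": ["S35"],
    "S35": ["S120", "S121"],
    "S120": ["END_A"],
    "S121": ["S200"],
    "S200": ["END_B"],
}
print(average_turns(graph, "S10", {"END_A", "END_B"}))  # 3.5
```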

【００８１】音声認識正誤回数記憶部３に保持される音
声認識の正誤回数が図４に示す回数である利用者が対話
状態Ｓ₁₀に到達した場合の動作を説明する。[0081] The operation is described for the case where a user whose correct/incorrect speech recognition counts, held in the speech recognition correct/incorrect count storage unit 3, are those shown in FIG. 4 reaches the dialogue state S₁₀.

【００８２】対話管理部５の応答「県名を入力してくだ
さい」に利用者が「神奈川」と入力し、対話管理部５の
応答「神奈川ですね」に利用者が「はい」と入力するま
での動作は実施の形態２と同様である。想定音声認識率
検定部６は実施の形態２と同様に動作し、検定結果とし
て９０％と８０％を出力する。The operation is the same as in Embodiment 2 up to the point where the user inputs "Kanagawa" in reply to the response "Please enter a prefecture name" from the dialogue management unit 5 and then inputs "yes" in reply to the response "Kanagawa, is that right?". The assumed speech recognition rate test unit 6 operates as in Embodiment 2 and outputs 90% and 80% as the test results.

【００８３】遷移先対話状態決定部４は、図１１の下段
に示したＳ₃₅における想定音声認識毎の平均対話回数の
テーブルＮ₃₅を参照して、想定音声認識率検定部４が出
力する想定音声認識率９０％と８０％から、最も平均対
話回数の少ない９０％を選択し、遷移先対話状態をＳ
₁₂₀と決定して出力する。The transition destination dialogue state determination unit 4 refers to the table N₃₅, shown in the lower part of FIG. 11, of the average number of dialogue turns for each assumed speech recognition rate in S₃₅, selects from the assumed speech recognition rates of 90% and 80% output by the assumed speech recognition rate test unit 6 the rate with the smallest average number of dialogue turns, 90%, determines that the transition destination dialogue state is S₁₂₀, and outputs it.
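
The selection in this embodiment amounts to picking, among the rates that survived the test, the one with the smallest tabulated average. A minimal sketch follows; the numeric averages in `N35` are invented because the contents of FIG. 11 are not given in the text, and `choose_transition` is a hypothetical name.

```python
def choose_transition(surviving_rates, avg_turns, transitions):
    """Pick the surviving rate with the fewest average turns and its destination."""
    best = min(surviving_rates, key=lambda rate: avg_turns[rate])
    return best, transitions[best]


N35 = {90: 2.0, 80: 3.0}            # invented averages, not taken from FIG. 11
T35_YES = {90: "S120", 80: "S121"}  # destinations for the result "yes"
print(choose_transition([90, 80], N35, T35_YES))  # (90, 'S120')
```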

【００８４】以上の動作により、利用者に対する想定音
声認識率に加え、想定音声認識率に応じた平均対話回数
を用いて対話手順を変更するため、利用者は最も効率よ
く対話目的を達成することができる。Through the above operation, the dialogue procedure is changed using not only the assumed speech recognition rate for the user but also the average number of dialogue turns for each assumed speech recognition rate, so the user can achieve the dialogue objective most efficiently.

【００８５】実施の形態５．この発明の実施の形態５に
係る音声対話装置について図面を参照しながら説明す
る。図１２は、この発明の実施の形態５に係る音声対話
装置の構成を示す図である。Embodiment 5. A voice interactive device according to Embodiment 5 of the present invention will be described with reference to the drawings. FIG. 12 is a diagram showing the configuration of the voice interactive device according to Embodiment 5 of the present invention.

【００８６】図１２において、１は音声認識部、２は対
話手順記憶部、３は音声認識正誤回数記憶部、４は遷移
先対話状態決定部、５は対話管理部、７は音声認識率推
定部、８は音声認識成功可能性判定部である。In FIG. 12, 1 is a speech recognition unit, 2 is a dialogue procedure storage unit, 3 is a speech recognition correct/incorrect count storage unit, 4 is a transition destination dialogue state determination unit, 5 is a dialogue management unit, 7 is a speech recognition rate estimation unit, and 8 is a speech recognition success possibility determination unit.

【００８７】つぎに、この実施の形態５に係る音声対話
装置の動作について図面を参照しながら説明する。図１
３は、この発明の実施の形態５に係る音声対話装置の対
話手順の一例を示す図である。Next, the operation of the voice interactive device according to Embodiment 5 will be described with reference to the drawings. FIG. 13 is a diagram showing an example of a dialogue procedure of the voice interactive device according to Embodiment 5 of the present invention.

【００８８】対話手順記憶部２、対話管理部５、音声認
識率推定部７及び音声認識成功可能性判定部８の動作に
ついて説明する。なお、音声認識部１、音声認識正誤回
数記憶部３及び遷移先対話状態決定部４の動作は、実施
の形態１と同じなので省略する。The operations of the dialogue procedure storage unit 2, the dialogue management unit 5, the speech recognition rate estimation unit 7, and the speech recognition success possibility determination unit 8 are described below. The operations of the speech recognition unit 1, the speech recognition correct/incorrect count storage unit 3, and the transition destination dialogue state determination unit 4 are the same as in Embodiment 1 and are therefore omitted.

【００８９】例えば、図１３に示す対話状態Ｓ₁₀におい
ては、音声認識対象語彙Ｖ₁₀として日本の全ての県名、
音声認識結果および誤認識回数に応じた遷移先対話状態
のテーブルＴ₁₀、音声認識対象語彙Ｖ₁₀に対する音声認
識率の分布として、平均値８５、分散１０の正規分布Ｄ
₁₀：Ｎ（８５、１０）が規定されている。[0089] For example, the dialogue state S₁₀ shown in FIG. 13 defines all prefecture names in Japan as the speech recognition target vocabulary V₁₀, a table T₁₀ of transition destination dialogue states that depend on the speech recognition result and the number of misrecognitions, and, as the distribution of speech recognition rates for the vocabulary V₁₀, a normal distribution D₁₀: N(85, 10) with mean 85 and variance 10.

【００９０】音声認識正誤回数記憶部３に保持される音
声認識の正誤回数が図４に示す回数である利用者が対話
状態Ｓ₁₀に到達した場合の動作を説明する。[0090] The operation is described for the case where a user whose correct/incorrect speech recognition counts, held in the speech recognition correct/incorrect count storage unit 3, are those shown in FIG. 4 reaches the dialogue state S₁₀.

【００９１】音声認識率推定部７は、音声認識正誤回数
記憶部３を参照して、正解認識回数「７」、誤認識回数
「２」より、例えば最尤推定法を用いて利用者の推定認
識率Ｒ_u＝７／９×１００＝７８％を計算し出力する。The speech recognition rate estimation unit 7 refers to the speech recognition correct/incorrect count storage unit 3 and, from the correct recognition count "7" and the misrecognition count "2", calculates and outputs the user's estimated recognition rate R_u = 7/9 × 100 = 78%, using for example maximum likelihood estimation.
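
For a sequence of independent correct/incorrect outcomes, the maximum likelihood estimate of the recognition rate is simply the observed proportion of correct recognitions. A minimal sketch, with `estimate_rate` as a hypothetical name:

```python
def estimate_rate(correct: int, incorrect: int) -> float:
    """Maximum likelihood estimate of the recognition rate, in percent."""
    return 100.0 * correct / (correct + incorrect)


print(round(estimate_rate(7, 2)))  # 78
```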

【００９２】音声認識成功可能性判定部８は、音声認識
率推定部７が出力する利用者の推定認識率Ｒ_u＝７８％
と、対話状態Ｓ₁₀において規定された音声認識率の分布
から、利用者が音声認識率分布の予め定められた基準以
上の部分に含まれているか否かを判定する。From the user's estimated recognition rate R_u = 78% output by the speech recognition rate estimation unit 7 and the distribution of speech recognition rates defined for the dialogue state S₁₀, the speech recognition success possibility determination unit 8 determines whether the user falls within the part of the speech recognition rate distribution at or above a predetermined criterion.

【００９３】例えば、基準が５０％であれば、正規分布
Ｎ（８５、１０）の５０％を含む認識率区間はＲ_L＝７
８．２≦Ｒ≦９１．８であり、利用者の推定認識率Ｒ_u
は区間の下限Ｒ_L以下である。したがって、音声認識成
功可能性判定部８は、利用者は音声認識成功可能性が無
いと判定する。For example, if the criterion is 50%, the recognition rate interval containing 50% of the normal distribution N(85, 10) is R_L = 78.2 ≤ R ≤ 91.8, and the user's estimated recognition rate R_u is at or below the lower limit R_L of the interval. The speech recognition success possibility determination unit 8 therefore determines that the user has no possibility of successful speech recognition.
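
The determination can be sketched as a comparison against the lower bound of the central interval of the distribution. Two points here are interpretations rather than statements from the text: the second parameter of N(85, 10) is treated as a standard deviation, because only that reading yields an interval close to the quoted 78.2 to 91.8, and the exact quantile gives a lower bound of about 78.3 where the text, presumably using a rounded table value, gives 78.2. The name `recognition_possible` is hypothetical.

```python
from statistics import NormalDist


def recognition_possible(rate: float, mean: float, spread: float, coverage: float = 0.5) -> bool:
    """True if rate is above the lower bound of the central `coverage` interval."""
    # spread is used as a standard deviation (an interpretation, see the lead-in).
    lower = NormalDist(mean, spread).inv_cdf((1 - coverage) / 2)
    return rate > lower


print(recognition_possible(77.8, 85, 10))  # False
print(recognition_possible(80.0, 85, 10))  # True
```

The second call corresponds to a user with an estimated rate of 80%, who lies inside the interval.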

【００９４】対話管理部５は、音声認識成功可能性判定
部８の判定結果が音声認識可能性無しであるので、利用
者との対話を打ち切りオペレータに切り替える。Because the speech recognition success possibility determination unit 8 has determined that there is no possibility of successful speech recognition, the dialogue management unit 5 terminates the dialogue with the user and switches to an operator.

【００９５】以上の動作により、音声認識成功可能性判
定部８により判定された利用者の音声認識可能性に基づ
き対話手順を変更するので、音声認識成功の可能性が低
い利用者が装置との無駄な対話を行うこと無くオペレー
タに切り替えが行われ、利用者は効率よく対話目的を達
成することができる。Through the above operation, the dialogue procedure is changed according to the user's possibility of successful speech recognition as determined by the speech recognition success possibility determination unit 8. A user with a low possibility of successful speech recognition is therefore switched to an operator without a futile dialogue with the device, and can achieve the dialogue objective efficiently.

【００９６】実施の形態６．この発明の実施の形態６に
係る音声対話装置について図面を参照しながら説明す
る。図１４は、この発明の実施の形態６に係る音声対話
装置の構成を示す図である。Embodiment 6. A voice interactive device according to Embodiment 6 of the present invention will be described with reference to the drawings. FIG. 14 is a diagram showing the configuration of the voice interactive device according to Embodiment 6 of the present invention.

【００９７】図１４において、１は音声認識部、２は対
話手順記憶部、３は音声認識正誤回数記憶部、４は遷移
先対話状態決定部、５は対話管理部、７は音声認識率推
定部、８は音声認識成功可能性判定部、９は音声認識率
正誤履歴蓄積部、１０は音声認識率分布更新部である。In FIG. 14, 1 is a speech recognition unit, 2 is a dialogue procedure storage unit, 3 is a speech recognition correct/incorrect count storage unit, 4 is a transition destination dialogue state determination unit, 5 is a dialogue management unit, 7 is a speech recognition rate estimation unit, 8 is a speech recognition success possibility determination unit, 9 is a speech recognition rate correct/incorrect history accumulation unit, and 10 is a speech recognition rate distribution update unit.

【００９８】つぎに、この実施の形態６に係る音声対話
装置の動作について図面を参照しながら説明する。Next, the operation of the voice interactive device according to Embodiment 6 will be described with reference to the drawings.

【００９９】音声認識率正誤履歴蓄積部９及び音声認識
率分布更新部１０の動作について説明する。なお、音声
認識部１、対話手順記憶部２、音声認識正誤回数記憶部
３、遷移先対話状態決定部４、対話管理部５、音声認識
率推定部７及び音声認識成功可能性判定部８の動作は、
実施の形態５と同じなので省略する。The operations of the speech recognition rate correct/incorrect history accumulation unit 9 and the speech recognition rate distribution update unit 10 are described below. The operations of the speech recognition unit 1, the dialogue procedure storage unit 2, the speech recognition correct/incorrect count storage unit 3, the transition destination dialogue state determination unit 4, the dialogue management unit 5, the speech recognition rate estimation unit 7, and the speech recognition success possibility determination unit 8 are the same as in Embodiment 5 and are therefore omitted.

【０１００】対話手順記憶部２に保持された対話手順が
図１３に示すものであり、音声認識正誤回数記憶部３に
保持される音声認識の正誤回数が正解認識回数８回、誤
認識回数２回の場合、利用者が対話状態Ｓ₁₀に到達した
ときの動作を説明する。The operation is described for the case where the dialogue procedure held in the dialogue procedure storage unit 2 is the one shown in FIG. 13, the speech recognition correct/incorrect count storage unit 3 holds 8 correct recognitions and 2 misrecognitions, and the user reaches the dialogue state S₁₀.

【０１０１】音声認識率推定部７は、実施の形態５と同
様にして利用者の推定音声認識率Ｒ _u＝８０％を計算し
出力する。The speech recognition rate estimation unit 7 calculates and outputs the user's estimated speech recognition rate R_u = 80% in the same way as in Embodiment 5.

【０１０２】音声認識正誤履歴蓄積部９は、音声認識率
推定部７が出力する利用者の推定音声認識率Ｒ_uに対
し、現在の対話状態Ｓ₁₀を対話管理部５から得て、図１
５に示す対話状態Ｓ₁₀に対する音声認識正誤履歴表を作
成する。なお、既に対話状態Ｓ ₁₀に対する表が存在する
場合には、表の末尾に追加して蓄積する。The speech recognition correct/incorrect history accumulation unit 9 takes the user's estimated speech recognition rate R_u output by the speech recognition rate estimation unit 7, obtains the current dialogue state S₁₀ from the dialogue management unit 5, and creates the speech recognition correct/incorrect history table for the dialogue state S₁₀ shown in FIG. 15. If a table for the dialogue state S₁₀ already exists, the entry is appended to the end of that table.

【０１０３】音声認識成功可能性判定部８は、実施の形
態５と同様に動作し、音声認識率の分布Ｎ（８５、１
０）において利用者が音声認識成功可能性が有ると判定
する。The speech recognition success possibility determination unit 8 operates as in Embodiment 5 and determines that, under the speech recognition rate distribution N(85, 10), the user has a possibility of successful speech recognition.

【０１０４】対話管理部５の応答「県名を入力してくだ
さい」に利用者が「神奈川」と入力し、対話管理部５の
応答「神奈川ですね」に利用者が「はい」と入力するま
での動作は実施の形態５と同様である。The operation is the same as in Embodiment 5 up to the point where the user inputs "Kanagawa" in reply to the response "Please enter a prefecture name" from the dialogue management unit 5 and then inputs "yes" in reply to the response "Kanagawa, is that right?".

【０１０５】対話管理部５は、確認応答「神奈川です
ね」に対する音声認識結果「はい」に基づき、認識結果
「神奈川」は正しい認識結果と判断し、正解認識が生じ
たことを音声認識正誤回数記憶部３に出力するととも
に、音声認識正誤履歴蓄積部９にも出力する。Based on the speech recognition result "yes" for the confirmation response "Kanagawa, is that right?", the dialogue management unit 5 judges that the recognition result "Kanagawa" was correct and reports that a correct recognition has occurred both to the speech recognition correct/incorrect count storage unit 3 and to the speech recognition correct/incorrect history accumulation unit 9.

【０１０６】音声認識正誤履歴蓄積部９は、対話管理部
５から出力される正解認識判定を、図１５に示す対話状
態Ｓ₁₀に対する音声認識正誤履歴表の、推定音声認識率
８０％の音声認識正誤欄に、図１６に示すように記録す
る。[0106] The speech recognition correct/incorrect history accumulation unit 9 records the correct recognition judgment output from the dialogue management unit 5 in the correct/incorrect column for the estimated speech recognition rate of 80% in the speech recognition correct/incorrect history table for the dialogue state S₁₀ shown in FIG. 15, as shown in FIG. 16.
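
The history table can be pictured as a per-state list of rows, each holding an estimated rate and, once known, whether recognition was correct. This is a minimal sketch under that assumption; the layout of FIG. 15 and FIG. 16 is not reproduced in the text, and the names `history`, `add_entry`, and `record_outcome` are hypothetical.

```python
from collections import defaultdict

# Hypothetical store: dialogue state -> rows of [estimated rate in percent, outcome].
# The outcome is None until the dialogue management unit reports correct or incorrect.
history: dict = defaultdict(list)


def add_entry(state: str, estimated_rate: float) -> None:
    """Append a new row for this state; a new table is created on first use."""
    history[state].append([estimated_rate, None])


def record_outcome(state: str, correct: bool) -> None:
    """Fill in the correct/incorrect column of the most recent row."""
    history[state][-1][1] = correct


add_entry("S10", 80.0)
record_outcome("S10", True)
print(history["S10"])  # [[80.0, True]]
```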

【０１０７】以下対話を継続することにより、各対話状
態に対する音声認識正誤履歴表が作成され、さらに複数
の利用者との対話が行われる度に、音声認識正誤履歴蓄
積部９には各対話状態における音声認識率と、該対話状
態での音声認識の正誤が蓄積されていく。As the dialogue continues in this way, a speech recognition correct/incorrect history table is created for each dialogue state, and each time dialogues with further users take place, the speech recognition correct/incorrect history accumulation unit 9 accumulates the speech recognition rate in each dialogue state together with whether speech recognition in that state was correct.

【０１０８】音声認識率分布更新部１０は、音声認識正
誤履歴蓄積部９に蓄積された対話状態毎の音声認識正誤
履歴表を用いて、対話手順記憶部２が保持する各対話状
態における音声認識率分布を更新する。The speech recognition rate distribution update unit 10 uses the speech recognition correct/incorrect history table for each dialogue state accumulated in the speech recognition correct/incorrect history accumulation unit 9 to update the speech recognition rate distribution for each dialogue state held in the dialogue procedure storage unit 2.

【０１０９】例えば、音声認識正誤履歴蓄積部９に蓄積
された対話状態Ｓ₁₀の音声認識正誤履歴表から、正解認
識に対する音声認識率のみを抜き出したものが図１７に
示ものである場合、例えば最尤推定法を用いて平均値８
２．６３と分散１４．２５が推定値として得られる。[0109] For example, if extracting only the speech recognition rates for correct recognitions from the speech recognition correct/incorrect history table for the dialogue state S₁₀ accumulated in the speech recognition correct/incorrect history accumulation unit 9 yields the data shown in FIG. 17, then a mean of 82.63 and a variance of 14.25 are obtained as estimates, using for example maximum likelihood estimation.

【０１１０】音声認識率分布更新部１０は、対話状態Ｓ
₁₀における音声認識率の分布をＮ（８２．６３、１４．
２５）に更新する。[0110] The speech recognition rate distribution update unit 10 updates the distribution of speech recognition rates in the dialogue state S₁₀ to N(82.63, 14.25).
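
For a normal distribution, the maximum likelihood estimates are the sample mean and the population (divide-by-n) variance. The sketch below illustrates the update on an invented sample; the values behind FIG. 17 are not given in the text, so it does not attempt to reproduce 82.63 and 14.25, and `fit_rate_distribution` is a hypothetical name.

```python
from statistics import fmean, pvariance


def fit_rate_distribution(rates: list[float]) -> tuple[float, float]:
    """Maximum likelihood mean and variance of a normal fit to the given rates."""
    mu = fmean(rates)
    return mu, pvariance(rates, mu)


# Invented recognition rates (percent) for correctly recognized inputs.
mean, variance = fit_rate_distribution([80.0, 85.0, 90.0])
print(mean)  # 85.0
```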

【０１１１】以上の動作により、推定音声認識率と音声
認識正誤判定からなる音声認識正誤履歴表を音声認識正
誤履歴蓄積部９に蓄積し、蓄積した音声認識正誤履歴表
から各対話状態における認識対象語彙に対する音声認識
率の分布を学習できるため、音声認識可能性判定の精度
が向上し、利用者は効率よく対話目的を達成することが
できる。Through the above operation, a speech recognition correct/incorrect history table consisting of estimated speech recognition rates and correct/incorrect judgments is accumulated in the speech recognition correct/incorrect history accumulation unit 9, and the distribution of speech recognition rates for the recognition target vocabulary in each dialogue state can be learned from the accumulated table. The accuracy of the speech recognition success possibility determination therefore improves, and the user can achieve the dialogue objective efficiently.

【０１１２】[0112]

【発明の効果】この発明の請求項１に係る音声対話装置
は、以上説明したとおり、入力音声に対して認識処理を
行い音声認識結果を出力する音声認識部と、各対話状態
における、音声認識対象語彙、音声認識結果及び誤認識
回数に応じた遷移先対話状態を規定した対話手順を保持
する対話手順記憶部と、音声認識の正誤回数を保持する
音声認識正誤回数記憶部と、前記音声認識正誤回数記憶
部に保持された音声認識の正誤回数と前記音声認識部が
出力する音声認識結果に基づいて、前記対話手順記憶部
に保持された対話手順を参照して遷移先対話状態を決定
して出力する遷移先対話状態決定部と、前記音声認識部
が出力する音声認識結果に対する正誤結果を出力し、前
記遷移先対話状態決定部が出力する遷移先対話状態へ対
話状態を遷移する対話管理部とを備えたので、利用者に
応じて最も効率よく対話目的を達成するための対話手順
を決定できるという効果を奏する。As described above, the voice interactive device according to claim 1 of the present invention comprises: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, a speech recognition target vocabulary and transition destination dialogue states that depend on the speech recognition result and the number of misrecognitions; a speech recognition correct/incorrect count storage unit that holds the correct/incorrect counts of speech recognition; a transition destination dialogue state determination unit that determines and outputs a transition destination dialogue state by referring to the dialogue procedure held in the dialogue procedure storage unit, based on the correct/incorrect counts held in the speech recognition correct/incorrect count storage unit and the speech recognition result output by the speech recognition unit; and a dialogue management unit that outputs whether the speech recognition result output by the speech recognition unit was correct and changes the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit. The device therefore has the effect that a dialogue procedure for achieving the dialogue objective most efficiently can be determined for each user.

[0113] As described above, the voice interaction device according to claim 2 of the present invention comprises: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and an assumed recognition rate; a speech recognition correct/error count storage unit that holds the numbers of correct and erroneous speech recognitions; an assumed speech recognition rate test unit that, based on the correct/error counts held in the speech recognition correct/error count storage unit, tests each assumed recognition rate defined for the current dialogue state and outputs all assumed recognition rates that are not rejected; a transition destination dialogue state determination unit that refers to the dialogue procedure held in the dialogue procedure storage unit and, from the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines and outputs a single transition destination dialogue state; and a dialogue management unit that outputs a correct/error result for the speech recognition result output by the speech recognition unit and transitions the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit. The device can therefore determine a dialogue procedure that achieves the dialogue purpose most efficiently for each user.
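The test on the assumed recognition rates can be illustrated with a standard statistical test. The patent's own test expression is given in FIG. 8 and is not reproduced in this section, so the sketch below substitutes an exact two-sided binomial test as a stand-in; the function names and the significance level of 0.05 are assumptions made for illustration.

```python
from math import comb
from typing import List, Sequence


def binomial_p_value(correct: int, total: int, rate: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed number of correct recognitions under the assumed rate.
    """
    pmf = [comb(total, k) * rate**k * (1 - rate) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[correct]
    # Small tolerance so that ties are counted despite rounding error.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


def unrejected_rates(correct: int, errors: int,
                     assumed_rates: Sequence[float],
                     alpha: float = 0.05) -> List[float]:
    """Return every assumed recognition rate the counts do not reject."""
    total = correct + errors
    if total == 0:
        # No evidence yet, so no assumed rate can be rejected.
        return list(assumed_rates)
    return [r for r in assumed_rates
            if binomial_p_value(correct, total, r) >= alpha]
```

For a user with 2 correct and 8 erroneous recognitions and assumed rates of 0.3, 0.6 and 0.9, only 0.3 survives the test, so only the transition destinations defined for that rate remain as candidates.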

[0114] As described above, in the voice interaction device according to claim 3 of the present invention, when the transition destination dialogue state output by the transition destination dialogue state determination unit is a dialogue end state and the user's dialogue purpose has not been achieved, the dialogue management unit terminates the dialogue with the user and switches to an operator. The device can therefore determine a dialogue procedure that achieves the dialogue purpose most efficiently for each user.

[0115] As described above, in the voice interaction device according to claim 4 of the present invention, the dialogue procedure storage unit holds a dialogue procedure defining, for each dialogue state, the average number of dialogue exchanges needed to reach the end dialogue state, and the transition destination dialogue state determination unit refers to the dialogue procedure held in the dialogue procedure storage unit and, from among the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines and outputs a single transition destination dialogue state based on the average number of dialogue exchanges to the end dialogue state. The device can therefore determine a dialogue procedure that achieves the dialogue purpose most efficiently for each user.
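One way to read the selection described above is that, among the candidate transition destinations left after the test, the state with the smallest average number of exchanges to the end state is chosen. The text only says the decision is "based on" that average, so taking the minimum is an assumption, as are the state names and the numbers in this sketch.

```python
from typing import Dict, Sequence


def choose_destination(candidates: Sequence[str],
                       avg_turns_to_end: Dict[str, float]) -> str:
    """Pick the candidate expected to finish the dialogue soonest.

    Assumes "based on the average number of exchanges" means choosing
    the minimum; ties go to the first candidate listed.
    """
    return min(candidates, key=lambda state: avg_turns_to_end[state])


# Hypothetical averages that would be stored with the dialogue procedure.
AVG_TURNS = {"ask_all_at_once": 3.5, "ask_item_by_item": 5.0}
```

Here `choose_destination(["ask_item_by_item", "ask_all_at_once"], AVG_TURNS)` selects `"ask_all_at_once"` because its stored average is lower.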

[0116] As described above, in the voice interaction device according to claim 5 of the present invention, the dialogue procedure storage unit holds a dialogue procedure defining a speech recognition rate distribution for each dialogue state, and the device further comprises: a speech recognition rate estimation unit that estimates and outputs the user's speech recognition rate up to the current dialogue state using the correct/error counts held in the speech recognition correct/error count storage unit; and a speech recognition success possibility determination unit that determines, based on the speech recognition rate output by the speech recognition rate estimation unit and the speech recognition rate distribution for the current dialogue state, the possibility that the user's input will be recognized correctly, and outputs the determination result. The dialogue management unit terminates the dialogue with the user and switches to an operator based on the determination result of the speech recognition success possibility determination unit. The device can therefore determine a dialogue procedure that achieves the dialogue purpose most efficiently for each user.

[0117] As described above, the voice interaction device according to claim 6 of the present invention further comprises: a speech recognition correct/error history storage unit that accumulates, for each dialogue state, a history of the user's estimated speech recognition rate up to that dialogue state and of whether the speech recognition result in that dialogue state was correct; and a speech recognition rate distribution updating unit that refers to the speech recognition correct/error history storage unit, calculates the speech recognition rate distribution for each dialogue state, and updates the speech recognition rate distribution held in the dialogue procedure storage unit. The device can therefore determine a dialogue procedure that achieves the dialogue purpose most efficiently for each user.
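The estimation, success judgment, and distribution update described for claims 5 and 6 can be sketched together. The form of the speech recognition rate distribution is not specified in this section, so the sketch assumes it is summarized per dialogue state by the mean and standard deviation of the estimated rates of users who were recognized correctly in that state, and that a user whose estimate falls more than `k` standard deviations below the mean is unlikely to succeed. The record layout, the value of `k`, and all names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# One history record: (dialogue state, estimated rate on entering the
# state, whether recognition in that state turned out to be correct).
History = List[Tuple[str, float, bool]]


def estimate_rate(correct: int, errors: int) -> float:
    """Estimate the user's recognition rate from the counts so far."""
    total = correct + errors
    return correct / total if total else 1.0  # optimistic before any data


def update_distributions(history: History) -> Dict[str, Tuple[float, float]]:
    """Recompute (mean, std) of estimated rates for correct recognitions."""
    by_state: Dict[str, List[float]] = {}
    for state, rate, was_correct in history:
        if was_correct:
            by_state.setdefault(state, []).append(rate)
    return {s: (mean(r), pstdev(r)) for s, r in by_state.items()}


def likely_to_succeed(rate: float, dist: Tuple[float, float],
                      k: float = 2.0) -> bool:
    """Judge success as plausible unless the rate is far below the mean."""
    mu, sigma = dist
    return rate >= mu - k * sigma
```

A dialogue manager built on this sketch would hand the call to an operator when `likely_to_succeed` returns `False` for the current state, and would call `update_distributions` periodically as new history records accumulate.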

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a voice interaction device according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 1 of the present invention.

FIG. 3 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the contents stored in the speech recognition correct/error count storage unit of the voice interaction device according to Embodiment 1 of the present invention.

FIG. 5 is a diagram showing the configuration of a voice interaction device according to Embodiment 2 of the present invention.

FIG. 6 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 2 of the present invention.

FIG. 7 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 2 of the present invention.

FIG. 8 is a diagram showing an example of a test expression of the voice interaction device according to Embodiment 2 of the present invention.

FIG. 9 is a diagram showing the configuration of a voice interaction device according to Embodiment 3 of the present invention.

FIG. 10 is a diagram showing the configuration of a voice interaction device according to Embodiment 4 of the present invention.

FIG. 11 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 4 of the present invention.

FIG. 12 is a diagram showing the configuration of a voice interaction device according to Embodiment 5 of the present invention.

FIG. 13 is a diagram showing an example of a dialogue procedure of the voice interaction device according to Embodiment 5 of the present invention.

FIG. 14 is a diagram showing the configuration of a voice interaction device according to Embodiment 6 of the present invention.

FIG. 15 is a diagram showing a speech recognition correct/error history table of the voice interaction device according to Embodiment 6 of the present invention.

FIG. 16 is a diagram showing a speech recognition correct/error history table of the voice interaction device according to Embodiment 6 of the present invention.

FIG. 17 is a diagram showing the speech recognition rate for correct recognition in the voice interaction device according to Embodiment 6 of the present invention.

FIG. 18 is a diagram showing the configuration of a conventional voice interaction device.

[Explanation of symbols]

1: speech recognition unit, 2: dialogue procedure storage unit, 3: speech recognition correct/error count storage unit, 4: transition destination dialogue state determination unit, 5: dialogue management unit, 6: assumed speech recognition rate test unit, 7: speech recognition rate estimation unit, 8: speech recognition success possibility determination unit, 9: speech recognition rate correct/error history storage unit, 10: speech recognition rate distribution updating unit.

Continued from the front page: (72) Inventor: Yasushi Ishikawa, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-term (reference): 5D015 LL00 LL10

Claims

[Claims]

1. A voice interaction device comprising: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and the number of misrecognitions; a speech recognition correct/error count storage unit that holds the numbers of correct and erroneous speech recognitions; a transition destination dialogue state determination unit that, based on the correct/error counts held in the speech recognition correct/error count storage unit and the speech recognition result output by the speech recognition unit, refers to the dialogue procedure held in the dialogue procedure storage unit to determine and output a transition destination dialogue state; and a dialogue management unit that outputs a correct/error result for the speech recognition result output by the speech recognition unit and transitions the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit.

2. A voice interaction device comprising: a speech recognition unit that performs recognition processing on input speech and outputs a speech recognition result; a dialogue procedure storage unit that holds a dialogue procedure defining, for each dialogue state, the speech recognition target vocabulary and the transition destination dialogue state corresponding to the speech recognition result and an assumed recognition rate; a speech recognition correct/error count storage unit that holds the numbers of correct and erroneous speech recognitions; an assumed speech recognition rate test unit that, based on the correct/error counts held in the speech recognition correct/error count storage unit, tests each assumed recognition rate defined for the current dialogue state and outputs all assumed recognition rates that are not rejected; a transition destination dialogue state determination unit that refers to the dialogue procedure held in the dialogue procedure storage unit and, from the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines and outputs a single transition destination dialogue state; and a dialogue management unit that outputs a correct/error result for the speech recognition result output by the speech recognition unit and transitions the dialogue state to the transition destination dialogue state output by the transition destination dialogue state determination unit.

3. The voice interaction device according to claim 1, wherein, when the transition destination dialogue state output by the transition destination dialogue state determination unit is a dialogue end state and the user's dialogue purpose has not been achieved, the dialogue management unit terminates the dialogue with the user and switches to an operator.

4. The voice interaction device according to claim 1, wherein the dialogue procedure storage unit holds a dialogue procedure defining, for each dialogue state, the average number of dialogue exchanges needed to reach the end dialogue state, and the transition destination dialogue state determination unit refers to the dialogue procedure held in the dialogue procedure storage unit and, from among the transition destination dialogue states corresponding to the speech recognition result output by the speech recognition unit and the assumed recognition rates output by the assumed speech recognition rate test unit, determines and outputs a single transition destination dialogue state based on the average number of dialogue exchanges to the end dialogue state.

5. The voice interaction device according to claim 1 or any one of the above claims, wherein the dialogue procedure storage unit holds a dialogue procedure defining a speech recognition rate distribution for each dialogue state, the device further comprising: a speech recognition rate estimation unit that estimates and outputs the user's speech recognition rate up to the current dialogue state using the correct/error counts held in the speech recognition correct/error count storage unit; and a speech recognition success possibility determination unit that determines, based on the speech recognition rate output by the speech recognition rate estimation unit and the speech recognition rate distribution for the current dialogue state, the possibility that the user's input will be recognized correctly, and outputs the determination result, wherein the dialogue management unit terminates the dialogue with the user and switches to an operator based on the determination result of the speech recognition success possibility determination unit.

6. The voice interaction device according to claim 5, further comprising: a speech recognition correct/error history storage unit that accumulates, for each dialogue state, a history of the user's estimated speech recognition rate up to that dialogue state and of whether the speech recognition result in that dialogue state was correct; and a speech recognition rate distribution updating unit that refers to the speech recognition correct/error history storage unit, calculates the speech recognition rate distribution for each dialogue state, and updates the speech recognition rate distribution held in the dialogue procedure storage unit.