JPH11212594A

JPH11212594A - Speech interaction device

Info

Publication number: JPH11212594A
Application number: JP10025124A
Authority: JP
Inventors: Atsushi Noguchi; 淳野口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-01-22
Filing date: 1998-01-22
Publication date: 1999-08-06
Anticipated expiration: 2018-01-22
Also published as: JP3231693B2

Abstract

PROBLEM TO BE SOLVED: To improve operability by having the device check the recognition object vocabulary and output a voice guidance including that vocabulary when interaction does not proceed because the user does not know the recognition object vocabulary. SOLUTION: When information that interaction is not proceeding is sent from a user state detection part 102, an interaction control part 105 sends to an input example guidance generation part 107 an instruction to generate a voice guidance including an input example, together with the name of the recognition dictionary currently in use and the storage contents of an interaction storage part 106. The input example guidance generation part 107 then checks the vocabulary currently to be recognized in a recognition dictionary storage part 104, generates a voice guidance including that vocabulary, and sends the generated result to the interaction control part 105. The interaction control part 105 then sends a voice output part 108 an instruction to output the generated result as a voice guidance.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech interaction device, and more particularly to a speech interaction device with an improved voice guidance output function.

[0002]

[Prior Art] Many speech interaction devices currently on the market cannot recognize arbitrary vocabulary spoken by the user. What they can recognize is on the order of a few thousand discrete words, those discrete words with a few filler words such as "er" (えー) or "um" (あのー) attached, and simple sentences formed by connecting several of those discrete words.

[0003] Such restrictions are hard for ordinary users of services built on speech interaction devices to understand. During use, a user may speak words outside the recognition target vocabulary, and the device may then be unable to recognize the input speech.

[0004] As prior art that takes this point into consideration, Japanese Patent Application Laid-Open No. 8-44388, for example, discloses a speaker-dependent word speech recognition device. In this prior art, when the user performs a specific operation on the operation panel of the speech recognition device, the speech of the registered words is output in order. The registered words are also displayed as text on a display panel such as a liquid crystal display.

[0005]

[Problems to Be Solved by the Invention] However, in this prior art the user must instruct the device to present the recognition target vocabulary. A user who is unfamiliar with the device does not know how to give that instruction and therefore cannot have the recognition target vocabulary presented.

[0006] Furthermore, the above prior art gives no consideration to how the recognition target vocabulary should be presented in an environment where an operation panel or a display panel cannot be provided, such as a speech dialogue over a telephone line.

[0007] Accordingly, an object of the present invention is to provide a speech interaction device that promptly informs the user of the words that can be input and thereby improves operability.

[0008]

[Means for Solving the Problems] To solve the above problems, the speech interaction device according to the present invention has the following characteristic configurations.

[0009] (1) A speech interaction device that, when the dialogue with the user does not proceed, examines the vocabulary currently subject to recognition and outputs to the user voice guidance prompting a voice input that includes one or more of those vocabulary items.

[0010] (2) A speech interaction device comprising: voice input means through which a user inputs speech; speech recognition means for recognizing the input speech; speech recognition dictionary storage means for storing a recognition dictionary used by the speech recognition means; guidance storage means for storing voice guidance; dialogue storage means for storing the flow of the dialogue with the user; user state detection means for detecting a state in which the user no longer knows which vocabulary is currently subject to speech recognition; input example guidance creation means for creating voice guidance that includes an input example from the contents of the speech recognition dictionary storage means; dialogue management means that manages the flow of the dialogue according to the recognition result of the speech recognition means, that, when a recognition result is sent from the recognition means, looks up the voice guidance corresponding to that recognition result in the contents of the dialogue storage unit and instructs voice guidance output means to output it as voice guidance, and that, when the user state detection means detects the state in which the user no longer knows the speech recognition target vocabulary, instructs the input example guidance creation means to create voice guidance including an input example and instructs the voice guidance output means to output the created result as voice guidance; and the voice guidance output means, which outputs the contents of the guidance storage means as voice guidance in response to an instruction from the dialogue management means.

[0011] (3) The speech interaction device of (2), wherein the user state detection means regards the absence of voice input from the user for a fixed period of time as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

[0012] (4) The speech interaction device of (2), wherein the user state detection means regards the occurrence of a fixed number of rejections in the speech recognition means as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

[0013] (5) The speech interaction device of (2), wherein the user state detection means regards the occurrence, a fixed number of times or more, of a dialogue in the dialogue management means that cancels a voice input result as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

[0014] (6) The speech interaction device of (2), wherein the input example guidance creation means stores, for the contents of the speech recognition dictionary storage means, the number of times the user has spoken each vocabulary item, and creates the voice guidance including an input example from the items with high frequency.

[0015] (7) A speech interaction device that recognizes speech input by a user, manages the speech dialogue based on the recognition result, and outputs the required voice guidance as speech, wherein, upon detecting a state in which the user has made no voice input for a fixed time or longer and the dialogue with the device does not proceed, the device creates voice guidance including input speech vocabulary and outputs it as speech.

[0016] (8) The speech interaction device of (7), which examines the vocabulary currently subject to recognition, creates voice guidance including a plurality of those vocabulary items, and outputs it as speech.

[0017] (9) The speech interaction device of (8), wherein the frequency of occurrence of the vocabulary spoken by the user is stored in advance, and vocabulary is used in the voice guidance in descending order of frequency.

[0018] (10) The speech interaction device of (7), wherein, when rejections in the speech recognition processing or dialogues that cancel an input occur frequently, the device judges that the dialogue is not proceeding because the user does not know the vocabulary subject to speech recognition, and creates the corresponding voice guidance and outputs it as speech.

[0019]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 is a basic configuration diagram of an embodiment of the speech interaction device according to the present invention. Referring to FIG. 1, the speech interaction device of this embodiment comprises a voice input unit 101 through which the user inputs speech, a user state detection unit 102 that detects conditions such as the user making no voice input, a speech recognition unit 103 that recognizes the input speech and outputs the result, a recognition dictionary storage unit 104 that stores the recognition dictionary, a dialogue control unit 105 that controls the speech dialogue with the user, a dialogue storage unit 106 that stores in advance the speech dialogue performed by the device, an input example guidance creation unit 107 that creates voice guidance including an input example from the recognition dictionary storage unit 104, the dialogue control unit 105, and the contents of the dialogue storage unit 106, a voice output unit 108 that outputs voice guidance, a guidance storage unit 109 that stores the voice guidance to be output to the user, and a database search unit 110 that searches a database.

[0020] Speech input by the user through the voice input unit 101 is recognized by the speech recognition unit 103, and the recognition result is sent to the dialogue control unit 105. The dialogue control unit 105 manages the speech dialogue based on the dialogue flow information stored in the dialogue storage unit 106 and the recognition result sent from the speech recognition unit 103, and sends information on the voice guidance to be output to the voice output unit 108. The input example guidance creation unit 107 creates guidance including an input example from the information sent from the dialogue control unit 105 and the contents of the recognition dictionary storage unit 104, and sends the result to the dialogue control unit 105. The voice output unit 108 outputs the voice guidance stored in the guidance storage unit 109 according to the information sent from the dialogue control unit 105.

[0021] The guidance storage unit 109 stores prerecorded voice guidance. The database search unit 110 searches the database for the corresponding speech data according to the information sent from the dialogue control unit 105. The user state detection unit 102 detects a state in which the user has made no voice input for a fixed time or longer and the dialogue with the device is not proceeding, and when such a state is detected, sends that information to the dialogue control unit 105.

[0022] When the user state detection unit 102 reports that the dialogue is not proceeding, the dialogue control unit 105 sends the input example guidance creation unit 107 an instruction to create voice guidance including an input example, together with the name of the recognition dictionary currently in use and the contents of the dialogue storage unit 106. The input example guidance creation unit 107 then looks up the vocabulary currently subject to recognition in the recognition dictionary storage unit 104, creates voice guidance including a plurality of those vocabulary items, and sends the created result to the dialogue control unit 105. The dialogue control unit 105 sends the voice output unit 108 an instruction to output this result as voice guidance. It is also conceivable to store in advance the frequency with which the user speaks each vocabulary item and to use items as input examples in descending order of frequency.
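The frequency-based variant mentioned above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_input_examples`, the placeholder vocabulary, and the counts are all assumptions made for the example, and ties are resolved by dictionary order simply because Python's sort is stable.

```python
from collections import Counter


def pick_input_examples(dictionary, usage_counts, n=2):
    """Return up to n dictionary words, most frequently spoken first.

    sorted() is stable, so words with equal counts keep dictionary order.
    """
    return sorted(dictionary, key=lambda w: -usage_counts[w])[:n]


# Hypothetical vocabulary and counts, for illustration only.
dictionary = ["alpha", "bravo", "charlie", "delta"]
usage_counts = Counter({"charlie": 7, "alpha": 3})
usage_counts["delta"] += 1  # one more recognized utterance of "delta"

examples = pick_input_examples(dictionary, usage_counts)
```

With no usage recorded yet, the same function falls back to the first n entries of the dictionary.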

[0023] It is also conceivable for the user state detection unit 102 to treat frequent rejections in the speech recognition processing, or frequent dialogues that cancel an input (for example, the user answering "no" (いいえ) in a dialogue that confirms a recognition result), as an indication that the dialogue is not proceeding because the user does not know the vocabulary subject to speech recognition.
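The conditions under which the user state detection unit 102 reports a stalled dialogue (silence for a fixed time, repeated rejections, repeated cancellations) can be sketched as a small detector. The class name, method names, and all three threshold defaults are assumptions chosen for illustration; the description only says "a fixed time" and "a fixed number of times". Timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
from dataclasses import dataclass


@dataclass
class UserStateDetector:
    """Flags a stalled dialogue from silence, rejections, or cancellations."""

    silence_limit_s: float = 5.0  # assumed default
    max_rejections: int = 2       # assumed threshold
    max_cancellations: int = 2    # assumed threshold
    last_prompt_at: float = 0.0
    rejections: int = 0
    cancellations: int = 0

    def on_prompt(self, now: float) -> None:
        """The device finished a prompt; start timing the user's silence."""
        self.last_prompt_at = now

    def on_rejection(self) -> None:
        """The recognizer rejected an utterance."""
        self.rejections += 1

    def on_cancellation(self) -> None:
        """The user cancelled a result, e.g. answered "no" to a confirmation."""
        self.cancellations += 1

    def on_accepted(self, now: float) -> None:
        """A result was accepted; the dialogue moved forward, so reset."""
        self.rejections = 0
        self.cancellations = 0
        self.last_prompt_at = now

    def is_stalled(self, now: float) -> bool:
        """True when any one of the three stall conditions holds."""
        silent = now - self.last_prompt_at >= self.silence_limit_s
        return (silent
                or self.rejections >= self.max_rejections
                or self.cancellations >= self.max_cancellations)
```

A dialogue controller would poll `is_stalled` and, when it returns true, request example guidance instead of repeating the same prompt.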

[0024] Next, this embodiment is described using a specific example. The example below assumes a service in which the user obtains game progress information by speaking the name of a professional baseball team. Possible infrastructure for such a service includes, for example, a dedicated telephone number that the user calls or an information terminal on the street.

[0025] FIG. 2 shows an example of the speech dialogue flow stored in the dialogue storage unit 106, and FIG. 3 shows the contents of the recognition dictionary storage unit 104. FIG. 4 shows a specific example of a speech dialogue. The specific example below uses only the one recognition dictionary shown in FIG. 3, but a plurality of recognition dictionaries may be prepared and switched according to the dialogue with the user.

[0026] FIG. 5 shows an example of the contents of the voice guidance stored in the guidance storage unit 109 and the corresponding audio file names. For example, audio file 1.wav stores the voice guidance "Please say the team name" (チーム名をどうぞ), and audio file 10.wav stores the voice guidance "Orix" (オリックス).
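One way to picture the guidance storage unit 109 is as a lookup table from guidance text to audio file name. The sketch below is an assumption about representation only: it contains just the two pairs named in the description, and the names `GUIDANCE_FILES` and `files_for` are hypothetical.

```python
# Guidance text -> audio file name. Only these two pairs are given in the
# description; the remaining rows of FIG. 5 are not reproduced here.
GUIDANCE_FILES = {
    "チーム名をどうぞ": "1.wav",   # "Please say the team name"
    "オリックス": "10.wav",        # "Orix"
}


def files_for(phrases):
    """Map each phrase to its recorded file, or None if it is not in the table."""
    return [GUIDANCE_FILES.get(phrase) for phrase in phrases]
```

Returning `None` for an unknown phrase leaves it to the caller to decide how to handle guidance text that has no prerecorded audio.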

[0027] Referring also to FIG. 4, suppose that the user has started a dialogue with the device and, in response to the device's voice guidance "Please say the team name" (チーム名をどうぞ), makes no voice input for a fixed time (for example, 5 seconds). The user state detection unit 102 then sends information that the dialogue is not proceeding to the dialogue control unit 105. The dialogue control unit 105 sends the input example guidance creation unit 107 an instruction to create voice guidance including an input example, the name of the current recognition dictionary (here, the one shown in FIG. 3), and the contents of the dialogue storage unit 106 (those shown in FIG. 2). Here, the input example guidance creation unit 107 takes the first two words of the recognition dictionary, "Giants" (巨人) and "Hiroshima" (広島), as input examples, creates the voice guidance "like Giants, Hiroshima" (巨人、広島のように) from the stored content "like <input example>" (＜入力例＞のように) of the dialogue storage unit 106, and sends the resulting character string to the dialogue control unit 105. The dialogue control unit 105 sends this result string to the voice output unit 108 and instructs it to output the string as voice guidance.
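The string handling in this worked example can be sketched as a placeholder substitution. The function name, the `n` parameter, and the use of `str.replace` are assumptions for illustration; the description states only that the first two dictionary words are inserted into the stored phrase. The dictionary below holds just the two words quoted in the text, not the full contents of FIG. 3.

```python
PLACEHOLDER = "＜入力例＞"  # the "<input example>" marker in the stored phrase


def create_input_example_guidance(template, dictionary, n=2):
    """Fill the template's placeholder with the first n dictionary words."""
    examples = "、".join(dictionary[:n])
    return template.replace(PLACEHOLDER, examples)


# First two dictionary entries as quoted in the description.
recognition_dictionary = ["巨人", "広島"]
# Stored content of the dialogue storage unit as quoted in the description.
template = "＜入力例＞のように"

guidance = create_input_example_guidance(template, recognition_dictionary)
print(guidance)  # 巨人、広島のように
```

The resulting string corresponds to the character string that the input example guidance creation unit 107 returns to the dialogue control unit 105.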

[0028]

[Effects of the Invention] As described above, according to the present invention, when the user does not know the recognition target vocabulary and the dialogue does not proceed, the speech interaction device examines the recognition target vocabulary and outputs voice guidance that includes it. The user can therefore learn at an appropriate time which words can be input by voice, and operability for the user is markedly improved.

[Brief description of the drawings]

FIG. 1 is a basic configuration diagram of an embodiment of the speech interaction device according to the present invention.

FIG. 2 is a flowchart of the speech dialogue stored in the dialogue storage unit 106 in the embodiment of the present invention.

FIG. 3 is a diagram showing the contents of the recognition dictionary storage unit 104 in the embodiment of the present invention.

FIG. 4 is a diagram showing a specific example of a speech dialogue in the embodiment of the present invention.

FIG. 5 is a diagram showing an example of the contents of the voice guidance and the audio file names stored in the guidance storage unit 109 in the embodiment of the present invention.

[Explanation of symbols]

101: voice input unit
102: user state detection unit
103: speech recognition unit
104: recognition dictionary storage unit
105: dialogue control unit
106: dialogue storage unit
107: input example guidance creation unit
108: voice output unit
109: guidance storage unit
110: database search unit

Claims

[Claims]

1. A speech interaction device that, when the dialogue with the user does not proceed, examines the vocabulary currently subject to recognition and outputs to the user voice guidance prompting a voice input that includes one or more of those vocabulary items.

2. A speech interaction device comprising: voice input means through which a user inputs speech; speech recognition means for recognizing the input speech; speech recognition dictionary storage means for storing a recognition dictionary used by the speech recognition means; guidance storage means for storing voice guidance; dialogue storage means for storing the flow of the dialogue with the user; user state detection means for detecting a state in which the user no longer knows which vocabulary is currently subject to speech recognition; input example guidance creation means for creating voice guidance that includes an input example from the contents of the speech recognition dictionary storage means; dialogue management means that manages the flow of the dialogue according to the recognition result of the speech recognition means, that, when a recognition result is sent from the recognition means, looks up the voice guidance corresponding to that recognition result in the contents of the dialogue storage unit and instructs voice guidance output means to output it as voice guidance, and that, when the user state detection means detects the state in which the user no longer knows the speech recognition target vocabulary, instructs the input example guidance creation means to create voice guidance including an input example and instructs the voice guidance output means to output the created result as voice guidance; and the voice guidance output means, which outputs the contents of the guidance storage means as voice guidance in response to an instruction from the dialogue management means.

3. The speech interaction device according to claim 2, wherein the user state detection means regards the absence of voice input from the user for a fixed period of time as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

4. The speech interaction device according to claim 2, wherein the user state detection means regards the occurrence of a fixed number of rejections in the speech recognition means as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

5. The speech interaction device according to claim 2, wherein the user state detection means regards the occurrence, a fixed number of times or more, of a dialogue in the dialogue management means that cancels a voice input result as the state in which the user no longer knows which vocabulary is currently subject to speech recognition.

6. The speech interaction device according to claim 2, wherein the input example guidance creation means stores, for the contents of the speech recognition dictionary storage means, the number of times the user has spoken each vocabulary item, and creates the voice guidance including an input example from the items with high frequency.

7. A speech interaction device that recognizes speech input by a user, manages the speech dialogue based on the recognition result, and outputs the required voice guidance as speech, wherein, upon detecting a state in which the user has made no voice input for a fixed time or longer and the dialogue with the device does not proceed, the device creates voice guidance including input speech vocabulary and outputs it as speech.

8. The speech interaction device according to claim 7, which examines the vocabulary currently subject to recognition, creates voice guidance including a plurality of those vocabulary items, and outputs it as speech.

9. The speech interaction device according to claim 8, wherein the frequency of occurrence of the vocabulary spoken by the user is stored in advance, and vocabulary is used in the voice guidance in descending order of frequency.

10. The speech interaction device according to claim 7, wherein, when rejections in the speech recognition processing or dialogues that cancel an input occur frequently, the device judges that the dialogue is not proceeding because the user does not know the vocabulary subject to speech recognition, and creates the corresponding voice guidance and outputs it as speech.