JP2006189730A

JP2006189730A - Speech interactive method and speech interactive device

Info

Publication number: JP2006189730A
Application number: JP2005003119A
Authority: JP
Inventors: Takeshi Inoue (井上 剛); Sumiyuki Okimoto (沖本 純幸); Eiichi Naito (内藤 栄一)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-01-07
Filing date: 2005-01-07
Publication date: 2006-07-20
Anticipated expiration: 2025-01-07
Also published as: JP4634156B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice dialogue method and a voice dialogue device with which the dialogue proceeds smoothly and the user's burden is reduced, even when the user does not know the recognizable vocabulary in every dialogue state.

SOLUTION: A voice interactive information retrieval device includes: a speech recognition section 101 that recognizes speech input by the user, using a speech recognition dictionary 102 and a model storage section 103, and outputs a recognition result; a recognized vocabulary known degree determination section 104 that determines the recognized vocabulary known degree in the current dialogue state; a dialogue determination section 106 that determines the next dialogue state, and the screen and response voice in that state, based on the speech recognition result and the recognized vocabulary known degree; and a response voice/screen output section 110 that outputs the screen and response voice for the determined dialogue state.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a voice dialogue method and a voice dialogue device that conduct a dialogue in response to voice input from a user.

Various methods have been proposed for voice dialogue systems that conduct a dialogue in response to a user's voice input, to cope with situations in which the input speech cannot be recognized accurately and the dialogue does not proceed smoothly. One such method is a speech recognition apparatus that, when the user re-enters an utterance, removes the top-ranked word of the previous recognition result from the recognition targets and determines the recognition result using both the previous recognition candidates and the candidates obtained in the current processing (see, for example, Patent Document 1). Another is a conversational speech understanding method that counts the number of misrecognitions and switches from a conversation mode to a selection mode when the count exceeds a threshold (see, for example, Patent Document 2). A third is an interactive information retrieval system that switches the dialogue initiative based on the input interval, that is, the time from when the device finishes presenting a response until the next voice input arrives (see, for example, Patent Document 3).

Patent Document 1: JP-A-11-149294
Patent Document 2: Japanese Patent No. 2656234
Patent Document 3: JP 2003-108581 A

However, although these conventional methods try to make the dialogue proceed smoothly by improving recognition accuracy through a reduced recognition vocabulary, or by changing the dialogue control based on the number of user inputs or the input time, they apply the same control in every dialogue state, so the dialogue does not always proceed smoothly.

For example, in a dialogue state where the user makes a selection, the user usually utters words that the system can accept in that situation (hereinafter, the recognition target vocabulary), and the recognition target vocabulary is small, so a method like that of Patent Document 1 is effective. In a dialogue state where the user enters a search keyword, however, the user usually utters words that the system cannot accept in that situation (hereinafter, out-of-vocabulary words), and the recognition target vocabulary is large, so a method like that of Patent Document 1 is not effective.

Likewise, in the conversational speech understanding method of Patent Document 2, the dialogue control switches from the conversation mode to the selection mode when the number of misrecognitions exceeds a fixed threshold. Because this threshold never changes, the method behaves the same way regardless of the differences between dialogue states described above, and therefore asks the user to repeat far more often than necessary.

The present invention has therefore been made to solve these conventional problems, and its object is to provide a voice dialogue method and a voice dialogue device that allow the dialogue to proceed smoothly while reducing the burden on the user.

To achieve the above object, the voice dialogue method according to the present invention is a method for conducting a dialogue through voice input, and includes: a speech recognition step of recognizing the input speech and outputting a recognition result; a recognized vocabulary known degree determination step of determining a recognized vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state; a dialogue determination step of determining the next dialogue state and the dialogue content in that state based on the recognition result obtained in the speech recognition step and the recognized vocabulary known degree determined in the recognized vocabulary known degree determination step; and an output step of outputting the dialogue content determined in the dialogue determination step.

This enables dialogue control that takes into account how likely the user is to know the recognizable vocabulary in each dialogue state, so the dialogue can be controlled smoothly with less burden on the user.

In the recognized vocabulary known degree determination step, the recognized vocabulary known degree may be determined using a known degree table that stores in advance the recognized vocabulary known degree for each input mode of the target dialogue state.

This makes it easy to determine the recognized vocabulary known degree, enabling a smooth dialogue with less burden on the user.

In the recognized vocabulary known degree determination step, the recognized vocabulary known degree may also be calculated using at least one of the following for the target dialogue state: the input mode; recognition vocabulary variation information describing how the recognition vocabulary changes; recognition vocabulary attribute information indicating the attributes of the recognition vocabulary; the total number of recognition target words; the number of recognition target words displayed; information about the user; the user's system usage history; the progress state of the dialogue; and the amount of information about the recognition vocabulary conveyed by the screen and the response voice.
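The text lists possible inputs but no formula for combining them. The sketch below shows one hypothetical way to do it. It assumes each factor has already been scaled to [0, 1] and that a weighted average is acceptable. The factor names, weights, and averaging scheme are illustrative assumptions, not the method of this document.

```python
from typing import Mapping


def known_degree_from_factors(
    factors: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Combine normalized factors into a known degree in [0, 1].

    Hypothetical scheme (not from the source): a weighted average of the
    factors that have a weight. Each factor is assumed to be pre-scaled to
    [0, 1], where higher means "more likely to know the vocabulary".
    """
    used = [name for name in factors if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight <= 0:
        raise ValueError("no positively weighted factors supplied")
    score = sum(weights[name] * factors[name] for name in used) / total_weight
    # Clamp so that out-of-range inputs cannot push the result outside [0, 1].
    return min(1.0, max(0.0, score))


# Hypothetical factor values for one dialogue state.
factors = {
    "displayed_ratio": 1.0,  # displayed recognition words / all recognition words
    "usage_history": 0.5,    # e.g. a normalized count of the user's past sessions
}
weights = {"displayed_ratio": 0.75, "usage_history": 0.25}
print(known_degree_from_factors(factors, weights))  # 0.875
```

A weighted average keeps the result on the same 0 to 1 scale as the table-based values, so the same thresholds could in principle be reused. Whether that holds in practice would have to be checked against evaluation data.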

This allows the recognized vocabulary known degree to be obtained more accurately, for example according to the user and the current progress of the dialogue, enabling a smooth dialogue with less burden on the user.

In the dialogue determination step, at least one of a display and a voice response indicating the recognized vocabulary known degree may be created. In the output step, the display and/or voice response created in the dialogue determination step may then be output.

Because this tells the user the recognized vocabulary known degree, that is, how likely the user's input is to be accepted by the recognizer, the user understands the system better and the dialogue proceeds smoothly.

In the dialogue determination step, whether to include an explanation of the recognition target vocabulary of the speech recognition step in the dialogue content may be decided based on the recognized vocabulary known degree.
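Here is a minimal sketch of this decision. It assumes the dialogue content is a text prompt and that a single cut-off on the known degree decides whether to append an explanation. The `explain_below` value of 0.8 is hypothetical because the document gives no value for this step, and the function name and wording are illustrative only.

```python
from typing import Sequence


def build_response(
    prompt: str,
    vocabulary: Sequence[str],
    known_degree: float,
    explain_below: float = 0.8,
) -> str:
    """Return the response text for a dialogue state.

    When the recognized vocabulary known degree is below `explain_below`
    (a hypothetical cut-off; the source gives no value for this decision),
    a list of acceptable words is appended so the user learns what to say.
    """
    if known_degree < explain_below:
        return f"{prompt} You can say: {', '.join(vocabulary)}."
    return prompt


menu_words = ["Program name search", "Programs now on air"]
print(build_response("Please choose an item.", menu_words, 0.98))
print(build_response("Please choose an item.", menu_words, 0.5))
```

When the known degree is high, the prompt stays short. When it is low, the same state spells out what can be said.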

This enables output suited to the user and a smoother dialogue matched to the user's level.

In the dialogue determination step, when the recognition result obtained in the speech recognition step is judged to be an unknown word, the recognized vocabulary known degree may decide what happens next. The dialogue content either prompts the user to re-enter the input or becomes a detailed dialogue.

When the dialogue determination step decides on dialogue content that prompts re-entry, the speech recognition parameters used in the speech recognition step may be changed according to the number of re-entries.

When the dialogue determination step decides on a detailed dialogue, the dialogue content may be further varied based on the recognized vocabulary known degree.

This makes it possible to realize a smooth dialogue suited to the recognized vocabulary known degree.

The information retrieval method according to the present invention is a method for retrieving information through voice input, and includes: a speech recognition step of recognizing the input speech and outputting a recognition result; a recognized vocabulary known degree determination step of determining a recognized vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state; a dialogue determination step of determining the next dialogue state and the dialogue content in that state based on the recognition result obtained in the speech recognition step and the recognized vocabulary known degree determined in the recognized vocabulary known degree determination step; an output step of outputting the dialogue content determined in the dialogue determination step; and an information retrieval step of retrieving information based on the recognition result obtained in the speech recognition step when the dialogue content being output in the output step is content that accepts an information search.

This enables dialogue control that takes into account how likely the user is to know the recognizable vocabulary in each dialogue state, so information can be retrieved through a smooth dialogue with less burden on the user.

The present invention can be implemented not only as the voice dialogue method and information retrieval method described above, but also as a voice dialogue device and an information retrieval device whose units carry out the characteristic steps of these methods, or as a program that causes a computer to execute those steps. Such a program can, of course, be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

The voice dialogue method and voice dialogue device according to the present invention enable dialogue control that takes into account how likely the user is to know the recognizable vocabulary, realizing a smooth dialogue with less burden on the user.

Embodiments of the present invention will be described below with reference to the drawings.

(Embodiment 1)

FIG. 1 is a block diagram showing the configuration of a voice interactive information retrieval device that uses the voice dialogue method according to Embodiment 1 of the present invention.

The voice interactive information retrieval device retrieves information through a voice dialogue with the user. As shown in FIG. 1, it includes a speech recognition unit 101, a speech recognition dictionary 102, a model storage unit 103, a recognized vocabulary known degree determination unit 104, a recognized vocabulary known degree storage unit 105, a dialogue determination unit 106, a user information input unit 107, a database search unit 108, a database storage unit 109, and a response voice/screen output unit 110.

The speech recognition unit 101 recognizes speech input by the user, using the speech recognition dictionary 102 and the model storage unit 103, and outputs a recognition result. The speech recognition dictionary 102 registers the recognition target vocabulary. The model storage unit 103 stores an acoustic model and a language model. The recognized vocabulary known degree storage unit 105 stores a recognized vocabulary known degree table. For each dialogue state, this table holds the recognized vocabulary known degree, which indicates how likely it is that the user knows the recognition target vocabulary.

The recognized vocabulary known degree determination unit 104 determines the recognized vocabulary known degree for the current dialogue state by looking up the recognized vocabulary known degree table with the information on the current dialogue state supplied by the dialogue determination unit 106. The dialogue determination unit 106 bases its decisions on two inputs: the speech recognition result from the speech recognition unit 101 and the recognized vocabulary known degree from the determination unit 104. From these it determines the next dialogue state and the screen and response voice for that state, and, if necessary, requests a database search from the database search unit 108.

The user information input unit 107 receives user information such as the user's gender and age. In response to an information search request from the dialogue determination unit 106, the database search unit 108 searches the information retrieval database stored in the database storage unit 109. The database storage unit 109 stores the information retrieval database. The response voice/screen output unit 110 outputs the screen and response voice for the dialogue state determined by the dialogue determination unit 106.

Next, the operation of the voice interactive information retrieval device configured as above will be described for the case of searching program information. FIG. 2 is a flowchart showing the overall flow of a dialogue in the device.

The dialogue determination unit 106 determines the initial dialogue state and the screen and response voice for that state. It then outputs them via the response voice/screen output unit 110, thereby requesting input from the user (step S101). Here, a dialogue state is one state in the overall state transitions of the dialogue, which are either predetermined by the dialogue determination unit 106 or generated by it; in many cases, each dialogue state corresponds to a state of the system. FIG. 3 shows a specific example of an output screen. As shown in FIG. 3, a menu screen for searching program information is displayed, and as the system response, the text in the agent's speech balloon is output as a response voice. The balloon itself may also be shown on the screen. In this example, the only recognizable words in FIG. 3 are those enclosed in boxes, such as "Program name search", "Programs now on air", ….

Once the current dialogue state is determined, the recognized vocabulary known degree determination unit 104 looks up the recognized vocabulary known degree table with the current dialogue state supplied by the dialogue determination unit 106. This gives the recognized vocabulary known degree for the current dialogue state (step S102). FIG. 4 shows a specific example of the recognized vocabulary known degree table. Item 401 is the dialogue state, and item 402 stores the recognized vocabulary known degree for each dialogue state. In this example, the recognized vocabulary known degree is a parameter ranging from 0 to 1. The closer it is to 1, the more likely the user knows the vocabulary the system can accept, that is, the recognition target vocabulary. For example, if the current dialogue state is "Menu", the determination unit 104 looks up the table with "Menu" and outputs 0.98 to the dialogue determination unit 106 as the recognized vocabulary known degree for the current dialogue state.
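The table lookup of step S102 can be sketched as a dictionary keyed by dialogue state. Only the two values quoted in this description (0.98 for "Menu" and 0.68 for "Program name search") are included. The function name, the range check, and raising `KeyError` for unlisted states are assumptions for illustration.

```python
from typing import Mapping

# Values quoted in this description for FIG. 4; other rows are not reproduced.
KNOWN_DEGREE_TABLE: Mapping[str, float] = {
    "Menu": 0.98,
    "Program name search": 0.68,
}


def determine_known_degree(
    dialogue_state: str,
    table: Mapping[str, float] = KNOWN_DEGREE_TABLE,
) -> float:
    """Look up the recognized vocabulary known degree for a dialogue state.

    Raises KeyError for a state missing from the table (an assumption; the
    source does not say how unlisted states are handled).
    """
    degree = table[dialogue_state]
    if not 0.0 <= degree <= 1.0:
        raise ValueError(f"known degree out of range for {dialogue_state!r}")
    return degree


print(determine_known_degree("Menu"))  # 0.98
```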

The recognized vocabulary known degree table is obtained in advance for each dialogue state, for example through evaluation experiments. Specifically, multiple test subjects use the system, and the utterances they make in each dialogue state are recorded. From these records, the proportion of utterances in each dialogue state that consisted of recognizable vocabulary can be calculated and used as the recognized vocabulary known degree. The time from the transition into each dialogue state until the user speaks may also be taken into account. Alternatively, the evaluation experiment can include a questionnaire asking users directly whether they know what to say in each dialogue state. Aggregating those answers then yields the recognized vocabulary known degree. The evaluation may also be carried out separately by age group or gender to prepare multiple recognized vocabulary known degree tables. In that case, the dialogue determination unit 106 passes the user information entered via the user information input unit 107 to the determination unit 104, which then decides which table to use based on that user information.
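The sketch below computes the proportion-based estimate described above, plus the per-group variant. It assumes the evaluation logs are reduced to (dialogue state, whether the utterance was in the recognizable vocabulary) pairs. The record format, function names, and group labels are hypothetical, and the response-time and questionnaire options are not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_known_degree_table(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Estimate a known degree per dialogue state from logged utterances.

    Each record is (dialogue_state, was_in_vocabulary): whether a test
    subject's utterance in that state was a recognizable word. The known
    degree is the in-vocabulary proportion; response time is ignored here.
    """
    total: Dict[str, int] = defaultdict(int)
    in_vocab: Dict[str, int] = defaultdict(int)
    for state, was_in_vocabulary in records:
        total[state] += 1
        if was_in_vocabulary:
            in_vocab[state] += 1
    return {state: in_vocab[state] / total[state] for state in total}


def build_tables_by_group(
    records: Iterable[Tuple[str, str, bool]],
) -> Dict[str, Dict[str, float]]:
    """Build one table per user group, e.g. an age band or gender.

    Each record is (group, dialogue_state, was_in_vocabulary); the group
    labels are hypothetical.
    """
    grouped: Dict[str, List[Tuple[str, bool]]] = defaultdict(list)
    for group, state, was_in_vocabulary in records:
        grouped[group].append((state, was_in_vocabulary))
    return {group: build_known_degree_table(rows) for group, rows in grouped.items()}


# 49 of 50 logged "Menu" utterances were recognizable words.
log = [("Menu", True)] * 49 + [("Menu", False)]
print(build_known_degree_table(log))  # {'Menu': 0.98}
```

At run time, the group of the current user (from the user information input unit 107) would select one of the per-group tables.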

Next, the dialogue determination unit 106 requests that the vocabulary recognizable in the current dialogue state be registered in the dictionary and that speech recognition be executed. In response, the speech recognition unit 101 recognizes the speech the user utters in reply to the input request and outputs the recognition result (step S103). As the recognition result, the speech recognition unit 101 outputs multiple candidates with detailed information on each, together with an unknown-word determination result. An unknown word is a word unknown to the system, that is, a word outside the system's recognition target vocabulary. When the user utters something outside the recognition target vocabulary, the user is said to have uttered an unknown word. For example, the only recognizable words in FIG. 3 are those enclosed in boxes, so an utterance such as "What time is it now?" counts as an unknown-word utterance.

FIG. 5 shows a specific example of a recognition result output by the speech recognition unit 101. Item 501 is the candidate rank; the recognition results are ordered by the recognition score (item 506). Item 502 is the recognized character string, and item 503 is the recognition date and time. Item 504 is the recognition interval, which indicates the length of the portion of the user's utterance used for speech recognition. Item 505 is the number of dictionary words, that is, the number of recognition target words in the dialogue state in which recognition was performed. Item 506 is the recognition score, which indicates how likely the recognition is to be correct. Item 507 is the unknown-word score, which indicates how likely it is that the user uttered a word outside the recognition vocabulary. A negative unknown-word score means the speech recognition unit 101 judged the utterance to be a known word, that is, a recognition target word of the system. A positive score means it judged the utterance to be an unknown word, that is, a word outside the recognition vocabulary.
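The recognition result of FIG. 5 maps naturally onto a record type. This sketch assumes one record per candidate and seconds as the unit of the recognition interval, and its scores are illustrative. The class and function names are hypothetical, and the sign rule for item 507 follows the description above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class RecognitionCandidate:
    """One row of a recognition result, mirroring items 501-507 of FIG. 5."""
    rank: int                  # item 501: candidate rank
    text: str                  # item 502: recognized character string
    recognized_at: datetime    # item 503: recognition date and time
    interval_sec: float        # item 504: recognition interval (unit assumed: seconds)
    dictionary_size: int       # item 505: number of dictionary words
    score: float               # item 506: recognition score
    unknown_word_score: float  # item 507: unknown-word score

    @property
    def is_unknown_word(self) -> bool:
        # Positive => unknown word, negative => known word. The text does not
        # cover a score of exactly zero; it is treated as known here.
        return self.unknown_word_score > 0


def rank_candidates(candidates: List[RecognitionCandidate]) -> List[RecognitionCandidate]:
    """Order candidates by recognition score (highest first) and renumber ranks in place."""
    ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    for position, candidate in enumerate(ordered, start=1):
        candidate.rank = position
    return ordered


# Hypothetical candidates; all numbers are illustrative only.
now = datetime(2005, 1, 7, 20, 0)
cands = rank_candidates([
    RecognitionCandidate(0, "Programs now on air", now, 1.2, 8, -850.0, -2.0),
    RecognitionCandidate(0, "Program name search", now, 1.2, 8, -820.0, -2.0),
])
print(cands[0].text, cands[0].is_unknown_word)  # Program name search False
```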

Next, the dialogue determination unit 106 determines the next dialogue state based on the recognized vocabulary known degree determined by the determination unit 104 and the recognition result from the speech recognition unit 101 (step S104). The dialogue control performed here by the dialogue determination unit 106 is described below. FIG. 6 is a flowchart showing the operation of the dialogue determination unit 106.

First, the dialogue determination unit 106 acquires the recognized vocabulary known degree and the recognition result (step S201). Next, based on the recognition result, it judges whether the user's input speech is an unknown word (step S202). If it judges that the input is neither an unknown word nor an utterance correcting a misrecognition (NO in step S202), it determines the next dialogue state based on the recognition result (step S203). Any processing needed to determine the next dialogue state, such as an information search, is performed at this point.

Specifically, when the user says "Program name search" in the dialogue state "Menu" shown in FIG. 2, the dialogue transitions to the dialogue state for program name search. The dialogue determination unit 106 determines the output screen and response voice for the new dialogue state, "Program name search", and these are output from the response voice/screen output unit 110. FIG. 7 shows an example screen after moving to the "Program name search" dialogue state. The response voice in this state is "Please say the name of the program you want to search for."

If, on the other hand, the user's input speech is judged to be an unknown word (YES in step S202), the dialogue determination unit 106 determines whether the recognized vocabulary known degree is greater than a predetermined first threshold (step S204). The first threshold is a value held by the dialogue determination unit 106, for example 0.8, and the dialogue control is switched according to this comparison. The first threshold can be set to an appropriate value through evaluation experiments, in the same way as the known degrees in the recognized vocabulary known degree table. If the recognized vocabulary known degree is greater than the first threshold (YES in step S204), the dialogue determination unit 106 keeps the current dialogue state and decides to prompt the user to re-enter the input (step S205). If it is equal to or less than the first threshold (NO in step S204), the unit decides to conduct a detailed dialogue based on the recognized vocabulary known degree, described later (step S206).
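The branching of steps S202 and S204 reduces to a small decision function. This sketch uses the first threshold of 0.8 given above. The names are hypothetical, and the misrecognition-correction case of step S202 is folded into the unknown-word flag.

```python
from enum import Enum


class Action(Enum):
    NEXT_STATE = "S203"  # decide the next state from the recognition result
    REPROMPT = "S205"    # keep the state and ask the user to say it again
    DETAILED = "S206"    # switch to a detailed dialogue


def decide_action(
    is_unknown_word: bool,
    known_degree: float,
    first_threshold: float = 0.8,
) -> Action:
    """Branching of FIG. 6 (steps S202 and S204).

    Simplification: S202 also treats misrecognition-correction utterances
    like unknown words; here both are folded into `is_unknown_word`.
    """
    if not is_unknown_word:                 # S202: NO
        return Action.NEXT_STATE
    if known_degree > first_threshold:      # S204: YES
        return Action.REPROMPT
    return Action.DETAILED                  # S204: NO (degree <= threshold)


print(decide_action(True, 0.98))  # Action.REPROMPT
print(decide_action(True, 0.68))  # Action.DETAILED
```

With the values in this description, an unknown word in "Menu" (0.98) leads to a re-prompt. In "Program name search" (0.68), it leads to the detailed dialogue.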

As a specific example, suppose that in the dialogue state "Menu" shown in FIG. 2 the user says "Program name search", but the speech recognition unit 101 judges the utterance to be an unknown word. Since the recognized vocabulary known degree for "Menu" is 0.98, the condition "recognized vocabulary known degree > first threshold" holds. In this case, the dialogue determination unit 106 keeps the dialogue state and outputs the re-entry response voice "Sorry, please say that again" to the response voice/screen output unit 110. To improve recognition accuracy for the re-entry, the dialogue determination unit 106 may change recognition parameters or shrink the recognition dictionary. Possible methods include lowering the unknown-word determination threshold, adapting the acoustic model to the user's voice to make recognition easier, and removing the previous top-ranked recognition result from the dictionary. The first threshold may also be raised so that the flow proceeds more readily to the detailed dialogue (step S206). Any changes to the recognition parameters or the first threshold made in this processing are reset when a new dialogue state is entered.
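The sketch below shows one hypothetical way to hold the re-entry adjustments for a single dialogue state. It models only two of the options mentioned: raising the first threshold, and removing the previous top-ranked result from the active vocabulary. The per-retry step of 0.05 and the class name are assumptions. With these settings, the first threshold exceeds the 0.98 known degree of "Menu" after the fourth re-entry, so the flow moves on to the detailed dialogue.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReentryState:
    """Per-dialogue-state adjustments made while re-prompting (step S205).

    The step size and cap are hypothetical. Acoustic-model adaptation and
    the unknown-word threshold are not modelled.
    """
    base_first_threshold: float = 0.8
    threshold_step: float = 0.05
    excluded_words: Set[str] = field(default_factory=set)
    reentry_count: int = 0
    first_threshold: float = field(init=False)

    def __post_init__(self) -> None:
        self.first_threshold = self.base_first_threshold

    def prepare_reentry(self, previous_top_result: str) -> None:
        self.reentry_count += 1
        # Drop the word that was ranked first last time from the active vocabulary.
        self.excluded_words.add(previous_top_result)
        # Raise the first threshold so repeated failures reach S206 sooner.
        self.first_threshold = min(1.0, self.first_threshold + self.threshold_step)

    def active_vocabulary(self, vocabulary: List[str]) -> List[str]:
        return [word for word in vocabulary if word not in self.excluded_words]

    def reset(self) -> None:
        """Clear all adjustments when a new dialogue state is entered."""
        self.first_threshold = self.base_first_threshold
        self.excluded_words.clear()
        self.reentry_count = 0


state = ReentryState()
menu = ["Program name search", "Programs now on air"]
state.prepare_reentry("Programs now on air")
print(state.active_vocabulary(menu))  # ['Program name search']
```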

After determining the next dialogue state as described above, the dialogue determination unit 106 judges whether it is a dialogue state indicating the end of the search (step S105). If so (YES in step S105), the dialogue ends. If not (NO in step S105), the process returns to the determination of the recognized vocabulary known degree (step S102), and the same operations are repeated.
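The overall loop of FIG. 2 (steps S101 to S105) can be sketched with the individual units passed in as functions. The state names, scripted utterances, and the `END_STATE` label are hypothetical. For brevity, the demo decision function ignores the known degree.

```python
from typing import Callable, List

END_STATE = "Search finished"  # hypothetical name for the terminal state


def run_dialogue(
    initial_state: str,
    known_degree_of: Callable[[str], float],
    recognize: Callable[[str], str],
    decide_next: Callable[[str, str, float], str],
    output: Callable[[str], None],
) -> List[str]:
    """Outer loop of FIG. 2; returns the sequence of dialogue states visited."""
    state = initial_state
    visited = [state]
    output(state)                                   # S101: present the first state
    while True:
        degree = known_degree_of(state)             # S102
        result = recognize(state)                   # S103
        state = decide_next(state, result, degree)  # S104
        visited.append(state)
        output(state)
        if state == END_STATE:                      # S105: YES ends the dialogue
            return visited


# Scripted run with hypothetical states and utterances.
utterances = iter(["Program name search", "Miyamoto Musashi", "Number one"])


def demo_decide(state: str, result: str, degree: float) -> str:
    if state == "Program name search":
        return "Search results"  # any keyword leads to a result list here
    transitions = {
        ("Menu", "Program name search"): "Program name search",
        ("Search results", "Number one"): END_STATE,
    }
    return transitions.get((state, result), state)  # otherwise stay put


shown: List[str] = []
visited = run_dialogue(
    "Menu",
    known_degree_of=lambda s: {"Menu": 0.98}.get(s, 0.68),
    recognize=lambda s: next(utterances),
    decide_next=demo_decide,
    output=shown.append,
)
print(visited)  # ['Menu', 'Program name search', 'Search results', 'Search finished']
```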

Next, an operation example of a dialog that involves a database search is described, taking "program name search" as the current dialog state. The output screen for this state is shown in FIG. 7, and the response voice is "Please say the name of the program you want to search for."

As before, the recognized vocabulary known degree determination unit 104 looks up the current dialog state, input from the dialog determination unit 106, in the recognized vocabulary known degree table and thereby determines the known degree for that state (step S102). Because the current dialog state is "program name search", the unit 104 outputs the known degree 0.68 to the dialog determination unit 106.

Suppose the user then says "Miyamoto Musashi". The speech recognition unit 101 performs speech recognition as before and outputs the recognition result to the dialog determination unit 106 in a structure such as the one shown in FIG. 5 (step S103).

Based on the recognition result and the recognized vocabulary known degree, the dialog determination unit 106 determines the dialog state according to the flowchart of FIG. 6, as before (step S104). If "Miyamoto Musashi" is not an unknown word, the next dialog state is determined from the recognition result as described above (step S203). Specifically, the dialog determination unit 106 sends a request for a program search with "Miyamoto Musashi" as the keyword to the database search unit 108. The database search unit 108 searches the database storage unit 109 with that keyword and returns the search results to the dialog determination unit 106. The dialog determination unit 106 then sets as the next dialog state one that displays the search results and prompts the user to choose among them, and outputs a screen and response voice such as those shown in FIG. 8 to the response voice/screen output unit 110.

If, on the other hand, "Miyamoto Musashi" is judged to be an unknown word, the dialog determination unit 106 compares the known degree of the current dialog state, 0.68, with the first threshold, 0.8. Because the known degree is below the first threshold, the unit decides to conduct a detailed dialog based on the known degree (step S206). The dialog control for this detailed dialog is described next. FIG. 9 is a flowchart showing the flow of operations when the dialog determination unit 106 controls the detailed dialog.

First, the dialog determination unit 106 judges whether the recognized vocabulary known degree is greater than a second threshold that differs from the first threshold (step S301). If it is (YES in step S301), the unit selects response voice that explains the vocabulary or recognition grammar that can be spoken in the current dialog state, or that gives examples (step S302). Concretely, a response such as "Here you can say the name of any program broadcast this week. Please say it again." or "Please say a program name, such as 'Genji Monogatari' or 'Toyotomi Hideyoshi'." is output.

If the known degree is equal to or smaller than the second threshold (NO in step S301), a dialog state that performs a hierarchical narrowing search is chosen as the next dialog state (step S303). FIG. 10 shows an example screen for the hierarchical narrowing search: the user selects an initial character to narrow down the programs, and the matching programs are displayed as a list. Here a single threshold (the second threshold) splits the dialog control into two types, but more thresholds may be used to distinguish more types of dialog. For example, depending on the known degree, the dialog may narrow the initial-character selection further, from a row of the kana table down to a single hiragana character; it may simply tell the user that the word may be unknown, with a response such as "I'm sorry, there is no such program"; or it may ask about another attribute, such as "What day of the week is the program on?" or "Who appears in the program?". These thresholds, too, can be set to suitable values through evaluation experiments, in the same way as the known degrees in the recognized vocabulary known degree table.
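The branching of FIG. 6 followed by FIG. 9 can be sketched as one decision function. The table values (0.98 and 0.68) and the first threshold (0.8) come from the examples above; the second threshold of 0.5, the action labels, and the three-band strategy ordering in the usage example are assumptions, since the text leaves the second threshold to evaluation experiments and does not say which extra dialog belongs to which range.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Sequence

# Known-degree values quoted in the text; other states would need their own entries.
KNOWN_DEGREE_TABLE = {"menu": 0.98, "program name search": 0.68}
FIRST_THRESHOLD = 0.8    # value used in the worked example
SECOND_THRESHOLD = 0.5   # assumed; the source sets it by evaluation experiments


@dataclass
class Recognition:
    word: str
    is_unknown: bool  # True when the recognizer judged the input to be an unknown word


def detailed_dialog(degree: float,
                    thresholds: Sequence[float] = (SECOND_THRESHOLD,),
                    strategies: Sequence[str] = ("hierarchical_narrowing", "explain_vocabulary")) -> str:
    """FIG. 9, generalized to several ascending thresholds.

    strategies lists one more entry than thresholds, lowest band first; a degree
    equal to a threshold falls in the lower band ('equal to or smaller' -> S303).
    """
    if len(strategies) != len(thresholds) + 1:
        raise ValueError("need exactly one more strategy than thresholds")
    return strategies[bisect_left(list(thresholds), degree)]


def decide(state: str, result: Recognition) -> str:
    """FIG. 6 branching followed by FIG. 9; returns a label for the chosen action."""
    if not result.is_unknown:
        return "transition_by_result"        # S203: next state from the recognition result
    degree = KNOWN_DEGREE_TABLE[state]
    if degree > FIRST_THRESHOLD:
        return "ask_again"                   # stay in the current state and re-prompt
    return detailed_dialog(degree)           # S206, then S301 to S303
```

For example, `detailed_dialog(0.2, (0.3, 0.5), ("ask_other_attribute", "hierarchical_narrowing", "explain_vocabulary"))` selects the lowest band. `bisect_left` is used so that a degree exactly equal to a threshold goes to the lower band, matching the "equal to or smaller" wording of step S301.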

Thereafter, the same operations are repeated, and the dialog continues until the search is completed.

Through the above operations, dialog control can take into account how likely the user is to know the recognizable vocabulary in each dialog state, which realizes a smooth dialog that places less burden on the user.

(Embodiment 2)
In Embodiment 1, dialog control can take into account how likely the user is to know the recognition vocabulary in each dialog state. However, because the recognized vocabulary known degree is a fixed value learned in advance, its accuracy drops sharply in content searches where the search targets change daily, that is, when the recognition vocabulary is not constant, and appropriate dialog control then becomes impossible. This embodiment describes calculating the recognized vocabulary known degree in order to handle such cases.

FIG. 11 is a block diagram showing the configuration of a voice interactive information search apparatus that uses the voice interactive method according to Embodiment 2 of the present invention. Parts identical to those of Embodiment 1 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

This voice interactive information search apparatus differs from that of Embodiment 1 in the configuration of the recognized vocabulary known degree determination unit 201 and in the processing of the dialog determination unit 202; everything else is the same as in Embodiment 1.

The recognized vocabulary known degree determination unit 201 includes a calculation unit 201a. For each item of information about the dialog state input from the dialog determination unit 202, the calculation unit 201a calculates a known degree, and it then combines these per-item known degrees into an overall recognized vocabulary known degree.

Next, the operation of the apparatus configured as above when searching for program information is described. FIG. 12 is a flowchart showing the flow of operations of the voice interactive information search apparatus.

As in Embodiment 1, the dialog determination unit 202 determines the dialog state at the start of the dialog, determines the screen and response voice for that state, and has the response voice/screen output unit 110 output them, thereby requesting input from the user (step S401).

Having determined the current dialog state, the dialog determination unit 202 outputs information about it to the recognized vocabulary known degree determination unit 201 (step S402). Specifically, for a dialog state Si it outputs: input mode information; recognition vocabulary variation information, which indicates whether the vocabulary is fixed or variable and, if variable, at what time interval it changes; recognition vocabulary attribute information, which indicates the attribute of the recognition vocabulary; the total number of recognition target words; and the number of recognition target words displayed on the screen.

More specifically, the input mode information distinguishes modes such as the "selection input screen" shown in FIG. 3 and the "free input screen" shown in FIG. 7. A "fixed vocabulary" is a recognition vocabulary that is always the same in the given dialog state, such as the selection items on the menu screen shown in FIG. 3. A "variable vocabulary" is one that is not constant within the same dialog state, such as the program names on the program name search screen shown in FIG. 7, which are updated daily. The recognition vocabulary attribute information indicates the attribute of the vocabulary, for example "command", "program name", "performer name", "genre name", "date/time", or "number".

Next, the calculation unit 201a of the recognized vocabulary known degree determination unit 201 calculates a known degree from each item of dialog-state information input from the dialog determination unit 202: P1 from the input mode information, P2 from the recognition vocabulary variation information, P3 from the recognition vocabulary attribute information, and P4 from the total number of recognition target words together with the number of displayed recognition target words.

Specifically, P1 is higher for a selection input screen such as that in FIG. 3 than for a free input screen such as that in FIG. 7. P2 is higher when the recognition vocabulary is fixed, as in the dialog state of FIG. 3, than when it varies, as in the dialog state of FIG. 7; moreover, the faster the vocabulary changes, the smaller P2 becomes. P3 is higher for widely shared vocabulary such as commands than for program names or performer names. P4 decreases as the number of recognition target words grows, and decreases further as more of those words are not displayed.
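The text fixes only the direction of each partial known degree, not a formula. The sketch below encodes those orderings with entirely hypothetical numbers and function names, so the values themselves carry no meaning beyond satisfying the stated relations.

```python
import math

# All numeric values below are hypothetical; the text only fixes the orderings.


def p1_input_mode(mode: str) -> float:
    """P1: a selection input screen scores higher than a free input screen."""
    return {"selection": 0.9, "free": 0.5}[mode]


def p2_variation(fixed: bool, update_interval_days: float = 0.0) -> float:
    """P2: fixed vocabulary scores highest; faster-changing vocabulary scores lower."""
    if fixed:
        return 0.9
    # Rises from 0.3 toward 0.7 as the update interval grows (always below the fixed case).
    return 0.3 + 0.4 * (1.0 - math.exp(-update_interval_days / 7.0))


def p3_attribute(attribute: str) -> float:
    """P3: widely shared words such as commands score above program or performer names."""
    return {"command": 0.9, "program name": 0.4, "performer name": 0.4}.get(attribute, 0.6)


def p4_vocabulary_size(total: int, displayed: int) -> float:
    """P4: falls as the vocabulary grows, and falls further for words not on screen."""
    hidden = max(total - displayed, 0)
    return 1.0 / (1.0 + 0.01 * total + 0.02 * hidden)
```

In practice such curves would be fitted to evaluation data, in the same spirit as the known-degree table of Embodiment 1.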

The calculation unit 201a of the recognized vocabulary known degree determination unit 201 combines the known degrees obtained above for the current dialog state from the individual items of information, and calculates the overall recognized vocabulary known degree PK(Si) by the following Equation 1 (step S403).

Here, m_k is a weighting coefficient.
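Equation 1 itself is not reproduced in this text. Assuming it combines P1 to P4 linearly with the weights m_k and normalizes by their sum so that PK(Si) stays in the range of the partial degrees (an assumption, not the patent's stated formula), the calculation could be sketched as follows. The function name is hypothetical.

```python
from typing import Sequence


def combined_known_degree(partials: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form of Equation 1: weighted average of P1..P4 with weights m_k."""
    if len(partials) != len(weights) or len(partials) == 0:
        raise ValueError("partials and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(m * p for m, p in zip(weights, partials)) / total
```

If the weights are chosen to sum to 1, the normalization has no effect and the expression reduces to a plain weighted sum.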

The recognized vocabulary known degree determination unit 201 outputs the known degree obtained in this way from the dialog-state information to the dialog determination unit 202, which uses it, as in Embodiment 1, as the criterion for dialog control.

The subsequent speech recognition (step S404), determination of the next dialog state (step S405), and judgment of whether the dialog has ended (step S406) are the same as in Embodiment 1.

The recognized vocabulary known degree determination unit 201 may also determine the known degree actually used for dialog control from two values: the known degree PK(Si) calculated by the calculation unit 201a, and the known degree looked up, as in Embodiment 1, in the recognized vocabulary known degree table stored in the recognized vocabulary known degree storage unit 105.

The dialog determination unit 202 may also reflect the known degree on the screen or in the response voice, to tell the user how likely input is to be accepted in the current dialog state. FIGS. 13 and 14 show specific examples of output screens. As shown there, the known degree may be presented as the likelihood of acceptance in bar form or through the facial expression of an agent, or the volume or prosody of the response voice may be changed. FIG. 13 shows a case with a high known degree, and FIG. 14 a case with a low known degree.

As described above, in this embodiment the known degree is calculated from various items of information about the dialog state, so its accuracy can be improved even when the recognition vocabulary is not constant because the search targets change daily, as in program search using an EPG. Dialog control suited to each dialog state therefore becomes possible, realizing a smooth dialog with less burden on the user.

(Embodiment 3)
In Embodiment 2, the recognized vocabulary known degree, which expresses how likely the user is to know the recognition vocabulary in each dialog state, is calculated from information about the current dialog state and used for dialog control. However, because the calculation method does not depend on the user or on the progress of the dialog, the dialog control cannot adapt to the user. This embodiment describes using the dialog history to address this.

FIG. 15 is a block diagram showing the configuration of a voice interactive information search apparatus that uses the voice interactive method according to Embodiment 3 of the present invention. Parts identical to those of Embodiment 1 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

In addition to the configuration of Embodiment 1, this voice interactive information search apparatus includes a user information storage unit 303 and a dialog history storage unit 304. It also differs in the configuration of the recognized vocabulary known degree determination unit 301 and in the processing of the dialog determination unit 302; everything else is the same as in Embodiments 1 and 2. This embodiment therefore describes the operation of the recognized vocabulary known degree determination unit 301, the dialog control in the dialog determination unit 302, and the output screens and response voice methods generated by the dialog determination unit 302 that were not described in Embodiments 1 and 2.

When the dialog determination unit 302 determines the current dialog state, it stores the current dialog state, the recognition result, and related data in the dialog history storage unit 304, and outputs to the recognized vocabulary known degree determination unit 301 information about the current dialog state, information about the user, and information about the progress of the dialog.

FIG. 16 shows a specific example of the data stored in the dialog history storage unit 304. Item 1201 is the dialog state name, item 1202 is the response output start time, and item 1203 is the recognition result, stored for example in the form shown in FIG. 5. Item 1204 holds two stagnation counts: one for the dialog state and one for repeated utterances. The state stagnation count records how many times in a row the dialog has stayed in the same state, for example how many times in succession the program name search dialog state of FIG. 7 has occurred, and the repeated-utterance stagnation count records how many times in a row the same utterance has been made.
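One row of the dialog history of FIG. 16, together with the two stagnation counts of item 1204, might be represented as below. The field names are hypothetical, the recognition result is simplified to the first-ranked word, the time unit (seconds) is assumed, and so is the counting convention: a count of 0 on first entry, and repeated utterances counted only while the dialog stays in the same state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryRow:
    state: str                  # item 1201: dialog state name
    response_start: float       # item 1202: response output start time (seconds, assumed)
    top_word: str               # item 1203: first-ranked recognition result (simplified)
    state_stagnation: int = 0   # item 1204: consecutive repeats of the same state
    repeat_stagnation: int = 0  # item 1204: consecutive repeats of the same utterance
    notify_level: int = 0       # recognition vocabulary information notification level
    known_degree: float = 0.0   # known degree used in this dialog state


def append_row(history: List[HistoryRow], state: str, t: float, word: str,
               level: int, degree: float) -> HistoryRow:
    """Append one dialog step and derive the stagnation counts from the previous row."""
    prev = history[-1] if history else None
    same_state = prev is not None and prev.state == state
    # Only count a repeated utterance while the dialog stays in the same state (assumption).
    same_word = same_state and prev.top_word == word
    row = HistoryRow(state, t, word,
                     prev.state_stagnation + 1 if same_state else 0,
                     prev.repeat_stagnation + 1 if same_word else 0,
                     level, degree)
    history.append(row)
    return row
```

Deriving the counts from the previous row keeps each history entry self-describing, so the dialog progress information can be read off the last row without rescanning the whole history.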

Item 1205 is the recognition vocabulary information notification level, which indicates how much information about the recognition vocabulary has been conveyed through the screen and the response voice. FIGS. 17 and 18 show specific examples of output screens. Both are output screens for the same dialog state, but the amount of information conveyed about the recognition vocabulary is varied according to the known degree determined by the recognized vocabulary known degree determination unit 301. FIG. 17 is an example for a low known degree, such as a first-time user of the system, and FIG. 18 is an example for a known degree higher than in FIG. 17. Because the known degree in FIG. 17 is low, the screen and the response voice convey as much information about the recognition target vocabulary as possible, so as to raise the user's known degree. FIG. 18 shows the case where this information is reduced for a user whose known degree has risen above the value in FIG. 17. Because the response voice strongly affects the length of the dialog, the information it conveys may be reduced before the information on the screen. When the known degree rises further, a screen such as that shown in FIG. 3 is displayed.
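Choosing how much vocabulary information to present could be sketched as a mapping from the known degree to a notification level, plus a rule that drops spoken hints before on-screen hints. The 0 to 6 level scale, the linear mapping, and the half-scale cut-off for spoken hints are all assumptions; the text only says that a lower known degree calls for more information (a higher level) and that response-voice information may be reduced first.

```python
from typing import Dict

MAX_LEVEL = 6  # assumed scale; the source does not define the range of levels


def notification_level(known_degree: float, max_level: int = MAX_LEVEL) -> int:
    """Lower known degree -> higher level (more recognition-vocabulary information)."""
    d = min(max(known_degree, 0.0), 1.0)
    return round((1.0 - d) * max_level)


def presentation_plan(level: int, max_level: int = MAX_LEVEL) -> Dict[str, bool]:
    """Spoken hints are dropped before on-screen hints, since speech costs dialog time."""
    return {
        "screen_hints": level > 0,
        "spoken_hints": level > max_level // 2,
    }
```

A first-time user with a low known degree would get both spoken and on-screen hints, while a user whose known degree has risen sees only on-screen hints, in line with the FIG. 17 to FIG. 18 progression.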

Item 1206 is the recognized vocabulary known degree that was used in the corresponding dialog state. Although not shown here, the dialog history storage unit 304 may also store information such as the response voice, the search results output on the screen, and the depth in the hierarchy measured from the dialog state at the start of the dialog.

The information in these items is stored as the dialog history in the dialog history storage unit 304, one row per step as the dialog progresses, as shown in FIG. 16. In the example of FIG. 16, in the bottom row the recognition vocabulary information notification level has been raised from 2 in the previous state to 6, and as a result the known degree has risen from 0.68 in the previous state to 0.72.

Next, the information output from the dialog determination unit 302 to the recognized vocabulary known degree determination unit 301 is described in more detail.

The information about the current dialog state is the same as that described in Embodiment 2. The user information is stored in the user information storage unit 303 and consists of information about the user and the user's usage history. As shown in FIG. 19, information about the user includes, for example, age, sex, occupation, and how often the user operates other devices; usage history information includes, for example, whether past searches with the system were achieved, the number of times the user has experienced the same dialog state Si, and the average number of utterances needed to move from dialog state Si to the next dialog state Si+1.

As described above, the dialog progress information is created by the dialog determination unit 302 from the history stored in the dialog history storage unit 304 in the format shown in FIG. 16. It includes the time taken to move from the previous dialog state to the current one, the recognition result, how many times the dialog has stalled in the current state, and the current recognition vocabulary information notification level. Specific behaviors may also be detected and output, such as frequently returning to the same dialog state or repeating the same sequence.
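Detecting the two behaviors named above, returning to the same dialog state and repeating the same sequence, can be sketched over the list of visited states. The function names and the minimum repeated-sequence length of 2 are hypothetical choices; a repeat of length 1 is plain stagnation, which the history already counts separately.

```python
from typing import Sequence


def returns_to(states: Sequence[str], state: str) -> int:
    """Number of times the dialog came back to `state` after leaving it."""
    entries = sum(
        1 for i, s in enumerate(states)
        if s == state and (i == 0 or states[i - 1] != state)
    )
    return max(entries - 1, 0)


def repeats_tail(states: Sequence[str], n: int) -> bool:
    """True if the last n states repeat the n states just before them."""
    return n > 0 and len(states) >= 2 * n and list(states[-n:]) == list(states[-2 * n:-n])


def has_repeated_sequence(states: Sequence[str], min_len: int = 2) -> bool:
    """Detect a repeated sequence of at least `min_len` states at the end of the history."""
    return any(repeats_tail(states, n) for n in range(min_len, len(states) // 2 + 1))
```

Checking only the tail of the history keeps the test cheap enough to run after every dialog step.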

The calculation unit 301a of the recognized vocabulary known degree determination unit 301 calculates known degrees from the information input from the dialog determination unit 302: a known degree P5 from the information about the user, a known degree P6 from all of the usage history so far, and a known degree P7 from the dialog progress information.

The calculation unit 301a then combines the known degrees calculated above and calculates the overall recognized vocabulary known degree PK(Si) by the following Equation 2.

Here, m_k is a weighting coefficient.

More specifically, P5 takes a small value for, for example, elderly users or users with little experience of other information retrieval systems. P6 is smaller the fewer times the user has experienced the same dialog state and the more utterances, on average, were needed to get through it. P7 becomes small when dialog transitions take a long time or the dialog stalls many times in the same state.

These values can be determined in advance from the results of evaluation experiments or by the developer's design, following rules that map the value of each item to a known degree. The rules may be based on a table such as that shown in FIG. 20, or implemented as a program with more finely specified IF-THEN rules. As mentioned earlier, the value ranges for each item can be set by machine learning (for example, a decision tree) on a large amount of data obtained in evaluation experiments.
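An IF-THEN rule set for P5 could look like the sketch below. The contents of FIG. 20 are not reproduced here, so the age cut-off, the usage-frequency measure, and every output value are hypothetical; they only respect the stated direction that elderly or inexperienced users get a smaller P5.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    age: int
    other_system_use_per_week: float  # hypothetical measure of experience with other systems


# Hypothetical rule table: (predicate, value) pairs checked in order; the first match wins.
P5_RULES = [
    (lambda u: u.age >= 65 and u.other_system_use_per_week < 1, 0.2),
    (lambda u: u.age >= 65 or u.other_system_use_per_week < 1, 0.4),
    (lambda u: u.other_system_use_per_week < 5, 0.6),
]
P5_DEFAULT = 0.8


def p5_from_rules(user: UserProfile) -> float:
    """Return the known degree P5 from the first rule whose condition matches."""
    for predicate, value in P5_RULES:
        if predicate(user):
            return value
    return P5_DEFAULT
```

An ordered, first-match list behaves like the branches of a small decision tree, so the same structure could hold thresholds learned from evaluation data instead of hand-set ones.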

The recognized vocabulary known degree determination unit 301 can also calculate the overall known degree PK(Si) by the following Equation 3, combining the known degrees obtained from the individual items of dialog-state information described in Embodiment 2 with the known degrees P5 to P7 above.
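Equations 2 and 3 are not reproduced in this text. Assuming both are weighted combinations normalized by the sum of the weights m_k (an assumption), a single helper can cover them by combining whichever partial degrees are supplied: P5 to P7 for Equation 2, or P1 to P7 for Equation 3. The dictionary keys and the function name are hypothetical.

```python
from typing import Mapping


def known_degree_from_parts(parts: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine the available partial known degrees (assumed weighted average).

    Supplying keys P5-P7 mirrors Equation 2; adding P1-P4 mirrors Equation 3.
    Parts without a positive weight are ignored.
    """
    used = [k for k in parts if weights.get(k, 0.0) > 0.0]
    if not used:
        raise ValueError("no weighted partial known degrees supplied")
    total = sum(weights[k] for k in used)
    return sum(weights[k] * parts[k] for k in used) / total
```

Keying the parts by name lets the same weights be reused whether or not the dialog-state terms P1 to P4 are available.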

As described above, in this embodiment the known degree is calculated from user information and dialog progress information, so dialog control adapted to the user is possible in each dialog state, realizing a smooth dialog with less burden on the user.

The voice interactive method and voice interactive apparatus according to the present invention can be used in many devices with a voice interactive interface. They are particularly useful for devices such as EPG program search devices and car navigation devices, in which the speech recognition vocabulary changes with time and place and users are likely to utter words outside the recognition vocabulary, and they therefore have broad applicability.

本発明の実施の形態１に係る音声対話方法を用いた音声対話型情報検索装置の構成を示すブロック図である。It is a block diagram which shows the structure of the voice interactive type | formula information retrieval apparatus using the voice interactive method which concerns on Embodiment 1 of this invention. 本発明の実施の形態１における対話全体の動作の流れを示すフローチャートである。It is a flowchart which shows the flow of operation | movement of the whole dialog in Embodiment 1 of this invention. 本発明の実施の形態１における選択入力型の出力画面例を示す図である。It is a figure which shows the example of an output screen of the selection input type in Embodiment 1 of this invention. 本発明の実施の形態１における認識語彙既知度合記憶部に格納される認識語彙既知度合テーブルの例を示す図である。It is a figure which shows the example of the recognition vocabulary known degree table stored in the recognition vocabulary known degree memory | storage part in Embodiment 1 of this invention. 本発明の実施の形態１における音声認識部から出力される認識結果例を示す図である。It is a figure which shows the example of a recognition result output from the speech recognition part in Embodiment 1 of this invention. 本発明の実施の形態１における対話決定部における処理を示すフローチャートである。It is a flowchart which shows the process in the dialogue determination part in Embodiment 1 of this invention. 本発明の実施の形態１における自由入力型の出力画面例を示す図である。It is a figure which shows the example of a free input type output screen in Embodiment 1 of this invention. 本発明の実施の形態１における検索結果の出力画面例を示す図である。It is a figure which shows the example of an output screen of the search result in Embodiment 1 of this invention. 本発明の実施の形態１における対話決定部で詳細対話の対話制御を行う際の動作の流れを示すフローチャートである。It is a flowchart which shows the flow of operation | movement at the time of performing dialog control of a detailed dialog in the dialog determination part in Embodiment 1 of this invention. 本発明の実施の形態１における階層型の絞込み検索の画面例を示す図である。It is a figure which shows the example of a screen of the hierarchical refinement search in Embodiment 1 of this invention. 
A block diagram showing the configuration of a voice interactive information retrieval device using the voice dialogue method according to Embodiment 2 of the present invention.
A flowchart showing the overall flow of dialogue operation in Embodiment 2 of the present invention.
A diagram showing an output screen that displays the recognition vocabulary known degree as a bar in Embodiment 2 of the present invention.
A diagram showing an output screen that displays the recognition vocabulary known degree through the facial expression of an agent in Embodiment 2 of the present invention.
A block diagram showing the configuration of a voice interactive information retrieval device using the voice dialogue method according to Embodiment 3 of the present invention.
A diagram showing an example of dialogue history data stored in the dialogue history storage section in Embodiment 3 of the present invention.
A diagram showing an example output screen that displays a large amount of information about the recognition vocabulary, according to the recognition vocabulary known degree, in Embodiment 3 of the present invention.
A diagram showing an example output screen that displays a small amount of information about the recognition vocabulary, according to the recognition vocabulary known degree, in Embodiment 3 of the present invention.
A diagram showing an example of user information data stored in the user information storage section in Embodiment 3 of the present invention.
A diagram showing an example of rules for determining the recognition vocabulary known degree from the value of each item of user information in Embodiment 3 of the present invention.

Explanation of symbols


101 Speech recognition section
102 Speech recognition dictionary section
103 Model storage section
104, 201, 301 Recognition vocabulary known degree determination section
105 Recognition vocabulary known degree storage section
106, 202, 302 Dialogue determination section
107 User information input section
108 Database search section
109 Database storage section
110 Response voice/screen output section
201a, 301a Calculation section
303 User information storage section
304 Dialogue history storage section

Claims

A voice dialogue method for conducting a dialogue by inputting voice, comprising:
a speech recognition step of recognizing input speech and outputting a recognition result;
a recognition vocabulary known degree determination step of determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
a dialogue determination step of determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained in the speech recognition step and the recognition vocabulary known degree determined in the recognition vocabulary known degree determination step; and
an output step of outputting the dialogue content determined in the dialogue determination step.
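The four steps of this claim can be pictured as a single dialogue turn. The Python sketch below is illustrative only: the names `run_turn` and `DialogueContent`, and the use of plain callables for each step, are assumptions rather than part of the patent, and a string stands in for captured audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialogueContent:
    next_state: str      # dialogue state to move to (hypothetical field)
    response_text: str   # screen/voice content to present (hypothetical field)


def run_turn(
    audio: str,
    state: str,
    recognize: Callable[[str, str], str],
    known_degree: Callable[[str], float],
    decide: Callable[[str, str, float], DialogueContent],
    output: Callable[[DialogueContent], None],
) -> DialogueContent:
    # Speech recognition step: turn the input into a recognition result.
    result = recognize(audio, state)
    # Known degree step: how likely the user knows what may be said in `state`.
    degree = known_degree(state)
    # Dialogue determination step: choose the next state and its content
    # from both the recognition result and the known degree.
    content = decide(result, state, degree)
    # Output step: present the chosen content.
    output(content)
    return content


# Usage with toy stand-ins for each step.
shown = []
turn = run_turn(
    "news",
    "genre_select",
    recognize=lambda audio, state: audio,
    known_degree=lambda state: 0.9,
    decide=lambda r, s, d: DialogueContent("results", f"Searching for {r}"),
    output=shown.append,
)
print(turn.response_text)  # Searching for news
```

Passing each step as a function keeps the sketch neutral about how recognition or the known degree is actually computed.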

The voice dialogue method according to claim 1, wherein, in the recognition vocabulary known degree determination step,
the recognition vocabulary known degree is determined using a known degree table that stores in advance the recognition vocabulary known degree for each input mode in the target dialogue state.
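A minimal sketch of a table-based determination, assuming the known degree is a number in [0, 1] and that input modes are labelled `"selection"` (candidates shown on screen) and `"free"` (open speech), loosely following the selection-input and free-input screens listed among the drawings. The table name, its entries, and the default value are invented for illustration.

```python
from typing import Dict, Tuple

# Hypothetical known degree table: (dialogue state, input mode) -> degree in [0, 1].
KNOWN_DEGREE_TABLE: Dict[Tuple[str, str], float] = {
    ("genre", "selection"): 0.9,  # candidates are listed on screen
    ("genre", "free"): 0.4,       # user must guess acceptable words
    ("title", "free"): 0.2,       # large, frequently changing vocabulary
}


def lookup_known_degree(state: str, mode: str,
                        table: Dict[Tuple[str, str], float] = KNOWN_DEGREE_TABLE,
                        default: float = 0.5) -> float:
    """Return the pre-stored known degree, or `default` for unlisted pairs."""
    return table.get((state, mode), default)


print(lookup_known_degree("genre", "selection"))  # 0.9
print(lookup_known_degree("channel", "free"))     # 0.5
```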

The voice dialogue method according to claim 1, wherein, in the recognition vocabulary known degree determination step,
the recognition vocabulary known degree is calculated using at least one of: the input mode in the target dialogue state; recognition vocabulary variation information relating to changes in the recognition vocabulary; recognition vocabulary attribute information indicating attributes of the recognition vocabulary; the total number of recognition target words; the number of displayed recognition target words; information about the user; the user's system usage history; the dialogue progress state; and information about the recognition vocabulary conveyed by the screen and the response voice.
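One way to combine several of these factors is a clamped weighted score. The sketch below uses only five of the listed inputs; the field names, the weights, and the linear form are assumptions for illustration, not values taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class KnownDegreeInputs:
    selection_mode: bool      # input mode: are candidates shown on screen?
    vocabulary_changed: bool  # vocabulary changed since last use (e.g. new EPG data)
    total_vocab: int          # number of all recognition target words
    displayed_vocab: int      # how many of them are displayed
    past_uses: int            # times the user has used this state (usage history)


def estimate_known_degree(x: KnownDegreeInputs) -> float:
    """Clamped weighted score in [0, 1]; the weights are illustrative only."""
    score = 0.0
    if x.selection_mode:
        score += 0.4
    if x.total_vocab > 0:
        # Share of the vocabulary the user can actually see.
        score += 0.3 * min(x.displayed_vocab, x.total_vocab) / x.total_vocab
    # Familiarity from usage history, saturating after 10 uses.
    score += 0.3 * min(x.past_uses, 10) / 10
    if x.vocabulary_changed:
        score -= 0.2
    return max(0.0, min(1.0, score))


familiar = estimate_known_degree(KnownDegreeInputs(True, False, 8, 8, 12))
novice = estimate_known_degree(KnownDegreeInputs(False, True, 5000, 0, 0))
```

A familiar user on a selection screen scores near 1.0, while a first-time user facing a large, freshly changed free-input vocabulary is clamped to 0.0.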

The voice dialogue method according to claim 1, wherein, in the dialogue determination step, at least one of a dialogue screen and a voice response is determined as the dialogue content, and
in the output step, at least one of the dialogue screen and the voice response determined in the dialogue determination step is output.

The voice dialogue method according to claim 1, wherein, in the dialogue determination step, at least one of a display and a voice response indicating the recognition vocabulary known degree is created, and
in the output step, at least one of the display and the voice response indicating the recognition vocabulary known degree created in the dialogue determination step is output.

The voice dialogue method according to claim 1, wherein, in the dialogue determination step, whether to include an explanation of the recognition target vocabulary used in the speech recognition step is determined based on the recognition vocabulary known degree.
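A minimal sketch of this decision, assuming a numeric known degree and a single hypothetical threshold: below it, the prompt is extended with a few example words from the recognition target vocabulary. The function name, threshold, and wording are illustrative assumptions.

```python
from typing import List


def build_prompt(base_prompt: str, vocabulary: List[str], known_degree: float,
                 threshold: float = 0.5, max_examples: int = 3) -> str:
    """Append example words when the user probably does not know the vocabulary."""
    if known_degree >= threshold or not vocabulary:
        return base_prompt
    examples = ", ".join(vocabulary[:max_examples])
    return f"{base_prompt} For example, you can say: {examples}."


genres = ["news", "sports", "drama", "movies"]
print(build_prompt("Please say a genre.", genres, 0.2))
# Please say a genre. For example, you can say: news, sports, drama.
```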

The voice dialogue method according to claim 1, wherein, in the dialogue determination step, when the recognition result obtained in the speech recognition step is determined to be an unknown word, whether the dialogue content prompts the user to input again or is a detailed dialogue is determined based on the recognition vocabulary known degree.
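The sketch below shows one plausible policy, assumed rather than taken from the claim: if the user probably knows the recognizable vocabulary, an unknown-word result is more likely a misrecognition and a re-prompt is cheap; otherwise the system switches to a detailed dialogue, such as the hierarchical narrowing search listed among the drawings. The enum names and the threshold are hypothetical.

```python
from enum import Enum


class UnknownWordAction(Enum):
    PROMPT_REINPUT = "prompt_reinput"
    DETAILED_DIALOGUE = "detailed_dialogue"


def handle_unknown_word(known_degree: float, threshold: float = 0.5) -> UnknownWordAction:
    # High degree: the user probably used a valid word, so the unknown result
    # more likely stems from misrecognition; asking again is a cheap fix.
    if known_degree >= threshold:
        return UnknownWordAction.PROMPT_REINPUT
    # Low degree: the user likely spoke an out-of-vocabulary word, so guide
    # them step by step instead of repeating the same question.
    return UnknownWordAction.DETAILED_DIALOGUE


print(handle_unknown_word(0.8).value)  # prompt_reinput
print(handle_unknown_word(0.1).value)  # detailed_dialogue
```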

The voice dialogue method according to claim 7, wherein, in the dialogue determination step, when the dialogue content is determined to prompt the user to input again, a speech recognition parameter in the speech recognition step is changed according to the number of re-inputs.
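The claim does not name the parameter that changes. As an assumption, the sketch lowers a hypothetical confidence threshold for accepting a recognition result with each re-input, down to a floor, so that repeated attempts are less likely to be rejected again. All names and values are illustrative.

```python
def rejection_threshold(reinput_count: int, base: float = 0.7,
                        step: float = 0.1, floor: float = 0.4) -> float:
    """Confidence needed to accept a result; relaxed after each re-input."""
    return max(floor, base - step * reinput_count)


def accept(confidence: float, reinput_count: int) -> bool:
    return confidence >= rejection_threshold(reinput_count)


print(accept(0.55, 0))  # False: the first attempt uses the strict threshold
print(accept(0.55, 2))  # True: the threshold has relaxed to about 0.5
```

The floor keeps the recognizer from accepting arbitrary low-confidence results after many retries.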

The voice dialogue method according to claim 7, wherein, in the dialogue determination step, when the detailed dialogue is determined as the dialogue content, the dialogue content is further changed based on the recognition vocabulary known degree.

An information search method for searching for information by inputting voice, comprising:
a speech recognition step of recognizing input speech and outputting a recognition result;
a recognition vocabulary known degree determination step of determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
a dialogue determination step of determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained in the speech recognition step and the recognition vocabulary known degree determined in the recognition vocabulary known degree determination step;
an output step of outputting the dialogue content determined in the dialogue determination step; and
an information search step of searching for information based on the recognition result obtained in the speech recognition step when the dialogue content output in the output step is content for accepting an information search.
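A minimal sketch of the final information search step for an EPG-style program list, assuming a simple hypothetical `Program` record and a case-insensitive match on title or genre. The record fields and the matching rule are illustrative, not the patent's retrieval method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Program:
    title: str
    genre: str


def search_if_accepting(recognized: str, accepts_search: bool,
                        programs: List[Program]) -> List[Program]:
    """Search only when the current dialogue content accepts a search."""
    if not accepts_search:
        return []
    key = recognized.lower()
    # Match the recognized word inside the title, or exactly against the genre.
    return [p for p in programs
            if key in p.title.lower() or key == p.genre.lower()]


guide = [Program("Evening News", "news"), Program("World Cup Live", "sports")]
print(search_if_accepting("sports", True, guide))
# [Program(title='World Cup Live', genre='sports')]
```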

A voice dialogue device for conducting a dialogue by inputting voice, comprising:
speech recognition means for recognizing input speech and outputting a recognition result;
recognition vocabulary known degree determination means for determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
dialogue determination means for determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained by the speech recognition means and the recognition vocabulary known degree determined by the recognition vocabulary known degree determination means; and
output means for outputting the dialogue content determined by the dialogue determination means.

An information search device for searching for information by inputting voice, comprising:
speech recognition means for recognizing input speech and outputting a recognition result;
recognition vocabulary known degree determination means for determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
dialogue determination means for determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained by the speech recognition means and the recognition vocabulary known degree determined by the recognition vocabulary known degree determination means;
output means for outputting the dialogue content determined by the dialogue determination means; and
information search means for searching for information based on the recognition result obtained by the speech recognition means when the dialogue content output by the output means is content for accepting an information search.

A program for conducting a dialogue by inputting voice, the program causing a computer to execute:
a speech recognition step of recognizing input speech and outputting a recognition result;
a recognition vocabulary known degree determination step of determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
a dialogue determination step of determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained in the speech recognition step and the recognition vocabulary known degree determined in the recognition vocabulary known degree determination step; and
an output step of outputting the dialogue content determined in the dialogue determination step.

A program for searching for information by inputting voice, the program causing a computer to execute:
a speech recognition step of recognizing input speech and outputting a recognition result;
a recognition vocabulary known degree determination step of determining a recognition vocabulary known degree, which indicates how likely it is that the user knows the vocabulary that can be recognized in the current dialogue state;
a dialogue determination step of determining the next dialogue state and the dialogue content in that dialogue state based on the recognition result obtained in the speech recognition step and the recognition vocabulary known degree determined in the recognition vocabulary known degree determination step;
an output step of outputting the dialogue content determined in the dialogue determination step; and
an information search step of searching for information based on the recognition result obtained in the speech recognition step when the dialogue content output in the output step is content for accepting an information search.