JP5181533B2

JP5181533B2 - Spoken dialogue device

Info

Publication number: JP5181533B2
Application number: JP2007134781A
Authority: JP
Inventors: 利行難波; 義博大栄
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2007-05-21
Filing date: 2007-05-21
Publication date: 2013-04-10
Anticipated expiration: 2027-05-21
Also published as: JP2008287193A

Description

The present invention relates to a spoken dialogue device, and more particularly to a spoken dialogue device that uses the history of past dialogues with a user on a given subject to improve speech recognition accuracy while making dialogue on that subject smoother.

Conventionally, a voice input device is known that sets a geographical area in which it automatically waits for the user's voice input without a button operation such as pressing a voice input start button. When the vehicle enters that area, the device changes the speech recognition dictionary used for recognition, predicting from the characteristics of the area which vocabulary is likely to be uttered, and then starts waiting for voice input (see, for example, Patent Document 1).

This voice input device sets the geographical area based on information on facility genres, information on user preferences, or the setting history of destinations or waypoints. It temporarily adopts a speech recognition dictionary whose recognition targets include proper nouns such as the names of facilities or places belonging to that area, thereby improving recognition accuracy while making dialogue with the user on subjects related to that area smoother.

As a result, this voice input device can reduce the burden on a user who wants to obtain information about the geographical area through dialogue.

Patent Document 1: JP 2006-251298 A

However, the voice input device described in Patent Document 1 merely adopts a speech recognition dictionary suited to the geographical area and then waits for the user to speak. The improvement in recognition accuracy is therefore fixed and limited, and the device cannot actively enrich the content of the dialogue.

In view of the above, an object of the present invention is to provide a spoken dialogue device that actively enriches dialogue content while continuously improving speech recognition accuracy.

To achieve this object, a spoken dialogue device according to a first invention controls dialogue with a user while recognizing the user's utterances using a plurality of speech recognition dictionaries as reference targets, and comprises: utterance history recording means for recording the user's utterance history for each subject; dialogue scenario determining means for determining a new dialogue scenario for each subject based on the utterance history recorded for each subject by the utterance history recording means; and reference dictionary determining means for determining the speech recognition dictionaries to be referenced based on the new dialogue scenario determined by the dialogue scenario determining means.

A second invention is the spoken dialogue device according to the first invention, further comprising operation history recording means for recording the operation history of in-vehicle devices, wherein the dialogue scenario determining means determines a new dialogue scenario for each subject based on the operation history of the in-vehicle devices related to each subject recorded by the operation history recording means and the utterance history recorded for each subject by the utterance history recording means.

A third invention is the spoken dialogue device according to the first or second invention, wherein the reference dictionary determining means excludes speech recognition dictionaries with a low reference frequency from the reference targets.

With the above means, the present invention can provide a spoken dialogue device that actively enriches dialogue content while continuously improving speech recognition accuracy.

The best mode for carrying out the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of the spoken dialogue device according to the present invention. The spoken dialogue device 100 is an in-vehicle device that outputs questions and advice by voice along a dialogue scenario determined for each subject (theme) and, while recognizing what the user says, keeps up a dialogue with the user inside the vehicle cabin. It comprises a control unit 1, a voice input unit 2, a storage unit 3, a voice output unit 4, and a display unit 5, and enriches the dialogue with the user by changing the dialogue scenario for a given subject according to the past dialogue history or the output history of the in-vehicle device 6 and the navigation device 7.

Here, a "dialogue scenario" is the way a dialogue proceeds. It is made up of elements such as the type of question the spoken dialogue device 100 asks (a question that calls for either affirmation or denial, or one for which three or more answers are anticipated), the content of the questions, their order, and the overall length of the dialogue (whether detailed questions are asked in small steps or the dialogue goes straight to the point). Multiple dialogue scenarios are prepared for each subject, and any of them may be adopted.
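The elements of a dialogue scenario listed above can be pictured as a small data structure. The following is only a sketch: the class names, field names, and sample questions are hypothetical and do not come from the patent, and measuring dialogue length as the number of questions is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    YES_NO = "yes_no"          # calls for affirmation or denial
    MULTI_CHOICE = "multi"     # three or more answers anticipated


@dataclass
class Question:
    text: str
    qtype: QuestionType
    expected_answers: tuple[str, ...]


@dataclass
class DialogueScenario:
    subject: str
    questions: list[Question]  # list order is the question order
    direct: bool = False       # True: go straight to the point

    @property
    def length(self) -> int:
        """Overall dialogue length, assumed here to be the number of questions."""
        return len(self.questions)


# Two alternative scenarios prepared for the same subject (sample text is invented).
detailed = DialogueScenario(
    subject="auto light control",
    questions=[
        Question("Do you know what the AUTO position does?",
                 QuestionType.YES_NO, ("yes", "no")),
        Question("Which position is the switch in now?",
                 QuestionType.MULTI_CHOICE, ("off", "auto", "on")),
    ],
)
brief = DialogueScenario(
    subject="auto light control",
    questions=[Question("Shall I explain the AUTO position?",
                        QuestionType.YES_NO, ("yes", "no"))],
    direct=True,
)
scenarios_by_subject = {"auto light control": [detailed, brief]}
```

Keeping several scenarios per subject in one list makes it straightforward to swap the active scenario later without touching the rest of the dialogue logic.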

The control unit 1 is a computer comprising a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), and the like. For example, it stores in the ROM programs corresponding to the speech recognition means 10, utterance history recording means 11, operation history recording means 12, dialogue scenario determining means 13, reference dictionary determining means 14, and voice dialogue control means 15, and causes the CPU to execute the processing corresponding to each means.

The voice input unit 2 is a device for inputting user utterances. Examples include a directional microphone that picks up only utterances from a predetermined direction, and a microphone set with multiple sound-receiving parts that uses the phase differences between sounds to distinguish utterances coming from multiple directions.

The storage unit 3 is a device for storing various kinds of information, for example a storage medium such as a hard disk or DVD (Digital Versatile Disk). It holds the speech recognition dictionary 30, the utterance history database 31, and the operation history database 32.

The speech recognition dictionary 30 is a group of dictionaries that the speech recognition means 10, described later, refers to in order to convert voice data acquired through the voice input unit 2 into text data. It is a set of dictionaries that collect, in groups of a predetermined vocabulary size (for example, 10 entries), the expressions (for example, phrases or sentences) expected to be uttered on each subject, such as an explanation of the auto light control function (which automatically turns the headlights on and off according to ambient brightness), how regenerative energy is recovered, or an explanation of the follow-up driving control function.

The expressions expected to be uttered are set in advance based on the questions, advice, and so on that the spoken dialogue device 100 issues for each subject. In other words, those questions and advice are composed so as to elicit particular utterances from the user.

The vocabulary stored in the speech recognition dictionary 30 may be in units of words, phrases, or sentences.
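The grouping of expected expressions into dictionaries of a fixed vocabulary size can be sketched as follows. The function name, the use of sequential integers as dictionary numbers, and the placeholder expressions are assumptions made for illustration only.

```python
def build_dictionaries(expressions, vocab_per_dictionary=10):
    """Split a subject's expected expressions into numbered dictionaries."""
    if vocab_per_dictionary <= 0:
        raise ValueError("vocab_per_dictionary must be positive")
    starts = range(0, len(expressions), vocab_per_dictionary)
    return {
        dict_id: expressions[start:start + vocab_per_dictionary]
        for dict_id, start in enumerate(starts)
    }


# Placeholder expressions for one subject; entries may be words, phrases, or sentences.
expressions = [f"expression {n}" for n in range(25)]
dictionaries = build_dictionaries(expressions)
```

With 25 expressions and the default size of 10, this yields three dictionaries, the last of which is only partly filled.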

The utterance history database 31 is a database for storing the user's utterance history. For example, it stores the utterance time, the utterance content, and the vehicle position (latitude, longitude, altitude) at the time of the utterance in association with one another.

The operation history database 32 is a database for storing the operation history of the in-vehicle device 6. For example, it stores the operation time, the operation content, and the vehicle position (latitude, longitude, altitude) at the time of the operation in association with one another.

The voice output unit 4 is a device for outputting various kinds of information by voice, for example an in-vehicle speaker. Based on control signals from the control unit 1, it outputs questions, advice, and so on by voice along the dialogue scenario determined by the dialogue scenario determining means 13, described later.

The display unit 5 is a device for displaying various kinds of information, for example a liquid crystal display. It displays text corresponding to the voice information output by the voice output unit 4, or text and graphics that supplement that voice information.

The in-vehicle device 6 is equipment that outputs information allowing the spoken dialogue device 100 to grasp the running state of the vehicle. It includes equipment that outputs its own setting state, such as the auto light control, fog light switch, two-wheel drive / four-wheel drive changeover switch, shift selector, and follow-up driving control device, as well as equipment that outputs measured values, such as a steering angle sensor, accelerator opening sensor, brake pedal force sensor, illuminance sensor, gradient sensor, raindrop sensor, and vehicle speed sensor. The in-vehicle device 6 may also include a communication unit that receives traffic congestion information and weather information.

The navigation device 7 is a device for guiding the vehicle to a destination based on electronic map information and the vehicle's current position. For example, it measures the vehicle position (latitude, longitude, altitude) while receiving GPS signals with a GPS (Global Positioning System) receiver, derives the optimal route to the destination using Dijkstra's algorithm as the shortest-path search algorithm, and guides the vehicle to the destination by superimposing the derived route on the electronic map and outputting voice guidance.
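Since the navigation device is described as using Dijkstra's algorithm, a minimal sketch of that algorithm may help. The road graph, node names, and edge costs below are invented for illustration; a real navigation device would run over map data with far richer cost models.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return (total_cost, path) from start to goal, or (inf, []) if unreachable."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk back through the predecessors to rebuild the route.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf"), []


# Hypothetical road network: node -> {neighbor: non-negative edge cost}.
roads = {
    "home": {"A": 4.0, "B": 1.0},
    "B": {"A": 2.0, "office": 6.0},
    "A": {"office": 1.0},
}
cost, route = shortest_path(roads, "home", "office")
```

The algorithm assumes non-negative edge costs, which holds for distances and travel times.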

Next, the various means of the control unit 1 are described.

The speech recognition means 10 is a means for recognizing speech uttered by the user as text data. For example, it recognizes user utterances with a grammar-based speech recognition engine using an acoustic model, or with a dictation-based speech recognition engine using an acoustic model and a language model.

The utterance history recording means 11 is a means for recording the utterance history. For example, it records the text data corresponding to a user utterance recognized by the speech recognition means 10 in the utterance history database 31 of the storage unit 3, in association with the utterance time, the subject of the utterance (for example, the explanation of the auto light control function), and the vehicle position (latitude, longitude, altitude) at the time of the utterance.

When one vehicle is used by multiple users, the utterance history recording means 11 may store the utterance history separately for each user, using a user identification number or the like.

The utterance history recording means 11 may also record the fact that an utterance could not be recognized in the utterance history database 31 of the storage unit 3, in association with the utterance time, the subject of the utterance, the vehicle position at the time of the utterance, and so on. This allows recognition failures to be used in estimating how well the user understands the subject.
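A per-subject utterance history that also keeps recognition failures might look like the sketch below. The class and field names are hypothetical, the in-memory store stands in for the utterance history database 31, and using the failure rate as an input to the understanding estimate is only one possible choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class UtteranceRecord:
    time: datetime
    subject: str
    text: Optional[str]                   # None marks a recognition failure
    position: tuple[float, float, float]  # latitude, longitude, altitude
    user_id: Optional[str] = None         # set when several users share the vehicle


class UtteranceHistory:
    """In-memory stand-in for the utterance history database."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def record(self, rec: UtteranceRecord) -> None:
        self._by_subject[rec.subject].append(rec)

    def for_subject(self, subject, user_id=None):
        return [r for r in self._by_subject.get(subject, [])
                if user_id is None or r.user_id == user_id]

    def failure_rate(self, subject, user_id=None) -> float:
        records = self.for_subject(subject, user_id)
        if not records:
            return 0.0
        return sum(r.text is None for r in records) / len(records)


history = UtteranceHistory()
here = (35.08, 137.15, 40.0)  # made-up coordinates
topic = "auto light control"
history.record(UtteranceRecord(datetime(2007, 5, 21, 18, 0), topic, "yes", here, "u1"))
history.record(UtteranceRecord(datetime(2007, 5, 21, 18, 1), topic, None, here, "u1"))
history.record(UtteranceRecord(datetime(2007, 5, 21, 18, 2), topic, "no", here, "u2"))
```

Storing a failure as a record with no text, rather than dropping it, keeps the time, subject, and position available for later analysis.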

The operation history recording means 12 is a means for recording the operation history. For example, it stores the information output by the in-vehicle device 6 in association with the operation time, the operation content (for example, the fact that a switch of a certain device changed from on to off), and the vehicle position (latitude, longitude, altitude) at the time of the operation.

As with the utterance history recording means 11, when one vehicle is used by multiple users, the operation history recording means 12 may store the operation history separately for each user, using a user identification number or the like.

The dialogue scenario determining means 13 is a means for determining a new dialogue scenario based on the utterance history recorded by the utterance history recording means 11 or the operation history recorded by the operation history recording means 12.

For example, based on the user's utterance history in past dialogues on the subject of auto light control, the dialogue scenario determining means 13 judges whether the operating method of the auto light control was correctly conveyed to the user. If it judges that it was not, it changes either the condition for starting the next dialogue on auto light control (hereinafter the "dialogue start condition") or the dialogue scenario itself.

Dialogue start conditions include, for example, the distance to a predetermined point (in this case, a point where the surroundings become dark, such as a tunnel) falling below a threshold, a predetermined time arriving (in this case, a time when the surroundings become dark, such as dusk), or a predetermined in-vehicle device 6 (in this case, the headlight switch) being operated. The distance between the host vehicle and the tunnel is obtained from the output of the navigation device 7.
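The three kinds of start condition can be combined into a single check, sketched below. The function name, the 500 m threshold, and the modelling of the predetermined time as a 17:00 to 19:00 window are all assumptions; the patent does not give concrete values.

```python
from datetime import time


def dialogue_should_start(distance_to_point_m, now, device_operated,
                          distance_threshold_m=500.0,
                          window_start=time(17, 0), window_end=time(19, 0)):
    """Return True if any of the three start conditions holds.

    distance_to_point_m: distance to the predetermined point (e.g. a tunnel),
        or None when no such point lies ahead.
    now: current time of day.
    device_operated: True when the predetermined device (e.g. the headlight
        switch) has just been operated.
    """
    near_point = (distance_to_point_m is not None
                  and distance_to_point_m < distance_threshold_m)
    in_time_window = window_start <= now <= window_end
    return near_point or in_time_window or device_operated
```

Making the thresholds parameters rather than constants matters here, because the dialogue start condition is one of the things the device is described as changing between dialogues.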

Changes to the dialogue scenario include, for example, changes to the content of the questions output by voice, the order of the questions, and the terms used in explanations.

The dialogue scenario determining means 13 judges that an explanation was not correctly conveyed when the user gave no utterance (answer) in response to a question, or gave an answer different from the expected one.

The dialogue scenario determining means 13 may also refer to the operation history database 32 to obtain the operations the user performed on the auto light control after the dialogue on that subject ended, and judge whether the operating method was correctly conveyed by checking whether the operation was performed as explained during the dialogue. It may likewise make this judgment based on the relationship between the utterance history and the operation history.

This is because, for example, when the operation is performed as explained even though the expected answer was not given, it can sometimes be inferred that the user already knew the operating method or that the method was correctly conveyed. Conversely, when the expected answer was given but the operation is not performed as explained, it can sometimes be inferred that the user simply chose to operate the device differently from the explained procedure and that the operating method was nonetheless correctly conveyed.
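One possible reading of this judgment, combining the answer with the later operation, is sketched below. The function names are hypothetical, and treating either signal as sufficient is a simplification: the text only says such an inference is possible in some cases.

```python
from typing import Optional


def answer_indicates_understanding(answer: Optional[str], expected: set) -> bool:
    """No answer, or an answer outside the expected set, counts as not conveyed."""
    return answer is not None and answer in expected


def explanation_conveyed(answer: Optional[str], expected: set,
                         operated_as_explained: Optional[bool] = None) -> bool:
    answered_ok = answer_indicates_understanding(answer, expected)
    if operated_as_explained is None:
        # No operation history available: judge from the utterance alone.
        return answered_ok
    # Either signal is treated as evidence that the explanation got through.
    return answered_ok or operated_as_explained
```

Under this reading, only the combination of an unexpected or missing answer with an operation that does not follow the explanation is treated as a failure to convey the method.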

The dialogue scenario determining means 13 may also estimate the purpose of the trip (for example, commuting to work, commuting to school, or shopping) from information output by the navigation device 7, such as the current time, destination, or schedule, and change the dialogue scenario based on the estimated purpose. For example, during a commute to work or school, the dialogue scenario determining means 13 adopts a dialogue scenario that goes straight to the point, so that the reference dictionary determining means 14, described later, adopts a speech recognition dictionary 30 containing short command words as the reference dictionary.

In this way, the spoken dialogue device 100 can continue the dialogue without annoying the user and can support driving more effectively through the information it provides in the dialogue.

The reference dictionary determining means 14 is a means for determining, for each subject, which speech recognition dictionaries 30 should be referenced. For example, it decides which speech recognition dictionaries 30 to reference according to the dialogue start condition or dialogue scenario changed as a result of the determination by the dialogue scenario determining means 13.

The reference dictionary determining means 14 may also exclude speech recognition dictionaries 30 with a low reference frequency from the reference targets, based on the utterance history recorded by the utterance history recording means 11. This suppresses misrecognition of user utterances and further improves speech recognition accuracy.
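Excluding rarely referenced dictionaries can be sketched as a simple frequency filter. This assumes that each past utterance in the history can be traced to the dictionary that matched it, which the patent does not spell out; the function name and the threshold are likewise hypothetical.

```python
from collections import Counter


def prune_dictionaries(active_ids, referenced_ids, min_references=1):
    """Keep only dictionaries referenced at least min_references times.

    active_ids: dictionary numbers currently used as reference targets.
    referenced_ids: one dictionary number per past utterance that matched.
    """
    counts = Counter(referenced_ids)
    return [d for d in active_ids if counts[d] >= min_references]
```

For example, `prune_dictionaries([1, 2, 3], [1, 1, 3, 1])` drops dictionary 2, which was never referenced. A smaller active vocabulary leaves the recognizer with fewer competing candidates, which is the stated reason for the exclusion.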

The reference dictionary determining means 14 may also extract user information from the utterance history recorded by the utterance history recording means 11 and determine the speech recognition dictionaries 30 to be referenced based on that information. User information here means information on a user's characteristics, for example that one user says "はい" ("yes") to affirm and "いいえ" ("no") to deny, while another user says "そう" ("right") to affirm and "ちがう" ("wrong") to deny.
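Choosing a dictionary from a user's habitual affirmations and denials could be done by counting which wording appears most often in that user's history. The dictionary names "formal" and "casual", the scoring rule, and the fallback default are assumptions for illustration.

```python
from collections import Counter

# Hypothetical dictionaries of affirmative and negative expressions.
YES_NO_DICTIONARIES = {
    "formal": {"yes": {"はい"}, "no": {"いいえ"}},
    "casual": {"yes": {"そう"}, "no": {"ちがう"}},
}


def choose_yes_no_dictionary(past_utterances, default="formal"):
    """Pick the dictionary whose words occur most often in the user's history."""
    counts = Counter(past_utterances)
    scores = {
        name: sum(counts[word] for word in words["yes"] | words["no"])
        for name, words in YES_NO_DICTIONARIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

A user whose history contains mostly "そう" and "ちがう" would be given the "casual" dictionary, while a user with no history falls back to the default.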

The reference dictionary determining means 14 has the speech recognition means 10 refer to the desired speech recognition dictionaries 30 by storing, for each subject, only those dictionary identification numbers, among the numbers assigned to the individual speech recognition dictionaries, that correspond to the dictionaries it wants referenced.

Alternatively, the reference dictionary determining means 14 may enable the speech recognition means 10 to refer to a given speech recognition dictionary by switching that dictionary's reference flag from "0 (not referenced)" to "1 (referenced)", and may store the reference flag values of all speech recognition dictionaries 30 for each subject so that the speech recognition means 10 refers to the desired dictionaries.
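The two bookkeeping schemes, a per-subject list of dictionary numbers and a per-subject set of reference flags, carry the same information. The sketch below converts between them; the variable and function names and the sample numbers are hypothetical.

```python
ALL_DICTIONARY_IDS = [1, 2, 3, 4]

# Scheme 1: for each subject, store only the numbers of the dictionaries to reference.
ids_by_subject = {"auto light control": [1, 3]}


def flags_from_ids(ids, all_ids=ALL_DICTIONARY_IDS):
    """Scheme 2: one flag per dictionary, 1 (referenced) or 0 (not referenced)."""
    return {d: 1 if d in ids else 0 for d in all_ids}


def ids_from_flags(flags):
    return [d for d, flag in flags.items() if flag == 1]


flags_by_subject = {subject: flags_from_ids(ids)
                    for subject, ids in ids_by_subject.items()}
```

The list form is more compact when few dictionaries are active, while the flag form makes it cheap to toggle a single dictionary without rebuilding the list.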

The reference dictionary determining means 14 determines multiple speech recognition dictionaries 30 as reference targets for each subject, but it may instead select a single speech recognition dictionary 30.

Furthermore, although the reference dictionary determining means 14 decides which speech recognition dictionaries 30 to reference for each subject, it may instead make that decision for each part of a subject or for each question.

The voice dialogue control means 15 is a means for controlling the dialogue with the user along the dialogue scenario. For example, when the dialogue start condition for a given subject is satisfied, it causes the voice output unit 4 to output a predetermined question by voice.

The voice dialogue control means 15 also keeps the voice input unit 2 active for a predetermined time (for example, 5 seconds) after a predetermined question is output, switching from a state in which the speech recognition means 10 cannot recognize the user's utterances (hereinafter the "idle state") to a state in which it can (hereinafter the "standby state").

This reduces the burden that voice input places on the user (for example, the burden of being forced to press a voice input button before speaking), and further improves speech recognition accuracy by not accepting meaningless user utterances.
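The timed switch between the non-recognizing state and the standby state can be sketched as a small state holder. The class name, the state labels "idle" and "standby", and the use of caller-supplied timestamps in seconds are assumptions; only the 5-second example duration comes from the text.

```python
class ListeningWindow:
    """Accept utterances only for a fixed time after a question is output."""

    def __init__(self, duration_s=5.0):
        self.duration_s = duration_s
        self._opened_at = None  # None: not listening

    def question_output(self, now_s):
        """Call when a question has been spoken; opens the listening window."""
        self._opened_at = now_s

    def state(self, now_s):
        if self._opened_at is not None and now_s - self._opened_at < self.duration_s:
            return "standby"
        self._opened_at = None  # window has expired or was never opened
        return "idle"

    def accept(self, utterance, now_s):
        """Return the utterance if it arrived in time, otherwise None."""
        return utterance if self.state(now_s) == "standby" else None
```

Passing timestamps in explicitly, instead of reading a clock inside the class, keeps the behaviour deterministic and easy to check.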

The voice dialogue control means 15 may also select the speech recognition engine used by the speech recognition means 10 (for example, grammar-based or dictation-based) according to the dialogue scenario determined by the dialogue scenario determining means 13.

Next, the process by which the spoken dialogue device 100 determines the speech recognition dictionaries 30 to be referenced (hereinafter the "speech recognition dictionary determination process") is described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the speech recognition dictionary determination process. The spoken dialogue device 100 executes this process, for example, when the engine is started, in preparation for the next dialogue on a given subject.

First, the control unit 1 of the spoken dialogue device 100 refers to the utterance history database 31 and obtains the utterance history for the subject "how to use the two-wheel drive / four-wheel drive changeover switch" (step S1). The purpose is to estimate how well the user understands the changeover switch. For example, the control unit 1 estimates that the user's understanding is low when the user gave no utterance (answer) in response to a question, or gave an answer different from the expected one.

At this point, the control unit 1 may extract from the utterance history the user's characteristics in responding during dialogue, because those characteristics are important in determining the dialogue scenario and the speech recognition dictionaries 30.

The control unit 1 also refers to the operation history database 32 and obtains the operation history of the two-wheel drive / four-wheel drive changeover switch (step S2). As above, the purpose is to estimate how well the user understands the changeover switch. For example, the control unit 1 estimates that the user's understanding is low when no operation in line with the advice given in the previous dialogue has been performed.

The control unit 1 may further estimate the user's understanding of the two-wheel drive / four-wheel drive changeover switch based on the driving history (for example, changes in driving speed over time).

The control unit 1 then uses the dialogue scenario determining means 13 to change the dialogue scenario according to the user's estimated understanding of the two-wheel drive / four-wheel drive changeover switch (step S3).

If the user's understanding is estimated to be low because the previous dialogue scenario did not sufficiently convey "how to use the two-wheel drive / four-wheel drive changeover switch", the control unit 1 adopts, for example, a dialogue scenario that explains the usage in more detail, one that changes the order of explanation, or one that explains from a different angle (for example, improved fuel economy), so as to raise the user's understanding.

Conversely, if the user's understanding is estimated to be high because the previous dialogue scenario sufficiently conveyed "how to use the two-wheel drive / four-wheel drive changeover switch", the control unit 1 adopts, for example, a dialogue scenario with a simplified explanation and changes the dialogue start condition to reduce how often the dialogue is run, so that unnecessary advice is not output by voice repeatedly.

Accordingly, to raise the speech recognition accuracy for user utterances in the new dialogue scenario adopted by the dialogue scenario determination means 13, the control unit 1 uses the reference dictionary determination means 14 to newly adopt, as a reference target, a speech recognition dictionary 30 containing vocabulary expected to be uttered, and to exclude speech recognition dictionaries 30 with a low reference frequency from the reference targets (step S4).
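
One way to picture step S4 is the Python sketch below. It assumes each speech recognition dictionary is a named set of words, that the new scenario supplies a set of expected vocabulary, and that a per-dictionary reference count is available; the function name, the `min_references` threshold, and the rule that newly adopted dictionaries are kept regardless of their count are hypothetical choices, not details from the source.

```python
from typing import Dict, Set


def update_reference_dictionaries(
    active: Set[str],
    dictionaries: Dict[str, Set[str]],
    expected_vocabulary: Set[str],
    reference_counts: Dict[str, int],
    min_references: int = 1,
) -> Set[str]:
    """Return the names of the dictionaries to reference for the new scenario."""
    # Adopt every dictionary that contains vocabulary expected in the new scenario.
    adopted = {
        name for name, words in dictionaries.items() if words & expected_vocabulary
    }
    # Keep currently active dictionaries only if they are referenced often enough.
    kept = {
        name for name in active if reference_counts.get(name, 0) >= min_references
    }
    return adopted | kept


if __name__ == "__main__":
    dictionaries = {
        "drivetrain": {"two-wheel drive", "four-wheel drive"},
        "lighting": {"headlight"},
        "audio": {"volume"},
    }
    result = update_reference_dictionaries(
        active={"lighting", "audio"},
        dictionaries=dictionaries,
        expected_vocabulary={"four-wheel drive"},
        reference_counts={"lighting": 5, "audio": 0},
    )
    print(sorted(result))
```

In this usage, the rarely referenced "audio" dictionary drops out of the reference targets, while "drivetrain" is added because it contains the expected phrase.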

With the above configuration, the spoken dialogue device 100 can actively enrich the dialogue content by changing the dialogue scenario based on the past dialogue history.

The spoken dialogue device 100 can also continuously improve speech recognition accuracy by changing the speech recognition dictionaries 30 according to the dialogue scenario.

In addition, because the spoken dialogue device 100 continuously verifies the effectiveness of past dialogue scenarios by referring to the operation history of each in-vehicle device as well as the past dialogue history, it can actively enrich the dialogue content while adopting better dialogue scenarios.

By changing the speech recognition dictionaries 30 according to the dialogue scenario, the spoken dialogue device 100 can also prevent a temporary increase in misrecognition of user utterances and a temporary drop in speech recognition accuracy.

Furthermore, because the spoken dialogue device 100 excludes speech recognition dictionaries 30 with a low reference frequency from the reference targets, it can further improve speech recognition accuracy while reducing the probability of misrecognition.

Preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to these embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

For example, in the embodiment described above, the spoken dialogue device 100 determines, after giving voice guidance on how to operate the auto light control, whether the user has correctly understood the operation method, and changes the dialogue scenario and the voice guidance dictionary accordingly. Alternatively, when it has given voice guidance with advice for improving fuel efficiency because fuel efficiency has dropped, it may determine whether the vehicle was driven as advised and change the dialogue scenario and the voice guidance dictionary accordingly.

A block diagram showing a configuration example of the spoken dialogue device according to the present invention.
A flowchart showing the flow of the speech recognition dictionary determination process.

Explanation of symbols

1 Control unit
2 Voice input unit
3 Storage unit
4 Voice output unit
5 Display unit
6 In-vehicle device
7 Navigation device
10 Speech recognition means
11 Utterance history recording means
12 Operation history recording means
13 Dialogue scenario determination means
14 Reference dictionary determination means
15 Voice dialogue control means
30 Speech recognition dictionary
31 Utterance history database
32 Operation history database
100 Spoken dialogue device

Claims

A spoken dialogue device that controls a dialogue with a user while recognizing the user's utterances by referring to a plurality of speech recognition dictionaries, the device comprising:
utterance history recording means for recording text data corresponding to the user's utterances as an utterance history for each subject;
dialogue scenario determining means for determining a dialogue scenario based on the utterance history recorded for each subject by the utterance history recording means; and
reference dictionary determining means for determining a speech recognition dictionary to be referred to based on the dialogue scenario determined by the dialogue scenario determining means.

The spoken dialogue device according to claim 1, further comprising operation history recording means for recording an operation history of in-vehicle devices, wherein
the dialogue scenario determining means determines the dialogue scenario for each subject based on the operation history of the in-vehicle devices related to that subject, recorded by the operation history recording means, and on the utterance history recorded for each subject by the utterance history recording means, and
the reference dictionary determining means determines the speech recognition dictionary to be referred to based on the dialogue scenario determined by the dialogue scenario determining means.

The spoken dialogue device according to claim 1 or 2, wherein the reference dictionary determining means excludes a speech recognition dictionary having a low reference frequency from the reference targets.