JP2005004143A

JP2005004143A - Speech recognition device, program, and navigation system for vehicle

Info

Publication number: JP2005004143A
Application number: JP2003170591A
Authority: JP
Inventors: Ryoko Tokuhisa; 良子徳久
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2003-06-16
Filing date: 2003-06-16
Publication date: 2005-01-06

Abstract

PROBLEM TO BE SOLVED: To provide a vehicle navigation system that prompts the user to speak in a way that allows speech recognition to be performed efficiently.

SOLUTION: A control part refers to a response sentence generation template table. It generates a response sentence by inserting the <input wait information type> and the <vocabulary retrieval range> into the template that corresponds to the obtained input wait information type.

For example, when the input wait information type is "destination" and the retrieval range is "Aichi Prefecture", the control part generates the response sentence "Is the destination Aichi Prefecture?". When the input wait information type is "facility name" and the retrieval range is "Aichi Prefecture", it generates the response sentence "Input the facility name in Aichi Prefecture.".

COPYRIGHT: (C)2005, JPO&NCIPI

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition device, a program, and a vehicle navigation system. In particular, it relates to a speech recognition device, program, and vehicle navigation system suitable for use in a navigation system mounted on a vehicle.
[0002]
[Prior art]
Conventionally, there has been proposed a navigation device that dynamically reconstructs a speech recognition dictionary so that the speech recognition rate does not decrease even if the vocabulary to be recognized is increased (see, for example, Patent Document 1).
[0003]
The technique described in Patent Document 1 selects the vocabulary to be recognized based on two inputs:

- the attribute information of each vocabulary entry stored in a recognition-target vocabulary dictionary database;
- the entry's distance from the vehicle position.

It then updates the recognition-target vocabulary dictionary. In other words, the recognition-target vocabulary can be selected according to the category information of each entry.
[0004]
[Patent Document 1]
JP 2001-249686 A (paragraph 53)
[0005]
[Problems to be solved by the invention]
However, with the technique described in Patent Document 1, the dialogue between the user and the system sometimes breaks down. This happens when the vocabulary the user assumes the recognizer accepts does not match the vocabulary the system actually accepts.
[0006]
For example, consider a user who wants to set Hakata Station as the destination while the vehicle is in Aichi Prefecture.

1. The navigation device outputs the voice prompt “Where is your destination?”, and the user says “Hakata Station”.
2. The navigation device may misrecognize this utterance as “Atsuta Station”, a station in Aichi Prefecture.
3. The user repeats “Hakata Station” to make the device attempt recognition again.

In this way, the navigation device cannot recognize the destination when the user tries to set a destination far from the current vehicle position.
[0007]
There are two reasons for this:

1. The navigation device had restricted the accepted vocabulary to facilities in Aichi Prefecture, based on the vehicle position, before performing speech recognition.
2. The user was unaware of this restriction.

As a result, the user could not tell whether the device had simply misrecognized the speech or whether “Hakata Station” was missing from the restricted dictionary.
[0008]
The present invention has been proposed to solve the above problems. Its object is to provide a speech recognition device, a program, and a vehicle navigation system that prompt the user to speak in a way that allows speech recognition to be performed efficiently.
[0009]
[Means for Solving the Problems]
The speech recognition device according to claim 1 recognizes input speech using two resources:

- acoustic models to be matched against the feature quantities of the input speech;
- a vocabulary dictionary indicating the correspondence between vocabulary entries and sequences of acoustic models.

The device comprises:

- vehicle information detecting means for detecting vehicle information;
- limiting means for limiting the search range of the vocabulary dictionary based on the vehicle information detected by the vehicle information detecting means;
- response sentence generating means for generating a response sentence based on the search range limited by the limiting means and on input waiting information, which represents the information the system is waiting to receive;
- response sentence output means for outputting the response sentence generated by the response sentence generating means.
[0010]
The invention according to claim 4 is a speech recognition program. It recognizes input speech using acoustic models to be matched against the feature quantities of the input speech and a vocabulary dictionary indicating the correspondence between vocabulary entries and sequences of acoustic models. The program causes a computer to function as:

- limiting means for limiting the search range of the vocabulary dictionary based on vehicle information;
- response sentence generating means for generating a response sentence based on the search range limited by the limiting means and on input waiting information, which represents the information the system is waiting to receive;
- response sentence output means for outputting the response sentence generated by the response sentence generating means.
[0011]
The vehicle information detecting means detects vehicle information about the host vehicle, such as the current vehicle position. The limiting means limits the search range of the vocabulary dictionary based on this vehicle information. Limiting the search range allows speech recognition to be performed quickly and efficiently, provided that the input speech lies within that range. The response sentence generating means therefore generates a response sentence from the search range and the input waiting information, so that the user is led to say something that lies within the search range of the vocabulary dictionary.
[0012]
Therefore, the speech recognition device and program work as follows:

1. They limit the search range of the vocabulary dictionary based on the vehicle information.
2. They generate and output a response sentence based on the limited search range and the input waiting information.

This prompts the user to speak in a way that allows speech recognition using the vocabulary dictionary to be performed quickly and efficiently.
[0013]
The speech recognition device according to claim 2 is the invention according to claim 1, wherein the response sentence generating means generates, as the response sentence, a question confirming whether the input waiting information is related to the search range.
[0014]
The speech recognition program according to claim 5 is the invention according to claim 4, wherein the response sentence generating means generates, as the response sentence, a question confirming whether the input waiting information is related to the search range.
[0015]
Therefore, the speech recognition device and program generate and output, as the response sentence, a question confirming whether the input waiting information is related to the search range. This prompts the user to say whether it is, so that speech recognition can be performed quickly.
[0016]
The speech recognition device according to claim 3 is the invention according to claim 1 or claim 2, wherein the response sentence generating means generates a response sentence prompting the user to input the input waiting information within the search range.
[0017]
The speech recognition program according to claim 6 is the invention according to claim 4 or claim 5, wherein the response sentence generating means generates a response sentence prompting the user to input the input waiting information within the search range.
[0018]
Therefore, the speech recognition device and program generate and output a response sentence that prompts the user to input the input waiting information within the search range, so speech recognition can be performed quickly.
[0019]
The vehicle navigation system according to claim 7 comprises:

- the speech recognition device according to any one of claims 1 to 3;
- map information storage means for storing map information;
- route search means for searching for a route to the destination based on the information recognized by the speech recognition device and the map information stored in the map information storage means.
[0020]
Therefore, the vehicle navigation system searches for the route to the destination using the information recognized by the speech recognition device and the map information stored in the map information storage means. As a result, the route to the destination can be found quickly and efficiently by voice input.
[0021]
[Embodiments of the Invention]
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings. In the present embodiment, a navigation system mounted on a vehicle will be described as an example.
[0022]
FIG. 1 is a block diagram showing a configuration of a navigation system 1 according to an embodiment of the present invention.
[0023]
The navigation system 1 includes:

- a microphone 11 for inputting the user's voice;
- a PTT switch 12 for instructing the start of speech recognition;
- a speech recognition database 13 storing data for speech recognition;
- a speech recognition unit 14 that performs speech recognition;
- a GPS sensor 15 that receives GPS signals and detects the current vehicle position;
- a map database 16 storing map data for drawing maps, facility data, and the like;
- a control unit 17 that performs destination search, route search, and overall control of the system.
[0024]
Furthermore, the navigation system 1 includes:

- a display control unit 18 that controls the display 19 described later;
- a display 19 that shows a map of the area around the vehicle position, system screens, and the like;
- a speech synthesis unit 20 that performs speech synthesis;
- a speaker 21 that outputs speech;
- a system database 22 storing various system data;
- an operation unit 23 operated by the user.
[0025]
The speech recognition database 13 stores two things. The first is acoustic models to be matched against the feature quantities of the input speech. The second is a vocabulary dictionary representing the correspondence between vocabulary entries and sequences of acoustic models (acoustic model sequences).
[0026]
FIG. 2 is a diagram showing the configuration of the vocabulary dictionary stored in the speech recognition database 13. Each dictionary entry has, for example, a vocabulary item, an acoustic model sequence, position information, and a label name.

The vocabulary covers every item that can be a destination, such as place names, street names, intersection names, entertainment facilities, restaurants, department stores, and private houses. Each vocabulary item is associated with three pieces of data:

- its acoustic model sequence;
- the position (latitude and longitude) of the place it denotes;
- a label name representing its type.

Examples of label names include facility name, person name, and place name.
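The record layout of FIG. 2 can be modelled as a small data class. The sketch below is a hypothetical illustration, not the embodiment's actual storage format. The following are assumptions:

- the names `VocabEntry` and `VOCAB_DICTIONARY`, the field names, and the sample entries;
- the acoustic model sequence, which is reduced to placeholder phoneme letters;
- the coordinates, which are only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabEntry:
    """One row of the vocabulary dictionary of FIG. 2 (hypothetical field names)."""
    word: str               # vocabulary item, e.g. a station or facility name
    acoustic_models: tuple  # acoustic model sequence (placeholder phoneme letters here)
    lat: float              # latitude of the place the word denotes
    lon: float              # longitude of the place the word denotes
    label: str              # label name: "facility name", "person name", "place name", ...


# Illustrative entries only; coordinates are approximate.
VOCAB_DICTIONARY = [
    VocabEntry("Atsuta Station", tuple("atsutaeki"), 35.13, 136.91, "facility name"),
    VocabEntry("Nagoya", tuple("nagoya"), 35.18, 136.91, "place name"),
    VocabEntry("Hakata Station", tuple("hakataeki"), 33.59, 130.42, "facility name"),
]

# Collect the distinct label names present in the dictionary.
labels = {entry.label for entry in VOCAB_DICTIONARY}
```

Keeping the position and the label on every entry is what makes the later steps possible. Step ST3 filters entries by position, and step ST4 reads their labels.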
[0027]
When the PTT switch 12 is turned on, the speech recognition unit 14 performs speech recognition on the voice input to the microphone 11:

1. It analyzes the input speech and extracts feature quantities.
2. It matches the feature quantities against the acoustic models.
3. It refers to the vocabulary dictionary stored in the speech recognition database 13 and recognizes the vocabulary entry corresponding to the resulting acoustic model sequence.

The speech recognition unit 14 supplies the recognition result obtained in this way to the control unit 17.
[0028]
The control unit 17 has a dialogue interface unit 17a for smoothly controlling the dialogue between the user and the system. The control unit 17 also displays a map of the area around the current vehicle position on the display 19. It does so based on the current vehicle position (latitude and longitude) detected by the GPS sensor 15 and the map data stored in the map database 16. In addition, the control unit 17 searches for a destination according to the user's spoken instructions and displays the route from the current vehicle position to the destination on the display 19.
[0029]
The system database 22 stores the data used to have the user speak the input waiting information. Specifically, it stores two tables:

- an input waiting information type detection table for detecting the input waiting information type;
- a response sentence generation template table for generating response sentences.
[0030]
FIG. 3 is a diagram illustrating the configuration of the input waiting information type detection table. The table lists label names and the input waiting information type that corresponds to them. For example:

- if the only label name is “facility name”, the input waiting information type is “facility name”;
- if the only label name is “place name”, the input waiting information type is “place name”;
- if there are several label names (for example, “place name” and “facility name”), the input waiting information type is “destination”.
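Read as a lookup rule, the table of FIG. 3 maps the set of label names found in the active vocabulary to a single input waiting information type. Below is a minimal sketch. The names `SINGLE_LABEL_TYPES` and `detect_input_waiting_type` are hypothetical. The sketch also assumes that a lone “person name” label maps to “person name”, by analogy with the other single-label rows.

```python
# Hypothetical encoding of the input waiting information type detection table (FIG. 3).
SINGLE_LABEL_TYPES = {
    "facility name": "facility name",
    "person name": "person name",  # assumed by analogy with the other single-label rows
    "place name": "place name",
}


def detect_input_waiting_type(labels):
    """Map the label names of the active vocabulary to an input waiting information type."""
    distinct = set(labels)
    if not distinct:
        raise ValueError("the search range selected no vocabulary")
    if len(distinct) == 1:
        (label,) = distinct
        return SINGLE_LABEL_TYPES[label]
    # Several kinds of label (e.g. place names and facility names) -> "destination".
    return "destination"
```

Duplicates are collapsed first. This means a range that contains only facilities yields “facility name” however many facilities it holds.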
[0031]
FIG. 4 is a diagram showing the configuration of the response sentence generation template table. In the present embodiment, the table holds one template for each input waiting information type.

- When the input waiting information type is facility name, person name, or place name, the template is “Please input the <input waiting information type> of <vocabulary search range>”. The current search range of the vocabulary dictionary is inserted into <vocabulary search range>, and the facility name, person name, or place name is inserted as-is into <input waiting information type>.
- When the input waiting information type is destination, the template is the question “Is the <input waiting information type> <vocabulary search range>?”.
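The template table of FIG. 4 amounts to simple slot filling. The sketch below uses English templates, whose word order differs from the Japanese ones. The names `TEMPLATES` and `generate_response` are hypothetical.

```python
# Hypothetical encoding of the response sentence generation template table (FIG. 4).
PROMPT_TEMPLATE = "Please input the {waiting_type} of {search_range}."
CONFIRM_TEMPLATE = "Is the {waiting_type} {search_range}?"

TEMPLATES = {
    "facility name": PROMPT_TEMPLATE,
    "person name": PROMPT_TEMPLATE,
    "place name": PROMPT_TEMPLATE,
    "destination": CONFIRM_TEMPLATE,  # a question confirming the search range
}


def generate_response(waiting_type, search_range):
    """Fill the slots of the template that matches the input waiting information type."""
    template = TEMPLATES[waiting_type]
    return template.format(waiting_type=waiting_type, search_range=search_range)
```

These calls reproduce the examples given for step ST5:

- `generate_response("destination", "Aichi Prefecture")` yields “Is the destination Aichi Prefecture?”;
- `generate_response("facility name", "Aichi Prefecture")` yields “Please input the facility name of Aichi Prefecture.”.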
[0032]
The user operates the operation unit 23 to input predetermined information, for example when selecting a system operation mode, searching for a destination, or setting a destination. Examples include buttons and a joystick. A so-called touch panel may be provided in place of the display 19 and the operation unit 23.
[0033]
The navigation system 1 searches for a destination by keyword search using facility data stored in the map database 16 and displays a route from the current vehicle position to the destination.
[0034]
FIG. 5 is a flowchart showing the route search procedure of the control unit 17 provided in the navigation system 1. When the control unit 17 detects, via the speech recognition unit 14, that the PTT switch 12 has been pressed, it executes the processing from step ST1 onward.
[0035]
In step ST1, the control unit 17 acquires the current vehicle position information (latitude and longitude) detected by the GPS sensor 15, and proceeds to step ST2.
[0036]
In step ST2, the control unit 17 determines the search range of the vocabulary dictionary stored in the speech recognition database 13 based on the current vehicle position, and proceeds to step ST3. For example, based on the current vehicle position, the control unit 17 detects a prefecture where the current vehicle position is located, and determines the prefecture as a search range.
[0037]
In step ST3, the control unit 17 refers to the position information of each entry in the vocabulary dictionary and selects the vocabulary that falls within the search range. It then proceeds to step ST4. For example, the control unit 17 selects the vocabulary located in the prefecture of the current position, such as place names, street names, intersection names, amusement facilities, restaurants, department stores, and private houses.
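Steps ST2 and ST3 can be sketched as a point-in-region test. The test is applied first to the vehicle position and then to each dictionary entry. This is a hypothetical illustration:

- the embodiment does not say how the prefecture is determined, so crude latitude/longitude boxes stand in for real prefecture boundaries;
- the names `Region`, `PREFECTURES`, `determine_search_range`, and `select_vocabulary` are assumptions;
- the sample positions are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned latitude/longitude box standing in for a prefecture."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max


# Hypothetical, deliberately crude boxes; real prefecture borders are polygons.
PREFECTURES = [
    Region("Aichi Prefecture", 34.5, 35.5, 136.6, 137.9),
    Region("Fukuoka Prefecture", 33.0, 34.0, 130.0, 131.2),
]

# (word, latitude, longitude, label); approximate positions for illustration.
DICTIONARY = [
    ("Atsuta Station", 35.13, 136.91, "facility name"),
    ("Nagoya", 35.18, 136.91, "place name"),
    ("Hakata Station", 33.59, 130.42, "facility name"),
]


def determine_search_range(lat: float, lon: float):
    """ST2: return the region containing the vehicle position, or None."""
    for region in PREFECTURES:
        if region.contains(lat, lon):
            return region
    return None


def select_vocabulary(dictionary, region):
    """ST3: keep only the entries whose position lies inside the search range."""
    return [entry for entry in dictionary if region.contains(entry[1], entry[2])]


# Usage: a vehicle near Nagoya.
search_range = determine_search_range(35.17, 136.88)
active = select_vocabulary(DICTIONARY, search_range)
```

Hakata Station lies outside the selected range, so it never enters the active vocabulary. This is the situation described in paragraph [0006].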
[0038]
The control unit 17 may also select vocabulary representing predictable replies to the response sentence (or to the current search range). Examples of such vocabulary include:

- words that affirm or deny the response sentence, such as “Yes”, “No”, and “That's wrong”;
- words in the same category as the search range, such as the names of other prefectures when the search range is Aichi Prefecture.
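Adding predictable replies is a simple merge of word lists. Below is a minimal sketch. It assumes hypothetical English stand-ins for “Yes”, “No”, and “That's wrong” and uses the hypothetical function name `add_predictable_replies`.

```python
# Hypothetical English stand-ins for the affirm/deny words "Yes", "No", "That's wrong".
AFFIRM_DENY_WORDS = ["yes", "no", "that's wrong"]


def add_predictable_replies(active_vocabulary, search_range, same_category):
    """Extend the active vocabulary with likely answers to the response sentence."""
    # Names in the same category as the search range, other than the range itself.
    other_names = [name for name in same_category if name != search_range]
    merged = list(active_vocabulary) + AFFIRM_DENY_WORDS + other_names
    return list(dict.fromkeys(merged))  # drop duplicates, keep first-seen order
```

Including the other prefecture names is what allows a reply such as “No, it is Fukuoka Prefecture” to be recognized even though the dictionary has been restricted to Aichi Prefecture.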
[0039]
In step ST4, the control unit 17 refers to the vocabulary dictionary and extracts the label of each vocabulary entry selected in step ST3. It then refers to the input waiting information type detection table stored in the system database 22 and acquires the input waiting information type that corresponds to these label names. It then proceeds to step ST5.
[0040]
For example, when all the vocabularies existing in a predetermined prefecture are selected in step ST3, a plurality of types of labels are extracted, and as a result, the input waiting information type becomes “destination”. When the label corresponding to the vocabulary selected in step ST3 is only “facility name”, the input waiting information type is “facility name”.
[0041]
In step ST5, the control unit 17 refers to the response sentence generation template table and selects the template that corresponds to the input waiting information type acquired in step ST4. It fills the <input waiting information type> and <vocabulary search range> slots of that template to create a response sentence, and then proceeds to step ST6. The input waiting information type acquired in step ST4 is inserted into <input waiting information type>, and the search range determined in step ST2 is inserted into <vocabulary search range>.
[0042]
For example, when the input waiting information type is “destination” and the search range is “Aichi Prefecture”, the control unit 17 generates the response sentence “Is the destination Aichi Prefecture?”. When the input waiting information type is “facility name” and the search range is “Aichi Prefecture”, it generates the response sentence “Please input the facility name of Aichi Prefecture”.
[0043]
In step ST6, the control unit 17 uses the speech synthesis unit 20 to output synthesized speech of the response sentence generated in step ST5 from the speaker 21. At the same time, the control unit 17 causes the display 19 to display a system screen representing the system state.
[0044]
FIGS. 6 to 8 are diagrams showing system screens displayed on the display 19. The system screen 30 shows:

- a system status display area 31 indicating the system status;
- a search range box 32 indicating the destination search range;
- a genre box 33 indicating the genre of the destination;
- a municipality box 34 indicating the municipality of the destination;
- a facility name box 35 indicating the facility name of the destination.
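The screen of FIGS. 6 to 8 can be thought of as a status line plus one slot per narrowing level. Below is a hypothetical model. The class name, the field names, and the plain-text rendering are assumptions; the actual screen is graphical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemScreen:
    """Hypothetical model of system screen 30 (FIGS. 6 to 8)."""
    status: str                         # system status display area 31
    search_range: Optional[str] = None  # search range box 32
    genre: Optional[str] = None         # genre box 33
    municipality: Optional[str] = None  # municipality box 34
    facility: Optional[str] = None      # facility name box 35

    def render(self) -> str:
        """Return one text line per area; unfilled boxes are shown as '-'."""
        rows = [
            ("Status", self.status),
            ("Search range", self.search_range),
            ("Genre", self.genre),
            ("Municipality", self.municipality),
            ("Facility name", self.facility),
        ]
        return "\n".join(f"{name}: {value or '-'}" for name, value in rows)


# Usage: the state shown while the search range is being confirmed.
screen = SystemScreen("Specifying search range", search_range="Aichi Prefecture")
```

Filling the boxes from top to bottom mirrors the step-by-step narrowing of the destination.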
[0045]
For example, the control unit 17 outputs the response sentence “Is the destination Aichi Prefecture?” via the speaker 21 and displays the system screen 30 shown in FIG. 6. Because the system is confirming whether the destination search range is Aichi Prefecture, the system status display area 31 shows “Specifying search range”.
[0046]
In this way, the navigation system 1 makes two things clear to the user. First, the system is waiting for a “destination”. Second, the search range of the vocabulary dictionary is limited to vocabulary such as facilities in “Aichi Prefecture”.
[0047]
Here, it is assumed that the user presses the PTT switch 12 and speaks, for example, “Yes”. When the control unit 17 detects that the PTT switch 12 has been pressed, the control unit 17 proceeds to step ST7.
[0048]
In step ST7, the control unit 17 causes the speech recognition unit 14 to perform speech recognition on the voice (“Yes”) input via the microphone 11, and then proceeds to step ST8.
[0049]
In step ST8, the control unit 17 executes various system processes based on the speech recognition result of the speech recognition unit 14.
[0050]
For example, in response to the user's utterance “Yes”, the control unit 17 does two things. It causes the speaker 21 to output the response sentence “Please input the facility name of Aichi Prefecture”, which prompts further input, and it causes the display 19 to display the system screen 30 shown in FIG. 7. When the name of a facility in Aichi Prefecture is then input by voice, the control unit 17 sets that facility as the destination.
[0051]
Suppose instead that, in response to the response sentence output in step ST6, the user presses the PTT switch 12 and says “No, it is Fukuoka Prefecture” rather than “Yes”. In this case, the control unit 17 causes the display 19 to display the system screen 30 shown in FIG. 8 and causes the speaker 21 to output the response sentence “Please input the facility name of Fukuoka Prefecture”. When the name of a facility in Fukuoka Prefecture is then input by voice, the control unit 17 sets that facility as the destination.
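The two outcomes described in paragraphs [0050] and [0051] can be sketched as a small reply handler. This is a hypothetical illustration that works on already-recognized text. The function name, the English reply patterns, and the regular expression used to pick out a prefecture name are assumptions, not the embodiment's method.

```python
import re

# Hypothetical pattern for an English prefecture name such as "Fukuoka Prefecture".
PREFECTURE_PATTERN = re.compile(r"[A-Z][a-z]+ Prefecture")


def handle_confirmation_reply(reply, proposed_range):
    """Return (search_range, next_prompt) after asking "Is the destination <range>?"."""
    if reply.strip().lower().startswith("yes"):
        new_range = proposed_range
    else:
        # e.g. "No, it is Fukuoka Prefecture": adopt the prefecture the user named.
        match = PREFECTURE_PATTERN.search(reply)
        if match is None:
            # Nothing usable recognized: ask the confirmation question again.
            return proposed_range, f"Is the destination {proposed_range}?"
        new_range = match.group(0)
    return new_range, f"Please input the facility name of {new_range}."
```

In the embodiment the prompt that follows is a facility-name prompt in both cases. The sketch hard-codes that choice for brevity rather than re-running the label detection of step ST4.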
[0052]
Through this processing, the control unit 17 progressively narrows down the destination and sets it. It then performs system processing such as searching for a route from the current vehicle position to the destination, and ends the processing.
[0053]
As described above, the navigation system 1 according to the present embodiment determines the search range of the vocabulary dictionary used for speech recognition from the vehicle state, such as the vehicle position. From this search range and the input waiting information type, it can output either of two kinds of response sentence:

- a response sentence confirming whether the input waiting information is related to the search range;
- a response sentence prompting the user to input the input waiting information.

As a result, the user is prompted to speak in a way that allows speech recognition to be performed efficiently.
[0054]
That is, the navigation system 1 presents two items to the user at the same time: the input waiting information the system expects, and the search range of the vocabulary dictionary used for speech recognition. It can therefore obtain the expected information more efficiently than if each item were confirmed separately, and it can hold a natural conversation with the user.
[0055]
The navigation system 1 is controlled by a predetermined voice recognition program. This program may be recorded on any recording medium, such as a semiconductor memory (e.g., a ROM), an optical disk, or a magnetic disk.
[0056]
The present invention is not limited to the embodiment described above; it can of course also be applied to designs modified within the scope of the claims.
[0057]
For example, although the control unit 17 outputs the response sentence to the user via the speaker 21, it may instead output the response sentence as an image via the display 19. Likewise, although the control unit 17 determines the search range of the vocabulary dictionary based on the current vehicle position, it may determine the search range using any other information, such as voice input information from the user or operation input information from the operation unit 23. The control unit 17 may also detect the input waiting information (input waiting information type) using any such information, including the vehicle position, voice input information, and operation input information from the operation unit 23.
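The variation in which the search range is derived from sources other than the vehicle position might be sketched as below. The priority order (voice input, then operation input, then vehicle position) and the nationwide fallback are assumptions made for illustration; the text only states that arbitrary information may be used. `RangeSources` and `determine_search_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeSources:
    """Possible sources for the vocabulary dictionary's search range (all optional)."""
    vehicle_prefecture: Optional[str] = None    # derived from the current vehicle position
    voice_prefecture: Optional[str] = None      # a region named in the user's speech
    operation_prefecture: Optional[str] = None  # a region selected on the operation unit


def determine_search_range(src: RangeSources, default: str = "nationwide") -> str:
    # Assumed priority: explicit user input (voice, then operation) overrides the
    # vehicle position, which in turn overrides a hypothetical nationwide default.
    for candidate in (src.voice_prefecture, src.operation_prefecture, src.vehicle_prefecture):
        if candidate:
            return candidate
    return default
```

Putting explicit user input first reflects the idea that a range the user states outright should take precedence over one inferred from where the vehicle happens to be.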
[0058]
[Effect of the invention]
The speech recognition apparatus and program according to the present invention limit the search range of a vocabulary dictionary based on vehicle information, generate a response sentence based on the limited search range and the input waiting information, and output that response sentence. This prompts the user to speak in a way that allows speech recognition processing with the vocabulary dictionary to be performed quickly and efficiently.
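The toy model below illustrates how limiting the vocabulary dictionary's search range restricts which words the recognizer can return, and why a mismatch between the range the user has in mind and the system's range produces errors such as Hakata Station being recognized as Atsuta Station. It rests on stated assumptions: each entry's acoustic model arrangement is replaced by a romanized phoneme string, matching uses `difflib.SequenceMatcher` similarity instead of acoustic scoring, and `VocabularyEntry`, `limit_search_range`, and `recognize` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass(frozen=True)
class VocabularyEntry:
    word: str        # vocabulary item returned as the recognition result
    phonemes: str    # toy stand-in for the arrangement of acoustic models
    prefecture: str  # region attribute used to limit the search range


# Hypothetical miniature vocabulary dictionary.
DICTIONARY = [
    VocabularyEntry("Atsuta Station", "atsutaeki", "Aichi Prefecture"),
    VocabularyEntry("Nagoya Station", "nagoyaeki", "Aichi Prefecture"),
    VocabularyEntry("Hakata Station", "hakataeki", "Fukuoka Prefecture"),
]


def limit_search_range(dictionary: List[VocabularyEntry], search_range: str) -> List[VocabularyEntry]:
    """Keep only the entries whose region attribute falls inside the search range."""
    return [entry for entry in dictionary if entry.prefecture == search_range]


def recognize(phonemes: str, candidates: List[VocabularyEntry]) -> str:
    """Return the candidate word whose phoneme string is most similar to the input.

    String similarity replaces acoustic-model scoring here; the point is only that
    the result is always drawn from the limited candidate set.
    """
    if not candidates:
        raise ValueError("search range contains no vocabulary entries")
    best = max(candidates, key=lambda e: SequenceMatcher(None, phonemes, e.phonemes).ratio())
    return best.word
```

With the Aichi range, the input "hakataeki" is forced onto the closest Aichi entry, Atsuta Station; with the Fukuoka range, it resolves to Hakata Station. Announcing the range in the response sentence gives the user a chance to correct it before this happens.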
[0059]
Because the vehicle navigation system according to the present invention searches for a route to a destination based on the information recognized by the speech recognition device and the map information stored in the map information storage means, it can search for the route to the destination efficiently.
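The route search method is not specified in the text. As one possible stand-in, the sketch below applies Dijkstra's shortest-path algorithm to a small hypothetical road graph; the graph format, node names, and `search_route` are assumptions, and a real navigation system's map data and cost model would differ.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical road graph: node -> list of (neighbor, distance_km).
Graph = Dict[str, List[Tuple[str, float]]]


def search_route(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's shortest-path search; returns (total distance, node path) or None."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue  # a shorter path to this node was already expanded
        settled.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return None  # goal unreachable from start
```

Here the starting node would correspond to the current vehicle position and the goal to the facility set as the destination through the dialogue.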
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a navigation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration of a vocabulary dictionary stored in a speech recognition database.
FIG. 3 is a diagram showing a configuration of an input waiting information type detection table.
FIG. 4 is a diagram showing a configuration of a response sentence generation template table.
FIG. 5 is a flowchart showing a processing routine of a control unit provided in the navigation system.
FIG. 6 is a diagram showing a system screen displayed on a display.
FIG. 7 is a diagram showing a system screen displayed on a display.
FIG. 8 is a diagram showing a system screen displayed on a display.
[Explanation of symbols]
1 Navigation system
11 Microphone
12 PTT switch
13 Voice recognition database
14 Voice recognition unit
15 GPS sensor
16 Map database
17 Control unit
18 Display control unit
19 Display
20 Speech synthesis unit
21 Speaker
22 System database
23 Operation unit

Claims

1. A speech recognition device that performs recognition processing on input speech using acoustic models to be matched against feature quantities of the input speech and a vocabulary dictionary indicating the correspondence between vocabulary items and arrangements of acoustic models, the speech recognition device comprising:
vehicle information detection means for detecting vehicle information;
limiting means for limiting a search range of the vocabulary dictionary based on the vehicle information detected by the vehicle information detection means;
response sentence generation means for generating a response sentence based on the search range limited by the limiting means and input waiting information indicating the information the system is waiting to receive; and
response sentence output means for outputting the response sentence generated by the response sentence generation means.

2. The speech recognition device according to claim 1, wherein the response sentence generation means generates, as the response sentence, a question sentence for confirming whether the input waiting information is related to the search range.

3. The speech recognition device according to claim 1, wherein the response sentence generation means generates a response sentence that prompts input of the input waiting information within the search range.

4. A speech recognition program for performing recognition processing on input speech using acoustic models to be matched against feature quantities of the input speech and a vocabulary dictionary indicating the correspondence between vocabulary items and arrangements of acoustic models, the program causing a computer to function as:
limiting means for limiting a search range of the vocabulary dictionary based on vehicle information;
response sentence generation means for generating a response sentence based on the search range limited by the limiting means and input waiting information indicating the information the system is waiting to receive; and
response sentence output means for outputting the response sentence generated by the response sentence generation means.

5. The speech recognition program according to claim 4, wherein the response sentence generation means generates, as the response sentence, a question sentence for confirming whether the input waiting information is related to the search range.

6. The speech recognition program according to claim 4, wherein the response sentence generation means generates a response sentence that prompts input of the input waiting information within the search range.

7. A vehicle navigation system comprising:
the speech recognition device according to any one of claims 1 to 3;
map information storage means for storing map information; and
route search means for searching for a route to a destination based on information recognized by the speech recognition device and the map information stored in the map information storage means.