JP2004184715A - Speech recognition apparatus

Info

Publication number: JP2004184715A
Application number: JP2002351960A
Authority: JP
Inventors: Takeshi Ono; 健大野; Minoru Togashi; 実冨樫
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2002-12-04
Filing date: 2002-12-04
Publication date: 2004-07-02

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition apparatus with a high speech recognition rate.
SOLUTION: A passenger detection part 15 detects whether a passenger is present. When a passenger is present, the maximum wait time is shortened so that speech input completes earlier than when no passenger is present. Even if the passenger speaks while the user is still inputting speech, this reduces the proportion of the passenger's voice in the speech captured by a signal processing part 3. Consequently, deterioration of the user's speech recognition rate in the signal processing part 3 can be reduced.
COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声を認識する音声認識装置に関する。
【０００２】
【従来の技術】
【特許文献１】特開２００１−１６６７９４号公報
従来、音声認識装置として特開２００１−１６６７９４号公報記載のようなものがある。この音声認識装置を車載ナビゲーションシステムに適用した場合、音声認識に先立ち、乗員が１名であるか否かを判断し、乗員が２名以上である場合には非利用者すなわち同乗者への注意喚起を行うことで、車両内で音声認識装置の利用者のみが発話する環境を作り、より音声認識率を高くし、かつ使い勝手をよくしている。
【０００３】
【発明が解決しようとする課題】
このような上記従来の音声認識装置にあっては、音声認識装置の非利用者すなわち同乗者が注意喚起を聞き逃した場合、あるいは注意喚起を聞いたとしても意味を理解できなかった場合などは、音声認識処理中すなわち利用者の発話終了以前に同乗者が発話してしまい、音声認識率が低下してしまうといった問題があった。また、同乗者以外の原因により音声認識に適切な環境を維持できなくなった場合にも、音声認識率が低下してしまうといった問題があった。
【０００４】
そこで本発明はこのような問題点に鑑み、より精度よく音声認識を行うことができる音声認識装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
本発明は、音声の取り込み開始から、音声取り込み可能状態終了までの間の最大待受け時間を設定する最大待受け時間設定手段と、音声取り込み開始から、最大待受け時間設定手段によって設定された最大待受け時間終了までの間に入力された音声を認識する信号処理部とを有する音声認識装置において、認識対象とする音声以外の音が取り込まれる可能性がある場合に、最大待受け時間設定手段は、最大待受け時間を短縮するものとした。
【０００６】
【発明の効果】
本発明によれば、認識対象とする音声以外の音が取り込まれる可能性がある場合に、最大待受け時間を短縮することにより、信号処理部によって取り込まれた音声中に、認識対象とする音声以外の音が占める割合を少なくすることができる。よって音声認識率の悪化を低減することができる。
【０００７】
【発明の実施の形態】
次に本発明の実施の形態を実施例により説明する。
以下に示す各実施例は、本発明における音声認識装置を車両のナビゲーションシステムに適用したものである。
図１に、第一の実施例における車両のナビゲーションシステムの全体構成を示す。
図示しないＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ）アンテナによって受信された信号より自車両の位置を演算し、使用者に各種の情報を提示するナビゲーション制御部２が信号処理部３に接続される。
【０００８】
信号処理部３はＣＰＵおよびメモリなどから構成され、音声の認識処理を行う。信号処理部３には、音声認識対象となる単語の階層構造からなる文法（後述）を記憶している記憶部６が接続される。また信号処理部３には、発話スイッチ１３と訂正スイッチ１４とを備えた入力部１２と、同乗者の有無を検出する同乗者検出部１５が接続されている。
【０００９】
さらに信号処理部３には、Ｄ／Ａコンバータ７、出力アンプ８を介してスピーカ９が接続され、信号処理部３から出力されたデジタルの音声信号がＤ／Ａコンバータ７によってアナログの音声信号に変換され、出力アンプ８によって増幅されてスピーカ９から音声として出力される。
信号処理部３には、Ａ／Ｄコンバータ１０を介してマイク１１が接続され、マイク１１から入力されたアナログの音声信号がＡ／Ｄコンバータ１０によってデジタルの音声信号に変換されて信号処理部３に伝達される。
【００１０】
ナビゲーション制御部２は表示部１６およびスピーカ９に接続されており、表示部１６およびスピーカ９を通じて車両のドライバ等に位置情報等を提示する。信号処理部３、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１が構成される。
また音声認識部１、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１、入力部１２および同乗者検出部１５よりナビゲーションシステム２０が構成される。
【００１１】
次に図２のフローチャートを用いて、本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
ステップ１００において、信号処理部３はナビゲーションシステム２０の使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ１０１へ進む。
【００１２】
ステップ１０１において、信号処理部３は記憶部６に記憶された単語の階層構造からなる文法を認識対象語として設定する。ここで単語の階層構造からなる文法とは地名の配列を指し、図３にその一例を示す。まずはじめに、文頭に都道府県名を設定し、次に各都道府県に対応する市町村名のように順次地名を設定する。
【００１３】
ステップ１０２において、信号処理部３はステップ１０１において設定した認識対象語にもとづいて、最大待受け時間を設定する。この最大待受け時間は、音声を検出できないまま待受け状態を継続し続けることを避けるために設けるものである。最大待受け時間Ｔｍの長さは、設定した認識対象語に含まれる最長の文が発話された場合の推定発話時間Ｔｗに、余裕時間Ｔｘを加えたものであり、次式によって表される。
Ｔｍ＝Ｔｗ＋Ｔｘ（１）
余裕時間Ｔｘは、音声認識装置付のナビゲーションシステム２０に不慣れな場合、あるいは個人差によって発生する発話開始の遅れ、および発話の長さのばらつきを吸収するために通常は推定発話時間Ｔｗと同程度の値が設定される。
【００１４】
ステップ１０３において、信号処理部３は同乗者検出部１５を用いて同乗者の有無を検出する。同乗者検出部１５は、座席に取り付けた感圧センサを用いることによって同乗者の有無を検出する。同乗者が検出された場合ステップ１０４へ進み、同乗者が検出されなかった場合はステップ１０５へ進む。
【００１５】
ステップ１０４において、信号処理部３は式（１）で算出された最大待受け時間Ｔｍを短く変更し、次に示すように最大待受け時間Ｔｍ’の算出を行う。
図４に示すように、時刻Ａにおいて発話スイッチ１３が操作されて信号処理部３よる音声取り込みが開始された場合、使用者の発話は通常、音声の最大待受け時間Ｔｍが終了する時刻Ｃより以前の時刻Ｂの時点で終了する。
【００１６】
しかし図５に示すように、使用者の発話が終了する時刻Ｂの時点以前の時刻Ｄにおいて、途中から同乗者の発話Ｅがあった場合は音声取り込み処理はその影響を大きく受ける。このように信号処理部３は、使用者の発話の終端を検出することが不可能であり、使用者の発話の途中から同乗者の発話を継続して音声取り込みを行ってしまう。
【００１７】
その結果、ステップ１０２において設定された最大待受け時間Ｔｍが終了する時刻Ｃまで音声取り込みが継続される。よって取り込まれた音声には同乗者の音声が広範囲に含まれており、使用者の音声の認識率が悪化する。これを防ぐために次式を用いて最大待受け時間Ｔｍ’の算出を行う。
Ｔｍ’＝Ｔｗ＋Ｔｘ’ （２）
ここでＴｘ’＝ｋ×Ｔｘ、０＜ｋ＜１．０とする。
【００１８】
これにより図６に示すように、時刻Ｄから同乗者の発話があったとしても、最大待受け時間Ｔｍ’が終了する時刻Ｃ’において音声取り込みが終了するので、信号処理部３に取り込まれた音声中に同乗者の発話が含まれる範囲を少なくすることができる。
【００１９】
図２のフローチャートに戻りステップ１０５において、信号処理部３は音声取り込み処理を開始した旨を使用者に知らせるために、記憶部６に記憶されている告知音声をＤ／Ａコンバータ７および出力アンプ８を通じて、スピーカ９から出力する。
【００２０】
音声取り込み開始を知らせる告知音声を聞いた使用者は、認識対象語に含まれる単語の発話を行う。なお本実施例において、認識対象は図３に示した住所とする。
マイク１１から入力された音声信号は、Ａ／Ｄコンバータ１０によってデジタル信号に変換されて信号処理部３に入力される。
【００２１】
発話スイッチ１３が操作されるまでの間、信号処理部３はＡ／Ｄコンバータ１０によって変換された音声のデジタル信号の平均パワーを演算している。発話スイッチ１３が操作された後、演算していた平均パワーに比べてデジタル信号の瞬間パワーが所定値以上大きくなったときに、ステップ１０６において、使用者が発話したと判断して音声の取り込みを開始する。
【００２２】
音声取り込みが開始されると、ステップ１０７において信号処理部３は記憶部６に記憶された認識対象語との一致度演算を開始する。一致度とは取り込まれた音声部分と個々の認識対象語とがどの程度似ているかを指し、さらにこの一致度はスコアとして得られる。本実施例において、スコアの値が大きいほど一致度が高いとする。
なお、このステップの処理を行う間も、並列して信号処理部３による音声取り込みは継続されている。
【００２３】
ステップ１０８において、発話の終端が検出されたかどうかの判断を行う。この終端の検出は、音声のデジタル信号の瞬間パワーが所定時間以上、かつ所定値以下となったときに使用者の発話が終了したと判断するものである。発話の終端を検出した場合はステップ１０９へ進み、終了していない場合はステップ１１３へ進む。
【００２４】
ステップ１１３において、音声取り込み開始後、最大待受け時間Ｔｍ、または最大待受け時間Ｔｍ’を経過したかどうかの判断を行い、経過していない場合はステップ１０６へ戻る。また、最大待受け時間を経過しているときはステップ１０９へ進む。
【００２５】
ステップ１０９において、音声の取り込み処理を終了し、ステップ１１０において、信号処理部３は一致度の最も大きい認識対象語を認識結果として、Ｄ／Ａコンバータ７および出力アンプ８を通じてスピーカ９から出力する。本実施例においては、使用者が発話した「神奈川県横須賀市夏島町」が正しく認識され、信号処理部３は認識結果である「神奈川県横須賀市夏島町」をスピーカ９を通して出力する。
【００２６】
ステップ１１１では、ステップ１１０における認識結果の出力後、所定時間内に入力部１２に備えられた訂正スイッチ１４が操作されたかどうかの判断を行う。訂正スイッチ１４の操作があった場合は、使用者がナビゲーションシステム２０の音声認識結果に対して修正ありと判断してステップ１０１へ戻り、上述の音声取り込みをやり直す。
【００２７】
一方、所定時間内に訂正スイッチ１４の操作がない場合はステップ１１２へ進み、使用者がナビゲーションシステム２０の認識結果を容認したと判断して、認識結果に応じた処理を行う。本実施例においては、信号処理部３は認識結果である住所をナビゲーション制御部２へ出力する。ナビゲーション制御部２は認識された住所を目的地として設定し、表示部１６やスピーカ９を通じて使用者に道案内等の情報提示を行う。
なお本実施例において、図２におけるステップ１０２およびステップ１０４が本発明における最大待受け時間設定手段を構成する。また本実施例における同乗者検出部１５が、本発明における同席者検出部を構成する。
【００２８】
本実施例は以上のように構成され、同乗者検出部１５によって同乗者が検出された場合に、最大待受け時間Ｔｍを短縮して最大待受け時間Ｔｍ’を算出し、音声の取り込み時間幅を短くすることにより、使用者の発話が終了する前に同乗者の発話があったとしても、信号処理部３によって取り込まれた音声中に同乗者の音声が占める割合を少なくすることができる。よって信号処理部３による使用者の音声認識率の悪化を低減することができる。
【００２９】
なお本実施例では、同乗者検出部１５は座席に取り付けた感圧センサを用いることによって同乗者の有無を検出するものとしたが、これに限定されず、たとえば座席近傍に設けられ、人が座ると遮られる赤外線センサや超音波センサなどを用いて同乗者の有無を検出してもよい。またシートベルト着用センサを用い、運転席以外でシートベルトが着用されているかどうかによって同乗者の有無を検出するようにしてもよく、さらにドアスイッチの状況やドアの開閉状況によって同乗者の有無を検出するようにしてもよい。
【００３０】
次に、第二の実施例について説明する。
図７に、第二の実施例における車両のナビゲーションシステムの全体構成を示す。
本実施例の構成は、第一の実施例における同乗者検出部１５を削除し、雑音保持部１８を追加したものである。
ＣＰＵおよびメモリより構成される信号処理部３Ａが、自車両の位置を演算するナビゲーション制御部２に接続される。
【００３１】
信号処理部３Ａの内部には、音声や音声に近い音などの雑音情報を保持する雑音保持部１８を備える。
信号処理部３Ａ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ａが構成される。
また音声認識部１Ａ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０Ａが構成される。
なお、上記第一の実施例と同じ作用を持つ構成物については、同じ番号を付して説明を省略する。
【００３２】
次に本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
断続して騒音が起きるような環境下を車両が走行している場合や、使用者と同乗者が会話をしている場合は、図８に示すように信号処理部３Ａによって、音声取り込みが開始される時刻Ａよりも以前に、音声あるいはそれに近い音が雑音Ｇとして検出される。
このような場合、音声取り込み開始以降においても音声やそれに近い音が雑音Ｈとして検出される可能性が高くなる。
【００３３】
図９のフローチャートを用いて、音声取り込み開始の時刻Ａよりも以前の雑音Ｇの検出処理について説明する。
この検出処理は、信号処理部３Ａに内蔵された図示しないタイマの割り込み処理によって定期的に実行される。
ステップ２００において、信号処理部３Ａは所定時間内に雑音の始点を検出したかどうかの判断を行う。この雑音の始点検出は、第一の実施例におけるステップ１０６と同様に、雑音のデジタル信号の平均パワーを演算し、平均パワーに比べて瞬間パワーが所定値以上大きくなったときに雑音を検出したと判断するものである。
【００３４】
ステップ２００において雑音の始点が検出されると、ステップ２０１において信号処理部３Ａは雑音の取り込みを開始する。ステップ２０２において、雑音の終端が検出されると雑音の取り込みを終了する。この雑音の終端の検出は、第一の実施例におけるステップ１０８と同様に行う。
【００３５】
ステップ２０３において、信号処理部３Ａは雑音を取り込んだ時刻、および継続時間を雑音データとして雑音保持部１８に記憶する。
なおステップ２００において雑音の始点を検出できなかった場合は処理を終了する。
【００３６】
次に図１０のフローチャートを用いて、ナビゲーションシステム２０Ａが行う音声認識の流れについて説明する。
ステップ３００〜３０２は、上記第一の実施例におけるステップ１００〜１０２と、また図１０のステップ３０４〜３１３は、第一の実施例におけるステップ１０４〜１１３と同様であり説明を省略する。
【００３７】
ステップ３０３において、信号処理部３Ａは使用者の音声入力開始以前に、雑音を検出していたかどうかの判断を行う。この判断は、図９のフローチャートに示した雑音の検出処理によって記憶された雑音データを用いて、発話スイッチ１３が操作された時刻から所定時間前まで間に発生した雑音の継続時間を積算し、その積算値が所定値以上となった場合に雑音を検出したと判断するものである。雑音が検出されるとステップ３０４へ進み、最大待受け時間Ｔｍの短縮を行う。一方雑音が検出されなかった場合はステップ３０５へ進む。
なお本実施例において、図１０におけるステップ３０２およびステップ３０４が本発明における最大待受け時間設定手段を構成する。
【００３８】
本実施例は以上のように構成され、使用者のナビゲーションシステム２０Ａに対する音声の入力開始以前に、信号処理部３Ａが雑音を検出している場合は、図８に示すように最大待受け時間Ｔｍを短縮して最大待受け時間Ｔｍ’を算出し、時刻Ｃ’の時点で音声取り込みを終了することにより、使用者の発話以外の雑音入力があったとしても、信号処理部３Ａによって取り込まれた音声中に雑音が占める割合を少なくすることができる。よって信号処理部３Ａによる使用者の音声認識率の悪化を低減することができる。
【００３９】
次に第三の実施例について説明する。
図１１に、第三の実施例における車両のナビゲーションシステムの全体構成を示す。
本実施例の構成は、第一の実施例における同乗者検出部１５を削除して、車両の後方および側方の監視を行う後側方監視部１７を追加し、さらに自車両位置の検出を行う自車両位置検出部１９および環境音増大予測部２３を追加したものである。
ナビゲーション制御部２Ｂの内部に、図示しないＧＰＳアンテナからの信号より自車両の位置を演算する自車両位置検出部１９を備える。ナビゲーション制御部２Ｂは、ＣＰＵおよびメモリより構成される信号処理部３Ｂに接続される。
【００４０】
車両の後側方を監視する後側方監視部１７が、信号処理部３Ｂに接続される。また信号処理部３Ｂの内部には、環境音の増大を予測する環境音増大予測部２３を備えている。
信号処理部３Ｂ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ｂが構成される。
【００４１】
また音声認識部１Ｂ、ナビゲーション制御部２Ｂ、表示部１６、スピーカ９、マイク１１、入力部１２および後側方監視部１７よりナビゲーションシステム２０Ｂが構成される。
なお、上記第一の実施例と同じ作用を持つ構成物については、同じ番号を付して説明を省略する。
【００４２】
次に図１２のフローチャートを用いて、ナビゲーションシステム２０Ｂが行う音声認識処理の流れについて説明する。
ステップ４００〜４０２は、上記第一の実施例におけるステップ１００〜１０２と、またステップ４０４〜４１３は第一の実施例におけるステップ１０４〜１１３と同様であり説明を省略する。
【００４３】
ステップ４０３において、環境音が増大するか否かの予測処理を行う。この環境音としては、追い越し車両等の出現に伴う環境音や、トンネルへの侵入によって発生する環境音がある。
追い越し車両の出現に伴う環境音の増大は、車両の後側方を監視する後側方監視部１７によって、自車に近づく車両が検出された場合に環境音が増大するものとして予測することができる。
【００４４】
車両の後側方を監視する後側方監視部１７として、ＣＣＤカメラを用いる事ができる。このようなＣＣＤカメラを用いて車両の後側方を監視する方法として、たとえば特開２０００−２５９９９８号公報に開示された車両用後側方監視装置がある。また、ＣＣＤカメラ以外では、車載用後側方レーダ等を用いることもできる。
【００４５】
また、トンネルへの侵入によって発生する環境音の増大は、ナビゲーション制御部２の自車両位置検出部１９によって得られた自車両の位置情報をもとに、自車両が環境音が増大するトンネル等に侵入したかどうかを判断することによって、予測を行うことができる。このようなナビゲーション装置を用いたトンネルへの進入検知方法として、たとえば特開２００２−２３６０２３号公報に開示されたナビゲーション装置および付属機器制御方法に詳細に記述されている。
【００４６】
ステップ４０３において、環境音の増大が予想される場合はステップ４０４へ進み、最大待受け時間Ｔｍの短縮し、最大待受け時間Ｔｍ’の算出を行う。一方、環境音の増大が予想されない場合は、ステップ４０５へ進む。
なお本実施例において、後側方監視部１７が本発明における車両監視部を構成する。また図１２におけるステップ４０２およびステップ４０４が本発明における最大待受け時間設定手段を構成する。
【００４７】
本実施例は以上のように構成され、自車両がトンネル内に進入する場合や、自車両に近づく車両等が後側方監視部１７によって検出された場合には、環境音が増大すると予測して最大待受け時間Ｔｍを短縮する。これにより、図１３に示すように時刻Ａから音声の取り込みが開始され、使用者の音声発話中に環境音Ｉが発生し、使用者の発話と環境音Ｉとが重なってしまったとしても、最大待受け時間Ｔｍ’の終了する時刻Ｃ’において音声取り込みを終了することにより、信号処理部３Ｂによって取り込まれた音声中に環境音Ｉが占める割合を少なくすることができる。よって信号処理部３Ｂによる使用者の音声認識率の悪化を低減することができる。
なお本実施例において、後側方監視部１７を用いて車両の後側方を監視するものとしたがこれに限定されず、車両の前側方など車両周囲を監視するようにしてもよい。
【００４８】
次に、第四の実施例について説明する。
図１４に、第四の実施例における車両のナビゲーションシステムの全体構成を示す。
本実施例の全体構成は、第一の実施例における同乗者検出部１５を削除して、音声認識処理の実行回数を計数する使用経験記憶部２１を追加したものである。ＣＰＵおよびメモリより構成される信号処理部３Ｃの内部に、音声認識処理の実行回数を計数する使用経験記憶部２１を備える。
【００４９】
信号処理部３Ｃ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ｃが構成される。
また音声認識部１Ｃ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２より、ナビゲーションシステム２０Ｃが構成される。他の構成および動作は第一の実施例と同様であり、同じ番号を付して説明を省略する。
【００５０】
次に、本実施例における音声認識装置を適用したナビゲーションシステムの動作について説明する。
図１５のフローチャートを用いて、ナビゲーションシステム２０Ｃが行う音声認識処理の流れについて説明する。
ステップ５００〜５０２は、上記第一の実施例におけるステップ１００〜１０２と、また図１５のステップ５０４〜５１３は、第一の実施例におけるステップ１０４〜１１３と同様であり説明を省略する。
【００５１】
ステップ５０３において、信号処理部３Ｃは、音声認識装置を適用したナビゲーションシステムの使用経験が十分か否かの判断を行う。信号処理部３Ｃは、使用経験記憶部２１によって計数された音声認識処理の実行回数値を用いて、実行回数値が所定値以上である場合に、使用経験が十分であるとの判断を行うものである。
【００５２】
ステップ５０３において、使用経験が十分あると判断された場合にはステップ５０４へ進み、最大待受け時間Ｔｍの短縮を行う。一方、使用経験が十分でないと判断された場合はステップ５０５へ進む。
なお本実施例において、図１５におけるステップ５０２およびステップ５０４が本発明における最大待受け時間設定手段を構成する。
【００５３】
本実施例は以上のように構成され、使用者は音声認識装置の使用経験が多くなると、音声の取り込み開始から速やかに発話を開始した場合に音声の認識率が高くなることを認知するようになる。よって、音声認識装置の使用経験が浅い使用者と比べて、使用経験の多い使用者は発話終了時が早くなる。この発話時間幅の短縮に合わせて最大待受け時間Ｔｍの短縮を行うことにより、信号処理部３Ｃに取り込まれた音声中に、使用者以外の音声等が占める割合を少なくすることができる。よって信号処理部３Ｃによる使用者の音声認識率の悪化を低減することができる。
【図面の簡単な説明】
【図１】本発明における第一の実施例を示す図である。
【図２】第一の実施例における音声認識処理の流れを示す図である。
【図３】単語の階層構造からなる文法を示す図である。
【図４】最大待受け時間と発話の関係を示す図である。
【図５】最大待受け時間と発話の関係を示す図である。
【図６】最大待受け時間と発話の関係を示す図である。
【図７】第二の実施例を示す図である。
【図８】最大待受け時間と発話の関係を示す図である。
【図９】雑音の取り込み処理の流れを示す図である。
【図１０】第二の実施例における音声認識処理の流れを示す図である。
【図１１】第三の実施例を示す図である。
【図１２】第三の実施例における音声認識処理の流れを示す図である。
【図１３】最大待受け時間と発話の関係を示す図である。
【図１４】第四の実施例を示す図である。
【図１５】第四の実施例における音声認識処理の流れを示す図である。
【符号の説明】
１、１Ａ、１Ｂ、１Ｃ音声認識部
２、２Ｂナビゲーション制御部
３、３Ａ、３Ｂ、３Ｃ信号処理部
６記憶部
７Ｄ／Ａコンバータ
８出力アンプ
９スピーカ
１０Ａ／Ｄコンバータ
１１マイク
１２入力部
１３発話スイッチ
１４訂正スイッチ
１５同乗者検出部
１６表示部
１７後側方監視部
１８雑音保持部
１９自車両位置検出部
２０、２０Ａ、２０Ｂ、２０Ｃナビゲーションシステム
２１使用経験記憶部
２３環境音増大予測部

[0001]
[Technical Field of the Invention]
The present invention relates to a voice recognition device that recognizes voice.
[0002]
[Prior art]
[Patent Document 1] Japanese Patent Application Laid-Open No. 2001-166794
A conventional speech recognition device is described in Japanese Patent Application Laid-Open No. 2001-166794. When this device is applied to an in-vehicle navigation system, it determines, before speech recognition, whether there is only one occupant. If there are two or more occupants, it alerts the non-users, that is, the passengers, so that only the user of the speech recognition device speaks in the vehicle. This raises the speech recognition rate and improves usability.
[0003]
[Problems to be solved by the invention]
In the conventional speech recognition device described above, if a non-user of the device, that is, a passenger, misses the alert, or hears it but does not understand its meaning, the passenger may speak during the speech recognition process, that is, before the user's utterance ends, and the speech recognition rate drops. The speech recognition rate also drops when an environment suitable for speech recognition cannot be maintained for reasons other than a passenger.
[0004]
In view of the above problems, an object of the present invention is to provide a speech recognition device that can perform speech recognition with higher accuracy.
[0005]
[Means for Solving the Problems]
The present invention is a speech recognition device having maximum standby time setting means for setting a maximum standby time from the start of voice capture to the end of the state in which voice can be captured, and a signal processing unit that recognizes voice input between the start of voice capture and the end of the maximum standby time set by the maximum standby time setting means, wherein the maximum standby time setting means shortens the maximum standby time when a sound other than the voice to be recognized may be captured.
[0006]
[Effect of the Invention]
According to the present invention, when a sound other than the voice to be recognized may be captured, shortening the maximum standby time reduces the proportion of the voice captured by the signal processing unit that is occupied by sounds other than the voice to be recognized. Deterioration of the speech recognition rate can therefore be reduced.
[0007]
[Embodiments of the Invention]
Next, embodiments of the present invention will be described with reference to examples.
In each of the embodiments described below, the speech recognition device according to the present invention is applied to a vehicle navigation system.
FIG. 1 shows the overall configuration of a vehicle navigation system according to the first embodiment.
A navigation control unit 2 that calculates the position of the vehicle based on a signal received by a GPS (Global Positioning System) antenna (not shown) and presents various information to a user is connected to the signal processing unit 3.
[0008]
The signal processing unit 3 includes a CPU, a memory, and the like, and performs a voice recognition process. The signal processing unit 3 is connected to a storage unit 6 that stores a grammar (described later) having a hierarchical structure of words to be subjected to speech recognition. The signal processing unit 3 is connected to an input unit 12 having an utterance switch 13 and a correction switch 14, and a passenger detection unit 15 for detecting the presence or absence of a passenger.
[0009]
Further, a speaker 9 is connected to the signal processing unit 3 via a D/A converter 7 and an output amplifier 8. The digital audio signal output from the signal processing unit 3 is converted into an analog audio signal by the D/A converter 7, amplified by the output amplifier 8, and output from the speaker 9 as sound.
A microphone 11 is connected to the signal processing unit 3 via an A/D converter 10. The analog audio signal input from the microphone 11 is converted into a digital audio signal by the A/D converter 10 and transmitted to the signal processing unit 3.
[0010]
The navigation control unit 2 is connected to the display unit 16 and the speaker 9, and presents position information and the like to a vehicle driver or the like through the display unit 16 and the speaker 9. The speech recognition unit 1 is composed of the signal processing unit 3, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
The navigation system 20 includes the voice recognition unit 1, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, the input unit 12, and the passenger detection unit 15.
[0011]
Next, the flow of the voice recognition processing of the navigation system in the present embodiment will be described with reference to the flowchart of FIG. 2.
In step 100, the signal processing unit 3 determines whether the user of the navigation system 20 has operated the utterance switch 13 provided on the input unit 12 for instructing the start of utterance. When the utterance switch 13 is operated, the process proceeds to step 101.
[0012]
In step 101, the signal processing unit 3 sets a grammar having a hierarchical structure of words stored in the storage unit 6 as a recognition target word. Here, the grammar having a hierarchical structure of words refers to an array of place names, and FIG. 3 shows an example thereof. First, a prefecture name is set at the beginning of a sentence, and then a place name is sequentially set, such as a municipal name corresponding to each prefecture.
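As an illustration only, a hierarchical place-name grammar of this kind could be held as a nested mapping and expanded into the sentences to be recognized. FIG. 3 is not reproduced here, so the entries below, apart from the address used later in this embodiment, and the names `GRAMMAR` and `expand_sentences` are assumptions made for this sketch.

```python
# Hypothetical miniature of the hierarchical grammar: prefecture -> municipality -> towns.
GRAMMAR = {
    "Kanagawa": {
        "Yokosuka-shi": ["Natsushima-cho"],
        "Yokohama-shi": ["Naka-ku"],
    },
    "Tokyo": {
        "Chiyoda-ku": ["Marunouchi"],
    },
}


def expand_sentences(grammar):
    """Enumerate every prefecture / municipality / town path as one target sentence."""
    sentences = []
    for prefecture, municipalities in grammar.items():
        for municipality, towns in municipalities.items():
            for town in towns:
                # Written in the order used by the English text of this description.
                sentences.append(f"{town}, {municipality}, {prefecture}")
    return sentences
```

Calling `expand_sentences(GRAMMAR)` yields one string per complete address, which can then serve as the set of recognition target sentences.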
[0013]
In step 102, the signal processing unit 3 sets the maximum standby time based on the recognition target word set in step 101. This maximum standby time is provided in order to avoid continuing the standby state without detecting a sound. The length of the maximum standby time Tm is obtained by adding the margin time Tx to the estimated utterance time Tw when the longest sentence included in the set recognition target word is uttered, and is expressed by the following equation.
Tm = Tw + Tx (1)
The margin time Tx absorbs delays in the start of utterance and variations in utterance length that arise when the user is unfamiliar with the navigation system 20 with the voice recognition device, or from individual differences. It is normally set to about the same value as the estimated utterance time Tw.
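Equation (1) can be sketched as follows. The description does not say how the estimated utterance time Tw is obtained, so estimating it from the character count of the longest sentence at a fixed rate (`seconds_per_char`) is purely an assumption of this sketch, as is the function name.

```python
def max_standby_time(sentences, seconds_per_char=0.1, margin=None):
    """Return Tm = Tw + Tx for a set of recognition target sentences.

    Tw is estimated from the longest sentence (assumed: fixed time per character).
    Tx defaults to Tw, following the statement that Tx is normally about equal to Tw.
    """
    if not sentences:
        raise ValueError("at least one recognition target sentence is required")
    tw = max(len(s) for s in sentences) * seconds_per_char
    tx = tw if margin is None else margin
    return tw + tx
```

For example, with a longest sentence of 10 characters and an assumed 0.5 s per character, Tw is 5 s and, with the default margin, Tm is 10 s.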
[0014]
In step 103, the signal processing unit 3 detects the presence or absence of a passenger by using the passenger detection unit 15. The passenger detection unit 15 detects the presence or absence of a passenger by using a pressure-sensitive sensor attached to a seat. When a passenger is detected, the process proceeds to step 104, and when a passenger is not detected, the process proceeds to step 105.
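The branch at step 103 could be sketched as below. The seat names, the pressure unit, and the threshold value are assumptions made for illustration; the description only states that pressure-sensitive sensors attached to the seats are used.

```python
def passenger_present(seat_pressures, threshold=20.0, driver_seat="driver"):
    """Return True if any seat other than the driver's seat reads at or above the threshold.

    seat_pressures maps a seat name to a sensor reading (hypothetical units).
    """
    return any(
        pressure >= threshold
        for seat, pressure in seat_pressures.items()
        if seat != driver_seat
    )


def next_step(seat_pressures):
    """Step 103: go to step 104 (shorten Tm) if a passenger is detected, else step 105."""
    return 104 if passenger_present(seat_pressures) else 105
```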
[0015]
In step 104, the signal processing unit 3 changes the maximum standby time Tm calculated by the equation (1) to a shorter value, and calculates the maximum standby time Tm 'as described below.
As shown in FIG. 4, when the utterance switch 13 is operated at time A and voice capture by the signal processing unit 3 starts, the user's utterance normally ends at time B, which is earlier than time C at which the maximum standby time Tm ends.
[0016]
However, as shown in FIG. 5, if a passenger's utterance E begins at time D, before time B at which the user's utterance ends, the voice capturing process is greatly affected. In this case the signal processing unit 3 cannot detect the end of the user's utterance, and it goes on capturing the passenger's utterance from partway through the user's utterance.
[0017]
As a result, voice capture is continued until time C when the maximum standby time Tm set in step 102 ends. Therefore, the captured voice includes the voice of the passenger in a wide range, and the recognition rate of the voice of the user deteriorates. To prevent this, the maximum standby time Tm 'is calculated using the following equation.
Tm' = Tw + Tx' (2)
Here, Tx' = k × Tx, where 0 < k < 1.0.
[0018]
As a result, as shown in FIG. 6, even if a passenger starts speaking at time D, voice capture ends at time C', at which the maximum standby time Tm' ends. The portion of the voice captured by the signal processing unit 3 that contains the passenger's utterance can therefore be reduced.
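As a worked illustration of equations (1) and (2), the sketch below computes both standby times. The values Tw = 3 s, Tx = 3 s and k = 0.5 are assumed for the example and are not taken from the description, which only requires 0 < k < 1.0 for the shortened case.

```python
def standby_time(tw, tx, k=1.0):
    """Return Tw + k * Tx.

    k = 1.0 gives the normal maximum standby time Tm of equation (1);
    0 < k < 1.0 gives the shortened time Tm' of equation (2).
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1.0")
    return tw + k * tx


tm = standby_time(3.0, 3.0)               # normal case: 3 + 3 = 6 s
tm_short = standby_time(3.0, 3.0, k=0.5)  # passenger detected: 3 + 0.5 * 3 = 4.5 s
```

With these assumed values the capture window shrinks from 6 s to 4.5 s, while the estimated utterance time Tw itself is left untouched; only the margin is reduced.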
[0019]
Returning to the flowchart of FIG. 2, in step 105, in order to notify the user that the voice capturing process has started, the signal processing unit 3 outputs the notification sound stored in the storage unit 6 from the speaker 9 through the D/A converter 7 and the output amplifier 8.
[0020]
The user who hears the notification sound announcing the start of voice capture utters words included in the recognition target words. In this embodiment, the recognition target is the address shown in FIG. 3.
The audio signal input from the microphone 11 is converted into a digital signal by the A/D converter 10 and input to the signal processing unit 3.
[0021]
Until the utterance switch 13 is operated, the signal processing unit 3 calculates the average power of the digital audio signal converted by the A/D converter 10. After the utterance switch 13 is operated, when the instantaneous power of the digital signal exceeds the calculated average power by a predetermined value or more, the signal processing unit 3 determines in step 106 that the user has started speaking and starts capturing the voice.
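The start-of-utterance test of step 106 could look roughly like this. Treating the square of each sample as its instantaneous power, and the names `background`, `samples` and `margin`, are assumptions of this sketch; the description does not specify how power is computed.

```python
def detect_utterance_start(background, samples, margin):
    """Return the index of the first sample whose power exceeds the background
    average power by at least `margin`, or None if there is no such sample.

    background: samples observed before the utterance switch was operated.
    samples:    samples observed after the utterance switch was operated.
    """
    if not background:
        raise ValueError("background samples are required to compute the average power")
    average_power = sum(x * x for x in background) / len(background)
    for index, x in enumerate(samples):
        if x * x - average_power >= margin:
            return index
    return None
```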
[0022]
When the voice capture is started, the signal processing unit 3 starts calculating the degree of coincidence with the recognition target word stored in the storage unit 6 in step 107. The degree of coincidence indicates how similar the captured voice part is to the individual recognition target words, and the degree of coincidence is obtained as a score. In this embodiment, it is assumed that the higher the score value, the higher the matching degree.
It should be noted that during the processing of this step, the voice capturing by the signal processing unit 3 is continued in parallel.
[0023]
In step 108, it is determined whether the end of the utterance has been detected. The user's utterance is judged to have ended when the instantaneous power of the digital audio signal has remained at or below a predetermined value for a predetermined time or longer. If the end of the utterance is detected, the process proceeds to step 109; if the utterance has not ended, the process proceeds to step 113.
[0024]
In step 113, it is determined whether or not the maximum standby time Tm or the maximum standby time Tm 'has elapsed after the start of voice capture. If not, the process returns to step 106. If the maximum standby time has elapsed, the process proceeds to step 109.
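The interplay of steps 108 and 113 can be sketched as a loop over fixed-length frames: capture stops either when the signal has stayed quiet long enough or when the maximum standby time runs out. The frame-based processing and all parameter names are assumptions made for illustration.

```python
def capture_frames(frame_powers, frame_sec, max_standby_sec, quiet_power, quiet_sec):
    """Return (frames_captured, reason).

    reason is "end_of_utterance" (step 108), "timeout" (step 113),
    or "input_exhausted" if the frame list ends first.
    """
    quiet_frames = 0
    for index, power in enumerate(frame_powers):
        # Step 108: count consecutive frames at or below the quiet threshold.
        quiet_frames = quiet_frames + 1 if power <= quiet_power else 0
        if quiet_frames * frame_sec >= quiet_sec:
            return index + 1, "end_of_utterance"
        # Step 113: stop once the maximum standby time (Tm or Tm') has elapsed.
        if (index + 1) * frame_sec >= max_standby_sec:
            return index + 1, "timeout"
    return len(frame_powers), "input_exhausted"
```

If a passenger keeps talking so that the power never falls, capture ends only at the timeout, so a shorter maximum standby time directly bounds how much of the passenger's voice is captured.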
[0025]
In step 109, the voice capturing process ends, and in step 110, the signal processing unit 3 outputs the recognition target word having the highest degree of coincidence as the recognition result from the speaker 9 through the D/A converter 7 and the output amplifier 8. In the present embodiment, “Natsushima-cho, Yokosuka-shi, Kanagawa” spoken by the user is correctly recognized, and the signal processing unit 3 outputs the recognition result “Natsushima-cho, Yokosuka-shi, Kanagawa” through the speaker 9.
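Selecting the recognition result in step 110 amounts to taking the candidate with the largest score. The score values below are invented for illustration; how the degree of coincidence is actually computed is not specified in the description.

```python
def best_match(scores):
    """Return the recognition target with the highest score (higher means a closer match)."""
    if not scores:
        raise ValueError("no candidates were scored")
    return max(scores, key=scores.get)


# Hypothetical scores for three candidate addresses.
example_scores = {
    "Natsushima-cho, Yokosuka-shi, Kanagawa": 0.92,
    "Naka-ku, Yokohama-shi, Kanagawa": 0.41,
    "Marunouchi, Chiyoda-ku, Tokyo": 0.17,
}
recognized = best_match(example_scores)
```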
[0026]
In step 111, it is determined whether the correction switch 14 provided in the input unit 12 has been operated within a predetermined time after the recognition result was output in step 110. If the correction switch 14 has been operated, it is determined that the user wants to correct the voice recognition result of the navigation system 20, and the process returns to step 101 to repeat the voice capture described above.
[0027]
On the other hand, if the correction switch 14 has not been operated within the predetermined time, the process proceeds to step 112, where it is determined that the user has accepted the recognition result of the navigation system 20, and a process according to the recognition result is performed. In this embodiment, the signal processing unit 3 outputs the address as the recognition result to the navigation control unit 2. The navigation control unit 2 sets the recognized address as the destination, and presents information such as road guidance to the user through the display unit 16 and the speaker 9.
In this embodiment, steps 102 and 104 in FIG. 2 constitute the maximum standby time setting means of the present invention, and the passenger detection unit 15 constitutes the fellow-occupant detection unit of the present invention.
[0028]
The present embodiment is configured as described above, and when a passenger is detected by the passenger detection unit 15, the maximum standby time Tm is reduced to calculate the maximum standby time Tm ′, and the voice capturing time width is reduced. By doing so, even if the utterance of the fellow passenger occurs before the utterance of the user ends, the proportion of the voice of the fellow passenger in the voice captured by the signal processing unit 3 can be reduced. Therefore, it is possible to reduce the deterioration of the user's voice recognition rate by the signal processing unit 3.
[0029]
In this embodiment, the passenger detection unit 15 detects the presence or absence of a passenger by using a pressure-sensitive sensor attached to the seat. However, the present invention is not limited to this. For example, the presence or absence of a passenger may be detected using an infrared sensor, an ultrasonic sensor, or the like that is provided near the seat and is blocked when a person sits down. A seatbelt wearing sensor may also be used to detect the presence or absence of a passenger based on whether a seatbelt is fastened in a seat other than the driver's seat, and the presence or absence of a passenger may further be detected from the state of the door switches or the opening and closing of the doors.
[0030]
Next, a second embodiment will be described.
FIG. 7 shows the overall configuration of a vehicle navigation system according to the second embodiment.
In the configuration of the present embodiment, the passenger detection unit 15 in the first embodiment is deleted, and a noise holding unit 18 is added.
A signal processing unit 3A including a CPU and a memory is connected to the navigation control unit 2 that calculates the position of the vehicle.
[0031]
The signal processing unit 3A includes a noise holding unit 18 that holds noise information on voices, sounds close to voices, and the like.
The speech recognition unit 1A is composed of the signal processing unit 3A, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
A navigation system 20A is constituted by the voice recognition unit 1A, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
The components having the same functions as those of the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0032]
Next, a flow of the voice recognition processing of the navigation system in the present embodiment will be described.
When the vehicle is traveling in an environment where noise occurs intermittently, or when the user and a passenger are talking, the signal processing unit 3A detects a voice or a sound close to a voice as noise G before time A, at which voice capture starts, as shown in FIG. 8.
In such a case, there is a high possibility that a sound or a sound close thereto is detected as the noise H even after the start of the sound capturing.
[0033]
With reference to the flowchart of FIG. 9, a description will be given of a process of detecting the noise G before the time A at which the voice capturing starts.
This detection processing is periodically executed by interruption processing of a timer (not shown) incorporated in the signal processing unit 3A.
In step 200, the signal processing unit 3A determines whether the start point of noise has been detected within a predetermined time. As in step 106 of the first embodiment, the start point is detected by calculating the average power of the digital signal and judging that noise has been detected when the instantaneous power exceeds the average power by a predetermined value or more.
[0034]
When the start point of the noise is detected in step 200, the signal processing unit 3A starts capturing noise in step 201. In step 202, when the end of the noise is detected, the acquisition of the noise ends. The detection of the end of the noise is performed in the same manner as in step 108 in the first embodiment.
[0035]
In step 203, the signal processing unit 3A stores the time at which the noise was taken and the duration thereof as noise data in the noise holding unit 18.
If the start point of the noise cannot be detected in step 200, the process is terminated.
[0036]
Next, the flow of voice recognition performed by the navigation system 20A will be described with reference to the flowchart of FIG. 10.
Steps 300 to 302 are the same as steps 100 to 102 in the first embodiment, and steps 304 to 313 in FIG. 10 are the same as steps 104 to 113 in the first embodiment, and a description thereof will be omitted.
[0037]
In step 303, the signal processing unit 3A determines whether noise was detected before the user's voice input started. Using the noise data stored by the noise detection processing shown in the flowchart of FIG. 9, it sums the durations of the noise that occurred in the period from a predetermined time before the utterance switch 13 was operated up to the time of that operation, and determines that noise has been detected if the sum is equal to or greater than a predetermined value. If noise is detected, the process proceeds to step 304, where the maximum standby time Tm is shortened. If no noise is detected, the process proceeds to step 305.
In this embodiment, steps 302 and 304 in FIG. 10 constitute the maximum standby time setting means in the present invention.
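The decision in step 303 could be sketched as below, with the noise data held as (start time, duration) pairs as recorded in step 203. Counting only the part of each noise event that falls inside the look-back window is an assumption of this sketch, as are all names and units (seconds).

```python
def noise_detected(noise_log, switch_time, window_sec, min_total_sec):
    """Return True if the accumulated noise duration in the window
    [switch_time - window_sec, switch_time] is at least min_total_sec.

    noise_log: iterable of (start_time, duration) pairs, in seconds.
    """
    window_start = switch_time - window_sec
    total = 0.0
    for start, duration in noise_log:
        # Length of the overlap between this noise event and the window.
        overlap = min(start + duration, switch_time) - max(start, window_start)
        if overlap > 0:
            total += overlap
    return total >= min_total_sec
```

For instance, with noise events at 8 s (1 s long) and 9.5 s (1 s long) and the switch operated at 10 s, a 5 s window accumulates 1.5 s of noise, since the second event is cut off at the switch time.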
[0038]
This embodiment is configured as described above. If the signal processing unit 3A has detected noise before the user starts voice input to the navigation system 20A, the maximum standby time Tm is shortened to calculate the maximum standby time Tm' as shown in FIG. 8, and voice capture ends at time C'. Even if noise other than the user's utterance is input, the proportion of noise in the voice captured by the signal processing unit 3A can thus be reduced. Deterioration of the user's speech recognition rate in the signal processing unit 3A can therefore be reduced.
[0039]
Next, a third embodiment will be described.
FIG. 11 shows an overall configuration of a vehicle navigation system according to the third embodiment.
The configuration of this embodiment is obtained from the first embodiment by removing the passenger detection unit 15 and adding a rear side monitoring unit 17 that monitors the rear and sides of the vehicle, together with an own vehicle position detection unit 19 that detects the position of the own vehicle and an environmental sound increase prediction unit 23.
The navigation control unit 2B includes the own vehicle position detection unit 19, which calculates the position of the own vehicle from a signal from a GPS antenna (not shown). The navigation control unit 2B is connected to a signal processing unit 3B including a CPU and a memory.
[0040]
A rear side monitoring unit 17 that monitors the rear side of the vehicle is connected to the signal processing unit 3B. The signal processing unit 3B includes an environmental sound increase prediction unit 23 for predicting an increase in environmental sound.
The voice recognition unit 1B is composed of the signal processing unit 3B, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
[0041]
The navigation system 20B includes the voice recognition unit 1B, the navigation control unit 2B, the display unit 16, the speaker 9, the microphone 11, the input unit 12, and the rear side monitoring unit 17.
The components having the same functions as those of the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0042]
Next, the flow of the voice recognition process performed by the navigation system 20B will be described using the flowchart of FIG.
Steps 400 to 402 are the same as steps 100 to 102 in the first embodiment, and steps 404 to 413 are the same as steps 104 to 113 in the first embodiment, and a description thereof will be omitted.
[0043]
In step 403, it is predicted whether or not the environmental sound will increase. The environmental sound includes sound accompanying the appearance of an overtaking vehicle or the like and sound generated upon entering a tunnel.
An increase in environmental sound accompanying the appearance of an overtaking vehicle can be predicted as follows: when the rear side monitoring unit 17, which monitors the rear side of the vehicle, detects a vehicle approaching the own vehicle, the environmental sound is predicted to increase.
[0044]
As the rear side monitoring unit 17 for monitoring the rear side of the vehicle, a CCD camera can be used. As a method of monitoring the rear side of a vehicle using such a CCD camera, there is, for example, a rear side monitoring device for a vehicle disclosed in JP-A-2000-259998. In addition, other than the CCD camera, a rear-side radar for vehicle use may be used.
[0045]
In addition, an increase in environmental sound caused by entering a tunnel can be predicted by determining, based on the position information of the own vehicle obtained by the own vehicle position detection unit 19 of the navigation control unit 2B, whether or not the own vehicle has entered an area where the environmental sound increases. A method of detecting entry into a tunnel using such a navigation device is described in detail in, for example, the navigation device and accessory control method disclosed in Japanese Patent Application Laid-Open No. 2002-236023.
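The position-based determination can be illustrated with a short sketch. It is not the method of the cited publication. It assumes that areas of known environmental sound increase, such as tunnels, are stored as latitude/longitude rectangles; the `Area` type, the `in_loud_area` function, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Rectangular area in latitude/longitude degrees (hypothetical map data)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat, lon):
        """True when the given position lies inside this rectangle."""
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def in_loud_area(lat, lon, areas):
    """True when the own vehicle position lies inside any registered area."""
    return any(area.contains(lat, lon) for area in areas)


# Hypothetical tunnel area; a real system would take this from map data.
TUNNEL_AREAS = [Area(35.000, 35.010, 139.000, 139.020)]

print(in_loud_area(35.005, 139.010, TUNNEL_AREAS))
print(in_loud_area(35.020, 139.010, TUNNEL_AREAS))
```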
[0046]
If an increase in environmental sound is predicted in step 403, the process proceeds to step 404, in which the maximum standby time Tm is shortened and the maximum standby time Tm' is calculated. If no increase in environmental sound is predicted, the process proceeds to step 405.
In this embodiment, the rear side monitoring unit 17 forms a vehicle monitoring unit according to the present invention. Steps 402 and 404 in FIG. 12 constitute the maximum standby time setting means in the present invention.
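The branch at steps 403 and 404 can be sketched as below. How Tm' is derived from Tm is not specified in this passage, so the multiplicative `factor` is an assumption, as are the function names and the example values.

```python
def predict_sound_increase(approaching_vehicle_detected, entered_loud_area):
    """Step 403: predict an increase if either condition holds."""
    return approaching_vehicle_detected or entered_loud_area


def max_standby_time(tm, increase_predicted, factor=0.5):
    """Return Tm' (step 404) when an increase is predicted, otherwise Tm.

    factor is a hypothetical shortening ratio; the text only says that
    Tm is shortened, not by how much.
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must be between 0 and 1 (exclusive)")
    return tm * factor if increase_predicted else tm


# Hypothetical values: Tm = 6 s, an approaching vehicle was detected.
predicted = predict_sound_increase(True, False)
print(max_standby_time(6.0, predicted))
```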
[0047]
The present embodiment is configured as described above. When the own vehicle enters a tunnel, or when the rear side monitoring unit 17 detects a vehicle approaching the own vehicle, an increase in environmental sound is predicted and the maximum standby time Tm is shortened. As a result, as shown in FIG. 13, even if voice capture starts at time A, the environmental sound I is generated during the user's utterance, and the utterance overlaps with the environmental sound I, ending voice capture at time C', at which the maximum standby time Tm' ends, reduces the proportion of the environmental sound I in the voice captured by the signal processing unit 3B. Therefore, deterioration of the recognition rate for the user's voice by the signal processing unit 3B can be reduced.
In the present embodiment, the rear side monitoring unit 17 is used to monitor the rear side of the vehicle. However, the present invention is not limited to this, and the surroundings of the vehicle such as the front side of the vehicle may be monitored.
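The effect on the proportion of environmental sound can be checked with a small worked example. The timings are hypothetical and are not taken from FIG. 13. Capture starts at time A = 0 s, and the environmental sound begins 2 s later and continues past the end of capture. The function names are illustrative only.

```python
def overlap(start_a, end_a, start_b, end_b):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


def noise_ratio(capture_start, standby_time, noise_start, noise_end):
    """Fraction of the capture window during which environmental sound is present."""
    capture_end = capture_start + standby_time
    noisy = overlap(capture_start, capture_end, noise_start, noise_end)
    return noisy / standby_time


# Hypothetical timings: Tm = 5 s, Tm' = 3 s, noise from 2 s to 6 s.
ratio_tm = noise_ratio(0.0, 5.0, 2.0, 6.0)        # 3 s of noise in a 5 s window
ratio_tm_short = noise_ratio(0.0, 3.0, 2.0, 6.0)  # 1 s of noise in a 3 s window
print(ratio_tm, ratio_tm_short)
```

With these assumed timings the proportion falls from 3/5 to 1/3 when capture ends at C' instead of C. The benefit depends on the environmental sound arriving after capture has started, which is the situation the prediction in step 403 targets.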
[0048]
Next, a fourth embodiment will be described.
FIG. 14 shows the overall configuration of a vehicle navigation system according to the fourth embodiment.
The overall configuration of the present embodiment is obtained by removing the passenger detection unit 15 of the first embodiment and adding a use experience storage unit 21 that counts the number of executions of the voice recognition process. The use experience storage unit 21 is included in the signal processing unit 3C, which includes a CPU and a memory.
[0049]
The voice recognition unit 1C is composed of the signal processing unit 3C, the storage unit 6, the D / A converter 7, the output amplifier 8, and the A / D converter 10.
A navigation system 20C includes the voice recognition unit 1C, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12. Other configurations and operations are the same as those of the first embodiment, and the same reference numerals are given and the description is omitted.
[0050]
Next, the operation of the navigation system to which the voice recognition device according to the present embodiment is applied will be described.
The flow of the voice recognition process performed by the navigation system 20C will be described with reference to the flowchart of FIG.
Steps 500 to 502 are the same as steps 100 to 102 in the first embodiment, and steps 504 to 513 in FIG. 15 are the same as steps 104 to 113 in the first embodiment, and a description thereof will be omitted.
[0051]
In step 503, the signal processing unit 3C determines whether or not the user has sufficient experience using the navigation system to which the voice recognition device is applied. Using the execution count of the voice recognition process held by the use experience storage unit 21, the signal processing unit 3C determines that the use experience is sufficient when the count is equal to or greater than a predetermined value.
[0052]
If it is determined in step 503 that the user has sufficient experience, the process proceeds to step 504 to reduce the maximum standby time Tm. On the other hand, if it is determined that the user experience is not sufficient, the process proceeds to step 505.
In this embodiment, steps 502 and 504 in FIG. 15 constitute the maximum standby time setting means in the present invention.
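The determination in steps 503 and 504 can be sketched as follows. The class and function names, the threshold of 20 executions, and the shortening factor are all hypothetical; the text states only that the count is compared with a predetermined value and that Tm is shortened when the count reaches it.

```python
class UseExperienceStore:
    """Counts how many times the voice recognition process has been executed."""

    def __init__(self):
        self.count = 0

    def record_execution(self):
        """Increment the execution count by one."""
        self.count += 1


def standby_time_for_user(tm, store, threshold=20, factor=0.5):
    """Shorten Tm when the execution count is at or above the threshold.

    threshold and factor are assumed values, not taken from the text.
    """
    if store.count >= threshold:
        return tm * factor  # step 504: experienced user
    return tm               # proceed with Tm unchanged


store = UseExperienceStore()
for _ in range(19):
    store.record_execution()
before = standby_time_for_user(6.0, store)  # 19 executions: below threshold
store.record_execution()
after = standby_time_for_user(6.0, store)   # 20 executions: threshold reached
print(before, after)
```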
[0053]
The present embodiment is configured as described above. As a user's experience with the voice recognition device increases, the user comes to recognize that the voice recognition rate is higher when the utterance starts immediately after voice capture begins. A user with much experience therefore finishes the utterance earlier than a user with little experience. By shortening the maximum standby time Tm in accordance with this reduction in utterance time width, the proportion of sound other than the user's voice in the voice captured by the signal processing unit 3C can be reduced. Therefore, deterioration of the recognition rate for the user's voice by the signal processing unit 3C can be reduced.
[Brief description of the drawings]
FIG. 1 is a diagram showing a first embodiment of the present invention.
FIG. 2 is a diagram showing a flow of a voice recognition process in the first embodiment.
FIG. 3 is a diagram showing a grammar having a hierarchical structure of words.
FIG. 4 is a diagram showing the relationship between the maximum standby time and speech.
FIG. 5 is a diagram showing the relationship between the maximum standby time and speech.
FIG. 6 is a diagram showing the relationship between the maximum standby time and speech.
FIG. 7 is a diagram showing a second embodiment.
FIG. 8 is a diagram showing the relationship between the maximum standby time and speech.
FIG. 9 is a diagram showing a flow of a noise capturing process.
FIG. 10 is a diagram showing a flow of a voice recognition process in the second embodiment.
FIG. 11 is a diagram showing a third embodiment.
FIG. 12 is a diagram showing a flow of a voice recognition process in the third embodiment.
FIG. 13 is a diagram showing the relationship between the maximum standby time and speech.
FIG. 14 is a diagram showing a fourth embodiment.
FIG. 15 is a diagram showing a flow of a voice recognition process in the fourth embodiment.
[Explanation of symbols]
1, 1A, 1B, 1C Voice recognition unit
2, 2B Navigation control unit
3, 3A, 3B, 3C Signal processing unit
6 Storage unit
7 D / A converter
8 Output amplifier
9 Speaker
10 A / D converter
11 Microphone
12 Input unit
13 Speech switch
14 Correction switch
15 Passenger detection unit
16 Display unit
17 Rear side monitoring unit
18 Noise holding unit
19 Own vehicle position detection unit
20, 20A, 20B, 20C Navigation system
21 Usage experience storage unit
23 Environmental sound increase prediction unit

Claims

A voice recognition device comprising:
maximum standby time setting means for setting a maximum standby time from the start of voice capture to the end of the voice capture enabled state; and
a signal processing unit that recognizes voice input from the start of voice capture to the end of the maximum standby time set by the maximum standby time setting means,
wherein the maximum standby time setting means shortens the maximum standby time when a sound other than the voice to be recognized may be captured.

The voice recognition device according to claim 1, further comprising a passenger detection unit that detects the presence of a passenger,
wherein the case where a passenger is detected by the passenger detection unit is treated as a case where a sound other than the voice to be recognized may be captured.

The voice recognition device according to claim 1, further comprising a noise holding unit that holds a sound signal from before the start of voice capture,
wherein the case where the sound signal is equal to or greater than a predetermined value is treated as a case where a sound other than the voice to be recognized may be captured.

The voice recognition device according to claim 1, further comprising an environmental sound increase prediction unit that predicts an increase in ambient environmental sound,
wherein the case where an increase in environmental sound is predicted by the environmental sound increase prediction unit is treated as a case where a sound other than the voice to be recognized may be captured.

The voice recognition device according to claim 4, which is mounted on a vehicle and
further comprises an own vehicle position detection unit that detects the position of the vehicle,
wherein the environmental sound increase prediction unit predicts, based on the position information detected by the own vehicle position detection unit, that the environmental sound increases when the vehicle enters an area where an increase in environmental sound is known to occur.

The voice recognition device according to claim 4, which is mounted on a vehicle and
further comprises a vehicle monitoring unit that detects other vehicles around the vehicle,
wherein the environmental sound increase prediction unit predicts that the environmental sound increases when another vehicle is detected by the vehicle monitoring unit.

A voice recognition device comprising:
maximum standby time setting means for setting a maximum standby time from the start of voice capture to the end of the voice capture enabled state;
a signal processing unit that recognizes voice input from the start of voice capture to the end of the maximum standby time set by the maximum standby time setting means; and
a use experience storage unit that counts the number of executions of the voice recognition process by the signal processing unit,
wherein the maximum standby time setting means shortens the maximum standby time when the count value stored in the use experience storage unit is equal to or greater than a predetermined value.