JP3919337B2

JP3919337B2 - Voice recognition device for vehicle

Info

Publication number: JP3919337B2
Application number: JP17341098A
Authority: JP
Inventors: 文雄梅田; 敬央山本
Original assignee: Tokai Rika Co Ltd
Current assignee: Tokai Rika Co Ltd
Priority date: 1998-06-19
Filing date: 1998-06-19
Publication date: 2007-05-23
Anticipated expiration: 2018-06-19
Also published as: JP2000010589A

Description

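[0001]
[Technical Field of the Invention]
The present invention relates to a vehicle voice recognition device, and more specifically to detection of the driver's voice.
[0002]
[Prior Art]
In recent years, car navigation systems have spread, and vehicle voice recognition devices have come into use for tasks such as setting a destination. Vehicle voice recognition devices are also used in hands-free telephones and similar equipment, for example to enter telephone numbers.
[0003]
In general, the cabin of a moving vehicle contains noise. Sources include conversation in the front passenger seat and rear seats, sound from the car audio system, and, when a window is open, noise from outside the vehicle. As a result, the recognition rate of voice recognition devices is currently not very good.
[0004]
Improvements have therefore been made to devices that remove such noise and detect only the driver's voice.
In a conventional vehicle voice recognition device, a microphone for picking up the driver's voice is mounted on the steering pad. This microphone has a fairly sharp directivity so that it detects only the driver's voice, and its direction of directivity is aimed at the driver's lips.
[0005]
[Problem to Be Solved by the Invention]
However, because the directivity is so sharp, the direction of directivity easily drifts away from the driver's lip position when the driver's posture changes. At present, therefore, the voice input is not stable.
[0006]
There is therefore a need to feed the driver's voice stably into the voice recognition device despite the noise in the vehicle cabin.
An object of the present invention is to provide a vehicle voice recognition device that can stably input the driver's voice in a noisy vehicle cabin.
[0007]
[Means for Solving the Problem]
To solve the above problem, the invention according to claim 1 is a vehicle voice recognition device comprising a plurality of microphones provided in the vehicle and voice recognition means for recognizing the driver's voice based on the voice signals detected by the plurality of microphones. The plurality of microphones are provided on the steering wheel. The delay time that maximizes the synthesized voice signal obtained by combining the voice signals of the plurality of microphones is defined as the steering delay time. The device is characterized by comprising:
storage means for storing, for a preset lip position of the driver, the steering delay time corresponding to each rotational position of the steering wheel;
delay time calculation means for calculating the rotational position of the steering wheel based on the output signal of rotational position detection means that detects that rotational position, and for setting, with reference to the contents of the storage means, the steering delay time corresponding to the current rotational position of the steering wheel; and
delayed voice synthesis means for delaying the voice signal of at least one of the plurality of microphones relative to the voice signals of the other microphones by the steering delay time set by the delay time calculation means, and for combining the voice signals of the plurality of microphones.
[0015]
According to the invention of claim 1, the delay time calculation means calculates the rotational position of the steering wheel from the output signal of the rotational position detection means. It then sets a delay time as the steering delay time. That delay time applies to the voice signal of at least one of the plurality of microphones. It is the one that, at the current rotational position of the steering wheel and for the preset lip position of the driver, maximizes the synthesized voice signal obtained by combining the delayed signal with the voice signals of the other microphones.
[0016]
The delayed voice synthesis means delays the voice signal of at least one of the plurality of microphones by the steering delay time relative to the voice signals of the other microphones. It then combines the voice signals of the plurality of microphones. The synthesized voice signal therefore best extracts sound arriving from the direction of the driver's lip position.
[0017]
Therefore, even when the steering wheel is turned, the driver's voice is always supplied stably to the voice recognition means. As a result, since the voice recognition means performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[0021]
[Embodiments of the Invention]
(First Embodiment)
An embodiment of a vehicle voice recognition device embodying the present invention is described below with reference to FIGS. 1 to 4.
[0022]
As shown in FIG. 1, a CCD camera 3 serving as imaging means is mounted on the pillar (A-pillar) 2 in front of the driver's seat in the vehicle cabin 1. The camera is focused on the driver's face.
[0023]
FIG. 2 shows the steering wheel 4, viewed from the driver's seat, in a state where it has not been turned left or right (neutral state). First and second microphones 5 and 6 are mounted apart from each other on the rim of the steering wheel 4. In this embodiment, the microphones 5 and 6 are mounted on the steering wheel 4 so that the spacing between them is 34 centimeters.
FIG. 3 shows the electrical configuration of the vehicle voice recognition device configured as described above.
[0024]
The CCD camera 3 outputs an image signal (video signal) that includes the lips of the driver's face. This image signal is input to the image processing device 11. The image processing device 11 consists of four components:
- an image digital signal processor (image DSP) 12 serving as lip position calculation means;
- a readable and writable memory (image RAM) 13 for temporarily storing data;
- a read-only memory (image ROM) 14;
- a high-pass filter 15.
The image ROM 14 stores a control program that makes the image DSP 12 perform high-speed digital computation. This control program is transferred to the image RAM 13 in the image DSP 12, and the desired image recognition processing is performed in the image DSP 12.
[0025]
The high-pass filter 15 differentiates the input image signal and outputs a differentiated image signal. The image DSP 12 receives the differentiated image signal from the high-pass filter 15 and calculates the contour of the driver's face. The image DSP 12 calculates the center of the rectangle circumscribing the face contour and takes it as the nose position. It then calculates the lip position by adding a preset distance to the nose position. Finally, the image DSP 12 outputs a lip position signal P1 corresponding to the lip position to the voice processing device 16.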
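The lip-position estimate of the image DSP 12 can be sketched in a few lines of Python. Its steps are an edge image, the circumscribing rectangle, the rectangle's center taken as the nose position, and a fixed offset down to the lips. This is a hypothetical illustration, not the device's program. The following are all assumptions:
- the horizontal first difference standing in for the high-pass filter 15;
- the edge threshold `EDGE_THRESHOLD`;
- the nose-to-lip offset `LIP_OFFSET_ROWS`.

```python
EDGE_THRESHOLD = 50   # minimum grey-level step counted as an edge (assumed)
LIP_OFFSET_ROWS = 3   # preset nose-to-lip distance in pixel rows (assumed)


def differentiate(image):
    """Horizontal first difference: a crude stand-in for high-pass filter 15."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in image]


def lip_position(image, threshold=EDGE_THRESHOLD, offset=LIP_OFFSET_ROWS):
    """Return (row, col) of the estimated lips, or None if no edges are found."""
    diff = differentiate(image)
    edges = [(y, x) for y, row in enumerate(diff) for x, v in enumerate(row) if v > threshold]
    if not edges:
        return None
    ys = [y for y, _ in edges]
    xs = [x for _, x in edges]
    # Center of the rectangle circumscribing the face contour = "nose position".
    nose_row = (min(ys) + max(ys)) / 2
    nose_col = (min(xs) + max(xs)) / 2
    # Lip position = nose position shifted down by the preset distance.
    return (nose_row + offset, nose_col)


# Usage: a synthetic 10 x 10 image with a bright "face" block.
image = [[200 if 2 <= y <= 7 and 3 <= x <= 6 else 0 for x in range(10)] for y in range(10)]
print(lip_position(image))  # (7.5, 4.0)
```

In the example, edges appear at difference columns 2 and 6 in rows 2 to 7. The "nose" is therefore at (4.5, 4.0), and the lips are placed 3 rows below it.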
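[0026]
The first and second microphones 5 and 6 output the driver's voice to the voice processing device 16. The voice processing device 16 consists of three components:
- a voice DSP 17, serving as the delay time calculation means, the delayed voice synthesis means and the voice recognition means;
- a voice RAM 18;
- a voice ROM 19.
The voice ROM 19 stores a control program that makes the voice DSP 17 perform high-speed digital computation. This control program is transferred to the voice RAM 18 in the voice DSP 17, and the desired voice recognition processing is performed in the voice DSP 17. The voice ROM 19 also stores a plurality of preset standard voice command patterns. In addition, it stores delay time τ data for each lip position of the driver seated in the driver's seat (lip position signal P1).
[0027]
The delay time τ is the time by which the voice signal of the second microphone 6 is delayed relative to the voice signal of the first microphone 5. The voice DSP 17 delays the voice signal of the second microphone 6 by this delay time τ. It then combines the voice signal of the first microphone 5 with the delayed signal of the second microphone 6 to form a synthesized voice signal. When the first microphone 5 and the second microphone 6 are treated as a single microphone, their directivity is known to change if the voice signal of either one is delayed before the two are combined. The microphone directivity that picks up the driver's voice with the highest sensitivity can therefore be set by delaying the voice signal of the second microphone 6 by a delay time τ matched to the driver's lip position at that moment.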
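The delay-and-sum combination of microphones 5 and 6 can be illustrated with a minimal Python sketch. It assumes sampled signals held in lists and a delay rounded to a whole number of samples. The helper names `tau_to_samples` and `delay_and_sum`, and the 8 kHz sample rate in the usage note, are hypothetical.

```python
def tau_to_samples(tau_seconds, sample_rate):
    """Convert a delay in seconds to the nearest whole number of samples."""
    return round(tau_seconds * sample_rate)


def delay_and_sum(mic1, mic2, delay_samples):
    """Delay mic2 by delay_samples (>= 0) and add it to mic1.

    Samples before the delayed signal starts are treated as silence.
    """
    if delay_samples < 0:
        raise ValueError("this sketch only delays the second microphone")
    out = []
    for n, x1 in enumerate(mic1):
        k = n - delay_samples
        x2 = mic2[k] if 0 <= k < len(mic2) else 0.0
        out.append(x1 + x2)
    return out


# Usage: the same click reaches microphone 6 two samples before microphone 5.
mic5 = [0, 0, 0, 0, 0, 1, 0, 0]
mic6 = [0, 0, 0, 1, 0, 0, 0, 0]
print(max(delay_and_sum(mic5, mic6, 0)))  # copies not aligned: peak 1
print(max(delay_and_sum(mic5, mic6, 2)))  # copies aligned: peak 2
```

At 8 kHz, `tau_to_samples(0.0005, 8000)` gives 4 samples for τ = T/2. Aligning the two copies of a sound doubles its amplitude, while sound arriving from other directions stays misaligned. This is why the choice of delay steers the pair's directivity.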
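[0028]
FIG. 4(a) shows the directivity characteristic of the synthesized voice signal for a delay time τ = T/2 = 0.0005 s. Here T (= 0.001 s) is the period of a 1000 Hz voice signal, λ is its wavelength, and the distance between the first and second microphones 5 and 6 is λ/2 (= 34 cm). As FIG. 4(a) shows, with τ = T/2 the synthesized voice signal best extracts the driver's voice arriving from the 0° and 180° directions. The delay τ = T/2 is therefore the one at which microphones 5 and 6 achieve the best directivity toward the driver's voice when the driver's lips are directly in front of the steering wheel 4 in the state shown in FIG. 2.
[0029]
Similarly, FIG. 4(b) shows the directivity characteristic of the synthesized voice signal for τ = T/3 ≈ 0.0003 s. As FIG. 4(b) shows, with τ = T/3 the synthesized voice signal best extracts the driver's voice arriving from the 340° and 200° directions. This delay applies when the driver's lips are turned 20° to the left of straight ahead relative to the steering wheel 4 in the state shown in FIG. 2. In that position, microphones 5 and 6 achieve the best directivity toward the driver's voice.
[0030]
Similarly, FIG. 4(c) shows the directivity characteristic of the synthesized voice signal for τ = 3T/5 = 0.0006 s. As FIG. 4(c) shows, with τ = 3T/5 the synthesized voice signal best extracts the driver's voice arriving from the 10° and 170° directions. This delay applies when the driver's lips are turned 10° to the right of straight ahead relative to the steering wheel 4 in the state shown in FIG. 2. In that position, microphones 5 and 6 achieve the best directivity toward the driver's voice.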
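The delay values quoted for FIG. 4 follow from the period T = 1/f. The short sketch below works through that arithmetic for the 1000 Hz example. The speed of sound of 343 m/s used for the wavelength line is an assumption; the source does not state it. The wavelength is printed for comparison with the 34 cm microphone spacing given above.

```python
SPEED_OF_SOUND = 343.0  # m/s in air near 20 degrees C (assumed; not given in the source)

f = 1000.0                        # Hz, example frequency from the text
T = 1.0 / f                       # period in seconds
wavelength = SPEED_OF_SOUND / f   # metres

delays = {
    "FIG. 4(a), T/2": T / 2,
    "FIG. 4(b), T/3": T / 3,
    "FIG. 4(c), 3T/5": 3 * T / 5,
}

print(f"T = {T * 1e3:.3f} ms, wavelength = {wavelength * 100:.1f} cm")
for label, tau in delays.items():
    print(f"{label}: tau = {tau * 1e3:.3f} ms")
```

The three delays print as 0.500, 0.333 and 0.600 ms. The text rounds T/3 to 0.0003 s, and the wavelength line prints 34.3 cm.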
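[0031]
Accordingly, the delay time τ stored in the voice ROM 19 for each lip position relative to microphones 5 and 6 is chosen for a specific purpose. When the driver speaks from that lip position, it gives the synthesized voice signal the best directivity characteristic.
[0032]
When the vehicle is started with the ignition switch, the voice DSP 17 loads the plurality of standard voice command patterns and the delay time τ data into the voice RAM 18 in the voice DSP 17.
[0033]
The voice DSP 17 receives the lip position signal P1 from the image DSP 12. Based on P1, it reads the delay time τ for the current lip position from the voice RAM 18.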
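Reading τ for the current lip position amounts to a table lookup keyed by the lip position signal P1. The sketch below makes three assumptions:
- the key is a horizontal lip angle in degrees, negative to the left;
- the table holds only the three FIG. 4 examples;
- the nearest stored position is used.
The actual table format in the voice ROM 19 is not described in the text, and the names are hypothetical.

```python
T = 0.001  # s, period of the 1000 Hz example

# Hypothetical lip-position table: horizontal lip angle in degrees
# (negative = left of straight ahead) -> delay time tau in seconds.
# Only the three FIG. 4 examples are filled in here.
DELAY_TABLE = {
    -20.0: T / 3,     # FIG. 4(b): lips 20 degrees to the left
    0.0: T / 2,       # FIG. 4(a): lips straight ahead
    10.0: 3 * T / 5,  # FIG. 4(c): lips 10 degrees to the right
}


def delay_for_lip_position(angle_deg, table=DELAY_TABLE):
    """Return the stored delay for the tabulated lip position nearest angle_deg."""
    nearest = min(table, key=lambda a: abs(a - angle_deg))
    return table[nearest]


# Usage: lips 2 degrees right of center -> nearest entry is 0 degrees (T/2).
print(delay_for_lip_position(2.0))
```

A denser table, or interpolation between entries, would be a natural refinement. The text only says that τ is stored for each lip position.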
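[0034]
The voice DSP 17 receives the voice signals from the first and second microphones 5 and 6. It then delays the voice signal of the second microphone 6 by the delay time τ.
[0035]
The voice DSP 17 combines the voice signals from the first and second microphones 5 and 6. Based on this synthesized voice signal, it starts voice recognition and computes a voice pattern (actual voice pattern). The actual voice pattern may match one of the preset standard voice command patterns. When it does, the voice DSP 17 outputs the control signal corresponding to the matched pattern.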
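The text does not say how the actual voice pattern is compared with the standard voice command patterns. As a hypothetical illustration only, the sketch below works as follows:
- each pattern is treated as a fixed-length feature vector;
- the nearest standard pattern is picked by Euclidean distance;
- a match is reported only within a threshold.
The pattern names, vectors and threshold are invented for the example, and a practical recognizer would use a more elaborate comparison.

```python
import math

# Hypothetical standard voice command patterns (feature vectors) and the
# control signal each one triggers; the values are illustrative only.
STANDARD_PATTERNS = {
    "SET_DESTINATION": [0.9, 0.1, 0.4],
    "CALL_HOME": [0.2, 0.8, 0.5],
}
MATCH_THRESHOLD = 0.3  # maximum distance accepted as a match (assumed)


def recognize(actual_pattern, patterns=STANDARD_PATTERNS, threshold=MATCH_THRESHOLD):
    """Return the control signal of the nearest standard pattern, or None."""
    best_name, best_dist = None, math.inf
    for name, reference in patterns.items():
        dist = math.dist(actual_pattern, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


print(recognize([0.85, 0.15, 0.4]))  # close to SET_DESTINATION
print(recognize([0.5, 0.5, 0.0]))    # too far from every pattern -> None
```

The threshold plays the role of "matches": without it, every utterance would be forced onto the nearest command.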
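[0036]
For example, suppose the driver's lips are at the 0° position. The image DSP 12 outputs the lip position signal P1 corresponding to that position (0°). Based on P1, the voice DSP 17 reads the delay time τ (= T/2) for that lip position from the voice RAM 18. It then delays the voice signal of the second microphone 6 by τ (= T/2) relative to that of the first microphone 5 and combines the two signals. As shown in FIG. 4(a), the synthesized voice signal therefore best extracts sound arriving from the direction of the driver's lips (0°).
[0037]
The vehicle voice recognition device of the above embodiment provides the following features.
(1) In this embodiment, the voice DSP 17 calculates the delay time τ for the second microphone 6 from the driver's lip position relative to microphones 5 and 6. It then delays the voice signal of the second microphone 6 by τ relative to that of the first microphone 5 and combines the two signals. In the resulting synthesized voice signal, sound from the direction of the driver's lips is extracted best.
[0038]
Therefore, the driver's voice is always supplied stably to the voice DSP 17 even when the driver's lip position changes. As a result, since the voice DSP 17 performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[0039]
(2) In this embodiment, the CCD camera 3 images the face of the driver sitting and speaking in the driver's seat. The image DSP 12 then recognizes the driver's lip position. The direction in which the driver is speaking can therefore be detected accurately at all times.
[0040]
(Second Embodiment)
A second embodiment in which the present invention is embodied in a vehicle voice recognition device is described below with reference to FIG. 5.
[0041]
This embodiment differs from the first embodiment in one respect: a steering angle sensor 20 replaces the CCD camera 3 and the image processing device 11 of the first embodiment.
[0042]
A steering angle sensor 20 (not shown) is mounted on the steering shaft (not shown). It serves as the rotational position detection means that detects the rotational position of the steering wheel 4. The steering angle sensor 20 is an optical rotary encoder.
[0043]
FIG. 5 shows the electrical configuration of this vehicle voice recognition device.
The first and second microphones 5 and 6 output the driver's voice to the voice processing device 16.
[0044]
The voice processing device 16 consists of three components:
- a voice DSP 17, serving as the delay time calculation means, the delayed voice synthesis means and the voice recognition means;
- a voice RAM 18;
- a voice ROM 19.
The voice ROM 19 stores a control program that makes the voice DSP 17 perform high-speed digital computation. This control program is transferred to the voice RAM 18 in the voice DSP 17, and the desired voice recognition processing is performed in the voice DSP 17. The voice ROM 19 also stores a plurality of preset standard voice command patterns. In addition, it stores data on a plurality of preset steering delay times τs for one preset lip position of the driver. Each τs is associated with one of a plurality of rotational positions of the steering wheel 4.
[0045]
The steering delay time τs is the time by which the voice signal of the second microphone 6 is delayed. Each τs is the delay that maximizes the synthesized voice signal of the first and second microphones 5 and 6 at the corresponding rotational position of the steering wheel 4.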
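The text defines each τs as the delay that maximizes the combined signal at a given wheel angle, but does not say how the table is obtained. One common way to fill such a table is shown here purely as an assumption. It computes the difference in propagation time from the lips to each microphone, because delaying the earlier microphone by that amount lines the two arrivals up. The geometry is illustrative only and does not attempt to reproduce the values of FIG. 4:
- a rim radius of 0.17 m, half the 34 cm spacing;
- a fixed hypothetical lip position `LIPS`;
- c = 343 m/s.

```python
import math

SPEED_OF_SOUND = 343.0    # m/s (assumed)
RIM_RADIUS = 0.17         # m, half of the 34 cm microphone spacing
LIPS = (0.0, 0.10, 0.45)  # hypothetical lip position (x, y, z) in metres;
                          # wheel hub at the origin, z points toward the driver


def mic_positions(wheel_angle_deg):
    """Positions of microphones 5 and 6 on the rim after turning the wheel."""
    phi = math.radians(wheel_angle_deg)
    # In the neutral state, mic 5 is on the left of the rim and mic 6 on the right.
    mic5 = (-RIM_RADIUS * math.cos(phi), -RIM_RADIUS * math.sin(phi), 0.0)
    mic6 = (RIM_RADIUS * math.cos(phi), RIM_RADIUS * math.sin(phi), 0.0)
    return mic5, mic6


def steering_delay(wheel_angle_deg, lips=LIPS):
    """Delay (s) for mic 6 that aligns sound from the lips with mic 5.

    A negative value means mic 5, not mic 6, would have to be delayed.
    """
    mic5, mic6 = mic_positions(wheel_angle_deg)
    return (math.dist(lips, mic5) - math.dist(lips, mic6)) / SPEED_OF_SOUND


# Table of steering delay times tau_s every 15 degrees, as might be stored in ROM.
TAU_S_TABLE = {angle: steering_delay(angle) for angle in range(-90, 91, 15)}

for angle in (-90, 0, 90):
    print(angle, f"{TAU_S_TABLE[angle] * 1e3:+.3f} ms")
```

With the lips centered, the delay is zero in the neutral position. Turning the wheel 90° brings microphone 6 closer to the lips in this geometry, giving a positive τs of roughly 0.2 ms. Turning it the other way gives the same magnitude with the opposite sign.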
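[0046]
When the vehicle is started with the ignition switch, two sets of data are read into the voice RAM 18 in the voice DSP 17: the plurality of standard voice command patterns and the delay time data (the steering delay times τs).
[0047]
The steering angle sensor 20 outputs a signal corresponding to the rotational position of the steering wheel 4.
The voice DSP 17 receives this signal, calculates the rotational position of the steering wheel 4, and reads the corresponding steering delay time τs from the voice RAM 18. At that rotational position, the voice DSP 17 receives the voice signals of microphones 5 and 6. It delays the voice signal of the second microphone 6 by τs relative to that of the first microphone 5. It then combines the two signals to generate a synthesized voice signal.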
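The run-time loop of the second embodiment can be sketched as follows: read the wheel angle, look up τs, delay microphone 6, add. The class name `SteeredCombiner`, the nearest-entry lookup, the 16 kHz sample rate and the table values in `EXAMPLE_TAU_S` are hypothetical. The sketch keeps a short history of microphone 6 samples so that a delay can reach back into the previous frame and frame boundaries do not drop samples.

```python
class SteeredCombiner:
    """Delay-and-sum for microphones 5 and 6, re-steered from the wheel angle each frame.

    Hypothetical sketch: the table maps wheel angle (degrees) to tau_s (seconds),
    and only microphone 6 is ever delayed, as in the text.
    """

    def __init__(self, tau_s_table, sample_rate):
        self.table = tau_s_table
        self.sample_rate = sample_rate
        # Longest delay in the table, in samples; this much mic 6 history is kept.
        self.max_delay = max(round(t * sample_rate) for t in tau_s_table.values())
        self.history = []

    def delay_samples(self, wheel_angle_deg):
        """Look up tau_s for the nearest tabulated angle and convert it to samples."""
        nearest = min(self.table, key=lambda a: abs(a - wheel_angle_deg))
        return round(self.table[nearest] * self.sample_rate)

    def process(self, mic5_frame, mic6_frame, wheel_angle_deg):
        """Combine one frame from each microphone using the current steering delay."""
        d = self.delay_samples(wheel_angle_deg)
        buf = self.history + list(mic6_frame)
        start = len(self.history)
        out = []
        for n, x5 in enumerate(mic5_frame):
            k = start + n - d
            out.append(x5 + (buf[k] if k >= 0 else 0.0))
        self.history = buf[-self.max_delay:] if self.max_delay > 0 else []
        return out


# Hypothetical table (degrees -> seconds); real values would come from voice ROM 19.
EXAMPLE_TAU_S = {0: 0.0, 45: 0.000125, 90: 0.00025}

combiner = SteeredCombiner(EXAMPLE_TAU_S, 16000)
frame1 = combiner.process([0] * 8, [0, 0, 0, 0, 0, 0, 0, 1], wheel_angle_deg=45)
frame2 = combiner.process([0] * 8, [0] * 8, wheel_angle_deg=45)
print(frame1, frame2)  # the mic 6 click reappears 2 samples later, in frame 2
```

At 45° the lookup gives 2 samples. The click at the last sample of frame 1 is therefore emitted at index 1 of frame 2 rather than being lost at the frame boundary.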
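[0048]
Based on the synthesized voice signal of microphones 5 and 6, the voice DSP 17 then starts voice recognition and computes an actual voice pattern. The actual voice pattern may match one of the preset standard voice command patterns. When it does, the voice DSP 17 outputs the control signal corresponding to the matched pattern.
[0049]
Now, for one predetermined lip position of the driver, the voice DSP 17 receives the signal from the steering angle sensor 20 and calculates the rotational position of the steering wheel 4. It reads the steering delay time τs corresponding to that rotational position from the voice RAM 18. At that rotational position, it receives the voice signals of microphones 5 and 6 and delays the voice signal of the second microphone 6 by τs relative to that of the first microphone 5. It then combines the two signals to generate a synthesized voice signal.
[0050]
At that rotational position, the synthesized voice signal therefore best extracts sound from the direction of the driver's lips.
Next, when the steering wheel 4 is turned, the voice DSP 17 receives the signal from the steering angle sensor 20 and calculates the new rotational position of the steering wheel 4. It reads the steering delay time τs corresponding to this rotational position from the voice RAM 18. In the same way as before, it delays the voice signal of the second microphone 6 by τs relative to that of the first microphone 5. It then combines the two signals to generate a synthesized voice signal.
[0051]
The synthesized voice signal therefore best extracts sound from the direction of the driver's lips even at the rotational positions reached during steering. In other words, the voice DSP 17 can best extract the voice from the direction of the driver's lips even when the steering wheel 4 is turned.
[0052]
This embodiment provides the following features.
(1) The voice DSP 17 receives the signal from the steering angle sensor 20 and calculates the rotational position of the steering wheel 4. It then reads the steering delay time τs corresponding to that position from the voice RAM 18. At that rotational position, it receives the voice signals of microphones 5 and 6 and delays the voice signal of the second microphone 6 by τs relative to that of the first microphone 5. Finally, it combines the two signals to generate a synthesized voice signal.
[0053]
The synthesized voice signal generated in this way applies to the one preset lip position of the driver. At whatever rotational position the steering wheel 4 is in, it best extracts sound from the direction of the driver's lips.
[0054]
Therefore, the driver's voice is always supplied stably to the voice DSP 17 even when the steering wheel 4 is turned. As a result, the voice DSP 17 can achieve a higher voice recognition rate.
[0055]
(2) This embodiment does not need the CCD camera 3 and the image DSP 12 of the first embodiment, so costs can be reduced.
(Third Embodiment)
A third embodiment in which the present invention is embodied in a vehicle voice recognition device is described below with reference to FIG. 6.
[0056]
This embodiment is the vehicle voice recognition device of the second embodiment with the steering angle sensor 20 removed.
The first and second microphones 5 and 6 output the driver's voice to the voice processing device 16.
[0057]
The voice processing device 16 consists of three components:
- a voice DSP 17, serving as the delay time calculation means, the delayed voice synthesis means and the voice recognition means;
- a voice RAM 18;
- a voice ROM 19.
The voice ROM 19 stores a control program that makes the voice DSP 17 perform high-speed digital computation. This control program is transferred to the voice RAM 18 in the voice DSP 17, and the desired voice recognition processing is performed in the voice DSP 17. The voice ROM 19 also stores a plurality of preset standard voice command patterns.
[0058]
Consider combining the voice signal of the first microphone 5, which carries the driver's voice, with the voice signal of the second microphone 6 delayed by a delay time τ. The synthesized voice signal of the two microphones is known to have directivity. Because this directivity varies with the value of τ, the synthesized voice signal varies as well. The device therefore calculates the delay time τ at which the synthesized voice signal is largest and uses it as the delay time τ for the second microphone 6. In this way, the voice from the driver's lip position is extracted best.
[0059]
When the vehicle is started with the ignition switch, the plurality of standard voice command patterns are read into the voice RAM 18 in the voice DSP 17.
The voice DSP 17 receives the voice signals of the first and second microphones 5 and 6 and varies the delay time τ step by step within a preset range. In this way it delays the voice signal of the second microphone 6 relative to that of the first microphone 5 by a series of delay times τ. For each delay time τ, it combines the voice signals of microphones 5 and 6 to generate a synthesized voice signal. The voice DSP 17 then compares these synthesized voice signals and determines the delay time τ (= τm) at which the synthesized voice signal is largest.
[0060]
The voice DSP 17 sets the delay time τ to the calculated τm. It then delays the voice signal of the second microphone 6 by τm relative to that of the first microphone 5 and combines the two signals to generate a synthesized voice signal.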
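The third embodiment's search for τm can be sketched as an exhaustive sweep over candidate delays. The sketch makes two assumptions that the source does not specify:
- "largest synthesized voice signal" means largest energy (sum of squares) over a block of samples;
- delays are whole samples.
The function names are hypothetical.

```python
def combine(mic5, mic6, d):
    """Delay mic 6 by d samples and add it to mic 5 (silence before the start)."""
    return [x5 + (mic6[n - d] if n - d >= 0 else 0.0) for n, x5 in enumerate(mic5)]


def energy(signal):
    """Sum of squared samples, used here as the size of the synthesized signal."""
    return sum(x * x for x in signal)


def find_best_delay(mic5, mic6, max_delay):
    """Sweep d = 0..max_delay samples and return the d with the largest output energy."""
    return max(range(max_delay + 1), key=lambda d: energy(combine(mic5, mic6, d)))


# Usage: microphone 6 hears a short burst 3 samples before microphone 5.
burst = [1.0, -2.0, 3.0, -1.0, 0.5]
src = [0.0] * 16
src[2:7] = burst
mic6 = src
mic5 = [0.0] * 3 + src[:-3]

d_m = find_best_delay(mic5, mic6, max_delay=6)
print(d_m)  # 3
```

The returned sample count d corresponds to τm = d / fs for a sample rate fs. With integer samples, the resolution of τm is limited to 1/fs. A finer search would need fractional-delay interpolation, which this sketch does not attempt.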
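[0061]
Based on the synthesized voice signal of microphones 5 and 6, the voice DSP 17 then starts voice recognition and computes an actual voice pattern. The actual voice pattern may match one of the preset standard voice command patterns. When it does, the voice DSP 17 outputs the control signal corresponding to the matched pattern.
[0062]
This embodiment provides the following features.
(1) In this embodiment, the voice DSP 17 receives the voice signals from the first and second microphones 5 and 6 and varies the delay time τ step by step within a preset range. At each τ, it delays the voice signal of the second microphone 6 relative to that of the first microphone 5 and combines the two signals to generate a synthesized voice signal. The voice DSP 17 then calculates the delay time τm at which the synthesized voice signal is largest. In other words, τm is set to the value that best extracts the driver's voice.
[0063]
The voice DSP 17 then delays the voice signal of the second microphone 6 by τm relative to that of the first microphone 5 and combines the two signals. As a result, sound from the direction of the driver's lips is extracted best wherever the lips are.
[0064]
The directivity of microphones 5 and 6 therefore always points toward the driver's lip position, so the driver's voice is always supplied stably. As a result, since the voice DSP 17 performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[0065]
(2) This embodiment needs neither the CCD camera 3 and image DSP 12 of the first embodiment nor the steering angle sensor 20 of the second embodiment, so costs can be reduced.
[0066]
The embodiments of the present invention may be modified as follows.
○ In the first embodiment, an image pickup tube may be used instead of the CCD camera 3.
○ In the first embodiment, a plurality of CCD cameras 3 may be used instead of one. In this case, the image DSP 12 selects the image signal of the CCD camera 3 that captures the driver's lips most accurately. It then calculates the driver's lip position from the selected image signal.
[0067]
○ In the first embodiment, the CCD camera 3 is mounted on the A-pillar 2. It may instead be mounted anywhere in the vehicle cabin 1 from which an image of the driver's face can be obtained.
○ In the first embodiment, the period T is 0.001 s. Any other value may be used, as long as it is the period of a frequency in the audio range (a few hertz to 20 kilohertz).
[0068]
○ In the second embodiment, an optical rotary encoder detects the rotational position of the steering wheel 4. A magnetic rotary encoder may be used instead.
[0069]
○ In the second embodiment, the voice ROM 19 stores delay time τ data for one preset lip position of the driver. These data consist of a plurality of preset steering delay times τs, each associated with one of a plurality of rotational positions of the steering wheel 4. Such data may instead be stored for each of a plurality of lip positions of the driver, again with each τs associated with a rotational position of the steering wheel 4.
[0070]
○ In the first to third embodiments, one or more additional microphones may be provided on the steering wheel 4 besides the first and second microphones 5 and 6.
In this configuration, the voice DSP 17 calculates a delay time τ for the voice signal of each added microphone. It delays each microphone's voice signal by its delay time τ and then combines the voice signals of all the microphones.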
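With more than two microphones, the same idea applies with one delay per microphone. The minimal sketch below assumes integer-sample delays chosen elsewhere, for example from a table as in the embodiments. The function name is hypothetical.

```python
def delay_and_sum_many(signals, delays):
    """Delay signal i by delays[i] whole samples (>= 0) and sum all signals."""
    if len(signals) != len(delays):
        raise ValueError("need exactly one delay per microphone")
    length = max(len(s) for s in signals)
    out = [0.0] * length
    for sig, d in zip(signals, delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        for n in range(length):
            k = n - d
            if 0 <= k < len(sig):
                out[n] += sig[k]
    return out


# Usage: an impulse reaching three microphones at samples 4, 2 and 1.
mics = [
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]
print(delay_and_sum_many(mics, [0, 2, 3]))  # all three copies land on sample 4
```

Each added microphone contributes one more aligned copy of the driver's voice. Sound from other directions still arrives misaligned, which is the source of the sharper directivity.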
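[0071]
○ Each embodiment uses two microphones, the first microphone 5 and the second microphone 6. Three, four or more microphones may be used instead.
○ In each embodiment, the first and second microphones 5 and 6 are mounted on the steering wheel 4. They may instead be mounted elsewhere, for example on the instrument panel.
[0072]
○ In each of the above embodiments, at least one of the image DSP 12 and the voice DSP 17 may be replaced with a central processing unit (CPU).
Each of the above modifications also provides substantially the same features as the embodiments.
[0073]
Technical ideas other than the claimed invention can also be derived from the above embodiments and modifications. They are described below together with their effects.
(1) A voice detection method for a vehicle voice recognition device having a plurality of microphones provided in the vehicle and voice recognition means for recognizing the driver's voice detected by the plurality of microphones, the method comprising:
calculating, based on the driver's lip position, a delay time (τ) for delaying the voice signal of at least one of the plurality of microphones; delaying that voice signal by the delay time (τ) relative to the voice signals of the other microphones; and combining the voice signals of the plurality of microphones to detect the driver's voice.
[0074]
In this case, the driver's voice is always supplied stably to the voice recognition means. As a result, since the voice recognition means performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[0075]
[Effects of the Invention]
As described in detail above, the invention of claim 1 always supplies the driver's voice stably to the voice recognition means, even when the steering wheel is turned. As a result, since the voice recognition means performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[Brief Description of the Drawings]
[FIG. 1] Schematic view of the vehicle cabin showing the arrangement of the CCD camera and the microphones.
[FIG. 2] Schematic view of the steering pad showing the arrangement of the microphones.
[FIG. 3] Electrical configuration of the vehicle voice recognition device in the first embodiment.
[FIG. 4] Directivity characteristics of the synthesized voice signal of the two microphones: (a) delay time T/2, (b) delay time T/3, (c) delay time 3T/5.
[FIG. 5] Electrical configuration of the vehicle voice recognition device in the second embodiment.
[FIG. 6] Electrical configuration of the vehicle voice recognition device in the third embodiment.
[Description of Reference Numerals]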
τ，τｍ…遅延時間、τｓ…ステアリング遅延時間、Ｐ１…唇位置信号、３…撮像手段、４…ステアリングホイール、５…マイクロフォンとしての第１マイクロフォン、６…マイクロフォンとしての第２マイクロフォン、１２…唇位置演算手段としての画像用ＤＳＰ、１７…音声認識手段、遅延音声合成手段、遅延時間演算手段としての音声用ＤＳＰ、２０…回動位置検出手段としてのステアリング角度センサ。[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a vehicle voice recognition apparatus, and more particularly to detection of a driver's voice.
[0002]
[Prior art]
In recent years, with the spread of car navigation, a vehicle voice recognition device has been used to set a destination. Further, in a hands-free telephone or the like, a vehicular voice recognition device is used for setting a telephone number.
[0003]
In general, in a running vehicle interior, there are noises such as conversations in the front passenger seat and rear seats, sound from car audio, or noise from outside the vehicle when a window is opened. Is currently not good.
[0004]
In view of this, improvements have been made to devices that remove these noises and detect only the voice of the operator.
In a conventional vehicle voice recognition device, the steering pad is provided with a microphone for recognizing the voice of the driver. This microphone has a fairly sharp directivity for detecting the voice of the driver only, and the direction of the directivity is directed to the pilot's lips.
[0005]
[Problems to be solved by the invention]
However, since the directivity is sharp, the direction of the directivity easily deviates from the position of the lips of the pilot depending on the attitude of the pilot, and stable voice is not input at present.
[0006]
So, in the noise in the passenger compartment Also, It is required that the voice of the operator can be stably input to the voice recognition device.
It is an object of the present invention to be installed in a noisy vehicle interior. And An object of the present invention is to provide a vehicle voice recognition device capable of stably inputting a voice of a driver.
[0007]
[Means for Solving the Problems]
In order to solve the above problem, the invention according to claim 1 is a voice recognition means for recognizing a driver's voice based on a plurality of microphones provided in the vehicle and a voice signal detected by the plurality of microphones. In the vehicle speech recognition apparatus comprising: the plurality of microphones. Is , Steering wheel Le As well as Steering delay time is defined as the maximum delay time of the synthesized speech signal composed of multiple microphones, and the steering delay time relative to the turning position of the steering wheel is stored at the preset lip position of the driver. Storage means for Based on the output signal of the rotational position detecting means for detecting the rotational position of the steering wheel, the rotational position of the steering wheel is calculated, With reference to the storage contents of the storage means At the turning position of the steering wheel at that time Opposite Steering delay time The Delay time calculation to set Stepped Delayed voice for synthesizing voice signals of the plurality of microphones by delaying at least one voice signal of the plurality of microphones from a voice signal of another microphone by a steering delay time set by the delay time calculation means Synthetic hand Stepped The main point is that
[0015]
According to the invention of claim 1, The delay time calculation means calculates the rotation position of the steering wheel based on the output signal of the rotation position detection means for detecting the rotation position of the steering wheel, and at the rotation position of the steering wheel at that time, a plurality of At least one audio signal from the microphone In the pilot's lip position set in advance A delay time for maximizing a synthesized voice signal that is delayed and synthesized from the voice signal of the microphone and the voice signal of another microphone is set as the steering delay time.
[0016]
And The delayed speech synthesizer is configured to add at least one speech signal of the plurality of microphones to the steering delay time. The ma Since the voice signals of the plurality of microphones are synthesized after being delayed from the voice signal of the microphone, the synthesized voice signal can best extract the voice from the direction of the lip position of the pilot.
[0017]
Therefore, even if the steering wheel is rotated, the voice of the driver is always stably supplied to the voice recognition means. As a result, since the speech recognition means performs speech recognition based on the synthesized speech signal, the speech recognition rate can be improved.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
DESCRIPTION OF EMBODIMENTS Hereinafter, an embodiment of a vehicle voice recognition device embodying the present invention will be described with reference to FIGS.
[0022]
As shown in FIG. 1, a pillar (A pillar) 2 in front of a driver's seat in a passenger compartment 1 is provided with a CCD camera 3 as an imaging means so as to focus on a driver's face.
[0023]
FIG. 2 shows the steering wheel 4 in a state where it is not rotated in the left-right direction as viewed from the driver seat (neutral state). First and second microphones 5 and 6 are provided apart from each other at the peripheral edge of the steering wheel 4. In this embodiment, the microphones 5 and 6 are installed on the steering wheel 4 so that the distance between them is 34 centimeters.
FIG. 3 shows an electrical configuration of the vehicle speech recognition apparatus configured as described above.
[0024]
The CCD camera 3 outputs an image signal (video signal) including the lips of the driver's face. This image signal is input to the image processing device 11. The image processing apparatus 11 includes an image digital signal processor (image DSP) 12 as a lip position calculation means, a read / write memory (image RAM) 13 for temporarily storing data, and a read only memory (image ROM). 14 and a high-pass filter 15. The image ROM 14 stores a control program for performing high-speed digital computation by the image DSP 12. This control program is transferred to the image RAM 13 in the image DSP 12 and a desired image recognition process is performed in the image DSP 12.
[0025]
The high-pass filter 15 differentiates the input image signal and outputs a differentiated image signal. The image DSP 12 receives the differential image signal from the high-pass filter 15 and calculates the contour of the driver's face. The image DSP 12 calculates the center position of a rectangle circumscribing the outline of the face as the nose position. The image DSP 12 calculates a lip position by adding a preset distance to the nose position. The image DSP 12 outputs a lip position signal P1 relative to the lip position to the sound processing device 16.
[0026]
The first and second microphones 5 and 6 output the voice of the operator to the voice processing device 16. The voice processing device 16 includes a voice DSP 17, a voice RAM 18, and a voice ROM 19 as delay time calculation means, delayed voice synthesis means, and voice recognition means. The audio ROM 19 stores a control program for performing high-speed digital computation by the audio DSP 17. This control program is transferred to the audio RAM 17 in the audio DSP 16 and desired audio recognition processing is performed in the audio DSP 16. The voice ROM 19 stores a plurality of standard voice command patterns composed of a plurality of preset voice command patterns. Furthermore, the voice ROM 19 stores data of the delay time τ for each lip position (lip position signal P1) of the operator when sitting in the driver's seat.
[0027]
The delay time τ is a time for delaying the audio signal of the second microphone 6 from the audio signal of the first microphone 5. The audio signal of the second microphone 6 is delayed by this delay time τ in the audio DSP 17. The audio signal of the first microphone 5 is synthesized in the audio DSP 17 with the audio signal of the second microphone 6 delayed by the delay time τ to become a synthesized audio signal. That is, the directivity when the first microphone 5 and the second microphone 6 are considered as one microphone is changed by delaying and synthesizing the audio signal of one of the first microphone 5 and the second microphone 6. It is known. Therefore, the directivity of the microphone that can collect the voice of the pilot with the highest sensitivity is obtained by delaying the voice signal of the second microphone 6 with a delay time τ that matches the lip position of the pilot. Can be set.
[0028]
FIG. 4 (a) shows the period when the frequency of the audio signal is 1000 Hz as T (= 0.001 second), its wavelength as λ, and the distance between the first and second microphones 5 and 6 as λ / 2 (= The directivity characteristics of the synthesized speech signal at a delay time τ (= T / 2 = 0.0005 seconds) in the case of 34 centimeters) are shown. Assuming that the delay time τ is T / 2, as shown in FIG. 4A, the synthesized speech signal in the delay time τ (= T / 2 = 0.0005 seconds) is from 0 degrees and 180 degrees. The voice of the pilot is extracted best. In other words, the delay time τ of T / 2 is such that when the operator's lip position is in front of the steering wheel 4 in the state shown in FIG. It is time to obtain the most excellent directivity for.
[0029]
Similarly, FIG. 4B shows the directivity characteristics of the synthesized speech signal at the delay time τ (= T / 3≈0.0003 seconds). Assuming that the delay time τ is T / 3, as shown in FIG. 4B, the synthesized speech signal at the delay time τ (= T / 3≈0.0003 seconds) is from 340 degrees and 200 degrees. The voice of the pilot is extracted best. That is, the delay time τ of T / 3 is the first and second microphones when the operator's lip position is tilted 20 degrees from the front to the left with respect to the steering wheel 4 in the state shown in FIG. Times 5 and 6 are times for obtaining the best directivity with respect to the voice of the pilot.
[0030]
Similarly, FIG. 4C shows the directivity characteristics of the synthesized speech signal at the delay time τ (= 3T / 5 = 0.006 seconds). Assuming that the delay time τ is 3T / 5, as shown in FIG. 4C, the synthesized speech signal in the delay time τ (= 3T / 5 = 0.006 seconds) is from 10 degrees and 170 degrees. The voice of the pilot is extracted best. That is, the delay time τ of 3T / 5 is the first and second microphones when the operator's lip position is tilted 10 degrees from the front to the right with respect to the steering wheel 4 in the state shown in FIG. Times 5 and 6 are times for obtaining the best directivity with respect to the voice of the pilot.
[0031]
Therefore, the delay time τ of each lip position with respect to each of the microphones 5 and 6 stored in the voice ROM 19 is the most excellent directivity characteristic when the synthesized voice signal is emitted by the operator at the current lip position. It is the time that was made available.
[0032]
The voice DSP 17 reads the plurality of standard voice command patterns and the delay time τ data into the voice RAM 18 in the voice DSP 17 together with the start of the vehicle by the ignition switch.
[0033]
The audio DSP 17 receives the lip position signal P1 from the image DSP 12, and reads out the delay time τ for the lip position from the audio RAM 17 based on the lip position signal P1.
[0034]
The audio DSP 17 inputs audio signals from the first and second microphones 5 and 6, and the audio DSP 17 delays the audio signal of the second microphone 6 based on the delay time τ.
[0035]
The audio DSP 17 synthesizes the audio signals from the first and second microphones 5 and 6. The voice DSP 17 starts voice recognition based on the synthesized voice signals of both microphones 5 and 6, and calculates a voice pattern (actual voice pattern). When the actual voice pattern matches any one of a plurality of preset standard voice command patterns, the voice DSP 17 outputs a control signal relative to the matched standard voice command pattern.
[0036]
Now, for example, when the lip position of the driver is at a 0 degree position, the image DSP 12 outputs a lip position signal P1 relative to the lip position (= 0 degree). The audio DSP 17 reads the delay time τ (= T / 2) for the current lip position from the audio RAM 17 based on the lip position signal P1. The audio DSP 17 synthesizes the audio signals of the first and second microphones 5 and 6 by delaying the audio signal of the second microphone 6 by the delay time τ (= T / 2) from the audio signal of the first microphone 5. To do. Therefore, as shown in FIG. 4A, the synthesized voice signal is most extracted from the direction of the lips position (= 0 degree) of the driver.
[0037]
According to the vehicle voice recognition apparatus of the above embodiment, the following features can be obtained.
(1) In the present embodiment, the audio DSP 17 calculates a delay time τ for delaying the audio signal of the second microphone 6 based on the lip position of the operator with respect to the microphones 5 and 6. The audio DSP 17 synthesizes the audio signals of the first and second microphones 5 and 6 by delaying the audio signal of the second microphone 6 by the delay time τ from the audio signal of the first microphone 5. In the synthesized voice signal synthesized in this way, the voice from the direction from the lips position of the driver is extracted most.
[0038]
Therefore, the voice of the pilot is always stably supplied to the voice DSP 17 even if the position of the lip of the pilot changes. As a result, since the speech DSP 17 performs speech recognition based on the synthesized speech signal, the speech recognition rate can be improved.
[0039]
(2) In the present embodiment, the face of the driver who sits in the driver's seat and emits sound is imaged by the CCD camera 3, and the lip position of the driver is recognized by the image SDP 12. Therefore, it is possible to accurately detect which direction the driver is always speaking.
[0040]
(Second Embodiment)
A second embodiment in which the present invention is embodied in a vehicle voice recognition device will be described below with reference to FIG.
[0041]
The configuration of this embodiment is different from that of the first embodiment in that a steering angle sensor 20 is provided in place of the CCD camera 3 and the image processing device 11 of the vehicle voice recognition device of the first embodiment.
[0042]
A steering shaft (not shown) is provided with a steering angle sensor 20 (not shown) as a turning position detecting means for detecting the turning position of the steering wheel 4. The steering angle sensor 20 is composed of an optical rotary encoder.
[0043]
FIG. 5 shows the electrical configuration of the vehicle speech recognition apparatus.
The first and second microphones 5 and 6 output the voice of the operator to the voice processing device 16.
[0044]
The voice processing device 16 includes a voice DSP 17, a voice RAM 18, and a voice ROM 19 as delay time calculation means, delayed voice synthesis means, and voice recognition means. The audio ROM 19 stores a control program for performing high-speed digital computation by the audio DSP 17. This control program is transferred to the audio RAM 18 in the audio DSP 17 and desired audio recognition processing is performed in the audio DSP 17. The voice ROM 19 stores a plurality of standard voice command patterns composed of a plurality of preset voice command patterns. Further, the voice ROM 19 stores data of a plurality of steering delay times τs set in advance in association with a plurality of rotation positions of the steering wheel 4 in one lip position of the pilot set in advance.
[0045]
The steering delay time τs is a time for delaying the audio signal of the second microphone 6. The plurality of steering delay times τs are delay times for maximizing a synthesized voice signal obtained by synthesizing the voice signals of the first and second microphones 5 and 6 at the turning position of the steering wheel 4 at that time.
[0046]
As the vehicle is started by the ignition switch, the plurality of standard voice command patterns and the data of the delay time τ are read into the voice RAM 18 in the voice DSP 17.
[0047]
The steering angle sensor 20 outputs a signal relative to the rotational position of the steering wheel 4.
The audio DSP 17 inputs a signal from the steering angle sensor 20, calculates the rotational position of the steering wheel 4, and reads the steering delay time τs relative to the rotational position from the audio RAM 18. The audio DSP 17 inputs the audio signals of both microphones 5 and 6 at the rotation position, and the audio signal of the second microphone 6 is input to the audio signal of the first microphone 5 in the audio DSP 17 using the delay time τs. The synthesized voice signal is generated by synthesizing the voice signals of both microphones 5 and 6 with a further delay.
[0048]
The voice DSP 17 starts voice recognition based on the synthesized voice signals of the microphones 5 and 6, calculates a real voice pattern, and the real voice pattern is one of a plurality of standard voice command patterns set in advance. When it matches, the control signal corresponding to the matched standard voice command pattern is output.
[0049]
Now, the voice DSP 17 inputs a signal from the steering angle sensor 20 to calculate the rotational position of the steering wheel 4 at one predetermined lip position of the operator. The audio DSP 17 reads the steering delay time τs relative to the rotation position from the audio RAM 18. Then, the audio DSP 17 receives the audio signals of the two microphones 5 and 6 at the rotation position, and delays the audio signal of the second microphone 6 from the audio signal of the first microphone 5 by the delay time τs. A synthesized speech signal is generated by synthesizing 5 and 6 speech signals.
[0050]
Therefore, the synthesized voice signal best extracts the voice from the direction of the lips position of the driver at the turning position.
Next, when the steering wheel 4 is rotated, the audio DSP 17 inputs a signal from the steering angle sensor 20 and calculates the rotation position of the steering wheel 4 at that time. The sound DSP 17 reads the steering delay time τs relative to the rotation position from the sound RAM 18. Then, the audio DSP 17 receives the audio signals of the two microphones 5 and 6 at the rotation position, and delays the audio signal of the second microphone 6 from the audio signal of the first microphone 5 by the delay time τs. A synthesized speech signal is generated by synthesizing 5 and 6 speech signals.
[0051]
Therefore, the synthesized voice signal best extracts the voice from the direction of the operator's lip position even at the turning position during the steering operation. That is, the voice DSP 17 can best extract the voice from the direction of the operator's lips even when the steering wheel 4 rotates.
[0052]
According to this embodiment, the following features can be obtained.
(1) The voice DSP 17 inputs a signal from the steering angle sensor 20 and calculates the rotational position of the steering wheel 4. The audio DSP 17 reads the steering delay time τs relative to the rotation position from the audio RAM 18. Then, the audio DSP 17 receives the audio signals of the two microphones 5 and 6 at the rotation position, and delays the audio signal of the second microphone 6 from the audio signal of the first microphone 5 by the delay time τs. A synthesized speech signal is generated by synthesizing 5 and 6 speech signals.
[0053]
For the one preset lip position of the driver, the synthesized voice signal generated in this way best extracts the voice from the direction of the driver's lips at whatever rotational position the steering wheel 4 is in.
[0054]
Therefore, even when the steering wheel 4 rotates, the driver's voice is always supplied stably to the voice DSP 17. As a result, the voice DSP 17 can improve the voice recognition rate.
[0055]
(2) Since this embodiment does not require the CCD camera 3 or the image DSP 12 of the first embodiment, cost can be reduced.
(Third embodiment)
A third embodiment in which the present invention is embodied in a vehicle voice recognition device is described below with reference to FIG. 6.
[0056]
The configuration of the present embodiment is obtained by removing the steering angle sensor 20 from the vehicle voice recognition device of the second embodiment.
The first and second microphones 5 and 6 output the voice of the operator to the voice processing device 16.
[0057]
The voice processing device 16 includes a voice DSP 17, which serves as the delay time calculation means, the delayed voice synthesis means, and the voice recognition means, as well as a voice RAM 18 and a voice ROM 19. The voice ROM 19 stores a control program with which the voice DSP 17 performs high-speed digital computation; this program is transferred to the voice RAM 18 in the voice DSP 17, and the voice DSP 17 executes the desired voice recognition processing. The voice ROM 19 also stores a plurality of preset standard voice command patterns.
[0058]
It is known that when the voice signal of the first microphone 5, which picks up the driver's voice, is synthesized with the voice signal of the second microphone 6 delayed by a delay time τ, the synthesized voice signal has directivity whose characteristics change with the value of τ, so the synthesized voice signal changes as well. Therefore, by calculating the delay time τ that maximizes the synthesized voice signal and applying it as the delay of the second microphone 6, the voice from the driver's lip position can be extracted best.
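The directivity described above is the behaviour of a two-microphone delay-and-sum beamformer. As a sketch under stated assumptions (a single-frequency plane wave, two omnidirectional microphones a hypothetical distance `spacing_m` apart, speed of sound 343 m/s), the normalized amplitude of mic 5 plus mic 6 delayed by τ is |cos(πf(τ - d·sinθ/c))|. It reaches 1 when τ equals the extra travel time to mic 5, i.e. τ = d·sinθ/c, so each τ "aims" the pair at a different arrival angle θ. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def delay_and_sum_gain(theta_deg, tau, freq_hz, spacing_m):
    """Normalized amplitude (0..1) of mic5 + mic6 delayed by tau, for a plane
    wave of frequency freq_hz from theta_deg (0 = broadside; positive angles
    lie toward mic 6, which then hears the wave earlier)."""
    lead = spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return abs(math.cos(math.pi * freq_hz * (tau - lead)))

def look_direction_deg(tau, spacing_m):
    """Arrival angle at which delay tau fully aligns the two channels."""
    return math.degrees(math.asin(SPEED_OF_SOUND * tau / spacing_m))

spacing = 0.10  # m, hypothetical microphone spacing
for tau in (0.0, 0.5 * spacing / SPEED_OF_SOUND):
    beam = look_direction_deg(tau, spacing)
    gains = [round(delay_and_sum_gain(t, tau, 1000.0, spacing), 2)
             for t in (-60, -30, 0, 30, 60)]
    print(f"tau={tau * 1e6:.0f} us -> beam at {beam:.0f} deg, gains {gains}")
```

At low frequencies the beam is broad for a small spacing, which is one reason a sweep that maximizes the whole synthesized signal, rather than a single-tone calculation, is attractive in practice.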
[0059]
When the vehicle is started with the ignition switch, the standard voice command patterns are read into the voice RAM 18 in the voice DSP 17.
The voice DSP 17 receives the voice signals of the first and second microphones 5 and 6 and steps the delay time τ through a preset range, delaying the voice signal of the second microphone 6 relative to that of the first microphone 5 by each value of τ. At each delay time τ, the voice DSP 17 synthesizes the voice signals of both microphones 5 and 6 to generate a synthesized voice signal. It then compares these synthesized voice signals and determines the delay time τ (= τm) at which the synthesized voice signal is maximized.
[0060]
The voice DSP 17 sets the delay time to the calculated value τm, delays the voice signal of the second microphone 6 relative to that of the first microphone 5 by τm, and synthesizes the voice signals of both microphones 5 and 6 to generate a synthesized voice signal.
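The sweep for τm can be sketched in Python as follows. Assumptions: signals are sampled at a hypothetical 16 kHz, τ is swept in whole samples from 0 up to a preset maximum, and "maximum synthesized signal" is taken to mean maximum energy (sum of squares) over the frame; the patent does not state which measure the voice DSP 17 uses. All names are hypothetical, and the test signals are synthetic.

```python
import random

def synthesize(mic5, mic6, lag):
    """mic5[n] + mic6[n - lag]: mic 6 delayed by `lag` samples (zeros before it starts)."""
    return [a + (mic6[n - lag] if n >= lag else 0.0) for n, a in enumerate(mic5)]

def energy(signal):
    return sum(v * v for v in signal)

def find_best_lag(mic5, mic6, max_lag):
    """Sweep lag = 0..max_lag and return the lag whose sum has the most energy (tau_m)."""
    return max(range(max_lag + 1), key=lambda lag: energy(synthesize(mic5, mic6, lag)))

# Synthetic check: the source reaches mic 6 first and mic 5 seven samples later,
# each channel with a little independent noise.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_lag = 7
mic6 = [s + 0.1 * rng.gauss(0.0, 1.0) for s in source]
mic5 = [0.0] * true_lag + [s + 0.1 * rng.gauss(0.0, 1.0) for s in source[:-true_lag]]

SAMPLE_RATE = 16_000  # Hz, hypothetical
tau_m = find_best_lag(mic5, mic6, max_lag=20)
print(tau_m, "samples =", tau_m / SAMPLE_RATE * 1e6, "us")
```

Because the sweep needs no steering angle or camera input, it matches the third embodiment's sensor-free design; its cost is one synthesis per candidate delay.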
[0061]
The voice DSP 17 performs voice recognition on the synthesized voice signal of the microphones 5 and 6 and computes an actual voice pattern. When this pattern matches one of the preset standard voice command patterns, the voice DSP 17 outputs the control signal corresponding to the matched standard voice command pattern.
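The patent does not say how the actual voice pattern is compared with the standard voice command patterns. Purely as an illustration of the match-then-output flow, the sketch below assumes each pattern is a fixed-length feature vector, picks the nearest standard pattern by Euclidean distance, and accepts it only within a threshold. The feature values, command names, control-signal names, and threshold are all hypothetical.

```python
import math

# Hypothetical standard patterns: command -> (feature vector, control signal)
STANDARD_PATTERNS = {
    "navigation": ([0.9, 0.1, 0.4], "CTRL_NAV"),
    "telephone":  ([0.2, 0.8, 0.5], "CTRL_TEL"),
    "cancel":     ([0.5, 0.5, 0.9], "CTRL_CANCEL"),
}

def match_command(features, threshold=0.3):
    """Return the control signal of the nearest standard pattern, or None
    if even the nearest one is farther than `threshold`."""
    best_name, best_dist = None, math.inf
    for name, (pattern, _signal) in STANDARD_PATTERNS.items():
        d = math.dist(features, pattern)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None  # no match: no control signal is output
    return STANDARD_PATTERNS[best_name][1]

print(match_command([0.88, 0.12, 0.41]))
print(match_command([0.0, 0.0, 0.0]))
```

Practical recognizers usually compare time-varying feature sequences (for example with dynamic time warping or statistical models) rather than single vectors; the nearest-template rule here only shows how a match leads to a control signal.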
[0062]
This embodiment provides the following advantages.
(1) In this embodiment, the voice DSP 17 receives the voice signals of the first and second microphones 5 and 6, steps the delay time τ through a preset range, delays the voice signal of the second microphone 6 relative to that of the first microphone 5 by each value of τ, and synthesizes the voice signals of the first and second microphones 5 and 6 to generate synthesized voice signals. The voice DSP 17 then calculates the delay time τm that maximizes the synthesized voice signal; that is, τm is the value that best extracts the driver's voice.
[0063]
The voice DSP 17 delays the voice signal of the second microphone 6 by τm relative to that of the first microphone 5 and synthesizes the voice signals of the two microphones 5 and 6. Therefore, whatever the driver's lip position, the voice coming from the direction of the lips is extracted best.
[0064]
Accordingly, the directivity of the microphones 5 and 6 is always aimed toward the driver's lip position, so the driver's voice is always supplied stably. Since the voice DSP 17 performs voice recognition on this synthesized voice signal, the voice recognition rate can be improved.
[0065]
(2) Since this embodiment requires neither the CCD camera 3 and image DSP 12 of the first embodiment nor the steering angle sensor 20 of the second embodiment, cost can be reduced.
[0066]
The embodiments of the present invention may be modified as follows.
In the first embodiment, an imaging tube may be used instead of the CCD camera 3. Also in the first embodiment, a plurality of CCD cameras 3 may be used instead of one. In this case, the image DSP 12 selects, from among the CCD cameras 3, the image signal of the camera that most accurately captures the driver's lips, and calculates the driver's lip position based on the selected image signal.
[0067]
In the first embodiment, the CCD camera 3 is provided in the A pillar 2, but the CCD camera 3 may be provided anywhere in the passenger compartment 1 where an image of the driver's face can be obtained.
In the first embodiment, the period T is set to 0.001 seconds, but any value may be used as long as it is the period of a frequency within the audio range (from a few hertz to 20 kilohertz).
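The arithmetic behind the period T and the delay times T/2, T/3, and 3T/5 listed for FIG. 4 is easy to check. With T = 0.001 s the corresponding frequency is 1 kHz, and each delay corresponds to a path-length difference c·τ between the microphones, using an assumed speed of sound c ≈ 343 m/s. The helper name and its `f_low` default of 5 Hz (a stand-in for "a few hertz") are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)
T = 0.001               # s, the period used in the first embodiment

print(f"f = 1/T = {1 / T:.0f} Hz")
for label, tau in (("T/2", T / 2), ("T/3", T / 3), ("3T/5", 3 * T / 5)):
    # Path-length difference between the two microphones that this delay compensates.
    print(f"{label}: tau = {tau * 1e3:.3f} ms, c*tau = {SPEED_OF_SOUND * tau * 100:.1f} cm")

def period_in_audio_band(period_s, f_low=5.0, f_high=20_000.0):
    """True if 1/period lies in the audio range; f_low stands in for 'a few hertz' (assumed)."""
    return f_low <= 1.0 / period_s <= f_high
```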
[0068]
In the second embodiment, an optical rotary encoder is used to detect the rotational position of the steering wheel 4, but this may be changed to a magnetic rotary encoder.
[0069]
In the second embodiment, the voice ROM 19 stores data of the delay time τ, consisting of a plurality of preset steering delay times τs, in association with a plurality of rotational positions of the steering wheel 4 for one preset lip position of the driver. Alternatively, such data may be stored in association with the rotational positions of the steering wheel 4 for each of a plurality of lip positions of the driver.
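The multi-lip-position variant turns the one-dimensional table into a two-level lookup: first choose the nearest preset lip position, then the nearest tabulated wheel angle. The sketch below assumes the current lip position is available somehow (the patent does not say how in this variant). Every preset lip coordinate and every τs value is an illustrative placeholder, not data from the patent, and all names are hypothetical.

```python
import math

# Placeholder preset lip positions (x, y, z in metres).
PRESET_LIPS = {
    "tall":  (0.0, 0.45, 0.45),
    "short": (0.0, 0.25, 0.40),
}
# Placeholder steering delay times tau_s (seconds) per wheel angle (degrees).
DELAY_TABLE = {
    "tall":  {-90: -2.1e-4, -45: -1.3e-4, 0: 0.0, 45: 1.3e-4, 90: 2.1e-4},
    "short": {-90: -2.4e-4, -45: -1.6e-4, 0: 0.0, 45: 1.6e-4, 90: 2.4e-4},
}

def nearest_lip(lip_xyz):
    """Name of the preset lip position closest to the given point."""
    return min(PRESET_LIPS, key=lambda k: math.dist(PRESET_LIPS[k], lip_xyz))

def lookup_delay(lip_xyz, wheel_angle_deg):
    """tau_s for the nearest preset lip position and nearest tabulated angle."""
    row = DELAY_TABLE[nearest_lip(lip_xyz)]
    angle = min(row, key=lambda a: abs(a - wheel_angle_deg))
    return row[angle]

print(lookup_delay((0.0, 0.44, 0.46), 40))
```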
[0070]
In the first to third embodiments, one or more microphones may be further provided on the steering pad 4 in addition to the first and second microphones 5 and 6.
In this configuration, the voice DSP 17 calculates a delay time τ for each added microphone, delays the voice signal of each microphone by its respective delay time τ, and then synthesizes the voice signals of all the microphones.
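Extending the synthesis to more microphones amounts to a delay-and-sum over N channels. The sketch below assumes the arrival time of the driver's voice at each microphone is already known in whole samples and delays every channel to line up with the latest arrival; the patent does not detail how each τ is obtained for added microphones, and both helper names are hypothetical.

```python
def lags_from_arrivals(arrival_samples):
    """Delay every channel so that it lines up with the latest-arriving one."""
    latest = max(arrival_samples)
    return [latest - a for a in arrival_samples]

def delay_and_sum(channels, lags):
    """Sum the channels after delaying channel i by lags[i] samples (lags >= 0, zero-padded)."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, lag in zip(channels, lags):
        for i in range(lag, n):
            out[i] += ch[i - lag]
    return out

# Three hypothetical microphones hear the same click at samples 5, 3 and 1.
arrivals = [5, 3, 1]
channels = [[1.0 if i == a else 0.0 for i in range(12)] for a in arrivals]
lags = lags_from_arrivals(arrivals)
summed = delay_and_sum(channels, lags)
print(lags, summed.index(max(summed)), max(summed))
```

Delaying toward the latest arrival keeps every lag non-negative, which matches the patent's framing of delaying some microphones relative to others rather than advancing any signal.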
[0071]
In each of the above embodiments, two microphones, the first microphone 5 and the second microphone 6, are used; however, three, four, or more may be used.
In each of the above embodiments, the first and second microphones 5 and 6 are installed on the steering wheel 4; however, they may be installed elsewhere, such as on the instrument panel.
[0072]
In each of the above embodiments, at least one of the image DSP 12 and the audio DSP 17 may be replaced with a central processing unit (CPU).
Each of these modifications provides substantially the same advantages as the embodiments described above.
[0073]
Technical ideas other than the claimed invention that can be derived from the embodiments and modifications above are described below together with their effects.
(1) A voice detection method for a vehicle voice recognition device comprising a plurality of microphones provided in a vehicle and voice recognition means for recognizing the driver's voice detected by the plurality of microphones, wherein a delay time (τ) for delaying at least one of the voice signals of the plurality of microphones is calculated based on the driver's lip position, and the driver's voice is detected by delaying at least one of the voice signals of the plurality of microphones by the delay time (τ) relative to the voice signals of the other microphones and synthesizing the voice signals of the plurality of microphones.
[0074]
In this case, the voice of the driver is always stably supplied to the voice recognition means. As a result, since the speech recognition means performs speech recognition based on the synthesized speech signal, the speech recognition rate can be improved.
[0075]
【Effect of the Invention】
As described in detail above, according to the invention of claim 1, the driver's voice is always supplied stably even when the steering wheel is rotated. As a result, since the voice recognition means performs voice recognition based on the synthesized voice signal, the voice recognition rate can be improved.
[Brief description of the drawings]
FIG. 1 is a schematic view of a passenger compartment for explaining the arrangement of a CCD camera and a microphone.
FIG. 2 is a schematic diagram of a steering pad for explaining the arrangement of microphones.
FIG. 3 is an electrical configuration diagram of the vehicle voice recognition device according to the first embodiment.
FIG. 4 shows directivity characteristic diagrams of the synthesized voice signal of both microphones: (a) for a delay time of T/2, (b) for a delay time of T/3, and (c) for a delay time of 3T/5.
FIG. 5 is an electrical configuration diagram of a vehicle voice recognition device according to a second embodiment.
FIG. 6 is an electrical configuration diagram of a vehicle voice recognition device according to a third embodiment.
[Explanation of symbols]
τ, τm ... delay time, τs ... steering delay time, P1 ... lip position signal, 3 ... imaging means, 4 ... steering wheel, 5 ... first microphone as a microphone, 6 ... second microphone as a microphone, 12 ... image DSP as lip position calculation means, 17 ... voice DSP as voice recognition means, delayed voice synthesis means, and delay time calculation means, 20 ... steering angle sensor as rotation position detection means.

Claims

In a vehicle voice recognition device comprising a plurality of microphones (5, 6) provided in a vehicle and voice recognition means (17) for recognizing the driver's voice based on the voice signals detected by the plurality of microphones (5, 6):
The plurality of microphones (5, 6) are provided on the steering wheel (4),
The delay time that maximizes the synthesized voice signal obtained by synthesizing the voice signals of the plurality of microphones (5, 6) is defined as the steering delay time (τs), and the device comprises storage means (19) for storing, for a preset lip position of the driver, the steering delay time (τs) corresponding to each rotational position of the steering wheel (4);
delay time calculation means (17) for calculating the rotational position of the steering wheel (4) based on the output signal of rotation position detection means (20) that detects the rotational position of the steering wheel (4), and for setting, by referring to the contents stored in the storage means (19), the steering delay time (τs) corresponding to the rotational position of the steering wheel (4) at that time; and
delayed voice synthesis means (17) for delaying at least one of the voice signals of the plurality of microphones (5, 6) relative to the voice signals of the other microphones by the steering delay time (τs) set by the delay time calculation means (17), and synthesizing the voice signals of the plurality of microphones (5, 6).