JP4410378B2

JP4410378B2 - Speech recognition method and apparatus

Info

Publication number: JP4410378B2
Application number: JP2000112942A
Authority: JP
Inventors: 和行野木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-04-14
Filing date: 2000-04-14
Publication date: 2010-02-03
Anticipated expiration: 2020-04-14
Also published as: JP2001296891A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声で制御される制御対象機器に与える入力指示の語彙を認識する、車載用などの音声認識装置に関するものである。
【０００２】
【従来の技術】
図８は、従来の車載用音声認識装置のシステム構成を示すブロック図である。以下、図に従って説明する。車両等の運転中にナビゲーション機器やオーディオ機器等を操作するに際して、スイッチ操作による運転者の負担を軽減するために、運転者など話者（発話者）の発声した音声を認識して、接続された機器に入力指示する音声認識装置がある。
【０００３】
１は話者の発声した音声を電気信号に変換する音声入力部で、無指向性の複数のマイクロホン１ａ〜１ｃからなっている。３はマイクロホン１ａ〜１ｃで検出された信号を調整して音声入力部１の指向性を話者の方向に調整した音声信号を出力するための指向性調整部である。５はナビゲーション機器やオーディオ機器等の入力部に接続された音声認識部である。８は話者の方向を検出する方向検出手段で、ルームミラーの角度や座席スライド位置、リクライニング角度などから音声の方向を検出する。４１は方向検出手段８の検出結果に基いて指向性調整部３を制御する指向性制御部である。
【０００４】
次に、動作について説明する。
図９は、従来の車載用音声認識装置の動作内容を示すフローチャートである。最初にステップＳ０において、音声認識開始の操作が行われる。次にステップＳ１において、方向検出手段８で話者の方向を検出し、話者の位置情報を取得する。次にステップＳ２において、ステップＳ１にて取得した話者の位置情報を基に指向性制御部４１が指向性を話者の方向に変更するように指向性調整部３を制御する。次にステップＳ３において、マイクロホン１ａ〜１ｃから話者の音声を入力する。続いてステップＳ４において、指向性調整部３が指向性を変更した音声に対して音声認識部５が認識処理を実行する。そしてステップＳ５において、音声認識部５から認識結果を出力する。
【０００５】
【発明が解決しようとする課題】
従来の音声認識装置では、指向性を調整する場合、話者の位置を特定するために、話者の方向を検出する手段が必要となる。この方向検出手段として従来は、車載用の場合、ルームミラーの角度や座席スライド位置、リクライニング角度などから検出した。そのため、話者が運転者であることに限定されてしまい、運転者以外が話者となる場合は上記方向検出手段では検出できない。運転者以外をも話者としてその音声を検出する場合、方向検出手段の構成が複雑となるだけでなく、誰が発話するのかを検出しなければならないのでその手段として話者判別用のスイッチなどが必要となる。また、これらの構成を実現した場合においても、指定した指向性が必ずしも音声認識処理において高い性能が得られる特性であるとは限らない。
【０００６】
本発明は、話者の音声以外のノイズを低減して音声認識性能を向上させると共に、複数の方向に存在する話者の発声にも、話者判別用のスイッチなしで対応できる利便性の高い音声認識方法、装置を提供することを目的とする。
【０００７】
【課題を解決するための手段】
請求項１に係る音声認識装置は、音声入力部と、音声入力部からの信号を保存する音声バッファ部と、音声バッファ部に保存された信号から指向性を変更した信号を生成する指向性調整部と、指向性調整部により指向性を変更した信号に対して音声認識処理を行う音声認識部と、音声バッファ部に保存された信号を再生する音声バッファ制御部と、指向性調整部にて変更する複数の指向性情報を記憶した指向性情報記憶部と、指向性情報記憶部に記憶された指向性情報から１つを選択して指向性調整部を制御する指向性制御部と、音声認識部における認識対象を記憶した音声認識辞書部と、
指向性制御部の制御により指向性調整部にて複数回指向性を変更して各指向性の信号を使用し認識処理を複数回実行させる音声認識制御部と、変更した複数の指向性における認識結果と音声認識辞書部に保存されたデータとの一致度を記憶する判定結果記憶部と、判定結果記憶部に記憶された認識結果の中から一致度が最も高い指向性を判定する一致度判定部とを備え、一致度が最も高い指向性を判定し、その後に続く一連の音声対話においては判定した指向性を継続利用し音声操作者からの音声を最も認識性能の高い指向性で入力するものである。
【０００８】
請求項２に係る音声認識装置は、音声入力部と、音声入力部からの信号を保存する音声バッファ部と、音声バッファ部に保存された信号から指向性および利得を変更した信号を生成する指向性利得調整部と、指向性利得調整部により指向性および利得を変更した信号に対して音声認識処理を行う音声認識部と、音声バッファ部に保存された信号を再生する音声バッファ制御部と、指向性利得調整部にて変更する複数の指向性情報および利得情報を記憶した指向性利得情報記憶部と、指向性利得情報記憶部に記憶された指向性情報および利得情報から各１つを選択して指向性利得調整部を制御する指向性利得制御部と、音声認識部における認識対象を記憶した音声認識辞書部と、
指向性制御部の制御により指向性調整部にて複数回指向性および利得を変更して各指向性の信号を使用し認識処理を複数回実行させる音声認識制御部と、変更した複数の指向性および利得における認識結果と音声認識辞書部に保存されたデータとの一致度を記憶する判定結果記憶部と、判定結果記憶部に記憶された認識結果の中から一致度が最も高い指向性および利得を判定する一致度判定部とを備え、一致度が最も高い指向性および利得を判定し、その後に続く一連の音声対話においては判定した指向性および利得を継続利用し音声操作者からの音声を最も認識性能の高い指向性で入力するものである。
【０００９】
請求項３に係る音声認識方法は、入力された音声信号に対して認識処理を行い、あるキーワードが認識されたかどうかを判定するステップ、キーワードが認識されたとき、そのキーワードの音声信号に対して指向性を変更した各指向性の音声信号を使用し複数回の認識処理を行い、変更した複数の指向性における認識結果と音声認識辞書部に保存されたデータとの一致度を得るステップ、これらの認識結果の中から一致度が最も高い指向性を判定するステップ、およびこの判定した最も一致度の高い指向性に指向性を設定してその後のオーディオ機器等の操作コマンド音声認識を行うステップを有するものである。
【００１０】
請求項４に係る音声認識方法は、入力された音声信号に対して認識処理を行い、あるキーワードが認識されたかどうかを判定するステップ、キーワードが認識されたとき、そのキーワードの音声信号に対して指向性および利得を変更した各指向性および各利得の音声信号を使用し複数回の認識処理を行い、変更した複数の指向性および利得における認識結果と音声認識辞書部に保存されたデータとの一致度を得るステップ、これらの認識結果の中から一致度が最も高い指向性および利得を判定するステップ、およびこの判定した最も一致度の高い指向性および利得に指向性および利得を設定してその後のオーディオ機器等の操作コマンド音声認識を行うステップを有するものである。
【００１１】
【発明の実施の形態】
実施の形態１．
以下、この発明の実施の形態を、車載用の音声認識装置について説明する。
図１は、この発明の実施の形態１における音声認識装置のシステム構成を示すブロック図である。図において、１は話者が発した音声を電気信号（以下、音声信号と呼ぶ）に変換する音声入力部で、無指向性の複数（ここでは３つ）のマイクロホン１ａ〜１ｃからなる。
図２は、車両へのマイクロホンの取付位置の例を示す平面図である。車両１１内で、３つのマイクロホン１ａ〜１ｃをダッシュボード１２上中央部へ等間隔に三角形をなすように設置する。例えば、マイクロホン１ａ〜１ｃの各出力信号のゲインバランスを調整することにより、指向性を真正面や運転席１３方向、あるいは助手席１４方向に変更することが可能となる。
【００１２】
図１へ戻り、２は音声入力部１からの音声信号を保存する音声バッファ部で、それぞれマイクロホン１ａ〜１ｃに対応して設けられた複数の音声バッファ２ａ〜２ｃからなる。３は音声バッファ部２に保存された音声信号を調整して、指向性を変更した音声信号を出力する指向性調整部、５は指向性調整部３で指向性を変更された音声信号に対して音声認識処理を実行する音声認識部であり、認識結果と次に述べる音声認識辞書部６に保存されたデータとの一致度を出力する。６は認識対象を記憶する音声認識辞書部であり、音声認識部５の音声認識処理における基準となるデータが保存されている。
【００１３】
４は音声バッファ部２と指向性調整部３と音声認識部５を制御する制御部、４３は音声バッファ部２での音声信号の保存と再生を制御する音声バッファ制御部、４１は指向性調整部３での指向性変更を制御する指向性制御部、４２は指向性制御部４１による指向性制御のための複数の指向性情報を記憶する指向性情報記憶部であり、例えば正面を０°とし、５°間隔で±９０°までを記憶しておき、指向性制御部４１がその中から１つずつ選択して制御を行う。４４は音声認識部５の認識処理の開始や中止、および認識結果と一致度の取得を行う音声認識制御部、４５は指向性制御部４１からの指向性情報および音声認識制御部４４からの認識結果と一致度から、どの指向性が最適であるかを判定する一致度判定部、４６は音声認識部での認識結果と一致度、および一致度判定部４５での判定結果を記憶する判定結果記憶部であり、上記４１〜４６で制御部４を構成している。
【００１４】
次に、動作について説明する。
図３は、図１に示した音声認識装置の動作内容を示すフローチャートである。最初にステップＡ０において、制御部４の各部の初期化および処理の開始操作が実行される。
次にステップＡ１において、指向性制御部４１からの制御により指向性調整部３の指向性設定を無指向性に設定する。
次にステップＡ２において、マイクロホン１ａ〜１ｃに入力されて音声信号に変換されたそれぞれの信号を音声バッファ制御部４３からの制御により音声バッファ２ａ〜２ｃに格納し、この格納された音声信号を音声バッファ制御部４３からの制御により再生し、この音声信号を、無指向性に設定された指向性調整部３に入力し、指向性調整部３の出力を音声認識部５に入力する。音声認識部５では入力された音声信号に対して、音声認識制御部４４からの制御により、音声操作の開始コマンドとなるキーワード、例えば「認識スタート」を認識する処理を実行する。
【００１５】
次にステップＡ３において、音声認識部５の認識結果に基づき、音声認識制御部４４は、キーワード「認識スタート」が認識されたのかを判定し、認識されなかった場合はステップＡ２に戻り、再度音声入力処理およびキーワード認識処理を実行する。認識された場合はステップＡ４に進む。
次に、ステップＡ４へ進んだときは、音声バッファ制御部４３からの制御により、音声バッファ２ａ〜２ｃへの音声入力を停止し、キーワード「認識スタート」が認識された時のその音声信号を格納する。
【００１６】
次にステップＡ５において、指向性制御部４１からの制御により、指向性調整部３の指向性設定を指向性情報記憶部４２に記憶された、例えば正面０°方向に設定する。
次にステップＡ６において、ステップＡ４で音声バッファ２ａ〜２ｃに格納されたキーワード「認識スタート」の音声信号を、音声バッファ制御部４３からの制御により再生し、指向性調整部３にて指向性制御部４１が設定した指向性をもつ音声信号を生成し、音声認識部５にてキーワード認識処理を再度実行し、音声認識制御部４４が認識処理の結果と一致度を音声認識部５から取得して、一致度判定部４５に送信する。一致度判定部４５は現在設定されている指向性情報と認識結果と一致度を判定結果記憶部４６へ送信して記憶させる。
【００１７】
次にステップＡ７において、指向性情報記憶部４２に記憶された全ての指向性についての再認識処理および認識結果と一致度の取得が終了していない場合はステップＡ５に戻り、指向性情報記憶部４２に記憶された全ての指向性について終了するまで繰り返す。全ての指向性について再認識処理および認識結果と一致度の取得が終了した場合はステップＡ８に進む。
【００１８】
次に、ステップＡ８へ進んだときは、判定結果記憶部４６に記憶された全ての指向性についての認識結果と一致度から、一致度判定部４５は、認識結果が正解、すなわちキーワード「認識スタート」であり、かつ最も一致度の高い指向性はどれであるかを判定し、指向性制御部４１は、一致度判定部４５が判定した指向性となるように指向性調整部３を制御する。
【００１９】
次にステップＡ９において、ステップＡ４で停止した音声バッファ２ａ〜２ｃへの音声入力を再開する。すなわちマイクロホン１ａ〜１ｃに入力されて音声信号に変換されたそれぞれの信号を音声バッファ制御部４３からの制御により音声バッファ２ａ〜２ｃに格納し、この格納された音声信号を音声バッファ制御部４３からの制御により再生し、ステップＡ８で認識結果が正解であってかつ最も一致度の高い指向性に設定された指向性調整部３に音声信号を入力し、指向性調整部３の出力を音声認識部５に入力する。音声認識部５では入力された音声信号に対して、音声認識制御部４４からの制御により音声認識辞書部６に格納された認識語彙を認識する処理を実行する。
次にステップＡ１０において、音声認識部５は認識処理の結果を出力し、図示外のオーディオ機器等の操作を行う。
なお、ステップＡ８で、図示外のディスプレイまたはランプにより、キーワード「認識スタート」の認識完了と指向性の設定方向を表示するようにしておけば、話者がその表示を確認して、ステップＡ９で、続くコマンドを入力することができる。
【００２０】
以上のように実施の形態１の音声認識方法、装置においては、キーワード「認識スタート」を認識した時点の音声バッファの音声信号を用いて指向性を変更し、音声認識における一致度から話者の方向を判定して、話者の音声を抽出するため、話者の方向が定まっていない場合においても話者の音声を有効に抽出し、認識する事が可能である。また、話者の方向検出手段が不要であり、話者判別用のスイッチあるいは方向検出用のセンサなどのコスト削減が可能となる。
【００２１】
実施の形態２．
図４は、この発明の実施の形態２における音声認識装置のシステム構成を示すブロック図である。
本実施の形態では、実施の形態１で行った指向性を変化させ一致度の最大のものを選ぶ方法に加えて、利得変化すなわち信号レベルを変化させて一致度を見る方法を用いている。
図４では図１の指向性調整部３、指向性制御部４１および指向性記憶部４２に代えて、それぞれ指向性利得調整部３１、指向性利得制御部４７および指向性利得記憶部４８を設けている。
【００２２】
図４において、３１は音声バッファ部２に保有された音声信号を調整して指向性および利得を変更した音声信号を出力する指向性利得調整部、４７は指向性利得調整部３１での指向性と利得の変更を制御する指向性利得制御部、４８は指向性利得制御部４７の指向性と利得の制御において複数の指向性情報と複数の利得情報を記憶する指向性利得情報記憶部であり、例えば正面を０°とし、５°間隔で±９０までを記憶するとともに、初期利得を０ｄＢとし、３ｄＢ間隔で±１５ｄＢまでを記憶している。
【００２３】
音声認識部５は、指向性利得調整部３１で指向性と利得を変更された音声信号に対して音声認識処理を実行する。一致度判定部４５は、指向性利得制御部４７からの指向性情報と利得情報および音声認識制御部４４からの認識結果と一致度から、どの指向性および利得が最適であるかを判定する。
制御部４は、４３〜４８で構成されている。その他の部分は図１と同様であるので説明を省略する。
【００２４】
次に、動作について説明する。
図５は、図４に示した音声認識装置の動作内容を示すフローチャートである。最初にステップＢ０において、制御部４の各部の初期化および処理の開始操作が実行される。
次にステップＢ１において、指向性利得制御部４７からの制御により指向性利得調整部３１の指向性および利得設定を無指向性および初期利得に設定する。
次にステップＢ２において、マイクロホン１ａ〜１ｃに入力されて音声信号に変換されたそれぞれの信号を音声バッファ制御部４３からの制御により音声バッファ２ａ〜２ｃに格納し、この格納された音声信号を音声バッファ制御部４３からの制御により再生し、この音声信号を、無指向性および初期利得に設定された指向性利得調整部３１に入力し、指向性利得調整部３１の出力を音声認識部５に入力する。音声認識部５では入力された音声信号に対して、音声認識制御部４４からの制御により、音声操作の開始コマンドとなるキーワード、例えば「認識スタート」を認識する処理を実行する。
【００２５】
次にステップＢ３において、音声認識部５の認識結果に基づき、音声認識制御部４４は、キーワード「認識スタート」が認識されたのかを判定し、認識されなかった場合はステップＢ２に戻り再度音声入力処理およびキーワード認識処理を実行する。認識された場合はステップＢ４に進む。
次に、ステップＢ４へ進んだときは、音声バッファ制御部４３からの制御により、音声バッファ２ａ〜２ｃへの音声入力を停止し、キーワード「認識スタート」が認識された時のその音声信号を格納する。
【００２６】
次にステップＢ５において、指向性利得制御部４７からの制御により、指向性利得調整部３１の指向性設定を指向性利得情報記憶部４８に記憶された例えば正面０°方向に設定する。
次にステップＢ６において、ステップＢ４で音声バッファ２ａ〜２ｃに格納されたキーワード「認識スタート」の音声信号を音声バッファ制御部４３からの制御により再生し、指向性利得調整部３１にて指向性利得制御部４７が設定した指向性をもつ音声信号を生成し、音声認識部５にてキーワード認識処理を再度実行し、音声認識制御部４４が認識処理の結果と一致度を音声認識部５から取得して、一致度判定部４５に送信する。一致度判定部４５は現在設定されている指向性情報と認識結果と一致度を判定結果記憶部４６へ送信して記憶させる。
【００２７】
次にステップＢ７において、指向性利得情報記憶部４８に記憶された全ての指向性についての再認識処理および認識結果と一致度の取得が終了していない場合はステップＢ５に戻り、指向性利得情報記憶部４８に記憶された全ての指向性について再認識処理および認識結果と一致度の取得が終了するまで繰り返す。全ての指向性について再認識処理および認識結果と一致度の取得が終了した場合はステップＢ８に進む。
【００２８】
次に、ステップＢ８へ進んだときは、判定結果記憶部４６に記憶された全ての指向性についての認識結果と一致度から、一致度判定部４５は、認識結果が正解、すなわちキーワード「認識スタート」であり、かつ最も一致度の高い指向性はどれであるかを判定し、指向性利得制御部４７は、一致度判定部４５が判定した指向性となるように指向性利得調整部３１を制御する。
【００２９】
次にステップＢ９において、指向性利得制御部４７からの制御により、指向性利得調整部３１の利得設定を指向性利得情報記憶部４８に記憶された、例えば初期利得より３ｄＢ高い利得に設定する。この場合の利得調整は、ステップＢ８にて判定された指向性の方向についてのみ利得が調整されるものとする。
【００３０】
次にステップＢ１０において、ステップＢ４で音声バッファ２ａ〜２ｃに格納されたキーワード「認識スタート」の音声信号を音声バッファ制御部４３からの制御により再生し、指向性利得調整部３１にて指向性利得制御部４７が設定した指向性および利得をもつ音声信号を生成し、音声認識部５にてキーワード認識処理を再度実行し、音声認識制御部４４が認識処理の結果と一致度を音声認識部５から取得し、一致度判定部４５に送信する。一致度判定部４５は現在設定されている指向性情報と利得情報と認識結果と一致度を判定結果記憶部４６へ送信して記憶させる。
【００３１】
次にステップＢ１１において、指向性利得情報記憶部４８に記憶された全ての利得についての再認識処理および認識結果と一致度の取得が終了していない場合はステップＢ９に戻り指向性利得情報記憶部４８に記憶された全ての利得に対する再認識処理および認識結果と一致度の取得が終了するまで繰り返す。全ての利得について再認識処理および認識結果と一致度の取得が終了した場合はステップＢ１２に進む。
【００３２】
次に、ステップＢ１２へ進んだときは、判定結果記憶部４６に記憶された全ての利得についての認識結果と一致度から、一致度判定部４５は、認識結果が正解であり、かつ最も一致度の高い利得はどれであるかを判定し、指向性利得制御部４７は、一致度判定部４５が判定した指向性および利得となるように指向性利得調整部３１を制御する。
【００３３】
次にステップＢ１３において、ステップＢ４で停止した音声バッファ２ａ〜２ｃへの音声入力を再開する。すなわちマイクロホン１ａ〜１ｃに入力されて音声信号に変換されたそれぞれの信号を音声バッファ制御部４３からの制御により音声バッファ２ａ〜２ｃに格納し、この格納された音声信号を音声バッファ制御部４３からの制御により再生し、ステップＢ１２で認識結果が正解であってかつ最も一致度の高い指向性および利得に設定された指向性利得調整部３１に音声信号を入力し、指向性利得調整部３１の出力を音声認識部５に入力する。音声認識部５では入力された音声信号に対して、音声認識制御部４４からの制御により音声認識辞書部６に格納された認識語彙を認識する処理を実行する。
次にステップＢ１４において、音声認識部５は認識処理の結果を出力し、図示外のオーディオ機器等の制御を行う。
【００３４】
以上のように実施の形態２の音声認識方法、装置においては、音声認識における一致度を用いて話者の方向を判定し、更に音声認識における一致度を用いて最適な利得、すなわち音声認識に最適な入力信号レベルを判定し、話者の音声を抽出するため、話者の方向が定まっていない場合においても話者の音声を有効に抽出するとともに、話者からマイクロホンまでの距離などのためにマイクロホンへの音声入力レベルが異なる場合でも、音声認識に最適な入力信号レベルで認識処理を実行する事が可能である。また、話者の方向検出手段が不要であり、方向検出用のセンサなどのコスト削減が可能となる。
【００３５】
実施の形態３．
図６は、この発明の実施の形態３における音声認識装置のシステム構成を示すブロック図である。
本実施の形態は、音声でナビゲーション装置を制御する例を示す。図６では、図４に示したものに加えてナビゲーション装置７を示している。
図６において、７は音声認識の結果に基づいて制御部４により、種々の操作が実行されるナビゲーション装置である。その他は図４と同様であるので説明を省略する。
【００３６】
次に、動作について説明する。
図７は、図６に示した音声認識装置の動作内容を示すフローチャートである。ステップＣ０〜Ｃ１２は、図５のステップＢ０〜Ｂ１２と同様であるので説明を省略する。ただし、図５ではキーワードの例を「認識スタート」として説明したが、図７では別のキーワード、例えば「ナビゲーション」を用いる。
【００３７】
ステップＣ１２に続くステップＣ１３において、ステップＣ４で停止した音声バッファ２ａ〜２ｃへの音声入力を再開する。すなわちマイクロホン１ａ〜１ｃに入力されて音声信号に変換されたそれぞれの信号を音声バッファ制御部４３からの制御により音声バッファ２ａ〜２ｃに格納し、この格納された音声信号を音声バッファ制御部４３からの制御により再生し、ステップＣ１２で認識結果が正解であり、かつ最も一致度の高い指向性および利得に設定された指向性利得調整部３１に音声信号を入力し、指向性利得調整部３１の出力を音声認識部５に入力する。音声認識部５では入力された音声信号に対して、音声認識制御部４４からの制御により、音声認識辞書部６に格納されたナビゲーション装置７の制御コマンド語彙、例えば「詳細表示」、「広域表示」、「目的地設定」などを認識する処理を実行する。
【００３８】
次にステップＣ１４において、音声認識制御部４４が音声認識部５の認識結果、例えば「詳細表示」を取得し、認識結果「詳細表示」に対応した制御信号をナビゲーション装置７に送信する。次にステップＣ１５において、ナビゲーション装置７が受信した認識結果「詳細表示」に対応した制御信号に応じて表示画面を詳細表示する処理を実行する。
【００３９】
以上のように実施の形態３の音声認識方法、装置においては、キーワード「ナビゲーション」を認識した時点の音声バッファの音声信号を用い指向性を変更して音声認識における一致度の最も高い指向性を判定し、これを話者の方向とし、さらに一致度の最も高い利得を判定して以後の音声認識処理を実行するため、音声認識の開始スイッチなどは不要であり、操作が簡便となる。また、キーワード認識処理における一致度の最も高い指向性および利得を判定するため、話者の方向が運転者に限定されず、助手席からの音声操作も可能となり、以後のコマンド認識処理において最適な指向性と利得で認識処理を実行可能であり、認識性能の向上が可能である。また、話者の方向検出手段が不要であり、方向検出用のセンサなどのコスト削減が可能となる。
【００４０】
【発明の効果】
請求項１に係る音声認識装置によれば、指向性を変更した信号に対して音声認識処理を行う音声認識部と、認識結果の中から一致度が最も高い指向性を判定する一致度判定部を備えているので、一致度判定部で一致度が最高と判定した指向性に設定することにより、話者が複数存在してその方向が定まっていない場合でも話者判別用のスイッチやセンサなしで、話者以外のノイズを低減して話者の音声を有効に抽出でき、音声認識性能が高く、利便性の高い音声認識装置が得られる。
【００４１】
請求項２に係る音声認識装置によれば、指向性および利得を変更した信号に対して音声認識処理を行う音声認識部と、認識結果の中から一致度の最も高い指向性と利得を判定する一致度判定部を備えているので、一致度判定部で一致度が最高と判定した指向性および利得に設定することにより、話者の方向が定まっておらず、また音声入力部（マイクロホン）への入力レベルが大きいあるいは小さい場合でも話者判別用のスイッチやセンサなしで、話者の音声を有効に抽出するとともに適切な信号レベルで認識処理を行うことができ、音声認識性能が高く、利便性の高い音声認識装置が得られる。
【００４２】
請求項３に係る音声認識方法によれば、認識されたキーワードの音声信号に対して指向性を変更した複数の認識処理を行い、その認識結果の中から一致度が最も高い指向性を判定し、この指向性に設定して以後の音声認識を行うので、話者の方向が定まっていない場合でも話者判別用のスイッチやセンサなしで、話者の音声を有効に抽出でき、音声認識性能が高く、利便性の高い音声認識方法が得られる。
【００４３】
請求項４に係る音声認識方法によれば、認識されたキーワードの音声信号に対して指向性と利得を変更した複数の認識処理を行い、その認識結果の中から一致度が最も高い指向性および利得を判定し、この指向性および利得に設定して以後の音声認識を行うので、話者の方向が定まっておらず、また音声入力部への入力レベルが大きいあるいは小さい場合でも話者判別用のスイッチやセンサなしで、話者の音声を有効に抽出するとともに適切な信号レベルで認識処理を行うことができ、音声認識性能が高く、利便性の高い音声認識方法が得られる。
【図面の簡単な説明】
【図１】この発明の実施の形態１における音声認識装置のシステム構成を示すブロック図である。
【図２】図１の音声認識装置のマイクロホンの取付位置を示す平面図である。
【図３】図１の音声認識装置の動作内容を示すフローチャートである。
【図４】この発明の実施の形態２における音声認識装置のシステム構成を示すブロック図である。
【図５】図４の音声認識装置の動作内容を示すフローチャートである。
【図６】この発明の実施の形態３における音声認識装置のシステム構成を示すブロック図である。
【図７】図６の音声認識装置の動作内容を示すフローチャートである。
【図８】従来の音声認識装置のシステム構成を示すブロック図である。
【図９】図８の音声認識装置の動作内容を示すフローチャートである。
【符号の説明】
１音声入力部、２音声バッファ部、３指向性調整部、５音声認識部、
６音声認識辞書部、３１指向性利得調整部、４１指向性制御部、
４２指向性情報記憶部、４３音声バッファ制御部、４４音声認識制御部、
４５一致度判定部、４６判定結果記憶部、４７指向性利得制御部、
４８指向性利得情報記憶部。

[0001]
[Technical field of the invention]
The present invention relates to a speech recognition device, for in-vehicle use or the like, that recognizes the vocabulary of input instructions given to a voice-controlled target device.
[0002]
[Prior art]
FIG. 8 is a block diagram showing the system configuration of a conventional in-vehicle speech recognition device, and the following description refers to it. To reduce the burden that switch operation places on the driver when operating a navigation device, audio device, or the like while driving, there are speech recognition devices that recognize the voice uttered by a speaker, such as the driver, and issue input instructions to a connected device.
[0003]
Reference numeral 1 denotes a voice input unit that converts the voice uttered by a speaker into an electrical signal and consists of a plurality of omnidirectional microphones 1a to 1c. Reference numeral 3 denotes a directivity adjustment unit that adjusts the signals detected by the microphones 1a to 1c and outputs a voice signal in which the directivity of the voice input unit 1 is steered toward the speaker. Reference numeral 5 denotes a speech recognition unit connected to the input section of a navigation device, audio device, or the like. Reference numeral 8 denotes direction detection means that detects the direction of the speaker, inferring the direction of the voice from the angle of the rearview mirror, the seat slide position, the reclining angle, and the like. Reference numeral 41 denotes a directivity control unit that controls the directivity adjustment unit 3 based on the detection result of the direction detection means 8.
[0004]
Next, the operation will be described.
FIG. 9 is a flowchart showing the operation of the conventional in-vehicle speech recognition device. First, in step S0, an operation to start speech recognition is performed. Next, in step S1, the direction detection means 8 detects the direction of the speaker and acquires the speaker's position information. Next, in step S2, the directivity control unit 41 controls the directivity adjustment unit 3, based on the position information acquired in step S1, so that the directivity is steered toward the speaker. Next, in step S3, the speaker's voice is input from the microphones 1a to 1c. Subsequently, in step S4, the speech recognition unit 5 performs recognition processing on the voice whose directivity has been changed by the directivity adjustment unit 3. Then, in step S5, the speech recognition unit 5 outputs the recognition result.
[0005]
[Problems to be solved by the invention]
In the conventional speech recognition device, adjusting the directivity requires a means for detecting the direction of the speaker in order to identify the speaker's position. Conventionally, for in-vehicle use, this direction was detected from the angle of the rearview mirror, the seat slide position, the reclining angle, and the like. As a result, the speaker is limited to the driver, and the direction detection means cannot detect a speaker other than the driver. To detect the voice of speakers other than the driver as well, the direction detection means becomes more complicated, and because the device must also detect who is speaking, a speaker-identification switch or the like is required. Moreover, even when such a configuration is realized, the specified directivity does not necessarily give high performance in speech recognition processing.
[0006]
An object of the present invention is to provide a highly convenient speech recognition method and device that improve speech recognition performance by reducing noise other than the speaker's voice and that can handle utterances from speakers in multiple directions without a speaker-identification switch.
[0007]
[Means for Solving the Problems]
The speech recognition device according to claim 1 comprises: a voice input unit; a voice buffer unit that stores the signal from the voice input unit; a directivity adjustment unit that generates, from the signal stored in the voice buffer unit, a signal with changed directivity; a speech recognition unit that performs speech recognition processing on the signal whose directivity has been changed by the directivity adjustment unit; a voice buffer control unit that replays the signal stored in the voice buffer unit; a directivity information storage unit that stores a plurality of items of directivity information to be applied by the directivity adjustment unit; a directivity control unit that selects one item from the directivity information stored in the directivity information storage unit and controls the directivity adjustment unit; a speech recognition dictionary unit that stores the recognition targets of the speech recognition unit;
a speech recognition control unit that, under the control of the directivity control unit, has the directivity adjustment unit change the directivity a plurality of times and has recognition processing executed a plurality of times using the signal of each directivity; a determination result storage unit that stores the degree of coincidence between the recognition result for each of the changed directivities and the data stored in the speech recognition dictionary unit; and a coincidence determination unit that determines, from the recognition results stored in the determination result storage unit, the directivity with the highest degree of coincidence. The device determines the directivity with the highest degree of coincidence and continues to use it throughout the series of voice interactions that follows, so that the voice of the voice operator is input with the directivity that gives the highest recognition performance.
[0008]
The speech recognition device according to claim 2 comprises: a voice input unit; a voice buffer unit that stores the signal from the voice input unit; a directivity gain adjustment unit that generates, from the signal stored in the voice buffer unit, a signal with changed directivity and gain; a speech recognition unit that performs speech recognition processing on the signal whose directivity and gain have been changed by the directivity gain adjustment unit; a voice buffer control unit that replays the signal stored in the voice buffer unit; a directivity gain information storage unit that stores a plurality of items of directivity information and gain information to be applied by the directivity gain adjustment unit; a directivity gain control unit that selects one item each from the directivity information and gain information stored in the directivity gain information storage unit and controls the directivity gain adjustment unit; a speech recognition dictionary unit that stores the recognition targets of the speech recognition unit;
a speech recognition control unit that, under the control of the directivity control unit, has the directivity adjustment unit change the directivity and gain a plurality of times and has recognition processing executed a plurality of times using the signal of each directivity; a determination result storage unit that stores the degree of coincidence between the recognition result for each of the changed directivities and gains and the data stored in the speech recognition dictionary unit; and a coincidence determination unit that determines, from the recognition results stored in the determination result storage unit, the directivity and gain with the highest degree of coincidence. The device determines the directivity and gain with the highest degree of coincidence and continues to use them throughout the series of voice interactions that follows, so that the voice of the voice operator is input with the directivity that gives the highest recognition performance.
[0009]
The speech recognition method according to claim 3 comprises: a step of performing recognition processing on an input voice signal and determining whether a given keyword has been recognized; a step of, when the keyword has been recognized, performing recognition processing a plurality of times on the voice signal of that keyword using the voice signal of each changed directivity, and obtaining the degree of coincidence between the recognition result for each of the changed directivities and the data stored in the speech recognition dictionary unit; a step of determining, from these recognition results, the directivity with the highest degree of coincidence; and a step of setting the directivity to the one so determined and then performing speech recognition of subsequent operation commands for an audio device or the like.
[0010]
The speech recognition method according to claim 4 comprises: a step of performing recognition processing on an input voice signal and determining whether a given keyword has been recognized; a step of, when the keyword has been recognized, performing recognition processing a plurality of times on the voice signal of that keyword using the voice signal of each changed directivity and gain, and obtaining the degree of coincidence between the recognition result for each of the changed directivities and gains and the data stored in the speech recognition dictionary unit; a step of determining, from these recognition results, the directivity and gain with the highest degree of coincidence; and a step of setting the directivity and gain to those so determined and then performing speech recognition of subsequent operation commands for an audio device or the like.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
Embodiment 1.
Hereinafter, embodiments of the present invention will be described with reference to an on-vehicle speech recognition apparatus.
FIG. 1 is a block diagram showing the system configuration of the speech recognition device according to Embodiment 1 of the present invention. In the figure, reference numeral 1 denotes a voice input unit that converts the voice uttered by a speaker into an electrical signal (hereinafter referred to as a voice signal) and consists of a plurality (three in this case) of omnidirectional microphones 1a to 1c.
FIG. 2 is a plan view showing an example of where the microphones are mounted in the vehicle. Inside the vehicle 11, the three microphones 1a to 1c are installed at the center of the top of the dashboard 12, equally spaced so as to form a triangle. By adjusting the gain balance of the output signals of the microphones 1a to 1c, for example, the directivity can be steered straight ahead, toward the driver's seat 13, or toward the passenger seat 14.
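As an illustration only, the gain-balance adjustment described above can be pictured as a weighted sum of the three microphone signals. The following is a minimal sketch under that assumption; the function name `mix_channels` and the gain values are hypothetical, and the text does not specify how gains map to a direction.

```python
def mix_channels(channels, gains):
    """Return the sample-wise weighted sum of equal-length channels.

    `channels` is a list of sample lists (one per microphone) and `gains`
    holds one weight per channel. Both the representation and the weights
    are illustrative assumptions, not values taken from the text.
    """
    if len(channels) != len(gains):
        raise ValueError("one gain is required per channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [
        sum(g * ch[i] for g, ch in zip(gains, channels))
        for i in range(length)
    ]


# Hypothetical samples for microphones 1a, 1b, 1c.
mic_a = [1.0, 2.0]
mic_b = [3.0, 4.0]
mic_c = [5.0, 6.0]

# Emphasising one microphone shifts the balance toward its side.
print(mix_channels([mic_a, mic_b, mic_c], [0.5, 0.5, 0.0]))
```

A practical microphone array would normally also apply per-channel delays; the sketch covers only the gain balance that the text mentions.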
[0012]
Returning to FIG. 1, reference numeral 2 denotes a voice buffer unit that stores the voice signals from the voice input unit 1 and consists of a plurality of voice buffers 2a to 2c provided for the microphones 1a to 1c, respectively. Reference numeral 3 denotes a directivity adjustment unit that adjusts the voice signals stored in the voice buffer unit 2 and outputs a voice signal with changed directivity. Reference numeral 5 denotes a speech recognition unit that executes speech recognition processing on the voice signal whose directivity has been changed by the directivity adjustment unit 3 and outputs the degree of coincidence between the recognition result and the data stored in the speech recognition dictionary unit 6 described next. Reference numeral 6 denotes the speech recognition dictionary unit, which stores the recognition targets; it holds the reference data for the speech recognition processing of the speech recognition unit 5.
[0013]
Reference numeral 4 denotes a control unit that controls the voice buffer unit 2, the directivity adjustment unit 3, and the speech recognition unit 5. Reference numeral 43 denotes a voice buffer control unit that controls the storage and replay of voice signals in the voice buffer unit 2, and 41 denotes a directivity control unit that controls the directivity changes in the directivity adjustment unit 3. Reference numeral 42 denotes a directivity information storage unit that stores a plurality of items of directivity information for the directivity control performed by the directivity control unit 41; for example, taking the front as 0°, it stores directions at 5° intervals up to ±90°, and the directivity control unit 41 selects them one at a time for control. Reference numeral 44 denotes a speech recognition control unit that starts and stops the recognition processing of the speech recognition unit 5 and acquires the recognition result and the degree of coincidence. Reference numeral 45 denotes a coincidence determination unit that determines which directivity is optimal from the directivity information supplied by the directivity control unit 41 and the recognition result and degree of coincidence supplied by the speech recognition control unit 44. Reference numeral 46 denotes a determination result storage unit that stores the recognition result and degree of coincidence from the speech recognition unit and the determination result of the coincidence determination unit 45. Units 41 to 46 together constitute the control unit 4.
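By way of illustration, the example contents of the directivity information storage unit 42 (front at 0°, 5° intervals up to ±90°) can be enumerated as follows. This is a sketch only; the function name `candidate_directions` and the convention that negative angles lie to one side of the front are assumptions.

```python
def candidate_directions(step_deg=5, limit_deg=90):
    """List the stored directions in degrees, with 0 meaning straight ahead."""
    return list(range(-limit_deg, limit_deg + 1, step_deg))


directions = candidate_directions()
print(len(directions), directions[0], directions[-1])
```

With the example values in the text this gives 37 candidate directions, so the keyword is re-recognized 37 times per search.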
[0014]
Next, the operation will be described.
FIG. 3 is a flowchart showing the operation of the speech recognition device shown in FIG. 1. First, in step A0, each part of the control unit 4 is initialized and the operation to start processing is executed.
Next, in step A1, the directivity adjustment unit 3 is set to omnidirectional under the control of the directivity control unit 41.
Next, in step A2, the signals input to the microphones 1a to 1c and converted into voice signals are stored in the voice buffers 2a to 2c under the control of the voice buffer control unit 43. The stored voice signals are then replayed under the control of the voice buffer control unit 43 and input to the directivity adjustment unit 3, which is set to omnidirectional, and the output of the directivity adjustment unit 3 is input to the speech recognition unit 5. Under the control of the speech recognition control unit 44, the speech recognition unit 5 performs processing on the input voice signal to recognize a keyword that serves as the start command for voice operation, for example “recognition start”.
[0015]
Next, in step A3, the speech recognition control unit 44 determines, based on the recognition result of the speech recognition unit 5, whether the keyword “recognition start” has been recognized. If it has not, the process returns to step A2 and the voice input processing and keyword recognition processing are executed again. If it has, the process proceeds to step A4.
On reaching step A4, voice input to the voice buffers 2a to 2c is stopped under the control of the voice buffer control unit 43, and the voice signal from the moment the keyword “recognition start” was recognized is retained.
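The loop formed by steps A2 to A4 can be sketched as follows. This is an illustrative sketch only: `wait_for_keyword`, the iterable `frames` of buffered utterances, and the `recognize` callable returning a `(word, score)` pair are all hypothetical stand-ins for the voice buffer unit 2 and the speech recognition unit 5.

```python
def wait_for_keyword(frames, recognize, keyword="recognition start"):
    """Return the buffered utterance in which `keyword` was recognized.

    Each item of `frames` stands for one buffered utterance replayed with
    omnidirectional directivity (step A2). Returns None if the keyword
    never appears.
    """
    for frame in frames:
        word, _score = recognize(frame)   # keyword recognition (step A2)
        if word == keyword:               # keyword check (step A3)
            return frame                  # keep this buffer frozen (step A4)
    return None


# Hypothetical recognizer: treats each frame as already-decoded text.
def fake_recognize(frame):
    return frame, 1.0


utterances = ["hello", "recognition start", "detailed display"]
print(wait_for_keyword(utterances, fake_recognize))
```

Returning the frame itself models the point that the same buffered keyword signal is reused for the later re-recognition passes.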
[0016]
Next, in step A5, the directivity adjustment unit 3 is set, under the control of the directivity control unit 41, to a directivity stored in the directivity information storage unit 42, for example the 0° front direction.
Next, in step A6, the voice signal of the keyword “recognition start” stored in the voice buffers 2a to 2c in step A4 is replayed under the control of the voice buffer control unit 43, and the directivity adjustment unit 3 generates a voice signal with the directivity set by the directivity control unit 41. The speech recognition unit 5 executes the keyword recognition processing again, and the speech recognition control unit 44 acquires the recognition result and the degree of coincidence from the speech recognition unit 5 and sends them to the coincidence determination unit 45. The coincidence determination unit 45 sends the currently set directivity information, the recognition result, and the degree of coincidence to the determination result storage unit 46, where they are stored.
[0017]
Next, in step A7, if re-recognition and acquisition of the recognition result and degree of coincidence have not been completed for all directivities stored in the directivity information storage unit 42, the process returns to step A5 and repeats until all stored directivities have been processed. When they have been completed for all directivities, the process proceeds to step A8.
[0018]
Next, on reaching step A8, the coincidence determination unit 45 determines, from the recognition results and degrees of coincidence for all directivities stored in the determination result storage unit 46, which directivity both gives the correct recognition result, that is, the keyword “recognition start”, and has the highest degree of coincidence. The directivity control unit 41 then controls the directivity adjustment unit 3 so that it takes the directivity determined by the coincidence determination unit 45.
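Steps A5 to A8 amount to a sweep over the stored directivities followed by a selection. A minimal sketch, assuming hypothetical callables `beamform(buffered, angle)` for the directivity adjustment unit 3 and `recognize(signal)` returning `(word, score)` for the speech recognition unit 5, might look like this. Returning `None` when no direction yields the keyword is an assumption; the text does not describe that case.

```python
def select_best_directivity(buffered, directions, beamform, recognize, keyword):
    """Re-recognize the buffered keyword for every direction and pick the best.

    Only directions whose recognition result equals `keyword` are eligible;
    among those, the one with the highest degree of coincidence wins.
    """
    results = []                                   # determination result storage
    for angle in directions:                       # steps A5 to A7
        word, score = recognize(beamform(buffered, angle))
        results.append((angle, word, score))       # step A6: store the triple
    correct = [r for r in results if r[1] == keyword]
    if not correct:
        return None                                # assumption: nothing usable
    return max(correct, key=lambda r: r[2])[0]     # step A8


# Hypothetical stand-ins: the "signal" is just the steering angle, and the
# recognizer scores best at 30 degrees but misrecognizes negative angles.
def fake_beamform(buffered, angle):
    return angle


def fake_recognize(angle):
    if angle < 0:
        return "other word", 100.0
    return "recognition start", -abs(angle - 30)


best = select_best_directivity(
    None, range(-90, 91, 5), fake_beamform, fake_recognize, "recognition start"
)
print(best)
```

The negative-angle case shows why the correctness check matters: a high degree of coincidence for a wrong word must not win the selection.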
[0019]
Next, in step A9, the voice input to the voice buffers 2a to 2c that was stopped in step A4 is resumed. That is, the signals input to the microphones 1a to 1c and converted into voice signals are stored in the voice buffers 2a to 2c under the control of the voice buffer control unit 43, and the stored voice signals are replayed under the control of the voice buffer control unit 43. They are input to the directivity adjustment unit 3, which was set in step A8 to the directivity that gave the correct recognition result with the highest degree of coincidence, and the output of the directivity adjustment unit 3 is input to the speech recognition unit 5. Under the control of the speech recognition control unit 44, the speech recognition unit 5 performs processing on the input voice signal to recognize the recognition vocabulary stored in the speech recognition dictionary unit 6.
Next, in step A10, the speech recognition unit 5 outputs the result of the recognition processing and operates an audio device or the like (not shown).
If, in step A8, a display or lamp (not shown) indicates that recognition of the keyword “recognition start” is complete and shows the direction to which the directivity has been set, the speaker can check that indication and then enter the next command in step A9.
[0020]
As described above, the speech recognition method and device of Embodiment 1 change the directivity using the voice signal held in the voice buffer at the moment the keyword “recognition start” was recognized, determine the direction of the speaker from the degree of coincidence in speech recognition, and extract the speaker's voice. The speaker's voice can therefore be extracted and recognized effectively even when the speaker's direction is not fixed. In addition, no means for detecting the speaker's direction is needed, so the cost of a speaker-identification switch, a direction detection sensor, or the like can be eliminated.
[0021]
Embodiment 2.
FIG. 4 is a block diagram showing a system configuration of the speech recognition apparatus according to Embodiment 2 of the present invention.
In the present embodiment, in addition to the method of changing the directivity performed in the first embodiment and selecting the one with the highest degree of matching, a method of viewing the degree of matching by changing the gain, that is, the signal level, is used.
In FIG. 4, the directivity adjustment unit 3, the directivity control unit 41, and the directivity information storage unit 42 of FIG. 1 are replaced by a directivity gain adjustment unit 31, a directivity gain control unit 47, and a directivity gain information storage unit 48, respectively.
[0022]
In FIG. 4, 31 is a directivity gain adjustment unit that adjusts the audio signals held in the audio buffer unit 2 and outputs an audio signal whose directivity and gain have been changed; 47 is a directivity gain control unit that controls the changes of directivity and gain in the directivity gain adjustment unit 31; and 48 is a directivity gain information storage unit that stores a plurality of pieces of directivity information and gain information used by the directivity gain control unit 47 in controlling directivity and gain. For example, with the front taken as 0°, directions within ±90° are stored at 5° intervals, and with the initial gain taken as 0 dB, gains within ±15 dB are stored at 3 dB intervals.
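As a rough illustration of the example values above, the stored candidates can be written out as two small tables. This is only a sketch: the names DIRECTIVITY_DEG, GAIN_DB, and INITIAL_GAIN_DB are hypothetical, and reading "±90° at 5° intervals" as every multiple of 5° from -90° to +90° inclusive is an assumption.

```python
# Candidate settings following the example in the text (assumed inclusive ranges):
# directions from -90 to +90 degrees in 5-degree steps, with 0 meaning "front",
# and gains from -15 to +15 dB in 3 dB steps around the 0 dB initial gain.
DIRECTIVITY_DEG = list(range(-90, 91, 5))
GAIN_DB = list(range(-15, 16, 3))
INITIAL_GAIN_DB = 0

print(len(DIRECTIVITY_DEG), len(GAIN_DB))  # prints: 37 11
```

Under this reading the storage unit would hold 37 directions and 11 gains.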
[0023]
The voice recognition unit 5 performs voice recognition processing on the voice signal whose directivity and gain are changed by the directivity gain adjustment unit 31. The coincidence determination unit 45 determines which directivity and gain are optimal from the directivity information and gain information from the directivity gain control unit 47 and the recognition result and coincidence from the speech recognition control unit 44.
The control unit 4 comprises the units 43 to 48. The other parts are the same as in FIG. 1.
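This section does not state how the directivity gain adjustment unit 31 forms a directional signal. One common technique is delay-and-sum beamforming followed by a scalar gain, and the sketch below assumes that technique rather than describing the patented implementation. It further assumes a uniform linear microphone array, integer-sample delays, a sign convention in which a positive angle means the sound reaches the highest-numbered microphone first, and hypothetical function names and constants.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_delays(angle_deg, n_mics, spacing_m, sample_rate):
    """Integer sample delays that time-align a plane wave from angle_deg."""
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND * sample_rate
    raw = [round(i * step) for i in range(n_mics)]
    base = min(raw)
    return [d - base for d in raw]  # shift so that every delay is non-negative

def delay_and_sum(channels, delays, gain_db):
    """Delay each channel, average over microphones, then apply a gain in dB."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio corresponding to gain_db
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for channel, delay in zip(channels, delays):
            if 0 <= t - delay < n:
                acc += channel[t - delay]
        out.append(factor * acc / len(channels))
    return out
```

Steering toward the true arrival direction makes the delayed channels add coherently, while other directions add out of step, which is the property a directivity scan relies on.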
[0024]
Next, the operation will be described.
FIG. 5 is a flowchart showing the operation of the speech recognition apparatus shown in FIG. 4. First, in step B0, each part of the control unit 4 is initialized and processing is started.
Next, in step B1, under the control of the directivity gain control unit 47, the directivity gain adjustment unit 31 is set to omnidirectional and to the initial gain.
Next, in step B2, the signals picked up by the microphones 1a to 1c and converted into audio signals are stored in the audio buffers 2a to 2c under the control of the audio buffer control unit 43, and the stored audio signals are reproduced under the control of the audio buffer control unit 43. The reproduced signals are input to the directivity gain adjustment unit 31, which is set to omnidirectional and the initial gain, and the output of the directivity gain adjustment unit 31 is input to the voice recognition unit 5. Under the control of the voice recognition control unit 44, the voice recognition unit 5 performs processing on the input audio signal to recognize a keyword that serves as the voice operation start command, for example “recognition start”.
[0025]
Next, in step B3, the voice recognition control unit 44 determines, based on the recognition result of the voice recognition unit 5, whether the keyword “recognition start” has been recognized. If it has not been recognized, the audio input processing and keyword recognition processing are executed again. If it has been recognized, the process proceeds to step B4.
In step B4, audio input to the audio buffers 2a to 2c is stopped under the control of the audio buffer control unit 43, so that the audio signal captured when the keyword “recognition start” was recognized remains stored.
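The buffering behaviour of steps B2 to B4 can be sketched as a ring buffer whose writing is halted once the keyword is recognized, so that the keyword audio can be replayed as often as needed. The class name KeywordBuffer, its methods, and the fixed capacity are hypothetical; the text does not describe how the audio buffers 2a to 2c are organized.

```python
from collections import deque

class KeywordBuffer:
    """Ring buffer for one microphone channel (hypothetical stand-in for 2a to 2c)."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)  # oldest samples drop out when full
        self._recording = True

    def write(self, sample):
        # After stop() (step B4), new input is discarded so that the stored
        # keyword utterance is not overwritten.
        if self._recording:
            self._samples.append(sample)

    def stop(self):
        self._recording = False

    def resume(self):
        # Input to the buffer is resumed later in the flow.
        self._recording = True

    def replay(self):
        """Return a copy of the stored samples; may be called repeatedly."""
        return list(self._samples)
```

Because replay() returns a copy, the same keyword audio can be fed through the adjustment unit once per candidate setting without being consumed.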
[0026]
Next, in step B5, under the control of the directivity gain control unit 47, the directivity of the directivity gain adjustment unit 31 is set to one of the directions stored in the directivity gain information storage unit 48, for example the front (0°) direction.
Next, in step B6, the audio signal of the keyword “recognition start” stored in the audio buffers 2a to 2c in step B4 is reproduced under the control of the audio buffer control unit 43, and the directivity gain adjustment unit 31 generates an audio signal having the directivity set by the directivity gain control unit 47. The voice recognition unit 5 executes the keyword recognition process again, and the voice recognition control unit 44 acquires the recognition result and the degree of coincidence from the voice recognition unit 5 and transmits them to the coincidence determination unit 45. The coincidence determination unit 45 sends the currently set directivity information, the recognition result, and the degree of coincidence to the determination result storage unit 46, where they are stored.
[0027]
Next, in step B7, if re-recognition and acquisition of the recognition result and degree of coincidence have not yet been completed for all directivities stored in the directivity gain information storage unit 48, the process returns to step B5, and this is repeated until every stored directivity has been processed. Once the recognition results and degrees of coincidence have been acquired for all directivities, the process proceeds to step B8.
[0028]
In step B8, the coincidence determination unit 45 determines, from the recognition results and degrees of coincidence for all directivities stored in the determination result storage unit 46, the directivity for which the recognition result is correct, that is, the keyword “recognition start”, and the degree of coincidence is highest. The directivity gain control unit 47 then controls the directivity gain adjustment unit 31 so that the directivity determined by the coincidence determination unit 45 is set.
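The loop of steps B5 to B8 can be sketched as follows. The recognizer and the steering function are passed in as callables because the text does not specify them; scan_directions, toy_steer, and toy_recognize are hypothetical names, and the toy scoring model (a speaker at +30°) exists only to make the sketch runnable.

```python
def scan_directions(keyword_audio, directions, steer, recognize, keyword):
    """Re-recognize the buffered keyword once per direction (steps B5 to B7),
    then pick the best direction among the correct results (step B8)."""
    records = []  # plays the role of the determination result storage unit 46
    for angle in directions:
        text, score = recognize(steer(keyword_audio, angle))
        records.append((angle, text, score))
    correct = [r for r in records if r[1] == keyword]
    if not correct:
        return None, records  # handling of this case is not stated in the text
    best = max(correct, key=lambda r: r[2])
    return best[0], records

# Toy stand-ins: the "speaker" sits at +30 degrees and the score falls off
# with angular distance. A real beamformer and recognizer would replace these.
def toy_steer(audio, angle):
    return (audio, angle)

def toy_recognize(steered):
    _, angle = steered
    score = 100 - abs(angle - 30)
    return ("recognition start" if score >= 50 else "", score)

best_angle, records = scan_directions(
    [0.0] * 8, range(-90, 91, 5), toy_steer, toy_recognize, "recognition start")
print(best_angle, len(records))  # prints: 30 37
```

Filtering on the correct keyword before taking the maximum mirrors the two conditions in step B8: the result must be the keyword, and its degree of coincidence must be the highest.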
[0029]
Next, in step B9, under the control of the directivity gain control unit 47, the gain of the directivity gain adjustment unit 31 is set to one of the gains stored in the directivity gain information storage unit 48, for example a gain 3 dB higher than the initial gain. The gain is adjusted only for the directivity determined in step B8.
[0030]
Next, in step B10, the audio signal of the keyword “recognition start” stored in the audio buffers 2a to 2c in step B4 is reproduced under the control of the audio buffer control unit 43, and the directivity gain adjustment unit 31 generates an audio signal having the directivity and gain set by the directivity gain control unit 47. The voice recognition unit 5 executes the keyword recognition process again, and the voice recognition control unit 44 acquires the recognition result and the degree of coincidence and transmits them to the coincidence determination unit 45. The coincidence determination unit 45 sends the currently set directivity information, gain information, recognition result, and degree of coincidence to the determination result storage unit 46, where they are stored.
[0031]
Next, in step B11, if re-recognition and acquisition of the recognition result and degree of coincidence have not yet been completed for all gains stored in the directivity gain information storage unit 48, the process returns to step B9, and this is repeated until every stored gain has been processed. Once the recognition results and degrees of coincidence have been acquired for all gains, the process proceeds to step B12.
[0032]
In step B12, the coincidence determination unit 45 determines, from the recognition results and degrees of coincidence for all gains stored in the determination result storage unit 46, the gain for which the recognition result is correct and the degree of coincidence is highest. The directivity gain control unit 47 then controls the directivity gain adjustment unit 31 so that the directivity and gain determined by the coincidence determination unit 45 are set.
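The gain scan of steps B9 to B12 keeps the direction fixed and varies only the level. The sketch below illustrates why this can matter, under an assumed toy recognizer whose score is best near a peak level of -6 dB relative to full scale and which fails on clipped input. That scoring model, the sample values, and all function names are hypothetical.

```python
import math

def scan_gains(keyword_audio, gains_db, apply_gain, recognize, keyword):
    """Steps B9 to B12: try every stored gain and keep the one whose result is
    the correct keyword with the highest degree of coincidence."""
    best_gain, best_score = None, float("-inf")
    for gain_db in gains_db:
        text, score = recognize(apply_gain(keyword_audio, gain_db))
        if text == keyword and score > best_score:
            best_gain, best_score = gain_db, score
    return best_gain

def toy_apply_gain(audio, gain_db):
    factor = 10.0 ** (gain_db / 20.0)  # dB to amplitude ratio
    return [s * factor for s in audio]

def toy_recognize(audio):
    peak = max(abs(s) for s in audio)
    if peak > 1.0:  # clipped input is treated as a misrecognition
        return ("", 0.0)
    level_db = 20.0 * math.log10(peak)
    return ("recognition start", -abs(level_db + 6.0))  # best near -6 dB

quiet_keyword = [0.0, 0.25, -0.1]  # assumed already steered to the chosen direction
best_gain = scan_gains(quiet_keyword, range(-15, 16, 3),
                       toy_apply_gain, toy_recognize, "recognition start")
print(best_gain)  # prints: 6
```

In this toy case the quiet utterance peaks at about -12 dB, so the +6 dB candidate brings it closest to the level the assumed recognizer prefers, while +15 dB is rejected because it clips.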
[0033]
Next, in step B13, the audio input to the audio buffers 2a to 2c, which was stopped in step B4, is resumed. That is, the signals picked up by the microphones 1a to 1c and converted into audio signals are stored in the audio buffers 2a to 2c under the control of the audio buffer control unit 43, and the stored audio signals are reproduced under the control of the audio buffer control unit 43. These audio signals are input to the directivity gain adjustment unit 31, which is set to the directivity and gain that were determined in step B12 to give the correct recognition result with the highest degree of coincidence, and the output of the directivity gain adjustment unit 31 is input to the voice recognition unit 5. Under the control of the voice recognition control unit 44, the voice recognition unit 5 recognizes the input audio signal against the recognition vocabulary stored in the voice recognition dictionary unit 6.
Next, in step B14, the voice recognition unit 5 outputs the result of the recognition process, and controls an audio device (not shown).
[0034]
As described above, in the speech recognition method and apparatus according to the second embodiment, the direction of the speaker is determined using the degree of coincidence in speech recognition, and the optimum gain, that is, the input signal level best suited to speech recognition, is likewise determined using the degree of coincidence, after which the speaker's voice is extracted. The speaker's voice is therefore extracted effectively even when the speaker's direction is not fixed, and recognition can be executed at the input signal level best suited to speech recognition even when the audio input level at the microphones varies, for example with the distance from the speaker to the microphones. In addition, no means for detecting the speaker's direction is required, so the cost of a direction detection sensor and the like can be eliminated.
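One practical consequence of scanning direction first and gain second, as in FIG. 5, is the number of re-recognition passes over the buffered keyword. The count below is an interpretation and not a figure from the text: it assumes the example tables of the second embodiment (directions every 5° within ±90°, gains every 3 dB within ±15 dB) and assumes that every stored gain, including the initial one, is tried in the gain scan.

```python
n_directions = len(range(-90, 91, 5))  # 37 candidate directions
n_gains = len(range(-15, 16, 3))       # 11 candidate gains

two_stage_passes = n_directions + n_gains  # direction scan, then gain scan
joint_passes = n_directions * n_gains      # every (direction, gain) pair

print(two_stage_passes, joint_passes)  # prints: 48 407
```

Under these assumptions the two-stage order needs 48 passes, against 407 for an exhaustive search over all pairs.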
[0035]
Embodiment 3.
FIG. 6 is a block diagram showing a system configuration of the speech recognition apparatus according to Embodiment 3 of the present invention.
This embodiment shows an example of controlling a navigation device by voice. FIG. 6 shows a navigation device 7 in addition to the components shown in FIG. 4.
In FIG. 6, reference numeral 7 denotes the navigation device, whose various operations are executed under the control of the control unit 4 based on the speech recognition result. The other parts are the same as in FIG. 4.
[0036]
Next, the operation will be described.
FIG. 7 is a flowchart showing the operation of the speech recognition apparatus shown in FIG. 6. Steps C0 to C12 are the same as steps B0 to B12 in FIG. 5. However, whereas FIG. 5 uses “recognition start” as the example keyword, FIG. 7 uses a different keyword, such as “navigation”.
[0037]
In step C13 following step C12, the audio input to the audio buffers 2a to 2c, which was stopped in step C4, is resumed. That is, the signals picked up by the microphones 1a to 1c and converted into audio signals are stored in the audio buffers 2a to 2c under the control of the audio buffer control unit 43, and the stored audio signals are reproduced under the control of the audio buffer control unit 43. These audio signals are input to the directivity gain adjustment unit 31, which is set to the directivity and gain that were determined in step C12 to give the correct recognition result with the highest degree of coincidence, and the output of the directivity gain adjustment unit 31 is input to the voice recognition unit 5. Under the control of the voice recognition control unit 44, the voice recognition unit 5 recognizes the input audio signal against the control command vocabulary of the navigation device 7 stored in the voice recognition dictionary unit 6, for example “detailed display”, “wide area display”, and “destination setting”.
[0038]
Next, in step C14, the voice recognition control unit 44 acquires the recognition result of the voice recognition unit 5, for example “detailed display”, and transmits a control signal corresponding to that recognition result to the navigation device 7. Next, in step C15, the navigation device 7 switches its display screen to the detailed view in accordance with the received control signal.
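The mapping in step C14 from a recognized command to a control signal can be sketched as a lookup table. The command strings come from the example vocabulary in the text; the signal names (ZOOM_IN and so on) and the function control_signal_for are hypothetical, since the actual control signal format is not described.

```python
# Recognized command vocabulary -> control signal for the navigation device 7.
# The signal names on the right are placeholders invented for this sketch.
NAVIGATION_COMMANDS = {
    "detailed display": "ZOOM_IN",
    "wide area display": "ZOOM_OUT",
    "destination setting": "SET_DESTINATION",
}

def control_signal_for(recognition_result):
    """Step C14: return the control signal for a recognized command, or None
    when the result is not in the command vocabulary."""
    return NAVIGATION_COMMANDS.get(recognition_result.strip().lower())

print(control_signal_for("detailed display"))  # prints: ZOOM_IN
```

Returning None for an unknown result leaves it to the caller to decide whether to ignore the utterance or prompt again; the text does not say which.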
[0039]
As described above, in the speech recognition method and apparatus of the third embodiment, the directivity is varied using the audio signal held in the audio buffer at the time the keyword “navigation” was recognized, the directivity with the highest degree of coincidence in speech recognition is determined and taken as the direction of the speaker, the gain with the highest degree of coincidence is likewise determined, and the subsequent speech recognition processing is then executed. A speech recognition start switch or the like is therefore unnecessary, and operation becomes simple. Further, because the directivity and gain with the highest degree of coincidence in the keyword recognition process are determined, the speaker is not limited to the driver, and voice operation from the passenger seat is also possible; the subsequent command recognition can be executed with the optimum directivity and gain, which improves recognition performance. In addition, no means for detecting the speaker's direction is required, so the cost of a direction detection sensor and the like can be eliminated.
[0040]
[Effect of the invention]
According to the speech recognition apparatus of claim 1, the apparatus has a speech recognition unit that performs speech recognition processing on a signal whose directivity has been changed, and a coincidence determination unit that determines, from the recognition results, the directivity with the highest degree of coincidence. By setting the directivity that the coincidence determination unit has determined to have the highest degree of coincidence, noise other than the speaker's voice is reduced and the speaker's voice is extracted effectively, without a switch or sensor for speaker identification, even when there are multiple speakers and the speaker's direction is not fixed. A highly convenient speech recognition apparatus with high recognition performance is thus obtained.
[0041]
According to the speech recognition apparatus of claim 2, the apparatus has a speech recognition unit that performs speech recognition processing on a signal whose directivity and gain have been changed, and a coincidence determination unit that determines, from the recognition results, the directivity and gain with the highest degree of coincidence. By setting the directivity and gain that the coincidence determination unit has determined to have the highest degree of coincidence, the speaker's voice is extracted effectively and recognition is performed at an appropriate signal level, without a switch or sensor for speaker identification, even when the speaker's direction is not fixed and the input level at the speech input unit (microphones) is high or low. A highly convenient speech recognition apparatus with high recognition performance is thus obtained.
[0042]
According to the speech recognition method of claim 3, recognition processing is performed a plurality of times with different directivities on the audio signal of the recognized keyword, the directivity with the highest degree of coincidence is determined from the recognition results, and subsequent speech recognition is performed with this directivity set. The speaker's voice can therefore be extracted effectively, without a switch or sensor for speaker identification, even when the speaker's direction is not fixed, and a highly convenient speech recognition method is obtained.
[0043]
According to the speech recognition method of claim 4, recognition processing is performed a plurality of times with different directivities and gains on the audio signal of the recognized keyword, the directivity and gain with the highest degree of coincidence are determined from the recognition results, and subsequent speech recognition is performed with this directivity and gain set. The speaker's voice can therefore be extracted effectively and recognition performed at an appropriate signal level, without a switch or sensor for speaker identification, even when the speaker's direction is not fixed and the input level at the speech input unit is high or low, and a highly convenient speech recognition method with high recognition performance is obtained.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a system configuration of a speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a plan view showing the microphone attachment positions of the speech recognition apparatus of FIG. 1.
FIG. 3 is a flowchart showing the operation content of the speech recognition apparatus of FIG. 1.
FIG. 4 is a block diagram showing a system configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.
FIG. 5 is a flowchart showing the operation content of the speech recognition apparatus of FIG. 4.
FIG. 6 is a block diagram showing a system configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.
FIG. 7 is a flowchart showing the operation content of the speech recognition apparatus of FIG. 6.
FIG. 8 is a block diagram showing a system configuration of a conventional speech recognition apparatus.
FIG. 9 is a flowchart showing the operation content of the speech recognition apparatus of FIG. 8.
[Explanation of symbols]
1 speech input unit, 2 speech buffer unit, 3 directivity adjustment unit, 5 speech recognition unit,
6 speech recognition dictionary unit, 31 directivity gain adjustment unit, 41 directivity control unit,
42 directivity information storage unit, 43 voice buffer control unit, 44 voice recognition control unit,
45 coincidence determination unit, 46 determination result storage unit, 47 directivity gain control unit,
48 directivity gain information storage unit.

Claims

A speech recognition apparatus comprising: a speech input unit; a speech buffer unit that stores a signal from the speech input unit; a directivity adjustment unit that generates, from the signal stored in the speech buffer unit, a signal whose directivity has been changed; a speech recognition unit that performs speech recognition processing on the signal whose directivity has been changed by the directivity adjustment unit; a speech buffer control unit that reproduces the signal stored in the speech buffer unit; a directivity information storage unit that stores a plurality of pieces of directivity information to be set in the directivity adjustment unit; a directivity control unit that selects one piece of the directivity information stored in the directivity information storage unit and controls the directivity adjustment unit; a speech recognition dictionary unit that stores the recognition targets of the speech recognition unit;
a speech recognition control unit that, by changing the directivity a plurality of times in the directivity adjustment unit under the control of the directivity control unit, executes recognition processing a plurality of times using the signal of each directivity;
a determination result storage unit that stores the recognition results for the plurality of changed directivities and their degrees of coincidence with the data stored in the speech recognition dictionary unit; and a coincidence determination unit that determines, from the recognition results stored in the determination result storage unit, the directivity with the highest degree of coincidence, wherein the directivity with the highest degree of coincidence is determined and the determined directivity continues to be used in the subsequent series of voice dialogue, so that speech from the voice operator is input with the directivity giving the highest recognition performance.

A speech recognition apparatus comprising: a speech input unit; a speech buffer unit that stores a signal from the speech input unit; a directivity gain adjustment unit that generates, from the signal stored in the speech buffer unit, a signal whose directivity and gain have been changed; a speech recognition unit that performs speech recognition processing on the signal whose directivity and gain have been changed by the directivity gain adjustment unit; a speech buffer control unit that reproduces the signal stored in the speech buffer unit; a directivity gain information storage unit that stores a plurality of pieces of directivity information and gain information to be set in the directivity gain adjustment unit; a directivity gain control unit that selects one piece each of the directivity information and gain information stored in the directivity gain information storage unit and controls the directivity gain adjustment unit; a speech recognition dictionary unit that stores the recognition targets of the speech recognition unit;
a speech recognition control unit that, by changing the directivity and gain a plurality of times in the directivity adjustment unit under the control of the directivity control unit, executes recognition processing a plurality of times using the signal of each directivity;
a determination result storage unit that stores the recognition results for the plurality of changed directivities and gains and their degrees of coincidence with the data stored in the speech recognition dictionary unit; and a coincidence determination unit that determines, from the recognition results stored in the determination result storage unit, the directivity and gain with the highest degree of coincidence, wherein the directivity and gain with the highest degree of coincidence are determined and the determined directivity and gain continue to be used in the subsequent series of voice dialogue, so that speech from the voice operator is input with the directivity giving the highest recognition performance.

A speech recognition method comprising: a step of performing recognition processing on an input audio signal to determine whether a keyword has been recognized; a step of, when the keyword has been recognized, performing recognition processing a plurality of times on the audio signal of the keyword using audio signals of the respective directivities obtained by changing the directivity, and obtaining the recognition results for the plurality of changed directivities and their degrees of coincidence with the data stored in a speech recognition dictionary unit; a step of determining, from these recognition results, the directivity with the highest degree of coincidence; and a step of setting the directivity to the determined directivity with the highest degree of coincidence and performing subsequent speech recognition of operation commands for an audio device or the like.

A speech recognition method comprising: a step of performing recognition processing on an input audio signal to determine whether a keyword has been recognized; a step of, when the keyword has been recognized, performing recognition processing a plurality of times on the audio signal of the keyword using audio signals of the respective directivities and gains obtained by changing the directivity and gain, and obtaining the recognition results for the plurality of changed directivities and gains and their degrees of coincidence with the data stored in a speech recognition dictionary unit; a step of determining, from these recognition results, the directivity and gain with the highest degree of coincidence; and a step of setting the directivity and gain to the determined directivity and gain with the highest degree of coincidence and performing subsequent speech recognition of operation commands for an audio device or the like.