JP4201411B2 - Voice recognition device and navigation system

Info

Publication number: JP4201411B2
Application number: JP35921198A
Authority: JP
Inventors: 英夫宮内; 一郎赤堀; 教英北岡
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 1998-12-17
Filing date: 1998-12-17
Publication date: 2008-12-24
Anticipated expiration: 2018-12-17
Also published as: JP2000181488A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばナビゲーションシステムにおける目的地の設定などを音声入力できるようにする場合などに有効な音声認識装置及びその音声認識装置を備えたナビゲーションシステムに関する。
【０００２】
【従来の技術】
従来より、入力された音声を予め記憶されている複数の比較対象パターン候補と比較し、一致度合の高いものを認識結果とする音声認識装置が既に実用化されており、例えばナビゲーションシステムにおいて設定すべき目的地を利用者が地名を音声で入力するためなどに用いられている。特に車載ナビゲーションシステムを運転手自身が利用する場合、音声入力であればボタン操作や画面注視が伴わないため、車両の走行中に行っても安全性が高いため有効である。
【０００３】
また、音声認識装置側が常に音声入力に備えるように準備しておくのは負担が大きいため、認識対象となる音声を入力させる期間の開始及び終了タイミングについては利用者自身が指定するような構成が採用されることが多い。この入力期間を指定するための手段としては、例えばＰＴＴ（Push-To-Talk）スイッチなどが利用される。つまり、ＰＴＴスイッチが押されている場合にだけ入力音声に対する認識処理を実行するようにして、実際上必要がない入力音声については処理をしないようにしている。
【０００４】
このように利用価値の高い音声入力の手法であるが、利用者が音声認識のシステム自体に慣れていない初心者であると、次のような問題点が生じる。例えば、利用方法が判らないことに起因して、装置にとっては不適切な音声を適当に入力してしまい実効性が上がらないという問題や、あるいは、極端な場合には装置自体を使わないという問題がある。また、利用方法自体をある程度は理解しているが具体的な利用方法に慣れていないため、例えば入力すべき言葉がすぐに出てこなくて沈黙してしまうという問題もある。
【０００５】
このような初心者のための対策として、例えば所定のガイド内容を報知することも考えられる。例えば入力できるコマンドとしてどのようなものがあるのかをガイドすることが考えられる。具体的には、「目的地の設定、地図拡大・縮小が行えます」というようなガイド内容を報知する。あるいは、コマンドの具体的な入力方法を知らない場合に「目的地を設定するときは都道府県名から入力して下さい」という入力方法の説明をガイド内容として報知することも考えられる。
【０００６】
【発明が解決しようとする課題】
しかしながら、このように初心者向けの対策を講じた場合、初心者にとっては有益なガイド内容も、熟練者にとっては不要なものとなる。つまり、そのようなガイド内容を聞かなくても入力できるコマンドや入力方法自体を熟知している場合には、そのガイド内容が出力されている間は音声入力ができなくなるなどの不都合が生じてくる。
【０００７】
また、全くガイド内容を聞かなくてよい熟練者のレベルまではいかないが、概略的なガイドだけしてもらえば後は自分で対処できる「中級者」レベルの場合には、初心者と同じ内容のガイドは不要である。そして、これら初心者、中級者、熟練者は必ずしも別人ではなく、同じ人が、最初は初心者であり、その後、中級者、熟練者とステップアップしていく。したがって、装置側としても、初心者用、中級者用、熟練者用というように使用者のレベルに合わせて固定的にシステム設定することは好ましくなく、使用時の状況に応じて、初心者には初心者用の入力方法、中級者には中級者用の入力方法、熟練者には熟練者用の入力方法を任意に選択できるようなシステム設定とすることが期待される。
【０００８】
本発明は、このような問題を解決し、音声認識装置の利用方法に対する熟知度合が異なる各利用者のいずれにとっても使い勝手の良い音声認識装置及びその音声認識装置を備えたナビゲーションシステムを提供することを目的とする。
【０００９】
【課題を解決するための手段及び発明の効果】
上記目的を達成するためになされた請求項１、７に記載の音声認識装置によれば、利用者は例えばマイクロフォンなどの音声入力手段を介して音声を入力するのであるが、認識対象の音声を入力させる期間の開始及び終了を利用者自身が指定するために入力期間指定手段が設けられており、この入力期間指定手段によって指定された入力期間内に入力された音声が認識対象となる。そして、認識手段が、その入力された音声を予め辞書手段に記憶されている複数の比較対象パターン候補と比較して一致度合の高いものを認識結果とし、報知手段によって認識結果を報知する。そして、認識結果が報知された後に所定の確定指示がなされた場合には、確定後処理手段が、その認識結果を確定したものとして所定の確定後処理を実行する。
【００１０】
そして、本発明の音声認識装置においては、２つ以上のガイド要求操作が設定されている。このガイド要求操作は、入力期間指定手段には、前記音声入力期間の開始及び終了の指定操作とは異なる２つ以上のガイド要求操作が設定されており、当該ガイド要求操作それぞれに対して、熟知度合に応じたガイド要求レベルが設定されている。そして、ガイド手段は、入力期間指定手段によるガイド要求操作がされた場合には、対応するガイド要求レベルに応じた所定のガイド内容を報知する。ここで請求項１の発明では、前記ガイド手段は、前記ガイド内容として、前記音声入力手段を介した指示が可能な設定項目自体を報知する。また、請求項７の発明では、前記ガイド手段が報知する前記所定のガイド内容は、本音声認識装置による音声認識結果が利用されるシステムでの処理に関する内容であり、前記ガイド手段は、前記所定のガイド内容を報知する時点での前記システムでの処理実行段階に対応したガイド内容を報知するよう構成され、前記ガイド手段は、前記システムでの処理実行段階に対応したガイド内容として、その処理実行段階において前記音声入力手段を介した指示が可能な設定項目自体を報知する。
【００１１】
ここで、「音声入力期間の開始及び終了の指定操作とは異なるガイド要求操作」とは、例えば次のような場合を言う。例えば、入力期間指定手段による音声入力期間の開始及び終了の指定操作を、スイッチの操作開始から終了までが所定時間以上となる場合の、その操作開始及び終了にそれぞれ対応させた場合を想定する。この場合、スイッチ操作開始後に音声入力を行い、その後に操作終了となるため、スイッチ操作が継続している時間が例えば１秒以下となるようなことは想定しにくい。したがって、例えばスイッチ操作開始から終了までの時間が例えば０．５秒以下というような比較的短いクリック時間となる、いわゆる「クリック操作」を上述の「音声入力期間の開始及び終了の指定操作とは異なるガイド要求操作」としておけばよい。もちろん、装置側において通常の音声入力期間の開始及び終了の指定ではありえない様な特別の操作であれば、これ以外の操作もガイド要求のための操作であるとしてよい。例えば、クリック操作が所定間隔以内に２回連続して行われる「ダブルクリック操作」であってもよい。
【００１２】
このようにすれば、当該ガイド要求操作が２つ以上設定されている場合には、ガイドが必要な利用者は、自分の熟知度合に応じたガイド要求レベルでのガイド内容となるよう、対応するガイド要求操作をすればよく、一方、ガイドが必要ない利用者は、ガイド要求操作をせずに通常通り、入力期間指定手段による音声入力期間の開始及び終了の指定操作をしながら所定の音声入力をすればよい。
【００１３】
この場合の所定のガイド内容としては種々考えられるので、理解を容易にするため、ここでは、ナビゲーションシステムにおける目的地などを音声入力するために音声認識装置を適用した場合を想定していくつか説明する。
▲１▼まず、ナビゲーションシステムに対して自分が音声で何を入力することができるのか自体を知らない利用者が想定される。この場合にはクリック操作をすることで、例えば「目的地の設定が行えます」というように、音声入力によって指示が可能な設定項目自体を案内するものが考えられる。なお、音声入力によって指示する対象のシステム（この場合にはナビゲーションシステム）における一連の処理のどの段階にあるかで内容を変更してもよい。例えば、目的地までの経路設定を一連の処理と考えると、上述した目的地の設定がされた後、例えば経由地の指定はするのか、というような経路を設定する上での条件をさらに指定する場合がある。したがって、目的地の設定が済んでいる状況で、上述した音声入力期間の開始が指定された時点から所定時間経過しても音声入力がない場合には、今度は「経由地の指定が行えます」というようなガイド内容を報知すればよい。
【００１４】
▲２▼また、ナビゲーションシステムに対して自分が音声で何を入力することができるのか自体は知っているが、具体的な入力方法を熟知していない利用者も想定される。この場合には、「目的地を設定するときは都道府県名から入力して下さい」というように、入力方法の説明をガイド内容として報知することが考えられる。また、これでも利用者がどのように入力するかを完全には理解できず、入力できずにいる場合には、「例えば愛知県刈谷市昭和町と入力して下さい」というように、具体的な入力例をガイド内容として報知することが考えられる。これによって利用者は具体的な入力方法が判り、また、具体例まで報知してもらえばそれに倣って自分が希望する目的地を容易に入力することができる。
【００１５】
もちろん、上記▲１▼及び▲２▼で説明したガイド内容は一例であり、適用するナビゲーションシステムの処理に適合した適切なガイド内容を出せばよいし、またナビゲーションシステム以外に適用するのであれば、そのシステムの処理に適合するように工夫すればよい。
【００１６】
また、ガイドが必要な対象者として、初心者レベル及び中級者レベルの２つを持つようにした場合に、初心者レベルの場合にどのようなガイド内容にし、中級者の場合にどのようなガイド内容とするか、などについても、当該システムの使用方法の難易性に応じて適宜設定すればよい。例えば、初心者レベルではガイド要求操作がある度に、Ａ→Ｂ→Ｃという順番でガイド内容を報知するようにされている場合に、中級者レベルではＡ，Ｂ，Ｃのガイド内容をまとめて報知してしまうことも考えられる。もちろん、適用するシステムによっては、ガイドが必要な対象者を３つ以上のレベルに分類してもよい。
【００１７】
一方、ガイド要求操作をしない限りガイド内容が勝手に報知されてしまうことはないため、ガイドが必要でない利用者にとっては、入力期間指定手段による音声入力期間の開始及び終了の指定操作をしながら所定の音声入力をすれば、相対的に（つまりガイド内容を報知させている状態よりも）短時間で所望の音声入力を行うことができる。
【００１８】
このように、本発明の音声認識装置によれば、音声認識装置の利用方法に対する熟知度合が異なる各利用者のいずれにとっても使い勝手が良くなる。その上、ガイド要求操作を受け付けるために特別な構成を必要とするのではなく、音声入力のための基本的な処理に必要な入力期間指定手段をガイド要求操作にも利用できるため、構成の簡略化にも寄与する。
【００１９】
なお、ガイドが必要な対象者として、初心者レベル及び中級者レベルの２つを持つようにした場合のそれぞれのガイド要求操作としては、請求項２、８に示すようにすることが考えられる。すなわち、入力期間指定手段による音声入力期間の開始及び終了の指定操作を、スイッチの操作開始から終了までが所定時間以上となる場合の、その操作開始及び終了にそれぞれ対応させることを前提とし、初心者レベル及び中級者レベルに対するガイド要求操作として、クリック操作とダブルクリック操作を、各々いずれかのレベルの操作に対応させて設定するのである。
【００２０】
なお、請求項３、９に示すように、初心者用のガイドにおいて、所定のガイド内容を報知した後、音声入力がされることなくさらにクリック操作がなされた場合、従前に報知したガイド内容よりも詳細なガイド内容を報知するようにしてもよい。
【００２１】
もちろん、これらは一例であり、操作内容の違いが装置側で区別できればよい。なお、報知手段による認識結果の報知形態、あるいはガイド手段によるガイド内容の報知形態については、利用者に対して報知が可能であればどのような形態でもよいが、請求項４、１０に示すように、少なくとも音声による報知を行なうことが好ましいと考えられる。これは、例えばカーナビゲーションシステムなどの車載機器用として用いる場合には、音声で出力されればドライバーは視点を表示装置にずらしたりする必要がないので、安全運転のより一層の確保の点では有利であることなどの理由からである。また、認識結果の報知を音声で行ない、かつガイド内容の報知も音声で行なえば、それらのためのハード構成を共通化することができる。但し、音声に加え例えば画像で報知するようにしてもよい。車載機器として適用する場合に音声出力が有利であることを述べたが、もちろん車両が走行中でない状況もあるので、音声及び画像の両方で報知すれば、ドライバーは表示による確認と音声による確認との両方が可能となる。
【００２２】
また、請求項１〜１０のいずれかに記載の音声認識装置をナビゲーションシステム用として用いる場合には、請求項１１に示すように構成することが考えられる。すなわち、請求項１〜１０のいずれかに記載の音声認識装置と、ナビゲーション装置とを備え、音声認識装置の音声入力手段は、少なくともナビゲーション装置がナビゲート処理をする上で指定される必要のある所定のナビゲート処理関連データの指示を利用者が音声にて入力するために用いられるものであり、確定後処理手段は、認識手段による認識結果をナビゲーション装置に出力するよう構成されているのである。この場合の「所定のナビゲート処理関連データ」としては、目的地が代表的なものとして挙げられるが、それ以外にもルート探索に関する条件選択など、ナビゲート処理をする上で指定の必要のある指示が含まれる。
【００２３】
なお、音声認識装置の適用先としては、上述したナビゲーションシステムには限定されない。例えば音声認識装置を空調システム用として用いる場合には、設定温度の調整、空調モード（冷房・暖房・ドライ）の選択、あるいは風向モードの選択を音声入力によって行うようにすることが考えられる。そして、この場合には、その設定項目（温度・空調モード・風向モードなど）自体をガイド内容として報知したり、あるいは、「設定温度を２５度にする」と言えばよいのか「設定温度を５度下げる」というように言えばよいのか、などをさらにガイド内容として報知することが考えられる。空調モードや風向モードなどについても同様である。
【００２４】
なお、上述のナビゲーションシステム及び空調システムは、車載機器として用いられる場合だけではなく、例えば携帯型ナビゲーション装置や屋内用空調装置などでもよい。但し、これまで説明したように車載機器用として用いる場合には利用者がドライバーであることが考えられ、その場合には運転自体が最重要であり、それ以外の車載機器については、なるべく運転に支障がないことが好ましい。したがって、車載機器としてのナビゲーションシステムや空調システムを前提とした音声認識装置の場合には、より一層の利点がある。もちろん、このような視点で考えるならば、ナビゲーションシステムや空調システム以外の車載機器に対しても同様に利用することができる。例えば、カーオーディオ機器などは有効である。また、いわゆるパワーウインドウの開閉やミラー角度の調整などを音声によって指示するような構成を考えれば、そのような状況でも有効である。
【００２５】
また、車載機器用とした場合にはそれ特有の利点があることは述べたが、本発明の音声認識装置の適用先としては、利用者による音声入力指示にしたがって所定の処理を実行するものであれば同様に考えられる。例えば、携帯用の情報端末装置、あるいは街頭やパーキングエリアなどに設定される情報端末装置などにも同様に適用できる。
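[0026]
[Embodiments of the invention]
FIG. 1 is a block diagram showing the schematic configuration of a car navigation system 2 to which a voice recognition device 30 according to an embodiment of the present invention is applied. The car navigation system 2 includes a position detector 4, a map data input device 6, an operation switch group 8, a control circuit 10 connected to these, and an external memory 12, a display device 14, a remote control sensor 15, and the voice recognition device 30, each connected to the control circuit 10. The control circuit 10 is configured as an ordinary computer and contains a well-known CPU, ROM, RAM, I/O, and a bus line connecting these components.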
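[0027]
The position detector 4 has a well-known geomagnetic sensor 16, a gyroscope 18, a distance sensor 20, and a GPS receiver 22 for GPS (Global Positioning System), which detects the position of the vehicle based on radio waves from satellites.
Because each of these sensors 16, 18, 20, and 22 has errors of a different nature, the detector is configured to use several of them so that they complement one another. Depending on the required accuracy, the detector may be made up of only some of the sensors above, and a steering rotation sensor, wheel sensors for the individual rolling wheels, and the like may also be used.
[0028]
The map data input device 6 is a device for inputting various data, including so-called map-matching data for improving position detection accuracy, map data, and landmark data. Given the volume of data, a CD-ROM is generally used as the medium, but other media such as a memory card may be used.
[0029]
The display device 14 is a color display. Its screen can show, superimposed on one another, the vehicle's current position mark input from the position detector 4, the map data input from the map data input device 6, and additional data such as the guidance route drawn on the map and the marks for set points described later.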
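[0030]
The car navigation system 2 also has a so-called route guidance function: when the position of a destination is input from the remote control sensor 15 via a remote control terminal (hereinafter, remote control) 15a, or from the operation switch group 8, the system automatically selects the optimal route from the current position to that destination, and forms and displays a guidance route. Dijkstra's method and similar techniques are known for setting an optimal route automatically in this way. The operation switch group 8 consists of, for example, touch switches integrated with the display device 14 or mechanical switches, and is used for various inputs.
[0031]
Whereas the operation switch group 8 and the remote control 15a are used to specify a destination and the like by manual operation, the voice recognition device 30 is a device that lets the user specify the destination and the like equally well by voice.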
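[0032]
The voice recognition device 30 includes a voice recognition unit 31 serving as the "recognition means", a dialogue control unit 32 serving as the "post-confirmation processing means", a voice synthesis unit 33, a voice extraction unit 34, a microphone 35 serving as the "voice input means", a PTT (Push-To-Talk) switch 36 serving as the "input period designating means", a speaker 37, a control unit 38, a display control unit 39, and a display unit 40.
[0033]
The voice recognition unit 31 performs recognition processing on the voice data input from the voice extraction unit 34 in response to an instruction from the dialogue control unit 32, and returns the recognition result to the dialogue control unit 32. That is, it collates the voice data obtained from the voice extraction unit 34 against the stored dictionary data, compares it with a plurality of comparison target pattern candidates, and outputs the top-ranked comparison target patterns with a high degree of match to the dialogue control unit 32. To recognize the word sequence in the input voice, the voice data input from the voice extraction unit 34 is acoustically analyzed in sequence to extract acoustic features (for example, cepstrum), yielding time-series data of acoustic features. Then, using the well-known DP matching method, an HMM (hidden Markov model), a neural network, or the like, this time-series data is divided into several segments, and the word stored in the dictionary data to which each segment corresponds is determined.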
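As a rough illustration of the DP matching named in the paragraph above, the following sketch compares an input feature sequence with stored word templates by dynamic time warping and ranks the dictionary words by match. The function names, the one-dimensional features, and the toy templates are assumptions made for illustration; the description does not specify the matching procedure in this detail.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic-programming (DTW) alignment cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, dictionary):
    """Return dictionary words ranked by ascending DTW distance (best match first)."""
    return sorted(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))


# Hypothetical templates; a real system would store cepstral vectors per frame.
TEMPLATES = {
    "aichi": [1.0, 3.0, 4.0, 3.0, 1.0],
    "tokyo": [5.0, 5.0, 2.0, 1.0, 1.0],
}
```

Calling `recognize(features, TEMPLATES)` returns the candidate words in order of match, which corresponds loosely to handing the top-ranked comparison target patterns to the dialogue control unit 32.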
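[0034]
Based on the recognition result from the voice recognition unit 31 and on instructions from the control unit 38, the dialogue control unit 32 instructs the voice synthesis unit 33 to output a response voice, instructs the display control unit 39 to output a response display, or instructs the control circuit 10, which executes the processing of the system itself, to carry out setting processing by, for example, notifying it of the destination needed for navigation processing. Such processing is the post-confirmation processing. As a result, with this voice recognition device 30, the user can specify a destination and the like to the navigation system by voice input, without manually operating the operation switch group 8 or the remote control 15a.
[0035]
The voice synthesis unit 33 uses the voice waveforms stored in a waveform database to synthesize voice in accordance with the response voice output instruction from the dialogue control unit 32. This synthesized voice is output from the speaker 37. The display control unit 39 generates a display image in accordance with the response display output instruction from the dialogue control unit 32, and this image is output to the display unit 40.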
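[0036]
The voice extraction unit 34 converts the ambient sound picked up by the microphone 35 into digital data and outputs it to the voice recognition unit 31. More specifically, to analyze the features of the input sound, it cuts out frame signals of, for example, several tens of milliseconds at regular intervals and determines whether each input signal is a voice section containing speech or a noise section containing none. This determination is made because the signal from the microphone 35 contains noise as well as the speech to be recognized. Many determination methods have been proposed; a commonly adopted one extracts the short-time power of the input signal at fixed intervals and judges a voice section or a noise section according to whether short-time power at or above a predetermined threshold has continued for at least a certain period. When a voice section is determined, the input signal is output to the voice recognition unit 31.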
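The short-time power criterion described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function names, the threshold, and the required run length are assumed values, and each frame is taken to be a list of samples already cut out at regular intervals.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def is_voice_section(frames, power_threshold, min_frames):
    """True if at least `min_frames` consecutive frames reach the power threshold."""
    run = 0
    for frame in frames:
        if short_time_power(frame) >= power_threshold:
            run += 1
            if run >= min_frames:
                return True
        else:
            run = 0  # the run of loud frames was interrupted
    return False
```

Requiring a run of consecutive frames, rather than a single loud frame, is what keeps a short noise burst from being treated as a voice section.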
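[0037]
In this embodiment, the user inputs voice through the microphone 35 while pressing the PTT switch 36. Specifically, the control unit 38 monitors the timing at which the PTT switch 36 is pressed, the timing at which it is released, and the time for which it remains pressed, and when the PTT switch 36 is pressed it instructs the voice extraction unit 34 and the voice recognition unit 31 to execute their processing. When the PTT switch 36 is not pressed, that processing is not executed. Consequently, the voice data input through the microphone 35 while the PTT switch 36 is pressed is output to the voice recognition unit 31.
[0038]
The control unit 38 can also determine that the PTT switch 36 has been clicked or double-clicked. Specifically, if the PTT switch 36 is turned off within a relatively short time (for example, within 0.5 seconds) after being turned on, this is regarded as a click operation. If the click operation is performed twice in succession within a predetermined interval (for example, within 0.5 seconds), this is regarded as a double-click operation.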
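[0039]
Next, the operation of the car navigation system 2 of this embodiment is described. Because the feature lies in the part related to the voice recognition device 30, the general operation of the car navigation system 2 is described briefly first, followed by a detailed description of the operation of the part related to the voice recognition device 30.
[0040]
After the car navigation system 2 is powered on, the following processing is carried out when the driver uses the remote control 15a (the operation switch group 8 can be used in the same way; the same applies below) to select route information display processing from the menu shown on the display device 14 in order to display a guidance route on the display device 14, or when the driver speaks the desired menu item into the microphone 35 so that the voice recognition device 30, through the dialogue control unit 32, gives the control circuit 10 the same instruction as a selection made with the remote control 15a.
[0041]
That is, when the driver inputs a destination by voice or by operating the remote control or the like, based on the map on the display device 14, the current location of the vehicle is obtained from the satellite data provided by the GPS receiver 22, and the cost between the destination and the current location is calculated by Dijkstra's method to find the shortest route from the current location to the destination as the guidance route. The guidance route is then displayed superimposed on the road map on the display device 14 to guide the driver along an appropriate route. The calculation and guidance processing for obtaining such a guidance route is generally well known, so its description is omitted.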
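Since the description omits the route calculation as well known, the following is only a generic sketch of Dijkstra's method on a small road graph. The graph layout, the node names, and the function name `shortest_route` are hypothetical; link costs stand in for road distances.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's method: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # a cheaper route to this node was already expanded
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical road network: node -> {neighbor: link cost}.
ROADS = {
    "current": {"A": 2.0, "B": 5.0},
    "A": {"B": 1.0, "destination": 6.0},
    "B": {"destination": 2.0},
}
```

With this toy network, `shortest_route(ROADS, "current", "destination")` picks the route through A and B because its accumulated cost is lower than either direct alternative.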
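[0042]
Next, the operation of the voice recognition device 30 is described, taking as an example the voice input of a destination for the route guidance described above.
In the first step S110, it is determined whether the PTT switch 36 has been turned on (pressed). This determination is made by the control unit 38. If the PTT switch 36 has been turned on (S110: YES), it is determined in the following S120 whether the operation is a click operation. This determination is also made by the control unit 38; specifically, if the PTT switch 36 is turned off within a relatively short time (for example, within 0.5 seconds) after being turned on, the operation is regarded as a click operation. If a click operation has been performed (S120: YES), it is determined in the following S130 whether the operation is a double-click operation. This determination is also made by the control unit 38; specifically, if the click operation is performed twice in succession within a predetermined interval (for example, within 0.5 seconds), the operation is regarded as a double-click operation.
[0043]
If the determination in S130 is negative, a click operation is judged to have been performed, and the process moves to S140 to announce the guide content for beginners. If the determination in S130 is affirmative, a double-click operation is judged to have been performed, and the process moves to S150 to announce the guide content for intermediate users.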
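[0044]
The guide contents for beginners and intermediate users are described later; the description of the flowchart of FIG. 2 continues here.
If the determination in S120 is negative, that is, the operation is not a click operation, the PTT switch 36 is being held down for the predetermined time or longer, so the process moves to S160 to execute voice extraction and recognition. The voice extraction unit 34 determines, based on the voice data input through the microphone 35, whether each section is a voice section or a noise section, and outputs the data of the voice sections to the voice recognition unit 31. The voice recognition unit 31 collates the voice data obtained from the voice extraction unit 34 against the stored dictionary data, and outputs the top-ranked comparison target patterns determined by the collation to the dialogue control unit 32 as the recognition result.
[0045]
While the PTT switch 36 remains on (S170: YES), this voice extraction and recognition (S160) is executed. When the PTT switch 36 is turned off (S170: NO), the recognition result is talked back and displayed in the following S180. That is, the dialogue control unit 32 controls the voice synthesis unit 33 and the display control unit 39 so that the recognized result is output as voice from the speaker 37 and text showing the recognition result is displayed on the display unit 40.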
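[0046]
Then, in S190, it is determined whether the recognition is correct. This is determined according to an instruction from the user; for example, the user may say "yes" into the microphone 35 if the recognition is correct and "no" if it is wrong. These instructions may of course also be input through the operation switch group 8.
[0047]
If the recognition is wrong (S190: NO), the process returns to S110. If it is correct (S190: YES), the process moves to S200 to confirm the recognition result, and in the following S210 the predetermined post-confirmation processing is executed. The post-confirmation processing in this case is, for example, processing that outputs the data on the "destination for route guidance" obtained as the recognition result to the control circuit 10 (see FIG. 1).
[0048]
After this post-confirmation processing (S210) ends, the process moves to S110 and returns to determining whether the PTT switch 36 has been turned on (pressed).
In this embodiment, the dialogue control unit 32, the voice synthesis unit 33, the control unit 38, the display control unit 39, and the display unit 40 correspond to the "guide means".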
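To make the branching of steps S110 to S210 easier to follow, one pass through the flow for a single PTT press might be sketched as below. The function name, its parameters, and the callback interfaces are assumptions for illustration; the 0.5 second value is the example given in the description.

```python
CLICK_MAX_S = 0.5  # example click threshold from the description


def handle_ptt_press(held_s, followed_by_second_click,
                     recognize, confirm, post_process, announce):
    """One pass through steps S110-S210 for a single PTT press."""
    if held_s <= CLICK_MAX_S:                # S120: was it a click operation?
        if followed_by_second_click:         # S130: was it a double click?
            announce("intermediate guide")   # S150
            return "guide_intermediate"
        announce("beginner guide")           # S140
        return "guide_beginner"
    result = recognize()                     # S160-S170: extract and recognize while held
    announce(result)                         # S180: talk back and display
    if not confirm(result):                  # S190: user rejected the result
        return "rejected"                    # caller returns to S110
    post_process(result)                     # S200-S210: confirm and hand over the result
    return "confirmed"
```

In this sketch `post_process` plays the role of passing the confirmed destination to the control circuit 10, and `announce` stands in for both the speaker 37 and the display unit 40.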
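[0049]
The guide contents for beginners and intermediate users announced in S140 and S150 are now described.
First, the guide for beginners.
(1) This guide is aimed at users who do not know what they can input to the navigation system by voice in the first place. It may therefore present the setting items that can be specified by voice input, for example "You can set the destination".
[0050]
The content may be changed according to which stage of a series of processes the target system instructed by voice input (here, the navigation device) has reached. For example, if route setting to the destination is regarded as one series of processes, then after the destination has been set, further conditions for setting the route, such as whether to specify waypoints, may be specified. Therefore, if a click operation is performed when the destination has already been set, guide content such as "You can specify waypoints" should be announced this time.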
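[0051]
(2) Even a user who has received the above guide and understands what can be input to the navigation system may not know the specific input method. Therefore, if a further click operation is performed without any voice input after the guide "You can set the destination", an explanation of the input method, such as "When setting a destination, please start with the prefecture name", may be announced as guide content.
[0052]
Even then, the user may not fully understand how to make the input and may still be unable to do so. Therefore, if a further click operation is performed without any voice input after the guide "When setting a destination, please start with the prefecture name", a concrete input example, such as "For example, please input 'Aichi Prefecture, Kariya City, Showa-cho'", may be announced as guide content. This lets the user understand the specific input method, and with a concrete example to follow, the user can easily input the desired destination.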
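The stepwise beginner guide described above, where each further click without voice input yields more detailed content, could be modeled as in the sketch below. The class, the stage keys, and the choice to repeat the last message once the list is exhausted are assumptions; the message texts are renderings of the examples in the description.

```python
BEGINNER_GUIDES = {
    "destination": [
        "You can set the destination.",
        "When setting a destination, please start with the prefecture name.",
        "For example, please input 'Aichi Prefecture, Kariya City, Showa-cho'.",
    ],
    "waypoint": ["You can specify waypoints."],
}


class BeginnerGuide:
    """Returns progressively more detailed guide content on repeated clicks."""

    def __init__(self, guides):
        self._guides = guides
        self._stage = None
        self._depth = 0

    def on_click(self, stage):
        if stage != self._stage:
            # A new processing stage starts again from the most basic guide.
            self._stage, self._depth = stage, 0
        messages = self._guides[stage]
        message = messages[min(self._depth, len(messages) - 1)]
        self._depth += 1
        return message

    def on_voice_input(self):
        # Voice input ends the escalation; the next click starts over.
        self._depth = 0
```

Keying the messages by processing stage reflects the point that the guide changes once the destination has been set, for example to the waypoint guide.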
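[0053]
Next, the guide for intermediate users.
In the beginner guide described above, guidance for destination setting is given step by step with each click operation: "You can set the destination" → "You can specify waypoints" → and so on. For intermediate users, by contrast, these may be announced together. Alternatively, the basic guide that a destination can be set may be omitted, and only the conditions that can be specified in that setting (that is, waypoints, road type, and so on) may be announced.
[0054]
Beginner guide content such as (2) above need not be announced for intermediate users.
Thus, with the voice recognition device 30 of this embodiment, a beginner who does not even know what may be input by voice can perform a click operation to receive the guidance a "beginner" needs at each point, which makes the device very easy to use. An intermediate user who understands the basics but not the finer points can perform a double-click operation to receive the guidance an "intermediate user" needs at each point, which likewise makes the device very easy to use. Because this guidance is not sequential like the beginner guide, it is not tedious either.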
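[0055]
On the other hand, guide content is never announced unless the user performs a click or double-click operation to request it. A user who does not need guidance can therefore, as usual, make the desired voice input while pressing the PTT switch 36, and complete it in a relatively short time (that is, shorter than when guide content is announced).
[0056]
As described above, the voice recognition device 30 of this embodiment is easy to use for every user, whatever their level of familiarity with how to use the voice recognition device 30. Moreover, no special configuration is needed to accept the guide request operation: the PTT switch 36 and the control unit 38, which are needed for the basic processing of voice input, also serve for the guide request operation, which contributes to simplifying the configuration.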
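[0057]
The present invention is in no way limited to the embodiment described above and can be implemented in various forms without departing from the gist of the invention.
(1) For example, the above embodiment was described using guide content for destination setting, but guide content can be set in the same way for map enlargement and reduction, orientation (north-up, heading-up, and so on), and any other item that can be input by voice in the car navigation system.
[0058]
(2) In the above embodiment, a long press of the PTT switch 36 is used as the operation designating the voice input period, but a click operation or a double-click operation may instead be used to designate the start and end of the voice input period. In that case, an operation in which the time from the start to the end of the switch operation is a predetermined time or longer is used as the guide request operation. The guide content then begins to be announced once the switch operation has continued for the predetermined time, and the announcement ends when the switch operation ends. The user's voice input after the guide content has been announced is then recognized.
[0059]
(3) In the above embodiment, two user levels, beginner and intermediate, are targeted by the guide, but three or more may be set.
(4) In the above embodiment, the recognition result and the guide content are announced by both voice and image, but any form may be used as long as the user can be notified; if only one is adopted, voice is preferable. This reflects the use of the device in on-vehicle equipment, the car navigation system 2: with voice output, the driver does not need to shift their gaze to the display device, which is advantageous for further ensuring safe driving. Of course, if both voice and image are used as in the above embodiment, the driver can check both by display and by voice.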
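[0060]
(5) The application of the voice recognition device 30 is not limited to the navigation system described above. For example, when the voice recognition device is used for an air conditioning system, the set temperature may be adjusted, the air conditioning mode (cooling, heating, dry) selected, or the airflow direction mode selected by voice input. In that case, the setting items themselves (temperature, air conditioning mode, airflow direction mode, and so on) may be announced as guide content, and further guide content may indicate whether the user should say "Set the temperature to 25 degrees" or "Lower the set temperature by 5 degrees". The same applies to the air conditioning mode and the airflow direction mode.
Furthermore, use is not limited to navigation systems and air conditioning systems as on-vehicle equipment; the device may also be used outside vehicles, for example in portable navigation devices or indoor air conditioners. However, as described so far, when the device is used in on-vehicle equipment the user is likely to be the driver, for whom driving itself is the top priority, and it is preferable that operating other on-vehicle equipment interfere with driving as little as possible. The voice recognition device 30 therefore offers even greater advantages when used with on-vehicle equipment such as the car navigation system 2 or an air conditioning system.
[0061]
From this viewpoint, the device can of course be used in the same way for on-vehicle equipment other than navigation and air conditioning systems; car audio equipment is one example where it is effective. In addition, in a configuration where opening and closing power windows, adjusting mirror angles, and the like are instructed by voice, the device can be applied to those control targets in the same way and is likewise effective.
[0062]
Although the particular advantages for on-vehicle equipment have been described, the voice recognition device 30 can be applied in the same way to anything that executes predetermined processing in accordance with a user's voice input instruction. For example, it can be applied to portable information terminals, or to information terminals installed on streets, in parking areas, and the like.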
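[Brief description of the drawings]
[FIG. 1] A block diagram showing the schematic configuration of a car navigation system as an embodiment of the present invention.
[FIG. 2] A flowchart showing the processing executed by the voice recognition device.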
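[Explanation of reference signs]
2: car navigation system
4: position detector
6: map data input device
8: operation switch group
10: control circuit
12: external memory
14: display device
15: remote control sensor
15a: remote control
16: geomagnetic sensor
18: gyroscope
20: distance sensor
22: GPS receiver
30: voice recognition device
31: voice recognition unit
32: dialogue control unit
33: voice synthesis unit
34: voice extraction unit
35: microphone
36: PTT switch
37: speaker
38: control unit
39: display control unit
40: display unit
[0001]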
[Technical field of the invention]
The present invention relates to a voice recognition device that is effective, for example, for enabling voice input of destination settings and the like in a navigation system, and to a navigation system including the voice recognition device.
[0002]
[Prior art]
Voice recognition devices that compare input voice with a plurality of comparison target pattern candidates stored in advance and take the candidate with a high degree of match as the recognition result are already in practical use; in a navigation system, for example, they let the user input the destination to be set by speaking a place name. Voice input is particularly effective when the driver personally uses an on-vehicle navigation system, because it involves no button operation or looking at the screen and is therefore highly safe even while the vehicle is moving.
[0003]
Because it is burdensome for the voice recognition device to stay ready for voice input at all times, a configuration is often adopted in which the user specifies the start and end of the period during which the voice to be recognized is input. A PTT (Push-To-Talk) switch, for example, is used as the means for designating this input period. That is, recognition processing is performed on the input voice only while the PTT switch is pressed, so that input voice that does not actually need processing is not processed.
[0004]
Although voice input is thus highly useful, the following problems arise when the user is a beginner unfamiliar with the voice recognition system itself. For example, because the user does not know how to use the device, the user may input arbitrary voice that is unsuitable for the device, so that it is not effective, or in extreme cases may not use the device at all. A user who understands the usage to some extent but is not accustomed to the specifics may also fall silent because the words to input do not come to mind immediately.
[0005]
As a countermeasure for such beginners, predetermined guide content could be announced. For example, the guide could indicate what commands can be input; specifically, guide content such as "You can set the destination and enlarge or reduce the map" is announced. Alternatively, when the user does not know the specific way to input a command, an explanation of the input method such as "When setting a destination, please start with the prefecture name" could be announced as guide content.
[0006]
[Problems to be solved by the invention]
However, when such measures for beginners are taken, guide content that is useful to a beginner is unnecessary for an expert. That is, for a user who already knows the available commands and the input method without hearing such guide content, inconveniences arise, such as being unable to input voice while the guide content is being output.
[0007]
There are also "intermediate" users, who have not reached the level of an expert needing no guide content at all, but who can manage by themselves once given a brief outline; they do not need the same guide content as a beginner. Moreover, beginners, intermediate users, and experts are not necessarily different people: the same person starts as a beginner and then steps up to intermediate user and expert. It is therefore undesirable for the device to be fixed to a system setting for beginners, intermediate users, or experts according to the user's level. Instead, the system is expected to allow the beginner input method, the intermediate input method, or the expert input method to be selected freely according to the situation at the time of use.
[0008]
An object of the present invention is to solve these problems and provide a voice recognition device that is easy to use for every user, whatever their level of familiarity with how to use the device, and a navigation system including the voice recognition device.
[0009]
[Means for Solving the Problems and Effects of the Invention]
According to the voice recognition device of claims 1 and 7, made to achieve the above object, the user inputs voice through voice input means such as a microphone. Input period designating means is provided so that the user can designate the start and end of the period during which the voice to be recognized is input, and the voice input within the input period designated by this means becomes the recognition target. Recognition means compares the input voice with a plurality of comparison target pattern candidates stored in advance in dictionary means and takes the candidate with a high degree of match as the recognition result, and notification means announces the recognition result. When a predetermined confirmation instruction is given after the recognition result has been announced, post-confirmation processing means treats the recognition result as confirmed and executes predetermined post-confirmation processing.
[0010]
In the voice recognition device of the present invention, two or more guide request operations are set. Specifically, two or more guide request operations that differ from the operation designating the start and end of the voice input period are set for the input period designating means, and a guide request level corresponding to the degree of familiarity is set for each guide request operation. When a guide request operation is performed with the input period designating means, guide means announces predetermined guide content according to the corresponding guide request level. In the invention of claim 1, the guide means announces, as the guide content, the setting items themselves that can be specified through the voice input means. In the invention of claim 7, the predetermined guide content announced by the guide means relates to processing in the system that uses the recognition result of the voice recognition device; the guide means is configured to announce guide content corresponding to the processing stage the system has reached at the time of the announcement; and, as that guide content, the guide means announces the setting items themselves that can be specified through the voice input means at that processing stage.
[0011]
Here, a "guide request operation that differs from the operation designating the start and end of the voice input period" means, for example, the following. Suppose the start and end of the voice input period are designated by the start and end of a switch operation that lasts a predetermined time or longer. In this case, voice is input after the switch operation starts and the operation ends afterwards, so the switch operation is unlikely to last only, say, 1 second or less. Therefore, a so-called "click operation", in which the time from the start to the end of the switch operation is relatively short, for example 0.5 seconds or less, can serve as the "guide request operation that differs from the operation designating the start and end of the voice input period". Of course, any other special operation that the device cannot mistake for the normal designation of the start and end of a voice input period may also serve as a guide request operation; for example, a "double-click operation", in which the click operation is performed twice in succession within a predetermined interval.
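One way to tell these operations apart from press and release timestamps is sketched below. It is an illustration under stated assumptions: the 0.5 second values are the examples from the text, the function name is hypothetical, and measuring the double-click interval from the first release to the second press is a choice made here, since the text does not say how the interval is measured.

```python
CLICK_MAX_S = 0.5         # example: a press this short counts as a click
DOUBLE_CLICK_GAP_S = 0.5  # example: maximum gap between two clicks


def classify_presses(presses):
    """Classify (press_time, release_time) pairs into switch operations."""
    operations = []
    i = 0
    while i < len(presses):
        pressed, released = presses[i]
        if released - pressed > CLICK_MAX_S:
            operations.append("voice_input")  # long press: input period designation
            i += 1
            continue
        # Short press: look ahead for a second click close behind it.
        if i + 1 < len(presses):
            next_pressed, next_released = presses[i + 1]
            is_click = next_released - next_pressed <= CLICK_MAX_S
            if is_click and next_pressed - released <= DOUBLE_CLICK_GAP_S:
                operations.append("double_click")
                i += 2
                continue
        operations.append("click")
        i += 1
    return operations
```

A real controller would have to wait for the double-click interval to expire before reporting a single click; this batch version sidesteps that by classifying a recorded list of presses.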
[0012]
In this way, when two or more guide request operations are set, a user who needs guidance performs the guide request operation that yields guide content at the guide request level matching their familiarity, while a user who does not need guidance makes the desired voice input as usual, designating the start and end of the voice input period with the input period designating means, without performing a guide request operation.
[0013]
Various kinds of predetermined guide content are conceivable. To make them easier to understand, some are described here assuming that the voice recognition device is applied to voice input of a destination and the like in a navigation system.
(1) First, consider a user who does not know what can be input to the navigation system by voice in the first place. In this case, a click operation could produce a guide to the setting items themselves that can be specified by voice input, for example "You can set the destination". The content may be changed according to which stage of a series of processes the target system instructed by voice input (here, the navigation system) has reached. For example, if route setting to the destination is regarded as one series of processes, then after the destination has been set, further conditions for setting the route, such as whether to specify waypoints, may be specified. Therefore, if the destination has already been set and no voice is input even after a predetermined time has elapsed since the start of the voice input period was designated, guide content such as "You can specify waypoints" should be announced this time.
[0014]
(2) Next, consider a user who knows what can be input to the navigation system by voice but is not familiar with the specific input method. In this case, an explanation of the input method, such as "When setting a destination, please start with the prefecture name", could be announced as guide content. If the user still does not fully understand how to make the input and remains unable to do so, a concrete input example, such as "For example, please input 'Aichi Prefecture, Kariya City, Showa-cho'", could be announced as guide content. This lets the user understand the specific input method, and with a concrete example to follow, the user can easily input the desired destination.
[0015]
Of course, the guide content described in (1) and (2) above is only an example. Guide content suited to the processing of the navigation system in question should be provided, and when the device is applied to something other than a navigation system, the content should be devised to suit the processing of that system.
[0016]
When two levels of users needing guidance are provided, beginner and intermediate, what guide content to give at each level can likewise be set as appropriate according to how difficult the system is to use. For example, if at the beginner level the guide content is announced in the order A → B → C, one item for each guide request operation, then at the intermediate level the guide contents A, B, and C could be announced together. Of course, depending on the system, users needing guidance may be classified into three or more levels.
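The difference between the two levels in this example can be written down in a few lines. The sketch assumes a click requests the beginner level and a double click the intermediate level, which is one of the assignments the text allows; the names and the wrap-around after the last item are illustrative choices, not part of the description.

```python
# Assumed assignment of guide request operations to guide request levels.
LEVEL_BY_OPERATION = {"click": "beginner", "double_click": "intermediate"}


def guide_for(operation, items, request_count=0):
    """Beginner: one item per request, in order. Intermediate: all items at once."""
    level = LEVEL_BY_OPERATION[operation]
    if level == "beginner":
        # Wrapping around after the last item is an assumption of this sketch.
        return [items[request_count % len(items)]]
    return list(items)
```

For instance, three successive clicks would yield A, then B, then C, while a single double click would yield A, B, and C together.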
[0017]
On the other hand, guide content is never announced unless a guide request operation is performed. A user who does not need guidance can therefore make the desired voice input while designating the start and end of the voice input period with the input period designating means, and complete it in a relatively short time (that is, shorter than when guide content is announced).
[0018]
Thus, the voice recognition device of the present invention is easy to use for every user, whatever their level of familiarity with how to use the device. Moreover, no special configuration is needed to accept the guide request operation: the input period designating means, which is needed for the basic processing of voice input, also serves for the guide request operation, which contributes to simplifying the configuration.
[0019]
When two levels of users needing guidance are provided, beginner and intermediate, the respective guide request operations may be as set out in claims 2 and 8. That is, on the premise that the start and end of the voice input period are designated by the start and end of a switch operation lasting a predetermined time or longer, a click operation and a double-click operation are each assigned to one of the two levels as the guide request operations for the beginner level and the intermediate level.
[0020]
As set out in claims 3 and 9, in the guide for beginners, if a further click operation is performed without any voice input after predetermined guide content has been announced, guide content more detailed than that previously announced may be announced.
[0021]
Of course, these are only examples; it suffices that the device can distinguish the operations. The recognition result from the notification means and the guide content from the guide means may be announced in any form that reaches the user, but as set out in claims 4 and 10, it is considered preferable to announce at least by voice. One reason is that, in on-vehicle equipment such as a car navigation system, voice output means the driver does not need to shift their gaze to the display device, which is advantageous for further ensuring safe driving. In addition, if both the recognition result and the guide content are announced by voice, the hardware for them can be shared. Announcement by image, for example, may nevertheless be added to voice. Voice output is advantageous in on-vehicle equipment, but the vehicle is not always moving, and if both voice and image are used, the driver can check both by display and by voice.
[0022]
When the voice recognition device of any one of claims 1 to 10 is used for a navigation system, it may be configured as set out in claim 11. That is, the system includes the voice recognition device of any one of claims 1 to 10 and a navigation device; the voice input means of the voice recognition device is used by the user to input by voice at least instructions for predetermined navigation-related data that must be specified for the navigation device to perform navigation processing; and the post-confirmation processing means is configured to output the recognition result from the recognition means to the navigation device. The destination is a typical example of the "predetermined navigation-related data", which also include other instructions that must be specified for navigation processing, such as the selection of conditions for route search.
[0023]
Note that the application of the voice recognition device is not limited to the navigation system described above. For example, when the voice recognition device is used for an air conditioning system, it is conceivable to adjust the set temperature, select the air conditioning mode (cooling / heating / dry), or select the wind direction mode by voice input. In this case, the setting items themselves (temperature, air conditioning mode, wind direction mode, and so on) may be notified as guide content, and it is also conceivable to notify, as further guide content, whether the user should say “set the temperature to 25 degrees” or “lower the set temperature by 5 degrees”. The same applies to the air conditioning mode and the wind direction mode.
[0024]
The navigation system and the air conditioning system described above are not limited to use as in-vehicle devices; they may be, for example, a portable navigation device or an indoor air conditioner. However, as described above, when the device is used for in-vehicle equipment the user is likely to be the driver. In that case, driving itself is the most important task, and it is preferable that operating other in-vehicle devices interferes with driving as little as possible. A voice recognition device intended for a navigation system or an air conditioning system used as an in-vehicle device therefore offers a further advantage. From this point of view, the device can of course also be used for in-vehicle devices other than navigation systems and air conditioning systems. For example, it is effective for car audio equipment. It is likewise effective in a configuration in which the opening and closing of power windows or the adjustment of mirror angles is instructed by voice.
[0025]
In addition, although a particular advantage in in-vehicle use has been described, the voice recognition device of the present invention can be applied in the same way to any device that executes predetermined processing in accordance with a voice input instruction from the user. For example, the present invention can be similarly applied to a portable information terminal device or to an information terminal device installed on a street or in a parking area.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a block diagram showing a schematic configuration of a car navigation system 2 to which a voice recognition device 30 according to an embodiment of the present invention is applied. The car navigation system 2 includes a position detector 4, a map data input device 6, an operation switch group 8, a control circuit 10 connected to these, an external memory 12, a display device 14, and a remote control sensor 15 connected to the control circuit 10, and a voice recognition device 30. The control circuit 10 is configured as an ordinary computer and includes a well-known CPU, ROM, RAM, I/O, and a bus line connecting these components.
[0027]
The position detector 4 includes a well-known geomagnetic sensor 16, a gyroscope 18, a distance sensor 20, and a GPS receiver 22 for GPS (Global Positioning System), which detects the position of the vehicle based on radio waves from satellites.
Since these sensors 16, 18, 20, and 22 have errors of different natures, they are used together so that they complement one another. Depending on the required accuracy, only some of them may be used, and a steering rotation sensor, wheel sensors on the individual wheels, or the like may be added.
[0028]
The map data input device 6 is a device for inputting various data including so-called map matching data, map data, and landmark data for improving the accuracy of position detection. As a medium, a CD-ROM is generally used because of the amount of data, but another medium such as a memory card may be used.
[0029]
The display device 14 is a color display device. On its screen, the mark for the current vehicle position input from the position detector 4, the map data input from the map data input device 6, and additional data to be shown on the map, such as the guidance route and the landmarks of set points described later, can be displayed superimposed on one another.
[0030]
Further, the car navigation system 2 has a so-called route guidance function: when the position of a destination is input from the remote control sensor 15 via a remote control terminal (hereinafter referred to as a remote controller) 15a, or from the operation switch group 8, the optimal route from the current position to that destination is selected automatically, and a guidance route is formed and displayed. The Dijkstra method is one known method of automatically setting an optimal route. The operation switch group 8 consists of, for example, touch switches integrated with the display device 14 or mechanical switches, and is used for various inputs.
[0031]
Whereas the operation switch group 8 and the remote controller 15a are used to specify a destination and the like manually, the voice recognition device 30 allows the user to specify the destination and the like in the same way by voice input.
[0032]
The voice recognition device 30 includes a voice recognition unit 31 as the “recognition means”, a dialogue control unit 32 as the “post-confirmation processing means”, a voice synthesis unit 33, a voice extraction unit 34, a microphone 35 as the “voice input means”, a PTT (Push-To-Talk) switch 36 as the “input period designating means”, a speaker 37, a control unit 38, a display control unit 39, and a display unit 40.
[0033]
The voice recognition unit 31 performs recognition processing on the voice data input from the voice extraction unit 34 in accordance with instructions from the dialogue control unit 32, and returns the recognition result to the dialogue control unit 32. That is, the voice data acquired from the voice extraction unit 34 is collated against the stored dictionary data, and the top-ranked comparison target patterns, those with the highest degree of coincidence among the plurality of comparison target pattern candidates, are output to the dialogue control unit 32. To recognize the word sequence in the input voice, the voice data input from the voice extraction unit 34 is acoustically analyzed in sequence to extract acoustic feature quantities (for example, cepstrum), which yields time-series data of those feature quantities. The time-series data is then divided into sections by a known method such as DP matching, an HMM (Hidden Markov Model), or a neural network, and it is determined which word stored as dictionary data each section corresponds to.
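As an illustration of the DP matching mentioned above, the following is a minimal sketch of dynamic-programming (time-warping) comparison between an input feature sequence and stored templates. The function names, the one-dimensional toy features, and the dictionary entries are assumptions made for this example; the patent does not specify the feature vectors, the distance measure, or the dictionary layout.

```python
# Hypothetical sketch: rank dictionary words by DP-matching distance.
# Feature values and word labels below are toy data, not from the patent.

def dp_distance(a, b):
    """DP-matching distance between two sequences of feature vectors."""
    def frame_dist(x, y):
        # Euclidean distance between two feature vectors
        return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(features, dictionary):
    """Return dictionary words sorted from best match to worst."""
    return sorted(dictionary, key=lambda w: dp_distance(features, dictionary[w]))


dictionary = {
    "aichi": [(0.0,), (1.0,), (2.0,)],
    "tokyo": [(5.0,), (5.0,), (5.0,)],
}
# The same pattern as "aichi", spoken with the first frame held longer.
spoken = [(0.0,), (0.0,), (1.0,), (2.0,)]
ranking = rank_candidates(spoken, dictionary)
```

Because the alignment may stretch or compress time, the slower utterance still matches its template exactly, which is the property that makes this family of methods suitable for comparing speech of varying speed.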
[0034]
Based on the recognition result from the voice recognition unit 31 and instructions from the control unit 38, the dialogue control unit 32 instructs the voice synthesis unit 33 to output a response voice, instructs the display control unit 39 to output a response display, or, for example, notifies the control circuit 10, which executes the processing of the system itself, of the destination required for navigation processing and instructs it to execute the setting process. The last of these is the post-confirmation process. As a result, with the voice recognition device 30 the destination of the navigation system can be specified by voice input, without manually operating the operation switch group 8 or the remote controller 15a.
[0035]
The voice synthesis unit 33 synthesizes a voice, using voice waveforms stored in a waveform database, in accordance with the response voice output instruction from the dialogue control unit 32. This synthesized voice is output from the speaker 37. Similarly, the display control unit 39 generates a display image in accordance with the response display output instruction from the dialogue control unit 32, and the generated image is output to the display unit 40.
[0036]
The voice extraction unit 34 converts the ambient sound picked up by the microphone 35 into digital data and outputs it to the voice recognition unit 31. Specifically, in order to analyze the feature quantities of the input sound, it cuts out frame signals of, for example, several tens of milliseconds at constant intervals and determines whether each input signal belongs to a voice section, which contains speech, or to a noise section, which does not. This determination is needed because the signal input from the microphone 35 contains noise as well as the speech to be recognized. Many determination methods have been proposed. A commonly used one extracts the short-time power of the input signal at regular intervals and classifies a section as voice or noise according to whether short-time power at or above a predetermined threshold continues for at least a certain period. When a section is determined to be a voice section, the input signal is output to the voice recognition unit 31.
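The short-time power criterion described above can be sketched as follows. This is only an illustrative sketch: the function names, the non-overlapping framing, and the parameters `frame_len`, `threshold`, and `min_run` are assumptions for the example, since the text gives no concrete values beyond frames of several tens of milliseconds.

```python
# Hypothetical sketch of voice/noise section determination by short-time power.

def short_time_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_voice_frames(samples, frame_len, threshold, min_run):
    """Return one boolean per frame: True if the frame lies in a run of at
    least min_run consecutive frames whose power is >= threshold."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    loud = [short_time_power(f) >= threshold for f in frames]
    voiced = [False] * len(loud)
    start = None
    for i, flag in enumerate(loud + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:           # long enough to count as voice
                for j in range(start, i):
                    voiced[j] = True
            start = None
    return voiced


# Toy signal: silence, a sustained loud stretch, silence, a short click.
samples = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [1.0] * 2 + [0.0] * 2
flags = detect_voice_frames(samples, frame_len=2, threshold=0.5, min_run=2)
```

In this toy signal the sustained stretch is marked as a voice section, while the single loud frame near the end is treated as noise because it does not last for the required number of frames.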
[0037]
In the present embodiment, the user inputs voice through the microphone 35 while pressing the PTT switch 36. Specifically, the control unit 38 monitors the timing at which the PTT switch 36 is pressed, the timing at which it is released, and the time for which it remains pressed. While the PTT switch 36 is pressed, the control unit 38 instructs the voice extraction unit 34 and the voice recognition unit 31 to execute their processing; while the PTT switch 36 is not pressed, it does not have them execute that processing. Accordingly, the voice data input via the microphone 35 while the PTT switch 36 is pressed is output to the voice recognition unit 31.
[0038]
The control unit 38 can also determine whether the PTT switch 36 has been clicked or double-clicked. Specifically, when the PTT switch 36 is turned off within a relatively short time (for example, within 0.5 seconds) after being turned on, the operation is regarded as a click operation. When the click operation is performed twice in succession within a predetermined interval (for example, within 0.5 seconds), it is regarded as a double-click operation.
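The distinction among a long press, a click, and a double click could be sketched as below. The 0.5-second values are the examples given in the text; the `Press` record, the function names, and in particular the choice to measure the double-click interval from the end of the first click to the start of the second are assumptions made for illustration.

```python
# Hypothetical sketch of PTT operation classification.
from dataclasses import dataclass
from typing import List

CLICK_MAX_S = 0.5    # maximum on-to-off time for a click (example value in the text)
DOUBLE_GAP_S = 0.5   # maximum gap between two clicks (example value in the text)


@dataclass
class Press:
    on: float   # time at which the PTT switch was turned on, in seconds
    off: float  # time at which it was turned off, in seconds


def is_click(p: Press) -> bool:
    return (p.off - p.on) <= CLICK_MAX_S


def classify(presses: List[Press]) -> List[str]:
    """Label a chronological list of presses as 'hold', 'click' or 'double_click'."""
    labels = []
    i = 0
    while i < len(presses):
        p = presses[i]
        if not is_click(p):
            labels.append("hold")          # long press: voice input period
            i += 1
            continue
        nxt = presses[i + 1] if i + 1 < len(presses) else None
        # Assumption: the interval runs from the first release to the second press.
        if nxt is not None and is_click(nxt) and (nxt.on - p.off) <= DOUBLE_GAP_S:
            labels.append("double_click")  # two clicks in quick succession
            i += 2
        else:
            labels.append("click")
            i += 1
    return labels
```

A real controller would have to wait for the double-click interval to expire before reporting a single click; this offline version sidesteps that by looking at the full list of presses at once.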
[0039]
Next, the operation of the car navigation system 2 according to the first embodiment will be described. Since the characteristic portion is the one related to the voice recognition device 30, the general operation of the car navigation system 2 is described briefly first, followed by a detailed description of the operation of the voice recognition device 30.
[0040]
After the car navigation system 2 is powered on, the following processing is performed when the driver uses the remote controller 15a (or, equivalently, the operation switch group 8) to select, from the menu shown on the display device 14, the route information display process for displaying a guidance route on the display device 14, or when the driver speaks the desired menu item into the microphone 35 so that the voice recognition device 30, through the dialogue control unit 32, gives the control circuit 10 the same instruction as a selection made with the remote controller 15a.
[0041]
That is, when the driver inputs the destination by voice or by operating the remote controller or the like, based on the map on the display device 14, the current location of the vehicle is obtained from the satellite data received by the GPS receiver 22, costs between the current location and the destination are calculated by the Dijkstra method, and the shortest route from the current location to the destination is obtained as the guidance route. The guidance route is then displayed over the road map on the display device 14 to guide the driver along an appropriate route. The calculation and guidance processing used to obtain such a guidance route is generally well known, so its description is omitted.
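Although the text treats route calculation as well known and omits it, a minimal sketch of the Dijkstra method named above may help readers unfamiliar with it. The graph layout, node names, and costs are toy assumptions for this example and are not taken from the patent.

```python
# Hypothetical sketch of least-cost route search with the Dijkstra method.
import heapq


def shortest_route(graph, start, goal):
    """graph maps node -> list of (neighbor, cost). Returns (cost, path),
    or (inf, []) if the goal cannot be reached."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale queue entry
        done.add(u)
        if u == goal:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:              # walk back along predecessors
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


# Toy road network: nodes are intersections, costs are link costs.
roads = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
cost, route = shortest_route(roads, "A", "D")
```

Here the current location would play the role of `start` and the voice-entered destination the role of `goal`; how the patent's system assigns link costs is not described.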
[0042]
Next, the operation of the voice recognition device 30 will be described with reference to the flowchart of FIG. 2, taking as an example the case where the destination for the route guidance described above is input by voice.
First, in the first step S110, it is determined whether or not the PTT switch 36 has been turned on (pressed). This determination is made by the control unit 38. When the PTT switch 36 has been turned on (S110: YES), it is determined in the subsequent S120 whether or not a click operation has been performed. This determination is also made by the control unit 38: when the PTT switch 36 is turned off within a relatively short time (for example, within 0.5 seconds), the operation is regarded as a click operation. If a click operation has been performed (S120: YES), it is determined in the subsequent S130 whether the operation is a double-click operation. This determination is also made by the control unit 38: if the click operation is performed twice within a predetermined interval (for example, within 0.5 seconds), it is regarded as a double-click operation.
[0043]
If a negative determination is made in S130, it is determined that a click operation has been performed, and the process proceeds to S140, where the guide content for beginners is notified. If an affirmative determination is made in S130, it is determined that a double-click operation has been performed, and the process proceeds to S150, where the guide content for intermediate users is notified.
[0044]
The guide content for beginners and for intermediate users is described later; the description of the flowchart of FIG. 2 continues here.
If a negative determination is made in S120, that is, if the operation is not a click operation, the PTT switch 36 has been held down for the predetermined time or longer, so the process proceeds to S160 and voice extraction and recognition are executed. The voice extraction unit 34 determines, from the voice data input via the microphone 35, whether each section is a voice section or a noise section, and outputs the voice-section data to the voice recognition unit 31. The voice recognition unit 31 collates the voice data acquired from the voice extraction unit 34 against the stored dictionary data. The top-ranked comparison target patterns determined by this collation are then output to the dialogue control unit 32 as the recognition result.
[0045]
While the PTT switch 36 remains on (S170: YES), this voice extraction and recognition (S160) continues. When the PTT switch 36 is turned off (S170: NO), the recognition result is talked back and displayed in the subsequent S180. That is, the dialogue control unit 32 controls the voice synthesis unit 33 and the display control unit 39 so that the recognition result is output by voice from the speaker 37 and text showing the recognition result is displayed on the display unit 40.
[0046]
Thereafter, in S190, it is determined whether or not the recognition is correct, according to an instruction from the user. For example, the user may say “Yes” into the microphone 35 if the recognition is correct and “No” if it is incorrect. Of course, these instructions may instead be input via the operation switch group 8.
[0047]
If the recognition is incorrect (S190: NO), the process returns to S110. If the recognition is correct (S190: YES), the process proceeds to S200, where the recognition result is confirmed, and in the subsequent S210 a predetermined post-confirmation process is executed. The post-confirmation process in this case is a process of outputting the data on the “destination for route guidance” obtained as the recognition result to the control circuit 10 (see FIG. 1).
[0048]
After this post-confirmation process (S210) is completed, the process returns to S110, that is, to the determination of whether or not the PTT switch 36 has been turned on (pressed).
In the present embodiment, the dialogue control unit 32, the voice synthesis unit 33, the control unit 38, the display control unit 39, and the display unit 40 correspond to the “guide means”.
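The branching of steps S110 to S210 described above can be summarized in a short sketch. All function and parameter names are hypothetical, the recognizer, confirmation, notification, and post-confirmation steps are passed in as stand-in callables, and the guide messages are placeholders. The text does not state what follows S140 or S150, so simply returning to the caller is an assumption.

```python
# Hypothetical sketch of one pass through the flow of steps S110 to S210.

BEGINNER_GUIDE = "guide content for beginners"          # placeholder text
INTERMEDIATE_GUIDE = "guide content for intermediate users"  # placeholder text


def handle_ptt_operation(kind, recognize, confirm, notify, post_process):
    """kind is 'click', 'double_click' or 'hold' (the result of S110 to S130)."""
    if kind == "click":                # S120: YES, S130: NO
        notify(BEGINNER_GUIDE)         # S140
        return "guided"                # assumption: control returns to S110
    if kind == "double_click":         # S130: YES
        notify(INTERMEDIATE_GUIDE)     # S150
        return "guided"                # assumption: control returns to S110
    result = recognize()               # S160 and S170: extract and recognize while held
    notify(result)                     # S180: talk back and display the result
    if confirm():                      # S190: user says whether recognition is correct
        post_process(result)           # S200 and S210: confirm, then hand over the result
        return "confirmed"
    return "rejected"                  # S190: NO, back to S110


# Usage with simple stand-ins for the speaker, display and control circuit.
spoken_out = []
sent_to_navigation = []
outcome = handle_ptt_operation(
    "hold",
    recognize=lambda: "Aichi Prefecture",
    confirm=lambda: True,
    notify=spoken_out.append,
    post_process=sent_to_navigation.append,
)
```

Keeping the three kinds of operation in one dispatcher mirrors the point made in the description that a single switch serves both voice input and guide requests.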
[0049]
Here, the content of the guide for beginners and intermediate users notified in S140 and S150 will be described.
First, a beginner's guide will be described.
(1) This guide is intended for users who do not know what they can input by voice to the navigation system. For this reason, for example, it is possible to guide the setting items that can be instructed by voice input, such as “You can set the destination”.
[0050]
Note that the guide content may be changed according to the stage reached in the series of processes in the target system (here, the navigation device) that is instructed by voice input. For example, if route setting to the destination is regarded as a series of processes, then after the destination has been set, conditions for route setting, such as whether to specify a waypoint, may be specified further. Therefore, when a click operation is performed while a destination is already set, guide content such as “You can specify a waypoint” should be notified.
[0051]
(2) It is also assumed that a user who, thanks to the guide above, knows what can be input to the navigation system may still not know the specific input method. Therefore, if a click operation is performed without any voice input after, for example, the guide “You can set the destination”, it is conceivable to notify an explanation of the input method as guide content, such as “To set the destination, please start with the prefecture name.”
[0052]
Even then, some users may be unable to input anything because they still do not fully understand how. Therefore, if a click operation is again performed without any voice input after the guide above, it is conceivable to notify a specific input example as guide content, such as “Please enter, for example, Showa-cho, Kariya City, Aichi Prefecture.” Because the user is given a concrete example of the input method, the user can then easily input the desired destination.
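The escalation described in (1) and (2) above, in which each click without intervening voice input yields a more detailed guide, could be sketched as follows. The class and method names are hypothetical. Repeating the last message once the list is exhausted and resetting after a voice input are assumptions, since the text does not say what happens in those cases.

```python
# Hypothetical sketch of the stepwise guide for beginners.

BEGINNER_MESSAGES = [
    "You can set the destination.",
    "To set the destination, please start with the prefecture name.",
    "Please enter, for example, Showa-cho, Kariya City, Aichi Prefecture.",
]


class BeginnerGuide:
    def __init__(self, messages):
        self.messages = messages
        self.level = 0  # number of consecutive clicks without voice input

    def on_click(self):
        """Return the guide for this click, then move to a more detailed level."""
        # Assumption: once the most detailed message is reached, it is repeated.
        message = self.messages[min(self.level, len(self.messages) - 1)]
        self.level += 1
        return message

    def on_voice_input(self):
        """Assumption: any voice input resets the guide to the first level."""
        self.level = 0


guide = BeginnerGuide(BEGINNER_MESSAGES)
first = guide.on_click()
second = guide.on_click()
third = guide.on_click()
```

The three calls return the item guide, the input-method guide, and the concrete example, in that order.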
[0053]
Next, the intermediate guide will be described.
In the guide for beginners described above, the setting items are guided step by step, one per click operation: “You can set the destination” → “You can specify a waypoint” → and so on. As a guide for intermediate users, by contrast, it is conceivable to notify these items together. Alternatively, the basic guide that a destination can be set may be omitted, and only the conditions that can be specified at the time of setting (waypoint specification, road type specification, and so on) may be guided.
[0054]
Also, the detailed guide for beginners described in (2) above need not be notified to intermediate users.
As described above, with the voice recognition device 30 of the present embodiment, a beginner who does not know what to input only has to perform a click operation to receive the guide for beginners appropriate to the situation, which makes the device very easy to use. An intermediate user who knows the basics but not the details only has to perform a double-click operation to receive the guide for intermediate users appropriate to the situation, which is likewise very easy to use. Because this guide is not delivered step by step like the guide for beginners, it is also not burdensome.
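The difference between the step-by-step guide for beginners and the combined guide for intermediate users might be sketched as below. The function names, the message wording, and the `skip_basic` flag are illustrative assumptions; the list of setting items follows the examples named in the description.

```python
# Hypothetical sketch contrasting the two guide levels.

SETTING_ITEMS = [
    "set the destination",      # the basic item
    "specify a waypoint",       # conditions that can be specified at setting time
    "specify the road type",
]


def beginner_guide(click_count):
    """One item per click; click_count is 0 for the first click."""
    item = SETTING_ITEMS[min(click_count, len(SETTING_ITEMS) - 1)]
    return "You can " + item + "."


def intermediate_guide(skip_basic=False):
    """All items in a single message, optionally omitting the basic item."""
    items = SETTING_ITEMS[1:] if skip_basic else SETTING_ITEMS
    return "You can " + ", ".join(items) + "."


one_step = beginner_guide(0)
all_at_once = intermediate_guide()
conditions_only = intermediate_guide(skip_basic=True)
```

A beginner hears one item per click, while an intermediate user gets the whole list, or only the specifiable conditions, from a single double click.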
[0055]
On the other hand, guide content is never notified unprompted: it is notified only when an operation requesting a guide, such as a click or double-click operation, is performed. A user who needs no guide can therefore simply speak while holding down the PTT switch 36 and complete the desired voice input in a relatively short time (that is, compared with a configuration in which guide content is always notified).
[0056]
As described above, the voice recognition device 30 of the present embodiment improves usability for every user, whatever the user's degree of familiarity with how to use the device. In addition, because the PTT switch 36 and the control unit 38, which are needed for the basic voice input processing anyway, are also used to accept guide request operations, no special configuration is required for that purpose, which contributes to a simpler configuration.
[0057]
An embodiment has been described above, but the present invention is not limited to it and can be implemented in various forms without departing from the gist of the invention.
(1) For example, the embodiment above described guide content relating to destination setting, but guide content can be set in the same way for any other item that can be input by voice in the car navigation system, such as map enlargement and reduction or map orientation (north-up or heading-up).
[0058]
(2) In the embodiment above, holding down the PTT switch 36 is the operation that designates the voice input period, but a click operation or a double-click operation may instead be adopted to designate the start and end of the voice input period. In that case, an operation in which the switch is held for a predetermined time or longer from start to end is adopted as the guide request operation. Notification of the guide content may then begin once the switch operation has continued for the predetermined time and end when the switch operation ends. The user's voice input may then be recognized after the guide content has been notified.
[0059]
(3) In the embodiment above, two user levels, beginner and intermediate, are set as guide targets, but three or more levels may be set.
(4) In the embodiment above, the recognition result and the guide content are notified in two forms, voice and image, but any form that reaches the user may be used. If only one form is adopted, voice notification is preferable: in an in-vehicle device such as the car navigation system 2, voice output means that the driver does not have to shift his or her gaze to the display device, which is advantageous for ensuring safe driving. Of course, if the notification is made by both voice and image as in the embodiment above, the driver can confirm it both on the display and by ear.
[0060]
(5) Further, the application of the voice recognition device 30 is not limited to the navigation system described above. For example, when the voice recognition device is used for an air conditioning system, it is conceivable to adjust the set temperature, select the air conditioning mode (cooling / heating / dry), or select the wind direction mode by voice input. In this case, the setting items themselves (temperature, air conditioning mode, wind direction mode, and so on) may be notified as guide content, and it is also conceivable to notify, as further guide content, whether the user should say “set the temperature to 25 degrees” or “lower the set temperature by 5 degrees”. The same applies to the air conditioning mode and the wind direction mode.
Furthermore, the navigation system and the air conditioning system are not limited to use as in-vehicle devices; they may be devices other than in-vehicle devices, such as a portable navigation device or an indoor air conditioner. However, as described above, when the device is used for in-vehicle equipment the user is likely to be the driver. In that case, driving itself is the most important task, and it is preferable that operating other in-vehicle devices interferes with driving as little as possible. The voice recognition device 30 intended for the car navigation system 2 or an air conditioning system used as an in-vehicle device therefore offers a further advantage.
[0061]
Of course, from this point of view, it can be used in the same way for in-vehicle devices other than the navigation system and the air conditioning system. For example, a car audio device is effective. In addition, if a configuration in which so-called power window opening / closing and mirror angle adjustment are instructed by voice is considered, it can be similarly applied to such a control target and is also effective.
[0062]
In addition, although a particular advantage in in-vehicle use has been described, the voice recognition device 30 can be applied in the same way to any device that executes predetermined processing in accordance with a voice input instruction from the user. For example, the present invention can be similarly applied to a portable information terminal device or to an information terminal device installed on a street or in a parking area.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a car navigation system as an embodiment of the present invention.
FIG. 2 is a flowchart showing processing executed by the speech recognition apparatus.
[Explanation of symbols]
2 ... Car navigation system 4 ... Position detector
6 ... Map data input device 8 ... Operation switch group
10 ... Control circuit 12 ... External memory
14 ... Display device 15 ... Remote control sensor
15a ... remote control 16 ... geomagnetic sensor
18 ... Gyroscope 20 ... Distance sensor
22 ... GPS receiver 30 ... Voice recognition device
31 ... Voice recognition unit 32 ... Dialog control unit
33 ... Voice synthesis unit 34 ... Voice extraction unit
35 ... Microphone 36 ... PTT switch
37 ... Speaker 38 ... Control unit
39 ... Display control unit 40 ... Display unit

Claims

A voice recognition device comprising:
voice input means for inputting voice;
input period designating means on which a user performs a predetermined operation to designate the start and end of a period for inputting the voice to be recognized through the voice input means;
recognition means for comparing the voice input through the voice input means within the input period designated by the input period designating means with a plurality of comparison target pattern candidates stored in advance in dictionary means, and taking a candidate with a high degree of coincidence as the recognition result;
notification means for notifying the recognition result obtained by the recognition means; and
post-confirmation processing means for, when a predetermined confirmation instruction is given after the recognition result has been notified by the notification means, treating the recognition result as confirmed and executing a predetermined post-confirmation process,
wherein two or more guide request operations, different from the operations designating the start and end of the voice input period, are set in the input period designating means, and a guide request level corresponding to a degree of familiarity is set for each of the guide request operations,
the device further comprises guide means for notifying, when a guide request operation is performed on the input period designating means, predetermined guide content according to the corresponding guide request level, and
the guide means notifies, as the guide content, the setting items themselves that can be instructed via the voice input means.

The voice recognition device according to claim 1,
wherein the operations designating the start and end of the voice input period on the input period designating means are, respectively, the start and end of a switch operation that lasts a predetermined time or longer from start to end, and
the guide request levels are two levels, a beginner level and an intermediate level, and the guide request operations for the two levels are a click operation, in which the start and end of the switch operation occur within a predetermined click time, and a double-click operation, in which the click operation is performed twice in succession within a predetermined interval, each corresponding to one of the two levels.

The voice recognition device according to claim 2,
wherein, in the guide for beginners, when a click operation is performed without any voice input after the predetermined guide content has been notified, the guide means notifies guide content more detailed than the guide content notified previously.

The voice recognition device according to any one of claims 1 to 3,
wherein the notification means is configured to notify the recognition result at least by voice, and the guide means is configured to notify the predetermined guide content at least by voice.

The voice recognition device according to any one of claims 1 to 4,
wherein the predetermined guide content notified by the guide means is content relating to processing in a system in which the recognition result of the voice recognition device is used.

The voice recognition device according to claim 5,
wherein the guide means is configured to notify guide content corresponding to the process execution stage in the system at the time when the predetermined guide content is notified.

A voice recognition device comprising:
voice input means for inputting voice;
input period designating means on which a user performs a predetermined operation to designate the start and end of a period for inputting the voice to be recognized through the voice input means;
recognition means for comparing the voice input through the voice input means within the input period designated by the input period designating means with a plurality of comparison target pattern candidates stored in advance in dictionary means, and taking a candidate with a high degree of coincidence as the recognition result;
notification means for notifying the recognition result obtained by the recognition means; and
post-confirmation processing means for, when a predetermined confirmation instruction is given after the recognition result has been notified by the notification means, treating the recognition result as confirmed and executing a predetermined post-confirmation process,
wherein two or more guide request operations, different from the operations designating the start and end of the voice input period, are set in the input period designating means, and a guide request level corresponding to a degree of familiarity is set for each of the guide request operations,
the device further comprises guide means for notifying, when a guide request operation is performed on the input period designating means, predetermined guide content according to the corresponding guide request level,
the predetermined guide content notified by the guide means is content relating to processing in a system in which the recognition result of the voice recognition device is used,
the guide means is configured to notify guide content corresponding to the process execution stage in the system at the time when the predetermined guide content is notified, and
the guide means notifies, as the guide content corresponding to the process execution stage in the system, the setting items themselves that can be instructed via the voice input means at that process execution stage.

The voice recognition device according to claim 7,
wherein the operations designating the start and end of the voice input period on the input period designating means are, respectively, the start and end of a switch operation that lasts a predetermined time or longer from start to end, and
the guide request levels are two levels, a beginner level and an intermediate level, and the guide request operations for the two levels are a click operation, in which the start and end of the switch operation occur within a predetermined click time, and a double-click operation, in which the click operation is performed twice in succession within a predetermined interval, each corresponding to one of the two levels.

The voice recognition device according to claim 8,
wherein, in the guide for beginners, when a click operation is performed without any voice input after the predetermined guide content has been notified, the guide means notifies guide content more detailed than the guide content notified previously.

The voice recognition device according to any one of claims 7 to 9,
wherein the notification means is configured to notify the recognition result at least by voice, and the guide means is configured to notify the predetermined guide content at least by voice.

A navigation system comprising the voice recognition device according to any one of claims 1 to 10 and a navigation device,
wherein the voice input means of the voice recognition device is used by the user to input, by voice, at least an instruction concerning predetermined navigation-processing-related data that must be specified when the navigation device performs navigation processing, and the post-confirmation processing means is configured to output the recognition result obtained by the recognition means to the navigation device.