JP2004354722A

JP2004354722A - Speech recognition device

Info

Publication number: JP2004354722A
Application number: JP2003152645A
Authority: JP
Inventors: Masaru Yamazaki; 勝山崎; Takeshi Ono; 健大野
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2003-05-29
Filing date: 2003-05-29
Publication date: 2004-12-16

Abstract

PROBLEM TO BE SOLVED: To provide a highly convenient speech recognition device for vehicles that can enhance recognition accuracy regardless of the presence or absence, and the volume, of environmental noise.
SOLUTION: A noise detecting section 103 detects the volume of noise in the vehicle, and a speech recognition switch control section 105 switches the speech input system based on the detected noise volume. The speech input systems include a first mode that continuously awaits a series of pieces of information, a second mode in which the input start timing of each piece of information is designated, and a third mode in which both the input start and the input end of each piece of information are designated; the system with the best operability and highest convenience for the noise volume in the vehicle cabin is selected. In addition, a driving load detecting section 108 detects the driving load on the driver, and if the driving load is large, the utterance input mode is not changed, so as to reduce the driver's load.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【技術分野】
本発明は、車両用の音声認識装置に関し、特に、車室内の発話音声以外の音（環境騒音）の有無、音量にかかわらず、発話された音声の認識精度を高めることができる音声認識装置に関する。
【０００２】
【背景技術】
たとえばナビゲーションシステム、オーディオシステム、あるいは車載電話システム等の車載装置において、音声認識技術を利用し、運転者等の使用者が発話することにより操作したり情報を入力したりすることのできる車両用の装置が使用されつつある。
この種の車両用の音声認識装置であって、騒音環境下でも認識率を向上させることができる装置として、たとえば特許文献１に開示された音声認識装置が知られている。この装置は、入力信号波の音声区間を検出し、この入力信号波の音声区間と認識対象となる単語を記録してある音声辞書中の単語との一致度を演算し、最も一致度の高い単語を認識結果として出力する方式を適用した装置である。この方式においては、特に、認識対象となる単語の語頭の半音節を省略した単語を認識対象とし、これを入力信号波の音声区間との一致度の演算に用いるように構成している。
【０００３】
しかしながら、そのような従来の音声認識装置においては、たとえば次のような問題がある。車両用の音声認識装置は、エアコン等の高騒音の機器の使用中に音声認識を行う場合がある。このような場合、たとえ認識対象となる単語の語頭の半音節を省略した単語を認識対象の単語として入力信号波の音声区間との一致度を演算したとしても、図８の最下図に示すように、騒音の区切りを話者が発した音声の区切りとして誤認識してしまい、話者が話し始める前に音声認識を終了してしまう場合が少なくない。そのような場合、発話された音声がそもそも音声認識の処理対象として入力されないこととなり、音声認識を利用した装置の操作性、利便性を向上させることができない。
【０００４】
【特許文献１】特開平１０−６９２９１号公報
【０００５】
【発明の開示】
本発明は、上記問題に鑑みてなされたものであって、その目的は、環境騒音の有無、音量にかかわらず認識精度を高めることができ、これにより運転者等の使用者にとって利便性の高い種々の車載装置を提供することができる車両用の音声認識装置を提供することにある。
【０００６】
上記目的を達成するために、本発明の音声認識装置は、発話された音声を入力する音声入力手段と、前記入力された音声の発話内容を認識する音声認識手段とを有する車両用の音声認識装置であって、車室内の前記発話以外の音である騒音を検出する騒音検出手段と、前記音声入力手段に対する発話された音声の入力タイミングを特定する方式を、前記検出された車室内の騒音の音量に基づいて、発話ごとに発話開始時および発話終了時を指定する方式を含む所定の複数の入力方式の中のいずれかの方式に切り換える音声入力制御手段とを有する。
【０００７】
このような構成の音声認識装置においては、騒音検出手段において車室内の騒音の音量を検出し、検出した騒音の音量に基づいて、音声入力制御手段において音声入力方式を切り換えている。また、その切り換える方式の１つとして、発話ごとに開始と終了とを指定する確実性の高い方式を含んでおり、たとえば騒音の音量がある程度大きい場合にも適切に音声の入力タイミングを規定できるようにしている。したがって、車室内の騒音量という条件の下で最も操作性がよく利便性の高い方式を選択することができる。その結果、環境騒音があったとしても適切に音声入力のタイミングを特定することができ、発話した音声が音声認識の処理対象としてそもそも入力されないという障害が生じる可能性を低減させることができる。そして、適切に音声が入力されることにより、音声認識手段においては入力された音声を精度よく認識することができる。
【０００８】
このように、本発明によれば、環境騒音の有無、音量にかかわらず認識精度を高めることができ、これにより運転者等の使用者にとって利便性の高い種々の車載装置を提供することができる車両用の音声認識装置を提供することができる。
【０００９】
【発明の実施の形態】
本発明の一実施の形態について、図１〜図５を参照して説明する。
図１は、本発明の一実施の形態の車両用の音声認識装置の構成を示すブロック図である。
図１に示すように、音声認識装置１は、マイク１００、音声認識スイッチ１０１、キャンセルスイッチ１０２、騒音検出部１０３、音声認識部１０４、音声認識スイッチ制御部１０５、ディスプレイ１０６、スピーカ１０７および運転負荷検出部１０８を有する。また、騒音検出部１０３は、騒音取得マイク部１０９、車速取得部１１０およびエンジン回転数取得部１１１を有する。また、運転負荷検出部１０８は、ナビゲーション装置１１２および車速取得部１１３を有する。
【００１０】
まず、各部の構成について説明する。
マイク１００は、車両の運転者等が発話する音声を集音し入力するために、たとえば車両の運転席あるいはその近傍に設けられたマイクである。マイク１００は、周囲で発せられた音声等を集音し、電気信号に変換し、さらにデジタルデータに変換して音声認識部１０４に出力する。
【００１１】
音声認識スイッチ１０１は、運転者が発話した音声を音声認識装置１に入力するタイミングを指定するためのプッシュ式スイッチである。音声認識スイッチ１０１は、押下されている間オン（ＯＮ）信号を音声認識部１０４に出力する。
音声認識スイッチ１０１は、図２に示すようなホールド機構を具備するスイッチ１２０により構成される。スイッチ１２０においては、押下される押ボタン１２１と一体的に移動する可動端子１２２の端部にスプリング（コイルばね）１２３の一方の端部が接続されている。また、そのスプリング１２３の他方の端部はスイッチ１２０のフレーム内に固定された電磁石１２４に固定されている。これにより、押ボタン１２１が押し込まれ可動端子１２２が接触端子１２５の間に保持されている時に、後述する音声認識スイッチ制御部１０５より電磁石１２４の端子１２６に通電をすると、電磁石１２４に磁力が生じ、可動端子１２２に吸引力が作用し、可動端子１２２は接触端子１２５の間に維持される。
その結果、操作者の押ボタン１２１を押し込む力が解除された後も、音声認識スイッチ１０１の押ボタン１２１は押し込まれた状態に維持される。また、この間、音声認識スイッチ１０１からは継続してオン信号が音声認識部１０４に出力される。
【００１２】
キャンセルスイッチ１０２は、音声により入力した情報を取り消し、再入力等の処理に移るためのプッシュ式スイッチである。キャンセルスイッチ１０２は、押下された場合にキャンセル信号を音声認識部１０４に出力する。
音声認識装置１においては、運転者がマイク１００より入力した内容は、音声認識部１０４において認識された後、確認のためにディスプレイ１０６またはスピーカ１０７を介して運転者に知らされる。その際、その示された情報が運転者が入力しようとした内容と異なる場合がある。これは、運転者が間違った内容を発話した場合や、音声認識部１０４が誤認識をした場合等に生じる。このような場合、キャンセルスイッチ１０２を押下することにより、キャンセル信号が音声認識部１０４に出力され、間違った入力内容を取り消す処理が行われる。また、必要に応じて、音声を再入力するように、音声認識装置１の動作が制御される。
【００１３】
騒音検出部１０３は、車室内の音量（騒音量）に関わる情報を検出し、その検出した情報を音声認識スイッチ制御部１０５に出力する。
本実施の形態において騒音検出部１０３は、騒音取得マイク部１０９、車速取得部１１０およびエンジン回転数取得部１１１を有する。
騒音取得マイク部１０９は、車室内で実際に聴取される音であって、運転者のマイク１００に対する意図的な発話以外の音（騒音）を集音するためのマイクである。路面の凹凸あるいは風切り音等により車室内で聞かれる騒音は、騒音取得マイク部１０９により集音される。騒音取得マイク部１０９は、騒音を集音するのに好適な車室内の所定の位置に設けられ、車室内の音を集音し、これをデジタルデータに変換して音声認識スイッチ制御部１０５に出力する。
【００１４】
車速取得部１１０は、騒音量を推定するためのデータとして、車両の走行速度の情報を取得し、これを音声認識スイッチ制御部１０５に出力する。通常、車速が速くなれば風切り音や路面の凹凸等による走行ノイズも大きくなることから、車両の走行速度を騒音量を推定するためのデータとして使用する。車速取得部１１０は、たとえば車速センサ等からの出力信号を他の制御系とともに受けることにより車速情報を取得する。
【００１５】
エンジン回転数取得部１１１は、騒音量を推定するためのデータとして、エンジンの回転数の情報を取得し、これを音声認識スイッチ制御部１０５に出力する。通常、エンジンの回転数が高くなればエンジン音が大きくなることから、エンジン回転数を騒音量を推定するためのデータとして使用する。エンジン回転数取得部１１１は、たとえば車両のエンジン制御回路からの出力信号を他の制御系とともに受けることにより、エンジンの回転数情報を取得する。
これら騒音検出部１０３の騒音取得マイク部１０９、車速取得部１１０およびエンジン回転数取得部１１１は、所定の時間間隔で、それら騒音量、車速あるいはエンジン回転数の情報を取得し、音声認識スイッチ制御部１０５に出力する。
【００１６】
運転負荷検出部１０８は、運転者にかかる運転の負荷に関する情報を取得し、その情報を音声認識スイッチ制御部１０５に出力する。
運転負荷検出部１０８は、ナビゲーション装置１１２および車速取得部１１３を有する。
ナビゲーション装置１１２は、車両が走行している走行路の種別や状況等の情報を、音声認識スイッチ制御部１０５に出力する。たとえば一般道路を走行する場合は、信号や右左折の操作、歩行者やバイク等の障害物の存在等により、高速道路等の自動車専用道路を走行する場合に比べて運転における負荷が大きくなる。また、同じ一般道路でも、道幅が細い、交通量が多い、道路形状が複雑である等の条件によっても運転における負荷が大きくなる。そのため、運転の負荷を推定するためのデータとして、走行路の情報を検出する。ナビゲーション装置１１２は、運転者へ目的地までの経路を誘導する通常のナビゲーション装置であり、このナビゲーション処理の際に取得したそれらの情報を、音声認識スイッチ制御部１０５に出力する。
【００１７】
車速取得部１１３は、車両の走行速度を検出し、これを音声認識スイッチ制御部１０５に出力する。通常、車速が速くなるにつれて、運転者の運転に対する負荷は大きくなることから、運転負荷を推定するためのデータとして車速のデータを使用する。車速取得部１１３は、たとえば車速センサ等からの出力信号を他の制御系とともに受けることにより車速情報を取得する。
これら運転負荷検出部１０８のナビゲーション装置１１２および車速取得部１１３は、各々所定の時間間隔で、それら走行路の情報および車速の情報を取得し、音声認識スイッチ制御部１０５に出力する。
【００１８】
音声認識スイッチ制御部１０５は、騒音検出部１０３より入力される車室内の騒音を示す情報、および、運転負荷検出部１０８から入力される運転者の負荷に関わる情報に基づいて、騒音量および運転者の運転負荷を検出する。そして、これに基づいて音声入力の発話入力方式（以後、発話入力モードあるいは単に入力モードと称する）を設定する。設定した発話入力モードは、音声認識部１０４に出力する。また、音声認識スイッチ制御部１０５は、設定した発話入力モードに基づいて、後述する音声認識スイッチ１０１の制御を行う。
【００１９】
音声認識スイッチ制御部１０５は、騒音検出部１０３より入力される車室内の騒音の音量、車速およびエンジン回転数の各情報に基づいて、マイク１００を介して入力される騒音の音量を検出する。そして、その騒音量を所定の閾値と比較することにより騒音量のレベルを検出し、マイク１００を介した音声入力の入力モードを決定する。
本実施の形態においては、騒音量のレベルとして、第１の基準レベル（閾値）および第１の基準レベルよりも大きい第２の基準レベルの２段階の基準値を予め設定する。そして、検出した騒音量が第１の基準レベルよりも小さい場合には、マイク１００を介した発話入力モードとして第１の入力モードを設定する。また、検出した騒音量が第１の基準レベルから第２の基準レベルの間の場合には、マイク１００を介した発話入力モードとして第２の入力モードを設定する。また、検出した騒音量が第２の基準レベル以上の場合には、マイク１００を介した発話入力モードとして第３の入力モードを設定する。なお、これら第１〜第３の各入力モードにおける音声認識スイッチ１０１の操作方法および音声入力のタイミング、および、音声認識装置１の動作については、後に詳細に説明する。
なお、この発話入力モードの設定、変更は、所定の時間間隔で行う。
【００２０】
また、音声認識スイッチ制御部１０５は、運転負荷検出部１０８より入力される車両が走行している道路の情報および車両の速度の情報に基づいて、運転者の運転負荷を検出する。そして、その負荷が所定のレベル以上か否かを検出し、所定のレベル以上の負荷が運転者にかかっていると判定した場合には、運転中における発話入力モードの変更を行わないようにする。
前述したように、音声認識スイッチ制御部１０５は、所定の時間間隔ごとに発話入力モードを変更するが、発話入力モードを変更すると、後述するように音声認識スイッチ１０１の操作方法およびこれに対する発話のタイミングが異なり、運転者の負荷が増加することとなる。そこで、運転者の車両の運転に対する負荷がある程度高いと推定される時には、音声認識スイッチ制御部１０５は、発話入力モードの設定を行わず，音声認識スイッチ１０１の制御状態を変更しないようにする。
なお、この運転者の運転負荷の検出は、所定の時間間隔で行われる。
【００２１】
また、運転者の負荷が所定レベル以上と判定されて発話入力モードの変更が抑制された場合、運転者の負荷がその所定のレベルより小さくなったと判定されるまで、その発話入力モードの変更は抑制される。そして、運転者の負荷が所定のレベルより小さくなったと判定されたら、発話入力モードは、その時の騒音の音量に基づいた入力モードに設定される。
【００２２】
音声認識部１０４は、運転者が発話しマイク１００から入力された音声を認識する。音声認識部１０４は、マイク１００等から入力された音声をデジタルデータに変換し、デジタル音声データを予め用意した対象語辞書のデータとパターンマッチングすることにより、発話内容を認識する。この際、音声認識部１０４は、設定された発話入力モードに応じて、順次入力される音声からの認識対象の音声の切り出し処理を行う。また、認識結果は、認識結果の確認等のために、ディスプレイ１０６あるいはスピーカ１０７を介して運転者に出力される。
また、音声認識部１０４は、運転者等に音声入力を促す時に、たとえば「コマンドをどうぞ」というような質問内容をディスプレイ１０６あるいはスピーカ１０７から出力する。これにより、対話形式の音声入力が可能となる。
【００２３】
ディスプレイ１０６は、音声認識部１０４より入力される運転者への質問内容および音声認識結果等を運転者に表示する。
【００２４】
スピーカ１０７は、音声認識部１０４より入力される運転者への質問内容および音声認識結果等を、運転者に音声により出力する。
【００２５】
次に、本実施の形態の音声認識装置１で適宜切り換えて使用される第１〜第３の発話入力モードについて説明する。
まず、第１の発話入力モードは、音声認識装置１が、使用者（運転者）の一連の複数のコマンドあるいは情報の発話を連続的に待ち受け入力する方式である。この入力モードにおいては、使用者が音声認識スイッチ１０１を押下して一度発話開始のタイミングを指示したら、使用者が順に発する複数の情報を音声認識装置１は連続的に待ち受け、受け付ける。したがって、使用者は、一度音声認識スイッチ１０１を操作した後は新たなスイッチ操作をすることなく複数の情報を入力でき、利便性が高い。
【００２６】
一方、この入力モードにおいて音声認識装置１の音声認識部１０４は、連続的に入力される音声情報より、情報、入力内容ごとの音声を切り出す必要がある。そのために、入力される音声は、環境騒音の音量が比較的小さく、使用者の音声が明瞭に識別できるものである必要がある。したがってこの発話入力モードは、環境騒音の騒音量が小さい場合、または小さいと推定される場合に適用するのが有効である。
なお、この第１の発話入力モードにおいて音声認識スイッチ１０１は、音声認識スイッチ制御部１０５の制御により、一度押下された後は、一連の情報の全ての入力が終了するまで押下されオン信号を音声認識部１０４に出力する状態にホールドされる。
【００２７】
次に、第２の発話入力モードは、使用者が各コマンドあるいは情報の発話ごとに、その発話開始タイミングを指定する方式である。すなわち、使用者が音声認識スイッチ１０１を押下することにより音声認識装置１の音声認識部１０４は音声入力待ち受け状態となり、使用者の発話を取り込む。そして、１つの情報の入力ごとに発話の終了を音声認識部１０４が自動的に判定し、待ち受け状態を終了する。
この発話入力モードにおいては、使用者は各入力情報ごとに発話開始の指定を行わなければならないが、換言すれば発話開始の指定のみでよいため、それほど操作負荷は大きくならず、ある程度の利便性が確保できる。そして、音声認識部１０４は、各発話の開始タイミングが明確に指示されるので、周囲の環境騒音がある程度あっても、発話部分を適切に検出することができ、認識精度を維持することができる。したがって、この発話入力モードは、環境騒音の騒音量が中程度の場合、または中程度と推定される場合に適用するのが有効である。
なお、この第２の発話入力モードにおいて音声認識スイッチ１０１は、音声認識スイッチ制御部１０５の制御により、一度押下された後は、１つの情報の入力が終了するまで押下されオン信号を音声認識部１０４に出力する状態にホールドされる。
【００２８】
次に、第３の発話入力モードは、使用者が各コマンドあるいは情報の発話ごとに、その発話開始タイミングと発話終了タイミングとの両方を指定する方式である。すなわち、使用者が音声認識スイッチ１０１を押下し、この押下した状態を維持することにより、音声認識装置１の音声認識部１０４は音声入力待ち受け状態となる。そして、運転者が音声認識スイッチ１０１を開放することにより、音声認識装置１は音声入力が終了したものとみなす。
この発話入力モードにおいては、使用者は各入力情報ごとに発話開始と発話終了の両方のタイミングを指定しなければならないが、これにより音声認識部１０４は各発話の開始タイミングと終了タイミングを明確に知ることができる。したがって、環境騒音がある程度大きくても、発話音声の切り出しに失敗することがなく、高い精度の認識を維持することができる。
【００２９】
次に、このような構成の音声認識装置１の動作について説明する。
まず、入力方式の決定方法、すなわち、発話入力モードの設定方法について図３のフローチャートを参照して説明する。
まず、車両の電源を投入しエンジンを始動することにより処理が開始され、まず、騒音検出部１０３が、車室内の騒音量に関わる情報を取得する（ステップＳ１０１）。すなわち、騒音取得マイク部１０９が車室内の音量を検出し、車速取得部１１０が車両の走行速度を検出し、エンジン回転数取得部１１１がエンジン回転数を検出する。検出した車室内の騒音量に関わる情報は、音声認識スイッチ制御部１０５に出力される。
次に、運転負荷検出部１０８が、運転者の運転中の負荷の情報を取得する（ステップＳ１０３）。すなわち、ナビゲーション装置１１２が車両が走行している走行路の情報を検出し、車速取得部１１３が車両の走行速度を検出する。検出した運転者の運転負荷に関わる情報は、音声認識スイッチ制御部１０５に出力される。
【００３０】
次に、音声認識スイッチ制御部１０５において、騒音検出部１０３より入力された車室内の騒音に関わる情報に基づいて、車室内の騒音レベルを算出し、また、運転負荷検出部１０８より入力された運転者の運転負荷に関わる情報に基づいて、運転負荷のレベルを算出する（ステップＳ１０５）。
そして、算出された騒音レベルに基づいて、設定する発話入力モードを決定し、また、算出した負荷レベルに基づいて、決定した発話入力モードを実際に直ちに適用するか否かを判定する。負荷レベルが所定のレベル以下の場合には、決定した発話入力モードを実際に制御を行うモードとして設定する（ステップＳ１０７）。また、負荷レベルが所定のレベル以上の場合には、決定した発話入力モードの適用は行わず、現在設定されている入力モードをそのまま維持する。
音声認識装置１においては、このようなステップＳ１０１〜ステップＳ１０７の処理を、所定の時間間隔で繰り返し行い、車両の走行環境等に基づいて、適宜発話入力モードの設定を行う。
【００３１】
次に、前述したように設定された発話入力モードに基づいて音声入力を行う動作について、図４のフローチャートを参照して説明する。
まず、音声認識スイッチ制御部１０５および音声認識部１０４は、前述した処理により設定された発話入力モードの情報を取得する（ステップＳ２０１）。
次に、運転者が音声認識スイッチ１０１を押下するのを待機する（ステップＳ２０３）。
【００３２】
運転者による音声認識スイッチ１０１の押下を検出したら（ステップＳ２０３）、設定されている発話入力モードが一連の情報の入力を継続的に順次受け付ける第１の入力モード、または、１つの情報の入力ごとにスイッチ１０１を押下して発話開始を通知する第２の入力モードであるか否かを検出する（ステップＳ２０５）。
設定されている入力モードが第１または第２の入力モードの場合には、音声認識スイッチ１０１は、音声認識スイッチ制御部１０５の制御により押下された状態でホールドされる（ステップＳ２０７）。すなわち、図２に示す音声認識スイッチ１０１において、電磁石１２４の端子１２６に所定の電流が印加され、電磁石１２４に発生する磁力により可動端子１２２が接触端子１２５の間に吸引保持され、可動端子１２２に一体的に形成されている押しボタン１２１が押下された状態に維持される。そして、音声認識スイッチ１０１が押下される状態に維持されることにより、音声認識スイッチ１０１からはオン信号が音声認識部１０４に出力され続ける。
【００３３】
次に、音声認識部１０４の動作により、運転者に情報の入力を促す通知を行う（ステップＳ２０９）。すなわち、ディスプレイ１０６への情報の入力を要求するメッセージの表示、あるいは、スピーカ１０７からの情報の入力を要求するメッセージを音声出力等の処理を行う。
この入力要求の通知にしたがって、運転者は、要求された情報をマイク１００に向かって発話し、音声による情報入力を行う（ステップＳ２１１）。入力された音声データは、音声認識部１０４に入力され、認識に供される。
【００３４】
音声認識部１０４が入力される音声の切れ目を検出したら、発話入力モードが第２の入力モードの場合は（ステップＳ２１３）、音声認識スイッチ制御部１０５の制御により音声認識スイッチ１０１の押ボタンのホールド状態が解除される（ステップＳ２１５）。これにより、音声認識スイッチ１０１から音声認識部１０４へのオン信号の入力も終了される。なお、音声の切れ目を検出した場合においても、発話入力モードが第１のモードの場合は（ステップＳ２１３）、音声認識スイッチ１０１のホールドはそのまま維持される。
【００３５】
一方、ステップＳ２０３において音声認識スイッチ１０１が押下されたことを検出した際に、入力モードが第３の入力モードだった場合には、音声認識スイッチ１０１のホールドは行わずに、直ちに、音声認識部１０４より運転者に情報の入力を促す通知を行う（ステップＳ２１７）。すなわち、ディスプレイ１０６への情報の入力を要求するメッセージの表示、あるいは、スピーカ１０７からの情報の入力を要求するメッセージの音声出力等の処理を行う。
この入力要求の通知にしたがって、運転者は、要求された情報をマイク１００に向かって発話し、音声による情報入力を行う（ステップＳ２１９）。マイク１００を介した運転者の発話音声の入力は、音声認識スイッチ１０１が押下されている間続けられ、運転者がスイッチを開放した時点で終了する（ステップＳ２２１）。
【００３６】
入力モードが第１のモードであって発話入力された音声の切れ目を検出した場合（ステップＳ２１１）、入力モードが第２のモードであって発話入力された音声の切れ目を検出した後に音声認識スイッチ１０１のホールドが解除された場合（ステップＳ２１５）、および、入力モードが第３のモードであって運転者により音声認識スイッチ１０１の押下が開放された場合（ステップＳ２２１）のいずれも、次に、情報の入力が終了か否かがチェックされる（ステップＳ２２３）。
引き続き入力する情報がある場合であって（ステップＳ２２３）、入力モードが第１のモードの場合には（ステップＳ２２５）、既に音声認識スイッチ１０１はホールドされているので、ステップＳ２０９に戻り、次の入力項目について運転者に情報の入力を促す通知を行う。以下、ステップＳ２０９〜ステップＳ２２５の処理を繰り返す。
また、入力モードが第２または第３のモードの場合には、ステップＳ２０３に戻って、運転者により音声認識スイッチ１０１が押下されるのを検出する。そして、音声認識スイッチ１０１が押下されたら、各々、ステップＳ２０５からステップＳ２１１等を介してステップＳ２２３に至る処理、あるいは、ステップＳ２０５からステップＳ２１９等を介してステップＳ２２３に至る処理を繰り返す。
【００３７】
そして、いずれの入力モードの場合も、ステップＳ２２３において、全ての情報の入力が終了した場合には、入力モードが第１のモードである場合のみ（ステップＳ２２７）、音声認識スイッチ１０１のホールド状態を解除する処理が行われ（ステップＳ２２９）、一連の処理が終了される。
【００３８】
次に、このような音声入力処理の流れについて、図４および図５を参照し、各入力モードごとに説明する。
なお、以下の説明において、ステップの番号は図４に示すフローチャート中の符号を示し、状態の符号は図５中の符号を示す。
【００３９】
まず、発話入力モードが第１の入力モードの場合の音声入力処理の流れについて図５（Ａ）を参照して説明する。
まず、第１の入力モードにおいては、運転者が音声認識スイッチ１０１を押下すると（ステップＳ２０３、状態ａ）、音声認識スイッチ１０１はホールド状態に維持され（ステップＳ２０７、状態ｂ）、音声認識部１０４は入力要求をディスプレイ１０６あるいはスピーカ１０７に表示し（ステップＳ２０９、状態ｃ１）、音声入力の待ち受け状態となる（状態ｄ１）。ここで、運転者が発話をすることにより、その内容が音声認識部１０４で認識され入力される（ステップＳ２１１、状態ｅ１）。
【００４０】
１つの項目の入力が終了したら、直ちに次の項目について音声認識部１０４が入力要求を出力し（ステップＳ２０９、状態ｃ２）、再度音声入力の待ち受け状態となり（状態ｄ２）、運転者が発話をすることにより、その内容が入力される（ステップＳ２１１、状態ｅ２）。
第１の入力モードにおいては、このように、音声認識スイッチ１０１が押下されホールドされている状態で（状態ｂ）、入力要求の出力（ステップＳ２０９、状態ｃ１，ｃ２）、音声入力の待ち受け（状態ｄ１，ｄ２）および発話（ステップＳ２１１、状態ｅ１，ｅ２）が順次繰り返され、一連の情報が順次入力される。
そして、全ての項目について入力が終了したら、音声認識スイッチ１０１のホールド状態を解除し（ステップＳ２２９、状態ｆ）、処理が終了される（状態ｇ）。
【００４１】
次に、発話入力モードが第２の入力モードの場合の音声入力処理の流れについて図５（Ｂ）を参照して説明する。
第２の入力モードにおいては、運転者が音声認識スイッチ１０１を押下すると（ステップＳ２０３、状態ｈ１）、音声認識スイッチ１０１はホールド状態に維持され（ステップＳ２０７、状態ｉ１）、音声認識部１０４は入力要求をディスプレイ１０６あるいはスピーカ１０７に表示し（ステップＳ２０９、状態ｊ１）、音声入力の待ち受け状態となる（状態ｋ１）。ここで、運転者が発話をすることにより、その内容が音声認識部１０４で認識され入力される（ステップＳ２１１、状態ｍ１）。そして、１つの項目の入力が終了したら、音声認識スイッチ１０１のホールド状態は解除される（ステップＳ２１５、状態ｎ１）。
【００４２】
そして、次の項目について入力する場合には、再度運転者が音声認識スイッチ１０１を押下する（ステップＳ２０３、状態ｈ２）。これにより、最初の場合と同様に、音声認識スイッチ１０１はホールド状態に維持され（ステップＳ２０７、状態ｉ２）、音声認識部１０４が入力要求を出力し（ステップＳ２０９、状態ｊ２）、音声入力の待ち受け状態となり（状態ｋ２）、運転者が発話をすることにより、その内容が入力される（ステップＳ２１１、状態ｍ２）。そして、その項目の入力が終了したら、やはり音声認識スイッチ１０１のホールド状態は解除される（ステップＳ２１５、状態ｎ２）。
【００４３】
第２の入力モードにおいては、このように、各音声入力の開始時点において、音声認識スイッチ１０１を押下する（状態ｈ１、ｈ２）。押下された音声認識スイッチ１０１は、第１のモードと同様にホールドされるので（状態ｉ１，ｉ２）、その状態において、入力要求の出力（ステップＳ２０９、状態ｊ１，ｊ２）、音声入力の待ち受け（状態ｋ１，ｋ２）および発話（ステップＳ２１１、状態ｍ１，ｍ２）を行う。そして、このような処理を順次繰り返し、一連の情報を入力し、全ての項目について入力が終了したら、処理を終了する（状態ｐ）。
【００４４】
次に、発話入力モードが第３の入力モードの場合の音声入力処理の流れについて図５（Ｃ）を参照して説明する。
第３の入力モードにおいては、運転者は、音声認識スイッチ１０１を押下し（ステップＳ２０３、状態ｑ１）、発話をする間、音声認識スイッチ１０１を押下し続ける（状態ｒ１）。この時、音声認識スイッチ１０１はホールドされない。この状態で、音声認識部１０４は入力要求をディスプレイ１０６あるいはスピーカ１０７より出力し（ステップＳ２１７、状態ｓ１）、音声入力の待ち受け状態となる（状態ｔ１）。ここで、運転者が発話をすることにより、その内容が音声認識部１０４で認識され入力される（ステップＳ２１９、状態ｕ１）。そして、運転者が音声認識スイッチ１０１を開放することにより（ステップＳ２２１、状態ｖ１）、１つの項目の入力が終了する。
【００４５】
そして、次の項目について入力する場合には、再度運転者が音声認識スイッチ１０１を押下し（ステップＳ２０３、状態ｑ２）、押下し続ける（状態ｒ２）。この状態で、音声認識部１０４は入力要求をディスプレイ１０６あるいはスピーカ１０７に出力し（ステップＳ２１７、状態ｓ２）、音声入力の待ち受け状態となり（状態ｔ２）、運転者が発話をすることによりその内容が音声認識部１０４で認識され入力される（ステップＳ２１９、状態ｕ２）。そして、運転者が音声認識スイッチ１０１を開放することにより（ステップＳ２２１、状態ｖ２）、その項目の入力が終了する。
【００４６】
第３の入力モードにおいては、このように、各情報の入力ごとに、運転者が音声認識スイッチ１０１を押下し続け、発話の開始（状態ｑ１、ｑ２）と終了（状態ｖ１、ｖ２）のタイミングを指定し、その間に、入力要求の出力（ステップＳ２１７、状態ｓ１，ｓ２）、音声入力の待ち受け（状態ｔ１，ｔ２）および発話（ステップＳ２１９、状態ｕ１，ｕ２）を行い、情報を入力する。そして、これを順次繰り返し、一連の情報を順次入力する。
そして、全ての項目について入力が終了したら、一連の情報入力処理も終了する（状態ｐ）。
【００４７】
なお、以上説明した実施の形態は、本発明の理解を容易にするために記載されたものであって、本発明を限定するために記載されたものではない。したがって、上記の実施の形態に開示された各要素は、本発明の技術的範囲に属する全ての設計変更や均等物をも含む趣旨である。
【００４８】
たとえば、前述した実施の形態においては、第１〜第３の３種類の発話入力モードから、環境騒音等の情報に基づいて１つの発話入力モードを選択するようにしていた。しかし、切り換える発話入力モードは、２種類であってもよい。
たとえば、図６に示すように、第１の発話入力モードと第２の発話入力モードの２つを環境騒音の騒音量に基づいて切り換えるようにしてもよい。
また、図７に示すように、第２の発話入力モードと第３の発話入力モードの２つを環境騒音の騒音量等に基づいて切り換えるようにしてもよい。
【００４９】
また、前述した実施の形態においては、騒音検出部１０３の騒音取得マイク部１０９、車速取得部１１０およびエンジン回転数取得部１１１、および、運転負荷検出部１０８のナビゲーション装置１１２および車速取得部１１３は、各々、所定の時間間隔で騒音や車速等を検出し、検出結果を音声認識スイッチ制御部１０５に出力していた。しかし、たとえば騒音量や車速あるいはエンジン回転数の変化を検出した時に、その変化した検出結果を音声認識スイッチ制御部１０５に出力するようにしてもよい。
【図面の簡単な説明】
【図１】図１は、本発明の一実施の形態の音声認識装置の構成を示すブロック図である。
【図２】図２は、図１に示した音声認識装置の音声認識スイッチの構成を示す図である。
【図３】図３は、図１に示した音声認識装置における発話入力モードの設定方法を示すフローチャートである。
【図４】図４は、図１に示した音声認識装置の音声認識スイッチ制御部で行われる音声認識スイッチの制御方法を示すフローチャートである。
【図５】図５は、図１に示した音声認識装置において使用する発話入力モードの種類とその各モードにおける音声入力動作を説明するための図である。
【図６】図６は、図１に示した音声認識装置において使用する発話入力モードの種類とその各モードにおける音声入力動作の他の例を説明するための図である。
【図７】図７は、図１に示した音声認識装置において使用する発話入力モードの種類とその各モードにおける音声入力動作のさらに他の例を説明するための図である。
【図８】図８は、従来の音声認識装置における音声入力動作を説明するための図である。
【符号の説明】
１…音声認識装置
１００…マイク
１０１…音声認識スイッチ
１０２…キャンセルスイッチ
１０３…騒音検出部
１０９…騒音取得マイク部
１１０…車速取得部
１１１…エンジン回転数取得部
１０４…音声認識部
１０５…音声認識スイッチ制御部
１０６…ディスプレイ
１０７…スピーカ
１０８…運転負荷検出部
１１２…ナビゲーション装置
１１３…車速取得部
[0001]
[Technical Field]
The present invention relates to a voice recognition device for a vehicle, and more particularly to a voice recognition device that can improve the recognition accuracy of uttered speech regardless of the presence or absence, or the volume, of sounds other than the uttered speech in the vehicle cabin (environmental noise).
[0002]
[Background Art]
In in-vehicle equipment such as navigation systems, audio systems, and in-vehicle telephone systems, vehicle devices that use voice recognition technology, so that a user such as a driver can operate the device or input information by speaking, are coming into use.
As a voice recognition device for a vehicle of this type that can improve the recognition rate even in a noisy environment, the voice recognition device disclosed in Patent Document 1, for example, is known. This device detects the voice section of an input signal wave, calculates the degree of matching between that voice section and the words in a voice dictionary in which the words to be recognized are recorded, and outputs the word with the highest degree of matching as the recognition result. In particular, in this method, a word obtained by omitting the word-initial half-syllable of a word to be recognized is treated as a recognition target and is used in calculating the degree of matching with the voice section of the input signal wave.
[0003]
However, such a conventional voice recognition device has, for example, the following problem. A voice recognition device for a vehicle may perform voice recognition while a high-noise device such as an air conditioner is in use. In such a case, even if words with the word-initial half-syllable omitted are used as recognition targets when calculating the degree of matching with the voice section of the input signal wave, a break in the noise is often misrecognized as a break in the speaker's voice, as shown in the bottom diagram of FIG. 8, and voice recognition ends before the speaker starts speaking. The uttered voice is then never input as a target of the voice recognition processing in the first place, and the operability and convenience of the device using voice recognition cannot be improved.
[0004]
[Patent Document 1] JP-A-10-69291
[0005]
DISCLOSURE OF THE INVENTION
The present invention has been made in view of the above problem. Its object is to provide a voice recognition device for a vehicle that can improve recognition accuracy regardless of the presence or absence, or the volume, of environmental noise, and that thereby makes it possible to provide various in-vehicle devices that are highly convenient for a user such as a driver.
[0006]
To achieve the above object, the voice recognition device of the present invention is a voice recognition device for a vehicle having voice input means for inputting uttered speech and voice recognition means for recognizing the utterance content of the input speech. It further comprises noise detection means for detecting noise, that is, sound in the vehicle cabin other than the utterance, and voice input control means for switching, based on the detected volume of the noise in the vehicle cabin, the method of specifying the input timing of uttered speech to the voice input means to one of a plurality of predetermined input methods, including a method in which the start and the end of each utterance are designated.
[0007]
In a voice recognition device with this configuration, the noise detection means detects the volume of the noise in the vehicle cabin, and the voice input control means switches the voice input method based on the detected noise volume. One of the methods to which it can switch is a highly reliable method in which the start and the end of each utterance are designated, so that the input timing of the voice can be defined appropriately even when the noise is fairly loud. The method with the best operability and the highest convenience for the noise level in the vehicle cabin can therefore be selected. As a result, the timing of voice input can be specified appropriately even when there is environmental noise, which reduces the likelihood that an uttered voice is never input as a target of voice recognition processing. Because the voice is input appropriately, the voice recognition means can recognize it accurately.
[0008]
As described above, the present invention can provide a voice recognition device for a vehicle that improves recognition accuracy regardless of the presence or absence, or the volume, of environmental noise, and that thereby makes it possible to provide various in-vehicle devices that are highly convenient for users such as drivers.
[0009]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described with reference to FIGS. 1 to 5.
FIG. 1 is a block diagram illustrating a configuration of a vehicle voice recognition device according to an embodiment of the present invention.
As shown in FIG. 1, the voice recognition device 1 includes a microphone 100, a voice recognition switch 101, a cancel switch 102, a noise detection unit 103, a voice recognition unit 104, a voice recognition switch control unit 105, a display 106, a speaker 107, and a driving load detection unit 108. The noise detection unit 103 includes a noise acquisition microphone unit 109, a vehicle speed acquisition unit 110, and an engine speed acquisition unit 111. The driving load detection unit 108 includes a navigation device 112 and a vehicle speed acquisition unit 113.
[0010]
First, the configuration of each unit will be described.
The microphone 100 is, for example, a microphone provided at or near the driver's seat of the vehicle in order to collect and input voices spoken by a driver or the like of the vehicle. The microphone 100 collects sounds and the like emitted from the surroundings, converts the sounds into electric signals, converts the signals into digital data, and outputs the digital data to the voice recognition unit 104.
[0011]
The voice recognition switch 101 is a push-type switch for specifying a timing at which a voice uttered by the driver is input to the voice recognition device 1. The voice recognition switch 101 outputs an on (ON) signal to the voice recognition unit 104 while being pressed.
The voice recognition switch 101 is constituted by a switch 120 having a hold mechanism, as shown in FIG. 2. In the switch 120, one end of a spring (coil spring) 123 is connected to an end of a movable terminal 122 that moves integrally with a push button 121 to be pressed. The other end of the spring 123 is fixed to an electromagnet 124 fixed in the frame of the switch 120. Accordingly, if the voice recognition switch control unit 105 described later energizes the terminal 126 of the electromagnet 124 while the push button 121 is pushed in and the movable terminal 122 is held between the contact terminals 125, a magnetic force is generated in the electromagnet 124, an attraction force acts on the movable terminal 122, and the movable terminal 122 is kept between the contact terminals 125.
As a result, the push button 121 of the voice recognition switch 101 is maintained in a pressed state even after the operator's pressing force on the push button 121 is released. During this time, the voice recognition switch 101 continuously outputs an ON signal to the voice recognition unit 104.
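The hold behaviour of the switch 120 can be pictured as a small state model. The following Python sketch is only an illustration under stated assumptions: the class `HoldSwitch` and its method names are hypothetical and do not appear in the specification, and the push button 121, the electromagnet 124 and the ON signal are reduced to booleans.

```python
class HoldSwitch:
    """Toy model of a push switch with an electromagnetic hold (hypothetical names)."""

    def __init__(self) -> None:
        self.finger_down = False     # operator is pressing the push button
        self.magnet_on = False       # electromagnet is energized and holding
        self.contact_closed = False  # movable terminal sits between the contacts

    def press(self) -> None:
        # Pressing the button moves the movable terminal between the contacts.
        self.finger_down = True
        self.contact_closed = True

    def release(self) -> None:
        # Without the magnet, the spring returns the terminal when the finger lifts.
        self.finger_down = False
        if not self.magnet_on:
            self.contact_closed = False

    def set_hold(self, energized: bool) -> None:
        # The magnet can only retain a terminal that is already between the contacts.
        self.magnet_on = energized and self.contact_closed
        if not self.magnet_on and not self.finger_down:
            self.contact_closed = False

    @property
    def on_signal(self) -> bool:
        # The ON signal is output for as long as the contact stays closed.
        return self.contact_closed
```

With this model, pressing, energizing and then releasing leaves `on_signal` true until `set_hold(False)` is called, whereas pressing and releasing without energizing drops the signal as soon as the finger lifts.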
[0012]
The cancel switch 102 is a push-type switch for canceling information input by voice and proceeding to processing such as re-input. The cancel switch 102 outputs a cancel signal to the voice recognition unit 104 when pressed.
In the voice recognition device 1, the content input by the driver through the microphone 100 is recognized by the voice recognition unit 104 and then presented to the driver via the display 106 or the speaker 107 for confirmation. The presented information may differ from what the driver intended to input. This occurs, for example, when the driver utters the wrong content or when the voice recognition unit 104 misrecognizes the utterance. In such a case, pressing the cancel switch 102 outputs a cancel signal to the voice recognition unit 104, and the incorrect input is cancelled. In addition, the operation of the voice recognition device 1 is controlled so that the voice is re-input as necessary.
[0013]
The noise detection unit 103 detects information related to the sound volume (noise amount) in the vehicle compartment, and outputs the detected information to the voice recognition switch control unit 105.
In the present embodiment, the noise detection unit 103 has a noise acquisition microphone unit 109, a vehicle speed acquisition unit 110, and an engine speed acquisition unit 111.
The noise acquisition microphone unit 109 is a microphone for collecting the sounds actually heard in the vehicle cabin other than the driver's intentional speech to the microphone 100 (noise). Noise heard in the vehicle cabin due to unevenness of the road surface, wind noise, and the like is collected by the noise acquisition microphone unit 109. The noise acquisition microphone unit 109 is provided at a predetermined position in the vehicle cabin suitable for collecting noise, collects the sound in the vehicle cabin, converts it into digital data, and outputs the digital data to the voice recognition switch control unit 105.
[0014]
The vehicle speed acquisition unit 110 acquires information on the traveling speed of the vehicle as data for estimating the noise amount, and outputs the information to the voice recognition switch control unit 105. Normally, as the vehicle speed increases, running noise due to wind noise and unevenness of the road surface also increases. Therefore, the running speed of the vehicle is used as data for estimating the noise amount. The vehicle speed acquisition unit 110 acquires vehicle speed information by receiving, for example, an output signal from a vehicle speed sensor or the like together with another control system.
[0015]
The engine speed obtaining unit 111 obtains information on the engine speed as data for estimating the noise amount, and outputs the information to the voice recognition switch control unit 105. Normally, as the engine speed increases, the engine noise increases, so the engine speed is used as data for estimating the noise amount. The engine speed obtaining unit 111 obtains engine speed information, for example, by receiving an output signal from an engine control circuit of the vehicle together with another control system.
The noise acquisition microphone unit 109, the vehicle speed acquisition unit 110, and the engine speed acquisition unit 111 of the noise detection unit 103 acquire the noise amount, the vehicle speed, and the engine speed, respectively, at predetermined time intervals and output the information to the voice recognition switch control unit 105.
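The specification does not state how the three inputs are combined into a single noise amount. Purely as an illustration, the Python sketch below takes the largest of the measured microphone level and two linear estimates derived from vehicle speed and engine speed. The function name `estimate_noise_level`, the decibel scale, and every coefficient are assumptions introduced here, not values from the source.

```python
def estimate_noise_level(
    mic_db: float,
    speed_kmh: float,
    engine_rpm: float,
    base_db: float = 40.0,    # assumed cabin noise floor (hypothetical)
    speed_gain: float = 0.2,  # assumed dB per km/h (hypothetical)
    rpm_gain: float = 0.005,  # assumed dB per rpm (hypothetical)
) -> float:
    """Return a single noise estimate from the three acquired quantities."""
    # Faster travel implies more wind and road noise.
    speed_estimate = base_db + speed_gain * speed_kmh
    # Higher engine speed implies louder engine sound.
    rpm_estimate = base_db + rpm_gain * engine_rpm
    # Use the most pessimistic of the measured and estimated levels.
    return max(mic_db, speed_estimate, rpm_estimate)
```

Taking the maximum is one conservative choice; a weighted sum or a lookup table calibrated per vehicle would fit the description equally well.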
[0016]
The driving load detection unit 108 acquires information on the driving load on the driver, and outputs the information to the voice recognition switch control unit 105.
The driving load detection unit 108 includes a navigation device 112 and a vehicle speed acquisition unit 113.
The navigation device 112 outputs information such as the type and condition of the road on which the vehicle is traveling to the voice recognition switch control unit 105. For example, when traveling on an ordinary road, the driving load is greater than when traveling on a motor-vehicle-only road such as an expressway, because of traffic signals, right and left turns, and obstacles such as pedestrians and motorbikes. Even on ordinary roads, the driving load also increases under conditions such as a narrow road width, heavy traffic, or a complicated road shape. Information on the traveling road is therefore detected as data for estimating the driving load. The navigation device 112 is an ordinary navigation device that guides the driver along a route to a destination, and it outputs the information obtained during navigation processing to the voice recognition switch control unit 105.
[0017]
The vehicle speed acquisition unit 113 detects the traveling speed of the vehicle and outputs this to the voice recognition switch control unit 105. Normally, the load on the driver's driving increases as the vehicle speed increases. Therefore, vehicle speed data is used as data for estimating the driving load. The vehicle speed acquisition unit 113 acquires vehicle speed information by receiving, for example, an output signal from a vehicle speed sensor or the like together with another control system.
The navigation device 112 and the vehicle speed acquisition unit 113 of the driving load detection unit 108 acquire the information of the traveling road and the information of the vehicle speed at predetermined time intervals, and output the information to the voice recognition switch control unit 105.
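How road information and vehicle speed are turned into a load value is not specified. The Python sketch below shows one possible scoring that only preserves the orderings the description gives: ordinary roads score higher than expressways, each aggravating road condition adds load, and load rises with speed. `RoadInfo`, `estimate_driving_load` and all weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoadInfo:
    """Road attributes a navigation device might report (hypothetical fields)."""
    is_expressway: bool
    is_narrow: bool = False
    heavy_traffic: bool = False
    complex_shape: bool = False


def estimate_driving_load(road: RoadInfo, speed_kmh: float) -> float:
    """Return a unitless driving-load score; larger means a busier driver."""
    # Ordinary roads involve signals, turns and pedestrians, so they start higher.
    load = 1.0 if road.is_expressway else 2.0
    # Each aggravating road condition adds one point (assumed weight).
    load += sum((road.is_narrow, road.heavy_traffic, road.complex_shape))
    # Load grows with speed; 50 km/h per point is an assumed scale.
    load += speed_kmh / 50.0
    return load
```

For instance, `estimate_driving_load(RoadInfo(is_expressway=False, is_narrow=True, heavy_traffic=True), 50.0)` evaluates to 5.0 under these assumed weights.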
[0018]
The voice recognition switch control unit 105 detects the noise amount and the driver's driving load based on the information indicating the vehicle cabin noise input from the noise detection unit 103 and the information related to the driver's load input from the driving load detection unit 108. Based on these, it sets the utterance input method for voice input (hereinafter referred to as the utterance input mode or simply the input mode). The set utterance input mode is output to the voice recognition unit 104. The voice recognition switch control unit 105 also controls the voice recognition switch 101, as described later, based on the set utterance input mode.
[0019]
The voice recognition switch control unit 105 detects the volume of the noise input via the microphone 100 based on the information on the volume of the vehicle interior noise, the vehicle speed, and the engine speed input from the noise detection unit 103. Then, the level of the noise amount is detected by comparing the noise amount with a predetermined threshold value, and the input mode of voice input via the microphone 100 is determined.
In the present embodiment, two reference values are preset for the noise level: a first reference level (threshold) and a second reference level that is larger than the first. If the detected noise amount is smaller than the first reference level, the first input mode is set as the utterance input mode via the microphone 100. If the detected noise amount is between the first reference level and the second reference level, the second input mode is set. If the detected noise amount is equal to or higher than the second reference level, the third input mode is set. The operation method of the voice recognition switch 101, the timing of voice input, and the operation of the voice recognition device 1 in each of the first to third input modes will be described later in detail.
The setting and change of the utterance input mode are performed at predetermined time intervals.
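The threshold comparison can be written down directly. The Python sketch below is a minimal illustration that maps a noise amount to one of the three input modes using two thresholds; the names `InputMode` and `select_input_mode` are hypothetical, and the handling of a value exactly equal to the lower threshold (assigned here to the second mode) is an assumption, since the text does not say.

```python
from enum import Enum


class InputMode(Enum):
    CONTINUOUS = 1     # first mode: one press, then a series of utterances
    START_ONLY = 2     # second mode: press to mark the start of each utterance
    START_AND_END = 3  # third mode: press and hold to mark start and end


def select_input_mode(noise: float, low_ref: float, high_ref: float) -> InputMode:
    """Pick the utterance input mode for a detected noise amount."""
    if low_ref >= high_ref:
        raise ValueError("low_ref must be smaller than high_ref")
    if noise < low_ref:
        return InputMode.CONTINUOUS
    if noise < high_ref:
        return InputMode.START_ONLY
    return InputMode.START_AND_END
```

For example, with thresholds of 50 and 70, noise amounts of 40, 60 and 70 select the first, second and third modes respectively.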
[0020]
Further, the voice recognition switch control unit 105 detects the driving load of the driver based on the information on the road on which the vehicle is traveling and the information on the speed of the vehicle input from the driving load detection unit 108. It then determines whether the load is equal to or higher than a predetermined level, and when it determines that a load at or above the predetermined level is being applied to the driver, it does not change the utterance input mode during driving.
As described above, the voice recognition switch control unit 105 changes the utterance input mode at predetermined time intervals. When the utterance input mode changes, however, the operation method of the voice recognition switch 101 and the corresponding timing of the utterance also change, as described later, which increases the load on the driver. Therefore, when the driver's load in driving the vehicle is estimated to be fairly high, the voice recognition switch control unit 105 does not set the utterance input mode and does not change the control state of the voice recognition switch 101.
The detection of the driving load of the driver is performed at predetermined time intervals.
[0021]
Further, once the driver's load has been determined to be equal to or higher than the predetermined level and changes to the utterance input mode have been suppressed, the suppression continues until the driver's load is determined to be lower than the predetermined level. When it is determined that the driver's load has become lower than the predetermined level, the utterance input mode is set to the input mode based on the volume of the noise at that time.
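The suppression of mode changes under high driving load can be sketched as a small stateful gate. The class name `ModeGate`, the method `update`, and the numeric load scale are hypothetical and serve only to illustrate the behaviour described above; the embodiment does not specify how the load level is quantified.

```python
class ModeGate:
    """Holds the current input mode and blocks changes while driving load is high."""

    def __init__(self, initial_mode: int, load_limit: float) -> None:
        self.mode = initial_mode
        self.load_limit = load_limit  # the "predetermined level" (assumed scale)

    def update(self, candidate_mode: int, load: float) -> int:
        # Load at or above the limit: keep the currently set mode unchanged.
        if load >= self.load_limit:
            return self.mode
        # Load below the limit: adopt the mode chosen from the current noise.
        self.mode = candidate_mode
        return self.mode
```

Calling `update` once per detection interval with the mode suggested by the current noise reproduces the described behaviour: the mode stays fixed for as long as the load remains at or above the limit, and follows the noise-based choice as soon as the load falls below it.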
[0022]
The voice recognition unit 104 recognizes the voice that the driver inputs via the microphone 100. The voice recognition unit 104 converts the voice input from the microphone 100 or the like into digital data, and performs pattern matching of the digital voice data against the data of a target word dictionary prepared in advance, thereby recognizing the utterance content. At this time, the voice recognition unit 104 cuts out the voice to be recognized from the sequentially input voice according to the set utterance input mode. The recognition result is output to the driver via the display 106 or the speaker 107, for example so that the driver can confirm it.
Further, when prompting the driver or the like to input a voice, the voice recognition unit 104 outputs, for example, a question content such as “Please give a command” from the display 106 or the speaker 107. This enables interactive voice input.
[0023]
The display 106 displays the content of the question to the driver and the result of the voice recognition input from the voice recognition unit 104 to the driver.
[0024]
The speaker 107 outputs the content of the question to the driver and the voice recognition result input from the voice recognition unit 104 to the driver by voice.
[0025]
Next, first to third utterance input modes that are appropriately switched and used in the speech recognition device 1 of the present embodiment will be described.
First, the first utterance input mode is a method in which the voice recognition device 1 continuously waits and inputs a series of a plurality of commands or information of a user (driver). In this input mode, once the user presses the voice recognition switch 101 to instruct the start timing of the utterance, the voice recognition device 1 continuously waits for and receives a plurality of pieces of information sequentially uttered by the user. Therefore, after operating the voice recognition switch 101 once, the user can input a plurality of pieces of information without operating a new switch, which is highly convenient.
[0026]
On the other hand, in this input mode, the voice recognition unit 104 of the voice recognition device 1 needs to cut out voice for each information and input content from continuously input voice information. For this purpose, the input voice needs to have a relatively low volume of environmental noise and be able to clearly identify the voice of the user. Therefore, it is effective to apply this utterance input mode when the amount of environmental noise is small or estimated to be small.
In the first utterance input mode, once the voice recognition switch 101 is pressed, it is held in the pressed state under the control of the voice recognition switch control unit 105 until the input of the series of information is completed, and continues to output an ON signal to the voice recognition unit 104.
[0027]
Next, the second utterance input mode is a method in which the user specifies the utterance start timing for each utterance of each command or information. That is, when the user presses the voice recognition switch 101, the voice recognition unit 104 of the voice recognition device 1 enters a voice input waiting state, and captures the utterance of the user. Then, the speech recognition unit 104 automatically determines the end of the utterance for each input of one piece of information, and ends the standby state.
In this utterance input mode, the user has to specify the start of the utterance for each piece of input information. However, since only the start of the utterance needs to be specified, the operation load is not very large, and a certain degree of convenience can be secured. In addition, since the start timing of each utterance is clearly indicated, the voice recognition unit 104 can appropriately detect the uttered portion even if there is some environmental noise, and can maintain recognition accuracy. Therefore, it is effective to apply this utterance input mode when the amount of environmental noise is moderate or estimated to be moderate.
In the second utterance input mode, once the voice recognition switch 101 is pressed, it is held in the pressed state under the control of the voice recognition switch control unit 105 until the input of one piece of information is completed, and continues to output an ON signal to the voice recognition unit 104.
[0028]
Next, the third utterance input mode is a method in which the user specifies, for each utterance of a command or piece of information, both the utterance start timing and the utterance end timing. That is, when the user presses the voice recognition switch 101 and keeps it pressed, the voice recognition unit 104 of the voice recognition device 1 enters a voice input standby state. When the user releases the voice recognition switch 101, the voice recognition device 1 determines that the voice input has been completed.
In this utterance input mode, the user has to specify both the utterance start timing and the utterance end timing for each piece of input information, so the voice recognition unit 104 can clearly know the start timing and end timing of each utterance. Therefore, even if the environmental noise is fairly large, high-accuracy recognition can be maintained without failing to cut out the uttered voice.
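The difference between the three modes can be summarised by where the boundaries of the audio passed to recognition come from. The sketch below is an illustration under stated assumptions: the function `capture_window`, its parameters, and the idea of representing timings as seconds are introduced here, and the automatically detected boundaries stand in for whatever break detection the voice recognition unit 104 performs.

```python
from typing import Optional, Tuple


def capture_window(mode: int,
                   press: Optional[float],
                   release: Optional[float],
                   detected_start: float,
                   detected_end: float) -> Tuple[float, float]:
    """Return the (start, end) times, in seconds, of the audio passed to recognition."""
    if mode == 1:
        # First mode: both boundaries come from automatic detection.
        return detected_start, detected_end
    if mode == 2:
        if press is None:
            raise ValueError("second mode needs a switch press time")
        # Second mode: start is given by the switch press, end is detected.
        return press, detected_end
    if mode == 3:
        if press is None or release is None:
            raise ValueError("third mode needs press and release times")
        # Third mode: both boundaries are given by the switch.
        return press, release
    raise ValueError(f"unknown mode: {mode}")
```

The more boundaries the user supplies, the less the result depends on automatic detection, which is why the third mode suits the noisiest conditions.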
[0029]
Next, the operation of the voice recognition device 1 having such a configuration will be described.
First, a method of determining an input method, that is, a method of setting an utterance input mode will be described with reference to a flowchart of FIG.
The process is started by turning on the power of the vehicle and starting the engine. First, the noise detection unit 103 acquires information related to the amount of noise in the vehicle compartment (step S101). That is, the noise acquisition microphone unit 109 detects the sound volume in the vehicle interior, the vehicle speed acquisition unit 110 detects the running speed of the vehicle, and the engine speed acquisition unit 111 detects the engine speed. Information on the detected noise amount in the vehicle compartment is output to the voice recognition switch control unit 105.
Next, the driving load detection unit 108 acquires information on the load during driving of the driver (step S103). That is, the navigation device 112 detects information on the traveling path on which the vehicle is traveling, and the vehicle speed acquisition unit 113 detects the traveling speed of the vehicle. Information on the detected driving load of the driver is output to the voice recognition switch control unit 105.
[0030]
Next, the voice recognition switch control unit 105 calculates the vehicle interior noise level based on the information on the vehicle interior noise input from the noise detection unit 103, and calculates the level of the driving load based on the information on the driver's driving load input from the driving load detection unit 108 (step S105).
Then, based on the calculated noise level, the utterance input mode to be set is determined, and based on the calculated load level, it is determined whether or not the determined utterance input mode is actually applied immediately. If the load level is lower than the predetermined level, the determined utterance input mode is set as the mode actually used for control (step S107). If the load level is equal to or higher than the predetermined level, the determined utterance input mode is not applied, and the currently set input mode is maintained.
In the voice recognition device 1, the processes of steps S101 to S107 are repeatedly performed at predetermined time intervals, and the utterance input mode is appropriately set based on the traveling environment of the vehicle and the like.
[0031]
Next, an operation of performing voice input based on the utterance input mode set as described above will be described with reference to the flowchart of FIG.
First, the voice recognition switch control unit 105 and the voice recognition unit 104 obtain information on the utterance input mode set by the above-described processing (step S201).
Next, it waits for the driver to press the voice recognition switch 101 (step S203).
[0032]
When the depression of the voice recognition switch 101 by the driver is detected (step S203), it is determined whether the set utterance input mode is either the first input mode, in which a series of information is received continuously and sequentially, or the second input mode, in which the start of the utterance is indicated by pressing the switch 101 each time one piece of information is input (step S205).
If the set input mode is the first or second input mode, the voice recognition switch 101 is held in the pressed state under the control of the voice recognition switch control unit 105 (step S207). That is, in the voice recognition switch 101 shown in FIG. 2, a predetermined current is applied to the terminal 126 of the electromagnet 124, and the movable terminal 122 is attracted and held between the contact terminals 125 by the magnetic force generated in the electromagnet 124, so that the integrally formed push button 121 is kept pressed. While the voice recognition switch 101 is maintained in the pressed state, an ON signal is continuously output from the voice recognition switch 101 to the voice recognition unit 104.
[0033]
Next, the voice recognition unit 104 issues a notification prompting the driver to input information (step S209). That is, a message requesting input of information is displayed on the display 106 or output by voice from the speaker 107.
In accordance with the notification of the input request, the driver speaks the requested information toward the microphone 100 and inputs information by voice (step S211). The input voice data is input to the voice recognition unit 104 and provided for recognition.
[0034]
When the voice recognition unit 104 detects a break in the input voice and the utterance input mode is the second input mode (step S213), the hold state of the push button of the voice recognition switch 101 is released under the control of the voice recognition switch control unit 105 (step S215). Thus, the input of the ON signal from the voice recognition switch 101 to the voice recognition unit 104 is terminated. Note that, even when a break in the voice is detected, if the utterance input mode is the first mode (step S213), the hold of the voice recognition switch 101 is maintained.
[0035]
On the other hand, if the input mode is the third input mode when the pressing of the voice recognition switch 101 is detected in step S203, the voice recognition switch 101 is not held, and a notification prompting the driver to input information is immediately issued from the voice recognition unit 104 (step S217). That is, a message requesting input of information is displayed on the display 106 or output by voice from the speaker 107.
In accordance with the notification of the input request, the driver speaks the requested information toward the microphone 100 and inputs information by voice (step S219). The input of the driver's uttered voice via the microphone 100 continues while the voice recognition switch 101 is being pressed, and ends when the driver releases the switch (step S221).
[0036]
In each of the following cases, it is checked whether the input of all information is completed (step S223): when the input mode is the first mode and a break in the uttered voice is detected (step S211); when the input mode is the second mode and the hold of the voice recognition switch 101 is released after a break in the uttered voice is detected (step S215); and when the input mode is the third mode and the driver releases the voice recognition switch 101 (step S221).
If there is further information to be input (step S223) and the input mode is the first mode (step S225), the voice recognition switch 101 is already held, so the process returns to step S209 and a notification prompting the driver to input information for the next input item is issued. Thereafter, the processing of steps S209 to S225 is repeated.
If the input mode is the second or third mode, the process returns to step S203 to detect that the driver has pressed the voice recognition switch 101. When the voice recognition switch 101 is pressed, the process from step S205 to step S223 via step S211 or the like or the process from step S205 to step S223 via step S219 or the like is repeated.
[0037]
In any of the input modes, if all the information has been input in step S223, the hold state of the voice recognition switch 101 is released (step S229) only when the input mode is the first mode (step S227), and the series of processing ends.
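The branching of steps S201 to S229 described above can be traced with a short simulation. The function `input_flow` is a hypothetical helper introduced here; it only lists the step labels in the order the description visits them and assumes that every utterance is accepted on the first attempt.

```python
from typing import List


def input_flow(mode: int, items: int) -> List[str]:
    """Return the step labels visited while entering `items` pieces of information."""
    if mode not in (1, 2, 3) or items < 1:
        raise ValueError("mode must be 1, 2 or 3 and items must be at least 1")
    steps = ["S201"]
    for index in range(items):
        first_item = index == 0
        if mode == 3:
            # Driver presses and keeps pressing, is prompted, speaks, releases.
            steps += ["S203", "S205", "S217", "S219", "S221"]
        else:
            if first_item or mode == 2:
                # The switch is pressed and then held by the control unit.
                steps += ["S203", "S205", "S207"]
            steps += ["S209", "S211", "S213"]
            if mode == 2:
                steps.append("S215")  # hold released after each item
        steps.append("S223")
        if index < items - 1:
            steps.append("S225")  # more items remain: branch on the mode
    steps.append("S227")
    if mode == 1:
        steps.append("S229")  # hold released once all items are entered
    return steps
```

For two items, the trace for the first mode contains step S203 once, whereas the traces for the second and third modes contain it twice, which reflects the number of switch operations required of the driver.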
[0038]
Next, the flow of such a voice input process will be described for each input mode with reference to FIGS.
In the following description, the step numbers indicate the reference numerals in the flowchart shown in FIG. 4, and the state codes indicate the reference numerals in FIG.
[0039]
First, the flow of the voice input process when the utterance input mode is the first input mode will be described with reference to FIG.
In the first input mode, when the driver presses the voice recognition switch 101 (step S203, state a), the voice recognition switch 101 is maintained in the hold state (step S207, state b), and the voice recognition unit 104 outputs an input request on the display 106 or from the speaker 107 (step S209, state c1) and enters a state of waiting for voice input (state d1). Here, when the driver speaks, the content is recognized and input by the voice recognition unit 104 (step S211, state e1).
[0040]
As soon as the input of one item is completed, the voice recognition unit 104 outputs an input request for the next item (step S209, state c2), the device enters the voice input standby state again (state d2), and when the driver speaks, the content is input (step S211, state e2).
In the first input mode, while the voice recognition switch 101 is pressed and held (state b), the output of an input request (step S209, states c1 and c2), standby for voice input (states d1 and d2), and utterance (step S211, states e1 and e2) are repeated in sequence, and a series of information is input sequentially.
When the input of all the items is completed, the hold state of the voice recognition switch 101 is released (step S229, state f), and the process is ended (state g).
[0041]
Next, the flow of the voice input process when the utterance input mode is the second input mode will be described with reference to FIG.
In the second input mode, when the driver presses the voice recognition switch 101 (step S203, state h1), the voice recognition switch 101 is maintained in the hold state (step S207, state i1), and the voice recognition unit 104 outputs an input request on the display 106 or from the speaker 107 (step S209, state j1) and enters a state of waiting for voice input (state k1). Here, when the driver speaks, the content is recognized and input by the voice recognition unit 104 (step S211, state m1). When the input of one item is completed, the hold state of the voice recognition switch 101 is released (step S215, state n1).
[0042]
Then, when inputting the next item, the driver presses the voice recognition switch 101 again (step S203, state h2). Thus, as with the first item, the voice recognition switch 101 is maintained in the hold state (step S207, state i2), the voice recognition unit 104 outputs an input request (step S209, state j2) and enters a state of waiting for voice input (state k2), and when the driver speaks, the content is input (step S211, state m2). When the input of the item is completed, the hold state of the voice recognition switch 101 is again released (step S215, state n2).
[0043]
In the second input mode, the voice recognition switch 101 is pressed at the start of each voice input (states h1 and h2). The pressed voice recognition switch 101 is held in the same manner as in the first mode (states i1 and i2), and in this state the output of an input request (step S209, states j1 and j2), standby for voice input (states k1 and k2), and utterance (step S211, states m1 and m2) are performed. Such processing is repeated in sequence to input a series of information, and when the input is completed for all items, the processing ends (state p).
[0044]
Next, the flow of the voice input process when the utterance input mode is the third input mode will be described with reference to FIG.
In the third input mode, the driver presses the voice recognition switch 101 (step S203, state q1), and keeps pressing the voice recognition switch 101 while speaking (state r1). At this time, the voice recognition switch 101 is not held. In this state, the voice recognition unit 104 outputs an input request on the display 106 or from the speaker 107 (step S217, state s1), and enters a state of waiting for voice input (state t1). Here, when the driver speaks, the content is recognized and input by the voice recognition unit 104 (step S219, state u1). Then, when the driver releases the voice recognition switch 101 (step S221, state v1), the input of one item is completed.
[0045]
Then, when inputting the next item, the driver presses the voice recognition switch 101 again (step S203, state q2) and keeps it pressed (state r2). In this state, the voice recognition unit 104 outputs an input request on the display 106 or from the speaker 107 (step S217, state s2), and enters a state of waiting for voice input (state t2). When the driver speaks, the voice is recognized and input by the voice recognition unit 104 (step S219, state u2). Then, when the driver releases the voice recognition switch 101 (step S221, state v2), the input of the item ends.
[0046]
In the third input mode, the driver keeps pressing the voice recognition switch 101 each time a piece of information is input, thereby indicating the timing of the start (states q1, q2) and the end (states v1, v2) of the utterance. In the meantime, the output of an input request (step S217, states s1, s2), standby for voice input (states t1, t2), and utterance (step S219, states u1, u2) are performed, and the information is input. This is repeated in sequence, and a series of information is input sequentially.
Then, when the input has been completed for all the items, the series of information input processing also ends (state p).
[0047]
The embodiments described above are described for facilitating the understanding of the present invention, and are not described for limiting the present invention. Therefore, each element disclosed in the above embodiment is intended to include all design changes and equivalents belonging to the technical scope of the present invention.
[0048]
For example, in the above-described embodiment, one utterance input mode is selected from the first to third utterance input modes based on information such as environmental noise. However, the utterance input modes to be switched between may be of two types.
For example, as shown in FIG. 6, the first utterance input mode and the second utterance input mode may be switched based on the amount of environmental noise.
Further, as shown in FIG. 7, the second utterance input mode and the third utterance input mode may be switched based on the amount of environmental noise and the like.
[0049]
In the embodiment described above, the noise acquisition microphone unit 109, the vehicle speed acquisition unit 110, and the engine speed acquisition unit 111 of the noise detection unit 103, and the navigation device 112 and the vehicle speed acquisition unit 113 of the driving load detection unit 108, detect the noise, the vehicle speed, and the like at predetermined time intervals and output the detection results to the voice recognition switch control unit 105. However, for example, the detection results may instead be output to the voice recognition switch control unit 105 when a change in the noise amount, the vehicle speed, or the engine speed is detected.
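The change-triggered alternative can be sketched as a filter that forwards a detection result only when it differs from the last forwarded value. The function name `changed_samples` and the `tolerance` parameter are assumptions made for illustration; the description does not state how large a change must be before it is reported.

```python
from typing import Iterable, List, Optional, Tuple


def changed_samples(samples: Iterable[float], tolerance: float = 0.0) -> List[Tuple[int, float]]:
    """Return (index, value) pairs for samples that differ from the last reported value."""
    reported: List[Tuple[int, float]] = []
    last: Optional[float] = None
    for index, value in enumerate(samples):
        # The first sample is always reported; later ones only on a real change.
        if last is None or abs(value - last) > tolerance:
            reported.append((index, value))
            last = value
    return reported
```

With a tolerance of 2, the sequence 60, 60, 61, 65, 65 would be forwarded only at the first sample and at the jump to 65, so the voice recognition switch control unit 105 would be notified twice instead of five times.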
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a speech recognition device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a configuration of a voice recognition switch of the voice recognition device illustrated in FIG. 1;
FIG. 3 is a flowchart showing a method of setting an utterance input mode in the voice recognition device shown in FIG. 1;
FIG. 4 is a flowchart illustrating a control method of a voice recognition switch performed by a voice recognition switch control unit of the voice recognition device illustrated in FIG. 1;
FIG. 5 is a diagram for explaining types of utterance input modes used in the voice recognition device shown in FIG. 1 and a voice input operation in each mode;
FIG. 6 is a diagram for explaining types of speech input modes used in the voice recognition device shown in FIG. 1 and other examples of a voice input operation in each mode.
FIG. 7 is a diagram for explaining types of speech input modes used in the voice recognition device shown in FIG. 1 and still another example of a voice input operation in each mode.
FIG. 8 is a diagram for explaining a voice input operation in a conventional voice recognition device.
[Explanation of symbols]
1 ... Voice recognition device
100 ... Microphone
101 ... Voice recognition switch
102 ... Cancel switch
103 ... Noise detection unit
109 ... Noise acquisition microphone unit
110 ... Vehicle speed acquisition unit
111 ... Engine speed acquisition unit
104 ... Voice recognition unit
105 ... Voice recognition switch control unit
106 ... Display
107 ... Speaker
108 ... Driving load detection unit
112 ... Navigation device
113 ... Vehicle speed acquisition unit

Claims

A voice recognition device for a vehicle having voice input means for inputting an uttered voice and voice recognition means for recognizing the uttered content of the input voice, the voice recognition device comprising:
noise detection means for detecting noise that is a sound other than the utterance in the vehicle interior; and
voice input control means for switching, based on the detected volume of the vehicle interior noise, the method of specifying the input timing of the uttered voice to the voice input means to any one of a plurality of input methods including a method of designating the utterance start time and the utterance end time for each utterance.

The voice recognition device according to claim 1, wherein the voice input control means is means for switching to any one of a first utterance input method for continuously waiting for a plurality of utterances, a second utterance input method for designating the start time of each utterance, and a third utterance input method for designating the utterance start time and the utterance end time for each utterance, and switches to the first utterance input method when the detected volume of the vehicle interior noise is low, at or below a predetermined first level, to the second utterance input method when the volume of the noise is medium, from the first level to a predetermined second level, and to the third utterance input method when the volume of the noise is higher than the second level.

The voice recognition device according to claim 1, wherein the voice input control means is means for switching between either one of a first utterance input method for continuously waiting for a plurality of utterances and a second utterance input method for designating the start time of each utterance, and a third utterance input method for designating the utterance start time and the utterance end time for each utterance, and switches to the first utterance input method or the second utterance input method when the detected volume of the vehicle interior noise is equal to or less than a predetermined first level, and to the third utterance input method otherwise.

The voice recognition device according to claim 1, wherein the third utterance input method detects a predetermined continuous operation by the speaker, and specifies the input timing of the utterance by treating the start time of the operation as the start of the utterance and the end time of the operation as the end of the utterance.

The voice recognition device according to any one of claims 1 to 4, wherein the noise detection means has one, a plurality, or all of a microphone unit that collects the noise in the vehicle interior, a vehicle speed detection unit that detects the speed of the vehicle, and an engine speed detection unit that detects the engine speed of the vehicle, and
the voice input control means detects the volume of the noise in the vehicle interior based on one, a plurality, or all of the noise collected by the noise detection means, the detected vehicle speed, and the detected engine speed.

The voice recognition device according to claim 1, further comprising driving load detection means for detecting the driving load of the driver of the vehicle,
wherein the voice input control means does not switch the method of specifying the input timing of the uttered voice when the detected driving load is equal to or higher than a predetermined level.

The voice recognition device according to claim 6, wherein the driving load detection means has one or both of a traveling road information detection unit that detects information on the road on which the vehicle is traveling and a vehicle speed detection unit that detects the speed of the vehicle, and
the voice input control means detects the driving load based on one or both of the information on the traveling road of the vehicle detected by the driving load detection means and the detected vehicle speed.