JP4042589B2 - Voice input device for vehicles

Info

Publication number: JP4042589B2
Application number: JP2003050984A
Authority: JP
Inventors: 敏裕脇田; 位好寺澤
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2003-02-27
Filing date: 2003-02-27
Publication date: 2008-02-06
Anticipated expiration: 2023-02-27
Also published as: JP2004258480A

Description

【０００１】
【発明の属する技術分野】
本発明は、車両用音声入力装置にかかり、特に、音声認識によって文字を入力し、認識された文字を同一母音の文字、音声認識スコアの高い文字、または語に該当する文字等の特定の一文字へのみ修正できるようにすることによって、ドライバが画面を注視する時間を減少させて文字を入力する際に安全に運転することができるようにした車両用音声入力装置に関する。
【０００２】
【従来の技術】
音声を用いない文字入力としては、タッチパネルを備えた画面に五十音入力パレットを表示し、タッチパネル上の入力したい文字部分に指を接触させることで文字入力を行なう方式が実用化されている。しかしながら、この方式では、タッチパネルを備えた画面が手の届かない距離に配置されている場合（例えば、自動車のセンターパネルに配置されていて、運転中に操作入力しようとする場合）には、使用できない、という問題がある。
【０００３】
また、画面に表示した五十音入力パレットとジョイスティックとを用い、カーソルで選択された文字を選択表示しながら、ジョイスティックでカーソルを全方向に移動させることで文字入力を行なう方式も提案されている。しかしながら、この方式では、ジョイスティックが上下左右斜め方向の全方向へ移動可能になっているので、意図しない方向にカーソルが移動し易く、操作ミスが発生し易い、という問題がある。
【０００４】
さらに、ジョグダイアルを用いた五十音入力方式も提案されている（例えば、特許文献１）が、一文字を入力するためにジョグダイヤルによる選択と確定という操作を２回行なう必要があり、入力の効率が悪い、という問題がある。
【０００５】
一方、車両用文字入力装置として効果的な音声認識による文字入力装置は、種々の方式が提案されている。しかしながら、音声認識誤りが存在し、認識率１００％の音声認識は困難なため、文字入力装置を実用化するためには、誤認識時の修正手段が不可欠である。従来では、キーボード等の補助入力手段を用いて前入力を削除し再入力する、という方式が提案されている（例えば、特許文献２）。
【０００６】
この方式では、誤認識する度に再度音声入力が必要なため、利便性に欠けると共に、再度同様の誤認識が繰り返されると、入力が不可能になる可能性がある。
【０００７】
また、単語の音声認識において、誤認識した音節をカーソルで指示し、該当する音節と同一段の文字を順次一文字ずつ提示することで音節を修正する方法が提案されている（例えば、特許文献３）。さらに、単音節による音声認識装置で単音節語を同一の母音グループ毎に一連のアドレスに各々登録し、アドレスを加減算することでグループ内で認識候補の単音節語を一文字ずつ表示して修正することが記載されている（例えば、特許文献４）。
【０００８】
【特許文献１】
特開２０００−１９３４７５号公報
【特許文献２】
特開平１−１８９６９９号公報
【特許文献３】
特開昭６１−４１１９９号公報
【特許文献４】
特開昭５６−１１６０９６号公報
【０００９】
【発明が解決しようとする課題】
しかしながら、特許文献３の方法では、修正キーを押して画面を目視し、一文字ずつ修正する必要があるため、画面を注視する時間が長くなる、という問題がある。また、特許文献４の音声認識装置では表示部には一文字しか表示されないので、単音節語の修正方向を一目で確認することが困難であり、上記と同様に画面を注視する時間が長くなる、という問題がある。そして、一般に画面の注視時間が長くなるほど車両のふらつきが増加する等によって、安全性が損なわれることが知られており（H.T.Zwahlen et al:Safety aspects of CRT touch panel controls in automobile, Vision in Vehicles II, A.G.Gale et al (Eds), pp.335-344, North Holland Press, Amsterdam (1998)）、ドライバが使用する音声入力装置として画面注視時間の長い方式は不適切である。
【００１０】
本発明は、上記問題点を解消すべくなされたもので、入力された音声を音声認識によって文字に変換し、変換された文字を文字入力パレット上の同一母音の文字、音声認識スコアの高い文字、または語に該当する文字等の特定の一文字へのみ修正できるようにすることによって、ドライバが画面を注視する時間を減少させて安全に文字を入力することができるようにした車両用音声入力装置を提供することを目的とする。
【００１１】
【課題を解決するための手段】
上記目的を達成するために本発明は、音声を入力する音声入力手段と、前記音声入力手段から入力された音声を認識し、認識結果を認識結果に対応する文字または文字列に変換する音声認識手段と、前記音声認識手段で変換された文字または文字列を、同一母音が同一ライン上に並ぶように五十音の文字、または五十音の文字の一部分を配列して表示しておいて該五十音の文字、または該五十音の文字の一部分の文字の中から選択された一文字を一文字ずつ入力するための五十音入力パレットと共に表示する表示手段と、前記音声認識手段によって変換された文字または文字列に対応した前記五十音入力パレット上の一文字を選択表示すると共に、修正要求に応じて前記同一ライン上でのみ選択表示を変更し、確定要求に応じて選択表示された文字を確定する制御手段と、を含んで構成したものである。
【００１２】
本発明によれば、音声入力手段から入力された音声は、音声認識手段によって認識され、認識結果が認識結果に対応する文字または文字列に変換される。音声認識手段で変換された文字または文字列は、同一母音が同一ライン上に並ぶように五十音の文字、または五十音の文字の一部分を配列して表示しておいてこの五十音の文字、または該五十音の文字の一部分の文字の中から選択された一文字を一文字ずつ入力するための五十音入力パレットと共に表示手段に表示される。
【００１３】
また、制御手段は、音声認識手段によって変換された文字または文字列に対応した文字入力パレット上の一文字を選択表示する。これによって、音声認識の結果を視覚によって確認することができる。なお、音声認識手段での認識結果を音声で報知するようにすれば、聴覚によっても音声認識の結果を確認することができる。
【００１４】
音声認識の結果が正しくない場合に修正を要求すると、制御手段は、修正要求に応じて前記同一ライン上でのみ選択表示を変更する。
【００１５】
音声認識では、同一母音の文字の認識率は極めて高い。そこで、修正要求に応じて同一母音の文字へのみ選択表示を変更するようにすることにより、五十音入力パレット上で同一母音の文字へのみ修正することができるため、ドライバの画面を注視する時間が減少し、安全に文字を入力することができるようになる。また、五十音入力パレット上には、複数の文字が表示されているので、選択表示を変更する方向を一目で確認することができる。
【００１６】
また、制御手段は、修正要求に応じて同一ライン上であって、音声認識スコアの高い文字、語に該当する文字、または、音声認識スコアの高い文字、及び語に該当する文字のいずれか一つの文字を定めるための条件を組み合わせた条件を満たす文字へのみ変更することができる。
五十音入力パレットにおいて同一ライン上に並ぶ認識候補の別の例として、音声認識スコアの高い文字がある。音声認識では一般に入力された音声に対して認識スコアの高い複数の認識候補が得られるが、それらの中で最もスコアの高い候補を一つの認識結果とする。誤認識が起こる場合でも、正解が認識スコアの高い候補に含まれる場合が多い。そこで、入力時の修正要求に応じて認識スコアの高い候補文字へのみ選択表示を変更することで、上記と同様に安全に文字を入力することができる。
【００１７】
また、五十音入力パレットにおいて同一ライン上に並ぶ認識候補の別の例として、語に該当する文字、すなわち入力しようとする語を構成する候補となる文字がある。例えば、本装置を一文字ずつの地名入力に用いる場合を例に説明する。今、「と」「う」まで入力されたとすると、次に入力される文字は、「き」（東京など）や「か」（東海など）等の入力しようとする語を構成する候補となる文字に限られ「ん」は続かない。従って、入力時の修正要求に応じて語に該当する文字（上記の例では、「き」や「か」等）へのみ選択表示を変更することで、上記と同様に安全に文字を入力することができる。
【００１９】
そして、制御手段は、確定要求に応じて選択表示された文字を確定する。修正要求及び確定要求は、音声で入力してもよく、スイッチ等の機械的操作手段で入力するようにしてもよい。
【００２０】
五十音入力パレットの同一ラインは、左右方向のラインでも上下方向のラインでもよいが、ドライバにとっては左右方向のラインがより好ましい。
【００２２】
【発明の実施の形態】
以下、図面を参照して、音声で文字を一文字ずつ発話することで音声認識によって文字列を入力すると共に、音声認識スコアの高い文字の入力候補の文字を左右方向の同一ライン上に表示し、音声認識スコアの高い文字の他の文字へのみ修正可能として文字を入力するカーナビゲーションの音声入力装置（文字列入力装置）に本発明を適用した実施の形態について詳細に説明する。
【００２３】
図１に示すように本実施の形態の車両用音声入力装置は、ドライバ等の音声を入力するマイクロホン等で構成された音声入力装置１１と、音声入力装置１１から入力された音声を予め記憶されている音声認識プログラムによって認識し、認識結果を認識結果に対応する文字または文字列に変換するマイクロコンピュータで構成された音声認識装置１２とを備えている。音声認識装置１２には、単音節での辞書を含む各種の辞書を記録した辞書データベース１７が接続されている。音声認識装置１２は、辞書データベース１７に記録されたいずれか１つの辞書を選択して音声認識処理を実行する。
【００２４】
また、音声認識装置１２には、音声認識処理によって認識された音声を出力するスピーカ等で構成された音声出力装置１３が接続され、音声出力装置１３には判定装置１４が接続されている。
【００２５】
さらに、音声認識装置１２には、液晶表示装置等で構成された補助表示装置１６が接続されている。この補助表示装置１６には、図２に示すように、音声認識装置１２で変換された文字または文字列を単音節毎に表示するウインドウ１６Ａ、「違う」、「確定」、「中止」のコマンド名が表示されたコマンド表示部１６Ｂ、五十音の文字を一文字ずつ入力するための同一母音の文字列を左右方向に配列して構成された文字入力パレットを表示する文字入力パレット部１６Ｃ、及び文字入力パレットの選択表示を左右に移動するための「左」、「右」のコマンド名、及び文字入力パレットを次の画面に切換えるための「次へ」のコマンド名が表示された切換コマンド部１６Ｄが設けられている。
【００２６】
このコマンド部１６Ｂ、文字入力パレット部１６Ｃ、切換コマンド部１６Ｄには、タッチパネルが設けられており、指を接触させることにより、表示された文字に対応するコマンドを入力したり、入力候補の選択表示を変更することが可能である。なお、コマンドや入力候補の選択表示の変更は、タッチパネル以外に、後述するように音声によって入力したコマンド、または補助入力装置１５から入力した信号によっても変更することができる。
【００２７】
本実施の形態の文字入力パレットは、同一母音の複数の文字が左右方向（横方向）に１列並んで表示された１段のパレットで構成されている。この文字入力パレットの同一母音の複数の文字は、音声認識のスコアの高い文字で構成されている。この文字入力パレットによれば、「次へ」のコマンドを入力することにより、表示されている文字入力パレットを次の段の文字入力パレットに切換えて表示することができる。次の段の文字入力パレットも同一母音の文字が横方向に１列並んで表示された１段のパレットで構成されている。また、「左」または「右」のコマンドを入力することにより、修正要求が入力され、文字の選択表示（図２では「にゃ」の単音節文字が選択表示された状態が図示されている）を左または右に移動することができる。したがって、本実施の形態では、音声入力装置１１から入力する音声、補助入力装置１５の操作によって、修正要求を入力し、文字の選択表示を左または右に移動することができる。
【００２８】
なお、文字入力パレットの左端の文字が選択表示された状態で「左」のコマンドが入力されたときに、前段の文字入力パレット（または次段の文字入力パレット）に切換え、文字入力パレットの右端の文字が選択表示された状態で「右」のコマンドが入力されたときに、次段の文字入力パレット（または前段の文字入力パレット）に切換えるようにしてもよい。
【００２９】
そして、音声認識装置１２には、文字入力パレットの文字の選択表示を手による操作によって文字入力パレットの文字配列方向に移動させたり、文字入力パレットを切換えるための補助入力装置１５が接続されている。なお、この補助入力装置１５から「違う」、「確定」、「中止」のコマンドを更に入力できるようにしてもよい。
【００３０】
以下、音声認識装置１２による音声を認識し、修正要求に応じて入力候補の文字の選択表示を変更すると共に、確定要求に応じて入力候補の文字を確定して文字列を入力する処理ルーチンについて図３及び図４を参照して説明する。
【００３１】
自動車に搭載されているカーナビゲーションのスイッチがオンされると、図３に示す処理ルーチンが起動され、ステップ１００において初期モードに設定される。これによって、音声認識装置１２は、辞書データベースの初期モード辞書を選択して音声認識処理を実行する。
【００３２】
次のステップ１０２では、音声入力装置１１から音声の入力があったか否かを判断し、音声の入力があった場合には、ステップ１０４において初期モード辞書を使用して入力された音声を認識し、認識結果を認識結果に対応する文字列に変換する音声認識処理を実行する。ステップ１０６では、「施設」等のキーワードが入力されたか否かを判断し、キーワードが入力されたと判断された場合のみ、ステップ１０８において音声出力装置１３に認識結果及び動作作業の情報を受け渡す。これによって、音声出力装置１３は、単音節での入力要求を音声で報知すると共に、判定装置１４に、認識結果情報を受け渡す。
【００３３】
認識結果情報を受け取った判定装置１４は、辞書データベースを単音節モード辞書に切換えて、キーワードの内容に適した入力モードで音声認識処理を実行するようにモード切換指示を音声認識装置１２に出力する。
【００３４】
例えば、音声認識されたキーワードが「施設」であった場合には、音声出力装置１３では、「施設名を一文字ずつお話ください。」と音声出力して単音節での入力要求を音声で報知し、判定装置１４は、施設名入力モードにし、認識辞書データベースを単音節入力モードへ切換える指示を音声認識装置へ出力する。
【００３５】
キーワード以外の音声が入力された場合、または誤認識でキーワードが認識されなかった場合にはそのまま初期モードを継続し、キーワード待ち受け状態を継続する。
【００３６】
なお、音声出力装置１３から認識結果を音声で報知するようにすれば、ドライバが認識結果を聴覚で確認することができるので、ドライバは発話した音声と認識された音声とが同一であるか否かを補助表示装置を目視することなく判断することができる。
【００３７】
次のステップ１１０では、判定装置１４からモード切換指示が入力されたか否かを判断し、モード切換指示が入力された場合には、ステップ１１２において認識辞書データベースを入力されたキーワードに対応する辞書に切換えると共にキーワードに対応する入力モードに切換える。
【００３８】
そして、ステップ１１４において、キーワードに対応する辞書を用いて音声認識処理を行なう単音節入力モード処理を実施し、認識結果を認識結果に対応する文字または文字列に変換する。
【００３９】
次に、図４を参照してステップ１１４の詳細を説明する。ステップ２００において音声入力装置１１から音声入力があったか否かを判断し、音声入力があった場合には、ステップ２０１において単音節モードによる音声認識処理を実施し、ステップ２０２において音声認識結果に基づいて「違う」、「確定」、「左」、「右」等のコマンドが音声で入力されたか否かを判断する。
【００４０】
ステップ２０２においてコマンド以外、すなわち単音節の音声が入力されたと判断されたときは、ステップ２０４において入力する音声の最初の音節か否かを判断する。最初の音節の場合には、音声出力装置１３に認識結果を出力して音声で認識結果を報知すると共に、補助表示装置１６のウインドウ１６Ａに音声認識結果によって変換された文字を表示する。図２では、「役場」の最初の音節「や」が認識されて表示された結果が例示されている。
【００４１】
ステップ２０４で最初の音節でないと判断されたとき、すなわち次の音節が入力されたと判断されたときには、ステップ２０８において前回音声認識された音節を確定する処理を行ない、ステップ２１０で音声出力装置１３に認識結果を出力して前回認識された音節と今回認識された音節とを連続させた音声が出力されるようにすると共に、ウインドウ１６Ａに前回認識された音節と今回認識された音節とを連続させた文字列を表示する。
【００４２】
これによって、ドライバは、聴覚によって音声認識の結果を確認することができるので、補助表示装置１６のウインドウ１６Ａを目視しなくても音声認識の結果を確認することができる。
【００４３】
ステップ２００において、音声入力が無いと判断されたときには、ステップ２１２において所定時間経過したか否かを判断し、所定時間経過していない場合には音声入力待ち状態とし、所定時間経過しても音声の入力が無い場合には、ステップ２１４において音声認識処理によって認識されて確定されていない音声が存在するか否かを判断し、認識されて確定されていない音声が存在する場合には、ステップ２１６で認識されて確定されていない音声を確定して文字に変換する。
【００４４】
これによって、所定時間音声の入力が無い場合には、前回認識した音声を自動的に確定することができる。
【００４５】
ステップ２０２においてコマンドが入力されたと判断されたときは、以下で説明するコマンドに応じた処理を実行する。まず、ステップ２１８で「違う」を表すコマンドが入力されたと判断された場合には、音声認識の結果に基づいて音声出力装置１３から音声で報知された音声は、入力を意図した音声と異なるので、認識結果を確定することなくステップ２２０で認識結果を取り消し、ステップ２００に戻って音声入力待ち受け状態とする。
【００４６】
ステップ２２２で「確定」を表すコマンドが入力されたと判断されたときは、音声認識の結果に基づいて音声出力装置１３から音声で報知された音声は、入力を意図した音声であるので、ステップ２２４において認識された音声を確定する処理を行なってステップ２００に戻って音声入力待ち受け状態とする。したがって、本実施の形態では、「確定」を表すコマンドを音声で入力することによっても入力候補を確定することができる。すなわち、本実施の形態では、「確定」コマンドの入力、音声を入力した後所定時間以内の次の音声の入力、音声を入力した後所定時間以上の音声無入力状態のいずれかで、確定要求を行なうことができる。
【００４７】
次のステップ２２６で、「右」、「左」、「次へ」等の移動を表すコマンドが入力されたと判断されたときは、ステップ２２８において「右」を表すコマンド、「左」を表すコマンド、または「次へ」を表すコマンドが入力されたかを判断し、「右」を表すコマンドまたは「左」を表すコマンドが入力された場合は、ステップ２３０において図２の文字入力パレットの選択表示を段に沿って「右」または「左」に一文字移動する処理を行ないステップ２００に戻る。これによって、音声認識の結果、入力候補として文字入力パレット上に選択表示された文字を文字入力パレットの文字配列方向にのみ、すなわち同一母音の文字へのみ一文字移動させて修正することができる。
【００４８】
図２では、「や」と音声入力したにも拘わらず、「にゃ」と音声認識され、「にゃ」の文字が文字入力パレット上に選択表示された場合が示されている。この場合は、「右」を表すコマンドを２回入力することで、文字の選択表示が２文字分右に移動し、正しい入力候補である「や」を選択表示し、確定要求によって確定することによってウインドウ１６Ａに正しい文字を表示することができる。
【００４９】
また、ステップ２２８で、「次へ」を表すコマンドが入力されたと判断されたときは、ステップ２３２において次の候補を表示するように文字入力パレットを次の文字入力パレットに切換える処理を行なってステップ２００に戻る。
【００５０】
また、ステップ２２６で「中止」を表すコマンドが入力されたと判断されたときは、このルーチンを終了する。
【００５１】
上記では、音声入力で選択表示を移動させるコマンドを入力する例について説明したが、ドライバが補助入力装置１５を操作することによって文字の選択表示を同一母音を表す文字を配列した段に沿って移動させてもよい。
【００５２】
なお、上記では、音声認識では複数の認識候補を出力することができ、この認識候補は同一母音の文字であるのが殆どであることを考慮して、音声認識スコアが高い複数の文字のみを配列した１段の文字入力パレットを切換えて使用する例について説明したが、図５に示すように五十音の同一母音の文字を横方向に配列した五十音文字入力パレット、または同一母音の文字を横方向に配列した五十音の一部分を用いた五十音文字入力パレットを用いるようにしてもよい。また、文字入力パレットとして語に該当する文字、すなわち入力しようとする語を構成する候補となる文字を配列したパレットを用いるようにしてもよい。例えば、地名を入力する場合、文字入力パレットには、「と」「う」まで入力された段階で、「と」「う」「き」「か」「ほ」（東京、東海、東北などが語の場合）等が表示される。
【００５３】
さらに、上記では、選択表示を横方向に移動させる例について説明したが上下方向に移動させるようにしてもよい。
【００５４】
以上説明したように、本実施の形態では、音声で文字を一文字ずつ発話することで文字または文字列を入力すると共に入力候補の文字を特定の他の一文字へのみ修正可能としたので、ドライバの画面を注視する時間を減少させ、安全に文字入力ができるようになる、という効果が得られる。
【００５５】
【発明の効果】
以上説明したように本発明によれば、音声認識によって文字を入力し、認識された文字を入力パレット上の特定の一文字へのみ修正できるようにしたので、ドライバが画面を注視する時間を減少させて安全に文字を入力することができる、という効果が得られる。
【図面の簡単な説明】
【図１】本発明の実施の形態のブロック図である。
【図２】本発明の実施の形態の補助表示装置の画面の表示例を示す図である。
【図３】本発明の実施の形態の処理ルーチンを示す流れ図である。
【図４】図３の単音節入力モード処理のルーチンを示す流れ図である。
【図５】五十音の文字を表示した五十音入力パレットを示す平面図である。
【符号の説明】
１１音声入力装置
１２音声認識装置
１３音声出力装置
１６補助表示装置
１７辞書データベース

[0001]
[Technical field to which the invention belongs]
The present invention relates to a vehicle voice input device. In particular, it relates to a vehicle voice input device in which characters are input by voice recognition and a recognized character can be corrected only to a specific character, such as a character with the same vowel, a character with a high voice recognition score, or a character that fits a word. This reduces the time the driver spends gazing at the screen, so that the driver can drive safely while inputting characters.
[0002]
[Prior art]
As a method of character input that does not use voice, a method has been put into practical use in which a Japanese syllabary input palette is displayed on a screen provided with a touch panel, and a character is input by touching the part of the touch panel that shows the desired character. However, this method cannot be used when the screen with the touch panel is located out of reach (for example, when it is mounted on the center panel of an automobile and the driver tries to operate it while driving).
[0003]
A method has also been proposed that uses a syllabary input palette displayed on the screen together with a joystick: the character under the cursor is highlighted as the selection, and characters are input by moving the cursor in any direction with the joystick. However, because the joystick can move in all directions (up, down, left, right, and diagonally), the cursor easily moves in an unintended direction and operating errors occur easily.
[0004]
Furthermore, a Japanese syllabary input method using a jog dial has also been proposed (for example, Patent Document 1). However, inputting a single character requires two jog dial operations, selection and confirmation, so input efficiency is poor.
[0005]
On the other hand, various methods have been proposed for character input devices based on voice recognition, which are effective as character input devices for vehicles. However, voice recognition errors occur and a recognition rate of 100% is difficult to achieve, so a means of correcting misrecognitions is indispensable for putting such a character input device into practical use. Conventionally, a method has been proposed in which an auxiliary input means such as a keyboard is used to delete the previous input and input it again (for example, Patent Document 2).
[0006]
This method requires voice input again every time a misrecognition occurs, which is inconvenient, and if the same misrecognition is repeated, input may become impossible.
[0007]
Also, for voice recognition of words, a method has been proposed in which a misrecognized syllable is indicated with a cursor and corrected by presenting, one at a time in sequence, the characters in the same vowel row as that syllable (for example, Patent Document 3). Furthermore, a monosyllable voice recognition device has been described in which monosyllables are registered at consecutive addresses for each same-vowel group, and the recognition-candidate monosyllables within a group are displayed one character at a time for correction by incrementing or decrementing the address (for example, Patent Document 4).
[0008]
[Patent Document 1]
JP 2000-193475 A
[Patent Document 2]
JP H01-189699 A
[Patent Document 3]
JP S61-41199 A
[Patent Document 4]
JP S56-116096 A
[0009]
[Problems to be solved by the invention]
However, the method of Patent Document 3 requires pressing a correction key, looking at the screen, and correcting one character at a time, so the time spent gazing at the screen becomes long. In the voice recognition device of Patent Document 4, only one character is shown on the display unit, so it is difficult to see at a glance in which direction to correct the monosyllable, and the time spent gazing at the screen likewise becomes long. It is generally known that the longer the driver gazes at a screen, the more safety is impaired, for example through increased vehicle weaving (H. T. Zwahlen et al.: Safety aspects of CRT touch panel controls in automobile, Vision in Vehicles II, A. G. Gale et al. (Eds), pp. 335-344, North Holland Press, Amsterdam (1998)). A method with a long screen gaze time is therefore unsuitable for a voice input device used by a driver.
[0010]
The present invention has been made to solve the above problems. Its object is to provide a vehicle voice input device that converts input speech into characters by voice recognition and allows a converted character to be corrected only to a specific character on the character input palette, such as a character with the same vowel, a character with a high voice recognition score, or a character that fits a word, thereby reducing the time the driver spends gazing at the screen so that characters can be input safely.
[0011]
[Means for Solving the Problems]
To achieve the above object, the present invention comprises: voice input means for inputting voice; voice recognition means for recognizing the voice input from the voice input means and converting the recognition result into a corresponding character or character string; display means for displaying the character or character string converted by the voice recognition means together with a Japanese syllabary input palette, in which the characters of the syllabary, or a part of them, are arranged and displayed so that characters with the same vowel lie on the same line, and which is used to input, one character at a time, a character selected from among those characters; and control means for highlighting as the selection the one character on the syllabary input palette that corresponds to the character or character string converted by the voice recognition means, changing the selection only along the same line in response to a correction request, and confirming the selected character in response to a confirmation request.
[0012]
According to the present invention, voice input from the voice input means is recognized by the voice recognition means, and the recognition result is converted into a corresponding character or character string. The character or character string converted by the voice recognition means is displayed on the display means together with the Japanese syllabary input palette, in which the characters of the syllabary, or a part of them, are arranged and displayed so that characters with the same vowel lie on the same line, and which is used to input, one character at a time, a character selected from among those characters.
[0013]
The control means also highlights as the selection the one character on the character input palette that corresponds to the character or character string converted by the voice recognition means. This allows the voice recognition result to be confirmed visually. If the recognition result of the voice recognition means is also announced by voice, the result can be confirmed by ear as well.
[0014]
If the voice recognition result is incorrect and a correction is requested, the control means changes the selection only along the same line in response to the correction request.
[0015]
In voice recognition, the rate at which a character is recognized as one with the same vowel is extremely high; that is, even when a character is misrecognized, the result usually shares its vowel with the intended character. Therefore, by changing the selection only to characters with the same vowel in response to a correction request, corrections on the syllabary input palette are limited to characters with the same vowel, which reduces the time the driver spends gazing at the screen and allows characters to be input safely. In addition, because multiple characters are displayed on the syllabary input palette, the direction in which to move the selection can be seen at a glance.
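The same-vowel restriction can be illustrated with a minimal Python sketch. The romaji table, the small character subset, and the function names `vowel_of`, `build_rows`, and `correction_targets` are assumptions made for illustration; they do not appear in the patent.

```python
# Hypothetical romaji readings for a small subset of the syllabary (illustrative only).
ROMAJI = {
    "あ": "a", "か": "ka", "な": "na", "や": "ya", "にゃ": "nya",
    "い": "i", "き": "ki", "に": "ni",
    "う": "u", "く": "ku", "ぬ": "nu", "ゆ": "yu",
}


def vowel_of(kana: str) -> str:
    """Return the vowel of a syllable, taken as the last letter of its romaji."""
    return ROMAJI[kana][-1]


def build_rows(chars):
    """Group characters so that every row (line) holds characters sharing one vowel."""
    rows = {}
    for ch in chars:
        rows.setdefault(vowel_of(ch), []).append(ch)
    return rows


def correction_targets(recognized: str, rows) -> list:
    """Characters a correction request may move to: same row, minus the current one."""
    return [ch for ch in rows[vowel_of(recognized)] if ch != recognized]


rows = build_rows(ROMAJI)
targets = correction_targets("にゃ", rows)
```

Here a misrecognized "にゃ" can only be corrected to another character in the "a" row, so "や" is reachable while "き" is not.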
[0016]
In response to a correction request, the control means can also change the selection only to a character on the same line that has a high voice recognition score, that fits a word, or that satisfies a condition combining the conditions used to determine either of these.
Another example of recognition candidates arranged on the same line of the syllabary input palette is characters with high voice recognition scores. Voice recognition generally yields multiple candidates with high recognition scores for an input utterance, and the candidate with the highest score is taken as the single recognition result. Even when a misrecognition occurs, the correct answer is often among the high-scoring candidates. Therefore, by changing the selection only to candidate characters with high recognition scores in response to a correction request during input, characters can be input safely in the same way as above.
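As a rough sketch of this score-based variant, the palette line could be filled with the top-scoring candidates, the first of which is the initial recognition result. The scores below and the cut-off `k` are invented for illustration; the patent does not specify how many candidates are shown or how scores are computed.

```python
def n_best(scores: dict, k: int = 3) -> list:
    """Return the k candidates with the highest recognition scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [ch for ch, _ in ranked[:k]]


# Hypothetical recognizer output for an utterance intended as "や".
scores = {"にゃ": 0.41, "や": 0.33, "みゃ": 0.12, "な": 0.08, "き": 0.01}

palette = n_best(scores)   # characters shown on the single palette line
recognized = palette[0]    # the top candidate is highlighted first
```

With these assumed scores the recognizer's first choice is wrong, but the intended character is still on the palette, one step away from the highlighted one.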
[0017]
A further example of recognition candidates arranged on the same line of the syllabary input palette is characters that fit a word, that is, characters that are candidates for forming the word being input. Take as an example the case where the device is used to input a place name one character at a time. Suppose “to” (と) and “u” (う) have been input so far. The next character is then limited to candidates that can form the intended word, such as “ki” (as in Tokyo) or “ka” (as in Tokai); “n” cannot follow. Therefore, by changing the selection only to characters that fit a word (“ki”, “ka”, and so on in this example) in response to a correction request during input, characters can be input safely in the same way as above.
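The word-based restriction amounts to a prefix lookup in the active dictionary. The following sketch assumes a small hypothetical list of place names written in kana; the function name `next_characters` is likewise an illustration, not something defined in the patent.

```python
def next_characters(prefix: str, words) -> list:
    """Characters that can follow `prefix` in at least one dictionary word."""
    n = len(prefix)
    return sorted({w[n] for w in words if w.startswith(prefix) and len(w) > n})


# Hypothetical place-name dictionary (Tokyo, Tokai, Tohoku, Nagoya).
PLACE_NAMES = ["とうきょう", "とうかい", "とうほく", "なごや"]

candidates = next_characters("とう", PLACE_NAMES)
```

After "と" and "う", only "か", "き", and "ほ" remain as correction targets, and "ん" is excluded because no word in the list continues that way.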
[0019]
The control means then confirms the selected character in response to a confirmation request. The correction request and the confirmation request may be input by voice or by a mechanical operating means such as a switch.
[0020]
The same line of the syllabary input palette may run left to right or top to bottom, but a left-to-right line is preferable for the driver.
[0022]
[Embodiments of the invention]
An embodiment is described in detail below with reference to the drawings, in which the present invention is applied to a voice input device (character string input device) for car navigation. In this device, a character string is input by voice recognition as the user speaks characters one at a time; input candidate characters with high voice recognition scores are displayed on a single left-to-right line; and a character can be corrected only to another of these high-scoring characters.
[0023]
As shown in FIG. 1, the vehicle voice input device of this embodiment includes a voice input device 11, composed of a microphone or the like, for inputting the voice of the driver or another occupant, and a voice recognition device 12, composed of a microcomputer, which recognizes the voice input from the voice input device 11 using a prestored voice recognition program and converts the recognition result into a corresponding character or character string. A dictionary database 17, which stores various dictionaries including a monosyllable dictionary, is connected to the voice recognition device 12. The voice recognition device 12 selects one of the dictionaries stored in the dictionary database 17 and executes voice recognition processing.
[0024]
A voice output device 13, composed of a speaker or the like, which outputs the speech recognized by the voice recognition processing, is connected to the voice recognition device 12, and a determination device 14 is connected to the voice output device 13.
[0025]
Further, an auxiliary display device 16 composed of a liquid crystal display or the like is connected to the voice recognition device 12. As shown in FIG. 2, the auxiliary display device 16 is provided with: a window 16A that displays, syllable by syllable, the character or character string converted by the voice recognition device 12; a command display section 16B showing the command names “different”, “confirm”, and “cancel”; a character input palette section 16C that displays a character input palette, composed of same-vowel characters arranged in the left-right direction, for inputting syllabary characters one at a time; and a switching command section 16D showing the command names “left” and “right”, for moving the palette's selection left and right, and “next”, for switching the character input palette to the next screen.
[0026]
The command display section 16B, the character input palette section 16C, and the switching command section 16D are provided with a touch panel. Touching it with a finger inputs the command corresponding to the displayed text or changes the selection of the input candidate. Besides the touch panel, commands can be input and the selection changed by voice commands, as described later, or by signals from the auxiliary input device 15.
[0027]
The character input palette of this embodiment is a single-row palette in which multiple characters with the same vowel are displayed side by side in one row in the left-right (horizontal) direction. These same-vowel characters are the characters with high voice recognition scores. Inputting the “next” command switches the displayed palette to the next palette, which is likewise a single-row palette of same-vowel characters displayed in one horizontal row. Inputting the “left” or “right” command constitutes a correction request and moves the selection (FIG. 2 shows the state in which the monosyllabic character “nya” is selected) to the left or right. In this embodiment, therefore, a correction request can be input, and the selection moved left or right, either by voice through the voice input device 11 or by operating the auxiliary input device 15.
[0028]
When the “left” command is input while the leftmost character of the palette is selected, the display may switch to the previous palette (or to the next palette); when the “right” command is input while the rightmost character is selected, the display may switch to the next palette (or to the previous palette).
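The edge behaviour can be sketched as a small state update over a list of single-row palettes. Several details here are assumptions not stated in the text: that palettes wrap around cyclically, and that after a switch the selection lands on the nearest end of the new palette. The palette contents are also hypothetical.

```python
def move(palettes, page: int, idx: int, command: str):
    """Return the new (page, idx) after a 'left' or 'right' command.

    Assumptions: pages wrap around, and after a page switch the selection
    lands on the end of the new palette nearest to where it came from.
    """
    row = palettes[page]
    if command == "right":
        if idx < len(row) - 1:
            return page, idx + 1
        return (page + 1) % len(palettes), 0
    if command == "left":
        if idx > 0:
            return page, idx - 1
        new_page = (page - 1) % len(palettes)
        return new_page, len(palettes[new_page]) - 1
    raise ValueError(f"unknown command: {command}")


# Two hypothetical single-row palettes of same-vowel candidates.
palettes = [["にゃ", "な", "や"], ["ま", "ら"]]
```

For example, `move(palettes, 0, 2, "right")` leaves the first palette from its right edge and selects the first character of the second one.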
[0029]
An auxiliary input device 15 is also connected to the voice recognition device 12. It is operated by hand to move the selection along the direction in which the palette's characters are arranged and to switch between palettes. The “different”, “confirm”, and “cancel” commands may additionally be input from the auxiliary input device 15.
[0030]
The processing routine by which the voice recognition device 12 recognizes speech, changes the selection of the input candidate character in response to a correction request, and confirms the input candidate character in response to a confirmation request so as to input a character string is described below with reference to FIGS. 3 and 4.
[0031]
When the car navigation system mounted in the automobile is switched on, the processing routine shown in FIG. 3 starts, and the initial mode is set in step 100. The voice recognition device 12 accordingly selects the initial mode dictionary from the dictionary database and executes voice recognition processing.
[0032]
In the next step 102, it is determined whether voice has been input from the voice input device 11. If so, voice recognition processing is executed in step 104: the input voice is recognized using the initial mode dictionary and the recognition result is converted into a corresponding character string. In step 106, it is determined whether a keyword such as “facility” has been input; only if so, the recognition result and operation information are passed to the voice output device 13 in step 108. The voice output device 13 then announces by voice a request for monosyllable input and passes the recognition result information to the determination device 14.
[0033]
On receiving the recognition result information, the determination device 14 outputs a mode switching instruction to the voice recognition device 12 so that the dictionary database is switched to the monosyllable mode dictionary and voice recognition processing is executed in the input mode suited to the keyword.
[0034]
For example, if the recognized keyword is “facility”, the voice output device 13 outputs the spoken message “Please say the facility name one character at a time,” announcing the request for monosyllable input, and the determination device 14 sets the facility name input mode and outputs to the voice recognition device an instruction to switch the recognition dictionary database to the monosyllable input mode.
[0035]
If speech other than a keyword is input, or if the keyword is not recognized because of a misrecognition, the initial mode continues unchanged and the device keeps waiting for a keyword.
[0036]
If the voice output device 13 announces the recognition result by voice, the driver can confirm the result by ear, and can therefore judge whether the recognized speech matches what was spoken without looking at the auxiliary display device.
[0037]
In the next step 110, it is determined whether a mode switching instruction has been input from the determination device 14. If so, in step 112 the recognition dictionary database is switched to the dictionary corresponding to the input keyword, and the input mode is switched to the one corresponding to the keyword.
[0038]
Then, in step 114, monosyllable input mode processing is carried out, in which voice recognition is performed using the dictionary corresponding to the keyword and the recognition result is converted into a corresponding character or character string.
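The keyword-driven mode switch of FIG. 3 (steps 100 to 112) can be sketched as a small state machine. The class name, the mode and dictionary labels, and the keyword table are illustrative assumptions; only the “facility” keyword and its prompt come from the description, and the roles of the voice output device 13 and determination device 14 are collapsed into one method for brevity.

```python
# Hypothetical keyword table: keyword -> (input mode, spoken prompt).
KEYWORD_MODES = {
    "facility": ("facility_name",
                 "Please say the facility name one character at a time."),
}


class Recognizer:
    def __init__(self):
        # Step 100: start in the initial mode with the initial mode dictionary.
        self.mode = "initial"
        self.dictionary = "initial"

    def on_utterance(self, recognized_text: str):
        """Steps 102-112: return the spoken prompt, or None if no keyword was heard."""
        if self.mode != "initial" or recognized_text not in KEYWORD_MODES:
            return None  # not a keyword: stay in the current mode and keep waiting
        mode, prompt = KEYWORD_MODES[recognized_text]
        self.mode = mode                  # input mode matching the keyword
        self.dictionary = "monosyllable"  # switch to the monosyllable dictionary
        return prompt
```

An unrecognized utterance leaves the recognizer in the initial mode, while “facility” switches it to facility-name input and returns the prompt to be spoken.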
[0039]
Next, step 114 is described in detail with reference to FIG. 4. In step 200, it is determined whether voice has been input from the voice input device 11. If so, voice recognition in monosyllable mode is performed in step 201, and in step 202 it is determined from the recognition result whether a command such as “different”, “confirm”, “left”, or “right” has been input by voice.
[0040]
If it is determined in step 202 that something other than a command, that is, a monosyllable, has been input, it is determined in step 204 whether it is the first syllable of the input. If it is the first syllable, the recognition result is output to the voice output device 13 and announced by voice, and the character obtained from the recognition result is displayed in window 16A of the auxiliary display device 16. FIG. 2 shows an example in which the first syllable “ya” of “yakuba” (役場, town office) has been recognized and displayed.
[0041]
If it is determined in step 204 that the syllable is not the first, that is, that the next syllable has been input, the previously recognized syllable is confirmed in step 208. In step 210, the recognition result is output to the voice output device 13 so that the previously recognized syllable and the currently recognized syllable are spoken in succession, and the character string formed by joining the two is displayed in window 16A.
[0042]
This allows the driver to confirm the voice recognition result by ear, without looking at window 16A of the auxiliary display device 16.
[0043]
If it is determined in step 200 that there is no voice input, it is determined in step 212 whether a predetermined time has elapsed. If it has not, the device waits for voice input. If no voice has been input even after the predetermined time, it is determined in step 214 whether there is speech that has been recognized but not yet confirmed; if there is, it is confirmed and converted into a character in step 216.
[0044]
Thus, when no voice is input for the predetermined time, the previously recognized speech is confirmed automatically.
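The timeout path (steps 212 to 216) can be sketched with an explicit clock value passed in by the caller, which keeps the example deterministic. The 2-second timeout and all names are assumptions; the patent only speaks of a "predetermined time".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PendingSyllable:
    timeout: float = 2.0                      # assumed value for the predetermined time
    confirmed: List[str] = field(default_factory=list)
    pending: Optional[str] = None             # recognized but not yet confirmed
    last_input_at: float = 0.0

    def on_syllable(self, ch: str, now: float) -> None:
        """Record a newly recognized syllable and restart the silence timer."""
        self.pending = ch
        self.last_input_at = now

    def tick(self, now: float) -> bool:
        """Steps 212-216: confirm the pending syllable once the silence is long enough."""
        if self.pending is not None and now - self.last_input_at >= self.timeout:
            self.confirmed.append(self.pending)
            self.pending = None
            return True
        return False
```

Calling `tick` periodically with the current time confirms the pending syllable exactly once after the silence threshold and does nothing when there is nothing to confirm.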
[0045]
If it is determined in step 202 that a command has been input, the processing corresponding to that command, described below, is executed. First, if it is determined in step 218 that the “different” command has been input, the speech announced by the voice output device 13 on the basis of the recognition result differs from the intended input, so the recognition result is cancelled in step 220 without being confirmed, and the routine returns to step 200 to wait for voice input.
[0046]
If it is determined in step 222 that a command representing “confirm” has been input, the voice announced by the voice output device 13 based on the result of the voice recognition is the voice the driver intended to input. The voice recognized in step S3 is therefore confirmed, and the process returns to step 200 to wait for voice input. Thus, in this embodiment, the input candidate can also be confirmed by inputting the command representing “confirm” by voice. That is, in the present embodiment, the confirmation request can be made in any of three ways: by inputting the “confirm” command, by inputting the next voice within a predetermined time after a voice input, or by leaving no voice input for the predetermined time or longer after a voice input.
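The effect of the “different” and “confirm” commands on a pending recognition result can be pictured with the following minimal Python sketch. The function name `handle_command` and the representation of the state as a list of confirmed syllables plus one optional pending syllable are assumptions for illustration and are not taken from the embodiment.

```python
from typing import List, Optional, Tuple


def handle_command(
    command: str, confirmed: List[str], pending: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Return the new (confirmed, pending) state after a voice command."""
    if command == "different":
        # Cancel the recognition result without confirming it.
        return confirmed, None
    if command == "confirm":
        # Confirm the pending syllable, if there is one.
        if pending is not None:
            return confirmed + [pending], None
        return confirmed, None
    # Other commands (movement, "next", "stop") are outside this sketch.
    return confirmed, pending
```

For example, `handle_command("different", ["to"], "u")` discards the pending “u”, whereas `handle_command("confirm", ["to"], "u")` appends it to the confirmed string.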
[0047]
When it is determined in the next step 226 that a command representing movement, such as “right”, “left”, or “next”, has been input, it is determined in step 228 whether the command represents “right”, “left”, or “next”. If a command representing “right” or a command representing “left” has been input, the selection display on the character input palette is moved one character to the right or to the left along the row, and the process returns to step 200. As a result, the character selected and displayed on the character input palette as an input candidate by speech recognition can be corrected by moving the selection one character at a time only in the character arrangement direction of the palette, that is, only to a character of the same vowel.
[0048]
FIG. 2 shows a case in which “ya” was uttered but “nya” was recognized, so that “nya” is selected and displayed on the character input palette. In this case, by inputting the command representing “right” twice, the character selection display moves to the right by two characters, and the correct input candidate “ya” is selected and displayed. When “ya” is then confirmed by the confirmation request, the correct character is displayed in the window 16A.
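The correction of “nya” to “ya” by two “right” commands can be sketched in Python as below. The row contents, the function name `move_selection`, and the choice to stop at the ends of the row rather than wrap around are all assumptions for illustration; the actual palette layout is the one shown in FIG. 2.

```python
from typing import List


def move_selection(row: List[str], index: int, command: str) -> int:
    """Move the selection one character along a single same-vowel row."""
    step = {"right": 1, "left": -1}.get(command, 0)
    # Assumption: the selection stops at the ends of the row instead of wrapping.
    return max(0, min(len(row) - 1, index + step))


# Hypothetical one-row palette of same-vowel candidates.
row = ["nya", "hya", "ya", "ra"]
selected = row.index("nya")          # misrecognized candidate
for command in ("right", "right"):
    selected = move_selection(row, selected, command)
corrected = row[selected]            # selection now rests on "ya"
```

Because the selection can move only along one row, the driver needs to know only the direction and the number of steps, not the position of the target character on a two-dimensional grid.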
[0049]
If it is determined in step 228 that a command representing “next” has been input, the character input palette is switched to the next character input palette in step 232 so that the next candidates are displayed, and the process returns to step 200.
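One way to picture the switching of palettes by the “next” command is the Python sketch below, which splits the recognition candidates into one-row palettes in descending order of score. The scores, the palette width, the ordering of characters within a row by score, and the wrap-around after the last palette are assumptions for illustration; the embodiment does not state them.

```python
from typing import Dict, List


def build_palettes(scores: Dict[str, float], width: int) -> List[List[str]]:
    """Split candidates into one-row palettes, highest scores first."""
    ranked = sorted(scores, key=lambda ch: scores[ch], reverse=True)
    return [ranked[i:i + width] for i in range(0, len(ranked), width)]


def next_palette(current: int, count: int) -> int:
    """Index of the palette shown after a "next" command (wrap-around assumed)."""
    return (current + 1) % count


# Hypothetical recognition scores for same-vowel candidates.
scores = {"nya": 0.9, "ya": 0.7, "hya": 0.8, "ra": 0.3, "wa": 0.2}
palettes = build_palettes(scores, width=3)
```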
[0050]
If it is determined in step 226 that a command indicating “stop” has been input, this routine ends.
[0051]
In the above description, an example in which the command for moving the selection display is input by voice has been described. However, the character selection display may instead be moved along the row in which characters of the same vowel are arranged when the driver operates the auxiliary input device 15.
[0052]
The above description covers an example that exploits the fact that speech recognition can output a plurality of recognition candidates: considering that the recognition candidates are mostly characters of the same vowel, one-row character input palettes, each containing only a plurality of characters with high speech recognition scores, are switched and used. However, as shown in FIG. 5, a Japanese syllabary input palette in which characters of the same vowel are arranged horizontally, or an input palette using a part of the Japanese syllabary in which characters of the same vowel are arranged horizontally, may also be used. Further, as the character input palette, a palette in which characters corresponding to words, that is, characters that are candidates for forming the word to be input, are arranged may be used. For example, when a place name is being entered and “to” and “u” have been entered so far, “ki”, “ka”, and “ho” are displayed in the character input palette (for words such as Tokyo, Tokai, and Tohoku).
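The word-based palette in the place-name example can be sketched in Python as follows. The function name `next_character_candidates` and the three-word list are assumptions for illustration; in the device the words would presumably come from the dictionary database 17, whose format is not described here.

```python
from typing import Iterable, List


def next_character_candidates(prefix: str, words: Iterable[str]) -> List[str]:
    """Characters that can follow `prefix` in at least one dictionary word."""
    candidates: List[str] = []
    for word in words:
        if word.startswith(prefix) and len(word) > len(prefix):
            ch = word[len(prefix)]
            if ch not in candidates:      # keep each candidate once, in word order
                candidates.append(ch)
    return candidates


# Hypothetical place-name dictionary: Tokyo, Tokai, Tohoku written in kana.
place_names = ["とうきょう", "とうかい", "とうほく"]
palette = next_character_candidates("とう", place_names)
```

With “to” and “u” already entered, the palette in this sketch contains only the kana for “ki”, “ka”, and “ho”, so a misrecognized character can be corrected only to a character that continues some word in the dictionary.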
[0053]
Furthermore, in the above description, an example in which the selection display is moved in the horizontal direction has been described. However, the selection display may be moved in the vertical direction.
[0054]
As described above, in this embodiment, a character or a character string can be input by uttering characters one at a time, and the input candidate character can be corrected only to another specific character. This reduces the time spent gazing at the screen and enables safe character input.
[0055]
[Effects of the Invention]
As described above, according to the present invention, since characters are input by voice recognition and the recognized characters can be corrected to only one specific character on the input palette, the time for the driver to watch the screen is reduced. Thus, it is possible to input characters safely.
[Brief description of the drawings]
FIG. 1 is a block diagram of an embodiment of the present invention.
FIG. 2 is a diagram showing a display example of a screen of the auxiliary display device according to the embodiment of the present invention.
FIG. 3 is a flowchart showing a processing routine according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the single syllable input mode processing routine of FIG. 3.
FIG. 5 is a plan view showing a Japanese syllabary input palette displaying Japanese syllabary characters.
[Explanation of symbols]
11 voice input device
12 voice recognition device
13 voice output device
16 auxiliary display device
17 dictionary database

Claims

Voice input means for inputting voice;
Voice recognition means for recognizing the voice input from the voice input means and converting the recognition result into a character or a character string corresponding to the recognition result;
Display means for displaying the character or character string converted by the speech recognition means, together with a Japanese syllabary input palette in which the characters of the Japanese syllabary, or a part of them, are arranged so that characters of the same vowel lie on the same line, and which is used to input, one character at a time, a character selected from the characters of the Japanese syllabary or from the part of them;
Control means for selecting and displaying the one character on the Japanese syllabary input palette that corresponds to the character or character string converted by the speech recognition means, changing the selection display only within the same line in response to a correction request, and confirming the selected and displayed character in response to a confirmation request,
A voice input device for vehicles including the above.

2. The vehicle voice input device according to claim 1, wherein, in response to the correction request, the control means changes the selection display only to one character that is on the same line and that is any one of: a character having a high speech recognition score, a character corresponding to a word, and a character satisfying a condition that combines having a high speech recognition score with corresponding to a word.