JP3985525B2

JP3985525B2 - Voice recognition device

Info

Publication number: JP3985525B2
Application number: JP2002003787A
Authority: JP
Inventors: 健大野
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2002-01-10
Filing date: 2002-01-10
Publication date: 2007-10-03
Anticipated expiration: 2022-01-10
Also published as: JP2003208193A

Description

【０００１】
【発明の属する技術分野】
本発明は、入力された音声を認識して、入力された実際の音声に対する認識候補を表示する音声認識装置に関する。
【０００２】
【従来の技術】
従来の音声認識装置として、特開平１１−３５２９９１号公報に開示されたものがある。この音声認識装置では、単音節ごとに区切って発声された音声を認識して認識候補を表示し、表示した認識候補が音声入力者によって確定されるまで、順次認識候補を表示していくものである。
【０００３】
【発明が解決しようとする課題】
しかしながら、従来の音声認識装置では、所望の認識候補が得られない場合、次の認識候補を順次表示させていくが、例えば音声入力時に大きいレベルの騒音が混入した時には、入力音声の誤認識により、認識候補を順次表示させていっても所望の認識候補が表示されないことがある。従って、正しい認識候補の有無が分からないまま、認識候補の選択操作を行わなければならなかった。
【０００４】
本発明の目的は、操作装置を用いて認識候補の選択を行う際に、認識候補の中に誤認識されやすい認識候補が存在するときは、操作装置を操作する時の操作感を変えることにより、正しい可能性が高い認識候補の有無を操作者に伝えることができる音声認識装置を提供することにある。
【０００５】
【課題を解決するための手段】
一実施の形態を示す図１を参照して本発明を説明する。
（１）請求項１の発明は、音声を入力する音声入力装置と、入力される音声に対する認識対象語を複数記憶し、認識対象語のうち誤認識されやすい認識対象語同士を対応付けて記憶する記憶装置と、音声入力装置に入力された音声と、記憶装置に記憶されている認識対象語とが一致する度合いを示す一致度を演算するとともに、一致度の高い順に並べた認識対象語を上位から認識候補とする制御装置と、少なくとも認識候補の中から所望の認識候補を選択する操作を操作者が行うことができる操作装置と、認識候補の中に、記憶装置に記憶されている誤認識されやすい認識対象語同士が含まれているときに、操作装置を操作するときの操作感を変更する操作感変更装置とを備えることにより上記目的を達成する。
（２）請求項２の発明は、請求項１の音声認識装置において、操作感変更装置は、記憶装置に記憶されている誤認識されやすい認識対象語同士のいずれか一方が認識候補として選択されるまでは、操作者が操作装置を用いて認識候補の選択操作を行う負荷を小さくすることを特徴とする。
（３）請求項３の発明は、請求項２の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、次の認識候補の選択の際に、ホイールの回転操作をアシストする力と妨げる力とを交互に発生させるものであり、記憶装置に記憶されている誤認識されやすい認識対象語同士のいずれか一方が認識候補として選択されるまでは、ホイールの回転操作を妨げる力を小さくすることを特徴とする。
（４）請求項４の発明は、請求項２の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、記憶装置に記憶されている誤認識されやすい認識対象語同士のいずれか一方が認識候補として選択されるまでは、次の認識候補を選択するために必要なホイールの回転操作量を減少させることを特徴とする。
（５）請求項５の発明は、請求項１の音声認識装置において、操作感変更装置は、記憶装置に記憶されている誤認識されやすい認識対象語同士のうち、操作者が一致度の低い方の認識対象語を認識候補として選択する際の操作感を変更することを特徴とする。
（６）請求項６の発明は、請求項５の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、次の認識候補の選択の際に、ホイールの回転操作をアシストする力と妨げる力とを交互に発生させるものであり、記憶装置に記憶されている誤認識されやすい認識対象語同士のうち、一致度が低い方の認識対象語が認識候補として選択される際に発生させるホイールの回転操作をアシストする力と妨げる力とを大きくすることを特徴とする。
（７）請求項７の発明は、請求項５の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、記憶装置に記憶されている誤認識されやすい認識対象語同士のうち、一致度が低い方の認識対象語が認識候補として選択された状態から、次の認識候補を選択するために必要なホイールの回転操作量を増大させることを特徴とする。
（８）請求項８の発明は、請求項１〜７のいずれかの音声認識装置において、制御装置は、一致度の高い順に並べられた認識候補の中に、記憶装置に記憶されている誤認識されやすい認識対象語同士のいずれか一方の認識対象語が存在する場合に、一致度の高い順に並べられた認識候補の中に存在する誤認識されやすい認識対象語と対応付けて記憶装置に記憶されている認識対象語を誤認識されやすい認識対象語である認識候補の次に並べ替えることを特徴とする。
（９）請求項９の発明は、請求項８の音声認識装置において、操作感変更装置は、制御装置によって並べ替えられた誤認識されやすい認識対象語が選択された状態から次の認識候補が選択される際の操作感を変更することを特徴とする。
【０００６】
なお、上記課題を解決するための手段の項では、本発明をわかりやすく説明するために実施の形態の図１と対応づけたが、これにより本発明が実施の形態に限定されるものではない。
【０００７】
【発明の効果】
本発明によれば、次のような効果を奏する。
（１）請求項１〜９の発明によれば、抽出された認識候補の中に誤認識されやすい認識候補が含まれているときに、操作装置を操作するときの操作感を変更するので、操作者は、操作装置の操作感により、正しい可能性の高い認識候補の有無を前もって知ることができる。
（２）請求項２の発明によれば、対応付けられた誤認識されやすい認識候補同士が選択されるまでは、操作装置を用いて認識候補の選択操作を行う負荷を小さくするので、認識候補の選択操作を容易に行うことができる。
（３）請求項３の発明によれば、操作装置はホイールを備えた回転式入力装置であり、操作感変更装置は、ホイールの回転操作をアシストする力と妨げる力とを交互に発生させるものであり、対応付けられた誤認識されやすい認識候補同士が選択されるまでは、ホイールの回転操作を妨げる力を小さくするので、正しい可能性の高い認識候補があることをホイールの回転操作に要する力が小さくなることにより確実に知ることができ、かつ、認識候補の選択操作を容易に行うことができる。
（４）請求項４の発明によれば、操作装置はホイールを備えた回転式入力装置であり、対応付けられた誤認識されやすい認識候補同士が選択されるまでは、次の認識候補を選択するために必要なホイールの回転操作量を減少させるので、正しい可能性の高い認識候補があることをホイールの回転操作量が減少することにより確実に知ることができ、かつ、認識候補の選択操作を容易に行うことができる。
（５）請求項５の発明によれば、操作者が対応付けられた誤認識されやすい認識候補同士のうち、一致度の低い方の認識候補を選択する際の操作感を変更するので、操作者は、操作装置の操作感により、正しい可能性の高い認識候補の選択を行うことができる。
（６）請求項６の発明によれば、操作装置はホイールを備えた回転式入力装置であり、操作感変更装置は、ホイールの回転操作をアシストする力と妨げる力とを交互に発生させるものであり、対応付けられた誤認識されやすい認識候補同士のうち、一致度が低い方の認識候補が選択される際に発生させるホイールの回転操作をアシストする力と妨げる力とを大きくするので、操作者は、操作装置の操作感により、正しい可能性の高い認識候補の選択を行うことができる。また、操作者が適当にホイールを回転させた時でも、ホイールの回転を妨げる力が大きいので、正しい可能性が高い認識候補の位置で回転が停止する可能性が高くなり、認識候補の選択操作を容易に行うことができる。
（７）請求項７の発明によれば、操作装置はホイールを備えた回転式入力装置であり、対応付けられた誤認識されやすい認識候補同士のうち、一致度が低い方の認識候補を選択した状態から、次の認識候補を選択するために必要なホイールの回転操作量を増大させるので、操作者は、ホイールの回転操作量に基づいて、正しい可能性の高い認識候補の選択を行うことができる。
（８）請求項８の発明によれば、抽出された認識候補の中に誤認識されやすい認識候補が存在する場合に、誤認識されやすい認識候補と対応付けられている認識対象語を誤認識されやすい認識候補の次に並べ変えるので、誤認識されやすい認識対象語の選択を容易に行うことができる。
（９）請求項９の発明によれば、並べ替えられた誤認識されやすい認識対象語が選択された状態から次の認識候補が選択される際の操作感を変更するので、以後の認識候補が一致度の高い順に並んでいることを、操作者に認知させることができる。
【０００８】
【発明の実施の形態】
（第１の実施の形態）
図１は、本発明による音声認識装置の第１の実施の形態の構成を示す図である。第１の実施の形態における音声認識装置は、マイク１０１と、スピーカ１０２と、信号処理ユニット１０３と、入力装置１０４と、ディスプレイ１０５とを備える。信号処理ユニット１０３は、Ａ／Ｄコンバータ１０３１と、Ｄ／Ａコンバータ１０３２と、出力アンプ１０３３と、信号処理装置１０３４と、外部記憶装置１０３５とを有する。
【０００９】
マイク１０１を介して入力された音声は、音声信号として信号処理ユニット１０３のＡ／Ｄコンバータ１０３１に入力される。Ａ／Ｄコンバータ１０３１は、入力された音声信号をデジタル信号に変換して、信号処理装置１０３４に出力する。信号処理装置１０３４は、ＣＰＵ１０３４ａとメモリ１０３４ｂとを有し、外部記憶装置１０３５に記憶されている認識対象語のデジタル信号と、入力された音声のデジタル信号との一致度を演算する。外部記憶装置１０３５には、認識対象語が複数記憶されている。この認識対象語のうち、誤認識されやすい認識対象語同士は対応付けて記憶されている。
【００１０】
Ｄ／Ａコンバータ１０３２は、スピーカ１０２から音声等を出力するために、認識対象語のデジタル信号をアナログ信号に変換して、出力アンプ１０３３に出力する。Ｄ／Ａコンバータ１０３２から出力アンプ１０３３に入力されたアナログ信号は増幅されて、スピーカ１０２を介して音声として出力される。
【００１１】
ディスプレイ１０５は、入力された音声の認識候補等を表示するためのものである。入力装置１０４は、ホイール１０４ａと複数個のスイッチ１０４ｂとを有し、操作者の音声認識開始要求入力、入力の取り消し、認識候補選択操作等を検出して信号処理装置１０３４に出力する。ホイール１０４ａは、図１の矢印Ａの方向への押し込み操作と、矢印Ｂの方向への回転操作とが可能である。矢印Ｂの方向への回転操作は、ディスプレイ１０５に表示された認識候補の選択操作時に行われ、矢印Ａの方向への押し込み操作は、矢印Ｂ方向への回転操作により選択された認識候補を確定する操作時に行われる。
【００１２】
図２は、入力装置１０４の構成を示す詳細図である。入力装置１０４は、上述したホイール１０４ａとスイッチ１０４ｂの他に、ホイール駆動モータ１０４ｃとホイール制御ＣＰＵ１０４ｄとホイール位置センサ１０４ｅと通信デバイス１０４ｆとを備える。ホイール駆動モータ１０４ｃは、ホイール１０４ａの矢印Ｂの回転方向にトルクを発生することができる。操作者が回転操作する方向にトルクを発生させると、操作者がホイール１０４ａを回転するのを助け、操作者が回転操作する方向と逆の方向にトルクを発生させると、操作者がホイール１０４ａを回転するのを妨げることになる。このトルクの発生により、操作者はホイール１０４ａの回転操作が軽くなる感覚や重くなる感覚を感じる。すなわち、ホイール駆動モータ１０４ｃは、操作者のホイール１０４ａの操作感を変更させることができる。
【００１３】
ホイール位置センサ１０４ｅは、ホイール１０４ａの回転角および矢印Ａ方向の押し込み操作を検出する。ホイール位置センサ１０４ｅにより検出された信号は、ホイール制御ＣＰＵ１０４ｄに送られる。ホイール制御ＣＰＵ１０４ｄは、ホイール位置センサ１０４ｅから入力された信号をデジタル化してホイール位置情報に変換するとともに、信号処理装置１０３４から入力される情報、すなわち後述する発生トルクパターン情報とホイール位置情報とに基づいて、ホイール駆動モータ１０４ｃに発生させるトルク量を計算する。ホイール制御ＣＰＵ１０４ｄは、計算した発生トルク量に基づいたトルク制御信号をホイール駆動モータ１０４ｃに出力する。ホイール駆動モータ１０４ｃは、この制御信号に基づいて駆動し、ホイール１０４ａの矢印Ｂの回転方向にトルクを発生させる。
【００１４】
通信デバイス１０４ｆは、信号処理装置１０３４と接続されており、ホイール制御ＣＰＵ１０４ｄから入力されるホイール位置情報を信号処理装置１０３４に出力するとともに、信号処理装置１０３４から入力される発生トルクパターン情報をホイール制御ＣＰＵ１０４ｄに出力する。
【００１５】
図３は、ホイール１０４ａにトルクを発生させる時の概要を説明するための図である。円盤状のホイール１０４ａの中心には、シャフト１０が取り付けられており、シャフト１０の他端にはホイール駆動モータ１０４ｃが設けられている。ホイール位置センサ１０４ｅにより検出されたホイール１０４ａの回転量は、ホイール制御ＣＰＵ１０４ｄに送られる。ホイール制御ＣＰＵ１０４ｄは、このホイール位置情報と、信号処理装置１０３４から入力される情報とに基づいて、ホイール駆動モータ１０４ｃに発生させるトルク量を計算する。計算した発生トルク量に基づいたトルク制御信号は、ホイール駆動モータ１０４ｃに出力される。ホイール駆動モータ１０４ｃは、この制御信号に基づいて駆動してシャフト１０にトルクを加えることにより、ホイール１０４ａの矢印Ｂの回転方向にトルクを発生させることができる。
【００１６】
図４は、本発明による音声認識装置により行われる一実施の形態の処理手順を示すフローチャートである。この制御は、信号処理ユニット１０３の信号処理装置１０３４により行われる。ステップＳ２０１から始まる処理は、操作者が入力装置１０４を操作して、音声入力を開始する旨の信号が信号処理装置１０３４に入力されることにより始まる。
【００１７】
ステップＳ２０１では、音声認識処理を開始する旨を操作者に知らせるための告知音信号を外部記憶装置１０３５から読み込んで、Ｄ／Ａコンバータ１０３２に出力する。Ｄ／Ａコンバータ１０３２でアナログ変換された告知音信号は、出力アンプ１０３３を介してスピーカ１０２から告知音として出力される。操作者は、スピーカ１０２から発せられる告知音を聞いて、マイク１０１に音声入力を開始する。ここでは、本発明による音声認識装置をカーナビゲーション装置に適用した例について取りあげる。すなわち、操作者が目的地を音声入力するものである。説明を容易にするために、ここでは目的地の都道府県の名称を音声入力するものとし、外部記憶装置１０３５には、都道府県の名称が認識対象語として記憶されているものとする。
【００１８】
次のステップＳ２０２では、入力された音声の取り込みを開始する。操作者がマイク１０１に向かって発した音声は、Ａ／Ｄコンバータ１０３１でデジタル信号に変換された後、信号処理装置１０３４に入力される。マイク１０１は、不図示の電源から電力が供給されると、ステップＳ２０１で操作者が入力装置１０４を操作する前から、周辺の音を拾ってＡ／Ｄコンバータ１０３１に出力し、Ａ／Ｄコンバータ１０３１で変換されたデジタル信号が信号処理装置１０３４に入力されている。信号処理装置１０３４は、ステップＳ２０１で操作者が入力装置１０４を操作するまでは、入力されるデジタル信号の平均パワーを演算している。ステップＳ２０１で入力装置１０４が操作されて音声が入力されると、演算していたデジタル信号の平均パワーより大きいパワーのデジタル信号が入力される。従って、信号処理装置１０３４は、演算していた平均パワーより所定値以上のパワーのデジタル信号が入力されたときに、操作者がマイク１０１に向かって音声入力を行ったと判断し、音声の取り込みを開始する。
【００１９】
音声の取り込みを開始するとステップＳ２０３に進む。ステップＳ２０３では、取り込んだ音声と、外部記憶装置１０３５に記憶されている認識対象語との一致度を演算する。信号処理装置１０３４は、取り込みを開始した音声のデジタル信号のうち、信号のパワーに基づいて、操作者が発した音声区間の開始を識別しておく。この音声区間の開始以降のデジタル信号と、外部記憶装置１０３５に記憶されている複数の認識対象語のデジタル信号とが、それぞれどれほど似ているかを常時演算し、数値化していくことにより、一致度を演算する。数値化された一致度の値が大きいほど、比較している両者が似ていることを意味する。なお、並列処理により、一致度の演算が行われている間も、音声の取り込みは継続して行われている。
【００２０】
取り込んでいる音声のデジタル信号のパワーが所定値以下となる時間が所定時間以上継続すると、操作者による音声入力が終了したと判断して、ステップＳ２０４にて音声の取り込みを終了する。次のステップＳ２０５では、一致度の演算処理が終了した後に、一致度の大きい順に所定の数の認識対象語を抽出して認識候補とする。図５は、ディスプレイ１０５に表示された認識候補の一例である。ディスプレイ１０５には、認識候補とともに一致度も表示される。抽出する認識対象語の所定の数は、予め定めることができ、例えば１０である。図５では、一致度が高い順に５つの認識候補が表示されており、表示する所定の数を１０とした場合、一致度が８８０（「熊本県」）より小さい５つの認識候補がさらに存在する。
【００２１】
抽出された所定の数の認識候補をディスプレイ１０５に表示すると、ステップＳ２０６に進む。ステップＳ２０６では、操作者がディスプレイ１０５に表示された認識候補の中から、入力装置１０４を操作することにより、所望の認識候補を選択して確定したことを示す信号が入力されると、本制御を終了する。すなわち、操作者は、ディスプレイ１０５に表示された認識候補の中から、入力装置１０４のホイール１０４ａを回転操作して所望の認識候補を選択し、選択した所望の認識候補に対して、ホイール１０４ａの押し込み操作を行うことにより、所望の認識候補を確定させる。上述したように、ホイール１０４ａの回転操作や押し込み操作は、ホイール位置センサ１０４ｅにて検出されてホイール制御ＣＰＵ１０４ｄに送られ、通信デバイス１０４ｆを介して信号処理装置１０３４に入力される。信号処理装置１０３４は、この信号を受信すると本制御を終了する。
【００２２】
本発明による音声認識装置は、ステップＳ２０６で、操作者がディスプレイ１０５に表示された複数の認識候補の中から、ホイール１０４ａの回転操作により所望の候補を選択する際の入力装置１０４の制御に特徴がある。この制御について、図６を用いて説明する。
【００２３】
図６は、ホイール駆動モータ１０４ｃに対してホイール１０４ａの回転方向にトルクを発生させるための発生トルクポテンシャルと、ホイール１０４ａの回転角との関係を示す図である。この発生トルクポテンシャルと回転角との関係を示すグラフには、いくつかの種類があり、これらを発生トルクパターンと呼ぶ。このグラフは、複数ある発生トルクパターンを視覚的に捉えやすいので、以下の説明のために用いるが、実際にホイール１０４ａに発生させるトルクは、各回転角に対応するグラフの傾きである。すなわち、図示する発生トルクパターンは、ホイール１０４ａの回転角に対応する発生トルクを積分したものである。発生トルクポテンシャルのうち、図６に示す軸方向（正方向）のトルクが発生すると、操作者のホイール１０４ａの回転操作を妨げることになり、軸方向と反対方向（負方向）のトルクが発生すると、操作者のホイール１０４ａの回転操作をアシストすることになる。
【００２４】
図６に示すように、一致度が高い順にディスプレイ１０５に表示された認識候補の中から、操作者が所望の候補を選択するためにホイール１０４ａの回転操作を行うと、表示された認識候補、すなわち、「長野県」、「佐賀県」、「滋賀県」、「神奈川県」、「熊本県」等が順次選択される。図６に示すように、各認識候補に対応する発生トルクポテンシャルを「発生トルクポテンシャルの谷」と呼ぶことにする。上述したように、発生トルクポテンシャルの軸方向と反対方向のトルク、すなわち、発生トルクポテンシャルの谷の部分に対応するトルクがホイール駆動モータ１０４ｃに発生すると、ホイール１０４ａの回転をアシストすることになる。従って、操作者がホイール１０４ａの回転操作により、第１の認識候補である「長野県」を選択する際には、強く引き寄せられるような感覚がホイール１０４ａに発生し、「長野県」を選択しやすいようになっている。
【００２５】
「長野県」を選択した状態から、さらにホイール１０４ａを同一方向に回転させて次の認識候補を選択するときには、図６の矢印Ｃの位置のトルク勾配を上った後、矢印Ｄの位置のトルク勾配を下って、次の認識候補である「佐賀県」を選択する。矢印Ｃのトルク勾配を登る部分には、ホイール１０４ａの回転を妨げる向きの反力が働く。以後、ホイール１０４ａを同一方向に回転させると、ホイール１０４ａには回転をアシストする力と、回転を妨げる反力とが交互に働いて、順次「滋賀県」、「神奈川県」等の認識候補を選択することができる。
【００２６】
選択された認識候補は、ディスプレイ１０５に拡大表示されると同時にスピーカ１０２により合成音声で操作者に知らされる。図７は、操作者が「佐賀県」を選択したときのディスプレイ１０５の表示４０１と、スピーカ１０２から発せられる合成音声４０２とを示したものである。これにより、操作者は選択した認識候補が何であるかを正確に知ることができる。ここで、操作者が音声入力した言葉が「佐賀県」である場合は、「佐賀県」を選択した状態でホイール１０４ａの押し込み操作を行うことにより、「佐賀県」を確定することができる。
【００２７】
本発明による音声認識装置は、図４に示すフローチャートのステップＳ２０５で認識候補が抽出された時に、一致度の高い上位候補の認識対象語と誤認識されやすい認識対象語が存在する場合に、発生トルクパターンを変更する点に特徴がある。例えば、過去の実験等のデータにより、第１の認識候補が「長野県」である場合に、実際に音声入力された言葉が「神奈川県」である頻度が高かったとする。この場合、外部記憶装置１０３５には、誤認識されやすい認識対象語として、「長野県」と「神奈川県」が対応付けられて記憶されている。この場合、一致度の高い順に認識候補を表示するが、「神奈川県」が選択されるまでの選択操作を容易にすれば、操作者にとって便利である。従って、図６に示すように、「神奈川県」に至るまでの上り勾配（矢印Ｃ）はゆるやかにして発生させる反力を小さくし、「神奈川県」以後の上り勾配（矢印Ｅ）は通常時のものとする。ここで、通常時の上り勾配とは、下り勾配と上り勾配の傾きが同じ状態を意味し、ホイール１０４ａの回転操作時に発生するアシストトルクと反力としてのトルクとが同じ状態を言う。
【００２８】
これにより、操作者は認識候補の選択を行う時に、「神奈川県」に至るまでのホイール１０４ａの回転操作時にホイール１０４ａに加わる反力が通常時よりも小さいことを実感することができるので、音声入力した言葉と一致する可能性の高い認識対象語の有無を予め知ることができる。また、反力を小さくすることにより、音声入力した言葉と一致する可能性の高い認識対象語の有無を操作者に知らせるので、操作者は、認識候補の選択操作を容易に行うことができる。さらに、音声入力した言葉と一致する可能性の高い認識候補以後の認識候補を選択する際には、通常の反力が加わるので、それ以上認識候補の選択操作を行っても、所望の認識候補が得られる可能性が低いことを知ることができる。
【００２９】
（第２の実施の形態）
第２の実施の形態の音声認識装置が第１の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図４のフローチャートを用いて説明した処理は同じであるが、操作者がホイール１０４ａの回転操作により認識候補の選択を行う時に、ホイール駆動モータ１０４ｃに発生させるトルクパターンが異なる。従って、以下では、トルクパターンの説明を主に行う。
【００３０】
図８は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第１の実施の形態と同様に、第１の認識候補が「長野県」である場合に、過去の実験等のデータから実際に音声入力された言葉が「神奈川県」である頻度が高く、外部記憶装置１０３５には、誤認識されやすい認識対象語として、「長野県」と「神奈川県」が対応付けられて記憶されているものとする。
【００３１】
第２の実施の形態の音声認識装置で用いられるトルクパターンは、４番目の認識候補である「神奈川県」に至るまでの谷と谷との間隔（図中の間隔Ｆ）が、通常の谷と谷との間隔Ｇに比べて狭く設定されている。すなわち、各認識候補に対して、トルクポテンシャルの谷が対応付けられているが、「長野県」と「佐賀県」、「佐賀県」と「滋賀県」、「滋賀県」と「神奈川県」のそれぞれの谷と谷との間隔Ｆは、「神奈川県」の谷と「熊本県」の谷との間隔Ｇよりも、狭く設定されている。従って、例えば、「長野県」を選択した状態から「佐賀県」を選択するためにホイール１０４ａを回転させる量は、通常の回転量よりも少なくて済む。なお、抽出された認識候補の上位候補の中に、誤認識されやすい認識候補が含まれていないときは、トルクの谷と谷との間隔は通常時の間隔Ｇとなる。
【００３２】
これにより、操作者は認識候補の選択を行う時に、「神奈川県」に至るまでのホイール１０４ａに発生するトルクの谷と谷との間隔が通常の間隔よりも狭いことを、ホイール１０４ａの回転操作時に実感することができるので、音声入力した言葉と一致する可能性の高い認識対象語の有無を予め知ることができる。また、トルクの谷と谷との間隔を狭くすることにより、音声入力した言葉と一致する可能性の高い認識対象語の有無を操作者に知らせるので、操作者は、認識候補の選択操作を容易に行うことができる。さらに、音声入力した言葉と一致する可能性の高い認識候補以後の認識候補を選択する際には、谷と谷との間隔が通常時の間隔となるので、それ以上認識候補の選択操作を行っても、所望の認識候補が得られる可能性が低いことを知ることができる。
【００３３】
（第３の実施の形態）
第３の実施の形態の音声認識装置が第１，第２の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図４のフローチャートを用いて説明した処理は同じであるが、操作者がホイール１０４ａの回転操作により認識候補の選択を行う時に、ホイール駆動モータ１０４ｃに発生させるトルクパターンが異なる。従って、以下では、トルクパターンの説明を主に行う。
【００３４】
図９は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第１，第２の実施の形態と同様に、外部記憶装置１０３５には、誤認識されやすい認識対象語として、「長野県」と「神奈川県」が対応付けられて記憶されているものとする。
【００３５】
第３の実施の形態の音声認識装置で用いられるトルクパターンでは、各認識候補に対して、トルクポテンシャルの谷が対応付けられているが、誤認識されやすい「神奈川県」に対応する谷Ｈの深さが、他の認識候補に対する谷の深さよりも深くなっている。これにより、操作者はホイール１０４ａに加わる反力の変化を手がかりに、正しい可能性の高い認識候補（本実施の形態では、「神奈川県」）の選択を容易に行うことができる。また、「神奈川県」に対応する谷Ｈの深さが深いので、「神奈川県」の次の認識候補である「熊本県」を選択する際にホイール１０４ａに加わる反力も、通常時の反力よりも大きくなる。従って、操作者が適当にホイール１０４ａを回転させた時でも、実際に音声入力された言葉と一致する可能性が高い「神奈川県」で回転が停止する可能性が高く、正しい可能性の高い認識候補の選択がより容易になる。
【００３６】
（第４の実施の形態）
第４の実施の形態の音声認識装置も、第１〜第３の実施の形態の音声認識装置の信号処理装置１０３４で行われる処理は同じであるが、操作者がホイール１０４ａの回転操作により認識候補の選択を行う時に、ホイール駆動モータ１０４ｃに発生させるトルクパターンが異なる。従って、以下では、トルクパターンの説明を主に行う。
【００３７】
図１０は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第１〜第３の実施の形態と同様に、外部記憶装置１０３５には、誤認識されやすい認識対象語として、「長野県」と「神奈川県」が対応付けられて記憶されているものとする。
【００３８】
第４の実施の形態の音声認識装置で用いられるトルクパターンでは、各認識候補に対してトルクポテンシャルの谷が対応付けられているが、誤認識されやすい「神奈川県」に対応する谷Ｉの幅が、他の認識候補に対する谷の幅よりも広くなっている。すなわち、操作者は、「神奈川県」を選択した状態から、次の認識候補である「熊本県」を選択する時は、ホイール１０４ａの回転操作量を通常時の回転操作量よりも多くする必要がある。
【００３９】
これにより、操作者はホイール１０４ａの回転操作量を手がかりにして、正しい可能性の高い認識候補（本実施の形態では、「神奈川県」）の選択を容易に行うことができる。また、「神奈川県」に対応する谷Ｉの幅が広いので、ディスプレイ１０５の表示を見ずに、適当にホイール１０４ａを回転させた時でも、実際に音声入力された言葉と一致する可能性が高い「神奈川県」が選択される可能性が高く、正しい可能性の高い認識候補の選択がより容易になる。
【００４０】
（第５の実施の形態）
第５の実施の形態の音声認識装置が第１〜第４の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図４のフローチャートのステップＳ２０５で抽出した認識候補の並べ方が異なる。上述したように、第１〜第４の実施の形態の音声認識装置では、一致度の高い順に認識候補を並べているが、第５の実施の形態の音声認識装置では、一致度の高い上位の認識候補と誤認識されやすい認識対象語が存在するときは、その認識対象語を誤認識されやすい認識候補の次に並べる。
【００４１】
図１１は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第１〜第４の実施の形態と同様に、外部記憶装置１０３５には、誤認識されやすい認識対象語として、「長野県」と「神奈川県」が対応付けられて記憶されているものとする。
【００４２】
この場合、第１の認識候補の「長野県」の次に並べられるのは、「長野県」の次に一致度の高い「佐賀県」ではなく、「長野県」と誤認識されやすい「神奈川県」である。従って、一致度が２番目に高い「佐賀県」は３番目に並べ替えられ、３番目に一致度が高い「滋賀県」は、４番目に並べられる。また、各認識候補に対してトルクポテンシャルの谷が対応付けられているが、誤認識されやすい「神奈川県」に対応する谷Ｊの深さは、他の認識候補に対応する谷の深さよりも深い。これにより、操作者は、実際に音声入力された言葉と一致する可能性が高い「神奈川県」を迅速、かつ、容易に選択することができる。また、「神奈川県」から次の認識候補である「佐賀県」を選択する際の、トルクポテンシャルの平らな部分Ｋ（以下、「トルクの丘」と呼ぶ）が、「長野県」から「神奈川県」に至る時のトルクの丘よりも高くなっている。これにより、「神奈川県」以後の認識候補が、通常通りに一致度の高い順に並んでいることを操作者に認知させることができる。
【００４３】
本発明は、上述した実施の形態に限定されることはない。例えば、入力装置１０４を用いて認識候補を選択するために、ホイール１０４ａの回転操作を行うものとしているが、入力装置１０４にジョイスティックを採用して、ジョイスティックにより認識候補の選択を行うこともできる。また、入力装置にキーボードやコントローラを採用して、十字キーにより認識候補の選択を行ってもよい。
【００４４】
また、上述した第１〜第５の実施の形態では、誤認識されやすい認識候補が１番目にある場合について説明したが、誤認識されやすい認識候補の順番は何番目でもよい。ただし、誤認識されやすい認識候補の一致度が低い場合、すなわち、誤認識されやすい認識候補が下位の場合には、正しい可能性の高い認識候補を選択するまでの操作に時間がかかるため、正しい可能性の高い認識候補の有無を操作者に知らせないために、ホイール１０４ａに発生させるトルクや回転操作量等の操作感を変更しないこともできる。すなわち、誤認識されやすい認識候補が上位候補である場合に、第１〜第５の実施の形態で説明したようなホイール１０４ａの操作感を変更すれば、操作者にとって便利である。
【００４５】
さらに、第５の実施の形態では、一致度の高い順に並べられた認識候補の中に誤認識されやすい認識候補が含まれている場合に、その認識候補に続いて、対応付けて記憶されている認識対象語を並べたが、この誤認識されやすい認識対象語の並べ替えは、第１〜第４の実施の形態の音声認識装置にも適用することができる。すなわち、誤認識されやすい認識対象語の並べ替えを行った後に、第１〜第４の実施の形態で説明したように、入力装置１０４の操作感を変更するようにすればよい。
【００４６】
上述した実施の形態では、本発明による音声認識装置をカーナビゲーション装置に適用した例について説明したが、カーナビゲーション装置以外のものにも適用することができる。
【図面の簡単な説明】
【図１】本発明による音声認識装置の一実施の形態の構成を示す図
【図２】本発明による音声認識装置に用いられる入力装置の一実施の形態の構成を示す図
【図３】ホイールにトルクを発生させるための概要を説明するための図
【図４】信号処理装置にて行われる一実施の形態の制御手順を示すフローチャート
【図５】ディスプレイに表示される認識候補の一例を示す図
【図６】第１の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図７】選択された認識候補をディスプレイに表示するとともに音声で知らせることを示す図
【図８】第２の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図９】第３の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図１０】第４の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図１１】第５の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【符号の説明】
１０…シャフト、１０１…マイク、１０２…スピーカ、１０３…信号処理ユニット、１０３１…Ａ／Ｄコンバータ、１０３２…Ｄ／Ａコンバータ、１０３３…出力アンプ、１０３４…信号処理装置、１０３４ａ…ＣＰＵ、１０３４ｂ…メモリ、１０３５…外部記憶装置、１０４…入力装置、１０４ａ…ホイール、１０４ｂ…スイッチ、１０４ｃ…ホイール駆動モータ、１０４ｄ…ホイール制御ＣＰＵ、１０４ｅ…ホイール位置センサ、１０４ｆ…通信デバイス、１０５…表示ディスプレイ
[0001]
[Technical field of the invention]
The present invention relates to a speech recognition apparatus that recognizes input speech and displays recognition candidates for the speech that was actually input.
[0002]
[Prior art]
A conventional speech recognition apparatus is disclosed in Japanese Patent Application Laid-Open No. 11-352991. This apparatus recognizes speech uttered one syllable at a time, displays a recognition candidate, and displays further recognition candidates one after another until the person who input the speech confirms a displayed candidate.
[0003]
[Problems to be solved by the invention]
However, when the desired recognition candidate is not obtained, the conventional speech recognition apparatus simply displays the next recognition candidates one after another. If, for example, loud noise is mixed in during speech input, the input speech is misrecognized, and the desired recognition candidate may never be displayed no matter how many candidates are stepped through. The operator therefore had to perform the candidate selection operation without knowing whether a correct recognition candidate was present at all.
[0004]
An object of the present invention is to provide a speech recognition apparatus that, when recognition candidates are selected using an operation device and the candidates include one that is easily misrecognized, changes the operational feeling of the operation device and thereby informs the operator whether a recognition candidate that is highly likely to be correct is present.
[0005]
[Means for Solving the Problems]
The present invention will be described with reference to FIG. 1 showing an embodiment.
(1) The invention of claim 1 achieves the above object by comprising: a speech input device for inputting speech; a storage device that stores a plurality of recognition target words for the input speech and stores, in association with each other, those recognition target words that are easily misrecognized for one another; a control device that calculates a degree of coincidence indicating how closely the speech input to the speech input device matches each recognition target word stored in the storage device, and takes the recognition target words, arranged in descending order of coincidence, as recognition candidates starting from the top; an operation device with which an operator can at least perform an operation of selecting a desired recognition candidate from among the recognition candidates; and an operation feeling changing device that changes the operational feeling of operating the operation device when the recognition candidates include recognition target words that are stored in the storage device as easily misrecognized for one another.
(2) The invention of claim 2 is the speech recognition apparatus of claim 1, wherein the operation feeling changing device reduces the load on the operator of performing the recognition candidate selection operation with the operation device until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.
(3) The invention of claim 3 is the speech recognition apparatus of claim 2, wherein the operation device is a rotary input device having a wheel, with which the operator can perform the selection operation by rotating the wheel, and the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel when the next recognition candidate is selected, and reduces the force that hinders the rotation operation of the wheel until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.
(4) The invention of claim 4 is the speech recognition apparatus of claim 2, wherein the operation device is a rotary input device having a wheel, with which the operator can perform the selection operation by rotating the wheel, and the operation feeling changing device reduces the amount of wheel rotation needed to select the next recognition candidate until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.
(5) The invention of claim 5 is the speech recognition apparatus of claim 1, wherein the operation feeling changing device changes the operational feeling when the operator selects, as a recognition candidate, the recognition target word with the lower degree of coincidence among the recognition target words stored in the storage device as easily misrecognized for one another.
(6) The invention of claim 6 is the speech recognition apparatus of claim 5, wherein the operation device is a rotary input device having a wheel, with which the operator can perform the selection operation by rotating the wheel, and the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel when the next recognition candidate is selected, and increases the assisting force and the hindering force generated when the recognition target word with the lower degree of coincidence, among the recognition target words stored in the storage device as easily misrecognized for one another, is selected as a recognition candidate.
(7) The invention of claim 7 is the speech recognition apparatus of claim 5, wherein the operation device is a rotary input device having a wheel, with which the operator can perform the selection operation by rotating the wheel, and the operation feeling changing device increases the amount of wheel rotation needed to select the next recognition candidate from the state in which the recognition target word with the lower degree of coincidence, among the recognition target words stored in the storage device as easily misrecognized for one another, is selected as a recognition candidate.
(8) The invention of claim 8 is the speech recognition apparatus of any one of claims 1 to 7, wherein, when either one of the recognition target words stored in the storage device as easily misrecognized for one another is present among the recognition candidates arranged in descending order of coincidence, the control device rearranges the recognition target word stored in the storage device in association with that easily misrecognized recognition target word so that it comes immediately after the recognition candidate that is the easily misrecognized recognition target word.
(9) The invention of claim 9 is the speech recognition apparatus of claim 8, wherein the operation feeling changing device changes the operational feeling when the next recognition candidate is selected from the state in which the easily misrecognized recognition target word rearranged by the control device is selected.
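The rearrangement described in item (8) can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function name `reorder_candidates`, the dictionary used to represent the stored associations, and the decision to move a partner only when it already appears lower in the candidate list are all hypothetical. The prefecture names follow the Nagano/Kanagawa example given in the description of the embodiments.

```python
def reorder_candidates(ranked, confusable):
    """Place each easily confused partner immediately after its counterpart.

    ranked: recognition candidates in descending order of coincidence.
    confusable: hypothetical mapping from a word to the word it is
        easily misrecognized as (stored in both directions).
    """
    result = list(ranked)
    for word in ranked:
        partner = confusable.get(word)
        if partner is None or partner not in result:
            continue
        i = result.index(word)
        j = result.index(partner)
        # Only pull a lower-ranked partner up; never push a higher one down.
        if j > i + 1:
            result.insert(i + 1, result.pop(j))
    return result


ranked = [
    "Nagano Prefecture",
    "Saga Prefecture",
    "Shiga Prefecture",
    "Kanagawa Prefecture",
    "Kumamoto Prefecture",
]
confusable = {
    "Nagano Prefecture": "Kanagawa Prefecture",
    "Kanagawa Prefecture": "Nagano Prefecture",
}
reordered = reorder_candidates(ranked, confusable)
```

With these inputs the partner of the top candidate moves to second place and the remaining candidates keep their relative order, which matches the ordering described for the fifth embodiment.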
[0006]
In this section on means for solving the problems, the present invention has been associated with FIG. 1 of the embodiment to make it easier to understand; this does not limit the present invention to the embodiment.
[0007]
[Effects of the invention]
The present invention has the following effects.
(1) According to the inventions of claims 1 to 9, the operational feeling of the operation device is changed when the extracted recognition candidates include recognition candidates that are easily misrecognized, so the operator can know in advance, from the operational feeling of the operation device, whether a recognition candidate that is highly likely to be correct is present.
(2) According to the invention of claim 2, the load of performing the recognition candidate selection operation with the operation device is reduced until an associated, easily misrecognized recognition candidate is selected, so the selection operation can be performed easily.
(3) According to the invention of claim 3, the operation device is a rotary input device having a wheel, and the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel. Because the force that hinders the rotation operation is reduced until an associated, easily misrecognized recognition candidate is selected, the operator can reliably tell, from the smaller force needed to rotate the wheel, that a recognition candidate that is highly likely to be correct is present, and can perform the selection operation easily.
(4) According to the invention of claim 4, the operation device is a rotary input device having a wheel. Because the amount of wheel rotation needed to select the next recognition candidate is reduced until an associated, easily misrecognized recognition candidate is selected, the operator can reliably tell, from the reduced amount of rotation, that a recognition candidate that is highly likely to be correct is present, and can perform the selection operation easily.
(5) According to the invention of claim 5, the operational feeling is changed when the operator selects, of the associated, easily misrecognized recognition candidates, the one with the lower degree of coincidence, so the operator can select a recognition candidate that is highly likely to be correct by relying on the operational feeling of the operation device.
(6) According to the invention of claim 6, the operation device is a rotary input device having a wheel, and the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel. Because the assisting and hindering forces are increased when the recognition candidate with the lower degree of coincidence, of the associated, easily misrecognized recognition candidates, is selected, the operator can select a recognition candidate that is highly likely to be correct by relying on the operational feeling of the operation device. Moreover, even when the operator turns the wheel casually, the large hindering force makes it more likely that the rotation stops at the position of the recognition candidate that is highly likely to be correct, so the selection operation can be performed easily.
(7) According to the invention of claim 7, the operation device is a rotary input device having a wheel. Because the amount of wheel rotation needed to select the next recognition candidate is increased from the state in which the recognition candidate with the lower degree of coincidence, of the associated, easily misrecognized recognition candidates, is selected, the operator can select a recognition candidate that is highly likely to be correct based on the amount of wheel rotation.
(8) According to the invention of claim 8, when the extracted recognition candidates include a recognition candidate that is easily misrecognized, the recognition target word associated with it is moved to the position immediately after that candidate, so the easily misrecognized recognition target word can be selected easily.
(9) According to the invention of claim 9, the operational feeling is changed when the next recognition candidate is selected from the state in which the rearranged, easily misrecognized recognition target word is selected, so the operator can be made aware that the subsequent recognition candidates are arranged in descending order of coincidence.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIG. 1 is a diagram showing the configuration of a first embodiment of a speech recognition apparatus according to the present invention. The speech recognition apparatus according to the first embodiment includes a microphone 101, a speaker 102, a signal processing unit 103, an input device 104, and a display 105. The signal processing unit 103 includes an A/D converter 1031, a D/A converter 1032, an output amplifier 1033, a signal processing device 1034, and an external storage device 1035.
[0009]
Speech input via the microphone 101 is input, as an audio signal, to the A/D converter 1031 of the signal processing unit 103. The A/D converter 1031 converts the input audio signal into a digital signal and outputs it to the signal processing device 1034. The signal processing device 1034 includes a CPU 1034a and a memory 1034b, and calculates the degree of coincidence between the digital signal of each recognition target word stored in the external storage device 1035 and the digital signal of the input speech. The external storage device 1035 stores a plurality of recognition target words. Among them, recognition target words that are easily misrecognized for one another are stored in association with each other.
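As a rough illustration of how the stored associations and the degree of coincidence might be used together, the following sketch ranks words by score and reports which stored pairs appear among the top candidates. The names `CONFUSABLE_PAIRS`, `top_candidates`, and `confusable_pairs_in` are hypothetical, and the scores are invented for the example except for 880 for Kumamoto Prefecture, which the description gives for FIG. 5. How the degree of coincidence itself is computed is not specified here, so the sketch takes the scores as given.

```python
# Hypothetical stored association of words that are easily misrecognized
# for one another (the Nagano/Kanagawa pair is the example in the description).
CONFUSABLE_PAIRS = [("Nagano Prefecture", "Kanagawa Prefecture")]


def top_candidates(scores, n=5):
    """Return the n words with the highest degree of coincidence, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def confusable_pairs_in(candidates, pairs=CONFUSABLE_PAIRS):
    """Return the stored pairs whose two members both appear among the candidates."""
    present = set(candidates)
    return [(a, b) for a, b in pairs if a in present and b in present]


# Illustrative scores only; a larger value means a closer match.
scores = {
    "Kanagawa Prefecture": 900,
    "Nagano Prefecture": 960,
    "Tokyo": 700,
    "Saga Prefecture": 930,
    "Kumamoto Prefecture": 880,
    "Shiga Prefecture": 910,
}
candidates = top_candidates(scores, n=5)
pairs_found = confusable_pairs_in(candidates)
```

When `pairs_found` is non-empty, the apparatus would switch to a modified torque pattern; when it is empty, the normal pattern would apply.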
[0010]
The D/A converter 1032 converts the digital signal of a recognition target word into an analog signal and outputs it to the output amplifier 1033 so that speech or the like can be output from the speaker 102. The analog signal input from the D/A converter 1032 to the output amplifier 1033 is amplified and output as sound through the speaker 102.
[0011]
The display 105 displays the recognition candidates for the input speech and the like. The input device 104 includes a wheel 104a and a plurality of switches 104b; it detects the operator's request to start speech recognition, cancellation of an input, recognition candidate selection operations, and the like, and outputs them to the signal processing device 1034. The wheel 104a can be pushed in the direction of arrow A in FIG. 1 and rotated in the direction of arrow B. Rotation in the direction of arrow B is used to select among the recognition candidates displayed on the display 105, and a push in the direction of arrow A is used to confirm the recognition candidate selected by that rotation.
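The rotate-to-select, push-to-confirm interaction can be sketched as a small state holder. The class name `CandidateSelector`, the unit of one step per candidate, and the clamping at both ends of the list are assumptions made for the example; the text does not say what happens when the wheel is turned past the last candidate.

```python
class CandidateSelector:
    """Tracks which recognition candidate is highlighted and which is confirmed."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.index = 0          # the top candidate is highlighted first
        self.confirmed = None   # set once the wheel is pushed

    def rotate(self, steps):
        """Move the highlight by a signed number of steps (arrow B), clamped to the list."""
        last = len(self.candidates) - 1
        self.index = max(0, min(last, self.index + steps))
        return self.candidates[self.index]

    def press(self):
        """Confirm the highlighted candidate (arrow A)."""
        self.confirmed = self.candidates[self.index]
        return self.confirmed


selector = CandidateSelector(
    ["Nagano Prefecture", "Saga Prefecture", "Shiga Prefecture"]
)
highlighted = selector.rotate(1)
confirmed = selector.press()
```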
[0012]
FIG. 2 is a detailed diagram showing the configuration of the input device 104. In addition to the wheel 104a and the switches 104b described above, the input device 104 includes a wheel drive motor 104c, a wheel control CPU 104d, a wheel position sensor 104e, and a communication device 104f. The wheel drive motor 104c can generate torque in the rotation direction of the wheel 104a indicated by arrow B. Torque generated in the direction in which the operator is rotating the wheel 104a helps the operator rotate it, whereas torque generated in the opposite direction hinders the rotation. Because of this torque, the operator feels the rotation of the wheel 104a become lighter or heavier. That is, the wheel drive motor 104c can change the operational feeling of the wheel 104a for the operator.
[0013]
The wheel position sensor 104e detects the rotation angle of the wheel 104a and the push operation in the direction of arrow A. The signal detected by the wheel position sensor 104e is sent to the wheel control CPU 104d. The wheel control CPU 104d digitizes the signal input from the wheel position sensor 104e and converts it into wheel position information. It also calculates the amount of torque to be generated by the wheel drive motor 104c based on the wheel position information and on the information input from the signal processing device 1034, namely the generated torque pattern information described later. The wheel control CPU 104d outputs a torque control signal based on the calculated torque amount to the wheel drive motor 104c. The wheel drive motor 104c is driven based on this control signal and generates torque in the rotation direction of the wheel 104a indicated by arrow B.
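The torque calculation can be illustrated with a short sketch. The description of FIG. 6 states that the torque actually applied is the slope of the generated torque potential at each rotation angle, that positive torque hinders the rotation, and that negative torque assists it. Everything else below is an assumption made for illustration: the cosine-shaped potential, the 30-degree spacing between valleys, the function names, and the use of a central difference to take the slope. Rotation is assumed to be toward larger angles.

```python
import math


def potential(angle_deg, spacing_deg=30.0, depth=1.0):
    """Hypothetical torque potential with one valley per candidate position.

    Valleys (minima) sit at multiples of spacing_deg.
    """
    return -depth * math.cos(2.0 * math.pi * angle_deg / spacing_deg)


def torque(angle_deg, step=1e-3, **pattern):
    """Slope of the potential at angle_deg, taken by central difference.

    Positive values hinder rotation toward larger angles; negative values assist it.
    """
    upper = potential(angle_deg + step, **pattern)
    lower = potential(angle_deg - step, **pattern)
    return (upper - lower) / (2.0 * step)


leaving_valley = torque(7.5)     # climbing out of the valley at 0 degrees
entering_valley = torque(22.5)   # descending into the valley at 30 degrees
gentle_climb = torque(7.5, depth=0.5)
```

Here `leaving_valley` is positive (a reaction force), `entering_valley` is negative (an assisting force), and halving `depth` halves the reaction force, which is one simple way to model a gentler climb toward a candidate that is likely to be correct.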
[0014]
The communication device 104f is connected to the signal processing device 1034. It outputs the wheel position information input from the wheel control CPU 104d to the signal processing device 1034, and outputs the generated torque pattern information input from the signal processing device 1034 to the wheel control CPU 104d.
[0015]
FIG. 3 is a view for explaining an outline when torque is generated in the wheel 104a. A shaft 10 is attached to the center of the disc-shaped wheel 104a, and a wheel drive motor 104c is provided at the other end of the shaft 10. The amount of rotation of the wheel 104a detected by the wheel position sensor 104e is sent to the wheel control CPU 104d. The wheel control CPU 104d calculates the amount of torque to be generated by the wheel drive motor 104c based on the wheel position information and information input from the signal processing device 1034. A torque control signal based on the calculated generated torque amount is output to the wheel drive motor 104c. The wheel drive motor 104c is driven based on this control signal and applies torque to the shaft 10, thereby generating torque in the rotation direction of the arrow B of the wheel 104a.
[0016]
FIG. 4 is a flowchart showing a processing procedure of an embodiment performed by the speech recognition apparatus according to the present invention. This control is performed by the signal processing device 1034 of the signal processing unit 103. The processing starting from step S201 starts when the operator operates the input device 104 and a signal indicating that voice input is to be started is input to the signal processing device 1034.
[0017]
In step S201, a notification sound signal for notifying the operator that voice recognition processing is about to start is read from the external storage device 1035 and output to the D/A converter 1032. The notification sound signal, converted to analog by the D/A converter 1032, is output as a notification sound from the speaker 102 via the output amplifier 1033. On hearing the notification sound from the speaker 102, the operator starts voice input to the microphone 101. Here, an example in which the speech recognition apparatus according to the present invention is applied to a car navigation apparatus will be described; that is, the operator inputs a destination by voice. To simplify the explanation, it is assumed here that the name of the destination prefecture is input by voice and that prefecture names are stored in the external storage device 1035 as recognition target words.
[0018]
In the next step S202, capturing of the input voice is started. The voice uttered by the operator toward the microphone 101 is converted into a digital signal by the A/D converter 1031 and then input to the signal processing device 1034. While power is supplied from a power source (not shown), the microphone 101 picks up surrounding sounds and outputs them to the A/D converter 1031 even before the operator operates the input device 104 in step S201, and the digital signal converted by the A/D converter 1031 is input to the signal processing device 1034. The signal processing device 1034 calculates the average power of the input digital signal until the operator operates the input device 104 in step S201. When voice is input after the input device 104 has been operated, a digital signal with a power greater than this average power is input. Therefore, the signal processing device 1034 determines that the operator has begun voice input to the microphone 101 when a digital signal whose power exceeds the calculated average power by a predetermined value or more is input, and starts capturing the voice.
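As an illustration only, the onset decision described in this step can be sketched in Python as follows. The frame representation, the function names frame_power and detect_speech_start, and the use of an additive margin as the "predetermined value" are assumptions made for this sketch and are not specified in the embodiment.

```python
from typing import Optional, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of digital samples."""
    return sum(x * x for x in frame) / len(frame)


def detect_speech_start(
    background_frames: Sequence[Sequence[float]],
    incoming_frames: Sequence[Sequence[float]],
    margin: float,
) -> Optional[int]:
    """Return the index of the first incoming frame judged to be speech.

    The average power is taken over the frames picked up before the
    operator operated the input device (assumed to be non-empty). A frame
    counts as speech when its power exceeds that average by more than
    `margin` (hypothetical parameter).
    """
    average = sum(frame_power(f) for f in background_frames) / len(background_frames)
    threshold = average + margin
    for index, frame in enumerate(incoming_frames):
        if frame_power(frame) > threshold:
            return index
    return None  # no frame was loud enough, so capturing does not start


# Hypothetical sample data: quiet background, then a louder utterance.
BACKGROUND = [[0.1, -0.1], [0.1, 0.1]]
INCOMING = [[0.1, 0.1], [0.5, -0.5], [0.6, 0.6]]
START = detect_speech_start(BACKGROUND, INCOMING, margin=0.05)
```

With the sample data above, the second incoming frame (index 1) is the first one whose power clears the threshold.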
[0019]
When voice capturing is started, the process proceeds to step S203. In step S203, the degree of coincidence between the captured voice and the recognition target words stored in the external storage device 1035 is calculated. The signal processing device 1034 identifies the start of the voice section uttered by the operator based on the signal power of the captured digital voice signal. The degree of coincidence is obtained by continuously calculating how similar the digital signal after the start of the voice section is to the digital signal of each of the recognition target words stored in the external storage device 1035, and expressing the similarity as a numerical value. The larger the degree of coincidence, the more similar the two signals being compared. Note that voice capturing continues by parallel processing while the degree of coincidence is being calculated.
[0020]
If the power of the digital signal of the voice being captured remains at or below a predetermined value for a predetermined time or longer, it is determined that the operator's voice input has ended, and voice capturing is terminated in step S204. In the next step S205, after the coincidence calculation has finished, a predetermined number of recognition target words are extracted in descending order of the degree of coincidence and set as recognition candidates. FIG. 5 shows an example of recognition candidates displayed on the display 105. The display 105 displays the degree of coincidence together with each recognition candidate. The number of recognition target words to be extracted can be determined in advance, for example as 10. In FIG. 5, five recognition candidates are displayed in descending order of the degree of coincidence. When the predetermined number is 10, there are five further recognition candidates whose degree of coincidence is smaller than 880 (“Kumamoto Prefecture”).
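The extraction in step S205 amounts to sorting by the degree of coincidence and keeping the top entries. The following Python sketch shows one way this could be written. The function name extract_candidates and every score other than the 880 stated for “Kumamoto Prefecture” are hypothetical values chosen for illustration, not values taken from FIG. 5.

```python
from typing import Dict, List, Tuple


def extract_candidates(scores: Dict[str, int], count: int = 10) -> List[Tuple[str, int]]:
    """Return the `count` recognition target words with the highest
    degree of coincidence, in descending order of that degree."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


# Hypothetical degrees of coincidence (only the 880 appears in the text).
EXAMPLE_SCORES = {
    "Shiga Prefecture": 930,
    "Kagawa Prefecture": 700,
    "Nagano Prefecture": 980,
    "Kumamoto Prefecture": 880,
    "Saga Prefecture": 950,
    "Kanagawa Prefecture": 900,
}

TOP_FIVE = extract_candidates(EXAMPLE_SCORES, count=5)
```

Here TOP_FIVE lists the five displayed candidates from “Nagano Prefecture” down to “Kumamoto Prefecture”, and the sixth word is left out.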
[0021]
When the extracted predetermined number of recognition candidates have been displayed on the display 105, the process proceeds to step S206. In step S206, this control ends when a signal is input indicating that the operator has selected and confirmed a desired recognition candidate from those displayed on the display 105 by operating the input device 104. That is, the operator selects a desired recognition candidate from those displayed on the display 105 by rotating the wheel 104a of the input device 104, and confirms the selected candidate by pushing in the wheel 104a. As described above, the rotation operation and push-in operation of the wheel 104a are detected by the wheel position sensor 104e, sent to the wheel control CPU 104d, and input to the signal processing device 1034 via the communication device 104f. The signal processing device 1034 ends this control when it receives this signal.
[0022]
The voice recognition apparatus according to the present invention is characterized by the way the input device 104 is controlled when, in step S206, the operator rotates the wheel 104a to select a desired candidate from among the recognition candidates displayed on the display 105. This control will be described with reference to the drawings.
[0023]
FIG. 6 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential used by the wheel drive motor 104c to generate torque in the rotation direction of the wheel 104a. There are several types of graphs of this relationship, and they are called generated torque patterns. These graphs are used in the following explanation because they make the generated torque patterns easy to grasp visually. The torque actually generated on the wheel 104a is the slope of the graph at each rotation angle. That is, the illustrated generated torque pattern is obtained by integrating the generated torque with respect to the rotation angle of the wheel 104a. When torque in the direction of the generated torque potential axis shown in FIG. 6 (positive direction) is generated, the rotation operation of the wheel 104a by the operator is hindered, and when torque in the direction opposite to the axis (negative direction) is generated, the rotation operation is assisted.
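Because the generated torque is the slope of the potential graph, it can be recovered from a generated torque pattern by numerical differentiation. The following Python sketch assumes, purely for illustration, a cosine-shaped potential with valleys every 30 degrees; the shape, the spacing, and the names potential and slope are not taken from FIG. 6.

```python
import math
from typing import Callable


def potential(angle_deg: float, spacing_deg: float = 30.0, depth: float = 1.0) -> float:
    """Hypothetical generated torque potential with a valley at every
    multiple of `spacing_deg` and a crest halfway between valleys."""
    return -depth * 0.5 * (1.0 + math.cos(2.0 * math.pi * angle_deg / spacing_deg))


def slope(pattern: Callable[[float], float], angle_deg: float, step: float = 1e-3) -> float:
    """Central-difference slope of the pattern at `angle_deg`.

    Following the sign convention in the text, a positive slope hinders
    the operator's rotation and a negative slope assists it.
    """
    return (pattern(angle_deg + step) - pattern(angle_deg - step)) / (2.0 * step)


CLIMBING = slope(potential, 7.5)     # leaving a valley: positive, hinders rotation
DESCENDING = slope(potential, 22.5)  # approaching the next valley: negative, assists
AT_VALLEY = slope(potential, 0.0)    # valley bottom: approximately zero
```

In this model the wheel control CPU would command a reaction torque while the wheel climbs out of a valley and an assisting torque once it has passed the crest.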
[0024]
As shown in FIG. 6, when the operator rotates the wheel 104a to select a desired candidate from the recognition candidates displayed on the display 105 in descending order of the degree of coincidence, the displayed recognition candidates, that is, “Nagano Prefecture”, “Saga Prefecture”, “Shiga Prefecture”, “Kanagawa Prefecture”, “Kumamoto Prefecture”, and so on, are selected in sequence. As shown in FIG. 6, the portion of the generated torque potential corresponding to each recognition candidate is referred to as a “generated torque potential valley”. As described above, when torque in the direction opposite to the axis of the generated torque potential, that is, torque corresponding to the valley portion of the generated torque potential, is generated by the wheel drive motor 104c, the rotation of the wheel 104a is assisted. Therefore, when the operator rotates the wheel 104a to select “Nagano Prefecture”, the first recognition candidate, the wheel 104a feels as if it is being drawn in strongly, which makes “Nagano Prefecture” easy to select.
[0025]
When the wheel 104a is rotated further in the same direction from the state in which “Nagano Prefecture” is selected in order to select the next recognition candidate, the wheel climbs the torque gradient at the position indicated by arrow C in the figure and then descends the following torque gradient, whereupon the next recognition candidate “Saga Prefecture” is selected. While the wheel climbs the torque gradient at arrow C, a reaction force acts in the direction that hinders the rotation of the wheel 104a. Thereafter, as the wheel 104a is rotated in the same direction, a force that assists the rotation and a reaction force that hinders it act alternately on the wheel 104a, and recognition candidates such as “Shiga Prefecture” and “Kanagawa Prefecture” can be selected in sequence.
[0026]
The selected recognition candidate is enlarged on the display 105, and at the same time the speaker 102 announces it to the operator with synthesized speech. FIG. 7 shows a display 401 on the display 105 when the operator selects “Saga Prefecture” and a synthesized voice 402 emitted from the speaker 102. The operator can thereby know exactly which recognition candidate has been selected. If the word the operator input is “Saga Prefecture”, it can be confirmed by pushing in the wheel 104a while “Saga Prefecture” is selected.
[0027]
The speech recognition apparatus according to the present invention is characterized in that the generated torque pattern is changed when the recognition candidates extracted in step S205 of the flowchart include recognition target words that are easily misrecognized. For example, suppose that data from past experiments and the like show that, when the first recognition candidate is “Nagano Prefecture”, the speech actually input is frequently “Kanagawa Prefecture”. In this case, “Nagano Prefecture” and “Kanagawa Prefecture” are stored in the external storage device 1035 in association with each other as recognition target words that are easily misrecognized. The recognition candidates are still displayed in descending order of the degree of coincidence, but it is convenient for the operator if the selection operation is made easier until “Kanagawa Prefecture” is selected. Therefore, as shown in FIG. 6, the upward gradients leading up to “Kanagawa Prefecture” (arrow C) are made gentle so that the reaction force generated is reduced, and the upward gradient after “Kanagawa Prefecture” (arrow E) is left normal. Here, a normal upward gradient means one that has the same steepness as the downward gradient, so that the assist torque and the reaction torque generated when the wheel 104a is rotated are equal in magnitude.
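The rule of the first embodiment can be sketched in Python as a per-step scale factor on the reaction force. This is a minimal sketch under stated assumptions: the names confusable_rank and reaction_scales, the scale value 0.5, and the representation of the stored associations as word pairs are all hypothetical and are not specified in the embodiment.

```python
from typing import List, Optional, Sequence, Tuple


def confusable_rank(
    candidates: Sequence[str], pairs: Sequence[Tuple[str, str]]
) -> Optional[int]:
    """Index of the lower-ranked member of an associated pair whose two
    words both appear among the candidates, or None if there is none.
    If several pairs qualify, the smallest such index is returned."""
    rank = {word: index for index, word in enumerate(candidates)}
    found = [max(rank[a], rank[b]) for a, b in pairs if a in rank and b in rank]
    return min(found) if found else None


def reaction_scales(
    candidates: Sequence[str],
    pairs: Sequence[Tuple[str, str]],
    reduced: float = 0.5,  # hypothetical scale for the gentle gradients
    normal: float = 1.0,
) -> List[float]:
    """Scale factor for the upward gradient between candidate k and k + 1.

    Gradients leading up to the easily misrecognized candidate are eased;
    gradients after it keep the normal reaction force.
    """
    target = confusable_rank(candidates, pairs)
    scales = []
    for step in range(len(candidates) - 1):
        eased = target is not None and step < target
        scales.append(reduced if eased else normal)
    return scales


CANDIDATES = ["Nagano Prefecture", "Saga Prefecture", "Shiga Prefecture",
              "Kanagawa Prefecture", "Kumamoto Prefecture"]
PAIRS = [("Nagano Prefecture", "Kanagawa Prefecture")]
SCALES = reaction_scales(CANDIDATES, PAIRS)
```

For the example candidates, the three gradients up to “Kanagawa Prefecture” are eased and the gradient toward “Kumamoto Prefecture” is normal.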
[0028]
As a result, when selecting a recognition candidate, the operator can feel, while rotating the wheel 104a, that the reaction force applied to the wheel 104a up to “Kanagawa Prefecture” is smaller than normal, and can therefore know in advance whether there is a recognition target word that is highly likely to match the word input by voice. In addition, because the presence of such a word is conveyed by reducing the reaction force, the operator can perform the selection operation easily. Furthermore, because the normal reaction force is applied when selecting candidates after the one that is highly likely to match the voice input, the operator can tell that continuing to select further candidates is unlikely to yield the desired one.
[0029]
(Second Embodiment)
The speech recognition apparatus according to the second embodiment differs from that of the first embodiment in the processing performed by the signal processing device 1034. That is, of the processing performed by the signal processing device 1034, the processing described with the flowchart of FIG. 4 is the same, but the torque pattern generated by the wheel drive motor 104c when the operator selects a recognition candidate by rotating the wheel 104a is different. Therefore, the following description focuses on the torque pattern.
[0030]
FIG. 8 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. As in the first embodiment, it is assumed that, according to data from past experiments and the like, the speech actually input is frequently “Kanagawa Prefecture” when the first recognition candidate is “Nagano Prefecture”, and that “Nagano Prefecture” and “Kanagawa Prefecture” are stored in the external storage device 1035 in association with each other as recognition target words that are easily misrecognized.
[0031]
In the torque pattern used in the speech recognition apparatus of the second embodiment, the interval between valleys up to the fourth recognition candidate “Kanagawa Prefecture” (interval F in the figure) is set narrower than the normal interval G between valleys. That is, a torque potential valley is associated with each recognition candidate, and the interval F between the valleys of “Nagano Prefecture” and “Saga Prefecture”, of “Saga Prefecture” and “Shiga Prefecture”, and of “Shiga Prefecture” and “Kanagawa Prefecture” is set narrower than the interval G between the valleys of “Kanagawa Prefecture” and “Kumamoto Prefecture”. Therefore, for example, the amount by which the wheel 104a must be rotated to select “Saga Prefecture” from the state in which “Nagano Prefecture” is selected is smaller than the normal rotation amount. When the extracted upper recognition candidates do not include recognition candidates that are easily misrecognized, the interval between the torque valleys is the normal interval G.
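The valley placement of the second embodiment can be illustrated with a short Python sketch. The function name valley_angles and the numerical values chosen for the intervals F (20 degrees) and G (30 degrees) are assumptions for illustration; the embodiment only states that F is narrower than G.

```python
from typing import List, Optional


def valley_angles(
    count: int,
    target: Optional[int],
    narrow: float = 20.0,  # hypothetical value for interval F, in degrees
    normal: float = 30.0,  # hypothetical value for interval G, in degrees
) -> List[float]:
    """Rotation angle of the valley for each of `count` recognition candidates.

    `target` is the index of the easily misrecognized candidate, or None
    when the upper candidates contain no such word. Valleys up to and
    including the target are spaced by `narrow`; later ones by `normal`.
    """
    angles = [0.0]
    for index in range(1, count):
        step = narrow if target is not None and index <= target else normal
        angles.append(angles[-1] + step)
    return angles


WITH_PAIR = valley_angles(5, target=3)        # Kanagawa Prefecture is fourth
WITHOUT_PAIR = valley_angles(5, target=None)  # normal interval throughout
```

With these example values, “Kanagawa Prefecture” is reached after 60 degrees of rotation instead of the 90 degrees needed with the normal interval.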
[0032]
As a result, when selecting a recognition candidate, the operator can feel, while rotating the wheel 104a, that the interval between the torque valleys up to “Kanagawa Prefecture” is narrower than the normal interval, and can therefore know in advance whether there is a recognition target word that is highly likely to match the word input by voice. In addition, because the presence of such a word is conveyed by narrowing the interval between the torque valleys, the operator can perform the selection operation easily. Furthermore, because the interval between valleys is the normal interval when selecting candidates after the one that is highly likely to match the voice input, the operator can tell that continuing to select further candidates is unlikely to yield the desired one.
[0033]
(Third embodiment)
The speech recognition apparatus according to the third embodiment differs from those of the first and second embodiments in the processing performed by the signal processing device 1034. That is, of the processing performed by the signal processing device 1034, the processing described with the flowchart of FIG. 4 is the same, but the torque pattern generated by the wheel drive motor 104c when the operator selects a recognition candidate by rotating the wheel 104a is different. Therefore, the following description focuses on the torque pattern.
[0034]
FIG. 9 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. As in the first and second embodiments, it is assumed that “Nagano Prefecture” and “Kanagawa Prefecture” are stored in the external storage device 1035 in association with each other as recognition target words that are easily misrecognized.
[0035]
In the torque pattern used in the speech recognition apparatus according to the third embodiment, a torque potential valley is associated with each recognition candidate, and the valley H corresponding to “Kanagawa Prefecture”, which is easily misrecognized, is deeper than the valleys of the other recognition candidates. This allows the operator to easily select the recognition candidate that is likely to be correct (“Kanagawa Prefecture” in this embodiment), guided by the change in the reaction force applied to the wheel 104a. In addition, because the valley H corresponding to “Kanagawa Prefecture” is deep, the reaction force applied to the wheel 104a when selecting “Kumamoto Prefecture”, the recognition candidate following “Kanagawa Prefecture”, is also larger than the normal reaction force. Therefore, even when the operator rotates the wheel 104a roughly, the rotation is likely to stop at “Kanagawa Prefecture”, which is likely to match the word actually input by voice, making it easier to select the recognition candidate that is likely to be correct.
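The link between valley depth and reaction force can be checked numerically. The Python sketch below models each valley as a cosine-shaped well, which is an assumption made for illustration (the actual shape in FIG. 9 is not specified here), and measures the steepest upward slope met when leaving the valley. The names well and peak_exit_slope and the width and depth values are hypothetical.

```python
import math


def well(angle: float, center: float, width: float, depth: float) -> float:
    """Hypothetical cosine-shaped potential valley; zero outside its width."""
    offset = angle - center
    if abs(offset) > width / 2.0:
        return 0.0
    return -depth * 0.5 * (1.0 + math.cos(2.0 * math.pi * offset / width))


def peak_exit_slope(center: float, width: float, depth: float, steps: int = 1000) -> float:
    """Largest upward slope between the valley bottom and its edge,
    estimated with forward differences. A larger value means a larger
    reaction force when moving on to the next candidate."""
    step = (width / 2.0) / steps
    best = 0.0
    for k in range(steps):
        angle = center + k * step
        rise = well(angle + step, center, width, depth) - well(angle, center, width, depth)
        best = max(best, rise / step)
    return best


NORMAL_FORCE = peak_exit_slope(center=0.0, width=30.0, depth=1.0)
DEEP_FORCE = peak_exit_slope(center=0.0, width=30.0, depth=2.0)  # valley H
```

In this model the peak reaction slope is proportional to the depth, so doubling the depth of valley H doubles the force needed to leave “Kanagawa Prefecture”.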
[0036]
(Fourth embodiment)
In the voice recognition device of the fourth embodiment, the processing performed by the signal processing device 1034 is the same as in the voice recognition devices of the first to third embodiments, but the torque pattern generated by the wheel drive motor 104c when the operator selects a recognition candidate by rotating the wheel 104a is different. Therefore, the following description focuses on the torque pattern.
[0037]
FIG. 10 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. As in the first to third embodiments, it is assumed that “Nagano Prefecture” and “Kanagawa Prefecture” are stored in the external storage device 1035 in association with each other as recognition target words that are easily misrecognized.
[0038]
In the torque pattern used in the speech recognition apparatus according to the fourth embodiment, a torque potential valley is associated with each recognition candidate, and the valley I corresponding to “Kanagawa Prefecture”, which is easily misrecognized, is wider than the valleys of the other recognition candidates. That is, to select the next recognition candidate “Kumamoto Prefecture” from the state in which “Kanagawa Prefecture” is selected, the operator must rotate the wheel 104a by more than the normal rotational operation amount.
[0039]
Accordingly, the operator can easily select the recognition candidate that is likely to be correct (“Kanagawa Prefecture” in this embodiment), using the amount of rotation operation of the wheel 104a as a clue. In addition, because the valley I corresponding to “Kanagawa Prefecture” is wide, even when the wheel 104a is rotated roughly without looking at the display 105, “Kanagawa Prefecture”, which is likely to match the word actually input by voice, is likely to be selected, making it easier to select the recognition candidate that is likely to be correct.
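The effect of the wide valley I can be sketched by treating each valley as an angular span and asking which span a given wheel angle falls in. The Python sketch below is illustrative only: the names selected_index and landing_share, the widths in degrees, and the idea of measuring the share of the total range are assumptions not found in the embodiment.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def selected_index(angle: float, widths: Sequence[float]) -> int:
    """Index of the candidate whose valley span contains `angle`.

    Spans are laid end to end starting at angle 0; angles beyond the last
    span are clamped to the last candidate.
    """
    right_edges = list(accumulate(widths))
    return min(bisect_right(right_edges, angle), len(widths) - 1)


def landing_share(index: int, widths: Sequence[float]) -> float:
    """Fraction of the total angular range occupied by one candidate."""
    return widths[index] / sum(widths)


# Hypothetical widths in degrees; the fourth valley (Kanagawa Prefecture) is wide.
WIDTHS = [30.0, 30.0, 30.0, 60.0, 30.0]
KANAGAWA = 3

AT_100 = selected_index(100.0, WIDTHS)
AT_149 = selected_index(149.0, WIDTHS)
AT_155 = selected_index(155.0, WIDTHS)
SHARE = landing_share(KANAGAWA, WIDTHS)
```

With these widths, the span for “Kanagawa Prefecture” runs from 90 to 150 degrees and covers one third of the range, twice the share of any other candidate, which is why a rough rotation tends to stop there.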
[0040]
(Fifth embodiment)
The speech recognition apparatus according to the fifth embodiment differs from those of the first to fourth embodiments in the processing performed by the signal processing device 1034. That is, of the processing performed by the signal processing device 1034, the way the recognition candidates are arranged in step S205 in the flowchart of FIG. 4 is different. As described above, in the speech recognition apparatuses according to the first to fourth embodiments, the recognition candidates are arranged in descending order of the degree of coincidence. In the fifth embodiment, when the recognition candidates include a recognition target word that is easily misrecognized, the recognition target word stored in association with it is arranged immediately after that candidate.
[0041]
FIG. 11 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. As in the first to fourth embodiments, it is assumed that “Nagano Prefecture” and “Kanagawa Prefecture” are stored in the external storage device 1035 in association with each other as recognition target words that are easily misrecognized.
[0042]
In this case, “Kanagawa Prefecture”, which is easily confused with “Nagano Prefecture”, is placed immediately after the first recognition candidate “Nagano Prefecture”, in place of “Saga Prefecture”, which has the next highest degree of coincidence after “Nagano Prefecture”. Accordingly, “Saga Prefecture”, with the second highest degree of coincidence, is moved to third place, and “Shiga Prefecture”, with the third highest degree of coincidence, is placed fourth. In addition, a torque potential valley is associated with each recognition candidate, and the valley J corresponding to “Kanagawa Prefecture”, which is easily misrecognized, is deeper than the valleys corresponding to the other recognition candidates. As a result, the operator can quickly and easily select “Kanagawa Prefecture”, which is highly likely to match the word actually input by voice. Furthermore, the flat portion K of the torque potential (hereinafter referred to as a “torque hill”) crossed when selecting the next recognition candidate “Saga Prefecture” from “Kanagawa Prefecture” is higher than the torque hill crossed when moving from “Nagano Prefecture” to “Kanagawa Prefecture”. This allows the operator to recognize that the recognition candidates after “Kanagawa Prefecture” are arranged in order of the degree of coincidence as usual.
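The rearrangement of the fifth embodiment can be sketched in Python as follows. The function name rearrange and the representation of the stored associations as word pairs are assumptions made for this sketch; the embodiment does not prescribe a data structure.

```python
from typing import Dict, List, Sequence, Tuple


def rearrange(candidates: Sequence[str], pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Move each associated (easily misrecognized) word so that it directly
    follows its higher-ranked partner, keeping all other candidates in
    descending order of the degree of coincidence."""
    partners: Dict[str, List[str]] = {}
    for first, second in pairs:
        partners.setdefault(first, []).append(second)
        partners.setdefault(second, []).append(first)

    remaining = list(candidates)
    result: List[str] = []
    while remaining:
        word = remaining.pop(0)
        result.append(word)
        # Pull forward any partner that is still waiting further down the list.
        for partner in partners.get(word, []):
            if partner in remaining:
                remaining.remove(partner)
                result.append(partner)
    return result


BY_SCORE = ["Nagano Prefecture", "Saga Prefecture", "Shiga Prefecture",
            "Kanagawa Prefecture", "Kumamoto Prefecture"]
PAIRS = [("Nagano Prefecture", "Kanagawa Prefecture")]
REARRANGED = rearrange(BY_SCORE, PAIRS)
```

For the example, “Kanagawa Prefecture” moves to second place, “Saga Prefecture” to third, and “Shiga Prefecture” to fourth, while “Kumamoto Prefecture” stays fifth.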
[0043]
The present invention is not limited to the embodiment described above. For example, in order to select a recognition candidate using the input device 104, the rotation operation of the wheel 104a is performed. However, a joystick can be adopted as the input device 104, and the recognition candidate can be selected using the joystick. Alternatively, a keyboard or controller may be employed as the input device, and recognition candidates may be selected using a cross key.
[0044]
In the first to fifth embodiments described above, the case where the first recognition candidate is one that is easily misrecognized has been described, but a recognition candidate that is easily misrecognized may be at any rank. However, if the degree of coincidence of such a candidate is low, that is, if it ranks low among the recognition candidates, it takes time to reach the candidate that is likely to be correct. In that case, the operational feeling, such as the torque generated on the wheel 104a and the rotational operation amount, may be left unchanged so that the operator is not notified of the presence of a candidate that is likely to be correct. That is, it is convenient for the operator if the operational feeling of the wheel 104a is changed as described in the first to fifth embodiments when a recognition candidate that is easily misrecognized is among the upper candidates.
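The condition described in this paragraph can be expressed as a simple rank check. In the Python sketch below, the function name should_change_feel and the cutoff upper_limit are hypothetical; the text says only that the change is useful when the easily misrecognized candidate is among the upper candidates, without giving a specific rank.

```python
from typing import Sequence, Tuple


def should_change_feel(
    candidates: Sequence[str],
    pairs: Sequence[Tuple[str, str]],
    upper_limit: int = 5,  # hypothetical number of ranks treated as "upper"
) -> bool:
    """True when both words of some associated pair appear within the
    first `upper_limit` recognition candidates."""
    rank = {word: index for index, word in enumerate(candidates)}
    for first, second in pairs:
        if first in rank and second in rank:
            if max(rank[first], rank[second]) < upper_limit:
                return True
    return False


PAIRS = [("Nagano Prefecture", "Kanagawa Prefecture")]
UPPER = ["Nagano Prefecture", "Saga Prefecture", "Shiga Prefecture",
         "Kanagawa Prefecture", "Kumamoto Prefecture"]
# Same words, but Kanagawa Prefecture now ranks sixth (hypothetical ordering).
LOWER = ["Nagano Prefecture", "Saga Prefecture", "Shiga Prefecture",
         "Kumamoto Prefecture", "Kagawa Prefecture", "Kanagawa Prefecture"]

CHANGE_FOR_UPPER = should_change_feel(UPPER, PAIRS)
CHANGE_FOR_LOWER = should_change_feel(LOWER, PAIRS)
```

In the first ordering the operational feeling would be changed; in the second it would be left as normal because the associated word ranks too low.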
[0045]
Furthermore, in the fifth embodiment, when the recognition candidates arranged in descending order of the degree of coincidence include a recognition candidate that is easily misrecognized, the recognition target word stored in association with it is rearranged to follow that candidate. This rearrangement can also be applied to the speech recognition apparatuses according to the first to fourth embodiments. That is, after the recognition target words that are easily misrecognized have been rearranged, the operational feeling of the input device 104 may be changed as described in the first to fourth embodiments.
[0046]
In the above-described embodiment, an example in which the speech recognition apparatus according to the present invention is applied to a car navigation apparatus has been described, but the present invention can also be applied to apparatuses other than a car navigation apparatus.
[Brief description of the drawings]
FIG. 1 is a diagram showing the configuration of an embodiment of a speech recognition apparatus according to the present invention.
FIG. 2 is a diagram showing a configuration of an embodiment of an input device used in a speech recognition device according to the present invention.
FIG. 3 is a diagram for explaining an outline of how torque is generated in the wheel.
FIG. 4 is a flowchart showing a control procedure of an embodiment performed in the signal processing apparatus.
FIG. 5 is a diagram showing an example of recognition candidates displayed on the display.
FIG. 6 is a diagram showing a torque pattern used in the speech recognition apparatus according to the first embodiment.
FIG. 7 is a diagram showing that a selected recognition candidate is displayed on the display and notified by voice.
FIG. 8 is a diagram showing a torque pattern used in the speech recognition apparatus according to the second embodiment.
FIG. 9 is a diagram showing a torque pattern used in the speech recognition apparatus according to the third embodiment.
FIG. 10 is a diagram showing a torque pattern used in the speech recognition apparatus according to the fourth embodiment.
FIG. 11 is a diagram showing a torque pattern used in the speech recognition apparatus according to the fifth embodiment.
[Explanation of symbols]
10 ... Shaft, 101 ... Microphone, 102 ... Speaker, 103 ... Signal processing unit, 1031 ... A/D converter, 1032 ... D/A converter, 1033 ... Output amplifier, 1034 ... Signal processing device, 1034a ... CPU, 1034b ... Memory, 1035 ... External storage device, 104 ... Input device, 104a ... Wheel, 104b ... Switch, 104c ... Wheel drive motor, 104d ... Wheel control CPU, 104e ... Wheel position sensor, 104f ... Communication device, 105 ... Display

Claims

A speech recognition apparatus comprising:
a voice input device for inputting voice;
a storage device that stores a plurality of recognition target words for input speech and stores, in association with each other, recognition target words among them that are easily misrecognized for one another;
a control device that calculates a degree of coincidence indicating how closely the speech input to the voice input device matches each recognition target word stored in the storage device, and that takes the recognition target words, arranged in descending order of the degree of coincidence, as recognition candidates from the top;
an operation device with which an operator can at least perform an operation of selecting a desired recognition candidate from among the recognition candidates; and
an operation feeling changing device that changes the operational feeling of operating the operation device when the recognition candidates include the recognition target words stored in the storage device as easily misrecognized for one another.

The speech recognition apparatus according to claim 1, wherein
the operation feeling changing device reduces the load on the operator of performing the selection operation of the recognition candidates with the operation device until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.

The speech recognition apparatus according to claim 2, wherein
the operation device is a rotary input device including a wheel, with which the operator can perform the selection operation by rotating the wheel, and
the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel when the next recognition candidate is selected, and reduces the force that hinders the rotation operation of the wheel until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.

The speech recognition apparatus according to claim 2, wherein
the operation device is a rotary input device including a wheel, with which the operator can perform the selection operation by rotating the wheel, and
the operation feeling changing device reduces the amount of rotation operation of the wheel required to select the next recognition candidate until either one of the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.

The speech recognition apparatus according to claim 1, wherein
the operation feeling changing device changes the operational feeling when the operator selects, as a recognition candidate, the one with the lower degree of coincidence among the recognition target words stored in the storage device as easily misrecognized for one another.

The speech recognition apparatus according to claim 5, wherein
the operation device is a rotary input device including a wheel, with which the operator can perform the selection operation by rotating the wheel, and
the operation feeling changing device alternately generates a force that assists and a force that hinders the rotation operation of the wheel when the next recognition candidate is selected, and increases the assisting force and the hindering force generated when the one with the lower degree of coincidence among the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.

The speech recognition apparatus according to claim 5, wherein
the operation device is a rotary input device including a wheel, with which the operator can perform the selection operation by rotating the wheel, and
the operation feeling changing device increases the amount of rotation operation of the wheel required to select the next recognition candidate from the state in which the one with the lower degree of coincidence among the recognition target words stored in the storage device as easily misrecognized for one another is selected as a recognition candidate.

The speech recognition apparatus according to any one of claims 1 to 7, wherein
when any of the recognition target words stored in the storage device as easily misrecognized for one another is present among the recognition candidates arranged in descending order of the degree of coincidence, the control device rearranges the recognition candidates so that the recognition target word stored in the storage device in association with the easily misrecognized recognition target word having the higher degree of coincidence is placed as the recognition candidate immediately following it.

The speech recognition apparatus according to claim 8, wherein
the operation feeling changing device changes the operational feeling when the next recognition candidate is selected from the state in which the easily misrecognized recognition target word rearranged by the control device is selected.