JP2004109468A

JP2004109468A - Control system

Info

Publication number: JP2004109468A
Application number: JP2002271701A
Authority: JP
Inventors: Minoru Yokota; 横田　稔
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2002-09-18
Filing date: 2002-09-18
Publication date: 2004-04-08
Anticipated expiration: 2022-09-18
Also published as: JP3925368B2

Abstract

PROBLEM TO BE SOLVED: To realize, in a control system that operates predetermined equipment according to a facility name obtained by recognizing a user's speech, a talk-back that makes it easy to judge whether the recognized facility is the one the user intends.

SOLUTION: When the speech recognition result is a facility name, the recognized facility is identified (S50). Geographic information from which the location of the facility can be grasped is acquired from a map data storage part (S60). The geographic information is then narrowed down so that fewer than a given number of items remain (S80), "the name of a landmark facility" plus "a phrase expressing the geographic relation" is placed before the recognized facility name, and the talk-back for the facility is performed (S90). The user can thus judge whether the intended facility was correctly recognized.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、使用者が発話した音声の認識結果としての施設の名称に基づいて所定の機器を動作させる制御システムに関する。
【０００２】
【従来の技術】
従来より、現在地から目的地までの適切な経路（目的地経路）を設定し、その目的地経路を、ＧＰＳ等により検出した現在位置と共にディスプレイ上に表示して経路案内する車載用のナビゲーションシステムが知られており、より円滑なドライブに寄与している。この際、目的地は利用者自身が入力するようになっている。この入力方法として、例えばディスプレイにおけるメニュー表示から階層的に目的地を検索して、所望の目的地が表示されればそれを指定する、といったものもあるが、例えばタッチスイッチ、リモコン、ハードキーなどを介した手入力が必要となる。そのため、ユーザの利便性を考えて音声によっても目的地を入力ができるようにされていることも多い。特に、いわゆるカーナビゲーションシステムを運転者自身が使用する場合には、スイッチ操作や画面注視などの動作を伴わないので車両の走行中に行っても安全性が高いため、有効な入力方法である。そして、このような音声入力の場合には、適切に音声認識がなされているか否かをユーザが確認できるようにするため、音声認識結果を音声にて出力（いわゆるトークバック）するのが一般的である。このトークバックの内容をユーザが確認し、もしも自分が発した言葉が誤って認識されていたならば再度音声入力する、という対処ができることとなる。
【０００３】
なお、上述したトークバックは、音声入力可能なカーナビゲーションシステムにおいては通常行われる一般的な動作であり、公知・公用の技術に該当するので、特に先行技術文献は開示しない。
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の音声によるトークバックは、音声認識した結果をそのまま音声出力しているだけであり、つまりユーザが音声入力した目的地の施設名称などをトークバックするのみである。そのため、得に施設名称を音声入力する場合は、正しく認識されていも実際にはユーザの意図しない目的地が設定されてしまう可能性もある。例えば同じ名称の施設が別の場所にも存在する場合、ディスプレイにて地図上で表示すれば、ユーザは周辺の状況も加味して自分が意図している目的地であるか否かが分かるが、トークバックの場合はそのような周辺状況が分からない。したがって、ユーザとしては自分が音声入力した施設名称が正しくトークバックされているため、自分の意図した施設が目的地に設定されていると思ってします。それにもかかわらず、実際には同じ名称の別の場所の施設が目的地としてされ、ユーザにとっては無意味な経路が設定されてしまう可能性もある。
【０００５】
また、このような事態は経路案内には限られず、例えば施設を指定することで当該施設を含む地図を表示させる場合や、例えば店舗や駐車場などの施設においてその営業時間や料金といった詳しい情報を報知する場合などにおいても同様である。
【０００６】
そこで本発明は、使用者が発話した音声の認識結果としての施設の名称に基づいて所定の機器を動作させる制御システムにおいて、当該施設がユーザの意図したものか否かを容易に判断できるトークバックを実現することを目的とする。
【０００７】
【課題を解決するための手段及び発明の効果】
本発明の制御システムは、使用者が発話した音声を音声入力手段を介して入力し、音声認識手段にて認識した結果としての施設の名称に基づいて所定の機器を動作させる。例えば請求項７に示すように施設を経路案内のための目的地として設定したり、請求項８に示すように施設を含む地図を表示したり、請求項９に示すように施設に関する情報を報知したりすることなどが考えられる。しかし、従来手法のように単に施設名称をトークバックするだけでは、制御システム側が認識している施設とユーザが認識している施設とが一致しない場合もあり得る。
【０００８】
そこで本発明の制御システムによれば、認識した施設の存在位置が把握可能な地理的情報を地図データを用いて取得し、その地理的情報を、音声認識手段にて認識した施設の名称と共にトークバックする。このようにトークバックされた施設周辺の地理的情報に基づいて、ユーザは自分が意図している施設が正しく認識されているのか否かを判断することができる。
【０００９】
ここで、トークバックする地理的情報としては、例えば請求項２に示すように、認識した施設の住所であることが考えられる。施設の存在する住所をユーザが知っている場合には有効である。
しかし、必ずしもユーザが施設の存在する住所を知っているとは限らない。たとえその施設に過去行ったことがあったとしても、その住所を覚えているとは限らない。むしろその施設周辺の状況の方は覚えているが住所は覚えていない、といったことの方が多い。したがって、例えば請求項３に示すように、認識した施設の周辺に存在する所定の目印用施設との地理的関係を示す情報をトークバックすることも有効な対処である。例えば学校やデパート、スタジアムといった比較的大きな建造物による施設であれば、目印になり易い。もちろん大きさだけではなく、交番、コンビニエンスストア、ガソリンスタンドといった比較的小さな施設であっても、目印にはなる。そして、交差点も有効な目印になり得る。交差点には交差点名称が信号機に取り付けられていたりして、その名称はユーザの記憶に残りやすいと考えられる。特に自動車を運転している場合にはなおさらである。したがって、「〜交差点角」といった地理的関係をトークバックした場合、ユーザは該当する交差点が自分の意図する施設の近くにあったものなのか、全く的はずれであるものなのかを容易に理解できることが期待される。
【００１０】
また、このような目印用施設との地理的関係を示す情報は、例えば請求項４に示すように、距離が相対的に近い場合と遠い場合とでその表現方法を異ならせることも有効である。例えば相対的に遠ければ「〜付近」と表現し、相対的に近ければ「〜角」「〜前」「〜隣」「〜北」などと表現するのである。このようにトークバックすれば、ユーザは位置関係をイメージし易い。
【００１１】
また、目印用施設が複数存在する場合、全ての目印用施設との地理的関係を示す情報をトークバックしてもよいが、例えば本システムをカーナビゲーションシステムとして実現する場合、運転中のユーザに対してあまり多くの情報をトークバックするのは好ましくない。そこで、例えば２つ程度に絞ってトークバックすることが考えられる。その際、どの情報に絞るかについては、例えば請求項５に示すように両者の距離に基づくことが考えられる。例えば認識した施設に近い順に２つ選択してもよいし、最も近いものと最も遠いものを選択しても良い。これら２つの目印用施設の内のいずれか一方でも覚えていれば、認識した施設の適否が判断できるからである。もちろん、適用する制御システムによっては３つ以上トークバックしてもよい。
【００１２】
一方、請求項６に示すように、制御システムが移動体に搭載されている場合、認識した施設との関係で目印用施設が複数存在する場合は、移動体の性質に応じた目印としての機能を加味した優先度の高さに基づいて所定数の目印用施設のみとの地理的関係を示す情報をトークバックすることが考えられる。これは、制御システムの適用先に応じた適切な目印用施設は異なる可能性があることを鑑みたものである。例えば移動体としての自動車に制御システムが搭載されている場合、自動車を運転しているユーザにとって認識し易い目印用施設というものがあるため、それを考慮した目印用施設との地理的関係を示す情報をトークバックするのである。
【００１３】
【発明の実施の形態】
以下、本発明が適用された実施例について図面を用いて説明する。なお、本発明の実施の形態は、下記の実施例に何ら限定されることなく、本発明の技術的範囲に属する限り、種々の形態を採り得ることは言うまでもない。
【００１４】
本実施例では車載用のナビゲーションシステムに適用しているので、そのナビゲーションシステム１の概略構成を図１を参照して説明する。本ナビゲーションシステム１は、位置検出部１１、地図データ格納部１２、スイッチ情報入力部１３、表示部１４、メモリ部１５、音声出力部１６、音声入力部１７、制御部２０などを備えている。
【００１５】
前記位置検出部１１は、周知のジャイロスコープ、距離センサ、衛星からの電波に基づいて車両の位置を検出するＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇ　Ｓｙｓｔｅｍ）のためのＧＰＳ受信機などを有している。これらのセンサ等は各々が性質の異なる誤差を持っているため、複数のセンサにより各々補間しながら使用するように構成されている。なお、精度によっては上述した内の一部で構成してもよく、さらに、地磁気センサ、ステアリングの回転センサや各転動輪の車輪センサ等を用いてもよい。
【００１６】
また、前記地図データ格納部１２は、地図データ記憶手段に相当し、位置検出の精度向上のためのいわゆるマップマッチング用データ、地図データを含む各種データを格納しておく部分である。道路データは、交差点等の複数のノード間をリンクにより接続して地図を構成したものであって、それぞれのリンクに対し、リンクを特定する固有番号（リンクＩＤ）、リンクの長さを示すリンク長、リンクの始端と終端とのｘ，ｙ座標、リンクの道路幅、および道路種別（有料道路等の道路情報を示すもの）のデータからなるリンク情報を記憶している。また、地図データには、このような道路データ以外に、住所や交差点、施設を示す文字・記号等で構成される表示アイテムデータも含まれている。この表示アイテムデータは対応する位置情報を有しており、道路地図上の該当位置にこれら表示アイテム（を構成する文字・記号）を重ねて表示させることができる。またここで言うところの施設は、地図上に表示可能な施設であって、施設ジャンル及び地図上に表示する場合の施設位置等を含んでいる。なお、記憶媒体としては、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ又はハードディスクなどを用いるのが一般的であるが、メモリカード等の他の媒体を用いても良い。また当然ではあるが、これらの記憶媒体からデータを読み出す必要があるので、ＣＤプレーヤやＤＶＤプレーヤも備えられている。
【００１７】
スイッチ情報入力部１３は、例えば表示部１４のディスプレイ装置と一体になったタッチスイッチや、ディスプレイ装置の周囲に取り付けられたメカニカルなスイッチ、あるいは後述するトークスイッチなどで構成され、各種入力に使用される。タッチスイッチは、表示画面上に縦横無尽に配置された赤外線センサより構成されており、例えば指やタッチペンなどでその赤外線を遮断すると、その遮断した位置が２次元座標値（Ｘ，Ｙ）として検出される。なお、図示しないリモコンによって各種入力を行うようにしてもよく、その場合にはリモコンセンサを準備すればよい。
【００１８】
また、表示部１４は、その表示画面に、位置検出部１１から入力された車両現在位置マークと、地図データ格納部１２から入力された地図データと、さらに地図上に表示する誘導経路や後述する設定地点の目印等の付加データとを重ねて表示することができる。
【００１９】
また、メモリ部１５は、ナビゲーション機能に係る各種処理を実行するためのプログラムを記憶しており、またプログラムを実行する際のワークメモリとしても用いられる。さらに、地図データ格納部１２から取得した地図データなどを一時的に格納しておくためにも用いられる。
【００２０】
また、音声出力部１６は、スピーカを介し、音声にて走行案内をユーザ（運転者など）に報知するように構成されている。したがって、表示部１４による表示と音声出力部１６からの音声出力との両方で、ユーザに走行案内することができる。
【００２１】
また、音声入力部１７は、音声入力手段に相当し、ユーザが発した音声をマイクロフォンを介して入力し、その音声信号をデジタルデータに変換して制御部２０に入力する。
次に、制御部２０について説明する。制御部２０は通常のコンピュータとして構成されており、内部には、周知のＣＰＵ、ＲＯＭ、ＲＡＭ、Ｉ／Ｏおよびこれらの構成を接続するバスラインが備えられているのであるが、ここでは、図１に示すように、機能ブロックとして示してある。すなわち、制御部２０は、地図データ取得部２１、マップマッチング部２２、経路計算部２３、経路案内部２４、描画部２５、画面制御管理部２６、音声認識・トークバック制御部２７を備えている。
【００２２】
マップマッチング部２２は、位置検出部１１で検出した現在地情報と、地図データ格納部１２に格納されている地図データの道路形状データなどを使って、現在位置がどの道路上に存在するかを特定する。また、利用者はスイッチ情報入力部１３を使って所望の地図を表示させるなどの指示を入力し、さらに目的地を設定したりする。
【００２３】
経路計算部２３では、マップマッチング部２２で算出された現在位置の情報や利用者が設定した出発地と上記目的地までの経路を計算する。このような自動的に最適な経路を設定する手法は、ダイクストラ法等の手法が知られている。そして、経路案内部２４は、上記経路計算部２３による計算結果と地図データ内に格納されている道路の形状データや、交差点の位置情報や踏切の位置情報などから経路案内に必要なポイントを算出したり、どのような案内（例えば右に曲がるのか左に曲がるのかなど）が必要であるのかを決定する。
【００２４】
描画部２５は、現在位置の地図や高速道路の略図、交差点付近では交差点拡大図などを、画面制御管理部２６の指示に従ってＶＲＡＭなどで構成された描画メモリ部１７に描画する。なお、この描画に際しては、アウトラインフォント技術により、文字・記号の表示方法を状況によって決定する。この描画された地図などは、画面制御管理部２６の指示によって表示部１４へ表示される。
【００２５】
一方、音声認識・トークバック制御部２７は、音声認識手段及びトークバック制御手段に相当し、音声入力部１７を介して入力される音声信号から、ユーザが発話した言葉としてのキーワード（以下、発話キーワードともいう）を認識して取得するための音声認識部を備えており、音声認識部は、照合部及び認識辞書部を備えている。この認識辞書部は、使用者が発話すると想定される複数のキーワード（認識対象語彙）毎の辞書データを記憶している。そして、照合部では、音声入力部１７から入力した音声データと認識辞書部の辞書データを用いて照合（認識）を行う。そして、その認識結果をトークバックする。このトークバックは、音声出力部１６を制御し、認識した結果を音声によって出力させる。その後、ユーザからの指示に応じた処理を行う。認識結果が施設名称であった場合には単にその施設名称のみをトークバックするのではなく、施設の存在位置が把握可能な地理的情報もトークバックするのであるが、この点については後述する。
【００２６】
なお、本実施例においては、利用者がスイッチ情報入力部１３を構成するスイッチの一つであるトークスイッチ（図示せず）を押すと、その後に音声入力が可能となるよう構成されている。そして、トークスイッチを押したのに音声入力がされない場合も想定されるため、トークスイッチが押されて音声入力が可能となった後に所定時間以上の無音区間があれば、音声入力が不可能な状態に移行する。したがって、音声入力部１７はトークスイッチが押されたタイミングを監視しており、押されたことを検知する。
【００２７】
地図データ取得部２１は、上記各処理部で必要となる地図データを地図データ格納部１２から取得して、各処理部へ供給する。なお、上記各処理は、メモリ部１５内のプログラムに基づき実行され、同じくメモリ部１５内のワークメモリを用いて実行される。
【００２８】
続いて、音声認識に係る処理について、説明する。
まず、スイッチ情報入力部１３からの情報に基づいてトークスイッチがオンされたか（押下されたか）否かを判断し（Ｓ１０）、トークスイッチがオンされた場合には（Ｓ１０：ＹＥＳ）、音声抽出処理を行う（Ｓ２０）。この音声抽出処理は、音声入力部１７において、マイクを介して入力された音声データに基づき音声区間であるか雑音区間であるかを判定し、音声区間のデータを抽出して音声認識・トークバック制御部２７へ出力する処理である。
【００２９】
次に、音声認識処理を行い（Ｓ３０）、その認識結果が施設名称か否かを判断する（Ｓ４０）。
施設名称でない場合には（Ｓ４０：ＮＯ）、通常のトークバックを実行する（Ｓ１００）。この通常のトークバックとは、認識結果のみをトークバックするものであり、音声認識・トークバック制御部２７が音声出力部１６を制御し、認識結果を音声によりスピーカから出力させることによってユーザに認識結果を確認させるものである。
【００３０】
そして、正しい認識であったか否かを、利用者からの指示に基づいて判断する（Ｓ１１０）。具体的には、ユーザによるマイクからの音声入力に基づいて判断する。例えば「はい」という肯定的な内容を示す音声入力があれば正しい認識であったと判断できるし、「いいえ」「違う」などの否定的な内容を示す音声入力があれば誤った認識であったと判断できる。誤った認識であった場合には（Ｓ１１０：ＮＯ）、Ｓ１０へ戻って、処理を繰り返す。一方、Ｓ１１０で肯定判断、すなわち正しい認識であると判断した場合には、音声認識・トークバック制御部２７にて認識結果を確定し（Ｓ１２０）、所定の確定後処理を実行する（Ｓ１３０）。この場合の確定後処理とは、例えば認識結果が「メニュー画面」であれば、メニュー画面の表示を指示するコマンドが入力されたものとして画面制御管理部２６へ出力するといった処理である。Ｓ１３０の処理の後はＳ１１０へ戻って、処理を繰り返す。
【００３１】
一方、Ｓ４０にて肯定判断、すなわち認識した結果が施設名称である場合には、Ｓ５０〜Ｓ９０の処理を実行する。まず、認識した施設を特定する（Ｓ５０）。具体的には、その施設が存在する位置座標等を特定する。なお、施設の存在位置が特定できればよいため、位置座標以外のデータであってもよいが、本実施例では地図データ中に施設毎の位置座標が含まれているため、位置座標で特定するものとする。そして、その特定した位置座標等に基づいて当該施設の存在位置が把握可能な地理的情報を地図データ格納部１２から取得し（Ｓ６０）、取得した地理的情報が所定数以上存在する場合には（Ｓ７０：ＹＥＳ）、所定数未満となるよう地理的情報を選択する（Ｓ８０）。一方、地理的情報が所定数未満であれば（Ｓ７０：ＮＯ）、地理的情報の選択はしない。このように地理的情報の選択がなされた場合（Ｓ８０）または地理的情報が所定数未満であった場合（Ｓ７０：ＮＯ）は、施設用のトークバックを実行する（Ｓ９０）。そして、Ｓ９０の処理後はＳ１１０へ移行する。Ｓ１１０で肯定判断、すなわち正しい認識であると判断した場合には、音声認識・トークバック制御部２７にて認識結果を確定し（Ｓ１２０）、所定の確定後処理を実行する（Ｓ１３０）。例えば経路設定のための目的地設定の場面であれば、このように認識・確定した施設を目的地として経路計算部２３へ出力するといったことである。なお、「○○を目的地として設定してよいですか？」といった確認のための問い合わせを行うようにしてもよい。また、例えば施設を指定することでその施設を含む地図を表示させる場合や、例えば店舗や駐車場などの施設においてその営業時間や料金といった詳しい情報を報知させる場合などであれば、このようにして確定した施設を各処理の実行主体部に対して出力する。
【００３２】
これが施設名称のトークバックの処理の概要であるが、各処理について補足説明する。
（１）　Ｓ６０における地理的情報の取得について
この地理的情報は、Ｓ５０で特定した施設がどのような位置にあるかを示すための情報であるから、その特定した施設周辺の地理的情報を取得することとなる。本実施例では、その施設の周辺に存在する所定の目印用施設との地理的関係を示す情報をトークバックするため、まず目印用施設を抽出する。目印用施設としては、例えば学校やデパート、スタジアムといった比較的大きな建造物、交番、コンビニエンスストア、ガソリンスタンドといった比較的小さな施設、さらには交差点などが考えられる。交差点には交差点名称が信号機に取り付けられていることも多く、実際にその場にユーザが出向いた際には有効な目印になり得る。
【００３３】
この目印用施設の抽出手法について、何例か説明する。
▲１▼まず、例えばＳ５０で特定した施設の位置を中心とした半径Ｘｍの範囲内に存在するものを抽出することが考えられる。最低１個は目印用施設を抽出したいので、数段階の半径Ｘｍを準備しておき、例えば最初半径５０ｍの範囲で探索し、目印用施設が一つも見つからなければ次に半径１００ｍの範囲に広げて探索する、というようにして、一つ以上見つかった場合には、そこで探索を終了することが考えられる。なお、あまり範囲を広げすぎても実質的に目印になり得ないため、上限の半径は決めておく。そして、その準備した最も広い範囲内でも目印用施設が見つからない場合は、例えばＳ５０で特定した施設の住所を利用して「〜市付近」「〜町付近」というように、目印用施設によらない地理的情報を採用しても良い。
【００３４】
▲２▼また、「周辺地図区画から設定」する手法も考えられる。地図データは区画分けして格納されているので、区画内で探すのである。その際１区画だけを探索範囲とすると区画の端付近の施設の場合に近くに別施設があっても探せないので、例えばＳ５０で特定した施設が地図区画の端部付近に存在する場合には、その端部に隣接する地図区画も探すようにすればよい。なお、区画の大きさは地図データの量に応じて変わっているので、いわゆる都心部、郊外部、山村部といった地域ごとに探索範囲が変わる可能性がある。
【００３５】
▲３▼また、「同一行政区画（大字レベルとか市町村レベル）で設定」する手法も考えられる。この場合も、上記▲２▼と同様に、行政区画の端付近に特定施設が存在する場合には、隣接する行政区画も探索範囲とする。
（２）　Ｓ８０における地理的情報の選択について
Ｓ８０では、取得した地理的情報が所定数以上存在する場合に（Ｓ７０：ＹＥＳ）、所定数未満となるよう選択する。この所定数については、例えば３程度が考えられる。本実施例では車載用のナビゲーションシステムに適用しているため、トークバックも基本的には車両を運転しているユーザ（ドライバ）に対してなされることとなる。運転中のユーザに対してあまり多くの情報をトークバックするのは好ましくないため、例えば２つ程度に絞ってトークバックすることが好ましい。そのため、例えばＳ７０で３以上の地理的情報が存在すれば、Ｓ８０で２以下に絞ることが考えられる。
【００３６】
そして、どの地理的情報に絞るかについての手法を何例か説明する。
▲１▼例えばＳ５０で特定した施設との距離に基づくことが考えられる。具体的には、「近い順」に２つ選択したり、あるいは、「最も近いものと最も遠いもの」を選択する、といったことである。これら２つの目印用施設の内のいずれか一方でも覚えていれば、認識した施設の適否が判断できる。
【００３７】
▲２▼また、「種別」に基づいて選択することも考えられる。具体的には、本実施例の場合は車載用のナビゲーションシステムであるため、運転者から見て認識し易く、一般的によく知られている施設の優先度を高くし、そのような施設を優先して選択する、といったことである。例えば市役所・駅・スタジアム・学校などは優先すべき施設として有効である。また交差点は自動車からの確認の点、また全国いたるところに存在するという点で、とても有効であると考えられる。なお、この場合、複数の交差点を目印用施設として採用することも可能であるが、一般的には、別の種類の施設を目印用施設として採用した方が好ましい場合が多いと考えられるため、例えば交差点を１つ選択した場合には、２つ目は交差点以外の施設として上述した市役所・駅・スタジアム・学校などを選択することが好ましい。
【００３８】
▲３▼また、上述した▲１▼あるいは▲２▼の観点に加えて、トークバックする際の言い回しも考慮することが考えられる。例えば「〜付近〜交差点角」という言い回しはトークバックされたユーザにとって施設の位置をイメージするのに好ましいが、例えば「〜付近〜付近」というように同じ言い回しが続くものは、施設の位置をイメージさせるという目的からすれば相対的に好ましくない。したがって、極力、同じ言い回しが続かないような位置にある施設を採用することが考えられる。
（３）　Ｓ９０における施設用のトークバックについて
Ｓ１００での通常のトークバックの場合は、認識結果のみをトークバックするものであったが、Ｓ９０での施設用トークバックは、Ｓ５０で特定された施設の名称の前に地理的情報を付加してトークバックする。具体的には、「目印用施設の名称」＋「その目印用施設とＳ５０で特定された施設との地理的関係を示す言い回し」＋「Ｓ５０で特定された施設の名称」をトークバックする。
【００３９】
ここで、地理的関係を示す言い回しとしては、例えば次のようなものが考えられる。
▲１▼「〜付近」：対象となる施設までの距離がある程度離れている場合。例えば１００ｍ以上、というようにある程度距離が離れている場合は、「〜付近」という言い回しがユーザの感覚に合うものと考えられる。
【００４０】
▲２▼「〜交差点角」：交差点に距離が近い場合。例えば交差点から数十ｍの距離なら「〜交差点角」を使用できると考えられる。
▲３▼「〜前」：例えば小学校の正門に道を挟んで目的地がある場合には、「△△小学校前」といった言い回しが考えられる。但し目印用施設の正面方向の情報が必要であり、専用データが必要となる。
【００４１】
▲４▼「〜隣」：例えば市役所の横に道を挟まずに対象の施設がある場合には、「□□市役所隣」といった言い回しが考えられる。
▲５▼「〜北」「〜南」「〜東」「〜西」：方角が明瞭な場合には、このような表現が考えられる。
【００４２】
これら▲１▼〜▲５▼について考えてみると、対象の施設からの距離が近い順に、例えば、▲３▼，▲４▼＜▲５▼，▲２▼＜▲１▼というような関係が考えられる。これは、▲３▼の「〜前」や「▲４▼の「〜隣」の場合に比べればやや離れている場合にこれらの表現を用いることが好ましいと考えられるからであり、近接していれば、方角は関係なく「〜前」や「〜隣」で位置特定ができるからである。したがって、目印用施設に対してどのような地理的関係を示す言い回しを付加するかについては、例えば近距離の言い回しを優先して選択し、▲２▼〜▲５▼のいずれにも条件が当てはまらない場合には、▲１▼の「〜付近」を選択することが好ましいと考えられる。
【００４３】
このようにして「目印用施設の名称」＋「地理的関係を示す言い回し」＋「Ｓ５０でされた特定施設の名称」をトークバックする際の具体例としては、例えば「○○○ドーム付近のスーパー△△△大曽根店ですか？」、「刈谷市刈谷駅前交差点角の□□□書店ですか？」といったものが挙げられる。
【００４４】
なお、このようにトークバックする際、地理的情報を専用に持つことも可能であるが、本実施例においては、地図データ中に存在する索引、表示文字の情報を利用している。現状のカーナビゲーションシステムにおいては、索引情報、表示文字情報は既に地図データとして格納されているのが一般的である。そして、大抵の場合５０音検索は標準機能として実施できるので、索引データに読みのデータが入っており、その読みデータを用いて合成音声を生成し、トークバックすることができる。
【００４５】
また、本実施例のシステムは音声認識機能を持つため、その音声認識のためのデータをそのまま使えばよい。音声認識データは施設索引データ、住所索引データと関連付けられているので、それを逆利用すれば容易に実現できる。
一方、表示文字情報については、現状では読みデータが入っていないことが多いため、その場合は読みデータを追加する必要がある。但し、そのような読みデータを追加するのであれば、専用データを持たせる方が実現し易いとも考えられる。そして、専用データを作成する場合には、最低限「座標」「優先付けできる情報（種別など）」「トークバック用データ（肉声もしくは合成音声読み）」が必要と考えられる。
【００４６】
また、交差点に関しては、現状においても高機能のナビゲーションでは肉声の音声データが格納されているので、それを利用することもできる。
以上説明したように、本実施例のナビゲーションシステム１においては、ユーザが例えば経路案内のための目的地を施設名称で音声入力することができる。しかし、単に施設名称をトークバックするだけでは、ナビゲーションシステム１側が認識している施設とユーザが認識している施設とが一致しない場合もあり得る。そのような認識の不一致が生じているにもかかわらずそのまま経路が設定されてしまうと、ユーザは、設定された経路が表示部１４に表示されたものを見て初めて自分が意図していたのとは異なる目的地が設定されていたことに気付き、再度目的地設定からやり直さなくてはならないこととなる。
【００４７】
そこで本実施例のナビゲーションシステム１によれば、施設用のトークバックとして、音声認識した施設名称の前に「目印用施設の名称」＋「地理的関係を示す言い回し」を加えてトークバックする。これにより、ユーザは自分が意図している施設が正しく認識されているのか否かを判断することができる。例えば「刈谷市刈谷駅前交差点角の□□□書店ですか？」というトークバックがなされた場合、ユーザは自分が考えていた□□□書店が実は別の地域に存在する別のチェーン店であることに気付くことができる。例えば、このような複数のチェーン店が存在する場合、ユーザが今度は支店名等も合わせて音声入力することによって、ユーザの意図する施設が認識されることとなる。
【００４８】
なお、候補となるチェーン店が複数抽出される場合には、例えば「目印用施設の名称」＋「地理的関係を示す言い回し」を加えたトークバックを順番にしていき、ユーザが例えば「違う」といった音声入力した場合には次の候補をトークバックし、ユーザが例えば「はい」といった音声入力した場合には、図２のＳ１１０にて肯定判断となり、認識結果を確定（Ｓ１２０）するようにしてもよい。
【００４９】
［別実施例］
（１）上記実施例においては、認識した施設の周辺に存在する所定の目印用施設との地理的関係を示す情報をトークバックしたが、例えば認識した施設の住所を地理的情報としてトークバックしてもよい。例えば「○○市△△町□□番地の□□□書店ですか？」といったトークバックである。施設の存在する住所をユーザが知っている場合にはこのようなトークバックも有効である。しかし、必ずしもユーザが施設の存在する住所を知っているとは限らない。たとえその施設に過去行ったことがあったとしても、その住所を覚えているとは限らないため、上記実施例のような目印用施設との地理的関係をトークバックした方が有効な場合が多いと考えられる。
【００５０】
（２）上記実施例では、車載用のナビゲーションシステム１に適用した例を挙げて説明したが、その用途に限定されることなく、例えば人間が携帯する端末装置などであっても適用可能である。なお、上記実施例では、Ｓ８０に関する説明において、取得した地理的情報が所定数以上存在する場合に「種別」に基づいて選択することも考えられる旨を述べた。そして、車載用のナビゲーションシステムへの適用の場合には、運転者から見て認識し易く、一般的によく知られている施設の優先度を高くするようにした。したがって、人間が携帯する端末装置などに適用する場合には、今度は歩く人にとって認識し易いという観点で優先度を決めればよい。
【００５１】
また、ナビゲーションシステムに限定されず、ユーザが発話した音声の認識結果としての施設の名称に基づいて所定の機器を動作させる制御システムであれば同様に適用できる。
（３）上述した音声認識・トークバックに係る処理をコンピュータシステムにて実現する機能は、例えば、コンピュータシステム側で起動するプログラムとして備えることができる。このようなプログラムの場合、例えば、フレキシブルディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ハードディスク等のコンピュータ読み取り可能な記録媒体に記録し、必要に応じてコンピュータシステムにロードして起動することにより用いることができる。この他、ＲＯＭやバックアップＲＡＭをコンピュータ読み取り可能な記録媒体として前記プログラムを記録しておき、このＲＯＭあるいはバックアップＲＡＭをコンピュータシステムに組み込んで用いても良い。
【図面の簡単な説明】
【図１】実施例としてのナビゲーションシステムの全体構成を示すブロック図である。
【図２】音声認識に係る処理を示すフローチャートである。
【符号の説明】
１…ナビゲーションシステム、１１…位置検出部、１２…地図データ格納部、１３…スイッチ情報入力部、１４…表示部、１５…メモリ部、１６…音声出力部、１７…音声入力部、２０…制御部、２１…地図データ取得部、２２…マップマッチング部、２３…経路計算部、２４…経路案内部、２５…描画部、２６…画面制御管理部、２７…音声認識・トークバック制御部。
[0001]
[Technical field of the invention]
The present invention relates to a control system for operating a predetermined device based on a name of a facility as a recognition result of a voice spoken by a user.
[0002]
[Prior art]
Conventionally, in-vehicle navigation systems are known that set an appropriate route (a destination route) from the current location to a destination and provide route guidance by displaying that route on a display together with the current position detected by GPS or the like, and they contribute to smoother driving. The destination is input by the user. One input method is to search for a destination hierarchically from a menu shown on the display and designate the desired destination once it appears, but this requires manual input via, for example, a touch switch, a remote control, or hard keys. For the user's convenience, many systems therefore also allow the destination to be input by voice. Voice input is particularly effective when the driver personally uses a so-called car navigation system, because it involves no switch operation or looking at the screen and is therefore safe even while the vehicle is moving. With such voice input, the recognition result is generally output as speech (so-called talkback) so that the user can confirm whether the voice was recognized properly. The user checks the content of the talkback and, if the spoken words were recognized incorrectly, can input them by voice again.
[0003]
Note that the talkback described above is a common operation normally performed in car navigation systems that accept voice input and is publicly known and used technology, so no specific prior art document is cited.
[0004]
[Problems to be solved by the invention]
However, conventional voice talkback merely outputs the recognition result as it is; that is, it talks back only the facility name or the like of the destination that the user input by voice. Consequently, particularly when a facility name is input by voice, a destination the user did not intend may be set even though the name was recognized correctly. For example, when a facility with the same name also exists in another place, a user who sees it displayed on a map can take the surroundings into account and tell whether it is the intended destination, but talkback conveys no such surroundings. Because the facility name the user spoke is talked back correctly, the user assumes that the intended facility has been set as the destination. In reality, however, a facility with the same name in a different place may have been set as the destination, resulting in a route that is meaningless to the user.
[0005]
Such a situation is not limited to route guidance. The same applies, for example, when a map including a facility is displayed by designating that facility, or when detailed information such as business hours and fees is reported for a facility such as a store or a parking lot.
[0006]
Accordingly, an object of the present invention is to realize, in a control system that operates a predetermined device based on the name of a facility obtained as the recognition result of a voice uttered by a user, a talkback that makes it easy to determine whether the facility is the one the user intended.
[0007]
[Means for solving the problems and effects of the invention]
The control system of the present invention receives a voice uttered by a user through voice input means and operates a predetermined device based on the facility name obtained as the result of recognition by voice recognition means. For example, the facility may be set as a destination for route guidance as described in claim 7, a map including the facility may be displayed as described in claim 8, or information on the facility may be reported as described in claim 9. However, if only the facility name is talked back as in the conventional method, the facility recognized by the control system may not match the facility the user has in mind.
[0008]
Therefore, according to the control system of the present invention, geographical information from which the location of the recognized facility can be grasped is acquired using map data, and that geographical information is talked back together with the name of the facility recognized by the voice recognition means. Based on the geographical information around the facility talked back in this way, the user can determine whether the intended facility has been recognized correctly.
[0009]
Here, the geographical information to be talked back may be, for example, the address of the recognized facility, as described in claim 2. This is effective when the user knows the address where the facility is located.
However, the user does not always know the address where the facility is located. Even a user who has visited the facility before does not necessarily remember its address; more often the user remembers the surroundings of the facility but not the address. It is therefore also effective, as described in claim 3 for example, to talk back information indicating the geographical relationship with a predetermined landmark facility located around the recognized facility. Facilities housed in relatively large buildings, such as schools, department stores, and stadiums, readily serve as landmarks. Size is not the only criterion: relatively small facilities such as police boxes, convenience stores, and gas stations can also serve as landmarks. Intersections can be effective landmarks as well. The name of an intersection is often attached to its traffic lights, so the name is likely to stay in the user's memory, all the more so for someone driving a car. Therefore, when a geographical relationship such as "corner of the ~ intersection" is talked back, the user can be expected to understand easily whether that intersection is near the intended facility or completely off the mark.
[0010]
It is also effective, as described in claim 4 for example, to express the information indicating the geographical relationship with such a landmark facility differently depending on whether the distance is relatively short or relatively long. For example, a relatively distant landmark is expressed as "near ~", and a relatively close one as "corner of ~", "in front of ~", "next to ~", "north of ~", and so on. Talking back in this way makes it easy for the user to picture the positional relationship.
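The distance-dependent wording can be illustrated with a minimal sketch. It assumes English phrasing and hypothetical names (`Landmark`, `relation_phrase`, `talkback`); the 30 m corner cutoff and the attribute flags are illustrative assumptions, while the 100 m boundary for "near" follows the example given in paragraph [0039]. Close-range wordings are tried first and "near" is the fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Landmark:
    name: str
    distance_m: float               # distance from the recognized facility
    is_intersection: bool = False
    faces_facility: bool = False    # facility is across the road from the landmark's front
    adjacent: bool = False          # facility is next door, with no road in between
    bearing: Optional[str] = None   # "north", "south", "east" or "west", if clear


def relation_phrase(lm: Landmark) -> str:
    """Pick the closest-range wording that applies; fall back to 'near'."""
    if lm.faces_facility:
        return f"in front of {lm.name}"
    if lm.adjacent:
        return f"next to {lm.name}"
    if lm.is_intersection and lm.distance_m <= 30.0:   # assumed cutoff
        return f"at the corner of the {lm.name} intersection"
    if lm.bearing is not None and lm.distance_m < 100.0:
        return f"{lm.bearing} of {lm.name}"
    return f"near {lm.name}"


def talkback(lm: Landmark, facility_name: str) -> str:
    """Combine the recognized facility name with the geographical phrase."""
    return f"Is it {facility_name}, {relation_phrase(lm)}?"
```

With these assumptions, a landmark 250 m away yields a "near" phrase, while an adjacent one yields "next to".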
[0011]
When there are a plurality of landmark facilities, information indicating the geographical relationship with all of them may be talked back. However, when this system is implemented as a car navigation system, for example, it is not desirable to talk back too much information to a user who is driving. It is therefore conceivable to narrow the talkback down to about two landmarks. Which ones to keep may be decided, as described in claim 5 for example, based on the distance between the recognized facility and each landmark facility. For example, the two landmarks closest to the recognized facility may be selected, or the closest and the farthest may be selected, because the user can judge whether the recognized facility is the right one by remembering either of the two. Of course, three or more may be talked back depending on the control system to which the invention is applied.
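The two distance-based selection rules mentioned here can be sketched as follows. The function names and the `(name, distance)` tuple layout are assumptions made for illustration only.

```python
from typing import List, Tuple

LandmarkDist = Tuple[str, float]  # (landmark name, distance in metres); assumed layout


def nearest_two(landmarks: List[LandmarkDist]) -> List[LandmarkDist]:
    """Keep the two landmarks closest to the recognized facility."""
    return sorted(landmarks, key=lambda lm: lm[1])[:2]


def nearest_and_farthest(landmarks: List[LandmarkDist]) -> List[LandmarkDist]:
    """Keep the closest and the farthest landmark."""
    ordered = sorted(landmarks, key=lambda lm: lm[1])
    if len(ordered) <= 2:
        return ordered
    return [ordered[0], ordered[-1]]
```

Either rule leaves at most two landmarks, which keeps the talkback short for a driver.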
[0012]
On the other hand, as described in claim 6, when the control system is mounted on a moving body and there are a plurality of landmark facilities in relation to the recognized facility, it is conceivable to talk back information indicating the geographical relationship with only a predetermined number of landmark facilities, chosen by a priority that takes into account how well each functions as a landmark given the nature of the moving body. This reflects the fact that the appropriate landmark facilities may differ depending on where the control system is applied. For example, when the control system is mounted on an automobile as the moving body, some landmark facilities are easier for a user driving the automobile to recognize, so information indicating the geographical relationship with landmark facilities chosen on that basis is talked back.
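A priority-based selection of this kind might look like the sketch below. The ranking table, the kind labels, and the function name are assumptions; the description only states that intersections, city halls, stations, stadiums, and schools suit drivers, and that two landmarks of different kinds are preferable to two of the same kind (paragraph [0037]).

```python
# Assumed ranking for a driver: lower number means higher priority.
DRIVER_PRIORITY = {"intersection": 0, "city_hall": 1, "station": 1,
                   "stadium": 1, "school": 1}
DEFAULT_PRIORITY = 5


def select_by_priority(landmarks, limit=2, priority=DRIVER_PRIORITY):
    """landmarks: list of (name, kind, distance_m). Returns at most `limit` items."""
    ordered = sorted(
        landmarks,
        key=lambda lm: (priority.get(lm[1], DEFAULT_PRIORITY), lm[2]),
    )
    chosen, seen_kinds = [], set()
    for lm in ordered:                 # first pass: at most one landmark per kind
        if len(chosen) == limit:
            break
        if lm[1] not in seen_kinds:
            chosen.append(lm)
            seen_kinds.add(lm[1])
    for lm in ordered:                 # second pass: fill any remaining slots
        if len(chosen) == limit:
            break
        if lm not in chosen:
            chosen.append(lm)
    return chosen
```

For a handheld terminal, a different `priority` table (landmarks easy for pedestrians to spot) could be passed in without changing the function.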
[0013]
[Embodiments of the invention]
Hereinafter, embodiments to which the present invention is applied will be described with reference to the drawings. It is needless to say that the embodiments of the present invention are not limited to the following examples, and can take various forms as long as they belong to the technical scope of the present invention.
[0014]
Since the present embodiment is applied to an in-vehicle navigation system, a schematic configuration of the navigation system 1 will be described with reference to FIG. 1. The navigation system 1 includes a position detection unit 11, a map data storage unit 12, a switch information input unit 13, a display unit 14, a memory unit 15, a voice output unit 16, a voice input unit 17, a control unit 20, and the like.
[0015]
The position detection unit 11 includes a well-known gyroscope, a distance sensor, a GPS receiver for the GPS (Global Positioning System), which detects the position of the vehicle based on radio waves from satellites, and the like. Because these sensors each have errors of a different nature, they are configured to be used together so that they complement one another. Depending on the required accuracy, only some of the above may be used, and a geomagnetic sensor, a steering rotation sensor, wheel sensors for the individual rolling wheels, or the like may also be used.
[0016]
The map data storage unit 12 corresponds to map data storage means and stores various data, including map data and so-called map matching data for improving the accuracy of position detection. The road data forms a map by connecting a plurality of nodes, such as intersections, with links. For each link it stores link information consisting of a unique number identifying the link (link ID), a link length, the x and y coordinates of the start and end of the link, the road width of the link, and the road type (road information such as whether it is a toll road). Besides this road data, the map data also includes display item data composed of characters, symbols, and the like that indicate addresses, intersections, and facilities. Each display item has corresponding position information, so the display items (the characters and symbols constituting them) can be superimposed at the corresponding positions on the road map. The facilities referred to here are facilities that can be displayed on a map, and their data include a facility genre, the facility position used when displaying on the map, and the like. A CD-ROM, DVD-ROM, or hard disk is generally used as the storage medium, but another medium such as a memory card may be used. Naturally, because data must be read from these storage media, a CD player or DVD player is also provided.
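The link information listed above maps naturally onto a small record type. The sketch below is only an illustration: the class name, field names, and units are assumptions, and the actual storage format of the map data is not given in this description.

```python
from dataclasses import dataclass
from math import hypot
from typing import Tuple


@dataclass(frozen=True)
class Link:
    link_id: int                    # unique number identifying the link
    length_m: float                 # link length along the road
    start_xy: Tuple[float, float]   # x, y coordinates of the start of the link
    end_xy: Tuple[float, float]     # x, y coordinates of the end of the link
    width_m: float                  # road width
    road_type: str                  # e.g. "toll" or "ordinary" (assumed labels)

    def straight_line_length(self) -> float:
        """Distance between the two end points, ignoring the road's shape."""
        dx = self.end_xy[0] - self.start_xy[0]
        dy = self.end_xy[1] - self.start_xy[1]
        return hypot(dx, dy)
```

The stored link length can exceed the straight-line distance when the road curves, which is one reason to keep it as a separate field.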
[0017]
The switch information input unit 13 consists of, for example, a touch switch integrated with the display device of the display unit 14, mechanical switches mounted around the display device, or a talk switch described later, and is used for various inputs. The touch switch consists of infrared sensors arranged in a fine vertical and horizontal grid over the display screen; when a finger, touch pen, or the like interrupts the infrared beams, the interrupted position is detected as a two-dimensional coordinate value (X, Y). Various inputs may also be made with a remote controller (not shown), in which case a remote control sensor is provided.
[0018]
The display unit 14 can display on its screen, superimposed on one another, the vehicle current position mark input from the position detection unit 11, the map data input from the map data storage unit 12, and additional data to be shown on the map, such as the guidance route and the marks of set points described later.
[0019]
The memory unit 15 stores a program for executing various processes related to the navigation function, and is used as a work memory when executing the program. Further, it is also used for temporarily storing map data and the like obtained from the map data storage unit 12.
[0020]
Further, the audio output unit 16 is configured to notify a user (a driver or the like) of travel guidance by voice via a speaker. Therefore, the user can be guided by both the display on the display unit 14 and the audio output from the audio output unit 16.
[0021]
The voice input unit 17 corresponds to voice input means; it receives the voice uttered by the user via a microphone, converts the voice signal into digital data, and inputs the data to the control unit 20.
Next, the control unit 20 will be described. The control unit 20 is configured as an ordinary computer and internally includes a well-known CPU, ROM, RAM, I/O, and a bus line connecting these components, but here it is shown as functional blocks, as in FIG. 1. That is, the control unit 20 includes a map data acquisition unit 21, a map matching unit 22, a route calculation unit 23, a route guidance unit 24, a drawing unit 25, a screen control management unit 26, and a voice recognition/talkback control unit 27.
[0022]
The map matching unit 22 identifies which road the current position is on, using the current position information detected by the position detection unit 11 and the road shape data and other map data stored in the map data storage unit 12. The user uses the switch information input unit 13 to input instructions such as displaying a desired map, and also to set a destination.
[0023]
The route calculation unit 23 calculates a route to the destination from the current position calculated by the map matching unit 22 or from a departure point set by the user. Methods such as Dijkstra's algorithm are known for automatically setting an optimal route in this way. The route guidance unit 24 then uses the calculation result of the route calculation unit 23 together with the road shape data, intersection position information, level crossing position information, and the like stored in the map data to calculate the points needed for route guidance and to decide what kind of guidance is needed (for example, whether to turn right or left).
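Dijkstra's algorithm, named above as one known method, can be sketched over node-and-link road data as follows. This is a generic textbook version, not the route calculation unit's actual implementation; the function name, the `(node_a, node_b, length)` link layout, and the two-way assumption are illustrative.

```python
import heapq


def shortest_route(links, start, goal):
    """links: iterable of (node_a, node_b, length_m), treated as two-way roads.

    Returns (total_length, [nodes along the route]) or None if unreachable.
    """
    graph = {}
    for a, b, length in links:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

A production system would typically weight links by travel time or road type rather than length alone; length is used here only to keep the sketch short.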
[0024]
The drawing unit 25 draws a map of the current position, a schematic diagram of an expressway, an enlarged view of an intersection near an intersection, and the like in the drawing memory unit 17 configured by a VRAM or the like in accordance with an instruction from the screen control management unit 26. At the time of this drawing, the display method of characters / symbols is determined according to the situation by the outline font technique. The drawn map and the like are displayed on the display unit 14 according to an instruction from the screen control management unit 26.
[0025]
The voice recognition/talkback control unit 27 corresponds to voice recognition means and talkback control means. It includes a voice recognition unit that recognizes and acquires, from the voice signal input through the voice input unit 17, a keyword as the word spoken by the user (hereinafter also called an utterance keyword); the voice recognition unit includes a collation unit and a recognition dictionary unit. The recognition dictionary unit stores dictionary data for each of a plurality of keywords (the recognition target vocabulary) that the user is expected to utter. The collation unit performs collation (recognition) using the voice data input from the voice input unit 17 and the dictionary data in the recognition dictionary unit. The recognition result is then talked back: the voice output unit 16 is controlled so that the recognized result is output as speech. After that, processing is performed according to the user's instruction. When the recognition result is a facility name, not only the facility name but also geographical information from which the location of the facility can be grasped is talked back, as described later.
[0026]
In this embodiment, voice input becomes possible after the user presses a talk switch (not shown), which is one of the switches constituting the switch information input unit 13. Because the user may press the talk switch and then not speak, the system shifts to a state in which voice input is not accepted if a silent section of a predetermined time or longer occurs after the talk switch is pressed and voice input becomes possible. The voice input unit 17 therefore monitors when the talk switch is pressed and detects that it has been pressed.
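The talk-switch behaviour amounts to a two-state gate with a silence timeout. The sketch below is one possible reading: the class name, the method names, and the 5-second default are assumptions, since the description specifies only "a predetermined time".

```python
class TalkSwitchGate:
    """Accept voice input after the talk switch is pressed, until a long silence."""

    def __init__(self, silence_timeout_s: float = 5.0):   # assumed default
        self.silence_timeout_s = silence_timeout_s
        self.accepting = False
        self._last_activity = 0.0

    def press(self, now: float) -> None:
        """Talk switch pressed: voice input becomes possible."""
        self.accepting = True
        self._last_activity = now

    def on_audio(self, now: float, is_speech: bool) -> bool:
        """Return True if this audio frame should be passed on for recognition."""
        if not self.accepting:
            return False
        if is_speech:
            self._last_activity = now
            return True
        if now - self._last_activity >= self.silence_timeout_s:
            self.accepting = False       # too much silence: stop accepting input
        return False
```

Timestamps are passed in explicitly so the gate can be exercised without a real clock.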
[0027]
The map data obtaining unit 21 obtains the map data required by each processing unit from the map data storage unit 12 and supplies the map data to each processing unit. Each of the above processes is executed based on a program in the memory unit 15 and is also executed using a work memory in the memory unit 15.
[0028]
Next, a process related to voice recognition will be described.
First, it is determined whether or not the talk switch has been turned on (pressed) based on information from the switch information input unit 13 (S10). If the talk switch has been turned on (S10: YES), voice extraction processing is performed (S20). In the voice extraction processing, the voice input unit 17 determines, based on the voice data input via the microphone, whether each section is a voice section or a noise section, extracts the data of the voice sections, and outputs the data to the voice recognition / talkback control unit 27.
[0029]
Next, voice recognition processing is performed (S30), and it is determined whether or not the recognition result is a facility name (S40).
If it is not a facility name (S40: NO), a normal talkback is executed (S100). In the normal talkback, only the recognition result is talked back: the voice recognition / talkback control unit 27 controls the voice output unit 16 to output the recognition result by voice from the speaker so that the user can confirm the recognition result.
[0030]
Then, whether or not the recognition is correct is determined based on an instruction from the user (S110). Specifically, the determination is made based on the user's voice input from the microphone. For example, if there is a voice input with affirmative content such as "Yes", it can be determined that the recognition is correct, and if there is a voice input with negative content such as "No" or "Wrong", it can be determined that the recognition is incorrect. If the recognition is incorrect (S110: NO), the process returns to S10 and is repeated. On the other hand, if an affirmative determination is made in S110, that is, if it is determined that the recognition is correct, the recognition result is confirmed by the voice recognition / talkback control unit 27 (S120), and a predetermined post-confirmation process is executed (S130). The post-confirmation process in this case is, for example, a process in which, if the recognition result is "menu screen", a command instructing display of the menu screen is output to the screen control management unit 26. After the process of S130, the process returns to S110 and repeats the process.
[0031]
On the other hand, if an affirmative determination is made in S40, that is, if the recognized result is a facility name, the processes in S50 to S90 are executed. First, the recognized facility is specified (S50). Specifically, the position coordinates and the like where the facility exists are specified. Data other than the position coordinates may be used as long as the location of the facility can be specified; in this embodiment, however, the map data includes the position coordinates of each facility, so the facility is specified by its position coordinates. Then, based on the specified position coordinates and the like, geographic information from which the location of the facility can be grasped is acquired from the map data storage unit 12 (S60). If a predetermined number or more of pieces of geographic information have been acquired (S70: YES), geographic information is selected so that the number becomes less than the predetermined number (S80). On the other hand, if the number of pieces of geographic information is less than the predetermined number (S70: NO), no selection is performed. After the geographic information has been selected (S80), or when the number of pieces of geographic information is less than the predetermined number (S70: NO), a talkback for the facility is executed (S90). After the process of S90, the process proceeds to S110. If an affirmative determination is made in S110, that is, if it is determined that the recognition is correct, the recognition result is confirmed by the voice recognition / talkback control unit 27 (S120), and a predetermined post-confirmation process is executed (S130). For example, in a destination setting scene for route setting, the facility thus recognized and confirmed is output to the route calculation unit 23 as a destination. In addition, a confirmation inquiry such as "Is XX set as a destination?" may be made.
Likewise, when a map including the designated facility is to be displayed, or when detailed information such as business hours and charges is to be reported for a facility such as a store or a parking lot, the confirmed facility is output to the unit that executes the respective process.
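The branch between the normal talkback (S100) and the facility talkback (S50 to S90) can be sketched as follows. This is a minimal sketch under assumptions: the function `talkback_for_result`, its callable parameters, and the default `max_items=3` are hypothetical names and values, and S80 is reduced here to simple truncation as a placeholder for whatever selection rule is actually used.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Position = Tuple[float, float]


def talkback_for_result(
    recognized: str,
    find_facility: Callable[[str], Optional[Position]],
    get_geo_info: Callable[[Position], Sequence[str]],
    max_items: int = 3,
) -> str:
    """Build the talkback text for one recognition result."""
    position = find_facility(recognized)       # S40 / S50: is it a facility?
    if position is None:
        return recognized                      # S100: normal talkback
    geo: List[str] = list(get_geo_info(position))  # S60: acquire geographic info
    if len(geo) >= max_items:                  # S70: predetermined number or more?
        geo = geo[: max_items - 1]             # S80: placeholder selection
    # S90: geographic information is placed before the facility name.
    return " ".join(geo + [recognized])
```

For a non-facility keyword such as "menu screen" the function returns the keyword unchanged, which corresponds to the normal talkback.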
[0032]
This is an outline of the process of talking back the facility name, and each process will be supplementarily described.
(1) Acquisition of geographic information in S60
Since this geographical information is information for indicating the location of the facility specified in S50, geographical information around the specified facility is obtained. In the present embodiment, first, a landmark facility is extracted in order to talk back information indicating a geographical relationship with a predetermined landmark facility existing around the facility. The landmark facility may be, for example, a relatively large building such as a school, department store, or stadium, a relatively small facility such as a police box, a convenience store, or a gas station, or an intersection. Intersection names are often attached to traffic lights at intersections, and can be a valid landmark when the user actually goes to the place.
[0033]
Several examples of the method for extracting the landmark facility will be described.
① First, it is conceivable to extract, for example, landmark facilities existing within a radius of X m centered on the position of the facility specified in S50. Since at least one landmark facility is to be extracted, several levels of the radius X m are prepared. For example, the search is first made within a radius of 50 m, and if no landmark facility is found, the range is expanded to a radius of 100 m. If one or more are found, the search may be terminated there. Because a facility found in an excessively wide range cannot effectively serve as a landmark, an upper limit is set for the radius. If no landmark facility is found even in the widest prepared range, the address of the facility specified in S50 may be used to form an expression such as "around the city" or "around the town", or no geographic information may be employed.
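The stepwise widening of the search radius can be sketched as follows. This is an illustrative sketch only: the function name `find_landmarks`, the planar coordinates in meters, and the 200 m upper limit are assumptions (the text gives 50 m and 100 m as examples and says only that an upper limit is set).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: local planar coordinates in meters


def find_landmarks(
    facility: Point,
    landmarks: Sequence[Tuple[str, Point]],
    radii_m: Sequence[float] = (50.0, 100.0, 200.0),  # last value: assumed upper limit
) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs from the smallest radius that yields a hit."""
    distances = [(name, math.dist(facility, pos)) for name, pos in landmarks]
    for radius in radii_m:
        hits = [(name, d) for name, d in distances if d <= radius]
        if hits:
            # Stop at the first radius that finds at least one landmark.
            return sorted(hits, key=lambda h: h[1])
    # Nothing within the upper limit: the caller may fall back to the address
    # or use no geographic information at all.
    return []
```

An empty result corresponds to the fallback case in which the address is used or no geographic information is employed.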
[0034]
② A method of "setting from surrounding map sections" is also conceivable. Because the map data is divided into sections and stored, the search is performed within the section. If only one section is set as the search range, however, a facility in a neighboring section cannot be found even when it is close to the specified facility. Therefore, for example, when the facility specified in S50 is located near the edge of a map section, the map section adjacent to that edge may also be searched. Because the size of a section changes according to the amount of map data, the search range may differ between areas such as city centers, suburbs, and mountain villages.
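The idea of also searching the adjacent section when the facility lies near a section edge can be sketched as follows. This is a sketch under simplifying assumptions: a uniform square grid, the function name `sections_to_search`, and the `edge_margin` parameter are all hypothetical. The text notes that real section sizes vary with the amount of map data, which this sketch does not model.

```python
from typing import Set, Tuple


def sections_to_search(
    x: float, y: float, section_size: float, edge_margin: float
) -> Set[Tuple[int, int]]:
    """Return grid indices (col, row) of the sections to search."""
    col, row = int(x // section_size), int(y // section_size)
    cols, rows = {col}, {row}
    off_x = x - col * section_size  # offset from the section's west edge
    off_y = y - row * section_size  # offset from the section's south edge
    if off_x < edge_margin:
        cols.add(col - 1)           # close to the west edge
    if section_size - off_x < edge_margin:
        cols.add(col + 1)           # close to the east edge
    if off_y < edge_margin:
        rows.add(row - 1)           # close to the south edge
    if section_size - off_y < edge_margin:
        rows.add(row + 1)           # close to the north edge
    return {(c, r) for c in cols for r in rows}
```

A facility near a corner yields four sections, because both the horizontally and the vertically adjacent sections (and the diagonal one) may contain nearby landmarks.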
[0035]
③ A method of "setting within the same administrative division (oaza level or municipality level)" is also conceivable. In this case as well, as in ② above, when the specified facility is located near the edge of the administrative division, the adjacent administrative division is also included in the search range.
(2) Selection of geographic information in S80
In S80, if the number of pieces of acquired geographic information is equal to or greater than the predetermined number (S70: YES), a selection is made so that the number becomes less than the predetermined number. The predetermined number may be, for example, about three. Because the present invention is applied to an in-vehicle navigation system in this embodiment, the talkback is basically directed at a user (driver) who is driving the vehicle. Since it is not preferable to talk back too much information to a user who is driving, it is preferable to talk back about two pieces, for example. Therefore, if there are three or more pieces of geographic information in S70, it is conceivable to narrow them down to two or fewer in S80.
[0036]
Some examples of methods for narrowing down the geographic information will now be described.
① For example, the selection may be based on the distance from the facility specified in S50. Specifically, the two closest landmark facilities are selected, or the closest and the farthest are selected. If the user remembers even one of these two landmark facilities, the user can judge whether the recognized facility is the intended one.
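The two distance-based strategies can be sketched as follows. The function name `select_by_distance` and the strategy labels are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def select_by_distance(
    candidates: Sequence[Tuple[str, float]],  # (landmark name, distance in meters)
    strategy: str = "two_closest",
) -> List[Tuple[str, float]]:
    """Narrow the candidates down to at most two landmarks."""
    ordered = sorted(candidates, key=lambda c: c[1])
    if len(ordered) <= 2:
        return ordered                      # nothing to narrow down
    if strategy == "two_closest":
        return ordered[:2]
    if strategy == "closest_and_farthest":
        return [ordered[0], ordered[-1]]
    raise ValueError(f"unknown strategy: {strategy}")
```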
[0037]
② It is also conceivable to make the selection based on "type". Specifically, because the navigation system of the present embodiment is mounted on a vehicle, facilities that are easy for the driver to recognize and that are generally well known are given high priority, and such facilities are selected preferentially. For example, city halls, stations, stadiums, and schools are effective as priority facilities. Intersections are also considered very effective in that they can be checked from inside a car and are found all over the country. A plurality of intersections may be adopted as landmark facilities, but in general it is often preferable to adopt a different type of facility as well. For example, when one intersection is selected, it is preferable to select a facility other than an intersection, such as the city hall, station, stadium, or school mentioned above.
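A type-based selection that prefers well-known facility types and avoids picking two intersections when another type is available might look like the following. The ranking in `TYPE_PRIORITY` and the function name `select_by_type` are assumptions for illustration; the text names the facility types but does not rank them.

```python
from typing import List, Sequence, Tuple

# Hypothetical ranking (smaller = higher priority); not given in the text.
TYPE_PRIORITY = {
    "intersection": 0,
    "city hall": 1,
    "station": 2,
    "stadium": 3,
    "school": 4,
}


def select_by_type(
    candidates: Sequence[Tuple[str, str]],  # (landmark name, type)
    limit: int = 2,
) -> List[Tuple[str, str]]:
    """Pick up to `limit` landmarks by type priority."""
    unknown_rank = len(TYPE_PRIORITY)
    ordered = sorted(candidates, key=lambda c: TYPE_PRIORITY.get(c[1], unknown_rank))
    chosen: List[Tuple[str, str]] = []
    extra_intersections: List[Tuple[str, str]] = []
    for item in ordered:
        already_has_one = any(kind == "intersection" for _, kind in chosen)
        if item[1] == "intersection" and already_has_one:
            # Defer further intersections; use them only if nothing else exists.
            extra_intersections.append(item)
        else:
            chosen.append(item)
    return (chosen + extra_intersections)[:limit]
```

A second intersection is used only as a last resort, which reflects the statement that several intersections may be adopted but a different type is generally preferable.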
[0038]
③ In addition to the viewpoints of ① and ② above, the wording used in the talkback may also be considered. For example, a phrase such as "near ~, at the corner of the ~ intersection" makes it easy for the user hearing the talkback to picture the location of the facility, whereas a phrase that repeats the same wording, such as "near ~, near ~", is relatively unfavorable for conveying the location of the facility. It is therefore conceivable to adopt landmark facilities positioned such that the same wording is repeated as little as possible.
(3) About talkback for facilities in S90
In the normal talkback in S100, only the recognition result is talked back. In the facility talkback in S90, however, geographic information is added before the name of the facility specified in S50. More specifically, the talkback consists of "name of landmark facility" + "phrase indicating the geographical relationship between the landmark facility and the facility specified in S50" + "name of facility specified in S50".
[0039]
Here, for example, the following can be considered as a phrase indicating a geographical relationship.
① "Near ~": used when the target facility is some distance away. For example, when the distance is 100 m or more, the phrase "near ~" is considered to match the user's sense.
[0040]
② "At the corner of the ~ intersection": used when the distance to the intersection is short. For example, if the facility is within several tens of meters of the intersection, "at the corner of the ~ intersection" can be used.
③ "In front of ~": for example, if the target facility is located across the road from the main gate of an elementary school, the phrase "△△ in front of the elementary school" may be used. However, information on which way the landmark facility faces is required, so dedicated data is needed.
[0041]
④ "Next to ~": for example, if the target facility is located beside the city hall with no road in between, the phrase "□□ next to the city hall" may be used.
⑤ "North of ~", "south of ~", "east of ~", "west of ~": such expressions can be used when the direction is clear.
[0042]
Considering ① to ⑤, the expressions can be ordered from the shortest to the longest distance from the target facility as ③, ④ < ⑤, ② < ①. The direction expressions in ⑤ are considered preferable when the target facility is slightly farther away than in the cases of "in front of ~" in ③ and "next to ~" in ④, because "in front of ~" and "next to ~" can specify the position regardless of the direction. Therefore, in deciding which geographical-relationship wording to add to the landmark facility, it is considered preferable, for example, to select short-distance wording preferentially and to select ① when none of the conditions ② to ⑤ applies.
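The preference for short-distance wording, with "near" as the fallback, can be sketched as a simple rule chain. The function name `relation_phrase`, its keyword flags, the 50 m threshold for "several tens of meters", and the relative order of the corner and direction expressions are assumptions made for illustration; the English phrase strings stand in for the Japanese wording.

```python
from typing import Optional


def relation_phrase(
    landmark: str,
    distance_m: float,
    *,
    is_intersection: bool = False,
    in_front: bool = False,          # needs dedicated data on the landmark's front
    adjacent: bool = False,          # beside the landmark with no road in between
    direction: Optional[str] = None, # e.g. "north", when the direction is clear
    corner_max_m: float = 50.0,      # assumed reading of "several tens of meters"
) -> str:
    """Choose the geographical-relationship phrase for one landmark."""
    if in_front:
        return f"in front of {landmark}"          # shortest-distance wording
    if adjacent:
        return f"next to {landmark}"              # shortest-distance wording
    if is_intersection and distance_m <= corner_max_m:
        return f"at the corner of {landmark}"
    if direction is not None:
        return f"{direction} of {landmark}"
    return f"near {landmark}"                     # fallback when nothing else applies
```

The resulting phrase would be placed before the recognized facility name when the talkback is composed.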
[0043]
Specific examples of a talkback consisting of "name of landmark facility" + "phrase indicating geographical relationship" + "name of facility specified in S50" include "Super @ Osone store?" and "Is it a bookstore at the corner of Kariya station square?"
[0044]
For such a talkback, dedicated geographic information could be prepared, but in the present embodiment the index information and display character information already existing in the map data are used. In current car navigation systems, index information and display character information are generally already stored as map data. In most cases a search by Japanese syllabary is provided as a standard function, so the index data includes reading data, and this reading data can be used to generate synthesized speech for the talkback.
[0045]
Further, since the system of this embodiment has a voice recognition function, data for voice recognition may be used as it is. Since the voice recognition data is associated with the facility index data and the address index data, it can be easily realized by reusing them.
On the other hand, the display character information often does not include reading data at present, in which case reading data must be added. If reading data has to be added anyway, however, it may be easier to provide dedicated data. When dedicated data is created, it is considered that at least "coordinates", "information that allows prioritization (type, etc.)", and "talkback data (a recorded real voice or a reading for synthesized speech)" are required.
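The minimum contents of such dedicated data can be pictured as one record per landmark. This is a hypothetical layout: the class name `LandmarkRecord`, the field names, and the rule that the reading must be non-empty are assumptions, not a format defined in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LandmarkRecord:
    """One entry of dedicated talkback data (illustrative layout only)."""

    name: str
    coordinates: Tuple[float, float]  # position of the landmark
    kind: str                         # type information usable for prioritization
    reading: str                      # reading for synthesis, or key of a recorded clip

    def __post_init__(self) -> None:
        # Without a reading or voice clip the landmark cannot be talked back.
        if not self.reading:
            raise ValueError("talkback data (reading or voice clip) is required")
```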
[0046]
Further, for intersections, recorded real-voice data is already stored in current high-end navigation systems, so that data can be used.
As described above, in the navigation system 1 of the present embodiment, the user can, for example, input a destination for route guidance by speaking a facility name. However, if only the facility name is talked back, the facility recognized by the navigation system 1 may not match the facility the user has in mind. If the route is set despite such a mismatch, the user notices that a destination different from the intended one has been set only after seeing the set route displayed on the display unit 14, and then has to start over from the destination setting.
[0047]
Therefore, in the navigation system 1 of the present embodiment, the facility talkback adds "name of landmark facility" + "phrase indicating geographical relationship" before the voice-recognized facility name. This allows the user to determine whether or not the intended facility has been correctly recognized. For example, if the talkback is "Is it a bookstore at the corner of Kariya station square in Kariya city?", the user can notice that the recognized bookstore is not the one the user had in mind but a different store of the same chain in another area. When there are a plurality of such chain stores, the user can have the intended facility recognized by speaking the branch name or the like together with the facility name.
[0048]
When a plurality of candidate chain stores are extracted, talkbacks including "name of landmark facility" + "phrase indicating geographical relationship" may, for example, be made for the candidates in order. If the user inputs a negative voice such as "Wrong", the talkback for the next candidate is made, and if the user inputs a voice such as "Yes", an affirmative determination is made in S110 of FIG. 2 and the recognition result is confirmed (S120).
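Talking back several candidate stores in order until the user accepts one can be sketched as a loop. The function name `confirm_candidate`, the `ask` callback, and the rule that only the answer "yes" counts as affirmative are assumptions for illustration; a real system would use the recognizer's own affirmative and negative vocabulary.

```python
from typing import Callable, Optional, Sequence, Tuple


def confirm_candidate(
    candidates: Sequence[Tuple[str, str]],  # (facility name, geographic phrase)
    ask: Callable[[str], str],              # plays the talkback, returns the reply
) -> Optional[str]:
    """Return the first facility the user confirms, or None if all are rejected."""
    for facility, geo_phrase in candidates:
        # Geographic information is placed before the facility name.
        reply = ask(f"{geo_phrase} {facility}?")
        if reply.strip().lower() == "yes":
            return facility                 # affirmative: result is confirmed
    return None                             # every candidate was rejected
```

Passing `ask` as a parameter keeps the loop independent of the audio hardware, so it can be driven by scripted replies.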
[0049]
[Another embodiment]
(1) In the above embodiment, information indicating the geographical relationship with a predetermined landmark facility existing around the recognized facility is talked back. Alternatively, for example, the address of the recognized facility may be talked back as the geographic information, as in "Is this a bookstore at the address of △△ city ○ town □□□?" Such a talkback is effective when the user knows the address where the facility is located. However, the user does not always know the address. Even a user who has been to the facility in the past does not necessarily remember its address, so talking back the geographical relationship with a landmark facility, as in the above embodiment, is considered more effective in many cases.
[0050]
(2) In the above embodiment, an example in which the present invention is applied to the in-vehicle navigation system 1 has been described. However, the present invention is not limited to this application and may be applied to, for example, a terminal device carried by a person. In the description of S80 in the above embodiment, it was explained that, when the number of pieces of acquired geographic information is equal to or greater than the predetermined number, the selection may be made based on "type", and that, for an in-vehicle navigation system, facilities that are easy for the driver to recognize and that are generally well known are given high priority. Accordingly, when the present invention is applied to a terminal device carried by a person, the priority may be determined from the viewpoint of being easy for a person on foot to recognize.
[0051]
Further, the present invention is not limited to the navigation system, but may be applied to any control system that operates a predetermined device based on the name of a facility as a recognition result of the voice uttered by the user.
(3) The function of realizing the above-described processing related to voice recognition and talkback in a computer system can be provided, for example, as a program started on the computer system side. Such a program can be used by recording it on a computer-readable recording medium such as a flexible disk, a magneto-optical disk, a CD-ROM, or a hard disk, and loading it into the computer system and starting it as needed. Alternatively, the program may be recorded on a ROM or a backup RAM as a computer-readable recording medium, and the ROM or the backup RAM may be incorporated in the computer system.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an overall configuration of a navigation system as an embodiment.
FIG. 2 is a flowchart showing processing related to speech recognition.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... Navigation system, 11 ... Position detection unit, 12 ... Map data storage unit, 13 ... Switch information input unit, 14 ... Display unit, 15 ... Memory unit, 16 ... Audio output unit, 17 ... Audio input unit, 20 ... Control unit, 21 ... Map data acquisition unit, 22 ... Map matching unit, 23 ... Route calculation unit, 24 ... Route guidance unit, 25 ... Drawing unit, 26 ... Screen control management unit, 27 ... Voice recognition / talkback control unit

Claims

A control system comprising:
voice input means for inputting voice spoken by a user; and
voice recognition means for recognizing the voice input via the voice input means,
the control system operating predetermined equipment based on the name of a facility obtained as a result of recognition by the voice recognition means, and further comprising:
map data storage means for storing map data; and
talkback means for acquiring, using the map data stored in the map data storage means, geographic information from which the location of the facility recognized by the voice recognition means can be grasped, and for talking back the geographic information together with the name of the facility recognized by the voice recognition means.

The control system according to claim 1,
The control system, wherein the geographic information is an address of the recognized facility.

The control system according to claim 1,
The control system, wherein the geographic information is information indicating a geographical relationship with a predetermined landmark facility existing around the recognized facility.

The control system according to claim 3,
The control system, wherein the information indicating the geographical relationship with the landmark facility is expressed differently depending on whether the distance between the recognized facility and the landmark facility is relatively short or relatively long.

The control system according to claim 3 or 4,
The control system, wherein, when there are a plurality of landmark facilities in relation to the recognized facility, the talkback means talks back information indicating the geographical relationship with only a predetermined number of landmark facilities selected based on the distance between the recognized facility and each landmark facility.

The control system according to claim 3 or 4,
The control system, wherein the control system is mounted on a mobile body, and
when there are a plurality of landmark facilities in relation to the recognized facility, the talkback means talks back information indicating the geographical relationship with only a predetermined number of landmark facilities selected based on a priority that takes into account their function as landmarks according to the nature of the mobile body.

The control system according to any one of claims 1 to 6,
The control system, wherein the facility obtained as the recognition result of the voice recognition means is set as a destination for route guidance.

The control system according to any one of claims 1 to 6,
The control system, wherein a map including the facility obtained as the recognition result of the voice recognition means is displayed.

The control system according to any one of claims 1 to 6,
The control system, wherein information on the facility obtained as the recognition result of the voice recognition means is reported.