JP4137399B2 - Voice search device

Info

Publication number: JP4137399B2
Application number: JP2001100615A
Authority: JP
Inventors: 光章渡邉; 克典高橋; 望斉藤
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2001-03-30
Filing date: 2001-03-30
Publication date: 2008-08-20
Anticipated expiration: 2021-03-30
Also published as: JP2002297374A

Description

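[0001]
[Technical Field of the Invention]
The present invention relates to a voice search device that searches for various kinds of information using speech recognition.
[0002]
[Prior Art]
Voice search devices that perform speech recognition processing on speech uttered by a user and search for various kinds of information based on the recognition result are conventionally known.
Such voice search devices are used in combination with in-vehicle navigation devices and the like. For example, consider a navigation device executing a function that searches for facilities to be used as the destination of a route search. In response to the user's speech, the voice search device searches for information by (1) identifying the function to execute, (2) searching for facilities belonging to the facility type specified by the user (for example, dining places or gas stations), (3) further searching for facilities identified by a franchise name or the like specified by the user, and (4) finally extracting the one facility specified by the user.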
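[0003]
[Problems to Be Solved by the Invention]
With a conventional voice search device, however, each user must remember which words can be input by voice. In the example above, the user cannot make an accurate voice input without knowing in advance the names under which the functions, facility types, and so on are registered as recognition targets. Most users cannot keep track of every word that is a recognition target, so they end up trying whatever word comes to mind. Because such input does not use the registered words, speech recognition accuracy can fall.
[0004]
The present invention was made in view of this point, and its object is to provide a voice search device that can improve speech recognition accuracy.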
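[0005]
[Means for Solving the Problems]
To solve the above problem, in the voice search device of the present invention a search key is associated with each of a plurality of search target items. When the matching items are extracted from among the search target items by comparing the content of the user's input speech with the search keys, a maximum number of character strings that can serve as search keys is set, and recognition-target character string output means outputs by voice the readings of a plurality of character strings within that limit. Speech recognition processing means then performs predetermined speech recognition processing on the user's speech collected by a microphone and selects the character string corresponding to this speech from among the character strings that were output by voice by the recognition-target character string output means. Item extraction means extracts the search target item corresponding to the search key identified by the selected character string.
[0006]
Outputting by voice the readings of a plurality of character strings that can serve as search keys presents the recognition-target character strings to the user in advance. The character string corresponding to the speech that the user inputs in response is then selected from among the character strings that were output by voice, and the search key is identified from it. Speech recognition accuracy can therefore be improved.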
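[0007]
It is also desirable to further provide a switch that is operated before the user speaks and to start the speech recognition processing by the speech recognition processing means when this switch is operated. Because recognition only needs to start when the switch is operated, the timing at which speech recognition processing starts becomes clear and the processing can be simplified.
[0008]
It is also desirable to further provide selected character string confirmation means that outputs by voice the reading of the character string selected by the speech recognition processing means. With the reading of the selected character string output by voice, the user can easily confirm the recognition result for the speech they input.
[0009]
It is also desirable to further provide reselection instruction means that, when the user expresses a negative view of the character string selected by the speech recognition processing means, instructs the recognition-target character string output means to output again by voice the readings of the plurality of character strings that were used to obtain that selection result. This lets the user re-enter the search key by expressing a negative view when a character string other than the desired one was selected.
[0010]
When the total number of recognition-target character strings exceeds the predetermined maximum number described above, it is desirable that the recognition-target character string output means output the readings by voice in several rounds, each containing no more than this maximum number of character strings, and that the speech recognition processing means make a selection determination after each round of voice output. Even when there are many recognition-target character strings, they are output a predetermined number at a time, so the user need only attend to that predetermined number of character strings when choosing and can reliably select the desired one.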
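As a rough illustration of the round-by-round presentation described in this paragraph, the following Python sketch splits a list of recognition-target strings into groups no larger than a maximum count. The function name `rounds`, the sample franchise names, and the default of five (the value used in the embodiment described later) are assumptions made for illustration and are not part of the specification.

```python
from typing import Iterator, List


def rounds(options: List[str], max_count: int = 5) -> Iterator[List[str]]:
    """Yield successive groups of at most max_count options, in order."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    for start in range(0, len(options), max_count):
        yield options[start:start + max_count]


# Hypothetical recognition targets: seven franchise names.
brands = [f"{letter} Oil" for letter in "ABCDEFG"]
announcements = list(rounds(brands))
for number, group in enumerate(announcements, start=1):
    print(f"round {number}: {', '.join(group)}")
```

Each yielded group would be announced in one round, with a selection determination made before the next round is announced.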
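[0011]
It is also desirable to further provide voice output instruction means that instructs the recognition-target character string output means to perform the second and subsequent voice outputs when the user requests voice output of other selection candidates. This lets the user easily obtain other selection candidates.
[0012]
It is also desirable to further provide voice re-output instruction means that, when the user requests a repeat of the voice output, instructs the recognition-target character string output means to output again by voice the readings of the plurality of character strings it output immediately before. If the user missed the voice output, for example, they can have it repeated and check its content.
[0013]
It is also desirable to further provide character string selection means that selects a character string without using the result of the speech recognition processing when the user gives an instruction to leave the selection to the device. When the character string selection means has made a selection, the item extraction means extracts the search target item using that character string instead of one selected by the speech recognition processing means. By giving a "leave it to you" instruction, the user can entrust the selection to the voice search device, which simplifies operation when the user does not mind which character string is selected.
[0014]
It is also desirable that a plurality of search keys be associated with each search target item and that, when the item extraction means cannot narrow the results down to a single search target item with one search key, the processing by the recognition-target character string output means, the speech recognition processing means, and the item extraction means be repeated with other search keys until a single search target item is obtained. This makes it possible to reliably narrow the results down to one search target item.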
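[0015]
It is also desirable that a different priority be associated with each of the plurality of search keys, that table information in which a plurality of character strings corresponding to the plurality of search keys are associated with each of the plurality of search target items be stored in table storage means, and that, based on this table information, the recognition-target character string output means output by voice the readings of the corresponding character strings in order from the search key with the highest priority. Because content can be added or changed for each prioritized search key, the data can be updated easily.
[0016]
Alternatively, multi-level tree structure information may be stored in tree structure storage means. This information indicates, when a character string corresponding to one search key has been selected, the search key to be selected next and the character strings corresponding to that search key. Based on this tree structure information, the recognition-target character string output means may extract the plurality of character strings corresponding to the search key that is next to be output by voice and output their readings by voice. The character strings to output next can be extracted simply by following the tree from the top level down, which simplifies the processing.
[0017]
It is also desirable to further provide selection history storage means that stores information on past selections by the speech recognition processing means, and to have the recognition-target character string output means determine, based on this selection history information, which character strings are selected frequently and output their readings by voice preferentially. Giving priority to frequently selected character strings lets the user select them with fewer voice inputs, which improves operability.
[0018]
When each of the plurality of character strings consists of one sound of the Japanese syllabary (the fifty sounds), it is desirable that the item extraction means extract the search keys whose first character matches the one sound selected by the speech recognition processing means. Even when there are many candidate character strings, the candidates can be narrowed down easily.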
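[0019]
It is also desirable that the speech recognition processing means select a character string by comparing all the characters making up the character string with the whole of the speech recognition result. Only character strings that match the recognition result completely need to be considered, so the comparison is easy and the processing can be simplified.
[0020]
Alternatively, the speech recognition processing means may select a character string by comparing the characters making up part of the character string with the whole of the speech recognition result. When part of a character string is distinctive, for example, the user can input by voice only that distinctive, easy-to-remember part, which improves operability.
[0021]
When the user's speech is collected by the microphone before the voice output by the recognition-target character string output means has finished, it is desirable that the speech recognition processing means start the character string selection operation from that point. When the user wants a recognition-target character string that was announced early in the voice output, for example, they can input it by voice without waiting for the entire output, which further improves operability.
[0022]
It is also desirable that the maximum number of character strings that can serve as search keys be set in the range of 7±2. According to the theory of short-term memory in cognitive psychology, if a "chunk" is defined as a unit of information that hangs together in some way, the amount of information a person can hold at one time is said to be about 7±2 chunks. When memorizing a telephone number, for example, each digit of the number basically corresponds to one chunk. If the digit string "2983" is memorized through a mnemonic reading such as "nikuya-san" (肉屋さん, "the butcher"), that single piece of information corresponds to one chunk. Setting the maximum number of character strings that can serve as search keys in the range of 7±2 on the basis of this concept therefore lets the user reliably remember those character strings. Details of the "chunk" are given, for example, on page 75 of "Ninchi Shinrigaku 2: Kioku" (Cognitive Psychology 2: Memory), edited by Yotaro Takano, University of Tokyo Press, 1995.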
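[0023]
The voice search device may also be configured by distributing its functions between a server and a terminal device connected via a network. Specifically, it is preferable to place on the server the function of storing the search target items and the information on the search keys corresponding to each of them, to place on the terminal device the functions corresponding to the recognition-target character string output means, the microphone, the speech recognition processing means, and the item extraction means, and to have the terminal device acquire the necessary information from the server before the various kinds of processing. Because the terminal device acquires the information it needs from the server, it can obtain updated information from the server by communication and reflect it in its processing.
[0024]
It is also desirable that the information sent from the server to the terminal device be difference information containing the changes relative to the information sent up to the previous time. When the content changes, only the difference information containing those changes needs to be acquired, which reduces communication cost.
[0025]
Alternatively, when the voice search device is configured by distributing its functions between a server and a terminal device connected via a network, the server may store the search target items and the information on the search keys corresponding to each of them and may also perform the extraction of the character strings to be output by voice by the recognition-target character string output means and the extraction of search target items by the item extraction means. The terminal device is then given the functions corresponding to the recognition-target character string output means, the microphone, and the speech recognition processing means, and acquires from the server the information needed for this processing. Placing many functions on the server side lightens the processing load on the terminal device and simplifies its configuration, which reduces the cost of the terminal device.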
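[0026]
[Embodiments of the Invention]
A voice search device according to an embodiment to which the present invention is applied is described below with reference to the drawings.
[First Embodiment]
FIG. 1 shows the configuration of an in-vehicle system that includes the voice search device 1 of the first embodiment. The in-vehicle system shown in FIG. 1 comprises the voice search device 1, which determines and outputs various operation instructions interactively in response to speech uttered by the user; a navigation device 2, which detects the vehicle position, displays a map of the surrounding area, and performs route search and route guidance to a destination selected by the user; and an audio device 3, which plays music recorded on recording media such as compact discs and MiniDiscs.
[0027]
Next, the detailed configuration of the voice search device 1 is described. The voice search device 1 shown in FIG. 1 comprises a microphone 10, a speech recognition processing unit 12, a dialogue start button 14, a re-request button 15, an option setting unit 16, a guidance sentence generation unit 18, a speech synthesis unit 20, a speaker 22, a selected item determination unit 24, a candidate set database (DB) 26, a DB update unit 28, and an operation instruction output unit 30.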
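[0028]
The microphone 10 collects the speech uttered by the user and converts it into a speech signal.
The speech recognition processing unit 12 analyzes the speech signal output from the microphone 10, performs predetermined speech recognition processing, and identifies the character string corresponding to the speech uttered by the user. In this embodiment, the speech recognition processing unit 12 performs the recognition processing with the character strings corresponding to the predetermined number of options set by the option setting unit 16 as its recognition targets.
[0029]
The dialogue start button 14 is a push-button switch that the user presses to start a dialogue with the voice search device 1. The re-request button 15 is a push-button switch that the user presses to hear again the speech output from the voice search device 1.
[0030]
The option setting unit 16 sets, based on the data stored in the candidate set DB 26, the predetermined number of options presented as candidates for voice input. This number is desirably set within the range of 7±2 per presentation; in this embodiment, five options are set. The processing performed by the option setting unit 16 is described in detail later.
[0031]
The guidance sentence generation unit 18 generates the content of the guidance speech to be output to the user, that is, the guidance sentence, based on the predetermined number of options set by the option setting unit 16.
The speech synthesis unit 20 generates a speech signal for outputting the guidance sentence generated by the guidance sentence generation unit 18 as speech and outputs it to the speaker 22. The speaker 22 outputs the guidance speech based on the input speech signal.
The selected item determination unit 24 determines, based on the recognition result character string output from the speech recognition processing unit 12, which of the predetermined number of options the user selected.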
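[0032]
The candidate set DB 26 stores the data that the option setting unit 16 needs in order to set a plurality of options.
FIG. 2 shows the structure of the data stored in the candidate set DB 26. As shown in FIG. 2, the candidate set DB 26 stores predetermined candidate sets with a hierarchical structure (tree structure information). Each candidate set contains a predetermined number of options. The candidate set at the top level contains, as options, a plurality of functions that the navigation device 2 and other devices can be made to execute. The candidate sets at the second and lower levels contain options associated with one of the options in a candidate set at the level above.
[0033]
FIG. 3 shows the correspondence between upper-level and lower-level candidate sets in the data structure of FIG. 2. For example, the top-level candidate set 100 contains the options "Dining place search", "Gas station search", "Facility search", "Parking search", "Audio operation", and "Other". These options are arranged according to a predetermined priority order, and when guidance speech announcing them is generated, they are announced in descending order of priority. In the candidate set 100 shown in FIG. 3, for example, "Dining place search" has the highest priority, so the guidance speech generated from this candidate set announces the options in the order "Dining place search", "Gas station search", ..., "Other". A concrete example of how the options are announced is given later. The same applies to the other candidate sets.
[0034]
Associated with the option "Other" is another candidate set 100a at the same level, which contains the options "Traffic information", "Map display", ..., "Other". If still more options exist, a new candidate set is provided and associated with the "Other" option of the candidate set 100a.
[0035]
For each option other than "Other" in the candidate set 100 and the like, a candidate set containing a plurality of options is provided at the lower level in association with that option. For example, the candidate set 102 is the lower-level candidate set associated with "Dining place search" in the candidate set 100; to allow a dining place to be selected, it contains franchise names and the like, such as "Restaurant a", as options. Similarly, the candidate set 104 is the lower-level candidate set associated with "Gas station search" in the candidate set 100; to allow a gas station to be selected, it contains franchise names and the like, such as "A Oil", as options.
[0036]
In this embodiment, then, starting from the top-level candidate set, the process of selecting one option and moving to the lower-level candidate set associated with it is repeated, and the content of the operation instruction is finally determined by selecting one of the options in the lowest-level candidate set. Here, the options in the upper-level candidate sets correspond to "search keys", and the options in the lowest-level candidate set correspond to "search target items". FIG. 2 shows candidate sets with a four-level hierarchy, but this is only an example; the number of levels varies with the content of the operation instruction.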
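The hierarchical candidate sets described above might be modeled along the following lines. This is only a sketch under stated assumptions: the class `CandidateSet`, its fields `children` and `more`, the function `resolve`, and the sample data are hypothetical names and values chosen for illustration, and the specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

OTHER = "Other"


@dataclass
class CandidateSet:
    title: str
    options: List[str]
    # option -> lower-level candidate set associated with that option
    children: Dict[str, "CandidateSet"] = field(default_factory=dict)
    # same-level candidate set associated with the "Other" option, if any
    more: Optional["CandidateSet"] = None


def resolve(root: CandidateSet, choices: List[str]) -> Optional[str]:
    """Follow the choices from the top level; return the final item, or None if undecided."""
    node = root
    for choice in choices:
        if choice == OTHER:
            if node.more is None:
                raise LookupError("no further candidates at this level")
            node = node.more
        elif choice not in node.options:
            raise KeyError(choice)
        elif choice in node.children:
            node = node.children[choice]
        else:
            return choice  # lowest level reached: this option is the search target
    return None


# Hypothetical data loosely modeled on the gas station example.
positions = CandidateSet("position", ["2 km ahead on the right", "5 km ahead on the right"])
more_brands = CandidateSet("franchise", ["F Oil", "G Oil"])
brands = CandidateSet("franchise", ["A Oil", "B Oil"], {"B Oil": positions}, more_brands)
root = CandidateSet("function", ["Gas station search"], {"Gas station search": brands})

print(resolve(root, ["Gas station search", "B Oil", "2 km ahead on the right"]))
```

Keeping the "Other" link separate from the per-option links mirrors the distinction drawn above between same-level candidate sets and lower-level candidate sets.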
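[0037]
The DB update unit 28 acquires the vehicle position detection result from the navigation device 2 and, based on it, updates the data on the positions of facilities such as dining places, gas stations, and parking lots (position data) stored in the candidate set DB 26. For example, when a gas station search is under way and the franchise name of the facilities to be searched has been selected, the DB update unit 28 extracts, from the outlets of that franchise, those located within a predetermined range centered on the vehicle position at that time, calculates position data for the extracted outlets, and updates the content of the candidate set DB 26.
[0038]
The operation instruction output unit 30 outputs a predetermined operation instruction to the navigation device 2 or the audio device 3 according to the content of the item finally selected through repeated selection of one option from among several.
The option setting unit 16, guidance sentence generation unit 18, speech synthesis unit 20, and speaker 22 described above correspond to the recognition-target character string output means and the selected character string confirmation means; the speech recognition processing unit 12 corresponds to the speech recognition processing means; the option setting unit 16 and selected item determination unit 24 correspond to the item extraction means; the dialogue start button 14 corresponds to the switch; the re-request button 15 corresponds to the voice re-output instruction means; the candidate set DB 26 corresponds to the tree structure storage means; and the selected item determination unit 24 corresponds to the voice output instruction means.
[0039]
The voice search device 1 of this embodiment has the configuration described above. Its operation is described next.
FIG. 4 is a flowchart showing the operating procedure of the voice search device 1 of the first embodiment. It shows the procedure for outputting an operation instruction to the navigation device 2 in response to speech uttered by the user.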
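[0040]
The option setting unit 16 determines whether the user has pressed the dialogue start button 14 (step 100). If the dialogue start button 14 has not been pressed, the determination is negative and step 100 is repeated. If the user has pressed the dialogue start button 14, the determination is affirmative, and the option setting unit 16 uses the data stored in the candidate set DB 26 to set the top-level candidate set as the first candidate set (step 101).
[0041]
Next, the option setting unit 16 notifies the speech recognition processing unit 12 and the guidance sentence generation unit 18 of the character strings corresponding to the options in the candidate set (step 102).
The guidance sentence generation unit 18 generates a predetermined guidance sentence announcing the options in the candidate set and outputs it to the speech synthesis unit 20. The speech synthesis unit 20 generates the speech signal corresponding to the guidance sentence and outputs it to the speaker 22, and the speaker 22 outputs guidance speech presenting the options (step 103).
[0042]
The option setting unit 16 also determines whether the user has pressed the re-request button 15 (step 104). If the re-request button 15 has been pressed, the determination at step 104 is affirmative, and the processing returns to step 103 and repeats from there. Specifically, the option setting unit 16 notifies the guidance sentence generation unit 18 that a repeat of the guidance sentence has been requested. In response, the guidance sentence generation unit 18 outputs the previously generated guidance sentence to the speech synthesis unit 20 again, and the guidance speech is output again.
[0043]
If the re-request button 15 has not been pressed, the determination at step 104 is negative, and the speech recognition processing unit 12 determines whether the user has made a voice input, based on whether a speech signal is output from the microphone 10 (step 105). If no voice input has been made, the determination at step 105 is negative, and the processing returns to step 104 and repeats from there.
[0044]
If a voice input has been made, the determination at step 105 is affirmative, and the speech recognition processing unit 12 performs the predetermined speech recognition processing with only the character strings corresponding to the options notified by the option setting unit 16 as recognition targets, and identifies the one option selected by the user (step 106). In this embodiment, "Other" is also treated as one of the options subject to speech recognition, in addition to the options notified by the option setting unit 16.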
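[0045]
The selected item determination unit 24 determines, based on the speech recognition result output from the speech recognition processing unit 12, whether "Other" was selected from among the options (step 107). If "Other" was not selected, the determination at step 107 is negative, and the selected item determination unit 24 sends an instruction to the option setting unit 16 and determines whether a next candidate set (a lower-level candidate set) corresponding to the option selected by the user exists (step 108).
[0046]
If a next candidate set exists, the determination at step 108 is affirmative, and the selected item determination unit 24 instructs the option setting unit 16 to set the next candidate set. On receiving the instruction, the option setting unit 16 sets the next candidate set (step 109). The processing then returns to step 102 and continues from there.
[0047]
If "Other" was selected from among the options, the determination at step 107 is affirmative, and the selected item determination unit 24 notifies the option setting unit 16 to set the next options. On receiving the notification, the option setting unit 16 determines, based on the data stored in the candidate set DB 26, whether next options exist (step 110); if they do, the determination is affirmative and it sets the next options (step 111). The processing then returns to step 102, the next options are notified to the speech recognition processing unit 12 and the guidance sentence generation unit 18, and the subsequent processing is performed.
[0048]
If no next options exist, the determination at step 110 is negative. In this case, the option setting unit 16 instructs the guidance sentence generation unit 18 to generate a guidance sentence announcing that there are no further options. The guidance sentence generation unit 18 generates the predetermined guidance sentence and outputs it to the speech synthesis unit 20, and guidance speech reporting that there are no further options is output from the speaker 22 (step 112). The processing then returns to step 103, the options announced in the previous round are presented to the user again, and the subsequent processing is repeated.
[0049]
If, in the determination at step 108 of whether a next candidate set exists, none exists, the determination is negative, and the selected item determination unit 24 notifies the operation instruction output unit 30 of the content of the item finally selected by the user. On receiving the notification, the operation instruction output unit 30 outputs the operation instruction corresponding to the content of the selected item to the navigation device 2 or the like (step 113).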
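The flow of steps 101 to 113 can be sketched roughly as follows, with recognized utterances supplied as a scripted list in place of real speech input. The dictionary layout `SETS`, the `REPEAT` token standing in for the re-request button, the function `run_dialog`, and the handling of an utterance that matches no option are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

REPEAT = "<repeat>"  # stands in for a press of the re-request button (step 104)

# Hypothetical layout: options, the set each option leads to ("next"),
# and the same-level set reached by saying "Other" ("more").
SETS: Dict[str, dict] = {
    "function": {"options": ["Gas station search", "Facility search"],
                 "next": {"Gas station search": "franchise"}, "more": None},
    "franchise": {"options": ["A Oil", "B Oil"], "next": {}, "more": "franchise2"},
    "franchise2": {"options": ["F Oil", "G Oil"], "next": {}, "more": None},
}


def run_dialog(utterances: List[str], start: str = "function") -> Tuple[Optional[str], List[str]]:
    log: List[str] = []
    current = start                                   # step 101
    for heard in utterances:
        options = SETS[current]["options"]
        log.append("Choose from: " + ", ".join(options + ["Other"]))  # steps 102-103
        if heard == REPEAT:                           # step 104: announce again
            continue
        if heard == "Other":                          # step 107
            more = SETS[current]["more"]              # step 110
            if more is None:
                log.append("Sorry, there are no other candidates.")   # step 112
            else:
                current = more                        # step 111
            continue
        if heard not in options:                      # assumption: simply prompt again
            continue
        nxt = SETS[current]["next"].get(heard)        # step 108
        if nxt is None:
            return heard, log                         # step 113: final selection
        current = nxt                                 # step 109
    return None, log


result, log = run_dialog(["Gas station search", "Other", "Other", "F Oil"])
print(result)
```

In this scripted run the second "Other" finds no further candidate set, so the apology is logged and the same options are presented again before "F Oil" is chosen.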
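[0050]
Next, the dialogue between the voice search device 1 and the user that follows the processing shown in FIG. 4 is described concretely. In the following dialogue examples, the user is denoted "U" and the voice search device 1 is denoted "S". The drawings showing the data read from the candidate set DB 26 are referred to as appropriate.
[0051]
(Dialogue Example 1)
Dialogue Example 1 shows a dialogue for searching for the nearest gas station. FIG. 5 shows the data used in Dialogue Example 1.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other." ... (1)
U: "Gas station search" ... (2)
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (3)
U: "B Oil" ... (4)
S: "B Oil, correct? Then please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, and Other." ... (5)
U: "2 km ahead on the right" ... (6)
S: "2 km ahead on the right, correct? Then I will set the destination to B Oil Iwaki branch." ... (7)
As shown in FIG. 5, when the user presses the dialogue start button 14, the top-level candidate set is read first and the functions the user can select are announced as in speech (1) above.
[0052]
When the user selects "Gas station search" in response, as in speech (2), the lower-level candidate set corresponding to this gas station search is read and the franchise names the user can select are announced as in speech (3).
[0053]
When the user selects the franchise name "B Oil", as in speech (4), the lower-level candidate set corresponding to B Oil is read and the position of each facility relative to the vehicle position (relative distance) is announced as in speech (5).
[0054]
Next, when the user selects the position "2 km ahead on the right", as in speech (6), the single gas station at the selected position, "B Oil Iwaki branch", is identified. As shown in speech (7), this gas station is set as the destination of the route search and the sequence of processing ends.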
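[0055]
(Dialogue Example 2)
Dialogue Example 2 shows, for the same nearest-gas-station search as Dialogue Example 1, a dialogue in which the re-request button 15 is pressed. The data used in Dialogue Example 2 is the same as in FIG. 5.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: Presses the re-request button 15.
S: "Your input is Gas station search, correct? Next, please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (8)
U: "B Oil"
S: "B Oil, correct? Then please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, and Other."
U: "2 km ahead on the right"
S: "2 km ahead on the right, correct? Then I will set the destination to B Oil Iwaki branch."
As shown in speech (8) of the dialogue above, when the user presses the re-request button 15, the content of the candidate set announced immediately before is announced again.
[0056]
In this case, it is desirable to vary the guidance sentence between the first and second announcements. In the example above, the first guidance sentence is "Gas station search, correct? Then ..." and the second is "Your input is Gas station search, correct? Next, ...", so the two differ. Since the guidance speech may have been hard to hear, the second announcement may also be spoken more slowly than the first when a repeat is requested.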
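[0057]
(Dialogue Example 3)
Dialogue Example 3 shows, for the same nearest-gas-station search as Dialogue Example 1, a dialogue in which "Other" is selected from among the options. FIG. 6 shows the data used in Dialogue Example 3.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "Other" ... (9)
S: "Then please choose from F Oil, G Oil, H Oil, I Oil, J Oil, and Other." ... (10)
U: "G Oil"
S: "G Oil, correct? Then please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, and Other."
U: "2 km ahead on the right"
S: "2 km ahead on the right, correct? Then I will set the destination to G Oil Iwaki branch."
As shown in speech (9) of the dialogue above, when the user selects "Other" from among the options, the candidate set containing the next options at the same level is read in response, and, as shown in speech (10), additional franchise names that the user can select are announced.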
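[0058]
(Dialogue Example 4)
Dialogue Example 4 shows, as in Dialogue Example 3, a case in which "Other" is selected from among the options, but in which no next candidate set exists at the same level. FIG. 7 shows the data used in Dialogue Example 4.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "Other"
S: "Then please choose from F Oil, G Oil, H Oil, I Oil, J Oil, and Other."
U: "Other" ... (11)
S: "I am sorry. There are no other candidates." ... (12)
As shown in speech (11) of the dialogue above, when the user selects "Other" from among the options and no next candidate set exists at the same level, the device announces that there are no more options the user can select, as shown in speech (12).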
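[0059]
(Dialogue Example 5)
Dialogue Example 5 shows a dialogue for searching for a desired facility. FIG. 8 shows the data used in Dialogue Example 5.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Facility search" ... (13)
S: "Facility search, correct? Then please choose the region of the facility from Hokkaido, Tohoku region, Kanto region, Chubu region, Kinki region, and Other." ... (14)
U: "Tohoku region" ... (15)
S: "Tohoku region, correct? Then please choose the prefecture of the facility from Fukushima Prefecture, Akita Prefecture, Iwate Prefecture, Miyagi Prefecture, Aomori Prefecture, and Other." ... (16)
U: "Fukushima Prefecture" ... (17)
S: "Fukushima Prefecture, correct? Then please choose the first character of the facility name from a-row, ka-row, sa-row, ta-row, na-row, and Other." ... (18)
U: "a-row" ... (19)
S: "a-row, correct? Then please choose from R-Pai, R-Pai Giken, R-Pai Information Systems, R-Pii Office, R-Pii Logistics, and Other." ... (20)
U: "R-Pai" ... (21)
S: "R-Pai, correct? Then I will set the destination to R-Pai." ... (22)
In this way, the selectable functions are announced by voice, and the user selects the desired function in response. When the user selects "Facility search", as in speech (13), the lower-level candidate set corresponding to this facility search is read and, as shown in speech (14), the regions in which facilities are located are announced as options.
[0060]
When the user selects "Tohoku region" as the region in which the facility is located, as in speech (15), the lower-level candidate set corresponding to "Tohoku region" is read and the names of the prefectures in which facilities are located are announced as in speech (16).
[0061]
When the user selects "Fukushima Prefecture", as in speech (17), the corresponding lower-level candidate set is read and, as shown in speech (18), the first characters of facility names (a-row, ka-row, and so on, that is, rows of the Japanese syllabary) are announced as options.
When the user selects "a-row", as in speech (19), the corresponding lower-level candidate set is read and, as shown in speech (20), the names of facilities located in "Fukushima Prefecture" whose first character belongs to the a-row are announced as options. (The letter "R" is read "aaru" in Japanese, so names beginning with "R" fall under the a-row.)
[0062]
When the user selects the facility name "R-Pai", as in speech (21), the single facility "R-Pai" is identified. As shown in speech (22), this facility is set as the destination of the route search and the sequence of processing ends.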
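[0063]
In the first embodiment, as described above, data containing candidate sets with a predetermined hierarchical structure is stored in the candidate set DB 26. Based on these candidate sets, the character strings corresponding to the options next to be output by voice are extracted and their readings are output by voice. The recognition-target character strings are thus presented to the user in advance, the character string corresponding to the speech that the user inputs in response is selected from among the character strings that were output by voice, and the option selected by the user is identified, so speech recognition accuracy can be improved. In particular, the character strings to output next can be extracted simply by following the hierarchical candidate sets from the top level down, which simplifies the processing.
[Second Embodiment]
In the first embodiment described above, candidate sets with a hierarchical structure are prepared and stored in advance in the candidate set DB 26, but the same processing as in the first embodiment can also be performed using a database with an ordinary table structure.
[0064]
FIG. 9 shows the configuration of an in-vehicle system that includes the voice search device 1A of the second embodiment. The voice search device 1A shown in FIG. 9 differs from the voice search device 1 of the first embodiment in that the candidate set DB 26 is replaced by a candidate set DB 26a with different data content and, as a result of this change, in the procedure for narrowing down the content of the operation instruction in response to the user's speech. The description below focuses mainly on the differences from the first embodiment.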
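[0065]
The candidate set DB 26a stores the data that the option setting unit 16 needs in order to set a plurality of options.
FIG. 10 shows the structure of the data stored in the candidate set DB 26a of the second embodiment. As shown in FIG. 10, unlike in the first embodiment, the data stored in the candidate set DB 26a is in table form.
[0066]
The data (table information) stored in the candidate set DB 26a consists of three elements: "priority", "candidate set title", and "option".
"Priority" has the same meaning as the level in the first embodiment. That is, when an operation instruction is determined, one option is selected from among several in order, starting with "function" at priority 1. A concrete example of presenting and selecting options is given later. The options associated with priorities 1 to 6 in FIG. 10 correspond to "search keys", and the options at priority 7, which are the options finally identified, correspond to "search target items".
[0067]
In the candidate set DB 26a, each horizontal row forms one group of data (hereinafter called a "record"). For example, the record in the first row of FIG. 10 is the data for the facility name "B Oil Iwaki branch": it relates to the function "Gas station search", its franchise name is "B Oil", the region in which the facility is located is "Tohoku region", the prefecture is "Fukushima Prefecture", and the first character of the facility name belongs to the "ha-row". The position is updated by the DB update unit 28 described above. The candidate set DB 26a contains a plurality of such records. The candidate set DB 26a corresponds to the table storage means.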
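[0068]
The voice search device 1A of this embodiment has the configuration described above. Its operation is described next.
FIG. 11 is a flowchart showing part of the operating procedure of the voice search device 1A of the second embodiment. The basic operating procedure of the voice search device 1A is the same as that of the voice search device 1 of the first embodiment shown in FIG. 4; the content of step 101 and of the processing from step 107 onward differs. FIG. 11 mainly shows the parts that differ.
[0069]
The option setting unit 16 determines whether the user has pressed the dialogue start button 14 (step 100). If the dialogue start button 14 has not been pressed, the determination is negative and step 100 is repeated.
If the dialogue start button 14 has been pressed, the determination at step 100 is affirmative, and the option setting unit 16 extracts the data in the "priority 1" column from the candidate set DB 26a and uses it to set the first candidate set (step 101A). Specifically, as shown in FIG. 10, in this embodiment the "priority 1" column contains the various functions, and a candidate set containing these functions as options is set. Steps 102 to 107 shown in FIG. 4 are then performed in the same way as in the first embodiment.
[0070]
The selected item determination unit 24 determines, based on the speech recognition result output from the speech recognition processing unit 12, whether "Other" was selected from among the options (step 107).
If "Other" was not selected, the determination at step 107 is negative, and the selected item determination unit 24 notifies the option setting unit 16 accordingly. On receiving the notification, the option setting unit 16 narrows down, according to the option selected by the user, the options that are candidates for presentation as the next candidate set (step 120). For example, if the user selected "Gas station search", the option setting unit 16 narrows the records down to those corresponding to "Gas station search".
[0071]
Next, the option setting unit 16 determines, in descending order of priority, whether there is a candidate set containing two or more options (step 121).
If a candidate set containing two or more options exists, the determination at step 121 is affirmative, and the option setting unit 16 extracts a predetermined number of options from the candidate set identified at step 121 (the candidate set containing two or more options) and sets the next candidate set (step 122).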
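[0072]
FIG. 12 is a flowchart showing the detailed procedure of step 122. First, the option setting unit 16 selects, based on the data stored in the candidate set DB 26a, the highest-priority candidate set that contains two or more different kinds of options (step 130).
[0073]
Next, the option setting unit 16 determines whether the number of kinds of options in the selected candidate set is at most the predetermined number (five in this embodiment) (step 131).
If the number of kinds of options exceeds the predetermined number, the determination at step 131 is negative, and the option setting unit 16 extracts the predetermined number of options (step 132).
[0074]
If the number of kinds of options is at most the predetermined number, the determination at step 131 is affirmative, and the option setting unit 16 extracts all the options that exist (step 133).
Next, the option setting unit 16 sets the options extracted at step 132 or step 133 as the next candidate set (step 134), and the processing of step 122 shown in FIG. 11 ends. The processing then returns to step 102 and repeats from there.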
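The narrowing of records (step 120) and the choice of the next candidate set (steps 121 and 130 to 134) can be sketched as follows. The column names, the sample records (including the second B Oil outlet and the kana rows assigned to each name), and the function names `narrow` and `next_candidate_set` are hypothetical; the actual content of FIG. 10 is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Columns in priority order (priority 1 first); priority 7 is the search target item.
COLUMNS = ["function", "franchise", "region", "prefecture", "initial", "position", "name"]

# Invented sample rows for illustration only.
ROWS = [
    ("Gas station search", "B Oil", "Tohoku", "Fukushima", "ha-row", "2 km ahead on the right", "B Oil Iwaki branch"),
    ("Gas station search", "B Oil", "Tohoku", "Fukushima", "ha-row", "5 km ahead on the left", "B Oil Koriyama branch"),
    ("Gas station search", "G Oil", "Tohoku", "Fukushima", "sa-row", "3 km ahead on the left", "G Oil Iwaki branch"),
    ("Facility search", "", "Tohoku", "Fukushima", "a-row", "", "R-Pai"),
]
RECORDS: List[Dict[str, str]] = [dict(zip(COLUMNS, row)) for row in ROWS]


def narrow(records: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    """Step 120: keep only the records matching the selected option."""
    return [r for r in records if r[column] == value]


def next_candidate_set(records: List[Dict[str, str]],
                       max_options: int = 5) -> Optional[Tuple[str, List[str]]]:
    """Steps 121 and 130-134: highest-priority column with two or more distinct options."""
    for column in COLUMNS:
        distinct = list(dict.fromkeys(r[column] for r in records))  # keeps first-seen order
        if len(distinct) >= 2:
            return column, distinct[:max_options]
    return None  # a single record remains, so the search target item is identified


gas = narrow(RECORDS, "function", "Gas station search")
print(next_candidate_set(gas))
print(next_candidate_set(narrow(gas, "franchise", "B Oil")))
```

With this sample data, selecting "B Oil" skips the region, prefecture, and first-character columns because each holds only one distinct value, and the position column is presented next.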
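[0075]
If "Other" was selected from among the options at step 107, the determination is affirmative, and the selected item determination unit 24 notifies the option setting unit 16 to set the next options. On receiving the notification, the option setting unit 16 determines, based on the data stored in the candidate set DB 26a, whether options other than those already presented in the previous round exist (step 123). If they do, the determination is affirmative and the option setting unit 16 sets the next options (step 124). The processing then returns to step 102, the next options are notified to the speech recognition processing unit 12 and the guidance sentence generation unit 18, and the subsequent processing is performed.
[0076]
If no next options exist, the determination at step 123 is negative. In this case, the option setting unit 16 instructs the guidance sentence generation unit 18 to generate a guidance sentence announcing that there are no further options. The guidance sentence generation unit 18 generates the predetermined guidance sentence and outputs it to the speech synthesis unit 20, and guidance speech reporting that there are no further options is output from the speaker 22 (step 125). The processing then returns to step 103, the options announced in the previous round are presented to the user again, and the subsequent processing is repeated.
[0077]
If, at step 121, there is no longer any candidate set containing two or more options, the determination is negative, and the selected item determination unit 24 notifies the operation instruction output unit 30 of the content of the option finally selected by the user. On receiving the notification, the operation instruction output unit 30 outputs the operation instruction corresponding to the content of the selected option to the navigation device 2 or the like (step 126).
[0078]
Next, the dialogue between the voice search device 1A and the user that follows the processing shown in FIG. 11 is described concretely. Together with these dialogue examples, the way the necessary records are extracted from the data stored in the candidate set DB 26a is explained with reference to the drawings as appropriate.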
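[0079]
(Dialogue Example 6)
Dialogue Example 6 shows a dialogue for searching for the nearest gas station. FIG. 13 shows the records extracted from the candidate set DB 26a in Dialogue Example 6.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other." ... (23)
U: "Gas station search" ... (24)
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (25)
U: "B Oil" ... (26)
S: "B Oil, correct? Then please choose from 2 km ahead on the right, 5 km ahead on the left, and Other." ... (27)
U: "2 km ahead on the right" ... (28)
S: "2 km ahead on the right, correct? Then I will set the destination to B Oil Iwaki branch." ... (29)
As shown in FIG. 13, when the user presses the dialogue start button 14, options are first extracted for "function", the candidate set title at priority 1, and the functions the user can select are announced as in speech (23) above.
[0080]
When the user selects "Gas station search" in response to speech (23), as in speech (24), the records are narrowed down to those corresponding to "Gas station search". Options are then extracted for "franchise name", the candidate set title at priority 2, which is the next highest priority, and the franchise names the user can select are announced as in speech (25).
[0081]
When the user selects the franchise name "B Oil", as in speech (26), the records are narrowed down to those corresponding to "B Oil". Options are then extracted for "position" at priority 6, which is the next-highest-priority candidate set title that contains two or more kinds of options, and the position of each facility relative to the vehicle position (relative distance) is announced as in speech (27).
[0082]
When the user selects the position "2 km ahead on the right", as in speech (28), the single gas station at the selected position, "B Oil Iwaki branch", is identified. As shown in speech (29), this gas station is set as the destination of the route search and the sequence of processing ends.
[0083]
When the re-request button 15 is pressed, the dialogue proceeds in the same way as Dialogue Example 2 of the first embodiment, and the data used in that case is the same as shown in FIG. 13.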
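(Dialogue Example 7)
Dialogue Example 7 shows, for the same nearest-gas-station search as Dialogue Example 6, a dialogue in which "Other" is selected from among the options. FIG. 14 shows the records extracted from the candidate set DB 26a in Dialogue Example 7.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "Other" ... (30)
S: "Then please choose from F Oil, G Oil, and Other." ... (31)
U: "G Oil" ... (32)
S: "G Oil, correct? Then I will set the destination to G Oil Iwaki branch." ... (33)
As shown in speech (30) of Dialogue Example 7, when the user selects "Other" from among the options, the records are narrowed down to those excluding the already presented A Oil, B Oil, C Oil, D Oil, and E Oil, and, as shown in speech (31), additional franchise names that the user can select are announced.
[0084]
When the user selects the franchise name "G Oil", as in speech (32), the records are narrowed down to those corresponding to "G Oil". In this case, a single record remains once the narrowing by franchise name has been performed, so the one gas station to be guided to, "G Oil Iwaki branch", is identified. As shown in speech (33), this gas station is set as the destination of the route search and the sequence of processing ends.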
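[0085]
(Dialogue Example 8)
Dialogue Example 8 shows, as in Dialogue Example 7, a case in which "Other" is selected from among the options, but in which no further options are available for presentation. FIG. 15 shows the records extracted from the candidate set DB 26a in Dialogue Example 8.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "Other"
S: "Then please choose from F Oil, G Oil, and Other."
U: "Other" ... (34)
S: "I am sorry. There are no other candidates." ... (35)
As shown in speech (34) of the dialogue above, when the user selects "Other" from among the options and no further options are available for presentation, the device announces that there are no more options the user can select, as shown in speech (35).
[0086]
In the second embodiment, as described above, predetermined table information in table form, in which a different priority is associated with each of a plurality of options, is stored in the candidate set DB 26a, and based on this table information the readings of the corresponding character strings are output by voice in order from the option with the highest priority. The recognition-target character strings are presented to the user in advance, the character string corresponding to the speech that the user inputs in response is selected from among the character strings that were output by voice, and the option selected by the user is identified, so speech recognition accuracy can be improved. In particular, because the data is stored in table form, records can be added and changed easily.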
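[0087]
[Variations]
The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the gist of the invention. For example, in the embodiments described above, the final option is selected by the user successively choosing one of the presented options, and the operation instruction corresponding to its content is issued to the navigation device 2 or the like. However, options may instead be selected automatically when the user so wishes.
[0088]
FIG. 16 shows the configuration of a voice search device according to a variation in which options are selected automatically. The voice search device 1B shown in FIG. 16 differs from the voice search device 1 of the first embodiment in that a selection frequency learning unit 32 and a learning result storage unit 34 are added. The configuration and operation are described below, focusing mainly on the differences from the first embodiment.
[0089]
The selection frequency learning unit 32 learns how frequently the user selects each of the options presented to the user. The learning result storage unit 34 stores the learning results of the selection frequency learning unit 32.
In this variation, when making a voice input to select an option, the user can have the option selected automatically by saying "Leave it to you". When "Leave it to you" is input, the selected item determination unit 24 uses the learning results stored in the learning result storage unit 34 to automatically select the option with a high past selection frequency and determines the content of the operation instruction. In this case, the selected item determination unit 24 corresponds to the character string selection means.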
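[0090]
The voice search device 1B has the configuration described above. Next, its operation when options are selected automatically according to their past selection frequency is described.
FIG. 17 is a flowchart showing part of the operating procedure of the voice search device 1B when options are selected automatically according to their past selection frequency. The basic operating procedure of the voice search device 1B is the same as that of the voice search device 1 of the first embodiment shown in FIG. 4; the processing from step 107 onward differs. FIG. 17 mainly shows the parts that differ.
[0091]
The selected item determination unit 24 determines, based on the speech recognition result output from the speech recognition processing unit 12, whether "Other" was selected from among the options (step 107). The processing when "Other" is selected is the same as in the voice search device 1 of the first embodiment shown in FIG. 4, and its description is omitted.
[0092]
If "Other" was not selected, the determination at step 107 is negative, and the selected item determination unit 24 next determines, based on the speech recognition result output from the speech recognition processing unit 12, whether "Leave it to you" was selected from among the options (step 140).
[0093]
If "Leave it to you" was selected from among the options, the determination at step 140 is affirmative, and the selected item determination unit 24 reads the learning results stored in the learning result storage unit 34 and automatically selects options based on their past selection frequency (step 141). In this embodiment, for example, all the options up to the final option are selected automatically.
[0094]
When the final option has been selected automatically according to the past selection frequency, or when the determination at step 108 is negative, the selected item determination unit 24 notifies the operation instruction output unit 30 of the content of the final option. On receiving the notification, the operation instruction output unit 30 outputs the operation instruction corresponding to the content of the selected item to the navigation device 2 or the like (step 113).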
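[0095]
Next, the dialogue between the voice search device 1B and the user that follows the processing shown in FIG. 17 is described concretely.
(Dialogue Example 9)
Dialogue Example 9 shows, for the same nearest-gas-station search as Dialogue Example 1, a dialogue in which "Leave it to you" is selected as the option. The data used in Dialogue Example 9 is the same as in FIG. 5.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other. Or say 'Leave it to you'." ... (36)
U: "Leave it to you" ... (37)
S: "You are leaving it to me, correct? Then I will set the destination to B Oil Iwaki branch."
In the dialogue above, as shown in speech (36), "Leave it to you" is newly added to the options offered to the user. When the user selects "Leave it to you", as in speech (37), the option with the highest past selection frequency is selected automatically. In this example, the options from the franchise name onward are selected automatically according to their past selection frequency. Specifically, "B Oil" is selected automatically as the franchise name and "2 km ahead on the right" is selected automatically as the position, so that the option "B Oil Iwaki branch" is finally selected.
[0096]
In the example above, everything up to the final option is selected automatically, but only the option at the current point may be selected automatically instead. For example, if "Leave it to you" is selected when choosing a franchise name, only the franchise name is selected automatically; the device then moves to "position", the lower-level candidate set, and presents the options in that candidate set. The example above describes adding the automatic selection processing to the first embodiment, but the same function can be added to the second embodiment in the same way.
[0097]
In the variation described above, options are selected according to their past selection frequency when "Leave it to you" is selected, but they may instead be selected at random regardless of past selection frequency. In this case, the processing for learning the past selection frequency becomes unnecessary and the configuration can be simplified.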
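[0098]
When the past selection frequency of the options is learned, the learning results may also be reflected in the data stored in the candidate set DB 26 (or 26a). For example, in the embodiments described above, when the options for the franchise name are announced, "G Oil" is presented in the second announcement, which is made when "Other" is selected from the options presented the first time. If the learning results show that "G Oil" is selected frequently, however, "G Oil" may be presented in the first announcement. Alternatively, the order of announcement may be rearranged according to past selection frequency even among the options contained in a single announcement. For example, if the first announcement initially presents the options in the order A Oil, B Oil, C Oil, ..., and the learning results show that the selection frequency is highest in the order G Oil, B Oil, A Oil, ..., the first announcement can be rearranged into the order G Oil, B Oil, A Oil, .... In this case, the learning result storage unit 34 corresponds to the selection history storage means.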
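The reordering by past selection frequency described here can be illustrated with a short Python sketch. The function `order_by_frequency` and the sample history are assumptions for illustration; how the learning result storage unit 34 actually records selections is not specified.

```python
from collections import Counter
from typing import List


def order_by_frequency(options: List[str], history: List[str]) -> List[str]:
    """Place frequently selected options first; ties keep their initial order."""
    counts = Counter(history)
    # sorted() is stable, so options with equal counts stay in their original order.
    return sorted(options, key=lambda option: -counts[option])


options = [f"{letter} Oil" for letter in "ABCDEFG"]
history = ["G Oil"] * 3 + ["B Oil"] * 2 + ["A Oil"]  # hypothetical past selections
reordered = order_by_frequency(options, history)
print(reordered[:5])  # the five options that would make up the first announcement
```

Because the sort is stable, options that have never been selected keep their initial priority order behind the frequently selected ones.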
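[0099]
In the embodiments described above, the user presses the re-request button 15 to hear the guidance speech again, but this re-request operation may instead be performed by voice input. In this case, the user inputs speech such as "Once more", for example, and the immediately preceding guidance is output again in response.
[0100]
The embodiments described above do not describe the operation for cancelling an option once selected and selecting a new option, but such processing can also be performed.
FIG. 18 is a flowchart showing part of the operating procedure of the voice search device when an option once selected is cancelled and a new option is selected. The description below assumes, as an example, that this processing is performed in the voice search device 1 described in the first embodiment. The basic operating procedure in this case is the same as the flowchart shown in FIG. 4, with new processing added from step 107 onward. FIG. 18 mainly shows the newly added processing. In this variation, the selected item determination unit 24 corresponds to the reselection instruction means.
[0101]
The selected item determination unit 24 determines, based on the speech recognition result output from the speech recognition processing unit 12, whether "Other" was selected from among the options (step 107). The processing when the user selects "Other" is the same as in FIG. 4, and its description is omitted here.
[0102]
If "Other" was not selected, the determination at step 107 is negative, and the selected item determination unit 24 next determines, based on the speech recognition result output from the speech recognition processing unit 12, whether the speech "Correction" was input (step 150). Specifically, by expressing a negative view through the voice input "Correction", the user causes the option once selected to be cancelled. The negative view may also be expressed through voice inputs such as "Back" or "Wrong" instead of "Correction".
[0103]
If the speech "Correction" was input, the determination at step 150 is affirmative, and the selected item determination unit 24 instructs the option setting unit 16 to set again the candidate set one level above the level currently under consideration. In response to this instruction, the option setting unit 16 sets the upper-level candidate set again (step 151). The processing then returns to step 102 shown in FIG. 4, the character strings corresponding to the options in the upper-level candidate set are notified as recognition targets, these options are output by voice, and the subsequent processing is repeated.
[0104]
If the speech "Correction" was not input, the determination at step 150 is negative, and the processing proceeds to step 108 and continues from there.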
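One simple way to picture the step back to the upper level on "Correction" is to keep the path of candidate sets visited so far as a stack. The sketch below is hypothetical: the `TREE` layout, the set of negative words `BACK_WORDS`, and the function `navigate` are illustrative assumptions rather than the structure of FIG. 18.

```python
from typing import Dict, List

# Hypothetical links from each candidate set to the set that each option leads to.
TREE: Dict[str, Dict[str, str]] = {
    "function": {"Gas station search": "franchise", "Facility search": "region"},
    "franchise": {"A Oil": "position", "C Oil": "position", "E Oil": "position"},
}
BACK_WORDS = {"Correction", "Back", "Wrong"}  # inputs expressing a negative view


def navigate(inputs: List[str], root: str = "function") -> List[str]:
    """Return the path of candidate sets after applying the recognized inputs."""
    path = [root]
    for heard in inputs:
        if heard in BACK_WORDS:          # step 150 affirmative
            if len(path) > 1:
                path.pop()               # step 151: set the upper-level candidate set again
            continue
        nxt = TREE.get(path[-1], {}).get(heard)
        if nxt is not None:
            path.append(nxt)             # move to the lower-level candidate set
    return path


# A misrecognized franchise, two corrections, then a different function.
print(navigate(["Gas station search", "C Oil", "Correction", "Correction", "Facility search"]))
```

Saying "Correction" twice walks back from the position level to the function level, after which a different function can be chosen.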
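Next, the dialogue between the voice search device 1 and the user that follows the processing shown in FIG. 18 is described concretely.
[0105]
(Dialogue Example 10)
Dialogue Example 10 shows, for the same nearest-gas-station search as Dialogue Example 1 and others, a dialogue in which the speech "Correction" is input.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "A Oil" ... (38)
S: "E Oil, correct? Then please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the left, and Other." ... (39)
U: "Correction" ... (40)
S: "Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (41)
U: "A Oil"
S: "A Oil, correct? Then please choose from 2 km ahead on the left, 5 km ahead on the left, and Other."
U: "2 km ahead on the left"
S: "2 km ahead on the left, correct? Then I will set the destination to A Oil Iwaki branch."
In Dialogue Example 10, although the user selected "A Oil" from among the options, as in speech (38), a misrecognition occurred and "E Oil" was treated as selected, as shown in speech (39). When the user then says "Correction", as in speech (40), the franchise names the user can select are announced again based on the upper-level candidate set, the franchise name, as shown in speech (41).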
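[0106]
(Dialogue Example 11)
Dialogue Example 11 shows another dialogue in which the speech "Correction" is input.
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "Facility search" ... (42)
S: "C Oil, correct? Then please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the right, 5 km ahead on the left, and Other." ... (43)
U: "Correction" ... (44)
S: "Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (45)
U: "Correction" ... (46)
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other." ... (47)
U: "Facility search"
The dialogue after "Facility search" is selected proceeds in the same way as Dialogue Example 5, so its description is omitted here.
[0107]
In Dialogue Example 11, the user first selected "Gas station search" but then wanted to select "Facility search" instead, and so, as in speech (42), said "Facility search", which is not among the options presented by the voice search device 1. Even in this case, as shown in speech (43), the voice search device 1 selects the option closest to the input speech from among the options at that point and continues the processing.
[0108]
When the user says "Correction", as in speech (44), the franchise names the user can select are announced again based on the upper-level candidate set, the franchise name, as shown in speech (45).
When the user says "Correction" once more, as in speech (46), the options the user can select are announced again based on "function", the candidate set at the next level up, as shown in speech (47).
[0109]
In Dialogue Example 11, one of the options at that point is selected even when the voice input does not match any of the presented options. However, when a voice input that does not match the options is made and it is difficult to identify an option, the device may instead report that no valid recognition result was obtained. In this case, the validity of the recognition result is judged in step 106 shown in FIG. 4, and when the character string match rate is very low (for example, 10% or less), the device reports that an option could not be identified. A dialogue in which no valid recognition result is obtained is described concretely below.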
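[0110]
(Dialogue Example 12)
U: Presses the dialogue start button 14.
S: "Please choose from Dining place search, Gas station search, Facility search, Parking search, Audio operation, and Other."
U: "Gas station search"
S: "Gas station search, correct? Then please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other."
U: "H Oil" ... (48)
S: "I am sorry. The word you input could not be recognized. Please choose the franchise name from A Oil, B Oil, C Oil, D Oil, E Oil, and Other." ... (49)
In Dialogue Example 12, the user selected "H Oil", which was not presented as an option, as in speech (48), so the voice search device 1 cannot obtain a valid recognition result. As shown in speech (49), the voice search device 1 therefore reports to the user that an option could not be identified and prompts the user to input an option again.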
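The validity check on the recognition result can be sketched as a threshold on a match rate. In the sketch below, the character-level similarity from Python's `difflib.SequenceMatcher` is only a stand-in for whatever match rate a real recognizer would compute from the speech signal, so near-identical spellings such as "H Oil" and "A Oil" would score high here even though a recognizer might reject them. The function `pick_option` and the 10% default follow the example figure in the text but are otherwise assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def pick_option(heard: str, options: List[str], min_rate: float = 0.10) -> Optional[str]:
    """Return the best-matching option, or None when the match rate is too low."""
    best_option, best_rate = None, 0.0
    for option in options:
        # Stand-in match rate: similarity of the two strings, between 0.0 and 1.0.
        rate = SequenceMatcher(None, heard.lower(), option.lower()).ratio()
        if rate > best_rate:
            best_option, best_rate = option, rate
    # A rate of min_rate or less is treated as "no valid recognition result".
    return best_option if best_rate > min_rate else None


options = ["A Oil", "B Oil", "C Oil"]
print(pick_option("B Oil", options))
print(pick_option("xyz", options))
```

When `None` is returned, the device would announce that the input could not be recognized and present the same options again.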
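[0111]
When speech recognition processing is performed, recognition accuracy may also be raised by taking into account partial matches between the character strings corresponding to the presented options and the character string for the speech input by the user. For example, when selecting the presented option "Gas station search", some users may utter only the part "Gas station". If only the whole character string "Gas station search" is a recognition target in such a case, it matches the uttered "Gas station" only partially, so the match rate is low; the match rate with the other options ("Dining place search" and so on) is of course low as well, so it can be difficult to identify correctly the option the user selected. If, for example, the recognition-target character strings for gas station search are "Gas station search" and "Gas station", and those for dining place search are "Dining place search", "Dining place", "Dining", and so on, both whole and partial matches between the character strings corresponding to the options and the character string for the user's input speech can be judged, and recognition accuracy can be raised.
[0112]
Even when partial matches are taken into account in this way, it is preferable to output the whole character string when replying with the recognition result. For example, even when the user inputs "Gas station", it is preferable to reply with the whole character string, as in "Gas station search, correct?".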
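Registering partial phrases as additional recognition targets while replying with the whole character string might look like the following. The table `PHRASES`, the derived `VOCABULARY`, and the function `confirm` are hypothetical names used only for this sketch.

```python
from typing import Dict, List, Optional

# Each option lists its whole string plus the partial phrases accepted for it.
PHRASES: Dict[str, List[str]] = {
    "Gas station search": ["Gas station search", "Gas station"],
    "Dining place search": ["Dining place search", "Dining place", "Dining"],
}

# Recognition vocabulary: every accepted phrase maps back to its full option.
VOCABULARY: Dict[str, str] = {
    phrase: option for option, phrases in PHRASES.items() for phrase in phrases
}


def confirm(heard: str) -> Optional[str]:
    """Reply with the whole option string even when only part of it was spoken."""
    option = VOCABULARY.get(heard)
    if option is None:
        return None
    return f"{option}, correct?"


print(confirm("Gas station"))
```

Mapping every accepted phrase back to one canonical option keeps the confirmation wording the same regardless of which phrase the user spoke.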
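[0113]
In the embodiments described above, a plurality of options are presented and the user inputs one of them by voice. Alternatively, each option may be presented with a predetermined code attached, and the user may input by voice the code attached to the desired option. Possible codes include numbers such as "1, 2, 3, ..." and letters such as "A, B, C, ...". For example, when a plurality of functions are presented as options, the device outputs guidance speech such as "1: Dining place search, 2: Gas station search, 3: Facility search, 4: Parking search, 5: Audio operation, 6: Other. Please choose the corresponding number." and has the user input one of the numbers 1 to 6 by voice. When predetermined codes are used in this way, the user need only utter the code associated with the desired option, which makes voice input simpler. In addition, the character strings subject to speech recognition processing can be simple ones such as numbers, which improves recognition accuracy.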
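The numbered presentation can be sketched as a pair of small helpers, one building the guidance sentence and one mapping the spoken number back to an option. The names `numbered_prompt` and `select_by_number` and the exact wording of the prompt are assumptions for illustration.

```python
from typing import List, Optional


def numbered_prompt(options: List[str]) -> str:
    """Build a guidance sentence that attaches a number to each option."""
    items = ", ".join(f"{number}: {option}" for number, option in enumerate(options, start=1))
    return f"{items}. Please choose the corresponding number."


def select_by_number(heard: str, options: List[str]) -> Optional[str]:
    """Map a recognized number back to its option; None if it is not a valid code."""
    if heard.isdecimal() and 1 <= int(heard) <= len(options):
        return options[int(heard) - 1]
    return None


functions = ["Dining place search", "Gas station search", "Facility search"]
print(numbered_prompt(functions))
print(select_by_number("2", functions))
```

Only the short codes need to be recognition targets here, which is the simplification the paragraph above points to.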
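[0114]
In the embodiments described above, the user selects one option and makes a voice input after all the options have been presented. However, if the user makes a voice input before all the options have been presented, speech recognition processing may be started at that point. Some users, while listening to the guidance speech, begin their voice input immediately after the desired option has been output. In such a case, starting speech recognition processing promptly, without waiting until all the options have been presented, further improves operability.
[0115]
The embodiments described above concern a voice search device used as a standalone unit, but the voice search device may also be configured by distributing its functions between a server and a terminal device connected via a network.
FIG. 19 shows a configuration example of a voice search device whose functions are distributed between a server and a terminal device connected via a network. The voice search device shown in FIG. 19 consists of a voice search terminal device 4 and a server 5 connected via a predetermined network 6.
[0116]
The voice search terminal device 4 has basically the same configuration as the voice search device 1A of the second embodiment, differing in that a communication processing unit 36 is added. The configuration of the voice search terminal device 4 may instead be made the same as that of the voice search device 1 of the first embodiment.
[0117]
The communication processing unit 36 of the voice search terminal device 4 performs the communication processing for acquiring from the server 5, via the network 6, the information needed to update the data stored in the candidate set DB 26a. The DB update unit 28 updates the data stored in the candidate set DB 26a based on the information received by the communication processing unit 36. This update is performed before the predetermined processing by the voice search terminal device 4.
[0118]
The server 5 comprises a server control unit 50, a candidate set DB 52, and a communication processing unit 54. The server control unit 50 controls the overall operation of the server 5. The candidate set DB 52 stores basically the same data as the candidate set DB 26a of the voice search terminal device 4. The data stored in the candidate set DB 52 is updated to new content as needed.
[0119]
When the voice search terminal device 4 makes a predetermined request, the server control unit 50 extracts from the candidate set DB 52 predetermined difference information containing the changes relative to the content already sent to the voice search terminal device 4, and sends this difference information to the voice search terminal device 4. The communication processing unit 54 performs the communication processing the server 5 needs in order to exchange data with the voice search terminal device 4.
[0120]
Because the content of the candidate set DB 26a of the voice search terminal device 4 can be updated based on the predetermined difference information sent from the server 5 in this way, the voice search terminal device 4 can reflect updated information in its processing. In particular, because the information sent from the server 5 to the voice search terminal device 4 is difference information containing the changes relative to the information sent up to the previous time, the amount of data exchanged is reduced and communication cost can be lowered.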
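The update from difference information might be pictured as follows. The specification does not define the format of the difference information, so the `"deleted"` and `"changed"` fields, the record keys, and the function `apply_diff` below are purely hypothetical.

```python
from typing import Dict


def apply_diff(db: Dict[str, dict], diff: dict) -> Dict[str, dict]:
    """Return a copy of the terminal-side data with the server's changes applied."""
    updated = dict(db)
    for key in diff.get("deleted", []):      # records removed on the server
        updated.pop(key, None)
    updated.update(diff.get("changed", {}))  # records added or modified on the server
    return updated


# Hypothetical terminal-side records keyed by facility name.
local_db = {
    "B Oil Iwaki branch": {"franchise": "B Oil"},
    "C Oil Iwaki branch": {"franchise": "C Oil"},
}
diff = {
    "deleted": ["C Oil Iwaki branch"],
    "changed": {"G Oil Iwaki branch": {"franchise": "G Oil"}},
}
print(sorted(apply_diff(local_db, diff)))
```

Only the changed and deleted records travel over the network in this scheme, which is where the saving in communication cost would come from.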
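[0121]
The server 5 described above has the function of storing the search target items and the information on the search keys corresponding to each of them. The voice search terminal device 4 has the functions corresponding to the recognition-target character string output means, the microphone, the speech recognition processing means, and the item extraction means, and acquires the necessary information from the server 5 before the various kinds of processing that use them.
[0122]
FIG. 20 shows another configuration example of a voice search device whose functions are distributed between a server and a terminal device connected via a network. The voice search device shown in FIG. 20 consists of a voice search terminal device 4A and a server 5A connected via the predetermined network 6.
[0123]
In the voice search device shown in FIG. 20, the components corresponding to the functions realized by the candidate set DB 26 of the voice search device 1 of the first embodiment (or the candidate set DB 26a of the voice search device 1A of the second embodiment), the option setting unit 16, and the selected item determination unit 24 are placed on the server 5A. Specifically, the server 5A comprises a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, and a selected item determination unit 58.
[0124]
The voice search terminal device 4A is the voice search terminal device 4 described above with the option setting unit 16, the selected item determination unit 24, the candidate set DB 26a, and the DB update unit 28 omitted and a control unit 38 added.
When an operation instruction is output to the navigation device 2 or the like in response to speech uttered by the user, the control unit 38 in the voice search terminal device 4A acquires from the server 5A, via the communication processing unit 36, the minimum data needed to present the options. The guidance sentence generation unit 18 generates and outputs a predetermined guidance sentence according to instructions from the control unit 38. The server 5A acquires from the voice search terminal device 4A the speech recognition result for the user's speech, sets the next candidate set, sends the data needed to present the options to the voice search terminal device 4A, extracts the one option finally selected, and so on.
[0125]
The server 5A described above stores the search target items and the information on the search keys corresponding to each of them, and also has the functions of extracting the character strings to be output by voice by the recognition-target character string output means and of extracting search target items as the item extraction means. The voice search terminal device 4A has the functions corresponding to the recognition-target character string output means, the microphone, and the speech recognition processing means, and acquires from the server 5A the information needed for this processing.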
【０１２６】
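FIG. 21 is a diagram showing another configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network. The voice search device shown in FIG. 21 comprises a voice search terminal device 4B and a server 5B connected via a predetermined network 6.
[0127]
The voice search device shown in FIG. 21 differs from the one shown in FIG. 20 in that the function of the guidance sentence generation unit 18 is additionally arranged on the server side. Specifically, the server 5B includes a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, a selection item determination unit 58, and a guidance sentence generation unit 60. The voice search terminal device 4B differs from the voice search terminal device 4A in that the guidance sentence generation unit 18 is removed. In the voice search device shown in FIG. 21, the guidance sentence is generated by the server 5B, so the control unit 38 in the voice search terminal device 4B receives the guidance sentence generated by the server 5B and outputs it to the voice synthesis unit 20. The other operations are the same as those of the voice search device shown in FIG. 20.
[0128]
FIG. 22 is a diagram showing another configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network. The voice search device shown in FIG. 22 comprises a voice search terminal device 4C and a server 5C connected via a predetermined network 6. The voice search device shown in FIG. 22 differs from the one shown in FIG. 21 in that the functions of the voice recognition processing unit 12 and the voice synthesis unit 20 are additionally arranged on the server side. Specifically, the server 5C includes a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, a selection item determination unit 58, a guidance sentence generation unit 60, a voice recognition processing unit 62, and a voice synthesis unit 64. The voice search terminal device 4C differs from the voice search terminal device 4B in that the voice recognition processing unit 12 and the voice synthesis unit 20 are removed.
[0129]
In the voice search device shown in FIG. 22, the user's voice collected by the microphone 10 is converted into digital voice data by the control unit 38 and transmitted to the server 5C. Based on the transmitted voice data, the voice recognition processing unit 62 in the server 5C performs predetermined voice recognition processing. The voice synthesis unit 64 performs predetermined voice synthesis processing on the guidance sentence generated by the guidance sentence generation unit 60 and generates voice data corresponding to the guidance sentence. The generated voice data is transmitted to the voice search terminal device 4C, converted into an analog signal by the control unit 38 in the voice search terminal device 4C, and output to the speaker 22.
[0130]
In the voice search devices of the modifications shown in FIGS. 20 to 22, many functions are arranged on the server side, so the processing load on the voice search terminal device is reduced and its configuration can be simplified, which has the advantage of lowering the cost of the voice search terminal device.
In the embodiments and modifications described above, various forms have been described for the case where the voice search device of the present invention is applied to an in-vehicle system, but the scope of application of the present invention is not limited to in-vehicle systems, and the invention can be applied to various other systems.
[0131]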
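[Effects of the invention]
As described above, according to the present invention, the readings of a plurality of character strings that can serve as search keys are output by voice so that the character strings to be recognized are presented to the user in advance, and only these character strings are subject to voice recognition, so the accuracy of voice recognition can be improved.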
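[Brief description of the drawings]
[FIG. 1] A diagram showing the configuration of an in-vehicle system that includes the voice search device of the first embodiment.
[FIG. 2] A diagram showing the structure of the data stored in the candidate set DB.
[FIG. 3] A diagram showing the correspondence between upper-layer and lower-layer candidate sets in the data structure shown in FIG. 2.
[FIG. 4] A flowchart showing the operation procedure of the voice search device of the first embodiment.
[FIG. 5] A diagram showing the contents of the data used in dialogue example 1.
[FIG. 6] A diagram showing the contents of the data used in dialogue example 3.
[FIG. 7] A diagram showing the contents of the data used in dialogue example 4.
[FIG. 8] A diagram showing the contents of the data used in dialogue example 5.
[FIG. 9] A diagram showing the configuration of an in-vehicle system that includes the voice search device of the second embodiment.
[FIG. 10] A diagram showing the structure of the data stored in the candidate set DB of the second embodiment.
[FIG. 11] A flowchart showing part of the operation procedure of the voice search device of the second embodiment.
[FIG. 12] A flowchart showing the detailed procedure of the processing in step 122.
[FIG. 13] A diagram showing the contents of the records extracted from the candidate set DB in dialogue example 6.
[FIG. 14] A diagram showing the contents of the records extracted from the candidate set DB in dialogue example 7.
[FIG. 15] A diagram showing the contents of the records extracted from the candidate set DB in dialogue example 8.
[FIG. 16] A diagram showing the configuration of the voice search device in a modification in which an option is selected automatically.
[FIG. 17] A flowchart showing part of the operation procedure of the voice search device when an option is selected automatically according to selection frequency.
[FIG. 18] A flowchart showing part of the operation procedure of the voice search device when a previously selected option is cancelled and a new option is selected.
[FIG. 19] A diagram showing a configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network.
[FIG. 20] A diagram showing another configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network.
[FIG. 21] A diagram showing another configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network.
[FIG. 22] A diagram showing another configuration example of the voice search device in which functions are distributed between a server and a terminal device connected via a network.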
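[Explanation of reference numerals]
1, 1A, 1B Voice search device
2 Navigation device
3 Audio device
4, 4A, 4B, 4C Voice search terminal device
5, 5A, 5B, 5C Server
6 Network
10 Microphone
12, 62 Voice recognition processing unit
14 Dialog start button
15 Re-request button
16, 56 Option setting unit
18, 60 Guidance sentence generation unit
20, 64 Voice synthesis unit
22 Speaker
24, 58 Selection item determination unit
26, 26a, 52 Candidate set DB (database)
28 DB update unit
30 Operation instruction output unit
32 Selection frequency learning unit
34 Learning result storage unit
[0001]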
BACKGROUND OF THE INVENTION
The present invention relates to a voice search apparatus that searches various information using voice recognition.
[0002]
[Prior art]
Conventionally, there has been known a voice search device that performs voice recognition processing on voice uttered by a user and searches for various information based on the recognition result.
Such a voice search device is used in combination with an in-vehicle navigation device or the like. For example, consider the case where a navigation device executes a function of searching for various facilities to be used as the destination of a route search. In response to the user's voice, the voice search device (1) identifies the function to be executed, (2) searches for facilities belonging to the facility type specified by the user (for example, a place to eat or a gas station), (3) further searches for the facilities identified by a franchise name or the like specified by the user, and (4) finally extracts the one facility designated by the user.
[0003]
[Problems to be solved by the invention]
In the conventional voice search device, however, each user needs to remember which words can be input by voice. In the above example, accurate voice input is not possible unless the user knows in advance under what names the various functions, facility types, and so on are registered as recognition targets. Many users cannot keep track of all the words that are recognition targets, so they end up trying whatever words come to mind. Because they cannot make voice input with the correct words, there has been a problem that the accuracy of voice recognition may be lowered.
[0004]
The present invention has been created in view of such a point, and an object thereof is to provide a voice search device capable of improving the accuracy of voice recognition.
[0005]
[Means for Solving the Problems]
In order to solve the above-described problem, in the voice search device of the present invention, a search key is associated with each of a plurality of search target items, and the matching item is extracted from the search target items by comparing the content of the user's input voice with the search keys. For this extraction, a maximum number of character strings that can serve as search keys is set, and the readings of a plurality of character strings, within a range not exceeding that number, are output by voice by the recognition target character string output means. The user's voice collected by the microphone is then subjected to predetermined voice recognition processing by the voice recognition processing means, which selects the character string corresponding to this voice from among the character strings that were the target of voice output by the recognition target character string output means. The item extraction means extracts the search target item corresponding to the search key specified by the character string selected by the voice recognition processing means.
[0006]
By outputting by voice the readings of a plurality of character strings that can serve as search keys, the character strings to be recognized are presented to the user in advance. The character string corresponding to the voice that the user inputs in response to this presentation is selected from among the plurality of character strings that were the target of voice output, and the search key is thereby specified, so the accuracy of voice recognition can be improved.
[0007]
Further, it is desirable to further include a switch that is operated before the user speaks, and to start the voice recognition processing by the voice recognition processing means when this switch is operated. Since the voice recognition processing only needs to be started when the switch is operated, the timing for starting the voice recognition processing becomes clear, and the processing can be simplified.
[0008]
Further, it is desirable to further include selected character string confirmation means that outputs by voice the reading of the character string selected by the voice recognition processing means. By outputting the reading of the selected character string by voice, the user can easily confirm the recognition result for his or her own voice input.
[0009]
In addition, it is desirable to further include reselection instruction means that, when the user responds negatively to the character string selected by the voice recognition processing means, instructs the recognition target character string output means to output by voice once more the readings of the plurality of character strings that were used to obtain this selection result. Thus, when a character string different from the one the user wanted is obtained as the selection result, the user can re-enter the search key by responding negatively.
[0010]
Further, when the total number of character strings to be recognized exceeds the predetermined maximum number, it is desirable that the recognition target character string output means output the readings by voice in several rounds, each containing a number of character strings that does not exceed the maximum number, and that the voice recognition processing means perform the selection determination of the character string for each voice output. Even when there are many character strings to be recognized, they are output by voice in groups of a predetermined number, so the user only needs to pay attention to that predetermined number of character strings at a time and can reliably select the desired character string.
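The division into several rounds of voice output can be illustrated with a minimal Python sketch. The function name `split_into_batches`, the constant `MAX_OPTIONS`, and the sample option names are hypothetical and do not appear in the specification; the value 5 is assumed here only because it lies within the 7 ± 2 range mentioned elsewhere in the description.

```python
from typing import List

MAX_OPTIONS = 5  # assumed maximum number of character strings per voice output


def split_into_batches(readings: List[str], max_count: int = MAX_OPTIONS) -> List[List[str]]:
    """Split the readings into consecutive groups of at most max_count items."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    return [readings[i:i + max_count] for i in range(0, len(readings), max_count)]


# Twelve hypothetical options are presented in three rounds of 5, 5 and 2.
batches = split_into_batches([f"option {n}" for n in range(1, 13)])
```

Each element of `batches` would correspond to one guidance voice, after which a selection determination is made before the next group is presented.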
[0011]
In addition, it is preferable that the apparatus further includes voice output instruction means for instructing the recognition target character string output means for the second or subsequent voice output when the user instructs voice output of another selection candidate. Thereby, the user can easily obtain other selection candidates.
[0012]
In addition, it is desirable to further include re-voice output instruction means that, when the user instructs that the voice be output again, instructs the recognition target character string output means to output by voice once more the readings of the plurality of character strings that were output immediately before. Thereby, when the user misses the content of the voice output, it can be output again and its content confirmed.
[0013]
Further, it is desirable to further include character string selection means that selects a character string without using the result of the voice recognition processing by the voice recognition processing means when the user instructs the device to perform the character string selection operation, and that, when a character string has been selected by the character string selection means, the item extraction means perform the item extraction operation using that character string instead of the character string selected by the voice recognition processing means. Since the user can leave the selection of the character string to the voice search device by giving an instruction to “keep”, the operation can be simplified in cases where any character string would do.
[0014]
In addition, when a plurality of search keys are associated with each search target item and the item extraction means cannot narrow the candidates down to one search target item with a single search key, it is desirable to repeat the processing by the recognition target character string output means, the voice recognition processing means, and the item extraction means using other search keys until one search target item is identified. Thereby, the candidates can be reliably narrowed down to one search target item.
[0015]
Further, it is desirable that table storage means store table information in which a different priority is associated with each of the plurality of search keys and in which a plurality of character strings corresponding to the plurality of search keys are associated with each of the plurality of search target items, and that the recognition target character string output means, based on the table information stored in the table storage means, output by voice the readings of the corresponding character strings in order from the search key with the highest priority. Since contents can be added or changed for each search key for which a priority is set, the data can be updated easily.
[0016]
Alternatively, tree structure storage means may store multi-layer tree structure information indicating, for the case where a character string corresponding to one search key is selected, the search key to be selected next and the character strings corresponding to it. Based on the tree structure information stored in the tree structure storage means, the recognition target character string output means may extract the plurality of character strings corresponding to the search key that is the next target of voice output, and output the readings of these character strings by voice. Since the character strings to be output next can be extracted simply by following the tree structure in order from the upper layer, the processing can be simplified.
[0017]
Further, it is desirable to further include selection history storage means for storing history information on past selections made by the voice recognition processing means, and that the recognition target character string output means identify character strings with a high selection frequency based on the selection history information stored in the selection history storage means and preferentially output the readings of those character strings by voice. By giving priority in voice output to character strings that are selected more frequently, such character strings can be selected with fewer voice inputs, which improves operability.
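As a rough illustration of giving priority to frequently selected character strings, the following Python sketch reorders options by how often they appear in a selection history. The function name and the sample history are hypothetical, and the tie-breaking rule (options with equal counts keep their original priority order) is an assumption rather than something the specification states.

```python
from collections import Counter
from typing import Iterable, List


def order_by_selection_frequency(options: List[str], history: Iterable[str]) -> List[str]:
    """Return the options with the most frequently selected ones first."""
    counts = Counter(history)
    # sorted() is stable, so options with equal counts keep their original order.
    return sorted(options, key=lambda option: -counts[option])


options = ["meal location search", "gas station search", "facility search"]
history = ["gas station search", "facility search", "gas station search"]
ordered = order_by_selection_frequency(options, history)
```

With this history, "gas station search" would be announced first, so the user could select it after hearing only the beginning of the guidance.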
[0018]
In addition, when each of the plurality of character strings consists of a single sound of the Japanese syllabary (the fifty sounds), it is desirable that the item extraction means extract the search keys whose first sound matches the sound selected by the voice recognition processing means. Even if there are many character strings that are selection candidates, the candidates can be narrowed down easily.
[0019]
In addition, it is desirable that the voice recognition processing means selects a character string by comparing all characters constituting the character string with the entire voice recognition processing result. Since it is only necessary to select a character string in consideration of only a character string and a speech recognition result that completely match, the comparison process becomes easy and the process can be simplified.
[0020]
Alternatively, the voice recognition processing means may select a character string by comparing the characters that make up part of the character string with the entire voice recognition result. Because the comparison takes into account the characters that make up only part of the character string, when a character string has a distinctive part, the user can input just that distinctive, easy-to-remember part, which improves operability.
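The difference between whole-string comparison and partial comparison can be sketched in Python as follows. Both function names are hypothetical, and the rule that a partial match is accepted only when exactly one candidate contains the recognized text is an assumption made for this sketch; the specification does not say how ambiguous partial matches are handled.

```python
from typing import List, Optional


def select_exact(recognized: str, candidates: List[str]) -> Optional[str]:
    """Select a candidate only if it matches the recognition result completely."""
    for candidate in candidates:
        if candidate == recognized:
            return candidate
    return None


def select_partial(recognized: str, candidates: List[str]) -> Optional[str]:
    """Select the candidate that contains the recognized text, if it is unique."""
    hits = [c for c in candidates if recognized and recognized in c]
    return hits[0] if len(hits) == 1 else None


candidates = ["meal location search", "gas station search", "parking location search"]
exact_hit = select_exact("gas station", candidates)      # no complete match
partial_hit = select_partial("gas station", candidates)  # distinctive part only
ambiguous = select_partial("location", candidates)       # two candidates contain it
```

The exact comparison keeps the matching step trivial, while the partial comparison lets a short distinctive phrase stand in for the full option name.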
[0021]
Further, it is preferable that, when the user's voice is collected by the microphone before the voice output by the recognition target character string output means has finished, the voice recognition processing means start the character string selection operation from that point in time. When the user wants to select a recognition target character string that is announced early in the voice output, the desired character string can be input without waiting for the whole voice output to finish, so operability can be further improved.
[0022]
The maximum number of character strings that can serve as search keys is preferably set in the range of 7 ± 2. According to the theory of short-term memory in cognitive psychology, if a unit of information that forms some kind of coherent grouping is defined as a “chunk”, the amount of information a person can hold at one time is said to be approximately 7 ± 2 chunks. For example, when memorizing a telephone number, each digit of the number basically corresponds to one chunk. However, when the digit string “2983” is memorized by associating it with the word “butcher” (in Japanese, the digits 2-9-8-3 can be read “ni-ku-ya-san”, which sounds like the word for butcher), the information “butcher” corresponds to one chunk. Therefore, by setting the maximum number of character strings that can serve as search keys in the range of 7 ± 2 based on this concept of the “chunk”, the user can reliably remember the character strings that can serve as search keys. Details of the “chunk” are described, for example, on page 75 of Cognitive Psychology 2: Memory, edited by Yotaro Takano (University of Tokyo Press, 1995).
[0023]
Further, the voice search device may be configured by distributing functions to servers and terminal devices connected via a network. Specifically, a function for storing information on search target items and search keys corresponding to the search target items is arranged in the server, and the terminal device includes a recognition target character string output unit, a microphone, a voice recognition processing unit, and an item extraction unit. It is preferable to configure the voice search device by arranging the corresponding function and acquiring the necessary information from the server by the terminal device prior to various processes. Since the terminal device acquires information necessary for various processes from the server, the terminal device can acquire new information with updated contents from the server by communication and reflect the information in the various processes.
[0024]
Further, the information sent from the server to the terminal device is preferably difference information including changes to the information sent so far. When the contents are changed, only the difference information including the changed contents needs to be acquired, and the communication cost can be reduced.
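One way such difference information could be produced and applied is sketched below in Python. The classes `Server`, `Terminal`, and `Diff`, the version counter, and the change log are all hypothetical design choices for illustration; the specification only states that the server sends the changes relative to what was sent before.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Diff:
    to_version: int
    upserts: Dict[str, List[str]] = field(default_factory=dict)
    deletes: List[str] = field(default_factory=list)


class Server:
    """Keeps a change log of the candidate data (hypothetical design)."""

    def __init__(self) -> None:
        self.version = 0
        self.log: List[Tuple[int, str, Optional[List[str]]]] = []

    def put(self, key: str, options: List[str]) -> None:
        self.version += 1
        self.log.append((self.version, key, list(options)))

    def delete(self, key: str) -> None:
        self.version += 1
        self.log.append((self.version, key, None))

    def diff_since(self, client_version: int) -> Diff:
        """Collect only the changes made after client_version."""
        upserts: Dict[str, List[str]] = {}
        deletes: set = set()
        for version, key, value in self.log:
            if version <= client_version:
                continue
            if value is None:
                upserts.pop(key, None)
                deletes.add(key)
            else:
                deletes.discard(key)
                upserts[key] = list(value)
        return Diff(self.version, upserts, sorted(deletes))


class Terminal:
    def __init__(self) -> None:
        self.version = 0
        self.data: Dict[str, List[str]] = {}

    def apply(self, diff: Diff) -> None:
        for key in diff.deletes:
            self.data.pop(key, None)
        self.data.update(diff.upserts)
        self.version = diff.to_version


server, terminal = Server(), Terminal()
server.put("gas station search", ["A Petroleum", "B Petroleum"])
terminal.apply(server.diff_since(terminal.version))  # first synchronization

server.put("meal location search", ["restaurant a"])
second = server.diff_since(terminal.version)  # contains only the new entry
terminal.apply(second)
```

In this sketch the second transfer carries one entry instead of the whole data set, which is the source of the communication cost reduction described above.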
[0025]
In addition, when the voice search device is configured by distributing functions between a server and a terminal device connected via a network, the server may be given the functions of storing information on the search target items and the search keys corresponding to each of them, of extracting the character strings to be output by voice by the recognition target character string output means, and of extracting the search target items by the item extraction means, while the terminal device is given the functions corresponding to the recognition target character string output means, the microphone, and the voice recognition processing means and acquires the information necessary for these processes from the server. By placing many functions on the server side, the processing load on the terminal device is reduced and its configuration can be simplified, so the cost of the terminal device can be reduced.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a speech search apparatus according to an embodiment to which the present invention is applied will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram illustrating the configuration of an in-vehicle system that includes the voice search device 1 according to the first embodiment. The in-vehicle system shown in FIG. 1 comprises a voice search device 1 that determines and outputs various operation instructions interactively in response to voice uttered by the user; a navigation device 2 that detects the position of the vehicle and displays a map, or performs route search and route guidance to a destination selected by the user; and an audio device 3 that reproduces music recorded on a recording medium such as a compact disc or a MiniDisc.
[0027]
Next, the detailed configuration of the voice search device 1 will be described. The voice search device 1 shown in FIG. 1 includes a microphone 10, a voice recognition processing unit 12, a dialog start button 14, a re-request button 15, an option setting unit 16, a guidance sentence generation unit 18, a voice synthesis unit 20, a speaker 22, a selection item determination unit 24, a candidate set database (DB) 26, a DB update unit 28, and an operation instruction output unit 30.
[0028]
The microphone 10 collects the voice uttered by the user and converts it into a voice signal.
The voice recognition processing unit 12 analyzes a voice signal output from the microphone 10 to perform a predetermined voice recognition process, and specifies a character string corresponding to the voice uttered by the user. The voice recognition processing unit 12 according to the present embodiment performs a predetermined recognition process using a character string corresponding to a predetermined number of options set by the option setting unit 16 as a recognition target.
[0029]
The dialog start button 14 is a push button switch that is pressed when the user starts a dialog with the voice search device 1. The re-request button 15 is a push button switch that is pressed when the user wants to hear the voice output from the voice search device 1 again.
[0030]
The option setting unit 16 sets a predetermined number of options to be presented as candidates when performing voice input based on the data stored in the candidate set DB 26. This predetermined number is desirably set within a range of 7 ± 2 in one presentation opportunity, and in this embodiment, five options are set. Details of processing performed by the option setting unit 16 will be described later.
[0031]
The guidance sentence generation unit 18 generates the content of the guidance voice to be output to the user, that is, the guidance sentence, based on the predetermined number of options set by the option setting unit 16.
The voice synthesis unit 20 generates a voice signal for performing voice output corresponding to the guidance sentence generated by the guidance sentence generation unit 18 and outputs the voice signal to the speaker 22. The speaker 22 outputs a guidance voice based on the inputted voice signal.
The selection item determination unit 24 determines which item is selected by the user from a predetermined number of options based on the character string of the recognition result output from the speech recognition processing unit 12.
[0032]
The candidate set DB 26 stores data necessary for the option setting unit 16 to set a plurality of options.
FIG. 2 is a diagram illustrating the structure of data stored in the candidate set DB 26. As shown in FIG. 2, the candidate set DB 26 stores a predetermined candidate set (tree structure information) having a hierarchical structure. Each candidate set includes a predetermined number of options. The candidate set of the highest hierarchy includes a plurality of functions that can be executed by the navigation device 2 or the like as options. The candidate set for the second and subsequent layers includes a plurality of options associated with one of the plurality of options included in the candidate set for the upper layer.
[0033]
FIG. 3 is a diagram illustrating the correspondence between upper-layer and lower-layer candidate sets in the data structure shown in FIG. 2. For example, the candidate set 100 in the highest hierarchy includes the options “meal location search”, “gas station search”, “facility search”, “parking location search”, “audio operation”, and “others”. These options are arranged according to a predetermined priority, and when a guidance voice announcing them is generated, the options are announced in descending order of priority. For example, in the candidate set 100 shown in FIG. 3, “meal location search” has the highest priority, so the guidance voice generated from this candidate set announces the options in the order “meal location search”, “gas station search”, and so on. A specific example of how the options are announced will be described later. The same applies to the other candidate sets.
[0034]
In addition, another candidate set 100a in the same hierarchy is associated with the option “others”, and this candidate set 100a includes options such as “traffic information” and “map display”. If still further options exist, a new candidate set is provided and associated with the “others” option of the candidate set 100a.
[0035]
For each option other than “others” in the candidate set 100 and the like, a candidate set containing a plurality of options is provided in the lower hierarchy in association with that option. For example, the candidate set 102 exists in the lower hierarchy in association with “meal location search” in the candidate set 100, and it includes as options a plurality of franchise names and the like, such as “restaurant a”. Similarly, the candidate set 104 exists in the lower hierarchy in association with “gas station search” in the candidate set 100, and it includes as options a plurality of franchise names, such as “A Petroleum”, for selecting a gas station.
[0036]
As described above, in this embodiment, the process of selecting one option from a candidate set and moving to the lower-hierarchy candidate set associated with that option is repeated, starting from the candidate set of the highest hierarchy. The content of the operation instruction is determined by selecting one of the options included in the candidate set of the lowest hierarchy. In this case, the options included in the candidate sets of the upper hierarchies correspond to the “search keys”, and the options included in the candidate set of the lowest hierarchy correspond to the “search target items”. Note that FIG. 2 shows candidate sets with a four-level hierarchical structure, but this is only an example, and the number of levels increases or decreases depending on the content of the operation instruction.
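The hierarchical candidate sets described above can be pictured with a small Python sketch. The class `CandidateSet`, the function `descend`, and the sample data (which loosely follows the gas station example) are hypothetical; the actual record layout of the candidate set DB 26 is the one shown in FIG. 2 and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CandidateSet:
    options: List[str]  # listed in descending order of priority
    children: Dict[str, "CandidateSet"] = field(default_factory=dict)

    def next_set(self, option: str) -> Optional["CandidateSet"]:
        """Return the lower-hierarchy candidate set for an option, if any."""
        return self.children.get(option)


def descend(root: CandidateSet, selections: List[str]) -> Tuple[str, Optional[CandidateSet]]:
    """Follow the selections from the top; return the last option and what lies below it."""
    current: Optional[CandidateSet] = root
    chosen = ""
    for choice in selections:
        if current is None or choice not in current.options:
            raise ValueError(f"{choice!r} is not an option at this level")
        chosen = choice
        current = current.next_set(choice)
    return chosen, current


positions = CandidateSet(["2 km ahead on the right", "3 km ahead on the left"])
franchises = CandidateSet(["A Petroleum", "B Petroleum"], {"B Petroleum": positions})
top = CandidateSet(
    ["meal location search", "gas station search"],
    {"gas station search": franchises},
)

item, below = descend(top, ["gas station search", "B Petroleum", "2 km ahead on the right"])
```

When `below` is `None`, no lower candidate set exists, so the last selected option plays the role of the search target item; the earlier selections play the role of search keys.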
[0037]
The DB update unit 28 acquires the detection result of the vehicle position from the navigation device 2 and, based on this, updates the data (position data) stored in the candidate set DB 26 concerning the positions of facilities such as meal locations, gas stations, and parking lots. For example, when a gas station search is performed and the franchise name of the facility to be searched is selected, the DB update unit 28 extracts, from the stores corresponding to that franchise name, the stores located within a predetermined range centered on the vehicle position at that time, calculates position data for the extracted stores, and updates the contents of the candidate set DB 26.
[0038]
The operation instruction output unit 30 outputs a predetermined operation instruction to the navigation device 2 or the audio device 3 in accordance with the content of the item finally selected by repeating the process of selecting one of a plurality of options.
The option setting unit 16, the guidance sentence generation unit 18, the voice synthesis unit 20, and the speaker 22 described above correspond to the recognition target character string output means and the selected character string confirmation means; the voice recognition processing unit 12 corresponds to the voice recognition processing means; the option setting unit 16 and the selection item determination unit 24 correspond to the item extraction means; the dialog start button 14 corresponds to the switch; the re-request button 15 corresponds to the re-voice output instruction means; the candidate set DB 26 corresponds to the tree structure storage means; and the selection item determination unit 24 corresponds to the voice output instruction means.
[0039]
The voice search device 1 of this embodiment has such a configuration, and the operation thereof will be described next.
FIG. 4 is a flowchart illustrating an operation procedure of the voice search device 1 according to the first embodiment. An operation procedure for outputting an operation instruction to the navigation device 2 corresponding to the voice uttered by the user is shown.
[0040]
The option setting unit 16 determines whether or not the dialogue start button 14 has been pressed by the user (step 100). If the dialogue start button 14 has not been pressed, a negative determination is made and the processing of step 100 is repeated. When the dialogue start button 14 is pressed by the user, an affirmative determination is made, and the option setting unit 16 sets the candidate set of the highest hierarchy as the first candidate set, using the data stored in the candidate set DB 26 (step 101).
[0041]
Next, the option setting unit 16 notifies the speech recognition processing unit 12 and the guidance sentence generating unit 18 of character strings corresponding to a plurality of options included in the candidate set (step 102).
The guide sentence generator 18 generates a predetermined guide sentence that guides a plurality of options included in the candidate set, and outputs it to the speech synthesizer 20. A voice signal corresponding to the guidance sentence is generated by the voice synthesizer 20 and output to the speaker 22, and a guidance voice for presenting options is output from the speaker 22 (step 103).
[0042]
The option setting unit 16 determines whether or not the re-request button 15 has been pressed by the user (step 104). If the re-request button 15 has been pressed, an affirmative determination is made in step 104, the process returns to step 103, and the subsequent processing is repeated. Specifically, the option setting unit 16 notifies the guidance sentence generation unit 18 that re-output of the guidance sentence has been requested. In response to this notification, the guidance sentence generation unit 18 outputs the guidance sentence generated during the previous processing to the voice synthesis unit 20 again. As a result, the guidance voice is output again.
[0043]
If the re-request button 15 has not been pressed, a negative determination is made in step 104, and the voice recognition processing unit 12 determines whether or not the user has made a voice input, based on the presence or absence of a voice signal output from the microphone 10 (step 105). If no voice input has been made, a negative determination is made in step 105. In this case, the process returns to step 104 described above, and the subsequent processing is repeated.
[0044]
If a voice input has been made, an affirmative determination is made in step 105, and the voice recognition processing unit 12 performs predetermined voice recognition processing with only the character strings corresponding to the plurality of options notified from the option setting unit 16 as recognition targets, and identifies the one option selected by the user (step 106). In the present embodiment, “others” is also treated as a recognition target, as one of the options, in addition to the plurality of options notified from the option setting unit 16.
[0045]
The selection item determination unit 24 determines whether or not “others” has been selected from the options, based on the voice recognition result output from the voice recognition processing unit 12 (step 107). If “others” has not been selected, a negative determination is made in step 107, and the selection item determination unit 24 sends an instruction to the option setting unit 16 and determines whether or not there is a next candidate set (a candidate set in the lower hierarchy) corresponding to the option selected by the user (step 108).
[0046]
If there is a next candidate set, an affirmative determination is made in step 108, and the selection item determination unit 24 instructs the option setting unit 16 to set the next candidate set. Upon receiving the instruction, the option setting unit 16 sets the next candidate set (step 109). Thereafter, the processing returns to step 102 and the subsequent processing is performed.
[0047]
If “others” has been selected from the options, an affirmative determination is made in step 107 described above, and the selection item determination unit 24 notifies the option setting unit 16 that the next options should be set. Upon receiving the notification, the option setting unit 16 determines whether or not the next options exist, based on the data stored in the candidate set DB 26 (step 110). If the next options exist, an affirmative determination is made in step 110, and the option setting unit 16 sets the next options (step 111). Thereafter, the process returns to step 102 described above, the next options are notified to the voice recognition processing unit 12 and the guidance sentence generation unit 18, and the subsequent processing is performed.
[0048]
If the next options do not exist, a negative determination is made in step 110. In this case, the option setting unit 16 instructs the guidance sentence generation unit 18 to generate a guidance sentence announcing that there are no further options. Upon receiving the instruction, the guidance sentence generation unit 18 generates a predetermined guidance sentence and outputs it to the voice synthesis unit 20, and a guidance voice notifying the user that there are no further options is output from the speaker 22 (step 112). Thereafter, the process returns to step 103 described above, the options announced during the previous processing are presented to the user again, and the subsequent processing is repeated.
[0049]
If the next candidate set does not exist in the determination of step 108 described above, a negative determination is made, and the selection item determination unit 24 notifies the operation instruction output unit 30 of the content of the item finally selected by the user. Upon receiving the notification, the operation instruction output unit 30 outputs an operation instruction corresponding to the content of the item selected by the user to the navigation device 2 or the like (step 113).
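The flow of steps 100 to 113 can be summarized with the following Python sketch, which replaces buttons, microphone, and speaker with a list of simulated events. The data tables `SETS` and `CHILDREN`, the event tuples, and the function `run_dialogue` are all hypothetical, and ignoring an utterance that is not among the presented options is an assumption of this sketch rather than behavior stated in the description.

```python
from typing import Dict, List, Optional, Tuple

OTHERS = "others"

# Hypothetical data: each candidate set is a list of option groups, where a
# later group is what selecting "others" leads to.
SETS: Dict[str, List[List[str]]] = {
    "top": [["meal location search", "gas station search"],
            ["traffic information", "map display"]],
    "gas": [["A Petroleum", "B Petroleum"]],
}
CHILDREN: Dict[str, str] = {"gas station search": "gas"}

Event = Tuple[str, str]  # ("button", "start" | "again") or ("voice", text)


def run_dialogue(events: List[Event]) -> Tuple[List[List[str]], Optional[str]]:
    """Return every guidance that was announced and the final operation instruction."""
    prompts: List[List[str]] = []
    stream = iter(events)
    for event in stream:                              # step 100: wait for start
        if event == ("button", "start"):
            break
    else:
        return prompts, None
    set_name, group = "top", 0                        # step 101
    while True:
        options = SETS[set_name][group]               # step 102
        prompts.append(list(options))                 # step 103
        selected = None
        for kind, value in stream:
            if kind == "button" and value == "again":  # step 104
                prompts.append(list(options))
            elif kind == "voice" and value in options + [OTHERS]:  # steps 105-106
                selected = value
                break
        if selected is None:
            return prompts, None                      # no more simulated events
        if selected == OTHERS:                        # step 107
            if group + 1 < len(SETS[set_name]):       # step 110
                group += 1                            # step 111
            else:
                prompts.append(["no further options"])  # step 112
            continue
        child = CHILDREN.get(selected)                # step 108
        if child is None:
            return prompts, selected                  # step 113
        set_name, group = child, 0                    # step 109


prompts, instruction = run_dialogue([
    ("button", "start"),
    ("voice", "gas station search"),
    ("button", "again"),
    ("voice", "B Petroleum"),
])
```

In this run the top-level options are announced once, the franchise names are announced twice because of the re-request, and "B Petroleum" is returned as the finally selected item.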
[0050]
Next, a dialogue performed between the voice search device 1 and the user will be specifically described in accordance with the processing shown in FIG. 4 described above. In the following dialogue examples, “U” denotes the user and “S” denotes the voice search device 1. Together with the dialogue examples, the drawings showing the contents of the data read from the candidate set DB 26 will be referred to as appropriate.
[0051]
(Dialogue example 1)
Dialogue example 1 shows a dialogue example when searching for the nearest gas station. FIG. 5 is a diagram showing the contents of data used in Dialogue Example 1.
U: Press the dialogue start button 14.
S: “Please select from meal location search, gas station search, facility search, parking lot search, audio operation, or others.” … (1)
U: “Gas station search” … (2)
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.” … (3)
U: “B Petroleum” … (4)
S: “B Petroleum. Please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, or others.” … (5)
U: “2 km ahead on the right” … (6)
S: “2 km ahead on the right. The destination will be set to B Petroleum Iwaki Store.” … (7)
As shown in FIG. 5, when the dialogue start button 14 is pressed by the user, the candidate set of the highest hierarchy is read out first, and a plurality of functions that can be selected by the user are guided as in the voice (1) described above.
[0052]
When the user selects “gas station search” in response to this voice, as shown in the voice (2) described above, the candidate set in the lower hierarchy corresponding to this gas station search is read out, and a plurality of franchise names that can be selected by the user are guided as in the voice (3) described above.
[0053]
Here, as shown in the voice (4) described above, when the user selects the franchise name “B Petroleum”, the candidate set in the lower hierarchy corresponding to this B Petroleum is read out, and the position (relative distance) of each facility with respect to the vehicle position is guided as in the voice (5) described above.
[0054]
Next, as shown in the voice (6) described above, when the position “2 km ahead on the right” is selected by the user, “B Petroleum Iwaki Store”, which is the one gas station corresponding to the selected position, is specified. As shown in the voice (7) described above, this gas station is set as the destination of the route search, and the series of processing ends.
[0055]
(Dialogue example 2)
Dialogue example 2 shows a dialogue example in the case where the nearest gas station is searched for in the same manner as in dialogue example 1 described above and the re-request button 15 is pressed. The contents of the data used in dialogue example 2 are the same as those used in dialogue example 1.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.”
U: Press the re-request button 15.
S: “The input is gas station search. Then select the franchise name from A, B, C, D, E, or others.” … (8)
U: “B Petroleum”
S: “B Petroleum. Please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, or others.”
U: “2 km ahead on the right”
S: “2 km ahead on the right. The destination will be set to B Petroleum Iwaki Store.”
As shown in the voice (8) in the above dialogue example, when the re-request button 15 is pressed by the user, the contents of the candidate set guided immediately before are guided again.
[0056]
In this case, it is desirable to change the content of the guidance sentence between the first time and the second time. In the above example, the first guidance sentence is “Search for gas stations. Then …”, whereas the second guidance sentence is changed to “The input is gas station search. Then …”. In addition, since the re-request may have been made because the guidance voice was difficult to hear, the utterance speed of the second voice may be made slower than that of the first voice.
[0057]
(Dialogue example 3)
Dialogue example 3 shows a dialogue example in the case where the nearest gas station is searched in the same manner as in dialogue example 1 described above, and “others” is selected from the options. FIG. 6 is a diagram showing the contents of data used in Dialogue Example 3.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.”
U: “Others” … (9)
S: “Please select from F, G, H, I, J, or others.” … (10)
U: “G Petroleum”
S: “G Petroleum. Please choose from 2 km ahead on the right, 2.5 km ahead on the left, 3 km ahead on the left, 5 km ahead on the right, or others.”
U: “2 km ahead on the right”
S: “2 km ahead on the right. The destination will be set to G Petroleum Iwaki Store.”
As shown in the voice (9) in the above dialogue example, when the user selects “others” from the options, a candidate set including the next options in the same hierarchy is read out in response, and further franchise names that can be selected by the user are guided as shown in the voice (10) described above.
[0058]
(Dialogue example 4)
As in dialogue example 3 described above, dialogue example 4 shows a dialogue example in the case where “others” is selected from the options, but here there is no next candidate set in the same hierarchy. FIG. 7 is a diagram showing the contents of data used in dialogue example 4.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.”
U: “Others”
S: “Please choose from F, G, H, I, J, or others.”
U: “Others” … (11)
S: “I'm sorry. There are no other candidates.” … (12)
As shown in the voice (11) in the dialogue example described above, when “others” is selected from the options by the user and there is no next candidate set in the same hierarchy, the user is informed, as shown in the voice (12) described above, that there are no more options that can be selected.
[0059]
(Dialogue example 5)
Dialog example 5 shows an example dialog when searching for a desired facility. FIG. 8 is a diagram showing the contents of data used in the dialogue example 5.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Facility search” … (13)
S: “Facility search. Then select the region of the facility from Hokkaido, Tohoku region, Kanto region, Chubu region, Kinki region, or others.” … (14)
U: “Tohoku region” … (15)
S: “Tohoku region. Please select the prefecture of the facility from Fukushima, Akita, Iwate, Miyagi, Aomori, or others.” … (16)
U: “Fukushima Prefecture” … (17)
S: “Fukushima Prefecture. Please select the first character of the facility name from A row, KA row, SA row, TA row, NA row, or others.” … (18)
U: “A row” … (19)
S: “A row. Please select from R-Pi, R-Pi Giken, R-Pi Information System, R-Pi Office, R-Pi Logistics, or others.” … (20)
U: “R-Pi” … (21)
S: “R-Pi. The destination will be set to R-Pi.” … (22)
In this way, a plurality of selectable functions are guided by voice, and the user selects the target function in response to the voice. As shown in the voice (13) described above, when “facility search” is selected by the user, the lower-layer candidate set corresponding to this facility search is read out, and as shown in the voice (14) described above, the regions where the facility may be located are guided as options.
[0060]
Here, as shown in the voice (15) described above, when “Tohoku region” is selected by the user as the region where the facility is located, the candidate set of the lower hierarchy corresponding to this “Tohoku region” is read out, and the names of the prefectures where the facility may be located are guided as in the voice (16) described above.
[0061]
As shown in the voice (17) described above, when “Fukushima Prefecture” is selected by the user, the corresponding lower-layer candidate set is read out, and as shown in the voice (18), the first character of the facility name (A row, KA row, etc.) is guided as an option.
As shown in the voice (19) described above, when “A row” is selected by the user, the corresponding lower-layer candidate set is read out, and as shown in the voice (20) described above, the names of the facilities that are located in “Fukushima Prefecture” and whose first character belongs to the “A row” are guided as options.
[0062]
Here, as shown in the voice (21) described above, when the facility name “R-Pi” is selected by the user, “R-Pi”, which is one facility, is specified. As shown in the voice (22) described above, this facility is set as the destination of the route search, and the series of processing ends.
[0063]
As described above, in the first embodiment, data including candidate sets having a predetermined hierarchical structure is stored in the candidate set DB 26. Based on these candidate sets, a plurality of character strings corresponding to the options that are the next voice output target are extracted, and the readings of these character strings are output as voice. The character strings to be recognized are thus presented to the user in advance, and the character string corresponding to the voice input by the user in response to this presentation is selected from among the plurality of character strings that were the targets of the voice output, thereby specifying the option selected by the user. Accordingly, the accuracy of voice recognition can be improved. In particular, the character strings to be output next can be extracted simply by following the hierarchically structured candidate sets in order from the upper layer, so that the processing can be simplified.
[Second Embodiment]
In the first embodiment described above, candidate sets having a hierarchical structure are prepared in advance and stored in the candidate set DB 26. However, processing similar to that of the first embodiment can also be performed using a database having a general table format structure.
[0064]
FIG. 9 is a diagram illustrating a configuration of a vehicle-mounted system configured to include the voice search device 1A of the second embodiment. The voice search device 1A according to the second embodiment shown in FIG. 9 is different from the voice search device 1 according to the first embodiment described above in that the candidate set DB 26 is replaced with a candidate set DB 26a having different data contents. The operation procedure for narrowing down the content of the operation instruction corresponding to the voice uttered by the user is different with the change of the data content. Hereinafter, description will be made mainly focusing on differences from the first embodiment.
[0065]
The candidate set DB 26a stores data necessary for the option setting unit 16 to set a plurality of options.
FIG. 10 is a diagram illustrating a structure of data stored in the candidate set DB 26a according to the second embodiment. As shown in FIG. 10, the data stored in the candidate set DB 26a of the second embodiment is in a table format, unlike the case of the first embodiment described above.
[0066]
The data (table information) stored in the candidate set DB 26a is composed of three elements “priority”, “candidate set title”, and “option”.
The “priority” has the same meaning as the hierarchy in the first embodiment described above. That is, when deciding on an operation instruction, one option is selected from a plurality of options in order, starting from the “function” of priority 1. A specific example of the processing for presenting and selecting options will be described later. Note that the options associated with priorities 1 to 6 shown in FIG. 10 correspond to the “search key”, and the options of priority 7 that are finally specified correspond to the “search target items”.
[0067]
In the candidate set DB 26a, one horizontal row is one data group (hereinafter referred to as a “record”). For example, the record in the first row shown in FIG. 10 is a data group related to the facility name “B Petroleum Iwaki Store”. It indicates that the function is “gas station search”, the franchise name is “B Petroleum”, the region where the facility is located is the “Tohoku region”, the prefecture where the facility is located is “Fukushima”, and the first character of the facility name belongs to the “HA row”. The content of the position is updated by the DB update unit 28 described above. The candidate set DB 26a includes a plurality of such records. This candidate set DB 26a corresponds to the table storage means.
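The record layout described above might be modeled as follows. This is only an illustrative Python sketch: the names PRIORITIES, RECORDS, narrow, search_keys, and search_target are hypothetical, the assignment of “region”, “prefecture”, and “first character” to priorities 3 to 5 is an assumption (the text fixes only priorities 1, 2, 6, and 7), and the second record is invented sample data.

```python
# Candidate set titles in priority order 1..7.
# Priorities 3-5 are an assumed ordering; the text names only 1, 2, 6, and 7.
PRIORITIES = ["function", "franchise name", "region", "prefecture",
              "first character", "position", "facility name"]

RECORDS = [
    {"function": "gas station search", "franchise name": "B Petroleum",
     "region": "Tohoku region", "prefecture": "Fukushima",
     "first character": "HA row", "position": "2 km ahead on the right",
     "facility name": "B Petroleum Iwaki Store"},
    # Hypothetical second record, for illustration only.
    {"function": "gas station search", "franchise name": "G Petroleum",
     "region": "Tohoku region", "prefecture": "Fukushima",
     "first character": "SA row", "position": "3 km ahead on the left",
     "facility name": "G Petroleum Iwaki Store"},
]


def narrow(records, title, option):
    """Keep only the records whose column `title` equals the selected option."""
    return [r for r in records if r[title] == option]


def search_keys(record):
    """Options of priorities 1 to 6, which act as search keys."""
    return {title: record[title] for title in PRIORITIES[:6]}


def search_target(record):
    """Option of priority 7, the search target item."""
    return record[PRIORITIES[6]]
```

For example, narrow(RECORDS, "franchise name", "B Petroleum") leaves a single record whose search target is “B Petroleum Iwaki Store”.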
[0068]
The voice search device 1A of the present embodiment has such a configuration, and the operation thereof will be described next.
FIG. 11 is a flowchart showing part of the operation procedure of the voice search device 1A according to the second embodiment. The basic operation procedure of the voice search device 1A is the same as that of the voice search device 1 of the first embodiment shown in FIG. 4 described above; the processing content of step 101 and the processing content from step 107 onward are different. FIG. 11 mainly shows the parts where the processing content differs.
[0069]
The option setting unit 16 determines whether or not the dialog start button 14 has been pressed by the user (step 100). If the dialog start button 14 is not pressed by the user, a negative determination is made, and the process of step 100 is repeated.
When the dialogue start button 14 is pressed, an affirmative determination is made in step 100, and the option setting unit 16 extracts the data belonging to the column of “priority 1” from the candidate set DB 26a and sets the first candidate set using the extracted data (step 101A). Specifically, as shown in FIG. 10, in the present embodiment the data in the column of “priority 1” contains the contents of various functions, and a candidate set including the contents of these functions as options is set. Thereafter, as in the first embodiment described above, the processing of steps 102 to 107 shown in FIG. 4 is performed.
[0070]
The selection item determination unit 24 determines whether or not “other” is selected from the choices based on the speech recognition result output from the speech recognition processing unit 12 (step 107).
If “others” is not selected, a negative determination is made in step 107, and the selection item determination unit 24 notifies the option setting unit 16 of this. Upon receiving the notification, the option setting unit 16 narrows down the options that are candidates to be presented as the next candidate set in accordance with the option selected by the user (step 120). For example, if “gas station search” is selected by the user, only the records corresponding to this “gas station search” are retained.
[0071]
Next, the option setting unit 16 determines whether or not there is a candidate set including two or more options in order from the candidate set with the highest priority (step 121).
If there is a candidate set including two or more options, an affirmative determination is made in step 121. The option setting unit 16 then extracts a predetermined number of options corresponding to the candidate set identified in step 121 (the candidate set including two or more options) and sets the next candidate set (step 122).
[0072]
FIG. 12 is a flowchart showing a detailed procedure of the process shown in step 122. First, based on the data stored in the candidate set DB 26a, the option setting unit 16 selects, from among the candidate sets having two or more types of options, the one with the highest priority (step 130).
[0073]
Next, the option setting unit 16 determines whether or not the number of types of options included in the selected candidate set is equal to or less than a predetermined number (five in this embodiment) (step 131).
If the number of types of options exceeds the predetermined number, a negative determination is made in step 131, and the option setting unit 16 then extracts the predetermined number of options (step 132).
[0074]
If the number of types of options is less than or equal to the predetermined number, an affirmative determination is made in step 131, and the option setting unit 16 then extracts all existing options (step 133).
Next, the option setting unit 16 sets the options extracted in the process shown in step 132 or step 133 as the next candidate set (step 134), and the process in step 122 shown in FIG. 11 ends. Thereafter, the process returns to step 102, and the subsequent processing is repeated.
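Steps 121 and 130 to 134 can be sketched as a single function over the narrowed-down records. This is a hedged Python sketch rather than the device's actual procedure: the function name next_candidate_set and the dictionary-based record format are assumptions, and the limit of five options follows the predetermined number given for this embodiment.

```python
MAX_OPTIONS = 5  # predetermined number of options in this embodiment


def next_candidate_set(records, priorities, max_options=MAX_OPTIONS):
    """Return (title, options) for the next guidance, or None if none remains.

    `records` are the records left after narrowing down (step 120) and
    `priorities` lists the candidate set titles from highest priority down.
    """
    for title in priorities:                      # highest priority first
        # Distinct option types in this column, in first-seen order.
        options = list(dict.fromkeys(r[title] for r in records))
        if len(options) >= 2:                     # steps 121 and 130
            if len(options) > max_options:        # step 131: negative
                return title, options[:max_options]   # step 132
            return title, options                 # step 133: all options
    return None                                   # step 121: negative
```

A column in which all remaining records share the same value is skipped, which is why a dialogue can jump from one candidate set title to a much lower-priority one once the records have been narrowed down.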
[0075]
In step 107 described above, when “others” is selected from the options, an affirmative determination is made, and the selection item determination unit 24 notifies the option setting unit 16 to set the next option. Upon receiving the notification, the option setting unit 16 determines whether there are other options other than the options already presented in the previous process based on the data stored in the candidate set DB 26a (step 123). If it exists, an affirmative decision is made and the next option is set (step 124). Thereafter, the process returns to step 102 described above, the next option is notified to the voice recognition processing unit 12 and the guidance sentence generation unit 18, and the subsequent processing is performed.
[0076]
If there is no next option, a negative determination is made in step 123. In this case, the option setting unit 16 sends an instruction to the guidance sentence generation unit 18 to generate a guidance sentence announcing that there is no next option. Upon receipt of the instruction, the guidance sentence generation unit 18 generates a predetermined guidance sentence and outputs it to the voice synthesizer 20, and a guidance voice notifying that there is no next option is output from the speaker 22 (step 125). Thereafter, the process returns to step 103 described above, the options guided during the previous processing are presented to the user again, and the subsequent processing is repeated.
[0077]
In step 121 described above, when there is no longer a candidate set including two or more options, a negative determination is made, and the selection item determination unit 24 notifies the operation instruction output unit 30 of the content of the option finally selected by the user. Upon receiving the notification, the operation instruction output unit 30 outputs an operation instruction corresponding to the content of the option selected by the user to the navigation device 2 or the like (step 126).
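The handling of “others” in the table format (steps 123 to 125) amounts to excluding the options that have already been presented. The following minimal Python sketch illustrates this under assumptions: the function name next_options and its arguments are hypothetical, and an empty result stands for the negative determination in step 123.

```python
def next_options(all_options, presented, max_options=5):
    """Return the next group of options after "others" is selected.

    `all_options` are the distinct options of the current candidate set title
    and `presented` are those already guided. An empty list means that no
    next option exists (step 123 negative, followed by step 125).
    """
    remaining = [o for o in all_options if o not in presented]
    return remaining[:max_options]  # step 124 when non-empty


franchises = [name + " Petroleum" for name in "ABCDEFG"]
first = next_options(franchises, presented=[])
second = next_options(franchises, presented=first)
third = next_options(franchises, presented=first + second)
```

With seven franchise names, the first guidance covers A to E, the second covers F and G, and a further “others” leaves nothing to present.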
[0078]
Next, the dialogue performed between the voice search device 1A and the user will be specifically described in accordance with the processing shown in FIG. 11 described above. Together with the dialogue examples, the manner in which the necessary records are extracted from the data stored in the candidate set DB 26a will be described with reference to the drawings as appropriate.
[0079]
(Dialogue example 6)
Dialog example 6 shows an example of dialog when searching for the nearest gas station. FIG. 13 is a diagram showing the contents of records extracted from the candidate set DB 26a in the dialogue example 6.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.” … (23)
U: “Gas station search” … (24)
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.” … (25)
U: “B Petroleum” … (26)
S: “B Petroleum. Please choose from 2 km ahead on the right, 5 km ahead on the left, or others.” … (27)
U: “2 km ahead on the right” … (28)
S: “2 km ahead on the right. The destination will be set to B Petroleum Iwaki Store.” … (29)
As shown in FIG. 13, when the dialogue start button 14 is pressed by the user, a plurality of options are first extracted corresponding to “function”, which is the candidate set title of priority 1, and a plurality of functions that can be selected by the user are guided as in the voice (23) described above.
[0080]
When the user selects “gas station search” in response to this voice (23), as shown in the voice (24) described above, only the records corresponding to this “gas station search” are retained. A plurality of options are then extracted corresponding to “franchise name”, which is the candidate set title of priority 2 having the next highest priority, and a plurality of franchise names that can be selected by the user are guided as in the voice (25) described above.
[0081]
As shown in the voice (26), when the user selects the franchise name “B Petroleum”, only the records corresponding to this “B Petroleum” are retained. A plurality of options are then extracted corresponding to “position” of priority 6, which is the candidate set title that has the next highest priority and includes two or more types of options, and the position (relative distance) of each facility with respect to the vehicle position is guided as in the voice (27) described above.
[0082]
As shown in the voice (28) described above, when the position “2 km ahead on the right” is selected by the user, “B Petroleum Iwaki Store”, which is the one gas station corresponding to the selected position, is specified. As shown in the voice (29) described above, this gas station is set as the destination of the route search, and the series of processing ends.
[0083]
When the re-request button 15 is pressed, the same dialogue as dialogue example 2 in the first embodiment described above is performed, and the contents of the data used in that case are the same as those used in dialogue example 6.
(Dialogue example 7)
Dialog example 7 shows a dialog example in the case where the nearest gas station is searched in the same manner as dialog example 6 described above, and “others” is selected from the options. FIG. 14 is a diagram showing the contents of records extracted from the candidate set DB 26a in the dialogue example 7.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.”
U: “Others” … (30)
S: “Then please select from F, G, or others.” … (31)
U: “G Petroleum” … (32)
S: “G Petroleum. The destination will be set to G Petroleum Iwaki Store.” … (33)
As shown in the voice (30) in dialogue example 7 described above, when “others” is selected from the options by the user, the records are narrowed down to those excluding the already presented A Petroleum, B Petroleum, C Petroleum, D Petroleum, and E Petroleum, and further franchise names that can be selected by the user are guided as shown in the voice (31) described above.
[0084]
As shown in the voice (32) described above, when the user selects the franchise name “G Petroleum”, only the records corresponding to this “G Petroleum” are retained. In this case, the narrowing down based on the franchise name leaves only one record, so that “G Petroleum Iwaki Store”, which is the one gas station to be guided, is specified. As shown in the voice (33), this gas station is set as the destination of the route search, and the series of processing ends.
[0085]
(Dialogue example 8)
Similar to the above-described dialog example 7, the dialog example 8 shows a dialog example when “other” is selected from the options and there is no option that can be presented next. FIG. 15 is a diagram showing the contents of records extracted from the candidate set DB 26a in the dialogue example 8.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others.”
U: “Others”
S: “Please choose from F, G, or others.”
U: “Others” … (34)
S: “I'm sorry. There are no other candidates.” … (35)
As shown in the voice (34) in the dialogue example described above, when “others” is selected from the options by the user and there is no option that can be presented next, the user is informed, as shown in the voice (35) described above, that there are no more options that can be selected.
[0086]
As described above, in the second embodiment, table information that has a predetermined table format and in which different priorities are associated with a plurality of options is stored in the candidate set DB 26a. Based on this table information, the readings of the corresponding character strings are output as voice in order from the options with the highest priority. The character strings to be recognized are thus presented to the user in advance, and the character string corresponding to the voice input by the user in response to this presentation is selected from among the plurality of character strings that were the targets of the voice output, thereby specifying the option selected by the user. Accordingly, the accuracy of voice recognition can be improved. In particular, since the data is stored in a table format, there is an advantage that records can be easily added or changed.
[0087]
[Modification]
The present invention is not limited to the above-described embodiments, and various modifications can be made within the scope of the gist of the present invention. For example, in the above-described embodiments, the user sequentially selects one of a plurality of presented options until a final option is reached, and an operation instruction corresponding to its content is issued to the navigation device 2 or the like. However, an option may instead be selected automatically when the user so desires.
[0088]
FIG. 16 is a diagram showing a configuration of a voice search device in a modified example in which options are automatically selected. The voice search apparatus 1B shown in FIG. 16 differs from the voice search apparatus 1 in the first embodiment described above in that a selection frequency learning unit 32 and a learning result storage unit 34 are added. Hereinafter, the configuration and operation will be described mainly focusing on the differences from the first embodiment.
[0089]
The selection frequency learning unit 32 learns the selection frequency by the user for a plurality of options presented to the user. The learning result storage unit 34 stores the learning result from the selection frequency learning unit 32.
In the present embodiment, an option is selected automatically when the user inputs “leave it to you” at the time of the voice input for selecting an option. When “leave it to you” is input, the selection item determination unit 24 automatically selects an option having a high past selection frequency using the learning result stored in the learning result storage unit 34, and determines the content of the operation instruction. In this case, the selection item determination unit 24 corresponds to a character string selection unit.
[0090]
The voice search device 1B has such a configuration. Next, an operation in the case of automatically selecting an option according to the selection frequency of past options will be described.
FIG. 17 is a flowchart showing part of the operation procedure of the voice search device 1B when an option is automatically selected according to the selection frequency of past options. The basic operation procedure of the voice search device 1B is the same as that of the voice search device 1 of the first embodiment shown in FIG. 4 described above; the processing content from step 107 onward is different. FIG. 17 mainly shows the parts where the processing content differs.
[0091]
The selection item determination unit 24 determines whether or not “others” is selected from the options based on the speech recognition result output from the speech recognition processing unit 12 (step 107). The processing when “others” is selected is the same as that of the voice search device 1 of the first embodiment shown in FIG. 4.
[0092]
If “others” is not selected, a negative determination is made in step 107, and the selection item determination unit 24 then determines, based on the speech recognition result output from the speech recognition processing unit 12, whether or not “leave it to you” is selected from the options (step 140).
[0093]
When “leave it to you” is selected from the options, an affirmative determination is made in step 140, and the selection item determination unit 24 reads the learning result stored in the learning result storage unit 34 and automatically selects an option based on the past selection frequency (step 141). For example, in the present embodiment, all options up to the final option are automatically selected.
[0094]
When the final option has been automatically selected according to the past selection frequency, or when a negative determination is made in step 108, the selection item determination unit 24 notifies the operation instruction output unit 30 of the content of the final option. Upon receiving the notification, the operation instruction output unit 30 outputs an operation instruction corresponding to the content of the selected item to the navigation device 2 or the like (step 113).
[0095]
Next, a dialogue performed between the voice search device 1B and the user will be specifically described in accordance with the processing described above.
(Dialogue example 9)
Dialogue example 9 shows a dialogue example in the case where the nearest gas station is searched for in the same manner as in dialogue example 1 described above and “leave it to you” is selected as an option. The contents of the data used in dialogue example 9 are the same as those used in dialogue example 1.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, or others.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, or others. Alternatively, say ‘leave it to you’.” … (36)
U: “Leave it to you” … (37)
S: “Leave it to you. The destination will be set to B Petroleum Iwaki Store.”
In the dialogue example described above, as shown in the voice (36), the option “leave it to you” is newly presented to the user. When the user selects “leave it to you” in response, as shown in the voice (37), the option with the highest selection frequency is automatically selected according to the past selection frequency. In the example described above, the options from the franchise name onward are automatically selected according to the past selection frequency. Specifically, “B Petroleum” is automatically selected as the franchise name and “2 km ahead on the right” is automatically selected as the position, so that the option “B Petroleum Iwaki Store” is finally selected.
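The interplay of the selection frequency learning unit 32, the learning result storage unit 34, and the automatic selection in step 141 might look roughly like the following. This is a minimal Python sketch under assumptions: the class SelectionFrequencyLearner and its methods are hypothetical names, the learning result is simply a per-title selection count, and ties fall back to the first option in guidance order.

```python
from collections import Counter, defaultdict


class SelectionFrequencyLearner:
    """Counts how often each option was selected, per candidate set title."""

    def __init__(self):
        # Plays the role of the learning result storage: title -> option counts.
        self._counts = defaultdict(Counter)

    def record(self, title, option):
        """Learn one selection made by the user."""
        self._counts[title][option] += 1

    def most_frequent(self, title, options):
        """Pick the option with the highest past selection frequency.

        Options never selected count as zero; on a tie the option that comes
        first in `options` is returned.
        """
        counts = self._counts[title]
        return max(options, key=lambda option: counts[option])


learner = SelectionFrequencyLearner()
for choice in ["B Petroleum", "A Petroleum", "B Petroleum"]:
    learner.record("franchise name", choice)

auto = learner.most_frequent(
    "franchise name", ["A Petroleum", "B Petroleum", "C Petroleum"])
```

Selecting all options up to the final one would repeat most_frequent for each successive candidate set, narrowing down after every automatic choice.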
[0096]
In the above-described example, all the options up to the final option are automatically selected. However, only the option at that point may be automatically selected. For example, if “leave it to you” is selected when selecting a franchise name, only the franchise name may be automatically selected, after which the process moves to “position”, which is a candidate set in a lower hierarchy, and the plurality of options included in it are presented. In the above-described example, the case where the process for automatically selecting an option is added to the first embodiment has been described, but this function can be added to the second embodiment in the same manner.
[0097]
Further, in the above-described modification, when “leave it to you” is selected, the option is selected according to the past selection frequency. However, the option may be selected at random regardless of the past selection frequency. In this case, the process of learning the past selection frequency becomes unnecessary, and the configuration can be simplified.
[0098]
Further, when the past selection frequency of options is learned, the learning result may be reflected in the data stored in the candidate set DB 26 (or 26a). For example, in each of the above-described embodiments, when a plurality of options for the franchise name are presented, “G Petroleum” is presented in the second guidance, which is given when “Other” is selected from the options presented in the first guidance. However, if the learning result shows that “G Petroleum” is selected with high frequency, “G Petroleum” may be presented in the first guidance. Alternatively, the order in which the options within one guidance are presented may be changed according to past selection frequency. For example, suppose that in the initial state the first guidance presents the options in the order A Petroleum, B Petroleum, C Petroleum, and so on. If the learning result shows that the selection frequency is highest in the order G Petroleum, B Petroleum, A Petroleum, and so on, the first guidance may be changed to that order. In this case, the learning result storage unit 34 corresponds to the selection history storage unit.
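As a rough illustration of reflecting the learning result in the guidance order, the sketch below sorts the options by past selection count and splits them into successive guidances. The function name `build_guidances`, the count dictionary, and the default of five options per guidance are assumptions made for this example; handling of the “Other” option is omitted.

```python
from typing import Dict, List


def build_guidances(options: List[str],
                    selection_counts: Dict[str, int],
                    max_per_guidance: int = 5) -> List[List[str]]:
    """Order options by past selection frequency and split them into guidances."""
    if max_per_guidance < 1:
        raise ValueError("max_per_guidance must be at least 1")
    # sorted() is stable, so options with equal counts keep their initial order.
    ordered = sorted(options, key=lambda opt: -selection_counts.get(opt, 0))
    return [ordered[i:i + max_per_guidance]
            for i in range(0, len(ordered), max_per_guidance)]
```

With counts favoring G, then B, then A, an option that initially appeared only in the second guidance moves to the head of the first one.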
[0099]
In each of the above-described embodiments, the re-request button 15 is pressed to hear the guidance voice again. However, this re-request operation may be performed by voice input. In this case, for example, a voice such as “again” may be input, and the immediately preceding guidance content may be output again in response to these voices.
[0100]
Further, in each of the above-described embodiments, the operation in the case of canceling the selected option and selecting a new option has not been described, but such processing can also be performed.
FIG. 18 is a flowchart partially showing an operation procedure of the speech search apparatus when an option once selected is canceled and a new option is selected. For example, the description will be made on the assumption that this processing is performed in the voice search device 1 described in the first embodiment. The basic operation procedure in this case is the same as the flowchart shown in FIG. 4 described above, and new processing is added after step 107. FIG. 18 mainly shows newly added processing contents. In this modification, the selection item determination unit 24 corresponds to a reselection instruction unit.
[0101]
The selection item determination unit 24 determines whether or not “other” is selected from the choices based on the speech recognition result output from the speech recognition processing unit 12 (step 107). Processing when “others” is selected by the user is the same as that in FIG. 4 described above, and a description thereof is omitted here.
[0102]
If “other” is not selected, a negative determination is made in step 107, and the selection item determination unit 24 then determines, based on the speech recognition result output from the speech recognition processing unit 12, whether or not the voice “correction” has been input (step 150). Specifically, the voice input “correction” expresses a negative opinion, in response to which the option once selected is canceled. A negative opinion may also be expressed by a voice input such as “return” or “different” instead of “correction”.
[0103]
When the voice “correction” is input, an affirmative determination is made in step 150, and the selection item determination unit 24 instructs the option setting unit 16 to set again the candidate set of the hierarchy one level above the hierarchy currently in focus. In response to this instruction, the option setting unit 16 sets the upper-layer candidate set again (step 151). Thereafter, the process returns to step 102 shown in FIG. 4 described above, the character strings corresponding to the options included in the upper-layer candidate set are notified as recognition targets, these options are output as voice, and the subsequent processing is repeated.
[0104]
If the voice “correction” is not input, a negative determination is made in step 150. In this case, the process proceeds to step 108, and the subsequent processing is performed.
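The branch at step 150 can be pictured as maintaining the list of options chosen so far, one per hierarchy level, and dropping the most recent choice when a negative word is recognized. The following is a minimal sketch under that assumption; the name `apply_utterance` and the representation of the hierarchy as a list are illustrative and not taken from the specification.

```python
from typing import List

# Words that express a negative opinion about the previous selection.
NEGATIVE_WORDS = {"correction", "return", "different"}


def apply_utterance(path: List[str], recognized: str) -> List[str]:
    """Return the updated selection path after one recognized utterance.

    path holds the options chosen so far, one per hierarchy level.
    A negative word cancels the most recent choice, so the candidate set
    one level up is presented again; anything else is treated as a choice
    at the current level.
    """
    if recognized in NEGATIVE_WORDS:
        # At the top level there is nothing to cancel; the path stays empty.
        return path[:-1]
    return path + [recognized]
```

Two consecutive “correction” inputs therefore move from the position level back to the function level.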
Next, the dialogue performed between the voice search device 1 and the user will be specifically described in accordance with the process shown in FIG.
[0105]
(Dialogue example 10)
Dialog example 10 shows a dialog example in the case where the nearest gas station is searched in the same manner as dialog example 1 described above, and a voice of “correction” is input.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, etc.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, etc.”
U: “A Oil” (38)
S: “E oil, please choose from 2km ahead right side, 2.5km ahead left side, 3km ahead left side, 5km left side, etc.”… (39)
U: “Correction” (40)
S: “Please choose a franchise name from A, B, C, D, E, etc.”… (41)
U: “A Oil”
S: “A petroleum, please choose from 2km ahead left side, 5km left side, other”
U: “2km ahead left”
S: “It is 2km ahead, so let’s set the destination at A Petroleum Iwaki Store”
In dialogue example 10 above, although the user selected “A Oil” from the options as shown in voice (38), a recognition error occurred and “E Oil” was selected, as shown in voice (39). In this case, when the user inputs the voice “correction” as shown in voice (40), the franchise names that the user can select are presented again, as shown in voice (41), based on “franchise name,” the candidate set of the upper hierarchy.
[0106]
(Dialogue example 11)
The dialogue example 11 shows another dialogue example in the case where the voice “correction” is inputted.
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, etc.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, etc.”
U: “Facilities search” (42)
S: “C oil, please choose from 2km ahead right side, 2.5km ahead left side, 3km ahead right side, 5km left side, etc.”… (43)
U: “Correction” (44)
S: “Please choose a franchise name from A, B, C, D, E, etc.”… (45)
U: “Correction” (46)
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, etc.”… (47)
U: “Facilities Search”
Hereinafter, since the dialog after selecting the facility search is performed in the same manner as the above-described dialog example 5, the description is omitted here.
[0107]
In dialogue example 11 described above, the user has selected “Gas station search” but actually wants “Facility search,” and therefore, as shown in voice (42), speaks “Facility search,” which is not among the options presented by the voice search device 1. Even in this case, as shown in voice (43), the voice search device 1 selects the option closest to the input voice from the options available at that time and continues the process.
[0108]
As shown in voice (44), when the user inputs the voice “correction,” the selectable franchise names are presented again, as shown in voice (45), based on “franchise name,” the candidate set of the upper hierarchy.
As shown in voice (46), when the user inputs the voice “correction” once more, the options selectable by the user are presented again, as shown in voice (47), based on “function,” the candidate set of the next higher hierarchy.
[0109]
In dialogue example 11 described above, one of the presented options is selected even when the voice input does not correspond to any of them. Alternatively, when a voice input does not match the content of the options and it is difficult to identify an option, the user may be notified that no valid recognition result was obtained. In this case, the validity of the recognition result is judged in the processing of step 106 shown in FIG. 4 described above, and when the character string match rate is very low (for example, 10% or less), the user is notified that no option could be identified. A dialogue for the case where no valid recognition result is obtained is described below.
[0110]
(Dialogue example 12)
U: Press the dialogue start button 14.
S: “Choose from meal location search, gas station search, facility search, parking lot search, audio operation, etc.”
U: “Gas station search”
S: “Search for gas stations. Then select the franchise name from A, B, C, D, E, etc.”
U: “H Oil” (48)
S: “I'm sorry. I couldn't recognize the word you entered. Choose your franchise name from A, B, C, D, E, etc.”… (49)
In dialogue example 12 described above, as shown in voice (48), the user speaks “H Oil,” which is not presented as an option, so the voice search device 1 cannot obtain a valid recognition result. Therefore, as shown in voice (49), the voice search device 1 notifies the user that the option could not be identified and provides guidance prompting the user to input another option.
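The validity check described for step 106 amounts to a threshold on the best match rate. The sketch below assumes the recognizer already reports a match rate between 0.0 and 1.0 for each presented option; the function name `pick_option` and this input format are assumptions, and only the 10% figure comes from the text.

```python
from typing import Dict, Optional


def pick_option(match_rates: Dict[str, float],
                min_rate: float = 0.10) -> Optional[str]:
    """Return the best-matching option, or None if no match is valid.

    match_rates maps each presented option to the match rate (0.0 to 1.0)
    reported by the recognizer for the user's utterance.
    """
    if not match_rates:
        return None
    best = max(match_rates, key=match_rates.get)
    # "10% or less" is treated as invalid, so the rate must exceed min_rate.
    return best if match_rates[best] > min_rate else None
```

A `None` result would correspond to the apology and re-prompt shown in voice (49).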
[0111]
Further, when performing the speech recognition process, recognition accuracy may be improved by taking into account partial matches between the character strings corresponding to the presented options and the character string corresponding to the voice input by the user. For example, when selecting “Gas station search” presented as an option, some users may utter only the portion “Gas station.” In such a case, if only the entire character string “Gas station search” is a target of speech recognition, the match rate is low because it matches the user's utterance “Gas station” only partially. Since the match rate with the other options (such as “Meal location search”) is of course also low, it may be difficult to identify accurately the option selected by the user. Therefore, for example, the character strings to be recognized are set to “Gas station search” and “Gas station” for the gas station search, and to “Meal location search,” “Meal location,” “Meal,” and so on for the meal location search. Both whole and partial matches between the character strings corresponding to the options and the character string for the user's voice input can then be evaluated, so recognition accuracy can be improved.
[0112]
Even when partial matching is considered in this way, it is preferable to output the entire character string when returning the recognition result. For example, even when “gas station” is input by the user, it is preferable that the entire character string is returned as a response of a corresponding recognition result, such as “It is a gas station search”.
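One simple way to picture this is a table that lists, for each option, its full string and the partial forms registered as recognition targets, with the response always built from the full string. The table contents follow the examples in the text; the names `ALIASES`, `resolve_option`, and `confirmation`, and the exact-lookup strategy, are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Recognition targets per option: the full string plus partial forms.
ALIASES: Dict[str, List[str]] = {
    "gas station search": ["gas station search", "gas station"],
    "meal location search": ["meal location search", "meal location", "meal"],
}


def resolve_option(recognized: str,
                   aliases: Dict[str, List[str]] = ALIASES) -> Optional[str]:
    """Map a recognized string (full or partial form) to its full option name."""
    normalized = recognized.strip().lower()
    for option, forms in aliases.items():
        if normalized in forms:
            return option
    return None


def confirmation(option: str) -> str:
    """Build the response using the entire option string."""
    return f"It is a {option}."
```

For instance, the partial utterance “gas station” resolves to the full option, and the confirmation reads back “gas station search” in full.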
[0113]
In each of the above-described embodiments, a plurality of options are presented and one of them is input by voice. However, each option may be presented with a predetermined code attached, and the user may input by voice the code attached to the desired option. Specifically, numbers such as “1, 2, 3, ...” or characters such as “A, B, C, ...” may be used as codes. For example, when a plurality of functions are presented as options, a guidance voice is output with content such as “Please select the corresponding number from 1: Meal location search, 2: Gas station search, 3: Facility search, 4: Parking search, 5: Audio operation, 6: Other,” and the user inputs by voice any number from 1 to 6. When predetermined codes are used in this way, the user only has to utter the code associated with the desired option, which simplifies voice input. Further, since the character strings subject to speech recognition can be simple ones such as numbers, recognition accuracy can be improved.
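A small sketch of attaching numeric codes to the options is given below. The function name `number_options` and the exact wording of the generated guidance are assumptions; the specification only gives the guidance as an example.

```python
from typing import Dict, List, Tuple


def number_options(options: List[str]) -> Tuple[str, Dict[str, str]]:
    """Attach 1-based numeric codes to options.

    Returns the guidance sentence and a mapping from spoken code to option.
    """
    code_to_option = {str(i): opt for i, opt in enumerate(options, start=1)}
    listing = ", ".join(f"{code}: {opt}" for code, opt in code_to_option.items())
    guidance = f"Please select the corresponding number from {listing}."
    return guidance, code_to_option
```

The recognizer then only needs to distinguish the short codes, and the mapping recovers the selected option.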
[0114]
Further, in each of the above-described embodiments, the user selects one option and performs voice input after all of the options have been presented. However, when the user performs voice input before all of the options have been presented, the speech recognition process may be started at that point. Some users, while listening to the guidance voice, may begin voice input immediately after the desired option is output. In such a case, operability can be further improved by promptly starting the speech recognition process even though not all of the options have been presented.
[0115]
Further, in each of the above-described embodiments, the voice search device used alone has been described. However, the voice search device may also be configured by distributing its functions between a server and a terminal device connected via a network.
FIG. 19 is a diagram illustrating a configuration example of a voice search device when functions are distributed and arranged in a server and a terminal device connected via a network. The voice search device shown in FIG. 19 includes a voice search terminal device 4 and a server 5 connected via a predetermined network 6.
[0116]
The voice search terminal device 4 basically has the same configuration as the voice search device 1A in the second embodiment described above, except that a communication processing unit 36 is added. The configuration of the voice search terminal device 4 may be the same as that of the voice search device 1 in the first embodiment described above.
[0117]
The communication processing unit 36 provided in the voice search terminal device 4 performs communication processing for acquiring information necessary for updating data stored in the candidate set DB 26 a from the server 5 via the network 6. The DB update unit 28 updates the data stored in the candidate set DB 26a based on the information received by the communication processing unit 36. This update process is performed prior to the predetermined process by the voice search terminal device 4.
[0118]
The server 5 includes a server control unit 50, a candidate set DB 52, and a communication processing unit 54. The server control unit 50 controls the overall operation of the server 5. The candidate set DB 52 stores basically the same content data as the candidate set DB 26a provided in the voice search terminal device 4 described above. Data stored in the candidate set DB 52 is updated with new contents as needed.
[0119]
When a predetermined request is made from the voice search terminal device 4, the server control unit 50 extracts, from the candidate set DB 52, predetermined difference information including changes to the content that has been previously transmitted to the voice search terminal device 4. The difference information is transmitted to the voice search terminal device 4. The communication processing unit 54 performs communication processing necessary for the server 5 to exchange data with the voice search terminal device 4.
[0120]
Thus, since the contents of the candidate set DB 26a provided in the voice search terminal device 4 can be updated based on the predetermined difference information transmitted from the server 5, the updated information can be reflected in the various processes of the voice search terminal device 4. In particular, since the information sent from the server 5 to the voice search terminal device 4 is difference information including the changes to the information sent up to the previous time, the amount of data transmitted and received is reduced, and the communication cost can be reduced.
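The specification does not define the format of the difference information. As one possible reading, the sketch below treats each candidate set DB as a dictionary keyed by record id, with the server computing added, changed, and deleted records and the terminal applying them. The names `make_diff` and `apply_diff`, the record layout, and the use of `None` to mark deletions are all assumptions.

```python
from typing import Dict, Optional

Record = Dict[str, str]


def make_diff(previous: Dict[str, Record],
              current: Dict[str, Record]) -> Dict[str, Optional[Record]]:
    """Server side: changes since the last transmission.

    Added or modified records map to their new content; deleted record
    ids map to None.
    """
    diff: Dict[str, Optional[Record]] = {}
    for rec_id, rec in current.items():
        if previous.get(rec_id) != rec:
            diff[rec_id] = rec
    for rec_id in previous:
        if rec_id not in current:
            diff[rec_id] = None
    return diff


def apply_diff(local: Dict[str, Record],
               diff: Dict[str, Optional[Record]]) -> Dict[str, Record]:
    """Terminal side: return a copy of the local DB with the diff applied."""
    updated = dict(local)
    for rec_id, rec in diff.items():
        if rec is None:
            updated.pop(rec_id, None)
        else:
            updated[rec_id] = rec
    return updated
```

Unchanged records never appear in the diff, which is where the reduction in transmitted data comes from.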
[0121]
Note that the server 5 described above has a function of storing information on the search target items and the search keys corresponding to them. The voice search terminal device 4 has functions corresponding to the recognition target character string output means, the microphone, the voice recognition processing means, and the item extraction means, and acquires the necessary information from the server 5 prior to the various processes that use them.
[0122]
FIG. 20 is a diagram illustrating another configuration example of the voice search device when the functions are distributed and arranged in a server and a terminal device connected via a network. The voice search device shown in FIG. 20 includes a voice search terminal device 4A and a server 5A connected via a predetermined network 6.
[0123]
In the voice search device shown in FIG. 20, the configurations corresponding to the functions realized by the candidate set DB 26 provided in the voice search device 1 of the first embodiment described above (or the candidate set DB 26a provided in the voice search device 1A of the second embodiment), the option setting unit 16, and the selection item determination unit 24 are arranged in the server 5A. Specifically, the server 5A includes a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, and a selection item determination unit 58.
[0124]
Further, the voice search terminal device 4A is obtained by omitting the option setting unit 16, the selection item determination unit 24, the candidate set DB 26a, and the DB update unit 28 from the voice search terminal device 4 described above and adding a control unit 38.
When outputting an operation instruction to the navigation device 2 or the like in response to the voice uttered by the user, the control unit 38 in the voice search terminal device 4A obtains the minimum data necessary for presenting the options from the server 5A via the communication processing unit 36. The guidance sentence generation unit 18 generates and outputs a predetermined guidance sentence in accordance with an instruction from the control unit 38. The server 5A acquires the voice recognition result for the user's voice from the voice search terminal device 4A and performs processing such as setting the next candidate set and transmitting the data necessary for presenting options to the voice search terminal device 4A, or extracting the one option that is selected automatically.
[0125]
Note that the server 5A described above has functions of storing information on the search target items and the search keys corresponding to them, extracting the character strings to be output as voice by the recognition target character string output means, and extracting search target items as performed by the item extraction means. The voice search terminal device 4A has functions corresponding to the recognition target character string output means, the microphone, and the voice recognition processing means, and obtains information necessary for these processes from the server 5A.
[0126]
FIG. 21 is a diagram illustrating another configuration example of the voice search device when the functions are distributed and arranged in a server and a terminal device connected via a network. The voice search device shown in FIG. 21 includes a voice search terminal device 4B and a server 5B connected via a predetermined network 6.
[0127]
The voice search device shown in FIG. 21 differs from the voice search device shown in FIG. 20 described above in that the function of the guidance sentence generation unit 18 is also arranged on the server side. Specifically, the server 5B includes a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, a selection item determination unit 58, and a guidance sentence generation unit 60. The voice search terminal device 4B differs from the voice search terminal device 4A in that the guidance sentence generation unit 18 is omitted. In the voice search device shown in FIG. 21, since the guidance sentence is generated by the server 5B, the control unit 38 in the voice search terminal device 4B receives the guidance sentence generated by the server 5B and outputs it to the voice synthesis unit 20. The other operation contents are the same as those of the voice search device shown in FIG.
[0128]
FIG. 22 is a diagram illustrating another configuration example of the voice search device when the functions are distributed between a server and a terminal device connected via a network. The voice search device shown in FIG. 22 includes a voice search terminal device 4C and a server 5C connected via a predetermined network 6. The voice search device shown in FIG. 22 differs from the voice search device shown in FIG. 21 described above in that the functions of the speech recognition processing unit 12 and the speech synthesis unit 20 are also arranged on the server side. Specifically, the server 5C includes a server control unit 50, a candidate set DB 52, a communication processing unit 54, an option setting unit 56, a selection item determination unit 58, a guidance sentence generation unit 60, a speech recognition processing unit 62, and a speech synthesis unit 64. The voice search terminal device 4C differs from the voice search terminal device 4B in that the speech recognition processing unit 12 and the speech synthesis unit 20 are omitted.
[0129]
In the voice search device shown in FIG. 22, the voice of the user collected by the microphone 10 is converted into digital voice data by the control unit 38 and transmitted to the server 5C. Based on the transmitted voice data, a voice recognition processing unit 62 in the server 5C performs a predetermined voice recognition process. Further, in response to the guidance sentence generated by the guidance sentence generation unit 60, a predetermined voice synthesis process is performed by the voice synthesis unit 64, and voice data corresponding to the guidance sentence is generated. The generated voice data is transmitted to the voice search terminal device 4C, converted into an analog signal by the control unit 38 in the voice search terminal device 4C, and output to the speaker 22.
[0130]
In the voice search device of the modification shown in FIGS. 20 to 22, since many functions are arranged on the server side, the processing load on the voice search terminal device side is reduced, and the configuration can be simplified. There is an advantage that the cost of the voice search terminal device can be reduced.
Further, each of the above-described embodiments and modifications has been described for the case where the voice search device of the present invention is applied to an in-vehicle system, but the scope of the present invention is not limited to in-vehicle systems, and the invention can be applied to various other systems.
[0131]
[Effects of the invention]
As described above, according to the present invention, the character strings to be recognized are presented to the user in advance by outputting as voice the readings of a plurality of character strings that can serve as search keys, and only these character strings are made targets of speech recognition, so the accuracy of speech recognition can be improved.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an in-vehicle system that includes a voice search device according to a first embodiment.
FIG. 2 is a diagram illustrating a structure of data stored in a candidate set DB.
FIG. 3 is a diagram illustrating a correspondence relationship between a candidate set of an upper hierarchy and a candidate set of a lower hierarchy in the data structure shown in FIG.
FIG. 4 is a flowchart showing an operation procedure of the voice search device according to the first embodiment.
FIG. 5 is a diagram showing the contents of data used in Dialog Example 1.
FIG. 6 is a diagram showing the contents of data used in Dialog Example 3.
FIG. 7 is a diagram showing the contents of data used in Dialog Example 4.
FIG. 8 is a diagram showing the contents of data used in Dialog Example 5.
FIG. 9 is a diagram illustrating a configuration of an in-vehicle system configured to include a voice search device according to a second embodiment.
FIG. 10 is a diagram illustrating a structure of data stored in a candidate set DB according to the second embodiment.
FIG. 11 is a flowchart showing a partial operation procedure of the voice search device according to the second embodiment.
FIG. 12 is a flowchart showing a detailed procedure of the processing shown in step 122.
FIG. 13 is a diagram showing the contents of a record extracted from a candidate set DB in a dialogue example 6.
FIG. 14 is a diagram showing the contents of a record extracted from a candidate set DB in a dialogue example 7.
FIG. 15 is a diagram showing the contents of a record extracted from a candidate set DB in a dialogue example 8.
FIG. 16 is a diagram showing a configuration of a voice search device in a modified example in which options are automatically selected.
FIG. 17 is a flowchart showing a partial operation procedure of the voice search device when an option is automatically selected according to a selection frequency.
FIG. 18 is a flowchart partially showing an operation procedure of the speech search apparatus when an option once selected is canceled and a new option is selected.
FIG. 19 is a diagram illustrating a configuration example of a voice search device when functions are distributed to a server and a terminal device connected via a network.
FIG. 20 is a diagram illustrating another configuration example of the voice search device when functions are distributed and arranged in a server and a terminal device connected via a network.
FIG. 21 is a diagram illustrating another configuration example of the voice search device when the functions are distributed and arranged in a server and a terminal device connected via a network.
FIG. 22 is a diagram illustrating another configuration example of the voice search device when functions are distributed and arranged in a server and a terminal device connected via a network.
[Explanation of symbols]
1, 1A, 1B Voice search device
2 Navigation device
3 Audio equipment
4, 4A, 4B, 4C Voice search terminal device
5, 5A, 5B, 5C Server
6 Network
10 Microphone
12, 62 Voice recognition processing unit
14 Dialogue start button
15 Re-request button
16, 56 Option setting section
18, 60 Guide sentence generator
20, 64 Voice synthesis unit
22 Speaker
24, 58 Selection item determination unit
26, 26a, 52 Candidate set DB (database)
28 DB update section
30 Operation instruction output unit
32 Selection frequency learning section
34 Learning result storage

Claims

A voice search device in which a search key is associated with each of a plurality of search target items and which extracts a corresponding item from among the search target items by comparing the content of a user's input voice with the search keys, the voice search device comprising:
Recognition target character string output means in which a maximum number of character strings that can serve as the search key is set, and which outputs as voice the readings of a plurality of the character strings within a range not exceeding that number;
A microphone that collects the user's voice;
Voice recognition processing means for performing voice recognition processing on the voice collected by the microphone and selecting a character string corresponding to the voice from among the character strings output as voice by the recognition target character string output means; and
Item extraction means for extracting the search target item corresponding to the search key specified by the character string selected by the voice recognition processing means.

In claim 1,
The voice search device further comprising a switch that is operated before the user speaks,
wherein the voice recognition processing by the voice recognition processing means is started when the switch is operated.

In claim 1,
A voice search device further comprising selected character string confirmation means for outputting the reading of the character string selected by the voice recognition processing means.

In claim 3,
The voice search device further comprising reselection instruction means for instructing the recognition target character string output means, when the user expresses a negative opinion about the character string selection result of the voice recognition processing means, to output again as voice the readings of the plurality of character strings used to obtain that selection result.

In any one of Claims 1-4,
When the total number of the character strings to be recognized exceeds the maximum number, the recognition target character string output means divides the character strings into groups that each do not exceed the maximum number and outputs their readings as voice over a plurality of times,
and the voice recognition processing means performs the character string selection determination each time voice is output.

In claim 5,
The voice search device further comprising voice output instruction means for instructing the recognition target character string output means to perform the second and subsequent voice outputs when the user instructs voice output of other selection candidates.

In any one of Claims 1-6,
The voice search device further comprising re-output instruction means for instructing the recognition target character string output means, when the user instructs voice output again, to output again as voice the readings of the plurality of character strings that were output immediately before.

In any one of Claims 1-7,
The voice search device further comprising character string selection means for selecting the character string without using the result of the voice recognition processing by the voice recognition processing means when the user gives an instruction to perform the character string selection operation,
wherein, when the character string is selected by the character string selection means, the item extraction means extracts the search target item using the character string selected by the character string selection means in place of the character string selected by the voice recognition processing means.

In any one of Claims 1-8,
Each of the search target items is associated with a plurality of the search keys, and when the item extraction means cannot narrow the search target items down to one using one search key, the processing by the recognition target character string output means, the voice recognition processing means, and the item extraction means is repeated using the other search keys until the search target items are narrowed down to one.

In claim 9,
The voice search device further comprising table storage means for storing table information in which a different priority is associated with each of the plurality of search keys and in which a plurality of character strings corresponding to the plurality of search keys are associated with each of the plurality of search target items,
wherein the recognition target character string output means, based on the table information, outputs as voice the readings of the corresponding character strings in order from the search key having the highest priority.

In claim 9,
The voice search device further comprising tree structure storage means for storing tree structure information of a plurality of hierarchies indicating, when the character string corresponding to one search key is selected, the search key to be selected next and the character strings corresponding to that search key,
wherein the recognition target character string output means extracts, based on the tree structure information, a plurality of the character strings corresponding to the search key to be output next, and outputs the readings of these character strings as voice.

In any one of Claims 1-11,
Further comprising selection history storage means for storing history information on past selections made by the voice recognition processing means,
The recognition target character string output means identifies the character strings having a high selection frequency based on the selection history information and preferentially outputs the readings of those character strings by voice.
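
A minimal sketch of frequency-based ordering, assuming the selection history is simply a list of previously selected character strings and that at most `max_count` strings are read out (both are assumptions for illustration):

```python
from collections import Counter


def order_by_history(candidates, history, max_count=7):
    """Put frequently selected strings first and cap the number of prompts."""
    counts = Counter(history)
    # sorted() is stable, so strings with equal counts keep their original order.
    ranked = sorted(candidates, key=lambda s: -counts[s])
    return ranked[:max_count]


ordered = order_by_history(
    ["restaurant", "gas station", "hotel"],
    ["hotel", "hotel", "gas station"],
)
```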

In any one of Claims 1-12,
Each of the plurality of character strings consists of a single character of the Japanese syllabary (the 50 sounds),
The item extraction means extracts the search key whose first character matches the character selected by the voice recognition processing means.
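
Matching on the first character can be sketched as below, assuming the search keys are stored as kana readings; the sample readings are hypothetical.

```python
def extract_by_first_character(search_keys, selected_char):
    """Return the search keys whose reading starts with the selected character."""
    if len(selected_char) != 1:
        raise ValueError("exactly one character is expected")
    return [key for key in search_keys if key.startswith(selected_char)]


READINGS = ["あおやま", "あかさか", "かんだ"]  # hypothetical kana readings
```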

In any one of Claims 1-13,
A voice search device wherein the voice recognition processing means selects the character string by comparing all of the characters constituting the character string with the entire voice recognition processing result.

In any one of Claims 1-13,
A voice search device wherein the voice recognition processing means selects the character string by comparing the characters constituting a part of the character string with the entire voice recognition processing result.
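
The difference between the two preceding claims (whole-string comparison versus comparison of a part of the string) can be illustrated as follows. Treating "a part of the character string" as any contiguous substring is one interpretation adopted here as an assumption, and the recognition result is modeled as plain text.

```python
def select_whole(candidates, recognized):
    """Select strings whose full text equals the recognition result."""
    return [c for c in candidates if c == recognized]


def select_partial(candidates, recognized):
    """Select strings that contain the recognition result as a part."""
    return [c for c in candidates if recognized and recognized in c]


CANDIDATES = ["gas station", "restaurant"]  # hypothetical prompts
```

With these candidates, the utterance "gas" selects nothing under whole-string comparison but selects "gas station" under partial comparison.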

In claim 14 or 15,
A voice search device wherein the voice recognition processing means starts the character string selection operation when the user's voice is collected by the microphone before the voice output by the recognition target character string output means has finished.

In any one of Claims 1-16,
The maximum number is set in a range of 7 ± 2.

In any one of Claims 1-17,
Functions are distributed between a server and a terminal device connected via a network,
The server has a function of storing information on the search target items and the search key corresponding to each of them,
The terminal device has functions corresponding to the recognition target character string output means, the microphone, the voice recognition processing means, and the item extraction means, and acquires the necessary information from the server before performing the various processes that use them.

In claim 18,
The information sent from the server to the terminal device is difference information containing the changes relative to the information sent up to the previous time.
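
One way to form and apply such difference information is sketched below. The representation of the stored information as a dictionary and the `changed` / `removed` layout of the difference are assumptions for illustration.

```python
def make_diff(previous, current):
    """Server side: collect entries that were added, changed, or removed."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_diff(local, diff):
    """Terminal side: bring the local copy up to date from the difference."""
    updated = dict(local)
    updated.update(diff["changed"])
    for key in diff["removed"]:
        updated.pop(key, None)
    return updated


# Hypothetical data held by the terminal (old) and by the server (new).
old = {"Item 1": "restaurant", "Item 2": "gas station"}
new = {"Item 1": "restaurant", "Item 2": "hotel", "Item 3": "parking"}
diff = make_diff(old, new)
```

Only the changed and added entries travel over the network, which is smaller than resending the full table when few items change.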

In any one of Claims 1-17,
Functions are distributed between a server and a terminal device connected via a network,
The server has a function of storing information on the search target items and the search key corresponding to each of them, a function of extracting the character strings to be output by voice by the recognition target character string output means, and a function of performing the extraction of the search target items by the item extraction means,
The terminal device has functions corresponding to the recognition target character string output means, the microphone, and the voice recognition processing means, and acquires the information necessary for this processing from the server.