JP3748429B2 - Speech input type compound noun search device and speech input type compound noun search method

Info

Publication number: JP3748429B2
Application number: JP2002322547A
Authority: JP
Inventors: 久美子大森; 正信東田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-11-06
Filing date: 2002-11-06
Publication date: 2006-02-22
Anticipated expiration: 2022-11-06
Also published as: JP2004157748A

Description

【０００１】
【発明の属する技術分野】
本発明は、利用者が音声入力した複合名詞の検索語について、音声認識装置を利用し、データベースから検索し、利用者の入力を確定する検索装置および検索方法に関するものである。
【０００２】
【従来の技術】
音声認識技術を用い、利用者が入力した検索語を特定するシステムにおいて、対話処理実時間内に音声認識処理可能な語彙数には限界があり、認識対象となる語彙数が多ければ多い程、認識精度が低下するという認識技術の現状の課題から、利用者による音声入力を、一意に短時間で確定することは、非常に困難であることが知られている。
【０００３】
現状、利用者は、認識処理が終了し結果が提示されるまで待機し、認識結果の正誤は利用者に確認を行う以外に方策はなく、認識装置が出力する認識スコアの高い順に確認し、正解を確認することができるまで、候補の提示が繰り返される（たとえば、非特許文献１）。
【０００４】
【非特許文献１】
大森久美子、東田正信「大語彙の複合名詞を対象とする音声対話による確定率向上に関する検討」、情報処理学会第６４回（平成１４年）全国大会講演論文集（２）、平成１４年３月１２日、ｐ．２−２７５，２−２７６
【０００５】
【発明が解決しようとする課題】
利用者の音声入力について音声認識装置を利用し、入力を特定する従来システムは、認識対象である検索語が大規模であればある程、認識処理に時間がかかり、精度の低下が避けられない。利用者にとって、待機時間、候補提示の繰り返しは、ストレスに繋がり、オペレータ対応のシステムと比較すると、利用者満足度が得られず、音声ポータル等、現在の音声応答サービスを、繰り返し使いたいとはユーザが思わないのが現状である。
【０００６】
そこで、利用者にストレスを与えず、短時間かつ自然な対話の流れの中で入力を特定する手法として、検索語が名詞の複合語で構成されている場合、構成名詞の頻度を利用して検索補助データベースを作成し、この作成された検索補助データベースを用い、使用頻度の高い構成名詞を優先認識対象として予め定め、認識対象を予め絞り込むことによって、待機時間の減少、精度向上を目指すことが考えられる。
【０００７】
しかし、１つの検索語を構成する各名詞の使用頻度に偏りがある場合、または、非常に似通った構成名詞が多数、同じ構成順序に使用されている場合には、該当する構成名詞が、上記検索補助データベースに含まれている優先認識対象名詞リスト（使用頻度が高い名詞のリストであり、優先的に認識対象とする名詞のリスト）に登録されていないことがあり、そうであると、実在する検索語が存在しないという結果に繋がる。
【０００８】
また、該当する構成名詞が、上記優先認識対象名詞リストに登録されていたとしても、認識結果として、似通った単語が多数出力されるので、正解構成単語が、出力結果中に出現せず、実在検索語リスト作成処理の結果、実在する検索語が存在しないという結果に繋がる。
【０００９】
また、部分的に抽出でき、しかも有力と思われる単名詞（複合名詞を構成する単一の名詞）を付加情報とし、複合名詞を構成する複数の単名詞のうちで、部分的に認識尤度が高い単名詞を含む検索語を、検索データベースから抽出する部分一致検索語処理を行い、これと並行し、上記部分一致検索語処理の時間を利用し、認識処理が終了していない各順序の前方検索補助データベース（検索語である複合名詞を構成する複数の単名詞の前からｎ番目の単名詞を認識した単名詞群毎に作成されている検索補助データベース）と、後方検索補助データベース（検索語である複合名詞を構成する複数の単名詞の後からｎ番目の単名詞を認識した単名詞群毎に作成されている検索補助データベース）とを認識対象とし、各単名詞の認識処理を継続し、利用者への提示をする度に、その時点での出力結果をマージする処理を繰り返すことが考えられる。
【００１０】
上記音声認識装置を利用した音声入力型の検索において、音声認識技術のみを利用し、利用者による音声入力について、確定するので、認識装置の出力結果を利用者に提示し、確認すること以外に確定の手法がなく、認識対象語数が大規模になればなる程、また似通った対象が多ければ多い程、認識処理時間が長くなり、さらに認識精度が低下し、したがって、利用者を待機させた上、誤認識による提示確認の繰り返しに繋がる。
【００１１】
現状では、音声認識装置を利用した音声入力型検索において、利用者満足度を獲得するためには、限られた分野や地域への適用と、階層的な構成をしている情報に対して種類数の少ない上位階層から順に、確実に確定しながら、順にグループ化し、絞り込むこととによって、認識対象数を減少させなければ、利用者に許容される範囲である速度と精度とを得ることが困難である。
【００１２】
さらに、周囲の雑音レベルも、様々な場所からの不特定多数話者の入力による音声認識は非常に難しく、このような場合では、最も容易であると言われている単語認識においても、実用レベルで扱える単語数は、数百が限度であると言われている。
【００１３】
本発明は、検索対象がどんなに大規模であっても、利用者が音声で入力した検索語を、検索データベース中から検索し確定する場合、オペレータ対応の検索装置のように、迅速かつ自然に検索処理を実現することができる音声入力型複合名詞の検索装置および音声入力型複合名詞の検索方法を提供することを目的とするものである。
【００１４】
【課題を解決するための手段】
本発明は、利用者が音声入力した検索語を構成する複数の単名詞のうちで、一部の単名詞のみの認識尤度が所定の閾値を超える場合、上記閾値を超える認識尤度を具備する上記単名詞を備え、上記検索語を構成する単名詞の数と同じ数の単名詞を備えている部分一致検索語候補を、検索データベースから抽出する部分一致検索語候補抽出手段と、上記抽出された部分一致検索語候補を対象として音声認識し、しかも、上記抽出された部分一致検索語候補を構成する単名詞毎に、上記入力された音声を再認識する再認識手段と、上記再認識手段が再認識した結果の認識尤度を、所定の演算方法で演算することによって、統合尤度を演算する統合尤度演算手段とを有する音声対話型複合名詞の検索装置である。
【００１５】
【発明の実施の形態および実施例】
図１は、本発明の一実施例である音声対話型複合名詞の検索装置１を示すブロック図である。
【００１６】
音声対話型複合名詞の検索装置１は、音声入力部２と、音声認識部３と、音声認識用ソフトウェア３Ｓと、音声認識結果出力部４と、検索語候補リスト作成部５と、対話制御部６と、音声出力部７と、音声出力用ソフトウェア７Ｓと、システムデータベース８とによって構成されている。
【００１７】
検索語候補リスト作成部５は、実在検索語リスト作成部５１と、部分一致検索語リスト作成部５２とによって構成されている。
【００１８】
また、対話制御部６は、検索語候補選定対話部６１と、部分一致検索語候補選定対話部６２とによって構成されている。音声認識部３において、入力音声に対する音声認識処理の際、システムデータベース８を利用する。
【００１９】
システムデータベース８は、検索データベース８１と、検索補助データベース８２と、ＹＥＳ／ＮＯデータベース８３とによって構成されている。
【００２０】
検索データベース８１は、複数の単名詞で構成されている複合名詞が検索語として登録され、しかも、上記検索語が各単名詞に区切られて登録されているデータベースである。
【００２１】
検索補助データベース８２は、上記登録されている各複合名詞のｎ番目（ｎは整数値）に表記されている単名詞の群を、「ｎ番目表記の単名詞集合群」と呼び、上記ｎ番目表記の単名詞集合群が、その頻度の高い順に登録され、しかも、この頻度の高い順に登録されている単名詞が、その頻度の高い方から順に、所定の数毎にまとめられ、複数のサブ集合群が形成されているデータベースである。
【００２２】
音声認識部３は、音声認識装置を使用し、音声出力部７は、音声出力の際に、音声出力装置を使用する。
【００２３】
図２は、検索データベース８１の全体像の一例を示す図である。
【００２４】
図３は、検索補助データベース８２の全体像の一例を示す図である。
【００２５】
検索データベース８１は、対話処理実時間内に音声認識処理不可能な数の検索語を保持し、各検索語は、図２に示すように、構成名詞に分割された状態で記録されている。検索補助データベース８２は、検索データベース８１中の全検索語について、上記ｎ番目表記の単名詞群の中で、その単名詞の使用頻度が算出され、この算出された使用頻度順に、上記単名詞が並べられている状態（順位付けられている状態）で、格納されている。
【００２６】
検索補助データベース８２は、検索語を構成する単名詞の最大数をｐとすると、１番目表記の単名詞用の検索補助データベースから、ｐ番目表記の単名詞検索用の検索補助データベースまで存在する。
【００２７】
各検索補助データベース８２が保持している単名詞（構成名詞）の数は、検索データベース８１が保持している全ての検索語数よりも少ないことが予想され、対話処理実時間内には処理不可能な数であるとする。
【００２８】
各検索補助データベースは、検索装置１が予め既定した対話処理実時間内に処理可能な語数毎に、使用頻度が高い単名詞から順に、複数の単名詞を細分化した形で格納されている。第ｎ検索補助データベースが格納している複数の単名詞のうちで、使用頻度が最も高い単名詞のグループを、「第１サブ集合群」と呼ぶ。
【００２９】
音声対話型複合名詞の検索装置１は、音声入力部２を介して入力された利用者の音声を、音声認識部３へ送る。
【００３０】
音声認識部３は、音声認識装置を利用し、利用者入力音声について、認識処理を行う。音声認識部３が、複合名詞を単名詞（構成名詞）単位に分割し、構成名詞毎に認識結果を出力する機能を保持していない場合、単名詞毎に区切って入力することを、利用者に指示する。音声力された音声信号を、音声認識部３が、構成名詞単位（単名詞単位）で分割し、単名詞毎に認識結果を出力する機能を保持している場合、利用者は、検索したい複合名詞を一連で音声入力する。
【００３１】
また、音声認識部３は、音声力された音声信号を、音声認識部３が、単名詞単位（構成名詞単位）で分割し、単名詞毎に認識結果を出力する機能を保持していない場合、利用者Ｐｃに、単名詞毎（単語毎）に区切って音声入力するように指示する。
【００３２】
音声認識部３は、検索装置１の処理状況に合わせて、認識対象とするデータベースを、システムデータベース８から選択する。検索語が入力された場合、検索補助データベース８２を参照し、利用者への正誤確認に対する応答を認識する場合は、ＹＥＳ／ＮＯデータベース８３を参照する。
【００３３】
音声認識結果出力部４は、認識結果を、検索語候補リスト作成部５へ送る。
【００３４】
この時点で、まだ認識処理が終了していないサブ集合群が、各検索補助データベース８２中に存在する場合、音声認識部３２は、サブ集合群に対する認識処理を再びスタートさせ、音声認識結果出力部４３は、認識結果を出力し、検索語候補リスト作成部５へ送る処理を繰り返す。
【００３５】
検索語候補リスト作成部５において、実在検索語候補リスト作成部５１は、認識結果の中から、有力構成名詞候補を選択し、有力構成名詞候補の組み合わせを作成し、検索データベース８１を参照しながら、実在する検索語を抽出した実在検索語候補リストを作成し、この作成された実在検索語候補リストを対話制御部６へ送る。
【００３６】
実在検索語を抽出する場合、有力構成名詞の全組み合わせを作成し、検索データベース８１を参照し、実在するか否かをチェックする。
【００３７】
また、有力構成名詞候補を１つずつ組み合わせる度に、該当する有力構成名詞の繋がりを有する検索語が実在するか否かを、検索データベース８１を使用して、チェックし、実在することが確認された有力構成名詞候補の組み合わせについてのみ、他の構成順序の有力構成名詞を組み合わせ、再び検索データベース８１をチェックようにしてもよい。
【００３８】
さらに、有力構成名詞を組み合わせる順序として、先頭から順に組み合わせるようにしてもよく、また、検索時間が最短になるように組み合わせを選択するようにしてもよい。
【００３９】
実在検索語リストを作成している間に、部分一致検索語リスト作成部５２は、認識結果から選択された有力構成名詞候補を部分的に含む検索語を、「部分一致検索語候補」として、検索データベース８１から抽出し、「部分一致尤度」を算出し、既定閾値以上の部分一致尤度を有する部分一致検索語について、認識処理を行う。
【００４０】
検索語を単名詞毎に区切り、利用者が音声入力した場合、構成名詞数が同じである検索語のみを、検索データベース８１から選択し、部分一致検索語全体について、最初に入力された音声を利用して認識処理すると同時に、有力構成名詞を抽出することができない箇所のみについて、認識対象名詞リストを作成し、最初に音声入力された音声の中で、該当番目表記の単名詞のみについて、認識処理する。つまり、単名詞毎に音声認識を再度実行する。
【００４１】
音声入力された音声信号を単名詞毎に分割し、認識する機能を、音声認識装置が有する場合は、部分一致検索語候補を使って、利用者が音声入力した音声信号を認識処理する。この認識処理の結果を対話制御部６へ送る。
【００４２】
対話制御部６は、実在検索語候補と部分一致検索語候補とを利用し、検出候補選定対話を実行する。このときに、部分一致検索語リスト作成部５２から送られた部分一致検索語に対する結果を利用し、各検索語候補について、部分一致尤度を算出する。
【００４３】
部分一致検索語全体について認識処理を行った場合、それまでの部分一致尤度を利用せずに、算出された認識尤度を、部分一致尤度として利用するようにしてもよく、部分一致している有力構成名詞毎の認識尤度の和と、算出された認識尤度とを加え、「部分一致した構成名詞数＋１」で除算することによって、部分一致尤度を求めるようにしてもよく、また、ただ和や積をとることによって、部分一致尤度を求めるようにしてもよい。
【００４４】
また、有力構成名詞が出力されていない箇所のみを対象にして、単名詞を認識した場合、予め算出されている有力構成名詞毎の尤度と、加算または乗算し、構成名詞数で除算するようにしてもよく、また、各尤度の積をとり（各認識尤度を互いに掛け合い）、これを、構成名詞数で除算するようにしてもよい。
【００４５】
そして、実在検索語候補リスト中の検索語候補が、利用者との確認処理のみで検索語を特定可能な検索装置の規定条件を満たす場合は、確認処理ガイダンスの出力命令を音声出力部７へ送る。
【００４６】
実在検索語候補リスト中の検索語候補が、利用者との確認処理のみでは検索語を特定可能な条件を満たさない場合、または、実在検索語候補が抽出されない場合は、部分一致検索語処理の結果、算出された新たな部分一致尤度が利用者との確認処理のみで検索語を特定可能な検索装置の規定条件を満たす場合は、確認処理ガイダンスの出力命令を、音声出力部７へ送る。
【００４７】
確認処理のみで特定可能な検索語候補が、実在検索語候補からも、部分一致検索語候補を利用した部分一致検索語処理からも、抽出されない場合、この時点で、認識処理を終了する。
【００４８】
使用頻度が次に高いサブ集合群についての認識処理の結果得られた有力構成名詞候補を利用し、実在検索語候補リストと部分一致検索語リストとを更新し、上記検索語候補選定対話と、上記部分一致検索語処理とを繰り返す。
【００４９】
音声出力部７は、確認処理ガイダンス出力命令が送られてきた場合、検索語候補と指定された候補との正誤確認を行うガイダンスを、利用者に出力する。
【００５０】
そして、確認処理ガイダンスに対する利用者からの応答が、音声入力部２から再び入力されると、音声認識部３は、ＹＥＳ／ＮＯデータベース８３を参照し、利用者の応答を認識し、音声認識結果出力部４が認識結果を出力し、利用者から肯定を表す応答が得られた場合、検索語特定が完了した旨を利用者へガイダンスする命令を、対話制御部６が、音声出力部７へ送る。
【００５１】
検索語が特定できるまで、上記検索語候補リストの更新、上記検索語候補選定対話、上記部分一致検索語処理を繰り返す。
【００５２】
検索補助データベース８２中の全ての単名詞集合群について認識処理が終了し、実在検索語候補リストをそれ以上更新不可能な場合、対話制御部６は、単名詞単位（構成名詞単位）で、利用者に再入力を要求し、入力された構成順序の単名詞の認識結果に基づいて、該当構成名詞を含む検索語を、検索データベース８１から抽出し、音声認識を再度繰り返す。
【００５３】
検索装置１は、実在検索語候補リストの更新と検索語候補選定対話とが行われている間、優先認識対象名詞群以外の構成名詞集合群の少なくとも１集合（規定数からなる集合の１つ分）に対しては、認識処理が終了しているように構成名詞数を規定する。
【００５４】
次に、上記実施例の動作をより具体的に説明する。
【００５５】
なお、以下の説明では、複合名詞で構成されている電話帳掲載の法人企業名を検索する場合を例にとって説明する。
【００５６】
電話帳に掲載されている法人名義は、日本全国で２，２００万件存在する。現行の音声認識技術をそのまま適用したのでは、２，２００万件を対話処理実時間内で認識処理することは不可能であり、精度は非常に低いことが知られている。
【００５７】
図４は、電話帳に登録されている２，２００万件の法人名義データを、それを構成する単名詞（構成名詞）で区切り、この単名詞で区切られた法人名義データが、検索データベース８１に登録されている例を示す図である。
【００５８】
検索データベース８１に登録されている法人名義について、法人名義を構成する各単名詞の頻度を、単名詞毎に調べ、頻度の高い順に、検索補助データベース８２に登録する。
【００５９】
法人名義２，２００万件を構成する複合名詞のうちで、最長構成単語数が７である場合、１番目表記の単名詞の総数は、約３６０万種類であり、２番目表記の単名詞の総数が、約２５０万種類であり、３番目表記の単名詞の総数が、約２７０万種類であり、４番目表記の単名詞の総数が、約１００万種類であり、……、対話処理実時間内では、各単名詞に対しても認識処理は不可能であり、精度が低いことが予想される。
【００６０】
図５は、上記実施例において、検索補助データベース８２の一例を示す図である。
【００６１】
検索装置１は、各構成順序の検索補助データベース８２中の単名詞を、単名詞毎に、使用頻度が高い順に並べ、使用頻度が高い方からｑ個ずつに細分化し（区分けし）、第ｎ表記の単名詞集合群を構成する（１≦ｎ≦７）。
【００６２】
使用する音声認識装置の性能に応じて、上記ｑを規定し、本実施例では、ｑ＝５００であるとする。そして、ｎ番目表記の単名詞集合群の中で、使用頻度が最も高い５００個の単名詞のグループを、「第１サブ集合群」と定める。
【００６３】
また、本実施例において使用する認識装置は、入力された単名詞を分割し、単名詞単位で認識処理し、構成順序毎に、単名詞に認識尤度を算出し、出力する機能を有していない場合を想定している。したがって、検索目的である法人名義（複合名詞）を、その単名詞毎に区切って発話することを、利用者に要求する。
【００６４】
なお、入力された単名詞を分割し、単名詞単位で認識処理し、構成順序毎に、単名詞の認識尤度を算出し、出力する機能を有する場合に、本実施例を適用するようにしてもよい。
【００６５】
ここで、「東京（とうきょう）／卸売（おろしうり）／市場（しじょう）」を、検索語として、利用者Ｐｃが音声入力した場合について考える。
【００６６】
音声入力された「東京」について、１番目表記の単名詞集合群Ｇｃ１の第１サブ集合群Ｇｃ１−１に属する単名詞の中から認識し、音声入力された「卸売」について、２番目表記の単名詞集合群Ｇｃ２の第１サブ集合群Ｇｃ２−１に属する単名詞の中から認識し、音声入力された「市場」について、３番目表記の単名詞集合群Ｇｃ３の第１サブ集合群Ｇｃ３−１に属する単名詞の中から認識する。
【００６７】
前方検索補助データベース８２、後方検索補助データベース８２のそれぞれについて、認識処理し、この認識処理された結果をマージする。そして、認識装置が出力する認識スコアの大きい順に、認識結果のそれぞれを出力する。
【００６８】
図６は、上記実施例において、前方検索補助データベース８２と後方検索補助データベース８２とを用いて、各単名詞を認識し、この認識の結果をマージした状態を示す図である。
【００６９】
図６に示すマージ結果において、１番目表記の単名詞である「東京」は、１番目表記の単名詞集合群Ｇｃ１の中の第１サブ集合群Ｇｃ１−１に含まれ、３番目表記の単名詞である「市場」は、３番目表記の単名詞集合群Ｇｃ３の中の第１サブ集合群Ｇｃ３−１に含まれているが、２番目表記の単名詞である「卸売」は、２番目表記の単名詞集合群Ｇｃ２の中の第１サブ集合群Ｇｃ２−１には含まれていない。
【００７０】
本実施例では、認識尤度８０以上を出力した単名詞を、「有力構成名詞候補」と定め、図６において、有力構成名詞候補を網点で示してある。
【００７１】
図６に示すマージ結果では、音声入力した「東京」について、「東京」、「東急」の２候補が、有力構成名詞候補として選択され、音声入力した「卸売」については、有力構成名詞候補が選択されず、音声入力された「市場」については、「地所」の１候補が、有力構成名詞候補として選択されている。
【００７２】
図６に示すマージ結果に基づいて、実在検索語候補リスト作成処理を行うが、音声入力した「卸売」については、有力構成名詞候補が選択されていないので、実在する検索語が存在せず、したがって、検索装置１は、各番目表記の単名詞集合群の第１サブ集合群に対する認識処理が終了次第、残っている認識対象としての集合群のうちで、次に使用頻度が高い集合群（第２サブ集合群）（単名詞５００個で構成されている集合群）について、認識処理する。そして、この認識処理の後に、部分一致検索処理を行う。
【００７３】
１番目表記の単名詞の有力構成名詞候補「東京」、「東急」を含む検索語と、３番目表記の単名詞の有力構成名詞候補「地上」を含む検索語とを、検索データベース８１から抽出する。
【００７４】
本実施例において、各有力構成名詞候補の認識尤度の和を、部分一致尤度とする。
【００７５】
図７は、上記実施例において、部分一致検索語処理した結果と、その結果を利用した検索語候補選定対話の一例とを示す図である。
【００７６】
図８は、上記実施例において、部分一致尤度と、部分一致検索語候補との関係を示す図である。
【００７７】
図８に示すように、１番目表記の単名詞である「東京」を含み、３番目表記の単名詞である「地所」を含む検索語の部分一致尤度が、最大になる。部分一致検索語処理として抽出した部分一致検索語全体を、認識対象とし、初めに発話された利用者の音声を、単名詞毎に、再度音声認識する。
【００７８】
このように、単名詞毎に、音声認識を再度実行するので、単名詞「東京」について、図７に示す認識尤度と、図８に示す認識尤度とが異なる。これと同様に、単名詞「東急」、「地所」、「市場」のそれぞれについても、図７に示す認識尤度と、図８に示す認識尤度とが異なる。また、単名詞毎に音声認識するので、認識精度が高くなる。
【００７９】
同時に抽出した部分一致検索語を構成し、１番目表記の単名詞のリスト、２番目表記の単名詞のリスト、３番目表記の単名詞のリストを、それぞれ作成し、単名詞で区切って利用者が発話した単名詞を再認識し、各順序について出力された認識結果を、部分一致検索語の有力構成名詞候補を抽出することができなかった番目表記の単名詞に嵌め込み、部分一致検索語尤度を再度計算し、図８（２）に示すように、実在候補のみを残す。
【００８０】
この際、部分一致検索語尤度を算出する場合、各単名詞の認識尤度の和をとることによって、部分一致検索語尤度を算出するようにしてもよく、また、積をとることによって、部分一致検索語尤度を算出するようにしてもよく、各単名詞の認識尤度の和を、複合名詞を構成する単名詞の数で除算することによって、部分一致検索語尤度を算出するようにしてもよい。
【００８１】
本実施例においては、各単名詞の認識尤度の和をとることによって、部分一致検索語尤度を算出することにし、この時点で認識処理が終了している部分一致検索語全体に対する認識結果と、マージし、また、この時点で認識処理が終了していると思われる次に使用頻度が高い第ｎサブ集合群の認識結果を用いた実在検索語候補作成処理の結果との間で、マージする。
【００８２】
このマージの結果、図８（２）に示すように、「東京／卸売／市場」に対する認識尤度が、既定閾値２４０を超えるので、「東京／卸売／市場」が実在候補として残る。
【００８３】
また、本実施例において、検索データベース８１として、電話帳データベースを利用しているので、住所が予め分かっていれば、その住所に実在する法人企業名、または個人を特定することができる。
【００８４】
本実施例は、検索語が複合名詞であることに着目し、単名詞毎に使用頻度を調べ、単名詞毎に認識処理するので、検索対象語が実時間処理不可能な大語彙であっても、利用者に対して迅速かつ正確に検索処理することができる。
【００８５】
本実施例は、検索語を構成する単名詞の頻度に偏りがある場合、非常に似通った単名詞が存在する場合にも、単名詞の認識結果について、音声入力された複合名詞を構成する複数の単名詞のうちで、部分的な単名詞が有力構成名詞候補である検索語を、部分一致検索語候補として検索データベース８１から抽出し、この抽出された部分一致検索語候補を対象として、単名詞毎に音声認識するので、全部を音声認識対象に定め実時間内に正しく認識処理することが不可能であった大語彙検索語の中から、迅速かつ正確に有力候補を絞り込むことができる。
【００８６】
これによって、利用者を待機させることなく、検索処理することができると考えられる。
【００８７】
なお、上記実施例において、検索データベースが、電話帳データベースのような個人姓名である場合、苗字と名前とを、上記複合名詞の構成単位とし、上記検索補助データベースには、上記１番目表記の単名詞として、上記苗字を登録し、上記２番目表記の単名詞として、上記名前を登録することによって、個人姓名を確定するようにしてもよい。
【００８８】
また、検索補助データベース中の全ての構成単語集合群に対する認識処理が終了し、実在検索語候補リスト、部分一致検索語候補リストのそれぞれを、それ以上更新不可能、かつ候補が特定できない場合は、先頭の構成名詞、または末尾の構成名詞を具体的に利用者に入力要求し、この入力要求に応じて入力された情報を獲得し、組み合わせることによって、検索語を特定するようにしてもよい。
【００８９】
つまり、上記実施例は、複数の単名詞で構成されている複合名詞が検索語として登録され、しかも、上記検索語が各単名詞に区切られて登録されている検索データベースと、上記登録されている各複合名詞のｎ番目（ｎは整数値）に表記されている単名詞の群を、ｎ番目表記の単名詞集合群と呼び、上記ｎ番目表記の単名詞集合群が、その頻度の高い順に登録され、しかも、この頻度の高い順に登録されている単名詞が、その頻度の高い方から順に、所定の数毎にまとめられ、複数のサブ集合群が形成されている検索補助データベースと、利用者が上記検索語を単名詞毎に音声入力すると、上記複合名詞のｎ番目表記の単名詞については、上記ｎ番目表記の単名詞集合群で認識し、しかも上記ｎ番目表記の単名詞集合群のうちで、最も頻度が高い単名詞を含む第１サブ集合群の範囲内で認識処理し、認識尤度を対応させて、認識結果リストを作成する認識結果リスト作成手段と、上記認識処理された単名詞である構成名詞候補と、上記構成名詞候補についての認識尤度との組が認識尤度順に並べられている認識結果リストを、上記音声入力された単名詞のそれぞれについて作成し、上記認識結果リストに記載されている構成名詞候補のうちで、所定の第１の閾値を超える認識尤度を具備する構成名詞候補を、有力構成名詞候補として選出する有力構成名詞候補選出手段と、上記検索語を構成する複数の単名詞のうちで一部の単名詞のみの認識尤度が上記第１の所定の閾値を超える場合、上記第１の閾値を超える認識尤度を具備する上記単名詞を備え、上記検索語を構成する単名詞の数と同じ数の単名詞を備えている部分一致検索語候補を、上記検索データベースから抽出する部分一致検索語候補抽出手段と、上記抽出された部分一致検索語候補を対象として音声認識し、しかも、上記抽出された部分一致検索語候補を構成する単名詞毎に、上記入力された音声を再認識する再認識手段と、上記再認識手段が再認識した結果の認識尤度を、所定の演算方法で演算することによって、統合尤度を演算する統合尤度演算手段とを有する音声対話型複合名詞の検索装置の例である。
【００９０】
また、上記実施例は、複数の単名詞で構成されている複合名詞が検索語として登録され、しかも、上記検索語が各単名詞に区切られて登録されている検索データベースと、上記登録されている各複合名詞のｎ番目（ｎは整数値）に表記されている単名詞の群を、ｎ番目表記の単名詞集合群と呼び、上記ｎ番目表記の単名詞集合群が、その頻度の高い順に登録され、しかも、この頻度の高い順に登録されている単名詞が、その頻度の高い方から順に、所定の数毎にまとめられ、複数のサブ集合群が形成されている検索補助データベースと、利用者が上記検索語である複合名詞を単名詞毎に区切らずに、一連で音声入力すると、上記音声入力された検索語である複合名詞を単名詞毎に区切る検索語区切り手段と、上記利用者が上記検索語である複合名詞を一連で音声入力すると、上記複合名詞のｎ番目表記の単名詞については、上記ｎ番目表記の単名詞集合群で認識し、しかも上記ｎ番目表記の単名詞集合群のうちで、最も頻度が高い単名詞を含む第１サブ集合群の範囲内で認識処理し、認識尤度を対応させて、認識結果リストを作成する認識結果リスト作成手段と、上記認識処理された単名詞である構成名詞候補と、上記構成名詞候補についての認識尤度との組が認識尤度順に並べられている認識結果リストを、上記音声入力された単名詞のそれぞれについて作成し、上記認識結果リストに記載されている構成名詞候補のうちで、所定の第１の閾値を超える認識尤度を具備する構成名詞候補を、有力構成名詞候補として選出する有力構成名詞候補選出手段と、上記検索語を構成する複数の単名詞のうちで一部の単名詞のみの認識尤度が上記第１の所定の閾値を超える場合、上記第１の閾値を超える認識尤度を具備する上記単名詞を備え、上記検索語を構成する単名詞の数と同じ数の単名詞を備えている部分一致検索語候補を、上記検索データベースから抽出する部分一致検索語候補抽出手段と、上記抽出された部分一致検索語候補を対象として音声認識し、しかも、上記抽出された部分一致検索語候補を構成する単名詞毎に、上記入力された音声を再認識する再認識手段と、上記再認識手段が再認識した結果の尤度を、所定の演算方法で演算することによって、統合尤度を演算する統合尤度演算手段とを有する音声対話型複合名詞の検索装置の例である。
【００９１】
【発明の効果】
本発明によれば、法人企業名や辞書中の四文字熟語等、検索対象がどんなに大規模であっても、利用者が音声で入力した検索語を、検索データベース中から検索し確定する場合オペレータ対応の検索装置のように、迅速かつ自然に検索処理を実現することができるという効果を奏する。
【図面の簡単な説明】
【図１】本発明の一実施例である音声対話型複合名詞の検索装置１を示すブロック図である。
【図２】検索データベース８１の全体像の一例を示す図である。
【図３】検索補助データベース８２の全体像の一例を示す図である。
【図４】電話帳に登録されている２，２００万件の法人名義データを、それを構成する単名詞（構成名詞）で区切り、この単名詞で区切られた法人名義データが、検索データベース８１に登録されている例を示す図である。
【図５】上記実施例において、検索補助データベース８２の一例を示す図である。
【図６】上記実施例において、前方検索補助データベース８２と後方検索補助データベース８２とを用いて、各単名詞を認識し、この認識の結果をマージした状態を示す図である。
【図７】上記実施例において、部分一致検索語処理した結果と、その結果を利用した検索語候補選定対話の一例とを示す図である。
【図８】上記実施例において、部分一致尤度と、部分一致検索語候補との関係を示す図である。
【符号の説明】
１…音声対話型複合名詞の検索装置、
２…音声入力部、
３…音声認識部、
４…音声認識結果出力部、
５…検索語候補リスト作成部、
６…対話制御部、
７…音声出力部、
８…システムデータベース、
８１…検索データベース、
８２…検索補助データベース。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a search device and a search method that use a speech recognition device to look up, in a database, a compound noun search word spoken by a user, and to confirm the user's input.
[0002]
[Prior art]
In a system that uses speech recognition technology to identify a search word input by a user, the size of the vocabulary that can be recognized within the real time of dialogue processing is limited, and recognition accuracy falls as the vocabulary to be recognized grows. Because of these current limitations of recognition technology, it is known to be very difficult to determine a user's speech input uniquely and in a short time.
[0003]
At present, the user must wait until the recognition process finishes and the result is presented, and the only way to check whether a recognition result is correct is to ask the user. Candidates are therefore presented one after another, in descending order of the recognition score output by the recognition device, until the correct one is confirmed (see, for example, Non-Patent Document 1).
[0004]
[Non-Patent Document 1]
Kumiko Omori, Masanobu Higashida, "Study on Improving the Determination Rate by Spoken Dialogue for Large-Vocabulary Compound Nouns", Proceedings of the 64th National Convention of the Information Processing Society of Japan (2002), (2), March 12, 2002, pp. 2-275, 2-276
[0005]
[Problems to be solved by the invention]
In a conventional system that uses a speech recognition device to identify a user's speech input, the larger the set of search words to be recognized, the longer recognition takes and the more accuracy inevitably drops. Waiting time and repeated presentation of candidates cause stress for users. Compared with an operator-assisted system, user satisfaction is not obtained, and at present users do not want to keep using current voice response services such as voice portals.
[0006]
Therefore, as a technique for identifying the input in a short time and within a natural flow of dialogue, without stressing the user, the following is conceivable when the search word is a compound of nouns. A search auxiliary database is created from the frequencies of the constituent nouns; using this database, frequently used constituent nouns are defined in advance as priority recognition targets; and the recognition targets are thus narrowed down beforehand, with the aim of reducing waiting time and improving accuracy.
[0007]
However, when the usage frequencies of the nouns constituting one search word are uneven, or when many very similar constituent nouns are used at the same position, the relevant constituent noun may not be registered in the priority recognition target noun list contained in the search auxiliary database (a list of frequently used nouns that are recognized with priority). In that case, the result is that no search word is found even though the search word actually exists.
[0008]
Moreover, even if the relevant constituent noun is registered in the priority recognition target noun list, many similar words are output as recognition results, so the correct constituent word may not appear in the output. As a result of the actual search word list creation process, this again leads to the result that no search word is found even though it actually exists.
[0009]
It is also conceivable to use, as additional information, single nouns (the individual nouns that make up a compound noun) that could be extracted only in part but appear likely to be correct. Partial match search word processing extracts from the search database the search words that contain those single nouns, among the several single nouns constituting the compound noun, whose recognition likelihood is high. In parallel, using the time taken by the partial match search word processing, recognition of each single noun continues against the forward search auxiliary databases (search auxiliary databases created for each group of single nouns at the n-th position from the front of the compound nouns serving as search words) and the backward search auxiliary databases (search auxiliary databases created for each group of single nouns at the n-th position from the end) for the positions whose recognition has not yet finished. Each time candidates are presented to the user, the outputs available at that moment are merged, and this process is repeated.
[0010]
In speech input type search using such a speech recognition device, the user's speech input is determined by speech recognition technology alone, so there is no way to determine it other than presenting the output of the recognition device to the user for confirmation. The larger the number of words to be recognized and the more similar candidates there are, the longer recognition takes and the lower the recognition accuracy becomes. The user is therefore kept waiting and then faces repeated confirmation prompts caused by misrecognition.
[0011]
At present, to achieve user satisfaction in speech input type search using a speech recognition device, the number of recognition targets must be reduced, either by restricting the application to a limited field or region or, for hierarchically structured information, by reliably confirming each level in turn, starting from the upper levels that have few categories, and grouping and narrowing down step by step. Otherwise it is difficult to obtain speed and accuracy within the range that users will accept.
[0012]
Furthermore, speech recognition of input from an unspecified large number of speakers in various places, with varying ambient noise levels, is very difficult. In such cases, even for word recognition, which is said to be the easiest task, the number of words that can be handled at a practical level is said to be limited to a few hundred.
[0013]
An object of the present invention is to provide a speech input type compound noun search device and a speech input type compound noun search method that, however large the search target, can search a search database for a search word spoken by the user and confirm it as quickly and naturally as an operator-assisted search device.
[0014]
[Means for Solving the Problems]
The present invention is a speech interactive compound noun search device comprising: partial match search word candidate extraction means that, when only some of the single nouns constituting a search word spoken by a user have a recognition likelihood exceeding a predetermined threshold, extracts from the search database partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the threshold and that have the same number of single nouns as the search word; re-recognition means that performs speech recognition on the extracted partial match search word candidates and re-recognizes the input speech for each single noun constituting the extracted partial match search word candidates; and integrated likelihood calculation means that calculates an integrated likelihood by applying a predetermined calculation method to the recognition likelihoods obtained by the re-recognition means.
[0015]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a block diagram showing a spoken dialogue compound noun search apparatus 1 according to an embodiment of the present invention.
[0016]
The speech interactive compound noun search device 1 includes a voice input unit 2, a voice recognition unit 3, voice recognition software 3S, a voice recognition result output unit 4, a search word candidate list creation unit 5, a dialogue control unit 6, a voice output unit 7, voice output software 7S, and a system database 8.
[0017]
The search word candidate list creation unit 5 includes an actual search word list creation unit 51 and a partial match search word list creation unit 52.
[0018]
The dialogue control unit 6 includes a search word candidate selection dialogue unit 61 and a partial match search word candidate selection dialogue unit 62. The speech recognition unit 3 uses the system database 8 when performing speech recognition processing on the input speech.
[0019]
The system database 8 includes a search database 81, a search auxiliary database 82, and a YES / NO database 83.
[0020]
The search database 81 is a database in which compound nouns composed of a plurality of single nouns are registered as search words, and the search words are registered by being divided into single nouns.
[0021]
In the search auxiliary database 82, the group of single nouns written at the n-th position (n is an integer value) of the registered compound nouns is called the "n-th notation single noun set group". Each n-th notation single noun set group is registered in descending order of frequency, and the single nouns so ordered are grouped, from the most frequent downward, into blocks of a predetermined number, forming a plurality of sub-set groups.
[0022]
The voice recognition unit 3 uses a voice recognition device, and the voice output unit 7 uses a voice output device for voice output.
[0023]
FIG. 2 is a diagram illustrating an example of the overall image of the search database 81.
[0024]
FIG. 3 is a diagram illustrating an example of the overall image of the search assistance database 82.
[0025]
The search database 81 holds more search words than can be processed by speech recognition within the real time of dialogue processing, and each search word is recorded split into its constituent nouns, as shown in FIG. 2. For all search words in the search database 81, the search auxiliary database 82 calculates the usage frequency of each single noun within each n-th notation single noun group and stores the single nouns ranked in order of that usage frequency.
[0026]
The search auxiliary database 82 consists of one search auxiliary database per position, from the one for single nouns of the first notation to the one for single nouns of the p-th notation, where p is the maximum number of single nouns constituting a search word.
[0027]
The number of single nouns (constituent nouns) held in each search auxiliary database 82 is expected to be smaller than the total number of search words held in the search database 81, but it is still assumed to be too large to process within the real time of dialogue processing.
[0028]
Each search auxiliary database stores its single nouns subdivided, in descending order of usage frequency, into blocks whose size is the number of words that the search device 1 has defined in advance as processable within the real time of dialogue processing. Among the single nouns stored in the n-th search auxiliary database, the group with the highest usage frequency is called the "first sub-set group".
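The construction of the search auxiliary database described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `build_search_auxiliary_db`, the in-memory list of tuples standing in for the search database 81, the romanized sample entries, and the alphabetical tie-break are all assumptions made for illustration. The block size `q` plays the role of the predetermined number of words per sub-set group.

```python
from collections import Counter


def build_search_auxiliary_db(search_db, q):
    """Return {position: [sub-set group 1, sub-set group 2, ...]}.

    search_db is a list of search words, each a tuple of single nouns.
    Positions are 1-based. Each sub-set group holds at most q nouns,
    and the first group contains the most frequent nouns.
    """
    max_len = max(len(word) for word in search_db)
    aux = {}
    for n in range(max_len):
        # Count how often each noun appears at position n + 1.
        counts = Counter(word[n] for word in search_db if len(word) > n)
        # Descending frequency; ties broken alphabetically (an assumption).
        ranked = sorted(counts, key=lambda noun: (-counts[noun], noun))
        # Cut the ranked list into blocks of q nouns.
        aux[n + 1] = [ranked[i:i + q] for i in range(0, len(ranked), q)]
    return aux


# Hypothetical miniature search database (romanized for readability).
search_db = [
    ("tokyo", "wholesale", "market"),
    ("tokyo", "estate"),
    ("tokyu", "estate"),
    ("tokyo", "central", "market"),
]
aux_db = build_search_auxiliary_db(search_db, q=2)
```

With `q=2`, the second position splits into two sub-set groups, so a noun such as "wholesale" that falls outside the first group is only recognized in a later round.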
[0029]
The speech interactive compound noun search device 1 sends the user's voice input via the voice input unit 2 to the voice recognition unit 3.
[0030]
The voice recognition unit 3 performs recognition processing on the user's input speech using a speech recognition device. If the voice recognition unit 3 does not have a function for dividing a compound noun into single nouns (constituent nouns) and outputting a recognition result for each constituent noun, it instructs the user to input the single nouns separately. If the voice recognition unit 3 does have a function for dividing the input speech signal into constituent nouns (single nouns) and outputting a recognition result for each single noun, the user speaks the compound noun to be searched for as one continuous utterance.
[0031]
In addition, if the voice recognition unit 3 does not have a function for dividing the input speech signal into single nouns (constituent nouns) and outputting a recognition result for each single noun, it instructs the user Pc to input speech separated into single nouns (word by word).
[0032]
The voice recognition unit 3 selects a database to be recognized from the system database 8 in accordance with the processing status of the search device 1. When a search term is input, the search auxiliary database 82 is referred to, and when a response to the correctness confirmation to the user is recognized, the YES / NO database 83 is referred to.
[0033]
The speech recognition result output unit 4 sends the recognition result to the search word candidate list creation unit 5.
[0034]
At this point, if any search auxiliary database 82 still contains a sub-set group for which recognition has not yet finished, the voice recognition unit 3 starts recognition again on that sub-set group, and the voice recognition result output unit 4 outputs the recognition result and sends it to the search word candidate list creation unit 5. This process is repeated.
[0035]
In the search word candidate list creation unit 5, the actual search word candidate list creation unit 51 selects prominent constituent noun candidates from the recognition results, forms combinations of those candidates, and, referring to the search database 81, creates an actual search word candidate list containing the combinations that actually exist as search words. It sends this list to the dialogue control unit 6.
[0036]
When extracting actual search words, all combinations of the prominent constituent nouns are created, and the search database 81 is consulted to check whether each combination actually exists.
[0037]
Alternatively, each time one prominent constituent noun candidate is appended, the search database 81 may be used to check whether a search word containing that sequence of prominent constituent nouns actually exists. Only for combinations confirmed to exist are the prominent constituent nouns of the other positions then appended and the search database 81 checked again.
[0038]
Furthermore, the prominent constituent nouns may be combined in order from the first position, or the order of combination may be chosen so that the search time is minimized.
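The two ways of building the actual search word candidate list can be sketched as follows. This is a hedged illustration, not the patent's implementation: the function names, the set of tuples standing in for the search database 81, and the precomputed prefix set are assumptions. The first function enumerates every combination and checks it; the second appends one position at a time, from the first position, and drops any partial sequence that no registered search word starts with.

```python
from itertools import product


def existing_by_full_enumeration(prominent, search_db):
    """Check every combination of per-position candidates against the database."""
    return [combo for combo in product(*prominent) if combo in search_db]


def existing_by_incremental_check(prominent, search_db):
    """Append one position at a time and prune sequences that cannot exist."""
    # All prefixes of registered search words (an illustrative lookup structure).
    prefixes = {word[:k] for word in search_db for k in range(1, len(word) + 1)}
    partial = [()]
    for candidates in prominent:
        partial = [p + (c,) for p in partial for c in candidates
                   if p + (c,) in prefixes]
    return [p for p in partial if p in search_db]


# Hypothetical data: candidates per position and a tiny search database.
search_db = {
    ("tokyo", "wholesale", "market"),
    ("tokyu", "estate"),
    ("tokyo", "estate"),
}
prominent = [["tokyo", "tokyu"], ["wholesale"], ["market", "estate"]]

full = existing_by_full_enumeration(prominent, search_db)
incremental = existing_by_incremental_check(prominent, search_db)
```

Both functions return the same list here. If any position has no prominent candidate at all, both return an empty list, which is the situation that the partial match search word processing is meant to handle.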
[0039]
While the actual search word list is being created, the partial match search word list creation unit 52 extracts from the search database 81, as "partial match search word candidates", the search words that partially contain the prominent constituent noun candidates selected from the recognition results. It calculates a "partial match likelihood" for each and performs recognition processing on the partial match search words whose partial match likelihood is equal to or greater than a predetermined threshold.
[0040]
When the user has spoken the search word separated into single nouns, only search words with the same number of constituent nouns are selected from the search database 81. Recognition processing is performed on the whole partial match search words using the originally input speech. At the same time, a recognition target noun list is created only for the positions where no prominent constituent noun could be extracted, and recognition processing is performed on only the single noun at that position within the originally input speech. That is, speech recognition is performed again for each single noun.
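A minimal sketch of the partial match extraction step follows. The function name, the data layout (one dictionary of noun to likelihood per position, empty where no prominent candidate was found), and all likelihood values are hypothetical. Using the sum of the matched likelihoods as the partial match likelihood follows the choice stated for the embodiment in the Japanese description; other combinations are equally possible.

```python
def extract_partial_matches(search_db, prominent, min_likelihood):
    """Return (search word, partial match likelihood) pairs, best first.

    prominent[i] maps each prominent candidate at position i + 1 to its
    recognition likelihood; it is empty if no candidate passed the threshold.
    """
    n = len(prominent)
    results = []
    for word in search_db:
        # Keep only search words with the same number of constituent nouns.
        if len(word) != n:
            continue
        matched = [prominent[i][noun] for i, noun in enumerate(word)
                   if noun in prominent[i]]
        if not matched:
            continue
        score = sum(matched)  # partial match likelihood as a simple sum
        if score >= min_likelihood:
            results.append((word, score))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical recognition output for a three-noun utterance.
prominent = [{"tokyo": 90, "tokyu": 85}, {}, {"estate": 82}]
search_db = [
    ("tokyo", "wholesale", "market"),
    ("tokyo", "real", "estate"),
    ("tokyu", "real", "estate"),
    ("tokyo", "estate"),              # different length, skipped
    ("osaka", "central", "market"),   # no prominent noun, skipped
]
partial_matches = extract_partial_matches(search_db, prominent, min_likelihood=80)
```

The nouns at positions that produced no prominent candidate (the second position here) would then form the recognition target noun list for re-recognition of that part of the original utterance.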
[0041]
When the speech recognition device has a function for dividing the input speech signal into single nouns and recognizing them, the speech signal input by the user is recognized using the partial match search word candidates. The result of this recognition processing is sent to the dialogue control unit 6.
[0042]
The dialogue control unit 6 executes a detection candidate selection dialogue by using the actual search word candidates and the partial match search word candidates. At this time, the partial match likelihood is calculated for each search word candidate using the result for the partial match search word sent from the partial match search word list creation unit 52.
[0043]
When recognition processing has been performed on whole partial match search words, the newly calculated recognition likelihood may be used as the partial match likelihood in place of the previous value. Alternatively, the partial match likelihood may be obtained by adding the sum of the recognition likelihoods of the partially matching prominent constituent nouns to the newly calculated recognition likelihood and dividing by "number of partially matching constituent nouns + 1". It may also be obtained simply by taking the sum or the product.
[0044]
When single nouns have been recognized only for the positions where no prominent constituent noun was output, the resulting likelihood may be added to or multiplied by the previously calculated likelihoods of the prominent constituent nouns and the result divided by the number of constituent nouns. Alternatively, the product of the individual likelihoods may be taken (the recognition likelihoods multiplied together) and divided by the number of constituent nouns.
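The alternative likelihood calculations listed above can be written out as small Python functions. The function names and the sample values are assumptions made for illustration; the text leaves the choice among these formulas open.

```python
from math import prod


def averaged_with_whole_word(matched_likelihoods, whole_word_likelihood):
    """(sum of matched noun likelihoods + whole-word likelihood) / (matches + 1)."""
    return (sum(matched_likelihoods) + whole_word_likelihood) / (
        len(matched_likelihoods) + 1
    )


def filled_in_mean(prominent_likelihoods, rerecognized_likelihoods):
    """Add the re-recognized positions to the prominent ones, then divide
    by the total number of constituent nouns."""
    combined = list(prominent_likelihoods) + list(rerecognized_likelihoods)
    return sum(combined) / len(combined)


def filled_in_product_mean(prominent_likelihoods, rerecognized_likelihoods):
    """Multiply all likelihoods, then divide by the number of constituent nouns."""
    combined = list(prominent_likelihoods) + list(rerecognized_likelihoods)
    return prod(combined) / len(combined)


# Hypothetical likelihoods: two matched nouns, one whole-word score,
# and one noun obtained by re-recognition.
avg = averaged_with_whole_word([90, 82], 86)
mean = filled_in_mean([90, 82], [86])
```

A plain sum grows with the number of constituent nouns, so any threshold applied to it has to be chosen with the word length in mind; the averaged variants stay on the scale of a single noun's likelihood.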
[0045]
If a search word candidate in the actual search word candidate list satisfies the condition, prescribed for the search device, under which the search word can be specified by confirmation with the user alone, a command to output confirmation guidance is sent to the voice output unit 7.
[0046]
If no search word candidate in the actual search word candidate list satisfies the conditions for specifying the search word solely through the confirmation process with the user, or if no actual search word candidate is extracted, and the new partial match likelihood calculated by the partial match search word processing satisfies the conditions defined by the search device under which the search word can be specified solely through the confirmation process with the user, an output command for the confirmation process guidance is sent to the voice output unit 7.
[0047]
If the search word candidate that can be specified only by the confirmation process is not extracted from the actual search word candidate or the partial match search word process using the partial match search word candidate, the recognition process is terminated at this point.
[0048]
Then, using the probable constituent noun candidates obtained from the recognition process for the sub-set group with the next highest frequency of use, the actual search word candidate list and the partial match search word list are updated, and the search word candidate selection dialogue and the partial match search word processing are repeated.
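The escalation from one sub-set group to the next can be pictured with the following simplified Python sketch. It is not the patent's full procedure: it omits the list updates and the dialogue, and simply moves to the next sub-set group when the current one yields no probable candidate. The function `recognize_progressively`, the `score_fn` callable (a stand-in for the speech recognizer), and the default threshold are all hypothetical.

```python
def recognize_progressively(subsets, score_fn, threshold=80):
    """Try sub-set groups in descending order of frequency of use.

    subsets: list of vocabularies (lists of single nouns), most frequent group first.
    score_fn: hypothetical stand-in for the recognizer; maps a noun to a likelihood.
    Returns (1-based index of the sub-set group, probable candidates sorted by likelihood),
    or (None, []) when no group yields a probable candidate.
    """
    for index, vocabulary in enumerate(subsets, start=1):
        scored = [(noun, score_fn(noun)) for noun in vocabulary]
        probable = [(noun, score) for noun, score in scored if score >= threshold]
        if probable:
            return index, sorted(probable, key=lambda pair: -pair[1])
    return None, []

# Illustrative data: the spoken noun is only found in the second sub-set group.
fake_scores = {"Market": 20, "Estate": 25, "Wholesale": 88, "Retail": 30}
found_in, candidates = recognize_progressively(
    [["Market", "Estate"], ["Wholesale", "Retail"]],
    lambda noun: fake_scores[noun],
)
```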
[0049]
When the output command for the confirmation processing guidance is sent, the voice output unit 7 outputs to the user guidance for confirming whether the designated search word candidate is correct.
[0050]
When the user's response to the confirmation processing guidance is input again from the voice input unit 2, the voice recognition unit 3 refers to the YES / NO database 83 to recognize the response, and the voice recognition result output unit 4 outputs the recognition result. When an affirmative response is obtained from the user, the dialogue control unit 6 sends to the voice output unit 7 a command to inform the user that specification of the search term is complete.
[0051]
The updating of the search word candidate list, the search word candidate selection dialogue, and the partial match search word processing are repeated until a search word can be specified.
[0052]
When the recognition process has been completed for all single noun sets in the search auxiliary database 82 and the actual search word candidate list cannot be updated any further, the dialogue control unit 6 requests the user to input again in single noun units (constituent noun units). Based on the recognition result for the single nouns input in each configuration order, the search terms including the corresponding constituent nouns are extracted from the search database 81, and speech recognition is repeated.
[0053]
The number of constituent nouns per set is defined so that, while the actual search word candidate list is being updated and the search word candidate selection dialogue is being performed, the search device 1 can complete the recognition process for at least one constituent noun set (one set of the specified number of nouns) other than the priority recognition target noun group.
[0054]
Next, the operation of the above embodiment will be described more specifically.
[0055]
In the following description, a case where a corporate company name included in a telephone book composed of compound nouns is searched will be described as an example.
[0056]
There are 22 million corporate names in the phone book across Japan. It is known that if the current speech recognition technology is applied as it is, it is impossible to recognize and process 22 million cases within the real time of interactive processing, and the accuracy is very low.
[0057]
FIG. 4 is a diagram showing an example in which the 22 million corporate name data registered in the telephone directory are divided into the single nouns (constituent nouns) constituting the data, and the corporate name data thus divided are registered in the search database 81.
[0058]
For the corporate names registered in the search database 81, the frequency of each single noun constituting the corporate name is checked for each single noun and registered in the search auxiliary database 82 in descending order of frequency.
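The per-position frequency count described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `build_auxiliary_db`, the tuple-based layout of the search database, the alphabetical tie-break, and the sample entries are all hypothetical, and only the forward (front-counted) positions are handled.

```python
from collections import Counter

def build_auxiliary_db(search_db):
    """Count single nouns per notation position and sort by descending frequency.

    search_db: iterable of tuples, each tuple being one compound noun split into single nouns.
    Returns {position: [(noun, frequency), ...]} with positions counted from 1.
    """
    counters = {}
    for nouns in search_db:
        for position, noun in enumerate(nouns, start=1):
            counters.setdefault(position, Counter())[noun] += 1
    # Sort by frequency (highest first); ties are broken alphabetically (an assumption).
    return {
        position: sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        for position, counter in counters.items()
    }

# Illustrative entries, not real directory data.
sample_db = [
    ("Tokyo", "Wholesale", "Market"),
    ("Tokyo", "Estate"),
    ("Tokyu", "Estate"),
]
aux_db = build_auxiliary_db(sample_db)
```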
[0059]
Among the compound nouns constituting the 22 million corporate names, when the maximum number of constituent words is 7, the total number of first-notation single nouns is about 3.6 million, the total number of second-notation single nouns is about 2.5 million, the total number of third-notation single nouns is about 2.7 million, the total number of fourth-notation single nouns is about 1 million, and so on. Even for each single noun, recognition processing within the real time of interactive processing is impossible, and the accuracy is expected to be low.
[0060]
FIG. 5 is a diagram showing an example of the search assistance database 82 in the above embodiment.
[0061]
The search device 1 arranges the single nouns in the search auxiliary database 82 for each configuration order in order of frequency of use, and subdivides them into groups of q nouns each, in descending order of frequency of use, to construct the n-th notation single noun set group (1 ≦ n ≦ 7).
[0062]
The above q is defined according to the performance of the speech recognition apparatus to be used, and in this embodiment, it is assumed that q = 500. Then, a group of 500 single nouns having the highest use frequency in the n-th notation single noun set group is defined as a “first sub-set group”.
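The subdivision into sub-set groups of q nouns can be sketched as simple slicing of a frequency-ordered list. The function name `split_into_subsets` and the placeholder noun labels are hypothetical; q = 500 follows the value assumed in this embodiment.

```python
def split_into_subsets(nouns_by_frequency, q=500):
    """Split a frequency-ordered noun list into consecutive sub-set groups of q nouns.

    The first returned group corresponds to the "first sub-set group";
    the last group may hold fewer than q nouns.
    """
    return [nouns_by_frequency[i:i + q] for i in range(0, len(nouns_by_frequency), q)]

# 1200 placeholder nouns, already sorted by descending frequency of use.
placeholder_nouns = [f"noun{i}" for i in range(1200)]
subsets = split_into_subsets(placeholder_nouns)
```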
[0063]
Further, in the present embodiment, it is assumed that the recognition device does not have a function of dividing the input into single nouns, performing recognition processing in single noun units, and calculating and outputting the recognition likelihood of the single noun for each configuration order. Therefore, the user is requested to speak the corporate name (compound noun) being searched for divided into single nouns.
[0064]
This embodiment may also be applied to the case where the recognition device has a function of dividing the input into single nouns, performing recognition processing in single noun units, and calculating and outputting the recognition likelihood of the single noun for each configuration order.
[0065]
Here, a case where the user Pc inputs a voice using “Tokyo / Wholesale / Market” as a search term will be considered.
[0066]
“Tokyo” input by voice is recognized from the single nouns belonging to the first sub-set group Gc1-1 of the first-notation single noun set group Gc1; “wholesale” input by voice is recognized from the single nouns belonging to the first sub-set group Gc2-1 of the second-notation single noun set group Gc2; and “market” input by voice is recognized from the single nouns belonging to the first sub-set group Gc3-1 of the third-notation single noun set group Gc3.
[0067]
Each of the forward search auxiliary database 82 and the backward search auxiliary database 82 is subjected to recognition processing, and the results of the recognition processing are merged. Then, the recognition results are output in descending order of the recognition score output by the recognition device.
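A minimal Python sketch of this merge step follows. The text does not say how a noun that appears in both the forward and backward results is handled, so keeping the higher score is an assumption here; `merge_results` and the sample scores are hypothetical.

```python
def merge_results(forward, backward):
    """Merge two lists of (noun, recognition score) and sort by descending score.

    When a noun appears in both lists, the higher score is kept (an assumption).
    """
    best = {}
    for noun, score in list(forward) + list(backward):
        if noun not in best or score > best[noun]:
            best[noun] = score
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))

# Illustrative scores only.
merged = merge_results(
    [("Tokyo", 90), ("Tokyu", 82)],
    [("Tokyo", 88), ("Kyoto", 60)],
)
```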
[0068]
FIG. 6 is a diagram showing a state in which single nouns are recognized using the forward search auxiliary database 82 and the backward search auxiliary database 82 and the results of recognition are merged in the above embodiment.
[0069]
In the merge result shown in FIG. 6, “Tokyo”, the first-notation single noun, is included in the first sub-set group Gc1-1 of the first-notation single noun set group Gc1, and “market”, the third-notation single noun, is included in the first sub-set group Gc3-1 of the third-notation single noun set group Gc3. However, “wholesale”, the second-notation single noun, is not included in the first sub-set group Gc2-1 of the second-notation single noun set group Gc2.
[0070]
In this embodiment, a single noun for which a recognition likelihood of 80 or more is output is defined as a “probable constituent noun candidate”, and in FIG. 6 the probable constituent noun candidates are indicated by shading.
[0071]
In the merge result shown in FIG. 6, two candidates, “Tokyo” and “Tokyu”, are selected as probable constituent noun candidates for the voice-input “Tokyo”; no probable constituent noun candidate is selected for the voice-input “wholesale”; and one candidate, “estate”, is selected as a probable constituent noun candidate for the voice-input “market”.
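The selection rule can be written as a simple threshold filter. In this sketch, `select_probable` is a hypothetical name and the likelihood values are illustrative rather than taken from FIG. 6; the comparison uses "80 or more" as stated for this embodiment.

```python
def select_probable(results, threshold=80):
    """Return the nouns whose recognition likelihood is at or above the threshold."""
    return [noun for noun, likelihood in results if likelihood >= threshold]

# Illustrative recognition results per notation position (values are made up).
merged_by_position = {
    1: [("Tokyo", 90), ("Tokyu", 85), ("Kyoto", 60)],
    2: [("Retail", 55), ("Hotel", 40)],
    3: [("Estate", 82), ("Office", 70)],
}
probable_by_position = {
    position: select_probable(results)
    for position, results in merged_by_position.items()
}
```

With these sample values, the second position yields an empty list, which mirrors the situation described for “wholesale”.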
[0072]
Based on the merge result shown in FIG. 6, the actual search word candidate list creation process is performed. However, since no probable constituent noun candidate is selected for the voice-input “wholesale”, no actual search word exists. Therefore, as soon as the recognition process for the first sub-set group of each n-th notation single noun set group is completed, the search device 1 takes as the recognition target the set group with the next highest frequency of use among the remaining set groups (the second sub-set group, consisting of 500 single nouns) and performs recognition on it. Then, after this recognition process, a partial match search process is performed.
[0073]
Search words including the first-notation probable constituent noun candidates “Tokyo” and “Tokyu”, and search words including the third-notation probable constituent noun candidate “terrestrial”, are extracted from the search database 81.
[0074]
In the present embodiment, the sum of the recognition likelihoods of the leading constituent noun candidates is set as the partial match likelihood.
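The extraction and scoring of partial match search words might look like the following Python sketch. The names `partial_match_candidates` and `probable`, the tuple layout of the search database, and all sample entries and likelihood values are hypothetical. Restricting the candidates to entries with the same number of single nouns as the spoken input follows the wording of the claims.

```python
def partial_match_candidates(search_db, probable, num_nouns):
    """Extract search words containing at least one probable constituent noun.

    search_db: iterable of tuples of single nouns.
    probable: {position: {noun: recognition likelihood}} for probable candidates.
    num_nouns: number of single nouns in the spoken compound noun.
    Returns [(entry, partial match likelihood)] sorted by descending likelihood,
    where the likelihood is the sum over matching probable constituent nouns.
    """
    scored = []
    for entry in search_db:
        if len(entry) != num_nouns:
            continue
        score = sum(
            probable.get(position, {}).get(noun, 0)
            for position, noun in enumerate(entry, start=1)
        )
        if score > 0:
            scored.append((entry, score))
    return sorted(scored, key=lambda pair: -pair[1])

sample_db = [
    ("Tokyo", "Wholesale", "Market"),
    ("Tokyo", "Central", "Estate"),
    ("Tokyu", "Real", "Estate"),
    ("Osaka", "Fish", "Market"),
    ("Tokyo", "Estate"),
]
probable = {1: {"Tokyo": 90, "Tokyu": 85}, 3: {"Estate": 82}}
candidates = partial_match_candidates(sample_db, probable, num_nouns=3)
```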
[0075]
FIG. 7 is a diagram illustrating a result of the partial match search word process and an example of a search word candidate selection dialogue using the result in the above embodiment.
[0076]
FIG. 8 is a diagram showing the relationship between the partial match likelihood and the partial match search word candidate in the above embodiment.
[0077]
As shown in FIG. 8, the partial match likelihood is highest for a search term that includes “Tokyo” as the first-notation single noun and “the estate” as the third-notation single noun. All of the partial match search words extracted by the partial match search word processing are set as the recognition target, and the user's original utterance is recognized again for each single noun.
[0078]
Thus, since speech recognition is performed again for each single noun, the recognition likelihood shown in FIG. 7 and the recognition likelihood shown in FIG. 8 are different for the single noun “Tokyo”. Similarly, the recognition likelihoods shown in FIG. 7 and the recognition likelihoods shown in FIG. 8 are different for each of the single nouns “Tokyu”, “Place”, and “Market”. Moreover, since voice recognition is performed for each single noun, recognition accuracy is increased.
[0079]
At the same time, from the extracted partial match search words, a list of first-notation single nouns, a list of second-notation single nouns, and a list of third-notation single nouns are each created, and the single nouns uttered by the user are re-recognized against these lists. The recognition result output for each order is applied to the second-notation single noun, for which no probable constituent noun candidate could be extracted, the partial match search word likelihood is calculated again, and only the actual candidates are left as shown in FIG.
[0080]
In this case, the partial match search word likelihood may be calculated as the sum of the recognition likelihoods of the single nouns, as their product, or by dividing the sum of the recognition likelihoods of the single nouns by the number of single nouns constituting the compound noun.
[0081]
In this embodiment, the partial match search word likelihood is calculated as the sum of the recognition likelihoods of the single nouns. The recognition result for all of the partial match search words, for which the recognition process has been completed at this point, is then merged with the result of the actual search word candidate creation process that uses the recognition result of the n-th sub-set group with the next highest frequency of use, whose recognition is considered to have been completed by this point.
[0082]
As a result of the merging, as shown in FIG. 8 (2), the recognition likelihood for “Tokyo / Wholesale / Market” exceeds the predetermined threshold 240, so “Tokyo / Wholesale / Market” remains as a real candidate.
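The final filtering step can be sketched as follows: the re-recognized likelihoods of each candidate's single nouns are summed and compared with the threshold of 240 mentioned above. The function name `filter_by_integrated_likelihood` and the per-noun likelihood values are hypothetical; only the summation rule and the threshold come from the text.

```python
def filter_by_integrated_likelihood(rerecognized, threshold=240):
    """Keep candidates whose summed per-noun likelihood exceeds the threshold.

    rerecognized: {entry (tuple of single nouns): [likelihood per single noun]}.
    Returns [(entry, integrated likelihood)] sorted by descending likelihood.
    """
    kept = []
    for entry, likelihoods in rerecognized.items():
        total = sum(likelihoods)
        if total > threshold:
            kept.append((entry, total))
    return sorted(kept, key=lambda pair: -pair[1])

# Illustrative re-recognition results (values are made up).
remaining = filter_by_integrated_likelihood({
    ("Tokyo", "Wholesale", "Market"): [92, 81, 88],
    ("Tokyo", "Central", "Estate"): [92, 40, 60],
})
```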
[0083]
Further, in the present embodiment, since the telephone directory database is used as the search database 81, if the address is known in advance, it is possible to specify the corporate company name or the individual that actually exists at the address.
[0084]
In this embodiment, attention is paid to the fact that the search word is a compound noun: the frequency of use is checked for each single noun, and recognition processing is performed for each single noun. Therefore, even when the search target is a large vocabulary that cannot be processed in real time, the search process can be performed quickly and accurately for the user.
[0085]
In the present embodiment, even when the frequency of the single nouns constituting a search word is biased, or when there are very similar single nouns, search words in which some of the single nouns constituting the speech-input compound noun are probable constituent noun candidates are extracted from the search database 81 as partial match search word candidates, based on the single noun recognition results, and speech recognition is performed for each single noun with the extracted partial match search word candidates as the target. Therefore, probable candidates can be narrowed down quickly and accurately from a large-vocabulary set of search words that could not be recognized correctly in real time if all of them were subject to speech recognition.
[0086]
Thus, it is considered that the search process can be performed without causing the user to wait.
[0087]
In the above embodiment, when the search database contains personal full names, as in a telephone directory database, the last name and the first name are the constituent units of the compound noun. An individual's name may be confirmed by registering the last name in the search auxiliary database as the first-notation single noun and the first name as the second-notation single noun.
[0088]
In addition, when the recognition processing for all of the constituent word set groups in the search auxiliary database is completed and neither the actual search word candidate list nor the partial match search word candidate list can be updated any further, so that the candidate cannot be specified, a search term may be specified by specifically requesting the user to input the first constituent noun or the last constituent noun, and by acquiring and combining the information input in response to the request.
[0089]
In other words, the above embodiment is an example of a speech interactive compound noun search device having: a search database in which compound nouns composed of a plurality of single nouns are registered as search words, each search word being registered divided into its single nouns; a search auxiliary database in which the group of single nouns written n-th (n is an integer value) in each registered compound noun is called an n-th notation single noun set group, the single nouns of each n-th notation single noun set group are registered in descending order of frequency, and the registered single nouns are grouped by a predetermined number in descending order of frequency to form a plurality of sub-set groups; recognition result list creating means for, when the user inputs the search word by speech for each single noun, recognizing the n-th notation single noun of the compound noun against the n-th notation single noun set group, performing recognition processing within the range of the first sub-set group, which includes the single noun having the highest frequency in that set group, and creating a recognition result list with the recognition likelihoods associated; constituent noun candidate selecting means for creating, for each single noun input by speech, a recognition result list in which pairs of a constituent noun candidate (a single noun subjected to the recognition processing) and its recognition likelihood are arranged in order of recognition likelihood, and for selecting, from the constituent noun candidates listed in the recognition result list, those whose recognition likelihood exceeds a predetermined first threshold as probable constituent noun candidates; partial match search word candidate extraction means for, when the recognition likelihood of only some of the single nouns constituting the search word exceeds the first threshold, extracting from the search database partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the first threshold and that have the same number of single nouns as the search word; re-recognizing means for re-recognizing the input speech, with the extracted partial match search word candidates as the recognition target, for each single noun constituting those candidates; and integrated likelihood calculating means for calculating an integrated likelihood from the recognition likelihoods obtained by the re-recognizing means, using a predetermined calculation method.
[0090]
The above embodiment is also an example of a speech interactive compound noun search device having: a search database in which compound nouns composed of a plurality of single nouns are registered as search words, each search word being registered divided into its single nouns; a search auxiliary database in which the group of single nouns written n-th (n is an integer value) in each registered compound noun is called an n-th notation single noun set group, the single nouns of each n-th notation single noun set group are registered in descending order of frequency, and the registered single nouns are grouped by a predetermined number in descending order of frequency to form a plurality of sub-set groups; search term separation means for, when the user inputs the compound noun that is the search term as a continuous utterance without separating it into single nouns, separating the speech-input compound noun into single nouns; recognition result list creating means for, when the user inputs the compound noun that is the search term as a continuous utterance, recognizing the n-th notation single noun of the compound noun against the n-th notation single noun set group, performing recognition processing within the range of the first sub-set group, which includes the single noun having the highest frequency in that set group, and creating a recognition result list with the recognition likelihoods associated; constituent noun candidate selecting means for creating, for each single noun input by speech, a recognition result list in which pairs of a constituent noun candidate (a single noun subjected to the recognition processing) and its recognition likelihood are arranged in order of recognition likelihood, and for selecting, from the constituent noun candidates listed in the recognition result list, those whose recognition likelihood exceeds a predetermined first threshold as probable constituent noun candidates; partial match search word candidate extraction means for, when the recognition likelihood of only some of the single nouns constituting the search word exceeds the first threshold, extracting from the search database partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the first threshold and that have the same number of single nouns as the search word; re-recognizing means for re-recognizing the input speech, with the extracted partial match search word candidates as the recognition target, for each single noun constituting those candidates; and integrated likelihood calculating means for calculating an integrated likelihood from the likelihoods obtained by the re-recognizing means, using a predetermined calculation method.
[0091]
[Effect of the invention]
According to the present invention, when a search word input by a user, such as a corporate name or a four-character idiom in a dictionary, is searched for in a search database and confirmed, the search processing can be realized quickly and naturally, like a search device staffed by an operator, regardless of how large the search target is.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a voice interactive compound noun search apparatus 1 according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of an overall image of a search database 81.
FIG. 3 is a diagram showing an example of an overall view of a search assistance database 82.
FIG. 4 is a diagram showing an example in which the 22 million corporate name data registered in the telephone book are divided into the single nouns (constituent nouns) constituting the data, and the corporate name data thus divided are registered in the search database 81.
FIG. 5 is a diagram showing an example of a search auxiliary database 82 in the embodiment.
FIG. 6 is a diagram showing a state in which single nouns are recognized using the forward search auxiliary database 82 and the backward search auxiliary database 82 and the results of the recognition are merged in the embodiment.
FIG. 7 is a diagram showing a result of partial match search word processing and an example of a search word candidate selection dialogue using the result in the embodiment.
FIG. 8 is a diagram illustrating a relationship between a partial match likelihood and a partial match search word candidate in the embodiment.
[Explanation of symbols]
1 ... Spoken dialogue compound noun search device,
2 ... Voice input unit,
3 ... Voice recognition unit,
4 ... Voice recognition result output unit,
5 ... Search word candidate list creation unit,
6 ... Dialogue control unit,
7 ... Voice output unit,
8 ... System database,
81 ... Search database,
82 ... Search auxiliary database.

Claims

A search database in which compound nouns composed of a plurality of single nouns are registered as search terms, and the search terms are registered by being divided into single nouns;
A search auxiliary database in which a group of single nouns written n-th (n is an integer value) in each registered compound noun is referred to as an n-th notation single noun set group, the single nouns of the n-th notation single noun set group are registered in descending order of frequency, and the single nouns registered in descending order of frequency are grouped by a predetermined number in order from the highest frequency to form a plurality of sub-set groups;
When the user inputs the search word by speech for each single noun, the n-th single noun of the compound noun is recognized by the n-th single noun set, and the n-th single noun set. A recognition result list creation means for creating a recognition result list by performing recognition processing within the range of the first subset group including the single noun having the highest frequency among the groups, and by associating the recognition likelihoods;
Constituent noun candidate selecting means for creating, for each of the single nouns input by speech, a recognition result list in which pairs of a constituent noun candidate, which is a single noun subjected to the recognition processing, and the recognition likelihood for that constituent noun candidate are arranged in order of recognition likelihood, and for selecting, from among the constituent noun candidates listed in the recognition result list, the constituent noun candidates whose recognition likelihood exceeds a predetermined first threshold as probable constituent noun candidates;
Partial match search word candidate extraction means for extracting from the search database, when the recognition likelihood of only some of the plurality of single nouns constituting the search word exceeds the first threshold, partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the first threshold and that have the same number of single nouns as the number of single nouns constituting the search word;
Re-recognizing means for recognizing the extracted partial match search word candidate as a target and re-recognizing the input voice for each single noun constituting the extracted partial match search word candidate;
An integrated likelihood calculating means for calculating an integrated likelihood by calculating a recognition likelihood as a result of re-recognition by the re-recognizing means by a predetermined calculating method;
A spoken dialogue compound noun search device characterized by comprising:

A search database in which compound nouns composed of a plurality of single nouns are registered as search terms, and the search terms are registered by being divided into single nouns;
A search auxiliary database in which a group of single nouns written n-th (n is an integer value) in each registered compound noun is referred to as an n-th notation single noun set group, the single nouns of the n-th notation single noun set group are registered in descending order of frequency, and the single nouns registered in descending order of frequency are grouped by a predetermined number in order from the highest frequency to form a plurality of sub-set groups;
Search term separation means for separating the speech-input compound noun into single nouns when the user inputs the compound noun that is the search term as a continuous utterance without separating it into single nouns;
Recognition result list creation means for, when the user inputs the compound noun that is the search term as a continuous utterance, recognizing the n-th notation single noun of the compound noun against the n-th notation single noun set group and performing recognition processing within the range of the first sub-set group, which includes the single noun having the highest frequency in that set group, to create a recognition result list;
Constituent noun candidate selecting means for creating, for each of the single nouns input by speech, a recognition result list in which pairs of a constituent noun candidate, which is a single noun subjected to the recognition processing, and the recognition likelihood for that constituent noun candidate are arranged in order of recognition likelihood, and for selecting, from among the constituent noun candidates listed in the recognition result list, the constituent noun candidates whose recognition likelihood exceeds a predetermined first threshold as probable constituent noun candidates;
Partial match search word candidate extraction means for extracting from the search database, when the likelihood of only some of the plurality of single nouns constituting the search word exceeds the first threshold, partial match search word candidates that contain the single nouns whose likelihood exceeds the first threshold and that have the same number of single nouns as the number of single nouns constituting the search word;
Re-recognizing means for recognizing the extracted partial match search word candidate as a target and re-recognizing the input voice for each single noun constituting the extracted partial match search word candidate;
An integrated likelihood calculating means for calculating an integrated likelihood by calculating a recognition likelihood as a result of re-recognition by the re-recognizing means by a predetermined calculating method;
A spoken dialogue compound noun search device characterized by comprising:

In claim 1 or claim 2,
A spoken dialogue compound noun search device characterized in that, when the recognition processing for all of the constituent word set groups in the search auxiliary database is completed and neither the actual search word candidate list nor the partial match search word candidate list can be updated any further, so that the candidate cannot be specified, the search term is specified by specifically requesting the user to input the first constituent noun or the last constituent noun, and by acquiring and combining the information input in response to the request.

In claim 1 or claim 2,
A spoken dialogue compound noun search device characterized in that, when the search database contains personal full names, the last name and the first name are the constituent units of the compound noun, and an individual's name is confirmed by registering the last name in the search auxiliary database as the first-notation single noun and the first name as the second-notation single noun.

In a method of searching for a compound noun using a search database in which compound nouns composed of a plurality of single nouns are registered as search words, each search word being registered divided into its single nouns, and a search auxiliary database in which a group of single nouns written n-th (n is an integer value) in each registered compound noun is referred to as an n-th notation single noun set group, the single nouns of the n-th notation single noun set group are registered in descending order of frequency, and the single nouns registered in descending order of frequency are grouped by a predetermined number in order from the highest frequency to form a plurality of sub-set groups,
When the user inputs the search word by speech for each single noun, the n-th single noun of the compound noun is recognized by the n-th single noun set, and the n-th single noun set. A recognition result list creation stage in which recognition processing is performed within the range of the first subset group including a single noun having the highest frequency among the groups, and a recognition result list is created by associating the recognition likelihood;
A constituent noun candidate selection step of creating, for each of the single nouns input by speech, a recognition result list in which pairs of a constituent noun candidate, which is a single noun subjected to the recognition processing, and the recognition likelihood for that constituent noun candidate are arranged in order of recognition likelihood, and of selecting, from among the constituent noun candidates listed in the recognition result list, the constituent noun candidates whose recognition likelihood exceeds a predetermined first threshold as probable constituent noun candidates;
A partial match search word candidate extraction step of extracting from the search database, when the recognition likelihood of only some of the plurality of single nouns constituting the search word exceeds the first threshold, partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the first threshold and that have the same number of single nouns as the number of single nouns constituting the search word;
A re-recognition step of recognizing the extracted partial match search word candidate as a target and re-recognizing the input voice for each single noun constituting the extracted partial match search word candidate;
An integrated likelihood calculating step of calculating an integrated likelihood by calculating the likelihood of the result of speech recognition in the re-recognition step by a predetermined calculation method;
A spoken dialogue type compound noun search method characterized by comprising the steps above.
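
As an illustration only, the selection, extraction, and integration steps of the method above can be sketched as follows. The claim does not specify how a candidate is matched against the dominant single nouns or what the predetermined calculation method is, so the position-wise matching, the arithmetic mean, the threshold value 0.8, and all names and sample data below are assumptions. The re-recognition step needs an actual speech recognizer and is not shown.

```python
from typing import List, Tuple

# Hypothetical type: a recognition result list is a list of
# (constituent noun candidate, recognition likelihood) pairs.
ResultList = List[Tuple[str, float]]


def select_dominant(result_list: ResultList, first_threshold: float) -> List[str]:
    """Return candidates whose likelihood exceeds the first threshold, best first."""
    ranked = sorted(result_list, key=lambda pair: pair[1], reverse=True)
    return [noun for noun, likelihood in ranked if likelihood > first_threshold]


def extract_partial_matches(
    search_db: List[Tuple[str, ...]],
    dominant: List[List[str]],
) -> List[Tuple[str, ...]]:
    """Return search words with the same number of single nouns as the input
    that agree with a dominant candidate at every position that has one.
    Matching by position is an assumption of this sketch."""
    n_nouns = len(dominant)
    matches = []
    for word in search_db:
        if len(word) != n_nouns:
            continue
        if all(not cands or word[i] in cands for i, cands in enumerate(dominant)):
            matches.append(word)
    return matches


def integrated_likelihood(per_noun: List[float]) -> float:
    """Assumed calculation method: arithmetic mean of the per-noun likelihoods."""
    return sum(per_noun) / len(per_noun)


# Hypothetical search database: each search word is stored split into single nouns.
search_db = [
    ("tokyo", "central", "hospital"),
    ("tokyo", "north", "hospital"),
    ("osaka", "central", "hospital"),
    ("tokyo", "central"),
]

# One recognition result list per spoken single noun (made-up likelihoods).
result_lists = [
    [("tokyo", 0.92), ("kyoto", 0.40)],
    [("central", 0.35), ("north", 0.30)],  # no candidate exceeds the threshold
    [("hospital", 0.88)],
]

dominant = [select_dominant(r, 0.8) for r in result_lists]
candidates = extract_partial_matches(search_db, dominant)
print(dominant)
print(candidates)
```

In this sketch only the first and third single nouns clear the threshold, so the candidates are the three-noun search words that begin with "tokyo" and end with "hospital". These would then be the recognition target of the re-recognition step.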

In a method of searching for compound nouns using a search database, in which compound nouns each composed of a plurality of single nouns are registered as search words and each search word is registered divided into its single nouns, and a search auxiliary database, in which the group of single nouns occupying the n-th position (n being an integer) in the registered compound nouns is called the n-th single noun group, the single nouns of each n-th single noun group are registered in descending order of frequency, and the single nouns so registered are grouped by number, in order from the highest frequency, to form a plurality of subset groups,
A search word separation step of, when the user inputs the compound noun that is the search word as one continuous utterance without separating it into single nouns, separating the speech-input compound noun into its single nouns;
A recognition result list creation step of, when the user inputs the compound noun that is the search word as one continuous utterance, recognizing the n-th single noun of the compound noun against the n-th single noun group, performing the recognition processing within the range of the first subset group, which contains the single nouns having the highest frequency in the n-th single noun group, and creating a recognition result list;
A dominant constituent noun candidate selection step of creating, for each single noun input by speech, a recognition result list in which pairs of a constituent noun candidate, that is, a single noun nominated by recognition, and the recognition likelihood of that candidate are arranged in order of recognition likelihood, and selecting, from among the constituent noun candidates listed in the recognition result list, those whose recognition likelihood exceeds a predetermined first threshold as dominant constituent noun candidates;
A partial match search word candidate extraction step of, when the recognition likelihood of only some of the single nouns constituting the search word exceeds the predetermined first threshold, extracting from the search database partial match search word candidates that contain the single nouns whose recognition likelihood exceeds the first threshold and that consist of the same number of single nouns as the search word;
A re-recognition step of taking the extracted partial match search word candidates as the recognition target and re-recognizing the input speech for each single noun constituting the extracted partial match search word candidates;
An integrated likelihood calculation step of calculating an integrated likelihood by combining, using a predetermined calculation method, the recognition likelihoods of the speech recognition results obtained in the re-recognition step;
a spoken dialogue type compound noun search method characterized by comprising the steps above.
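
As an illustration only, the search auxiliary database referred to in the preamble of this claim could be derived from the search database roughly as follows. The group size, the alphabetical tie-break between equally frequent single nouns, and all names and sample data are assumptions of this sketch and are not taken from the claim.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_auxiliary_db(
    search_db: List[Tuple[str, ...]], group_size: int
) -> Dict[int, List[List[str]]]:
    """Map each position n (1-based) to its subset groups.

    The single nouns found at position n form the n-th single noun group.
    They are ordered by descending frequency and cut into subset groups of
    `group_size` nouns, so the first subset group holds the most frequent ones.
    """
    counts: Dict[int, Counter] = {}
    for word in search_db:
        for n, noun in enumerate(word, start=1):
            counts.setdefault(n, Counter())[noun] += 1

    aux: Dict[int, List[List[str]]] = {}
    for n, counter in counts.items():
        # Descending frequency; ties broken alphabetically (an assumption).
        ranked = [
            noun
            for noun, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        ]
        aux[n] = [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
    return aux


# Hypothetical search database: each search word is stored split into single nouns.
search_db = [
    ("tokyo", "central", "hospital"),
    ("tokyo", "north", "hospital"),
    ("osaka", "central", "clinic"),
    ("tokyo", "east", "school"),
]

aux = build_auxiliary_db(search_db, group_size=2)
first_subset_for_second_noun = aux[2][0]  # recognition vocabulary tried first
print(aux)
```

Under these assumptions, the recognition result list creation step would recognize the n-th single noun against `aux[n][0]` only, which keeps the recognition vocabulary small at the cost of missing single nouns that fall into later subset groups.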