JP2004133003A

JP2004133003A - Method and apparatus for preparing speech recognition dictionary and speech recognizing apparatus

Info

Publication number: JP2004133003A
Application number: JP2002294402A
Authority: JP
Inventors: Yohei Okato; 岡登　洋平; Jun Ishii; 石井　純
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2002-10-08
Filing date: 2002-10-08
Publication date: 2004-04-30
Anticipated expiration: 2022-10-08
Also published as: JP4269625B2

Abstract

PROBLEM TO BE SOLVED: To automatically register, in a dictionary for speech recognition, the variations with which speakers utter a given word.

SOLUTION: A means is provided that automatically generates paraphrase expressions, together with their readings, that do not appear in the character string notation of a headword. An input headword is divided into partial character strings; for each partial character string, an output word is obtained according to a word replacement rule that represents the relation between input words and output words, with the partial character string taken as the input word; and the partial character string is replaced with that output word.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、言い換えを自動登録可能な音声認識のための認識辞書作成方法及びその装置とこの方式で作成した辞書を用いた音声認識装置に関するものである。
【０００２】
【従来の技術】
従来の音声認識システムは、認識辞書に登録されている語彙に基づいて認識を行うため、認識辞書に登録されていない語彙を認識することはできない。しかし、ユーザは認識辞書に登録されている語彙通りの発声を行うとは限らない。例えばユーザが発声対象となる名称を正確に知っているとは限らないし、正確な名称を知っていても省略可能と判断した部位を適宜省略して発声するのが実情である。そこで、音声認識に用いる認識辞書には、同じ単語や概念について、ユーザが異なる言い方をしても認識できるように、あらかじめ複数の言い換えが登録されている必要がある。例えば、特許文献１には、カーナビゲーションの音声による操作コマンドの言い換え表現を予め登録しておき、ユーザが複数通りの発声をしても正しく認識するための手法が開示されている。
【０００３】
ここで、例えば、認識対象語の表記が「大阪大学菅平高原実験センター」で、その音声的な表記を表す読みが（オオサカダイガクスガダイラコウゲンジッケンセンター）である場合、ユーザは、「阪大菅平実験センター」（ハンダイスガダイラジッケンセンター）や「阪大菅平実験所」（ハンダイスガダイラジッケンジョ）、「大阪大学菅平実験センター」（オオサカダイガクスガダイラジッケンセンター）、「菅平実験センター」（スガダイラジッケンセンター）、「菅平阪大実験センター」（スガダイラハンダイジッケンセンター）などと言い換えて発声することが考えられるが、従来は、想定される言い方のバリエーションを全て人手で辞書に追加していた。
【０００４】
しかし、対象の単語数が多い場合や、認識対象の語彙が逐次更新される場合、これらを全て人手で登録することは困難であり、自動処理が必須である。
【０００５】
この問題に対して、限定されたテキストを対象として言い換えを自動生成する手法として、対象範囲のテキストから形態素解析や読み付与のあいまい性、部分的な省略を考慮した言い換えを辞書へ自動追加する方法が特許文献２に開示されている。
【０００６】
図１８は、特許文献２に開示された手法による音声認識辞書作成装置の動作を説明する機能ブロック図である。図１８において、１０は言い換え表現を求める対象となる文字列情報である。１１は本文献で開示された辞書作成装置であって、１２は文字列情報１０をテキスト分割し、その読みを付与する解析処理手段である。また１３は解析処理手段１２がテキスト分割し、その読みを付与するために参照記憶する言語解析辞書であり、１００１はテキスト分割および読み付与手段１２の出力に基づいて言い換え表現を生成する語彙作成手段であって、１６は語彙作成手段１００１が生成した言い換え表現を記憶する語彙記憶手段である。
【０００７】
解析処理手段１２は、文字列情報１０で示される表記テキストを部分文字列へ分割し、それぞれの部分文字列へ読みを付与する。分割方法や読み方にあいまい性がある場合は、それらを全て含むような複数の候補へ分割することができる。言語解析辞書１３は、解析処理部がテキスト分割し読み付与するために参照する辞書である。
【０００８】
語彙作成手段１００１は、解析処理手段１２で分割されて読みを付与されたテキストを読み込み、分割した候補から任意の部分文字列の組み合わせを生成して、出力する。
【０００９】
語彙記憶手段１６は、音声認識用の辞書であり、語彙作成手段１００１で作成された部分文字列の組み合わせとその読みを認識語彙として記憶する。
【００１０】
図１９は、特許文献２で開示された手法による音声認識辞書作成の例である。「大阪大学菅平実験センター」という認識対象語は、形態素解析されて形態素へ分割される。分割した形態素それぞれに読みを付与し、これらの任意の組み合わせを辞書へ登録する。さらに、形態素分割のあいまい性、読み付与のあいまい性が考慮され、組み合わせのそれぞれに出現確率を付与することも可能である。この場合であれば、図１９に示す６つの形態素がそれぞれ一通りの読みを持つため、　６３通りの組み合わせが生成される。
【００１１】
また、認識対象語を、この語よりも短い言語単位の組み合わせとして表すことにより、任意の言い換えを大語彙連続音声認識の枠組みでも扱うことができる。一般的な大語彙の連続音声を対象とした音声認識方法として、大量のテキストから単語の連鎖確率を統計的に学習した言語モデルを認識辞書として用いる方法がある。例えば、特許文献３では、読みを考慮して日本語の大語彙を扱う言語モデルを作成する方法が開示されている。
【００１２】
これらの手法により認識辞書を作成することにより、音声認識を実施可能である。その典型的な手法は、非特許文献１に詳しく記されている。
【００１３】
【特許文献１】
特開２０００−０２９４９０（段落００５１）
【００１４】
【特許文献２】
特開２００２−４１０８１（第１図）
【００１５】
【特許文献３】
特開平１１−２５９０８８（段落００１１−００４６、第２図）
【００１６】
【非特許文献１】
「音声認識の基礎（上、下）」Ｌ．Ｒ．ＲＡＢＩＮＥＲ、Ｂ．Ｈ．ＪＵＡＮＧ（古井監訳）、１９９５年、１１月、ＮＴＴアドバンステクノロジ
【００１７】
【非特許文献２】
「音声認識システム」鹿野・伊藤・河原・武田・山本、２００１年、オーム社、ｐ１０８
【００１８】
【発明が解決しようとする課題】
しかし、特許文献２で開示された手法は、主に認識対象とするテキストの一部を組み合わせることにより言い換え表現を生成するものである。したがって認識対象となるテキスト表記には現れない表現を組み合わせて得られるような言い換え表現を生成することができない。また与えられたテキストの部分の順序が入れ替わる言い換え表現を生成することもできないという問題がある。
【００１９】
また、特許文献３で開示された手法は、高精度な言語モデルの学習には認識対象とするユーザ発声を大量に収集し、テキスト化する必要がある。これは、非常に高コストであり、データ収集を含めると開発に長い時間を要する。また、認識単語数が増加すると、全ての単語の十分な言い回しを集めること自体が困難という問題がある。さらに、認識結果と認識対象となる語の関係が明確でないという問題がある。
【００２０】
そこで、本発明の目的は、認識語彙を低コストかつ効率的に追加することにより、高い認識精度を得る音声認識用辞書作成装置、作成された辞書を用いた音声認識装置、および音声認識用辞書作成方法、作成された辞書を用いた音声認識方法を提供することである。
【００２１】
【課題を解決するための手段】
本発明に係る音声認識用辞書作成方法は、見出し語を入力する入力ステップと、不揮発性記憶装置が記憶し入力語と出力語との関係を表現する語置換規則に基づいて、上記見出し語を上記入力語とする上記出力語を上記言い換え表現として取得し、さらにその言い換え表現の読みを取得する言い換え表現作成ステップと、上記言い換え表現とその読みを音声認識用辞書に記憶させる出力ステップを有するものである。
【００２２】
【発明の実施の形態】
実施の形態１．
図１は、第１の実施の形態に係る音声認識用辞書の作成方法と、これを用いた音声認識方法の動作を説明するブロック図である。図１において、１０は認識対象となる文字列表記を含む文字列情報である。文字列情報１０は、ハードディスク装置が記憶するファイルやＲＡＭが記憶する文字列、インターネット上のＨＴＭＬファイルなどでよく、処理の都度キーボードより入力することで与えてもよい。１１はユーザが発声するバリエーション表現を文字列情報１０にマッチング可能とする言い換え表現を生成する辞書作成装置である。辞書作成装置１１において、１２は文字列情報１０で示される表記テキストを部分文字列へ分割し、それぞれの部分文字列へ読みを付与する解析手段である。１３は解析処理手段１２が文字列情報１０をテキスト分割し、各部分文字列に読み付与するために参照する言語解析辞書である。１４は言い換え表現を生成する言い換え生成手段であって、１５は言い換え表現手段１４が言い換え表現を生成するために参照する言い換え辞書である。また１１０は認識処理の対象となる入力音声であって、１１１は入力音声１１０の音声認識を行う音声認識装置である。音声認識装置１１１において、１１２は入力音声１１０の分析を行う音響分析手段であり、１１３は音響分析手段１１２の出力結果と音響標準パタンとの尤度を求める尤度計算手段である。１１４は尤度計算手段１１３が参照する音響標準パタンであって、１１５は語彙記憶手段１６と尤度計算手段１１３との出力を照合して音声認識を行う照合手段である。なお、上記において、言語解析辞書１３、言い換え辞書１５、音響標準パタン１１４は、主としてハードディスク装置が記憶するファイルにより構成されるが、ＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）や磁気カードに記憶させたものを用いてもよく、また他の情報処理装置が動的に生成する結果をプロセス間通信などにより読み込んでこれらの構成要素としてもよい。
【００２３】
本実施の形態による辞書作成装置１１の動作について説明する。文字列情報１０が入力されると、解析処理手段１２は文字列情報１０をその形態素や文字などの単位に基づいて、部分文字列に分割する。次に解析処理手段１２は部分文字列に対応する読みを言語解析辞書１３より読み込む。言語解析辞書１３は、文字列表記ごとに少なくとも読み情報を記憶している。図２は、分割単位を形態素とした場合の、言語解析辞書１３が記憶する文字列表記と読みの例を示すものである。言語解析辞書１３は、文字列表記と対応する読みの他に、解析のための言語情報として、品詞や部分文字列間の接続確率などの情報を保持していてもよい。
【００２４】
次に言い換え生成手段１４は、解析処理手段１２の出力に対して、言い換え辞書１５が記憶する規則を適用して言い換えを生成し、言い換えと入力した元のテキストとの対応付けを付与して語彙記憶手段１６へ出力する。図３は、言い換え生成手段１４が言い換え表現を生成するために参照する言い換え辞書１５の構成例である。この例では、言い換え辞書１５は入力の形態素列、その読みと、出力する言い換えの形態素列と読みの対応付けを記憶している。図において、出力側の欄内に「ＮＩＬ」と記載されている場合は、入力側に指定された表現が省略可能であることを示している。
【００２５】
言い換え生成手段１４の出力結果は、語彙記憶手段１６によって音声認識用辞書として保管される。ここに格納される内容は、認識語彙の音響標準パタンの並びを表す読みと、読みと対応する元の入力テキストである。さらに、元のテキストおよび読みに付与された、付加情報があれば、それらも保持することもできる。付加情報とは、例えば、出現尤度、認識語彙間の接続情報である。
【００２６】
次に本実施の形態における音声認識装置１１１の動作について説明する。ユーザが入力音声１１０を発声すると図示せぬマイクロフォンなどによりこれを取り込み、音響分析手段１１２は、入力音声１１０を一定時間間隔で分析して、音声の特徴をよく表す音響特徴量を計算する。例えば、１６ｋＨｚで標本化された音声信号を１０ｍｓ間隔で窓長２５ｍｓのＨａｍｍｉｎｇ窓で切り出して、１４次のＬＰＣ分析から１０次のメルケプストラム、１０次のデルタメルケプストラムを求め、１次のデルタパワーと合わせた合計２１次元の音響特徴量ベクトルを計算する。
【００２７】
このようにして求められた音響特徴量に対し、尤度計算手段１１３は、音響標準パタン１１４の記憶する音響標準パタンを照合して、照合の度合いを示す尤度を求める。音響標準パタン１１４とは、音声の断片について音響特徴量の性質を表す標準モデルであって、例えば音素を単位として、ＨＭＭ（隠れマルコフモデル）等によりモデル化されたものである。また、それぞれのモデルの構造はＬｅｆｔ−ｔｏ−ｒｉｇｈｔ型３状態、出力確率密度関数が１６混合の対角共分散行列からなるガウス分布とすることができる。
【００２８】
さらに照合手段１１５は、語彙記憶部から読み込んだ認識語彙の音響標準パタン系列に従い、例えばビタビアルゴリズムを使って認識候補の尤度を加算した累積尤度を計算する。入力音声の終端に到達したら、尤度の大きさを比較して認識結果を決定する。
【００２９】
次に図４の動作フローを用いて、本実施の形態による辞書の作成手順を説明する。ここでは、例として、「大阪大学菅平実験センター」という語を形態素単位の部分文字列へ分割して言い換えを生成する処理の過程を示すこととする。
【００３０】
まずステップＳ１１において、解析処理手段１２は、文字列情報１０の表記テキストを部分文字列へ分割し、それぞれの部分文字列へ読みを付与する。部分文字列への分割は、一般的な仮名漢字変換や形態素解析と同一の手法を用いることができる。例えば、文字列の左側から辞書と一致する最長部分を逐次切り出す方法や、分割したテキストの組み合わせの中から読み付与辞書１３の部分文字列と読みに付与されたスコアが高くなる部分文字列の組み合わせを選択する方法を用いてもよい。
【００３１】
部分文字列への分割や読みの付与にあいまい性がある場合は、可能な部分文字列の組み合わせを包含した形式で出力する。出力形式は、例えば、あいまい性を展開して列挙したものや、ラティスやトレリスを用いたより効率的な表現を用いる。ラティスやトレリスによる表現方法は、非特許文献２に詳しく説明されている。図２に示した辞書は、形態素を単位とした、部分文字列と対応する読みの組み合わせを示している。「大阪大学菅平実験センター」という入力は、形態素・読み付与のあいまい性を考慮すると、図５に示す３通りの解析候補が得られる。ただし、図中、スラッシュ（／）は部分文字列区切り、括弧内はカタカナ表記で当該部分文字列の読みを示す。
【００３２】
なお、解析処理手段１２は、文字列情報１０として、表記テキストの他にその読みを受け取ってもよい。この場合には、部分文字列に付与される読みは、文字列情報１０の有する読みと整合するものとする。図５の例では読み「オオサカダイガクスガダイラコウゲンジッケンセンター」という読みが付与されていれば、［１］の候補のみを選択されることになる。
【００３３】
次にステップＳ１２において言い換え生成手段１４は、解析処理手段１２の出力を言い換え辞書１５と照合する。その結果、言い換え生成手段１４は、部分文字列のうち言い換え辞書１５との照合に成功したものを言い換え辞書中の表現に置換することで、言い換え表記とその読みを作成する。ここで、言い換え辞書１５との照合は、解析処理手段１２が出力した部分文字列の複数の部分を範囲としても良い。また照合にあいまい性が生じる場合、すなわち、照合結果として複数の候補が選択できる場合には、それらの組み合わせを全て展開する。図３に示した例では、「大阪／大学」は「阪大」、「菅平／高原」は「菅平」、「実験／センター」は「実験／場」と置き換え可能であることがわかる。この結果、図５に示した分割・読み付与候補から、図６に示す１６通りの言い換え文字列を生成する。
【００３４】
最後にステップＳ１３において、生成した言い換え文字列を語彙記憶手段１６へ追加する。
【００３５】
次に図７の動作フローを参照し、本実施の形態による音声認識の手順を説明する。まず、ステップＳ１１０１において音響分析手段１１２は、入力音声１１０を１時刻フレーム分読み込み、音響分析して音響特徴量を得る。続いてステップＳ１１０２において、その音響特徴量と各音響標準パタン間の尤度を計算する。次にステップＳ１１０３において、認識語彙ごとに読みが指定する音響標準パタンの尤度を加算し、それまでの累積尤度へ加算する。次にＳ１１０４において、入力音声が終端に到達しているか判定し、到達していなければステップＳ１１０１へ戻る。最後にステップＳ１１０５において、入力音声の終端に到達したら、累積尤度が大きい認識候補を求め認識結果として出力する。
【００３６】
以上のように、本実施の形態によれば、文字列を分割し、分割された部分文字列に読み付与辞書を用いて読みを付与して、言い換え辞書に従って言い換え表現を生成可能である。言い換え表現は、辞書を用いて生成するので、元の文字列が含まない表現を生成することができる。また、単に一部の部分文字列をスキップして言い換え表現を生成する方法に比べると、不要な言い換えの生成を少なくすることができる。
【００３７】
なお、本実施の形態による辞書作成方法は、部分文字列への分割を行っているが、言い換え辞書は部分文字列だけでなく入力文字列全体に対しても適用可能であることはいうまでもなく、したがって部分文字列への分割処理を省略しても、言い換え表現を生成することが可能である。
【００３８】
また、本実施の形態による辞書作成方法、音声認識方法は、プログラムとして記憶媒体に記憶することもできる。この場合、このプログラムは、図１の辞書作成装置１１に対応する辞書作成プログラムと、音声認識装置１１１に対応する音声認識プログラムから構成される。辞書作成プログラムは、テキスト分割および読み付与手段１２と同様の処理を行う解析処理機能、言い換え生成手段１４と同様の処理を行う言い換え生成機能、語彙記憶手段１６と同様の処理を行う語彙記憶機能から構成されるソフトウェアである。また、音声認識プログラムは、音響分析手段１１２と同様の処理を行う音響分析機能、尤度計算手段１１３と同様の処理を行う尤度計算機能、照合手段１１５と同様の処理を行う照合機能から構成されるソフトウェアである。
【００３９】
実施の形態２．
図８は、実施の形態２に係る音声認識用辞書の作成方法を説明するブロック図である。図８において、２１は本実施の形態による辞書作成装置であり、辞書作成装置２１において、２２は文字列を部分文字列に分割し、それぞれの部分文字列にその読みと読み以外の言語情報を付与する言語解析手段である。また２３は文字列についての読み情報と言語情報を記憶する言語解析辞書である。２４は言語解析手段２２の出力結果に基づいて、言い換え表現を生成する言語情報付き言い換え生成手段であって、２５は、言語情報付き言い換え生成手段２４が参照する言語情報付き言い換え辞書である。なお本実施の形態において、実施の形態１と同じ符号を付した構成要素については、実施の形態１と同様であるため説明を省略する。
【００４０】
次に図９の動作フローを用いて、本実施の形態に示す辞書の作成手順を説明する。ここでは、実施の形態１の場合と同様に、「大阪大学菅平実験センター」という入力例について、形態素単位の部分文字列へ分割して言い換えを生成する処理の過程を示す。
【００４１】
初めにステップＳ２１において、言語解析手段２２は、文字列情報１０の表記テキストを部分文字列へ分割し、それぞれの部分文字列へ読み・言語情報を付与する。典型的な言語解析部の処理は、次のようなものである。
【００４２】
入力の表記文字列を形態素解析し、分割された形態素を単位として読みと品詞情報を得る。次に、形態素に付与された情報から言い換え生成に必要な形態素ごとの意味情報を言語解析辞書２３より得る。意味情報とは、地名・人名などの固有名詞のさらに詳細な情報や、業種・職種を表す語、修飾語などの分類である。さらに形態素を単位として、表記・品詞・意味を参照して、形態素間の係り受け関係や、並列関係などの統語情報を求める。部分文字列への分割や付与する言語情報にあいまい性がある場合、言語解析手段２２は可能な組み合わせをすべて包含した形式で出力する。
【００４３】
図１０は、解析結果の一例である。分割したそれぞれの部分文字列には読み、品詞、意味の言語情報が付与されている。また、複数の部分文字列にまたがる係り受けや並列関係の統語情報が付与されている。解析の結果、入力例は６形態素からなり、さらに３つの複合名詞から構成されていること、先頭の二つの複合名詞はそれぞれ最後の複合名詞にかかる並列構造を持つことがわかる。
【００４４】
なお、言語解析手段２２の入力は、テキスト表記と部分的な言語解析結果としてもよい。部分的な言語解析結果とは、例えば、図１０で示した解析結果の一部である。あらかじめ部分的な言語解析結果を与えることにより、言語解析の誤りを防ぐ効果がある。この場合、部分文字列の分割結果と付与される言語情報は、入力の言語情報と整合するものとする。
【００４５】
次にステップＳ２２において、言語情報付き言い換え生成手段２４は、言語解析手段２２の出力を、言語情報付き言い換え辞書２５と照合する。この照合処理においては、部分文字列の表記、読みのほか、部分文字列の品詞、意味、統語情報を利用することができる。辞書との照合にあいまい性がある場合は、それらの組み合わせを全て展開する。
【００４６】
図１１は、言語情報付き言い換え辞書２５の内容の例を示したものである。本実施の形態では、言語情報付き言い換え辞書は図のように、入力値の条件とそれに対応する出力値の組み合わせを、規則という形で与え、この規則が複数集合したものとなっている。各規則には、２−１、２−２のように規則番号が付与されている。この例では、入力値の条件として、表記の他、意味・構文による構造情報が表されている。ここで、図中の「＊」は、照合の際に無視できる項目であることを示す。また、出力値に「＜ｎ＞（ｎは数字）」と記載されている場合は、照合結果のｎ番目の部分文字列を出力とすることを示す。規則番号「２−１」「２−２」は、表記のみと対応する言い換えの例である。一方、規則「２−３」は地名の接尾語が省略可能であることを示す規則である。この規則により、表記上で「菅平／高原」を「菅平」と言い換える場合があることを表している。また、規則「２−４」では、２つの項（２つの部分文字列）からなる並列関係を検出したとき、それらの順番を入れ替えた言い換えを生成する規則の例を示している。このような規則の表現を許すことにより、語順の入れ替えや、隣接する部分文字列の言語情報に依存した言い換えの生成を処理できる。複数の部分文字列の照合は、統語情報を利用する。このため、「大学／菅平」「高原／実験」のように隣接しても、直接の統語関係がない場合は照合しない。
【００４７】
ステップＳ２３において言語情報付き言い換え生成手段２４が照合に成功した場合は、該当部分を言い換え辞書の出力表現に置換した表記・読みを作成する。図３に示した辞書では、実施の形態１について図６に示した言い換えの生成のほかに、省略や語順の入れ替えを許すため、図１２に示す１６通りの言い換えが生成可能である。
【００４８】
最後にステップＳ２４において、生成した文字列を辞書へ追加する。
【００４９】
本実施の形態によれば、言語情報付き言い換え辞書２５に従って表記と読みに加えて、意味や統語情報などの言語情報を利用することにより、言い換え表現を生成できる。ここで生成する言い換え表現は、言語情報を考慮したものであるため、不適切な言い換えを廃し、実際の発声を広範囲にカバーする結果、このような認識辞書を用いることで、従来より音声認識の精度を向上することができる。
【００５０】
なお、本実施の形態における辞書作成方法は、プログラムとして記憶媒体に記憶することもできる。このプログラムは、言語解析手段２２と同様の処理を行う言語解析機能、言語情報付き言い換え生成手段２４と同様の処理を行う言語情報付き言い換え生成機能、語彙記憶手段１６と同様の処理を行う語彙記憶機能から構成されるソフトウェアである。
【００５１】
実施の形態３．
図１３は、実施の形態３に係る音声認識用辞書の作成方法を説明するブロック図である。図１３において、３０は言い換え表現の生成対象となる文字列情報である。本実施の形態においては、文字列情報３０は出現頻度情報も有するものとする。３１は本実施の形態における辞書作成装置である。辞書作成装置３１において、３２は文字列情報３０のテキスト表記を部分文字列に分割するとともに、各部分文字列に出現頻度尤度を付与する言語解析・尤度付与手段である。３３は言語解析・尤度付与手段３２が参照する言語解析用尤度付き辞書である。３４は言語解析・尤度付与手段３２の出力結果に基づいて、各部分文字列に規則を適用し、言い換え表現を生成する一方で、言い換え生成尤度を付与する言語情報・尤度付き言い換え生成手段である。３５は言語情報・尤度付き言い換え生成手段３４が参照する言語情報・尤度付き言い換え辞書である。３６は言語情報・尤度付き言い換え生成手段３４の出力結果に基づいて、各言い換え表現の発声尤度を計算する言い換え生成尤度計算手段である。なお本実施の形態において、実施の形態１と同じ符号を付した構成要素については、実施の形態１と同様の動作を行うものであるため説明を省略する。
【００５２】
本実施の形態の特徴的な部分は、辞書作成装置３１が、出現頻度情報と、テキスト分割および言語情報付与における解析の尤もらしさと、生成した言い換えが出現する確率を考慮した尤度を生成した言い換えに付与する点にある。以下、図１３の機能ブロックについて説明する。
【００５３】
言語解析・尤度付与手段３２は、文字列情報３０から表記テキストを読み込み、可能な全ての分割候補による部分文字列へ分割する一方で、言語解析用尤度付き辞書３３を参照して、それぞれの部分文字列へ言語情報、出現頻度尤度および言語解析尤度を付与する。ここで言語情報には、部分文字列の読みと、品詞、意味、統語情報などを含み、出現頻度尤度には、文字列情報３０が有する出現頻度情報から求められる出現のしやすさを表す数値を含む。また言語解析尤度とは、表記テキストから分割された各部分文字列に付与された言語情報の尤もらしさを表す数値である。言語解析・尤度付与手段３２の解析結果は、分割された各部分文字列とその言語情報、出現頻度尤度、言語解析尤度の組、あるいは等価な出力形式で出力する。例えば図５で示した３つの分割・言語情報付与候補に対して、それぞれＬ０（１）、Ｌ０（２）、Ｌ０（３）、Ｌ０（４）という出現頻度尤度と、Ｌ１（１）、Ｌ１（２）、Ｌ１（３）、Ｌ１（４）という言語解析尤度を付与する。
【００５４】
次に、言語情報・尤度付き言い換え生成手段３４は、言語解析・尤度付与手段３２の出力結果を読み込み、言語情報・尤度付き言い換え辞書３５の記憶する規則の中から適用可能なものを選択して、言い換え表現を生成する。その一方で、言語情報・尤度付き言い換え生成手段３４は、それぞれの言い換えが生成される出現確率を表す言い換え尤度を付与する。例えば、図６で示した言い換え生成結果について、それぞれＬ２（１−１）、Ｌ２（１−２）…というように、言い換え生成尤度を付与する。
【００５５】
最後に、言い換え生成尤度計算手段３６は、言語情報・尤度付き言い換え生成手段３４の出力を読み込み、上記で説明した出現頻度尤度Ｌ０、言語解析尤度Ｌ１、言い換え尤度Ｌ２と、次に説明する読み配列尤度Ｌ３のうち、少なくとも一つを用いて対象語の発声尤度を計算し、認識語彙、その読みとともに語彙記憶手段１６へ格納する。この読み配列尤度Ｌ３とは、生成した読みの発声のしやすさや一般性を考慮して算出される尤度である。例えば、生成された認識語彙の読みＹがｍ個のモーラによりＹ＝［ｙ_１．．．ｙ_ｍ］　と表わすことができるとき、読み付与尤度Ｌ３を発声される確率をＰ（Ｙ）とする。さらに、Ｐ（Ｙ）は、語彙のモーラ数に関して定義される確率分布　とモーラ単位のＮ−ｇｒａｍ確率Ｐ_ｓｅｑ（Ｙ）の重み付き線形和として、Ｐ（Ｙ）＝α_１Ｐ_ｌｅｎ（ｍ）＋α_２Ｐ_ｓｅｑ（Ｙ）、あるいは両者の積であるＰ（Ｙ）＝α_１Ｐ_ｌｅｎ（ｍ）×α_２Ｐ_ｓｅｑ（Ｙ）　とする。ここでα_１、α_２は重み付けパラメータである。Ｐ_ｓｅｑ（Ｙ）は、式１に基づいて算出する。
【００５６】
【数１】
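Equation 1 is not reproduced in this text. A standard mora-level N-gram probability, given here only as an assumption consistent with the definitions of Y = [y_1 ... y_m] and P_seq(Y) above, would take the following form; the original Equation 1 may differ (for example, in normalization).

```latex
P_{seq}(Y) = \prod_{i=1}^{m} P\left(y_i \mid y_{i-N+1}, \ldots, y_{i-1}\right)
```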

【００５７】
次に図１４を用いて参照し、実施の形態３にかかるシステムの動作フローを説明する。まずステップＳ３１において、言語解析・尤度付与手段３２は、文字列情報および出現頻度情報３０の表記テキストを部分文字列へ分割し、それぞれの部分文字列へ言語情報と言語解析尤度を付与する。言語解析尤度は、例えば、解析時に適用したそれぞれの規則にあらかじめ尤度を付与しておき、それらの重み付き加重和や積として算出する。
【００５８】
次にステップＳ３２において、言語情報・尤度付き言い換え生成手段３４は、言語情報・尤度付き言い換え辞書３５を参照し、言語解析・尤度付与手段３２の出力である表記の部分文字列あるいは付与した言語情報と照合する辞書エントリを検索する。
【００５９】
続いてステップＳ３３において、言い換え生成尤度計算手段３６は、テキスト分割および読み付与尤度Ｌ１、言い換え尤度Ｌ２、生成された認識語彙の読み配列に基づく読み配列尤度Ｌ３の少なくとも一つを用いて、例えばそれらを重み付き加算して、それぞれの言い換えごとに尤度を付与する。
【００６０】
最後にステップＳ３４において、生成した文字列と尤度を認識辞書へ追加する。
【００６１】
本実施の形態によれば、言語情報・尤度付き言い換え辞書の記憶する言語情報を参照して照合処理を行うことにより、もとの文字列表記にはない表記を用いた言い換え表現を生成可能である。このため、不要な言い換えを生成することが少なく、効率的に言い換えを自動で生成することができる。さらに、それぞれの認識語彙に言語解析の信頼性、言い換えられる表現の出現確率を考慮した尤度を付与しており、この尤度は、言い換え候補の尤もらしさを表しているため、音声認識時に計算する累積尤度と合わせて、認識結果に反映することにより、精度の高い音声認識処理を実現することができる。
【００６２】
なお、本実施の形態による辞書作成方法、音声認識方法は、プログラムとして記憶媒体に記憶することもできる。この場合、このプログラムは、言語解析・尤度付与手段３２と同様の処理を行う言語解析・尤度付与機能、言語情報・尤度付き言い換え生成手段３４と同様の処理を行う言語情報・尤度付き言い換え生成機能、語彙記憶手段１６と同様の処理を行う語彙記憶機能から構成されるソフトウェアである。
【００６３】
実施の形態４．
図１５は、実施の形態４に係る音声認識用辞書の作成方法を説明するブロック図である。本実施の形態において、４１は生成した言い換え表現のうち尤度の低いものを削除する語彙候補枝刈り手段である。なお、本実施の形態において実施の形態３と同一の符号を付した構成要素については、実施の形態３と同様の動作を行うものであるため、説明を省略する。
【００６４】
語彙候補枝刈り手段４１は、認識語彙の表記・読みと、言い換え生成尤度計算手段３６にて計算された言い換え生成尤度を入力として読み込み、入力された文字列情報一つごとに生成される認識語彙とその尤度のうち、尤度値の相対順位、尤度値としきい値との比較の少なくとも一条件により選んだ認識語彙のみ語彙記録部へ登録する。
【００６５】
次に図１６を用いて、本実施の形態に係るシステムの動作フローを説明する。ただし、ステップＳ３１、Ｓ３２、Ｓ３３については実施の形態３と同様の動作を行うものであるため、同一の記号を付し、説明を省略する。
【００６６】
ステップＳ４１において、語彙候補枝刈り手段４１は、ステップＳ３３により生成された認識語彙のうち、同一の語から生成された言い換えの中の相対的な尤度差、しきい値の少なくとも一条件を用いて、尤度が小さい言い換えを認識候補から削除する。
【００６７】
次に、ステップＳ４２において、ステップＳ４１の結果残存している言い換え候補を認識語彙として語彙記憶手段１６へ記憶する。
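Steps S41 and S42 (pruning low-likelihood paraphrases generated from one headword, then storing the survivors) can be sketched as below. This is a minimal illustration, not the patent's implementation: the function name `prune`, the parameters `top_k` and `min_score`, and the candidate format are all assumptions introduced here.

```python
def prune(candidates, top_k=None, min_score=None):
    """Prune paraphrase candidates generated from ONE headword.

    candidates: list of (reading, score) pairs (hypothetical format).
    top_k:      keep only the top_k best-scoring candidates (relative rank).
    min_score:  keep only candidates whose score is >= min_score (threshold).
    At least one of the two conditions is expected to be given.
    """
    # Sort from most to least likely.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if min_score is not None:
        ranked = [c for c in ranked if c[1] >= min_score]
    return ranked


# Example with made-up scores.
survivors = prune([("a", 0.5), ("b", 0.1), ("c", 0.3)], top_k=2)
```

Whatever `prune` returns would then be written to the vocabulary storage as the recognition vocabulary for that headword.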
【００６８】
本実施の形態によれば、尤度が低く、出現する見込みが少ない言い換えを認識語彙から削除するため、この結果得られる認識辞書を用いて音声認識を行うことにより、語彙候補枝刈りを実施しない場合に比べて認識辞書サイズを削減することができ、限られた計算量・メモリで言い換えを処理可能とする効果がある。
【００６９】
なお、本実施の形態における辞書作成方法、音声認識方法はプログラムとして記憶媒体に記憶することもできる。この場合、このプログラムは、言語解析・尤度付与手段３２と同様の処理を行う言語解析・尤度付与機能、言語情報・尤度付き言い換え生成手段３４と同様の処理を行う言語情報・尤度付き言い換え生成機能、語彙候補枝刈り手段４１と同様の処理を行う語彙候補枝刈り機能、語彙記憶手段１６と同様の処理を行う語彙記憶機能から構成されるソフトウェアである。
【００７０】
実施の形態５．
図１７は、実施の形態５に係る音声認識用辞書の作成方法を説明するブロック図である。図において、５１は一以上の言い換え表現から所定の制約に適合する言い換え表現を選択する言い換え検証手段である。５２は言い換え検証手段５１に対して制約条件を与えるシステム知識データベースである。なお、本実施の形態において実施の形態３と同一の符号を付した構成要素については、実施の形態３と同様の動作を行うものであるため、説明を省略する。
【００７１】
次に本実施の形態による処理について説明する。言い換え検証手段５１は、言い換え生成尤度計算手段３６の出力する登録対象語彙の言い換え表現を全て読み込む。次に、システム知識データベース５２に与えられた制約に従い、認識語彙に用いる言い換え表現を選択する。システム知識データベース５２による制約とは、例えば音声認識システムの計算速度、メモリ量など、現実に実時間処理するために課せられる制約であり、これを満たすために生成された言い換え全体から尤度の低いものを順次削除する。具体的には、認識語彙から計算量と必要なメモリ量を求め、システムの条件を超える場合は、尤度の低い言い換えから順に認識語彙から削除する。ただし、全ての語について少なくとも一つの認識語彙は残す。
【００７２】
システム知識データベース５２による別の制約は、音声認識の性質から認識困難な語彙を削除するものである。例えば、認識語彙の読みの長さが非常に短い場合、十分な認識精度が確保できないという音声認識の制約がある。これを避けて十分な精度を得るために、例えば２音節以下の短い言い換えを削除する。あるいは、言い換え表現として同音異義語が多数生成されることによる選択範囲の制約も考えられる。同音、あるいは非常に類似した認識語彙がある場合は、正しく認識できたとしても、さらに認識語彙の候補から同定する必要が生じる。この候補数が増加すると、認識しても同定の処理が困難となる。そこで、このような制約条件をシステム知識データベース５２に定義することにより、尤度が低い同音あるいは類似した言い換えを削除する。
【００７３】
またその他の制約として、対象とするユーザ目的に応じた語彙の設定を行うことが考えられる。例えば、ある施設名がユーザ発話の認識対象であっても、ユーザが施設の電話番号を尋ねる場合と、施設近辺の天気を尋ねる場合では、それぞれ言い換えの傾向が異なる。これは、電話番号を尋ねる場合は、対象施設のチェーン名など、他の施設と識別する情報が強調される一方、天気を尋ねる場合は場所の情報こそが重要と考えられるためである。このような目的を達成するためにタスク知識による言い換え型の制約を条件としてシステム知識データベースに記述する。
【００７４】
このような言い換え検証部５１による処理を通じて、システムが実用的に稼動可能な認識語彙を選択する。最後に選択された言い換えとその尤度を認識対象語彙として語彙記憶手段１６へ出力する。
【００７５】
本実施の形態によれば、システムの言い換え検証手段５１によって、システムの制約を考慮した認識語彙を設定可能となり、全体の認識精度を改善させる効果がある。また、限られた計算量・メモリでの実施のために、認識辞書サイズを削減する効果がある。この結果、音声認識に用いた場合は、コンパクトで高精度の音声認識エンジンが構築可能となる。
【００７６】
なお、本実施の形態における辞書作成方法、音声認識方法は、プログラムとして記憶媒体に記憶することもできる。この場合、このプログラムは、言語解析・尤度付与手段３２と同様の処理を行う言語解析・尤度付与機能、言語情報・尤度付き言い換え生成手段３４と同様の処理を行う言語情報・尤度付き言い換え生成機能、言い換え検証手段５１と同様の処理を行う言い換え検証機能、語彙記憶手段１６と同様の処理を行う語彙記憶機能から構成されるソフトウェアである。
【００７７】
【発明の効果】
本発明は、入力語と出力語の関係を記述した語置換規則に基づいて見出し語の言い換え表現とその読みを作成することとしたので、見出し語の表記上出現しない表現を組み合わせた表現を含む音声認識用辞書を自動生成することが可能となる。
【図面の簡単な説明】
【図１】実施の形態１による辞書作成装置と音声認識装置のブロック図である。
【図２】実施の形態１における言語解析辞書の記憶内容例を示す図である。
【図３】実施の形態１における語置換規則の例を示す図である。
【図４】実施の形態１における辞書作成処理を表すフローチャートである。
【図５】実施の形態１における形態素解析を用いた文字列分割結果の例を示す図である。
【図６】実施の形態１における言い換え表現生成結果の例を示す図である。
【図７】実施の形態１における音声認識処理を表すフローチャートである。
【図８】実施の形態２における辞書作成装置のブロック図である。
【図９】実施の形態２における辞書作成処理のフローチャートである。
【図１０】実施の形態２における言語的意味の付与例を示す図である。
【図１１】実施の形態２における語置換規則の例を示す図である。
【図１２】実施の形態２における言い換え表現生成結果の例を示す図である。
【図１３】実施の形態３における辞書作成装置のブロック図である。
【図１４】実施の形態３における辞書作成処理のフローチャートである。
【図１５】実施の形態４における辞書作成装置のブロック図である。
【図１６】実施の形態４における辞書作成処理のフローチャートである。
【図１７】実施の形態５における辞書作成装置のブロック図である。
【図１８】従来技術による辞書作成装置のブロック図である。
【図１９】従来技術の動作例を示す図である。
【符号の説明】
１０：文字列情報　１１：辞書作成装置　１２：解析処理手段
１３：言語解析辞書　１４：言い換え生成手段　１５：言い換え辞書
１６：語彙記憶手段　２１：辞書作成装置　２２：解析処理手段
２３：言語解析辞書　２４：言語情報付き言い換え生成手段
２５：言い換え辞書　３１：辞書作成装置　３２：言語解析・尤度付与手段
３３：言語解析用尤度付き辞書　３４：言語情報・尤度付き言い換え生成手段
３５：言語情報・尤度付き言い換え辞書　３６：言い換え生成尤度計算手段
４１：語彙候補枝刈り手段　５１：言い換え検証手段
５２：システム知識データベース　１１０：入力音声　１１１：音声認識装置
１１２：音響分析手段　１１３：尤度計算手段　１１４：音響標準パタン
１１５：照合手段　１００１：語彙作成手段

[0001]
[Technical field of the invention]
The present invention relates to a method and apparatus for creating a recognition dictionary for speech recognition capable of automatically registering paraphrases, and a speech recognition apparatus using a dictionary created by this method.
[0002]
[Prior art]
A conventional speech recognition system performs recognition based on the vocabulary registered in its recognition dictionary, so it cannot recognize vocabulary that is not registered there. However, users do not always utter a word exactly as it is registered. For example, a user may not know the exact name to be uttered, and even a user who knows the correct name will in practice omit any part judged to be omissible. A recognition dictionary used for speech recognition therefore needs to have multiple paraphrases registered in advance so that the same word or concept can be recognized even when the user phrases it differently. For example, Patent Document 1 discloses a method in which paraphrase expressions for voice operation commands of a car navigation system are registered in advance so that a command is recognized correctly even when the user utters it in several different ways.
[0003]
Here, for example, suppose the notation of the word to be recognized is 「大阪大学菅平高原実験センター」 (Osaka University Sugadaira Kogen Experiment Center) and its reading, that is, its phonetic notation, is オオサカダイガクスガダイラコウゲンジッケンセンター (Osaka Daigaku Sugadaira Kogen Jikken Center). The user may then paraphrase it as 「阪大菅平実験センター」 (Handai Sugadaira Jikken Center), 「阪大菅平実験所」 (Handai Sugadaira Jikkenjo), 「大阪大学菅平実験センター」 (Osaka Daigaku Sugadaira Jikken Center), 「菅平実験センター」 (Sugadaira Jikken Center), or 「菅平阪大実験センター」 (Sugadaira Handai Jikken Center). Conventionally, all such anticipated variations were added to the dictionary by hand.
[0004]
However, when the number of target words is large or when the vocabulary to be recognized is sequentially updated, it is difficult to manually register all of them, and automatic processing is essential.
[0005]
To address this problem, Patent Document 2 discloses a method of automatically generating paraphrases for a limited body of text: paraphrases that take into account ambiguity in morphological analysis and reading assignment, as well as partial omission, are generated from the text in the target range and automatically added to the dictionary.
[0006]
FIG. 18 is a functional block diagram explaining the operation of the speech recognition dictionary creation device according to the technique disclosed in Patent Document 2. In FIG. 18, reference numeral 10 denotes the character string information for which paraphrase expressions are to be obtained. Reference numeral 11 denotes the dictionary creation device disclosed in that document, and reference numeral 12 denotes an analysis processing unit that divides the character string information 10 into pieces of text and assigns readings to them. Reference numeral 13 denotes a language analysis dictionary that the analysis processing unit 12 refers to in order to divide the text and assign readings. Reference numeral 1001 denotes a vocabulary creation unit that generates paraphrase expressions based on the output of the analysis processing unit 12 (the text division and reading assignment unit). Reference numeral 16 denotes a vocabulary storage unit that stores the paraphrase expressions generated by the vocabulary creation unit 1001.
[0007]
The analysis processing unit 12 divides the notation text indicated by the character string information 10 into partial character strings and assigns a reading to each partial character string. When the division or the reading is ambiguous, the text can be divided into multiple candidates that together cover all the possibilities. The language analysis dictionary 13 is the dictionary that the analysis processing unit refers to in order to divide the text and assign readings.
[0008]
The vocabulary creation unit 1001 reads the text divided by the analysis processing unit 12 and given a reading, generates a combination of arbitrary partial character strings from the divided candidates, and outputs the combination.
[0009]
The vocabulary storage unit 16 is a dictionary for speech recognition, and stores a combination of partial character strings created by the vocabulary creation unit 1001 and its reading as a recognized vocabulary.
[0010]
FIG. 19 is an example of creating a speech recognition dictionary by the method disclosed in Patent Document 2. The recognition target word "Osaka University Sugadaira Experiment Center" is morphologically analyzed and divided into morphemes. A reading is assigned to each of the morphemes, and arbitrary combinations of them are registered in the dictionary. Ambiguity in morpheme division and in reading assignment can also be taken into account, and an appearance probability can be assigned to each combination. In this case, since each of the six morphemes shown in FIG. 19 has exactly one reading, 63 combinations are generated.
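The figure of 63 is what one gets if every non-empty, order-preserving selection of the six morphemes is registered: 2^6 - 1 = 63. The sketch below illustrates that count under this assumption. The morpheme list and readings are inferred from the example word and are illustrative only, since FIG. 19 is not reproduced here; `subsequence_entries` is a hypothetical helper name.

```python
from itertools import combinations

# Assumed morphemes and readings (illustrative; not copied from FIG. 19).
MORPHEMES = [
    ("大阪", "オオサカ"),
    ("大学", "ダイガク"),
    ("菅平", "スガダイラ"),
    ("高原", "コウゲン"),
    ("実験", "ジッケン"),
    ("センター", "センター"),
]


def subsequence_entries(morphemes):
    """Return (notation, reading) for every non-empty, order-preserving subset."""
    entries = []
    for size in range(1, len(morphemes) + 1):
        # combinations() keeps the original left-to-right order of the items.
        for chosen in combinations(morphemes, size):
            notation = "".join(surface for surface, _ in chosen)
            reading = "".join(yomi for _, yomi in chosen)
            entries.append((notation, reading))
    return entries


entries = subsequence_entries(MORPHEMES)
print(len(entries))  # 2**6 - 1 = 63
```

Note that such a scheme can only drop morphemes; it cannot introduce a new expression such as 阪大 or reorder the parts.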
[0011]
In addition, by expressing the recognition target word as a combination of linguistic units shorter than the word itself, arbitrary paraphrases can be handled within the framework of large-vocabulary continuous speech recognition. A common approach to large-vocabulary continuous speech recognition uses, as the recognition dictionary, a language model in which word chain probabilities are statistically learned from a large amount of text. For example, Patent Document 3 discloses a method of creating a language model that handles a large Japanese vocabulary while taking readings into account.
[0012]
By creating a recognition dictionary using these methods, speech recognition can be performed. The typical method is described in detail in Non-Patent Document 1.
[0013]
[Patent Document 1]
JP-A-2000-29490 (paragraph 0051)
[0014]
[Patent Document 2]
JP-A-2002-41081 (FIG. 1)
[0015]
[Patent Document 3]
JP-A-11-259088 (paragraphs 0011-0046, FIG. 2)
[0016]
[Non-patent document 1]
"Fundamentals of Speech Recognition (Vols. 1 and 2)", L. R. Rabiner and B. H. Juang (translation supervised by Furui), November 1995, NTT Advanced Technology
[0017]
[Non-patent document 2]
"Speech Recognition System", Shikano, Ito, Kawahara, Takeda, Yamamoto, 2001, Ohmsha, p. 108
[0018]
[Problems to be solved by the invention]
However, the technique disclosed in Patent Document 2 generates paraphrase expressions mainly by combining parts of the text to be recognized. It therefore cannot generate a paraphrase that requires an expression not appearing in the notation of that text. It also cannot generate a paraphrase in which the order of the parts of the given text is changed.
[0019]
The method disclosed in Patent Document 3, for its part, requires collecting and transcribing a large amount of the user utterances to be recognized in order to train a highly accurate language model. This is very costly, and development takes a long time once data collection is included. Moreover, as the number of recognized words increases, collecting enough phrasings for every word becomes difficult in itself. A further problem is that the relation between the recognition result and the word to be recognized is not clear.
[0020]
Accordingly, an object of the present invention is to provide a speech recognition dictionary creation device that achieves high recognition accuracy by adding recognition vocabulary efficiently and at low cost, a speech recognition device using the created dictionary, a speech recognition dictionary creation method, and a speech recognition method using the created dictionary.
[0021]
[Means for Solving the Problems]
The speech recognition dictionary creation method according to the present invention comprises: an input step of inputting a headword; a paraphrase expression creation step of obtaining, as a paraphrase expression, the output word whose input word is the headword, based on word replacement rules that are stored in a non-volatile storage device and express the relationship between input words and output words, and further obtaining the reading of that paraphrase expression; and an output step of storing the paraphrase expression and its reading in a speech recognition dictionary.
[0022]
[Embodiments of the invention]
Embodiment 1.
FIG. 1 is a block diagram illustrating the method for creating a speech recognition dictionary according to the first embodiment and the operation of the speech recognition method that uses the dictionary. In FIG. 1, reference numeral 10 denotes character string information including the character string notation to be recognized. The character string information 10 may be a file stored on a hard disk device, a character string held in RAM, an HTML file on the Internet, or the like, or it may be entered from a keyboard each time processing is performed. Reference numeral 11 denotes a dictionary creation device that generates paraphrase expressions so that the variations a user utters can be matched to the character string information 10. In the dictionary creation device 11, reference numeral 12 denotes an analysis processing unit that divides the notation text indicated by the character string information 10 into partial character strings and assigns a reading to each partial character string. Reference numeral 13 denotes a language analysis dictionary that the analysis processing unit 12 refers to in order to divide the character string information 10 and assign a reading to each partial character string. Reference numeral 14 denotes a paraphrase generation unit that generates paraphrase expressions, and reference numeral 15 denotes a paraphrase dictionary that the paraphrase generation unit 14 refers to in order to generate them. Reference numeral 110 denotes the input speech to be recognized, and reference numeral 111 denotes a speech recognition device that recognizes the input speech 110. In the speech recognition device 111, reference numeral 112 denotes acoustic analysis means for analyzing the input speech 110, and reference numeral 113 denotes likelihood calculation means for calculating the likelihood between the output of the acoustic analysis means 112 and the acoustic standard patterns. Reference numeral 114 denotes the acoustic standard patterns referred to by the likelihood calculation means 113, and reference numeral 115 denotes matching means that performs speech recognition by matching the contents of the vocabulary storage means 16 against the output of the likelihood calculation means 113. The language analysis dictionary 13, the paraphrase dictionary 15, and the acoustic standard patterns 114 mainly consist of files stored on a hard disk device, but they may instead be stored in a ROM (Read Only Memory) or on a magnetic card, or results generated dynamically by another information processing apparatus may be read in through inter-process communication or the like and used as these components.
[0023]
The operation of the dictionary creation device 11 according to the present embodiment will now be described. When the character string information 10 is input, the analysis processing unit 12 divides it into partial character strings based on units such as morphemes or characters. The analysis processing unit 12 then retrieves the reading corresponding to each partial character string from the language analysis dictionary 13. The language analysis dictionary 13 stores at least reading information for each character string notation. FIG. 2 shows an example of the notations and readings stored in the language analysis dictionary 13 when the division unit is the morpheme. In addition to the notation and its reading, the language analysis dictionary 13 may hold linguistic information for analysis, such as parts of speech and connection probabilities between partial character strings.
[0024]
Next, the paraphrase generation unit 14 generates paraphrases by applying the rules stored in the paraphrase dictionary 15 to the output of the analysis processing unit 12, attaches to each paraphrase its correspondence with the original input text, and outputs the result to the vocabulary storage unit 16. FIG. 3 shows a configuration example of the paraphrase dictionary 15 that the paraphrase generation unit 14 refers to in order to generate paraphrase expressions. In this example, the paraphrase dictionary 15 stores the correspondence between an input morpheme string with its reading and an output paraphrase morpheme string with its reading. In the figure, "NIL" in the output column indicates that the expression specified on the input side can be omitted.
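The rule application described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the rule table `RULES`, the `NIL` marker encoding, and the function names `paraphrases` and `build_entries` are hypothetical, and the three rules are toy examples rather than the contents of FIG. 3.

```python
NIL = ()  # an empty output means the matched input may be omitted

# Hypothetical rules: input morpheme sequence -> list of output sequences,
# each output being a tuple of (surface, reading) pairs.
RULES = {
    ("大阪", "大学"): [(("阪大", "ハンダイ"),)],
    ("高原",): [NIL],
    ("センター",): [(("所", "ジョ"),)],
}


def paraphrases(morphemes, rules=RULES):
    """Expand every combination of rule applications over a morpheme list."""
    if not morphemes:
        return [()]
    results = []
    # Option 1: keep the first morpheme unchanged.
    for rest in paraphrases(morphemes[1:], rules):
        results.append((morphemes[0],) + rest)
    # Option 2: apply each rule whose input matches a prefix of the list.
    for pattern, outputs in rules.items():
        n = len(pattern)
        if tuple(s for s, _ in morphemes[:n]) == pattern:
            for out in outputs:
                for rest in paraphrases(morphemes[n:], rules):
                    results.append(tuple(out) + rest)
    return results


def build_entries(morphemes):
    """Return dictionary entries that link each paraphrase to the original text."""
    original = "".join(s for s, _ in morphemes)
    seen, entries = set(), []
    for variant in paraphrases(morphemes):
        surface = "".join(s for s, _ in variant)
        reading = "".join(r for _, r in variant)
        if (surface, reading) not in seen:
            seen.add((surface, reading))
            entries.append(
                {"reading": reading, "surface": surface, "original": original}
            )
    return entries


WORD = [
    ("大阪", "オオサカ"), ("大学", "ダイガク"), ("菅平", "スガダイラ"),
    ("高原", "コウゲン"), ("実験", "ジッケン"), ("センター", "センター"),
]
entries = build_entries(WORD)
```

With these three toy rules, each applied or not independently, the example word yields eight entries, including forms such as 阪大菅平実験所 that never appear in the original notation. Every entry keeps a pointer back to the original headword, which is what lets a recognized paraphrase be mapped to the registered name.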
[0025]
The output of the paraphrase generation unit 14 is kept by the vocabulary storage unit 16 as the speech recognition dictionary. What is stored is the reading, which represents the sequence of acoustic standard patterns of a recognized vocabulary item, and the original input text corresponding to that reading. Any additional information attached to the original text and reading can also be retained. Such additional information is, for example, the appearance likelihood or connection information between recognized vocabulary items.
[0026]
Next, the operation of the speech recognition device 111 according to the present embodiment will be described. When the user utters the input speech 110, it is captured by a microphone or the like (not shown), and the acoustic analysis unit 112 analyzes the input speech 110 at fixed time intervals and calculates acoustic features that represent the characteristics of the speech well. For example, a speech signal sampled at 16 kHz is cut out every 10 ms with a Hamming window of length 25 ms; a 10th-order mel-cepstrum and a 10th-order delta mel-cepstrum are obtained from a 14th-order LPC analysis and combined with a 1st-order delta power to give an acoustic feature vector of 21 dimensions in total.
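The framing arithmetic in this example can be checked with a short sketch: at 16 kHz, a 25 ms window is 400 samples and a 10 ms shift is 160 samples. The code below only cuts and windows frames; the LPC analysis and mel-cepstrum computation are deliberately omitted, and the function and constant names are assumptions introduced for illustration.

```python
import math

SAMPLE_RATE_HZ = 16_000
FRAME_SHIFT_MS = 10
WINDOW_LENGTH_MS = 25

# 10 mel-cepstrum + 10 delta mel-cepstrum + 1 delta power
# (the delta power term follows the Japanese original of this paragraph).
FEATURE_DIM = 10 + 10 + 1


def hamming_window(length):
    """Standard Hamming window coefficients for a window of `length` samples."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frames(samples, sample_rate=SAMPLE_RATE_HZ):
    """Cut `samples` into overlapping Hamming-windowed frames."""
    win_len = sample_rate * WINDOW_LENGTH_MS // 1000  # 400 samples at 16 kHz
    shift = sample_rate * FRAME_SHIFT_MS // 1000      # 160 samples at 16 kHz
    window = hamming_window(win_len)
    out = []
    for start in range(0, len(samples) - win_len + 1, shift):
        segment = samples[start:start + win_len]
        out.append([s * w for s, w in zip(segment, window)])
    return out
```

Each frame would then be passed to the feature extraction stage to produce one 21-dimensional vector per 10 ms.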
[0027]
The likelihood calculating means 113 collates the acoustic standard patterns stored in the acoustic standard pattern 114 with the acoustic feature amounts obtained in this way, and obtains a likelihood indicating the degree of matching. The acoustic standard pattern 114 is a standard model that represents the acoustic features of a speech fragment, and is modeled, for example, using an HMM (Hidden Markov Model) in units of phonemes. The structure of each model can be, for example, a three-state left-to-right type whose output probability density function is a 16-mixture Gaussian distribution with diagonal covariance matrices.
[0028]
Further, the matching unit 115 calculates the cumulative likelihood of each recognition candidate by adding up the likelihoods using, for example, the Viterbi algorithm, according to the acoustic standard pattern sequence of the recognized vocabulary read from the vocabulary storage means 16. When the end of the input speech is reached, the cumulative likelihoods are compared to determine the recognition result.
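The cumulative likelihood calculation can be illustrated with a small Viterbi sketch in the log domain. It assumes that the per-frame log-likelihood of every state in a word's left-to-right state sequence is already available, and it ignores transition probabilities; both are simplifications of this sketch, and the function names are hypothetical. A real recognizer would typically process all vocabulary entries frame-synchronously rather than one word at a time.

```python
NEG_INF = float("-inf")


def viterbi_score(frame_loglik):
    """Best cumulative log-likelihood through a left-to-right state sequence.

    frame_loglik[t][s] is the log-likelihood of frame t under state s.
    The path starts in state 0, ends in the last state, and at each frame
    either stays in the same state or advances by one state.
    """
    n_states = len(frame_loglik[0])
    score = [NEG_INF] * n_states
    score[0] = frame_loglik[0][0]
    for obs in frame_loglik[1:]:
        new = [NEG_INF] * n_states
        for s in range(n_states):
            best_prev = score[s] if s == 0 else max(score[s], score[s - 1])
            if best_prev > NEG_INF:
                new[s] = best_prev + obs[s]
        score = new
    return score[-1]


def recognize(vocab_logliks):
    """Return the vocabulary entry with the largest cumulative likelihood."""
    return max(vocab_logliks, key=lambda w: viterbi_score(vocab_logliks[w]))
```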
[0029]
Next, a dictionary creation procedure according to the present embodiment will be described with reference to the operation flow of FIG. 4. Here, as an example, a process of generating paraphrases by dividing the word “Osaka University Sugadaira Kogen Experiment Center” (大阪大学菅平高原実験センター) into partial character strings in morpheme units will be described.
[0030]
First, in step S11, the analysis processing means 12 divides the notation text of the character string information 10 into partial character strings, and assigns a reading to each partial character string. The division into partial character strings can be performed using the same methods as general kana-kanji conversion or morphological analysis. For example, a method of sequentially cutting out, from the left end of the character string, the longest part that matches the dictionary may be used, or a method that scores each combination of partial character strings and readings found in the reading assignment dictionary 13 and selects the division with the highest score.
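The longest-match method mentioned above can be sketched as follows. The lexicon contents are an assumption chosen to fit the running example (the actual contents of FIG. 2 are not reproduced here), and the function name is hypothetical.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right longest-match segmentation.

    lexicon maps a partial character string to its reading.
    Raises ValueError if no dictionary entry matches at some position.
    """
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in lexicon:
                result.append((piece, lexicon[piece]))
                pos = end
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {pos}")
    return result


# Hypothetical reading assignment dictionary for the running example.
LEXICON = {
    "大阪": "オオサカ",
    "大学": "ダイガク",
    "菅平": "スガダイラ",
    "高原": "コウゲン",
    "実験": "ジッケン",
    "センター": "センター",
}
```

Greedy matching returns a single division, so it cannot represent ambiguity by itself; the score-based method is the one that compares alternative divisions.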
[0031]
If there is ambiguity in the division into partial character strings or in the assignment of readings, the output is produced in a format that includes the possible combinations of partial character strings. As the output format, for example, an enumeration in which the ambiguity is expanded, or a more efficient representation using a lattice or trellis, is used. The representation using lattices and trellises is described in detail in Non-Patent Document 2. The dictionary shown in FIG. 2 lists combinations of partial character strings and their corresponding readings in units of morphemes. For the input “Osaka University Sugadaira Kogen Experiment Center”, the three analysis candidates shown in FIG. 5 are obtained when the ambiguity of morpheme division and reading assignment is taken into account. In the figure, a slash (/) indicates a partial character string delimiter, and parentheses enclose the reading of the partial character string in katakana notation.
[0032]
Note that the analysis processing unit 12 may receive the reading as part of the character string information 10 in addition to the notation text. In this case, the readings assigned to the partial character strings must be consistent with the reading given in the character string information 10. In the example of FIG. 5, if the reading “オオサカダイガクスガダイラコウゲンジッケンセンター” (Osaka Daigaku Sugadaira Kogen Jikken Senta) is given, only candidate [1] is selected.
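As an illustration of how a supplied reading narrows the analysis candidates, the sketch below enumerates every division and reading combination and keeps only those whose concatenated reading equals the given one. The lexicon, including the alternative readings "スゲダイラ" and "タカハラ", is hypothetical and does not reproduce FIG. 2 or FIG. 5; the function names are also assumptions.

```python
def analyses(text, lexicon, start=0):
    """Enumerate all (partial string, reading) sequences that cover text.

    lexicon maps a partial character string to a list of possible readings.
    """
    if start == len(text):
        return [[]]
    out = []
    for end in range(start + 1, len(text) + 1):
        piece = text[start:end]
        for reading in lexicon.get(piece, []):
            for tail in analyses(text, lexicon, end):
                out.append([(piece, reading)] + tail)
    return out


def filter_by_reading(candidates, reading):
    """Keep candidates whose concatenated reading matches the given reading."""
    return [c for c in candidates if "".join(r for _, r in c) == reading]


# Hypothetical lexicon with ambiguous readings, for illustration only.
AMBIGUOUS_LEXICON = {
    "大阪": ["オオサカ"],
    "大学": ["ダイガク"],
    "菅平": ["スガダイラ", "スゲダイラ"],
    "高原": ["コウゲン", "タカハラ"],
    "実験": ["ジッケン"],
    "センター": ["センター"],
}
```

Full enumeration grows quickly with the amount of ambiguity, which is why a lattice or trellis is the more efficient representation in practice.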
[0033]
Next, in step S12, the paraphrase generation unit 14 collates the output of the analysis processing unit 12 with the paraphrase dictionary 15. The paraphrase generation unit 14 then creates a paraphrase notation and its reading by replacing those partial character strings that matched the paraphrase dictionary 15 with the corresponding expressions in the dictionary. The collation with the paraphrase dictionary 15 may span a plurality of the partial character strings output by the analysis processing unit 12. When ambiguity occurs in the collation, that is, when a plurality of candidates can be selected as a collation result, all combinations are expanded. In the example shown in FIG. 3, “大阪/大学” (Osaka / Daigaku) can be replaced with “阪大” (Handai), “菅平/高原” (Sugadaira / Kogen) with “菅平” (Sugadaira), and “実験/センター” (Jikken / Senta) with “実験/所” (Jikken / Jo). As a result, the 16 different paraphrase character strings shown in FIG. 6 are generated from the division and reading assignment candidates shown in FIG. 5.
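The replacement and expansion step can be sketched as below. The three rules are a hypothetical stand-in for the contents of FIG. 3, based on the paraphrases named in this description, and the sketch is applied to a single analysis candidate, so it yields 8 strings (including the unchanged original) rather than the 16 of FIG. 6. The names `RULES`, `expand`, and `paraphrases` are assumptions.

```python
from typing import Dict, List, Tuple

Morph = Tuple[str, str]  # (notation, reading)

# Hypothetical replacement rules: input morpheme string -> output alternatives.
RULES: Dict[Tuple[Morph, ...], List[Tuple[Morph, ...]]] = {
    (("大阪", "オオサカ"), ("大学", "ダイガク")): [(("阪大", "ハンダイ"),)],
    (("菅平", "スガダイラ"), ("高原", "コウゲン")): [(("菅平", "スガダイラ"),)],
    (("実験", "ジッケン"), ("センター", "センター")): [
        (("実験", "ジッケン"), ("所", "ジョ")),
    ],
}


def expand(morphs: List[Morph], start: int = 0) -> List[Tuple[Morph, ...]]:
    """Expand every combination of applying or not applying matching rules."""
    if start == len(morphs):
        return [()]
    results = []
    # Option 1: keep the current morpheme unchanged.
    for tail in expand(morphs, start + 1):
        results.append((morphs[start],) + tail)
    # Option 2: apply each rule whose input matches at this position.
    for pattern, outputs in RULES.items():
        end = start + len(pattern)
        if tuple(morphs[start:end]) == pattern:
            for out in outputs:
                for tail in expand(morphs, end):
                    results.append(out + tail)
    return results


def paraphrases(morphs: List[Morph]):
    """Return the set of (notation, reading) pairs, original included."""
    return {
        ("".join(n for n, _ in seq), "".join(r for _, r in seq))
        for seq in expand(morphs)
    }
```

In this representation, a "NIL" output (omission of the input expression) would correspond to an empty output tuple `()`.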
[0034]
Finally, in step S13, the generated paraphrase string is added to the vocabulary storage unit 16.
[0035]
Next, the procedure of speech recognition according to the present embodiment will be described with reference to the operation flow of FIG. 7. First, in step S1101, the acoustic analysis unit 112 reads the input voice 110 for one time frame and performs acoustic analysis to obtain an acoustic feature amount. Subsequently, in step S1102, the likelihood between the acoustic feature amount and each acoustic standard pattern is calculated. Next, in step S1103, for each recognized vocabulary entry, the likelihood of the acoustic standard pattern specified by its reading is added to the cumulative likelihood obtained so far. Next, in step S1104, it is determined whether the input voice has reached its end. If not, the process returns to step S1101. Finally, in step S1105, when the end of the input speech has been reached, the recognition candidate with the largest cumulative likelihood is obtained and output as the recognition result.
[0036]
As described above, according to the present embodiment, a paraphrase expression can be generated in accordance with a paraphrase dictionary by dividing a character string into partial character strings and assigning a reading to each of them using a reading dictionary. Since the paraphrase expression is generated using the dictionary, an expression that is not contained in the original character string can be generated. Also, compared with a method that generates paraphrase expressions simply by skipping some partial character strings, the generation of unnecessary paraphrases can be reduced.
[0037]
Although the dictionary creation method according to the present embodiment divides a character string into partial character strings, it goes without saying that the paraphrase dictionary can be applied to not only partial character strings but also entire input character strings. Therefore, it is possible to generate a paraphrase expression even if the division into partial character strings is omitted.
[0038]
Further, the dictionary creation method and the speech recognition method according to the present embodiment can be stored in a storage medium as a program. In this case, the program includes a dictionary creation program corresponding to the dictionary creation device 11 of FIG. 1 and a speech recognition program corresponding to the speech recognition device 111. The dictionary creation program is software composed of an analysis processing function that performs the same processing as the analysis processing means 12 (text division and reading assignment), a paraphrase generation function that performs the same processing as the paraphrase generation means 14, and a vocabulary storage function that performs the same processing as the vocabulary storage means 16. The speech recognition program is software composed of an acoustic analysis function that performs the same processing as the acoustic analysis unit 112, a likelihood calculation function that performs the same processing as the likelihood calculation unit 113, and a matching function that performs the same processing as the matching unit 115.
[0039]
Embodiment 2.
FIG. 8 is a block diagram illustrating a method for creating a dictionary for speech recognition according to the second embodiment. In FIG. 8, reference numeral 21 denotes a dictionary creation device according to the present embodiment. In the dictionary creation device 21, reference numeral 22 denotes a language analysis means that divides a character string into partial character strings and assigns the reading and linguistic information other than the reading to each partial character string. Reference numeral 23 denotes a language analysis dictionary that stores reading information and linguistic information on character strings. Reference numeral 24 denotes a paraphrase generation unit with linguistic information that generates a paraphrase expression based on the output result of the language analysis means 22, and reference numeral 25 denotes a paraphrase dictionary with linguistic information that the paraphrase generation unit with linguistic information 24 refers to. In the present embodiment, components denoted by the same reference numerals as those in the first embodiment are the same as those in the first embodiment, and thus description thereof will be omitted.
[0040]
Next, a procedure for creating a dictionary according to the present embodiment will be described using the operation flow of FIG. 9. Here, as in the first embodiment, the process of generating paraphrases by dividing the input example “Osaka University Sugadaira Kogen Experiment Center” into partial character strings in morpheme units will be described.
[0041]
First, in step S21, the language analysis unit 22 divides the written text of the character string information 10 into partial character strings, and adds reading / language information to each partial character string. A typical process performed by the language analysis unit is as follows.
[0042]
The input notation character string is subjected to morphological analysis, and the reading and part-of-speech information are obtained for each of the divided morphemes. Next, the semantic information of each morpheme required for paraphrase generation is obtained from the language analysis dictionary 23 based on the information assigned to the morpheme. The semantic information consists of finer classifications, such as the type of proper noun (place name, personal name, and so on), words representing business types or occupations, and modifiers. Further, syntactic information such as dependency relations and parallel relations between morphemes is obtained by referring to the notation, part of speech, and meaning of each morpheme. If there is ambiguity in the division into partial character strings or in the linguistic information to be assigned, the language analysis means 22 outputs the result in a format that includes all possible combinations.
[0043]
FIG. 10 is an example of the analysis result. Each of the divided partial character strings is provided with linguistic information of reading, part of speech, and meaning. In addition, syntactic information of dependency and parallel relation over a plurality of partial character strings is provided. As a result of the analysis, it can be seen that the input example is composed of six morphemes and is further composed of three compound nouns, and that the first two compound nouns each have a parallel structure related to the last compound noun.
[0044]
Note that the input of the language analysis unit 22 may be a notation text together with a partial language analysis result. The partial language analysis result is, for example, a part of the analysis result shown in FIG. 10. Providing a partial language analysis result in advance has the effect of preventing language analysis errors. In this case, the division into partial character strings and the linguistic information to be assigned must be consistent with the linguistic information given in the input.
[0045]
Next, in step S22, the paraphrase generation unit with linguistic information 24 collates the output of the language analysis unit 22 with the paraphrase dictionary with linguistic information 25. In this collation, the part of speech, meaning, and syntactic information of each partial character string can be used in addition to its notation and reading. If there is ambiguity in the matching with the dictionary, all of the combinations are expanded.
[0046]
FIG. 11 shows an example of the contents of the paraphrase dictionary 25 with linguistic information. In the present embodiment, as shown in the figure, the paraphrase dictionary with linguistic information expresses each combination of an input value condition and the corresponding output value as a rule, and holds a collection of such rules. Each rule is given a rule number such as 2-1 or 2-2. In this example, the input value condition includes, in addition to the notation, structural information based on meaning and syntax. Here, “*” in the figure indicates that the item is ignored in the collation. Further, “<n>” (n is a number) in the output value indicates that the nth partial character string of the collation result is to be output. Rules “2-1” and “2-2” are examples of paraphrases that depend only on the notation. Rule “2-3”, on the other hand, indicates that the suffix of a place name can be omitted; it allows “Sugadaira / Kogen” to be paraphrased as “Sugadaira” in notation. Rule “2-4” is an example of a rule that, when a parallel relationship consisting of two terms (two partial character strings) is detected, generates a paraphrase in which their order is swapped. Allowing rules of this kind makes it possible to handle word order changes and paraphrases that depend on the linguistic information of adjacent partial character strings. The collation of a plurality of partial character strings uses syntactic information. For this reason, adjacent strings that have no direct syntactic relationship, such as “university / Sugadaira” or “Kogen / experiment”, are not collated together.
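A rule format with “*” wildcards and “<n>” output references can be sketched as follows. The field names (`relation`, `input`, `output`, `notation`, `meaning`), the meaning labels, and the two sample rules are hypothetical; they only mimic the behavior described for rules 2-3 and 2-4 and do not reproduce FIG. 11.

```python
import re


def item_matches(cond, item):
    """A condition field equal to "*" is ignored during collation."""
    return all(want == "*" or item.get(key) == want for key, want in cond.items())


def apply_rule(rule, items, relation):
    """Return the output notation list, or None if the rule does not match."""
    if rule["relation"] != "*" and rule["relation"] != relation:
        return None
    if len(rule["input"]) != len(items):
        return None
    if not all(item_matches(c, it) for c, it in zip(rule["input"], items)):
        return None
    out = []
    for token in rule["output"]:
        ref = re.fullmatch(r"<(\d+)>", token)
        # "<n>" copies the n-th matched partial character string (1-based).
        out.append(items[int(ref.group(1)) - 1]["notation"] if ref else token)
    return out


# Hypothetical rule: the suffix of a place name can be omitted.
OMIT_SUFFIX = {
    "relation": "*",
    "input": [{"notation": "*", "meaning": "place_name"},
              {"notation": "*", "meaning": "place_suffix"}],
    "output": ["<1>"],
}

# Hypothetical rule: two terms in a parallel relationship can be swapped.
SWAP_PARALLEL = {
    "relation": "parallel",
    "input": [{"notation": "*", "meaning": "*"},
              {"notation": "*", "meaning": "*"}],
    "output": ["<2>", "<1>"],
}
```

Because the syntactic relation is part of the rule condition, adjacent strings without the required relation simply fail to match.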
[0047]
In step S23, when the collation succeeds, the paraphrase generation unit with linguistic information 24 creates the paraphrase notation and reading by replacing the matched part with the output expression of the paraphrase dictionary. With the dictionary shown in FIG. 3, in addition to the paraphrases shown in FIG. 6 for the first embodiment, the 16 types of paraphrases shown in FIG. 12, which allow omission and changes of word order, can be generated.
[0048]
Finally, in step S24, the generated character string is added to the dictionary.
[0049]
According to the present embodiment, a paraphrase expression can be generated in accordance with the paraphrase dictionary 25 with linguistic information by using linguistic information such as meaning and syntactic information in addition to notation and reading. Since the paraphrase expressions generated here take linguistic information into account, inappropriate paraphrases are eliminated while a wide range of actual utterances is covered, so recognition accuracy can be improved.
[0050]
Note that the dictionary creation method according to the present embodiment can be stored in a storage medium as a program. This program is software composed of a language analysis function for performing the same processing as the language analysis means 22, a paraphrase generation function with linguistic information for performing the same processing as the paraphrase generation means with linguistic information 24, and a vocabulary storage function for performing the same processing as the vocabulary storage means 16.
[0051]
Embodiment 3.
FIG. 13 is a block diagram illustrating a method for creating a dictionary for speech recognition according to the third embodiment. In FIG. 13, reference numeral 30 denotes character string information for which a paraphrase expression is to be generated. In the present embodiment, it is assumed that the character string information 30 also includes appearance frequency information. Reference numeral 31 denotes a dictionary creation device according to the present embodiment. In the dictionary creation device 31, reference numeral 32 denotes a language analysis / likelihood assigning unit that divides the notation text of the character string information 30 into partial character strings and assigns an appearance frequency likelihood to each partial character string. Reference numeral 33 denotes a dictionary with likelihood for language analysis referred to by the language analysis / likelihood assigning unit 32. Reference numeral 34 denotes a paraphrase generation unit with linguistic information and likelihood that applies linguistic information to each partial character string based on the output result of the language analysis / likelihood assigning unit 32 to generate a paraphrase expression while assigning a likelihood to it. Reference numeral 35 denotes a paraphrase dictionary with linguistic information and likelihood referred to by the paraphrase generation unit with linguistic information and likelihood 34. Reference numeral 36 denotes a paraphrase generation likelihood calculation unit that calculates the utterance likelihood of each paraphrase expression based on the output result of the paraphrase generation unit with linguistic information and likelihood 34. Note that, in the present embodiment, components denoted by the same reference numerals as those in the first embodiment perform the same operations as those in the first embodiment, and a description thereof will be omitted.
[0052]
A characteristic of the present embodiment is that the dictionary creation device 31 generates and assigns a likelihood that takes into account the appearance frequency information, the plausibility of the analysis in text division and linguistic information assignment, and the probability that the generated paraphrase appears. Hereinafter, the functional blocks in FIG. 13 will be described.
[0053]
The language analysis / likelihood assigning means 32 reads the notation text from the character string information 30, divides it into partial character strings for all possible division candidates, and, referring to the dictionary with likelihood for language analysis 33, assigns linguistic information, an appearance frequency likelihood, and a language analysis likelihood to the partial character strings. Here, the linguistic information includes the reading, part of speech, meaning, and syntactic information of each partial character string. The appearance frequency likelihood is a numerical value indicating the likelihood of appearance obtained from the appearance frequency information of the character string information 30. The language analysis likelihood is a numerical value indicating the plausibility of the linguistic information assigned to each partial character string divided from the notation text. The analysis result of the language analysis / likelihood assigning means 32 is output as a set of each divided partial character string and its linguistic information, appearance frequency likelihood, and language analysis likelihood, or in an equivalent output format. For example, for the three division / linguistic information assignment candidates shown in FIG. 5, the appearance frequency likelihoods L0(1), L0(2), L0(3), L0(4) and the language analysis likelihoods L1(1), L1(2), L1(3), L1(4) are assigned.
[0054]
Next, the paraphrase generation unit with linguistic information and likelihood 34 reads the output result of the language analysis / likelihood assigning unit 32, selects the applicable rules from those stored in the paraphrase dictionary with linguistic information and likelihood 35, and generates paraphrase expressions. At the same time, the paraphrase generation unit with linguistic information and likelihood 34 assigns to each generated paraphrase a paraphrase likelihood representing its appearance probability. For example, paraphrase likelihoods such as L2(1-1) and L2(1-2) are assigned to the paraphrase generation results shown in FIG. 6.
[0055]
Finally, the paraphrase generation likelihood calculation means 36 reads the output of the paraphrase generation means with linguistic information and likelihood 34, calculates the utterance likelihood of the target word using at least one of the above-described appearance frequency likelihood L0, language analysis likelihood L1, and paraphrase likelihood L2, and the reading arrangement likelihood L3 described below, and stores it in the vocabulary storage means 16 together with the recognized vocabulary and its reading. The reading arrangement likelihood L3 is a likelihood calculated in consideration of the ease of utterance and the generality of the generated reading. For example, let the reading $Y$ of the generated recognition vocabulary be $Y = [y_1 \dots y_m]$, and let the probability $P(Y)$ of uttering it be the reading arrangement likelihood L3. $P(Y)$ is calculated from $P_{len}(m)$, a probability distribution defined over the number of morae $m$ in the vocabulary, and $P_{seq}(Y)$, an N-gram probability in mora units, either as their weighted linear sum $P(Y) = \alpha_1 P_{len}(m) + \alpha_2 P_{seq}(Y)$ or as the product $P(Y) = \alpha_1 P_{len}(m) \times \alpha_2 P_{seq}(Y)$. Here, $\alpha_1$ and $\alpha_2$ are weighting parameters. $P_{seq}(Y)$ is calculated based on Equation 1.
[0056]
(Equation 1)
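Equation 1 itself is not reproduced in this text. As a hedged reconstruction, assuming the conventional definition of an N-gram model over the mora sequence $Y = [y_1 \dots y_m]$ (the exact form in the original may differ), the mora-unit N-gram probability would be written as:

```latex
P_{seq}(Y) = \prod_{i=1}^{m} P\left(y_i \mid y_{i-N+1}, \dots, y_{i-1}\right)
```

Here $N$ is the N-gram order, and context positions before the start of the reading would be filled with a start symbol.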

[0057]
Next, an operation flow of the system according to the third embodiment will be described with reference to FIG. 14. First, in step S31, the language analysis / likelihood assigning unit 32 divides the notation text of the character string information 30, which includes the appearance frequency information, into partial character strings, and assigns linguistic information and a language analysis likelihood to each partial character string. The language analysis likelihood is calculated, for example, by assigning a likelihood in advance to each rule applied during analysis and taking their weighted sum or product.
[0058]
Next, in step S32, the paraphrase generation means with linguistic information and likelihood 34 refers to the paraphrase dictionary with linguistic information and likelihood 35, and searches for dictionary entries that match the notation of the partial character strings or their linguistic information.
[0059]
Subsequently, in step S33, the paraphrase generation likelihood calculation means 36 uses at least one of the language analysis likelihood L1 (the likelihood of text division and reading assignment), the paraphrase likelihood L2, and the reading arrangement likelihood L3 based on the reading sequence of the generated recognition vocabulary, combines them by, for example, weighted addition, and assigns the resulting likelihood to each paraphrase.
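The weighted combination in step S33 can be sketched as follows. The choice of a bigram (N = 2) for the mora-unit N-gram, the start symbol "<s>", the floor value for unseen events, the function names, and all numeric values are assumptions made for illustration; the reading is assumed to be already split into morae.

```python
def reading_likelihood(morae, p_len, bigram, alpha1, alpha2, floor=1e-6):
    """Reading arrangement likelihood as a weighted linear sum.

    Computes alpha1 * P_len(m) + alpha2 * P_seq(Y), where P_seq is a mora
    bigram probability. p_len maps a mora count to a probability and bigram
    maps (previous mora, current mora) to a probability. Unseen events
    receive `floor` (an assumption of this sketch).
    """
    p_seq = 1.0
    prev = "<s>"
    for mora in morae:
        p_seq *= bigram.get((prev, mora), floor)
        prev = mora
    return alpha1 * p_len.get(len(morae), floor) + alpha2 * p_seq


def utterance_likelihood(l1, l2, l3, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the analysis (L1), paraphrase (L2), and reading (L3) likelihoods."""
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l3
```

In practice the weights would be tuned on data, and the likelihoods are often combined in the log domain to avoid numerical underflow.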
[0060]
Finally, in step S34, the generated character string and likelihood are added to the recognition dictionary.
[0061]
According to the present embodiment, by referring to the linguistic information stored in the paraphrase dictionary with linguistic information and likelihood, it is possible to generate paraphrase expressions that use notations not present in the original character string. Unnecessary paraphrases are therefore rarely generated, and paraphrases can be generated efficiently and automatically. Furthermore, each recognition vocabulary entry is given a likelihood that takes into account the reliability of the language analysis and the appearance probability of the paraphrase expression. By reflecting this likelihood in the recognition result together with the cumulative likelihood, highly accurate speech recognition processing can be realized.
[0062]
Note that the dictionary creation method and the speech recognition method according to the present embodiment can be stored in a storage medium as a program. In this case, the program is software including a language analysis / likelihood assigning function for performing the same processing as the language analysis / likelihood assigning means 32, a paraphrase generation function with linguistic information and likelihood for performing the same processing as the paraphrase generation means with linguistic information and likelihood 34, and a vocabulary storage function for performing the same processing as the vocabulary storage means 16.
[0063]
Embodiment 4.
FIG. 15 is a block diagram illustrating a method for creating a speech recognition dictionary according to the fourth embodiment. In the present embodiment, reference numeral 41 denotes a vocabulary candidate pruning unit that deletes a generated paraphrase expression having a low likelihood. Note that, in the present embodiment, components denoted by the same reference numerals as those in Embodiment 3 perform the same operations as those in Embodiment 3, and thus description thereof will be omitted.
[0064]
The vocabulary candidate pruning unit 41 takes as input the notation and reading of each recognized vocabulary entry and the paraphrase generation likelihood calculated by the paraphrase generation likelihood calculation unit 36. Of the recognized vocabulary entries and likelihoods generated for each item of input character string information, it registers in the vocabulary storage means 16 only those entries selected according to at least one of two conditions: the relative rank of the likelihood value, and the comparison of the likelihood value with a threshold.
[0065]
Next, an operation flow of the system according to the present embodiment will be described with reference to FIG. However, since steps S31, S32, and S33 perform the same operations as those in the third embodiment, the same symbols are given and the description is omitted.
[0066]
In step S41, the vocabulary candidate pruning unit 41 applies to the recognized vocabulary generated in step S33 at least one of two conditions, namely the relative likelihood difference among the paraphrases generated from the same word and a threshold value, and deletes paraphrases with a small likelihood from the recognition candidates.
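The pruning in step S41 can be sketched as follows. The function name `prune`, the parameter names, and the likelihood values in the usage example are hypothetical; either condition can be disabled by leaving it as `None`.

```python
def prune(candidates, threshold=None, max_gap=None):
    """Prune the paraphrases generated from one headword.

    candidates is a list of (reading, likelihood) pairs. A candidate is kept
    if its likelihood is at least `threshold` (when given) and is within
    `max_gap` of the best likelihood for this headword (when given).
    """
    if not candidates:
        return []
    best = max(lik for _, lik in candidates)
    kept = []
    for reading, lik in candidates:
        if threshold is not None and lik < threshold:
            continue
        if max_gap is not None and best - lik > max_gap:
            continue
        kept.append((reading, lik))
    return kept


# Usage with made-up likelihood values.
CANDIDATES = [
    ("ハンダイスガダイラジッケンセンター", 0.75),
    ("スガダイラジッケンセンター", 0.5),
    ("スガダイラハンダイジッケンセンター", 0.125),
]
print(prune(CANDIDATES, threshold=0.25))
```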
[0067]
Next, in step S42, the paraphrase candidates remaining as a result of step S41 are stored in the vocabulary storage means 16 as recognized vocabulary.
[0068]
According to the present embodiment, paraphrases that have a low likelihood and are unlikely to appear are deleted from the recognized vocabulary. Performing speech recognition with the resulting recognition dictionary therefore reduces the size of the recognition dictionary compared with the case where vocabulary candidate pruning is not performed, and has the effect that paraphrases can be handled with a limited amount of computation and memory.
[0069]
The dictionary creation method and the speech recognition method in the present embodiment can be stored in a storage medium as a program. In this case, the program is software comprising a language analysis / likelihood assigning function for performing the same processing as the language analysis / likelihood assigning means 32, a paraphrase generation function with linguistic information and likelihood for performing the same processing as the paraphrase generation means with linguistic information and likelihood 34, a vocabulary candidate pruning function for performing the same processing as the vocabulary candidate pruning means 41, and a vocabulary storage function for performing the same processing as the vocabulary storage means 16.
[0070]
Embodiment 5.
FIG. 17 is a block diagram illustrating a method for creating a speech recognition dictionary according to the fifth embodiment. In the figure, reference numeral 51 denotes a paraphrase verification unit that selects a paraphrase expression that satisfies a predetermined constraint from one or more paraphrase expressions. Reference numeral 52 denotes a system knowledge database that gives a constraint to the paraphrase verification unit 51. Note that, in the present embodiment, components denoted by the same reference numerals as those in Embodiment 3 perform the same operations as those in Embodiment 3, and thus description thereof will be omitted.
[0071]
Next, processing according to the present embodiment will be described. The paraphrase verification unit 51 reads all the paraphrase expressions of the vocabulary to be registered that are output from the paraphrase generation likelihood calculation unit 36. Next, the paraphrase expressions to be used as recognized vocabulary are selected according to the constraints given by the system knowledge database 52. One kind of constraint given by the system knowledge database 52 is imposed so that real-time processing can actually be performed, such as the calculation speed and memory capacity of the speech recognition system; paraphrases that do not fit within these constraints are deleted sequentially. Specifically, the amount of calculation and the required memory amount are obtained from the recognized vocabulary, and if the limits of the system are exceeded, paraphrases are deleted from the recognized vocabulary in ascending order of likelihood. However, at least one recognized vocabulary entry is left for every word.
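The sequential deletion under a resource limit can be sketched as follows. As a simplifying assumption, the total number of vocabulary entries stands in for the amount of calculation and memory; the function name and data layout are hypothetical.

```python
def fit_to_budget(vocab, max_entries):
    """Delete low-likelihood paraphrases until the entry budget is met.

    vocab maps a headword to a list of (reading, likelihood) pairs. The
    lowest-likelihood paraphrase is removed first, and at least one entry
    is always left for every headword, even if the budget is still exceeded.
    """
    kept = {
        word: sorted(cands, key=lambda c: c[1], reverse=True)
        for word, cands in vocab.items()
    }

    def total():
        return sum(len(cands) for cands in kept.values())

    while total() > max_entries:
        # Lowest-likelihood entry of each headword that still has a spare one.
        removable = [(c[-1][1], w) for w, c in kept.items() if len(c) > 1]
        if not removable:
            break
        _, word = min(removable)
        kept[word].pop()
    return kept
```

A real system would estimate the cost of each entry, for example from the length of its reading, instead of counting entries.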
[0072]
Another constraint given by the system knowledge database 52 is the deletion of vocabulary that is difficult to recognize because of the nature of speech recognition. For example, if the reading of a recognized vocabulary entry is very short, sufficient recognition accuracy cannot be secured. To avoid this and obtain sufficient accuracy, short paraphrases of, for example, two syllables or less are deleted. A further constraint concerns cases where many homophones are generated as paraphrase expressions. If there are homophonic or very similar recognized vocabulary entries, then even when the utterance is correctly recognized, the intended entry must still be identified from among the candidates, and this identification becomes difficult as the number of candidates increases. Therefore, by defining such a constraint condition in the system knowledge database 52, homophonic or similar-sounding paraphrases with a low likelihood are deleted.
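These two constraints can be sketched as follows. Counting morae in a katakana reading is used here as an approximation of the syllable count, and only exact homophones are handled (similar-sounding entries are not); both are assumptions of this sketch, as are the function names and the sample data in the usage note.

```python
SMALL_KANA = set("ャュョァィゥェォヮ")  # combine with the preceding kana


def mora_count(reading):
    """Approximate the number of morae in a katakana reading."""
    return sum(1 for ch in reading if ch not in SMALL_KANA)


def filter_vocab(entries, min_morae=3):
    """Drop very short readings and keep the best entry per homophone group.

    entries is a list of (headword, reading, likelihood) tuples. Readings
    shorter than min_morae are deleted; among entries with the same reading,
    only the one with the highest likelihood is kept.
    """
    best = {}
    for head, reading, lik in entries:
        if mora_count(reading) < min_morae:
            continue
        if reading not in best or lik > best[reading][2]:
            best[reading] = (head, reading, lik)
    return list(best.values())
```

For example, with made-up entries `[("X", "ハン", 0.9), ("Y", "ハンダイ", 0.4), ("Z", "ハンダイ", 0.7)]`, the two-mora reading is dropped and only the higher-likelihood "ハンダイ" entry remains.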
[0073]
As another constraint, the vocabulary may be set according to the purpose of the target user. For example, even if a certain facility name is the recognition target of the user's utterance, the tendency of paraphrasing differs between asking for the telephone number of the facility and asking for the weather near the facility. This is because, when asking for a telephone number, information that identifies the facility, such as the chain name of the target facility, is emphasized, whereas when asking for the weather, information about the location is considered important. To handle such cases, constraints on the type of paraphrase based on task knowledge are described as conditions in the system knowledge database.
[0074]
Through such processing by the paraphrase verification unit 51, the recognition vocabulary that allows the system to operate practically is selected. The finally selected paraphrase and its likelihood are output to the vocabulary storage means 16 as the recognition target vocabulary.
[0075]
According to the present embodiment, the paraphrase verification means 51 makes it possible to set a recognition vocabulary that takes system constraints into account, which has the effect of improving the overall recognition accuracy. In addition, because the system is implemented within a limited amount of calculation and memory, the size of the recognition dictionary is reduced. As a result, when used for speech recognition, a compact and highly accurate speech recognition engine can be constructed.
[0076]
Note that the dictionary creation method and the speech recognition method in the present embodiment can be stored in a storage medium as a program. In this case, the program is software including a language analysis / likelihood assigning function for performing the same processing as the language analysis / likelihood assigning means 32, a paraphrase generation function with linguistic information and likelihood for performing the same processing as the paraphrase generation means with linguistic information and likelihood 34, a paraphrase verification function for performing the same processing as the paraphrase verification means 51, and a vocabulary storage function for performing the same processing as the vocabulary storage means 16.
[0077]
[Effect of the Invention]
Since the present invention creates paraphrase expressions of a headword and their readings based on a word replacement rule that describes the relationship between an input word and an output word, it can automatically generate a speech recognition dictionary that includes paraphrases combining expressions that do not appear in the notation of the headword.
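A minimal sketch of rule-based paraphrase generation, using the 大阪大学菅平高原実験センター example from the prior-art discussion, might look like this. It assumes the headword has already been divided into partial character strings with readings; the rule table contents and the function name are hypothetical.

```python
from itertools import product

# Hypothetical word replacement rules: input word -> list of (output word, reading).
REPLACEMENT_RULES = {
    "大阪大学": [("阪大", "ハンダイ")],
    "センター": [("所", "ジョ")],
}

def generate_paraphrases(segments):
    """Return sorted (paraphrase, reading) pairs, excluding the headword itself.

    segments: list of (surface, reading) pairs, assumed to be pre-segmented.
    """
    # Each segment may stay as it is or be replaced by any rule output.
    options = [[seg] + REPLACEMENT_RULES.get(seg[0], []) for seg in segments]
    original = ("".join(s for s, _ in segments), "".join(r for _, r in segments))
    results = set()
    for combo in product(*options):
        candidate = ("".join(s for s, _ in combo), "".join(r for _, r in combo))
        if candidate != original:
            results.add(candidate)
    return sorted(results)

HEADWORD = [
    ("大阪大学", "オオサカダイガク"),
    ("菅平", "スガダイラ"),
    ("高原", "コウゲン"),
    ("実験", "ジッケン"),
    ("センター", "センター"),
]

PARAPHRASES = generate_paraphrases(HEADWORD)
```

Because the output words 阪大 and 所 come from the rule table rather than from the headword, the generated entries contain strings that appear nowhere in the original notation.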
[Brief description of the drawings]
FIG. 1 is a block diagram of a dictionary creation device and a speech recognition device according to a first embodiment.
FIG. 2 is a diagram showing an example of stored contents of a language analysis dictionary according to the first embodiment.
FIG. 3 is a diagram showing an example of a word replacement rule according to the first embodiment.
FIG. 4 is a flowchart illustrating a dictionary creation process according to the first embodiment.
FIG. 5 is a diagram illustrating an example of a character string division result using morphological analysis according to the first embodiment.
FIG. 6 is a diagram showing an example of a paraphrase expression generation result in the first embodiment.
FIG. 7 is a flowchart illustrating a voice recognition process according to the first embodiment.
FIG. 8 is a block diagram of a dictionary creation device according to a second embodiment.
FIG. 9 is a flowchart of a dictionary creation process according to the second embodiment.
FIG. 10 is a diagram showing an example of assigning a linguistic meaning in the second embodiment.
FIG. 11 is a diagram showing an example of a word replacement rule according to the second embodiment.
FIG. 12 is a diagram illustrating an example of a paraphrase expression generation result according to the second embodiment.
FIG. 13 is a block diagram of a dictionary creation device according to a third embodiment.
FIG. 14 is a flowchart of a dictionary creation process according to the third embodiment.
FIG. 15 is a block diagram of a dictionary creation device according to a fourth embodiment.
FIG. 16 is a flowchart of a dictionary creation process according to the fourth embodiment.
FIG. 17 is a block diagram of a dictionary creation device according to a fifth embodiment.
FIG. 18 is a block diagram of a dictionary creation device according to the related art.
FIG. 19 is a diagram showing an operation example of a conventional technique.
[Explanation of symbols]
10: Character string information 11: Dictionary creation device 12: Analysis processing means
13: Linguistic analysis dictionary 14: Paraphrase generation means 15: Paraphrase dictionary
16: Vocabulary storage means 21: Dictionary creation device 22: Analysis processing means
23: Linguistic analysis dictionary 24: Paraphrase generation means with linguistic information
25: Paraphrase dictionary 31: Dictionary creation device 32: Language analysis and likelihood providing means
33: Dictionary with likelihood for language analysis 34: Paraphrase generation means with language information and likelihood
35: Paraphrase dictionary with linguistic information and likelihood 36: Paraphrase generation likelihood calculating means
41: Vocabulary candidate pruning means 51: Paraphrase verification means
52: System knowledge database 110: Input speech 111: Speech recognition device
112: Acoustic analysis means 113: Likelihood calculation means 114: Acoustic standard pattern
115: Collation means 1001: Vocabulary creation means

Claims

1. A method for creating a speech recognition dictionary, comprising:
an input step of inputting a headword;
a paraphrase expression creation step of acquiring, as a paraphrase expression, the output word whose input word is the headword, based on a word replacement rule that is stored in a non-volatile storage device and expresses the relationship between an input word and an output word, and further acquiring the reading of the paraphrase expression; and
an output step of storing the paraphrase expression and its reading in a speech recognition dictionary.

2. The method for creating a speech recognition dictionary according to claim 1, further comprising a character string dividing step of dividing the headword input in the input step into partial character strings,
wherein the paraphrase expression creation step acquires a paraphrase expression of each partial character string and its reading based on the output word of the word replacement rule whose input word is that partial character string.

3. The method for creating a speech recognition dictionary according to claim 2, wherein
the character string dividing step assigns a linguistic meaning to each partial character string divided from the headword, and
the paraphrase expression creation step acquires the paraphrase expression of the headword and its reading by referring to the linguistic meaning of each partial character string divided from the headword, based on a word replacement rule that instructs omission of a partial character string according to its linguistic meaning.

4. The method for creating a speech recognition dictionary according to claim 2, wherein
the character string dividing step assigns a linguistic meaning to each partial character string divided from the headword, and
the paraphrase expression creation step acquires the paraphrase expression of the headword and its reading by referring to the linguistic meaning of each partial character string divided from the headword, based on a word replacement rule that instructs interchanging a partial character string with the partial character strings preceding and following it according to its linguistic meaning.

5. The method for creating a speech recognition dictionary according to claim 2, wherein
the character string dividing step assigns an appearance frequency likelihood and a language analysis likelihood to each of the partial character strings,
the paraphrase expression creation step calculates an utterance likelihood of the paraphrase expression from the appearance frequency likelihood and the language analysis likelihood of the partial character strings, and
the output step stores the utterance likelihood of the paraphrase expression in the speech recognition dictionary.

6. The method for creating a speech recognition dictionary according to claim 5, wherein
the paraphrase expression creation step selects, from the created paraphrase expressions, a paraphrase expression whose utterance likelihood satisfies a predetermined condition, and
the output step stores the paraphrase expression selected in the paraphrase expression creation step and its reading in the speech recognition dictionary.

7. The method for creating a speech recognition dictionary according to any one of claims 1 to 6, wherein
the paraphrase expression creation step selects a paraphrase expression of the headword and its reading according to a predetermined condition based on a system knowledge database stored in the non-volatile storage device, and
the output step stores the paraphrase expression selected in the paraphrase expression creation step and its reading in the speech recognition dictionary.

8. An apparatus for creating a speech recognition dictionary, comprising:
input means for inputting a headword;
paraphrase expression creating means for acquiring, as a paraphrase expression, the output word whose input word is the headword, based on a word replacement rule that is stored in a non-volatile storage device and expresses the relationship between an input word and an output word, and further acquiring the reading of the paraphrase expression; and
output means for storing the paraphrase expression and its reading in a speech recognition dictionary.

9. The apparatus for creating a speech recognition dictionary according to claim 8, further comprising character string dividing means for dividing the headword input by the input means into partial character strings,
wherein the paraphrase expression creating means acquires a paraphrase expression of each partial character string and its reading based on the output word of the word replacement rule whose input word is that partial character string.

10. The apparatus for creating a speech recognition dictionary according to claim 9, wherein
the character string dividing means assigns a linguistic meaning to each partial character string divided from the headword, and
the paraphrase expression creating means acquires the paraphrase expression of the headword and its reading by referring to the linguistic meaning of each partial character string divided from the headword, based on a word replacement rule that instructs omission of a partial character string according to its linguistic meaning.

11. The apparatus for creating a speech recognition dictionary according to claim 9, wherein
the character string dividing means assigns a linguistic meaning to each partial character string divided from the headword, and
the paraphrase expression creating means acquires the paraphrase expression of the headword and its reading by referring to the linguistic meaning of each partial character string divided from the headword, based on a word replacement rule that instructs interchanging a partial character string with the partial character strings preceding and following it according to its linguistic meaning.

12. The apparatus for creating a speech recognition dictionary according to claim 9, wherein
the character string dividing means assigns an appearance frequency likelihood and a language analysis likelihood to each of the partial character strings,
the paraphrase expression creating means calculates an utterance likelihood of the paraphrase expression from the appearance frequency likelihood and the language analysis likelihood of the partial character strings, and
the output means stores the utterance likelihood of the paraphrase expression in the speech recognition dictionary.

13. The apparatus for creating a speech recognition dictionary according to claim 12, wherein
the paraphrase expression creating means selects, from the created paraphrase expressions, a paraphrase expression whose utterance likelihood satisfies a predetermined condition, and
the output means stores the paraphrase expression selected by the paraphrase expression creating means and its reading in the speech recognition dictionary.

14. The apparatus for creating a speech recognition dictionary according to claim 8, wherein
the paraphrase expression creating means selects a paraphrase expression of the headword and its reading according to a predetermined condition based on a system knowledge database stored in the non-volatile storage device, and
the output means stores the paraphrase expression selected by the paraphrase expression creating means and its reading in the speech recognition dictionary.

15. The apparatus for creating a speech recognition dictionary according to claim 14, wherein the system knowledge database is configured such that the predetermined condition is satisfying constraints on the hardware resources of the speech recognition dictionary creation device.

16. The apparatus for creating a speech recognition dictionary according to claim 14, wherein the system knowledge database is configured such that the predetermined condition is rejecting vocabulary that is difficult to recognize due to the nature of speech recognition.

17. The apparatus for creating a speech recognition dictionary according to claim 14, wherein the system knowledge database is configured such that the predetermined condition is preferentially selecting vocabulary according to the user's purpose of use.

18. A speech recognition device, comprising:
acoustic analysis means for analyzing input speech in time series and calculating acoustic feature quantities;
likelihood calculation means for comparing the acoustic feature quantities with acoustic standard patterns and calculating likelihoods; and
collation means for calculating, from those likelihoods, the likelihood of each vocabulary item stored in a speech recognition dictionary and outputting a vocabulary item with a high likelihood as the recognized vocabulary,
wherein the speech recognition dictionary is created by the apparatus for creating a speech recognition dictionary according to any one of claims 8 to 17.