JP4226942B2

JP4226942B2 - Accent position estimation method, apparatus and program

Info

Publication number: JP4226942B2
Application number: JP2003102670A
Authority: JP
Inventors: 秀治中嶋; 昌明永田; 久子浅野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-04-07
Filing date: 2003-04-07
Publication date: 2009-02-18
Anticipated expiration: 2023-04-07
Also published as: JP2004309753A

Description

【０００１】
【産業上の利用分野】
この発明は、アクセント位置推定方法、装置およびプログラムに関し、特に、登録されていない単語のアクセント位置を推定するアクセント位置推定規則を識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムによって学習することにより獲得し、この学習により得られた規則を用いて登録されていない単語のアクセント位置を推定するアクセント位置推定方法、装置およびプログラムに関する。
【０００２】
【従来の技術】
テキストに基づいて音声を合成する音声合成装置は、入力されたテキスト内の各単語の読み、際立たせるべき音は何れであるかを指示する情報であるアクセント位置情報を、辞書から取り出して合成音声の生成に利用している。ところが、任意のテキストには、辞書に登録されていない単語、即ち、未登録語が出現する事態が生じる。従って、任意のテキストの音声合成においては、未登録語に対して尤もらしい読みとアクセントの情報を推定することが必要となる。
この様に、音声合成には読みとアクセント位置の両情報が必要であるが、入力されるテキスト中に出現する単語の後ろに括弧書きその他により読みが付与されることと比較して、アクセント位置が付与されるということは極く少ないか、殆どない。この状況は日本語に限らず、一般に、音を際立たせる箇所を示すアクセント記号をテキスト中に書かない中国語、英語、フランス語、ドイツ語その他の言語においても同様である。従って、音声合成において、アクセント位置の推定は重要な意味を持つ。
【０００３】
日本語の標準語の場合、例えば、Ｎモーラより成る単語はＮ個のモーラの何処かにアクセントが置かれるか(Ｎ通り)、或いは何処にも置かれないか(１通り)により、（Ｎ＋１）通りのアクセント型が考えられる。単語はそれらの内の何れか１つのアクセント型を持つ。英語においても、Ｎ個のシラブルの内の何処かにアクセントが置かれるか、或いは、何処にも置かれないかの、合計（Ｎ＋１）通りの場合があり、その内の１つが単語毎に決まっている。
以上のことから、アクセント位置の推定という問題は、その単語の（Ｎ＋１）個のカテゴリの識別問題として解決することが試みられてきた。
【０００４】
従来技術として、統計的な分布に基づいたスコアを定義し、各単語に対するスコアの値からアクセント型を推定する技術が発表されている（非特許文献１参照）。
しかし、この非特許文献１の技術は、アクセント型の推定として単純な２分類を行なっているに過ぎない。即ち、３モーラの語には４通りの可能性の内の平板形か或いは１形かという２通りの識別だけを取り扱っており、４モーラの語には５通りの可能性の内の平板形か或いは２形かという２通りの識別だけを取り扱っているに過ぎない。実際は、他の形も存在するが、これらに対してアクセント型を付与する仕組みになっていない。
【０００５】
従来技術として、語尾の読みの違いに基づくアクセント型の推定規則を人手によって作成し、評価した結果が報告されている（非特許文献２参照）。
しかし、この非特許文献２の技術も、非特許文献１の技術と同様に、取り扱われていないアクセント型があり、それらの型を付与する仕組みになっていない。
従来技術として、分類、識別を行う識別器として、ニューラルネット或いは決定木を用いている。これら識別器は大量の例題を使った帰納学習の結果として得られる（特許文献１参照）。
しかし、特許文献１の技術で用いられるニューラルネット、決定木という識別器は学習用の例題の事例の頻度その他の統計量のみに基づいて構築されるので、事例の内の正しいけれども数の充分ではない事例に対しては充分に学習される保証はない。即ち、学習結果としての識別境界の置かれる位置が学習データの分布の中でさえ適切である保証が無く、学習に使われていない未知のデータに対しても適切ではない危険性が常に残されている。
【０００６】
また、非特許文献２、特許文献１の技術は、アクセント位置推定のための入力情報が単語の読みを表すカナ文字列或は音素記号列に限定されている。しかし、日本語の単語は、読みの結合の仕方、即ち、まとまり方の相違によりアクセントが異なる。例えば、同じ「あさか」という読みを持つ人名「朝霞」「阿坂」の読みには、それぞれ「あさ/か」「あ/さか」の様に「/」の前後のまとまりがあり、そのまとまり方の相違によって、「低高高」、「高低低」の様に音の高さを変えて発音される。即ち、アクセントが異なる。しかし、非特許文献２、特許文献１の技術は、入力情報が音素記号列に限定されているところから、これらに対して異なるアクセントを付与することができない。特許文献１は、追加情報を設定することができる記述がなされているが、如何なる情報を設定するかについては記載されていない。
【０００７】
【非特許文献１】
広川、外による"人名(姓)におけるアクセント形推定法"、日本音響学会、昭和５６年５月、第４２５頁から第４２６頁
【非特許文献２】
土田、外による"日本人の名前アクセント型付与規則"、電子通信学会論文誌、１９８４年５号、Vol.J６７-D Ｎo.５、第６２５頁から第６２６頁
【特許文献１】
特開２００２−７３０７１号公報
【０００８】
【発明が解決しようとする課題】
非特許文献１、非特許文献２において全アクセント型を考慮した識別の仕組みになっていない点であり、特許文献１で用いられる決定木、ニューラルネットの識別器自体とその学習の不充分さという点であり、非特許文献１、非特許文献２および特許文献１の日本語以外の言語を扱えない点であり、非特許文献２、特許文献１の読みのまとまりの違いによるアクセントの違いを規則内に表現できない点である。
この発明は、特許文献１のニューラルネット、決定木という識別器を用いる代わりに、サポートベクターマシン（ＳＶＭ）に代表される識別境界面と識別境界面に近い識別学習用データとの距離を最大化する識別器を利用し、単語に関する言語情報と当該単語のアクセント位置の組より成る学習データからアクセント位置を推定する規則を学習し、その規則を使って学習データに含まれていない新規の単語のアクセント位置を高精度に推定するアクセント位置推定方法、装置およびプログラムを提供するものである。
【０００９】
そして、特許文献１は日本語だけが対象であるが、英語の単語をシラブル単位に分けて構成した部分文字列を言語情報として設定することにより、英語のアクセント位置推定を行なうことができるアクセント位置推定方法、装置およびプログラムを提供するものである。
また、読みのまとまりを表現する言語情報を識別のための情報として用いることにより、より精度の高いアクセント位置推定を行なうことができるアクセント位置推定方法、装置およびプログラムを提供するものである。
【００１０】
【課題を解決するための手段】
単語に関する言語情報を入力としてアクセント位置を推定するアクセント位置推定規則を格納するアクセント位置推定規則格納部１０４と、未登録語に関する言語情報を設定する言語情報設定部１０２と、前記言語情報設定部で設定された言語情報に従って、前記アクセント位置推定規則格納部に格納されるアクセント位置推定規則を選択し、当該未登録語のアクセント位置を推定する推定部１０５と、を具備するアクセント位置推定装置を構成した。
【００１１】
前記言語情報は、単語の品詞と、単語の発音を構成する全ての部分文字列と、部分文字列の総数と、部分文字列間の結合の有無を表す（部分文字列数−１）個の情報との全てを含む。
【００１２】
先のアクセント位置推定装置において、前記単語の発音を構成する全ての部分文字列として、日本語の単語の発音をモーラ単位に分けることによって作成された部分文字列を利用するアクセント位置推定装置を構成した。
【００１３】
先のアクセント位置推定装置において、単語に関する言語情報と当該単語のアクセント位置の組を格納した学習用データ格納部１０１を更に具備し、前記言語情報設定部１０２は、学習用の単語に関する言語情報も設定し、前記言語情報設定部１０２によって設定された学習用の言語情報に従って前記学習用データ格納部１０１から選択された学習用データに基づいて、単語に関する言語情報を入力としてアクセント位置を推定するアクセント位置推定規則を学習する学習部１０３を更に具備するアクセント位置推定装置を構成した。
【００１４】
未登録語に関する言語情報を設定する言語情報設定過程と、前記言語情報設定過程で設定された言語情報に従って、アクセント位置推定規則を選択し、前記未登録語のアクセント位置を推定する推定過程と、を有し、前記言語情報は、単語の品詞と、単語の発音を構成する全ての部分文字列と、部分文字列の総数と、部分文字列間の結合の有無を表す（部分文字列数−１）個の情報と、の全てを含むアクセント位置推定方法を構成した。
【００１５】
先のアクセント位置推定方法において、前記単語の発音を構成する全ての部分文字列として、日本語の単語の発音をモーラ単位に分けることによって作成された部分文字列を利用するアクセント位置推定方法を構成した。また、上述したアクセント位置推定装置の各部としてコンピュータを機能させるためのアクセント位置推定プログラムを構成した。
【００１６】
【発明の実施の形態】
この発明のアクセント位置推定装置は、単語に関する言語情報と当該単語のアクセント位置の組を格納した学習用データ格納部と、単語に関する言語情報を設定する言語情報設定部と、言語情報設定部によって設定された情報に従って学習用データ格納部から選択された学習用データに基づいて、単語に関する言語情報を入力としてアクセント位置を推定する規則を、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムによって学習する学習部と、学習結果を格納するアクセント位置推定規則格納部と、言語情報設定部によって未登録語に関する言語情報を設定し、設定された言語情報に従ってアクセント位置推定規則格納部に格納されるアクセント位置推定規則を選択し、当該未登録語のアクセント位置を推定する推定部とを具備している。
【００１７】
上記学習手段およびアクセント位置推定規則として、ＳＶＭを利用することのほかに、最大エントロピー法で推定された確率モデルを利用したり、バギングやブースティングに代表される識別精度増強を目的とした学習事例の選択アルゴリズムに基づいて学習された決定木、決定リスト、ニューラルネット、線形識別器の混合を利用する。
この発明のアクセント位置推定方法は、単語に関する言語情報と当該単語のアクセント位置の組を格納した学習用データ格納部を有し、単語に関する言語情報を設定する言語情報設定工程を有し、言語情報設定工程によって設定された言語情報に従って学習用データ格納部から選択された学習用データに基づいて、単語に関する言語情報を入力としてアクセント位置を推定するアクセント位置推定規則を、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムによって学習する学習工程を有し、言語情報設定工程によって未登録語に関する言語情報を設定し、設定された言語情報に従ってアクセント位置推定規則を選択し、当該未登録語のアクセント位置を推定するアクセント位置推定工程を有する。
【００１８】
上記学習工程およびアクセント位置推定規則として、ＳＶＭを利用することのほかに、最大エントロピー法で推定された確率モデルを利用したり、バギングやブースティングに代表される識別精度増強を目的とした学習事例の選択アルゴリズムに基づいて学習された決定木、決定リスト、ニューラルネット、線形識別器の混合を利用する。
【００１９】
【実施例】
この発明の実施の形態を図の実施例を参照して説明する。図１はこの発明のアクセント位置推定装置の実施例を示すブロック図である。
このアクセント位置推定装置の実施例は登録されていない単語のアクセント位置を推定するに、単語に関する言語情報と単語のアクセント位置の情報の組より成る学習用データから、アクセント位置推定規則を学習し、学習の結果得られた推定規則を用いて、未登録語のアクセント位置を推定するもので、例えば、任意のテキストを音声に変換する音声合成装置に利用される。以下、具体的に説明する。
【００２０】
図１において、１０１は学習用データ格納部であり、アクセント位置の推定規則を学習する単語に関する言語情報とこれらの単語のアクセント位置の組を格納している。１０２は言語情報設定部であり、学習時およびアクセント位置の推定時に単語に関する言語情報を設定する。１０３は学習部であり、学習用データ格納部１０１から読み出された学習用データと言語情報設定部１０２で設定された言語情報を使って、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムに基づいて、アクセント位置を識別する１つ以上の識別器をアクセント位置推定規則として学習する。１０４はアクセント位置推定規則格納部であり、学習部１０３による学習結果を格納する。１０５は推定部であり、アクセント位置を推定するに際して、未登録語に対して言語情報設定部１０２で設定された情報に基づいて、アクセント位置推定規則格納部１０４に格納されている規則を選択し、アクセント位置を推定する。
【００２１】
図２は図１の構成を備える音声言語処理装置における未登録語のアクセント位置の推定規則の学習手順を示すフローチャートである。先ず、言語情報設定工程Ｓ２０１で、単語の言語情報を設定する。次に、ベクトル表現生成工程Ｓ２０２において、言語情報設定工程Ｓ２０１で設定された言語情報に対応するデータを学習用データ格納部１０１から抽出し、ベクトル表現を生成する。即ち、言語情報の内容毎に成分を設けるベクトルで、各情報内容の有無に従ってベクトルのその成分を１或いは０とするベクトルを作成する。そして、ベクトルの最終成分にアクセント位置を示す数値を設定する。続いて、識別学習工程Ｓ２０３においては、言語情報毎にベクトル表現生成工程Ｓ２０２で作成されたベクトルを、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムによって学習し、アクセント位置推定規則を作成する。
【００２２】
学習用データ格納部に格納される日本語の場合の学習データの事例を図３に示す。１つの行が１つの単語に関するデータを意味する。単語の言語情報として、単語の読みをモーラ単位に分けることによって構成した部分文字列としての各モーラの音韻、単語の品詞とモーラ数、部分文字列間の結合、即ち、まとまりの有無を示す数字を設定している。この数字は３行目のタツロウに対しては左から「１」「０」「１」となっている。これは例えば、タツロウが「達郎」である場合に存在するまとまり、「たつ」「ろう」を「た」と「つ」の間にまとまりがあるから「１」、「つ」と「ろ」との間にはまとまりがないから「O」、「ろ」と「う」との間にはまとまりがあるから「１」ということを表している。そして、右端にその単語のアクセントの位置、即ち、アクセントが付与される部分文字列の場所が格納されている。アイコの例の場合、１番目の「ア」にアクセントの位置があることを示している。
【００２３】
ここからアクセント位置推定規則を作成する単語のグループ毎に、各単語の実際の学習に用いられる属性データを取り出す。ここで、単語のグループとは、発音単位数、即ち、部分文字列数と品詞の組み合わせを意味する。例えば、３モーラの固有名詞をグループとして、そのアクセント位置推定規則を作成する場合には、データは図４のようになる。図４は構成要素数３の日本語の場合のデータであり、学習に品詞と部分文字列とその総数だけに入力データが形成変更されて用いる例を示している。部分文字列数分の発音種別とアクセント位置とを対応付けして使用する。
【００２４】
場合によっては、（部分文字列数−１）個分の部分文字列間のまとまりの有無の情報を対応付ける。この場合、（部分文字列数−１）個分の部分文字列間のまとまりの有無の情報は、（部分文字列数−１）次元のベクトル量で示され、各次元要素においてまとまりが有る場合は１を記載し、ない場合は０を記載する。
言語情報設定工程Ｓ２０１ではこれらを取り出せる様に品詞、部分文字列数を設定する。
【００２５】
ベクトル表現生成工程Ｓ２０２においては、この学習データの場合、３つの各モーラの位置毎に、ア、イ、ウ、・・・・・、キャ、キュ、キョ、・・・、ン、より成る全モーラが有るか無いかを示す（部分文字列数３×全モーラの種類数）の次元のベクトルを作成する。図４の最初の事例(アイコ)の場合、第１モーラ目の「ア」に相当する成分を１に、第２モーラ目の「イ」に相当する成分を１に、第３モーラ目の「コ」に相当する成分を１に、それ以外をすべて０にしたベクトルを、図２のベクトル表現生成工程Ｓ２０２で作成する。即ち、第１モーラ目「ア」＝１、第１モーラ目「イ」＝０、・・・・、第１モーラ目「ン」＝０、第２モーラ目「ア」＝０、第２モーラ目「イ」＝１、第２モーラ目「ウ」＝０、・・・、第２モーラ目「ン」＝０、第３モーラ目「ア」＝０、・・・、第３モーラ目「ケ」＝０、第３モーラ目「コ」＝１、第３モーラ目「サ」＝０、・・・・、第３モーラ目「ン」＝０なるベクトルを生成する。
【００２６】
この様なベクトルとその単語の何処にアクセントが付与されるかを示す数字、即ち、図４の右端の数値、とを組み合わせて識別学習工程Ｓ２０３の学習データとする。
上述した通りにして、作成した多数のベクトルデータに基づいて、モーラ数と品詞の組み合わせ毎に、アクセント位置推定規則を学習することができる。
以上の図４の事例は構成要素数が３のものだけであるが、任意の数のものを図３の様に学習用データに格納することもできる。構成要素数が４以上のものについては、部分文字列３以上のフィールドを設け、部分文字列数を部分文字列の総数に変更し、３未満のものについては、部分文字列のフィールドを削減し、部分文字列数を部分文字列の総数に変更することで実現することができる。
【００２７】
図３の部分文字列のまとまりを表す数字情報を図４の情報に加えて利用することもできる。即ち、（部分文字列数−１）個分の部分文字列間のまとまりの有無の情報を利用し、更に、（部分文字列数−１）次元多いベクトルｘを利用してもよい。これにより、まとまりの違いによるアクセントの違いがある同音語のアクセントも正しく学習することができる。
識別学習工程Ｓ２０３において、ベクトル表現生成工程Ｓ２０２で作成された学習データのベクトルを用い、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムを用いて学習を行なう。
【００２８】
図５は図１の構成を備える音声言語処理装置における未登録語のアクセント位置を推定する手順を示すフローチャートである。先ず、言語情報設定工程Ｓ５０１において、アクセント位置の推定を必要とされる未登録語の言語情報を設定する。次に、ベクトル表現生成工程Ｓ５０２において、単語の言語情報を先のベクトル表現生成工程Ｓ２０２と同様にしてベクトル表現にする。識別工程Ｓ５０３において、言語情報設定工程Ｓ５０１で設定された言語情報に基づいて先の学習において獲得されたアクセント位置推定規則を選択し、アクセント位置を推定する。
【００２９】
未登録語のアクセント位置の推定の入力として用いられる情報は、未登録語に関する図４の１行の中で右端のアクセント位置以外の情報に相当する。即ち、学習に使われた言語情報と同一であれば、図４に限らず、部分文字列のまとまり方の情報その他様々な言語情報を推定時の入力情報として設定することができる。
学習部１０３、アクセント位置推定規則格納部１０４、推定部１０５、識別学習工程Ｓ２０３、識別工程Ｓ５０３、即ち、学習部１０３とアクセント位置推定規則格納部１０４と推定部１０５における識別学習アルゴリズムおよびその学習結果のアクセント位置推定規則として、ＳＶＭを用いる場合は、その学習方法として下記の参照文献１に示されている方法を用いることができる。未知語の部分文字列の総数および品詞をトリガーとして学習結果のアクセント位置推定規則を選択し、それへの入力として、未知語の部分文字列、場合によっては、部分文字列間のまとまりを表すベクトルをも入力し、アクセント型を決定する。ＳＶＭは２値分類器である。例えば、３モーラの単語は４つのアクセント型を取り得るので、少なくとも３回の識別が必要である。即ち、すべてのアクセント型が０型、１型、２型、３型の４つである場合に、
０型か、それ以外かの識別、
０型以外の内で、１型か、それ以外かの識別、
０型１型以外の内で、２型か、それ以外の３型かの識別の、
少なくとも３回の２値分類を行うことで、アクセント型が決定する。参照文献１の第１２９頁での識別学習例題の識別への入力情報を示すベクトルＸ_iには、この発明のＳ２０２で作成されたベクトルの内の、アクセントの位置を除いた部分が対応する。参照文献１の第１２９頁での識別学習例題において２つの識別先クラスを示すＹ_iにはアクセント型が対応する。前記の各２値分類においては、Ｙ_iは、参照文献１の第１２９頁のｙ＝｛−１，１｝の様に２値の何れかに対応付けられる。
【００３０】
この様なデータを多数用意し、これらから識別境界面を学習する。即ち、これらのデータを使って参照文献１の第１３０頁の識別境界面を求めるための目的関数を示す式５．１７を、この目的関数を解く際の第１の制約条件を示す式５．１８、および第２の制約条件を示す式５．１９の制約の下で最大化するという数理計画問題を解く。その結果、参照文献１の第１３１頁に記載される学習例題の１つ１つのベクトルに対応付けられた係数を要素とする係数ベクトルα₀ が得られる。この成分は学習に用いられた事例のそれぞれに対応する。この内の非零なる係数に対応する事例がサポートベクトルであり、第１２８頁の図の太線の丸印、太線の×印の様に境界面に近接する複数の学習事例のベクトルである。これによって識別境界面が形成される。
【００３１】
また、未知語のアクセント位置の推定においては、未知語の情報を記したＳ５０２で作成されたベクトルを参照文献１の第１３１頁のＸとして前記の少なくとも１回以上の２値分類を、参照文献１の第１３１頁に記載される識別境界を使った識別判定のための関数を意味する式５．２０を使って行うことにより推定する。
この様に、サポートベクトルがこの発明においてアクセント位置推定に必要な情報を代表する。
また、先の３モーラ単語のアクセント位置の推定規則の学習においては、１型以外の分類については、更に、２と２以外、２以外の分類については更に、３と３以外と、順次に境界面を定めることによりアクセント位置推定規則情報を求めることができる。
【００３２】
＜参照文献１＞ Vladimir N.Vapnik 著、The Nature of Ｓtatistical Learning Theory,Ｓpringer(1995),p.129〜p131,5.5 Constructing the Optimal Hyperplane。
学習部１０３、アクセント位置推定規則格納部１０４、推定部１０５、識別学習工程Ｓ２０３、識別工程Ｓ５０３、即ち、学習部１０３とアクセント位置推定規則格納部１０４と推定部１０５における識別学習アルゴリズムおよびその学習結果のアクセント位置推定規則として、最大エントロピー法で推定された確率モデルを用いる場合は、その学習方法として下記の参照文献２に示されている方法を用いることができる。参照文献２の第３６頁のｈにはこの発明のベクトル表現生成工程Ｓ２０２で作成されたベクトルの内のアクセントの位置を示す数値を除いた部分が対応し、ｗには推定対象のアクセント位置を示す数値が対応する。
【００３３】
ｗ型のアクセント型の推定規則はＰ（ｗ｜ｈ）である。ｗは０からＮまでの値をとる。これらの確率値は、単語の品詞と単語の発音を構成する部分文字列の総数ごとにもつ。
未知語のアクセント型を推定する際には、アクセント型推定規則の学習時に設定されたものと同じ情報でＳ５０２で作成されたベクトルをｈに設定する。このｈをトリガーとして、ｗの中で最大の確率値をとるアクセント型を推定結果とする。
＜参照文献２＞ Ronald Rosenfeld、"Adａptive Statistical Language Modeling、p.34〜p.37"、A Maximum Entropy Approａch",Ph.D.thesis,Computer Science Department,Carnegie Mellon University,TRCMU-CS-94-138,April(1994)。
【００３４】
学習部１０３、アクセント位置推定規則格納部１０４、推定部１０５、識別学習工程Ｓ２０３、識別工程Ｓ５０３、即ち学習部１０３とアクセント位置推定規則格納部１０４と推定部１０５における識別学習アルゴリズムおよびその学習結果のアクセント位置推定規則として、バギング、ブースティングに代表される精度増強を目的とした学習事例の選択アルゴリズムに基づいて学習された決定木、決定リスト、ニューラルネット、線形識別器の混合を利用する場合は下記の参照文献３、参照文献４に示される方法を用いることができる。
【００３５】
参照文献３のＸ１、Ｘｍおよび参照文献４のＸｎを単語の部分文字列、場合によっては、それらに部分文字列間のまとまりを表すベクトルを含めたもの、Ｙをアクセント型とした場合の識別器によってアクセント推定規則が表現される。識別器が決定木の場合には、参照文献３のＸ１、Ｘｍおよび参照文献４のＸｎの各成文を参照して条件判断を行い、葉に到ったときにアクセント型が決定される決定木で、ニューラルネットの場合は、参照文献３のＸ１、Ｘｍおよび参照文献４のＸｎを入力して出力層の出力結果としてアクセント型が得られるニューラルネットが、それぞれ規則に対応する。参照文献３のＸ１、Ｘｍおよび参照文献４のＸｎをトリガーとしてＹが推定される。
【００３６】
参照文献３の図１における訓練データの識別における入力データを示すＸ１、Ｘｍには、この発明のベクトル表現生成工程Ｓ２０２で作成されたベクトルの内の、アクセントの位置を除いた部分が対応する。２値分類の点では前記ＳＶＭの場合と同様である。但し、この識別アルゴリズムは用いられる学習器により、２値以上の多値分類を取り扱うことができる。
参照文献４の第１、２頁の第１章のＸｎには、この発明のベクトル表現生成工程Ｓ２０２で作成されたベクトルの内の、アクセントの位置を除いた部分が対応する。２値分類の点では前記ＳＶＭの場合と同様である。但し、この識別アルゴリズムは用いられる学習器により２値以上の多値分類を取り扱うことができる。
【００３７】
＜参照文献３＞ヨアブ・フロイント他著、「ブースティング入門、第１１，１２頁」、人工知能学会誌、９月号(１９９９)。
＜参照文献４＞ Leo Breiman、"Bagging Predictors,"Machine Learning,24,2(1996)、p.123〜140"。
【００３８】
以上の実施例は日本語を対象とした実施例であるが、この発明は日本語以外の他の言語にも適用することができる。図６はこの発明を構成要素数３の英語に適用した場合の実施例の入力および学習データを示す図である。linguisticおよびignoranceの品詞、部分文字列数、発音記号より成る部分文字列である。本来は発音記号で記述すべきところであるが、ここではアスキー文字で代用した。図６においては、英単語の品詞、単語を構成する発音記号列をシラブルに分けることによって作成した部分文字列と、その部分文字列の総数が設定されている。
【００３９】
日本語の３モーラ人名（たろう、はなこ、その他）のアクセント位置推定に関する実験結果を示す。２１５６語のモーラとアクセント位置のペアを学習データとし、これに含まれない５３９語のモーラをテストセットとする実験データを５組用意して、５回のアクセント位置推定の実験を行ない、５回の正解率の平均で評価した。学習に使われたデータの構成は図４と同じである。
その結果、決定木による従来技術の正解率の平均は８８. ２２％であったが、ＳＶＭを用いた実施例の場合の正解率の平均は８９.８０％となり、１.５８ポイント改善された。これにより、識別境界面と識別境界面に近いデータとの距離を最大化する識別学習アルゴリズムを用いることの有効性が確認された。
【００４０】
この発明は、複数の機器からなるシステムに適用しても、１つの機器からなる装置に適用しても実施することができる。上述した実施の形態の機能を実現するソフトウエアのプログラムコードを記録した記憶媒体を、システム或は装置に供給し、このシステム或いは装置の電子計算機（ＣＰＵ、ＭＰＵ）、機能拡張ボード、専用回路が記憶媒体に格納されたプログラムコードを読み出し実行することによっても実施することができる。この場合、記憶媒体から読み出されたプログラムコード自体が上述した実施形態の機能を実現することとなり、そのプログラムコードを記録した記憶媒体によってもこの発明を構成することができる。
【００４１】
プログラムコードを供給する記憶媒体としては例えば、フロッピー（登録商標）ディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＤＶＤ、磁気テープ、不揮発性のメモリカード、或いはＲＯＭを使用することができる。
また、コンピュータが読み出したプログラムコードを実行することにより、上述した実施形態の機能が実現される訳ではなく、そのプログラムコードの内容に基づいて、コンピュータ上で稼動しているＯＳなどが実際の処理の一部或いは全部を行い、その処理によって上述した実施形態の機能を実現することもできる。
【００４２】
【発明の効果】
以上の通りであって、この発明は、１）アクセント型の推定が必要な単語に想定される全てのアクセント型を付与し得る構成を具備することにより、そして、２）理論的により安全な識別器を用いることにより、また、３）発音を構成する部分文字列間のまとまりを表す情報を用いることにより、未登録の単語に対するアクセント位置を正確に推定することできる、という効果を奏す。
更に、この発明は、４）言語に依存した情報を仮定しないことにより、学習に用いたデータと同じ言語であれば言語を問わずにアクセント位置を推定することができる、という効果を奏す。
【図面の簡単な説明】
【図１】実施例を説明するブロック図。
【図２】学習手順を示すフローチャート。
【図３】日本語の学習データの構成を示す図。
【図４】学習時および推定時に利用される日本語のデータの構成を示す図。
【図５】未登録語のアクセント位置を推定する手順を示すフローチャート。
【図６】学習時および推定時に利用される英語のデータの構成を示す図。
【符号の説明】
１０１学習用データ格納部１０２言語情報設定部
１０３学習部１０４アクセント位置推定規則格納部
１０５推定部
[0001]
[Field of Industrial Application]
The present invention relates to an accent position estimation method, apparatus, and program. In particular, it relates to an accent position estimation method, apparatus, and program that acquire rules for estimating the accent position of unregistered words by training with a discriminative learning algorithm that maximizes the distance between the decision boundary and the data closest to it, and that use the learned rules to estimate the accent position of unregistered words.
[0002]
[Prior Art]
A speech synthesizer that synthesizes speech from text retrieves from a dictionary the reading of each word in the input text, together with accent position information indicating which sound should be made prominent, and uses them to generate the synthesized speech. Arbitrary text, however, can contain words that are not registered in the dictionary, that is, unregistered words. Speech synthesis of arbitrary text therefore requires estimating a plausible reading and accent for each unregistered word.
Speech synthesis thus needs both the reading and the accent position. However, whereas a reading is sometimes supplied in the input text, for example in parentheses after the word, an accent position is almost never supplied. This is not limited to Japanese: it holds equally for Chinese, English, French, German, and other languages in which accent marks indicating the prominent sound are generally not written in text. Accent position estimation is therefore important in speech synthesis.
[0003]
In standard Japanese, for example, a word of N morae either has its accent on one of the N morae (N possibilities) or has no accent at all (one possibility), giving (N + 1) possible accent types. Each word has exactly one of these accent types. In English as well, the accent either falls on one of the N syllables or on none of them, for a total of (N + 1) cases, one of which is fixed for each word.
Accordingly, accent position estimation has been approached as a classification problem over the (N + 1) categories of the word.
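The (N + 1)-way framing can be made concrete with a small sketch. The first function below enumerates the accent types of an N-mora word as the labels 0 to N. The second, purely as an illustration, renders each type as a low/high pitch string using the usual textbook description of standard Japanese pitch accent. The function names and the L/H rendering are assumptions of this sketch and are not specified in the patent.

```python
def accent_types(n_morae: int) -> list[int]:
    """Return the N + 1 candidate accent types for an N-mora word.

    Type 0 means no accent (flat type); type k (1 <= k <= N) means the
    accent falls on the k-th mora.
    """
    if n_morae < 1:
        raise ValueError("a word must have at least one mora")
    return list(range(n_morae + 1))


def pitch_pattern(n_morae: int, accent_type: int) -> str:
    """Render an accent type as an L/H string (illustrative assumption).

    Textbook rule: the first mora is low unless it carries the accent,
    pitch stays high up to the accented mora, and drops to low after it.
    """
    if accent_type not in accent_types(n_morae):
        raise ValueError("accent type must be between 0 and N")
    pattern = []
    for position in range(1, n_morae + 1):
        if accent_type == 1:
            high = position == 1
        elif accent_type == 0:
            high = position > 1
        else:
            high = 1 < position <= accent_type
        pattern.append("H" if high else "L")
    return "".join(pattern)


for t in accent_types(3):
    print(t, pitch_pattern(3, t))
```

For a three-mora word this gives four labels, and types 0 and 1 render as LHH and HLL respectively.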
[0004]
One conventional technique defines a score based on statistical distributions and estimates the accent type from the score value for each word (see Non-Patent Document 1).
However, the technique of Non-Patent Document 1 performs only a simple two-way classification of accent type. For three-mora words it distinguishes only between the flat type and type 1 out of the four possibilities, and for four-mora words it distinguishes only between the flat type and type 2 out of the five possibilities. Other types do occur in practice, but the technique has no mechanism for assigning them.
[0005]
Another conventional technique reports the results of manually creating and evaluating accent type estimation rules based on differences in the reading of word endings (see Non-Patent Document 2).
However, like the technique of Non-Patent Document 1, the technique of Non-Patent Document 2 leaves some accent types unhandled and has no mechanism for assigning them.
A further conventional technique uses a neural network or a decision tree as the classifier. These classifiers are obtained by inductive learning from a large number of examples (see Patent Document 1).
However, the neural network and decision tree classifiers used in Patent Document 1 are built solely from statistics such as the frequency of the training examples, so there is no guarantee that examples that are correct but few in number are learned adequately. In other words, there is no guarantee that the learned decision boundary is placed appropriately even within the distribution of the training data, and there always remains a risk that it is inappropriate for unseen data not used in training.
[0006]
Furthermore, in the techniques of Non-Patent Document 2 and Patent Document 1, the input for accent position estimation is limited to the kana string or phoneme string representing the reading of the word. In Japanese, however, the accent of a word differs depending on how the parts of its reading are joined, that is, how they are grouped. For example, the personal names 朝霞 and 阿坂 share the reading "あさか" (asaka), but their readings are grouped as "あさ/か" (asa/ka) and "あ/さか" (a/saka) respectively, with one group on each side of the "/". Because of this difference in grouping, they are pronounced with different pitch patterns, "low-high-high" and "high-low-low"; that is, their accents differ. Because the input of Non-Patent Document 2 and Patent Document 1 is limited to phoneme strings, these techniques cannot assign different accents to such words. Patent Document 1 states that additional information can be set, but does not say what information should be set.
[0007]
[Non-Patent Document 1]
Hirokawa, et al. "Accent Form Estimation Method for Personal Names (Last Names)", Acoustical Society of Japan, May 1981, pp. 425-426
[Non-Patent Document 2]
Tsuchida et al., "Rules for Assigning Accent Types to Japanese Names", Transactions of the Institute of Electronics and Communication Engineers of Japan (電子通信学会論文誌), 1984, No. 5, Vol. J67-D No. 5, pp. 625-626
[Patent Document 1]
JP 2002-73071 A
[0008]
[Problems to Be Solved by the Invention]
The problems are as follows: Non-Patent Documents 1 and 2 do not provide a classification mechanism that considers all accent types; the decision tree and neural network classifiers used in Patent Document 1, and their training, are inadequate; Non-Patent Documents 1 and 2 and Patent Document 1 cannot handle languages other than Japanese; and Non-Patent Document 2 and Patent Document 1 cannot express in their rules the accent differences that arise from differences in how a reading is grouped.
Instead of the neural network and decision tree classifiers of Patent Document 1, this invention uses a classifier, typified by the support vector machine (SVM), that maximizes the distance between the decision boundary and the training data closest to it. It learns rules for estimating accent position from training data consisting of pairs of linguistic information about a word and that word's accent position, and uses those rules to estimate with high accuracy the accent position of new words not contained in the training data. The invention provides an accent position estimation method, apparatus, and program that do so.
[0009]
Whereas Patent Document 1 covers only Japanese, the invention also provides an accent position estimation method, apparatus, and program that can estimate accent positions for English by setting, as the linguistic information, substrings formed by dividing English words into syllables.
It further provides an accent position estimation method, apparatus, and program that estimate accent positions more accurately by using linguistic information that expresses the grouping of a reading as input to classification.
[0010]
[Means for Solving the Problems]
  An accent position estimation apparatus is configured comprising: an accent position estimation rule storage unit 104 that stores accent position estimation rules, each of which takes linguistic information about a word as input and estimates its accent position; a linguistic information setting unit 102 that sets linguistic information about an unregistered word; and an estimation unit 105 that, according to the linguistic information set by the linguistic information setting unit, selects an accent position estimation rule stored in the accent position estimation rule storage unit and estimates the accent position of the unregistered word.
[0011]
  The linguistic information includes all of the following: the part of speech of the word, all the substrings that make up the pronunciation of the word, the total number of substrings, and (number of substrings - 1) items of information indicating whether adjacent substrings are joined.
[0012]
  In the above accent position estimation apparatus, an accent position estimation apparatus is configured that uses, as all the substrings making up the pronunciation of the word, substrings created by dividing the pronunciation of a Japanese word into mora units.
[0013]
  The above accent position estimation apparatus is further configured to comprise a training data storage unit 101 that stores pairs of linguistic information about a word and that word's accent position, wherein the linguistic information setting unit 102 also sets linguistic information about training words, and to comprise a learning unit 103 that learns accent position estimation rules, which take linguistic information about a word as input and estimate its accent position, based on the training data selected from the training data storage unit 101 according to the training linguistic information set by the linguistic information setting unit 102.
[0014]
  An accent position estimation method is configured comprising: a linguistic information setting process that sets linguistic information about an unregistered word; and an estimation process that selects an accent position estimation rule according to the linguistic information set in the linguistic information setting process and estimates the accent position of the unregistered word, wherein the linguistic information includes all of the following: the part of speech of the word, all the substrings that make up the pronunciation of the word, the total number of substrings, and (number of substrings - 1) items of information indicating whether adjacent substrings are joined.
[0015]
  In the above accent position estimation method, an accent position estimation method is configured that uses, as all the substrings making up the pronunciation of the word, substrings created by dividing the pronunciation of a Japanese word into mora units. In addition, an accent position estimation program is configured that causes a computer to function as each unit of the accent position estimation apparatus described above.
[0016]
[Embodiments of the Invention]
The accent position estimation apparatus of this invention comprises: a training data storage unit that stores pairs of linguistic information about a word and that word's accent position; a linguistic information setting unit that sets linguistic information about a word; a learning unit that, based on the training data selected from the training data storage unit according to the information set by the linguistic information setting unit, learns rules that take linguistic information about a word as input and estimate its accent position, using a discriminative learning algorithm that maximizes the distance between the decision boundary and the data closest to it; an accent position estimation rule storage unit that stores the learning results; and an estimation unit that, once the linguistic information setting unit has set linguistic information about an unregistered word, selects an accent position estimation rule stored in the accent position estimation rule storage unit according to that information and estimates the accent position of the unregistered word.
[0017]
As the learning means and the accent position estimation rules, besides an SVM, the apparatus may use a probabilistic model estimated by the maximum entropy method, or a mixture of decision trees, decision lists, neural networks, or linear classifiers trained under a training-example selection algorithm aimed at improving classification accuracy, typified by bagging and boosting.
The accent position estimation method of this invention uses a training data storage unit that stores pairs of linguistic information about a word and that word's accent position, and comprises: a linguistic information setting step that sets linguistic information about a word; a learning step that, based on the training data selected from the training data storage unit according to the linguistic information set in the linguistic information setting step, learns accent position estimation rules, which take linguistic information about a word as input and estimate its accent position, using a discriminative learning algorithm that maximizes the distance between the decision boundary and the data closest to it; and an accent position estimation step in which linguistic information about an unregistered word is set by the linguistic information setting step, an accent position estimation rule is selected according to that information, and the accent position of the unregistered word is estimated.
[0018]
As the learning step and the accent position estimation rules, besides an SVM, the method may use a probabilistic model estimated by the maximum entropy method, or a mixture of decision trees, decision lists, neural networks, or linear classifiers trained under a training-example selection algorithm aimed at improving classification accuracy, typified by bagging and boosting.
[0019]
[Examples]
An embodiment of this invention is described with reference to the example shown in the drawings. FIG. 1 is a block diagram showing an example of the accent position estimation apparatus of this invention.
To estimate the accent position of an unregistered word, this example of the accent position estimation apparatus learns accent position estimation rules from training data consisting of pairs of linguistic information about a word and the word's accent position, and uses the learned rules to estimate the accent position of the unregistered word. It is used, for example, in a speech synthesizer that converts arbitrary text into speech. The details follow.
[0020]
In FIG. 1, 101 is the training data storage unit, which stores pairs of linguistic information about the words from which accent position estimation rules are learned and the accent positions of those words. 102 is the linguistic information setting unit, which sets linguistic information about a word both during learning and during accent position estimation. 103 is the learning unit, which uses the training data read from the training data storage unit 101 and the linguistic information set by the linguistic information setting unit 102 to learn, as the accent position estimation rules, one or more classifiers that identify the accent position, based on a discriminative learning algorithm that maximizes the distance between the decision boundary and the data closest to it. 104 is the accent position estimation rule storage unit, which stores the results of learning by the learning unit 103. 105 is the estimation unit, which, when estimating an accent position, selects a rule stored in the accent position estimation rule storage unit 104 based on the information set for the unregistered word by the linguistic information setting unit 102, and estimates the accent position.
[0021]
FIG. 2 is a flowchart showing the procedure for learning accent position estimation rules for unregistered words in a spoken language processing apparatus having the configuration of FIG. 1. First, in the linguistic information setting step S201, the linguistic information of the words is set. Next, in the vector representation generation step S202, the data corresponding to the linguistic information set in step S201 is extracted from the training data storage unit 101 and converted to a vector representation. That is, a vector is created that has one component for each possible item of linguistic information, with each component set to 1 or 0 according to whether that item is present. A numerical value indicating the accent position is then set as the final component of the vector. Finally, in the discriminative learning step S203, the vectors created in step S202 are learned, for each setting of the linguistic information, by a discriminative learning algorithm that maximizes the distance between the decision boundary and the data closest to it, producing the accent position estimation rules.
[0022]
FIG. 3 shows an example of the Japanese training data stored in the training data storage unit. Each row holds the data for one word. The linguistic information set for each word consists of the phoneme of each mora, where the morae are the substrings obtained by dividing the word's reading into mora units; the part of speech and mora count of the word; and numbers indicating whether adjacent substrings are joined, that is, whether they form a group. For タツロウ (Tatsurō) in the third row, these numbers are "1", "0", "1" from the left. They represent the grouping that exists when タツロウ is written, for example, 達郎, namely "たつ" (tatsu) and "ろう" (rō): "1" because "た" and "つ" belong to the same group, "0" because "つ" and "ろ" do not, and "1" because "ろ" and "う" do. The rightmost field stores the accent position of the word, that is, the position of the substring that carries the accent. In the アイコ (Aiko) example, it indicates that the accent falls on the first substring, "ア".
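One row of such training data could be modelled as follows. This is a minimal sketch: the class name `TrainingRecord`, its field names, and the "proper noun" label are hypothetical choices made here, and the exact column layout of FIG. 3 is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRecord:
    """One row of training data: linguistic information plus accent position."""

    morae: tuple[str, ...]  # substrings of the reading, one per mora
    pos: str                # part of speech
    joins: tuple[int, ...]  # 1 if two adjacent morae form a group, else 0
    accent: int             # 1-based position of the accented mora, 0 for none

    def __post_init__(self) -> None:
        if len(self.joins) != len(self.morae) - 1:
            raise ValueError("need exactly (number of morae - 1) join flags")
        if not 0 <= self.accent <= len(self.morae):
            raise ValueError("accent position must be between 0 and N")

    @property
    def mora_count(self) -> int:
        return len(self.morae)

    def groups(self) -> list[str]:
        """Rebuild the groups of the reading from the join flags."""
        chunks = [self.morae[0]]
        for mora, joined in zip(self.morae[1:], self.joins):
            if joined:
                chunks[-1] += mora
            else:
                chunks.append(mora)
        return chunks


# The join flags 1, 0, 1 come from the text. The accent value is a
# placeholder, because the text does not state the accent of this word.
tatsuro = TrainingRecord(("タ", "ツ", "ロ", "ウ"), "proper noun", (1, 0, 1), accent=0)
print(tatsuro.groups())
```

Validating the number of join flags in the constructor keeps the (number of substrings - 1) invariant from drifting when rows are built by hand.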
[0023]
From this data, the attribute data actually used for learning is extracted for each group of words for which an accent position estimation rule is to be created. Here, a group of words means a combination of the number of pronunciation units, that is, the number of substrings, and the part of speech. For example, when three-mora proper nouns are treated as a group and an accent position estimation rule is created for them, the data is as shown in FIG. 4. FIG. 4 shows data for Japanese words with three components, and illustrates a case in which the input data is reduced to only the part of speech, the substrings, and their total number for use in learning. The pronunciation type at each substring position is used in association with the accent position.
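The grouping described here amounts to bucketing rows by the pair (number of substrings, part of speech). A minimal sketch follows, assuming a hypothetical row layout of (morae, part of speech, accent position). Only the accent value for アイコ comes from the text; the other row is placeholder data.

```python
from collections import defaultdict

# Hypothetical row layout: (morae, part of speech, accent position).
Row = tuple[tuple[str, ...], str, int]


def group_rows(rows: list[Row]) -> dict[tuple[int, str], list[tuple[tuple[str, ...], int]]]:
    """Bucket training rows by (number of substrings, part of speech)."""
    groups = defaultdict(list)
    for morae, pos, accent in rows:
        groups[(len(morae), pos)].append((morae, accent))
    return dict(groups)


rows: list[Row] = [
    (("ア", "イ", "コ"), "proper noun", 1),        # accent on the first mora, per the text
    (("タ", "ツ", "ロ", "ウ"), "proper noun", 0),  # placeholder accent value
]
for key, members in group_rows(rows).items():
    print(key, members)
```

One rule set is then trained per key, so a three-mora proper noun and a four-mora proper noun never share a classifier.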
[0024]
In some cases, information on the presence or absence of grouping between adjacent partial character strings, of which there are (number of partial character strings − 1) items, is also associated. In this case, this information is expressed as a vector of (number of partial character strings − 1) dimensions, in which each element is 1 if the corresponding pair of adjacent partial character strings is grouped and 0 if not.
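As a rough sketch of how such a grouping vector might be derived, the snippet below takes a word already split into groups of morae and emits one 0/1 flag per pair of adjacent morae. The function name `grouping_flags` and the nested-list input format are assumptions made for illustration; the four morae "ta", "tsu", "ro", "u" with flags 1, 0, 1 follow the example given for the learning data.

```python
def grouping_flags(groups):
    """Flatten groups of morae and flag adjacent pairs.

    groups: list of groups, each a list of morae,
            e.g. [["ta", "tsu"], ["ro", "u"]].
    Returns (morae, flags), where flags[k] is 1 if morae[k] and
    morae[k + 1] belong to the same group and 0 otherwise, so
    len(flags) == len(morae) - 1.
    """
    morae, flags = [], []
    for group in groups:
        for k, mora in enumerate(group):
            if morae:  # every mora except the very first closes one pair
                flags.append(1 if k > 0 else 0)
            morae.append(mora)
    return morae, flags

morae, flags = grouping_flags([["ta", "tsu"], ["ro", "u"]])
```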
In the language information setting step S201, the part of speech and the number of partial character strings are set so that they can be extracted.
[0025]
In the vector expression generation step S202, for this learning data, a vector is created that indicates, for each of the three mora positions, which of all the mora types (a, i, u, ..., kya, kyu, kyo, ...) is present; its dimension is (number of partial character strings, here 3) × (number of types of all morae). For the first case in FIG. 4 ("Aiko"), step S202 of FIG. 2 creates a vector in which the component corresponding to "a" in the first mora is 1, the component corresponding to "i" in the second mora is 1, the component corresponding to "ko" in the third mora is 1, and all other components are 0. That is, the generated vector has first mora "a" = 1, first mora "i" = 0, ..., first mora "n" = 0, second mora "a" = 0, second mora "i" = 1, second mora "u" = 0, ..., second mora "n" = 0, third mora "a" = 0, ..., third mora "ke" = 0, third mora "ko" = 1, third mora "sa" = 0, ..., third mora "n" = 0.
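The position-by-position 0/1 encoding described above can be sketched as follows. The mora inventory `MORA_TYPES` is a deliberately truncated, hypothetical list (a real inventory would cover every mora type), and `encode_morae` is an illustrative name, not one used in the patent.

```python
# Hypothetical, truncated mora inventory for illustration only.
MORA_TYPES = ["a", "i", "u", "e", "o", "ka", "ki", "ku", "ke", "ko", "sa", "n"]

def encode_morae(morae, mora_types=MORA_TYPES):
    """Return a 0/1 vector of length len(morae) * len(mora_types).

    The block for position p occupies indices
    [p * len(mora_types), (p + 1) * len(mora_types)), and exactly one
    component per block is set to 1.
    """
    index = {mora: k for k, mora in enumerate(mora_types)}
    vector = [0] * (len(morae) * len(mora_types))
    for position, mora in enumerate(morae):
        vector[position * len(mora_types) + index[mora]] = 1
    return vector

# The three-mora example "a", "i", "ko".
example_vector = encode_morae(["a", "i", "ko"])
```

With this 12-type inventory the vector has 3 × 12 = 36 components, three of which are 1. For learning, the accent position would be attached as a label alongside this vector.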
[0026]
Such a vector and a number indicating where the accent is given, that is, the numerical value at the right end of FIG. 4, are combined to obtain learning data in the identification learning step S203.
As described above, an accent position estimation rule can be learned for each combination of the number of mora and the part of speech based on a large number of created vector data.
The example of FIG. 4 has only three constituent elements, but words with any number of elements can be stored in the learning data, as shown in FIG. 3. For words with 4 or more elements, the number of partial character string fields is increased beyond 3 and the number of partial character strings is set to the actual total; for words with fewer than 3, the number of fields is reduced and the number of partial character strings is likewise set to the actual total.
[0027]
The numerical information representing the grouping of partial character strings in FIG. 3 can be used in addition to the information in FIG. 4. In other words, the information on the presence or absence of grouping between adjacent partial character strings, of which there are (number of partial character strings − 1) items, may also be used, with the vector x given (number of partial character strings − 1) additional dimensions. This makes it possible to correctly learn the accents of homophones whose accents differ because of differences in grouping.
In the identification learning step S203, learning is performed using a learning data vector created in the vector expression generation step S202, using an identification learning algorithm that maximizes the distance between the identification boundary surface and data close to the identification boundary surface.
[0028]
FIG. 5 is a flowchart showing the procedure for estimating the accent position of an unregistered word in the spoken language processing apparatus configured as in FIG. 1. First, in the language information setting step S501, the language information of the unregistered word whose accent position is to be estimated is set. Next, in the vector expression generation step S502, the language information of the word is converted into a vector expression in the same manner as in the vector expression generation step S202. In the identification step S503, the accent position estimation rule acquired in the earlier learning is selected based on the language information set in step S501, and the accent position is estimated.
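The three estimation steps can be pictured as a lookup followed by a function call. The sketch below assumes the learned rules are kept in a dictionary keyed by (part of speech, number of morae) and that each rule is a callable; both choices, and the names `estimate_accent_position`, `rules`, and `encode`, are illustrative assumptions rather than details from the patent.

```python
def estimate_accent_position(part_of_speech, morae, rules, encode):
    """Estimate the accent position of an unregistered word.

    rules:  dict mapping (part of speech, number of morae) to a callable
            that takes a feature vector and returns an accent position.
    encode: callable that turns the list of morae into a feature vector,
            using the same encoding as at learning time.
    """
    key = (part_of_speech, len(morae))  # S501: set the language information
    if key not in rules:
        raise KeyError(f"no accent position estimation rule for {key}")
    features = encode(morae)            # S502: vector expression
    return rules[key](features)         # S503: apply the selected rule

# Usage with stand-in components (a constant rule and a toy encoder).
toy_rules = {("proper noun", 3): lambda features: 1}
toy_encode = lambda morae: [len(m) for m in morae]
estimated = estimate_accent_position("proper noun", ["a", "i", "ko"], toy_rules, toy_encode)
```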
[0029]
The information used as input for estimating the accent position of an unregistered word corresponds to the information in one line of FIG. 4 other than the accent position at the right end. That is, as long as it is the same as the language information used for learning, not only the information of FIG. 4 but also information on the grouping of partial character strings and various other language information can be set as input information at the time of estimation.
When an SVM is used as the identification learning algorithm and as the learned accent position estimation rule in the learning unit 103, the accent position estimation rule storage unit 104, the estimation unit 105, the identification learning step S203, and the identification step S503, the method shown in Reference Document 1 below can be used as the learning method. The learned accent position estimation rule is selected using the total number of partial character strings of the unknown word and its part of speech as the key, and the vector representing the partial character strings of the unknown word and, in some cases, the grouping of the partial character strings is input to it to determine the accent type. An SVM is a binary classifier. A 3-mora word, for example, can take 4 accent types, so at least 3 identifications are needed. That is, when the accent types are type 0, type 1, type 2, and type 3:
identification of type 0 versus other than type 0,
among those other than type 0, identification of type 1 versus other than type 1,
among those other than type 0 and type 1, identification of type 2 versus type 3.
The accent type is determined by performing binary classification at least three times in this way. The vector x_i, which represents the input information of a training example in the identification learning on page 129 of Reference Document 1, corresponds to the portion of the vector created in S202 of the present invention excluding the accent position. The accent type corresponds to y_i, which indicates the two classes to be identified in the same example on page 129 of Reference Document 1. In each of the binary classifications described above, y_i is associated with one of the two values y ∈ {−1, 1} given on page 129 of Reference Document 1.
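The chain of binary identifications can be sketched as a cascade in which stage k asks whether the word is of type k and only sees words rejected by the earlier stages. The helper names `cascade_labels` and `cascade_predict` are hypothetical, and the stage classifiers are stand-in callables returning a positive or negative score; a trained SVM would take their place.

```python
def cascade_labels(accent_type, num_types):
    """Labels (+1, -1, or None) that one word contributes to each stage.

    Stage k separates type k (+1) from the remaining types (-1).
    None means the word was already accepted by an earlier stage and is
    not used as a training example at this stage.
    """
    labels = []
    for stage in range(num_types - 1):
        if accent_type < stage:
            labels.append(None)
        elif accent_type == stage:
            labels.append(+1)
        else:
            labels.append(-1)
    return labels

def cascade_predict(stage_classifiers, x):
    """Return the first stage whose classifier scores x as positive.

    If every stage rejects x, the last remaining type is returned.
    """
    for stage, classify in enumerate(stage_classifiers):
        if classify(x) > 0:
            return stage
    return len(stage_classifiers)

# A 3-mora word has 4 accent types, hence 3 stages.
labels_for_type_2 = cascade_labels(2, 4)
```

For a type 2 word this yields the labels -1, -1, +1 for the three stages, matching the order of identifications listed above.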
[0030]
A large amount of such data is prepared, and the identification boundary surface is learned from it. That is, using these data, the mathematical programming problem is solved of maximizing Equation 5.17 on page 130 of Reference Document 1, the objective function for obtaining the identification boundary surface, under the first constraint of Equation 5.18 and the second constraint of Equation 5.19. As a result, the coefficient vector α₀ described on page 131 of Reference Document 1 is obtained, whose elements are the coefficients associated with the individual training vectors. Each component corresponds to one of the cases used for learning. The cases with non-zero coefficients are the support vectors: the training cases that lie close to the boundary surface, shown as thick circles and bold crosses in the figure on page 128. These determine the identification boundary surface.
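For orientation, the optimization problem referred to here has the following textbook form for a maximum-margin hyperplane on separable data. This is the standard dual written from general knowledge of SVMs; that it matches the cited equations term by term, and which of the two constraints carries which equation number, is an assumption. The symbol ℓ (number of training examples) is introduced here for illustration.

```latex
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{\ell} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
    \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j) \\
\text{subject to}\quad & \alpha_i \ge 0, \qquad i = 1, \dots, \ell, \\
& \sum_{i=1}^{\ell} \alpha_i y_i = 0 .
\end{aligned}
```

Training examples whose optimal coefficient is non-zero are the support vectors mentioned in the text.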
[0031]
Further, to estimate the accent position of an unknown word, the vector generated in S502, which describes the information of the unknown word, is set as x on page 131 of Reference Document 1, and binary classification is performed at least once. The estimate is obtained using Equation 5.20 on page 131 of Reference Document 1, which is the decision function based on the identification boundary.
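A decision function of this kind can be sketched as a weighted sum of inner products with the support vectors. The code below uses the standard linear form sign(Σ αᵢ yᵢ (xᵢ · x) − b); the function names, the argument layout, and the sign convention for the threshold `b0` are assumptions for illustration and are not taken from the cited equation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def svm_decide(x, support_vectors, labels, alphas, b0):
    """Return +1 or -1 for input vector x.

    support_vectors: training vectors with non-zero coefficients
    labels:          their class labels, each +1 or -1
    alphas:          their learned coefficients
    b0:              threshold of the separating hyperplane
    """
    score = sum(a * y * dot(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) - b0
    return 1 if score > 0 else -1

# Toy usage with two hand-picked support vectors.
toy_svs = [[1, 0], [0, 1]]
decision = svm_decide([1, 0], toy_svs, labels=[1, -1], alphas=[1.0, 1.0], b0=0.0)
```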
As described above, the support vector represents information necessary for accent position estimation in the present invention.
In addition, in learning the rules for estimating the accent position of the 3-mora words discussed above, the words other than type 1 are further separated into type 2 and other than type 2, and the words other than type 2 are further separated into type 3 and other than type 3; by defining the identification boundary surfaces in this order, the information of the accent position estimation rule is obtained.
[0032]
<Reference Document 1> Vladimir N. Vapnik, The Nature of Statistical Learning Theory, Springer (1995), pp. 129-131, 5.5 Constructing the Optimal Hyperplane.
When a probability model estimated by the maximum entropy method is used as the identification learning algorithm and as the learned accent position estimation rule in the learning unit 103, the accent position estimation rule storage unit 104, the estimation unit 105, the identification learning step S203, and the identification step S503, the method shown in Reference Document 2 below can be used as the learning method. The portion of the vector created in the vector expression generation step S202 of the present invention excluding the numerical value indicating the accent position corresponds to h on page 36 of Reference Document 2, and the numerical value indicating the accent position to be estimated corresponds to w.
[0033]
The estimation rule for accent type w is the probability P(w | h), where w takes a value from 0 to N. These probability values are provided for each combination of the part of speech of the word and the total number of partial character strings constituting its pronunciation.
When estimating the accent type of an unknown word, the vector created in S502 from the same kinds of information that were set when the accent type estimation rule was learned is used as h. With h as the key, the accent type w with the maximum probability value is taken as the estimation result.
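Choosing the accent type with the largest P(w | h) can be sketched with a log-linear (softmax) model, the usual parametric form of a maximum entropy classifier. The per-type weight vectors and the function names below are hypothetical; how the weights would be trained is not shown.

```python
import math

def maxent_probabilities(h, weights):
    """Return [P(w | h) for w = 0..N] under a log-linear model.

    h:       0/1 feature vector for the word
    weights: weights[w] is a (hypothetical) weight vector for accent type w
    """
    scores = [sum(wi * hi for wi, hi in zip(w_vec, h)) for w_vec in weights]
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_accent_type(h, weights):
    """Return the accent type w that maximizes P(w | h)."""
    probs = maxent_probabilities(h, weights)
    return max(range(len(probs)), key=probs.__getitem__)

# Toy usage: three accent types, two features.
toy_weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
best_type = estimate_accent_type([1, 0], toy_weights)
```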
<Reference Document 2> Ronald Rosenfeld, "Adaptive Statistical Language Modeling: A Maximum Entropy Approach", Ph.D. thesis, Computer Science Department, Carnegie Mellon University, TR CMU-CS-94-138, April (1994), pp. 34-37.
[0034]
When a mixture of decision trees, decision lists, neural networks, or linear classifiers, learned with a training case selection algorithm aimed at improving accuracy, as typified by bagging and boosting, is used as the identification learning algorithm and as the learned accent position estimation rule in the learning unit 103, the accent position estimation rule storage unit 104, the estimation unit 105, the identification learning step S203, and the identification step S503, the methods shown in Reference Document 3 and Reference Document 4 below can be used.
[0035]
X1 and Xm of Reference Document 3 and Xn of Reference Document 4 correspond to the partial character strings of a word, in some cases together with the vector representing the grouping between partial character strings, and the accent estimation rule is expressed by the classifier obtained when Y is the accent type. When the classifier is a decision tree, decisions are made by referring to each of X1 and Xm of Reference Document 3 and Xn of Reference Document 4, and the accent type is determined when a leaf is reached. When it is a neural network, each network that receives X1 and Xm of Reference Document 3 and Xn of Reference Document 4 as input and yields the accent type as the output of its output layer corresponds to a rule. Y is estimated using X1 and Xm of Reference Document 3 and Xn of Reference Document 4 as the key.
[0036]
X1 and Xm, which denote the input data of the training data in FIG. 1 of Reference Document 3, correspond to the portion of the vector created in the vector expression generation step S202 of the present invention excluding the accent position. Binary classification is handled in the same way as with the SVM. Depending on the learner used, however, this identification algorithm can also handle multi-valued classification with more than two classes.
Xn in Chapter 1 on pages 1 and 2 of Reference Document 4 likewise corresponds to the portion of the vector created in the vector expression generation step S202 of the present invention excluding the accent position. Binary classification is handled in the same way as with the SVM, and, depending on the learner used, multi-valued classification with more than two classes can also be handled.
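To make the bagging case concrete, the sketch below trains several classifiers on bootstrap replicates of the training set and combines them by majority vote. It is a generic illustration of bagging, not the procedure of the cited references: the base learner `train_majority` is a trivial, hypothetical stand-in for a decision tree or neural network, and all names are assumptions.

```python
import random
from collections import Counter

def bagging_fit(examples, train, rounds, seed=0):
    """Train `rounds` models, each on a bootstrap replicate of `examples`.

    examples: list of (feature_vector, accent_type) pairs
    train:    callable mapping a list of examples to a classifier
    """
    rng = random.Random(seed)
    models = []
    for _ in range(rounds):
        # Sample with replacement, same size as the original set.
        replicate = [rng.choice(examples) for _ in examples]
        models.append(train(replicate))
    return models

def bagging_predict(models, x):
    """Return the accent type predicted by the most models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def train_majority(sample):
    """Hypothetical stand-in learner: always predicts the most frequent label."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

toy_examples = [([1, 0], 1)] * 5
toy_models = bagging_fit(toy_examples, train_majority, rounds=3)
```

Because each model outputs an accent type directly, the vote handles more than two classes without a cascade, which is the multi-valued case the text mentions.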
[0037]
<Reference Document 3> Yoav Freund et al., "Introduction to Boosting", Journal of the Japanese Society for Artificial Intelligence, September issue (1999), pp. 11-12.
<Reference Document 4> Leo Breiman, "Bagging Predictors," Machine Learning, 24, 2 (1996), p. 123-140.
[0038]
Although the above embodiments are directed to Japanese, the present invention can also be applied to languages other than Japanese. FIG. 6 shows the input and learning data of an embodiment in which the present invention is applied to English words with three constituent elements. The data consists of the part of speech as language information, the number of partial character strings, and the partial character strings made up of phonetic symbols. These should properly be written with phonetic symbols, but ASCII characters are substituted here. In FIG. 6, the part of speech of the English word, the partial character strings created by dividing the phonetic symbol string of the word into syllables, and the total number of partial character strings are set.
[0039]
Experimental results for accent position estimation of Japanese 3-mora personal names (Taro, Hanako, etc.) are shown below. Five sets of experimental data were prepared, each using the mora and accent position pairs of 2,156 words as learning data and 539 words not included in the learning data as the test set. Performance was evaluated by the average accuracy over the five sets. The structure of the data used for learning is the same as in FIG. 4.
As a result, the average accuracy of the prior art based on decision trees was 88.22%, whereas the average accuracy of the example using the SVM was 89.80%, an improvement of 1.58 points. This confirmed the effectiveness of using an identification learning algorithm that maximizes the distance between the identification boundary surface and the data close to it.
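The set sizes are consistent with a five-way split of 2,156 + 539 = 2,695 words, since 5 × 539 = 2,695. Whether the five sets were actually built as a five-fold cross-validation is an assumption; under that assumption, the split and the averaging can be sketched as below, with hypothetical function names.

```python
def five_fold_splits(n_items, n_folds=5):
    """Yield (train_indices, test_indices) for contiguous, equal-sized folds."""
    fold_size = n_items // n_folds
    for k in range(n_folds):
        test = range(k * fold_size, (k + 1) * fold_size)
        train = [i for i in range(n_items) if i not in test]
        yield train, list(test)

def average_accuracy(per_set_accuracies):
    """Mean of the accuracies measured on the individual test sets."""
    return sum(per_set_accuracies) / len(per_set_accuracies)

splits = list(five_fold_splits(2156 + 539))

# Difference between the two reported averages, in percentage points.
improvement = round(89.80 - 88.22, 2)
```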
[0040]
The present invention can be applied either to a system composed of a plurality of devices or to an apparatus consisting of a single device. It can also be implemented by supplying a storage medium that stores software program code realizing the functions of the above-described embodiments to a system or apparatus, and having an electronic computer (CPU, MPU), a function expansion board, or a dedicated circuit of that system or apparatus read and execute the program code stored in the storage medium. In this case, the program code read from the storage medium itself realizes the functions of the above-described embodiments, and the storage medium on which the program code is recorded constitutes the present invention.
[0041]
For example, a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD, magnetic tape, nonvolatile memory card, or ROM can be used as the storage medium for supplying the program code.
The functions of the above-described embodiments are realized not only when the computer executes the program code it has read, but also when the OS running on the computer performs part or all of the actual processing based on the instructions of the program code.
[0042]
[Effects of the invention]
As described above, the present invention 1) has a configuration capable of assigning every possible accent type to a word whose accent type needs to be estimated, 2) performs theoretically safer identification, and 3) uses information representing the grouping between the partial character strings constituting the pronunciation, and can therefore accurately estimate the accent position of an unregistered word.
Further, because it does not rely on language-dependent information, the present invention has the effect that 4) the accent position can be estimated regardless of the language, as long as the language is the same as that of the data used for learning.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an embodiment.
FIG. 2 is a flowchart showing a learning procedure.
FIG. 3 is a diagram showing a configuration of Japanese learning data.
FIG. 4 is a diagram showing the structure of Japanese data used during learning and estimation.
FIG. 5 is a flowchart showing a procedure for estimating an accent position of an unregistered word.
FIG. 6 is a diagram showing the structure of English data used during learning and estimation.
[Explanation of symbols]
101 Learning data storage unit
102 Language information setting unit
103 Learning unit
104 Accent position estimation rule storage unit
105 Estimation unit

Claims

An accent position estimation apparatus comprising:
an accent position estimation rule storage unit that stores accent position estimation rules for estimating an accent position with language information about a word as input;
a language information setting unit that sets language information about an unregistered word; and
an estimation unit that selects an accent position estimation rule stored in the accent position estimation rule storage unit according to the language information set in the language information setting unit, and estimates the accent position of the unregistered word,
wherein the language information includes all of: the part of speech of the word, all the partial character strings constituting the pronunciation of the word, the total number of the partial character strings, and (the number of partial character strings − 1) items of information on the presence or absence of coupling between the partial character strings.

The accent position estimation apparatus according to claim 1,
wherein partial character strings created by dividing the pronunciation of a Japanese word into mora units are used as all the partial character strings constituting the pronunciation of the word.

The accent position estimation apparatus according to claim 1 or 2, further comprising:
a learning data storage unit that stores pairs of language information about a word and the accent position of the word; and
a learning unit that, based on learning data selected from the learning data storage unit according to language information for learning set by the language information setting unit, learns accent position estimation rules for estimating an accent position with language information about a word as input,
wherein the language information setting unit also sets the language information about words for learning.

An accent position estimation method comprising:
a language information setting process of setting language information about an unregistered word; and
a process of selecting an accent position estimation rule according to the language information set in the language information setting process, and estimating the accent position of the unregistered word,
wherein the language information includes all of: the part of speech of the word, all the partial character strings constituting the pronunciation of the word, the total number of the partial character strings, and (the number of partial character strings − 1) items of information on the presence or absence of coupling between the partial character strings.

The accent position estimation method according to claim 4,
wherein partial character strings created by dividing the pronunciation of a Japanese word into mora units are used as all the partial character strings constituting the pronunciation of the word.

An accent position estimation program for causing a computer to function as each unit of the accent position estimation apparatus according to any one of claims 1 to 3.