JP3780341B2

JP3780341B2 - Language analysis processing system and sentence conversion processing system

Info

Publication number: JP3780341B2
Application number: JP2002337747A
Authority: JP
Inventors: 真樹村田; 均井佐原
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2002-11-21
Filing date: 2002-11-21
Publication date: 2006-05-31
Anticipated expiration: 2022-11-21
Also published as: JP2004171354A

Description

【０００１】
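【Technical Field of the Invention】
The present invention relates to natural language processing technology implemented on a computer. More specifically, it relates to a language analysis processing method that applies machine learning to electronically stored sentences, and to a processing system that implements this method.
【０００２】
In particular, the present invention can be applied to a very wide range of language processing problems that involve generating words or phrases, such as ellipsis resolution, sentence generation, machine translation, character recognition, and speech recognition.
【０００３】
【Prior Art】
In language analysis, semantic analysis, the stage that follows morphological and syntactic analysis, is becoming increasingly important. This is especially true of case analysis and ellipsis analysis, which are central parts of semantic analysis. For these tasks, there is demand both for less labor and for higher accuracy.
【０００４】
Case analysis restores a surface case that is hidden because part of the sentence has been topicalized or turned into an adnominal (relative-clause) modifier. For example, in 「りんごは食べた。」 ("(I) ate the apple", with 「りんご」 "apple" as the topic), 「りんごは」 is topicalized. Restored to its surface case, it is 「りんごを」. Case analysis therefore analyzes the 「は」 in 「りんごは」 as the ヲ-case (accusative). Likewise, in 「昨日買った本はもう読んだ。」 ("(I) have already read the book (I) bought yesterday"), 「買った本」 ("the book (I) bought") is an adnominal construction. Restored to its surface case, it is 「本を買った」 ("bought the book"). Here the adnominal relation in 「買った本」 is analyzed as the ヲ-case.
【０００５】
Ellipsis analysis restores a surface case that has been omitted from part of a sentence. For example, in 「みかんを買いました。そして食べました。」 ("(I) bought a tangerine. And (I) ate (it)."), the omitted noun phrase (zero pronoun) in 「そして食べました」 is analyzed as 「みかんを」.
【０００６】
The present inventors proposed performing such language analysis with machine learning (see Non-Patent Document 1). The goal was to obtain high accuracy on a computer while reducing the labor of whoever performs the analysis.
【０００７】
The machine-learning-based language analysis technique presented in Non-Patent Document 1 (the non-borrowing-type machine learning method) has the following advantages:
(i) A corpus with more supervised data can be expected to yield even higher accuracy.
(ii) When a better machine learning method is developed, using it can be expected to yield even higher accuracy.
【０００８】
Non-Patent Document 1 also presented a language analysis method using a borrowing-type machine learning method. This method uses supervision signals generated from data to which the information targeted by the analysis has not been attached (hereinafter, "unsupervised data"). No data with target information (solution information) attached in advance, for example by hand as in a case frame dictionary, is required. Large amounts of ordinary electronic text can therefore serve as unsupervised training data. The large number of supervision signals improves learning accuracy, which yields highly accurate language analysis.
【０００９】
Non-Patent Document 1 further presented a language analysis method using a combined-type machine learning method. This method learns from two sources of supervision signals:
- the signals used in ordinary machine learning, namely data to which the information targeted by the analysis has been attached (hereinafter, "supervised data");
- signals generated from unsupervised data.
The combined-type method exploits the strengths of both sources. Unsupervised data is easy to obtain and supplies many supervision signals, while supervised data secures the usual learning accuracy.
【００１０】
Another important problem in natural language processing is converting passive and causative sentences into active sentences. This conversion is useful in many research areas, including sentence generation, paraphrasing, sentence simplification and language-use support, knowledge acquisition and information extraction from natural language text, and question answering. Consider a question answering system in which the question is written as an active sentence and the sentence containing the answer is written as a passive sentence. The difference in sentence structure can make the answer hard to extract. Converting passive and causative sentences into active sentences solves such problems.
【００１１】
When a Japanese passive or causative sentence is converted into an active sentence, the case particles to be used after conversion must be estimated.
- Passive example: converting 「犬に私が噛まれた。」 ("I was bitten by a dog.") into 「犬が私を噛んだ。」 ("A dog bit me."). The 「に」 in 「犬に」 must become 「が」, and the 「が」 in 「私が」 must become 「を」.
- Causative example: converting 「彼が彼女に髪を切らせた。」 ("He had her cut her hair.") into 「彼女が髪を切った。」 ("She cut her hair."). The 「に」 in 「彼女に」 must become 「が」, and the 「を」 in 「髪を」 stays unchanged.
This particle conversion is hard to automate, because the resulting particle depends on the verb and on how the verb is used.
【００１２】
Several conventional methods address case particle conversion, for example those of Non-Patent Documents 2 to 4 below. They handle the problem with a case frame dictionary that describes how each case particle should be converted.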
【００１３】
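【Non-Patent Document 1】
Masaki Murata,
"Japanese Case Analysis Using Machine Learning Methods: Borrowing-Type and Non-Borrowing-Type Supervision Signals, and a Combined Type" (in Japanese),
IEICE Technical Report NLC-2001-24, The Institute of Electronics, Information and Communication Engineers,
July 17, 2001
【Non-Patent Document 2】
Information-technology Promotion Agency, Japan (IPA), Technical Center,
Computational Japanese Basic Verb Dictionary IPAL (Basic Verbs): Manual (in Japanese),
1987
【Non-Patent Document 3】
Sadao Kurohashi and Makoto Nagao,
A Method of Case Structure Analysis for Japanese Sentences based on Examples in Case Frame Dictionary,
IEICE Transactions on Information and Systems, Vol.E77-D, No.2, 1994
【Non-Patent Document 4】
Keiko Kondo, Satoshi Sato, and Manabu Okumura,
"Paraphrasing of Simple Sentences by Case Conversion" (in Japanese),
IPSJ Journal, Vol.42, No.3,
2001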
【００１４】
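【Problems to Be Solved by the Invention】
Non-Patent Document 1 improves processing accuracy by applying machine learning to language analysis. The borrowing-type and combined-type methods are also highly effective because they increase the supervision signals for machine learning without increasing manual labor.
【００１５】
Machine learning learns so as to maximize accuracy on the given training data. Unsupervised data, however, differs in nature from supervised data: it lacks the information targeted by the analysis.
【００１６】
The combined-type method of Non-Patent Document 1 simply adds unsupervised data to supervised data. The resulting learner therefore maximizes accuracy on the union of the two. Depending on how the two kinds of data relate, its learning accuracy can be lower than that of a learner that maximizes accuracy on the supervised data alone.
【００１７】
In view of this problem, a technique is needed that exploits the advantages of both supervised and unsupervised data to achieve highly accurate learning more reliably.
【００１８】
Moreover, the conventional techniques of Non-Patent Documents 2 to 4 for converting passive and causative sentences into active sentences need a specific kind of case frame dictionary. It must describe, for every verb and every usage of that verb, how the case particles should be converted.
【００１９】
Preparing a dictionary that covers every verb and every usage is practically impossible, so conversion based on such a case frame dictionary is insufficient. Sentences containing verbs or verb usages missing from the dictionary either cannot be converted or are likely to be converted incorrectly.
【００２０】
A technique is therefore needed that converts passive and causative sentences into active sentences with high accuracy and without increasing manual labor.
【００２１】
One object of the present invention concerns language analysis with a combined-type machine learning method that uses both supervised and unsupervised data. The object is a processing system that exploits the advantages of both kinds of data to analyze language with higher accuracy.
【００２２】
A further object of the present invention is a sentence conversion processing system that uses machine learning to estimate post-conversion case particles with high accuracy, particularly when converting passive and causative sentences into active sentences.
【００２３】
【Means for Solving the Problems】
To achieve the above objects, the present invention is configured as follows.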
【００２４】
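The present invention is a language analysis processing system that performs predetermined language analysis. It consists of a main processing system, which performs language analysis using machine learning, and a stacking processing system, which provides the main processing system with data used for machine learning, wherein
the stacking processing system comprises: 1) sentence data storage means for storing sentence data that does not contain solution information for the problem that is the target of the language analysis and is handled by machine learning, and
problem expression information storage means for storing, as pairs, problem expressions (predetermined sentence expressions that indicate the problem) and the parts corresponding to those problem expressions; 2) problem-expression-equivalent part extraction means for extracting, from the sentence data stored in the sentence data storage means, a part that matches a part corresponding to a problem expression, as a problem-expression-equivalent part; 3) problem structure conversion means for creating unsupervised data, each item being a pair of a problem and a solution, in which the problem is the converted sentence obtained by replacing the problem-expression-equivalent part of the sentence data with the problem expression, and the solution is the problem-expression-equivalent part; 4) unsupervised data storage means for storing the created unsupervised data; 5) stacking solution-feature pair extraction means for extracting features from the problems of the unsupervised data stored in the unsupervised data storage means by predetermined analysis, and generating, for each item of unsupervised data, a pair of the feature set and the solution, the features being predetermined information including at least character strings, words, or parts of speech; 6) stacking machine learning means for learning from the pairs of feature sets and solutions, based on a predetermined machine learning algorithm, which solution is likely for which feature set, and saving this as the learning result in stacking learning result data storage means; and 7) stacking solution estimation means for estimating and outputting a stacking output solution when it receives from the main processing system a feature set extracted by the same extraction processing as the stacking solution-feature pair extraction means, the estimated solution being the one likely for that feature set according to the learning result stored in the stacking learning result data storage means;
the main processing system comprises: 8) solution data storage means for storing solution data, which is sentence data composed of problems and solutions, with solution information attached for the problem that is the target of the language analysis and is handled by machine learning; 9) main solution-feature pair extraction means for extracting the features from the problems of the solution data stored in the solution data storage means by the same extraction processing as the stacking solution-feature pair extraction means, and generating, for each item of solution data, a pair of the feature set and the solution; 10) first feature addition means for forming a first feature set by adding, as a feature, the stacking output solution to the feature set generated by the main solution-feature pair extraction means, the stacking output solution being the one the stacking solution estimation means estimated and output for that feature set; 11) main machine learning means for learning from the pairs of first feature sets and solutions, based on a predetermined machine learning algorithm, which solution is likely for which feature set, and saving this as the learning result in main learning result data storage means; 12) feature extraction means for extracting the features from input sentence data given as the target of the language analysis, by the same extraction processing as the stacking solution-feature pair extraction means; 13) second feature addition means for forming a second feature set by adding, as a feature, the stacking output solution to the feature set generated by the feature extraction means, the stacking output solution being the one the stacking solution estimation means estimated and output for that feature set; and 14) solution estimation means for estimating the solution likely for the second feature set according to the learning result stored in the main learning result data storage means;
wherein one of a decision list method, a maximum entropy method, and a support vector machine method is used as the predetermined machine learning algorithm, and
in the decision list method, the stacking machine learning means and the main machine learning means store, as the learning result, a list in which the pairs of feature sets and solutions of the unsupervised data are treated as rules and stored in a predetermined order of priority. The stacking solution estimation means and the solution estimation means compare the rules in this list with the feature set of the input data in descending order of priority. The solution of the first rule whose features match is estimated as the solution likely for that feature set; or
in the maximum entropy method, the stacking machine learning means and the main machine learning means store, as the learning result, a probability distribution obtained from the pairs of feature sets and solutions of the unsupervised data. This distribution satisfies predetermined constraint equations on the feature sets and maximizes an expression representing entropy. The stacking solution estimation means and the solution estimation means use this distribution to compute the probability of each class for the feature set of the input data. The class with the highest probability is estimated as the solution likely for that feature set; or
in the support vector machine method, the stacking machine learning means and the main machine learning means use the pairs of feature sets and solutions of the unsupervised data to obtain a hyperplane by a predetermined support vector machine method. They store the hyperplane and the classes of the spaces it divides as the learning result. The stacking solution estimation means and the solution estimation means use that hyperplane to determine which of the divided spaces the feature set of the input sentence data belongs to. The class of that space is estimated as the solution likely for that feature set.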
【００２５】
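Further, the stacking processing system comprises 15) solution data storage means for storing solution data composed of problems and solutions, with solution information attached for the problem that is the target of the language analysis and is handled by machine learning, and
the stacking solution-feature pair extraction means extracts the features from the problems of the solution data stored in the solution data storage means by the above extraction processing and generates, for each item of solution data, a pair of the feature set and the solution. The stacking machine learning means then learns which solution is likely for which feature set, using the pairs of feature sets and solutions generated from both the sentence data and the solution data.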
【００２６】
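Furthermore, the present invention is a language analysis processing system that performs predetermined language analysis. It consists of a main processing system, which performs language analysis using machine learning, and a stacking processing system, which provides the main processing system with data used for machine learning, wherein
the stacking processing system comprises: 1) sentence data storage means for storing sentence data that does not contain solution information for the problem that is the target of the language analysis and is handled by machine learning; 2) problem expression information storage means for storing, as pairs, problem expressions (predetermined sentence expressions that indicate the problem) and the parts corresponding to those problem expressions; 3) problem-expression-equivalent part extraction means for extracting, from the sentence data stored in the sentence data storage means, a part that matches a part corresponding to a problem expression, as a problem-expression-equivalent part; 4) problem structure conversion means for creating unsupervised data, each item being a pair of a problem and a solution or solution candidate, in which the problem is the converted sentence obtained by replacing the problem-expression-equivalent part of the sentence data with the problem expression, and the solution or solution candidate is the problem-expression-equivalent part; 5) unsupervised data storage means for storing the created unsupervised data; 6) stacking feature-solution pair / feature-solution candidate pair extraction means for extracting features from the problems of the unsupervised data stored in the unsupervised data storage means by predetermined analysis, and generating, for each item of unsupervised data, pairs of the feature set and a solution or solution candidate, the features being predetermined information including at least character strings, words, or parts of speech; 7) stacking machine learning means for learning, for each pair of a feature set and a solution or solution candidate and based on a predetermined machine learning algorithm, the probability that the pair is a positive example or a negative example (the two predetermined classes), and saving these probabilities as the learning result in stacking learning result data storage means; and 8) stacking solution estimation means that receives from the main processing system pairs of a feature set and a solution or solution candidate, the feature set having been extracted by the same extraction processing as the stacking feature-solution pair / feature-solution candidate pair extraction means. It obtains, from the learning result stored in the stacking learning result data storage means, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and outputs as the stacking output solution the candidate with the highest probability of being a positive example among all solution candidates;
the main processing system comprises: 9) solution data storage means for storing solution data, which is sentence data composed of problems and solutions, with solution information attached for the problem that is the target of the language analysis and is handled by machine learning; 10) main feature-solution pair / feature-solution candidate pair extraction means for extracting the features from the problems of the solution data stored in the solution data storage means by the same extraction processing as the stacking feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set and the solution or solution candidates; 11) first feature addition means for forming a first feature set by adding, as a feature, the stacking output solution to the feature set generated by the main feature-solution pair / feature-solution candidate pair extraction means, the stacking output solution being the one the stacking solution estimation means estimated and output for the pairs of feature sets and solutions or solution candidates generated by that extraction means; 12) main machine learning means for learning, for each pair of a first feature set and a solution or solution candidate and based on a predetermined machine learning algorithm, the probability that the pair is a positive or negative example, and saving these probabilities as the learning result in main learning result data storage means; 13) feature extraction means for extracting the features from input sentence data given as the target of the language analysis, by the same extraction processing as the stacking feature-solution pair / feature-solution candidate pair extraction means; 14) second feature addition means for forming a second feature set by adding, as a feature, the stacking output solution to the feature set generated by the feature extraction means, the stacking output solution being the one the stacking solution estimation means estimated and output for the pairs of feature sets and solutions or solution candidates generated by the feature extraction means; and 15) solution estimation means for obtaining, from the learning result stored in the main learning result data storage means, the probability that each pair of the second feature set and a solution candidate is a positive or negative example, and estimating as the solution the candidate with the highest probability of being a positive example among all solution candidates;
wherein one of a decision list method, a maximum entropy method, and a support vector machine method is used as the predetermined machine learning algorithm, and
in the decision list method, the stacking machine learning means and the main machine learning means store, as the learning result, a list in which the pairs of feature sets and solutions of the unsupervised data are treated as rules and stored in a predetermined order of priority. The stacking solution estimation means and the solution estimation means compare the rules in this list with the feature set of the input data in descending order of priority. The solution of the first rule whose features match is estimated as the solution likely for that feature set; or
in the maximum entropy method, the stacking machine learning means and the main machine learning means store, as the learning result, a probability distribution obtained from the pairs of feature sets and solutions of the unsupervised data. This distribution satisfies predetermined constraint equations on the feature sets and maximizes an expression representing entropy. The stacking solution estimation means and the solution estimation means use this distribution to compute the probability of each class for the feature set of the input data. The class with the highest probability is estimated as the solution likely for that feature set; or
in the support vector machine method, the stacking machine learning means and the main machine learning means use the pairs of feature sets and solutions of the unsupervised data to obtain a hyperplane by a predetermined support vector machine method. They store the hyperplane and the classes of the spaces it divides as the learning result. The stacking solution estimation means and the solution estimation means use that hyperplane to determine which of the divided spaces the feature set of the input sentence data belongs to. The class of that space is estimated as the solution likely for that feature set.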
【００２７】
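Further, the stacking processing system comprises 16) solution data storage means for storing solution data composed of problems and solutions, with solution information attached for the problem that is the target of the language analysis and is handled by machine learning, and
the stacking feature-solution pair / feature-solution candidate pair extraction means extracts the features from the problems of the solution data stored in the solution data storage means by the above extraction processing and generates, for each item of solution data, pairs of the feature set and the solution or solution candidates. The stacking machine learning means then learns, for each pair of a feature set and a solution or solution candidate generated from the sentence data and the solution data, the probability that the pair is a positive or negative example.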
【００２８】
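In this way, the present invention incorporates the analysis result of machine learning on unsupervised data as a feature of the supervised data. The learner therefore still maximizes accuracy on the supervised data. This allows machine learning that exploits the advantages of both unsupervised and supervised data, which differ in nature, and achieves highly accurate analysis.
【００２９】
Furthermore, the present invention is a sentence conversion processing system that uses machine learning to estimate the post-conversion case particles when passive or causative sentence data is converted into active-sentence data. The system comprises: 1) solution data storage means for storing solution data composed of problems and solutions, in which the sentence data is the problem and the solution information for the conversion problem is the solution; 2) solution-feature pair extraction means for extracting features from the problems of the solution data stored in the solution data storage means by predetermined analysis, and generating, for each item of solution data, a pair of the feature set and the solution, the features being predetermined information including at least character strings, words, or parts of speech; 3) machine learning means for learning from the pairs of feature sets and solutions, based on a predetermined machine learning algorithm, which solution is likely for which feature set, and saving this as the learning result in learning result data storage means; 4) feature extraction means for extracting the features from input sentence data given as the target of the conversion, by the same extraction processing as the solution-feature pair extraction means; and 5) solution estimation means for estimating the solution likely for that feature set according to the learning result stored in the learning result data storage means;
wherein one of a decision list method, a maximum entropy method, and a support vector machine method is used as the predetermined machine learning algorithm, and
in the decision list method, the machine learning means stores, as the learning result, a list in which the pairs of feature sets and solutions of the unsupervised data are treated as rules and stored in a predetermined order of priority. The solution estimation means compares the rules in this list with the feature set of the input data in descending order of priority. The solution of the first rule whose features match is estimated as the solution likely for that feature set; or
in the maximum entropy method, the machine learning means stores, as the learning result, a probability distribution obtained from the pairs of feature sets and solutions of the unsupervised data. This distribution satisfies predetermined constraint equations on the feature sets and maximizes an expression representing entropy. The solution estimation means uses this distribution to compute the probability of each class for the feature set of the input data. The class with the highest probability is estimated as the solution likely for that feature set; or
in the support vector machine method, the machine learning means uses the pairs of feature sets and solutions of the unsupervised data to obtain a hyperplane by a predetermined support vector machine method. It stores the hyperplane and the classes of the spaces it divides as the learning result. The solution estimation means uses that hyperplane to determine which of the divided spaces the feature set of the input sentence data belongs to. The class of that space is estimated as the solution likely for that feature set.
【００３０】
Furthermore, the present invention is a sentence conversion processing system that uses machine learning to estimate the post-conversion case particles when passive or causative sentence data is converted into active-sentence data. The system comprises: 1) solution data storage means for storing solution data composed of problems and solutions, in which the sentence data is the problem and the solution information for the conversion problem is the solution; 2) feature-solution pair / feature-solution candidate pair extraction means for extracting features from the problems of the solution data stored in the solution data storage means by predetermined analysis, and generating, for each item of solution data, pairs of the feature set and a solution or solution candidate, the features being predetermined information including at least character strings, words, or parts of speech; 3) machine learning means for learning, for each pair of a feature set and a solution or solution candidate and based on a predetermined machine learning algorithm, the probability that the pair is a positive or negative example, and saving these probabilities as the learning result in learning result data storage means; 4) feature-solution candidate pair extraction means for extracting the features from input sentence data given as the target of the conversion, by the same extraction processing as the feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set and solution candidates; and 5) solution estimation means for obtaining, from the learning result stored in the learning result data storage means, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and estimating as the solution the candidate with the highest probability of being a positive example among all solution candidates;
wherein one of a decision list method, a maximum entropy method, and a support vector machine method is used as the predetermined machine learning algorithm, and
in the decision list method, the machine learning means stores, as the learning result, a list in which the pairs of feature sets and solutions of the unsupervised data are treated as rules and stored in a predetermined order of priority. The solution estimation means compares the rules in this list with the feature set of the input data in descending order of priority. The solution of the first rule whose features match is estimated as the solution likely for that feature set; or
in the maximum entropy method, the machine learning means stores, as the learning result, a probability distribution obtained from the pairs of feature sets and solutions of the unsupervised data. This distribution satisfies predetermined constraint equations on the feature sets and maximizes an expression representing entropy. The solution estimation means uses this distribution to compute the probability of each class for the feature set of the input data. The class with the highest probability is estimated as the solution likely for that feature set; or
in the support vector machine method, the machine learning means uses the pairs of feature sets and solutions of the unsupervised data to obtain a hyperplane by a predetermined support vector machine method. It stores the hyperplane and the classes of the spaces it divides as the learning result. The solution estimation means uses that hyperplane to determine which of the divided spaces the feature set of the input sentence data belongs to. The class of that space is estimated as the solution likely for that feature set.
【００３１】
Case particle conversion in turning passive and causative sentences into active sentences means determining the case particles used in the converted sentence. Because the set of possible post-conversion particles is finite, estimating them reduces to a classification problem and can be handled with machine learning.
【００３２】
The present invention also trains on data generated from sentences that carry no information about the analysis target, such as post-conversion case particles (unsupervised data). This makes the large amount of ordinary electronic text usable as training data. Highly accurate sentence conversion is achieved without the added labor of attaching that information by hand.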
【００３３】
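【Embodiments of the Invention】
Several embodiments of the present invention are described below.
【００３４】
The first three embodiments all concern converting passive and causative sentences into active sentences:
- First embodiment: applies a machine learning method that uses supervised data (the non-borrowing-type method).
- Second embodiment: applies a machine learning method that uses unsupervised data (the borrowing-type method).
- Third embodiment: applies a machine learning method that uses both supervised and unsupervised data (the combined-type method).
【００３５】
The fourth embodiment applies an unsupervised-data stacking machine learning method to language analysis. In this method, the result of machine learning on unsupervised data is used as a feature of the supervised data.
【００３６】
In these embodiments, case particle conversion covers two operations: converting the case particles of the original passive or causative sentence into those of the resulting active sentence, and deleting unnecessary parts of the original sentence. For example, when the causative sentence 「彼が彼女に髪を切らせた。」 is converted into the active sentence 「彼女が髪を切った。」, the unnecessary part is 「彼が」 in the original. A case particle of the original (passive or causative) sentence is called a pre-conversion case particle. The new particle assigned in the active sentence is called a post-conversion case particle.
【００３７】
These embodiments address only this case particle conversion. Conversion of auxiliary verb expressions, which also accompanies conversion to the active voice, is not described. It can easily be implemented with existing processing, for example grammar-based rules.
【００３８】
〔First Embodiment〕
The first embodiment is a sentence conversion processing system for converting passive and causative sentences into active sentences. It automatically converts the case particles that must change, using machine learning on supervised data.
【００３９】
Fig. 1 shows a configuration example of the sentence conversion processing system of this embodiment. The sentence conversion processing system 100 consists of a CPU and memory. It comprises a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation unit 111, and a solution database 2.
【００４０】
The solution-feature pair extraction unit 101 retrieves examples, which are supervised data, from the solution database 2. For each example, it extracts the pair of the example's solution and feature set.
【００４１】
The machine learning unit 102 learns, by a machine learning method and from the extracted pairs of solutions and feature sets, which solution is likely for which features. It stores the learning result in the learning result database 103.
【００４２】
The feature extraction unit 110 extracts a feature set from an input sentence 3 (a passive or causative sentence). Here a "sentence" means a full sentence, or a part of a sentence that contains at least a nominal (体言) and a predicate (用言).
【００４３】
The solution estimation unit 111 refers to the learning result database 103 to estimate which solution is likely for the features of the input sentence 3. That is, it estimates the case particle most likely to become the post-conversion particle when the sentence is made active, and outputs it as the solution 4.
【００４４】
The solution database 2 stores supervised data with a "problem-solution" structure, carrying the information targeted by the machine learning. In this embodiment the analysis target is the post-conversion case particle. A database of examples (simple sentences) tagged with the particles that must be used after conversion to the active voice (post-conversion case particles) can therefore be used.
【００４５】
Fig. 2 shows the processing flow of the sentence conversion processing system 100.
【００４６】
Step S1: The solution-feature pair extraction unit 101 retrieves examples from the solution database 2 and extracts, for each example, the pair of its solution and feature set. The solution database 2 can be, for example, a tagged corpus in which each case particle of passive and causative sentences is tagged with the post-conversion particle used once the sentence becomes active.
【００４７】
Fig. 3 shows examples (simple sentences) stored in the tagged corpus. The five underlined case particles in Fig. 3 are pre-conversion case particles. The particle shown by the arrow below each underlined part is the post-conversion case particle.
- Fig. 3(A): when this passive sentence becomes active, the particles change from 「に」 to 「が」 and from 「が」 to 「を」, respectively.
- Fig. 3(B): when this causative sentence becomes active, the particles change from 「に」 to 「が」 and from 「を」 to 「を」, respectively, and the part 「彼が」 is deleted.
The tag 「other」 marks a part that is deleted when the sentence becomes active.
【００４８】
A feature is a unit of fine-grained information used in analysis by the machine learning method. Examples of features to extract include:
【００４９】
1. The case particle attached to the nominal n (pre-conversion case particle)
2. The part of speech of the predicate v
3. The base form of the predicate v
4. The sequence of auxiliary verbs attached to the predicate v (e.g., 「れる」, 「させる」)
5. The word of the nominal n
6. The category number of the nominal n's word in the Bunrui Goi Hyou thesaurus
7. The cases taken by nominals other than n that modify the predicate v
For example, when the problem of an example is 「犬に噛まれた。」 ("(I) was bitten by a dog."), features such as the following are extracted:
・word of the nominal n in the case to be estimated = 犬 (dog),
・predicate v modified by the case to be estimated (base form) = 噛む (bite),
・case particle between the nominal n and the predicate v (pre-conversion case particle) = に.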
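The feature list above can be sketched as a function that turns one already-parsed nominal-particle-predicate slot into a set of `name=value` strings. This is a minimal illustration, not the patent's implementation, under these assumptions:
- The `CaseSlot` structure and the feature names are hypothetical.
- Parsing (morphological and dependency analysis) is assumed to have been done elsewhere.
- Feature 6 (the Bunrui Goi Hyou category number) is omitted because it requires that thesaurus.
An unknown pre-conversion particle is encoded as `?`, the form the unsupervised examples of the second embodiment take.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class CaseSlot:
    """One (nominal n, particle, predicate v) slot of an already-parsed clause.

    Hypothetical structure; the parser that would fill it is not shown.
    """
    noun: str                            # nominal n, e.g. "犬"
    particle: Optional[str]              # pre-conversion particle; None if unknown
    verb_base: str                       # base form of predicate v, e.g. "噛む"
    verb_pos: str = "verb"               # part of speech of v
    auxiliaries: Tuple[str, ...] = ()    # auxiliary verbs on v, e.g. ("れる",)
    other_cases: Tuple[str, ...] = ()    # particles of other nominals modifying v


def extract_features(slot: CaseSlot) -> Set[str]:
    """Encode features 1-5 and 7 of the list as 'name=value' strings."""
    feats = {
        f"particle={slot.particle if slot.particle is not None else '?'}",  # feature 1
        f"verb_pos={slot.verb_pos}",                                        # feature 2
        f"verb={slot.verb_base}",                                           # feature 3
        f"noun={slot.noun}",                                                # feature 5
    }
    feats.update(f"aux={a}" for a in slot.auxiliaries)                      # feature 4
    feats.update(f"other_case={c}" for c in slot.other_cases)               # feature 7
    return feats


if __name__ == "__main__":
    slot = CaseSlot(noun="犬", particle="に", verb_base="噛む", auxiliaries=("れる",))
    print(sorted(extract_features(slot)))
```

Representing features as plain strings keeps them usable by any of the learners named in the text, since each string is just one binary indicator.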
【００５０】
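The solution is the post-conversion case particle attached to each example as tag information. In the example above it is:
・solution (post-conversion case particle) = が
The solution-feature pair extraction unit 101 treats the extracted feature set as the context for the machine learning performed by the machine learning unit 102, and the solution as the class.
【００５１】
Step S2: The machine learning unit 102 learns, by a machine learning method and from the extracted pairs of solutions and feature sets, which solution is likely for which features. It stores the learning result in the learning result database 103.
【００５２】
Consider the example 「犬に噛まれた。⇒が」, from which this feature set is extracted:
・word of the nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated (base form) = 噛む,
・case particle between the nominal n and the predicate v (pre-conversion case particle) = に,
For this feature set, the unit learns that
・solution (post-conversion case particle) = が
is likely.
【００５３】
Likewise, consider the example 「ヘビに噛まれた。⇒が」 ("(I) was bitten by a snake. ⇒ が"), from which this feature set is extracted:
・word of the nominal n in the case to be estimated = ヘビ (snake),
・predicate v modified by the case to be estimated (base form) = 噛む,
・case particle between the nominal n and the predicate v (pre-conversion case particle) = に,
For this feature set as well, the unit learns that
・solution (post-conversion case particle) = が
is likely.
【００５４】
Machine learning methods such as the decision list method, the maximum entropy method, and the support vector machine method can be used, but the invention is not limited to them.
【００５５】
The decision list method works in two phases:
- Training: each pair of a feature and a class is treated as a rule. (A feature is one element of the context, i.e., one piece of information used in the analysis.) The rules are stored in a list in a predetermined order of priority.
- Analysis: the input's features are compared with the rules' features, starting from the highest-priority rule. The class of the first rule whose feature matches becomes the class of the input.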
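A decision list as described above can be sketched in a few lines. The text only says that rules are stored in a predetermined order of priority. Ordering them by the precision of each feature's majority class, and then by frequency, is an assumption made for this sketch. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Set, Tuple

Rule = Tuple[str, Hashable]  # (feature, class)


def train_decision_list(examples: Iterable[Tuple[Set[str], Hashable]]) -> List[Rule]:
    """Build one rule per feature, ordered by an assumed priority:
    precision of the feature's majority class, then feature frequency."""
    counts = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            counts[f][label] += 1
    scored = []
    for f, c in counts.items():
        label, n = c.most_common(1)[0]
        total = sum(c.values())
        scored.append((n / total, total, f, label))
    # Highest precision first, then most frequent; the feature name breaks ties.
    scored.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [(f, label) for _, _, f, label in scored]


def classify(rules: List[Rule], feats: Set[str], default=None):
    """Return the class of the first (highest-priority) rule whose feature is present."""
    for f, label in rules:
        if f in feats:
            return label
    return default


if __name__ == "__main__":
    data = [({"noun=犬", "verb=噛む", "particle=に"}, "が"),
            ({"noun=ヘビ", "verb=噛む", "particle=に"}, "が"),
            ({"noun=私", "verb=噛む", "particle=が"}, "を")]
    rules = train_decision_list(data)
    print(classify(rules, {"noun=猫", "verb=噛む", "particle=に"}))
```

Because only the first matching rule fires, the priority order determines the behavior. With this ordering, a rare but perfectly consistent feature outranks a frequent but ambiguous one such as `verb=噛む`.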
【００５６】
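The maximum entropy method starts from a set F of predefined features f_j (1 ≤ j ≤ k). It obtains the probability distribution p(a, b) that maximizes an expression representing entropy while satisfying predetermined constraint equations. It then computes the probability of each class under that distribution and takes the class with the highest probability as the solution (the class sought).
[Reference 1: Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara, "Word Sense Disambiguation Experiments Using Various Machine Learning Methods" (in Japanese), IEICE Technical Committee on Natural Language Understanding and Models of Communication, NLC2001-2, (2001)]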
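As a rough sketch of the maximum entropy method, the model below uses the conditional log-linear form p(a | b) = exp(Σ_j λ_j f_j(a, b)) / Z(b), with one indicator feature f_j per (context feature, class) pair. It picks the class with the highest probability. Training uses plain batch gradient ascent on the log-likelihood, a simplification chosen here: at its optimum, observed and model-expected feature counts agree, which is the maximum entropy constraint. Practical implementations usually add regularization and a faster optimizer. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def predict_proba(weights, feats: Set[str], labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """p(a | b) = exp(sum_j lambda_j f_j(a, b)) / Z(b), f_j = indicator of (feature, class)."""
    scores = {a: sum(weights.get((f, a), 0.0) for f in feats) for a in labels}
    m = max(scores.values())                  # subtract the max for numerical stability
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def train_maxent(examples: List[Tuple[Set[str], Hashable]], labels, epochs=200, lr=0.5):
    """Batch gradient ascent on the conditional log-likelihood (no regularization)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            p = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0        # observed feature count
                for a in labels:
                    grad[(f, a)] -= p[a]      # expected count under the current model
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights


def classify(weights, feats, labels):
    """Return the class with the highest probability."""
    p = predict_proba(weights, feats, labels)
    return max(p, key=p.get)


if __name__ == "__main__":
    labels = ["が", "を"]
    data = [({"particle=に", "verb=噛む"}, "が"), ({"particle=が", "verb=噛む"}, "を")]
    w = train_maxent(data, labels)
    print(predict_proba(w, {"particle=に", "verb=噛む"}, labels))
```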
The support vector machine method classifies data with two classes by dividing the space with a hyperplane. Because it handles only two classes, it is usually combined with the pairwise method to handle three or more. For data with N classes, the pairwise method works as follows:
1. Form every pair of two different classes, giving N(N−1)/2 pairs.
2. For each pair, decide which class is better with a binary classifier (here, one based on the support vector machine method).
3. Choose the final class by majority vote over the outputs of the N(N−1)/2 binary classifiers.
[Reference 2: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, (Cambridge University Press, 2000)]
[Reference 3: Taku Kudoh, TinySVM: Support Vector Machines, (http://cl.aist-nara.ac.jp/taku-ku//software/Tiny SVM/index.html, 2000)]
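The pairwise method can be sketched independently of the binary learner. To keep the example self-contained, a simple perceptron stands in for the support vector machine. This substitution is deliberate and is not what the text uses. `train_binary` is a hypothetical parameter and can be replaced by any learner that returns a weight dictionary and a bias. With N classes, `train_pairwise` builds N(N−1)/2 classifiers and `predict_pairwise` takes the majority vote.

```python
from collections import Counter
from itertools import combinations


def train_perceptron(examples, epochs=10):
    """Stand-in binary learner (NOT an SVM): examples are (feature_set, y), y in {+1, -1}."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for feats, y in examples:
            s = sum(w.get(f, 0.0) for f in feats) + b
            if y * s <= 0:                    # misclassified or on the boundary
                for f in feats:
                    w[f] = w.get(f, 0.0) + y
                b += y
    return w, b


def train_pairwise(examples, train_binary=train_perceptron):
    """Train one binary classifier per class pair: N(N-1)/2 classifiers for N classes."""
    labels = sorted({y for _, y in examples})
    models = {}
    for a, c in combinations(labels, 2):
        subset = [(f, +1 if y == a else -1) for f, y in examples if y in (a, c)]
        models[(a, c)] = train_binary(subset)
    return models


def predict_pairwise(models, feats):
    """Each pairwise classifier votes for one of its two classes; the majority wins."""
    votes = Counter()
    for (a, c), (w, b) in models.items():
        s = sum(w.get(f, 0.0) for f in feats) + b
        votes[a if s >= 0 else c] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [({"p=に"}, "が"), ({"p=が"}, "を"), ({"p=x"}, "other")]
    models = train_pairwise(data)
    print(len(models), predict_pairwise(models, {"p=に"}))
```

When the vote is tied, `Counter.most_common` returns the class that received its first vote earliest, which is an arbitrary but deterministic choice.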
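Fig. 4 illustrates the concept of margin maximization in the support vector machine method. Its notation:
- White circles: positive examples.
- Black circles: negative examples.
- Solid lines: the hyperplane that divides the space.
- Dashed lines: the boundaries of the margin region.
Fig. 4(A) shows a narrow gap between positive and negative examples (small margin); Fig. 4(B) shows a wide gap (large margin).
【００５７】
Suppose the two classes of the support vector machine method are positive and negative examples. The larger the gap (margin) between them in the training data, the less likely misclassification on open data is considered to be. Therefore, as shown in Fig. 4(B), the hyperplane that maximizes this margin is obtained and used for classification.
【００５８】
That is the basic method. In practice, two extensions are usually added:
- a small number of training examples may fall inside the margin region;
- the linear part of the hyperplane may be made nonlinear, for example by introducing a kernel function.
【００５９】
The extended method is equivalent to classifying with the following discriminant function. The two classes are distinguished by whether its output is positive or negative.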
【００６０】
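【Equation 1】
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \qquad (1)$$
$$b = -\frac{\max_{i,\, y_i = -1} b_i + \min_{i,\, y_i = 1} b_i}{2}, \qquad b_i = \sum_{j=1}^{l} \alpha_j y_j K(x_j, x_i) \qquad (2)$$
【００６１】
Here x is the context (feature set) of the example to be classified, and x_i and y_i (i = 1, …, l; y_i ∈ {1, −1}) are the context and class of the i-th training example. The function sgn is
$$\operatorname{sgn}(x) = \begin{cases} 1 & (x \ge 0) \\ -1 & (\text{otherwise}) \end{cases}$$
and each α_i is the value that maximizes equation (3) subject to the constraints of equations (4) and (5).
【００６２】
【Equation 2】
$$L(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (3)$$
$$0 \le \alpha_i \le C \quad (i = 1, \ldots, l) \qquad (4)$$
$$\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (5)$$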
【００６３】
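The function K is called a kernel function. Various kernels are used; this embodiment uses the following polynomial kernel.
【００６４】
$$K(x, y) = (x \cdot y + 1)^d \qquad (6)$$
C and d are constants set experimentally. In the specific example described later, C was fixed at 1 throughout, and two values of d, 1 and 2, were tried. The x_i with α_i > 0 are called support vectors, and the sum in equation (1) is usually computed over these examples only. In other words, actual analysis uses only the training examples that are support vectors.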
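The discriminant function (1) with the polynomial kernel (6) can be written out directly for binary feature sets, where the dot product reduces to the number of shared features. This sketch only evaluates a trained model. The α_i in the usage example are hand-picked for illustration rather than obtained by maximizing (3) under (4) and (5). The bias uses the midpoint rule b = −(max over y_i = −1 of b_i + min over y_i = +1 of b_i) / 2 with b_i = Σ_j α_j y_j K(x_j, x_i), which is an assumption of this sketch. The function names are hypothetical.

```python
from typing import FrozenSet


def poly_kernel(x: FrozenSet[str], z: FrozenSet[str], d: int = 2) -> float:
    """K(x, z) = (x . z + 1)^d; for binary feature sets the dot product is |x & z|."""
    return (len(x & z) + 1) ** d


def svm_bias(alphas, ys, xs, kernel) -> float:
    """Midpoint bias: b = -(max_{y_i=-1} b_i + min_{y_i=+1} b_i) / 2."""
    def b_i(i):
        return sum(a * y * kernel(xj, xs[i]) for a, y, xj in zip(alphas, ys, xs))
    neg = max(b_i(i) for i, y in enumerate(ys) if y == -1)
    pos = min(b_i(i) for i, y in enumerate(ys) if y == +1)
    return -(neg + pos) / 2


def svm_decide(x, alphas, ys, xs, b, kernel) -> int:
    """sgn(sum_i alpha_i y_i K(x_i, x) + b); only support vectors (alpha_i > 0) contribute."""
    s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, ys, xs) if a > 0)
    return 1 if s + b >= 0 else -1


if __name__ == "__main__":
    xs = [frozenset({"a"}), frozenset({"b"})]
    ys = [+1, -1]
    alphas = [1.0, 1.0]  # hand-picked; satisfies sum(alpha_i * y_i) == 0
    b = svm_bias(alphas, ys, xs, poly_kernel)
    print(b, svm_decide(frozenset({"a"}), alphas, ys, xs, b, poly_kernel))
```

Skipping terms with α_i = 0 in `svm_decide` mirrors the remark above that only support vectors are needed at analysis time.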
【００６５】
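Because the support vector machine method handles two classes, it is combined with the pairwise method to handle three or more. In this example, the sentence conversion processing system 150 combines the support vector machine method with the pairwise method. Specifically, it is implemented with TinySVM.
[Reference 4: Taku Kudo and Yuji Matsumoto, "Chunk Identification Using Support Vector Machines" (in Japanese), IPSJ SIG Notes on Natural Language Processing, 2000-NL-140, (2000)]
Step S3: Next, the input sentence 3 is fed to the feature extraction unit 110 as the data whose solution is sought.
【００６６】
Step S4: The feature extraction unit 110 extracts a feature set from the input sentence 3, using essentially the same processing as the solution-feature pair extraction unit 101. It passes the feature set to the solution estimation unit 111. For example, when the input sentence 3 is 「犬に噛まれた。」, the following features are extracted and passed to the solution estimation unit 111.
【００６７】
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・pre-conversion case particle between the nominal n and the predicate v = に,
Step S5: The solution estimation unit 111 uses the learning result stored in the learning result database 103 to estimate which solution 4 is likely for the passed feature set. It outputs the estimated solution (post-conversion case particle) 4.
【００６８】
For example, suppose the learning results described above for 「犬に噛まれた。⇒が」 and 「ヘビに噛まれた。⇒が」 are stored in the learning result database 103. The solution estimation unit 111 refers to them and analyzes the feature set extracted from the input sentence 3. It estimates that 「が」 is the most likely post-conversion case particle and outputs solution 4 = 「が」.
【００６９】
Fig. 5 shows another configuration example of the sentence conversion processing system of the first embodiment. In this and later figures, components such as processing units that share a reference numeral have the same function.
【００７０】
The sentence conversion processing system 150 comprises a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate pair extraction unit 170, a solution estimation unit 171, and a solution database 2.
【００７１】
The feature-solution pair / feature-solution candidate pair extraction unit 161 retrieves examples from the solution database 2. For each example, it extracts pairs of the feature set with the solution or with each solution candidate.
【００７２】
A solution candidate is any possible solution other than the correct one. Suppose the post-conversion case particles are the five particles 「を」, 「に」, 「が」, 「と」, and 「で」. When 「が」 is the solution, 「を」, 「に」, 「と」, and 「で」 are the solution candidates. A pair of the solution and the feature set is a positive example; a pair of a solution candidate and the feature set is a negative example.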
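The positive/negative scheme above can be sketched as follows: one labeled example becomes five binary examples, one per candidate particle. The text does not specify how the candidate is combined with the context. Adding it as an extra `cand=...` feature is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# The five post-conversion particles assumed in the text.
CANDIDATE_PARTICLES = ("を", "に", "が", "と", "で")


def make_binary_examples(
    feats: Set[str], solution: str, candidates: Iterable[str] = CANDIDATE_PARTICLES
) -> List[Tuple[FrozenSet[str], int]]:
    """Turn one labeled example into one binary example per candidate particle:
    (features + candidate, +1) for the solution, (features + candidate, -1) otherwise."""
    examples = []
    for cand in candidates:
        # Encoding the candidate as one extra feature is an assumption; with a linear
        # model it cannot interact with the context, so a nonlinear kernel helps.
        x = frozenset(feats) | {f"cand={cand}"}
        examples.append((x, +1 if cand == solution else -1))
    return examples


if __name__ == "__main__":
    for x, y in make_binary_examples({"noun=犬", "verb=噛む", "particle=に"}, "が"):
        print(y, sorted(x))
```

A polynomial kernel with d = 2, as in the text, implicitly adds products of feature pairs such as `particle=に` × `cand=が`. Such products are what let a binary classifier learn that this particle pairs with this candidate.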
【００７３】
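The machine learning unit 162 works on the pairs extracted by the feature-solution pair / feature-solution candidate pair extraction unit 161, each combining a feature set with a solution or solution candidate. For each such pair, it learns the probability of being a positive or negative example, using the support vector machine method or a similar machine learning method. It stores the learning result in the learning result database 163.
【００７４】
The feature-solution candidate pair extraction unit 170 extracts pairs of solution candidates and the feature set from the input sentence 3, using the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161. It passes them to the solution estimation unit 171.
【００７５】
The solution estimation unit 171 refers to the learning result database 163 and obtains, for each pair of a solution candidate and the feature set passed from unit 170, the probability of being a positive or negative example. It estimates the candidate with the highest probability of being positive as the solution 4 and outputs it.
【００７６】
Fig. 6 shows the processing flow of the sentence conversion processing system 150.
【００７７】
Step S11: The feature-solution pair / feature-solution candidate pair extraction unit 161 retrieves examples from the solution database 2. For each example, it extracts pairs of the feature set with the solution or with each solution candidate. The feature set is the same as that extracted in step S1 (see Fig. 2).
【００７８】
Step S12: The machine learning unit 162 learns, by a machine learning method and for each extracted pair of a feature set and a solution or solution candidate, the probability of being a positive or negative example. It stores the learning result in the learning result database 163.
【００７９】
Suppose the example is 「犬に噛まれた。⇒が」 and the feature set is:
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・pre-conversion case particle between the nominal n and the predicate v = に,
The unit then obtains two kinds of probability: that of the solution 「が」 (being a positive example), and that of each solution candidate 「を」, 「に」, 「と」, and 「で」 (being a negative example).
【００８０】
Step S13: Next, the input sentence 3 whose solution is sought is fed to the feature-solution candidate pair extraction unit 170.
【００８１】
Step S14: The feature-solution candidate pair extraction unit 170 extracts pairs of solution candidates and the feature set from the input sentence 3, using the same processing as unit 161. It passes them to the solution estimation unit 171.
【００８２】
Step S15: The solution estimation unit 171 uses the learning result stored in the learning result database 163 to obtain, for each passed pair of a solution candidate and the feature set, the probability of being a positive or negative example.
【００８３】
For example, when the input sentence is 「犬に噛まれた。」, this probability is obtained for the extracted feature set paired with each of the candidates 「が」, 「を」, 「に」, 「と」, and 「で」.
【００８４】
Step S16: Once the probabilities have been obtained for all solution candidates, the candidate with the highest probability of being positive is estimated as the solution 4 and output.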
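Steps S15 and S16 amount to scoring every candidate and taking the one with the highest positive-example probability. This sketch rests on several assumptions:
- The scorer is a hypothetical logistic model over the candidate feature and (feature, candidate) conjunction features.
- Its weights are hand-set for illustration.
- It stands in for whatever probability the trained classifier actually provides.

```python
import math
from typing import Callable, Dict, Iterable, Set, Tuple


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def make_positive_prob(weights: Dict[str, float], bias: float = 0.0) -> Callable[[Set[str], str], float]:
    """Hypothetical binary model: P(positive | features, candidate) = sigmoid(w . x + bias),
    where x holds the candidate and every (feature & candidate) conjunction."""
    def prob(feats: Set[str], cand: str) -> float:
        x = {f"cand={cand}", *(f"{f}&cand={cand}" for f in feats)}
        return sigmoid(sum(weights.get(f, 0.0) for f in x) + bias)
    return prob


def choose_solution(feats, candidates: Iterable[str], positive_prob) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the one most likely to be a positive example."""
    scores = {c: positive_prob(feats, c) for c in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    prob = make_positive_prob({"particle=に&cand=が": 3.0}, bias=-1.0)
    best, scores = choose_solution({"particle=に", "verb=噛む"}, ("が", "を", "に", "と", "で"), prob)
    print(best, scores)
```

Returning all scores along with the winner makes it easy to inspect how close the runner-up candidates were.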
【００８５】
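〔Second Embodiment〕
The second embodiment is a sentence conversion processing system for converting passive and causative sentences into active sentences. It automatically converts case particles by unsupervised learning.
【００８６】
First, consider the unsupervised data used by the machine learning method. Fig. 7(A) shows an electronic sentence from which unsupervised data is created. The active sentence 「犬が私を噛んだ。」 carries none of the information targeted by the analysis, namely information on particle conversion to the active voice. Suppose, however, that the sentence is regarded as the result of a conversion to the active voice. The particles of the original passive or causative sentence (pre-conversion case particles) remain unknown. The solution to be estimated can still be extracted: the particles that must appear in the result, the active sentence (post-conversion case particles).
【００８７】
Fig. 7(B) shows simple sentences relating pre-conversion and post-conversion case particles. The source of the active sentence in Fig. 7(A) can be written as 「犬＜？＞私＜？＞噛んだ（噛まれた）。」. The pre-conversion particles of that source sentence are not given, so they are shown as 「＜？＞」 (unknown). The post-conversion particles extracted from the sentence in Fig. 7(A) are the solutions to be estimated. They are 「が」 and 「を」, shown by the arrows below each ＜？＞. An active sentence without the target information therefore lacks the pre-conversion particles, but it does carry the post-conversion particles, which are the solutions (classes). Among the sentences in Fig. 7(B), 「犬＜？＞噛んだ。」 can be converted into the following problem structure:
【００８８】
"problem ⇒ solution" = 「犬＜？＞噛んだ。⇒が」
This shows that active sentences without the target information can serve as training data for machine learning.
【００８９】
Unsupervised data generated from active sentences like that of Fig. 7(A) lacks the pre-conversion case particles, so it carries less information than supervised data. However, it has two practical advantages:
- active sentences are more numerous than passive and causative sentences;
- no one has to tag post-conversion particles by hand.
Many active sentences can therefore be used as unsupervised data, which increases the supervision signals available for machine learning.
【００９０】
Fig. 8 shows a configuration example of the sentence conversion processing system of the second embodiment. The sentence conversion processing system 200 consists of a CPU and memory. It comprises a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation unit 111, and a sentence database 5.
【００９１】
The problem-expression-equivalent part extraction unit 201 uses two stores:
- The problem expression information storage unit 202 records in advance what counts, in this system, as a part corresponding to a problem expression (a problem-expression-equivalent part).
- The sentence database 5 holds data (sentences) that carry none of the information targeted by the analysis.
Unit 201 retrieves sentences from the sentence database 5 and, with reference to unit 202, extracts problem-expression-equivalent parts from them.
【００９２】
Here, the problem expression information storage unit 202 stores, as problem-expression-equivalent parts, the case particles that must change when passive and causative sentences are converted into active sentences (post-conversion case particles).
【００９３】
When an extracted problem-expression-equivalent part needs to be converted, the problem structure conversion unit 204 refers to the semantic analysis information storage unit 203, which holds information for semantic analysis. It then builds a "problem-solution" structure: the sentence with the problem-expression-equivalent part converted becomes the problem, and the case particle extracted from that part becomes the solution. The resulting unsupervised data is stored as examples in the unsupervised data storage unit 205.
【００９４】
The solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation unit 111 of system 200 perform essentially the same processing as the identically numbered units in the first embodiment. One difference: the solution-feature pair extraction unit 101 retrieves unsupervised examples from the unsupervised data storage unit 205 and extracts, for each, the pair of its solution and feature set.
【００９５】
Fig. 9 shows the processing flow for generating unsupervised data.
【００９６】
Step S21: A sentence (active sentence) is fed from the sentence database 5 to the problem-expression-equivalent part extraction unit 201. The sentence is electronic natural-language text without the information targeted by the analysis.
【００９７】
Step S22: The problem-expression-equivalent part extraction unit 201 detects the structure of the input active sentence and extracts the problem-expression-equivalent part. What counts as such a part is given by the problem expression information stored in the problem expression information storage unit 202. For example, 「犬＜？＝case to be estimated (post-conversion case particle)＞噛む」 may be stored as problem expression information. Unit 201 matches the stored sentence structure against the structure of the input (active) sentence and takes the matching part as the problem-expression-equivalent part. If the input sentence is 「犬が噛む。」 ("A dog bites."), matching extracts 「が」 as the problem-expression-equivalent part.
【００９８】
Step S23: The problem structure conversion unit 204 refers to the semantic analysis information storage unit 203. It extracts the problem-expression-equivalent part as the solution, converts that part into the problem expression (＜？＞), and takes the resulting sentence as the problem. For the active sentence 「犬が噛む。」, the extracted 「が」 becomes the solution. Replacing 「が」 with ＜？＞ gives the problem 「犬＜？＞噛む。」.
Step S24: The problem structure conversion unit 204 then stores the data consisting of this problem and solution in the unsupervised data storage unit 205 as unsupervised data (an example).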
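Steps S21 to S24 can be sketched for the simple case in which the active sentence has already been split into (nominal, particle) slots and a predicate. This is a simplification under stated assumptions:
- Matching against stored problem expressions is reduced to membership in an assumed set of target particles.
- The helper names are hypothetical.
The output mirrors the problem ⇒ solution pairs in the text.

```python
from typing import List, Sequence, Tuple

PROBLEM_MARK = "＜？＞"                            # the problem expression used in the text
TARGET_PARTICLES = {"が", "を", "に", "と", "で"}  # assumed set of particles to estimate


def make_unsupervised_examples(
    slots: Sequence[Tuple[str, str]], verb: str
) -> List[Tuple[str, str]]:
    """slots: (nominal, particle) pairs of an already-parsed active sentence, in order;
    verb: the predicate as it appears. Returns one (problem, solution) per target particle."""
    examples = []
    for noun, particle in slots:
        if particle in TARGET_PARTICLES:            # stands in for pattern matching (S22)
            problem = f"{noun}{PROBLEM_MARK}{verb}。"  # replace the particle with ＜？＞ (S23)
            examples.append((problem, particle))     # the removed particle is the solution
    return examples                                  # to be stored as unsupervised data (S24)


if __name__ == "__main__":
    print(make_unsupervised_examples([("犬", "が"), ("私", "を")], "噛んだ"))
```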
【００９９】
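From this point, the sentence conversion processing system 200 performs the same processing as in the first embodiment (see Fig. 2). The solution-feature pair extraction unit 101 retrieves examples from the unsupervised data storage unit 205 and extracts, for each, the pair of its solution and feature set (step S1).
【０１００】
If the retrieved example is 「犬＜？＞噛む。」⇒「が」, a feature set such as the following is extracted:
【０１０１】
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between the nominal n and the predicate v = ? (unknown).
The machine learning unit 102 then learns, from the pairs of solutions and feature sets, which case particle is the solution for which features. For a feature set like the one above, it learns that "solution = が" is likely and stores the result in the learning result database 103 (step S2).
【０１０２】
If the retrieved example is 「ヘビ＜？＞噛む。」⇒「が」, the following feature set is extracted:
【０１０３】
・nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between the nominal n and the predicate v = ? (unknown).
For this feature set as well, the machine learning unit 102 learns that "solution = が" is likely and stores the result in the learning result database 103.
【０１０４】
The rest of the processing, from feeding the input sentence 3 to the feature extraction unit 110 through outputting the solution 4 from the solution estimation unit 111, is the same as steps S3 to S5 of Fig. 2 in the first embodiment. Its description is omitted.
【０１０５】
Fig. 10 shows another configuration example of the sentence conversion processing system of the second embodiment. The sentence conversion processing system 250 comprises:
- the problem-expression-equivalent part extraction unit 201, problem expression information storage unit 202, semantic analysis information storage unit 203, problem structure conversion unit 204, and unsupervised data storage unit 205;
- the feature-solution pair / feature-solution candidate pair extraction unit 161, machine learning unit 162, learning result database 163, feature-solution candidate pair extraction unit 170, and solution estimation unit 171;
- the sentence database 5.
【０１０６】
Units 201, 202, 203, and 204 of system 250 perform the same processing as the identically numbered units in Fig. 8.
【０１０７】
Units 161, 162, 163, 170, and 171 of system 250 perform essentially the same processing as the identically numbered units in Fig. 5.
【０１０８】
In system 250, the feature-solution pair / feature-solution candidate pair extraction unit 161 reads each example from the unsupervised data storage unit 205. For each, it extracts pairs of the feature set with the solution or with each solution candidate (Fig. 6: step S11).
【０１０９】
If the retrieved example is 「犬＜？＞噛む。」⇒「が」, a feature set such as the following is extracted:
【０１１０】
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between the nominal n and the predicate v = ? (unknown).
The machine learning unit 162 then learns, by a machine learning method and for each pair of a feature set and a solution or solution candidate, the probability of being a positive or negative example. It stores the learning result in the learning result database 163 (Fig. 6: step S12).
【０１１１】
The rest of the processing, from feeding the input sentence 3 to the feature-solution candidate pair extraction unit 170 through outputting the solution 4 from the solution estimation unit 171, is the same as steps S13 to S16 of Fig. 6 in the first embodiment. Its description is omitted.
【０１１２】
〔Third Embodiment〕
The examples ("problem-solution") in the unsupervised data storage unit 205 have almost the same structure as those in the solution database 2. Unsupervised and supervised examples can therefore be mixed and used together. In this embodiment, machine learning that uses both unsupervised and supervised data as supervision signals is called "supervised/unsupervised learning".
【０１１３】
Unsupervised data lacks the pre-conversion case particles of the original sentence, so it carries less information than supervised data. However, nobody has to tag each example with solution information such as post-conversion case particles. Also, active sentences are generally more numerous than passive ones, so many sentences can serve as supervision signals. Sentence conversion by supervised/unsupervised learning can therefore use a model trained on a large amount of data without increasing the manual labor of attaching the target information.
【０１１４】
Fig. 11 shows a configuration example of the sentence conversion processing system 300 of the third embodiment. System 300 consists of a CPU and memory. It comprises:
- the problem-expression-equivalent part extraction unit 201, problem expression information storage unit 202, semantic analysis information storage unit 203, problem structure conversion unit 204, and unsupervised data storage unit 205;
- the solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation unit 111;
- the solution database 2 and the sentence database 5.
System 300 is system 200 of the second embodiment (Fig. 8) with the solution database 2 added, and performs essentially the same processing as system 200.
【０１１５】
The solution-feature pair extraction unit 101 extracts the pair of solution and feature set for every example, whether supervised (from the solution database 2) or unsupervised (from the unsupervised data storage unit 205).
【０１１６】
Fig. 12 shows another configuration example of the third embodiment. The sentence conversion processing system 350 consists of a CPU and memory. It comprises:
- the problem-expression-equivalent part extraction unit 201, problem expression information storage unit 202, semantic analysis information storage unit 203, problem structure conversion unit 204, and unsupervised data storage unit 205;
- the feature-solution pair / feature-solution candidate pair extraction unit 161, machine learning unit 162, learning result database 163, feature-solution candidate pair extraction unit 170, and solution estimation unit 171;
- the solution database 2 and the sentence database 5.
【０１１７】
System 350 is system 250 of the second embodiment (Fig. 10) with the solution database 2 added, and performs essentially the same processing as system 250.
【０１１８】
The feature-solution pair / feature-solution candidate pair extraction unit 161 processes every example, whether supervised (from the solution database 2) or unsupervised (from the unsupervised data storage unit 205). For each, it extracts pairs of the feature set with the solution or with each solution candidate.
【０１１９】
〔Fourth Embodiment〕
The fourth embodiment is a language analysis processing system that analyzes language using stacking machine learning, exploiting the advantages of both unsupervised and supervised data.
【０１２０】
Stacking machine learning uses the technique called "stacking", which is used to combine the analysis results of multiple systems. It trains on supervision signals whose features have been augmented with the analysis results of a different machine learning method.
[Reference 5: Hans van Halteren, Jakub Zavrel, and Walter Daelemans, Improving Accuracy in Word Class Tagging Through the Combination of Machine Learning Systems, Computational Linguistics, Vol.27, No.2, (2001), pp.199-229]
In this embodiment, the language analysis processing system first analyzes language using one of two methods: borrowing-type machine learning (on unsupervised data) or combined-type machine learning (on supervised and unsupervised data). It adds the resulting estimated solution to the feature set, then performs further analysis by supervised learning with the augmented feature set.
【０１２１】
For example, suppose that in the supervised machine learning of this embodiment, the feature set extracted from some supervised example is the list {a, b, c}. Suppose the stacking processing system is a language analysis processing system using unsupervised machine learning and its analysis result is "d₁". The supervised machine learning then adds "d₁" to {a, b, c} and learns with the new feature set {a, b, c, "unsupervised learning result = d₁"}.
【０１２２】
Likewise, suppose the stacking processing system is a language analysis processing system using supervised/unsupervised machine learning and its analysis result is "d₂". The supervised machine learning adds "d₂" to {a, b, c} and learns with the new feature set {a, b, c, "supervised/unsupervised learning result = d₂"}.
【０１２３】
Both kinds of language analysis processing system, one using unsupervised and one using supervised/unsupervised machine learning, can also serve as stacking processing systems at once. The supervised machine learning then adds both "d₁" and "d₂" to {a, b, c} and learns with the new feature set {a, b, c, "unsupervised learning result = d₁", "supervised/unsupervised learning result = d₂"}.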
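In all three cases above, the feature augmentation is a set union. A minimal sketch follows; the function name and the feature labels are hypothetical stand-ins for the quoted labels.

```python
from typing import Iterable, Mapping, Set


def stack_features(base: Iterable[str], stacked_results: Mapping[str, str]) -> Set[str]:
    """Return the base feature set plus one 'system=result' feature per stacked system."""
    augmented = set(base)
    for system, result in stacked_results.items():
        augmented.add(f"{system}={result}")
    return augmented


if __name__ == "__main__":
    print(sorted(stack_features({"a", "b", "c"}, {
        "unsupervised result": "d1",
        "supervised/unsupervised result": "d2",
    })))
```

The stacked output is added as one more feature rather than replacing the prediction. The main learner therefore still fits the supervised data and can learn how far to trust each stacked system.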
【０１２４】
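Stacking thus combines non-borrowing-type machine learning on supervised data with borrowing-type or combined-type machine learning. This adds features to the supervised examples used in supervised learning, and these richer examples are expected to improve learning accuracy. Although the features have increased, the supervised learning still maximizes accuracy on the supervised examples, that is, accuracy on the analysis target itself. Analysis is then performed with that learning result. This approach can be expected to exploit the respective strengths of supervised and unsupervised machine learning and to achieve high analysis accuracy.
【０１２５】
Fig. 13 shows a configuration example of the language analysis processing system of the fourth embodiment.
【０１２６】
The language analysis processing system 500 outputs the result of language analysis for a given problem. It consists of a CPU and memory and comprises:
- a solution-feature pair extraction unit 501, a machine learning unit 502, a learning result database 503, a feature extraction unit 504, and a solution estimation unit 505;
- a stacking unsupervised learning processing system 1010;
- a first feature addition unit 511 and a second feature addition unit 512;
- a sentence database 5 and a solution database 6.
【０１２７】
Units 501, 502, 503, 504, and 505 perform essentially the same processing as units 101, 102, 103, 110, and 111 of the sentence conversion processing system 100, respectively.
【０１２８】
The stacking unsupervised learning processing system 1010 does the following for the language analysis at hand:
1. It extracts feature sets from unsupervised data generated from the sentence database 5.
2. From them, it learns which solution (analysis result) is likely for which feature set, and stores the learning result.
3. It receives a feature set from the first feature addition unit 511 or the second feature addition unit 512 and estimates, from the stored learning result, which solution is likely for it.
4. It returns the estimate: d₁ to unit 511, or d₁' to unit 512.
【０１２９】
The stacking unsupervised learning processing system 1010 comprises processing units configured like those of the sentence conversion processing system 200 in Fig. 8 (not shown). These are the problem-expression-equivalent part extraction unit 201, problem expression information storage unit 202, semantic analysis information storage unit 203, problem structure conversion unit 204, unsupervised data storage unit 205, solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation unit 111. It outputs the result of language analysis for a given problem.
【０１３０】
The first feature addition unit 511 receives pairs of solution and feature set from the solution-feature pair extraction unit 501 and passes only the feature sets to system 1010. It receives the solution d₁ returned by system 1010 and adds "unsupervised learning result = d₁" as a feature to the original feature set.
【０１３１】
The second feature addition unit 512 passes the feature set received from the feature extraction unit 504 to system 1010. It receives the solution d₁' returned by system 1010 and adds "unsupervised learning result = d₁'" as a feature to the feature set.
【０１３２】
Figs. 14 and 15 show the processing flow of the language analysis processing system 500.
【０１３３】
Step S30: System 1010 retrieves simple sentences from the sentence database 5 and builds unsupervised data from each:
1. It extracts the problem-expression-equivalent part by referring to the problem expression information, and takes it as the solution.
2. It converts that part into the problem structure by referring to the semantic analysis information, and takes the resulting sentence as the problem.
3. It stores each example with this "problem-solution" structure as unsupervised data.
It then extracts the pair of solution and feature set for each example, learns by a machine learning method which solution is likely for which features, and stores the learning result.
【０１３４】
Step S31: Next, the solution-feature pair extraction unit 501 retrieves examples from the solution database 6 and extracts, for each, the pair of its solution and feature set.
【０１３５】
Step S32: The first feature addition unit 511 takes only the feature set from each pair and passes it to system 1010.
【０１３６】
Step S33: System 1010 refers to its stored learning result and estimates which solution is likely for the received feature set. It returns the estimate d₁ to the first feature addition unit 511.
【０１３７】
Step S34: The first feature addition unit 511 adds the returned d₁ as a feature to the original feature set. If the original feature set is {a, b, c}, the feature set passed to the machine learning unit 502 becomes {a, b, c, "unsupervised learning result = d₁"}.
【０１３８】
Step S35: The machine learning unit 502 learns which solution is likely for which features, from pairs of solutions and feature sets that include "unsupervised learning result = d₁". It stores the learning result in the learning result database 503.
【０１３９】
Step S36: A sentence whose solution is sought is fed to the feature extraction unit 504.
【０１４０】
Step S37: The feature extraction unit 504 extracts a feature set from the input sentence 3 and passes it to the second feature addition unit 512.
【０１４１】
Step S38: The second feature addition unit 512 passes the received feature set to system 1010.
【０１４２】
Step S39: System 1010 refers to its stored learning result and estimates which solution is likely for the received feature set. It passes the estimate d₁' to the second feature addition unit 512.
【０１４３】
Step S310: The second feature addition unit 512 adds the returned d₁' as a feature to the original feature set. If the original feature set is {a, b, c}, the result is {a, b, c, "unsupervised learning result = d₁'"}. This feature set is passed to the solution estimation unit 505.
【０１４４】
Step S311: The solution estimation unit 505 refers to the learning result in the learning result database 503 and estimates which solution is likely for the passed feature set. It outputs the estimated solution 4.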
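Steps S30 to S311 can be put together in a toy end-to-end sketch. Both the stacked system and the main system use the same deliberately simple learner, in which each feature votes for the labels it co-occurred with. `MajorityFeatureLearner` and `run_stacking` are hypothetical names. The learner is a stand-in for the decision list, maximum entropy, or SVM methods named in the text, not one of them.

```python
from collections import Counter, defaultdict


class MajorityFeatureLearner:
    """Hypothetical stand-in learner: each known feature votes with its label counts."""

    def fit(self, examples):
        self.counts = defaultdict(Counter)
        self.prior = Counter()
        for feats, label in examples:
            self.prior[label] += 1
            for f in feats:
                self.counts[f][label] += 1
        return self

    def predict(self, feats):
        votes = Counter()
        for f in feats:
            if f in self.counts:
                votes.update(self.counts[f])
        # Fall back to the most frequent training label if no feature is known.
        return (votes or self.prior).most_common(1)[0][0]


def run_stacking(unsup_examples, sup_examples, test_feats):
    # S30: train the stacked system on unsupervised examples.
    stack = MajorityFeatureLearner().fit(unsup_examples)
    # S31-S34: add the stacked prediction d1 to each supervised example's features.
    augmented = [(set(f) | {f"unsup_result={stack.predict(f)}"}, y) for f, y in sup_examples]
    # S35: train the main system on the augmented supervised examples.
    main = MajorityFeatureLearner().fit(augmented)
    # S37-S310: add the stacked prediction d1' to the input's features.
    x = set(test_feats) | {f"unsup_result={stack.predict(test_feats)}"}
    # S311: estimate the solution with the main system.
    return main.predict(x)


if __name__ == "__main__":
    unsup = [({"noun=犬", "verb=噛む", "particle=?"}, "が"),
             ({"noun=ヘビ", "verb=噛む", "particle=?"}, "が")]
    sup = [({"noun=犬", "verb=噛む", "particle=に"}, "が")]
    print(run_stacking(unsup, sup, {"noun=ヘビ", "verb=噛む", "particle=に"}))
```

In general stacking setups, the stacked predictions on the main training data are often produced by cross-validation to avoid overconfident features. That concern is smaller here, because the stacked system is trained on separate, unsupervised data rather than on the supervised examples it later annotates.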
【０１４５】
以下に、具体的な処理を例として言語解析処理システム５００の処理をより詳細に説明する。第１の具体例として、言語解析処理システム５００が受け身文・使役文から能動文への変換処理における変換後格助詞の推定を行う場合の処理例を示す。
【０１４６】
言語解析処理システム５００のスタック用教師なし学習処理システム１０１０では、予め受け身文・使役文から能動文への変換処理において変換すべき格助詞（推定すべき格助詞）を問題表現として記憶しておく。そして、文データベース５から取り出した文が「犬が噛む」であるときには、問題表現相当部として「が」を抽出して解（分類先）とし、文を「犬＜？＞噛む。」に変形して問題（文脈）とし、
事例（問題⇒解）：「犬＜？＞噛む。」⇒「が」
を記憶する。さらに、この事例から以下のような素性の集合を抽出する。
【０１４７】
・推定すべき格にある体言ｎ＝犬、
・推定すべき格が修飾する用言ｖ＝噛む、
・体言と用言の間の元の（変換前）格助詞＝？（不明）
そして、この素性の集合の場合には変換後格助詞は「が」になりやすいと学習し、その学習結果を記憶する。
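[0148]
Likewise, when a sentence taken from the sentence database 5 is 「ヘビが噛む」 ("a snake bites"), the same processing stores
instance (problem ⇒ solution): 「ヘビ＜？＞噛む。」⇒「が」
and the following feature set is extracted from this instance.
[0149]
・noun n in the case to be estimated = ヘビ (snake),
・predicate v modified by the case to be estimated = 噛む (bite),
・original (pre-conversion) case particle between the noun and the predicate = ? (unknown)
For this feature set as well, the system learns that the post-conversion case particle tends to be 「が」, and stores the learning result.
[0150]
The solution-feature pair extraction unit 501 then takes from the solution database 6
instance (problem ⇒ solution): 「犬に噛まれる。」 ("to be bitten by a dog") ⇒「が」
and, for each instance, extracts a pair of the solution 「が」 and the following feature set.
[0151]
・noun n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between noun n and predicate v = に
The first feature adding unit 511 then takes only the feature set from the extracted pair of solution and feature set and passes it to the stack unsupervised learning processing system 1010. The stack unsupervised learning processing system 1010 refers to the learning result stored in advance, estimates which solution the received feature set is likely to yield, and returns the estimated solution d₁ = 「が」 to the first feature adding unit 511.
[0152]
Next, the first feature adding unit 511 adds the returned solution d₁ to the original feature set as a feature, giving the following feature set.
[0153]
・noun n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between noun n and predicate v = に,
・unsupervised learning analysis result = が (solution d₁)
The machine learning unit 502 then learns, from the pairs of a solution and a feature set including solution d₁, which solutions are likely for which features, and stores the learning result in the learning result database 503.
[0154]
A sentence for which a solution is to be obtained is then input to the feature extraction unit 504, which extracts a feature set from the input sentence 3. For example, when the input sentence 3 is 「ヘビに噛まれる。」 ("to be bitten by a snake"), the following feature set is extracted and passed to the second feature adding unit 512.
[0155]
・noun n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between noun n and predicate v = に
The second feature adding unit 512 passes the received feature set to the stack unsupervised learning processing system 1010, which refers to the learning result stored in advance, estimates which solution is likely for the received feature set, and returns the estimated solution d₁' = 「が」 to the second feature adding unit 512.
[0156]
The second feature adding unit 512 adds the returned solution d₁' to the original feature set as a feature, giving, for example, the following feature set.
[0157]
・noun n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between noun n and predicate v = に,
・unsupervised learning analysis result = が (solution d₁')
The feature set including solution d₁' is then passed to the solution estimation processing unit 505, which refers to the learning result stored in the learning result database 503, estimates which solution is likely for the passed feature set, and outputs the estimated solution 4.
[0158]
Here, the output is the case particle 「が」, estimated by referring to the supervised learning result on the basis of the feature set to which the analysis result 「が」 returned by the stack unsupervised learning processing system 1010 has been added.
[0159]
In this way, the machine learning unit 502 performs machine learning using feature sets in which "unsupervised learning analysis result = d₁" has been added to the feature sets extracted from the supervised data (instances) in the solution database 6. Because these feature sets carry more feature information than those extracted from the supervised data alone, machine learning can be more accurate than when only supervised data is used. It can also be more accurate than learning from unsupervised data alone, which is plentiful but carries little feature information, because the feature sets are richer.
[0160]
Furthermore, the solution estimation processing unit 505 matches the feature set extracted from the input sentence 3 against a highly accurate learning result obtained from instances with rich feature sets. Compared with feature sets that do not include "unsupervised learning analysis result = d₁'", the feature sets become more similar to one another, and the estimation accuracy improves accordingly.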
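[0161]
As a second example, when the meaning of a sentence is represented by deep cases or the like, the language analysis processing system 500 estimates the surface case to be supplied when generating that sentence.
[0162]
For example, the meaning of a sentence can be expressed with deep cases as follows.
[0163]
Sentence: 「りんご＜←ｏｂｊ＞食べる」
In this sentence, 「りんご」 (apple) is the object of 「食べる」 (eat), and the two are linked by the deep objective case (denoted ＜←ｏｂｊ＞).
[0164]
Sentence generation outputs the generated sentence 「りんごを食べる」 ("eat an apple") from this source sentence, which requires generating the case particle 「を」 corresponding to ＜←ｏｂｊ＞. The problem structure (problem ⇒ case) given in this processing is shown below.
[0165]
Problem (problem ⇒ case):
「りんご＜←ｏｂｊ＞食べる」⇒「を」
The stack unsupervised learning processing system 1010 of the language analysis processing system 500 stores the given deep case as the problem expression. When a sentence taken from the sentence database 5 is 「りんごを食べる。」, the stack unsupervised learning processing system 1010 treats the case particle 「を」 as the problem-expression-equivalent part and replaces it, extracts 「を」 as the solution, and stores the following instance as unsupervised data, whose problem is the sentence obtained by converting the problem-expression-equivalent part.
[0166]
Instance (problem ⇒ solution):
「りんご＜？＞食べる」⇒「を」
Pairs of a solution and a feature set are then extracted from this instance. Here, the feature set is as follows.
[0167]
・noun n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = ? (unknown)
The system then learns which solution is likely for which feature set and stores the learning result. For example, it learns that the above feature set tends to give "solution = を".
[0168]
Suppose also that the sentence 「みかんを食べる」 ("eat a mandarin orange") is taken from the sentence database 5. In this case, the following instance becomes unsupervised data.
[0169]
Instance (problem ⇒ solution):
「みかん＜？＞食べる」⇒「を」
Pairs of a solution and a feature set are then extracted from this instance. Here, the feature set is as follows.
[0170]
・noun n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = ? (unknown)
In case estimation for sentence generation, too, the unsupervised data carries less feature information than ordinary supervised data, but because a huge number of sentences can serve as unsupervised data, a large amount of it can be prepared.
[0171]
The system then learns which solution is likely for which feature set and stores the learning result. In this case too, it learns that "solution = を" is likely.
[0172]
Suppose the solution-feature pair extraction unit 501 then takes the following instance from the solution database 6.
[0173]
Instance: 「りんご＜←ｏｂｊ＞食べる」⇒「を」
A pair of a solution and a feature set is then extracted from this instance. The following feature set is extracted.
[0174]
・noun n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = obj
The first feature adding unit 511 passes the extracted feature set to the stack unsupervised learning processing system 1010, which, based on the stored learning result, estimates which solution is likely for the received feature set and returns the estimated solution d₁ = 「を」 to the first feature adding unit 511. The first feature adding unit 511 then adds the returned solution d₁ to the feature set, giving the following feature set.
[0175]
・noun n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = obj,
・unsupervised learning analysis result = を (solution d₁)
The machine learning unit 502 then learns which solution is likely for this feature set. Because the feature set contains "unsupervised learning analysis result = を (solution d₁)", derived from the solution d₁ obtained from the stack unsupervised learning processing system 1010, the system learns that 「を」 is the solution when the following features are present:
・noun n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = obj,
・unsupervised learning analysis result = を (solution d₁)
This learning result is stored in the learning result database 503.
[0176]
When the sentence 「みかん＜←ｏｂｊ＞食べる」 is then input to the feature extraction unit 504, the feature extraction unit 504 extracts the following feature set from the input sentence 3 and passes it to the second feature adding unit 512.
[0177]
・noun n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = obj
When the second feature adding unit 512 passes this feature set to the stack unsupervised learning processing system 1010, that system refers to the stored learning result, estimates the solution d₁' = 「を」 that is likely for the received feature set, and returns it to the second feature adding unit 512.
[0178]
The second feature adding unit 512 passes the following feature set, obtained by adding solution d₁' to the original feature set, to the solution estimation processing unit 505.
[0179]
・noun n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between noun n and predicate v = obj,
・unsupervised learning analysis result = を (solution d₁')
The solution estimation processing unit 505 estimates which solution is likely for this feature set. Because the feature set stored in the learning result and the feature set extracted from the input sentence 3 are highly similar, 「を」, the solution in the learning result, is estimated correctly. The case particle 「を」 to be generated is then output as the estimated solution 4.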
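[0180]
Next, as a third example, the language analysis processing system 500 completes an elided verb expression. For example, the sentence 「そんなにうまくいくとは。」 (roughly "That it would go so well...") is regarded as an expression whose sentence-final verb part has been omitted, and the omitted verb part 「思えない」 ("(I) can't believe") is supplied.
[0181]
In this case, the omitted "verb part to be supplied" is the problem expression, and the "verb part" that completes the elided expression is the solution. The stack unsupervised learning processing system 1010 of the language analysis processing system 500 stores problem expression information in advance in order to extract such problem expressions.
[0182]
When a sentence taken from the sentence database 5 is 「そんなにうまくいくとは思えない。」 ("I can't believe it would go that well."), the sentence-final verb part is treated as the problem-expression-equivalent part and replaced, the sentence-final verb part 「思えない」 is extracted as the solution, and the following instance, whose problem is the sentence obtained by converting the problem-expression-equivalent part, is stored as unsupervised data.
[0183]
Instance (problem ⇒ solution):
「そんなにうまくいくとは＜？＞」⇒「思えない」
Pairs of a solution and a feature set are then extracted from this instance. Here, the feature set is as follows.
[0184]
・「は」,
・「とは」,
・「くとは」,
・「いくとは」,
…,
・「そんなにうまくいくとは思えない」
The system then learns which solution is likely for which feature set and stores the learning result. For example, it learns that the above feature set tends to give "solution = 思えない".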
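The string features in these lists appear to be the suffixes of the text that precedes the elided verb (the unsupervised list additionally ends with the full sentence including 「思えない」, which this sketch leaves out). A minimal sketch under that reading; the function and variable names are illustrative. It also shows why the later input 「そううまくいくとは。」 can match what was learned: the two contexts share every suffix up to 「うまくいくとは」.

```python
def suffix_features(context):
    """All suffixes of the text before the elided verb, shortest first."""
    return [context[i:] for i in range(len(context) - 1, -1, -1)]


train_ctx = "そんなにうまくいくとは"   # context of the training instance
input_ctx = "そううまくいくとは"       # context of the later input sentence
shared = set(suffix_features(train_ctx)) & set(suffix_features(input_ctx))
```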
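[0185]
The solution-feature pair extraction unit 501 then takes from the solution database 6
instance: 「そんなにうまくいくとは。」⇒「思えない」
and extracts a pair of a solution and a feature set from it. Here, the feature set consists of the following features.
[0186]
・「は」,
・「とは」,
・「くとは」,
・「いくとは」,
…,
・「そんなにうまくいくとは」
・「そんなにうまくいくとは思えない」
The first feature adding unit 511 passes the extracted feature set to the stack unsupervised learning processing system 1010.
[0187]
The stack unsupervised learning processing system 1010, based on the stored learning result, estimates which solution is likely for the received feature set and returns the estimated solution d₁ = 「思えない」 to the first feature adding unit 511.
[0188]
The first feature adding unit 511 then adds the returned solution d₁ to the feature set, giving the following feature set.
[0189]
・「は」,
・「とは」,
・「くとは」,
・「いくとは」,
…,
・「そんなにうまくいくとは」
・「そんなにうまくいくとは思えない」
・unsupervised learning analysis result = 思えない (solution d₁)
The machine learning unit 502 then learns which solution is likely for this feature set and stores the learning result in the learning result database 503.
[0190]
When the sentence 「そううまくいくとは。」 ("That it would go that well...") is then input to the feature extraction unit 504, the feature extraction unit 504 extracts the following feature set from the input sentence 3 and passes it to the second feature adding unit 512.
[0191]
・「は」,
・「とは」,
・「くとは」,
・「いくとは」,
…,
・「そううまくいくとは」
When the second feature adding unit 512 passes this feature set to the stack unsupervised learning processing system 1010, that system refers to the stored learning result, estimates the solution d₁' = 「思えない」 that is likely for the received feature set, and returns it to the second feature adding unit 512.
[0192]
The second feature adding unit 512 passes the following feature set, obtained by adding solution d₁' to the original feature set, to the solution estimation processing unit 505.
[0193]
・「は」,
・「とは」,
・「くとは」,
・「いくとは」,
…,
・「そううまくいくとは」
・unsupervised learning analysis result = 思えない (solution d₁')
The solution estimation processing unit 505 estimates which solution is likely for this feature set and outputs the omitted verb part 「思えない」 as the estimated solution 4.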
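[0194]
FIG. 16 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 540 has the same processing means as the language analysis processing system 500, but includes a stack supervised/unsupervised learning processing system 1020 in place of the stack unsupervised learning processing system 1010.
[0195]
The stack supervised/unsupervised learning processing system 1020 consists of the same processing means as the stack unsupervised learning processing system 1010 plus the solution database 2. For the language analysis processing, it extracts feature sets both from the unsupervised data generated from the sentence database 5 and from the instances (supervised data) in the solution database 2, learns from the extracted features which solution (analysis result) is likely for which feature set, and stores the learning result. It then estimates from the stored learning result which solution (analysis result) is likely for a feature set received from the first feature adding unit 511 or the second feature adding unit 512, and returns the estimated solution d₂ to the first feature adding unit 511 or solution d₂' to the second feature adding unit 512.
[0196]
The first feature adding unit 511 of the language analysis processing system 540 receives the solution d₂ returned from the stack supervised/unsupervised learning processing system 1020 and adds "supervised/unsupervised learning analysis result = d₂" to the original feature set as a feature. Likewise, the second feature adding unit 512 of the language analysis processing system 540 receives the solution d₂' returned from the stack supervised/unsupervised learning processing system 1020 and adds "supervised/unsupervised learning analysis result = d₂'" to the feature set as a feature.
[0197]
FIG. 17 shows another configuration example of the language analysis processing system in the fourth embodiment.
[0198]
The language analysis processing system 550 outputs the analysis result of language analysis processing for a given problem. It consists of a CPU and memory and includes a feature-solution pair/feature-solution candidate pair extraction unit 561, a machine learning unit 562, a learning result database 563, a feature-solution candidate extraction unit 564, a solution estimation processing unit 565, a stack unsupervised learning processing system 1030, a first feature adding unit 521, a second feature adding unit 522, the sentence database 5, and the solution database 6.
[0199]
The feature-solution pair/feature-solution candidate pair extraction unit 561, machine learning unit 562, learning result database 563, feature-solution candidate extraction unit 564, and solution estimation processing unit 565 perform substantially the same processing as, respectively, the feature-solution pair/feature-solution candidate pair extraction unit 161, machine learning unit 162, learning result database 163, feature-solution candidate extraction unit 170, and solution estimation processing unit 171 of the sentence conversion processing system 150.
[0200]
For the language analysis processing, the stack unsupervised learning processing system 1030 extracts pairs of a solution or solution candidate and a feature set from the unsupervised data generated from the sentence database 5, learns by a machine learning method the probability that each such pair is a positive example or a negative example, and stores the learning result. Referring to this learning result, it obtains the positive-example or negative-example probability for each pair of a solution or solution candidate and a feature set received from the first feature adding unit 521 or the second feature adding unit 522, estimates the solution candidate with the highest positive-example probability as the solution (analysis result), and returns the estimated solution d₃ to the first feature adding unit 521 or solution d₃' to the second feature adding unit 522.
[0201]
As solutions d₃ and d₃', the stack unsupervised learning processing system 1030 can output the solution candidate estimated as the solution together with information such as whether that solution is a positive or negative example and the probability of its being a positive or negative example.
[0202]
The stack unsupervised learning processing system 1030 includes processing means configured like the sentence conversion processing system 250 shown in FIG. 10, namely a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair/feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, and a solution estimation processing unit 171 (not shown), and outputs the analysis result of language analysis processing for a given problem.
[0203]
The first feature adding unit 521 passes each pair of a solution or solution candidate and a feature set received from the feature-solution pair/feature-solution candidate pair extraction unit 561 to the stack unsupervised learning processing system 1030, receives the solution d₃ returned from it, and adds "unsupervised learning analysis result = solution d₃" to the original feature set as a feature.
[0204]
The second feature adding unit 522 passes each pair of a solution candidate and a feature set received from the feature-solution candidate extraction unit 564 to the stack unsupervised learning processing system 1030, receives the solution d₃' returned from it, and adds "unsupervised learning analysis result = solution d₃'" to the original feature set as a feature.
[0205]
FIGS. 18 and 19 show the processing flow of the language analysis processing system 550.
[0206]
Step S40: The stack unsupervised learning processing system 1030 takes a simple sentence stored in the sentence database 5, extracts the problem-expression-equivalent part from it by referring to the problem expression information and makes that part the solution, converts the problem-expression-equivalent part into the problem structure by referring to the semantic analysis information, and stores, as unsupervised data, instances with a "problem-solution" structure whose problem is the converted sentence. It then extracts, for each instance, pairs of a solution or solution candidate and a feature set, learns by a machine learning method the probability that each pair is a positive or negative example, and stores the learning result.
[0207]
Step S41: The feature-solution pair/feature-solution candidate pair extraction unit 561 then takes instances from the solution database 6 and extracts, for each instance, pairs of a solution or solution candidate and a feature set.
[0208]
Step S42: The first feature adding unit 521 passes the pairs of a solution or solution candidate and a feature set to the stack unsupervised learning processing system 1030.
[0209]
Step S43: The stack unsupervised learning processing system 1030 refers to the learning result stored in advance, obtains the positive-example or negative-example probability for each received pair, estimates the solution candidate with the highest positive-example probability as solution d₃, and returns d₃ to the first feature adding unit 521.
[0210]
Step S44: The first feature adding unit 521 adds "unsupervised learning analysis result = solution d₃", derived from the returned solution d₃, to the original feature set as a feature. When d₃ contains, in addition to the estimated solution candidate, information such as whether it is a positive or negative example or the probability of its being so, part or all of that information may be added to the feature set. For example, one or more features such as "unsupervised learning analysis result = estimated solution candidate (solution d₃)", "unsupervised learning analysis result = positive/negative example (solution d₃)", or "unsupervised learning analysis result = probability of positive/negative example (solution d₃)" are added to the original feature set.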
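Step S44 allows some or all of the information in d₃ to become features. A minimal sketch, assuming d₃ carries a candidate, an optional positive/negative label, and an optional probability; the `StackResult` type, the feature names, and the binning of the probability into tenths are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackResult:
    """Hypothetical container for solution d3 returned by system 1030."""
    candidate: str
    is_positive: Optional[bool] = None
    prob_positive: Optional[float] = None


def stack_result_features(d, prefix="unsup_result"):
    """Turn d3 into one or more features, as step S44 permits."""
    feats = {f"{prefix}={d.candidate}"}
    if d.is_positive is not None:
        feats.add(f"{prefix}_label={'pos' if d.is_positive else 'neg'}")
    if d.prob_positive is not None:
        # Assumption: bin the probability into tenths for a discrete learner.
        bucket = min(int(d.prob_positive * 10), 9) / 10
        feats.add(f"{prefix}_prob>={bucket:.1f}")
    return feats
```

Binning is only one option; a learner that accepts real-valued features, such as an SVM, could take the probability directly.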
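[0211]
Steps S41 to S44 are performed for every pair of a solution or solution candidate and a feature set.
[0212]
Step S45: From the pairs of a solution or solution candidate and a feature set that includes solution d₃, the machine learning unit 562 learns by a machine learning method the probability that each pair is a positive or negative example, and stores the learning result in the learning result database 563.
[0213]
Step S46: A sentence for which a solution is to be obtained is input to the feature-solution candidate extraction unit 564.
[0214]
Step S47: The feature-solution candidate extraction unit 564 extracts pairs of a solution candidate and a feature set from the input sentence 3.
[0215]
Step S48: The second feature adding unit 522 passes the received pairs of a solution candidate and a feature set to the stack unsupervised learning processing system 1030.
[0216]
Step S49: The stack unsupervised learning processing system 1030 refers to the learning result stored in advance, obtains the positive-example or negative-example probability for each received pair, estimates the solution candidate with the highest positive-example probability as solution d₃', and returns d₃' to the second feature adding unit 522.
[0217]
Step S410: The second feature adding unit 522 adds "unsupervised learning analysis result = solution d₃'", derived from the returned solution d₃', to the original feature set as a feature.
[0218]
Step S411: The solution estimation processing unit 565 refers to the learning result stored in the learning result database 563 and obtains, for each passed pair of a solution candidate and a feature set, the probability of its being a positive or negative example. It computes this probability for all solution candidates and outputs the candidate with the highest positive-example probability as the solution 4.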
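Step S411 reduces to an argmax over the candidates. The sketch assumes a callable that returns the probability that a (candidate, feature set) pair is a positive example; `toy_prob_positive` and its weight table are hypothetical placeholders for a trained model, not the patent's learner.

```python
import math


def best_candidate(candidates, features, prob_positive):
    """Step S411: score every candidate and return the most probable one.

    prob_positive(candidate, features) stands in for the learned model's
    probability that the (candidate, features) pair is a positive example.
    """
    return max(candidates, key=lambda c: prob_positive(c, features))


# Hypothetical weights; a real system would learn these from data.
WEIGHTS = {("が", "orig=に"): 2.0, ("を", "orig=を"): 2.0}


def toy_prob_positive(candidate, features):
    z = sum(WEIGHTS.get((candidate, f), 0.0) for f in features) - 1.0
    return 1.0 / (1.0 + math.exp(-z))  # logistic link
```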
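[0219]
FIG. 20 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 580 has the same processing means as the language analysis processing system 550, but includes a stack supervised/unsupervised learning processing system 1040 in place of the stack unsupervised learning processing system 1030.
[0220]
The stack supervised/unsupervised learning processing system 1040 consists of the same processing means as the stack unsupervised learning processing system 1030 plus the solution database 2. For the language analysis processing, it extracts pairs of a solution or solution candidate and a feature set from the unsupervised data generated from the sentence database 5, learns by a machine learning method the probability that each pair is a positive or negative example, and stores the learning result. Referring to this learning result, it obtains the positive-example or negative-example probability for each pair received from the first feature adding unit 521 or the second feature adding unit 522, estimates the solution candidate with the highest positive-example probability as the solution (analysis result), and returns the estimated solution d₄ to the first feature adding unit 521 or solution d₄' to the second feature adding unit 522.
[0221]
As solutions d₄ and d₄', the stack supervised/unsupervised learning processing system 1040 can output the solution candidate estimated as the solution together with information such as whether that solution is a positive or negative example and the probability of its being so.
[0222]
The first feature adding unit 521 of the language analysis processing system 580 receives the solution d₄ returned from the stack supervised/unsupervised learning processing system 1040 and adds "supervised/unsupervised learning analysis result = d₄" to the original feature set as a feature. Likewise, the second feature adding unit 522 of the language analysis processing system 580 receives the returned solution d₄' and adds "supervised/unsupervised learning analysis result = d₄'" to the original feature set as a feature.
[0223]
FIG. 21 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 600 has the same processing means as the language analysis processing system 500 and additionally includes the stack supervised/unsupervised learning processing system 1020.
[0224]
The first feature adding unit 611 of the language analysis processing system 600 passes only the feature set of each pair of a solution and a feature set received from the solution-feature pair extraction unit 501 to both the stack unsupervised learning processing system 1010 and the stack supervised/unsupervised learning processing system 1020, and receives the solution d₁ returned by the former and the solution d₂ returned by the latter. It then adds "unsupervised learning analysis result = d₁" and "supervised/unsupervised learning analysis result = d₂" to the original feature set as features.
[0225]
Likewise, the second feature adding unit 612 of the language analysis processing system 600 passes the feature set received from the feature extraction unit 504 to both the stack unsupervised learning processing system 1010 and the stack supervised/unsupervised learning processing system 1020, receives the solution d₁' returned by the former and the solution d₂' returned by the latter, and adds "unsupervised learning analysis result = d₁'" and "supervised/unsupervised learning analysis result = d₂'" to the original feature set as features.
[0226]
FIG. 22 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 650 has the same processing means as the language analysis processing system 550 and additionally includes the stack supervised/unsupervised learning processing system 1040.
[0227]
The first feature adding unit 621 of the language analysis processing system 650 passes each pair of a solution or solution candidate and a feature set received from the feature-solution pair/feature-solution candidate pair extraction unit 561 to both the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040, and receives the solution d₃ returned by the former and the solution d₄ returned by the latter. It then adds "unsupervised learning analysis result = d₃" and "supervised/unsupervised learning analysis result = d₄" to the original feature set as features.
[0228]
Likewise, the second feature adding unit 622 of the language analysis processing system 650 passes each pair of a solution candidate and a feature set received from the feature-solution candidate extraction unit 564 to both the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040, receives the solution d₃' returned by the former and the solution d₄' returned by the latter, and adds "unsupervised learning analysis result = d₃'" and "supervised/unsupervised learning analysis result = d₄'" to the original feature set as features.
[0229]
As solutions d₃, d₃', d₄, and d₄', the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040 can output the solution candidate estimated as the solution together with information such as whether it is a positive or negative example and the probability of its being so. In that case, part or all of the information contained in the received solution is added to the feature set. For example, one or more features such as "unsupervised learning analysis result = estimated solution candidate", "unsupervised learning analysis result = positive/negative example", or "unsupervised learning analysis result = probability of positive/negative example" are added to the original feature set.
[0230]
As already explained, unsupervised data has properties different from supervised data, so simply adding unsupervised data to supervised data for machine learning may not be enough to improve processing accuracy. Combining machine learning on unsupervised data with machine learning on supervised data by a stacking technique, as in this embodiment, is thought to make appropriate use of the advantages of both kinds of learning and to improve the accuracy of the analysis.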
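[0231]
Finally, experimental examples of the conventional method and the method of the present invention are given. The task used was case conversion in sentence conversion from passive and causative sentences to active sentences, and the machine learning method used was the support vector machine method. The Kyoto University Corpus was used as supervised data, and all case particles in the active sentences contained in the Kyoto University Corpus (53,157 items) were used as unsupervised data. FIG. 23 shows the distribution of post-conversion case particles in the unsupervised data.
[0232]
The Kyoto University Corpus was also used to evaluate processing accuracy, by 10-fold cross-validation.
[Reference 6: Sadao Kurohashi and Makoto Nagao, Kyoto University Text Corpus Project, Proceedings of the 3rd Annual Meeting of the Association for Natural Language Processing, 1997, pp. 115-118]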
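10-fold cross-validation can be illustrated with a simple index split. The patent does not state how the folds were formed, so the round-robin assignment below is an assumption, and `kfold_indices` is a hypothetical helper. With the 4,671 supervised instances used in the evaluation it gives nine test folds of 467 instances and one of 468, and every instance is tested exactly once.

```python
def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Instances are assigned round-robin (i % k); this split is an
    assumption, not the procedure reported in the patent.
    """
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test
```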
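Case particle conversion experiments were carried out with the following methods.
[0233]
・Supervised learning
・Unsupervised learning
・Supervised/unsupervised learning
・Stacking method 1:
The analysis result of unsupervised learning is added to the features, and then supervised learning is performed.
[0234]
・Stacking method 2:
The analysis result of supervised/unsupervised learning is added to the features, and then supervised learning is performed.
[0235]
・Stacking method 3:
Both the analysis result of unsupervised learning and that of supervised/unsupervised learning are added to the features, and then supervised learning is performed.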
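Stacking methods 1 to 3 differ only in which stack outputs are appended to the feature set before supervised learning. A minimal sketch, assuming each stack learner is exposed as a callable that returns a predicted solution; the prefixes `unsup` and `sup_unsup` and both function names are hypothetical.

```python
def augment(features, stack_models):
    """Add one "<name>_result=<prediction>" feature per stack model.

    stack_models maps a hypothetical prefix to a callable that returns a
    predicted solution for a feature set.
    """
    out = set(features)
    for name, predict in stack_models.items():
        out.add(f"{name}_result={predict(features)}")
    return frozenset(out)


def stacking_config(method, unsup_predict, sup_unsup_predict):
    """Which stack outputs each stacking method feeds forward."""
    configs = {
        1: {"unsup": unsup_predict},
        2: {"sup_unsup": sup_unsup_predict},
        3: {"unsup": unsup_predict, "sup_unsup": sup_unsup_predict},
    }
    return configs[method]
```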
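[0236]
The accuracy results are shown below. Accuracy is the proportion of the 4,671 supervised instances that were answered correctly.
[0237]
・Supervised learning = 89.06%
・Unsupervised learning = 51.15%
・Supervised/unsupervised learning = 87.09%
・Stacking method 1 = 89.47%
・Stacking method 2 = 89.55%
・Stacking method 3 = 89.55%
The accuracy of the supervised learning method was 89.06%. This means that case particle conversion in sentence conversion from passive and causative sentences to active sentences can be achieved with at least this accuracy by using a machine learning method. Since case particle conversion had not previously been done with machine learning, the accuracy shown in this example demonstrates a distinctive effect of the present invention.
[0238]
The accuracy of the unsupervised learning method was extremely low, at 51.15%. This is thought to be largely due to the lack of information about the pre-conversion case particle, which unsupervised data cannot provide.
[0239]
The accuracy of the supervised/unsupervised learning method was also lower than that of the supervised learning method. Because unsupervised data has properties different from supervised data, its use is thought to have lowered accuracy.
[0240]
Every stacking method exceeded the accuracy of the supervised learning method, although the improvement is not large. A statistical test using the binomial test showed that every stacking method differed significantly from supervised learning at the 0.01 significance level. This confirms that the technique of the present invention, which adds the results of unsupervised learning to the features, is effective.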
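The patent does not say exactly how the binomial test was set up. One common form, assumed here, is an exact two-sided sign test on the instances where the two systems disagree: under the null hypothesis each such instance favors either system with probability 0.5. The per-instance disagreement counts are not reported, so the hypothetical `sign_test_p` below takes them as inputs rather than filling them in.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact binomial (sign) test on discordant instances.

    wins: instances only system A got right; losses: only system B did.
    Under H0 each discordant instance is a fair coin flip (p = 0.5).
    """
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A difference is significant at the 0.01 level when the returned p-value is below 0.01.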
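[0241]
For comparison with the accuracy of the "processing using supervised learning" of the present invention, processing by the method described in Non-Patent Document 4 was also carried out as a representative conventional technique.
[0242]
Case conversion by the method of Non-Patent Document 4 achieved an F-measure of 36% (recall 75%, precision 24%). Its accuracy is low because the input sentences contain words that are not in its dictionary. After those undefined words were registered in the dictionary, the F-measure rose to 83% (recall 94%, precision 74%). Accuracy is given as an F-measure here because the method of Non-Patent Document 4 outputs multiple conversion results for a single input. These results show, as already pointed out, that the inadequacy of existing case frame dictionaries has a large effect.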
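The F-measures quoted above are consistent with the balanced F-measure, the harmonic mean of precision P and recall R, F = 2PR/(P + R): for P = 24% and R = 75% it gives about 36%, and for P = 74% and R = 94% about 83%. A short check (the function name is illustrative):

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


before = f_measure(0.24, 0.75)  # before registering unknown words
after = f_measure(0.74, 0.94)   # after registering unknown words
```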
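[0243]
Because the method of Non-Patent Document 4 produces results per sentence, the results of the present invention were also tallied per sentence, giving a sentence-level accuracy of 85.58%. Here a sentence means a sentence with a single predicate; sentences made up of several clauses, such as complex sentences, were split into single-predicate sentences before accuracy was computed.
[0244]
The accuracy of the present invention is comparable to that of the method of Non-Patent Document 4 after unknown words and the like have been registered in its dictionary. The present invention reaches about 85% accuracy without any additional dictionary registration of the information being analyzed. This shows that the present invention can process with higher accuracy than the conventional technique.
[0245]
Although the present invention has been described above by way of its embodiments, various modifications are naturally possible within the scope of its gist.
[0246]
The embodiments mainly dealt with case particle conversion when converting passive and causative sentences into active sentences. However, by making the classification targets of the machine learning unit the case particles of passive and causative sentences instead of those of active sentences, the present invention can also be applied to conversion from active sentences into passive and causative sentences.
[0247]
Besides the analyses described as language analysis processing in the embodiments, the present invention can also be applied to various other analyses, such as anaphora resolution for demonstratives, pronouns, and zero pronouns, indirect anaphora resolution, semantic analysis of 「AのB」 ("B of A") phrases, and metonymy analysis, as well as to case particle generation in sentence generation and in translation.
[0248]
Each means, function, or element of the present invention can be implemented as a processing program read and executed by a computer. The processing program implementing the present invention can be stored on any suitable computer-readable recording medium, such as portable media, semiconductor memory, or a hard disk, and can be provided recorded on such media or transmitted and received over various communication networks via a communication interface.
[0249]
[Effects of the Invention]
As described above, the present invention realizes a new technique in which the analysis results of machine learning on unsupervised data are added to the features, and machine learning is then performed on supervised data carrying the added features. This makes it possible to perform machine learning that exploits the advantages of both unsupervised and supervised data, and thus to achieve more accurate sentence conversion processing.
[0250]
In particular, the present invention can be applied to a very wide range of problems that involve phrase generation, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition, which makes it possible to build highly practical language analysis processing systems.
[0251]
The present invention also realizes a new technique that uses machine learning to convert case particles when converting Japanese passive and causative sentences into active sentences, making it possible to estimate post-conversion case particles more accurately than before.
[0252]
Conversion from passive and causative sentences into active sentences using the present invention is useful in many areas of computer-based natural language processing, such as sentence generation, paraphrasing, knowledge acquisition systems, and question answering systems.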
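[Brief Description of the Drawings]
[FIG. 1] A diagram showing a configuration example of the sentence conversion processing system in the first embodiment.
[FIG. 2] A diagram showing the processing flow of the sentence conversion processing system in the first embodiment.
[FIG. 3] A diagram showing an example of instances stored in a tagged corpus.
[FIG. 4] A diagram showing the concept of margin maximization in the support vector machine method.
[FIG. 5] A diagram showing another configuration example of the sentence conversion processing system in the first embodiment.
[FIG. 6] A diagram showing the processing flow of the sentence conversion processing system with the other configuration in the first embodiment.
[FIG. 7] A diagram for explaining unsupervised data.
[FIG. 8] A diagram showing a configuration example of the sentence conversion processing system in the second embodiment.
[FIG. 9] A diagram showing the processing flow of unsupervised data generation.
[FIG. 10] A diagram showing another configuration example of the sentence conversion processing system in the second embodiment.
[FIG. 11] A diagram showing a configuration example of the sentence conversion processing system in the third embodiment.
[FIG. 12] A diagram showing another configuration example of the sentence conversion processing system in the third embodiment.
[FIG. 13] A diagram showing a configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 14] A diagram showing the processing flow of the language analysis processing system in the fourth embodiment.
[FIG. 15] A diagram showing the processing flow of the language analysis processing system in the fourth embodiment.
[FIG. 16] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 17] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 18] A diagram showing the processing flow of the language analysis processing system with another configuration in the fourth embodiment.
[FIG. 19] A diagram showing the processing flow of the language analysis processing system with another configuration in the fourth embodiment.
[FIG. 20] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 21] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 22] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 23] A diagram showing the distribution of post-conversion case particles in the unsupervised data used in the experimental example.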
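[Explanation of Reference Numerals]
100, 150, 200, 250, 300, 350 Sentence conversion processing system
101, 501 Solution-feature pair extraction unit
102, 162, 502, 562 Machine learning unit
103, 163, 503, 563 Learning result database
110, 504 Feature extraction unit
111, 171, 505, 565 Solution estimation processing unit
161, 561 Feature-solution pair/feature-solution candidate pair extraction unit
170, 564 Feature-solution candidate extraction unit
201 Problem-expression-equivalent part extraction unit
202 Problem expression information storage unit
203 Semantic analysis information storage unit
204 Problem structure conversion unit
205 Unsupervised data storage unit
500, 540, 550, 580, 600, 650 Language analysis processing system
511, 521, 611, 621 First feature adding unit
512, 522, 612, 622 Second feature adding unit
1010, 1030 Stack unsupervised learning processing system
1020, 1040 Stack supervised/unsupervised learning processing system
2, 6 Solution database
3 Input sentence
4 Solution
5 Sentence database
[0001]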
[Technical Field of the Invention]
The present invention relates to natural language processing technology implemented on a computer. More specifically, it relates to a language analysis processing method that applies a machine learning method to digitized sentences, and to a processing system that implements the method.
[0002]
In particular, the present invention can be applied to language processing covering a very wide range of problems that involve generating phrases, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition.
[0003]
[Prior art]
In language analysis, semantic analysis, the stage that follows morphological and syntactic analysis, is becoming increasingly important. In particular, for case analysis and ellipsis analysis, which are central parts of semantic analysis, there is demand both for reducing the labor involved and for improving processing accuracy.
[0004]
Case analysis is the process of restoring a surface case that has been hidden because part of a sentence was topicalized or turned into an adnominal (relative-clause) modifier. For example, in the sentence 「りんごは食べた。」 ("The apple, (I) ate."), 「りんごは」 is topicalized; restored to its surface case it is 「りんごを」. Case analysis therefore analyzes the 「は」 of 「りんごは」 in this sentence as the wo-case. Likewise, in 「昨日買った本はもう読んだ。」 ("I have already read the book I bought yesterday."), 「買った本」 ("the book (I) bought") is an adnominal construction; restored to its surface case it is 「本を買った」 ("bought the book"). Here, the adnominal relation in 「買った本」 is analyzed as the wo-case.
[0005]
Ellipsis analysis is the process of restoring a surface case that has been omitted from part of a sentence. For example, in 「みかんを買いました。そして食べました。」 ("I bought a mandarin orange. And (I) ate (it)."), the noun phrase (zero pronoun) omitted from 「そして食べました」 is analyzed as 「みかんを」.
[0006]
To obtain high processing accuracy while reducing the labor of the person doing the processing when such language analysis is implemented on a computer, a method of performing language analysis with machine learning was presented (Non-Patent Document 1).
[0007]
The machine-learning-based language analysis method presented in Non-Patent Document 1 (the non-borrowing machine learning method) has the following advantages.
(i) Preparing a corpus with more supervised data can be expected to yield still higher accuracy.
(ii) When a better machine learning method is developed, using it can be expected to yield still higher accuracy.
[0008]
Non-Patent Document 1 also presented a language analysis method using a borrowing machine learning method. The borrowing machine learning method uses teacher signals generated from data to which the information to be analyzed has not been added (hereinafter "unsupervised data"). With this method, a large volume of ordinary digitized sentences can be used as unsupervised data for machine learning, without relying on data to which the information to be analyzed (solution information) has been assigned in advance, manually or otherwise, such as a case frame dictionary. The large number of teacher signals improves learning accuracy, so highly accurate language analysis can be achieved.
[0009]
Non-Patent Document 1 also presented a language analysis method using a combined machine learning method. The combined method performs machine learning using both the teacher signals of ordinary machine learning, that is, data to which the information to be analyzed has been added (hereinafter "supervised data"), and teacher signals generated from unsupervised data. It thereby enables language analysis that exploits both the large number of teacher signals generated from easily obtained unsupervised data and the teacher signals of supervised data, which secure ordinary learning accuracy.
[0010]
Another important problem in natural language processing is conversion from passive or causative sentences to active sentences. This conversion is useful in many research fields, such as sentence generation, paraphrasing, sentence simplification and language-use support, knowledge acquisition and information extraction from natural language text, and question answering. For example, in a question answering system, if the question is written as an active sentence and the document containing the answer is written as a passive sentence, the question and the answer sentence have different structures, and retrieving the answer can be difficult. Converting passive or causative sentences into active sentences solves this kind of problem.
[0011]
When Japanese passive and causative sentences are converted into active sentences, the case particles to be used after conversion must be estimated. For example, converting the passive sentence "I was bitten by a dog" into the active sentence "A dog bit me" requires estimating that the particle ni attached to "dog" becomes ga and that the particle attached to "I" becomes wo. Likewise, converting the causative sentence "He made her cut her hair" into the active sentence "She cut her hair" requires estimating that the particle ni attached to "her" becomes ga and that the particle wo attached to "hair" stays unchanged. However, converting case particles from passive or causative sentences to active sentences is not easy to automate, because the post-conversion particles depend on the verb and on how it is used.
[0012]
Several conventional methods exist for case particle conversion, such as those of Non-Patent Documents 2 to 4 below. These methods handle case particle conversion with a case frame dictionary that describes how each case particle should be converted.
[0013]
[Non-Patent Document 1]
Maki Murata,
Japanese case analysis using machine learning techniques: teacher-signal borrowing, non-borrowing, and combined types,
IEICE Technical Report NLC-2001-24
July 17, 2001
[Non-Patent Document 2]
Information-technology Promotion Agency, Japan (IPA), Technical Center,
Computer Dictionary of Japanese Basic Verbs, IPAL (Basic Verbs),
1987
[Non-Patent Document 3]
Sadao Kurohashi and Makoto Nagao,
A Method of Case Structure Analysis for Japanese Sentences based on Examples in Case Frame Dictionary,
IEICE Transactions of Information and Systems, Vol.E77-D, No.2, 1994
[Non-Patent Document 4]
Keiko Kondo, Satoshi Sato, Manabu Okumura,
Paraphrasing of simple sentences by case conversion,
IPSJ Journal, Vol.42, No.3,
2001
[0014]
[Problems to be solved by the invention]
Non-Patent Document 1 improves processing accuracy by applying machine learning to language analysis. Its borrowing and combined machine learning methods are also very effective in that they increase the teacher signals available for machine learning without increasing manual labor.
[0015]
Machine learning is trained to maximize the rate of correct answers on the given teacher data. Unsupervised data differs from supervised data in that it lacks the information to be analyzed.
[0016]
Consequently, machine learning with teacher signals obtained by simply adding unsupervised data to supervised data, as in the combined machine learning method of Non-Patent Document 1, learns to maximize the correct-answer rate over the union of the supervised and unsupervised data. Depending on how the unsupervised data relates to the supervised data, learning accuracy can therefore fall below that of machine learning that maximizes the correct-answer rate on the supervised data alone.
[0017]
In view of this problem of the prior art, a technique is needed that exploits the advantages of both supervised and unsupervised data to achieve highly accurate learning more reliably.
[0018]
Regarding sentence conversion from passive and causative sentences to active sentences, conventional techniques such as those of Non-Patent Documents 2 to 4 require a case frame dictionary that describes, for every verb and every usage of it, how the case particles are to be converted.
[0019]
However, because it is practically difficult to prepare a dictionary covering all verbs and all their usages, conversion based on such a case frame dictionary is inadequate: verbs that are not described in the dictionary, and sentences that use verbs in ways the dictionary does not describe, cannot be converted or are likely to be converted incorrectly.
[0020]
A technique is therefore needed that can process with high accuracy, particularly for conversion from passive and causative sentences to active sentences, without increasing manual labor.
[0021]
An object of the present invention is to provide a processing system that performs language analysis more accurately than language analysis based on the combined machine learning method, in which machine learning uses both supervised and unsupervised data.
[0022]
A further object of the present invention is to provide a sentence conversion processing system that can estimate post-conversion case particles with high accuracy using a machine learning method, particularly in conversion from passive or causative sentences to active sentences.
[0023]
[Means for Solving the Problems]
In order to achieve the above object, the present invention has the following configuration.
[0024]
The present invention is a language analysis processing system that performs language analysis with a main processing system, which performs language analysis using machine learning, and a stack processing system, which provides the main processing system with data to be used in that machine learning, wherein
the stack processing system comprises: 1) sentence data storage means for storing sentence data that is the object of the language analysis and does not include solution information for the problem handled in the machine learning;
problem expression information storage means for storing a problem expression, which is a predetermined sentence expression indicating the problem, and the part corresponding to the problem expression; 2) problem-expression-equivalent part extraction means for extracting, from the sentence data stored in the sentence data storage means, the part that matches the part corresponding to the problem expression, as a problem-expression-equivalent part; 3) problem structure conversion means for creating unsupervised data, a pair of a problem and a solution, in which the problem is the converted sentence obtained by replacing the problem-expression-equivalent part of the sentence data with the problem expression and the solution is the problem-expression-equivalent part; 4) unsupervised data storage means for storing the created unsupervised data; 5) stack solution-feature pair extraction means for extracting, by predetermined analysis processing, features that are predetermined information including at least a character string, a word, or a part of speech from the problem of each piece of unsupervised data stored in the unsupervised data storage means, and generating a pair of the feature set and the solution for each piece of unsupervised data; 6) stack machine learning means for performing machine learning, based on a predetermined machine learning algorithm, on the pairs of feature sets and solutions to determine which solution is likely for which feature set, and storing that knowledge as the learning result in stack learning result data storage means; and 7) stack solution estimation processing means for, upon receiving from the main processing system a feature set of the predetermined information extracted by an extraction process similar to that of the stack solution-feature pair extraction means, estimating the solution likely for that feature set on the basis of the learning result stored in the stack learning result data storage means, and outputting the estimated solution as a stack output solution;
the main processing system comprises: 8) solution data storage means for storing solution data, which is sentence data consisting of a problem and a solution and to which solution information has been assigned for the problem that is the object of the language analysis and is handled in the machine learning; 9) main solution-feature pair extraction means for extracting features that are the predetermined information from the problem of each piece of solution data stored in the solution data storage means, by an extraction process similar to that of the stack solution-feature pair extraction means, and generating a pair of the feature set and the solution for each piece of solution data; 10) first feature adding means for adding to each feature set generated by the main solution-feature pair extraction means, as a feature, the stack output solution estimated and output for that feature set by the stack solution estimation processing means, to form a first feature set; 11) main machine learning means for performing machine learning, based on a predetermined machine learning algorithm, on the pairs of first feature sets and solutions to determine which solution is likely for which feature set, and storing that knowledge as the learning result in main learning result data storage means; 12) feature extraction means for extracting features that are the predetermined information from input sentence data given as the object of the language analysis, by an extraction process similar to that of the stack solution-feature pair extraction means; 13) second feature adding means for adding to the feature set generated by the feature extraction means, as a feature, the stack output solution estimated and output for that feature set by the stack solution estimation processing means, to form a second feature set; and 14) solution estimation processing means for estimating the solution likely for the second feature set on the basis of the learning result stored in the main learning result data storage means,
The predetermined machine learning algorithm is a decision list method, a maximum entropy method, or a support vector machine method, in which:
In the decision list method, the stack machine learning means and the main machine learning means store, as the learning result, a list in which pairs of a feature set and a solution of the unsupervised data are stored as rules in a predetermined priority order; the stack solution estimation processing means and the solution estimation processing means compare the rules stored in the list with the feature set of the input data in descending order of priority and estimate the solution of the rule whose features match as the solution likely for the feature set of the input data; or
In the maximum entropy method, the stack machine learning means and the main machine learning means obtain, from the pairs of feature sets and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the feature sets satisfy a predetermined conditional expression, and store this distribution as the learning result; the stack solution estimation processing means and the solution estimation processing means obtain, based on this distribution, the probability of each classification for the feature set of the input data and estimate the classification with the largest probability as the solution likely for that feature set; or
In the support vector machine method, the stack machine learning means and the main machine learning means obtain a hyperplane by a predetermined support vector machine method using the pairs of feature sets and solutions of the unsupervised data, and store the hyperplane and the classifications of the spaces divided by it as the learning result; the stack solution estimation processing means and the solution estimation processing means determine, based on this hyperplane, to which of the divided spaces the feature set of the input sentence data belongs, and estimate the classification of that space as the solution likely for the feature set of the input sentence data.
[0025]
Further, the stack processing system comprises 15) solution data storage means for storing solution data, which is composed of a problem and a solution and to which solution information is attached for the problem that is the object of the language analysis process and is handled in the machine learning process;
the stack solution-feature pair extraction means also extracts, by the same extraction process, the features that are the predetermined information from the problem of each solution data item stored in the solution data storage means and generates a pair of the feature set and the solution for each solution data item; and the stack machine learning means learns by machine learning processing, from the pairs of feature sets and solutions generated from both the sentence data and the solution data, what kind of solution is likely for what kind of feature set.
[0026]
Furthermore, the present invention is a language analysis processing system that performs predetermined language analysis processing, comprising a main processing system that performs language analysis processing using machine learning processing and a stack processing system that provides the main processing system with data used in its machine learning processing, wherein
The stack processing system comprises: 1) sentence data storage means for storing sentence data that is the object of the language analysis process and does not include solution information for the problem handled in the machine learning process; 2) problem expression information storage means for storing a problem expression, which is a predetermined sentence expression indicating the problem, and the part corresponding to the problem expression; 3) problem-expression-equivalent part extraction means for extracting, from the sentence data stored in the sentence data storage means, the part corresponding to the problem expression as a problem-expression-equivalent part; 4) problem structure conversion means for converting the problem-expression-equivalent part of the sentence data into the problem expression to form a problem, and using the problem-expression-equivalent part as a solution or solution candidate, thereby creating unsupervised data consisting of pairs of a problem and a solution or solution candidate; 5) unsupervised data storage means for storing the created unsupervised data; 6) stack feature-solution pair / feature-solution candidate pair extraction means for extracting, by a predetermined analysis process, features that are predetermined information including at least a character string, a word, or a part of speech from the problem of each unsupervised data item stored in the unsupervised data storage means, and generating, for each unsupervised data item, pairs of the feature set and a solution or solution candidate; 7) stack machine learning means for learning by machine learning processing, based on a predetermined machine learning algorithm, the probability that a pair of a feature set and a solution or solution candidate is a positive example or a negative example (the two predetermined classification destinations), and storing these probabilities as the learning result in stack learning result data storage means; and 8) stack solution estimation processing means that, on receiving from the main processing system pairs of a feature set (the predetermined information extracted by the same extraction process as that performed by the stack feature-solution pair / feature-solution candidate pair extraction means) and a solution or solution candidate, obtains, based on the probabilities stored as the learning result, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and outputs, as the stack output solution, the solution candidate with the largest probability of being a positive example among all solution candidates;
The main processing system comprises: 9) solution data storage means for storing solution data, which is sentence data composed of a problem and a solution and to which solution information is attached for the problem that is the object of the language analysis process and is handled in the machine learning process; 10) main feature-solution pair / feature-solution candidate pair extraction means for extracting, from the problem of each solution data item stored in the solution data storage means, features that are the predetermined information, by an extraction process similar to that performed by the stack feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set and a solution or solution candidate; 11) first feature adding means for adding, as a feature, the stack output solution estimated and output by the stack solution estimation processing means for the pairs generated by the main feature-solution pair / feature-solution candidate pair extraction means to the feature set generated by that means, thereby forming a first feature set; 12) main machine learning means for learning by machine learning processing, based on a predetermined machine learning algorithm, the probability that a pair of the first feature set and a solution or solution candidate is a positive or negative example, and storing these probabilities as the learning result in main learning result data storage means; 13) feature extraction means for extracting, from input sentence data given as the object of the language analysis process, features that are the predetermined information, by an extraction process similar to that performed by the stack feature-solution pair / feature-solution candidate pair extraction means; 14) second feature adding means for adding, as a feature, the stack output solution estimated and output by the stack solution estimation processing means for the pairs of that feature set and solution candidates to the feature set generated by the feature extraction means, thereby forming a second feature set; and 15) solution estimation processing means for obtaining, based on the probabilities stored as the learning result in the main learning result data storage means, the probability that each pair of the second feature set and a solution candidate is a positive or negative example, and estimating the solution candidate with the largest probability of being a positive example among all solution candidates as the solution,
The predetermined machine learning algorithm is a decision list method, a maximum entropy method, or a support vector machine method, in which:
In the decision list method, the stack machine learning means and the main machine learning means store, as the learning result, a list in which pairs of a feature set and a solution of the unsupervised data are stored as rules in a predetermined priority order; the stack solution estimation processing means and the solution estimation processing means compare the rules stored in the list with the feature set of the input data in descending order of priority and estimate the solution of the rule whose features match as the solution likely for the feature set of the input data; or
In the maximum entropy method, the stack machine learning means and the main machine learning means obtain, from the pairs of feature sets and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the feature sets satisfy a predetermined conditional expression, and store this distribution as the learning result; the stack solution estimation processing means and the solution estimation processing means obtain, based on this distribution, the probability of each classification for the feature set of the input data and estimate the classification with the largest probability as the solution likely for that feature set; or
In the support vector machine method, the stack machine learning means and the main machine learning means obtain a hyperplane by a predetermined support vector machine method using the pairs of feature sets and solutions of the unsupervised data, and store the hyperplane and the classifications of the spaces divided by it as the learning result; the stack solution estimation processing means and the solution estimation processing means determine, based on this hyperplane, to which of the divided spaces the feature set of the input sentence data belongs, and estimate the classification of that space as the solution likely for the feature set of the input sentence data.
[0027]
Further, the stack processing system comprises 16) solution data storage means for storing solution data, which is composed of a problem and a solution and to which solution information is attached for the problem that is the object of the language analysis process and is handled in the machine learning process;
the stack feature-solution pair / feature-solution candidate pair extraction means also extracts, by the same extraction process, the features that are the predetermined information from the problem of each solution data item stored in the solution data storage means and generates, for each solution data item, pairs of the feature set and a solution or solution candidate; and the stack machine learning means learns by machine learning processing, for the pairs of feature sets and solutions or solution candidates generated from both the sentence data and the solution data, the probability that each pair is a positive or negative example.
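The positive/negative-example formulation in paragraphs [0026] and [0027] turns each solution candidate into a separate binary decision and then keeps the candidate with the highest probability of being a positive example. A minimal Python sketch of that final selection step follows; the `positive_prob` function, the romanized feature strings such as `particle=ni`, and the toy probability table are hypothetical stand-ins for a trained model, not values taken from this document.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: the trained model is any function returning the
# probability that a (feature set, candidate) pair is a positive example.
PositiveProb = Callable[[List[str], str], float]

def select_solution(features: List[str], candidates: List[str],
                    positive_prob: PositiveProb) -> Tuple[str, Dict[str, float]]:
    """Score every (feature set, candidate) pair and return the candidate whose
    probability of being a positive example is largest, plus all scores."""
    scores = {c: positive_prob(features, c) for c in candidates}
    best = max(candidates, key=lambda c: scores[c])
    return best, scores

# Toy stand-in for a trained model (illustrative numbers, not learned values).
TOY_TABLE = {("ni", "ga"): 0.80, ("ni", "wo"): 0.15, ("ni", "other"): 0.05}

def toy_positive_prob(features: List[str], candidate: str) -> float:
    # Look up the pre-conversion particle feature and score the candidate.
    particle = next(f.split("=", 1)[1] for f in features if f.startswith("particle="))
    return TOY_TABLE.get((particle, candidate), 0.0)

best, scores = select_solution(["noun=inu", "verb=kamu", "particle=ni"],
                               ["ga", "wo", "other"], toy_positive_prob)
print(best)  # ga
```

Keeping the scorer as a plain function separates the binary model from the "pick the maximum over candidates" rule, which is the part the claims describe explicitly.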
[0028]
As described above, the present invention incorporates the analysis result of a machine learning method trained on unsupervised data as a feature of the supervised data, and learns so as to maximize the accuracy rate on the supervised data. This allows machine learning that exploits the respective strengths of unsupervised data and supervised data, which differ in nature, and thereby realizes highly accurate analysis processing.
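The stacking idea summarized in paragraph [0028] is that a model trained on unsupervised data runs first, and its output becomes an extra feature of the model trained on supervised data. The Python sketch below shows only that data flow. The `VoteClassifier` learner, the `stack_out=` feature name, the fallback label `"other"`, and the romanized toy examples are all hypothetical; any of the decision list, maximum entropy, or support vector machine learners named in the claims could take the learner's place.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], str]  # (feature set, solution)

class VoteClassifier:
    """Hypothetical stand-in learner: every feature votes for the solutions it
    co-occurred with in training; the solution with the most votes wins."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, data: List[Example]) -> "VoteClassifier":
        for feats, solution in data:
            for f in feats:
                self.counts[f][solution] += 1
        return self

    def predict(self, feats: FrozenSet[str]) -> str:
        total: Counter = Counter()
        for f in feats:
            total.update(self.counts.get(f, Counter()))
        return total.most_common(1)[0][0] if total else "other"  # assumed fallback

def add_stack_feature(stack: VoteClassifier, feats: FrozenSet[str]) -> FrozenSet[str]:
    # The stack model's output solution becomes one extra feature of the main model.
    return feats | {"stack_out=" + stack.predict(feats)}

def train_stacked(unsupervised: List[Example], supervised: List[Example]):
    stack = VoteClassifier().fit(unsupervised)      # learned from unsupervised data
    main = VoteClassifier().fit(                    # learned from supervised data
        [(add_stack_feature(stack, f), s) for f, s in supervised])
    return stack, main

def predict_stacked(stack: VoteClassifier, main: VoteClassifier,
                    feats: FrozenSet[str]) -> str:
    return main.predict(add_stack_feature(stack, feats))

unsupervised = [(frozenset({"particle=ni", "verb=kamu"}), "ga"),
                (frozenset({"particle=ni", "verb=iu"}), "ga")]
supervised = [(frozenset({"particle=ni", "verb=kamu"}), "ga"),
              (frozenset({"particle=wo", "verb=kiru"}), "wo")]
stack, main = train_stacked(unsupervised, supervised)
print(predict_stacked(stack, main, frozenset({"verb=iu"})))  # ga
```

In the usage lines the verb `iu` never occurs in the supervised data, yet the main model still answers `ga`: the stack model saw `iu` in the unsupervised data and passes that information on through the `stack_out=ga` feature.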
[0029]
Furthermore, the present invention is a sentence conversion processing system that uses machine learning processing to estimate the post-conversion case particles when converting sentence data that is a passive or causative sentence into sentence data of an active sentence, comprising: 1) solution data storage means for storing solution data, which is composed of a problem and a solution, takes sentence data as the problem, and contains solution information for that problem in the conversion process; 2) solution-feature pair extraction means for extracting, by a predetermined analysis process, features that are predetermined information including at least a character string, a word, or a part of speech from the problem of each solution data item stored in the solution data storage means, and generating a pair of the feature set and the solution for each solution data item; 3) machine learning means for performing machine learning, based on a predetermined machine learning algorithm, on the pairs of feature sets and solutions, learning what kind of solution is likely for what kind of feature set, and storing the result in learning result data storage means; 4) feature extraction means for extracting the features that are the predetermined information from input sentence data given as the object of the conversion process, by the same extraction process as that performed by the solution-feature pair extraction means; and 5) solution estimation processing means for estimating the solution likely for that feature set, based on the learning result stored in the learning result data storage means,
The predetermined machine learning algorithm is a decision list method, a maximum entropy method, or a support vector machine method, in which:
In the decision list method, the machine learning means stores, as the learning result, a list in which pairs of a feature set and a solution of the unsupervised data are stored as rules in a predetermined priority order, and the solution estimation processing means compares the rules stored in the list with the feature set of the input data in descending order of priority and estimates the solution of the rule whose features match as the solution likely for the feature set of the input data; or
In the maximum entropy method, the machine learning means obtains, from the pairs of feature sets and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the feature sets satisfy a predetermined conditional expression, and stores it as the learning result; the solution estimation processing means obtains, based on this distribution, the probability of each classification for the feature set of the input data and estimates the classification with the largest probability as the solution likely for that feature set; or
In the support vector machine method, the machine learning means obtains a hyperplane by a predetermined support vector machine method using the pairs of feature sets and solutions of the unsupervised data, and stores the hyperplane and the classifications of the spaces divided by it as the learning result; the solution estimation processing means determines, based on this hyperplane, to which of the divided spaces the feature set of the input sentence data belongs, and estimates the classification of that space as the solution likely for the feature set of the input sentence data.
[0030]
Furthermore, the present invention is a sentence conversion processing system that uses machine learning processing to estimate the post-conversion case particles when converting sentence data that is a passive or causative sentence into sentence data of an active sentence, comprising: 1) solution data storage means for storing solution data, which is composed of a problem and a solution, takes sentence data as the problem, and contains solution information for that problem in the conversion process; 2) feature-solution pair / feature-solution candidate pair extraction means for extracting, by a predetermined analysis process, features that are predetermined information including at least a character string, a word, or a part of speech from the problem of each solution data item stored in the solution data storage means, and generating, for each solution data item, pairs of the feature set and a solution or solution candidate; 3) machine learning means for learning by machine learning processing, based on a predetermined machine learning algorithm, the probability that a pair of a feature set and a solution or solution candidate is a positive or negative example, and storing these probabilities as the learning result in learning result data storage means; 4) feature-solution candidate pair extraction means for extracting, from input sentence data given as the object of the conversion process, features that are the predetermined information, by the same extraction process as that performed by the feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set and solution candidates; and 5) solution estimation processing means for obtaining, based on the probabilities stored as the learning result in the learning result data storage means, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and estimating the solution candidate with the largest probability of being a positive example among all solution candidates as the solution,
The predetermined machine learning algorithm is a decision list method, a maximum entropy method, or a support vector machine method, in which:
In the decision list method, the machine learning means stores, as the learning result, a list in which pairs of a feature set and a solution of the unsupervised data are stored as rules in a predetermined priority order, and the solution estimation processing means compares the rules stored in the list with the feature set of the input data in descending order of priority and estimates the solution of the rule whose features match as the solution likely for the feature set of the input data; or
In the maximum entropy method, the machine learning means obtains, from the pairs of feature sets and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the feature sets satisfy a predetermined conditional expression, and stores it as the learning result; the solution estimation processing means obtains, based on this distribution, the probability of each classification for the feature set of the input data and estimates the classification with the largest probability as the solution likely for that feature set; or
In the support vector machine method, the machine learning means obtains a hyperplane by a predetermined support vector machine method using the pairs of feature sets and solutions of the unsupervised data, and stores the hyperplane and the classifications of the spaces divided by it as the learning result; the solution estimation processing means determines, based on this hyperplane, to which of the divided spaces the feature set of the input sentence data belongs, and estimates the classification of that space as the solution likely for the feature set of the input sentence data.
[0031]
In sentence conversion from a passive or causative sentence to an active sentence, case particle conversion means determining the case particles to be used in the converted sentence. Since there are only finitely many kinds of post-conversion case particles, this problem can be reduced to a classification problem and handled with machine learning techniques.
[0032]
In the present invention, machine learning is performed using, as teacher signals, data (unsupervised data) generated from sentences to which no information about the analysis target (such as the post-conversion case particles) has been attached. As a result, a large amount of ordinary electronic text can be used as teacher data, and highly accurate sentence conversion processing can be realized without increasing the labor of manually attaching information about the analysis target.
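Paragraph [0032] and item 4) of paragraph [0026] describe turning untagged sentences into unsupervised data: a "problem-expression-equivalent part" is replaced by a problem expression, and the replaced part is kept as the solution. The Python sketch below is one possible instance under stated assumptions, none of which the text specifies: sentences are already segmented and romanized, the equivalent part is a case particle from a small illustrative set, and the problem expression is a placeholder `<?>`.

```python
from typing import List, Tuple

# Illustrative subset of case particles (romanized); an assumption, not a full inventory.
CASE_PARTICLES = {"ga", "wo", "ni"}
# Hypothetical "problem expression" that replaces the extracted part.
PROBLEM_MARK = "<?>"

def make_unsupervised_data(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """For each case particle in an untagged, pre-segmented sentence, emit one
    (problem, solution) pair: the problem is the sentence with that particle
    replaced by PROBLEM_MARK, and the solution is the particle itself."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in CASE_PARTICLES:
            problem = tokens[:i] + [PROBLEM_MARK] + tokens[i + 1:]
            pairs.append((problem, tok))
    return pairs

pairs = make_unsupervised_data(["inu", "ga", "neko", "wo", "kanda"])
print(pairs[0])  # (['inu', '<?>', 'neko', 'wo', 'kanda'], 'ga')
```

Because the solution is recovered from the sentence itself, no manual tagging is needed, which is the property the paragraph relies on to scale up the amount of teacher data.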
[0033]
DETAILED DESCRIPTION OF THE INVENTION
Some of the embodiments of the present invention will be described below.
[0034]
As a first embodiment, processing that applies a machine learning method using supervised data (unborrowed machine learning method) to sentence conversion from passive/causative sentences to active sentences will be described. As a second embodiment, processing that applies a machine learning method using unsupervised data (borrowed machine learning method) to the same sentence conversion will be described. As a third embodiment, processing that applies a machine learning method using both supervised data and unsupervised data (combined machine learning method) to the same sentence conversion will be described.
[0035]
Furthermore, as a fourth embodiment, processing that applies to language analysis a machine learning method that uses the results of machine learning on unsupervised data as features of supervised data (unsupervised data stack type machine learning method) will be described.
[0036]
In the embodiments of the present invention, case particle conversion in the conversion from a passive/causative sentence to an active sentence consists of converting each case particle of the original passive/causative sentence into the corresponding case particle of the active sentence, and deleting unnecessary parts of the original sentence. For example, when the causative sentence "He made her cut her hair" is converted into the active sentence "She cut her hair", the unnecessary part is the subject "He" of the original sentence. A case particle of the original (passive or causative) sentence is called a pre-conversion case particle, and the new case particle assigned in the conversion to an active sentence is called a post-conversion case particle.
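Once a post-conversion tag is known for every case particle, producing the active sentence's case structure is mechanical. The Python sketch below illustrates this for the "He made her cut her hair" example, under the assumptions that the sentence is romanized and already split into (word, case particle) chunks; the predicate conversion (kiraseta to kitta) is left out, as in the embodiments.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (word, case particle); romanized chunking is an assumption

def apply_conversion(chunks: List[Chunk], tags: List[str]) -> List[Chunk]:
    """Replace each pre-conversion case particle by its post-conversion tag and
    drop every chunk tagged "other" (parts deleted in the active sentence).
    The predicate itself is not handled here."""
    if len(chunks) != len(tags):
        raise ValueError("exactly one tag per chunk is required")
    return [(word, tag) for (word, _), tag in zip(chunks, tags) if tag != "other"]

# Causative "kare wa kanojo ni kami wo kiraseta" ("He made her cut her hair").
causative = [("kare", "wa"), ("kanojo", "ni"), ("kami", "wo")]
active = apply_conversion(causative, ["other", "ga", "wo"])
print(active)  # [('kanojo', 'ga'), ('kami', 'wo')]
```

Treating deletion as just another tag ("other") keeps the whole task a single classification per case particle.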
[0037]
In these embodiments, only this case particle conversion is targeted; the conversion of auxiliary verb expressions that accompanies conversion to an active sentence is not covered. The auxiliary verb part can easily be converted with existing processing, for example rules based on the grammar.
[0038]
[First Embodiment]
As the first embodiment, processing of a sentence conversion processing system that, in sentence conversion from passive/causative sentences to active sentences, automatically determines the case particles to be changed by machine learning using supervised data will be described.
[0039]
FIG. 1 shows a configuration example of a sentence conversion processing system in this embodiment. The sentence conversion processing system 100 includes a CPU and a memory, and includes a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a solution database 2.
[0040]
The solution-feature pair extraction unit 101 is a means for extracting cases as supervised data from the solution database 2 and extracting, for each case, a pair of the case's solution and its feature set.
[0041]
The machine learning unit 102 is a means for learning by a machine learning method, from the extracted pairs of solutions and feature sets, what kind of solution is likely for what kind of feature set, and storing the learning result in the learning result database 103.
[0042]
The feature extraction unit 110 is a means for extracting a feature set from the input sentence (passive or causative sentence) 3. Here, a sentence means a sentence, or part of a sentence, that contains at least a nominal and a predicate.
[0043]
The solution estimation processing unit 111 refers to the learning result database 103, estimates the solution likely for the features of the input sentence 3, that is, the case particle likely to become the post-conversion case particle when the sentence is converted into an active sentence, and outputs the estimated case particle as the solution 4.
[0044]
The solution database 2 stores supervised data with a "problem-solution" structure, to which the information to be analyzed by machine learning has been attached. In this embodiment, the analysis target is the post-conversion case particle in the conversion from passive/causative sentences to active sentences, so a database storing cases (simple sentences) tagged with the case particles to be used after conversion to an active sentence (post-conversion case particles) can be used.
[0045]
FIG. 2 shows a processing flow of the sentence conversion processing system 100.
[0046]
Step S1: The solution-feature pair extraction unit 101 extracts cases from the solution database 2 and, for each case, extracts a pair of the solution and the feature set. For example, the solution database 2 is a tagged corpus in which each case particle of passive and causative sentences is tagged with its post-conversion case particle.
[0047]
FIG. 3 shows examples (simple sentences) stored in the tagged corpus. The five underlined case particles in the simple sentences of FIG. 3 are pre-conversion case particles, and the case particle indicated by the arrow below each underlined portion is the corresponding post-conversion case particle. In FIG. 3A, when the passive sentence is converted into an active sentence, the pre-conversion case particles change from "ni" to "ga" and from "ga" to "wo", respectively. In FIG. 3B, when the causative sentence is converted into an active sentence, the pre-conversion case particles change from "ni" to "ga" and from "wo" to "wo", respectively, and the "he" part is deleted. "Other" is a tag meaning that the part is deleted in the active sentence.
[0048]
Here, a feature is one unit of detailed information used in analysis by the machine learning method. Examples of features to be extracted include the following.
[0049]
1. Case particle attached to nominal n (pre-conversion case particle)
2. Part of speech of predicate v
3. Base form of the word of predicate v
4. Auxiliary verbs attached to predicate v (e.g., passive or causative auxiliary verbs)
5. Word of nominal n
6. Category number of the word of nominal n in the Bunrui Goi Hyo (classified vocabulary table)
7. Case elements of predicate v other than nominal n
For example, if the problem in a case is "bitten by a dog", features such as the following are extracted:
・ Word of nominal n in the case to be estimated = dog
・ Predicate v (base form) modified by the case to be estimated = bite
・ Case particle between nominal n and predicate v (pre-conversion case particle) =
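The seven feature types listed above can be encoded as simple "name=value" strings that any of the learners discussed below can consume. The Python sketch below is a hypothetical encoding, not the one used in the document: the input arguments, the feature names, the thesaurus lookup, and the category label `ANIMAL` are illustrative, and for "bitten by a dog" (romanized "inu ni kamareta") the particle `ni` and auxiliary `reru` are assumed values, since the translated example above does not show them.

```python
from typing import Dict, List, Optional

def extract_features(noun: str, particle: str, verb_base: str, verb_pos: str,
                     aux: Optional[str], other_cases: List[str],
                     thesaurus: Dict[str, str]) -> List[str]:
    """Encode the seven listed feature types as "name=value" strings
    (hypothetical naming scheme)."""
    feats = [
        f"particle={particle}",   # 1. case particle attached to n (pre-conversion)
        f"verb_pos={verb_pos}",   # 2. part of speech of v
        f"verb={verb_base}",      # 3. base form of v
        f"noun={noun}",           # 5. word of n
    ]
    if aux:
        feats.append(f"aux={aux}")                        # 4. auxiliary verb attached to v
    if noun in thesaurus:
        feats.append(f"class={thesaurus[noun]}")          # 6. thesaurus category of n
    feats.extend(f"other_case={c}" for c in other_cases)  # 7. other case elements of v
    return feats

# "inu ni kamareta" (bitten by a dog); "ni", "reru" and "ANIMAL" are assumptions.
feats = extract_features("inu", "ni", "kamu", "verb", "reru", [], {"inu": "ANIMAL"})
print(feats)
# ['particle=ni', 'verb_pos=verb', 'verb=kamu', 'noun=inu', 'aux=reru', 'class=ANIMAL']
```

The thesaurus category (item 6) lets cases with different nouns but similar meanings, such as "dog" and "snake", share a feature.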
[0050]
The solution is the post-conversion case particle attached to each case as tag information. In the above case,
・ Solution (post-conversion case particle) =
The solution-feature pair extraction unit 101 then uses the extracted feature set as the context, and the solution as the classification destination, in the machine learning process executed by the machine learning unit 102.
[0051]
Step S2: From the extracted pairs of solutions and feature sets, the machine learning unit 102 learns what kind of solution is likely for what kind of feature set, and stores the learning result in the learning result database 103.
[0052]
For example, for the feature set extracted from the case "bitten by a dog",
・ Word of nominal n in the case to be estimated = dog
・ Predicate v (base form) modified by the case to be estimated = bite
・ Case particle between nominal n and predicate v (pre-conversion case particle) =
it learns that the following solution is likely:
・ Solution (post-conversion case particle) =
[0053]
Likewise, for the feature set extracted from the case "bitten by a snake",
・ Word of nominal n in the case to be estimated = snake
・ Predicate v (base form) modified by the case to be estimated = bite
・ Case particle between nominal n and predicate v (pre-conversion case particle) =
it learns that the following solution is likely:
・ Solution (post-conversion case particle) =
[0054]
As the machine learning method, for example, a decision list method, a maximum entropy method, a support vector machine method, or the like is used, but the method is not limited to these methods.
[0055]
The decision list method treats pairs of a feature (an element of the context, i.e., the information used for analysis) and a classification destination as rules, and stores them in a list in a predetermined priority order. When an input to be analyzed is given, the input data is compared with the rule features starting from the highest-priority rule in the list, and the classification destination of the first rule whose feature matches is taken as the classification destination of the input.
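The decision list procedure just described can be sketched in a few lines of Python. The document says only that rules are stored "in a predetermined priority order"; this sketch assumes the priority is the estimated probability P(solution | feature), with ties broken by frequency, and uses `"other"` as a hypothetical default when no rule matches. The romanized feature strings are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Rule = Tuple[str, str, float]  # (feature, solution, priority)

def learn_decision_list(data: Iterable[Tuple[List[str], str]]) -> List[Rule]:
    pair_counts: Counter = Counter()
    feat_counts: Counter = Counter()
    for feats, solution in data:
        for f in set(feats):
            pair_counts[(f, solution)] += 1
            feat_counts[f] += 1
    rules = [(f, s, n / feat_counts[f]) for (f, s), n in pair_counts.items()]
    # Assumed priority: P(solution | feature), higher first; ties broken by count.
    rules.sort(key=lambda r: (r[2], pair_counts[(r[0], r[1])]), reverse=True)
    return rules

def classify(rules: List[Rule], feats: List[str], default: str = "other") -> str:
    present = set(feats)
    for feature, solution, _ in rules:   # highest priority first
        if feature in present:
            return solution              # the first matching rule decides
    return default                       # hypothetical fallback

training = [(["particle=ni", "verb=kamu"], "ga"),
            (["particle=ni", "verb=iu"], "ga"),
            (["particle=ga", "verb=kamu"], "wo")]
rules = learn_decision_list(training)
print(classify(rules, ["particle=ga", "verb=kamu"]))  # wo
```

Only one rule ever fires per input, which makes the decision easy to inspect but means features are never combined; the maximum entropy and SVM methods below weigh all features together.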
[0056]
Given a set $F$ of preset features $f_j$ ($1 \le j \le k$), the maximum entropy method obtains the probability distribution $p(a,b)$ that maximizes an expression representing entropy while satisfying predetermined constraint expressions, computes the probability of each classification according to this distribution, and takes the classification with the largest probability as the solution (the classification sought).
[Reference 1: Maki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara, Word Sense Disambiguation Experiments Using Various Machine Learning Methods, IEICE Technical Committee on Natural Language Understanding and Models of Communication, NLC2001-2, (2001)]
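With indicator features f_j(a, b), the maximum entropy solution takes the log-linear form p(a | b) ∝ exp(Σ_j λ_j f_j(a, b)), and its weights can be fitted by maximizing the conditional log-likelihood. The sketch below assumes that conditional form, binary features built from (context feature, classification) pairs, and plain gradient ascent as the optimizer (a swapped-in choice; classic maximum entropy tools use iterative scaling). The names `iterations` and `lr` are hypothetical parameters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Weights = Dict[Tuple[str, str], float]


def maxent_proba(w: Weights, features: Set[str], labels: Sequence[str]) -> Dict[str, float]:
    """p(a | b) proportional to exp(sum of weights of the active (feature, a) indicators)."""
    scores = {a: sum(w.get((f, a), 0.0) for f in features) for a in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def train_maxent(examples: List[Tuple[Set[str], str]], labels: Sequence[str],
                 iterations: int = 200, lr: float = 0.5) -> Weights:
    """Gradient ascent on the conditional log-likelihood.

    The gradient for each (feature, label) weight is its empirical count minus
    its expected count under the current model; driving it toward zero makes
    the model's feature expectations match the empirical ones, which are the
    constraints of the maximum entropy problem.
    """
    w: Weights = defaultdict(float)
    n = len(examples)
    for _ in range(iterations):
        grad: Weights = defaultdict(float)
        for features, gold in examples:
            probs = maxent_proba(w, features, labels)
            for f in features:
                grad[(f, gold)] += 1.0        # empirical expectation
                for a in labels:
                    grad[(f, a)] -= probs[a]  # model expectation
        for key, g in grad.items():
            w[key] += lr * g / n
    return dict(w)


def maxent_classify(w: Weights, features: Set[str], labels: Sequence[str]) -> str:
    """Pick the classification with the largest probability."""
    probs = maxent_proba(w, features, labels)
    return max(labels, key=lambda a: probs[a])
```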
The support vector machine method classifies data belonging to two classes by dividing the space with a hyperplane. Because it handles only two classes, it is normally combined with the pairwise method to handle data with three or more classes. For data with N classes, the pairwise method forms every pair of two different classes (N(N-1)/2 pairs), uses for each pair a binary classifier (here, one built by the support vector machine method) that decides which of the two classes is better, and obtains the final classification by majority vote over the outputs of the N(N-1)/2 binary classifiers.
[Reference 2: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, (Cambridge University Press, 2000)]
[Reference 3: Taku Kudoh, TinySVM: Support Vector Machines, (http://cl.aist-nara.ac.jp/taku-ku//software/TinySVM/index.html, 2000)]
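The pairwise majority vote can be sketched in Python as below. It is a sketch under stated assumptions: each binary classifier is represented by a hypothetical callable `binary_classify(a, b, x)` that returns whichever of the two classes it prefers (in this document that role is played by an SVM), and ties in the vote are broken by the order of `classes`, which the text does not specify.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Callable, Sequence

BinaryClassifier = Callable[[str, str, Any], str]


def pairwise_classify(classes: Sequence[str], binary_classify: BinaryClassifier, x: Any) -> str:
    """Majority vote over the N(N-1)/2 one-against-one binary decisions."""
    if len(classes) == 1:
        return classes[0]
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):  # every unordered pair of distinct classes
        winner = binary_classify(a, b, x)
        if winner not in (a, b):
            raise ValueError(f"classifier for ({a}, {b}) returned {winner!r}")
        votes[winner] += 1
    best = max(votes.values())
    # Ties are broken by the order of `classes` (an assumption).
    return next(c for c in classes if votes[c] == best)


if __name__ == "__main__":
    # Toy stand-in for trained binary SVMs: prefer the class with the larger score.
    def prefer(a: str, b: str, x: dict) -> str:
        return a if x[a] >= x[b] else b

    scores = {"ga": 3.0, "wo": 1.0, "ni": 2.0}
    print(pairwise_classify(["ga", "wo", "ni"], prefer, scores))
```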
To explain the support vector machine method, FIG. 4 illustrates the concept of margin maximization. In FIG. 4, white circles denote positive examples, black circles denote negative examples, the solid line denotes the hyperplane that divides the space, and the broken lines denote the boundaries of the margin region. FIG. 4A is a conceptual diagram for a narrow interval between the positive and negative examples (small margin), and FIG. 4B is a conceptual diagram for a wide interval (large margin).
[0057]
Assuming that the two classes of the support vector machine method are positive examples and negative examples, the larger the interval (margin) between the positive and negative examples in the learning data, the lower the chance of misclassifying open data. As shown in FIG. 4B, the hyperplane that maximizes this margin is obtained and used for classification.
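Written out for a linear hyperplane w · x + b = 0 (standard textbook notation, not taken from this document), scaling w and b so that the closest positive and negative examples satisfy w · x + b = +1 and -1 gives a margin width of 2/‖w‖. Maximizing the margin is therefore equivalent to the following problem, where y_i ∈ {1, -1} is the class of learning example x_i:

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \quad (i = 1, \ldots, l)$$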
[0058]
The support vector machine method is basically as described above. In practice, however, extended versions are generally used: one that allows a small number of learning cases to fall inside the margin region, and one that makes the linear hyperplane non-linear (by introducing a kernel function).
[0059]
This extended method is equivalent to classification with the following discriminant function; the two classes are distinguished by whether the output value of the discriminant function is positive or negative.
[0060]
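[Expression 1]
$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \qquad (1)$$
$$b = -\frac{\max_{i,\,y_i=-1} b_i + \min_{i,\,y_i=1} b_i}{2}, \qquad b_i = \sum_{j=1}^{l} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_i)$$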
[0061]
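Here, $\mathbf{x}$ is the context (feature set) of the case to be classified, and $\mathbf{x}_i$ and $y_i$ ($i = 1, \ldots, l$; $y_i \in \{1, -1\}$) are the context and classification of the i-th case in the learning data. The function sgn is defined as
$$\operatorname{sgn}(x) = \begin{cases} 1 & (x \ge 0) \\ -1 & (\text{otherwise}) \end{cases} \qquad (2)$$
and each $\alpha_i$ is the value that maximizes equation (3) under the constraints of equations (4) and (5).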
[0062]
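[Expression 2]
$$L(\boldsymbol{\alpha}) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \qquad (3)$$
$$0 \le \alpha_i \le C \quad (i = 1, \ldots, l) \qquad (4)$$
$$\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (5)$$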
[0063]
The function K is called a kernel function, and various functions can be used. In this embodiment, the following polynomial is used.
[0064]
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^{d} \qquad (6)$$
C and d are constants set experimentally. In the specific example described later, C is fixed to 1 throughout all processing, and two values, 1 and 2, are tried for d. The x_i with α_i > 0 are called support vectors, and the sum in equation (1) is usually computed over these cases only. That is, only the cases called support vectors are used in actual analysis.
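The following Python sketch evaluates the discriminant function f(x) = sgn(Σ α_i y_i K(x_i, x) + b) with the polynomial kernel K(x, y) = (x · y + 1)^d, assuming the α_i have already been found by an SVM trainer (the sketch does not solve the optimization problem). It computes b as -(max over y_i = -1 of b_i + min over y_i = 1 of b_i)/2 with b_i = Σ_j α_j y_j K(x_j, x_i), a standard choice stated here as an assumption; taking the max/min over support vectors only, as the code does, matches the all-cases version for separable data. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
SupportVector = Tuple[Vector, int, float]  # (x_i, y_i in {1, -1}, alpha_i > 0)


def poly_kernel(x: Vector, y: Vector, d: int = 2) -> float:
    """Polynomial kernel K(x, y) = (x . y + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d


def kernel_sum(x: Vector, svs: List[SupportVector], kernel: Kernel) -> float:
    """sum_i alpha_i y_i K(x_i, x), taken over support vectors only."""
    return sum(alpha * y * kernel(xi, x) for xi, y, alpha in svs)


def bias(svs: List[SupportVector], kernel: Kernel) -> float:
    """b = -(max_{y_i=-1} b_i + min_{y_i=1} b_i) / 2, with b_i = kernel_sum(x_i)."""
    neg = [kernel_sum(xi, svs, kernel) for xi, y, _ in svs if y == -1]
    pos = [kernel_sum(xi, svs, kernel) for xi, y, _ in svs if y == 1]
    return -(max(neg) + min(pos)) / 2.0


def sgn(v: float) -> int:
    return 1 if v >= 0 else -1


def svm_classify(x: Vector, svs: List[SupportVector], kernel: Kernel) -> int:
    """f(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b)."""
    return sgn(kernel_sum(x, svs, kernel) + bias(svs, kernel))
```

For example, with support vectors ((1, 0), +1, 0.5) and ((-1, 0), -1, 0.5) and d = 1, b evaluates to 0 and the point (2, 5) is classified as +1.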
[0065]
Because the support vector machine method handles data with two classes, it is combined with the pairwise method to handle data with three or more classes. In this example, the sentence conversion processing system 150 performs processing that combines the support vector machine method and the pairwise method; specifically, this is implemented with TinySVM.
[Reference 4: Taku Kudo and Yuji Matsumoto, Chunk Identification Using Support Vector Machines, IPSJ SIG Natural Language Processing, 2000-NL-140, (2000)]
Step S3: Thereafter, the input sentence 3 is input to the feature extraction unit 110 as data for which a solution is desired.
[0066]
Step S4: The feature extraction unit 110 extracts a feature set from the input sentence 3 by almost the same processing as the solution-feature pair extraction unit 101, and passes the extracted feature set to the solution estimation processing unit 111. For example, when the input sentence 3 is “bitten by a dog”, the following features are extracted and passed to the solution estimation processing unit 111.
[0067]
・ Noun n of the case to be estimated = dog,
・ Predicate v modified by the case to be estimated = bite,
・ Pre-conversion case particle between noun n and predicate v = ni.
Step S5: Based on the learning result stored in the learning result database 103, the solution estimation processing unit 111 estimates which solution 4 is likely for the passed feature set, and outputs the estimated solution (post-conversion case particle) 4.
[0068]
For example, when learning results such as those described above are stored in the learning result database 103 for the cases “bitten by a dog ⇒ ga” and “bitten by a snake ⇒ ga”, the solution estimation processing unit 111 refers to these results, analyzes the feature set extracted from the received input sentence 3, estimates that “ga” is the most likely post-conversion case particle, and outputs solution 4 = “ga”.
[0069]
FIG. 5 shows another configuration example of the sentence conversion processing system in the first embodiment. In the drawings below, components such as processing means that carry the same reference numbers have the same functions.
[0070]
The sentence conversion processing system 150 comprises a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate pair extraction unit 170, a solution estimation processing unit 171, and a solution database 2.
[0071]
The feature-solution pair / feature-solution candidate pair extraction unit 161 is a means that takes cases from the solution database 2 and, for each case, extracts pairs of a solution or a solution candidate and a feature set.
[0072]
Here, a solution candidate means a candidate other than the correct solution. For example, if there are five possible post-conversion case particles, “wo”, “ni”, “ga”, “to”, and “de”, and “ga” is the solution, then “wo”, “ni”, “to”, and “de” are the four solution candidates. A pair of the solution and a feature set is a positive example, and a pair of a solution candidate and a feature set is a negative example.
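Turning one case into one positive and several negative examples can be sketched as follows. The text does not say how a candidate is combined with the feature set; this sketch assumes an encoding with a feature `cand=<c>` plus conjunctions `<feature>&cand=<c>`, so that even a linear binary classifier can tie the context to the candidate. The function name is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

CASE_PARTICLES = ("wo", "ni", "ga", "to", "de")  # the five particles of the example


def candidate_examples(features: Iterable[str], solution: str,
                       candidates: Tuple[str, ...] = CASE_PARTICLES
                       ) -> List[Tuple[FrozenSet[str], int]]:
    """One +1 example for the solution and one -1 example per other candidate."""
    base = frozenset(features)
    examples = []
    for c in candidates:
        # Assumed encoding: candidate feature plus feature-candidate conjunctions.
        encoded = frozenset({f"cand={c}"} | {f"{f}&cand={c}" for f in base})
        examples.append((encoded, 1 if c == solution else -1))
    return examples
```

For the case “bitten by a dog ⇒ ga” this yields five binary examples, of which only the one for “ga” is labelled positive.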
[0073]
The machine learning unit 162 uses the pairs of a solution or solution candidate and a feature set extracted by the feature-solution pair / feature-solution candidate pair extraction unit 161 to learn, by the support vector machine method or a similar machine learning method, the probability that any given pair is a positive example or a negative example, and stores the learning result in the learning result database 163.
[0074]
The feature-solution candidate extraction unit 170 is a means that extracts pairs of a solution candidate and a feature set from the input sentence 3 by the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161, and passes them to the solution estimation processing unit 171.
[0075]
The solution estimation processing unit 171 is a means that refers to the learning result database 163 to obtain, for each pair of a solution candidate and a feature set passed from the feature-solution candidate extraction unit 170, the probability of being a positive example or a negative example, estimates the solution candidate with the highest probability of being a positive example as solution 4, and outputs the estimated solution 4.
[0076]
FIG. 6 shows a processing flow of the sentence conversion processing system 150.
[0077]
Step S11: The feature-solution pair / feature-solution candidate pair extraction unit 161 takes cases from the solution database 2 and, for each case, extracts pairs of a solution or solution candidate and a feature set. The feature set extracted here is the same as the one extracted in step S1 (see FIG. 2).
[0078]
Step S12: From the extracted pairs of a solution or solution candidate and a feature set, the machine learning unit 162 learns by a machine learning method the probability that any given pair is a positive example or a negative example. The learning result is stored in the learning result database 163.
[0079]
For example, for the case “bitten by a dog ⇒ ga”, with the feature set
・ Noun n of the case to be estimated = dog,
・ Predicate v modified by the case to be estimated = bite,
・ Pre-conversion case particle between noun n and predicate v = ni,
the unit learns the probability that the solution “ga” is correct (the probability of a positive example) and, for each of the solution candidates “wo”, “ni”, “to”, and “de”, the probability that it is incorrect (the probability of a negative example).
[0080]
Step S13: The input sentence 3 whose solution is to be obtained is then input to the feature-solution candidate extraction unit 170.
[0081]
Step S14: The feature-solution candidate extraction unit 170 extracts pairs of a solution candidate and a feature set from the input sentence 3 by the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161, and passes the extracted pairs to the solution estimation processing unit 171.
[0082]
Step S15: Based on the learning result stored in the learning result database 163, the solution estimation processing unit 171 obtains, for each passed pair of a solution candidate and a feature set, the probability of being a positive example or a negative example.
[0083]
For example, when the input sentence is “bitten by a dog”, the unit obtains, for the extracted feature set combined with each of the solution candidates “ga”, “wo”, “ni”, “to”, and “de”, the probability of being a positive example or a negative example.
[0084]
Step S16: After the probability of being a positive or negative example has been obtained for all solution candidates, the candidate with the highest probability of being a positive example is estimated as solution 4, and the estimated solution 4 is output.
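Steps S15 and S16 reduce to scoring every candidate and taking the one most likely to be a positive example. In the sketch below, `select_solution` takes any scoring callable; `logistic_scorer` is a hypothetical stand-in for the learned model (a logistic function of summed weights over assumed `cand=<c>` and `<feature>&cand=<c>` features), not the SVM-based learner the text describes.

```python
import math
from typing import Callable, Dict, Iterable, Mapping, Sequence, Tuple

Scorer = Callable[[frozenset, str], float]


def select_solution(features: Iterable[str], candidates: Sequence[str],
                    prob_positive: Scorer) -> Tuple[str, Dict[str, float]]:
    """Score every candidate (step S15) and return the best one (step S16)."""
    feats = frozenset(features)
    scores = {c: prob_positive(feats, c) for c in candidates}
    best = max(candidates, key=lambda c: scores[c])
    return best, scores


def logistic_scorer(weights: Mapping[str, float]) -> Scorer:
    """Hypothetical model: P(positive) = logistic(sum of weights of active features)."""
    def prob(feats: frozenset, candidate: str) -> float:
        active = {f"cand={candidate}"} | {f"{f}&cand={candidate}" for f in feats}
        z = sum(weights.get(f, 0.0) for f in active)
        return 1.0 / (1.0 + math.exp(-z))
    return prob
```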
[0085]
[Second Embodiment]
As a second embodiment, the processing of a sentence conversion processing system that automatically converts case particles by unsupervised learning, when converting passive/causative sentences into active sentences, will be described.
[0086]
First, the unsupervised data used in the machine learning method is described. FIG. 7A shows an electronic sentence given for creating unsupervised data. The active sentence “The dog bit me” in FIG. 7A has no information to be analyzed attached to it, that is, no information about how case particles are converted when a sentence is converted into an active sentence. However, if the sentence in FIG. 7A is regarded as the result of converting some sentence into an active sentence, then, although the case particles that should appear in the original passive/causative sentence (pre-conversion case particles) are unknown, the case particles that should appear in the solution to be estimated, that is, in the processing result (the active sentence), can be extracted (post-conversion case particles).
[0087]
FIG. 7B shows a simple sentence representing the relationship between pre-conversion and post-conversion case particles. The source sentence of the active sentence in FIG. 7A can be written as “dog <?> me <?> bit”. Because the original sentence is not available, its pre-conversion case particles are unknown and are shown as “<?>” (unknown). The post-conversion case particles “ga” and “wo” extracted from the sentence of FIG. 7A, which are the solutions to be estimated, are indicated by arrows below the <?> marks. Thus, as FIG. 7B shows, an active sentence without attached analysis information lacks the pre-conversion case particle information but does carry the post-conversion case particle information, that is, the solution (classification). The portion “dog <?> bit” of the sentence in FIG. 7B can be converted into the following problem structure.
[0088]
“Problem ⇒ Solution” = “dog <?> bit. ⇒ ga”
Thus, an active sentence without attached analysis information can be used as teacher data for machine learning.
[0089]
The unsupervised data generated from the active sentence in FIG. 7A carries less information than supervised data because it lacks the pre-conversion case particles. However, active sentences are more numerous than passive and causative sentences, and the post-conversion case particle information need not be tagged by hand, so using a large number of active sentences as unsupervised data has the advantage of increasing the teacher signals available to the machine learning method.
[0090]
FIG. 8 shows a configuration example of the sentence conversion processing system in the second embodiment. The sentence conversion processing system 200 includes a CPU and a memory, and comprises a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a sentence database 5.
[0091]
The problem expression equivalent part extraction unit 201 is a means that takes a sentence from the sentence database 5, which stores data (sentences) without attached analysis information, and extracts the problem expression equivalent part from it. To do so, it refers to the problem expression information storage unit 202, which stores in advance what counts as the part corresponding to a problem expression (the problem expression equivalent part) in this system's processing.
[0092]
Here, the problem expression information storage unit 202 stores, as the problem expression equivalent part, the case particles (post-conversion case particles) that change when a passive/causative sentence is converted into an active sentence.
[0093]
The problem structure conversion unit 204 is a means that, when the extracted problem expression equivalent part needs to be converted, refers to the semantic analysis information storage unit 203, which stores information for semantic analysis, and converts it. Taking the problem expression equivalent part (a case particle) as the solution and the sentence in which that part is replaced by a problem expression as the problem, it converts the data into a problem-solution structure and stores the converted unsupervised data in the unsupervised data storage unit 205 as a case.
[0094]
The solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation processing unit 111 of the sentence conversion processing system 200 perform the same processing as the identically numbered means described in the first embodiment. However, the solution-feature pair extraction unit 101 takes its cases, which are unsupervised data, from the unsupervised data storage unit 205 and extracts a pair of a solution and a feature set for each case.
[0095]
FIG. 9 shows a processing flow of unsupervised data generation processing.
[0096]
Step S21: A sentence (active sentence), that is, electronic natural-language text without attached analysis information, is input from the sentence database 5 to the problem expression corresponding part extraction unit 201.
[0097]
Step S22: The problem expression corresponding part extraction unit 201 refers to the problem expression information storage unit 202, detects the structure of the input active sentence, and extracts the problem expression equivalent part. Which part is the problem expression equivalent part is given by the problem expression information stored in the problem expression information storage unit 202. For example, “dog <? = case to be estimated (post-conversion case particle)> bit” is stored as problem expression information. The problem expression corresponding part extraction unit 201 matches the sentence structure stored as problem expression information against the structure of the input (active) sentence and takes the matching part as the problem expression equivalent part. For example, if the input sentence is “dog ga bit” (“the dog bit”), “ga” is extracted as the problem expression equivalent part as a result of matching.
[0098]
Step S23: The problem structure conversion unit 204 refers to the semantic analysis information storage unit 203, takes the extracted problem expression equivalent part as the solution, replaces that part with the problem expression (<?>), and takes the resulting sentence as the problem. For example, “ga”, extracted as the problem expression equivalent part from the active sentence “dog ga bit”, becomes the solution, and replacing that “ga” with <?> yields the problem “dog <?> bit.”
Step S24: The problem structure conversion unit 204 then stores the data with this problem-solution structure in the unsupervised data storage unit 205 as unsupervised data (a case).
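Steps S22 to S24 can be sketched in miniature as below. The sketch assumes the active sentence has already been morphologically analyzed into hypothetical `(surface, part of speech, base form)` tokens, with English glosses standing in for the Japanese words. Its matching rule (a noun immediately followed by a case particle, with the last verb as the predicate) is a simplified, hypothetical stand-in for the problem expression information in unit 202, and the semantic analysis information of unit 203 is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

CASE_PARTICLES = {"ga", "wo", "ni", "to", "de"}
Token = Tuple[str, str, str]  # (surface, part of speech, base form), hypothetical format


@dataclass
class UnsupervisedCase:
    problem: str              # e.g. "dog <?> bit"
    solution: str             # post-conversion case particle, e.g. "ga"
    features: Dict[str, str]  # n, v, and the unknown pre-conversion particle


def make_unsupervised_cases(tokens: Sequence[Token]) -> List[UnsupervisedCase]:
    """Each 'noun + case particle' becomes a problem => solution case with the particle masked."""
    verbs = [t for t in tokens if t[1] == "verb"]
    if not verbs:
        return []
    pred_surface, _, pred_base = verbs[-1]
    cases = []
    for prev, cur in zip(tokens, tokens[1:]):
        if prev[1] == "noun" and cur[1] == "particle" and cur[0] in CASE_PARTICLES:
            cases.append(UnsupervisedCase(
                problem=f"{prev[0]} <?> {pred_surface}",
                solution=cur[0],
                features={"n": prev[0], "v": pred_base, "pre": "?"},  # pre-conversion particle unknown
            ))
    return cases
```

Applied to the FIG. 7A sentence tokenized as dog/ga/me/wo/bit, this yields two cases, “dog <?> bit ⇒ ga” and “me <?> bit ⇒ wo”.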
[0099]
Thereafter, the sentence conversion processing system 200 performs the same processing as in the first embodiment (see FIG. 2). That is, the solution-feature pair extraction unit 101 takes cases from the unsupervised data storage unit 205 and extracts a pair of a solution and a feature set for each case (step S1).
[0100]
If the extracted case is “dog <?> bit” ⇒ “ga”, for example, the following feature set is extracted.
[0101]
・ Noun n of the case to be estimated = dog,
・ Predicate v modified by the case to be estimated = bite,
・ Pre-conversion case particle between noun n and predicate v = ? (unknown).
The machine learning unit 102 then learns, from the pairs of a solution and a feature set, which case particle is the solution for which features. For the feature set above, it learns that the solution is likely to be “ga”, and stores the learning result in the learning result database 103 (step S2).
[0102]
Likewise, if the extracted case is “snake <?> bit” ⇒ “ga”, the following feature set is extracted.
[0103]
・ Noun n of the case to be estimated = snake,
・ Predicate v modified by the case to be estimated = bite,
・ Pre-conversion case particle between noun n and predicate v = ? (unknown).
For this feature set as well, the machine learning unit 102 learns that the solution is likely to be “ga”, and stores the learning result in the learning result database 103.
[0104]
The subsequent processing, from input of the input sentence 3 to the feature extraction unit 110 until output of solution 4 by the solution estimation processing unit 111, is the same as steps S3 to S5 in the processing flow of FIG. 2 described for the first embodiment, so its description is omitted.
[0105]
FIG. 10 shows another configuration example of the sentence conversion processing system in the second embodiment. The sentence conversion processing system 250 comprises a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, a solution estimation processing unit 171, and a sentence database 5.
[0106]
The problem expression equivalent part extraction unit 201, problem expression information storage unit 202, semantic analysis information storage unit 203, and problem structure conversion unit 204 of the sentence conversion processing system 250 perform the same processing as the identically numbered means shown in FIG. 8.
[0107]
Further, the feature-solution pair / feature-solution candidate pair extraction unit 161, machine learning unit 162, learning result database 163, feature-solution candidate extraction unit 170, and solution estimation processing unit 171 of the sentence conversion processing system 250 perform substantially the same processing as the identically numbered means shown in FIG. 5.
[0108]
In the sentence conversion processing system 250, the feature-solution pair / feature-solution candidate pair extraction unit 161 takes cases from the unsupervised data storage unit 205 and, for each case, extracts pairs of a solution or solution candidate and a feature set (FIG. 6: step S11).
[0109]
If the extracted case is “dog <?> bit” ⇒ “ga”, for example, the following feature set is extracted.
[0110]
・ Noun n of the case to be estimated = dog,
・ Predicate v modified by the case to be estimated = bite,
・ Pre-conversion case particle between noun n and predicate v = ? (unknown).
The machine learning unit 162 then learns by a machine learning method, from the pairs of a solution or solution candidate and a feature set, the probability that a pair is a positive example or a negative example. The learning result is stored in the learning result database 163 (FIG. 6: step S12).
[0111]
The subsequent processing, from input of the input sentence 3 to the feature-solution candidate extraction unit 170 until output of solution 4 by the solution estimation processing unit 171, is the same as steps S13 to S16 in the processing flow of FIG. 6 described for the first embodiment, so its description is omitted.
[0112]
[Third Embodiment]
Because the cases stored in the unsupervised data storage unit 205 (“problem ⇒ solution”) have almost the same structure as the cases stored in the solution database 2 (“problem ⇒ solution”), unsupervised and supervised cases can be used together. In this embodiment, machine learning that uses both unsupervised and supervised data as teacher signals is referred to as “supervised/unsupervised learning”.
[0113]
Unsupervised data lacks information on the pre-conversion case particles that appear in the original sentence, so it carries less information than supervised data. However, solution information (such as post-conversion case particles) need not be tagged by hand for each case, and because active sentences generally outnumber passive sentences, many sentences can be used as teacher signals. Sentence conversion by supervised/unsupervised learning therefore has the advantage that it can use learning results obtained from a large amount of teacher data without increasing the manual labor of adding the information to be analyzed.
[0114]
FIG. 11 shows a configuration example of a sentence conversion processing system 300 in the third embodiment. The sentence conversion processing system 300 includes a CPU and a memory, and comprises a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, a solution database 2, and a sentence database 5. The sentence conversion processing system 300 has the configuration of the sentence conversion processing system 200 shown in FIG. 8 (described as the second embodiment) with the solution database 2 added, and performs almost the same processing as the sentence conversion processing system 200.
[0115]
The solution-feature pair extraction unit 101 extracts a pair of a solution and a feature set for each case, covering both the supervised cases stored in the solution database 2 and the unsupervised cases stored in the unsupervised data storage unit 205.
[0116]
FIG. 12 shows another configuration example of the sentence conversion processing system in the third embodiment. The sentence conversion processing system 350 includes a CPU and a memory, and comprises a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, a solution estimation processing unit 171, a solution database 2, and a sentence database 5.
[0117]
The sentence conversion processing system 350 has the configuration of the sentence conversion processing system 250 shown in FIG. 10 (described as the second embodiment) with the solution database 2 added, and performs almost the same processing as the sentence conversion processing system 250.
[0118]
The feature-solution pair / feature-solution candidate pair extraction unit 161 extracts, for each case, pairs of a solution or solution candidate and a feature set, covering both the supervised cases stored in the solution database 2 and the unsupervised cases stored in the unsupervised data storage unit 205.
[0119]
[Fourth Embodiment]
As a fourth embodiment, the processing of a language analysis processing system that performs language analysis by stacked machine learning, exploiting the advantages of both unsupervised and supervised data, will be described.
[0120]
Stacked machine learning is a machine learning method, called “stacking”, that is used to fuse the analysis results of multiple systems; it performs machine learning with teacher signals whose features include the analysis results of other machine learning methods.
[Reference 5: Hans van Halteren, Jakub Zavrel, and Walter Daelemans, Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems, Computational Linguistics, Vol. 27, No. 2, (2001), pp. 199-229]
In this embodiment, the language analysis processing system performs language analysis using borrowed machine learning (machine learning with unsupervised data) or combined machine learning (machine learning with supervised/unsupervised data), and adds the estimated solution obtained from that processing as an element of the feature set. It then performs further language analysis by supervised learning using the feature set to which the estimated solution has been added.
[0121]
For example, suppose that in the supervised machine learning used by the language analysis processing system of this embodiment, the feature set extracted from a certain supervised case (example) is the list {a, b, c}, and that the stacking processing system is a language analysis processing system using unsupervised machine learning whose analysis result is “d₁”. In this case, the supervised machine learning process of the language analysis processing system adds the analysis result “d₁” to the feature set {a, b, c} and performs machine learning with the list {a, b, c, “analysis result of unsupervised learning = d₁”} as the new feature set.
[0122]
If the stacking processing system is instead a language analysis processing system using supervised/unsupervised machine learning whose analysis result is “d₂”, the supervised machine learning process of the language analysis processing system adds the analysis result “d₂” to the feature set {a, b, c} and performs machine learning with the list {a, b, c, “analysis result of supervised/unsupervised learning = d₂”} as the new feature set.
[0123]
Both a language analysis processing system using unsupervised machine learning and one using supervised/unsupervised machine learning can also be used as stacking processing systems. In this case, the supervised machine learning process of the language analysis processing system adds both “d₁” and “d₂” to the feature set {a, b, c} and performs machine learning with the list {a, b, c, “analysis result of unsupervised learning = d₁”, “analysis result of supervised/unsupervised learning = d₂”} as the new feature set.
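The feature augmentation in the three examples above can be sketched as follows. The stacked systems are represented by hypothetical callables that return an analysis result for a feature set (in this document they are complete unsupervised or supervised/unsupervised analysis systems), and the feature-string format is an assumption. The same augmentation would be applied to training cases (as the first feature adding unit 511 does) and to input features at estimation time (as the second feature adding unit 512 does), so the supervised learner sees identically formed features in both phases.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Predictor = Callable[[frozenset], str]


def stack_features(features: Iterable[str],
                   stacked: Sequence[Tuple[str, Predictor]]) -> Set[str]:
    """Add each stacked system's analysis result as an extra feature.

    Every stacked system sees only the original feature set.
    """
    base = frozenset(features)
    augmented = set(base)
    for name, predict in stacked:
        augmented.add(f"analysis result of {name} = {predict(base)}")
    return augmented


def stack_training_data(examples: Iterable[Tuple[Iterable[str], str]],
                        stacked: Sequence[Tuple[str, Predictor]]) -> List[Tuple[Set[str], str]]:
    """Apply the same augmentation to every supervised (feature set, solution) pair."""
    return [(stack_features(feats, stacked), solution) for feats, solution in examples]
```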
[0124]
In this way, when non-borrowing machine learning with supervised data is combined with borrowed or combined machine learning by the stacking method, the supervised data (cases) used for supervised machine learning gain additional features. These richer cases are expected to improve the accuracy of supervised learning. Furthermore, even with the added features, supervised machine learning still learns so as to maximize the accuracy on the supervised data (cases), that is, to improve accuracy on the analysis target, and the analysis is performed with that learning result. As a result, high analysis accuracy is expected from making good use of the advantages of both supervised and unsupervised machine learning.
[0125]
FIG. 13 shows a configuration example of a language analysis processing system in the fourth embodiment.
[0126]
The language analysis processing system 500 outputs the analysis result of language analysis processing for a given problem. It includes a CPU and a memory, and comprises a solution-feature pair extraction unit 501, a machine learning unit 502, a learning result database 503, a feature extraction unit 504, a solution estimation processing unit 505, an unsupervised learning processing system for stacking 1010, a first feature adding unit 511, a second feature adding unit 512, a sentence database 5, and a solution database 6.
[0127]
The solution-feature pair extraction unit 501, machine learning unit 502, learning result database 503, feature extraction unit 504, and solution estimation processing unit 505 perform substantially the same processing as, respectively, the solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation processing unit 111 of the sentence conversion processing system 100.
[0128]
The unsupervised learning processing system for stacking 1010 extracts feature sets from unsupervised data generated from the sentence database 5 for the language analysis process, learns what kind of solution (analysis result) is likely for what kind of feature set, and stores the learning result. When it receives a feature set from the first feature adding unit 511 or the second feature adding unit 512, it estimates from the stored learning result what kind of solution (analysis result) is likely for that feature set, and returns the estimated solution d₁ to the first feature adding unit 511 or the estimated solution d₁′ to the second feature adding unit 512.
[0129]
The unsupervised learning processing system for stacking 1010 is configured similarly to the sentence conversion processing system 200 shown in FIG. 8; that is, it includes (not shown) a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, and a solution estimation processing unit 111, and it outputs the analysis result of the language analysis process for a given problem.
[0130]
The first feature adding unit 511 extracts only the feature set from each pair of a solution and a feature set received from the solution-feature pair extraction unit 501, passes it to the unsupervised learning processing system for stacking 1010, receives the solution d₁ returned from that system, and adds “analysis result of unsupervised learning = d₁” as a feature to the original feature set.
[0131]
The second feature adding unit 512 passes the feature set received from the feature extraction unit 504 to the unsupervised learning processing system for stacking 1010, receives the solution d₁′ returned from that system, and adds “analysis result of unsupervised learning = d₁′” as a feature to the feature set.
[0132]
FIGS. 14 and 15 show the processing flow of the language analysis processing system 500.
[0133]
Step S30: The unsupervised learning processing system for stacking 1010 takes out one sentence stored in the sentence database 5. Referring to the problem expression information, it extracts the problem-expression equivalent part from the sentence as the solution; referring to the semantic analysis information, it converts the problem-expression equivalent part into the problem structure; and it stores a case with a “problem ⇒ solution” structure as unsupervised data, with the resulting sentence as the problem. It then extracts a pair of the solution and a feature set for each case, learns by a machine learning method what kind of solution is likely for what kind of feature set, and stores the learning result.
[0134]
Step S31: Next, the solution-feature pair extraction unit 501 extracts cases from the solution database 6 and, for each case, extracts a pair of the solution and a feature set.
[0135]
Step S32: The first feature adding unit 511 extracts only the feature set from each pair of a solution and a feature set and passes it to the unsupervised learning processing system for stacking 1010.
[0136]
Step S33: Referring to the learning result stored in advance, the unsupervised learning processing system for stacking 1010 estimates what kind of solution is likely for the received feature set and returns the estimated solution d₁ to the first feature adding unit 511.
[0137]
Step S34: The first feature adding unit 511 adds the returned solution d₁ as a feature to the original feature set. For example, if the original feature set is {a, b, c}, the feature set passed to the machine learning unit 502 is {a, b, c, “analysis result of unsupervised learning = d₁”}.
[0138]
Step S35: From the pairs of a solution and a feature set that includes “analysis result of unsupervised learning = d₁”, the machine learning unit 502 learns what kind of solution is likely for what kind of feature set, and stores the learning result in the learning result database 503.
[0139]
Step S36: A sentence for which a solution is desired is input to the feature extraction unit 504.
[0140]
Step S37: The feature extraction unit 504 extracts a set of features from the input sentence 3 and passes it to the second feature addition unit 512.
[0141]
Step S38: The second feature adding unit 512 passes the received feature set to the unsupervised learning processing system 1010 for stacking.
[0142]
Step S39: Referring to the learning result stored in advance, the unsupervised learning processing system for stacking 1010 estimates what kind of solution is likely for the received feature set and passes the estimated solution d₁′ to the second feature adding unit 512.
[0143]
Step S310: The second feature adding unit 512 adds the returned solution d₁′ as a feature to the original feature set. If the original feature set is {a, b, c}, the resulting feature set is {a, b, c, “analysis result of unsupervised learning = d₁′”}, and this feature set is passed to the solution estimation processing unit 505.
[0144]
Step S311: The solution estimation processing unit 505 refers to the learning result stored in the learning result database 503, estimates what kind of solution is likely for the passed feature set, and outputs the estimated solution 4.
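The data flow of steps S30 to S311 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `VoteClassifier` is a hypothetical toy learner (each feature votes for the solutions it co-occurred with) standing in for whatever machine learning method the system actually uses, and the feature strings and the feature name `unsup_result` are illustrative, not taken from the patent.

```python
from collections import Counter, defaultdict


class VoteClassifier:
    """Hypothetical toy learner: each feature votes for the solutions it co-occurred with."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # feature -> Counter({solution: count})

    def fit(self, cases):
        # cases: iterable of (solution, feature_set) pairs
        for solution, features in cases:
            for f in features:
                self.counts[f][solution] += 1
        return self

    def predict(self, features):
        votes = Counter()
        for f in features:
            votes.update(self.counts.get(f, Counter()))  # .get does not insert new keys
        return votes.most_common(1)[0][0] if votes else None


def add_stack_feature(first_stage, features):
    # Steps S32-S34 / S38-S310: ask the first-stage learner, append its answer as a feature
    d1 = first_stage.predict(features)
    return frozenset(features) | {f"unsup_result={d1}"}


# Step S30: first stage learns from (hand-written stand-ins for) unsupervised data
unsup_cases = [
    ("ga", {"n=dog", "v=bite"}),
    ("ga", {"n=snake", "v=bite"}),
    ("wo", {"n=apple", "v=eat"}),
]
first_stage = VoteClassifier().fit(unsup_cases)

# Steps S31-S35: second stage learns from supervised cases carrying the extra feature
sup_cases = [("ga", {"n=dog", "v=bite"}), ("wo", {"n=apple", "v=eat"})]
second_stage = VoteClassifier().fit(
    (sol, add_stack_feature(first_stage, feats)) for sol, feats in sup_cases
)

# Steps S37-S311: estimate the solution for a new input
query = add_stack_feature(first_stage, {"n=snake", "v=bite"})
print(sorted(query))                # ['n=snake', 'unsup_result=ga', 'v=bite']
print(second_stage.predict(query))  # ga
```

The second stage never saw “snake” in supervised data, yet the shared feature `unsup_result=ga` lets it reuse what the first stage learned from raw text.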
[0145]
The processing of the language analysis processing system 500 is described in more detail below using specific examples. As a first specific example, the language analysis processing system 500 estimates the post-conversion case particle when converting a passive or causative sentence into an active sentence.
[0146]
In the unsupervised learning processing system for stacking 1010 of the language analysis processing system 500, the case particles to be estimated, that is, the post-conversion case particles in the conversion from passive or causative sentences to active sentences, are stored in advance as problem expressions. When the sentence taken from the sentence database 5 is “dog bites” (with the particle “ga”), “ga” is extracted as the problem-expression equivalent part and becomes the solution (classification target), and the sentence converted into “dog <?> bite” becomes the problem (context). The following case is stored:
Case (Problem ⇒ Solution): “dog <?> bite” ⇒ “ga”
The following feature set is then extracted from this case.
[0147]
・ noun n bearing the case to be estimated = dog,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v = ? (unknown)
For this feature set, the system learns that the converted case particle is likely to be “ga” and stores the learning result.
[0148]
Likewise, when the sentence taken from the sentence database 5 is “snake bites”, the same processing stores the case
Case (Problem ⇒ Solution): “snake <?> bite” ⇒ “ga”
and extracts the following feature set from it.
[0149]
・ noun n bearing the case to be estimated = snake,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v = ? (unknown)
For this feature set, the system learns that the converted case particle is likely to be “ga” and stores the learning result.
[0150]
Thereafter, the solution-feature pair extraction unit 501 extracts the following case from the solution database 6:
Case (Problem ⇒ Solution): “dog bites” ⇒ “ga”
and, for this case, extracts a pair of the solution “ga” and the following feature set.
[0151]
・ noun n bearing the case to be estimated = dog,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v =
The first feature adding unit 511 then extracts only the feature set from this pair of solution and feature set and passes it to the unsupervised learning processing system for stacking 1010. That system refers to the learning result stored in advance, estimates what kind of solution is likely for the received feature set, and returns the estimated solution d₁ = “ga” to the first feature adding unit 511.
[0152]
Next, the first feature adding unit 511 adds the returned solution d₁ as a feature to the original feature set, giving the following feature set.
[0153]
・ noun n bearing the case to be estimated = dog,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v =
・ analysis result of unsupervised learning = “ga” (solution d₁)
The machine learning unit 502 then learns, from the pair of the solution and this feature set including the solution d₁, what kind of solution is likely for such a feature set, and stores the learning result in the learning result database 503.
[0154]
Thereafter, a sentence to be solved is input to the feature extraction unit 504, which extracts a feature set from input sentence 3. For example, when input sentence 3 is “bitten by a snake”, the following feature set is extracted and passed to the second feature adding unit 512.
[0155]
・ noun n bearing the case to be estimated = snake,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v =
The second feature adding unit 512 then passes the received feature set to the unsupervised learning processing system for stacking 1010, which refers to the learning result stored in advance, estimates what kind of solution is likely for the received feature set, and returns the estimated solution d₁′ = “ga” to the second feature adding unit 512.
[0156]
The second feature adding unit 512 adds the returned solution d₁′ as a feature to the original feature set, giving, for example, the following feature set.
[0157]
・ noun n bearing the case to be estimated = snake,
・ verb v modified by the case to be estimated = bite,
・ original (pre-conversion) case particle between noun n and verb v =
・ analysis result of unsupervised learning = “ga” (solution d₁′)
This feature set, including the solution d₁′, is passed to the solution estimation processing unit 505, which refers to the learning result stored in the learning result database 503, estimates what kind of solution is likely for the passed feature set, and outputs the estimated solution 4.
[0158]
Here, the output is the case particle “ga”, estimated by referring to the result of supervised learning on a feature set to which the analysis result “ga” returned from the unsupervised learning processing system for stacking 1010 has been added.
[0159]
In this way, the machine learning unit 502 performs machine learning on feature sets in which “analysis result of unsupervised learning = d₁” has been added to the features extracted from the supervised data (cases) in the solution database 6. Because these feature sets carry more information than feature sets extracted from the supervised data alone, learning can be more accurate than machine learning that uses only the supervised data. It can also be more accurate than machine learning that uses only unsupervised data, which is plentiful but carries less feature information, because the supervised cases carry more feature information.
[0160]
Further, the solution estimation processing unit 505 refers to this high-accuracy learning result, obtained from cases with richer feature sets, and evaluates the similarity of the feature set extracted from input sentence 3. Compared with the case where “analysis result of unsupervised learning = d₁” is not included, the similarity between feature sets is therefore higher, and so is the accuracy of the estimation.
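The similarity argument can be made concrete with a small calculation. The text does not say which similarity measure the solution estimation processing unit uses; the sketch below swaps in the Jaccard coefficient purely for illustration and, for brevity, uses only the noun and verb features from the example above plus the added stacking feature (named `unsup_result` here as an assumption).

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two feature sets."""
    return len(a & b) / len(a | b)


learned = {"n=dog", "v=bite"}  # from the supervised case
query = {"n=snake", "v=bite"}  # from input sentence 3
stack = {"unsup_result=ga"}    # d1 and d1' are both "ga" in the example

print(round(jaccard(learned, query), 3))                  # 0.333
print(round(jaccard(learned | stack, query | stack), 3))  # 0.5
```

The shared stacking feature raises the overlap from one of three features to two of four; the gain holds only when the first stage returns the same answer for both sets.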
[0161]
As a second specific example, the language analysis processing system 500 estimates the surface case to be produced when a sentence is generated from a meaning expressed with deep cases.
[0162]
For example, the meaning of a sentence can be expressed with deep cases as follows.
[0163]
Sentence: “apple <← obj> eat”
In this sentence, “apple” is the object of “eat”, and “apple” and “eat” are connected by a deep case (indicated by <← obj>).
[0164]
In the sentence generation process, the sentence “eat an apple” is generated from this original sentence. This requires generating the case particle “o” corresponding to <← obj>. The case (problem ⇒ solution) given in this process is shown below.
[0165]
Case (Problem ⇒ Solution):
“apple <← obj> eat” ⇒ “o”
The unsupervised learning processing system for stacking 1010 of the language analysis processing system 500 stores the given deep case as a problem expression. When the sentence taken from the sentence database 5 is “eat an apple” (apple o eat), the system treats the case particle “o” as the problem-expression equivalent part, extracts it as the solution, and stores the following case as unsupervised data, with the sentence obtained by converting that part as the problem.
[0166]
Case (Problem ⇒ Solution):
“apple <?> eat” ⇒ “o”
A pair of the solution and a feature set is then extracted from this case. The feature set is as follows.
[0167]
・ noun n bearing the case to be generated = apple,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = ? (unknown)
The system then learns what kind of solution is likely for such a feature set and stores the learning result. For example, for the feature set above, it learns that “solution = o” is likely.
[0168]
Suppose also that the sentence “eat a mandarin orange” is taken from the sentence database 5. In this case, the following case becomes unsupervised data.
[0169]
Case (Problem ⇒ Solution):
“mandarin orange <?> eat” ⇒ “o”
A pair of the solution and a feature set is then extracted from this case. The feature set is as follows.
[0170]
・ noun n bearing the case to be generated = mandarin orange,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = ? (unknown)
For case estimation in sentence generation, unsupervised data carries less feature information than typical supervised data (here the deep case is unknown), but a large amount of it can be prepared because many sentences can serve as unsupervised data.
[0171]
The system then learns what kind of solution is likely for such a feature set and stores the learning result. In this case, too, it learns that “solution = o”.
[0172]
Suppose that the solution-feature pair extraction unit 501 then extracts the following case from the solution database 6.
[0173]
Case: “apple <← obj> eat” ⇒ “o”
A pair of the solution and a feature set is then extracted from this case, with the following feature set.
[0174]
・ noun n bearing the case to be generated = apple,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = obj
The first feature adding unit 511 passes the extracted feature set to the unsupervised learning processing system for stacking 1010, which estimates from the stored learning result what solution is likely for the received feature set and returns the estimated solution d₁ = “o” to the first feature adding unit 511. The first feature adding unit 511 then adds the returned solution d₁ to the feature set, giving the following feature set.
[0175]
・ noun n bearing the case to be generated = apple,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = obj,
・ analysis result of unsupervised learning = “o” (solution d₁)
The machine learning unit 502 learns what kind of solution is likely for such a feature set. Because the solution d₁ obtained from the unsupervised learning processing system for stacking 1010 is included in the feature set as “analysis result of unsupervised learning = (solution d₁)”, the unit can learn that “o” is the solution when a case has the following features:
・ noun n bearing the case to be generated = apple,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = obj,
・ analysis result of unsupervised learning = “o” (solution d₁)
This learning result is stored in the learning result database 503.
[0176]
After that, when the sentence “mandarin orange <← obj> eat” is input to the feature extraction unit 504, the unit extracts the following feature set from input sentence 3 and passes it to the second feature adding unit 512.
[0177]
・ noun n bearing the case to be generated = mandarin orange,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = obj
When the second feature adding unit 512 passes this feature set to the unsupervised learning processing system for stacking 1010, that system refers to the stored learning result, estimates the solution d₁′ = “o” likely for the received feature set, and returns it to the second feature adding unit 512.
[0178]
The second feature adding unit 512 passes the following feature set, in which the solution d₁′ has been added to the original feature set, to the solution estimation processing unit 505.
[0179]
・ noun n bearing the case to be generated = mandarin orange,
・ verb v modified by the case to be generated = eat,
・ deep case between noun n and verb v = obj,
・ analysis result of unsupervised learning = “o” (solution d₁′)
The solution estimation processing unit 505 estimates what kind of solution is likely for this feature set. Because the feature set stored in the learning result and the feature set extracted from input sentence 3 are very similar, “o” can be correctly estimated as the solution from the learning result. The case particle “o” to be generated is then output as the estimated solution 4.
[0180]
Next, as a third specific example, processing in which the language analysis processing system 500 complements an omitted verb is described. For example, the sentence “What is going to work so well?” is regarded as an expression whose sentence-final verb part has been omitted, and the omitted verb part “I don't think” is complemented.
[0181]
In this case, the omitted verb part to be complemented is the problem expression, and the verb part that completes the elliptical expression is the solution. The unsupervised learning processing system for stacking 1010 of the language analysis processing system 500 stores problem expression information in advance so that such problem expressions can be extracted.
[0182]
When the sentence taken from the sentence database 5 is “I don't think it works so well.”, the sentence-final verb part is treated as the problem-expression equivalent part and “I don't think” is extracted as the solution. The following case is then stored as unsupervised data, with the sentence obtained by converting the problem-expression equivalent part as the problem.
[0183]
Case (Problem ⇒ Solution):
"Is it going so well <?>" ⇒ "I don't think"
A pair of the solution and a feature set is then extracted from this case. The feature set is as follows.
[0184]
・ “Ha”,
・ “What is”,
・ “Kuto”,
・ “What is Iku”,
…,
・ “I don't think it works so well”
The system then learns what kind of solution is likely for such a feature set and stores the learning result. For example, for the feature set above, it learns that “solution = I don't think” is likely.
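The feature lists in this example appear to be character strings taken from the end of the sentence, growing one character at a time (“Ha” and “Kuto” are romanized fragments of the Japanese). Assuming that reading, which the translation does not state outright, the extraction can be sketched as follows; the sample string is an illustrative Japanese fragment, not quoted from the original document, and `max_len` is a hypothetical parameter.

```python
def suffix_features(sentence, max_len=None):
    """Return sentence-final substrings of increasing length (1, 2, ..., max_len chars)."""
    s = sentence.rstrip("。.?？")  # drop final punctuation before taking suffixes
    n = len(s) if max_len is None else min(max_len, len(s))
    return [s[-k:] for k in range(1, n + 1)]


print(suffix_features("うまくいくとは。"))
# ['は', 'とは', 'くとは', 'いくとは', 'くいくとは', 'まくいくとは', 'うまくいくとは']
```

A cap such as `max_len` would bound the feature count for long sentences; the text does not say whether the original system uses one.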
[0185]
Thereafter, the solution-feature pair extraction unit 501 extracts the following case from the solution database 6:
Case: “It will work so well” ⇒ “I don't think”
and extracts a pair of the solution and a feature set from it. The feature set includes the following features.
[0186]
・ “Ha”,
・ “What is”,
・ “Kuto”,
・ “What is Iku”,
…,
・ “What works so well”,
・ “I don't think it works so well”
The first feature adding unit 511 passes the extracted feature set to the unsupervised learning processing system for stacking 1010.
[0187]
Based on the stored learning result, the unsupervised learning processing system for stacking 1010 estimates what kind of solution is likely for the received feature set and returns the estimated solution d₁ = “I don't think” to the first feature adding unit 511.
[0188]
The first feature adding unit 511 then adds the returned solution d₁ to the feature set, giving the following feature set.
[0189]
・ “Ha”,
・ “What is”,
・ “Kuto”,
・ “What is Iku”,
…,
・ “What works so well”,
・ “I don't think it works so well”,
・ analysis result of unsupervised learning = “I don't think” (solution d₁)
The machine learning unit 502 learns what kind of solution is likely for such a feature set and stores the learning result in the learning result database 503.
[0190]
Thereafter, when the sentence “What is going to do well” is input to the feature extraction unit 504, the unit extracts the following feature set from input sentence 3 and passes it to the second feature adding unit 512.
[0191]
・ “Ha”,
・ “What is”,
・ “Kuto”,
・ “What is Iku”,
…,
・ “What does it work?”
When the second feature adding unit 512 passes this feature set to the unsupervised learning processing system for stacking 1010, that system refers to the stored learning result, estimates the solution d₁′ = “I don't think” likely for the received feature set, and returns it to the second feature adding unit 512.
[0192]
The second feature adding unit 512 passes the following feature set, in which the solution d₁′ has been added to the original feature set, to the solution estimation processing unit 505.
[0193]
・ “Ha”,
・ “What is”,
・ “Kuto”,
・ “What is Iku”,
…,
・ “What does it work?”,
・ analysis result of unsupervised learning = “I don't think” (solution d₁′)
The solution estimation processing unit 505 estimates what kind of solution is likely for this feature set and outputs the omitted verb part “I don't think” as the estimated solution 4.
[0194]
FIG. 16 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 540 has the same processing means as the language analysis processing system 500, except that it includes a supervised/unsupervised learning processing system for stacking 1020 in place of the unsupervised learning processing system for stacking 1010.
[0195]
The supervised/unsupervised learning processing system for stacking 1020 consists of the same processing means as the unsupervised learning processing system for stacking 1010 plus the solution database 2. For the language analysis process, it extracts feature sets from the unsupervised data generated from the sentence database 5 and from the cases (supervised data) in the solution database 2, learns what kind of solution (analysis result) is likely for what kind of feature set, and stores the learning result. When it receives a feature set from the first feature adding unit 511 or the second feature adding unit 512, it estimates the solution (analysis result) from the stored learning result and returns the estimated solution d₂ to the first feature adding unit 511 or the estimated solution d₂′ to the second feature adding unit 512.
[0196]
The first feature adding unit 511 of the language analysis processing system 540 receives the solution d₂ returned from the supervised/unsupervised learning processing system for stacking 1020 and adds “analysis result of supervised/unsupervised learning = d₂” as a feature to the original feature set. Likewise, the second feature adding unit 512 of the language analysis processing system 540 receives the solution d₂′ returned from the supervised/unsupervised learning processing system for stacking 1020 and adds “analysis result of supervised/unsupervised learning = d₂′” as a feature to the feature set.
[0197]
FIG. 17 shows another configuration example of the language analysis processing system according to the fourth embodiment.
[0198]
The language analysis processing system 550 is a system that outputs the analysis result of a language analysis process for a given problem. It includes a CPU and a memory, together with a feature-solution pair/feature-solution candidate pair extraction unit 561, a machine learning unit 562, a learning result database 563, a feature-solution candidate extraction unit 564, a solution estimation processing unit 565, an unsupervised learning processing system for stacking 1030, a first feature adding unit 521, a second feature adding unit 522, a sentence database 5, and a solution database 6.
[0199]
The feature-solution pair/feature-solution candidate pair extraction unit 561, the machine learning unit 562, the learning result database 563, the feature-solution candidate extraction unit 564, and the solution estimation processing unit 565 perform substantially the same processing as, respectively, the feature-solution pair/feature-solution candidate pair extraction unit 161, the machine learning unit 162, the learning result database 163, the feature-solution candidate extraction unit 170, and the solution estimation processing unit 171 of the sentence conversion processing system 150.
[0200]
The unsupervised learning processing system for stacking 1030 extracts pairs of a solution or solution candidate and a feature set from unsupervised data generated from the sentence database 5 for the language analysis process, learns by a machine learning method the probability of each such pair being a positive example or a negative example, and stores the learning result. Referring to that result, it obtains, for each pair of a solution or solution candidate and a feature set received from the first feature adding unit 521 or the second feature adding unit 522, the probability of its being a positive or negative example, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d₃ to the first feature adding unit 521 or the estimated solution d₃′ to the second feature adding unit 522.
[0201]
Besides outputting, as the solutions d₃ and d₃′, the solution candidate estimated as the solution, the unsupervised learning processing system for stacking 1030 can also output information such as whether the candidate is a positive or negative example and the probability of its being a positive or negative example.
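The positive/negative-example formulation used by system 1030 can be sketched as follows. The text does not fix the learner here; `PosNegModel` is a hypothetical count-based stand-in with add-one smoothing, trained by treating the correct solution of each case as a positive example and every other candidate as a negative example. The candidate set and feature strings are illustrative.

```python
from collections import Counter


class PosNegModel:
    """Hypothetical stand-in: estimates P(positive | candidate, features) from counts."""

    def __init__(self):
        self.pos = Counter()  # (candidate, feature) -> positive count
        self.neg = Counter()  # (candidate, feature) -> negative count

    def fit(self, cases, candidates):
        # cases: (solution, feature_set); the solution is positive, other candidates negative
        for solution, feats in cases:
            for cand in candidates:
                table = self.pos if cand == solution else self.neg
                for f in feats:
                    table[(cand, f)] += 1
        return self

    def p_positive(self, cand, feats):
        p = sum(self.pos[(cand, f)] for f in feats)
        n = sum(self.neg[(cand, f)] for f in feats)
        return (p + 1) / (p + n + 2)  # add-one smoothing

    def estimate(self, candidates, feats):
        # The candidate with the highest probability of being positive becomes d3
        scores = {c: self.p_positive(c, feats) for c in candidates}
        best = max(scores, key=scores.get)
        return best, scores[best]


CANDIDATES = ["ga", "wo", "ni"]
model = PosNegModel().fit([("ga", {"n=dog", "v=bite"})], CANDIDATES)
d3, prob = model.estimate(CANDIDATES, {"n=snake", "v=bite"})
print(d3, round(prob, 3))  # ga 0.667
```

Returning the probability alongside the candidate mirrors the option, described above, of passing positive/negative information back with d₃.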
[0202]
The unsupervised learning processing system for stacking 1030 is configured similarly to the sentence conversion processing system 250 shown in FIG. 10; that is, it includes (not shown) a problem expression corresponding part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair/feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, and a solution estimation processing unit 171, and it outputs the analysis result of the language analysis process for a given problem.
[0203]
The first feature adding unit 521 passes the pairs of a solution or solution candidate and a feature set received from the feature-solution pair/feature-solution candidate pair extraction unit 561 to the unsupervised learning processing system for stacking 1030, receives the solution d₃ returned from that system, and adds “analysis result of unsupervised learning = solution d₃” as a feature to the original feature set.
[0204]
The second feature adding unit 522 passes the pairs of a solution candidate and a feature set received from the feature-solution candidate extraction unit 564 to the unsupervised learning processing system for stacking 1030, receives the solution d₃′ returned from that system, and adds “analysis result of unsupervised learning = solution d₃′” as a feature to the original feature set.
[0205]
FIGS. 18 and 19 show the processing flow of the language analysis processing system 550.
[0206]
Step S40: The unsupervised learning processing system for stacking 1030 takes out one sentence stored in the sentence database 5 and, referring to the problem expression information, extracts the problem-expression equivalent part from it. Referring to the semantic analysis information, it converts the problem-expression equivalent part into the problem structure and stores a case with a “problem ⇒ solution” structure as unsupervised data, with the converted sentence as the problem. It then extracts, for each case, pairs of a solution or solution candidate and a feature set, learns by a machine learning method the probability of each such pair being a positive example or a negative example, and stores the learning result.
[0207]
Step S41: Next, the feature-solution pair/feature-solution candidate pair extraction unit 561 extracts cases from the solution database 6 and, for each case, extracts pairs of a solution or solution candidate and a feature set.
[0208]
Step S42: The first feature adding unit 521 passes each pair of a solution or solution candidate and a feature set to the unsupervised learning processing system for stacking 1030.
[0209]
Step S43: Referring to the learning result stored in advance, the unsupervised learning processing system for stacking 1030 obtains, for each received pair of a solution or solution candidate and a feature set, the probability of its being a positive example or a negative example, takes the solution candidate with the highest probability of being a positive example as the solution d₃, and returns d₃ to the first feature adding unit 521.
[0210]
Step S44: The first feature adding unit 521 receives the returned solution d₃ and adds "unsupervised learning analysis result = solution d₃" to the original feature set as a feature. When d₃ contains, in addition to the estimated solution candidate, information such as whether it is a positive or negative example and the probability of being a positive or negative example, part or all of the information in the received d₃ may be added to the feature set. For example, one or more features such as "unsupervised learning analysis result = estimated solution candidate (solution d₃)", "unsupervised learning analysis result = positive/negative example (solution d₃)", or "unsupervised learning analysis result = positive-example probability/negative-example probability (solution d₃)" are added to the original feature set.
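Step S44 extends the original feature set with the stack system's output, optionally including the positive/negative judgment and its probability. A minimal Python sketch follows. The string encoding of the added features and the one-decimal rounding of the probability are assumptions, since the text does not fix how these values are represented; `add_stack_features` is a hypothetical helper.

```python
from typing import FrozenSet, Optional


def add_stack_features(features: FrozenSet[str], prefix: str, estimate: str,
                       is_positive: Optional[bool] = None,
                       prob: Optional[float] = None) -> FrozenSet[str]:
    """Return a new feature set with the stack system's output added.

    The original set is left unchanged (frozensets are immutable).
    """
    added = {f"{prefix}={estimate}"}
    if is_positive is not None:
        added.add(f"{prefix}:label={'positive' if is_positive else 'negative'}")
    if prob is not None:
        # Assumption: the probability is rounded to one decimal place so it
        # becomes a discrete feature; the text does not fix an encoding.
        added.add(f"{prefix}:prob={prob:.1f}")
    return features | added


if __name__ == "__main__":
    base = frozenset({"noun=ringo"})
    print(sorted(add_stack_features(base, "unsupervised result", "wo", True, 0.646)))
```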
[0211]
Steps S41 to S44 are performed for every pair of a solution or solution candidate and a feature set.
[0212]
Step S45: From the pairs of a solution or solution candidate and a feature set that now includes the solution d₃, the machine learning unit 562 learns by a machine learning method the probability that a given pair is a positive example or a negative example, and stores the learning result in the learning result database 563.
[0213]
Step S46: The input sentence 3, for which a solution is to be obtained, is input to the feature-solution candidate extraction unit 564.
[0214]
Step S47: The feature-solution candidate extraction unit 564 extracts pairs of a solution candidate and a feature set from the input sentence 3.
[0215]
Step S48: The second feature adding unit 522 passes the received pairs of a solution candidate and a feature set to the unsupervised learning processing system 1030 for stacking.
[0216]
Step S49: The unsupervised learning processing system 1030 for stacking refers to the learning result stored in advance, obtains for each received pair of a solution candidate and a feature set the probability of being a positive example or a negative example, estimates the solution candidate with the highest probability of being a positive example as d₃', and returns d₃' to the second feature adding unit 522.
[0217]
Step S410: The second feature adding unit 522 receives the returned solution d₃' and adds "unsupervised learning analysis result = solution d₃'" to the original feature set as a feature.
[0218]
Step S411: The solution estimation processing unit 565 refers to the learning result stored in the learning result database 563 and obtains the probability that each passed pair of a solution candidate and a feature set is a positive example or a negative example. This probability is obtained for all solution candidates, and the candidate with the highest probability of being a positive example is output as the solution 4.
[0219]
FIG. 20 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 580 includes processing means similar to those of the language analysis processing system 550, but includes a supervised/unsupervised learning processing system 1040 for stacking in place of the unsupervised learning processing system 1030 for stacking.
[0220]
The supervised/unsupervised learning processing system 1040 for stacking has a configuration in which the solution database 2 is added to the same processing means as the supervised/unsupervised learning processing system 1020 for stacking. The system 1040 extracts pairs of a solution or solution candidate and a feature set from the unsupervised data generated from the sentence database 5 for language analysis processing, learns by a machine learning method, from the extracted pairs, the probability that a given pair of a solution or solution candidate and a feature set is a positive example or a negative example, and stores the learning result. Referring to this learning result, it obtains the probability that each pair of a solution or solution candidate and a feature set received from the first feature adding unit 521 or the second feature adding unit 522 is a positive or negative example, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d₄ to the first feature adding unit 521 or the estimated solution d₄' to the second feature adding unit 522.
[0221]
The supervised/unsupervised learning processing system 1040 for stacking may output, as the solutions d₄ and d₄', not only the solution candidate estimated as the solution but also information on whether it is a positive or negative example, the probability of being a positive or negative example, and the like.
[0222]
The first feature adding unit 521 of the language analysis processing system 580 receives the solution d₄ returned from the supervised/unsupervised learning processing system 1040 for stacking and adds "supervised/unsupervised learning analysis result = d₄" to the original feature set as a feature. Likewise, the second feature adding unit 522 of the system 580 receives the solution d₄' returned from the system 1040 and adds "supervised/unsupervised learning analysis result = d₄'" to the original feature set as a feature.
[0223]
FIG. 21 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 600 includes processing means similar to those of the language analysis processing system 500 and additionally includes a supervised/unsupervised learning processing system 1020 for stacking.
[0224]
The first feature adding unit 611 of the language analysis processing system 600 passes only the feature set of each pair of a solution and a feature set received from the solution-feature pair extraction unit 501 to the unsupervised learning processing system 1010 for stacking and to the supervised/unsupervised learning processing system 1020 for stacking, and receives the solution d₁ returned from the system 1010 and the solution d₂ returned from the system 1020. It then adds "unsupervised learning analysis result = d₁" and "supervised/unsupervised learning analysis result = d₂" to the original feature set as features.
[0225]
Similarly, the second feature adding unit 612 of the language analysis processing system 600 passes the feature set received from the feature extraction unit 504 to the unsupervised learning processing system 1010 for stacking and to the supervised/unsupervised learning processing system 1020 for stacking, receives the solution d₁' returned from the system 1010 and the solution d₂' returned from the system 1020, and adds "unsupervised learning analysis result = d₁'" and "supervised/unsupervised learning analysis result = d₂'" to the original feature set as features.
[0226]
FIG. 22 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 650 includes processing means similar to those of the language analysis processing system 550 and additionally includes a supervised/unsupervised learning processing system 1040 for stacking.
[0227]
The first feature adding unit 621 of the language analysis processing system 650 passes each pair of a solution or solution candidate and a feature set received from the feature-solution pair / feature-solution candidate pair extraction unit 561 to the unsupervised learning processing system 1030 for stacking and to the supervised/unsupervised learning processing system 1040 for stacking, and receives the solution d₃ returned from the system 1030 and the solution d₄ returned from the system 1040. It then adds "unsupervised learning analysis result = d₃" and "supervised/unsupervised learning analysis result = d₄" to the original feature set as features.
[0228]
Similarly, the second feature adding unit 622 of the language analysis processing system 650 passes each pair of a solution candidate and a feature set received from the feature-solution candidate extraction unit 564 to the unsupervised learning processing system 1030 for stacking and to the supervised/unsupervised learning processing system 1040 for stacking, receives the solution d₃' returned from the system 1030 and the solution d₄' returned from the system 1040, and adds "unsupervised learning analysis result = d₃'" and "supervised/unsupervised learning analysis result = d₄'" to the original feature set as features.
[0229]
The unsupervised learning processing system 1030 for stacking and the supervised/unsupervised learning processing system 1040 for stacking may output, as the solutions d₃, d₃', d₄, and d₄', not only the solution candidate estimated as the solution but also information on whether it is a positive or negative example, the probability of being a positive or negative example, and the like. In this case, part or all of the information contained in the received solution is added to the feature set. For example, one or more features such as "unsupervised learning analysis result = estimated solution candidate", "unsupervised learning analysis result = positive/negative example", or "unsupervised learning analysis result = positive-example probability/negative-example probability" are added to the original feature set.
[0230]
As already explained, unsupervised data has different properties from supervised data, so simply adding unsupervised data to supervised data and performing machine learning is sometimes insufficient to improve processing accuracy. By combining machine learning on unsupervised data with machine learning on supervised data through stacking, as in this embodiment, the advantages of both can be exploited appropriately, and the accuracy of the analysis process is considered to improve.
[0231]
Finally, examples using the prior-art technique and the technique of the present invention are described. As the example task, case particle conversion in the sentence conversion process from passive/causative sentences to active sentences was adopted, and the support vector machine method was adopted as the machine learning method. The Kyoto University corpus was used as supervised data, and all case particles (53,157) of the active sentences in the Kyoto University corpus were used as unsupervised data. FIG. 23 shows the distribution of the converted case particles in the unsupervised data.
[0232]
The Kyoto University corpus [Reference 6] was also used to evaluate processing accuracy in the examples, with 10-fold cross-validation.
[Reference 6: Sadao Kurohashi and Makoto Nagao, "Kyoto University Text Corpus Project," Proceedings of the 3rd Annual Meeting of the Association for Natural Language Processing, 1997, pp. 115-118]
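The text states only that 10-fold cross-validation was used. The Python sketch below shows one way to form such splits, assuming a round-robin assignment of cases to folds (the actual fold construction is not described); `k_fold_splits` is a hypothetical helper. Each case is tested exactly once, so the per-fold correct counts can be summed and divided by the total number of cases.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) splits for k-fold cross-validation.

    Assumption: items are assigned to folds round-robin (item j goes to
    fold j % k); the text does not say how the folds were formed.
    """
    for i in range(k):
        test = [x for j, x in enumerate(items) if j % k == i]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_splits(list(range(25)), k=10):
        print(len(train), len(test))
```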
Case particle conversion experiments were performed with the following methods.
[0233]
・ Supervised learning
・ Unsupervised learning
・ Supervised/unsupervised learning (supervised and unsupervised data used together)
・ Stacking method 1:
Supervised learning is performed after the analysis result of unsupervised learning is added as a feature.
[0234]
・ Stacking method 2:
Supervised learning is performed after the analysis result of supervised/unsupervised learning is added as a feature.
[0235]
・ Stacking method 3:
Supervised learning is performed after both the analysis result of unsupervised learning and the analysis result of supervised/unsupervised learning are added as features.
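The three stacking methods differ only in which stack systems' outputs are added to the features before supervised learning. A minimal Python sketch of that difference follows; the stack systems are modeled as hypothetical callables from a feature set to an estimated solution, and the feature strings are an assumed encoding.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical interface: a stack system maps a feature set to its estimated solution.
StackSystem = Callable[[FrozenSet[str]], str]

# Which stack systems' outputs each stacking method adds as features.
STACKING_METHODS: Dict[str, List[str]] = {
    "stacking method 1": ["unsupervised learning"],
    "stacking method 2": ["supervised/unsupervised learning"],
    "stacking method 3": ["unsupervised learning", "supervised/unsupervised learning"],
}


def stacked_features(features: FrozenSet[str], method: str,
                     systems: Dict[str, StackSystem]) -> FrozenSet[str]:
    """Add '<system> analysis result=<solution>' for each system the method uses.

    Every stack system sees only the original features, never the outputs
    of the other stack systems.
    """
    added = {f"{name} analysis result={systems[name](features)}"
             for name in STACKING_METHODS[method]}
    return features | added
```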
[0236]
The processing accuracy results are shown below. Processing accuracy is the proportion of the 4,671 cases of supervised data that were processed correctly.
[0237]
・ Supervised learning = 89.06%
・ Unsupervised learning = 51.15%
・ Supervised/unsupervised learning = 87.09%
・ Stacking method 1 = 89.47%
・ Stacking method 2 = 89.55%
・ Stacking method 3 = 89.55%
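Because accuracy is defined over the 4,671 cases, the percentages above can be turned back into approximate counts of correct cases. The Python sketch below does this arithmetic, assuming the accuracy is pooled over all cases as the text states; the counts are back-calculated from rounded percentages and are not reported in the source.

```python
TOTAL_CASES = 4671  # number of evaluation cases stated in the text

REPORTED_ACCURACY = {  # percentages as reported in the list above
    "supervised learning": 89.06,
    "unsupervised learning": 51.15,
    "supervised/unsupervised learning": 87.09,
    "stacking method 1": 89.47,
    "stacking method 2": 89.55,
    "stacking method 3": 89.55,
}


def implied_correct(accuracy_percent: float, total: int = TOTAL_CASES) -> int:
    """Back-calculate the approximate number of correctly processed cases."""
    return round(accuracy_percent / 100 * total)


if __name__ == "__main__":
    for name, acc in REPORTED_ACCURACY.items():
        print(f"{name}: about {implied_correct(acc)} of {TOTAL_CASES}")
```

On this reading, stacking methods 2 and 3 correspond to roughly 23 more correct cases (about 4,183 versus 4,160) than supervised learning alone, which is consistent with the text's description of the improvement as small.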
The accuracy of processing using the supervised learning method was 89.06%. This shows that, by using a machine learning method, case particle conversion in sentence conversion from passive/causative sentences to active sentences can be achieved with at least this accuracy. Since no case particle conversion process using a machine learning method existed previously, this accuracy achieved by the embodiment demonstrates a distinctive effect of the present invention.
[0238]
The accuracy of processing using the unsupervised learning method was extremely low, at 51.15%. This is considered to be largely due to the lack of information about the pre-conversion case particles that are the analysis target.
[0239]
The accuracy of processing using the supervised/unsupervised learning method was also lower than that using the supervised learning method. Because unsupervised data has different properties from supervised data, using it is considered to have reduced the accuracy.
[0240]
The accuracy of every stacking method exceeded that of the supervised learning method, although the improvement is small. A statistical test using the binomial test was therefore applied, and every stacking method showed a significant difference from supervised learning at a significance level of 0.01. This confirms that the method of the present invention, which adds the results of unsupervised learning to the features, is effective.
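The text does not detail how the binomial test was set up. One common form for comparing two classifiers on the same cases is the exact sign test on the discordant cases (those one method gets right and the other gets wrong), which is assumed in the Python sketch below; `sign_test_p_value` and the counts in the usage lines are hypothetical, since the source does not report per-case results.

```python
from math import comb


def sign_test_p_value(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact binomial (sign) test on the discordant cases.

    only_a_correct: cases method A got right and method B got wrong.
    only_b_correct: cases method B got right and method A got wrong.
    Under the null hypothesis each discordant case is equally likely to
    favor either method, i.e. the count follows Binomial(n, 0.5).
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0
    k = min(only_a_correct, only_b_correct)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; the text does not report them.
    p = sign_test_p_value(60, 30)
    print(f"p = {p:.4f}; significant at 0.01: {p < 0.01}")
```

Only discordant cases enter the test, so a small net gain in accuracy can still be significant when the two methods disagree on relatively few cases.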
[0241]
In addition, for comparison with the accuracy of the "processing using supervised learning" of the present invention, processing by the method described in Non-Patent Document 4 was performed as a representative conventional technique.
[0242]
The accuracy of the case conversion process by the method described in Non-Patent Document 4 was 36% in F-measure (recall 75%, precision 24%). This conventional method performs poorly because the given sentences contain words that are not in its dictionary. After such undefined words were registered in the dictionary, the accuracy was 83% in F-measure (recall 94%, precision 74%). Accuracy is reported as an F-measure here because case conversion by the method of Non-Patent Document 4 outputs multiple conversion results for a single input. As already pointed out, these results show that the insufficiency of existing case frame dictionaries has a large influence.
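The F-values quoted above are consistent with the balanced F-measure, F = 2PR / (P + R), the harmonic mean of precision P and recall R. Assuming that definition, the Python sketch below reproduces both figures from the stated precision and recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(round(f_measure(0.24, 0.75), 2))  # before dictionary registration -> 0.36
    print(round(f_measure(0.74, 0.94), 2))  # after dictionary registration -> 0.83
```

The harmonic mean is dominated by the smaller value, which is why the 24% precision pulls the first F-value far below the 75% recall.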
[0243]
Because the method of Non-Patent Document 4 produces results per sentence, the results of the present invention were also tallied per sentence; in sentence units, the accuracy of the present invention was 85.58%. Here, a sentence means a sentence with a single predicate: sentences with multiple predicates, such as compound sentences, were divided into single-predicate sentences before the accuracy was calculated.
[0244]
The processing accuracy of the present invention is comparable to that of the method shown in Non-Patent Document 4 after unknown words and the like have been registered in the dictionary. The present invention obtains an accuracy of about 85% without any additional dictionary registration of the information to be analyzed. This shows that the present invention can perform the processing with higher accuracy than the prior art, without requiring such manual dictionary registration.
[0245]
As described above, the present invention has been explained by way of embodiments, but it goes without saying that various modifications are possible within the scope of its gist.
[0246]
The embodiments of the present invention mainly deal with case particle conversion in the conversion from passive and causative sentences to active sentences. However, by changing the classification destinations of the machine learning unit of the present invention from the case particles of active sentences to the case particles of passive and causative sentences, the present invention can also be applied to conversion from active sentences to passive and causative sentences.
[0247]
In addition to the analysis processes described as language analysis processing in the embodiments, the present invention can be applied to various other analysis processes, such as anaphora resolution of demonstratives, pronouns, and zero pronouns, indirect anaphora resolution, semantic analysis of "B of A" expressions, and metaphor analysis, as well as to case particle generation in sentence generation and in translation.
[0248]
Each means, function, or element of the present invention can be realized as a processing program that is read and executed by a computer. The processing program can be stored on an appropriate computer-readable recording medium, such as portable media memory, semiconductor memory, or a hard disk, and is provided either recorded on such media or by transmission and reception over various communication networks via a communication interface.
[0249]
【Effects of the Invention】
As described above, the present invention realizes a new technique that adds the analysis result of machine learning on unsupervised data to the features and performs machine learning on supervised data that includes the added features. This enables machine learning that exploits the advantages of both unsupervised and supervised data, and realizes more accurate sentence conversion processing.
[0250]
In particular, the present invention can be applied to an extremely wide range of problems that involve generating phrases, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition. A highly practical language analysis processing system can thereby be realized.
[0251]
Further, the present invention realizes a new technique that converts case particles using machine learning in the conversion from Japanese passive/causative sentences to active sentences, allowing post-conversion case particles to be estimated with higher accuracy than before.
[0252]
The conversion of passive/causative sentences to active sentences to which the present invention is applied is useful in many areas of computer-based natural language processing, such as sentence generation, sentence paraphrasing, knowledge acquisition systems, and question answering systems.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of a sentence conversion processing system according to a first embodiment.
FIG. 2 is a diagram showing a processing flow of the sentence conversion processing system in the first embodiment.
FIG. 3 is a diagram illustrating an example of a case stored in a tagged corpus.
FIG. 4 is a diagram illustrating a concept of margin maximization in the support vector machine method.
FIG. 5 is a diagram illustrating another configuration example of the sentence conversion processing system according to the first embodiment.
FIG. 6 is a diagram showing a processing flow of the sentence conversion processing system in another configuration example of the first embodiment.
FIG. 7 is a diagram for explaining unsupervised data.
FIG. 8 is a diagram illustrating a configuration example of a sentence conversion processing system according to a second embodiment.
FIG. 9 is a diagram illustrating a processing flow of unsupervised data generation processing.
FIG. 10 is a diagram illustrating another configuration example of the sentence conversion processing system according to the second embodiment.
FIG. 11 is a diagram illustrating a configuration example of a sentence conversion processing system according to a third embodiment.
FIG. 12 is a diagram illustrating another configuration example of the sentence conversion processing system according to the third embodiment.
FIG. 13 is a diagram illustrating a configuration example of a language analysis processing system according to a fourth embodiment.
FIG. 14 is a diagram showing a processing flow of the language analysis processing system in the fourth embodiment.
FIG. 15 is a diagram showing a processing flow of the language analysis processing system in the fourth embodiment.
FIG. 16 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 17 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 18 is a diagram showing a processing flow of the language analysis processing system in another configuration example of the fourth embodiment.
FIG. 19 is a diagram showing a processing flow of the language analysis processing system in another configuration example of the fourth embodiment.
FIG. 20 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 21 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 22 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 23 is a diagram illustrating a distribution of converted case particles in unsupervised data in an example.
[Explanation of reference numerals]
100, 150, 200, 250, 300, 350 Sentence conversion processing system
101, 501 Solution-feature pair extraction unit
102, 162, 502, 562 Machine learning unit
103, 163, 503, 563 Learning result database
110, 504 Feature extraction unit
111, 171, 505, 565 Solution estimation processing unit
161, 561 Feature-solution pair / feature-solution candidate pair extraction unit
170, 564 Feature-solution candidate extraction unit
201 Problem expression equivalent part extraction unit
202 Problem expression information storage unit
203 Semantic analysis information storage unit
204 Problem structure conversion unit
205 Unsupervised data storage unit
500, 540, 550, 580, 600, 650 Language analysis processing system
511, 521, 611, 621 First feature adding unit
512, 522, 612, 622 Second feature adding unit
1010, 1030 Unsupervised learning processing system for stacking
1020, 1040 Supervised/unsupervised learning processing system for stacking
2, 6 Solution database
3 Input sentence
4 Solution
5 Sentence database

Claims

A language analysis processing system that performs predetermined language analysis processing, comprising a main processing system that performs language analysis processing using machine learning processing and a stack processing system that provides the main processing system with data used in the machine learning processing, wherein
The stack processing system includes:
Sentence data storage means for storing sentence data that is an analysis target in the language analysis process and does not include solution information for a problem handled in the machine learning process;
Problem expression information storage means for storing a problem expression, which is a predetermined sentence expression indicating the problem, and a portion corresponding to the problem expression;
Problem expression equivalent part extraction means for extracting, from the sentence data stored in the sentence data storage means, a part that matches the part corresponding to the problem expression;
Problem structure conversion means for creating unsupervised data, each item being a pair of a problem and a solution, in which the problem is the converted sentence obtained by converting the problem expression equivalent part of the sentence data into the problem expression and the solution is the problem expression equivalent part;
Unsupervised data storage means for storing the created unsupervised data;
Stack solution-feature pair extraction means for extracting, by a predetermined analysis process, features, which are predetermined information including at least a character string, a word, or a part of speech, from the problem of each unsupervised data item stored in the unsupervised data storage means, and generating a pair of a feature set and a solution for each unsupervised data item;
Stack machine learning means for performing machine learning, based on a predetermined machine learning algorithm, on the pairs of feature sets and solutions to learn which solution is likely in the case of which feature set, and storing, as a learning result, which solution is likely in the case of which feature set in stack learning result data storage means; and
Stack solution estimation processing means for, upon receiving from the main processing system a feature set consisting of the predetermined information extracted by the same extraction process as that performed by the stack solution-feature pair extraction means, estimating the solution likely in the case of that feature set based on the learning result stored in the stack learning result data storage means, and outputting the estimated solution as a stack output solution;
The main processing system includes:
Solution data storage means for storing solution data, which is sentence data composed of a problem and a solution that is the analysis target in the language analysis process and to which solution information for the problem handled in the machine learning process is given;
Main solution-feature pair extraction means for extracting the features, which are the predetermined information, from the problem of each solution data item stored in the solution data storage means by an extraction process similar to that performed by the stack solution-feature pair extraction means, and generating a pair of a feature set and a solution for each solution data item;
First feature adding means for adding the stack output solution, estimated and output by the stack solution estimation processing means for a feature set generated by the main solution-feature pair extraction means, to that feature set as a feature, thereby forming a first feature set;
Main machine learning means for performing machine learning, based on a predetermined machine learning algorithm, on the pairs of first feature sets and solutions to learn which solution is likely in the case of which feature set, and storing, as a learning result, which solution is likely in the case of which feature set in main learning result data storage means;
Feature extraction means for extracting the features, which are the predetermined information, from input sentence data input as the target of the language analysis processing, by the same extraction process as that performed by the stack solution-feature pair extraction means;
Second feature adding means for adding the stack output solution, estimated and output by the stack solution estimation processing means for the feature set generated by the feature extraction means, to that feature set as a feature, thereby forming a second feature set; and
Solution estimation processing means for estimating the solution likely in the case of the second feature set, based on the learning result stored in the main learning result data storage means as to which solution is likely in the case of which feature set;
wherein a decision list method, a maximum entropy method, or a support vector machine method is used as the predetermined machine learning algorithm, and
in the decision list method, the stack machine learning means and the main machine learning means store, as the learning result, a list in which rules, each being a pair of a feature set and a solution of the unsupervised data, are stored in a predetermined order of priority, and the stack solution estimation processing means and the solution estimation processing means compare the rules stored in the list with the feature set of the input data in descending order of priority and estimate the solution of a rule whose features match as the solution likely in the case of the feature set of the input data; or
in the maximum entropy method, the stack machine learning means and the main machine learning means obtain, from the pairs of feature sets and solutions of the unsupervised data, the probability distribution that satisfies predetermined constraint expressions on the feature sets and maximizes entropy, and store it as the learning result, and the stack solution estimation processing means and the solution estimation processing means obtain, based on that probability distribution, the probability of each classification in the case of the feature set of the input data and estimate the classification with the maximum probability as the solution likely in the case of that feature set; or
in the support vector machine method, the stack machine learning means and the main machine learning means determine a hyperplane by a predetermined support vector machine method using the pairs of feature sets and solutions of the unsupervised data, and store the hyperplane and the classifications of the spaces divided by it as the learning result, and the stack solution estimation processing means and the solution estimation processing means determine, based on that hyperplane, which of the divided spaces the feature set of the input sentence data belongs to and estimate the classification of that space as the solution likely in the case of the feature set of the input sentence data; the language analysis processing system being characterized by performing the processing described above.
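The decision list method recited above classifies by trying rules in priority order and returning the solution of the first rule whose features all match. A minimal Python sketch of that lookup follows. It assumes each rule carries a numeric priority (for example, the rule's precision on training data), whereas the claim says only that the rules are held in a predetermined priority order; `decision_list_classify` and the example rules are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

# A rule: (features that must all be present, solution, priority).
Rule = Tuple[FrozenSet[str], str, float]


def decision_list_classify(rules: List[Rule], features: FrozenSet[str],
                           default: Optional[str] = None) -> Optional[str]:
    """Try rules from highest to lowest priority and return the solution of
    the first rule whose features are all contained in the input features."""
    for required, solution, _priority in sorted(rules, key=lambda r: r[2], reverse=True):
        if required <= features:
            return solution
    return default


if __name__ == "__main__":
    rules = [
        (frozenset({"verb=taberu"}), "wo", 0.90),
        (frozenset({"noun=kare"}), "ga", 0.95),
    ]
    print(decision_list_classify(rules, frozenset({"verb=taberu", "noun=kare"})))
```

When several rules match, only the highest-priority one decides the output, which is the defining difference from the maximum entropy and support vector machine methods that combine all features.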

The language analysis processing system according to claim 1, wherein the stack processing system includes solution data storage means for storing solution data to which solution information is given for the problem that is the analysis target in the language analysis process and is handled in the machine learning process;
the stack solution-feature pair extraction means extracts the features, which are the predetermined information, from the problem of each solution data item stored in the solution data storage means by the extraction process, and generates a pair of a feature set and a solution for each solution data item; and
the stack machine learning means performs machine learning on the pairs of feature sets and solutions generated from the sentence data and the solution data to learn which solution is likely in the case of which feature set.

A language analysis processing system that performs predetermined language analysis processing and comprises a main processing system that performs language analysis processing using machine learning processing and a stack processing system that provides the main processing system with data used in the machine learning processing, wherein
The stack processing system includes:
Sentence data storage means for storing sentence data that is an analysis target in the language analysis process and does not include solution information for a problem handled in the machine learning process;
Problem expression information storage means for storing a problem expression, which is a predetermined sentence expression indicating the problem, and a portion corresponding to the problem expression;
Problem expression corresponding portion extraction means for extracting, from the sentence data stored in the sentence data storage means, portions that match the portion corresponding to the problem expression;
Problem structure conversion means for creating unsupervised data, each datum being a pair of a problem and a solution or solution candidate, by taking as the problem the converted sentence obtained by replacing the problem-expression corresponding portion of the sentence data with the problem expression, and taking that portion itself as the solution or solution candidate;
Unsupervised data storage means for storing the created unsupervised data;
Stack feature-solution pair / feature-solution candidate pair extraction means for extracting, by a predetermined analysis process, features, which are predetermined information including at least a character string, a word, or a part of speech, from the problem of each unsupervised datum stored in the unsupervised data storage means, and generating, for each unsupervised datum, pairs of the feature set with a solution or solution candidate;
Stack machine learning means for performing machine learning, based on a predetermined machine learning algorithm, of the probability that each pair of a feature set and a solution or solution candidate is a positive example or a negative example (the two predetermined classes), and storing, as the learning result, that probability for each such pair in stack learning result data storage means;
Stack solution estimation processing means for, upon receiving from the main processing system a feature set (the predetermined information extracted by the same extraction process as that performed by the stack feature-solution pair / feature-solution candidate pair extraction means) paired with solution candidates, obtaining, based on the probabilities stored as the learning result in the stack learning result data storage means, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and outputting, as the stack output solution, the solution candidate with the highest probability of being a positive example among all solution candidates;
The main processing system is:
Solution data storage means for storing solution data, which consists of problems and solutions, in which solution information is attached to problems that are analysis targets in the language analysis process and are handled in the machine learning process;
Main feature-solution pair / feature-solution candidate pair extraction means for extracting the features, which are the predetermined information, from the problems of the solution data stored in the solution data storage means by the same extraction process as that performed by the stack feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set with a solution or solution candidate;
First feature adding means for adding, as a feature, the stack output solution that the stack solution estimation processing means estimates and outputs for each pair of a feature set and a solution or solution candidate generated by the main feature-solution pair / feature-solution candidate pair extraction means, to that feature set, thereby forming a first feature set;
Main machine learning means for performing machine learning, based on a predetermined machine learning algorithm, of the probability that each pair of a first feature set and a solution or solution candidate is a positive or negative example, and storing that probability as the learning result in main learning result data storage means;
Feature extraction means for extracting the features, which are the predetermined information, from input sentence data given as the target of the language analysis processing, by the same extraction process as that performed by the stack feature-solution pair / feature-solution candidate pair extraction means;
Second feature adding means for adding, as a feature, the stack output solution that the stack solution estimation processing means estimates and outputs for each pair of a feature set and a solution or solution candidate generated by the feature extraction means, to the feature set generated by the feature extraction means, thereby forming a second feature set;
Solution estimation processing means for obtaining, based on the probabilities stored as the learning result in the main learning result data storage means, the probability that each pair of the second feature set and a solution candidate is a positive or negative example, and estimating, as the solution, the solution candidate with the highest probability of being a positive example among all solution candidates;
wherein a decision list method, a maximum entropy method, or a support vector machine method is used as the predetermined machine learning algorithm, and:
In the decision list method, the stack machine learning means and the main machine learning means treat each pair of a feature set and a solution in the unsupervised data as a rule and store, as the learning result, a list in which those rules are ordered by a predetermined priority; the stack solution estimation processing means and the solution estimation processing means compare the rules stored in that list with the set of features of the input data in descending order of priority, and estimate the solution of the rule whose feature matches as the solution most likely for that set of features; or
In the maximum entropy method, the stack machine learning means and the main machine learning means obtain, from the sets of features and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the sets of features satisfy a predetermined constraint, and store that probability distribution as the learning result; the stack solution estimation processing means and the solution estimation processing means then obtain, based on that probability distribution, the probability of each classification given the set of features of the input data, and estimate the classification with the highest probability as the solution most likely for that set of features; or
In the support vector machine method, the stack machine learning means and the main machine learning means obtain a hyperplane by a predetermined support vector machine method from the sets of features and solutions of the unsupervised data, and store the hyperplane and the classifications of the spaces it divides as the learning result; the stack solution estimation processing means and the solution estimation processing means then determine, based on that hyperplane, which of the divided spaces the set of features of the input sentence data belongs to, and estimate the classification of that space as the solution most likely for that set of features; the language analysis processing system being characterized by performing one of the above processes.
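The stacking arrangement of claim 3 appends the stack output solution to the main system's feature set (the first feature set during training, the second at estimation time). The sketch below shows only that augmentation step. It assumes a deliberately simple, hypothetical stack learner that votes with per-feature solution counts in place of the decision list, maximum entropy, or support vector machine learners named in the claims; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_stack(pairs):
    """Toy stack learner (hypothetical): per feature, count co-occurring solutions."""
    table = defaultdict(Counter)
    for feats, sol in pairs:
        for f in feats:
            table[f][sol] += 1
    return table

def stack_estimate(table, feats):
    """Sum the per-feature counts and return the most frequent solution."""
    votes = Counter()
    for f in feats:
        votes.update(table.get(f, Counter()))
    return votes.most_common(1)[0][0] if votes else None

def add_stack_feature(table, feats):
    """Append the stack output solution to a feature set as one extra feature."""
    return frozenset(feats) | {"stack=" + str(stack_estimate(table, feats))}

stack_pairs = [
    ({"prev=みかん", "next=買い"}, "を"),
    ({"prev=私", "next=読む"}, "が"),
    ({"prev=本", "next=買い"}, "を"),
]
table = train_stack(stack_pairs)
augmented = add_stack_feature(table, {"prev=りんご", "next=買い"})
```

Here `augmented` contains `stack=を` alongside the original features; the main learner then decides how much weight to give the stack's opinion relative to its own features.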

The language analysis processing system according to claim 3, wherein
the stack processing system further includes solution data storage means for storing solution data in which solution information is attached to problems that are analysis targets in the language analysis process and are handled in the machine learning process;
the stack feature-solution pair / feature-solution candidate pair extraction means also extracts the features, which are the predetermined information, from the problems of the solution data stored in the solution data storage means by the same extraction process, and generates, for each solution datum, pairs of the feature set with a solution or solution candidate; and
the stack machine learning means performs machine learning, on the pairs of feature sets and solutions or solution candidates generated from both the sentence data and the solution data, of the probability that each pair is a positive or negative example.
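The feature-solution candidate formulation used in claims 3 and 4 turns each problem into one binary example per candidate: the pair with the correct candidate is a positive example and the others are negative. At estimation time the candidate with the highest probability of being positive is output. The sketch below assumes a hypothetical candidate set and a naive add-one smoothed estimate of that probability, standing in for the learners named in the claims; all names are hypothetical.

```python
from collections import Counter

CANDIDATES = ["が", "を", "に"]  # hypothetical solution candidates

def to_binary(pairs):
    """Expand (features, solution) into ((features, candidate), is_positive)."""
    return [((feats, c), c == sol) for feats, sol in pairs for c in CANDIDATES]

def train_binary(examples):
    """Count positive and total occurrences of each (feature, candidate) key."""
    pos, tot = Counter(), Counter()
    for (feats, cand), label in examples:
        for f in feats:
            tot[(f, cand)] += 1
            pos[(f, cand)] += int(label)
    return pos, tot

def p_positive(model, feats, cand):
    """Naive estimate: average of add-one smoothed per-feature positive rates."""
    pos, tot = model
    rates = [(pos[(f, cand)] + 1) / (tot[(f, cand)] + 2) for f in feats]
    return sum(rates) / len(rates) if rates else 0.5

def best_candidate(model, feats):
    """Candidate with the highest estimated probability of being positive."""
    return max(CANDIDATES, key=lambda c: p_positive(model, feats, c))

pairs = [({"prev=みかん", "next=買い"}, "を"), ({"prev=私", "next=読む"}, "が")]
model = train_binary(to_binary(pairs))
```

With this model, `best_candidate(model, {"next=買い"})` returns を. The binary formulation lets features describe the candidate itself, which a plain multi-class setup cannot do directly.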

The language analysis processing system according to any one of claims 1 to 4, wherein, when the sentence data subjected to the language analysis process is a passive or causative sentence, the stack processing system and the main processing system analyze the case particle that each noun phrase takes after the sentence is converted into an active sentence.
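Claim 5 targets a specific problem: the case particle each noun phrase takes when a passive or causative sentence is rewritten in the active voice. For example, 猫が犬に噛まれた ("the cat was bitten by the dog") becomes 犬が猫を噛んだ, so 猫 moves from が to を and 犬 from に to が. The sketch below only builds the problems and features for such sentences. It assumes the sentence is already segmented into base forms with simplified part-of-speech tags; the auxiliary lists, tags, and function name are hypothetical.

```python
PASSIVE = {"れる", "られる"}      # assumed base forms of passive auxiliaries
CAUSATIVE = {"せる", "させる"}    # assumed base forms of causative auxiliaries

def conversion_problems(tokens):
    """tokens: list of (base_form, pos) pairs with illustrative pos tags.
    Returns one feature set per noun + particle sequence; the solution to be
    learned is the particle that noun takes in the active sentence."""
    voice = "active"
    for base, pos in tokens:
        if pos == "aux" and base in PASSIVE:
            voice = "passive"
        elif pos == "aux" and base in CAUSATIVE:
            voice = "causative"
    verb = next((b for b, p in tokens if p == "verb"), None)
    problems = []
    for (noun, p1), (particle, p2) in zip(tokens, tokens[1:]):
        if p1 == "noun" and p2 == "particle":
            problems.append({"noun=" + noun, "particle=" + particle,
                             "verb=" + str(verb), "voice=" + voice})
    return problems

# 猫が犬に噛まれた, segmented by hand into base forms.
sent = [("猫", "noun"), ("が", "particle"), ("犬", "noun"), ("に", "particle"),
        ("噛む", "verb"), ("れる", "aux"), ("た", "aux")]
problems = conversion_problems(sent)
# Gold solutions for the active form 犬が猫を噛んだ: 猫 -> を, 犬 -> が.
gold = ["を", "が"]
```

Pairing each entry of `problems` with the corresponding entry of `gold` yields the problem-solution pairs that the solution data storage means would hold.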

A sentence conversion processing system that uses machine learning processing to estimate the post-conversion case particles when sentence data of a passive or causative sentence is converted into sentence data of an active sentence, comprising:
Solution data storage means for storing solution data, which consists of problems and solutions, in which sentence data serves as the problem and the solution information for that problem in the conversion process serves as the solution;
Solution-feature pair extraction means for extracting, by a predetermined analysis process, features, which are predetermined information including at least a character string, a word, or a part of speech, from the problem of each solution datum stored in the solution data storage means, and generating a pair of the feature set and the solution for each solution datum;
Machine learning means for learning, based on a predetermined machine learning algorithm and from the pairs of feature sets and solutions, which solution is likely for which feature set, and storing that as the learning result in learning result data storage means;
Feature extraction means for extracting the features, which are the predetermined information, from input sentence data given as the target of the conversion process, by the same extraction process as that performed by the solution-feature pair extraction means;
Solution estimation processing means for estimating, based on what is stored as the learning result in the learning result data storage means about which solution is likely for which feature set, the solution most likely for the feature set of the input sentence data;
wherein a decision list method, a maximum entropy method, or a support vector machine method is used as the predetermined machine learning algorithm, and:
In the decision list method, the machine learning means treats each pair of a feature set and a solution of the unsupervised data as a rule and stores, as the learning result, a list in which those rules are ordered by a predetermined priority; the solution estimation processing means compares the rules stored in that list with the set of features of the input data in descending order of priority, and estimates the solution of the rule whose feature matches as the solution most likely for that set of features; or
In the maximum entropy method, the machine learning means obtains, from the sets of features and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the sets of features satisfy a predetermined constraint, and stores that probability distribution as the learning result; the solution estimation processing means then obtains, based on that probability distribution, the probability of each classification given the set of features of the input data, and estimates the classification with the highest probability as the solution most likely for that set of features; or
In the support vector machine method, the machine learning means obtains a hyperplane by a predetermined support vector machine method from the sets of features and solutions of the unsupervised data, and stores the hyperplane and the classifications of the spaces it divides as the learning result; the solution estimation processing means then determines, based on that hyperplane, which of the divided spaces the set of features of the input sentence data belongs to, and estimates the classification of that space as the solution most likely for that set of features; the sentence conversion processing system being characterized by performing one of the above processes.
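The decision list alternative can be sketched as follows. The claims specify only "a predetermined priority"; the smoothed precision used here, the one-feature rule form, and the function names are assumptions chosen for illustration. Each rule maps a single feature to its most frequent solution, and estimation returns the solution of the highest-priority rule whose feature is present.

```python
from collections import Counter, defaultdict

def train_decision_list(pairs, alpha=0.1):
    """Build rules (feature -> solution) sorted by a hypothetical priority:
    the smoothed precision of the feature's most frequent solution."""
    counts = defaultdict(Counter)
    for feats, sol in pairs:
        for f in feats:
            counts[f][sol] += 1
    rules = []
    for f, sols in counts.items():
        sol, n = sols.most_common(1)[0]
        total = sum(sols.values())
        priority = (n + alpha) / (total + 2 * alpha)
        rules.append((priority, f, sol))
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules

def apply_decision_list(rules, feats, default=None):
    """Return the solution of the highest-priority rule whose feature matches."""
    for _, f, sol in rules:
        if f in feats:
            return sol
    return default

# Hypothetical passive-to-active training pairs: features -> active particle.
pairs = [
    ({"particle=に", "voice=passive"}, "が"),
    ({"particle=が", "voice=passive"}, "を"),
    ({"particle=に", "voice=passive"}, "が"),
]
rules = train_decision_list(pairs)
```

For `{"particle=が", "voice=passive"}` the rule `particle=が -> を` outranks the weaker `voice=passive -> が`, so the list returns を.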

A sentence conversion processing system that uses machine learning processing to estimate the post-conversion case particles when sentence data of a passive or causative sentence is converted into sentence data of an active sentence, comprising:
Solution data storage means for storing solution data, which consists of problems and solutions, in which sentence data serves as the problem and the solution information for that problem in the conversion process serves as the solution;
Feature-solution pair / feature-solution candidate pair extraction means for extracting, by a predetermined analysis process, features, which are predetermined information including at least a character string, a word, or a part of speech, from the problem of each solution datum stored in the solution data storage means, and generating pairs of the feature set with a solution or solution candidate;
Machine learning means for performing machine learning, based on a predetermined machine learning algorithm, of the probability that each pair of a feature set and a solution or solution candidate is a positive or negative example, and storing that probability as the learning result in learning result data storage means;
Feature-solution candidate pair extraction means for extracting the features, which are the predetermined information, from input sentence data given as the target of the conversion process, by the same extraction process as that performed by the feature-solution pair / feature-solution candidate pair extraction means, and generating pairs of the feature set with each solution candidate;
Solution estimation processing means for obtaining, based on the probabilities stored as the learning result in the learning result data storage means, the probability that each pair of the feature set and a solution candidate is a positive or negative example, and estimating, as the solution, the solution candidate with the highest probability of being a positive example among all solution candidates;
wherein a decision list method, a maximum entropy method, or a support vector machine method is used as the predetermined machine learning algorithm, and:
In the decision list method, the machine learning means treats each pair of a feature set and a solution of the unsupervised data as a rule and stores, as the learning result, a list in which those rules are ordered by a predetermined priority; the solution estimation processing means compares the rules stored in that list with the set of features of the input data in descending order of priority, and estimates the solution of the rule whose feature matches as the solution most likely for that set of features; or
In the maximum entropy method, the machine learning means obtains, from the sets of features and solutions of the unsupervised data, the probability distribution that maximizes the expression representing entropy while the sets of features satisfy a predetermined constraint, and stores that probability distribution as the learning result; the solution estimation processing means then obtains, based on that probability distribution, the probability of each classification given the set of features of the input data, and estimates the classification with the highest probability as the solution most likely for that set of features; or
In the support vector machine method, the machine learning means obtains a hyperplane by a predetermined support vector machine method from the sets of features and solutions of the unsupervised data, and stores the hyperplane and the classifications of the spaces it divides as the learning result; the solution estimation processing means then determines, based on that hyperplane, which of the divided spaces the set of features of the input sentence data belongs to, and estimates the classification of that space as the solution most likely for that set of features; the sentence conversion processing system being characterized by performing one of the above processes.
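For the support vector machine alternative applied to positive and negative examples, a linear soft-margin SVM can be sketched with Pegasos-style subgradient updates. The following are assumptions, not the patent's method: examples are visited in a fixed order rather than sampled at random, the offset is modelled as a regularized constant `bias` feature, and each problem feature is conjoined with the candidate. Without that conjunction the four examples below would not be linearly separable. All names are hypothetical.

```python
from collections import defaultdict

def pair_features(feats, cand):
    """Conjoin each problem feature with a candidate (assumed encoding)."""
    return {f + "&cand=" + cand for f in feats} | {"cand=" + cand}

def train_linear_svm(examples, lam=0.01, epochs=50):
    """Pegasos-style subgradient training of a linear soft-margin SVM.
    examples: list of (feature_set, label) with label +1 (positive) or -1."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for feats, y in examples:  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            x = set(feats) | {"bias"}
            margin = y * sum(w[f] for f in x)
            for k in w:            # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1:         # hinge loss active: move toward y * x
                for f in x:
                    w[f] += eta * y
    return w

def score(w, feats):
    """Unnormalized signed score w . x relative to the hyperplane."""
    return sum(w.get(f, 0.0) for f in set(feats) | {"bias"})

def side(w, feats):
    """+1 or -1 depending on which side of the hyperplane the features fall."""
    return 1 if score(w, feats) >= 0 else -1

examples = [
    (pair_features({"particle=に"}, "が"), 1),
    (pair_features({"particle=に"}, "を"), -1),
    (pair_features({"particle=が"}, "を"), 1),
    (pair_features({"particle=が"}, "が"), -1),
]
w = train_linear_svm(examples)
```

To pick a candidate, score every (feature set, candidate) pair and take the maximum, e.g. `max(["が", "を"], key=lambda c: score(w, pair_features({"particle=が"}, c)))`, which yields を here.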