JP2004171354A - Language analysis processing method, sentence conversion processing method, language analysis processing system, and sentence conversion processing system

Info

Publication number: JP2004171354A
Application number: JP2002337747A
Authority: JP
Inventors: Masaki Murata (村田真樹); Hitoshi Isahara (井佐原均)
Original assignee: Communications Research Laboratory
Current assignee: National Institute of Information and Communications Technology
Priority date: 2002-11-21
Filing date: 2002-11-21
Publication date: 2004-06-17
Anticipated expiration: 2022-11-21
Also published as: JP3780341B2

Abstract

PROBLEM TO BE SOLVED: To provide a processing method capable of converting digitized sentences with high accuracy.
SOLUTION: A solution-feature pair extraction unit 101 takes example data out of a solution database 2 and extracts, for each example, a pair of a solution and a set of features. A machine learning unit 102 learns from these pairs which solutions are likely for which features, and stores the learning result in a learning result database 103. A feature extraction unit 110 extracts a set of features from an input sentence 3. A solution estimation processing unit 111 refers to the learning result database 103, estimates which solution is likely for the features extracted from the input sentence 3, and outputs the estimated solution 4.
COPYRIGHT: (C)2004,JPO

Description

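[0001]
[Technical Field of the Invention]
The present invention relates to natural language processing technology implemented on a computer. More specifically, it relates to a language analysis processing method that applies machine learning to digitized sentences, and to a processing system that implements the method.
[0002]
In particular, the invention can be applied to language processing that handles an extremely wide range of problems involving the generation of words and phrases, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition.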
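[0003]
[Prior Art]
In the field of language analysis, semantic analysis, the stage that follows morphological and syntactic analysis, is growing in importance. In particular, for case analysis and ellipsis analysis, which are major parts of semantic analysis, there is demand for reducing the labor required and improving accuracy.
[0004]
Case analysis restores a surface case that is hidden because part of a sentence has been topicalized or turned into an adnominal (relative) construction. For example, in "りんごは食べた。" ("The apple, (I) ate."), the part "りんごは" is topicalized; restored to its surface case it is "りんごを". Case analysis thus analyzes the "は" of "りんごは" as the wo-case (ヲ格). Similarly, in "昨日買った本はもう読んだ。" ("(I) have already read the book (I) bought yesterday."), the part "買った本" is an adnominal construction; restored to its surface case it is "本を買った". Here the adnominal relation in "買った本" is analyzed as the wo-case.
[0005]
Ellipsis analysis restores a surface case element omitted from part of a sentence. For example, in "みかんを買いました。そして食べました。" ("(I) bought mandarin oranges. And (I) ate (them)."), the noun phrase omitted from "そして食べました" (a zero pronoun) is analyzed to be "みかんを".
[0006]
To obtain high accuracy in such language analysis on a computer while reducing the burden on the people performing it, a technique for language analysis using machine learning has been proposed (see Non-Patent Document 1).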
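[0007]
The technique for language analysis by machine learning presented in Non-Patent Document 1 (the non-borrowing-type machine learning method) has the following advantages.
(i) Preparing a corpus with more supervised data can be expected to yield higher accuracy.
(ii) When a better machine learning technique is developed, using it can be expected to yield higher accuracy.
[0008]
Non-Patent Document 1 also presented a language analysis method using a borrowing-type machine learning method. This is machine learning that uses teacher signals generated from data to which the information to be analyzed has not been added (hereinafter "unsupervised data"). With this method, ordinary digitized sentences, which exist in large quantities, can be used as unsupervised data without relying on data to which the target information (solution information) has been attached in advance by hand, such as a case frame dictionary. Because the large number of teacher signals improves learning accuracy, highly accurate language analysis can be realized.
[0009]
Non-Patent Document 1 further presented a language analysis method using a combined-type machine learning method. This method performs machine learning using both the teacher signals of ordinary machine learning, that is, data to which the information to be analyzed has been added (hereinafter "supervised data"), and teacher signals generated from unsupervised data. It exploits both the large quantity of teacher signals generated from easily obtained unsupervised data and the baseline learning accuracy secured by the teacher signals of supervised data.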
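[0010]
Another important problem in natural language processing is the conversion of passive and causative sentences into active sentences. This conversion is useful in many research areas, including sentence generation, paraphrasing, text simplification and language-use support, knowledge acquisition and information extraction from natural language text, and question answering. In question answering, for example, when the question is written in the active voice and the sentence containing the answer is written in the passive voice, the difference in sentence structure can make the answer hard to extract. Converting passive and causative sentences into active sentences resolves this kind of problem.
[0011]
Converting a Japanese passive or causative sentence into an active sentence requires estimating the case particles to be used after conversion. For example, converting the passive sentence "犬に私が噛まれた。" ("I was bitten by a dog.") into the active sentence "犬が私を噛んだ。" ("A dog bit me.") means estimating that the particle "に" of "犬に" becomes "が" and the "が" of "私が" becomes "を". Converting the causative sentence "彼が彼女に髪を切らせた。" ("He had her cut (her) hair.") into the active sentence "彼女が髪を切った。" ("She cut (her) hair.") means estimating that the "に" of "彼女に" becomes "が" and the "を" of "髪を" is unchanged. However, the particle to use depends on the verb and on how the verb is used, so this conversion is not easy to automate.
[0012]
Several conventional techniques address case particle conversion, for example those in Non-Patent Documents 2 to 4 below. The techniques disclosed there handle the problem with a case frame dictionary that describes how each case particle should be converted.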
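[0013]
[Non-Patent Document 1]
Masaki Murata,
Japanese case analysis using machine learning methods: teacher-signal borrowing type, non-borrowing type, and combined type,
The Institute of Electronics, Information and Communication Engineers (IEICE), IEICE Technical Report NLC-2001-24,
July 17, 2001
[Non-Patent Document 2]
Information-technology Promotion Agency (IPA) Technology Center,
Manual for the IPA Lexicon of Basic Japanese Verbs for Computers, IPAL (Basic Verbs),
1987
[Non-Patent Document 3]
Sadao Kurohashi and Makoto Nagao,
A Method of Case Structure Analysis for Japanese Sentences based on Examples in Case Frame Dictionary,
IEICE Transactions of Information and Systems, Vol. E77-D, No. 2, 1994
[Non-Patent Document 4]
Keiko Kondo, Satoshi Sato, and Manabu Okumura,
Paraphrasing of simple sentences by case conversion,
Transactions of the Information Processing Society of Japan, Vol. 42, No. 3,
2001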
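[0014]
[Problems to be Solved by the Invention]
Non-Patent Document 1 has the effect of improving accuracy by applying machine learning to language analysis. The borrowing-type and combined-type machine learning methods are also very effective in that they increase the teacher signals available for learning without increasing manual labor.
[0015]
Machine learning learns so as to maximize the rate of correct answers on the given training data. Unsupervised data differs in nature from supervised data in that it lacks the information to be analyzed.
[0016]
Consequently, machine learning that simply adds unsupervised data to supervised data, as in the combined-type method of Non-Patent Document 1, learns to maximize the rate of correct answers on the union of the two. Depending on how the unsupervised data relates to the supervised data, learning accuracy can therefore be lower than when learning maximizes the rate of correct answers on the supervised data alone.
[0017]
In view of this problem with the prior art, a technique is needed that exploits the advantages of both supervised and unsupervised data and learns with high accuracy more reliably.
[0018]
As for converting passive and causative sentences into active sentences, the conventional techniques of Non-Patent Documents 2 to 4 require a case frame dictionary that describes how to convert the case particles for every verb and every usage of that verb.
[0019]
Preparing a dictionary that covers every verb and every usage is practically infeasible, so conversion based on a case frame dictionary is inadequate: sentences containing verbs or usages not in the dictionary either cannot be converted or are likely to be converted incorrectly.
[0020]
A technique is therefore needed that performs this conversion, in particular from passive and causative sentences to active sentences, with high accuracy and without increasing manual labor.
[0021]
An object of the present invention is to provide a processing system that, when language analysis is performed with a combined-type learning method using both supervised and unsupervised data, exploits the advantages of both kinds of data to analyze with higher accuracy.
[0022]
A further object of the present invention is to provide a sentence conversion processing system that uses machine learning to estimate post-conversion case particles with high accuracy, in particular for converting passive and causative sentences into active sentences.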
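[0023]
[Means for Solving the Problems]
To achieve these objects, the present invention is configured as follows.
[0024]
The present invention is a method of performing language analysis by machine learning in a language analysis processing system composed of a stacking processing system and a main processing system that uses the processing results of the stacking processing system. The stacking processing system accesses a problem expression information storage unit that stores in advance information on the expressions that the machine learning method treats as the problem; based on that problem expression information, extracts a matching problem-expression-equivalent part from example data to which no information about the problem has been attached; and converts the example into unsupervised data structured so that the problem-expression-equivalent part is the solution and the example with that part converted is the problem. For each item of unsupervised data it extracts a pair of a solution and a set of features, learns from these pairs by machine learning which solution is likely for which features, and stores the learning result in a learning result database. It then obtains from the main processing system either a first set of features extracted from example data or a second set of features extracted from example data input for processing, refers to the learning result database to estimate which solution is likely for the first or second set of features, and sends to the main processing system a first processing result for the first set of features or a second processing result for the second set of features.
[0025]
The main processing system accesses a solution database storing example data to which solutions for the problem handled by the machine learning method have been attached, takes the example data out of it, extracts a pair of a solution and a set of features for each example, and sends the first set of features, that is, only the set of features from each pair, to the stacking processing system. It receives the first processing result from the stacking processing system, adds it as a feature to the first set of features extracted from the example data, learns by machine learning, from pairs of the solution and the first set of features with the processing result added, which solution is likely for which features, and stores the learning result in a learning result database. It then extracts a set of features from example data input for processing, sends this second set of features to the stacking processing system, receives the second processing result output by the stacking processing system, adds it as a feature to the second set of features, and refers to the learning result database to estimate which solution is likely for the second set of features with the processing result added.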
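[0026]
The present invention is also a method of performing language analysis by machine learning in a language analysis processing system composed of a stacking processing system and a main processing system that uses its processing results. The stacking processing system accesses a problem expression information storage unit that stores in advance information on the expressions that the machine learning method treats as the problem; based on that information, extracts a matching problem-expression-equivalent part from example data to which no information about the problem has been attached; converts the example into unsupervised data structured so that the problem-expression-equivalent part is the solution and the example with that part converted is the problem; extracts, for each item of unsupervised data, a pair of a solution or solution candidate and a set of features; learns from these pairs by machine learning the probability that a given combination of solution or solution candidate and set of features is a positive example or a negative example; and stores the learning result in a learning result database. It then obtains from the main processing system either a first pair of a solution or solution candidate and a set of features extracted from example data, or a second pair of a solution candidate and a set of features extracted from example data input for processing; refers to the learning result database to obtain, for the first or second pair, the probability of being a positive example or a negative example; estimates as the solution the candidate with the highest probability of being a positive example among all solution candidates; and sends to the main processing system a first processing result for the first pair or a second processing result for the second pair.
[0027]
The main processing system accesses a solution database storing example data to which solutions for the problem handled by the machine learning method have been attached, takes the example data out of it, extracts for each example a first pair of a solution or solution candidate and a set of features, and sends the first pair to the stacking processing system. It receives the first processing result from the stacking processing system, adds it to the set of features of the first pair, learns by machine learning, from the first pair with the result added, the probability that a given combination of solution or solution candidate and set of features is a positive example or a negative example, and stores the learning result in a learning result database. It then extracts a second pair of a solution candidate and a set of features from example data input for processing, sends the second pair to the stacking processing system, receives the second processing result, adds it as a feature to the set of features of the second pair, refers to the learning result database to obtain, for the second pair with the result added, the probability of being a positive example or a negative example, and estimates as the solution the candidate with the highest probability of being a positive example among all solution candidates.
[0028]
In this way, the present invention incorporates the analysis result of machine learning on unsupervised data as a feature of the supervised data, so learning maximizes the rate of correct answers on the supervised data. Machine learning can thus exploit the advantages of both unsupervised and supervised data, which differ in nature, and highly accurate analysis can be realized.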
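[0029]
The present invention is also a sentence conversion processing method that uses a computer to convert a digitized passive or causative sentence into an active sentence. The method accesses a solution database storing example data to which solutions for the problem handled by the machine learning method in this conversion have been attached, takes the example data out of it, extracts a pair of a solution and a set of features for each example, learns from these pairs by machine learning which solution is likely for which features, and stores the learning result in a learning result database. It then extracts a set of features from a digitized sentence input for analysis and refers to the learning result database to estimate which solution is likely for the features extracted from the input sentence.
[0030]
The present invention is also a sentence conversion processing method that uses a computer to convert a digitized passive or causative sentence into an active sentence. The method accesses a solution database storing example data to which solutions for the problem handled by the machine learning method in this conversion have been attached, takes the example data out of it, extracts for each example a pair of a solution or solution candidate and a set of features, learns from these pairs by machine learning the probability that a given combination of solution or solution candidate and set of features is a positive example or a negative example, and stores the learning result in a learning result database. It then extracts pairs of a solution candidate and a set of features from a digitized sentence input for analysis, refers to the learning result database to obtain for each pair the probability of being a positive example or a negative example, and estimates as the solution the candidate with the highest probability of being a positive example among all solution candidates.
[0031]
Case particle conversion in converting a passive or causative sentence into an active sentence amounts to determining the case particles used in the converted sentence. Because the number of possible post-conversion case particles is finite, estimating them reduces to a classification problem and can be handled by machine learning.
[0032]
The present invention performs machine learning using, as teacher signals, data generated from sentences to which no information about the analysis target (such as the post-conversion case particle) has been attached (unsupervised data). Ordinary electronic text, which exists in large quantities, can thus serve as training data, and highly accurate sentence conversion can be realized without increasing the labor of attaching such information by hand.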
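[0033]
[Embodiments of the Invention]
Several embodiments of the present invention are described below.
[0034]
The first embodiment applies machine learning using supervised data (the non-borrowing-type machine learning method) to the conversion of passive and causative sentences into active sentences. The second embodiment applies machine learning using unsupervised data (the borrowing-type machine learning method) to the same conversion. The third embodiment applies machine learning using both supervised and unsupervised data (the combined-type machine learning method) to the same conversion.
[0035]
The fourth embodiment applies to language analysis a machine learning method that uses the result of machine learning on unsupervised data as a feature of the supervised data (the unsupervised-data stacking-type machine learning method).
[0036]
In these embodiments, case particle conversion in the conversion from passive or causative to active means converting the case particles of the original sentence into those of the resulting active sentence and deleting unnecessary parts of the original sentence. An unnecessary part is, for example, the part "彼が" of the original causative sentence when "彼が彼女に髪を切らせた。" is converted into the active sentence "彼女が髪を切った。". A case particle of the original (passive or causative) sentence is called a pre-conversion case particle, and a case particle newly assigned on conversion to the active sentence is called a post-conversion case particle.
[0037]
The embodiments cover only this case particle conversion; the accompanying conversion of auxiliary verb expressions is not described. Conversion of the auxiliary verb part can readily be realized with existing processing, for example grammar-based rules.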
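[0038]
[First Embodiment]
The first embodiment describes a sentence conversion processing system that, when converting passive and causative sentences into active sentences, automatically converts the case particles that must change by means of machine learning on supervised data.
[0039]
FIG. 1 shows a configuration example of the sentence conversion processing system of this embodiment. The sentence conversion processing system 100 comprises a CPU and memory and includes a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a solution database 2.
[0040]
The solution-feature pair extraction unit 101 takes examples (supervised data) out of the solution database 2 and extracts, for each example, a pair of the example's solution and its set of features.
[0041]
The machine learning unit 102 learns by machine learning, from the extracted pairs of a solution and a set of features, which solution is likely for which features, and stores the learning result in the learning result database 103.
[0042]
The feature extraction unit 110 extracts a set of features from an input sentence (passive or causative) 3. Here a "sentence" is a sentence or a part of a sentence containing at least a nominal and a predicate.
[0043]
The solution estimation processing unit 111 refers to the learning result database 103 to estimate which solution is likely for the features of the input sentence 3, that is, which case particle is likely to be the post-conversion case particle on conversion to an active sentence, and outputs the estimated case particle as the solution 4.
[0044]
The solution database 2 stores supervised data with a "problem-solution" structure, to which the information to be analyzed by machine learning has been attached. In this embodiment the analysis target is the post-conversion case particle in the conversion from passive or causative to active, and a database of examples (simple sentences) tagged with the case particles to be used after conversion (post-conversion case particles) can be used.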
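[0045]
FIG. 2 shows the processing flow of the sentence conversion processing system 100.
[0046]
Step S1: The solution-feature pair extraction unit 101 takes examples out of the solution database 2 and extracts a pair of a solution and a set of features for each example. As the solution database 2, for example, a tagged corpus is used in which each case particle of a passive or causative sentence is tagged with the post-conversion case particle used when the sentence becomes active.
[0047]
FIG. 3 shows examples (simple sentences) stored in the tagged corpus. The five underlined case particles in the simple sentences of FIG. 3 are pre-conversion case particles, and the particle shown by an arrow beneath each underline indicates the post-conversion case particle. The example in FIG. 3(A) means that when this passive sentence is converted into an active sentence, the pre-conversion case particles change from "に" to "が" and from "が" to "を", respectively. The example in FIG. 3(B) means that when this causative sentence is converted into an active sentence, the pre-conversion case particles change from "に" to "が" and from "を" to "を", respectively, and the part "彼が" is deleted. The tag "other" means that the part is deleted when the sentence becomes active.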
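[0048]
A feature here is one unit of the fine-grained information used in analysis by machine learning. Features to extract include, for example, the following.
[0049]
1. The case particle attached to nominal n (the pre-conversion case particle)
2. The part of speech of predicate v
3. The base form of the word of predicate v
4. The auxiliary verb sequence attached to predicate v (e.g., "れる", "させる")
5. The word of nominal n
6. The category number of the word of nominal n in the Bunrui Goi Hyo (分類語彙表) thesaurus
7. The cases taken by nominals other than n that modify predicate v
For example, when the problem of an example is "犬に噛まれた。" ("(Someone) was bitten by a dog."), features such as the following are extracted:
・word of nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated (base form) = 噛む,
・case particle between nominal n and predicate v (pre-conversion case particle) = に.
[0050]
The solution is the post-conversion case particle attached to each example as tag information; for the example above,
・solution (post-conversion case particle) = が.
The solution-feature pair extraction unit 101 treats the extracted set of features as the context, and the solution as the class, in the machine learning performed by the machine learning unit 102.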
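As a minimal sketch of the pair extracted in step S1, the following Python represents one tagged case element and splits it into a set of features (context) and a solution (class). The `TaggedCase` record, its field names, and the feature keys are hypothetical; the text only lists the kinds of features, and morphological analysis is assumed to have been done beforehand.

```python
from typing import Dict, NamedTuple, Tuple


class TaggedCase(NamedTuple):
    """One case element of a tagged example (hypothetical layout)."""
    noun: str           # word of nominal n
    pre_particle: str   # pre-conversion case particle
    verb_base: str      # base form of predicate v
    auxiliaries: str    # auxiliary verb sequence attached to v
    post_particle: str  # tag: post-conversion case particle


def extract_pair(case: TaggedCase) -> Tuple[Dict[str, str], str]:
    """Return (set of features, solution) for one case element."""
    features = {
        "noun": case.noun,
        "verb_base": case.verb_base,
        "pre_particle": case.pre_particle,
        "auxiliaries": case.auxiliaries,
    }
    return features, case.post_particle


# The example "犬に噛まれた。" tagged with the post-conversion particle "が".
features, solution = extract_pair(TaggedCase("犬", "に", "噛む", "れる", "が"))
print(features, "=>", solution)
```

A dictionary keyed by feature name is only one possible encoding; a learner that expects binary features would typically flatten each key-value pair into a single string.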
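[0051]
Step S2: The machine learning unit 102 learns by machine learning, from the extracted pairs of a solution and a set of features, which solution is likely for which features, and stores the learning result in the learning result database 103.
[0052]
For example, for the set of features extracted from the example "犬に噛まれた。⇒が",
・word of nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated (base form) = 噛む,
・case particle between nominal n and predicate v (pre-conversion case particle) = に,
it learns that
・solution (post-conversion case particle) = が
is likely.
[0053]
Likewise, for the set of features extracted from the example "ヘビに噛まれた。⇒が" ("(Someone) was bitten by a snake."),
・word of nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated (base form) = 噛む,
・case particle between nominal n and predicate v (pre-conversion case particle) = に,
it also learns that
・solution (post-conversion case particle) = が
is likely.
[0054]
The machine learning method is, for example, the decision list method, the maximum entropy method, or the support vector machine method, but is not limited to these.
[0055]
The decision list method takes pairs of a feature (each element of the context, that is, of the information used in analysis) and a class as rules and stores them in a list in a predetermined priority order. Given an input to analyze, it compares the input data with the features of the rules starting from the highest priority, and the class of the first rule whose feature matches becomes the class of the input.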
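The lookup described for the decision list method can be sketched as follows. This is an illustrative sketch only: the rule contents, the feature names, and the `default` fallback for inputs that match no rule are assumptions, and how the priority order is determined is outside its scope.

```python
from typing import Dict, List, Tuple

Rule = Tuple[Tuple[str, str], str]  # ((feature name, feature value), class)


def classify(rules: List[Rule], features: Dict[str, str], default: str) -> str:
    """Return the class of the first (highest-priority) rule whose feature matches."""
    for (name, value), label in rules:
        if features.get(name) == value:
            return label
    return default  # fallback when no rule matches (an assumption of this sketch)


# Hypothetical rules, already sorted from highest to lowest priority.
rules: List[Rule] = [
    (("aux", "させる"), "other"),
    (("pre_particle", "に"), "が"),
    (("pre_particle", "が"), "を"),
]
result = classify(rules, {"noun": "犬", "pre_particle": "に", "aux": "れる"}, default="を")
print(result)
```

Only the first matching rule decides the class, so the quality of the priority order matters more than the number of rules.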
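[0056]
The maximum entropy method, given a preset set F of features f_j (1 ≤ j ≤ k), finds the probability distribution p(a, b) that maximizes an expression representing entropy while satisfying prescribed constraints, and takes as the solution (the class sought) the class with the largest probability among the class probabilities obtained from that distribution.
[Reference 1: Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara, Word sense disambiguation experiments using various machine learning methods, IEICE Technical Committee on Natural Language Understanding and Models of Communication, NLC2001-2, (2001)]
The support vector machine method classifies data consisting of two classes by dividing a space with a hyperplane. Because the method handles data with two classes, it is usually combined with the pairwise method to handle data with three or more classes. For data with N classes, the pairwise method forms every pair of two distinct classes (N(N−1)/2 pairs), determines for each pair which class is better with a binary classifier (here, one based on the support vector machine method), and finally decides the class by majority vote over the classes chosen by the N(N−1)/2 binary classifiers.
[Reference 2: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, (Cambridge University Press, 2000)]
[Reference 3: Taku Kudoh, TinySVM: Support Vector Machines, (http://cl.aist-nara.ac.jp/taku-ku//software/TinySVM/index.html, 2000)]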
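The pairwise method can be sketched as a majority vote over all N(N−1)/2 class pairs. In this sketch the binary classifiers are toy stand-ins (plain functions with fixed preferences), not trained support vector machines, and the names `pairwise_vote` and `score` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

BinaryClassifier = Callable[[dict], str]  # returns one of its two classes


def pairwise_vote(
    classes: Sequence[str],
    classifiers: Dict[Tuple[str, str], BinaryClassifier],
    x: dict,
) -> str:
    """Run every pairwise classifier on x and return the class with the most votes."""
    votes = Counter()
    for pair in combinations(classes, 2):  # N(N-1)/2 pairs
        votes[classifiers[pair](x)] += 1
    return votes.most_common(1)[0][0]


# Toy stand-ins for trained binary classifiers: each prefers the class
# with the higher fixed score, regardless of the input.
classes = ["が", "を", "に"]
score = {"が": 3, "を": 2, "に": 1}
classifiers = {
    (a, b): (lambda x, a=a, b=b: a if score[a] >= score[b] else b)
    for a, b in combinations(classes, 2)
}
winner = pairwise_vote(classes, classifiers, {"noun": "犬"})
print(winner)
```

Ties are broken here by whichever class received its first vote earliest, which is an arbitrary choice of this sketch; the text does not specify a tie-breaking rule.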
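To explain the support vector machine method, FIG. 4 shows the concept of margin maximization. In FIG. 4, white circles are positive examples, black circles are negative examples, the solid line is the hyperplane dividing the space, and the broken lines are the surfaces bounding the margin region. FIG. 4(A) is a conceptual diagram for a narrow gap between positive and negative examples (small margin), and FIG. 4(B) for a wide gap (large margin).
[0057]
Taking the two classes of the support vector machine method to be positive and negative examples, the larger the gap (margin) between positive and negative examples in the training data, the less likely misclassification on open data is considered to be. As shown in FIG. 4(B), the hyperplane that maximizes this margin is therefore found and used for classification.
[0058]
This is the basic support vector machine method, but in practice extended versions are used: one that allows a small number of examples to fall inside the margin region of the training data, and one that makes the linear part of the hyperplane nonlinear (for example, by introducing a kernel function).
[0059]
The extended method is equivalent to classifying with the following discriminant function, and the two classes are distinguished by whether its output value is positive or negative.
[0060]
[Equation 1]

[0061]
Here x is the context (set of features) of the example to be classified, x_i and y_i (i = 1, …, l; y_i ∈ {1, −1}) are the contexts and classes of the training data, and the function sgn is
sgn(x) = 1 (x ≥ 0), −1 (otherwise)    (2)
Each α_i is the value that maximizes expression (3) under the constraints of expressions (4) and (5).
[0062]
[Equation 2]

[0063]
The function K is called a kernel function. Various kernel functions are used; this embodiment uses the following polynomial kernel.
[0064]
K(x, y) = (x・y + 1)^d    (6)
C and d are constants set experimentally. In the specific examples described later, C was fixed at 1 throughout, and two values of d, 1 and 2, were tried. Each x_i with α_i > 0 is called a support vector, and the sum in expression (1) is normally computed using only these examples. That is, only the training examples called support vectors are used in actual analysis.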
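The equation images for expressions (1) and (3) to (5) are not reproduced in this text. As an aid to the reader, the standard soft-margin support vector machine formulation that is consistent with the surrounding definitions (the training pairs x_i and y_i, the function sgn of expression (2), the kernel K of expression (6), and the constant C) is given below. This is an assumption based on the textbook formulation, not a recovery of the original images; in particular the bias term b is not mentioned in the surviving text.

```latex
\begin{align}
f(x) &= \operatorname{sgn}\Bigl(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\Bigr) \tag{1}\\
L(\alpha) &= \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{3}\\
0 &\le \alpha_i \le C \qquad (i = 1, \dots, l) \tag{4}\\
\sum_{i=1}^{l} \alpha_i y_i &= 0 \tag{5}
\end{align}
```

Under this reading, training examples with α_i = 0 contribute nothing to the sum in (1), which matches the statement that only support vectors are used in analysis.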
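[0065]
Because the support vector machine method handles data with two classes, it is combined with the pairwise method to handle data with three or more classes. In this example, the sentence conversion processing system 150 performs processing that combines the support vector machine method with the pairwise method. Specifically, it is implemented with TinySVM.
[Reference 4: Taku Kudo and Yuji Matsumoto, Chunk identification using Support Vector Machine, IPSJ Special Interest Group on Natural Language Processing, 2000-NL-140, (2000)]
Step S3: An input sentence 3, the data for which a solution is sought, is then input to the feature extraction unit 110.
[0066]
Step S4: The feature extraction unit 110 extracts a set of features from the input sentence 3 by essentially the same processing as the solution-feature pair extraction unit 101 and passes it to the solution estimation processing unit 111. For example, when the input sentence 3 is "犬に噛まれた。", the following features are extracted and passed to the solution estimation processing unit 111.
[0067]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・pre-conversion case particle between nominal n and predicate v = に.
Step S5: The solution estimation processing unit 111 estimates, based on the learning result stored in the learning result database 103, which solution 4 is likely for the set of features passed to it, and outputs the estimated solution (post-conversion case particle) 4.
[0068]
For example, if the learning results described above for the examples "犬に噛まれた。⇒が" and "ヘビに噛まれた。⇒が" are stored in the learning result database 103, the solution estimation processing unit 111 refers to them, analyzes the set of features extracted from the input sentence 3, estimates that "が" is the most likely post-conversion case particle, and outputs solution 4 = "が".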
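[0069]
FIG. 5 shows another configuration example of the sentence conversion processing system of the first embodiment. In this and later figures, components such as processing units that carry the same reference number have the same function.
[0070]
The sentence conversion processing system 150 includes a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate pair extraction unit 170, a solution estimation processing unit 171, and a solution database 2.
[0071]
The feature-solution pair / feature-solution candidate pair extraction unit 161 takes examples out of the solution database 2 and extracts, for each example, pairs of a solution or solution candidate and a set of features.
[0072]
Here a solution candidate is a candidate other than the solution. If the post-conversion case particle is assumed to be one of the five particles "を", "に", "が", "と", and "で", then when "が" is the solution, the four particles "を", "に", "と", and "で" are the solution candidates. A pair of the solution and a set of features is a positive example, and a pair of a solution candidate and a set of features is a negative example.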
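The construction of positive and negative examples described above can be sketched as follows, assuming the five candidate particles named in the text. The function name `make_examples` and the triple layout `(features, particle, is_positive)` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

CANDIDATES = ["を", "に", "が", "と", "で"]  # the five particles assumed in the text


def make_examples(
    features: Dict[str, str], solution: str
) -> List[Tuple[Dict[str, str], str, bool]]:
    """Pair the feature set with every particle; only the true solution is positive."""
    return [(features, particle, particle == solution) for particle in CANDIDATES]


examples = make_examples({"noun": "犬", "verb_base": "噛む", "pre_particle": "に"}, "が")
for _, particle, is_positive in examples:
    print(particle, "positive" if is_positive else "negative")
```

Each tagged case element therefore yields one positive example and four negative examples for the binary learner.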
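[0073]
The machine learning unit 162 learns, from the pairs of a solution or solution candidate and a set of features extracted by the extraction unit 161, the probability that a given pair is a positive example or a negative example, using the support vector machine method or a similar machine learning method, and stores the learning result in the learning result database 163.
[0074]
The feature-solution candidate pair extraction unit 170 extracts pairs of a solution candidate and a set of features from the input sentence 3 by the same processing as the extraction unit 161 and passes them to the solution estimation processing unit 171.
[0075]
The solution estimation processing unit 171 refers to the learning result database 163 to obtain, for each pair of a solution candidate and a set of features passed from the extraction unit 170, the probability of being a positive example or a negative example, estimates the candidate with the highest probability of being a positive example as the solution 4, and outputs the estimated solution 4.
[0076]
FIG. 6 shows the processing flow of the sentence conversion processing system 150.
[0077]
Step S11: The extraction unit 161 takes examples out of the solution database 2 and extracts, for each example, pairs of a solution or solution candidate and a set of features. The set of features extracted is the same as the one extracted in step S1 (see FIG. 2).
[0078]
Step S12: The machine learning unit 162 learns by machine learning, from the extracted pairs, the probability that a given combination of solution or solution candidate and set of features is a positive example or a negative example. The learning result is stored in the learning result database 163.
[0079]
For example, when the example is "犬に噛まれた。⇒が" and the set of features is
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・pre-conversion case particle between nominal n and predicate v = に,
it obtains the probability that the solution is "が" (the probability of a positive example) and the probability for each of the solution candidates "を", "に", "と", and "で" (the probability of a negative example).
[0080]
Step S13: An input sentence 3 for which a solution is sought is then input to the feature-solution candidate pair extraction unit 170.
[0081]
Step S14: The extraction unit 170 extracts pairs of a solution candidate and a set of features from the input sentence 3 by the same processing as the extraction unit 161 and passes them to the solution estimation processing unit 171.
[0082]
Step S15: The solution estimation processing unit 171 obtains, based on the learning result stored in the learning result database 163, the probability that each pair of a solution candidate and a set of features passed to it is a positive example or a negative example.
[0083]
For example, when the input sentence is "犬に噛まれた。", the probability of being a positive example or a negative example is obtained for the extracted set of features paired with each of the solution candidates "が", "を", "に", "と", and "で".
[0084]
Step S16: The probabilities are obtained for all solution candidates, and the candidate with the highest probability of being a positive example is estimated as the solution 4 and output.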
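Steps S15 and S16 amount to scoring every candidate and taking the maximum. The sketch below assumes a scoring function `positive_probability` that stands in for the trained model; the fixed scores in `toy_scores` are made up for illustration and are not results from the source.

```python
from typing import Callable, Dict, Sequence


def estimate_solution(
    features: Dict[str, str],
    candidates: Sequence[str],
    positive_probability: Callable[[Dict[str, str], str], float],
) -> str:
    """Return the candidate whose (features, candidate) pair is most likely positive."""
    return max(candidates, key=lambda c: positive_probability(features, c))


# Hypothetical scores standing in for the output of a trained model.
toy_scores = {"が": 0.81, "を": 0.07, "に": 0.06, "と": 0.03, "で": 0.03}
solution = estimate_solution(
    {"noun": "犬", "verb_base": "噛む", "pre_particle": "に"},
    ["が", "を", "に", "と", "で"],
    lambda features, candidate: toy_scores[candidate],
)
print(solution)
```

Because only the ranking matters, any monotone score (for example a signed distance from a separating hyperplane) could serve in place of a calibrated probability.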
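[0085]
[Second Embodiment]
The second embodiment describes a sentence conversion processing system that, in converting passive and causative sentences into active sentences, converts case particles automatically by unsupervised learning.
[0086]
The unsupervised data used in machine learning is described first. FIG. 7(A) shows a digitized sentence given for creating unsupervised data. The active sentence "犬が私を噛んだ。" in FIG. 7(A) is data to which the information to be analyzed, namely information on case particle conversion during conversion to an active sentence, has not been attached. If, however, the sentence is regarded as the result of conversion to an active sentence, then although the case particles that would appear in the original passive or causative sentence (pre-conversion case particles) are unknown, the solution to be estimated, namely the case particles that should appear in the result (post-conversion case particles), can be extracted.
[0087]
FIG. 7(B) shows a simple sentence expressing the relation between pre-conversion and post-conversion case particles. The source sentence of the active sentence in FIG. 7(A) can be written "犬＜？＞私＜？＞噛んだ（噛まれた）。". Because the pre-conversion case particles that would appear in the source sentence are not given, they are shown as "＜？＞" (unknown). The post-conversion case particles extracted from the sentence in FIG. 7(A), which are the solutions to be estimated, are the "が" and "を" shown by arrows beneath the ＜？＞ marks. As FIG. 7(B) shows, an active sentence without the analysis target information lacks the pre-conversion case particle but does carry the post-conversion case particle, which is the solution (class). Of the sentence in FIG. 7(B), "犬＜？＞噛んだ。" can be converted into the following problem structure.
[0088]
"problem ⇒ solution" = "犬＜？＞噛んだ。 ⇒ が"
Thus an active sentence to which the analysis target information has not been added can be used as training data for machine learning.
[0089]
Unsupervised data generated from the active sentence of FIG. 7(A) carries less information than supervised data in that it lacks the pre-conversion case particle. However, active sentences outnumber passive and causative sentences, and no manual tagging of post-conversion case particles is needed, so large numbers of active sentences can be used as unsupervised data. This has the advantage of increasing the teacher signals available to the machine learning method.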
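[0090]
FIG. 8 shows a configuration example of the sentence conversion processing system of the second embodiment. The sentence conversion processing system 200 comprises a CPU and memory and includes a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a sentence database 5.
[0091]
The problem-expression-equivalent part extraction unit 201 refers to the problem expression information storage unit 202, which stores in advance what counts as a part corresponding to the problem expression (a problem-expression-equivalent part) in the processing of this system, takes sentences out of the sentence database 5, which stores data (sentences) without the analysis target information, and extracts the problem-expression-equivalent part from each sentence.
[0092]
Here the problem expression information storage unit 202 stores, as the problem-expression-equivalent part, the case particle to be changed in the conversion from passive or causative to active (the post-conversion case particle).
[0093]
When the extracted problem-expression-equivalent part needs to be converted, the problem structure conversion unit 204 refers to the semantic analysis information storage unit 203, which stores information for semantic analysis, converts the data into a "problem-solution" structure in which the sentence with the problem-expression-equivalent part converted is the problem and the case particle extracted from that part is the solution, and stores the converted unsupervised data as an example in the unsupervised data storage unit 205.
[0094]
The solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation processing unit 111 of the sentence conversion processing system 200 perform essentially the same processing as the units with the same numbers described in the first embodiment. The solution-feature pair extraction unit 101, however, takes examples (unsupervised data) out of the unsupervised data storage unit 205 and extracts a pair of a solution and a set of features for each example.
[0095]
FIG. 9 shows the processing flow for generating unsupervised data.
[0096]
Step S21: A sentence (active sentence), which is electronic natural-language text without the analysis target information, is input from the sentence database 5 to the problem-expression-equivalent part extraction unit 201.
[0097]
Step S22: The extraction unit 201 refers to the problem expression information storage unit 202, detects the structure of the input active sentence, and extracts the problem-expression-equivalent part. What counts as such a part is given by the problem expression information stored in the storage unit 202. For example, "犬＜？＝推定すべき格（変換後格助詞）＞噛む" ("dog <? = case to be estimated (post-conversion case particle)> bite") is stored as problem expression information. The extraction unit 201 matches the sentence structure stored as problem expression information against the structure of the input (active) sentence and takes the matching part as the problem-expression-equivalent part. For example, if the input sentence is "犬が噛む。" ("A dog bites."), matching extracts "が" as the problem-expression-equivalent part.
[0098]
Step S23: The problem structure conversion unit 204 refers to the semantic analysis information storage unit 203, extracts the extracted problem-expression-equivalent part as the solution, converts that part into the problem expression (＜？＞), and takes the resulting sentence as the problem. For example, the "が" extracted from the active sentence "犬が噛む。" becomes the solution, the "が" is converted into the problem expression (＜？＞), and "犬＜？＞噛む。" becomes the problem.
Step S24: The problem structure conversion unit 204 then stores the data with this problem and solution as unsupervised data (an example) in the unsupervised data storage unit 205.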
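Steps S22 to S24 can be sketched as masking the particle of an already-segmented unit and keeping it as the solution. The sketch assumes the sentence has been analysed into a (noun, particle, verb) triple beforehand, and that the problem expression information is simply a set of target particles; both are simplifications, and the names `TARGET_PARTICLES` and `to_unsupervised_example` are hypothetical.

```python
from typing import Optional, Tuple

# Assumed stand-in for the stored problem expression information.
TARGET_PARTICLES = ("が", "を", "に", "と", "で")


def to_unsupervised_example(
    noun: str, particle: str, verb: str
) -> Optional[Tuple[str, str]]:
    """Turn an analysed 'noun + particle + verb' unit into (problem, solution)."""
    if particle not in TARGET_PARTICLES:  # S22: nothing matches the problem expression
        return None
    problem = f"{noun}＜？＞{verb}。"       # S23: replace the particle with <?>
    return problem, particle              # S24: this pair is stored as an example


example = to_unsupervised_example("犬", "が", "噛む")
print(example)
```

No manual tagging is involved: the particle that actually occurs in the active sentence serves as the solution.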
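[0099]
The sentence conversion processing system 200 then proceeds as in the first embodiment (see FIG. 2). That is, the solution-feature pair extraction unit 101 takes examples out of the unsupervised data storage unit 205 and extracts a pair of a solution and a set of features for each example (step S1).
[0100]
If the example taken out is "犬＜？＞噛む。" ⇒ "が", a set of features such as the following is extracted.
[0101]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between nominal n and predicate v = ? (unknown).
The machine learning unit 102 then learns, from the pairs of a solution and a set of features, which case particle is the solution for which features. For the set of features above, it learns that "solution = が" is likely and stores the learning result in the learning result database 103 (step S2).
[0102]
If the example taken out is "ヘビ＜？＞噛む。" ⇒ "が", the following set of features is extracted.
[0103]
・nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between nominal n and predicate v = ? (unknown).
The machine learning unit 102 learns that "solution = が" is likely for this set of features as well and stores the learning result in the learning result database 103.
[0104]
The subsequent processing, from input of the input sentence 3 to the feature extraction unit 110 through output of the solution 4 by the solution estimation processing unit 111, is the same as steps S3 to S5 of the processing flow of FIG. 2 in the first embodiment and is not described again.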
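[0105]
FIG. 10 shows another configuration example of the sentence conversion processing system of the second embodiment. The sentence conversion processing system 250 includes a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate pair extraction unit 170, a solution estimation processing unit 171, and a sentence database 5.
[0106]
The units 201, 202, 203, and 204 of the sentence conversion processing system 250 perform the same processing as the units with the same numbers shown in FIG. 8.
[0107]
The units 161, 162, 163, 170, and 171 of the sentence conversion processing system 250 perform essentially the same processing as the units with the same numbers shown in FIG. 5.
[0108]
In the sentence conversion processing system 250, the extraction unit 161 extracts, from the unsupervised data storage unit 205, pairs of a solution or solution candidate and a set of features for each example (FIG. 6: step S11).
[0109]
If the example taken out is "犬＜？＞噛む。" ⇒ "が", a set of features such as the following is extracted.
[0110]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original case particle between nominal n and predicate v = ? (unknown).
The machine learning unit 162 then learns by machine learning, from the pairs of a solution or solution candidate and a set of features, the probability that a given combination is a positive example or a negative example. The learning result is stored in the learning result database 163 (FIG. 6: step S12).
[0111]
The subsequent processing, from input of the input sentence 3 to the extraction unit 170 through output of the solution 4 by the solution estimation processing unit 171, is the same as steps S13 to S16 of the processing flow of FIG. 6 in the first embodiment and is not described again.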
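[0112]
[Third Embodiment]
The examples ("problem-solution") stored in the unsupervised data storage unit 205 have almost the same structure as the examples ("problem-solution") stored in the solution database 2, so unsupervised and supervised examples can be mixed and used together. In this embodiment, machine learning that uses both unsupervised and supervised data as teacher signals is called "supervised/unsupervised learning".
[0113]
Unsupervised data lacks the pre-conversion case particle of the original sentence and so carries less information than supervised data. However, solution information (such as the post-conversion case particle) need not be tagged by hand for each example, and because active sentences generally outnumber passive sentences, many sentences can serve as teacher signals. Sentence conversion by supervised/unsupervised learning therefore has the advantage of using learning results obtained from a large amount of training data without increasing the labor of manually attaching the analysis target information.
[0114]
FIG. 11 shows a configuration example of the sentence conversion processing system 300 of the third embodiment. The sentence conversion processing system 300 comprises a CPU and memory and includes a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, a solution database 2, and a sentence database 5. The system 300 adds the solution database 2 to the sentence conversion processing system 200 of FIG. 8 described in the second embodiment and performs essentially the same processing as the system 200.
[0115]
The solution-feature pair extraction unit 101 extracts a pair of a solution and a set of features for each example, both for the supervised examples stored in the solution database 2 and for the unsupervised examples stored in the unsupervised data storage unit 205.
[0116]
FIG. 12 shows another configuration example of the sentence conversion processing system of the third embodiment. The sentence conversion processing system 350 comprises a CPU and memory and includes a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate pair extraction unit 170, a solution estimation processing unit 171, a solution database 2, and a sentence database 5.
[0117]
The system 350 adds the solution database 2 to the sentence conversion processing system 250 of FIG. 10 described in the second embodiment and performs essentially the same processing as the system 250.
[0118]
The extraction unit 161 extracts pairs of a solution or solution candidate and a set of features for each example, both for the supervised examples stored in the solution database 2 and for the unsupervised examples stored in the unsupervised data storage unit 205.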
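[0119]
[Fourth Embodiment]
The fourth embodiment describes a language analysis processing system that performs language analysis using stacking-type machine learning, which exploits the advantages of both unsupervised and supervised data.
[0120]
Stacking-type machine learning uses the technique called "stacking", which is used to merge the analysis results of multiple systems: machine learning is performed with teacher signals whose features are augmented with the analysis results of a different machine learning method.
[Reference 5: Hans van Halteren, Jakub Zavrel, and Walter Daelemans, Improving Accuracy in Word Class Tagging Through the Combination of Machine Learning Systems, Computational Linguistics, Vol. 27, No. 2, (2001), pp. 199-229]
In this embodiment, the language analysis processing system performs language analysis using borrowing-type machine learning (machine learning on unsupervised data) or combined-type machine learning (machine learning on supervised and unsupervised data) and adds the estimated solution that results as an element of the set of features. It then performs language analysis by supervised learning using the set of features to which the estimated solution has been added.
[0121]
For example, suppose that in the supervised machine learning used by this system, the set of features extracted from some supervised example is the list {a, b, c}, that the stacking processing system is a language analysis processing system using unsupervised machine learning, and that its analysis result is "d_1". The supervised machine learning of the language analysis processing system then adds the analysis result "d_1" to the set of features {a, b, c} and performs machine learning with the list {a, b, c, "unsupervised learning result = d_1"} as the new set of features.
[0122]
Similarly, suppose the stacking processing system is a language analysis processing system using supervised/unsupervised machine learning and its analysis result is "d_2". The supervised machine learning then adds the analysis result "d_2" to the set of features {a, b, c} and performs machine learning with the list {a, b, c, "supervised/unsupervised learning result = d_2"} as the new set of features.
[0123]
Both a language analysis processing system using unsupervised machine learning and one using supervised/unsupervised machine learning can also be used as stacking processing systems. In that case the supervised machine learning adds the analysis results "d_1" and "d_2" to the set of features {a, b, c} and performs machine learning with the list {a, b, c, "unsupervised learning result = d_1", "supervised/unsupervised learning result = d_2"} as the new set of features.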
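The feature augmentation described in the three cases above can be sketched as appending one feature per stacked system. The function name `add_stack_results` and the English feature names are hypothetical; the values d_1 and d_2 are the placeholders used in the text.

```python
from typing import Dict, List


def add_stack_results(features: List[str], results: Dict[str, str]) -> List[str]:
    """Return a new feature list with one 'name=value' feature per stacked system."""
    return list(features) + [f"{name}={value}" for name, value in results.items()]


augmented = add_stack_results(
    ["a", "b", "c"],
    {"unsupervised_result": "d_1", "supervised_unsupervised_result": "d_2"},
)
print(augmented)
```

The original features are left untouched, so the supervised learner is free to weigh the stacked results against them.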
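[0124]
When stacking is used in this way to combine non-borrowing-type machine learning on supervised data with borrowing-type or combined-type machine learning, the features of the supervised data (examples) used in supervised machine learning increase. Each example used in supervised machine learning is thereby considered to improve learning accuracy. Moreover, although the features have increased, supervised machine learning still learns to maximize the rate of correct answers on the supervised data (examples), that is, to maximize accuracy on the analysis target, and analysis uses that learning result. High analysis accuracy can thus be expected by making good use of the respective advantages of supervised and unsupervised machine learning.
[0125]
FIG. 13 shows a configuration example of the language analysis processing system of the fourth embodiment.
[0126]
The language analysis processing system 500 outputs the result of language analysis for a given problem. It comprises a CPU and memory and includes a solution-feature pair extraction unit 501, a machine learning unit 502, a learning result database 503, a feature extraction unit 504, a solution estimation processing unit 505, a stacking unsupervised learning processing system 1010, a first feature addition unit 511, a second feature addition unit 512, a sentence database 5, and a solution database 6.
[0127]
The solution-feature pair extraction unit 501, machine learning unit 502, learning result database 503, feature extraction unit 504, and solution estimation processing unit 505 perform essentially the same processing as the solution-feature pair extraction unit 101, machine learning unit 102, learning result database 103, feature extraction unit 110, and solution estimation processing unit 111 of the sentence conversion processing system 100, respectively.
[0128]
The stacking unsupervised learning processing system 1010 extracts sets of features from unsupervised data generated from the sentence database 5, learns from them which solution (analysis result) is likely for which set of features, and stores the learning result. For a set of features received from the first feature addition unit 511 or the second feature addition unit 512, it estimates the likely solution (analysis result) from the stored learning result and returns the estimated solution d_1 to the first feature addition unit 511 or the estimated solution d_1' to the second feature addition unit 512.
[0129]
The stacking unsupervised learning processing system 1010 includes processing units configured as in the sentence conversion processing system 200 of FIG. 8, namely a problem-expression-equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, and a solution estimation processing unit 111 (not shown), and outputs the result of language analysis for a given problem.
[0130]
The first feature addition unit 511 takes only the set of features from each pair of a solution and a set of features received from the solution-feature pair extraction unit 501, passes it to the stacking unsupervised learning processing system 1010, receives the solution d_1 returned from that system, and adds "unsupervised learning result = d_1" as a feature to the original set of features.
[0131]
The second feature addition unit 512 passes the set of features received from the feature extraction unit 504 to the stacking unsupervised learning processing system 1010, receives the solution d_1' returned from that system, and adds "unsupervised learning result = d_1'" as a feature to the set of features.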
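[0132]
FIGS. 14 and 15 show the processing flow of the language analysis processing system 500.
[0133]
Step S30: The stacking unsupervised learning processing system 1010 takes simple sentences stored in the sentence database 5. From each sentence it extracts the problem-expression-equivalent part as the solution by referring to the problem expression information, converts that part into the problem structure by referring to the semantic analysis information and takes the resulting sentence as the problem, and stores examples with this "problem-solution" structure as unsupervised data. It then extracts a pair of a solution and a set of features for each example, learns by machine learning which solution is likely for which features, and stores the learning result.
[0134]
Step S31: The solution-feature pair extraction unit 501 then takes examples out of the solution database 6 and extracts a pair of a solution and a set of features for each example.
[0135]
Step S32: The first feature addition unit 511 takes only the set of features from each pair of a solution and a set of features and passes it to the stacking unsupervised learning processing system 1010.
[0136]
Step S33: The stacking unsupervised learning processing system 1010 refers to the learning result stored in advance, estimates which solution is likely for the set of features received, and returns the estimated solution d_1 to the first feature addition unit 511.
[0137]
Step S34: The first feature addition unit 511 adds the returned solution d_1 as a feature to the original set of features. If the original set of features is {a, b, c}, the set of features passed to the machine learning unit 502 is {a, b, c, "unsupervised learning result = d_1"}.
[0138]
Step S35: The machine learning unit 502 learns, from pairs of the solution and the set of features including "unsupervised learning result = d_1", which solution is likely for which features, and stores the learning result in the learning result database 503.
[0139]
Step S36: A sentence for which a solution is sought is input to the feature extraction unit 504.
[0140]
Step S37: The feature extraction unit 504 extracts a set of features from the input sentence 3 and passes it to the second feature addition unit 512.
[0141]
Step S38: The second feature addition unit 512 passes the set of features received to the stacking unsupervised learning processing system 1010.
[0142]
Step S39: The stacking unsupervised learning processing system 1010 refers to the learning result stored in advance, estimates which solution is likely for the set of features received, and passes the estimated solution d_1' to the second feature addition unit 512.
[0143]
Step S310: The second feature addition unit 512 adds the returned solution d_1' as a feature to the original set of features. If the original set of features is {a, b, c}, the resulting set of features is {a, b, c, "unsupervised learning result = d_1'"}, and this set of features is passed to the solution estimation processing unit 505.
[0144]
Step S311: The solution estimation processing unit 505 refers to the learning result stored in the learning result database 503, estimates which solution is likely for the set of features passed to it, and outputs the estimated solution 4.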
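The flow of steps S30 to S311 can be sketched end to end with a toy learner. The `CountClassifier` below is a deliberately simple stand-in (each feature votes for the classes it co-occurred with in training) and is not one of the learning methods named in the text; the feature keys and the `unsup_result` name are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, str], str]  # (set of features, solution)


class CountClassifier:
    """Toy learner: each feature votes for the classes it co-occurred with."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)

    def fit(self, examples: List[Example]) -> "CountClassifier":
        for features, label in examples:
            for item in features.items():
                self.counts[item][label] += 1
        return self

    def predict(self, features: Dict[str, str]) -> Optional[str]:
        votes = Counter()
        for item in features.items():
            votes.update(self.counts.get(item, {}))
        return votes.most_common(1)[0][0] if votes else None


# S30: learn from unsupervised examples (pre-conversion particle unknown).
unsupervised = [
    ({"noun": "犬", "verb": "噛む", "pre_particle": "?"}, "が"),
    ({"noun": "ヘビ", "verb": "噛む", "pre_particle": "?"}, "が"),
]
stack = CountClassifier().fit(unsupervised)


def augment(features: Dict[str, str]) -> Dict[str, str]:
    """S32-S34 and S38-S310: add the stacked system's estimate as a feature."""
    return dict(features, unsup_result=stack.predict(features))


# S31-S35: supervised learning on the augmented feature sets.
supervised = [({"noun": "犬", "verb": "噛む", "pre_particle": "に"}, "が")]
main = CountClassifier().fit([(augment(f), y) for f, y in supervised])

# S36-S311: estimate the solution for the input "ヘビに噛まれる。".
answer = main.predict(augment({"noun": "ヘビ", "verb": "噛む", "pre_particle": "に"}))
print(answer)
```

The main learner is trained only on supervised examples, so it optimizes accuracy on the supervised data, while the unsupervised data enters solely through the added feature.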
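[0145]
The processing of the language analysis processing system 500 is described in more detail below using concrete examples. As the first specific example, the system 500 estimates the post-conversion case particle in converting passive and causative sentences into active sentences.
[0146]
The stacking unsupervised learning processing system 1010 of the system 500 stores in advance, as the problem expression, the case particle to be converted (estimated) in the conversion from passive or causative to active. When the sentence taken out of the sentence database 5 is "犬が噛む", it extracts "が" as the problem-expression-equivalent part and takes it as the solution (class), transforms the sentence into "犬＜？＞噛む。" as the problem (context), and stores
example (problem ⇒ solution): "犬＜？＞噛む。" ⇒ "が".
It then extracts the following set of features from this example.
[0147]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between the nominal and the predicate = ? (unknown)
It learns that the post-conversion case particle is likely to be "が" for this set of features and stores the learning result.
[0148]
When the sentence taken out of the sentence database 5 is "ヘビが噛む", it stores, by the same processing,
example (problem ⇒ solution): "ヘビ＜？＞噛む。" ⇒ "が".
It then extracts the following set of features from this example.
[0149]
・nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between the nominal and the predicate = ? (unknown)
It learns that the post-conversion case particle is likely to be "が" for this set of features as well and stores the learning result.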
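[0150]
The solution-feature pair extraction unit 501 then takes out of the solution database 6
example (problem ⇒ solution): "犬に噛まれる。" ⇒ "が"
and extracts, for each example, the pair of the solution "が" and the following set of features.
[0151]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between nominal n and predicate v = に
The first feature addition unit 511 then takes only the set of features from the extracted pair and passes it to the stacking unsupervised learning processing system 1010. That system refers to the learning result stored in advance, estimates which solution is likely for the set of features received, and returns the estimated solution d_1, "が", to the first feature addition unit 511.
[0152]
Next, the first feature addition unit 511 adds the returned solution d_1 as a feature to the original set of features, giving the following set.
[0153]
・nominal n in the case to be estimated = 犬,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between nominal n and predicate v = に,
・unsupervised learning result = が (solution d_1)
The machine learning unit 502 then learns, from pairs of the solution and the set of features including the solution d_1, which solution is likely for which features, and stores the learning result in the learning result database 503.
[0154]
A sentence for which a solution is sought is then input to the feature extraction unit 504, which extracts a set of features from the input sentence 3. For example, when the input sentence 3 is "ヘビに噛まれる。", the following set of features is extracted and passed to the second feature addition unit 512.
[0155]
・nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between nominal n and predicate v = に
The second feature addition unit 512 passes the set of features received to the stacking unsupervised learning processing system 1010. That system refers to the learning result stored in advance, estimates which solution is likely for the set of features received, and returns the estimated solution d_1', "が", to the second feature addition unit 512.
[0156]
The second feature addition unit 512 adds the returned solution d_1' as a feature to the original set of features, giving, for example, the following set.
[0157]
・nominal n in the case to be estimated = ヘビ,
・predicate v modified by the case to be estimated = 噛む,
・original (pre-conversion) case particle between nominal n and predicate v = に,
・unsupervised learning result = が (solution d_1')
The set of features including the solution d_1' is passed to the solution estimation processing unit 505, which refers to the learning result stored in the learning result database 503, estimates which solution is likely for the set of features passed to it, and outputs the estimated solution 4.
[0158]
Here the case particle "が" is output, estimated by referring to the supervised learning result with the set of features to which the analysis result "が" returned from the stacking unsupervised learning processing system 1010 was added.
[0159]
In this way, the machine learning unit 502 performs machine learning using the set of features extracted from the supervised data (examples) in the solution database 6 with "unsupervised learning result = d_1" added. This set carries more feature information than the set extracted from the supervised data alone, so machine learning is more accurate than with supervised data only. It is also more accurate than machine learning only on unsupervised data, which is vast in quantity but poor in feature information.
[0160]
Furthermore, the solution estimation processing unit 505 compares the set of features extracted from the input sentence 3 against a highly accurate learning result learned from examples rich in feature information. Compared with a set of features that lacks "unsupervised learning result = d_1'", the sets of features are therefore more similar to each other, and the accuracy of estimation is higher as well.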
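[0161]
As the second specific example, the language analysis processing system 500 estimates the surface case to assign when generating a sentence whose meaning is expressed with deep cases or the like.
[0162]
For example, the meaning of a sentence can be expressed with deep cases as follows.
[0163]
Sentence: "りんご＜←obj＞食べる" ("apple <←obj> eat")
In this sentence "りんご" (apple) is the object of "食べる" (eat), and the two are linked by the deep objective case (shown as ＜←obj＞).
[0164]
Sentence generation outputs the generated sentence "りんごを食べる" ("eat an apple") from this source sentence, which requires generating the case particle "を" corresponding to ＜←obj＞. The problem structure (problem ⇒ case) given in this processing is as follows.
[0165]
Problem (problem ⇒ case):
"りんご＜←obj＞食べる" ⇒ "を"
The stacking unsupervised learning processing system 1010 of the system 500 stores the given deep case as the problem expression. When the sentence taken out of the sentence database 5 is "りんごを食べる。", it treats the case particle "を" as the problem-expression-equivalent part and replaces it, extracts "を" as the solution, takes the sentence resulting from converting that part as the problem, and stores the following example as unsupervised data.
[0166]
Example (problem ⇒ solution):
"りんご＜？＞食べる" ⇒ "を"
It then extracts the pair of a solution and a set of features from this example. The set of features is as follows.
[0167]
・nominal n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = ? (unknown)
It learns which solution is likely for which set of features and stores the learning result. For the set of features above, for example, it learns that "solution = を" is likely.
[0168]
Suppose the sentence "みかんを食べる" ("eat a mandarin orange") is also taken out of the sentence database 5. The following example then becomes unsupervised data.
[0169]
Example (problem ⇒ solution):
"みかん＜？＞食べる" ⇒ "を"
It then extracts the pair of a solution and a set of features from this example. The set of features is as follows.
[0170]
・nominal n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = ? (unknown)
In case estimation for sentence generation too, the feature information is less than in ordinary supervised data, but sentences usable as unsupervised data are plentiful, so a large amount of unsupervised data can be prepared.
[0171]
It learns which solution is likely for which set of features and stores the learning result. Here too it learns that "solution = を" is likely.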
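[0172]
Suppose the solution-feature pair extraction unit 501 then takes the following example out of the solution database 6.
[0173]
Example: "りんご＜←obj＞食べる" ⇒ "を"
It extracts the pair of a solution and a set of features from this example. The following set of features is extracted.
[0174]
・nominal n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = obj
The first feature addition unit 511 passes the extracted set of features to the stacking unsupervised learning processing system 1010, which estimates, based on the stored learning result, which solution is likely for the set of features received and returns the estimated solution d_1 = "を" to the first feature addition unit 511. The first feature addition unit 511 adds the returned solution d_1 to the set of features, giving the following set.
[0175]
・nominal n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = obj,
・unsupervised learning result = を (solution d_1)
The machine learning unit 502 then learns which solution is likely for this set of features. Because the set includes "unsupervised learning result = を (solution d_1)", based on the solution d_1 obtained from the stacking unsupervised learning processing system 1010, it learns that given the features
・nominal n in the case to be generated = りんご,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = obj,
・unsupervised learning result = を (solution d_1)
the solution is "を". This learning result is stored in the learning result database 503.
[0176]
When the sentence "みかん＜←obj＞食べる" is then input to the feature extraction unit 504, it extracts the following set of features from the input sentence 3 and passes it to the second feature addition unit 512.
[0177]
・nominal n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = obj
When the second feature addition unit 512 passes this set of features to the stacking unsupervised learning processing system 1010, that system refers to the stored learning result, estimates the likely solution d_1' = "を" for the set of features received, and returns it to the second feature addition unit 512.
[0178]
The second feature addition unit 512 passes the following set of features, the original set with the solution d_1' added, to the solution estimation processing unit 505.
[0179]
・nominal n in the case to be generated = みかん,
・predicate v modified by the case to be generated = 食べる,
・deep case between nominal n and predicate v = obj,
・unsupervised learning result = を (solution d_1')
The solution estimation processing unit 505 estimates which solution is likely for this set of features. Because the set of features stored as the learning result and the set extracted from the input sentence 3 are very similar, "を", the solution in the learning result, can be estimated correctly. The case particle to be generated, "を", is output as the estimated solution 4.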
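[0180]
As the third specific example, the language analysis processing system 500 completes an elided verb expression. For example, the sentence "そんなにうまくいくとは。" ("That it would go that well...") is regarded as an expression whose sentence-final verb part is omitted, and the omitted verb part "思えない" ("(I) cannot believe") is supplied.
[0181]
Here the omitted "verb part to be supplied" is the problem expression, and the "verb part" that completes the ellipsis is the solution. The stacking unsupervised learning processing system 1010 of the system 500 stores problem expression information in advance in order to extract such problem expressions.
[0182]
When the sentence taken out of the sentence database 5 is "そんなにうまくいくとは思えない。" ("(I) cannot believe it would go that well."), it treats the sentence-final verb part as the problem-expression-equivalent part and replaces it, extracts the sentence-final verb part "思えない" as the solution, takes the sentence resulting from converting that part as the problem, and stores the following example as unsupervised data.
[0183]
Example (problem ⇒ solution):
"そんなにうまくいくとは＜？＞" ⇒ "思えない"
It then extracts the pair of a solution and a set of features from this example. The set of features is as follows.
[0184]
・"は",
・"とは",
・"くとは",
・"いくとは",
…,
・"そんなにうまくいくとは思えない"
It learns which solution is likely for which set of features and stores the learning result. For the set of features above, for example, it learns that "solution = 思えない" is likely.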
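The first features in the list above are the character suffixes of the text preceding the omitted verb. A minimal sketch of generating such suffixes follows; the function name `suffix_features` is hypothetical, and the sketch covers only suffixes of the problem part, not the final listed feature that also contains the verb.

```python
from typing import List


def suffix_features(text: str) -> List[str]:
    """Return every character suffix of text, shortest first."""
    return [text[-n:] for n in range(1, len(text) + 1)]


features = suffix_features("そんなにうまくいくとは")
print(features[:4])
```

Short suffixes such as "とは" are shared by many sentences, which lets an example learned from one sentence apply to a differently worded input that ends the same way.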
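[0185]
After that, the solution-feature pair extraction unit 501 retrieves the following case from the solution database 6
Case: "そんなにうまくいくとは。" ⇒ "思えない"
and extracts a pair of the solution and a set of features from it. The set consists of the following features.
[0186]
・"は",
・"とは",
・"くとは",
・"いくとは",
...,
・"そんなにうまくいくとは"
・"そんなにうまくいくとは思えない"
The first feature addition unit 511 passes the extracted set of features to the stack unsupervised learning processing system 1010.
[0187]
Based on its stored learning result, the stack unsupervised learning processing system 1010 estimates which solution is likely for the received set of features and returns the estimated solution d_1 = "思えない" to the first feature addition unit 511.
[0188]
The first feature addition unit 511 then adds the returned solution d_1 to the set of features, giving the following set.
[0189]
・"は",
・"とは",
・"くとは",
・"いくとは",
...,
・"そんなにうまくいくとは"
・"そんなにうまくいくとは思えない"
・analysis result of unsupervised learning = 思えない (solution d_1)
The machine learning unit 502 then learns which solution is likely for this set of features and stores the learning result in the learning result database 503.
[0190]
After that, when the sentence "そううまくいくとは。" is input to the feature extraction unit 504, the feature extraction unit 504 extracts the following set of features from the input sentence 3 and passes it to the second feature addition unit 512.
[0191]
・"は",
・"とは",
・"くとは",
・"いくとは",
...,
・"そううまくいくとは"
When the second feature addition unit 512 passes this set of features to the stack unsupervised learning processing system 1010, that system refers to its stored learning result, estimates the solution d_1' = "思えない" that is likely for the received set of features, and returns it to the second feature addition unit 512.
[0192]
The second feature addition unit 512 passes the following set of features, the original set with the solution d_1' added, to the solution estimation processing unit 505.
[0193]
・"は",
・"とは",
・"くとは",
・"いくとは",
...,
・"そううまくいくとは"
・analysis result of unsupervised learning = 思えない (solution d_1')
The solution estimation processing unit 505 estimates which solution is likely for this set of features and outputs the omitted verb part "思えない" as the estimated solution 4.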
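[0194]
FIG. 16 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 540 has the same processing means as the language analysis processing system 500, except that it includes a stack supervised/unsupervised learning processing system 1020 in place of the stack unsupervised learning processing system 1010.
[0195]
The stack supervised/unsupervised learning processing system 1020 has the same processing means as the stack unsupervised learning processing system 1010, with the solution database 2 added. For language analysis processing, it extracts sets of features from both the unsupervised data generated from the sentence database 5 and the cases (supervised data) in the solution database 2, learns from the extracted features which solution (analysis result) is likely for which set of features, and stores the learning result. For a set of features received from the first feature addition unit 511 or the second feature addition unit 512, it estimates the likely solution (analysis result) from the stored learning result and returns the estimated solution d_2 to the first feature addition unit 511, or the solution d_2' to the second feature addition unit 512.
[0196]
The first feature addition unit 511 of the language analysis processing system 540 receives the solution d_2 returned from the stack supervised/unsupervised learning processing system 1020 and adds "analysis result of supervised/unsupervised learning = d_2" as a feature to the original set of features. Likewise, the second feature addition unit 512 of the language analysis processing system 540 receives the solution d_2' returned from the stack supervised/unsupervised learning processing system 1020 and adds "analysis result of supervised/unsupervised learning = d_2'" as a feature to the set of features.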
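[0197]
FIG. 17 shows a further configuration example of the language analysis processing system in the fourth embodiment.
[0198]
The language analysis processing system 550 is a system that outputs the analysis result of language analysis processing for a given problem. It consists of a CPU and memory and includes a feature-solution pair / feature-solution candidate pair extraction unit 561, a machine learning unit 562, a learning result database 563, a feature-solution candidate extraction unit 564, a solution estimation processing unit 565, a stack unsupervised learning processing system 1030, a first feature addition unit 521, a second feature addition unit 522, the sentence database 5, and the solution database 6.
[0199]
The feature-solution pair / feature-solution candidate pair extraction unit 561, the machine learning unit 562, the learning result database 563, the feature-solution candidate extraction unit 564, and the solution estimation processing unit 565 perform substantially the same processing as, respectively, the feature-solution pair / feature-solution candidate pair extraction unit 161, the machine learning unit 162, the learning result database 163, the feature-solution candidate extraction unit 170, and the solution estimation processing unit 171 of the sentence conversion processing system 150.
[0200]
For language analysis processing, the stack unsupervised learning processing system 1030 extracts pairs of a solution or solution candidate and a set of features from the unsupervised data generated from the sentence database 5, learns by a machine learning method, from the extracted pairs, the probability that a given pair of a solution or solution candidate and a set of features is a positive example or a negative example, and stores the learning result. Referring to this learning result, it obtains the probability of being a positive or negative example for each pair of a solution or solution candidate and a set of features received from the first feature addition unit 521 or the second feature addition unit 522, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d_3 to the first feature addition unit 521, or the solution d_3' to the second feature addition unit 522.
[0201]
As the solutions d_3 and d_3', the stack unsupervised learning processing system 1030 can output, in addition to the solution candidate estimated as the solution, information such as whether that solution is a positive or negative example and the probability of its being a positive or negative example.
[0202]
The stack unsupervised learning processing system 1030 includes processing means configured in the same way as the sentence conversion processing system 250 shown in FIG. 10, namely a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, and a solution estimation processing unit 171 (not shown), and outputs the analysis result of language analysis processing for a given problem.
[0203]
The first feature addition unit 521 passes the pairs of a solution or solution candidate and a set of features received from the feature-solution pair / feature-solution candidate pair extraction unit 561 to the stack unsupervised learning processing system 1030, receives the solution d_3 returned from that system, and adds "analysis result of unsupervised learning = solution d_3" as a feature to the original set of features.
[0204]
The second feature addition unit 522 passes the pairs of a solution candidate and a set of features received from the feature-solution candidate extraction unit 564 to the stack unsupervised learning processing system 1030, receives the solution d_3' returned from that system, and adds "analysis result of unsupervised learning = solution d_3'" as a feature to the original set of features.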
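[0205]
FIGS. 18 and 19 show the processing flow of the language analysis processing system 550.
[0206]
Step S40: The stack unsupervised learning processing system 1030 retrieves a simple sentence stored in the sentence database 5, extracts the problem expression equivalent part from it by referring to the problem expression information and takes that part as the solution, converts the problem expression equivalent part into a problem structure by referring to the semantic analysis information, takes the resulting sentence as the problem, and stores the case with this "problem-solution" structure as unsupervised data. It then extracts, for each case, pairs of a solution or solution candidate and a set of features, learns by a machine learning method the probability that such a pair is a positive example or a negative example, and stores the learning result.
[0207]
Step S41: The feature-solution pair / feature-solution candidate pair extraction unit 561 then retrieves cases from the solution database 6 and extracts, for each case, pairs of a solution or solution candidate and a set of features.
[0208]
Step S42: The first feature addition unit 521 passes the pairs of a solution or solution candidate and a set of features to the stack unsupervised learning processing system 1030.
[0209]
Step S43: Referring to the previously stored learning result, the stack unsupervised learning processing system 1030 obtains, for each received pair of a solution or solution candidate and a set of features, the probability of being a positive example or a negative example, estimates the solution candidate with the highest probability of being a positive example as the solution d_3, and returns the solution d_3 to the first feature addition unit 521.
[0210]
Step S44: The first feature addition unit 521 adds "analysis result of unsupervised learning = solution d_3", derived from the returned solution d_3, as a feature to the original set of features. When the solution d_3 contains, in addition to the estimated solution candidate, information such as whether it is a positive or negative example and the probability of its being a positive or negative example, some or all of the information in the received solution d_3 may be added to the set of features. For example, one or more features such as "analysis result of unsupervised learning = estimated solution candidate (solution d_3)", "analysis result of unsupervised learning = positive/negative example (solution d_3)", or "analysis result of unsupervised learning = probability of positive example / probability of negative example (solution d_3)" are added to the original set of features.
[0211]
Steps S41 to S44 are performed for every pair of a solution or solution candidate and a set of features.
[0212]
Step S45: The machine learning unit 562 learns by a machine learning method, from the pairs of a solution or solution candidate and a set of features containing the solution d_3, the probability that such a pair is a positive example or a negative example, and stores the learning result in the learning result database 563.
[0213]
Step S46: A sentence for which a solution is sought is input to the feature-solution candidate extraction unit 564.
[0214]
Step S47: The feature-solution candidate extraction unit 564 extracts pairs of a solution candidate and a set of features from the input sentence 3.
[0215]
Step S48: The second feature addition unit 522 passes the received pairs of a solution candidate and a set of features to the stack unsupervised learning processing system 1030.
[0216]
Step S49: Referring to the previously stored learning result, the stack unsupervised learning processing system 1030 obtains, for each received pair of a solution candidate and a set of features, the probability of being a positive example or a negative example, estimates the solution candidate with the highest probability of being a positive example as the solution d_3', and returns the solution d_3' to the second feature addition unit 522.
[0217]
Step S410: The second feature addition unit 522 adds "analysis result of unsupervised learning = solution d_3'", derived from the returned solution d_3', as a feature to the original set of features.
[0218]
Step S411: The solution estimation processing unit 565 refers to the learning result stored in the learning result database 563 and obtains, for each pair of a solution candidate and a set of features passed to it, the probability of being a positive example or a negative example. It obtains this probability for all solution candidates and outputs the solution candidate with the highest probability of being a positive example as the solution 4 being sought.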
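Steps S48 to S411 can be illustrated with the following Python sketch. The two probability functions are hypothetical stand-ins with made-up values; in the system described they would be replaced by the models learned in steps S40 and S45. The candidate list and feature names are likewise assumed for illustration.

```python
def best_candidate(candidates, features, positive_probability):
    """Return the candidate with the highest probability of being a
    positive example for the given set of features."""
    return max(candidates, key=lambda c: positive_probability(c, features))


def stack_probability(candidate, features):
    # Hypothetical stand-in for the model learned by the stack system 1030.
    made_up = {"を": 0.7, "が": 0.2, "に": 0.1}
    return made_up.get(candidate, 0.0)


def main_probability(candidate, features):
    # Hypothetical stand-in for the model in learning result database 563:
    # it favours the candidate that the stack system proposed.
    return 0.9 if features.get("unsupervised_result") == candidate else 0.1


candidates = ["が", "を", "に"]
features = {"n": "みかん", "v": "食べる"}

# Steps S48 to S410: obtain d_3' and add it to the set of features.
d3 = best_candidate(candidates, features, stack_probability)
augmented = {**features, "unsupervised_result": d3}

# Step S411: score every candidate and output the most probable one.
solution = best_candidate(candidates, augmented, main_probability)
```

Scoring each candidate separately as positive or negative, rather than predicting a label directly, lets a two-class learner handle a problem with many possible solutions.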
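[0219]
FIG. 20 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 580 has the same processing means as the language analysis processing system 550, except that it includes a stack supervised/unsupervised learning processing system 1040 in place of the stack unsupervised learning processing system 1030.
[0220]
The stack supervised/unsupervised learning processing system 1040 has the same processing means as the stack supervised/unsupervised learning processing system 1020, with the solution database 2 added. For language analysis processing, it extracts pairs of a solution or solution candidate and a set of features from the unsupervised data generated from the sentence database 5, learns by a machine learning method, from the extracted pairs, the probability that a given pair is a positive example or a negative example, and stores the learning result. Referring to this learning result, it obtains the probability of being a positive or negative example for each pair of a solution or solution candidate and a set of features received from the first feature addition unit 521 or the second feature addition unit 522, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d_4 to the first feature addition unit 521, or the solution d_4' to the second feature addition unit 522.
[0221]
As the solutions d_4 and d_4', the stack supervised/unsupervised learning processing system 1040 can output, in addition to the solution candidate estimated as the solution, information such as whether that solution is a positive or negative example and the probability of its being a positive or negative example.
[0222]
The first feature addition unit 521 of the language analysis processing system 580 receives the solution d_4 returned from the stack supervised/unsupervised learning processing system 1040 and adds "analysis result of supervised/unsupervised learning = d_4" as a feature to the original set of features. Likewise, the second feature addition unit 522 of the language analysis processing system 580 receives the solution d_4' returned from the stack supervised/unsupervised learning processing system 1040 and adds "analysis result of supervised/unsupervised learning = d_4'" as a feature to the original set of features.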
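[0223]
FIG. 21 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 600 has the same processing means as the language analysis processing system 500 and additionally includes the stack supervised/unsupervised learning processing system 1020.
[0224]
The first feature addition unit 611 of the language analysis processing system 600 takes only the set of features from each pair of a solution and a set of features received from the solution-feature pair extraction unit 501, passes it to both the stack unsupervised learning processing system 1010 and the stack supervised/unsupervised learning processing system 1020, and receives the solution d_1 returned from the former and the solution d_2 returned from the latter. It then adds "analysis result of unsupervised learning = d_1" and "analysis result of supervised/unsupervised learning = d_2" as features to the original set of features.
[0225]
Likewise, the second feature addition unit 612 of the language analysis processing system 600 passes the set of features received from the feature extraction unit 504 to both the stack unsupervised learning processing system 1010 and the stack supervised/unsupervised learning processing system 1020, receives the solution d_1' returned from the former and the solution d_2' returned from the latter, and adds "analysis result of unsupervised learning = d_1'" and "analysis result of supervised/unsupervised learning = d_2'" as features to the original set of features.
[0226]
FIG. 22 shows another configuration example of the language analysis processing system in the fourth embodiment. The language analysis processing system 650 has the same processing means as the language analysis processing system 550 and additionally includes the stack supervised/unsupervised learning processing system 1040.
[0227]
The first feature addition unit 621 of the language analysis processing system 650 passes the pairs of a solution or solution candidate and a set of features received from the feature-solution pair / feature-solution candidate pair extraction unit 561 to both the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040, and receives the solution d_3 returned from the former and the solution d_4 returned from the latter. It then adds "analysis result of unsupervised learning = d_3" and "analysis result of supervised/unsupervised learning = d_4" as features to the original set of features.
[0228]
Likewise, the second feature addition unit 622 of the language analysis processing system 650 passes the pairs of a solution candidate and a set of features received from the feature-solution candidate extraction unit 564 to both the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040, receives the solution d_3' returned from the former and the solution d_4' returned from the latter, and adds "analysis result of unsupervised learning = d_3'" and "analysis result of supervised/unsupervised learning = d_4'" as features to the original set of features.
[0229]
As the solutions d_3, d_3', d_4, and d_4', the stack unsupervised learning processing system 1030 and the stack supervised/unsupervised learning processing system 1040 can output, in addition to the solution candidate estimated as the solution, information such as whether that solution is a positive or negative example and the probability of its being a positive or negative example. In that case, some or all of the information contained in the received solution is added to the set of features. For example, one or more features such as "analysis result of unsupervised learning = estimated solution candidate", "analysis result of unsupervised learning = positive/negative example", or "analysis result of unsupervised learning = probability of positive example / probability of negative example" are added to the original set of features.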
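Adding the outputs of two stack systems, including the optional positive/negative label and probability, might look like the following sketch. The dictionary layout and the key names (`candidate`, `label`, `probability`) are assumptions made for illustration.

```python
def add_stack_outputs(features, outputs):
    """Return a copy of the feature set with every field reported by every
    stack system added as a separate, namespaced feature."""
    augmented = dict(features)
    for system_name, result in outputs.items():
        for field, value in result.items():
            augmented[f"{system_name}:{field}"] = value
    return augmented


features = {"n": "みかん", "v": "食べる"}
outputs = {
    # e.g. solution d_3' with its label and probability (made-up values)
    "unsupervised": {"candidate": "を", "label": "positive", "probability": 0.7},
    # e.g. solution d_4' reporting only the estimated candidate
    "supervised_unsupervised": {"candidate": "を"},
}
augmented = add_stack_outputs(features, outputs)
```

Keeping one feature per reported field lets the main learner weigh the candidate, the label, and the probability independently.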
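[0230]
As already explained, unsupervised data differs in nature from supervised data, so simply adding unsupervised data to supervised data for machine learning is sometimes insufficient to improve processing accuracy. By fusing machine learning on unsupervised data with machine learning on supervised data through a stacking technique, as in this embodiment, the advantages of both kinds of learning can be used appropriately, and this is considered to be what improved the accuracy of the analysis processing.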
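[0231]
Finally, working examples of the conventional method and of the method of the present invention are presented. Case conversion in the sentence conversion from passive and causative sentences to active sentences was chosen as the task, and the support vector machine method was used as the machine learning method. The Kyoto University corpus was used as supervised data, and all case particles (53,157 instances) in the active sentences contained in the Kyoto University corpus were used as unsupervised data. FIG. 23 shows the distribution of post-conversion case particles in the unsupervised data.
[0232]
The Kyoto University corpus was also used to evaluate processing accuracy in the working examples, and the evaluation was carried out by 10-fold cross-validation.
[Reference 6: Sadao Kurohashi, Makoto Nagao, Kyoto University Text Corpus Project, 3rd Annual Meeting of the Association for Natural Language Processing, 1997, pp. 115-118]
Case particle conversion experiments were carried out using the following methods.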
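[0233]
・Use of supervised learning
・Use of unsupervised learning
・Use of supervised/unsupervised learning
・Stacking method 1:
The analysis result of unsupervised learning is added as a feature, and supervised learning is then performed.
[0234]
・Stacking method 2:
The analysis result of supervised/unsupervised learning is added as a feature, and supervised learning is then performed.
[0235]
・Stacking method 3:
The analysis results of both unsupervised learning and supervised/unsupervised learning are added as features, and supervised learning is then performed.
[0236]
The evaluation results for processing accuracy are shown below. Processing accuracy is the proportion of the 4,671 cases in the supervised data that were answered correctly.
[0237]
・Use of supervised learning = 89.06%
・Use of unsupervised learning = 51.15%
・Use of supervised/unsupervised learning = 87.09%
・Stacking method 1 = 89.47%
・Stacking method 2 = 89.55%
・Stacking method 3 = 89.55%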
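The accuracy of processing using supervised learning was 89.06%. This means that case particle conversion in the sentence conversion from passive and causative sentences to active sentences can be achieved with at least this accuracy by using a machine learning method. Since no case particle conversion processing using a machine learning method has existed before, the accuracy shown by this working example demonstrates a notable effect of the present invention.
[0238]
The accuracy of processing using unsupervised learning was extremely low, at 51.15%. The lack of information on the pre-conversion case particle, which is the target of the analysis, is considered to have had a large influence.
[0239]
The accuracy of processing using supervised/unsupervised learning was also lower than that of processing using supervised learning. Because unsupervised data differs in nature from supervised data, the use of unsupervised data is considered to have lowered the accuracy.
[0240]
Every stacking method achieved higher accuracy than supervised learning. The improvement, however, is not large. A statistical test using the binomial test was therefore carried out, and every stacking method differed significantly from supervised learning at a significance level of 0.01. This confirmed that the method of the present invention, which adds the result of unsupervised learning as a feature, is effective.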
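The text does not say how the binomial test was set up. One common choice for comparing two classifiers on the same cases is a sign test over the cases on which they disagree, sketched below. The counts used in the example are hypothetical and are not taken from the experiment.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided binomial (sign) test under H0: p = 0.5, ties excluded.

    wins   = cases only the new method got right
    losses = cases only the baseline got right
    """
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts, for illustration only.
p = sign_test_p_value(wins=25, losses=5)
significant_at_001 = p < 0.01
```

Because only disagreeing cases enter the test, a small gain in overall accuracy can still be significant when nearly all disagreements favour one method.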
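[0241]
Further, for comparison with the accuracy of the "processing using supervised learning" of the present invention, processing by the method described in Non-Patent Document 4 was carried out as one example of the conventional technique.
[0242]
The accuracy of case conversion by the method described in Non-Patent Document 4 was 36% in F-measure (recall 75%, precision 24%). The accuracy of this conventional technique is low because the given sentences contain words that are not in the dictionary. After these undefined words had been registered in the dictionary, the accuracy was 83% in F-measure (recall 94%, precision 74%). Accuracy is given as an F-measure here because case conversion by the method of Non-Patent Document 4 outputs multiple conversion results for a single input. As already pointed out, this shows that the inadequacy of existing case frame dictionaries has a large influence.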
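The reported F-measures are consistent with the usual definition of the F-measure as the harmonic mean of recall and precision, as the short check below shows. The definition is an assumption here, since the text does not state the formula.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (assumed definition)."""
    return 2 * recall * precision / (recall + precision)


before_registration = f_measure(0.75, 0.24)  # words missing from the dictionary
after_registration = f_measure(0.94, 0.74)   # after registering undefined words
```

Rounded to whole percentages, the two values come to 36% and 83%, matching the figures given for the method of Non-Patent Document 4.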
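[0243]
Because the results of the method of Non-Patent Document 4 are given per sentence, the results of the present invention were also tallied per sentence. The sentence-level accuracy of processing according to the present invention was 85.58%. A sentence here means a unit containing one predicate; a sentence made up of several such units, such as a complex sentence, was split into one-predicate units before the accuracy was calculated.
[0244]
The accuracy of processing according to the present invention is comparable to that of the method of Non-Patent Document 4 after unknown words and the like have been registered in the dictionary. The present invention obtains an accuracy of about 85% without any additional dictionary registration of the information to be analyzed. This shows that processing according to the present invention can achieve higher accuracy than the conventional technique.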
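[0245]
The present invention has been described above through its embodiments, but various modifications are of course possible within the scope of its gist.
[0246]
The embodiments mainly dealt with case particle conversion in the conversion from passive and causative sentences to active sentences. However, by changing the classification target in the machine learning unit from the case particles of the active sentence to the case particles of the passive or causative sentence, the present invention can also be applied to conversion from active sentences to passive or causative sentences.
[0247]
In addition to the analysis processing described as language analysis processing in the embodiments, the present invention can also be applied to various analysis processes such as anaphora resolution for demonstratives, pronouns, and zero pronouns, indirect anaphora resolution, semantic analysis of "A の B", and metonymy analysis, as well as to case particle generation in sentence generation and in translation.
[0248]
Each means, function, or element of the present invention can be realized as a processing program that is read and executed by a computer. The processing program that realizes the present invention can be stored on a suitable computer-readable recording medium such as portable media memory, semiconductor memory, or a hard disk. It is provided recorded on such a medium, or provided by transmission and reception over various communication networks via a communication interface.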
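[0249]
[Effects of the Invention]
As described above, the present invention realizes a new method that adds the analysis result of machine learning on unsupervised data as a feature and performs machine learning on supervised data having that added feature. This enables machine learning that uses the advantages of both unsupervised and supervised data and makes more accurate sentence conversion processing possible.
[0250]
In particular, the present invention can be applied to a very wide range of problems that involve word or phrase generation, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition. A highly practical language analysis processing system can thereby be realized.
[0251]
The present invention also realizes a new method that uses machine learning for case particle conversion in the conversion of Japanese passive and causative sentences into active sentences. It makes it possible to estimate the post-conversion case particle with higher accuracy than before.
[0252]
The conversion from passive and causative sentences to active sentences to which the present invention is applied is useful in many fields of computer-based natural language processing, such as sentence generation, paraphrasing, knowledge acquisition systems, and question answering systems.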
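[Brief Description of the Drawings]
[FIG. 1] A diagram showing a configuration example of the sentence conversion processing system in the first embodiment.
[FIG. 2] A diagram showing the processing flow of the sentence conversion processing system in the first embodiment.
[FIG. 3] A diagram showing examples of cases stored in a tagged corpus.
[FIG. 4] A diagram showing the concept of margin maximization in the support vector machine method.
[FIG. 5] A diagram showing another configuration example of the sentence conversion processing system in the first embodiment.
[FIG. 6] A diagram showing the processing flow of the sentence conversion processing system with the other configuration example in the first embodiment.
[FIG. 7] A diagram for explaining unsupervised data.
[FIG. 8] A diagram showing a configuration example of the sentence conversion processing system in the second embodiment.
[FIG. 9] A diagram showing the processing flow of the unsupervised data generation processing.
[FIG. 10] A diagram showing another configuration example of the sentence conversion processing system in the second embodiment.
[FIG. 11] A diagram showing a configuration example of the sentence conversion processing system in the third embodiment.
[FIG. 12] A diagram showing another configuration example of the sentence conversion processing system in the third embodiment.
[FIG. 13] A diagram showing a configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 14] A diagram showing the processing flow of the language analysis processing system in the fourth embodiment.
[FIG. 15] A diagram showing the processing flow of the language analysis processing system in the fourth embodiment.
[FIG. 16] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 17] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 18] A diagram showing the processing flow of the language analysis processing system with the other configuration example in the fourth embodiment.
[FIG. 19] A diagram showing the processing flow of the language analysis processing system with the other configuration example in the fourth embodiment.
[FIG. 20] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 21] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 22] A diagram showing another configuration example of the language analysis processing system in the fourth embodiment.
[FIG. 23] A diagram showing the distribution of post-conversion case particles in the unsupervised data in the working example.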
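[Explanation of Reference Signs]
100, 150, 200, 250, 300, 350 Sentence conversion processing system
101, 501 Solution-feature pair extraction unit
102, 162, 502, 562 Machine learning unit
103, 163, 503, 563 Learning result database
110, 504 Feature extraction unit
111, 171, 505, 565 Solution estimation processing unit
161, 561 Feature-solution pair / feature-solution candidate pair extraction unit
170, 564 Feature-solution candidate extraction unit
201 Problem expression equivalent part extraction unit
202 Problem expression information storage unit
203 Semantic analysis information storage unit
204 Problem structure conversion unit
205 Unsupervised data storage unit
500, 540, 550, 580, 600, 650 Language analysis processing system
511, 521, 611, 621 First feature addition unit
512, 522, 612, 622 Second feature addition unit
1010, 1030 Stack unsupervised learning processing system
1020, 1040 Stack supervised/unsupervised learning processing system
2, 6 Solution database
3 Input sentence
4 Solution
5 Sentence database
[0001]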
TECHNICAL FIELD OF THE INVENTION
The present invention relates to natural language processing technology implemented on a computer. More specifically, it relates to a language analysis processing method that applies a machine learning method to digitized sentences, and to a processing system that realizes this method.
[0002]
In particular, the present invention can be applied to language processing that handles a very wide range of problems involving the generation of words and phrases, such as ellipsis completion, sentence generation, machine translation, character recognition, and speech recognition.
[0003]
[Prior art]
In the field of language analysis processing, semantic analysis processing, which is the next stage of morphological analysis and syntax analysis, is becoming increasingly important. In particular, in case analysis processing, omission analysis processing, and the like, which are the main parts of semantic analysis, it is desired to reduce the burden on processing and improve processing accuracy.
[0004]
Case analysis is the process of restoring a surface case that is hidden because part of a sentence has been topicalized or made adnominal. For example, in the sentence "りんごは食べた。", the part "りんごは" is topicalized; restored to its surface case, it is "りんごを". Case analysis thus analyzes the "は" of "りんごは" in "りんごは食べた。" as the ヲ case. Similarly, in the sentence "昨日買った本はもう読んだ。", the part "買った本" is adnominal; restored to its surface case, it is "本を買った". In this case, the adnominal part of "買った本" is analyzed as the ヲ case.
[0005]
Ellipsis analysis is the process of restoring a surface case that has been omitted from part of a sentence. For example, in the sentence "I bought oranges and ate them.", the noun phrase (zero pronoun) omitted from the "and ate" part is analyzed as "みかんを" (the oranges, in the ヲ case).
[0006]
To realize such language analysis processing on a computer with high accuracy while reducing the burden on the people who carry it out, a method of performing language analysis using a machine learning method has been proposed (Non-Patent Document 1).
[0007]
The language analysis technique using a machine learning method presented in Non-Patent Document 1 (the non-borrowing machine learning method) has the following advantages.
(i) Preparing a corpus with more teacher data can be expected to give higher processing accuracy.
(ii) When a better machine learning method is developed, using it can be expected to give higher accuracy.
[0008]
Non-Patent Document 1 also presents a language analysis processing method using a borrowing machine learning method. The borrowing machine learning method uses teacher signals generated from data to which the information to be analyzed by the machine learning method has not been added (hereinafter "unsupervised data"). With this method, a large volume of ordinary digitized text can be used as unsupervised data for machine learning, without relying on data to which the information to be analyzed (solution information) has been added manually in advance, such as a case frame dictionary. Learning from a large number of teacher signals improves learning accuracy, so highly accurate language analysis processing can be realized.
[0009]
Non-Patent Document 1 further presents a language analysis processing method using a combined machine learning method. The combined machine learning method performs machine learning using both the teacher signals of an ordinary machine learning method, that is, data to which the information to be analyzed has been added (hereinafter "supervised data"), and teacher signals generated from unsupervised data. With this method, language analysis processing can take advantage of both the large number of teacher signals generated from easily obtained unsupervised data and the teacher signals of supervised data, which ensure ordinary learning accuracy.
[0010]
Further, as an important problem in the field of natural language processing, there is a conversion process from passive sentences or causative sentences to active sentences. This sentence conversion processing is useful in many research fields such as sentence generation processing, paraphrase processing, sentence simplification / language operation support, knowledge acquisition / information extraction processing using natural language sentences, and question answering systems. For example, in a question answering system, if there is a document in which the question sentence is written in the active sentence and the sentence containing the answer is written in the passive sentence, the sentence structure differs between the question sentence and the sentence containing the answer. It can be difficult to get answers to questions. Such a problem can also be solved by performing a conversion process from a passive sentence or a causative sentence to an active sentence.
[0011]
When converting a Japanese passive or causative sentence into an active sentence, the post-conversion case particles to be used after the conversion must be estimated. For example, when the passive sentence "犬に私が噛まれた" ("I was bitten by a dog") is converted to the active sentence "犬が私を噛んだ" ("The dog bit me"), the task is to estimate that the case particle "に" of "犬に" becomes "が" and that the "が" of "私が" becomes "を". Likewise, when a causative sentence meaning "He had her cut her hair." is converted to the active sentence "She cut her hair.", the task is to estimate that the case particle "に" of "彼女に" becomes "が" and that the "を" of "髪を" is not converted. However, case particle conversion in the conversion from passive or causative sentences to active sentences is not a problem that can easily be handled automatically, because the case particle to be converted changes with the verb and with how the verb is used.
[0012]
Regarding the case particle conversion processing, for example, there are some conventional methods as shown in Non-Patent Documents 2 to 4 below. In the techniques disclosed in Non-Patent Documents 2 to 4, the problem of case particle conversion processing is dealt with using a case frame dictionary that describes how to convert case particles.
[0013]
[Non-patent document 1]
Maki Murata,
Japanese case analysis using machine learning method-Teacher signal borrowing type, non-borrowing type, and combined type-
IEICE, IEICE Technical Report NLC-2001-24
July 17, 2001
[Non-patent document 2]
Information processing promotion business association technology center,
Japanese Basic Verb Dictionary for Computers IPAL (Basic Verbs) Manual,
1987
[Non-Patent Document 3]
Sadao Kurohashi and Makoto Nagao,
A Method of Case Structure Analysis for Japanese Sentences based on Examples in Case Frame Dictionary,
IEICE Transactions of Information and Systems, Vol. E77-D, no. 2, 1994
[Non-patent document 4]
Keiko Kondo, Masafumi Sato, Manabu Okumura,
Paraphrasing simple sentences by case conversion,
IPSJ Transactions, Vol. 42, no. 3,
2001
[0014]
[Problems to be solved by the invention]
Non-Patent Document 1 has an effect of improving processing accuracy by applying a machine learning method to language analysis processing. Moreover, the borrowed machine learning method and the combined machine learning method are very effective in that the number of teacher signals for machine learning can be increased without increasing the labor burden by humans.
[0015]
In the machine learning process, learning is performed so as to maximize the correct answer rate in given teacher data. Unsupervised data differs from supervised data in that it has no information to be analyzed.
[0016]
Consequently, a machine learning process that uses teacher signals in which unsupervised data is simply added to supervised data, such as the combined machine learning method of Non-Patent Document 1, learns so as to maximize the correct answer rate over the combined supervised and unsupervised data. Depending on the relationship between the unsupervised data and the supervised data, the resulting accuracy can therefore be lower than that of machine learning that maximizes the correct answer rate on the supervised data alone.
[0017]
In view of such a problem of the related art, it is required to realize a method capable of more reliably performing a highly accurate learning process by taking advantage of supervised data and unsupervised data.
[0018]
Further, for the sentence conversion from passive and causative sentences to active sentences, conventional techniques such as those of Non-Patent Documents 2 to 4 required a case frame dictionary describing how case particles are converted for every verb and every usage of each verb.
[0019]
However, it is practically difficult to prepare a dictionary that covers all verbs and all of their usages, so a conversion method based on such a case frame dictionary is insufficient: a sentence containing a verb, or a usage of a verb, that is not described in the dictionary either cannot be converted or is likely to be converted incorrectly.
[0020]
There is therefore a need for a method that can perform highly accurate processing, in particular for the conversion from passive and causative sentences to active sentences, without increasing the labor burden.
[0021]
An object of the present invention is to provide a processing method, and a processing system that realizes it on a computer, capable of more accurate language analysis by using the advantages of both kinds of data when language analysis is performed with a combined learning method, in which machine learning uses both supervised data and unsupervised data.
[0022]
A further object of the present invention is to provide a sentence conversion processing system that uses a machine learning method to estimate post-conversion case particles with high accuracy, in particular for the sentence conversion from passive or causative sentences to active sentences.
[0023]
[Means for Solving the Problems]
In order to achieve the above object, the present invention has the following configuration.
[0024]
The present invention is a language analysis processing method using a machine learning method in a language analysis processing system that comprises a stack processing system and a main processing system, where the main processing system uses the processing results of the stack processing system. When performing analysis, the stack processing system accesses a problem expression information storage unit that stores in advance information on the expressions that constitute a problem for the machine learning method. Based on this problem expression information, it extracts, from case data to which no solution information is attached, a problem expression equivalent part that matches the problem expression information. Taking that part as the solution and the case data with that part converted as the problem, it converts the case data into unsupervised data. It then extracts a pair of a solution and a set of features from each item of unsupervised data, learns by a machine learning method which solution is likely for which features, and stores the learning result in a learning result database. From the main processing system it then acquires either a first feature set extracted from case data or a second feature set extracted from the case data input as the processing target, refers to the learning result database to estimate which solution is likely for the features in that set, and sends a first processing result for the first feature set, or a second processing result for the second feature set, to the main processing system.
[0025]
When performing analysis, the main processing system accesses a solution database that stores case data to which solutions to the problem handled by the machine learning method are attached, retrieves the case data, extracts a pair of a solution and a set of features from each case, and sends a first feature set, consisting of only the set of features from each pair, to the stack processing system. It receives the first processing result from the stack processing system, adds it as a feature to the first feature set, learns by a machine learning method, from the pairs of a solution and the augmented first feature set, which solution is likely for which features, and stores the learning result in a learning result database. It then extracts a set of features from the case data input as the processing target, sends this second feature set to the stack processing system, receives the second processing result output from the stack processing system, adds it as a feature to the second feature set, and refers to the learning result database to estimate which solution is likely for the augmented second feature set.
[0026]
The present invention is also a language analysis processing method using a machine learning method in a language analysis processing system that comprises a stack processing system and a main processing system, where the main processing system uses the processing results of the stack processing system. When performing analysis, the stack processing system accesses a problem expression information storage unit that stores in advance information on the expressions that constitute a problem for the machine learning method. Based on this problem expression information, it extracts, from case data to which no solution information is attached, a problem expression equivalent part that matches the problem expression information. Taking that part as the solution and the case data with that part converted as the problem, it converts the case data into unsupervised data with a problem-solution structure. It then extracts pairs of a solution or solution candidate and a set of features from each item of unsupervised data, learns by a machine learning method the probability that a given pair is a positive example or a negative example, and stores the learning result in a learning result database. From the main processing system it then acquires either a first set of pairs of a solution or solution candidate and a set of features extracted from case data, or a second set of pairs of a solution candidate and a set of features extracted from the case data input as the processing target. Referring to the learning result database, it obtains for each pair in the first or second set the probability of being a positive example or a negative example, estimates as the solution the solution candidate with the highest probability of being a positive example among all solution candidates, and sends a first processing result for the first set, or a second processing result for the second set, to the main processing system.
[0027]
When performing analysis, the main processing system accesses a solution database that stores case data to which solutions to the problem handled by the machine learning method are attached, retrieves the case data, extracts from each case a first set of pairs of a solution or solution candidate and a set of features, and sends the first set to the stack processing system. It receives the first processing result from the stack processing system, adds it as a feature to the sets of features in the first set, learns by a machine learning method, from the augmented first set, the probability that a given pair of a solution or solution candidate and a set of features is a positive example or a negative example, and stores the learning result in a learning result database. It then extracts a second set of pairs of a solution candidate and a set of features from the case data input as the processing target, sends the second set to the stack processing system, receives the second processing result, and adds it as a feature to the sets of features in the second set. Referring to the learning result database, it obtains for each pair in the augmented second set the probability of being a positive example or a negative example, and estimates as the solution being sought the solution candidate with the highest probability of being a positive example among all solution candidates.
[0028]
As described above, the present invention incorporates the analysis result of a machine learning method that uses unsupervised data as a feature of the supervised data, so that the machine learning process learns to maximize the correct answer rate on the supervised data. Machine learning can thus use the advantages of both unsupervised data and supervised data, which differ in nature, and highly accurate analysis processing can be realized.
[0029]
The present invention is also a sentence conversion processing method that uses a computer to convert a digitized passive or causative sentence into an active sentence. The method accesses a solution database that stores case data to which solutions to the problem handled by the machine learning method are attached for the sentence conversion from passive or causative sentences to active sentences, retrieves the case data from the solution database, extracts a pair of a solution and a set of features from each case, learns by a machine learning method, from these pairs, which solution is likely for which features, and stores the learning result in a learning result database. It then extracts a set of features from the digitized sentence input as the analysis target and refers to the learning result database to estimate which solution is likely for the features extracted from the input sentence.
[0030]
The present invention is further a sentence conversion processing method that uses a computer to convert a digitized passive or causative sentence into an active sentence. The method accesses a solution database that stores case data to which solutions to the problem handled by the machine learning method are attached for the conversion from passive or causative sentences to active sentences, retrieves the case data from the solution database, extracts pairs of a solution or solution candidate and a set of features from each case, learns by a machine learning method, from these pairs, the probability that a given pair is a positive example or a negative example, and stores the learning result in a learning result database. It then extracts pairs of a solution candidate and a set of features from the digitized sentence input as the analysis target, refers to the learning result database to obtain for each extracted pair the probability of being a positive example or a negative example, and estimates as the solution the solution candidate with the highest probability of being a positive example among all solution candidates.
[0031]
The case particle conversion process in the sentence conversion process from the passive sentence or the causative sentence to the active sentence is to determine the case particle used in the converted sentence. Since the number of types of case particles after conversion is finite, the problem of estimating case particles after conversion can be reduced to a classification problem and can be treated as a process using a machine learning method.
[0032]
In the present invention, machine learning is performed using, as a teacher signal, data (unsupervised data) generated from sentences to which no information about the analysis target (such as the case particle after conversion) has been added. As a result, a large amount of ordinary electronic text can be used as teacher data, and highly accurate sentence conversion processing can be realized without increasing the labor of manually adding information about the analysis target.
[0033]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, some of the embodiments of the present invention will be described.
[0034]
As a first embodiment, a process that applies a machine learning method using supervised data (supervised machine learning method) to sentence conversion from a passive or causative sentence to an active sentence will be described. As a second embodiment, a process that applies a machine learning method using unsupervised data (borrowed machine learning method) to the same sentence conversion will be described. As a third embodiment, a process that applies a machine learning method using both supervised data and unsupervised data (combination type machine learning method) to the same sentence conversion will be described.
[0035]
Further, as a fourth embodiment, a process that applies to language analysis processing a machine learning method that uses the result of machine learning on unsupervised data as a feature of supervised data (unsupervised data stack type machine learning method) will be described.
[0036]
In the embodiments of the present invention, conversion of case particles in the conversion from a passive or causative sentence to an active sentence refers to the process of converting the case particles of the original passive or causative sentence into the case particles of the active sentence after conversion, and to the process of deleting unnecessary parts of the original passive or causative sentence. An unnecessary part is, for example, the part "he" of the original sentence when the causative sentence "He had her cut her hair." is converted to the active sentence "She cut her hair." In addition, a case particle of the original sentence (passive or causative sentence) is called a pre-conversion case particle, and a case particle newly given when the sentence is converted into an active sentence is called a post-conversion case particle.
[0037]
In the present embodiment, only the case particle conversion processing is targeted, and the conversion processing of the auxiliary verb expression accompanying the conversion to the active sentence is not described as the processing target. The conversion processing for the auxiliary verb expression part can be easily realized using existing processing, for example, processing using rules according to grammar.
[0038]
[First Embodiment]
As a first embodiment, the processing of a sentence conversion processing system that, when converting a passive or causative sentence into an active sentence, automatically converts the case particles to be changed by machine learning using supervised data will be described.
[0039]
FIG. 1 shows a configuration example of a sentence conversion processing system according to the present embodiment. The sentence conversion processing system 100 includes a CPU and a memory, and includes a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a solution database 2.
[0040]
The solution-feature pair extraction unit 101 is a unit that extracts a case that is supervised data from the solution database 2 and extracts a set (pair) of a solution of the case and a set of features for each case.
[0041]
The machine learning unit 102 is a means for learning, by a machine learning method, which solution is likely for which features from the extracted pairs of a solution and a set of features, and for storing the learning result in the learning result database 103.
[0042]
The feature extraction unit 110 is a means for extracting a set of features from the input sentence (passive sentence or causative sentence) 3. Here, a sentence means a sentence, or a part of a sentence, that has at least a nominal and a predicate.
[0043]
The solution estimation processing unit 111 is a means for referring to the learning result database 103, estimating which solution is likely for the features of the input sentence 3, that is, which case particle is likely to be the post-conversion case particle when the sentence is converted into an active sentence, and outputting the estimated case particle as the solution 4.
[0044]
The solution database 2 stores supervised data having a "problem-solution" structure, to which the information to be analyzed in machine learning has been added. In this embodiment, the analysis target is the post-conversion case particle in the conversion from a passive or causative sentence to an active sentence, so a database storing cases (simple sentences) tagged with the case particles that result from the conversion to an active sentence (post-conversion case particles) can be used.
[0045]
FIG. 2 shows a processing flow of the sentence conversion processing system 100.
[0046]
Step S1: The solution-feature pair extraction unit 101 takes cases out of the solution database 2 and extracts a pair of a solution and a set of features for each case. As the solution database 2, for example, a tagged corpus is used in which each case particle of a passive or causative sentence is tagged with the post-conversion case particle used when the sentence becomes an active sentence.
[0047]
FIG. 3 shows examples of cases (simple sentences) stored in the tagged corpus. The five underlined case particles in the simple sentences of FIG. 3 are pre-conversion case particles, and the case particle indicated by an arrow below each underlined part is the information indicating the post-conversion case particle. The case of FIG. 3A means that, when the passive sentence is converted into an active sentence, the pre-conversion case particles "ni" and "ga" are converted to "ga" and "wo", respectively. The case of FIG. 3B means that, when the causative sentence is converted into an active sentence, the pre-conversion case particles "ni" and "wo" are converted to "ga" and "wo", respectively, and that the word "he" is deleted. "Other" is a tag meaning that the part is deleted when the sentence becomes an active sentence.
[0048]
Here, the feature means one unit of fine information used in the analysis processing by the machine learning method. The features to be extracted include, for example, the following.
[0049]
1. The case particle attached to the nominal n (pre-conversion case particle)
2. The part of speech of the verb v
3. The base form of the verb v
4. The auxiliary verb sequence attached to the verb v (e.g., "re", "suru")
5. The word of the nominal n
6. The classification number of the word of the nominal n in the classified vocabulary table
7. The cases taken by nominals, other than the nominal n, that modify the verb v
For example, if the problem in the case is "bitten by a dog.", features such as the following are extracted:
・ The word of the nominal n that has the case to be estimated = dog,
・ The predicate v (base form) that the case to be estimated modifies = bite,
・ The case particle between the nominal n and the predicate v (pre-conversion case particle) = "ni".
[0050]
The solution is the post-conversion case particle attached to each case as tag information. In the above case,
・ Solution (post-conversion case particle) = "ga".
The solution-feature pair extraction unit 101 then treats the extracted set of features as the context, and the solution as the classification destination, in the machine learning processing executed by the machine learning unit 102.
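The extraction in step S1 can be pictured with the following minimal Python sketch. It is an illustration only: the `Case` record, its field names, the feature-string format, and the romanized values are assumptions made for this example and are not specified in the embodiment.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """One tagged case (hypothetical record layout)."""
    noun: str             # word of the nominal n
    verb_base: str        # base form of the verb v
    verb_pos: str         # part of speech of the verb v
    auxiliaries: tuple    # auxiliary verb sequence attached to v
    particle_before: str  # pre-conversion case particle
    particle_after: str   # tagged post-conversion case particle (solution)


def extract_solution_feature_pair(case):
    """Return (solution, set of features) for one case."""
    features = frozenset({
        f"particle_before={case.particle_before}",
        f"verb_pos={case.verb_pos}",
        f"verb_base={case.verb_base}",
        f"aux={'+'.join(case.auxiliaries)}",
        f"noun={case.noun}",
    })
    # The feature set is the context; the solution is the classification destination.
    return case.particle_after, features


# Illustrative case corresponding to "bitten by a dog." tagged with "ga".
example = Case("dog", "bite", "verb", ("re",), "ni", "ga")
solution, features = extract_solution_feature_pair(example)
```

Representing each feature as a string keeps the context a plain set, which is the form most of the learning methods named below can consume directly.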
[0051]
Step S2: The machine learning unit 102 learns, by a machine learning method, which solution is likely for which features from the extracted pairs of a solution and a set of features, and stores the learning result in the learning result database 103.
[0052]
For example, for a set of features such as the following, extracted from the case "bitten by a dog.",
・ The word of the nominal n that has the case to be estimated = dog,
・ The predicate v (base form) that the case to be estimated modifies = bite,
・ The case particle between the nominal n and the predicate v (pre-conversion case particle) = "ni",
the unit learns that
・ Solution (post-conversion case particle) = "ga"
is likely.
[0053]
Likewise, for a set of features such as the following, extracted from the case "bitten by a snake.",
・ The word of the nominal n that has the case to be estimated = snake,
・ The predicate v (base form) that the case to be estimated modifies = bite,
・ The case particle between the nominal n and the predicate v (pre-conversion case particle) = "ni",
the unit learns that
・ Solution (post-conversion case particle) = "ga"
is likely.
[0054]
The machine learning method uses, for example, a decision list method, a maximum entropy method, a support vector machine method, etc., but is not limited to these methods.
[0055]
In the decision list method, each pair of a feature (an element of the context formed by the information used for analysis) and a classification destination is treated as a rule, and the rules are stored in a list in a predetermined priority order. When an input to be analyzed is given, the input is compared with the feature of each rule starting from the highest priority in the list, and the classification destination of the first rule whose feature matches is taken as the classification destination of the input.
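The lookup described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the rules and their order are hand-written here, whereas in practice the priority order would be derived from training data, and the feature strings are hypothetical.

```python
def classify_with_decision_list(rules, features, default=None):
    """Return the label of the first rule whose feature occurs in the input.

    rules: list of (feature, label) pairs in descending priority order.
    features: set of feature strings extracted from the input.
    """
    for feature, label in rules:
        if feature in features:
            return label
    return default  # no rule matched


# Hypothetical rule list, highest priority first.
rules = [
    ("particle_before=ni", "ga"),
    ("particle_before=ga", "wo"),
]

label = classify_with_decision_list(rules, {"noun=dog", "particle_before=ni"})
fallback = classify_with_decision_list(rules, {"noun=dog"}, default="other")
```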
[0056]
In the maximum entropy method, where F is the set of preset features $f_j$ ($1 \le j \le k$), the probability distribution $p(a, b)$ that maximizes the expression representing entropy while satisfying predetermined constraint equations is obtained, and, among the class probabilities obtained from that probability distribution, the class with the largest probability value is taken as the solution (the class to be obtained).
[Reference 1: Maki Murata, Masao Uchiyama, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara, Experiments on disambiguation using various machine learning methods, IEICE Language Understanding and Communication Study Group, NLC2001-2, (2001)]
The support vector machine method classifies data consisting of two classes by dividing the space with a hyperplane. Because the support vector machine method itself handles only two-class data, it is usually combined with a pairwise method in order to handle data with three or more classes. For data with N classes, the pairwise method forms every pair of two different classification destinations (N(N-1)/2 pairs), determines for each pair which destination is better with a binary classifier (here, the support vector machine method), and finally obtains the classification destination by majority vote over the outputs of the N(N-1)/2 binary classifiers.
[Reference 2: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, (2000)]
[Reference 3: Taku Kudoh, TinySVM: Support Vector Machines, (http://cl.aist-nara.ac.jp/taku-ku//software/TinySVM/index.hml)]
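The majority vote of the pairwise method can be sketched in Python as follows. The binary decision function here is a toy stand-in driven by a fixed score table, which is purely an assumption for illustration; in the embodiment each pair would be decided by a trained support vector machine.

```python
from collections import Counter
from itertools import combinations

PARTICLES = ("wo", "ni", "ga", "to", "de")


def pairwise_vote(labels, binary_decide, x):
    """Run one binary decision per label pair and return (winner, votes)."""
    votes = Counter()
    for a, b in combinations(labels, 2):  # N(N-1)/2 pairs
        votes[binary_decide(a, b, x)] += 1
    return votes.most_common(1)[0][0], votes


# Toy stand-in for the trained binary classifiers (hypothetical scores).
scores = {"ga": 0.9, "wo": 0.4, "ni": 0.3, "to": 0.1, "de": 0.2}


def toy_decide(a, b, x):
    return a if scores[a] >= scores[b] else b


winner, votes = pairwise_vote(PARTICLES, toy_decide, x=None)
```

With five classification destinations, ten binary decisions are made, and the destination that wins the most of them is returned.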
To explain the support vector machine method, FIG. 4 shows the concept of margin maximization. In FIG. 4, white circles indicate positive examples, black circles indicate negative examples, the solid line indicates the hyperplane that divides the space, and the broken lines indicate the surfaces representing the boundaries of the margin region. FIG. 4A is a conceptual diagram for the case where the interval between the positive and negative examples is narrow (small margin), and FIG. 4B is a conceptual diagram for the case where the interval is wide (large margin).
[0057]
Assuming that the two classifications of the support vector machine method are composed of positive and negative examples, the larger the interval (margin) between the positive and negative examples in the training data, the lower the possibility of incorrect classification with open data. Considering this, as shown in FIG. 4B, a hyperplane that maximizes the margin is obtained, and classification is performed using the hyperplane.
[0058]
The support vector machine method is basically as described above, but in practice extended versions are used: an extension that allows a small number of training cases to fall inside the margin region, and an extension that makes the linear part of the hyperplane nonlinear (by introducing a kernel function).
[0059]
This extended method is equivalent to classifying using the following discriminant function, and it is possible to discriminate between the two classes depending on whether the output value of the discriminant function is positive or negative.
[0060]
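(Equation 1)
$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \qquad (1)$$
$$b = -\frac{\max_{i,\,y_i=-1} b_i + \min_{i,\,y_i=1} b_i}{2}, \qquad b_i = \sum_{j=1}^{l} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_i)$$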
[0061]
Here, $\mathbf{x}$ is the context (set of features) of the case to be classified, $\mathbf{x}_i$ and $y_i$ ($i = 1, \ldots, l$; $y_i \in \{1, -1\}$) are the contexts and classification destinations of the training data, and the function sgn is
$$\operatorname{sgn}(x) = \begin{cases} 1 & (x \ge 0) \\ -1 & (\text{otherwise}) \end{cases} \qquad (2)$$
Each $\alpha_i$ is the value that maximizes expression (3) under the constraints of expressions (4) and (5).
[0062]
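(Equation 2)
$$L(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \qquad (3)$$
$$0 \le \alpha_i \le C \quad (i = 1, \ldots, l) \qquad (4)$$
$$\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (5)$$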
[0063]
The function K is called a kernel function, and various functions are used. In the present embodiment, the following polynomial is used.
[0064]
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^d \qquad (6)$$
C and d are constants set experimentally. In the specific example described later, C was fixed at 1 throughout all processing, and two values of d, 1 and 2, were tried. Here, an $\mathbf{x}_i$ for which $\alpha_i > 0$ is called a support vector, and the sum in expression (1) is normally computed using only these cases. That is, only the training cases called support vectors are used in the actual analysis.
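The polynomial kernel of expression (6) and the sign-based discrimination over support vectors can be sketched in Python as follows. The support vectors, the coefficients, and the bias value below are hand-picked toy values, not the result of training, and the function names are assumptions for this illustration.

```python
def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d


def discriminant(x, support_vectors, b, d=2):
    """Return +1 or -1 from the kernel sum over support vectors.

    support_vectors: list of (alpha_i, y_i, x_i) with alpha_i > 0.
    """
    total = sum(alpha * y * poly_kernel(xi, x, d)
                for alpha, y, xi in support_vectors)
    return 1 if total + b >= 0 else -1


# Toy support vectors (hypothetical values): one positive, one negative.
svs = [
    (0.5, 1, (1.0, 0.0)),
    (0.5, -1, (-1.0, 0.0)),
]

positive_side = discriminant((2.0, 0.0), svs, b=0.0, d=1)
negative_side = discriminant((-2.0, 0.0), svs, b=0.0, d=1)
```

Only cases with a positive coefficient appear in the list, which mirrors the remark that the sum is computed over the support vectors alone.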
[0065]
Because the support vector machine method handles two-class data, it is combined with the pairwise method in order to handle data with three or more classes. In this example, the sentence conversion processing system 150 performs processing that combines the support vector machine method and the pairwise method. Specifically, this is realized using TinySVM.
[Reference 4: Taku Kudo, Yuji Matsumoto, Chunk identification using Support Vector Machines, Natural Language Processing Research Group, 2000-NL-140, (2000)]
Step S3: Then, the input sentence 3 is input to the feature extraction unit 110 as data for which a solution is to be obtained.
[0066]
Step S4: The feature extraction unit 110 extracts a set of features from the input sentence 3 by substantially the same processing as that of the solution-feature pair extraction unit 101, and passes the extracted set of features to the solution estimation processing unit 111. For example, when the input sentence 3 is “bitten by dog”, the following features are extracted, and a set of extracted features is passed to the solution estimation processing unit 111.
[0067]
・ Nominal n that has the case to be estimated = dog,
・ Predicate v that the case to be estimated modifies = bite,
・ Pre-conversion case particle between the nominal n and the predicate v = "ni".
Step S5: Based on the learning result stored in the learning result database 103, the solution estimation processing unit 111 estimates which solution is likely for the set of features passed to it, and outputs the estimated solution (post-conversion case particle) 4.
[0068]
For example, when learning results such as those described above for the cases "bitten by a dog." ⇒ "ga" and "bitten by a snake." ⇒ "ga" are stored in the learning result database 103, the solution estimation processing unit 111 analyzes the set of features extracted from the received input sentence 3 with reference to the learning results, estimates that "ga" is the most likely post-conversion case particle, and outputs solution 4 = "ga".
[0069]
FIG. 5 shows another configuration example of the sentence conversion processing system according to the first embodiment. In the following drawings, components such as processing units assigned the same numbers have the same functions.
[0070]
The sentence conversion processing system 150 includes a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, a solution estimation processing unit 171, and a solution database 2.
[0071]
The feature-solution pair / feature-solution candidate pair extraction unit 161 is a unit that extracts a case from the solution database 2 and extracts a set of a solution or a solution candidate and a set of features for each case.
[0072]
Here, a solution candidate means a candidate for the solution other than the solution itself. For example, assuming that the post-conversion case particles are the five case particles "wo", "ni", "ga", "to", and "de", if "ga" is the solution, then "wo", "ni", "to", and "de" are the solution candidates. A pair of the solution and a set of features is a positive example, and a pair of a solution candidate and a set of features is a negative example.
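The construction of positive and negative examples described above can be sketched in Python as follows. The tuple layout and the labels +1 and -1 are assumptions made for this illustration.

```python
PARTICLES = ("wo", "ni", "ga", "to", "de")


def make_training_examples(features, solution, candidates=PARTICLES):
    """Pair the feature set with every particle and label each pair.

    The pair with the tagged solution is a positive example (+1);
    pairs with the remaining solution candidates are negative (-1).
    """
    examples = []
    for candidate in candidates:
        label = 1 if candidate == solution else -1
        examples.append((candidate, features, label))
    return examples


features = frozenset({"noun=dog", "verb_base=bite", "particle_before=ni"})
examples = make_training_examples(features, solution="ga")
positives = [cand for cand, _, label in examples if label == 1]
```

One tagged case therefore yields one positive example and four negative examples, which gives the binary learner the two classes it needs.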
[0073]
The machine learning unit 162 is a means for learning, by the support vector machine method or a similar machine learning method, the probability of being a positive example or a negative example for each combination of a solution or solution candidate and a set of features, from the pairs extracted by the feature-solution pair / feature-solution candidate pair extraction unit 161, and for storing the learning result in the learning result database 163.
[0074]
The feature-solution candidate extraction unit 170 is a means for extracting pairs of a solution candidate and a set of features from the input sentence 3 by the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161, and passing them to the solution estimation processing unit 171.
[0075]
The solution estimation processing unit 171 is a means for referring to the learning result database 163, obtaining the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features passed from the feature-solution candidate extraction unit 170, estimating the solution candidate with the highest probability of being a positive example as the solution 4, and outputting the estimated solution 4.
[0076]
FIG. 6 shows a processing flow of the sentence conversion processing system 150.
[0077]
Step S11: The feature-solution pair / feature-solution candidate pair extraction unit 161 extracts a case from the solution database 2 and extracts a set of a solution or a solution candidate and a set of features for each case. The feature set extracted by the feature-solution pair / feature-solution candidate pair extraction unit 161 is the same as the feature set extracted in the process of step S1 (see FIG. 2).
[0078]
Step S12: From the extracted pairs of a solution or solution candidate and a set of features, the machine learning unit 162 learns, by a machine learning method, the probability of being a positive example or a negative example for each combination of a solution or solution candidate and a set of features. The learning result is stored in the learning result database 163.
[0079]
For example, when the case is "bitten by a dog." ⇒ "ga", then for the set of features
・ Nominal n that has the case to be estimated = dog,
・ Predicate v that the case to be estimated modifies = bite,
・ Pre-conversion case particle between the nominal n and the predicate v = "ni",
the probability that the solution "ga" is a positive example and the probability that each of the other solution candidates is a negative example are learned.
[0080]
Step S13: Then, the input sentence 3 for which a solution is to be obtained is input to the feature-solution candidate extraction unit 170.
[0081]
Step S14: The feature-solution candidate extraction unit 170 extracts a set of solution candidates and feature sets from the input sentence 3 by the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161 and extracts the extracted solution. The set of the candidate and the feature set is passed to the solution estimation processing unit 171.
[0082]
Step S15: Based on the learning result stored in the learning result database 163, the solution estimation processing unit 171 obtains the probability of being a positive example or a negative example for each passed pair of a solution candidate and a set of features.
[0083]
For example, if the input sentence is "bitten by a dog.", the probability of being a positive example or a negative example is obtained for the extracted set of features paired with each of the solution candidates "ga", "wo", "ni", "to", and "de".
[0084]
Step S16: The probability of being a positive example or a negative example is obtained for all the solution candidates, the solution candidate with the highest probability of being a positive example is estimated as the solution 4, and the estimated solution 4 is output.
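Steps S15 and S16 amount to scoring every solution candidate and taking the best one, as in the following Python sketch. The scoring function is a toy lookup table with made-up values, standing in for the learned model; its name and signature are assumptions.

```python
def estimate_solution(features, candidates, positive_probability):
    """Score every candidate and return (best candidate, all scores)."""
    scored = {c: positive_probability(c, features) for c in candidates}
    best = max(scored, key=scored.get)
    return best, scored


# Toy stand-in for the learned model (hypothetical probabilities).
toy_table = {"ga": 0.82, "wo": 0.07, "ni": 0.05, "to": 0.03, "de": 0.03}


def toy_positive_probability(candidate, features):
    return toy_table[candidate]


features = frozenset({"noun=dog", "verb_base=bite", "particle_before=ni"})
best, scored = estimate_solution(
    features, ("ga", "wo", "ni", "to", "de"), toy_positive_probability)
```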
[0085]
[Second embodiment]
As a second embodiment, a description will be given of a process of a sentence conversion processing system that automatically converts case particles by unsupervised learning in a process of converting a passive sentence / causative sentence into an active sentence.
[0086]
First, the unsupervised data used in the machine learning method will be described. FIG. 7A shows a digitized sentence given in order to create unsupervised data. The active sentence "A dog bites me." in FIG. 7A is data to which the information to be analyzed, that is, information on the conversion of case particles in conversion to an active sentence, has not been added. However, if the sentence of FIG. 7A is regarded as the result of conversion into an active sentence, then although the case particles (pre-conversion case particles) that would appear in the original passive or causative sentence are unknown, the case particles (post-conversion case particles) that should appear in the processing result (the active sentence), that is, the solutions to be estimated, can be extracted.
[0087]
FIG. 7B shows a simple sentence indicating the relationship between the pre-conversion and post-conversion case particles. The source sentence from which the active sentence of FIG. 7A would be converted can be expressed as "dog <?> I <?> bitten (passive)." Because the pre-conversion case particles that should appear in the source sentence are not given, they are indicated by "<?>" (unknown). The post-conversion case particles extracted from the sentence of FIG. 7A, "ga" and "wo", are indicated by arrows below each <?>. As shown in FIG. 7B, an active sentence to which the information to be analyzed has not been added lacks information on the pre-conversion case particle, but it does carry information on the post-conversion case particle, which is the solution (classification destination). The part "dog <?> bitten" of the sentence shown in FIG. 7B can therefore be converted into the following problem structure.
[0088]
"Problem ⇒ solution" = "dog <?> bitten." ⇒ "ga"
Thus, an active sentence to which the information to be analyzed has not been added can be used as teacher data for machine learning.
[0089]
The unsupervised data generated from the active sentence of FIG. 7A carries less information than supervised data in that it has no information on the pre-conversion case particle. However, active sentences are more numerous than passive and causative sentences, and there is no need to tag the post-conversion case particles by hand. Using a large number of active sentences as unsupervised data therefore has the advantage of increasing the number of teacher signals available to the machine learning method.
[0090]
FIG. 8 shows a configuration example of the sentence conversion processing system according to the second embodiment. The sentence conversion processing system 200 includes a CPU and a memory, and includes a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, and a sentence database 5.
[0091]
The problem expression equivalent part extraction unit 201 is a means for taking a sentence out of the sentence database 5, which stores data (sentences) to which the information to be analyzed has not been added, and extracting a problem expression equivalent part from that sentence by referring to the problem expression information storage unit 202, which stores in advance what counts as a problem expression equivalent part in the processing of this system.
[0092]
Here, the problem expression information storage unit 202 stores, as problem expression equivalent parts, the case particles that result from conversion from a passive or causative sentence to an active sentence (post-conversion case particles).
[0093]
The problem structure conversion unit 204 is a means that, when the extracted problem expression equivalent part needs to be converted, converts it by referring to the semantic analysis information storage unit 203, which stores information for semantic analysis, converts the sentence into a "problem-solution" structure whose solution is the case particle extracted from the problem expression equivalent part, and stores the resulting unsupervised data in the unsupervised data storage unit 205 as a case.
[0094]
The solution-feature pair extraction unit 101, the machine learning unit 102, the learning result database 103, the feature extraction unit 110, and the solution estimation processing unit 111 of the sentence conversion processing system 200 are means that perform substantially the same processing as the means with the same numbers described in the first embodiment. The solution-feature pair extraction unit 101 takes cases of unsupervised data out of the unsupervised data storage unit 205 and extracts a pair of a solution and a set of features for each case.
[0095]
FIG. 9 shows a processing flow of the unsupervised data generation processing.
[0096]
Step S21: A sentence (active sentence), which is electronic data of natural text to which the information to be analyzed has not been added, is input from the sentence database 5 to the problem expression equivalent part extraction unit 201.
[0097]
Step S22: The problem expression equivalent part extraction unit 201 refers to the problem expression information storage unit 202, detects the structure of the input active sentence, and extracts the problem expression equivalent part. Information on what constitutes a problem expression equivalent part is given by the problem expression information stored in the problem expression information storage unit 202. For example, "dog <? = case to be estimated (post-conversion case particle)> bite" is stored as problem expression information. The problem expression equivalent part extraction unit 201 matches the sentence structure stored as problem expression information against the structure of the input sentence (active sentence) and takes whatever matches as the problem expression equivalent part. For example, if the input sentence is "A dog bites.", "ga" is extracted as the problem expression equivalent part as a result of the matching.
[0098]
Step S23: The problem structure conversion unit 204 refers to the semantic analysis information storage unit 203, takes the extracted problem expression equivalent part as the solution, converts that part into a problem expression (<?>), and takes the resulting sentence as the problem. For example, "ga", extracted as the problem expression equivalent part from the active sentence "A dog bites.", is taken as the solution, the extracted "ga" part is converted into the problem expression (<?>), and the problem becomes "dog <?> bite."
Step S24: The problem structure conversion unit 204 then stores the data having the problem-solution structure in the unsupervised data storage unit 205 as unsupervised data (a case).
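Steps S21 to S24 can be sketched in Python as follows. The sketch assumes the active sentence has already been split into romanized tokens and that any token from a fixed particle list, when it sits between two other tokens, is a problem expression equivalent part; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
PARTICLES = ("ga", "wo", "ni", "to", "de")


def make_unsupervised_cases(tokens, particles=PARTICLES):
    """Turn one active sentence into (problem, solution) cases.

    Each case particle found between two other tokens is removed and
    replaced by the problem expression "<?>"; the removed particle
    becomes the solution of that case.
    """
    cases = []
    for i, token in enumerate(tokens):
        if 0 < i < len(tokens) - 1 and token in particles:
            problem = tokens[:i] + ["<?>"] + tokens[i + 1:]
            cases.append((problem, token))
    return cases


# "A dog bites." and "A dog bites me." as romanized token lists.
single = make_unsupervised_cases(["dog", "ga", "bite"])
double = make_unsupervised_cases(["dog", "ga", "I", "wo", "bite"])
```

Because the solution is read off the sentence itself, no manual tagging is needed, which is what allows large volumes of ordinary active sentences to serve as teacher data.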
[0099]
After that, the sentence conversion processing system 200 performs the same processing as the processing in the first embodiment (see FIG. 2). That is, the solution-feature pair extraction unit 101 extracts a case from the unsupervised data storage unit 205 and extracts a set of a solution and a feature set for each case (step S1).
[0100]
If the extracted case is "dog <?> bite." ⇒ "ga", for example, the following set of features is extracted.
[0101]
・ Nominal n that has the case to be estimated = dog,
・ Predicate v that the case to be estimated modifies = bite,
・ Original case particle between the nominal n and the predicate v = ? (unknown).
The machine learning unit 102 then learns, from the pairs of a solution and a set of features, which case particle is likely to be the solution for which features. It learns that "solution = ga" is likely for a set of features such as the above, and stores the learning result in the learning result database 103 (step S2).
[0102]
If the extracted case is "snake <?> bite." ⇒ "ga", the following set of features is extracted.
[0103]
・ Nominal n that has the case to be estimated = snake,
・ Predicate v that the case to be estimated modifies = bite,
・ Original case particle between the nominal n and the predicate v = ? (unknown).
The machine learning unit 102 learns that "solution = ga" is likely for this set of features as well, and stores the learning result in the learning result database 103.
[0104]
Thereafter, the processing from the input of the input sentence 3 to the feature extraction unit 110 through the output of the solution 4 by the solution estimation processing unit 111 is the same as steps S3 to S5 of the processing flow of FIG. 2 described in the first embodiment, and its description is therefore omitted.
[0105]
FIG. 10 shows another configuration example of the sentence conversion processing system according to the second embodiment. The sentence conversion processing system 250 includes a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, a solution estimation processing unit 171, and a sentence database 5.
[0106]
In the sentence conversion processing system 250, the problem expression equivalent part extraction unit 201, the problem expression information storage unit 202, the semantic analysis information storage unit 203, and the problem structure conversion unit 204 are means that perform the same processing as the processing units with the same numbers shown in FIG. 8.
[0107]
Also, the feature-solution pair / feature-solution candidate pair extraction unit 161, the machine learning unit 162, the learning result database 163, the feature-solution candidate extraction unit 170, and the solution estimation processing unit 171 of the sentence conversion processing system 250 perform substantially the same processing as the processing units bearing the same reference numbers described earlier.
[0108]
In the sentence conversion processing system 250, the feature-solution pair / feature-solution candidate pair extraction unit 161 extracts, for each case in the unsupervised data storage unit 205, a pair of a solution or solution candidate and a set of features (FIG. 6: step S11).
[0109]
If the extracted case is “dog <?> bites.” ⇒ “ga”, for example, the following set of features is extracted.
[0110]
・ Noun n that has the case to be estimated = dog,
・ Verb v that the case to be estimated modifies = bite,
・ Original case particle between the noun n and the verb v = ? (unknown).
Then, from the pairs of a solution or solution candidate and a set of features, the machine learning unit 162 learns, by a machine learning method, the probability that each pair is a positive example or a negative example. This learning result is stored in the learning result database 163 (FIG. 6: step S12).
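The positive/negative formulation can be illustrated with a small sketch. The candidate list, the smoothing scheme, and every function name below are assumptions for the example; the source only states that a probability of being a positive or negative example is learned for each pair of a solution candidate and a set of features.

```python
from collections import Counter

CANDIDATES = ["ga", "wo", "ni"]  # hypothetical candidate case particles

def expand(features, solution):
    """Pair the feature set with every candidate; only the true solution is positive."""
    return [(features, cand, cand == solution) for cand in CANDIDATES]

def learn(cases):
    """Count positive and negative examples per (feature, candidate) pair."""
    pos, neg = Counter(), Counter()
    for features, solution in cases:
        for feats, cand, is_positive in expand(features, solution):
            for f in feats:
                (pos if is_positive else neg)[(f, cand)] += 1
    return pos, neg

def positive_probability(model, features, cand):
    """Average add-one-smoothed positive rate over the features (an assumed scoring rule)."""
    pos, neg = model
    rates = [(pos[(f, cand)] + 1) / (pos[(f, cand)] + neg[(f, cand)] + 2)
             for f in features]
    return sum(rates) / len(rates)

def estimate(model, features):
    """Pick the candidate with the highest estimated probability of being positive."""
    return max(CANDIDATES, key=lambda c: positive_probability(model, features, c))

cases = [
    (frozenset({"n=dog", "v=bite", "orig=?"}), "ga"),
    (frozenset({"n=snake", "v=bite", "orig=?"}), "ga"),
]
model = learn(cases)  # stand-in for the learning result database 163
query = frozenset({"n=snake", "v=bite", "orig=?"})
best = estimate(model, query)
```

Treating every non-selected candidate as a negative example is one simple way to obtain negative examples from data that only records the correct solution.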
[0111]
Thereafter, the processing from the input of the input sentence 3 to the feature-solution candidate extraction unit 170 through the output of the solution 4 by the solution estimation processing unit 171 is the same as steps S13 to S16 of the processing flow of FIG. 6 in the first embodiment, so its description is omitted.
[0112]
[Third Embodiment]
Since a case (“problem-solution”) stored in the unsupervised data storage unit 205 has almost the same structure as a case (“problem-solution”) stored in the solution database 2, cases of unsupervised data can also be mixed with cases of supervised data. In the present embodiment, a method of performing machine learning using both unsupervised data and supervised data as teacher signals is referred to as “supervised / unsupervised learning”.
[0113]
Unsupervised data has no information on the pre-conversion case particle appearing in the original sentence, and therefore carries less information than supervised data. However, there is no need to manually tag solution information (such as the post-conversion case particle) for each case. Further, since the number of active sentences is generally larger than the number of passive sentences, many sentences can be used as teacher signals. For this reason, sentence conversion processing based on supervised / unsupervised learning has the advantage that it can use the result of machine learning on a large amount of teacher data without increasing the labor of manually adding information to be analyzed.
[0114]
FIG. 11 shows a configuration example of a sentence conversion processing system 300 according to the third embodiment. The sentence conversion processing system 300 includes a CPU and a memory, and includes a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, a solution estimation processing unit 111, a solution database 2, and a sentence database 5. The sentence conversion processing system 300 has the configuration of the sentence conversion processing system 200 illustrated in FIG. 8 and described as the second embodiment, with the solution database 2 further provided, and performs substantially the same processing as the sentence conversion processing system 200.
[0115]
The solution-feature pair extraction unit 101 extracts a pair of a solution and a set of features for each case, both from the cases that are supervised data stored in the solution database 2 and from the cases that are unsupervised data stored in the unsupervised data storage unit 205.
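Because both kinds of case share the “problem-solution” structure, they can be fed through one extraction routine. The sketch below assumes a dictionary layout and an original particle value (“ni”) that are purely hypothetical; the only point taken from the text is that the pre-conversion case particle is known for supervised cases and unknown for unsupervised ones.

```python
def features_of(noun, verb, original_particle=None):
    """Supervised cases know the pre-conversion particle; unsupervised ones record '?'."""
    return {"n": noun, "v": verb, "orig": original_particle or "?"}

# Hypothetical contents of the solution database 2 (supervised data).
supervised = [{"noun": "dog", "verb": "bite", "orig": "ni", "solution": "ga"}]
# Hypothetical contents of the unsupervised data storage unit 205.
unsupervised = [{"noun": "snake", "verb": "bite", "solution": "ga"}]

def extract_pairs(supervised, unsupervised):
    """Return (solution, feature set) pairs for the mixed collection of cases."""
    pairs = []
    for case in supervised + unsupervised:
        feats = features_of(case["noun"], case["verb"], case.get("orig"))
        pairs.append((case["solution"], feats))
    return pairs

pairs = extract_pairs(supervised, unsupervised)
```

The resulting pairs can then be handed to a single learner, which is what allows the two data sources to be used together as teacher signals.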
[0116]
FIG. 12 shows another configuration example of the sentence conversion processing system according to the third embodiment. The sentence conversion processing system 350 includes a CPU and a memory, and includes a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, a solution estimation processing unit 171, a solution database 2, and a sentence database 5.
[0117]
The sentence conversion processing system 350 has the configuration of the sentence conversion processing system 250 shown in FIG. 10 and described as the second embodiment, with the solution database 2 further provided, and performs substantially the same processing.
[0118]
The feature-solution pair / feature-solution candidate pair extraction unit 161 extracts a pair of a solution or solution candidate and a set of features for each case, both from the cases that are supervised data stored in the solution database 2 and from the cases that are unsupervised data stored in the unsupervised data storage unit 205.
[0119]
[Fourth Embodiment]
As a fourth embodiment, the processing of a language analysis processing system is described that, when performing language analysis, uses stacking-type machine learning to exploit the advantages of both unsupervised data and supervised data.
[0120]
Stacking-type machine learning uses a technique called "stacking", which is used to fuse the analysis results of multiple systems: machine learning is performed with teacher signals in which the analysis results of other machine learning methods have been added to the features.
[Reference 5: Hans van Halteren, Jakub Zavrel, and Walter Daelemans, Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems, Computational Linguistics, Vol. 27, No. 2, (2001) pp. 199-229]
In the present embodiment, the language analysis processing system first performs language analysis using borrowed machine learning (machine learning using unsupervised data) or combined machine learning (machine learning using supervised / unsupervised data), and adds the resulting estimated solution as an element of the set of features. It then performs language analysis by supervised learning using the set of features to which the estimated solution has been added.
[0121]
For example, suppose that, in the supervised machine learning used in the language analysis processing system of this embodiment, the set of features extracted from a certain piece of supervised data (case) is the list {a, b, c}. Suppose also that the processing system used for stacking is a language analysis processing system based on unsupervised machine learning and that its analysis result is “d₁”. In this case, the supervised machine learning process of the language analysis processing system adds the analysis result “d₁” and performs machine learning with the list {a, b, c, “analysis result of unsupervised learning = d₁”} as the new set of features.
[0122]
Alternatively, suppose that the processing system used for stacking is a language analysis processing system based on supervised / unsupervised machine learning and that its analysis result is “d₂”. In this case, the supervised machine learning process of the language analysis processing system adds the analysis result “d₂” and performs machine learning with the list {a, b, c, “analysis result of supervised / unsupervised learning = d₂”} as the new set of features.
[0123]
Both a language analysis processing system based on unsupervised machine learning and a language analysis processing system based on supervised / unsupervised machine learning can also be used as processing systems for stacking. In this case, the supervised machine learning process of the language analysis processing system adds the analysis results “d₁” and “d₂” and performs machine learning with the list {a, b, c, “analysis result of unsupervised learning = d₁”, “analysis result of supervised / unsupervised learning = d₂”} as the new set of features.
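The three variants of feature extension can be written as one small helper. The function name `add_stacked_features` and the use of plain strings for features are assumptions for illustration; the feature labels follow the wording used in the text.

```python
def add_stacked_features(features, unsupervised_result=None, combined_result=None):
    """Return a new feature list extended with the outputs of the stacked systems."""
    extended = list(features)  # copy so the original set of features is left unchanged
    if unsupervised_result is not None:
        extended.append(
            f"analysis result of unsupervised learning = {unsupervised_result}")
    if combined_result is not None:
        extended.append(
            f"analysis result of supervised/unsupervised learning = {combined_result}")
    return extended

base = ["a", "b", "c"]
with_d1 = add_stacked_features(base, unsupervised_result="d1")
with_d2 = add_stacked_features(base, combined_result="d2")
with_both = add_stacked_features(base, "d1", "d2")
```

The supervised learner then treats the added strings exactly like any other feature, so no change to the learning method itself is needed.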
[0124]
As described above, when non-borrowed machine learning using supervised data is combined, by the stacking method, with borrowed machine learning or combined machine learning, the features of the supervised data (cases) used for supervised machine learning increase. It is therefore considered that each case used for supervised machine learning contributes to higher learning accuracy. Furthermore, although the features have increased, the supervised machine learning still learns so as to maximize the accuracy rate on the supervised data (cases), that is, so as to maximize the accuracy on the analysis target, and the analysis process is performed using that learning result. As a result, the advantages of both supervised machine learning and unsupervised machine learning are expected to be used effectively to obtain high analysis accuracy.
[0125]
FIG. 13 shows a configuration example of a language analysis processing system according to the fourth embodiment.
[0126]
The language analysis processing system 500 is a system that outputs an analysis result of a language analysis process for a given problem. It includes a CPU and a memory, and includes a solution-feature pair extraction unit 501, a machine learning unit 502, a learning result database 503, a feature extraction unit 504, a solution estimation processing unit 505, an unsupervised stacking learning processing system 1010, a first feature adding unit 511, a second feature adding unit 512, a sentence database 5, and a solution database 6.
[0127]
The solution-feature pair extraction unit 501, the machine learning unit 502, the learning result database 503, the feature extraction unit 504, and the solution estimation processing unit 505 perform the same processing as, respectively, the solution-feature pair extraction unit 101, the machine learning unit 102, the learning result database 103, the feature extraction unit 110, and the solution estimation processing unit 111 of the sentence conversion processing system 100.
[0128]
The unsupervised stacking learning processing system 1010 extracts sets of features from unsupervised data generated from the sentence database 5 for the language analysis process, learns from the extracted sets of features what kind of solution is likely for what kind of feature set, and stores the learning result. For a feature set received from the first feature adding unit 511 or the second feature adding unit 512, it estimates from the stored learning result what kind of solution (analysis result) is likely, and returns the solution d₁ to the first feature adding unit 511 or the solution d₁' to the second feature adding unit 512.
[0129]
The unsupervised stacking learning processing system 1010 has processing means configured in the same manner as the sentence conversion processing system 200 shown in FIG. 8, that is, a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a solution-feature pair extraction unit 101, a machine learning unit 102, a learning result database 103, a feature extraction unit 110, and a solution estimation processing unit 111 (not shown), and outputs the analysis result of the language analysis process for the given problem.
[0130]
The first feature adding unit 511 extracts only the set of features from each pair of a solution and a set of features received from the solution-feature pair extraction unit 501, passes it to the unsupervised stacking learning processing system 1010, receives the solution d₁ returned from the unsupervised stacking learning processing system 1010, and adds “analysis result of unsupervised learning = d₁” as a feature to the original set of features.
[0131]
The second feature adding unit 512 passes the set of features received from the feature extraction unit 504 to the unsupervised stacking learning processing system 1010, receives the solution d₁' returned from the unsupervised stacking learning processing system 1010, and adds “analysis result of unsupervised learning = d₁'” as a feature to the set of features.
[0132]
FIGS. 14 and 15 show a processing flow of the language analysis processing system 500.
[0133]
Step S30: The unsupervised stacking learning processing system 1010 extracts a single sentence stored in the sentence database 5. By referring to the problem expression information, it extracts the problem expression equivalent part from the extracted sentence and takes it as the solution. Cases having a “problem-solution” structure are stored as unsupervised data. Furthermore, a pair of a solution and a set of features is extracted for each case, what kind of solution is likely for what kind of feature set is learned by a machine learning method, and the learning result is stored.
[0134]
Step S31: Thereafter, the solution-feature pair extraction unit 501 extracts cases from the solution database 6, and extracts a set of a solution and a set of features for each case.
[0135]
Step S32: The first feature adding unit 511 extracts only a set of features from the set of the solution and the set of features, and passes the set to the unsupervised stacking learning processing system 1010.
[0136]
Step S33: The unsupervised stacking learning processing system 1010 refers to the learning result stored in advance, estimates what kind of solution is likely for the received feature set, and returns the estimated solution d₁ to the first feature adding unit 511.
[0137]
Step S34: The first feature adding unit 511 adds the returned solution d₁ to the original set of features as a feature. As a result, if the original feature set is {a, b, c}, the feature set passed to the machine learning unit 502 is {a, b, c, “analysis result of unsupervised learning = d₁”}.
[0138]
Step S35: The machine learning unit 502 learns, from the pairs of a solution and a set of features including “analysis result of unsupervised learning = d₁”, what kind of solution is likely for what kind of feature set, and stores the learning result in the learning result database 503.
[0139]
Step S36: The sentence whose solution is to be obtained is input to the feature extracting unit 504.
[0140]
Step S37: The feature extraction unit 504 extracts a feature set from the input sentence 3 and passes it to the second feature addition unit 512.
[0141]
Step S38: The received feature set is passed to the unsupervised stacking learning processing system 1010 by the second feature adding unit 512.
[0142]
Step S39: The unsupervised stacking learning processing system 1010 refers to the learning result stored in advance, estimates what kind of solution is likely for the received set of features, and returns the estimated solution d₁' to the second feature adding unit 512.
[0143]
Step S310: The second feature adding unit 512 adds the returned solution d₁' to the original set of features as a feature. If the original feature set is {a, b, c}, the resulting feature set is {a, b, c, “analysis result of unsupervised learning = d₁'”}. This set of features is passed to the solution estimation processing unit 505.
[0144]
Step S311: The solution estimation processing unit 505 refers to the learning result stored in the learning result database 503, estimates what kind of solution is likely for the given set of features, and outputs the estimated solution 4.
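Steps S30 to S311 can be traced end to end with a toy implementation. Everything concrete here is assumed for the sake of illustration: the count-based `CountLearner` stands in for whatever machine learning method is actually used, the data are invented, and the original particle value “ni” is hypothetical.

```python
from collections import Counter, defaultdict

class CountLearner:
    """Stand-in learner: votes by feature/solution co-occurrence counts."""
    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, pairs):
        for solution, features in pairs:
            for f in features:
                self.table[f][solution] += 1
        return self

    def predict(self, features):
        votes = Counter()
        for f in features:
            votes.update(self.table.get(f, Counter()))
        return votes.most_common(1)[0][0] if votes else None

STACK_KEY = "analysis result of unsupervised learning = "

def add_feature(features, stack_system):
    """Units 511/512: ask the stacked system for its solution and append it as a feature."""
    return features | {STACK_KEY + str(stack_system.predict(features))}

# S30: train the stacked system 1010 on unsupervised cases (hypothetical data).
unsupervised = [("ga", frozenset({"n=dog", "v=bite"})),
                ("ga", frozenset({"n=snake", "v=bite"}))]
stack_system = CountLearner().fit(unsupervised)

# S31-S35: extend each supervised feature set with d1, then train unit 502.
supervised = [("ga", frozenset({"n=dog", "v=bite", "orig=ni"}))]
main_learner = CountLearner().fit(
    (sol, add_feature(feats, stack_system)) for sol, feats in supervised)

# S36-S311: extend the input's features with d1' and estimate the solution 4.
input_features = frozenset({"n=snake", "v=bite", "orig=ni"})
extended_input = add_feature(input_features, stack_system)
solution = main_learner.predict(extended_input)
```

Note that the same `add_feature` routine serves both the training path (first feature adding unit 511) and the estimation path (second feature adding unit 512), which keeps the feature sets seen at training and estimation time comparable.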
[0145]
Hereinafter, the processing of the language analysis processing system 500 is described in more detail using specific examples. As a first specific example, a case is described in which the language analysis processing system 500 estimates the post-conversion case particle in the conversion from a passive or causative sentence to an active sentence.
[0146]
In the unsupervised stacking learning processing system 1010 of the language analysis processing system 500, the case particles to be converted (the case particles to be estimated) in the conversion from a passive or causative sentence to an active sentence are stored in advance as problem expressions. When the sentence extracted from the sentence database 5 is "dog bites", "ga" is extracted as the problem expression equivalent part and taken as the solution (classification destination), the sentence is transformed into the problem (context) "dog <?> bites.", and the following case is stored.
Example (problem ⇒ solution): "dog <?> bites." ⇒ "ga"
Further, the following set of features is extracted from this case.
[0147]
・ Noun n that has the case to be estimated = dog,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v = ? (unknown)
Then, it is learned that, for this set of features, the post-conversion case particle is likely to be “ga”, and the learning result is stored.
[0148]
When the sentence extracted from the sentence database 5 is “snake bites”, the same processing is performed and the following case is stored.
Example (problem ⇒ solution): “snake <?> bites.” ⇒ “ga”
Further, the following set of features is extracted from this case.
[0149]
・ Noun n that has the case to be estimated = snake,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v = ? (unknown)
Then, it is learned that, for this set of features as well, the post-conversion case particle is likely to be “ga”, and the learning result is stored.
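The way unsupervised cases are produced from plain sentences can be sketched as below. The sketch assumes that sentences arrive already segmented into romanized tokens and that the problem expressions are a small fixed set of case particles; both the token format and the name `make_case` are illustrative assumptions rather than part of the described system.

```python
CASE_PARTICLES = {"ga", "wo", "ni"}  # hypothetical stored problem expressions

def make_case(tokens):
    """Mask the first case particle and return (problem, solution, features), or None."""
    for i, tok in enumerate(tokens):
        # A particle needs a noun before it and a verb after it to yield features.
        if tok in CASE_PARTICLES and 0 < i < len(tokens) - 1:
            problem = tokens[:i] + ["<?>"] + tokens[i + 1:]
            features = {"n": tokens[i - 1], "v": tokens[i + 1], "orig": "?"}
            return problem, tok, features
    return None

problem, solution, features = make_case(["dog", "ga", "bite"])
no_case = make_case(["dog", "bite"])
```

No manual tagging is involved: the solution is simply the particle that was present in the sentence, which is why large amounts of such data can be generated.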
[0150]
Then, the solution-feature pair extraction unit 501 extracts the case
Example (problem ⇒ solution): “dog bites” ⇒ “ga”
from the solution database 6, and extracts a pair of the solution “ga” and the following set of features for the case.
[0151]
・ Noun n that has the case to be estimated = dog,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v
Further, the first feature adding unit 511 extracts only the set of features from the extracted pair of the solution and the set of features, and passes it to the unsupervised stacking learning processing system 1010. The unsupervised stacking learning processing system 1010 refers to the learning result stored in advance, estimates what kind of solution is likely for the received set of features, and returns the estimated solution d₁ = “ga” to the first feature adding unit 511.
[0152]
Next, the first feature adding unit 511 adds the returned solution d₁ to the original set of features as a feature, resulting in the following set of features.
[0153]
・ Noun n that has the case to be estimated = dog,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v
・ Analysis result of unsupervised learning = ga (solution d₁)
Then, the machine learning unit 502 learns, from the pair of the solution and the set of features including the solution d₁, what kind of solution is likely for what kind of feature set, and stores the learning result in the learning result database 503.
[0154]
Thereafter, a sentence for which a solution is to be obtained is input to the feature extraction unit 504. The feature extraction unit 504 extracts a set of features from the input sentence 3. For example, when the input sentence 3 is “bitten by a snake.”, the following feature set is extracted and passed to the second feature adding unit 512.
[0155]
・ Noun n that has the case to be estimated = snake,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v
Then, the second feature adding unit 512 passes the received feature set to the unsupervised stacking learning processing system 1010. The unsupervised stacking learning processing system 1010 refers to the learning result stored in advance, estimates what kind of solution is likely for the received set of features, and returns the estimated solution d₁' = “ga” to the second feature adding unit 512.
[0156]
The second feature adding unit 512 adds the returned solution d₁' to the original set of features as a feature. As a result, the following set of features is obtained.
[0157]
・ Noun n that has the case to be estimated = snake,
・ Verb v that the case to be estimated modifies = bite,
・ Original (pre-conversion) case particle between the noun n and the verb v
・ Analysis result of unsupervised learning = ga (solution d₁')
The set of features including the solution d₁' is then passed to the solution estimation processing unit 505. The solution estimation processing unit 505 refers to the learning result stored in the learning result database 503, estimates what kind of solution is likely for the given set of features, and outputs the estimated solution 4.
[0158]
Here, the output is the case particle “ga”, estimated by referring to the learning result of the supervised learning on the basis of the set of features to which the analysis result “ga” returned from the unsupervised stacking learning processing system 1010 has been added.
[0159]
As described above, the machine learning unit 502 performs machine learning using sets of features to which “analysis result of unsupervised learning = d₁” has been added. Because these feature sets carry more feature information than the feature sets extracted from the supervised data alone, machine learning can be performed with higher accuracy than when only the supervised data is used. In addition, compared with machine learning that uses only unsupervised data, which is plentiful but carries little feature information, more accurate machine learning can be performed because more feature information is available.
[0160]
Furthermore, the solution estimation processing unit 505 refers to a high-accuracy learning result learned from cases rich in feature information and examines its similarity to the feature set extracted from the input sentence 3. Therefore, compared with the case where “analysis result of unsupervised learning = d₁” is not included, the similarity between the feature sets is higher and the accuracy of the estimation process is higher.
[0161]
As a second specific example, a case is described in which the language analysis processing system 500 estimates the surface case to be assigned when a sentence is generated from a sentence meaning represented by deep cases or the like.
[0162]
For example, if the meaning of a sentence is represented by a deep case, it can be expressed as follows.
[0163]
Sentence "Eat apple <← obj>"
In this sentence, “apple” is an object of “eat”, and “apple” and “eat” are connected by a deep case object (indicated by <← obj>).
[0164]
Then, in the sentence generation processing, the generated sentence “eat apple” is output from this sentence. In doing so, the case particle “wo” corresponding to <← obj> must be generated. The problem structure (problem ⇒ case) given in this process is shown below.
[0165]
Problem (problem ⇒ case):
"Eat an apple <← obj>" ⇒ "wo"
The unsupervised stacking learning processing system 1010 of the language analysis processing system 500 stores the given deep case as a problem expression. When the sentence extracted from the sentence database 5 is "eat apples.", the unsupervised stacking learning processing system 1010 extracts the case particle "wo" as the problem expression equivalent part and takes it as the solution, takes as the problem the sentence obtained by converting the problem expression equivalent part of the extracted sentence, and stores the following case as unsupervised data.
[0166]
Case (problem ⇒ solution):
"Eat apple <?>" ⇒ "wo"
Further, a pair of a solution and a set of features is extracted from this case. Here, the set of features is as follows.
[0167]
・ Noun n that has the case to be generated = apple,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = ? (unknown)
Then, what kind of solution is likely for what kind of feature set is learned, and the learning result is stored. For example, for the set of features described above, it is learned that “solution = wo” is likely.
[0168]
It is also assumed that the sentence “Eat mandarin orange” is extracted from the sentence database 5. In this case, the following case is used as unsupervised data.
[0169]
Case (problem ⇒ solution):
"Mandarin orange <?> eat" ⇒ "wo"
Further, a pair of a solution and a set of features is extracted from this case. Here, the set of features is as follows.
[0170]
・ Noun n that has the case to be generated = mandarin orange,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = ? (unknown)
In case estimation for sentence generation processing, such data carries less feature information than ordinary supervised data. However, because a large number of sentences can be used as unsupervised data, a large amount of unsupervised data can be prepared.
[0171]
Then, what kind of solution is likely for what kind of feature set is learned, and the learning result is stored. In this case as well, it is learned that "solution = wo" is likely.
[0172]
After that, it is assumed that the following cases are extracted from the solution database 6 by the solution-feature pair extraction unit 501.
[0173]
Example: "Eat an apple <← obj>" ⇒ "wo"
Further, a pair of a solution and a set of features is extracted from the extracted case. The following are extracted as the set of features.
[0174]
・ Noun n that has the case to be generated = apple,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = obj
The first feature adding unit 511 passes the extracted feature set to the unsupervised stacking learning processing system 1010. Based on the stored learning result, the unsupervised stacking learning processing system 1010 estimates the solution likely for the received set of features and returns d₁ = "wo" to the first feature adding unit 511. The first feature adding unit 511 then adds the returned solution d₁ to the set of features to obtain the following set of features.
[0175]
・ Noun n that has the case to be generated = apple,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = obj,
・ Analysis result of unsupervised learning = wo (solution d₁)
Then, the machine learning unit 502 learns what kind of solution is likely for this set of features. Because “analysis result of unsupervised learning = wo (solution d₁)”, obtained from the unsupervised stacking learning processing system 1010, is included in the set of features, it can learn that, when the features
・ Noun n that has the case to be generated = apple,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = obj,
・ Analysis result of unsupervised learning = wo (solution d₁)
are present, "wo" is likely to be the solution. This learning result is stored in the learning result database 503.
[0176]
Thereafter, when the sentence “mandarin orange <← obj> eat” is input to the feature extraction unit 504, the feature extraction unit 504 extracts the following set of features from the input sentence 3 and passes it to the second feature adding unit 512.
[0177]
・ Noun n that has the case to be generated = mandarin orange,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = obj
When the second feature adding unit 512 passes this set of features to the unsupervised stacking learning processing system 1010, the unsupervised stacking learning processing system 1010 refers to the stored learning result, estimates the solution d₁' = “wo” likely for the received set of features, and returns it to the second feature adding unit 512.
[0178]
The second feature adding unit 512 passes the following set of features, in which the solution d₁' has been added to the original set of features, to the solution estimation processing unit 505.
[0179]
・ Noun n that has the case to be generated = mandarin orange,
・ Verb v that the case to be generated modifies = eat,
・ Deep case between the noun n and the verb v = obj,
・ Analysis result of unsupervised learning = wo (solution d₁')
The solution estimation processing unit 505 estimates what kind of solution is likely for this set of features. Here, since the set of features stored as the learning result and the set of features extracted from the input sentence 3 are very similar, "wo" can be correctly estimated as the solution from the learning result. Then, the case particle "wo" to be generated is output as the estimated solution 4.
[0180]
Next, as a third specific example, a case is described in which the language analysis processing system 500 performs processing for complementing omitted verbs. For example, the sentence "What works so well." is regarded as an expression in which the verb part at the end of the sentence has been omitted, and processing is performed to supply the omitted verb part "I do not think".
[0181]
In this case, the omitted "verb part to be complemented" is used as the problem expression, and the "verb part" that complements the omission is used as the solution. In the unsupervised stacking learning processing system 1010 of the language analysis processing system 500, problem expression information is stored in advance in order to extract such problem expressions.
[0182]
If the sentence extracted from the sentence database 5 is "I don't think it works so well", the verb part at the end of the sentence is treated as the problem expression equivalent part, and the verb part "I don't think" is extracted as the solution. The sentence obtained by converting the problem expression equivalent part of the extracted sentence is used as the problem, and the following case is stored as unsupervised data.
[0183]
Case (problem ⇒ solution):
"It's so good <?>" ⇒ "I don't think"
Further, a pair of a solution and a set of features is extracted from this case. Here, the set of features is as follows.
[0184]
・ "Ha",
・ "What is",
・ "Kutto",
・ "Ikuto",
…,
・ "I don't think it works so much"
Then, what kind of solution is likely for what kind of feature set is learned, and the learning result is stored. For example, for the set of features described above, it is learned that “solution = I don't think” is likely.
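The feature list above appears to consist of progressively longer strings taken from the end of the problem sentence, although the translated items do not show this clearly. Under that assumption, such features could be generated as in the following sketch; the function name `trailing_substrings`, the `max_len` parameter, and the placeholder input string are all hypothetical.

```python
def trailing_substrings(text, max_len=None):
    """Return the trailing substrings of text, shortest first."""
    limit = len(text) if max_len is None else min(max_len, len(text))
    return [text[-k:] for k in range(1, limit + 1)]

# Placeholder string; a real system would use the sentence preceding the omitted verb.
features = trailing_substrings("abcde", max_len=3)
all_features = trailing_substrings("ab")
```

Using every trailing substring lets the learner match on whatever sentence-final context is long enough to be distinctive, without committing to a fixed window size.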
[0185]
Then, the solution-feature pair extraction unit 501 takes out the following case from the solution database 6,
Case: "It works so well." ⇒ "I don't think"
and extracts a pair of a solution and a set of features from the extracted case. Here, the set of features consists of the following features.
[0186]
・ "Ha",
・ "What is",
・ "Kutto",
・ "Ikuto",
…,
・ "What works so well"
・ "I don't think it works so well"
The first feature adding unit 511 passes the extracted set of features to the stack unsupervised learning processing system 1010.
[0187]
The stack unsupervised learning processing system 1010 estimates, based on the stored learning result, what kind of solution is likely to be obtained in the case of the received set of features, and returns the estimated solution d₁ = "I don't think" to the first feature adding unit 511.
[0188]
Then, the first feature adding unit 511 adds the returned solution d₁ to the set of features to obtain the following set of features.
[0189]
・ "Ha",
・ "What is",
・ "Kutto",
・ "Ikuto",
…,
・ "What works so well"
・ "I don't think it works so well"
・ "Analysis result of unsupervised learning = I don't think" (solution d₁)
Then, the machine learning unit 502 learns what kind of solution is likely to be obtained in the case of the set of features, and stores the learning result in the learning result database 503.
[0190]
After that, when the sentence "It works so well." is input to the feature extraction unit 504, the feature extraction unit 504 extracts the following set of features from the input sentence 3 and passes it to the second feature adding unit 512.
[0191]
・ "Ha",
・ "What is",
・ "Kutto",
・ "Ikuto",
…,
・ "What works so well"
When the second feature adding unit 512 passes the set of features to the stack unsupervised learning processing system 1010, the stack unsupervised learning processing system 1010 refers to the stored learning result, estimates the solution d₁' = "I don't think" that is likely to be obtained in the case of the received set of features, and returns it to the second feature adding unit 512.
[0192]
The second feature adding unit 512 adds the solution d₁' to the original set of features and passes the following set of features to the solution estimation processing unit 505.
[0193]
・ "Ha",
・ "What is",
・ "Kutto",
・ "Ikuto",
…,
・ "What works so well"
・ "Analysis result of unsupervised learning = I don't think" (solution d₁')
The solution estimation processing unit 505 estimates what kind of solution is likely to be obtained in the case of this set of features, and outputs the omitted verb part “I do not think” as the estimated solution 4.
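The two-stage flow of this example (a first learner trained on unsupervised data, whose estimate is appended as a feature before a second learner is trained on supervised data) can be sketched as below. This is a minimal sketch under assumptions: `CountLearner` is a toy vote-counting stand-in rather than the learner used in the invention, the feature strings and training cases are made up, and `add_stack_feature` is a hypothetical name for the role played by the feature adding units 511 and 512.

```python
from collections import Counter, defaultdict


class CountLearner:
    """Toy learner (a stand-in only): it votes for the solution that
    co-occurred most often with the given features during training."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        for features, solution in examples:
            for f in features:
                self.counts[f][solution] += 1
        return self

    def predict(self, features):
        votes = Counter()
        for f in features:
            votes.update(self.counts.get(f, {}))
        return votes.most_common(1)[0][0] if votes else None


def add_stack_feature(features, learner):
    """Append the first-stage estimate to a copy of the feature set."""
    d = learner.predict(features)
    return features + [f"unsupervised_result={d}"]


# Stage 1: learn from unsupervised data (cases generated from plain sentences).
unsupervised = [(["well", "so well"], "I don't think"),
                (["well", "so well"], "I don't think"),
                (["well", "very well"], "I hope"),
                (["well", "very well"], "I hope")]
stage1 = CountLearner().fit(unsupervised)

# Stage 2: learn from supervised data whose features include the stage-1 estimate.
supervised = [(["well", "so well"], "I don't think"),
              (["well", "very well"], "I hope")]
stage2 = CountLearner().fit(
    [(add_stack_feature(f, stage1), s) for f, s in supervised])

# Analysis of an input sentence follows the same two steps.
estimated = stage2.predict(add_stack_feature(["well", "so well"], stage1))
print(estimated)
```

The same feature-adding step is applied at training time and at analysis time, so the second learner always sees the first-stage estimate in the same form.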
[0194]
FIG. 16 shows another configuration example of the language analysis processing system according to the fourth embodiment. The linguistic analysis processing system 540 includes processing means similar to the linguistic analysis processing system 500, and has a configuration including a supervised / unsupervised learning processing system for stack 1020 instead of the unsupervised learning processing system for stack 1010.
[0195]
The stack supervised / unsupervised learning processing system 1020 has a configuration in which the solution database 2 is added to the same processing means as the stack unsupervised learning processing system 1010. For the language analysis process, the stack supervised / unsupervised learning processing system 1020 extracts sets of features from the unsupervised data generated from the sentence database 5 and from the cases (supervised data) of the solution database 2, learns what kind of solution (analysis result) is likely to be obtained for what kind of set of features, and stores the learning result. It then estimates, from the stored learning result, what kind of solution (analysis result) is likely to be obtained in the case of the set of features received from the first feature adding unit 511 or the second feature adding unit 512, and returns the estimated solution d₂ to the first feature adding unit 511 or the solution d₂' to the second feature adding unit 512.
[0196]
The first feature adding unit 511 of the language analysis processing system 540 receives the solution d₂ returned from the stack supervised / unsupervised learning processing system 1020 and adds "analysis result of supervised / unsupervised learning = d₂" to the original set of features as a feature. The second feature adding unit 512 of the language analysis processing system 540 receives the solution d₂' returned from the stack supervised / unsupervised learning processing system 1020 and adds "analysis result of supervised / unsupervised learning = d₂'" to the original set of features as a feature.
[0197]
FIG. 17 shows another configuration example of the language analysis processing system according to the fourth embodiment.
[0198]
The language analysis processing system 550 is a system that outputs an analysis result of a language analysis process for a given problem. It includes a CPU and a memory, and is provided with a feature-solution pair / feature-solution candidate pair extraction unit 561, a machine learning unit 562, a learning result database 563, a feature-solution candidate extraction unit 564, a solution estimation processing unit 565, a stack unsupervised learning processing system 1030, a first feature adding unit 521, a second feature adding unit 522, a sentence database 5, and a solution database 6.
[0199]
The feature-solution pair / feature-solution candidate pair extraction unit 561, the machine learning unit 562, the learning result database 563, the feature-solution candidate extraction unit 564, and the solution estimation processing unit 565 are means for performing substantially the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 161, the machine learning unit 162, the learning result database 163, the feature-solution candidate extraction unit 170, and the solution estimation processing unit 171 of the sentence conversion processing system 150, respectively.
[0200]
For the language analysis process, the stack unsupervised learning processing system 1030 extracts pairs of a solution or solution candidate and a set of features from the unsupervised data generated from the sentence database 5, learns by machine learning, from the extracted pairs, the probability of being a positive example or a negative example for what kind of solution or solution candidate and set of features, and stores the learning result. Referring to the learning result, it obtains the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features received from the first feature adding unit 521 or the second feature adding unit 522, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d₃ to the first feature adding unit 521 or the solution d₃' to the second feature adding unit 522.
[0201]
As the solution d₃ or the solution d₃', the stack unsupervised learning processing system 1030 can output, in addition to the solution candidate estimated as the solution, information such as whether it is a positive example or a negative example and the probability of being a positive example or a negative example.
[0202]
The stack unsupervised learning processing system 1030 has processing means configured in the same manner as the sentence conversion processing system 250 shown in FIG. 10, namely a problem expression equivalent part extraction unit 201, a problem expression information storage unit 202, a semantic analysis information storage unit 203, a problem structure conversion unit 204, an unsupervised data storage unit 205, a feature-solution pair / feature-solution candidate pair extraction unit 161, a machine learning unit 162, a learning result database 163, a feature-solution candidate extraction unit 170, and a solution estimation processing unit 171 (not shown), and outputs an analysis result of a language analysis process for a given problem.
[0203]
The first feature adding unit 521 is a means that passes the pair of the solution or solution candidate and the set of features received from the feature-solution pair / feature-solution candidate pair extraction unit 561 to the stack unsupervised learning processing system 1030, receives the solution d₃ returned from the stack unsupervised learning processing system 1030, and adds "analysis result of unsupervised learning = solution d₃" as a feature to the original set of features.
[0204]
The second feature adding unit 522 is a means that passes the pair of the solution candidate and the set of features received from the feature-solution candidate extraction unit 564 to the stack unsupervised learning processing system 1030, receives the solution d₃' returned from the stack unsupervised learning processing system 1030, and adds "analysis result of unsupervised learning = solution d₃'" as a feature to the original set of features.
[0205]
FIGS. 18 and 19 show a processing flow of the language analysis processing system 550.
[0206]
Step S40: In the stack unsupervised learning processing system 1030, a sentence stored in the sentence database 5 is taken out, the problem expression equivalent part is extracted from the sentence by referring to the problem expression information and is taken as the solution, the problem expression equivalent part is converted into a problem structure by further referring to the semantic analysis information, and a case having a "problem-solution" structure, with the sentence resulting from the conversion as the problem, is stored as unsupervised data. Furthermore, pairs of a solution or solution candidate and a set of features are extracted for each case, the probability of being a positive example or a negative example for what kind of solution or solution candidate and set of features is learned by a machine learning method, and the learning result is stored.
[0207]
Step S41: After that, the feature-solution pair / feature-solution candidate pair extraction unit 561 takes out cases from the solution database 6 and extracts, for each case, pairs of a solution or solution candidate and a set of features.
[0208]
Step S42: The first feature adding unit 521 passes the pair of a solution or solution candidate and a set of features to the stack unsupervised learning processing system 1030.
[0209]
Step S43: The stack unsupervised learning processing system 1030 refers to the learning result stored in advance, obtains the probability of being a positive example or a negative example for each received pair of a solution or solution candidate and a set of features, takes the solution candidate with the highest probability of being a positive example as the solution d₃, and returns the solution d₃ to the first feature adding unit 521.
[0210]
Step S44: The first feature adding unit 521 adds "analysis result of unsupervised learning = solution d₃", based on the returned solution d₃, to the original set of features as a feature. When the solution d₃ includes, in addition to the estimated solution candidate, information such as whether it is a positive example or a negative example and the probability of being a positive example or a negative example, the information in the received solution d₃ may be added to the set of features. For example, features such as "analysis result of unsupervised learning = estimated solution candidate (solution d₃)", "analysis result of unsupervised learning = positive example / negative example (solution d₃)", or "analysis result of unsupervised learning = probability of positive example / probability of negative example (solution d₃)" are added to the original set of features.
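Step S44 can be illustrated with a small sketch in which the returned solution carries the estimated candidate, the positive/negative judgment, and the probability. The `StackSolution` container, the feature-string format, and the rounding of the probability are assumptions made only for illustration; the specification does not prescribe how the information is encoded.

```python
from dataclasses import dataclass


@dataclass
class StackSolution:
    """Hypothetical container for what the stack learner returns as solution d3."""
    candidate: str        # solution candidate estimated as the solution
    is_positive: bool     # whether it was judged a positive example
    probability: float    # probability of being a positive example


def add_stack_features(features, d3, use=("candidate", "label", "probability")):
    """Append the selected pieces of information in d3 to a copy of the features."""
    added = list(features)
    if "candidate" in use:
        added.append(f"unsupervised_result={d3.candidate}")
    if "label" in use:
        label = "positive" if d3.is_positive else "negative"
        added.append(f"unsupervised_label={label}")
    if "probability" in use:
        # Rounded to one decimal so the value can serve as a discrete feature.
        added.append(f"unsupervised_prob={d3.probability:.1f}")
    return added


d3 = StackSolution(candidate="wo", is_positive=True, probability=0.87)
print(add_stack_features(["passive"], d3))
print(add_stack_features(["passive"], d3, use=("candidate",)))
```

The `use` argument selects which pieces of information are added, so the same function covers the case where only the estimated candidate is used.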
[0211]
The processing from step S41 to step S44 is performed for all pairs of a solution or solution candidate and a set of features.
[0212]
Step S45: The machine learning unit 562 learns by machine learning, from the pairs of a solution or solution candidate and a set of features that includes the solution d₃, the probability of being a positive example or a negative example for what kind of solution or solution candidate and set of features, and stores the learning result in the learning result database 563.
[0213]
Step S46: A sentence for which a solution is to be obtained is input to the feature-solution candidate extraction unit 564.
[0214]
Step S47: The feature-solution candidate extraction unit 564 extracts pairs of a solution candidate and a set of features from the input sentence 3.
[0215]
Step S48: The second feature adding unit 522 passes the received pairs of a solution candidate and a set of features to the stack unsupervised learning processing system 1030.
[0216]
Step S49: The stack unsupervised learning processing system 1030 refers to the learning result stored in advance, obtains the probability of being a positive example or a negative example for each received pair of a solution candidate and a set of features, takes the solution candidate with the highest probability of being a positive example as the solution d₃', and returns the solution d₃' to the second feature adding unit 522.
[0217]
Step S410: The second feature adding unit 522 adds "analysis result of unsupervised learning = solution d₃'", based on the returned solution d₃', to the original set of features as a feature.
[0218]
Step S411: The solution estimation processing unit 565 refers to the learning result stored in the learning result database 563 and obtains the probability of being a positive example or a negative example for the passed pair of a solution candidate and a set of features. This probability is obtained for all solution candidates, and the solution candidate with the highest probability of being a positive example is output as the solution 4.
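The selection in step S411 amounts to scoring every solution candidate and taking the one with the highest positive-example probability. The sketch below assumes a hypothetical scoring callback `positive_probability(candidate, features)` standing in for the lookup of the learning result database 563; the probabilities in `toy_probability` are made up.

```python
def estimate_solution(features, candidates, positive_probability):
    """Return the candidate with the highest positive-example probability.

    `positive_probability(candidate, features)` is a hypothetical stand-in
    for consulting the stored learning result.
    """
    if not candidates:
        raise ValueError("at least one solution candidate is required")
    scored = [(positive_probability(c, features), c) for c in candidates]
    best_probability, best_candidate = max(scored, key=lambda pair: pair[0])
    return best_candidate, best_probability


# Toy scorer with made-up probabilities, only to exercise the function.
def toy_probability(candidate, features):
    table = {"wo": 0.8, "ga": 0.15, "ni": 0.05}
    bonus = 0.1 if f"unsupervised_result={candidate}" in features else 0.0
    return min(1.0, table.get(candidate, 0.0) + bonus)


features = ["passive", "unsupervised_result=wo"]
best, probability = estimate_solution(features, ["ga", "wo", "ni"], toy_probability)
print(best, probability)
```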
[0219]
FIG. 20 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 580 has the same processing means as the language analysis processing system 550, and has a configuration in which a stack supervised / unsupervised learning processing system 1040 is provided instead of the stack unsupervised learning processing system 1030.
[0220]
The stack supervised / unsupervised learning processing system 1040 has a configuration in which the solution database 2 is added to the same processing means as the stack supervised / unsupervised learning processing system 1020. For the language analysis process, the stack supervised / unsupervised learning processing system 1040 extracts pairs of a solution or solution candidate and a set of features from the unsupervised data generated from the sentence database 5, learns by machine learning, from these pairs, the probability of being a positive example or a negative example for what kind of solution or solution candidate and set of features, and stores the learning result. Referring to the learning result, it obtains the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features received from the first feature adding unit 521 or the second feature adding unit 522, estimates the solution candidate with the highest probability of being a positive example as the solution (analysis result), and returns the estimated solution d₄ to the first feature adding unit 521 or the solution d₄' to the second feature adding unit 522.
[0221]
As the solution d₄ or the solution d₄', the stack supervised / unsupervised learning processing system 1040 can output, in addition to the solution candidate estimated as the solution, information such as whether it is a positive example or a negative example and the probability of being a positive example or a negative example.
[0222]
The first feature adding unit 521 of the language analysis processing system 580 receives the solution d₄ returned from the stack supervised / unsupervised learning processing system 1040 and adds "analysis result of supervised / unsupervised learning = d₄" to the original set of features as a feature. The second feature adding unit 522 of the language analysis processing system 580 receives the solution d₄' returned from the stack supervised / unsupervised learning processing system 1040 and adds "analysis result of supervised / unsupervised learning = d₄'" to the original set of features as a feature.
[0223]
FIG. 21 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 600 has the same processing means as the language analysis processing system 500, and further includes the stack supervised / unsupervised learning processing system 1020.
[0224]
The first feature adding unit 611 of the language analysis processing system 600 passes only the set of features, out of the pair of the solution and the set of features received from the solution-feature pair extraction unit 501, to the stack unsupervised learning processing system 1010 and the stack supervised / unsupervised learning processing system 1020, and receives the solution d₁ returned from the stack unsupervised learning processing system 1010 and the solution d₂ returned from the stack supervised / unsupervised learning processing system 1020. It then adds "analysis result of unsupervised learning = d₁" and "analysis result of supervised / unsupervised learning = d₂" as features to the original set of features.
[0225]
Also, the second feature adding unit 612 of the language analysis processing system 600 passes the set of features received from the feature extraction unit 504 to the stack unsupervised learning processing system 1010 and the stack supervised / unsupervised learning processing system 1020, and receives the solution d₁' returned from the stack unsupervised learning processing system 1010 and the solution d₂' returned from the stack supervised / unsupervised learning processing system 1020. It then adds "analysis result of unsupervised learning = d₁'" and "analysis result of supervised / unsupervised learning = d₂'" as features to the original set of features.
[0226]
FIG. 22 shows another configuration example of the language analysis processing system according to the fourth embodiment. The language analysis processing system 650 has the same processing means as the language analysis processing system 550, and further includes the stack supervised / unsupervised learning processing system 1040.
[0227]
The first feature adding unit 621 of the language analysis processing system 650 passes the pair of the solution or solution candidate and the set of features received from the feature-solution pair / feature-solution candidate pair extraction unit 561 to the stack unsupervised learning processing system 1030 and the stack supervised / unsupervised learning processing system 1040, and receives the solution d₃ returned from the stack unsupervised learning processing system 1030 and the solution d₄ returned from the stack supervised / unsupervised learning processing system 1040. It then adds "analysis result of unsupervised learning = d₃" and "analysis result of supervised / unsupervised learning = d₄" as features to the original set of features.
[0228]
Further, the second feature adding unit 622 of the language analysis processing system 650 passes the pair of the solution candidate and the set of features received from the feature-solution candidate extraction unit 564 to the stack unsupervised learning processing system 1030 and the stack supervised / unsupervised learning processing system 1040, and receives the solution d₃' returned from the stack unsupervised learning processing system 1030 and the solution d₄' returned from the stack supervised / unsupervised learning processing system 1040. It then adds "analysis result of unsupervised learning = d₃'" and "analysis result of supervised / unsupervised learning = d₄'" as features to the original set of features.
[0229]
As the solutions d₃, d₃', d₄, and d₄', the stack unsupervised learning processing system 1030 and the stack supervised / unsupervised learning processing system 1040 can output, in addition to the solution candidate estimated as the solution, information such as whether it is a positive example or a negative example and the probability of being a positive example or a negative example. In this case, some or all of the information included in the received solution is added to the set of features. For example, one or more features such as "analysis result of unsupervised learning = estimated solution candidate", "analysis result of unsupervised learning = positive example / negative example", or "analysis result of unsupervised learning = probability of positive example / probability of negative example" are added to the original set of features.
[0230]
As already explained, since unsupervised data has different properties from supervised data, simply adding unsupervised data to supervised data and performing machine learning does not always improve processing accuracy. By combining machine learning with unsupervised data and machine learning with supervised data using the stacking method as in this embodiment, the advantages of both kinds of learning can be used appropriately, and it is considered that the accuracy of the analysis processing can thereby be improved.
[0231]
Finally, experimental examples of the conventional technique and the technique of the present invention will be described. As the example, case conversion processing in the sentence conversion process from a passive sentence / causative sentence to an active sentence was adopted. The support vector machine method was adopted as the machine learning method. The Kyoto University corpus was used as supervised data, and all case particles (53,157) of active sentences included in the Kyoto University corpus were used as unsupervised data. FIG. 23 shows the distribution of case particles after conversion in the unsupervised data.
[0232]
Further, the processing accuracy in the examples was evaluated by 10-fold cross-validation on the Kyoto University corpus.
[Reference 6: Sadao Kurohashi, Makoto Nagao, Kyoto University Text Corpus Project, The 3rd Annual Meeting of the Association for Natural Language Processing, 1997, pp. 115-118]
Case particle conversion experiments were performed using the following methods.
[0233]
・ Use of supervised learning
・ Use of unsupervised learning
・ Use of supervised / unsupervised learning
・ Stacking method 1:
After adding the analysis result of unsupervised learning to the feature, supervised learning is performed.
[0234]
・ Stacking method 2:
After adding the analysis result of the supervised / unsupervised learning to the feature, supervised learning is performed.
[0235]
・ Stacking method 3:
After adding the analysis result of unsupervised learning and the analysis result of supervised / unsupervised learning to a feature, supervised learning is performed.
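The three stacking methods differ only in which analysis results are appended to the features before supervised learning. A minimal sketch of that difference follows; the function name `stacked_features` and the feature-string format are hypothetical.

```python
def stacked_features(base, method, d_unsup=None, d_both=None):
    """Build the feature set for stacking methods 1, 2 and 3.

    d_unsup: analysis result of unsupervised learning
    d_both:  analysis result of supervised / unsupervised learning
    """
    if method not in (1, 2, 3):
        raise ValueError("method must be 1, 2 or 3")
    features = list(base)
    if method in (1, 3):
        features.append(f"unsupervised_result={d_unsup}")
    if method in (2, 3):
        features.append(f"supervised_unsupervised_result={d_both}")
    return features


base = ["passive", "particle=ni"]
for method in (1, 2, 3):
    print(method, stacked_features(base, method, d_unsup="ga", d_both="wo"))
```

In every method the supervised learner is then trained on the extended feature sets in the usual way.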
[0236]
The evaluation results of the processing accuracy are shown below. The processing accuracy is the proportion of the 4,671 cases of supervised data that were answered correctly.
[0237]
・ Use of supervised learning = 89.06%
・ Use of unsupervised learning = 51.15%
・ Use of supervised / unsupervised learning = 87.09%
・ Stacking method 1 = 89.47%
・ Stacking method 2 = 89.55%
・ Stacking method 3 = 89.55%
The accuracy of the processing using the supervised learning method was 89.06%. This means that the case particle conversion process in the sentence conversion from the passive sentence / causative sentence to the active sentence can be realized with at least this accuracy by processing using a machine learning method. Conventionally, there has been no case particle conversion processing using a machine learning method, and thus the accuracy shown by the embodiment of the present invention shows a special effect of the present invention.
[0238]
The accuracy of the processing using the unsupervised learning method was extremely low at 51.15%. It is considered that the effect of the lack of information on the prepositional case particle to be analyzed is large.
[0239]
The accuracy of the processing using the supervised / unsupervised learning method was also lower than the accuracy of the processing using the supervised learning method. Since unsupervised data has a different property from supervised data, it is considered that use of unsupervised data has led to a decrease in accuracy.
[0240]
The accuracy of the processing using each of the stacking methods exceeded that of the processing using the supervised learning method. However, the improvement in accuracy is not large. A statistical test was therefore performed using a binomial test; as a result, all stacking methods showed a significant difference from supervised learning at a significance level of 0.01. It was thus confirmed that the technique of the present invention of adding the result of unsupervised learning to the features is effective.
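One common way to apply a binomial test to two methods evaluated on the same cases is the sign test over the cases on which the methods disagree. The specification does not state how the test was set up, so the pairing over cases and the one-sided alternative below are assumptions, and the counts in the example are made up rather than taken from the experiment.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """One-sided binomial (sign) test.

    wins:   cases the stacked method got right and the baseline got wrong
    losses: cases the baseline got right and the stacked method got wrong
    Returns P(X >= wins) for X ~ Binomial(wins + losses, 0.5).
    """
    n = wins + losses
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Made-up counts, for illustration only (not the figures from the experiment).
p = sign_test_p_value(wins=30, losses=11)
print(p < 0.01)
```

Cases on which both methods agree do not enter the test, which is why a small difference in overall accuracy can still be statistically significant.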
[0241]
Further, for comparison with the accuracy of the “processing using supervised learning” of the present invention, processing according to the method described in Non-Patent Document 4 was performed as one of the conventional techniques.
[0242]
The accuracy of the case conversion processing by the method described in Non-Patent Document 4 was 36% in F value (recall 75%, precision 24%). The processing accuracy of this conventional technique is low because the given sentences contain words that are not in the dictionary. After such undefined words were registered in the dictionary, the accuracy was 83% in F value (recall 94%, precision 74%). The accuracy is expressed here as an F value because the case conversion in the method of Non-Patent Document 4 outputs a plurality of conversion results for one input. It can thus be seen that, as already pointed out, the incompleteness of the existing case frame dictionary has a large influence.
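The F values quoted here are consistent with the balanced F-measure, that is, the harmonic mean of recall and precision. The specification does not define the F value in this passage, so that definition is an assumption; the check below simply recomputes the two figures from the stated recall and precision.

```python
def f_value(recall, precision):
    """Harmonic mean of recall and precision (the balanced F-measure)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


print(round(100 * f_value(0.75, 0.24)))  # 36
print(round(100 * f_value(0.94, 0.74)))  # 83
```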
[0243]
In addition, since the processing result of the method of Non-Patent Document 4 is given per sentence, the processing result of the present invention was also tabulated per sentence. In this tabulation, the accuracy of the processing according to the present invention was 85.58% per sentence. Here, a sentence means a unit containing one predicate; a sentence containing several such units, such as a compound sentence, was divided into single-predicate units before the accuracy was calculated.
[0244]
The accuracy of the processing according to the present invention is almost the same as the processing accuracy after an unknown word or the like is registered in a dictionary by the method described in Non-Patent Document 4. According to the present invention, the accuracy of about 85% is obtained without any additional registration of the information to be analyzed in the dictionary. This indicates that the processing according to the present invention can perform processing with higher accuracy than the conventional technique.
[0245]
Although the present invention has been described with reference to the embodiments, it is obvious that the present invention can be variously modified within the scope of the gist.
[0246]
The embodiments of the present invention mainly deal with the conversion of case particles in the conversion process from passive sentences and causative sentences to active sentences. However, by setting the classification targets of the machine learning unit of the present invention to the case particles used in the passive sentence or causative sentence converted from an active sentence, the present invention can also be applied to the conversion process from active sentences to passive sentences and causative sentences.
[0247]
In addition to the analysis processing described as the language analysis processing in the embodiments, the present invention can be applied to various analysis processes such as anaphora analysis of demonstratives, pronouns, zero pronouns, and the like, indirect anaphora analysis, semantic analysis of "A of B", and metaphor analysis, as well as to case particle generation in sentence generation processing, case particle generation in translation processing, and the like.
[0248]
Further, each unit, function, or element of the present invention can be realized as a processing program read and executed by a computer. The processing program for realizing the present invention can be stored in an appropriate computer-readable recording medium such as a portable medium memory, a semiconductor memory, or a hard disk, and is provided by being recorded on such a recording medium or by transmission and reception over various communication networks via a communication interface.
[0249]
【Effects of the Invention】
As described above, according to the present invention, an analysis result of machine learning using unsupervised data is added to a feature, and a new method of performing machine learning using supervised data having the added feature is realized. As a result, machine learning using the advantages of both unsupervised data and supervised data can be realized, and sentence conversion processing with higher accuracy can be realized.
[0250]
In particular, the present invention can be applied to a very wide range of problems including word / phrase generation processing such as abbreviation completion processing, sentence generation processing, machine translation processing, character recognition processing, and speech recognition processing. Thereby, a highly practical language analysis processing system can be realized.
[0251]
Further, according to the present invention, a new method for converting case particles in a conversion process from a passive / causative sentence in Japanese to an active sentence using machine learning has been realized. According to the present invention, it is possible to estimate a case particle after conversion with higher accuracy than before.
[0252]
The conversion of passive sentences / causative sentences to active sentences to which the present invention is applied is useful in many fields of natural language processing using computers, such as sentence generation processing, sentence paraphrasing processing, knowledge acquisition systems, and question answering systems.
【Brief Description of the Drawings】
FIG. 1 is a diagram illustrating a configuration example of a sentence conversion processing system according to a first embodiment.
FIG. 2 is a diagram illustrating a processing flow of a sentence conversion processing system according to the first embodiment.
FIG. 3 is a diagram showing an example of a case stored in a tagged corpus.
FIG. 4 is a diagram showing a concept of maximizing a margin in a support vector machine method.
FIG. 5 is a diagram illustrating another configuration example of the sentence conversion processing system according to the first embodiment.
FIG. 6 is a diagram showing a processing flow of a sentence conversion processing system having another configuration example in the first embodiment.
FIG. 7 is a diagram for explaining unsupervised data.
FIG. 8 is a diagram illustrating a configuration example of a sentence conversion processing system according to a second embodiment.
FIG. 9 is a diagram showing a processing flow of unsupervised data generation processing.
FIG. 10 is a diagram illustrating another configuration example of the sentence conversion processing system according to the second embodiment.
FIG. 11 is a diagram illustrating a configuration example of a sentence conversion processing system according to a third embodiment.
FIG. 12 is a diagram illustrating another configuration example of the sentence conversion processing system according to the third embodiment.
FIG. 13 is a diagram illustrating a configuration example of a language analysis processing system according to a fourth embodiment.
FIG. 14 is a diagram illustrating a processing flow of a language analysis processing system according to a fourth embodiment.
FIG. 15 is a diagram showing a processing flow of a language analysis processing system in a fourth embodiment.
FIG. 16 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 17 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 18 is a diagram showing a processing flow of a language analysis processing system having another configuration example in the fourth embodiment.
FIG. 19 is a diagram showing a processing flow of a language analysis processing system having another configuration example in the fourth embodiment.
FIG. 20 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 21 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 22 is a diagram illustrating another configuration example of the language analysis processing system according to the fourth embodiment.
FIG. 23 is a diagram illustrating a distribution of case particles after conversion in unsupervised data in the example.
[Explanation of symbols]
100, 150, 200, 250, 300, 350 Sentence conversion processing system
101, 501 Solution-feature pair extraction unit
102, 162, 502, 562 Machine learning unit
103, 163, 503, 563 Learning result database
110, 504 Feature extraction unit
111, 171, 505, 565 Solution estimation processing unit
161, 561 Feature-solution pair / feature-solution candidate pair extraction unit
170, 564 Feature-solution candidate extraction unit
201 Problem expression equivalent part extraction unit
202 Problem expression information storage unit
203 Semantic analysis information storage unit
204 Problem structure conversion unit
205 Unsupervised data storage unit
500, 540, 550, 580, 600, 650 Language analysis processing system
511, 521, 611, 621 First feature addition unit
512, 522, 612, 622 Second feature addition unit
1010, 1030 Unsupervised learning processing system for stack
1020, 1040 Supervised / unsupervised learning processing system for stack
2, 6 Solution database
3 Input sentence
4 Solution
5 Sentence database

Claims

A method for processing language analysis using a machine learning method in a language analysis processing system including a stack processing system and a main processing system that performs processing using a processing result of the stack processing system,
In the stack processing system,
When performing analysis processing, accessing a problem expression information storage unit that stores in advance information on the expressions that constitute the problem in the machine learning method,
Based on the problem expression information, extracting a problem expression equivalent part that matches the problem expression information from case data to which no information about the problem has been assigned,
Converting the case data from which the problem expression equivalent part has been extracted into unsupervised data having the structure of a problem, with the problem expression equivalent part as the solution,
Extracting a set of a solution and a set of features for each unsupervised data,
From the set of the solution and the set of features, learning by a machine learning method what kind of solution is likely for what kind of features,
Storing the result of the learning in a learning result database,
Acquiring a first feature set extracted from the case data from the main processing system or a second feature set extracted from the case data input as a processing target,
With reference to the learning result database, estimating from the first feature set or the second feature set what kind of solution is likely for what kind of features,
Sending a first processing result for the first feature set or a second processing result for the second feature set to the main processing system;
In the main processing system,
When performing analysis processing, access a solution database that stores case data to which a solution to a problem handled by the machine learning method is assigned,
Retrieving the case data from the solution database, extracting a set of a solution and a set of features for each case data,
Sending a first feature set obtained by extracting only a feature set from the set of the solution and the feature set to the stack processing system;
Receiving the first processing result from the stack processing system;
Adding the first processing result as a feature to a first feature set extracted from the case data;
From the set of the solution and the first feature set to which the first processing result has been added, learning by a machine learning method what kind of solution is likely for what kind of features,
Storing the result of the learning in a learning result database,
Extracting a set of features from the case data input as a processing target,
Sending a second set of features of the input case data to the stack processing system;
Receiving a second processing result output from the stack processing system;
Adding the second processing result as a feature to a second feature set of the input case data;
A language analysis processing method characterized by estimating, with reference to the learning result database, what kind of solution is likely to be obtained from the second feature set to which the second processing result has been added.
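The stacking flow recited above can be pictured with a minimal Python sketch. It is not the claimed implementation: the vote-counting learner, the function names (`train`, `predict`, `add_stack_feature`, `analyze`), the romanized case particles (`ga`, `wo`, `ni`), and the toy data are all assumptions made for illustration, and any machine learning method could stand in for the learner.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count, for each feature, how often each solution co-occurs with it."""
    table = defaultdict(Counter)
    for features, solution in pairs:
        for f in features:
            table[f][solution] += 1
    return table

def predict(table, features):
    """Sum the counts of all known features; return the top solution or None."""
    votes = Counter()
    for f in features:
        votes.update(table.get(f, {}))
    return votes.most_common(1)[0][0] if votes else None

def add_stack_feature(stack_table, features):
    """Add the stack system's processing result to the feature set as a feature."""
    return set(features) | {f"stack={predict(stack_table, features)}"}

# Stack processing system: learns from unsupervised data (hypothetical toy data).
unsupervised = [
    ({"verb=eat", "noun=apple"}, "wo"),
    ({"verb=eat", "noun=bread"}, "wo"),
    ({"verb=scold", "noun=teacher"}, "ga"),
]
stack_table = train(unsupervised)

# Main processing system: learns from the solution database, with the
# stack result added to each first feature set before learning.
supervised = [
    ({"verb=eat", "noun=apple", "orig=ga"}, "wo"),
    ({"verb=scold", "noun=teacher", "orig=ni"}, "ga"),
]
main_table = train([(add_stack_feature(stack_table, f), s) for f, s in supervised])

def analyze(features):
    """Estimate a solution for input case data (the second feature set)."""
    return predict(main_table, add_stack_feature(stack_table, features))
```

In this toy setup, `predict(main_table, {"noun=bread"})` finds no evidence because the noun never appears in the solution database, whereas `analyze({"noun=bread"})` can still answer through the feature contributed by the stack processing system.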

A method for processing language analysis using a machine learning method in a language analysis processing system including a stack processing system and a main processing system that performs processing using a processing result of the stack processing system,
In the stack processing system,
When performing analysis processing, accessing a problem expression information storage unit that stores in advance information on the expressions that constitute the problem in the machine learning method,
Based on the problem expression information, extracting a problem expression equivalent part that matches the problem expression information from case data to which no information about the problem has been assigned,
Converting the case data from which the problem expression equivalent part has been extracted into unsupervised data having the structure of a problem, with the problem expression equivalent part as the solution,
Extracting a set of a solution or solution candidate and a set of features for each unsupervised data,
From the set of the solution or solution candidate and the set of features, learning by a machine learning method the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features,
Storing the result of the learning in a learning result database,
Acquiring, from the main processing system, a first set consisting of a solution or solution candidate and a set of features extracted from the case data, or a second set consisting of a solution candidate and a set of features extracted from the case data input as a processing target,
With reference to the learning result database, determining from the first set or the second set the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates,
Sending a first processing result for the first set or a second processing result for the second set to the main processing system;
In the main processing system,
When performing analysis processing, access a solution database that stores case data to which a solution to a problem handled by the machine learning method is assigned,
Extracting the case data from the solution database, extracting a first set of a solution or solution candidate and a set of features for each case data;
Sending the first set to the stack processing system;
Receiving the first processing result from the stack processing system;
Adding the first processing result as a feature to the feature set of the first set;
From the first set to which the first processing result has been added, learning by a machine learning method the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features,
Storing the result of the learning in a learning result database,
Extracting a second set of a solution candidate and a set of features from the case data input as a processing target,
Sending the second set to the stack processing system;
Receiving a second processing result from the stack processing system;
Adding the second processing result as a feature to the second set of features,
A language analysis processing method characterized by determining, with reference to the learning result database and from the second set to which the second processing result has been added, the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates.
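The candidate-based estimation recited above, in which each pair of a solution candidate and a feature set is scored as a positive or negative example, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the smoothed relative-frequency score averaged over features is a stand-in chosen for brevity, not a scoring rule given in this document, and the names `train_binary`, `positive_probability`, `estimate`, and the toy cases are hypothetical.

```python
from collections import Counter

def train_binary(cases, candidates):
    """For every candidate, count each feature as a positive or negative example."""
    pos, neg = Counter(), Counter()
    for features, solution in cases:
        for cand in candidates:
            target = pos if cand == solution else neg
            for f in features:
                target[(cand, f)] += 1
    return pos, neg

def positive_probability(model, cand, features):
    """Assumed score: add-one smoothed share of positive examples, averaged over features."""
    pos, neg = model
    scores = [
        (pos[(cand, f)] + 1) / (pos[(cand, f)] + neg[(cand, f)] + 2)
        for f in features
    ]
    return sum(scores) / len(scores) if scores else 0.5

def estimate(model, candidates, features):
    """Return the candidate with the highest probability of being a positive example."""
    return max(candidates, key=lambda c: positive_probability(model, c, features))

# Hypothetical toy data: (features, case particle after conversion).
candidates = ["ga", "wo", "ni"]
cases = [
    ({"verb=eat", "orig=ga"}, "wo"),
    ({"verb=scold", "orig=ni"}, "ga"),
    ({"verb=eat", "orig=ni"}, "wo"),
]
model = train_binary(cases, candidates)
```

Calling `estimate(model, candidates, {"verb=eat", "orig=ni"})` scores all three candidates and returns the one with the highest positive-example score.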

The language analysis processing method according to claim 1 or 2,
A language analysis processing method, wherein the stack processing system and the main processing system analyze a case particle after conversion in a sentence conversion process from a passive sentence or a causative sentence to an active sentence.

A sentence conversion processing method for converting a passive sentence or a causative sentence digitized using a computer into an active sentence,
Accessing a solution database that stores case data to which solutions to problems handled by machine learning are assigned when performing conversion processing from passive sentences or causative sentences to active sentences,
Retrieving the case data from the solution database, extracting a set of a solution and a set of features for each case data,
From the set of the solution and the set of features, learning by a machine learning method what kind of solution is likely for what kind of features,
Storing the result of the learning in a learning result database,
Extracting a set of features from the digitized sentence input as the analysis target,
A sentence conversion processing method characterized by referring to the learning result database and estimating what solution is likely to be obtained in the case of a feature extracted from the input sentence.
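As an illustration of the steps above, the following Python sketch extracts solution-feature pairs from tagged passive clauses and estimates the case particle after conversion. Everything concrete here is an assumption: the dictionary layout of a tagged case, the three features (`verb`, `noun`, `orig`), the romanized particles, and the decision-list-style learner, which is swapped in only because it is short and does not reflect a particular learner prescribed by the claim.

```python
from collections import Counter, defaultdict

def extract_features(verb, noun, particle):
    """Feature set for one case particle in a passive or causative clause."""
    return frozenset({f"verb={verb}", f"noun={noun}", f"orig={particle}"})

def extract_pairs(tagged_case):
    """One (features, solution) pair per tagged case particle."""
    return [
        (extract_features(tagged_case["verb"], a["noun"], a["particle"]), a["converted"])
        for a in tagged_case["args"]
    ]

def learn_decision_list(pairs):
    """Build rules (confidence, support, feature, solution), most reliable first."""
    counts = defaultdict(Counter)
    for features, solution in pairs:
        for f in features:
            counts[f][solution] += 1
    rules = []
    for f, c in counts.items():
        solution, n = c.most_common(1)[0]
        rules.append((n / sum(c.values()), n, f, solution))
    rules.sort(reverse=True)
    return rules

def estimate(rules, features, default=None):
    """Apply the first rule whose feature is present in the input."""
    for _, _, f, solution in rules:
        if f in features:
            return solution
    return default

# Hypothetical solution database: "the student was scolded by the teacher", etc.
corpus = [
    {"verb": "scold", "args": [
        {"noun": "student", "particle": "ga", "converted": "wo"},
        {"noun": "teacher", "particle": "ni", "converted": "ga"}]},
    {"verb": "praise", "args": [
        {"noun": "child", "particle": "ga", "converted": "wo"},
        {"noun": "mother", "particle": "ni", "converted": "ga"}]},
]
pairs = [p for case in corpus for p in extract_pairs(case)]
rules = learn_decision_list(pairs)
```

For an unseen clause such as "Taro was bitten by the dog", `estimate(rules, extract_features("bite", "dog", "ni"))` falls through to the rule learned for the original particle, since neither the verb nor the noun occurs in the toy corpus.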

A sentence conversion processing method for converting a passive sentence or a causative sentence digitized using a computer into an active sentence,
Accessing a solution database that stores case data to which solutions to problems handled by machine learning are assigned when performing conversion processing from passive sentences or causative sentences to active sentences,
Retrieving the case data from the solution database, extracting a set of a solution or solution candidate and a set of features for each case data,
From the set of the solution or solution candidate and the set of features, learning by a machine learning method the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features,
Storing the result of the learning in a learning result database,
Extracting a set of a solution candidate and a set of features from the digitized sentence input as the analysis target,
A sentence conversion processing method characterized by determining, with reference to the learning result database and from the set of the solution candidate and the set of features extracted from the input sentence, the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates.

A system for processing language analysis using a machine learning method including a stack processing system and a main processing system that performs processing using the processing result of the stack processing system,
The stack processing system includes:
Problem expression information storage means in which information of expressions to be a problem in the machine learning method when performing analysis processing is stored in advance;
Processing means for extracting, based on the problem expression information, a problem expression equivalent part that matches the problem expression information from case data to which no information about the problem has been assigned;
Processing means for converting the case data from which the problem expression equivalent part has been extracted into unsupervised data having the structure of a problem, with the problem expression equivalent part as the solution;
Processing means for extracting a set of a solution and a set of features for each unsupervised data;
Processing means for learning by a machine learning method, from the set of the solution and the set of features, what kind of solution is likely for what kind of features;
A learning result database storing results of the learning;
Processing means for acquiring a first feature set extracted from case data from the main processing system or a second feature set extracted from case data input as a processing target;
Processing means for estimating, with reference to the learning result database, what kind of solution is likely for the first feature set or the second feature set;
Processing means for sending a first processing result for the first feature set or a second processing result for the second feature set to the main processing system,
The main processing system includes:
A solution database storing case data to which solutions to problems handled by the machine learning method are provided when performing analysis processing;
Processing means for extracting the case data from the solution database and extracting a set of a solution and a set of features for each case data;
Processing means for sending to the stack processing system a first feature set obtained by extracting only a feature set from the set of the solution and the feature set;
Processing means for receiving the first processing result from the stack processing system;
Processing means for adding the first processing result as a feature to a first feature set extracted from the case data;
Processing means for learning, from a set of the solution and the first feature set to which the processing result is added, what kind of solution is likely to be obtained by a machine learning method,
A learning result database storing results of the learning;
Processing means for extracting a set of features from case data input as a processing target;
Processing means for transmitting a second set of features of the input case data to the stack processing system;
Processing means for receiving a second processing result output from the stack processing system;
Processing means for adding the second processing result as a feature to a second feature set of the input case data;
Processing means for estimating what kind of solution is likely to be obtained based on the second feature set to which the processing result is added with reference to the learning result database. A language analysis processing system.
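The unsupervised data generation performed by the stack processing system can be pictured with a short Python sketch. It assumes that sentences arrive already segmented into tokens, that the problem expressions are three romanized case particles, and that a `<?>` placeholder marks the problem position; `PROBLEM_EXPRESSIONS`, `to_unsupervised_data`, and `extract_features` are hypothetical names, and the two features used are chosen only for illustration.

```python
# Hypothetical contents of the problem expression information storage means.
PROBLEM_EXPRESSIONS = {"ga", "wo", "ni"}

def to_unsupervised_data(tokens):
    """Turn one untagged sentence into (problem, solution) items.

    Each token matching a problem expression is treated as a problem
    expression equivalent part: it is masked to form the problem, and the
    masked token itself becomes the solution.
    """
    data = []
    for i, tok in enumerate(tokens):
        if tok in PROBLEM_EXPRESSIONS:
            problem = tokens[:i] + ["<?>"] + tokens[i + 1:]
            data.append((problem, tok))
    return data

def extract_features(problem):
    """Assumed features: the token before the masked slot and the final verb."""
    i = problem.index("<?>")
    prev = problem[i - 1] if i > 0 else "<s>"
    return frozenset({f"prev={prev}", f"verb={problem[-1]}"})

# "inu ga taro wo kanda" (the dog bit Taro), taken from an untagged sentence database.
sentence = ["inu", "ga", "taro", "wo", "kanda"]
unsupervised = to_unsupervised_data(sentence)
pairs = [(extract_features(p), s) for p, s in unsupervised]
```

Because the solution is read off the text itself, no manual tagging is needed, which is what lets the stack processing system learn from a large sentence database before its results are passed to the main processing system.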

A system configured to include a stack processing system and a main processing system that performs processing using a processing result of the stack processing system, and that performs language analysis using a machine learning method.
The stack processing system includes:
Problem expression information storage means in which information of expressions to be a problem in the machine learning method when performing analysis processing is stored in advance;
Based on the problem expression information, processing means for extracting a problem expression equivalent part that matches the problem expression information from case data to which information about the problem is not added,
Processing means for converting the case data from which the problem expression equivalent part has been extracted into unsupervised data having the structure of a problem, with the problem expression equivalent part as the solution;
Processing means for extracting a set of a solution or a solution candidate and a set of features for each unsupervised data;
Processing means for learning by a machine learning method, from the set of the solution or solution candidate and the set of features, the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features;
A learning result database storing results of the learning;
Processing means for acquiring, from the main processing system, a first set consisting of a solution or solution candidate and a set of features extracted from the case data, or a second set consisting of a solution candidate and a set of features extracted from the case data input as a processing target;
Processing means for determining, with reference to the learning result database and from the first set or the second set, the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates;
Processing means for sending a first processing result for the first set or a second processing result for the second set to the main processing system,
The main processing system includes:
A solution database storing case data to which solutions to problems handled by the machine learning method are provided when performing analysis processing;
Processing means for extracting the case data from the solution database and extracting a first set of a solution or a solution candidate and a set of features for each case data;
Processing means for sending the first set to the stack processing system;
Processing means for receiving the first processing result from the stack processing system;
Processing means for adding the first processing result as a feature to the feature set of the first set;
Processing means for learning by a machine learning method, from the first set to which the first processing result has been added, the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features;
A learning result database storing results of the learning;
Processing means for extracting a second set of solution candidates and feature sets from the case data input as a processing target;
Processing means for sending the second set to the stack processing system;
Processing means for receiving a second processing result from the stack processing system;
Processing means for adding the second processing result as a feature to the second set of features,
Processing means for determining, with reference to the learning result database and from the second set to which the second processing result has been added, the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates.

The language analysis processing system according to claim 6 or 7,
A language analysis processing system, wherein the stack processing system and the main processing system analyze a case particle after conversion in a sentence conversion process from a passive sentence or a causative sentence to an active sentence.

A sentence conversion processing system that converts a passive sentence or a causative sentence digitized using a computer into an active sentence,
A solution database that stores case data to which a solution to a problem handled by the machine learning method is provided when performing a conversion process from a passive sentence or a causative sentence to an active sentence;
Processing means for extracting the case data from the solution database and extracting a set of a solution and a set of features for each case data;
Processing means for learning by a machine learning method, from the set of the solution and the set of features, what kind of solution is likely for what kind of features;
A learning result database storing results of the learning;
Processing means for extracting a set of features from the digitized sentence input as an analysis target;
Processing means for estimating what solution is likely to be obtained in the case of a feature extracted from the input sentence with reference to the learning result database.

A sentence conversion processing system that converts a passive sentence or a causative sentence digitized using a computer into an active sentence,
A solution database that stores case data to which a solution to a problem handled by the machine learning method is provided when performing a conversion process from a passive sentence or a causative sentence to an active sentence;
Processing means for extracting the case data from the solution database and extracting a set of a solution or a solution candidate and a set of features for each case data;
Processing means for learning by a machine learning method, from the set of the solution or solution candidate and the set of features, the probability of being a positive example or a negative example for each pair of a solution or solution candidate and a set of features;
A learning result database storing results of the learning;
Processing means for extracting a set of solution candidates and a set of features from the digitized sentence input as an analysis target;
Processing means for determining, with reference to the learning result database and from the set of the solution candidate and the set of features extracted from the input sentence, the probability of being a positive example or a negative example for each pair of a solution candidate and a set of features, and estimating as the solution the solution candidate having the highest probability of being a positive example among all solution candidates.