JP5139701B2

JP5139701B2 - Language analysis model learning apparatus, language analysis model learning method, language analysis model learning program, and recording medium thereof

Info

Publication number: JP5139701B2
Application number: JP2007063941A
Authority: JP
Inventors: 潤鈴木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-03-13
Filing date: 2007-03-13
Publication date: 2013-02-06
Anticipated expiration: 2027-03-13
Also published as: JP2008225907A

Description

The present invention relates to a language analysis model learning technique, and more particularly to a technique for learning a language analysis model that is used to estimate the labels to be assigned to a character string or a symbol string.

Conventionally, there is a known problem of assigning labels (also called a label sequence or an output sequence) as classification tags to an input that has a sequence structure (an input sequence), such as a character string or symbol string relating to text, a DNA base sequence, or the web. Hereinafter, such a problem is called a sequence structure prediction problem, and an apparatus (or program) that assigns an output sequence y to an input sequence x is called a sequence structure predictor. FIG. 18 shows specific examples of the input sequence x and the output sequence y.

FIG. 18(a) concerns the sequence structure prediction problem of segmenting text into morphemes, and shows an example in which labels indicating linguistic features are assigned to text. The input sequence x, “不動産情報登記評価システムが13日に発足した。” (“The real estate information registration evaluation system was launched on the 13th.”), is divided into 12 morphemes. Here, a morpheme is the smallest character string that would lose its meaning if it were divided any further. The label “B” marks the position in the character string at which a morpheme begins, and the label “I” marks a position that lies inside the morpheme opened by the preceding “B”.
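The B/I labeling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the three-morpheme example are hypothetical, and the segmentation used in FIG. 18(a) is not reproduced here.

```python
def morphemes_to_bi_labels(morphemes):
    """Return one (character, label) pair per character.

    "B" marks the first character of a morpheme, "I" every later character.
    """
    pairs = []
    for morpheme in morphemes:
        for offset, char in enumerate(morpheme):
            pairs.append((char, "B" if offset == 0 else "I"))
    return pairs


def bi_labels_to_morphemes(pairs):
    """Rebuild the morpheme list from (character, label) pairs."""
    morphemes = []
    for char, label in pairs:
        if label == "B" or not morphemes:
            morphemes.append(char)      # a "B" opens a new morpheme
        else:
            morphemes[-1] += char       # an "I" extends the open morpheme
    return morphemes


# Hypothetical segmentation of a short fragment, for illustration only.
example = ["発足", "し", "た"]
labeled = morphemes_to_bi_labels(example)
```

Because the labels can be converted back into the segmentation, predicting the label sequence is equivalent to predicting the morpheme boundaries.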

FIG. 18(b) concerns the sequence structure prediction problem of extracting named entities from text, and shows an example in which a label indicating the type of proper noun is assigned to each proper noun. The input sequence x, “田中一郎は陸上連盟の会長です” (“Ichiro Tanaka is the president of the athletics federation”), is divided into 8 morphemes. Of these, the four morphemes “田中” (Tanaka), “一郎” (Ichiro), “陸上” (athletics), and “連盟” (federation) belong to named entities, so they are assigned the labels “B-person name”, “I-person name”, “B-organization name”, and “I-organization name”, respectively. The label “O” indicates a morpheme that is not part of a named entity. The two proper nouns “田中一郎” (Ichiro Tanaka) and “陸上連盟” (athletics federation) are themselves named entities, so they are assigned the labels “person name” and “organization name”, respectively.
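As a sketch of how per-morpheme labels of this kind map back to whole named entities, the following groups B-/I-/O tags into spans. The function name is hypothetical, the tag strings "PERSON" and "ORG" are English stand-ins for the labels in the text, and the segmentation of the last four morphemes is an assumption.

```python
def decode_bio(tokens, tags):
    """Group tokens into (entity_type, text) spans from B-/I-/O tags."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag closes any open span and opens a new one.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open span.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_tokens)))
    return spans


tokens = ["田中", "一郎", "は", "陸上", "連盟", "の", "会長", "です"]
tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "I-ORG", "O", "O", "O"]
entities = decode_bio(tokens, tags)
```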

FIG. 18(c) concerns the sequence structure prediction problem of estimating gene regions from a DNA base sequence, and shows an example in which a label indicating an amino acid is assigned to each codon, that is, each ordered triple of bases written with the four characters T, C, A, and G. Here, the codon “ATG” is assigned the label “M”, which indicates both the start codon that marks the start of translation into protein and methionine. The codon “TGA” is assigned the label “H”, which indicates both the stop codon that marks the end of translation into protein and histidine. Each codon between “ATG” and “TGA” is assigned its corresponding label: “R”, “D”, “W”, or “Q”. The characters before (to the left of) the codon “ATG” and after (to the right of) the codon “TGA” are assigned the label “O” to indicate that they do not correspond to an amino acid.

A conventional language analysis model learning apparatus performs statistical learning in advance (hereinafter, supervised learning) for the target sequence structure prediction problem, using correct-answer data in which a correct label sequence has been assigned to each input sequence. Here, correct-answer data is data for which the pairing of input sequence and output sequence is known before learning. The parameters obtained by supervised learning are then used to create a language analysis model that represents the correspondence between input sequences and label sequences. By using this language analysis model, a sequence structure predictor can actually assign labels to an input sequence.

As a concrete method of supervised learning, methods that combine locally optimal solutions were used in the past. In recent years, methods based on global optimization, typified by conditional random fields, have come into use and have improved performance (see, for example, Non-Patent Document 1).

The conditional random field method is now widely used as one of the best-performing methods for natural language analysis tasks such as named entity extraction and chunking, in which text is analyzed chunk by chunk, a chunk being a unit of linguistic information processing that also forms a unit of meaning. The conditional random field method belongs to the class of methods that model the conditional probability p(y|x). Hereinafter, the technique of predicting sequence structure by modeling the conditional probability p(y|x) in this way is called the discriminative approach.
J. Lafferty, A. McCallum and F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, In Proc. of ICML-2001, Pages 282-289, 2001

It is generally recognized, not only for sequence prediction problems, that accurate prediction requires enough data to cover the feature space handled during supervised learning. However, correct-answer data must be created by hand, so it is expensive to produce.
Moreover, the individual instances y_i (i = 1, 2, …) of the label sequence (output sequence) y handled in learning for a sequence prediction problem depend on one another, so the feature space handled in that learning becomes very large, and an enormous amount of correct-answer data is needed to raise prediction performance. In addition, because of this interdependence within label sequences, creating correct-answer data for a sequence prediction problem demands even greater expertise in the target task and even higher cost.

A conventional language analysis model learning apparatus uses only the limited amount of correct-answer data that has been created by hand, so its performance on unseen data is poor. In particular, when the sequence structure of data from a different domain is predicted for the same task, prediction performance can degrade substantially because the features specific to that domain have not been learned. As a concrete example, the named entity extraction task in natural language processing generally uses correct-answer data from newspaper articles for model learning. When a sequence structure predictor uses a language analysis model learned from this newspaper data to extract named entities from web data, its extraction performance on the web data is far lower than on newspaper articles, because the web data belongs to a different domain from the newspaper articles used as training data.

Furthermore, because correct-answer data is created by hand, only a limited amount exists. For example, when a named entity extraction task is to be applied to a particular domain, in many domains, including web data, no correct-answer data exists for the target domain. In most such cases it is therefore difficult to build a sequence structure predictor that performs well.

On the other hand, unlabeled data for the target domain is often relatively easy to obtain. Here, unlabeled data means data to which no correct label has been assigned, that is, unprocessed data. In contrast, correct-answer data that carries correct labels is hereinafter called labeled data.

Because unlabeled data carries no information about the correct labels (the correct output sequence), it is not known how unlabeled data can be used for learning. If, as an experiment, one designs a classical conditional-probability model (a discriminative model) so that the discriminative approach typified by conditional random fields can use unlabeled data for learning, the terms for the unlabeled data drop out of the conditional formula. In other words, the discriminative approach performs well in the supervised learning setting, but it is difficult to incorporate unlabeled data into it.

In contrast, the generative approach designs a model of the joint probability p(x, y) (a generative model). When unlabeled data is used for learning in this approach, the EM (Expectation Maximization) algorithm provides a framework that incorporates unlabeled data naturally and simply by treating the correct label sequence (output sequence) as missing information. However, in most cases the prediction performance of the generative approach falls far short of that of a discriminative approach, such as conditional random fields, in the supervised learning setting.
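The contrast between the two approaches can be made concrete by marginalizing the missing label in each model. The following is a sketch in assumed notation: D_u denotes the set of unlabeled inputs, and w and θ denote the parameters of the conditional and joint models, respectively; none of these symbols come from the text above.

```latex
% Discriminative model p(y | x; w): summing over the missing label gives a
% constant, so unlabeled inputs contribute nothing that depends on w.
\sum_{x \in D_u} \log \sum_{y} p(y \mid x; w) = \sum_{x \in D_u} \log 1 = 0

% Generative model p(x, y; \theta): the marginal still depends on \theta,
% so unlabeled inputs do constrain the parameters (this is what EM maximizes).
\sum_{x \in D_u} \log \sum_{y} p(x, y; \theta) = \sum_{x \in D_u} \log p(x; \theta)
```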

Accordingly, an object of the present invention is to solve the problems described above by providing a language analysis model learning technique that can train a sequence structure predictor using both labeled data and unlabeled data as input, so that the prediction performance of the language analysis model is improved at low cost.

To solve the above problem, the language analysis model learning apparatus according to claim 1 takes as input data labeled data, in which labels are assigned to a character string or symbol string, and unlabeled data, which consists of a character string or symbol string, and learns, on the basis of a discriminative model and a generative model, a language analysis model used to estimate the labels to be assigned to a character string or symbol string. The discriminative model estimates the labels to be assigned using a conditional probability, namely the probability that a predetermined label candidate appears given the input character string or symbol string. The generative model estimates the labels to be assigned using a joint probability, namely the probability that the input character string or symbol string and the predetermined label candidate are generated together. The apparatus comprises: generative model learning means that uses the input unlabeled data, a set of discriminative model parameter vectors learned in advance, and a predetermined set of model integration parameters to determine a set of generative model parameter vectors that maximizes a first objective function; and model integration learning means that uses the input labeled data, the set of discriminative model parameter vectors learned in advance, and the set of generative model parameter vectors determined by the generative model learning means to determine the set of model integration parameters that maximizes a second objective function. The process in which the generative model learning means determines the set of generative model parameter vectors and the process in which the model integration learning means determines the set of model integration parameters are executed alternately. The apparatus further comprises convergence determination means that determines whether either the set of generative model parameter vectors or the set of model integration parameters, as alternately determined by the two learning means, satisfies a predetermined convergence condition and, when it determines that the condition is satisfied, outputs the set of generative model parameter vectors and the set of model integration parameters at that point. The first objective function uses the set of discriminative model parameter vectors, the set of generative model parameter vectors, and the set of model integration parameters to compute the sum of the output values of the discriminant function over all outputs when unlabeled data is given. The second objective function uses the same three sets to compute the degree to which the labeled data can be classified correctly.

The language analysis model learning method according to claim 6 is a method performed by a language analysis model learning apparatus that takes as input data labeled data, in which labels are assigned to a character string or symbol string, and unlabeled data, which consists of a character string or symbol string, and learns, on the basis of a discriminative model and a generative model, a language analysis model used to estimate the labels to be assigned to a character string or symbol string. The discriminative model estimates the labels to be assigned using a conditional probability, namely the probability that a predetermined label candidate appears given the input character string or symbol string. The generative model estimates the labels to be assigned using a joint probability, namely the probability that the input character string or symbol string and the predetermined label candidate are generated together. The method alternately executes: a step in which generative model learning means uses the input unlabeled data, a set of discriminative model parameter vectors learned in advance, and a predetermined set of model integration parameters to determine a set of generative model parameter vectors that maximizes a first objective function; and a step in which model integration learning means uses the input labeled data, the set of discriminative model parameter vectors learned in advance, and the set of generative model parameter vectors determined by the generative model learning means to determine the set of model integration parameters that maximizes a second objective function. The method further includes a step in which convergence determination means determines whether either the set of generative model parameter vectors or the set of model integration parameters, as alternately determined by the two learning means, satisfies a predetermined convergence condition and, when it determines that the condition is satisfied, outputs the set of generative model parameter vectors and the set of model integration parameters at that point. The first objective function uses the set of discriminative model parameter vectors, the set of generative model parameter vectors, and the set of model integration parameters to compute the sum of the output values of the discriminant function over all outputs when unlabeled data is given. The second objective function uses the same three sets to compute the degree to which the labeled data can be classified correctly.

According to the language analysis model learning apparatus of claim 1 or the language analysis model learning method of claim 6, the apparatus incorporates unlabeled data through the generative approach by using the unlabeled data to determine the set of generative model parameter vectors. The apparatus then uses the determined set of generative model parameter vectors together with the labeled data to determine the set of model integration parameters, so that the unlabeled data incorporated through the generative approach is learned through the discriminative approach. By determining the two sets alternately until one of them converges, the apparatus outputs an optimal set of generative model parameter vectors and an optimal set of model integration parameters. A language analysis model composed of these two output sets and the already-learned set of discriminative model parameter vectors can therefore achieve high prediction performance at low cost.
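The alternating procedure described above can be sketched as a generic loop. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, parameters are plain lists of floats, and the two closed-form toy updates stand in for maximizing the first objective (on unlabeled data) and the second objective (on labeled data).

```python
def alternate_until_converged(theta, gamma, fit_theta, fit_gamma,
                              tol=1e-6, max_rounds=100):
    """Alternately refit theta (generative side) and gamma (integration weights).

    fit_theta(gamma) returns the theta that maximizes the first objective for
    the given gamma; fit_gamma(theta) returns the gamma that maximizes the
    second objective for the given theta. The loop stops as soon as either
    parameter set moves by less than tol, mirroring the "either one
    satisfies the convergence condition" wording.
    """
    for round_index in range(1, max_rounds + 1):
        new_theta = fit_theta(gamma)
        new_gamma = fit_gamma(new_theta)
        theta_change = max(abs(a - b) for a, b in zip(new_theta, theta))
        gamma_change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        theta, gamma = new_theta, new_gamma
        if theta_change < tol or gamma_change < tol:
            return theta, gamma, round_index
    return theta, gamma, max_rounds


# Hypothetical toy updates: each is the exact maximizer of a concave
# quadratic, e.g. -(theta - 0.5 * gamma - 1)^2 for the first one.
def toy_fit_theta(gamma):
    return [0.5 * gamma[0] + 1.0]


def toy_fit_gamma(theta):
    return [0.5 * theta[0]]


theta_hat, gamma_hat, rounds = alternate_until_converged(
    [0.0], [0.0], toy_fit_theta, toy_fit_gamma)
```

In the toy problem the loop approaches the fixed point theta = 4/3, gamma = 2/3, where neither update changes its argument any further.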

The language analysis model learning apparatus according to claim 2 is the apparatus according to claim 1, wherein the generative model learning means comprises: objective function calculation means that calculates, as the first objective function, the objective function G(Θ|Γ) shown in equation (14) below; auxiliary function calculation means that performs processing to obtain the parameter vector set Θ that maximizes the objective function G(Θ|Γ) of equation (14) with Λ and Γ held fixed; and parameter update means that updates the obtained set of generative model parameter vectors Θ when the convergence determination means determines that the convergence condition is not satisfied. The model integration learning means comprises: objective function calculation means that calculates, as the second objective function, the objective function L^SS-Hyb(Γ|Θ) shown in equation (16) below with Λ and Θ held fixed; discriminative model partial derivative calculation means that computes the partial derivative of L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_i for a discriminative model; generative model partial derivative calculation means that computes the partial derivative of L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_j for a generative model; and parameter update means that updates the obtained set of model integration parameters Γ when the convergence determination means determines that the convergence condition is not satisfied.
The language analysis model learning method according to claim 7 is the method according to claim 6, wherein the step of determining the set of generative model parameter vectors includes: a step of calculating, as the first objective function, the objective function G(Θ|Γ) shown in equation (14) below; a step of obtaining, by calculating a predetermined auxiliary function, the parameter vector set Θ that maximizes the objective function G(Θ|Γ) of equation (14) with Λ and Γ held fixed; and a step of updating the obtained set of generative model parameter vectors Θ when the convergence determination means determines that the convergence condition is not satisfied. The step of determining the set of model integration parameters includes: a step of calculating, as the second objective function, the objective function L^SS-Hyb(Γ|Θ) shown in equation (16) below with Λ and Θ held fixed; a step of computing the partial derivative of L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_i for a discriminative model; a step of computing the partial derivative of L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_j for a generative model; and a step of updating the obtained set of model integration parameters Γ when the convergence determination means determines that the convergence condition is not satisfied.

The language analysis model learning apparatus according to claim 3 is the apparatus according to claim 2, wherein the auxiliary function for maximizing the first objective function and the second objective function each include, as a parameter set representing the posterior probability of the labels to be assigned to the input character string or symbol string, the product of a discriminative model integration probability value and a generative model integration probability value. The discriminative model integration probability value is the result of accumulating, over the set of discriminative model parameter vectors in question, the probability values calculated from the joint probability of an input sequence and an output sequence estimated from the set of discriminative model parameter vectors using the labeled data and from the set of model integration parameters obtained in advance for that set of discriminative model parameter vectors. The generative model integration probability value is the result of accumulating, over the set of generative model parameter vectors in question, the probability values calculated from the joint probability of an input sequence and an output sequence estimated from the set of generative model parameter vectors using the unlabeled data and from the set of model integration parameters obtained in advance for that set of generative model parameter vectors. The auxiliary function calculation means uses the Q function shown in equation (15) below as the auxiliary function, obtains from the current parameter vector set Θ the parameter vector set Θ′ that maximizes this Q function, and repeatedly evaluates the Q function while replacing Θ with Θ′ until Θ′ no longer yields an increase over Θ, thereby obtaining the parameter vector set Θ that maximizes the objective function G(Θ|Γ).
The language analysis model learning method according to claim 8 is the method according to claim 7, wherein the auxiliary function for maximizing the first objective function and the second objective function each include, as a parameter set representing the posterior probability of the labels to be assigned to the input character string or symbol string, the product of a discriminative model integration probability value and a generative model integration probability value. The discriminative model integration probability value is the result of accumulating, over the set of discriminative model parameter vectors in question, the probability values calculated from the joint probability of an input sequence and an output sequence estimated from the set of discriminative model parameter vectors using the labeled data and from the set of model integration parameters obtained in advance for that set of discriminative model parameter vectors. The generative model integration probability value is the result of accumulating, over the set of generative model parameter vectors in question, the probability values calculated from the joint probability of an input sequence and an output sequence estimated from the set of generative model parameter vectors using the unlabeled data and from the set of model integration parameters obtained in advance for that set of generative model parameter vectors. The step of obtaining the parameter vector set Θ uses the Q function shown in equation (15) below as the auxiliary function, obtains from the current parameter vector set Θ the parameter vector set Θ′ that maximizes this Q function, and repeatedly evaluates the Q function while replacing Θ with Θ′ until Θ′ no longer yields an increase over Θ, thereby obtaining the parameter vector set Θ that maximizes the objective function G(Θ|Γ).
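The iteration described above (maximize an auxiliary function at the current Θ, replace Θ with the maximizer Θ′, stop when no further increase is obtained) can be sketched generically. Equation (15) is not reproduced in this section, so the sketch uses a hypothetical one-dimensional toy objective and a quadratic lower bound as its auxiliary function; the stopping rule is read here as "the objective no longer increases", which is an interpretation.

```python
def maximize_with_auxiliary(theta, objective, argmax_q, tol=1e-9, max_iters=1000):
    """Repeat theta <- argmax_q(theta) until the objective stops increasing."""
    value = objective(theta)
    for _ in range(max_iters):
        candidate = argmax_q(theta)
        candidate_value = objective(candidate)
        if candidate_value - value <= tol:
            break  # no further increase: keep the current theta
        theta, value = candidate, candidate_value
    return theta


# Hypothetical toy objective G(theta) = -(theta - 3)^2, maximized at theta = 3.
def toy_objective(theta):
    return -(theta - 3.0) ** 2


# Maximizer over t of the lower bound
#   Q(t | theta) = G(theta) + G'(theta) * (t - theta) - 2 * (t - theta)^2,
# which touches G at t = theta and lies below it elsewhere.
def toy_argmax_q(theta):
    return theta + 0.5 * (3.0 - theta)


theta_hat = maximize_with_auxiliary(0.0, toy_objective, toy_argmax_q)
```

Each step can only raise the objective because the auxiliary function is a lower bound that is tight at the current estimate, which is the same property the EM algorithm relies on.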

According to the language analysis model learning device according to claim 3 or the language analysis model learning method according to claim 8, the auxiliary function for maximizing the first objective function, and the second objective function, include, as the posterior probability of the label to be assigned to the input sequence, the product of the identification model integration probability value, which uses the labeled data, and the generation model integration probability value, which uses the unlabeled data. Consequently, the generation model parameter vector set and the model integration parameter set determined by maximizing the first and second objective functions, respectively, reflect what has been learned from both the unlabeled data and the labeled data. Note that the unlabeled data is used in the auxiliary function for maximizing the first objective function; it is not used directly in the first objective function itself. In other words, unlabeled data can be incorporated through the generation approach while retaining the good prediction performance of the identification approach. The joint probability estimated from an identification model parameter vector is given by, for example, a conditional random field (CRF: Conditional Random Fields), and the joint probability estimated from a generation model parameter vector is given by, for example, a hidden Markov model (HMM: Hidden Markov Model).

The language analysis model learning device according to claim 4 is the language analysis model learning device according to claim 3, further comprising parameter integration means for integrating the output generation model parameter vector set and model integration parameter set, together with the previously learned identification model parameter vector set, into the parameter set indicating the posterior probability.

The language analysis model learning method according to claim 9 is the language analysis model learning method according to claim 8, further comprising a step in which parameter integration means integrates the output generation model parameter vector set and model integration parameter set, together with the previously learned identification model parameter vector set, into the parameter set indicating the posterior probability.

According to the language analysis model learning device according to claim 4 or the language analysis model learning method according to claim 9, the language analysis model learning device integrates the generation model parameter vector set, the model integration parameter set, and the identification model parameter vector set into a single parameter vector (parameter set) that serves as the posterior probability of the label to be assigned to the input character string or symbol string. The integrated parameter vector is therefore easier to use as a language analysis model.

The language analysis model learning device according to claim 5 is the language analysis model learning device according to any one of claims 1 to 4, further comprising identification model learning means for creating the identification model parameter vector set by learning from the input labeled data using the identification model.

The language analysis model learning method according to claim 10 is the language analysis model learning method according to any one of claims 6 to 9, further comprising a step in which identification model learning means creates the identification model parameter vector set by learning from the input labeled data using the identification model.

According to the language analysis model learning device according to claim 5 or the language analysis model learning method according to claim 10, the language analysis model learning device creates the identification model parameter vector set by learning from the labeled data. The configuration for creating the identification model parameter vector set and the configuration for creating the model integration parameter set can therefore be shared.

The language analysis model learning program according to claim 11 is a program for causing a computer to execute the language analysis model learning method according to any one of claims 6 to 10. With this configuration, a computer on which the program is installed can realize each function based on the program.

The computer-readable recording medium according to claim 12 is characterized in that the language analysis model learning program according to claim 11 is recorded on it. With this configuration, a computer in which the recording medium is loaded can realize each function based on the program recorded on the medium.

According to the present invention, a sequence structure predictor can be trained using both labeled data and unlabeled data as input. As a result, the prediction performance of the language analysis model can be improved at low cost.

Hereinafter, the best mode for carrying out the language analysis model learning device and language analysis model learning method of the present invention (hereinafter, the "embodiment") will be described in detail with reference to the drawings. In this embodiment, the language analysis model learning device is described using the named entity extraction problem, in which the input is text and the output is a sequence of named entity labels.

[Configuration of language analysis model creation device]
FIG. 1 is a configuration diagram schematically showing an outline of the language analysis model creation device according to an embodiment of the present invention. As shown in FIG. 1, the language analysis model creation device 1 includes a language analysis model learning device 2 and a parameter integration device 3. In the learning phase, the language analysis model learning device 2 uses the labeled data D_l, the unlabeled data D_u, and the information stored in the learning support information storage means 4, and outputs its processing results to the parameter set storage means 5.

The labeled data D_l is data in which labels are assigned to character strings or symbol strings. Here, as shown in Expression (1), a set of N labeled samples S_l = (x^n, y^n) is called the labeled data D_l. The unlabeled data D_u consists of character strings or symbol strings. Here, as shown in Expression (2), a set of M unlabeled samples S_u = (x^m) is called the unlabeled data D_u.

The learning support information storage means 4 stores, as learning support information, the identification model feature extraction template, the generation model feature extraction template, and the output label candidates, all of which are described later.

As its processing results, the language analysis model learning device 2 stores in the parameter set storage means 5 the identification model parameter vector set Λ shown in Expression (3), the generation model parameter vector set Θ shown in Expression (4), and the model integration parameter set Γ shown in Expression (5). Here, λ_i is an identification model parameter vector, θ_j is a generation model parameter vector, and γ_i and γ_j are model integration parameters. The identification model (a discriminative model) estimates the label to be assigned using the conditional probability that a predetermined label candidate appears given the input character string or symbol string. The generation model (a generative model) estimates the label to be assigned using the joint probability that the input character string or symbol string and a predetermined label candidate are generated together.

The parameter integration device (parameter integration means) 3 integrates the generation model parameter vector set Θ, the model integration parameter set Γ, and the identification model parameter vector set Λ output from the language analysis model learning device 2 into a single parameter set. R in Expression (6) denotes the posterior probability of the label to be assigned to the input character string or symbol string. R takes the form of a product of two terms: the identification model integration probability value, obtained by raising each p_i^D to the power γ_i and multiplying the results over i, and the generation model integration probability value, obtained by raising each p_j^G to the power γ_j and multiplying the results over j. The parameter integration device 3 stores the integrated parameter set in the language analysis model storage means 6.
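The product form described for R can be illustrated with a minimal Python sketch. It is not the patent's implementation: Expression (6) is not reproduced in this text, so any normalization it contains is omitted, and the function name `integrated_log_score` and all numeric values are hypothetical. The sketch only shows the weighted product of per-model probabilities, evaluated in log space for numerical stability.

```python
import math

def integrated_log_score(disc_probs, gen_probs, gamma_disc, gamma_gen):
    """Return log R = sum_i gamma_i * log p_i^D + sum_j gamma_j * log p_j^G.

    disc_probs / gen_probs are per-model probabilities (assumed > 0);
    gamma_disc / gamma_gen are the matching model integration weights.
    """
    if len(disc_probs) != len(gamma_disc) or len(gen_probs) != len(gamma_gen):
        raise ValueError("each model needs exactly one integration weight")
    log_disc = sum(g * math.log(p) for p, g in zip(disc_probs, gamma_disc))
    log_gen = sum(g * math.log(p) for p, g in zip(gen_probs, gamma_gen))
    return log_disc + log_gen

# Hypothetical numbers: two identification models and one generation model.
log_r = integrated_log_score([0.5, 0.25], [0.125], [1.0, 2.0], [1.0])
r = math.exp(log_r)  # equals 0.5**1 * 0.25**2 * 0.125**1
```

Working in log space turns the product of powers into a weighted sum, which is why a combination of this kind is often called log-linear.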

In the evaluation phase, the sequence structure prediction device 7 takes an unlabeled sample S_u as input data, estimates the labels to be assigned using the language analysis model stored in the language analysis model storage means 6 (the parameter set integrated by the parameter integration device 3), and outputs the labeled sample S_l corresponding to the input data.

[Configuration of language analysis model learning device]
FIG. 2 is a functional block diagram schematically showing the configuration of the language analysis model learning device shown in FIG. 1. The language analysis model learning device 2 is composed of, for example, a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), an input/output interface, and the like, and includes, as shown in FIG. 2, an identification model learning means 10 and a sequence structure predictor learning means 20.

<Example of input data>
FIG. 3 shows examples of information input to the language analysis model learning device shown in FIG. 2: (a) shows labeled data and (b) shows an output label candidate set. The example in FIG. 3(a) is the same as that shown in FIG. 18(b), except that morpheme segmentation is assumed to have been applied in advance.

The output label candidate set shown in FIG. 3(b) corresponds to named entity extraction and has five predetermined output label candidates as its elements. Each element is the same as that shown in FIG. 18(b). The output label candidate set is determined automatically by the target problem, and the language analysis model learning device 2 acquires it from the output label candidate storage means 43 of the learning support information storage means 4.

<Identification model learning means>
The identification model learning means 10 creates the identification model parameter vector set Λ by learning from the input labeled data D_l using the identification model. It includes an output candidate graph generation means 11, a feature extraction means 12, and a parameter learning means 13. Of these, the output candidate graph generation means 11 and the feature extraction means 12 perform preprocessing for learning. The identification model learning means 10 uses the labeled data D_l′ shown in Expression (7).

<<Output candidate graph generation means>>
The output candidate graph generation means 11 generates an output candidate graph from the input labeled data D_l′. As shown in FIG. 4, the output candidate graph is a lattice in which all possible output sequence candidates are connected by paths. FIG. 4 schematically shows an example of an output candidate graph generated by the language analysis model learning device shown in FIG. 2. Here, <BOS> is a fixed special label marking the beginning of the input sequence x, and <EOS> is a fixed special label marking the end of the input sequence x. In the lattice, for the output sequence y corresponding to the input labeled data D_l′ (input sequence x), each instance y_i (i = 1, ..., 5) is a node and each dependency between instances is a link. One path from <BOS> to <EOS> in the output candidate graph corresponds to one output, so the graph contains every possible output candidate. For example, node 401 represents the output instance in which the label "B-organization name" is assigned to the fourth word, "陸上", of the labeled data D_l′. Similarly, node 402 represents the output instance in which the label "O" is assigned to the sixth word, "の", of the labeled data D_l′.
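As a rough illustration of the lattice idea, the following Python sketch builds one column of candidate nodes per token, framed by <BOS> and <EOS>, and enumerates every path. The function names, tokens, and label set are placeholders chosen for the example and do not come from FIG. 4; explicit enumeration is only feasible for tiny inputs and is shown purely to make "one path equals one output" concrete.

```python
from itertools import product

BOS, EOS = "<BOS>", "<EOS>"

def build_lattice(tokens, label_candidates):
    """One column of candidate nodes per token, framed by BOS and EOS columns."""
    columns = [[(pos, label) for label in label_candidates]
               for pos in range(len(tokens))]
    return [[(-1, BOS)]] + columns + [[(len(tokens), EOS)]]

def enumerate_paths(lattice):
    """Yield every BOS-to-EOS path; each path is one candidate output sequence."""
    for path in product(*lattice):
        yield [label for _, label in path]

tokens = ["w1", "w2", "w3"]        # placeholder tokens
labels = ["B-ORG", "I-ORG", "O"]   # placeholder label candidates
lattice = build_lattice(tokens, labels)
paths = list(enumerate_paths(lattice))  # 3 labels over 3 positions: 3**3 paths
```

The number of paths grows as K^T for K labels and T tokens, which is why the later sections rely on dynamic programming over the lattice instead of enumeration.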

<<Feature extraction means>>
The feature extraction means 12 extracts features from the output candidate graph by combining the label at the position to be estimated with the instances of the input sequence specified in the identification model feature extraction template. FIG. 5 shows a specific example of an identification model feature extraction template. The identification model feature extraction template 500 is a template for estimating the label "y₄" as the i-th element of the output sequence. It extracts as features the input words corresponding to up to two positions before and after the position of the label to be estimated. Accordingly, when the feature extraction means 12 uses the identification model feature extraction template 500 to estimate the label at position "y₄" shown in FIG. 5, it extracts the features indicated by reference numerals 601 to 605 in FIG. 6(a). Specifically, when the label at position "y₄" is estimated to be "B-organization name", it extracts the features indicated by reference numerals 611 to 615 in FIG. 6(b). Similarly, when the label at position "y₄" is estimated to be "B-person name", it extracts the features indicated by reference numerals 621 to 625 in FIG. 6(c).

The feature extraction means 12 also uses the output candidate graph and the identification model feature extraction template to assign a feature vector to each node (or link) of the output candidate graph, producing an output candidate graph with feature vectors. FIG. 7 shows the relationship between the output candidate graph and the feature vectors; it illustrates the feature vectors created with the identification model feature extraction template of FIG. 5 for the nodes circled in FIG. 4. FIG. 7(a) shows the features assigned to node 401 in FIG. 4. The feature extraction means 12 associates the value "1" with these features and the value "0" with features not assigned to node 401, forming a feature vector whose elements are "1" and "0". Because each feature is generated by combining one of the words in the two-word window before and after the position with the output label of the node in question, the feature vectors of nodes at the same position in the input sequence but with different output labels are mutually orthogonal, and their inner product is "0". FIG. 7(b) shows the features assigned to node 402 in FIG. 4; since it is analogous, its description is omitted.
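The orthogonality property can be checked with a small Python sketch, assuming a window of two words on each side as in the template description. The helper names `extract_features` and `to_vector`, the placeholder words, and the label strings are assumptions made for illustration; the actual template format of FIG. 5 is not reproduced here.

```python
def extract_features(tokens, position, label, window=2):
    """Pair each word within +/-window of `position` with the candidate label."""
    features = set()
    for offset in range(-window, window + 1):
        idx = position + offset
        if 0 <= idx < len(tokens):
            features.add((offset, tokens[idx], label))
    return features

def to_vector(features, feature_index):
    """Binary vector: 1 for features present at the node, 0 otherwise."""
    return [1 if f in features else 0 for f in feature_index]

tokens = ["a", "b", "c", "d", "e", "f"]            # placeholder words
feats_org = extract_features(tokens, 3, "B-ORG")   # same position, label 1
feats_per = extract_features(tokens, 3, "B-PER")   # same position, label 2
feature_index = sorted(feats_org | feats_per)      # shared feature dictionary
v_org = to_vector(feats_org, feature_index)
v_per = to_vector(feats_per, feature_index)
inner = sum(a * b for a, b in zip(v_org, v_per))   # no shared feature fires
```

Because the label is part of every feature, two nodes at the same position with different labels can never share an active feature, so their binary vectors have a zero inner product.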

<<Parameter learning means>>
The parameter learning means 13 performs supervised learning by the identification approach, using the output candidate graph with feature vectors and an initialized parameter vector (a vector whose elements are all 0). Here, in order to learn the identification model parameter vector set Λ based on a conditional random field, the parameter learning means 13 includes, as shown in FIG. 2, an objective function calculation means 131, an objective function gradient calculation means 132, a convergence determination means 133, and a parameter update means 134. Conditional random fields are described in detail in, for example, "F. Sha and F. Pereira, Shallow Parsing with Conditional Random Fields, In Proc. of HLT/NAACL-2003, pages 213-220, 2003", so their description is omitted here.

The objective function calculation means 131 calculates the objective function for a given identification model parameter vector λ. As a premise, the local feature vector f_s obtained from position s in the sequence when an input sequence x (labeled data) is given is defined by Expression (8).

A conditional random field defines the conditional probability p(y|x) as the ratio of the product of the potential functions on the cliques to the total over all outputs. That is, the conditional probability p(y|x) of an output sequence y given an input sequence x under a conditional random field is defined by Expression (9). Z(x) in Expression (9) is the normalization term over all outputs y, as shown in Expression (10).
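The role of the normalization term can be seen in a brute-force Python sketch. It assumes the standard linear-chain form p(y|x) = exp(score(x, y)) / Z(x), since Expressions (9) and (10) are not reproduced in this text; the feature keys ("emit", "trans"), the weights, and the two-word input are hypothetical.

```python
import math
from itertools import product

def score(weights, x, y):
    """Sum the weights of the (word, label) and (previous label, label) features."""
    total, prev = 0.0, "<BOS>"
    for word, label in zip(x, y):
        total += weights.get(("emit", word, label), 0.0)
        total += weights.get(("trans", prev, label), 0.0)
        prev = label
    return total

def conditional_probability(weights, x, y, labels):
    """p(y|x) = exp(score(x, y)) / Z(x), with Z(x) summed over all label sequences."""
    z = sum(math.exp(score(weights, x, cand))
            for cand in product(labels, repeat=len(x)))
    return math.exp(score(weights, x, tuple(y))) / z

labels = ["B", "O"]
weights = {("emit", "NTT", "B"): 2.0,   # hypothetical weights
           ("emit", "is", "O"): 1.0,
           ("trans", "B", "O"): 0.5}
x = ["NTT", "is"]
total = sum(conditional_probability(weights, x, cand, labels)
            for cand in product(labels, repeat=len(x)))
p_best = conditional_probability(weights, x, ("B", "O"), labels)
```

Summing the probabilities of all candidate sequences gives 1, which is exactly what dividing by Z(x) guarantees.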

The objective function calculation means 131 maximizes the (log) posterior probability of the parameter vector λ using the given labeled data D_l′; that is, it maximizes log p(λ|D_l′). Specifically, the objective function calculation means 131 calculates the objective function L^CRF(λ) shown in Expression (11), where p(λ) denotes the prior probability distribution of λ. The objective function of Expression (11) can be optimized with a gradient-based numerical optimization method such as L-BFGS. L-BFGS is described in "D. C. Liu and J. Nocedal, On the Limited Memory BFGS Method for Large Scale Optimization, Math. Programming, Ser. B, 45(3):503-528, 1989", so its description is omitted here.

The objective function gradient calculation means 132 calculates the gradient of the objective function shown in Expression (11). This gradient, ∇L^CRF(λ), is given by Expression (12).

Here, E denotes the expectation with respect to its subscript. The first term on the right-hand side is the empirical expectation of the feature vector, so it can be computed easily by counting the features that appear in the correct sequences of the labeled data. Because this value depends only on the labeled data, it needs to be computed only once, before learning. The second term on the right-hand side of Expression (12) is the expected value of each feature vector over all output sequences, which in principle requires a separate computation for every possible output sequence. In structure prediction problems the number of possible outputs is generally enormous, so computing each one individually is very difficult in terms of computational cost. For sequence structure prediction problems, however, it is known that the expectation can be computed efficiently with the forward-backward algorithm, so the processing can be carried out in a practical amount of time (see Non-Patent Document 1).
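The efficiency gain can be illustrated with the forward half of the forward-backward algorithm. The Python sketch below computes log Z(x) by dynamic programming and checks it against explicit enumeration; it does not compute the feature expectations themselves, and the score tables `emit` and `trans` are hypothetical numbers rather than values derived from the patent's expressions.

```python
import math
from itertools import product

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_z(emit, trans, n_labels):
    """log Z(x) in O(T * K^2) using the forward recursion.

    emit[t][k]  : score of label k at position t
    trans[j][k] : score of moving from label j to label k
    """
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k] + log_sum_exp([alpha[j] + trans[j][k]
                                      for j in range(n_labels)])
            for k in range(n_labels)
        ]
    return log_sum_exp(alpha)

def brute_force_log_z(emit, trans, n_labels):
    """Same quantity by enumerating all K^T label sequences (for checking only)."""
    scores = []
    for seq in product(range(n_labels), repeat=len(emit)):
        s = emit[0][seq[0]]
        for t in range(1, len(seq)):
            s += trans[seq[t - 1]][seq[t]] + emit[t][seq[t]]
        scores.append(s)
    return log_sum_exp(scores)

emit = [[0.2, 1.0, -0.5], [0.0, 0.3, 0.7], [1.1, -0.2, 0.4], [0.5, 0.5, 0.0]]
trans = [[0.1, -0.3, 0.0], [0.4, 0.2, -0.1], [0.0, 0.6, 0.3]]
fast = forward_log_z(emit, trans, 3)
slow = brute_force_log_z(emit, trans, 3)
```

Both routines return the same value, but the forward recursion touches each lattice link once, whereas enumeration visits all K^T sequences.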

The convergence determination means 133 determines whether the difference between the empirical expectation of each feature and the expectation of each feature over all outputs has converged. Specifically, it determines whether the value of the gradient ∇L^CRF(λ) of the objective function given by Expression (12) has converged. When it determines that the gradient has converged, it outputs the identification model parameter vector λ at that point to the identification model parameter vector set storage means 51. FIG. 8 shows an example of the output identification model parameter vector (a column vector). The parameter update means 134 updates the identification model parameter vector λ when the value of the gradient ∇L^CRF(λ) has not converged.
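The interplay of gradient calculation, convergence determination, and parameter update can be sketched as a loop. The Python example below deliberately swaps in plain fixed-step gradient ascent for L-BFGS and a toy concave objective for L^CRF(λ); the function name, step size, tolerance, and target values are all assumptions made for illustration.

```python
import math

def gradient_ascent(grad, lam, step=0.1, tol=1e-6, max_iter=10_000):
    """Repeat: compute gradient, stop if its norm is below tol, else update lam."""
    for iteration in range(max_iter):
        g = grad(lam)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:     # convergence check
            return lam, iteration
        lam = [li + step * gi for li, gi in zip(lam, g)]  # parameter update
    return lam, max_iter

# Toy concave objective L(lam) = -sum((lam_k - target_k)^2); gradient below.
target = [1.0, -2.0, 0.5]

def grad(lam):
    return [-2.0 * (l - t) for l, t in zip(lam, target)]

# Start from the all-zero vector, matching the initialization described above.
lam_hat, n_iter = gradient_ascent(grad, [0.0, 0.0, 0.0])
```

A quasi-Newton method such as L-BFGS would typically need far fewer iterations than this fixed-step loop, but the stop-or-update structure is the same.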

In this embodiment, the identification model learning means 10 can learn a plurality of identification models from the same labeled data. For example, a plurality of different identification models can be created by changing the identification model feature extraction template. The number of identification models is not particularly limited; the designer can choose it freely, for example from one to several thousand, depending on the task. In this embodiment, the I different identification model parameter vectors λ_i created by the identification model learning means 10 are collectively represented by Expression (3) above.

<Sequence structure predictor learning means>
The sequence structure predictor learning means 20 takes as input the identification model parameter vector set Λ stored in the identification model parameter vector set storage means 51. It includes a generation model learning means 21 and a model integration learning means 22, which execute their processing alternately, and a convergence determination means 23, which has this alternating processing continue until a predetermined end condition is reached.

The generation model learning means 21 uses the input unlabeled data D_u, together with the previously learned identification model parameter vector set Λ and the predetermined model integration parameter set Γ, to determine the generation model parameter vector set Θ that maximizes the first objective function.

The model integration learning means 22 uses the input labeled data D_l, together with the identification model parameter vector set Λ and the generation model parameter vector set Θ, to determine the model integration parameter set Γ that maximizes the second objective function.

The convergence determination means 23 has the generation model learning means 21 and the model integration learning means 22 determine the generation model parameter vector set Θ and the model integration parameter set Γ alternately and, when either one satisfies a predetermined convergence condition, outputs the generation model parameter vector set Θ and the model integration parameter set Γ at that point.
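The alternating scheme can be sketched as a simple control loop. In the Python example below, Θ and Γ are reduced to single numbers and the two update steps are hypothetical closed-form stand-ins for maximizing the first and second objective functions; only the control flow (update Θ with Γ fixed, update Γ with Θ fixed, stop when either has converged) reflects the description above.

```python
def alternate_until_converged(update_theta, update_gamma, theta, gamma,
                              tol=1e-8, max_rounds=100):
    """Alternate the two updates until either parameter stops changing."""
    for round_no in range(1, max_rounds + 1):
        new_theta = update_theta(gamma)       # Theta step with Gamma fixed
        new_gamma = update_gamma(new_theta)   # Gamma step with Theta fixed
        theta_done = abs(new_theta - theta) < tol
        gamma_done = abs(new_gamma - gamma) < tol
        theta, gamma = new_theta, new_gamma
        if theta_done or gamma_done:          # either condition ends the loop
            return theta, gamma, round_no
    return theta, gamma, max_rounds

# Hypothetical stand-ins for the two maximization steps.
def update_theta(gamma):
    return 0.5 * gamma

def update_gamma(theta):
    return 0.5 * theta + 1.0

theta, gamma, rounds = alternate_until_converged(update_theta, update_gamma,
                                                 theta=1.0, gamma=0.0)
```

With these stand-in updates the loop settles near θ = 2/3 and γ = 4/3, the point at which neither step changes its parameter any further.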

<Learning support information storage means>
The learning support information storage means 4 includes an identification model feature extraction template storage means 41, a generation model feature extraction template storage means 42, and an output label candidate storage means 43, which store the identification model feature extraction template, the generation model feature extraction template, and the output label candidates, respectively. Each of the storage means 41, 42, and 43 is implemented with, for example, an ordinary hard disk or memory.

<Parameter set storage means>
The parameter set storage means 5 includes an identification model parameter vector set storage means 51, a generation model parameter vector set storage means 52, and a model integration parameter set storage means 53, which store the identification model parameter vector set Λ, the generation model parameter vector set Θ, and the model integration parameter set Γ, respectively. Each of the storage means 51, 52, and 53 is implemented with, for example, an ordinary hard disk or memory.

[Configuration of the sequence structure predictor learning unit]
FIG. 9 is a functional block diagram schematically showing the configuration of the sequence structure predictor learning unit shown in FIG. 2.
≪Generation model learning unit≫
The generation model learning unit 21 includes an output candidate graph generation unit 211, a feature extraction unit 212, an objective function calculation unit 213, an auxiliary function calculation unit 214, and a parameter update unit 215. Of these, the output candidate graph generation unit 211 and the feature extraction unit 212 perform preprocessing for learning. The generation model learning unit 21 uses the unlabeled data D_u shown in equation (2) above.

The output candidate graph generation unit 211 is the same as the output candidate graph generation unit 11 of the identification model learning unit 10 shown in FIG. 2, so its description is omitted.
The feature extraction unit 212 extracts features from the output candidate graph by combining the label at the position to be estimated with the instances in the input sequence that the generation model feature extraction template specifies. FIG. 10 shows a specific example of a generation model feature extraction template. The basic format of the generation model feature extraction template is the same as that of the identification model feature extraction template, and features are extracted in the same way. The only difference is that the generation model feature extraction template must satisfy the condition, imposed as a constraint of the hidden Markov model, that the extracted features are mutually independent. Accordingly, as long as this independence condition is satisfied, the same template as the identification model feature extraction template may be used. A completely different template may also be used; the template can be designed freely according to the target task and prior knowledge.

The generation model feature extraction template 1000 illustrated in FIG. 10 extracts, as features, the input words corresponding to the label at the position to be estimated and to the label preceding it. Accordingly, when the feature extraction unit 212 uses the template 1000 to estimate the label at position "y₄" shown in FIG. 10, it extracts the features shown in FIG. 11(a). Specifically, when the label at position "y₄", taken as the i-th element of the output sequence shown in FIG. 10, is estimated to be "B-organization name", the features shown in FIG. 11(b) are extracted. Likewise, when the label at position "y₄" is estimated to be "B-person name", the features shown in FIG. 11(c) are extracted. The feature extraction unit 212 also uses the output candidate graph and the generation model feature extraction template to assign a symbol generation probability and a transition probability to each node and link of the output candidate graph, and outputs an output candidate graph with probabilities.
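The template-driven extraction described above can be illustrated with a minimal sketch. The template format used here (a list of relative offsets into the input sequence), the function name `extract_features`, the padding symbol, and the toy sentence are all assumptions made for illustration; they are not the format of FIG. 10. The sketch also does not check the independence condition required of generation model templates.

```python
from typing import List, Tuple

# Hypothetical template format: each entry is a tuple of offsets relative to
# the position being estimated. This is an assumption, not the FIG. 10 format.
Template = List[Tuple[int, ...]]


def extract_features(words: List[str], position: int, label: str,
                     template: Template) -> List[str]:
    """Combine a candidate label at `position` with the input words that
    each template entry selects."""
    features = []
    for offsets in template:
        picked = []
        for off in offsets:
            idx = position + off
            # Offsets that fall outside the sequence map to a padding symbol.
            picked.append(words[idx] if 0 <= idx < len(words) else "<PAD>")
        offs = ",".join(str(o) for o in offsets)
        features.append("y=" + label + "|x[" + offs + "]=" + "/".join(picked))
    return features


# Toy input: the word at the estimated position and the word before it.
words = ["NTT", "announced", "results"]
template = [(0,), (-1,)]
feats_org = extract_features(words, 1, "B-organization name", template)
feats_per = extract_features(words, 1, "B-person name", template)
```

Changing only the candidate label yields a different feature set for the same position, which mirrors the contrast between FIG. 11(b) and FIG. 11(c).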

The objective function calculation unit 213 calculates an objective function (first objective function) G that takes as input the output candidate graph with probabilities and the generation model parameter vector set Θ^(t). As a premise for the objective function G, consider the discriminant function g(y|x; Λ, Θ, Γ) of an output sequence y given an input sequence x, based on R(y|x; Λ, Θ, Γ) of equation (6) above. Because the denominator on the right-hand side of equation (6) is a normalization term and does not affect the choice of y, the discriminant function g(y|x; Λ, Θ, Γ) can be defined using only the numerator of the right-hand side of equation (6).

The sum of the values of the discriminant function g over all outputs y is defined as the objective function G(Θ|Γ) shown in equation (14). Here, p(Θ) denotes the prior probability distribution over the generation model parameter vector set Θ. The objective function calculation unit 213 therefore calculates the objective function G(Θ|Γ) shown in equation (14).

The auxiliary function calculation unit 214 finds the parameter vector set Θ that maximizes the objective function G(Θ|Γ) of equation (14). That is, when Γ is known, the Θ that maximizes G(Θ|Γ) in the neighborhood of the initial value can be estimated by an iterative computation such as the EM algorithm. Specifically, the auxiliary function calculation unit 214 obtains, from the current parameter vector set Θ, the parameter vector set Θ′ that maximizes the Q-function (auxiliary function) shown in equation (15), and repeats this computation, replacing Θ with Θ′ each time, until Θ′ no longer yields an increase over Θ.
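The iteration pattern described above (maximize an auxiliary function, replace the current parameters, stop when the objective no longer increases) can be illustrated with a toy EM loop. This sketch does not implement the Q-function of equation (15), which is not reproduced here; it estimates a single mixing weight between two fixed component densities. The function names, the toy data, and the stopping tolerance are assumptions for illustration only.

```python
import math
from typing import List, Tuple


def log_likelihood(pi: float, p1: List[float], p2: List[float]) -> float:
    """Log-likelihood of the data under mixing weight `pi`."""
    return sum(math.log(pi * a + (1.0 - pi) * b) for a, b in zip(p1, p2))


def em_mixture_weight(p1: List[float], p2: List[float], pi: float = 0.5,
                      tol: float = 1e-10,
                      max_iter: int = 1000) -> Tuple[float, List[float]]:
    """Estimate the mixing weight of two fixed component densities by EM.

    p1[n] and p2[n] are the densities of observation n under each component.
    """
    history = [log_likelihood(pi, p1, p2)]
    for _ in range(max_iter):
        # E-step: posterior responsibility of component 1 for each observation.
        resp = [pi * a / (pi * a + (1.0 - pi) * b) for a, b in zip(p1, p2)]
        # M-step: the maximizer of the auxiliary function is the mean responsibility.
        new_pi = sum(resp) / len(resp)
        new_ll = log_likelihood(new_pi, p1, p2)
        # Stop once the objective no longer increases.
        if new_ll - history[-1] <= tol:
            break
        pi = new_pi
        history.append(new_ll)
    return pi, history


# Toy densities (assumed values).
p1 = [0.9, 0.8, 0.1, 0.7, 0.2]
p2 = [0.1, 0.3, 0.6, 0.2, 0.9]
pi_hat, history = em_mixture_weight(p1, p2)
```

As with any EM-style procedure, the loop only guarantees a local maximum near the starting value, which matches the "in the neighborhood of the initial value" qualification in the text.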

Because the Q-function Q(Θ′, Θ; Γ) shown in equation (15) has the same form as that of a hidden Markov model, the parameters can be updated efficiently with the Baum-Welch algorithm used for hidden Markov models. The difference is that a hidden Markov model computes marginal probabilities from the conditional probability p(y|x; θ), whereas this embodiment computes them from R(y|x; Λ, Θ, Γ) of equation (6).
The parameter update unit 215 updates the generation model parameter vector set Θ^(t) when the convergence determination unit 23 determines that convergence has not been reached.

In the generation model learning unit 21, maximizing the objective function G(Θ|Γ) of equation (14) with the unlabeled data D_u means making the discriminant function g give large values across output sequences y for any unknown input sequence x, which helps raise the reliability of discrimination. To see why, suppose that g gave very small values for every output sequence y. The differences in g between output sequences would then be very small, so all values would be nearly equal, and discrimination could be considered unreliable. Note that the sequence structure predictor learning unit 20 uses the unlabeled data D_u only to maximize (increase) the sum of the discriminant function g over all outputs; it does not use D_u directly to optimize the final sequence structure predictor. This is because the correct output for the unlabeled data D_u is unknown, so D_u cannot contribute to optimizing the classifier with respect to the output sequence y.

≪Model integrated learning unit≫
The model integrated learning unit 22 performs, for an arbitrary generation model parameter vector set Θ, maximum a posteriori (MAP) estimation of the model integration parameter set Γ; that is, it estimates the model integration parameter set Γ that maximizes an objective function (second objective function). The model integrated learning unit 22 includes an output candidate graph generation unit 221, a feature extraction unit 222, an objective function calculation unit 223, an identification model partial differential calculation unit 224, a generation model partial differential calculation unit 225, and a parameter update unit 226. Of these, the output candidate graph generation unit 221 and the feature extraction unit 222 are the same as the output candidate graph generation unit 11 and the feature extraction unit 12 of the identification model learning unit 10 shown in FIG. 2, so their description is omitted. The model integrated learning unit 22 uses the labeled data D_l shown in equation (1) above.

The objective function calculation unit 223 calculates an objective function (second objective function) L^SS-Hyb that takes the model integration parameter set Γ^(t) as input. The objective function L^SS-Hyb(Γ|Θ) is defined by equation (16), where p(Γ) is the prior probability distribution of Γ.

For any fixed Θ, the objective function L^SS-Hyb(Γ|Θ) is a convex function of the parameter set Γ, so this optimization is guaranteed to reach a global optimum. Therefore, once the gradient of the objective function L^SS-Hyb(Γ|Θ) is computed, a solution is readily obtained with a gradient-based optimization algorithm such as L-BFGS.
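A small sketch may help show why a gradient suffices here. It assumes that R is a normalized product of component-model scores, each raised to its integration weight γ, and that the prior on Γ is Gaussian; both are assumptions, since equations (6) and (16) are not reproduced in this text. Under those assumptions the gradient with respect to each γ is the observed log-score minus its expectation under R. The sketch enumerates a tiny candidate set by brute force and uses plain gradient ascent as a stand-in for L-BFGS. All names and numbers are hypothetical.

```python
import math
from typing import List


def log_r(gamma: List[float], scores: List[List[float]], y: int) -> float:
    """log R(y) for a weighted product of component scores, normalized over
    all candidates. scores[y][i] is the log-score of model i for candidate y."""
    totals = [sum(g * s for g, s in zip(gamma, row)) for row in scores]
    m = max(totals)
    log_z = m + math.log(sum(math.exp(t - m) for t in totals))
    return totals[y] - log_z


def objective(gamma: List[float], scores: List[List[float]], gold: int,
              sigma2: float = 10.0) -> float:
    """Log-probability of the gold candidate plus a Gaussian log-prior
    on gamma (the Gaussian prior is an assumption)."""
    return log_r(gamma, scores, gold) - sum(g * g for g in gamma) / (2.0 * sigma2)


def gradient(gamma: List[float], scores: List[List[float]], gold: int,
             sigma2: float = 10.0) -> List[float]:
    """Observed log-score minus expected log-score under R, minus the prior term."""
    probs = [math.exp(log_r(gamma, scores, y)) for y in range(len(scores))]
    grad = []
    for i in range(len(gamma)):
        expected = sum(p * row[i] for p, row in zip(probs, scores))
        grad.append(scores[gold][i] - expected - gamma[i] / sigma2)
    return grad


def fit(scores: List[List[float]], gold: int, steps: int = 2000,
        lr: float = 0.1) -> List[float]:
    """Plain gradient ascent (a simple stand-in for L-BFGS)."""
    gamma = [0.0] * len(scores[0])
    for _ in range(steps):
        g = gradient(gamma, scores, gold)
        gamma = [a + lr * b for a, b in zip(gamma, g)]
    return gamma


# Toy log-scores for three candidate outputs under two component models.
scores = [[-0.5, -2.0], [-1.5, -0.3], [-3.0, -1.8]]
gold = 0
gamma_hat = fit(scores, gold)
```

In a real sequence model the expectation cannot be enumerated candidate by candidate; it would be computed with dynamic programming over the output candidate graph.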

The identification model partial differential calculation unit 224 computes the partial derivative of the objective function L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_i for the identification model (a CRF in this embodiment). Specifically, the unit computes the right-hand side of equation (17). The first and second terms on the right-hand side of equation (17) are constant during optimization, so they need to be computed only once in advance. The computation of the third term on the right-hand side of equation (17) is described later, for convenience of explanation.

The generation model partial differential calculation unit 225 computes the partial derivative of the objective function L^SS-Hyb(Γ|Θ) of equation (16) with respect to the model integration parameter γ_j for the generation model. Specifically, the unit computes the right-hand side of equation (18).

The first term on the right-hand side of equation (18) is constant during optimization, so it needs to be computed only once in advance. Next, the computation of the second term on the right-hand side of equation (18) is described together with that of the third term on the right-hand side of equation (17). Letting N_R(x) denote the denominator of the right-hand side of equation (6), equation (6) can be rewritten as equation (19).

According to equation (19), the cost at each position s is the product of the values of the identification models and the generation models for that position, and the conditional probability R(y|x; Λ, Θ, Γ) of equation (19) is the ratio of the product of these costs over all positions to the normalization term N_R(x). Because the third term on the right-hand side of equation (17) and the second term on the right-hand side of equation (18) are expectations of each model's output values, they can be computed efficiently from equation (19) with the forward-backward algorithm. In other words, Γ can be estimated efficiently with exactly the same forward-backward algorithm that has conventionally been used for conditional random fields.
The parameter update unit 226 updates the model integration parameter set Γ^(t) when the convergence determination unit 23 determines that convergence has not been reached.
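The forward-backward computation mentioned above can be sketched on a small linear chain. The sketch assumes that each position has a positive local cost per label (standing in for the per-position product of model values) and that adjacent labels share a positive transition cost; the function names and numbers are hypothetical. A brute-force enumeration is included only so the result can be checked on tiny inputs.

```python
from itertools import product
from typing import List, Tuple

Matrix = List[List[float]]


def forward_backward(node: Matrix, trans: Matrix) -> Tuple[float, Matrix]:
    """Return (normalizer, per-position label marginals) for a linear chain.

    node[t][k] is the local cost of label k at position t, and trans[a][b]
    is the cost of moving from label a to label b. Values are unscaled, so
    long sequences would need log-space or rescaling to avoid underflow.
    """
    T, K = len(node), len(node[0])
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    alpha[0] = list(node[0])
    for t in range(1, T):
        for b in range(K):
            alpha[t][b] = node[t][b] * sum(alpha[t - 1][a] * trans[a][b]
                                           for a in range(K))
    for t in range(T - 2, -1, -1):
        for a in range(K):
            beta[t][a] = sum(trans[a][b] * node[t + 1][b] * beta[t + 1][b]
                             for b in range(K))
    z = sum(alpha[T - 1])
    marginals = [[alpha[t][k] * beta[t][k] / z for k in range(K)]
                 for t in range(T)]
    return z, marginals


def brute_force(node: Matrix, trans: Matrix) -> Tuple[float, Matrix]:
    """Enumerate every label sequence; feasible only for tiny inputs."""
    T, K = len(node), len(node[0])
    z = 0.0
    marg = [[0.0] * K for _ in range(T)]
    for ys in product(range(K), repeat=T):
        w = node[0][ys[0]]
        for t in range(1, T):
            w *= trans[ys[t - 1]][ys[t]] * node[t][ys[t]]
        z += w
        for t, y in enumerate(ys):
            marg[t][y] += w
    return z, [[v / z for v in row] for row in marg]


# Toy chain: three positions, two labels (assumed values).
node = [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
trans = [[0.7, 0.3], [0.4, 0.6]]
z_fb, marg_fb = forward_backward(node, trans)
z_bf, marg_bf = brute_force(node, trans)
```

The marginals weight each label's model output when forming an expectation, and the dynamic program needs on the order of T·K² operations instead of enumerating K^T sequences.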

In this embodiment, the convergence determination unit 23 judges that convergence has been reached when Δ of equation (20) falls to or below a predetermined value. This criterion is used because the model integration parameter set Γ has a global optimum for a fixed generation model parameter vector set Θ. Instead of Δ of equation (20), a quantity such as |Θ^(t) − Θ^(t−1)| or |Γ^(t) − Γ^(t−1)| may be used.

With the above configuration, when the convergence determination unit 23 determines that convergence has been reached, it outputs the generation model parameter vector set Θ at that point to the generation model parameter vector set storage unit 52 and the model integration parameter set Γ at that point to the model integration parameter set storage unit 53. FIG. 12 shows an example of the output generation model parameter vector (a column vector). FIG. 13 shows an example of the output model integration parameter set in column vector form.

The functions of the generation model learning unit 21 and the model integrated learning unit 22 are realized by a CPU loading a predetermined program stored in an HDD or other storage into RAM and executing it.

[Operation of the language analysis model creation device]
The operation of the language analysis model creation device shown in FIG. 1 is described with reference to FIG. 14 (and FIGS. 1 and 2 as appropriate), focusing mainly on the operation of the language analysis model learning device 2. FIG. 14 is a flowchart showing the operation of the language analysis model creation device shown in FIG. 1. The language analysis model learning device 2 of the language analysis model creation device 1 receives as input the labeled data D_l, the unlabeled data D_u, and the learning support information (step S1). The learning support information consists of an "output label candidate set", which is determined automatically by the target problem, and an "identification model feature extraction template and generation model feature extraction template", which are determined manually according to the target problem.

Next, the language analysis model learning device 2 uses the identification model learning unit 10 to execute the supervised learning process (conditional random field) with the labeled data D_l′, the output label candidate set, and the identification model feature extraction template (step S2), and outputs the identification model parameter vector set Λ obtained by the supervised learning process to the identification model parameter vector set storage unit 51 (step S3). The language analysis model learning device 2 then uses the sequence structure predictor learning unit 20 to acquire the identification model parameter vector set Λ from the identification model parameter vector set storage unit 51 (step S4) and to initialize the generation model parameter vector set Θ and the model integration parameter set Γ at t = 0 (step S5).

Next, the language analysis model learning device 2 uses the generation model learning unit 21 of the sequence structure predictor learning unit 20 to execute the generation model parameter vector set estimation process with the unlabeled data D_u, the output label candidate set, and the generation model feature extraction template (step S6). This process, described in detail later, estimates Θ with Λ and Γ held fixed. The language analysis model learning device 2 then uses the model integrated learning unit 22 of the sequence structure predictor learning unit 20 to execute the model integration parameter set estimation process with the labeled data D_l and the output label candidate set (step S7). This process, described in detail later, estimates Γ with Λ and Θ held fixed.

The language analysis model learning device 2 then uses the convergence determination unit 23 to determine whether the model integration parameter set Γ has converged (step S8). If it has not converged (step S8: No), the language analysis model learning device 2 uses the sequence structure predictor learning unit 20 to add 1 to the current value of t (step S9) and returns to step S6. If it has converged (step S8: Yes), the language analysis model learning device 2 uses the sequence structure predictor learning unit 20 to output the current generation model parameter vector set Θ and model integration parameter set Γ (step S10). The parameter integration device 3 of the language analysis model creation device 1 then integrates the generation model parameter vector set Θ, the model integration parameter set Γ, and the identification model parameter vector set Λ output from the language analysis model learning device 2 into a single parameter set (step S11).
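The alternating loop of steps S5 to S10 can be sketched as follows. The two estimation steps are passed in as functions, because their real implementations (equations (14) to (18)) are not reproduced here. The function name `alternate`, the max-absolute-change test on Γ, and the toy update rules are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]


def alternate(estimate_theta: Step, estimate_gamma: Step,
              theta0: Vector, gamma0: Vector,
              tol: float = 1e-6,
              max_iter: int = 100) -> Tuple[Vector, Vector, int]:
    """Alternate the two estimation steps until gamma stops changing."""
    theta, gamma = theta0, gamma0                 # step S5: initialize at t = 0
    for t in range(max_iter):                     # step S9: t advances each pass
        theta = estimate_theta(theta, gamma)      # step S6: update theta, gamma fixed
        new_gamma = estimate_gamma(theta, gamma)  # step S7: update gamma, theta fixed
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change <= tol:                         # step S8: convergence check on gamma
            return theta, gamma, t + 1            # step S10: output theta and gamma
    return theta, gamma, max_iter


# Toy stand-ins for the two estimators (hypothetical update rules whose
# joint fixed point is theta = 4/3, gamma = 2/3).
theta_hat, gamma_hat, n_iter = alternate(
    lambda th, g: [0.5 * g[0] + 1.0],
    lambda th, g: [0.5 * th[0]],
    theta0=[0.0], gamma0=[0.0])
```

The `max_iter` cap is a practical safeguard added in this sketch; the flowchart itself loops until the convergence test succeeds.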

<Supervised learning process>
FIG. 15 is a flowchart showing the supervised learning process shown in FIG. 14. The identification model learning unit 10 of the language analysis model learning device 2 uses the output candidate graph generation unit 11 to generate an output candidate graph from the input labeled data D_l′ (step S21). The identification model learning unit 10 then uses the feature extraction unit 12 to extract features from the output candidate graph by combining the label at the position to be estimated with the instances in the input sequence that the identification model feature extraction template specifies (step S22). The identification model learning unit 10 then inputs the initial value of the identification model parameter vector λ to the parameter learning unit 13 (step S23). Next, the parameter learning unit 13 of the identification model learning unit 10 uses the objective function calculation unit 131 to calculate the objective function L^CRF(λ) of equation (11) above (step S24). The parameter learning unit 13 then uses the objective function gradient calculation unit 132 to calculate the gradient ∇L^CRF(λ) of the objective function according to equation (12) above (step S25), and uses the convergence determination unit 133 to determine whether the value of the gradient ∇L^CRF(λ) has converged (step S26). If it has converged (step S26: Yes), the identification model learning unit 10 outputs the identification model parameter vector λ at that point to the identification model parameter vector set storage unit 51 (step S27). If it has not converged (step S26: No), the parameter learning unit 13 uses the parameter update unit 134 to update the parameter vector λ (step S28).

<Generation model parameter vector set estimation process>
FIG. 16 is a flowchart showing the generation model parameter vector set estimation process shown in FIG. 14. The generation model learning unit 21 of the language analysis model learning device 2 uses the output candidate graph generation unit 211 to generate an output candidate graph from the input unlabeled data D_u (step S41). The generation model learning unit 21 then uses the feature extraction unit 212 to extract features from the output candidate graph by combining the label at the position to be estimated with the instances in the input sequence that the generation model feature extraction template specifies (step S42). The generation model learning unit 21 then inputs the generation model parameter vector set Θ^(t) to the objective function calculation unit 213 (step S43). Next, the generation model learning unit 21 uses the objective function calculation unit 213 to calculate the objective function G(Θ|Γ) of equation (14) above (step S44), and uses the auxiliary function calculation unit 214 to calculate the Q-function Q(Θ′, Θ; Γ) of equation (15) (step S45). The generation model learning unit 21 then outputs the resulting generation model parameter vector set Θ^(t+1) to the convergence determination unit 23 (step S46).

<Model integration parameter set estimation process>
FIG. 17 is a flowchart showing the model integration parameter set estimation process shown in FIG. 14. The model integrated learning unit 22 of the language analysis model learning device 2 uses the output candidate graph generation unit 221 to generate an output candidate graph from the input labeled data D_l (step S61). The model integrated learning unit 22 then uses the feature extraction unit 222 to extract features from the output candidate graph by combining the label at the position to be estimated with the instances in the input sequence that the identification model feature extraction template specifies (step S62). The model integrated learning unit 22 then inputs the model integration parameter set Γ^(t) to the objective function calculation unit 223 (step S63). Next, the model integrated learning unit 22 uses the objective function calculation unit 223 to calculate the objective function L^SS-Hyb(Γ|Θ) of equation (16) above (step S64), uses the identification model partial differential calculation unit 224 to calculate the partial derivative with respect to the identification model parameter γ_i according to equation (17) above (step S65), and uses the generation model partial differential calculation unit 225 to calculate the partial derivative with respect to the generation model parameter γ_j according to equation (18) above (step S66). The model integrated learning unit 22 then outputs the resulting model integration parameter set Γ^(t+1) to the convergence determination unit 23 (step S67).

The language analysis model learning device 2 can also be realized by having a general-purpose computer run a language analysis model learning program that executes each of the steps described above. This program can be distributed over a communication line, or written to a recording medium such as a CD-ROM and distributed.

According to the language analysis model learning device 2 of this embodiment, the generation model parameter vector set Θ is determined using the unlabeled data D_u, and the model integration parameter set Γ is then determined using the determined Θ and the labeled data D_l. In this way, the unlabeled data D_u incorporated through the generation approach can be learned through the identification approach. A structure predictor can therefore be trained using unlabeled data D_u, which is relatively easy to obtain. As a result, for the same amount of labeled data D_l, a structure predictor can be trained that performs better than one obtained with the conventional conditional random field learning method. In addition, when no labeled data D_l exists for a given domain, learning can be performed using a structure predictor trained on labeled data D_l from a different domain for the same task, together with unlabeled data D_u from the target domain. In other words, for a domain in which prediction was conventionally difficult because no labeled data D_l existed, a high-performance structure predictor can be created at only the cost of acquiring unlabeled data D_u.

An embodiment of the present invention has been described above, but the present invention is not limited to it and can be carried out within a scope that does not depart from its gist. For example, in this embodiment the labeled data used by the identification model learning unit 10 and the labeled data used by the model integrated learning unit 22 were described as different data sets, but they are not limited to this and may be identical. However, experiments have shown that using different labeled data in the two processing units often yields a more suitable language analysis model, so it is preferable that the labeled data used by the identification model learning unit 10 and by the model integrated learning unit 22 differ.

In this embodiment, the identification model feature extraction template and the generation model feature extraction template were described as being determined manually according to the target problem. Alternatively, a plurality of templates created in advance to cover a plurality of problems may be prepared, and the device may be configured to select a template automatically when the target problem is input.
In this embodiment, the language analysis model learning device 2 was described in its best-mode configuration, which includes the identification model learning unit 10. However, the identification model learning unit 10 may be omitted if the device is configured so that a previously learned identification model parameter vector set Λ can be used.
In this embodiment, the parameter integration device 3 was described as being provided separately from the language analysis model learning device 2, but it may instead be included in the language analysis model learning device 2.

FIG. 1 is a configuration diagram schematically showing an outline of the language analysis model creation device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram schematically showing the configuration of the language analysis model learning device shown in FIG. 1.
FIG. 3 is a diagram showing examples of information input to the language analysis model learning device shown in FIG. 2, in which (a) shows labeled data and (b) shows an output label candidate set.
FIG. 4 is a diagram schematically showing an example of an output candidate graph generated by the language analysis model learning device shown in FIG. 2.
FIG. 5 is a diagram showing an example of an identification model feature extraction template.
FIG. 6 is a diagram showing examples of features extracted with the identification model feature extraction template shown in FIG. 5.
FIG. 7 is an explanatory diagram of a feature vector created, using the identification model feature extraction template shown in FIG. 5, for the nodes indicated by circles in FIG. 4.
FIG. 8 is a diagram showing an example of an identification model parameter vector output from the identification model learning means shown in FIG. 2.
FIG. 9 is a functional block diagram schematically showing the configuration of the sequence structure predictor learning means shown in FIG. 2.
FIG. 10 is a diagram showing an example of a generation model feature extraction template.
FIG. 11 is a diagram showing examples of features extracted with the generation model feature extraction template shown in FIG. 10.
FIG. 12 is a diagram showing an example of a generation model parameter vector output from the sequence structure predictor learning means shown in FIG. 9.
FIG. 13 is a diagram showing an example of a model integration parameter set output from the sequence structure predictor learning means shown in FIG. 9.
FIG. 14 is a flowchart showing the operation of the language analysis model creation device shown in FIG. 1.
FIG. 15 is a flowchart showing the supervised learning process shown in FIG. 14.
FIG. 16 is a flowchart showing the generation model parameter vector set estimation process shown in FIG. 14.
FIG. 17 is a flowchart showing the model integration parameter set estimation process shown in FIG. 14.
FIG. 18 is a diagram schematically showing examples of input sequences and output sequences handled in the sequence structure prediction problem.

Explanation of symbols

1 Language analysis model creation device
2 Language analysis model learning device
10 Identification model learning means
11 Output candidate graph generation means
12 Feature extraction means
13 Parameter learning means
131 Objective function calculation means
132 Objective function gradient calculation means
133 Convergence determination means
134 Parameter update means
20 Sequence structure predictor learning means
21 Generation model learning means
211 Output candidate graph generation means
212 Feature extraction means
213 Objective function calculation means
214 Auxiliary function calculation means
215 Parameter update means
22 Model integrated learning means
221 Output candidate graph generation means
222 Feature extraction means
223 Objective function calculation means
224 Identification model partial differential calculation means
225 Generation model partial differential calculation means
226 Parameter update means
23 Convergence determination means
3 Parameter integration device (parameter integration means)
4 Learning support information storage means
41 Identification model feature extraction template storage means
42 Generation model feature extraction template storage means
43 Output label candidate storage means
5 Parameter set storage means
51 Identification model parameter vector set storage means
52 Generation model parameter vector set storage means
53 Model integration parameter set storage means
6 Language analysis model storage means
7 Sequence structure prediction device

Claims

A language analysis model learning device that receives, as input data, labeled data in which labels are attached to character strings or symbol strings and unlabeled data consisting of character strings or symbol strings, and that learns, based on an identification model and a generation model, a language analysis model used for estimating a label to be assigned to a character string or symbol string, wherein
the identification model is a model that estimates the label to be assigned using a conditional probability indicating the probability that a predetermined label candidate appears given an input character string or symbol string,
the generation model is a model that estimates the label to be assigned using a joint probability indicating the probability that an input character string or symbol string and the predetermined label candidate are generated simultaneously,
the device comprising:
generation model learning means for determining, using the input unlabeled data, a previously learned identification model parameter vector set, and a predetermined model integration parameter set, a generation model parameter vector set that maximizes a first objective function;
model integrated learning means for determining, using the input labeled data, the previously learned identification model parameter vector set, and the generation model parameter vector set determined by the generation model learning means, the model integration parameter set that maximizes a second objective function,
the process of determining the generation model parameter vector set by the generation model learning means and the process of determining the model integration parameter set by the model integrated learning means being executed alternately; and
convergence determining means for determining whether a predetermined convergence condition is satisfied for one of the generation model parameter vector set and the model integration parameter set that are alternately determined by the generation model learning means and the model integrated learning means, and for outputting the generation model parameter vector set and the model integration parameter set at that time when the convergence condition is determined to be satisfied, wherein
the first objective function is a function that uses the identification model parameter vector set, the generation model parameter vector set, and the model integration parameter set to calculate the sum, over all outputs, of the output values of the discriminant function when unlabeled data is given, and
the second objective function is a function that uses the identification model parameter vector set, the generation model parameter vector set, and the model integration parameter set to calculate the degree to which the labeled data can be correctly identified.
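The alternating execution and convergence check recited in this claim can be sketched as a generic loop. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, scalars stand in for the parameter sets Θ and Γ, the two toy update functions replace the maximization of the first and second objective functions, and the convergence condition (a small change in both parameters) is only one possible choice.

```python
def alternate_until_converged(update_theta, update_gamma, theta, gamma,
                              tol=1e-8, max_iter=100):
    """Alternate two update steps until both parameters stop changing."""
    for iteration in range(1, max_iter + 1):
        new_theta = update_theta(theta, gamma)      # generation model step, gamma fixed
        new_gamma = update_gamma(new_theta, gamma)  # model integration step, theta fixed
        change = max(abs(new_theta - theta), abs(new_gamma - gamma))
        theta, gamma = new_theta, new_gamma
        if change < tol:  # assumed convergence condition
            break
    return theta, gamma, iteration


# Toy stand-ins: each update returns the exact maximizer of a concave quadratic.
def toy_update_theta(theta, gamma):
    # argmax over theta of -(theta - 0.5 * gamma) ** 2
    return 0.5 * gamma


def toy_update_gamma(theta, gamma):
    # argmax over gamma of -(gamma - (1.0 + 0.5 * theta)) ** 2
    return 1.0 + 0.5 * theta


theta_hat, gamma_hat, n_iter = alternate_until_converged(
    toy_update_theta, toy_update_gamma, theta=0.0, gamma=0.0)
```

In this toy setting the loop settles at the fixed point where each parameter is optimal given the other, which mirrors how the two learning means hand their results to each other until the convergence determining means stops the iteration.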

The language analysis model learning device according to claim 1, wherein
the generation model learning means includes:
objective function calculating means for calculating, as the first objective function, the objective function G(Θ | Γ) shown in the following equation (14);
auxiliary function calculating means for performing a process of obtaining the parameter vector set Θ that maximizes the objective function G(Θ | Γ) shown in the following equation (14) with Λ and Γ fixed; and
parameter updating means for updating the obtained generation model parameter vector set Θ when the convergence determining means determines that the convergence condition is not satisfied, and
the model integrated learning means includes:
objective function calculating means for calculating, as the second objective function, the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with Λ and Θ fixed;
identification model partial differential calculating means for partially differentiating the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with respect to the model integration parameter γ_i for the identification model;
generation model partial differential calculating means for partially differentiating the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with respect to the model integration parameter γ_j for the generation model; and
parameter updating means for updating the obtained model integration parameter set Γ when the convergence determining means determines that the convergence condition is not satisfied.

Here,
the labeled data denotes N labeled samples (x^n, y^n),
the unlabeled data denotes M unlabeled samples x^m,
Λ denotes the identification model parameter vector set represented by the following equation (3),
Θ denotes the generation model parameter vector set represented by the following equation (4),
Γ denotes the model integration parameter set represented by the following equation (5),
p_i^D denotes the joint probability estimated from the identification model,
p_j^G denotes the joint probability estimated from the generation model,
p(Θ) denotes the prior probability distribution over Θ, and
p(Γ) denotes the prior probability distribution over Γ.

The language analysis model learning device according to claim 2, wherein
the auxiliary function for maximizing the first objective function, and the second objective function, each include, as a parameter set indicating the posterior probability of the label to be assigned to the input character string or symbol string, the product of:
an identification model integration probability value, which is the result of integrating, over the identification model parameter vector sets to be integrated, the probability values each calculated from the joint probability of the input sequence and the output sequence estimated from an identification model parameter vector set using the labeled data and from the model integration parameter set obtained in advance for that identification model parameter vector set; and
a generation model integration probability value, which is the result of integrating, over the generation model parameter vector sets to be integrated, the probability values each calculated from the joint probability of the input sequence and the output sequence estimated from a generation model parameter vector set using the unlabeled data and from the model integration parameter set obtained in advance for that generation model parameter vector set, and
the auxiliary function calculating means uses the Q function shown in the following equation (15) as the auxiliary function, obtains from the current parameter vector set Θ a parameter vector set Θ′ that maximizes the Q function, and repeats the calculation of the Q function while replacing Θ with Θ′ until Θ′ no longer yields an increase over Θ, thereby obtaining the parameter vector set Θ that maximizes the objective function G(Θ | Γ).
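The iteration described in this claim (obtain Θ′ from the current Θ by maximizing an auxiliary function, replace Θ with Θ′, and stop once no further increase is obtained) has the shape of an EM-style loop. The sketch below is illustrative only: equation (15) is not reproduced here, so a standard EM update for the mixing weight of two fixed components serves as a stand-in auxiliary-function maximizer, and all names and toy values are assumptions.

```python
import math


def maximize_with_auxiliary(objective, argmax_q, theta,
                            min_gain=1e-10, max_iter=1000):
    """Replace theta by the auxiliary-function maximizer until no gain remains."""
    value = objective(theta)
    for _ in range(max_iter):
        candidate = argmax_q(theta)            # theta' obtained from current theta
        candidate_value = objective(candidate)
        if candidate_value - value <= min_gain:
            break                              # theta' no longer improves on theta
        theta, value = candidate, candidate_value
    return theta, value


# Toy stand-in problem: each pair holds the likelihood of one observation
# under component 1 and under component 2 (values are made up).
LIKELIHOODS = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.7), (0.6, 0.4)]


def log_likelihood(weight):
    """Objective: log-likelihood of the data for a given mixing weight."""
    return sum(math.log(weight * a + (1.0 - weight) * b) for a, b in LIKELIHOODS)


def em_step(weight):
    """Closed-form maximizer of the EM auxiliary function for the mixing weight."""
    resp = [weight * a / (weight * a + (1.0 - weight) * b) for a, b in LIKELIHOODS]
    return sum(resp) / len(resp)


weight_hat, final_value = maximize_with_auxiliary(log_likelihood, em_step, 0.5)
```

Each accepted step increases the objective, so the loop terminates either when the gain falls below `min_gain` or when the iteration cap is reached; both thresholds are illustrative defaults.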

The language analysis model learning device according to claim 3, further comprising parameter integration means for integrating the output generation model parameter vector set and model integration parameter set, together with the previously learned identification model parameter vector set, into a parameter set indicating the posterior probability.
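One way to picture a posterior formed as a product of identification model and generation model probability values is a weighted product normalized over the label candidates. The sketch below is an assumption-laden illustration: raising each component probability to the power of its model integration parameter γ, the function name `hybrid_posterior`, and the toy probabilities are all hypothetical and are not taken from the equations of this document.

```python
import math


def hybrid_posterior(component_probs, gammas):
    """Combine per-model probabilities into a posterior over label candidates.

    component_probs maps each label candidate to a list of probabilities,
    one per component model; gammas holds one integration weight per model.
    Assumed form: posterior(label) is proportional to prod_k p_k(label) ** gamma_k.
    """
    log_scores = {
        label: sum(g * math.log(p) for g, p in zip(gammas, probs))
        for label, probs in component_probs.items()
    }
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: u / total for label, u in unnormalized.items()}


# Toy values: first entry from an identification model, second from a generation model.
toy_probs = {"B": [0.7, 0.02], "I": [0.3, 0.01]}
posterior = hybrid_posterior(toy_probs, gammas=[1.0, 0.5])
```

Setting a weight to zero removes the corresponding model from the combination, which is why learning Γ on labeled data can be read as deciding how much each component model should contribute.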

The language analysis model learning device according to any one of claims 1 to 4, further comprising identification model learning means for creating the identification model parameter vector set by learning from the input labeled data using the identification model.

A language analysis model learning method performed by a language analysis model learning device that receives, as input data, labeled data in which labels are attached to character strings or symbol strings and unlabeled data consisting of character strings or symbol strings, and that learns, based on an identification model and a generation model, a language analysis model used for estimating a label to be assigned to a character string or symbol string, wherein
the identification model is a model that estimates the label to be assigned using a conditional probability indicating the probability that a predetermined label candidate appears given an input character string or symbol string,
the generation model is a model that estimates the label to be assigned using a joint probability indicating the probability that an input character string or symbol string and the predetermined label candidate are generated simultaneously,
the method comprising:
a step in which generation model learning means determines, using the input unlabeled data, a previously learned identification model parameter vector set, and a predetermined model integration parameter set, a generation model parameter vector set that maximizes a first objective function;
a step in which model integrated learning means determines, using the input labeled data, the previously learned identification model parameter vector set, and the generation model parameter vector set determined by the generation model learning means, the model integration parameter set that maximizes a second objective function,
the two steps being executed alternately; and
a step in which convergence determining means determines whether a predetermined convergence condition is satisfied for one of the generation model parameter vector set and the model integration parameter set that are alternately determined by the generation model learning means and the model integrated learning means, and outputs the generation model parameter vector set and the model integration parameter set at that time when the convergence condition is determined to be satisfied, wherein
the first objective function is a function that uses the identification model parameter vector set, the generation model parameter vector set, and the model integration parameter set to calculate the sum, over all outputs, of the output values of the discriminant function when unlabeled data is given, and
the second objective function is a function that uses the identification model parameter vector set, the generation model parameter vector set, and the model integration parameter set to calculate the degree to which the labeled data can be correctly identified.

The language analysis model learning method according to claim 6, wherein
the step of determining the generation model parameter vector set includes:
a step of calculating, as the first objective function, the objective function G(Θ | Γ) shown in the following equation (14);
a step of performing, by calculating a predetermined auxiliary function, a process of obtaining the parameter vector set Θ that maximizes the objective function G(Θ | Γ) shown in the following equation (14) with Λ and Γ fixed; and
a step of updating the obtained generation model parameter vector set Θ when the convergence determining means determines that the convergence condition is not satisfied, and
the step of determining the model integration parameter set includes:
a step of calculating, as the second objective function, the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with Λ and Θ fixed;
a step of partially differentiating the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with respect to the model integration parameter γ_i for the identification model;
a step of partially differentiating the objective function L^{SS-Hyb}(Γ | Θ) shown in the following equation (16) with respect to the model integration parameter γ_j for the generation model; and
a step of updating the obtained model integration parameter set Γ when the convergence determining means determines that the convergence condition is not satisfied.

The language analysis model learning method according to claim 7, wherein
the auxiliary function for maximizing the first objective function, and the second objective function, each include, as a parameter set indicating the posterior probability of the label to be assigned to the input character string or symbol string, the product of:
an identification model integration probability value, which is the result of integrating, over the identification model parameter vector sets to be integrated, the probability values each calculated from the joint probability of the input sequence and the output sequence estimated from an identification model parameter vector set using the labeled data and from the model integration parameter set obtained in advance for that identification model parameter vector set; and
a generation model integration probability value, which is the result of integrating, over the generation model parameter vector sets to be integrated, the probability values each calculated from the joint probability of the input sequence and the output sequence estimated from a generation model parameter vector set using the unlabeled data and from the model integration parameter set obtained in advance for that generation model parameter vector set, and
the step of obtaining the parameter vector set Θ uses the Q function shown in the following equation (15) as the auxiliary function, obtains from the current parameter vector set Θ a parameter vector set Θ′ that maximizes the Q function, and repeats the calculation of the Q function while replacing Θ with Θ′ until Θ′ no longer yields an increase over Θ, thereby obtaining the parameter vector set Θ that maximizes the objective function G(Θ | Γ).

The language analysis model learning method according to claim 8, further comprising a step in which parameter integration means integrates the output generation model parameter vector set and model integration parameter set, together with the previously learned identification model parameter vector set, into a parameter set indicating the posterior probability.

The language analysis model learning method according to any one of claims 6 to 9, further comprising a step in which identification model learning means creates the identification model parameter vector set by learning from the input labeled data using the identification model.

A language analysis model learning program for causing a computer to execute the language analysis model learning method according to any one of claims 6 to 10.

A computer-readable recording medium on which the language analysis model learning program according to claim 11 is recorded.