JP5295037B2

JP5295037B2 - Learning device using Conditional Random Fields or Global Conditional Log-linear Models, and parameter learning method and program in the learning device

Info

Publication number: JP5295037B2
Application number: JP2009186668A
Authority: JP
Inventors: 隆伸大庭; 貴明堀; 篤中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-11
Filing date: 2009-08-11
Publication date: 2013-09-18
Anticipated expiration: 2029-08-11
Also published as: JP2011039785A

Description

The present invention relates to Conditional Random Fields (CRFs) and Global Conditional Log-linear Models (GCLMs) in the field of machine learning, and more particularly to the learning of their parameters.

Conditional Random Fields (CRFs) and Global Conditional Log-linear Models (GCLMs) are learning machines (learning devices) for determining, from among a set of symbol sequences, the symbol sequence predicted to be most appropriate. Even within natural language processing alone, their applications are wide-ranging: labeling, morphological analysis, and error correction in speech recognition and machine translation. Other learning machines can be applied to similar tasks, but CRFs and GCLMs have advantages such as guaranteed convergence to the optimal solution, which follows from the concavity of the objective function, and they are counted among the learning machines that generate the most accurate models.

Non-Patent Document 1 describes an example in which CRFs are applied to Japanese morphological analysis, and Non-Patent Document 2 describes an example in which GCLMs are applied to an error-correction language model for speech recognition. Both achieve more accurate model generation than the alternative methods they are compared against.

Non-Patent Document 1: Taku Kudo, Kaoru Yamamoto, Yuji Matsumoto, "Japanese Morphological Analysis Using Conditional Random Fields", IPSJ SIG Technical Report 2004-NL-161, Vol. 2004, No. 47, pp. 89-96 (2004)
Non-Patent Document 2: Roark B., Saraclar M. and Collins M., "Discriminative n-gram language modeling", Computer Speech and Language, Vol. 21, No. 2, pp. 373-392 (2007)

When a learning machine is used to determine the most appropriate symbol sequence from among a set of symbol sequences, model learning (learning of the model parameters) must be performed in advance using learning data. The learning data is a large collection of lists, each consisting of a plurality of symbol sequences paired with the corresponding correct symbol sequence. In some cases, a symbol sequence weight (importance) is additionally given for each symbol sequence.

For example, the reference below describes, for an error-correction language model for speech recognition, giving the word error rate of each symbol sequence in a list, measured against the correct word string, as its symbol sequence weight.
Reference: Akio Kobayashi and 5 others, "Speech recognition by discriminative scoring of word lattices", Proceedings of the Autumn Meeting of the Acoustical Society of Japan, pp. 233-234 (2007)
Because the symbol sequence weight is human-supplied auxiliary information that aids model learning, learning with it leads to a more accurate model. Therefore, when symbol sequence weights are given, a learning machine that has a framework for handling them must be selected.

However, conventional Conditional Random Fields (CRFs) and Global Conditional Log-linear Models (GCLMs) were learning machines with no framework for handling symbol sequence weights.

In view of this situation, an object of the present invention is to provide a learning device using Conditional Random Fields or Global Conditional Log-linear Models that has a framework for handling symbol sequence weights, together with a parameter learning method and a program for that device.

According to the present invention, a learning device using Conditional Random Fields or Global Conditional Log-linear Models comprises: a list input unit that takes in, as learning data, a set of lists each consisting of the feature vectors of a plurality of symbol sequences, the feature vector of the corresponding correct symbol sequence, the symbol sequence weights of the feature vectors of the plurality of symbol sequences, and the symbol sequence weight of the feature vector of the correct symbol sequence; a parameter initialization unit that initializes the parameters of an objective function; an in-list processing unit that, for each feature vector of the plurality of symbol sequences and of the correct symbol sequence, calculates a linear score as the inner product of the parameters and that feature vector, and calculates a weighted exponent score from that linear score and the symbol sequence weight of that feature vector; an objective function calculation unit that calculates the objective function and its gradient using all the exponent scores calculated by the in-list processing unit, the feature vectors of the plurality of symbol sequences, and the feature vector of the correct symbol sequence; a convergence determination unit that determines convergence of the objective function from the gradient; and a parameter update unit that updates the parameters.

In the above configuration, the in-list processing unit preferably comprises a linear score calculation unit that calculates the linear score, an exponent score calculation unit that calculates an exponent score from the linear score calculated by the linear score calculation unit, and a weight multiplication unit that calculates the weighted exponent score by multiplying the exponent score calculated by the exponent score calculation unit by the symbol sequence weight.

Alternatively, the in-list processing unit may comprise a linear score calculation unit that calculates the linear score, a weight addition unit that calculates a weighted linear score by adding the symbol sequence weight to the linear score calculated by the linear score calculation unit, and an exponent score calculation unit that calculates the weighted exponent score from the weighted linear score calculated by the weight addition unit.

A parameter learning method according to the present invention, for a learning device using Conditional Random Fields or Global Conditional Log-linear Models, includes: a list input step of taking in, as learning data, a set of lists each consisting of the feature vectors of a plurality of symbol sequences, the feature vector of the corresponding correct symbol sequence, and the symbol sequence weight of each of those symbol sequences; a parameter initialization step of initializing the parameters of an objective function; an in-list processing step of calculating a linear score as the inner product of the parameters and each feature vector, and calculating, for each symbol sequence, a weighted exponent score from that linear score and the symbol sequence weight; an objective function calculation step of calculating the objective function and its gradient using all the exponent scores calculated in the in-list processing step and the feature vectors; a convergence determination step of determining convergence of the objective function from the gradient; and a parameter update step of updating the parameters.

In the above method, the in-list processing step preferably comprises a linear score calculation step of calculating the linear score, an exponent score calculation step of calculating an exponent score from the linear score calculated in the linear score calculation step, and a weight multiplication step of calculating the weighted exponent score by multiplying the exponent score calculated in the exponent score calculation step by the symbol sequence weight.

Alternatively, the in-list processing step may comprise a linear score calculation step of calculating the linear score, a weight addition step of calculating a weighted linear score by adding the symbol sequence weight to the linear score calculated in the linear score calculation step, and an exponent score calculation step of calculating the weighted exponent score from the weighted linear score calculated in the weight addition step.

According to the present invention, for a learning device using Conditional Random Fields or Global Conditional Log-linear Models, which guarantee convergence to the optimal solution, it is possible to realize a learning device that has a framework for handling symbol sequence weights, together with its parameter learning method.

Consequently, compared with learning without symbol sequence weights, the present invention can improve learning accuracy and thereby improve model performance.

FIG. 1 is a block diagram showing an example of the functional configuration of an existing learning device using CRFs or GCLMs.
FIG. 2 is a flowchart explaining the procedure of the parameter learning method in the learning device shown in FIG. 1.
FIG. 3 is a block diagram showing the functional configuration of one embodiment of a learning device using CRFs or GCLMs according to the present invention.
FIG. 4 is a flowchart explaining the procedure of the parameter learning method in the learning device shown in FIG. 3.
FIG. 5 is a block diagram showing the functional configuration of another embodiment of a learning device using CRFs or GCLMs according to the present invention.
FIG. 6 is a flowchart explaining the procedure of the parameter learning method in the learning device shown in FIG. 5.

First, existing Conditional Random Fields (CRFs) and Global Conditional Log-linear Models (GCLMs) are described.

Although they go by different names depending on the conditions, CRFs and GCLMs are both learning machines in which learning is achieved by finding, given learning data, the parameter (parameter vector) $\vec{w}$ that minimizes the following objective function L.

In learning, each symbol sequence is represented by a feature vector. $\vec{f}_{i,0}$ is the feature vector of the correct symbol sequence of the i-th list (set of symbol sequences), and $\vec{f}_{i,j}$ is the feature vector of the j-th symbol sequence belonging to the i-th list. The correct symbol sequence may or may not be included in the list. $\langle \vec{w}, \vec{f} \rangle$ denotes the inner product of the parameter $\vec{w}$ and the feature vector $\vec{f}$.
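The image of Expression (1) is not reproduced in this text. A minimal reconstruction, assuming the usual negative log-likelihood form of CRFs and GCLMs and assuming the inner sum runs over the $n_i$ symbol sequences of the i-th list (both are assumptions, not confirmed by the text), would be:

```latex
L(\vec{w}) = \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} \exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle\right)}
       {\exp\left(\langle \vec{w}, \vec{f}_{i,0} \rangle\right)}
```

Under this form, minimizing L raises the score of the correct symbol sequence relative to the other sequences in its list.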

When learning is performed using Expression (1), performance on the learning data is high, but overfitting may produce a model that performs poorly on other data. To prevent this, parameter estimation in CRFs and GCLMs is generally formulated as the minimization of an objective function with a regularization term, Expression (2).
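The image of Expression (2) is also missing from this text. As a sketch only, assuming a squared-norm (L2) regularizer weighted by the constant C and assuming the inner sum runs over the $n_i$ symbol sequences of the i-th list (the text names C but does not show the form of the regularization term), the regularized objective could read:

```latex
L(\vec{w}) = \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} \exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle\right)}
       {\exp\left(\langle \vec{w}, \vec{f}_{i,0} \rangle\right)}
  + C \lVert \vec{w} \rVert^{2}
```

A larger C pulls $\vec{w}$ toward zero and trades fit on the learning data for robustness on other data.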

The $\vec{w}$ that minimizes this objective function can be found by a method based on the quasi-Newton method; the details are outside the scope of the present invention and are omitted here. A function of the form of Expression (1) is generally known to be concave with respect to $\vec{w}$, so convergence to the global optimum is guaranteed.

FIG. 1 shows an example of the basic functional configuration of an existing learning device using CRFs or GCLMs. The learning device comprises a list input unit 11, a condition input unit 12, and a parameter estimation unit 20. The parameter estimation unit 20 comprises a parameter initialization unit 21, an in-list processing unit 22, an objective function calculation unit 23, a convergence determination unit 24, and a parameter update unit 25. In this example, the in-list processing unit 22 comprises a linear score calculation unit 22a and an exponent score calculation unit 22b.

FIG. 2 shows the basic procedure of parameter learning in the learning device shown in FIG. 1. The processing of each unit and the procedure in parameter learning are described below with reference to FIGS. 1 and 2.
・List input (step S1)
The list input unit 11 takes in, as learning data, a set of lists each consisting of the feature vectors of a plurality of symbol sequences and the feature vector of the corresponding correct symbol sequence. $\vec{f}_{i,0}$ and $\vec{f}_{i,j}$ are input for all i, j.
・Parameter initialization (step S2)
The parameter initialization unit 21 initializes the parameter $\vec{w}$ of the objective function L shown in Expression (2).
・Linear score calculation (step S3)
The linear score calculation unit 22a of the in-list processing unit 22 calculates the linear score $\langle \vec{w}, \vec{f} \rangle$ as the inner product of the parameter $\vec{w}$ and the feature vector $\vec{f}$.
・Exponent score calculation (step S4)
The exponent score calculation unit 22b calculates the exponent score $\exp(\langle \vec{w}, \vec{f} \rangle)$ from the linear score $\langle \vec{w}, \vec{f} \rangle$. For each i, the set $\{\exp(\langle \vec{w}, \vec{f}_{i,j} \rangle) \mid j = 0, 1, \ldots, n_i\}$, which includes the exponent score $\exp(\langle \vec{w}, \vec{f}_{i,0} \rangle)$ of the correct symbol sequence, is calculated.
・Objective function and gradient calculation (step S5)
The objective function calculation unit 23 calculates the objective function L of Expression (2) and its gradient using all the exponent scores calculated by the in-list processing unit 22 and the feature vectors. The gradient is given by Expression (3).
・Convergence determination (step S6)
The convergence determination unit 24 determines convergence of the objective function L from the gradient shown in Expression (3).
・Parameter update (step S7)
The parameter update unit 25 updates the parameter $\vec{w}$ when the convergence determination unit 24 determines that convergence has not yet been reached.
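The image of Expression (3) is not reproduced in this text. As a hedged sketch, assuming the regularized objective has the form $L = \sum_i \left[ \log \sum_{j} \exp(\langle \vec{w}, \vec{f}_{i,j} \rangle) - \langle \vec{w}, \vec{f}_{i,0} \rangle \right] + C \lVert \vec{w} \rVert^2$ (an assumption, including the L2 regularizer), differentiating term by term gives:

```latex
\frac{\partial L}{\partial \vec{w}}
  = \sum_{i} \left( \sum_{j=1}^{n_i} p_{i,j}\, \vec{f}_{i,j} - \vec{f}_{i,0} \right) + 2 C \vec{w},
\qquad
p_{i,j} = \frac{\exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle\right)}
               {\sum_{j'=1}^{n_i} \exp\left(\langle \vec{w}, \vec{f}_{i,j'} \rangle\right)}
```

The symbol $p_{i,j}$ is introduced here for illustration only; it is the normalized exponent score of the j-th symbol sequence in the i-th list, which is why step S5 needs every exponent score computed in step S4.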

Thereafter, steps S3 to S7 are repeated until the solution converges to the optimum. Parameter learning is complete upon convergence, and the optimal parameter $\vec{w}$ has then been estimated. The convergence determination condition, the parameter update condition, the constant C in Expression (2), and the like are input to the parameter estimation unit 20 from the condition input unit 12.
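Steps S1 to S7 can be sketched as a loop. The block below is a minimal illustration, not the patent's implementation: it assumes the objective is the log-sum-exp of the candidate scores minus the correct score plus C times the squared norm of the parameters, and it swaps the quasi-Newton update mentioned in the text for plain fixed-step gradient descent to stay short. The names `objective_and_gradient`, `train`, `step`, `tol`, and `max_iter` are hypothetical.

```python
import math


def dot(w, f):
    """Inner product <w, f> of two equal-length lists of floats."""
    return sum(wk * fk for wk, fk in zip(w, f))


def objective_and_gradient(w, lists, C):
    """Return (L, dL/dw) for lists = [(f_correct, [f_1, ..., f_n]), ...].

    Assumed form: L = sum_i [log sum_j exp<w,f_ij> - <w,f_i0>] + C*|w|^2.
    """
    loss = C * dot(w, w)
    grad = [2.0 * C * wk for wk in w]
    for f0, candidates in lists:
        linear = [dot(w, f) for f in candidates]        # step S3
        m = max(linear)                                 # shift to avoid overflow
        exps = [math.exp(x - m) for x in linear]        # step S4
        z = sum(exps)
        loss += m + math.log(z) - dot(w, f0)            # step S5: objective
        for e, f in zip(exps, candidates):              # step S5: gradient
            p = e / z
            for k, fk in enumerate(f):
                grad[k] += p * fk
        for k, fk in enumerate(f0):
            grad[k] -= fk
    return loss, grad


def train(lists, dim, C=0.1, step=0.1, tol=1e-6, max_iter=10000):
    """Gradient descent stand-in for the quasi-Newton method (an assumption)."""
    w = [0.0] * dim                                     # step S2
    for _ in range(max_iter):
        _, grad = objective_and_gradient(w, lists, C)   # steps S3 to S5
        if math.sqrt(dot(grad, grad)) < tol:            # step S6
            break
        w = [wk - step * gk for wk, gk in zip(w, grad)]  # step S7
    return w
```

Here `C`, `tol`, and `step` play the role of the values that the text says are supplied through the condition input unit 12.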

Next, embodiments of the present invention are described with reference to the drawings, taking as a basis the existing learning device using CRFs or GCLMs and its parameter learning method described above. In each figure, parts corresponding to those in FIGS. 1 and 2 are given the same reference numerals, and their detailed description is omitted.

FIG. 3 shows the configuration of Embodiment 1 of a learning device using CRFs or GCLMs according to the present invention, and FIG. 4 shows the parameter learning procedure in the learning device shown in FIG. 3.

In this example, a weighted list is input to the list input unit 11 (step S1). That is, symbol sequence weights are added to the input: for all i, j, the feature vectors $\vec{f}_{i,0}$, $\vec{f}_{i,j}$ and the symbol sequence weights $s_{i,0}$, $s_{i,j}$ are input.

The in-list processing unit 22 comprises a linear score calculation unit 22a, an exponent score calculation unit 22b, and a weight multiplication unit 22c. The weight multiplication unit 22c multiplies the exponent score $\exp(\langle \vec{w}, \vec{f} \rangle)$ calculated by the exponent score calculation unit 22b by the symbol sequence weight s (step S11) to obtain the weighted exponent score $s \exp(\langle \vec{w}, \vec{f} \rangle)$. This example thus realizes a learning device in which symbol sequence weights are introduced into CRFs or GCLMs, and a parameter learning method for that device.
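The in-list processing of Embodiment 1 can be sketched in a few lines. This is an illustrative sketch only; the function names `linear_score` and `weighted_exponent_score_mul` are hypothetical and simply mirror units 22a, 22b, and 22c.

```python
import math


def linear_score(w, f):
    """Unit 22a: inner product <w, f> (step S3)."""
    return sum(wk * fk for wk, fk in zip(w, f))


def weighted_exponent_score_mul(w, f, s):
    """Units 22b and 22c: exponentiate (step S4), then multiply by s (step S11)."""
    exponent_score = math.exp(linear_score(w, f))
    return s * exponent_score
```

With all weights equal to 1 this reduces to the unweighted exponent score of the existing device.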

FIG. 5 shows the configuration of Embodiment 2 of a learning device using CRFs or GCLMs according to the present invention, and FIG. 6 shows the parameter learning procedure in the learning device shown in FIG. 5.

As in Embodiment 1, a weighted list is input to the list input unit 11 (step S1).

In this example, the in-list processing unit 22 comprises a linear score calculation unit 22a, a weight addition unit 22d, and an exponent score calculation unit 22b. The weight addition unit 22d adds the symbol sequence weight s to the linear score $\langle \vec{w}, \vec{f} \rangle$ calculated by the linear score calculation unit 22a (step S21) to obtain the weighted linear score $\langle \vec{w}, \vec{f} \rangle + s$. The weighted linear score is passed to the exponent score calculation unit 22b, which calculates the weighted exponent score $\exp(\langle \vec{w}, \vec{f} \rangle + s)$. As in Embodiment 1, this example also realizes a learning device in which symbol sequence weights are introduced into CRFs or GCLMs, and a parameter learning method for that device.
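The additive variant of Embodiment 2 can be sketched alongside the multiplicative one for comparison. This is a sketch under stated assumptions: the function names are hypothetical, and the final comparison only illustrates the identity exp(x + log s) = s exp(x), which holds for any positive weight s.

```python
import math


def linear_score(w, f):
    """Unit 22a: inner product <w, f> (step S3)."""
    return sum(wk * fk for wk, fk in zip(w, f))


def weighted_exponent_score_add(w, f, s):
    """Units 22d and 22b: add s to the linear score (step S21), then exponentiate."""
    return math.exp(linear_score(w, f) + s)


def weighted_exponent_score_mul(w, f, s):
    """Embodiment 1 for comparison: exponentiate, then multiply by s."""
    return s * math.exp(linear_score(w, f))


# Feeding log(s) to the additive form gives the multiplicative form with s.
w, f, s = [0.5, -1.0], [2.0, 1.0], 3.0
same = math.isclose(weighted_exponent_score_add(w, f, math.log(s)),
                    weighted_exponent_score_mul(w, f, s))
```

In the additive form the weight may be any real number, whereas taking log(s) in the comparison requires s to be positive.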

Note that the lists (weighted lists) input to the list input unit 11 in the above description also include lists given in a network representation.

The following explains that the two ways of introducing the symbol sequence weight s in Embodiments 1 and 2 (multiplication and addition) are equivalent, and that the objective function L in the present invention is also concave.
[Introduction of two equivalent kinds of symbol sequence weights]
The objective function L in Embodiment 1 is given by the following Expression (4).

The objective function L in Embodiment 2, on the other hand, is given by the following Expression (5).

Here, using the identities x = exp(log(x)) and exp(x)exp(y) = exp(x + y), Expression (4) can be transformed into the following Expression (6).
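The images of Expressions (4) to (6) are not reproduced in this text. The following is a hedged reconstruction from the weighted exponent scores described for the two embodiments; the summation range, the presence of the weight $s_{i,0}$ in the denominator, and the L2 regularizer are all assumptions.

```latex
\begin{align}
L &= \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} s_{i,j} \exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle\right)}
       {s_{i,0} \exp\left(\langle \vec{w}, \vec{f}_{i,0} \rangle\right)}
  + C \lVert \vec{w} \rVert^{2} \tag{4} \\
L &= \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} \exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle + s_{i,j}\right)}
       {\exp\left(\langle \vec{w}, \vec{f}_{i,0} \rangle + s_{i,0}\right)}
  + C \lVert \vec{w} \rVert^{2} \tag{5} \\
L &= \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} \exp\left(\langle \vec{w}, \vec{f}_{i,j} \rangle + \log s_{i,j}\right)}
       {\exp\left(\langle \vec{w}, \vec{f}_{i,0} \rangle + \log s_{i,0}\right)}
  + C \lVert \vec{w} \rVert^{2} \tag{6}
\end{align}
```

In this reconstruction (6) follows from (4) by writing each weight as $s = \exp(\log s)$ and merging the two exponentials, and (5) and (6) differ only in that $s$ is replaced by $\log s$.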

Comparing Expressions (5) and (6) shows that s has merely been replaced by log(s). Because the logarithm is a monotonically increasing function, the two objectives differ in scaling but share the property that they are designed so that symbol sequences with larger weights have greater influence. In terms of its role, the symbol sequence weight s therefore works equivalently in both.
[Concave objective function]
To show that Expression (4) is concave, consider its transformed form, Expression (6). Append c and $\log(s_{i,j})$ as new elements to $\vec{w}$ and $\vec{f}_{i,j}$, respectively, and denote the extended vectors by $\vec{W}$ and $\vec{F}_{i,j}$. Expression (6) then becomes Expression (7).

This expression implicitly requires c = 1, but if c is also treated as a quantity to be estimated, a function of this form is known to be concave, as stated earlier. That concavity, however, is with respect to $\vec{W}$. Learning in CRFs and GCLMs is minimization with respect to the objective function's parameter $\vec{w}$, so the question of interest is whether Expression (7) is concave with respect to $\vec{w}$. L being concave with respect to $\vec{W}$ means that it is concave with respect to every element $w_k$ of $\vec{W}$ and with respect to c. Concavity with respect to $\vec{w}$ is therefore preserved.
For the same reason, Expression (5) is also concave with respect to the parameter $\vec{w}$.
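The image of Expression (7) is not reproduced in this text. As a sketch, assuming the extended vectors are $\vec{W} = (\vec{w}, c)$ and $\vec{F}_{i,j} = (\vec{f}_{i,j}, \log s_{i,j})$ as described, and leaving aside any regularization term on $\vec{w}$, the expression would take the same shape as the unweighted objective:

```latex
\langle \vec{W}, \vec{F}_{i,j} \rangle = \langle \vec{w}, \vec{f}_{i,j} \rangle + c \log s_{i,j},
\qquad
L = \sum_{i} \log
  \frac{\sum_{j=1}^{n_i} \exp\left(\langle \vec{W}, \vec{F}_{i,j} \rangle\right)}
       {\exp\left(\langle \vec{W}, \vec{F}_{i,0} \rangle\right)}
```

Two remarks on the argument, both standard facts rather than statements from the text: in conventional convex-analysis terminology a log-sum-exp of linear functions is convex, and this bowl shape, which the text calls concave, is the property that makes every local minimum of the minimization global; and fixing c = 1 restricts L to an affine subset of $\vec{W}$, which preserves that property for $\vec{w}$.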

The learning device according to the present invention described above, and the parameter learning method in that device, can be realized by a computer and a parameter learning program installed on the computer.

《How to use the model obtained by learning》
To find the optimal symbol sequence in a list, select the symbol sequence for which the inner product $\langle \vec{w}, \vec{f} \rangle$ of the learned parameter $\vec{w}$ and the feature vector $\vec{f}$ of the symbol sequence is largest.
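Selecting the best symbol sequence with a learned parameter vector amounts to an argmax over inner products. The sketch below illustrates this; the function name `select_best` and the toy vectors are hypothetical.

```python
def select_best(w, feature_vectors):
    """Return the index of the symbol sequence with the largest <w, f>."""
    scores = [sum(wk * fk for wk, fk in zip(w, f)) for f in feature_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: three candidate sequences with two features each.
best = select_best([1.0, -1.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
```

Note that the symbol sequence weights are used only during learning; selection uses the learned parameters and the feature vectors alone.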
《Verification》
The effect of the present invention was verified using the Corpus of Spontaneous Japanese (CSJ). CSJ is a database consisting of lecture speech data and its transcriptions. A training set and two evaluation sets were prepared, as shown in the table below.

Each lecture was divided into utterance units, and a 5000-best list was created for each utterance by a speech recognition system. The number of lists therefore equals the number of utterances. The symbol sequences are speech recognition results, and each list contains up to 5000 symbol sequences. The features used were uni-, bi-, and tri-gram boolean features and the speech recognition score. The rank of each symbol sequence within its list (in ascending order of word error rate) was used as the symbol sequence weight (importance). The word error rates in the table are those calculated for the recognition result with the highest recognition score in the 5000-best list output by the speech recognition system.

The symbol sequences were sorted in descending order of $\langle \vec{w}, \vec{f} \rangle$, the symbol sequence with the highest score was taken as the new speech recognition result, and the resulting word error rates were compared. In other words, the goal of model learning is to give high scores to symbol sequences with low word error rates. The results were as follows.

Weighted learning improved model performance and achieved a lower word error rate.

Claims

A learning device using Conditional Random Fields or Global Conditional Log-linear Models, comprising:
a list input unit that takes in, as learning data, a set of lists each comprising feature vectors of a plurality of symbol sequences, a feature vector of a corresponding correct symbol sequence, symbol sequence weights of the feature vectors of the plurality of symbol sequences, and a symbol sequence weight of the feature vector of the correct symbol sequence;
a parameter initialization unit that initializes parameters of an objective function;
an in-list processing unit that, for each feature vector of the plurality of symbol sequences and the correct symbol sequence, calculates a linear score as the inner product of the parameters and the feature vector, and calculates a weighted exponent score from the linear score and the symbol sequence weight of the feature vector;
an objective function calculation unit that calculates the objective function and its gradient using all the exponent scores calculated by the in-list processing unit, the feature vectors of the plurality of symbol sequences, and the feature vector of the correct symbol sequence;
a convergence determination unit that determines convergence of the objective function from the gradient; and
a parameter update unit that updates the parameters.

The learning device using Conditional Random Fields or Global Conditional Log-linear Models according to claim 1, wherein
the in-list processing unit comprises a linear score calculation unit that calculates the linear score, an exponent score calculation unit that calculates an exponent score from the linear score calculated by the linear score calculation unit, and a weight multiplication unit that calculates the weighted exponent score by multiplying the exponent score calculated by the exponent score calculation unit by the symbol sequence weight.

The learning device using Conditional Random Fields or Global Conditional Log-linear Models according to claim 1, wherein
the in-list processing unit comprises a linear score calculation unit that calculates the linear score, a weight addition unit that calculates a weighted linear score by adding the symbol sequence weight to the linear score calculated by the linear score calculation unit, and an exponent score calculation unit that calculates the weighted exponent score from the weighted linear score calculated by the weight addition unit.

A parameter learning method in a learning device using Conditional Random Fields or Global Conditional Log-linear Models, comprising:
a list input step of taking in, as learning data, a set of lists each comprising feature vectors of a plurality of symbol sequences, a feature vector of a corresponding correct symbol sequence, symbol sequence weights of the feature vectors of the plurality of symbol sequences, and a symbol sequence weight of the feature vector of the correct symbol sequence;
a parameter initialization step of initializing parameters of an objective function;
an in-list processing step of, for each feature vector of the plurality of symbol sequences and the correct symbol sequence, calculating a linear score as the inner product of the parameters and the feature vector, and calculating a weighted exponent score from the linear score and the symbol sequence weight of the feature vector;
an objective function calculation step of calculating the objective function and its gradient using all the exponent scores calculated in the in-list processing step, the feature vectors of the plurality of symbol sequences, and the feature vector of the correct symbol sequence;
a convergence determination step of determining convergence of the objective function from the gradient; and
a parameter update step of updating the parameters.

The parameter learning method in the learning device using Conditional Random Fields or Global Conditional Log-linear Models according to claim 4, wherein
the in-list processing step comprises a linear score calculation step of calculating the linear score, an exponent score calculation step of calculating an exponent score from the linear score calculated in the linear score calculation step, and a weight multiplication step of calculating the weighted exponent score by multiplying the exponent score calculated in the exponent score calculation step by the symbol sequence weight.

The parameter learning method in the learning device using Conditional Random Fields or Global Conditional Log-linear Models according to claim 4, wherein
the in-list processing step comprises a linear score calculation step of calculating the linear score, a weight addition step of calculating a weighted linear score by adding the symbol sequence weight to the linear score calculated in the linear score calculation step, and an exponent score calculation step of calculating the weighted exponent score from the weighted linear score calculated in the weight addition step.

A parameter learning program for causing a computer to function as the learning device using Conditional Random Fields or Global Conditional Log-linear Models according to any one of claims 1 to 3.