JP2011039965A

JP2011039965A - Model parameter estimation device, method, and program

Info

Publication number: JP2011039965A
Application number: JP2009189111A
Authority: JP
Inventors: Takanobu Oba (大庭隆伸); Takaaki Hori (堀貴明); Atsushi Nakamura (中村篤)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-18
Filing date: 2009-08-18
Publication date: 2011-02-24
Anticipated expiration: 2029-08-18
Also published as: JP5268825B2

Abstract

PROBLEM TO BE SOLVED: To allow a model parameter to be estimated more accurately than with conventional devices and methods.
SOLUTION: A device receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} that are assigned importances e_{i,j} and expressed by feature vectors, together with the correct symbol sequence f_{i,0} of each list i, also expressed by a feature vector, and estimates a model parameter w. The device includes an importance conversion unit and a model parameter estimation unit. The importance conversion unit converts the importances e_{i,j}, list by list, so that the importance of a prescribed symbol sequence becomes relatively larger than the importances of the other symbol sequences. The model parameter estimation unit estimates the model parameter w from the symbol sequences f_{i,j}, the correct symbol sequences f_{i,0}, and the converted importances.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a model parameter estimation apparatus, method, and program used for model learning in the problem of rearranging (reranking) symbol sequences.

In speech recognition and machine translation, accuracy can be improved by outputting multiple provisional recognition or translation results (word sequences) and then finding among them a sequence with few errors, that is, one close to the correct answer. If the word string of each correct-answer candidate output by a speech recognizer or machine translator is called a symbol sequence, and the set of output candidates is called a list, then the correct symbol sequence is generally extracted from such a list by assigning a score to each symbol sequence and sorting the symbol sequences in the list by score. The word sequence with the highest score is normally taken as the recognition or translation result; even when it is not correct, a symbol sequence close to the correct answer can be found efficiently by checking the symbol sequences in descending order of score (see Non-Patent Documents 1, 2, and 5 for speech recognition and Non-Patent Documents 3 and 4 for machine translation).

A model obtained in advance by learning is generally used to extract a target symbol sequence from a list of symbol sequences. A method of finding a sequence close to the correct answer with such a pre-trained model is described below with reference to FIG. 7.

First, a list consisting of a plurality of symbol sequences is read (S11). Each symbol sequence is generally represented by a feature vector. Typical features are N-grams and co-occurrences of words, parts of speech, phonemes, and so on, as well as dependency relations obtained by applying syntactic or dependency parsing, with their frequencies or booleans (binary presence/absence) used as feature values. The list need not be a sequence of feature vectors, however: any representation, such as a network, is acceptable as long as feature vectors can ultimately be extracted from it. A symbol sequence can be expressed as a feature vector as follows (see Non-Patent Document 3). Consider representing the symbol sequence ○○×○, drawn from the symbol set {○, ×, △}, as a feature vector. If a feature takes the value 1 when a given symbol appears in the sequence and 0 when it does not, then for ○○×○ the features for ○ and × are 1 because both appear, and the feature for △ is 0 because it does not. The feature vector collects these feature values as [1, 1, 0]^T. When the symbol sequences are natural-language word strings, additional information such as the syntactic analysis result of each sequence and its score may be attached first and included when the feature vector is built.
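As a minimal sketch of the boolean presence features described above (assuming each single symbol is one feature; the function name `presence_vector` is hypothetical), the ○○×○ example can be reproduced as follows:

```python
from typing import List, Sequence


def presence_vector(sequence: Sequence[str], symbol_set: Sequence[str]) -> List[int]:
    """Boolean feature vector: 1 if a symbol occurs anywhere in the sequence, else 0."""
    present = set(sequence)
    return [1 if symbol in present else 0 for symbol in symbol_set]


# Symbol set {○, ×, △} and symbol sequence ○○×○ from the example above.
print(presence_vector(["○", "○", "×", "○"], ["○", "×", "△"]))  # [1, 1, 0]
```

N-gram or dependency features would follow the same pattern, with tuples of symbols in place of single symbols.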

Next, a score is assigned to each symbol sequence by referring to the model obtained by learning (S12). The score can be computed in various ways. If the vector w is a model parameter obtained in advance by learning, the score S_w(f_{i,j}) of a symbol sequence f_{i,j} represented by a feature vector can be computed, for example, as S_w(f_{i,j}) = w^T f_{i,j}, where i is the index of a list (i = 1, 2, ..., N), j is the index of a symbol sequence in list i (j = 1, 2, ..., n_i), and ^T denotes transposition.
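The linear score S_w(f_{i,j}) = w^T f_{i,j} is just a dot product. A minimal sketch, with plain Python lists standing in for vectors and a hypothetical parameter vector w:

```python
from typing import Sequence


def score(w: Sequence[float], f: Sequence[float]) -> float:
    """Linear score S_w(f) = w^T f for one symbol sequence's feature vector f."""
    if len(w) != len(f):
        raise ValueError("w and f must have the same dimension")
    return sum(w_k * f_k for w_k, f_k in zip(w, f))


w = [0.5, -1.0, 2.0]  # hypothetical learned model parameter
f = [1, 1, 0]         # feature vector of one symbol sequence
print(score(w, f))    # -0.5
```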

Then, by sorting the symbol sequences f_{i,j} according to their assigned scores, the symbol sequences in the list are arranged in order of closeness to the correct answer (S13).
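Steps S12 and S13 together amount to scoring every sequence in a list and sorting by descending score. A small sketch under the linear-score assumption above (the name `rerank` and the toy values are illustrative):

```python
from typing import List, Sequence


def rerank(w: Sequence[float], features: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices j of a list's symbol sequences in descending order of w^T f_{i,j}."""
    scores = [sum(w_k * f_k for w_k, f_k in zip(w, f)) for f in features]
    # sorted() stays stable with reverse=True, so tied sequences keep their list order.
    return sorted(range(len(features)), key=lambda j: scores[j], reverse=True)


w = [1.0, -2.0]                      # hypothetical model parameter
features = [[1, 1], [1, 0], [0, 1]]  # scores: -1.0, 1.0, -2.0
print(rerank(w, features))           # [1, 0, 2]
```

The first index returned is the sequence that would be output as the recognition or translation result.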

A method of estimating the model parameter w used for score calculation is described below with reference to FIG. 8.

First, a plurality of lists, each consisting of a plurality of symbol sequences, are read (S21). The more lists are read, the more one can expect to obtain model parameters that work accurately across diverse data. The correct symbol sequence of each list is read as well. A symbol sequence identical to the correct one may or may not be contained in each list.

Next, the model parameter w is estimated by learning from the information read (S22). The parameter is estimated so that the correct symbol sequence receives a higher score than the other symbol sequences. In other words, w should be chosen so as to reduce ErrorCount, the number of symbol sequences that receive a higher score than the correct symbol sequence. For example, the w that minimizes Equation (1) is obtained.
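Equation (1) is not reproduced in this text. One reconstruction consistent with the definitions of I(x), f_{i,0}, N, and n_i given in the next paragraph is shown below; it is an inferred form, not the patent's verbatim formula:

```latex
\mathrm{ErrorCount}(w) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} I\left( w^{T} f_{i,0} - w^{T} f_{i,j} \right) \qquad (1)
```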

Here, I(x) is a function that returns 0 when x is positive and 1 otherwise, f_{i,0} is the correct symbol sequence, N is the number of lists, and n_i is the number of symbol sequences contained in list i.
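Assuming ErrorCount sums I(w^T f_{i,0} - w^T f_{i,j}) over every list and every competing sequence, it can be evaluated directly. In this sketch the input layout (pairs of a correct vector and its candidate vectors) and all names are illustrative assumptions:

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, f: Vector) -> float:
    return sum(w_k * f_k for w_k, f_k in zip(w, f))


def indicator(x: float) -> int:
    """I(x): 0 when x is positive, 1 otherwise."""
    return 0 if x > 0 else 1


def error_count(w: Vector, lists: Sequence[Tuple[Vector, Sequence[Vector]]]) -> int:
    """Count symbol sequences scoring at least as high as their list's correct sequence f_{i,0}."""
    total = 0
    for f_correct, candidates in lists:
        s_correct = dot(w, f_correct)
        total += sum(indicator(s_correct - dot(w, f)) for f in candidates)
    return total


lists = [
    ([1, 0], [[0, 1], [1, 1]]),  # second candidate ties with the correct sequence
    ([0, 1], [[1, 0]]),          # the only candidate outscores the correct sequence
]
print(error_count([1.0, 0.0], lists))  # 2
```

Because I(x) is a step function, ErrorCount is piecewise constant in w, which is why smooth surrogates such as the exponential loss below are used in practice.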

In practice, each symbol sequence f_{i,j} output by a speech recognizer or machine translator is usually assigned an importance e_{i,j} based on some evaluation measure (for example, its rank by word error rate within the list), and using these importances in parameter estimation improves estimation accuracy. For example, with the ExpLoss Boosting (ELBst) method disclosed in Non-Patent Document 3, one obtains the w that minimizes the value of L in Equation (2).
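Equation (2) is likewise missing from the extracted text. Assuming it is Collins and Koo's exponential loss with each competing sequence weighted by its importance e_{i,j} (an assumption consistent with the later remark that f_{i,0} appears in all n_i terms while each f_{i,j} appears in only one), it would read:

```latex
L(w) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} e_{i,j} \exp\left( -\left( w^{T} f_{i,0} - w^{T} f_{i,j} \right) \right) \qquad (2)
```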

For Equation (2), there is an algorithm that estimates w efficiently, particularly when the feature values are binary (0 or 1).
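A direct evaluation of the importance-weighted exponential loss, assuming the form L(w) = Σ_i Σ_j e_{i,j} exp(-(w^T f_{i,0} - w^T f_{i,j})). The data layout and names are illustrative, and this only computes L; it does not implement the efficient ELBst boosting algorithm:

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
# One list i: (f_{i,0}, [(f_{i,j}, e_{i,j}), ...])
ListData = Tuple[Vector, Sequence[Tuple[Vector, float]]]


def dot(w: Vector, f: Vector) -> float:
    return sum(w_k * f_k for w_k, f_k in zip(w, f))


def weighted_exp_loss(w: Vector, lists: Sequence[ListData]) -> float:
    """Importance-weighted exponential loss in the assumed form of Equation (2)."""
    total = 0.0
    for f_correct, candidates in lists:
        s_correct = dot(w, f_correct)
        for f, e in candidates:
            total += e * math.exp(-(s_correct - dot(w, f)))
    return total


data = [([1, 0], [([0, 1], 1.0), ([1, 1], 2.0), ([0, 0], 3.0)])]
print(weighted_exp_loss([0.0, 0.0], data))  # 6.0: every exp term equals 1 at w = 0
```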

Non-Patent Document 1: Z. Zhou, J. Gao, F. K. Soong, and H. Meng, "A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization," Proceedings of ICASSP, 2006, Vol. 1, pp. 141-144.
Non-Patent Document 2: 小林彰夫, 佐藤庄衛, 尾上和穂, 本間真一, 今井亨, 都木徹 (Akio Kobayashi et al.), "単語ラティスの識別的スコアリングによる音声認識" [Speech Recognition by Discriminative Scoring of Word Lattices], Proceedings of the Meeting of the Acoustical Society of Japan, September 2007, pp. 233-234.
Non-Patent Document 3: M. Collins and T. Koo, "Discriminative Reranking for Natural Language Parsing," Computational Linguistics, 2005, Vol. 31, No. 1, pp. 25-70.
Non-Patent Document 4: F. J. Och, "Minimum Error Rate Training in Statistical Machine Translation," Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003, pp. 160-167.
Non-Patent Document 5: B. Roark, M. Saraclar, and M. Collins, "Corrective Language Modeling for Large Vocabulary ASR with the Perceptron Algorithm," Proceedings of ICASSP, 2004, Vol. 1, pp. 749-752.

In the parameter estimation method of Equation (2), when the sum Σ for each list i is computed, the correct symbol sequence f_{i,0} appears in all n_i terms, whereas each other symbol sequence f_{i,j} appears only in the j-th term, that is, in a single term. The model parameter w that minimizes L ought to reflect tendencies common to many lists; however, if the correct symbol sequence of a particular list has a conspicuous feature, w is strongly influenced by it, and a less accurate model parameter w results. In other words, the computation places more emphasis on giving a high score to the correct answer of a single list than on giving low scores to errors common to many lists.
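Under the assumed form L(w) = Σ_i Σ_j e_{i,j} exp(-(w^T f_{i,0} - w^T f_{i,j})), the imbalance is visible in the gradient of L with respect to w. Writing ℓ_{i,j} for each exponential term, the correct vector f_{i,0} of list i is multiplied by the sum of all n_i weights e_{i,j} ℓ_{i,j}, whereas each competitor f_{i,j} receives only its own weight:

```latex
\ell_{i,j} = \exp\left( -\left( w^{T} f_{i,0} - w^{T} f_{i,j} \right) \right)

\frac{\partial L}{\partial w}
  = \sum_{i=1}^{N} \sum_{j=1}^{n_i} e_{i,j}\, \ell_{i,j} \left( f_{i,j} - f_{i,0} \right)
  = \sum_{i=1}^{N} \left[ \sum_{j=1}^{n_i} e_{i,j}\, \ell_{i,j}\, f_{i,j}
      - \left( \sum_{j=1}^{n_i} e_{i,j}\, \ell_{i,j} \right) f_{i,0} \right]
```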

An object of the present invention is to reduce the influence of this problem and to provide a model parameter estimation apparatus, method, and program capable of estimating model parameters more accurately than before.

The model parameter estimation apparatus of the present invention receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} that are each assigned an importance e_{i,j} and represented by a feature vector (i is the index of a list, i = 1, 2, ..., N; j is the index of a symbol sequence in list i, j = 1, 2, ..., n_i), together with the correct symbol sequence f_{i,0} of each list i, also represented by a feature vector, and estimates a model parameter w. The apparatus comprises an importance conversion unit and a model parameter estimation unit.

The importance conversion unit converts the importances e_{i,j}, for each list, so that the importance value of a predetermined symbol sequence becomes relatively larger than the importance values of the symbol sequences other than the predetermined symbol sequence.

The model parameter estimation unit estimates the model parameter w from the symbol sequences f_{i,j}, the correct symbol sequences f_{i,0}, and the converted importances.

According to the model parameter estimation apparatus, method, and program of the present invention, the importance assigned to each symbol sequence in a list is converted appropriately so that large importances are given to some of the incorrect symbol sequences, which relatively reduces the influence of the correct symbol sequence. As a result, model parameters can be estimated more accurately than before.

FIG. 1 is a diagram showing an example functional configuration of the model parameter estimation apparatus 100.
FIG. 2 is a diagram showing an example processing flow of the model parameter estimation apparatus 100.
FIG. 3 is a conceptual diagram showing how the shape of the Gaussian function changes with its parameter values.
FIG. 4 is a diagram showing the contents of the training, development, and evaluation sets used for verification.
FIG. 5 is a diagram showing the results of comparing the word error rates of the present invention and the prior art.
FIG. 6 is a diagram showing how the word error rate changes when the shape of the Gaussian function is varied.
FIG. 7 is a diagram showing an example flow of the symbol sequence rearrangement (reranking) process.
FIG. 8 is a diagram showing an example flow of the model learning process.

FIG. 1 shows an example functional configuration of the model parameter estimation apparatus 100 of the present invention, and FIG. 2 shows an example of its processing flow. The model parameter estimation apparatus 100 receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} that are each assigned an importance e_{i,j} and represented by a feature vector (i is the index of a list, i = 1, 2, ..., N; j is the index of a symbol sequence in list i, j = 1, 2, ..., n_i), together with the correct symbol sequence f_{i,0} of each list i, also represented by a feature vector, and estimates and outputs a model parameter w. It comprises an importance conversion unit 101 and a model parameter estimation unit 102.

The importance conversion unit 101 converts the importances e_{i,j}, for each list, so that the importance of a predetermined symbol sequence becomes relatively larger than that of the other symbol sequences, and outputs the converted importances E(e_{i,j}) (S1). Specifically, giving large importances to some of the incorrect symbol sequences relatively reduces the influence of the correct symbol sequence. The conversion function E is therefore a function with a single convex (peaked) shape or a small number of them; a Gaussian function such as Equation (3) is one example.
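Equation (3) is not reproduced in the extracted text. Assuming the standard three-parameter Gaussian function, which matches the roles of a (amplitude), b (center), and c (width) described in the next paragraph, it would be:

```latex
E_{\mathrm{gauss}}(e_{i,j}, a, b, c) = a \exp\left( -\frac{(e_{i,j} - b)^{2}}{2c^{2}} \right) \qquad (3)
```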

Here, (a, b, c) are parameters that determine the shape of the Gaussian function and are given before learning: a controls the amplitude, b controls the center position, and c controls the width. FIG. 3 shows the shape of the Gaussian function for different values of (a, b, c). In FIG. 3, e_{i,j} is assumed to be at most 5000 (larger values do not appear in the training data), and the curve beyond that point is drawn with a dotted line.
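A minimal sketch of the conversion, assuming the standard Gaussian form a·exp(-(e - b)^2 / (2c^2)). The values a = 1, b = 4000, c = 200 are hypothetical, chosen only so that importances (ranks up to 5000) near b receive the largest weight:

```python
import math


def gaussian_importance(e: float, a: float, b: float, c: float) -> float:
    """Converted importance E_gauss(e; a, b, c) = a * exp(-(e - b)^2 / (2 c^2))."""
    if c <= 0:
        raise ValueError("width c must be positive")
    return a * math.exp(-((e - b) ** 2) / (2.0 * c ** 2))


# Hypothetical shape: ranks near b = 4000 get the largest weight; rank 1 gets almost none.
for rank in (1, 3800, 4000, 4200, 5000):
    print(rank, round(gaussian_importance(rank, a=1.0, b=4000.0, c=200.0), 4))
```

A smaller c concentrates the weight on fewer sequences per list, and the value never becomes negative or infinite for finite a > 0.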

Possible methods of selecting (extracting) the symbol sequences that are given large importance include, for example:
・extracting, in each list, a predetermined number (for example, about 1 to 50) of symbol sequences in descending order of importance;
・extracting the symbol sequences whose importance is equal to or greater than a predetermined value;
・extracting a predetermined number (for example, about 1 to 50) at random with equal probability, regardless of importance;
・extracting at equal intervals, in descending order of importance.
These methods may also be combined. For example, the selection may combine the symbol sequences whose importance is equal to or greater than a predetermined value with a predetermined number of symbol sequences whose importance is below that value, the latter chosen at random or at equal intervals in descending order of importance.
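The selection methods listed above can be sketched as functions that return the indices j of the sequences to be emphasized in one list. All names, and the idea of working on index lists, are illustrative assumptions; how large the emphasized importance becomes is left to the conversion function E:

```python
import random
from typing import List, Sequence


def _descending(importances: Sequence[float]) -> List[int]:
    # Indices sorted by importance, largest first (ties keep list order).
    return sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)


def top_k(importances: Sequence[float], k: int) -> List[int]:
    """A predetermined number k of sequences in descending order of importance."""
    return _descending(importances)[:k]


def at_least(importances: Sequence[float], threshold: float) -> List[int]:
    """All sequences whose importance is >= threshold."""
    return [j for j, e in enumerate(importances) if e >= threshold]


def random_k(importances: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """k sequences drawn uniformly at random, regardless of importance."""
    return rng.sample(range(len(importances)), min(k, len(importances)))


def every_step(importances: Sequence[float], step: int) -> List[int]:
    """Every step-th sequence, starting from the most important one."""
    return _descending(importances)[::step]


def combined(importances: Sequence[float], threshold: float, k: int,
             rng: random.Random) -> List[int]:
    """Sequences at or above the threshold plus k random ones from below it."""
    above = at_least(importances, threshold)
    below = [j for j, e in enumerate(importances) if e < threshold]
    return above + rng.sample(below, min(k, len(below)))


e = [5, 1, 4, 2, 3]  # hypothetical importances of one list
print(top_k(e, 2), at_least(e, 3), every_step(e, 2))  # [0, 2] [0, 2, 4] [0, 4, 1]
```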

Selecting the symbol sequences that receive large importance is equivalent to specifying the peak positions of the filter. Once the peak positions are determined, any function that connects them may be designed, within the scope of the present invention's idea of giving some symbol sequences greater importance than others. Obviously, however, functions that clearly harm parameter estimation should not be used: for example, assigning infinite (or near-infinite) values at the peaks, assigning negative infinite (or near-negative-infinite) values between the peaks, or designing a filter that returns negative values when the parameter estimation method does not permit negative importances.

To choose the specific function E, it is advisable to try various functions E, evaluate each on a development set or the like, and select the best one (for Equation (3), the best values of (a, b, c)). Results similar to those of selecting E with a development set can also be obtained by cross-validation: the training set is divided into several parts, one part is treated as the development set while the others are used for training, and this is repeated for every combination.
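The selection of (a, b, c) described above is a grid search. In this sketch, `dev_error` is a hypothetical caller-supplied callback that trains w with the given shape and returns a development-set error (or a cross-validation average); `toy_error` is a stand-in for illustration only:

```python
import itertools
from typing import Callable, Iterable, Tuple

Params = Tuple[float, float, float]  # (a, b, c) of E_gauss


def select_shape(
    a_values: Iterable[float],
    b_values: Iterable[float],
    c_values: Iterable[float],
    dev_error: Callable[[Params], float],
) -> Tuple[Params, float]:
    """Try every (a, b, c) and keep the shape with the lowest development-set error."""
    best = None
    for params in itertools.product(a_values, b_values, c_values):
        err = dev_error(params)
        if best is None or err < best[1]:
            best = (params, err)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best


def toy_error(params: Params) -> float:
    # Stand-in for "train w with E_gauss(., a, b, c), then measure WER on a dev set".
    _, b, c = params
    return abs(b - 3000.0) / 1000.0 + c / 1000.0


print(select_shape([1.0], [1000.0, 3000.0], [100.0, 500.0], toy_error))
```

For cross-validation, `dev_error` would train on all folds but one, evaluate on the held-out fold, and return the average over folds.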

The present invention can also be applied when the symbol sequences are not assigned importances, or when all of them have the same importance, by treating, for example, a value assigned at random within the list, or the distance from the correct answer in the feature vector space, as the importance of each symbol sequence.
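Both substitutes mentioned above are easy to compute. This sketch assumes Euclidean distance (the text does not specify the metric) and uses `math.dist`, available from Python 3.8; function names are hypothetical:

```python
import math
import random
from typing import List, Sequence


def distance_importance(f_correct: Sequence[float],
                        candidates: Sequence[Sequence[float]]) -> List[float]:
    """Importance = Euclidean distance of each f_{i,j} from the correct f_{i,0}."""
    return [math.dist(f_correct, f) for f in candidates]


def random_importance(n: int, rng: random.Random) -> List[float]:
    """Importance = a random value per symbol sequence within the list."""
    return [rng.random() for _ in range(n)]


print(distance_importance([1, 0, 1], [[1, 0, 1], [0, 0, 1], [0, 1, 0]]))
```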

The model parameter estimation unit 102 calculates and outputs the model parameter w from the symbol sequences f_{i,j}, the correct symbol sequences f_{i,0}, and the converted importances E(e_{i,j}) (S2). For example, Equation (2) of the ELBst method disclosed in Non-Patent Document 3 may be modified into Equation (4), and the w that minimizes the value of L in Equation (4) is obtained.
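Equation (4) is also missing from the extracted text. Assuming Equation (2) is the importance-weighted exponential loss of Collins and Koo, Equation (4) would replace each importance e_{i,j} with its converted value E(e_{i,j}):

```latex
L(w) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} E(e_{i,j}) \exp\left( -\left( w^{T} f_{i,0} - w^{T} f_{i,j} \right) \right) \qquad (4)
```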

For Equation (4), as for Equation (2), there is an algorithm that estimates w efficiently, particularly when the feature values are binary (0 or 1).
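To show the end-to-end effect of the conversion, here is a sketch that minimizes the assumed loss Σ_i Σ_j E(e_{i,j}) exp(-(w^T f_{i,0} - w^T f_{i,j})). It deliberately swaps in plain gradient descent, not the ELBst boosting updates referred to above; the learning rate, iteration count, and data layout are hypothetical:

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
ListData = Tuple[Vector, Sequence[Tuple[Vector, float]]]  # (f_{i,0}, [(f_{i,j}, e_{i,j}), ...])


def dot(w: Vector, f: Vector) -> float:
    return sum(w_k * f_k for w_k, f_k in zip(w, f))


def loss_and_grad(w: Vector, lists: Sequence[ListData],
                  E: Callable[[float], float]) -> Tuple[float, List[float]]:
    """Value and gradient of sum_i sum_j E(e_ij) * exp(-(w^T f_i0 - w^T f_ij))."""
    loss, grad = 0.0, [0.0] * len(w)
    for f_correct, candidates in lists:
        s_correct = dot(w, f_correct)
        for f, e in candidates:
            term = E(e) * math.exp(-(s_correct - dot(w, f)))
            loss += term
            for k in range(len(w)):
                grad[k] += term * (f[k] - f_correct[k])
    return loss, grad


def estimate_w(lists: Sequence[ListData], dim: int, E: Callable[[float], float],
               lr: float = 0.1, iterations: int = 200) -> List[float]:
    """Plain gradient descent from w = 0 (a swapped-in optimizer, not ELBst)."""
    w = [0.0] * dim
    for _ in range(iterations):
        _, grad = loss_and_grad(w, lists, E)
        w = [w_k - lr * g_k for w_k, g_k in zip(w, grad)]
    return w


# Toy list: the correct sequence has feature 0, its single competitor feature 1.
lists = [([1.0, 0.0], [([0.0, 1.0], 3.0)])]
w = estimate_w(lists, dim=2, E=lambda e: 1.0)  # E could be a Gaussian conversion instead
print(dot(w, [1.0, 0.0]) > dot(w, [0.0, 1.0]))  # True
```

Passing `E=lambda e: e` reproduces the unconverted Equation (2) objective.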

The best w is generally obtained by using a development set to choose the initial values for learning, the convergence conditions, hyperparameters, and so on.

<Verification of Effects>
The effect of the present invention is verified using the Corpus of Spontaneous Japanese (CSJ), a database of lecture speech data and their transcriptions. For the verification, the training set and the development and evaluation sets shown in FIG. 4 were prepared.

The lectures were divided into utterances, and a 5000-best list was generated for each utterance with a speech recognition system; the number of lists therefore equals the number of utterances. The symbol sequences are speech recognition hypotheses, and each list contains up to 5000 of them. Uni-, bi-, and tri-gram boolean features and the speech recognition score were used as features. As the importance, the rank of each symbol sequence within its list, sorted in ascending order of word error rate, was used. The word error rates shown in FIG. 4 are computed for the recognition result with the highest recognition score in each 5000-best list output by the speech recognition system.
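The importance used in the experiment, each hypothesis's rank after sorting its list by ascending word error rate, can be sketched as below. Breaking ties by list order is an assumption, since the text does not say how ties were handled; under this definition a larger e_{i,j} marks a hypothesis with more errors:

```python
from typing import List, Sequence


def rank_importance(word_error_rates: Sequence[float]) -> List[int]:
    """e_{i,j} = 1-based rank of hypothesis j after sorting the list by ascending WER."""
    # sorted() is stable, so hypotheses with equal WER keep their list order.
    order = sorted(range(len(word_error_rates)), key=lambda j: word_error_rates[j])
    ranks = [0] * len(word_error_rates)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


print(rank_importance([0.20, 0.00, 0.50, 0.20]))  # [2, 1, 4, 3]
```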

The model parameter w was estimated both with Equation (2), which uses the importances e_{i,j} as they are, and with Equation (4), in which the importances are converted as E(e_{i,j}) = E_gauss(e_{i,j}, a, b, c). Each model was used to rerank the symbol sequences, the symbol sequence with the highest final score was taken as the speech recognition result, and the word error rates of the two approaches were compared (FIG. 5). The change in word error rate when the shape parameters (a, b, c) of E_gauss(e_{i,j}, a, b, c) are varied was also examined (FIG. 6).

FIG. 5 shows that the importance conversion of the present invention reduces the word error rate. FIG. 6 further shows that the word error rate decreases as b in E_gauss(e_{i,j}, a, b, c) becomes larger and as c becomes smaller. Since c controls the width of the Gaussian function, as noted above, this verification confirms that giving relatively large weights to a small number of symbol sequences in each list leads to a more accurate model.

As described above, according to the model parameter estimation apparatus and method of the present invention, the importance assigned to each symbol sequence in a list is converted appropriately so that large importances are given to some incorrect symbol sequences and the influence of the correct symbol sequence is relatively reduced, which makes it possible to generate more accurate model parameters than before.

When each of the above apparatuses is implemented on a computer, the processing content of the functions each apparatus should have is described by a program, and executing this program on the computer realizes those processing functions. At least part of the processing may instead be implemented in hardware. The various processes described above need not be executed in the described time order; they may be executed in parallel or individually, according to the processing capability of the executing apparatus or as needed. Other modifications may be made as appropriate without departing from the spirit of the present invention.

Claims

1. A model parameter estimation apparatus that receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} each assigned an importance e_{i,j} and represented by a feature vector (where i is the index of a list, i = 1, 2, ..., N, and j is the index of a symbol sequence in list i, j = 1, 2, ..., n_i), and the correct symbol sequence f_{i,0} of each list i, represented by a feature vector, and that estimates a model parameter w, the apparatus comprising:
an importance conversion unit that converts the importances e_{i,j}, for each list, so that the importance value of a predetermined symbol sequence becomes relatively larger than the importance values of the symbol sequences other than the predetermined symbol sequence; and
a model parameter estimation unit that estimates the model parameter w from the symbol sequences f_{i,j}, the correct symbol sequences f_{i,0}, and the converted importances.

2. The model parameter estimation apparatus according to claim 1, wherein the predetermined symbol sequences are obtained by extracting a predetermined number of symbol sequences f_{i,j} in descending order of importance.

3. The model parameter estimation apparatus according to claim 1, wherein the predetermined symbol sequences are obtained by extracting the symbol sequences f_{i,j} whose importance is equal to or greater than a predetermined value, together with a predetermined number of symbol sequences f_{i,j} whose importance is less than the predetermined value, the latter being extracted either at random or at equal intervals in descending order of importance.

4. A model parameter estimation method that receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} each assigned an importance e_{i,j} and represented by a feature vector (where i is the index of a list, i = 1, 2, ..., N, and j is the index of a symbol sequence in list i, j = 1, 2, ..., n_i), and the correct symbol sequence f_{i,0} of each list i, represented by a feature vector, and that estimates a model parameter w, the method executing:
an importance conversion step of converting the importances e_{i,j}, for each list, so that the importance value of a predetermined symbol sequence becomes relatively larger than the importance values of the symbol sequences other than the predetermined symbol sequence; and
a model parameter estimation step of estimating the model parameter w from the symbol sequences f_{i,j}, the correct symbol sequences f_{i,0}, and the converted importances.

5. The model parameter estimation method according to claim 4, wherein the predetermined symbol sequences are obtained by extracting a predetermined number of symbol sequences f_{i,j} in descending order of importance.

6. The model parameter estimation method according to claim 4, wherein the predetermined symbol sequences are obtained by extracting the symbol sequences f_{i,j} whose importance is equal to or greater than a predetermined value, together with a predetermined number of symbol sequences f_{i,j} whose importance is less than the predetermined value, the latter being extracted either at random or at equal intervals in descending order of importance.

7. A program for causing a computer to function as the apparatus according to any one of claims 1 to 3.