JP5264649B2

JP5264649B2 - Information compression model parameter estimation apparatus, method and program

Info

Publication number: JP5264649B2
Application number: JP2009189112A
Authority: JP
Inventors: 隆伸大庭; 貴明堀; 篤中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-18
Filing date: 2009-08-18
Publication date: 2013-08-14
Anticipated expiration: 2029-08-18
Also published as: JP2011039432A

Description

The present invention relates to an information compression model parameter estimation apparatus, method, and program used for model learning in the problem of reranking symbol sequences.

In speech recognition and machine translation, accuracy can be improved by outputting multiple provisional recognition or translation results (word sequences) and then finding among them the sequence with the fewest errors (the one closest to the correct answer). Call each candidate word sequence output by a speech recognizer or machine translator a symbol sequence, and the set of candidates output together a list. The correct symbol sequence is generally extracted from such a list by assigning a score to each symbol sequence and sorting the symbol sequences in the list by score. The word sequence with the highest score is normally taken as the recognition or translation result; even when it is not correct, checking the symbol sequences in descending order of score makes it possible to find a symbol sequence close to the correct answer efficiently (see Non-Patent Documents 1, 2, and 6 for speech recognition and Non-Patent Documents 3 and 4 for machine translation).

When a target symbol sequence is extracted from a list of symbol sequences, a model obtained in advance by learning is generally used. A method of finding a sequence close to the correct answer using such a model is described below with reference to FIG. 8.

First, a list consisting of a plurality of symbol sequences is read (S11). Each symbol sequence is generally represented by a feature vector. Features include N-grams and co-occurrences of words, parts of speech, phonemes, and the like, and dependency relations obtained by applying syntactic parsing or dependency analysis; their values are frequencies, booleans (a binary representation of presence or absence), and so on. The list need not be a sequence of feature vectors; any representation, such as a network, is acceptable as long as feature vectors can ultimately be extracted from it. A symbol sequence can be expressed as a feature vector as follows (see Non-Patent Document 3). Consider, for example, the symbol sequence ○○×○ over the symbol set {○, ×, △}. Suppose each feature takes the value 1 if the corresponding symbol appears in the symbol sequence and 0 otherwise. Then ○ and × appear in ○○×○ and take the value 1, while △ does not appear and takes the value 0. The feature vector is the vector of these feature values, [1, 1, 0]^T. When natural-language word sequences are handled as symbol sequences, additional information such as the parse result of each symbol sequence and its score may be added first, and the feature vector is then created from that information as well.
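
The presence/absence example can be sketched in a few lines of Python. This is only an illustration: the function name `presence_features` and the ASCII symbol names are assumptions introduced here, standing in for ○, ×, and △.

```python
def presence_features(sequence, symbol_set):
    """Return a 0/1 vector: 1 if the symbol occurs in `sequence`, else 0.

    `symbol_set` is an ordered list, so the vector positions are stable.
    """
    present = set(sequence)
    return [1 if symbol in present else 0 for symbol in symbol_set]


# Hypothetical stand-ins for the symbols in the text: circle, cross, triangle.
symbols = ["circle", "cross", "triangle"]
sequence = ["circle", "circle", "cross", "circle"]
vector = presence_features(sequence, symbols)
print(vector)  # [1, 1, 0]
```

Real systems would use much larger feature sets (N-grams, recognition scores), but the mapping from a sequence to a fixed-length vector follows the same idea.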

Next, the model obtained by learning is consulted and a score is assigned to each symbol sequence (S12). Scores can be computed in various ways. When the vector w is the model parameter obtained in advance by learning, the score S_w(f_{i,j}) of a symbol sequence f_{i,j} represented by a feature vector can be computed, for example, as S_w(f_{i,j}) = w^T f_{i,j}, where i is the list index (i = 1, 2, ..., N), j is the index of the symbol sequence within list i (j = 1, 2, ..., n_i), and T denotes transposition.

The symbol sequences f_{i,j} are then sorted by the assigned scores, which arranges the symbol sequences in the list in order of closeness to the correct answer (S13).
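
Steps S12 and S13 amount to an inner product followed by a sort. The sketch below assumes plain Python lists for w and the feature vectors; the names `score` and `rerank` are illustrative and do not come from the patent.

```python
def score(w, f):
    """Linear score S_w(f) = w^T f for one feature vector."""
    return sum(wk * fk for wk, fk in zip(w, f))


def rerank(w, candidates):
    """Sort the feature vectors of one list by descending score."""
    return sorted(candidates, key=lambda f: score(w, f), reverse=True)


w = [0.5, -1.0, 2.0]
candidates = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
ranked = rerank(w, candidates)
print(ranked[0])  # [1, 0, 1], the highest-scoring candidate
```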

A method of estimating the model parameter w used for score calculation is described below with reference to FIG. 9.

First, a plurality of lists, each consisting of a plurality of symbol sequences, is read (S21). The more lists are read, the more one can expect to obtain model parameters that work accurately across varied data. The correct symbol sequence for each list is read as well. A list may or may not contain a symbol sequence identical to its correct symbol sequence.

Next, the model parameter w is estimated by learning from the information read (S22). The parameter is estimated so that the correct symbol sequence receives a higher score than the other symbol sequences. In other words, w is chosen to reduce ErrorCount, the number of symbol sequences that receive a score greater than that of the correct symbol sequence. For example, the w that minimizes Equation (1) is obtained.

Here, I(x) is a function that returns 0 when x is positive and 1 otherwise, f_{i,0} is the correct symbol sequence, N is the number of lists, and n_i is the number of symbol sequences in list i. Non-Patent Document 5 discloses a method of determining the model parameter w by the Global Conditional Log-linear Model (GCLM) method; in that case, the w that minimizes the value of L in Equation (2) is obtained.
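
Equation (1) itself is not reproduced in this text, so the following is a sketch built only from the definitions just given: it assumes the count sums I(w^T f_{i,0} - w^T f_{i,j}) over every list i and every symbol sequence j = 1, ..., n_i. The data layout (a list of `(correct, candidates)` pairs) is an assumption made for illustration.

```python
def dot(w, f):
    return sum(wk * fk for wk, fk in zip(w, f))


def indicator(x):
    """I(x): 0 when x is positive, 1 otherwise (as defined in the text)."""
    return 0 if x > 0 else 1


def error_count(w, lists):
    """Count candidates that the correct sequence fails to outscore.

    `lists` is an iterable of (correct_vector, candidate_vectors) pairs.
    Ties count as errors because I(0) = 1.
    """
    total = 0
    for correct, candidates in lists:
        s0 = dot(w, correct)
        for f in candidates:
            total += indicator(s0 - dot(w, f))
    return total


data = [([1, 0], [[0, 1], [2, 0], [1, 1]])]
print(error_count([1, 0], data))  # 2
```

Because this count is piecewise constant in w, it cannot be minimized by gradient methods directly, which is one reason smooth surrogate losses such as those of the methods discussed next are used instead.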

Here, ‖w‖ is a norm, and using it is known to yield robust estimates. C is a hyperparameter, determined using a development set or the like. Equation (2) guarantees that the estimate of the model parameter w converges to the global optimum. Concretely, w can be estimated by a known method such as L-BFGS.

Each symbol sequence f_{i,j} output by a speech recognizer or machine translator is usually assigned an importance e_{i,j} based on some evaluation measure (for example, its rank within the list), and using this importance in parameter estimation can improve estimation accuracy. For example, with the ExpLoss Boosting (ELBst) method disclosed in Non-Patent Document 3, the w that minimizes the value of L in Equation (3) is obtained.
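
Equation (3) is not reproduced in this text. As a rough illustration of how an importance can enter a loss, the sketch below assumes an importance-weighted exponential loss of the kind used for ExpLoss in Non-Patent Document 3, L = sum_i sum_j e_{i,j} exp(-(w^T f_{i,0} - w^T f_{i,j})). That form, the function name `exp_loss`, and the data layout are assumptions, not the patent's stated equation.

```python
import math


def dot(w, f):
    return sum(wk * fk for wk, fk in zip(w, f))


def exp_loss(w, lists):
    """Importance-weighted exponential loss (assumed form, see lead-in).

    `lists` is an iterable of (correct_vector, [(candidate_vector, importance), ...]).
    A candidate that outscores the correct sequence contributes a large term,
    scaled by its importance.
    """
    loss = 0.0
    for correct, candidates in lists:
        s0 = dot(w, correct)
        for f, e in candidates:
            loss += e * math.exp(-(s0 - dot(w, f)))
    return loss


data = [([1, 0], [([0, 1], 2.0), ([1, 0], 1.0)])]
print(exp_loss([1.0, 0.0], data))
```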

For Equation (3), there is an algorithm that estimates w efficiently, particularly when the feature values are binary (0 or 1). With the Minimum Error Rate Training (MERT) method disclosed in Non-Patent Document 4, the w that minimizes the value of L in Equation (4) is obtained.

Here, α is a hyperparameter, determined using a development set or the like. With Equation (4), the model parameter w can be estimated without using the correct symbol sequence. Concretely, w can be estimated by a known method such as L-BFGS.

The model parameter estimation methods described above assume batch learning, which reads the entire training data (all lists) and optimizes over the whole. There are also online learning methods, which read the lists one at a time and update the model parameters each time. Even with online learning, however, all the lists are generally read repeatedly over multiple passes in order to obtain good parameter estimates. Because data input takes time on ordinary computers, the entire data set is often loaded into memory.
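
To make the batch/online distinction concrete, here is a minimal sketch of one common online method, a structured perceptron for reranking. The choice of the perceptron is an assumption for illustration (the text does not name a specific online algorithm here), as are the function name and the `epochs` parameter.

```python
def dot(w, f):
    return sum(wk * fk for wk, fk in zip(w, f))


def perceptron_train(lists, dim, epochs=3):
    """Online training: visit one list at a time, update w after each list.

    `lists` is a sequence of (correct_vector, candidate_vectors) pairs.
    Every epoch re-reads all lists, mirroring the multiple passes in the text.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for correct, candidates in lists:
            best = max(candidates, key=lambda f: dot(w, f))
            if best != correct:
                # Move w toward the correct vector and away from the current best.
                w = [wk + ck - bk for wk, ck, bk in zip(w, correct, best)]
    return w


training = [([1, 0], [[0, 1], [1, 0]])]
print(perceptron_train(training, dim=2))  # [1.0, -1.0]
```

Only one list needs to be held in memory per update, but since every epoch revisits every list, the data is still read many times, which is why it is often kept in memory anyway.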

Non-Patent Document 1: Z. Zhou, J. Gao, F. K. Soong, and H. Meng, "A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization," Proceedings of ICASSP, 2006, Vol. 1, pp. 141-144
Non-Patent Document 2: 小林彰夫, 佐藤庄衛, 尾上和穂, 本間真一, 今井亨, 都木徹, "Speech Recognition by Discriminative Scoring of Word Lattice" (in Japanese), Proceedings of the Acoustical Society of Japan, September 2007, pp. 233-234
Non-Patent Document 3: M. Collins and T. Koo, "Discriminative Reranking for Natural Language Parsing," Association for Computational Linguistics, 2005, Vol. 31, No. 1, pp. 25-70
Non-Patent Document 4: F. J. Och, "Minimum Error Rate Training in Statistical Machine Translation," Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, 2003, pp. 160-167
Non-Patent Document 5: B. Roark, M. Saraclar, and M. Collins, "Discriminative n-gram language modeling," Computer Speech and Language, 2007, Vol. 21, No. 2, pp. 373-392
Non-Patent Document 6: B. Roark, M. Saraclar, and M. Collins, "Corrective Language Modeling For Large Vocabulary ASR with The Perceptron Algorithm," Association for Computational Linguistics, Proceedings of ICASSP, 2004, Vol. 1, pp. 749-752

Model learning requires multiple lists, and even a single list contains many symbol sequences, so a huge number of symbol sequences must be handled overall. For example, when word sequences are the symbol sequences, as in speech recognition or machine translation, many lists are needed to produce a model that is accurate across a wide range of data, and many features must be extracted from each symbol sequence. In Non-Patent Document 5, for example, learning uses roughly 280,000 lists, each with 100 to 1000 symbol sequences. Even under the very low estimate of 100 bytes on average to store the features extracted from each symbol sequence, this consumes 1000 × 280,000 × 100 = 28 gigabytes of memory. Because such a huge working area (computer memory and the like) is needed, the task is difficult to handle on a general-purpose computer.
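
The arithmetic can be checked directly. The second call below is purely hypothetical: it assumes each list is reduced to 4 representative vectors of the same size, a figure chosen only to show the scale of the saving and not taken from the patent.

```python
def memory_bytes(num_lists, vectors_per_list, bytes_per_vector):
    """Total bytes needed to hold every feature vector in memory."""
    return num_lists * vectors_per_list * bytes_per_vector


full = memory_bytes(280_000, 1000, 100)
print(full / 10**9)  # 28.0 (decimal gigabytes)

# Hypothetical: 4 representative vectors per list after merging.
merged = memory_bytes(280_000, 4, 100)
print(merged / 10**6)  # 112.0 (decimal megabytes)
```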

An object of the present invention is to solve this problem and provide an information compression model parameter estimation apparatus, method, and program that allow a general-purpose computer to estimate model parameters with estimation accuracy equivalent to the conventional approach.

An information compression model parameter estimation apparatus according to one aspect of the present invention receives N lists i, each consisting of n_i symbol sequences f_{i,j} that are each assigned an importance e_{i,j} and represented by a feature vector (i is the list index, N is an integer of 1 or more, j is the index of the symbol sequence within list i, and n_i is an integer of 4 or more), together with the correct symbol sequence f_{i,0} of each list i, also represented by a feature vector, and estimates a model parameter w. The apparatus comprises a grouping unit, a merging unit, and a model parameter estimation unit.

For each list i, the grouping unit divides the symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the group index) by a predetermined method. The merging unit obtains a representative symbol sequence f_{i,x} from the symbol sequences f_{i,j} belonging to group G_i(x), and a representative importance e_{i,x} from the importances e_{i,j} corresponding to those symbol sequences. The model parameter estimation unit estimates the model parameter w from the representative symbol sequences f_{i,x}, the correct symbol sequences f_{i,0}, and the representative importances e_{i,x}.

An information compression model parameter estimation apparatus according to another aspect of the present invention receives N lists i, each consisting of n_i symbol sequences f_{i,j} that are each assigned an importance e_{i,j} and represented by a feature vector (i is the list index, N is an integer of 1 or more, j is the index of the symbol sequence within list i, and n_i is an integer of 4 or more), and estimates a model parameter w. The apparatus comprises a grouping unit, a merging unit, and a model parameter estimation unit.
For each list i, the grouping unit divides the symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the group index) by a predetermined method. The merging unit obtains a representative symbol sequence f_{i,x} from the symbol sequences f_{i,j} belonging to group G_i(x), and a representative importance e_{i,x} from the importances e_{i,j} corresponding to those symbol sequences. The model parameter estimation unit estimates the model parameter w from the representative symbol sequences f_{i,x} and the representative importances e_{i,x}.

An information compression model parameter estimation apparatus according to another aspect of the present invention receives N lists i, each consisting of n_i symbol sequences f_{i,j} represented by feature vectors (i is the list index, N is an integer of 1 or more, j is the index of the symbol sequence within list i, and n_i is an integer of 4 or more), together with the correct symbol sequence f_{i,0} of each list i, also represented by a feature vector, and estimates a model parameter w. The apparatus comprises a grouping unit, a merging unit, and a model parameter estimation unit.
For each list i, the grouping unit divides the symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the group index) by a predetermined method. The merging unit obtains a representative symbol sequence f_{i,x} from the symbol sequences f_{i,j} belonging to group G_i(x). The model parameter estimation unit estimates the model parameter w from the representative symbol sequences f_{i,x} and the correct symbol sequences f_{i,0}.

According to the information compression model parameter estimation apparatus, method, and program of the present invention, the symbol sequence information used for learning can be compressed while estimation accuracy equivalent to the conventional approach is maintained, so model parameters can be estimated on a general-purpose computer.

FIG. 1 is a diagram showing an example of the functional configuration of the information compression model parameter estimation apparatus 100.
FIG. 2 is a diagram showing an example of the processing flow of the information compression model parameter estimation apparatus 100.
FIG. 3 is a diagram showing the contents of the training, development, and evaluation sets used for verification.
FIG. 4 is a diagram showing the memory size required to hold the data.
FIG. 5 is a diagram showing the results of comparing the word error rates of the present invention and the prior art when the ELBst method is used to estimate the model parameters.
FIG. 6 is a diagram showing the results of comparing the word error rates of the present invention and the prior art when the GCLM method is used to estimate the model parameters.
FIG. 7 is a diagram showing the results of comparing the word error rates of the present invention and the prior art when the MERT method is used to estimate the model parameters.
FIG. 8 is a diagram showing an example of the processing flow for reranking symbol sequences.
FIG. 9 is a diagram showing an example of the processing flow for model learning.

FIG. 1 shows an example of the functional configuration of the information compression model parameter estimation apparatus 100 of the present invention, and FIG. 2 shows an example of its processing flow. The apparatus 100 receives one or more lists i, each consisting of a plurality of symbol sequences f_{i,j} that are each assigned an importance e_{i,j} and represented by a feature vector (i is the list index, i = 1, 2, ..., N; j is the index of the symbol sequence within list i, j = 1, 2, ..., n_i), together with the correct symbol sequence f_{i,0} of each list i, also represented by a feature vector, and estimates and outputs the model parameter w. It comprises a grouping unit 101, a merging unit 102, and a model parameter estimation unit 103.

The grouping unit 101 divides the symbol sequences f_{i,j} belonging to a list into a plurality of groups G_i(x) (x is the group index) by a predetermined method (S1). Any grouping method may be used. For example, a general method such as K-means may be used to group symbol sequences that are close to one another in the feature vector space, or symbol sequences with similar importance values may be grouped together. When the importance is an error rate, the symbol sequences may also be divided into a group of those close to the correct symbol sequence and a group of the rest, with the correct symbol sequence itself placed in the group of those close to the correct answer.
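
Since the grouping method is left open, the sketch below shows just one simple possibility for grouping by similar importance: sort the symbol sequences of a list by importance and cut the sorted order into contiguous, nearly equal chunks. The function name, the equal-size split, and the `num_groups` parameter are assumptions for illustration; K-means in feature space would be an equally valid choice.

```python
def group_by_importance(importances, num_groups):
    """Split indices j into `num_groups` groups of similar importance.

    Returns a list of groups, each a list of indices into `importances`.
    Earlier groups receive the extra items when the split is uneven.
    """
    order = sorted(range(len(importances)), key=lambda j: importances[j])
    size, extra = divmod(len(order), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(order[start:end])
        start = end
    return groups


ranks = [3, 1, 2, 5, 4]  # e.g. rank of each symbol sequence within its list
print(group_by_importance(ranks, 2))  # [[1, 2, 0], [4, 3]]
```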

For each group G_i(x), the merging unit 102 obtains a representative symbol sequence f_{i,x} from the symbol sequences f_{i,j} belonging to that group, and a representative importance e_{i,x} from the importances e_{i,j} corresponding to those symbol sequences (S2). Concretely, for example, the representative symbol sequence f_{i,x} is obtained by the merge function F of Equation (5) and the representative importance e_{i,x} by the merge function E of Equation (6). In Equations (5) and (6), (f_{i,j}, e_{i,j}) denotes a pair of a symbol sequence and its corresponding importance.

Examples of merging into the representative symbol sequence f_{i,x} with the merge function F include the methods shown in Equations (7) and (8).

Equation (7) obtains the representative symbol sequence as the centroid of the symbol sequences belonging to the group. Equation (8) obtains the representative symbol sequence as a weighted internal division point of the symbol sequences belonging to the group; it has the advantage that importance can be reflected in the representative vector even when the model parameter estimation method adopted cannot take importance into account. In addition to these merges, the storage area can be reduced further by quantizing each element of the feature vectors and truncating decimal places.
Methods of merging into the representative importance e_{i,x} with the merge function E include, for example, using the mean of the importances, as shown in Equation (9).
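
Equations (7) to (9) are not reproduced in this text, so the following sketch relies only on their verbal descriptions: a centroid, a weighted internal division point, and a mean of importances. How the weights of the internal division point are derived from the importances is not stated here, so `weights` is left as a caller-supplied argument; all function names are illustrative.

```python
def centroid(vectors):
    """Representative vector as the element-wise mean of the group."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def weighted_point(vectors, weights):
    """Representative vector as a weighted average (internal division point).

    `weights` must be positive; how they follow from the importances is an
    assumption left to the caller.
    """
    total = sum(weights)
    return [
        sum(wt * value for wt, value in zip(weights, column)) / total
        for column in zip(*vectors)
    ]


def mean_importance(importances):
    """Representative importance as the mean of the group's importances."""
    return sum(importances) / len(importances)


group = [[1, 0], [0, 1]]
print(centroid(group))                # [0.5, 0.5]
print(weighted_point(group, [3, 1]))  # [0.75, 0.25]
print(mean_importance([1, 2, 3]))     # 2.0
```

Note that merged vectors are generally no longer binary even when the original features are, which is why the text mentions quantization as a further way to save storage.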

The model parameter estimation unit 103 computes and outputs the model parameter w from the representative symbol sequences f_{i,x}, the correct symbol sequences f_{i,0}, and the representative importances e_{i,x} (S3). For example, Equation (3) of the ELBst method disclosed in Non-Patent Document 3 is transformed into Equation (10), and the w that minimizes the value of L in Equation (10) is obtained.

For Equation (10), there is an algorithm that estimates w efficiently, particularly when the feature values are binary (0 or 1). With the MERT method disclosed in Non-Patent Document 4, Equation (4) is transformed into Equation (11), and the w that minimizes the value of L in Equation (11) is obtained.

Here, α is a hyperparameter, determined using a development set or the like. With Equation (11), the model parameter w can be estimated without using the correct symbol sequence. Concretely, w can be estimated by a known method such as L-BFGS. Furthermore, with the GCLM method disclosed in Non-Patent Document 5, Equation (2) is transformed into Equation (12), and the w that minimizes the value of L in Equation (12) is obtained.

Here, ‖w‖ is a norm, and using it is known to yield robust estimates. C is a hyperparameter, determined using a development set or the like. Equation (12) guarantees that the estimate of the model parameter w converges to the optimal solution. Concretely, w can be estimated by a known method such as L-BFGS.

<Verification of effects>
The effect of the present invention is verified using the Corpus of Spontaneous Japanese (CSJ). CSJ is a database of lecture speech data and its transcriptions. For the verification, a training set, a development set, and two evaluation sets were prepared, as shown in FIG. 3.

Each lecture was divided into utterances, and a 5000-best list was created for each utterance with a speech recognition system. The number of lists therefore equals the number of utterances. The symbol sequences are speech recognition results, and each list contains up to 5000 symbol sequences. The features were uni-, bi-, and tri-gram booleans and the speech recognition score. The importance was the rank of each symbol sequence within its list (in ascending order of word error rate). The word error rates shown in FIG. 3 were computed for the recognition result with the highest recognition score in each 5000-best list output by the speech recognition system. Perplexity is a measure of how close the data is, computed by the language model in the speech recognition system. The perplexity values show that evaluation set B contains many properties that differ from the other sets.

The model parameter w was estimated in two ways: using all symbol sequences, and using symbol sequences merged as in the present invention (merged using Equation (8)). Each estimate was used to rerank the symbol sequences, the symbol sequence with the highest final score was taken as the speech recognition result, and the word error rates of the two were compared.

FIG. 4 shows the memory size required to hold the data in this verification. Using all data requires a storage area of several tens of gigabytes, whereas merging the symbol sequences with Equation (8) reduces storage consumption to the order of megabytes, which a general-purpose computer can handle.

FIG. 5 compares the word error rates of the speech recognition results obtained with model parameters learned by Equation (10), which is based on the ELBst method. The error rates are comparable whether learning uses all data or the compressed data merged with Equation (8). When learning uses all data, that is, when multiple symbol sequences are used for one correct answer, the ELBst method may cause the parameter estimate to be strongly influenced by the correct symbol sequence. With Equation (8) and no importance merging, this influence is removed, which is thought to be why the resulting model is more accurate than the one learned from all data. In this verification, performing importance merging increases the variation in importance between lists and lowers accuracy. Even so, accuracy on evaluation set B remains comparable to that obtained with all data. Because merging with importance has expressive power that subsumes merging without importance, this result is taken to indicate that importance merging can work effectively depending on how the importance is designed and on the application of the present invention.

FIG. 6 compares the word error rates of the speech recognition results obtained with model parameters learned by Equation (12), which is based on the GCLM method. The GCLM method has no framework for handling importance. Nevertheless, learning with all data produces a model whose performance equals or exceeds that of the ELBst method. One possible reason is convergence to a global optimum. When symbol sequences are merged, importance is expressed in the feature vector space, so an even more accurate model is produced.

FIG. 7 compares the word error rates of the speech recognition results obtained with model parameters learned by Equation (11), which is based on the MERT method. Comparing the use of all data with merging by Equation (8), accuracy drops substantially on evaluation set B, but performance is equivalent on evaluation set A, whose characteristics resemble those of the learning and development sets.
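The reranking step used in this verification can be illustrated with a minimal sketch. It assumes that the score of a symbol sequence is the inner product of the model parameter w and its feature vector; the function names `score` and `rerank` and the numeric values are hypothetical and do not come from the experiments.

```python
from typing import List, Sequence


def score(w: Sequence[float], f: Sequence[float]) -> float:
    """Assumed linear score: inner product of model parameter w and feature vector f."""
    return sum(wk * fk for wk, fk in zip(w, f))


def rerank(w: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return the candidate indices of one list, sorted by descending score."""
    return sorted(range(len(candidates)),
                  key=lambda j: score(w, candidates[j]),
                  reverse=True)


# Illustrative values only (not taken from the patent's data).
w_example = [1.0, -0.5]
candidates_example = [[0.2, 0.4], [0.9, 0.1], [0.5, 0.5]]

order = rerank(w_example, candidates_example)
best = order[0]  # index of the sequence that would be output as the recognition result
```

The word error rate is then measured on the top-ranked sequence of each list, once for each of the two estimates of w.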

As described above, the information compression model parameter estimation apparatus and method of the present invention compress the information of the symbol sequences used for learning while keeping learning accuracy comparable to the conventional level, so that model parameter estimation can be performed on a general-purpose computer.

When each of the above apparatuses is implemented by a computer, the processing content of the functions that each apparatus should have is described by a program. Executing this program on the computer implements the above processing functions on the computer. In this case, at least part of the processing content may be implemented in hardware. The various processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, according to the processing capability of the apparatus executing them or as needed. Other modifications may be made as appropriate without departing from the spirit of the present invention.

Claims

An information compression model parameter estimation apparatus that estimates a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each assigned an importance e_{i,j} and represented by a feature vector, and from a correct symbol sequence f_{i,0} of each list i, each represented by a feature vector, the apparatus comprising:
a grouping unit that, for each list i, groups the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging unit that obtains a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtains a representative importance e_{i,x} from the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation unit that estimates the model parameter w from the representative symbol sequence f_{i,x}, the correct symbol sequence f_{i,0}, and the representative importance e_{i,x}.

An information compression model parameter estimation apparatus that estimates a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each assigned an importance e_{i,j} and represented by a feature vector, the apparatus comprising:
a grouping unit that, for each list i, groups the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging unit that obtains a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtains a representative importance e_{i,x} from the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation unit that estimates the model parameter w from the representative symbol sequence f_{i,x} and the representative importance e_{i,x}.

An information compression model parameter estimation apparatus that estimates a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each represented by a feature vector, and from a correct symbol sequence f_{i,0} of each list i, each represented by a feature vector, the apparatus comprising:
a grouping unit that, for each list i, groups the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging unit that obtains a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation unit that estimates the model parameter w from the representative symbol sequence f_{i,x} and the correct symbol sequence f_{i,0}.

The information compression model parameter estimation apparatus according to claim 1 or 2,
wherein the grouping unit performs the grouping based on a distance in the feature vector space or on the value of the importance.
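As a purely illustrative sketch of grouping by importance value, the following splits the symbol sequences of one list into contiguous groups after sorting them by importance. The equal-size chunking rule and the name `group_by_importance` are assumptions made for this example; the claim does not prescribe a particular grouping rule.

```python
from typing import List, Sequence


def group_by_importance(importances: Sequence[float], num_groups: int) -> List[List[int]]:
    """Group the sequence indices j of one list i by importance e_{i,j}.

    Hypothetical rule: sort indices by importance, then cut the sorted
    order into num_groups contiguous chunks of nearly equal size.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    if not importances:
        return []
    order = sorted(range(len(importances)), key=lambda j: importances[j])
    k = min(num_groups, len(order))
    base, extra = divmod(len(order), k)
    groups: List[List[int]] = []
    start = 0
    for x in range(k):
        size = base + (1 if x < extra else 0)  # the first `extra` groups get one more member
        groups.append(order[start:start + size])
        start += size
    return groups


# Illustrative importances for a list of five symbol sequences.
e_example = [0.9, 0.1, 0.5, 0.3, 0.7]
groups_example = group_by_importance(e_example, 2)
```

Grouping by distance in the feature vector space could be sketched in the same way with any standard clustering procedure in place of the sort-and-chunk rule.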

The information compression model parameter estimation apparatus according to claim 1 or 2,
wherein the merging unit obtains the representative symbol sequence f_{i,x} as the centroid or a weighted internal division point of the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtains the representative importance e_{i,x} as the average of the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x).
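The merge described here can be sketched as follows, under stated assumptions: feature vectors are plain lists of floats, the weighted internal division point is taken to be a weighted average with non-negative weights, and the name `merge_group` is hypothetical. This is not claimed to reproduce Equation (8) of the description.

```python
from typing import List, Optional, Sequence, Tuple


def merge_group(
    features: Sequence[Sequence[float]],
    importances: Sequence[float],
    members: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], float]:
    """Merge the symbol sequences of one group G_i(x).

    Returns the representative feature vector (centroid when weights is None,
    otherwise a weighted average) and the mean of the members' importances.
    """
    if not members:
        raise ValueError("a group must contain at least one symbol sequence")
    if weights is None:
        weights = [1.0] * len(members)  # equal weights give the centroid
    total = sum(weights)
    dim = len(features[members[0]])
    representative = [
        sum(wt * features[j][k] for wt, j in zip(weights, members)) / total
        for k in range(dim)
    ]
    representative_importance = sum(importances[j] for j in members) / len(members)
    return representative, representative_importance


# Illustrative values only.
features_example = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
importances_example = [1.0, 2.0, 3.0]

centroid, mean_importance = merge_group(features_example, importances_example, [0, 1, 2])
weighted, _ = merge_group(features_example, importances_example, [0, 1, 2],
                          weights=[1.0, 1.0, 2.0])
```

After merging, each group contributes a single feature vector and a single importance to learning, which is where the reduction in stored data comes from.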

An information compression model parameter estimation method for estimating a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each assigned an importance e_{i,j} and represented by a feature vector, and from a correct symbol sequence f_{i,0} of each list i, each represented by a feature vector, the method comprising:
a grouping step of grouping, for each list i, the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging step of obtaining a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtaining a representative importance e_{i,x} from the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation step of estimating the model parameter w from the representative symbol sequence f_{i,x}, the correct symbol sequence f_{i,0}, and the representative importance e_{i,x}.

An information compression model parameter estimation method for estimating a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each assigned an importance e_{i,j} and represented by a feature vector, the method comprising:
a grouping step of grouping, for each list i, the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging step of obtaining a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtaining a representative importance e_{i,x} from the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation step of estimating the model parameter w from the representative symbol sequence f_{i,x} and the representative importance e_{i,x}.

An information compression model parameter estimation method for estimating a model parameter w from N lists i (i is the index of a list; N is an integer of 1 or more), each consisting of n_i symbol sequences f_{i,j} (j is the index of a symbol sequence within list i; n_i is an integer of 4 or more) that are each represented by a feature vector, and from a correct symbol sequence f_{i,0} of each list i, each represented by a feature vector, the method comprising:
a grouping step of grouping, for each list i, the plurality of symbol sequences f_{i,j} belonging to the list into a plurality of groups G_i(x) (x is the index of a group);
a merging step of obtaining a representative symbol sequence f_{i,x} from the plurality of symbol sequences f_{i,j} belonging to the group G_i(x); and
a model parameter estimation step of estimating the model parameter w from the representative symbol sequence f_{i,x} and the correct symbol sequence f_{i,0}.

The information compression model parameter estimation method according to claim 6 or 7,
wherein the grouping step performs the grouping based on a distance in the feature vector space or on the value of the importance.

The information compression model parameter estimation method according to claim 6 or 7,
wherein the merging step obtains the representative symbol sequence f_{i,x} as the centroid or a weighted internal division point of the plurality of symbol sequences f_{i,j} belonging to the group G_i(x), and obtains the representative importance e_{i,x} as the average of the plurality of importances e_{i,j} corresponding to the plurality of symbol sequences f_{i,j} belonging to the group G_i(x).

A program for causing a computer to function as the apparatus according to any one of claims 1 to 5.