JP2017016384A

JP2017016384A - Mixed coefficient parameter learning device, mixed occurrence probability calculation device, and programs thereof

Info

Publication number: JP2017016384A
Application number: JP2015132347A
Authority: JP
Inventors: 正熊野; Tadashi Kumano
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2015-07-01
Filing date: 2015-07-01
Publication date: 2017-01-19

Abstract

PROBLEM TO BE SOLVED: To provide a mixing coefficient parameter learning device that improves the accuracy of a mixed occurrence probability. SOLUTION: A mixing coefficient parameter learning device 30 includes: first occurrence probability request means 312 for requesting a hidden layer vector and an occurrence probability from a neural network language model calculation device 10; second occurrence probability input means 313 for requesting an occurrence probability from another language model calculation device 20; first mixing coefficient calculation means 314 for calculating a mixing coefficient from the hidden layer vector; mapping vector update means 315 for updating a mapping vector by stochastic gradient descent; update rate reduction means 317 for reducing an update rate; and termination condition determination means 316 for causing the mapping vector update means 315 to update the mapping vector until a termination condition is satisfied. SELECTED DRAWING: Figure 2

Description

The present invention relates to a mixing coefficient parameter learning device that learns the parameters needed to calculate a mixing coefficient, to a mixed occurrence probability calculation device that calculates a mixed occurrence probability from a neural network probability model and another probability model, and to programs for these devices.

A statistical language model (hereinafter, "language model") is defined as a means for calculating the probability p(w_1 w_2 … w_n) that a word sequence w_1 w_2 … w_n occurs in a given language or domain, together with the list of statistics required for that calculation. Probabilistic modeling of language occurrence with a language model is one of the most basic techniques of statistical natural language processing, and is used in various natural language processing technologies, including speech recognition and machine translation.

A language model is a probability model of the occurrence of expressions (word sequences) in a given language or in a specific field of that language, and is generally learned from a previously provided corpus of that language or field.
A corpus is a collection of actual word sequences observed in a given language or in a specific field of that language.
Here, w_1, w_2, and w_n each represent a word.

The occurrence probability p(w_1 w_2 … w_n) of a word sequence is generally modeled as the product of the probabilities that each word in the sequence occurs given the words before it as its preceding context, that is, p(w_1) × p(w_2 | w_1) × p(w_3 | w_1 w_2) × … × p(w_n | w_1 w_2 … w_{n−1}). In other words, a language model is a model that predicts the next word given the preceding context.
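The product form above can be illustrated with a minimal Python sketch. The function `sequence_probability` and the toy `uniform_model` are hypothetical names introduced here for illustration; any callable that returns p(word | context) could be substituted.

```python
import math


def sequence_probability(words, next_word_prob):
    """Return p(w_1 ... w_n) as the product of p(w_i | w_1 ... w_{i-1}).

    next_word_prob(word, context) is any next-word model (an assumption of
    this sketch). Log probabilities are summed to avoid numerical underflow.
    """
    log_p = 0.0
    for i, word in enumerate(words):
        context = tuple(words[:i])  # every word before position i
        log_p += math.log(next_word_prob(word, context))
    return math.exp(log_p)


def uniform_model(word, context):
    """Hypothetical toy model: uniform over a 4-word vocabulary."""
    return 0.25


prob = sequence_probability(["a", "b", "c"], uniform_model)
```

With the toy model every factor is 0.25, so the three-word sequence receives 0.25 cubed.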

The most common way to implement a language model is the n-gram language model. The n-gram language model restricts the conditioning context to the most recent n−1 words (where n is an integer of 1 or more), collects from a learning corpus the frequency of each next word for every distinct preceding context of n−1 words, and estimates the next-word occurrence probability under each preceding context from those counts.
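The counting scheme just described can be sketched as follows. This is a simplified illustration that uses plain relative frequencies with no smoothing, which is an assumption of the sketch (practical n-gram models normally add smoothing or back-off). The names `train_ngram`, `ngram_prob`, and the sentence-start symbol `<s>` are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram(corpus, n):
    """Count next-word frequencies for every distinct (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Pad with sentence-start symbols so early words have a full context.
        padded = ["<s>"] * (n - 1) + list(sentence)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts


def ngram_prob(counts, context, word):
    """Relative-frequency estimate of p(word | context); 0.0 if unseen."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())


corpus = [["i", "went", "home"], ["i", "went", "out"]]
counts = train_ngram(corpus, 2)
p_home = ngram_prob(counts, ("went",), "home")
```

In this toy corpus "went" is followed once by "home" and once by "out", so the estimate splits evenly between them.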

To estimate the occurrence probability of the next word accurately, an n-gram language model must refer to a long preceding context (that is, use a large n). It must also collect enough examples for each preceding context; however, the number of distinct preceding contexts grows as the context gets longer, so a very large learning corpus is required to improve accuracy.

In recent years, language models implemented with neural networks have been proposed as an alternative to the n-gram language model. In this approach, a neural network learns a mapping from each word to a word representation vector of fixed dimension whose components are real values, and the preceding context is represented by combining the word representation vectors of the words in the preceding word sequence.

For example, the NNLM (Neural Network Language Model) described in Non-Patent Document 1 constructs a neural network as shown in FIG. 5. In the following, a language model is assumed to handle only a finite vocabulary of |V| word types, with each word represented by a number from 1 to |V|. The |V| word types always include a special word that represents the beginning of a sentence. Let C(w) be the word representation vector, of a predetermined fixed dimension m, that corresponds to each word w. For the occurrence of a word sequence w_1 w_2 … w_t, let x(t) = [C(w_{t−n+1}), …, C(w_{t−2}), C(w_{t−1})] be the (n−1)×m-dimensional input vector obtained by concatenating the n−1 word representation vectors that represent the n−1 words of context preceding the word w_t, and let Hx(t) be the linear mapping from x(t) to a vector of a predetermined dimension h.
If the preceding context is shorter than n−1 words (that is, if t < n), the input vector x(t) is created by inserting n−t words representing the beginning of a sentence before the word w_1.

Let Uz(t) be the linear mapping to a |V|-dimensional vector y(t) from the hidden layer vector z(t), which is obtained by transforming each dimension of the linear mapping Hx(t) with a nonlinear function f (for example, the hyperbolic tangent function tanh).
Let the output vector p(t) be the |V|-dimensional vector obtained by transforming each dimension of y(t) with the function shown in equation (1). The probability that the next word is w_t is then defined by equations (1) to (3) (where y_i is the value of the i-th dimension of y).
The (t) in the input vector x(t) indicates that x is the input vector for the occurrence probability of the next word w_t following the preceding context w_1 w_2 … w_{t−1} (the same applies to the other vectors).
Each "○" in FIG. 5 represents an element of a vector.
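Equations (1) to (3) are not reproduced in this text. The following Python sketch shows one forward pass consistent with the description, assuming that the function of equation (1) is the usual softmax and that f is tanh. The helper names and the toy parameter values are hypothetical, and word ids are 0-based here for convenience.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(y):
    """Assumed form of equation (1): p_i = exp(y_i) / sum_j exp(y_j)."""
    peak = max(y)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


def nnlm_forward(context_ids, C, H, U):
    """Return (z(t), p(t)) for the n-1 context word ids."""
    # x(t): concatenation of the word representation vectors C(w).
    x = [value for word_id in context_ids for value in C[word_id]]
    z = [math.tanh(v) for v in matvec(H, x)]  # z(t) = f(H x(t))
    y = matvec(U, z)                          # y(t) = U z(t)
    return z, softmax(y)


# Toy parameters: |V| = 3 words, m = 2, n - 1 = 2 context words, h = 2.
C = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]
H = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.2, 0.0, 0.1]]
U = [[0.2, -0.1], [0.0, 0.3], [-0.2, 0.1]]
z, p = nnlm_forward([0, 2], C, H, U)
```

The returned `p` is a distribution over the vocabulary, and `z` is the hidden layer vector that the later sections reuse to compute the mixing coefficient.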

With n, m, and h set in advance, for each word w_t of the learning corpus, the preceding context w_{t−n+1}, …, w_{t−2}, w_{t−1} is input to the neural network, which outputs the probability distribution of the next word (forward propagation). The cross-entropy error between the output vector and the correct vector is then propagated backward through the neural network, and the word representation vectors C, the weights H from the input layer to the hidden layer, and the weights U from the hidden layer to the output layer are updated by stochastic gradient descent as in equations (4) to (6) (where ε is the update rate). Learning is carried out by repeating this over the entire learning corpus several times.
The correct vector is the vector in which the occurrence probability of the word w_t is 1 and the occurrence probability of every other word is 0.
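Equations (4) to (6) are not reproduced in this text. As a hedged sketch, assuming they are the standard gradient steps on the cross-entropy error (written E(t) here, a symbol introduced only for illustration), the updates would take the following form. Because the correct vector is 1 only at the word w_t, the cross-entropy error reduces to the negative log of the output component for w_t.

```latex
\begin{align*}
E(t) &= -\log p_{w_t}(t) \\
C(w) &\leftarrow C(w) - \varepsilon \, \frac{\partial E(t)}{\partial C(w)} \\
H &\leftarrow H - \varepsilon \, \frac{\partial E(t)}{\partial H} \\
U &\leftarrow U - \varepsilon \, \frac{\partial E(t)}{\partial U}
\end{align*}
```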

As a result of learning the word representation vectors C, similar words are mapped to nearby word representation vectors, and as a result of learning H, similar words are mapped to nearby hidden layer vectors. High accuracy can therefore be obtained even when learning from a small learning corpus.

As a technique different from NNLM, the RNNLM (Recurrent Neural Network Language Model) described in Non-Patent Document 2 has been proposed. As described above, NNLM calculates the hidden layer vector z(t) representing the preceding context of a word w_t from a predetermined number n−1 of words w_{t−n+1}, …, w_{t−2}, w_{t−1}. In contrast, as shown in FIG. 6, RNNLM calculates the hidden layer vector z(t) from the immediately preceding word w_{t−1} and the hidden layer vector z(t−1) representing the preceding context of that word w_{t−1}. This allows RNNLM to predict the next word in a way that reflects a long preceding context without being given an explicit context length n.
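The text does not give the RNNLM update formula. The sketch below assumes a simple recurrent form, z(t) = tanh(H C(w_{t−1}) + W z(t−1)), in which the recurrent weight matrix W, the function name `rnn_hidden_step`, and all toy values are hypothetical. It is meant only to show how the hidden layer vector can carry context of unbounded length.

```python
import math


def rnn_hidden_step(word_vec, prev_hidden, H, W):
    """One recurrent step (assumed form): z(t) = tanh(H C(w_{t-1}) + W z(t-1))."""
    z = []
    for h_row, w_row in zip(H, W):
        total = sum(a * x for a, x in zip(h_row, word_vec))
        total += sum(a * v for a, v in zip(w_row, prev_hidden))
        z.append(math.tanh(total))
    return z


# Toy parameters: m = 2 word dimensions, h = 3 hidden dimensions.
C = {"<s>": [0.0, 0.0], "i": [0.5, -0.2], "went": [0.1, 0.9]}
H = [[0.3, -0.1], [0.2, 0.4], [-0.5, 0.1]]
W = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, 0.1, 0.0]]

z = [0.0, 0.0, 0.0]  # initial hidden layer vector
for word in ["<s>", "i", "went"]:
    # Each step folds one more word into the running context summary.
    z = rnn_hidden_step(C[word], z, H, W)
```

After the loop, `z` depends on every word seen so far, even though each step only reads the previous word and the previous hidden layer vector.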

In general, these neural network language models are used in combination with another language model (for example, an n-gram language model). Specifically, let p_N be the occurrence probability given by the neural network language model, p_O the occurrence probability given by the other language model, and λ the mixing ratio. The mixed occurrence probability p is then calculated, as in equation (7), by mixing the occurrence probabilities p_N and p_O of the two language models in the ratio λ : 1−λ.

In equation (7), λ is the mixing coefficient. Usually, after both trained language models have been prepared, the value of λ that gives the highest accuracy on a separately prepared test corpus is determined, and that value is then used as a fixed constant.
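Equation (7) is not reproduced in this text, but mixing in the ratio λ : 1−λ corresponds to p = λ p_N + (1−λ) p_O. The sketch below shows this fixed mixing together with a simple way of choosing λ on held-out data. Interpreting "highest accuracy" as highest log-likelihood, and searching a small candidate grid, are assumptions of this sketch; the function names are hypothetical.

```python
import math


def mix_fixed(p_n, p_o, lam):
    """Fixed linear mixing: lam * p_N + (1 - lam) * p_O."""
    return lam * p_n + (1.0 - lam) * p_o


def choose_fixed_lambda(pairs, candidates):
    """Pick the candidate lambda with the highest held-out log-likelihood.

    pairs holds one (p_N, p_O) tuple per word of the test corpus.
    """
    def log_likelihood(lam):
        return sum(math.log(mix_fixed(p_n, p_o, lam)) for p_n, p_o in pairs)

    return max(candidates, key=log_likelihood)


held_out = [(0.9, 0.1), (0.8, 0.2)]
best = choose_fixed_lambda(held_out, [0.1, 0.5, 0.9])
```

Whatever value is selected this way is applied to every context alike, which is the limitation the invention addresses.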

The occurrence probabilities are mixed in this way for the following reasons.
1) A neural network language model has no general method for estimating the occurrence probability of words that did not appear in the learning corpus (unknown words), whereas an n-gram language model can assign an appropriate occurrence probability to unknown words.
2) Training a neural network language model requires far more computation than training an n-gram language model. It is therefore practical to train the neural network language model on a small learning corpus specialized for a domain and to combine it with an n-gram language model trained on a broader, large-scale learning corpus.

Non-Patent Document 1: Yoshua Bengio et al., "A Neural Probabilistic Language Model," Journal of Machine Learning Research 3 (2003), 1137-1155.
Non-Patent Document 2: Tomas Mikolov, "Static Language Model based on Neural Network."

However, when a neural network language model is used in this way, a fixed mixing coefficient is applied regardless of the preceding context, which lowers the accuracy of the mixed occurrence probability. For example, accurately predicting the word that follows the preceding context 「私は」 ("I", followed by the topic marker) requires learning from a huge corpus. By contrast, the word that follows the preceding context 「私は山にいきまし」 ("I went to the mountain", cut off before the final inflection) can hardly be anything other than 「た」 or 「て」, and can be predicted accurately even after learning from a small corpus. In other words, the accuracy of the mixed occurrence probability can be improved by using a mixing coefficient that varies with the preceding context.

In view of the problem described above, an object of the present invention is to provide a mixing coefficient parameter learning device, a mixed occurrence probability calculation device, and programs for them that improve the accuracy of the mixed occurrence probability.

In view of the problem described above, the mixing coefficient parameter learning device according to the present invention learns the parameters needed to calculate the mixing coefficient used when mixing the occurrence probabilities of the next element given a preceding element sequence, as obtained respectively by a neural network probability model and by another probability model other than the neural network probability model. The device comprises first occurrence probability input means, second occurrence probability input means, first mixing coefficient calculation means, mapping vector update means, update rate reduction means, and termination condition determination means.

With this configuration, the first occurrence probability input means of the mixing coefficient parameter learning device receives the hidden layer vector of the neural network probability model and the occurrence probability obtained by the neural network probability model.

That is, when a preceding element sequence is input to a trained neural network probability model, a generalized representation of the preceding elements is obtained as the hidden layer vector of the model. Therefore, by learning a mapping vector from the hidden layer vector of the trained neural network probability model to the mixing coefficient, a mixing coefficient that depends on the preceding element sequence can be obtained.

The second occurrence probability input means of the mixing coefficient parameter learning device receives the occurrence probability obtained by the other probability model.
The first mixing coefficient calculation means calculates the mixing coefficient by linearly mapping the hidden layer vector to a real-valued scalar with a preset mapping vector and then nonlinearly transforming that scalar with a sigmoid function.

The mapping vector update means updates the mapping vector, which is the parameter to be learned, by stochastic gradient descent using the occurrence probabilities obtained by the neural network probability model and by the other probability model, the mixing coefficient, and a preset update rate.

The update rate reduction means decreases the update rate according to a preset update rate reduction rule.
The termination condition determination means determines whether a preset termination condition is satisfied and, until it is satisfied, causes the mapping vector update means to update the mapping vector with the reduced update rate. One example of such a termination condition is that the occurrence probability no longer changes even when the update rate is decreased.

Also in view of the problem described above, the mixed occurrence probability calculation device according to the present invention calculates a mixed occurrence probability by mixing the occurrence probabilities of the next element given a preceding element sequence, as obtained respectively by a neural network probability model and by another probability model other than the neural network probability model. The device comprises third occurrence probability input means, fourth occurrence probability input means, second mixing coefficient calculation means, and mixed occurrence probability calculation means.

With this configuration, the third occurrence probability input means of the mixed occurrence probability calculation device receives the hidden layer vector of the neural network and the occurrence probability obtained by the neural network probability model.
The fourth occurrence probability input means receives the occurrence probability obtained by the other probability model.

The second mixing coefficient calculation means calculates a mixing coefficient that depends on the preceding element sequence by linearly mapping the hidden layer vector to a real-valued scalar with the mapping vector learned by the mixing coefficient parameter learning device according to the present invention and then nonlinearly transforming that scalar with a sigmoid function.

The mixed occurrence probability calculation means calculates the mixed occurrence probability by using the mixing coefficient that depends on the preceding element sequence to mix the occurrence probabilities of the next element obtained by the neural network probability model and by the other probability model.

The present invention provides the following advantageous effect.
According to the present invention, a mapping vector from the hidden layer vector of a trained neural network probability model to the mixing coefficient is learned. A mixing coefficient that depends on the preceding element sequence is thereby obtained, so the accuracy of the mixed occurrence probability can be improved.

FIG. 1 is an explanatory diagram of the mapping vector learning procedure in the present invention.
FIG. 2 is a block diagram showing the configuration of the mixed occurrence probability calculation system according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the mixing coefficient parameter calculation device of FIG. 2.
FIG. 4 is a flowchart showing the operation of the mixed occurrence probability calculation device of FIG. 2.
FIG. 5 is an explanatory diagram outlining the processing of a conventional NNLM.
FIG. 6 is an explanatory diagram outlining the processing of a conventional RNNLM.

The mixed occurrence probability calculation system 1 according to an embodiment of the present invention is described below.
First, the mapping vector learning procedure and the mixed occurrence probability calculation procedure are described with reference to FIG. 1. The configuration of the mixed occurrence probability calculation system 1 is described after that.

Assume here that there is a neural network language model calculation device 10 that has an h-dimensional hidden layer vector representing the preceding context (preceding element sequence) and that calculates the occurrence probability of the next word as a mapping from this hidden layer vector. The statistics that this neural network language model calculation device 10 needs to calculate occurrence probabilities are assumed to have already been learned from a learning corpus or the like.

Assume also that there is another language model calculation device 20 (FIG. 2) that estimates the occurrence probability of the next word with another language model, such as an n-gram language model. The statistics that this other language model calculation device 20 needs to calculate occurrence probabilities are likewise assumed to have already been learned from a learning corpus or the like.
The learning corpus used to learn the statistics of the other language model calculation device 20 need not be the same as the learning corpus used to learn the statistics of the neural network language model calculation device 10.

<Mapping vector learning procedure>
When a preceding context is input to the neural network language model calculation device 10, a generalized representation of the preceding context is obtained as the hidden layer vector. The present invention therefore adds to the neural network language model calculation device 10, as shown in FIG. 1, processing that maps the hidden layer vector z to the mixing coefficient λ, and learns the mapping vector S used for this mapping.

Specifically, the processing of equations (8) and (9) is added to the processing of the neural network language model so that the mixing coefficient λ(t), which is needed to calculate the occurrence probability of the next word w_t given the preceding context … w_{t−1}, is calculated.

Equation (8) represents the nonlinear transformation, by a sigmoid function, from the real-valued scalar s(t) to the mixing coefficient λ(t).
Equation (9) represents the linear mapping Sz(t) from the hidden layer vector z(t) to the real-valued scalar s(t), where b is a bias value.
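Equations (8) and (9) are not reproduced in this text, but the description (a linear mapping with bias followed by a sigmoid) corresponds to s(t) = S z(t) + b and λ(t) = 1 / (1 + exp(−s(t))). A minimal Python sketch follows; the function names are hypothetical, and the two-branch sigmoid is an implementation choice made here to avoid overflow for large negative s.

```python
import math


def sigmoid(s):
    """Numerically stable sigmoid, as described for equation (8)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def mixing_coefficient(z, S, b):
    """lambda(t) from the hidden layer vector z(t), mapping vector S, bias b."""
    s = sum(s_i * z_i for s_i, z_i in zip(S, z)) + b  # s(t) = S z(t) + b
    return sigmoid(s)


lam_neutral = mixing_coefficient([0.0, 0.0], [1.0, 2.0], 0.0)
lam_high = mixing_coefficient([0.5, 0.5], [1.0, 2.0], 0.0)
```

Because the sigmoid maps any real s(t) into the open interval (0, 1), the result can always be used directly as a mixing ratio.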

The mapping vector S and the bias value b are learned by steps 1 to 3 below so that, for each word w_t in some learning corpus, the mixed occurrence probability p(w_t | … w_{t−1}) is maximized. This mixed occurrence probability is obtained by mixing the occurrence probability p_N(w_t | … w_{t−1}) and the occurrence probability p_O(w_t | … w_{t−1}) according to equation (10), using the mixing coefficient λ(t) defined by equation (8).

The learning corpus may be the same as, or different from, the corpus used to train the neural network language model or the other language model.
The occurrence probability p_N(w_t | … w_{t−1}) is the occurrence probability of the next word w_t obtained by giving the preceding context … w_{t−1} to the neural network language model.
The occurrence probability p_O(w_t | … w_{t−1}) is the occurrence probability of the next word w_t obtained by giving the preceding context … w_{t−1} to the other language model.
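The equations referred to in this part of the description are not reproduced in this text. As a hedged reconstruction, the first two lines below restate the mixing and the mixing coefficient as the description defines them. The last two lines are the gradients that follow by the chain rule if the quantity being maximized is taken to be log p, which is an assumption of this sketch; the arguments of p, p_N, and p_O are abbreviated.

```latex
\begin{align*}
p(w_t \mid \dots w_{t-1}) &= \lambda(t)\, p_N(w_t \mid \dots w_{t-1})
  + \bigl(1 - \lambda(t)\bigr)\, p_O(w_t \mid \dots w_{t-1}) \\
\lambda(t) &= \frac{1}{1 + e^{-s(t)}}, \qquad s(t) = S z(t) + b \\
\frac{\partial \log p}{\partial S_i} &= \frac{p_N - p_O}{p}\,
  \lambda(t)\bigl(1 - \lambda(t)\bigr)\, z_i(t) \\
\frac{\partial \log p}{\partial b} &= \frac{p_N - p_O}{p}\,
  \lambda(t)\bigl(1 - \lambda(t)\bigr)
\end{align*}
```

The sign of p_N − p_O determines the direction of the update: when the neural network language model assigns the observed word a higher probability than the other model does, s(t) is pushed up for that context, and otherwise it is pushed down.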

Step 1. Set the update rate ε in advance.
Step 2. Execute the following processes (a) to (c) for each word w_t in the learning corpus.
(a) Input the preceding context … w_{t−1} to the neural network language model calculation device 10, truncated as appropriate (when the context length is fixed, as in NNLM, cut to that length), and perform forward propagation to obtain the hidden layer vector z(t) and the occurrence probability p_N(w_t | … w_{t−1}) of the next word w_t. In the same way, obtain the occurrence probability p_O(w_t | … w_{t−1}) from the other language model.

(b) Obtain the mixing coefficient λ(t) by forward propagation from the hidden layer, that is, by using equations (8) and (9).

(c) Update the mapping vector S by stochastic gradient descent. That is, update each dimension S_i of the h-dimensional mapping vector S by stochastic gradient descent reflecting the mixing coefficient λ(t), as in equations (11) and (12).

The bias value b in equation (9) is also learned, so b is likewise updated, as in equations (13) and (14).

When the mapping vector S is updated in step 2(c), regularization may be applied, for example as in equation (15), to prevent overfitting of the neural network language model. The bias value b may be regularized in the same way as the mapping vector S.
In equation (15), β is the regularization coefficient. For example, β is set to a value smaller than the update rate ε.

Step 3. Return to step 2 and repeat the processing until a predetermined termination condition is met, decreasing the update rate ε each time according to a predetermined update rate reduction rule.
The termination condition and the update rate reduction rule are described in detail later.
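Steps 1 to 3 can be sketched in Python as follows. Several parts are assumptions of this sketch rather than details taken from the text: the objective is taken to be log p, the update rate is reduced by a constant factor `decay` after each pass, the termination condition is a change in corpus log-likelihood below `tolerance`, and the regularization term has the form −β S_i. All function and parameter names are hypothetical. Each sample is a tuple (z, p_N, p_O) assumed to have been computed by the two language models as in step 2(a).

```python
import math


def sigmoid(s):
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def mixing_coefficient(z, S, b):
    return sigmoid(sum(s_i * z_i for s_i, z_i in zip(S, z)) + b)


def corpus_log_likelihood(samples, S, b):
    total = 0.0
    for z, p_n, p_o in samples:
        lam = mixing_coefficient(z, S, b)
        total += math.log(lam * p_n + (1.0 - lam) * p_o)
    return total


def learn_mapping_vector(samples, h, epsilon=0.5, beta=0.0, decay=0.5,
                         tolerance=1e-6, max_epochs=100):
    S, b = [0.0] * h, 0.0                      # initial mapping vector, bias
    previous = corpus_log_likelihood(samples, S, b)
    for _ in range(max_epochs):                # step 1 fixed epsilon above
        for z, p_n, p_o in samples:            # step 2, inputs from (a)
            lam = mixing_coefficient(z, S, b)  # step 2(b)
            p = lam * p_n + (1.0 - lam) * p_o
            # Step 2(c): gradient of log p with respect to s(t).
            grad_s = (p_n - p_o) * lam * (1.0 - lam) / p
            S = [s_i + epsilon * grad_s * z_i - beta * s_i
                 for s_i, z_i in zip(S, z)]
            b += epsilon * grad_s
        current = corpus_log_likelihood(samples, S, b)
        if abs(current - previous) < tolerance:  # step 3: assumed stop rule
            break
        previous = current
        epsilon *= decay                         # step 3: assumed decay rule
    return S, b


# Toy data: the neural model is better when z = [1.0], worse when z = [-1.0].
samples = [([1.0], 0.8, 0.2), ([-1.0], 0.1, 0.7)]
S, b = learn_mapping_vector(samples, h=1)
```

On this toy data the learned S becomes positive, so contexts resembling the first sample receive a larger share of the neural network language model.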

<Procedure for calculating the mixed occurrence probability>
The occurrence probability of the next word w_t is calculated by steps 4 to 6 below, using the learning result described above.

Step 4. Input the preceding context … w_{t−1} to the neural network language model calculation device, truncated as appropriate (when the context length is fixed, as in NNLM, cut to that length), and perform forward propagation to obtain the hidden layer vector z(t) and the occurrence probability p_N(w_t | … w_{t−1}) of the next word w_t. In the same way, obtain the occurrence probability p_O(w_t | … w_{t−1}) from the other language model.
This step 4 is the same processing as step 2(a) of the mapping vector learning procedure.

Step 5. Obtain the mixing coefficient λ(t) by forward propagation from the hidden layer, that is, by substituting the learned mapping vector S and bias value b into equation (9). This step 5 is the same processing as step 2(b) of the mapping vector learning procedure.
Step 6. Obtain the mixed occurrence probability p(w_t | … w_{t−1}) using equation (16).
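Steps 4 to 6 can be sketched in a few lines of Python. Equation (16) is not reproduced in this text; the sketch assumes it has the same mixing form, λ(t) p_N + (1 − λ(t)) p_O, that the description gives for the learning procedure. The function name is hypothetical, and z, p_N, and p_O are taken as already computed by the two language models (step 4).

```python
import math


def mixed_occurrence_probability(z, p_n, p_o, S, b):
    """Context-dependent mixing of p_N and p_O with learned S and b."""
    # Step 5: s(t) = S z(t) + b, then the sigmoid gives lambda(t).
    s = sum(s_i * z_i for s_i, z_i in zip(S, z)) + b
    lam = 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))
    # Step 6: mix the two occurrence probabilities.
    return lam * p_n + (1.0 - lam) * p_o


p_even = mixed_occurrence_probability([0.0, 0.0], 0.3, 0.1, [1.5, -0.5], 0.0)
p_tilted = mixed_occurrence_probability([1.0, 0.0], 0.3, 0.1, [1.5, -0.5], 0.0)
```

The result always lies between p_O and p_N, and how close it sits to each one now depends on the hidden layer vector of the current context.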

The configuration of the mixed occurrence probability calculation system 1 according to an embodiment of the present invention is described with reference to FIG. 2.

The mixed occurrence probability calculation system 1 calculates a mixed occurrence probability p by mixing the occurrence probability p_N obtained with a neural network language model and the occurrence probability p_O obtained with another language model. As shown in FIG. 2, the mixed occurrence probability calculation system 1 includes a neural network language model calculation device 10, an other language model calculation device 20, a mixing coefficient parameter learning device 30, and a mixed occurrence probability calculation device 40.

[Configuration of the neural network language model calculation device]
The neural network language model calculation device 10 computes the occurrence probability p_N using a neural network language model. For example, it can use a neural network with a hidden layer (e.g., NNLM or RNNLM).

Specifically, when the previous context w_1, w_2, …, w_{t-1} is input, the neural network language model calculation device 10 computes the occurrence probability p_N(w_t | … w_{t-1}) of the word w_t that follows that context. In addition, when computing the output layer vector p(t) of the neural network, the device stores the hidden layer vector z(t) calculated from the input layer vector x(t), and outputs the stored hidden layer vector z(t) to the mixing coefficient parameter learning device 30 or the mixed occurrence probability calculation device 40.

In the case of NNLM, the length of the previous context that the neural network language model calculation device 10 can refer to is limited to a predetermined number of words, n-1, counted back from the end of the previous context (n is an integer of 1 or more).
For example, when the previous context is w_1, w_2, …, w_{t-1}, the referable previous context is w_{t-n+1}, w_{t-n+2}, …, w_{t-1}.
The neural network language model calculation device 10 stores a word representation vector C(w) for each word of the input previous context. When a previous context w_{t-n+1}, w_{t-n+2}, …, w_{t-1} of length n-1 is input, it concatenates the word representation vectors C(w) of those words and sets the result as the input layer vector x(t) of the neural network. The device then performs forward propagation and calculates the hidden layer vector z(t) and the output layer vector p(t) of the neural network.
The output layer vector p(t) has one dimension per distinct word, and the value in each dimension is the occurrence probability of the word corresponding to that dimension. The hidden layer vector z(t) is referred to as the "hidden layer representation of the previous context w_1, w_2, …, w_{t-1}".
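As a minimal sketch of how the NNLM input layer described above could be assembled, the following Python fragment keeps only the last n-1 words of the previous context and concatenates their word representation vectors. The function name `build_input_layer`, the dictionary `C`, and the toy two-dimensional vectors are illustrative assumptions and do not come from the embodiment.

```python
def build_input_layer(context, embeddings, n):
    """Concatenate the word vectors C(w) of the last n-1 context words.

    `embeddings` is a hypothetical lookup table {word: list of floats}.
    """
    # For n == 1 no context word is referable, so the window is empty.
    window = context[-(n - 1):] if n > 1 else []
    x = []
    for word in window:
        x.extend(embeddings[word])
    return x


# Toy word representation vectors (assumed values, for illustration only).
C = {"the": [0.1, 0.2], "cat": [0.3, 0.4], "sat": [0.5, 0.6]}

# With n = 3, only the last two words of the context are used.
x_t = build_input_layer(["the", "cat", "sat"], C, n=3)
```

Here `x_t` holds the vectors for "cat" and "sat" placed end to end, which corresponds to the input layer vector x(t) for a context window of n-1 = 2 words.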

In the case of RNNLM, the neural network language model calculation device 10 internally stores a hidden layer vector z whose previous context is the word sequence w_1, w_2, … input so far in order. In the initial state, the hidden layer vector z is set to an initial value specific to the neural network language model calculation device 10.
When the i-th word w_i is input, the device sets the input layer x(i) to a vector in which only the dimension corresponding to w_i is 1 and all other dimensions are 0. It then performs forward propagation from the input layer x(i) and the stored hidden layer vector z(i) of the previous input, and calculates the hidden layer vector z(i+1) and the output layer vector p(i+1). Once the words up to w_{t-1} have been input and propagated forward, the hidden layer vector z(t) is equivalent to the "hidden layer representation of the previous context w_1, w_2, …, w_{t-1}" in the NNLM case. That is, the device calculates the output layer vector p(t) by forward propagation using the hidden layer vector z(t) and obtains the occurrence probability p_N of the next word.
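The recurrent step described above can be sketched as follows. The text specifies only a one-hot input layer and forward propagation from x(i) and z(i) to z(i+1) and p(i+1). The sigmoid hidden activation, the softmax output, and the weight matrices `U`, `W`, `V` (names and toy values) are assumptions chosen to resemble a common RNNLM formulation.

```python
import math


def one_hot(index, size):
    """Vector with 1 in the dimension of the input word and 0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(a):
    """Normalize scores into probabilities (shifted by the max for stability)."""
    top = max(a)
    e = [math.exp(x - top) for x in a]
    total = sum(e)
    return [x / total for x in e]


def rnnlm_step(word_index, z_prev, U, W, V):
    """One forward step: (x(i), z(i)) -> (z(i+1), p(i+1)). Assumed architecture."""
    x = one_hot(word_index, len(U[0]))
    pre = [a + b for a, b in zip(matvec(U, x), matvec(W, z_prev))]
    z_next = [1.0 / (1.0 + math.exp(-a)) for a in pre]  # sigmoid (assumption)
    p_next = softmax(matvec(V, z_next))                 # softmax (assumption)
    return z_next, p_next


# Toy model: vocabulary of 3 words, hidden layer of 2 units (assumed values).
U = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]   # input -> hidden
W = [[0.5, 0.0], [0.0, 0.5]]              # hidden -> hidden
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hidden -> output

z = [0.0, 0.0]  # device-specific initial hidden state
for idx in [0, 2]:  # feed the previous context word by word
    z, p = rnnlm_step(idx, z, U, W, V)
```

After the loop, `z` plays the role of the hidden layer representation of the previous context and `p` holds one probability per vocabulary word.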

It is assumed that the neural network language model calculation device 10 has already been trained (that is, the mapping matrices for forward propagation have been set to appropriate values using training data) and that the training result is stored.
Because the neural network language model calculation device 10 has a general configuration, further description is omitted.

[Configuration of the other language model calculation device]
The other language model calculation device 20 computes the occurrence probability p_O using a language model other than a neural network language model (for example, an n-gram language model). Specifically, when the previous context w_1, w_2, …, w_{t-1} is input, it computes and outputs the occurrence probability p_O(w_t | … w_{t-1}) of any word w_t that follows that context.
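To make the role of the other language model concrete, the following sketch builds a bigram model with add-one smoothing and exposes it as a function of the previous context and the next word. The embodiment names an n-gram language model only as an example. The bigram order, the smoothing method, and the name `train_bigram` are assumptions for illustration.

```python
from collections import Counter


def train_bigram(words):
    """Return p_O(word | context) for a bigram model with add-one smoothing."""
    unigrams = Counter(words[:-1])                 # counts of history words
    bigrams = Counter(zip(words[:-1], words[1:]))  # counts of adjacent pairs
    vocab_size = len(set(words))

    def prob(context, word):
        # Only the last context word is used by a bigram model.
        prev = context[-1] if context else None
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob


p_o = train_bigram("a b a b a c".split())
value = p_o(["a"], "b")  # (count(a, b) + 1) / (count(a) + |V|) = 3 / 6
```

With an empty context the function falls back to a uniform distribution over the vocabulary, which is one simple way to handle the first word of a sequence.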

It is assumed that the various parameters needed to calculate probability values have been set in advance in the other language model calculation device 20.
Because the other language model calculation device 20 has a general configuration, further description is omitted.

[Configuration of the mixing coefficient parameter learning device]
The mixing coefficient parameter learning device 30 learns the parameters needed to calculate the mixing coefficient λ used when mixing the occurrence probabilities p_N and p_O obtained with the neural network language model and the other language model, respectively.

As shown in FIG. 2, the mixing coefficient parameter learning device 30 includes mixing coefficient parameter storage means 301, learning parameter storage means 302, learning data storage means 303, mixing coefficient storage means 304, initialization means 311, first occurrence probability request means (first occurrence probability input means) 312, second occurrence probability request means (second occurrence probability input means) 313, first mixing coefficient calculation means 314, mapping vector update means 315, termination condition determination means 316, and update rate reduction means 317.

The mixing coefficient parameter storage means 301 is storage, such as a memory or a hard disk, that holds the mixing coefficient parameters needed to calculate the mixing coefficient λ. Specifically, it stores mixing coefficient parameters such as the mapping vector S and the bias value b. The mapping vector S has the same number of dimensions, h, as the hidden layer vector z of the neural network.

The learning parameter storage means 302 is storage, such as a memory or a hard disk, that holds the parameters needed to learn the mapping vector S. Specifically, it stores learning parameters such as the update rate ε and the regularization coefficient β.

The learning data storage means 303 is storage, such as a memory or a hard disk, that holds the word string used as learning data for the mapping vector S. This learning data need not be the same as the data used to train the neural network language model calculation device 10 and the other language model calculation device 20.
The mixing coefficient storage means 304 is storage, such as a memory or a hard disk, that holds the mixing coefficient λ.

The initialization means 311 initializes the mixing coefficient parameters and the learning parameters. Specifically, it initializes the value of each dimension of the mapping vector S and the bias value b in the mixing coefficient parameter storage means 301 with random numbers. It also initializes the update rate ε and the regularization coefficient β in the learning parameter storage means 302 with preset values.

The first occurrence probability request means 312 requests the hidden layer vector z and the occurrence probability p_N by outputting a previous context from the learning data storage means 303 to the neural network language model calculation device 10. In response, it receives the hidden layer vector z and the occurrence probability p_N from the neural network language model calculation device 10. It then outputs the received hidden layer vector z and occurrence probability p_N to the first mixing coefficient calculation means 314 and the mapping vector update means 315.

The second occurrence probability request means 313 requests the occurrence probability p_O by outputting a previous context from the learning data storage means 303 to the other language model calculation device 20. The previous context it outputs is the same one that the first occurrence probability request means 312 outputs. In response, it receives the occurrence probability p_O from the other language model calculation device 20 and outputs it to the mapping vector update means 315.

The first mixing coefficient calculation means 314 uses Equation (9) to map the hidden layer vector z received from the first occurrence probability request means 312 linearly onto a real-valued scalar s, using the mapping vector S in the mixing coefficient parameter storage means 301. It then uses Equation (8) to transform the scalar s nonlinearly with a sigmoid function, which yields the mixing coefficient λ. Finally, it stores the calculated mixing coefficient λ in the mixing coefficient storage means 304.
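The two-stage calculation described above (a linear map to a scalar followed by a sigmoid) can be sketched as below. Equations (8) and (9) themselves are not reproduced in this part of the description, so the concrete form s = S·z + b and the function name `mixing_coefficient` are assumptions based on the surrounding text.

```python
import math


def mixing_coefficient(z, S, b=0.0):
    """Map hidden vector z to a mixing coefficient in (0, 1).

    Assumed form: s = S . z + b (linear map), lambda = sigmoid(s).
    """
    s = sum(s_k * z_k for s_k, z_k in zip(S, z)) + b
    # Numerically stable sigmoid: avoids overflow in exp() for large |s|.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


lam_neutral = mixing_coefficient([0.0, 0.0], [1.0, 1.0])       # s = 0
lam_high = mixing_coefficient([1.0, 2.0], [0.5, 0.5], b=0.5)   # s = 2.0
```

Because the sigmoid maps any real s into the open interval (0, 1), the result can be used directly as an interpolation weight, and it varies with the previous context through z.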

The mapping vector update means 315 updates the mapping vector S in the mixing coefficient parameter storage means 301 by stochastic gradient descent, using the occurrence probability p_N from the first occurrence probability request means 312, the occurrence probability p_O from the second occurrence probability request means 313, the mixing coefficient λ in the mixing coefficient storage means 304, and the update rate ε in the learning parameter storage means 302. That is, the mapping vector update means 315 updates the mapping vector S using the stochastic gradient descent method expressed by Equations (11) and (12).
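The following is one plausible form of such an update, offered only as a sketch. Equations (11) and (12) are not reproduced in this part of the description, so the code assumes that the objective is the log of the mixed probability p = λ·p_N + (1-λ)·p_O with an L2 penalty weighted by β, and that λ = sigmoid(S·z + b). The actual equations of the embodiment may differ in objective, sign convention, or regularization.

```python
def sgd_update(S, b, z, lam, p_n, p_o, eps, beta=0.0):
    """One stochastic gradient step on S and b (assumed objective, see above).

    Assumes p_mix > 0, i.e. at least one model gives the observed word
    a non-zero probability.
    """
    p_mix = lam * p_n + (1.0 - lam) * p_o
    # d(log p_mix)/ds, using d(lambda)/ds = lambda * (1 - lambda) for a sigmoid.
    g = (p_n - p_o) * lam * (1.0 - lam) / p_mix
    # Gradient ascent on the log probability, with L2 shrinkage on S.
    S_new = [s_k + eps * (g * z_k - beta * s_k) for s_k, z_k in zip(S, z)]
    b_new = b + eps * g
    return S_new, b_new


# The neural model is better on this word (p_n > p_o), so the step
# moves S and b in the direction that raises lambda for this context.
S_new, b_new = sgd_update(S=[0.0, 0.0], b=0.0, z=[1.0, 2.0],
                          lam=0.5, p_n=0.4, p_o=0.1, eps=0.1)
```

Under these assumptions the step size is proportional both to the update rate ε and to how strongly the two models disagree on the observed word.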

The termination condition determination means 316 determines whether a preset termination condition is satisfied and, until it is, causes the mapping vector update means 315 to keep updating the mapping vector S with the update rate ε as decreased by the update rate reduction means 317 described below. For example, the termination condition is judged to be satisfied when the value of the mixed occurrence probability p has not changed after the update rate ε has been decreased a preset number of times.

When the termination condition is not satisfied, the termination condition determination means 316 instructs the update rate reduction means 317 to decrease the update rate ε. It then instructs the first occurrence probability request means 312, the second occurrence probability request means 313, the first mixing coefficient calculation means 314, and the mapping vector update means 315 to execute their processing again.
When the termination condition is satisfied, the termination condition determination means 316 ends the processing.
In FIG. 2, command signals from the termination condition determination means 316 are drawn as broken lines.

The update rate reduction means 317 decreases the update rate ε in the learning parameter storage means 302 as necessary, according to a preset update rate reduction rule. One example of such a rule is to subtract a preset value from the current value of the update rate ε.

[Configuration of the mixed occurrence probability calculation device]
The mixed occurrence probability calculation device 40 calculates the mixed occurrence probability p by mixing the occurrence probabilities p_N and p_O obtained with the neural network language model and another probability model, respectively. As shown in FIG. 2, the mixed occurrence probability calculation device 40 includes target data storage means 401, mixed occurrence probability storage means 402, third occurrence probability request means (third occurrence probability input means) 411, fourth occurrence probability request means (fourth occurrence probability input means) 412, second mixing coefficient calculation means 413, and mixed occurrence probability calculation means 414.

The target data storage means 401 is storage, such as a memory or a hard disk, that holds the word string representing the previous context and next word for which the mixed occurrence probability p is to be calculated. The word string in the target data storage means 401 differs from the word string in the learning data storage means 303.
The mixed occurrence probability storage means 402 is storage, such as a memory or a hard disk, that holds the mixed occurrence probability p.

The third occurrence probability request means 411 requests the hidden layer vector z and the occurrence probability p_N by outputting a previous context from the target data storage means 401 to the neural network language model calculation device 10. In response, it receives the hidden layer vector z and the occurrence probability p_N from the neural network language model calculation device 10. It then outputs the received hidden layer vector z and occurrence probability p_N to the second mixing coefficient calculation means 413 and the mixed occurrence probability calculation means 414.

The fourth occurrence probability request means 412 requests the occurrence probability p_O by outputting a previous context from the target data storage means 401 to the other language model calculation device 20. The previous context it outputs is the same one that the third occurrence probability request means 411 outputs. In response, it receives the occurrence probability p_O from the other language model calculation device 20 and outputs it to the mixed occurrence probability calculation means 414.

The second mixing coefficient calculation means 413 uses Equation (9) to map the hidden layer vector z received from the third occurrence probability request means 411 linearly onto a real-valued scalar s, using the mapping vector S in the mixing coefficient parameter storage means 301. It then uses Equation (8) to transform the scalar s nonlinearly with a sigmoid function, which yields the mixing coefficient λ. Finally, it stores the calculated mixing coefficient λ in the mixing coefficient storage means 304.

The mixed occurrence probability calculation means 414 calculates the mixed occurrence probability p by mixing the occurrence probability p_N received from the third occurrence probability request means 411 and the occurrence probability p_O received from the fourth occurrence probability request means 412, using the mixing coefficient λ in the mixing coefficient storage means 304. It then stores the calculated mixed occurrence probability p in the mixed occurrence probability storage means 402.
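The mixing step can be illustrated with a linear interpolation of the two probabilities. Equation (16) is not reproduced in this part of the description, so the assignment of λ to the neural network model and 1-λ to the other model is an assumption. The embodiment may assign the weights the other way round. The function name `mixed_probability` is likewise hypothetical.

```python
def mixed_probability(lam, p_n, p_o):
    """Interpolate two occurrence probabilities with weight lam.

    Assumed form: p = lam * p_N + (1 - lam) * p_O.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("mixing coefficient must lie in [0, 1]")
    return lam * p_n + (1.0 - lam) * p_o


p_mix = mixed_probability(0.25, p_n=0.4, p_o=0.2)    # 0.25*0.4 + 0.75*0.2
p_nn_only = mixed_probability(1.0, p_n=0.4, p_o=0.2)  # falls back to p_N
```

Since λ lies between 0 and 1, the mixed value always lies between the two input probabilities, and a distribution mixed this way still sums to 1 over the vocabulary.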

[Operation of the mixing coefficient parameter learning device]
The operation of the mixing coefficient parameter learning device 30 is described with reference to FIG. 3 (see also FIG. 2 as appropriate).

The mixing coefficient parameter learning device 30 uses the initialization means 311 to initialize the mixing coefficient parameters, such as the mapping vector S and the bias value b (step S1).
The mixing coefficient parameter learning device 30 uses the initialization means 311 to initialize the learning parameters, such as the update rate ε and the regularization coefficient β (step S2).
The mixing coefficient parameter learning device 30 initializes the counter i to 1 (step S3).

Using the first occurrence probability request means 312, the mixing coefficient parameter learning device 30 takes the first i-1 words w_1, w_2, …, w_{i-1} of the word string w_1, w_2, …, w_N in the learning data storage means 303 and outputs them to the neural network language model calculation device 10 as the previous context.
The first occurrence probability request means 312 then receives from the neural network language model calculation device 10 the hidden layer vector z(i) and the occurrence probability p_N(w_i | w_1 w_2 … w_{i-1}) of the next word w_i (step S4).

Using the second occurrence probability request means 313, the mixing coefficient parameter learning device 30 outputs the same previous context w_1, w_2, …, w_{i-1} as in step S4 to the other language model calculation device 20.
The second occurrence probability request means 313 then receives from the other language model calculation device 20 the occurrence probability p_O(w_i | w_1 w_2 … w_{i-1}) of the next word w_i (step S5).

Using the first mixing coefficient calculation means 314, the mixing coefficient parameter learning device 30 calculates the mixing coefficient λ(i) according to Equations (8) and (9) from the hidden layer vector z(i) received in step S4 and the mapping vector S (step S6).

Using the mapping vector update means 315, the mixing coefficient parameter learning device 30 updates the mapping vector S by Equations (11) and (12), using the occurrence probability p_N(w_i | w_1 w_2 … w_{i-1}) received in step S4, the occurrence probability p_O(w_i | w_1 w_2 … w_{i-1}) received in step S5, the mixing coefficient λ(i) calculated in step S6, and the update rate ε (step S7).

The mixing coefficient parameter learning device 30 increments the counter i (step S8).
The mixing coefficient parameter learning device 30 determines whether the counter i is less than or equal to the maximum number of words N (step S9).
If the counter i is less than or equal to N (Yes in step S9), the mixing coefficient parameter learning device 30 returns to step S4.

If the counter i exceeds N (No in step S9), the mixing coefficient parameter learning device 30 uses the termination condition determination means 316 to determine whether the termination condition is satisfied (step S10).
If the termination condition is satisfied (Yes in step S10), the mixing coefficient parameter learning device 30 ends the processing.

If the termination condition is not satisfied (No in step S10), the mixing coefficient parameter learning device 30 uses the update rate reduction means 317 to decrease the update rate ε as necessary according to the update rate reduction rule (step S11), and returns to step S3.
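The control flow of steps S1 to S11 can be sketched as a nested loop. This is a simplified illustration under several assumptions: the two language models are passed in as hypothetical callables `nn_model(context, word)` and `other_model(context, word)`, the update in step S7 uses an assumed log-probability objective (Equations (11) and (12) are not reproduced here), the reduction rule subtracts a fixed `eps_step`, and the termination condition is replaced by a fixed number of passes `max_passes`.

```python
import math
import random


def train_mapping_vector(words, nn_model, other_model, hidden_size,
                         eps=0.1, eps_step=0.02, beta=0.0,
                         max_passes=5, seed=0):
    """Skeleton of steps S1-S11 (all numeric defaults are assumed values)."""
    rng = random.Random(seed)
    # S1: initialize the mapping vector S and bias b with random numbers.
    S = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
    b = rng.uniform(-0.1, 0.1)
    # S2: eps and beta arrive as preset values via the arguments.
    for _ in range(max_passes):          # S10: stand-in termination condition
        for i in range(len(words)):      # S3, S8, S9: sweep over the word string
            context, word = words[:i], words[i]
            z, p_n = nn_model(context, word)   # S4: hidden vector and p_N
            p_o = other_model(context, word)   # S5: p_O for the same context
            s = sum(a * c for a, c in zip(S, z)) + b
            lam = 1.0 / (1.0 + math.exp(-s))   # S6: mixing coefficient
            p = lam * p_n + (1.0 - lam) * p_o  # assumed mixing form, p > 0
            g = (p_n - p_o) * lam * (1.0 - lam) / p
            # S7: assumed gradient step with L2 shrinkage on S.
            S = [a + eps * (g * c - beta * a) for a, c in zip(S, z)]
            b += eps * g
        eps = max(eps - eps_step, 0.0)   # S11: subtract a preset value
    return S, b


# Stub models: the neural model is always better, so lambda should rise.
S, b = train_mapping_vector(
    ["a", "b", "c"],
    nn_model=lambda context, word: ([1.0], 0.5),
    other_model=lambda context, word: 0.1,
    hidden_size=1,
)
```

With the stub models above, every update pushes S and b upward, so the learned mixing coefficient for this toy hidden vector ends up above 0.5.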

[Operation of the mixed occurrence probability calculation device]
The operation of the mixed occurrence probability calculation device 40 is described with reference to FIG. 4 (see also FIG. 1 as appropriate).

Using the third occurrence probability request means 411, the mixed occurrence probability calculation device 40 outputs the word string w_1, w_2, …, w_{t-1} in the target data storage means 401 to the neural network language model calculation device 10 as the previous context.
The third occurrence probability request means 411 then receives from the neural network language model calculation device 10 the hidden layer vector z and the occurrence probability p_N(w_t | w_1 w_2 … w_{t-1}) of the next word w_t (step S21).

Using the fourth occurrence probability request means 412, the mixed occurrence probability calculation device 40 outputs the same previous context w_1, w_2, …, w_{t-1} as in step S21 to the other language model calculation device 20.
The fourth occurrence probability request means 412 then receives from the other language model calculation device 20 the occurrence probability p_O(w_t | w_1 w_2 … w_{t-1}) of the next word w_t (step S22).

Using the second mixing coefficient calculation means 413, the mixed occurrence probability calculation device 40 calculates the mixing coefficient λ(t) according to Equations (8) and (9) from the hidden layer vector z received in step S21 and the mapping vector S (step S23).

Using the mixed occurrence probability calculation means 414, the mixed occurrence probability calculation device 40 calculates, by Equation (16), the mixed occurrence probability p(w_t | w_1 w_2 … w_{t-1}) from the occurrence probability p_N(w_t | w_1 w_2 … w_{t-1}) received in step S21 and the occurrence probability p_O(w_t | w_1 w_2 … w_{t-1}) received in step S22 (step S24).

[Operation and effects]
As described above, the mixed occurrence probability calculation system 1 learns the mapping vector S using the neural network language model and uses the learned mapping vector S to obtain a mixing coefficient that depends on the previous context. As a result, when calculating the mixed occurrence probability p by mixing with another language model such as an n-gram language model, the mixed occurrence probability calculation system 1 can make the mixed occurrence probability p more accurate than before.

（変形例）
以上、本願発明の各実施形態を詳述してきたが、本願発明は前記した実施形態に限られるものではなく、本願発明の要旨を逸脱しない範囲の設計変更等も含まれる。 (Modification)
As mentioned above, although each embodiment of this invention was explained in full detail, this invention is not limited to above-described embodiment, The design change etc. of the range which does not deviate from the summary of this invention are also included.

前記した実施形態では、混合係数パラメータ学習装置が混合係数パラメータ記憶手段及び混合係数記憶手段を備えることとして説明したが、本願発明は、これに限定されない。つまり、混合生起確率算出装置が混合係数パラメータ記憶手段及び混合係数記憶手段を備えてもよい。 In the embodiment described above, the mixing coefficient parameter learning device has been described as including the mixing coefficient parameter storage unit and the mixing coefficient storage unit, but the present invention is not limited to this. That is, the mixture occurrence probability calculation device may include a mixture coefficient parameter storage unit and a mixture coefficient storage unit.

前記した実施形態では、本願発明を言語モデルに適用する例を説明したが、本願発明が適用可能な確率モデルはこれに限定されず、何らかの記号系列に後続して生起する記号の生起確率モデル一般に適用することができる。 In the embodiment described above, an example in which the present invention is applied to a language model has been described. However, the probability model to which the present invention can be applied is not limited to this, and the occurrence probability model of a symbol that occurs following any symbol sequence in general. Can be applied.

In the embodiment described above, regularization is performed; however, the present invention need not perform regularization.
In the embodiment described above, the bias value b is used; however, the present invention need not use the bias value b. In this case, the following formula (17) is used in place of formula (9) above.
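
Formula (17) is not reproduced in this text. As a hedged reconstruction only, assuming that formula (9) computes the mixing coefficient by applying a sigmoid function to the inner product of the mapping vector S and the hidden layer vector plus the bias value b, the bias-free variant would take the following form. The symbols \lambda (mixing coefficient) and h (hidden layer vector) are assumed notation.

```latex
\lambda = \sigma\left(S \cdot h\right),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```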

In the embodiment described above, the mixing coefficient parameter learning device is described as independent hardware, but the present invention is not limited to this. For example, the mixing coefficient parameter learning device can also be realized by a mixing coefficient parameter learning program that causes hardware resources of a computer, such as a CPU, memory, and a hard disk, to operate cooperatively. This program may be distributed over a communication line, or may be written to a recording medium such as a CD-ROM or flash memory and distributed.
Like the mixing coefficient parameter learning device, the mixed occurrence probability calculation device can also be realized by a mixed occurrence probability calculation program.
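
As a rough illustration of what such a mixing coefficient parameter learning program might look like, the following sketch updates the mapping vector by stochastic gradient steps on the log of the mixed probability. Several details are assumptions rather than the embodiment's actual choices: the mixing rule (the coefficient weights the neural network model), L2 regularization, multiplying the update rate by a fixed factor after each pass as the reduction rule, a fixed number of passes as the end condition, and the omission of the bias value. All names and sample values are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def learn_mapping_vector(samples, dim, rate=0.5, decay=0.5,
                         l2=1e-4, epochs=5, seed=0):
    """samples: list of (h, p_nn, p_other), where h is the hidden layer vector
    and p_nn / p_other are the positive probabilities that each model assigned
    to the element actually observed next in the learning data."""
    rng = random.Random(seed)
    S = [0.0] * dim                        # initial mapping vector
    for _ in range(epochs):                # assumed end condition: fixed pass count
        order = list(samples)
        rng.shuffle(order)                 # visit samples in random order
        for h, p_nn, p_other in order:
            lam = sigmoid(sum(s * x for s, x in zip(S, h)))
            p = lam * p_nn + (1.0 - lam) * p_other
            # d(log p)/dS = (p_nn - p_other) / p * lam * (1 - lam) * h
            g = (p_nn - p_other) / p * lam * (1.0 - lam)
            # Ascent on log p (descent on -log p) with an assumed L2 penalty.
            S = [s + rate * (g * x - l2 * s) for s, x in zip(S, h)]
        rate *= decay                      # assumed update rate reduction rule
    return S


# Hypothetical data: the neural network model is better in the first context,
# and the other model is better in the second.
samples = [([1.0, 0.0], 0.30, 0.10), ([0.0, 1.0], 0.10, 0.30)]
learned = learn_mapping_vector(samples, dim=2)
print(learned)
```

With this toy data the first component of the learned vector becomes positive and the second becomes negative, so the mixing coefficient leans toward whichever model was more accurate in each context.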

DESCRIPTION OF SYMBOLS
1 Mixed occurrence probability calculation system
10 Neural network language model calculation device
20 Other language model calculation device
30 Mixing coefficient parameter learning device
301 Mixing coefficient parameter storage means
302 Learning parameter storage means
303 Learning data storage means
304 Mixing coefficient storage means
311 Initialization means
312 First occurrence probability request means (first occurrence probability input means)
313 Second occurrence probability request means (second occurrence probability input means)
314 First mixing coefficient calculation means
315 Mapping vector update means
316 End condition determination means
317 Update rate reduction means
40 Mixed occurrence probability calculation device
401 Target data storage means
402 Mixed occurrence probability storage means
411 Third occurrence probability request means (third occurrence probability input means)
412 Fourth occurrence probability request means (fourth occurrence probability input means)
413 Second mixing coefficient calculation means
414 Mixed occurrence probability calculation means

Claims

1. A mixing coefficient parameter learning device for learning a parameter necessary for calculating a mixing coefficient used when mixing occurrence probabilities of a next element given a preceding element sequence, the occurrence probabilities being obtained by a neural network probability model and by another probability model other than the neural network probability model, the device comprising:
first occurrence probability input means for inputting a hidden layer vector of the neural network probability model and the occurrence probability obtained by the neural network probability model;
second occurrence probability input means for inputting the occurrence probability obtained by the other probability model;
first mixing coefficient calculation means for calculating the mixing coefficient by linearly mapping the hidden layer vector to a real-valued scalar with a preset mapping vector and nonlinearly transforming the real-valued scalar with a sigmoid function;
mapping vector update means for updating the mapping vector, which is the parameter, by a stochastic gradient descent method using the occurrence probabilities obtained by the neural network probability model and the other probability model, the mixing coefficient, and a preset update rate;
update rate reduction means for reducing the update rate according to a preset update rate reduction rule; and
end condition determination means for determining whether a preset end condition is satisfied and, until the end condition is satisfied, causing the mapping vector update means to update the mapping vector at the reduced update rate.

2. The mixing coefficient parameter learning device according to claim 1, wherein the mapping vector update means performs regularization when updating the mapping vector.

3. A mixed occurrence probability calculation device for calculating a mixed occurrence probability obtained by mixing occurrence probabilities of a next element given a preceding element sequence, the occurrence probabilities being obtained by a neural network probability model and by another probability model other than the neural network probability model, the device comprising:
third occurrence probability input means for inputting a hidden layer vector of the neural network probability model and the occurrence probability obtained by the neural network probability model;
fourth occurrence probability input means for inputting the occurrence probability obtained by the other probability model;
second mixing coefficient calculation means for calculating a mixing coefficient by linearly mapping the hidden layer vector to a real-valued scalar with the mapping vector learned by the mixing coefficient parameter learning device according to claim 1 and nonlinearly transforming the real-valued scalar with a sigmoid function; and
mixed occurrence probability calculation means for calculating the mixed occurrence probability by mixing, using the mixing coefficient, the occurrence probabilities of the next element obtained by the neural network probability model and the other probability model.

4. A mixing coefficient parameter learning program for causing a computer to function as the mixing coefficient parameter learning device according to claim 1.

5. A mixed occurrence probability calculation program for causing a computer to function as the mixed occurrence probability calculation device according to claim 3.