JP6343582B2

JP6343582B2 - Language model generation apparatus, method, and program

Info

Publication number: JP6343582B2
Application number: JP2015080212A
Authority: JP
Inventors: Ryo Masumura (増村 亮); Hirokazu Masataki (政瀧 浩和)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-04-09
Filing date: 2015-04-09
Publication date: 2018-06-13
Anticipated expiration: 2035-04-09
Also published as: JP2016200953A

Description

The present invention relates to a technique for constructing a language model, and in particular to a technique for constructing a new language model for speech recognition called a latent words recurrent neural network language model.

Speech recognition and machine translation require a language model for linguistic prediction. A language model measures how plausible a word sequence is as language, and its quality directly affects the performance of speech recognition and machine translation. Many kinds of language models have been proposed to date.

N-gram language models are the most commonly used. Because the training method for N-gram language models is well known, its description is omitted here (see, for example, Non-Patent Document 1). An N-gram language model can be trained easily from training text, and various training methods have been proposed (see, for example, Non-Patent Document 2). An N-gram language model directly models the word sequences of the training data: it gives the prediction probability P(w_i | w_{i-N+1}, ..., w_{i-1}, θ_{N-gram}) of the current word w_i given the N-1 immediately preceding words w_{i-N+1}, ..., w_{i-1}. Here, θ_{N-gram} denotes the model parameters of the N-gram language model.

The latent words language model (Latent Words Language Model) extends the N-gram model (see, for example, Non-Patent Document 3). It takes into account latent words, which are words hidden behind the observable words. Structurally, it consists of a transition probability model, which models the sequence of latent words, and an output probability model, which gives the word probabilities for each latent word. The transition probability model is an N-gram model over latent words: it gives the prediction probability P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}) of the current latent word h_i given the N-1 immediately preceding latent words h_{i-N+1}, ..., h_{i-1}. The output probability model is a 1-gram model over observed words for each latent word: it gives the prediction probability P(w_i | h_i, θ_{LWLM}) of the observed word w_i given the latent word h_i. Here, θ_{LWLM} denotes the model parameters of the latent words language model. The advantage of the latent words language model over the N-gram language model is the robustness gained by taking latent words into account: it is known to give accurate probability predictions even from a small amount of training data.

Meanwhile, there have also been attempts to improve language models without using latent words. The recurrent neural network (Recurrent Neural Network) language model has achieved high performance in recent years (see, for example, Non-Patent Document 4).

A recurrent neural network language model can be trained from text data. For probability prediction, the network takes two inputs, the immediately preceding word w_{i-1} and the output s_{i-1} of the network's intermediate layer at the preceding step, and gives the prediction probability P(w_i | w_{i-1}, s_{i-1}, θ_{RNNLM}) of the current word w_i. Here, θ_{RNNLM} denotes the model parameters of the recurrent neural network. The clear advantage of the recurrent neural network over the N-gram model is that it can model the probability distribution while taking long-range context into account. An N-gram model considers only the N-1 immediately preceding words as context, whereas in a recurrent neural network language model the intermediate-layer output s_{i-1} carries long-range context information, so more robust probability prediction can be achieved.
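As an illustration of how P(w_i | w_{i-1}, s_{i-1}, θ_{RNNLM}) can be computed, the following is a minimal sketch of one step of an Elman-style recurrent network in plain Python. The sigmoid intermediate layer, the softmax output, the weight names `U`, `W`, `V`, the sizes, and the random initial weights are all assumptions made for illustration; the text above does not fix any of them.

```python
import math
import random

def softmax(xs):
    """Normalize a list of scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(prev_word, prev_state, U, W, V):
    """One prediction step.

    prev_word  : integer index of w_{i-1}
    prev_state : intermediate-layer output s_{i-1} (list of floats)
    U, W, V    : hypothetical input, recurrent, and output weight matrices
    Returns (distribution over w_i, new intermediate-layer output s_i).
    """
    hidden = len(prev_state)
    state = []
    for j in range(hidden):
        # A one-hot input word selects column prev_word of U.
        a = U[j][prev_word] + sum(W[j][k] * prev_state[k] for k in range(hidden))
        state.append(1.0 / (1.0 + math.exp(-a)))  # sigmoid
    logits = [sum(V[v][j] * state[j] for j in range(hidden)) for v in range(len(V))]
    return softmax(logits), state

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB, HIDDEN = 5, 3            # toy sizes, chosen arbitrarily
U = random_matrix(HIDDEN, VOCAB, rng)
W = random_matrix(HIDDEN, HIDDEN, rng)
V = random_matrix(VOCAB, HIDDEN, rng)

state = [0.0] * HIDDEN          # initial intermediate-layer output
for word in [0, 3, 1]:          # a toy word-index sequence
    probs, state = rnn_step(word, state, U, W, V)
```

Because `state` is fed back at every step, the distribution returned for the last word depends on the whole preceding sequence, not only on the previous word.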

Non-Patent Document 1: Kenji Kita, "Language and Computation 4: Probabilistic Language Models" (言語と計算-4 確率的言語モデル), University of Tokyo Press, pp. 57-62.
Non-Patent Document 2: S. F. Chen and J. Goodman, "An Empirical Study of Smoothing Techniques for Language Modeling", Computer Speech & Language, vol. 13, no. 4, pp. 359-383, 1999.
Non-Patent Document 3: K. Deschacht, J. D. Belder, and M-F. Moens, "The Latent Words Language Model", Computer Speech and Language, vol. 26, pp. 384-409, 2012.
Non-Patent Document 4: T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model", in Proc. Interspeech 2010.

As described above, the latent words language model consists of a transition probability model and an output probability model, but the transition probability model is limited to the structure of an N-gram language model. It models the sequence information of latent words but cannot model relationships between latent words that are far apart. As a result, the probability prediction performance is degraded.

An object of the present invention is to provide a language model generation apparatus, method, and program that generate a language model with higher probability prediction performance than conventional ones.

A language model generation apparatus according to one aspect of the present invention comprises the following, where w_1, w_2, ..., w_L are the words constituting text data, h_1, h_2, ..., h_L are the latent words of w_1, w_2, ..., w_L respectively, θ_{LWLM} denotes the model parameters of a latent words language model, N is a predetermined positive integer, s_{i-1} is the output of the intermediate layer in the network immediately before h_i, and θ_{LWRNNLM} denotes the model parameters of a recurrent neural network: a latent words language model learning unit that performs latent words language model training using the text data and generates h_1, h_2, ..., h_L, a probability distribution P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}), and a probability distribution P(w_i | h_i, θ_{LWLM}); a recurrent neural network learning unit that performs recurrent neural network training using h_1, h_2, ..., h_L and generates a probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}); and a latent words recurrent neural network language model configuration unit that configures a language model for calculating the probability that an arbitrary word sequence appears, using multiplication of the probability distribution P(w_i | h_i, θ_{LWLM}) and the probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}).

A language model with higher probability prediction performance than conventional ones can be generated.

FIG. 1 is a block diagram illustrating an example of the language model generation apparatus.
FIG. 2 is a flowchart illustrating an example of the language model generation method.

[Technical background]
The transition probability model over the latent variables of the latent words language model is modeled with a recurrent neural network language model instead of the usual N-gram language model. Specifically, when an ordinary latent words language model whose transition probability model is an N-gram model is trained, the latent word sequence estimated during that training is used to build a recurrent neural network language model, which is then adopted as the transition probability model. The output probability model is taken from the original latent words language model. The model built in this way is called a latent words recurrent neural network (Latent Words Recurrent Neural Network) language model.

Introducing the latent words recurrent neural network language model makes it possible to build a language model with higher language prediction performance than an ordinary latent words language model. Using this model yields high recognition performance in speech recognition and high translation performance in machine translation.

[Embodiment]
An example embodiment of a language model generation apparatus and method for building a latent words recurrent neural network language model is described below.

The language model generation apparatus includes, for example, a latent words language model learning unit 1, a recurrent neural network learning unit 2, and a latent words recurrent neural network language model configuration unit 3. The language model generation method is realized by the apparatus performing the processing of the steps illustrated in FIG. 2.

The input to the language model generation apparatus is text data in which word boundaries are known (word-segmented text data), and its output is a latent words recurrent neural network language model. Word-segmented text data can be created from an unsegmented text file by using any morphological analyzer. The detailed flow is described below.

The processing of each unit of the language model generation apparatus is described below.

<Latent words language model learning unit 1>
Input: word-segmented text data
Output: information about the latent words language model; latent word sequence of the training data
The latent words language model learning unit 1 trains a latent words language model using the word-segmented text data as training data (step S1). As a concrete training method, an existing training method for latent words language models, for example the one described in Non-Patent Document 3, may be used.

The latent words language model has two probability distributions: P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}) and P(w_i | h_i, θ_{LWLM}). Here, h_i is called a latent word and w_i an observed word. The latent word h_i is the latent variable of the latent words language model, and the observed word w_i is a word that actually appears in the text. P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}) has the form of an ordinary word N-gram language model, and P(w_i | h_i, θ_{LWLM}) is a unigram language model. θ_{LWLM} denotes the model parameters of the latent words language model.

Training a latent words language model is the problem of estimating the latent word assigned to each word of the input training text. That is, given a training text (a sequence of observed words) "w_1 w_2 ... w_L", where L is the total number of words in the training text, the problem is to estimate the latent words h_1, h_2, ..., h_L of the observed words w_1, w_2, ..., w_L. Once this assignment has been estimated, P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}) can be built by training an N-gram language model on the latent word sequence "h_1 h_2 ... h_L", and P(w_i | h_i, θ_{LWLM}) can be built by training a unigram language model on the pairs h_1→w_1, h_2→w_2, ..., h_L→w_L. The latent word assignment itself can be estimated by a method called Gibbs sampling. Since Gibbs sampling is a well-known technique, its description is omitted here.
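The step from an estimated assignment to the two distributions can be sketched as plain counting. The example below is a simplified illustration, not the training procedure of Non-Patent Document 3: it assumes N = 2, uses unsmoothed relative frequencies, takes the latent word assignment as given instead of estimating it by Gibbs sampling, and uses a hypothetical start symbol `<s>` and toy data.

```python
from collections import Counter, defaultdict

def estimate_lwlm(words, latents, start="<s>"):
    """Relative-frequency estimates of the bigram transition model
    P(h_i | h_{i-1}) and the unigram output model P(w_i | h_i)
    from a given latent word assignment."""
    assert len(words) == len(latents)
    trans_counts = defaultdict(Counter)   # previous latent word -> next latent word counts
    emit_counts = defaultdict(Counter)    # latent word -> observed word counts
    prev = start
    for w, h in zip(words, latents):
        trans_counts[prev][h] += 1
        emit_counts[h][w] += 1
        prev = h

    def normalize(table):
        return {
            ctx: {k: c / sum(cnt.values()) for k, c in cnt.items()}
            for ctx, cnt in table.items()
        }

    return normalize(trans_counts), normalize(emit_counts)

# Toy observed words and a hypothetical latent word assignment for them.
words   = ["i", "go", "to", "tokyo", "i", "went", "to", "osaka"]
latents = ["i", "go", "to", "tokyo", "i", "go",   "to", "tokyo"]

transition, emission = estimate_lwlm(words, latents)
print(emission["go"])       # {'go': 0.5, 'went': 0.5}
print(transition["to"])     # {'tokyo': 1.0}
```

In this toy assignment the latent word "go" stands behind both "go" and "went", which is how the output model lets several observed words share the statistics of one latent word.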

The final output is the latent words language model (specifically, the two probability distributions P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}) and P(w_i | h_i, θ_{LWLM}), which are the substance of its parameters) and the latent word sequence of the input training data estimated during that modeling.

The latent words language model is output to the latent words recurrent neural network language model configuration unit 3. The latent word sequence of the training data is output to the recurrent neural network learning unit 2.

<Recurrent neural network learning unit 2>
Input: latent word sequence of the training data
Output: recurrent neural network over latent words
The recurrent neural network learning unit 2 trains a recurrent neural network from the latent word sequence of the training data obtained as the output of the latent words language model learning unit 1 (step S2). A recurrent neural network is normally trained from an observed word sequence; here it is trained from the latent word sequence of the training data instead. The training method itself is the same as for observed word sequences. That is, the recurrent neural network can be trained by applying a method for training a recurrent neural network from an observed word sequence, for example the one described in Non-Patent Document 4, to the latent word sequence of the training data in place of the observed word sequence.

The recurrent neural network trained here is the probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}) over the current latent word h_i given two inputs: the immediately preceding latent word h_{i-1} and the output s_{i-1} of the network's intermediate layer at the preceding step. Here, θ_{LWRNNLM} denotes the model parameters of the recurrent neural network over latent words.

The recurrent neural network over latent words generated by this training is output to the latent words recurrent neural network language model configuration unit 3.

<Latent words recurrent neural network language model configuration unit 3>
Input: latent words language model; recurrent neural network over latent words
Output: latent words recurrent neural network language model
The latent words recurrent neural network language model configuration unit 3 configures a latent words recurrent neural network language model from the latent words language model and the recurrent neural network over latent words (step S3). Specifically, it extracts the parameters of P(w_i | h_i, θ_{LWLM}) from the latent words language model and the parameters of P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}) from the recurrent neural network over latent words, and configures a probability model that pairs the extracted parameters. J is the set of values that h_i (i = 1, 2, ..., L) can take.
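The text refers to J without showing the expression it belongs to. One expression consistent with the surrounding definitions, offered here as an assumption and not as the patent's own formula, is the per-position marginal over the latent word, in which the two extracted distributions are multiplied and summed over J:

```latex
P(w_i \mid h_{i-1}, s_{i-1})
  = \sum_{h_i \in J}
    P(w_i \mid h_i, \theta_{LWLM}) \,
    P(h_i \mid h_{i-1}, s_{i-1}, \theta_{LWRNNLM})
```

Under this reading, the first factor comes from the latent words language model and the second from the recurrent neural network over latent words.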

In this way, the latent words recurrent neural network language model configuration unit 3 configures a language model for calculating the probability that an arbitrary word sequence appears, using multiplication of the probability distribution P(w_i | h_i, θ_{LWLM}) and the probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}).
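A minimal sketch of how the multiplication might be used for a single position follows. It assumes that the distribution over the current latent word has already been obtained from the recurrent network, and it sums the product over the latent words; the function name `word_probability` and the toy numbers are hypothetical, and the treatment of whole word sequences (which involves all latent word sequences) is not covered.

```python
def word_probability(word, latent_dist, emission):
    """Sum over latent words h of P(h | h_{i-1}, s_{i-1}) * P(word | h).

    latent_dist : dict mapping each latent word h to P(h | h_{i-1}, s_{i-1})
    emission    : dict mapping each latent word h to a dict P(w | h)
    """
    return sum(
        p_h * emission.get(h, {}).get(word, 0.0)
        for h, p_h in latent_dist.items()
    )

# Toy distributions, chosen only for illustration.
latent_dist = {"go": 0.7, "tokyo": 0.3}
emission = {
    "go":    {"go": 0.5, "went": 0.5},
    "tokyo": {"tokyo": 0.5, "osaka": 0.5},
}

p_went = word_probability("went", latent_dist, emission)   # 0.7 * 0.5
```

Summing `word_probability` over every observed word gives 1, so the result is itself a distribution over the current word.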

[Modification]
An N-gram language model may be generated by approximation from the latent words recurrent neural network language model. This puts the model in a form that is easy to use in speech recognition and machine translation. The N-gram form is well supported by fast implementations in speech recognition and machine translation and is therefore practical.

For this purpose, the language model generation apparatus may further include, for example, a pseudo-training text generation unit 4 and an N-gram language model generation unit 5.

<Pseudo-training text generation unit 4>
Input: latent words recurrent neural network language model
Output: pseudo-training text
The pseudo-training text generation unit 4 generates pseudo-training text from the latent words recurrent neural network language model built by the latent words recurrent neural network language model configuration unit 3. The goal is to generate a pseudo-training text "w_1 w_2 ... w_M" of M words. In essence, a latent word sequence "h_1 h_2 ... h_M" is generated first, and the pseudo-training text is generated from it. The parameters of P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}) are used to generate the latent word sequence, and the parameters of P(w_i | h_i, θ_{LWLM}) are used to generate words from the latent words.

First, the initial latent word h_1 is generated randomly according to P(h_1 | silS, s_0, θ_{LWRNNLM}). Here, silS is the start symbol and s_0 is a vector whose elements are all 0. Whenever a value is generated randomly from a probability distribution, an algorithm called SampleOne is used. SampleOne randomly samples one word from a discrete probability distribution and is described later. Once h_1 has been generated, w_1 is chosen: the probability distribution P(w_1 | h_1, θ_{LWLM}) is obtained, and w_1 is generated from it by SampleOne. Next, to generate h_2, the probability distribution P(h_2 | h_1, s_1, θ_{LWRNNLM}) is obtained, and h_2 is generated from it by SampleOne. Once h_2 has been generated, w_2 is chosen: the probability distribution P(w_2 | h_2, θ_{LWLM}) is obtained, and w_2 is generated from it by SampleOne. Then h_3 is generated from the probability distribution P(h_3 | h_2, s_2, θ_{LWRNNLM}), and so on.
Repeating this until h_M has been generated yields the latent word sequence "h_1 h_2 ... h_M" and the pseudo-training text "w_1 w_2 ... w_M". The value of M is set manually. The larger M is, the better the pseudo-training text reflects the properties of the latent words recurrent neural network language model. M should be equal to or greater than L, the number of words in the original training text; if it is too small, performance suffers.
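The alternating generation of latent words and observed words can be sketched as a short loop. In the sketch below, `next_latent_dist` stands in for the trained recurrent network and is assumed to return both the distribution over the next latent word and the updated intermediate-layer output; `toy_next_latent_dist`, the toy tables, and the helper `sample_one` are hypothetical names introduced only for illustration.

```python
import random

def sample_one(dist, rng):
    """Draw one value from a dict {value: probability}."""
    r = rng.random()
    last = None
    for value, p in dist.items():
        r -= p
        last = value
        if r < 0:
            return value
    return last  # guards against floating-point rounding

def generate_pseudo_text(M, next_latent_dist, emission, initial_state, rng, start="silS"):
    """Generate M latent words and M observed words.

    next_latent_dist(prev_latent, state) -> (P(h_i | h_{i-1}, s_{i-1}), s_i)
    emission[h] -> P(w_i | h_i = h)
    """
    latents, words = [], []
    prev, state = start, initial_state
    for _ in range(M):
        dist, state = next_latent_dist(prev, state)
        h = sample_one(dist, rng)            # latent word h_i
        w = sample_one(emission[h], rng)     # observed word w_i
        latents.append(h)
        words.append(w)
        prev = h
    return latents, words

# Toy stand-in for the recurrent network: a fixed table that ignores the state.
def toy_next_latent_dist(prev, state):
    table = {
        "silS":  {"i": 1.0},
        "i":     {"go": 1.0},
        "go":    {"to": 1.0},
        "to":    {"tokyo": 1.0},
        "tokyo": {"i": 1.0},
    }
    return table[prev], state

emission = {
    "i":     {"i": 1.0},
    "go":    {"go": 0.5, "went": 0.5},
    "to":    {"to": 1.0},
    "tokyo": {"tokyo": 0.5, "osaka": 0.5},
}

latents, words = generate_pseudo_text(
    8, toy_next_latent_dist, emission, [0.0], random.Random(0)
)
```

With a real recurrent network, `next_latent_dist` would also update the state at every step, so the generated latent words would depend on the whole history.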

The SampleOne algorithm is described below.
Input: probability distribution (multinomial distribution)
Output: a realization of the probability distribution
SampleOne is an algorithm for randomly determining one value from a probability distribution. To make the description concrete, the case where the input is P(h_1) from the example above is considered.

P(h_1) has the form of a probability distribution called a multinomial distribution. Let J be the set of concrete values that h_1 can take. J is determined automatically once the probability distribution is given. Concretely, the probability distribution P(h_1) consists of the values P(h_1 = t_1), P(h_1 = t_2), ..., P(h_1 = t_H). Here, t_1, t_2, ..., t_H are the concrete realization values, and J is the set of these values. P(h_1) then has the following property.
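The property itself does not appear in this text. For any probability distribution over the finite set J it is presumably the normalization condition, which in the notation above reads:

```latex
\sum_{k=1}^{H} P(h_1 = t_k) = 1,
\qquad
P(h_1 = t_k) \ge 0 \quad (k = 1, \dots, H)
```

This is what guarantees that subtracting the probabilities one after another from a random value below 1 eventually gives a negative result.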

SampleOne for h_1 relies on a random number. Let rand denote the random value. P(h_1 = t_1), P(h_1 = t_2), ..., P(h_1 = t_H) each have a concrete numerical value. The values rand - P(h_1 = t_1), rand - P(h_1 = t_1) - P(h_1 = t_2), rand - P(h_1 = t_1) - P(h_1 = t_2) - P(h_1 = t_3), ... are computed in order, and the realization value at which the result first falls below 0 is output. For example, if
rand - P(h_1 = t_1) > 0
rand - P(h_1 = t_1) - P(h_1 = t_2) < 0
then t_2 is output. SampleOne can thus be regarded as an algorithm for sampling data from an arbitrary multinomial distribution.
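The procedure described above can be sketched in Python as follows. The text does not say how rand is drawn; the sketch assumes a uniform random number in [0, 1), and the optional `rand` argument and the final fallback for floating-point rounding are additions made for illustration.

```python
import random

def sample_one(distribution, rand=None):
    """Draw one realization from a discrete distribution.

    distribution : list of (value, probability) pairs, e.g. [(t_1, P(h_1 = t_1)), ...]
    rand         : random value to use; drawn uniformly from [0, 1) when omitted
                   (an assumption, since the text does not specify the range)
    """
    if rand is None:
        rand = random.random()
    remaining = rand
    for value, probability in distribution:
        remaining -= probability
        if remaining < 0:          # first point at which the running value drops below 0
            return value
    # Only reached if rounding keeps the running value at or above 0.
    return distribution[-1][0]

dist = [("t1", 0.2), ("t2", 0.5), ("t3", 0.3)]
print(sample_one(dist, rand=0.4))   # t2, because 0.4 - 0.2 > 0 and 0.4 - 0.2 - 0.5 < 0
```

Each value is returned with probability equal to the width of its interval on [0, 1), so repeated calls without `rand` reproduce the input distribution.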

<N-gram language model generation unit 5>
Input: pseudo-training text
Output: latent-words-recurrent-neural-network-style N-gram language model
The N-gram language model generation unit 5 counts the frequency of every N-word combination in the pseudo-training text, builds an N-gram language model from these counts, and thereby constructs a latent-words-recurrent-neural-network-style N-gram language model.

For speech recognition, N = 3 is commonly used. Because the training method for N-gram language models is a well-known technique described, for example, in Non-Patent Document 1, its description is omitted here. This yields an N-gram language model that inherits the properties of the recurrent neural network over latent words and that can be used easily in speech recognition and machine translation.
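The counting step can be illustrated with a short sketch. It uses unsmoothed relative frequencies, which is a simplification: practical N-gram training normally applies smoothing such as the methods surveyed in Non-Patent Document 2. The function name `train_ngram` and the toy text are hypothetical.

```python
from collections import Counter

def train_ngram(words, n=3):
    """Count every run of n consecutive words and turn the counts into
    relative-frequency estimates of P(w_i | w_{i-n+1}, ..., w_{i-1})."""
    ngram_counts = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    context_counts = Counter()
    for ngram, count in ngram_counts.items():
        context_counts[ngram[:-1]] += count
    return {
        ngram: count / context_counts[ngram[:-1]]
        for ngram, count in ngram_counts.items()
    }

# Toy stand-in for a pseudo-training text.
text = "i go to tokyo i went to osaka i go to osaka".split()
model = train_ngram(text, n=3)

print(model[("go", "to", "tokyo")])   # 0.5
print(model[("i", "go", "to")])       # 1.0
```

The keys of `model` are N-word tuples, so the probability of a word given its context is looked up with the context words followed by that word.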

[Program and recording medium]
The processes described for the language model generation apparatus and method need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed.

When the processes of the language model generation apparatus are implemented by a computer, the processing content of the functions that the apparatus should have is described by a program. Executing this program on the computer realizes those processes on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

Each processing means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be implemented in hardware.

Needless to say, other modifications may be made as appropriate without departing from the spirit of the present invention.

Description of symbols
1 Latent words language model learning unit
2 Recurrent neural network learning unit
3 Latent words recurrent neural network language model configuration unit
4 Pseudo-training text generation unit
5 N-gram language model generation unit

Claims

A language model generation apparatus comprising:
a latent words language model learning unit that, where w_1, w_2, ..., w_L are the words constituting text data, h_1, h_2, ..., h_L are the latent words of w_1, w_2, ..., w_L respectively, θ_{LWLM} denotes the model parameters of a latent words language model, and N is a predetermined positive integer, performs latent words language model training using the text data and generates h_1, h_2, ..., h_L, a probability distribution P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}), and a probability distribution P(w_i | h_i, θ_{LWLM});
a recurrent neural network learning unit that, where s_{i-1} is the output of the intermediate layer in the network immediately before h_i and θ_{LWRNNLM} denotes the model parameters of a recurrent neural network, performs recurrent neural network training using h_1, h_2, ..., h_L and generates a probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}); and
a latent words recurrent neural network language model configuration unit that configures a language model for calculating the probability that an arbitrary word sequence appears, using multiplication of the probability distribution P(w_i | h_i, θ_{LWLM}) and the probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}).

The language model generation apparatus according to claim 1, further comprising:
a pseudo-training text generation unit that generates pseudo-training text based on the configured language model; and
an N-gram language model generation unit that generates an N-gram language model based on the pseudo-training text.

A language model generation method comprising:
a latent words language model learning step in which a latent words language model learning unit, where w_1, w_2, ..., w_L are the words constituting text data, h_1, h_2, ..., h_L are the latent words of w_1, w_2, ..., w_L respectively, θ_{LWLM} denotes the model parameters of a latent words language model, and N is a predetermined positive integer, performs latent words language model training using the text data and generates h_1, h_2, ..., h_L, a probability distribution P(h_i | h_{i-N+1}, ..., h_{i-1}, θ_{LWLM}), and a probability distribution P(w_i | h_i, θ_{LWLM});
a recurrent neural network learning step in which a recurrent neural network learning unit, where s_{i-1} is the output of the intermediate layer in the network immediately before h_i and θ_{LWRNNLM} denotes the model parameters of a recurrent neural network, performs recurrent neural network training using h_1, h_2, ..., h_L and generates a probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}); and
a latent words recurrent neural network language model configuration step in which a latent words recurrent neural network language model configuration unit configures a language model for calculating the probability that an arbitrary word sequence appears, using multiplication of the probability distribution P(w_i | h_i, θ_{LWLM}) and the probability distribution P(h_i | h_{i-1}, s_{i-1}, θ_{LWRNNLM}).

A program for causing a computer to function as each unit of the language model generation apparatus according to claim 1 or 2.