JP2017117045A

JP2017117045A - Method, device, and program for language probability calculation

Info

Publication number: JP2017117045A
Application number: JP2015249375A
Authority: JP
Inventors: 貴明堀; Takaaki Hori; 具治岩田; Tomoharu Iwata; 哲則小林; Tetsunori Kobayashi; 幹森岡; Miki Morioka
Original assignee: Waseda University; Nippon Telegraph and Telephone Corp
Current assignee: Waseda University; Nippon Telegraph and Telephone Corp
Priority date: 2015-12-22
Filing date: 2015-12-22
Publication date: 2017-06-29
Anticipated expiration: 2035-12-22
Also published as: JP6495814B2

Abstract

PROBLEM TO BE SOLVED: To appropriately predict the next word by effectively using long-context information, such as the topic and style of a text and the words and manner of speaking peculiar to a speaker.

SOLUTION: A language probability calculation device 1 sequentially reads vectors representing symbols and calculates an activity vector of an input layer for each vector read. Each time the activity vector of the input layer is calculated, the device calculates the activity vector of an intermediate layer based on the activity vector previously calculated in the intermediate layer and the activity vector of the input layer. The device also calculates an average activity vector, which is the average of the intermediate-layer activity vectors calculated up to a prescribed number of times before. Each time the activity vector of the intermediate layer is calculated, the device calculates the activity vector of an output layer based on the activity vector of the intermediate layer and the average activity vector of the intermediate layer, and calculates an appearance probability of a symbol based on the activity vector of the output layer.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a language probability calculation method, a language probability calculation device, and a language probability calculation program that use a language model.

Language models are known that assign a language probability to a symbol string (a sequence of characters or words), indicating how plausible the string is in a target language. Language models are used for various purposes. In speech recognition, for example, an acoustic probability, which indicates how closely an input speech signal acoustically resembles the pronunciation of a given word sequence, is considered together with a language probability, which indicates how linguistically valid the word sequence is. This makes it possible to select, from a large number of recognition candidates, a word sequence that is plausible both acoustically and linguistically.
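The selection among recognition candidates can be illustrated with a minimal sketch. It assumes each candidate already has an acoustic probability and a language probability and simply maximizes their product in the log domain. The function name `best_candidate`, the unweighted sum, and the example numbers are illustrative assumptions, not values from this document.

```python
import math

def best_candidate(candidates):
    """Return the word sequence with the highest combined log score.

    `candidates` maps a word sequence (tuple of words) to a pair
    (acoustic_probability, language_probability); both are assumed > 0.
    The unweighted sum of log probabilities is an assumption of this sketch.
    """
    def score(item):
        _, (acoustic, language) = item
        return math.log(acoustic) + math.log(language)

    words, _ = max(candidates.items(), key=score)
    return words

# Hypothetical candidates: the second is slightly better acoustically,
# but the first is far more plausible linguistically.
candidates = {
    ("recognize", "speech"): (0.30, 0.020),
    ("wreck", "a", "nice", "beach"): (0.35, 0.001),
}
best = best_candidate(candidates)
```

Practical recognizers usually weight the two scores against each other; that weighting is omitted here.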

A widely used language model is the N-gram language model. It assumes that the appearance probability of a word depends only on the N-1 words preceding it. That is, the N-word chain probability is estimated as the language probability of each word, and their accumulated value over the word string is used as the language probability of the word string. N is generally set to about 2 to 4.
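As a rough illustration of the N-gram assumption, the following sketch estimates a bigram model (N = 2) by maximum likelihood and multiplies the per-word probabilities to score a word string. The function names, the `<s>` start marker, the toy corpus, and the absence of smoothing are all assumptions made for brevity.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words)  # "<s>" marks the sentence start
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams

def sentence_probability(words, contexts, bigrams):
    """Product of P(w_i | w_{i-1}) using unsmoothed relative frequencies."""
    prob = 1.0
    padded = ["<s>"] + list(words)
    for prev, cur in zip(padded, padded[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

corpus = [["a", "b"], ["a", "c"], ["b", "a"]]
contexts, bigrams = train_bigram(corpus)
p_ab = sentence_probability(["a", "b"], contexts, bigrams)  # P(a|<s>) * P(b|a)
```

Each factor depends only on the single preceding word, which is why information from further back cannot influence the prediction.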

Another type of language model is the RNN (recurrent neural network) language model (see, for example, Non-Patent Document 1). An RNN is a kind of multilayer neural network in which the neurons of an intermediate layer have recurrent connections. Through these recurrent connections, the entire context from the beginning of the input word sequence up to the most recently read word can be stored in the activity vector of the intermediate layer, so that a language probability depending on a longer context can be calculated.

T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, S. Khudanpur, “Recurrent neural network based language model,” Proceedings of Interspeech 2010, pp. 1045-1048, 2010.

However, conventional language models cannot appropriately predict the next word by effectively using long-context information, such as the topic and style of a text and the words and manner of speaking peculiar to a speaker.

For example, when a word is predicted with an N-gram language model, only the information of the few preceding words can be used, as described above, so the next word may not be predicted appropriately from long-context information.

When a word is predicted with an RNN language model, more recently read words have a large influence while words further in the past have very little, so here too the next word may not be predicted appropriately from long-context information. This is because the component conveyed through the recurrent connection is normalized to between 0 and 1 by the activation function, so that the component of the activity vector attributable to a previously read symbol decreases exponentially each time a new symbol is read.

The language probability calculation method of the present invention calculates a language probability using a neural network model having an input layer, an intermediate layer having recurrently connected neurons, and an output layer. The method includes: a symbol vector reading step of sequentially reading vectors representing symbols; an input layer activity vector calculation step of calculating, each time a vector is read in the symbol vector reading step, an activity vector in the input layer based on that vector; an intermediate layer activity vector calculation step of calculating, each time the activity vector in the input layer is calculated in the input layer activity vector calculation step, an activity vector in the intermediate layer based on the activity vector previously calculated in the intermediate layer and the activity vector in the input layer; an average activity vector calculation step of calculating an average activity vector, which is the average of the activity vectors in the intermediate layer calculated up to a predetermined number of times before; an output layer activity vector calculation step of calculating, each time the activity vector in the intermediate layer is calculated in the intermediate layer activity vector calculation step, an activity vector in the output layer based on the activity vector in the intermediate layer and the average activity vector in the intermediate layer; and a symbol appearance probability calculation step of calculating an appearance probability of a predetermined symbol based on the activity vector in the output layer.

The language probability calculation device of the present invention calculates a language probability using a neural network model having an input layer, an intermediate layer having recurrently connected neurons, and an output layer. The device includes: a symbol vector reading unit that sequentially reads vectors representing symbols; an input layer activity vector calculation unit that calculates, each time a vector is read by the symbol vector reading unit, an activity vector in the input layer based on that vector; an intermediate layer activity vector calculation unit that calculates, each time the activity vector in the input layer is calculated by the input layer activity vector calculation unit, an activity vector in the intermediate layer based on the activity vector previously calculated in the intermediate layer and the activity vector in the input layer; an average activity vector calculation unit that calculates an average activity vector, which is the average of the activity vectors in the intermediate layer calculated up to a predetermined number of times before; an output layer activity vector calculation unit that calculates, each time the activity vector in the intermediate layer is calculated by the intermediate layer activity vector calculation unit, an activity vector in the output layer based on the activity vector in the intermediate layer and the average activity vector in the intermediate layer; and a symbol appearance probability calculation unit that calculates an appearance probability of a predetermined symbol based on the activity vector in the output layer.

According to the present invention, the next word can be predicted appropriately by effectively using long-context information, such as the topic and style of a text and the words and manner of speaking peculiar to a speaker.

FIG. 1 is a diagram illustrating an example of an RNN language model.
FIG. 2 is a flowchart illustrating an example of a method for calculating activities in the RNN language model.
FIG. 3 is a diagram for explaining an example of a language probability calculation method in the RNN language model.
FIG. 4 is a diagram illustrating an example of the configuration of the language probability calculation device according to the first embodiment.
FIG. 5 is a flowchart illustrating an example of processing in the language probability calculation device according to the first embodiment.
FIG. 6 is a diagram for explaining an example of a language probability calculation method in the language probability calculation device according to the first embodiment.
FIG. 7 is a diagram illustrating an example of the configuration of the speech recognition device according to the second embodiment.
FIG. 8 is a flowchart illustrating an example of processing in the speech recognition device according to the second embodiment.
FIG. 9 is a diagram illustrating an example of a computer that realizes the language probability calculation device or the speech recognition device by executing a program.

Embodiments of the language probability calculation method, the language probability calculation device, and the language probability calculation program according to the present application are described in detail below with reference to the drawings. These embodiments do not limit the language probability calculation method, device, or program according to the present application.

[Outline of the RNN language model]
First, an outline of the RNN language model is given. An RNN has one input layer, one or more intermediate layers, and one output layer, and has recurrent connections in which neurons are connected to one another within at least one intermediate layer. The symbols of the input symbol string are fed into the RNN of the RNN language model one at a time, and the appearance probability of the current symbol is calculated from the vector representing the immediately preceding symbol and the activity of each neuron in the intermediate layer at that time.

Each layer of an RNN contains multiple neurons, each connected to neurons in the layer above, the layer below, or the same layer. Each neuron has an activity (a real value) indicating the degree to which it is firing, and each pair of connected neurons is assigned a connection weight (a real value) indicating the strength of the connection. The activity of each neuron multiplied by the connection weight is propagated to the neuron at the other end of the connection. The activities of the neurons in one layer are expressed collectively as an activity vector.

FIG. 1 is a diagram illustrating an example of an RNN language model. It shows an RNN having one input layer, one intermediate layer, and one output layer. As shown in FIG. 1, the intermediate layer has a recurrent connection back to the same layer. The values of the input vector are given to the input layer as its activities.

As described above, in the RNN language model an input symbol is expressed as a vector whose elements are 0 or 1. For example, the input layer can be given as many neurons as there are symbols to be considered (the vocabulary size), with the activities set so that only the neuron corresponding to the input symbol takes 1 and all other neurons take 0. If the symbols to be considered are A, B, and C, the input layer needs three neurons, and the input vectors corresponding to symbols A, B, and C are expressed, for example, as in Expression (1), where the first dimension of the vector corresponds to symbol A, the second to symbol B, and the third to symbol C.
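The encoding described here is commonly called one-hot encoding. A minimal sketch, assuming the three-symbol vocabulary A, B, C from the text (the function name `one_hot` is illustrative):

```python
def one_hot(symbol, vocabulary):
    """Return a vector with 1.0 at the position of `symbol` and 0.0 elsewhere."""
    if symbol not in vocabulary:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return [1.0 if entry == symbol else 0.0 for entry in vocabulary]

vocabulary = ["A", "B", "C"]  # dimension 1 -> A, dimension 2 -> B, dimension 3 -> C
vector_a = one_hot("A", vocabulary)
vector_b = one_hot("B", vocabulary)
vector_c = one_hot("C", vocabulary)
```

The vector length equals the vocabulary size, which is also the number of input-layer neurons.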

In FIG. 1, the neurons of the input layer correspond, from top to bottom, to the first, second, and third elements of the vector. In the intermediate layer, the activity is calculated taking the recurrent connection into account; when the first symbol is read, that is, when t = 1, the activity carried over by the recurrent connection is assumed to be 0. The activity vector of the output layer is calculated using a softmax function or the like. As in the input layer, the neurons of the output layer correspond to symbols A, B, and C from top to bottom.

Next, a method for calculating the activities is described with reference to FIG. 2. FIG. 2 is a flowchart illustrating an example of a method for calculating activities in the RNN language model. The calculation uses an RNN composed of L layers, in which the first layer is the input layer, the second to (L-1)-th layers are intermediate layers (L ≥ 3), and the L-th layer is the output layer. The n-th layer of the RNN (1 ≤ n ≤ L) contains H_n neurons. The matrix whose elements are the connection weights w_{m,n}[k,j] from the j-th neuron of the m-th layer to the k-th neuron of the n-th layer is denoted w_{m,n}, where 1 ≤ m ≤ n ≤ L, 1 ≤ j ≤ H_m, and 1 ≤ k ≤ H_n. The activity vector of the neurons of the n-th layer (1 < n ≤ L) at time t is denoted h_n^(t).

First, t is set to 1 (step S101). Next, the t-th input symbol x_t is assigned to the activity vector h_1^(t) of the input layer (step S102), and n is initialized to 2 (step S103). If the n-th layer of the RNN is an intermediate layer with a recurrent connection (step S104, Yes), the recurrently propagated component w_{n,n}·h_n^(t-1) is assigned to z_n (step S105). If the n-th layer is not an intermediate layer with a recurrent connection (step S104, No), z_n is set to the zero vector (step S106).

Then, the component w_{n-1,n}·h_{n-1}^(t) propagated from the (n-1)-th layer is added to z_n (step S107). Next, the activation function f_n(·) is applied to z_n to obtain the activity vector h_n^(t) of the n-th layer (step S108). If n < L, where L is the number of layers (step S109, Yes), n is incremented by 1 (step S110) and the process returns to step S104. If n < L does not hold (step S109, No) and t < T (step S111, Yes), t is incremented by 1 (step S112) and the process returns to step S102. If t < T does not hold (step S111, No), the process ends.
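The flow of steps S101 to S112 can be sketched in plain Python as follows. This is an illustrative reading of the flowchart, not the patent's implementation: the function names, the dictionary-based layout of the weights, the example weight values, and the choice of sigmoid for the intermediate layer and softmax for the output layer are assumptions. Matrices are stored as `w[k][j]`, matching the index order w_{m,n}[k,j].

```python
import math

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wkj * vj for wkj, vj in zip(row, v)) for row in w]

def sigmoid_vec(z):
    return [1.0 / (1.0 + math.exp(-x)) for x in z]

def softmax_vec(z):
    shift = max(z)  # subtracting the maximum is only for numerical stability
    exps = [math.exp(x - shift) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(inputs, w_ff, w_rec, activations):
    """Run the forward pass over input vectors x_1..x_T.

    w_ff[n]: weights from layer n-1 to layer n (n = 2..L)
    w_rec[n]: recurrent weights of layer n (only for recurrent layers)
    activations[n]: activation function f_n
    Returns the output-layer activity vector for every time step.
    """
    num_layers = max(w_ff)                                  # L
    prev = {n: [0.0] * len(w_rec[n]) for n in w_rec}        # activity before t = 1 is 0
    outputs = []
    for x_t in inputs:                                      # S101, S111, S112
        h = {1: list(x_t)}                                  # S102
        for n in range(2, num_layers + 1):                  # S103, S109, S110
            if n in w_rec:                                  # S104
                z = matvec(w_rec[n], prev[n])               # S105
            else:
                z = [0.0] * len(w_ff[n])                    # S106
            feed = matvec(w_ff[n], h[n - 1])                # S107
            z = [a + b for a, b in zip(z, feed)]
            h[n] = activations[n](z)                        # S108
        for n in w_rec:
            prev[n] = h[n]                                  # keep h_n^(t) for step t+1
        outputs.append(h[num_layers])
    return outputs

# Hypothetical L = 3 network: 3 input, 2 intermediate, 3 output neurons.
w_ff = {2: [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]],
        3: [[0.3, -0.2], [0.1, 0.5], [-0.4, 0.2]]}
w_rec = {2: [[0.1, 0.0], [0.0, 0.1]]}
activations = {2: sigmoid_vec, 3: softmax_vec}
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs = rnn_forward(inputs, w_ff, w_rec, activations)
```

Each entry of `outputs` is a probability distribution over the three symbols after reading one more input symbol.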

The language probability calculation method in the RNN language model is described with reference to FIG. 3. FIG. 3 is a diagram for explaining an example of a language probability calculation method in the RNN language model. In the example of FIG. 3, L = 3 and there is only one intermediate layer.

As shown in FIG. 3, for the symbols x_1, x_2, ..., x_t input in order, the activity vector h_2^(t) of the intermediate layer is calculated from the input-layer activity vector h_1^(t) for the current symbol and the intermediate-layer activity vector h_2^(t-1) for the previous symbol (steps S105, S106, S107, and S108 in FIG. 2). The activity vector h_3^(t) of the output layer is then calculated from h_2^(t), and the appearance probability of a symbol is calculated from the output-layer activity vector.

In this way, the activities of the intermediate-layer neurons of the RNN are propagated back to the intermediate-layer neurons through the recurrent connection, so the intermediate-layer activities store the characteristics of the input sequence read so far. The RNN language model can therefore obtain symbol appearance probabilities that depend on the history from the beginning of the input symbol string up to the present.

As a result, the RNN language model can obtain symbol appearance probabilities that take into account a longer context than an N-gram model, which predicts the next symbol from only the past N-1 symbols (N being at most 3 or 4). As described above, however, more recently read words have a large influence while words further in the past have very little. Words from further back are therefore not used effectively in word prediction.

[First Embodiment]
The following describes the configuration of the language probability calculation device according to the first embodiment, the language probability calculation method executed by that device, and the effects of the first embodiment. In the description below, "RNN language model" refers to the RNN language model of the embodiments of the present invention, while "conventional RNN language model" refers to the RNN language model described so far with reference to FIGS. 1 to 3.

[Configuration of First Embodiment]
First, the configuration of the language probability calculation device according to the first embodiment is described with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of the configuration of the language probability calculation device according to the first embodiment. As illustrated in FIG. 4, the language probability calculation device 1 includes a prediction unit 10, a learning unit 11, and a storage unit 12.

The prediction unit 10 includes a symbol vector reading unit 101, an input layer activity vector calculation unit 102, an intermediate layer activity vector calculation unit 103, an average activity vector calculation unit 104, an output layer activity vector calculation unit 105, and a symbol appearance probability calculation unit 106. The learning unit 11 includes a loss function definition unit 111 and a parameter estimation unit 112. The storage unit 12 includes an RNN language model storage unit 121 that stores the RNN language model used by the prediction unit 10 and other units.

First, each part of the prediction unit 10 is described in detail, together with the language probability calculation method executed by the language probability calculation device 1. The language probability calculation device 1 calculates language probabilities using an RNN composed of L layers, in which the first layer is the input layer, the second to (L-1)-th layers are intermediate layers (L ≥ 3), and the L-th layer is the output layer. The n-th layer of the RNN (1 ≤ n ≤ L) contains H_n neurons. The matrix whose elements are the connection weights w_{m,n}[k,j] from the j-th neuron of the m-th layer to the k-th neuron of the n-th layer is denoted w_{m,n}, where 1 ≤ m ≤ n ≤ L, 1 ≤ j ≤ H_m, and 1 ≤ k ≤ H_n. The activity vector of the neurons of the n-th layer (1 < n ≤ L) at time t is denoted h_n^(t).

The symbol vector reading unit 101 sequentially reads vectors representing symbols. Each symbol corresponds to one dimension of the vector, and a symbol is represented uniquely by setting only the element of its corresponding dimension to 1 and the elements of all other dimensions to 0.

Each time a vector is read by the symbol vector reading unit 101, the input layer activity vector calculation unit 102 calculates the activity vector in the input layer based on that vector. In the first embodiment, the input layer activity vector calculation unit 102 uses the value of each dimension of the input symbol vector as the activity of the corresponding input-layer neuron. The number of dimensions of the input symbol vector therefore equals both the number of kinds of input symbols that can appear and the number of neurons in the input layer. When the t-th symbol x_t is read, in order from the first, from the input symbol string X = x_1, x_2, ..., x_t, ..., x_T, the activity vector h_1^(t) representing the activities of the input-layer neurons is given by Expression (2).

The intermediate layer activity vector calculation unit 103 calculates the activity vector in the intermediate layer each time the input layer activity vector calculation unit 102 calculates the activity vector in the input layer. As shown in Expression (3), the intermediate layer activity vector calculation unit 103 calculates the activity vector h_n^(t) of the neurons of the n-th layer (1 < n ≤ L) by taking the set of values obtained by multiplying the activities of the connected neurons by the connection weights, that is, the product of the connection weight matrix and the activity vector, and normalizing the resulting vector to between 0 and 1 with the activation function. When there is only one intermediate layer, h_{n-1}^(t) is the activity vector in the input layer.

Here, the sigmoid function shown in Expression (4) is used as the activation function, where x denotes each element of the vector.

On the other hand, when the intermediate layer has recurrent connections with neurons in the same layer, h_n^(t) is calculated, as shown in Expression (5), also based on the activity vector previously calculated in the intermediate layer, that is, the intermediate-layer activity vector obtained when the (t-1)-th symbol x_{t-1} was read. In the second term inside f_n(·) on the right-hand side of Expression (5), the activity vector h_n^(t-1), marked with the superscript (t-1), is added to the activities of the neurons in the same layer with weight w_{n,n}.
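Read literally, the descriptions of Expressions (3) to (5) correspond to the following forms. The notation is reconstructed from the prose and from steps S105 to S108 of FIG. 2, so the exact typography is an assumption.

```latex
\begin{align}
h_n^{(t)} &= f_n\!\left( w_{n-1,n}\, h_{n-1}^{(t)} \right) \tag{3} \\
f(x) &= \frac{1}{1 + e^{-x}} \tag{4} \\
h_n^{(t)} &= f_n\!\left( w_{n-1,n}\, h_{n-1}^{(t)} + w_{n,n}\, h_n^{(t-1)} \right) \tag{5}
\end{align}
```

Expression (5) differs from Expression (3) only by the second term, which carries the activity of the same layer from the previous time step.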

The average activity vector calculation unit 104 calculates an average activity vector, which is the average of the activity vectors in the intermediate layer calculated up to a predetermined number of times before. That is, when the intermediate layer activity vector calculation unit 103 has calculated the activity vectors h_{n-1}^(1), h_{n-1}^(2), ..., h_{n-1}^(t-1), h_{n-1}^(t) of the (n-1)-th layer, an intermediate layer, for the symbol string x_1, x_2, ..., x_t read up to time t, the average activity vector calculation unit 104 calculates, by Expression (6), the average activity vector, which is the average of the most recent A of those activity vectors.
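A minimal sketch of averaging the most recent A intermediate-layer activity vectors is shown below. The class name `RunningAverage` is illustrative, and averaging over fewer than A vectors at the start of the sequence is an assumption of this sketch; the text does not state how the first A-1 steps are handled.

```python
from collections import deque

class RunningAverage:
    """Keep the most recent `window` activity vectors and return their mean."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.buffer = deque(maxlen=window)  # older vectors are dropped automatically

    def update(self, activity):
        """Add the newest activity vector and return the element-wise average."""
        self.buffer.append(list(activity))
        count = len(self.buffer)  # fewer than `window` early on (assumption)
        return [sum(column) / count for column in zip(*self.buffer)]

average = RunningAverage(window=2)          # A = 2
first = average.update([1.0, 0.0])
second = average.update([0.0, 1.0])
third = average.update([1.0, 1.0])          # [1.0, 0.0] has left the window
```

With A = 1 the average is simply the latest activity vector, while a larger A spreads the weight evenly over more past steps.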

Increasing the value of A lets the average activity vector retain context information over a longer span than before. The average activity vector calculation unit 104 may also calculate multiple average activity vectors by varying the predetermined number of times. In that case, the average activity vector calculation unit 104 sets different values A^(1), A^(2), ..., A^(m), ..., A^(M) and calculates multiple average activity vectors, where M is an arbitrary constant and A^(m) is a function of m that returns an integer. The average activity vectors for m = 1, 2, ..., M are then calculated by Expression (7).

Here, w_{n-1,n}^(m) is the connection weight matrix for calculating the average activity vector over A^(m) vectors. When M = 1, A^(1) = 1, and w_{n-1,n}^(1) = w_{n-1,n}, the average activity vector equals the activity vector calculated by the intermediate layer activity vector calculation unit 103. When the n-th layer has recurrent connections, the average activity vector calculation unit 104, like the intermediate layer activity vector calculation unit 103, adds the activity vector h_n^(t-1) at t-1 multiplied by the connection weight w_{n,n}, as in Expression (8).
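One possible reading of the multi-window case is sketched below: for each window length A^(m), the recent activity vectors of the lower layer are averaged, multiplied by that window's own weight matrix, and the results are summed before the activation function is applied. This summed form is an assumption chosen because it reduces to the ordinary product of weight matrix and activity vector when M = 1 and A^(1) = 1, as the text requires; the function names and example values are illustrative only, and the recurrent term is omitted.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wkj * vj for wkj, vj in zip(row, v)) for row in w]

def window_average(history, window):
    """Average the last `window` vectors of `history` (all of them if fewer)."""
    recent = history[-window:]
    return [sum(column) / len(recent) for column in zip(*recent)]

def pre_activation(history, windows, weights):
    """Sum over m of weights[m] times the average over windows[m] (assumed form)."""
    z = [0.0] * len(weights[0])
    for window, w in zip(windows, weights):
        contribution = matvec(w, window_average(history, window))
        z = [zi + ci for zi, ci in zip(z, contribution)]
    return z

identity = [[1.0, 0.0], [0.0, 1.0]]
history = [[1.0, 0.0], [0.0, 1.0]]          # lower-layer activities at t-1 and t
single = pre_activation(history, [1], [identity])                 # M = 1, A = 1
double = pre_activation(history, [1, 2], [identity, identity])    # M = 2
```

In the single-window case the result equals the weight matrix applied to the latest activity vector, so the conventional RNN is recovered as a special case.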

The output layer activity vector calculation unit 105 calculates the activity vector of the output layer each time the intermediate layer activity vector calculation unit 103 calculates an activity vector of the intermediate layer. It calculates the activity vector of the L-th layer, that is, the output layer, from the activity vector or the average activity vector of the (L-1)-th layer, which is the intermediate layer closest to the output layer. So that the activities can be regarded as probabilities, the unit uses the softmax function shown in Equation (9) as the activation function.

Here, the denominator of Equation (9) is a normalization term that allows the activities to be regarded as probabilities, and z_n[i] denotes the i-th element of the activity vector z_n propagated with weights from the (n-1)-th layer, as shown in Equation (10).
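The softmax activation described for Equation (9) can be sketched as follows. The function name `softmax` is chosen here for illustration. Subtracting the maximum before exponentiation is a common numerical-stability measure that is not mentioned in the text; it does not change the result mathematically.

```python
import math


def softmax(z):
    """Map a weighted-input vector z to a probability distribution.

    Each output is exp(z[i]) divided by the sum of exp(z[j]) over all j,
    so the denominator acts as the normalization term.
    """
    shift = max(z)  # stability measure added in this sketch, not from the text
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)
```

The outputs are positive and sum to 1, which is what allows each output neuron's activity to be read as the appearance probability of its symbol.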

In an RNN language model, each neuron of the output layer corresponds to a unique symbol, and the predicted appearance probability of the next symbol is obtained as the activity of the neuron corresponding to that symbol. The symbol appearance probability calculation unit 106 therefore calculates, by Equation (11), the probability that symbol v_k appears after the input symbol string x_1, x_2, ..., x_t has been read, based on the output-layer activity vector h_L^{(t)}. Here, v_k denotes the symbol corresponding to the k-th neuron of the output layer.

When the average activity vector calculation unit 104 calculates a plurality of average activity vectors by varying the predetermined number of times, the output layer activity vector calculation unit 105 calculates, as the activity vector of the output layer, the weighted sum of the intermediate-layer activity vector and the plurality of intermediate-layer average activity vectors. In this case, the unit uses Equation (7) or Equation (8) with n set to L.

The intermediate layer activity vector calculation unit 103 and the average activity vector calculation unit 104 calculate activities by choosing among the methods described so far according to whether the (n-1)-th layer and the n-th layer have recurrent connections.

First, when neither the n-th layer nor the (n-1)-th layer has a recurrent connection, the activity vector calculated by Equation (3) is the activity vector of the n-th layer. When the (n-1)-th layer has a recurrent connection and the n-th layer does not, the activity vector calculated by Equation (5) is the activity vector of the n-th layer. When the (n-1)-th layer has no recurrent connection and the n-th layer has one, the activity vector calculated by Equation (7) is the activity vector of the n-th layer. When both the n-th layer and the (n-1)-th layer have recurrent connections, the activity vector calculated by Equation (8) is the activity vector of the n-th layer.

Next, each part of the learning unit 11 is described in detail, together with the method of setting the connection weights used to calculate activity vectors. Basically, the connection weights, which are the parameters of the RNN, are estimated by error backpropagation using training data consisting of symbol strings.

The method of setting the connection weights is described for the case of L = 3, that is, a three-layer RNN language model. In this case, the first layer is the input layer, the second layer is the intermediate layer, and the third layer is the output layer. First, the connection weight matrices w_{1,2}, w_{2,2}, and w_{2,3} can be set by a known method such as BPTT (Back Propagation Through Time) (Reference: Williams, R. J., and Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2), 270, 1989.).

Meanwhile, the learning unit 20 sets the connection weights w_{2,3}^{(1)}, ..., w_{2,3}^{(M)} used by the average activity vector calculation unit 104 as follows. First, the learning unit 20 obtains the intermediate-layer activity vector sequence h_2^{(1)}, h_2^{(2)}, ..., h_2^{(t)}, ..., h_2^{(T)} for the training symbol string x_1, x_2, ..., x_t using the matrices w_{1,2} and w_{2,2}, for example by using the functions of the prediction unit 10. The loss function definition unit 111 then uses the training data to define a loss function whose parameters are the weights of the weighted sum. Specifically, for the parameter set shown in Equation (12), it defines the loss function based on the negative log-likelihood shown in Equation (13).

The parameter estimation unit 112 then estimates the parameters so that the loss function is minimized. That is, it estimates V so that E(V) in Equation (13) is minimized. Here, as shown in Equation (14), y_t[k] denotes the k-th element of the vector representation of the symbol y_t that follows each symbol x_t in the training data, that is, the symbol appearing at time t+1.

Also, h_3^{(t)} is the activity vector of the output layer and represents the predicted probability distribution of the next symbol. E(V) therefore becomes smaller as the RNN language model assigns a higher probability to the symbol that actually appeared next, so the parameter estimation unit 112 only needs to find the V that minimizes E(V). The second term on the right-hand side, however, is a regularization term that keeps the individual elements w_{2,3}^{(m)}[k,j] of V from becoming too large, and β is a positive constant giving the weight of the regularization term.
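A loss of this kind can be sketched as follows. Equation (13) itself is not reproduced in this text, so the exact form is an assumption: the sketch uses the sum of negative log probabilities of the symbols that actually followed, plus β times the sum of squared weights as the regularization term. The name `regularized_nll` is hypothetical, and each one-hot target y_t is represented by the index of its nonzero element.

```python
import math


def regularized_nll(predictions, targets, weights, beta):
    """Negative log-likelihood plus a squared-weight penalty.

    predictions: list of output-layer probability vectors, one per time step.
    targets:     list of indices k of the symbol that actually came next.
    weights:     list of M weight matrices (lists of rows).
    beta:        positive regularization coefficient.
    Assumption: the penalty is beta * (sum of squared weight elements).
    """
    nll = -sum(math.log(p[k]) for p, k in zip(predictions, targets))
    penalty = beta * sum(w * w for W in weights for row in W for w in row)
    return nll + penalty


confident = regularized_nll([[0.9, 0.1]], [0], [[[0.0, 0.0]]], 1e-5)
unsure = regularized_nll([[0.6, 0.4]], [0], [[[0.0, 0.0]]], 1e-5)
print(confident < unsure)  # True
```

The comparison illustrates the property stated above: the loss decreases as the model assigns a higher probability to the symbol that actually appeared.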

The parameter estimation unit 112 finds the minimum of E(V) using, for example, a gradient method. For example, as shown in Equation (15), it obtains the partial derivative of E(V) with respect to each element w_{2,3}^{(m)}[k,j].

The parameter estimation unit 112 then obtains new parameters as shown in Equation (16), and repeats the partial derivative calculation and the parameter update using the new parameters. Here, η denotes the learning rate.
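The repeated update can be sketched as a plain gradient-descent step, assuming Equation (16) has the usual form "new parameter = old parameter minus η times the partial derivative" (the equation itself is not reproduced in this text). The names `gradient_step` and `minimize_quadratic` are hypothetical, and the quadratic objective is only a toy stand-in for E(V).

```python
def gradient_step(W, grad, eta):
    """Return W - eta * grad, element by element, for a matrix of weights."""
    return [
        [w - eta * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(W, grad)
    ]


def minimize_quadratic(start, target, eta, steps):
    """Toy example: minimize (w - target)^2 by repeated gradient steps."""
    W = [[start]]
    for _ in range(steps):
        grad = [[2.0 * (W[0][0] - target)]]  # derivative of (w - target)^2
        W = gradient_step(W, grad, eta)
    return W[0][0]


print(minimize_quadratic(start=0.0, target=3.0, eta=0.1, steps=200))
```

Each step moves the parameter against the gradient, so the toy parameter approaches the minimizer 3.0; a learning rate that is too large would instead make the iteration diverge.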

The parameter estimation unit 112 can also speed up convergence of the loss function by using the stochastic gradient method, in which the training data is divided into several small blocks (or individual words, etc.) and the blocks are read in order while the partial derivative calculation and the parameter update are repeated.

[Processing of the First Embodiment]
Next, the processing of the language probability calculation device 1 is described with reference to FIG. 5. FIG. 5 is a flowchart showing an example of processing in the language probability calculation device according to the first embodiment. As shown in FIG. 5, the symbol vector reading unit 101 first sets t to 1 (step S201) and reads the symbol string. The input layer activity vector calculation unit 102 then sets the symbol x_t as the input-layer activity vector h_1^{(t)} (step S202) and sets n to 2 (step S203).

Here, when the n-th layer is an intermediate layer with a recurrent connection (step S204, Yes), the intermediate layer activity vector calculation unit 103 assigns the recurrently propagated component w_{n,n}·h_n^{(t-1)} to z_n (step S205). When the n-th layer is not an intermediate layer with a recurrent connection (step S204, No), the unit sets z_n to the zero vector (step S206).

Furthermore, when the (n-1)-th layer is an intermediate layer with a recurrent connection (step S207, Yes), the average activity vector calculation unit 104 calculates average activity vectors. It first sets m to 1 (step S209). It then calculates the average of the past A^{(m)} activity vectors (step S210), multiplies it by the weight, and adds the result to z_n (step S211).

Here, when m < M (step S212, Yes), the average activity vector calculation unit 104 increments m by 1 (step S213), returns to step S209, and repeats the processing. When m < M does not hold (step S212, No), the unit finishes calculating the average activity vectors.

On the other hand, when the (n-1)-th layer is not an intermediate layer with a recurrent connection (step S207, No), the intermediate layer activity vector calculation unit 103 adds the component w_{n-1,n}·h_{n-1}^{(t)} propagated from the (n-1)-th layer to z_n (step S208).

Next, the average activity vector calculation unit 104 applies the activation function f_n(·) to z_n to obtain the activity vector h_n^{(t)} of the n-th layer (step S214). When n = L, the output layer activity vector calculation unit 105 applies the activation function instead. Here, when n < L (step S215, Yes), n is incremented by 1 (step S216) and the processing returns to step S204. When n < L does not hold (step S215, No) and t < T (step S217, Yes), t is incremented by 1 (step S218) and the processing returns to step S202. When t < T does not hold (step S217, No), the processing ends.
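The flow of FIG. 5 for a three-layer model (L = 3) can be sketched as a single forward pass. This is an illustrative sketch under several assumptions not stated in the text: the intermediate activation is a sigmoid, the initial intermediate state is the zero vector, symbols are one-hot vectors given by index, and averages near the start of the sequence use only the vectors available. All function and parameter names are hypothetical.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(z):  # assumed intermediate-layer activation
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mean_last(history, A):
    window = history[-A:]
    return [sum(column) / len(window) for column in zip(*window)]


def forward(symbol_ids, vocab_size, W12, W22, W23, windows):
    """Return one next-symbol distribution per input symbol.

    W12: hidden x vocab matrix, W22: hidden x hidden matrix,
    W23: list of M vocab x hidden matrices, windows: list of M lengths.
    """
    h2_prev = [0.0] * len(W22)  # assumed initial state
    history, outputs = [], []
    for k in symbol_ids:  # loop over t (S201, S217, S218)
        h1 = [1.0 if i == k else 0.0 for i in range(vocab_size)]  # S202
        # n = 2: recurrent layer fed by the non-recurrent input layer
        z2 = matvec(W22, h2_prev)      # S205
        z2 = add(z2, matvec(W12, h1))  # S208
        h2 = sigmoid(z2)               # S214
        history.append(h2)
        # n = 3: non-recurrent output layer fed by the recurrent layer
        z3 = [0.0] * vocab_size        # S206
        for W, A in zip(W23, windows):  # loop over m (S209 to S213)
            z3 = add(z3, matvec(W, mean_last(history, A)))  # S210, S211
        outputs.append(softmax(z3))    # S214 with n = L
        h2_prev = h2
    return outputs


W12 = [[0.1, -0.2], [0.3, 0.4]]
W22 = [[0.0, 0.1], [0.1, 0.0]]
W23 = [[[0.5, -0.5], [-0.5, 0.5]], [[0.1, 0.0], [0.0, 0.1]]]
for distribution in forward([0, 1, 1], 2, W12, W22, W23, [1, 2]):
    print(distribution)
```

The weights above are arbitrary small numbers chosen only so that the sketch runs; in the described device they would be obtained by the learning procedure.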

The language probability calculation method in the language probability calculation device 1 is described with reference to FIG. 6. FIG. 6 is a diagram for explaining an example of the language probability calculation method in the language probability calculation device according to the first embodiment. In the example of FIG. 6, L = 3 and there is only one intermediate layer.

As shown in FIG. 6, for the sequentially input symbols x_1, x_2, ..., x_t, the intermediate layer activity vector calculation unit 103, with n = 2, calculates the intermediate-layer activity vector h_2^{(t)} from the input-layer activity vector h_1^{(t)} for the current symbol and the intermediate-layer activity vector h_2^{(t-1)} for the previous symbol (steps S205, S206, S208, and S214 in FIG. 5).

Next, the average activity vector calculation unit 104, with n = 3, calculates the average of the past two intermediate-layer activity vectors, the average of the past four, and the average of the past eight (steps S210 and S211 in FIG. 5). Here, the intermediate-layer activity vector h_2^{(t)} is regarded as the average of the intermediate-layer activity vectors over the past one step.

The output layer activity vector calculation unit 105 applies the activation function to the average activity vectors over the past one, two, four, and eight steps, and calculates the output-layer activity vector h_3^{(t)} (step S214 in FIG. 5). How many past activity vectors each average activity vector covers depends on the settings of the constant M and the function A^{(m)}.

[Effects of the First Embodiment]
The effects of the first embodiment are described using the results of evaluating the language probability calculation device 1 on actual data. First, to obtain the parameters of the RNN language model, human-made transcriptions of academic conference lectures included in the Corpus of Spontaneous Japanese (日本語話し言葉コーパス) were used as training data.

The RNN language model was configured, for the words appearing in the training data (vocabulary size 52,564), as a three-layer (L = 3) RNN consisting of an input layer (H_1 = 52,564), an intermediate layer (H_2 = 400), and an output layer (H_3 = 52,564). The parameters w_{2,3}^{(m)}[k,j] of the RNN language model of the first embodiment were estimated by the stochastic gradient method with M = 6, A^{(m)} = 2^{(m-1)}, learning rate η = 0.1, and β = 10^{-5}.
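The window lengths implied by these settings can be worked out directly. The short sketch below evaluates A^{(m)} = 2^{(m-1)} for m = 1 to 6 and, assuming each w_{2,3}^{(m)} is an H_3 × H_2 matrix (which follows from the layer sizes but is not stated explicitly), counts the estimated weight elements. The variable names are chosen for illustration.

```python
M = 6                  # number of averaging windows
H2, H3 = 400, 52564    # intermediate-layer and output-layer sizes

# A(m) = 2^(m-1) for m = 1, ..., M
windows = [2 ** (m - 1) for m in range(1, M + 1)]

# Assumption: one H3 x H2 weight matrix per window.
num_output_weights = M * H3 * H2

print(windows)             # [1, 2, 4, 8, 16, 32]
print(num_output_weights)  # 126153600
```

Under these settings the longest average covers the past 32 intermediate-layer activity vectors.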

The RNN language model of the first embodiment and a conventional RNN language model were both trained on the training data, and transcriptions of 10 conference lectures not included in the training data were used as evaluation data. Table 1 shows the test-set perplexity calculated for each RNN language model.

Test-set perplexity is a well-known measure of language model performance. Given evaluation data x_1, x_2, ..., x_τ, ..., x_R, the test-set perplexity is defined as 2 raised to the power of the entropy H obtained with the language model as shown in Equation (17), that is, 2^H. A smaller test-set perplexity means a better language model, so Table 1 shows that the RNN language model of the first embodiment performs better than the conventional RNN language model.
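The perplexity calculation can be sketched as follows. Equation (17) is not reproduced in this text, so the sketch assumes the standard definition of the entropy as the negative mean base-2 logarithm of the probabilities the model assigns to the evaluation symbols. The function name `perplexity` is hypothetical.

```python
import math


def perplexity(symbol_probs):
    """Compute 2^H from the model probabilities of each evaluation symbol.

    symbol_probs: probabilities P(x_tau | x_1 ... x_{tau-1}) for tau = 1..R.
    Assumption: H = -(1/R) * sum of log2 of those probabilities.
    """
    entropy = -sum(math.log2(p) for p in symbol_probs) / len(symbol_probs)
    return 2.0 ** entropy


print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

A model that assigns probability 0.25 to every symbol has perplexity 4, as if it were choosing uniformly among four symbols at each step; higher probabilities for the observed symbols give a lower perplexity.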

In the language probability calculation device 1, the symbol vector reading unit 101 first reads vectors representing symbols one after another. Each time the symbol vector reading unit 101 reads a vector, the input layer activity vector calculation unit 102 calculates the activity vector of the input layer from that vector. Each time the input layer activity vector calculation unit 102 calculates the activity vector of the input layer, the intermediate layer activity vector calculation unit 103 calculates the activity vector of the intermediate layer from the activity vector previously calculated in the intermediate layer and the activity vector of the input layer.

Here, the average activity vector calculation unit 104 calculates an average activity vector, which is the average of the intermediate-layer activity vectors calculated over a predetermined number of most recent steps. Each time the intermediate layer activity vector calculation unit 103 calculates the activity vector of the intermediate layer, the output layer activity vector calculation unit 105 calculates the activity vector of the output layer from the intermediate-layer activity vector and the intermediate-layer average activity vector. The symbol appearance probability calculation unit 106 then calculates the appearance probability of a given symbol from the activity vector of the output layer.

In this way, the average activity vector calculation unit 104 allows activity vectors calculated up to an arbitrary number of steps earlier to have sufficient influence on the final calculation of the symbol appearance probability. According to the first embodiment, therefore, the next word can be predicted appropriately by making effective use of long-context information such as the topic and style of the text and the words and manner of speaking peculiar to the speaker.

The average activity vector calculation unit 104 may also calculate a plurality of average activity vectors by varying the predetermined number of times. In that case, the output layer activity vector calculation unit 105 calculates, as the activity vector of the output layer, the weighted sum of the intermediate-layer activity vector and the plurality of intermediate-layer average activity vectors. Further, the loss function definition unit 111 uses the training data to define a loss function whose parameters are the weights of the weighted sum, and the parameter estimation unit 112 estimates the parameters so that the loss function is minimized.

Because a plurality of average activity vectors is used and the prediction accuracy can be improved through learning, long-context information can be used even more effectively.

[Second Embodiment]
Next, as a second embodiment, a case in which the language probability calculation method of the present invention is applied to a speech recognition device is described. A speech recognition device outputs a recognition result by taking both acoustic plausibility and linguistic plausibility into account. In the second embodiment, the language probability calculation method of the present invention is used to judge linguistic plausibility.

[Configuration of the Second Embodiment]
The configuration of the speech recognition device according to the second embodiment is described with reference to FIG. 7. FIG. 7 is a diagram showing an example of the configuration of the speech recognition device according to the second embodiment. As shown in FIG. 7, the speech recognition device 2 includes a speech signal input unit 21, a candidate sentence creation unit 22, an acoustic score calculation unit 23, a language probability calculation unit 24, a language score calculation unit 25, and a recognition result extraction unit 26.

The speech signal to be recognized is input to the speech signal input unit 21. The candidate sentence creation unit 22 creates a plurality of candidate sentences, which are candidates for the sentence matching the input speech signal. Here, the candidate sentence creation unit 22 creates Q candidate sentences X_1, X_2, ..., X_Q.

The acoustic score calculation unit 23 calculates, for each candidate sentence, an acoustic score representing how well the sentence matches the speech signal acoustically. It calculates the acoustic score amscore(X_q) for each of the Q candidate sentences. The acoustic score calculation unit 23 may calculate the acoustic score by a known method.

When a conventional N-gram language model is used, letting the language score be lmscore(X_q), score(X_q) is calculated as shown in Equation (18).

The language score lmscore(X_q) is calculated by Equation (19) from the N-gram language probabilities. Here, the appearance probability of symbol x_{q,τ} given the symbol string x_{q,1}, ..., x_{q,τ-1} of candidate sentence X_q, as calculated by the N-gram model, is written P_ngram(x_{q,τ} | x_{q,τ-N+1} ... x_{q,τ-1}). N is typically about 3 to 4.

In the speech recognition device 2, the RNN language model of the language probability calculation method of the first embodiment is used instead of the N-gram language model. The language probability calculation unit 24 calculates the symbol appearance probabilities for each candidate sentence by the same language probability calculation method as in the first embodiment. That is, it calculates the appearance probability P_rnn(x_{q,τ} | x_{q,1} ... x_{q,τ-1}) of symbol x_{q,τ} given the symbol string x_{q,1}, ..., x_{q,τ-1} of candidate sentence X_q.

The language score calculation unit 25 calculates a language score for each candidate sentence from the appearance probabilities. Specifically, it calculates the language score of each candidate sentence by Equation (20) from the symbol appearance probabilities calculated by the language probability calculation unit 24.

The recognition result extraction unit 26 extracts, from among the candidate sentences, the one with the largest sum of acoustic score and language score as the sentence matching the speech signal. Specifically, it calculates score(X_q) for each candidate sentence by Equation (21) from the acoustic score and the language score, and extracts the candidate sentence with the largest score(X_q) as the recognition result. Here, λ is a positive constant representing a scaling coefficient for the log probability.
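The rescoring of candidate sentences can be sketched as follows. Equations (20) and (21) are not reproduced in this text, so the sketch assumes that the language score is the sum of the log appearance probabilities of the symbols in a candidate and that the total score is the acoustic score plus λ times the language score. The names `language_score` and `best_candidate`, and the tuple layout of a candidate, are hypothetical.

```python
import math


def language_score(symbol_probs):
    """Assumed form of the language score: sum of log probabilities."""
    return sum(math.log(p) for p in symbol_probs)


def best_candidate(candidates, lam):
    """Pick the candidate maximizing amscore + lam * lmscore.

    candidates: list of (text, acoustic_score, symbol_probs) tuples, where
                symbol_probs are the language model probabilities of the
                candidate's symbols.
    lam:        positive scaling coefficient for the log probability.
    """
    def total(candidate):
        _, acoustic_score, symbol_probs = candidate
        return acoustic_score + lam * language_score(symbol_probs)

    return max(candidates, key=total)[0]


candidates = [
    ("candidate A", -10.0, [0.5, 0.5]),   # better acoustically
    ("candidate B", -10.5, [0.9, 0.9]),   # better linguistically
]
print(best_candidate(candidates, lam=1.0))  # candidate B
```

In this toy example candidate A has the better acoustic score, but the language model strongly prefers candidate B, so B wins once the language score is weighted in; the scores and probabilities are made up for illustration.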

[Processing of the Second Embodiment]
The processing of the second embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of processing in the speech recognition device according to the second embodiment. As shown in FIG. 8, a speech signal is first input to the speech signal input unit 21 (step S301). Next, the candidate sentence creation unit 22 creates candidate sentences for the speech recognition result (step S302). The acoustic score calculation unit 23 then calculates the acoustic score of each candidate sentence (step S303).

The language probability calculation unit 24 then calculates the language probabilities (step S304), and the language score calculation unit 25 calculates the language score from the language probabilities (step S305). The recognition result extraction unit 26 extracts the candidate sentence that maximizes the sum of the acoustic score and the language score multiplied by the scaling coefficient (step S306), and outputs it as the recognition result (step S307).

[Effects of the Second Embodiment]
Table 2 shows the word error rates obtained when speech recognition was performed using the N-gram language model, the conventional RNN language model, and the RNN language model of the second embodiment. The number Q of candidates output initially was set to 500. The word error rate represents the proportion of actually spoken words that were recognized incorrectly; a smaller value means more accurate speech recognition.

As shown in Table 2, the word error rate was 14.8% for speech recognition using the N-gram language model, 13.9% using the conventional RNN language model, and 13.5% using the RNN language model of the second embodiment. This shows that the RNN language model of the second embodiment achieves more accurate speech recognition than both the N-gram language model and the conventional RNN language model.
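A word error rate of this kind can be computed as sketched below. The text does not say how recognized words are aligned with the spoken words, so the sketch assumes the commonly used definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    reference:  list of words actually spoken (assumed non-empty).
    hypothesis: list of words output by the recognizer.
    """
    # previous[j] is the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(reference)


print(word_error_rate("a b c d".split(), "a x c".split()))  # 0.5
```

In the example one word is substituted and one is dropped out of four reference words, giving a rate of 0.5, that is, 50%.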

In the speech recognition device 2, the candidate sentence creation unit 22 first creates a plurality of candidate sentences, which are candidates for the sentence matching the input speech signal. The acoustic score calculation unit 23 then calculates, for each candidate sentence, an acoustic score representing how well the sentence matches the speech signal acoustically.

The language probability calculation unit 24 calculates the symbol appearance probabilities for each candidate sentence by the same language probability calculation method as in the first embodiment. The language score calculation unit 25 calculates a language score for each candidate sentence from the appearance probabilities. The recognition result extraction unit 26 then extracts, from among the candidate sentences, the one with the largest sum of acoustic score and language score as the sentence matching the speech signal.

[Other Embodiments]
In FIG. 6 and elsewhere, the case of a single intermediate layer has been described as an example, but the present invention is not limited to one intermediate layer and may have a plurality of them. In that case, each time the input layer activity vector calculation unit 102 calculates the activity vector of the input layer, the intermediate layer activity vector calculation unit 103 of the language probability calculation device 1 calculates the activity vector of an intermediate layer based not only on the activity vector previously calculated in that intermediate layer and the activity vector of the input layer, but also on the activity vector of the intermediate layer below it.

In addition, the language probability calculation device 1 is further provided with an inter-layer activity vector calculation unit. Each time the intermediate layer activity vector calculation unit 103 calculates an activity vector in an intermediate layer, the inter-layer activity vector calculation unit calculates the activity vector in the intermediate layer above it, based on the activity vector calculated by the intermediate layer activity vector calculation unit 103 and the average activity vector in that intermediate layer.
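
One possible reading of this multi-layer arrangement is sketched below. It is an illustration under stated assumptions, not the configuration of the embodiment: the names `stacked_step`, `W_in`, `W_rec`, and `W_avg` are hypothetical, tanh is an assumed activation function, the averaging window is assumed to include the current step, and the way the activity vector and its average are combined before being passed upward (a simple sum after a linear map) is only one choice consistent with the description.

```python
import math

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(t) for t in zip(*vectors)]

def average(vectors):
    # Element-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def stacked_step(x, prev_h, histories, layers, window):
    """Advance every intermediate layer by one time step.

    x: input-layer activity vector; prev_h[k]: previous activity of layer k;
    histories[k]: past activity vectors of layer k (appended to in place).
    Returns the new activities and the vector passed above the top layer.
    """
    new_h = []
    below = x
    for k, layer in enumerate(layers):
        # Recurrent update from the signal below and the layer's previous activity.
        pre = add(matvec(layer["W_in"], below), matvec(layer["W_rec"], prev_h[k]))
        h = [math.tanh(v) for v in pre]
        histories[k].append(h)
        h_avg = average(histories[k][-window:])
        # Inter-layer step (assumed form): combine the activity and its average.
        below = add(h, matvec(layer["W_avg"], h_avg))
        new_h.append(h)
    return new_h, below

# Illustrative two-layer stack with made-up weights.
layers = [
    {"W_in": [[0.5, -0.3], [0.1, 0.8]], "W_rec": [[0.2, 0.0], [0.0, 0.2]],
     "W_avg": [[0.5, 0.0], [0.0, 0.5]]},
    {"W_in": [[0.3, 0.3], [-0.2, 0.4]], "W_rec": [[0.1, 0.0], [0.0, 0.1]],
     "W_avg": [[0.5, 0.0], [0.0, 0.5]]},
]
prev_h = [[0.0, 0.0], [0.0, 0.0]]
histories = [[], []]
new_h, top = stacked_step([1.0, 0.0], prev_h, histories, layers, window=3)
```

The vector `top` would then feed the output layer in the same way as in the single-layer case.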

[System Configuration, etc.]
The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated; all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by that CPU, or may be realized as hardware using wired logic.

Of the processes described in the present embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

[Program]
FIG. 9 is a diagram illustrating an example of a computer that realizes the language probability calculation device or the speech recognition device by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the language probability calculation device or the speech recognition device is implemented as the program module 1093, in which computer-executable code is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, a program module 1093 for executing the same processing as the functional configuration of the language probability calculation device or the speech recognition device is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the above-described embodiment is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like), and may be read from that computer by the CPU 1020 via the network interface 1070.

[Description of Symbols]
1 Language probability calculation device
10 Prediction unit
11 Learning unit
12 Storage unit
101 Symbol vector reading unit
102 Input layer activity vector calculation unit
103 Intermediate layer activity vector calculation unit
104 Average activity vector calculation unit
105 Output layer activity vector calculation unit
106 Symbol appearance probability calculation unit
111 Loss function definition unit
112 Parameter estimation unit
121 RNN language model storage unit

Claims

A language probability calculation method for calculating a language probability using a neural network model having an input layer, an intermediate layer having recurrently connected neurons, and an output layer, the method comprising:
A symbol vector reading step of sequentially reading vectors representing symbols;
An input layer activity vector calculation step of calculating, each time a vector is read in the symbol vector reading step, an activity vector in the input layer based on the vector;
An intermediate layer activity vector calculation step of calculating, each time the activity vector in the input layer is calculated in the input layer activity vector calculation step, an activity vector in the intermediate layer based on the activity vector previously calculated in the intermediate layer and the activity vector in the input layer;
An average activity vector calculation step of calculating an average activity vector, which is an average of those activity vectors in the intermediate layer that were calculated up to a predetermined number of times before;
An output layer activity vector calculation step of calculating, each time the activity vector in the intermediate layer is calculated in the intermediate layer activity vector calculation step, an activity vector in the output layer based on the activity vector in the intermediate layer and the average activity vector in the intermediate layer; and
A symbol appearance probability calculation step of calculating an appearance probability of a predetermined symbol based on the activity vector in the output layer.
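
The sequence of steps recited above can be sketched as a small forward pass. This is a minimal sketch under stated assumptions, not the claimed implementation: all names (`symbol_probabilities`, `E`, `W_in`, `W_rec`, `W_out`, `W_avg`, `window`) are hypothetical, tanh and softmax are assumed activation functions, symbols are assumed to be integer indices whose one-hot vectors select a row of `E`, and the averaging window is assumed to include the current step.

```python
import math

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(t) for t in zip(*vectors)]

def average(vectors):
    # Element-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(z):
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def symbol_probabilities(symbols, E, W_in, W_rec, W_out, W_avg, window):
    """Yield one appearance-probability distribution per symbol read."""
    h = [0.0] * len(W_rec)   # previous intermediate-layer activity
    history = []             # intermediate-layer activities so far
    for s in symbols:        # symbol vector reading
        x = E[s]             # input-layer activity
        # Intermediate layer: previous activity plus input-layer activity.
        h = [math.tanh(v) for v in add(matvec(W_in, x), matvec(W_rec, h))]
        history.append(h)
        # Average over at most `window` most recent activity vectors.
        h_avg = average(history[-window:])
        # Output layer: uses both the activity and its average.
        y = add(matvec(W_out, h), matvec(W_avg, h_avg))
        yield softmax(y)     # appearance probability of each symbol

# Illustrative model: 3 symbols, 2-dimensional input and intermediate layers.
E = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_in = [[0.5, -0.3], [0.1, 0.8]]
W_rec = [[0.2, 0.0], [0.0, 0.2]]
W_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_avg = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
outputs = list(symbol_probabilities([0, 2, 1], E, W_in, W_rec, W_out, W_avg, window=2))
```

Each element of `outputs` is a distribution over the three symbols, computed after the corresponding input symbol has been read.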

The language probability calculation method according to claim 1, wherein
The average activity vector calculation step calculates a plurality of average activity vectors by varying the predetermined number of times, and
The output layer activity vector calculation step calculates, as the activity vector in the output layer, a weighted sum of the activity vector in the intermediate layer and the plurality of average activity vectors in the intermediate layer.
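
A small worked example of such a weighted sum is given below. It is a sketch under simplifying assumptions: the function name `multi_window_sum` is hypothetical, the weights are taken to be scalars (they could equally be matrices mapping into the output layer), and each window is assumed to include the current step.

```python
def average(vectors):
    # Element-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def multi_window_sum(history, windows, weights):
    """Weighted sum of the current activity vector and one average per window.

    weights[0] applies to the current vector; weights[1:] apply to the
    averages, in the same order as `windows`.
    """
    if len(weights) != len(windows) + 1:
        raise ValueError("need one weight per window plus one for the current vector")
    terms = [history[-1]] + [average(history[-n:]) for n in windows]
    # For each dimension, combine the values of all terms with their weights.
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*terms)]

# Four past intermediate-layer activity vectors (made-up numbers).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
combined = multi_window_sum(history, windows=[2, 4], weights=[0.5, 0.25, 0.25])
print(combined)  # [2.3125, 0.9375]
```

Here the average over the last two steps is [2.0, 1.0] and the average over the last four steps is [1.25, 0.75], so a short window follows recent context while a long window reflects more of the text read so far.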

The language probability calculation method according to claim 2, further comprising:
A loss function definition step of defining, using learning data, a loss function that has the weights in the weighted sum as parameters; and
A parameter estimation step of estimating the parameters so that the loss function is minimized.
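
As a hedged illustration of what such a loss function and estimate could look like, assume a cross-entropy loss over a learning-data sequence $w_1, \dots, w_T$ and estimation by gradient descent. Neither choice is stated in this claim; $\lambda$ (the weights of the weighted sum) and $\eta$ (a step size) are symbols introduced here for illustration only.

```latex
L(\lambda) = -\sum_{t=1}^{T} \log P(w_t \mid w_1, \dots, w_{t-1}; \lambda),
\qquad
\hat{\lambda} = \arg\min_{\lambda} L(\lambda),
\qquad
\lambda \leftarrow \lambda - \eta \, \frac{\partial L}{\partial \lambda}
```

Under this assumption, the weights given to each averaging window are learned from data rather than set by hand.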

The language probability calculation method, wherein the neural network model has a plurality of the intermediate layers, and
In the intermediate layer activity vector calculation step, each time the activity vector in the input layer is calculated in the input layer activity vector calculation step, the activity vector in an intermediate layer is calculated based on the activity vector previously calculated in that intermediate layer, the activity vector in the intermediate layer below it, and the average activity vector in the intermediate layer.

The language probability calculation method according to claim 1, further comprising:
A candidate sentence creation step of creating a plurality of candidate sentences, which are candidates for the sentence matching an input speech signal;
An acoustic score calculation step of calculating, for each candidate sentence, an acoustic score representing its degree of acoustic match with the speech signal;
A language score calculation step of calculating a language score for each candidate sentence based on the appearance probability calculated in the symbol appearance probability calculation step; and
A recognition result extraction step of extracting, from among the candidate sentences, the candidate sentence having the largest sum of the acoustic score and the language score as the sentence matching the speech signal,
Wherein the symbol vector reading step sequentially reads vectors representing the words constituting the candidate sentences.

A language probability calculation device that calculates a language probability using a neural network model having an input layer, an intermediate layer having recurrently connected neurons, and an output layer, the device comprising:
A symbol vector reading unit that sequentially reads vectors representing symbols;
An input layer activity vector calculation unit that calculates, each time a vector is read by the symbol vector reading unit, an activity vector in the input layer based on the vector;
An intermediate layer activity vector calculation unit that calculates, each time the activity vector in the input layer is calculated by the input layer activity vector calculation unit, an activity vector in the intermediate layer based on the activity vector previously calculated in the intermediate layer and the activity vector in the input layer;
An average activity vector calculation unit that calculates an average activity vector, which is an average of those activity vectors in the intermediate layer that were calculated up to a predetermined number of times before;
An output layer activity vector calculation unit that calculates, each time the activity vector in the intermediate layer is calculated by the intermediate layer activity vector calculation unit, an activity vector in the output layer based on the activity vector in the intermediate layer and the average activity vector in the intermediate layer; and
A symbol appearance probability calculation unit that calculates an appearance probability of a predetermined symbol based on the activity vector in the output layer.

A language probability calculation program for causing a computer to function as the language probability calculation device according to claim 6.