JP6620950B2

JP6620950B2 - Word learning device, word learning method, and word learning program

Info

Publication number: JP6620950B2
Application number: JP2017039543A
Authority: JP
Inventors: 林　克彦; 永田　昌明; 新保　仁
Original assignee: Nara Institute of Science and Technology NUC; Nippon Telegraph and Telephone Corp
Current assignee: Nara Institute of Science and Technology NUC; Nippon Telegraph and Telephone Corp
Priority date: 2017-03-02
Filing date: 2017-03-02
Publication date: 2019-12-18
Anticipated expiration: 2037-03-02
Also published as: JP2018147100A

Description

The present invention relates to a word learning device, a word learning method, and a word learning program that learn word vectors from a set of sentences.

Conventionally, the CBOW (Continuous Bag-of-Words) model and the Skip-gram model, both used in the software generally known as "word2vec", have been proposed as techniques for expressing the meaning of a word as a word vector (see Non-Patent Document 2). These models take a set of sentences as input and learn and output a d-dimensional vector for each word contained in each sentence of the set.

The following outlines the Skip-gram model, which is superior to the CBOW model.

First, for a sentence x = w_1, …, w_T in the input set of sentences X, consider the probability model

... (1)

Here, c represents the length of the history over which the words surrounding the word w_t are considered. When t + j is smaller than 1, w_{t+j} is taken to be a virtual sentence-initial word such as <s>; when t + j is larger than T, w_{t+j} is taken to be a virtual sentence-final word such as </s>.

In the Skip-gram model, the probability is modeled as

... (2)

Here, v and v′ are the word vectors of the words indicated by their subscripts, and W is the set of all words contained in the document X. The word vectors v and v′ are learned so as to maximize the probability model of equation (1) above.
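For orientation, the commonly used Skip-gram formulation is sketched below. Equations (1) and (2) are not reproduced in this text, so treating them as identical to this form (in particular the product form of the sentence probability and the softmax over W) is an assumption, not a quotation of the patent.

```latex
% Assumed form of equation (1): probability of a sentence x = w_1, ..., w_T
p(x) = \prod_{t=1}^{T} \; \prod_{\substack{-c \le j \le c \\ j \ne 0}} p(w_{t+j} \mid w_t)

% Assumed form of equation (2): softmax over the vocabulary W
p(w_{t+j} \mid w_t) =
  \frac{\exp\left( {v'_{w_{t+j}}}^{\top} v_{w_t} \right)}
       {\sum_{w \in W} \exp\left( {v'_{w}}^{\top} v_{w_t} \right)}
```

Under this reading, each word has two vectors: v when it acts as the conditioning word and v′ when it acts as the predicted surrounding word.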

Non-Patent Document 1: John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.

Non-Patent Document 2: Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

In word vector learning models such as the Skip-gram model, the probability model is built from products, sums, and similar operations in Euclidean space, so the individual dimension elements of the resulting word vectors generally bear no relation to one another. However, if the dimensions of a word vector were related to one another, as in a discrete signal, so that the order of the dimensions carried meaning, more information could be stored in the word vector.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a word learning device, a word learning method, and a word learning program capable of learning word vectors whose dimensions are related to one another.

To achieve the above object, a word learning device of the present invention includes: a surrounding-context word extraction unit that takes a set of sentences and a set of words as input and extracts words with surrounding context, each including a target word contained in a sentence of the set of sentences and the surrounding words around that target word; a word vector learning unit that learns the word vector of each target word and the word vector of each surrounding word so as to optimize an objective function expressed, for all the words with surrounding context extracted by the surrounding-context word extraction unit, using a score value calculated as the product of the word vector of the target word and a vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors of the surrounding words; and an output unit that outputs the word vectors learned by the word vector learning unit.

The word vector learning unit may let θ be the set of word vectors comprising the word vector of each target word and the word vector of each surrounding word, let X be the set of pairs (x, y) of a word x with surrounding context and a correct label y, and let η_x be the score value for the word x with surrounding context, and may then take the logistic regression model L(θ) formulated by the following equation as the objective function and learn the set of word vectors θ that minimizes the objective function as the optimal set of word vectors θ^.

Here, the score value η_x is calculated by the following equation from the word vector of the target word w_t, the word vector of the surrounding words in the left context of the target word w_t, and the word vector of the surrounding words in the right context of the target word w_t.

To achieve the above object, a word learning method of the present invention is a word learning method in a word learning device having a surrounding-context word extraction unit, a word vector learning unit, and an output unit, the method including: a step in which the surrounding-context word extraction unit takes a set of sentences and a set of words as input and extracts words with surrounding context, each including a target word contained in a sentence of the set of sentences and the surrounding words around that target word; a step in which the word vector learning unit learns the word vector of each target word and the word vector of each surrounding word so as to optimize an objective function expressed, for all the words with surrounding context extracted by the surrounding-context word extraction unit, using a score value calculated as the product of the word vector of the target word and a vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors of the surrounding words; and a step in which the output unit outputs the word vectors learned by the word vector learning unit.

In the step of learning by the word vector learning unit, θ may be the set of word vectors comprising the word vector of each target word and the word vector of each surrounding word, X may be the set of pairs (x, y) of a word x with surrounding context and a correct label y, and η_x may be the score value for the word x with surrounding context; the logistic regression model L(θ) formulated by the following equation may then be taken as the objective function, and the set of word vectors θ that minimizes the objective function may be learned as the optimal set of word vectors θ^.

Here, the score value η_x is calculated by the following equation from the word vector of the target word w_t, the word vector of the surrounding words in the left context of the target word w_t, and the word vector of the surrounding words in the right context of the target word w_t.

To achieve the above object, a program of the present invention is a program for causing a computer to function as each unit of the word learning device described above.

According to the present invention, word vectors whose dimensions are related to one another can be learned.

FIG. 1 is a block diagram showing the functional configuration of the word learning device according to the embodiment.
FIG. 2 is a flowchart showing the flow of the word learning process according to the embodiment.

The present embodiment is described below with reference to the drawings.

The word learning device according to the present embodiment learns a model for obtaining word vectors in which the order of the dimension elements carries meaning, by performing operations between discrete vectors (discrete signals) known as (cyclic) convolution and (cyclic) cross-correlation.

For two d-dimensional word vectors v and v′, the convolution * is defined by equation (3) below.

... (3)

The cross-correlation is defined by equation (4) below.

... (4)

Convolution satisfies v * v′ = v′ * v and is therefore a commutative operation, whereas cross-correlation does not in general give the same result when its operands are exchanged and is therefore a non-commutative operation.
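The two operations can be sketched in a few lines of Python. Equations (3) and (4) are not reproduced in this text, so the sketch below assumes the usual textbook definitions with 0-based indices taken modulo d; the function names `cyclic_convolution` and `cyclic_correlation` are hypothetical.

```python
from typing import List

Vector = List[float]


def cyclic_convolution(a: Vector, b: Vector) -> Vector:
    """Assumed form of eq. (3): out[k] = sum_i a[i] * b[(k - i) mod d]."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def cyclic_correlation(a: Vector, b: Vector) -> Vector:
    """Assumed form of eq. (4): out[k] = sum_i a[i] * b[(k + i) mod d]."""
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


v = [1.0, 2.0, 3.0]
u = [0.0, 1.0, 0.0]

conv_vu = cyclic_convolution(v, u)   # [3.0, 1.0, 2.0]
conv_uv = cyclic_convolution(u, v)   # same result: convolution commutes
corr_vu = cyclic_correlation(v, u)   # [2.0, 1.0, 3.0]
corr_uv = cyclic_correlation(u, v)   # [2.0, 3.0, 1.0]: operand order matters
```

These direct sums cost O(d²) per operation; both operations can also be computed in O(d log d) with the fast Fourier transform, which matters for realistic vector sizes.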

FIG. 1 is a block diagram showing the functional configuration of the word learning device 10 according to the present embodiment. As shown in FIG. 1, the word learning device 10 according to the present embodiment includes an input unit 12, a surrounding-context word extraction unit 14, a word vector learning unit 16, and an output unit 18.

The input unit 12 receives as input a set of sentences X′, the set of words W = {w^1, …, w^N} contained in the sentences of X′, and the history length l of the surrounding context. Here, N is the number of words.

The surrounding-context word extraction unit 14 takes each sentence contained in the set of sentences X′ as x′ = {w_1, …, w_T}. For every word w_t in x′, it extracts the word with surrounding context x = {w_{t−l}, …, w_{t−1}, w_t, w_{t+1}, …, w_{t+l}}. When the surrounding context extends to the left of w_1, the special word <s> is used; when it extends to the right of w_T, the special word </s> is used.

The surrounding-context word extraction unit 14 performs the above extraction for every sentence contained in the set of sentences X′, combines each extracted word with surrounding context x with the correct label "1" to form a pair (x, 1), and defines the set of these pairs as X. The set of pairs X is used as training data for the model proposed below.
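The extraction of words with surrounding context can be sketched as follows. This is a minimal illustration, assuming tokenized sentences given as lists of strings; the names `extract_words_with_context`, `build_positive_pairs`, and `history_len` (standing for the history length l) are hypothetical.

```python
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"  # special words used beyond the sentence boundaries

# (left context c_l, target word w_t, right context c_r)
ContextWord = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def extract_words_with_context(sentence: List[str], history_len: int) -> List[ContextWord]:
    """Return one (c_l, w_t, c_r) triple for every position t of the sentence."""
    padded = [BOS] * history_len + sentence + [EOS] * history_len
    triples = []
    for t in range(len(sentence)):
        p = t + history_len  # position of w_t inside the padded sentence
        left = tuple(padded[p - history_len:p])
        right = tuple(padded[p + 1:p + 1 + history_len])
        triples.append((left, padded[p], right))
    return triples


def build_positive_pairs(sentences: List[List[str]], history_len: int):
    """Attach the correct label 1 to every extracted word with surrounding context."""
    return [(x, 1) for s in sentences for x in extract_words_with_context(s, history_len)]


pairs = build_positive_pairs([["the", "cat", "sat"]], history_len=2)
# First pair: ((("<s>", "<s>"), "the", ("cat", "sat")), 1)
```

Padding the sentence once with l copies of each special word keeps the slicing uniform, so the boundary cases need no separate branch.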

As disclosed in Non-Patent Document 2, in the Skip-gram model and similar models, negative examples using words with surrounding context x that are not contained in the set of sentences X′ are generally generated by sampling and mixed into the set of pairs X (negative sampling) in order to make vector learning effective.

Likewise, in the present embodiment, negative examples such as pairs (x, −1), each combining a word with surrounding context x that did not appear in the set of sentences X′ with the label "−1", are generated by sampling and mixed into the set of pairs X to create the final set of pairs X.
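One way to generate such negative examples is sketched below. The text does not specify the sampling scheme, so replacing the target word with a word drawn uniformly from the vocabulary is an assumption made only for illustration; `sample_negatives` and its parameters are hypothetical names.

```python
import random
from typing import List, Set, Tuple

# (left context c_l, target word w_t, right context c_r)
ContextWord = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def sample_negatives(
    positives: List[ContextWord],
    vocabulary: List[str],
    per_positive: int,
    rng: random.Random,
) -> List[Tuple[ContextWord, int]]:
    """Corrupt the target word of observed triples (assumed scheme, not from the source)."""
    observed: Set[ContextWord] = set(positives)
    negatives = []
    for left, _target, right in positives:
        for _ in range(per_positive):
            candidate = (left, rng.choice(vocabulary), right)
            # Keep only triples that never occurred in the sentence set X';
            # rejected draws are dropped, so fewer than per_positive may remain.
            if candidate not in observed:
                negatives.append((candidate, -1))
    return negatives


positives = [
    (("<s>",), "the", ("cat",)),
    (("the",), "cat", ("sat",)),
    (("cat",), "sat", ("</s>",)),
]
negatives = sample_negatives(positives, ["the", "cat", "sat"], per_positive=2, rng=random.Random(0))
training_set = [(x, 1) for x in positives] + negatives
```

Checking candidates against the observed set enforces the stated condition that a negative example must not appear in X′.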

For a given word with surrounding context x = {w_{t−l}, …, w_{t−1}, w_t, w_{t+1}, …, w_{t+l}}, the word vector learning unit 16 performs the following calculations using convolution and cross-correlation.

First, the surrounding context made up of the surrounding words around the target word is divided into a left context c_l = {w_{t−l}, …, w_{t−1}} and a right context c_r = {w_{t+1}, …, w_{t+l}}, and convolution is applied to calculate the left-context word vector v_cl′ and the right-context word vector v_cr′ using equations (5) and (6) below.

... (5)

... (6)

Next, the score value η_x of the target word w_t in the word with surrounding context is calculated using equation (7) below, which uses convolution, or equation (8) below, which uses cross-correlation. That is, the score value η_x is calculated as the product of the word vector v_wt of the target word w_t and the vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors v_cl′ and v_cr′ of the surrounding words w_j.

... (7)

... (8)

In equation (7) above, because convolution is a commutative operation, the left and right surrounding words are associated without being distinguished. In equation (8) above, by contrast, cross-correlation is used, so left and right are distinguished. Because modeling with either equation (7) or equation (8) makes no essential difference, the present embodiment is described for the case where equation (7) (cyclic convolution) is used.
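The difference between the two scores can be illustrated with a small sketch. Equations (5) to (8) are not reproduced in this text, so the forms used here are assumptions: each context vector is taken to be the cyclic convolution of the word vectors in that context, and the score is taken to be the inner product of the target vector with the convolution (for (7)) or cross-correlation (for (8)) of the two context vectors. The names `context_vector` and `score` are hypothetical.

```python
from functools import reduce
from typing import List

Vector = List[float]


def cyclic_convolution(a: Vector, b: Vector) -> Vector:
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def cyclic_correlation(a: Vector, b: Vector) -> Vector:
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_vector(word_vectors: List[Vector]) -> Vector:
    """Assumed form of eqs. (5)/(6): convolve all word vectors of one context."""
    return reduce(cyclic_convolution, word_vectors)


def score(target: Vector, left: List[Vector], right: List[Vector], use_correlation: bool = False) -> float:
    """Assumed form of eq. (7) (convolution) or eq. (8) (cross-correlation)."""
    combine = cyclic_correlation if use_correlation else cyclic_convolution
    return dot(target, combine(context_vector(left), context_vector(right)))


target = [0.0, 1.0, 0.0]
a = [[1.0, 2.0, 3.0]]  # word vectors of one context (history length 1)
b = [[0.0, 1.0, 0.0]]  # word vectors of the other context

eta_conv = score(target, a, b)                                # 1.0
eta_conv_swapped = score(target, b, a)                        # 1.0: left/right not distinguished
eta_corr = score(target, a, b, use_correlation=True)          # 1.0
eta_corr_swapped = score(target, b, a, use_correlation=True)  # 3.0: left/right distinguished
```

Swapping the two contexts leaves the convolution-based score unchanged but changes the correlation-based score, which is the distinction the paragraph above describes.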

In the present embodiment, the logistic function σ is applied to the score value η_x calculated by equation (7) or (8) above, giving the probability value p_θ shown in equation (9) below.

... (9)

Here, θ denotes the model parameters, and φ_wt(c_l, c_r) is a binary function indicating whether or not the word with surrounding context w = {c_l, w_t, c_r} exists in the target data.

The word vector learning unit 16 formulates learning as minimization of the logistic regression model shown in equation (10) below, and thereby learns the optimal parameters θ^, that is, the word vector v_wt of each target word w_t and the word vectors v_cl′ and v_cr′ of the surrounding words w_j.

... (10)
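A logistic objective of this kind can be sketched as follows. Equations (9) and (10) are not reproduced in this text; the sketch assumes the standard logistic function and a loss that sums log(1 + exp(−y·η_x)) over the pairs (x, y), with labels y in {1, −1} and no regularization term. The names `sigmoid` and `logistic_loss` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(scored_pairs: Iterable[Tuple[float, int]]) -> float:
    """Assumed form of eq. (10): sum of log(1 + exp(-y * eta)) over (eta, y) pairs."""
    return sum(math.log1p(math.exp(-y * eta)) for eta, y in scored_pairs)


# A high score is cheap for a positive example and expensive for a negative one.
loss_positive = logistic_loss([(5.0, 1)])
loss_negative = logistic_loss([(5.0, -1)])
```

Each term equals −log σ(y·η_x), so minimizing the sum pushes scores up for observed words with surrounding context and down for sampled negative examples.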

This can be optimized by a stochastic gradient descent method such as Adagrad (see Non-Patent Document 1 above). The optimization procedure is as follows.

(Procedure 0) For the parameter θ, each dimension of each word vector is initialized with a random number generated from a uniform distribution over an interval whose endpoints are determined by the number of dimensions D of the word vectors.

(Procedure 1) The iteration counter i is set to 0, and (Procedure 2) below is repeated M times (M is an arbitrary natural number).

(Procedure 2) For each word with surrounding context x in the training data, the calculations of (Procedure 2-1) through (Procedure 2-5) below are performed. Here, x, c_l, and c_r are as defined above.

Here, the gradient vectors for the target word and for each surrounding word are all assumed to have been initialized to zero vectors.

(Procedure 2-1) The gradient of the word vector of the target word w_t at counter i is calculated using equation (11) below.

... (11)

(Procedure 2-2) For each surrounding word w_j to which this step applies, the gradient of its word vector at counter i is calculated using equation (12) below.

... (12)

(Procedure 2-3) Similarly, for each surrounding word w_j to which this step applies, the gradient of its word vector at counter i is calculated using equation (13) below.

... (13)

(Procedure 2-4) The word vector of the target word w_t at counter (i + 1) is obtained by the update in equation (14) below.

... (14)

Likewise, for each surrounding word w_j, the word vector at counter (i + 1) is obtained by the update in equation (15) below.

... (15)

(Procedure 2-5) The counter is updated so that i = i + 1.
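The procedure can be sketched end to end for a single training pair. Equations (11) to (15) and the initialization interval are not reproduced in this text, so everything below is an assumption: the score is the inner product of the target vector with the cyclic convolution of all context vectors, the loss is log(1 + exp(−y·η)), the gradients are derived from that loss by the chain rule, the update is a standard Adagrad step, and the interval (−0.5, 0.5) is a placeholder. All function names are hypothetical.

```python
import math
import random
from functools import reduce
from typing import List, Tuple

Vector = List[float]


def cyclic_convolution(a: Vector, b: Vector) -> Vector:
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def cyclic_correlation(a: Vector, b: Vector) -> Vector:
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def score(target: Vector, context: List[Vector]) -> float:
    """eta = target . (cyclic convolution of all context vectors); assumed form."""
    return sum(x * y for x, y in zip(target, reduce(cyclic_convolution, context)))


def pair_loss(target: Vector, context: List[Vector], y: int) -> float:
    return math.log1p(math.exp(-y * score(target, context)))


def pair_gradients(target: Vector, context: List[Vector], y: int) -> Tuple[Vector, List[Vector]]:
    """Gradients of the pair loss w.r.t. the target vector and each context vector."""
    g = -y / (1.0 + math.exp(y * score(target, context)))  # d loss / d eta
    grad_target = [g * c for c in reduce(cyclic_convolution, context)]
    grad_context = []
    for j in range(len(context)):
        others = context[:j] + context[j + 1:]
        # With no other context vectors, use the convolution identity [1, 0, ..., 0].
        rest = reduce(cyclic_convolution, others) if others else [1.0] + [0.0] * (len(target) - 1)
        # d eta / d context[j] is the cyclic correlation of the rest with the target.
        grad_context.append([g * c for c in cyclic_correlation(rest, target)])
    return grad_target, grad_context


def adagrad_update(vec: Vector, grad: Vector, accum: Vector, lr: float = 0.1, eps: float = 1e-8) -> None:
    """In-place Adagrad step with a per-coordinate accumulated squared gradient."""
    for k in range(len(vec)):
        accum[k] += grad[k] ** 2
        vec[k] -= lr * grad[k] / (math.sqrt(accum[k]) + eps)


rng = random.Random(0)
d = 4
# Placeholder initialization interval; the source ties the interval to D.
target = [rng.uniform(-0.5, 0.5) for _ in range(d)]
context = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(2)]
accum_t = [0.0] * d
accum_c = [[0.0] * d for _ in context]

for _ in range(20):  # M = 20 iterations on one positive pair (y = 1)
    g_t, g_c = pair_gradients(target, context, 1)
    adagrad_update(target, g_t, accum_t)
    for vec, grad, acc in zip(context, g_c, accum_c):
        adagrad_update(vec, grad, acc)
```

A real run would loop over every pair in X inside each iteration and keep one accumulator per word vector; the single-pair loop here only shows how the gradient and update steps fit together.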

The output unit 18 outputs the parameters (set of word vectors) θ learned by the word vector learning unit 16.

The word learning device 10 according to the present embodiment is implemented, for example, by a computer device including a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) that stores various programs. The computer constituting the word learning device 10 may also include a storage unit such as a hard disk drive or nonvolatile memory. In the present embodiment, the CPU reads and executes a program stored in a storage unit such as the ROM or the hard disk, whereby the hardware resources and the program cooperate to realize the functions described above.

The flow of the word learning process performed by the word learning device 10 according to the present embodiment is described with reference to the flowchart shown in FIG. 2. In the present embodiment, the word learning process starts when predetermined information for starting its execution is input to the word learning device 10; however, the start timing is not limited to this, and the process may start, for example, when an input image is input.

In step S101, the input unit 12 receives the set of sentences X′ and the set of words W contained in the sentences of X′.

In step S103, the surrounding-context word extraction unit 14 selects one of the sentences contained in the set of sentences X′.

In step S105, the surrounding-context word extraction unit 14 takes each word contained in the selected sentence as a target, extracts the l words adjacent on its right as the right context, and extracts the l words adjacent on its left as the left context.

In step S107, the surrounding-context word extraction unit 14 generates a word with surrounding context x that includes the target word and the words extracted in step S105.

In step S109, the surrounding-context word extraction unit 14 generates the set X of pairs (x, y) of a word with surrounding context x and a correct label y.

In step S111, the surrounding-context word extraction unit 14 determines whether there is an unprocessed sentence, that is, whether any sentence contained in the set of sentences X′ has not yet undergone the processing of steps S103 to S109. If it is determined in step S111 that there is an unprocessed sentence (S111, Y), the process returns to step S103, and steps S103 to S111 are performed for the unprocessed sentence. If it is determined in step S111 that there is no unprocessed sentence (S111, N), the process proceeds to step S113.

In step S113, the surrounding-context word extraction unit 14 adds to the set X pairs (x, y) of a word with surrounding context x that is not contained in the set of sentences X′ and the label "−1".

In step S115, the word vector learning unit 16 initializes each dimension of the word vectors of each target word and each surrounding word using random numbers generated from a uniform distribution.

In step S117, the word vector learning unit 16 resets the counter i to 0.

In step S119, the word vector learning unit 16 adds 1 to the counter i.

In step S121, the word vector learning unit 16 calculates, for each word with surrounding context x, the gradients of the word vectors of the target word and of each surrounding word using equations (11), (12), and (13) above.

In step S123, the word vector learning unit 16 updates, for each word with surrounding context x, the word vectors of the target word and of each surrounding word using their gradients and equations (14) and (15) above, and calculates the set of word vectors θ.

In step S125, the word vector learning unit 16 determines whether the counter i is greater than or equal to M (M is a predetermined natural number of 2 or more). If the counter i is greater than or equal to M in step S125 (S125, Y), the process proceeds to step S127. If the counter i is smaller than M in step S125 (S125, N), the process returns to step S119, and steps S119 to S125 are performed.

In step S127, the optimal set of word vectors θ is output, and execution of the program for the word learning process ends. In the present embodiment, dimension information is output by displaying the estimated dimensions on display means such as a display or by storing data indicating the estimated dimensions in storage means.

As described above, in the present embodiment, a set of sentences and a set of words are taken as input, and words with surrounding context, each including a target word contained in a sentence of the set of sentences and the surrounding words around that target word, are extracted. The word vector of each target word and the word vector of each surrounding word are then learned so as to optimize an objective function expressed, for all the extracted words with surrounding context, using a score value calculated as the product of the word vector of the target word and a vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors of the surrounding words. The learned word vectors are then output.

Word vectors learned with the Skip-gram model and similar models are applied to many natural language processing problems, such as machine translation, syntactic parsing, morphological analysis, and document classification. The set of word vectors θ learned by the word learning device 10 according to the present embodiment can likewise be applied to these problems. In addition, because the order of the dimensions carries meaning in the set of word vectors θ obtained by the word learning device 10 according to the present embodiment, more information can be incorporated into the vectors.

In the present embodiment, the case where equation (7) above (cyclic convolution) is used to calculate the score value η_x of the target word w_t has been described; however, the calculation is not limited to this, and equation (8) above (cyclic cross-correlation) may be used instead. In that case, equation (16) below is used in place of equation (11) above, equation (17) below in place of equation (12) above, and equation (18) below in place of equation (13) above.

... (16)

... (17)

... (18)

In the present embodiment, the operations of the functional components shown in FIG. 1 are built as a program, which is installed and executed on the computer used as the word learning device 10; however, the program is not limited to this form of delivery and may also be distributed via a network.

The program thus built may also be stored on a portable storage medium such as a hard disk or a CD-ROM, and then installed on a computer or distributed.

DESCRIPTION OF SYMBOLS
10 Word learning device
12 Input unit
14 Surrounding-context word extraction unit
16 Word vector learning unit
18 Output unit

Claims

1. A word learning device comprising:
a surrounding-context word extraction unit that takes a set of sentences and a set of words as input and extracts words with surrounding context, each including a target word contained in a sentence of the set of sentences and surrounding words around the target word;
a word vector learning unit that learns the word vector of each target word and the word vector of each surrounding word so as to optimize an objective function expressed, for all the words with surrounding context extracted by the surrounding-context word extraction unit, using a score value calculated as the product of the word vector of the target word and a vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors of the surrounding words; and
an output unit that outputs the word vectors learned by the word vector learning unit.

2. The word learning device according to claim 1, wherein the word vector learning unit, letting θ be the set of word vectors comprising the word vector of each target word and the word vector of each surrounding word, X be the set of pairs (x, y) of a word x with surrounding context and a correct label y, and η_x be the score value for the word x with surrounding context, takes the logistic regression model L(θ) formulated by the following equation as the objective function and learns the set of word vectors θ that minimizes the objective function as the optimal set of word vectors θ^,

wherein the score value η_x is calculated by the following equation from the word vector of the target word w_t, the word vector of the surrounding words in the left context of the target word w_t, and the word vector of the surrounding words in the right context of the target word w_t.

3. A word learning method in a word learning device having a surrounding-context word extraction unit, a word vector learning unit, and an output unit, the method comprising:
a step in which the surrounding-context word extraction unit takes a set of sentences and a set of words as input and extracts words with surrounding context, each including a target word contained in a sentence of the set of sentences and surrounding words around the target word;
a step in which the word vector learning unit learns the word vector of each target word and the word vector of each surrounding word so as to optimize an objective function expressed, for all the words with surrounding context extracted by the surrounding-context word extraction unit, using a score value calculated as the product of the word vector of the target word and a vector obtained by applying an operation using cyclic convolution or cyclic cross-correlation to the word vectors of the surrounding words; and
a step in which the output unit outputs the word vectors learned by the word vector learning unit.

4. The word learning method according to claim 3, wherein, in the step of learning by the word vector learning unit, letting θ be the set of word vectors comprising the word vector of each target word and the word vector of each surrounding word, X be the set of pairs (x, y) of a word x with surrounding context and a correct label y, and η_x be the score value for the word x with surrounding context, the logistic regression model L(θ) formulated by the following equation is taken as the objective function and the set of word vectors θ that minimizes the objective function is learned as the optimal set of word vectors θ^,

wherein the score value η_x is calculated by the following equation from the word vector of the target word w_t, the word vector of the surrounding words in the left context of the target word w_t, and the word vector of the surrounding words in the right context of the target word w_t.

5. A program for causing a computer to function as each unit of the word learning device according to claim 1 or 2.