JP6586026B2

JP6586026B2 - Word vector learning device, natural language processing device, method, and program

Info

Publication number: JP6586026B2
Application number: JP2016025130A
Authority: JP
Inventors: 鈴木 潤 (Jun Suzuki)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-02-12
Filing date: 2016-02-12
Publication date: 2019-10-02
Anticipated expiration: 2036-02-12
Also published as: JP2017142746A

Description

The present invention relates to a word vector learning device, a natural language processing device, a method, and a program, and in particular to a word vector learning device, a natural language processing device, a method, and a program for learning word vectors for words.

Individual words are discrete symbols and are not grounded in physical phenomena, so expressing the similarity between words quantitatively is not straightforward. By comparison, speech is generally handled on a computer as time-series frequency data. The similarity between arbitrary speech segments can therefore be measured to some extent by computing the distance between vectors of various continuous-valued features derived from the frequencies. Likewise, the similarity between images can be computed fairly easily as the distance between vectors whose features are pixel information. Similarity between things grounded in physical phenomena, such as waveforms and colors, can thus be handled relatively naturally on a computer, whereas similarity between things that are written in discrete symbols and follow no physical phenomenon, such as language, cannot be handled so simply. Against this background, various methodologies have been devised for computing the similarity between discrete symbols such as words. One of them is distributed semantic representation. As with speech and images, this approach assigns one vector to each word and attempts to express the semantic similarity between words by the distance between those vectors. Because semantic closeness between words is expressed as a distance computation in a vector space, the approach is well suited to computers.

FIG. 8 shows an overview of similarity between words under distributed semantic representation.

Here, the i-th word is denoted w_i, and the vector assigned to the i-th word w_i is denoted r_i. Hereinafter, a vector assigned to a word is specifically called a "word vector"; that is, the word vector of word w_i is r_i. For computation on a computer, the similarity between two words w_i and w_j is generally defined as the inner product of, or the cosine similarity between, their word vectors r_i and r_j.
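The two similarity definitions named above can be sketched as follows. This is a minimal illustration only: the function names and the three toy vectors are hypothetical, and returning 0 for a zero-length vector is a convention chosen here, not something the text specifies.

```python
import math

def inner_product_similarity(r_i, r_j):
    """Similarity as the plain inner product of two word vectors."""
    return sum(a * b for a, b in zip(r_i, r_j))

def cosine_similarity(r_i, r_j):
    """Inner product normalized by both vector lengths; lies in [-1, 1]."""
    norm_i = math.sqrt(sum(a * a for a in r_i))
    norm_j = math.sqrt(sum(b * b for b in r_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # convention assumed here for zero vectors
    return inner_product_similarity(r_i, r_j) / (norm_i * norm_j)

# Toy word vectors (made-up values, not learned ones).
r_a = [1.0, 2.0, 0.0]
r_b = [2.0, 4.0, 0.0]   # same direction as r_a
r_c = [0.0, 0.0, 3.0]   # orthogonal to r_a
```

With either definition a larger value indicates a closer pair; the cosine form additionally ignores vector length, so `r_a` and `r_b` count as maximally similar.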

In this case, a larger value means that the words w_i and w_j are more similar. This has the advantage that semantically similar words can be taken into account in various language processing applications such as translation, dialogue, document summarization, and proofreading. As a result, it has been shown that better results are obtained than with processing methods that do not use semantic closeness between words.

Many methods have been proposed for obtaining the word vector of each word. The basic methodology first defines, for each word in the text, the context information of that word. There is no fixed rule for context information and various kinds of information can be used, but in the simplest and most common case the words appearing around each word are treated as its context information. Changing the definition of context has little effect on the word vector estimation algorithm itself. In the following discussion, the context information of a word is therefore taken to be the words that appear around it.
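As a rough sketch of this definition of context, the following counts how often each word appears near each other word. The symmetric window and its size are assumptions made for the example (the text only says "around"), and the function name and sample sentences are hypothetical.

```python
from collections import Counter

def count_cooccurrences(sentences, window=2):
    """Count how often each context word appears within `window` tokens of a word.

    Returns a Counter keyed by (word, context_word).
    """
    counts = Counter()
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:          # a word is not its own context
                    counts[(word, tokens[ctx_pos])] += 1
    return counts

# Two already-tokenized toy sentences.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
X = count_cooccurrences(sentences, window=1)
```

For real data the counts would normally be kept in a sparse structure, since most word pairs never co-occur.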

In recent years, methods fast enough to handle large-scale data such as that obtained from the web have become mainstream (see Non-Patent Documents 1 and 2). Methods that can handle large-scale data are mainstream because the more data there is, the more accurately linguistic phenomena can be captured, so the accuracy of similarity estimation can be expected, in theory as well, to improve. Here, as a representative of the conventional methods, a method of obtaining word vectors in line with Non-Patent Document 2 is described.

In this method, to represent the case where a word appears as the context of another word, each word is assigned a second vector in addition to its word vector. For convenience, and in contrast to the word vector, this is called a "context vector". That is, the i-th word w_i has two vectors, a word vector r_i and a context vector c_i. Let V be the vocabulary size. Let (r_i)_{i=1}^V be the list of all word vectors arranged in order from i = 1 to V, and likewise let (c_i)_{i=1}^V be the list of all context vectors arranged in order from i = 1 to V. To simplify notation, let R = (r_i)_{i=1}^V and C = (c_i)_{i=1}^V. Further, let X_{i,j} be the number of times the j-th word appeared as context for the i-th word. Obtaining a distributed semantic representation in line with Non-Patent Document 2 can then be formulated as the problem of minimizing the following objective function.

Here, ^R = (^r_i)_{i=1}^V and ^C = (^c_i)_{i=1}^V denote the estimated word vectors and context vectors, and φ is a function that computes a weighting coefficient from X_{i,j}.
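The objective function referred to here is not reproduced in this text. Assuming it follows the weighted least-squares form of Non-Patent Document 2 (GloVe), with the bias terms of that paper omitted, it would read roughly as below. Both the exact form and the omission of bias terms are assumptions, not a transcription of the patent's equation.

```latex
(\hat{R}, \hat{C}) \;=\; \operatorname*{arg\,min}_{R,\,C}\;
  \sum_{i=1}^{V} \sum_{j=1}^{V}
  \phi(X_{i,j}) \,\bigl( r_i \cdot c_j - \log X_{i,j} \bigr)^{2}
```

In Non-Patent Document 2 the weighting function is \phi(x) = (x / x_{\max})^{3/4} for x < x_{\max} and 1 otherwise, so pairs that never co-occur receive weight 0 and drop out of the sum.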

The finally obtained ^r_i is the word vector of the i-th word. It is the word vector used in, for example, the similarity computation of equation (2), and it is also used in natural language processing applications such as translation and proofreading.

Non-Patent Document 1: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR, 2013.
Non-Patent Document 2: Jeffrey Pennington, Richard Socher, and Christopher Manning, Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

A problem with the currently mainstream methods, such as that of Non-Patent Document 2, is that as the number of words handled and the number of dimensions of the output vector space grow, a large amount of memory is needed to hold the resulting word vectors.

The file size translates directly into the memory footprint at run time. Memory footprint can be a very serious problem in computing environments with limited resources, such as mobile devices. Even on general-purpose computers, the memory footprint should be as small as possible, both when several instances run simultaneously on today's multi-core machines and so as to affect other programs as little as possible.

The present invention has been made to solve the above problem, and its object is to provide a word vector learning device, method, and program capable of learning word vectors that reduce the required memory capacity.

A further object is to provide a natural language processing device and program that perform natural language processing based on the semantic similarity of words using the learned word vectors.

To achieve the above object, the word vector learning device according to the first invention learns, based on document data and for each word, a D-dimensional word vector for the word and a D-dimensional context vector representing the word appearing as the context of another word. The device includes a learning unit that divides each word vector and each context vector into B blocks of dimension F. Based on a set of S real vectors of dimension F and on the number of times one word of each word pair appeared as the context of the other word in the document data, the learning unit estimates the real-valued vector of each block of the word vector and context vector of each word so as to optimize an objective function representing the likelihood of each block, under the constraint that each block of each word vector and context vector is one of the S real vectors.

In the word vector learning device according to the first invention, the learning unit may include an update unit that updates the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j so as to optimize the objective function expressed by the following equation, and an iteration determination unit that causes the update by the update unit to be repeated until a predetermined iteration termination condition is satisfied.

Here, D denotes the set of pairs, appearing in the document data, of a word and a word appearing as the context of that word; α_{i,b} and β_{i,b} are Lagrange multipliers; u_{i,b} and v_{j,b} are auxiliary parameters that are real vectors of dimension F; ρ is a constant; ζ denotes a set of S real vectors of dimension F; and X_{i,j} denotes the number of times word j appeared as context for word i.

The natural language processing device according to the second invention includes a natural language processing unit that performs, on an input document, natural language processing based on the semantic similarity between words derived from the word vectors, using the word vector of each word learned by the word vector learning device according to claim 1 or 2.

The word vector learning method according to the third invention is a method in a word vector learning device that learns, based on document data and for each word, a D-dimensional word vector for the word and a D-dimensional context vector representing the word appearing as the context of another word. The method includes a step in which a learning unit divides each word vector and each context vector into B blocks of dimension F and, based on a set of S real vectors of dimension F and on the number of times one word of each word pair appeared as the context of the other word in the document data, estimates the real-valued vector of each block of the word vector and context vector of each word so as to optimize an objective function representing the likelihood of each block, under the constraint that each block of each word vector and context vector is one of the S real vectors.

In the word vector learning device according to the first invention, the step of estimating by the learning unit may include a step in which an update unit updates the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j so as to optimize the objective function expressed by the following equation, and a step in which an iteration determination unit causes the update by the update unit to be repeated until a predetermined iteration termination condition is satisfied.

Here, D denotes the set of pairs, appearing in the document data, of a word and a word appearing as the context of that word; α_{i,b} and β_{i,b} are Lagrange multipliers; u_{i,b} and v_{j,b} are auxiliary parameters that are real vectors of dimension F; ρ is a constant; ζ denotes a set of S real vectors of dimension F; and X_{i,j} denotes the number of times word j appeared as context for word i.

In the natural language processing device according to the fourth invention, a natural language processing unit executes a step of performing, on an input document, natural language processing based on the semantic similarity between words derived from the word vectors, using the word vector of each word learned by the word vector learning device according to claim 4 or 5.

The program according to the fifth invention is a program for causing a computer to function as each unit of the word vector learning device according to the first invention or of the natural language processing device according to the second invention.

According to the word vector learning device, method, and program of the present invention, each word vector and each context vector is divided into B blocks of dimension F, and, based on a set of S real vectors of dimension F and on the number of times one word appeared as the context of another, the real-valued vector of each block of the word vector and context vector of each word is estimated so as to optimize an objective function representing the likelihood of each block, under the constraint that each block is one of the S real vectors. This makes it possible to learn word vectors that reduce the required memory capacity.

According to the natural language processing device and program of the present invention, natural language processing based on the semantic similarity of words can be performed using the learned word vectors.

FIG. 1 is a diagram showing an example of a word list and context co-occurrence information.
FIG. 2 is a diagram showing an example of word vectors.
FIG. 3 is a block diagram showing the configuration of the word vector learning device according to the embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the word vector learning device according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the word vector learning processing routine in the word vector learning device according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the vector update processing routine in the word vector learning device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the natural language processing routine in the natural language processing device according to the embodiment of the present invention.
FIG. 8 is a diagram showing an overview of similarity between words under distributed semantic representation.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

First, an overview of the embodiment of the present invention is given.

The present invention starts from the general view that, in any computing environment, the smaller the required resources (file size, memory footprint, and so on) the better, and addresses the problem of compressing the size of word vectors.

Since the similarity between words in equation (3) above is computed as a cosine similarity, the embodiment of the present invention is based on the idea of reducing the amount of information held by the word vectors while changing the cosine similarity values as little as possible, thereby obtaining word vectors whose overall size is reduced.

First, a word vector of dimension D is divided into several blocks. If the number of elements per block is F and the number of blocks is B, then D = BF holds. For example, if D = 256 and F = 32, then B = 8. Next, the number of kinds of value sets that each block may take is fixed in advance; this number of kinds of value sets is denoted S.
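The block division can be sketched as below, using the D = 256, F = 32 example from the text. The helper names are hypothetical, and the vector contents are placeholder values rather than learned ones.

```python
def split_into_blocks(vector, block_size):
    """Split a D-dimensional vector into B = D / F consecutive blocks of F elements."""
    if len(vector) % block_size != 0:
        raise ValueError("D must be a multiple of F")
    return [vector[k:k + block_size] for k in range(0, len(vector), block_size)]

def concatenate_blocks(blocks):
    """Inverse of split_into_blocks: join the blocks back into one vector."""
    return [x for block in blocks for x in block]

D, F = 256, 32
r_i = [float(k) for k in range(D)]     # placeholder values only
blocks = split_into_blocks(r_i, F)
B = len(blocks)                        # number of blocks, so that D = B * F
```

Splitting into consecutive slices is one natural reading of "divided into blocks"; the text does not say whether any other partition of the dimensions is intended.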

Each block in a word vector then selects one of the S value sets. That is, one word vector is expressed by the combination of the value sets chosen individually by its B blocks. The number of distinct values a word vector can take is S^B, and the problem amounts to assigning one of them to each word vector. For example, if 16 value sets are prepared in the example above, one word vector can take 16^8 = 4,294,967,296 different values. The number of words is generally in the millions, and at most in the tens of millions, so about 4.3 billion possibilities are considered sufficient to express distributed representations of words. Accordingly, a method is used that obtains a reduced distributed semantic representation of words from data by solving an optimization problem like those shown in equations (2) and (3) above.
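The counting argument, and the way a word vector is rebuilt from one choice per block, can be illustrated as follows. The codebook contents and the chosen indices are made up, F is shrunk to 4 only to keep the example readable, and treating the S value sets as a single shared table is an assumption.

```python
S, B, F = 16, 8, 4   # F = 4 here for readability; the text's example uses F = 32

# Number of distinct word vectors expressible with B blocks of S choices each.
num_representable = S ** B

# Hypothetical shared codebook: S real vectors ("value sets") of dimension F.
codebook = [[float(s) + 0.1 * f for f in range(F)] for s in range(S)]

def decode(indices, codebook):
    """Rebuild a word vector by concatenating the codebook entry chosen per block."""
    return [x for s in indices for x in codebook[s]]

indices = [3, 0, 15, 7, 7, 1, 9, 12]   # one choice per block for a single word
vector = decode(indices, codebook)
```

Only `indices` has to be stored per word; the real-valued entries live once in `codebook`.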

The embodiment of the present invention is described using, as an example, a word vector learning device applied to the problem of obtaining distributed semantic representations of words from electronic text written in natural language on the web.

First, suppose that text generated by general users and posted to SNS sites and the like on a given day is collected and word vectors are generated from it. The vocabulary size V is then the number of distinct words appearing in the collected data. Alternatively, the vocabulary may be limited to words that appear at least a certain number of times.

FIG. 1 shows an example of a word list and context co-occurrence information. For simplicity, the example assumes that word boundaries can easily be obtained with commonly used tools. For Japanese, freely available tools exist; for English, whitespace can be used as the word delimiter. A Japanese example is used here.

The dimension D of the word vectors is set to, for example, 256, and the vocabulary size V is assumed to be one million words. FIG. 2 shows an example of word vectors.

With the conventional method under these settings, one million 256-dimensional word vectors are constructed.
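To give a sense of scale, the following back-of-the-envelope calculation compares the storage for these one million dense vectors with a block-index representation using the F = 32, S = 16 example from the overview. Single-precision floats, bit-packed indices, and one shared codebook are all assumptions; this part of the text does not specify a storage format.

```python
import math

V, D = 1_000_000, 256          # vocabulary size and dimensionality from the text
F, S = 32, 16                  # block size and number of value sets from the overview
B = D // F                     # 8 blocks per word vector
BYTES_PER_FLOAT = 4            # assumption: single-precision storage

# Conventional method: every element of every vector is stored as a float.
dense_bytes = V * D * BYTES_PER_FLOAT

# Block representation: one small index per block, plus the codebook itself.
bits_per_index = math.ceil(math.log2(S))         # 4 bits when S = 16
index_bytes = V * B * bits_per_index // 8        # indices packed bit by bit
codebook_bytes = S * F * BYTES_PER_FLOAT         # assumption: one shared codebook
compressed_bytes = index_bytes + codebook_bytes
```

Under these assumptions the dense table takes about 1 GB while the indices plus codebook take about 4 MB, a reduction of roughly 256 times.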

These word vectors can be obtained by solving the optimization problem of equation (2) above.

Next, consider applying the embodiment of the present invention. The b-th partial vector obtained by dividing the i-th word vector r_i into blocks is written r_{i,b}. If the number of blocks is B, concatenating the vectors r_{i,b} from b = 1 to b = B yields r_i.

In the embodiment of the present invention, word vectors are obtained by solving the optimization problem of equation (4) below instead of equation (3) above.

Here, ζ denotes a set of S real vectors of dimension F.

The major difference from the conventional method is that each word vector is divided into blocks and the optimization is carried out on the per-block vectors. Another major feature is that learning proceeds under the constraint that the blocks take only S kinds of value sets.

Next, to simplify the processing, equation (4) is transformed as follows.

Equations (4) and (5) are equivalent; the transformation merely separates the vectors into r and u, and c and v, for processing during optimization. Relaxing the equality constraints r = u and c = v with the augmented Lagrangian method yields equation (6) below (Reference 1: Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.).

In the embodiment of the present invention, equation (6) is used as the objective function, where A = (α_i)_{i=1}^V and B = (β_j)_{j=1}^V.
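Equations (4) to (6) are not reproduced in this text. The following is one reconstruction that is consistent with the surrounding definitions (blocks r_{i,b} and c_{j,b}, auxiliary variables u_{i,b} and v_{j,b}, multipliers α and β, constant ρ, and the set ζ). It assumes a GloVe-style squared-error term and the standard augmented Lagrangian of Reference 1; the sign convention, the summation range, and the calligraphic \mathcal{D} for the set of word and context pairs are assumptions.

```latex
\begin{align*}
\text{(4)}\quad & \min_{R,\,C}\ \sum_{(i,j)\in\mathcal{D}} \phi(X_{i,j})
   \Bigl(\sum_{b=1}^{B} r_{i,b}\cdot c_{j,b} - \log X_{i,j}\Bigr)^{2}
   \quad\text{s.t.}\quad r_{i,b}\in\zeta,\ c_{j,b}\in\zeta \\[4pt]
\text{(5)}\quad & \min_{R,\,C,\,U,\,V}\ \sum_{(i,j)\in\mathcal{D}} \phi(X_{i,j})
   \Bigl(\sum_{b=1}^{B} r_{i,b}\cdot c_{j,b} - \log X_{i,j}\Bigr)^{2}
   \quad\text{s.t.}\quad r_{i,b}=u_{i,b},\ c_{j,b}=v_{j,b},\ u_{i,b}\in\zeta,\ v_{j,b}\in\zeta \\[4pt]
\text{(6)}\quad & \mathcal{L}_{\rho} = \sum_{(i,j)\in\mathcal{D}} \phi(X_{i,j})
   \Bigl(\sum_{b=1}^{B} r_{i,b}\cdot c_{j,b} - \log X_{i,j}\Bigr)^{2}
   + \sum_{i=1}^{V}\sum_{b=1}^{B}\Bigl[\alpha_{i,b}\cdot(r_{i,b}-u_{i,b})
   + \frac{\rho}{2}\lVert r_{i,b}-u_{i,b}\rVert^{2}\Bigr] \\
 & \qquad\ \ + \sum_{j=1}^{V}\sum_{b=1}^{B}\Bigl[\beta_{j,b}\cdot(c_{j,b}-v_{j,b})
   + \frac{\rho}{2}\lVert c_{j,b}-v_{j,b}\rVert^{2}\Bigr]
   \quad\text{s.t.}\quad u_{i,b}\in\zeta,\ v_{j,b}\in\zeta
\end{align*}
```

In this form the membership constraint falls only on the auxiliary variables, so R and C can be updated by an unconstrained gradient step while U and V absorb the restriction to ζ.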

The processing flow of the embodiment of the present invention is as follows.

(Input)
The learning data D and the manually chosen tuning parameters D, B, F, and T are input.

(Initialization)
The optimization variables R and C are initialized with random numbers, and the iteration counter is initialized to t = 0.

(Process 1: Update)
The optimization parameters R and C are updated without the constraint, using a gradient method or the like.

(Process 2: Parameter adjustment)
The optimization parameters R and C are adjusted so that, over the blocks of all word vectors, the number of kinds of value sets is S.

(Process 3)
The Lagrange multipliers are updated.

(Process 4: Termination check)
It is determined whether the optimization has converged or the preset number of iterations T has been reached. If the termination condition is satisfied, processing moves to the next step; otherwise t = t + 1 and processing returns to Process 1.

(Process 5: Reduction)
The way the data is stored is optimized for the obtained ^w.
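The flow above can be sketched as a small NumPy loop. This is an illustrative sketch under several assumptions rather than the patent's implementation: a GloVe-style weight and squared-error term, a dense toy count matrix, one codebook shared by word and context blocks, a plain Lloyd k-means for Process 2, and a fixed iteration count T as the only termination check. All function names, `rho`, `lr`, and the toy data are hypothetical.

```python
import numpy as np

def kmeans(points, n_clusters, n_iters, rng):
    """Plain Lloyd iterations; returns (centroids, cluster index of each point)."""
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start].copy()
    for _ in range(n_iters):
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for s in range(n_clusters):
            members = points[assign == s]
            if len(members) > 0:          # an empty cluster keeps its old centroid
                centroids[s] = members.mean(axis=0)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def learn_block_vectors(X, D, F, S, T, rho=1.0, lr=0.05, seed=0):
    """Toy version of Processes 1 to 4 for a dense count matrix X (n_words x n_words)."""
    assert D % F == 0 and T >= 1
    rng = np.random.default_rng(seed)
    n_words, n_blocks = X.shape[0], D // F
    R = rng.normal(scale=0.1, size=(n_words, D))      # word vectors
    C = rng.normal(scale=0.1, size=(n_words, D))      # context vectors
    U, V_aux = R.copy(), C.copy()                     # auxiliary copies u, v
    alpha, beta = np.zeros_like(R), np.zeros_like(C)  # Lagrange multipliers
    weight = np.minimum(1.0, (X / 100.0) ** 0.75)     # assumed phi; 0 where X == 0
    log_x = np.log(np.maximum(X, 1.0))                # only matters where X >= 1
    half = n_words * n_blocks
    for _ in range(T):
        # Process 1: one unconstrained gradient step on R and C.
        err = weight * (R @ C.T - log_x)
        grad_R = 2.0 * err @ C + alpha + rho * (R - U)
        grad_C = 2.0 * err.T @ R + beta + rho * (C - V_aux)
        R, C = R - lr * grad_R, C - lr * grad_C
        # Process 2: snap every block to one of S shared centroids via k-means.
        points = np.vstack([(R + alpha / rho).reshape(-1, F),
                            (C + beta / rho).reshape(-1, F)])
        centroids, assign = kmeans(points, S, 5, rng)
        snapped = centroids[assign]
        U = snapped[:half].reshape(n_words, D)
        V_aux = snapped[half:].reshape(n_words, D)
        # Process 3: update the Lagrange multipliers.
        alpha += rho * (R - U)
        beta += rho * (C - V_aux)
    # Process 4 is reduced to the fixed count T; return the constrained vectors.
    codes = assign[:half].reshape(n_words, n_blocks)
    return U, centroids, codes

# Toy usage: 6 words, D = 4 split into B = 2 blocks of F = 2, with S = 4 value sets.
X = np.random.default_rng(1).integers(0, 5, size=(6, 6)).astype(float)
U, centroids, codes = learn_block_vectors(X, D=4, F=2, S=4, T=10)
```

One plausible reading of Process 5 is that only `centroids` and `codes` are kept, since `U` can be rebuilt from them by table lookup; the text does not spell out the storage format.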

<Configuration of the Word Vector Learning Device According to the Embodiment of the Present Invention>

Next, the configuration of the word vector learning device according to the embodiment of the present invention is described. As shown in FIG. 3, the word vector learning device 100 can be implemented as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the word vector learning processing routine described later. Functionally, as shown in FIG. 3, the word vector learning device 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 receives learning data D (hereinafter referred to as document data). The document data is a set of pairs (w_i, w_j) of a word and a word that appears as its context, where w_i is the target word and w_j is its context word. The input unit 10 also receives the manually chosen tuning parameters D, B, F, and T. These values do not change during the subsequent processing.

The calculation unit 20 includes a learning unit 30 and a vector storage unit 40.

The learning unit 30 divides each word vector and each context vector for the document data into B blocks of dimension F. Based on a set of S real vectors of dimension F and on the number of times X_{i,j} that one word of each word pair appeared as the context of the other word in the document data, it estimates the real-valued vector of each block of the word vector and context vector of each word so as to optimize an objective function representing the likelihood of each block, under the constraint that each block of each word vector and context vector is one of the S real vectors.

The learning unit 30 includes an update unit 32 and an iteration determination unit 34.

The update unit 32 updates the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j so as to optimize the objective function expressed by equation (6) above. The update is based on the set D of pairs, appearing in the document data, of a word and a word appearing as its context, on the auxiliary parameters u_{i,b} and v_{j,b} of each block, which are real vectors of dimension F, and on the Lagrange multipliers α_{i,b} and β_{i,b} of each block.

反復判定部３４は、予め定められた反復終了条件を満たすまで、更新部３２による更新を繰り返させる。ここで反復終了条件は、繰り返し数Ｔの回数だけ更新部３２の更新を繰り返すこととする。 The iterative determination unit 34 causes the updating unit 32 to repeat the update until a predetermined iteration end condition is satisfied. Here, the iteration end condition is that the update by the updating unit 32 has been repeated T times.

更新部３２の具体的な処理について以下に説明する。 Specific processing of the updating unit 32 will be described below.

更新部３２は、まず最適化変数Ｒ,Ｃ,Ｕ,Ｖを乱数で初期化する。また、繰り返し数を管理する変数ｔをｔ＝０で初期化する。α、βも同様にα＝０、β＝０と初期化する。 The update unit 32 first initializes the optimization variables R, C, U, and V with random numbers. Also, a variable t for managing the number of repetitions is initialized with t = 0. Similarly, α and β are initialized as α = 0 and β = 0.

このとき、単語ベクトル集合Ｒ、及び文脈ベクトル集合Ｃのみを最適化変数と考え、補助パラメタ集合のＵ,Ｖを固定した場合の目的関数は以下（７）式のようになる。 At this time, only the word vector set R and the context vector set C are considered as optimization variables, and the objective function when U and V of the auxiliary parameter set are fixed is expressed by the following equation (7).

上記（７）式の単語ベクトルのブロックｒ_ｉ，ｂと文脈ベクトルのブロックｃ_ｊ，ｂとの勾配は以下（８）式及び（９）式のように書ける。 The gradients of equation (7) with respect to the word vector block r_{i,b} and the context vector block c_{j,b} can be written as equations (8) and (9) below.

よって、勾配法を用いて、上記（８）式、（９）式に従って、単語ベクトルのブロックｒ_ｉ，ｂと文脈ベクトルのブロックｃ_ｊ，ｂとを更新していく。 Therefore, using the gradient method, the word vector block r_{i,b} and the context vector block c_{j,b} are updated according to equations (8) and (9) above.

次に、補助パラメタｕ,ｖを更新する。（６）式に関して、補助パラメタｕ,ｖのみを最適化変数と考え、Ｒ,Ｃを固定した場合の目的関数は以下（１０）式のようになる。 Next, the auxiliary parameters u and v are updated. When only the auxiliary parameters u and v in equation (6) are treated as optimization variables and R and C are fixed, the objective function becomes equation (10) below.

ただし、 However,

とする。 And

この問題は、ユークリッド距離に基づくｋ平均クラスタリング問題と等価になる。よって、例えば、一般的なｋ平均クラスタリング法を用いて（１０）式を解くことにより、補助パラメータｕ_ｉ，ｂ,ｖ_ｊ，ｂを更新する。 This problem is equivalent to a k-means clustering problem based on Euclidean distance. Therefore, the auxiliary parameters u_{i,b} and v_{j,b} are updated by solving equation (10) with, for example, a general k-means clustering method.
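As a rough illustration of the clustering step, the following is a plain Lloyd-style k-means over F-dimensional points in pure Python. Which points are clustered is determined by equation (10), which is not reproduced in this text, so the `points` argument is only a stand-in; the function names, the random initialization, and the fixed iteration count are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], S: int,
           iterations: int = 20, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm: returns (S centroids, index of the centroid of each point)."""
    rng = random.Random(seed)
    # Initialize the centroids with S distinct input points.
    centroids = [list(p) for p in rng.sample(list(points), S)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(S), key=lambda s: squared_distance(p, centroids[s]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for s in range(S):
            members = [p for p, a in zip(points, assignment) if a == s]
            if members:  # keep the old centroid if a cluster is empty
                centroids[s] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

In this reading, the S centroids correspond to the shared set of real vectors, and the centroid assigned to a block would serve as its auxiliary parameter.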

次に、Ｒ,Ｃ,Ｕ,Ｖを固定し、ラグランジュ未定乗数α_ｉ,ｂ及びβ_ｉ,ｂを最適化変数として更新する。 Next, with R, C, U, and V fixed, the Lagrange multipliers α_{i,b} and β_{i,b} are updated as the optimization variables.

この関係から以下の更新式を得る。勾配法を用いて、（１３）（１４）式に従って、ラグランジュ未定乗数α_ｉ,ｂ及びβ_ｉ,ｂを更新する。 From this relationship, the following update formulas are obtained. Using the gradient method, the Lagrange multipliers α_{i,b} and β_{i,b} are updated according to equations (13) and (14).

ただし、ηは学習率を表す。 Here, η represents the learning rate.

次に、反復判定部３４による終了判定を行う。基本的に、終了判定は、事前に設定した繰り返し数Ｔに達したかどうかで判定する。ｔ＝Ｔの時、終了と判定し単語ベクトルの保存へ進む。ｔ＜Ｔの時には、ｔ＝ｔ＋１として更新部３２の最初の処理へ戻る。 Next, the iterative determination unit 34 performs the end determination. Basically, the end determination checks whether the preset number of repetitions T has been reached. When t = T, the iteration is judged to be finished and the process proceeds to storing the word vectors. When t < T, t is set to t + 1 and the process returns to the first step of the updating unit 32.
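The control flow of the update and the end determination can be sketched as below. The function name `run_updates` and the `update_once` callable, which stands for one full pass of the updating unit 32, are hypothetical.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def run_updates(state: State, update_once: Callable[[State], State], T: int) -> State:
    """Repeat one update pass, then test the end condition, as described above."""
    t = 0  # iteration counter, initialized to 0
    while True:
        state = update_once(state)  # one pass of the updating unit
        if t == T:                  # end determination
            return state            # caller then stores the learned vectors
        t += 1                      # t < T: increment and run another pass
```

Read literally, this flow applies the update once for each t = 0, 1, ..., T. An implementation that wants exactly T passes could instead stop when t reaches T − 1.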

学習部３０は、最後に、更新部３２で更新された学習後の単語ベクトル及び文脈ベクトルの実数値ベクトルを適切な形式でベクトル記憶部４０に保存する。単語ベクトルの各ブロックは、Ｓ種類の値集合の中一つを取るので［ｌｏｇＳ］ｂｉｔで記述できる。よって、一つの単語ベクトルはＢ［ｌｏｇＳ］ｂｉｔとなる。結果、全語彙の単語ベクトルでは、ＶＢ［ｌｏｇＳ］ｂｉｔになる。同様に、全語彙の文脈ベクトルもＶＢ［ｌｏｇＳ］ｂｉｔで保存できる。また、一つの値集合は、倍精度浮動小数点（６４ｂｉｔ）がＦ個で構成されるので、Ｓ個の値集合全体は、６４ＳＦｂｉｔで表現できる。 Finally, the learning unit 30 stores the learned real-valued word vectors and context vectors updated by the updating unit 32 in the vector storage unit 40 in an appropriate format. Since each block of a word vector takes one of S kinds of values, it can be described with [log S] bits (the base-2 logarithm of S, rounded up to an integer). One word vector therefore takes B[log S] bits, and the word vectors of the whole vocabulary of V words take VB[log S] bits. Likewise, the context vectors of the whole vocabulary can be stored in VB[log S] bits. In addition, since each of the S values consists of F double-precision floating-point numbers (64 bits each), the whole set of S values can be expressed in 64SF bits.
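One way to realize such a compact format is to pack the B block indices of a vector into a single integer using a fixed number of bits per index. This is only a sketch of one possible layout; the description leaves the "appropriate format" open, and the function names are hypothetical.

```python
from typing import List, Sequence


def bits_per_block(S: int) -> int:
    """Number of bits needed to address one of S values (base-2 log, rounded up)."""
    return (S - 1).bit_length()


def pack_codes(codes: Sequence[int], S: int) -> int:
    """Pack one index per block into a single integer, first block in the high bits."""
    width = bits_per_block(S)
    packed = 0
    for code in codes:
        if not 0 <= code < S:
            raise ValueError("block index out of range")
        packed = (packed << width) | code
    return packed


def unpack_codes(packed: int, B: int, S: int) -> List[int]:
    """Recover the B block indices from the integer produced by pack_codes."""
    width = bits_per_block(S)
    mask = (1 << width) - 1
    return [(packed >> (width * (B - 1 - b))) & mask for b in range(B)]
```

With B = 8 and S = 16, each index takes 4 bits, so one vector fits in 32 bits.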

最終的に、２ＶＢ［ｌｏｇＳ］＋６４ＳＦｂｉｔで全語彙の単語ベクトルおよび文脈ベクトルが格納可能である。よって、Ｖ＝１,０００,０００、Ｂ＝８、Ｓ＝１６、Ｆ＝３２のとき、最終的に、２×１,０００,０００×８×４＋６４×１６×３２＝６４,０３２,７６８ｂｉｔ（約８ＭＢ）で全語彙の単語ベクトルおよび文脈ベクトルが格納可能である。 In total, the word vectors and context vectors of the whole vocabulary can be stored in 2VB[log S] + 64SF bits. For example, when V = 1,000,000, B = 8, S = 16, and F = 32, they can be stored in 2 × 1,000,000 × 8 × 4 + 64 × 16 × 32 = 64,032,768 bits (about 8 MB).
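The arithmetic above can be checked with a few lines of code. The dense baseline assumes that an unconstrained vector has D = B × F dimensions stored as 64-bit floats; that comparison and the function names are additions for illustration, not figures stated in this description.

```python
def compact_storage_bits(V: int, B: int, S: int, F: int) -> int:
    """2VB[log S] + 64SF: block indices for word and context vectors plus the shared values."""
    index_bits = (S - 1).bit_length()  # base-2 log of S, rounded up
    return 2 * V * B * index_bits + 64 * S * F


def dense_storage_bits(V: int, B: int, F: int) -> int:
    """Assumed baseline: 2V vectors of B * F double-precision values each."""
    return 2 * V * (B * F) * 64


bits = compact_storage_bits(V=1_000_000, B=8, S=16, F=32)
print(bits, bits // 8)  # total bits and total bytes
print(dense_storage_bits(V=1_000_000, B=8, F=32) // 8)  # bytes for the assumed dense baseline
```

For the values in the text this gives 64,032,768 bits, which is 8,004,096 bytes, matching the figure of about 8 MB.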

＜本発明の実施の形態に係る自然言語処理装置の構成＞ <Configuration of Natural Language Processing Device According to Embodiment of the Present Invention>

次に、本発明の実施の形態に係る自然言語処理装置の構成について説明する。図４に示すように、本発明の実施の形態に係る自然言語処理装置２００は、ＣＰＵと、ＲＡＭと、後述する自然言語処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。この自然言語処理装置２００は、機能的には図４に示すように入力部２１０と、演算部２２０と、出力部２５０とを備えている。本実施の形態では、自然言語処理装置２００では、単語ベクトル学習装置１００により学習された単語ベクトルに基づいて、未知の単語を類似度の高い単語に置き換えて翻訳を行う場合を例に説明するが、これに限定されるものではなく、置き換えた単語を用いて要約、文書校正などを行ってもよい。 Next, the configuration of the natural language processing device according to the embodiment of the present invention will be described. As shown in FIG. 4, the natural language processing device 200 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing a natural language processing routine described later. Functionally, as shown in FIG. 4, the natural language processing device 200 includes an input unit 210, a calculation unit 220, and an output unit 250. In the present embodiment, the natural language processing device 200 is described using an example in which an unknown word is replaced with a highly similar word, based on the word vectors learned by the word vector learning device 100, before translation is performed. However, the present invention is not limited to this, and summarization, document proofreading, and the like may be performed using the replaced words.

入力部２１０は、翻訳対象のテキストを受け付ける。 The input unit 210 receives text to be translated.

演算部２２０は、自然言語処理部２３０と、ベクトル記憶部２４０とを備えている。 The calculation unit 220 includes a natural language processing unit 230 and a vector storage unit 240.

ベクトル記憶部２４０には、ベクトル記憶部４０と同じものが記憶されている。 The vector storage unit 240 stores the same content as the vector storage unit 40.

自然言語処理部２３０は、置換部２３２と、翻訳部２３４とを備えている。 The natural language processing unit 230 includes a replacement unit 232 and a translation unit 234.

置換部２３２は、入力部２１０で受け付けたテキストの単語のうち、単語を格納した既存の辞書（図示省略）にない未知の単語を抽出し、ベクトル記憶部４０に記憶されている単語に対する文脈ベクトルに基づいて、未知の単語に対して最も類似度が高い、辞書中の単語を推定する。そして、未知の単語を、推定された辞書中の単語に置き換えたテキストを生成する。 The replacement unit 232 extracts, from the words of the text received by the input unit 210, unknown words that are not in an existing dictionary of words (not shown), and estimates the dictionary word with the highest similarity to each unknown word based on the context vectors for the words stored in the vector storage unit 40. It then generates a text in which each unknown word is replaced with the estimated dictionary word.
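The replacement step can be sketched as a nearest-neighbor search over stored vectors. Cosine similarity is used here purely as an assumption, since the similarity measure of equation (1) is not reproduced in this part of the text; the function names and the toy data layout (a dict from word to vector) are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def replace_unknown_words(tokens: List[str], dictionary: Set[str],
                          vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Replace each word missing from `dictionary` with its most similar dictionary word."""
    candidates = [w for w in dictionary if w in vectors]
    result = []
    for token in tokens:
        # Keep words that are already known, or that have no stored vector to compare.
        if token in dictionary or token not in vectors or not candidates:
            result.append(token)
            continue
        best = max(candidates, key=lambda w: cosine(vectors[token], vectors[w]))
        result.append(best)
    return result
```

The resulting token list would then be handed to an existing translation method, as the translation unit 234 does with the replaced text.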

翻訳部２３４は、置換部２３２により単語が置き換えられたテキストを既存の手法により翻訳し、出力部２５０に出力して処理を終了する。 The translation unit 234 translates the text in which the word is replaced by the replacement unit 232 using an existing method, outputs the translated text to the output unit 250, and ends the processing.

なお、自然言語処理装置２００において、他の自然言語処理を行う際に、特定の文書中に出現する単語と類似する単語を辞書から抽出して、処理対象に含めることで、情報を増やして精度を向上させることが可能である。この際、出現した各単語に対して、上記（１）式を計算して類似度が高い単語を処理に含めるといったことを行う。 When the natural language processing device 200 performs other natural language processing, words similar to the words appearing in a specific document can be extracted from the dictionary and included in the processing target, which increases the available information and can improve accuracy. In this case, equation (1) above is calculated for each word that appears, and words with high similarity are included in the processing.

＜本発明の実施の形態に係る単語ベクトル学習装置の作用＞ <Operation of the word vector learning device according to the embodiment of the present invention>

次に、本発明の実施の形態に係る単語ベクトル学習装置１００の作用について説明する。入力部１０において文書データ、及びチューニングパラメタＤ,Ｂ,Ｆ,Ｔを受け付けると、単語ベクトル学習装置１００は、図５に示す単語ベクトル学習処理ルーチンを実行する。 Next, the operation of the word vector learning device 100 according to the embodiment of the present invention will be described. When the input unit 10 receives the document data and the tuning parameters D, B, F, and T, the word vector learning device 100 executes the word vector learning processing routine shown in FIG. 5.

まず、ステップＳ１００では、最適化変数Ｒ,Ｃ,Ｕ,Ｖを乱数で初期化する。また、繰り返し数を管理する変数ｔをｔ＝０で初期化する。α、βも同様にα＝０、β＝０と初期化する。 First, in step S100, the optimization variables R, C, U, and V are initialized with random numbers. Also, a variable t for managing the number of repetitions is initialized with t = 0. Similarly, α and β are initialized as α = 0 and β = 0.

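次に、ステップＳ１０２では、上記（６）式により表される、目的関数を最適化するように、文書データで出現する、単語と、単語の文脈として出現する単語とのペアの集合と、次元数Ｆの実数ベクトルである各ブロックの補助パラメタｕ_ｉ,ｂ、ｖ_ｊ,ｂと、各ブロックについてのラグランジュ未定乗数α_ｉ,ｂ、β_ｉ,ｂとに基づいて、単語ｉの各々についての単語ベクトルの各ブロックｒ_ｉ,b及び単語ｊの各々についての文脈ベクトルの各ブロックｃ_ｊ,bの実数ベクトルを更新する。 Next, in step S102, so as to optimize the objective function expressed by equation (6) above, the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j are updated, based on the set of pairs of a word appearing in the document data and a word appearing as its context, the auxiliary parameters u_{i,b} and v_{j,b} of each block, which are real vectors of dimension F, and the Lagrange multipliers α_{i,b} and β_{i,b} of each block.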

上記ステップＳ１０２は具体的には、図６に示す以下のステップＳ１２０〜Ｓ１２４により行われる。 Specifically, step S102 is carried out by the following steps S120 to S124 shown in FIG. 6.

ステップＳ１２０では、勾配法を用いて、上記（８）式、（９）式に従って、単語ベクトルのブロックｒ_ｉ，ｂと文脈ベクトルのブロックｃ_ｊ，ｂとを更新する。 In step S120, the word vector block r_{i,b} and the context vector block c_{j,b} are updated according to equations (8) and (9) above using the gradient method.

ステップＳ１２２では、（１０）式に従って、補助パラメタｕ,ｖを更新する。 In step S122, the auxiliary parameters u and v are updated according to the equation (10).

ステップＳ１２４では、Ｒ,Ｃ,Ｕ,Ｖを固定し、上記（１３）式、（１４）式に従って、ラグランジュ未定乗数α_ｉ,ｂ及びβ_ｉ,ｂを最適化変数として更新する。 In step S124, with R, C, U, and V fixed, the Lagrange multipliers α_{i,b} and β_{i,b} are updated as the optimization variables according to equations (13) and (14) above.

ステップＳ１０４では、反復終了条件を満たすかを判定する。ｔ＝Ｔの時、終了と判定しステップＳ１０８の単語ベクトルの保存へ進み、ｔ＜Ｔの時には、ステップＳ１０６でｔ＝ｔ＋１としてステップＳ１０２の処理へ戻る。 In step S104, it is determined whether the iteration end condition is satisfied. When t = T, the iteration is judged to be finished and the process proceeds to storing the word vectors in step S108. When t < T, t is set to t + 1 in step S106 and the process returns to step S102.

ステップＳ１０８では、ステップＳ１０２で更新された学習後の単語ベクトル及び文脈ベクトルの実数値ベクトルを適切な形式でベクトル記憶部４０に保存し、処理を終了する。 In step S108, the learned word vector and the real value vector of the context vector updated in step S102 are stored in an appropriate format in the vector storage unit 40, and the process ends.

以上説明したように、本発明の実施の形態に係る単語ベクトル学習装置によれば、単語ベクトル及び文脈ベクトルの各々を、次元数ＦのＢ個のブロックに分割し、次元数ＦのＳ種類の実数ベクトルの集合と、一方の単語が他方の単語の文脈として出現した回数とに基づいて、各ブロックが、Ｓ種類の実数ベクトルの何れかとなる制約の下で、単語ベクトル及び文脈ベクトルの各々の各ブロックの尤もらしさを表す目的関数を最適化するように、単語の各々についての単語ベクトル及び文脈ベクトルの各ブロックの実数値ベクトルを推定することにより、必要なメモリ容量を削減することができる単語ベクトルを学習することができる。 As described above, according to the word vector learning device of the embodiment of the present invention, each of the word vector and the context vector is divided into B blocks of dimension F. Based on a set of S kinds of real vectors of dimension F and on the number of times one word appears as the context of the other word, the real-valued vector of each block of the word vector and the context vector of each word is estimated so as to optimize an objective function representing the likelihood of each block, under the constraint that each block is one of the S kinds of real vectors. This makes it possible to learn word vectors that require less memory.

＜本発明の実施の形態に係る自然言語処理装置の作用＞ <Operation of Natural Language Processing Device According to Embodiment of the Present Invention>

次に、本発明の実施の形態に係る自然言語処理装置２００の作用について説明する。入力部２１０において翻訳対象のテキストを受け付けると、自然言語処理装置２００は、図７に示す自然言語処理ルーチンを実行する。 Next, the operation of the natural language processing device 200 according to the embodiment of the present invention will be described. When the input unit 210 receives a text to be translated, the natural language processing device 200 executes the natural language processing routine shown in FIG. 7.

ステップＳ２００では、入力部２１０で受け付けた翻訳対象のテキストから未知の単語を抽出する。 In step S200, an unknown word is extracted from the text to be translated received by the input unit 210.

ステップＳ２０２では、ベクトル記憶部２４０に記憶されている各単語の単語ベクトルに基づいて、ステップＳ２００で抽出された未知の単語に対して最も類似度が高い、辞書中の単語を推定し、翻訳対象のテキストについて、未知の単語を、推定された辞書中の単語に置き換えたテキストを生成する。 In step S202, based on the word vectors of the words stored in the vector storage unit 240, the dictionary word with the highest similarity to the unknown word extracted in step S200 is estimated, and a text is generated in which the unknown word in the text to be translated is replaced with the estimated dictionary word.

ステップＳ２０４では、ステップＳ２０２で生成されたテキストに基づいて翻訳し、出力部２５０に出力して処理を終了する。 In step S204, translation is performed based on the text generated in step S202, and the text is output to the output unit 250 and the process is terminated.

以上説明したように、本発明の実施の形態に係る自然言語処理装置によれば、学習された単語ベクトルを用いて、単語の意味的な類似度に基づく翻訳処理を行うことができる。 As described above, according to the natural language processing apparatus of the embodiment of the present invention, it is possible to perform translation processing based on the semantic similarity of words using learned word vectors.

なお、本発明は、上述した実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made without departing from the gist of the present invention.

１０、２１０入力部 10, 210: Input unit
２０、２２０演算部 20, 220: Calculation unit
３０学習部 30: Learning unit
３２更新部 32: Updating unit
３４反復判定部 34: Iterative determination unit
４０ベクトル記憶部 40: Vector storage unit
５０、２５０出力部 50, 250: Output unit
１００単語ベクトル学習装置 100: Word vector learning device
２００自然言語処理装置 200: Natural language processing device
２３０自然言語処理部 230: Natural language processing unit
２３２置換部 232: Replacement unit
２３４翻訳部 234: Translation unit
２４０ベクトル記憶部 240: Vector storage unit

Claims

A word vector learning device that learns, for each word, based on document data, a D-dimensional word vector for the word and a D-dimensional context vector representing the word appearing as the context of another word, the device comprising:
a learning unit that divides each of the word vector and the context vector into B blocks of dimension F and, based on a set of S kinds of real vectors of dimension F and on the number of times one word of a word pair appears as the context of the other word in the document data, estimates the real-valued vector of each block of the word vector and the context vector of each of the words so as to optimize an objective function representing the likelihood of each block of the word vector and the context vector, under the constraint that each block of the word vector and the context vector is one of the S kinds of real vectors.

The word vector learning device according to claim 1, wherein the learning unit comprises:
an updating unit that updates the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j so as to optimize the objective function represented by the following expression; and
an iterative determination unit that causes the updating unit to repeat the update until a predetermined iteration end condition is satisfied.

Here, D represents the set of pairs of a word appearing in the document data and a word appearing as the context of that word, α_{i,b} and β_{i,b} are Lagrange multipliers, u_{i,b} and v_{j,b} are auxiliary parameters that are real vectors of dimension F, ρ is a constant, ζ represents the set of S kinds of real vectors of dimension F, and X_{i,j} represents the number of times word j appears as context of word i.

A natural language processing device comprising a natural language processing unit that performs, on an input document, natural language processing based on the semantic similarity between words, the similarity being obtained from the word vectors of the words learned by the word vector learning device according to claim 1 or 2.

A word vector learning method in a word vector learning device that learns, for each word, based on document data, a D-dimensional word vector for the word and a D-dimensional context vector representing the word appearing as the context of another word, the method comprising:
a step in which a learning unit divides each of the word vector and the context vector into B blocks of dimension F and, based on a set of S kinds of real vectors of dimension F and on the number of times one word of a word pair appears as the context of the other word in the document data, estimates the real-valued vector of each block of the word vector and the context vector of each of the words so as to optimize an objective function representing the likelihood of each block of the word vector and the context vector, under the constraint that each block of the word vector and the context vector is one of the S kinds of real vectors.

The word vector learning method according to claim 4, wherein the step of estimating by the learning unit includes:
a step in which an updating unit updates the real vectors of each block r_{i,b} of the word vector of each word i and each block c_{j,b} of the context vector of each word j so as to optimize the objective function represented by the following expression; and
a step in which an iterative determination unit causes the updating unit to repeat the update until a predetermined iteration end condition is satisfied.

Here, D represents the set of pairs of a word appearing in the document data and a word appearing as the context of that word, α_{i,b} and β_{i,b} are Lagrange multipliers, u_{i,b} and v_{j,b} are auxiliary parameters that are real vectors of dimension F, ρ is a constant, ζ represents the set of S kinds of real vectors of dimension F, and X_{i,j} represents the number of times word j appears as context of word i.

A natural language processing method including a step in which a natural language processing unit performs, on an input document, natural language processing based on the semantic similarity between words, the similarity being obtained from the word vectors of the words learned by the word vector learning method according to claim 4 or 5.

A program for causing a computer to function as each unit constituting the word vector learning device according to claim 1 or 2.

A program for causing a computer to function as each unit constituting the natural language processing device according to claim 3.