CN106897265B - Word vector training method and device - Google Patents


Info

Publication number
CN106897265B
CN106897265B
Authority
CN
China
Prior art keywords
vocabulary
library
old
huffman tree
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710022458.0A
Other languages
Chinese (zh)
Other versions
CN106897265A (en)
Inventor
李建欣
刘垚鹏
彭浩
张日崇
陈汉腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201710022458.0A
Publication of CN106897265A
Application granted
Publication of CN106897265B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G06F 16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries

Abstract

The invention provides a word vector training method and device, belonging to the technical field of machine learning. The word vector training method comprises the following steps: acquiring a newly added vocabulary library, wherein the vocabulary in the newly added vocabulary library and the vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to old word vectors; initializing the vocabulary in the new vocabulary library, so that the word vectors of the words in the new vocabulary library that belong to the old vocabulary library are the old word vectors, and the word vectors of the words that belong to the newly added vocabulary library are random word vectors; and updating the word vectors of the words in the new vocabulary library according to a first Huffman tree corresponding to the new vocabulary library and a second Huffman tree corresponding to the old vocabulary library. The word vector training method and device provided by the invention improve the training efficiency of word vectors.

Description

Word vector training method and device
Technical Field
The invention relates to the technical field of machine learning, in particular to a word vector training method and device.
Background
In machine learning, in order for a machine to understand the meaning of human language, a word representation tool based on a neural network language model converts each word in the human language into a word vector, so that a computer can learn the meaning of each word through its word vector.
In the prior art, when new vocabulary is added to the vocabulary library, all the words in the resulting new vocabulary library generally have to be learned again to obtain a new word vector for each word. This approach makes the training of word vectors inefficient.
Disclosure of Invention
The invention provides a word vector training method and device, which improve the training efficiency of word vectors.
The embodiment of the invention provides a word vector training method, which comprises the following steps:
acquiring a newly added vocabulary library, wherein the vocabulary in the newly added vocabulary library and the vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to an old word vector;
initializing the vocabulary in the new vocabulary library, so that the word vectors of the words in the new vocabulary library that belong to the old vocabulary library are the old word vectors, and the word vectors of the words in the new vocabulary library that belong to the newly added vocabulary library are random word vectors;
and respectively updating word vectors of the words in the new vocabulary library according to the first Huffman tree corresponding to the new vocabulary library and the second Huffman tree corresponding to the old vocabulary library.
In an embodiment of the present invention, the updating the word vectors of the words in the new vocabulary library according to the first huffman tree corresponding to the new vocabulary library and the second huffman tree corresponding to the old vocabulary library respectively includes:
acquiring a preset target function corresponding to the first vocabulary, wherein the first vocabulary is a vocabulary in the new vocabulary library;
and performing gradient processing on the preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain a word vector corresponding to the first vocabulary.
In an embodiment of the present invention, the obtaining of the preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, factorizing the first vocabulary according to an original objective function of a Skip-gram model to obtain a preset objective function corresponding to the first vocabulary;
and if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model.
In an embodiment of the present invention, the obtaining of the preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, performing factorization on the first vocabulary according to an original target function of a CBOW model to obtain a preset target function corresponding to the first vocabulary;
and if the first vocabulary belongs to the newly added vocabulary library, the preset target function corresponding to the first vocabulary is the original target function of the CBOW model.
In an embodiment of the present invention, factorizing the first vocabulary according to an original objective function of a Skip-gram model to obtain a preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, the first vocabulary is factorized according to the matched part and the differing part of its Huffman codes on the second Huffman tree and the first Huffman tree, so as to obtain the preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model;
wherein w represents the first vocabulary, W represents the old vocabulary library, ΔW represents the newly added vocabulary library, C(w) represents the vocabulary library composed of the words in the context of w, u represents a word in the context of w, l* represents the length of the matched Huffman codes of the non-leaf node w on the second Huffman tree and the first Huffman tree, i indicates that the first vocabulary is the i-th node on the second Huffman tree, j indicates that the first vocabulary is the j-th node on the second Huffman tree, θ'^{u}_{j-1} represents the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d^{u}_{j} represents the Huffman code of the j-th node on the second Huffman path corresponding to u, σ denotes the activation function, and v(w) denotes the word vector corresponding to w.
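For reference, the original objective function of the Skip-gram model under hierarchical softmax has the following standard form in the notation defined above, where primed quantities refer to the first (new) Huffman tree; this is a reconstruction for orientation and may differ in detail from the formula image in the original filing:

\mathcal{L}(w) = \sum_{u \in C(w)} \sum_{j=2}^{L'_u} \left[ (1 - d'^{u}_{j}) \log \sigma\!\left(v(w)^{\top} \theta'^{u}_{j-1}\right) + d'^{u}_{j} \log\!\left(1 - \sigma\!\left(v(w)^{\top} \theta'^{u}_{j-1}\right)\right) \right]

The factorized objective used for words of the old vocabulary library splits the inner sum at the matched code length l*, so that the matched prefix part reuses the inherited parameters and only the differing part involves the newly initialized parameters.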
In an embodiment of the present invention, the factorizing the first vocabulary according to the original objective function of the CBOW model to obtain the preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, the first vocabulary is factorized according to the matched part and the differing part of its Huffman codes on the second Huffman tree and the first Huffman tree, so as to obtain the preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the CBOW model;
wherein d^{w}_{j} represents the Huffman code of the j-th node on the second Huffman path corresponding to w, and X_w represents the sum of the word vectors corresponding to all the words in C(w).
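Analogously, the original objective function of the CBOW model under hierarchical softmax has the following standard form (again a reconstruction in the notation above rather than the patent's own figure, with primed quantities referring to the first Huffman tree), where X_w is the sum of the word vectors of the context words of w:

\mathcal{L}(w) = \sum_{j=2}^{L'_w} \left[ (1 - d'^{w}_{j}) \log \sigma\!\left(X_w^{\top} \theta'^{w}_{j-1}\right) + d'^{w}_{j} \log\!\left(1 - \sigma\!\left(X_w^{\top} \theta'^{w}_{j-1}\right)\right) \right]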
In an embodiment of the present invention, performing gradient processing on the preset objective function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain the word vector corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, performing stochastic gradient ascent processing on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and performing stochastic gradient descent processing on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree;
if the first vocabulary belongs to the newly added vocabulary library, performing stochastic gradient ascent processing on the first vocabulary to obtain the word vector corresponding to the first vocabulary;
here, η' represents the learning rate.
In an embodiment of the present invention, performing gradient processing on the preset objective function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain the word vector corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, performing stochastic gradient ascent processing on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and performing stochastic gradient descent processing on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree;
if the first vocabulary belongs to the newly added vocabulary library, performing stochastic gradient ascent processing on the first vocabulary to obtain the word vector corresponding to the first vocabulary;
wherein θ'^{w}_{i-1} represents the word vector of the (i-1)-th node on the first Huffman path corresponding to w.
The embodiment of the present invention further provides a word vector training device, including:
the acquisition module is used for acquiring a newly added vocabulary library, wherein the vocabulary in the newly added vocabulary library and the vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to an old word vector;
the initialization module is used for initializing the vocabulary in the new vocabulary library, so that the word vectors of the words in the new vocabulary library that belong to the old vocabulary library are the old word vectors, and the word vectors of the words in the new vocabulary library that belong to the newly added vocabulary library are random word vectors;
and the updating module is used for respectively updating the word vectors of the words in the new word bank according to the first Huffman tree corresponding to the new word bank and the second Huffman tree corresponding to the old word bank.
In an embodiment of the present invention, the updating module is specifically configured to obtain a preset objective function corresponding to the first vocabulary, where the first vocabulary is a vocabulary in the new vocabulary library; and performing gradient processing on the preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain a word vector corresponding to the first vocabulary.
According to the word vector training method and device provided by the embodiment of the invention, a newly added word library is obtained, and words in the new word library are initialized, so that word vectors of words in the new word library, which belong to words in an old word library, are old word vectors, and word vectors of words in the new word library, which belong to words in the newly added word library, are random word vectors; and respectively updating word vectors of words in the new vocabulary library according to the first Huffman tree corresponding to the new vocabulary library and the second Huffman tree corresponding to the old vocabulary library, so that the training efficiency of the word vectors is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a word vector training method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a process for updating word vectors of words in a new vocabulary library according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a word vector training apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 1 is a schematic flow chart of a word vector training method according to an embodiment of the present invention, where the word vector training method may be executed by a word vector training apparatus, and the word vector training apparatus may be integrated in a processor or may be separately configured, and the present invention is not limited in particular. Specifically, referring to fig. 1, the word vector training method may include:
and S101, acquiring a newly added vocabulary library.
And the vocabulary in the newly added vocabulary library and the vocabulary in the old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to the old word vector.
In the embodiment of the invention, the vocabulary in the old vocabulary library is trained into the corresponding old word vector, and the vocabulary in the newly-added vocabulary library is not trained into the corresponding word vector. For example: the old vocabulary library is the vocabulary library of the existing trained word vector, the newly added vocabulary library comprises newly added vocabularies, and at the moment, the vocabularies in the old vocabulary library of the trained word vector and the newly added vocabularies are combined into a new vocabulary library.
S102, initializing the vocabulary in the new vocabulary library, so that word vectors of the vocabulary in the old vocabulary library in the new vocabulary library are old word vectors, and word vectors of the vocabulary in the newly added vocabulary library in the new vocabulary library are random word vectors.
For example, in the embodiment of the present invention, the old vocabulary library is W, where the words in the old vocabulary library have already been trained to obtain their corresponding word vectors v(w); the newly added vocabulary library is ΔW; the new vocabulary library is W' = W + ΔW; the second Huffman tree corresponding to the old vocabulary library W is T, and the first Huffman tree corresponding to the new vocabulary library W' is T'. For a first vocabulary w in the new vocabulary library, a judgment is made: if w is in the old vocabulary library W, w has already been trained to a corresponding word vector in the old vocabulary library, so the word is not trained again and the original v(w) is inherited; if the first vocabulary w is in the newly added vocabulary library, that is, it is a newly added word, the word vector corresponding to w is initialized randomly.
For example, taking the first vocabulary as an example, the first vocabulary is any vocabulary in the new vocabulary library, and the distribution of the first vocabulary on the first huffman tree can include two cases. In the first case: the first vocabulary is a leaf node on the first Huffman tree; in the second case: the first vocabulary is the non-leaf nodes on the first Huffman tree.
In the first case: if the first vocabulary is a leaf node on the first Huffman tree, the first vocabulary can be initialized according to the following formula 1:
v'(w) = v(w), if w ∈ W;  v'(w) = a random word vector, if w ∈ ΔW    (formula 1)
wherein w represents the first vocabulary, v(w) represents the word vector of w on the second Huffman tree T, and v'(w) denotes the word vector of w on the first Huffman tree T'.
It can be seen from formula 1 that, if the first vocabulary belongs to the old vocabulary library W, the word vector of the first vocabulary is the old word vector corresponding to the first vocabulary in the old vocabulary library; if the first vocabulary does not belong to the old vocabulary library W, that is, the first vocabulary is a newly added word, the word vector of the first vocabulary is initialized randomly, that is, the word vector of the first vocabulary is a random word vector.
In the second case: if the first vocabulary is a non-leaf node on the first Huffman tree, the non-leaf node has a parameter vector. To distinguish the parameter vectors, let the parameter vector at the i-th node on the first Huffman path corresponding to w1 be θ'^{w1}_{i}, and let the parameter vector at the i-th node on the first Huffman path corresponding to w2 be θ'^{w2}_{i}. When w1 and w2 correspond to the same node on the tree, θ'^{w1}_{i} = θ'^{w2}_{i}.
Assume that the code of a word w on the second Huffman tree is "0010" and its code on the first Huffman tree is "00011". Since the two Huffman codes have the same prefix "00", the vectors on the nodes corresponding to the same prefix "00" remain unchanged. Here, the identifiers L_w and L'_w are also set to represent the coding length of the first vocabulary w on the second Huffman tree and the coding length of the first vocabulary w on the first Huffman tree, respectively. The first vocabulary may then be initialized according to formula 2 as follows:
θ'^{w}_{i} = θ^{w}_{i}, if 1 ≤ i ≤ l*_w;  θ'^{w}_{i} = 0, if l*_w < i ≤ L'_w    (formula 2)
wherein d'^{w}_{i} represents the Huffman code of the i-th node on the first Huffman path of the non-leaf node w, d^{w}_{i} represents the Huffman code of the i-th node on the second Huffman path of the non-leaf node w, and l*_w represents the length of the matched Huffman code of the non-leaf node w on the second Huffman tree and on the first Huffman tree. The Huffman code corresponding to the non-leaf node w on the first Huffman tree can accordingly be divided into a prefix-matching part and the remaining nodes.
It can be seen in combination with formula 2 that, if the first vocabulary is a non-leaf node on the first Huffman tree, the vectors corresponding to the part of its code on the first Huffman tree that matches the prefix of its code on the second Huffman tree are the existing parameter vectors θ^{w}_{i}, and the vectors corresponding to the unmatched part of the code are initialized as zero vectors.
It should be noted that, in the embodiment of the present invention, for the first vocabulary, random initialization is adopted if the first vocabulary is a leaf node on the first Huffman tree, and initialization to a zero vector is adopted if it is a non-leaf node. Specifically, each component of a randomly initialized word vector is drawn so that the initial word vector falls within the interval [-0.5/m, 0.5/m], where m refers to the length of the word vector.
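The initialization described above (formula 1 for leaf nodes and formula 2 for non-leaf nodes) can be illustrated with the following minimal Python sketch. The data layout and helper names (matched_prefix_len, init_word_vectors, init_path_params) are illustrative assumptions rather than part of the patent, and path parameters are stored per word here for clarity, whereas a real implementation would store them per inner tree node so that words sharing a node share the same vector:

import numpy as np

def matched_prefix_len(old_code, new_code):
    """Length of the common prefix of a word's Huffman codes on the old and new trees."""
    n = 0
    for a, b in zip(old_code, new_code):
        if a != b:
            break
        n += 1
    return n

def init_word_vectors(new_vocab, old_vectors, m):
    """Formula 1: words from the old vocabulary library inherit v(w); newly added words
    get a random word vector drawn from [-0.5/m, 0.5/m], where m is the vector length."""
    vectors = {}
    for w in new_vocab:
        if w in old_vectors:
            vectors[w] = old_vectors[w].copy()
        else:
            vectors[w] = (np.random.rand(m) - 0.5) / m
    return vectors

def init_path_params(new_vocab, old_codes, new_codes, old_params, m):
    """Formula 2: along each word's path in the first Huffman tree, the parameter vectors
    on the matched code prefix are inherited from the second Huffman tree, and the
    parameter vectors on the unmatched part are initialized to zero vectors."""
    params = {}
    for w in new_vocab:
        theta = np.zeros((len(new_codes[w]), m))
        if w in old_codes:
            l_star = matched_prefix_len(old_codes[w], new_codes[w])
            theta[:l_star] = old_params[w][:l_star]
        params[w] = theta
    return params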
After the vocabulary in the new vocabulary library is initialized, the word vectors corresponding to the vocabulary in the new vocabulary library can be updated.
S103, respectively updating word vectors of the words in the new word bank according to the first Huffman tree corresponding to the new word bank and the second Huffman tree corresponding to the old word bank.
The word vector training method provided by the embodiment of the invention comprises the steps of obtaining a newly-added word library and carrying out initialization processing on words in the new word library, so that word vectors of words in the new word library, which belong to words in an old word library, are old word vectors, and word vectors of words in the new word library, which belong to words in the newly-added word library, are random word vectors; and respectively updating word vectors of words in the new vocabulary library according to the first Huffman tree corresponding to the new vocabulary library and the second Huffman tree corresponding to the old vocabulary library, so that the training efficiency of the word vectors is improved.
Optionally, in the embodiment of the present invention, in step S103, updating the word vectors of the words in the new vocabulary library according to the first huffman tree corresponding to the new vocabulary library and the second huffman tree corresponding to the old vocabulary library respectively may be implemented as follows, specifically please refer to fig. 2, where fig. 2 is a schematic flow diagram of updating the word vectors of the words in the new vocabulary library according to the embodiment of the present invention.
S201, acquiring a preset objective function corresponding to the first vocabulary.
Wherein, the first vocabulary is the vocabulary in the new vocabulary library.
Optionally, in S201, the preset objective function corresponding to the first vocabulary may be obtained through the following two models:
for a first Skip-gram model, if a first vocabulary belongs to an old vocabulary library, factorizing the first vocabulary according to an original objective function of the Skip-gram model to obtain a preset objective function corresponding to the first vocabulary; and if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model.
For example, in the embodiment of the present invention, if the first vocabulary belongs to the old vocabulary library, each word in W is factorized according to the matched part and the differing part of its Huffman codes, so as to obtain the preset objective function corresponding to the first vocabulary; that is, the first vocabulary is factorized into a term over the shared code prefix and a term over the remaining nodes.
If the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model.
Wherein w represents the first vocabulary, W represents the old vocabulary library, ΔW represents the newly added vocabulary library, C(w) represents the vocabulary library composed of the words in the context of w, u represents a word in the context of w, l* represents the length of the matched Huffman codes of the non-leaf node w on the second Huffman tree and the first Huffman tree, i indicates that the first vocabulary is the i-th node on the second Huffman tree, j indicates that the first vocabulary is the j-th node on the second Huffman tree, θ'^{u}_{j-1} represents the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d^{u}_{j} represents the Huffman code of the j-th node on the second Huffman path corresponding to u, σ denotes the activation function, and v(w) denotes the word vector corresponding to w. In the factorized objective function, one summation collects the contributions of the nodes whose codes share the same prefix, and the other summation collects the contributions of the remaining nodes, namely those inherited from other words in the new vocabulary library and the non-leaf nodes initialized to zero.
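Based on the split just described (one summation over the shared code prefix and one over the remaining nodes), a plausible shape of the factorized Skip-gram objective for a word w of the old vocabulary library is sketched below; the exact expression exists only as an image in the original filing, so this should be read as an illustration of the decomposition rather than the patent's precise formula:

\mathcal{L}(w) = \sum_{u \in C(w)} \left[ \sum_{j=2}^{l^{*}_{u}} \ell\!\left(v(w), \theta^{u}_{j-1}, d^{u}_{j}\right) + \sum_{j=l^{*}_{u}+1}^{L'_u} \ell\!\left(v(w), \theta'^{u}_{j-1}, d'^{u}_{j}\right) \right], \quad \ell(v, \theta, d) = (1 - d) \log \sigma\!\left(v^{\top}\theta\right) + d \log\!\left(1 - \sigma\!\left(v^{\top}\theta\right)\right)

Here the first inner sum reuses the parameters inherited from the second Huffman tree, and the second inner sum covers the differing part of the path on the first Huffman tree.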
For the second CBOW model, if the first vocabulary belongs to the old vocabulary library, factorizing the first vocabulary according to the original target function of the CBOW model to obtain a preset target function corresponding to the first vocabulary; and if the first vocabulary belongs to the newly added vocabulary library, the preset target function corresponding to the first vocabulary is the original target function of the CBOW model.
For example, in the embodiment of the present invention, if the first vocabulary belongs to the old vocabulary library, each word in W is factorized according to the matched part and the differing part of its Huffman codes; that is, the first vocabulary is factorized to obtain the preset objective function corresponding to the first vocabulary.
If the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function L(w, i) of the CBOW model.
Wherein d^{w}_{j} represents the Huffman code of the j-th node on the second Huffman path corresponding to w, and X_w represents the sum of the word vectors corresponding to all the words in C(w).
It should be noted that, in the embodiment of the present invention, by factoring each word in W according to the same part and different parts of the coding, the amount of computation in the word vector process can be saved, thereby improving the computation efficiency.
After the preset target function corresponding to the first vocabulary is obtained, gradient processing can be performed on the preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree, so that a word vector corresponding to the first vocabulary is obtained.
S202, gradient processing is carried out on a preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree, and a word vector corresponding to the first vocabulary is obtained.
Similar to step S201, this step can be implemented based on the following two models:
For the first model, the Skip-gram model: if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, stochastic gradient ascent processing is performed on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and stochastic gradient descent processing is performed on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree.
If the first vocabulary belongs to the newly added vocabulary library, stochastic gradient ascent processing is performed on the first vocabulary to obtain the word vector corresponding to the first vocabulary, where η' represents the learning rate.
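For orientation, the standard stochastic gradient ascent updates for the hierarchical-softmax Skip-gram objective, as used in word2vec, take the following form in the notation above; the patent's own update formulas (including the additional gradient descent step on the differing nodes of the old tree) appear only as images in the original filing and are not reproduced:

\theta^{u}_{j-1} \leftarrow \theta^{u}_{j-1} + \eta' \left[1 - d^{u}_{j} - \sigma\!\left(v(w)^{\top}\theta^{u}_{j-1}\right)\right] v(w)

v(w) \leftarrow v(w) + \eta' \sum_{u \in C(w)} \sum_{j=2}^{L_u} \left[1 - d^{u}_{j} - \sigma\!\left(v(w)^{\top}\theta^{u}_{j-1}\right)\right] \theta^{u}_{j-1}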
For the second model, the CBOW model: if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, stochastic gradient ascent processing is performed on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and stochastic gradient descent processing is performed on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree.
If the first vocabulary belongs to the newly added vocabulary library, stochastic gradient ascent processing is performed on the first vocabulary to obtain the word vector corresponding to the first vocabulary.
Wherein θ'^{w}_{i-1} represents the word vector of the (i-1)-th node on the first Huffman path corresponding to w.
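For the newly added vocabulary case, which the text states is trained by stochastic gradient ascent on the original CBOW objective, a minimal Python sketch of one hierarchical-softmax update step is given below. The function and parameter names are illustrative assumptions; the additional descent step applied to old-tree nodes for words of the old vocabulary library is not shown, since its exact formula is only given as an image in the original filing:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_hs_update(context_vectors, path_nodes, path_code, node_params, lr):
    """One stochastic-gradient-ascent step of CBOW with hierarchical softmax.

    context_vectors: word vectors v(u) of the words in C(w)
    path_nodes:      ids of the non-leaf nodes on w's path in the first Huffman tree
    path_code:       Huffman code bits of w (0/1), aligned with path_nodes
    node_params:     parameter vectors theta indexed by node id (updated in place)
    lr:              learning rate eta'
    Returns the accumulated gradient to be added to each context word vector.
    """
    x_w = np.sum(context_vectors, axis=0)        # X_w: sum of the context word vectors
    e = np.zeros_like(x_w)
    for node, d in zip(path_nodes, path_code):
        q = sigmoid(np.dot(x_w, node_params[node]))
        g = lr * (1 - d - q)                     # gradient scale at this node
        e += g * node_params[node]               # contribution to the context-vector update
        node_params[node] += g * x_w             # gradient ascent on the node parameter
    return e

Each context word vector v(u), u ∈ C(w), is then updated by adding e, mirroring the word2vec CBOW procedure.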
Here, η' denotes the learning rate. As an example, an initial learning rate η0 is set, and every 1000 words processed, the learning rate is adjusted according to the following formula:
η' = η0 × (1 - word_count_actual / (train_words + 1))
wherein word_count_actual represents the number of words currently processed, and train_words + 1 is used to prevent the denominator from being zero. A threshold η_min = 10^{-4} · η0 is introduced as the minimum value of η'. In the incremental learning process, the word counter needs to include the word count of the original corpus, and η' is calculated in combination with η_min.
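A minimal sketch of this schedule, assuming the word2vec-style linear decay suggested by the variable names word_count_actual and train_words and the floor η_min = 10^{-4} · η0 described above (for incremental learning, word_count_actual would include the word count of the original corpus):

def adjust_learning_rate(eta0, word_count_actual, train_words):
    """Linearly decay the learning rate with training progress, bounded below by eta_min."""
    eta = eta0 * (1.0 - word_count_actual / (train_words + 1.0))
    return max(eta, 1e-4 * eta0)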
Fig. 3 is a schematic structural diagram of a word vector training apparatus 30 according to an embodiment of the present invention, and it should be understood that the embodiment of the present invention is only illustrated in fig. 3, but the present invention is not limited thereto. Referring to fig. 3, the word vector training apparatus 30 may include:
the obtaining module 301 is configured to obtain a new vocabulary library, where a vocabulary in the new vocabulary library and a vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to an old word vector.
The initialization module 302 is configured to initialize the vocabulary in the new vocabulary library, so that the word vector in the new vocabulary library that belongs to the vocabulary in the old vocabulary library is an old word vector, and the word vector in the new vocabulary library that belongs to the vocabulary in the newly added vocabulary library is a random word vector.
And the updating module 303 is configured to update word vectors of words in the new vocabulary library according to the first huffman tree corresponding to the new vocabulary library and the second huffman tree corresponding to the old vocabulary library.
Optionally, the updating module 303 is specifically configured to obtain a preset objective function corresponding to a first vocabulary, where the first vocabulary is a vocabulary in the new vocabulary library; and carrying out gradient processing on a preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain a word vector corresponding to the first vocabulary.
The word vector training apparatus 30 shown in the embodiment of the present invention may implement the technical solution corresponding to the word vector training method shown in the above method embodiment, and the implementation principle and the beneficial effect are similar, which are not described herein again.
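The three modules of the apparatus map naturally onto a small class. The sketch below only wires the steps together under the same assumptions as the earlier snippets; the per-window update routine is passed in as a callable because its exact formulas are not reproduced here:

import numpy as np

class WordVectorTrainer:
    """Minimal skeleton of the word vector training apparatus of Fig. 3."""

    def __init__(self, old_vocab, old_vectors, m):
        self.old_vocab = list(old_vocab)   # old vocabulary library W
        self.old_vectors = old_vectors     # trained old word vectors v(w)
        self.m = m                         # word vector length

    def acquire(self, added_vocab):
        """Acquisition module: merge the newly added vocabulary library into a new one."""
        old = set(self.old_vocab)
        self.new_vocab = self.old_vocab + [w for w in added_vocab if w not in old]
        return self.new_vocab

    def initialize(self):
        """Initialization module: old words keep their old word vectors, new words get random ones."""
        self.vectors = {
            w: self.old_vectors[w].copy() if w in self.old_vectors
            else (np.random.rand(self.m) - 0.5) / self.m
            for w in self.new_vocab
        }
        return self.vectors

    def update(self, windows, update_fn):
        """Update module: apply the supplied per-window gradient step (e.g. the CBOW step
        sketched earlier) over the corpus, using the first and second Huffman trees."""
        for window in windows:
            update_fn(window, self.vectors)
        return self.vectors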
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for word vector training, comprising:
acquiring a newly added vocabulary library, wherein the vocabulary in the newly added vocabulary library and the vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to an old word vector;
initializing the vocabulary in the new vocabulary library, so that the word vectors of the words in the new vocabulary library that belong to the old vocabulary library are the old word vectors, and the word vectors of the words in the new vocabulary library that belong to the newly added vocabulary library are random word vectors;
updating word vectors of words in the new vocabulary library respectively according to a first Huffman tree corresponding to the new vocabulary library and a second Huffman tree corresponding to the old vocabulary library;
wherein, the updating the word vectors of the words in the new vocabulary library according to the first Huffman tree corresponding to the new vocabulary library and the second Huffman tree corresponding to the old vocabulary library respectively comprises:
acquiring a preset target function corresponding to a first vocabulary, wherein the first vocabulary is a vocabulary in the new vocabulary library;
performing gradient processing on the preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain a word vector corresponding to the first vocabulary;
the obtaining of the preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, factorizing the first vocabulary according to an original objective function of a Skip-gram model to obtain a preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model;
or, the obtaining of the preset objective function corresponding to the first vocabulary includes:
if the first vocabulary belongs to the old vocabulary library, performing factorization on the first vocabulary according to an original target function of a CBOW model to obtain a preset target function corresponding to the first vocabulary;
and if the first vocabulary belongs to the newly added vocabulary library, the preset target function corresponding to the first vocabulary is the original target function of the CBOW model.
2. The method of claim 1, wherein the obtaining the predetermined objective function corresponding to the first vocabulary comprises:
if the first vocabulary belongs to the old vocabulary library, the first vocabulary is factorized according to the matched part and the differing part of its Huffman codes on the second Huffman tree and the first Huffman tree, so as to obtain the preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model;
wherein w represents the first vocabulary, W represents the old vocabulary library, ΔW represents the newly added vocabulary library, C(w) represents the vocabulary library composed of the words in the context of w, u represents a word in the context of w, l* represents the length of the matched Huffman codes on the second Huffman tree and the first Huffman tree when w is a non-leaf node, j indicates that the first vocabulary is the j-th node on the second Huffman tree, θ'^{u}_{j-1} represents the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d^{u}_{j} represents the Huffman code of the j-th node on the second Huffman path corresponding to u, σ represents the activation function, v(w) represents the word vector corresponding to w, and L'_u indicates the coding length of the vocabulary u on the first Huffman tree.
3. The method of claim 1, wherein the obtaining the predetermined objective function corresponding to the first vocabulary comprises:
if the first vocabulary belongs to the old vocabulary library, the first vocabulary is factorized according to the matched part and the differing part of its Huffman codes on the second Huffman tree and the first Huffman tree, so as to obtain the preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the CBOW model;
wherein d^{w}_{i} represents the Huffman code of the i-th node on the second Huffman path corresponding to w, and X_w represents the sum of the word vectors corresponding to all the words in C(w);
w represents the first vocabulary, W represents the old vocabulary library, ΔW represents the newly added vocabulary library, C(w) represents the vocabulary library composed of the words in the context of w, l* represents the length of the matched Huffman codes on the second Huffman tree and the first Huffman tree when w is a non-leaf node, i indicates that the first vocabulary is the i-th node on the second Huffman tree, θ'^{w}_{i-1} represents the word vector of the (i-1)-th node on the first Huffman path corresponding to w, L'_w represents the coding length of the first vocabulary w on the first Huffman tree, and σ represents the activation function.
4. The method according to claim 2, wherein performing gradient processing on the preset objective function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain the word vector corresponding to the first vocabulary comprises:
if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, performing stochastic gradient ascent processing on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and performing stochastic gradient descent processing on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree;
if the first vocabulary belongs to the newly added vocabulary library, performing stochastic gradient ascent processing on the first vocabulary to obtain the word vector corresponding to the first vocabulary;
here, η' represents the learning rate.
5. The method according to claim 3, wherein performing gradient processing on the preset objective function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain the word vector corresponding to the first vocabulary comprises:
if the first vocabulary belongs to the old vocabulary library and the coding of the first vocabulary in the first Huffman tree and its coding in the second Huffman tree have the same prefix part, performing stochastic gradient ascent processing on the vectors of the nodes corresponding to the differing part of the Huffman coding of the first vocabulary on the second Huffman tree, and performing stochastic gradient descent processing on the vectors of the nodes on the second Huffman tree corresponding to the differing part of the Huffman coding of the first vocabulary on the first Huffman tree;
if the first vocabulary belongs to the newly added vocabulary library, performing stochastic gradient ascent processing on the first vocabulary to obtain the word vector corresponding to the first vocabulary;
wherein η' represents the learning rate, X_w represents the word vector corresponding to the first vocabulary w, and X_w^T represents the transpose of X_w.
6. A word vector training apparatus, comprising:
the acquisition module is used for acquiring a newly added vocabulary library, wherein the vocabulary in the newly added vocabulary library and the vocabulary in an old vocabulary library form a new vocabulary library, and the vocabulary in the old vocabulary library corresponds to an old word vector;
the initialization module is used for initializing the vocabulary in the new vocabulary library, so that the word vectors of the words in the new vocabulary library that belong to the old vocabulary library are the old word vectors, and the word vectors of the words in the new vocabulary library that belong to the newly added vocabulary library are random word vectors;
the updating module is used for respectively updating word vectors of words in the new word bank according to a first Huffman tree corresponding to the new word bank and a second Huffman tree corresponding to the old word bank;
the updating module is specifically configured to obtain a preset objective function corresponding to a first vocabulary, where the first vocabulary is a vocabulary in the new vocabulary library; performing gradient processing on the preset target function according to the attribute of the first vocabulary in the first Huffman tree and the attribute of the first vocabulary in the second Huffman tree to obtain a word vector corresponding to the first vocabulary;
wherein the update module is specifically configured to:
if the first vocabulary belongs to the old vocabulary library, factorizing the first vocabulary according to an original objective function of a Skip-gram model to obtain a preset objective function corresponding to the first vocabulary;
if the first vocabulary belongs to the newly added vocabulary library, the preset objective function corresponding to the first vocabulary is the original objective function of the Skip-gram model;
or, the update module is specifically configured to:
if the first vocabulary belongs to the old vocabulary library, performing factorization on the first vocabulary according to an original target function of a CBOW model to obtain a preset target function corresponding to the first vocabulary;
and if the first vocabulary belongs to the newly added vocabulary library, the preset target function corresponding to the first vocabulary is the original target function of the CBOW model.
CN201710022458.0A 2017-01-12 2017-01-12 Word vector training method and device Active CN106897265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710022458.0A CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710022458.0A CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Publications (2)

Publication Number Publication Date
CN106897265A CN106897265A (en) 2017-06-27
CN106897265B true CN106897265B (en) 2020-07-10

Family

ID=59198669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710022458.0A Active CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Country Status (1)

Country Link
CN (1) CN106897265B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107957989B9 (en) 2017-10-23 2021-01-12 创新先进技术有限公司 Cluster-based word vector processing method, device and equipment
CN108170663A (en) 2017-11-14 2018-06-15 阿里巴巴集团控股有限公司 Term vector processing method, device and equipment based on cluster
CN110020303A (en) * 2017-11-24 2019-07-16 腾讯科技(深圳)有限公司 Determine the alternative method, apparatus and storage medium for showing content
CN108509422B (en) * 2018-04-04 2020-01-24 广州荔支网络技术有限公司 Incremental learning method and device for word vectors and electronic equipment
CN110210557B (en) * 2019-05-31 2024-01-12 南京工程学院 Online incremental clustering method for unknown text in real-time stream processing mode
CN111325026B (en) * 2020-02-18 2023-10-10 北京声智科技有限公司 Training method and system for word vector model
US11822447B2 (en) 2020-10-06 2023-11-21 Direct Cursus Technology L.L.C Methods and servers for storing data associated with users and digital items of a recommendation system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930318A (en) * 2016-04-11 2016-09-07 深圳大学 Word vector training method and system
CN106055623A (en) * 2016-05-26 2016-10-26 《中国学术期刊(光盘版)》电子杂志社有限公司 Cross-language recommendation method and system

Also Published As

Publication number Publication date
CN106897265A (en) 2017-06-27

Similar Documents

Publication Publication Date Title
CN106897265B (en) Word vector training method and device
CN110263323B (en) Keyword extraction method and system based on barrier type long-time memory neural network
US10032463B1 (en) Speech processing with learned representation of user interaction history
Lhoussain et al. Adaptating the levenshtein distance to contextual spelling correction
JP2019511033A5 (en)
CN109785824A (en) A kind of training method and device of voiced translation model
US20230244704A1 (en) Sequenced data processing method and device, and text processing method and device
CN107437417B (en) Voice data enhancement method and device based on recurrent neural network voice recognition
CN106802888B (en) Word vector training method and device
JP2017059205A (en) Subject estimation system, subject estimation method, and program
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
US10878201B1 (en) Apparatus and method for an adaptive neural machine translation system
CN110506282A (en) The more new management of RPU array
CN108509422B (en) Incremental learning method and device for word vectors and electronic equipment
CN111950275B (en) Emotion recognition method and device based on recurrent neural network and storage medium
CN110825857A (en) Multi-turn question and answer identification method and device, computer equipment and storage medium
CN110275928B (en) Iterative entity relation extraction method
CN111400494A (en) Sentiment analysis method based on GCN-Attention
CN109979461A (en) A kind of voice translation method and device
CN110069781B (en) Entity label identification method and related equipment
CN110968702A (en) Method and device for extracting matter relationship
CN115221977A (en) Text similarity calculation model training method, calculation method and related device
CN111126047B (en) Method and device for generating synonymous text
CN111241843B (en) Semantic relation inference system and method based on composite neural network
JP7359028B2 (en) Learning devices, learning methods, and learning programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant