CN110348001A - Word vector training method and server - Google Patents
Word vector training method and server
- Publication number
- CN110348001A (application CN201810299633.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- context
- word vector
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F40/211 — Handling natural language data; Natural language analysis; Parsing; Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/289 — Handling natural language data; Natural language analysis; Recognition of textual entities; Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30 — Handling natural language data; Semantic analysis
Abstract
The embodiments of the invention disclose a word vector training method and a server for integrating direction information into word vectors, so as to meet the requirements of semantic and syntactic tasks in natural language processing. The word vector training method provided by the embodiments of the invention comprises: obtaining a corresponding input word vector according to a word in a training sample text; obtaining a corresponding original output word vector according to the context words corresponding to the word in the training sample text; generating a target output word vector according to the original output word vector, the target output word vector carrying direction information indicating the position direction of the context word relative to the word; and training a word vector learning model using the input word vector and the target output word vector.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a word vector training method and a server.
Background
The SG (Skip-Gram) model is currently a widely used word vector learning model and is deployed extensively in industrial settings. Trained on a large-scale corpus, the SG model can produce high-quality word vectors, and when combined with the negative sampling technique, word vectors can be computed efficiently and quickly, so that computational efficiency and result quality are ensured at the same time.
In the prior art, the SG model is built by establishing relationships between a word and the other words around it. Specifically, in a given corpus, for a segment of a word sequence, the SG model learns the relationship of every pair of words, i.e., it predicts the probability of outputting the other words given one word as input. The vector of each word is updated by optimizing these probability values.
Although current SG models can train word vectors effectively, the prior art still has corresponding disadvantages. For example, the SG model treats every word in the context window of a target word equally, so the context structure information around the target word cannot be reflected in its vector: all words surrounding a word are given equal importance. As a result, the word vectors learned by the SG model cannot embody context structure information and are insensitive to the position information of the target word, and therefore cannot be applied effectively to semantic and syntactic tasks in natural language processing.
Disclosure of Invention
The embodiment of the invention provides a word vector training method and a server, which are used for integrating direction information into word vectors and can meet the requirements of semantic and syntactic tasks of natural language processing.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides a word vector training method, including:
acquiring corresponding input word vectors according to words in the training sample text;
obtaining corresponding original output word vectors according to the context words corresponding to the words in the training sample text;
generating a target output word vector according to the original output word vector, wherein the target output word vector carries direction information for indicating the position direction of the context word relative to the word;
training a word vector learning model using the input word vectors and the target output word vectors.
In a second aspect, an embodiment of the present invention further provides a server, including:
the input word vector acquisition module is used for acquiring corresponding input word vectors according to words in the training sample text;
an output word vector obtaining module, configured to obtain a corresponding original output word vector according to a context word corresponding to the word in the training sample text;
an output word vector reconfiguration module, configured to generate a target output word vector according to the original output word vector, where the target output word vector carries direction information used to indicate a position direction of the context word relative to the word;
and the model training module is used for training a word vector learning model by using the input word vector and the target output word vector.
In the second aspect, the constituent modules of the server may further perform the steps described in the foregoing first aspect and various possible implementations, for details, see the foregoing description of the first aspect and various possible implementations.
In a third aspect, an embodiment of the present invention provides a server, where the server includes: a processor, a memory; the memory is used for storing instructions; the processor is configured to execute the instructions in the memory to cause the server to perform the method of any of the preceding first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
In a fifth aspect, embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
According to the technical scheme, the embodiment of the invention has the following advantages:
in the embodiment of the invention, a corresponding input word vector is first obtained according to a word in a training sample text, and a corresponding original output word vector is obtained according to the context words of that word in the training sample text. A target output word vector is then generated from the original output word vector, where the target output word vector carries direction information indicating the position direction of the context word relative to the word, and a word vector learning model is trained using the input word vector and the target output word vector. Because the embodiment of the invention models the context of the input word in different position directions separately and integrates the structural information of the context words into word vector learning, the word vectors learned by the word vector model can embody the structural information of the context, and the word vectors obtained through the word vector learning model provided by the embodiment of the invention are suitable for various natural language processing tasks, especially semantic and syntax related tasks.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart diagram of a word vector training method according to an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario of the word vector training method according to the embodiment of the present invention;
fig. 3 is a schematic diagram of an SG model as a word vector learning model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of joint optimization provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an SSG model as a word vector learning model according to an embodiment of the present invention;
fig. 6-a is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 6-b is a schematic diagram illustrating a structure of an output word vector reconfiguration module according to an embodiment of the present invention;
FIG. 6-c is a schematic diagram of a structure of a model training module according to an embodiment of the present invention;
FIG. 6-d is a schematic diagram illustrating a structure of another output word vector reconfiguration module according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a composition structure of a server to which the word vector training method according to the embodiment of the present invention is applied.
Detailed Description
The embodiment of the invention provides a word vector training method and a server, which are used for integrating direction information into word vectors and can meet the requirements of semantic and syntactic tasks of natural language processing.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one skilled in the art from the embodiments given herein are intended to be within the scope of the invention.
The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The following are detailed below.
The word vector training method provided by the embodiment of the invention trains a word vector learning model using the direction information of the context. The word vector learning model may be an SG (Skip-Gram) model with context direction awareness; for convenience of description, the direction-aware SG model adopted in the embodiment of the invention is called the DSG (Directional Skip-Gram) model, and the DSG model provided by the embodiment of the invention helps to learn word vectors. The DSG model takes the view that the order information of words is a very important indicating signal in any language: for every input-output word pair, direction information is introduced into the output word vector to indicate whether the target word lies to the left or to the right (i.e., above or below) of the input word, which strengthens the guiding effect of the target word on the input word and yields better word vectors. In the embodiment of the invention, the structural information of the text is integrated into word vector learning by modeling the above text and the below text of the target word separately. Therefore, the word vectors learned by the DSG model can embody the structural information of the context; the direction information of the context enhances the semantic expression capability of the word vectors and adds syntactic capability at the same time, so the word vectors obtained in the embodiment of the invention are suitable for semantic and syntactic tasks in natural language processing.
The word vector training method provided in the embodiment of the present invention may be applied to a word vector learning scenario, and the method may be applied to a server, where the server may include a processor and a memory, where an input word vector and a target output word vector are stored by a storage device in the server, and the target output word vector carries direction information indicating a position direction of a context word with respect to a word. For example, the input word vector and the target output word vector are stored in the memory of the server, and the processor may read a program from the memory to execute the word vector training method provided by the embodiment of the present invention.
Referring to fig. 1, a word vector training method according to an embodiment of the present invention includes the following steps:
101. Acquire a corresponding input word vector according to a word in the training sample text.
In the embodiment of the present invention, a corpus stores training sample texts, and a training sample text may include a segment of vocabulary in which each item is a word, and each word has corresponding context words. For example, if the training sample text contains the continuous segment of vocabulary "A B C", then for word B, word A and word C constitute the context words of B. Words and their context words are first obtained from the training sample text, and a corresponding input word vector is obtained according to the word in the training sample text; the input word vector corresponds to the word and can be fed into the word vector learning model. The input vector is updated continuously during model training, and new words are continuously read from the corpus and written into the input vector.
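The following is a minimal illustrative sketch (not part of the claimed method; the function name, window size, and toy text are assumptions) of how (word, context word) pairs could be enumerated from a tokenized training sample text with a symmetric context window:

```python
# Illustrative sketch: enumerate (input word, context word) pairs from a tokenized
# training sample text with a symmetric context window. Names and the toy text
# are assumptions made for this example only.
def iter_word_context_pairs(tokens, window=2):
    for t, word in enumerate(tokens):
        lo = max(0, t - window)
        hi = min(len(tokens), t + window + 1)
        for i in range(lo, hi):
            if i != t:
                yield word, tokens[i]


if __name__ == "__main__":
    sample = ["A", "B", "C"]  # the "A B C" example from the description
    for word, ctx in iter_word_context_pairs(sample, window=1):
        print(word, "->", ctx)  # prints A -> B, B -> A, B -> C, C -> B
```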
102. Acquire a corresponding original output word vector according to the context words corresponding to the word in the training sample text.
In the embodiment of the invention, after a word and its context words are obtained from the training sample text, the original output word vector corresponding to the context word can be obtained; the original output word vector corresponds to the context words of the word and serves as the reference value of the prediction output of the word vector learning model. As the input vector is updated continuously during model training, new context words corresponding to the word are continuously read from the corpus and written into the original output word vector.
It should be noted that, in the embodiment of the present invention, the output word vector corresponding to the context word is described as the "original output word vector". After the input word vector and the output word vector are obtained, the word vector learning model cannot be trained directly; the original output word vector must first be reconfigured so that the output word vector carries the position direction of the context word relative to the word.
103. Generate a target output word vector according to the original output word vector, wherein the target output word vector carries direction information indicating the position direction of the context word relative to the word.
In the embodiment of the present invention, after the original output word vector is obtained, a target output word vector is generated from it. The target output word vector carries direction information indicating the position direction of the context word relative to the word; that is, the original output word vector is reconfigured so that it carries the position direction of the context word relative to the word. For convenience of distinction, the output word vector obtained after the original output word vector is reconfigured is referred to as the "target output word vector".
In the embodiment of the invention, the target output word vector carries direction information indicating the position direction of the context word relative to the word. The position direction indicates in which direction of the word the context word appears, and the direction information may be a one-dimensional array indicating that direction. For example, the position direction of the context word relative to the word may be that the context word appears above the word (i.e., in the left direction) or below the word (i.e., in the right direction); the direction information may take the value 1 if the context word appears above (to the left of) the word, and 0 if the context word appears below (to the right of) the word.
In some embodiments of the present invention, step 103 generating a target output word vector from the original output word vector comprises:
generating a direction vector according to the context word appearing above or below the word, wherein the direction vector is used for indicating that the context word appears above or below the word;
obtaining a target output word vector through the original output word vector and the direction vector, wherein the target output word vector comprises: the original output word vector and direction vector.
The order information of words is an important indicating signal in any language, and the context words of a word in the corpus indicate the order information associated with that word. The direction vector is used to indicate whether the context word appears above or below the word, and the target output word vector is obtained from the original output word vector and the direction vector. Introducing the direction vector to indicate whether the target word lies in the left or right direction of the input word strengthens the guiding effect of the target word on the input word and yields better word vectors.
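As a rough illustration of the data this variant implies (the array layout, dimension, vocabulary, and initialization below are assumptions, not the patent's prescription), each word can keep an input vector, an original output vector, and a direction vector, and the target output word vector of a context word is then the pair formed by its original output vector and its direction vector:

```python
import numpy as np

# Illustrative parameter layout for the explicit-direction variant. Every word has
# an input vector v, an original output vector v_out, and a direction vector delta;
# the target output word vector of a context word is the pair (v_out, delta).
rng = np.random.default_rng(0)
vocab = {"A": 0, "B": 1, "C": 2}   # toy vocabulary (assumption)
dim = 50                           # word vector dimension (assumption)

v = (rng.random((len(vocab), dim)) - 0.5) / dim   # input word vectors
v_out = np.zeros((len(vocab), dim))               # original output word vectors
delta = np.zeros((len(vocab), dim))               # direction vectors


def target_output_word_vector(context_word):
    """Return the reconfigured target for a context word: its original output
    vector together with its direction vector."""
    idx = vocab[context_word]
    return v_out[idx], delta[idx]
```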
In some embodiments of the present invention, step 103 generating a target output word vector from the original output word vector comprises:
acquiring an above output word vector from the original output word vector according to the context word appearing above the word;
acquiring a below output word vector from the original output word vector according to the context word appearing below the word;
obtaining a target output word vector from the above output word vector and the below output word vector, wherein the target output word vector comprises: the above output word vector and the below output word vector.
In the embodiment of the present invention, unlike the implementation in the foregoing embodiment in which the target output word vector carries a direction vector, the direction information may also be carried implicitly: two groups of output word vectors are designed, used respectively to express a word when it appears above or below any input word. Each word then has three vectors: one input word vector, one above output word vector, and one below output word vector. When computing word vectors, for any input word, the words above it use their above output word vectors, and the words below it use their below output word vectors, together with the input word vector of the input word, to compute the log-probability likelihood estimate. This implementation also actively distinguishes the above and below context of a word during word vector learning, because at any time a word's output vector can be updated only in its role as above context or as below context, so each of its two output vectors is updated with only half the probability.
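A minimal sketch of this implicit variant, under the assumption of dense numpy matrices and a simple dot-product score (names and shapes are illustrative, not the patent's specification):

```python
import numpy as np

# Illustrative sketch of the implicit variant: each word keeps one input vector and
# two output vectors, one used when the word occurs above (to the left of) the input
# word and one used when it occurs below (to the right of) it.
rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100   # assumed sizes

v_in = (rng.random((vocab_size, dim)) - 0.5) / dim   # input word vectors
v_out_above = np.zeros((vocab_size, dim))            # output vectors for above context
v_out_below = np.zeros((vocab_size, dim))            # output vectors for below context


def pair_score(word_id, context_id, i):
    """Dot product between the input vector of the word and the position-appropriate
    output vector of the context word; i < 0 means the context word is above the
    input word, i > 0 means it is below."""
    out = v_out_above if i < 0 else v_out_below
    return float(v_in[word_id] @ out[context_id])
```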
104. Train the word vector learning model using the input word vector and the target output word vector.
In the embodiment of the present invention, after the input word vector and the target output word vector are obtained, the word vector learning model may be trained using them. The word vector learning model provided in the embodiment of the present invention may be an SG model with context direction awareness, referred to as the DSG model for short. Because the target output word vector carries direction information indicating the position direction of the context word relative to the word, training the word vector learning model integrates the structural information of the context words into word vector learning, so the word vectors learned by the model can embody the structural information of the context, and the word vectors obtained through the word vector learning model provided in the embodiment of the present invention are applicable to semantic and syntactic tasks in natural language processing. In the embodiment of the invention, the direction information is used to extend the existing word vector learning model, so various model variants can be derived for different usage scenarios, adapting to different tasks and producing higher-quality word vectors.
In some embodiments of the invention, when the target output word vector comprises the original output word vector and the direction vector, step 104 of training the word vector learning model using the input word vector and the target output word vector includes:
obtaining an interactive function calculation result according to the input word vector and the direction vector, and performing iterative updating on the input word vector and the direction vector according to the interactive function calculation result;
obtaining a conditional probability calculation result according to the input word vector and the original output word vector, and performing iterative updating on the input word vector and the original output word vector according to the conditional probability calculation result;
and estimating the optimal target of the word vector learning model according to the interactive function calculation result and the conditional probability calculation result.
In the embodiment of the present invention, the interaction function is computed from the input word vector and the direction vector, giving the interaction function calculation result; for example, the interaction relationship between the input word vector and the direction vector may be computed with a softmax function, so that the direction information is integrated into the final word vector. The values of the input word vector and the direction vector are updated synchronously according to the interaction function calculation result so that the result matches the expected outcome; for example, when the interaction relationship is computed with a softmax function, the value of the interaction function calculation result should tend to 1 when the context word is on the left side of the word and tend to 0 when the context word is on the right side of the word. In the embodiment of the invention, besides computing the interaction function between the input word vector and the direction vector, the conditional probability between the word and the context word is computed synchronously: a conditional probability calculation result is obtained from the input word vector and the original output word vector, for example by computing the conditional probability between words through the SG model, thereby modeling the semantic relationship between words. After the interaction function calculation result and the conditional probability calculation result are obtained through the above steps, joint optimization can be performed on them, i.e., the optimal target of the word vector learning model can be estimated, so that the optimization target of each word is updated iteratively through the interaction function calculation result and the conditional probability calculation result. After training of the word vector learning model is completed, the model yields high-quality word vectors for the input words.
Optionally, in some embodiments of the present invention, taking the softmax function as the interaction function as an example, obtaining the interaction function calculation result according to the input word vector and the direction vector includes:
the interaction function between the input word vector and the direction vector is calculated as follows:

g(ω_{t+i}, ω_t) = exp(v_{ω_t}^T · δ_{ω_{t+i}}) / Σ_{ω∈V} exp(v_{ω_t}^T · δ_ω)   (formula one)

where g(ω_{t+i}, ω_t) denotes the interaction function calculation result, δ_{ω_{t+i}} denotes the direction vector when the context word is ω_{t+i}, v_{ω_t} denotes the input vector when the word is ω_t, and V denotes the set of all words in the corpus. In the above formula, exp denotes the exponential function and T denotes transposition.
Optionally, in some embodiments of the present invention, iteratively updating the input word vector and the direction vector according to the interactive function calculation result includes:
the input word vector and the direction vector are iteratively updated in such a way that, among other things,
wherein,represents the updated word as ωtThe input vector of the time of day,represents the input vector before update, gamma represents the learning rate, deltaωt+iMeaning that the context word is ωt+iDirection vector of time, vωtRepresenting the word as omegatInput vector of time, σ (v)ωt Tδωt+i) Indicating the position direction predicted value of the context word relative to the word, D indicating the position direction mark value of the context word relative to the word,represents an updated context word of ωt+iThe direction vector of the time of flight,representing the context word before update as ωt+iThe direction vector of time.
In the above formula, a superscript (new) is used to indicate a vector after update, a superscript (old) is used to indicate a vector before update, γ is a learning rate, and the learning rate is a numerical variable that decreases as the training process progresses in the training of word vectors, for example, the learning rate may be defined as a ratio of an untrained text size to a total text size.
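The updates of formulas two and three can be written compactly as the following sketch; the sigmoid function, learning-rate handling, and function signature are assumptions consistent with the description above rather than a verbatim reproduction of the patented implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_direction_pair(v_w, delta_c, D, lr):
    """One update step for formulas two and three. v_w is the input vector of the
    word, delta_c is the direction vector of the context word, D is the
    position-direction marker value (1 if the context word is above the word,
    0 if it is below), and lr is the learning rate gamma. Returns the updated
    (v_w, delta_c)."""
    pred = sigmoid(v_w @ delta_c)            # predicted position direction
    err = pred - D                           # prediction error
    v_w_new = v_w - lr * err * delta_c       # formula two (input vector update)
    delta_c_new = delta_c - lr * err * v_w   # formula three (direction vector update)
    return v_w_new, delta_c_new
```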
Optionally, the position direction marker value D satisfies the following condition:

D = 1 when i < 0; D = 0 when i > 0   (formula four)

that is, when i < 0 the position direction of the context word relative to the word is the above, and when i > 0 the position direction of the context word relative to the word is the below.
For example, D is the marker information for whether the context word lies in the left or right direction of the input word and, as mentioned above, takes two values: i < 0 corresponds to vocabulary in the above text and i > 0 corresponds to vocabulary in the below text. In each training sample, the value of D is a marker obtained automatically from the position of the word during training.
Optionally, in some embodiments of the present invention, the optimal target of the word vector learning model is estimated according to the interactive function calculation result and the conditional probability calculation result:
the global log maximum likelihood estimate f (ω) is calculated as followst+i,ωt) Wherein
f(ωt+i,ωt)=p(ωt+i|ωt)+g(ωt+i,ωt) (formula five)
Wherein g (ω)t+i,ωt) Representing the result of the calculation of the interaction function, p (ω)t+i|ωt) Indicating conditionsAnd (5) calculating a result of the probability.
The joint log-likelihood estimate L_SG of the probabilities from the word to its context words is calculated as follows:

L_SG = Σ_{ω_t ∈ V} Σ_{−c ≤ i ≤ c, i ≠ 0} log f(ω_{t+i}, ω_t)   (formula six)

where V denotes the set of all words in the corpus, ω_{t+i} is the context word, ω_t is the word, and c denotes the context window size.
For example, the global log maximum likelihood estimate can be optimized through formula five, so the optimal-target estimation of the word vector learning model in the embodiment of the present invention can be converted into a joint optimization problem over two related functions, thereby realizing the optimal-target estimation of the word vector learning model.
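To make the joint optimization concrete, the sketch below combines a negative-sampling update of the conditional probability p (the description elsewhere pairs the SG model with negative sampling) with the direction update for g in a single step for one (input word, context word) pair; the exact combination, function names, and array layout are assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_pair_step(v_in, v_out, delta, w, c, i, neg_ids, lr):
    """Illustrative joint-optimization step for the pair (input word w, context
    word c) at relative position i. v_in, v_out, delta are (|V|, d) arrays of
    input vectors, output vectors, and direction vectors; neg_ids are sampled
    negative word ids; lr is the learning rate. Updates the arrays in place."""
    grad_in = np.zeros_like(v_in[w])

    # Conditional probability part, approximated with negative sampling:
    # label 1 for the true context word, label 0 for sampled negatives.
    for word_id, label in [(c, 1.0)] + [(n, 0.0) for n in neg_ids]:
        err = sigmoid(v_in[w] @ v_out[word_id]) - label
        grad_in += err * v_out[word_id]
        v_out[word_id] -= lr * err * v_in[w]

    # Direction part: formulas two to four.
    D = 1.0 if i < 0 else 0.0
    err_dir = sigmoid(v_in[w] @ delta[c]) - D
    grad_in += err_dir * delta[c]
    delta[c] -= lr * err_dir * v_in[w]

    # Finally update the input vector of w with both contributions.
    v_in[w] -= lr * grad_in
```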
As can be seen from the description of the embodiments above, a corresponding input word vector is first obtained according to a word in a training sample text, and a corresponding original output word vector is obtained according to the context words of that word in the training sample text. A target output word vector is then generated from the original output word vector, where the target output word vector carries direction information indicating the position direction of the context word relative to the word, and the word vector learning model is trained using the input word vector and the target output word vector. Because the embodiment of the invention models the context of the input word in different position directions separately, the structural information of the context words is merged into word vector learning, so the word vectors learned by the word vector model can embody the structural information of the context, and the word vectors obtained through the word vector learning model provided by the embodiment of the invention are suitable for various natural language processing tasks, especially tasks related to semantics and syntax.
In order to better understand and implement the above-mentioned schemes of the embodiments of the present invention, the following description specifically illustrates corresponding application scenarios.
The word vector learning model used in the embodiment of the present invention may be an improved SG model (hereinafter referred to as the DSG model). The SG model learns word vectors by establishing relationships between a word and the other words around it: in a given corpus, for a segment of a word sequence, the SG model learns the relationship of every pair of words, i.e., it predicts the probability of outputting the other words given one word as input, and the vector of each word is updated by optimizing these probability values. The method provided by the invention enhances the semantic capability of the SG model and adds syntactic capability at the same time.
The word vector training method provided by the embodiment of the invention is used as a basic algorithm and can be used in all natural language related application scenes and processing technologies and products required by the application scenes. The usage mode is generally to generate or update word vectors by using the word vector learning model provided by the invention, and deliver the generated vectors to be applied to subsequent natural language processing tasks. For example, the generated word vector can be used in a word segmentation and part-of-speech tagging system to improve the accuracy of word segmentation and part-of-speech tagging, thereby improving the subsequent processing capability. As another example, in a search and related scenarios, the obtained search results often need to be sorted, and the sorted results often need to calculate semantic similarity of each result to a search query statement (query). The similarity measurement can be achieved through similarity calculation of word vectors, and therefore the quality of the vectors greatly determines the effect of the semantic similarity calculation method. In addition to the above tasks, since the word vectors trained by the embodiments of the present invention effectively combine and distinguish context information of different words, it is possible to have better performance especially for tasks of semantic and syntactic types (e.g., part-of-speech tagging, chunking analysis, structural syntactic analysis, dependency syntactic analysis, etc.).
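As an illustration of the search-ranking use described above (the averaging of word vectors into a query or result representation is a common simplification and an assumption here, not something the patent prescribes):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def text_vector(words, word_vectors):
    """Average the vectors of the words present in the table (a simple, common choice)."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def rank_results(query_words, candidate_results, word_vectors):
    """Sort candidate results (each a list of words) by cosine similarity of their
    averaged word vectors to the averaged query vector, highest first."""
    q = text_vector(query_words, word_vectors)
    return sorted(candidate_results,
                  key=lambda r: cosine(q, text_vector(r, word_vectors)),
                  reverse=True)
```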
Fig. 2 is a schematic view of an application scenario of the word vector training method according to the embodiment of the present invention. Human languages have a linear characteristic: the words of any language are expressed in a certain order, so word collocations form relatively fixed front-to-back ordering relationships; for example, in a sentence, one word may often appear to the left of another word, especially in a language such as Chinese that relies more heavily on word order than on syntax. Based on this analysis, the embodiment of the present invention models the above (left text) and below (right text) relations of the input word separately to reflect the word order relationship formed by the context of a word. On the basis of the SG model, the embodiment of the invention introduces an additional direction vector δ for each word, used to express and compute whether, as a context word, the word appears on the left side or the right side of an input word.
For this purpose, a softmax function g is defined, and the interaction between the direction vector of the context word and the word vector of the current input word is calculated as in formula one, so as to integrate the direction information into the final word vector.
Specifically, the interaction function computes, for the input word w_t, the relative position of the context word w_{t+i}, and the values of δ and v are updated synchronously according to the result of formula one so that the value of g tends to 1 when w_{t+i} is on the left side of w_t and tends to 0 when w_{t+i} is on the right side of w_t. To achieve this effect, δ and v can be updated as in formula two and formula three of the foregoing embodiments, where the superscript (new) denotes the vector after the update and (old) denotes the vector before the update; the learning rate is a numerical variable that decreases continuously as word vector training proceeds and is usually defined as the ratio of the untrained text size to the total text size. D is the marker information for the direction of the context word relative to the input word and, as in formula four of the foregoing embodiment, takes two values: i < 0 corresponds to the above vocabulary and i > 0 corresponds to the below vocabulary. In each training sample, the value of D distinguishes the above text from the below text and is a marker obtained automatically from the position of the word during training.
The g function defined by the above formula can be regarded as an effective means for modeling the structural information of the context, and besides the g function, in the embodiment of the present invention, an SG model is used to calculate the conditional probability between words for modeling the semantic relationship between words.
Fig. 3 is a schematic diagram of the SG model as the word vector learning model provided in the embodiment of the present invention. Here w_0 is the current word, and w_{-2}, w_{-1}, w_1 and w_2 are the context words of w_0. The SG model uses w_0 as input and maximizes the probability from w_0 to the other words, so the optimization goal of the SG model over the entire corpus is to maximize, for each word w_t, the joint log-likelihood estimate of the probability to its context, which may be estimated, for example, by the aforementioned formula six.
For convenience of explanation of the subsequent methods, formula six expresses the relation between w_t and w_{t+i} through the f function. In the SG model, f(w_{t+i}, w_t) is defined as a softmax function over word vectors, for example as shown in formula seven below:

f(w_{t+i}, w_t) = p(w_{t+i} | w_t) = exp(v'_{w_{t+i}}^T · v_{w_t}) / Σ_{w∈V} exp(v'_w^T · v_{w_t})   (formula seven)

where v_{w_t} denotes the input vector of w_t, v'_{w_{t+i}} denotes the output vector of w_{t+i}, and so on. Each word in the SG model has two vectors, one used when the word is the input word (denoted v) and the other used when it is the predicted output context word (denoted v'). The SG model therefore increases the value of the joint likelihood estimate in formula six by computing formula seven and iteratively updating the vectors of each word over the entire corpus, and outputs the vectors of all words after a specified number of iterations.
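A dense-softmax version of formula seven, for illustration only; in practice the SG model approximates this with negative sampling or hierarchical softmax, as the description notes elsewhere (names and shapes are assumptions):

```python
import numpy as np

def sg_softmax_probability(v_in_w, v_out_all, context_id):
    """Formula seven computed as a dense softmax over the vocabulary: the probability
    of the context word given the input word. v_in_w is the input vector of w_t and
    v_out_all is the (|V|, d) matrix of output vectors v'."""
    scores = v_out_all @ v_in_w
    scores -= scores.max()        # for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return float(probs[context_id])
```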
Fig. 4 is a schematic diagram of joint optimization provided in the embodiment of the present invention. In the embodiment of the present invention, the optimization target of the DSG model is consistent with the function defined by formula six, and the global log maximum likelihood estimation is optimized, for example, by using formula five in the foregoing embodiment. Therefore, in the present invention, it can be considered as a joint optimization problem for two correlation functions, and the optimization target of each word can be expressed in the form shown in fig. 4, where the solid line arrow represents the prediction relationship and the dotted line arrow represents the vector update process of the input word.
In the foregoing implementation process, the method provided by the embodiment of the present invention has no special requirement on hardware, is consistent with a word vector learning model (e.g., SG model), can complete calculation by using a common processor, and can be implemented by using a single thread or multiple threads. The word vectors and direction vectors related to the present invention are stored in a Memory (RAM) during the calculation process, and are output to a disk or other carriers for storage after the calculation is completed. In the embodiment of the invention, the whole algorithm only needs to give one training corpus, and the vectors of the words contained in the corpus can be calculated according to parameters such as the size of a predefined window, the iteration times and the like.
The embodiment of the present invention further provides a Structured SG (Structured Skip-Gram, SSG) model. The SSG model considers not only the context words of the input word but also the influence of the positions of those context words on the input word, where the position of a context word refers to its relative position to the input word in the corpus, and the probability of the context word is predicted separately for each distinct position. The structure of the SSG model is similar to the SG model, as shown in FIG. 5, except that the SSG model estimates the probability of each context word at its corresponding position using different parameters: O_{-2}, O_{-1}, O_1 and O_2 in FIG. 5 indicate that different prediction relationships are used to predict the different words, as distinguished from FIG. 3, where a uniform O is used. Here O expresses the prediction relationship: the same O denotes the same prediction relationship, and O with different subscripts denotes different prediction relationships.
In the embodiment of the invention, the optimization target of the SSG model is consistent with that of the SG model: maximize the joint log-likelihood estimate over the whole corpus. The only difference is that the SSG model has multiple output vectors corresponding to the different positions, so f is defined as formula eight below:

f(w_{t+i}, w_t) = exp(v'_{r, w_{t+i}}^T · v_{w_t}) / Σ_{w∈V} exp(v'_{r, w}^T · v_{w_t})   (formula eight)

where r is the relative position, c is the context window size, and the remaining quantities have the same meaning as in the preceding formulas. The probability of a context word w_{t+i} given the input word must take into account its position relative to w_t, so the SSG model in effect defines a series of different "roles" (prototypes) for each context word to distinguish the effect a word has on the input word when it occurs at different positions. Compared with the SG model, distinguishing context words at different positions enables the SSG model to model the structural information of the context (here, information such as the arrangement and ordering of words) to a certain extent, so that it can learn richer inter-word relationships than the SG model.
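Formula eight differs from formula seven only in that the output vectors are indexed by the relative position r; a minimal sketch follows (the dict-of-matrices layout is an assumption):

```python
import numpy as np

def ssg_softmax_probability(v_in_w, v_out_by_position, context_id, r):
    """Formula eight: the SSG model keeps a separate output-vector matrix for each
    relative position r, so the probability of a context word is computed with the
    matrix belonging to its position. v_out_by_position maps r to a (|V|, d) matrix;
    the dense softmax is shown only for illustration."""
    v_out_all = v_out_by_position[r]
    scores = v_out_all @ v_in_w
    scores -= scores.max()        # for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return float(probs[context_id])
```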
As can be seen from the foregoing examples of the SG, DSG, and SSG models, although each method can train word vectors effectively, they differ in several respects. For example, the SG model does not distinguish between different types of contexts and treats every word within the context window of each target (input) word equally. As a result, the context structure information around the target word cannot be reflected in its vector, all words around a word carry equal importance for that word, and much collocation information (especially fixed forward or backward collocations) cannot be reflected during vector learning. In contrast, the SSG model solves the context-distinguishing problem of the SG model and ensures that the context word at each position plays a specific and unique role; however, this significantly increases the computational complexity, and for a corpus of the same size, training an SSG model may take several times as long as training an SG model.
Table 1 lists the temporal and spatial complexity of the SG and SSG models, where d represents the dimension of a word vector (e.g., 50, 100, 200, etc.) and S is the size of the corpus (total number of tokens) used to train the word vectors. Here "token" refers to a word occurrence and differs from the notion of a vocabulary word: a corpus may contain 10,000 tokens but only 100 distinct words (i.e., the vocabulary), and the token count refers to the 10,000 occurrences. V is the set of all words in the corpus, o is the time required for one vector update, and n is the number of negative samples; negative sampling is an algorithm that effectively reduces the computational complexity of word vector computation. As can be seen from Table 1 below, when the context window grows, the spatial and temporal complexity of the SSG model exceeds that of the SG model by a factor on the order of the window size c; taking the context window size of 5 words typically used in word vector computation as an example, training an SSG model requires roughly 5 times the spatial and temporal complexity of training an SG model.
TABLE 1 Space-time complexity analysis of the models

Model | Spatial complexity | Temporal complexity
SG | 2|V|d | 2cS(n+1)o
SSG | (2c+1)|V|d | 4c^2 S(n+1)o
DSG | 3|V|d | 2cS(n+2)o
Lower space-time complexity means easier implementation and lower hardware requirements for the processor. The space-time complexity of the DSG model relative to the SG and SSG models is shown in the third data row of Table 1. Compared with the SG and SSG models, the DSG model can take into account a certain degree of learning of the structural information of the context while, compared with the SG model, not increasing the computational complexity significantly. The SSG model has higher computational complexity than the DSG model and, owing to the sparsity of word occurrences at different positions, is difficult to extend to a larger context window in practical computation, whereas the DSG model is less affected by this data sparsity problem. Considering the expression characteristics of Chinese, which is more sensitive to word order than to syntax, the DSG model is more suitable for learning Chinese word vectors and is more favourable for semantic understanding in a Chinese environment and further processing of Chinese vocabulary.
In the word vector training method provided by the invention, the DSG model introduces a group of additional direction vectors used to express the position information of context words relative to the input word, so the method can learn the structural information of the context. Compared with the SG and SSG models, the DSG model requires 1.5 times the spatial complexity of the conventional SG model, while its temporal complexity is close to that of the conventional SG model and far lower than that of the SSG model. In particular, the spatial complexity is not affected by the size of the context window, and the temporal complexity is linearly proportional to the window size, whereas that of the SSG model grows with the square of the window size.
In the embodiment of the present application, since a group of additional direction vectors is introduced, this group of direction vectors can be output separately and used directly to compute the positional relationship between one word and another, for example by directly computing the cosine between a word vector and a direction vector, where in sim-cosine(v1, d2), v1 refers to the word vector of word 1 and d2 refers to the direction vector of word 2. Computing the similarity between the direction vector of one word and the word vector of another simplifies the expression of position information, distinguishing only the above text from the below text. Meanwhile, since the order information of the context is integrated into the word vector representation, the word vectors learned by the DSG model have a certain syntactic adaptability, i.e., the final word vectors implicitly contain text structure information, which can to a certain extent help semantic and syntactic tasks of natural language processing (such as part-of-speech tagging, chunk recognition, dependency syntactic analysis, and the like).
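The sim-cosine(v1, d2) computation mentioned above amounts to a cosine between one word's word vector and another word's direction vector; a minimal sketch (the function name mirrors the notation in the text, everything else is an assumption):

```python
import numpy as np

def sim_cosine(v1, d2):
    """Cosine similarity between the word vector v1 of word 1 and the direction
    vector d2 of word 2, expressing the simplified positional relationship
    (above text vs. below text) between the two words."""
    return float(v1 @ d2 / (np.linalg.norm(v1) * np.linalg.norm(d2) + 1e-12))
```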
Owing to the context-distinguishing capability of the invention, the word vectors learned by the method acquire a more accurate ability to distinguish word classes. This is because, due to the characteristics of language structure, words of certain categories tend to follow a certain degree of ordering; for example, adjectives tend to precede nouns, and adverbs before and after verbs differ in function (consistent with the syntactic adaptability mentioned above). Therefore, when similar words are computed using word vectors learned by the DSG model, it is easier to obtain words of the same type (compared with the SG model), and word vectors with this capability can be computed more efficiently than with complex models such as the SSG model.
Without limitation, in the word vector training method provided by the embodiment of the present invention, the negative sampling algorithm may be replaced with a hierarchical softmax algorithm to calculate the probability of, and predict, the target word. Compared with negative sampling, hierarchical softmax can obtain better results when the training data is small, but the required computation space increases markedly as the training data grows.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.
Referring to fig. 6-a, a server 600 according to an embodiment of the present invention may include: an input word vector obtaining module 601, an output word vector obtaining module 602, an output word vector reconfiguring module 603, and a model training module 604, wherein,
an input word vector obtaining module 601, configured to obtain a corresponding input word vector according to a word in a training sample text;
an output word vector obtaining module 602, configured to obtain a corresponding original output word vector according to a context word corresponding to the word in the training sample text;
an output word vector reconfiguration module 603, configured to generate a target output word vector according to the original output word vector, where the target output word vector carries direction information used to indicate a position direction of the context word relative to the word;
a model training module 604, configured to train the word vector learning model using the input word vector and the target output word vector.
In some embodiments of the present application, referring to fig. 6-b, the output word vector reconfiguration module 603 includes:
a direction vector generation module 6031 configured to generate a direction vector according to the context word appearing above or below the word, where the direction vector is used to indicate that the context word appears above or below the word;
a first target output word vector generating module 6032, configured to obtain the target output word vector by using the original output word vector and the direction vector, where the target output word vector includes: the original output word vector and the direction vector.
In some embodiments of the present application, referring to fig. 6-c, the model training module 604 comprises:
an interactive function calculation module 6041, configured to obtain an interactive function calculation result according to the input word vector and the direction vector, and perform iterative update on the input word vector and the direction vector according to the interactive function calculation result;
a conditional probability calculation module 6042, configured to obtain a conditional probability calculation result according to the input word vector and the original output word vector, and iteratively update the input word vector and the original output word vector according to the conditional probability calculation result;
and an object estimation module 6043, configured to estimate an optimal object of the word vector learning model according to the interaction function calculation result and the conditional probability calculation result.
Further, in some embodiments of the present application, the interaction function calculation module 6041 is specifically configured to calculate the interaction function between the input word vector and the direction vector as:

g(ω_{t+i}, ω_t) = exp(v_{ω_t}^T · δ_{ω_{t+i}}) / Σ_{ω∈V} exp(v_{ω_t}^T · δ_ω)   (formula one)

where g(ω_{t+i}, ω_t) denotes the interaction function calculation result, δ_{ω_{t+i}} denotes the direction vector when the context word is ω_{t+i}, v_{ω_t} denotes the input vector when the word is ω_t, and V denotes the set of all words in the corpus.
Further, in some embodiments of the present application, the interaction function calculation module 6041 is specifically configured to iteratively update the input word vector and the direction vector as:

v_{ω_t}^(new) = v_{ω_t}^(old) − γ (σ(v_{ω_t}^T δ_{ω_{t+i}}) − D) · δ_{ω_{t+i}}   (formula two)

δ_{ω_{t+i}}^(new) = δ_{ω_{t+i}}^(old) − γ (σ(v_{ω_t}^T δ_{ω_{t+i}}) − D) · v_{ω_t}   (formula three)

where v_{ω_t}^(new) denotes the updated input vector when the word is ω_t, v_{ω_t}^(old) denotes the input vector before the update, γ denotes the learning rate, δ_{ω_{t+i}} denotes the direction vector when the context word is ω_{t+i}, σ(v_{ω_t}^T δ_{ω_{t+i}}) denotes the predicted position direction of the context word relative to the word, D denotes the position-direction marker value of the context word relative to the word, δ_{ω_{t+i}}^(new) denotes the updated direction vector when the context word is ω_{t+i}, and δ_{ω_{t+i}}^(old) denotes the direction vector before the update.
In some embodiments of the present application, the position direction flag value D satisfies the following condition:
wherein, when i < 0, the position direction of the context word with respect to the word is above, and when i > 0, the position direction of the context word with respect to the word is below.
Further, in some embodiments of the present application, the target estimation module 6043 is configured to calculate a global log maximum likelihood estimate f(ω_{t+i}, ω_t), wherein f(ω_{t+i}, ω_t) = p(ω_{t+i}|ω_t) + g(ω_{t+i}, ω_t), g(ω_{t+i}, ω_t) represents the interaction function calculation result, and p(ω_{t+i}|ω_t) represents the conditional probability calculation result; and to calculate a joint log-likelihood estimate L_SG of the probability of the word with respect to the context word, wherein V represents the set of all words in the corpus, ω_{t+i} is the context word, ω_t is the word, and c represents the context window size.
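The exact expression for L_SG is given only as a figure in the filing. The sketch below shows an assumed accumulation of f = p + g over every word position and every context offset within a window of size c; `cond_prob` and `interaction` are caller-supplied callables (for example, sketches like the ones above) and the function name is illustrative.

```python
def joint_log_likelihood(corpus_ids, cond_prob, interaction, c=2):
    """Assumed form of L_SG: sum f(w_{t+i}, w_t) over the corpus and a window of size c."""
    total = 0.0
    for t, word_id in enumerate(corpus_ids):
        for i in range(-c, c + 1):
            if i == 0 or not (0 <= t + i < len(corpus_ids)):
                continue
            context_id = corpus_ids[t + i]
            # f(w_{t+i}, w_t) = p(w_{t+i}|w_t) + g(w_{t+i}, w_t), as defined in the text
            total += cond_prob(context_id, word_id) + interaction(context_id, word_id)
    return total
```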
In some embodiments of the present application, referring to fig. 6-d, the output word vector reconfiguration module 603 includes:
an above output word vector generating module 6033, configured to obtain an above output word vector from the original output word vector when the context word appears above the word;
a below output word vector generating module 6034, configured to obtain a below output word vector from the original output word vector when the context word appears below the word;
a second target output word vector generating module 6035, configured to obtain the target output word vector by using the above output word vector and the below output word vector, where the target output word vector includes: the above output word vector and the below output word vector.
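A small sketch of this alternative embodiment follows. It is an assumption-level illustration, not the patented implementation: instead of a direction vector, each vocabulary word keeps two output vectors, one used when it occurs above the word and one when it occurs below; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10000, 100
above_output_vectors = np.zeros((vocab_size, dim))   # used when the context word is above
below_output_vectors = np.zeros((vocab_size, dim))   # used when the context word is below

def directional_output_vector(context_word_id, i):
    """Pick the direction-specific output vector for a context word at offset i."""
    if i < 0:                                         # context word appears above the word
        return above_output_vectors[context_word_id]
    return below_output_vectors[context_word_id]     # context word appears below the word
```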
As can be seen from the above description of the embodiments of the present invention, a corresponding input word vector is first obtained according to a word in a training sample text, and a corresponding original output word vector is obtained according to a context word corresponding to the word in the training sample text. A target output word vector is then generated from the original output word vector, the target output word vector carrying direction information indicating the position direction of the context word relative to the word, and a word vector learning model is trained using the input word vector and the target output word vector. In the embodiment of the invention, the contexts of the input word in different position directions are modeled separately, and the structural information of the context words is incorporated into the word vector learning, so that the word vectors learned by the word vector learning model reflect the structural information of the context. The word vectors obtained with the word vector learning model provided by the embodiment of the invention are therefore suitable for both semantic and syntactic tasks in natural language processing.
Fig. 7 is a schematic diagram of a server 1100 according to an embodiment of the present invention. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing applications 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transient storage or persistent storage. The programs stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations stored in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps of the word vector training method performed by the server in the above embodiment may be based on the server structure shown in fig. 7.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, and may also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can also be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present invention, however, a software implementation is in most cases the preferred embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk of a computer, and the product includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
In summary, the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the above embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the above embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (15)
1. A method for word vector training, comprising:
acquiring corresponding input word vectors according to words in the training sample text;
obtaining corresponding original output word vectors according to the context words corresponding to the words in the training sample text;
generating a target output word vector according to the original output word vector, wherein the target output word vector carries direction information for indicating the position direction of the context word relative to the word;
training a word vector learning model using the input word vector and the target output word vector.
2. The method of claim 1, wherein generating a target output word vector from the original output word vector comprises:
generating a direction vector according to the context word appearing above or below the word, the direction vector indicating that the context word appears above or below the word;
obtaining the target output word vector through the original output word vector and the direction vector, wherein the target output word vector comprises: the original output word vector and the direction vector.
3. The method of claim 2, wherein training a word vector learning model using the input word vector and the target output word vector comprises:
obtaining an interaction function calculation result according to the input word vector and the direction vector, and iteratively updating the input word vector and the direction vector according to the interaction function calculation result;
obtaining a conditional probability calculation result according to the input word vector and the original output word vector, and performing iterative updating on the input word vector and the original output word vector according to the conditional probability calculation result;
and estimating the optimal target of the word vector learning model according to the interaction function calculation result and the conditional probability calculation result.
4. The method of claim 3, wherein obtaining an interaction function calculation result according to the input word vector and the direction vector comprises:
calculating an interaction function between the input word vector and the direction vector, wherein g(ω_{t+i}, ω_t) represents the interaction function calculation result, δ_{ω_{t+i}} represents the direction vector when the context word is ω_{t+i}, v_{ω_t} represents the input word vector when the word is ω_t, and V represents the set of all words in the corpus.
5. The method of claim 3, wherein iteratively updating the input word vector and the direction vector according to the interaction function calculation result comprises:
iteratively updating the input word vector and the direction vector, wherein v_{ω_t}^{(new)} represents the updated input vector when the word is ω_t, v_{ω_t}^{(old)} represents the input vector before the update, γ represents the learning rate, δ_{ω_{t+i}} represents the direction vector when the context word is ω_{t+i}, v_{ω_t} represents the input vector when the word is ω_t, σ(v_{ω_t}^T δ_{ω_{t+i}}) represents the predicted position direction of the context word relative to the word, D represents the position direction flag value of the context word relative to the word, δ_{ω_{t+i}}^{(new)} represents the updated direction vector when the context word is ω_{t+i}, and δ_{ω_{t+i}}^{(old)} represents the direction vector before the update when the context word is ω_{t+i}.
6. The method according to claim 5, characterized in that the position direction flag value D satisfies the following condition:
wherein, when i < 0, the position direction of the context word with respect to the word is above, and when i > 0, the position direction of the context word with respect to the word is below.
7. The method according to any one of claims 3 to 6, wherein the estimation of the optimal target of the word vector learning model from the interaction function calculation result and the conditional probability calculation result is performed by:
the global log maximum likelihood estimate f (ω) is calculated as followst+i,ωt) Wherein
f(ωt+i,ωt)=p(ωt+iωt)+g(ωt+i,ωt),
wherein, the g (ω)t+i,ωt) Representing the result of said interaction function computation, said p (ω)t+iωt) Representing the conditional probability computation result;
calculating a joint log-likelihood estimate L of the probability of the word to the context word bySGWhich isIn (1),
wherein the V represents all word sets in the corpus, and the context word is omegat+iThe word is omegatAnd c represents a context window size.
8. The method of claim 1, wherein generating a target output word vector from the original output word vector comprises:
acquiring an above output word vector from the original output word vector when the context word appears above the word;
acquiring a below output word vector from the original output word vector when the context word appears below the word;
obtaining the target output word vector through the above output word vector and the below output word vector, where the target output word vector includes: the above output word vector and the below output word vector.
9. A server, comprising:
the input word vector acquisition module is used for acquiring corresponding input word vectors according to words in the training sample text;
an output word vector obtaining module, configured to obtain a corresponding original output word vector according to a context word corresponding to the word in the training sample text;
an output word vector reconfiguration module, configured to generate a target output word vector according to the original output word vector, where the target output word vector carries direction information used to indicate a position direction of the context word relative to the word;
and the model training module is used for training a word vector learning model by using the input word vector and the target output word vector.
10. The server according to claim 9, wherein the output word vector reconfiguration module comprises:
a direction vector generation module for generating a direction vector according to the context word appearing above or below the word, wherein the direction vector is used for indicating that the context word appears above or below the word;
a first target output word vector generation module, configured to obtain the target output word vector through the original output word vector and the direction vector, where the target output word vector includes: the original output word vector and the direction vector.
11. The server of claim 10, wherein the model training module comprises:
the interaction function calculation module is used for obtaining an interaction function calculation result according to the input word vector and the direction vector and iteratively updating the input word vector and the direction vector according to the interaction function calculation result;
the conditional probability calculation module is used for obtaining a conditional probability calculation result according to the input word vector and the original output word vector and performing iterative updating on the input word vector and the original output word vector according to the conditional probability calculation result;
and the target estimation module is used for estimating the optimal target of the word vector learning model according to the interaction function calculation result and the conditional probability calculation result.
12. The server according to claim 11, wherein the interaction function calculation module is configured to calculate the interaction function between the input word vector and the direction vector, wherein g(ω_{t+i}, ω_t) represents the interaction function calculation result, δ_{ω_{t+i}} represents the direction vector when the context word is ω_{t+i}, v_{ω_t} represents the input word vector when the word is ω_t, and V represents the set of all words in the corpus.
13. The server according to claim 11, wherein the interaction function calculation module is configured to iteratively update the input word vector and the direction vector, wherein v_{ω_t}^{(new)} represents the updated input vector when the word is ω_t, v_{ω_t}^{(old)} represents the input vector before the update, γ represents the learning rate, δ_{ω_{t+i}} represents the direction vector when the context word is ω_{t+i}, v_{ω_t} represents the input vector when the word is ω_t, σ(v_{ω_t}^T δ_{ω_{t+i}}) represents the predicted position direction of the context word relative to the word, D represents the position direction flag value of the context word relative to the word, δ_{ω_{t+i}}^{(new)} represents the updated direction vector when the context word is ω_{t+i}, and δ_{ω_{t+i}}^{(old)} represents the direction vector before the update when the context word is ω_{t+i}.
14. The server according to claim 13, wherein the position direction flag value D satisfies the following condition:
wherein, when i < 0, the position direction of the context word with respect to the word is above, and when i > 0, the position direction of the context word with respect to the word is below.
15. The server according to any one of claims 11 to 14, wherein the target estimation module is configured to calculate a global log maximum likelihood estimate f(ω_{t+i}, ω_t), wherein f(ω_{t+i}, ω_t) = p(ω_{t+i}|ω_t) + g(ω_{t+i}, ω_t), g(ω_{t+i}, ω_t) represents the interaction function calculation result, and p(ω_{t+i}|ω_t) represents the conditional probability calculation result; and to calculate a joint log-likelihood estimate L_SG of the probability of the word with respect to the context word, wherein V represents the set of all words in the corpus, ω_{t+i} is the context word, ω_t is the word, and c represents the context window size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810299633.5A CN110348001B (en) | 2018-04-04 | 2018-04-04 | Word vector training method and server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810299633.5A CN110348001B (en) | 2018-04-04 | 2018-04-04 | Word vector training method and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348001A true CN110348001A (en) | 2019-10-18 |
CN110348001B CN110348001B (en) | 2022-11-25 |
Family
ID=68172691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810299633.5A Active CN110348001B (en) | 2018-04-04 | 2018-04-04 | Word vector training method and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348001B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115293156A (en) * | 2022-09-29 | 2022-11-04 | 四川大学华西医院 | Method and device for extracting prison short message abnormal event, computer equipment and medium |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010013228A1 (en) * | 2008-07-31 | 2010-02-04 | Ginger Software, Inc. | Automatic context sensitive language generation, correction and enhancement using an internet corpus |
CN104067340A (en) * | 2012-01-27 | 2014-09-24 | 三菱电机株式会社 | Method for enhancing speech in mixed signal |
CN104268200A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Unsupervised named entity semantic disambiguation method based on deep learning |
US20170004208A1 (en) * | 2015-07-04 | 2017-01-05 | Accenture Global Solutions Limited | Generating a domain ontology using word embeddings |
US20180357531A1 (en) * | 2015-11-27 | 2018-12-13 | Devanathan GIRIDHARI | Method for Text Classification and Feature Selection Using Class Vectors and the System Thereof |
CN106383816A (en) * | 2016-09-26 | 2017-02-08 | 大连民族大学 | Chinese minority region name identification method based on deep learning |
CN107239443A (en) * | 2017-05-09 | 2017-10-10 | 清华大学 | The training method and server of a kind of term vector learning model |
CN107180247A (en) * | 2017-05-19 | 2017-09-19 | 中国人民解放军国防科学技术大学 | Relation grader and its method based on selective attention convolutional neural networks |
CN107239444A (en) * | 2017-05-26 | 2017-10-10 | 华中科技大学 | A kind of term vector training method and system for merging part of speech and positional information |
CN107273355A (en) * | 2017-06-12 | 2017-10-20 | 大连理工大学 | A kind of Chinese word vector generation method based on words joint training |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN107526834A (en) * | 2017-09-05 | 2017-12-29 | 北京工商大学 | Joint part of speech and the word2vec improved methods of the correlation factor of word order training |
Non-Patent Citations (4)
Title |
---|
YAN SONG et al.: "Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings", Proceedings of NAACL-HLT 2018 *
LIU Jianping (Pinard): "Principles of word2vec (1): fundamentals of the CBOW and Skip-Gram models", HTTPS://WWW.CNBLOGS.COM/PINARD/P/7160330.HTML *
LI Bo et al.: "Research on an improved convolutional neural network method for relation classification", online publication: HTTP://KNS.CNKI.NET/KCMS/DETAIL/11.5602.TP.20170608.1424.012.HTML *
WANG Min et al.: "Research on a region-based algorithm for identifying the authenticity of traditional Chinese paintings", Computer Engineering and Applications *
Also Published As
Publication number | Publication date |
---|---|
CN110348001B (en) | 2022-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Stahlberg | Neural machine translation: A review | |
US20240013055A1 (en) | Adversarial pretraining of machine learning models | |
CN112115700B (en) | Aspect-level emotion analysis method based on dependency syntax tree and deep learning | |
US8566260B2 (en) | Structured prediction model learning apparatus, method, program, and recording medium | |
CN109344413B (en) | Translation processing method, translation processing device, computer equipment and computer readable storage medium | |
CN107220232B (en) | Keyword extraction method and device based on artificial intelligence, equipment and readable medium | |
Subramanya et al. | Efficient graph-based semi-supervised learning of structured tagging models | |
CN106557563B (en) | Query statement recommendation method and device based on artificial intelligence | |
Song et al. | Leveraging dependency forest for neural medical relation extraction | |
US20180365217A1 (en) | Word segmentation method based on artificial intelligence, server and storage medium | |
CN106909537B (en) | One-word polysemous analysis method based on topic model and vector space | |
CN109086265A (en) | A kind of semanteme training method, multi-semantic meaning word disambiguation method in short text | |
CN115983294B (en) | Translation model training method, translation method and translation equipment | |
WO2014073206A1 (en) | Information-processing device and information-processing method | |
CN110162594A (en) | Viewpoint generation method, device and the electronic equipment of text data | |
CN104536979A (en) | Generation method and device of topic model and acquisition method and device of topic distribution | |
CN110457471A (en) | File classification method and device based on A-BiLSTM neural network | |
Pan et al. | A content-based neural reordering model for statistical machine translation | |
CN110348001B (en) | Word vector training method and server | |
Forsati et al. | Hybrid PoS-tagging: A cooperation of evolutionary and statistical approaches | |
Zhang et al. | Multi-document extractive summarization using window-based sentence representation | |
CN113407776A (en) | Label recommendation method and device, training method and medium of label recommendation model | |
CN116011450A (en) | Word segmentation model training method, system, equipment, storage medium and word segmentation method | |
EP4064111A1 (en) | Word embedding with disentangling prior | |
Kirsch et al. | Noise reduction in distant supervision for relation extraction using probabilistic soft logic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TG01 | Patent term adjustment |