WO2021227935A1 - Training of word vector embedding model - Google Patents

Training of word vector embedding model

Info

Publication number
WO2021227935A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
context
word vector
matrix
vector
Prior art date
Application number
PCT/CN2021/092009
Other languages
French (fr)
Chinese (zh)
Inventor
曹绍升
陈超超
吴郑伟
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021227935A1 publication Critical patent/WO2021227935A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Definitions

  • The embodiments of this specification relate to the application of machine learning technology to the field of text processing, and in particular to a method and device for training a word vector embedding model.
  • Word vector technology solves the problem that computers cannot understand the semantics of human language by mapping words to real-valued vectors. For example, humans can easily determine that "猫" (cat) and "猫咪" (kitty) are two words with very close semantics, but it is difficult for a computer to describe the semantic similarity of these two words.
  • In this regard, a word vector algorithm can be used to generate a word vector for each of "猫" and "猫咪", and then, by calculating the similarity between the word vectors, the semantic similarity between the two words can be determined. Therefore, the accuracy of the word vector algorithm determines the semantic comprehension ability of the computer.
  • Current word vector algorithms are relatively limited and struggle to satisfy multiple requirements at once, for example, quickly generating word vectors for a large number of words while guaranteeing that the determined word vectors are highly accurate. Therefore, a solution is needed that can quickly and accurately determine the word vectors of a massive number of words.
  • In the method for training a word vector embedding model described in this specification, the CBOW word vector training framework is used as a basis and a self-attention mechanism is introduced, making it possible to train massive numbers of word vectors quickly while effectively improving the accuracy of the trained word vectors.
  • According to a first aspect, a method for training a word vector embedding model is provided, where the word vector embedding model includes a first word vector matrix and a second word vector matrix. The method includes multiple iterative updates, where any one of the iterative updates includes: determining, from the word sequence corresponding to a training sentence, a central word and k context words of the central word, where k is an integer greater than 1; determining the central word vector corresponding to the central word according to the first word vector matrix; determining the k context word vectors corresponding to the k context words according to the second word vector matrix; determining the corresponding k attention weights based on the similarities between the k context word vectors; performing a weighted summation of the k context word vectors using the k attention weights to obtain the context representation vector of the central word; calculating the first similarity between the central word vector and the context representation vector; and updating the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity. The first word vector matrix after the multiple iterative updates is used to query the target word vector of a target word.
  • In one embodiment, before the multiple iterative updates, the method further includes: obtaining a training corpus that includes multiple training sentences; performing word segmentation on each training sentence and, according to the segmentation results, obtaining the vocabulary list corresponding to the training corpus and the word sequence corresponding to each training sentence; and initializing the first word vector matrix and the second word vector matrix according to the vocabulary list, where each row or column of each matrix corresponds to a word in the vocabulary list.
  • In a specific embodiment, obtaining the vocabulary list corresponding to the training corpus according to the segmentation results includes: performing word frequency statistics on the segmentation results to obtain the frequencies of multiple distinct tokens; and removing the low-frequency tokens whose frequency is below a predetermined threshold to obtain the vocabulary list.
  • In one embodiment, determining the central word and the k context words of the central word from the word sequence corresponding to the training sentence includes: sliding a window of preset width along the word sequence, determining the word at the center of the window at any given moment as the central word, and taking the words in the window other than the central word as the k context words.
  • In one embodiment, determining the corresponding k attention weights based on the similarities between the k context word vectors includes: determining a k-order similarity square matrix based on the k context word vectors, where the element in the i-th row and j-th column represents the similarity between the i-th context word vector and the j-th context word vector, i and j being positive integers not greater than k; normalizing each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix; and taking the average of each column of the k-order self-attention score square matrix to obtain the k self-attention weights.
  • In a specific embodiment, determining the k-order similarity square matrix based on the k context word vectors includes: calculating the dot product between the i-th context word vector and the j-th context word vector as the element in the i-th row and j-th column of the k-order similarity square matrix.
  • In another specific embodiment, normalizing each row of the k-order similarity square matrix to obtain the k-order self-attention score square matrix includes: applying a softmax function to each row separately to obtain the k-order self-attention score square matrix.
  • In one embodiment, calculating the first similarity between the central word vector and the context representation vector includes: calculating the dot product between the central word vector and the context representation vector as the first similarity.
  • In one embodiment, updating the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity includes: randomly drawing a first word vector from the first word vector matrix; calculating the second similarity between the drawn first word vector and the context representation vector; and updating the first word vector matrix and the second word vector matrix with the goal of increasing the first similarity and decreasing the second similarity.
  • According to a second aspect, an apparatus for training a word vector embedding model is provided, where the word vector embedding model includes a first word vector matrix and a second word vector matrix. The apparatus includes an iterative update unit configured to perform multiple iterative updates, and the iterative update unit performs any one of the iterative updates through the following modules:
  • The word determination module is configured to determine, from the word sequence corresponding to a training sentence, a central word and k context words of the central word, where k is an integer greater than 1.
  • The word vector determination module is configured to determine the central word vector corresponding to the central word according to the first word vector matrix, and to determine the k context word vectors corresponding to the k context words according to the second word vector matrix.
  • The weight determination module is configured to determine the corresponding k attention weights based on the similarities between the k context word vectors.
  • The weighted summation module is configured to perform a weighted summation of the k context word vectors using the k attention weights to obtain the context representation vector of the central word.
  • The similarity calculation module is configured to calculate the first similarity between the central word vector and the context representation vector.
  • The matrix update module is configured to update the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity; the first word vector matrix after the multiple iterative updates is used to query the target word vector of a target word.
  • According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method described in the first aspect.
  • According to a fourth aspect, a computing device is provided, including a memory and a processor, where executable code is stored in the memory, and when the processor executes the executable code, the method described in the first aspect is implemented.
  • In the above method and apparatus disclosed in the embodiments of this specification, a self-attention mechanism is introduced to determine the self-attention weights of multiple context word vectors, characterizing the mutual influence and internal associations among them; the weighted sum of the multiple context word vectors is then obtained as the context representation vector of the central word. Compared with directly taking the average of the multiple context word vectors as the context representation vector, this improves the accuracy of the context representation vector, thereby improving the accuracy of the updated word vector matrices, and in turn the accuracy of the finally determined word embedding vectors.
  • Fig. 1 shows a schematic diagram of two word vector matrices in a word embedding model according to an embodiment
  • FIG. 2 shows a schematic diagram of selecting a center word and a context word based on a word sequence according to an embodiment
  • FIG. 3 shows a schematic diagram of the architecture of training a word embedding model based on the selected words in FIG. 2 and the word vector matrix in FIG. 1 according to an embodiment
  • FIG. 4 shows a schematic diagram of a training corpus processing flow in a method for training a word vector embedding model according to an embodiment
  • Fig. 5 shows a schematic diagram of an iterative update process in a method for training a word vector embedding model according to an embodiment
  • Fig. 6 shows a schematic structural diagram of an apparatus for training a word vector embedding model according to an embodiment.
  • A word vector algorithm maps a word to a fixed-dimensional vector such that the values of the vector represent the semantic information of the word. There are currently two common frameworks for training word vectors: Skip-gram and CBOW (Continuous Bag-of-Words Model). Word vectors determined with the Skip-gram framework are more accurate, but training is many times slower. In scenarios with very large amounts of data, the CBOW framework is preferred, but the accuracy of the word vectors it produces is limited.
  • Based on this, the inventor proposes a method for training a word vector embedding model that draws on the CBOW framework and introduces a self-attention mechanism, so that word vector training on very large-scale text can be completed quickly while the accuracy of the trained word vectors is effectively improved.
  • Specifically, the above word vector embedding model includes two word vector matrices established for the same set of words, referred to as the first word vector matrix and the second word vector matrix for ease of description.
  • Fig. 1 shows a schematic diagram of the two word vector matrices in a word embedding model according to an embodiment: the first word vector matrix W_{N×M} and the second word vector matrix C_{N×M}, corresponding to N words, where N is an integer greater than 1 and each word is mapped to an M-dimensional vector.
  • In the method for training the above word vector embedding model, multiple iterative updates are performed, and in each iteration both the first word vector matrix and the second word vector matrix are updated. After a predetermined number of iterations, or once training converges, the N word vectors contained in the first word vector matrix as updated in the last iteration are determined as the final word vectors of the N words.
  • Any one of the foregoing multiple iterative updates may include the following. First, from the word sequence corresponding to a training sentence, select the central word and its multiple context words; for example, in the word sequence shown in Fig. 2, select the central word t and 2b context words, the 2b context words corresponding to the set {context word_i | i ∈ [t−b, t+b], i ≠ t}. Next, obtain the first word vector of the central word from the first word vector matrix as the central word vector, and obtain the second word vectors of the multiple context words from the second word vector matrix as the multiple context word vectors. Then, introduce the self-attention mechanism: based on the multiple context word vectors, compute self-attention scores for the multiple context words, obtain the weight of each context word, and use these weights to perform a weighted summation of the context word vectors, yielding the context representation vector of the central word; for example, as shown in Fig. 3, 2b self-attention weights are determined based on the 2b context word vectors, and their weighted sum gives the context representation vector c_t′. Finally, based on the central word vector and the context representation vector, determine a loss for updating the first and second word vector matrices; for example, as shown in Fig. 3, the training loss is computed from the central word vector w_t and the context representation vector c_t′ and used to adjust the two matrices. In this way, iterative updating of the two word vector matrices is achieved.
  • With the above method, the self-attention mechanism determines the self-attention weights of the multiple context word vectors, characterizing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is taken as the context representation vector of the central word.
  • The implementation steps of the above method are described below in conjunction with specific embodiments. The execution subject of the method may be any apparatus, device, system, server cluster, etc. that has computing and processing capabilities.
  • The above method first includes processing the training corpus to build a vocabulary list, initializing the corresponding first word vector matrix and second word vector matrix, and determining the multiple word sequences corresponding to the multiple training sentences, which are subsequently used for the multiple iterative updates of the first and second word vector matrices. For ease of understanding, the processing of the training corpus is introduced first, followed by the iterative update process.
  • Fig. 4 shows a schematic diagram of a training corpus processing flow in a method for training a word vector embedding model according to an embodiment.
  • As shown in Fig. 4, the processing flow of the training corpus includes the following steps:
  • Step S410: obtain the training corpus, which includes multiple training sentences. Step S420: perform word segmentation on each training sentence and, according to the segmentation results, obtain the vocabulary list corresponding to the training corpus and the word sequence corresponding to each training sentence. Step S430: initialize the first word vector matrix and the second word vector matrix according to the vocabulary list, where each row or column of each matrix corresponds to a word in the vocabulary list.
  • First, in step S410, a training corpus including multiple training sentences is obtained.
  • In one embodiment, a large amount of text can be crawled from websites as the training corpus. In another embodiment, the electronic texts of reference books, such as dictionaries, can be obtained as the training corpus. Further, sentence segmentation may be performed on the training corpus to obtain the above multiple training sentences; the segmentation can split the text on common punctuation marks, such as commas, periods, semicolons, and exclamation marks. In another embodiment, the symbols in a text can be removed and the retained text used as a training sentence. In a specific embodiment, for user posts crawled from a social network platform, symbols such as tags and spaces can be removed, and the remaining text used as the corresponding training sentences.
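  • As a rough illustration of the sentence segmentation described above, the following sketch splits raw text on common punctuation; the exact symbol set and tag-stripping rules are assumptions, since the description leaves them open.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split raw corpus text into candidate training sentences on common
    punctuation (comma, period, semicolon, exclamation/question marks),
    covering both ASCII and full-width Chinese forms."""
    pieces = re.split(r"[,.;!?，。；！？]+", text)
    # Drop residual symbols such as tags, and collapse extra whitespace.
    return [re.sub(r"[#@\s]+", " ", p).strip() for p in pieces if p.strip()]
```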
  • Next, in step S420, word segmentation is performed on the multiple training sentences, and the vocabulary list and the multiple word sequences are determined according to the segmentation results.
  • Word segmentation is the process of recombining a consecutive sequence of characters into a sequence of words according to certain specifications.
  • The word segmentation in this step can be performed with existing segmentation methods or tools. For example, the segmentation methods may include forward maximum matching, minimum segmentation, and N-gram-based segmentation, and the segmentation tools may include the THULAC lexical analysis toolkit, the NLPIR word segmentation system, and the like.
  • The above segmentation results include the multiple tokens obtained by segmenting each training sentence. From these, the set of distinct tokens can be determined and the above vocabulary list constructed. At the same time, segmenting each training sentence yields its token sequence as the corresponding word sequence, so that the multiple word sequences corresponding to the multiple training sentences are obtained.
  • In one embodiment, word frequency statistics can be computed to obtain the frequencies of the multiple distinct tokens, the low-frequency tokens whose frequency falls below a predetermined threshold are removed, and the remaining tokens are used to construct the vocabulary list. Here the word frequency refers to the number of times a word occurs in the training corpus, and the predetermined threshold can be set according to actual needs, for example to 10 or 20. In another embodiment, the distinct tokens may be ranked by frequency, the bottom-ranked tokens (for example, the bottom 5 or 10) discarded, and the vocabulary list constructed from the remaining tokens.
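  • A minimal sketch of the frequency-based filtering described above, assuming the segmented corpus is already available as lists of tokens (the function name and threshold default are illustrative):

```python
from collections import Counter

def build_vocab(word_sequences: list[list[str]], min_freq: int = 10) -> list[str]:
    """Count token frequencies over all segmented sentences and keep only
    tokens whose frequency reaches the predetermined threshold."""
    freq = Counter(token for seq in word_sequences for token in seq)
    return sorted(token for token, n in freq.items() if n >= min_freq)
```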
  • In this way, the vocabulary list and the multiple word sequences can be determined. It should be understood that every word in a word sequence exists in the vocabulary list, while a word in the vocabulary list may appear in several word sequences and may appear multiple times in a given word sequence.
  • Then, in step S430, the first word vector matrix and the second word vector matrix are initialized according to the vocabulary list, where each row or column of each matrix corresponds to a word in the vocabulary list.
  • Specifically, each word in the vocabulary list can be mapped to a fixed-dimensional vector as its word vector, so the word vectors of all the words in the vocabulary list jointly form a word vector matrix. Assuming the vocabulary list includes N words and each word is mapped to an M-dimensional vector, an N×M matrix is obtained. In this specification, two such word vector matrices are established: the first word vector matrix W_{N×M} and the second word vector matrix C_{N×M}. A word vector can be used as a row vector or a column vector of the matrix; accordingly, a row or a column of the matrix corresponds to a word in the vocabulary list. By assigning initial values to the matrix elements, the initialization is completed. In one embodiment, a random algorithm may be used to assign initial values to the matrix elements. In another embodiment, the values of the matrix elements can be designated arbitrarily by the staff; for example, for W_{N×M} and C_{N×M} shown in Fig. 1, the i-th element of the i-th row (i ∈ [1, N]) can be set to 1 and the remaining elements to 0. It should be understood that the initialized first and second word vector matrices may be the same or different, and differences between the two will usually arise during the subsequent iterative updates. In this way, the initialization of the two word vector matrices is achieved.
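  • The initialization can be sketched as follows; random initialization is one of the options named above, and the choice of distribution and scale here mirrors common word2vec practice rather than anything the description fixes:

```python
import numpy as np

def init_matrices(n_words: int, dim: int, seed: int = 42):
    """Build the first and second word vector matrices W and C, each of
    shape (N, M), with one row per word in the vocabulary list."""
    rng = np.random.default_rng(seed)
    W = (rng.random((n_words, dim)) - 0.5) / dim   # first word vector matrix
    C = (rng.random((n_words, dim)) - 0.5) / dim   # second word vector matrix
    return W, C
```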
  • So far, based on the training corpus, the word sequence corresponding to each training sentence and the vocabulary list can be determined, and the construction and initialization of the first word vector matrix and the second word vector matrix completed.
  • FIG. 5 shows a schematic diagram of an iterative update process in a method for training a word vector embedding model according to an embodiment.
  • As shown in Fig. 5, any one iterative update can include the following steps:
  • Step S510: determine the central word and the k context words of the central word from the word sequence corresponding to a training sentence, where k is an integer greater than 1.
  • Step S520: determine the central word vector corresponding to the central word according to the first word vector matrix, and determine the k context word vectors corresponding to the k context words according to the second word vector matrix.
  • Step S530: determine the corresponding k attention weights based on the similarities between the k context word vectors.
  • Step S540: perform a weighted summation of the k context word vectors using the k attention weights, obtaining the context representation vector of the central word.
  • Step S550: calculate the first similarity between the central word vector and the context representation vector.
  • Step S560: update the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity; the first word vector matrix after the multiple iterative updates is used to query the target word vector of a target word.
  • First, in step S510, the central word and its k context words are determined from the word sequence corresponding to a training sentence.
  • In one embodiment, this step may include: sliding a window of preset width along the word sequence, determining the word at the center of the window at any given moment as the central word, and taking the words in the window other than the central word as the k context words. In a specific embodiment, the width of the sliding window can be set to 2b+1, so that the center of the window at any moment is the (b+1)-th of the 2b+1 words. In a more specific embodiment, each word in the word sequence can serve as the central word in turn. When the sliding window takes the first word of the sequence as its center, there are no preceding words in the sequence; preset vectors can be used as padding, for example, b preset vectors serve as the word vectors of the b missing preceding positions, and the other boundary positions are handled by analogy. In this way, the central word and its k context words can be determined by sliding the window over the sequence.
  • In another embodiment, a word may be randomly selected from the word sequence as the central word, and k words adjacent to it selected as the k context words. For example, assuming k is 3, after the central word is selected, the one adjacent word before it and the two adjacent words after it can be selected as its three context words. In this way, the central word and its k context words can be determined based on the word sequence.
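  • The sliding-window word selection can be sketched as follows; for brevity, this variant simply skips the boundary positions instead of padding them with preset vectors as described above:

```python
def center_context_pairs(words: list[str], b: int):
    """Slide a window of width 2b+1 along the word sequence and yield
    (central word, k context words) pairs, with k = 2b."""
    for t in range(b, len(words) - b):
        context = words[t - b:t] + words[t + 1:t + b + 1]  # window minus center
        yield words[t], context
```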
  • Next, in step S520, the central word vector corresponding to the central word is determined according to the first word vector matrix, and the k context word vectors corresponding to the k context words are determined according to the second word vector matrix.
  • It should be noted that the first and second word vector matrices are initialized based on the vocabulary list, which accordingly establishes a first mapping relationship between the words in the vocabulary list and the first word vectors in the first word vector matrix, and a second mapping relationship between those words and the second word vectors in the second word vector matrix. The central word and the context words are all words in the vocabulary list; therefore, the first word vector corresponding to the central word can be determined according to the first mapping relationship as the central word vector, and the k second word vectors corresponding to the k context words can be determined according to the second mapping relationship as the k context word vectors. For example, the central word can be looked up in the vocabulary shown in Fig. 1 and the corresponding row of the first word vector matrix taken as its word vector. In this way, the central word vector corresponding to the central word and the k context word vectors corresponding to the k context words are determined.
  • Then, in step S530, the corresponding k attention weights are determined based on the similarities between the k context word vectors.
  • In one embodiment, this step may include: first, determining a k-order similarity square matrix based on the k context word vectors; then normalizing each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix; and finally taking the average of each column of the k-order self-attention score square matrix to obtain the k self-attention weights. In the similarity square matrix, the element in the i-th row and j-th column represents the similarity between the i-th context word vector and the j-th context word vector, where i and j are positive integers not greater than k. In a specific embodiment, the dot product, Euclidean distance, or cosine distance between two word vectors can be calculated as their similarity and used as the corresponding element in the k-order similarity square matrix. In another specific embodiment, a softmax function may be used to normalize each row of the k-order similarity square matrix, yielding the k-order self-attention score square matrix; the average of the scores in each column is then taken as the corresponding self-attention weight. It should be understood that each of the k columns corresponds to one context word vector, so that k self-attention weights corresponding to the k context word vectors are obtained.
  • In another embodiment, this step may include: first, calculating the similarity between every two of the k context word vectors, yielding C(k,2) = k(k−1)/2 similarities, where C(k,2) denotes the number of pairs that can be formed by taking any two of k distinct elements; then, for each context word vector, summing its similarities with the other word vectors to obtain k sum values; and finally, normalizing the k sum values to obtain the k self-attention weights. For the calculation of the similarities and the normalization, refer to the relevant descriptions above, which are not repeated here.
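  • A sketch of this alternative weighting; the use of dot products as the similarity and of softmax as the final normalization are assumptions, since the description leaves both choices open:

```python
import numpy as np

def pairwise_sum_weights(ctx: np.ndarray) -> np.ndarray:
    """ctx: (k, M) context word vectors. For each vector, sum its dot-product
    similarities with the other k-1 vectors, then normalize the k sums."""
    sim = ctx @ ctx.T
    sums = sim.sum(axis=1) - np.diag(sim)  # exclude each vector's self-similarity
    e = np.exp(sums - sums.max())          # softmax normalization of the k sums
    return e / e.sum()                     # k self-attention weights
```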
  • Next, in step S540, a weighted summation of the k context word vectors is performed using the k attention weights, obtaining the context representation vector of the central word.
  • In one embodiment, as shown in Fig. 3, based on the 2b context word vectors and the 2b self-attention weights, the context representation vector of the central word can be calculated as:

    c_t′ = Σ_{i ∈ [t−b, t+b], i ≠ t} a_i · c_i    (1)

The subscript i in formula (1) follows the order of the words in the sliding window in Fig. 2, that is, it indexes the k context words, and differs from the word subscripts of the vocabulary in Fig. 1; t is the subscript of the central word, and i ∈ [t−b, t+b] − {t} means that i points in turn to each context word other than the central word; c_i denotes the i-th of the k context word vectors, and a_i denotes its corresponding self-attention weight. In this way, the context representation vector of the central word is obtained.
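  • Steps S530 and S540 can be sketched together as follows, using dot-product similarity, row-wise softmax normalization, and column averaging, then applying formula (1); this is an illustrative NumPy rendering, not the patent's reference implementation:

```python
import numpy as np

def context_representation(ctx: np.ndarray) -> np.ndarray:
    """ctx: (k, M) array holding the k context word vectors.
    Returns the context representation vector c_t' of shape (M,)."""
    sim = ctx @ ctx.T                            # k-order similarity square matrix
    sim -= sim.max(axis=1, keepdims=True)        # for numerical stability
    scores = np.exp(sim)
    scores /= scores.sum(axis=1, keepdims=True)  # row-wise softmax (S530)
    a = scores.mean(axis=0)                      # column means -> k weights a_i
    return a @ ctx                               # weighted sum, formula (1) (S540)
```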
  • Next, in step S550, the first similarity between the central word vector and the context representation vector is calculated. In one embodiment, the dot product, cosine distance, or Euclidean distance between the central word vector and the context representation vector may be calculated as the first similarity.
  • Then, in step S560, the first word vector matrix and the second word vector matrix are updated with the goal of at least increasing the first similarity.
  • In one embodiment, this step may include: randomly drawing a first word vector from the first word vector matrix and calculating the second similarity between the drawn first word vector and the context representation vector; then updating the first word vector matrix and the second word vector matrix with the goal of increasing the first similarity and decreasing the second similarity. In this way, the accuracy of the updated word vector matrices can be further improved. For the calculation of the second similarity, the descriptions of vector similarity calculation above apply and are not repeated here.
  • In a specific embodiment, the training loss can be calculated with a margin-based formula, for example of the following form:

    L = max(0, γ + σ(w_r · c_t′) − σ(w_t · c_t′))    (2)

where L represents the training loss; w_t represents the central word vector; c_t′ represents the context representation vector of the central word; w_t · c_t′ represents the dot product between w_t and c_t′, i.e., the first similarity; w_r represents the randomly drawn first word vector; w_r · c_t′ represents the dot product between w_r and c_t′, i.e., the second similarity; γ is a hyperparameter, which can for example be set to 0.01; and σ represents an activation function commonly used in neural networks, such as the tanh or sigmoid function. In this way, the training loss can be determined based on the first similarity and the second similarity, and the two word vector matrices can then be updated according to the training loss.
  • In one embodiment, updating the two word vector matrices according to the training loss may include: determining, from the training loss, the loss gradients of the relevant elements in the two word vector matrices, and then subtracting from the current value of each relevant element the product of its loss gradient and a learning step size (a hyperparameter, for example set to 0.05) to obtain the updated element value, thereby updating the two word vector matrices.
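  • One iterative update of the two matrices can be sketched as below. The sketch uses the standard sigmoid negative-sampling loss, which matches the stated goals (increase the first similarity, decrease the second) but is an assumed form rather than the patent's exact formula; it also treats the attention weights as constants when back-propagating to the context vectors, which is a simplification:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(W, C, t_idx, ctx_idx, r_idx, a, lr=0.05):
    """W, C: the two (N, M) word vector matrices; t_idx: row of the central
    word in W; ctx_idx: rows of the k context words in C; r_idx: row of the
    randomly drawn negative word in W; a: the k self-attention weights."""
    c = a @ C[ctx_idx]                      # context representation c_t'
    g_pos = 1.0 - sigmoid(W[t_idx] @ c)     # gradient scale for the positive pair
    g_neg = sigmoid(W[r_idx] @ c)           # gradient scale for the negative pair
    grad_c = g_neg * W[r_idx] - g_pos * W[t_idx]
    W[t_idx] += lr * g_pos * c              # raise the first similarity
    W[r_idx] -= lr * g_neg * c              # lower the second similarity
    C[ctx_idx] -= lr * np.outer(a, grad_c)  # distribute grad_c through weights a_i
```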
  • In this way, the update of the first word vector matrix and the second word vector matrix is realized.
  • The above steps S510 to S560 describe the process of any one iterative update. By repeatedly performing this process, multiple iterative updates are achieved, and the first word vector matrix after the multiple iterative updates is used as the word vector query matrix corresponding to the vocabulary list, for querying the target word vector of a target word. In a typical scenario, after a search engine receives a user's query instruction for a target word, it can query the word vector query matrix for the target word vector corresponding to the target word and then, according to the target word vector, determine from a content database the relevant content to feed back to the user.
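  • Querying then reduces to a row lookup in the trained first matrix, for example as follows, where vocab_index is an assumed mapping from each word to its row in the vocabulary list:

```python
import numpy as np

def query_vector(word: str, vocab_index: dict[str, int], W: np.ndarray) -> np.ndarray:
    """Look up the target word's row in the trained first word vector matrix W."""
    return W[vocab_index[word]]
```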
  • To sum up, with the training method disclosed in the embodiments of this specification, a self-attention mechanism determines the self-attention weights of the multiple context word vectors, characterizing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is obtained as the context representation vector of the central word. In this way, the accuracy of the context representation vector is improved, which improves the accuracy of the updated word vector matrices and thereby the accuracy of the final word embedding vectors.
  • Fig. 6 shows a schematic structural diagram of the apparatus for training a word vector embedding model according to an embodiment, where the word vector embedding model includes a first word vector matrix and a second word vector matrix. The apparatus can be implemented by any computing node or server cluster with computing and processing capabilities.
  • As shown in Fig. 6, the apparatus 600 includes an iterative update unit 610 configured to perform multiple iterative updates, and the iterative update unit performs any one of the iterative updates through the following modules:
  • The word determination module 611 is configured to determine, from the word sequence corresponding to a training sentence, a central word and k context words of the central word, where k is an integer greater than 1.
  • The word vector determination module 612 is configured to determine the central word vector corresponding to the central word according to the first word vector matrix, and to determine the k context word vectors corresponding to the k context words according to the second word vector matrix.
  • The weight determination module 613 is configured to determine the corresponding k attention weights based on the similarities between the k context word vectors.
  • The weighted summation module 614 is configured to perform a weighted summation of the k context word vectors using the k attention weights to obtain the context representation vector of the central word.
  • The similarity calculation module 615 is configured to calculate the first similarity between the central word vector and the context representation vector.
  • The matrix update module 616 is configured to update the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity; the first word vector matrix after the multiple iterative updates is used to query the target word vector of a target word.
  • In one embodiment, the apparatus 600 further includes: a corpus acquisition unit 620 configured to acquire a training corpus including multiple training sentences; a word segmentation unit 630 configured to perform word segmentation on each training sentence and, according to the segmentation results, obtain the vocabulary list corresponding to the training corpus and the word sequence corresponding to each training sentence; and an initialization unit 640 configured to initialize the first word vector matrix and the second word vector matrix according to the vocabulary list, where each row or column of each matrix corresponds to a word in the vocabulary list.
  • In a specific embodiment, the word segmentation unit 630 is specifically configured to: perform word frequency statistics on the segmentation results to obtain the frequencies of multiple distinct tokens, and remove the low-frequency tokens whose frequency is below a predetermined threshold to obtain the vocabulary list.
  • In one embodiment, the word determination module 611 is specifically configured to: slide a window of preset width along the word sequence, determine the word at the center of the window at any given moment as the central word, and take the words in the window other than the central word as the k context words.
  • In one embodiment, the weight determination module 613 is specifically configured to: determine a k-order similarity square matrix based on the k context word vectors, where the element in the i-th row and j-th column represents the similarity between the i-th context word vector and the j-th context word vector; normalize each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix; and take the average of each column of the k-order self-attention score square matrix to obtain the k self-attention weights.
  • In one embodiment, the similarity calculation module 615 is specifically configured to calculate the dot product between the central word vector and the context representation vector as the first similarity.
  • In one embodiment, the matrix update module 616 is specifically configured to: randomly draw a first word vector from the first word vector matrix; calculate the second similarity between the drawn first word vector and the context representation vector; and update the first word vector matrix and the second word vector matrix with the goal of increasing the first similarity and decreasing the second similarity.
  • To sum up, with the training apparatus disclosed in the embodiments of this specification, a self-attention mechanism determines the self-attention weights of the multiple context word vectors, characterizing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is obtained as the context representation vector of the central word. In this way, the accuracy of the context representation vector is improved, which improves the accuracy of the updated word vector matrices and thereby the accuracy of the final word embedding vectors.
  • According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with Fig. 4 or Fig. 5.
  • According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where executable code is stored in the memory, and when the processor executes the executable code, the method described in conjunction with Fig. 4 or Fig. 5 is implemented.


Abstract

A method for training a word vector embedding model. Said method comprises multiple iterative updates, wherein any one of the iterative updates comprises: first, determining a central word and a plurality of corresponding context words from a word sequence corresponding to a training sentence; then determining, according to a first word vector matrix, a central word vector corresponding to the central word, and determining, according to a second word vector matrix, a plurality of context word vectors corresponding to the plurality of context words; then determining, on the basis of the similarity between the plurality of context word vectors, a plurality of attention weights corresponding thereto; using the plurality of attention weights to perform weighted summation on the plurality of context word vectors to obtain a context representation vector of the central word; then calculating a first similarity between the central word vector and the context representation vector; and finally, with at least increasing the first similarity as a target, updating the first word vector matrix and the second word vector matrix.

Description

训练词向量嵌入模型Training word vector embedding model 技术领域Technical field
本说明书实施例涉及将机器学习技术应用到文本处理领域,具体地,涉及一种训练词向量嵌入模型的方法及装置。The embodiments of this specification relate to the application of machine learning technology to the field of text processing, and in particular to a method and device for training a word vector embedding model.
背景技术Background technique
词向量技术通过将词语映射为实数向量,解决计算机难以理解人类语言语义的问题。比如说,人可以轻易地判断出“猫”和“猫咪”是两个语义很接近的词语,但计算机很难刻画出这两个词语的语义相似度。对此,可以利用词向量算法为“猫”和“猫咪”各生成一个词向量,进而通过计算词向量之间的相似度,确定“猫”和“猫咪”之间的语义相似度。因此,词向量算法的准确度决定了计算机的语义理解能力。Word vector technology solves the problem that computers cannot understand the semantics of human language by mapping words to real number vectors. For example, humans can easily determine that "猫" and "猫咪" are two words with very close semantics, but it is difficult for a computer to describe the semantic similarity of these two words. In this regard, the word vector algorithm can be used to generate a word vector for each of "cat" and "cat", and then by calculating the similarity between the word vectors, the semantic similarity between "cat" and "cat" can be determined. Therefore, the accuracy of the word vector algorithm determines the semantic comprehension ability of the computer.
然而,目前的词向量算法较为单一,难以满足多种需求,例如,在为大批量词语快速生成词向量的同时,保证确定出的词向量具有较高的准确度。因此,需要一种方案,可以快速、准确地确定出海量词语的词向量。However, the current word vector algorithm is relatively single, and it is difficult to meet multiple requirements. For example, while quickly generating word vectors for a large number of words, it is guaranteed that the determined word vectors have high accuracy. Therefore, a solution is needed that can quickly and accurately determine the word vectors of a large number of words.
发明内容Summary of the invention
在本说明书描述的训练词向量嵌入模型的方法中,借鉴词向量训练框架CBOW,并引入自注意力机制,可以实现在快速训练出海量词向量的同时,有效提高训练出的词向量的准确度。In the method of training the word vector embedding model described in this manual, the word vector training framework CBOW is used for reference, and the self-attention mechanism is introduced, which can realize the rapid training of massive word vectors while effectively improving the accuracy of the trained word vectors. .
根据第一方面,提供一种训练词向量嵌入模型的方法,所述词向量嵌入模型包括,第一词向量矩阵和第二词向量矩阵;所述方法包括多次迭代更新,其中任一次迭代更新包括:从训练语句对应的词语序列中,确定中心词语和所述中心词语的k个上下文词语,其中k为大于1的整数;根据第一词向量矩阵,确定所述中心词语对应的中心词向量;根据第二词向量矩阵,确定所述k个上下文词语对应的k个上下文词向量;基于所述k个上下文词向量彼此之间的相似度,确定其对应的k个注意力权重;利用所述k个注意力权重,对所述k个上下文词向量进行加权求和,得到所述中心词语的上下文表示向量;计算所述中心词向量与所述上下文表示向量之间的第一相似度;至少以增大所述第一相似度为目标,更新所述第一词向量矩阵和所述第二词向量矩阵;所述多次迭代更新后的第一词向量矩阵用于查询目标词语的目标词向量。According to a first aspect, a method for training a word vector embedding model is provided, the word vector embedding model includes a first word vector matrix and a second word vector matrix; the method includes multiple iterations of updating, wherein any one of the iterations is updated Including: determining the central word and k context words of the central word from the word sequence corresponding to the training sentence, where k is an integer greater than 1, and determining the central word vector corresponding to the central word according to the first word vector matrix According to the second word vector matrix, determine the k context word vectors corresponding to the k context words; determine the corresponding k attention weights based on the similarity between the k context word vectors; Describe the k attention weights, perform a weighted summation on the k context word vectors to obtain the context representation vector of the central word; calculate the first similarity between the central word vector and the context representation vector; At least with the goal of increasing the first similarity, the first word vector matrix and the second word vector matrix are updated; the first word vector matrix after multiple iterations is used to query the target of the target word Word vector.
在一个实施例中,在所述多次迭代更新之前,所述方法还包括:获取训练语料,其中包括多条训练语句;对各条训练语句进行分词,根据分词结果,得到所述训练语料对应的词汇表,以及各训练语句对应的所述词语序列;根据所述词汇表,初始化所述第一词向量矩阵和第二词向量矩阵,其中每个矩阵的一行或一列对应于所述词汇表中的一个词语。In one embodiment, before the multiple iterative updates, the method further includes: obtaining training corpus, including multiple training sentences; performing word segmentation on each training sentence, and obtaining the corresponding training corpus according to the word segmentation result According to the vocabulary list, initialize the first word vector matrix and the second word vector matrix, wherein one row or one column of each matrix corresponds to the vocabulary list A word in.
在一个具体的实施例中,其中根据分词结果,得到所述训练语料对应的词汇表,包括:根据所述分词结果进行词频统计,得到多个不同分词的词频;从所述多个不同分词中去除词频低于预定阈值的低频分词,得到所述词汇表。In a specific embodiment, obtaining the vocabulary list corresponding to the training corpus according to the word segmentation result includes: performing word frequency statistics according to the word segmentation result to obtain the word frequencies of a plurality of different word segmentation; from the plurality of different word segmentation The low-frequency word segmentation whose word frequency is lower than the predetermined threshold is removed to obtain the vocabulary list.
在一个实施例中,从训练语句对应的词语序列中,确定中心词语和所述中心词语的k个上下文词语,包括:采用预设宽度的滑窗,沿所述词语序列滑动,将任一时刻下,所述滑窗中心位置对应的词语确定为所述中心词语,将所述滑窗内除所述中心词语之外的词语,作为所述k个上下文词语。In one embodiment, determining the central word and the k context words of the central word from the word sequence corresponding to the training sentence includes: using a sliding window with a preset width, sliding along the word sequence, and changing any time Next, the word corresponding to the center position of the sliding window is determined as the central word, and words in the sliding window other than the central word are used as the k contextual words.
在一个实施例中,基于所述k个上下文词向量彼此之间的相似度,确定其对应的k个注意力权重,包括:基于所述k个上下文词向量,确定k阶相似度方阵,其中第i行第j列元素表示第i个上下文词向量与第j个上下文词向量之间的相似度,其中i和j为不大于k的正整数;对所述k阶相似度方阵中的各行分别进行归一化处理,得到k阶自注意力分数方阵;分别求取所述k阶自注意力分数方阵中各列的平均值,得到所述k个自注意力权重。In one embodiment, determining the corresponding k attention weights based on the similarity between the k context word vectors includes: determining a k-order similarity square matrix based on the k context word vectors, The element in the i-th row and the j-th column represents the similarity between the i-th context word vector and the j-th context word vector, where i and j are positive integers not greater than k; for the k-th order similarity square matrix Each row of is normalized to obtain a k-order self-attention score square matrix; the average value of each column in the k-order self-attention score square matrix is respectively calculated to obtain the k self-attention weights.
在一个具体的实施例中,基于所述k个上下文词向量,确定k阶相似度方阵,包括:计算所述第i个上下文词向量和第j个上下文词向量之间的点积,作为所述k阶相似度方阵中的第i行第j列元素。In a specific embodiment, determining the k-th order similarity square matrix based on the k context word vectors includes: calculating the dot product between the i-th context word vector and the j-th context word vector as An element in the i-th row and j-th column in the k-th order similarity square matrix.
在另一个具体的实施例中,其中对所述k阶相似度方阵中的各行分别进行归一化处理,得到k阶自注意力分数方阵,包括:利用softmax函数对所述各行分别进行归一化处理,得到所述k阶自注意力分数方阵。In another specific embodiment, the normalization process is performed on each row in the k-order similarity square matrix to obtain the k-order self-attention score square matrix, which includes: using a softmax function to perform the normalization on each row separately Normalization processing is performed to obtain the k-order self-attention score square matrix.
在一个实施例中,其中计算所述中心词向量与所述上下文表示向量之间的第一相似度,包括:计算所述中心词向量和所述上下文表示向量之间的点积,作为所述第一相似度。In one embodiment, calculating the first similarity between the central word vector and the context representation vector includes: calculating the dot product between the central word vector and the context representation vector as the The first degree of similarity.
在一个实施例中,其中至少以增大所述第一相似度为目标,更新所述第一词向量矩阵和所述第二词向量矩阵,包括:从所述第一词向量矩阵中随机抽取某个第一词向量; 计算所述某个第一词向量和所述上下文表示向量之间的第二相似度;以增大所述第一相似度和减小所述第二相似度为目标,更新所述第一词向量矩阵和所述第二词向量矩阵。In an embodiment, updating the first word vector matrix and the second word vector matrix with at least increasing the first degree of similarity includes: randomly extracting from the first word vector matrix A certain first word vector; calculate the second similarity between the certain first word vector and the context representation vector; aim to increase the first similarity and reduce the second similarity , Updating the first word vector matrix and the second word vector matrix.
根据第二方面,提供一种训练词向量嵌入模型的装置,所述词向量嵌入模型包括,第一词向量矩阵和第二词向量矩阵;所述装置包括迭代更新单元,用于执行多次迭代更新,所述迭代更新单元通过以下模块执行其中任一次迭代更新:According to a second aspect, an apparatus for training a word vector embedding model is provided, the word vector embedding model includes a first word vector matrix and a second word vector matrix; the device includes an iterative update unit for performing multiple iterations Update, the iterative update unit performs any one of the iterative updates through the following modules:
词语确定模块,配置为从训练语句对应的词语序列中,确定中心词语和所述中心词语的k个上下文词语,其中k为大于1的整数。词向量确定模块,配置为根据第一词向量矩阵,确定所述中心词语对应的中心词向量;根据第二词向量矩阵,确定所述k个上下文词语对应的k个上下文词向量。权重确定模块,配置为基于所述k个上下文词向量彼此之间的相似度,确定其对应的k个注意力权重。加权求和模块,配置为利用所述k个注意力权重,对所述k个上下文词向量进行加权求和,得到所述中心词语的上下文表示向量。相似度计算模块,配置为计算所述中心词向量与所述上下文表示向量之间的第一相似度。矩阵更新模块,配置为至少以增大所述第一相似度为目标,更新所述第一词向量矩阵和所述第二词向量矩阵;所述多次迭代更新后的第一词向量矩阵用于查询目标词语的目标词向量。The word determination module is configured to determine the central word and k context words of the central word from the word sequence corresponding to the training sentence, where k is an integer greater than 1. The word vector determining module is configured to determine the central word vector corresponding to the central word according to the first word vector matrix; and to determine the k context word vectors corresponding to the k context words according to the second word vector matrix. The weight determination module is configured to determine k corresponding attention weights based on the similarity between the k context word vectors. The weighted summation module is configured to use the k attention weights to perform a weighted summation on the k context word vectors to obtain the context representation vector of the central word. The similarity calculation module is configured to calculate the first similarity between the central word vector and the context representation vector. The matrix update module is configured to update the first word vector matrix and the second word vector matrix with at least increasing the first degree of similarity; the first word vector matrix after multiple iterations update is used The target word vector for querying the target word.
根据第三方面,提供了一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序在计算机中执行时,令计算机执行第一方面所描述的方法。According to a third aspect, there is provided a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in the first aspect.
根据第四方面,提供了一种计算设备,包括存储器和处理器,其特征在于,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现第一方面所描述的方法。According to a fourth aspect, there is provided a computing device, including a memory and a processor, characterized in that executable code is stored in the memory, and when the processor executes the executable code, the implementation described in the first aspect is implemented. Methods.
在本说明书实施例披露的上述方法和装置中,通过引入自注意力机制,确定多个上下文词向量的自注意力权重,实现对多个上下文词向量之间的相互影响、内在关联的刻画,进而求取多个上下文词向量的加权向量作为中心词的上下文表示向量,如此,相较于直接求取多个上下文词向量的平均向量作为上下文表示向量,可以提高上下文表示向量的准确度,从而提高更新后的词向量矩阵的准确度,进而提高最终确定出的词嵌入向量的准确度。In the above-mentioned method and device disclosed in the embodiments of this specification, the self-attention weight of multiple context word vectors is determined by introducing a self-attention mechanism, so as to realize the characterization of the mutual influence and internal association between multiple context word vectors. Furthermore, the weighted vector of multiple context word vectors is obtained as the context representation vector of the central word. In this way, compared with directly obtaining the average vector of multiple context word vectors as the context representation vector, the accuracy of the context representation vector can be improved, thereby Improve the accuracy of the updated word vector matrix, thereby improving the accuracy of the finally determined word embedding vector.
附图说明Description of the drawings
为了更清楚地说明本说明书披露的多个实施例的技术方案,下面将对实施例描述中 所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本说明书披露的多个实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。In order to more clearly describe the technical solutions of the multiple embodiments disclosed in this specification, the following will briefly introduce the drawings that need to be used in the description of the embodiments. Obviously, the drawings in the following description are only disclosed in this specification. For the multiple embodiments, those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
图1示出根据一个实施例的词嵌入模型中两个词向量矩阵的示意图;Fig. 1 shows a schematic diagram of two word vector matrices in a word embedding model according to an embodiment;
图2示出根据一个实施例的基于词语序列选取中心词和上下文词的示意图;FIG. 2 shows a schematic diagram of selecting a center word and a context word based on a word sequence according to an embodiment;
图3示出根据一个实施例的基于图2中的选取词和图1中的词向量矩阵训练词嵌入模型的架构示意图;FIG. 3 shows a schematic diagram of the architecture of training a word embedding model based on the selected words in FIG. 2 and the word vector matrix in FIG. 1 according to an embodiment;
图4示出根据一个实施例的训练词向量嵌入模型的方法中的训练语料处理流程示意图;FIG. 4 shows a schematic diagram of a training corpus processing flow in a method for training a word vector embedding model according to an embodiment;
图5示出根据一个实施例的训练词向量嵌入模型的方法中的迭代更新流程示意图;Fig. 5 shows a schematic diagram of an iterative update process in a method for training a word vector embedding model according to an embodiment;
图6示出根据一个实施例的训练词向量嵌入模型的装置结构示意图。Fig. 6 shows a schematic structural diagram of an apparatus for training a word vector embedding model according to an embodiment.
具体实施方式Detailed ways
下面结合附图,对本说明书披露的多个实施例进行描述。In the following, a number of embodiments disclosed in this specification will be described with reference to the accompanying drawings.
本说明书实施例披露一种训练词向量嵌入模型的方法。下面,首先对发明人提出所述方法的发明构思进行介绍,具体如下:The embodiment of this specification discloses a method for training a word vector embedding model. Below, first introduce the inventive concept of the method proposed by the inventor, which is specifically as follows:
词向量算法用于将一个词语映射到一个固定维度的向量上,使得该向量的数值可以表示该词语的语义信息。目前训练词向量的常见框架有两种,分别为Skigram和CBOW(Continuous Bag-of-Words Model,连续词袋模型)。基于Skigram框架确定出的词向量准确度更高,但训练速度会慢很多倍。在一些数据量非常大的场景下,更需要CBOW框架,但基于其确定出的词向量准确度有限。The word vector algorithm is used to map a word to a fixed-dimensional vector, so that the value of the vector can represent the semantic information of the word. At present, there are two common frameworks for training word vectors, namely Skigram and CBOW (Continuous Bag-of-Words Model). The accuracy of word vectors determined based on the Skigram framework is higher, but the training speed will be many times slower. In some scenarios with very large data volumes, the CBOW framework is more needed, but the accuracy of the word vectors determined based on it is limited.
Based on this, the inventors propose a method for training a word vector embedding model that draws on the CBOW framework and introduces a self-attention mechanism, enabling fast word vector training over very large corpora while effectively improving the accuracy of the trained word vectors.

Specifically, the word vector embedding model includes two word vector matrices built for the same set of words, referred to as the first word vector matrix and the second word vector matrix for ease of description. Fig. 1 is a schematic diagram of the two word vector matrices in the word embedding model according to an embodiment, showing a first word vector matrix W_{N*M} and a second word vector matrix C_{N*M} corresponding to N words, N being an integer greater than 1. The method for training the word vector embedding model comprises multiple iterative updates, each of which updates both the first and the second word vector matrix; further, after a predetermined number of iterations, or after the iterations converge, the N word vectors contained in the first word vector matrix as updated in the last iteration are determined as the final word vectors of the N words.

In one embodiment, any one of the multiple iterative updates may include the following. First, a central word and multiple context words are selected from the word sequence corresponding to a training sentence; for example, in the word sequence shown in Fig. 2, central word t and 2b context words are selected, the 2b context words corresponding to the set {context word i | i ∈ [t−b, t+b], i ≠ t}. Next, the first word vector of the central word is obtained from the first word vector matrix as the central word vector, and the second word vectors of the multiple context words are obtained from the second word vector matrix as the multiple context word vectors. Then, a self-attention mechanism is introduced: based on the context word vectors, the context words are given self-attention scores, from which the weight of each context word is obtained, and the context word vectors are weighted and summed with these weights to obtain the context representation vector of the central word. For example, as shown in Fig. 3, 2b self-attention weights are determined from the 2b context word vectors, and the 2b context word vectors are weighted and summed with them to obtain the context representation vector c_t′ of the central word. Finally, a loss is determined based on the central word vector and the context representation vector and used to update the first and second word vector matrices; for example, as shown in Fig. 3, the training loss is computed from the central word vector w_t and the context representation vector c_t′ and used to adjust the two matrices. In this way, the iterative updating of the two word vector matrices is achieved.

With the above method, the self-attention mechanism determines the self-attention weights of the multiple context word vectors, capturing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is then taken as the context representation vector of the central word. Compared with directly taking the average of the context word vectors as the context representation vector, this improves the accuracy of the context representation vector, and hence the accuracy of the updated word vector matrices and of the finally determined word embedding vectors.
The implementation steps of the above method are described below with reference to specific embodiments. The method may be executed by any apparatus, device, system, or server cluster with computing and processing capabilities. The method first processes the training corpus to build a vocabulary, then initializes the corresponding first and second word vector matrices, and determines the word sequences corresponding to the multiple training sentences for the subsequent multiple iterative updates of the two matrices. For ease of understanding, the processing of the training corpus is introduced first, followed by the process of the multiple iterative updates.

Fig. 4 is a schematic flowchart of the training corpus processing in a method for training a word vector embedding model according to an embodiment. As shown in Fig. 4, the processing of the training corpus includes the following steps:

Step S410: obtain a training corpus including multiple training sentences. Step S420: perform word segmentation on each training sentence and, from the segmentation results, obtain the vocabulary corresponding to the training corpus and the word sequence corresponding to each training sentence. Step S430: initialize the first and second word vector matrices according to the vocabulary, where one row or one column of each matrix corresponds to one word in the vocabulary.

These steps are detailed as follows:
First, in step S410, a training corpus including multiple training sentences is obtained.

In one embodiment, a large amount of text may be crawled from websites as the training corpus. In another embodiment, electronic texts of reference books, such as dictionaries, may be obtained as the training corpus. Further, in one embodiment, the training corpus may be split into the multiple training sentences, for example by breaking the text at common punctuation marks such as commas, periods, semicolons, and exclamation marks. In another embodiment, the symbols in a text may be removed and the retained text used as one training sentence. In a specific embodiment, for user posts crawled from a social network platform, symbols such as tags and spaces can be removed and the remaining text used as the corresponding training sentence.

In this way, the multiple training sentences included in the training corpus are obtained. Next, in step S420, word segmentation is performed on the training sentences, and the vocabulary and the multiple word sequences are determined from the segmentation results.

Specifically, word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain conventions. This step can be implemented with existing segmentation methods or tools: the segmentation methods may include forward maximum matching, minimum segmentation, and N-gram-based segmentation, and the segmentation tools may include the THULAC lexical toolkit, the NLPIR segmentation system, and the like.

The segmentation results include the multiple tokens obtained by segmenting each training sentence. From these, the distinct tokens can be identified to build the vocabulary. Meanwhile, segmenting each training sentence yields the corresponding token sequence as its word sequence, giving multiple word sequences corresponding to the multiple training sentences.

In one embodiment, word frequency statistics may be computed over the tokens in the segmentation results to obtain the frequencies of the distinct tokens, and the low-frequency tokens whose frequency falls below a predetermined threshold are removed, the vocabulary being built from the retained tokens. Here, word frequency refers to the number of occurrences of a word, and the predetermined threshold can be set as needed, e.g., 10 or 20. In another embodiment, the tokens may instead be ranked by frequency and the lowest-ranked ones (e.g., the bottom 5 or 10) discarded, with the vocabulary built from the remaining tokens. Correspondingly, the low-frequency words also need to be removed from each token sequence to obtain the corresponding word sequence, yielding the multiple word sequences with low-frequency words removed.

The vocabulary and the multiple word sequences can thus be determined; a sketch of this processing follows below. Note that every word in a word sequence exists in the vocabulary, while a word in the vocabulary may appear in several word sequences, possibly multiple times within the same sequence.
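As an illustrative, non-limiting sketch of steps S410–S420 in Python, the vocabulary construction with low-frequency filtering might be written as follows; the `segment` function stands in for any existing segmentation tool (e.g., a THULAC or NLPIR wrapper) and the `min_freq` threshold is an assumed parameter, neither being prescribed by the embodiments:

```python
from collections import Counter

def build_vocab_and_sequences(sentences, segment, min_freq=10):
    """Segment sentences, drop low-frequency tokens, build vocabulary and word sequences."""
    token_seqs = [segment(s) for s in sentences]       # step S420: segmentation
    freq = Counter(tok for seq in token_seqs for tok in seq)
    # Drop tokens whose frequency is below the threshold; build the vocabulary from the rest.
    vocab = [tok for tok, c in freq.items() if c >= min_freq]
    word2idx = {tok: i for i, tok in enumerate(vocab)}
    # Remove low-frequency words from each sequence as well, as described above.
    word_seqs = [[tok for tok in seq if tok in word2idx] for seq in token_seqs]
    return vocab, word2idx, word_seqs
```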
Then, in step S430, the first and second word vector matrices are initialized according to the vocabulary, where one row or one column of each matrix corresponds to one word in the vocabulary.

Specifically, each word in the vocabulary can be mapped to a vector of fixed dimension as the word vector of that word, so the word vectors corresponding to the vocabulary words jointly form a word vector matrix. Assuming the vocabulary includes N words and each word is mapped to an M-dimensional vector, an N*M matrix is obtained. Referring to Fig. 1, two word vector matrices are built there: the first word vector matrix W_{N*M} and the second word vector matrix C_{N*M}. A word vector can serve as a row vector or a column vector of the matrix; correspondingly, a row or a column of the matrix corresponds to one word in the vocabulary.

The two matrices can be initialized as they are constructed. In one embodiment, a random algorithm may be used to assign initial values to the matrix elements. In another embodiment, specific element values may be designated manually; for example, for W_{N*M} and C_{N*M} shown in Fig. 1, the i-th element of the i-th row (i ∈ [1, N]) may be set to 1 and the remaining elements to 0. The initialized first and second word vector matrices may be identical or different; the two will generally diverge during the subsequent iterative updates. This completes the initialization of the two word vector matrices; a minimal initialization sketch follows.
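As a minimal sketch of step S430, assuming random initialization, a row-per-word layout, and an illustrative dimension and scale (none of which are prescribed by the embodiments):

```python
import numpy as np

def init_matrices(n_words, dim, seed=0):
    """Initialize the first and second word vector matrices W and C (both N*M).

    Row i of each matrix is the word vector of the i-th vocabulary word.
    Random initialization is one of the options described above; the two
    matrices could also be initialized identically or set by hand.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_words, dim))  # first word vector matrix
    C = rng.normal(scale=0.1, size=(n_words, dim))  # second word vector matrix
    return W, C
```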
From the above, based on the steps shown in Fig. 4, the word sequence corresponding to each training sentence is determined from the training corpus, the vocabulary is determined, and the construction and initialization of the first and second word vector matrices are completed.

Further, in the training method for the word vector embedding model, multiple iterative updates can be performed based on the word sequences of the training sentences and the two initialized word vector matrices. Specifically, Fig. 5 is a schematic flowchart of an iterative update in a method for training a word vector embedding model according to an embodiment. As shown in Fig. 5, any one iterative update may include the following steps:

Step S510: from the word sequence corresponding to a training sentence, determine a central word and k context words of the central word, where k is an integer greater than 1. Step S520: determine the central word vector corresponding to the central word according to the first word vector matrix, and determine the k context word vectors corresponding to the k context words according to the second word vector matrix. Step S530: determine k corresponding attention weights based on the similarities of the k context word vectors to one another. Step S540: perform a weighted summation of the k context word vectors with the k attention weights to obtain the context representation vector of the central word. Step S550: compute the first similarity between the central word vector and the context representation vector. Step S560: update the first and second word vector matrices with the goal of at least increasing the first similarity; the first word vector matrix after the multiple iterative updates is used to look up the target word vector of a target word.

These steps are detailed as follows:

First, in step S510, the central word and its k context words are determined from the word sequence corresponding to a training sentence.

In one embodiment, this step may include: sliding a window of preset width along the word sequence, determining the word at the window center at any given moment as the central word, and taking the words within the window other than the central word as the k context words. In a specific embodiment, as shown in Fig. 2, the window width may be set to 2b+1; accordingly, at any given moment, the (b+1)-th of the 2b+1 words, located at the window center, is taken as the central word and the other 2b words as the k (=2b) context words. With this sliding-window extraction, each word of the sequence can serve as the central word in turn. Where words are missing from the window, e.g., when the window centers on the first word of the sequence so that no preceding words exist in the sequence, preset vectors can be used as a supplement; for instance, b preset vectors serve as the word vectors of the b preceding words of the first word, and other cases are handled analogously. In this way, the central word and its corresponding k context words are determined by sliding-window extraction, as sketched below.

In another embodiment, a word may be selected at random from the word sequence as the central word, and k neighboring words selected as its k context words. For example, with k = 3, after the central word is selected, the one word immediately preceding it and the two words immediately following it may be taken as its three context words.
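The sliding-window extraction of step S510 might be sketched as follows, working on a sequence of word indices; the padding index `pad` is this sketch's stand-in for the preset supplementary vectors described above and is an assumption, not part of the embodiments:

```python
def sliding_windows(seq, b, pad=-1):
    """Yield (center, contexts) pairs over a word-index sequence.

    Window width is 2b+1: the word at the window center is the central
    word and the other 2b words are its context words. Positions outside
    the sequence are filled with the padding index `pad`.
    """
    n = len(seq)
    for t in range(n):
        contexts = [seq[i] if 0 <= i < n else pad
                    for i in range(t - b, t + b + 1) if i != t]
        yield seq[t], contexts
```

For instance, `list(sliding_windows([4, 7, 2], b=1))` yields `(4, [-1, 7])`, `(7, [4, 2])`, and `(2, [7, -1])`, with `-1` marking padded positions.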
Thus the central word and its k context words are determined from the word sequence. Next, in step S520, the central word vector corresponding to the central word is determined according to the first word vector matrix, and the k context word vectors corresponding to the k context words are determined according to the second word vector matrix.

Specifically, as described above, the first and second word vector matrices are initialized based on the vocabulary, which establishes a first mapping between the vocabulary words and the first word vectors in the first matrix, and a second mapping between the vocabulary words and the second word vectors in the second matrix. Since the central word and the context words are all vocabulary words, the first word vector corresponding to the central word can be determined from the first mapping as the central word vector, and the k second word vectors corresponding to the k context words can be determined from the second mapping as the k context word vectors. According to one embodiment, the central word is looked up in the vocabulary shown in Fig. 1 to determine its index therein (e.g., index 2), and the corresponding first word vector (e.g., vector w_2) is found in the first word vector matrix by that index. Likewise, the k context words are looked up in the vocabulary to determine their k indices (e.g., including index N), and the corresponding k second word vectors (e.g., including vector c_N) are found in the second word vector matrix by those indices.

In this way, the central word vector corresponding to the central word and the k context word vectors corresponding to the k context words are determined. Then, in step S530, the k corresponding attention weights are determined based on the similarities of the k context word vectors to one another.

In one embodiment, this step may include: first, determining a k-order similarity square matrix based on the k context word vectors; next, normalizing each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix; and then averaging each column of the k-order self-attention score square matrix to obtain the k self-attention weights.

Further, in the k-order similarity square matrix, the element in row i, column j represents the similarity between the i-th context word vector and the j-th context word vector, where i and j are positive integers not greater than k. In specific embodiments, the dot product, Euclidean distance, or cosine distance between the two word vectors may be computed as their similarity and taken as the element in row i, column j of the k-order similarity square matrix.

For the normalization, in a specific embodiment the softmax function may be applied to each row of the k-order similarity square matrix. This yields the k-order self-attention score square matrix, and the average of the scores in each column is then taken as the corresponding self-attention weight. Each of the k columns corresponds to one context word vector, so k self-attention weights corresponding to the k context word vectors are obtained.
In another embodiment, this step may include: first, computing the similarity between every pair of the k context word vectors, which yields C(k,2) = k(k−1)/2 similarities, where C(k,2) denotes the number of groups obtained by taking any 2 of k distinct elements; next, for each context word vector, computing the sum of its similarities with the other word vectors, giving k sum values; and then normalizing the k sum values to obtain the k self-attention weights. For the similarity computation and the normalization, see the related descriptions above, which are not repeated here. A sketch of this alternative follows below.
From the above, the k self-attention weights corresponding to the k context word vectors are obtained. Next, in step S540, a weighted summation of the k context word vectors is performed with the k attention weights to obtain the context representation vector of the central word. In one embodiment, referring to Fig. 3, which shows 2b context word vectors and 2b self-attention weights, the context representation vector of the central word can be computed as:
c_t′ = ∑_{i ∈ [t−b, t+b] − {t}} a_i · c_i    (1)
Note that the subscript i in formula (1) refers to the positions of the words within the sliding window of Fig. 2, i.e., it ranges over the k context words, and differs from the vocabulary indices of Fig. 1. In addition, t is the index of the central word, and i ∈ [t−b, t+b] − {t} means that i points in turn to every context word other than the central word; c_i denotes the i-th of the k context word vectors, and a_i the corresponding i-th self-attention weight. A numpy sketch of steps S530 and S540 follows below.
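Combining steps S530 and S540, a minimal numpy sketch of the matrix-based weighting and the weighted summation of formula (1) might read as follows, again assuming dot-product similarity and row-wise softmax; the function name is illustrative:

```python
import numpy as np

def context_representation(ctx_vecs):
    """Compute the k attention weights and the context representation vector c_t'.

    ctx_vecs: array of shape (k, M) holding the k context word vectors.
    """
    sim = ctx_vecs @ ctx_vecs.T                  # k x k similarity matrix (dot products)
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilize the row-wise softmax
    scores = np.exp(sim)
    scores /= scores.sum(axis=1, keepdims=True)  # k x k self-attention score matrix
    weights = scores.mean(axis=0)                # column averages -> k weights
    return weights, weights @ ctx_vecs           # formula (1): weighted sum
```

Because each row of the score matrix sums to 1, the column averages also sum to 1, so the weights form a proper convex combination of the context word vectors.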
Thus, through the weighted summation, the context representation vector of the central word is obtained. Then, in step S550, the first similarity between the central word vector and the context representation vector is computed. In specific embodiments, the dot product, cosine distance, or Euclidean distance between the central word vector and the context representation vector may be computed as the first similarity. On this basis, in step S560, the first and second word vector matrices are updated with the goal of at least increasing the first similarity.

In one embodiment, this step may include: randomly drawing a first word vector from the first word vector matrix, and computing a second similarity between the drawn first word vector and the context representation vector; then updating the first and second word vector matrices with the goal of increasing the first similarity and decreasing the second similarity. This further improves the accuracy of the updated word vector matrices. The second similarity can be computed as described above for similarities between vectors, which is not repeated here.

In a specific embodiment, the training loss can be computed with the following formula:
L = −log σ(w_t · c_t′) − λ log σ(−w_r · c_t′)    (2)
In formula (2), L denotes the training loss, w_t the central word vector, c_t′ the context representation vector of the central word, and w_t · c_t′ the dot product of w_t and c_t′; w_r denotes the first word vector drawn at random above, and w_r · c_t′ the dot product of w_r and c_t′; λ is a hyperparameter, which may for example be set to 0.01; and σ denotes an activation function commonly used in neural networks, such as the tanh or sigmoid function. In this way, the training loss is determined based on the first and second similarities, and the two word vector matrices are updated according to the training loss.

Updating the two word vector matrices according to the training loss may include: determining, from the training loss, the loss gradients of the relevant elements in the two matrices, and then subtracting from the current value of each relevant element the product of its loss gradient and the learning step size (a hyperparameter, e.g., set to 0.05) to obtain the updated element values, thereby updating the two matrices. A sketch combining the loss and this update follows below.
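Under the assumptions that σ is the sigmoid function, that plain SGD applies the gradients, and, for simplicity, that the attention weights are treated as constants during backpropagation (the full method would also propagate through the self-attention scoring), one update of formula (2) might be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(W, C, t, r, ctx_idx, weights, lam=0.01, lr=0.05):
    """One update of formula (2): t is the central word's index, r the
    randomly drawn word's index, ctx_idx the k context word indices, and
    weights their k attention weights."""
    c = weights @ C[ctx_idx]                    # context representation c_t', formula (1)
    loss = -np.log(sigmoid(W[t] @ c)) - lam * np.log(sigmoid(-W[r] @ c))
    g_pos = sigmoid(W[t] @ c) - 1.0             # dL/d(w_t . c_t')
    g_neg = lam * sigmoid(W[r] @ c)             # dL/d(w_r . c_t')
    grad_c = g_pos * W[t] + g_neg * W[r]        # dL/d(c_t')
    W[t] = W[t] - lr * g_pos * c                # element value minus gradient * step size
    W[r] = W[r] - lr * g_neg * c
    # c_t' = sum_i a_i * c_i, so each context row receives a_i * grad_c;
    # the attention weights a_i are held fixed in this simplified sketch.
    C[ctx_idx] -= lr * np.outer(weights, grad_c)
    return loss
```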
The above completes the update of the first and second word vector matrices. Note that steps S510 to S560 describe a single iterative update; by repeating them, the multiple iterative updates are achieved, and the first word vector matrix after the multiple iterative updates is then taken as the word vector lookup matrix corresponding to the vocabulary, used to look up the target word vector of a target word. In a specific embodiment, upon receiving a user's query for a target word, a search engine can look up the target word vector corresponding to the target word in the lookup matrix and then, based on the target word vector, determine relevant content from a content database to return to the user; a minimal lookup sketch follows.
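As a trivial illustration of the final lookup, assuming the word-to-index mapping built during corpus processing:

```python
def query_vector(word, word2idx, W):
    """Look up a target word's vector in the trained first word vector matrix W."""
    return W[word2idx[word]]
```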
In summary, with the method for training a word vector embedding model disclosed in the embodiments of this specification, a self-attention mechanism is introduced to determine the self-attention weights of multiple context word vectors, capturing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is taken as the context representation vector of the central word. Compared with directly taking the average of the context word vectors as the context representation vector, this improves the accuracy of the context representation vector, and hence the accuracy of the updated word vector matrices and of the finally determined word embedding vectors.

Corresponding to the training method above, the embodiments of this specification also disclose a training apparatus. Specifically, Fig. 6 is a schematic structural diagram of an apparatus for training a word vector embedding model according to an embodiment, the word vector embedding model including a first word vector matrix and a second word vector matrix. The apparatus can be implemented by any computing node or server cluster with computing and processing capabilities.
As shown in Fig. 6, the apparatus 600 includes an iterative update unit 610 configured to perform multiple iterative updates, the iterative update unit performing any one iterative update through the following modules:

A word determination module 611, configured to determine, from the word sequence corresponding to a training sentence, a central word and k context words of the central word, where k is an integer greater than 1.

A word vector determination module 612, configured to determine the central word vector corresponding to the central word according to the first word vector matrix, and to determine the k context word vectors corresponding to the k context words according to the second word vector matrix.

A weight determination module 613, configured to determine k corresponding attention weights based on the similarities of the k context word vectors to one another.

A weighted summation module 614, configured to perform a weighted summation of the k context word vectors with the k attention weights to obtain the context representation vector of the central word.

A similarity computation module 615, configured to compute the first similarity between the central word vector and the context representation vector.

A matrix update module 616, configured to update the first and second word vector matrices with the goal of at least increasing the first similarity; the first word vector matrix after the multiple iterative updates is used to look up the target word vector of a target word.
In one embodiment, the apparatus 600 further includes: a corpus acquisition unit 620, configured to obtain a training corpus including multiple training sentences; a word segmentation unit 630, configured to segment each training sentence and, from the segmentation results, obtain the vocabulary corresponding to the training corpus and the word sequence corresponding to each training sentence; and an initialization unit 640, configured to initialize the first and second word vector matrices according to the vocabulary, where one row or one column of each matrix corresponds to one word in the vocabulary.

In a specific embodiment, the word segmentation unit 630 is specifically configured to: compute word frequency statistics from the segmentation results to obtain the frequencies of multiple distinct tokens, and remove from them the low-frequency tokens whose frequency is below a predetermined threshold to obtain the vocabulary.

In one embodiment, the word determination module 611 is specifically configured to: slide a window of preset width along the word sequence, determine the word at the window center at any given moment as the central word, and take the words within the window other than the central word as the k context words.

In one embodiment, the weight determination module 613 is specifically configured to: determine a k-order similarity square matrix based on the k context word vectors, where the element in row i, column j represents the similarity between the i-th and j-th context word vectors, i and j being positive integers not greater than k; normalize each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix; and average each column of the score matrix to obtain the k self-attention weights.

In one embodiment, the similarity computation module 615 is specifically configured to: compute the dot product of the central word vector and the context representation vector as the first similarity.

In one embodiment, the matrix update module 616 is specifically configured to: randomly draw a first word vector from the first word vector matrix; compute a second similarity between the drawn first word vector and the context representation vector; and update the first and second word vector matrices with the goal of increasing the first similarity and decreasing the second similarity.

In summary, with the apparatus for training a word vector embedding model disclosed in the embodiments of this specification, a self-attention mechanism is introduced to determine the self-attention weights of multiple context word vectors, capturing the mutual influence and internal associations among them, and the weighted sum of the context word vectors is taken as the context representation vector of the central word. Compared with directly taking the average of the context word vectors as the context representation vector, this improves the accuracy of the context representation vector, and hence the accuracy of the updated word vector matrices and of the finally determined word embedding vectors.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described with reference to Fig. 4 or Fig. 5.

According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor; the memory stores executable code, and the processor, when executing the executable code, implements the method described with reference to Fig. 4 or Fig. 5.

Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the embodiments disclosed in this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of the embodiments disclosed in this specification. It should be understood that the foregoing is merely a set of specific implementations of those embodiments and is not intended to limit their scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the embodiments shall fall within the scope of protection of the embodiments disclosed in this specification.

Claims (15)

  1. A method for training a word vector embedding model, the word vector embedding model comprising a first word vector matrix and a second word vector matrix, the method comprising multiple iterative updates, wherein any one iterative update comprises:
    determining, from a word sequence corresponding to a training sentence, a central word and k context words of the central word, wherein k is an integer greater than 1;
    determining, according to the first word vector matrix, a central word vector corresponding to the central word; and determining, according to the second word vector matrix, k context word vectors corresponding to the k context words;
    determining, based on similarities of the k context word vectors to one another, k corresponding attention weights;
    performing, with the k attention weights, a weighted summation of the k context word vectors to obtain a context representation vector of the central word;
    computing a first similarity between the central word vector and the context representation vector;
    updating the first word vector matrix and the second word vector matrix with a goal of at least increasing the first similarity, wherein the first word vector matrix after the multiple iterative updates is used to look up a target word vector of a target word.
  2. The method according to claim 1, wherein before the multiple iterative updates the method further comprises:
    obtaining a training corpus comprising multiple training sentences;
    performing word segmentation on each training sentence, and obtaining, from the segmentation results, a vocabulary corresponding to the training corpus and the word sequence corresponding to each training sentence;
    initializing the first word vector matrix and the second word vector matrix according to the vocabulary, wherein one row or one column of each matrix corresponds to one word in the vocabulary.
  3. The method according to claim 2, wherein obtaining the vocabulary corresponding to the training corpus from the segmentation results comprises:
    computing word frequency statistics from the segmentation results to obtain frequencies of multiple distinct tokens;
    removing, from the multiple distinct tokens, low-frequency tokens whose frequency is below a predetermined threshold, to obtain the vocabulary.
  4. The method according to claim 1, wherein determining the central word and the k context words of the central word from the word sequence corresponding to the training sentence comprises:
    sliding a window of preset width along the word sequence, determining the word corresponding to the window center at any given moment as the central word, and taking the words within the window other than the central word as the k context words.
  5. The method according to claim 1, wherein determining the k corresponding attention weights based on the similarities of the k context word vectors to one another comprises:
    determining, based on the k context word vectors, a k-order similarity square matrix, wherein the element in row i, column j represents the similarity between the i-th context word vector and the j-th context word vector, i and j being positive integers not greater than k;
    normalizing each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix;
    averaging each column of the k-order self-attention score square matrix to obtain the k self-attention weights.
  6. The method according to claim 5, wherein determining the k-order similarity square matrix based on the k context word vectors comprises:
    computing the dot product of the i-th context word vector and the j-th context word vector as the element in row i, column j of the k-order similarity square matrix.
  7. The method according to claim 5, wherein normalizing each row of the k-order similarity square matrix to obtain the k-order self-attention score square matrix comprises:
    normalizing each row with a softmax function to obtain the k-order self-attention score square matrix.
  8. The method according to claim 1, wherein computing the first similarity between the central word vector and the context representation vector comprises:
    computing the dot product of the central word vector and the context representation vector as the first similarity.
  9. The method according to claim 1, wherein updating the first word vector matrix and the second word vector matrix with the goal of at least increasing the first similarity comprises:
    randomly drawing a first word vector from the first word vector matrix;
    computing a second similarity between the drawn first word vector and the context representation vector;
    updating the first word vector matrix and the second word vector matrix with a goal of increasing the first similarity and decreasing the second similarity.
  10. An apparatus for training a word vector embedding model, the word vector embedding model comprising a first word vector matrix and a second word vector matrix, the apparatus comprising an iterative update unit configured to perform multiple iterative updates, the iterative update unit performing any one iterative update through the following modules:
    a word determination module, configured to determine, from a word sequence corresponding to a training sentence, a central word and k context words of the central word, wherein k is an integer greater than 1;
    a word vector determination module, configured to determine, according to the first word vector matrix, a central word vector corresponding to the central word, and to determine, according to the second word vector matrix, k context word vectors corresponding to the k context words;
    a weight determination module, configured to determine, based on similarities of the k context word vectors to one another, k corresponding attention weights;
    a weighted summation module, configured to perform, with the k attention weights, a weighted summation of the k context word vectors to obtain a context representation vector of the central word;
    a similarity computation module, configured to compute a first similarity between the central word vector and the context representation vector;
    a matrix update module, configured to update the first word vector matrix and the second word vector matrix with a goal of at least increasing the first similarity, wherein the first word vector matrix after the multiple iterative updates is used to look up a target word vector of a target word.
  11. The apparatus according to claim 10, further comprising:
    a corpus acquisition unit, configured to obtain a training corpus comprising multiple training sentences;
    a word segmentation unit, configured to perform word segmentation on each training sentence and obtain, from the segmentation results, a vocabulary corresponding to the training corpus and the word sequence corresponding to each training sentence;
    an initialization unit, configured to initialize the first word vector matrix and the second word vector matrix according to the vocabulary, wherein one row or one column of each matrix corresponds to one word in the vocabulary.
  12. The apparatus according to claim 10, wherein the weight determination module is specifically configured to:
    determine, based on the k context word vectors, a k-order similarity square matrix, wherein the element in row i, column j represents the similarity between the i-th context word vector and the j-th context word vector, i and j being positive integers not greater than k;
    normalize each row of the k-order similarity square matrix to obtain a k-order self-attention score square matrix;
    average each column of the k-order self-attention score square matrix to obtain the k self-attention weights.
  13. The apparatus according to claim 10, wherein the matrix update module is specifically configured to:
    randomly draw a first word vector from the first word vector matrix;
    compute a second similarity between the drawn first word vector and the context representation vector;
    update the first word vector matrix and the second word vector matrix with a goal of increasing the first similarity and decreasing the second similarity.
  14. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-9.
  15. A computing device, comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1-9.