CN110852063B - Word vector generation method and device based on bidirectional LSTM neural network


Info

Publication number
CN110852063B
CN110852063B (application CN201911045081.6A)
Authority
CN
China
Prior art keywords
neural network
lstm neural
bidirectional lstm
word
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911045081.6A
Other languages
Chinese (zh)
Other versions
CN110852063A (en)
Inventor
张睦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iol Wuhan Information Technology Co ltd
Original Assignee
Iol Wuhan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iol Wuhan Information Technology Co ltd filed Critical Iol Wuhan Information Technology Co ltd
Priority to CN201911045081.6A
Publication of CN110852063A
Application granted
Publication of CN110852063B



Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology › G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/04 Architecture, e.g. interconnection topology › G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: climate change mitigation technologies in ICT)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a word vector generation method and device based on a bidirectional LSTM neural network. The method comprises: training a second bidirectional LSTM neural network on the initial word vectors of the corpora of a plurality of translators; determining a target translator from the plurality of translators and training a word vector model on the editing behavior data corresponding to the target translator's corpus; inputting the corpora of the plurality of translators into the trained word vector model and obtaining intermediate word vectors from the output of the first bidirectional LSTM neural network; adjusting the second bidirectional LSTM neural network with the intermediate word vectors of the target translator's corpus; and inputting the intermediate word vectors of the corpora of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining final word vectors from the prediction vectors generated by the second bidirectional LSTM neural network. The word vectors generated by this embodiment can be widely applied to a variety of natural language processing tasks.

Description

Word vector generation method and device based on bidirectional LSTM neural network
Technical Field
The invention relates to the technical field of language models, in particular to a word vector generation method and device based on a bidirectional LSTM neural network.
Background
Words are the basic semantic units of a language, and representing each word as a vector (a word vector) for use as model input is a fundamental step in modern natural language processing. Traditional methods typically obtain word vectors through techniques such as one-hot encoding, PMI or PPMI matrices, co-occurrence matrices, and SVD matrix factorization.
Since around 2012, deep learning has become increasingly popular in natural language processing, and neural word vector models such as Skip-gram, CBOW, and GloVe have been proposed. These newer word representations better capture the semantic relatedness between words and have driven considerable progress in task directions such as entity recognition, speech recognition, text classification, language modeling, and intelligent question answering.
However, natural language is polysemous: the English word "tie", for example, has more than ten distinct senses, so assigning a single vector per word cannot adequately represent all of its meanings. Moreover, in translation tasks, different translators often produce different translations of the same source text. This individuality of phrasing stems from each translator's own characteristics, including educational background, translation experience, and age. How to incorporate these personal factors into a model in order to better assist a translator's translation is likewise a very challenging open problem.
Disclosure of Invention
The embodiment of the invention provides a word vector generation method and device based on a bidirectional LSTM neural network, which overcome, or at least partially solve, the above problems.
In a first aspect, an embodiment of the present invention provides a word vector generating method based on a bidirectional LSTM neural network, including:
training a second bidirectional LSTM neural network according to the initial word vectors of the corpus of a plurality of translators;
determining a target translator from a plurality of translators, and training a word vector model according to editing behavior data corresponding to the corpus of the target translator;
inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network;
adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator;
inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network;
the word vector model comprises the first bidirectional LSTM neural network and the second bidirectional LSTM neural network, wherein the output layer of the first bidirectional LSTM neural network is connected with the input layer of the second bidirectional LSTM neural network.
Preferably, the training of the second bidirectional LSTM neural network according to the corpus initial word vectors of a plurality of translators is specifically:
determining word vectors of the linguistic data of the plurality of translators from a preset initial word vector library, and taking the word vectors as initial word vectors;
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the initial word vectors of the corpuses of the plurality of translators.
Preferably, the training the word vector model according to the editing behavior data corresponding to the corpus of the target translator specifically includes:
characterizing each editing behavior data by using initialized character vectors in one-to-one correspondence;
training the word vector model from the positive sequence and the negative sequence of the corpus by utilizing the character vector of the editing behavior data corresponding to the corpus of the target translator so as to obtain the trained character vector.
Preferably, the intermediate word vector is obtained according to the output of the first bidirectional LSTM neural network, specifically:
inputting the corpus of the plurality of translators from positive sequences to the trained word vector model, and taking the output of a first bidirectional LSTM neural network as a first word representation;
inputting the corpus of the plurality of translators from the reverse order to the trained word vector model, and taking the output of the first bidirectional LSTM neural network as a second word representation;
and summing the first word representation and the second word representation to obtain the intermediate word vector.
Preferably, the adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator specifically includes:
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the intermediate word vector of the corpus of the target translator.
Preferably, the training the bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the initial word vectors of the corpora of the plurality of translators specifically includes:
training the second bidirectional LSTM neural network by taking the previous word example in the corpus of the plurality of translators as a sample and taking the probability of the subsequent word example of the previous word example as a sample label;
and then taking the following word cases in the corpus of the plurality of translators as samples, taking the probability of the preceding word cases of the following word cases as sample labels, and training the second bidirectional LSTM neural network again.
Preferably, the final word vector is obtained according to the vector for prediction generated by the second bidirectional LSTM neural network, specifically:
and obtaining a vector for positive sequence prediction and a vector for negative sequence prediction generated by the second bidirectional LSTM neural network, and summing the vector for positive sequence prediction and the vector for negative sequence prediction to obtain a final word vector.
In a second aspect, an embodiment of the present invention provides a word vector generating apparatus based on a bidirectional LSTM neural network, including:
the full training module is used for training the second bidirectional LSTM neural network according to the initial word vectors of the linguistic data of a plurality of translators;
the quantitative training module is used for determining a target translator from a plurality of translators and training a word vector model according to editing behavior data corresponding to the corpus of the target translator;
the intermediate quantity generation module is used for inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network;
the adjusting module is used for adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator;
the output module is used for inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network;
the word vector model comprises the first bidirectional LSTM neural network and the second bidirectional LSTM neural network, wherein the output layer of the first bidirectional LSTM neural network is connected with the input layer of the second bidirectional LSTM neural network.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when the program is executed.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.
According to the word vector generation method and device based on the bidirectional LSTM neural network provided by the embodiments of the invention, the second bidirectional LSTM neural network is first trained on the corpora of a plurality of translators (the full sample set) so that it acquires a generic translation style. The editing behavior data of the target translator is then used as a representation of that translator's manner of expression, and the word vector model is trained on this editing behavior data, so that a small number of samples suffice to make the word vector model characterize the target translator's translation style. Next, the full sample set is input into the trained word vector model and the output of the first bidirectional LSTM neural network is taken as intermediate word vectors, which capture both contextual semantics and the target translator's style more accurately than the original word vectors. The second bidirectional LSTM neural network is then adjusted with the intermediate word vectors of the target translator's corpus so that it characterizes the target translator's style more accurately. Finally, the full sample set is input into the adjusted second bidirectional LSTM neural network, yielding a large number of word vectors that accurately represent contextual semantics while better matching the target translator's translation style. The word vectors generated by the embodiments of the invention can be widely applied to various natural language processing tasks (any model task that takes word vectors as input).
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a word vector generation method based on a bidirectional LSTM neural network according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of training a second bidirectional LSTM neural network according to a positive sequence of corpus according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of training a second bidirectional LSTM neural network according to the reverse order of corpus according to an embodiment of the invention;
fig. 4 is a schematic structural diagram of a word vector generating device based on a bidirectional LSTM neural network according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiments of the invention are applicable to languages composed of letters, such as English, French, German, and Spanish, and also to languages that are not alphabetic but can conveniently be mapped to letters, such as Chinese (which can be mapped to Pinyin) and Japanese (which can be mapped to Romaji). For convenience of description, the following embodiments mainly use English scenarios to explain the schemes of the embodiments of the invention.
Fig. 1 is a schematic flow chart of a word vector generation method based on a bidirectional LSTM neural network according to an embodiment of the invention. The execution body of the flow includes at least one of the following devices: personal computers, midrange computers, computer clusters, mobile phones, tablet computers, smart wearable devices, in-vehicle computers, and the like.
The flow in fig. 1 may include the steps of:
s101, training a second bidirectional LSTM neural network according to initial word vectors of corpus of a plurality of translators.
In the embodiments of the invention, a corpus refers to language material, the basic unit from which a corpus collection is built, and a translator is a person who translates a corpus in one language into a corpus in another language. It will be appreciated that different translators may translate the same corpus in different styles, so collecting the corpora of a plurality of translators yields translation results in a variety of styles, and training on translation results of many styles together yields a model without a pronounced translation style, i.e., with a generic translation style. In the embodiments of the invention, a translator's corpus refers to translation results in the same target language, for example the English results of a Chinese-to-English translation task.
LSTM (Long Short-Term Memory) networks are suited to processing and predicting events separated by long intervals and delays in a time series; for this reason, LSTM neural networks outperform ordinary RNNs and are flexibly applied in fields such as word vector generation and text translation. The embodiments of the invention exploit the memory of the bidirectional LSTM neural network, training the second bidirectional LSTM neural network on the initial word vectors and contextual relations of the corpora of a plurality of translators, thereby obtaining a second bidirectional LSTM neural network with a generic translation style.
In the embodiments of the invention, the initial word vectors are word vectors trained on a general-purpose corpus; their purpose is to provide a feature vector representing the general meaning of each token. Specifically, the latest Wikipedia English monolingual corpus can be downloaded and tokenized, and English word vectors can then be trained with the Skip-Gram algorithm or similar algorithms, where some important hyperparameters can be set as follows: the word vector dimension is 300 and the context window is 5.
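A minimal sketch of this initialization step follows, assuming the gensim library; the corpus file name and the min_count cutoff are illustrative assumptions not taken from the patent:

```python
# Minimal sketch: training the initial word vectors with Skip-Gram (gensim).
# Assumes "wiki_en_tokenized.txt" contains one tokenized sentence per line.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("wiki_en_tokenized.txt")
model = Word2Vec(
    sentences,
    vector_size=300,  # word vector dimension, as set in the embodiment
    window=5,         # context window, as set in the embodiment
    sg=1,             # sg=1 selects the Skip-Gram algorithm
    min_count=5,      # assumed frequency cutoff, not specified in the patent
    workers=4,
)
model.wv.save("initial_word_vectors.kv")  # serves as the initial word vector library
```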
It will be appreciated that the training of the second bidirectional LSTM neural network proceeds in a positive-sequence direction and a negative-sequence direction. It can be summarized as follows: the probability of the nth token is predicted from the initial word vectors of the first n-1 tokens of the corpus; when the probability meets a preset threshold or the samples are exhausted, the probability of the nth token is predicted from the initial word vectors of the n-1 tokens that follow it; and when that probability meets the preset threshold or the samples are exhausted, the second bidirectional LSTM neural network stops training.
S102, determining a target translator from a plurality of translators, and training a word vector model according to editing behavior data corresponding to the corpus of the target translator.
The word vector model of the embodiments of the invention comprises a first bidirectional LSTM neural network and a second bidirectional LSTM neural network, where the output layer of the first bidirectional LSTM neural network is connected to the input layer of the second. The first bidirectional LSTM neural network is itself also a bidirectional LSTM neural network. Placing the first bidirectional LSTM neural network in front of the second serves four purposes (a minimal sketch of the resulting architecture follows this list):
In the first aspect, it exploits the accurate feature extraction of a bidirectional LSTM neural network to fully mine the contextual semantics of each sample, providing more accurate feature inputs to the subsequent second bidirectional LSTM neural network.
In the second aspect, since the output of the first bidirectional LSTM neural network is used directly as the input of the second, training the first requires no additional labels (labels beyond those needed to train the second), which reduces the complexity of model training.
In the third aspect, since the target translator's corpus is only a fraction of the corpora of all translators, training the word vector model on this subset reduces the amount of training data and the time spent training.
In the fourth aspect, the first and second bidirectional LSTM neural networks are trained jointly, so the whole word vector model absorbs the target translator's translation style; the output of the first network, having absorbed that style, then serves as the intermediate word vector that subsequently replaces the original word vector.
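The following is a minimal sketch of this two-stage architecture; it is a hypothetical PyTorch rendering in which all layer sizes and the vocabulary size are illustrative assumptions, since the patent does not prescribe concrete dimensions:

```python
# Minimal sketch of the word vector model: two cascaded bidirectional LSTMs,
# with the output of the first feeding the input of the second.
# Dimensions and vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

class WordVectorModel(nn.Module):
    def __init__(self, input_dim=300, hidden_dim=300, vocab_size=50000):
        super().__init__()
        # First bidirectional LSTM: its output is later used as the intermediate word vector.
        self.first_lstm = nn.LSTM(input_dim, hidden_dim,
                                  bidirectional=True, batch_first=True)
        # Second bidirectional LSTM: consumes the first network's output directly,
        # so training the first needs no extra labels.
        self.second_lstm = nn.LSTM(2 * hidden_dim, hidden_dim,
                                   bidirectional=True, batch_first=True)
        # Output layer mapping the vector used for prediction to token scores.
        self.classifier = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, x):                        # x: (batch, seq_len, input_dim)
        intermediate, _ = self.first_lstm(x)     # intermediate word vectors
        prediction_vec, _ = self.second_lstm(intermediate)  # vectors used for prediction
        logits = self.classifier(prediction_vec)            # scores over the vocabulary
        return logits, intermediate, prediction_vec
```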
It should be noted that the embodiments of the invention do not limit the rules for selecting the target translator; in practice, the target translator may be a translator with high translation quality and a recognizable translation style.
A translator's editing behavior data refers to the editing actions the translator generates while translating. The data is recorded per token of a (tokenized) sentence: the sequence of keystrokes the user types on the keyboard is recorded in chronological order, and each such sequence produces one token of the sentence.
For example, consider an English translation result (corpus): "The skill building is important." The tokens produced by tokenization are "The|skill|building|is|important|.", where "|" denotes the token separator.
For the token "The", the key sequence is: T -> delete -> T -> h -> e;
for the token "skill", the key sequence is: s -> k -> i -> l -> s -> delete;
for the token "building", the key sequence is: d -> e -> v -> e -> delete -> b -> u -> i -> l -> d -> d -> i -> n -> g -> left -> delete;
for the token "is", the key sequence is: i -> s;
for the token "important", the key sequence is: i -> m -> p -> o -> r -> a -> n -> t -> left -> t;
for the token ".", the key sequence is: . -> delete -> .
The "->" in these key sequences has no intrinsic meaning and is used only to describe the input process conveniently. The editing behavior data corresponding to a corpus is thus the set of the key sequences of all of its tokens.
Through the word vector model trained in step S102, on the one hand, the second bidirectional LSTM neural network, having already fused the semantics of various translation styles, further strengthens the semantics of the target translator's translation style; on the other hand, the first bidirectional LSTM neural network also absorbs the semantics of the target translator's translation style.
S103, inputting the linguistic data of the plurality of translators into the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network.
It should be noted that when the corpora of the plurality of translators are input into the trained word vector model, the output of the first bidirectional LSTM neural network embodies the target translator's translation style, and the intermediate word vectors are obtained from this output.
S104, adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator.
Since the target translator is one of the "plurality of translators", obtaining the intermediate word vectors of the corpora of the plurality of translators in step S103 also, in fact, yields the intermediate word vectors of the target translator's corpus. Because the first bidirectional LSTM neural network of the word vector model trained in step S102 characterizes the target translator's translation style, adjusting the second bidirectional LSTM neural network with the intermediate word vectors of the target translator's corpus enables it to characterize that style more accurately; and because the target translator's corpus is small, the computation required for this fine-tuning is correspondingly small.
S105, inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network.
It will be appreciated that the output of the second bidirectional LSTM neural network is a probability rather than a feature vector; before outputting the probability, however, the network generates a vector used for prediction (of that probability), and it is this vector that is used.
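A minimal sketch of extracting these prediction vectors follows, reusing the hypothetical WordVectorModel above; the stand-in intermediate vectors and the re-alignment of the reverse pass are assumptions:

```python
# Minimal sketch: final word vectors from the second LSTM's prediction vectors,
# summing the positive-order and negative-order passes.
import torch

model = WordVectorModel()                      # hypothetical model sketched earlier
intermediate = torch.randn(1, 6, 600)          # stand-in intermediate word vectors
intermediate_rev = torch.flip(intermediate, dims=[1])  # same sentence, reverse order

with torch.no_grad():
    pred_fwd, _ = model.second_lstm(intermediate)      # positive-sequence prediction vectors
    pred_rev, _ = model.second_lstm(intermediate_rev)  # negative-sequence prediction vectors

# Re-align the reverse pass and sum to obtain the final word vectors.
final_word_vectors = pred_fwd + torch.flip(pred_rev, dims=[1])
```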
According to the embodiment of the invention, the second bidirectional LSTM neural network is trained on the corpora of a plurality of translators (the full sample set) so that it acquires a generic translation style. The editing behavior data of the target translator is used as a representation of that translator's manner of expression, and the word vector model is trained on this editing behavior data, so that a small number of samples suffice to make the word vector model characterize the target translator's translation style. The full sample set is then input into the trained word vector model, and the output of the first bidirectional LSTM neural network is taken as the intermediate word vectors, which capture both contextual semantics and the target translator's style more accurately than the original word vectors. The second bidirectional LSTM neural network is then adjusted with the intermediate word vectors of the target translator's corpus so that it characterizes the target translator's style more accurately. Finally, the full sample set is input into the adjusted second bidirectional LSTM neural network, yielding a large number of word vectors that accurately represent contextual semantics while better matching the target translator's translation style. The word vectors generated by the embodiments of the invention can be widely applied to various natural language processing tasks (any model task that takes word vectors as input).
Based on the foregoing embodiments, as an optional embodiment, the training the second bidirectional LSTM neural network according to the corpus initial word vectors of the multiple translators specifically includes:
determining word vectors of the linguistic data of the plurality of translators from a preset initial word vector library, and taking the word vectors as initial word vectors;
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the initial word vectors of the corpuses of the plurality of translators.
Specifically, in the embodiments of the invention, by downloading the Wikipedia English monolingual corpus and training English word vectors with the Skip-Gram algorithm, a word vector corresponding to each English token is obtained, and an initial word vector library is thereby constructed. For the corpora of the plurality of translators, the initial word vectors required for training the second bidirectional LSTM neural network in step S101 are obtained by tokenizing each corpus and looking up, in the initial word vector library, the word vector corresponding to each token.
The training of the second bidirectional LSTM neural network comprises: taking the preceding tokens in the corpora of the plurality of translators as samples and the probability of the token following them as the sample label, and training the second bidirectional LSTM neural network; then taking the following tokens in the corpora of the plurality of translators as samples and the probability of the token preceding them as the sample label, and training the second bidirectional LSTM neural network again. Training stops when the probability output by the second bidirectional LSTM neural network meets a preset threshold or the samples are exhausted.
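A minimal sketch of this two-direction training follows; the next-token objective with cross-entropy loss, the optimizer, and the data iterators are illustrative assumptions:

```python
# Minimal sketch: train the model forward (preceding tokens -> next token),
# then backward (following tokens -> preceding token).
import torch
import torch.nn as nn

model = WordVectorModel()                    # hypothetical model sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholders for the assumed data iterators yielding (inputs, target_ids),
# where inputs are word vectors (B, T, 300) and target_ids token indices (B, T).
forward_batches, reversed_batches = [], []

def train_direction(batches):
    for inputs, target_ids in batches:
        logits, _, _ = model(inputs)         # (B, T, vocab_size)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       target_ids.reshape(-1))
        optimizer.zero_grad()
        loss.backward()                      # back-propagation updates all parameters
        optimizer.step()

train_direction(forward_batches)             # positive-sequence pass
train_direction(reversed_batches)            # negative-sequence pass
```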
Based on the foregoing embodiments, as an optional embodiment, the training the word vector model according to the editing behavior data corresponding to the corpus of the target translator specifically includes:
characterizing each editing behavior data by using initialized character vectors in one-to-one correspondence;
and training the word vector model by taking a character vector of the editing behavior data of the previous word example in the corpus of the target translator as a sample and taking the probability of the subsequent word example of the previous word example as a sample label so as to obtain a trained character vector.
It should be noted that, when training the word vector model in the embodiments of the invention, an initial character vector is configured for each editing action, so that each token of the target translator's corpus can be represented by a sequence of character vectors. Positive-sequence training of the second bidirectional LSTM neural network gives the first n-1 words and predicts the nth word; a wrong prediction produces a loss, and a back-propagation algorithm updates the model parameters (including the character vectors) of the second and first bidirectional LSTM neural networks so as to reduce the loss, until the loss falls below a preset threshold. Negative-sequence training differs from positive-sequence training only in that the order of the input words and of the predicted words is reversed; the other steps are essentially the same and are not repeated here.
On the basis of the foregoing embodiments, as an optional embodiment, the obtaining the intermediate word vector according to the output of the first bidirectional LSTM neural network specifically includes:
inputting the corpus of the plurality of translators from positive sequences to the trained word vector model, and taking the output of a first bidirectional LSTM neural network as a first word representation;
inputting the corpus of the plurality of translators from the reverse order to the trained word vector model, and taking the output of the first bidirectional LSTM neural network as a second word representation;
and summing the first word representation and the second word representation to obtain the intermediate word vector.
It should be noted that once the trained character vectors are obtained, each editing action has a fixed character vector, so the character vectors of every token can be assembled: for example, the token "apple" is represented by the character vectors corresponding to a, p, p, l, and e. The character vectors of each token obtained in this way can then be input into the trained word vector model in the positive and negative order of the corpus.
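A minimal sketch of producing the intermediate word vectors by summing the first network's outputs from the two passes follows; the stand-in token representations and the re-alignment of the reverse pass are assumptions:

```python
# Minimal sketch: intermediate word vectors as the sum of the first LSTM's
# outputs over the positive-order and negative-order passes of a sentence.
import torch

model = WordVectorModel()                    # hypothetical model sketched earlier
sentence = torch.randn(1, 6, 300)            # stand-in token representations
sentence_rev = torch.flip(sentence, dims=[1])

with torch.no_grad():
    first_repr, _ = model.first_lstm(sentence)       # first word representation
    second_repr, _ = model.first_lstm(sentence_rev)  # second word representation

# Sum the two representations (reverse pass re-aligned) to get the intermediate vectors.
intermediate_word_vectors = first_repr + torch.flip(second_repr, dims=[1])
```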
Based on the foregoing embodiments, as an optional embodiment, the adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator includes:
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the intermediate word vector of the corpus of the target translator.
Specifically, taking the positive order of the corpus as an example: the intermediate word vectors of the preceding tokens in the target translator's corpus are taken as samples, the probability of the token that follows them is taken as the sample label, and the second bidirectional LSTM neural network is trained accordingly.
Fig. 2 is a schematic flow chart of training the second bidirectional LSTM neural network in the positive order of the corpus according to an embodiment of the invention, where the second bidirectional LSTM neural network computes:
t1, z1 = f(t0, <s> word vector)
From the value of z1, the probability that each word in the vocabulary is the first word is calculated; p(first word is "The") is the largest, so the first word is "The";
t2, z2 = f(t1, "The" word vector)
From the value of z2, the probability that each word in the vocabulary is the second word is calculated; p(second word is "skill") is the largest, so the second word is "skill";
t3, z3 = f(t2, "skill" word vector)
From the value of z3, the probability that each word in the vocabulary is the third word is calculated; p(third word is "building") is the largest, so the third word is "building";
and so on, until the overall probability that "The skill building is important." occurs in positive order is finally predicted.
Fig. 3 is a schematic flow chart of training the second bidirectional LSTM neural network in the reverse order of the corpus according to an embodiment of the invention, where the second bidirectional LSTM neural network computes:
h1, y1 = f(h0, "important" word vector)
From the value of y1, the probability that each word in the vocabulary is the penultimate word is calculated; p(penultimate word is "is") is the largest, so the penultimate word is "is";
h2, y2 = f(h1, "is" word vector)
From the value of y2, the probability that each word in the vocabulary is the antepenultimate word is calculated; p(antepenultimate word is "building") is the largest, so the antepenultimate word is "building";
h3, y3 = f(h2, "building" word vector)
From the value of y3, the probability that each word in the vocabulary is the fourth-from-last word is calculated; p(fourth-from-last word is "skill") is the largest, so the fourth-from-last word is "skill";
and so on, until the overall probability that "The skill building is important." occurs in reverse order is finally predicted.
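The step-by-step recurrence in both figures can be sketched as follows; this is a hypothetical single-direction LSTM cell, the vocabulary and dimensions are illustrative, and with untrained weights the prediction is of course arbitrary:

```python
# Minimal sketch of the recurrence (t_k, z_k) = f(t_{k-1}, word vector) from
# Figs. 2 and 3, with a softmax over the vocabulary at each step.
import torch
import torch.nn as nn

vocab = ["<s>", "The", "skill", "building", "is", "important", "."]
embed = nn.Embedding(len(vocab), 300)     # stand-in word vectors
cell = nn.LSTMCell(300, 300)              # one direction of the LSTM
out = nn.Linear(300, len(vocab))          # maps z_k to vocabulary scores

h = torch.zeros(1, 300)                   # initial state t0 / h0
c = torch.zeros(1, 300)
for word in ["<s>", "The", "skill", "building", "is"]:
    x = embed(torch.tensor([vocab.index(word)]))
    h, c = cell(x, (h, c))                # one recurrence step
    probs = torch.softmax(out(h), dim=-1) # probability of each vocabulary word
    predicted = vocab[int(probs.argmax())]  # with trained weights, "is" -> "important"
```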
Fig. 4 is a schematic structural diagram of a word vector generating device based on a bidirectional LSTM neural network according to an embodiment of the present invention, as shown in fig. 4, where the word vector generating device based on the bidirectional LSTM neural network includes: a full quantity training module 401, a quantitative training module 402, an intermediate quantity generation module 403, an adjustment module 404, and an output module 405, wherein:
the full-quantity training module 401 is configured to train the second bidirectional LSTM neural network according to initial word vectors of the corpora of the plurality of translators;
the quantitative training module 402 is configured to determine a target translator from a plurality of translators, and train a word vector model according to editing behavior data corresponding to a corpus of the target translator;
the intermediate quantity generation module 403 is configured to input the corpora of the multiple translators to the trained word vector model, and obtain an intermediate word vector according to the output of the first bidirectional LSTM neural network;
an adjustment module 404, configured to adjust the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator;
the output module 405 is configured to input the intermediate word vectors of the corpora of the multiple translators to the adjusted second bidirectional LSTM neural network, and obtain a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network;
the word vector model comprises the first bidirectional LSTM neural network and the second bidirectional LSTM neural network, wherein the output layer of the first bidirectional LSTM neural network is connected with the input layer of the second bidirectional LSTM neural network.
The word vector generating device based on the bidirectional LSTM neural network provided by the embodiment of the present invention specifically executes the flow of the above embodiments of the word vector generation method based on the bidirectional LSTM neural network; the details are the same as those of the method embodiments and are not repeated here. According to the word vector generation device based on the bidirectional LSTM neural network provided by the embodiment of the invention, the second bidirectional LSTM neural network is trained on the corpora of a plurality of translators (the full sample set) so that it acquires a generic translation style; the editing behavior data of the target translator is used as a representation of that translator's manner of expression, and the word vector model is trained on this editing behavior data, so that a small number of samples suffice to make the word vector model characterize the target translator's translation style; the full sample set is input into the trained word vector model, and the output of the first bidirectional LSTM neural network is taken as the intermediate word vectors, which capture both contextual semantics and the target translator's style more accurately than the original word vectors; the second bidirectional LSTM neural network is then adjusted so that it characterizes the target translator's style more accurately; and finally the full sample set is input into the adjusted second bidirectional LSTM neural network, yielding a large number of word vectors that accurately represent contextual semantics while better matching the target translator's translation style. The word vectors generated by the embodiments of the invention can be widely applied to various natural language processing tasks (any model task that takes word vectors as input).
Fig. 5 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention, where, as shown in fig. 5, the electronic device may include: processor 510, communication interface (Communications Interface) 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, memory 530 complete communication with each other through communication bus 540. Processor 510 may invoke a computer program stored in memory 530 and executable on processor 510 to perform the bi-directional LSTM neural network based word vector generation method provided by the above embodiments, including, for example: training a second bidirectional LSTM neural network according to the initial word vectors of the corpus of a plurality of translators; determining a target translator from a plurality of translators, and training a word vector model according to editing behavior data corresponding to the corpus of the target translator; inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network; adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator; inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network; the word vector model comprises a first bidirectional LSTM neural network and a second bidirectional LSTM neural network, wherein an output layer of the first bidirectional LSTM neural network is connected with an input layer of the second bidirectional LSTM neural network.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art or a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the word vector generation method based on the bidirectional LSTM neural network provided in the foregoing embodiments, for example, including: training a second bidirectional LSTM neural network according to the initial word vectors of the corpus of a plurality of translators; determining a target translator from a plurality of translators, and training a word vector model according to editing behavior data corresponding to the corpus of the target translator; inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network; adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator; inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network; the word vector model comprises a first bidirectional LSTM neural network and a second bidirectional LSTM neural network, wherein an output layer of the first bidirectional LSTM neural network is connected with an input layer of the second bidirectional LSTM neural network.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A word vector generation method based on a bidirectional LSTM neural network, comprising:
training a second bidirectional LSTM neural network according to the initial word vectors of the corpus of a plurality of translators;
determining a target translator from a plurality of translators, and training a word vector model according to editing behavior data corresponding to the corpus of the target translator;
inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network;
adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator;
inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network;
the word vector model comprises a first bidirectional LSTM neural network and a second bidirectional LSTM neural network, wherein an output layer of the first bidirectional LSTM neural network is connected with an input layer of the second bidirectional LSTM neural network.
2. The word vector generation method based on the bidirectional LSTM neural network according to claim 1, wherein the training the second bidirectional LSTM neural network according to the corpus initial word vectors of a plurality of translators specifically comprises:
determining word vectors of the linguistic data of the plurality of translators from a preset initial word vector library, and taking the word vectors as initial word vectors;
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the initial word vectors of the corpuses of the plurality of translators.
3. The word vector generation method based on the bidirectional LSTM neural network according to claim 1, wherein the training the word vector model according to the editing behavior data corresponding to the corpus of the target translator specifically includes:
characterizing each editing behavior data by using initialized character vectors in one-to-one correspondence;
training the word vector model from the positive sequence and the negative sequence of the corpus by utilizing the character vector of the editing behavior data corresponding to the corpus of the target translator so as to obtain the trained character vector.
4. The word vector generation method based on the bidirectional LSTM neural network according to claim 1, wherein the obtaining the intermediate word vector according to the output of the first bidirectional LSTM neural network specifically includes:
inputting the corpus of the plurality of translators from positive sequences to the trained word vector model, and taking the output of a first bidirectional LSTM neural network as a first word representation;
inputting the corpus of the plurality of translators from the reverse order to the trained word vector model, and taking the output of the first bidirectional LSTM neural network as a second word representation;
and summing the first word representation and the second word representation to obtain the intermediate word vector.
5. The word vector generation method based on the bidirectional LSTM neural network according to claim 1, wherein the adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator specifically includes:
and training the second bidirectional LSTM neural network from the positive sequence and the negative sequence of the corpus by using the intermediate word vector of the corpus of the target translator.
6. The word vector generation method based on the bidirectional LSTM neural network according to claim 2, wherein the training the bidirectional LSTM neural network from the positive sequence and the negative sequence by using the initial word vectors of the corpora of the plurality of translators specifically comprises:
training the second bidirectional LSTM neural network by taking the previous word example in the corpus of the plurality of translators as a sample and taking the probability of the subsequent word example of the previous word example as a sample label;
and then taking the following word cases in the corpus of the plurality of translators as samples, taking the probability of the preceding word cases of the following word cases as sample labels, and training the second bidirectional LSTM neural network again.
7. The word vector generation method based on the bidirectional LSTM neural network according to claim 6, wherein the obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network is specifically:
and obtaining a vector for positive sequence prediction and a vector for negative sequence prediction generated by the second bidirectional LSTM neural network, and summing the vector for positive sequence prediction and the vector for negative sequence prediction to obtain a final word vector.
8. A word vector generation device based on a bidirectional LSTM neural network, comprising:
the full training module is used for training the second bidirectional LSTM neural network according to the initial word vectors of the linguistic data of a plurality of translators;
the quantitative training module is used for determining a target translator from a plurality of translators and training a word vector model according to editing behavior data corresponding to the corpus of the target translator;
the intermediate quantity generation module is used for inputting the linguistic data of the plurality of translators to the trained word vector model, and obtaining an intermediate word vector according to the output of the first bidirectional LSTM neural network;
the adjusting module is used for adjusting the second bidirectional LSTM neural network according to the intermediate word vector of the corpus of the target translator;
the output module is used for inputting the intermediate word vectors of the linguistic data of the plurality of translators into the adjusted second bidirectional LSTM neural network, and obtaining a final word vector according to the vector for prediction generated by the second bidirectional LSTM neural network;
the word vector model comprises a first bidirectional LSTM neural network and a second bidirectional LSTM neural network, wherein an output layer of the first bidirectional LSTM neural network is connected with an input layer of the second bidirectional LSTM neural network.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the bi-directional LSTM neural network based word vector generation method as claimed in any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the bi-directional LSTM neural network based word vector generation method of any one of claims 1 to 7.
CN201911045081.6A, filed 2019-10-30 (priority 2019-10-30): Word vector generation method and device based on bidirectional LSTM neural network. Status: Active. Publication: CN110852063B (en).

Priority Applications (1)

Application number: CN201911045081.6A (CN110852063B). Priority date: 2019-10-30. Filing date: 2019-10-30. Title: Word vector generation method and device based on bidirectional LSTM neural network.

Applications Claiming Priority (1)

Application number: CN201911045081.6A (CN110852063B). Priority date: 2019-10-30. Filing date: 2019-10-30. Title: Word vector generation method and device based on bidirectional LSTM neural network.

Publications (2)

Publication Number Publication Date
CN110852063A CN110852063A (en) 2020-02-28
CN110852063B (en) 2023-05-05

Family

ID=69598898

Family Applications (1)

Application number: CN201911045081.6A (CN110852063B, Active). Priority date: 2019-10-30. Filing date: 2019-10-30. Title: Word vector generation method and device based on bidirectional LSTM neural network.

Country Status (1)

Country Link
CN (1) CN110852063B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353147B (en) * 2020-03-11 2023-03-03 鹏城实验室 Password strength evaluation method, device, equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
EP3408755A1 (en) * 2016-01-26 2018-12-05 Koninklijke Philips N.V. Systems and methods for neural clinical paraphrase generation
CN109165387A (en) * 2018-09-20 2019-01-08 南京信息工程大学 A kind of Chinese comment sentiment analysis method based on GRU neural network
CN109614479A (en) * 2018-10-29 2019-04-12 山东大学 A kind of judgement document's recommended method based on distance vector
TW201926078A (en) * 2017-11-30 2019-07-01 香港商阿里巴巴集團服務有限公司 Word vector processing method, apparatus and device
CN110119507A (en) * 2018-02-05 2019-08-13 阿里巴巴集团控股有限公司 Term vector generation method, device and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484682B (en) * 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 Machine translation method, device and electronic equipment based on statistics
CN107608973A (en) * 2016-07-12 2018-01-19 华为技术有限公司 A kind of interpretation method and device based on neutral net
US10733380B2 (en) * 2017-05-15 2020-08-04 Thomson Reuters Enterprise Center Gmbh Neural paraphrase generator
US11030414B2 (en) * 2017-12-26 2021-06-08 The Allen Institute For Artificial Intelligence System and methods for performing NLP related tasks using contextualized word representations

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3408755A1 (en) * 2016-01-26 2018-12-05 Koninklijke Philips N.V. Systems and methods for neural clinical paraphrase generation
TW201926078A (en) * 2017-11-30 2019-07-01 香港商阿里巴巴集團服務有限公司 Word vector processing method, apparatus and device
CN110119507A (en) * 2018-02-05 2019-08-13 阿里巴巴集团控股有限公司 Term vector generation method, device and equipment
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
CN109165387A (en) * 2018-09-20 2019-01-08 南京信息工程大学 A kind of Chinese comment sentiment analysis method based on GRU neural network
CN109614479A (en) * 2018-10-29 2019-04-12 山东大学 A kind of judgement document's recommended method based on distance vector

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yuwei Huang; Xi Yang; Fuzhen Zhuang; Lishan Zhang; Shengquan Yu. Automatic Chinese Reading Comprehension Grading by LSTM with Knowledge Adaptation. Advances in Knowledge Discovery and Data Mining, 2018, vol. 10937, pp. 118-129. *
何馨宇; 李丽双. Trigger word recognition based on bidirectional LSTM and a two-stage method (基于双向LSTM和两阶段方法的触发词识别). 中文信息学报 (Journal of Chinese Information Processing), 2017, no. 06, pp. 151-158. *
刘婉婉; 苏依拉; 乌尼尔; 仁庆道尔吉. Research on Mongolian-Chinese machine translation based on LSTM (基于LSTM的蒙汉机器翻译的研究). 计算机工程与科学 (Computer Engineering and Science), 2018, no. 10, pp. 178-184. *

Also Published As

Publication number Publication date
CN110852063A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
KR102577514B1 (en) Method, apparatus for text generation, device and storage medium
CN108460013B (en) Sequence labeling model and method based on fine-grained word representation model
KR102382499B1 (en) Translation method, target information determination method, related apparatus and storage medium
Yao et al. An improved LSTM structure for natural language processing
WO2018010455A1 (en) Neural network-based translation method and apparatus
CN110083710B (en) Word definition generation method based on cyclic neural network and latent variable structure
CN110674646A (en) Mongolian Chinese machine translation system based on byte pair encoding technology
CN109284397A (en) A kind of construction method of domain lexicon, device, equipment and storage medium
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
KR102043353B1 (en) Apparatus and method for recognizing Korean named entity using deep-learning
CN114580382A (en) Text error correction method and device
CN111144140B (en) Zhongtai bilingual corpus generation method and device based on zero-order learning
CN110162789A (en) A kind of vocabulary sign method and device based on the Chinese phonetic alphabet
CN112560510B (en) Translation model training method, device, equipment and storage medium
CN113704416B (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
Mocialov et al. Transfer learning for british sign language modelling
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
Basmatkar et al. Survey on neural machine translation for multilingual translation system
Krishnan et al. Character based bidirectional LSTM for disambiguating tamil part-of-speech categories
CN114298010A (en) Text generation method integrating dual-language model and sentence detection
CN110852063B (en) Word vector generation method and device based on bidirectional LSTM neural network
CN110866404B (en) Word vector generation method and device based on LSTM neural network
CN111985251B (en) Translation quality evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant