CN107729329A - Neural machine translation method and device based on word vector connection technology


Info

Publication number
CN107729329A
CN107729329A
Authority
CN
China
Prior art keywords
word
source
vector
sentence
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711091457.8A
Other languages
Chinese (zh)
Other versions
CN107729329B (en)
Inventor
熊德意
邝少辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iol Wuhan Information Technology Co ltd
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201711091457.8A
Publication of CN107729329A
Application granted
Publication of CN107729329B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a neural machine translation method based on a word vector connection technique. In the encoding stage, an encoder obtains the word vector sequence of a source sentence and determines, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence corresponding to the source sentence; the vector containing context information that corresponds to each source word comprises the forward hidden state, the reverse hidden state and the word vector of that source word, and a context vector can be obtained from the hidden vector sequence. In the decoding stage, a decoder predicts the target word of the corresponding source word, thereby generating the target sentence of the source sentence. The technical solution provided by the embodiments of the invention shortens the information channel between source-end word vectors and target-end word vectors, strengthens the connection and mapping between word vectors, enhances the performance of the translation system and improves translation quality. The invention also discloses a neural machine translation device based on the word vector connection technique, which has corresponding technical effects.

Description

Neural machine translation method and device based on word vector connection technology
Technical Field
The invention relates to the technical field of Neural Machine Translation (NMT), in particular to a neural machine translation method and device based on a word vector connection technology.
Background
With the rapid development of computer technology, the computing power of computers has continuously improved, big data has been widely applied, and deep learning has advanced further. Neural Machine Translation (NMT) technology based on deep learning is therefore receiving increasing attention.
In the NMT field, the most common translation model is the attention-based encoder-decoder model. Its main idea is that the sentence to be translated, i.e., the source sentence, is encoded by an encoder into a vector representation, and the vector representation of the source sentence is then decoded by a decoder and translated into the corresponding translation, i.e., the target sentence. The encoder-decoder framework is a core idea of deep learning and the basic framework commonly used by NMT systems. Currently, mainstream NMT systems use RNN (Recurrent Neural Network) technology for both the encoder and the decoder. RNNs have certain advantages in processing sequential information: they can process an input of arbitrary length and convert it into a vector of fixed dimension.
In an NMT system, the encoder-decoder framework learns the word vectors (word embeddings) of the source sentences and target sentences simultaneously during training. However, because the learned word vectors of the source sentences and target sentences are located at the two ends (source end and target end) of the encoder-decoder framework, information between them must pass through a very complicated conversion channel (the encoder and decoder). Direct connection and mapping between the source and target word vectors is therefore lacking, which easily causes the NMT system to produce wrong word alignment information and thereby reduces translation quality.
Disclosure of Invention
The invention aims to provide a neural machine translation method and device based on a word vector connection technology, so as to enhance the connection and mapping between word vectors, enhance the performance of a translation system and improve the translation quality.
In order to solve the technical problems, the invention provides the following technical scheme:
a neural machine translation method based on a word vector connection technology comprises the following steps:
in the encoding stage, an encoder encodes the read source sentence to obtain the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j is the word vector of the jth source word in the source sentence and T represents the number of source words contained in the source sentence;
the forward recurrent neural network (RNN) of the encoder determines, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors, where →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function;
the reverse RNN of the encoder determines, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors, where ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word;
determining, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence, where h_j = [→h_j; ←h_j; x_j] is the vector containing the context information corresponding to the jth source word in the source sentence;
obtaining a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network, where q is a nonlinear activation function;
in the decoding stage, a decoder predicts the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence, where T_y represents the number of target words contained in the target sentence.
In one embodiment of the present invention, the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t),
where p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), g is a nonlinear activation function and s_t is a hidden state in the RNN.
In one embodiment of the present invention, the source word vector x_{t*} is fused into the computation of the hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
In one embodiment of the present invention, after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, the method further comprises:
determining a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
and, based on the training set, performing model training according to a preset target training function.
In an embodiment of the present invention, the target training function includes a word vector loss function l_we, where w is the transformation matrix.
A neural machine translation device based on the word vector connection technology, comprising:
a word vector sequence obtaining module, configured to, in the encoding stage, have the encoder encode the read source sentence to obtain the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j is the word vector of the jth source word in the source sentence and T represents the number of source words contained in the source sentence;
a forward vector sequence determination module, configured for the forward recurrent neural network (RNN) of the encoder to determine, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors, where →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function;
a reverse vector sequence determination module, configured for the reverse RNN of the encoder to determine, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors, where ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word;
a hidden vector sequence determining module, configured to determine, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence, where h_j = [→h_j; ←h_j; x_j] is the vector containing the context information corresponding to the jth source word in the source sentence;
a context vector obtaining module, configured to obtain a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network, where q is a nonlinear activation function;
a target sentence generation module, configured to, in the decoding stage, have the decoder predict the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence, where T_y represents the number of target words contained in the target sentence.
In one embodiment of the present invention, the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t),
where p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), g is a nonlinear activation function and s_t is a hidden state in the RNN.
In one embodiment of the present invention, the source word vector x_{t*} is fused into the computation of the hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
In an embodiment of the present invention, the device further includes a training module, configured to:
determine, after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
and, based on the training set, perform model training according to a preset target training function.
In an embodiment of the present invention, the target training function includes a word vector loss function l_we, where w is the transformation matrix.
By applying the technical solution provided by the embodiment of the present invention, in the encoding stage the encoder encodes the read source sentence to obtain the word vector sequence of the source sentence, the forward RNN of the encoder determines the forward vector sequence, the reverse RNN determines the reverse vector sequence, and the hidden vector sequence corresponding to the source sentence is determined from the forward and reverse vector sequences; the vector containing the context information corresponding to each source word comprises the forward hidden state, the reverse hidden state and the word vector of that source word, and the context vector can be obtained from the hidden vector sequence by using the attention network. In the decoding stage, the decoder predicts the target word of the corresponding source word according to the context vector and the target words predicted so far, thereby generating the target sentence of the source sentence. This shortens the information channel between the source-end word vectors and the target-end word vectors, strengthens the connection and mapping between word vectors, enhances the performance of the translation system and improves translation quality.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an embodiment of a neural machine translation method based on the word vector connection technique;
FIG. 2 is a schematic diagram of an NMT model fused with the source hidden state connection model in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an NMT model fused with the target-end hidden state connection model in an embodiment of the present invention;
FIG. 4 is a schematic diagram of an NMT model fused with the direct connection model in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a neural machine translation device based on the word vector connection technique according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, reference will now be made in detail to the embodiments of the disclosure as illustrated in the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, which shows a flowchart of an implementation of a neural machine translation method based on the word vector connection technique according to an embodiment of the present invention, the method includes the following steps:
s110: in the encoding stage, the encoder encodes the read source sentences to obtain word vector sequences x = < x of the source sentences 1 ,x 2 ,…,x j ,…,x T >。
Wherein x is j The word vector of the jth source word in the source sentence, and T represents the number of source words contained in the source sentence.
The technical solution provided by the embodiment of the present invention is based on a basic NMT system. The encoder is formed by a bidirectional RNN, i.e., it includes a forward RNN and a reverse RNN, and the decoder is formed by an RNN.
In practical applications, each word of each sentence in the training corpus may be initialized to a word vector in advance, and the word vectors of all words constitute a word vector dictionary. A word vector is generally a multi-dimensional vector in which each dimension is a real number; the dimension can be determined from experimental results. For example, the word vector corresponding to a given word may be <0.12, -0.23, …, 0.99>.
In the encoding stage, the encoder can encode the read source sentence, that is, encode the source sentence into a series of vectors, obtaining the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j, which may be an m-dimensional vector, is the word vector of the jth source word and T represents the number of source words contained in the source sentence.
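As a concrete illustration of step S110, the following is a minimal sketch of how a tokenized source sentence could be mapped to its word vector sequence x = <x_1, …, x_T> through a word vector dictionary; the tiny vocabulary, the dimension m = 4 and all variable names are illustrative assumptions rather than the patent's reference implementation.

```python
import numpy as np

# Hypothetical word-vector dictionary: one m-dimensional vector per vocabulary word.
m = 4                                    # embedding dimension (illustrative)
vocab = {"<unk>": 0, "we": 1, "love": 2, "nmt": 3}
rng = np.random.default_rng(0)
emb = rng.standard_normal((len(vocab), m))   # rows are word vectors, learned during training

def encode_to_word_vectors(tokens):
    """Map a tokenized source sentence to its word vector sequence x = <x_1, ..., x_T>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return emb[ids]                      # shape (T, m): row j is x_j

x = encode_to_word_vectors(["we", "love", "nmt"])
print(x.shape)                           # (3, 4): T = 3 source words, m = 4 dimensions
```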
S120: the forward recurrent neural network (RNN) of the encoder determines, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors.
Here →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function.
The forward RNN of the encoder can determine the forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors from the word vector sequence.
→h_j is the forward hidden state of the jth source word; f is a nonlinear activation function, and in particular a GRU or LSTM may be used.
S130: the reverse RNN of the encoder determines, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors.
Here ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word.
Following the same principle as step S120, the reverse RNN of the encoder can determine the reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors from the word vector sequence.
←h_j is the reverse hidden state of the jth source word.
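Steps S120 and S130 can be sketched as follows, with a plain tanh recurrence standing in for the nonlinear activation f (the patent notes that a GRU or LSTM may be used in practice); the weight matrices, the hidden size d and the sharing of weights between the two directions are simplifying assumptions.

```python
import numpy as np

m, d = 4, 6                              # word vector dimension, hidden state dimension
rng = np.random.default_rng(1)
W_in = rng.standard_normal((d, m))       # input-to-hidden weights (learned in practice)
W_h = rng.standard_normal((d, d))        # hidden-to-hidden weights (learned in practice)

def rnn_step(h_prev, x_j):
    """One application of the nonlinear activation f: h_j = f(h_prev, x_j)."""
    return np.tanh(W_in @ x_j + W_h @ h_prev)

def bidirectional_encode(x):
    """Return the forward states ->h_1..->h_T and the reverse states <-h_1..<-h_T."""
    T = len(x)
    h_fwd, h_bwd = np.zeros((T, d)), np.zeros((T, d))
    h = np.zeros(d)
    for j in range(T):                   # forward RNN reads x_1 .. x_T
        h = rnn_step(h, x[j])
        h_fwd[j] = h
    h = np.zeros(d)
    for j in reversed(range(T)):         # reverse RNN reads x_T .. x_1
        h = rnn_step(h, x[j])            # a real system would use separate reverse weights
        h_bwd[j] = h
    return h_fwd, h_bwd

h_fwd, h_bwd = bidirectional_encode(rng.standard_normal((3, m)))
print(h_fwd.shape, h_bwd.shape)          # (3, 6) (3, 6)
```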
S140: determining, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence.
Here h_j is the vector representation containing the context information corresponding to the jth source word in the source sentence.
After the forward vector sequence and the reverse vector sequence are determined, the forward hidden state →h_j and the reverse hidden state ←h_j corresponding to each source word can be connected, and on this basis the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence is determined. h_j is the vector representation containing the context information corresponding to the jth source word. That is, the vector corresponding to each source word is the concatenation of the forward hidden state, the reverse hidden state and the word vector of that source word, i.e. h_j = [→h_j; ←h_j; x_j], which forms the source hidden state connection model. A schematic diagram of an NMT model fused with the source hidden state connection model is shown in FIG. 2. In this model, the word vector of each source word is simply concatenated behind the hidden state vector produced for that source word by the bidirectional RNN encoder (BiRNN Encoder). The word vector of the source word can thus be used not only to compute the weights of the attention network but also to predict the target word.
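The source hidden state connection of step S140 then reduces to a concatenation; the sketch below assumes forward states, reverse states and word vectors of the shapes produced in the sketches above.

```python
import numpy as np

def source_connection(h_fwd, h_bwd, x):
    """Source hidden state connection: h_j = [->h_j ; <-h_j ; x_j] for every source position j."""
    return np.concatenate([h_fwd, h_bwd, x], axis=1)   # shape (T, 2*d + m)

# Toy usage: each annotation now carries the word vector as well as the BiRNN states.
T, d, m = 3, 6, 4
H = source_connection(np.zeros((T, d)), np.zeros((T, d)), np.zeros((T, m)))
print(H.shape)   # (3, 16)
```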
S150: obtaining a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network.
Here q is a nonlinear activation function.
After the hidden vector sequence corresponding to the source sentence is determined in step S140, the context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) can be obtained from the hidden vector sequence by using an attention network, where q is a nonlinear activation function.
Specifically, the context vector may be obtained using the attention network as the weighted sum c_t = Σ_j α_tj · h_j,
where a is a one-layer feed-forward network used to compute the weights and α_tj is the weight of each hidden state h_j of the encoder.
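A sketch of the attention computation under the common formulation in which the one-layer feed-forward network a scores each annotation h_j against the previous decoder state s_{t-1}, the scores are normalized into the weights α_tj, and c_t is their weighted sum; the matrices W_a, U_a and the vector v_a are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention(s_prev, H, W_a, U_a, v_a):
    """Return the context vector c_t and the weights alpha_t over the annotations H = [h_1 .. h_T]."""
    # e_tj = a(s_{t-1}, h_j): a one-layer feed-forward network scores each annotation.
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
    alpha = softmax(scores)              # alpha_tj: weight of each encoder hidden state h_j
    c_t = alpha @ H                      # c_t = sum_j alpha_tj * h_j
    return c_t, alpha

# Toy usage with illustrative dimensions.
rng = np.random.default_rng(2)
d_s, d_h, d_a, T = 6, 16, 8, 3
c_t, alpha = attention(rng.standard_normal(d_s), rng.standard_normal((T, d_h)),
                       rng.standard_normal((d_a, d_s)), rng.standard_normal((d_a, d_h)),
                       rng.standard_normal(d_a))
print(c_t.shape, alpha.sum())            # (16,) and weights summing to 1
```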
S160: in the decoding stage, the decoder predicts the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence.
T_y represents the number of target words contained in the target sentence.
In the decoding stage, given the context vector c_t and all previously predicted target words {y_1, y_2, …, y_{t-1}}, the decoder can continue to predict the target word y_t of the corresponding source word, and in this way the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence can be generated.
In the embodiment of the present invention, both the encoder and the decoder adopt RNN networks, mainly because an RNN has the following characteristic: its hidden state is determined by the current input and the previous hidden state. For example, in the encoding stage the hidden state is determined by the word vector of the current source word and the previous hidden state, and in the decoding stage the hidden state is determined by the target-end word vector computed in the previous step and the previous hidden state.
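One decoding step might look like the sketch below, assuming the conventional update s_t = f(y_{t-1}, s_{t-1}, c_t) with a tanh recurrence and a softmax standing in for the nonlinear function g; all weight matrices and dimensions are illustrative assumptions.

```python
import numpy as np

def decoder_step(y_prev_vec, s_prev, c_t, W_y, W_s, W_c, W_out):
    """One decoding step: update s_t = f(y_{t-1}, s_{t-1}, c_t) and score the target vocabulary."""
    s_t = np.tanh(W_y @ y_prev_vec + W_s @ s_prev + W_c @ c_t)    # conventional decoder update
    logits = W_out @ np.concatenate([y_prev_vec, s_t, c_t])       # g(y_{t-1}, s_t, c_t)
    e = np.exp(logits - logits.max())
    return s_t, e / e.sum()                                       # p(y_t | y_<t, c_t) via softmax

# Toy usage with illustrative dimensions.
rng = np.random.default_rng(3)
m_y, d_s, d_c, V = 4, 6, 16, 10
s_t, p = decoder_step(rng.standard_normal(m_y), rng.standard_normal(d_s), rng.standard_normal(d_c),
                      rng.standard_normal((d_s, m_y)), rng.standard_normal((d_s, d_s)),
                      rng.standard_normal((d_s, d_c)), rng.standard_normal((V, m_y + d_s + d_c)))
print(int(np.argmax(p)))   # index of the predicted target word y_t
```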
By applying the method provided by the embodiment of the present invention, in the encoding stage the encoder encodes the read source sentence to obtain the word vector sequence of the source sentence, the forward RNN of the encoder determines the forward vector sequence, the reverse RNN determines the reverse vector sequence, and the hidden vector sequence corresponding to the source sentence is determined from the forward and reverse vector sequences; the vector containing the context information corresponding to each source word comprises the forward hidden state, the reverse hidden state and the word vector of that source word, and the context vector can be obtained from the hidden vector sequence by using the attention network. In the decoding stage, the decoder predicts the target word of the corresponding source word according to the context vector and the target words predicted so far, thereby generating the target sentence of the source sentence. This shortens the information channel between the source-end word vectors and the target-end word vectors, strengthens the connection and mapping between word vectors, enhances the performance of the translation system and improves translation quality.
In the embodiment of the present invention, the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t).
Here p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), where g is a nonlinear activation function (in particular, the softmax function may be adopted) and s_t is a hidden state in the RNN.
In the conventional NMT model, s_t = f(y_{t-1}, s_{t-1}, c_t).
In one embodiment of the present invention, the source word vector x_{t*} is fused into the computation of the target hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
From the weights α_tj computed by the attention network in the conventional NMT model, it can be seen that when the NMT model generates the current target word y_t it mainly uses the information of the source word x_{t*}. Following this principle, the embodiment of the present invention uses x_{t*} to enhance the target-end hidden state s_t, which is then used to predict the target word y_t to be generated. That is, x_{t*} and y_t are connected through the target hidden state s_t, and this may be called the target-end state connection model. A schematic diagram of an NMT model fused with the target-end hidden state connection model is shown in FIG. 3: the attention weight information indicates that the source word x_{t*} is mainly used when generating the current target word y_t, so the information of x_{t*} is fused into the formula for computing the hidden state s_t, and the connection between x_{t*} and y_t is established through the hidden state s_t.
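A sketch of the target-end state connection model, under the assumption that fusing x_{t*} into the computation of s_t means feeding it into the hidden state update alongside the usual inputs; the additive form of the fusion and the matrix W_x are assumptions, not the patent's exact formula.

```python
import numpy as np

def fused_decoder_state(y_prev_vec, s_prev, c_t, x_src, alpha_t, W_y, W_s, W_c, W_x):
    """Target-end state connection: fuse the word vector x_{t*} into the computation of s_t."""
    t_star = int(np.argmax(alpha_t))     # t* = argmax_j alpha_tj (most attended source position)
    x_tstar = x_src[t_star]              # word vector of that source word
    return np.tanh(W_y @ y_prev_vec + W_s @ s_prev + W_c @ c_t + W_x @ x_tstar)
```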
In one embodiment of the present invention, after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, the method may further comprise the following steps:
Step 1: determining a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
Step 2: based on the training set, performing model training according to a preset target training function.
In practical application, model training generally adopts the minimized negative log-likelihood as the loss function and stochastic gradient descent as the training method, carrying out iterative training. After the training set D = {(x^n, y^n)}_{n=1}^N is determined, model training can be performed on this training set according to the preset target training function.
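A minimal sketch of the objective and update just described, assuming the per-word probabilities assigned to the correct target words are already available from the decoder; the full forward and backward computation of the gradients is omitted, and all names are placeholders.

```python
import numpy as np

def sentence_nll(step_probs):
    """Negative log-likelihood of one sentence pair: -sum_t log p(y_t | y_1..y_{t-1}, c_t)."""
    return -float(np.sum(np.log(step_probs)))

def sgd_update(params, grads, lr=0.1):
    """One stochastic gradient descent step over a dict of parameter arrays."""
    for name in params:
        params[name] -= lr * grads[name]
    return params

# Toy usage: probabilities the model assigned to the correct target words of one sentence pair.
print(sentence_nll(np.array([0.4, 0.7, 0.9])))   # loss contributed by this (x^n, y^n) pair
```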
In one embodiment of the present invention, the target training function may be the negative log-likelihood over the training set, i.e. minimizing -Σ_{n=1}^{N} log p(y^n | x^n).
In another embodiment of the present invention, the target training function may additionally include a word vector loss term l_we.
Here l_we is the word vector loss function (word embedding loss) and w is the transformation matrix connecting source-end and target-end word vectors.
In the embodiment of the present invention, the source-end word vectors and the target-end word vectors are connected through the transformation matrix w, which reduces the difference between the word vectors at the two ends so that the word vectors at the two ends learned by the NMT model can be converted into each other: if a source word x_{t*} corresponds to the target word y_t, the transformation matrix w can reduce the difference between x_{t*} and y_t. This model may be referred to as the direct connection model.
The direct connection model is an extension of the source hidden state connection model: a transformation matrix is added on the basis of the source hidden state connection model to reduce the difference between the word vectors at the two ends. A schematic diagram of an NMT model fused with the direct connection model is shown in FIG. 4. In this model, it can be obtained from the attention weight information that the source word x_{t*} is mainly used when generating the target word y_t (obtained in the same manner as in the target-end hidden state connection model), and the mapping between the source word x_{t*} and the target word y_t is performed through the transformation matrix w, reducing the gap between the two words.
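The patent gives the word vector loss only by name, so the following is a plausible sketch under the assumption that it penalizes the squared distance between the transformed source word vector w·x_{t*} and the word vector of the corresponding target word y_t, which matches the stated goal of reducing the gap between the two; the exact form used in the patent may differ, and all names are illustrative.

```python
import numpy as np

def word_embedding_loss(x_src, y_tgt_vecs, alphas, w):
    """Assumed word vector loss: sum_t || w @ x_{t*} - y_t_vec ||^2 with t* = argmax_j alpha_tj."""
    loss = 0.0
    for t, alpha_t in enumerate(alphas):            # one attention weight vector per target position
        t_star = int(np.argmax(alpha_t))            # most attended source position for target word t
        diff = w @ x_src[t_star] - y_tgt_vecs[t]    # transformed source vector vs. target word vector
        loss += float(diff @ diff)
    return loss
```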
Corresponding to the above method embodiments, the embodiments of the present invention further provide a neural machine translation device based on the word vector connection technology; the device described below and the method described above may be referred to in correspondence with each other.
Referring to fig. 5, the apparatus may include the following modules:
a word vector sequence obtaining module 510, configured to, in the encoding stage, have the encoder encode the read source sentence to obtain the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j is the word vector of the jth source word in the source sentence and T represents the number of source words contained in the source sentence;
a forward vector sequence determination module 520, configured for the forward recurrent neural network (RNN) of the encoder to determine, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors, where →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function;
a reverse vector sequence determination module 530, configured for the reverse RNN of the encoder to determine, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors, where ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word;
a hidden vector sequence determining module 540, configured to determine, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence, where h_j = [→h_j; ←h_j; x_j] is the vector containing the context information corresponding to the jth source word in the source sentence;
a context vector obtaining module 550, configured to obtain a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network, where q is a nonlinear activation function;
a target sentence generation module 560, configured to, in the decoding stage, have the decoder predict the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence, where T_y represents the number of target words contained in the target sentence.
By applying the device provided by the embodiment of the present invention, in the encoding stage the encoder encodes the read source sentence to obtain the word vector sequence of the source sentence, the forward RNN of the encoder determines the forward vector sequence, the reverse RNN determines the reverse vector sequence, and the hidden vector sequence corresponding to the source sentence is determined from the forward and reverse vector sequences; the vector containing the context information corresponding to each source word comprises the forward hidden state, the reverse hidden state and the word vector of that source word, and the context vector can be obtained from the hidden vector sequence by using the attention network. In the decoding stage, the decoder predicts the target word of the corresponding source word according to the context vector and the target words predicted so far, thereby generating the target sentence of the source sentence. This shortens the information channel between the source-end word vectors and the target-end word vectors, strengthens the connection and mapping between word vectors, enhances the performance of the translation system and improves translation quality.
In one embodiment of the present invention, the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t),
where p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), g is a nonlinear activation function and s_t is a hidden state in the RNN.
In one embodiment of the present invention, the source word vector x_{t*} is fused into the computation of the hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
In an embodiment of the present invention, the device further includes a training module, configured to:
determine, after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
and, based on the training set, perform model training according to a preset target training function.
In one embodiment of the present invention, the target training function includes a word vector loss function l_we, where w is the transformation matrix.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, without departing from the principle of the present invention, it is possible to make various improvements and modifications to the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. A neural machine translation method based on a word vector connection technology is characterized by comprising the following steps:
in the encoding stage, an encoder encodes the read source sentence to obtain the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j is the word vector of the jth source word in the source sentence and T represents the number of source words contained in the source sentence;
the forward recurrent neural network (RNN) of the encoder determines, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors, where →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function;
the reverse RNN of the encoder determines, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors, where ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word;
determining, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence, where h_j = [→h_j; ←h_j; x_j] is the vector containing the context information corresponding to the jth source word in the source sentence;
obtaining a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network, where q is a nonlinear activation function;
in the decoding stage, a decoder predicts the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence, where T_y represents the number of target words contained in the target sentence.
2. The method of claim 1, wherein the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t),
where p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), g is a nonlinear activation function and s_t is a hidden state in the RNN.
3. The method of claim 2, wherein the source word vector x_{t*} is fused into the computation of the hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
4. The method of any one of claims 1 to 3, wherein after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, the method further comprises:
determining a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
and, based on the training set, performing model training according to a preset target training function.
5. The method of claim 4, wherein the target training function includes a word vector loss function l_we, where w is the transformation matrix.
6. A neural machine translation device based on word vector connection technology, comprising:
a word vector sequence obtaining module, configured to, in the encoding stage, have the encoder encode the read source sentence to obtain the word vector sequence x = <x_1, x_2, …, x_j, …, x_T> of the source sentence, where x_j is the word vector of the jth source word in the source sentence and T represents the number of source words contained in the source sentence;
a forward vector sequence determination module, configured for the forward recurrent neural network (RNN) of the encoder to determine, from the word vector sequence, a forward vector sequence <→h_1, →h_2, …, →h_T> composed of hidden vectors, where →h_j = f(→h_{j-1}, x_j) is the forward hidden state of the jth source word and f is a nonlinear activation function;
a reverse vector sequence determination module, configured for the reverse RNN of the encoder to determine, from the word vector sequence, a reverse vector sequence <←h_1, ←h_2, …, ←h_T> composed of hidden vectors, where ←h_j = f(←h_{j+1}, x_j) is the reverse hidden state of the jth source word;
a hidden vector sequence determining module, configured to determine, from the forward vector sequence and the reverse vector sequence, the hidden vector sequence <h_1, h_2, …, h_j, …, h_T> corresponding to the source sentence, where h_j = [→h_j; ←h_j; x_j] is the vector containing the context information corresponding to the jth source word in the source sentence;
a context vector obtaining module, configured to obtain a context vector c_t = q({h_1, h_2, …, h_j, …, h_T}) from the hidden vector sequence by using an attention network, where q is a nonlinear activation function;
a target sentence generation module, configured to, in the decoding stage, have the decoder predict the target word y_t of the corresponding source word according to the context vector c_t and the already predicted target words {y_1, y_2, …, y_{t-1}}, generating the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence, where T_y represents the number of target words contained in the target sentence.
7. The apparatus of claim 6, wherein the probability p(y) of generating the target sentence y of the source sentence is p(y) = ∏_{t=1}^{T_y} p(y_t | {y_1, y_2, …, y_{t-1}}, c_t),
where p(y_t | {y_1, y_2, …, y_{t-1}}, c_t) = g(y_{t-1}, s_t, c_t), g is a nonlinear activation function and s_t is a hidden state in the RNN.
8. The apparatus of claim 7, wherein the source word vector x_{t*} is fused into the computation of the hidden state s_t, where t* = argmax_j(α_tj) and α_tj are the weights computed by the attention network in the conventional NMT model.
9. The apparatus of any one of claims 6 to 8, further comprising a training module configured to:
determine, after the target sentence y = <y_1, y_2, …, y_{T_y}> of the source sentence is generated, a training set D = {(x^n, y^n)}_{n=1}^N, where N represents the number of sentence pairs contained in the training corpus and (x^n, y^n) represents a sentence pair;
and, based on the training set, perform model training according to a preset target training function.
10. The apparatus of claim 9, wherein the target training function includes a word vector loss function l_we, where w is the transformation matrix.
CN201711091457.8A 2017-11-08 2017-11-08 Neural machine translation method and device based on word vector connection technology Active CN107729329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711091457.8A CN107729329B (en) 2017-11-08 2017-11-08 Neural machine translation method and device based on word vector connection technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711091457.8A CN107729329B (en) 2017-11-08 2017-11-08 Neural machine translation method and device based on word vector connection technology

Publications (2)

Publication Number Publication Date
CN107729329A true CN107729329A (en) 2018-02-23
CN107729329B CN107729329B (en) 2021-03-26

Family

ID=61223060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711091457.8A Active CN107729329B (en) 2017-11-08 2017-11-08 Neural machine translation method and device based on word vector connection technology

Country Status (1)

Country Link
CN (1) CN107729329B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920468A (en) * 2018-05-07 2018-11-30 内蒙古工业大学 Mongolian-Chinese bilingual mutual translation method based on reinforcement learning
CN108984539A (en) * 2018-07-17 2018-12-11 苏州大学 The neural machine translation method of translation information based on simulation future time instance
CN109062897A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109145315A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN109271643A (en) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 A kind of training method of translation model, interpretation method and device
CN109299479A (en) * 2018-08-21 2019-02-01 苏州大学 Translation memory is incorporated to the method for neural machine translation by door control mechanism
CN109558605A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Method and apparatus for translating sentence
CN109740169A (en) * 2019-01-09 2019-05-10 北京邮电大学 A kind of Chinese medical book interpretation method based on dictionary and seq2seq pre-training mechanism
CN111104806A (en) * 2018-10-26 2020-05-05 澳门大学 Construction method and device of neural machine translation model, and translation method and device
WO2020108545A1 (en) * 2018-11-29 2020-06-04 腾讯科技(深圳)有限公司 Statement processing method, statement decoding method and apparatus, storage medium and device
WO2020108400A1 (en) * 2018-11-28 2020-06-04 腾讯科技(深圳)有限公司 Text translation method and device, and storage medium
CN112287673A (en) * 2020-10-23 2021-01-29 广州云趣信息科技有限公司 Method for realizing voice navigation robot based on deep learning
CN112446221A (en) * 2019-08-14 2021-03-05 阿里巴巴集团控股有限公司 Translation evaluation method, device and system and computer storage medium
CN112597778A (en) * 2020-12-14 2021-04-02 华为技术有限公司 Training method of translation model, translation method and translation equipment
CN112836523A (en) * 2019-11-22 2021-05-25 上海流利说信息技术有限公司 Word translation method, device and equipment and readable storage medium
WO2023061107A1 (en) * 2021-10-13 2023-04-20 北京有竹居网络技术有限公司 Language translation method and apparatus based on layer prediction, and device and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8185375B1 (en) * 2007-03-26 2012-05-22 Google Inc. Word alignment with bridge languages
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora
CN107025219A (en) * 2017-04-19 2017-08-08 厦门大学 A kind of word insertion method for expressing based on internal Semantic hierarchy
CN107038159A (en) * 2017-03-09 2017-08-11 清华大学 A kind of neural network machine interpretation method based on unsupervised domain-adaptive

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8185375B1 (en) * 2007-03-26 2012-05-22 Google Inc. Word alignment with bridge languages
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora
CN107038159A (en) * 2017-03-09 2017-08-11 清华大学 A kind of neural network machine interpretation method based on unsupervised domain-adaptive
CN107025219A (en) * 2017-04-19 2017-08-08 厦门大学 A kind of word insertion method for expressing based on internal Semantic hierarchy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAITAO MI ET AL.: "Supervised Attentions for Neural Machine Translation", arXiv *
XIONG DEYI ET AL.: "A Survey of Syntax-based Statistical Machine Translation" (基于句法的统计机器翻译综述), Journal of Chinese Information Processing (中文信息学报) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920468A (en) * 2018-05-07 2018-11-30 内蒙古工业大学 Mongolian-Chinese bilingual mutual translation method based on reinforcement learning
CN108920468B (en) * 2018-05-07 2019-05-14 内蒙古工业大学 Mongolian-Chinese bilingual mutual translation method based on reinforcement learning
CN108984539B (en) * 2018-07-17 2022-05-17 苏州大学 Neural machine translation method based on translation information simulating future moment
CN108984539A (en) * 2018-07-17 2018-12-11 苏州大学 The neural machine translation method of translation information based on simulation future time instance
CN109062897A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109271643A (en) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 A kind of training method of translation model, interpretation method and device
CN109299479A (en) * 2018-08-21 2019-02-01 苏州大学 Translation memory is incorporated to the method for neural machine translation by door control mechanism
CN109145315A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
US11853709B2 (en) 2018-09-05 2023-12-26 Tencent Technology (Shenzhen) Company Limited Text translation method and apparatus, storage medium, and computer device
CN111104806A (en) * 2018-10-26 2020-05-05 澳门大学 Construction method and device of neural machine translation model, and translation method and device
WO2020108400A1 (en) * 2018-11-28 2020-06-04 腾讯科技(深圳)有限公司 Text translation method and device, and storage medium
US12050881B2 (en) 2018-11-28 2024-07-30 Tencent Technology (Shenzhen) Company Limited Text translation method and apparatus, and storage medium
WO2020108545A1 (en) * 2018-11-29 2020-06-04 腾讯科技(深圳)有限公司 Statement processing method, statement decoding method and apparatus, storage medium and device
US12093635B2 (en) 2018-11-29 2024-09-17 Tencent Technology (Shenzhen) Company Limited Sentence encoding and decoding method, storage medium, and device
CN109558605B (en) * 2018-12-17 2022-06-10 北京百度网讯科技有限公司 Method and device for translating sentences
CN109558605A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Method and apparatus for translating sentence
CN109740169B (en) * 2019-01-09 2020-10-13 北京邮电大学 Traditional Chinese medicine ancient book translation method based on dictionary and seq2seq pre-training mechanism
CN109740169A (en) * 2019-01-09 2019-05-10 北京邮电大学 A kind of Chinese medical book interpretation method based on dictionary and seq2seq pre-training mechanism
CN112446221A (en) * 2019-08-14 2021-03-05 阿里巴巴集团控股有限公司 Translation evaluation method, device and system and computer storage medium
CN112446221B (en) * 2019-08-14 2023-12-15 阿里巴巴集团控股有限公司 Translation evaluation method, device, system and computer storage medium
CN112836523A (en) * 2019-11-22 2021-05-25 上海流利说信息技术有限公司 Word translation method, device and equipment and readable storage medium
CN112287673A (en) * 2020-10-23 2021-01-29 广州云趣信息科技有限公司 Method for realizing voice navigation robot based on deep learning
CN112597778A (en) * 2020-12-14 2021-04-02 华为技术有限公司 Training method of translation model, translation method and translation equipment
WO2022127613A1 (en) * 2020-12-14 2022-06-23 华为技术有限公司 Translation model training method, translation method, and device
WO2023061107A1 (en) * 2021-10-13 2023-04-20 北京有竹居网络技术有限公司 Language translation method and apparatus based on layer prediction, and device and medium

Also Published As

Publication number Publication date
CN107729329B (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN107729329B (en) Neural machine translation method and device based on word vector connection technology
CN107632987B (en) A kind of dialogue generation method and device
CN108920468B (en) Mongolian-Chinese bilingual mutual translation method based on reinforcement learning
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
CN111666756B (en) Sequence model text abstract generation method based on theme fusion
US11694041B2 (en) Chapter-level text translation method and device
CN110297895B (en) Dialogue method and system based on free text knowledge
CN111738020A (en) Translation model training method and device
CN111401081A (en) Neural network machine translation method, model and model forming method
CN108984539B (en) Neural machine translation method based on translation information simulating future moment
CN112580370B (en) Mongolian nerve machine translation method integrating semantic knowledge
CN110020440A (en) A kind of machine translation method, device, server and storage medium
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN113609284A (en) Method and device for automatically generating text abstract fused with multivariate semantics
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN115841119B (en) Emotion cause extraction method based on graph structure
Huai et al. Zerobn: Learning compact neural networks for latency-critical edge systems
CN110913229B (en) RNN-based decoder hidden state determination method, device and storage medium
CN111428518B (en) Low-frequency word translation method and device
CN110569499B (en) Generating type dialog system coding method and coder based on multi-mode word vectors
CN110717342A (en) Distance parameter alignment translation method based on transformer
CN115130479B (en) Machine translation method, target translation model training method, and related program and device
CN113469260B (en) Visual description method based on convolutional neural network, attention mechanism and self-attention converter

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221028

Address after: 18/F, Building A, Wuhan Optics Valley International Business Center, No. 111, Guanshan Avenue, Donghu New Technology Development Zone, Wuhan, Hubei 430070

Patentee after: Wuhan Ruidimu Network Technology Co.,Ltd.

Address before: No. 8, Xiangcheng District Ji Xue Road, Suzhou, Jiangsu

Patentee before: SOOCHOW University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221216

Address after: Room 1302, 13/F, Building B2, Future Science and Technology City, No. 999, Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan City, 430200, Hubei Province (Wuhan Area, Free Trade Zone)

Patentee after: IOL (WUHAN) INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 18/F, Building A, Wuhan Optics Valley International Business Center, No. 111, Guanshan Avenue, Donghu New Technology Development Zone, Wuhan, Hubei 430070

Patentee before: Wuhan Ruidimu Network Technology Co.,Ltd.

TR01 Transfer of patent right