CN111597829A - Translation method and device, storage medium and electronic equipment - Google Patents

Translation method and device, storage medium and electronic equipment


Publication number
CN111597829A
CN111597829A
Authority
CN
China
Prior art keywords
target
language
training
words
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010426346.3A
Other languages
Chinese (zh)
Other versions
CN111597829B (en)
Inventor
颜建昊
孟凡东
周杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010426346.3A priority Critical patent/CN111597829B/en
Publication of CN111597829A publication Critical patent/CN111597829A/en
Application granted granted Critical
Publication of CN111597829B publication Critical patent/CN111597829B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation

Abstract

The invention discloses a translation method and device, a storage medium and electronic equipment. The method includes: acquiring, in a pre-trained first target translation model, N words of a first language to be translated, wherein N is a natural number and the first target translation model is used for translating words of the first language into words of a second language; determining, through the first target translation model, a first target past feature vector and a first target future feature vector corresponding to the N words (vectors determined according to N first encoded vectors corresponding to the N words), and splicing them with M first decoded vectors corresponding to the N words (vectors decoded from the N first encoded vectors) to obtain M first target spliced vectors; and determining M words of the second language by using the M first target spliced vectors, wherein the M words of the second language are words of the second language obtained by translating the N words of the first language through the first target translation model.

Description

Translation method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of computers, in particular to a translation method and device, a storage medium and electronic equipment.
Background
In machine translation, the translated (past) part and the untranslated (future) part of a sentence need to be modeled. In the related art, the intuitive objective functions commonly adopted suffer from accuracy problems, and the modeling does not make use of textual context information. In addition, modeling the translated part and the untranslated part serves two purposes: accurately identifying the translated and untranslated parts, and using this information to help the model predict future words. In the related art, however, these two purposes are conflated.
Therefore, in the related art, when a text is subjected to machine translation, the translated part of the text and the untranslated part of the text cannot be accurately determined, and an effective solution is not provided.
Disclosure of Invention
The embodiment of the invention provides a translation method and device, a storage medium and electronic equipment, which are used for at least solving the technical problem that translated part texts and untranslated part texts cannot be accurately determined when a text is subjected to machine translation.
According to an aspect of an embodiment of the present invention, there is provided a translation method including: acquiring N words of a first language to be translated in a pre-trained first target translation model, wherein N is a natural number, and the first target translation model is used for translating the words of the first language into words of a second language; determining a first target past feature vector and a first target future feature vector corresponding to the N words by using the first target translation model, and splicing the first target past feature vector and the first target future feature vector with M first decoded vectors corresponding to the N words to obtain M first target spliced vectors, wherein the first target past feature vector and the first target future feature vector are determined according to N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number; and determining M words of the second language by using the M first target spliced vectors, wherein the M words of the second language are words of the second language obtained by translating the N words of the first language by the first target translation model.
According to another aspect of the embodiments of the present invention, there is also provided a translation apparatus, including: a first obtaining unit, configured to obtain N words of a first language to be translated from a pre-trained first target translation model, where N is a natural number, and the first target translation model is configured to translate the words of the first language into words of a second language; a first processing unit, configured to determine a first target past feature vector and a first target future feature vector corresponding to the N words through the first target translation model, and splice the first target past feature vector and the first target future feature vector with M first decoded vectors corresponding to the N words to obtain M first target spliced vectors, where the first target past feature vector and the first target future feature vector are vectors determined according to N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number; a first determining unit, configured to determine M words of the second language by using the M first target concatenation vectors, where the M words of the second language are words of the second language obtained by translating N words of the first language by the first target translation model.
According to an aspect of an embodiment of the present invention, there is provided a translation method including: when a t-th iteration is performed on a first training translation model, encoding N_t words of a first language by a first training encoder in the first training translation model to obtain N_t first training code vectors, wherein the N_t first training code vectors are used for representing the N_t words of the first language, N_t is a natural number, and the first training translation model is used for translating the words of the first language into words of a second language; determining a first past feature vector and a first future feature vector according to the N_t first training code vectors; encoding N_t1 words of the second language by a second training encoder in a second training translation model to obtain N_t1 second training code vectors, and determining a second past feature vector according to the N_t1 second training code vectors, wherein the N_t1 words of the second language are words of the second language translated by the first training translation model before the t-th iteration; encoding N_t2 words of the second language by the second training encoder to obtain N_t2 second training code vectors, and determining a second future feature vector according to the N_t2 second training code vectors, wherein the N_t2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration; determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and in a case that the first loss value of the t-th iteration does not satisfy a first preset condition, updating parameters in the first training translation model and performing a (t+1)-th iteration on the first training translation model.
According to another aspect of the embodiments of the present invention, there is also provided a translation apparatus, including: a first coding unit, configured to encode, when a t-th iteration is performed on a first training translation model, N_t words of a first language by a first training encoder in the first training translation model to obtain N_t first training code vectors, wherein the N_t first training code vectors are used for representing the N_t words of the first language, N_t is a natural number, and the first training translation model is used for translating the words of the first language into words of a second language; a second determining unit, configured to determine a first past feature vector and a first future feature vector according to the N_t first training code vectors; a second processing unit, configured to encode N_t1 words of the second language by a second training encoder in a second training translation model to obtain N_t1 second training code vectors, and determine a second past feature vector according to the N_t1 second training code vectors, wherein the N_t1 words of the second language are words of the second language translated by the first training translation model before the t-th iteration; a third processing unit, configured to encode N_t2 words of the second language by the second training encoder to obtain N_t2 second training code vectors, and determine a second future feature vector according to the N_t2 second training code vectors, wherein the N_t2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration; a third determining unit, configured to determine a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and a fourth processing unit, configured to update parameters in the first training translation model in a case that the first loss value of the t-th iteration does not satisfy a first preset condition, and perform a (t+1)-th iteration on the first training translation model.
According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above translation method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the translation method through the computer program.
In the embodiment of the present invention, N words of a first language to be translated are acquired in a pre-trained first target translation model (used for translating words of the first language into words of a second language). A first target past feature vector and a first target future feature vector corresponding to the N words are then determined through the first target translation model according to N first encoded vectors corresponding to the N words; the N first encoded vectors are decoded to obtain M first decoded vectors, and the first target past feature vector and the first target future feature vector are spliced with the M first decoded vectors to obtain M first target spliced vectors. Finally, M words of the second language are determined by using the M first target spliced vectors, where the M words of the second language are words of the second language obtained by translating the N words of the first language through the first target translation model. Through this process, the pre-trained first target translation model can accurately determine the first target past feature vector corresponding to the translated words and the first target future feature vector corresponding to the untranslated words, splice them with the M first decoded vectors to obtain the M first target spliced vectors, and finally determine, from the M first target spliced vectors, the M words corresponding to the N words to be translated. By splicing the past feature vector and the future feature vector in this way, the words obtained by the translation model are more accurate, the effect of improving the fidelity of the translation model through the information of the translated words and the untranslated words is achieved, and the accuracy of the translation model is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment for a translation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of an alternative translation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative translation model according to an embodiment of the present invention;
FIG. 4 is a schematic flow diagram of an alternative translation method according to an embodiment of the present invention;
FIG. 5 is a schematic flow diagram of yet another alternative translation method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative translation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative translation apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of an embodiment of the present invention, a translation method is provided. Alternatively, the above translation method may be applied, but not limited, to the application environment shown in fig. 1. As shown in fig. 1, a terminal device 102 (or a server 104) obtains N words of a first language to be translated in a first target translation model trained in advance, where N is a natural number, and the first target translation model is used to translate the words of the first language into words of a second language; determines a first target past feature vector and a first target future feature vector corresponding to the N words by using the first target translation model, and splices the first target past feature vector and the first target future feature vector with M first decoded vectors corresponding to the N words to obtain M first target spliced vectors, where the first target past feature vector and the first target future feature vector are determined by using N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number; and determines M words of the second language using the M first target spliced vectors, where the M words of the second language are words of the second language obtained by translating the N words of the first language by the first target translation model. The above is merely an example, and the embodiments of the present application are not limited herein.
According to yet another aspect of an embodiment of the present invention, a translation method is provided. Alternatively, the above translation method may be applied, but not limited, to the application environment shown in fig. 1. As shown in fig. 1, when a t-th iteration is performed on a first training translation model, the terminal device 102 (or the server 104) encodes N_t words of a first language by a first training encoder in the first training translation model to obtain N_t first training code vectors, where the N_t first training code vectors are used for representing the N_t words of the first language, N_t is a natural number, and the first training translation model is used for translating the words of the first language into words of a second language; determines a first past feature vector and a first future feature vector according to the N_t first training code vectors; encodes N_t1 words of the second language by a second training encoder in a second training translation model to obtain N_t1 second training code vectors, and determines a second past feature vector according to the N_t1 second training code vectors, where the N_t1 words of the second language are words of the second language translated by the first training translation model before the t-th iteration; encodes N_t2 words of the second language by the second training encoder to obtain N_t2 second training code vectors, and determines a second future feature vector according to the N_t2 second training code vectors, where the N_t2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration; determines a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and when the first loss value of the t-th iteration does not satisfy a first preset condition, updates the parameters in the first training translation model and performs the (t+1)-th iteration on the first training translation model. The above is merely an example, and the embodiments of the present application are not limited herein.
Alternatively, the method may be applied to a machine translation scenario, such as Chinese-English translation or Chinese-Korean translation, and the embodiment is not limited herein.
Optionally, in this embodiment, the terminal device may be a terminal device configured with a target client, and may include, but is not limited to, at least one of the following: mobile phones (such as Android phones, iOS phones, etc.), notebook computers, tablet computers, palm computers, MID (Mobile Internet Devices), PAD, desktop computers, smart televisions, etc. The target client may be a video client, an instant messaging client, a browser client, an educational client, etc. The network over which the terminal device and the server communicate may include, but is not limited to: a wired network or a wireless network, where the wired network includes: a local area network, a metropolitan area network, and a wide area network, and the wireless network includes: Bluetooth, WIFI, and other networks that enable wireless communication. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is only an example, and the present embodiment is not limited to this.
Optionally, in this embodiment, as an optional implementation manner, the method may be executed by a server, or may be executed by a terminal device, or may be executed by both the server and the terminal device, and in this embodiment, the description is given by taking an example that the server (for example, the server 104) executes. As shown in fig. 2, the flow of the translation method may include the steps of:
Step S202, when the t-th iteration is performed on the first training translation model, encoding N_t words of the first language by a first training encoder in the first training translation model to obtain N_t first training code vectors, where the N_t first training code vectors are used for representing the N_t words of the first language, N_t is a natural number, and the first training translation model is used for translating the words of the first language into the words of the second language.
Alternatively, as shown in fig. 3, the encoder and decoder in the left part of fig. 3 constitute the first training translation model; the encoder in the left part is the first training encoder, and the first training translation model is used for translating the words of the first language into the words of the second language.
The first training encoder may be used to encode the N_t words of the text to be translated in the first language to obtain the N_t first training code vectors, where the N_t first training code vectors are used to represent the N_t words of the first language, and N_t is a natural number.
Step S204, determining a first past feature vector and a first future feature vector according to the N_t first training code vectors.
Optionally, the first past feature vector and the first future feature vector may be determined according to the N_t first training code vectors, where the first past feature vector represents the information of the part that has been translated as of the t-th iteration, and the first future feature vector represents the information of the part that has not yet been translated as of the t-th iteration.
Step S206, encoding N_t1 words of the second language by a second training encoder in a second training translation model to obtain N_t1 second training code vectors, and determining a second past feature vector according to the N_t1 second training code vectors, where the N_t1 words of the second language are words of the second language translated by the first training translation model before the t-th iteration.
Optionally, as shown in fig. 3, the encoder and decoder in the right part of fig. 3 constitute the second training translation model; the encoder in the right part is the second training encoder, and the second training translation model is used for translating the words of the second language into the words of the first language.
The second training encoder may encode the N_t1 words of the second language that have been obtained by translation by the first training translation model before the t-th iteration to obtain the N_t1 second training code vectors, and the second past feature vector may be determined according to the N_t1 second training code vectors.
Step S208, encoding N_t2 words of the second language by the second training encoder to obtain N_t2 second training code vectors, and determining a second future feature vector according to the N_t2 second training code vectors, where the N_t2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration.
Optionally, the second training encoder may further encode the N_t2 words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration to obtain the N_t2 second training code vectors, and the second future feature vector may be determined according to the N_t2 second training code vectors.
It should be noted that, because the training samples are known, the N_t2 words may be true values or directly calculated values.
Step S210, determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector.

Optionally, the first loss value of the t-th iteration may be determined based on the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector.
Step S212, when the first loss value of the t-th iteration does not satisfy the first preset condition, updating the parameters in the first training translation model, and performing a (t+1)-th iteration on the first training translation model.

Optionally, if the first loss value does not satisfy the first preset condition, the parameters in the first training translation model may be updated according to the first loss value, and the (t+1)-th iteration is continued on the first training translation model, so that the parameters of the first training translation model become more accurate.
Through this embodiment, when the t-th iteration is performed on the first training translation model, N_t words of the first language are encoded by the first training encoder in the first training translation model to obtain N_t first training code vectors, where the N_t first training code vectors are used to represent the N_t words of the first language, N_t is a natural number, and the first training translation model is used for translating the words of the first language into the words of the second language; a first past feature vector and a first future feature vector are determined according to the N_t first training code vectors; N_t1 words of the second language are encoded by the second training encoder in the second training translation model to obtain N_t1 second training code vectors, and a second past feature vector is determined according to the N_t1 second training code vectors, where the N_t1 words of the second language are words of the second language translated by the first training translation model before the t-th iteration; N_t2 words of the second language are encoded by the second training encoder to obtain N_t2 second training code vectors, and a second future feature vector is determined according to the N_t2 second training code vectors, where the N_t2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first training translation model after the t-th iteration; a first loss value of the t-th iteration is determined according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and in a case that the first loss value of the t-th iteration does not satisfy the first preset condition, the parameters in the first training translation model are updated and the (t+1)-th iteration is performed on the first training translation model. Through this process, the translated words and the untranslated words can be accurately determined, and the parameters of the translation model are updated through iteration so that they become more accurate, which achieves the effect of improving the fidelity of the translation model through the information of the translated words and the untranslated words and improves the accuracy of the translation model.
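For illustration only, the following Python sketch maps steps S202 to S212 onto one training iteration. Every method name on first_model and second_model (encode, past_future, encode_past, encode_future, first_loss) is a hypothetical placeholder introduced here, not an API defined by this description, and the threshold form of the first preset condition is likewise an assumption.

```python
def training_iteration(first_model, second_model, optimizer, src_words, tgt_words,
                       first_preset_threshold):
    """Sketch of the t-th training iteration described above (all names assumed)."""
    # Step S202: encode the N_t words of the first language with the first training encoder.
    code_vectors = first_model.encode(src_words)
    # Step S204: first past / future feature vectors from the N_t first training code vectors.
    past_1, future_1 = first_model.past_future(code_vectors)
    # Steps S206 / S208: second past / future feature vectors from the second training
    # encoder, over the already-translated and the not-yet-translated target words.
    past_2 = second_model.encode_past(tgt_words)
    future_2 = second_model.encode_future(tgt_words)
    # Step S210: first loss value of the t-th iteration.
    loss = first_model.first_loss(code_vectors, tgt_words, past_1, future_1, past_2, future_2)
    # Step S212: update the parameters only if the first preset condition is not met.
    if loss.item() >= first_preset_threshold:   # assumed form of the first preset condition
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return False   # continue with iteration t+1
    return True        # stop iterating; the model becomes the first target translation model
```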
Alternatively, in this embodiment, determining the first past feature vector and the first future feature vector according to the N_t first training code vectors includes: converting the N_t first training code vectors into a plurality of first low-level capsule vectors; converting the plurality of first low-level capsule vectors into a plurality of high-level capsule vectors; and dividing the plurality of high-level capsule vectors into the first past feature vector and the first future feature vector.

Alternatively, the N_t first training code vectors may be converted into the plurality of first low-level capsule vectors through a capsule network, the plurality of first low-level capsule vectors may be converted into the plurality of high-level capsule vectors, and the plurality of high-level capsule vectors may be divided into the first past feature vector and the first future feature vector.

Through this embodiment, the first past feature vector and the first future feature vector corresponding to the N_t first training code vectors can be obtained by division through the capsule network; the information of the part already translated and the information of the part not yet translated can be determined from them, and the subsequent update of the parameters of the first training translation model can be made more accurate through this information.
Optionally, in this embodiment, converting the N_t first training code vectors into the plurality of first low-level capsule vectors includes: u_{i,j} = W_j * h_i, where u_{1,j}, u_{2,j}, …, u_{I,j} are the first low-level capsule vectors corresponding to the j-th high-level capsule vector, h_i ∈ {h_1, h_2, …, h_I}, {h_1, h_2, …, h_I} are the N_t first training code vectors, and W_j represents the j-th training parameter corresponding to the j-th high-level capsule vector of the plurality of high-level capsule vectors. Converting the plurality of first low-level capsule vectors into the plurality of high-level capsule vectors includes: s_j = Σ_i c_{ij} * u_{i,j}, c_{ij} = Softmax(b_{ij}), and b_{ij} = b_{ij} + w^T * tanh(W_b[z_t; u_{i,j}; Ω_j]), where Ω_j is the j-th high-level capsule vector of the plurality of high-level capsule vectors, obtained from s_j; s_j is the result of the weighted summation of the plurality of first low-level capsule vectors; c_{ij} represents the assignment probability of a low-level capsule vector to each of the plurality of high-level capsule vectors; w^T and W_b are first parameters; and I is a preset value.
Optionally, the N_t words of the first language are encoded by the first training encoder to obtain the N_t first training code vectors, which may be written as {h_1, h_2, …, h_I}. The plurality of first low-level capsule vectors are then obtained by the following formula: u_{i,j} = W_j * h_i, where u_{1,j}, u_{2,j}, …, u_{I,j} are the first low-level capsule vectors corresponding to the j-th high-level capsule vector, and W_j represents the j-th training parameter corresponding to the j-th high-level capsule vector of the plurality of high-level capsule vectors.

Then, the plurality of first low-level capsule vectors are converted into the plurality of high-level capsule vectors by:

s_j = Σ_i c_{ij} * u_{i,j}

c_{ij} = Softmax(b_{ij})

b_{ij} = b_{ij} + w^T * tanh(W_b[z_t; u_{i,j}; Ω_j])

where Ω_j is the j-th high-level capsule vector of the plurality of high-level capsule vectors, obtained from s_j; s_j is the result of the weighted summation of the plurality of first low-level capsule vectors; c_{ij} represents the assignment probability of a low-level capsule vector to each of the plurality of high-level capsule vectors; w^T and W_b are first parameters; and I is a preset value.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
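For illustration, the routing above can be sketched in Python as follows (assuming PyTorch; the tensor shapes, the number of routing iterations, and the squashing non-linearity applied to s_j are assumptions not fixed by this description):

```python
import torch
import torch.nn.functional as F

def route_capsules(h, W, w_vec, W_b, z_t, num_iters=3):
    """Sketch of the low-level -> high-level capsule conversion described above.

    h:     (I, d_h)            the first training code vectors h_1 .. h_I
    W:     (J, d_c, d_h)       one training parameter matrix W_j per high-level capsule
    w_vec: (d_b,)              the parameter w used in the b_ij update
    W_b:   (d_b, d_z + 2*d_c)  the parameter W_b
    z_t:   (d_z,)              decoder state at the current step (assumed input)
    Returns the J high-level capsule vectors Omega_j.
    """
    I, J = h.size(0), W.size(0)
    u = torch.einsum('jch,ih->ijc', W, h)        # u_{i,j} = W_j * h_i, shape (I, J, d_c)
    b = torch.zeros(I, J)                        # routing logits b_{ij}
    omega = torch.zeros(J, u.size(-1))           # high-level capsules Omega_j
    for _ in range(num_iters):
        c = F.softmax(b, dim=1)                  # c_{ij} = Softmax(b_{ij})
        s = torch.einsum('ij,ijc->jc', c, u)     # s_j = sum_i c_{ij} * u_{i,j}
        omega = s / (1.0 + s.norm(dim=-1, keepdim=True))   # assumed squashing of s_j
        # b_{ij} = b_{ij} + w^T tanh(W_b [z_t; u_{i,j}; Omega_j])
        cat = torch.cat([z_t.expand(I, J, -1), u, omega.expand(I, -1, -1)], dim=-1)
        b = b + torch.tanh(cat @ W_b.t()) @ w_vec
    return omega
# The high-level capsules would then be divided into two groups, one summarizing the
# past (translated) information and the other the future (untranslated) information.
```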
Optionally, in this embodiment, after the N_t words of the first language are encoded by the first training encoder in the first training translation model, the method further includes: decoding the N_t first training code vectors by a first training decoder in the first training translation model to obtain N_t first training decoding vectors, where the N_t first training decoding vectors are used for determining N_t words of the second language, and the N_t words of the second language are words of the second language obtained by the first training translation model translating the N_t words of the first language; splicing the N_t first training decoding vectors with the first past feature vector and the first future feature vector to obtain first spliced vectors; and determining the N_t words of the second language by using the first spliced vectors.

Alternatively, as shown in fig. 3, the Decoder in the left part of fig. 3 is the first training decoder in the first training translation model. The first training decoder may decode the N_t first training code vectors to obtain the N_t first training decoding vectors, where the N_t first training decoding vectors are used for determining the N_t words of the second language; that is, the N_t words of the second language obtained by translating the N_t words of the first language can be obtained through the first training decoder.

Then, the N_t first training decoding vectors are spliced with the first past feature vector and the first future feature vector to obtain the first spliced vectors, and the N_t words of the second language are determined by using the first spliced vectors.
For example, 100 words are input simultaneously to the first training Encoder to obtain 100 encoded vectors; the 100 encoded vectors are input to the first training Decoder to obtain 100 decoded vectors; the 100 decoded vectors are each spliced with a Past vector (such as the first past feature vector) and a Future vector (such as the first future feature vector) to obtain 100 spliced vectors (such as the first spliced vectors); and a softmax operation is performed on each spliced vector to obtain the probability of each translated word.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
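A minimal sketch of that splicing-and-softmax step, assuming PyTorch (the dimensions, vocabulary size, and the output projection layer here are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

decoded = torch.randn(100, 512)        # the 100 decoded vectors from the Decoder
past_vec = torch.randn(256)            # Past vector (first past feature vector)
future_vec = torch.randn(256)          # Future vector (first future feature vector)

# Splice the same Past and Future vectors onto every decoded vector.
pf = torch.cat([past_vec, future_vec]).expand(100, -1)
spliced = torch.cat([decoded, pf], dim=-1)             # 100 spliced vectors

# Hypothetical projection to the target-language vocabulary, followed by softmax.
out_proj = torch.nn.Linear(512 + 2 * 256, 32000)
probs = F.softmax(out_proj(spliced), dim=-1)           # probability of each translated word
```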
Through this embodiment, by splicing the first past feature vector and the first future feature vector to obtain the first spliced vectors, the information of the translated part and the information of the untranslated part in the textual context can be added to the first spliced vectors and used when determining the N_t words of the second language, which improves translation accuracy.
Optionally, in this embodiment, determining the N_t words of the second language by using the first spliced vectors includes: determining a probability of each candidate word in a candidate word set of the second language by using the first spliced vectors, where the probability of each candidate word is used to indicate the probability that the candidate word is one of the N_t words of the second language; and determining the candidate words with the highest probability in the candidate word set of the second language as the N_t words of the second language.

Optionally, the probability of each candidate word in the candidate word set of the second language may be determined through the first spliced vectors, where the probability of each candidate word may be used to indicate the probability that the candidate word is one of the N_t words of the second language; the candidate words with the highest probability in the candidate word set of the second language are then determined as the N_t words of the second language.
Optionally, in this embodiment, updating the parameters in the first training translation model includes at least one of: updating the training parameters in the first training encoder; updating the training parameters in the first training decoder; and updating a first usage parameter, where the first usage parameter is a parameter used in determining the first past feature vector and the first future feature vector according to the N_t first training code vectors.

Optionally, updating the parameters in the first training translation model may include at least one of:

updating the training parameters in the first training encoder;

updating the training parameters in the first training decoder;

updating a first usage parameter, where the first usage parameter is a parameter used in determining the first past feature vector and the first future feature vector according to the N_t first training code vectors;

updating a second usage parameter, where the second usage parameter is a parameter used in determining the second past feature vector according to the N_t1 second training code vectors;

updating a third usage parameter, where the third usage parameter is a parameter used in determining the second future feature vector according to the N_t2 second training code vectors.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
Optionally, in this embodiment, determining the first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector includes: determining a first semantic distance according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and determining the first loss value of the t-th iteration according to the first semantic distance and a first training loss value, where the first training loss value is a training loss value determined according to the N_t words of the second language, and the N_t words of the second language are words of the second language obtained by the first training translation model translating the N_t words of the first language in the t-th iteration.

Optionally, the first semantic distance is determined from the semantic distance between the first past feature vector and the second past feature vector and the semantic distance between the first future feature vector and the second future feature vector, and the first loss value of the t-th iteration is then determined according to the first semantic distance and the first training loss value, where the first training loss value is a training loss value determined according to the N_t words of the second language.
Optionally, in this embodiment, determining the first semantic distance according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector includes:

L_P,F = Δ(P_1, P_2) + Δ(F_1, F_2)

where L_P,F represents the first semantic distance, P_1 represents the first past feature vector, P_2 represents the second past feature vector, F_1 represents the first future feature vector, and F_2 represents the second future feature vector.

Optionally, Δ(P_1, P_2) is the semantic distance between the first past feature vector and the second past feature vector, and Δ(F_1, F_2) is the semantic distance between the first future feature vector and the second future feature vector; L_P,F is the first semantic distance, P_1 represents the first past feature vector, P_2 represents the second past feature vector, F_1 represents the first future feature vector, and F_2 represents the second future feature vector.
Optionally, in this embodiment, determining the first loss value of the t-th iteration according to the first semantic distance and the first training loss value includes: L = L_NMT + L_P,F, where L represents the first loss value of the t-th iteration, L_P,F represents the first semantic distance, and L_NMT represents the first training loss value.

Optionally, the first loss value may be determined by the formula L = L_NMT + L_P,F, where L represents the first loss value of the t-th iteration, L_P,F represents the first semantic distance, and L_NMT represents the first training loss value.
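For illustration, the combined loss could be computed as in the sketch below (assuming PyTorch; the use of an L2 distance for the semantic distances and the cross-entropy form of L_NMT are assumptions, since the description does not fix the exact functions):

```python
import torch
import torch.nn.functional as F

def first_loss_value(logits, target_ids, past_1, past_2, future_1, future_2):
    """L = L_NMT + L_P,F for one iteration (a sketch under assumed distance/loss forms).

    logits:     (N_t, vocab) scores for the N_t translated words
    target_ids: (N_t,)       reference words of the second language
    past_1, future_1: first past / future feature vectors (source side)
    past_2, future_2: second past / future feature vectors (target side)
    """
    l_nmt = F.cross_entropy(logits, target_ids)                          # first training loss value
    l_pf = torch.dist(past_1, past_2) + torch.dist(future_1, future_2)   # first semantic distance
    return l_nmt + l_pf
```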
Optionally, in this embodiment, after determining the first loss value of the tth iteration, the method further includes: and stopping the iteration of the first training translation model when a first loss value of the t-th iteration satisfies the first preset condition, wherein the first training translation model after the t-th iteration is determined as a first target translation model, the first training encoder in the first training translation model after the t-th iteration is determined as a first target encoder, and the first training decoder in the first training translation model after the t-th iteration is determined as a first target decoder.
Optionally, if the first loss value of the t-th iteration meets the first preset condition, the iteration of the first training translation model may be stopped, which indicates that the parameter precision of the first training translation model meets the first preset condition; alternatively, the first training translation model may be evaluated on test samples, and if the training effect meets the requirement, the training of the first training translation model may also be stopped.
The first trained translation model after the t-th iteration is determined as a first target translation model, the first trained encoder in the first trained translation model after the t-th iteration is determined as a first target encoder, and the first trained decoder in the first trained translation model after the t-th iteration is determined as a first target decoder.
Optionally, after stopping the iteration of the first training translation model, as shown in fig. 4, a translation method implementation process is described as follows, and the specific steps are as follows:
step S402, obtaining N words of a first language to be translated from a pre-trained first target translation model, where N is a natural number, and the first target translation model is used to translate the words of the first language into words of a second language.
Step S404, determining a first target past feature vector and a first target future feature vector corresponding to the N words by using the first target translation model, and concatenating the first target past feature vector and the first target future feature vector with M first decoded vectors corresponding to the N words to obtain M first target concatenated vectors, where the first target past feature vector and the first target future feature vector are determined based on N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number.
Step S406, determining M words of the second language by using the M first target concatenation vectors, where the M words of the second language are words of the second language obtained by translating the N words of the first language by the first target translation model.
Optionally, after the iteration of the first training translation model is stopped, N words of a first language to be translated may be obtained in the first target translation model, where N is a natural number and the first target translation model is used to translate the words of the first language into words of a second language.

Then, the N words of the first language are encoded by the first target encoder to obtain N first encoded vectors, where the N first encoded vectors are used to represent the N words of the first language, and the N first encoded vectors are further decoded by the first target decoder in the first target translation model to obtain M first decoded vectors, where M is a natural number; then, a past feature vector and a future feature vector are spliced onto each of the M first decoded vectors to obtain M first target spliced vectors, where the past feature vector and the future feature vector are determined according to the N first encoded vectors; and finally, M words of the second language are determined by using the M first target spliced vectors, where the M words of the second language are the words of the second language obtained by translating the N words of the first language by using the first target translation model.
For example, after the iteration of the first training translation model is stopped, that is, when the first training translation model has been successfully trained to obtain the first target translation model, the first target encoder may encode N (e.g., 10) Chinese words into N first encoded vectors; the first target decoder may then decode them into M (e.g., 8) first decoded vectors; the past feature vector and the future feature vector are determined according to the N first encoded vectors and spliced onto the M (e.g., 8) first decoded vectors to obtain M (e.g., 8) first target spliced vectors; and finally the M (e.g., 8) first target spliced vectors are used to determine M words of the second language (English).
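A minimal end-to-end sketch of this inference flow, assuming PyTorch and standard Transformer building blocks; the module names, dimensions, the mean-pooled linear projections standing in for the capsule network, and the greedy argmax decoding are all assumptions for illustration, not an architecture prescribed by this description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetTranslationModel(nn.Module):
    """Illustrative first target translation model: encoder, decoder,
    past/future feature vectors, splicing, and softmax output."""

    def __init__(self, src_vocab, tgt_vocab, d_model=512, d_cap=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)   # first target encoder
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=6)   # first target decoder
        self.to_past = nn.Linear(d_model, d_cap)      # stand-in for the capsule network
        self.to_future = nn.Linear(d_model, d_cap)    # stand-in for the capsule network
        self.out_proj = nn.Linear(d_model + 2 * d_cap, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # N first encoded vectors for the N words of the first language.
        enc = self.encoder(self.src_emb(src_ids))
        # Past / future feature vectors determined from the encoded vectors
        # (mean-pooled linear projections stand in for the capsule routing here).
        past = self.to_past(enc.mean(dim=1))
        future = self.to_future(enc.mean(dim=1))
        # M first decoded vectors.
        dec = self.decoder(self.tgt_emb(tgt_ids), enc)
        # Splice the past and future vectors onto each decoded vector.
        pf = torch.cat([past, future], dim=-1).unsqueeze(1).expand(-1, dec.size(1), -1)
        spliced = torch.cat([dec, pf], dim=-1)         # M first target spliced vectors
        # Probability of each candidate word; the highest-probability words are the output.
        probs = F.softmax(self.out_proj(spliced), dim=-1)
        return probs.argmax(dim=-1)
```

In the example above, src_ids would hold the indices of the 10 Chinese words and tgt_ids the indices of the English words generated so far; the returned indices would correspond to the M translated English words.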
Through the embodiment, the translation result of the translation model can be supervised by context information through the trained first target translation model and the mode of splicing the past characteristic vector and the future characteristic vector, so that the translation result is more accurate, and the training accuracy of the translation model is improved.
Optionally, in this embodiment, before the determining, by the first target translation model, the corresponding first target past feature vector and first target future feature vector of the N words, the method further includes: encoding said N words of said first language by a first target encoder in said first target translation model to obtain said N first code vectors, wherein said N first code vectors are used to represent said N words of said first language; and decoding the N first encoded vectors by a first target decoder in the first target translation model to obtain the M first decoded vectors.
Optionally, before determining the corresponding first target past feature vector and first target future feature vector of the N words through the first target translation model, the first target encoder in the first target translation model may encode the N words of the first language to obtain N first encoded vectors representing the N words of the first language, and then the first target decoder in the first target translation model may decode the N first encoded vectors to obtain the M first decoded vectors.
Through the embodiment, the N first coding vectors can be obtained through the first target encoder, and the M first decoding vectors can be obtained through the first target decoder, so that a foundation is laid for the subsequent translation of N words, and the translation accuracy is improved.
Optionally, in this embodiment, the determining, by the first target translation model, a first target past feature vector and a first target future feature vector corresponding to the N words includes: converting the N first encoded vectors into a plurality of first target low-level capsule vectors; converting the first target low-level capsule vectors into target high-level capsule vectors; the plurality of target high-level capsule vectors are divided into the first target past feature vector and the first target future feature vector.
Optionally, after obtaining the N first encoding vectors, the first target past feature vector and the first target future feature vector corresponding to the N words may also be determined by the first target translation model, and the specific steps are as follows:
converting the N first coding vectors into a plurality of first target low-level capsule vectors, then converting the plurality of first target low-level capsule vectors into a plurality of target high-level capsule vectors, and finally dividing the plurality of target high-level capsule vectors to obtain the first target past characteristic vector and the first target future characteristic vector.
Through the embodiment, the first target past characteristic vector and the first target future characteristic vector corresponding to the N first coding vectors can be accurately determined through the method, namely, the past characteristic vector corresponding to the translated word and the future characteristic vector corresponding to the untranslated word are accurately determined, and the translation accuracy is improved.
Optionally, in this embodiment, splicing the first target past feature vector and the first target future feature vector with the M first decoded vectors corresponding to the N words to obtain the M first target spliced vectors includes: obtaining the M first target spliced vectors by respectively splicing the first target past feature vector and the first target future feature vector with the M first decoded vectors, where the first target past feature vector and the first target future feature vector are determined from the N first encoded vectors.
Optionally, after the first target past feature vector and the first target future feature vector are obtained in the above manner, the M first decoding vectors may be further spliced with the first target past feature vector and the first target future feature vector, respectively, so as to obtain M first target splicing vectors, so that the M first target splicing vectors may include information of translated words and information of untranslated words.
Through the embodiment, because the M first target splicing vectors contain the information of the translated words and the information of the untranslated words, M words corresponding to the N words determined by the M first target splicing vectors are more accurate, and the accuracy of translation is improved.
Optionally, in this embodiment, the method further includes: and determining whether the M words are accurate translations of the N words through a pre-trained second target translation model, wherein the second target translation model is used for translating the words in the second language into the words in the first language.
Optionally, in the process of translation by the first target translation model, the M words translated by the first target translation model may be supervised through a second target translation model trained in advance, and whether the M words are accurate translations of the N words may be determined, where the second target translation model is used to translate the words of the second language into the words of the first language.
Through the embodiment, the first target translation model can be supervised through the second target translation model, and the translation accuracy is improved.
Optionally, in this embodiment, the determining, by using a second target translation model trained in advance, whether the M words are accurate translations of the N words includes:
encoding N_1 words of the second language by a second target encoder in the second target translation model to obtain N_1 second encoded vectors, and determining a second target past feature vector according to the N_1 second encoded vectors, where the N_1 words of the second language are words of the second language translated by the first target translation model at the current moment; encoding N_2 words of the second language by the second target encoder to obtain N_2 second target encoded vectors, and determining a second target future feature vector according to the N_2 second target encoded vectors, where the N_2 words of the second language are words of the second language corresponding to the words of the first language to be translated by the first target translation model after the current moment; determining a target loss value of the current moment according to the first target past feature vector and the first target future feature vector, and the second target past feature vector and the second target future feature vector; and determining that the M words are accurate translations of the N words in a case that the target loss value meets a target threshold.
Optionally, how the second target translation model determines whether the M words are accurate translations of the N words is described in detail below, which includes the following steps:
First, N_1 words of the second language are encoded by the second target encoder in the second target translation model to obtain N_1 second encoded vectors, and the second target past feature vector is determined according to the N_1 second encoded vectors, where the N_1 words of the second language are the words of the second language obtained by the first target translation model translating the words to be translated at the current moment.

Then, N_2 words of the second language are encoded by the second target encoder to obtain N_2 second target encoded vectors, and the second target future feature vector is determined according to the N_2 second target encoded vectors, where the N_2 words of the second language are the words of the second language corresponding to the words of the first language to be translated by the first target translation model after the current moment.

Then, the target loss value of the current moment is determined according to the first target past feature vector and the first target future feature vector, and the second target past feature vector and the second target future feature vector. The target loss value may be determined by adding a first distance value between the first target past feature vector and the second target past feature vector and a second distance value between the first target future feature vector and the second target future feature vector, and taking the sum of the first distance value and the second distance value as the target loss value.
And determining that the M words are accurate translations of the N words when the target loss value meets a target threshold.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
Through the embodiment, whether the M words are accurate translations of the N words can be accurately determined by calculating the target loss value, and the translation accuracy is improved.
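A small sketch of that check (assuming PyTorch; the L2 distance and the particular threshold value are assumptions):

```python
import torch

def is_accurate_translation(target_past_1, target_past_2,
                            target_future_1, target_future_2,
                            target_threshold=0.5):
    """Target loss value = distance(past, past) + distance(future, future),
    compared against a target threshold (distance form and threshold assumed)."""
    first_distance = torch.dist(target_past_1, target_past_2)
    second_distance = torch.dist(target_future_1, target_future_2)
    target_loss = first_distance + second_distance
    return bool(target_loss <= target_threshold)
```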
Optionally, as shown in fig. 5, a process of updating parameters in the second training translation model is described as follows:
Step S502, when the t-th iteration is performed on the second training translation model, encoding N_t words of the second language by the second training encoder in the second training translation model to obtain N_t third training code vectors, where the N_t third training code vectors are used for representing the N_t words of the second language, N_t is a natural number, and the second training translation model is used for translating the words of the second language into the words of the first language.

Step S504, determining a third past feature vector and a third future feature vector according to the N_t third training code vectors.

Step S506, encoding N_t1 words of the first language by the first training encoder to obtain N_t1 fourth training code vectors, and determining a fourth past feature vector according to the N_t1 fourth training code vectors, where the N_t1 words of the first language are words of the first language translated by the second training translation model before the t-th iteration.

Step S508, encoding N_t2 words of the first language by the first training encoder to obtain N_t2 fourth training code vectors, and determining a fourth future feature vector according to the N_t2 fourth training code vectors, where the N_t2 words of the first language are words of the first language corresponding to the words of the second language to be translated by the second training translation model after the t-th iteration.

Step S510, determining a second loss value of the t-th iteration according to the third past feature vector and the third future feature vector, and the fourth past feature vector and the fourth future feature vector.

Step S512, updating the parameters in the second training translation model when the second loss value of the t-th iteration does not satisfy the second preset condition, and performing a (t+1)-th iteration on the second training translation model.
Optionally, when the second training translation model is iterated for the t-th time, the second training encoder in the second training translation model may be used to encode the Nt-th word of the second language to obtain the Nt-th third training encoding vector, where the second training translation model is used to translate the words of the second language into the words of the first language, the Nt-th third training encoding vector represents the Nt-th word of the second language, and Nt is a natural number.
Then, a third past feature vector and a third future feature vector are determined according to the Nt-th third training encoding vector.
Then, the first training encoder may encode the Nt1 words of the first language that have been translated by the second training translation model before the t-th iteration to obtain Nt1 fourth training encoding vectors, and a fourth past feature vector is determined according to the Nt1 fourth training encoding vectors.
Then, the first training encoder may encode the Nt2 words of the first language corresponding to the words of the second language still to be translated by the second training translation model after the t-th iteration to obtain Nt2 fourth training encoding vectors, and a fourth future feature vector is determined according to the Nt2 fourth training encoding vectors.
Then, a second loss value of the t-th iteration is determined according to the third past feature vector and the third future feature vector, and the fourth past feature vector and the fourth future feature vector.
Finally, if the second loss value of the t-th iteration does not satisfy the second preset condition, the parameters in the second training translation model may be updated, and the (t+1)-th iteration may be performed on the second training translation model.
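As a schematic illustration of the iteration flow above, a training loop for the second training translation model might look as follows. The helper callables compute_loss and update_parameters, the fixed loss threshold standing in for the second preset condition, and the iteration limit are all assumptions made for this sketch rather than details taken from the invention.

def train_reverse_model(model, batches, compute_loss, update_parameters,
                        loss_threshold=0.1, max_iterations=100000):
    # Iterate the second (target-to-source) training translation model until the
    # second loss value satisfies the (assumed) preset condition.
    for t in range(1, max_iterations + 1):
        loss = compute_loss(model, next(batches))
        if loss <= loss_threshold:
            # Condition met: stop iterating; the trained model, its encoder and its
            # decoder then serve as the second target model, encoder and decoder.
            return model
        update_parameters(model, loss)  # otherwise update and run iteration t + 1
    return model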
Optionally, in this embodiment, after the second training encoder in the second training translation model encodes the Nt-th word of the second language, the method further includes: decoding the Nt-th third training encoding vector by using a second training decoder in the second training translation model to obtain an Nt-th third training decoding vector, where the Nt-th third training decoding vector is used to determine the Nt-th word of the first language, and the Nt-th word of the first language is obtained by the second training translation model translating the Nt-th word of the second language; splicing the Nt-th third training decoding vector with the third past feature vector and the third future feature vector to obtain a third spliced vector; and determining the Nt-th word of the first language by using the third spliced vector.
Optionally, as shown in fig. 3, the Decoder on the right part shown in fig. 3 is a second training Decoder in the second training translation model.
The second training decoder may be used to decode the Nt-th third training encoding vector to obtain the Nt-th third training decoding vector, which is used to determine the Nt-th word of the first language, where the Nt-th word of the first language is obtained by the second training translation model translating the Nt-th word of the second language.
Then, the Nt-th third training decoding vector is spliced with the third past feature vector and the third future feature vector to obtain a third spliced vector, and finally, the third spliced vector is used to determine the Nt-th word of the first language.
Optionally, the following describes a process of how to obtain the second target translation model, and the specific steps are as follows:
optionally, in this embodiment, after determining the second loss value of the tth iteration, the method further includes: and determining the second trained translation model as a second target translation model when a second loss value of the t-th iteration satisfies the second preset condition, wherein the second trained encoder in the second trained translation model after the t-th iteration is determined as a second target encoder, and the second trained decoder in the second trained translation model after the t-th iteration is determined as a second target decoder.
Optionally, if the second loss value of the t-th iteration meets the second preset condition, the iteration of the second training translation model may be stopped, which indicates that the parameter accuracy of the second training translation model satisfies the second preset condition; alternatively, the second training translation model may be evaluated on test samples, and if the training effect meets the requirement, training of the second training translation model may also be stopped.
The second trained translation model after the t-th iteration is determined as a second target translation model, the second trained encoder in the second trained translation model after the t-th iteration is determined as a second target encoder, and the second trained decoder in the second trained translation model after the t-th iteration is determined as a second target decoder.
The flow of the translation method is described below with reference to an alternative example. The method comprises the following specific steps:
Optionally, information about the translated part and the untranslated part of the source sentence is determined through the dynamic routing mechanism of a capsule network, and the low-level capsules are mapped onto the high-level capsules by calculating assignment probabilities. The N high-level capsules are then divided equally into N/2 capsules representing the "past" (translated) part and N/2 capsules representing the "future" (untranslated) part, and the translated-part and untranslated-part information is concatenated with the output of the decoder before being fed into the Softmax layer commonly used in machine translation, so that the machine translation result is more accurate.
Alternatively, the output {h1, h2, …, hI} of the encoding module of the Neural Machine Translation (NMT) model may be mapped to the low-level capsules:

ui,j = Wj * hi,

where ui,j is the low-level capsule corresponding to the encoder output hi, and Wj represents the trainable parameters of the j-th capsule, given as a matrix of suitable dimensions. Then, during the dynamic routing of the capsule network, the representation Ωj of each high-level capsule is defined through a compression (squash) operation:

Ωj = squash(sj) = (||sj||^2 / (1 + ||sj||^2)) · (sj / ||sj||),

sj = Σi cij · ui,j,

cij = Softmax(bij),

where sj is the result of a weighted summation over all low-level capsules, and cij is the assignment probability of a low-level capsule to a high-level capsule. Next, the high-level capsules can be split equally into two groups, representing the translated part ΩP and the untranslated part ΩF, respectively.

In each iteration of dynamic routing, the model iteratively updates the value of bij according to the matching information obtained so far:

bij = bij + wT tanh(Wb [zt; uij; Ωj])
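The dynamic routing above can be read, under some assumptions, as the following NumPy sketch: the squash non-linearity, the softmax assignment probabilities and the equal split into "past" and "future" capsules are implemented literally, while the bij update is simplified to the usual agreement term (the formula above additionally conditions on the decoder state zt), and the number of routing iterations is an assumed hyperparameter.

import numpy as np

def squash(s, eps=1e-8):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * s / |s|
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(h, W, num_iterations=3):
    # h: encoder outputs, shape (I, d_in); W: per-capsule matrices, shape (J, d_out, d_in).
    I, J = h.shape[0], W.shape[0]
    u = np.einsum('jod,id->ijo', W, h)             # u[i, j] = Wj * hi
    b = np.zeros((I, J))
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                     # cij = Softmax(bij)
        s = np.einsum('ij,ijo->jo', c, u)          # sj = weighted sum of low-level capsules
        omega = squash(s)                          # Omega_j = squash(sj)
        b = b + np.einsum('ijo,jo->ij', u, omega)  # simplified agreement update of bij
    # Split the high-level capsules equally into "past" and "future" groups.
    return omega[: J // 2], omega[J // 2:]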
On the basis of modeling the past and the future with the capsule network, the present invention proposes to use a dual network to supervise the past part and the future part separately.
Alternatively, the translated and untranslated portions may be modeled by a dynamically routed capsule network (DGC).
Alternatively, as shown in fig. 3, the overall structure of the model is a two-way dual model. In the model, two machine translation models which are opposite to each other are used simultaneously, and mutual supervision can be performed by utilizing the association between the two models. For example, in FIG. 3, the left-side source-to-target may be a Chinese-to-English translation model, and the right-side target-to-source may be an English-to-Chinese translation model.
Assume that at some time step t of machine translation model decoding, the "past" capsule output of the forward (source-to-target) translation model is as follows:
ΩP,f = DGC(hb, cij, zt)
assume that at some time step t of machine translation model decoding, the "future" capsule output of the forward (source-to-target) translation model is as follows:
ΩF,f = DGC(hb, cij, zt)
Then, the partial result Yt = {y1, …, yt} that the forward (source-to-target) model has already decoded by time step t can be fed back into the encoding part of the reverse (target-to-source) model, yielding an encoding result of the decoded (translated) part produced by the reverse encoder.

In the same way, the part Y≥t = {yt+1, …, yT} after time step t can be fed into the encoding module of the reverse (target-to-source) model to obtain an encoding result based on the future part.

Then, the two encoding results can be fed into the DGC again to extract the corresponding high-level representations (for example, the outputs of the corresponding high-level capsules), here denoted ΩP,b and ΩF,b and masked by mp and mf, the masks of the translated and untranslated portions corresponding to time step t. A semantic distance LP is then calculated between the outputs of the DGCs at the two ends, namely the sum of the distance between the two "past" representations (ΩP,f and ΩP,b) and the distance between the two "future" representations (ΩF,f and ΩF,b).

In the training process, suppose the training loss calculated by the machine translation model itself is LNMT. The above semantic distance is added to this training loss:

L = LNMT + LP
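The mutual supervision between the two directions can be summarised in the schematic function below. It only fixes the data flow: the reverse encoder, the DGC module and the distance measure are passed in as caller-supplied callables, and all names are illustrative rather than taken from the invention.

def dual_supervision_loss(nmt_loss, reverse_encoder, dgc, distance,
                          decoded_prefix, remaining_suffix,
                          omega_past_fwd, omega_future_fwd):
    # decoded_prefix:   target words y_1 .. y_t already produced (the "past")
    # remaining_suffix: target words y_{t+1} .. y_T not yet produced (the "future")
    # Feed both parts back through the reverse (target-to-source) encoder and
    # extract their high-level representations with the DGC module.
    omega_past_bwd = dgc(reverse_encoder(decoded_prefix))
    omega_future_bwd = dgc(reverse_encoder(remaining_suffix))
    # Semantic distance between the two ends' past and future representations.
    l_p = (distance(omega_past_fwd, omega_past_bwd)
           + distance(omega_future_fwd, omega_future_bwd))
    # L = L_NMT + L_P
    return nmt_loss + l_p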
Optionally, through the above loss L, the forward (source-to-target) model and the reverse (target-to-source) model can supervise each other during normal training, so that the translation performance of both models is enhanced.
Optionally, the output of the capsule module may be added, by concatenation, as additional information to the final output of the machine translation (NMT) model, and the probability of each word may be predicted by the Softmax layer, where P(y|x) = Softmax([Z; ΩP; ΩF]) is used to generate the final prediction result.
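A minimal sketch of this output layer is given below: the decoder output is concatenated (spliced) with the past and future capsule outputs and projected onto the vocabulary before the Softmax. The projection matrix, the dimensions and the random inputs are made-up placeholders used only to show the shapes involved.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_word_distribution(z_t, omega_past, omega_future, W_out):
    # P(y | x) = Softmax([z_t; Omega_P; Omega_F]) projected onto the vocabulary.
    spliced = np.concatenate([z_t, omega_past.ravel(), omega_future.ravel()])
    return softmax(W_out @ spliced)

# Illustrative usage with made-up sizes (hidden size 4, two capsules of size 4 each,
# vocabulary of 10 candidate words).
rng = np.random.default_rng(0)
z_t = rng.normal(size=4)
omega_past = rng.normal(size=(2, 4))
omega_future = rng.normal(size=(2, 4))
W_out = rng.normal(size=(10, 4 + 8 + 8))
probabilities = predict_word_distribution(z_t, omega_past, omega_future, W_out)
predicted_word_index = int(np.argmax(probabilities))  # highest-probability candidate word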
In addition, the invention can be applied to any machine translation system with a Seq2Seq structure to enhance translation fidelity.
According to this embodiment, by modeling the translated part and the untranslated part in machine translation, the translated and untranslated parts can be accurately identified, and this information can assist the model's future predictions. Using such contextual information effectively improves the performance of the machine translation system, increases translation fidelity, and further enhances the user experience.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to still another aspect of the embodiments of the present invention, there is also provided a translation apparatus, as shown in fig. 6, including:
a first encoding unit 602, configured to, when the first training translation model is iterated for the t-th time, use a first training encoder in the first training translation model to encode the Nt-th word of a first language to obtain an Nt-th first training encoding vector, where the Nt-th first training encoding vector is used to represent the Nt-th word of the first language, Nt is a natural number, and the first training translation model is used to translate the words of the first language into the words of a second language;
a second determining unit 604, configured to determine a first past feature vector and a first future feature vector according to the Nt-th first training encoding vector;
a second processing unit 606, configured to use a second training encoder in a second training translation model to encode Nt1 words of the second language to obtain Nt1 second training encoding vectors, and determine a second past feature vector according to the Nt1 second training encoding vectors, where the Nt1 words of the second language are the words of the second language already translated by the first training translation model before the t-th iteration;
a third processing unit 608, configured to use the second training encoder to encode Nt2 words of the second language to obtain Nt2 second training encoding vectors, and determine a second future feature vector according to the Nt2 second training encoding vectors, where the Nt2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first training translation model after the t-th iteration;
a third determining unit 610, configured to determine a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector;
a fourth processing unit 612, configured to update the parameters in the first training translation model when the first loss value of the t-th iteration does not satisfy the first preset condition, and perform the (t+1)-th iteration on the first training translation model.
Through this embodiment, when the first training translation model is iterated for the t-th time, the first training encoder in the first training translation model encodes the Nt-th word of the first language to obtain an Nt-th first training encoding vector, where the Nt-th first training encoding vector represents the Nt-th word of the first language, Nt is a natural number, and the first training translation model is used to translate the words of the first language into the words of the second language. A first past feature vector and a first future feature vector are determined according to the Nt-th first training encoding vector. A second training encoder in a second training translation model encodes Nt1 words of the second language to obtain Nt1 second training encoding vectors, and a second past feature vector is determined according to the Nt1 second training encoding vectors, where the Nt1 words of the second language are the words of the second language already translated by the first training translation model before the t-th iteration. The second training encoder encodes Nt2 words of the second language to obtain Nt2 second training encoding vectors, and a second future feature vector is determined according to the Nt2 second training encoding vectors, where the Nt2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first training translation model after the t-th iteration. A first loss value of the t-th iteration is determined according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector. When the first loss value of the t-th iteration does not satisfy the first preset condition, the parameters in the first training translation model are updated and the (t+1)-th iteration is performed on the first training translation model. Through this process, the translated words and the untranslated words can be accurately determined, and the parameters of the translation model are updated iteratively so that they become more accurate, which achieves the effect of improving the fidelity of the translation model by using the information of the translated and untranslated words and improves the accuracy of the translation model.
As an optional technical solution, the second determining unit includes: a first conversion module, configured to convert the Nt-th first training encoding vector into a plurality of first low-level capsule vectors; a second conversion module, configured to convert the plurality of first low-level capsule vectors into a plurality of high-level capsule vectors; and a first dividing module, configured to divide the plurality of high-level capsule vectors into the first past feature vector and the first future feature vector.
As an optional technical solution, the first conversion module is further configured to convert the Nt-th first training encoding vector into the plurality of first low-level capsule vectors in the following manner: ui,j = Wj * hi, where {u1,j, u2,j, …, uI,j} are the first low-level capsule vectors corresponding to the j-th high-level capsule vector, hi ∈ {h1, h2, …, hI}, {h1, h2, …, hI} is the Nt-th first training encoding vector, and Wj represents a j-th training parameter corresponding to the j-th high-level capsule vector of the plurality of high-level capsule vectors. The second conversion module is further configured to convert the plurality of first low-level capsule vectors into the plurality of high-level capsule vectors in the following manner:

Ωj = squash(sj),

sj = Σi cij · ui,j,

cij = Softmax(bij), bij = bij + wT tanh(Wb [zt; uij; Ωj]),

where Ωj is the j-th high-level capsule vector of the plurality of high-level capsule vectors, sj is the result of weighted summation of the plurality of first low-level capsule vectors, cij represents the assignment probability of one of the plurality of low-level capsule vectors corresponding to each of the plurality of high-level capsule vectors, wT and Wb are first usage parameters, and I is a preset value.
As an optional technical solution, the apparatus further includes: a first decoding unit, configured to decode the Nt-th first training encoding vector by using a first training decoder in the first training translation model to obtain an Nt-th first training decoding vector, where the Nt-th first training decoding vector is used to determine the Nt-th word of the second language, and the Nt-th word of the second language is obtained by the first training translation model translating the Nt-th word of the first language; a first splicing unit, configured to splice the Nt-th first training decoding vector with the first past feature vector and the first future feature vector to obtain a first spliced vector; and a fourth determining unit, configured to determine the Nt-th word of the second language by using the first spliced vector.
As an optional technical solution, the fourth determining unit includes: a first determining module, configured to determine, by using the first spliced vector, a probability of each candidate word in the candidate word set of the second language, where the probability of each candidate word indicates the probability that the candidate word is the Nt-th word of the second language; and a second determining module, configured to determine the candidate word with the highest probability in the candidate word set of the second language as the Nt-th word of the second language.
As an optional technical solution, the fourth processing unit includes at least one of: a first updating module, configured to update the training parameters in the first training encoder; a second updating module, configured to update the training parameters in the first training decoder; and a third updating module, configured to update a first usage parameter, where the first usage parameter is a parameter used when determining the first past feature vector and the first future feature vector according to the Nt-th first training encoding vector.
As an optional technical solution, the third determining unit includes: a third determining module, configured to determine a first semantic distance according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector; and a fourth determining module, configured to determine a first loss value of the t-th iteration according to the first semantic distance and a first training loss value, where the first training loss value is a training loss value determined according to the Nt-th word of the second language, and the Nt-th word of the second language is obtained by the first training translation model translating the Nt-th word of the first language in the t-th iteration.
As an optional technical solution, the third determining module is further configured to determine the first semantic distance according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector, in the following manner:

LP,F = d(ΩP,f, ΩP,b) + d(ΩF,f, ΩF,b),

where LP,F represents the first semantic distance, ΩP,f represents the first past feature vector, ΩP,b represents the second past feature vector, ΩF,f represents the first future feature vector, ΩF,b represents the second future feature vector, and d(·, ·) denotes the distance between two feature vectors.
As an optional technical solution, the fourth determining module is further configured to determine the first loss value of the t-th iteration according to the first semantic distance and the first training loss value in the following manner: L = LNMT + LP,F, where L represents the first loss value of the t-th iteration, LP,F represents the first semantic distance, and LNMT represents the first training loss value.
As an optional technical solution, the apparatus further includes: a fifth processing unit, configured to stop the iteration of the first training translation model when the first loss value of the t-th iteration satisfies the first preset condition, where the first training translation model after the t-th iteration is determined to be a first target translation model, the first training encoder in the first training translation model after the t-th iteration is determined to be a first target encoder, and the first training decoder in the first training translation model after the t-th iteration is determined to be a first target decoder.
According to still another aspect of the embodiments of the present invention, there is also provided a translation apparatus, as shown in fig. 7, including:
a first obtaining unit 702, configured to obtain N words of a first language to be translated from a pre-trained first target translation model, where N is a natural number, where the first target translation model is used to translate the words of the first language into words of a second language;
a first processing unit 704, configured to determine a first target past feature vector and a first target future feature vector corresponding to the N words through the first target translation model, and splice the first target past feature vector and the first target future feature vector with M first decoded vectors corresponding to the N words, so as to obtain M first target spliced vectors, where the first target past feature vector and the first target future feature vector are vectors determined according to N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number;
a first determining unit 706, configured to determine M words of the second language by using the M first target concatenation vectors, where the M words of the second language are words of the second language obtained by translating the N words of the first language by the first target translation model.
In this embodiment, N words of a first language to be translated are obtained from a pre-trained first target translation model (which is used to translate words of the first language into words of a second language). The first target past feature vector and the first target future feature vector corresponding to the N words are then determined through the first target translation model; these vectors are determined according to the N first encoding vectors corresponding to the N words, and the M first decoding vectors are obtained by decoding the N first encoding vectors. The first target past feature vector and the first target future feature vector are spliced with the M first decoding vectors to obtain M first target spliced vectors, and finally the M first target spliced vectors are used to determine M words of the second language, which are the words of the second language obtained by the first target translation model translating the N words of the first language. Through this process, the first target past feature vector corresponding to the already-translated words and the first target future feature vector corresponding to the not-yet-translated words can be accurately determined by the pre-trained first target translation model, spliced with the M first decoding vectors to obtain the M first target spliced vectors, and used to determine the M words corresponding to the N words to be translated. By splicing the past feature vector and the future feature vector in this way, the words produced by the translation model are more accurate, which achieves the effect of improving the fidelity of the translation model by using the information of the translated and untranslated words and improves the accuracy of the translation model.
As an optional technical solution, the apparatus further includes: a second encoding unit configured to encode the N words of the first language by the first target encoder in the first target translation model to obtain N first encoded vectors, wherein the N first encoded vectors are used to represent the N words of the first language; a second decoding unit, configured to decode the N first encoded vectors by using the first target decoder in the first target translation model, so as to obtain the M first decoded vectors.
As an optional technical solution, the first processing unit is further configured to splice the first target past eigenvector and the first target future eigenvector to the M first decoded vectors respectively to obtain M first target spliced vectors, where the first target past eigenvector and the first target future eigenvector are vectors determined according to the N first encoded vectors.
As an optional technical solution, the first processing unit includes: a third conversion module, configured to convert the N first encoded vectors into a plurality of first target low-level capsule vectors; a fourth conversion module, configured to convert the plurality of first target low-level capsule vectors into a plurality of target high-level capsule vectors; a second dividing module, configured to divide the plurality of target high-level capsule vectors into the first target past feature vector and the first target future feature vector.
As an optional technical solution, the apparatus further includes: a fifth determining unit, configured to determine whether the M words are accurate translations of the N words through a pre-trained second target translation model, where the second target translation model is used to translate the words in the second language into the words in the first language.
As an optional technical solution, the fifth determining unit includes: a first processing module, configured to use a second target encoder in the second target translation model to encode N1 words of the second language to obtain N1 second encoding vectors, and determine a second target past feature vector according to the N1 second encoding vectors, where the N1 words of the second language are the words of the second language already translated by the first target translation model at the current moment; a second processing module, configured to use the second target encoder to encode N2 words of the second language to obtain N2 second target encoding vectors, and determine a second target future feature vector according to the N2 second target encoding vectors, where the N2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first target translation model after the current moment; a third processing module, configured to determine a target loss value at the current moment according to the first target past feature vector and the first target future feature vector, and the second target past feature vector and the second target future feature vector; and a fourth processing module, configured to determine that the M words are an accurate translation of the N words when the target loss value satisfies a target threshold.
As an optional technical solution, the apparatus further includes: a third encoding unit, configured to, when a second training translation model is iterated for the t-th time, use a second training encoder in the second training translation model to encode the Nt-th word of the second language to obtain an Nt-th third training encoding vector, where the Nt-th third training encoding vector is used to represent the Nt-th word of the second language, Nt is a natural number, and the second training translation model is used to translate the words of the second language into the words of the first language; a sixth determining unit, configured to determine a third past feature vector and a third future feature vector according to the Nt-th third training encoding vector; a sixth processing unit, configured to use the first training encoder to encode Nt1 words of the first language to obtain Nt1 fourth training encoding vectors, and determine a fourth past feature vector according to the Nt1 fourth training encoding vectors, where the Nt1 words of the first language are the words of the first language already translated by the second training translation model before the t-th iteration; a seventh processing unit, configured to use the first training encoder to encode Nt2 words of the first language to obtain Nt2 fourth training encoding vectors, and determine a fourth future feature vector according to the Nt2 fourth training encoding vectors, where the Nt2 words of the first language are the words of the first language corresponding to the words of the second language still to be translated by the second training translation model after the t-th iteration; a seventh determining unit, configured to determine a second loss value of the t-th iteration according to the third past feature vector and the third future feature vector, and the fourth past feature vector and the fourth future feature vector; and an eighth processing unit, configured to, when the second loss value of the t-th iteration does not satisfy a second preset condition, update the parameters in the second training translation model and perform the (t+1)-th iteration on the second training translation model.
As an optional technical solution, the apparatus further includes: a third decoding unit, configured to decode the Nt-th third training encoding vector by using a second training decoder in the second training translation model to obtain an Nt-th third training decoding vector, where the Nt-th third training decoding vector is used to determine the Nt-th word of the first language, and the Nt-th word of the first language is obtained by the second training translation model translating the Nt-th word of the second language; a second splicing unit, configured to splice the Nt-th third training decoding vector with the third past feature vector and the third future feature vector to obtain a third spliced vector; and an eighth determining unit, configured to determine the Nt-th word of the first language by using the third spliced vector.
As an optional technical solution, the apparatus further includes: a ninth processing unit, configured to, after the second loss value of the t-th iteration is determined, determine the second training translation model as a second target translation model when the second loss value of the t-th iteration satisfies the second preset condition, where the second training encoder in the second training translation model after the t-th iteration is determined as a second target encoder, and the second training decoder in the second training translation model after the t-th iteration is determined as a second target decoder.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, obtaining N words of a first language to be translated from a pre-trained first target translation model, wherein N is a natural number, and the first target translation model is used for translating the words of the first language into words of a second language;
s2, determining a first target past eigenvector and a first target future eigenvector corresponding to the N words by using the first target translation model, and concatenating the first target past eigenvector and the first target future eigenvector with M first decoded vectors corresponding to the N words, to obtain M first target concatenated vectors, where the first target past eigenvector and the first target future eigenvector are determined from N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number;
s3, determining M words of the second language by using the M first target concatenation vectors, wherein the M words of the second language are the words of the second language obtained by the first target translation model translating the N words of the first language.
Optionally, in this embodiment, the storage medium may be further configured to store a computer program for executing the following steps:
S1, when the first training translation model is iterated for the t-th time, using a first training encoder in the first training translation model to encode the Nt-th word of the first language to obtain an Nt-th first training encoding vector, where the Nt-th first training encoding vector is used to represent the Nt-th word of the first language, Nt is a natural number, and the first training translation model is used to translate the words of the first language into the words of a second language;
S2, determining a first past feature vector and a first future feature vector according to the Nt-th first training encoding vector;
S3, using a second training encoder in a second training translation model to encode Nt1 words of the second language to obtain Nt1 second training encoding vectors, and determining a second past feature vector according to the Nt1 second training encoding vectors, where the Nt1 words of the second language are the words of the second language already translated by the first training translation model before the t-th iteration;
S4, using the second training encoder to encode Nt2 words of the second language to obtain Nt2 second training encoding vectors, and determining a second future feature vector according to the Nt2 second training encoding vectors, where the Nt2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first training translation model after the t-th iteration;
S5, determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector;
S6, when the first loss value of the t-th iteration does not satisfy the first preset condition, updating the parameters in the first training translation model and performing the (t+1)-th iteration on the first training translation model.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, ROM (Read-Only Memory), RAM (Random Access Memory), magnetic or optical disks, and the like.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the translation method, where the electronic device may be the terminal device or the server shown in fig. 1. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 8, the electronic device comprises a memory 802 and a processor 804, the memory 802 having a computer program stored therein, the processor 804 being arranged to perform the steps of any of the above-described method embodiments by means of the computer program.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, obtaining N words of a first language to be translated from a pre-trained first target translation model, wherein N is a natural number, and the first target translation model is used for translating the words of the first language into words of a second language;
s2, determining a first target past eigenvector and a first target future eigenvector corresponding to the N words by using the first target translation model, and concatenating the first target past eigenvector and the first target future eigenvector with M first decoded vectors corresponding to the N words, to obtain M first target concatenated vectors, where the first target past eigenvector and the first target future eigenvector are determined from N first encoded vectors corresponding to the N words, the M first decoded vectors are obtained by decoding the N first encoded vectors, and M is a natural number;
s3, determining M words of the second language by using the M first target concatenation vectors, wherein the M words of the second language are the words of the second language obtained by the first target translation model translating the N words of the first language.
Optionally, in this embodiment, the processor may be further configured to execute, by the computer program, the following steps:
S1, when the first training translation model is iterated for the t-th time, using a first training encoder in the first training translation model to encode the Nt-th word of the first language to obtain an Nt-th first training encoding vector, where the Nt-th first training encoding vector is used to represent the Nt-th word of the first language, Nt is a natural number, and the first training translation model is used to translate the words of the first language into the words of a second language;
S2, determining a first past feature vector and a first future feature vector according to the Nt-th first training encoding vector;
S3, using a second training encoder in a second training translation model to encode Nt1 words of the second language to obtain Nt1 second training encoding vectors, and determining a second past feature vector according to the Nt1 second training encoding vectors, where the Nt1 words of the second language are the words of the second language already translated by the first training translation model before the t-th iteration;
S4, using the second training encoder to encode Nt2 words of the second language to obtain Nt2 second training encoding vectors, and determining a second future feature vector according to the Nt2 second training encoding vectors, where the Nt2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first training translation model after the t-th iteration;
S5, determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector;
S6, when the first loss value of the t-th iteration does not satisfy the first preset condition, updating the parameters in the first training translation model and performing the (t+1)-th iteration on the first training translation model.
Alternatively, it is understood by those skilled in the art that the structure shown in fig. 8 is only an illustration and is not a limitation to the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 8, or have a different configuration than shown in FIG. 8.
The memory 802 may be used to store software programs and modules, such as program instructions/modules corresponding to the translation method and apparatus in the embodiments of the present invention, and the processor 804 executes various functional applications and data processing by running the software programs and modules stored in the memory 802, so as to implement the translation method described above. The memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 802 can further include memory located remotely from the processor 804, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. As an example, as shown in fig. 8, the memory 802 may include, but is not limited to, a first encoding unit 602, a second determining unit 604, a second processing unit 606, a third processing unit 608, a third determining unit 610, and a fourth processing unit 612 in the translation apparatus, or the memory 802 may include, but is not limited to, a first obtaining unit 702, a first processing unit 704, and a first determining unit 706 in the translation apparatus. In addition, other module units in the translation apparatus may also be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmitting device 806 is configured to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 806 includes a Network adapter (NIC) that can be connected to a router via a Network cable and other Network devices to communicate with the internet or a local area Network. In one example, the transmission device 806 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: and a connection bus 808 for connecting the respective module components in the electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the above methods according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (15)

1. A method of translation, comprising:
acquiring N words of a first language to be translated from a pre-trained first target translation model, wherein N is a natural number, and the first target translation model is used for translating the words of the first language into words of a second language;
determining a first target past feature vector and a first target future feature vector corresponding to the N words through the first target translation model, and splicing the first target past feature vector and the first target future feature vector with M first decoding vectors corresponding to the N words to obtain M first target spliced vectors, where the first target past feature vector and the first target future feature vector are vectors determined according to N first encoding vectors corresponding to the N words, the M first decoding vectors are vectors obtained by decoding the N first encoding vectors, and M is a natural number;
and determining M words of the second language by using the M first target splicing vectors, wherein the M words of the second language are obtained by translating the N words of the first language by the first target translation model.
2. The method of claim 1, wherein prior to said determining, by said first target translation model, corresponding first target past and future feature vectors for said N words, said method further comprises:
encoding the N words in the first language by a first target encoder in the first target translation model to obtain N first encoding vectors, where the N first encoding vectors are used to represent the N words in the first language;
and decoding the N first coding vectors through a first target decoder in the first target translation model to obtain the M first decoding vectors.
3. The method of claim 2, wherein the concatenating the first target past feature vector and the first target future feature vector with the M first decoded vectors corresponding to the N words to obtain M first target concatenated vectors comprises:
and respectively splicing the first target past eigenvector and the first target future eigenvector for the M first decoding vectors to obtain M first target spliced vectors, wherein the first target past eigenvector and the first target future eigenvector are vectors determined according to the N first coding vectors.
4. The method of claim 2, wherein said determining, by the first target translation model, corresponding first target past and future feature vectors for the N words comprises:
converting the N first encoding vectors into a plurality of first target low-level capsule vectors;
converting the plurality of first target low-level capsule vectors into a plurality of target high-level capsule vectors;
dividing the plurality of target high-level capsule vectors into the first target past feature vector and the first target future feature vector.
5. The method according to any one of claims 1 to 4, further comprising:
and determining whether the M words are accurate translations of the N words through a pre-trained second target translation model, wherein the second target translation model is used for translating the words in the second language into the words in the first language.
6. The method of claim 5, wherein determining whether the M words are accurate translations of the N words through a second pre-trained target translation model comprises:
using a second target encoder in the second target translation model to encode N1 words of the second language to obtain N1 second encoding vectors, and determining a second target past feature vector according to the N1 second encoding vectors, wherein the N1 words of the second language are the words of the second language already translated by the first target translation model at the current moment;
using the second target encoder to encode N2 words of the second language to obtain N2 second target encoding vectors, and determining a second target future feature vector according to the N2 second target encoding vectors, wherein the N2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first target translation model after the current moment;
determining a target loss value of the current moment according to the first target past feature vector and the first target future feature vector, and the second target past feature vector and the second target future feature vector;
determining that the M words are accurate translations of the N words if the target loss value satisfies a target threshold.
7. A method of translation, comprising:
when a first training translation model is iterated for the t-th time, using a first training encoder in the first training translation model to encode the Nt-th word of a first language to obtain an Nt-th first training encoding vector, wherein the Nt-th first training encoding vector is used to represent the Nt-th word of the first language, Nt is a natural number, and the first training translation model is used to translate the words of the first language into the words of a second language;
determining a first past feature vector and a first future feature vector according to the Nt-th first training encoding vector;
using a second training encoder in a second training translation model to encode Nt1 words of the second language to obtain Nt1 second training encoding vectors, and determining a second past feature vector according to the Nt1 second training encoding vectors, wherein the Nt1 words of the second language are the words of the second language already translated by the first training translation model before the t-th iteration;
using the second training encoder to encode Nt2 words of the second language to obtain Nt2 second training encoding vectors, and determining a second future feature vector according to the Nt2 second training encoding vectors, wherein the Nt2 words of the second language are the words of the second language corresponding to the words of the first language still to be translated by the first training translation model after the t-th iteration;
determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector, and the second past feature vector and the second future feature vector;
and under the condition that the first loss value of the t-th iteration does not meet a first preset condition, updating the parameters in the first training translation model, and performing the (t+1)-th iteration on the first training translation model.
8. The method of claim 7, wherein the determining a first past feature vector and a first future feature vector according to the Nt-th first training encoding vector comprises:
converting the Nt-th first training encoding vector into a plurality of first low-level capsule vectors;
converting the plurality of first low-level capsule vectors into a plurality of high-level capsule vectors;
dividing the plurality of high-level capsule vectors into the first past feature vector and the first future feature vector.
9. The method of claim 8,
the N is1The first training encoding vectors are converted into a first plurality of low-level capsule vectors, including:
ui,j=Wj*hi
wherein, { u1,j,u2,j,…,uI,jIs the jth of the plurality of first low-level capsule vectors, hi∈{h1,h2,…,hI},{h1,h2,…,hIIs the Nt first training code vector, WjRepresenting a jth training parameter corresponding to a jth high-level capsule vector of the plurality of high-level capsule vectors;
wherein the converting the plurality of first low-level capsule vectors into the plurality of high-level capsule vectors comprises:
c_{ij} = \frac{\exp(b_{ij})}{\sum_{j'} \exp(b_{ij'})}
\Omega_j = \mathrm{squash}(s_j), \qquad s_j = \sum_{i=1}^{I} c_{ij}\, u_{i,j}
b_{ij} = b_{ij} + w^{T} \tanh\!\left(W_b \left[z_t;\, u_{i,j};\, \Omega_j\right]\right)
wherein \Omega_j is the j-th high-level capsule vector of the plurality of high-level capsule vectors, s_j is the result of the weighted summation of the plurality of first low-level capsule vectors, c_{ij} is the assignment probability with which the i-th of the plurality of first low-level capsule vectors is assigned to the j-th of the plurality of high-level capsule vectors, w^{T} and W_b are first use parameters, and I is a preset value.
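The routing described in claims 8 and 9 can be sketched as follows. The equations above do not fix the nonlinearity between s_j and \Omega_j, the number of routing iterations, or how the high-level capsules are split into past and future parts, so the squash function, n_iters, and the half-and-half split below are assumptions; the per-capsule projections W_j and the b_ij update follow the formulas in claim 9.

    import torch
    import torch.nn.functional as F

    def squash(s, dim=-1, eps=1e-8):
        # Assumed nonlinearity mapping the weighted sum s_j to the high-level capsule.
        norm2 = (s * s).sum(dim=dim, keepdim=True)
        return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

    def guided_routing(h, W, w, W_b, z_t, n_iters=3):
        # h:   (I, d)               the Nt first training encoding vectors (I of them)
        # W:   (J, d, d_c)          per-high-level-capsule projection matrices W_j
        # w:   (d_b,)               routing parameter w
        # W_b: (d_b, d_z + 2 * d_c) routing parameter W_b
        # z_t: (d_z,)               the vector z_t appearing in the b_ij update
        I, J = h.shape[0], W.shape[0]
        u = torch.einsum('jdk,id->ijk', W, h)          # u_{i,j} = W_j h_i, shape (I, J, d_c)
        b = torch.zeros(I, J)
        for _ in range(n_iters):
            c = F.softmax(b, dim=-1)                   # assignment probabilities c_ij
            s = torch.einsum('ij,ijk->jk', c, u)       # s_j = sum_i c_ij u_{i,j}
            omega = squash(s)                          # high-level capsules Omega_j
            # b_ij <- b_ij + w^T tanh(W_b [z_t; u_{i,j}; Omega_j])
            feat = torch.cat([z_t.expand(I, J, -1), u,
                              omega.unsqueeze(0).expand(I, -1, -1)], dim=-1)
            b = b + torch.tanh(feat @ W_b.t()) @ w
        # Assumed split of the J high-level capsules into past and future features.
        past, future = omega[: J // 2].flatten(), omega[J // 2:].flatten()
        return past, future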
10. The method of claim 7, wherein after the encoding of the Nt words of the first language using the first training encoder in the first training translation model, the method further comprises:
decoding the Nt first training encoding vectors using a first training decoder in the first training translation model to obtain Nt first training decoding vectors, wherein the Nt first training decoding vectors are used to determine Nt words of the second language, and the Nt words of the second language are words of the second language obtained by the first training translation model translating the Nt words of the first language;
splicing the Nt first training decoding vectors with the first past feature vector and the first future feature vector to obtain first spliced vectors;
determining the Nt words of the second language using the first spliced vectors.
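A brief sketch of the decoding-and-splicing step in claim 10, under the assumption that the past and future feature vectors are each a single fixed-size vector concatenated onto every decoding vector and that target words are chosen through a vocabulary projection; output_proj and the greedy argmax are illustrative choices, not part of the claim.

    import torch

    def predict_target_words(decoder, output_proj, enc_vectors, past, future):
        # Decode the Nt first training encoding vectors into Nt decoding vectors.
        dec = decoder(enc_vectors)                       # (Nt, d)
        n = dec.shape[0]
        # Splice each decoding vector with the first past and first future feature vectors.
        spliced = torch.cat([dec, past.expand(n, -1), future.expand(n, -1)], dim=-1)
        # Determine the Nt words of the second language from the spliced vectors.
        logits = output_proj(spliced)                    # (Nt, vocab_size)
        return logits.argmax(dim=-1)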
11. The method of claim 7, wherein the determining a first loss value of the t-th iteration according to the first past feature vector and the first future feature vector and the second past feature vector and the second future feature vector comprises:
determining a first semantic distance from the first past feature vector and the first future feature vector and the second past feature vector and the second future feature vector;
determining the first loss value of the t-th iteration according to the first semantic distance and a first training loss value, wherein the first training loss value is a training loss value determined according to the Nt words of the second language, and the Nt words of the second language are words of the second language obtained by the first training translation model translating the Nt words of the first language in the t-th iteration.
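The combination in claim 11 can be sketched as follows, assuming the first semantic distance is an L2-style distance and the first training loss value is a cross-entropy over the predicted target words; the weighting between the two terms is an assumption, since the claim only states that the first loss value is determined from both.

    import torch.nn.functional as F

    def first_loss_value(first_past, first_future, second_past, second_future,
                         logits, target_ids, distance_weight=1.0):
        # First semantic distance between the source-side and target-side features.
        semantic = (F.mse_loss(first_past, second_past)
                    + F.mse_loss(first_future, second_future))
        # First training loss value determined from the Nt predicted words of the
        # second language, here taken as cross-entropy against the reference.
        training = F.cross_entropy(logits, target_ids)
        # First loss value of the t-th iteration.
        return training + distance_weight * semantic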
12. The method according to any one of claims 7 to 11, wherein after the determining of the first loss value of the t-th iteration, the method further comprises:
stopping the iteration of the first training translation model when the first loss value of the t-th iteration satisfies the first preset condition, wherein the first training translation model after the t-th iteration is determined as a first target translation model, the first training encoder in the first training translation model after the t-th iteration is determined as a first target encoder, and the first training decoder in the first training translation model after the t-th iteration is determined as a first target decoder.
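Putting claims 7 and 12 together, the stopping rule might look like the sketch below, reusing the training_iteration sketch given after claim 7; the maximum iteration count and the concrete form of the first preset condition are assumptions.

    def train_first_model(first_model, second_encoder, optimizer, batches,
                          first_condition, max_iters=100000):
        # Iterate until the first loss value of some iteration satisfies the
        # first preset condition (left abstract here, as in claim 12).
        for t, batch in zip(range(1, max_iters + 1), batches):
            loss, stop = training_iteration(t, first_model, second_encoder,
                                            optimizer, *batch, first_condition)
            if stop:
                break
        # The model after the final iteration is taken as the first target translation
        # model; its encoder and decoder become the first target encoder and decoder.
        return first_model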
13. A translation apparatus, comprising:
a first obtaining unit, configured to obtain N words of a first language to be translated from a pre-trained first target translation model, where N is a natural number, and the first target translation model is configured to translate the words of the first language into words of a second language;
a first processing unit, configured to determine, through the first target translation model, a first target past feature vector and a first target future feature vector corresponding to the N words, and splice the first target past feature vector and the first target future feature vector with M first decoding vectors corresponding to the N words, to obtain M first target spliced vectors, where the first target past feature vector and the first target future feature vector are vectors determined according to N first encoding vectors corresponding to the N words, the M first decoding vectors are obtained by decoding the N first encoding vectors, and M is a natural number;
a first determining unit, configured to determine M words of the second language using the M first target concatenation vectors, where the M words of the second language are words of the second language obtained by translating N words of the first language by the first target translation model.
14. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 6 or claims 7 to 12.
15. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the method of any of claims 1 to 6, or claims 7 to 12, by means of the computer program.
CN202010426346.3A 2020-05-19 2020-05-19 Translation method and device, storage medium and electronic equipment Active CN111597829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426346.3A CN111597829B (en) 2020-05-19 2020-05-19 Translation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111597829A true CN111597829A (en) 2020-08-28
CN111597829B CN111597829B (en) 2021-08-27

Family

ID=72185809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426346.3A Active CN111597829B (en) 2020-05-19 2020-05-19 Translation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111597829B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463956A (en) * 2020-11-26 2021-03-09 重庆邮电大学 Text summary generation system and method based on counterstudy and hierarchical neural network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070033001A1 (en) * 2005-08-03 2007-02-08 Ion Muslea Identifying documents which form translated pairs, within a document collection
CN106649282A (en) * 2015-10-30 2017-05-10 阿里巴巴集团控股有限公司 Machine translation method and device based on statistics, and electronic equipment
CN107368476A (en) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 The method and relevant apparatus that a kind of method of translation, target information determine
CN108932069A (en) * 2018-07-11 2018-12-04 科大讯飞股份有限公司 Input method candidate entry determines method, apparatus, equipment and readable storage medium storing program for executing
CN109271643A (en) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 A kind of training method of translation model, interpretation method and device
CN109446534A (en) * 2018-09-21 2019-03-08 清华大学 Machine translation method and device
CN110879940A (en) * 2019-11-21 2020-03-13 哈尔滨理工大学 Machine translation method and system based on deep neural network
CN111160050A (en) * 2019-12-20 2020-05-15 沈阳雅译网络技术有限公司 Chapter-level neural machine translation method based on context memory network

Also Published As

Publication number Publication date
CN111597829B (en) 2021-08-27

Similar Documents

Publication Publication Date Title
JP7025090B2 (en) Translation method, target information determination method and related equipment, and computer program
WO2019114695A1 (en) Translation model-based training method, translation method, computer device and storage medium
US20230025317A1 (en) Text classification model training method, text classification method, apparatus, device, storage medium and computer program product
KR20200007900A (en) Generation of Points of Interest Text
CN107766319B (en) Sequence conversion method and device
CN111428520A (en) Text translation method and device
CN107729329A (en) A kind of neural machine translation method and device based on term vector interconnection technique
CN110705273B (en) Information processing method and device based on neural network, medium and electronic equipment
CN112883149B (en) Natural language processing method and device
CN110673840A (en) Automatic code generation method and system based on tag graph embedding technology
CN114676234A (en) Model training method and related equipment
CN112257471A (en) Model training method and device, computer equipment and storage medium
CN113919344A (en) Text processing method and device
CN112749569A (en) Text translation method and device
CN108280513B (en) Model generation method and device
CN111597829B (en) Translation method and device, storage medium and electronic equipment
CN110955765A (en) Corpus construction method and apparatus of intelligent assistant, computer device and storage medium
WO2019161753A1 (en) Information translation method and device, and storage medium and electronic device
JP2023512551A (en) Method, apparatus, electronic device and computer readable medium for generating prediction information
CN111027333A (en) Chapter translation method and device
CN105808527A (en) Oriented translation method and device based on artificial intelligence
CN115712701A (en) Language processing method, apparatus and storage medium
CN114841175A (en) Machine translation method, device, equipment and storage medium
CN109597884B (en) Dialog generation method, device, storage medium and terminal equipment
CN113569585A (en) Translation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027473

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant