CN113919373A - Neural machine translation method, training method and device of model thereof, and electronic device - Google Patents
Neural machine translation method, training method and device of model thereof, and electronic device Download PDFInfo
- Publication number
- CN113919373A CN113919373A CN202010646609.1A CN202010646609A CN113919373A CN 113919373 A CN113919373 A CN 113919373A CN 202010646609 A CN202010646609 A CN 202010646609A CN 113919373 A CN113919373 A CN 113919373A
- Authority
- CN
- China
- Prior art keywords
- sentence
- translation
- source
- parameter
- translation memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013519 translation Methods 0.000 title claims abstract description 273
- 230000001537 neural effect Effects 0.000 title claims abstract description 93
- 238000012549 training Methods 0.000 title claims abstract description 77
- 238000000034 method Methods 0.000 title claims abstract description 65
- 238000004422 calculation algorithm Methods 0.000 claims description 20
- 230000006870 function Effects 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 5
- 238000012163 sequencing technique Methods 0.000 claims description 4
- 238000010586 diagram Methods 0.000 description 20
- 238000012545 processing Methods 0.000 description 8
- 238000004590 computer program Methods 0.000 description 5
- 230000005540 biological transmission Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000835 fiber Substances 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 230000001902 propagating effect Effects 0.000 description 2
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 229910052802 copper Inorganic materials 0.000 description 1
- 239000010949 copper Substances 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the present disclosure provide a neural machine translation method, a training method and device for its model, and an electronic device. The method comprises the following steps: acquiring, from a translation memory, the translation memory sentence pair with the highest similarity to each source-end sentence in a parallel corpus; training a preset initial model by taking the source sentence pairs and the translation memory sentence pairs as a training sample set until a first parameter of a translation memory source-end encoder layer of the preset initial model converges, a second parameter of a translation memory target-end encoder layer of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model containing translation memory information converges, so as to obtain a neural machine translation model. The source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence. According to the embodiments of the present disclosure, the accuracy of neural machine translation results can be improved.
Description
Technical Field
The present disclosure relates to the field of machine translation technologies, and in particular, to a training method of a neural machine translation model, a neural machine translation method, a training device of a neural machine translation model, a neural machine translation device, and an electronic device.
Background
Neural machine translation (NMT) is a method of implementing translation based on neural network technology. A common neural machine translation method is usually built on an encoder and a decoder. The encoder encodes a sentence in the source language into a set of hidden state representations, and the decoder obtains the hidden state representations generated by the encoder through an attention network, thereby generating the target sentence word by word. However, owing to the performance limitations of existing neural machine translation, the translation result often contains many errors and the translation quality is poor. Therefore, there is a need to provide a new neural machine translation method.
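For readers less familiar with this encoder-decoder-with-attention pattern, a minimal illustrative sketch (in PyTorch) is given below. The module sizes, layer choices and names are assumptions made purely for illustration; they are not part of the disclosed method or of any existing system described here.

```python
# Minimal sketch of an attention-based encoder-decoder (illustrative only;
# all dimensions and module names are assumptions, not the disclosed model).
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder: source sentence -> a set of hidden state representations.
        enc_states, _ = self.encoder(self.src_emb(src_ids))
        # Decoder: reads the target prefix, then attends to the encoder states.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids))
        ctx, _ = self.attn(dec_states, enc_states, enc_states)
        # Predict the next target word at every position (word-by-word generation).
        return self.out(dec_states + ctx)

model = TinyEncoderDecoder(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```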
Disclosure of Invention
It is an object of embodiments of the present disclosure to provide a new technical solution for training of neural machine translation models.
According to a first aspect of the present disclosure, there is provided a method of training a neural machine translation model, the method comprising:
acquiring, from a translation memory, a translation memory sentence pair with the highest similarity to each source-end sentence in a parallel corpus;
training a preset initial model by taking source sentence pairs and translation memory sentence pairs as a training sample set until a first parameter of a translation memory source-end encoder layer of the preset initial model converges, a second parameter of a translation memory target-end encoder layer of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model containing translation memory information converges, so as to obtain a neural machine translation model;
wherein the source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence.
Optionally, the obtaining, from the translation memory, a translation memory sentence pair with the highest similarity to each source sentence in the parallel corpus includes:
calculating the similarity between the source-end sentence and each translation memory source-end sentence in the translation memory; and
sorting the similarities in descending order, and determining the translation memory source-end sentence with the highest similarity and its corresponding target-end sentence as the translation memory sentence pair with the highest similarity to the source-end sentence.
Optionally, the calculating a similarity between the source sentence and each translation memory source sentence in the translation memory includes:
calculating the edit distance between the source-end sentence and each translation memory source-end sentence in the translation memory as the similarity.
Optionally, the training a preset initial model by using the source language sentence pair and the translation memory sentence pair as a training sample set until a first parameter of a source-end encoder layer of a translation memory of the preset initial model converges, a second parameter of a target-end encoder layer of the translation memory of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model, which contains information of the translation memory, converges includes:
translating each sample in the training sample set based on the preset initial model to obtain a translation result;
substituting the translation result into a preset loss function for calculation to obtain the loss of each sample;
updating the first, second, and third parameters based on the loss until the first, second, and third parameters converge.
Optionally, wherein the updating the first parameter, the second parameter, and the third parameter based on the loss until the first parameter, the second parameter, and the third parameter converge includes:
respectively calculating a first derivative of the first parameter, a second derivative of the second parameter and a third derivative of the third parameter based on the loss and a preset back propagation algorithm;
updating the first parameter based on the first derivative and gradient descent algorithm, the second parameter based on the second derivative and gradient descent algorithm, and the third parameter based on the third derivative and gradient descent algorithm;
and updating the first parameter, the second parameter and the third parameter for multiple times based on the loss of a plurality of samples in the training sample set until convergence, so as to obtain the neural machine translation model.
According to a second aspect of the present disclosure, there is provided a method of neural machine translation, the method comprising:
acquiring a source end sentence to be translated;
inputting the source end sentence to be translated into a neural machine translation model, and outputting a target end sentence as a translation result; wherein the neural machine translation model is trained according to a training method of the neural machine translation model according to any one of the first aspect of the present disclosure.
According to a third aspect of the present disclosure, there is provided an apparatus for training a neural machine translation model, the apparatus comprising:
the acquisition module is used for acquiring translation memory sentence pairs with the highest similarity to each source end sentence in the parallel corpus from the translation memory;
the training module is used for training a preset initial model by taking source sentence pairs and translation memory sentence pairs as a training sample set until a first parameter of a translation memory source-end encoder layer of the preset initial model converges, a second parameter of a translation memory target-end encoder layer of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model containing translation memory information converges, so as to obtain a neural machine translation model; the source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence.
According to a fourth aspect of the present disclosure, there is provided a neural machine translation device, the device comprising:
the obtaining module is used for obtaining a source end sentence to be translated;
the translation module is used for inputting the source end sentence to be translated into a neural machine translation model and outputting a target end sentence as a translation result; wherein the neural machine translation model is trained according to a training method of the neural machine translation model according to any one of the first aspect of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is also provided an electronic device, comprising a processor and a memory; the memory stores machine executable instructions executable by the processor; the processor executes the machine executable instructions to implement the method of training a neural machine translation model according to any one of the first aspect of the embodiments of the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is also provided an electronic device, comprising a processor and a memory; the memory stores machine executable instructions executable by the processor; the processor executes the machine executable instructions to implement the neural machine translation method described in the second aspect of the embodiments of the present disclosure.
According to an embodiment of the disclosure, the accuracy of a neural machine translation result can be improved, and the translation quality is improved.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic structural diagram of an electronic device to which a training method of a neural machine translation model according to an embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of a method of training a neural machine translation model in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an encoder structure according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a decoder architecture according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a training apparatus for a neural machine translation model according to an embodiment of the present disclosure;
FIG. 6 is a functional block diagram of an electronic device according to a first embodiment of the present disclosure;
FIG. 7 is a flow diagram of a neural machine translation method in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a neural machine translation device, according to an embodiment of the present disclosure;
fig. 9 is a functional block diagram of an electronic device according to a second embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration >
An existing neural machine translation system can generate, from an input sentence s, a sentence t' in another language with the same meaning. However, due to the performance limitations of such a neural machine translation system, the generated translation result often contains many errors.
A translation memory stores bilingual inter-translated sentence pairs (s^m, t^m); these pairs may be translated manually or collected through other channels. If a sentence s^m that is very similar to the sentence s exists in the translation memory, together with its translation t^m, the information in the sentence pair (s^m, t^m) can be used to help the neural machine translation system translate the sentence s. How to integrate the sentence pair (s^m, t^m) from the translation memory into a neural machine translation system, so as to generate a better-quality translation for the sentence s, is therefore an urgent problem to be solved. Based on this, the present disclosure proposes a method for improving the translation quality of a neural machine translation system by using a translation memory; the method can be used with various neural machine translation architectures, including but not limited to convolutional neural networks, recurrent neural networks, and self-attention networks.
Fig. 1 is a schematic structural diagram of an electronic device to which a method for training a neural machine translation model according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the electronic apparatus 1000 of the present embodiment may include a processor 1010, a memory 1020, an interface device 1030, a communication device 1040, a display device 1050, an input device 1060, a speaker 1070, a microphone 1080, and the like.
The processor 1010 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 1020 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1030 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1040 can perform wired or wireless communication, for example. The display device 1050 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 1060 may include, for example, a touch screen, a keyboard, and the like.
The electronic device 1000 may output audio information through the speaker 1070. The electronic device 1000 can pick up voice information input by a user through the microphone 1080.
The electronic device 1000 may be a laptop computer, desktop computer, or the like.
In this embodiment, the electronic device 1000 may acquire, from the translation memory, the translation memory sentence pair with the highest similarity to each source-end sentence in the parallel corpus; and train a preset initial model by taking the source sentence pairs and the translation memory sentence pairs as a training sample set until the first parameter of the translation memory source-end encoder layer of the preset initial model converges, the second parameter of the translation memory target-end encoder layer of the preset initial model converges, and the third parameter of the decoder layer of the preset initial model containing translation memory information converges, so as to obtain a neural machine translation model. The source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence.
In this embodiment, the memory 1020 of the electronic device 1000 is configured to store instructions for controlling the processor 1010 to operate in support of implementing a method of training a neural machine translation model according to any embodiment of the present description.
It should be understood by those skilled in the art that although a plurality of devices of the electronic apparatus 1000 are illustrated in fig. 1, the electronic apparatus 1000 of the present embodiment may refer to only some of the devices, for example, the processor 1010, the memory 1020, the display device 1050, the input device 1060, and the like.
The skilled person can design the instructions according to the disclosed solution of the present disclosure. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
< first embodiment >
< method >
The present embodiment provides a method for training a neural machine translation model, which may be implemented by an electronic device, such as the electronic device 1000 shown in fig. 1.
As shown in fig. 2, the method includes the following steps 2100 and 2200:
in step 2100, a translation memory sentence pair with the highest similarity to each source sentence in the parallel corpus is obtained from the translation memory.
A parallel corpus is a corpus in which source texts and their translations can be retrieved and displayed side by side. In this embodiment, (s, t) denotes a parallel sentence pair, and (s_i, t_i) denotes the i-th sentence pair in the parallel corpus, where s_i is the source-end sentence of the i-th sentence pair and t_i is its target-end sentence, i.e., t_i is the translation of s_i. The translation memory stores bilingual inter-translated sentence pairs (s^m, t^m), which may be translated manually or collected through other channels. (s^m_i, t^m_i) denotes the i-th sentence pair in the translation memory, where s^m_i is the source-end sentence of the i-th sentence pair in the translation memory, t^m_i is its target-end sentence, and t^m_i is the translation of s^m_i.
In this step, when obtaining the translation memory sentence pair with the highest similarity, the electronic device 1000 may specifically calculate the similarity between the source-end sentence and each translation memory source-end sentence in the translation memory, sort the similarities in descending order, and determine the translation memory source-end sentence with the highest similarity and its corresponding target-end sentence as the translation memory sentence pair with the highest similarity to the source-end sentence.
For example, given a parallel sentence pair (s_i, t_i), the similarity e between s_i and the source-end sentence of every sentence pair in the translation memory (s^m, t^m) is calculated. The computed similarities are then sorted, and the sentence pair (s^m_i, t^m_i) with the highest similarity is selected for (s_i, t_i), so that together they form a sample (s_i, t_i, s^m_i, t^m_i) of the training sample set.
Illustratively, when calculating the similarity, the edit distance between s_i and s^m_i may be used as the similarity.
Here, the source sentence pair comprises a source-end sentence s_i and a corresponding target-end sentence t_i, and the translation memory sentence pair comprises a translation memory source-end sentence s^m_i and a corresponding translation memory target-end sentence t^m_i.
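As a concrete illustration of the retrieval step described above, the following Python sketch selects, for each parallel pair (s_i, t_i), the translation memory pair whose source side has the smallest word-level edit distance (i.e., the highest similarity). The word-level tokenization, the data layout and all names are assumptions for illustration, not a prescribed implementation.

```python
# Sketch: for each parallel pair (s_i, t_i), retrieve the translation memory
# pair (s^m_i, t^m_i) whose source side is most similar to s_i.
# Similarity here is word-level edit distance (smaller distance = more similar);
# this concrete choice and the data layout are assumptions for illustration.

def edit_distance(a, b):
    """Levenshtein distance between token lists a and b (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                         # delete a[i-1]
                        dp[j - 1] + 1,                     # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))     # substitute
            prev = cur
    return dp[-1]

def retrieve_tm_pairs(parallel_corpus, translation_memory):
    """Attach to each (s_i, t_i) the TM pair with the highest similarity."""
    samples = []
    for s_i, t_i in parallel_corpus:
        # Sort candidates by ascending edit distance = descending similarity,
        # then keep the top-ranked pair.
        ranked = sorted(translation_memory,
                        key=lambda pair: edit_distance(s_i.split(), pair[0].split()))
        s_m, t_m = ranked[0]
        samples.append((s_i, t_i, s_m, t_m))
    return samples

corpus = [("i like green tea", "j'aime le the vert")]
memory = [("i like black tea", "j'aime le the noir"),
          ("he reads a book", "il lit un livre")]
print(retrieve_tm_pairs(corpus, memory))
```

The sort-and-take-first step above corresponds to sorting the similarities in descending order and keeping the highest-similarity pair, as in the embodiment.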
In this embodiment, in order to introduce the sentence pair (s^m_i, t^m_i) into the neural machine translation model, two encoder layers are added on top of the traditional encoder, taking s^m_i and t^m_i respectively as input. As shown in FIG. 3, during encoding, the retrieved sentence s^m_i is first encoded by a self-attention network; then the source context representation s-context of the source-end sentence s_i, obtained from the self-attention-network-encoded s_i, and the encoded s^m_i are encoded by a cross-attention network to obtain the source context representation sm-context of s^m_i.
Similarly, for the translation memory target-end sentence t^m_i, t^m_i is first encoded by a self-attention network; then the encoded t^m_i and the source context representation sm-context are encoded by a cross-attention network to obtain the target-end context representation tm-context of t^m_i.
Meanwhile, in this embodiment, one decoder layer is added on top of the conventional 6-layer decoder, as shown in FIG. 4. This new decoder layer is named the decoder layer containing translation memory information and has three inputs: the output TM of the original 6-layer decoder, the source context representation s-context of the source-end sentence s_i, and the target-end context representation tm-context of the translation memory target-end sentence t^m_i. These three pieces of information are used together to predict the translation t_i.
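A rough structural sketch (PyTorch-style) of the two added encoder layers and the added decoder layer described above is shown below. The dimensions, the attention configuration, and the way the three decoder inputs are fused are assumptions made for illustration; they are not the exact structures of FIG. 3 and FIG. 4.

```python
# Rough sketch of the added translation-memory (TM) encoder layers and the
# decoder layer containing translation memory information.
# Dimensions and the fusion of the three inputs are illustrative assumptions.
import torch
import torch.nn as nn

class TMEncoderLayer(nn.Module):
    """Encodes a TM sentence with self-attention, then cross-attends to a context."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, tm_emb, context):
        h, _ = self.self_attn(tm_emb, tm_emb, tm_emb)   # self-attention over the TM sentence
        out, _ = self.cross_attn(h, context, context)   # cross-attention to the given context
        return out

class TMDecoderLayer(nn.Module):
    """Added decoder layer with three inputs: decoder output TM, s-context, tm-context."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn_src = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_tm = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(3 * d_model, d_model)

    def forward(self, dec_out, s_context, tm_context):
        a, _ = self.attn_src(dec_out, s_context, s_context)    # attend to the source context
        b, _ = self.attn_tm(dec_out, tm_context, tm_context)   # attend to the TM target context
        return self.fuse(torch.cat([dec_out, a, b], dim=-1))   # combine the three inputs

# Usage sketch: s-context from the source encoder, sm-/tm-context from the TM encoders.
d = 256
src_tm_encoder = TMEncoderLayer(d)   # produces sm-context from s^m_i and s-context
tgt_tm_encoder = TMEncoderLayer(d)   # produces tm-context from t^m_i and sm-context
tm_decoder = TMDecoderLayer(d)

s_context = torch.randn(1, 9, d)     # encoded source-end sentence s_i
sm_emb = torch.randn(1, 8, d)        # embedded TM source-end sentence s^m_i
tm_emb = torch.randn(1, 7, d)        # embedded TM target-end sentence t^m_i
sm_context = src_tm_encoder(sm_emb, s_context)
tm_context = tgt_tm_encoder(tm_emb, sm_context)
dec_out = torch.randn(1, 6, d)       # output TM of the original 6-layer decoder
print(tm_decoder(dec_out, s_context, tm_context).shape)  # torch.Size([1, 6, 256])
```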
When the preset initial model is trained, a traditional neural machine translation model may first be trained as a baseline system. After the baseline system is trained, its parameters are used to initialize the parameters of the preset initial model of this embodiment, namely the first parameter of the translation memory source-end encoder layer of the preset initial model, the second parameter of the translation memory target-end encoder layer of the preset initial model, and the third parameter of the decoder layer of the preset initial model containing translation memory information.
Specifically, during training, the electronic device 1000 may translate each sample in the training sample set based on the preset initial model to obtain a translation result; substitute the translation result into a preset loss function to calculate the loss of each sample; and update the first, second, and third parameters based on the loss until the first, second, and third parameters converge.
Wherein, the step of updating the first parameter, the second parameter, and the third parameter based on the loss until the first parameter, the second parameter, and the third parameter converge may specifically be: respectively calculating a first derivative of the first parameter, a second derivative of the second parameter and a third derivative of the third parameter based on the loss and a preset back propagation algorithm; updating the first parameter based on the first derivative and a gradient descent algorithm, updating the second parameter based on the second derivative and a gradient descent algorithm, and updating the third parameter based on the third derivative and a gradient descent algorithm; and updating the first parameter, the second parameter and the third parameter for multiple times based on the loss of a plurality of samples in the training sample set until convergence, so as to obtain the neural machine translation model.
For example, the electronic device 1000 calculates, based on the loss L and a preset back-propagation algorithm, the first derivative ∂L/∂W1 of the first parameter W1, the second derivative ∂L/∂W2 of the second parameter W2, and the third derivative ∂L/∂W3 of the third parameter W3; it then updates the first parameter W1 based on ∂L/∂W1 and a stochastic gradient descent algorithm, updates the second parameter W2 based on ∂L/∂W2 and the stochastic gradient descent algorithm, and updates the third parameter W3 based on ∂L/∂W3 and the stochastic gradient descent algorithm.
Finally, the electronic device 1000 may update the first parameter, the second parameter, and the third parameter multiple times based on the loss of the multiple samples in the training sample set until convergence, resulting in the neural machine translation model.
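The update procedure described above could be sketched as follows. The attribute names for the three parameter groups (tm_source_encoder, tm_target_encoder, tm_decoder_layer), the data-loader format, the loss function, the optimizer settings, and the convergence test are all assumptions for illustration; in particular, passing only the three named parameter groups to the optimizer is one possible reading of the procedure, not a statement of the disclosed training recipe.

```python
# Sketch: update the three parameter groups (TM source-end encoder layer W1,
# TM target-end encoder layer W2, decoder layer with TM information W3) by
# back-propagation and stochastic gradient descent until convergence.
# Model structure, loss and hyper-parameters are illustrative assumptions.
import torch

def train_until_convergence(model, train_loader, loss_fn, lr=0.1, tol=1e-4, max_epochs=50):
    param_groups = [
        list(model.tm_source_encoder.parameters()),   # first parameter(s)  W1
        list(model.tm_target_encoder.parameters()),   # second parameter(s) W2
        list(model.tm_decoder_layer.parameters()),    # third parameter(s)  W3
    ]
    optimizer = torch.optim.SGD([p for group in param_groups for p in group], lr=lr)

    prev_loss = float("inf")
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for src, tgt, tm_src, tm_tgt in train_loader:
            logits = model(src, tgt, tm_src, tm_tgt)   # translate each sample
            loss = loss_fn(logits, tgt)                # preset loss function
            optimizer.zero_grad()
            loss.backward()                            # back-propagation: dL/dW1, dL/dW2, dL/dW3
            optimizer.step()                           # gradient-descent update of W1, W2, W3
            epoch_loss += loss.item()
        # Simple convergence test: stop once the epoch loss no longer changes much.
        if abs(prev_loss - epoch_loss) < tol:
            break
        prev_loss = epoch_loss
    return model
```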
The training method of the neural machine translation model according to this embodiment has been described above with reference to the drawings and examples. In the method of this embodiment, the translation memory sentence pair with the highest similarity to each source-end sentence in the parallel corpus is acquired from the translation memory; a preset initial model is trained by taking the source sentence pairs and the translation memory sentence pairs as a training sample set until the first parameter of the translation memory source-end encoder layer of the preset initial model converges, the second parameter of the translation memory target-end encoder layer of the preset initial model converges, and the third parameter of the decoder layer of the preset initial model containing translation memory information converges, so as to obtain the neural machine translation model; the source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence. The method according to this embodiment uses the sentence pair (s^m_i, t^m_i) in the translation memory that is very similar to the source-end sentence s_i to help the neural machine translation model translate the source-end sentence s_i, which can improve the accuracy of the neural machine translation result and the quality of the translated text.
< apparatus >
This embodiment provides a training apparatus for a neural machine translation model, for example, the training apparatus 5000 for a neural machine translation model shown in fig. 5.
As shown in fig. 5, the training apparatus 5000 for the neural machine translation model may include an obtaining module 5100 and a training module 5200.
The obtaining module 5100 is configured to obtain, from the translation memory, a translation memory sentence pair with the highest similarity to each source end sentence in the parallel corpus.
The training module 5200 is configured to train a preset initial model by taking source sentence pairs and translation memory sentence pairs as a training sample set until a first parameter of a translation memory source-end encoder layer of the preset initial model converges, a second parameter of a translation memory target-end encoder layer of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model containing translation memory information converges, so as to obtain a neural machine translation model; the source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence.
Specifically, the obtaining module 5100 is configured to calculate the similarity between the source-end sentence and each translation memory source-end sentence in the translation memory, sort the similarities in descending order, and determine the translation memory source-end sentence with the highest similarity and its corresponding target-end sentence as the translation memory sentence pair with the highest similarity to the source-end sentence.
In one example, the obtaining module 5100 may calculate the edit distance between the source-end sentence and each translation memory source-end sentence in the translation memory as the similarity.
Specifically, the training module 5200 is configured to translate each sample in the training sample set based on the preset initial model to obtain a translation result; substituting the translation result into a preset loss function for calculation to obtain the loss of each sample; the first, second, and third parameters are updated based on the loss until the first, second, and third parameters converge.
When the training module 5200 updates the first parameter, the second parameter, and the third parameter based on the loss until the first parameter, the second parameter, and the third parameter converge, the training module 5200 may specifically calculate a first derivative of the first parameter, a second derivative of the second parameter, and a third derivative of the third parameter based on the loss and a preset back propagation algorithm, respectively; updating the first parameter based on the first derivative and a gradient descent algorithm, updating the second parameter based on the second derivative and a gradient descent algorithm, and updating the third parameter based on the third derivative and a gradient descent algorithm; and updating the first parameter, the second parameter and the third parameter for multiple times based on the loss of a plurality of samples in the training sample set until convergence, so as to obtain the neural machine translation model.
The training device of the neural machine translation model of this embodiment may be used to implement the method technical solution of this embodiment, and its implementation principle and technical effect are similar, and are not described herein again.
< apparatus >
In this embodiment, an electronic device is also provided, and the electronic device includes the training apparatus 5000 for the neural machine translation model described in the embodiment of the present disclosure; alternatively, the electronic device is the electronic device 6000 shown in fig. 6, and includes a processor 6200 and a memory 6100.
Memory 6100 stores machine executable instructions that are executable by the processor; a processor 6200 executing the machine executable instructions to implement the method for training the neural machine translation model according to any one of the embodiments.
< computer-readable storage Medium embodiment >
The present embodiments provide a computer-readable storage medium having stored therein an executable command that, when executed by a processor, performs a method described in any of the method embodiments of the present disclosure.
< second embodiment >
< method >
This embodiment provides a neural machine translation method, which translates a source-end sentence to be translated by applying the neural machine translation model obtained by training in the above first embodiment.
Specifically, as shown in fig. 7, the method includes the following steps 7100 to 7200:
and 7100, acquiring a source sentence to be translated.
The source-end sentence to be translated may be, for example, a sentence in any language, such as Chinese, English, or Japanese.
7200, inputting the source end sentence to be translated into a neural machine translation model, and outputting a target end sentence as a translation result.
The neural machine translation model is obtained by acquiring, from a translation memory, the translation memory sentence pair with the highest similarity to each source-end sentence in the parallel corpus, and training a preset initial model by taking the source sentence pairs and the translation memory sentence pairs as a training sample set until the first parameter of the translation memory source-end encoder layer of the preset initial model converges, the second parameter of the translation memory target-end encoder layer of the preset initial model converges, and the third parameter of the decoder layer of the preset initial model containing translation memory information converges. The source sentence pair comprises a source-end sentence and a corresponding target-end sentence, and the translation memory sentence pair comprises a translation memory source-end sentence and a corresponding translation memory target-end sentence. For the specific training process, reference may be made to the above first embodiment, which is not described here again.
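As a usage sketch only: because the decoder layer containing translation memory information consumes a retrieved sentence pair, translating with the trained model would plausibly begin by retrieving the most similar translation memory pair for the input sentence. The retrieval heuristic (difflib ratio) and the translate interface below are assumed names for illustration, not an API defined by this disclosure.

```python
# Usage sketch: translate one source-end sentence with the trained model.
# The similarity heuristic and the `translate` method are illustrative assumptions.
from difflib import SequenceMatcher

def translate_sentence(model, translation_memory, src_sentence):
    # Retrieve the TM pair whose source side is most similar to the input,
    # so the decoder layer containing TM information has a pair to attend to.
    tm_src, tm_tgt = max(
        translation_memory,
        key=lambda pair: SequenceMatcher(None, src_sentence, pair[0]).ratio(),
    )
    return model.translate(src_sentence, tm_src, tm_tgt)

# e.g. target = translate_sentence(nmt_model, memory, "i like green tea")
```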
The neural machine translation method of this embodiment uses the sentence pair (s^m_i, t^m_i) in the translation memory that is very similar to the source-end sentence s_i to help the neural machine translation model translate the source-end sentence s_i, so the accuracy of the neural machine translation result can be improved and the translation quality can be improved.
< apparatus >
The present embodiment provides a neural machine translation apparatus, for example, a neural machine translation apparatus 8000 shown in fig. 8.
As shown in fig. 8, the neural machine translation device 8000 may include an acquisition module 8100 and a translation module 8200.
The obtaining module 8100 is configured to obtain a source sentence to be translated.
A translation module 8200, configured to input the source-end sentence to be translated into a neural machine translation model, and output a target-end sentence as a translation result; wherein, the neural machine translation model is obtained by training according to the training method of the neural machine translation model as described in the first embodiment.
The neural machine translation device of this embodiment may be used to implement the method technical solution of this embodiment, and its implementation principle and technical effect are similar, and are not described herein again.
< apparatus >
In this embodiment, an electronic device is also provided, which includes the neural machine translation device 8000 described in the embodiment of the present disclosure; alternatively, the electronic device is an electronic device 9000 shown in fig. 9, which includes a processor 9200 and a memory 9100:
the memory 9100 stores machine-executable instructions capable of being executed by the processor; a processor 9200 executing the machine executable instructions to implement the neural machine translation method of any of the embodiments.
< computer-readable storage Medium embodiment >
The present embodiments provide a computer-readable storage medium having stored therein an executable command that, when executed by a processor, performs a method described in any of the method embodiments of the present disclosure.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions and thereby implement aspects of the present disclosure by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present disclosure is defined by the appended claims.
Claims (10)
1. A method of training a neural machine translation model, the method comprising:
acquiring a translation memory sentence pair with the highest similarity to each source end sentence in the parallel corpus from the translation memory;
training a preset initial model by taking a source language sentence pair and a translation memory sentence pair as a training sample set until a first parameter of a source-end encoder layer of a translation memory of the preset initial model is converged, a second parameter of a target-end encoder layer of the translation memory of the preset initial model is converged, and a third parameter of a decoder layer of the preset initial model, which contains translation memory information, is converged to obtain a neural machine translation model;
the source sentence pair comprises a source sentence and a corresponding target sentence, and the translation memory base sentence pair comprises a translation memory base source sentence and a corresponding translation memory base target sentence.
2. The method of claim 1, wherein said retrieving from the translation memory pairs of translation memory sentences having the highest similarity to each source sentence in the parallel corpus comprises:
calculating the similarity between the source-end sentence and each translation memory library source-end sentence in the translation memory library;
and sequencing the similarity from large to small, and determining the source end sentence with the maximum similarity of the translation memory base and the corresponding target end sentence as the translation memory base sentence pair with the highest similarity of the source end sentences.
3. The method of claim 2, wherein said calculating a similarity of the source sentence to each translation memory source sentence in the translation memory comprises:
and calculating the editing distance between the source end sentence and each translation memory library source end sentence in the translation memory library as the similarity.
4. The method of claim 1, wherein training a preset initial model with the source language sentence pairs and the translation memory sentence pairs as a training sample set until a first parameter of a translation memory source-side encoder layer of the preset initial model converges, a second parameter of a translation memory target-side encoder layer of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model containing translation memory information converges comprises:
translating each sample in the training sample set based on the preset initial model to obtain a translation result;
substituting the translation result into a preset loss function for calculation to obtain the loss of each sample;
updating the first, second, and third parameters based on the loss until the first, second, and third parameters converge.
5. The method of claim 4, wherein the updating the first, second, and third parameters based on the loss until the first, second, and third parameters converge comprises:
respectively calculating a first derivative of the first parameter, a second derivative of the second parameter and a third derivative of the third parameter based on the loss and a preset back propagation algorithm;
updating the first parameter based on the first derivative and gradient descent algorithm, the second parameter based on the second derivative and gradient descent algorithm, and the third parameter based on the third derivative and gradient descent algorithm;
and updating the first parameter, the second parameter and the third parameter for multiple times based on the loss of a plurality of samples in the training sample set until convergence, so as to obtain the neural machine translation model.
6. A method of neural machine translation, the method comprising:
acquiring a source end sentence to be translated;
inputting the source end sentence to be translated into a neural machine translation model, and outputting a target end sentence as a translation result; the neural machine translation model is obtained by training according to the training method of the neural machine translation model as claimed in any one of claims 1-5.
7. An apparatus for training a neural machine translation model, the apparatus comprising:
the acquisition module is used for acquiring translation memory sentence pairs with the highest similarity to each source end sentence in the parallel corpus from the translation memory;
the training module is used for training a preset initial model by taking a source language sentence pair and a translation memory sentence pair as a training sample set until a first parameter of a source end encoder layer of a translation memory library of the preset initial model converges, a second parameter of a target end encoder layer of the translation memory library of the preset initial model converges, and a third parameter of a decoder layer of the preset initial model, which contains translation memory library information, converges to obtain a neural machine translation model; the source sentence pair comprises a source sentence and a corresponding target sentence, and the translation memory base sentence pair comprises a translation memory base source sentence and a corresponding translation memory base target sentence.
8. A neural machine translation device, the device comprising:
the obtaining module is used for obtaining a source end sentence to be translated;
the translation module is used for inputting the source end sentence to be translated into a neural machine translation model and outputting a target end sentence as a translation result; the neural machine translation model is obtained by training according to the training method of the neural machine translation model as claimed in any one of claims 1-5.
9. An electronic device comprising a processor and a memory; the memory stores machine executable instructions executable by the processor; the processor executes the machine executable instructions to implement the method of training of a neural machine translation model of any of claims 1-5.
10. An electronic device comprising a processor and a memory; the memory stores machine executable instructions executable by the processor; the processor executes the machine executable instructions to implement the neural machine translation method of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010646609.1A CN113919373A (en) | 2020-07-07 | 2020-07-07 | Neural machine translation method, training method and device of model thereof, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010646609.1A CN113919373A (en) | 2020-07-07 | 2020-07-07 | Neural machine translation method, training method and device of model thereof, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113919373A true CN113919373A (en) | 2022-01-11 |
Family
ID=79231375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010646609.1A Pending CN113919373A (en) | 2020-07-07 | 2020-07-07 | Neural machine translation method, training method and device of model thereof, and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113919373A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130479A (en) * | 2022-04-13 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Machine translation method, target translation model training method, and related program and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20100062826A (en) * | 2008-12-02 | 2010-06-10 | 한국전자통신연구원 | Translation memory apply method for auto translation and its apparatus |
CN107329961A (en) * | 2017-07-03 | 2017-11-07 | 西安市邦尼翻译有限公司 | A kind of method of cloud translation memory library Fast incremental formula fuzzy matching |
CN109299479A (en) * | 2018-08-21 | 2019-02-01 | 苏州大学 | Translation memory is incorporated to the method for neural machine translation by door control mechanism |
CN110046359A (en) * | 2019-04-16 | 2019-07-23 | 苏州大学 | Neural machine translation method based on sample guidance |
-
2020
- 2020-07-07 CN CN202010646609.1A patent/CN113919373A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20100062826A (en) * | 2008-12-02 | 2010-06-10 | 한국전자통신연구원 | Translation memory apply method for auto translation and its apparatus |
CN107329961A (en) * | 2017-07-03 | 2017-11-07 | 西安市邦尼翻译有限公司 | A kind of method of cloud translation memory library Fast incremental formula fuzzy matching |
CN109299479A (en) * | 2018-08-21 | 2019-02-01 | 苏州大学 | Translation memory is incorporated to the method for neural machine translation by door control mechanism |
CN110046359A (en) * | 2019-04-16 | 2019-07-23 | 苏州大学 | Neural machine translation method based on sample guidance |
Non-Patent Citations (1)
Title |
---|
CAO Qian; XIONG Deyi: "A Method for Fusing Translation Memory into Neural Machine Translation Based on Data Augmentation" (基于数据扩充的翻译记忆库与神经机器翻译融合方法), Journal of Chinese Information Processing (中文信息学报), no. 05, 15 May 2020 (2020-05-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130479A (en) * | 2022-04-13 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Machine translation method, target translation model training method, and related program and device |
CN115130479B (en) * | 2022-04-13 | 2024-05-21 | 腾讯科技(深圳)有限公司 | Machine translation method, target translation model training method, and related program and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8494837B2 (en) | Active learning systems and methods for rapid porting of machine translation systems to new language pairs or new domains | |
KR20210057708A (en) | Method, apparatus, and electronic device for training text generation model | |
JP5802292B2 (en) | Shared language model | |
KR20210092147A (en) | Method and apparatus for mining entity focus in text | |
CN110175336B (en) | Translation method and device and electronic equipment | |
CN109241286B (en) | Method and device for generating text | |
CN113590761B (en) | Training method of text processing model, text processing method and related equipment | |
CN111160004B (en) | Method and device for establishing sentence-breaking model | |
US10902188B2 (en) | Cognitive clipboard | |
CN112287698B (en) | Chapter translation method and device, electronic equipment and storage medium | |
KR102606514B1 (en) | Similarity processing method, apparatus, server and storage medium | |
CN111339789A (en) | Translation model training method and device, electronic equipment and storage medium | |
JP2010520532A (en) | Input stroke count | |
KR20210080150A (en) | Translation method, device, electronic equipment and readable storage medium | |
US11646030B2 (en) | Subtitle generation using background information | |
CN113919373A (en) | Neural machine translation method, training method and device of model thereof, and electronic device | |
CN111328416A (en) | Speech patterns for fuzzy matching in natural language processing | |
CN113919372A (en) | Machine translation quality evaluation method, device and storage medium | |
JP2023078411A (en) | Information processing method, model training method, apparatus, appliance, medium and program product | |
CN116189654A (en) | Voice editing method and device, electronic equipment and storage medium | |
CN113901841A (en) | Translation method, translation device and storage medium | |
CN109800438B (en) | Method and apparatus for generating information | |
CN111860214A (en) | Face detection method, training method and device of model thereof and electronic equipment | |
CN114064845A (en) | Method and device for training relational representation model and electronic equipment | |
CN117174084B (en) | Training data construction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |