WO2022179149A1

WO2022179149A1 - Machine translation method and apparatus based on translation memory

Info

Publication number: WO2022179149A1
Application number: PCT/CN2021/126674
Authority: WO
Inventors: 毛红保
Original assignee: 语联网（武汉）信息技术有限公司
Priority date: 2021-02-23
Filing date: 2021-10-27
Publication date: 2022-09-01
Also published as: CN112818712B; CN112818712A

Abstract

Provided in the present application is a machine translation method based on a translation memory. The method comprises: searching, in a translation memory, for corpus source text having the highest similarity to source text to be translated, and translated text of the corpus source text; comparing the source text to be translated with the corpus source text, and acquiring a difference part in the corpus source text that is different from the source text to be translated; mapping the difference part to the translated text of the corpus source text, and replacing translated text, to which the difference part is mapped, in the translated text of the corpus source text with a mask; and taking the translated text of the corpus source text after replacement is performed and the source text to be translated as an input of a machine translation model, and outputting translated text of the source text to be translated, wherein the machine translation model is obtained by means of performing training by taking a translation source text sample as a sample and taking translated text corresponding to the translation source text sample as a label. By means of the present application, translation is performed by combining source text to be translated and translated text of corpus source text, such that the translation efficiency can be improved, the translation cost can be reduced, and the translation accuracy can also be improved.

Description

Method and device for machine translation based on translation memory

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application No. 202110203208.3, filed on February 23, 2021 and entitled "Method and Device for Machine Translation Based on Translation Memory", which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present application relates to the technical field of machine translation, and in particular, to a method and device for machine translation based on translation memory.

BACKGROUND

Translation memory is a bilingual corpus generated and retained by translators during the translation process. It usually consists of manually proofread data of relatively high translation quality. Because the corpus in a translation memory is limited, an entry identical to the current text to be translated often cannot be retrieved, so the translation of the current text to be translated cannot be obtained directly from the translation memory.

Translation memories can be used to assist current translation tasks. The existing method is to retrieve the corpus similar to the current text to be translated from the translation memory, and present the corresponding translation to the translator. The translator manually modifies the translation of the similar corpus according to the current text to be translated to obtain the translation of the current text to be translated.

Due to the large differences in sentence structure and expression between the original and translated texts of similar corpora, translators need to spend a lot of time checking and editing the translations of similar corpora, which is labor-intensive.

SUMMARY OF THE INVENTION

The present application provides a method and device for machine translation based on translation memory, which solve the prior-art problem that checking and editing the translations of similar corpora is time-consuming and laborious for translators, and which realize automatic translation of the text to be translated based on translation memory.

The application provides a machine translation method based on translation memory, including:

searching the translation memory for the original text of the corpus with the highest similarity to the original text to be translated, and for the translation of that original text of the corpus;

comparing the original text to be translated with the original text of the corpus to obtain the difference parts in the original text of the corpus that differ from the original text to be translated;

mapping the difference parts to the translation of the original text of the corpus, and replacing the translation to which the difference parts are mapped with a mask;

using the replaced translation of the original text of the corpus and the original text to be translated as the input of a machine translation model, and outputting the translation of the original text to be translated;

wherein the machine translation model is obtained by training with source-text samples as samples and the translations corresponding to the source-text samples as labels.

According to the translation memory-based machine translation method provided by the present application, using the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and outputting the translation of the original text to be translated, includes:

inputting the original text to be translated into the first encoder of the machine translation model, and outputting the encoding result of the original text to be translated;

inputting the replaced translation of the original text of the corpus into the second encoder of the machine translation model, and outputting the encoding result of the translation of the original text of the corpus;

inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the decoder of the machine translation model, and outputting the translation of the original text to be translated.

According to the translation memory-based machine translation method provided by the present application, inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the decoder of the machine translation model, and outputting the translation of the original text to be translated, includes:

inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the cross-attention layer of the decoder, and then passing the result through the linear processing layer and the softmax layer of the decoder in turn to output the translation of the original text to be translated.

According to a translation memory-based machine translation method provided by the present application, the mask includes brackets and preset characters; wherein, the preset characters are located inside the brackets.

According to the translation memory-based machine translation method provided by the present application, if there are multiple difference parts, the mask that replaces the translation mapped by each difference part also includes the number of that difference part, and the number is located inside the brackets.

According to a translation memory-based machine translation method provided by the present application, the mapping of the difference part to the translation of the original text of the corpus includes:

word-aligning the original text of the corpus and the translation of the original text of the corpus;

According to the word alignment result, the difference part is mapped to the translation of the original corpus.

According to a translation memory-based machine translation method provided by the present application, the machine translation model is a Transformer model.

The application also provides a machine translation device based on translation memory, including:

a search module, configured to search the translation memory for the original text of the corpus with the highest similarity to the original text to be translated, and for the translation of that original text of the corpus;

a comparison module, configured to compare the original text to be translated with the original text of the corpus, and obtain the difference parts in the original text of the corpus that differ from the original text to be translated;

a replacement module, configured to map the difference parts to the translation of the original text of the corpus, and replace the translation to which the difference parts are mapped with a mask;

a translation module, configured to use the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and output the translation of the original text to be translated.

The present application also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of any one of the above translation memory-based machine translation methods.

The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of any one of the above-mentioned translation memory-based machine translation methods.

The translation memory-based machine translation method and device provided by the present application search the translation memory for the original text of the corpus with the highest similarity to the original text to be translated, together with its translation, and automatically compare the original text to be translated with the original text of the corpus, which effectively reduces the work intensity of manual checking. The difference parts in the original text of the corpus are then mapped to the translation of the original text of the corpus, and the translation to which the difference parts are mapped is replaced with a mask. Finally, the original text to be translated is translated automatically in combination with the replaced translation of the original text of the corpus, which can improve translation efficiency, reduce translation costs, and improve translation accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions of the present application or the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a first schematic flowchart of the translation memory-based machine translation method provided by the present application;

FIG. 2 is a schematic structural diagram of the machine translation model in the translation memory-based machine translation method provided by the present application;

FIG. 3 is a second schematic flowchart of the translation memory-based machine translation method provided by the present application;

FIG. 4 is a schematic structural diagram of the translation memory-based machine translation device provided by the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.

DETAILED DESCRIPTION

In order to make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

The translation memory-based machine translation method of the present application is described below with reference to FIG. 1. The method includes:

Step 101, searching the translation memory for the original text of the corpus with the highest similarity to the original text to be translated, and for the translation of that original text of the corpus;

The original text to be translated may be text from any application field, such as engineering, advertising, or medicine. This embodiment does not limit the type or quantity of the original text to be translated. The translation memory stores a large amount of bilingual corpus data, which has been manually proofread and is of relatively high translation quality.

A text similarity retrieval method can be used: the original text to be translated serves as the query text, the original text of the corpus with the highest similarity to it is retrieved from the translation memory, and the translation of that original text of the corpus is retrieved along with it. The similarity may be calculated, for example, as the Pearson correlation or the Euclidean distance between the original text to be translated and each original text of the corpus in the translation memory. This embodiment does not limit the method of calculating the similarity.
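The retrieval step can be illustrated with a minimal sketch. It assumes whitespace tokenization, bag-of-words count vectors, and the Euclidean distance named above; the function names, the French source sentences, and the two-entry translation memory are hypothetical and are not taken from the application.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words count vector over lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    # Counter returns 0 for tokens it has not seen.
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def retrieve(query, memory):
    """Return the (source, target) pair whose source is closest to the query."""
    q = bow(query)
    return min(memory, key=lambda pair: euclidean(q, bow(pair[0])))


# Hypothetical translation memory: (original text of the corpus, its translation).
memory = [
    ("J'ai une poire", "I have a pear"),
    ("Il fait beau aujourd'hui", "The weather is nice today"),
]

best_source, best_target = retrieve("J'ai une pomme", memory)
```

A smaller distance means a higher similarity, so the entry with the minimum distance is the one returned. A production system would more likely use an index or sentence embeddings than a linear scan.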

Step 102, compare the original text to be translated and the original text of the corpus, and obtain the difference parts in the original text of the corpus that are different from the original text to be translated;

Specifically, the original text of the corpus retrieved from the translation memory may or may not be completely consistent with the original text to be translated. Therefore, after retrieving the original corpus from the translation memory, it is necessary to compare the similarity between the original to be translated and the original of the corpus to determine whether the original to be translated and the original of the corpus are completely consistent. It may be to perform word segmentation processing on the original text to be translated and the original text of the corpus, compare the similarity of words in the same position of the original text to be translated and the original text of the corpus, and determine whether the original text to be translated and the original text of the corpus are completely consistent according to the comparison result. This embodiment is not limited to this determination method.

If the original text to be translated is inconsistent with the original text of the corpus, the difference parts are marked in the original text of the corpus. For example, the original text to be translated is "I have an apple", and the original text of the corpus with the highest similarity is "I have a pear". According to the comparison result, the difference part between the original text of the corpus and the original text to be translated is "pear", which can then be marked in the original text of the corpus. The marked original text of the corpus is "I have a [pear]". This embodiment does not limit the marking method.
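One possible way to obtain and mark the difference parts is sketched below. It uses Python's standard `difflib.SequenceMatcher` on token lists, which is an assumption rather than the comparison method of the application; the function names and the French stand-in sentences are likewise hypothetical.

```python
from difflib import SequenceMatcher


def difference_spans(corpus_tokens, query_tokens):
    """Return (start, end) token spans of corpus_tokens that differ from query_tokens."""
    matcher = SequenceMatcher(a=corpus_tokens, b=query_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "insert" opcodes have an empty corpus span, so there is nothing to mark.
        if tag in ("replace", "delete"):
            spans.append((i1, i2))
    return spans


def mark(tokens, spans):
    """Wrap each differing span in square brackets, as in "I have a [pear]"."""
    marked = list(tokens)
    for start, end in spans:
        marked[start] = "[" + marked[start]
        marked[end - 1] = marked[end - 1] + "]"
    return " ".join(marked)


corpus = "J'ai une poire".split()   # hypothetical original text of the corpus
query = "J'ai une pomme".split()    # hypothetical original text to be translated

spans = difference_spans(corpus, query)
marked_corpus = mark(corpus, spans)
```

Here `spans` holds half-open token index ranges into the original text of the corpus, which is the form the later mapping step needs.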

Step 103, mapping the difference part to the translation of the original text of the corpus, and replacing the mapped translation of the difference part in the translation of the original text of the corpus with a mask;

Specifically, after obtaining the difference parts in the original text of the corpus that are different from the original text to be translated, the difference parts can be mapped to the translation of the original text of the corpus. For example, the original corpus is "I have a pear", the translation of the original corpus is "I have a pear", the difference in the original corpus is "pear", and the corresponding difference in the translation of the original corpus is "pear". After marking the difference part, the original text of the corpus is "I have a [pear]", and after mapping the marked difference part to the translation of the original text, the translation of the original text is "I have a [pear]".

Then, the translation mapped by the difference part in the translation of the original text of the corpus is replaced with a mask. The type of mask can be set according to actual needs.

Step 104, taking the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and outputting the translation of the original text to be translated; wherein the machine translation model is obtained by training with source-text samples as samples and the translations corresponding to the source-text samples as labels.

Specifically, the replaced translation of the original text of the corpus and the original text to be translated are input into the machine translation model, which draws on both inputs to output an accurate translation of the original text to be translated. The machine translation model may be a neural machine translation model, but is not limited to this type.

In addition, the original text to be translated and the translation of the original text to be translated output by the machine translation model can also be added to the translation memory to provide rich corpus data for the expansion of the translation memory.

Since the corpus data in the translation memory contains high-quality translations, this embodiment automatically translates the original text to be translated in combination with the translation of the original text of the corpus, which can not only improve translation accuracy, but also reduce the work intensity of checking and editing, improve translation efficiency, and reduce translation costs.

In this embodiment, the translation memory is searched for the original text of the corpus with the highest similarity to the original text to be translated, together with its translation, and the original text to be translated is automatically compared with the original text of the corpus, which effectively reduces the work intensity of manual checking. The difference parts in the original text of the corpus are then mapped to the translation of the original text of the corpus, and the translation to which the difference parts are mapped is replaced with a mask. Finally, the original text to be translated is translated automatically in combination with the replaced translation of the original text of the corpus, which can improve translation efficiency, reduce translation costs, and improve translation accuracy.

On the basis of the above embodiment, in this embodiment, using the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and outputting the translation of the original text to be translated, includes: inputting the original text to be translated into the first encoder of the machine translation model, and outputting the encoding result of the original text to be translated; inputting the replaced translation of the original text of the corpus into the second encoder of the machine translation model, and outputting the encoding result of the translation of the original text of the corpus; and inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the decoder of the machine translation model, and outputting the translation of the original text to be translated.

The machine translation model is a multi-input translation model that includes two parallel encoders, namely a first encoder and a second encoder. The first encoder and the second encoder may each have multiple layers. This embodiment does not limit the number or structure of the encoder layers. The machine translation model further includes a decoder, which may also have multiple layers; this embodiment does not limit the number or structure of the decoder layers.

The original text to be translated is input into the first encoder, which encodes it and outputs the encoding result of the original text to be translated. At the same time, the replaced translation of the original text of the corpus is input into the second encoder, which encodes it and outputs the encoding result of the translation of the original text of the corpus. Then, both encoding results are input into the decoder, which outputs the final translation result based on the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus.

On the basis of the above embodiment, in this embodiment, inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the decoder of the machine translation model, and outputting the translation of the original text to be translated, includes: inputting the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the cross-attention layer of the decoder, and then passing the result through the linear processing layer and the softmax layer of the decoder in turn to output the translation of the original text to be translated.

The decoder includes multiple sub-layers, and each sub-layer includes a self-attention layer, a cross-attention layer and a feed-forward neural network layer. As shown in FIG. 2, the decoder also includes an input layer, a Linear (linear processing) layer, and a softmax layer. The Linear layer is used to flatten the input features into the form of a 1D tensor.

After the encoding result of the original text to be translated is subjected to the cross-attention operation in the cross-attention layer of the decoder, the result of the first cross-attention operation is output. Then, after performing the cross-attention operation on the result of the first cross-attention operation and the encoding result of the translation of the original corpus, the second cross-attention operation result is output. The result of the second cross-attention operation is sequentially passed through the linear processing layer and the softmax layer of the decoder to output the translation of the original text to be translated.
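The two successive cross-attention operations can be sketched in plain Python as follows. This is a simplified illustration under several assumptions that are not stated in the application: a single attention head, no learned query, key, or value projections, no residual connections or layer normalization, decoder states used as the queries of the first operation, and the linear layer modelled as a plain affine map onto a toy three-word vocabulary. All numeric values are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, memory):
    """Scaled dot-product attention; memory vectors act as both keys and values."""
    d = len(memory[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        out.append([sum(w * v[c] for w, v in zip(weights, memory)) for c in range(d)])
    return out


def linear(x, weight, bias):
    """Affine map producing one output per row of `weight`."""
    return [dot(row, x) + b for row, b in zip(weight, bias)]


# Toy 2-dimensional encodings (hypothetical values).
source_enc = [[1.0, 0.0], [0.0, 1.0]]            # first encoder output
tm_enc = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]   # second encoder output (masked translation)
decoder_states = [[0.2, 0.1]]                    # one decoding position

h1 = cross_attention(decoder_states, source_enc)  # first cross-attention result
h2 = cross_attention(h1, tm_enc)                  # second cross-attention result

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # hypothetical output projection
b = [0.0, 0.0, 0.0]
probs = softmax(linear(h2[0], W, b))              # distribution over the toy vocabulary
```

The point of the sketch is the ordering: the result of attending over the source encoding becomes the query for attending over the encoding of the masked translation, and only then are the linear and softmax layers applied.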

Based on the foregoing embodiments, the mask in this embodiment includes brackets and preset characters; wherein, the preset characters are located inside the brackets.

Specifically, brackets and preset characters can be used together as a mask. For example, the brackets can be square brackets and the preset characters can be "mask", so that the mask is [mask]. This embodiment is not limited to this type of mask. With this mask, the translation mapped by the difference part in the translation of the original text of the corpus is replaced by [mask]. For example, if the translation of the original text of the corpus is "I have a pear" and "pear" is the translation mapped by the difference part, the translation of the original text of the corpus after mask replacement is "I have a [mask]".

On the basis of the above embodiment, in this embodiment, if there are multiple difference parts, the mask that replaces the translation mapped by each difference part also includes the number of that difference part, and the number is located inside the brackets.

Specifically, if multiple difference parts are mapped to the translation of the original text of the corpus, the translations mapped by the respective difference parts are replaced one by one with masks that each contain a number, such as [mask1] and [mask2]. Here, the 1 and 2 inside the brackets are the numbers of the difference parts.
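The mask replacement described above, with and without numbering, might look like the following sketch. The function name `apply_masks` and the second example sentence are hypothetical, and the spans are assumed to be non-overlapping half-open token ranges in the translation.

```python
def apply_masks(target_tokens, spans):
    """Replace each (start, end) span of the translation with a mask.

    A single span becomes "[mask]"; several spans become "[mask1]", "[mask2]", ...
    numbered in left-to-right order. Spans are assumed not to overlap.
    """
    spans = sorted(spans)
    numbered = len(spans) > 1
    out, cursor = [], 0
    for n, (start, end) in enumerate(spans, start=1):
        out.extend(target_tokens[cursor:start])          # keep unchanged tokens
        out.append(f"[mask{n}]" if numbered else "[mask]")
        cursor = end                                     # skip the masked tokens
    out.extend(target_tokens[cursor:])
    return " ".join(out)


single = apply_masks("I have a pear".split(), [(3, 4)])
multiple = apply_masks("I have a pear and a plum".split(), [(3, 4), (6, 7)])
```

Numbering the masks lets the model distinguish which gap corresponds to which difference part when more than one span has been removed.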

On the basis of the above embodiments, in this embodiment, mapping the difference part to the translation of the original text of the corpus includes: performing word alignment on the original text of the corpus and the translation of the original text of the corpus; and mapping the difference part to the translation of the original text of the corpus according to the word alignment result.

Specifically, before mapping the difference part to the translation of the original corpus, a word alignment tool can be used to perform automatic word alignment on the original corpus and the translation of the original corpus. After word alignment, there is a correspondence between each word in the original corpus and each word in the translation of the original corpus. The word alignment tool may be a fast_align word alignment tool or a GIZA++ word alignment tool, etc. This embodiment is not limited to the word alignment tool.

For example, the original text of the corpus is "I have a pear", and the translation of the original text of the corpus is "I have a pear". After word alignment, "I" corresponds to "I", "有" corresponds to "have", "a" corresponds to "a", and "pear" corresponds to "pear".

In this embodiment, by performing automatic word alignment on the original corpus and the translation of the original corpus, the difference parts can be quickly mapped from the original corpus to the translation of the corpus.
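As a minimal sketch of this mapping step, assume the aligner emits links in the "i-j" (Pharaoh) format, where i indexes a token of the original text of the corpus and j a token of its translation, as fast_align does. The function names, the French source sentence, and the alignment string below are illustrative assumptions, not part of the application.

```python
def parse_alignment(line):
    """Parse "i-j" links into a dict: source index -> list of target indices."""
    links = {}
    for pair in line.split():
        i, j = pair.split("-")
        links.setdefault(int(i), []).append(int(j))
    return links


def map_span(span, links):
    """Map a source token span (start, end) to the smallest covering target span.

    Returns None when no token in the span is aligned to any target token.
    """
    targets = [j for i in range(*span) for j in links.get(i, [])]
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


source = "J'ai une poire".split()   # hypothetical original text of the corpus
target = "I have a pear".split()    # its translation
# Hypothetical aligner output: J'ai -> I, have; une -> a; poire -> pear.
links = parse_alignment("0-0 0-1 1-2 2-3")

target_span = map_span((2, 3), links)            # difference part "poire"
mapped_words = target[target_span[0]:target_span[1]]
```

Taking the smallest covering span is one simple policy; it can over-mask when alignments are discontinuous, so a real system might handle such cases separately.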

On the basis of the foregoing embodiments, the machine translation model described in this embodiment is a Transformer model.

Specifically, a multi-input Transformer model can be used to translate the original text to be translated. The Transformer model uses a self-attention network for encoding and decoding. Both the Encoder and the Decoder are composed of multiple sub-layers, and each sub-layer includes a self-attention layer and a feed-forward neural network layer. In the Decoder, an Encoder-Decoder cross-attention layer is inserted between the self-attention layer and the feed-forward neural network layer. Transformer models achieve state-of-the-art translation performance for many language pairs.

FIG. 3 shows the complete flow of this embodiment. The specific steps include:

Step 1: Match the original text to be translated against the original texts of the corpus in the translation memory, and output the original text of the corpus with the highest similarity to the original text to be translated, together with its translation;

Step 2: Perform word alignment on the original text of the corpus and the translation of the original text of the corpus;

Step 3: Compare the original text of the corpus with the original text to be translated, and mark the difference parts in the original text of the corpus;

Step 4: Map the marked difference parts in the original text of the corpus to the translation of the original text of the corpus;

Step 5: Use the mask to replace the translation mapped by the difference parts in the translation of the original text of the corpus;

Step 6: Use the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and output the translation of the original text to be translated.
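Steps 1 to 5 can be tied together in a short sketch that prepares the two inputs for the model of step 6. It is written under several assumptions: similarity is measured with `difflib.SequenceMatcher.ratio()` rather than a method named in the application, the word alignment of step 2 is precomputed and stored with each entry as a set of (source index, target index) pairs, and the sentences are hypothetical. The trained model itself is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Token-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def build_model_input(query, memory):
    """Return (query, masked translation) for the best translation memory match.

    memory: list of (source, target, alignment) entries, where alignment is a
    set of (source_index, target_index) pairs assumed to come from a word aligner.
    """
    q = query.split()
    # Step 1: pick the most similar original text of the corpus.
    src, tgt, align = max(memory, key=lambda e: similarity(e[0].split(), q))
    s, t = src.split(), tgt.split()
    # Step 3: spans of the corpus original that differ from the query.
    ops = SequenceMatcher(a=s, b=q, autojunk=False).get_opcodes()
    diff = [(i1, i2) for tag, i1, i2, _, _ in ops if tag in ("replace", "delete")]
    # Step 4: map each span to the translation through the alignment (step 2).
    mapped = []
    for i1, i2 in diff:
        js = sorted(j for i, j in align if i1 <= i < i2)
        if js:
            mapped.append((js[0], js[-1] + 1))
    mapped.sort()
    # Step 5: replace the mapped spans with (numbered) masks.
    out, cursor = [], 0
    for n, (j1, j2) in enumerate(mapped, start=1):
        out.extend(t[cursor:j1])
        out.append(f"[mask{n}]" if len(mapped) > 1 else "[mask]")
        cursor = j2
    out.extend(t[cursor:])
    return query, " ".join(out)


memory = [
    ("J'ai une poire", "I have a pear", {(0, 0), (0, 1), (1, 2), (2, 3)}),
    ("Il fait beau", "The weather is nice", {(0, 0), (1, 2), (2, 3)}),
]
model_input = build_model_input("J'ai une pomme", memory)
```

The returned pair corresponds to what the first and second encoders would receive in step 6.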

The translation memory-based machine translation apparatus provided by the present application is described below. The translation memory-based machine translation apparatus described below and the translation memory-based machine translation method described above may refer to each other correspondingly.

As shown in FIG. 4 , the present embodiment provides a machine translation device based on translation memory. The device includes a search module 401, a comparison module 402, a replacement module 403 and a translation module 404, wherein:

The search module 401 is used for searching the original corpus with the highest similarity to the original to be translated and the translation of the original corpus from the translation memory;

The comparison module 402 is configured to compare the original text to be translated and the original text of the corpus, and obtain the difference parts in the original text of the corpus that are different from the original text to be translated;

If the original text to be translated is inconsistent with the original text of the corpus, the difference is marked in the original text of the corpus.

The replacement module 403 is configured to map the difference part to the translation of the original text of the corpus, and replace the mapped translation of the difference part in the translation of the original text of the corpus with a mask;

Specifically, after obtaining the difference parts in the original text of the corpus that are different from the original text to be translated, the difference parts can be mapped to the translation of the original text of the corpus. Then, mask replacement is performed on the translation mapped by the difference part in the translation of the original corpus. Among them, the type of mask can be set according to actual needs.

The translation module 404 is configured to use the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and output the translation of the original text to be translated; wherein the machine translation model is obtained by training with source-text samples as samples and the translations corresponding to the source-text samples as labels.

Specifically, the replaced translation of the original text of the corpus and the original text to be translated are input into the machine translation model, which draws on both inputs to output an accurate translation of the original text to be translated. The machine translation model may be a neural machine translation model, but is not limited to this type.

On the basis of the above embodiment, the translation module in this embodiment is specifically configured to: input the original text to be translated into the first encoder of the machine translation model, and output the encoding result of the original text to be translated; input the replaced translation of the original text of the corpus into the second encoder of the machine translation model, and output the encoding result of the translation of the original text of the corpus; and input the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the decoder of the machine translation model, and output the translation of the original text to be translated.

On the basis of the above embodiment, the translation module in this embodiment is further configured to input the encoding result of the original text to be translated and the encoding result of the translation of the original text of the corpus into the cross-attention layer of the decoder, and then pass the result through the linear processing layer and the softmax layer of the decoder in turn to output the translation of the original text to be translated.

On the basis of the above embodiments, this embodiment further includes a mapping module, configured to perform word alignment on the original text of the corpus and the translation of the original text of the corpus, and to map the difference part to the translation of the original text of the corpus according to the word alignment result.

FIG. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 5, the electronic device may include a processor 501, a communications interface 502, a memory 503 and a communication bus 504, where the processor 501, the communications interface 502, and the memory 503 communicate with each other through the communication bus 504. The processor 501 may invoke the logic instructions in the memory 503 to execute a translation memory-based machine translation method, the method comprising: searching the translation memory for the original text of the corpus with the highest similarity to the original text to be translated, and for the translation of that original text of the corpus; comparing the original text to be translated with the original text of the corpus to obtain the difference parts in the original text of the corpus that differ from the original text to be translated; mapping the difference parts to the translation of the original text of the corpus, and replacing the translation to which the difference parts are mapped with a mask; and using the replaced translation of the original text of the corpus and the original text to be translated as the input of the machine translation model, and outputting the translation of the original text to be translated; wherein the machine translation model is obtained by training with source-text samples as samples and the translations corresponding to the source-text samples as labels.

In addition, the above logic instructions in the memory 503 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present application also provides a computer program product. The computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, and the computer program comprises program instructions. When the program instructions are executed by a computer, the computer can execute the translation memory-based machine translation method provided above, the method comprising: searching a translation memory for the corpus original text having the highest similarity to the original text to be translated, and for the translation of the corpus original text; comparing the original text to be translated with the corpus original text, and obtaining the difference part of the corpus original text that differs from the original text to be translated; mapping the difference part to the translation of the corpus original text, and replacing, with a mask, the part of the translation of the corpus original text to which the difference part is mapped; and using the translation of the corpus original text after replacement and the original text to be translated as the input of a machine translation model, and outputting the translation of the original text to be translated; wherein the machine translation model is obtained by training with translation original text samples as samples and the translations corresponding to the translation original text samples as labels.

In another aspect, the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the translation memory-based machine translation method provided above, the method comprising: searching a translation memory for the corpus original text having the highest similarity to the original text to be translated, and for the translation of the corpus original text; comparing the original text to be translated with the corpus original text, and obtaining the difference part of the corpus original text that differs from the original text to be translated; mapping the difference part to the translation of the corpus original text, and replacing, with a mask, the part of the translation of the corpus original text to which the difference part is mapped; and using the translation of the corpus original text after replacement and the original text to be translated as the input of a machine translation model, and outputting the translation of the original text to be translated; wherein the machine translation model is obtained by training with translation original text samples as samples and the translations corresponding to the translation original text samples as labels.

The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solutions in essence, or the parts that contribute to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of the present application.

Claims

1. A machine translation method based on translation memory, comprising:

searching a translation memory for the corpus original text having the highest similarity to an original text to be translated, and for the translation of the corpus original text;

comparing the original text to be translated with the corpus original text to obtain the difference part of the corpus original text that differs from the original text to be translated;

mapping the difference part to the translation of the corpus original text, and replacing, with a mask, the part of the translation of the corpus original text to which the difference part is mapped;

using the translation of the corpus original text after replacement and the original text to be translated as the input of a machine translation model, and outputting the translation of the original text to be translated;

wherein the machine translation model is obtained by training with translation original text samples as samples and the translations corresponding to the translation original text samples as labels.

2. The machine translation method based on translation memory according to claim 1, wherein using the translation of the corpus original text after replacement and the original text to be translated as the input of the machine translation model, and outputting the translation of the original text to be translated, comprises:

inputting the original text to be translated into a first encoder of the machine translation model, and outputting an encoding result of the original text to be translated;

inputting the translation of the corpus original text after replacement into a second encoder of the machine translation model, and outputting an encoding result of the translation of the corpus original text;

inputting the encoding result of the original text to be translated and the encoding result of the translation of the corpus original text into a decoder of the machine translation model, and outputting the translation of the original text to be translated.

3. The machine translation method based on translation memory according to claim 2, wherein inputting the encoding result of the original text to be translated and the encoding result of the translation of the corpus original text into the decoder of the machine translation model, and outputting the translation of the original text to be translated, comprises:

inputting the encoding result of the original text to be translated and the encoding result of the translation of the corpus original text into the cross-attention mechanism layer of the decoder, and then passing sequentially through the linear processing layer and the softmax layer of the decoder to output the translation of the original text to be translated.

4. The machine translation method based on translation memory according to any one of claims 1-3, wherein the mask comprises brackets and preset characters, the preset characters being located inside the brackets.

5. The machine translation method based on translation memory according to claim 4, wherein, if there are multiple difference parts, the mask replacing the translation to which each difference part is mapped further comprises a number, the number being located inside the brackets.

6. The machine translation method based on translation memory according to any one of claims 1-3, wherein mapping the difference part to the translation of the corpus original text comprises:

performing word alignment on the corpus original text and the translation of the corpus original text;

mapping the difference part to the translation of the corpus original text according to the word alignment result.

7. The machine translation method based on translation memory according to any one of claims 1-3, wherein the machine translation model is a Transformer model.

8. A machine translation device based on translation memory, comprising:

a search module, configured to search a translation memory for the corpus original text having the highest similarity to an original text to be translated, and for the translation of the corpus original text;

a comparison module, configured to compare the original text to be translated with the corpus original text, and obtain the difference part of the corpus original text that differs from the original text to be translated;

a replacement module, configured to map the difference part to the translation of the corpus original text, and replace, with a mask, the part of the translation of the corpus original text to which the difference part is mapped;

a translation module, configured to use the translation of the corpus original text after replacement and the original text to be translated as the input of a machine translation model, and output the translation of the original text to be translated;

wherein the machine translation model is obtained by training with translation original text samples as samples and the translations corresponding to the translation original text samples as labels.

9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the machine translation method based on translation memory according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the machine translation method based on translation memory according to any one of claims 1 to 7.