CN111859996B - Training method and device of machine translation model, electronic equipment and storage medium

Training method and device of machine translation model, electronic equipment and storage medium

Info

Publication number
CN111859996B
CN111859996B
Authority
CN
China
Prior art keywords
translation
sentence
sample
machine translation
model
Prior art date
Legal status
Active
Application number
CN202010550590.0A
Other languages
Chinese (zh)
Other versions
CN111859996A (en)
Inventor
张睿卿
何中军
吴华
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010550590.0A
Publication of CN111859996A
Application granted
Publication of CN111859996B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465: Query processing support for facilitating data mining operations in structured databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The application discloses a training method and apparatus for a machine translation model, an electronic device, and a storage medium, relating to the technical field of natural language processing. The specific implementation scheme is as follows: mistranslation samples are mined from a parallel corpus using two machine translation models that are dual structures of each other; the corresponding machine translation model is then trained on the mined mistranslation samples. The method and apparatus can mine the samples that a machine translation model mistranslates and retrain the model on them, so that the knowledge in those samples is learned, the mistranslations no longer occur, and the translation accuracy of the machine translation model is effectively improved. Compared with the prior art, the method is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.

Description

Training method and device of machine translation model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method and apparatus for a machine translation model, an electronic device, and a storage medium.
Background
In natural language processing (NLP), when machine translation is performed with a machine translation model trained by deep learning, sparse words or unusual expressions are often translated incorrectly. The main reason is that such words and expressions appear in too few training samples, so the machine translation model learns them inadequately.
In the prior art, to improve the accuracy with which a machine translation model translates sparse words, the translation of a sparse word can be fixed by manual intervention in its translated content, thereby improving the translation accuracy for that word.
However, not all sparse words can be resolved by intervention; for example, the translation of a verb generally depends on context and may not be suitable for fixing in advance. The existing method of improving a machine translation model's accuracy on sparse words through manual intervention is therefore highly limited and inflexible.
Disclosure of Invention
To solve the above technical problems, the present application provides a training method and apparatus for a machine translation model, an electronic device, and a storage medium.
According to an aspect of the present application, there is provided a method for training a machine translation model, wherein the method includes:
mining mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
and training the corresponding machine translation model using the mistranslation samples.
According to another aspect of the present application, there is provided a training apparatus of a machine translation model, wherein the apparatus includes:
a mining module configured to mine mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
and a training module configured to train the corresponding machine translation model using the mistranslation samples.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to yet another aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
According to the technology of the present application, mistranslation samples are mined from a parallel corpus using two machine translation models that are dual structures of each other, and the corresponding machine translation model is trained on those samples. The samples that a machine translation model mistranslates can thus be mined and trained on again, so that their knowledge is learned, the mistranslations no longer occur, and the translation accuracy of the machine translation model is effectively improved. Compared with the prior art, the technical scheme of the present application is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 6 is a block diagram of an electronic device for implementing a training method for a machine translation model of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a first embodiment of the present application. As shown in fig. 1, this embodiment provides a training method of a machine translation model, which may specifically include the following steps:
S101, mining mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
S102, training the corresponding machine translation model using the mistranslation samples.
The execution body of the machine translation model training method of this embodiment is a machine translation model training apparatus, which may be an electronic entity similar to a computer, or may be a software application that runs on a computer device to train the machine translation model.
In this embodiment, the two machine translation models that are dual structures of each other, namely a forward machine translation model and a reverse machine translation model, translate between the same two languages in opposite directions. The parallel corpus of this embodiment may include a number of samples, where each sample includes a source sentence and a target sentence that belong to different languages. For example, if the forward machine translation model translates a source sentence into a target sentence, the reverse machine translation model translates the target sentence back into the source sentence, and vice versa.
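To make the later mining steps concrete, the two dual models can be viewed as objects exposing a translate method. The sketch below is illustrative only and not the patented implementation; the class, its methods, and the language tags are assumptions introduced here for the examples that follow.

```python
class TranslationModel:
    """Hypothetical wrapper around a trained sequence-to-sequence translation model."""

    def __init__(self, source_lang: str, target_lang: str):
        self.source_lang = source_lang
        self.target_lang = target_lang

    def translate(self, sentence: str) -> str:
        # A real system would run a trained encoder-decoder here (e.g.,
        # beam search over a neural network); this is only an interface
        # sketch for the examples below.
        raise NotImplementedError

# Two machine translation models with dual structures:
# x2y translates language X into language Y, and y2x translates Y back into X.
x2y = TranslationModel("x", "y")  # forward machine translation model
y2x = TranslationModel("y", "x")  # reverse machine translation model
```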
Because samples containing sparse words and the like are few relative to the total training data, insufficient learning during training easily causes the sparse words to be mistranslated; such samples may be called mistranslation samples of the machine translation model. The mistranslation samples of this embodiment are therefore not erroneous in themselves; rather, they are samples that the machine translation model is prone to translate incorrectly. In particular, a mistranslation sample may be a sample containing sparse words or the like that, owing to its rarity, the model has learned insufficiently, so that mistranslation occurs.
In this embodiment, the samples that the machine translation model mistranslates may be mined from the parallel corpus, and the machine translation model is then retrained purposefully and intensively on these mistranslation samples, so that it learns the knowledge of the samples it previously mistranslated. The mistranslations then no longer occur, and the translation accuracy of the machine translation model on sparse words is improved.
If only one machine translation model were used to translate a source sentence into a target sentence, it would be difficult to detect whether the model has mistranslated, short of human inspection. Therefore, this embodiment uses a forward machine translation model and a reverse machine translation model that are dual to each other, so that either one can verify whether the other has mistranslated, and mistranslation samples can thus be mined from the parallel corpus. Finally, the corresponding machine translation model is trained on the mistranslation samples so that the knowledge in them is learned and the mistranslations no longer occur. The machine translation model trained in this embodiment can be applied in the NLP field to translate any sample, and the training of this embodiment effectively reduces the translation error rate. Moreover, the technical scheme of this embodiment can be applied periodically, training the machine translation model specifically on mistranslation samples, so that the model keeps learning new knowledge, masters new skills, and improves its translation accuracy.
According to the training method of a machine translation model of this embodiment, mistranslation samples are mined from a parallel corpus using two machine translation models that are dual structures of each other, and the corresponding machine translation model is trained on those samples. The samples that a machine translation model mistranslates can thus be mined and trained on again, so that their knowledge is learned, the mistranslations no longer occur, and the translation accuracy of the machine translation model is effectively improved. Compared with the prior art, the technical scheme of this embodiment is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 2 is a schematic diagram of a second embodiment of the present application. As shown in fig. 2, the training method of a machine translation model of this embodiment further details the technical solution of the present application on the basis of the embodiment shown in fig. 1, and may specifically include the following steps:
s201, for each sample in a parallel corpus, forward translation is carried out on a source sentence in the sample by adopting a forward machine translation model, so as to obtain a first translation sentence;
s202, performing reverse translation on the first translation sentence by adopting a reverse machine translation model to obtain a second translation sentence;
s023, adopting a forward machine translation model to forward translate the second translation sentence to obtain a third translation sentence;
s204, excavating whether the sample belongs to an transliteration sample of a forward machine translation model or not by analyzing the semantics of the source sentence, the second translation sentence and the first translation sentence and the third translation sentence;
For example, the implementation of step S204 may include the following steps:
(a) Detecting whether a semantic difference exists between the source sentence and the second translation sentence, and whether a semantic difference exists between the first translation sentence and the third translation sentence; if the source sentence and the second translation sentence have a semantic difference and the first translation sentence and the third translation sentence do not, executing step (b); otherwise, in all other cases, the sample is not considered a mistranslation sample of the forward machine translation model.
(b) Determining that the sample is a mistranslation sample of the forward machine translation model.
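The embodiments here do not prescribe how the semantic difference between two sentences is detected. One plausible realization, given only as a hedged sketch, is to compare multilingual sentence embeddings against a similarity threshold; the sentence-transformers model name and the threshold value below are assumptions for illustration, not part of this application.

```python
from sentence_transformers import SentenceTransformer, util

# Any multilingual sentence encoder could be substituted here.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantic_diff(sent_a: str, sent_b: str, threshold: float = 0.85) -> bool:
    """Return True if the two sentences are judged semantically different."""
    emb_a, emb_b = encoder.encode([sent_a, sent_b], convert_to_tensor=True)
    similarity = util.cos_sim(emb_a, emb_b).item()
    return similarity < threshold
```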
The following describes the technical solution of this embodiment in detail, taking the two machine translation models that are dual structures of each other to be a forward machine translation model x2y and a reverse machine translation model y2x, and taking as an example the question of whether a sample (x, y) is a mistranslation sample of the forward machine translation model.
First, make the following assumption: if a sentence is translated by one of the two dual models x2y and y2x and the result is translated back by the other, and the back-translation has no semantic difference from the original sentence, then both translation steps are considered correct.
Based on the above principle, the following operations are performed:
(1) Input the source sentence x of the sample into the forward machine translation model x2y, and translate it to obtain y';
(2) Input y' into the reverse machine translation model y2x, and translate it to obtain x';
(3) Detect whether a semantic difference (diff) exists between x and x';
(4) Input x' into the forward machine translation model x2y, and translate it to obtain y'';
(5) Detect whether a semantic diff exists between y' and y'';
Of the above 5 steps, only steps (3) and (5) detect the semantic difference between two sentences. If step (3) produces no semantic diff, then by the assumption neither translation step (1) nor step (2) is problematic; conversely, if step (3) produces a semantic diff, at least one of steps (1) and (2) has produced a mistranslation.
In that case, if step (5) shows no semantic diff, then by the assumption neither the translation of step (2) nor that of step (4) is problematic, so the problem must arise in step (1); that is, step (1) produced the mistranslation. Conversely, if step (5) also produces a semantic diff, it is difficult to determine where the problem lies.
Therefore, this embodiment selects the corresponding sample (x, y) as a mistranslation sample of the forward machine translation model only in the case where step (3) shows a semantic diff and step (5) does not. This criterion is sketched in code below.
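Putting steps (1)-(5) together, the selection criterion can be transcribed directly into code. The sketch reuses the hypothetical x2y, y2x and semantic_diff helpers introduced above, and assumes parallel_corpus is an iterable of (x, y) sentence pairs:

```python
def is_forward_mistranslation_sample(x: str, y: str) -> bool:
    """Apply steps (1)-(5) to decide whether sample (x, y) is a
    mistranslation sample of the forward model x2y."""
    y_prime = x2y.translate(x)            # (1) x -> y'
    x_prime = y2x.translate(y_prime)      # (2) y' -> x'
    if not semantic_diff(x, x_prime):     # (3) no diff: (1) and (2) look correct
        return False
    y_double = x2y.translate(x_prime)     # (4) x' -> y''
    # (5) Only when (3) showed a diff and (5) shows none can the error
    # be attributed to step (1), i.e., to the forward model.
    return not semantic_diff(y_prime, y_double)

forward_mistranslations = [
    (x, y) for (x, y) in parallel_corpus
    if is_forward_mistranslation_sample(x, y)
]
```

Note that, per the steps above, only the source sentence x enters the check; the target sentence y is carried along so that the full sample (x, y) can be used for retraining.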
S205, training the forward machine translation model with the mined mistranslation samples of the forward machine translation model.
In the manner of this embodiment, a group of mistranslation samples of the forward machine translation model can be mined each time, and the forward machine translation model is then trained on that group, so that it learns the mistranslation samples with emphasis and the forward machine translation model is optimized. An example of such a retraining pass is sketched below.
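The retraining step itself can be an ordinary fine-tuning pass over the mined group of samples. The following is a minimal sketch assuming a PyTorch model with a Hugging Face-style interface that returns a loss when labels are supplied; the batch format, learning rate, and epoch count are illustrative assumptions.

```python
import torch

def finetune_on_mistranslations(model, batches, epochs: int = 3, lr: float = 1e-5):
    """Retrain a translation model on its mined mistranslation samples.
    `batches` yields (input_ids, labels) tensor pairs built from the samples."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, labels in batches:
            loss = model(input_ids=input_ids, labels=labels).loss
            optimizer.zero_grad()
            loss.backward()  # backpropagation of the translation loss
            optimizer.step()
```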
According to the training method of a machine translation model of this embodiment, the above scheme mines the mistranslation samples of the forward machine translation model and then uses them for focused learning, optimizing the forward machine translation model so that it no longer mistranslates when it again encounters sparse words of the kind found in those samples, which effectively improves its translation accuracy. The technical scheme of this embodiment is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 3 is a schematic diagram of a third embodiment of the present application. As shown in fig. 3, the training method of a machine translation model of this embodiment further details the technical solution of the present application on the basis of the embodiment shown in fig. 1, and may specifically include the following steps:
S301, for each sample in the parallel corpus, reverse-translating the target sentence in the sample with a reverse machine translation model to obtain a fourth translation sentence;
S302, forward-translating the fourth translation sentence with a forward machine translation model to obtain a fifth translation sentence;
S303, reverse-translating the fifth translation sentence with the reverse machine translation model to obtain a sixth translation sentence;
S304, mining whether the sample belongs to the mistranslation samples of the reverse machine translation model by analyzing the semantics of the target sentence and the fifth translation sentence, and of the fourth translation sentence and the sixth translation sentence;
For example, the implementation of step S304 may include the following steps:
(A) Detecting whether a semantic difference exists between the target sentence and the fifth translation sentence, and whether a semantic difference exists between the fourth translation sentence and the sixth translation sentence; if the target sentence and the fifth translation sentence have a semantic difference and the fourth translation sentence and the sixth translation sentence do not, executing step (B); otherwise, in all other cases, the sample is not considered a mistranslation sample of the reverse machine translation model.
(B) Determining that the sample is a mistranslation sample of the reverse machine translation model.
This embodiment differs from the embodiment shown in fig. 2 in that the embodiment of fig. 2 mines the mistranslation samples of the forward machine translation model, whereas this embodiment mines the mistranslation samples of the reverse machine translation model; the implementation principle is similar.
Similarly, the following description still takes the two dual machine translation models to be a forward machine translation model x2y and a reverse machine translation model y2x, and takes as an example the question of whether a sample (x, y) is a mistranslation sample of the reverse machine translation model.
The following operations are performed with reference to the principle described in the embodiment shown in fig. 2 above:
(1') Input the target sentence y of the sample into the reverse machine translation model y2x, and translate it to obtain x';
(2') Input x' into the forward machine translation model x2y, and translate it to obtain y';
(3') Detect whether a semantic diff exists between y and y';
(4') Input y' into the reverse machine translation model y2x, and translate it to obtain x'';
(5') Detect whether a semantic diff exists between x' and x'';
Similarly, of the above 5 steps, only steps (3') and (5') detect the semantic difference between two sentences. If step (3') produces no semantic diff, then by the assumption neither translation step (1') nor step (2') is problematic; conversely, if step (3') produces a semantic diff, at least one of steps (1') and (2') has produced a mistranslation.
In that case, if step (5') shows no semantic diff, then by the assumption neither the translation of step (2') nor that of step (4') is problematic, so the problem must arise in step (1'); that is, step (1') produced the mistranslation. Conversely, if step (5') also produces a semantic diff, it is difficult to determine where the problem lies.
Therefore, this embodiment selects the corresponding sample (x, y) as a mistranslation sample of the reverse machine translation model only in the case where step (3') shows a semantic diff and step (5') does not. A symmetric sketch of this criterion follows.
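By symmetry with the forward case, the reverse-model criterion swaps the roles of the two models and of the source and target sentences. A sketch reusing the hypothetical helpers introduced earlier:

```python
def is_reverse_mistranslation_sample(x: str, y: str) -> bool:
    """Apply steps (1')-(5') to decide whether sample (x, y) is a
    mistranslation sample of the reverse model y2x."""
    x_prime = y2x.translate(y)            # (1') y -> x'
    y_prime = x2y.translate(x_prime)      # (2') x' -> y'
    if not semantic_diff(y, y_prime):     # (3') no diff: (1') and (2') look correct
        return False
    x_double = y2x.translate(y_prime)     # (4') y' -> x''
    # (5') No diff here attributes the error to step (1'), i.e., to y2x.
    return not semantic_diff(x_prime, x_double)
```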
S305, training the reverse machine translation model with the mined mistranslation samples of the reverse machine translation model.
Similarly, in the manner of this embodiment, a group of mistranslation samples of the reverse machine translation model can be mined each time, and the reverse machine translation model is then trained on that group, so that it learns the mistranslation samples with emphasis and the reverse machine translation model is optimized.
According to the training method of a machine translation model of this embodiment, the above scheme mines the mistranslation samples of the reverse machine translation model and then uses them for focused learning, optimizing the reverse machine translation model so that it no longer mistranslates when it again encounters sparse words of the kind found in those samples, which effectively improves its translation accuracy. The technical scheme of this embodiment is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 4 is a schematic diagram of a fourth embodiment of the present application. As shown in fig. 4, this embodiment provides a training apparatus 400 for a machine translation model, including:
a mining module 401 configured to mine mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
a training module 402 configured to train the corresponding machine translation model using the mistranslation samples.
The training apparatus 400 for a machine translation model of this embodiment uses the above modules to implement the training of the machine translation model; the implementation principle and technical effects are the same as those of the related method embodiments above, to whose detailed description reference may be made.
Fig. 5 is a schematic diagram of a fifth embodiment of the present application. As shown in fig. 5, the training apparatus for a machine translation model of this embodiment further details the technical solution of the present application on the basis of the embodiment shown in fig. 4.
As shown in fig. 5, in the training apparatus 400 of the machine translation model of the present embodiment, the mining module 401 includes:
a forward translation unit 4011 configured to, for each sample in the parallel corpus, forward-translate the source sentence in the sample with a forward machine translation model to obtain a first translation sentence;
a reverse translation unit 4012 configured to reverse-translate the first translation sentence with a reverse machine translation model to obtain a second translation sentence;
the forward translation unit 4011 being further configured to forward-translate the second translation sentence with the forward machine translation model to obtain a third translation sentence;
a mining unit 4013 configured to mine whether the sample belongs to the mistranslation samples of the forward machine translation model by analyzing the semantics of the source sentence and the second translation sentence, and of the first translation sentence and the third translation sentence.
Further, optionally, the mining unit 4013 is configured to:
detect whether a semantic difference exists between the source sentence and the second translation sentence, and whether a semantic difference exists between the first translation sentence and the third translation sentence;
if the source sentence and the second translation sentence have a semantic difference and the first translation sentence and the third translation sentence do not, determine that the sample is a mistranslation sample of the forward machine translation model;
correspondingly, the training module 402 is configured to train the forward machine translation model using the mistranslation samples of the forward machine translation model.
Further, optionally, in the mining module 401 of the present embodiment:
the reverse translation unit 4012 is further configured to, for each sample in the parallel corpus, reverse-translate the target sentence in the sample with a reverse machine translation model to obtain a fourth translation sentence;
the forward translation unit 4011 is further configured to forward-translate the fourth translation sentence with a forward machine translation model to obtain a fifth translation sentence;
the reverse translation unit 4012 is further configured to reverse-translate the fifth translation sentence with the reverse machine translation model to obtain a sixth translation sentence;
the mining unit 4013 is further configured to mine whether the sample belongs to the mistranslation samples of the reverse machine translation model by analyzing the semantics of the target sentence and the fifth translation sentence, and of the fourth translation sentence and the sixth translation sentence.
Further, optionally, the mining unit 4013 is configured to:
detect whether a semantic difference exists between the target sentence and the fifth translation sentence, and whether a semantic difference exists between the fourth translation sentence and the sixth translation sentence;
if the target sentence and the fifth translation sentence have a semantic difference and the fourth translation sentence and the sixth translation sentence do not, determine that the sample is a mistranslation sample of the reverse machine translation model;
correspondingly, the training module 402 is configured to train the reverse machine translation model using the mistranslation samples of the reverse machine translation model.
The training apparatus 400 for a machine translation model of this embodiment uses the above modules to implement the training of the machine translation model; the implementation principle and technical effects are the same as those of the related method embodiments above, to whose detailed description reference may be made.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device implementing the training method of a machine translation model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 601 is taken as an example in fig. 6.
Memory 602 is a non-transitory computer-readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of training the machine translation model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of the machine translation model provided by the present application.
The memory 602 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., related modules shown in fig. 4 and 5) corresponding to a training method of a machine translation model in an embodiment of the present application. The processor 601 executes various functional applications of the server and data processing, i.e., implements the training method of the machine translation model in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created from the use of an electronic device implementing a training method of the machine translation model, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 602 may optionally include memory remotely located with respect to processor 601, which may be connected via a network to an electronic device implementing the training method of the machine translation model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the training method of the machine translation model may further include: an input device 603 and an output device 604. The processor 601, memory 602, input device 603 and output device 604 may be connected by a bus or otherwise, for example in fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic device implementing the training method of the machine translation model, such as a touch screen, keypad, mouse, trackpad, touch pad, pointer stick, one or more mouse buttons, trackball, joystick, etc. input devices. The output means 604 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the present application, mistranslation samples are mined from a parallel corpus using two machine translation models that are dual structures of each other, and the corresponding machine translation model is trained on those samples. The samples that a machine translation model mistranslates can thus be mined and trained on again, so that their knowledge is learned, the mistranslations no longer occur, and the translation accuracy of the machine translation model is effectively improved. Compared with the prior art, the technical scheme of these embodiments is highly flexible: it imposes no part-of-speech or other requirements on the mistranslation samples, is applicable to mining any mistranslation sample (e.g., one containing sparse words) from a parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
According to the technical scheme of the embodiments of the present application, the mistranslation samples of the forward machine translation model can be mined and then used for focused learning to optimize the forward machine translation model, so that it no longer mistranslates when it again encounters sparse words of the kind found in those samples, effectively improving its translation accuracy.
According to the technical scheme of the embodiments of the present application, the mistranslation samples of the reverse machine translation model can likewise be mined and then used for focused learning to optimize the reverse machine translation model, so that it no longer mistranslates when it again encounters sparse words of the kind found in those samples, effectively improving its translation accuracy.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (8)

1. A method of training a machine translation model, wherein the method comprises:
mining mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
training the corresponding machine translation model using the mistranslation samples;
wherein mining the mistranslation samples from the parallel corpus using the two machine translation models that are dual structures of each other comprises:
for each sample in the parallel corpus, forward-translating the source sentence in the sample with a forward machine translation model to obtain a first translation sentence;
reverse-translating the first translation sentence with a reverse machine translation model to obtain a second translation sentence;
forward-translating the second translation sentence with the forward machine translation model to obtain a third translation sentence;
mining whether the sample belongs to the mistranslation samples of the forward machine translation model by analyzing the semantics of the source sentence and the second translation sentence, and of the first translation sentence and the third translation sentence;
wherein mining whether the sample belongs to the mistranslation samples of the forward machine translation model by analyzing the semantics of the source sentence and the second translation sentence, and of the first translation sentence and the third translation sentence, comprises:
detecting whether the source sentence and the second translation sentence have a semantic difference, and whether the first translation sentence and the third translation sentence have a semantic difference;
if the source sentence and the second translation sentence have a semantic difference and the first translation sentence and the third translation sentence do not, determining that the sample is a mistranslation sample of the forward machine translation model;
correspondingly, training the corresponding machine translation model using the mistranslation samples comprises:
training the forward machine translation model using the mistranslation samples of the forward machine translation model.
2. The method of claim 1, wherein mining the mistranslation samples from the parallel corpus using the two machine translation models that are dual structures of each other comprises:
for each sample in the parallel corpus, reverse-translating the target sentence in the sample with a reverse machine translation model to obtain a fourth translation sentence;
forward-translating the fourth translation sentence with a forward machine translation model to obtain a fifth translation sentence;
reverse-translating the fifth translation sentence with the reverse machine translation model to obtain a sixth translation sentence;
mining whether the sample belongs to the mistranslation samples of the reverse machine translation model by analyzing the semantics of the target sentence and the fifth translation sentence, and of the fourth translation sentence and the sixth translation sentence.
3. The method of claim 2, wherein mining whether the sample belongs to the mistranslation samples of the reverse machine translation model by analyzing the semantics of the target sentence and the fifth translation sentence, and of the fourth translation sentence and the sixth translation sentence, comprises:
detecting whether the target sentence and the fifth translation sentence have a semantic difference, and whether the fourth translation sentence and the sixth translation sentence have a semantic difference;
if the target sentence and the fifth translation sentence have a semantic difference and the fourth translation sentence and the sixth translation sentence do not, determining that the sample is a mistranslation sample of the reverse machine translation model;
correspondingly, training the corresponding machine translation model using the mistranslation samples comprises:
training the reverse machine translation model using the mistranslation samples of the reverse machine translation model.
4. A training apparatus for a machine translation model, wherein the apparatus comprises:
a mining module configured to mine mistranslation samples from a parallel corpus using two machine translation models that are dual structures of each other;
a training module configured to train the corresponding machine translation model using the mistranslation samples;
wherein the mining module comprises:
a forward translation unit configured to, for each sample in the parallel corpus, forward-translate the source sentence in the sample with a forward machine translation model to obtain a first translation sentence;
a reverse translation unit configured to reverse-translate the first translation sentence with a reverse machine translation model to obtain a second translation sentence;
the forward translation unit being further configured to forward-translate the second translation sentence with the forward machine translation model to obtain a third translation sentence;
a mining unit configured to mine whether the sample belongs to the mistranslation samples of the forward machine translation model by analyzing the semantics of the source sentence and the second translation sentence, and of the first translation sentence and the third translation sentence;
wherein the mining unit is configured to:
detect whether the source sentence and the second translation sentence have a semantic difference, and whether the first translation sentence and the third translation sentence have a semantic difference;
if the source sentence and the second translation sentence have a semantic difference and the first translation sentence and the third translation sentence do not, determine that the sample is a mistranslation sample of the forward machine translation model;
correspondingly, the training module is configured to train the forward machine translation model using the mistranslation samples of the forward machine translation model.
5. The apparatus of claim 4, wherein:
the reverse translation unit is further configured to, for each sample in the parallel corpus, reverse-translate the target sentence in the sample with a reverse machine translation model to obtain a fourth translation sentence;
the forward translation unit is further configured to forward-translate the fourth translation sentence with a forward machine translation model to obtain a fifth translation sentence;
the reverse translation unit is further configured to reverse-translate the fifth translation sentence with the reverse machine translation model to obtain a sixth translation sentence;
the mining unit is further configured to mine whether the sample belongs to the mistranslation samples of the reverse machine translation model by analyzing the semantics of the target sentence and the fifth translation sentence, and of the fourth translation sentence and the sixth translation sentence.
6. The apparatus of claim 5, wherein the mining unit is configured to:
detect whether the target sentence and the fifth translation sentence have a semantic difference, and whether the fourth translation sentence and the sixth translation sentence have a semantic difference;
if the target sentence and the fifth translation sentence have a semantic difference and the fourth translation sentence and the sixth translation sentence do not, determine that the sample is a mistranslation sample of the reverse machine translation model;
correspondingly, the training module is configured to train the reverse machine translation model using the mistranslation samples of the reverse machine translation model.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3.
CN202010550590.0A 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium Active CN111859996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010550590.0A CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010550590.0A CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111859996A CN111859996A (en) 2020-10-30
CN111859996B (en) 2024-03-26

Family

ID=72987280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010550590.0A Active CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111859996B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015169091A1 (en) * 2014-05-08 2015-11-12 华为技术有限公司 Machine translation method and device thereof
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
WO2019107625A1 (en) * 2017-11-30 2019-06-06 주식회사 시스트란인터내셔널 Machine translation method and apparatus therefor
CN110991196A (en) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
CN111144140A (en) * 2019-12-23 2020-05-12 语联网(武汉)信息技术有限公司 Zero-learning-based Chinese and Tai bilingual corpus generation method and device
CN111259676A (en) * 2020-01-10 2020-06-09 苏州交驰人工智能研究院有限公司 Translation model training method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7848915B2 (en) * 2006-08-09 2010-12-07 International Business Machines Corporation Apparatus for providing feedback of translation quality using concept-based back translation
KR100961717B1 (en) * 2008-09-16 2010-06-10 한국전자통신연구원 Method and apparatus for detecting errors of machine translation using parallel corpus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015169091A1 (en) * 2014-05-08 2015-11-12 华为技术有限公司 Machine translation method and device thereof
WO2019107625A1 (en) * 2017-11-30 2019-06-06 주식회사 시스트란인터내셔널 Machine translation method and apparatus therefor
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
CN110991196A (en) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
CN111144140A (en) * 2019-12-23 2020-05-12 语联网(武汉)信息技术有限公司 Zero-learning-based Chinese and Tai bilingual corpus generation method and device
CN111259676A (en) * 2020-01-10 2020-06-09 苏州交驰人工智能研究院有限公司 Translation model training method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Cyrillic Mongolian-Chinese machine translation based on dual learning; 苏依拉, 孙晓骞, 巴图其其格, 仁庆道尔吉; Computer Applications and Software; 2020-01-12 (01); full text *
Influence of different ways of using training corpora on neural machine translation models; 邝少辉, 熊德意; Journal of Chinese Information Processing; 2018-08-15 (08); full text *

Also Published As

Publication number Publication date
CN111859996A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
US11423222B2 (en) Method and apparatus for text error correction, electronic device and storage medium
CN111144115B (en) Pre-training language model acquisition method, device, electronic equipment and storage medium
US20210201198A1 (en) Method, electronic device, and storage medium for generating node representations in heterogeneous graph
CN111859997B (en) Model training method and device in machine translation, electronic equipment and storage medium
US20220019736A1 (en) Method and apparatus for training natural language processing model, device and storage medium
EP3851977A1 (en) Method, apparatus, electronic device, and storage medium for extracting spo triples
US20210365767A1 (en) Method and device for operator registration processing based on deep learning and electronic device
US20210200963A1 (en) Machine translation model training method, apparatus, electronic device and storage medium
CN111625224A (en) Code generation method, device, equipment and storage medium
EP3846069A1 (en) Pre-training method for sentiment analysis model, and electronic device
US11182648B2 (en) End-to-end model training method and apparatus, and non-transitory computer-readable medium
US20220019743A1 (en) Method for training multilingual semantic representation model, device and storage medium
CN111241810B (en) Punctuation prediction method and punctuation prediction device
EP3822815A1 (en) Method and apparatus for mining entity relationship, electronic device, storage medium, and computer program product
US20210319185A1 (en) Method for generating conversation, electronic device and storage medium
CN111783998B (en) Training method and device for illegal account identification model and electronic equipment
KR20210148813A (en) Medical fact verification method and apparatus, electronic device, and storage medium and program
CN111709252A (en) Model improvement method and device based on pre-trained semantic model
CN111126063B (en) Text quality assessment method and device
CN111310481B (en) Speech translation method, device, computer equipment and storage medium
CN111859996B (en) Training method and device of machine translation model, electronic equipment and storage medium
CN112328749A (en) Knowledge element extraction method, knowledge element extraction device, electronic apparatus, knowledge element extraction medium, and program product
CN115688802B (en) Text risk detection method and device
CN110990569A (en) Text clustering method and device and related equipment
JP7286737B2 (en) Text error correction method, device, electronic device, storage medium and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant