CN111859996A - Training method and device of machine translation model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111859996A
CN111859996A (application CN202010550590.0A)
Authority
CN
China
Prior art keywords
translation
sample
sentence
machine translation
translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010550590.0A
Other languages
Chinese (zh)
Other versions
CN111859996B (en)
Inventor
张睿卿
何中军
吴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010550590.0A priority Critical patent/CN111859996B/en
Publication of CN111859996A publication Critical patent/CN111859996A/en
Application granted granted Critical
Publication of CN111859996B publication Critical patent/CN111859996B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The application discloses a method and apparatus for training a machine translation model, an electronic device, and a storage medium, and relates to the technical field of natural language processing. The specific implementation scheme is as follows: mistranslated samples are mined from a parallel corpus using two machine translation models that are duals of each other, and the corresponding machine translation model is trained with the mistranslated samples. In this way, samples that a machine translation model mistranslates can be mined and trained on, so that the model learns the knowledge in those samples, no longer mistranslates them, and its translation accuracy is effectively improved. Compared with the prior art, the method is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.

Description

Training method and device of machine translation model, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a machine translation model, an electronic device, and a storage medium.
Background
In Natural Language Processing (NLP), a machine translation model trained with deep learning techniques often translates sparse words or unusual expressions incorrectly, mainly because the training data contains too few samples of those expression patterns for the model to learn them sufficiently.
In the prior art, to improve a machine translation model's accuracy on sparse words, a human can intervene in the translation of those words by manually fixing their translation content, thereby improving translation accuracy.
However, not all sparse words can be handled by such intervention; for example, the translation of a verb generally depends on context and cannot be fixed by intervention in advance. The existing approach of improving a machine translation model's accuracy on sparse words through manual intervention is therefore severely limited and very inflexible.
Disclosure of Invention
To solve this technical problem, the application provides a method and apparatus for training a machine translation model, an electronic device, and a storage medium.
According to an aspect of the present application, a method for training a machine translation model is provided, wherein the method includes:
mining mistranslated samples from a parallel corpus using two machine translation models that are duals of each other;
and training the corresponding machine translation model with the mistranslated samples.
According to another aspect of the present application, there is provided a training apparatus for a machine translation model, wherein the apparatus includes:
the mining module is used for mining mistranslated samples from the parallel corpus using two machine translation models that are duals of each other;
and the training module is used for training the corresponding machine translation model with the mistranslated samples.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to the technology of the application, two machine translation models that are duals of each other are used to mine mistranslated samples from a parallel corpus, and the corresponding machine translation model is trained with those samples. Samples that a machine translation model mistranslates can thereby be mined and trained on again, so that the model learns the knowledge in them, no longer mistranslates them, and its translation accuracy is effectively improved. Compared with the prior art, the technical scheme is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for training a machine translation model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a first embodiment of the present application. As shown in fig. 1, this embodiment provides a method for training a machine translation model, which specifically includes the following steps:
S101, mining mistranslated samples from a parallel corpus using two machine translation models that are duals of each other;
S102, training the corresponding machine translation model with the mistranslated samples.
The training method of the machine translation model in this embodiment is executed by a training apparatus for the machine translation model. The training apparatus may be an electronic entity such as a computer, or a software-integrated application that, when used, runs on a computer device to train the machine translation model.
In this embodiment, the two machine translation models that are duals of each other, namely a forward machine translation model and a reverse machine translation model, translate between the same two languages in opposite directions. The parallel corpus of this embodiment may include a plurality of samples, each comprising a source sentence and a target sentence in different languages. For example, if the forward machine translation model translates a source sentence into a target sentence, the reverse machine translation model translates the target sentence into the source sentence, and vice versa.
In the prior art, samples containing sparse words and the like are few relative to all training data, so insufficient learning is likely to occur during training and the sparse words end up mistranslated; such samples can be called mistranslated samples of the machine translation model. The mistranslated samples of this embodiment are therefore not themselves wrong; rather, they are samples that the machine translation model easily translates incorrectly. Moreover, the mistranslated samples of this embodiment may be samples containing sparse words and the like, which are mistranslated because their sparsity leaves the model insufficiently trained on them.
In this embodiment, samples that the machine translation model translates incorrectly can be mined from the parallel corpus, and the model is then trained on them in a targeted, focused manner, so that it learns the knowledge in samples it previously mistranslated, no longer mistranslates them, and its accuracy on sparse words improves.
If only one machine translation model is used to translate a source sentence into a target sentence, it is difficult to detect a mistranslation without human inspection. This embodiment therefore uses a forward machine translation model and a reverse machine translation model that are duals of each other, so that either one can verify whether the other has mistranslated, allowing mistranslated samples to be mined from the parallel corpus. Finally, the corresponding machine translation model is trained with its mistranslated samples, so that it learns the knowledge in them and no longer mistranslates them. The machine translation model trained in this embodiment can be applied in the NLP field to translate any sample, and the training effectively reduces its translation error rate. Moreover, the technical scheme of this embodiment can be applied periodically, training the machine translation model on freshly mined mistranslated samples so that it keeps learning new knowledge, mastering new skills, and improving its translation accuracy.
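To make the flow concrete, the following is a minimal sketch of this mining-and-retraining loop. It assumes hypothetical interfaces that the patent does not prescribe: a mine_mistranslated(corpus, model_under_test, verifier) helper that returns the samples the first model mistranslates (one realization is sketched in the second embodiment below), and a fine_tune(samples) method on each model.

```python
# Minimal sketch only: `mine_mistranslated` and `fine_tune` are assumed
# interfaces, not APIs defined by the patent.

def train_with_dual_mining(parallel_corpus, forward_model, reverse_model, rounds=3):
    """Periodically mine mistranslated samples and retrain each model on them."""
    for _ in range(rounds):
        # Each model's dual acts as the verifier for the other.
        fwd_errors = mine_mistranslated(parallel_corpus, forward_model, reverse_model)
        rev_errors = mine_mistranslated(parallel_corpus, reverse_model, forward_model)
        if fwd_errors:
            forward_model.fine_tune(fwd_errors)  # focused learning of the missed knowledge
        if rev_errors:
            reverse_model.fine_tune(rev_errors)
```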
In the training method of the machine translation model of this embodiment, two machine translation models that are duals of each other are used to mine mistranslated samples from a parallel corpus, and the corresponding machine translation model is trained with those samples. Samples that a machine translation model mistranslates can thereby be mined and trained on again, so that the model learns the knowledge in them, no longer mistranslates them, and its translation accuracy is effectively improved. Compared with the prior art, the technical scheme of this embodiment is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 2 is a schematic diagram of a second embodiment of the present application. As shown in fig. 2, the training method of this embodiment further refines the technical solution of the embodiment shown in fig. 1 and may specifically include the following steps:
S201, for each sample in the parallel corpus, forward-translating the source sentence in the sample with the forward machine translation model to obtain a first translated sentence;
S202, reverse-translating the first translated sentence with the reverse machine translation model to obtain a second translated sentence;
S203, forward-translating the second translated sentence with the forward machine translation model to obtain a third translated sentence;
S204, mining whether the sample is a mistranslated sample of the forward machine translation model by analyzing the semantics of the source sentence versus the second translated sentence and of the first translated sentence versus the third translated sentence.
For example, step S204 may include the following steps:
(a) detecting whether a semantic difference exists between the source sentence and the second translated sentence, and between the first translated sentence and the third translated sentence; if the source sentence and the second translated sentence differ semantically while the first and third translated sentences do not, executing step (b); otherwise, in all other cases, deeming the sample not to be a mistranslated sample of the forward machine translation model;
(b) determining the sample to be a mistranslated sample of the forward machine translation model.
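The embodiment leaves open how the semantic-difference detection itself is performed. As one illustrative realization (an assumption for illustration, not part of the disclosure), the two sentences can be embedded with a multilingual sentence encoder and flagged as differing when their cosine similarity falls below a threshold; the model name and the threshold value below are arbitrary choices.

```python
# Illustrative semantic-difference check (not prescribed by the patent):
# flag a difference when sentence-embedding cosine similarity is low.
from sentence_transformers import SentenceTransformer, util

_encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantically_differ(sent_a: str, sent_b: str, threshold: float = 0.8) -> bool:
    """Return True if the two sentences appear to differ in meaning."""
    emb = _encoder.encode([sent_a, sent_b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() < threshold  # 0.8 is illustrative
```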
For example, the technical solution of this embodiment is described in detail below by taking a forward machine translation model x2y and a reverse machine translation model y2x as the two machine translation models with a dual structure, and determining whether a sample (x, y) is a mistranslated sample of the forward machine translation model.
First, assume that if the source sentence x and the target sentence y in a sample are translated by the two dual machine translation models x2y and y2x respectively, and neither translation result differs semantically from its reference (y′ from y, and x′ from x), then the translations of both x2y and y2x are considered correct.
Based on the above principle, the following operations are performed:
(1) input the source sentence x in the sample into the forward machine translation model x2y, translating x into y′;
(2) input y′ into the reverse machine translation model y2x, translating y′ into x′;
(3) detect whether a semantic difference (diff) exists between x and x′;
(4) input x′ into the forward machine translation model x2y, translating x′ into y″;
(5) detect whether a semantic diff exists between y′ and y″;
Of the above 5 steps, only two detect semantic differences between sentences, namely steps (3) and (5). If step (3) produces no semantic diff, then by the assumption the translations in steps (1) and (2) are both fine; conversely, if step (3) produces a semantic diff, at least one of steps (1) and (2) produced a mistranslation.
In that case, if step (5) shows no semantic diff, then by the assumption the translations in steps (2) and (4) are fine, so the mistranslation must have been produced in step (1), i.e. by the forward machine translation model; conversely, if step (5) also produces a semantic diff, it is difficult to determine at which step the problem occurred.
Therefore, in this embodiment, only when a semantic diff occurs in step (3) and no semantic diff occurs in step (5) is the corresponding sample (x, y) selected as a mistranslated sample of the forward machine translation model.
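The five steps can be summarized in code as follows. The sketch assumes each model exposes a hypothetical translate(sentence) method and reuses the semantically_differ check sketched above; neither interface is fixed by the patent.

```python
# Steps (1)-(5) for one sample; returns True only when the mistranslation
# can be attributed to the forward model x2y.

def is_forward_mistranslation(x: str, x2y, y2x) -> bool:
    y_prime = x2y.translate(x)                 # (1) x  -> y'
    x_prime = y2x.translate(y_prime)           # (2) y' -> x'
    diff_3 = semantically_differ(x, x_prime)   # (3) x  vs. x'
    y_dprime = x2y.translate(x_prime)          # (4) x' -> y''
    diff_5 = semantically_differ(y_prime, y_dprime)  # (5) y' vs. y''
    # Only "diff at (3) and no diff at (5)" pins the error on step (1).
    return diff_3 and not diff_5
```

Note that the reference target sentence y is not consulted by the check itself; it is retained only so that the mined pair (x, y) can be used for retraining.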
S205, training the forward machine translation model with the mined mistranslated samples of the forward machine translation model.
With the method of this embodiment, a group (batch) of mistranslated samples of the forward machine translation model can be mined each time, and the forward machine translation model is then trained with that batch, so that it learns the mistranslated samples in a focused way and is thereby optimized.
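As a sketch of this batch-wise procedure, mined samples can be accumulated and the forward model retrained whenever a batch fills; is_forward_mistranslation is the per-sample check sketched above, and fine_tune is again an assumed retraining entry point rather than a patent-defined API.

```python
def mine_batch_and_train(parallel_corpus, x2y, y2x, batch_size=64):
    """Mine a batch of forward-model mistranslations, then train on it."""
    batch = []
    for x, y in parallel_corpus:                  # each sample: (source, target)
        if is_forward_mistranslation(x, x2y, y2x):
            batch.append((x, y))                  # keep the original sample pair
        if len(batch) == batch_size:
            x2y.fine_tune(batch)                  # focused training on the mined batch
            batch = []
    if batch:
        x2y.fine_tune(batch)                      # train on any remainder
```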
With the above scheme, the training method of the machine translation model of this embodiment can mine the mistranslated samples of the forward machine translation model and then use them for focused learning to optimize the forward machine translation model, so that when the model again translates sparse words like those in the mistranslated samples, it no longer mistranslates them, and its translation accuracy is effectively improved. Moreover, the technical scheme of this embodiment is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 3 is a schematic diagram of a third embodiment of the present application. As shown in fig. 3, the training method of this embodiment further refines the technical solution of the embodiment shown in fig. 1 and may specifically include the following steps:
S301, for each sample in the parallel corpus, reverse-translating the target sentence in the sample with the reverse machine translation model to obtain a fourth translated sentence;
S302, forward-translating the fourth translated sentence with the forward machine translation model to obtain a fifth translated sentence;
S303, reverse-translating the fifth translated sentence with the reverse machine translation model to obtain a sixth translated sentence;
S304, mining whether the sample is a mistranslated sample of the reverse machine translation model by analyzing the semantics of the target sentence versus the fifth translated sentence and of the fourth translated sentence versus the sixth translated sentence.
For example, step S304 may include the following steps:
(A) detecting whether a semantic difference exists between the target sentence and the fifth translated sentence, and between the fourth translated sentence and the sixth translated sentence; if the target sentence and the fifth translated sentence differ semantically while the fourth and sixth translated sentences do not, executing step (B); otherwise, in all other cases, deeming the sample not to be a mistranslated sample of the reverse machine translation model;
(B) determining the sample to be a mistranslated sample of the reverse machine translation model.
This embodiment differs from the embodiment shown in fig. 2 in that the embodiment of fig. 2 mines mistranslated samples of the forward machine translation model, whereas this embodiment mines mistranslated samples of the reverse machine translation model; the implementation principles are similar.
Similarly, the technical solution of this embodiment is described in detail below, again taking the forward machine translation model x2y and the reverse machine translation model y2x as the two machine translation models with a dual structure, and determining whether the sample (x, y) is a mistranslated sample of the reverse machine translation model.
With reference to the principles described above in the embodiment shown in fig. 2, the following operations are performed:
(1′) input the target sentence y in the sample into the reverse machine translation model y2x, translating y into x′;
(2′) input x′ into the forward machine translation model x2y, translating x′ into y′;
(3′) detect whether a semantic diff exists between y and y′;
(4′) input y′ into the reverse machine translation model y2x, translating y′ into x″;
(5′) detect whether a semantic diff exists between x′ and x″;
Similarly, of the above 5 steps, only two detect semantic differences between sentences, namely steps (3′) and (5′). If step (3′) produces no semantic diff, then by the assumption the translations in steps (1′) and (2′) are both fine; conversely, if step (3′) produces a semantic diff, at least one of steps (1′) and (2′) produced a mistranslation.
In that case, if step (5′) shows no semantic diff, then by the assumption the translations in steps (2′) and (4′) are fine, so the mistranslation must have been produced in step (1′), i.e. by the reverse machine translation model; conversely, if step (5′) also produces a semantic diff, it is difficult to determine at which step the problem occurred.
Therefore, in this embodiment, only when a semantic diff occurs in step (3′) and no semantic diff occurs in step (5′) is the corresponding sample (x, y) selected as a mistranslated sample of the reverse machine translation model.
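Because this procedure is exactly the forward-direction procedure with the roles of the two models and of the source and target sentences swapped, a sketch can simply reuse the helper from the second embodiment; as before, the interfaces are assumptions rather than patent-defined APIs.

```python
def is_reverse_mistranslation(y: str, x2y, y2x) -> bool:
    """Steps (1')-(5'): the forward-direction check with roles swapped,
    i.e. y as the input sentence, y2x as the model under test, and
    x2y as the verifier."""
    return is_forward_mistranslation(y, y2x, x2y)
```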
S305, training the reverse machine translation model with the mined mistranslated samples of the reverse machine translation model.
Similarly, with the method of this embodiment, a group (batch) of mistranslated samples of the reverse machine translation model can be mined each time, and the reverse machine translation model is then trained with that batch, so that it learns the mistranslated samples in a focused way and is thereby optimized.
With the above scheme, the training method of the machine translation model of this embodiment can mine the mistranslated samples of the reverse machine translation model and then use them for focused learning to optimize the reverse machine translation model, so that when the model again translates sparse words like those in the mistranslated samples, it no longer mistranslates them, and its translation accuracy is effectively improved. Moreover, the technical scheme of this embodiment is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
Fig. 4 is a schematic diagram of a fourth embodiment of the present application. As shown in fig. 4, this embodiment provides a training apparatus 400 for a machine translation model, including:
the mining module 401, configured to mine mistranslated samples from the parallel corpus using two machine translation models that are duals of each other;
the training module 402, configured to train the corresponding machine translation model with the mistranslated samples.
The training apparatus 400 for a machine translation model of this embodiment uses the above modules to implement training of the machine translation model; its implementation principle and technical effect are the same as those of the related method embodiments described above, whose details may be found there and are not repeated here.
Fig. 5 is a schematic diagram of a fifth embodiment of the present application. As shown in fig. 5, the training apparatus of the machine translation model of this embodiment further refines the technical solution of the application on the basis of the embodiment shown in fig. 4.
As shown in fig. 5, in the training apparatus 400 of the machine translation model according to the present embodiment, the mining module 401 includes:
the forward translation unit 4011, configured to forward-translate, with the forward machine translation model, the source sentence in each sample of the parallel corpus to obtain a first translated sentence;
the reverse translation unit 4012, configured to reverse-translate the first translated sentence with the reverse machine translation model to obtain a second translated sentence;
the forward translation unit 4011 is further configured to forward-translate the second translated sentence with the forward machine translation model to obtain a third translated sentence;
and the mining unit 4013, configured to mine whether the sample is a mistranslated sample of the forward machine translation model by analyzing the semantics of the source sentence versus the second translated sentence and of the first translated sentence versus the third translated sentence.
Further optionally, the mining unit 4013 is configured to:
detect whether a semantic difference exists between the source sentence and the second translated sentence, and between the first translated sentence and the third translated sentence;
and, if the source sentence and the second translated sentence differ semantically while the first and third translated sentences do not, determine the sample to be a mistranslated sample of the forward machine translation model.
Correspondingly, the training module 402 is configured to train the forward machine translation model with the mistranslated samples of the forward machine translation model.
In addition, optionally, in the mining module 401 of this embodiment:
the reverse translation unit 4012 is further configured to reverse-translate, with the reverse machine translation model, the target sentence in each sample of the parallel corpus to obtain a fourth translated sentence;
the forward translation unit 4011 is further configured to forward-translate the fourth translated sentence with the forward machine translation model to obtain a fifth translated sentence;
the reverse translation unit 4012 is further configured to reverse-translate the fifth translated sentence with the reverse machine translation model to obtain a sixth translated sentence;
the mining unit 4013 is further configured to mine whether the sample is a mistranslated sample of the reverse machine translation model by analyzing the semantics of the target sentence versus the fifth translated sentence and of the fourth translated sentence versus the sixth translated sentence.
Further optionally, the mining unit 4013 is configured to:
detect whether a semantic difference exists between the target sentence and the fifth translated sentence, and between the fourth translated sentence and the sixth translated sentence;
and, if the target sentence and the fifth translated sentence differ semantically while the fourth and sixth translated sentences do not, determine the sample to be a mistranslated sample of the reverse machine translation model.
Correspondingly, the training module 402 is configured to train the reverse machine translation model with the mistranslated samples of the reverse machine translation model.
The training apparatus 400 for a machine translation model of this embodiment uses the above modules to implement training of the machine translation model; its implementation principle and technical effect are the same as those of the related method embodiments described above, whose details may be found there and are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device implementing a training method of a machine translation model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of training a machine translation model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of training a machine translation model provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., related modules shown in fig. 4 and 5) corresponding to the training method of the machine translation model in the embodiments of the present application. The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, namely, implements the training method of the machine translation model in the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of an electronic device that implements a training method of a machine translation model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to an electronic device implementing the training method of the machine translation model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the training method of the machine translation model may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic apparatus implementing the training method of the machine translation model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the application, two machine translation models that are duals of each other are used to mine mistranslated samples from a parallel corpus, and the corresponding machine translation model is trained with those samples. Samples that a machine translation model mistranslates can thereby be mined and trained on again, so that the model learns the knowledge in them, no longer mistranslates them, and its translation accuracy is effectively improved. Compared with the prior art, the technical scheme is highly flexible: it places no part-of-speech or other requirements on the mistranslated samples, can mine any mistranslated sample, including samples containing sparse words, from the parallel corpus, and retrains the machine translation model on the mined samples to further improve its translation accuracy.
According to the technical scheme of the embodiments of the application, the mistranslated samples of the forward machine translation model can be mined and then used for focused learning to optimize the forward machine translation model, so that when the model again translates sparse words like those in the mistranslated samples, it no longer mistranslates them, and its translation accuracy is effectively improved.
According to the technical scheme of the embodiments of the application, the mistranslated samples of the reverse machine translation model can likewise be mined and then used for focused learning to optimize the reverse machine translation model, so that when the model again translates sparse words like those in the mistranslated samples, it no longer mistranslates them, and its translation accuracy is effectively improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of training a machine translation model, wherein the method comprises:
mining mistranslated samples from a parallel corpus using two machine translation models that are duals of each other;
and training the corresponding machine translation model with the mistranslated samples.
2. The method of claim 1, wherein mining mistranslated samples from the parallel corpus using two machine translation models that are duals of each other comprises:
for each sample in the parallel corpus, forward-translating a source sentence in the sample with a forward machine translation model to obtain a first translated sentence;
reverse-translating the first translated sentence with a reverse machine translation model to obtain a second translated sentence;
forward-translating the second translated sentence with the forward machine translation model to obtain a third translated sentence;
and mining whether the sample is a mistranslated sample of the forward machine translation model by analyzing the semantics of the source sentence versus the second translated sentence and of the first translated sentence versus the third translated sentence.
3. The method of claim 2, wherein mining whether the sample is a mistranslated sample of the forward machine translation model by analyzing the semantics of the source sentence versus the second translated sentence and of the first translated sentence versus the third translated sentence comprises:
detecting whether a semantic difference exists between the source sentence and the second translated sentence, and between the first translated sentence and the third translated sentence;
and, if the source sentence and the second translated sentence differ semantically while the first and third translated sentences do not, determining the sample to be a mistranslated sample of the forward machine translation model;
correspondingly, training the corresponding machine translation model with the mistranslated samples comprises:
training the forward machine translation model with the mistranslated samples of the forward machine translation model.
4. The method according to any one of claims 1-3, wherein mining mistranslated samples from the parallel corpus using two machine translation models that are duals of each other comprises:
for each sample in the parallel corpus, reverse-translating a target sentence in the sample with a reverse machine translation model to obtain a fourth translated sentence;
forward-translating the fourth translated sentence with a forward machine translation model to obtain a fifth translated sentence;
reverse-translating the fifth translated sentence with the reverse machine translation model to obtain a sixth translated sentence;
and mining whether the sample is a mistranslated sample of the reverse machine translation model by analyzing the semantics of the target sentence versus the fifth translated sentence and of the fourth translated sentence versus the sixth translated sentence.
5. The method of claim 4, wherein mining whether the sample is a mistranslated sample of the reverse machine translation model by analyzing the semantics of the target sentence versus the fifth translated sentence and of the fourth translated sentence versus the sixth translated sentence comprises:
detecting whether a semantic difference exists between the target sentence and the fifth translated sentence, and between the fourth translated sentence and the sixth translated sentence;
and, if the target sentence and the fifth translated sentence differ semantically while the fourth and sixth translated sentences do not, determining the sample to be a mistranslated sample of the reverse machine translation model;
correspondingly, training the corresponding machine translation model with the mistranslated samples comprises:
training the reverse machine translation model with the mistranslated samples of the reverse machine translation model.
6. An apparatus for training a machine translation model, wherein the apparatus comprises:
a mining module, configured to mine mistranslated samples from a parallel corpus using two machine translation models that are duals of each other;
and a training module, configured to train the corresponding machine translation model with the mistranslated samples.
7. The apparatus of claim 6, wherein the mining module comprises:
a forward translation unit, configured to forward-translate, with a forward machine translation model, the source sentence in each sample of the parallel corpus to obtain a first translated sentence;
a reverse translation unit, configured to reverse-translate the first translated sentence with a reverse machine translation model to obtain a second translated sentence;
the forward translation unit being further configured to forward-translate the second translated sentence with the forward machine translation model to obtain a third translated sentence;
and a mining unit, configured to mine whether the sample is a mistranslated sample of the forward machine translation model by analyzing the semantics of the source sentence versus the second translated sentence and of the first translated sentence versus the third translated sentence.
8. The apparatus of claim 7, wherein the mining unit is configured to:
detect whether a semantic difference exists between the source sentence and the second translated sentence, and between the first translated sentence and the third translated sentence;
and, if the source sentence and the second translated sentence differ semantically while the first and third translated sentences do not, determine the sample to be a mistranslated sample of the forward machine translation model;
correspondingly, the training module is configured to train the forward machine translation model with the mistranslated samples of the forward machine translation model.
9. The apparatus of claim 7 or 8, wherein:
the reverse translation unit is further configured to reverse-translate, with the reverse machine translation model, the target sentence in each sample of the parallel corpus to obtain a fourth translated sentence;
the forward translation unit is further configured to forward-translate the fourth translated sentence with the forward machine translation model to obtain a fifth translated sentence;
the reverse translation unit is further configured to reverse-translate the fifth translated sentence with the reverse machine translation model to obtain a sixth translated sentence;
and the mining unit is further configured to mine whether the sample is a mistranslated sample of the reverse machine translation model by analyzing the semantics of the target sentence versus the fifth translated sentence and of the fourth translated sentence versus the sixth translated sentence.
10. The apparatus of claim 9, wherein the mining unit is configured to:
detect whether a semantic difference exists between the target sentence and the fifth translated sentence, and between the fourth translated sentence and the sixth translated sentence;
and, if the target sentence and the fifth translated sentence differ semantically while the fourth and sixth translated sentences do not, determine the sample to be a mistranslated sample of the reverse machine translation model;
correspondingly, the training module is configured to train the reverse machine translation model with the mistranslated samples of the reverse machine translation model.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010550590.0A 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium Active CN111859996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010550590.0A CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010550590.0A CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111859996A true CN111859996A (en) 2020-10-30
CN111859996B CN111859996B (en) 2024-03-26

Family

ID=72987280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010550590.0A Active CN111859996B (en) 2020-06-16 2020-06-16 Training method and device of machine translation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111859996B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100070261A1 (en) * 2008-09-16 2010-03-18 Electronics And Telecommunications Research Institute Method and apparatus for detecting errors in machine translation using parallel corpus
US20100274552A1 (en) * 2006-08-09 2010-10-28 International Business Machines Corporation Apparatus for providing feedback of translation quality using concept-bsed back translation
WO2015169091A1 (en) * 2014-05-08 2015-11-12 华为技术有限公司 Machine translation method and device thereof
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
WO2019107625A1 (en) * 2017-11-30 2019-06-06 주식회사 시스트란인터내셔널 Machine translation method and apparatus therefor
CN110991196A (en) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
CN111144140A (en) * 2019-12-23 2020-05-12 语联网(武汉)信息技术有限公司 Zero-learning-based Chinese and Tai bilingual corpus generation method and device
CN111259676A (en) * 2020-01-10 2020-06-09 苏州交驰人工智能研究院有限公司 Translation model training method and device, electronic equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100274552A1 (en) * 2006-08-09 2010-10-28 International Business Machines Corporation Apparatus for providing feedback of translation quality using concept-bsed back translation
US20100070261A1 (en) * 2008-09-16 2010-03-18 Electronics And Telecommunications Research Institute Method and apparatus for detecting errors in machine translation using parallel corpus
WO2015169091A1 (en) * 2014-05-08 2015-11-12 华为技术有限公司 Machine translation method and device thereof
WO2019107625A1 (en) * 2017-11-30 2019-06-06 주식회사 시스트란인터내셔널 Machine translation method and apparatus therefor
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
CN110991196A (en) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
CN111144140A (en) * 2019-12-23 2020-05-12 语联网(武汉)信息技术有限公司 Zero-learning-based Chinese and Tai bilingual corpus generation method and device
CN111259676A (en) * 2020-01-10 2020-06-09 苏州交驰人工智能研究院有限公司 Translation model training method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
苏依拉; 孙晓骞; 巴图其其格; 仁庆道尔吉: "Research on Cyrillic Mongolian-Chinese Machine Translation Based on Dual Learning" (基于对偶学习的西里尔蒙古语-汉语机器翻译研究), Computer Applications and Software (计算机应用与软件), no. 01, 12 January 2020 (2020-01-12) *
邝少辉; 熊德意: "The Influence of Different Uses of Training Corpora on Neural Machine Translation Models" (训练语料的不同利用方式对神经机器翻译模型的影响), Journal of Chinese Information Processing (中文信息学报), no. 08, 15 August 2018 (2018-08-15) *

Also Published As

Publication number Publication date
CN111859996B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN111428008B (en) Method, apparatus, device and storage medium for training a model
JP2022003539A (en) Method, apparatus, electronic device and storage medium for correcting text errors
CN111079442B (en) Vectorization representation method and device of document and computer equipment
US20220019736A1 (en) Method and apparatus for training natural language processing model, device and storage medium
JP7235817B2 (en) Machine translation model training method, apparatus and electronic equipment
CN111859997B (en) Model training method and device in machine translation, electronic equipment and storage medium
US20210200963A1 (en) Machine translation model training method, apparatus, electronic device and storage medium
EP3940581A1 (en) Method and apparatus for training multilingual semantic representation model, device and storage medium
US20210365767A1 (en) Method and device for operator registration processing based on deep learning and electronic device
US11216615B2 (en) Method, device and storage medium for predicting punctuation in text
CN111783443A (en) Text disturbance detection method, disturbance reduction method, disturbance processing method and device
US20210319185A1 (en) Method for generating conversation, electronic device and storage medium
CN111079945A (en) End-to-end model training method and device
CN111783998B (en) Training method and device for illegal account identification model and electronic equipment
CN111709252A (en) Model improvement method and device based on pre-trained semantic model
CN111079449B (en) Method and device for acquiring parallel corpus data, electronic equipment and storage medium
CN111126063B (en) Text quality assessment method and device
CN112560499A (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium
CN111738015A (en) Method and device for analyzing emotion polarity of article, electronic equipment and storage medium
CN111310481B (en) Speech translation method, device, computer equipment and storage medium
EP3855339A1 (en) Method and apparatus for generating text based on semantic representation
CN112016524A (en) Model training method, face recognition device, face recognition equipment and medium
CN110990569A (en) Text clustering method and device and related equipment
CN111859996B (en) Training method and device of machine translation model, electronic equipment and storage medium
CN115688802A (en) Text risk detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant