CN111626064A - Training method and device of neural machine translation model and storage medium - Google Patents
Training method and device of neural machine translation model and storage medium
- Publication number
- CN111626064A CN111626064A CN201910142831.5A CN201910142831A CN111626064A CN 111626064 A CN111626064 A CN 111626064A CN 201910142831 A CN201910142831 A CN 201910142831A CN 111626064 A CN111626064 A CN 111626064A
- Authority
- CN
- China
- Prior art keywords
- sentence
- machine translation
- training
- tuple
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013519 translation Methods 0.000 title claims abstract description 142
- 230000001537 neural effect Effects 0.000 title claims abstract description 99
- 238000012549 training Methods 0.000 title claims abstract description 94
- 238000000034 method Methods 0.000 title claims abstract description 69
- 230000008569 process Effects 0.000 claims abstract description 25
- 238000004590 computer program Methods 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000015654 memory Effects 0.000 description 27
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000012545 processing Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000013528 artificial neural network Methods 0.000 description 3
- 238000004891 communication Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a training method and device for a neural machine translation model, and a storage medium. In the training method provided by the embodiments of the invention, the common (high-frequency) N-tuples in the target sentence corpus are trained as inseparable whole words during training of the neural machine translation model, so that the trained model tends to produce translation results containing more of these N-tuples in actual translation, which improves the scoring result of the neural machine translation and thus the machine translation quality.
Description
Technical Field
The invention relates to the technical field of neural machine translation in Natural Language Processing (NLP), and in particular to a training method and device for a neural machine translation model and a storage medium.
Background
Neural Machine Translation (NMT) refers to a machine translation approach that directly uses a neural network to model translation in an end-to-end manner. Unlike methods that use deep learning techniques to improve individual modules of traditional statistical machine translation, neural machine translation completes the translation task in a simple and intuitive way: a neural network called the Encoder first encodes the source-language sentence into a dense vector, and a neural network called the Decoder then decodes the target-language sentence from that vector. Such a neural network model is generally referred to as an "Encoder-Decoder" structure.
The prior art typically employs the Bilingual Evaluation Understudy (BLEU) algorithm to evaluate machine translation quality. The design idea of the BLEU algorithm matches the intuition for judging machine translation quality: the closer the machine translation result is to a professional human translation, the better the translation quality. An N-gram is a statistical language model that represents a sentence as a sequence of N consecutive words and uses the collocation information between adjacent words in context to compute the probability of the sentence, thereby judging whether the sentence is fluent. The BLEU algorithm adopts N-gram matching rules, through which the proportion of N-tuples shared between a predicted translation and a reference translation can be computed, yielding an evaluation index of machine translation quality.
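For illustration only (not part of the patent text), a minimal sketch of the clipped N-gram precision that BLEU-style scoring builds on might look like the following; the example sentences and function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_ngram_precision(candidate, reference, n):
    """Clipped n-gram precision over a single reference, as used inside BLEU."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = max(sum(cand_counts.values()), 1)
    return overlap / total

candidate = "it is said that the model works well".split()
reference = "it is said that the model performs well".split()
print(modified_ngram_precision(candidate, reference, 4))  # 0.6: 3 of 5 candidate 4-grams match
```

A translation that shares more N-tuples with the reference therefore receives a higher N-gram precision, which is the property that the training method described below exploits.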
At present, common NMT models include the sequence-to-sequence (seq2seq) model, the convolutional sequence-to-sequence (convS2S) model and the transformer model, and the prior art improves the neural machine translation model itself to improve machine translation performance. How to further improve the translation performance of existing neural machine translation so that it translates more accurately is a technical problem to be solved in the field.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method, an apparatus and a storage medium for training a neural machine translation model, so as to improve the translation performance of neural machine translation.
In order to solve the above technical problem, a method for training a neural machine translation model provided in an embodiment of the present invention includes:
calculating the occurrence frequency of an N-tuple in a target sentence corpus, wherein the target sentence corpus comprises a plurality of target sentences, and N is greater than or equal to 2;
selecting the high-frequency N-tuple with the occurrence frequency higher than a preset threshold value from the N-tuple, and combining the high-frequency N-tuple existing in the target sentence into an integral word through a preset separator to obtain an updated target sentence corpus;
and training a neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
Preferably, after training the neural machine translation model, the method further comprises:
translating to obtain a predicted sentence of the sentence to be translated by using the trained neural machine translation model;
and after the integral words existing in the prediction sentence are reset into N separated words, outputting the prediction sentence.
Preferably, the step of resetting the whole words present in the predicted sentence into N separate words includes:
and splitting the whole words existing in the predicted sentence according to preset separators in the whole words to obtain N words.
Preferably, the step of training the neural machine translation model by using the source sentence corpus and the updated target sentence corpus includes:
and training a neural machine translation model by using a parallel corpus consisting of a source sentence and a target sentence corresponding to the source sentence, wherein the whole word existing in the target sentence is forbidden to be segmented in the training process.
Preferably, the N-tuple is a 2-tuple, a 3-tuple or a 4-tuple.
Preferably, the neural machine translation model is a sequence-to-sequence seq2seq model, a convolutional sequence-to-sequence convS2S model, or a transformer model.
The embodiment of the invention also provides a training device of the neural machine translation model, which comprises:
a frequency calculation unit, configured to calculate an occurrence frequency of an N-tuple in a target sentence corpus, where the target sentence corpus includes a plurality of target sentences, and N is greater than or equal to 2;
the word combination unit is used for selecting the high-frequency N-tuple with the occurrence frequency higher than a preset threshold value from the N-tuple, and combining the high-frequency N-tuple existing in the target sentence into an integral word through a preset separator to obtain an updated target sentence corpus;
and the model training unit is used for training the neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
Preferably, the training device further comprises:
the translation unit is used for translating to obtain a predicted sentence of the sentence to be translated by utilizing the neural machine translation model obtained by the training of the model training unit; and outputting the predicted sentence after resetting the whole words existing in the predicted sentence into N separated words.
Preferably, in the training device, the translation unit is further configured to split the whole word existing in the predicted sentence according to a predetermined separator in the whole word, so as to obtain N words.
Preferably, in the training apparatus, the model training unit is further configured to train a neural machine translation model using a parallel corpus composed of a source sentence and a target sentence corresponding to the source sentence, where the whole word existing in the target sentence is prohibited from being segmented in a training process.
Preferably, in the training apparatus, the N-tuple is a 2-tuple, a 3-tuple, or a 4-tuple.
Preferably, in the training device, the neural machine translation model is a sequence-to-sequence seq2seq model, a convolution sequence-to-sequence convS2S model, or a transformer model.
The embodiment of the invention also provides a training device of the neural machine translation model, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of training a neural machine translation model as described above.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the training method for a neural machine translation model as described above.
Compared with the prior art, the training method, training device and storage medium of the neural machine translation model provided by the embodiments of the present invention train the common (high-frequency) N-tuples in the target sentence corpus as inseparable whole words during training of the neural machine translation model, so that the trained model can obtain translation results containing more of these N-tuples in actual translation, thereby improving the scoring result of the neural machine translation and improving the machine translation quality.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without inventive labor.
FIG. 1 is a schematic flow chart of a method for training a neural machine translation model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another method for training a neural machine translation model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for training a neural machine translation model according to an embodiment of the present invention;
FIG. 4 is another schematic structural diagram of a training apparatus for neural machine translation model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a training apparatus for a neural machine translation model according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Referring to fig. 1, a flow diagram of a training method of a neural machine translation model according to an embodiment of the present invention is shown, where the training method of the neural machine translation model can improve the translation performance of the trained neural machine translation model. Specifically, the neural machine translation model is a sequence-to-sequence (seq2seq) model, a convolution sequence-to-sequence (convS2S) model or a transformer model, and of course, the embodiment of the present invention may also be applied to other types of neural machine translation models, which is not specifically limited in the present invention. As shown in fig. 1, the training method of the neural machine translation model may include:
Step 101: calculating the occurrence frequency of an N-tuple in a target sentence corpus, wherein the target sentence corpus comprises a plurality of target sentences, and N is greater than or equal to 2.
In the training process of the neural machine translation model, the training corpora generally include a source sentence corpus and a target sentence corpus. The source sentence corpus comprises a plurality of source sentences in the source language, the target sentence corpus comprises a plurality of target sentences in the target language, each source sentence has a corresponding target sentence, and the source sentences together with their target sentences constitute a parallel corpus. In the above step 101, the occurrence frequency of each N-tuple in the target sentence corpus is calculated. For example, if the target sentence corpus includes 1,000,000 target sentences and a certain N-tuple appears 20,000 times in total in those target sentences, the occurrence frequency of that N-tuple is 20,000/1,000,000 = 0.02. Of course, the occurrence frequency may also be counted as a raw number of occurrences, in which case the occurrence frequency of the N-tuple is 20,000 times.
It should be noted that, for the related concepts of the N-tuple, reference may be made to the explanations in the prior art: an N-tuple generally corresponds to N consecutive words in a sentence, and may also be N consecutive words and punctuation marks, as long as they are consecutive in the sentence; to save space, this is not detailed here. Preferably, N is an integer greater than or equal to 2, and may specifically take the value 2, 3 or 4, but may also take other, larger values. As a preference, since the BLEU algorithm generally uses 4-tuples to evaluate machine translation performance, the N-tuple of the embodiment of the present invention may preferably be a 4-tuple. Taking English as the target language as an example, for a target sentence beginning with "it is said that ...", the 4-tuples of the sentence include "it is said that", "is said that ..." and so on, sliding one word at a time until the end of the sentence. For the 4-tuple "it is said that", the embodiment of the present invention can calculate the occurrence frequency of this 4-tuple over all target sentences of the target sentence corpus.
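As a purely illustrative sketch of this frequency computation (the corpus, tokenization and function names below are assumptions, not taken from the patent):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequencies(target_sentences, n=4):
    """Count how often each N-tuple occurs in the target sentence corpus."""
    raw_counts = Counter()
    for sentence in target_sentences:
        raw_counts.update(ngrams(sentence.split(), n))
    total_sentences = len(target_sentences)
    # The frequency can be kept as a ratio per sentence or as a raw occurrence count.
    ratios = {gram: cnt / total_sentences for gram, cnt in raw_counts.items()}
    return ratios, raw_counts

target_corpus = ["it is said that the meeting is postponed",
                 "it is said that the plan works"]
ratios, raw_counts = ngram_frequencies(target_corpus, n=4)
print(raw_counts[("it", "is", "said", "that")])  # 2: the 4-tuple occurs once in each sentence
```

High-frequency N-tuples would then be those whose frequency (or raw count) exceeds the preset threshold of step 102.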
Step 102: selecting the high-frequency N-tuples whose occurrence frequency is higher than a preset threshold value from the N-tuples, and combining the high-frequency N-tuples existing in the target sentences into whole words through a predetermined separator, so as to obtain an updated target sentence corpus.
Here, the embodiment of the present invention determines whether an N-tuple is a high-frequency N-tuple according to whether its occurrence frequency is higher than a preset threshold value. A high-frequency N-tuple is one that occurs frequently in the target sentence corpus and is therefore often used as a whole. In consideration of this, the embodiment of the present invention combines each high-frequency N-tuple into a whole word, and the whole word is prohibited from being further segmented into smaller subwords in the training process of the neural machine translation model.
In order to conveniently identify the whole words composed of high-frequency N-tuples during model training, the embodiment of the invention may connect the words of a high-frequency N-tuple with a predetermined separator to form a whole word. For example, the separator "@_" may be employed to connect the words in the N-tuple. Still taking the above "it is said that" as an example, the whole word "it@_is@_said@_that" is obtained with this separator. Through this processing, the embodiment of the invention combines the high-frequency N-tuples existing in each target sentence of the target sentence corpus into whole words, thereby updating the target sentence corpus. Of course, if no high-frequency N-tuple exists in a certain target sentence, the above combining process is not required for that sentence.
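A minimal sketch of this merging step (the separator, threshold handling and function name are illustrative assumptions, not the patent's reference implementation) might look like:

```python
SEPARATOR = "@_"  # assumed whole-word separator; must differ from the subword algorithm's own marker

def merge_high_freq_ngrams(sentence, high_freq_ngrams, n=4):
    """Replace each high-frequency N-tuple in the sentence with a single whole word."""
    tokens = sentence.split()
    merged, i = [], 0
    while i < len(tokens):
        gram = tuple(tokens[i:i + n])
        if len(gram) == n and gram in high_freq_ngrams:
            merged.append(SEPARATOR.join(gram))  # e.g. "it@_is@_said@_that"
            i += n
        else:
            merged.append(tokens[i])
            i += 1
    return " ".join(merged)

high_freq = {("it", "is", "said", "that")}
print(merge_high_freq_ngrams("it is said that the plan works", high_freq))
# -> "it@_is@_said@_that the plan works"
```

Applying this to every target sentence yields the updated target sentence corpus used in step 103.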
It should be noted that the segmentation algorithm employed in the subsequent training of the neural machine translation model may also use its own specific separator, and the predetermined separator in step 102 needs to be distinguished from it, i.e. a different separator should be used. For example, taking the Byte Pair Encoding (BPE) algorithm as an example, BPE adopts "@@" as its separator, so if the BPE algorithm is adopted in the subsequent model training, the predetermined separator in step 102 should be different from "@@".
Step 103: training a neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
In step 103, the embodiment of the present invention trains the machine translation model by using the updated target sentence corpus and the original source sentence corpus, and finally obtains a trained neural machine translation model for translation from the source language to the target language.
In the training process, the embodiment of the present invention may train a neural machine translation model using a parallel corpus composed of a source sentence in a corpus of source sentences and a target sentence corresponding to the source sentence, wherein when an integral word obtained by combining high-frequency N-tuples exists in the target sentence, the integral word is prohibited from being segmented in the training process. That is, in the training process, if the whole word exists in the target sentence, the whole word is not further segmented.
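One way to realize this constraint (a sketch under the assumption that a BPE-style subword step is applied when preparing the training data; the function names are illustrative, not the patent's) is to skip subword segmentation for any token that contains the whole-word separator:

```python
SEPARATOR = "@_"  # assumed whole-word separator from step 102

def segment_target_sentence(tokens, apply_subword):
    """Apply subword segmentation, but leave whole words (merged N-tuples) intact."""
    segmented = []
    for token in tokens:
        if SEPARATOR in token:
            segmented.append(token)                 # whole word: forbidden to split further
        else:
            segmented.extend(apply_subword(token))  # ordinary word: e.g. handled by a BPE encoder
    return segmented

# Trivial stand-in for a real subword encoder, just to make the sketch runnable:
fake_bpe = lambda w: [w[:3] + "@@", w[3:]] if len(w) > 3 else [w]
print(segment_target_sentence("it@_is@_said@_that postponed".split(), fake_bpe))
# -> ['it@_is@_said@_that', 'pos@@', 'tponed']
```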
It should be noted that the embodiment of the present invention may be applied to various neural machine translation models. For example, the neural machine translation model in step 103 may be a sequence-to-sequence (seq2seq) model, a convolutional sequence-to-sequence (convS2S) model, or a transformer model, and may also be another neural machine translation model, which is not specifically limited in the embodiment of the present invention.
Through the steps, the embodiment of the invention trains the neural machine translation model by using the whole words combined by the high-frequency N-tuples, so that the training of the high-frequency N-tuples as a whole in the training process can be ensured, and the neural machine translation model obtained by training can obtain translation results containing more high-frequency N-tuples during actual translation, thereby improving the scoring result of the neural machine translation and improving the machine translation quality.
According to the training method of the neural machine translation model described above, a neural machine translation model with high translation performance can be obtained through steps 101 to 103. The trained neural machine translation model can subsequently be used to translate the source language into the target language.
Referring to fig. 2, the training method of the neural machine translation model according to the embodiment of the present invention, after step 103, may further include the following steps:
Step 104: translating a sentence to be translated by using the trained neural machine translation model to obtain a predicted sentence.
Step 105: when whole words exist in the predicted sentence, resetting the whole words in the predicted sentence into N separate words, and then outputting the predicted sentence.
Step 106: when no whole word exists in the predicted sentence, directly outputting the predicted sentence.
In the above step 104, the neural machine translation model obtained in step 103 is used for translation, and the predicted sentence obtained by translation may include whole words combined from high-frequency N-tuples. Therefore, in step 105 the embodiment of the present invention further splits any whole word that may exist in the predicted sentence: specifically, the split points between adjacent words may be determined according to the predetermined separator in the whole word, and the whole word existing in the predicted sentence is then split to obtain N words. Of course, if no whole word is present in the predicted sentence, the predicted sentence may be directly output through step 106.
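A sketch of this post-processing step, again with the assumed "@_" separator (hypothetical, not the patent's reference implementation):

```python
SEPARATOR = "@_"  # assumed whole-word separator

def restore_whole_words(predicted_sentence):
    """Split every whole word in the predicted sentence back into its N separate words."""
    restored = []
    for token in predicted_sentence.split():
        if SEPARATOR in token:
            restored.extend(token.split(SEPARATOR))  # split at the predetermined separator
        else:
            restored.append(token)
    return " ".join(restored)

print(restore_whole_words("it@_is@_said@_that the plan works"))
# -> "it is said that the plan works"
```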
Through the above steps, the embodiment of the present invention realizes the translation application of the trained neural machine translation model. The trained neural machine translation model can obtain translation results containing more high-frequency N-tuples during actual translation, thereby improving the scoring result of the neural machine translation and improving the machine translation quality.
Based on the above method, an embodiment of the present invention further provides a device for implementing the above method, please refer to fig. 3, a training device 300 for a neural machine translation model provided in an embodiment of the present invention includes:
a frequency calculating unit 301, configured to calculate an occurrence frequency of an N-tuple in a target sentence corpus, where the target sentence corpus includes a plurality of target sentences, and N is greater than or equal to 2;
a word combining unit 302, configured to select a high-frequency N-tuple with an occurrence frequency higher than a preset threshold from the N-tuples, and combine the high-frequency N-tuples existing in the target sentence into an integral word through a predetermined delimiter, so as to obtain an updated target sentence corpus;
and a model training unit 303, configured to train a neural machine translation model using the source sentence corpus and the updated target sentence corpus.
Through the above units, the training device 300 for the neural machine translation model according to the embodiment of the present invention performs training of the neural machine translation model by using the whole words combined by the high-frequency N-tuples, and can ensure that the high-frequency N-tuples are trained as a whole in the training process, so that the trained neural machine translation model can obtain a translation result including a large number of high-frequency N-tuples during actual translation, thereby improving a scoring result of the neural machine translation and improving machine translation quality.
Preferably, the model training unit 303 is further configured to train a neural machine translation model using a parallel corpus composed of a source sentence and a target sentence corresponding to the source sentence, where the whole word existing in the target sentence is prohibited from being segmented in a training process.
Preferably, the N-tuple is a 2-tuple, a 3-tuple or a 4-tuple.
Preferably, the neural machine translation model is a sequence-to-sequence (seq2seq) model, a convolutional sequence-to-sequence (convS2S) model, or a transformer model.
Preferably, as shown in fig. 4, the training apparatus 300 for the neural machine translation model may further include:
a translation unit 304, configured to translate a sentence to be translated by using the neural machine translation model trained by the model training unit 303, so as to obtain a predicted sentence; and, when whole words are present in the predicted sentence, to output the predicted sentence after resetting the whole words in the predicted sentence into N separate words; and, when no whole word is present in the predicted sentence, to output the predicted sentence directly.
Preferably, the translation unit 304 is further configured to split the whole word existing in the predicted sentence according to a predetermined separator in the whole word, so as to obtain N words.
Referring to fig. 5, an embodiment of the present invention further provides a hardware structure block diagram of a training apparatus for a neural machine translation model, as shown in fig. 5, the training apparatus 500 for a neural machine translation model includes:
a processor 502; and
a memory 504, in which memory 504 computer program instructions are stored,
wherein the computer program instructions, when executed by the processor, cause the processor 502 to perform the steps of:
calculating the occurrence frequency of an N-tuple in a target sentence corpus, wherein the target sentence corpus comprises a plurality of target sentences, and N is greater than or equal to 2;
selecting the high-frequency N-tuple with the occurrence frequency higher than a preset threshold value from the N-tuple, and combining the high-frequency N-tuple existing in the target sentence into an integral word through a preset separator to obtain an updated target sentence corpus;
and training a neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
Further, as shown in fig. 5, the training apparatus 500 of the neural machine translation model may further include a network interface 501, an input device 503, a hard disk 505, and a display device 506.
The various interfaces and devices described above may be interconnected by a bus architecture. The bus architecture may be any architecture that includes any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 502, and one or more memories, represented by memory 504, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface 501 may be connected to a network (e.g., the internet, a local area network, etc.), collect source sentence corpus and target sentence corpus from the network, and store the collected corpora in the hard disk 505.
The input device 503 can receive various commands input by the operator and send the commands to the processor 502 for execution. The input device 503 may include a keyboard or a pointing device (e.g., a mouse, a trackball, a touch pad, a touch screen, etc.).
The display device 506 may display a result obtained by the processor 502 executing the instruction, for example, displaying a progress of model training and displaying a translation result of a sentence to be translated.
The memory 504 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 502.
It will be appreciated that the memory 504 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 504 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 504 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 5041 and applications 5042.
The operating system 5041 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 5042 includes various applications, such as a Browser (Browser), and is used to implement various application services. A program for implementing a method according to an embodiment of the present invention may be included in application 5042.
The methods disclosed in the above embodiments of the present invention may be applied to the processor 502 or implemented by the processor 502. The processor 502 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 502. The processor 502 described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 504, and the processor 502 reads the information in the memory 504 and performs the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
In particular, the computer program, when executed by the processor 502, may further implement the steps of:
after the neural machine translation model is trained, translating by using the neural machine translation model obtained by training to obtain a predicted sentence of a sentence to be translated;
when the whole words exist in the prediction sentence, the whole words existing in the prediction sentence are reset into N separated words, and then the prediction sentence is output;
and when the whole words do not exist in the prediction sentence, directly outputting the prediction sentence.
In particular, the computer program, when executed by the processor 502, may further implement the steps of:
and splitting the whole words existing in the predicted sentence according to preset separators in the whole words to obtain N words.
In particular, the computer program, when executed by the processor 502, may further implement the steps of:
and training a neural machine translation model by using a parallel corpus consisting of a source sentence and a target sentence corresponding to the source sentence, wherein the whole word existing in the target sentence is forbidden to be segmented in the training process.
Preferably, the N-tuple is a 2-tuple, a 3-tuple or a 4-tuple.
Preferably, the neural machine translation model is a sequence-to-sequence (seq2seq) model, a convolutional sequence-to-sequence (convS2S) model, or a transformer model.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the training method of the neural machine translation model according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (13)
1. A method for training a neural machine translation model, comprising:
calculating the occurrence frequency of an N-tuple in a target sentence corpus, wherein the target sentence corpus comprises a plurality of target sentences, and N is greater than or equal to 2;
selecting the high-frequency N-tuple with the occurrence frequency higher than a preset threshold value from the N-tuple, and combining the high-frequency N-tuple existing in the target sentence into an integral word through a preset separator to obtain an updated target sentence corpus;
and training a neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
2. The method of claim 1, wherein after training the neural machine translation model, the method further comprises:
translating to obtain a predicted sentence of the sentence to be translated by using the trained neural machine translation model;
when the whole words exist in the prediction sentence, the whole words existing in the prediction sentence are reset into N separated words, and then the prediction sentence is output;
and when the whole words do not exist in the prediction sentence, directly outputting the prediction sentence.
3. The method of claim 2, wherein the step of resetting the whole words present in the predicted sentence to separate N words comprises:
and splitting the whole words existing in the predicted sentence according to preset separators in the whole words to obtain N words.
4. The method of claim 1, wherein the step of training the neural machine translation model using the source sentence corpus and the updated target sentence corpus comprises:
and training a neural machine translation model by using a parallel corpus consisting of a source sentence and a target sentence corresponding to the source sentence, wherein the whole word existing in the target sentence is forbidden to be segmented in the training process.
5. The method of any of claims 1 to 4, wherein the N-tuple is a 2-tuple, a 3-tuple, or a 4-tuple.
6. The method of claim 5, wherein the neural machine translation model is a sequence-to-sequence seq2seq model, a convolutional sequence-to-sequence convS2S model, or a transformer model.
7. An apparatus for training a neural machine translation model, comprising:
a frequency calculation unit, configured to calculate an occurrence frequency of an N-tuple in a target sentence corpus, where the target sentence corpus includes a plurality of target sentences, and N is greater than or equal to 2;
the word combination unit is used for selecting the high-frequency N-tuple with the occurrence frequency higher than a preset threshold value from the N-tuple, and combining the high-frequency N-tuple existing in the target sentence into an integral word through a preset separator to obtain an updated target sentence corpus;
and the model training unit is used for training the neural machine translation model by using the source sentence corpus and the updated target sentence corpus.
8. The training apparatus of claim 7, further comprising:
the translation unit is used for translating to obtain a predicted sentence of the sentence to be translated by utilizing the neural machine translation model obtained by the training of the model training unit; and, when the whole word is present in the predicted sentence, outputting the predicted sentence after resetting the whole word present in the predicted sentence to separate N words; and when the whole words do not exist in the prediction sentence, directly outputting the prediction sentence.
9. The training apparatus of claim 8,
the translation unit is further configured to split the whole words existing in the predicted sentence according to predetermined separators in the whole words to obtain N words.
10. The training apparatus of claim 7,
the model training unit is further configured to train a neural machine translation model using a parallel corpus composed of a source sentence and a target sentence corresponding to the source sentence, wherein the whole word existing in the target sentence is prohibited from being segmented in a training process.
11. Training apparatus according to any of claims 7 to 10, wherein the N-tuple is a 2-tuple, a 3-tuple or a 4-tuple.
12. The training apparatus of claim 11, wherein the neural machine translation model is a sequence-to-sequence seq2seq model, a convolutional sequence-to-sequence convS2S model, or a transformer model.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of training a neural machine translation model according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910142831.5A CN111626064B (en) | 2019-02-26 | 2019-02-26 | Training method, training device and storage medium for neural machine translation model |
JP2020029283A JP6965951B2 (en) | 2019-02-26 | 2020-02-25 | Training methods, devices and storage media for neural machine translation models |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910142831.5A CN111626064B (en) | 2019-02-26 | 2019-02-26 | Training method, training device and storage medium for neural machine translation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111626064A true CN111626064A (en) | 2020-09-04 |
CN111626064B CN111626064B (en) | 2024-04-30 |
Family
ID=72260475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910142831.5A Active CN111626064B (en) | 2019-02-26 | 2019-02-26 | Training method, training device and storage medium for neural machine translation model |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6965951B2 (en) |
CN (1) | CN111626064B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112764784B (en) * | 2021-02-03 | 2022-10-11 | 河南工业大学 | Automatic software defect repairing method and device based on neural machine translation |
CN113343717A (en) * | 2021-06-15 | 2021-09-03 | 沈阳雅译网络技术有限公司 | Neural machine translation method based on translation memory library |
CN113553864B (en) * | 2021-06-30 | 2023-04-07 | 北京百度网讯科技有限公司 | Translation model training method and device, electronic equipment and storage medium |
CN113743095B (en) * | 2021-07-19 | 2024-09-20 | 西安理工大学 | Chinese problem generation unified pre-training method based on word lattice and relative position embedding |
CN114492469A (en) * | 2021-12-28 | 2022-05-13 | 科大讯飞股份有限公司 | Translation method, translation device and computer readable storage medium |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003083710A2 (en) * | 2002-03-27 | 2003-10-09 | Universiity Of Southern California | Phrase- based joint probability model for statistical machine translation |
JP2009527818A (en) * | 2006-02-17 | 2009-07-30 | グーグル・インコーポレーテッド | Distributed Model Coding and Adaptive Scalable Access Processing |
US20080306725A1 (en) * | 2007-06-08 | 2008-12-11 | Microsoft Corporation | Generating a phrase translation model by iteratively estimating phrase translation probabilities |
CN101685441A (en) * | 2008-09-24 | 2010-03-31 | 中国科学院自动化研究所 | Generalized reordering statistic translation method and device based on non-continuous phrase |
CN102193912A (en) * | 2010-03-12 | 2011-09-21 | 富士通株式会社 | Phrase division model establishing method, statistical machine translation method and decoder |
US20130030787A1 (en) * | 2011-07-25 | 2013-01-31 | Xerox Corporation | System and method for productive generation of compound words in statistical machine translation |
CN103631771A (en) * | 2012-08-28 | 2014-03-12 | 株式会社东芝 | Method and device for improving linguistic model |
CN103823795A (en) * | 2012-11-16 | 2014-05-28 | 佳能株式会社 | Machine translation system, machine translation method and decoder used together with system |
US20180089180A1 (en) * | 2016-09-27 | 2018-03-29 | Panasonic Intellectual Property Management Co., Ltd. | Method, device, and recording medium for providing translated sentence |
CN108132932A (en) * | 2017-12-27 | 2018-06-08 | 苏州大学 | Neural machine translation method with replicanism |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733552A (en) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Machine translation model construction method, device and equipment |
CN112733552B (en) * | 2020-12-30 | 2024-04-12 | 中国科学技术大学 | Machine translation model construction method, device and equipment |
CN112765996A (en) * | 2021-01-19 | 2021-05-07 | 延边大学 | Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation |
CN112765996B (en) * | 2021-01-19 | 2021-08-31 | 延边大学 | Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation |
CN113515959A (en) * | 2021-06-23 | 2021-10-19 | 网易有道信息技术(北京)有限公司 | Training method of machine translation model, machine translation method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
JP2020140710A (en) | 2020-09-03 |
CN111626064B (en) | 2024-04-30 |
JP6965951B2 (en) | 2021-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111626064B (en) | Training method, training device and storage medium for neural machine translation model | |
KR102382499B1 (en) | Translation method, target information determination method, related apparatus and storage medium | |
CN111709248B (en) | Training method and device for text generation model and electronic equipment | |
EP3862907A1 (en) | Method and apparatus for labeling core entity, and electronic device | |
CN113110988B (en) | Testing applications with defined input formats | |
US10909319B2 (en) | Entity linking method, electronic device for performing entity linking, and non-transitory computer-readable recording medium | |
WO2021072852A1 (en) | Sequence labeling method and system, and computer device | |
CN109661664B (en) | Information processing method and related device | |
CN112329465A (en) | Named entity identification method and device and computer readable storage medium | |
CN111626065A (en) | Training method and device of neural machine translation model and storage medium | |
US11537792B2 (en) | Pre-training method for sentiment analysis model, and electronic device | |
CN110175336B (en) | Translation method and device and electronic equipment | |
CN111062206B (en) | Sub-word unit splitting method, sub-word unit splitting device and computer readable storage medium | |
JP7413630B2 (en) | Summary generation model training method, apparatus, device and storage medium | |
CN110674306B (en) | Knowledge graph construction method and device and electronic equipment | |
JP7226514B2 (en) | PRE-TRAINED LANGUAGE MODEL, DEVICE AND COMPUTER-READABLE STORAGE MEDIA | |
CN112528001B (en) | Information query method and device and electronic equipment | |
WO2020000764A1 (en) | Hindi-oriented multi-language mixed input method and device | |
CN109271641A (en) | A kind of Text similarity computing method, apparatus and electronic equipment | |
CN112507697B (en) | Event name generation method, device, equipment and medium | |
CN110852066B (en) | Multi-language entity relation extraction method and system based on confrontation training mechanism | |
KR20220049693A (en) | Translation method using proper nouns coding based on neural network and the system thereof | |
US20180197530A1 (en) | Domain terminology expansion by relevancy | |
CN114220505A (en) | Information extraction method of medical record data, terminal equipment and readable storage medium | |
US20180174572A1 (en) | Transliteration using machine translation pipeline |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |