WO2023155676A1 - Method and apparatus for processing translation model, and computer-readable storage medium - Google Patents

Method and apparatus for processing translation model, and computer-readable storage medium Download PDF

Info

Publication number
WO2023155676A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
sentence
translation
words
decoder
Prior art date
Application number
PCT/CN2023/073853
Other languages
French (fr)
Chinese (zh)
Inventor
张海楠
陈宏申
邹炎炎
丁卓冶
龙波
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2023155676A1 publication Critical patent/WO2023155676A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a translation model processing method and apparatus and a computer-readable storage medium.
  • Autoregressive models (ARM) are widely used in natural language generation (NLG) tasks, such as machine translation, dialogue reply generation, image captioning, and video description generation, using an encoder-decoder framework to predict the next word conditioned on the sentence generated so far.
  • According to some embodiments of the present disclosure, a translation model processing method includes: acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; for each group of training sentences, inputting the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into the decoder of the translation model; for the word generated by the decoder at each position except the end of the sentence, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: taking the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair; inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  • In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: generating a random number and comparing it with a reference value, where the reference value is within the value range of the random number; and determining whether the random number is smaller than the reference value, and if it is, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
  • In some embodiments, the reference value increases as the number of training iterations increases.
  • In some embodiments, selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: when the semantic similarity is higher than a threshold, generating the word at the next position according to that word; and when the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position in the target translation sentence.
  • In some embodiments, the threshold increases as the number of training iterations increases.
  • In some embodiments, the decoder includes multiple decoding modules, and selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence as the input word; and inputting the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
  • In some embodiments, the method further includes: inputting the sentence to be translated into the trained translation model to obtain the translation of the sentence to be translated.
  • In some embodiments, inputting the sentence to be translated into the trained translation model to obtain the corresponding translation includes: inputting the sentence to be translated into the encoder of the translation model to obtain its feature vector, and inputting the feature vector into the decoder of the translation model; selecting, according to the probability value of each word at each position output by the decoder, a preset number of words as multiple candidate words for each position; generating, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; forming multiple candidate translation sentences from the candidate words generated by the decoder at each position, where the generation of the words within each candidate translation sentence is associated; and selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • In some embodiments, selecting a preset number of words as the multiple candidate words for each position includes: for each word at each position output by the decoder, determining a selection probability value for the word according to the probability value of that word and the probability values of the previous words associated with generating it; and selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for that position.
  • In some embodiments, selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated includes: for each candidate translation sentence generated by the decoder, determining its probability value according to the probability values of the words in it; and selecting the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
  • According to other embodiments of the present disclosure, a translation model processing device includes: an acquisition module for acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; an input module for inputting, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence and inputting the feature vector into the decoder of the translation model; a determination module for determining, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; a generation module for selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and a training module for training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, the device further includes: a translation module configured to input the sentence to be translated into the trained translation model to obtain the corresponding translation.
  • According to still other embodiments of the present disclosure, a translation model processing device includes: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
  • According to further embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where, when the program is executed by a processor, the steps of the translation model processing method of any of the foregoing embodiments are implemented.
  • Fig. 1 shows a schematic flowchart of a translation model processing method in some embodiments of the present disclosure.
  • Fig. 2 shows a schematic structural diagram of a translation model of some embodiments of the present disclosure.
  • Fig. 3 shows a schematic flowchart of a translation model processing method in some other embodiments of the present disclosure.
  • Fig. 4 shows a schematic structural diagram of a translation model processing device in some embodiments of the present disclosure.
  • Fig. 5 shows a schematic structural diagram of a translation model processing device according to other embodiments of the present disclosure.
  • Fig. 6 shows a schematic structural diagram of a translation model processing device according to some other embodiments of the present disclosure.
  • In the related art, the data input to the decoder of a translation model during training differs from the data input during testing or use, so the trained translation model cannot translate accurately when tested or used.
  • To address this, the present disclosure provides a translation model processing method, described below in conjunction with FIGS. 1-3.
  • Fig. 1 is a flow chart of some embodiments of the translation model processing method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes: steps S102-S110.
  • In step S102, multiple groups of training sentences are obtained.
  • Each group of training sentences includes an original sentence and a target translation sentence; for example, the original sentence is a Chinese sentence and the target translation sentence is an English sentence with the same meaning.
  • In step S104, for each group of training sentences, the original sentence is input into the encoder of the translation model to obtain a feature vector of the original sentence, and the feature vector of the original sentence is input into the decoder of the translation model.
  • A translation model includes an encoder and a decoder. The translation model is, for example, a Seq2Seq (sequence-to-sequence) model or a Transformer model, and is not limited to these examples.
  • The encoder may include multiple encoding modules. For example, each LSTM (Long Short-Term Memory) module of the encoder part of a Seq2Seq model serves as an encoding module, or the multiple encoders of a Transformer model each serve as an encoding module.
  • The decoder may include multiple decoding modules. For example, each LSTM module of the decoder part of a Seq2Seq model serves as a decoding module, or the multiple decoders of a Transformer model each serve as a decoding module.
  • For each group of training sentences, the original sentence is input into the encoder to obtain the feature vector, which is then input into the decoder.
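As an illustration of the encoder-decoder structure described above, the following is a minimal PyTorch sketch of an LSTM-based Seq2Seq translation model. It is not taken from the patent; the module sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; dimensions are illustrative."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_in_ids):
        # Encoder: original sentence -> feature vector (final hidden state).
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder: consumes the encoder state plus the words fed in so far.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)  # per-position logits over the target vocabulary
```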
  • In step S106, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined.
  • The decoder can generate multiple words, each with a probability, for every position except the end of the translation sentence, and the word with the highest probability is selected as the generated word for that position.
  • Each position in the translation sentence generated by the decoder refers to the position corresponding to a word.
  • For example, the original sentence is denoted X = {x_1, …, x_N}, the generated translation sentence is denoted Y = {y_1, …, y_T}, and the target translation sentence is denoted Y′ = {y′_1, …, y′_M}. As shown in Fig. 2, the decoding modules in the decoder generate the words y_1, …, y_T at the different positions of the translated sentence.
  • Denoting the word generated by the decoder at the current position as y_t, the generated sentence composed of that word and the words before it is the sentence formed by y_1, …, y_t, and the semantic similarity between this sentence and Y′ is determined.
  • In some embodiments, the generated sentence composed of that word and the words before it, together with the target translation sentence, is taken as a sentence pair and input into a BERT (Bidirectional Encoder Representations from Transformers) model to obtain a feature vector of the sentence pair; the feature vector of the sentence pair is then input into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence. Before the sentence pair is input into the BERT model, a special identifier can be added at the beginning and a separator added between the two sentences; the vector corresponding to the special identifier in the BERT output is taken as the feature vector of the sentence pair, and this vector is input into a sigmoid module to obtain the semantic similarity. The BERT model may be pre-trained.
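The following sketch shows how the sentence-pair similarity described above might be computed with the Hugging Face transformers library. The checkpoint name and the linear-plus-sigmoid head are illustrative assumptions; the patent does not prescribe a specific implementation.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
sim_head = torch.nn.Linear(bert.config.hidden_size, 1)  # assumed trainable scoring head

def semantic_similarity(generated: str, target: str) -> float:
    # The tokenizer adds the special identifier ([CLS]) at the start and a
    # separator ([SEP]) between the two sentences, as described above.
    inputs = tokenizer(generated, target, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    cls_vec = hidden[:, 0]  # vector of the special identifier = sentence-pair feature
    return torch.sigmoid(sim_head(cls_vec)).item()  # similarity in (0, 1)
```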
  • In step S108, either that word or the word at the same position in the target translation sentence is selected according to the semantic similarity to generate the word at the next position. If the end of the sentence has been reached, the steps of determining the semantic similarity and generating the word at the next position need not be performed.
  • In some embodiments, when the semantic similarity is higher than a threshold, the word at the next position is generated according to that word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the target translation sentence.
  • The selection module selects between the word at the current position and the word at the same position in the target translation sentence: if the semantic similarity between the sentence formed by y_1, …, y_t and Y′ is higher than the threshold, y_t is input into the decoding module corresponding to y_{t+1}; otherwise, y′_t is input, and y_{t+1} is then generated.
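A minimal sketch of this per-step selection is shown below. It reuses the hypothetical `semantic_similarity` helper sketched above; the threshold schedule is discussed next.

```python
def choose_next_input(generated_prefix, target_words, threshold):
    """Pick the word fed to the decoding module that generates y_{t+1}.

    generated_prefix: the words y_1..y_t produced by the decoder so far
    target_words:     the target translation sentence y'_1..y'_M
    """
    t = len(generated_prefix)  # current position (1-based)
    sim = semantic_similarity(" ".join(generated_prefix), " ".join(target_words))
    if sim > threshold:
        return generated_prefix[-1]  # the decoder's own word y_t
    return target_words[t - 1]       # the ground-truth word y'_t
```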
  • In some embodiments, if there are multiple target translation sentences, the semantic similarity between the generated sentence and each target translation sentence is determined; the highest of these similarities is taken as the semantic similarity corresponding to the word, and the target translation sentence with the highest similarity is taken as a reference sentence. When the semantic similarity is higher than the threshold, the word at the next position is generated according to that word; when it is lower than the threshold, the word at the next position is generated according to the word at the same position in the reference sentence.
  • In some embodiments, the threshold increases as the number of training iterations increases. Because the sentences generated by the model at the beginning of training have relatively low similarity to the target translation sentences, the threshold is set lower at that stage so that words generated by the decoder can be introduced into the training process; as the number of training iterations grows, the model becomes more accurate and the similarity between the generated sentences and the target translation sentences increases, so raising the threshold selects more accurate and reasonable decoder-generated words to introduce into the training process.
  • In some embodiments, the threshold can be represented by the following formula:
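The formula image itself did not survive extraction. One plausible reconstruction, consistent with the stated properties (a floor ε greater than 0 at the start of training, growing toward 1 as the number of training iterations n increases), is an inverse-sigmoid-style schedule; this is an assumption, not the patent's verbatim formula:

```latex
% Hypothetical reconstruction of formula (1).
\tau_n = \max\!\left(\varepsilon,\ \frac{e^{n/k}}{k + e^{n/k}}\right) \qquad (1)
```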
  • where k and ε are hyperparameters, k ≥ 1, which determines the convergence speed, and n represents the number of training iterations.
  • Formula (1) makes the threshold a specific value greater than 0 at the beginning of training, ensuring that the words introduced into the training process are of higher quality.
  • In step S110, the translation model is trained according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, a cross-entropy loss function is calculated from the generated translation sentence and the target translation sentence, and the translation model is trained according to the cross-entropy loss function.
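A sketch of the training objective follows: a standard token-level cross-entropy step, with teacher-forced inputs shown for brevity (in the method above, the per-position selection of steps S106-S108 would decide which words appear in `tgt_in_ids`). All names are illustrative.

```python
import torch.nn.functional as F

def training_step(model, optimizer, src_ids, tgt_in_ids, tgt_out_ids):
    logits = model(src_ids, tgt_in_ids)  # (batch, positions, vocab)
    # Loss measures the difference between the generated translation and the target.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt_out_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```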
  • In the method of the above embodiment, when training the translation model, for each word generated by the decoder, whether to introduce that word into the training process to generate the word at the next position of the translated sentence is determined according to the semantic similarity between the generated sentence containing the word and the target translation sentence.
  • In this way, the content generated by the decoder can be gradually integrated into the training process, reducing the difference between the training process and the testing or use process; selecting the decoder-generated words according to the semantic similarity between the generated sentence and the target translation sentence makes the selected words more realistic, reasonable, and accurate, thereby improving the efficiency and accuracy of translation model training.
  • In addition, the words selected by the method of the above embodiment are more reasonable and accurate.
  • Introducing inaccurate decoder-generated words early in training can, however, reduce the training convergence speed, so the present disclosure also improves the training process, as described below in conjunction with Fig. 3.
  • In some embodiments, step S106 includes performing steps S302 to S308 for the word generated by the decoder at each position except the end of the sentence.
  • In step S302, a random number m is generated.
  • m may be a random number ranging from 0 to 1.
  • In step S304, the random number m is compared with the reference value u; if m is smaller than u, step S306 is executed; otherwise, step S308 is executed.
  • The reference value u can be determined, for example, using the following formula:
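The expression was likewise an image that did not survive extraction. A plausible form, consistent with a value that starts at or near 0 and grows toward 1 with the number of training iterations n, is shown below as an assumption:

```latex
% Hypothetical reconstruction of the reference-value schedule.
u_n = \frac{e^{n/k}}{k + e^{n/k}}
```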
  • where k ≥ 1 is a hyperparameter that determines the convergence speed, and n represents the number of training iterations.
  • In step S306, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined. Step S108 can then be executed.
  • In step S308, the word at the next position is generated according to the word at the same position in the target translation sentence.
  • Step S110 may be executed after step S108 or step S308.
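Putting steps S302 to S308 together, the gated selection can be sketched as follows, reusing the hypothetical helpers above:

```python
import random

def gated_next_input(generated_prefix, target_words, threshold, u):
    """Steps S302-S308: consult the similarity scorer only when m < u."""
    m = random.random()  # S302: random number in [0, 1)
    if m < u:            # S304: compare with the reference value u
        # S306 + S108: similarity-based choice between y_t and y'_t.
        return choose_next_input(generated_prefix, target_words, threshold)
    # S308: use the ground-truth word y'_t at the current position.
    return target_words[len(generated_prefix) - 1]
```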
  • Both the threshold and the reference value increase with the number of training iterations, and for both the growth rate accelerates once a certain number of iterations is reached; after that point, however, the threshold increases relatively steadily with the number of iterations while the reference value increases rapidly. The lowest value of the threshold is also higher than the lowest value of the reference value.
  • The maximum of both the threshold and the reference value is a value close to 1, but the threshold should be set to a specific value greater than 0 at the beginning of training, reducing the probability of introducing poor-quality generated words into the training process, while the reference value can be set to 0 at the beginning and kept close to 0 while the number of training iterations is small, which likewise reduces the probability of introducing poor-quality generated words into the training process.
  • In other embodiments, the reference value can instead be set to decrease as the number of training iterations increases; in that case it is judged whether the random number is greater than the reference value, and if so, step S306 is performed; otherwise, step S308 is performed.
  • The trend of this reference value is opposite to that described above and is not repeated here.
  • By setting the random number and the reference value, the method of the above embodiment can reduce the probability of introducing inaccurate, unreasonable, low-quality decoder-generated words into the training process in the early stage of training, avoiding a reduction in training convergence speed and thereby improving training efficiency and accuracy.
  • In some embodiments, after training, the sentence to be translated is input into the trained translation model to obtain its translation.
  • In some embodiments, the sentence to be translated is input into the encoder of the translation model to obtain its feature vector, and the feature vector is input into the decoder of the translation model. According to the probability value of each word at each position output by the decoder, a preset number of words are selected as multiple candidate words for each position; from the multiple candidate words at each position, multiple candidate words for the next position are generated until the end of the sentence is reached, where the number of candidate words at each position is the same. The candidate words generated by the decoder at each position form multiple candidate translation sentences, and one candidate translation sentence is selected according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • The number of candidate words at each position is the same; for example, three candidate words are generated for each position, and the generation of the words within each candidate translation sentence is associated.
  • That is, each word is generated based on the word at the previous position, so the generation of a word is associated with the word at the previous position.
  • In some embodiments, the selection probability value of a word is determined according to the probability value of that word and the probability values of the previous words associated with generating it; according to the selection probability values of the words at each position output by the decoder, a preset number of words are selected as the multiple candidate words for each position generated by the decoder.
  • For example, the words at each position may be sorted by selection probability value in descending order and a preset number of the top-ranked words taken as the multiple candidate words for that position. For each position, the multiple candidate words are respectively input into the decoding module corresponding to the next position to obtain the multiple candidate words for the next position.
  • For example, the words generated by the decoder for the first position include y_11, y_12, y_13, y_14, y_15, and so on; these can be sorted by probability value from largest to smallest and a preset number of the top-ranked words selected, for example the three words with the highest probability values, say y_11, y_13, and y_15.
  • When the decoder generates the words at the second position, it inputs y_11, y_13, and y_15 into the decoding module corresponding to the second position, respectively, obtaining for each a set of second-position words, for example y_21 and y_22 corresponding to y_11.
  • For each candidate translation sentence, its probability value is determined according to the probability values of the words in it, and the candidate translation sentence with the largest probability value is selected as the translation of the sentence to be translated.
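The candidate-word procedure described above is essentially beam search. A compact sketch follows; `step_probs`, a function returning the decoder's next-word probabilities for a given prefix, is an illustrative stand-in rather than an API from the patent.

```python
import math

def beam_search(step_probs, beam_size=3, max_len=50, eos="</s>"):
    """Keep the same preset number of candidate sentences at every position."""
    beams = [([], 0.0)]  # (words so far, cumulative log-probability)
    for _ in range(max_len):
        expanded = []
        for words, score in beams:
            if words and words[-1] == eos:
                expanded.append((words, score))  # finished candidate sentence
                continue
            for word, p in step_probs(words).items():
                # A candidate's probability is the product of its word
                # probabilities; log-probabilities are summed for stability.
                expanded.append((words + [word], score + math.log(p)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
        if all(w and w[-1] == eos for w, _ in beams):
            break
    return beams[0][0]  # the candidate translation with the largest probability
```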
  • The method of the above embodiment can improve the accuracy of translation sentence generation and the richness of its content, thereby improving translation quality.
  • The present disclosure also provides an apparatus for processing a translation model, described below with reference to Fig. 4.
  • Fig. 4 is a structural diagram of some embodiments of a translation model processing device of the present disclosure.
  • The device 40 of this embodiment includes: an acquisition module 410, an input module 420, a determination module 430, a generation module 440, and a training module 450.
  • The obtaining module 410 is used to obtain multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence.
  • The input module 420 is used to input, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and to input the feature vector into the decoder of the translation model.
  • The determination module 430 is used to determine, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
  • In some embodiments, the determination module 430 is configured to take the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair, input the sentence pair into the Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair, and input the feature vector into the activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  • In some embodiments, the determining module 430 is configured to generate a random number and compare it with a reference value, where the reference value is within the value range of the random number, and to determine whether the random number is smaller than the reference value; if it is, the semantic similarity between the generated sentence consisting of that word and the words preceding it and the target translation sentence is determined.
  • In some embodiments, the reference value increases as the number of training iterations increases.
  • The generating module 440 is used to select either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position.
  • In some embodiments, the generating module 440 is configured to generate the word at the next position according to that word if the semantic similarity is higher than the threshold, and to generate the word at the next position according to the word at the same position in the target translation sentence if the semantic similarity is lower than the threshold.
  • In some embodiments, the threshold increases as the number of training iterations increases.
  • In some embodiments, the generation module 440 is configured to select either that word or the word at the same position in the target translation sentence according to the semantic similarity as the input word, and to input the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
  • The training module 450 is used to train the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, the device 40 further includes: a translation module 460 configured to input the sentence to be translated into the trained translation model to obtain its translation.
  • In some embodiments, the translation module 460 is configured to input the sentence to be translated into the encoder of the translation model to obtain its feature vector and to input the feature vector into the decoder of the translation model; to select, according to the probability value of each word at each position output by the decoder, a preset number of words as the multiple candidate words for each position; to generate, from the multiple candidate words at each position, the multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; to form multiple candidate translation sentences from the candidate words generated at each position, where the generation of the words within each candidate translation sentence is associated; and to select one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • In some embodiments, the translation module 460 is configured to determine, for each word at each position output by the decoder, the selection probability value of the word according to the probability value of that word and the probability values of the previous words associated with generating it, and to select, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for each position generated by the decoder.
  • In some embodiments, the translation module 460 is configured to determine, for each candidate translation sentence generated by the decoder, the probability value of the candidate translation sentence according to the probability values of the words in it, and to select the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
  • The translation model processing devices in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, as described below in conjunction with Fig. 5 and Fig. 6.
  • Fig. 5 is a structural diagram of some embodiments of a translation model processing device of the present disclosure.
  • The device 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, where the processor 520 is configured to execute the translation model processing method of any of the embodiments of the present disclosure based on instructions stored in the memory 510.
  • The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), a database, and other programs.
  • Fig. 6 is a structural diagram of other embodiments of a translation model processing device of the present disclosure.
  • The apparatus 60 of this embodiment includes: a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, as well as the memory 610 and the processor 620, may be connected through a bus 660, for example.
  • The input/output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • The network interface 640 provides connection interfaces for various networked devices; for example, it can connect to a database server or a cloud storage server.
  • The storage interface 650 provides connection interfaces for external storage devices such as SD cards and USB flash drives.
  • The present disclosure also provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
  • The embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to the technical field of computers, and in particular to a method and apparatus for processing a translation model, and a computer-readable storage medium. The method of the present disclosure comprises: acquiring a plurality of groups of training statements, wherein each group of training statements comprises an original statement and a target translation statement; for each group of training statements, inputting the original statement into an encoder of a translation model, so as to obtain a feature vector of the original statement, and inputting the feature vector of the original statement into a decoder of the translation model; for a word at each position except the tail of a statement which is generated by the decoder, determining the semantic similarity between a generated statement, which is formed by said word and various words prior to said word, and the target translation statement; according to the semantic similarity, selecting said word or a word in the target translation statement that is at the same position as said word, so as to generate a word at the next position; and training the translation model according to the difference between a translation statement, which is formed by words that are generated by the decoder and located at various positions, and the target translation statement.

Description

Method and apparatus for processing a translation model, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. 202210150760.5, filed with the China Patent Office on February 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a translation model processing method and apparatus and a computer-readable storage medium.
Background
Autoregressive models (ARM) are widely used in natural language generation (NLG) tasks, such as machine translation, dialogue reply generation, image captioning, and video description generation, using an encoder-decoder framework to predict the next word conditioned on the sentence generated so far.
In machine translation scenarios, when an ARM is trained, the real translation sentence is used as the already-generated sentence, which forces the model to directly learn the distribution of real translation sentences. At test or use time, however, the already-generated phrase comes from the ARM's own decoder, which differs from the input distribution seen during training.
Summary of the Invention
According to some embodiments of the present disclosure, a translation model processing method is provided, including: acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; for each group of training sentences, inputting the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into the decoder of the translation model; for the word generated by the decoder at each position except the end of the sentence, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: taking the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair; inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: generating a random number and comparing it with a reference value, where the reference value is within the value range of the random number; and determining whether the random number is smaller than the reference value, and if it is, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
In some embodiments, the reference value increases as the number of training iterations increases.
In some embodiments, selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: when the semantic similarity is higher than a threshold, generating the word at the next position according to that word; and when the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position in the target translation sentence.
In some embodiments, the threshold increases as the number of training iterations increases.
In some embodiments, the decoder includes multiple decoding modules, and selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence as the input word; and inputting the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
In some embodiments, the method further includes: inputting the sentence to be translated into the trained translation model to obtain the translation of the sentence to be translated.
In some embodiments, inputting the sentence to be translated into the trained translation model to obtain the corresponding translation includes: inputting the sentence to be translated into the encoder of the translation model to obtain its feature vector, and inputting the feature vector into the decoder of the translation model; selecting, according to the probability value of each word at each position output by the decoder, a preset number of words as multiple candidate words for each position; generating, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; forming multiple candidate translation sentences from the candidate words generated by the decoder at each position, where the generation of the words within each candidate translation sentence is associated; and selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
In some embodiments, selecting a preset number of words as the multiple candidate words for each position includes: for each word at each position output by the decoder, determining a selection probability value for the word according to the probability value of that word and the probability values of the previous words associated with generating it; and selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for that position.
In some embodiments, selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated includes: for each candidate translation sentence generated by the decoder, determining its probability value according to the probability values of the words in it; and selecting the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
According to other embodiments of the present disclosure, a translation model processing device is provided, including: an acquisition module for acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; an input module for inputting, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence and inputting the feature vector into the decoder of the translation model; a determination module for determining, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; a generation module for selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and a training module for training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
In some embodiments, the device further includes: a translation module configured to input the sentence to be translated into the trained translation model to obtain the corresponding translation.
According to still other embodiments of the present disclosure, a translation model processing device is provided, including: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
According to yet other embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where, when the program is executed by a processor, the steps of the translation model processing method of any of the foregoing embodiments are implemented.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on them without creative effort.
Fig. 1 shows a schematic flowchart of a translation model processing method according to some embodiments of the present disclosure.
Fig. 2 shows a schematic structural diagram of a translation model according to some embodiments of the present disclosure.
Fig. 3 shows a schematic flowchart of a translation model processing method according to other embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a translation model processing device according to some embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a translation model processing device according to other embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a translation model processing device according to still other embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the related art, the data input to the decoder of a translation model during training differs from the data input during testing or use, so the trained translation model cannot translate accurately when tested or used.
The present disclosure provides a translation model processing method, described below in conjunction with Figs. 1 to 3.
Fig. 1 is a flowchart of some embodiments of the translation model processing method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes steps S102 to S110.
In step S102, multiple groups of training sentences are acquired.
Each group of training sentences includes an original sentence and a target translation sentence; for example, the original sentence is a Chinese sentence and the target translation sentence is an English sentence with the same meaning.
In step S104, for each group of training sentences, the original sentence is input into the encoder of the translation model to obtain a feature vector of the original sentence, and the feature vector of the original sentence is input into the decoder of the translation model.
The translation model includes an encoder and a decoder. The translation model is, for example, a Seq2Seq (sequence-to-sequence) model or a Transformer model, and is not limited to these examples. The encoder may include multiple encoding modules; for example, each LSTM (Long Short-Term Memory) module of the encoder part of a Seq2Seq model serves as an encoding module, or the multiple encoders of a Transformer model each serve as an encoding module. The decoder may include multiple decoding modules; for example, each LSTM module of the decoder part of a Seq2Seq model serves as a decoding module, or the multiple decoders of a Transformer model each serve as a decoding module. For each group of training sentences, the original sentence is input into the encoder to obtain the feature vector, which is then input into the decoder.
In step S106, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined.
The decoder can generate multiple words, each with a probability, for every position except the end of the translation sentence, and the word with the highest probability is selected as the generated word for that position. Each position in the translation sentence generated by the decoder refers to the position corresponding to a word.
For example, the original sentence is denoted X = {x_1, …, x_N}, the generated translation sentence is denoted Y = {y_1, …, y_T}, and the target translation sentence is denoted Y′ = {y′_1, …, y′_M}. As shown in Fig. 2, the decoding modules in the decoder generate the words y_1, …, y_T at the different positions of the translated sentence. Denoting the word generated at the current position as y_t, the generated sentence composed of that word and the words before it is the sentence formed by y_1, …, y_t, and the semantic similarity between this sentence and Y′ is determined.
In some embodiments, the generated sentence composed of that word and the words before it, together with the target translation sentence, is taken as a sentence pair and input into a BERT (Bidirectional Encoder Representations from Transformers) model to obtain a feature vector of the sentence pair; the feature vector of the sentence pair is then input into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence. Before the sentence pair is input into the BERT model, a special identifier can be added at the beginning and a separator added between the two sentences; the vector corresponding to the special identifier in the BERT output is taken as the feature vector of the sentence pair, and this feature vector is input into a sigmoid module to obtain the semantic similarity between the generated sentence and the target translation sentence. The BERT model may be pre-trained.
In step S108, according to the semantic similarity, either this word or the word at the same position in the target translation sentence is selected to generate the word at the next position. If the end of the sentence has been reached, the steps of determining the semantic similarity and generating the word at the next position need not be performed.
In some embodiments, when the semantic similarity is higher than a threshold, the word at the next position is generated according to this word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the target translation sentence.
In some embodiments, according to the semantic similarity, either this word or the word at the same position in the target translation sentence is selected as the input word; the state output by the decoding module corresponding to this word, together with the word vector of the input word, is input into the decoding module corresponding to the word at the next position, yielding the output word at the next position. As shown in FIG. 2, a selection module chooses between the word at the current position and the word at the same position in the target translation sentence: if the semantic similarity between the sentence composed of y1, …, yt and Y′ is higher than the threshold, yt is input into the decoding module corresponding to yt+1; otherwise, y′t is input into the decoding module corresponding to yt+1, which then generates yt+1.
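A minimal sketch of this selection step, reusing the semantic_similarity helper sketched above; the decode_step interface and all names are illustrative assumptions:

def select_and_decode(prefix_words, y_t, target_word_t, target_sentence,
                      state, sim_threshold, embed, decode_step):
    # Compare the generated prefix y1..yt against the full target Y'.
    sim = semantic_similarity(" ".join(prefix_words), target_sentence)
    # Keep the decoder's word if the prefix is close enough to the target;
    # otherwise substitute the target word at the same position.
    input_word = y_t if sim > sim_threshold else target_word_t
    # Feed the current module's state and the chosen word's embedding into
    # the decoding module for position t+1.
    return decode_step(embed(input_word), state)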
In some embodiments, if there are multiple target translation sentences, the semantic similarity between the generated sentence and each target translation sentence is determined, the highest of these similarities is taken as the semantic similarity corresponding to this word, and the target translation sentence corresponding to the highest value is taken as a reference sentence. When the semantic similarity is higher than the threshold, the word at the next position is generated according to this word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the reference sentence.
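For the multi-reference case, a minimal sketch (again reusing the semantic_similarity helper; names are illustrative):

def best_reference(generated_prefix, target_sentences):
    # Score the generated prefix against every target translation and keep
    # the highest similarity together with its reference sentence.
    sims = [semantic_similarity(generated_prefix, t) for t in target_sentences]
    best = max(range(len(target_sentences)), key=lambda i: sims[i])
    return sims[best], target_sentences[best]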
In some embodiments, the threshold increases as the number of training iterations increases. Since the sentences the model generates at the beginning of training have relatively low similarity to the target translation sentences, the threshold is set lower at that stage so that words generated by the decoder can still be introduced into the training process. As the number of training iterations increases, the model becomes more accurate and the similarity between generated and target sentences increases; raising the threshold then ensures that only the more accurate and reasonable words generated by the decoder are introduced into the training process.
In some embodiments, the threshold can be expressed by the following formula:
In formula (1), k and γ are hyperparameters, where k ≥ 1; they determine the convergence speed, and n denotes the number of training iterations. Formula (1) makes the threshold take a specific value greater than 0 at the beginning of training, ensuring that the words introduced into the training process are of relatively high quality.
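Formula (1) itself is not reproduced here. Purely as an assumed illustration, a schedule with the properties stated above (a value greater than 0 at the start of training that increases toward 1 as the number of training iterations n grows) might look like:

import math

def threshold(n, k=10.0, gamma=0.3):
    # Assumed inverse-sigmoid-style schedule, not the patent's formula (1):
    # starts above 0 at n = 0 and rises toward 1 as n increases.
    assert k >= 1
    return 1.0 - (1.0 - gamma) * k / (k + math.exp(n / k))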
In step S110, the translation model is trained according to the difference between the translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
After the decoder has generated a complete translated sentence, a cross-entropy loss function can be computed from the generated translated sentence and the target translation sentence, and the translation model is trained according to the cross-entropy loss function.
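A minimal sketch of this training step, assuming the decoder exposes per-position vocabulary logits; names are illustrative:

import torch.nn.functional as F

def training_step(logits, target_ids, optimizer):
    # logits: (T, vocab_size) scores for the T generated positions;
    # target_ids: (T,) word indices of the target translation sentence.
    loss = F.cross_entropy(logits, target_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()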
In the above embodiments, when training the translation model, for each word generated by the decoder it is determined, according to the semantic similarity between the generated sentence containing that word and the target translation sentence, whether to introduce the word into the training process for generating the word at the next position of the translated sentence. In this way, content generated by the decoder is gradually blended into the training process, reducing the discrepancy between the training process and the testing or use process. Moreover, selecting decoder-generated words according to the semantic similarity between the generated sentence and the target translation sentence makes the selected words more realistic, reasonable, and accurate, thereby improving the efficiency and accuracy of translation model training. In addition, compared with directly comparing the similarity between the word at each position and the word at the corresponding position of the target translation sentence, the words selected by the method of the above embodiments are more reasonable and accurate.
Since the words generated at the beginning of the training process are of relatively poor quality, in order to reduce the chance that such words are introduced into training and slow down convergence, the present disclosure further improves the training process, as described below in conjunction with FIG. 3.
FIG. 3 is a flowchart of other embodiments of the translation model processing method of the present disclosure. As shown in FIG. 3, step S106 includes performing steps S302 to S308 for each word generated by the decoder at each position except the end of the sentence.
In step S302, a random number m is generated.
m may be a random number in the range 0 to 1.
In step S304, the random number m is compared with a reference value u. If m is smaller than u, step S306 is performed; otherwise, step S308 is performed.
In some embodiments, the reference value u increases as the number of training iterations increases; its minimum value may be 0 and its maximum value may be 1. When u = 0, the words of the target translation sentence are fed directly into the decoder to train the model; when u = 1, the input for model training depends entirely on the words generated by the decoder, so training proceeds just as in the testing or use phase. If u is set too low (close to 0), the decoder's input comes almost entirely from the target translation sentence, and the model cannot handle unknown words in the testing or use phase. If u is set too high (close to 1) at the beginning of training, the generated words are of poor quality because the model is not yet well trained, and using them as decoder input may lead to slow convergence. The reference value u can be determined, for example, using the following formula:
In formula (2), k ≥ 1 is a hyperparameter that determines the convergence speed, and n denotes the number of training iterations.
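As with formula (1), formula (2) is not reproduced here. An assumed schedule consistent with the stated behavior (approximately 0 at the start, staying low early, then rising quickly toward 1) could be:

import math

def reference_value(n, k=100.0):
    # Assumed inverse-sigmoid-style schedule, not the patent's formula (2):
    # approximately 0 for small n, then rising sharply toward 1.
    assert k >= 1
    return 1.0 - k / (k + math.exp(n / k))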
In step S306, the semantic similarity between the generated sentence composed of this word and the words before it and the target translation sentence is determined. Step S108 can then be performed.
In step S308, the word at the next position is generated according to the word at the same position in the target translation sentence.
Step S110 may be performed after step S108 or after step S308.
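Putting steps S302 to S308 and step S108 together, a minimal sketch of the per-position decision, reusing the semantic_similarity, threshold, and reference_value helpers sketched above (all names are illustrative):

import random

def choose_input_word(y_t, target_word_t, generated_prefix, target_sentence, n):
    m = random.random()                    # step S302: random number in [0, 1)
    if m < reference_value(n):             # step S304
        sim = semantic_similarity(" ".join(generated_prefix), target_sentence)  # step S306
        return y_t if sim > threshold(n) else target_word_t                     # step S108
    return target_word_t                   # step S308: use the target word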
Both the threshold and the reference value increase with the number of training iterations, and for both the growth rate accelerates after a certain number of iterations. However, after that point the threshold grows relatively steadily with the number of iterations, whereas the reference value grows sharply. Moreover, the minimum value of the threshold is higher than the minimum value of the reference value. This is because the maxima of both the threshold and the reference value are close to 1, but the threshold must be set to a specific value above 0 from the very start of training, reducing the probability that poor-quality generated words are introduced into the training process, whereas the reference value can start at 0 and remain close to 0 while the number of iterations is small, which likewise reduces the probability that poor-quality generated words are introduced into the training process.
The reference value can also be set to decrease as the number of training iterations increases; in that case, what needs to be judged is whether the random number is greater than the reference value, and if so, step S306 is performed; otherwise, step S308 is performed. The trend of the reference value is then the opposite of that in the above embodiments, and details are not repeated here.
In the method of the above embodiments, by using a random number and a reference value, the probability of introducing inaccurate, unreasonable, low-quality words generated by the decoder into the training process can be reduced in the early stage of training, avoiding a slowdown of training convergence and thereby improving training efficiency and accuracy.
The following describes the process of testing or using the translation model after training is completed.
In some embodiments, after the training of the translation model is completed, a sentence to be translated is input into the trained translation model to obtain a translated sentence of the sentence to be translated.
In order to further expand the selection space of generated words and improve the richness and accuracy of the content of the generated translation, in some embodiments the sentence to be translated is input into the encoder of the translation model to obtain its feature vector, and the feature vector is input into the decoder of the translation model; according to the probability values of the words at each position output by the decoder, a preset number of words are selected as multiple candidate words for that position; based on the multiple candidate words at each position, multiple candidate words for the next position are generated until the end of the sentence is reached, where the number of candidate words is the same at every position; the candidate words generated by the decoder at the respective positions are combined into multiple candidate translation sentences; and one candidate translation sentence is selected according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
The number of candidate words is the same at every position; for example, three candidate words are generated at each position, and the generation of the words within each candidate translation sentence is linked: each word at every position of a candidate translation sentence other than the beginning of the sentence is generated based on the word at the previous position, meaning that its generation is associated with that previous word.
In some embodiments, for each word at each position output by the decoder, a selection probability value of the word is determined according to the word's own probability value and the probability values of the preceding words associated with generating it; according to the selection probability values of the words at each position output by the decoder, a preset number of words are selected as the multiple candidate words generated by the decoder for that position.
The words at each position may be ranked by their selection probability values in descending order, and a preset number of them selected as the multiple candidate words for that position. For each position, the multiple candidate words of that position are respectively input into the decoding module corresponding to the next position, yielding the multiple candidate words of the next position.
For example, the words generated by the decoder for the first position include y11, y12, y13, y14, y15, and so on. They can be sorted by probability value in descending order and a preset number of the top-ranked words selected; for example, the three words with the highest probability values are y11, y13, and y15. When generating the words for the second position, the decoder inputs y11, y13, and y15 respectively into the decoding module corresponding to the second position, obtaining the second-position words y21, y22, y23 corresponding to y11, the second-position words y24, y25, y26 corresponding to y13, and the second-position words y27, y28, y29 corresponding to y15. Candidate words are then selected from y21, …, y29: according to the probability values of the phrases {y11, y21}, {y11, y22}, …, {y15, y29}, the second-position words of the three phrases with the highest probability values are taken as the candidate words for the second position.
In some embodiments, for each candidate translation sentence generated by the decoder, the probability value of the candidate translation sentence is determined according to the probability values of the words in it, and the candidate translation sentence with the highest probability value is selected as the translation of the sentence to be translated.
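The candidate-word procedure described above is essentially beam search with a fixed beam width. A minimal sketch, assuming a decode_step function that returns a sequence of log-probabilities over the vocabulary together with a new decoder state (this interface is an assumption):

def beam_search(decode_step, start_state, beam_width=3, max_len=50, eos_id=2):
    # Each beam entry: (accumulated log-probability, word ids, decoder state).
    beams = [(0.0, [], start_state)]
    for _ in range(max_len):
        candidates = []
        for logp, words, state in beams:
            if words and words[-1] == eos_id:   # sentence already ended
                candidates.append((logp, words, state))
                continue
            log_probs, new_state = decode_step(words, state)
            for word_id, word_logp in enumerate(log_probs):
                # The selection probability of a word combines its own
                # probability with those of the preceding words that
                # generated it.
                candidates.append((logp + word_logp, words + [word_id], new_state))
        # Keep the top beam_width candidates, so every position retains the
        # same number of candidate words.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    # The candidate translation sentence with the highest probability wins.
    return max(beams, key=lambda b: b[0])[1]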
The method of the above embodiments can improve the accuracy of translation sentence generation and the richness of the generated content, thereby improving translation quality.
The present disclosure further provides a translation model processing apparatus, described below in conjunction with FIG. 4.
FIG. 4 is a structural diagram of some embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 4, the apparatus 40 of this embodiment includes: an acquisition module 410, an input module 420, a determination module 430, a generation module 440, and a training module 450.
The acquisition module 410 is configured to acquire multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence.
The input module 420 is configured, for each group of training sentences, to input the original sentence into the encoder of the translation model to obtain the feature vector of the original sentence, and to input the feature vector of the original sentence into the decoder of the translation model.
The determination module 430 is configured, for each word generated by the decoder at each position except the end of the sentence, to determine the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
In some embodiments, the determination module 430 is configured to take the generated sentence composed of the word and the words before it, together with the target translation sentence, as a sentence pair, input the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain the feature vector of the sentence pair, and input the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
In some embodiments, the determination module 430 is configured to generate a random number and compare the random number with a reference value, where the reference value lies within the value range of the random number; to determine whether the random number is smaller than the reference value; and, when it is smaller, to determine the semantic similarity between the generated sentence composed of the word and the words before it and the target translation sentence.
In some embodiments, the reference value increases as the number of training iterations increases.
The generation module 440 is configured to select, according to the semantic similarity, either the word or the word at the same position in the target translation sentence to generate the word at the next position.
In some embodiments, the generation module 440 is configured to generate the word at the next position according to the word when the semantic similarity is higher than a threshold, and to generate the word at the next position according to the word at the same position in the target translation sentence when the semantic similarity is lower than the threshold.
In some embodiments, the threshold increases as the number of training iterations increases.
In some embodiments, the generation module 440 is configured to select, according to the semantic similarity, either the word or the word at the same position in the target translation sentence as the input word, and to input the state output by the decoding module corresponding to the word, together with the word vector of the input word, into the decoding module corresponding to the word at the next position to obtain the output word at the next position.
The training module 450 is configured to train the translation model according to the difference between the translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
In some embodiments, the apparatus 40 further includes a translation module 460 configured to input the sentence to be translated into the trained translation model to obtain the translated sentence of the sentence to be translated.
In some embodiments, the translation module 460 is configured to input the sentence to be translated into the encoder of the translation model to obtain the feature vector of the sentence to be translated, and to input the feature vector of the sentence to be translated into the decoder of the translation model; to select, according to the probability values of the words at each position output by the decoder, a preset number of words as multiple candidate words for each position; to generate, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; to combine the candidate words generated by the decoder at the respective positions into multiple candidate translation sentences, where the generation of the words within each candidate translation sentence is associated; and to select one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
In some embodiments, the translation module 460 is configured, for each word at each position output by the decoder, to determine a selection probability value of the word according to the probability value of the word and the probability values of the preceding words associated with generating it, and to select, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words generated by the decoder for each position.
In some embodiments, the translation module 460 is configured, for each candidate translation sentence generated by the decoder, to determine the probability value of the candidate translation sentence according to the probability values of the words in it, and to select the candidate translation sentence with the highest probability value as the translation of the sentence to be translated.
The translation model processing apparatuses in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, described below in conjunction with FIG. 5 and FIG. 6.
FIG. 5 is a structural diagram of some embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 5, the apparatus 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute the translation model processing method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 is a structural diagram of other embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 6, the apparatus 60 of this embodiment includes: a memory 610 and a processor 620, similar to the memory 510 and the processor 520, respectively. It may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, and 650, as well as the memory 610 and the processor 620, may be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure further provides a computer program including instructions that, when executed by a processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (16)

  1. A translation model processing method, comprising:
    acquiring multiple groups of training sentences, wherein each group of training sentences includes an original sentence and a target translation sentence;
    for each group of training sentences, inputting the original sentence into an encoder of a translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into a decoder of the translation model;
    for each word generated by the decoder at each position except the end of the sentence, determining a semantic similarity between a generated sentence composed of the word and the words before the word and the target translation sentence;
    selecting, according to the semantic similarity, the word or a word at the same position as the word in the target translation sentence to generate a word at a next position; and
    training the translation model according to a difference between a translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
  2. The processing method according to claim 1, wherein the determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence comprises:
    taking the generated sentence composed of the word and the words before the word and the target translation sentence as a sentence pair, and inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and
    inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  3. The processing method according to claim 1, wherein the determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence comprises:
    generating a random number and comparing the random number with a reference value, wherein the reference value lies within a value range of the random number; and
    determining whether the random number is smaller than the reference value, and in a case where the random number is smaller than the reference value, determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence.
  4. The processing method according to claim 3, wherein
    the reference value increases as a number of training iterations increases.
  5. The processing method according to claim 1, wherein the selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence to generate the word at the next position comprises:
    in a case where the semantic similarity is higher than a threshold, generating the word at the next position according to the word; and
    in a case where the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position as the word in the target translation sentence.
  6. The processing method according to claim 5, wherein
    the threshold increases as a number of training iterations increases.
  7. The processing method according to claim 1, wherein the decoder includes a plurality of decoding modules, and the selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence to generate the word at the next position comprises:
    selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence as an input word; and
    inputting a state output by the decoding module corresponding to the word and a word vector of the input word into the decoding module corresponding to the word at the next position, to obtain the output word at the next position.
  8. The processing method according to any one of claims 1-7, further comprising:
    inputting a sentence to be translated into the trained translation model to obtain a translated sentence of the sentence to be translated.
  9. The processing method according to claim 8, wherein the inputting the sentence to be translated into the trained translation model to obtain the corresponding translated sentence comprises:
    inputting the sentence to be translated into the encoder of the translation model to obtain a feature vector of the sentence to be translated, and inputting the feature vector of the sentence to be translated into the decoder of the translation model;
    selecting, according to probability values of words at each position output by the decoder, a preset number of words as a plurality of candidate words for each position;
    generating, according to the plurality of candidate words at each position, a plurality of candidate words for a next position of each position until the end of the sentence is reached, wherein the number of candidate words at each position is the same;
    combining the candidate words generated by the decoder at the respective positions into a plurality of candidate translation sentences, wherein generation of the words within each candidate translation sentence is associated; and
    selecting one candidate translation sentence according to probability values of the respective candidate translation sentences as the translated sentence of the sentence to be translated.
  10. The processing method according to claim 9, wherein the selecting, according to the probability values of the words at each position output by the decoder, the preset number of words as the plurality of candidate words for each position comprises:
    for each word at each position output by the decoder, determining a selection probability value of the word according to the probability value of the word and probability values of preceding words associated with generating the word; and
    selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the plurality of candidate words for each position generated by the decoder.
  11. The processing method according to claim 9, wherein the selecting one candidate translation sentence according to the probability values of the respective candidate translation sentences as the translated sentence of the sentence to be translated comprises:
    for each candidate translation sentence generated by the decoder, determining the probability value of the candidate translation sentence according to the probability values of the words in the candidate translation sentence; and
    selecting the candidate translation sentence with the highest probability value as the translated sentence of the sentence to be translated.
  12. A translation model processing apparatus, comprising:
    an acquisition module configured to acquire multiple groups of training sentences, wherein each group of training sentences includes an original sentence and a target translation sentence;
    an input module configured to, for each group of training sentences, input the original sentence into an encoder of a translation model to obtain a feature vector of the original sentence, and input the feature vector of the original sentence into a decoder of the translation model;
    a determination module configured to, for each word generated by the decoder at each position except the end of the sentence, determine a semantic similarity between a generated sentence composed of the word and the words before the word and the target translation sentence;
    a generation module configured to select, according to the semantic similarity, the word or a word at the same position as the word in the target translation sentence to generate a word at a next position; and
    a training module configured to train the translation model according to a difference between a translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
  13. The processing apparatus according to claim 12, further comprising:
    a translation module configured to input a sentence to be translated into the trained translation model to obtain a translated sentence of the sentence to be translated.
  14. A translation model processing apparatus, comprising:
    a processor; and
    a memory coupled to the processor and configured to store instructions that, when executed by the processor, cause the processor to execute the translation model processing method according to any one of claims 1-11.
  15. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-11.
  16. A computer program, comprising:
    instructions that, when executed by a processor, cause the processor to execute the translation model processing method according to any one of claims 1-11.
PCT/CN2023/073853 2022-02-18 2023-01-30 Method and apparatus for processing translation model, and computer-readable storage medium WO2023155676A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210150760.5 2022-02-18
CN202210150760.5A CN114595701A (en) 2022-02-18 2022-02-18 Translation model processing method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023155676A1 true WO2023155676A1 (en) 2023-08-24

Family

ID=81806767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/073853 WO2023155676A1 (en) 2022-02-18 2023-01-30 Method and apparatus for processing translation model, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114595701A (en)
WO (1) WO2023155676A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595701A (en) * 2022-02-18 2022-06-07 北京沃东天骏信息技术有限公司 Translation model processing method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
US20180373704A1 (en) * 2017-06-21 2018-12-27 Samsung Electronics Co., Ltd. Method and apparatus for machine translation using neural network and method of training the apparatus
US20190057081A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. Method and apparatus for generating natural language
CN111222347A (en) * 2020-04-15 2020-06-02 北京金山数字娱乐科技有限公司 Sentence translation model training method and device and sentence translation method and device
CN114595701A (en) * 2022-02-18 2022-06-07 北京沃东天骏信息技术有限公司 Translation model processing method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN114595701A (en) 2022-06-07

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23755680

Country of ref document: EP

Kind code of ref document: A1