WO2023155676A1 - Method and apparatus for processing translation model, and computer-readable storage medium - Google Patents

Method and apparatus for processing translation model, and computer-readable storage medium Download PDF

Info

Publication number
WO2023155676A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
sentence
translation
words
decoder
Prior art date
Application number
PCT/CN2023/073853
Other languages
French (fr)
Chinese (zh)
Inventor
张海楠
陈宏申
邹炎炎
丁卓冶
龙波
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2023155676A1 publication Critical patent/WO2023155676A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a translation model processing method and apparatus and a computer-readable storage medium.
  • Autoregressive models (ARM) are widely used in natural language generation (NLG) tasks, such as machine translation, dialogue reply generation, image captioning, and video description generation, using an encoder-decoder framework to predict the next word conditioned on the sentence generated so far.
  • According to some embodiments of the present disclosure, a translation model processing method includes: acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; for each group of training sentences, inputting the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into the decoder of the translation model; for the word generated by the decoder at each position except the end of the sentence, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: taking the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair; inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  • In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: generating a random number and comparing it with a reference value, where the reference value is within the value range of the random number; and determining whether the random number is smaller than the reference value, and if it is, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
  • In some embodiments, the reference value increases as the number of training iterations increases.
  • In some embodiments, selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: when the semantic similarity is higher than a threshold, generating the word at the next position according to that word; and when the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position in the target translation sentence.
  • In some embodiments, the threshold increases as the number of training iterations increases.
  • In some embodiments, the decoder includes multiple decoding modules, and selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence as the input word; and inputting the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
  • In some embodiments, the method further includes: inputting the sentence to be translated into the trained translation model to obtain the translation of the sentence to be translated.
  • In some embodiments, inputting the sentence to be translated into the trained translation model to obtain the corresponding translation includes: inputting the sentence to be translated into the encoder of the translation model to obtain its feature vector, and inputting the feature vector into the decoder of the translation model; selecting, according to the probability value of each word at each position output by the decoder, a preset number of words as multiple candidate words for each position; generating, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; forming multiple candidate translation sentences from the candidate words generated by the decoder at each position, where the generation of the words within each candidate translation sentence is associated; and selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • In some embodiments, selecting a preset number of words as the multiple candidate words for each position includes: for each word at each position output by the decoder, determining a selection probability value for the word according to the probability value of that word and the probability values of the previous words associated with generating it; and selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for that position.
  • In some embodiments, selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated includes: for each candidate translation sentence generated by the decoder, determining its probability value according to the probability values of the words in it; and selecting the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
  • According to other embodiments of the present disclosure, a translation model processing device includes: an acquisition module for acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; an input module for inputting, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence and inputting the feature vector into the decoder of the translation model; a determination module for determining, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; a generation module for selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and a training module for training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, the device further includes: a translation module configured to input the sentence to be translated into the trained translation model to obtain the corresponding translation.
  • According to still other embodiments of the present disclosure, a translation model processing device includes: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
  • According to further embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where, when the program is executed by a processor, the steps of the translation model processing method of any of the foregoing embodiments are implemented.
  • Fig. 1 shows a schematic flowchart of a translation model processing method in some embodiments of the present disclosure.
  • Fig. 2 shows a schematic structural diagram of a translation model of some embodiments of the present disclosure.
  • Fig. 3 shows a schematic flowchart of a translation model processing method in some other embodiments of the present disclosure.
  • Fig. 4 shows a schematic structural diagram of a translation model processing device in some embodiments of the present disclosure.
  • Fig. 5 shows a schematic structural diagram of a translation model processing device according to other embodiments of the present disclosure.
  • Fig. 6 shows a schematic structural diagram of a translation model processing device according to some other embodiments of the present disclosure.
  • In the related art, the data input to the decoder of a translation model during training differs from the data input during testing or use, so the trained translation model cannot translate accurately when tested or used.
  • To address this, the present disclosure provides a translation model processing method, described below in conjunction with FIGS. 1-3.
  • Fig. 1 is a flow chart of some embodiments of the translation model processing method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes: steps S102-S110.
  • In step S102, multiple groups of training sentences are obtained.
  • Each group of training sentences includes an original sentence and a target translation sentence; for example, the original sentence is a Chinese sentence and the target translation sentence is an English sentence with the same meaning.
  • In step S104, for each group of training sentences, the original sentence is input into the encoder of the translation model to obtain a feature vector of the original sentence, and the feature vector of the original sentence is input into the decoder of the translation model.
  • A translation model includes an encoder and a decoder. The translation model is, for example, a Seq2Seq (sequence-to-sequence) model or a Transformer model, and is not limited to these examples.
  • The encoder may include multiple encoding modules. For example, each LSTM (Long Short-Term Memory) module of the encoder part of a Seq2Seq model serves as an encoding module, or the multiple encoders of a Transformer model each serve as an encoding module.
  • The decoder may include multiple decoding modules. For example, each LSTM module of the decoder part of a Seq2Seq model serves as a decoding module, or the multiple decoders of a Transformer model each serve as a decoding module.
  • For each group of training sentences, the original sentence is input into the encoder to obtain the feature vector, which is then input into the decoder.
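As an illustration of the encoder-decoder structure described above, the following is a minimal PyTorch sketch of an LSTM-based Seq2Seq translation model. It is not taken from the patent; the module sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; dimensions are illustrative."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_in_ids):
        # Encoder: original sentence -> feature vector (final hidden state).
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder: consumes the encoder state plus the words fed in so far.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)  # per-position logits over the target vocabulary
```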
  • In step S106, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined.
  • The decoder can generate multiple words, each with a probability, for every position except the end of the translation sentence, and the word with the highest probability is selected as the generated word for that position.
  • Each position in the translation sentence generated by the decoder refers to the position corresponding to a word.
  • For example, the original sentence is denoted X = {x_1, …, x_N}, the generated translation sentence is denoted Y = {y_1, …, y_T}, and the target translation sentence is denoted Y′ = {y′_1, …, y′_M}. As shown in Fig. 2, the decoding modules in the decoder generate the words y_1, …, y_T at the different positions of the translated sentence.
  • Denoting the word generated by the decoder at the current position as y_t, the generated sentence composed of that word and the words before it is the sentence formed by y_1, …, y_t, and the semantic similarity between this sentence and Y′ is determined.
  • In some embodiments, the generated sentence composed of that word and the words before it, together with the target translation sentence, is taken as a sentence pair and input into a BERT (Bidirectional Encoder Representations from Transformers) model to obtain a feature vector of the sentence pair; the feature vector of the sentence pair is then input into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence. Before the sentence pair is input into the BERT model, a special identifier can be added at the beginning and a separator added between the two sentences; the vector corresponding to the special identifier in the BERT output is taken as the feature vector of the sentence pair, and this vector is input into a sigmoid module to obtain the semantic similarity. The BERT model may be pre-trained.
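The following sketch shows how the sentence-pair similarity described above might be computed with the Hugging Face transformers library. The checkpoint name and the linear-plus-sigmoid head are illustrative assumptions; the patent does not prescribe a specific implementation.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
sim_head = torch.nn.Linear(bert.config.hidden_size, 1)  # assumed trainable scoring head

def semantic_similarity(generated: str, target: str) -> float:
    # The tokenizer adds the special identifier ([CLS]) at the start and a
    # separator ([SEP]) between the two sentences, as described above.
    inputs = tokenizer(generated, target, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    cls_vec = hidden[:, 0]  # vector of the special identifier = sentence-pair feature
    return torch.sigmoid(sim_head(cls_vec)).item()  # similarity in (0, 1)
```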
  • In step S108, either that word or the word at the same position in the target translation sentence is selected according to the semantic similarity to generate the word at the next position. If the end of the sentence has been reached, the steps of determining the semantic similarity and generating the word at the next position need not be performed.
  • In some embodiments, when the semantic similarity is higher than a threshold, the word at the next position is generated according to that word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the target translation sentence.
  • The selection module selects between the word at the current position and the word at the same position in the target translation sentence: if the semantic similarity between the sentence formed by y_1, …, y_t and Y′ is higher than the threshold, y_t is input into the decoding module corresponding to y_{t+1}; otherwise, y′_t is input, and y_{t+1} is then generated.
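A minimal sketch of this per-step selection is shown below. It reuses the hypothetical `semantic_similarity` helper sketched above; the threshold schedule is discussed next.

```python
def choose_next_input(generated_prefix, target_words, threshold):
    """Pick the word fed to the decoding module that generates y_{t+1}.

    generated_prefix: the words y_1..y_t produced by the decoder so far
    target_words:     the target translation sentence y'_1..y'_M
    """
    t = len(generated_prefix)  # current position (1-based)
    sim = semantic_similarity(" ".join(generated_prefix), " ".join(target_words))
    if sim > threshold:
        return generated_prefix[-1]  # the decoder's own word y_t
    return target_words[t - 1]       # the ground-truth word y'_t
```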
  • In some embodiments, if there are multiple target translation sentences, the semantic similarity between the generated sentence and each target translation sentence is determined; the highest of these similarities is taken as the semantic similarity corresponding to the word, and the target translation sentence with the highest similarity is taken as a reference sentence. When the semantic similarity is higher than the threshold, the word at the next position is generated according to that word; when it is lower than the threshold, the word at the next position is generated according to the word at the same position in the reference sentence.
  • In some embodiments, the threshold increases as the number of training iterations increases. Because the sentences generated by the model at the beginning of training have relatively low similarity to the target translation sentences, the threshold is set lower at that stage so that words generated by the decoder can be introduced into the training process; as the number of training iterations grows, the model becomes more accurate and the similarity between the generated sentences and the target translation sentences increases, so raising the threshold selects more accurate and reasonable decoder-generated words to introduce into the training process.
  • In some embodiments, the threshold can be represented by the following formula:
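The formula image itself did not survive extraction. One plausible reconstruction, consistent with the stated properties (a floor ε greater than 0 at the start of training, growing toward 1 as the number of training iterations n increases), is an inverse-sigmoid-style schedule; this is an assumption, not the patent's verbatim formula:

```latex
% Hypothetical reconstruction of formula (1).
\tau_n = \max\!\left(\varepsilon,\ \frac{e^{n/k}}{k + e^{n/k}}\right) \qquad (1)
```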
  • where k and ε are hyperparameters, k ≥ 1, which determines the convergence speed, and n represents the number of training iterations.
  • Formula (1) makes the threshold a specific value greater than 0 at the beginning of training, ensuring that the words introduced into the training process are of higher quality.
  • In step S110, the translation model is trained according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, a cross-entropy loss function is calculated from the generated translation sentence and the target translation sentence, and the translation model is trained according to the cross-entropy loss function.
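A sketch of the training objective follows: a standard token-level cross-entropy step, with teacher-forced inputs shown for brevity (in the method above, the per-position selection of steps S106-S108 would decide which words appear in `tgt_in_ids`). All names are illustrative.

```python
import torch.nn.functional as F

def training_step(model, optimizer, src_ids, tgt_in_ids, tgt_out_ids):
    logits = model(src_ids, tgt_in_ids)  # (batch, positions, vocab)
    # Loss measures the difference between the generated translation and the target.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt_out_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```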
  • In the method of the above embodiment, when training the translation model, for each word generated by the decoder, whether to introduce that word into the training process to generate the word at the next position of the translated sentence is determined according to the semantic similarity between the generated sentence containing the word and the target translation sentence.
  • In this way, the content generated by the decoder can be gradually integrated into the training process, reducing the difference between the training process and the testing or use process; selecting the decoder-generated words according to the semantic similarity between the generated sentence and the target translation sentence makes the selected words more realistic, reasonable, and accurate, thereby improving the efficiency and accuracy of translation model training.
  • In addition, the words selected by the method of the above embodiment are more reasonable and accurate.
  • Introducing inaccurate decoder-generated words early in training can, however, reduce the training convergence speed, so the present disclosure also improves the training process, as described below in conjunction with Fig. 3.
  • In some embodiments, step S106 includes performing steps S302 to S308 for the word generated by the decoder at each position except the end of the sentence.
  • In step S302, a random number m is generated.
  • m may be a random number ranging from 0 to 1.
  • In step S304, the random number m is compared with the reference value u; if m is smaller than u, step S306 is executed; otherwise, step S308 is executed.
  • The reference value u can be determined, for example, using the following formula:
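The expression was likewise an image that did not survive extraction. A plausible form, consistent with a value that starts at or near 0 and grows toward 1 with the number of training iterations n, is shown below as an assumption:

```latex
% Hypothetical reconstruction of the reference-value schedule.
u_n = \frac{e^{n/k}}{k + e^{n/k}}
```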
  • where k ≥ 1 is a hyperparameter that determines the convergence speed, and n represents the number of training iterations.
  • In step S306, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined. Step S108 can then be executed.
  • In step S308, the word at the next position is generated according to the word at the same position in the target translation sentence.
  • Step S110 may be executed after step S108 or step S308.
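Putting steps S302 to S308 together, the gated selection can be sketched as follows, reusing the hypothetical helpers above:

```python
import random

def gated_next_input(generated_prefix, target_words, threshold, u):
    """Steps S302-S308: consult the similarity scorer only when m < u."""
    m = random.random()  # S302: random number in [0, 1)
    if m < u:            # S304: compare with the reference value u
        # S306 + S108: similarity-based choice between y_t and y'_t.
        return choose_next_input(generated_prefix, target_words, threshold)
    # S308: use the ground-truth word y'_t at the current position.
    return target_words[len(generated_prefix) - 1]
```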
  • Both the threshold and the reference value increase with the number of training iterations, and for both the growth rate accelerates once a certain number of iterations is reached; after that point, however, the threshold increases relatively steadily with the number of iterations while the reference value increases rapidly. The lowest value of the threshold is also higher than the lowest value of the reference value.
  • The maximum of both the threshold and the reference value is a value close to 1, but the threshold should be set to a specific value greater than 0 at the beginning of training, reducing the probability of introducing poor-quality generated words into the training process, while the reference value can be set to 0 at the beginning and kept close to 0 while the number of training iterations is small, which likewise reduces the probability of introducing poor-quality generated words into the training process.
  • In other embodiments, the reference value can instead be set to decrease as the number of training iterations increases; in that case it is judged whether the random number is greater than the reference value, and if so, step S306 is performed; otherwise, step S308 is performed.
  • The trend of this reference value is opposite to that described above and is not repeated here.
  • By setting the random number and the reference value, the method of the above embodiment can reduce the probability of introducing inaccurate, unreasonable, low-quality decoder-generated words into the training process in the early stage of training, avoiding a reduction in training convergence speed and thereby improving training efficiency and accuracy.
  • In some embodiments, after training, the sentence to be translated is input into the trained translation model to obtain its translation.
  • In some embodiments, the sentence to be translated is input into the encoder of the translation model to obtain its feature vector, and the feature vector is input into the decoder of the translation model. According to the probability value of each word at each position output by the decoder, a preset number of words are selected as multiple candidate words for each position; from the multiple candidate words at each position, multiple candidate words for the next position are generated until the end of the sentence is reached, where the number of candidate words at each position is the same. The candidate words generated by the decoder at each position form multiple candidate translation sentences, and one candidate translation sentence is selected according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • The number of candidate words at each position is the same; for example, three candidate words are generated for each position, and the generation of the words within each candidate translation sentence is associated.
  • That is, each word is generated based on the word at the previous position, so the generation of a word is associated with the word at the previous position.
  • In some embodiments, the selection probability value of a word is determined according to the probability value of that word and the probability values of the previous words associated with generating it; according to the selection probability values of the words at each position output by the decoder, a preset number of words are selected as the multiple candidate words for each position generated by the decoder.
  • For example, the words at each position may be sorted by selection probability value in descending order and a preset number of the top-ranked words taken as the multiple candidate words for that position. For each position, the multiple candidate words are respectively input into the decoding module corresponding to the next position to obtain the multiple candidate words for the next position.
  • For example, the words generated by the decoder for the first position include y_11, y_12, y_13, y_14, y_15, and so on; these can be sorted by probability value from largest to smallest and a preset number of the top-ranked words selected, for example the three words with the highest probability values, say y_11, y_13, and y_15.
  • When the decoder generates the words at the second position, it inputs y_11, y_13, and y_15 into the decoding module corresponding to the second position, respectively, obtaining for each a set of second-position words, for example y_21 and y_22 corresponding to y_11.
  • For each candidate translation sentence, its probability value is determined according to the probability values of the words in it, and the candidate translation sentence with the largest probability value is selected as the translation of the sentence to be translated.
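The candidate-word procedure described above is essentially beam search. A compact sketch follows; `step_probs`, a function returning the decoder's next-word probabilities for a given prefix, is an illustrative stand-in rather than an API from the patent.

```python
import math

def beam_search(step_probs, beam_size=3, max_len=50, eos="</s>"):
    """Keep the same preset number of candidate sentences at every position."""
    beams = [([], 0.0)]  # (words so far, cumulative log-probability)
    for _ in range(max_len):
        expanded = []
        for words, score in beams:
            if words and words[-1] == eos:
                expanded.append((words, score))  # finished candidate sentence
                continue
            for word, p in step_probs(words).items():
                # A candidate's probability is the product of its word
                # probabilities; log-probabilities are summed for stability.
                expanded.append((words + [word], score + math.log(p)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
        if all(w and w[-1] == eos for w, _ in beams):
            break
    return beams[0][0]  # the candidate translation with the largest probability
```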
  • The method of the above embodiment can improve the accuracy of translation sentence generation and the richness of its content, thereby improving translation quality.
  • The present disclosure also provides an apparatus for processing a translation model, described below with reference to Fig. 4.
  • Fig. 4 is a structural diagram of some embodiments of a translation model processing device of the present disclosure.
  • The device 40 of this embodiment includes: an acquisition module 410, an input module 420, a determination module 430, a generation module 440, and a training module 450.
  • The obtaining module 410 is used to obtain multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence.
  • The input module 420 is used to input, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and to input the feature vector into the decoder of the translation model.
  • The determination module 430 is used to determine, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
  • In some embodiments, the determination module 430 is configured to take the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair, input the sentence pair into the Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair, and input the feature vector into the activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  • In some embodiments, the determining module 430 is configured to generate a random number and compare it with a reference value, where the reference value is within the value range of the random number, and to determine whether the random number is smaller than the reference value; if it is, the semantic similarity between the generated sentence consisting of that word and the words preceding it and the target translation sentence is determined.
  • In some embodiments, the reference value increases as the number of training iterations increases.
  • The generating module 440 is used to select either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position.
  • In some embodiments, the generating module 440 is configured to generate the word at the next position according to that word if the semantic similarity is higher than the threshold, and to generate the word at the next position according to the word at the same position in the target translation sentence if the semantic similarity is lower than the threshold.
  • In some embodiments, the threshold increases as the number of training iterations increases.
  • In some embodiments, the generation module 440 is configured to select either that word or the word at the same position in the target translation sentence according to the semantic similarity as the input word, and to input the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
  • The training module 450 is used to train the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
  • In some embodiments, the device 40 further includes: a translation module 460 configured to input the sentence to be translated into the trained translation model to obtain its translation.
  • In some embodiments, the translation module 460 is configured to input the sentence to be translated into the encoder of the translation model to obtain its feature vector and to input the feature vector into the decoder of the translation model; to select, according to the probability value of each word at each position output by the decoder, a preset number of words as the multiple candidate words for each position; to generate, from the multiple candidate words at each position, the multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; to form multiple candidate translation sentences from the candidate words generated at each position, where the generation of the words within each candidate translation sentence is associated; and to select one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
  • In some embodiments, the translation module 460 is configured to determine, for each word at each position output by the decoder, the selection probability value of the word according to the probability value of that word and the probability values of the previous words associated with generating it, and to select, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for each position generated by the decoder.
  • In some embodiments, the translation module 460 is configured to determine, for each candidate translation sentence generated by the decoder, the probability value of the candidate translation sentence according to the probability values of the words in it, and to select the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
  • The translation model processing devices in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, as described below in conjunction with Fig. 5 and Fig. 6.
  • Fig. 5 is a structural diagram of some embodiments of a translation model processing device of the present disclosure.
  • The device 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, where the processor 520 is configured to execute the translation model processing method of any of the embodiments of the present disclosure based on instructions stored in the memory 510.
  • The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), a database, and other programs.
  • Fig. 6 is a structural diagram of other embodiments of a translation model processing device of the present disclosure.
  • The apparatus 60 of this embodiment includes: a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, as well as the memory 610 and the processor 620, may be connected through a bus 660, for example.
  • The input/output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • The network interface 640 provides connection interfaces for various networked devices; for example, it can connect to a database server or a cloud storage server.
  • The storage interface 650 provides connection interfaces for external storage devices such as SD cards and USB flash drives.
  • The present disclosure also provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
  • The embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to the technical field of computers, and in particular to a method and apparatus for processing a translation model, and a computer-readable storage medium. The method of the present disclosure comprises: acquiring a plurality of groups of training statements, wherein each group of training statements comprises an original statement and a target translation statement; for each group of training statements, inputting the original statement into an encoder of a translation model, so as to obtain a feature vector of the original statement, and inputting the feature vector of the original statement into a decoder of the translation model; for a word at each position except the tail of a statement which is generated by the decoder, determining the semantic similarity between a generated statement, which is formed by said word and various words prior to said word, and the target translation statement; according to the semantic similarity, selecting said word or a word in the target translation statement that is at the same position as said word, so as to generate a word at the next position; and training the translation model according to the difference between a translation statement, which is formed by words that are generated by the decoder and located at various positions, and the target translation statement.

Description

Method and apparatus for processing a translation model, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. 202210150760.5, filed with the China Patent Office on February 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a translation model processing method and apparatus and a computer-readable storage medium.
Background
Autoregressive models (ARM) are widely used in natural language generation (NLG) tasks, such as machine translation, dialogue reply generation, image captioning, and video description generation, using an encoder-decoder framework to predict the next word conditioned on the sentence generated so far.
In machine translation scenarios, when an ARM is trained, the real translation sentence is used as the already-generated sentence, which forces the model to directly learn the distribution of real translation sentences. At test or use time, however, the already-generated phrase comes from the ARM's own decoder, which differs from the input distribution seen during training.
Summary of the Invention
According to some embodiments of the present disclosure, a translation model processing method is provided, including: acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; for each group of training sentences, inputting the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into the decoder of the translation model; for the word generated by the decoder at each position except the end of the sentence, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: taking the generated sentence composed of that word and the words before it, together with the target translation sentence, as a sentence pair; inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
In some embodiments, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence includes: generating a random number and comparing it with a reference value, where the reference value is within the value range of the random number; and determining whether the random number is smaller than the reference value, and if it is, determining the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
In some embodiments, the reference value increases as the number of training iterations increases.
In some embodiments, selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: when the semantic similarity is higher than a threshold, generating the word at the next position according to that word; and when the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position in the target translation sentence.
In some embodiments, the threshold increases as the number of training iterations increases.
In some embodiments, the decoder includes multiple decoding modules, and selecting either that word or the word at the same position in the target translation sentence according to the semantic similarity to generate the word at the next position includes: selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence as the input word; and inputting the state output by the decoding module corresponding to that word, together with the word vector of the input word, into the decoding module corresponding to the next position to obtain the word at the next position.
In some embodiments, the method further includes: inputting the sentence to be translated into the trained translation model to obtain the translation of the sentence to be translated.
In some embodiments, inputting the sentence to be translated into the trained translation model to obtain the corresponding translation includes: inputting the sentence to be translated into the encoder of the translation model to obtain its feature vector, and inputting the feature vector into the decoder of the translation model; selecting, according to the probability value of each word at each position output by the decoder, a preset number of words as multiple candidate words for each position; generating, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; forming multiple candidate translation sentences from the candidate words generated by the decoder at each position, where the generation of the words within each candidate translation sentence is associated; and selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
In some embodiments, selecting a preset number of words as the multiple candidate words for each position includes: for each word at each position output by the decoder, determining a selection probability value for the word according to the probability value of that word and the probability values of the previous words associated with generating it; and selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words for that position.
In some embodiments, selecting one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated includes: for each candidate translation sentence generated by the decoder, determining its probability value according to the probability values of the words in it; and selecting the candidate translation sentence with the largest probability value as the translation of the sentence to be translated.
According to other embodiments of the present disclosure, a translation model processing device is provided, including: an acquisition module for acquiring multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence; an input module for inputting, for each group of training sentences, the original sentence into the encoder of the translation model to obtain a feature vector of the original sentence and inputting the feature vector into the decoder of the translation model; a determination module for determining, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence; a generation module for selecting, according to the semantic similarity, either that word or the word at the same position in the target translation sentence to generate the word at the next position; and a training module for training the translation model according to the difference between the translation sentence composed of the words generated by the decoder at each position and the target translation sentence.
In some embodiments, the device further includes: a translation module configured to input the sentence to be translated into the trained translation model to obtain the corresponding translation.
According to still other embodiments of the present disclosure, a translation model processing device is provided, including: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
According to yet other embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where, when the program is executed by a processor, the steps of the translation model processing method of any of the foregoing embodiments are implemented.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on them without creative effort.
Fig. 1 shows a schematic flowchart of a translation model processing method according to some embodiments of the present disclosure.
Fig. 2 shows a schematic structural diagram of a translation model according to some embodiments of the present disclosure.
Fig. 3 shows a schematic flowchart of a translation model processing method according to other embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a translation model processing device according to some embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a translation model processing device according to other embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a translation model processing device according to still other embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the related art, the data input to the decoder of a translation model during training differs from the data input during testing or use, so the trained translation model cannot translate accurately when tested or used.
The present disclosure provides a translation model processing method, described below in conjunction with Figs. 1 to 3.
Fig. 1 is a flowchart of some embodiments of the translation model processing method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes steps S102 to S110.
In step S102, multiple groups of training sentences are acquired.
Each group of training sentences includes an original sentence and a target translation sentence; for example, the original sentence is a Chinese sentence and the target translation sentence is an English sentence with the same meaning.
In step S104, for each group of training sentences, the original sentence is input into the encoder of the translation model to obtain a feature vector of the original sentence, and the feature vector of the original sentence is input into the decoder of the translation model.
The translation model includes an encoder and a decoder. The translation model is, for example, a Seq2Seq (sequence-to-sequence) model or a Transformer model, and is not limited to these examples. The encoder may include multiple encoding modules; for example, each LSTM (Long Short-Term Memory) module of the encoder part of a Seq2Seq model serves as an encoding module, or the multiple encoders of a Transformer model each serve as an encoding module. The decoder may include multiple decoding modules; for example, each LSTM module of the decoder part of a Seq2Seq model serves as a decoding module, or the multiple decoders of a Transformer model each serve as a decoding module. For each group of training sentences, the original sentence is input into the encoder to obtain the feature vector, which is then input into the decoder.
In step S106, for the word generated by the decoder at each position except the end of the sentence, the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence is determined.
The decoder can generate multiple words, each with a probability, for every position except the end of the translation sentence, and the word with the highest probability is selected as the generated word for that position. Each position in the translation sentence generated by the decoder refers to the position corresponding to a word.
For example, the original sentence is denoted X = {x_1, …, x_N}, the generated translation sentence is denoted Y = {y_1, …, y_T}, and the target translation sentence is denoted Y′ = {y′_1, …, y′_M}. As shown in Fig. 2, the decoding modules in the decoder generate the words y_1, …, y_T at the different positions of the translated sentence. Denoting the word generated at the current position as y_t, the generated sentence composed of that word and the words before it is the sentence formed by y_1, …, y_t, and the semantic similarity between this sentence and Y′ is determined.
In some embodiments, the generated sentence composed of that word and the words before it, together with the target translation sentence, is taken as a sentence pair and input into a BERT (Bidirectional Encoder Representations from Transformers) model to obtain a feature vector of the sentence pair; the feature vector of the sentence pair is then input into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence. Before the sentence pair is input into the BERT model, a special identifier can be added at the beginning and a separator added between the two sentences; the vector corresponding to the special identifier in the BERT output is taken as the feature vector of the sentence pair, and this feature vector is input into a sigmoid module to obtain the semantic similarity between the generated sentence and the target translation sentence. The BERT model may be pre-trained.
In step S108, according to the semantic similarity, either this word or the word at the same position in the target translation sentence is selected to generate the word at the next position. If the end of the sentence has been reached, the steps of determining the semantic similarity and generating the word at the next position need not be performed.
In some embodiments, when the semantic similarity is higher than a threshold, the word at the next position is generated according to this word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the target translation sentence.
In some embodiments, according to the semantic similarity, either this word or the word at the same position in the target translation sentence is selected as the input word; the state output by the decoding module corresponding to this word, together with the word vector of the input word, is input into the decoding module corresponding to the word at the next position, yielding the output word at the next position. As shown in FIG. 2, a selection module chooses between the word at the current position and the word at the same position in the target translation sentence: if the semantic similarity between the sentence composed of y1, …, yt and Y′ is higher than the threshold, yt is input into the decoding module corresponding to yt+1; otherwise, y′t is input into the decoding module corresponding to yt+1, which then generates yt+1.
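A minimal sketch of this selection step, reusing the semantic_similarity helper sketched above; the decode_step interface and all names are illustrative assumptions:

def select_and_decode(prefix_words, y_t, target_word_t, target_sentence,
                      state, sim_threshold, embed, decode_step):
    # Compare the generated prefix y1..yt against the full target Y'.
    sim = semantic_similarity(" ".join(prefix_words), target_sentence)
    # Keep the decoder's word if the prefix is close enough to the target;
    # otherwise substitute the target word at the same position.
    input_word = y_t if sim > sim_threshold else target_word_t
    # Feed the current module's state and the chosen word's embedding into
    # the decoding module for position t+1.
    return decode_step(embed(input_word), state)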
In some embodiments, if there are multiple target translation sentences, the semantic similarity between the generated sentence and each target translation sentence is determined, the highest of these similarities is taken as the semantic similarity corresponding to this word, and the target translation sentence corresponding to the highest value is taken as a reference sentence. When the semantic similarity is higher than the threshold, the word at the next position is generated according to this word; when the semantic similarity is lower than the threshold, the word at the next position is generated according to the word at the same position in the reference sentence.
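For the multi-reference case, a minimal sketch (again reusing the semantic_similarity helper; names are illustrative):

def best_reference(generated_prefix, target_sentences):
    # Score the generated prefix against every target translation and keep
    # the highest similarity together with its reference sentence.
    sims = [semantic_similarity(generated_prefix, t) for t in target_sentences]
    best = max(range(len(target_sentences)), key=lambda i: sims[i])
    return sims[best], target_sentences[best]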
In some embodiments, the threshold increases as the number of training iterations increases. Since the sentences the model generates at the beginning of training have relatively low similarity to the target translation sentences, the threshold is set lower at that stage so that words generated by the decoder can still be introduced into the training process. As the number of training iterations increases, the model becomes more accurate and the similarity between generated and target sentences increases; raising the threshold then ensures that only the more accurate and reasonable words generated by the decoder are introduced into the training process.
In some embodiments, the threshold can be expressed by the following formula:
In formula (1), k and γ are hyperparameters, where k ≥ 1; they determine the convergence speed, and n denotes the number of training iterations. Formula (1) makes the threshold take a specific value greater than 0 at the beginning of training, ensuring that the words introduced into the training process are of relatively high quality.
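Formula (1) itself is not reproduced here. Purely as an assumed illustration, a schedule with the properties stated above (a value greater than 0 at the start of training that increases toward 1 as the number of training iterations n grows) might look like:

import math

def threshold(n, k=10.0, gamma=0.3):
    # Assumed inverse-sigmoid-style schedule, not the patent's formula (1):
    # starts above 0 at n = 0 and rises toward 1 as n increases.
    assert k >= 1
    return 1.0 - (1.0 - gamma) * k / (k + math.exp(n / k))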
In step S110, the translation model is trained according to the difference between the translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
After the decoder has generated a complete translated sentence, a cross-entropy loss function can be computed from the generated translated sentence and the target translation sentence, and the translation model is trained according to the cross-entropy loss function.
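A minimal sketch of this training step, assuming the decoder exposes per-position vocabulary logits; names are illustrative:

import torch.nn.functional as F

def training_step(logits, target_ids, optimizer):
    # logits: (T, vocab_size) scores for the T generated positions;
    # target_ids: (T,) word indices of the target translation sentence.
    loss = F.cross_entropy(logits, target_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()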
In the above embodiments, when training the translation model, for each word generated by the decoder it is determined, according to the semantic similarity between the generated sentence containing that word and the target translation sentence, whether to introduce the word into the training process for generating the word at the next position of the translated sentence. In this way, content generated by the decoder is gradually blended into the training process, reducing the discrepancy between the training process and the testing or use process. Moreover, selecting decoder-generated words according to the semantic similarity between the generated sentence and the target translation sentence makes the selected words more realistic, reasonable, and accurate, thereby improving the efficiency and accuracy of translation model training. In addition, compared with directly comparing the similarity between the word at each position and the word at the corresponding position of the target translation sentence, the words selected by the method of the above embodiments are more reasonable and accurate.
Since the words generated at the beginning of the training process are of relatively poor quality, in order to reduce the chance that such words are introduced into training and slow down convergence, the present disclosure further improves the training process, as described below in conjunction with FIG. 3.
FIG. 3 is a flowchart of other embodiments of the translation model processing method of the present disclosure. As shown in FIG. 3, step S106 includes performing steps S302 to S308 for each word generated by the decoder at each position except the end of the sentence.
In step S302, a random number m is generated.
m may be a random number in the range 0 to 1.
In step S304, the random number m is compared with a reference value u. If m is smaller than u, step S306 is performed; otherwise, step S308 is performed.
In some embodiments, the reference value u increases as the number of training iterations increases; its minimum value may be 0 and its maximum value may be 1. When u = 0, the words of the target translation sentence are fed directly into the decoder to train the model; when u = 1, the input for model training depends entirely on the words generated by the decoder, so training proceeds just as in the testing or use phase. If u is set too low (close to 0), the decoder's input comes almost entirely from the target translation sentence, and the model cannot handle unknown words in the testing or use phase. If u is set too high (close to 1) at the beginning of training, the generated words are of poor quality because the model is not yet well trained, and using them as decoder input may lead to slow convergence. The reference value u can be determined, for example, using the following formula:
In formula (2), k ≥ 1 is a hyperparameter that determines the convergence speed, and n denotes the number of training iterations.
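As with formula (1), formula (2) is not reproduced here. An assumed schedule consistent with the stated behavior (approximately 0 at the start, staying low early, then rising quickly toward 1) could be:

import math

def reference_value(n, k=100.0):
    # Assumed inverse-sigmoid-style schedule, not the patent's formula (2):
    # approximately 0 for small n, then rising sharply toward 1.
    assert k >= 1
    return 1.0 - k / (k + math.exp(n / k))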
In step S306, the semantic similarity between the generated sentence composed of this word and the words before it and the target translation sentence is determined. Step S108 can then be performed.
In step S308, the word at the next position is generated according to the word at the same position in the target translation sentence.
Step S110 may be performed after step S108 or after step S308.
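Putting steps S302 to S308 and step S108 together, a minimal sketch of the per-position decision, reusing the semantic_similarity, threshold, and reference_value helpers sketched above (all names are illustrative):

import random

def choose_input_word(y_t, target_word_t, generated_prefix, target_sentence, n):
    m = random.random()                    # step S302: random number in [0, 1)
    if m < reference_value(n):             # step S304
        sim = semantic_similarity(" ".join(generated_prefix), target_sentence)  # step S306
        return y_t if sim > threshold(n) else target_word_t                     # step S108
    return target_word_t                   # step S308: use the target word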
Both the threshold and the reference value increase with the number of training iterations, and for both the growth rate accelerates after a certain number of iterations. However, after that point the threshold grows relatively steadily with the number of iterations, whereas the reference value grows sharply. Moreover, the minimum value of the threshold is higher than the minimum value of the reference value. This is because the maxima of both the threshold and the reference value are close to 1, but the threshold must be set to a specific value above 0 from the very start of training, reducing the probability that poor-quality generated words are introduced into the training process, whereas the reference value can start at 0 and remain close to 0 while the number of iterations is small, which likewise reduces the probability that poor-quality generated words are introduced into the training process.
The reference value can also be set to decrease as the number of training iterations increases; in that case, what needs to be judged is whether the random number is greater than the reference value, and if so, step S306 is performed; otherwise, step S308 is performed. The trend of the reference value is then the opposite of that in the above embodiments, and details are not repeated here.
In the method of the above embodiments, by using a random number and a reference value, the probability of introducing inaccurate, unreasonable, low-quality words generated by the decoder into the training process can be reduced in the early stage of training, avoiding a slowdown of training convergence and thereby improving training efficiency and accuracy.
The following describes the process of testing or using the translation model after training is completed.
In some embodiments, after the training of the translation model is completed, a sentence to be translated is input into the trained translation model to obtain a translated sentence of the sentence to be translated.
In order to further expand the selection space of generated words and improve the richness and accuracy of the content of the generated translation, in some embodiments the sentence to be translated is input into the encoder of the translation model to obtain its feature vector, and the feature vector is input into the decoder of the translation model; according to the probability values of the words at each position output by the decoder, a preset number of words are selected as multiple candidate words for that position; based on the multiple candidate words at each position, multiple candidate words for the next position are generated until the end of the sentence is reached, where the number of candidate words is the same at every position; the candidate words generated by the decoder at the respective positions are combined into multiple candidate translation sentences; and one candidate translation sentence is selected according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
The number of candidate words is the same at every position; for example, three candidate words are generated at each position, and the generation of the words within each candidate translation sentence is linked: each word at every position of a candidate translation sentence other than the beginning of the sentence is generated based on the word at the previous position, meaning that its generation is associated with that previous word.
In some embodiments, for each word at each position output by the decoder, a selection probability value of the word is determined according to the word's own probability value and the probability values of the preceding words associated with generating it; according to the selection probability values of the words at each position output by the decoder, a preset number of words are selected as the multiple candidate words generated by the decoder for that position.
The words at each position may be ranked by their selection probability values in descending order, and a preset number of them selected as the multiple candidate words for that position. For each position, the multiple candidate words of that position are respectively input into the decoding module corresponding to the next position, yielding the multiple candidate words of the next position.
For example, the words generated by the decoder for the first position include y11, y12, y13, y14, y15, and so on. They can be sorted by probability value in descending order and a preset number of the top-ranked words selected; for example, the three words with the highest probability values are y11, y13, and y15. When generating the words for the second position, the decoder inputs y11, y13, and y15 respectively into the decoding module corresponding to the second position, obtaining the second-position words y21, y22, y23 corresponding to y11, the second-position words y24, y25, y26 corresponding to y13, and the second-position words y27, y28, y29 corresponding to y15. Candidate words are then selected from y21, …, y29: according to the probability values of the phrases {y11, y21}, {y11, y22}, …, {y15, y29}, the second-position words of the three phrases with the highest probability values are taken as the candidate words for the second position.
In some embodiments, for each candidate translation sentence generated by the decoder, the probability value of the candidate translation sentence is determined according to the probability values of the words in it, and the candidate translation sentence with the highest probability value is selected as the translation of the sentence to be translated.
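The candidate-word procedure described above is essentially beam search with a fixed beam width. A minimal sketch, assuming a decode_step function that returns a sequence of log-probabilities over the vocabulary together with a new decoder state (this interface is an assumption):

def beam_search(decode_step, start_state, beam_width=3, max_len=50, eos_id=2):
    # Each beam entry: (accumulated log-probability, word ids, decoder state).
    beams = [(0.0, [], start_state)]
    for _ in range(max_len):
        candidates = []
        for logp, words, state in beams:
            if words and words[-1] == eos_id:   # sentence already ended
                candidates.append((logp, words, state))
                continue
            log_probs, new_state = decode_step(words, state)
            for word_id, word_logp in enumerate(log_probs):
                # The selection probability of a word combines its own
                # probability with those of the preceding words that
                # generated it.
                candidates.append((logp + word_logp, words + [word_id], new_state))
        # Keep the top beam_width candidates, so every position retains the
        # same number of candidate words.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    # The candidate translation sentence with the highest probability wins.
    return max(beams, key=lambda b: b[0])[1]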
The method of the above embodiments can improve the accuracy of translation sentence generation and the richness of the generated content, thereby improving translation quality.
The present disclosure further provides a translation model processing apparatus, described below in conjunction with FIG. 4.
FIG. 4 is a structural diagram of some embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 4, the apparatus 40 of this embodiment includes: an acquisition module 410, an input module 420, a determination module 430, a generation module 440, and a training module 450.
The acquisition module 410 is configured to acquire multiple groups of training sentences, where each group of training sentences includes an original sentence and a target translation sentence.
The input module 420 is configured, for each group of training sentences, to input the original sentence into the encoder of the translation model to obtain the feature vector of the original sentence, and to input the feature vector of the original sentence into the decoder of the translation model.
The determination module 430 is configured, for each word generated by the decoder at each position except the end of the sentence, to determine the semantic similarity between the generated sentence composed of that word and the words before it and the target translation sentence.
In some embodiments, the determination module 430 is configured to take the generated sentence composed of the word and the words before it, together with the target translation sentence, as a sentence pair, input the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain the feature vector of the sentence pair, and input the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
In some embodiments, the determination module 430 is configured to generate a random number and compare the random number with a reference value, where the reference value lies within the value range of the random number; to determine whether the random number is smaller than the reference value; and, when it is smaller, to determine the semantic similarity between the generated sentence composed of the word and the words before it and the target translation sentence.
In some embodiments, the reference value increases as the number of training iterations increases.
The generation module 440 is configured to select, according to the semantic similarity, either the word or the word at the same position in the target translation sentence to generate the word at the next position.
In some embodiments, the generation module 440 is configured to generate the word at the next position according to the word when the semantic similarity is higher than a threshold, and to generate the word at the next position according to the word at the same position in the target translation sentence when the semantic similarity is lower than the threshold.
In some embodiments, the threshold increases as the number of training iterations increases.
In some embodiments, the generation module 440 is configured to select, according to the semantic similarity, either the word or the word at the same position in the target translation sentence as the input word, and to input the state output by the decoding module corresponding to the word, together with the word vector of the input word, into the decoding module corresponding to the word at the next position to obtain the output word at the next position.
The training module 450 is configured to train the translation model according to the difference between the translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
In some embodiments, the apparatus 40 further includes a translation module 460 configured to input the sentence to be translated into the trained translation model to obtain the translated sentence of the sentence to be translated.
In some embodiments, the translation module 460 is configured to input the sentence to be translated into the encoder of the translation model to obtain the feature vector of the sentence to be translated, and to input the feature vector of the sentence to be translated into the decoder of the translation model; to select, according to the probability values of the words at each position output by the decoder, a preset number of words as multiple candidate words for each position; to generate, from the multiple candidate words at each position, multiple candidate words for the next position until the end of the sentence is reached, where the number of candidate words at each position is the same; to combine the candidate words generated by the decoder at the respective positions into multiple candidate translation sentences, where the generation of the words within each candidate translation sentence is associated; and to select one candidate translation sentence according to the probability values of the candidate translation sentences as the translation of the sentence to be translated.
In some embodiments, the translation module 460 is configured, for each word at each position output by the decoder, to determine a selection probability value of the word according to the probability value of the word and the probability values of the preceding words associated with generating it, and to select, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the multiple candidate words generated by the decoder for each position.
In some embodiments, the translation module 460 is configured, for each candidate translation sentence generated by the decoder, to determine the probability value of the candidate translation sentence according to the probability values of the words in it, and to select the candidate translation sentence with the highest probability value as the translation of the sentence to be translated.
The translation model processing apparatuses in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, described below in conjunction with FIG. 5 and FIG. 6.
FIG. 5 is a structural diagram of some embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 5, the apparatus 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute the translation model processing method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 is a structural diagram of other embodiments of the translation model processing apparatus of the present disclosure. As shown in FIG. 6, the apparatus 60 of this embodiment includes: a memory 610 and a processor 620, similar to the memory 510 and the processor 520, respectively. It may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, and 650, as well as the memory 610 and the processor 620, may be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure further provides a computer program including instructions that, when executed by a processor, cause the processor to execute the translation model processing method of any of the foregoing embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (16)

  1. A translation model processing method, comprising:
    acquiring multiple groups of training sentences, wherein each group of training sentences includes an original sentence and a target translation sentence;
    for each group of training sentences, inputting the original sentence into an encoder of a translation model to obtain a feature vector of the original sentence, and inputting the feature vector of the original sentence into a decoder of the translation model;
    for each word generated by the decoder at each position except the end of the sentence, determining a semantic similarity between a generated sentence composed of the word and the words before the word and the target translation sentence;
    selecting, according to the semantic similarity, the word or a word at the same position as the word in the target translation sentence to generate a word at a next position; and
    training the translation model according to a difference between a translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
  2. The processing method according to claim 1, wherein the determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence comprises:
    taking the generated sentence composed of the word and the words before the word and the target translation sentence as a sentence pair, and inputting the sentence pair into a Bidirectional Encoder Representations from Transformers (BERT) model to obtain a feature vector of the sentence pair; and
    inputting the feature vector of the sentence pair into an activation function module to obtain the semantic similarity between the generated sentence and the target translation sentence.
  3. The processing method according to claim 1, wherein the determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence comprises:
    generating a random number and comparing the random number with a reference value, wherein the reference value lies within a value range of the random number; and
    determining whether the random number is smaller than the reference value, and in a case where the random number is smaller than the reference value, determining the semantic similarity between the generated sentence composed of the word and the words before the word and the target translation sentence.
  4. The processing method according to claim 3, wherein
    the reference value increases as a number of training iterations increases.
  5. The processing method according to claim 1, wherein the selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence to generate the word at the next position comprises:
    in a case where the semantic similarity is higher than a threshold, generating the word at the next position according to the word; and
    in a case where the semantic similarity is lower than the threshold, generating the word at the next position according to the word at the same position as the word in the target translation sentence.
  6. The processing method according to claim 5, wherein
    the threshold increases as a number of training iterations increases.
  7. The processing method according to claim 1, wherein the decoder includes a plurality of decoding modules, and the selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence to generate the word at the next position comprises:
    selecting, according to the semantic similarity, the word or the word at the same position as the word in the target translation sentence as an input word; and
    inputting a state output by the decoding module corresponding to the word and a word vector of the input word into the decoding module corresponding to the word at the next position, to obtain the output word at the next position.
  8. The processing method according to any one of claims 1-7, further comprising:
    inputting a sentence to be translated into the trained translation model to obtain a translated sentence of the sentence to be translated.
  9. The processing method according to claim 8, wherein the inputting the sentence to be translated into the trained translation model to obtain the corresponding translated sentence comprises:
    inputting the sentence to be translated into the encoder of the translation model to obtain a feature vector of the sentence to be translated, and inputting the feature vector of the sentence to be translated into the decoder of the translation model;
    selecting, according to probability values of words at each position output by the decoder, a preset number of words as a plurality of candidate words for each position;
    generating, according to the plurality of candidate words at each position, a plurality of candidate words for a next position of each position until the end of the sentence is reached, wherein the number of candidate words at each position is the same;
    combining the candidate words generated by the decoder at the respective positions into a plurality of candidate translation sentences, wherein generation of the words within each candidate translation sentence is associated; and
    selecting one candidate translation sentence according to probability values of the respective candidate translation sentences as the translated sentence of the sentence to be translated.
  10. The processing method according to claim 9, wherein the selecting, according to the probability values of the words at each position output by the decoder, the preset number of words as the plurality of candidate words for each position comprises:
    for each word at each position output by the decoder, determining a selection probability value of the word according to the probability value of the word and probability values of preceding words associated with generating the word; and
    selecting, according to the selection probability values of the words at each position output by the decoder, a preset number of words as the plurality of candidate words for each position generated by the decoder.
  11. The processing method according to claim 9, wherein the selecting one candidate translation sentence according to the probability values of the respective candidate translation sentences as the translated sentence of the sentence to be translated comprises:
    for each candidate translation sentence generated by the decoder, determining the probability value of the candidate translation sentence according to the probability values of the words in the candidate translation sentence; and
    selecting the candidate translation sentence with the highest probability value as the translated sentence of the sentence to be translated.
  12. A translation model processing apparatus, comprising:
    an acquisition module configured to acquire multiple groups of training sentences, wherein each group of training sentences includes an original sentence and a target translation sentence;
    an input module configured to, for each group of training sentences, input the original sentence into an encoder of a translation model to obtain a feature vector of the original sentence, and input the feature vector of the original sentence into a decoder of the translation model;
    a determination module configured to, for each word generated by the decoder at each position except the end of the sentence, determine a semantic similarity between a generated sentence composed of the word and the words before the word and the target translation sentence;
    a generation module configured to select, according to the semantic similarity, the word or a word at the same position as the word in the target translation sentence to generate a word at a next position; and
    a training module configured to train the translation model according to a difference between a translated sentence composed of the words generated by the decoder at the respective positions and the target translation sentence.
  13. The processing apparatus according to claim 12, further comprising:
    a translation module configured to input a sentence to be translated into the trained translation model to obtain a translated sentence of the sentence to be translated.
  14. A translation model processing apparatus, comprising:
    a processor; and
    a memory coupled to the processor and configured to store instructions that, when executed by the processor, cause the processor to execute the translation model processing method according to any one of claims 1-11.
  15. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-11.
  16. A computer program, comprising:
    instructions that, when executed by a processor, cause the processor to execute the translation model processing method according to any one of claims 1-11.
PCT/CN2023/073853 2022-02-18 2023-01-30 Method and apparatus for processing translation model, and computer-readable storage medium WO2023155676A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210150760.5 2022-02-18
CN202210150760.5A CN114595701A (en) 2022-02-18 2022-02-18 Translation model processing method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023155676A1 true WO2023155676A1 (en) 2023-08-24

Family

ID=81806767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/073853 WO2023155676A1 (en) 2022-02-18 2023-01-30 Method and apparatus for processing translation model, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114595701A (en)
WO (1) WO2023155676A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595701A (en) * 2022-02-18 2022-06-07 北京沃东天骏信息技术有限公司 Translation model processing method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
US20180373704A1 (en) * 2017-06-21 2018-12-27 Samsung Electronics Co., Ltd. Method and apparatus for machine translation using neural network and method of training the apparatus
US20190057081A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. Method and apparatus for generating natural language
CN111222347A (en) * 2020-04-15 2020-06-02 北京金山数字娱乐科技有限公司 Sentence translation model training method and device and sentence translation method and device
CN114595701A (en) * 2022-02-18 2022-06-07 北京沃东天骏信息技术有限公司 Translation model processing method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN114595701A (en) 2022-06-07

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23755680

Country of ref document: EP

Kind code of ref document: A1