CN112668326B - Sentence translation method, sentence translation device, sentence translation equipment and sentence translation storage medium - Google Patents


Info

Publication number
CN112668326B
CN112668326B (application CN202011524311.XA)
Authority
CN
China
Prior art keywords
model
sentence
translation
model parameters
translated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011524311.XA
Other languages
Chinese (zh)
Other versions
CN112668326A (en)
Inventor
刘懿
王健宗
黄章成
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011524311.XA
Publication of CN112668326A
Application granted
Publication of CN112668326B

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to artificial intelligence and discloses a sentence translation method, a sentence translation device, sentence translation equipment and a storage medium, wherein the sentence translation method comprises the following steps: performing word segmentation and vectorization on the sentence to be translated to obtain an initial sentence vector; inputting the initial sentence vector into a sentence translation model based on federated learning, and determining, according to the model output result, the target words corresponding to each word to be translated in the sentence to be translated and the output probabilities corresponding to the target words; and determining the sentence translation result corresponding to the sentence to be translated according to the target words and their corresponding output probabilities. Because the sentence to be translated is converted into a sentence vector, the sentence vector is input into the federated-learning-based sentence translation model, and the sentence translation result is determined from each target word in the model output result and its output probability, this federated-learning-based sentence translation method achieves a better translation effect and higher accuracy than conventional sentence translation models trained on smaller amounts of corpus data.

Description

Sentence translation method, sentence translation device, sentence translation equipment and sentence translation storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a sentence translation method, apparatus, device, and storage medium.
Background
With the advancement of Internet technology, deep learning is being applied ever more widely; for example, deep machine translation models are built to convert a sentence to be translated into a target sentence. However, machine translation models based on deep learning must be trained on large amounts of diverse corpus data, and against the broader background of large Internet enterprises paying increasing attention to their own data security and privacy, many data providers are more and more unwilling to expose private data directly to other enterprises. As a result, the translation models currently used in the market often produce inaccurate results when translating sentences.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a sentence translation method, a sentence translation device, sentence translation equipment and a sentence translation storage medium, and aims to solve the technical problem that sentences cannot be translated accurately in the prior art.
In order to achieve the above object, the present invention provides a sentence translation method, which includes the steps of:
word segmentation processing is carried out on the sentences to be translated, and a vocabulary sequence to be translated is obtained;
Vectorizing the vocabulary sequence to be translated to obtain an initial sentence vector;
inputting the initial sentence vector into a sentence translation model based on federated learning, and obtaining a model output result;
determining target words corresponding to each word to be translated in the sentence to be translated and output probabilities corresponding to the target words according to the model output results;
and determining a sentence translation result corresponding to the sentence to be translated according to the target vocabulary and the output probability corresponding to the target vocabulary.
Preferably, before the step of performing word segmentation processing on the sentence to be translated to obtain the vocabulary sequence to be translated, the method further includes:
obtaining model training corpus data, vectorizing the model training corpus data, and obtaining an initial corpus vector;
converting the initial corpus vector into a high-dimensional corpus vector;
training a local sentence translation model according to the high-dimensional corpus vector, and acquiring model parameters corresponding to the trained local sentence translation model;
the model parameters are sent to a central server, so that the central server performs federated aggregation on the received model parameters, updates a server sentence translation model based on the aggregated model parameters, and feeds back model gradient values and model convergence information based on the updated server sentence translation model;
Updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model, and taking the updated local sentence translation model as a sentence translation model based on federal learning when the model convergence information is model convergence.
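The federated training procedure above can be sketched end-to-end as a toy simulation. Everything here (the function names, plain averaging as the aggregation rule, and the convergence test) is an illustrative assumption rather than the patent's actual method:

```python
# Toy sketch of a co-modeling party's federated training loop (illustrative only).

def train_local(params, corpus_vectors, lr=0.1):
    """Stand-in 'training': nudge each parameter toward the corpus mean."""
    target = sum(corpus_vectors) / len(corpus_vectors)
    return [p + lr * (target - p) for p in params]

def server_aggregate(all_client_params):
    """Central server: simple federated averaging of the received parameters."""
    n = len(all_client_params)
    return [sum(ps) / n for ps in zip(*all_client_params)]

def federated_training(client_corpora, init_params, rounds=50, tol=1e-4):
    global_params = list(init_params)
    for _ in range(rounds):
        # Each party trains locally on its private corpus; raw data never leaves it.
        local_params = [train_local(list(global_params), corpus)
                        for corpus in client_corpora.values()]
        new_global = server_aggregate(local_params)
        converged = max(abs(a - b) for a, b in zip(new_global, global_params)) < tol
        global_params = new_global
        if converged:
            break
    return global_params

clients = {"A": [1.0, 1.0], "B": [3.0, 3.0]}  # each party's private corpus stays local
params = federated_training(clients, [0.0])
```

Note that only model parameters cross the network; the corpora themselves remain with their owners, which is the privacy property the scheme relies on.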
Preferably, the model training corpus data comprises source sentences and target sentences corresponding to the source sentences;
the step of converting the initial corpus vector into a high-dimensional corpus vector includes:
based on a self-attention mechanism, the initial corpus vector is converted into a high-dimensional corpus vector through a preset coder and decoder, and the high-dimensional corpus vector is used for representing the semantic relation between the source sentence and the target sentence.
Preferably, after the step of sending the model parameters to a central server to enable the central server to perform federal aggregation on the received model parameters, updating a server sentence translation model based on the aggregated model parameters, and feeding back a model gradient value and model convergence information based on the updated server sentence translation model, the method further includes:
updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model;
And returning to the step of training the local sentence translation model according to the high-dimensional corpus vector and obtaining model parameters corresponding to the trained local sentence translation model when the model convergence information indicates that the model is not converged.
Preferably, the step of sending the model parameters to a central server to enable the central server to perform federal aggregation on the received model parameters, update a server sentence translation model based on the aggregated model parameters, and feed back a model gradient value and model convergence information based on the updated server sentence translation model includes:
and sending the model parameters to a central server, so that the central server performs federated aggregation on the received model parameters based on a dynamic restart mechanism, performs gradient update on the server sentence translation model based on the aggregated model parameters, feeds back a model gradient value, detects whether a loss value of the gradient-updated server sentence translation model meets a preset convergence condition, and feeds back model convergence information.
Preferably, the step of the central server performing federated aggregation on the received model parameters based on a dynamic restart mechanism, performing gradient update on a server sentence translation model based on the aggregated model parameters, and feeding back a model gradient value includes:
The central server determines corresponding parameter value variation, momentum variable value variation and corpus data amount according to model parameters sent by each partner participating in federal modeling;
the central server determines the variation of the model parameters through a first preset formula according to the variation of the parameter values, the variation of the momentum variable values and the corpus data quantity;
the first preset formula is as follows (formula omitted in the source text):
where the parameter value variation Δθ^(t) = θ^(t) - θ^(t-1), with θ^(t) denoting the parameter values of the model parameters at the t-th iteration; the momentum variable value variation Δω^(t) = ω^(t) - ω^(t-1), with ω^(t) denoting the value of the momentum variable at the t-th iteration; and S_i denoting the corpus data amount of the i-th federated modeling partner's modeling corpus;
and the central server performs federated aggregation on the variation of the model parameters to obtain aggregated model parameters, performs gradient update on a server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value.
Preferably, the step of the central server performing federated aggregation on the variation of the model parameters to obtain aggregated model parameters, performing gradient update on a server sentence translation model based on the aggregated model parameters, and feeding back a model gradient value includes:
the central server performs federated aggregation on the variation of the model parameters through a second preset formula to obtain aggregated model parameters, performs gradient update on a server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value;
wherein the second preset formula is as follows (formula omitted in the source text), its result representing the model parameters aggregated at the t-th iteration.
In addition, in order to achieve the above object, the present invention also provides a sentence translating apparatus comprising:
the word segmentation module is used for carrying out word segmentation processing on the sentence to be translated to obtain a word sequence to be translated;
the vectorization module is used for vectorizing the vocabulary sequence to be translated to obtain an initial sentence vector;
the translation module is used for inputting the initial sentence vector into a sentence translation model based on federal learning and obtaining a model output result;
the translation module is further used for determining target words corresponding to each word to be translated in the sentence to be translated and output probabilities corresponding to the target words according to the model output result;
the translation module is further configured to determine a sentence translation result corresponding to the sentence to be translated according to the target vocabulary and the output probability corresponding to the target vocabulary.
In addition, to achieve the above object, the present invention also proposes a sentence translation device, the device comprising: a memory, a processor, and a sentence translation program stored on the memory and executable on the processor, the sentence translation program being configured to implement the steps of the sentence translation method described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a sentence translation program which, when executed by a processor, implements the steps of the sentence translation method as described above.
The invention obtains a word sequence to be translated by performing word segmentation processing on the sentence to be translated; vectorizes the word sequence to be translated to obtain an initial sentence vector; inputs the initial sentence vector into a sentence translation model based on federated learning and obtains a model output result; determines, according to the model output result, the target words corresponding to each word to be translated in the sentence to be translated and the output probabilities corresponding to the target words; and determines the sentence translation result corresponding to the sentence to be translated according to the target words and their corresponding output probabilities. Because the sentence to be translated is converted into a sentence vector, the sentence vector is input into the federated-learning-based sentence translation model, and the sentence translation result is then determined from each target word in the model output result and its output probability, this federated-learning-based sentence translation method achieves a better translation effect and higher accuracy than the conventional method of translating with a model trained on less corpus data.
Drawings
FIG. 1 is a schematic diagram of a sentence translation device in a hardware running environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of a sentence translation method according to the present invention;
FIG. 3 is a flowchart of a sentence translation method according to a second embodiment of the present invention;
fig. 4 is a block diagram showing the construction of a first embodiment of the sentence translating apparatus of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a sentence translation device in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the sentence translation device may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display and an input unit such as a Keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed Random Access Memory (RAM) or a stable Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not limit the sentence translation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, an operating system, a data storage module, a network communication module, a user interface module, and a sentence translation program may be included in the memory 1005 as one type of storage medium.
In the sentence translation device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. In the sentence translation device of the present invention, the processor 1001 calls the sentence translation program stored in the memory 1005 and executes the sentence translation method provided by the embodiments of the present invention.
An embodiment of the present invention provides a sentence translation method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the sentence translation method of the present invention.
In this embodiment, the sentence translation method includes the following steps:
step S10: word segmentation processing is carried out on the sentences to be translated, and a vocabulary sequence to be translated is obtained;
It should be noted that the execution body of the method of this embodiment may be a computing device having data processing, program running and network communication functions, such as a smartphone, a tablet computer or a personal computer (hereinafter referred to as the translation device). The sentence to be translated is the source sentence awaiting translation.
It should be understood that word segmentation splits the sentence to be translated into words, that is, converts a text sequence into a continuous word sequence. For example, after the sentence to be translated "While there is life, there is hope" is segmented, sorting the words by order of appearance yields the word sequence to be translated "While/there/is/life/there/is/hope".
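A minimal sketch of this segmentation step for space-delimited English text; splitting with a regular expression is an assumption here, as production systems would use a proper tokenizer:

```python
import re

def segment(sentence):
    """Split a sentence into its words, in order of appearance."""
    return re.findall(r"[A-Za-z']+", sentence)

words = segment("While there is life, there is hope")
print("/".join(words))  # While/there/is/life/there/is/hope
```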
Step S20: vectorizing the vocabulary sequence to be translated to obtain an initial sentence vector;
it should be understood that vectorization converts the symbolic information of natural language into digital information in vector form. For example, the initial sentence vector corresponding to the word sequence to be translated "While/there/is/life/there/is/hope" after vectorization is:
{While:(1,1), there:(2,2), is:(3,2), life:(4,1), there:(5,2), is:(6,2), hope:(7,1)}
The numbers in brackets represent the sequence number of the word and its number of occurrences, i.e. (sequence number, occurrence count). The word sequence to be translated "While/there/is/life/there/is/hope" is thus converted into a digital vector of the form {(1,1), (2,2), (3,2), (4,1), (5,2), (6,2), (7,1), …}.
In a specific implementation, after the word sequence to be translated is obtained, the translation device first obtains the word order of the words to be translated in the sequence, then determines the sequence number corresponding to each word to be translated according to that order and counts the number of occurrences of each word to be translated, and finally vectorizes the word sequence to be translated according to the sequence number and occurrence count of each word, obtaining the initial sentence vector.
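The (sequence number, occurrence count) scheme of the example above can be sketched as follows; this is a toy encoding for illustration, whereas real translation models would use learned embeddings:

```python
from collections import Counter

def vectorize(words):
    """Map each word to (its 1-based position in the sequence,
    the total number of times it occurs in the sentence)."""
    counts = Counter(words)
    return [(i + 1, counts[w]) for i, w in enumerate(words)]

vec = vectorize(["While", "there", "is", "life", "there", "is", "hope"])
# matches the example: [(1,1), (2,2), (3,2), (4,1), (5,2), (6,2), (7,1)]
```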
Step S30: inputting the initial sentence vector into a sentence translation model based on federated learning, and obtaining a model output result;
in this embodiment, the network model for translating the initial sentence vector is a neural network model for translating sentences, which is trained based on the federal learning mode.
It should be appreciated that federated learning, also known as joint learning or alliance learning, is a machine learning framework. It can effectively help multiple model-training parties build and train a model without revealing their own data, while meeting the requirements of user privacy protection, data security and government regulations. In this embodiment, model construction and training are carried out in a federated learning manner, so the corpus data owned by multiple different translation-model providers can be used effectively, avoiding the problem of existing approaches that train only on a party's own corpus data, which is small in volume and single in dimension, and therefore produce translation models with unsatisfactory translation effects and low translation accuracy.
In a specific implementation, after the translation device obtains the initial sentence vector, the initial sentence vector is input into a sentence translation model trained in advance based on federation learning, and a model output result is obtained.
Step S40: determining target words corresponding to each word to be translated in the sentence to be translated and output probabilities corresponding to the target words according to the model output results;
in this embodiment, since the federated-learning-based sentence translation model is trained with a deep neural network, the output result is usually expressed in the form of [target word, output probability], e.g. ["word", 0.8], where "word" is a target word and 0.8 is its corresponding output probability. For example, in the sentence "While there is life, there is hope", the word "while" has multiple possible translations: {during …, when …, at the same time as …, whereas, however}; the closer a translation is to the true meaning of the sentence, the greater the output probability of the corresponding target word, and the more accurate the final translation result.
In a specific implementation, after the translation device obtains the model output result, the target vocabulary corresponding to each vocabulary to be translated in the sentence to be translated and the output probability corresponding to the target vocabulary can be determined according to the model output result.
Step S50: and determining a sentence translation result corresponding to the sentence to be translated according to the target vocabulary and the output probability corresponding to the target vocabulary.
It should be understood that, once the target words corresponding to a word to be translated and their output probabilities are determined, the translation device may determine the final target word for each word to be translated according to the magnitude of the output probabilities. For example, if the target words for the word to be translated "while" and their output probabilities are {(during …, 0.1), (when …, 0.5), (at the same time as …, 0.3), (whereas, 0.05), (however, 0.05)}, it can be determined from the magnitude of the output probabilities that the final target word corresponding to "while" is "when …".
In a specific implementation, after obtaining each target word and its corresponding output probability, the translation device can determine the final target word corresponding to each word to be translated according to the output probabilities, and then combine the final target words to obtain the sentence translation result corresponding to the sentence to be translated. For example, combining the final target words for the sentence "While there is life, there is hope" yields its sentence translation result.
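Selecting the final target word per source word by maximum output probability can be sketched as follows, using the candidate list and probabilities from the example above:

```python
def pick_targets(candidates_per_word):
    """Keep, for each source word, the candidate with the highest output probability."""
    return [max(cands, key=lambda pair: pair[1])[0] for cands in candidates_per_word]

# Candidates for the word "while", as in the example above.
while_cands = [("during", 0.1), ("when", 0.5), ("at the same time as", 0.3),
               ("whereas", 0.05), ("however", 0.05)]
best = pick_targets([while_cands])  # the chosen words are then joined into the result
```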
In this embodiment, word segmentation processing is performed on the sentence to be translated to obtain a word sequence to be translated; the word sequence to be translated is vectorized to obtain an initial sentence vector; the initial sentence vector is input into a sentence translation model based on federated learning to obtain a model output result; the target words corresponding to each word to be translated in the sentence to be translated and the output probabilities corresponding to the target words are determined according to the model output result; and the sentence translation result corresponding to the sentence to be translated is determined according to the target words and their corresponding output probabilities. Because the sentence to be translated is converted into a sentence vector, the sentence vector is input into the federated-learning-based sentence translation model, and the sentence translation result is then determined from each target word in the model output result and its output probability, this federated-learning-based sentence translation method achieves a better translation effect and higher accuracy than the conventional method of translating with a model trained on less corpus data.
Referring to fig. 3, fig. 3 is a flowchart illustrating a sentence translation method according to a second embodiment of the present invention.
Based on the first embodiment, in this embodiment, before step S10, the method includes:
Step S01: obtaining model training corpus data, vectorizing the model training corpus data, and obtaining an initial corpus vector;
it should be noted that, in this embodiment, the translation device performs model training based on federated learning as one of the co-modeling parties of the joint modeling. Assuming that N co-modeling parties carry out joint modeling, the N parties respectively hold different model training corpus data C_i (1 ≤ i ≤ N). During joint modeling, each party's model training corpus data never leaves its local database, so the original data are not directly shared with the other co-modeling parties, effectively ensuring the security and privacy of the corpus data.
In this step, the model training corpus data includes source sentences and the target sentences corresponding to the source sentences, where a source sentence (sequence-source) may be a sentence to be translated, and a target sentence (sequence-target) may be the correct translated sentence corresponding to that sentence to be translated.
In a specific implementation, the translation device may read the model training corpus data from the local database and then vectorize it to obtain an initial corpus vector. The corpus data are vectorized in the same manner as in the first embodiment above, which is not repeated here.
Step S02: converting the initial corpus vector into a high-dimensional corpus vector;
it should be understood that, in order to ensure the accuracy of the trained translation model, the embodiment preferably converts the low-dimensional initial corpus vector into the high-dimensional corpus vector, so that the translation model can train a more accurate translation model by combining data (such as semantics, context, and the like) of different dimensions when performing corpus translation.
In particular implementations, the translation device may convert the initial corpus vector to a high-dimensional corpus vector through a Support Vector Machine (SVM).
Further, a self-attention mechanism can quickly capture the important features of sparse data, which helps improve the translation effect of the translation model. As an embodiment, step S02 may include: based on a self-attention mechanism, converting the initial corpus vector into a high-dimensional corpus vector through a preset codec, the high-dimensional corpus vector being used to represent the semantic relation between the source sentence and the target sentence.
It should be noted that the preset codec includes an encoder and a decoder, where the encoder is configured to encode the source sentence vector in the initial corpus vector to obtain a source sentence encoded vector, and the decoder is configured to decode the source sentence encoded vector and the target sentence vector to obtain a decoded vector; the translation device then converts the decoded vector into a high-dimensional corpus vector based on a self-attention mechanism.
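Scaled dot-product self-attention, the core operation in such an encoder-decoder, can be sketched in miniature as follows. Using identity Q/K/V projections is a simplifying assumption; a real codec learns these projections:

```python
import math

def softmax(xs):
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Each row attends over all rows: softmax(q·k / sqrt(d)) weights over values."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * row[j] for w, row in zip(weights, X)) for j in range(d)])
    return out

H = self_attention([[1.0, 0.0], [0.0, 1.0]])  # each output row is a weighted mix of inputs
```

Each output row is a convex combination of the input rows, weighted by similarity, which is how the mechanism relates each token to the rest of the sentence.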
Step S03: training a local sentence translation model according to the high-dimensional corpus vector, and acquiring model parameters corresponding to the trained local sentence translation model;
it should be noted that, in the federated learning scenario, the co-modeling parties need to complete model training and construction through a central server. At the start of model training, the central server and each co-modeling party hold the same original model, and each party performs federated modeling and training based on this original model. In this scheme, the original model in the central server is called the server sentence translation model, and the original model in a co-modeling party is called the local sentence translation model.
In a specific implementation, the translation equipment trains a local sentence translation model according to the high-dimensional corpus vector, and then reads model parameters corresponding to the trained sentence translation model after training is completed.
Step S04: the model parameters are sent to a central server, so that the central server performs federal aggregation on the received model parameters, updates a server statement translation model based on the aggregated model parameters, and feeds back model gradient values and model convergence information based on the updated server statement translation model;
It should be understood that, because different co-modeling parties own different model training corpus data, the high-dimensional corpus vectors used to train their local sentence translation models also differ. In order to make the finally trained model meet the requirements of every co-modeling party, in this embodiment the central server performs federated aggregation on the model parameters sent by each co-modeling party and then updates the server sentence translation model based on the aggregated model parameters, for example updating the server sentence translation model f(x) to f(x)' through the aggregated model parameters; it then feeds back the model gradient value and the model convergence information of the updated server sentence translation model to each party.
It should be noted that the model convergence information indicates whether the updated server sentence translation model has converged. Whether a model converges is generally determined by its loss value, so after obtaining the updated server sentence translation model, the central server further determines whether the model converges according to the corresponding loss value.
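The loss-based convergence test can be sketched as follows. The specific condition (loss change below a threshold for several consecutive rounds) is an assumption, since the text only states that a preset convergence condition on the loss value is checked:

```python
def has_converged(loss_history, threshold=1e-3, patience=2):
    """True when the loss changed by less than `threshold` over each of the
    last `patience` rounds (a hypothetical preset convergence condition)."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    deltas = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return all(d < threshold for d in deltas)
```

For example, `has_converged([1.0, 0.5, 0.4999, 0.4998])` holds because the last two changes are each below the threshold, while a history that is still dropping quickly does not.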
Further, in order to ensure accuracy of model parameter aggregation, in this embodiment, the central server may perform federal aggregation on model parameters based on a dynamic restart mechanism.
Specifically, the translation device may send the model parameters to the central server, so that the central server performs federal aggregation on the received model parameters based on a dynamic restart mechanism, performs a gradient update on the server sentence translation model using the aggregated parameters, feeds back the model gradient value, detects whether the loss value of the gradient-updated server sentence translation model meets a preset convergence condition, and feeds back the model convergence information.
In this embodiment, the central server may determine, from the model parameters sent by each partner participating in federal modeling, the corresponding parameter value variation, momentum variable value variation, and corpus data amount, and then determine the variation of the model parameters through a first preset formula based on those three quantities;
the first preset formula is as follows:
where the parameter value variation is the change in the parameter values of the model parameters at the t-th iteration, the momentum variable value variation is Δω = ω^(t) − ω^(t−1), where ω^(t) denotes the value of the momentum variable at the t-th iteration, and S_i denotes the corpus data amount of the i-th federal modeling partner's modeling corpus.
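The per-iteration quantities defined above can be computed as in this small sketch; `iteration_deltas` is an illustrative helper name, not one used in the patent:

```python
# Illustrative helper: the change in a parameter value and in a momentum
# variable between iteration t-1 and iteration t.
def iteration_deltas(param_t, param_prev, omega_t, omega_prev):
    delta_param = param_t - param_prev     # parameter value variation
    delta_omega = omega_t - omega_prev     # momentum variable variation, delta-omega
    return delta_param, delta_omega

d_param, d_omega = iteration_deltas(2.5, 2.0, 0.8, 0.5)
```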
Further, after the central server obtains the variation of the model parameters, it federally aggregates that variation through a second preset formula to obtain the aggregated model parameters, performs a gradient update on the server sentence translation model based on the aggregated model parameters, and feeds back the model gradient value;
Wherein, the second preset formula is:
where the aggregated parameter term denotes the model parameters aggregated at the t-th iteration.
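Since the second preset formula itself is not reproduced in the text, the following is only a hedged, momentum-style sketch of a server update that folds an aggregated parameter change into a momentum buffer; `beta` and the smoothing rule are assumptions, not the patent's formula:

```python
# Hedged, momentum-style stand-in for the server's gradient update: the
# aggregated parameter change is smoothed into a momentum buffer, which is
# then applied to the server model parameter. beta and the update rule are
# assumptions, not the patent's second preset formula.
def server_momentum_update(theta, momentum, aggregated_delta, beta=0.9):
    momentum = beta * momentum + (1 - beta) * aggregated_delta
    theta = theta + momentum   # apply the smoothed aggregated change
    return theta, momentum

theta, momentum = server_momentum_update(1.0, 0.0, 0.5)
```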
In this embodiment, based on the aggregated model parameters, the central server may calculate the model loss value at the t-th iteration according to a third preset formula;
wherein, the third preset formula is:
in a specific implementation, the central server can judge whether the model is converged according to the loss value of the server statement translation model updated each time, and feedback the model convergence information.
Step S05: updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model, and taking the updated local sentence translation model as the sentence translation model based on federal learning when the model convergence information indicates model convergence.
In a specific implementation, after receiving the model gradient value fed back by the central server, the translation device updates the local sentence translation model according to the gradient value to obtain an updated local sentence translation model; when the model convergence information indicates convergence, the updated local sentence translation model is used as the sentence translation model based on federal learning. Correspondingly, when the model convergence information indicates that the model has not converged, the translation device returns to the step of training the local sentence translation model according to the high-dimensional corpus vector and obtaining the model parameters corresponding to the trained local sentence translation model; that is, steps S02-S04 are executed in a loop until the model converges.
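The client-side loop over steps S02-S04 can be sketched as follows; `train_local` and `send_to_server` are illustrative stand-ins for the operations described above, and the scalar model is a toy:

```python
# Sketch of the client-side loop over steps S02-S04: train locally, send
# parameters, apply the returned gradient, stop when the server reports
# convergence. All callables are illustrative stand-ins.
def federated_training_loop(model, train_local, send_to_server, max_rounds=100):
    for _ in range(max_rounds):
        params = train_local(model)                   # steps S02-S03
        gradient, converged = send_to_server(params)  # step S04
        model = model - gradient                      # step S05: local update
        if converged:
            return model
    return model

# Toy run: the "server" halves the parameter each round until it is small.
result = federated_training_loop(
    8.0,
    train_local=lambda m: m,
    send_to_server=lambda p: (p * 0.5, abs(p * 0.5) < 0.5),
)
```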
According to this embodiment, model training corpus data is obtained and vectorized to produce an initial corpus vector; the initial corpus vector is converted into a high-dimensional corpus vector; the local sentence translation model is trained on the high-dimensional corpus vector and the parameters of the trained local sentence translation model are obtained; the model parameters are sent to a central server, so that the central server performs federal aggregation on the received model parameters, updates the server sentence translation model based on the aggregated parameters, and feeds back a model gradient value and model convergence information based on the updated server sentence translation model; finally, the local sentence translation model is updated according to the model gradient value to obtain an updated local sentence translation model, and when the model convergence information indicates convergence, the updated local sentence translation model is taken as the sentence translation model based on federal learning.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a sentence translation program, and the sentence translation program realizes the steps of the sentence translation method when being executed by a processor.
Referring to fig. 4, fig. 4 is a block diagram showing the construction of a first embodiment of the sentence translating apparatus of the present invention.
As shown in fig. 4, the sentence translating device according to the embodiment of the present invention includes:
the word segmentation module 401 is configured to perform word segmentation on a sentence to be translated to obtain a vocabulary sequence to be translated;
a vectorization module 402, configured to vectorize the vocabulary sequence to be translated to obtain an initial sentence vector;
a translation module 403, configured to input the initial sentence vector into a sentence translation model based on federal learning, and obtain a model output result;
the translation module 403 is further configured to determine, according to the model output result, a target vocabulary corresponding to each vocabulary to be translated in the sentence to be translated and an output probability corresponding to the target vocabulary;
the translation module 403 is further configured to determine a sentence translation result corresponding to the sentence to be translated according to the target vocabulary and an output probability corresponding to the target vocabulary.
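The module pipeline above (word segmentation, vectorization, translation, probability-based selection) can be sketched as a simple chain; `segment`, `vectorize`, and `model` are illustrative stand-ins, not the patent's actual components:

```python
# Illustrative end-to-end chain mirroring modules 401-403: segment the
# sentence, vectorize it, run the model, then keep the highest-probability
# target word for each source word.
def translate_sentence(sentence, segment, vectorize, model):
    words = segment(sentence)        # word segmentation module 401
    vec = vectorize(words)           # vectorization module 402
    outputs = model(vec)             # translation module 403: per-word
    #                                  (target word, output probability) pairs
    return [max(cands, key=lambda c: c[1])[0] for cands in outputs]

# Toy stand-ins: a whitespace segmenter and a "model" that proposes the
# upper-cased word with probability 0.9.
result = translate_sentence(
    "hello world",
    segment=str.split,
    vectorize=lambda ws: ws,
    model=lambda ws: [[(w.upper(), 0.9), (w, 0.1)] for w in ws],
)
```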
According to this embodiment, word segmentation is performed on the sentence to be translated to obtain a vocabulary sequence to be translated; the vocabulary sequence is vectorized to obtain an initial sentence vector; the initial sentence vector is input into a sentence translation model based on federal learning to obtain a model output result; the target word corresponding to each word to be translated, and the output probability corresponding to each target word, are determined from the model output result; and the sentence translation result corresponding to the sentence to be translated is determined from the target words and their output probabilities. Because the sentence to be translated is converted into a sentence vector, the sentence vector is input into the sentence translation model based on federal learning, and the translation result is determined from each target word and its output probability in the model output, this sentence translation method based on federal learning achieves a better translation effect and higher accuracy than conventional sentence translation using a model trained on less corpus data.
Based on the above-described first embodiment of the sentence translating apparatus of the present invention, a second embodiment of the sentence translating apparatus of the present invention is presented.
In this embodiment, the sentence translating apparatus further includes: a model training module, configured to convert the initial corpus vector into a high-dimensional corpus vector; train a local sentence translation model according to the high-dimensional corpus vector and acquire model parameters corresponding to the trained local sentence translation model; send the model parameters to a central server, so that the central server performs federal aggregation on the received model parameters, updates a server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value and model convergence information based on the updated server sentence translation model; and update the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model, taking the updated local sentence translation model as the sentence translation model based on federal learning when the model convergence information indicates model convergence.
Further, the model training module is further configured to convert, based on a self-attention mechanism, the initial corpus vector into a high-dimensional corpus vector through a preset codec, where the high-dimensional corpus vector is used to characterize a semantic relationship between the source sentence and the target sentence.
Further, the model training module is further configured to update the local sentence translation model according to the model gradient value, so as to obtain an updated local sentence translation model; and when the model convergence information indicates that the model is not converged, executing the operation of training the local sentence translation model according to the high-dimensional corpus vector and obtaining model parameters corresponding to the trained local sentence translation model.
Further, the model training module is further configured to send the model parameters to a central server, so that the central server performs federal aggregation on the received model parameters based on a dynamic restarting mechanism, performs gradient update on a server sentence translation model based on the aggregated model parameters, feeds back a model gradient value, and detects whether a loss value of the server sentence translation model after gradient update meets a preset convergence condition and feeds back model convergence information.
Further, in this embodiment, the central server determines, according to the model parameters sent by each partner participating in federal modeling, the corresponding parameter value variation, momentum variable value variation, and corpus data amount; determines the variation of the model parameters through a first preset formula based on those quantities; federally aggregates the variation of the model parameters to obtain aggregated model parameters; and performs a gradient update on the server sentence translation model based on the aggregated model parameters and feeds back the model gradient value;
The first preset formula is as follows:
where the parameter value variation is the change in the parameter values of the model parameters at the t-th iteration, the momentum variable value variation is Δω = ω^(t) − ω^(t−1), where ω^(t) denotes the value of the momentum variable at the t-th iteration, and S_i denotes the corpus data amount of the i-th federal modeling partner's modeling corpus.
Further, in this embodiment, the central server federally aggregates the variation of the model parameters through a second preset formula to obtain aggregated model parameters, and performs a gradient update on the server sentence translation model based on the aggregated model parameters and feeds back the model gradient value;
wherein, the second preset formula is:
where the aggregated parameter term denotes the model parameters aggregated at the t-th iteration.
Other embodiments or specific implementation manners of the sentence translating apparatus of the present invention may refer to the above method embodiments, and are not described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises it.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., read-only memory/random-access memory, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method of the embodiments of the present invention.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its scope; any equivalent structure or equivalent process transformation based on the contents disclosed herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the invention.

Claims (7)

1. A sentence translation method, characterized in that the sentence translation method comprises the steps of:
word segmentation processing is carried out on the sentences to be translated, and a vocabulary sequence to be translated is obtained;
vectorizing the vocabulary sequence to be translated to obtain an initial sentence vector;
inputting the initial sentence vector into a sentence translation model based on federation learning, and obtaining a model output result;
determining target words corresponding to each word to be translated in the sentence to be translated and output probabilities corresponding to the target words according to the model output results;
determining sentence translation results corresponding to the sentences to be translated according to the target vocabulary and the output probabilities corresponding to the target vocabulary;
before the step of obtaining the vocabulary sequence to be translated, the method further comprises the following steps:
obtaining model training corpus data, vectorizing the model training corpus data, and obtaining an initial corpus vector;
converting the initial corpus vector into a high-dimensional corpus vector;
training a local sentence translation model according to the high-dimensional corpus vector, and acquiring model parameters corresponding to the trained local sentence translation model;
the model parameters are sent to a central server, so that the central server performs federal aggregation on the received model parameters, updates a server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value and model convergence information based on the updated server sentence translation model;
updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model, and taking the updated local sentence translation model as the sentence translation model based on federal learning when the model convergence information indicates model convergence;
the step of sending the model parameters to a central server, so that the central server performs federal aggregation on the received model parameters, updates the server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value and model convergence information based on the updated server sentence translation model comprises:
sending the model parameters to the central server, so that the central server performs federal aggregation on the received model parameters based on a dynamic restart mechanism, performs a gradient update on the server sentence translation model based on the aggregated model parameters, feeds back the model gradient value, detects whether a loss value of the gradient-updated server sentence translation model meets a preset convergence condition, and feeds back the model convergence information;
the step in which the central server performs federal aggregation on the received model parameters based on the dynamic restart mechanism, performs a gradient update on the server sentence translation model based on the aggregated model parameters, and feeds back the model gradient value comprises:
the central server determines corresponding parameter value variations, momentum variable value variations, and corpus data amounts according to the model parameters sent by each partner participating in federal modeling;
the central server determines the variation of the model parameters through a first preset formula according to the parameter value variations, the momentum variable value variations, and the corpus data amounts;
the first preset formula is as follows:
where the parameter value variation is the change in the parameter values of the model parameters at the t-th iteration, the momentum variable value variation is Δω = ω^(t) − ω^(t−1), ω^(t) denotes the value of the momentum variable at the t-th iteration, and S_i denotes the corpus data amount of the i-th federal modeling partner's modeling corpus;
and the central server performs federal aggregation on the variation of the model parameters to obtain aggregated model parameters, performs a gradient update on the server sentence translation model based on the aggregated model parameters, and feeds back the model gradient value.
2. The sentence translation method according to claim 1, wherein the model training corpus data includes a source sentence and a target sentence corresponding to the source sentence;
the step of converting the initial corpus vector into a high-dimensional corpus vector includes:
based on a self-attention mechanism, the initial corpus vector is converted into a high-dimensional corpus vector through a preset coder and decoder, and the high-dimensional corpus vector is used for representing the semantic relation between the source sentence and the target sentence.
3. The sentence translation method according to claim 1, wherein after the step of sending the model parameters to a central server to cause the central server to federally aggregate the received model parameters, update a server sentence translation model based on the aggregated model parameters, and feedback model gradient values and model convergence information based on the updated server sentence translation model, the method further comprises:
updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model;
and returning to the step of training the local sentence translation model according to the high-dimensional corpus vector and obtaining model parameters corresponding to the trained local sentence translation model when the model convergence information indicates that the model is not converged.
4. The sentence translation method according to claim 1, wherein the step of the central server federally aggregating the variation of the model parameters to obtain aggregated model parameters, gradient updating the server sentence translation model based on the aggregated model parameters, and feeding back model gradient values, comprises:
the central server performs federal aggregation on the variation of the model parameters through a second preset formula to obtain aggregated model parameters, and performs a gradient update on the server sentence translation model based on the aggregated model parameters and feeds back the model gradient value;
wherein the second preset formula is:
where the aggregated parameter term denotes the model parameters aggregated at the t-th iteration.
5. A sentence translating apparatus, comprising:
the word segmentation module is used for carrying out word segmentation processing on the sentence to be translated to obtain a word sequence to be translated;
the vectorization module is used for vectorizing the vocabulary sequence to be translated to obtain an initial sentence vector;
the translation module is used for inputting the initial sentence vector into a sentence translation model based on federal learning and obtaining a model output result;
The translation module is further used for determining target words corresponding to each word to be translated in the sentence to be translated and output probabilities corresponding to the target words according to the model output result;
the translation module is further used for determining sentence translation results corresponding to the sentences to be translated according to the target vocabulary and the output probabilities corresponding to the target vocabulary;
the model training module is used for obtaining model training corpus data, vectorizing the model training corpus data, and obtaining an initial corpus vector; converting the initial corpus vector into a high-dimensional corpus vector; training a local sentence translation model according to the high-dimensional corpus vector, and acquiring model parameters corresponding to the trained local sentence translation model; sending the model parameters to a central server, so that the central server performs federal aggregation on the received model parameters, updates a server sentence translation model based on the aggregated model parameters, and feeds back a model gradient value and model convergence information based on the updated server sentence translation model; and updating the local sentence translation model according to the model gradient value to obtain an updated local sentence translation model, and taking the updated local sentence translation model as the sentence translation model based on federal learning when the model convergence information indicates model convergence;
the model training module is further configured to send the model parameters to the central server, so that the central server performs federal aggregation on the received model parameters based on a dynamic restart mechanism, performs a gradient update on the server sentence translation model based on the aggregated model parameters, feeds back the model gradient value, and detects whether a loss value of the gradient-updated server sentence translation model meets a preset convergence condition and feeds back the model convergence information;
the model training module is further configured such that the central server determines corresponding parameter value variations, momentum variable value variations, and corpus data amounts according to the model parameters sent by each partner participating in federal modeling; and the central server determines the variation of the model parameters through a first preset formula according to the parameter value variations, the momentum variable value variations, and the corpus data amounts;
the first preset formula is as follows:
where the parameter value variation is the change in the parameter values of the model parameters at the t-th iteration, the momentum variable value variation is Δω = ω^(t) − ω^(t−1), ω^(t) denotes the value of the momentum variable at the t-th iteration, and S_i denotes the corpus data amount of the i-th federal modeling partner's modeling corpus;
and the central server performs federal aggregation on the variation of the model parameters to obtain aggregated model parameters, performs a gradient update on the server sentence translation model based on the aggregated model parameters, and feeds back the model gradient value.
6. A sentence translating apparatus, characterized in that the apparatus comprises: a memory, a processor, and a statement translation program stored on the memory and executable on the processor, the statement translation program configured to implement the steps of the statement translation method of any one of claims 1 to 4.
7. A storage medium having stored thereon a sentence translation program which, when executed by a processor, implements the steps of the sentence translation method according to any one of claims 1 to 4.
CN202011524311.XA 2020-12-21 2020-12-21 Sentence translation method, sentence translation device, sentence translation equipment and sentence translation storage medium Active CN112668326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011524311.XA CN112668326B (en) 2020-12-21 2020-12-21 Sentence translation method, sentence translation device, sentence translation equipment and sentence translation storage medium

Publications (2)

Publication Number Publication Date
CN112668326A CN112668326A (en) 2021-04-16
CN112668326B true CN112668326B (en) 2024-03-08

Family

ID=75407403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011524311.XA Active CN112668326B (en) 2020-12-21 2020-12-21 Sentence translation method, sentence translation device, sentence translation equipment and sentence translation storage medium

Country Status (1)

Country Link
CN (1) CN112668326B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102620697B1 (en) * 2021-07-12 2024-01-02 주식회사 카카오뱅크 Method and device for determining transfer information in messages using deep learning based natural language processing

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
CN110334360A (en) * 2019-07-08 2019-10-15 腾讯科技(深圳)有限公司 Machine translation method and device, electronic equipment and storage medium
CN111738025A (en) * 2020-08-20 2020-10-02 腾讯科技(深圳)有限公司 Artificial intelligence based translation method and device, electronic equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
KR101762866B1 (en) * 2010-11-05 2017-08-16 에스케이플래닛 주식회사 Statistical translation apparatus by separating syntactic translation model from lexical translation model and statistical translation method
CN107608973A (en) * 2016-07-12 2018-01-19 华为技术有限公司 A kind of interpretation method and device based on neutral net


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant