WO2019214365A1 - Method for training a translation model, method for sentence translation, device and storage medium - Google Patents
Method for training a translation model, method for sentence translation, device and storage medium
- Publication number
- WO2019214365A1 PCT/CN2019/080411 CN2019080411W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- training
- sample
- perturbed
- samples
- translation
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30196—Instruction operation extension or modification using decoder, e.g. decoder per instruction set, adaptable or programmable decoders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the embodiments of the present invention relate to the field of computer technologies, and in particular, to a method for training a translation model, a method for translation of a statement, a device, and a storage medium.
- machine translation has been widely used to convert input in one language into another, for example in speech interpretation and content translation.
- Neural machine translation is a neural network-based machine translation model, which has achieved a good translation level in many language pairs and has been widely used in various machine translation products.
- because the neural machine translation model is based entirely on a neural network, the global nature of its modeling makes each target-side output depend on every source-side input word, so the model is overly sensitive to small perturbations in the input.
- the embodiment of the present application provides a method for training a translation model, a method for translation of a sentence, a device, and a storage medium, which can improve the robustness of machine translation and the quality of translation.
- a first aspect of the embodiments of the present application provides a method for training a translation model, including:
- the computer device acquires a training sample set, where the training sample set includes a plurality of training samples
- the computer device determines a perturbed sample set corresponding to each training sample in the training sample set, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
- the computer device trains the initial translation model using the plurality of training samples and the respective perturbed sample sets corresponding to each of the training samples to obtain a target translation model.
- a second aspect of the embodiments of the present application provides a method for statement translation, including:
- the terminal device translates the first sentence to be translated using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and the perturbed sample set corresponding to each of the training samples; the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value;
- the terminal device outputs the translation result sentence expressed in the second language.
- a third aspect of the embodiments of the present application provides an apparatus for training a translation model, including one or more processors, and one or more memories storing program units, wherein the program units are executed by a processor, and the program units include:
- An obtaining unit configured to acquire a training sample set, where the training sample set includes a plurality of training samples
- a determining unit configured to determine a perturbed sample set corresponding to each training sample in the training sample set acquired by the obtaining unit, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value;
- a model training unit configured to train the initial translation model by using the plurality of training samples obtained by the acquiring unit and the perturbed sample set corresponding to each of the training samples determined by the determining unit to obtain a target translation model.
- a fourth aspect of the embodiments of the present application provides an apparatus for statement translation, including one or more processors, and one or more memories storing program units, wherein the program units are executed by a processor, and the program units include:
- a receiving unit configured to receive the first to-be-translated statement expressed in the first language
- a translation unit configured to translate the first sentence to be translated received by the receiving unit using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using multiple training samples and the perturbed sample set corresponding to each of the training samples; the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value;
- an output unit configured to output a translation result statement expressed in the second language translated by the translation unit.
- a fifth aspect of the embodiments of the present application provides a computer device, including an input/output (I/O) interface, a processor, and a memory, where the memory stores program instructions;
- the processor is configured to execute the program instructions stored in the memory to perform the method of the first aspect.
- a sixth aspect of the embodiments of the present application provides a terminal device, including an input/output (I/O) interface, a processor, and a memory, where the memory stores program instructions;
- the processor is configured to execute the program instructions stored in the memory to perform the method of the second aspect.
- a seventh aspect of the present application provides a non-transitory computer readable storage medium comprising instructions which, when executed on a computer device, cause the computer device to perform the method of the first aspect or the second aspect described above.
- Yet another aspect of an embodiment of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect described above.
- the perturbed samples are used in the training of the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value, that is, the semantics of the perturbed sample and the training sample are very similar, so that the trained target translation model can translate correctly even when it receives a sentence containing noise, thereby improving the robustness of machine translation and the quality of translation.
- FIG. 1 is a schematic diagram of an embodiment of a system for training a translation model in an embodiment of the present application
- FIG. 2 is a schematic diagram of an embodiment of a method for training a translation model in an embodiment of the present application
- FIG. 3 is a schematic structural diagram of an initial translation model in an embodiment of the present application.
- FIG. 4 is a schematic diagram of an embodiment of a method for statement translation in an embodiment of the present application.
- FIG. 5 is a schematic diagram of an application scenario of statement translation in the embodiment of the present application.
- FIG. 6 is a schematic diagram of another application scenario of statement translation in the embodiment of the present application.
- FIG. 7 is a schematic diagram of another application scenario of statement translation in the embodiment of the present application.
- FIG. 8 is a schematic diagram of another application scenario of statement translation in the embodiment of the present application.
- FIG. 9 is a schematic diagram of an embodiment of an apparatus for training a translation model in an embodiment of the present application.
- FIG. 10 is a schematic diagram of an embodiment of an apparatus for translating sentences in an embodiment of the present application.
- FIG. 11 is a schematic diagram of an embodiment of a computer device in an embodiment of the present application.
- FIG. 12 is a schematic diagram of an embodiment of a terminal device in an embodiment of the present application.
- the embodiment of the present application provides a method for training a translation model, which can improve the robustness of machine translation and the quality of translation.
- the embodiment of the present application also provides a corresponding method for sentence translation, a computer device, a terminal device, and a computer readable storage medium. The details are described below separately.
- machine translation is used in both simultaneous interpretation and text translation.
- Machine translation is usually model-based translation, that is, by pre-training the translation model, the trained translation model can receive a statement in one language and then convert the statement into another language output.
- neural machine translation is a machine translation model based entirely on neural networks. Its translation accuracy is high, but its anti-noise ability is poor: once there is a slight disturbance in the input sentence, the output sentence becomes inaccurate. Therefore, the embodiment of the present application provides a method for training a translation model in which various perturbed samples are introduced into the training samples during training, thereby ensuring that the trained translation model can still translate correctly when it receives a sentence containing a perturbation.
- the disturbance includes noise.
- FIG. 1 is a schematic diagram of an embodiment of a system for training a translation model in an embodiment of the present application.
- an embodiment of a system for translation model training in an embodiment of the present application includes a computer device 10 and a database 20 in which training samples are stored.
- the computer device 10 acquires a training sample set from the database 20, and then uses the training sample set to perform translation model training to obtain a target translation model.
- FIG. 2 is a schematic diagram of an embodiment of a method for training a translation model in the embodiment of the present application.
- an embodiment of a method for training a translation model provided by an embodiment of the present application includes:
- the computer device acquires a training sample set, where the training sample set includes multiple training samples.
- the training sample in the training sample set refers to the sample without the disturbance.
- the computer device determines a perturbed sample set corresponding to each training sample in the training sample set, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value.
- a perturbed sample refers to a sample that contains perturbation information or noise but whose semantics are basically the same as those of the training sample; the perturbation information may be a word with the same meaning expressed differently, or another change that does not substantially alter the semantics of the sentence.
- the first preset value in the embodiment of the present application may be a specific value, such as 90% or 95%; this is merely an example and does not limit the first preset value, which can be set as needed.
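As a rough illustration of how such a threshold might be applied, the sketch below keeps only candidate perturbed samples whose similarity to the training sample exceeds the first preset value. The sentence vectors, the `filter_perturbed_samples` helper, and the use of cosine similarity as the semantic-similarity measure are all hypothetical stand-ins, not details from the patent:

```python
import numpy as np

def semantic_similarity(vec_a, vec_b):
    # Cosine similarity between two sentence vectors, used here as a
    # simple proxy for semantic similarity (an assumption for illustration).
    return float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

def filter_perturbed_samples(sample_vec, candidate_vecs, first_preset_value=0.9):
    # Keep only candidates whose similarity to the training sample
    # is higher than the first preset value.
    return [i for i, v in enumerate(candidate_vecs)
            if semantic_similarity(sample_vec, v) > first_preset_value]

sample = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.99, 0.1, 0.0]),   # near-duplicate: kept
              np.array([0.0, 1.0, 0.0])]    # unrelated: dropped
kept = filter_perturbed_samples(sample, candidates)
```

With the 0.9 threshold, only the near-duplicate candidate survives; raising or lowering `first_preset_value` trades perturbation diversity against semantic fidelity.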
- the computer device trains the initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each of the training samples to obtain a target translation model.
- the training samples are trained together with the corresponding perturbed samples.
- the perturbed samples are used in the training of the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value, that is, the semantics of the perturbed sample and the training sample are very similar, so that the trained target translation model can translate correctly even when it receives a sentence containing noise, thereby improving the robustness of machine translation and the quality of translation.
- Each training sample is a training sample pair, and the training sample pair includes a training input sample and a training output sample;
- the determining, by the computer device, the set of perturbation samples corresponding to each training sample may include:
- the computer device uses the plurality of training samples and the perturbed sample set corresponding to each of the training samples to train the initial translation model to obtain the target translation model, which may include:
- the initial translation model is trained using a plurality of training sample pairs, the perturbed input sample set corresponding to each of the training input samples, and the perturbed output samples in the corresponding perturbed output sample set, to obtain a target translation model.
- the training input sample is a first language
- the training output sample is a second language.
- the first language is different from the second language.
- the first language is exemplified by Chinese, and the second language by English.
- Chinese and English should not be construed as limiting the translation model in the embodiments of the present application.
- the translation model in the embodiment of the present application can be applied to translation between any two different languages. Translation between the two languages can be achieved as long as the training samples in the corresponding two languages are used during training.
- each training input sample may have multiple perturbed input samples, but the perturbed output sample corresponding to each perturbed input sample is the same as the training output sample.
- Table 1 is only an example.
- the perturbed input samples corresponding to the training input samples may be less than those listed in Table 1 or more than those listed in Table 1.
- the perturbation input samples are described above.
- the generation of the perturbed input samples is described below.
- a method of generating a perturbed input sample can be:
- the determining the set of the perturbed input samples corresponding to each of the training input samples in the training sample set may include:
- a perturbed sentence is generated at the vocabulary level: given an input sentence, the first word to be modified is sampled and its position determined, and then the first word at that position is replaced with a second word from the word list.
- the word list will contain many words, and the choice of the second word can be understood by referring to the following formula.
- E[x_i] is the word vector of the first word x_i;
- cos(E[x_i], E[x]) measures the similarity between the first word x_i and the second word x. Since word vectors capture the semantic information of words, this replacement substitutes the first word x_i in the current sentence with a second word x that has similar semantic information.
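The word-level replacement above can be sketched as follows. The toy two-dimensional word vectors and the `replace_word` helper are illustrative assumptions; a real system would select from a full word list of trained embeddings:

```python
import numpy as np

# Toy word-vector table standing in for a trained embedding matrix.
word_vectors = {
    "good":  np.array([0.9, 0.1]),
    "great": np.array([0.85, 0.15]),
    "bad":   np.array([-0.9, 0.1]),
}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def replace_word(first_word, vocab):
    # Choose the second word: the word-list entry (other than the first word)
    # whose vector is most similar to E[x_i] under cosine similarity.
    e_xi = vocab[first_word]
    candidates = {w: cos(e_xi, e) for w, e in vocab.items() if w != first_word}
    return max(candidates, key=candidates.get)

second = replace_word("good", word_vectors)  # "great": closest in meaning
```

Because "great" points in nearly the same direction as "good" while "bad" points the opposite way, the replacement preserves the sentence's semantics while still perturbing its surface form.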
- Another type of perturbed input sample can be generated as follows:
- the determining the set of the perturbed input samples corresponding to each of the training input samples in the training sample set may include:
- a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
- a sentence with a perturbation is generated at the feature level. Given a sentence, the vector of each word in the sentence is obtained, and Gaussian noise is added to each word vector to simulate possible types of perturbation. This can be understood by referring to the following formula:
- E[x_i] is the word vector of the word x_i;
- E[x′_i] is the word vector of the word after adding the Gaussian noise;
- the vector ε is sampled from a Gaussian distribution with variance σ², where σ is a hyperparameter.
- This technical solution is a general solution that can freely define any strategy for adding a disturbance input.
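A minimal sketch of this feature-level perturbation, assuming toy word vectors and a hypothetical `perturb_embeddings` helper (the patent does not prescribe these names):

```python
import numpy as np

def perturb_embeddings(word_vectors, sigma, rng):
    # Add an independent Gaussian noise vector eps ~ N(0, sigma^2 I) to the
    # word vector of each word: E[x'_i] = E[x_i] + eps.
    return [e + rng.normal(0.0, sigma, size=e.shape) for e in word_vectors]

rng = np.random.default_rng(0)        # seeded for reproducibility
sentence = [np.zeros(4), np.ones(4)]  # toy word vectors for a two-word sentence
sigma = 0.01                          # hyperparameter: perturbation strength
perturbed = perturb_embeddings(sentence, sigma, rng)
```

Because every embedding can be jittered this way regardless of the vocabulary, this is the more general of the two strategies: any perturbation distribution can be substituted for the Gaussian.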
- FIG. 3 is a schematic structural diagram of an initial translation model in the embodiment of the present application.
- the initial translation model provided by the embodiment of the present application includes an encoder, a classifier, and a decoder.
- the encoder is configured to receive the training input sample and the corresponding perturbed input sample, and to output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is the intermediate representation result of the training input sample and the second intermediate representation result is the intermediate representation result of the corresponding perturbed input sample.
- the classifier is for distinguishing between the first intermediate representation result and the second intermediate representation result.
- the decoder is configured to output a training output sample according to the first intermediate representation result, and output the training output sample according to the second intermediate representation result.
- the model objective function of the initial translation model includes a classification objective function associated with the classifier and the encoder, a training objective function and a perturbation objective function associated with the encoder and the decoder;
- the classification objective function includes the training input sample, the corresponding disturbance input sample, parameters of the encoder, and parameters of the classifier;
- the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
- the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
- the training input sample may be represented by x;
- the corresponding perturbed input sample may be represented by x′;
- the training output sample and the perturbed output sample are represented by y;
- the first intermediate representation result may be represented by H_x;
- the second intermediate representation result may be represented by H_x′;
- the classification objective function may be represented by L_inv(x, x′);
- the training objective function may be represented by L_true(x, y);
- the perturbation objective function may be represented by L_noisy(x′, y).
- the initial translation model in the embodiment of the present application may be a neural machine translation model.
- the training goal for the initial translation model is to enable the initial translation model to remain substantially consistent for the translation behavior of x and x'.
- the encoder is responsible for converting the sentence x of the first language into H_x, and the decoder takes H_x as input and outputs the target language sentence y.
- the training goal of the embodiment of the present application is to train a perturbation-invariant encoder and decoder.
- L_inv(x, x′) encourages the encoder to output similar representations for x and x′, thereby realizing a perturbation-invariant encoder; this is achieved through adversarial learning.
- L noisy (x', y) directs the decoder to generate a target language statement y for the input x' containing the perturbation.
- the two newly introduced training objectives enable the robustness of the neural machine translation model so that it can be protected from drastic changes in the output space due to small disturbances in the input.
- the training target L true (x, y) on the original data x and y will be introduced to ensure the quality of the translation while enhancing the robustness of the neural machine translation model.
- θ_enc is the parameter of the encoder;
- θ_dec is the parameter of the decoder;
- θ_dis is the parameter of the classifier;
- α and β are used to control the trade-off between the original translation task and the stability of the machine translation model.
- the goal of the perturbation-invariant encoder is that when the encoder receives a correct sentence x and its corresponding perturbed sentence x′, the encoder's representations of the two sentences are indistinguishable, which directly contributes to the robustness of the decoder output.
- the encoder can be used as the generator G, which defines the process of generating the hidden representation sequence H_x.
- a classifier D is also introduced to distinguish the representation H_x of the original input from the representation H_x′ of the perturbed input.
- the role of the generator G is to produce similar representations for x and x′ so that the classifier D cannot distinguish them, whereas the role of the classifier D is to try to distinguish them.
- the adversarial learning objectives are defined as:
- given an input, the classifier outputs a classification value; the goal is to maximize the classification value of the correct sentence x while minimizing the classification value of the perturbed sentence x′.
- stochastic gradient descent is used to optimize the model objective function J(θ).
- in forward propagation, each step uses not only a batch of data containing x and y but also a batch of data containing x′ and y.
- the value of J(θ) can be calculated from the two batches of data, and the gradients of J(θ) with respect to the model parameters are then computed and used to update the model parameters. Since the goal of L_inv is to maximize the classification value of the correct sentence x while minimizing that of the perturbed sentence x′, the L_inv gradient with respect to the parameter set θ_enc is multiplied by -1, while the other gradients propagate normally. In this way, the values of θ_enc, θ_dec, and θ_dis in the initial translation model can be learned, thereby training a target translation model with anti-noise capability.
- the training of the initial translation model using the plurality of training sample pairs, the perturbed input sample set corresponding to each of the training input samples, and the perturbed output samples in the corresponding perturbed output sample set to obtain the target translation model includes:
- the training process of the target translation model is introduced above.
- the process of using the target translation model for statement translation is described below.
- FIG. 4 is a schematic diagram of an embodiment of a method for translation of a sentence in the embodiment of the present application. As shown in FIG. 4, an embodiment of a method for translation of a sentence provided by an embodiment of the present application includes:
- the terminal device receives the first to-be-translated statement expressed in the first language.
- the first language may be any one of the types of languages supported by the target translation model.
- the terminal device translates the first sentence to be translated using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using multiple training samples and the perturbed sample set corresponding to each of the training samples; the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value.
- the terminal device outputs the translation result statement expressed in the second language.
- the second language is a language different from the first language, for example, the first language is Chinese and the second language is English.
- since the target translation model has anti-noise capability, it can translate correctly even when a sentence containing noise is received, thereby improving the robustness of machine translation and the quality of translation.
- the method may further include:
- the second sentence to be translated is a perturbed version of the first sentence to be translated, and the semantic similarity between the second sentence to be translated and the first sentence to be translated is higher than the first preset value;
- the terminal device outputs the statement of the translation result.
- the first statement to be translated is not limited to the training input sample in the above example, and may be one of the above-mentioned disturbance input samples.
- FIG. 5 is a schematic diagram of an application scenario of a sentence translation in the embodiment of the present application
- (A)-(C) in FIG. 5 are diagrams of a scenario of text translation in a social application according to an embodiment of the present application.
- FIG. 5 is a diagram showing another example of a scenario of text translation in a social application according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of another application scenario of the sentence translation in the embodiment of the present application.
- as shown in (A) of FIG. 6, to translate the sentence "they play the game AI without fear of difficulty" into English, the user long-presses the text.
- the page shown in (B) of FIG. 6 then appears, with functions such as "copy", "forward", "delete", and "translate".
- when the user clicks "Translate" on the page shown in (B) of FIG. 6, the translation result "They are not afraid of difficulties to make Go AI" shown in (C) of FIG. 6 appears.
- FIG. 7 is a schematic diagram of an application of sentence translation in a simultaneous interpretation scenario according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of another application of the sentence translation in the simultaneous interpretation scenario according to an embodiment of the present application.
- the above embodiments introduce the training process of the target translation model in the embodiment of the present application and the process of using the target translation model for sentence translation; the apparatus for translation model training, the apparatus for sentence translation, the computer device, and the terminal device are introduced below with reference to the accompanying drawings.
- FIG. 9 is a schematic diagram of an embodiment of an apparatus for training a translation model in the embodiment of the present application.
- the apparatus 30 for translation model training provided by the embodiment of the present application includes one or more processors and one or more memories storing program units, where the program units are executed by the processor.
- the program units include:
- the obtaining unit 301 is configured to acquire a training sample set, where the training sample set includes multiple training samples;
- the determining unit 302 is configured to determine a perturbed sample set corresponding to each training sample in the training sample set acquired by the obtaining unit 301, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value;
- the model training unit 303 is configured to use the plurality of training samples obtained by the obtaining unit 301 and the perturbed sample set corresponding to each of the training samples determined by the determining unit 302 to train an initial translation model to obtain a target Translation model.
- Because perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value (that is, their semantics are very similar), the trained target translation model can still translate correctly when it receives a sentence containing noise, thereby improving the robustness of machine translation and the quality of translation.
- Each training sample is a training sample pair, where the training sample pair includes a training input sample and a training output sample; a perturbed input sample set corresponding to each training input sample is determined, together with the perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, and the perturbed output sample being identical to the training output sample;
- The model training unit 303 is configured to train an initial translation model using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain a target translation model.
- the determining unit 302 is configured to:
- Each time, a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
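The Gaussian-noise perturbation described above can be sketched as follows. The embedding dimensionality, the noise scale `sigma`, and the toy word vectors are assumptions for illustration only:

```python
import random

def perturb_word_vectors(word_vectors, sigma=0.01, seed=None):
    """Superimpose a different Gaussian noise vector on each word vector.

    `word_vectors` holds one embedding per word of a training input sample.
    Each call draws fresh noise, so calling it several times produces a set
    of distinct perturbed samples with near-identical semantics.
    """
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in vec] for vec in word_vectors]

# Toy 3-dimensional embeddings for a two-word sentence (illustrative values).
sentence_vecs = [[0.2, -0.1, 0.5], [0.7, 0.0, -0.3]]
noisy = perturb_word_vectors(sentence_vecs, sigma=0.01, seed=0)
```

Because the noise scale is small relative to the embeddings, each perturbed sample stays close to the original in embedding space, matching the requirement that its semantic similarity to the training sample remain above the preset threshold.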
- the initial translation model includes an encoder, a classifier, and a decoder
- The encoder is configured to receive the training input sample and a corresponding perturbed input sample, and to output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is an intermediate representation result of the training input sample, and the second intermediate representation result is an intermediate representation result of the corresponding perturbed input sample;
- The classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result;
- The decoder is configured to output the training output sample according to the first intermediate representation result, and to output the same training output sample according to the second intermediate representation result.
- The model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
- The classification objective function includes the training input sample, the corresponding perturbed input sample, parameters of the encoder, and parameters of the classifier;
- the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
- the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
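In the training procedure described in this application, the gradient of the classification objective with respect to the encoder parameters is multiplied by -1 (gradient reversal), so the encoder and the classifier are trained adversarially: the classifier learns to tell clean and perturbed representations apart, while the encoder learns to make them indistinguishable. A minimal numeric sketch, with hypothetical gradient values, equal weighting of the three objective terms, and an assumed learning rate:

```python
def combine_encoder_grads(g_train, g_perturb, g_class):
    """Combine per-objective gradients for the encoder parameters.

    The training and perturbation objectives contribute their gradients
    directly; the classification objective's gradient is multiplied by -1
    (gradient reversal), pushing the encoder toward representations the
    classifier cannot tell apart, while the classifier itself is still
    updated (without reversal) to tell them apart.
    """
    return [gt + gp - gc for gt, gp, gc in zip(g_train, g_perturb, g_class)]

def gradient_descent_step(params, grads, lr=0.1):
    """One plain gradient-descent update, as specified for the optimization."""
    return [p - lr * g for p, g in zip(params, grads)]

# Hypothetical 2-parameter encoder and made-up per-objective gradients.
enc_params = [0.5, -0.2]
g = combine_encoder_grads([0.1, 0.0], [0.05, 0.02], [0.3, -0.1])
enc_params = gradient_descent_step(enc_params, g)
```

In an autograd framework the same effect is usually obtained with a gradient reversal layer between the encoder and the classifier, so a single backward pass applies the -1 factor automatically.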
- model training unit 303 is configured to:
- the device 30 for training the translation model provided by the embodiment of the present application can be understood by referring to the corresponding content in the embodiment of the method, and details are not repeated herein.
- an embodiment of an apparatus for statement translation provided by an embodiment of the present application includes one or more processors, and one or more memories storing program units, wherein the program units are executed by a processor.
- the program unit includes:
- the receiving unit 401 is configured to receive the first to-be-translated statement expressed in the first language
- The translating unit 402 is configured to translate the first to-be-translated sentence received by the receiving unit 401 using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and the perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
- The output unit 403 is configured to output the translation result sentence, expressed in the second language, produced by the translating unit 402.
- Since the target translation model has anti-noise capability, translation can be performed correctly even when a sentence containing noise is received, thereby improving the robustness of machine translation and the quality of translation.
- The receiving unit 401 is further configured to receive a second to-be-translated sentence expressed in the first language, where the second to-be-translated sentence is a perturbed version of the first to-be-translated sentence, and the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value;
- The translating unit 402 is further configured to translate the second to-be-translated sentence using the target translation model to obtain the translation result sentence corresponding to the first to-be-translated sentence;
- the output unit 403 is further configured to output the translation result statement.
- The above apparatus 40 for sentence translation can be understood by referring to the corresponding content of the method embodiments, and details are not repeated herein.
- FIG. 11 is a schematic diagram of an embodiment of a computer device in an embodiment of the present application.
- Computer device 50 includes a processor 510, a memory 540, and an input/output (I/O) interface 530. The memory 540 may include read-only memory and random access memory, and provides operation instructions and data to the processor 510.
- a portion of memory 540 may also include non-volatile random access memory (NVRAM).
- Memory 540 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.
- In a possible implementation, by calling operation instructions stored in the memory 540 (the operation instructions may be stored in the operating system), the processor 510 performs the following operations:
- the training sample set includes a plurality of training samples
- the initial translation model is trained using the plurality of training samples and the respective perturbed sample sets corresponding to each of the training samples to obtain a target translation model.
- Because perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value (that is, their semantics are very similar), the trained target translation model can still translate correctly when it receives a sentence containing noise, thereby improving the robustness of machine translation and the quality of translation.
- The processor 510 controls the operation of the computer device 50; the processor 510 may also be referred to as a central processing unit (CPU).
- Memory 540 can include read only memory and random access memory and provides instructions and data to processor 510.
- a portion of memory 540 may also include non-volatile random access memory (NVRAM).
- In a specific application, the components of the computer device 50 are coupled together by a bus system 520.
- the bus system 520 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 520 in the figure.
- Processor 510 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 510 or an instruction in a form of software.
- The processor 510 described above may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
- the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
- the storage medium is located in memory 540, and processor 510 reads the information in memory 540 and, in conjunction with its hardware, performs the steps of the above method.
- the processor 510 is configured to:
- Each training sample is a training sample pair; when the training sample pair includes a training input sample and a training output sample, a perturbed input sample set corresponding to each training input sample is determined, together with the perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, and the perturbed output sample being identical to the training output sample;
- The initial translation model is trained using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain a target translation model.
- the processor 510 is configured to:
- Each time, a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
- the initial translation model includes an encoder, a classifier, and a decoder
- The encoder is configured to receive the training input sample and a corresponding perturbed input sample, and to output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is an intermediate representation result of the training input sample, and the second intermediate representation result is an intermediate representation result of the corresponding perturbed input sample;
- The classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result;
- The decoder is configured to output the training output sample according to the first intermediate representation result, and to output the same training output sample according to the second intermediate representation result.
- The model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
- The classification objective function includes the training input sample, the corresponding perturbed input sample, parameters of the encoder, and parameters of the classifier;
- the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
- the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
- the processor 510 is configured to:
- the description of the computer device 50 can be understood by referring to the description in the parts of FIG. 1 to FIG. 3, and details are not repeated herein.
- The above sentence translation process may be performed by a terminal device, such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, or an in-vehicle computer. The following description takes the terminal being a mobile phone as an example.
- FIG. 12 is a schematic diagram of an embodiment of a terminal device according to an embodiment of the present application.
- the terminal device is a mobile phone.
- The mobile phone includes: a radio frequency (RF) circuit 1110, a memory 1120, an input unit 1130, a display unit 1140, a sensor 1150, an audio circuit 1160, a wireless fidelity (WiFi) module 1170, a processor 1180, and components such as a camera 1190.
- the structure of the handset shown in Figure 12 does not constitute a limitation to the handset, and may include more or fewer components than those illustrated, or some components may be combined, or different components may be arranged.
- The RF circuit 1110 can be used to receive and transmit information, or to receive and transmit signals during a call; the RF circuit 1110 is also referred to as a transceiver. Specifically, after downlink information from the base station is received, it is delivered to the processor 1180 for processing; in addition, uplink data is transmitted to the base station.
- RF circuit 1110 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
- RF circuitry 1110 can also communicate with the network and other devices via wireless communication.
- The above wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
- the memory 1120 can be used to store software programs and modules, and the processor 1180 executes various functional applications and data processing of the mobile phone by running software programs and modules stored in the memory 1120.
- The memory 1120 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the mobile phone (such as audio data, a phone book, etc.).
- Memory 1120 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the input unit 1130 can be configured to receive a statement to be translated and a translation indicator input by the user.
- the input unit 1130 may include a touch panel 1131 and other input devices 1132.
- The touch panel 1131, also referred to as a touch screen, can collect touch operations by the user on or near it (such as operations performed by the user on or near the touch panel 1131 using a finger, a stylus, or another suitable object or accessory), and drive the corresponding connecting device according to a preset program.
- the touch panel 1131 may include two parts: a touch detection device and a touch controller.
- The touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 1180, and can receive commands from the processor 1180 and execute them.
- the touch panel 1131 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
- the input unit 1130 may also include other input devices 1132.
- other input devices 1132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
- Display unit 1140 can be used to display the results of the translation.
- the display unit 1140 may include a display panel 1141.
- the display panel 1141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
- The touch panel 1131 can cover the display panel 1141. After detecting a touch operation on or near it, the touch panel 1131 transmits the operation to the processor 1180 to determine the type of the touch event; the processor 1180 then provides a corresponding visual output on the display panel 1141 according to the type of the touch event.
- Although the touch panel 1131 and the display panel 1141 are used as two independent components in FIG. 12 to implement the input and output functions of the mobile phone, in some embodiments the touch panel 1131 and the display panel 1141 may be integrated to realize the input and output functions of the phone.
- the handset may also include at least one type of sensor 1150, such as a light sensor, motion sensor, and other sensors.
- The light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 1141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1141 and/or the backlight when the mobile phone moves close to the ear.
- As one kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), and can detect the magnitude and direction of gravity when stationary. It can be used for applications that identify the posture of the mobile phone (such as switching between horizontal and vertical screens, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). As for other sensors such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors that may also be configured on the mobile phone, details are not described herein.
- An audio circuit 1160, a speaker 1161, and a microphone 1162 can provide an audio interface between the user and the handset.
- The audio circuit 1160 can transmit the electrical signal converted from the received audio data to the speaker 1161, which converts it into a sound signal for output; on the other hand, the microphone 1162 converts a collected sound signal into an electrical signal, which the audio circuit 1160 receives and converts into audio data. The audio data is then processed by the processor 1180 and transmitted to another mobile phone via the RF circuit 1110, or output to the memory 1120 for further processing.
- WiFi is a short-range wireless transmission technology.
- the mobile phone can help users to send and receive emails, browse web pages and access streaming media through the WiFi module 1170, which provides users with wireless broadband Internet access.
- Although FIG. 12 shows the WiFi module 1170, it can be understood that it is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
- The processor 1180 is the control center of the handset; it connects the various portions of the entire handset using various interfaces and lines, and performs the phone's various functions and processes data by running or executing software programs and/or modules stored in the memory 1120 and invoking data stored in the memory 1120, thereby monitoring the mobile phone as a whole.
- the processor 1180 may include one or more processing units; preferably, the processor 1180 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
- the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 1180.
- Camera 1190 is used to capture images.
- the mobile phone also includes a power source (such as a battery) that supplies power to various components.
- the power source can be logically coupled to the processor 1180 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
- the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- the processor 1180 included in the terminal further has the following control functions:
- The target translation model is trained using a plurality of training samples and the respective perturbed sample sets of the training samples, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
- the translation result statement expressed in the second language is output.
- The second to-be-translated sentence is a perturbed version of the first to-be-translated sentence, and the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value;
- the translation result statement is output.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- The computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave).
- The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, integrating one or more available media.
- The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid-state drive (SSD)), or the like.
- The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, or an optical disc.
- Because perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value (that is, their semantics are very similar), the trained target translation model can still translate correctly when it receives a sentence containing noise, thereby improving the robustness of machine translation and the quality of translation.
Abstract
Description
Claims (15)
- A method for translation model training, comprising: acquiring, by a computer device, a training sample set, the training sample set including a plurality of training samples; determining, by the computer device, a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set including at least one perturbed sample, where the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and training, by the computer device, an initial translation model using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model.
- The method according to claim 1, wherein each training sample is a training sample pair, the training sample pair including a training input sample and a training output sample; the determining, by the computer device, of the perturbed sample set corresponding to each training sample comprises: determining a perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, and the perturbed output sample being identical to the training output sample; and the training, by the computer device, of the initial translation model using the plurality of training samples and the perturbed sample set corresponding to each training sample to obtain the target translation model comprises: training the initial translation model using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model.
- The method according to claim 2, wherein the determining, by the computer device, of the perturbed input sample set corresponding to each training input sample in the training sample set comprises: determining a first word in each training input sample, the first word being a word to be replaced; and replacing the first word with at least one second word, respectively, to obtain the perturbed input sample set, where the semantic similarity between the second word and the first word is higher than a second preset value.
- The method according to claim 2, wherein the determining, by the computer device, of the perturbed input sample set corresponding to each training input sample in the training sample set comprises: determining a word vector of each word in each training input sample; and superimposing, each time, a different Gaussian noise vector on the word vector of each word, to obtain the perturbed sample set.
- The method according to any one of claims 2-4, wherein the initial translation model includes an encoder, a classifier, and a decoder; the encoder is configured to receive the training input sample and the corresponding perturbed input sample, and to output a first intermediate representation result and a second intermediate representation result, the first intermediate representation result being an intermediate representation result of the training input sample, and the second intermediate representation result being an intermediate representation result of the corresponding perturbed input sample; the classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result; and the decoder is configured to output the training output sample according to the first intermediate representation result, and to output the training output sample according to the second intermediate representation result.
- The method according to claim 5, wherein the model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder; the classification objective function includes the training input sample, the corresponding perturbed input sample, parameters of the encoder, and parameters of the classifier; the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder; and the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
- The method according to claim 6, wherein the training, by the computer device, of the initial translation model using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model, comprises: inputting each training input sample, the corresponding perturbed input sample, and the corresponding training output sample into the model objective function; and optimizing the model objective function by gradient descent to determine the values of the parameters of the encoder, the parameters of the decoder, and the parameters of the classifier, where the gradient with respect to the parameters of the encoder in the classification objective function is multiplied by -1.
- A method for sentence translation, comprising: receiving, by a terminal device, a first to-be-translated sentence expressed in a first language; translating, by the terminal device, the first to-be-translated sentence using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and outputting, by the terminal device, the translation result sentence expressed in the second language.
- The method according to claim 8, wherein the method further comprises: receiving, by the terminal device, a second to-be-translated sentence expressed in the first language, the second to-be-translated sentence being a perturbed version of the first to-be-translated sentence, where the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value; translating, by the terminal device, the second to-be-translated sentence using the target translation model to obtain the translation result sentence corresponding to the first to-be-translated sentence; and outputting the translation result sentence.
- An apparatus for translation model training, comprising one or more processors and one or more memories storing program units, where the program units are executed by the processor and include: an obtaining unit, configured to acquire a training sample set, the training sample set including a plurality of training samples; a determining unit, configured to determine a perturbed sample set corresponding to each training sample in the training sample set acquired by the obtaining unit, the perturbed sample set including at least one perturbed sample, where the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and a model training unit, configured to train an initial translation model using the plurality of training samples obtained by the obtaining unit and the perturbed sample set corresponding to each training sample determined by the determining unit, to obtain a target translation model.
- The apparatus according to claim 10, wherein the determining unit is configured to, when each training sample is a training sample pair including a training input sample and a training output sample, determine a perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, and the perturbed output sample being identical to the training output sample; and the model training unit is configured to train the initial translation model using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model.
- An apparatus for sentence translation, comprising one or more processors and one or more memories storing program units, where the program units are executed by the processor and include: a receiving unit, configured to receive a first to-be-translated sentence expressed in a first language; a translating unit, configured to translate the first to-be-translated sentence received by the receiving unit using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and an output unit, configured to output the translation result sentence, expressed in the second language, produced by the translating unit.
- A computer device, comprising: an input/output (I/O) interface, a processor, and a memory storing program instructions; the processor is configured to execute the program instructions stored in the memory to perform the method according to any one of claims 1-7.
- A terminal device, comprising: an input/output (I/O) interface, a processor, and a memory storing program instructions; the processor is configured to execute the program instructions stored in the memory to perform the method according to claim 8 or 9.
- A non-transitory computer-readable storage medium comprising instructions that, when run on a computer device, cause the computer device to perform the method according to any one of claims 1-7 or the method according to claim 8 or 9.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020545261A JP7179273B2 (ja) | 2018-05-10 | 2019-03-29 | Translation model training method, phrase translation method, device, storage medium, and computer program
EP19800044.0A EP3792789A4 (en) | 2018-05-10 | 2019-03-29 | MODEL TRANSLATION LEARNING PROCESS, SENTENCE TRANSLATION PROCESS AND APPARATUS, AND INFORMATION MEDIA |
US16/987,565 US11900069B2 (en) | 2018-05-10 | 2020-08-07 | Translation model training method, sentence translation method, device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810445783.2 | 2018-05-10 | ||
CN201810445783.2A CN110472251B (zh) | 2018-05-10 | 2018-05-10 | Translation model training method, sentence translation method, device, and storage medium
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/987,565 Continuation US11900069B2 (en) | 2018-05-10 | 2020-08-07 | Translation model training method, sentence translation method, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019214365A1 true WO2019214365A1 (zh) | 2019-11-14 |
Family
ID=68466679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/080411 WO2019214365A1 (zh) | 2018-05-10 | 2019-03-29 | 翻译模型训练的方法、语句翻译的方法、设备及存储介质 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11900069B2 (zh) |
EP (1) | EP3792789A4 (zh) |
JP (1) | JP7179273B2 (zh) |
CN (1) | CN110472251B (zh) |
WO (1) | WO2019214365A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111859995A (zh) * | 2020-06-16 | 2020-10-30 | 北京百度网讯科技有限公司 | Machine translation model training method and apparatus, electronic device, and storage medium |
CN112364999A (zh) * | 2020-10-19 | 2021-02-12 | 深圳市超算科技开发有限公司 | Training method and apparatus for a chiller regulation model, and electronic device |
CN112541557A (zh) * | 2020-12-25 | 2021-03-23 | 北京百度网讯科技有限公司 | Training method and apparatus for a generative adversarial network, and electronic device |
CN113110843A (zh) * | 2021-03-05 | 2021-07-13 | 卓尔智联(武汉)研究院有限公司 | Contract generation model training method, contract generation method, and electronic device |
CN114241268A (zh) * | 2021-12-21 | 2022-03-25 | 支付宝(杭州)信息技术有限公司 | Model training method, apparatus, and device |
CN115081462A (zh) * | 2022-06-15 | 2022-09-20 | 京东科技信息技术有限公司 | Translation model training and translation method and apparatus |
CN111258991B (zh) * | 2020-01-08 | 2023-11-07 | 北京小米松果电子有限公司 | Data processing method, apparatus, and storage medium |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021141576A1 (en) * | 2020-01-08 | 2021-07-15 | Google, Llc | Translation of text depicted in images |
CN113283249B (zh) * | 2020-02-19 | 2024-09-27 | 阿里巴巴集团控股有限公司 | 机器翻译方法、装置及计算机可读存储介质 |
CN111859997B (zh) * | 2020-06-16 | 2024-01-26 | 北京百度网讯科技有限公司 | 机器翻译中的模型训练方法、装置、电子设备及存储介质 |
CN111723550B (zh) * | 2020-06-17 | 2024-07-12 | 腾讯科技(深圳)有限公司 | 语句改写方法、装置、电子设备以及计算机存储介质 |
CN111753556B (zh) * | 2020-06-24 | 2022-01-04 | 掌阅科技股份有限公司 | 双语对照阅读的方法、终端及计算机存储介质 |
CN112257459B (zh) * | 2020-10-16 | 2023-03-24 | 北京有竹居网络技术有限公司 | 语言翻译模型的训练方法、翻译方法、装置和电子设备 |
CN112328348A (zh) * | 2020-11-05 | 2021-02-05 | 深圳壹账通智能科技有限公司 | 应用程序多语言支持方法、装置、计算机设备及存储介质 |
CN112380883B (zh) * | 2020-12-04 | 2023-07-25 | 北京有竹居网络技术有限公司 | 模型训练方法、机器翻译方法、装置、设备及存储介质 |
CN112528637B (zh) * | 2020-12-11 | 2024-03-29 | 平安科技(深圳)有限公司 | 文本处理模型训练方法、装置、计算机设备和存储介质 |
CN112417895B (zh) * | 2020-12-15 | 2024-09-06 | 广州博冠信息科技有限公司 | 弹幕数据处理方法、装置、设备以及存储介质 |
US12001798B2 (en) * | 2021-01-13 | 2024-06-04 | Salesforce, Inc. | Generation of training data for machine learning based models for named entity recognition for natural language processing |
CN112598091B (zh) * | 2021-03-08 | 2021-09-07 | Beijing Sankuai Online Technology Co., Ltd. | Method and apparatus for model training and few-shot classification |
CN113204977B (zh) * | 2021-04-29 | 2023-09-26 | Beijing Youzhuju Network Technology Co., Ltd. | Information translation method, apparatus, device, and storage medium |
CN113609157B (zh) * | 2021-08-09 | 2023-06-30 | Ping An Technology (Shenzhen) Co., Ltd. | Language conversion model training method, language conversion method, apparatus, device, and medium |
CN113688245B (zh) * | 2021-08-31 | 2023-09-26 | Ping An Life Insurance Company of China, Ltd. | Processing method, apparatus, and device for an artificial-intelligence-based pre-trained language model |
CN113762397B (zh) * | 2021-09-10 | 2024-04-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Detection model training and high-precision map updating method, device, medium, and product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102439596A (zh) * | 2009-05-22 | 2012-05-02 | Microsoft Corporation | Mining phrase pairs from an unstructured resource |
CN105279252A (zh) * | 2015-10-12 | 2016-01-27 | Guangzhou Shenma Mobile Information Technology Co., Ltd. | Related-word mining method, search method, and search system |
CN107526720A (zh) * | 2016-06-17 | 2017-12-29 | Panasonic Intellectual Property Management Co., Ltd. | Meaning generation method, meaning generation apparatus, and program |
Family Cites Families (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007122525A | 2005-10-29 | 2007-05-17 | National Institute Of Information & Communication Technology | Paraphrase processing method and apparatus |
US9201871B2 (en) * | 2010-06-11 | 2015-12-01 | Microsoft Technology Licensing, Llc | Joint optimization for machine translation system combination |
JP5506569B2 (ja) * | 2010-06-29 | 2014-05-28 | KDDI R&D Laboratories, Inc. | Support vector machine retraining method and apparatus |
US8515736B1 (en) * | 2010-09-30 | 2013-08-20 | Nuance Communications, Inc. | Training call routing applications by reusing semantically-labeled data collected for prior applications |
US8972240B2 (en) * | 2011-05-19 | 2015-03-03 | Microsoft Corporation | User-modifiable word lattice display for editing documents and search queries |
CN102799579B (zh) * | 2012-07-18 | 2015-01-21 | Xi'an University of Technology | Statistical machine translation method with error self-diagnosis and self-correction |
CN104239286A (zh) * | 2013-06-24 | 2014-12-24 | Alibaba Group Holding Ltd. | Method and apparatus for mining synonymous phrases, and method and apparatus for searching related content |
US9026551B2 (en) * | 2013-06-25 | 2015-05-05 | Hartford Fire Insurance Company | System and method for evaluating text to support multiple insurance applications |
CN104933038A (zh) * | 2014-03-20 | 2015-09-23 | Toshiba Corporation | Machine translation method and machine translation apparatus |
US9443513B2 (en) * | 2014-03-24 | 2016-09-13 | Educational Testing Service | System and method for automated detection of plagiarized spoken responses |
US10055485B2 (en) * | 2014-11-25 | 2018-08-21 | International Business Machines Corporation | Terms for query expansion using unstructured data |
CN107438842A (zh) * | 2014-12-18 | 2017-12-05 | ASML Netherlands B.V. | Feature search by machine learning |
US10115055B2 (en) * | 2015-05-26 | 2018-10-30 | Booking.Com B.V. | Systems methods circuits and associated computer executable code for deep learning based natural language understanding |
US9984068B2 (en) * | 2015-09-18 | 2018-05-29 | Mcafee, Llc | Systems and methods for multilingual document filtering |
CN105512679A (zh) * | 2015-12-02 | 2016-04-20 | Tianjin University | Zero-shot classification method based on an extreme learning machine |
JP6671027B2 (ja) * | 2016-02-01 | 2020-03-25 | Panasonic Intellectual Property Management Co., Ltd. | Paraphrase generation method, apparatus, and program |
JP6655788B2 (ja) * | 2016-02-01 | 2020-02-26 | Panasonic Intellectual Property Management Co., Ltd. | Parallel corpus creation method, apparatus, and program, and machine translation system |
US20170286376A1 (en) * | 2016-03-31 | 2017-10-05 | Jonathan Mugan | Checking Grammar Using an Encoder and Decoder |
US10268686B2 (en) * | 2016-06-24 | 2019-04-23 | Facebook, Inc. | Machine translation system employing classifier |
CN107608973A (zh) * | 2016-07-12 | 2018-01-19 | Huawei Technologies Co., Ltd. | Neural-network-based translation method and apparatus |
KR102565275B1 (ko) * | 2016-08-10 | 2023-08-09 | Samsung Electronics Co., Ltd. | Translation method and apparatus based on parallel processing |
US20180061408A1 (en) * | 2016-08-24 | 2018-03-01 | Semantic Machines, Inc. | Using paraphrase in accepting utterances in an automated assistant |
CN109690577A (zh) * | 2016-09-07 | 2019-04-26 | Koninklijke Philips N.V. | Semi-supervised classification with stacked autoencoder |
JP6817556B2 | 2016-09-27 | 2021-01-20 | Panasonic Intellectual Property Management Co., Ltd. | Similar sentence generation method, similar sentence generation program, similar sentence generation apparatus, and similar sentence generation system |
KR102589638B1 (ko) * | 2016-10-31 | 2023-10-16 | Samsung Electronics Co., Ltd. | Sentence generation apparatus and method |
CN107180026B (zh) * | 2017-05-02 | 2020-12-29 | Soochow University | Event phrase learning method and apparatus based on word-embedding semantic mapping |
CN107273503B (zh) * | 2017-06-19 | 2020-07-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for generating same-language parallel text |
CN107463879A (zh) * | 2017-07-05 | 2017-12-12 | Chengdu Shulian Mingpin Technology Co., Ltd. | Human behavior recognition method based on deep learning |
CN107766577B (zh) * | 2017-11-15 | 2020-08-21 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Public opinion monitoring method, apparatus, device, and storage medium |
JP7149560B2 (ja) * | 2018-04-13 | 2022-10-07 | National Institute of Information and Communications Technology | Request paraphrasing system, method for training a request paraphrasing model and a request determination model, and dialogue system |
CN113761950B (zh) * | 2021-04-28 | 2024-08-27 | Tencent Technology (Shenzhen) Co., Ltd. | Translation model testing method and apparatus |
2018
- 2018-05-10 CN CN201810445783.2A patent/CN110472251B/zh active Active

2019
- 2019-03-29 JP JP2020545261A patent/JP7179273B2/ja active Active
- 2019-03-29 WO PCT/CN2019/080411 patent/WO2019214365A1/zh active Application Filing
- 2019-03-29 EP EP19800044.0A patent/EP3792789A4/en active Pending

2020
- 2020-08-07 US US16/987,565 patent/US11900069B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3792789A4 |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111258991B (zh) * | 2020-01-08 | 2023-11-07 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Data processing method, apparatus, and storage medium |
CN111859995A (zh) * | 2020-06-16 | 2020-10-30 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and apparatus for a machine translation model, electronic device, and storage medium |
CN111859995B (zh) * | 2020-06-16 | 2024-01-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and apparatus for a machine translation model, electronic device, and storage medium |
CN112364999A (zh) * | 2020-10-19 | 2021-02-12 | Shenzhen Supercomputing Technology Development Co., Ltd. | Training method and apparatus for a chiller regulation model, and electronic device |
CN112364999B (zh) * | 2020-10-19 | 2021-11-19 | Shenzhen Supercomputing Technology Development Co., Ltd. | Training method and apparatus for a chiller regulation model, and electronic device |
CN112541557A (zh) * | 2020-12-25 | 2021-03-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and apparatus for a generative adversarial network, and electronic device |
CN112541557B (zh) * | 2020-12-25 | 2024-04-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and apparatus for a generative adversarial network, and electronic device |
CN113110843A (zh) * | 2021-03-05 | 2021-07-13 | Zall Smart Commerce (Wuhan) Research Institute Co., Ltd. | Contract generation model training method, contract generation method, and electronic device |
CN113110843B (zh) * | 2021-03-05 | 2023-04-11 | Zall Smart Commerce (Wuhan) Research Institute Co., Ltd. | Contract generation model training method, contract generation method, and electronic device |
CN114241268A (zh) * | 2021-12-21 | 2022-03-25 | Alipay (Hangzhou) Information Technology Co., Ltd. | Model training method, apparatus, and device |
CN115081462A (zh) * | 2022-06-15 | 2022-09-20 | Jingdong Technology Information Technology Co., Ltd. | Translation model training, translation method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
US11900069B2 (en) | 2024-02-13 |
JP7179273B2 (ja) | 2022-11-29 |
EP3792789A4 (en) | 2021-07-07 |
EP3792789A1 (en) | 2021-03-17 |
CN110472251A (zh) | 2019-11-19 |
US20200364412A1 (en) | 2020-11-19 |
CN110472251B (zh) | 2023-05-30 |
JP2021515322A (ja) | 2021-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019214365A1 (zh) | Translation model training method, sentence translation method, device, and storage medium | |
US10956771B2 (en) | Image recognition method, terminal, and storage medium | |
KR102360659B1 (ko) | Machine translation method, apparatus, computer device, and storage medium | |
US12050881B2 (en) | Text translation method and apparatus, and storage medium | |
CN110334360B (zh) | Machine translation method and apparatus, electronic device, and storage medium | |
US10747954B2 (en) | System and method for performing tasks based on user inputs using natural language processing | |
CN107102746B (zh) | Candidate word generation method and apparatus, and apparatus for candidate word generation | |
WO2018107921A1 (zh) | Answer sentence determination method and server | |
US20150325236A1 (en) | Context specific language model scale factors | |
US10754885B2 (en) | System and method for visually searching and debugging conversational agents of electronic devices | |
US10573317B2 (en) | Speech recognition method and device | |
US9128930B2 (en) | Method, device and system for providing language service | |
CN108984535B (zh) | Sentence translation method, translation model training method, device, and storage medium | |
CN111563390B (zh) | Text generation method, apparatus, and electronic device | |
CN104462058B (zh) | Character string recognition method and apparatus | |
CN107870904A (zh) | Translation method and apparatus, and apparatus for translation | |
GB2533842A (en) | Text correction based on context | |
US11163377B2 (en) | Remote generation of executable code for a client application based on natural language commands captured at a client device | |
US20210110824A1 (en) | Electronic apparatus and controlling method thereof | |
US11741302B1 (en) | Automated artificial intelligence driven readability scoring techniques | |
CN108345590B (zh) | Translation method, apparatus, electronic device, and storage medium | |
CN117010386A (zh) | Object name recognition method, apparatus, and storage medium | |
CN116959407A (zh) | Pronunciation prediction method, apparatus, and related product | |
CN118378615A (zh) | Article quality assessment method, apparatus, storage medium, and electronic device | |
CN118485051A (zh) | Educational official document generation method, apparatus, computer device, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19800044 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020545261 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2019800044 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2019800044 Country of ref document: EP Effective date: 20201210 |