WO2019214365A1 - Method for training translation model, method for sentence translation, device, and storage medium - Google Patents

Method for training a translation model, method for sentence translation, device, and storage medium

Info

Publication number
WO2019214365A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
sample
perturbed
samples
translation
Prior art date
Application number
PCT/CN2019/080411
Other languages
English (en)
French (fr)
Inventor
程勇
涂兆鹏
孟凡东
翟俊杰
刘洋
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2020545261A (patent JP7179273B2)
Priority to EP19800044.0A (patent EP3792789A4)
Publication of WO2019214365A1
Priority to US16/987,565 (patent US11900069B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30196 Instruction operation extension or modification using decoder, e.g. decoder per instruction set, adaptable or programmable decoders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the embodiments of the present application relate to the field of computer technologies, and in particular, to a method for training a translation model, a method for sentence translation, a device, and a storage medium.
  • with the development of artificial intelligence, machine translation has been widely used, for example in simultaneous interpretation and chat-content translation, both of which rely on machine translation to convert input in one language into output in another language.
  • neural machine translation is a neural-network-based machine translation model, which has achieved a good translation level on many language pairs and has been widely used in various machine translation products.
  • however, because the neural machine translation model is built on a single complete neural network, the global nature of its modeling makes each output at the target side depend on every input word at the source side, leaving the model overly sensitive to small perturbations in the input.
  • the embodiments of the present application provide a method for training a translation model, a method for sentence translation, a device, and a storage medium, which can improve the robustness of machine translation and the quality of translation.
  • a first aspect of the embodiments of the present application provides a method for training a translation model, including:
  • the computer device acquires a training sample set, where the training sample set includes a plurality of training samples
  • the computer device determines a perturbed sample set corresponding to each training sample in the training sample set, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • the computer device trains the initial translation model using the plurality of training samples and the respective perturbed sample sets corresponding to each of the training samples to obtain a target translation model.
  • a second aspect of the embodiments of the present application provides a method for sentence translation, including:
  • the terminal device receives a first to-be-translated sentence expressed in a first language; the terminal device translates the first to-be-translated sentence using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • the terminal device outputs the translation result sentence expressed in the second language.
  • a third aspect of the embodiments of the present application provides an apparatus for training a translation model, including one or more processors, and one or more memories storing program units, wherein the program units are executed by a processor, and the program units include:
  • An obtaining unit configured to acquire a training sample set, where the training sample set includes a plurality of training samples
  • a determining unit configured to determine a perturbed sample set corresponding to each training sample in the training sample set acquired by the acquiring unit, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • a model training unit configured to train the initial translation model by using the plurality of training samples obtained by the acquiring unit and the perturbed sample set corresponding to each of the training samples determined by the determining unit to obtain a target translation model.
  • a fourth aspect of the embodiments of the present application provides an apparatus for sentence translation, including one or more processors and one or more memories storing program units, where the program units are executed by the processors and include:
  • a receiving unit configured to receive a first to-be-translated sentence expressed in a first language;
  • a translation unit configured to translate the first to-be-translated sentence received by the receiving unit using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • an output unit configured to output the translation result sentence, expressed in the second language, obtained by the translation unit.
  • a fifth aspect of the embodiments of the present application provides a computer device, including an input/output (I/O) interface, a processor, and a memory, where the memory stores program instructions;
  • the processor is configured to execute the program instructions stored in the memory to perform the method of the first aspect.
  • a sixth aspect of the embodiments of the present application provides a terminal device, where the terminal device includes an input/output (I/O) interface, a processor, and a memory, where the memory stores program instructions;
  • the processor is configured to execute the program instructions stored in the memory to perform the method of the second aspect.
  • a seventh aspect of the present application provides a non-transitory computer-readable storage medium comprising instructions which, when executed on a computer device, cause the computer device to perform the method described in the first aspect or the second aspect above.
  • Yet another aspect of an embodiment of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect described above.
  • perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value; that is, the perturbed samples and the training samples are semantically very similar, so the trained target translation model can still translate correctly when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • FIG. 1 is a schematic diagram of an embodiment of a system for training a translation model in an embodiment of the present application
  • FIG. 2 is a schematic diagram of an embodiment of a method for training a translation model in an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an initial translation model in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an embodiment of a method for sentence translation in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of sentence translation in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another application scenario of sentence translation in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another application scenario of sentence translation in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another application scenario of sentence translation in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an embodiment of an apparatus for training a translation model in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an embodiment of an apparatus for translating sentences in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an embodiment of a computer device in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of an embodiment of a terminal device in an embodiment of the present application.
  • the embodiment of the present application provides a method for training a translation model, which can improve the robustness of machine translation and the quality of translation.
  • the embodiment of the present application also provides a corresponding method for sentence translation, a computer device, a terminal device, and a computer readable storage medium. The details are described below separately.
  • machine translation is used in both simultaneous interpretation and text translation.
  • Machine translation is usually model-based: a translation model is pre-trained so that it can receive a sentence in one language and convert it into output in another language.
  • neural machine translation is a machine translation model based entirely on neural networks; its translation accuracy is high, but the model's anti-noise capability is poor, and once there is a slight disturbance in the input sentence the output sentence becomes inaccurate. Therefore, the embodiments of the present application provide a method for training a translation model in which various perturbed samples are introduced into the training samples during training, ensuring that the trained translation model can still translate correctly when it receives a sentence carrying a disturbance.
  • the disturbance includes noise.
  • FIG. 1 is a schematic diagram of an embodiment of a system for training a translation model in an embodiment of the present application.
  • an embodiment of a system for translation model training in an embodiment of the present application includes a computer device 10 and a database 20 in which training samples are stored.
  • the computer device 10 acquires a training sample set from the database 20, and then uses the training sample set to perform translation model training to obtain a target translation model.
  • FIG. 2 is a schematic diagram of an embodiment of a method for training a translation model in an embodiment of the present application.
  • an embodiment of a method for training a translation model provided by an embodiment of the present application includes:
  • the computer device acquires a training sample set, where the training sample set includes multiple training samples.
  • a training sample in the training sample set refers to a sample without disturbance.
  • the computer device determines a perturbed sample set corresponding to each training sample in the training sample set, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value.
  • a perturbed sample refers to a sample that contains disturbance information or noise but whose semantics remain basically the same as those of the training sample; the disturbance information may be a word with the same meaning but a different form, or any other word that does not change the semantics of the sentence much.
  • the first preset value in the embodiments of the present application may be a specific value, such as 90% or 95%; these are merely examples that do not limit the first preset value, which can be set as needed.
  • the computer device trains the initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each of the training samples to obtain a target translation model.
  • that is, each training sample is used for training together with its corresponding perturbed samples.
  • perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value; that is, the perturbed samples and the training samples are semantically very similar, so the trained target translation model can still translate correctly when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • Each training sample is a training sample pair, and the training sample pair includes a training input sample and a training output sample;
  • the determining, by the computer device, the set of perturbation samples corresponding to each training sample may include:
  • the computer device uses the plurality of training samples and the perturbed sample set corresponding to each of the training samples to train the initial translation model to obtain the target translation model, which may include:
  • the initial translation model is trained using the plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the corresponding perturbed output sample set, to obtain the target translation model.
  • the training input sample is a first language
  • the training output sample is a second language.
  • the first language is different from the second language.
  • the first language is exemplified by Chinese
  • the second language is exemplified by English.
  • Chinese and English should not be construed as limiting the translation model in the embodiments of the present application.
  • the translation model in the embodiment of the present application can be applied to translation between any two different languages. Translation between the two languages can be achieved as long as the training samples in the corresponding two languages are used during training.
  • each training input sample may have multiple perturbed input samples, but the perturbed output sample corresponding to each perturbed input sample is the same as the training output sample.
  • Table 1 is only an example.
  • the perturbed input samples corresponding to a training input sample may be fewer or more than those listed in Table 1.
  • the perturbation input samples are described above.
  • the generation of the perturbed input samples is described below.
  • a method of generating a perturbed input sample can be:
  • determining the perturbed input sample set corresponding to each training input sample in the training sample set may include:
  • a perturbed sentence is generated at the lexical level: given an input sentence, a first word to be modified is sampled and its position determined, and the first word at that position is then replaced with a second word from the word list.
  • the word list contains many words, and the choice of the second word can be understood with reference to the following formula (reconstructed here from the surrounding definitions, since the formula image is not reproduced in this text):

$x' = \arg\max_{x \in V,\, x \neq x_i} \cos\big(E[x_i], E[x]\big)$

  • E[x_i] is the word vector of the first word x_i, and cos(E[x_i], E[x]) measures the similarity between the first word x_i and a candidate second word x. Since word vectors capture the semantic information of words, this replacement allows the first word x_i in the current sentence to be replaced with a second word x that has similar semantic information.
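  • To make the lexical-level strategy concrete, the following is a minimal sketch in Python, assuming a pre-trained word-embedding table is available; the function and variable names are illustrative and not taken from the patent.

```python
import numpy as np

def replace_with_similar_word(sentence, position, embeddings):
    """Replace the first word at `position` with the second word whose
    word vector has the highest cosine similarity to it.

    `sentence` is a list of tokens; `embeddings` maps each vocabulary
    word to its word vector E[x]. Names here are illustrative only.
    """
    first_word = sentence[position]
    e_xi = embeddings[first_word]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    # Score every candidate second word x in the word list except x_i itself.
    best_word, best_score = None, -1.0
    for word, e_x in embeddings.items():
        if word == first_word:
            continue
        score = cosine(e_xi, e_x)
        if score > best_score:
            best_word, best_score = word, score

    perturbed = list(sentence)
    perturbed[position] = best_word  # second word with the most similar vector
    return perturbed
```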
  • Another type of perturbed input sample can be generated as follows:
  • determining the perturbed input sample set corresponding to each training input sample in the training sample set may include:
  • a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
  • a sentence with a disturbance is generated at the feature level: given a sentence, the word vector of each word in the sentence is obtained, and Gaussian noise is added to each word vector to simulate possible types of disturbance, which can be understood with reference to the following formula (reconstructed from the surrounding definitions):

$E[x'_i] = E[x_i] + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I)$

  • E[x_i] denotes the word vector of the word x_i, E[x'_i] is the word vector of the word after adding Gaussian noise, the vector ε is sampled from a Gaussian distribution with variance σ², and σ is a hyperparameter.
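  • A minimal sketch of the feature-level strategy, again in Python with illustrative names; the default value of sigma below is an assumption, since the text only states that σ is a hyperparameter.

```python
import numpy as np

def add_gaussian_noise(word_vectors, sigma=0.01):
    """Superimpose an independent Gaussian noise vector on the word vector
    of every word: E[x'_i] = E[x_i] + eps, with eps ~ N(0, sigma^2 I).
    `word_vectors` has shape (sentence_length, embedding_dim).
    """
    word_vectors = np.asarray(word_vectors, dtype=np.float32)
    noise = np.random.normal(loc=0.0, scale=sigma, size=word_vectors.shape)
    return word_vectors + noise.astype(np.float32)
```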
  • This technical solution is general in that any strategy for adding disturbances to the input can be freely defined.
  • FIG. 3 is a schematic structural diagram of an initial translation model in the embodiment of the present application.
  • the initial translation model provided by the embodiment of the present application includes an encoder, a classifier, and a decoder.
  • the encoder is configured to receive a training input sample and the corresponding perturbed input sample, and to output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is the intermediate representation result of the training input sample and the second intermediate representation result is the intermediate representation result of the corresponding perturbed input sample.
  • the classifier is for distinguishing between the first intermediate representation result and the second intermediate representation result.
  • the decoder is configured to output a training output sample according to the first intermediate representation result, and output the training output sample according to the second intermediate representation result.
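  • The data flow through these three components can be sketched as follows; `encoder`, `classifier`, and `decoder` stand for whatever concrete networks are used, which the text does not fix, so this is an illustration under assumed interfaces rather than the patent's implementation.

```python
def forward_pass(encoder, classifier, decoder, x, x_perturbed, y):
    """One forward pass over a training input sample x, its perturbed
    input sample x_perturbed, and the training output sample y."""
    h_x = encoder(x)              # first intermediate representation result H_x
    h_xp = encoder(x_perturbed)   # second intermediate representation result H_x'

    # The classifier tries to tell the two intermediate representations apart.
    d_x, d_xp = classifier(h_x), classifier(h_xp)

    # The decoder must produce the same training output sample y from both
    # representations (the `target=` keyword is an assumed interface).
    y_from_x = decoder(h_x, target=y)
    y_from_xp = decoder(h_xp, target=y)
    return d_x, d_xp, y_from_x, y_from_xp
```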
  • the model objective function of the initial translation model includes a classification objective function associated with the classifier and the encoder, a training objective function and a perturbation objective function associated with the encoder and the decoder;
  • the classification objective function includes the training input sample, the corresponding disturbance input sample, parameters of the encoder, and parameters of the classifier;
  • the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
  • the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
  • the training input sample may be represented by x;
  • the corresponding perturbed input sample may be represented by x';
  • the training output sample and the perturbed output sample are represented by y;
  • the first intermediate representation result may be represented by H_x;
  • the second intermediate representation result may be represented by H_{x'};
  • the classification objective function may be represented by L_inv(x, x');
  • the training objective function may be represented by L_true(x, y);
  • the perturbation objective function may be represented by L_noisy(x', y).
  • the initial translation model in the embodiment of the present application may be a neural machine translation model.
  • the training goal of the initial translation model is to keep its translation behavior substantially consistent for x and x'.
  • the encoder is responsible for converting the first-language sentence x into H_x, and the decoder takes H_x as input and outputs the target-language sentence y.
  • the training goal of the embodiments of the present application is to train a perturbation-invariant encoder and decoder.
  • L_inv(x, x') encourages the encoder to output similar representations for x and x', thereby implementing a perturbation-invariant encoder; this is achieved through adversarial learning.
  • L_noisy(x', y) guides the decoder to generate the target-language sentence y for an input x' containing a perturbation.
  • these two newly introduced training objectives provide the neural machine translation model with robustness, protecting it from drastic changes in the output space caused by small disturbances in the input.
  • the training objective L_true(x, y) on the original data x and y is also introduced to ensure translation quality while enhancing the robustness of the neural machine translation model.
  • θ_enc is the parameter set of the encoder;
  • θ_dec is the parameter set of the decoder;
  • θ_dis is the parameter set of the classifier;
  • α and β are used to control the trade-off between the original translation task and the stability of the machine translation model.
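  • Combining the three objective functions with these weights, the overall model objective presumably takes the following form; this is a reconstruction from the definitions above (the formula images are not reproduced in this text), not a formula quoted verbatim from the patent:

```latex
J(\theta) = L_{\mathrm{true}}(x, y;\, \theta_{\mathrm{enc}}, \theta_{\mathrm{dec}})
          + \alpha \, L_{\mathrm{inv}}(x, x';\, \theta_{\mathrm{enc}}, \theta_{\mathrm{dis}})
          + \beta \, L_{\mathrm{noisy}}(x', y;\, \theta_{\mathrm{enc}}, \theta_{\mathrm{dec}})
```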
  • the goal of the perturbation-invariant encoder is that when the encoder receives a correct sentence x and its corresponding perturbed sentence x', the representations the encoder produces for the two sentences are indistinguishable, which directly contributes to robust output from the decoder.
  • the encoder can be used as the generator G, which defines the process of generating the hidden representation sequence H_x.
  • a classifier D is also introduced to distinguish the representation H_x of the original input from the representation H_{x'} of the perturbed input.
  • the role of the generator G is to produce similar representations for x and x' so that the classifier D cannot distinguish them, whereas the role of the classifier D is to try to distinguish them.
  • the adversarial learning objectives are defined as:
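  • The formula itself is not reproduced in this text; an adversarial objective consistent with the surrounding description (the classifier D maximizes its output on H_x and minimizes it on H_{x'}, while the generator G, i.e. the encoder, works against it) would be the standard form:

```latex
L_{\mathrm{inv}}(x, x') = \mathbb{E}_{x}\big[\log D(G(x))\big]
                        + \mathbb{E}_{x'}\big[\log\big(1 - D(G(x'))\big)\big]
```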
  • given an input, the classifier outputs a classification value; its goal is to maximize the classification value of the correct sentence x while minimizing the classification value of the perturbed sentence x'.
  • stochastic gradient descent is used to optimize the model objective function J(θ).
  • in forward propagation, in addition to a batch of data containing x and y, a batch of data containing x' and y is also used.
  • the value of J(θ) can be calculated from the two batches of data, and the gradients of J(θ) with respect to the model parameters are then computed and used to update the model parameters. Since the goal of L_inv is to maximize the classification value of the correct sentence x while minimizing the classification value of the perturbed sentence x', the gradient of L_inv with respect to the parameter set θ_enc is multiplied by -1, while the other gradients propagate normally. In this way the values of θ_enc, θ_dec, and θ_dis in the initial translation model can be computed, training a target translation model with anti-noise capability.
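  • A minimal sketch of the gradient trick described above, written with PyTorch-style autograd; the framework choice and the assumption that the classifier outputs a probability in (0, 1) are mine, not the patent's.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -1 in the
    backward pass, so minimizing the classifier loss simultaneously pushes
    the encoder parameters theta_enc in the opposite direction."""

    @staticmethod
    def forward(ctx, h):
        return h.view_as(h)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def inv_loss(encoder, classifier, x, x_perturbed):
    h_x = encoder(x)
    h_xp = encoder(x_perturbed)
    # The reversal layer sits between encoder and classifier: the classifier
    # learns to separate H_x from H_x', while the reversed gradient trains
    # the encoder to make the two representations indistinguishable.
    d_x = classifier(GradReverse.apply(h_x))
    d_xp = classifier(GradReverse.apply(h_xp))
    return -(torch.log(d_x + 1e-8) + torch.log(1.0 - d_xp + 1e-8)).mean()
```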
  • in some embodiments, training the initial translation model using the plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the corresponding perturbed output sample set to obtain the target translation model includes:
  • the training process of the target translation model is introduced above.
  • the process of using the target translation model for statement translation is described below.
  • FIG. 4 is a schematic diagram of an embodiment of a method for sentence translation in an embodiment of the present application. As shown in FIG. 4, an embodiment of a method for sentence translation provided by an embodiment of the present application includes:
  • the terminal device receives a first to-be-translated sentence expressed in a first language.
  • the first language may be any one of the languages supported by the target translation model.
  • the terminal device translates the first to-be-translated sentence using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value.
  • the terminal device outputs the translation result sentence expressed in the second language.
  • the second language is a language different from the first language, for example, the first language is Chinese and the second language is English.
  • since the target translation model has anti-noise capability, it can translate correctly even when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • the method may further include:
  • the terminal device receives a second to-be-translated sentence expressed in the first language, where the second to-be-translated sentence is a perturbed sentence of the first to-be-translated sentence, and the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value;
  • the terminal device translates the second to-be-translated sentence using the target translation model to obtain the translation result sentence;
  • the terminal device outputs the translation result sentence.
  • that is, the first to-be-translated sentence is not limited to the training input sample in the above example and may also be one of the above-mentioned perturbed input samples.
  • FIG. 5 is a schematic diagram of an application scenario of sentence translation in an embodiment of the present application.
  • (A)-(C) in FIG. 5 are diagrams of a scenario of text translation in a social application according to an embodiment of the present application.
  • FIG. 5 also shows another example of a scenario of text translation in a social application according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another application scenario of sentence translation in an embodiment of the present application.
  • as shown in (A) of FIG. 6, if the sentence meaning "they make Go AI without fear of difficulty" is to be translated into English, the user long-presses the text.
  • the page shown in (B) of FIG. 6 then appears, offering functions such as "copy", "forward", "delete", and "translate".
  • when the user taps "Translate" on the page shown in (B) of FIG. 6, the translation result "They are not afraid of difficulties to make Go AI" shown in (C) of FIG. 6 appears.
  • FIG. 7 is a schematic diagram of an application of sentence translation in a simultaneous interpretation scenario according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another application of the sentence translation in the simultaneous interpretation scenario according to an embodiment of the present application.
  • the above embodiments introduce the training process of the target translation model in the embodiments of the present application and the process of using the target translation model to perform sentence translation.
  • the apparatus for translation model training, the apparatus for sentence translation, the computer device, and the terminal device are introduced in the following embodiments with reference to the accompanying drawings.
  • FIG. 9 is a schematic diagram of an embodiment of an apparatus for training a translation model in an embodiment of the present application.
  • the apparatus 30 for translation model training provided by the embodiments of the present application includes one or more processors and one or more memories storing program units, where the program units are executed by the processors.
  • the program units include:
  • the obtaining unit 301 is configured to acquire a training sample set, where the training sample set includes multiple training samples;
  • the determining unit 302 is configured to determine a perturbed sample set corresponding to each training sample in the training sample set acquired by the acquiring unit 301, where the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than the first preset value;
  • the model training unit 303 is configured to use the plurality of training samples obtained by the obtaining unit 301 and the perturbed sample set corresponding to each of the training samples determined by the determining unit 302 to train an initial translation model to obtain a target Translation model.
  • perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value; that is, the perturbed samples and the training samples are semantically very similar, so the trained target translation model can still translate correctly when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • in some embodiments, each training sample is a training sample pair, where the training sample pair includes a training input sample and a training output sample; the determining unit 302 is configured to determine a perturbed input sample set corresponding to each training input sample and a corresponding perturbed output sample set, where the perturbed input sample set includes at least one perturbed input sample and the perturbed output sample is identical to the training output sample;
  • the model training unit 303 is configured to train the initial translation model using the plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the corresponding perturbed output sample set, to obtain the target translation model.
  • the determining unit 302 is configured to:
  • the determining unit 302 is configured to:
  • a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
  • the initial translation model includes an encoder, a classifier, and a decoder
  • the encoder is configured to receive the training input sample and a corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is an intermediate representation result of the training input sample, The second intermediate representation result is an intermediate representation result of the corresponding perturbed input sample;
  • the classifier is configured to distinguish the first intermediate representation result and the second intermediate representation result
  • the decoder is configured to output the training output sample according to a first intermediate representation result, and output the training output sample according to the second intermediate representation result.
  • the model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, a training objective function and a perturbation objective function related to the encoder and the decoder;
  • the classification objective function includes the training input sample, the corresponding disturbance input sample, parameters of the encoder, and parameters of the classifier;
  • the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
  • the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
  • model training unit 303 is configured to:
  • the device 30 for training the translation model provided by the embodiment of the present application can be understood by referring to the corresponding content in the embodiment of the method, and details are not repeated herein.
  • an embodiment of an apparatus for sentence translation provided by an embodiment of the present application includes one or more processors and one or more memories storing program units, where the program units are executed by the processors.
  • the program unit includes:
  • the receiving unit 401 is configured to receive a first to-be-translated sentence expressed in a first language;
  • the translation unit 402 is configured to translate the first to-be-translated sentence received by the receiving unit 401 using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • the output unit 403 is configured to output the translation result sentence, expressed in the second language, obtained by the translation unit 402.
  • since the target translation model has anti-noise capability, it can translate correctly even when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • the receiving unit 401 is further configured to receive a second to-be-translated sentence expressed in the first language, where the second to-be-translated sentence is a perturbed sentence of the first to-be-translated sentence, and the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value;
  • the translation unit 402 is further configured to translate the second to-be-translated sentence using the target translation model to obtain the translation result sentence corresponding to the first to-be-translated sentence;
  • the output unit 403 is further configured to output the translation result sentence.
  • the above apparatus 40 for sentence translation can be understood by referring to the corresponding content of the method embodiments; details are not repeated herein.
  • FIG. 11 is a schematic diagram of an embodiment of a computer device in an embodiment of the present application.
  • computer device 50 includes a processor 510, a memory 540, and an input/output (I/O) interface 530; the memory 540 may include read-only memory and random access memory, and provides operating instructions and data to the processor 510.
  • a portion of the memory 540 may also include non-volatile random access memory (NVRAM).
  • the memory 540 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof:
  • in an embodiment of the present application, the processor 510 performs the following operations by calling the operation instructions stored in the memory 540 (the operation instructions may be stored in an operating system):
  • acquiring a training sample set, where the training sample set includes a plurality of training samples;
  • training the initial translation model using the plurality of training samples and the respective perturbed sample sets corresponding to the training samples, to obtain a target translation model.
  • perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value; that is, the perturbed samples and the training samples are semantically very similar, so the trained target translation model can still translate correctly when it receives a sentence carrying noise, thereby improving the robustness of machine translation and the quality of translation.
  • the processor 510 controls the operation of the computer device 50, which may also be referred to as a Central Processing Unit (CPU).
  • Memory 540 can include read only memory and random access memory and provides instructions and data to processor 510.
  • a portion of memory 540 may also include non-volatile random access memory (NVRAM).
  • the specific components of the computer device 50 are coupled together by a bus system 520 in a specific application.
  • the bus system 520 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 520 in the figure.
  • Processor 510 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 510 or an instruction in a form of software.
  • the processor 510 described above may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 540, and processor 510 reads the information in memory 540 and, in conjunction with its hardware, performs the steps of the above method.
  • the processor 510 is configured to:
  • when each training sample is a training sample pair including a training input sample and a training output sample, determining a perturbed input sample set corresponding to each training input sample and a corresponding perturbed output sample set, where the perturbed input sample set includes at least one perturbed input sample and the perturbed output sample is identical to the training output sample;
  • training the initial translation model using the plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the corresponding perturbed output sample set, to obtain the target translation model.
  • the processor 510 is configured to:
  • the processor 510 is configured to:
  • a different Gaussian noise vector is superimposed on the word vector of each word to obtain the perturbed sample set.
  • the initial translation model includes an encoder, a classifier, and a decoder
  • the encoder is configured to receive the training input sample and a corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, where the first intermediate representation result is an intermediate representation result of the training input sample, The second intermediate representation result is an intermediate representation result of the corresponding perturbed input sample;
  • the classifier is configured to distinguish the first intermediate representation result and the second intermediate representation result
  • the decoder is configured to output the training output sample according to a first intermediate representation result, and output the training output sample according to the second intermediate representation result.
  • the model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, a training objective function and a perturbation objective function related to the encoder and the decoder;
  • the classification objective function includes the training input sample, the corresponding disturbance input sample, parameters of the encoder, and parameters of the classifier;
  • the training objective function includes the training input sample, the training output sample, parameters of the encoder, and parameters of the decoder;
  • the perturbation objective function includes the perturbed input sample, the training output sample, parameters of the encoder, and parameters of the decoder.
  • the processor 510 is configured to:
  • the description of the computer device 50 can be understood by referring to the description in the parts of FIG. 1 to FIG. 3, and details are not repeated herein.
  • the process of sentence translation described above is performed by a terminal device, such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, or an in-vehicle computer; the following description takes a mobile phone as an example.
  • FIG. 12 is a schematic diagram of an embodiment of a terminal device according to an embodiment of the present application.
  • the terminal device is a mobile phone.
  • the mobile phone includes: a radio frequency (RF) circuit 1110, a memory 1120, an input unit 1130, a display unit 1140, a sensor 1150, an audio circuit 1160, a wireless fidelity (WiFi) module 1170, a processor 1180, and components such as a camera 1190.
  • the structure of the handset shown in Figure 12 does not constitute a limitation to the handset, and may include more or fewer components than those illustrated, or some components may be combined, or different components may be arranged.
  • the RF circuit 1110 can be used to receive and transmit information, or to receive and transmit signals during a call; the RF circuit 1110 is also a transceiver. Specifically, downlink information from a base station is received and handed to the processor 1180 for processing, and uplink data is transmitted to the base station.
  • RF circuit 1110 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • RF circuitry 1110 can also communicate with the network and other devices via wireless communication.
  • the above wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • the memory 1120 can be used to store software programs and modules, and the processor 1180 executes various functional applications and data processing of the mobile phone by running software programs and modules stored in the memory 1120.
  • the memory 1120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book).
  • memory 1120 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 1130 can be configured to receive a statement to be translated and a translation indicator input by the user.
  • the input unit 1130 may include a touch panel 1131 and other input devices 1132.
  • the touch panel 1131, also referred to as a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 1131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program.
  • the touch panel 1131 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1180, and can receive commands from the processor 1180 and execute them.
  • the touch panel 1131 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 1130 may also include other input devices 1132.
  • other input devices 1132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • Display unit 1140 can be used to display the results of the translation.
  • the display unit 1140 may include a display panel 1141.
  • the display panel 1141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 1131 can cover the display panel 1141. After detecting a touch operation on or near it, the touch panel 1131 transmits the operation to the processor 1180 to determine the type of the touch event, and the processor 1180 then provides a corresponding visual output on the display panel 1141 according to the type of the touch event.
  • although in FIG. 12 the touch panel 1131 and the display panel 1141 are used as two independent components to implement the input and output functions of the mobile phone, in some embodiments the touch panel 1131 and the display panel 1141 may be integrated to implement the input and output functions of the mobile phone.
  • the handset may also include at least one type of sensor 1150, such as a light sensor, motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display panel 1141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1141 and/or the backlight when the mobile phone is moved to the ear.
  • as a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, can detect the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer and tapping). The mobile phone may also be configured with a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and other sensors, which are not described here again.
  • An audio circuit 1160, a speaker 1161, and a microphone 1162 can provide an audio interface between the user and the handset.
  • on one hand, the audio circuit 1160 can convert received audio data into an electrical signal and transmit it to the speaker 1161, and the speaker 1161 converts it into a sound signal for output; on the other hand, the microphone 1162 converts a collected sound signal into an electrical signal, which the audio circuit 1160 receives and converts into audio data; the audio data is then processed by the processor 1180 and either transmitted to another mobile phone via the RF circuit 1110 or output to the memory 1120 for further processing.
  • WiFi is a short-range wireless transmission technology.
  • the mobile phone can help users to send and receive emails, browse web pages and access streaming media through the WiFi module 1170, which provides users with wireless broadband Internet access.
  • although FIG. 12 shows the WiFi module 1170, it can be understood that the module is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
  • the processor 1180 is the control center of the mobile phone; it connects all parts of the mobile phone using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1120 and invoking the data stored in the memory 1120, thereby monitoring the mobile phone as a whole.
  • the processor 1180 may include one or more processing units; preferably, the processor 1180 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 1180.
  • Camera 1190 is used to capture images.
  • the mobile phone also includes a power source (such as a battery) that supplies power to various components.
  • the power source can be logically coupled to the processor 1180 through a power management system, so that functions such as charging, discharging, and power-consumption management are implemented through the power management system.
  • the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • the processor 1180 included in the terminal further has the following control functions:
  • receiving a first to-be-translated sentence expressed in a first language; translating the first to-be-translated sentence using a target translation model to obtain a translation result sentence expressed in a second language, where the target translation model is trained using a plurality of training samples and the respective perturbed sample sets of the training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between each perturbed sample and the corresponding training sample is higher than a first preset value;
  • the translation result sentence expressed in the second language is output.
  • receiving a second to-be-translated sentence expressed in the first language, where the second to-be-translated sentence is a perturbed sentence of the first to-be-translated sentence, and the similarity between the second to-be-translated sentence and the first to-be-translated sentence is higher than the first preset value;
  • the translation result sentence is output.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a server or data center that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)), among others.
  • The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, or an optical disc.
  • Perturbed samples are used in training the translation model, and the semantic similarity between each perturbed sample and its training sample is higher than the first preset value; that is, the semantics of the perturbed sample and the training sample are very close. As a result, the trained target translation model can still translate correctly when it receives a sentence containing noise, thereby improving the robustness of machine translation and the translation quality.

Abstract

The embodiments of this application disclose a translation model training method, a sentence translation method, a device, and a storage medium. The translation model training method includes: a computer device obtains a training sample set that includes a plurality of training samples; the computer device determines a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set including at least one perturbed sample whose semantic similarity to the corresponding training sample is higher than a first preset value; and the computer device trains an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model. Because the solution provided in the embodiments of this application introduces perturbed samples during model training, it can improve the robustness of machine translation as well as translation quality.

Description

Translation model training method, sentence translation method, device, and storage medium
This application claims priority to Chinese Patent Application No. 201810445783.2, filed with the Chinese Patent Office on May 10, 2018 and entitled "Translation model training method, sentence translation method, device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of computer technologies, and specifically to a translation model training method, a sentence translation method, a device, and a storage medium.
Background
With the development of artificial intelligence, machine translation has been widely used; for example, simultaneous interpretation and chat-content translation are both based on machine translation, which converts input in one language into output in another language.
Neural machine translation is a machine translation model based entirely on neural networks. It has reached a high level of translation quality on many language pairs and has been widely applied in various machine translation products. However, because a neural machine translation model is built on a single complete neural network, its global modeling makes every output word on the target side depend on every input word on the source side, so the model is overly sensitive to small perturbations in the input. For example, in Chinese-to-English translation, when a user inputs "他们不怕困难做出围棋AI", the machine translation model outputs "They are not afraid of difficulties to make Go AI". However, when the user inputs the similar sentence "他们不畏困难做出围棋AI", the machine translation output changes drastically to "They are not afraid to make Go AI". Although the user only replaced one word with a synonym, the translation result changed dramatically.
It can be seen that the stability, that is, the robustness, of current neural machine translation is rather poor.
Summary
The embodiments of this application provide a translation model training method, a sentence translation method, a device, and a storage medium, which can improve the robustness of machine translation as well as translation quality.
A first aspect of the embodiments of this application provides a translation model training method, including:
a computer device obtaining a training sample set, the training sample set including a plurality of training samples;
the computer device determining a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set including at least one perturbed sample, the semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
the computer device training an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model.
A second aspect of the embodiments of this application provides a sentence translation method, including:
a terminal device receiving a first to-be-translated sentence expressed in a first language;
the terminal device translating the first to-be-translated sentence by using a target translation model, to obtain a translation result sentence expressed in a second language, where the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
the terminal device outputting the translation result sentence expressed in the second language.
A third aspect of the embodiments of this application provides a translation model training apparatus, including one or more processors and one or more memories storing program units, where the program units are executed by the processors and include:
an obtaining unit, configured to obtain a training sample set, the training sample set including a plurality of training samples;
a determining unit, configured to determine a perturbed sample set corresponding to each training sample in the training sample set obtained by the obtaining unit, the perturbed sample set including at least one perturbed sample, the semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
a model training unit, configured to train an initial translation model by using the plurality of training samples obtained by the obtaining unit and the perturbed sample set corresponding to each training sample determined by the determining unit, to obtain a target translation model.
A fourth aspect of the embodiments of this application provides a sentence translation apparatus, including one or more processors and one or more memories storing program units, where the program units are executed by the processors and include:
a receiving unit, configured to receive a first to-be-translated sentence expressed in a first language;
a translation unit, configured to translate the first to-be-translated sentence received by the receiving unit by using a target translation model, to obtain a translation result sentence expressed in a second language, where the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
an output unit, configured to output the translation result sentence expressed in the second language translated by the translation unit.
A fifth aspect of the embodiments of this application provides a computer device, including an input/output (I/O) interface, a processor, and a memory, the memory storing program instructions;
the processor is configured to execute the program instructions stored in the memory, to perform the method according to the first aspect.
A sixth aspect of the embodiments of this application provides a terminal device, including an input/output (I/O) interface, a processor, and a memory, the memory storing program instructions;
the processor is configured to execute the program instructions stored in the memory, to perform the method according to the second aspect.
A seventh aspect of the embodiments of this application provides a non-transitory computer-readable storage medium including instructions that, when run on a computer device, cause the computer device to perform the method according to the first aspect or the method according to the second aspect.
Yet another aspect of the embodiments of this application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or the second aspect.
In the embodiments of this application, perturbed samples are used during translation model training, and the semantic similarity between a perturbed sample and its training sample is higher than the first preset value; that is, their semantics are very close. A target translation model trained in this way can therefore translate correctly even when it receives a sentence containing noise, improving both the robustness of machine translation and the translation quality.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of a translation model training system according to an embodiment of this application;
FIG. 2 is a schematic diagram of an embodiment of a translation model training method according to an embodiment of this application;
FIG. 3 is a schematic architectural diagram of an initial translation model according to an embodiment of this application;
FIG. 4 is a schematic diagram of an embodiment of a sentence translation method according to an embodiment of this application;
FIG. 5 is a schematic diagram of an application scenario of sentence translation according to an embodiment of this application;
FIG. 6 is a schematic diagram of another application scenario of sentence translation according to an embodiment of this application;
FIG. 7 is a schematic diagram of another application scenario of sentence translation according to an embodiment of this application;
FIG. 8 is a schematic diagram of another application scenario of sentence translation according to an embodiment of this application;
FIG. 9 is a schematic diagram of an embodiment of a translation model training apparatus according to an embodiment of this application;
FIG. 10 is a schematic diagram of an embodiment of a sentence translation apparatus according to an embodiment of this application;
FIG. 11 is a schematic diagram of an embodiment of a computer device according to an embodiment of this application;
FIG. 12 is a schematic diagram of an embodiment of a terminal device according to an embodiment of this application.
Detailed Description
The following describes the embodiments of this application with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are likewise applicable to similar technical problems.
The embodiments of this application provide a translation model training method that can improve the robustness of machine translation as well as translation quality. The embodiments of this application further provide a corresponding sentence translation method, a computer device, a terminal device, and a computer-readable storage medium. Each of these is described in detail below.
With the development of artificial intelligence, machine translation has become increasingly accurate, which greatly facilitates users; for example, machine translation is used in scenarios such as simultaneous interpretation and text translation. Machine translation is usually model-based: a translation model is trained in advance, and the trained model can receive a sentence in one language and convert it into another language for output. Current neural machine translation models are based entirely on neural networks and achieve high translation accuracy, but they tolerate noise poorly: once there is a small perturbation in the input sentence, the output sentence becomes inaccurate. Therefore, the embodiments of this application provide a translation model training method that introduces various perturbed samples into the training samples during translation model training, thereby ensuring that the trained translation model can still translate correctly when it receives a perturbed sentence.
It should be noted that, in the embodiments of this application, perturbation includes noise.
The following describes the translation model training process in the embodiments of this application with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an embodiment of a translation model training system according to an embodiment of this application.
As shown in FIG. 1, an embodiment of the translation model training system in this embodiment of this application includes a computer device 10 and a database 20, the database 20 storing training samples.
The computer device 10 obtains a training sample set from the database 20, and then trains a translation model by using the training sample set to obtain a target translation model.
The model training process may be understood with reference to FIG. 2, which is a schematic diagram of an embodiment of a translation model training method according to an embodiment of this application.
As shown in FIG. 2, an embodiment of the translation model training method provided in this embodiment of this application includes the following steps:
101. A computer device obtains a training sample set, the training sample set including a plurality of training samples.
In this embodiment of this application, a training sample in the training sample set refers to a sample without perturbation.
102. The computer device determines a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set including at least one perturbed sample, the semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value.
In this embodiment of this application, a perturbed sample refers to a sample that contains perturbation information or noise but whose semantics remain essentially consistent with those of the training sample. The perturbation information may be a word with the same meaning but different wording, or any other word that does not substantially change the semantics of the sentence.
The first preset value in this embodiment of this application may be a specific value, such as 90% or 95%. These values are merely examples and do not limit the first preset value, which may be set as required.
The relationship between a training sample and a perturbed sample may be understood with reference to the following example:
Training sample: "他们不怕困难做出围棋AI" ("They are not afraid of difficulties to make Go AI").
Perturbed sample: "他们不畏困难做出围棋AI" (the same sentence with the word "不怕" replaced by its synonym "不畏").
As this example shows, the semantics of the training sample and the perturbed sample are very close; only a different word is used, that is, "不畏" replaces the original word "不怕".
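As an illustration only (not part of the patent), the semantic similarity checked against the first preset value could be scored, for example, by comparing averaged word embeddings of the two sentences. In the following Python sketch, the embedding dictionary, the tokenization, and the 0.9 threshold are all assumptions made for this example.

    import numpy as np

    def sentence_vector(tokens, embeddings):
        # Average the word vectors of all tokens found in the (hypothetical)
        # embeddings dict mapping token -> np.ndarray.
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        return np.mean(vecs, axis=0)

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def is_valid_perturbation(orig_tokens, pert_tokens, embeddings, threshold=0.9):
        # Keep a candidate perturbed sample only if its similarity to the
        # original training sample exceeds the first preset value (threshold).
        sim = cosine(sentence_vector(orig_tokens, embeddings),
                     sentence_vector(pert_tokens, embeddings))
        return sim > threshold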
103. The computer device trains an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model.
During model training, a training sample and its corresponding perturbed samples are used together.
In this embodiment of this application, perturbed samples are used during translation model training, and the semantic similarity between a perturbed sample and its training sample is higher than the first preset value; that is, their semantics are very close. A target translation model trained in this way can therefore translate correctly even when it receives a sentence containing noise, improving both the robustness of machine translation and the translation quality.
Optionally, in another embodiment of the translation model training method provided in the embodiments of this application,
each training sample is a training sample pair, the training sample pair including a training input sample and a training output sample;
correspondingly, the computer device determining the perturbed sample set corresponding to each training sample may include:
determining a perturbed input sample set corresponding to each training input sample and a perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, the perturbed output sample being the same as the training output sample; and
correspondingly, the computer device training an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model, may include:
training the initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to each perturbed input sample set, to obtain the target translation model.
In this embodiment of this application, the training input sample is in a first language and the training output sample is in a second language; the two languages are different. In the embodiments of this application, Chinese is used as an example of the first language and English as an example of the second language, but Chinese and English should not be understood as limiting the translation model in the embodiments of this application. The translation model in the embodiments of this application is applicable to translation between any two different languages; as long as training samples in the corresponding two languages are used during training, translation between those two languages can be achieved.
In this embodiment of this application, each training input sample may have a plurality of perturbed input samples, but the perturbed output sample corresponding to each perturbed input sample is the same as the training output sample.
The correspondence among training input samples, training output samples, perturbed input samples, and perturbed output samples may be understood with reference to Table 1.
Table 1
Training input sample: x      →  Training output sample: y
Perturbed input sample: x′1   →  Perturbed output sample: y
Perturbed input sample: x′2   →  Perturbed output sample: y
Perturbed input sample: x′3   →  Perturbed output sample: y
As Table 1 shows, when the training input sample is x, the training output sample is y; x corresponds to a plurality of perturbed input samples, x′1, x′2, x′3, and so on, and the perturbed output sample corresponding to each perturbed input sample is y. This ensures that the trained target translation model outputs the translation result y regardless of whether the input is x or x′1, x′2, or x′3, further guaranteeing the robustness and quality of the target translation model's translations.
Of course, Table 1 is merely an example; a training input sample may correspond to fewer or more perturbed input samples than listed in Table 1.
The perturbed input samples have been introduced above; the following describes how perturbed input samples are generated.
One way to generate perturbed input samples is as follows:
the determining a perturbed input sample set corresponding to each training input sample in the training sample set may include:
determining a first word in each training input sample, the first word being a word to be replaced; and
replacing the first word with each of at least one second word to obtain the perturbed input sample set, the semantic similarity between the second word and the first word being higher than a second preset value.
In this embodiment of this application, perturbed sentences are generated at the lexical level: given an input sentence, the first words to be modified are sampled, their positions are determined, and the first words at these positions are replaced with second words from a vocabulary.
The vocabulary contains many words; the selection of the second word may be understood with reference to the following formula:

x′_i = argmax_{x ∈ V, x ≠ x_i} cos(E[x_i], E[x])

where V is the vocabulary, E[x_i] is the word vector of the first word x_i, and cos(E[x_i], E[x]) measures the similarity between the first word x_i and a second word x. Because word vectors capture the semantic information of words, this replacement strategy reliably replaces the first word x_i in the current sentence with a second word x that carries similar semantic information.
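Purely as an illustration of this lexical-level strategy (not the patent's reference implementation), the following Python sketch replaces one sampled word with its most similar second word by cosine similarity; `embedding_matrix` is assumed to be a NumPy array of L2-normalized word vectors indexed by vocabulary id.

    import numpy as np

    def lexical_perturbation(token_ids, embedding_matrix, rng):
        # Sample the position of the first word to be replaced.
        pos = int(rng.integers(len(token_ids)))
        first_word = token_ids[pos]
        # Cosine similarity of the first word against the whole vocabulary;
        # rows of embedding_matrix are assumed to be L2-normalized.
        scores = embedding_matrix @ embedding_matrix[first_word]
        scores[first_word] = -np.inf           # exclude the word itself
        second_word = int(np.argmax(scores))   # most semantically similar word
        perturbed = list(token_ids)
        perturbed[pos] = second_word
        return perturbed

    # Example usage with a random 1000-word vocabulary of 64-dim vectors:
    # rng = np.random.default_rng(0)
    # E = rng.normal(size=(1000, 64)); E /= np.linalg.norm(E, axis=1, keepdims=True)
    # lexical_perturbation([3, 17, 42], E, rng)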
Another way to generate perturbed input samples is as follows:
the determining a perturbed input sample set corresponding to each training input sample in the training sample set may include:
determining a word vector of each word in each training input sample; and
superimposing a different Gaussian noise vector on the word vector of each word each time, to obtain the perturbed sample set.
In this embodiment of this application, perturbed sentences are generated at the feature level. Given a sentence, the vector of each word in the sentence can be obtained, and Gaussian noise is added to each word vector to simulate possible types of perturbation, which may be understood with reference to the following formula:

E[x′_i] = E[x_i] + ε,  ε ~ N(0, δ²I)

In the above formula, E[x_i] denotes the word vector of the word x_i, E[x′_i] is the word vector after Gaussian noise is added, the vector ε is sampled from Gaussian noise with variance δ², and δ is a hyperparameter.
This technical solution is a general scheme in which any strategy for adding perturbed input can be freely defined.
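A minimal sketch of this feature-level perturbation follows (illustrative only; the array shapes are assumptions made for the example):

    import numpy as np

    def feature_perturbation(word_vectors, delta, rng):
        # word_vectors: array of shape (sentence_length, embedding_dim),
        # one row per E[x_i]. Adds eps ~ N(0, delta^2 I) to every word
        # vector, where delta is the hyperparameter in the formula above.
        eps = rng.normal(loc=0.0, scale=delta, size=word_vectors.shape)
        return word_vectors + eps

    # Example usage:
    # rng = np.random.default_rng(0)
    # E_perturbed = feature_perturbation(np.zeros((5, 64)), delta=0.1, rng=rng)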
The generation of perturbed input samples has been described above; the following describes the architecture of the translation model in the embodiments of this application.
FIG. 3 is a schematic architectural diagram of an initial translation model according to an embodiment of this application. As shown in FIG. 3, the initial translation model provided in this embodiment of this application includes an encoder, a classifier, and a decoder.
The encoder is configured to receive a training input sample and a corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, the first intermediate representation result being the intermediate representation result of the training input sample, and the second intermediate representation result being the intermediate representation result of the corresponding perturbed input sample.
The classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result.
The decoder is configured to output the training output sample according to the first intermediate representation result, and to output the training output sample according to the second intermediate representation result.
The model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
the classification objective function includes the training input sample, the corresponding perturbed input sample, the parameters of the encoder, and the parameters of the classifier;
the training objective function includes the training input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder; and
the perturbation objective function includes the perturbed input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder.
In this embodiment of this application, the training input sample may be denoted by x, the corresponding perturbed input sample by x′, both the training output sample and the perturbed output sample by y, the first intermediate representation result by H_x, the second intermediate representation result by H_x′, the classification objective function by L_inv(x, x′), the training objective function by L_true(x, y), and the perturbation objective function by L_noisy(x′, y).
The initial translation model in this embodiment of this application may be a neural machine translation model.
The training goal for the initial translation model is to make its translation behavior for x and x′ essentially consistent. The encoder converts a sentence x in the first language into H_x, and the decoder takes H_x as input and outputs the target-language sentence y. The training goal of this embodiment of this application is to train a perturbation-invariant encoder and decoder.
Because x′ is a slight modification of x, the two have similar semantic information. Given an input pair (x, x′), the training goals during translation model training are: (1) the encoded representation H_x should be as close as possible to H_x′; (2) given H_x′, the decoder should output the same y. Therefore, this embodiment of this application introduces two training objectives to enhance the robustness of the encoder and the decoder:
L_inv(x, x′) is introduced to encourage the encoder to output similar representations for x and x′, thereby achieving a perturbation-invariant encoder; this goal is achieved through adversarial learning.
L_noisy(x′, y) is introduced to guide the decoder to produce the target-language sentence y for a perturbed input x′.
The two newly introduced training objectives achieve robustness of the neural machine translation model, protecting it from drastic changes in the output space caused by small perturbations in the input. Meanwhile, the training objective L_true(x, y) on the original data x and y is introduced to ensure that translation quality is enhanced while the robustness of the neural machine translation model is improved.
Therefore, the model objective function of the initial translation model is:

J(θ) = L_true(x, y; θ_enc, θ_dec) + α·L_inv(x, x′; θ_enc, θ_dis) + β·L_noisy(x′, y; θ_enc, θ_dec)

where θ_enc are the parameters of the encoder, θ_dec are the parameters of the decoder, and θ_dis are the parameters of the classifier. α and β control the relative importance of the original translation task and the stability of the machine translation model.
The goal of a perturbation-invariant encoder is that, when the encoder receives a correct sentence x and its corresponding perturbed sentence x′, the representations it produces for the two sentences are indistinguishable, which directly helps the decoder produce robust output. In this embodiment of this application, the encoder may serve as the generator G, which defines the process of producing the sequence of hidden representations H_x. A classifier D is introduced to distinguish the representation H_x of the original input from the representation H_x′ of the perturbed input. The role of the generator G is to produce similar representations for x and x′ so that the classifier D cannot distinguish them, whereas the role of the classifier D is to try its best to distinguish them.
Formally, the adversarial learning objective is defined as:

L_inv(x, x′; θ_enc, θ_dis) = E_{x~S}[−log D(G(x))] + E_{x′~N(x)}[−log(1 − D(G(x′)))]

Given an input, the classifier outputs a classification value; the goal is to maximize the classification value of the correct sentence x while minimizing the classification value of the perturbed sentence x′.
Stochastic gradient descent is used to optimize the model objective function J(θ). In forward propagation, in addition to a batch of data containing x and y, a batch of data containing x′ and y is also included. The value of J(θ) can be computed from these two batches; then the gradients of J(θ) with respect to the model parameters are computed, and these gradients are used to update the model parameters. Because the goal of L_inv is to maximize the classification value of the correct sentence x while minimizing the classification value of the perturbed sentence x′, the gradient of L_inv with respect to the parameter set θ_enc is multiplied by −1, while the other gradients propagate normally. In this way, the values of θ_enc, θ_dec, and θ_dis in the initial translation model can be computed, thereby training a noise-resistant target translation model.
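One common, generic way to realize the "multiply the gradient of L_inv with respect to θ_enc by −1" step is a gradient reversal layer. The PyTorch sketch below illustrates that technique under the assumption that such a framework is used; it is not the patent's reference implementation.

    import torch

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; negates the gradient in the backward pass.
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output.neg()

    def grad_reverse(x):
        # Inserted between the encoder output H and the classifier D, so that
        # minimizing the classifier loss pushes the encoder parameters in the
        # opposite, adversarial direction.
        return GradReverse.apply(x)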
That is, in this embodiment of this application, the training an initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain a target translation model includes:
inputting each training input sample, the corresponding perturbed input sample, and the corresponding training output sample into the model objective function; and
optimizing the model objective function by gradient descent, to determine the values of the parameters of the encoder, the parameters of the decoder, and the parameters of the classifier, where the gradient of the classification objective function with respect to the parameters of the encoder is multiplied by −1.
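Putting the pieces together, one training step over a batch might look like the following sketch. All module names and the `.loss` helpers are assumptions made for illustration: `encoder`, `decoder`, and `classifier` stand for any PyTorch modules wired as in FIG. 3, the classifier is assumed to end in a sigmoid so its output lies in (0, 1), and `grad_reverse` is the gradient reversal helper sketched above.

    import torch

    def train_step(encoder, decoder, classifier, optimizer,
                   x, x_pert, y, alpha, beta):
        optimizer.zero_grad()
        h_x, h_xp = encoder(x), encoder(x_pert)      # H_x and H_x'

        # L_true and L_noisy: translate both the clean and the perturbed
        # input to the same target sentence y.
        l_true = decoder.loss(h_x, y)
        l_noisy = decoder.loss(h_xp, y)

        # L_inv: the classifier tries to tell H_x from H_x'; the reversed
        # gradient trains the encoder to make them indistinguishable.
        d_x = classifier(grad_reverse(h_x))
        d_xp = classifier(grad_reverse(h_xp))
        l_inv = -(torch.log(d_x).mean() + torch.log(1.0 - d_xp).mean())

        loss = l_true + alpha * l_inv + beta * l_noisy   # J(theta)
        loss.backward()
        optimizer.step()
        return float(loss)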
The training process of the target translation model has been described above; the following describes the process of translating sentences by using the target translation model.
FIG. 4 is a schematic diagram of an embodiment of a sentence translation method according to an embodiment of this application. As shown in FIG. 4, an embodiment of the sentence translation method provided in this embodiment of this application includes the following steps:
201. A terminal device receives a first to-be-translated sentence expressed in a first language.
In this embodiment of this application, the first language may be any type of language supported by the target translation model.
202. The terminal device translates the first to-be-translated sentence by using a target translation model, to obtain a translation result sentence expressed in a second language, where the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value.
The target translation model may be understood with reference to the foregoing embodiments of the model training process; details are not repeated here.
203. The terminal device outputs the translation result sentence expressed in the second language.
The second language is a language different from the first language; for example, the first language is Chinese and the second language is English.
In this embodiment of this application, because the target translation model is noise-resistant, it can translate correctly even when it receives a sentence containing noise, thereby improving the robustness of machine translation as well as translation quality.
Optionally, another embodiment of the sentence translation method provided in the embodiments of this application may further include:
the terminal device receiving a second to-be-translated sentence expressed in the first language, the second to-be-translated sentence being a perturbed sentence of the first to-be-translated sentence, the similarity between the second to-be-translated sentence and the first to-be-translated sentence being higher than the first preset value;
the terminal device translating the second to-be-translated sentence by using the target translation model, to obtain the translation result sentence corresponding to the first to-be-translated sentence; and
the terminal device outputting the translation result sentence.
In this embodiment of this application, the first to-be-translated sentence is not limited to the training input sample in the foregoing example; it may also be one of the foregoing perturbed input samples.
The sentence translation solution in the embodiments of this application may be understood with reference to the following two scenario examples.
FIG. 5 is a schematic diagram of an application scenario of sentence translation according to an embodiment of this application; (A)-(C) in FIG. 5 are an example of a text translation scenario in a social application according to an embodiment of this application.
As shown in (A) of FIG. 5, to translate "他们不怕困难做出围棋AI" in the social application into English, the user long-presses the text, and the page shown in (B) of FIG. 5 appears, with function boxes such as "Copy", "Forward", "Delete", and "Translate to English". Of course, (B) of FIG. 5 is merely an example; "Translate to English" could also be changed to "Translate", followed by a drop-down box for selecting the target language. After the user taps "Translate to English" on the page shown in (B) of FIG. 5, the translation result "They are not afraid of difficulties to make Go AI" shown in (C) of FIG. 5 appears.
(A)-(C) in FIG. 6 are another example of a text translation scenario in a social application according to an embodiment of this application.
FIG. 6 is a schematic diagram of another application scenario of sentence translation according to an embodiment of this application. As shown in (A) of FIG. 6, to translate "他们不畏困难做出围棋AI" in the social application into English, the user long-presses the text, and the page shown in (B) of FIG. 6 appears, with function boxes such as "Copy", "Forward", "Delete", and "Translate to English". After the user taps "Translate to English" on the page shown in (B) of FIG. 6, the translation result "They are not afraid of difficulties to make Go AI" shown in (C) of FIG. 6 appears.
Comparing the processes and results of (A)-(C) in FIG. 5 with (A)-(C) in FIG. 6 shows that, although the sentence to be translated in (A) of FIG. 5 is "他们不怕困难做出围棋AI" and the sentence to be translated in (A) of FIG. 6 is "他们不畏困难做出围棋AI", the same translation result "They are not afraid of difficulties to make Go AI" is obtained in (C) of FIG. 5 and (C) of FIG. 6 for these two semantically similar sentences. It can be seen that the sentence translation solution provided in the embodiments of this application is more robust and yields better translation quality.
FIG. 7 is a schematic diagram of an application of sentence translation in a simultaneous interpretation scenario according to an embodiment of this application.
As shown in FIG. 7, in a simultaneous interpretation scenario, the speaker says "他们不怕困难做出围棋AI" in Chinese, and the audience on the English channel hears "They are not afraid of difficulties to make Go AI".
FIG. 8 is a schematic diagram of another application of sentence translation in a simultaneous interpretation scenario according to an embodiment of this application.
As shown in FIG. 8, in a simultaneous interpretation scenario, the speaker says "他们不畏困难做出围棋AI" in Chinese, and the audience on the English channel hears "They are not afraid of difficulties to make Go AI".
Comparing the examples in FIG. 7 and FIG. 8 shows that the translation results are the same for semantically similar inputs. It can be seen that the sentence translation solution provided in the embodiments of this application is more robust and yields better translation quality.
It should be noted that the above two application scenarios are merely examples; the solutions of the embodiments of this application can be used in a variety of translation scenarios, and the form of the terminal device involved is not limited to the forms shown in FIG. 5 to FIG. 8.
The foregoing embodiments describe the training process of the target translation model and the process of translating sentences by using the target translation model in the embodiments of this application. The following describes, with reference to the accompanying drawings, the translation model training apparatus, the sentence translation apparatus, the computer device, and the terminal device in the embodiments of this application.
FIG. 9 is a schematic diagram of an embodiment of a translation model training apparatus according to an embodiment of this application. The translation model training apparatus 30 provided in this embodiment of this application includes one or more processors and one or more memories storing program units, where the program units are executed by the processors. As shown in FIG. 9, the program units include:
an obtaining unit 301, configured to obtain a training sample set, the training sample set including a plurality of training samples;
a determining unit 302, configured to determine a perturbed sample set corresponding to each training sample in the training sample set obtained by the obtaining unit 301, the perturbed sample set including at least one perturbed sample, the semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
a model training unit 303, configured to train an initial translation model by using the plurality of training samples obtained by the obtaining unit 301 and the perturbed sample set corresponding to each training sample determined by the determining unit 302, to obtain a target translation model.
In the embodiments of this application, perturbed samples are used during translation model training, and the semantic similarity between a perturbed sample and its training sample is higher than the first preset value; that is, their semantics are very close. A target translation model trained in this way can therefore translate correctly even when it receives a sentence containing noise, improving both the robustness of machine translation and the translation quality.
Optionally, the determining unit 302 is configured to: when each training sample is a training sample pair including a training input sample and a training output sample, determine a perturbed input sample set corresponding to each training input sample and a perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, the perturbed output sample being the same as the training output sample;
the model training unit 303 is configured to train the initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model.
Optionally, the determining unit 302 is configured to:
determine a first word in each training input sample, the first word being a word to be replaced; and
replace the first word with each of at least one second word to obtain the perturbed input sample set, the semantic similarity between the second word and the first word being higher than a second preset value.
Optionally, the determining unit 302 is configured to:
determine a word vector of each word in each training input sample; and
superimpose a different Gaussian noise vector on the word vector of each word each time, to obtain the perturbed sample set.
Optionally, the initial translation model includes an encoder, a classifier, and a decoder;
the encoder is configured to receive the training input sample and the corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, the first intermediate representation result being the intermediate representation result of the training input sample, and the second intermediate representation result being the intermediate representation result of the corresponding perturbed input sample;
the classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result; and
the decoder is configured to output the training output sample according to the first intermediate representation result, and to output the training output sample according to the second intermediate representation result.
Optionally, the model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
the classification objective function includes the training input sample, the corresponding perturbed input sample, the parameters of the encoder, and the parameters of the classifier;
the training objective function includes the training input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder; and
the perturbation objective function includes the perturbed input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder.
Optionally, the model training unit 303 is configured to:
input each training input sample, the corresponding perturbed input sample, and the corresponding training output sample into the model objective function; and
optimize the model objective function by gradient descent, to determine the values of the parameters of the encoder, the parameters of the decoder, and the parameters of the classifier, where the gradient of the classification objective function with respect to the parameters of the encoder is multiplied by −1.
The translation model training apparatus 30 provided in this embodiment of this application may be understood with reference to the corresponding content of the foregoing method embodiments; details are not repeated here.
As shown in FIG. 10, an embodiment of the sentence translation apparatus provided in the embodiments of this application includes one or more processors and one or more memories storing program units, where the program units are executed by the processors and include:
a receiving unit 401, configured to receive a first to-be-translated sentence expressed in a first language;
a translation unit 402, configured to translate the first to-be-translated sentence received by the receiving unit 401 by using a target translation model, to obtain a translation result sentence expressed in a second language, where the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
an output unit 403, configured to output the translation result sentence expressed in the second language translated by the translation unit 402.
In this embodiment of this application, because the target translation model is noise-resistant, it can translate correctly even when it receives a sentence containing noise, thereby improving the robustness of machine translation as well as translation quality.
Optionally, the receiving unit 401 is further configured to receive a second to-be-translated sentence expressed in the first language, the second to-be-translated sentence being a perturbed sentence of the first to-be-translated sentence, the similarity between the second to-be-translated sentence and the first to-be-translated sentence being higher than the first preset value;
the translation unit 402 is further configured to translate the second to-be-translated sentence by using the target translation model, to obtain the translation result sentence corresponding to the first to-be-translated sentence; and
the output unit 403 is further configured to output the translation result sentence.
The sentence translation apparatus 40 above may be understood with reference to the corresponding content of the method embodiments; details are not repeated here.
FIG. 11 is a schematic diagram of an embodiment of a computer device according to an embodiment of this application. As shown in FIG. 11, the computer device 50 includes a processor 510, a memory 540, and an input/output (I/O) interface 530. The memory 540 may include a read-only memory and a random access memory, and provides operation instructions and data to the processor 510. A part of the memory 540 may further include a non-volatile random access memory (NVRAM).
In some implementations, the memory 540 stores the following elements: an executable module or a data structure, or a subset thereof, or an extended set thereof.
In this embodiment of this application, the following operations are performed by invoking the operation instructions stored in the memory 540 (the operation instructions may be stored in an operating system):
obtaining a training sample set, the training sample set including a plurality of training samples;
determining a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set including at least one perturbed sample, the semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
training an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model.
In the embodiments of this application, perturbed samples are used during translation model training, and the semantic similarity between a perturbed sample and its training sample is higher than the first preset value; that is, their semantics are very close. A target translation model trained in this way can therefore translate correctly even when it receives a sentence containing noise, improving both the robustness of machine translation and the translation quality.
The processor 510 controls the operation of the computer device 50 and may also be called a central processing unit (CPU). The memory 540 may include a read-only memory and a random access memory, and provides instructions and data to the processor 510. A part of the memory 540 may further include an NVRAM. In a specific application, the components of the computer device 50 are coupled together through a bus system 520; in addition to a data bus, the bus system 520 may further include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all labeled as the bus system 520 in the figure.
The method disclosed in the foregoing embodiments of this application may be applied to the processor 510 or implemented by the processor 510. The processor 510 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 510 or by instructions in the form of software. The processor 510 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 540, and the processor 510 reads the information in the memory 540 and completes the steps of the foregoing method in combination with its hardware.
Optionally, the processor 510 is configured to:
when each training sample is a training sample pair including a training input sample and a training output sample, determine a perturbed input sample set corresponding to each training input sample and a perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set including at least one perturbed input sample, the perturbed output sample being the same as the training output sample; and
train an initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain a target translation model.
Optionally, the processor 510 is configured to:
determine a first word in each training input sample, the first word being a word to be replaced; and
replace the first word with each of at least one second word to obtain the perturbed input sample set, the semantic similarity between the second word and the first word being higher than a second preset value.
Optionally, the processor 510 is configured to:
determine a word vector of each word in each training input sample; and
superimpose a different Gaussian noise vector on the word vector of each word each time, to obtain the perturbed sample set.
Optionally, the initial translation model includes an encoder, a classifier, and a decoder;
the encoder is configured to receive the training input sample and the corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, the first intermediate representation result being the intermediate representation result of the training input sample, and the second intermediate representation result being the intermediate representation result of the corresponding perturbed input sample;
the classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result; and
the decoder is configured to output the training output sample according to the first intermediate representation result, and to output the training output sample according to the second intermediate representation result.
Optionally, the model objective function of the initial translation model includes a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
the classification objective function includes the training input sample, the corresponding perturbed input sample, the parameters of the encoder, and the parameters of the classifier;
the training objective function includes the training input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder; and
the perturbation objective function includes the perturbed input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder.
Optionally, the processor 510 is configured to:
input each training input sample, the corresponding perturbed input sample, and the corresponding training output sample into the model objective function; and
optimize the model objective function by gradient descent, to determine the values of the parameters of the encoder, the parameters of the decoder, and the parameters of the classifier, where the gradient of the classification objective function with respect to the parameters of the encoder is multiplied by −1.
The description of the computer device 50 above may be understood with reference to the descriptions of FIG. 1 to FIG. 3; details are not repeated here.
When the foregoing sentence translation process is performed by a terminal device, the terminal device may be any terminal device such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS) terminal, or an in-vehicle computer. The following uses a mobile phone as an example:
FIG. 12 is a schematic diagram of an embodiment of a terminal device according to an embodiment of this application; the following description uses a mobile phone as an example of the terminal device. Referring to FIG. 12, the mobile phone includes components such as a radio frequency (RF) circuit 1110, a memory 1120, an input unit 1130, a display unit 1140, a sensor 1150, an audio circuit 1160, a wireless fidelity (WiFi) module 1170, a processor 1180, and a camera 1190. A person skilled in the art will understand that the mobile phone structure shown in FIG. 12 does not constitute a limitation on the mobile phone, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to FIG. 12:
The RF circuit 1110 may be used to receive and send signals during information transmission and reception or during a call; the RF circuit 1110 is a transceiver. In particular, after receiving downlink information from a base station, the RF circuit delivers it to the processor 1180 for processing, and it sends designed uplink data to the base station. Generally, the RF circuit 1110 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1110 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 1120 may be used to store software programs and modules. The processor 1180 executes various functional applications of the mobile phone and processes data by running the software programs and modules stored in the memory 1120. The memory 1120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 1120 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
The input unit 1130 may be used to receive the to-be-translated sentence input by the user, translation instructions, and the like. Specifically, the input unit 1130 may include a touch panel 1131 and another input device 1132. The touch panel 1131, also called a touchscreen, can collect the user's touch operations on or near it (such as operations performed on or near the touch panel 1131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch panel 1131 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 1180, and can receive and execute commands sent by the processor 1180. In addition, the touch panel 1131 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1131, the input unit 1130 may further include another input device 1132. Specifically, the other input device 1132 may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick.
The display unit 1140 may be used to display the translation result. The display unit 1140 may include a display panel 1141; optionally, the display panel 1141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1131 may cover the display panel 1141; after detecting a touch operation on or near it, the touch panel 1131 transmits the operation to the processor 1180 to determine the type of the touch event, and the processor 1180 then provides corresponding visual output on the display panel 1141 according to the type of the touch event. Although in FIG. 12 the touch panel 1131 and the display panel 1141 are two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1131 and the display panel 1141 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may further include at least one sensor 1150, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 1141 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 1141 and/or the backlight when the mobile phone is moved close to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in various directions (generally along three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in functions related to vibration recognition (such as pedometers and tapping). Other sensors that may also be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
The audio circuit 1160, a speaker 1161, and a microphone 1162 may provide an audio interface between the user and the mobile phone. The audio circuit 1160 may transmit the electrical signal converted from received audio data to the speaker 1161, which converts it into a sound signal for output; on the other hand, the microphone 1162 converts collected sound signals into electrical signals, which the audio circuit 1160 receives and converts into audio data. After the audio data is processed by the processor 1180, it is sent via the RF circuit 1110 to, for example, another mobile phone, or output to the memory 1120 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1170, the mobile phone can help users send and receive e-mails, browse web pages, access streaming media, and so on, providing users with wireless broadband Internet access. Although FIG. 12 shows the WiFi module 1170, it is understood that the module is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 1180 is the control center of the mobile phone. It connects the various parts of the entire phone using various interfaces and lines, and performs the various functions of the phone and processes data by running or executing the software programs and/or modules stored in the memory 1120 and invoking the data stored in the memory 1120, thereby monitoring the phone as a whole. Optionally, the processor 1180 may include one or more processing units; preferably, the processor 1180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 1180.
The camera 1190 is used to capture images.
The mobile phone also includes a power source (such as a battery) that supplies power to the various components; preferably, the power source may be logically connected to the processor 1180 through a power management system, which implements functions such as managing charging, discharging, and power consumption.
Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like; details are not described here.
In this embodiment of this application, the processor 1180 included in the terminal further has the following control functions:
receiving a first to-be-translated sentence expressed in a first language;
translating the first to-be-translated sentence by using a target translation model, to obtain a translation result sentence expressed in a second language, where the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set includes at least one perturbed sample, and the semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
outputting the translation result sentence expressed in the second language.
Optionally, the processor may further:
receive a second to-be-translated sentence expressed in the first language, the second to-be-translated sentence being a perturbed sentence of the first to-be-translated sentence, the similarity between the second to-be-translated sentence and the first to-be-translated sentence being higher than the first preset value;
translate the second to-be-translated sentence by using the target translation model, to obtain the translation result sentence corresponding to the first to-be-translated sentence; and
output the translation result sentence.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented entirely or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced entirely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)), among others.
A person of ordinary skill in the art will understand that all or some of the steps of the various methods in the foregoing embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, which may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The translation model training method, sentence translation method, apparatus, and device provided in the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the embodiments of this application, and the description of the above embodiments is merely intended to help understand the methods and core ideas of the embodiments of this application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope according to the ideas of the embodiments of this application. In conclusion, the content of this specification should not be construed as limiting the embodiments of this application.
Industrial Applicability
In the embodiments of this application, perturbed samples are used during translation model training, and the semantic similarity between a perturbed sample and its training sample is higher than the first preset value; that is, their semantics are very close. A target translation model trained in this way can therefore translate correctly even when it receives a sentence containing noise, improving both the robustness of machine translation and the translation quality.

Claims (15)

  1. A translation model training method, comprising:
    obtaining, by a computer device, a training sample set, the training sample set comprising a plurality of training samples;
    determining, by the computer device, a perturbed sample set corresponding to each training sample in the training sample set, the perturbed sample set comprising at least one perturbed sample, a semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
    training, by the computer device, an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model.
  2. The method according to claim 1, wherein each training sample is a training sample pair comprising a training input sample and a training output sample;
    the determining, by the computer device, a perturbed sample set corresponding to each training sample comprises:
    determining a perturbed input sample set corresponding to each training input sample and a perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set comprising at least one perturbed input sample, the perturbed output sample being the same as the training output sample; and
    the training, by the computer device, an initial translation model by using the plurality of training samples and the perturbed sample set corresponding to each training sample, to obtain a target translation model comprises:
    training the initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model.
  3. The method according to claim 2, wherein the determining, by the computer device, a perturbed input sample set corresponding to each training input sample in the training sample set comprises:
    determining a first word in each training input sample, the first word being a word to be replaced; and
    replacing the first word with each of at least one second word to obtain the perturbed input sample set, a semantic similarity between the second word and the first word being higher than a second preset value.
  4. The method according to claim 2, wherein the determining, by the computer device, a perturbed input sample set corresponding to each training input sample in the training sample set comprises:
    determining a word vector of each word in each training input sample; and
    superimposing a different Gaussian noise vector on the word vector of each word each time, to obtain the perturbed sample set.
  5. The method according to any one of claims 2 to 4, wherein the initial translation model comprises an encoder, a classifier, and a decoder;
    the encoder is configured to receive the training input sample and the corresponding perturbed input sample, and output a first intermediate representation result and a second intermediate representation result, the first intermediate representation result being an intermediate representation result of the training input sample, and the second intermediate representation result being an intermediate representation result of the corresponding perturbed input sample;
    the classifier is configured to distinguish the first intermediate representation result from the second intermediate representation result; and
    the decoder is configured to output the training output sample according to the first intermediate representation result, and to output the training output sample according to the second intermediate representation result.
  6. The method according to claim 5, wherein a model objective function of the initial translation model comprises a classification objective function related to the classifier and the encoder, and a training objective function and a perturbation objective function related to the encoder and the decoder;
    the classification objective function comprises the training input sample, the corresponding perturbed input sample, parameters of the encoder, and parameters of the classifier;
    the training objective function comprises the training input sample, the training output sample, the parameters of the encoder, and parameters of the decoder; and
    the perturbation objective function comprises the perturbed input sample, the training output sample, the parameters of the encoder, and the parameters of the decoder.
  7. The method according to claim 6, wherein the training, by the computer device, an initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain a target translation model comprises:
    inputting each training input sample, the corresponding perturbed input sample, and the corresponding training output sample into the model objective function; and
    optimizing the model objective function by gradient descent, to determine values of the parameters of the encoder, values of the parameters of the decoder, and values of the parameters of the classifier, wherein the gradient of the classification objective function with respect to the parameters of the encoder is multiplied by −1.
  8. A sentence translation method, comprising:
    receiving, by a terminal device, a first to-be-translated sentence expressed in a first language;
    translating, by the terminal device, the first to-be-translated sentence by using a target translation model, to obtain a translation result sentence expressed in a second language, wherein the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set comprises at least one perturbed sample, and a semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
    outputting, by the terminal device, the translation result sentence expressed in the second language.
  9. The method according to claim 8, wherein the method further comprises:
    receiving, by the terminal device, a second to-be-translated sentence expressed in the first language, the second to-be-translated sentence being a perturbed sentence of the first to-be-translated sentence, a similarity between the second to-be-translated sentence and the first to-be-translated sentence being higher than the first preset value;
    translating, by the terminal device, the second to-be-translated sentence by using the target translation model, to obtain the translation result sentence corresponding to the first to-be-translated sentence; and
    outputting the translation result sentence.
  10. A translation model training apparatus, comprising one or more processors and one or more memories storing program units, wherein the program units are executed by the processors and comprise:
    an obtaining unit, configured to obtain a training sample set, the training sample set comprising a plurality of training samples;
    a determining unit, configured to determine a perturbed sample set corresponding to each training sample in the training sample set obtained by the obtaining unit, the perturbed sample set comprising at least one perturbed sample, a semantic similarity between the perturbed sample and the corresponding training sample being higher than a first preset value; and
    a model training unit, configured to train an initial translation model by using the plurality of training samples obtained by the obtaining unit and the perturbed sample set corresponding to each training sample determined by the determining unit, to obtain a target translation model.
  11. The apparatus according to claim 10, wherein
    the determining unit is configured to: when each training sample is a training sample pair comprising a training input sample and a training output sample, determine a perturbed input sample set corresponding to each training input sample and a perturbed output sample corresponding to the perturbed input sample set, the perturbed input sample set comprising at least one perturbed input sample, the perturbed output sample being the same as the training output sample; and
    the model training unit is configured to train the initial translation model by using a plurality of training sample pairs, the perturbed input sample set corresponding to each training input sample, and the perturbed output sample corresponding to the perturbed input sample set, to obtain the target translation model.
  12. A sentence translation apparatus, comprising one or more processors and one or more memories storing program units, wherein the program units are executed by the processors and comprise:
    a receiving unit, configured to receive a first to-be-translated sentence expressed in a first language;
    a translation unit, configured to translate the first to-be-translated sentence received by the receiving unit by using a target translation model, to obtain a translation result sentence expressed in a second language, wherein the target translation model is trained by using a plurality of training samples and a perturbed sample set corresponding to each of the plurality of training samples, the perturbed sample set comprises at least one perturbed sample, and a semantic similarity between the perturbed sample and the corresponding training sample is higher than a first preset value; and
    an output unit, configured to output the translation result sentence expressed in the second language translated by the translation unit.
  13. A computer device, comprising an input/output (I/O) interface, a processor, and a memory, the memory storing program instructions;
    wherein the processor is configured to execute the program instructions stored in the memory, to perform the method according to any one of claims 1 to 7.
  14. A terminal device, comprising an input/output (I/O) interface, a processor, and a memory, the memory storing program instructions;
    wherein the processor is configured to execute the program instructions stored in the memory, to perform the method according to claim 8 or 9.
  15. A non-transitory computer-readable storage medium comprising instructions that, when run on a computer device, cause the computer device to perform the method according to any one of claims 1 to 7 or the method according to claim 8 or 9.
PCT/CN2019/080411 2018-05-10 2019-03-29 Translation model training method, sentence translation method, device, and storage medium WO2019214365A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020545261A JP7179273B2 (ja) 2018-05-10 2019-03-29 Translation model training method, phrase translation method, device, storage medium, and computer program
EP19800044.0A EP3792789A4 (en) 2018-05-10 2019-03-29 MODEL TRANSLATION LEARNING PROCESS, SENTENCE TRANSLATION PROCESS AND APPARATUS, AND INFORMATION MEDIA
US16/987,565 US11900069B2 (en) 2018-05-10 2020-08-07 Translation model training method, sentence translation method, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810445783.2 2018-05-10
CN201810445783.2A CN110472251B (zh) 2018-05-10 2018-05-10 Translation model training method, sentence translation method, device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/987,565 Continuation US11900069B2 (en) 2018-05-10 2020-08-07 Translation model training method, sentence translation method, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019214365A1 true WO2019214365A1 (zh) 2019-11-14

Family

ID=68466679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/080411 WO2019214365A1 (zh) 2018-05-10 2019-03-29 Translation model training method, sentence translation method, device, and storage medium

Country Status (5)

Country Link
US (1) US11900069B2 (zh)
EP (1) EP3792789A4 (zh)
JP (1) JP7179273B2 (zh)
CN (1) CN110472251B (zh)
WO (1) WO2019214365A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859995A (zh) * 2020-06-16 2020-10-30 北京百度网讯科技有限公司 Training method and apparatus for machine translation model, electronic device, and storage medium
CN112364999A (zh) * 2020-10-19 2021-02-12 深圳市超算科技开发有限公司 Training method and apparatus for chiller regulation model, and electronic device
CN112541557A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Training method and apparatus for generative adversarial network, and electronic device
CN113110843A (zh) * 2021-03-05 2021-07-13 卓尔智联(武汉)研究院有限公司 Contract generation model training method, contract generation method, and electronic device
CN114241268A (zh) * 2021-12-21 2022-03-25 支付宝(杭州)信息技术有限公司 Model training method, apparatus, and device
CN115081462A (zh) * 2022-06-15 2022-09-20 京东科技信息技术有限公司 Translation model training and translation method and apparatus
CN111258991B (zh) * 2020-01-08 2023-11-07 北京小米松果电子有限公司 Data processing method, apparatus, and storage medium

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021141576A1 (en) * 2020-01-08 2021-07-15 Google, Llc Translation of text depicted in images
CN113283249B (zh) * 2020-02-19 2024-09-27 阿里巴巴集团控股有限公司 Machine translation method and apparatus, and computer-readable storage medium
CN111859997B (zh) * 2020-06-16 2024-01-26 北京百度网讯科技有限公司 Model training method and apparatus in machine translation, electronic device, and storage medium
CN111723550B (zh) * 2020-06-17 2024-07-12 腾讯科技(深圳)有限公司 Sentence rewriting method and apparatus, electronic device, and computer storage medium
CN111753556B (zh) * 2020-06-24 2022-01-04 掌阅科技股份有限公司 Bilingual parallel reading method, terminal, and computer storage medium
CN112257459B (zh) * 2020-10-16 2023-03-24 北京有竹居网络技术有限公司 Language translation model training method, translation method, apparatus, and electronic device
CN112328348A (zh) * 2020-11-05 2021-02-05 深圳壹账通智能科技有限公司 Multi-language support method and apparatus for application, computer device, and storage medium
CN112380883B (zh) * 2020-12-04 2023-07-25 北京有竹居网络技术有限公司 Model training method, machine translation method, apparatus, device, and storage medium
CN112528637B (zh) * 2020-12-11 2024-03-29 平安科技(深圳)有限公司 Text processing model training method and apparatus, computer device, and storage medium
CN112417895B (zh) * 2020-12-15 2024-09-06 广州博冠信息科技有限公司 Bullet-screen comment data processing method, apparatus, device, and storage medium
US12001798B2 (en) * 2021-01-13 2024-06-04 Salesforce, Inc. Generation of training data for machine learning based models for named entity recognition for natural language processing
CN112598091B (zh) * 2021-03-08 2021-09-07 北京三快在线科技有限公司 Model training and few-shot classification method and apparatus
CN113204977B (zh) * 2021-04-29 2023-09-26 北京有竹居网络技术有限公司 Information translation method, apparatus, device, and storage medium
CN113609157B (zh) * 2021-08-09 2023-06-30 平安科技(深圳)有限公司 Language conversion model training and language conversion method, apparatus, device, and medium
CN113688245B (zh) * 2021-08-31 2023-09-26 中国平安人寿保险股份有限公司 Artificial intelligence-based pre-trained language model processing method, apparatus, and device
CN113762397B (zh) * 2021-09-10 2024-04-05 北京百度网讯科技有限公司 Detection model training and high-precision map updating method, device, medium, and product

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007122525A (ja) 2005-10-29 2007-05-17 National Institute Of Information & Communication Technology Paraphrase processing method and apparatus
US9201871B2 (en) * 2010-06-11 2015-12-01 Microsoft Technology Licensing, Llc Joint optimization for machine translation system combination
JP5506569B2 (ja) * 2010-06-29 2014-05-28 株式会社Kddi研究所 Method and apparatus for retraining a support vector machine
US8515736B1 (en) * 2010-09-30 2013-08-20 Nuance Communications, Inc. Training call routing applications by reusing semantically-labeled data collected for prior applications
US8972240B2 (en) * 2011-05-19 2015-03-03 Microsoft Corporation User-modifiable word lattice display for editing documents and search queries
CN102799579B (zh) * 2012-07-18 2015-01-21 西安理工大学 Statistical machine translation method with error self-diagnosis and self-correction
CN104239286A (zh) * 2013-06-24 2014-12-24 阿里巴巴集团控股有限公司 Method and apparatus for mining synonymous phrases, and method and apparatus for searching related content
US9026551B2 (en) * 2013-06-25 2015-05-05 Hartford Fire Insurance Company System and method for evaluating text to support multiple insurance applications
CN104933038A (zh) * 2014-03-20 2015-09-23 株式会社东芝 Machine translation method and machine translation apparatus
US9443513B2 (en) * 2014-03-24 2016-09-13 Educational Testing Service System and method for automated detection of plagiarized spoken responses
US10055485B2 (en) * 2014-11-25 2018-08-21 International Business Machines Corporation Terms for query expansion using unstructured data
CN107438842A (zh) * 2014-12-18 2017-12-05 Asml荷兰有限公司 Feature search by machine learning
US10115055B2 (en) * 2015-05-26 2018-10-30 Booking.Com B.V. Systems methods circuits and associated computer executable code for deep learning based natural language understanding
US9984068B2 (en) * 2015-09-18 2018-05-29 Mcafee, Llc Systems and methods for multilingual document filtering
CN105512679A (zh) * 2015-12-02 2016-04-20 天津大学 Zero-shot classification method based on extreme learning machine
JP6671027B2 (ja) * 2016-02-01 2020-03-25 パナソニックIpマネジメント株式会社 Paraphrase generation method, apparatus, and program
JP6655788B2 (ja) * 2016-02-01 2020-02-26 パナソニックIpマネジメント株式会社 Bilingual corpus creation method, apparatus, program, and machine translation system
US20170286376A1 (en) * 2016-03-31 2017-10-05 Jonathan Mugan Checking Grammar Using an Encoder and Decoder
US10268686B2 (en) * 2016-06-24 2019-04-23 Facebook, Inc. Machine translation system employing classifier
CN107608973A (zh) * 2016-07-12 2018-01-19 华为技术有限公司 Neural network-based translation method and apparatus
KR102565275B1 (ko) * 2016-08-10 2023-08-09 삼성전자주식회사 Translation method and apparatus based on parallel processing
US20180061408A1 (en) * 2016-08-24 2018-03-01 Semantic Machines, Inc. Using paraphrase in accepting utterances in an automated assistant
CN109690577A (zh) * 2016-09-07 2019-04-26 皇家飞利浦有限公司 Semi-supervised classification using stacked autoencoders
JP6817556B2 (ja) 2016-09-27 2021-01-20 パナソニックIpマネジメント株式会社 Similar sentence generation method, program, apparatus, and system
KR102589638B1 (ko) * 2016-10-31 2023-10-16 삼성전자주식会社 Sentence generation apparatus and method
CN107180026B (zh) * 2017-05-02 2020-12-29 苏州大学 Event phrase learning method and apparatus based on word-embedding semantic mapping
CN107273503B (zh) * 2017-06-19 2020-07-10 北京百度网讯科技有限公司 Method and apparatus for generating same-language parallel text
CN107463879A (zh) * 2017-07-05 2017-12-12 成都数联铭品科技有限公司 Deep learning-based human behavior recognition method
CN107766577B (zh) * 2017-11-15 2020-08-21 北京百度网讯科技有限公司 Public opinion monitoring method, apparatus, device, and storage medium
JP7149560B2 (ja) * 2018-04-13 2022-10-07 国立研究開発法人情報通信研究機構 Request paraphrasing system, training method for a request paraphrasing model and a request determination model, and dialogue system
CN113761950B (zh) * 2021-04-28 2024-08-27 腾讯科技(深圳)有限公司 Translation model testing method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102439596A (zh) * 2009-05-22 2012-05-02 微软公司 Mining phrase pairs from unstructured resources
CN105279252A (zh) * 2015-10-12 2016-01-27 广州神马移动信息科技有限公司 Method for mining related words, search method, and search system
CN107526720A (zh) * 2016-06-17 2017-12-29 松下知识产权经营株式会社 Meaning generation method, meaning generation apparatus, and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3792789A4

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258991B (zh) * 2020-01-08 2023-11-07 北京小米松果电子有限公司 Data processing method, apparatus, and storage medium
CN111859995A (zh) * 2020-06-16 2020-10-30 北京百度网讯科技有限公司 Training method and apparatus for machine translation model, electronic device, and storage medium
CN111859995B (zh) * 2020-06-16 2024-01-23 北京百度网讯科技有限公司 Training method and apparatus for machine translation model, electronic device, and storage medium
CN112364999A (zh) * 2020-10-19 2021-02-12 深圳市超算科技开发有限公司 Training method and apparatus for chiller regulation model, and electronic device
CN112364999B (zh) * 2020-10-19 2021-11-19 深圳市超算科技开发有限公司 Training method and apparatus for chiller regulation model, and electronic device
CN112541557A (zh) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Training method and apparatus for generative adversarial network, and electronic device
CN112541557B (zh) * 2020-12-25 2024-04-05 北京百度网讯科技有限公司 Training method and apparatus for generative adversarial network, and electronic device
CN113110843A (zh) * 2021-03-05 2021-07-13 卓尔智联(武汉)研究院有限公司 Contract generation model training method, contract generation method, and electronic device
CN113110843B (zh) * 2021-03-05 2023-04-11 卓尔智联(武汉)研究院有限公司 Contract generation model training method, contract generation method, and electronic device
CN114241268A (zh) * 2021-12-21 2022-03-25 支付宝(杭州)信息技术有限公司 Model training method, apparatus, and device
CN115081462A (zh) * 2022-06-15 2022-09-20 京东科技信息技术有限公司 Translation model training and translation method and apparatus

Also Published As

Publication number Publication date
US11900069B2 (en) 2024-02-13
JP7179273B2 (ja) 2022-11-29
EP3792789A4 (en) 2021-07-07
EP3792789A1 (en) 2021-03-17
CN110472251A (zh) 2019-11-19
US20200364412A1 (en) 2020-11-19
CN110472251B (zh) 2023-05-30
JP2021515322A (ja) 2021-06-17

Similar Documents

Publication Publication Date Title
WO2019214365A1 (zh) 2019-11-14 Translation model training method, sentence translation method, device, and storage medium
US10956771B2 (en) Image recognition method, terminal, and storage medium
KR102360659B1 (ko) Machine translation method, apparatus, computer device, and storage medium
US12050881B2 (en) Text translation method and apparatus, and storage medium
CN110334360B (zh) Machine translation method and apparatus, electronic device, and storage medium
US10747954B2 (en) System and method for performing tasks based on user inputs using natural language processing
CN107102746B (zh) Candidate word generation method and apparatus, and apparatus for candidate word generation
WO2018107921A1 (zh) Answer sentence determination method and server
US20150325236A1 (en) Context specific language model scale factors
US10754885B2 (en) System and method for visually searching and debugging conversational agents of electronic devices
US10573317B2 (en) Speech recognition method and device
US9128930B2 (en) Method, device and system for providing language service
CN108984535B (zh) Sentence translation method, translation model training method, device, and storage medium
CN111563390B (zh) Text generation method and apparatus, and electronic device
CN104462058B (zh) String recognition method and apparatus
CN107870904A (zh) Translation method and apparatus, and apparatus for translation
GB2533842A (en) Text correction based on context
US11163377B2 (en) Remote generation of executable code for a client application based on natural language commands captured at a client device
US20210110824A1 (en) Electronic apparatus and controlling method thereof
US11741302B1 (en) Automated artificial intelligence driven readability scoring techniques
CN108345590B (zh) Translation method and apparatus, electronic device, and storage medium
CN117010386A (zh) Object name recognition method, apparatus, and storage medium
CN116959407A (zh) Pronunciation prediction method, apparatus, and related products
CN118378615A (zh) Article quality assessment method, apparatus, storage medium, and electronic device
CN118485051A (zh) Educational document generation method, apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19800044

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020545261

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2019800044

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2019800044

Country of ref document: EP

Effective date: 20201210