WO2019019916A1 - Translation method, target information determination method, related apparatus, and storage medium - Google Patents
- Publication number
- WO2019019916A1 (PCT/CN2018/095231)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- source
- translation
- translation vector
- moment
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the embodiments of the present invention relate to the field of computer technologies, and in particular, to a translation method, a method for determining target information, a related device, and a storage medium.
- Machine translation refers to the process of using a machine to convert text or speech from one language to another with the same meaning.
- MT: machine translation
- NMT: neural machine translation
- Embodiments of the present invention provide a translation method, a method for determining target information, a related device, and a storage medium.
- An aspect of the present invention provides a translation method applied to a neural machine translation system, the method comprising: encoding, by an encoder, text information to be processed to obtain a source vector representation sequence, wherein the text information to be processed belongs to a first language; obtaining, according to the source vector representation sequence, a source-side context vector corresponding to a first moment, wherein the source-side context vector corresponding to the first moment is used to indicate the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source-side context vector, wherein the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector being the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, and the second translation vector being the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, the second moment being a moment adjacent to the first moment; and decoding, by a decoder, the translation vector and the source-side context vector to obtain target information at the first moment.
- Another aspect of the present invention provides a method for determining target information, including: encoding text information to be processed to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source-side context vector corresponding to a first moment, wherein the source-side context vector corresponding to the first moment is used to indicate the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source-side context vector, wherein the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector being the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment being the moment adjacent to and preceding the first moment; and decoding the translation vector and the source-side context vector to obtain target information at the first moment.
- A still further aspect of the present invention provides a target information determining apparatus including at least one memory and at least one processor, wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor. The at least one instruction module includes: an encoding module, configured to encode text information to be processed to obtain a source vector representation sequence; a first obtaining module, configured to acquire, according to the source vector representation sequence, the source-side context vector corresponding to a first moment, where that vector indicates the source content to be processed in the text information at the first moment; and a first determining module, configured to determine a translation vector according to the source vector representation sequence and the source-side context vector, wherein the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector being the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, and the second translation vector being the vector corresponding to the source content already translated in the source vector representation sequence at the second moment.
- A still further aspect of the present invention provides a target information determining apparatus, including a memory, a processor, and a bus system, wherein the memory stores a program and the processor executes it, performing the following steps: encoding text information to be processed to obtain a source vector representation sequence; obtaining a source-side context vector corresponding to a first moment according to the source vector representation sequence, the vector indicating the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source-side context vector, wherein the translation vector includes a first translation vector and/or a second translation vector, the first translation vector being the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment being a moment adjacent to the first moment; and decoding the translation vector and the source-side context vector to obtain target information at the first moment.
- Yet another aspect of the present invention provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
- A still further aspect of the present invention provides a method for determining target information, performed by an electronic device, the method comprising: encoding text information to be processed to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source-side context vector corresponding to a first moment, which indicates the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source-side context vector, wherein the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector being the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment being a moment adjacent to the first moment; and decoding the translation vector and the source-side context vector to obtain the target information at the first moment.
- FIG. 1 is a block diagram of an apparatus for determining target information in an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a method for determining target information in an embodiment of the present invention
- FIG. 3 is a schematic diagram of an embodiment of a method for determining target information according to an embodiment of the present invention
- FIG. 4 is a schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of an embodiment of an enhanced attention module according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of an embodiment of translating a source end vector representation sequence in a source sequence according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram of an embodiment of translating a source end vector representation sequence in a source sequence according to an embodiment of the present invention
- FIG. 11 is a schematic diagram of an embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 12 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 13 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 14 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 15 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 16 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 17 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 18 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 19 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
- FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
- The embodiments of the present invention provide a translation method, a method for determining target information, and related devices that can model the untranslated source content and/or the translated source content in the source vector representation sequence. That content is separated from the original language model and trained on its own, which reduces the difficulty of training the decoder model and improves the translation quality of the system.
- In an encoder-decoder model, encoding converts an input sequence into a vector of a certain length, and decoding converts the vector sequence generated by the encoder into an output sequence.
- the encoder-decoder model has many applications, such as translation, document extraction and question answering systems.
- In translation, the input sequence is the text to be translated and the output sequence is the translated text.
- In a question answering system, the input sequence is the question raised and the output sequence is the answer.
- CNN: convolutional neural network
- RNN: recurrent neural network
- GRU: gated recurrent unit
- LSTM: long short-term memory network
- BiRNN: bidirectional recurrent neural network
- FIG. 1 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
- two hidden layers are additionally introduced in a decoder, and the two hidden layers may be represented by a vector sequence.
- The vector shown in the figure for time t-1 is the second translation vector, which corresponds to the source content that has already been translated, i.e., the past translation vector.
- The vector for time t is the first translation vector, which corresponds to the source content that has not yet been translated, i.e., the future translation vector.
- c t represents the source side context vector corresponding to the time t.
- s t represents the decoder state at time t.
- The present invention directly models past translation (translated content) and future translation (untranslated content) at the semantic level and separates the relevant content from the decoder state, so that the neural machine translation system can store and exploit this content, improving translation quality.
- the method provided by the present invention can be used in a mainstream neural network machine translation system.
- FIG. 2 is a schematic flowchart of a method for determining target information according to an embodiment of the present invention.
- In step 101, the encoder module S1 reads the sentence to be processed and outputs a source vector representation sequence.
- The following steps are repeated by the attention module S2, the past-future module S3, and the decoder module S4 until the entire translation is generated:
- In step 102, the attention module S2 reads the past translation vector and the future translation vector at time t-1. The initial past translation vector is an all-zero vector, indicating that no source content has yet been translated; the initial future translation vector is the last vector of the source vector representation sequence, representing a summary of the source sentence.
- the attention module S2 outputs the source side context vector of the current time, that is, the time t, in step 103.
- In step 104, the past-future module S3 reads the source-side context vector of the current time and updates the past translation vector and the future translation vector at time t.
- In step 105, the decoder module S4 reads the future translation vector at time t, the past translation vector at time t-1, the source-side context vector at time t, and other standard inputs, and generates the target word at time t.
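The per-step procedure of steps 102-105 can be sketched as follows. This is a minimal illustration only, not the patented system: `attention`, `rnn_update`, and `decoder_step` are hypothetical callables standing in for the attention module S2, the past-future module S3, and the decoder module S4.

```python
import numpy as np

def decode_with_past_future(source_vectors, attention, rnn_update, decoder_step,
                            max_len=50, eos_id=0):
    """Sketch of the decoding loop described above (steps 102-105)."""
    dim = source_vectors.shape[1]
    past = np.zeros(dim)         # initial past vector: nothing translated yet
    future = source_vectors[-1]  # initial future vector: summary of the source
    state = np.zeros(dim)        # decoder state s_t
    targets = []
    for t in range(max_len):
        # Steps 102/103: attention reads the past/future vectors of time t-1
        # and outputs the source-side context vector c_t.
        c_t = attention(source_vectors, state, past, future)
        # Step 104: update the past and future translation vectors.
        past, future = rnn_update(past, future, c_t)
        # Step 105: the decoder generates the target word at time t.
        word, state = decoder_step(state, c_t, past, future)
        targets.append(word)
        if word == eos_id:
            break
    return targets
```

The loop terminates when the end-of-sentence symbol is emitted, matching the "repeat until all the translations are generated" description above.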
- the present invention can be applied to an NMT system.
- the following describes a method for translation provided by the present invention.
- One embodiment of the method for translation in the embodiment of the present invention includes:
- The encoder first encodes the text information to be processed to obtain a source vector representation sequence. The text information to be processed belongs to a first language, such as Chinese; in practical applications it may of course be any other language.
- Specifically, the text information to be processed is input to the encoder in the NMT system, the encoder encodes it, and the source vector representation sequence is obtained from the result of the encoding; each source vector in the sequence belongs to the first language.
- the text information to be processed can be a Chinese sentence, which contains several phrases.
- After the source vector representation sequence is obtained, the source-side context vector corresponding to the current time, i.e., the first moment, is acquired; the source-side context vector represents the source content to be processed.
- the source content can be a certain word in this Chinese sentence.
- the source side context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
- The NMT system determines a first translation vector and/or a second translation vector according to the source vector representation sequence and the source-side context vector, wherein the first translation vector indicates the source content in the source vector representation sequence not yet translated at the first moment, and the second translation vector indicates the source content already translated at the second moment, a moment adjacent to the first moment. If the first moment is time t, the second moment is time t-1.
- the first translation vector and/or the second translation vector may be referred to as a translation vector, and the translation vector may be a first translation vector, a second translation vector, or a first translation vector and a second translation. vector.
- The first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment.
- The second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment.
- Suppose the source content corresponding to the source vector representation sequence is "up to 1,300 flights per month to destinations around the world", and the words corresponding to the source vectors are "monthly", "to", "worldwide", "of", "flight", "as many" and "1,300".
- The future translation vector can then be understood as the vectors corresponding to the not-yet-translated "worldwide", "of", "flight", "as many" and "1,300"; the past translation vector can be understood as the vectors corresponding to the already-translated "monthly" and "to".
- the first translation vector and/or the second translation vector and the source side context vector are decoded by the decoder in the NMT system to obtain target information at the first moment, wherein the target information belongs to the second language.
- the second language is a language different from the first language, and may be English, French, or Japanese, and is not limited herein.
- The output target information may be "all parts of the world"; that is, the first language is Chinese and the second language is English, which completes the machine translation process.
- This embodiment provides a translation method that can model the untranslated source content and/or the translated source content in the source vector representation sequence; that content is separated from the original language model and trained on its own, which reduces the difficulty of training the decoder model and improves the translation quality of the system.
- FIG. 3 is an embodiment of a method for determining target information according to an embodiment of the present invention, including:
- the encoder in the target information determining apparatus performs encoding processing on the text information to be processed; wherein the text information to be processed may be a sentence to be translated, for example, “multiple airports are closed.”
- the sentence is encoded.
- the source vector representation sequence is obtained.
- Each vector in the source vector representation sequence corresponds to one piece of source content (a source word). For the sentence "Multiple airports are closed.", for example, the source content is "multiple", "airport", "being", "closed", "." and "<eos>".
- the decoder in the target information determining means generates the translation word by word.
- the source side context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
- The target information determining apparatus may obtain the source-side context vector corresponding to the first moment according to the source vector representation sequence, where the first moment is time t in the embodiment of the present invention; the source-side context vector represents the source content to be processed.
- For each source content, the target information determining apparatus outputs an alignment probability, such as 0.0 or 0.2. The alignment probabilities over the source vector representation sequence sum to 1, and a larger alignment probability indicates that the source content is more relevant to the target information to be generated. The source-side context vector at time t is generated by weighting the semantic vectors with these alignment probabilities.
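As a small sketch of this weighting (not the patent's exact attention network), the context vector c_t can be computed as the alignment-probability-weighted sum of the source-side semantic vectors; a softmax over hypothetical relevance scores guarantees the probabilities sum to 1:

```python
import numpy as np

def source_context_vector(source_vectors, relevance_scores):
    """Turn unnormalized relevance scores into alignment probabilities
    (summing to 1) and weight the source-side semantic vectors with them
    to produce the source-side context vector c_t."""
    shifted = relevance_scores - relevance_scores.max()   # numerical stability
    align_probs = np.exp(shifted) / np.exp(shifted).sum()
    c_t = align_probs @ source_vectors                    # weighted sum over positions
    return c_t, align_probs

# Three source positions with 2-dimensional semantic vectors (toy numbers).
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c_t, probs = source_context_vector(vecs, np.array([2.0, 0.5, 0.5]))
```

The position with the highest score dominates the context vector, matching the statement that a larger alignment probability means greater relevance to the target word being generated.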
- the second translation vector indicates the source content that has been translated in the source vector representation sequence in the second time, and the second time is a time adjacent to the first time;
- first translation vector and/or the second translation vector can be referred to as a translation vector.
- The target information determining apparatus may determine the first translation vector according to the source vector representation sequence and the source-side context vector, or determine the second translation vector from them, or determine both the first translation vector and the second translation vector.
- The first translation vector indicates the source content in the source vector representation sequence not yet translated at the first moment; the second translation vector indicates the source content already translated at the second moment, which is a moment adjacent to the first moment.
- the first translation vector represents a future translation vector at time t
- the second translation vector represents a past translation vector at time t-1.
- the decoder in the target information determining apparatus uses a neural network output layer to decode the first translation vector and the source side context vector, and obtains the target information at the first moment.
- the second translation vector and the source side context vector are decoded to obtain target information at the first moment.
- the first translation vector, the second translation vector, and the source side context vector are decoded to obtain target information at the first moment.
- a plurality of pieces of information to be selected may be generated, and finally one word with the highest similarity is output as the target information. For example, “multiple airports are closed.”
- "Multiple" could be translated as "many" or "much", but from the semantic knowledge stored in the decoder state vector it is known that "many" is used before countable nouns; therefore "multiple" here is finally translated as "many".
- a method for determining target information is provided.
- The target information determining apparatus encodes the text information to be processed to obtain a source vector representation sequence, and then obtains, according to that sequence, the source-side context vector corresponding to the first moment, which represents the source content to be processed. The first translation vector and/or the second translation vector is determined according to the source vector representation sequence and the source-side context vector: the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated at the second moment, and the second moment is the moment preceding the first moment.
- Finally, the target information determining apparatus decodes the first translation vector and/or the second translation vector together with the source-side context vector to obtain the target information at the first moment.
- In this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled; that is, this content is separated from the original language model and trained on its own, reducing the difficulty of training the decoder model and improving the translation quality of the system.
- Determining the first translation vector according to the source vector representation sequence and the source-side context vector may include: processing the third translation vector and the source-side context vector with a preset neural network model to obtain the first translation vector, where the third translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment.
- the determining, by the target information determining apparatus, the first translation vector according to the source vector representation sequence and the source side context vector may include: first acquiring the third translation vector corresponding to the second moment according to the source end vector representation sequence, and then The third translation vector and the source side context vector are processed by using a preset neural network model to obtain a first translation vector.
- Specifically, the target information determining apparatus needs to read the source-side context vector of the first moment (representing the source content being translated at the first moment) and then update the stored future translation vector.
- The future translation vector is initialized to a summary of the source sentence (usually the last vector of the source vector representation sequence), indicating that at the start none of the source content has been translated. At every moment, the update is as follows:
- F_t = RNN(F_{t-1}, c_t), where F_{t-1} denotes the future translation vector at time t-1 (i.e., the third translation vector), c_t denotes the source-side context vector at time t, and RNN() denotes computation with the RNN model.
- Here RNN is only a schematic representation of the preset neural network model; in practice the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type, and the neural network structure is not limited here.
- The neural network model processes the third translation vector and the source-side context vector to obtain the first translation vector. By using the preset neural network model to output the first translation vector in this manner, the accuracy of the future translation vector can be improved.
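A minimal sketch of this update step, assuming a vanilla tanh RNN cell in place of the unspecified RNN()/LSTM/gated-CNN choice; `W`, `U`, and `b` are hypothetical trainable parameters:

```python
import numpy as np

def update_future_vector(future_prev, c_t, W, U, b):
    """One update step: the future translation vector at time t is produced
    by a recurrent cell reading the previous future vector (the third
    translation vector) and the source-side context vector c_t."""
    return np.tanh(W @ c_t + U @ future_prev + b)

dim = 4
rng = np.random.default_rng(0)
W, U = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
b = np.zeros(dim)
future_init = np.ones(dim)      # stand-in for the source-sentence summary vector
c_1 = rng.normal(size=dim)      # stand-in context vector at the first step
future_1 = update_future_vector(future_init, c_1, W, U, b)
```

Any of the cell types named above (GRU, LSTM, time-delay network, gated CNN) could replace the tanh cell without changing the surrounding recurrence.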
- Determining the first translation vector and the second translation vector according to the source vector representation sequence and the source-side context vector may include: obtaining the fourth translation vector by processing the second translation vector and the source-side context vector with the preset neural network model, where the fourth translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the first moment.
- Assume the first moment is time t and the second moment is time t-1.
- The source-side context vector c_t of the first moment (i.e., the source-side semantic content being translated) is also used to update the past translation vector and the future translation vector, as follows:
- Future update: F_t = RNN(F_{t-1}, c_t), where F_{t-1} is the future translation vector at time t-1 (i.e., the third translation vector) and c_t is the source-side context vector at time t.
- Past update: P_t = RNN(P_{t-1}, c_t), where P_t is the past translation vector at time t (i.e., the fourth translation vector), P_{t-1} is the past translation vector at time t-1 (i.e., the second translation vector), c_t is the source-side context vector at time t, and RNN() denotes computation with the RNN model.
- Here RNN is only a schematic representation of the preset neural network model; in practice the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type, and the neural network structure is not limited here.
- As described above, the third translation vector and the source-side context vector are processed with a preset neural network model to obtain the first translation vector; the second translation vector can also be obtained according to the position of the source-side context vector in the source vector representation sequence.
- Determining the second translation vector according to the source vector representation sequence and the source-side context vector may include: acquiring, according to the position in the source vector representation sequence of the source-side context vector of the first moment, the second translation vector corresponding to the second moment; the fourth translation vector is then obtained by processing the second translation vector and the source-side context vector with the preset neural network model.
- Specifically, the target information determining apparatus reads the source-side context vector at time t-2 and the past translation vector at time t-2, and processes them with the preset neural network model to obtain the past translation vector at time t-1. The past translation vector is initialized to an all-zero vector, indicating that at the beginning no source content has been translated. At every moment, the update is as follows:
- P_t = RNN(P_{t-1}, c_t), where P_t is the past translation vector at time t (i.e., the fourth translation vector), P_{t-1} is the past translation vector at time t-1 (i.e., the second translation vector), c_t is the source-side context vector at time t, and RNN() denotes computation with the RNN model.
- Here RNN is only a schematic representation of the preset neural network model; in practice the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type, and the neural network structure is not limited here.
- The above describes how to determine the second translation vector according to the source vector representation sequence and the source-side context vector: the second translation vector is obtained according to the position at which the source-side context vector appears in the source vector representation sequence.
- the second translation vector is used to generate a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector by using a preset neural network model.
- by using the preset neural network model to output the second translation vector, the accuracy of the past translation vector can be improved.
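As an illustrative sketch (not the apparatus's actual implementation), the per-moment update of the past translation vector can be written with a plain tanh RNN cell standing in for the preset neural network model; the dimension d and the weight names W, U, b are assumptions:

```python
import numpy as np

def rnn_update_past(past_prev, c_t, W, U, b):
    """One step of the past-translation-vector update: the vector at moment t
    is computed from the vector at moment t-1 and the source-side context
    vector c_t, using a simple tanh RNN cell."""
    return np.tanh(W @ past_prev + U @ c_t + b)

d = 4  # hidden size (assumed)
rng = np.random.default_rng(0)
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b = np.zeros(d)

past = np.zeros(d)        # initialized to an all-zero vector:
                          # no source content has been translated yet
c_1 = rng.normal(size=d)  # source-side context vector at the first moment
past = rnn_update_past(past, c_1, W, U, b)
print(past.shape)
```

In a full system the same call would be repeated once per decoding step, feeding each moment's context vector into the update.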
- Processing the third translation vector and the source side context vector to obtain the first translation vector may include:
- the source-side context vector is subtracted from the third translation vector using a gated recurrent unit (GRU) to obtain the first translation vector.
- GRU is shorthand for Gated Recurrent Unit.
- the first time is the t-th time
- the second time is the t-1 time
- the source-side context vector c t at the first moment is subtracted from the third translation vector (i.e., the vector corresponding to the source content that has not been translated up to the second moment).
- the invention can be applied to a variety of RNN structures.
- the following description takes the mainstream GRU as an example.
- FIG. 4 is a schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- a “decremental” model can be established, and it is desirable that the parameters of the GRU automatically learn the following rule:
- U, W, U r , W r , U u and W u represent function-related parameters that are trained jointly with the other parameters of the neural network translation system.
- FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- the source-side context vector is subtracted from the future translation vector of the second moment (i.e., the (t-1)-th moment), where the future translation vector of the second moment is the third translation vector; this yields the required first translation vector, which is then passed to the GRU structure.
- the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, and then the obtained first translation vector is transmitted to the GRU structure.
- a decrementing signal can be given in the GRU, which is advantageous for learning this rule, thereby improving the accuracy of model training.
- Processing the third translation vector and the source side context vector to obtain the first translation vector may include:
- the third translation vector and the source side context vector are processed by the GRU to obtain an intermediate vector
- the intermediate vector is interpolated with the third translation vector to obtain a first translation vector.
- FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
- assuming that the second moment is the (t-1)-th moment, the parameters of the GRU can automatically learn the following rule:
- in the rule, the future translation vector at the (t-1)-th moment is the third translation vector, and c t represents the source-side context vector at the first moment.
- the intermediate vector output by the GRU and the third translation vector at the (t-1)-th moment can be interpolated and combined to obtain the final first translation vector.
- in this way, the past translation vector and the future translation vector can be obtained at each moment: the past translation vector represents the source content that has been translated up to the t-th moment, and the future translation vector represents the source content that has not been translated up to the t-th moment.
- the target information determining apparatus processes the third translation vector and the source-side context vector by using the preset neural network model as follows: first, the GRU processes the third translation vector and the source-side context vector to obtain an intermediate vector, and then the intermediate vector is interpolated and combined with the third translation vector to obtain the first translation vector.
- performing the decrementing operation inside the GRU is advantageous for improving the accuracy of the operation and increasing the efficiency of the operation.
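Both GRU-based ways of obtaining the first (future) translation vector described above can be sketched with a minimal GRU cell. The cell follows the standard GRU gate equations; the exact wiring of the "subtract first" and "interpolate afterwards" variants, the fixed interpolation weight, and all shapes are assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(h_prev, x, P):
    """Standard GRU step: reset gate r, update gate u, candidate state."""
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h_prev)
    u = sigmoid(P["Wu"] @ x + P["Uu"] @ h_prev)
    h_tilde = np.tanh(P["W"] @ x + P["U"] @ (r * h_prev))
    return (1.0 - u) * h_prev + u * h_tilde

def future_update_subtract(future_prev, c_t, P):
    """Sketch of the FIG. 5 variant: subtract the source-side context vector
    from the previous future translation vector, then pass it to the GRU."""
    return gru_cell(future_prev - c_t, c_t, P)

def future_update_interpolate(future_prev, c_t, P):
    """Sketch of the FIG. 6 variant: the GRU first produces an intermediate
    vector, which is then interpolated with the previous future vector."""
    mid = gru_cell(future_prev, c_t, P)
    g = 0.5  # interpolation weight; in the model this would be learned
    return g * mid + (1.0 - g) * future_prev

d = 4
rng = np.random.default_rng(1)
P = {k: rng.normal(scale=0.1, size=(d, d))
     for k in ["W", "U", "Wr", "Ur", "Wu", "Uu"]}
future = rng.normal(size=d)  # third translation vector (moment t-1)
c_t = rng.normal(size=d)     # source-side context vector at moment t
v_sub = future_update_subtract(future, c_t, P)
v_int = future_update_interpolate(future, c_t, P)
print(v_sub.shape, v_int.shape)
```

Either function returns a candidate first translation vector; which wiring works best is a training-time question, not something this sketch decides.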
- obtaining the source-side context vector corresponding to the first moment according to the source vector representation sequence may include:
- the source side context vector corresponding to the first moment is determined according to the alignment probability of the source content and the semantic vector of the source content.
- FIG. 7 is a schematic diagram of an embodiment of an enhanced attention module according to an embodiment of the present invention. Specifically, assume that the first moment is the t-th moment and the second moment is the (t-1)-th moment; the target information determining apparatus includes an encoder and a decoder, and determines the alignment probability α t,i of the source content according to the decoder state s t-1 at the second moment, the second translation vector, the third translation vector, and the vector h i of the source content in the source vector representation sequence.
- the alignment probability α t,i is calculated as α t,i = softmax(a(·)), where a(·) takes as input the decoder state s t-1 at the second moment, the second and third translation vectors, and h i :
- ⁇ t,i refers to the alignment probability distribution of each source content output by the attention mechanism
- the sum of the alignment probability distributions is 1
- h i is the vector representation of the ith source content of the input sentence in the encoder
- softmax () represents the normalization operation
- the values input to the neural network are usually both negative and positive, so they are usually first converted to positive values using the exponential function, and then all the exponential values are normalized to obtain the probability distribution.
- a() is the operation of the attention module.
- the semantic vectors x i of the corresponding source content are then weighted and summed by the alignment probabilities to obtain the source-side context vector corresponding to the first moment.
- for example, assume that at the first moment α t,1 is 0.5, α t,2 is 0.3, α t,3 is 0.2, x 1 is 2, x 2 is 4, and x 3 is 6; the source-side context vector c t corresponding to the first moment is then calculated as c t = 0.5 × 2 + 0.3 × 4 + 0.2 × 6 = 3.4.
- in practical applications, the alignment probability α t,i can also be calculated in other ways.
- the vectors of the source content in the source vector representation sequence are used to determine the alignment probability of the source content, and then the source-side context vector corresponding to the first moment is determined according to the alignment probability of the source content and the semantic vectors of the source content.
- in this way, the attention module in the target information determining device can be made aware of which source content has been translated and which has not, thereby placing more attention on the untranslated content and less on the translated content, so as to mitigate the problems of missing translation and repeated translation.
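The alignment-and-weighting computation, including the worked numbers above, can be sketched as follows; the scalar semantic values and the hand-picked attention scores are illustrative only:

```python
import numpy as np

def softmax(scores):
    """Normalization used for the alignment probabilities: exponentiate
    (so negative scores become positive) and divide by the sum."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def context_vector(alpha, x):
    """Weighted sum of source semantic values by alignment probability."""
    return float(np.dot(alpha, x))

alpha = np.array([0.5, 0.3, 0.2])  # alignment probabilities at the first moment
x = np.array([2.0, 4.0, 6.0])      # semantic values of the source content
c_t = context_vector(alpha, x)
print(c_t)  # approximately 3.4, matching the worked example

# the probabilities themselves come from a softmax over attention scores:
probs = softmax(np.array([1.0, -0.5, 0.3]))
print(probs.sum())  # sums to 1 up to float rounding
```

Note that the alignment distribution always sums to 1, which is what lets the weighted sum be read as an expectation over source positions.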
- optionally, on the basis of the foregoing embodiments, before performing decoding processing on the first translation vector and/or the second translation vector and the source-side context vector to obtain the target information at the first moment, the method further includes: determining the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source-side context vector, the first translation vector, and the second translation vector;
- performing decoding processing on the first translation vector and/or the second translation vector and the source-side context vector to obtain the target information at the first moment may include: performing decoding processing on the decoder state at the first moment, the source-side context vector, and the first translation vector and/or the second translation vector to obtain the target information at the first moment.
- to obtain the target information at the first moment, the target information determining apparatus first determines the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source-side context vector, the first translation vector, and the second translation vector, where the first moment is the t-th moment, i.e., the current moment, and the second moment is the (t-1)-th moment, i.e., the previous moment.
- FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention.
- the decoder state at the second moment is s t-1
- the target information at the second moment is y t-1
- the source context vector is c t
- the first translation vector is the future translation vector at the first moment, and the second translation vector is the past translation vector at the second moment.
- the decoder state s t at the first moment can be calculated by the following formula: s t = f(s t-1 , y t-1 , c t , first translation vector, second translation vector)
- f() represents the activation function that updates the decoder state; it is a standard component of neural network translation models, and its input can be flexibly changed according to actual needs.
- the decoder state at the first moment is first determined according to the decoder state at the second moment, the target information at the second moment, the source context vector, the first translation vector, and the second translation vector.
- the decoder state, the source context vector, the first translation vector, and/or the second translation vector at the first moment are then subjected to decoding processing to obtain target information at the first moment.
- the modeling of the first translation vector and/or the second translation vector is independent of the decoder state, and together with the source-side context vector output by the attention module at the first moment they can form a complete source-side semantic representation, which is passed to the decoder to generate more accurate target information.
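A minimal sketch of the enhanced decoder-state update: f is realized here as a single tanh layer over the concatenated inputs, which is only one possible choice, since the document notes that f's input can be changed flexibly; all shapes and weights are assumed:

```python
import numpy as np

def f_update(s_prev, y_prev, c_t, v_future, v_past, W, b):
    """Decoder state at the first moment from: the state and target
    information at the second moment, the source-side context vector,
    and the first (future) and second (past) translation vectors."""
    z = np.concatenate([s_prev, y_prev, c_t, v_future, v_past])
    return np.tanh(W @ z + b)

d = 4  # shared dimension for all inputs (assumed for simplicity)
rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(d, 5 * d))
b = np.zeros(d)
inputs = [rng.normal(size=d) for _ in range(5)]
s_t = f_update(*inputs, W, b)
print(s_t.shape)
```

The point of the sketch is the input list, not the layer: the two translation vectors enter the state update alongside the usual state, previous word, and context vector.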
- the method further includes:
- the training target is determined according to the first indicator expectation value and the second index expectation value, wherein the training target is used to construct a preset neural network model.
- a method for increasing the training target is also provided, and the preset neural network model can be better trained by increasing the training target.
- the following introduces the training of the future translation vector as an example; it can be understood that the training of the past translation vector is similar and is not described here. For the future translation vector, it is necessary that the information difference between the translation vectors at adjacent moments be substantially the same as the source content translated at that moment, so as to satisfy the modeling of the future translation vector.
- the first indicator expectation value can be calculated as follows:
- E(y t ) represents the vector representation of the target information y t
- the first indicator expectation value indicates the extent to which the future translation vector is updated as expected (for example, the amount of update being substantially equal to the source content being translated); the higher the indicator expectation value, the better the expectation is met.
- similarly, the second indicator expectation value can be calculated according to the second translation vector and the fourth translation vector.
- the training target can be calculated as follows:
- J(θ, γ) is a general representation of the training target, whose parameters are obtained through training;
- θ represents the parameters of the NMT system, and γ represents the parameters of the newly introduced past-future module;
- the training target includes that of a standard neural network translation model, that is, maximizing the probability of generating each piece of target information, which may also be expressed as maximizing the likelihood score of generating the target words.
- the first indicator expectation value is obtained according to the first translation vector and the third translation vector, and the second indicator expectation value is obtained according to the second translation vector and the fourth translation vector; the training target is then determined according to the first indicator expectation value and the second indicator expectation value, where the training target is used to construct the preset neural network model.
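One hedged reading of this training target can be sketched as a likelihood term plus the two indicator expectations, each measuring whether a translation vector's per-step change is consistent with the word generated at that step; the additive combination and the cosine-based consistency measure are assumptions for illustration, not the patent's exact formula:

```python
import numpy as np

def consistency(delta, e_y):
    """Indicator for one step: cosine similarity between the change in a
    translation vector and the embedding E(y_t) of the generated word."""
    denom = np.linalg.norm(delta) * np.linalg.norm(e_y) + 1e-8
    return float(np.dot(delta, e_y) / denom)

def training_target(log_probs, future_deltas, past_deltas, e_ys):
    """Likelihood term plus the first and second indicator expectations."""
    likelihood = sum(log_probs)
    first = np.mean([consistency(d, e) for d, e in zip(future_deltas, e_ys)])
    second = np.mean([consistency(d, e) for d, e in zip(past_deltas, e_ys)])
    return likelihood + first + second

rng = np.random.default_rng(3)
T, d = 3, 4  # toy sentence length and vector size
j = training_target(
    log_probs=[-1.2, -0.7, -0.3],  # log p of each generated target word
    future_deltas=[rng.normal(size=d) for _ in range(T)],
    past_deltas=[rng.normal(size=d) for _ in range(T)],
    e_ys=[rng.normal(size=d) for _ in range(T)],
)
print(np.isfinite(j))
```

Maximizing j jointly rewards fluent target words and translation vectors whose updates track the content actually translated at each step.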
- FIG. 9 is the first source content in the translation source vector representation sequence in the application scenario of the present invention.
- a schematic diagram of an embodiment is specifically:
- the encoder reads in the input sentence "Multiple airports are forced to close. ⟨eos⟩", where ⟨eos⟩ represents the sentence terminator, and then outputs a source vector representation sequence in which each vector (i.e., each vertical bar in FIG. 9) corresponds to one source content. Based on this source vector representation sequence, the decoder generates the translation.
- the alignment probability and the semantic vectors are weighted to generate the source-side context vector c 1 at the first moment, where the alignment probability is 0.5, 0.2, 0.2, 0.1, 0.0, and 0.0 in FIG. 9.
- the past translation vector and the future translation vector are then updated: the past translation vector at the first moment is obtained from the past translation vector at the initial moment, and the future translation vector at the first moment is obtained from the future translation vector at the initial moment.
- the decoder decodes c 1 together with the decoder state s 0 at the initial moment to update the decoder state to s 1 at the first moment; then, according to s 0 and c 1, a neural network output layer is used to compare against all target-side words, and the word with the highest similarity is selected as the target information y 1, where y 1 is "many", the translation of the first source word.
- FIG. 10 is a schematic diagram of an embodiment of translating a source end vector representation sequence in a source sequence according to an embodiment of the present invention.
- the alignment probability and the semantic vectors are weighted to generate the source-side context vector c 2 at the second moment, where the alignment probability is 0.3, 0.6, 0.1, 0.0, 0.0, and 0.0 in FIG. 10.
- the past translation vector and the future translation vector are updated, that is, the following formula is adopted:
- in the formula, the past translation vector at the second moment is obtained from the past translation vector at the first moment, and the future translation vector at the second moment is obtained from the future translation vector at the first moment.
- the decoder decodes c 2 together with the decoder state s 1 at the first moment to update the decoder state to s 2 at the second moment; then, according to s 1, c 2, and the previously generated target information y 1, a neural network output layer is used to compare against all target-side words, and the word with the highest similarity is selected as the target information y 2, where y 2 is "airports".
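The walkthrough's "compare with all target-side words and select the most similar" step can be sketched as a softmax output layer over a toy vocabulary; the vocabulary, weights, and scores are invented for illustration:

```python
import numpy as np

def pick_target_word(state, context, W_out, vocab):
    """Score every target-side word with an output layer over the decoder
    state and context vector, then pick the highest-probability word."""
    logits = W_out @ np.concatenate([state, context])
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return vocab[int(np.argmax(probs))], probs

vocab = ["many", "airports", "closed", "<eos>"]  # toy target vocabulary
d = 4
rng = np.random.default_rng(4)
W_out = rng.normal(size=(len(vocab), 2 * d))
s_0 = np.zeros(d)           # initial decoder state
c_1 = rng.normal(size=d)    # source-side context vector at the first moment
word, probs = pick_target_word(s_0, c_1, W_out, vocab)
print(word in vocab, abs(float(probs.sum()) - 1.0) < 1e-9)  # True True
```

Repeating this selection step by step, with the state and context updated between steps, yields the y 1, y 2, ... sequence of the walkthrough.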
- the target information determining apparatus 30 in this embodiment of the present invention includes at least one memory and at least one processor, where the at least one memory stores at least one instruction module configured to be executed by the at least one processor. The at least one instruction module includes: an encoding module 301, configured to perform encoding processing on the to-be-processed text information to obtain a source vector representation sequence; a first acquiring module 302, configured to acquire, according to the source vector representation sequence encoded by the encoding module 301, a source-side context vector corresponding to the first moment, where the source-side context vector is used to represent the to-be-processed source content; a first determining module 303, configured to determine a first translation vector and/or a second translation vector according to the source vector representation sequence encoded by the encoding module 301 and the source-side context vector acquired by the first acquiring module 302, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and a decoding module 304, configured to perform decoding processing on the first translation vector and/or the second translation vector determined by the first determining module 303 and the source-side context vector to obtain the target information at the first moment.
- the encoding module 301 performs encoding processing on the text information to be processed to obtain a source vector representation sequence
- the first acquiring module 302 acquires, according to the source vector representation sequence obtained by the encoding module 301, the source-side context vector corresponding to the first moment, where the source-side context vector is used to represent the to-be-processed source content;
- the first determining module 303 determines a first translation vector and/or a second translation vector according to the source vector representation sequence encoded by the encoding module 301 and the source-side context vector acquired by the first acquiring module 302, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is a moment adjacent to the first moment; and the decoding module 304 performs decoding processing on the first translation vector and/or the second translation vector determined by the first determining module 303 and the source-side context vector to obtain the target information at the first moment.
- the target information determining apparatus performs encoding processing on the to-be-processed text information to obtain a source vector representation sequence, and then acquires, according to the source vector representation sequence, the source-side context vector corresponding to the first moment, which is used to represent the to-be-processed source content; it determines a first translation vector and/or a second translation vector according to the source vector representation sequence and the source-side context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; finally, the target information determining apparatus performs decoding processing on the first translation vector and/or the second translation vector and the source-side context vector to obtain the target information at the first moment.
- in this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled, that is, this part of the content is separated from the original language model for training, thereby reducing the difficulty of model training for the decoder and improving the translation effect of the translation system.
- the first determining module 303 includes: a first acquiring unit 3031, configured to acquire a third translation vector corresponding to the second moment according to the source vector representation sequence; and a first processing unit 3032, configured to process, by using a preset neural network model, the third translation vector acquired by the first acquiring unit 3031 and the source-side context vector to obtain the first translation vector.
- the preset neural network model processes the third translation vector and the source-side context vector to obtain the first translation vector. In this way, by using the preset neural network model to output the first translation vector, the accuracy of the future translation vector can be improved.
- the first determining module 303 includes: a second acquiring unit 3033, configured to acquire a third translation vector corresponding to the second moment according to the source vector representation sequence; a second processing unit 3034, configured to process, by using a preset neural network model, the third translation vector acquired by the second acquiring unit 3033 and the source-side context vector to obtain the first translation vector; and a third acquiring unit 3035, configured to acquire the second translation vector according to the position at which the source-side context vector appears in the source vector representation sequence, where the second translation vector is used to update a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source-side context vector by using the preset neural network model.
- the third translation vector and the source-side context vector are processed by using the preset neural network model to obtain the first translation vector; the second translation vector can also be obtained according to the position of the source-side context vector in the source vector representation sequence.
- the first determining module 303 includes: a fourth acquiring unit 3036, configured to acquire the second translation vector according to the position at which the source-side context vector appears in the source vector representation sequence, where the second translation vector is used to generate a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source-side context vector by using the preset neural network model.
- the foregoing describes how to determine the second translation vector according to the source vector representation sequence and the source-side context vector, that is, the second translation vector is obtained according to the position at which the source-side context vector appears in the source vector representation sequence.
- the second translation vector is used to generate a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector by using a preset neural network model.
- by using the preset neural network model to output the second translation vector, the accuracy of the past translation vector can be improved.
- the first processing unit 3032 includes: a subtraction subunit 30321, configured to subtract the source-side context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
- the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, and then the obtained first translation vector is transmitted to the GRU structure.
- a decrementing signal can be given in the GRU, which is advantageous for learning this rule, thereby improving the accuracy of model training.
- the first processing unit 3032 includes: a processing subunit 30322, configured to process the third translation vector and the source-side context vector by using a GRU to obtain an intermediate vector; and a merging subunit 30323, configured to interpolate and combine the intermediate vector obtained by the processing subunit 30322 with the third translation vector to obtain the first translation vector.
- the target information determining apparatus processes the third translation vector and the source-side context vector by using the preset neural network model as follows: first, the GRU processes the third translation vector and the source-side context vector to obtain an intermediate vector, and then the intermediate vector is interpolated and combined with the third translation vector to obtain the first translation vector.
- performing the decrementing operation inside the GRU is advantageous for improving the accuracy of the operation and increasing the efficiency of the operation.
- the first acquiring module 302 includes: a first determining unit 3021, configured to determine the alignment probability of the source content according to the decoder state at the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and a second determining unit 3022, configured to determine, according to the alignment probability of the source content determined by the first determining unit 3021 and the semantic vectors of the source content, the source-side context vector corresponding to the first moment.
- the vectors of the source content in the source vector representation sequence are used to determine the alignment probability of the source content, and then the source-side context vector corresponding to the first moment is determined according to the alignment probability of the source content and the semantic vectors of the source content.
- the attention module in the target information determining device can be made aware of which source content has been translated, and which source content has not been translated, thereby placing more attention on the untranslated content. Reduce the focus on translated content to mitigate the problem of missing translations and duplicate translations.
- the target information determining apparatus 30 may further include a second determining module 305, configured to determine the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source-side context vector, the first translation vector, and the second translation vector, before the decoding module 304 performs decoding processing on the first translation vector and/or the second translation vector and the source-side context vector to obtain the target information at the first moment.
- the decoding module 304 includes: a decoding unit 3041, configured to perform, for the decoder state of the first moment, the source context vector, the first translation vector, and/or the second translation vector Decoding processing to obtain target information of the first moment.
- the decoder state at the first moment is first determined according to the decoder state at the second moment, the target information at the second moment, the source context vector, the first translation vector, and the second translation vector.
- the decoder state, the source context vector, the first translation vector, and/or the second translation vector at the first moment are then subjected to decoding processing to obtain target information at the first moment.
- the modeling of the first translation vector and/or the second translation vector is independent of the decoder state, and together with the source-side context vector output by the attention module at the first moment they can form a complete source-side semantic representation, which is passed to the decoder to generate more accurate target information.
- the target information determining apparatus 30 further includes: a second acquiring module 306, configured to obtain a first indicator expectation value according to the first translation vector and the third translation vector, where the first indicator expectation value is used to represent the semantic consistency between the future translation vector change and the target information at the first moment; a third acquiring module 307, configured to obtain a second indicator expectation value according to the second translation vector and the fourth translation vector, where the second indicator expectation value is used to represent the semantic consistency between the past translation vector change and the target information at the first moment; and a second determining module 308, configured to determine a training target according to the first indicator expectation value obtained by the second acquiring module 306 and the second indicator expectation value obtained by the third acquiring module 307, where the training target is used to construct the preset neural network model.
- the first indicator expectation value is obtained according to the first translation vector and the third translation vector, and the second indicator expectation value is obtained according to the second translation vector and the fourth translation vector; the training target is then determined according to the first indicator expectation value and the second indicator expectation value, where the training target is used to construct the preset neural network model.
- FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
- the target information determining apparatus 300 may vary considerably due to different configurations or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing an application 342 or data 344.
- the memory 332 and the storage medium 330 may be short-term storage or persistent storage.
- the program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations in the target information determining device.
- the central processing unit 322 can be configured to communicate with the storage medium 330 to perform a series of instruction operations in the storage medium 330 on the target information determining apparatus 300.
- the target information determining apparatus 300 may also include one or more power sources 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or one or more operating systems 341, such as Windows ServerTM. , Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and more.
- the steps performed by the target information determining apparatus in the foregoing embodiments may be based on the structure of the target information determining apparatus shown in FIG. 20.
- the CPU 322 is configured to perform the following steps: performing encoding processing on the to-be-processed text information to obtain a source vector representation sequence; acquiring, according to the source vector representation sequence, a source-side context vector corresponding to the first moment, where the source-side context vector is used to represent the to-be-processed source content; determining a first translation vector and/or a second translation vector according to the source vector representation sequence and the source-side context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is a moment adjacent to and before the first moment; and performing decoding processing on the first translation vector and/or the second translation vector and the source-side context vector to obtain the target information at the first moment.
- the CPU 322 is specifically configured to: acquire a third translation vector corresponding to the second moment according to the source end vector representation sequence; and use the preset neural network model to the third translation vector and the The source context vector is processed to obtain the first translation vector.
- the CPU 322 is specifically configured to perform the following steps: acquiring a third translation vector corresponding to the second moment according to the source vector representation sequence; processing the third translation vector and the source-side context vector by using the preset neural network model to obtain the first translation vector; and acquiring the second translation vector according to the position at which the source-side context vector appears in the source vector representation sequence, where the second translation vector is used to update a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source-side context vector by using the preset neural network model.
- the CPU 322 is specifically configured to: obtain the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate a fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector by using the preset neural network model.
- the CPU 322 is specifically configured to perform the following step: subtracting the source context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
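A minimal sketch of the GRU-based subtraction step above. The gate parameters and the choice to feed the negated context vector into a standard GRU cell are illustrative assumptions; the patent only specifies that a GRU subtracts the source context vector from the third translation vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4
rng = np.random.default_rng(1)
# Hypothetical GRU parameters (input: source context vector; state: the
# third translation vector, i.e. the untranslated content at moment t-1).
Wz, Uz = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Wr, Ur = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Wh, Uh = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def gru_step(state, x):
    z = sigmoid(Wz @ x + Uz @ state)          # update gate
    r = sigmoid(Wr @ x + Ur @ state)          # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * state))
    return (1 - z) * state + z * h_tilde

third_translation_vec = rng.standard_normal(d)   # untranslated content, t-1
source_context_vec = rng.standard_normal(d)      # c_t
# One realization of "subtracting": feed the negated context into the GRU,
# so the gates learn how much of c_t to remove from the state.
first_translation_vec = gru_step(third_translation_vec, -source_context_vec)
print(first_translation_vec.shape)  # (4,)
```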
- the CPU 322 is specifically configured to: process the third translation vector and the source context vector by using a GRU to obtain an intermediate vector; and combine the intermediate vector with the third translation vector by interpolation to obtain the first translation vector.
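The interpolation in the step above is a simple convex combination. The fixed weight below is illustrative; in a trained model the gate would itself be predicted from the vectors:

```python
import numpy as np

d = 4
rng = np.random.default_rng(2)
third_vec = rng.standard_normal(d)     # untranslated content at moment t-1
intermediate = rng.standard_normal(d)  # GRU output

g = 0.7  # illustrative interpolation weight
first_vec = g * intermediate + (1 - g) * third_vec

# Equivalently: move from third_vec toward the intermediate vector by g.
assert np.allclose(first_vec, third_vec + g * (intermediate - third_vec))
print(first_vec.shape)  # (4,)
```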
- the CPU 322 is specifically configured to: determine an alignment probability of the source content according to the decoder state at the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and determine the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
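A hedged sketch of the alignment step: the previous decoder state, together with the past and future translation vectors, scores every source position, and a softmax turns the scores into alignment probabilities that weight the semantic vectors into the source context vector. The additive scoring form and all parameters below are assumptions, not the patented formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 4
rng = np.random.default_rng(3)
source_seq = rng.standard_normal((3, d))  # semantic vectors of source content
decoder_state = rng.standard_normal(d)    # decoder state at the second moment
past_vec = rng.standard_normal(d)         # second translation vector
future_vec = rng.standard_normal(d)       # third translation vector

# Hypothetical additive scoring: the query concatenates the decoder state
# with the past and future translation vectors before comparing it to each
# source position.
Wq = rng.standard_normal((d, 3 * d))
v = rng.standard_normal(d)
query = Wq @ np.concatenate([decoder_state, past_vec, future_vec])
scores = np.array([v @ np.tanh(query + h) for h in source_seq])

align_prob = softmax(scores)       # alignment probabilities, sum to 1
context = align_prob @ source_seq  # source context vector for the first moment
print(context.shape)  # (4,)
```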
- the CPU 322 is further configured to: determine the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source context vector, and the first translation vector and/or the second translation vector.
- the CPU 322 is specifically configured to perform the following step: decoding the decoder state at the first moment, the first translation vector, the second translation vector, and the source context vector to obtain the target information at the first moment.
- the CPU 322 is further configured to: obtain a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value indicates the semantic consistency between the change of the future translation vector and the target information at the first moment; obtain a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value indicates the semantic consistency between the change of the past translation vector and the target information at the first moment; and determine a training target according to the first indicator expected value and the second indicator expected value, where the training target is used to build the preset neural network model.
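One way to read the training target above: each indicator expected value rewards the change in the corresponding translation vector for agreeing semantically with the word emitted at the first moment. The dot-product/log-sigmoid agreement measure and the placeholder likelihood below are loudly hypothetical stand-ins, not the patent's actual expected values:

```python
import numpy as np

def log_sigmoid_agreement(a, b):
    # Toy "indicator expected value": closer to 0 when the vectors agree,
    # strongly negative when they disagree. Always <= 0.
    return float(np.log(1.0 / (1.0 + np.exp(-(a @ b)))))

d = 4
rng = np.random.default_rng(4)
first_vec, third_vec = rng.standard_normal(d), rng.standard_normal(d)
second_vec, fourth_vec = rng.standard_normal(d), rng.standard_normal(d)
target_embedding = rng.standard_normal(d)  # word emitted at the first moment

# The change in the future vector between moments should reflect the word
# just translated; likewise for the change in the past vector.
future_change = third_vec - first_vec
past_change = fourth_vec - second_vec
e_future = log_sigmoid_agreement(future_change, target_embedding)
e_past = log_sigmoid_agreement(past_change, target_embedding)

log_likelihood = -1.25  # placeholder per-word translation log-probability
training_target = log_likelihood + e_future + e_past
print(training_target <= log_likelihood)  # True: both bonus terms are <= 0
```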
- the disclosed system, apparatus, and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the part of the technical solution of the present invention that is essential or that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
- the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention.
- the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
- an embodiment of the present invention also provides a computer readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method as described above.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (16)
- A translation method, applied to a neural network machine translation system, the method comprising: encoding, by an encoder, to-be-processed text information to obtain a source vector representation sequence, where the to-be-processed text information belongs to a first language; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the first moment, the second translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at a second moment, and the second moment is the moment immediately before the first moment; and decoding, by a decoder, the translation vector and the source context vector to obtain target information at the first moment, where the target information belongs to a second language different from the first language.
- The method according to claim 1, wherein encoding, by the encoder, the to-be-processed text information to obtain the source vector representation sequence comprises: inputting the to-be-processed text information into the encoder; encoding the to-be-processed text information by the encoder; and obtaining the source vector representation sequence according to a result of the encoding, where each source vector in the source vector representation sequence belongs to the first language.
- The method according to claim 1 or 2, wherein decoding, by the decoder, the translation vector and the source context vector to obtain the target information at the first moment comprises: inputting the translation vector and the source context vector into the decoder; decoding the translation vector and the source context vector by the decoder; and obtaining translated content of the to-be-processed source content according to a result of the decoding, where the translated content is the target information at the first moment.
- A method for determining target information, comprising: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the first moment, the second translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at a second moment, and the second moment is the moment immediately before the first moment; and decoding the translation vector and the source context vector to obtain target information at the first moment.
- The method according to claim 4, wherein determining the first translation vector according to the source vector representation sequence and the source context vector comprises: obtaining a third translation vector according to the source vector representation sequence, where the third translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the second moment; and processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector.
- The method according to claim 4, wherein determining the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector comprises: obtaining a third translation vector according to the source vector representation sequence, where the third translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the second moment; processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector; and obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update a fourth translation vector, the fourth translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector by using the preset neural network model.
- The method according to claim 4, wherein determining the second translation vector according to the source vector representation sequence and the source context vector comprises: obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate a fourth translation vector, the fourth translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector by using the preset neural network model.
- The method according to claim 5 or 6, wherein processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector comprises: subtracting the source context vector from the third translation vector by using a gated recurrent unit to obtain the first translation vector.
- The method according to claim 5 or 6, wherein processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector comprises: processing the third translation vector and the source context vector by using a gated recurrent unit to obtain an intermediate vector; and combining the intermediate vector with the third translation vector by interpolation to obtain the first translation vector.
- The method according to claim 4, wherein obtaining the source context vector corresponding to the first moment according to the source vector representation sequence comprises: determining an alignment probability of the source content according to the decoder state at the second moment, the second translation vector, a third translation vector, and the vectors of the source content in the source vector representation sequence, where the third translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the second moment; and determining the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
- The method according to claim 4, wherein before decoding the translation vector and the source context vector to obtain the target information at the first moment, the method further comprises: determining the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source context vector, and the translation vector; and correspondingly, decoding the translation vector and the source context vector to obtain the target information at the first moment comprises: decoding the decoder state at the first moment, the source context vector, and the translation vector to obtain the target information at the first moment.
- The method according to claim 10 or 11, further comprising: obtaining a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value indicates the semantic consistency between the change of the future translation vector and the target information at the first moment; obtaining a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value indicates the semantic consistency between the change of the past translation vector and the target information at the first moment; and determining a training target according to the first indicator expected value and the second indicator expected value, where the training target is used to build the preset neural network model.
- An apparatus for determining target information, comprising at least one memory and at least one processor, where the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module comprises: an encoding module, configured to encode to-be-processed text information to obtain a source vector representation sequence; a first obtaining module, configured to obtain a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the to-be-processed text information at the first moment; a first determining module, configured to determine a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the first moment, the second translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at a second moment, and the second moment is the moment immediately before the first moment; and a decoding module, configured to decode the translation vector and the source context vector to obtain target information at the first moment.
- An apparatus for determining target information, comprising a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory and perform the following steps: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the first moment, the second translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at a second moment, and the second moment is the moment immediately before the first moment; and decoding the translation vector and the source context vector to obtain target information at the first moment; and where the bus system is configured to connect the memory and the processor so that the memory and the processor communicate with each other.
- A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
- A method for determining target information, performed by an electronic device, the method comprising: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is a vector corresponding to source content in the source vector representation sequence that has not been translated at the first moment, the second translation vector is a vector corresponding to source content in the source vector representation sequence that has been translated at a second moment, and the second moment is the moment immediately before the first moment; and decoding the translation vector and the source context vector to obtain target information at the first moment.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020503957A JP7025090B2 (ja) | 2017-07-25 | 2018-07-11 | 翻訳方法、ターゲット情報決定方法および関連装置、ならびにコンピュータプログラム |
KR1020207002392A KR102382499B1 (ko) | 2017-07-25 | 2018-07-11 | 번역 방법, 타깃 정보 결정 방법, 관련 장치 및 저장 매체 |
EP18837956.4A EP3660707A4 (en) | 2017-07-25 | 2018-07-11 | TRANSLATION PROCEDURES, TARGET INFORMATION DETERMINATION PROCESS AND ASSOCIATED DEVICE AND STORAGE MEDIUM |
US16/749,243 US11928439B2 (en) | 2017-07-25 | 2020-01-22 | Translation method, target information determining method, related apparatus, and storage medium |
US18/390,153 US20240169166A1 (en) | 2017-07-25 | 2023-12-20 | Translation method, target information determining method, related apparatus, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710612833.7 | 2017-07-25 | ||
CN201710612833.7A CN107368476B (zh) | 2017-07-25 | 2017-07-25 | 一种翻译的方法、目标信息确定的方法及相关装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/749,243 Continuation US11928439B2 (en) | 2017-07-25 | 2020-01-22 | Translation method, target information determining method, related apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019019916A1 true WO2019019916A1 (zh) | 2019-01-31 |
Family
ID=60306920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/095231 WO2019019916A1 (zh) | 2017-07-25 | 2018-07-11 | 翻译的方法、目标信息确定的方法及相关装置、存储介质 |
Country Status (6)
Country | Link |
---|---|
US (2) | US11928439B2 (zh) |
EP (1) | EP3660707A4 (zh) |
JP (1) | JP7025090B2 (zh) |
KR (1) | KR102382499B1 (zh) |
CN (1) | CN107368476B (zh) |
WO (1) | WO2019019916A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472255A (zh) * | 2019-08-20 | 2019-11-19 | 腾讯科技(深圳)有限公司 | 神经网络机器翻译方法、模型、电子终端以及存储介质 |
CN111209395A (zh) * | 2019-12-27 | 2020-05-29 | 铜陵中科汇联科技有限公司 | 一种短文本相似度计算系统及其训练方法 |
US11710003B2 (en) | 2018-02-26 | 2023-07-25 | Tencent Technology (Shenzhen) Company Limited | Information conversion method and apparatus, storage medium, and electronic device |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101328745B1 (ko) * | 2011-06-13 | 2013-11-20 | (주)우남케미칼 | 예비발포입자 제조 시스템 |
CN107368476B (zh) * | 2017-07-25 | 2020-11-03 | 深圳市腾讯计算机系统有限公司 | 一种翻译的方法、目标信息确定的方法及相关装置 |
CN108363763B (zh) * | 2018-02-05 | 2020-12-01 | 深圳市腾讯计算机系统有限公司 | 一种自动问答方法、装置和存储介质 |
CN108197123A (zh) * | 2018-02-07 | 2018-06-22 | 云南衍那科技有限公司 | 一种基于智能手表的云翻译系统和方法 |
CN110134971B (zh) | 2018-02-08 | 2022-12-16 | 腾讯科技(深圳)有限公司 | 一种机器翻译的方法、设备以及计算机可读存储介质 |
CN110489761B (zh) * | 2018-05-15 | 2021-02-02 | 科大讯飞股份有限公司 | 一种篇章级文本翻译方法及装置 |
CN108984539B (zh) * | 2018-07-17 | 2022-05-17 | 苏州大学 | 基于模拟未来时刻的翻译信息的神经机器翻译方法 |
CN109062897A (zh) * | 2018-07-26 | 2018-12-21 | 苏州大学 | 基于深度神经网络的句子对齐方法 |
CN109274814B (zh) * | 2018-08-20 | 2020-10-23 | 维沃移动通信有限公司 | 一种消息提示方法、装置及终端设备 |
CN109271646B (zh) * | 2018-09-04 | 2022-07-08 | 腾讯科技(深圳)有限公司 | 文本翻译方法、装置、可读存储介质和计算机设备 |
CN109146064B (zh) * | 2018-09-05 | 2023-07-25 | 腾讯科技(深圳)有限公司 | 神经网络训练方法、装置、计算机设备和存储介质 |
CN109446534B (zh) * | 2018-09-21 | 2020-07-31 | 清华大学 | 机器翻译方法及装置 |
CN111428516B (zh) * | 2018-11-19 | 2022-08-19 | 腾讯科技(深圳)有限公司 | 一种信息处理的方法以及装置 |
CN109543199B (zh) * | 2018-11-28 | 2022-06-10 | 腾讯科技(深圳)有限公司 | 一种文本翻译的方法以及相关装置 |
CN109858045B (zh) * | 2019-02-01 | 2020-07-10 | 北京字节跳动网络技术有限公司 | 机器翻译方法和装置 |
CN109933809B (zh) * | 2019-03-15 | 2023-09-15 | 北京金山数字娱乐科技有限公司 | 一种翻译方法及装置、翻译模型的训练方法及装置 |
CN111783435A (zh) * | 2019-03-18 | 2020-10-16 | 株式会社理光 | 共享词汇的选择方法、装置及存储介质 |
CN110110337B (zh) * | 2019-05-08 | 2023-04-18 | 网易有道信息技术(北京)有限公司 | 翻译模型训练方法、介质、装置和计算设备 |
CN111597829B (zh) * | 2020-05-19 | 2021-08-27 | 腾讯科技(深圳)有限公司 | 翻译方法和装置、存储介质和电子设备 |
US11741317B2 (en) * | 2020-05-25 | 2023-08-29 | Rajiv Trehan | Method and system for processing multilingual user inputs using single natural language processing model |
CN114254660A (zh) * | 2020-09-22 | 2022-03-29 | 北京三星通信技术研究有限公司 | 多模态翻译方法、装置、电子设备及计算机可读存储介质 |
KR20220056004A (ko) * | 2020-10-27 | 2022-05-04 | 삼성전자주식회사 | 전자 장치 및 이의 제어 방법 |
JPWO2022102364A1 (zh) * | 2020-11-13 | 2022-05-19 | ||
KR20230135990A (ko) * | 2022-03-17 | 2023-09-26 | 주식회사 아론티어 | 트랜스포머와 원자 환경을 이용한 역합성 번역 방법 및 이를 수행하기 위한 장치 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105068998A (zh) * | 2015-07-29 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | 基于神经网络模型的翻译方法及装置 |
US20160179790A1 (en) * | 2013-06-03 | 2016-06-23 | National Institute Of Information And Communications Technology | Translation apparatus, learning apparatus, translation method, and storage medium |
CN106126507A (zh) * | 2016-06-22 | 2016-11-16 | 哈尔滨工业大学深圳研究生院 | 一种基于字符编码的深度神经翻译方法及系统 |
CN106372058A (zh) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | 一种基于深度学习的短文本情感要素抽取方法及装置 |
CN106663092A (zh) * | 2014-10-24 | 2017-05-10 | 谷歌公司 | 具有罕见词处理的神经机器翻译系统 |
CN107368476A (zh) * | 2017-07-25 | 2017-11-21 | 深圳市腾讯计算机系统有限公司 | 一种翻译的方法、目标信息确定的方法及相关装置 |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2612404C (en) * | 2005-06-17 | 2014-05-27 | National Research Council Of Canada | Means and method for adapted language translation |
US9471565B2 (en) * | 2011-07-29 | 2016-10-18 | At&T Intellectual Property I, L.P. | System and method for locating bilingual web sites |
SG11201404225WA (en) * | 2012-01-27 | 2014-08-28 | Nec Corp | Term translation acquisition method and term translation acquisition apparatus |
JP6296592B2 (ja) * | 2013-05-29 | 2018-03-20 | 国立研究開発法人情報通信研究機構 | 翻訳語順情報出力装置、機械翻訳装置、学習装置、翻訳語順情報出力方法、学習方法、およびプログラム |
US9535960B2 (en) * | 2014-04-14 | 2017-01-03 | Microsoft Corporation | Context-sensitive search using a deep learning model |
US9846836B2 (en) * | 2014-06-13 | 2017-12-19 | Microsoft Technology Licensing, Llc | Modeling interestingness with deep neural networks |
US9778929B2 (en) * | 2015-05-29 | 2017-10-03 | Microsoft Technology Licensing, Llc | Automated efficient translation context delivery |
CN106484682B (zh) * | 2015-08-25 | 2019-06-25 | 阿里巴巴集团控股有限公司 | 基于统计的机器翻译方法、装置及电子设备 |
CN106484681B (zh) * | 2015-08-25 | 2019-07-09 | 阿里巴巴集团控股有限公司 | 一种生成候选译文的方法、装置及电子设备 |
CN106672058A (zh) * | 2015-11-08 | 2017-05-17 | 匡桃红 | 婴儿车无晃动收合关节 |
US9792534B2 (en) * | 2016-01-13 | 2017-10-17 | Adobe Systems Incorporated | Semantic natural language vector space |
US11449744B2 (en) * | 2016-06-23 | 2022-09-20 | Microsoft Technology Licensing, Llc | End-to-end memory networks for contextual language understanding |
US10706351B2 (en) * | 2016-08-30 | 2020-07-07 | American Software Safety Reliability Company | Recurrent encoder and decoder |
US20180197080A1 (en) * | 2017-01-11 | 2018-07-12 | International Business Machines Corporation | Learning apparatus and method for bidirectional learning of predictive model based on data sequence |
KR102637338B1 (ko) * | 2017-01-26 | 2024-02-16 | 삼성전자주식회사 | 번역 보정 방법 및 장치와 번역 시스템 |
US10839790B2 (en) * | 2017-02-06 | 2020-11-17 | Facebook, Inc. | Sequence-to-sequence convolutional architecture |
CN107092664B (zh) * | 2017-03-30 | 2020-04-28 | 华为技术有限公司 | 一种内容解释方法及装置 |
KR102329127B1 (ko) * | 2017-04-11 | 2021-11-22 | 삼성전자주식회사 | 방언을 표준어로 변환하는 방법 및 장치 |
US10733380B2 (en) * | 2017-05-15 | 2020-08-04 | Thomson Reuters Enterprise Center Gmbh | Neural paraphrase generator |
CN108959312B (zh) * | 2017-05-23 | 2021-01-29 | 华为技术有限公司 | 一种多文档摘要生成的方法、装置和终端 |
CN110309839B (zh) * | 2019-08-27 | 2019-12-03 | 北京金山数字娱乐科技有限公司 | 一种图像描述的方法及装置 |
-
2017
- 2017-07-25 CN CN201710612833.7A patent/CN107368476B/zh active Active
-
2018
- 2018-07-11 KR KR1020207002392A patent/KR102382499B1/ko active IP Right Grant
- 2018-07-11 WO PCT/CN2018/095231 patent/WO2019019916A1/zh unknown
- 2018-07-11 EP EP18837956.4A patent/EP3660707A4/en not_active Ceased
- 2018-07-11 JP JP2020503957A patent/JP7025090B2/ja active Active
-
2020
- 2020-01-22 US US16/749,243 patent/US11928439B2/en active Active
-
2023
- 2023-12-20 US US18/390,153 patent/US20240169166A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160179790A1 (en) * | 2013-06-03 | 2016-06-23 | National Institute Of Information And Communications Technology | Translation apparatus, learning apparatus, translation method, and storage medium |
CN106663092A (zh) * | 2014-10-24 | 2017-05-10 | 谷歌公司 | 具有罕见词处理的神经机器翻译系统 |
CN105068998A (zh) * | 2015-07-29 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | 基于神经网络模型的翻译方法及装置 |
CN106126507A (zh) * | 2016-06-22 | 2016-11-16 | 哈尔滨工业大学深圳研究生院 | 一种基于字符编码的深度神经翻译方法及系统 |
CN106372058A (zh) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | 一种基于深度学习的短文本情感要素抽取方法及装置 |
CN107368476A (zh) * | 2017-07-25 | 2017-11-21 | 深圳市腾讯计算机系统有限公司 | 一种翻译的方法、目标信息确定的方法及相关装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3660707A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11710003B2 (en) | 2018-02-26 | 2023-07-25 | Tencent Technology (Shenzhen) Company Limited | Information conversion method and apparatus, storage medium, and electronic device |
CN110472255A (zh) * | 2019-08-20 | 2019-11-19 | 腾讯科技(深圳)有限公司 | 神经网络机器翻译方法、模型、电子终端以及存储介质 |
CN110472255B (zh) * | 2019-08-20 | 2021-03-02 | 腾讯科技(深圳)有限公司 | 神经网络机器翻译方法、模型、电子终端以及存储介质 |
CN111209395A (zh) * | 2019-12-27 | 2020-05-29 | 铜陵中科汇联科技有限公司 | 一种短文本相似度计算系统及其训练方法 |
CN111209395B (zh) * | 2019-12-27 | 2022-11-11 | 铜陵中科汇联科技有限公司 | 一种短文本相似度计算系统及其训练方法 |
Also Published As
Publication number | Publication date |
---|---|
US11928439B2 (en) | 2024-03-12 |
JP2020528625A (ja) | 2020-09-24 |
EP3660707A1 (en) | 2020-06-03 |
CN107368476A (zh) | 2017-11-21 |
CN107368476B (zh) | 2020-11-03 |
KR20200019740A (ko) | 2020-02-24 |
JP7025090B2 (ja) | 2022-02-24 |
KR102382499B1 (ko) | 2022-04-01 |
EP3660707A4 (en) | 2020-08-05 |
US20200226328A1 (en) | 2020-07-16 |
US20240169166A1 (en) | 2024-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019019916A1 (zh) | 翻译的方法、目标信息确定的方法及相关装置、存储介质 | |
CN110309514B (zh) | 一种语义识别方法及装置 | |
CN107836000B (zh) | 用于语言建模和预测的改进的人工神经网络方法、电子设备 | |
WO2019154210A1 (zh) | 机器翻译的方法、设备以及计算机可读存储介质 | |
CN112487182A (zh) | 文本处理模型的训练方法、文本处理方法及装置 | |
CN113657399A (zh) | 文字识别模型的训练方法、文字识别方法及装置 | |
CN110162766B (zh) | 词向量更新方法和装置 | |
WO2018232699A1 (zh) | 一种信息处理的方法及相关装置 | |
CN110457661B (zh) | 自然语言生成方法、装置、设备及存储介质 | |
CN111144140B (zh) | 基于零次学习的中泰双语语料生成方法及装置 | |
US20220148239A1 (en) | Model training method and apparatus, font library establishment method and apparatus, device and storage medium | |
CN111738020B (zh) | 一种翻译模型的训练方法及装置 | |
WO2022100481A1 (zh) | 一种文本信息的翻译方法、装置、电子设备和存储介质 | |
US11710003B2 (en) | Information conversion method and apparatus, storage medium, and electronic device | |
US20230215203A1 (en) | Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium | |
CN110020440B (zh) | 一种机器翻译方法、装置、服务器及存储介质 | |
JP2023002690A (ja) | セマンティックス認識方法、装置、電子機器及び記憶媒体 | |
JP2023072022A (ja) | マルチモーダル表現モデルのトレーニング方法、クロスモーダル検索方法及び装置 | |
WO2020155769A1 (zh) | 关键词生成模型的建模方法和装置 | |
CN115312034A (zh) | 基于自动机和字典树处理语音信号的方法、装置和设备 | |
CN114580444A (zh) | 文本翻译模型的训练方法、设备及存储介质 | |
CN113947091A (zh) | 用于语言翻译的方法、设备、装置和介质 | |
CN111178097B (zh) | 基于多级翻译模型生成中泰双语语料的方法及装置 | |
CN117034951A (zh) | 基于大语言模型的具有特定语言风格的数字人 | |
CN108874786B (zh) | 机器翻译方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18837956 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 20207002392 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2020503957 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018837956 Country of ref document: EP Effective date: 20200225 |