WO2019019916A1 - Translation method, method for determining target information, related apparatus, and storage medium - Google Patents

Translation method, method for determining target information, related apparatus, and storage medium

Info

Publication number
WO2019019916A1
WO2019019916A1 (PCT/CN2018/095231, CN2018095231W)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
source
translation
translation vector
moment
Application number
PCT/CN2018/095231
Other languages
English (en)
French (fr)
Inventor
涂兆鹏
周浩
史树明
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority to JP2020503957A (JP7025090B2)
Priority to KR1020207002392A (KR102382499B1)
Priority to EP18837956.4A (EP3660707A4)
Publication of WO2019019916A1
Priority to US16/749,243 (US11928439B2)
Priority to US18/390,153 (US20240169166A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/42: Data-driven translation
    • G06F 40/44: Statistical methods, e.g. probability models
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Definitions

  • the embodiments of the present invention relate to the field of computer technologies, and in particular, to a translation method, a method for determining target information, a related device, and a storage medium.
  • Machine translation refers to the process of using a machine to convert text or speech from one language to another with the same meaning.
  • MT: machine translation
  • NMT: neural machine translation
  • Embodiments of the present invention provide a translation method, a method for determining target information, a related device, and a storage medium.
  • An aspect of the present invention provides a translation method applied to a neural machine translation (NMT) system, the method comprising: encoding text information to be processed with an encoder to obtain a source vector representation sequence, where the text information to be processed belongs to a first language; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and decoding the translation vector and the source context vector with a decoder to obtain target information at the first moment, where the target information belongs to a second language.
  • Another aspect of the present invention provides a method for determining target information, comprising: encoding text information to be processed to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and decoding the translation vector and the source context vector to obtain target information at the first moment.
  • A still further aspect of the present invention provides a target information determining apparatus including at least one memory and at least one processor, where the at least one memory stores at least one instruction module configured to be executed by the at least one processor. The at least one instruction module includes: an encoding module, configured to encode text information to be processed to obtain a source vector representation sequence; a first obtaining module, configured to obtain a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the text information at the first moment; a first determining module, configured to determine a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and a decoding module, configured to decode the translation vector and the source context vector to obtain the target information at the first moment.
  • A still further aspect of the present invention provides a target information determining apparatus including a memory, a processor, and a bus system, where the memory is used to store a program and the processor is configured to execute the program in the memory, including the following steps: encoding the text information to be processed to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector includes a first translation vector and/or a second translation vector, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and decoding the translation vector and the source context vector to obtain the target information at the first moment.
  • Yet another aspect of the present invention provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • A still further aspect of the present invention provides a method for determining target information, performed by an electronic device, the method comprising: encoding text information to be processed to obtain a source vector representation sequence; obtaining a source context vector corresponding to a first moment according to the source vector representation sequence, where the source context vector corresponding to the first moment indicates the source content to be processed in the text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, where the translation vector comprises a first translation vector and/or a second translation vector, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and decoding the translation vector and the source context vector to obtain the target information at the first moment.
  • FIG. 1 is a block diagram of an apparatus for determining target information in an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method for determining target information in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an embodiment of a method for determining target information according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a gated recurrent unit (GRU) according to an embodiment of the present invention.
  • FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
  • FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an embodiment of an enhanced attention module according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an embodiment of translating the first source content of a source vector representation sequence according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an embodiment of translating the second source content of a source vector representation sequence according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of an embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 15 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 17 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 18 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 19 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention.
  • FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
  • The embodiment of the invention provides a translation method, a method for determining target information, and related devices, which can model the untranslated source content and/or the translated source content in the source vector representation sequence. That is, this content is stripped out of the original language model and trained separately, thereby reducing the difficulty of training the decoder model and improving the translation quality of the translation system.
  • The embodiments involve an encoder-decoder framework: so-called encoding converts an input sequence into a fixed-length vector, and so-called decoding converts the vector sequence generated by the encoder into an output sequence.
  • the encoder-decoder model has many applications, such as translation, document extraction and question answering systems.
  • For translation, the input sequence is the text to be translated, and the output sequence is the translated text; for a question answering system, the input sequence is the question raised, and the output sequence is the answer.
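  • As a toy illustration of the encoding half only, the following minimal numpy sketch maps a source sentence to a sequence of source vector representations; the plain tanh recurrence, random parameters, and all names here are this sketch's assumptions (real encoders are typically bidirectional RNNs), not the patent's specification:

```python
import numpy as np

def encode(source_embeddings: np.ndarray) -> np.ndarray:
    """Encode a source sentence (length x d_emb) into a source vector
    representation sequence (length x d_hid) with a toy tanh recurrence."""
    d_hid = 4
    rng = np.random.default_rng(0)
    W = rng.standard_normal((d_hid, source_embeddings.shape[1])) * 0.1
    U = rng.standard_normal((d_hid, d_hid)) * 0.1
    h = np.zeros(d_hid)
    states = []
    for x in source_embeddings:
        h = np.tanh(W @ x + U @ h)   # one recurrence step per source word
        states.append(h)
    return np.stack(states)

source = np.random.randn(6, 8)       # 6 source words, embedding size 8
source_states = encode(source)       # the "source vector representation sequence"
print(source_states.shape)           # (6, 4)
```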
  • CNN: convolutional neural network; RNN: recurrent neural network; GRU: gated recurrent unit; LSTM: long short-term memory network; BiRNN: bidirectional recurrent neural network.
  • FIG. 1 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
  • In the embodiment of the present invention, two hidden layers are additionally introduced in the decoder, each of which can be represented by a sequence of vectors.
  • One hidden layer holds the second translation vector corresponding to moment t-1; the second translation vector corresponds to the source content that has already been translated, that is, the past translation vector. The other holds the first translation vector corresponding to moment t; the first translation vector corresponds to the source content that has not yet been translated, that is, the future translation vector.
  • c_t denotes the source context vector corresponding to moment t, and s_t denotes the decoder state at moment t.
  • The present invention directly models, at the semantic level, the past translation (translated content) and the future translation (untranslated content), separating this content from the decoder state so that the neural network translation system can store and exploit it, thereby improving the translation system.
  • the method provided by the present invention can be used in a mainstream neural network machine translation system.
  • FIG. 2 is a schematic flowchart of a method for determining target information according to an embodiment of the present invention.
  • In step 101, the encoder module S1 reads in the sentence to be processed and outputs a source vector representation sequence.
  • Then the following steps are repeated by the attention module S2, the past-future module S3, and the decoder module S4 until the entire translation is generated:
  • In step 102, the attention module S2 reads in the past translation vector and the future translation vector of moment t-1. The initial past translation vector is an all-zero vector, indicating that no source content has been translated; the initial future translation vector is the last vector of the source vector representation sequence, representing a summary of the source sentence.
  • In step 103, the attention module S2 outputs the source context vector of the current moment, that is, moment t.
  • The past-future module S3 reads the source context vector of the current moment and, in step 104, updates the past translation vector and the future translation vector for moment t.
  • In step 105, the decoder module S4 reads the future translation vector at moment t, the past translation vector at moment t-1, the source context vector at moment t, and other standard inputs, and generates the target word at moment t (a toy end-to-end sketch of this loop follows).
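  • The following minimal, runnable numpy sketch mirrors only the data flow of steps 101 to 105; the dot-product attention, the tanh stand-in for RNN(), the random parameters, and all variable names are this sketch's assumptions, not the patent's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 4, 10                              # hidden size and toy target vocabulary size
Wo = rng.standard_normal((V, D)) * 0.1    # toy output layer (illustrative)

def attention(s, H):
    """Step 103: toy dot-product attention returning the source context c_t."""
    e = H @ s
    a = np.exp(e - e.max()); a /= a.sum() # softmax over source positions
    return a @ H

def rnn(state, x):
    """Stand-in for the RNN()/GRU update used by the past-future module."""
    return np.tanh(state + x)

H = rng.standard_normal((6, D))           # source vector representation sequence (step 101)
past, future, s = np.zeros(D), H[-1], np.zeros(D)  # step 102 initialisation
for t in range(1, 6):
    c = attention(s, H)                   # step 103: source context vector at moment t
    future = rnn(future, -c)              # step 104: future vector loses the translated content
    past = rnn(past, c)                   #           past vector gains the translated content
    s = np.tanh(s + c + past + future)    # step 105: update decoder state ...
    y = int(np.argmax(Wo @ s))            # ... and emit the most likely target word id
    print(t, y)
```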
  • the present invention can be applied to an NMT system.
  • the following describes a method for translation provided by the present invention.
  • One embodiment of the method for translation in the embodiment of the present invention includes:
  • The encoder first encodes the text information to be processed to obtain a source vector representation sequence, where the text information to be processed belongs to the first language, such as Chinese; it can be understood that in practical applications it may be another type of language.
  • The encoding process is specifically: the text information to be processed is input into the encoder of the NMT system, the encoder encodes it, and the source vector representation sequence is obtained from the result of the encoding, where each source vector in the sequence corresponds to source content of the first language.
  • the text information to be processed can be a Chinese sentence, which contains several phrases.
  • After the source vector representation sequence is obtained, the source context vector corresponding to the current moment, that is, the first moment, is acquired, where the source context vector represents the source content to be processed.
  • the source content can be a certain word in this Chinese sentence.
  • the source side context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
  • The NMT system determines a first translation vector and/or a second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment: if the first moment is moment t, the second moment is moment t-1.
  • The first translation vector and/or the second translation vector may be referred to as the translation vector; that is, the translation vector may be the first translation vector, the second translation vector, or the first and second translation vectors together. The first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, and the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment.
  • Assume the source content corresponding to the source vector representation sequence is "up to 1,300 flights per month to all parts of the world", and the words corresponding to the source vectors are "monthly", "to", "worldwide", "of", "flight", "as many as" and "1300".
  • The future translation vector can then be understood as the vector corresponding to the not-yet-translated words "worldwide", "of", "flight", "as many as" and "1300", and the past translation vector as the vector corresponding to the already-translated words "monthly" and "to".
  • the first translation vector and/or the second translation vector and the source side context vector are decoded by the decoder in the NMT system to obtain target information at the first moment, wherein the target information belongs to the second language.
  • The second language is a language different from the first language, such as English, French, or Japanese; it is not limited here.
  • The output target information can be "all parts of the world"; that is, the first language is Chinese and the second language is English. This completes the process of machine translation.
  • In the embodiment of the present invention, a translation method is provided which can model the untranslated source content and/or the translated source content in the source vector representation sequence; that is, this content is stripped out of the original language model and trained separately, which reduces the difficulty of training the decoder model and improves the translation quality of the translation system.
  • FIG. 3 is an embodiment of a method for determining target information according to an embodiment of the present invention, including:
  • First, the encoder in the target information determining apparatus encodes the text information to be processed, where the text information to be processed may be a sentence to be translated, for example, "multiple airports are forced to close."; the sentence is encoded, and the source vector representation sequence is obtained.
  • Each vector in the source vector representation sequence corresponds to one source content (source word); for example, in the sentence "multiple airports are forced to close.", the source contents are "multiple", "airport", "being", "closed", "." and "<eos>".
  • the decoder in the target information determining means generates the translation word by word.
  • the source side context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
  • The target information determining apparatus may obtain the source context vector corresponding to the first moment according to the source vector representation sequence, where the first moment is moment t in the embodiment of the present invention, and the source context vector represents the source content to be processed.
  • The target information determining apparatus outputs an alignment probability for each source content, such as 0.0 or 0.2; the alignment probabilities over the source vector representation sequence sum to 1, and the larger the alignment probability, the more relevant the source content is to the target information to be generated. The source context vector at moment t is then generated by weighting the semantic vectors with the alignment probabilities, as illustrated below.
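  • A minimal numpy sketch of this weighted sum (the probabilities and semantic vectors here are made up for illustration; the names alpha_t and X are not from the patent):

```python
import numpy as np

alpha_t = np.array([0.0, 0.2, 0.5, 0.2, 0.1])  # alignment probabilities, one per source content
X = np.random.randn(5, 4)                      # semantic vectors of the 5 source contents
assert abs(alpha_t.sum() - 1.0) < 1e-9         # the distribution sums to 1
c_t = alpha_t @ X                              # source context vector at moment t
```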
  • A first translation vector and/or a second translation vector are determined according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment. The first translation vector and/or the second translation vector may be referred to as the translation vector.
  • Specifically, the target information determining apparatus may determine the first translation vector according to the source vector representation sequence and the source context vector, determine the second translation vector according to them, or determine both the first and the second translation vector. Here the first translation vector represents the future translation vector at moment t, and the second translation vector represents the past translation vector at moment t-1.
  • The decoder in the target information determining apparatus uses a neural network output layer to decode the first translation vector and the source context vector to obtain the target information at the first moment; alternatively, the second translation vector and the source context vector are decoded, or the first translation vector, the second translation vector, and the source context vector are all decoded together to obtain the target information at the first moment.
  • A plurality of candidate pieces of information may be generated, and the word with the highest similarity is finally output as the target information. For example, in "multiple airports are forced to close.", "multiple" could be translated as "many" or "much", but from the semantic knowledge stored in the decoder state vector it is known that "many" is used before countable nouns; therefore "multiple" is finally translated as "many".
  • a method for determining target information is provided.
  • In the embodiment of the present invention, the target information determining apparatus encodes the text information to be processed to obtain a source vector representation sequence, and then obtains the source context vector corresponding to the first moment according to the source vector representation sequence, where the source context vector represents the source content to be processed. A first translation vector and/or a second translation vector are determined according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment. Finally, the target information determining apparatus decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information at the first moment. In this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled; that is, this content is stripped out of the original language model and trained separately, thereby reducing the difficulty of training the decoder model and improving the translation quality of the translation system.
  • Optionally, determining the first translation vector according to the source vector representation sequence and the source context vector may include: acquiring the third translation vector corresponding to the second moment according to the source vector representation sequence, and processing the third translation vector and the source context vector with a preset neural network model to obtain the first translation vector, where the third translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment.
  • Specifically, the target information determining apparatus needs to read the source context vector of the first moment (representing the source content being translated at the first moment) and then update the stored future translation vector. The future translation vector is initialized to a summary of the source sentence (usually the last vector of the source vector representation sequence), indicating that at the start no source content has been translated. At each moment, the update is as follows:
  • s^F_t = RNN(s^F_{t-1}, c_t), where s^F_t denotes the future translation vector at moment t (that is, the first translation vector), s^F_{t-1} denotes the future translation vector at moment t-1 (that is, the third translation vector), c_t denotes the source context vector at moment t, and RNN() denotes computation with the RNN model.
  • Here, RNN is only an example of the preset neural network model; the preset neural network model may also be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
  • In other words, the preset neural network model processes the third translation vector and the source context vector to obtain the first translation vector. By outputting the first translation vector with the preset neural network model in this way, the accuracy of the future translation vector can be improved (a minimal sketch follows).
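  • A minimal numpy sketch of the update s^F_t = RNN(s^F_{t-1}, c_t), with a plain tanh recurrence standing in for the unspecified preset neural network model; the names and toy sizes are this sketch's assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
U = rng.standard_normal((d, d)) * 0.1
W = rng.standard_normal((d, d)) * 0.1

def future_update(s_future_prev, c_t):
    """Third translation vector and source context vector in,
    first translation vector out: s^F_t = RNN(s^F_{t-1}, c_t)."""
    return np.tanh(U @ s_future_prev + W @ c_t)

s_future = rng.standard_normal(d)   # initialised to a summary of the source sentence
c_t = rng.standard_normal(d)        # source context vector at moment t
s_future = future_update(s_future, c_t)
```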
  • Optionally, determining the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector may include: acquiring the third translation vector corresponding to the second moment according to the source vector representation sequence; processing the third translation vector and the source context vector with the preset neural network model to obtain the first translation vector; and acquiring the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model, and the fourth translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the first moment.
  • Assuming the first moment is moment t and the second moment is moment t-1, the source context vector c_t of the first moment (that is, the source semantic content being translated) is also used to update the past translation vector and the future translation vector. The updates are as follows:
  • s^F_t = RNN(s^F_{t-1}, c_t) and s^P_t = RNN(s^P_{t-1}, c_t), where s^F_{t-1} denotes the future translation vector at moment t-1 (that is, the third translation vector), s^P_t denotes the past translation vector at moment t (that is, the fourth translation vector), s^P_{t-1} denotes the past translation vector at moment t-1 (that is, the second translation vector), c_t denotes the source context vector at moment t, and RNN() denotes computation with the RNN model.
  • As before, RNN is only an example of the preset neural network model; the preset neural network model may also be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
  • In the above manner, the third translation vector and the source context vector are processed with the preset neural network model to obtain the first translation vector, and the second translation vector can also be obtained according to the position of the source context vector in the source vector representation sequence.
  • Optionally, determining the second translation vector according to the source vector representation sequence and the source context vector may include: acquiring the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate the fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model. That is, the target information determining apparatus may acquire the second translation vector of the second moment according to the position of the source context vector of the first moment in the source vector representation sequence.
  • Specifically, the target information determining apparatus needs to read the source context vector at moment t-2 and the past translation vector at moment t-2; these are then processed by the preset neural network model to obtain the past translation vector at moment t-1. The past translation vector is initialized to an all-zero vector, indicating that at the beginning no source content has been translated. At each moment, the update is as follows:
  • s^P_t = RNN(s^P_{t-1}, c_t), where s^P_t denotes the past translation vector at moment t (that is, the fourth translation vector), s^P_{t-1} denotes the past translation vector at moment t-1 (that is, the second translation vector), c_t denotes the source context vector at moment t, and RNN() denotes computation with the RNN model.
  • Again, RNN is only an example of the preset neural network model; the preset neural network model may also be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
  • The above describes how to determine the second translation vector according to the source vector representation sequence and the source context vector, that is, the second translation vector is obtained according to the position at which the source context vector appears in the source vector representation sequence. The second translation vector is used to generate the fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model. By outputting the second translation vector with the preset neural network model, the accuracy of the past translation vector can be improved (a minimal sketch follows).
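  • A companion numpy sketch for the past side, with the all-zero initialization and the update s^P_t = RNN(s^P_{t-1}, c_t); again a toy tanh recurrence stands in for the preset model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Up = rng.standard_normal((d, d)) * 0.1
Wp = rng.standard_normal((d, d)) * 0.1

s_past = np.zeros(d)                     # initialised to all zeros: nothing translated yet
for c_t in rng.standard_normal((3, d)):  # three decoding moments
    s_past = np.tanh(Up @ s_past + Wp @ c_t)  # s^P_t = RNN(s^P_{t-1}, c_t)
```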
  • Optionally, processing the third translation vector and the source context vector to obtain the first translation vector may include: subtracting the source context vector from the third translation vector using a gated recurrent unit (GRU) to obtain the first translation vector.
  • Specifically, assuming the first moment is moment t and the second moment is moment t-1, the source context vector c_t of the first moment is subtracted from the third translation vector, that is, from the vector corresponding to the source content not yet translated up to the second moment.
  • The present invention can be applied to a variety of RNN structures; here the mainstream GRU is taken as an example.
  • FIG. 4 is a schematic structural diagram of a gated recurrent unit (GRU) according to an embodiment of the present invention.
  • A "decremental" model can be established, in which the parameters of the GRU are expected to automatically learn a rule of the form s^F_t ≈ s^F_{t-1} − c_t, that is, at each moment the future translation vector decreases by the source content just translated. The GRU computation can be written as u_t = σ(U_u s^F_{t-1} + W_u c_t), r_t = σ(U_r s^F_{t-1} + W_r c_t), s̃^F_t = tanh(U (r_t ⊙ s^F_{t-1}) + W c_t), and s^F_t = (1 − u_t) ⊙ s^F_{t-1} + u_t ⊙ s̃^F_t, where U, W, U_r, W_r, U_u and W_u are function-related parameters that are trained jointly with the other parameters of the neural network translation system (a runnable sketch of this cell follows).
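  • For illustration, a minimal numpy sketch of such a GRU cell, written with the parameter names above; the toy dimensions and random initialization are this sketch's assumptions, since the patent does not specify sizes or initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
U,   W   = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
U_r, W_r = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
U_u, W_u = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(s_prev, c_t):
    u = sigmoid(U_u @ s_prev + W_u @ c_t)         # update gate
    r = sigmoid(U_r @ s_prev + W_r @ c_t)         # reset gate
    s_cand = np.tanh(U @ (r * s_prev) + W @ c_t)  # candidate state
    return (1.0 - u) * s_prev + u * s_cand        # gated interpolation

s_future = gru_step(rng.standard_normal(d), rng.standard_normal(d))
```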
  • FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
  • As shown in FIG. 5, the source context vector c_t of the current moment is subtracted from the future translation vector of the second moment (that is, moment t-1), where the future translation vector of the second moment is the third translation vector; the required first translation vector is obtained and passed into the GRU structure. In other words, the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, which is then transmitted into the GRU structure. In this way, an explicit decrementing signal is given in the GRU, which helps it learn this rule and thereby improves the accuracy of model training.
  • Optionally, processing the third translation vector and the source context vector to obtain the first translation vector may include: processing the third translation vector and the source context vector with the GRU to obtain an intermediate vector, and interpolating the intermediate vector with the third translation vector to obtain the first translation vector.
  • FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention.
  • Assuming the first moment is moment t and the second moment is moment t-1, the parameters of the GRU can automatically learn the following rule, with the subtraction performed inside the GRU to obtain an intermediate vector: s̃^F_t = tanh(U (r_t ⊙ s^F_{t-1}) − W c_t), where s^F_{t-1} denotes the future translation vector at moment t-1 (that is, the third translation vector) and c_t denotes the source context vector at the first moment. The intermediate vector s̃^F_t and the third translation vector at moment t-1 are then interpolated and combined, for example as s^F_t = (1 − u_t) ⊙ s^F_{t-1} + u_t ⊙ s̃^F_t, to obtain the final first translation vector.
  • In this way, the past translation vector and the future translation vector can be obtained at each moment: the past translation vector represents the source content already translated up to moment t, and the future translation vector represents the source content not yet translated up to moment t.
  • That is, the target information determining apparatus processes the third translation vector and the source context vector with the preset neural network model by first processing them with the GRU to obtain an intermediate vector, and then interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector. Performing the subtraction operation inside the GRU in this way helps improve the accuracy of the operation and increases its efficiency (see the sketch below).
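  • A hedged numpy sketch of this "subtraction inside the GRU" variant; the exact gate placement and the interpolation weights are this sketch's assumptions, not the patent's specification:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
U,   W   = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
U_r, W_r = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
U_u, W_u = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_inside_subtract(s_future_prev, c_t):
    u = sigmoid(U_u @ s_future_prev + W_u @ c_t)        # update gate
    r = sigmoid(U_r @ s_future_prev + W_r @ c_t)        # reset gate
    # intermediate vector: the translated content c_t is subtracted inside the cell
    s_mid = np.tanh(U @ (r * s_future_prev) - W @ c_t)
    # interpolate the intermediate vector with the third translation vector
    return (1.0 - u) * s_future_prev + u * s_mid

s_future = gru_inside_subtract(rng.standard_normal(d), rng.standard_normal(d))
```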
  • Optionally, obtaining the source context vector corresponding to the first moment according to the source vector representation sequence may include: determining the alignment probability of the source content according to the decoder state at the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and determining the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
  • FIG. 7 is a schematic diagram of an embodiment of the enhanced attention module according to an embodiment of the present invention. Specifically, assuming the first moment is moment t and the second moment is moment t-1, the target information determining apparatus, which includes an encoder and a decoder, determines the alignment probability α_{t,i} of the source content according to the decoder state s_{t-1} at the second moment, the second translation vector s^P_{t-1}, the third translation vector s^F_{t-1}, and the vector h_i of the source content in the source vector representation sequence. The alignment probability α_{t,i} is calculated by the following formula: α_{t,i} = softmax(a(s_{t-1}, h_i, s^P_{t-1}, s^F_{t-1})).
  • ⁇ t,i refers to the alignment probability distribution of each source content output by the attention mechanism
  • the sum of the alignment probability distributions is 1
  • h i is the vector representation of the ith source content of the input sentence in the encoder
  • softmax () represents the normalization operation
  • the value input by the neural network is usually a negative and positive value, so usually it is first converted to a positive value using its index value, and then all the index values are normalized. To get the probability distribution.
  • a() is the operation of the attention module.
  • Then the semantic vectors x_i of the corresponding source contents are weighted by the alignment probabilities and summed to obtain the source context vector corresponding to the first moment, c_t = Σ_i α_{t,i} x_i.
  • ⁇ t,1 at the first moment is 0.5
  • ⁇ t,2 is 0.3
  • ⁇ t,3 is 0.2
  • x 1 is 2
  • x 2 is 4, and
  • x 3 is 6, so the source end corresponding to the first moment
  • the context vector c t is calculated as:
  • It should be noted that the alignment probability α_{t,i} can also be calculated in other ways, which are not limited here.
  • In the above manner, the decoder state at the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence are used to determine the alignment probability of the source content, and the source context vector corresponding to the first moment is then determined according to the alignment probability of the source content and the semantic vector of the source content. In this way, the attention module in the target information determining apparatus can be made aware of which source content has been translated and which has not, so that more attention is placed on the untranslated content and less on the translated content, which alleviates missing translations and duplicate translations (a toy sketch of this enhanced attention follows).
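  • A minimal numpy sketch of the enhanced attention score; modelling a() as a small MLP over the concatenated inputs is this sketch's assumption, since the patent does not fix the form of a():

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
Wa = rng.standard_normal((d, 4 * d)) * 0.1   # parameters of a(), modelled as a small MLP
va = rng.standard_normal(d) * 0.1

def alignment_probs(s_prev, H, s_past_prev, s_future_prev):
    """alpha_{t,i} = softmax(a(s_{t-1}, h_i, s^P_{t-1}, s^F_{t-1}))."""
    e = np.array([va @ np.tanh(Wa @ np.concatenate([s_prev, h, s_past_prev, s_future_prev]))
                  for h in H])
    e = np.exp(e - e.max())
    return e / e.sum()                       # probabilities over source positions, sum to 1

H = rng.standard_normal((6, d))              # source vector representation sequence
alpha = alignment_probs(np.zeros(d), H, np.zeros(d), H[-1])
c_t = alpha @ H                              # weighted sum gives the source context vector
```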
  • Optionally, on the basis of the foregoing embodiments, before the first translation vector and/or the second translation vector and the source context vector are decoded to obtain the target information at the first moment, the method further includes: determining the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source context vector, the first translation vector, and the second translation vector.
  • Correspondingly, decoding the first translation vector and/or the second translation vector and the source context vector to obtain the target information at the first moment may include: decoding the decoder state at the first moment, the source context vector, and the first translation vector and/or the second translation vector to obtain the target information at the first moment.
  • Specifically, before obtaining the target information at the first moment, the target information determining apparatus first determines the decoder state at the first moment according to the decoder state at the second moment, the target information at the second moment, the source context vector, the first translation vector, and the second translation vector, where the first moment is moment t (the current moment) and the second moment is moment t-1 (the previous moment).
  • FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention.
  • Here the decoder state at the second moment is s_{t-1}, the target information at the second moment is y_{t-1}, the source context vector is c_t, the first translation vector is s^F_t, and the second translation vector is s^P_{t-1}. The decoder state s_t at the first moment can then be calculated with the formula s_t = f(s_{t-1}, y_{t-1}, c_t, s^P_{t-1}, s^F_t), where f() denotes the activation function that updates the decoder state; it is a standard component of the neural network translation model, and its inputs can be changed flexibly according to actual needs.
  • In the embodiment of the present invention, the decoder state at the first moment is first determined according to the decoder state at the second moment, the target information at the second moment, the source context vector, the first translation vector, and the second translation vector; the decoder state at the first moment, the source context vector, and the first translation vector and/or the second translation vector are then decoded to obtain the target information at the first moment. The first translation vector and/or the second translation vector are modeled independently of the decoder state, and together with the source context vector output by the attention module at the first moment they can form a complete source semantic representation that is passed to the decoder to generate more accurate target information (a toy sketch of the state update follows).
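  • A minimal numpy sketch of the state update s_t = f(s_{t-1}, y_{t-1}, c_t, s^P_{t-1}, s^F_t); modelling f() as a single tanh layer over the concatenated inputs is this sketch's assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
Wf = rng.standard_normal((d, 5 * d)) * 0.1   # parameters of f(), a toy tanh layer here

def update_decoder_state(s_prev, y_prev_emb, c_t, s_past_prev, s_future_t):
    """s_t = f(s_{t-1}, y_{t-1}, c_t, s^P_{t-1}, s^F_t)."""
    z = np.concatenate([s_prev, y_prev_emb, c_t, s_past_prev, s_future_t])
    return np.tanh(Wf @ z)

s_t = update_decoder_state(*(rng.standard_normal(d) for _ in range(5)))
```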
  • Optionally, on the basis of the foregoing embodiments, the method may further include: obtaining a first indicator expectation value according to the first translation vector and the third translation vector; obtaining a second indicator expectation value according to the second translation vector and the fourth translation vector; and determining the training target according to the first indicator expectation value and the second indicator expectation value, where the training target is used to construct the preset neural network model.
  • In this embodiment, a way of adding a training target is also provided; with the added training target, the preset neural network model can be trained better.
  • The training of the future translation vector is introduced as an example below; it can be understood that the past translation vector is trained in a similar way, which is not described again here. Taking the future translation vector as an example, the relation s^F_{t-1} − s^F_t ≈ c_t should hold as far as possible; that is, the information difference between the translation vectors at adjacent moments should be substantially the same as the source content translated at that moment, so as to satisfy the modeling of the future translation vector.
  • The first indicator expectation value can be calculated as follows: the update of the future translation vector between adjacent moments, s^F_{t-1} − s^F_t, is compared with E(y_t), where y_t is the target information and E(y_t) is its vector representation. The first indicator expectation value indicates the extent to which the future translation vector behaves as expected (for example, the amount updated and the source content translated at the moment are substantially equal); the higher the expectation value, the better the expectation is met.
  • Similarly, the second indicator expectation value can be calculated from the second translation vector s^P_{t-1} and the fourth translation vector s^P_t, so as to obtain the expected value of the second indicator for the past translation vector.
  • The training target can then be calculated as follows: J(θ, γ) = argmax over θ and γ of (likelihood + first indicator expectation value + second indicator expectation value), where J(θ, γ) is a general representation of the training target and denotes the parameters obtained through training, θ denotes the parameters of the NMT system, and γ denotes the parameters of the newly introduced past and future modules. The likelihood term is the training target of a standard neural network translation model, that is, maximizing the probability of generating each piece of target information, which may also be expressed as maximizing the likelihood score of the generated target words.
  • Further, in the embodiment of the present invention, the first indicator expectation value is obtained according to the first translation vector and the third translation vector, the second indicator expectation value is obtained according to the second translation vector and the fourth translation vector, and the training target is then determined according to the first indicator expectation value and the second indicator expectation value, where the training target is used to construct the preset neural network model. In this way, the preset neural network model can be trained better (a toy sketch of the combined target follows).
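  • A hedged Python sketch of combining the three terms of the training target; the per-step inputs, their log-space form, and the equal weighting are all this sketch's assumptions, not the patent's formula:

```python
import numpy as np

def training_objective(log_likelihoods, future_scores, past_scores):
    """Combine the standard likelihood target with the first and second
    indicator expectation values (all given per decoding step, in log space)."""
    return (np.sum(log_likelihoods)   # standard NMT target: likelihood of the target words
            + np.sum(future_scores)   # first indicator expectation value (future vectors)
            + np.sum(past_scores))    # second indicator expectation value (past vectors)

# toy usage: three decoding steps
J = training_objective([-1.2, -0.7, -0.3], [-0.5, -0.4, -0.2], [-0.6, -0.3, -0.1])
```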
  • For ease of understanding, FIG. 9 is a schematic diagram of an embodiment of translating the first source content of the source vector representation sequence in an application scenario of the present invention; specifically:
  • The encoder reads in the input sentence "Multiple airports are forced to close. <eos>", where <eos> denotes the sentence terminator, and outputs the source vector representation sequence, in which each vector (that is, each vertical column of dots in FIG. 9) corresponds to one source content. Based on this source vector representation sequence, the decoder generates the translation.
  • The alignment probabilities and the semantic vectors are weighted to generate the source context vector c_1 at the first moment, where the alignment probabilities in FIG. 9 are 0.5, 0.2, 0.2, 0.1, 0.0 and 0.0.
  • The past translation vector and the future translation vector are then updated, that is, s^P_1 = RNN(s^P_0, c_1) and s^F_1 = RNN(s^F_0, c_1), where s^P_1 denotes the past translation vector at the first moment, s^P_0 the past translation vector at the initial moment, s^F_1 the future translation vector at the first moment, and s^F_0 the future translation vector at the initial moment.
  • The decoder decodes c_1, the updated translation vectors, and the decoder state s_0 at the initial moment, so that the decoder state can be updated to s_1 for the first moment; then, according to s_0 and c_1, a neural network output layer is used over all target-side words, and the word with the highest similarity is selected as the target information y_1, where y_1 is "many", the translation of "multiple".
  • FIG. 10 is a schematic diagram of an embodiment of translating the second source content of the source vector representation sequence in an application scenario of the present invention. The alignment probabilities and the semantic vectors are weighted to generate the source context vector c_2 at the second moment, where the alignment probabilities in FIG. 10 are 0.3, 0.6, 0.1, 0.0, 0.0 and 0.0.
  • The past translation vector and the future translation vector are updated, that is, s^P_2 = RNN(s^P_1, c_2) and s^F_2 = RNN(s^F_1, c_2), where s^P_2 denotes the past translation vector at the second moment, s^P_1 the past translation vector at the first moment, s^F_2 the future translation vector at the second moment, and s^F_1 the future translation vector at the first moment.
  • The decoder decodes c_2, the updated translation vectors, and the decoder state s_1 at the first moment, so that the decoder state can be updated to s_2 for the second moment; then, according to s_1, c_2 and the previously generated target information y_1, a neural network output layer is used over all target-side words, and the word with the highest similarity is selected as the target information y_2, where y_2 is "airports", the translation of "airport".
  • Referring to FIG. 11, the target information determining apparatus 30 in the embodiment of the present invention includes at least one memory and at least one processor, where the at least one memory stores at least one instruction module configured to be executed by the at least one processor. The at least one instruction module includes: an encoding module 301, configured to encode the text information to be processed to obtain a source vector representation sequence; a first obtaining module 302, configured to obtain the source context vector corresponding to the first moment according to the source vector representation sequence obtained by the encoding module 301, where the source context vector represents the source content to be processed; a first determining module 303, configured to determine a first translation vector and/or a second translation vector according to the source vector representation sequence obtained by the encoding module 301 and the source context vector obtained by the first obtaining module 302, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and a decoding module 304, configured to decode the first translation vector and/or the second translation vector determined by the first determining module 303 and the source context vector to obtain the target information at the first moment.
  • In this embodiment, the encoding module 301 encodes the text information to be processed to obtain a source vector representation sequence; the first obtaining module 302 obtains the source context vector corresponding to the first moment according to the source vector representation sequence obtained by the encoding module 301, where the source context vector represents the source content to be processed; the first determining module 303 determines the first translation vector and/or the second translation vector according to the source vector representation sequence obtained by the encoding module 301 and the source context vector obtained by the first obtaining module 302, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment; and the decoding module 304 decodes the first translation vector and/or the second translation vector determined by the first determining module 303 and the source context vector to obtain the target information at the first moment.
  • In the embodiment of the present invention, the target information determining apparatus encodes the text information to be processed to obtain a source vector representation sequence, obtains the source context vector corresponding to the first moment according to the source vector representation sequence (where the source context vector represents the source content to be processed), and determines the first translation vector and/or the second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment adjacent to and before the first moment. Finally, the target information determining apparatus decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information at the first moment. In this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled; that is, this content is stripped out of the original language model and trained separately, thereby reducing the difficulty of training the decoder model and improving the translation quality of the translation system.
  • Optionally, the first determining module 303 includes: a first acquiring unit 3031, configured to acquire the third translation vector corresponding to the second moment according to the source vector representation sequence; and a first processing unit 3032, configured to process the third translation vector acquired by the first acquiring unit 3031 and the source context vector with the preset neural network model to obtain the first translation vector. In other words, the preset neural network model processes the third translation vector and the source context vector to obtain the first translation vector; outputting the first translation vector with the preset neural network model in this way can improve the accuracy of the future translation vector.
  • Optionally, the first determining module 303 includes: a second acquiring unit 3033, configured to acquire the third translation vector corresponding to the second moment according to the source vector representation sequence; a second processing unit 3034, configured to process the third translation vector acquired by the second acquiring unit 3033 and the source context vector with the preset neural network model to obtain the first translation vector; and a third acquiring unit 3035, configured to acquire the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update the fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model. In this way, the third translation vector and the source context vector are processed with the preset neural network model to obtain the first translation vector, and the second translation vector can also be obtained according to the position of the source context vector in the source vector representation sequence.
  • Optionally, the first determining module 303 includes a fourth acquiring unit 3036, configured to acquire the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate the fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model. The above describes how to determine the second translation vector according to the source vector representation sequence and the source context vector, that is, the second translation vector is obtained according to the position at which the source context vector appears in the source vector representation sequence; outputting the second translation vector with the preset neural network model can improve the accuracy of the past translation vector.
  • Optionally, the first processing unit 3032 includes a subtraction subunit 30321, configured to subtract the source context vector from the third translation vector using the gated recurrent unit (GRU) to obtain the first translation vector. In other words, the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, which is then transmitted into the GRU structure. In this way, an explicit decrementing signal is given inside the GRU, which helps it learn the rule and thereby improves the accuracy of model training.
  • Optionally, the first processing unit 3032 includes a processing subunit 30322, configured to process the third translation vector and the source context vector with the GRU to obtain an intermediate vector, and a merging subunit 30323, configured to interpolate and combine the intermediate vector obtained by the processing subunit 30322 with the third translation vector to obtain the first translation vector. In other words, the target information determining apparatus first processes the third translation vector and the source context vector with the GRU to obtain an intermediate vector, and then interpolates and combines the intermediate vector with the third translation vector to obtain the first translation vector. Performing the subtraction operation inside the GRU in this way helps improve the accuracy of the operation and increases its efficiency.
  • the first acquiring module 302 includes: a first determining unit 3021, configured to determine an alignment probability of the source content according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and a second determining unit 3022, configured to determine the source context vector corresponding to the first moment according to the alignment probability of the source content determined by the first determining unit 3021 and the semantic vector of the source content.
  • that is, the vectors of the source content in the sequence are used to determine the alignment probability of the source content, and the source context vector corresponding to the first moment is then determined according to the alignment probability of the source content and the semantic vector of the source content.
  • in this way, the attention module in the target information determining apparatus can know which source content has been translated and which has not, so that more attention is placed on the untranslated content and less on the translated content, thereby mitigating the problems of omitted translation and repeated translation.
  • the target information determining apparatus 30 may further include a second determining module 305, configured to determine the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector, before the decoding module 304 decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment.
  • the decoding module 304 includes: a decoding unit 3041, configured to decode the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector, to obtain the target information of the first moment.
  • that is, the decoder state of the first moment is first determined according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector;
  • the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector are then decoded to obtain the target information of the first moment.
  • by modeling the first translation vector and/or the second translation vector independently of the decoder state, they can form, together with the source context vector output by the attention module at the first moment, a complete source semantic vector representation that is passed to the decoder to generate more accurate target information.
  • the target information determining apparatus 30 may further include: a second acquiring module 306, configured to obtain a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value is used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment;
  • a third acquiring module 307, configured to obtain a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value is used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment;
  • and a second determining module 308, configured to determine a training target according to the first indicator expected value obtained by the second acquiring module 306 and the second indicator expected value obtained by the third acquiring module 307, where the training target is used to construct the preset neural network model.
  • that is, the first indicator expected value is obtained according to the first translation vector and the third translation vector, the second indicator expected value is obtained according to the second translation vector and the fourth translation vector, and the training target is then determined according to the first indicator expected value and the second indicator expected value, the training target being used to construct the preset neural network model.
  • FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
  • the target information determining apparatus 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing an application program 342 or data 344.
  • the memory 332 and the storage medium 330 may be short-term storage or persistent storage.
  • the program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations in the target information determining device.
  • the central processing unit 322 can be configured to communicate with the storage medium 330 to perform a series of instruction operations in the storage medium 330 on the target information determining apparatus 300.
  • the target information determining apparatus 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), FreeBSD(TM), and the like.
  • the steps performed by the target information determining apparatus in the foregoing embodiments may be based on the structure of the target information determining apparatus shown in FIG. 20.
  • the CPU 322 is configured to perform the following steps: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, where the source context vector is used to represent the source content to be processed; determining a first translation vector and/or a second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment immediately preceding the first moment; and decoding the first translation vector and/or the second translation vector and the source context vector to obtain target information of the first moment.
  • the CPU 322 is specifically configured to perform the following steps: acquiring, according to the source vector representation sequence, a third translation vector corresponding to the second moment; and processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector.
  • the CPU 322 is specifically configured to perform the following steps: acquiring, according to the source vector representation sequence, a third translation vector corresponding to the second moment; processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector; and obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
  • the CPU 322 is specifically configured to perform the following step: obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
  • the CPU 322 is specifically configured to perform the following step: subtracting the source context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
  • the CPU 322 is specifically configured to perform the following steps: processing the third translation vector and the source context vector by using a GRU to obtain an intermediate vector; and interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector.
  • the CPU 322 is specifically configured to perform the following steps: determining an alignment probability of the source content according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and determining the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
  • the CPU 322 is further configured to perform the following step: determining the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector.
  • the CPU 322 is specifically configured to perform the following step: decoding the decoder state of the first moment, the first translation vector, the second translation vector, and the source context vector to obtain the target information of the first moment.
  • the CPU 322 is further configured to perform the following steps: obtaining a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value is used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment; obtaining a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value is used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment; and determining a training target according to the first indicator expected value and the second indicator expected value, where the training target is used to construct the preset neural network model.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative; the division of units is only a division of logical functions, and other division manners may exist in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • an embodiment of the present invention also provides a computer readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A translation method, a target information determining method, and a related apparatus and storage medium. The translation method includes: encoding, by an encoder, to-be-processed text information to obtain a source vector representation sequence (201), the to-be-processed text information belonging to a first language; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment (202), the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector (203); and decoding, by a decoder, the translation vector and the source context vector to obtain target information of the first moment (204), the target information belonging to a second language different from the first language.

Description

Translation method, target information determining method, related apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 201710612833.7, filed with the China Patent Office on July 25, 2017 and entitled "Translation method, target information determining method, and related apparatus", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of computer technologies, and in particular, to a translation method, a target information determining method, a related apparatus, and a storage medium.
Background
Machine translation (MT) refers to the process of using a machine to convert text or speech from one language into another language that carries the same meaning. With the rise of deep learning, deep neural network technologies have been applied to MT in the past two years, and neural machine translation (NMT) has become the new generation of translation technology.
Summary
Embodiments of the present invention provide a translation method, a target information determining method, a related apparatus, and a storage medium.
One aspect of the present invention provides a translation method, applied to a neural network machine translation system. The method includes: encoding, by an encoder, to-be-processed text information to obtain a source vector representation sequence, the to-be-processed text information belonging to a first language; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector including a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and decoding, by a decoder, the translation vector and the source context vector to obtain target information of the first moment, the target information belonging to a second language different from the first language.
Another aspect of the present invention provides a target information determining method, including: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector including a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and decoding the translation vector and the source context vector to obtain target information of the first moment.
Yet another aspect of the present invention provides a target information determining apparatus, including at least one memory and at least one processor, the at least one memory storing at least one instruction module configured to be executed by the at least one processor, the at least one instruction module including: an encoding module, configured to encode to-be-processed text information to obtain a source vector representation sequence; a first acquiring module, configured to obtain, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; a first determining module, configured to determine a translation vector according to the source vector representation sequence and the source context vector, the translation vector including a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and a decoding module, configured to decode the translation vector and the source context vector to obtain target information of the first moment.
Yet another aspect of the present invention provides a target information determining apparatus, including a memory, a processor, and a bus system, the memory being configured to store a program, and the processor being configured to execute the program in the memory to perform the following steps: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector including a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and decoding the translation vector and the source context vector to obtain target information of the first moment; the bus system being configured to connect the memory and the processor so that the memory and the processor communicate with each other.
Yet another aspect of the present invention provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
Yet another aspect of the present invention provides a target information determining method, performed by an electronic device, the method including: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector including a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and decoding the translation vector and the source context vector to obtain target information of the first moment.
Brief Description of the Drawings
FIG. 1 is an architectural diagram of a target information determining apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a target information determining method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a target information determining method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention;
FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an embodiment of an enhanced attention module according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an embodiment of translating the first source content in a source vector representation sequence in an application scenario of the present invention;
FIG. 10 is a schematic diagram of an embodiment of translating the second source content in a source vector representation sequence in an application scenario of the present invention;
FIG. 11 is a schematic diagram of an embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 17 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 18 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 19 is a schematic diagram of another embodiment of a target information determining apparatus according to an embodiment of the present invention;
FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention provide a translation method, a target information determining method, and a related apparatus, which can model the untranslated source content and/or the translated source content in a source vector representation sequence, that is, separate this content from the original language model for training, thereby reducing the difficulty of model training for the decoder and improving the translation quality of the translation system.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the accompanying drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can, for example, be implemented in an order other than those illustrated or described herein. In addition, the terms "include", "have", and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be understood that the embodiments of the present invention are mainly applied to an encoder-decoder model. Encoding converts an input sequence into a vector of a certain length, and decoding converts the vector sequence generated by the encoder back into an output sequence. The encoder-decoder model has many applications, such as translation, document summarization, and question answering systems. In translation, the input sequence is the text to be translated and the output sequence is the translated text; in a question answering system, the input sequence is the posed question and the output sequence is the answer.
It may be understood that, in specific implementation, neither the encoder nor the decoder is fixed; options include convolutional neural networks (CNN), recurrent neural networks (RNN), gated recurrent units (GRU), long short-term memory (LSTM) networks, and bidirectional recurrent neural networks (BiRNN). Different neural networks may also be used for encoding and decoding; for example, a BiRNN may be used for encoding and an RNN for decoding, or an RNN for encoding and an LSTM for decoding. This is not limited here.
Referring to FIG. 1, FIG. 1 is an architectural diagram of a target information determining apparatus according to an embodiment of the present invention. As shown in the figure, two additional hidden layers are introduced into the decoder, and each can be represented by a vector sequence. In the figure, $v^P_{t-1}$ denotes the second translation vector corresponding to moment t-1; the second translation vector refers to the source content that has already been translated, that is, the past translation vector. $v^F_t$ denotes the first translation vector corresponding to moment t; the first translation vector refers to the source content that has not yet been translated, that is, the future translation vector. In the figure, $c_t$ denotes the source context vector corresponding to moment t, and $s_t$ denotes the decoder state at moment t.
By introducing additional hidden layers, the present invention directly models the past translation (the translated content) and the future translation (the untranslated content) at the semantic level, and separates this content from the decoder state, improving how the neural network translation system stores and uses it and thereby improving the quality of the translation system. The method provided in the present invention can be used in mainstream neural network machine translation systems.
For ease of understanding, referring to FIG. 2, FIG. 2 is a schematic flowchart of a target information determining method according to an embodiment of the present invention. As shown in the figure:
First, the encoder module S1 reads in a sentence to be processed in step 101, and the encoder module S1 then outputs a source vector representation sequence. Next, the attention module S2, the past-future module S3, and the decoder module S4 repeat the following steps until the entire translation is generated:
The attention module S2 reads in the past translation vector and the future translation vector of moment t-1; the past translation vector is initialized as an all-zero vector, indicating that no source content has been translated, and the future translation vector is initialized as the last vector of the source vector representation sequence, representing a summary of the source sentence. The attention module S2 outputs the source context vector of the current moment, that is, moment t, in step 103. The past-future module S3 reads the source context vector of the current moment and updates the past translation vector and the future translation vector of moment t in step 104. The decoder module S4 reads the future translation vector of moment t, the past translation vector of moment t-1, the source context vector of moment t, and other standard inputs, and generates the target word of moment t in step 105.
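The loop of modules S1 to S4 can also be pictured in code. The following is a minimal sketch in PyTorch of one greedy decoding pass; the module objects (`encoder`, `attention`, `past_rnn`, `future_rnn`, `decoder_cell`, `output_layer`, `embed`) and all dimensions are illustrative assumptions, not the reference implementation of this embodiment.

```python
import torch

def greedy_decode(encoder, attention, past_rnn, future_rnn,
                  decoder_cell, output_layer, embed, src_ids,
                  max_len=50, eos_id=2):
    """Skeleton of the S1-S4 loop: encode, attend, update past/future, decode."""
    annotations = encoder(src_ids)                # S1: one vector per source word
    v_past = torch.zeros_like(annotations[0])     # nothing translated yet
    v_future = annotations[-1].clone()            # summary of the source sentence
    s = annotations[-1].clone()                   # assumed initial decoder state
    y_prev = torch.zeros_like(annotations[0])     # assumed <bos> embedding
    out = []
    for _ in range(max_len):
        # S2: source context vector of moment t, conditioned on the
        # past/future vectors of moment t-1.
        c_t, _ = attention(s, annotations, v_past, v_future)
        # S3: update the future vector first (v^F_t consumes c_t) ...
        v_future_t = future_rnn(c_t, v_future)
        # S4: the decoder state s_t uses v^P_{t-1} and v^F_t, then emits a word.
        s = decoder_cell(s, y_prev, c_t, v_past, v_future_t)
        word = output_layer(s, c_t, v_past, v_future_t).argmax(-1)
        out.append(int(word))
        # ... and only now accumulate c_t into the past vector (v^P_t).
        v_past = past_rnn(c_t, v_past)
        v_future = v_future_t
        if int(word) == eos_id:
            break
        y_prev = embed(word)
    return out
```

The ordering inside the loop mirrors the formulas below: attention reads the previous past/future vectors, the decoder state reads the old past vector together with the new future vector, and the past vector is refreshed afterwards for the next step.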
The present invention can be used in an NMT system. The following describes the translation method provided in the present invention; an embodiment of the translation method in this embodiment of the present invention includes the following.
In the NMT system, an encoder first encodes the to-be-processed text information to obtain a source vector representation sequence, the to-be-processed text information belonging to a first language, for example Chinese. It may be understood that, in actual applications, it may also be a language of another type.
The encoding process is specifically: inputting the to-be-processed text information into the encoder of the NMT system, encoding the to-be-processed text information with the encoder, and obtaining the source vector representation sequence according to the encoding result, each source vector in the source vector representation sequence belonging to the first language.
Assuming the first language is Chinese, the to-be-processed text information may be a Chinese sentence containing several phrases. After the Chinese sentence is encoded, the source vector representation sequence can be obtained, and the source context vector corresponding to the current moment, that is, the first moment, can then be obtained, where the source context vector is used to represent the source content to be processed; the source content may specifically be a word in the Chinese sentence.
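As one concrete way to obtain such a source vector representation sequence, the sketch below encodes a toy id sequence with a bidirectional GRU, one of the encoder options mentioned above; the vocabulary size, dimensions, and ids are assumptions for illustration.

```python
import torch
import torch.nn as nn

emb_dim, hidden, vocab = 256, 512, 30000          # assumed sizes
embedding = nn.Embedding(vocab, emb_dim)
encoder = nn.GRU(emb_dim, hidden // 2, bidirectional=True, batch_first=True)

src_ids = torch.tensor([[4, 17, 256, 9, 2]])      # toy word ids ending in <eos>
annotations, _ = encoder(embedding(src_ids))      # (1, 5, 512): the sequence h_1..h_5
```

Each row of `annotations` plays the role of one source vector in the sequence, that is, the representation of one source word.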
It may be understood that the source context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
Next, the NMT system determines the first translation vector and/or the second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment immediately preceding the first moment. If the first moment is moment t, the second moment is moment t-1.
For ease of description, the first translation vector and/or the second translation vector may be referred to as a translation vector; the translation vector may be the first translation vector, the second translation vector, or both the first translation vector and the second translation vector.
That is, the first translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, and the second translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the second moment.
For example, suppose the source content corresponding to the source vector representation sequence is "每月到世界各地的航班多达1300个。" The words corresponding to the source vectors are then "每月", "到", "世界各地", "的", "航班", "多达", and "1300个". If the word "世界各地" is being translated at the current moment, the future translation vector can be understood as the vectors corresponding to the untranslated words "世界各地", "的", "航班", "多达", and "1300个", and the past translation vector can be understood as the vectors corresponding to the already translated words "每月" and "到".
Finally, in the NMT system, a decoder decodes the first translation vector and/or the second translation vector together with the source context vector to obtain the target information of the first moment, the target information belonging to a second language. It may be understood that the second language is a language different from the first language, such as English, French, or Japanese; this is not limited here.
Assuming the word "世界各地" is translated at the first moment, the output target information may be "all parts of the world"; that is, the first language is Chinese and the second language is English, which completes the machine translation process.
This embodiment of the present invention provides a translation method that can model the untranslated source content and/or the translated source content in the source vector representation sequence, that is, separate this content from the original language model for training, thereby reducing the difficulty of model training for the decoder and improving the translation quality of the translation system.
The following describes the target information determining method in the present invention; the method may be performed by an electronic device. Referring to FIG. 3, an embodiment of the target information determining method in this embodiment of the present invention includes the following steps.
201. Encode to-be-processed text information to obtain a source vector representation sequence.
In this embodiment, the encoder in the target information determining apparatus encodes the to-be-processed text information, which may be a sentence to be translated, for example, "多个机场被关闭。" After the sentence is encoded, the source vector representation sequence can be obtained.
Each vector in the source vector representation sequence corresponds to one piece of source content (a source word); for example, the source contents in the sentence "多个机场被关闭。" are "多个", "机场", "被", "关闭", "。", and "<eos>". According to this sequence, the decoder in the target information determining apparatus generates the translation word by word.
202. Obtain, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector being used to represent the source content to be processed.
It may be understood that the source context vector corresponding to the first moment is used to indicate the source content to be processed in the to-be-processed text information at the first moment.
In this embodiment, the target information determining apparatus may obtain, according to the source vector representation sequence, the source context vector corresponding to the first moment; the first moment is moment t in this embodiment of the present invention, and the source context vector is used to represent the source content to be processed.
Specifically, the target information determining apparatus outputs an alignment probability, such as 0.0 or 0.2, for each piece of source content; the alignment probabilities over the source vector representation sequence sum to 1, and a larger alignment probability indicates that the source content is more relevant to the target information to be generated. Weighting the semantic vectors by the alignment probabilities produces the source context vector of moment t.
203. Determine a first translation vector and/or a second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment immediately preceding the first moment.
It may be understood that the first translation vector and/or the second translation vector may be referred to as a translation vector.
In this embodiment, the target information determining apparatus may determine the first translation vector according to the source vector representation sequence and the source context vector, or determine the second translation vector according to them, or determine both the first translation vector and the second translation vector. The first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment immediately preceding the first moment.
Specifically, the first translation vector represents the future translation vector of moment t, and the second translation vector represents the past translation vector of moment t-1. After a piece of target information is generated, the corresponding source context vector $c_t$ is added onto the past translation vector of the previous moment to obtain a new past translation vector, and $c_t$ is subtracted from the future translation vector of the previous moment to obtain a new future translation vector.
204. Decode the first translation vector and/or the second translation vector and the source context vector to obtain target information of the first moment.
In this embodiment, the decoder in the target information determining apparatus uses a neural network output layer to decode the first translation vector and the source context vector to obtain the target information of the first moment; alternatively, it decodes the second translation vector and the source context vector, or the first translation vector, the second translation vector, and the source context vector, to obtain the target information of the first moment.
Multiple candidate pieces of information may be generated in the process of generating the target information, and the word with the highest similarity is finally output as the target information. For example, in the sentence "多个机场被关闭。", the word "多个" may be translated as "many" or "much"; however, the semantic knowledge stored in the decoder state vector indicates that "many" is used before countable nouns, so "多个" is finally translated as "many".
This embodiment of the present invention provides a target information determining method. The target information determining apparatus first encodes the to-be-processed text information to obtain a source vector representation sequence, then obtains, according to the source vector representation sequence, the source context vector corresponding to the first moment, which represents the source content to be processed, and determines the first translation vector and/or the second translation vector according to the source vector representation sequence and the source context vector, the first translation vector indicating the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicating the source content already translated in the source vector representation sequence at the second moment, and the second moment being the moment immediately preceding the first moment. Finally, the target information determining apparatus decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment. In this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled, that is, separated from the original language model for training, which reduces the difficulty of model training for the decoder and improves the translation quality of the translation system.
Optionally, on the basis of the embodiment corresponding to FIG. 3, in an optional embodiment of the target information determining method provided in this embodiment of the present invention, determining the first translation vector according to the source vector representation sequence and the source context vector may include:
obtaining, according to the source vector representation sequence, a third translation vector corresponding to the second moment; and
processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector.
It may be understood that the third translation vector is the vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment.
In this embodiment, the process in which the target information determining apparatus determines the first translation vector according to the source vector representation sequence and the source context vector may include: first obtaining, according to the source vector representation sequence, the third translation vector corresponding to the second moment, and then processing the third translation vector and the source context vector with the preset neural network model to obtain the first translation vector.
Specifically, assuming the first moment is moment t and the second moment is moment t-1, the target information determining apparatus reads in the source context vector of the first moment (representing the source content translated at the first moment) and updates the stored future translation vector. The future translation vector is initialized as a summary of the source sentence (usually the last vector of the source vector representation sequence), indicating that at the start none of the source content has been translated. At each moment, the update is:

$$v^F_t = \mathrm{RNN}\big(v^F_{t-1},\, c_t\big)$$

where $v^F_t$ denotes the future translation vector of moment t, that is, the first translation vector; $v^F_{t-1}$ denotes the future translation vector of moment t-1, that is, the third translation vector; $c_t$ denotes the source context vector of moment t; and RNN() denotes computation with an RNN model.
It should be noted that using an RNN as the preset neural network model here is only an example; in actual applications, the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
Further, this embodiment of the present invention describes how to determine the first translation vector according to the source vector representation sequence and the source context vector: the third translation vector corresponding to the second moment is obtained according to the source vector representation sequence, and the third translation vector and the source context vector are then processed by the preset neural network model to obtain the first translation vector. Outputting the first translation vector with the preset neural network model in this way can improve the accuracy of the future translation vector.
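As a concrete instance of the update $v^F_t = \mathrm{RNN}(v^F_{t-1}, c_t)$, the sketch below uses `torch.nn.GRUCell` as the preset neural network model; the batch size of 1 and the dimension 512 are assumptions for illustration.

```python
import torch
import torch.nn as nn

hidden = 512
future_rnn = nn.GRUCell(input_size=hidden, hidden_size=hidden)

v_future_prev = torch.randn(1, hidden)   # v^F_{t-1}: the third translation vector
c_t = torch.randn(1, hidden)             # source context vector of moment t

# v^F_t = RNN(v^F_{t-1}, c_t): remove the content just translated from "future".
v_future_t = future_rnn(c_t, v_future_prev)
print(v_future_t.shape)                  # torch.Size([1, 512])
```

Swapping `nn.GRUCell` for `nn.LSTMCell` or another recurrent cell would give the LSTM or other variants mentioned above.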
Optionally, on the basis of the embodiment corresponding to FIG. 3, in another optional embodiment of the target information determining method provided in this embodiment of the present invention, determining the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector may include:
obtaining, according to the source vector representation sequence, a third translation vector corresponding to the second moment;
processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector; and
obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update a fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
It may be understood that the fourth translation vector is the vector corresponding to the source content already translated in the source vector representation sequence at the first moment.
In this embodiment, assuming the first moment is moment t and the second moment is moment t-1, the source context vector of the first moment (that is, the source semantic content being translated), denoted $c_t$, is produced by the attention module; $c_t$ is also used to update the past translation vector and the future translation vector. The updates are:

$$v^F_t = \mathrm{RNN}\big(v^F_{t-1},\, c_t\big)$$

where $v^F_t$ denotes the future translation vector of moment t, that is, the first translation vector; $v^F_{t-1}$ denotes the future translation vector of moment t-1, that is, the third translation vector; $c_t$ denotes the source context vector of moment t; and RNN() denotes computation with an RNN model; and

$$v^P_t = \mathrm{RNN}\big(v^P_{t-1},\, c_t\big)$$

where $v^P_t$ denotes the past translation vector of moment t, that is, the fourth translation vector, and $v^P_{t-1}$ denotes the past translation vector of moment t-1, that is, the second translation vector.
It should be noted that using an RNN as the preset neural network model here is only an example; in actual applications, the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
We hope that modeling captures the rule of "accumulation": the source context vector $c_t$ of moment t (the source content translated at moment t) is combined with the past translation vector of moment t-1 (the source content translated up to moment t-1). We therefore choose an RNN structure, because an RNN summarizes well the history up to moment t, which matches this expectation; for example, the updates are expected to approximate

$$v^P_t \approx v^P_{t-1} + c_t \qquad \text{and} \qquad v^F_t \approx v^F_{t-1} - c_t.$$

Further, this embodiment of the present invention describes how to determine the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector: the third translation vector corresponding to the second moment is obtained according to the source vector representation sequence, and the third translation vector and the source context vector are processed by the preset neural network model to obtain the first translation vector; the second translation vector can also be obtained according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model. In this way, the accuracy of both the past translation vector and the future translation vector can be improved.
Optionally, on the basis of the embodiment corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, determining the second translation vector according to the source vector representation sequence and the source context vector may include:
obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate a fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
In this embodiment, the target information determining apparatus may obtain the second translation vector of the second moment according to the position at which the source context vector of the first moment appears in the source vector representation sequence.
Specifically, assuming the first moment is moment t and the second moment is moment t-1, the target information determining apparatus reads in the source context vector of moment t-2 and the past translation vector of moment t-2, and then processes the source context vector of moment t-2 and the past translation vector of moment t-2 with the preset neural network model to obtain the past translation vector of moment t-1. The past translation vector is initialized as an all-zero vector, $v^P_0 = \mathbf{0}$, indicating that at the start no source content has been translated. At each moment, the update is:

$$v^P_t = \mathrm{RNN}\big(v^P_{t-1},\, c_t\big)$$

where $v^P_t$ denotes the past translation vector of moment t, that is, the fourth translation vector; $v^P_{t-1}$ denotes the past translation vector of moment t-1, that is, the second translation vector; $c_t$ denotes the source context vector of moment t; and RNN() denotes computation with an RNN model.
It should be noted that using an RNN as the preset neural network model here is only an example; in actual applications, the preset neural network model may be an LSTM, a time-delay network model, a gated convolutional neural network, or another type of neural network structure, which is not limited here.
Further, this embodiment of the present invention describes how to determine the second translation vector according to the source vector representation sequence and the source context vector: the second translation vector is obtained according to the position at which the source context vector appears in the source vector representation sequence, the second translation vector is used to generate the fourth translation vector corresponding to the first moment, and the fourth translation vector is obtained by processing the second translation vector and the source context vector with the preset neural network model. Outputting the second translation vector with the preset neural network model in this way can improve the accuracy of the past translation vector.
Optionally, on the basis of either of the two preceding embodiments corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector may include:
subtracting the source context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
It may be understood that GRU is short for gated recurrent unit.
In this embodiment, obtaining the future translation vector requires modeling the rule of "decrementing". Assuming the first moment is moment t and the second moment is moment t-1, the source context vector $c_t$ of the first moment (the source content translated at the first moment) is subtracted from the third translation vector (the vector corresponding to the source content not yet translated up to the second moment). Several structures are designed here to model this decrementing rule. The present invention is applicable to a variety of RNN structures; the mainstream GRU is taken as an example here.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention. With the standard GRU structure corresponding to FIG. 4, a "decrementing" model can be built, and the GRU parameters are expected to learn the following rule automatically:

$$v^F_t = \mathrm{GRU}\big(v^F_{t-1},\, c_t\big) = (1 - u_t) \odot v^F_{t-1} + u_t \odot \tilde{v}^F_t$$

with

$$\tilde{v}^F_t = \tanh\big(U\,(r_t \odot v^F_{t-1}) + W\,c_t\big)$$
$$u_t = \sigma\big(U_u\, v^F_{t-1} + W_u\, c_t\big), \qquad r_t = \sigma\big(U_r\, v^F_{t-1} + W_r\, c_t\big)$$

where $v^F_t$ denotes the future translation vector of moment t, that is, the first translation vector; $v^F_{t-1}$ denotes the future translation vector of moment t-1, that is, the third translation vector; $c_t$ denotes the source context vector of the first moment; $u_t$ denotes the update gate of the first moment; $\tilde{v}^F_t$ denotes the update-state candidate generated by the GRU, that is, the intermediate vector; $r_t$ denotes the output weight vector; $\tanh()$ denotes the hyperbolic tangent function; and $\sigma()$ denotes the sigmoid function. $U$, $W$, $U_r$, $W_r$, $U_u$, and $W_u$ denote the parameters of these functions, which are trained jointly with the other parameters of the neural network translation system.
However, the GRU structure corresponding to FIG. 4 gives no explicit "decrement" signal, which makes it harder for the GRU to learn the rule. The GRU structure can therefore be improved to obtain the GRU-o structure corresponding to FIG. 5. Referring to FIG. 5, FIG. 5 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention. In this structure, the source context vector $c_t$ of the first moment (moment t) is first subtracted from the future translation vector of the second moment (moment t-1), that is, the third translation vector, to obtain the needed first translation vector, which is then passed into the GRU structure. Specifically:

$$M_t = \tanh\big(U_m\, v^F_{t-1} - W_m\, c_t\big)$$
$$v^F_t = \mathrm{GRU}\big(v^F_{t-1},\, M_t\big)$$

where $M_t$ denotes the result of subtracting $c_t$ from $v^F_{t-1}$, and $U_m$ and $W_m$ denote the parameters of the function, which are trained jointly with the other parameters of the neural network translation system.
Again, in this embodiment of the present invention, the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, and the obtained first translation vector is then passed into the GRU structure. In this way, a decrementing signal can be given inside the GRU, which helps the model learn the rule and thereby improves the accuracy of model training.
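A minimal sketch of the GRU-o structure of FIG. 5 follows, assuming `torch.nn.GRUCell` for the inner GRU; the class name, the use of bias-free linear layers for $U_m$ and $W_m$, and the size 512 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GRUOutsideMinus(nn.Module):
    """GRU-o: subtract c_t from v^F_{t-1} first, then feed the result to a GRU."""
    def __init__(self, hidden):
        super().__init__()
        self.U_m = nn.Linear(hidden, hidden, bias=False)
        self.W_m = nn.Linear(hidden, hidden, bias=False)
        self.gru = nn.GRUCell(hidden, hidden)

    def forward(self, c_t, v_future_prev):
        # M_t = tanh(U_m v^F_{t-1} - W_m c_t): the explicit "minus" signal.
        m_t = torch.tanh(self.U_m(v_future_prev) - self.W_m(c_t))
        # v^F_t = GRU(v^F_{t-1}, M_t)
        return self.gru(m_t, v_future_prev)

cell = GRUOutsideMinus(512)
v_future_t = cell(torch.randn(1, 512), torch.randn(1, 512))
```

The explicit subtraction before the GRU step is what distinguishes this variant from the standard GRU of FIG. 4, where the network must discover the decrementing behavior on its own.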
Optionally, on the basis of either of the two preceding embodiments corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector may include:
processing the third translation vector and the source context vector by using a GRU to obtain an intermediate vector; and
interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector.
In this embodiment, the "decrementing" operation may also be performed inside the GRU. Referring to FIG. 6, FIG. 6 is another schematic structural diagram of a gated recurrent unit according to an embodiment of the present invention. Assuming the first moment is moment t and the second moment is moment t-1, the GRU parameters can automatically learn the following rule:

$$\tilde{v}^F_t = \tanh\big(U\,(r_t \odot v^F_{t-1}) - W\,c_t\big)$$

where $\tilde{v}^F_t$ denotes the update-state candidate generated by the GRU, that is, the intermediate vector; $r_t$ denotes the output weight vector; $\tanh()$ denotes the hyperbolic tangent function; $v^F_{t-1}$ denotes the future translation vector of moment t-1, that is, the third translation vector; and $c_t$ denotes the source context vector of the first moment.
After $\tilde{v}^F_t$ is obtained, it can be interpolated and combined with the third translation vector of moment t-1 to obtain the final first translation vector:

$$v^F_t = (1 - u_t) \odot v^F_{t-1} + u_t \odot \tilde{v}^F_t$$

With the above operations, the past translation vector and the future translation vector can be obtained at every moment: $v^P_t$ represents the source content that has been translated up to moment t, and $v^F_t$ represents the source content that has not yet been translated up to moment t.
Again, in this embodiment of the present invention, the process in which the target information determining apparatus processes the third translation vector and the source context vector with the preset neural network model to obtain the first translation vector may be: first processing the third translation vector and the source context vector with the GRU to obtain an intermediate vector, and then interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector. Performing the decrementing operation inside the GRU in this way helps improve both the accuracy and the efficiency of the operation.
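The corresponding GRU-i variant of FIG. 6, with the subtraction moved inside the candidate state, can be sketched as follows; the parameter names and shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GRUInsideMinus(nn.Module):
    """GRU-i: perform the subtraction inside the GRU candidate state."""
    def __init__(self, hidden):
        super().__init__()
        self.U   = nn.Linear(hidden, hidden, bias=False)
        self.W   = nn.Linear(hidden, hidden, bias=False)
        self.U_r = nn.Linear(hidden, hidden, bias=False)
        self.W_r = nn.Linear(hidden, hidden, bias=False)
        self.U_u = nn.Linear(hidden, hidden, bias=False)
        self.W_u = nn.Linear(hidden, hidden, bias=False)

    def forward(self, c_t, v_prev):
        r_t = torch.sigmoid(self.U_r(v_prev) + self.W_r(c_t))  # reset gate
        u_t = torch.sigmoid(self.U_u(v_prev) + self.W_u(c_t))  # update gate
        # Candidate with the minus inside: tanh(U(r_t * v^F_{t-1}) - W c_t).
        cand = torch.tanh(self.U(r_t * v_prev) - self.W(c_t))
        # Interpolate the candidate with v^F_{t-1} to obtain v^F_t.
        return (1.0 - u_t) * v_prev + u_t * cand

cell = GRUInsideMinus(512)
v_future_t = cell(torch.randn(1, 512), torch.randn(1, 512))
```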
Optionally, on the basis of the embodiment corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, obtaining, according to the source vector representation sequence, the source context vector corresponding to the first moment may include:
determining an alignment probability of the source content according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and
determining the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
This embodiment describes how the target information determining apparatus obtains, according to the source vector representation sequence, the source context vector corresponding to the first moment. For ease of understanding, referring to FIG. 7, FIG. 7 is a schematic diagram of an embodiment of an enhanced attention module according to an embodiment of the present invention. Specifically, assuming the first moment is moment t and the second moment is moment t-1, the target information determining apparatus includes an encoder and a decoder, and the alignment probability $\alpha_{t,i}$ of the source content is determined from the decoder state $s_{t-1}$ of the second moment, the second translation vector $v^P_{t-1}$, the third translation vector $v^F_{t-1}$, and the vector $h_i$ of the source content in the source vector representation sequence.
That is, the alignment probability $\alpha_{t,i}$ is computed with the following formula:

$$\alpha_{t,i} = \mathrm{softmax}\Big(a\big(s_{t-1},\, h_i,\, v^P_{t-1},\, v^F_{t-1}\big)\Big)$$

where $\alpha_{t,i}$ denotes the alignment probability distribution output by the attention mechanism over each piece of source content, and the distribution sums to 1; $h_i$ is the encoder's vector representation of the i-th source content of the input sentence; softmax() denotes the normalization operation (the values entering a neural network can be negative or positive, so their exponentials are usually taken first and then normalized into a probability distribution); and a() is the operation of the attention module.
After the alignment probability $\alpha_{t,i}$ of the source content is obtained, a weighted sum is taken with the semantic vectors $x_i$ of the corresponding source content to obtain the source context vector corresponding to the first moment. For example, if at the first moment $\alpha_{t,1} = 0.5$, $\alpha_{t,2} = 0.3$, $\alpha_{t,3} = 0.2$, and $x_1 = 2$, $x_2 = 4$, $x_3 = 6$, the source context vector $c_t$ corresponding to the first moment is computed as: $c_t = 0.5 \times 2 + 0.3 \times 4 + 0.2 \times 6 = 3.4$.
It may be understood that, in actual applications, the alignment probability $\alpha_{t,i}$ may also be computed in other variant forms of the above attention function.
Further, in this embodiment of the present invention, the alignment probability of the source content is first determined according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; the source context vector corresponding to the first moment is then determined according to the alignment probability of the source content and the semantic vector of the source content. In this way, the attention module in the target information determining apparatus can know which source content has been translated and which has not, placing more attention on the untranslated content and less on the translated content, thereby mitigating the problems of omitted translation and repeated translation.
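The enhanced attention step can be sketched as below; a single-hidden-layer additive scoring function is assumed for a(), and the parameter names `W_s`, `W_h`, `W_p`, `W_f`, and `v_a` are illustrative, not taken from the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PastFutureAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.W_s = nn.Linear(hidden, hidden, bias=False)  # decoder state s_{t-1}
        self.W_h = nn.Linear(hidden, hidden, bias=False)  # annotation h_i
        self.W_p = nn.Linear(hidden, hidden, bias=False)  # past vector v^P_{t-1}
        self.W_f = nn.Linear(hidden, hidden, bias=False)  # future vector v^F_{t-1}
        self.v_a = nn.Linear(hidden, 1, bias=False)

    def forward(self, s_prev, annotations, v_past, v_future):
        # e_{t,i} = a(s_{t-1}, h_i, v^P_{t-1}, v^F_{t-1}) for every source word i.
        energy = self.v_a(torch.tanh(
            self.W_s(s_prev) + self.W_h(annotations)
            + self.W_p(v_past) + self.W_f(v_future))).squeeze(-1)
        alpha = F.softmax(energy, dim=-1)       # alignment probabilities, sum to 1
        c_t = (alpha.unsqueeze(-1) * annotations).sum(dim=0)  # weighted sum
        return c_t, alpha
```

Conditioning the energy on `v_past` and `v_future` is what lets the attention module shift weight away from already-translated source words.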
Optionally, on the basis of the embodiment corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, before decoding the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment, the method may further include: determining the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector.
Correspondingly, decoding the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment may include: decoding the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector to obtain the target information of the first moment.
In this embodiment, before the target information determining apparatus obtains the target information of the first moment, it first determines the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector, where the first moment is moment t (the current moment) and the second moment is moment t-1 (the previous moment).
Specifically, referring to FIG. 8, FIG. 8 is a schematic diagram of an embodiment of an enhanced decoder state according to an embodiment of the present invention. The decoder state of the second moment is $s_{t-1}$, the target information of the second moment is $y_{t-1}$, the source context vector is $c_t$, the first translation vector is $v^F_t$, and the second translation vector is $v^P_{t-1}$. The decoder state $s_t$ of the first moment can be computed with the following formula:

$$s_t = f\big(s_{t-1},\, y_{t-1},\, c_t,\, v^P_{t-1},\, v^F_t\big)$$

where f() denotes the activation function that updates the decoder state, a standard component of neural network translation models whose inputs can be varied flexibly according to actual needs.
Further, in this embodiment of the present invention, the decoder state of the first moment is first determined according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector; the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector are then decoded to obtain the target information of the first moment. By modeling the first translation vector and/or the second translation vector independently of the decoder state, they can form, together with the source context vector output by the attention module at the first moment, a complete source semantic vector representation that is passed to the decoder to generate more accurate target information.
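One plausible instantiation of the activation function f() concatenates its inputs into a single GRU step, as sketched below; the concatenation choice, names, and sizes are assumptions.

```python
import torch
import torch.nn as nn

class DecoderState(nn.Module):
    """s_t = f(s_{t-1}, y_{t-1}, c_t, v^P_{t-1}, v^F_t), with f as one GRU step."""
    def __init__(self, hidden, emb):
        super().__init__()
        # Concatenate all conditioning inputs and feed them to one GRU cell.
        self.cell = nn.GRUCell(emb + 3 * hidden, hidden)

    def forward(self, s_prev, y_prev, c_t, v_past_prev, v_future_t):
        x = torch.cat([y_prev, c_t, v_past_prev, v_future_t], dim=-1)
        return self.cell(x, s_prev)
```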
Optionally, on the basis of either of the two preceding embodiments corresponding to FIG. 3, in yet another optional embodiment of the target information determining method provided in this embodiment of the present invention, the method may further include:
obtaining a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value is used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment;
obtaining a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value is used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment; and
determining a training target according to the first indicator expected value and the second indicator expected value, where the training target is used to construct the preset neural network model.
This embodiment further provides a way of adding training targets; adding training targets allows the preset neural network model to be trained better. For ease of description, training of the future translation vector is taken as an example; it may be understood that the past translation vector is trained in a similar way, which is not repeated here. Taking the future translation vector as an example, we want, as far as possible,

$$v^F_{t-1} - v^F_t \approx c_t$$

that is, the information difference between the two translation vectors of adjacent moments should be roughly the same as the source content translated at that moment, so as to satisfy the modeling of the future translation vector. Since in translation the semantic content of the source content and of the target information are roughly equal, that is, $c_t \approx E(y_t)$, we define a new indicator expected value that directly evaluates the consistency, at the semantic level, between the change of the future translation vector and the generated target information.
The first indicator expected value can be computed as follows:

$$\mathcal{E}^F_t = \mathcal{E}\big(\Delta^F_t,\, E(y_t)\big), \qquad \Delta^F_t = \big|\, v^F_t - v^F_{t-1} \,\big|$$

where $E(y_t)$ denotes the target information, with $y_t$ the vector representation of the target information; $\mathcal{E}^F_t$ is the indicator expected value judging whether the update of the future translation vector is as we expect (for example, whether the amount of the update is essentially equal to the translated source content), and the higher the indicator expected value, the better the match with the expectation; and $\Delta^F_t$ is the absolute value of the difference between the first translation vector $v^F_t$ and the third translation vector $v^F_{t-1}$.
Similarly, the second indicator expected value can be computed from the second translation vector $v^P_{t-1}$ and the fourth translation vector $v^P_t$: first $\Delta^P_t = \big|\, v^P_t - v^P_{t-1} \,\big|$ is obtained, and then the second indicator expected value $\mathcal{E}^P_t = \mathcal{E}\big(\Delta^P_t,\, E(y_t)\big)$.
According to the first indicator expected value and the second indicator expected value, the training target can be computed as follows:

$$J(\theta, \gamma) = \operatorname*{argmax}_{\theta,\gamma} \sum_t \Big\{ \underbrace{\log P\big(y_t \mid y_{<t}, x;\, \theta\big)}_{\text{likelihood}} \;+\; \underbrace{\log \mathcal{E}^F_t}_{\text{future loss}} \;+\; \underbrace{\log \mathcal{E}^P_t}_{\text{past loss}} \Big\}$$

where $J(\theta,\gamma)$ denotes the parameters obtained through training, a generic representation of the training target; $\theta$ denotes the parameters of the NMT system and $\gamma$ the parameters of the newly introduced past-future module; argmax denotes the parameters corresponding to the training target with the highest score (namely the likelihood, future loss, and past loss terms in the formula); the likelihood term is the training target of a standard neural network translation model, that is, maximizing the generation probability of each piece of target information, which can also be described as maximizing the likelihood score of target word generation; $\mathcal{E}^F_t$ denotes the first indicator expected value of the future translation vector; and $\mathcal{E}^P_t$ denotes the second indicator expected value of the past translation vector.
Again, in this embodiment of the present invention, the first indicator expected value is obtained according to the first translation vector and the third translation vector, the second indicator expected value is obtained according to the second translation vector and the fourth translation vector, and the training target is then determined according to the first indicator expected value and the second indicator expected value, the training target being used to construct the preset neural network model. In this way, training targets can be added, and these training targets satisfy consistency at the semantic level well, thereby improving the accuracy and feasibility of training.
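The combined objective can be sketched as below. Scoring a vector change against the target vocabulary with a linear projection followed by a softmax is an assumption standing in for the indicator expectation $\mathcal{E}$; all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def training_objective(log_p_words, delta_future, delta_past, target_ids, proj):
    """Sketch of likelihood + future loss + past loss for one sentence.

    log_p_words : (T,) log P(y_t | y_<t, x) from the standard NMT model
    delta_future: (T, hidden) |v^F_t - v^F_{t-1}| per target step
    delta_past  : (T, hidden) |v^P_t - v^P_{t-1}| per target step
    target_ids  : (T,) reference word indices y_t
    proj        : assumed linear layer mapping hidden -> vocabulary scores
    """
    # Indicator expectations: score each vector change over the vocabulary
    # and read off the log-probability of the word actually generated.
    log_e_future = F.log_softmax(proj(delta_future), dim=-1)
    log_e_past = F.log_softmax(proj(delta_past), dim=-1)
    future_term = log_e_future.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    past_term = log_e_past.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    # Maximize likelihood + future loss + past loss, i.e. minimize the negative.
    return -(log_p_words + future_term + past_term).sum()
```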
For ease of understanding, the process of determining target information in the present invention is described in detail below with a specific application scenario. Referring to FIG. 9, FIG. 9 is a schematic diagram of an embodiment of translating the first source content in the source vector representation sequence in an application scenario of the present invention, specifically:
The encoder reads in the input sentence "多个机场被迫关闭。<eos>", where <eos> denotes the sentence-end symbol, and then outputs a source vector representation sequence in which each vector (the columns of dots in FIG. 9) corresponds to one piece of source content. According to this source vector representation sequence, the decoder generates the translation.
First, the alignment probabilities and the semantic vectors are weighted to generate the source context vector $c_1$ of the first moment; the alignment probabilities are the values 0.5, 0.2, 0.2, 0.1, 0.0, and 0.0 in FIG. 9. Next, the past translation vector and the future translation vector are updated according to $c_1$, using the following formulas:

$$v^P_1 = \mathrm{RNN}\big(v^P_0,\, c_1\big)$$
$$v^F_1 = \mathrm{RNN}\big(v^F_0,\, c_1\big)$$

where $v^P_1$ denotes the past translation vector of the first moment, $v^P_0$ the past translation vector of the initial moment, $v^F_1$ the future translation vector of the first moment, and $v^F_0$ the future translation vector of the initial moment.
The decoder decodes $c_1$, the past and future translation vectors, and the decoder state $s_0$ of the initial moment, and can update the decoder state $s_1$ of the first moment. According to $s_0$ and $c_1$, a neural network output layer is used and its output is compared against all target words, and the word with the highest similarity is selected as the target information $y_1$; this $y_1$ is "many", the translation of "多个".
Referring to FIG. 10, FIG. 10 is a schematic diagram of an embodiment of translating the second source content in the source vector representation sequence in an application scenario of the present invention. As shown in the figure, the alignment probabilities and the semantic vectors are first weighted to generate the source context vector $c_2$ of the second moment; the alignment probabilities are the values 0.3, 0.6, 0.1, 0.0, 0.0, and 0.0 in FIG. 10. Next, the past translation vector and the future translation vector are updated according to $c_2$, using the following formulas:

$$v^P_2 = \mathrm{RNN}\big(v^P_1,\, c_2\big)$$
$$v^F_2 = \mathrm{RNN}\big(v^F_1,\, c_2\big)$$

where $v^P_2$ denotes the past translation vector of the second moment, $v^P_1$ the past translation vector of the first moment, $v^F_2$ the future translation vector of the second moment, and $v^F_1$ the future translation vector of the first moment.
The decoder decodes $c_2$, the past and future translation vectors, and the decoder state $s_1$ of the first moment, and can update the decoder state of the second moment. According to $s_1$, $c_2$, and the previously generated target information $y_1$, a neural network output layer is used and its output is compared against all target words, and the word with the highest similarity is selected as the target information $y_2$; this $y_2$ is "airports", the translation of "机场".
This continues in the same way until the entire input sentence has been translated.
The following describes the target information determining apparatus in the present invention in detail. Referring to FIG. 11, the target information determining apparatus 30 in this embodiment of the present invention includes at least one memory and at least one processor, the at least one memory storing at least one instruction module configured to be executed by the at least one processor, the at least one instruction module including: an encoding module 301, configured to encode to-be-processed text information to obtain a source vector representation sequence; a first acquiring module 302, configured to obtain, according to the source vector representation sequence obtained through encoding by the encoding module 301, a source context vector corresponding to a first moment, the source context vector being used to represent the source content to be processed; a first determining module 303, configured to determine a first translation vector and/or a second translation vector according to the source vector representation sequence obtained through encoding by the encoding module 301 and the source context vector obtained by the first acquiring module 302, the first translation vector indicating the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicating the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and a decoding module 304, configured to decode the first translation vector and/or the second translation vector determined by the first determining module 303 and the source context vector to obtain target information of the first moment.
In this embodiment, the encoding module 301 encodes the to-be-processed text information to obtain the source vector representation sequence; the first acquiring module 302 obtains, according to that sequence, the source context vector corresponding to the first moment, the source context vector being used to represent the source content to be processed; the first determining module 303 determines the first translation vector and/or the second translation vector according to the source vector representation sequence and the source context vector obtained by the first acquiring module 302; and the decoding module 304 decodes the first translation vector and/or the second translation vector determined by the first determining module 303 and the source context vector to obtain the target information of the first moment.
This embodiment of the present invention provides a target information determining apparatus. The apparatus first encodes the to-be-processed text information to obtain a source vector representation sequence, then obtains, according to that sequence, the source context vector corresponding to the first moment, which represents the source content to be processed; determines the first translation vector and/or the second translation vector according to the source vector representation sequence and the source context vector; and finally decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment. In this way, the untranslated source content and/or the translated source content in the source vector representation sequence can be modeled, that is, separated from the original language model for training, which reduces the difficulty of model training for the decoder and improves the translation quality of the translation system.
Optionally, on the basis of the embodiment corresponding to FIG. 11, referring to FIG. 12, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention,
the first determining module 303 includes: a first acquiring unit 3031, configured to obtain, according to the source vector representation sequence, the third translation vector corresponding to the second moment; and a first processing unit 3032, configured to process, by using a preset neural network model, the third translation vector obtained by the first acquiring unit 3031 and the source context vector to obtain the first translation vector.
Further, this embodiment describes how to determine the first translation vector according to the source vector representation sequence and the source context vector: the third translation vector corresponding to the second moment is obtained according to the source vector representation sequence, and the third translation vector and the source context vector are then processed by the preset neural network model to obtain the first translation vector. Outputting the first translation vector with the preset neural network model in this way can improve the accuracy of the future translation vector.
Optionally, on the basis of the embodiment corresponding to FIG. 11, referring to FIG. 13, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention,
the first determining module 303 includes: a second acquiring unit 3033, configured to obtain, according to the source vector representation sequence, the third translation vector corresponding to the second moment; a second processing unit 3034, configured to process, by using a preset neural network model, the third translation vector obtained by the second acquiring unit 3033 and the source context vector to obtain the first translation vector; and a third acquiring unit 3035, configured to obtain the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
Further, this embodiment describes how to determine the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector, in the same manner as in the method embodiments above. In this way, the accuracy of both the past translation vector and the future translation vector can be improved.
Optionally, on the basis of the embodiment corresponding to FIG. 11, referring to FIG. 14, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the first determining module 303 includes: a fourth acquiring unit 3036, configured to obtain the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
Further, this embodiment describes how to determine the second translation vector according to the source vector representation sequence and the source context vector. Outputting the second translation vector with the preset neural network model in this way can improve the accuracy of the past translation vector.
Optionally, on the basis of the embodiment corresponding to FIG. 12 or FIG. 13, referring to FIG. 15, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the first processing unit 3032 includes: a subtraction subunit 30321, configured to subtract the source context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
Again, in this embodiment of the present invention, the GRU may be used to subtract the source context vector from the third translation vector to obtain the first translation vector, which is then passed into the GRU structure. In this way, a decrementing signal can be given inside the GRU, which helps the model learn the rule and thereby improves the accuracy of model training.
Optionally, on the basis of the embodiment corresponding to FIG. 12 or FIG. 13, referring to FIG. 16, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the first processing unit 3032 includes: a processing subunit 30322, configured to process the third translation vector and the source context vector by using a GRU to obtain an intermediate vector; and a merging subunit 30323, configured to interpolate and combine the intermediate vector obtained by the processing subunit 30322 with the third translation vector to obtain the first translation vector.
Again, in this embodiment of the present invention, the target information determining apparatus may obtain the first translation vector by first processing the third translation vector and the source context vector with the GRU to obtain an intermediate vector, and then interpolating and combining the intermediate vector with the third translation vector. Performing the decrementing operation inside the GRU in this way helps improve both the accuracy and the efficiency of the operation.
Optionally, on the basis of the embodiment corresponding to FIG. 11, referring to FIG. 17, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the first acquiring module 302 includes: a first determining unit 3021, configured to determine an alignment probability of the source content according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and a second determining unit 3022, configured to determine the source context vector corresponding to the first moment according to the alignment probability of the source content determined by the first determining unit 3021 and the semantic vector of the source content.
Further, in this embodiment of the present invention, the alignment probability of the source content is first determined, and the source context vector corresponding to the first moment is then determined according to the alignment probability and the semantic vector of the source content. In this way, the attention module in the target information determining apparatus can know which source content has been translated and which has not, placing more attention on the untranslated content and less on the translated content, thereby mitigating the problems of omitted translation and repeated translation.
Optionally, on the basis of the embodiment corresponding to FIG. 11, referring to FIG. 18, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the target information determining apparatus 30 may further include: a second determining module 305, configured to determine the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector, before the decoding module 304 decodes the first translation vector and/or the second translation vector and the source context vector to obtain the target information of the first moment; and the decoding module 304 includes: a decoding unit 3041, configured to decode the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector to obtain the target information of the first moment.
Further, in this embodiment of the present invention, the decoder state of the first moment is first determined, and the decoder state of the first moment, the source context vector, and the first translation vector and/or the second translation vector are then decoded to obtain the target information of the first moment. By modeling the first translation vector and/or the second translation vector independently of the decoder state, they can form, together with the source context vector output by the attention module at the first moment, a complete source semantic vector representation that is passed to the decoder to generate more accurate target information.
Optionally, on the basis of the embodiment corresponding to FIG. 17 or FIG. 18, referring to FIG. 19, in another embodiment of the target information determining apparatus 30 provided in this embodiment of the present invention, the target information determining apparatus 30 may further include: a second acquiring module 306, configured to obtain a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value is used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment; a third acquiring module 307, configured to obtain a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value is used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment; and a second determining module 308, configured to determine a training target according to the first indicator expected value obtained by the second acquiring module 306 and the second indicator expected value obtained by the third acquiring module 307, where the training target is used to construct the preset neural network model.
Again, in this embodiment of the present invention, the first indicator expected value is obtained according to the first translation vector and the third translation vector, the second indicator expected value is obtained according to the second translation vector and the fourth translation vector, and the training target is then determined according to the two expected values, the training target being used to construct the preset neural network model. In this way, training targets can be added, and these training targets satisfy consistency at the semantic level well, thereby improving the accuracy and feasibility of training.
FIG. 20 is a schematic structural diagram of a target information determining apparatus according to an embodiment of the present invention. The target information determining apparatus 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing an application program 342 or data 344. The memory 332 and the storage medium 330 may be transient storage or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), and each module may include a series of instruction operations on the target information determining apparatus. Furthermore, the central processing unit 322 may be configured to communicate with the storage medium 330 and perform, on the target information determining apparatus 300, the series of instruction operations in the storage medium 330.
The target information determining apparatus 300 may further include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), and FreeBSD(TM).
The steps performed by the target information determining apparatus in the foregoing embodiments may be based on the structure of the target information determining apparatus shown in FIG. 20.
The CPU 322 is configured to perform the following steps: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, where the source context vector is used to represent the source content to be processed; determining a first translation vector and/or a second translation vector according to the source vector representation sequence and the source context vector, where the first translation vector indicates the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector indicates the source content already translated in the source vector representation sequence at the second moment, and the second moment is the moment immediately preceding the first moment; and decoding the first translation vector and/or the second translation vector and the source context vector to obtain target information of the first moment.
Optionally, the CPU 322 is specifically configured to perform the following steps: obtaining, according to the source vector representation sequence, the third translation vector corresponding to the second moment; and processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector.
Optionally, the CPU 322 is specifically configured to perform the following steps: obtaining, according to the source vector representation sequence, the third translation vector corresponding to the second moment; processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector; and obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to update the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
Optionally, the CPU 322 is specifically configured to perform the following step: obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, where the second translation vector is used to generate the fourth translation vector corresponding to the first moment, the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
Optionally, the CPU 322 is specifically configured to perform the following step: subtracting the source context vector from the third translation vector by using a gated recurrent unit (GRU) to obtain the first translation vector.
Optionally, the CPU 322 is specifically configured to perform the following steps: processing the third translation vector and the source context vector by using a GRU to obtain an intermediate vector; and interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector.
Optionally, the CPU 322 is specifically configured to perform the following steps: determining an alignment probability of the source content according to the decoder state of the second moment, the second translation vector, the third translation vector, and the vectors of the source content in the source vector representation sequence; and determining the source context vector corresponding to the first moment according to the alignment probability of the source content and the semantic vector of the source content.
Optionally, the CPU 322 is further configured to perform the following step: determining the decoder state of the first moment according to the decoder state of the second moment, the target information of the second moment, the source context vector, the first translation vector, and the second translation vector.
The CPU 322 is specifically configured to perform the following step: decoding the decoder state of the first moment, the first translation vector, the second translation vector, and the source context vector to obtain the target information of the first moment.
Optionally, the CPU 322 is further configured to perform the following steps: obtaining a first indicator expected value according to the first translation vector and the third translation vector, where the first indicator expected value is used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment; obtaining a second indicator expected value according to the second translation vector and the fourth translation vector, where the second indicator expected value is used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment; and determining a training target according to the first indicator expected value and the second indicator expected value, where the training target is used to construct the preset neural network model.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division of logical functions, and other division manners may exist in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. Therefore, an embodiment of the present invention further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the foregoing methods.
The foregoing embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some technical features thereof, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

  1. A translation method, applied to a neural network machine translation system, the method comprising:
    encoding, by an encoder, to-be-processed text information to obtain a source vector representation sequence, the to-be-processed text information belonging to a first language;
    obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment;
    determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector comprising a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and
    decoding, by a decoder, the translation vector and the source context vector to obtain target information of the first moment, the target information belonging to a second language different from the first language.
  2. The method according to claim 1, wherein encoding, by the encoder, the to-be-processed text information to obtain the source vector representation sequence comprises:
    inputting the to-be-processed text information into the encoder;
    encoding the to-be-processed text information by using the encoder; and
    obtaining the source vector representation sequence according to the encoding result, each source vector in the source vector representation sequence belonging to the first language.
  3. The method according to claim 1 or 2, wherein decoding, by the decoder, the translation vector and the source context vector to obtain the target information of the first moment comprises:
    inputting the translation vector and the source context vector into the decoder;
    decoding the translation vector and the source context vector by using the decoder; and
    obtaining, according to the decoding result, the translation content of the source content to be processed, the translation content being the target information of the first moment.
  4. A target information determining method, comprising:
    encoding to-be-processed text information to obtain a source vector representation sequence;
    obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment;
    determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector comprising a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and
    decoding the translation vector and the source context vector to obtain target information of the first moment.
  5. The method according to claim 4, wherein determining the first translation vector according to the source vector representation sequence and the source context vector comprises:
    obtaining a third translation vector according to the source vector representation sequence, the third translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment; and
    processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector.
  6. The method according to claim 4, wherein determining the first translation vector and the second translation vector according to the source vector representation sequence and the source context vector comprises:
    obtaining a third translation vector according to the source vector representation sequence, the third translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment;
    processing the third translation vector and the source context vector by using a preset neural network model to obtain the first translation vector; and
    obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, the second translation vector being used to update a fourth translation vector, the fourth translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at the first moment, and the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
  7. The method according to claim 4, wherein determining the second translation vector according to the source vector representation sequence and the source context vector comprises:
    obtaining the second translation vector according to the position at which the source context vector appears in the source vector representation sequence, the second translation vector being used to generate a fourth translation vector, the fourth translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at the first moment, and the fourth translation vector being obtained by processing the second translation vector and the source context vector with the preset neural network model.
  8. The method according to claim 5 or 6, wherein processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector comprises:
    subtracting the source context vector from the third translation vector by using a gated recurrent unit to obtain the first translation vector.
  9. The method according to claim 5 or 6, wherein processing the third translation vector and the source context vector by using the preset neural network model to obtain the first translation vector comprises:
    processing the third translation vector and the source context vector by using a gated recurrent unit to obtain an intermediate vector; and
    interpolating and combining the intermediate vector with the third translation vector to obtain the first translation vector.
  10. The method according to claim 4, wherein obtaining, according to the source vector representation sequence, the source context vector corresponding to the first moment comprises:
    determining an alignment probability of the source content according to a decoder state of the second moment, the second translation vector, a third translation vector, and vectors of the source content in the source vector representation sequence, the third translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the second moment; and
    determining the source context vector corresponding to the first moment according to the alignment probability of the source content and a semantic vector of the source content.
  11. The method according to claim 4, wherein before decoding the translation vector and the source context vector to obtain the target information of the first moment, the method further comprises: determining a decoder state of the first moment according to a decoder state of the second moment, target information of the second moment, the source context vector, and the translation vector; and
    correspondingly, decoding the translation vector and the source context vector to obtain the target information of the first moment comprises: decoding the decoder state of the first moment, the source context vector, and the translation vector to obtain the target information of the first moment.
  12. The method according to claim 10 or 11, wherein the method further comprises:
    obtaining a first indicator expected value according to the first translation vector and the third translation vector, the first indicator expected value being used to represent the semantic consistency between the change of the future translation vector and the target information of the first moment;
    obtaining a second indicator expected value according to the second translation vector and the fourth translation vector, the second indicator expected value being used to represent the semantic consistency between the change of the past translation vector and the target information of the first moment; and
    determining a training target according to the first indicator expected value and the second indicator expected value, the training target being used to construct a preset neural network model.
  13. A target information determining apparatus, comprising at least one memory and at least one processor, the at least one memory storing at least one instruction module configured to be executed by the at least one processor, the at least one instruction module comprising: an encoding module, configured to encode to-be-processed text information to obtain a source vector representation sequence; a first acquiring module, configured to obtain, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; a first determining module, configured to determine a translation vector according to the source vector representation sequence and the source context vector, the translation vector comprising a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and a decoding module, configured to decode the translation vector and the source context vector to obtain target information of the first moment.
  14. A target information determining apparatus, comprising a memory, a processor, and a bus system, the memory being configured to store a program, and the processor being configured to execute the program in the memory to perform the following steps: encoding to-be-processed text information to obtain a source vector representation sequence; obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment; determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector comprising a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and decoding the translation vector and the source context vector to obtain target information of the first moment; the bus system being configured to connect the memory and the processor so that the memory and the processor communicate with each other.
  15. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
  16. A target information determining method, performed by an electronic device, the method comprising:
    encoding to-be-processed text information to obtain a source vector representation sequence;
    obtaining, according to the source vector representation sequence, a source context vector corresponding to a first moment, the source context vector corresponding to the first moment being used to indicate the source content to be processed in the to-be-processed text information at the first moment;
    determining a translation vector according to the source vector representation sequence and the source context vector, the translation vector comprising a first translation vector and/or a second translation vector, the first translation vector being a vector corresponding to the source content not yet translated in the source vector representation sequence at the first moment, the second translation vector being a vector corresponding to the source content already translated in the source vector representation sequence at a second moment, and the second moment being the moment immediately preceding the first moment; and
    decoding the translation vector and the source context vector to obtain target information of the first moment.
PCT/CN2018/095231 2017-07-25 2018-07-11 Translation method, target information determining method, related apparatus, and storage medium WO2019019916A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2020503957A JP7025090B2 (ja) 2017-07-25 2018-07-11 翻訳方法、ターゲット情報決定方法および関連装置、ならびにコンピュータプログラム
KR1020207002392A KR102382499B1 (ko) 2017-07-25 2018-07-11 번역 방법, 타깃 정보 결정 방법, 관련 장치 및 저장 매체
EP18837956.4A EP3660707A4 (en) 2017-07-25 2018-07-11 TRANSLATION PROCEDURES, TARGET INFORMATION DETERMINATION PROCESS AND ASSOCIATED DEVICE AND STORAGE MEDIUM
US16/749,243 US11928439B2 (en) 2017-07-25 2020-01-22 Translation method, target information determining method, related apparatus, and storage medium
US18/390,153 US20240169166A1 (en) 2017-07-25 2023-12-20 Translation method, target information determining method, related apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710612833.7 2017-07-25
CN201710612833.7A CN107368476B (zh) 2017-07-25 2017-07-25 Translation method, target information determining method, and related apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/749,243 Continuation US11928439B2 (en) 2017-07-25 2020-01-22 Translation method, target information determining method, related apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2019019916A1 true WO2019019916A1 (zh) 2019-01-31

Family

ID=60306920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095231 WO2019019916A1 (zh) 2017-07-25 2018-07-11 翻译的方法、目标信息确定的方法及相关装置、存储介质

Country Status (6)

Country Link
US (2) US11928439B2 (zh)
EP (1) EP3660707A4 (zh)
JP (1) JP7025090B2 (zh)
KR (1) KR102382499B1 (zh)
CN (1) CN107368476B (zh)
WO (1) WO2019019916A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472255A (zh) * 2019-08-20 2019-11-19 腾讯科技(深圳)有限公司 Neural network machine translation method, model, electronic terminal, and storage medium
CN111209395A (zh) * 2019-12-27 2020-05-29 铜陵中科汇联科技有限公司 Short text similarity calculation system and training method thereof
US11710003B2 (en) 2018-02-26 2023-07-25 Tencent Technology (Shenzhen) Company Limited Information conversion method and apparatus, storage medium, and electronic device

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101328745B1 (ko) * 2011-06-13 2013-11-20 (주)우남케미칼 Pre-expanded particle production system
CN107368476B (zh) * 2017-07-25 2020-11-03 深圳市腾讯计算机系统有限公司 Translation method, target information determining method, and related apparatus
CN108363763B (zh) * 2018-02-05 2020-12-01 深圳市腾讯计算机系统有限公司 Automatic question answering method, apparatus, and storage medium
CN108197123A (zh) * 2018-02-07 2018-06-22 云南衍那科技有限公司 Smart-watch-based cloud translation system and method
CN110134971B (zh) 2018-02-08 2022-12-16 腾讯科技(深圳)有限公司 Machine translation method and device, and computer-readable storage medium
CN110489761B (zh) * 2018-05-15 2021-02-02 科大讯飞股份有限公司 Document-level text translation method and apparatus
CN108984539B (zh) * 2018-07-17 2022-05-17 苏州大学 Neural machine translation method based on translation information simulating future moments
CN109062897A (zh) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109274814B (zh) * 2018-08-20 2020-10-23 维沃移动通信有限公司 Message prompting method and apparatus, and terminal device
CN109271646B (zh) * 2018-09-04 2022-07-08 腾讯科技(深圳)有限公司 Text translation method and apparatus, readable storage medium, and computer device
CN109146064B (zh) * 2018-09-05 2023-07-25 腾讯科技(深圳)有限公司 Neural network training method and apparatus, computer device, and storage medium
CN109446534B (zh) * 2018-09-21 2020-07-31 清华大学 Machine translation method and apparatus
CN111428516B (zh) * 2018-11-19 2022-08-19 腾讯科技(深圳)有限公司 Information processing method and apparatus
CN109543199B (zh) * 2018-11-28 2022-06-10 腾讯科技(深圳)有限公司 Text translation method and related apparatus
CN109858045B (zh) * 2019-02-01 2020-07-10 北京字节跳动网络技术有限公司 Machine translation method and apparatus
CN109933809B (zh) * 2019-03-15 2023-09-15 北京金山数字娱乐科技有限公司 Translation method and apparatus, and translation model training method and apparatus
CN111783435A (zh) * 2019-03-18 2020-10-16 株式会社理光 Shared vocabulary selection method and apparatus, and storage medium
CN110110337B (zh) * 2019-05-08 2023-04-18 网易有道信息技术(北京)有限公司 Translation model training method, medium, apparatus, and computing device
CN111597829B (zh) * 2020-05-19 2021-08-27 腾讯科技(深圳)有限公司 Translation method and apparatus, storage medium, and electronic device
US11741317B2 (en) * 2020-05-25 2023-08-29 Rajiv Trehan Method and system for processing multilingual user inputs using single natural language processing model
CN114254660A (zh) * 2020-09-22 2022-03-29 北京三星通信技术研究有限公司 Multimodal translation method and apparatus, electronic device, and computer-readable storage medium
KR20220056004A (ko) * 2020-10-27 2022-05-04 삼성전자주식회사 Electronic apparatus and control method thereof
JPWO2022102364A1 (zh) * 2020-11-13 2022-05-19
KR20230135990A (ko) * 2022-03-17 2023-09-26 주식회사 아론티어 Retrosynthesis translation method using transformer and atomic environment, and apparatus for performing the same

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068998A (zh) * 2015-07-29 2015-11-18 百度在线网络技术(北京)有限公司 Translation method and apparatus based on neural network model
US20160179790A1 (en) * 2013-06-03 2016-06-23 National Institute Of Information And Communications Technology Translation apparatus, learning apparatus, translation method, and storage medium
CN106126507A (zh) * 2016-06-22 2016-11-16 哈尔滨工业大学深圳研究生院 Character-encoding-based deep neural translation method and system
CN106372058A (zh) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Deep-learning-based method and apparatus for extracting sentiment elements from short texts
CN106663092A (zh) * 2014-10-24 2017-05-10 谷歌公司 Neural machine translation system with rare word processing
CN107368476A (zh) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 Translation method, target information determining method, and related apparatus

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2612404C (en) * 2005-06-17 2014-05-27 National Research Council Of Canada Means and method for adapted language translation
US9471565B2 (en) * 2011-07-29 2016-10-18 At&T Intellectual Property I, L.P. System and method for locating bilingual web sites
SG11201404225WA (en) * 2012-01-27 2014-08-28 Nec Corp Term translation acquisition method and term translation acquisition apparatus
JP6296592B2 (ja) * 2013-05-29 2018-03-20 国立研究開発法人情報通信研究機構 Translation word order information output apparatus, machine translation apparatus, learning apparatus, translation word order information output method, learning method, and program
US9535960B2 (en) * 2014-04-14 2017-01-03 Microsoft Corporation Context-sensitive search using a deep learning model
US9846836B2 (en) * 2014-06-13 2017-12-19 Microsoft Technology Licensing, Llc Modeling interestingness with deep neural networks
US9778929B2 (en) * 2015-05-29 2017-10-03 Microsoft Technology Licensing, Llc Automated efficient translation context delivery
CN106484682B (zh) * 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 Statistics-based machine translation method and apparatus, and electronic device
CN106484681B (zh) * 2015-08-25 2019-07-09 阿里巴巴集团控股有限公司 Method and apparatus for generating candidate translations, and electronic device
CN106672058A (zh) * 2015-11-08 2017-05-17 匡桃红 Shake-free folding joint for a baby stroller
US9792534B2 (en) * 2016-01-13 2017-10-17 Adobe Systems Incorporated Semantic natural language vector space
US11449744B2 (en) * 2016-06-23 2022-09-20 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
US10706351B2 (en) * 2016-08-30 2020-07-07 American Software Safety Reliability Company Recurrent encoder and decoder
US20180197080A1 (en) * 2017-01-11 2018-07-12 International Business Machines Corporation Learning apparatus and method for bidirectional learning of predictive model based on data sequence
KR102637338B1 (ko) * 2017-01-26 2024-02-16 삼성전자주식회사 Translation correction method and apparatus, and translation system
US10839790B2 (en) * 2017-02-06 2020-11-17 Facebook, Inc. Sequence-to-sequence convolutional architecture
CN107092664B (zh) * 2017-03-30 2020-04-28 华为技术有限公司 Content interpretation method and apparatus
KR102329127B1 (ko) * 2017-04-11 2021-11-22 삼성전자주식회사 Method and apparatus for converting dialect into standard language
US10733380B2 (en) * 2017-05-15 2020-08-04 Thomson Reuters Enterprise Center Gmbh Neural paraphrase generator
CN108959312B (zh) * 2017-05-23 2021-01-29 华为技术有限公司 Multi-document summary generation method, apparatus, and terminal
CN110309839B (zh) * 2019-08-27 2019-12-03 北京金山数字娱乐科技有限公司 Image description method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179790A1 (en) * 2013-06-03 2016-06-23 National Institute Of Information And Communications Technology Translation apparatus, learning apparatus, translation method, and storage medium
CN106663092A (zh) 2014-10-24 2017-05-10 谷歌公司 Neural machine translation system with rare word processing
CN105068998A (zh) 2015-07-29 2015-11-18 百度在线网络技术(北京)有限公司 Translation method and apparatus based on neural network model
CN106126507A (zh) 2016-06-22 2016-11-16 哈尔滨工业大学深圳研究生院 Character-encoding-based deep neural translation method and system
CN106372058A (zh) 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Deep-learning-based method and apparatus for extracting sentiment elements from short texts
CN107368476A (zh) 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 Translation method, target information determining method, and related apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3660707A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11710003B2 (en) 2018-02-26 2023-07-25 Tencent Technology (Shenzhen) Company Limited Information conversion method and apparatus, storage medium, and electronic device
CN110472255A (zh) * 2019-08-20 2019-11-19 腾讯科技(深圳)有限公司 Neural network machine translation method, model, electronic terminal, and storage medium
CN110472255B (zh) * 2019-08-20 2021-03-02 腾讯科技(深圳)有限公司 Neural network machine translation method, model, electronic terminal, and storage medium
CN111209395A (zh) * 2019-12-27 2020-05-29 铜陵中科汇联科技有限公司 Short text similarity calculation system and training method thereof
CN111209395B (zh) * 2019-12-27 2022-11-11 铜陵中科汇联科技有限公司 Short text similarity calculation system and training method thereof

Also Published As

Publication number Publication date
US11928439B2 (en) 2024-03-12
JP2020528625A (ja) 2020-09-24
EP3660707A1 (en) 2020-06-03
CN107368476A (zh) 2017-11-21
CN107368476B (zh) 2020-11-03
KR20200019740A (ko) 2020-02-24
JP7025090B2 (ja) 2022-02-24
KR102382499B1 (ko) 2022-04-01
EP3660707A4 (en) 2020-08-05
US20200226328A1 (en) 2020-07-16
US20240169166A1 (en) 2024-05-23

Similar Documents

Publication Publication Date Title
WO2019019916A1 (zh) Translation method, target information determining method, related apparatus, and storage medium
CN110309514B (zh) Semantic recognition method and apparatus
CN107836000B (zh) Improved artificial neural network method and electronic device for language modeling and prediction
WO2019154210A1 (zh) Machine translation method and device, and computer-readable storage medium
CN112487182A (zh) Text processing model training method, and text processing method and apparatus
CN113657399A (zh) Character recognition model training method, and character recognition method and apparatus
CN110162766B (zh) Word vector updating method and apparatus
WO2018232699A1 (zh) Information processing method and related apparatus
CN110457661B (zh) Natural language generation method, apparatus, device, and storage medium
CN111144140B (zh) Zero-shot-learning-based method and apparatus for generating Chinese-Thai bilingual corpus
US20220148239A1 (en) Model training method and apparatus, font library establishment method and apparatus, device and storage medium
CN111738020B (zh) Translation model training method and apparatus
WO2022100481A1 (zh) Text information translation method and apparatus, electronic device, and storage medium
US11710003B2 (en) Information conversion method and apparatus, storage medium, and electronic device
US20230215203A1 (en) Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium
CN110020440B (zh) Machine translation method, apparatus, server, and storage medium
JP2023002690A (ja) Semantic recognition method and apparatus, electronic device, and storage medium
JP2023072022A (ja) Multimodal representation model training method, and cross-modal retrieval method and apparatus
WO2020155769A1 (zh) Modeling method and apparatus for keyword generation model
CN115312034A (zh) Method, apparatus, and device for processing speech signals based on automata and tries
CN114580444A (zh) Text translation model training method, device, and storage medium
CN113947091A (zh) Method, device, apparatus, and medium for language translation
CN111178097B (zh) Method and apparatus for generating Chinese-Thai bilingual corpus based on multi-level translation model
CN117034951A (zh) Large-language-model-based digital human with a specific language style
CN108874786B (zh) Machine translation method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18837956

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20207002392

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020503957

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018837956

Country of ref document: EP

Effective date: 20200225