WO2018082569A1 - Sequence conversion method and apparatus (序列转换方法及装置)

Sequence conversion method and apparatus

Info

Publication number
WO2018082569A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
candidate target
probability value
source
intermediate state
Prior art date
Application number
PCT/CN2017/108950
Other languages
English (en)
French (fr)
Inventor
涂兆鹏
尚利峰
刘晓华
李航
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17868103.7A (EP3534276A4)
Publication of WO2018082569A1
Priority to US16/396,172 (US11132516B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/42 Data-driven translation
    • G06F 40/44 Statistical methods, e.g. probability models
    • G06F 40/51 Translation evaluation
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Definitions

  • the present invention relates to computer technology, and in particular, to a sequence conversion method and apparatus.
  • Sequence-to-sequence learning is a learning process that maps a source sequence to a target sequence.
  • the results of sequence-to-sequence learning are mainly used for sequence conversion.
  • Typical application scenarios of sequence conversion include machine translation, speech recognition, dialogue systems (conversational agents), automatic summarization, question answering, image caption generation, and so on.
  • a typical sequence conversion method involves two phases: an encoding phase and a decoding phase.
  • In the encoding phase, the source sequence is generally converted into a source vector representation sequence by a Recurrent Neural Network (RNN), and the source vector representation sequence is then converted into source context vectors by an attention mechanism. Specifically, a part of the source sequence is selected each time and converted into one source context vector, so the source sequence can be converted into multiple source context vectors (one source context vector is generated for each target word in the decoding phase).
  • The decoding stage generates the target sequence one target word at a time: in each step, the decoder uses the current source context vector obtained from the encoding stage, the decoder intermediate state of the previous step, and the target word generated in the previous step to calculate the current decoder intermediate state, and then predicts the target word of the current step from the current intermediate state and the source context vector.
  • the RNN can also be used for processing in the decoding stage.
  • Because the RNN mainly refers to the target-side context that has already been predicted and uses the source-side context vector only as an additional input, the information corresponding to the current source-side context vector may not be correctly transmitted to the corresponding target-side context. This leads to frequent under-translation and over-translation, so the predicted target sequence cannot faithfully reflect the information of the source sequence.
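  • For reference, a standard formulation of the attention-based decoding step described above can be written as follows (a generic sketch using the symbols defined later in this document, not the patent's own numbered equations):

        s_i = f(s_{i-1}, y_{i-1}, c_i)                 % update of the decoder intermediate state
        P(y_i \mid y_{<i}, x) = g(s_i, y_{i-1}, c_i)   % generation of the current target word
        c_i = \sum_{j=1}^{J} \alpha_{i,j} h_j          % attention-weighted source context vector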
  • the embodiments of the present invention provide a sequence conversion method and apparatus, which can improve the accuracy of a target end sequence relative to a source end sequence when performing sequence conversion.
  • a first aspect of the present invention provides a sequence conversion method comprising:
  • When the translation probability value of a candidate target sequence is adjusted, a preset adjustment factor may be used directly, or a preset adjustment algorithm may be used. Using an adjustment factor improves the processing efficiency of the system, while an adjustment algorithm improves the accuracy of the adjusted translation probability value.
  • the word vectorization technique may be used to convert the source sequence into a source vector representation sequence.
  • Acquiring the at least two candidate target sequences according to the source vector representation sequence includes:
  • the adjusting the translation probability values of each candidate target end sequence includes:
  • the respective translation probability values are adjusted based on the decoded intermediate state sequence of each of the candidate target sequences.
  • Because the decoding intermediate state sequence can represent, to a certain extent, the translation accuracy of the corresponding candidate target sequence, adjusting the translation probability value according to the decoding intermediate state sequence improves the accuracy of the adjusted translation probability value and thereby the accuracy of the final target sequence.
  • the at least two candidate target end sequences comprise a first candidate target end sequence, where the first candidate target end sequence is any one of the at least two candidate target end sequences;
  • the adjusting the translation probability values based on the decoded intermediate state sequence of each candidate target sequence includes:
  • Because the decoding intermediate state sequence can represent, to a certain extent, the translation accuracy of the corresponding candidate target sequence, adjusting the translation probability value according to the decoding intermediate state sequence improves the accuracy of the adjusted translation probability value and thereby the accuracy of the final target sequence.
  • Acquiring the reconstruction probability value of the first candidate target sequence based on the decoding intermediate state sequence of the first candidate target sequence includes:
  • obtaining a reconstruction probability value of the first candidate target sequence based on a reverse attention mechanism, where the input of the reverse attention mechanism is the decoded intermediate state sequence of the first candidate target sequence and the output of the reverse attention mechanism is the reconstruction probability value of the first candidate target sequence.
  • Acquiring the reconstruction probability value of the first candidate target sequence based on the reverse attention mechanism includes:
  • g_R(·) is the Softmax function;
  • e_{j,k} is the inverse attention mechanism score of the elements in the source sequence, obtained by the following function:
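  • A plausible reconstruction of these expressions from the symbol definitions above is given below. The patent's numbered equations are rendered as images and are not reproduced in this text, so both the exact factorization and the additive form of the score (including how the parameters γ1, γ2 and γ3 enter, and the auxiliary reconstructor state ĥ_{j-1}) are assumptions:

        R(x_1, ..., x_J \mid s_1, ..., s_I) = \prod_{j=1}^{J} g_R(x_j \mid x_{<j}, \hat{c}_j)
        \hat{c}_j = \sum_{k=1}^{I} \hat{\alpha}_{j,k} \, s_k, \qquad \hat{\alpha}_{j,k} = \frac{\exp(e_{j,k})}{\sum_{k'=1}^{I} \exp(e_{j,k'})}
        e_{j,k} = \gamma_1^{\top} \tanh(\gamma_2 \, s_k + \gamma_3 \, \hat{h}_{j-1})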
  • The source sequence is reconstructed from the decoded intermediate state sequence, and the corresponding reconstruction probability value is obtained according to how well the source sequence is reconstructed. The reconstruction probability value therefore also reflects the accuracy of the candidate target sequence, so adjusting the translation probability value according to the reconstruction probability value ensures the accuracy of the adjusted translation probability value and thereby improves the accuracy of the output target sequence.
  • the parameters γ1, γ2 and γ3 are acquired by an end-to-end learning algorithm.
  • the parameters γ1, γ2 and γ3 are obtained by training as follows:
  • θ and γ are neural network parameters that need to be acquired through training;
  • γ represents the parameter γ1, γ2 or γ3;
  • N is the number of training sequence pairs in the training sequence set;
  • X_n is the source sequence in the training sequence pair;
  • Y_n is the target sequence in the training sequence pair;
  • s_n is the decoded intermediate state sequence obtained when X_n is converted into Y_n;
  • the remaining coefficient is the linear interpolation weight.
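  • A plausible form of this training objective, assembled from the symbol definitions above (the equation itself is rendered as an image in the source and is not reproduced here, so the exact form and the symbol λ for the linear interpolation coefficient are assumptions):

        (\theta^{*}, \gamma^{*}) = \arg\max_{\theta, \gamma} \sum_{n=1}^{N} \Big\{ \log P(Y_n \mid X_n; \theta) + \lambda \log R(X_n \mid s_n; \theta, \gamma) \Big\}

    The first term is the likelihood part that evaluates fluency, and the second is the reconstruction part that evaluates faithfulness, matching the two-part description given later for function (8).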
  • Adjusting the translation probability value of the first candidate target sequence based on the reconstruction probability value of the first candidate target sequence includes:
  • the translation probability value and the reconstruction probability value of the first candidate target end sequence are summed using linear interpolation to obtain an adjusted translation probability value of the first candidate target end sequence.
  • Both the translation probability value and the reconstruction probability value reflect, to a certain extent, the accuracy of the corresponding candidate target sequence relative to the source sequence. Linear interpolation balances the two well, so that the adjusted translation probability value better reflects the accuracy of the corresponding candidate target sequence and the output target sequence better matches the source sequence.
  • the source sequence is one natural language text or a text file obtained based on the one natural language text;
  • the target end sequence is another natural language text or a text file obtained based on the other natural language text
  • or, the source sequence is human voice content or a voice data file obtained based on the human voice content, and the target sequence is the natural language text corresponding to the voice content or a text file obtained based on that natural language text;
  • the source sequence is a voice content file of a human or a voice data file obtained based on the voice content of the human
  • the target end sequence is a voice reply of the voice content of the human or a voice data file obtained based on the voice reply.
  • the source end sequence is a natural language text to be abstracted
  • the target end sequence is a summary of a natural language text to be abstracted
  • the abstract is a natural language text or a text file obtained based on the natural language text
  • the source sequence is an image or an image data file obtained based on the image, the target end sequence being a natural language caption of the image or a text file obtained based on the natural language caption.
  • a second aspect of the present invention provides a sequence conversion apparatus comprising:
  • a receiving unit configured to receive a source sequence
  • a converting unit configured to convert the source end sequence into a source end vector representation sequence
  • an acquiring unit configured to acquire at least two candidate target end sequences according to the source end vector representation sequence, and a translation probability value of each candidate target end sequence of the at least two candidate target end sequences
  • An adjusting unit configured to adjust a translation probability value of each candidate target end sequence
  • a selecting unit configured to select an output target end sequence from the at least two candidate target end sequences according to the adjusted translation probability value of each candidate target end sequence
  • an output unit configured to output the output target sequence.
  • When the translation probability value of a candidate target sequence is adjusted, a preset adjustment factor may be used directly, or a preset adjustment algorithm may be used. Using an adjustment factor improves the processing efficiency of the system, while an adjustment algorithm improves the accuracy of the adjusted translation probability value.
  • the word vectorization technique may be used to convert the source sequence into a source vector representation sequence.
  • the acquiring unit is specifically configured to: acquire at least two source context vectors according to the source vector representation sequence based on an attention mechanism; acquire a decoded intermediate state sequence for each of the at least two source context vectors; and acquire a candidate target sequence for each of the at least two decoded intermediate state sequences;
  • the adjusting unit is specifically configured to adjust respective translation probability values based on the decoded intermediate state sequence of each candidate target sequence.
  • Because the decoding intermediate state sequence can represent, to a certain extent, the translation accuracy of the corresponding candidate target sequence, adjusting the translation probability value according to the decoding intermediate state sequence improves the accuracy of the adjusted translation probability value and thereby the accuracy of the final target sequence.
  • the at least two candidate target end sequences include a first candidate target end sequence, where the first candidate target end sequence is any one of the at least two candidate target end sequences;
  • the adjustment unit includes:
  • an obtaining subunit, configured to acquire, according to the decoded intermediate state sequence of the first candidate target sequence, a reconstruction probability value of the first candidate target sequence;
  • an adjusting subunit, configured to adjust a translation probability value of the first candidate target end sequence based on the reconstruction probability value of the first candidate target end sequence.
  • the obtaining subunit is specifically configured to:
  • obtaining a reconstruction probability value of the first candidate target sequence based on a reverse attention mechanism, where the input of the reverse attention mechanism is the decoded intermediate state sequence of the first candidate target sequence and the output of the reverse attention mechanism is the reconstruction probability value of the first candidate target sequence.
  • the reconstructed probability value obtaining subunit is specifically configured to:
  • g_R(·) is the Softmax function;
  • e_{j,k} is the inverse attention mechanism score of the elements in the source sequence, obtained by the following function:
  • The source sequence is reconstructed from the decoded intermediate state sequence, and the corresponding reconstruction probability value is obtained according to how well the source sequence is reconstructed. The reconstruction probability value therefore also reflects the accuracy of the candidate target sequence, so adjusting the translation probability value according to the reconstruction probability value ensures the accuracy of the adjusted translation probability value and thereby improves the accuracy of the output target sequence.
  • the apparatus further includes a training unit, configured to acquire the parameters γ1, γ2 and γ3 through end-to-end learning algorithm training.
  • the training unit is specifically configured to acquire the parameters γ1, γ2 and γ3 by training with the following function:
  • θ and γ are neural network parameters that need to be acquired through training;
  • γ represents the parameter γ1, γ2 or γ3;
  • N is the number of training sequence pairs in the training sequence set;
  • X_n is the source sequence in the training sequence pair;
  • Y_n is the target sequence in the training sequence pair;
  • s_n is the decoded intermediate state sequence obtained when X_n is converted into Y_n;
  • the remaining coefficient is the linear interpolation weight.
  • the adjusting subunit is specifically configured to:
  • the translation probability value and the reconstruction probability value of the first candidate target end sequence are summed using linear interpolation to obtain an adjusted translation probability value of the first candidate target end sequence.
  • Both the translation probability value and the reconstruction probability value reflect, to a certain extent, the accuracy of the corresponding candidate target sequence relative to the source sequence. Linear interpolation balances the two well, so that the adjusted translation probability value better reflects the accuracy of the corresponding candidate target sequence and the output target sequence better matches the source sequence.
  • the source sequence is a natural language text or a text file obtained based on the one natural language text
  • the target sequence is another natural language text or a text file obtained based on the other natural language text;
  • or, the source sequence is human voice content or a voice data file obtained based on the human voice content, and the target sequence is the natural language text corresponding to the voice content or a text file obtained based on that natural language text;
  • the source sequence is a voice content file of a human or a voice data file obtained based on the voice content of the human
  • the target end sequence is a voice reply of the voice content of the human or a voice data file obtained based on the voice reply.
  • the source end sequence is a natural language text to be abstracted
  • the target end sequence is a summary of a natural language text to be abstracted
  • the abstract is a natural language text or a text file obtained based on the natural language text
  • the source sequence is an image or an image data file obtained based on the image, the target end sequence being a natural language caption of the image or a text file obtained based on the natural language caption.
  • a third aspect of the present invention provides a sequence conversion apparatus including a processor and a memory, the memory storing executable instructions for instructing the processor to perform the following steps:
  • the output target sequence is output.
  • When the translation probability value of a candidate target sequence is adjusted, a preset adjustment factor may be used directly, or a preset adjustment algorithm may be used. Using an adjustment factor improves the processing efficiency of the system, while an adjustment algorithm improves the accuracy of the adjusted translation probability value.
  • the word vectorization technique may be used to convert the source sequence into a source vector representation sequence.
  • the processor is configured to perform the following steps when acquiring at least two candidate target end sequences according to the source end vector representation sequence:
  • the processor is configured to perform the following steps when adjusting the translation probability value of each candidate target end sequence:
  • the respective translation probability values are adjusted based on the decoded intermediate state sequence of each of the candidate target sequences.
  • Because the decoding intermediate state sequence can represent, to a certain extent, the translation accuracy of the corresponding candidate target sequence, adjusting the translation probability value according to the decoding intermediate state sequence improves the accuracy of the adjusted translation probability value and thereby the accuracy of the final target sequence.
  • the at least two candidate target end sequences comprise a first candidate target end sequence, where the first candidate target end sequence is any one of the at least two candidate target end sequences;
  • the processor is configured to perform the following steps when adjusting the respective translation probability values based on the decoded intermediate state sequence of each candidate target sequence:
  • the processor is configured to perform the following steps when converting the source sequence into a source context vector sequence:
  • the processor is configured to perform the following steps when acquiring the reconstruction probability value of the first candidate target sequence based on the decoding intermediate state of the first candidate target sequence:
  • obtaining a reconstruction probability value of the first candidate target sequence based on a reverse attention mechanism, where the input of the reverse attention mechanism is the decoded intermediate state sequence of the first candidate target sequence and the output of the reverse attention mechanism is the reconstruction probability value of the first candidate target sequence.
  • when acquiring the reconstruction probability value of the first candidate target sequence based on the reverse attention mechanism, the processor is configured to perform the following steps:
  • g_R(·) is the Softmax function;
  • e_{j,k} is the inverse attention mechanism score of the elements in the source sequence, obtained by the following function:
  • The source sequence is reconstructed from the decoded intermediate state sequence, and the corresponding reconstruction probability value is obtained according to how well the source sequence is reconstructed. The reconstruction probability value therefore also reflects the accuracy of the candidate target sequence, so adjusting the translation probability value according to the reconstruction probability value ensures the accuracy of the adjusted translation probability value and thereby improves the accuracy of the output target sequence.
  • the executable instructions are further used to instruct the processor to perform the following step: acquiring the parameters γ1, γ2 and γ3 through end-to-end learning algorithm training.
  • the parameters γ1, γ2 and γ3 are acquired by training with the following function:
  • θ and γ are neural network parameters that need to be acquired through training;
  • γ represents the parameter γ1, γ2 or γ3;
  • N is the number of training sequence pairs in the training sequence set;
  • X_n is the source sequence in the training sequence pair;
  • Y_n is the target sequence in the training sequence pair;
  • s_n is the decoded intermediate state sequence obtained when X_n is converted into Y_n;
  • the remaining coefficient is the linear interpolation weight.
  • when adjusting the translation probability value of the first candidate target sequence based on the reconstruction probability value of the first candidate target sequence, the processor is configured to perform the following steps:
  • the translation probability value and the reconstruction probability value of the first candidate target end sequence are summed using linear interpolation to obtain an adjusted translation probability value of the first candidate target end sequence.
  • Linear interpolation balances the two well, so that the adjusted translation probability value better reflects the accuracy of the corresponding candidate target sequence and the output target sequence is better aligned with the source sequence.
  • the source sequence is a natural language text or a text file obtained based on the one natural language text
  • the target sequence is another natural language text or a text file obtained based on the other natural language text;
  • or, the source sequence is human voice content or a voice data file obtained based on the human voice content, and the target sequence is the natural language text corresponding to the voice content or a text file obtained based on that natural language text;
  • the source sequence is a voice content file of a human or a voice data file obtained based on the voice content of the human
  • the target end sequence is a voice reply of the voice content of the human or a voice data file obtained based on the voice reply.
  • the source end sequence is a natural language text to be abstracted
  • the target end sequence is a summary of a natural language text to be abstracted
  • the abstract is a natural language text or a text file obtained based on the natural language text
  • the source sequence is an image or an image data file obtained based on the image, the target end sequence being a natural language caption of the image or a text file obtained based on the natural language caption.
  • a fourth aspect of the present invention provides a sequence conversion system including an input interface, an output interface, and any one of the sequence conversion apparatuses provided by the second aspect of the present invention, a possible implementation of the second aspect, the third aspect, or a possible implementation of the third aspect;
  • the input interface is configured to receive source data and convert the source data into the source sequence
  • the output interface is configured to output an output target sequence output by the sequence conversion device.
  • the input interface and the output interface may be different according to the specific manifestation of the sequence conversion system.
  • the input interface may be a network interface
  • the source data comes from a client; the source data may be a voice data file, an image data file, a text file, or the like acquired by the client; correspondingly, the output interface may also be the foregoing network interface, for outputting the output target sequence to the client.
  • the input interface may be different according to the type of the source data required.
  • the input interface may be a keyboard, a mouse, a touch screen, or a tablet.
  • the output interface can be a network interface or a display interface.
  • the input interface may be a sound collection device such as a microphone, and the output interface may be a speaker, a network interface or a display interface (specifically, depending on the presentation form of the output target sequence).
  • the input interface may be an image acquisition device such as a camera, and the output interface may be a network interface or a display interface.
  • a fifth aspect of the invention provides a computer storage medium for storing executable instructions that, when executed, can implement any one of the first aspect and the possible implementations of the first aspect.
  • The translation probability value of each candidate target sequence is adjusted so that the adjusted translation probability value better reflects the degree of coincidence between the target sequence and the source sequence. When the output target sequence is selected based on the adjusted translation probability values, the selected output target sequence is therefore more consistent with the source sequence and more faithful to it, which improves the accuracy of the target sequence relative to the source sequence.
  • Figure 1 is a structural view of a computer of the present invention
  • Figure 3 is a structural view of a neuron of the present invention.
  • FIG. 5 is a flowchart of an adjustment method according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a sequence conversion in a sequence conversion method according to an embodiment of the present invention.
  • FIG. 7 is a structural diagram of a sequence conversion apparatus according to an embodiment of the present invention.
  • FIG. 8 is a structural diagram of an adjustment unit according to an embodiment of the present invention.
  • FIG. 9 is a structural diagram of a sequence conversion apparatus according to another embodiment of the present invention.
  • FIG. 10 is a structural diagram of a sequence conversion apparatus according to another embodiment of the present invention.
  • Figure 11 is a block diagram of a sequence conversion system according to an embodiment of the present invention.
  • the embodiment of the invention provides a sequence conversion method, which can be applied to any application scenario that requires sequence conversion, such as machine translation, speech recognition, dialogue system, automatic digest, automatic question and answer, and image description text generation.
  • Machine translation (often abbreviated as MT) belongs to the field of computational linguistics; it studies the use of computer programs to translate text or speech from one natural language into another. Natural language usually refers to languages that evolve naturally with culture, such as English, Chinese, French, Spanish, and Japanese, all of which are natural languages.
  • the input natural language text can be manually input by a user through a keyboard, a mouse, a touch screen, a tablet, or the like, or can be input by a remote device through a network; the output natural language text can be directly presented through the display screen. It can also be output to the remote device through the network and presented by the remote device.
  • Speech recognition is the conversion of human speech content into natural language text through a computer.
  • the input human voice content can be input by a sound collection device such as a microphone, or can be input by a remote device and input through a network;
  • the natural language text can be presented directly through the display screen, or it can be output to the remote device through the network for presentation by the remote device.
  • the dialogue system is a voice conversation with a human being through a computer.
  • the input is human speech content
  • the output is a voice response corresponding to the input speech content.
  • the input human voice content can be input through a sound collection device such as a microphone, or input by a remote device through the network; the output voice response can be presented directly through a speaker or the like, or output to the remote device through the network and presented by the remote device.
  • Automatic question answering is the answering of questions posed in human language by means of a computer.
  • the input and output of the automatic question and answer can be similar to the dialogue system, and will not be described here.
  • Automatic summarization is the generation, by a computer, of the gist of a piece of natural language text; it is usually used to provide abstracts of articles in known fields, such as generating a summary of a newspaper article.
  • The input piece of natural language text can be entered manually by the user through a keyboard, mouse, touch screen, tablet, or the like, input by a remote device through the network, or obtained through optical character recognition (OCR). The output summary consists of natural language text, which can be presented directly through the display screen or output to a remote device through the network for presentation by the remote device.
  • Image caption text generation is the production, by a computer, of a caption describing an image.
  • the input image can be input by a remote device through the network, or captured by an image acquisition device such as a camera; the output caption text is composed of natural language characters and can be displayed directly through the display screen or output through the network to a remote device, which presents it.
  • FIG. 1 depicts a structure of a computer 100 including at least one processor 101, at least one network interface 104, a memory 105, and at least one communication bus 102 for implementing connection and communication between these components.
  • the processor 101 is configured to execute an executable module stored in the memory 105 to implement the sequence conversion method of the present invention, wherein the executable module may be a computer program.
  • the computer 100 may further include at least one input interface 106 and at least one output interface 107 according to the role of the computer 100 in the system and the application scenario of the sequence conversion method.
  • The input interface 106 and the output interface 107 may differ according to the application scenario of the sequence conversion method. For example, when the sequence conversion method is applied to machine translation, if the input natural language text is entered manually by a user through a manual input device such as a keyboard, mouse, touch screen, or tablet, the input interface 106 needs to include an interface for communicating with such a manual input device; if the output natural language text is presented directly through the display screen, the output interface 107 needs to include an interface for communicating with the display screen.
  • the input interface 106 needs to include an interface for communicating with a sound collection device such as a microphone; if the output natural language text is presented directly through the display screen, the output interface 107 needs to include an interface for communicating with the display screen.
  • the input interface 106 needs to include an interface for communicating with a sound collection device such as a microphone;
  • if the output voice response is presented directly by a sound emitting device such as a speaker, the output interface 107 needs to include an interface for communicating with such a sound emitting device.
  • the input interface 106 needs to include an interface for communicating with a sound collection device such as a microphone; if the output voice reply is presented directly through a playback device such as a speaker, the output interface 107 needs to include an interface for communicating with such a playback device.
  • if the input natural language text is entered manually by the user through a manual input device such as a keyboard, mouse, touch screen, or tablet, the input interface 106 needs to include an interface for communicating with such a device.
  • the input interface 106 needs to include an interface for communicating with an image acquisition device such as a camera; if the output caption text is presented directly through the display screen, the output interface 107 needs to include an interface for communicating with the display screen.
  • FIG. 2 is a flowchart of a sequence conversion method according to an embodiment of the present invention.
  • the method may be implemented based on a recurrent neural network technology. As shown in FIG. 2, the method includes:
  • the source sequence may vary according to the application scenario.
  • When applied to machine translation, the source sequence is a natural language text, which can be a phrase, a sentence, or even a paragraph of text.
  • When applied to speech recognition, the source sequence is a piece of human speech content.
  • When applied to a dialogue system, the source sequence is a piece of human speech content.
  • When applied to automatic question answering, the source sequence is a piece of human speech content.
  • When applied to automatic summarization, the source sequence is a piece of natural language text to be summarized.
  • When applied to image caption generation, the source sequence is the image for which a caption is to be generated.
  • the source sequence is represented by (x_1, x_2, x_3, ..., x_J), where J is the number of elements in the source sequence.
  • The number of elements (source vector representations) in the source vector representation sequence may be one or more, and the specific number may differ according to the source sequence and the conversion algorithm used.
  • the source sequence (x_1, x_2, x_3, ..., x_J) may be converted into a source vector representation sequence (h_1, h_2, h_3, ..., h_J) by a Recurrent Neural Network (RNN).
  • the word vectorization technique can be used to convert the source sequence into a source vector representation sequence.
  • When the application scenario is machine translation, word vectorization may be adopted to convert the source sequence into the source vector representation sequence. Word vectorization refers to learning from a large amount of text (without annotation): the semantics of each word are learned automatically from its context, and each word is then mapped to a representation in the form of a real-valued vector.
  • Each word is converted into a source vector representation through its word vector; for example, vec(China) minus vec(Beijing) is approximately equal to vec(UK) minus vec(London). Based on word vectors, a sentence, or even a paragraph, can be mapped to a vector.
  • the source sequence can also be converted into a source vector representation sequence in a similar manner to the above word vector.
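  • As an illustration of the word-vector arithmetic described above, a minimal sketch follows; the vectors and vocabulary are toy values chosen for the example, not values from the patent:

        import numpy as np

        # Toy word vectors; in practice these are learned from large amounts of unlabeled text.
        word_vec = {
            "China":   np.array([0.9, 0.1, 0.3]),
            "Beijing": np.array([0.7, 0.2, 0.8]),
            "UK":      np.array([0.8, 0.6, 0.2]),
            "London":  np.array([0.6, 0.7, 0.7]),
        }

        # The analogy mentioned in the text: vec(China) - vec(Beijing) ~ vec(UK) - vec(London).
        diff_cn = word_vec["China"] - word_vec["Beijing"]
        diff_uk = word_vec["UK"] - word_vec["London"]

        # A sentence can likewise be mapped to vectors, e.g. by stacking its word vectors
        # into a source vector representation sequence of shape (J, d).
        sentence = ["China", "Beijing"]
        source_repr_sequence = np.stack([word_vec[w] for w in sentence])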
  • the candidate target end sequence and the translation probability value of the candidate target end sequence can be obtained through the RNN.
  • Specifically, at least two source-side context vectors may be obtained from the source vector representation sequence based on the attention mechanism; a decoded intermediate state sequence is acquired for each of the at least two source-side context vectors; and a candidate target sequence is acquired for each of the at least two decoded intermediate state sequences.
  • the source vector representation sequence (h_1, h_2, h_3, ..., h_J) is converted into the source-side context vectors (c_1, c_2, ..., c_I) based on the attention mechanism.
  • Each vector in the source vector representation sequence (h_1, h_2, h_3, ..., h_J) is given a weight by the attention mechanism (the weights can be learned automatically from the training corpus). A weight represents the alignment probability between a vector in the source vector representation sequence and the target word to be generated, and the sequence of vectors obtained by weighting the vectors in the source vector representation sequence is the source context vector sequence.
  • the values of I and J can be different.
  • Alternatively, the last vector in the source vector representation sequence may be used as the source context vector; or, according to specific needs, the attention mechanism may be used to summarize the weighted sum of all vectors in the source vector representation sequence at different decoding moments into a source context vector; the source sequence can also be summarized into a source context vector using a Convolutional Neural Network.
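  • A minimal sketch of the attention-weighting step described above, assuming the per-step alignment scores are already available (illustrative only; the score computation itself is not shown):

        import numpy as np

        def attention_context(h, scores):
            # h: source vector representation sequence, shape (J, d)
            # scores: alignment scores for the current decoding step, shape (J,)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()      # alignment probabilities for this decoding step
            return weights @ h            # weighted sum: one source context vector c_i

        h = np.random.randn(5, 16)        # J = 5 source vector representations of dimension 16
        c_i = attention_context(h, np.random.randn(5))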
  • the process of acquiring the target end word in the specific candidate target sequence may include the following two steps:
  • the decoding intermediate state is a summary of past translation information, and can be updated by using a general RNN model.
  • the input of the update step includes the current source context vector (c_i), the previous decoding intermediate state (s_{i-1}), and the previous target word (y_{i-1}); the output of the update step includes the current decoding intermediate state (s_i).
  • an initial decoding intermediate state may be preset; the information it carries can be empty (i.e., all zeros) or can be preset information.
  • the input of the generating step includes the current source context vector (c_i), the current decoding intermediate state (s_i), and the previous target word (y_{i-1}); the output of the generating step includes the current target word (y_i) and its translation probability value.
  • the translation probability value indicates the degree of coincidence between the corresponding target word and the source context vector. Depending on the specific algorithm, a higher translation probability value may be better, a lower one may be better, or a value closer to a preset value may be better.
  • an initial target word may be preset; the information it carries can be empty (i.e., all zeros) or can be preset information.
  • The translation probability values corresponding to the target words may be multiplied directly to obtain the translation probability value of the target sequence; or the product of the translation probability values of the target words may be computed first and then normalized by the number of target words, with the normalized value used as the translation probability value of the target sequence.
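  • A sketch of the two scoring variants just described (plain product versus length-normalized product of per-word translation probability values); the numbers are illustrative:

        import math

        def sequence_translation_prob(word_probs, normalize=False):
            # Combine per-target-word translation probability values into one sequence-level value.
            log_prob = sum(math.log(p) for p in word_probs)
            if normalize:
                log_prob /= len(word_probs)   # normalize by the number of target words
            return math.exp(log_prob)

        word_probs = [0.9, 0.8, 0.95]
        print(sequence_translation_prob(word_probs))                  # plain product
        print(sequence_translation_prob(word_probs, normalize=True))  # length-normalized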
  • The representation form of an element in the target sequence may differ from the representation form of the target sequence itself; for example, an element may be represented as a word vector corresponding to one target word or a set of target words.
  • the specific target end sequence acquisition process may be different according to the application scenario.
  • the corresponding mainstream common technologies may be used in each application scenario, and the present invention does not limit the specific prediction method.
  • The at least two candidate target sequences may be acquired simultaneously or serially, that is, one candidate target sequence is acquired and then another.
  • the embodiment of the present invention does not limit the specific acquisition sequence, as long as the at least two candidate target sequences can be acquired, and the implementation of the embodiment of the present invention is not affected.
  • The translation probability value of a candidate target sequence may be adjusted in a preset manner.
  • a possible implementation manner is that some adjustment factors are set in advance, and the translation probability values can be directly adjusted by using these preset adjustment factors when adjusting the translation probability values, so that the adjusted translation probability values can be obtained;
  • In another possible implementation, an adjustment-factor acquisition algorithm is preset. The input of the acquisition algorithm may be the candidate target sequence, the translation probability value of the candidate target sequence, or intermediate information produced in the process of acquiring the candidate target sequence; the intermediate information may be the decoding intermediate state sequence and/or the target sequence. The decoding intermediate state sequence is related to the candidate target sequence in that it comprises the decoding intermediate states obtained when the corresponding candidate target sequence is acquired.
  • the use of the adjustment factor can improve the processing efficiency of the system, and the adjustment algorithm can further improve the accuracy of the adjustment, thereby improving the coincidence degree between the output target end sequence and the source end sequence.
  • The at least two translation probability values may be adjusted simultaneously, or one translation probability value may be adjusted after another.
  • the embodiments of the present invention do not limit the specific adjustment order.
  • The candidate target sequence with the highest adjusted translation probability value may be selected directly, or the candidate target sequence with the lowest value may be selected, or the candidate target sequence whose value is closest to a preset value may be selected, depending on the specific algorithm.
  • the specific output process may be different according to the application scenario.
  • Corresponding mainstream general technologies may be used in each application scenario.
  • the present invention does not limit the specific output method.
  • In the present invention, the translation probability values of the candidate target sequences are adjusted so that the adjusted values better reflect the degree of coincidence between the target sequence and the source sequence. Because the output candidate target sequence is selected according to the adjusted translation probability values, the selected output target sequence is more consistent with the source sequence, and the acquired target sequence is more faithful to the source sequence without affecting fluency, thereby improving the accuracy of the target sequence relative to the source sequence, where accuracy covers both faithfulness and fluency.
  • Neurons are the simplest neural network.
  • the neural network is a research hotspot in the field of artificial intelligence since the 1980s. It abstracts the human brain neuron network from the perspective of information processing, establishes a simple model, and forms different networks according to different connection methods. In engineering and academia, it is often referred to directly as a neural network or a neural network.
  • a neural network is an operational model consisting of a large number of neurons (nodes) connected to each other. Each node represents a specific output function called an activation function. The connection between every two nodes represents a weighting value for passing the connection signal, called weight, which is equivalent to the memory of the artificial neural network.
  • the output of the network varies depending on the connection method of the network, the weight value and the activation function.
  • Figure 3 depicts the structure of a neuron.
  • The neuron is an operation unit that takes x_1, x_2, x_3 and an intercept term +1 as input values, and its output is given by an activation function, as shown in FIG. 3.
  • the output of a neuron can be expressed in the form of a function (1) as follows:
  • W i is a weight vector
  • b is a bias unit
  • function f is an activation function.
  • The activation function can be implemented by a sigmoid function; a typical sigmoid function is shown in function (2) below:
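  • The numbered equations (1) and (2) are not reproduced in this text; the standard forms consistent with the definitions above are:

        \text{output} = f\Big(\sum_{i} W_i x_i + b\Big)   % function (1): output of a neuron
        f(z) = \frac{1}{1 + e^{-z}}                        % function (2): sigmoid activation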
  • The main idea of the RNN is to cyclically compress the input sequence into a vector of fixed dimension, also known as the intermediate state. Specifically, the RNN cyclically reads the input sequence, calculates the decoding intermediate state corresponding to the current source context vector from the current source context vector and the previous decoding intermediate state, and predicts, from the current decoding intermediate state and the current source context vector, the target word corresponding to the current source context vector.
  • Alternatively, the current source context vector, the previous decoding intermediate state, and the previous target word may be used to calculate the decoding intermediate state corresponding to the current source context vector, and the target word corresponding to the current source context vector is then predicted from the current decoding intermediate state, the current source context vector, and the previous target word. Finally, the final target sequence is obtained from all the target words so obtained.
  • FIG. 4 is a flowchart of a sequence conversion method according to another embodiment of the present invention, including:
  • the translation probability value can be adjusted according to a preset method.
  • FIG. 5 depicts a flow of an adjustment method provided by an embodiment of the present invention, including:
  • the first candidate target end sequence is any one of the at least two candidate target end sequences.
  • The reconstruction probability value may be obtained through an attention mechanism; specifically, a mainstream attention mechanism may be adapted as an inverse (reverse) attention mechanism to obtain the reconstruction probability value.
  • the decoded intermediate state sequence (s_1, s_2, ..., s_I) of the first candidate target sequence may be used as the input of the reverse attention mechanism.
  • the output of the reverse attention mechanism is the reconstructed probability value of the first candidate target sequence.
  • the reconstruction refers to forcibly decoding the decoded intermediate state sequence (s_1, s_2, ..., s_I) back into the source sequence (x_1, x_2, ..., x_J).
  • the reconstructed probability value indicates the degree of coincidence between the decoded intermediate state sequence and the source end sequence, and specifically, the higher the reconstructed probability value is, the higher the degree of coincidence is, or the lower the reconstructed probability value is, the higher the degree of coincidence is, or The closer the reconstruction probability value is to the preset reference value, the higher the degree of coincidence.
  • the reconstruction probability value of the first candidate target sequence may be specifically obtained according to the following function (3):
  • g_R(·) is a Softmax function;
  • the Softmax function performs a normalization operation over all word vectors in the reconstructed source sequence to obtain a reconstruction probability value for each word vector, and the reconstruction probability value of the first candidate target sequence is then determined through function (3) above.
  • the Softmax function is a function commonly used in neural networks, and will not be described here.
  • e_{j,k} is the inverse attention mechanism score of the elements in the source sequence, which can be obtained by the following function (6):
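  • A minimal sketch of the reverse-attention reconstruction described above. The emission model and the scoring function stand in for functions (3) and (6), whose exact forms are not reproduced here, so their signatures and the toy parameters are assumptions made for illustration:

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def reconstruction_prob(decoder_states, source_word_ids, emit_probs_fn, score_fn):
            # Force-decode the decoder intermediate states back into the source sequence and
            # return the product of per-source-word reconstruction probabilities.
            log_prob = 0.0
            for j, word_id in enumerate(source_word_ids):
                scores = np.array([score_fn(j, s_k) for s_k in decoder_states])  # e_{j,k}
                alpha = softmax(scores)                  # inverse attention weights
                c_hat = alpha @ decoder_states           # inverse source context vector
                p_word = emit_probs_fn(c_hat)[word_id]   # Softmax (g_R) over a source vocabulary
                log_prob += np.log(p_word + 1e-12)
            return float(np.exp(log_prob))

        # Toy usage with random parameters (illustrative only).
        rng = np.random.default_rng(0)
        states = rng.standard_normal((4, 8))             # I = 4 decoder states of dimension 8
        W_emit = rng.standard_normal((8, 10))            # 10-word toy source vocabulary
        emit = lambda c: softmax(W_emit.T @ c)
        score = lambda j, s: 0.1 * float(s.sum())        # stand-in for function (6)
        print(reconstruction_prob(states, [1, 4, 2], emit, score))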
  • the specific end-to-end learning algorithm can be preset.
  • the translation probability value and the reconstruction probability value of the first candidate target end sequence may be summed by using linear interpolation to obtain an adjusted translation probability value of the first candidate target end sequence;
  • The result of the linear interpolation summation may be used directly as the adjusted translation probability value, or it may be processed further and the processed result used as the adjusted translation probability value.
  • the translation probability value and the reconstruction probability value of the first candidate target sequence may also be summed using a weighted average to obtain an adjusted translation probability value of the first candidate target sequence. It is also possible to directly add the translation probability value of the first candidate target end sequence and the reconstruction probability value, and obtain the obtained sum as the adjusted translation probability value of the first candidate target end sequence.
  • Both the translation probability value and the reconstruction probability value reflect, to a certain extent, the accuracy of the corresponding candidate target sequence relative to the source sequence. Linear interpolation balances the two well, so that the adjusted translation probability value better reflects the accuracy of the corresponding candidate target sequence and the finally obtained target sequence better matches the source sequence.
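  • A sketch of the adjustment and selection just described, using linear interpolation of the two values; the interpolation weight, the exact interpolation form, and the candidate scores are illustrative assumptions (the weighted-average and direct-sum variants mentioned above differ only in the one adjustment line):

        def rerank(candidates, lam=0.5):
            # candidates: list of (target_sequence, translation_prob, reconstruction_prob).
            # Returns the candidate whose adjusted (interpolated) value is the highest.
            def adjusted(c):
                _, p_trans, p_recon = c
                return (1.0 - lam) * p_trans + lam * p_recon   # linear interpolation
            return max(candidates, key=adjusted)[0]

        candidates = [
            (["y1", "y2"], 0.62, 0.40),
            (["y1", "y3"], 0.58, 0.71),
        ]
        print(rerank(candidates))   # selects the candidate with the highest adjusted value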
  • Steps 4041 and 4042 are performed once for each candidate target sequence, that is, they are executed cyclically as many times as there are candidate target sequences, thereby obtaining the adjusted translation probability value of each candidate target sequence.
  • For specific implementations of steps 405 and 406, reference may be made to the specific implementations of steps 205 and 206; details are not described again here.
  • The translation probability value of each candidate target sequence is adjusted so that the adjusted value better reflects the degree of coincidence between the target sequence and the source sequence. Because the output candidate target sequence is selected according to the adjusted translation probability values, the selected output target sequence is more consistent with, and more faithful to, the source sequence, which improves the accuracy of the target sequence relative to the source sequence. At the same time, since acquiring a candidate target sequence already requires the corresponding decoded intermediate state sequence, keeping the decoded intermediate state sequence of each candidate target sequence does not substantially increase the processing load of the sequence conversion apparatus. Because the decoded intermediate state sequence represents, to a certain extent, the translation accuracy of the corresponding candidate target sequence, adjusting the translation probability value according to it improves the accuracy of the adjusted translation probability value and thereby the accuracy of the final target sequence.
  • the parameters γ1, γ2 and γ3 are obtained by training with an end-to-end learning algorithm.
  • Specifically, the parameters γ1, γ2 and γ3 can be obtained by training with the following function (8):
  • θ and γ are neural network parameters that need to be acquired through training;
  • γ represents the parameter γ1, γ2 or γ3;
  • N is the number of training sequence pairs in the training sequence set;
  • X_n is the source sequence in the training sequence pair;
  • Y_n is the target sequence in the training sequence pair;
  • s_n is the decoded intermediate state sequence obtained when X_n is converted into Y_n;
  • the remaining coefficient is the linear interpolation weight.
  • The linear interpolation coefficient can be set manually in advance, or can be obtained through training under the control of the function.
  • Function (8) consists of two parts, a likelihood probability and a reconstruction probability: the likelihood probability evaluates the fluency of the translation well, while the reconstruction probability evaluates its faithfulness. Combining the two gives a better assessment of translation quality and effectively guides the parameter training toward better translation results.
  • The specific form of a training sequence pair differs with the specific scenario to which the sequence conversion method is applied; for example, when the application scenario is machine translation, each training sequence pair is a pair of natural language sentences that are translations of each other.
  • In some implementations, when training to acquire the parameters γ1, γ2 and γ3, the decoder intermediate states may be encouraged to contain as much complete source-side information as possible, so as to improve the faithfulness of the target end sequence.
  • FIG. 6 shows the sequence conversion flow in a sequence conversion method according to an embodiment of the present invention. As shown in FIG. 6, performing the sequence conversion method of the present invention includes the following conversions:
  • A. The source end sequence (x1, x2, x3, ..., xJ) is first converted into a source end vector representation sequence (h1, h2, h3, ..., hJ). This step may be implemented by word vectorization (word embedding) technology.
  • B. The source end vector representation sequence (h1, h2, h3, ..., hJ) is converted into the source end context vectors (c1, c2, ..., cI) by the attention mechanism; it should be noted that the values of I and J may be the same or different. A simplified attention step is sketched below.
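The numpy sketch below illustrates one attention step; the bilinear scoring form and all of the weights are assumptions made only for illustration, not the scoring function used in the patent.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_context(prev_state, encoder_states, W_a):
    """Score each source annotation h_j against the previous decoding
    intermediate state, normalise the scores into alignment probabilities,
    and return their weighted sum, i.e. one source end context vector c_i."""
    scores = np.array([prev_state @ W_a @ h for h in encoder_states])
    alignment = softmax(scores)                  # one weight per h_j
    return alignment @ np.stack(encoder_states)

rng = np.random.default_rng(0)
H = 4                                            # toy hidden size
encoder_states = [rng.normal(size=H) for _ in range(3)]   # h_1..h_3
c = attention_context(rng.normal(size=H), encoder_states, rng.normal(size=(H, H)))
```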
  • C. The corresponding decoded intermediate state sequence (s1, s2, ..., sI) is obtained from the source end context vectors; when the current decoding intermediate state si is obtained, the previous decoding intermediate state si-1 is referenced, where 1 ≤ i ≤ I. It should be noted that, because there is no previous decoding intermediate state to reference when the decoding intermediate state s1 is obtained, a preset initialization decoding intermediate state is referenced instead; the initialization decoding intermediate state may carry null information (i.e., all zeros) or preset information, and this embodiment of the present invention does not specifically limit the information carried. A toy update of this recurrence is sketched below.
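A toy version of this recurrence is sketched below; tanh stands in for whatever recurrent cell is actually used, and all sizes and weights are illustrative assumptions.

```python
import numpy as np

HIDDEN, EMB = 4, 3                      # toy sizes

def update_decoder_state(prev_state, prev_word_vec, context, params):
    """s_i = f(s_{i-1}, y_{i-1}, c_i): a generic RNN-style update."""
    U, W, C, b = params
    return np.tanh(U @ prev_state + W @ prev_word_vec + C @ context + b)

rng = np.random.default_rng(0)
params = (rng.normal(size=(HIDDEN, HIDDEN)),    # U: previous decoding state
          rng.normal(size=(HIDDEN, EMB)),       # W: previous target end word
          rng.normal(size=(HIDDEN, HIDDEN)),    # C: source end context vector
          np.zeros(HIDDEN))                     # b: bias
s_prev = np.zeros(HIDDEN)        # s_1 has no predecessor: all-zero initial state
y_prev = np.zeros(EMB)           # the initial target end word is likewise empty
c_1 = rng.normal(size=HIDDEN)    # context vector from the attention step above
s_1 = update_decoder_state(s_prev, y_prev, c_1, params)
```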
  • D. A candidate target end sequence (y1, y2, ..., yI) is acquired based on the decoded intermediate state sequence (s1, s2, ..., sI) and the source end context vector sequence (c1, c2, ..., cI); at the same time, this step outputs the translation probability value of that candidate target end sequence. Specifically, in the process of acquiring the target end sequence (y1, y2, ..., yI), translation probability value 1 of the target end word y1 is obtained from s1 and c1, translation probability value 2 of the target end word y2 is obtained from s2 and c2, ..., and translation probability value I of the target end word yI is obtained from sI and cI; the translation probability value of the candidate target end sequence (y1, y2, ..., yI) is then obtained based on translation probability values 1, 2, ..., I.
  • Specifically, translation probability values 1, 2, ..., I may be multiplied to obtain the final translation probability value; alternatively, after their product is obtained, the product may be normalized based on the number of target end words to obtain the final translation probability value. Either variant can be computed as in the small helper sketched below.
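The helper below computes both variants, working in log space for numerical stability; the per-word probabilities are illustrative only.

```python
import math

def sequence_logprob(word_probs, length_normalize=False):
    """Combine per-word translation probabilities into a sequence-level score:
    either their plain product or the product normalised by the word count."""
    log_score = sum(math.log(p) for p in word_probs)
    return log_score / len(word_probs) if length_normalize else log_score

print(sequence_logprob([0.6, 0.4, 0.7]))
print(sequence_logprob([0.6, 0.4, 0.7], length_normalize=True))
```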
  • E. The decoded intermediate state sequence (s1, s2, ..., sI) is converted into reconstructed source end context vectors by the inverse attention mechanism.
  • F. Reconstruction intermediate states are obtained from the reconstructed source end context vectors; obtaining the current reconstruction intermediate state references the previous one. Because there is no previous reconstruction intermediate state to reference for the first one, a preset initialization reconstruction intermediate state is referenced instead; it may carry null information (i.e., all zeros) or preset information, and this embodiment of the present invention does not specifically limit the information carried.
  • G. The source end sequence (x1, x2, ..., xJ) is reconstructed from the reconstruction intermediate states and the reconstructed source end context vectors. Because the source end sequence is known, reconstruction probability value 1 for x1, reconstruction probability value 2 for x2, ..., and reconstruction probability value J for xJ can be computed, and the reconstruction probability value of decoding back to the source end sequence is obtained from them: reconstruction probability values 1, 2, ..., J may be multiplied to obtain the final reconstruction probability value, or their product may be normalized based on the value of J to obtain the final reconstruction probability value.
  • It can be understood that, in order to acquire at least two candidate target end sequences, steps B-D need to be performed at least twice; the at least two executions may be performed simultaneously or sequentially, and the present invention does not limit the specific execution order. Likewise, in order to obtain the at least two corresponding reconstruction probability values, steps E-G also need to be performed at least twice, simultaneously or sequentially; the present invention does not limit the specific execution order, and as long as at least two reconstruction probability values can be obtained the implementation of the embodiments of the present invention is not affected. A sketch of one reconstruction pass (steps E-G) follows.
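One reconstruction pass (steps E-G) might look like the sketch below; the tanh cell, the bilinear scoring form and the output layer are stand-ins for f_R, the inverse attention score and g_R respectively, and every size and weight is invented for the example.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def reconstruction_logprob(decoder_states, src_ids, embed, W_score, U, W, C, V_out):
    """Inverse attention over the decoder states (s_1..s_I) reconstructs the
    known source end words x_1..x_J and returns the summed log probability."""
    H = decoder_states.shape[1]
    h_hat = np.zeros(H)                    # initialization reconstruction state
    x_prev = np.zeros(embed.shape[1])      # no previous source word yet
    total = 0.0
    for word_id in src_ids:
        scores = decoder_states @ (W_score @ h_hat)   # e_{j,k} over all s_k
        alpha = softmax(scores)                       # inverse-attention weights
        c_hat = alpha @ decoder_states                # reconstructed context
        h_hat = np.tanh(U @ h_hat + W @ x_prev + C @ c_hat)   # stand-in for f_R
        total += np.log(softmax(V_out @ h_hat)[word_id])      # stand-in for g_R
        x_prev = embed[word_id]
    return total

rng = np.random.default_rng(0)
I, H, V, E = 3, 4, 5, 3                    # toy sizes
decoder_states = rng.normal(size=(I, H))   # s_1..s_I from the forward pass
embed = rng.normal(size=(V, E))            # source word embeddings
R = reconstruction_logprob(decoder_states, [1, 3, 2], embed,
                           rng.normal(size=(H, H)), rng.normal(size=(H, H)),
                           rng.normal(size=(H, E)), rng.normal(size=(H, H)),
                           rng.normal(size=(V, H)))
```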
  • The sequence conversion method provided by the present invention has been described in detail above. To verify its effectiveness, the inventors tested the method in the machine translation scenario, measuring the accuracy of translating Chinese into English. For fairness, both the prior art and the present invention were implemented on neural networks: the prior art uses a standard neural machine translation (NMT) system, and the test results for the present invention were obtained by adding the solution of the present invention on top of that NMT system.
  • Table 1 reports the results of training on a training sequence set of 1.25 million training sequence pairs to obtain the parameters γ1, γ2 and γ3, and then evaluating the technique of the present invention and the prior art on standard public test sets using the Bilingual Evaluation Understudy (BLEU) score.
  • In Table 1, Beam is the search space (beam size), Tuning is the development set, MT05, MT06 and MT08 are three different test sets, All denotes the test sets taken together, and Oracle denotes the theoretical optimum.
  • As can be seen from Table 1, the BLEU score of the present technique is higher than that of the prior art under every condition and, on average, improves on the prior art by 2.3 BLEU points. It is also worth noting that the translation quality of the prior art drops when the search space (Beam in Table 1) is enlarged, whereas the present invention overcomes this shortcoming of the prior art: the larger the search space, the better the translation quality.
  • When the technique of the present invention is applied only during training, the BLEU score is already higher than that of the prior art; that is, applying the technique of the present invention to training alone can already improve the quality of sequence conversion. When the technique of the present invention is applied to both training and testing, the quality of sequence conversion can be improved further.
  • The technique of the present invention is also well compatible with the existing related enhancement techniques, namely the coverage model and the context gate mechanism: applying the technique of the present invention on top of them still improves the BLEU score, so it can complement the existing related enhancement techniques and further improve the quality of sequence conversion (machine translation).
  • FIG. 7 illustrates the structure of a sequence conversion apparatus 500 according to an embodiment of the present invention.
  • the sequence conversion apparatus 500 includes:
  • the receiving unit 501 is configured to receive a source sequence. For specific implementation, refer to the description of step 201, and details are not described herein again.
  • the converting unit 502 is configured to convert the source sequence into a source vector representation sequence. For specific implementation, refer to the description of step 202, and details are not described herein again.
  • the obtaining unit 503 is configured to obtain, according to the source end vector representation sequence, at least two candidate target end sequences, and a translation probability value of each of the at least two candidate target end sequences. For specific implementation, refer to the description of step 203, and details are not described herein again.
  • the adjusting unit 504 is configured to adjust a translation probability value of each candidate target end sequence. For specific implementation, refer to the description of step 204, and details are not described herein again.
  • the selecting unit 505 is configured to select an output target end sequence from the at least two candidate target end sequences according to the adjusted translation probability value of each candidate target end sequence. For a specific implementation, reference may be made to the description of step 205, and details are not described herein again.
  • The output unit 506 is configured to output the output target end sequence. For a specific implementation, refer to the description of step 206, and details are not described herein again. A minimal wiring of units 501-506 is sketched below.
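As a minimal illustration of how units 501-506 fit together, the sketch below wires plain callables in place of the real encoder, candidate generator and rescoring step; all of these callables are assumptions for the demo, not the patent's implementation.

```python
class SequenceConverter:
    """Toy stand-in for apparatus 500; each unit is an injected callable."""

    def __init__(self, encode, generate_candidates, rescore):
        self.encode = encode                             # converting unit 502
        self.generate_candidates = generate_candidates   # obtaining unit 503
        self.rescore = rescore                           # adjusting unit 504

    def convert(self, source_sequence):                  # receiving unit 501
        vectors = self.encode(source_sequence)
        candidates = self.generate_candidates(vectors)   # [(sequence, score), ...]
        adjusted = [(seq, self.rescore(seq, score)) for seq, score in candidates]
        best_seq, _ = max(adjusted, key=lambda t: t[1])  # selecting unit 505
        return best_seq                                  # handed to output unit 506

converter = SequenceConverter(
    encode=lambda s: s,
    generate_candidates=lambda v: [(("a",), -1.0), (("b",), -2.0)],
    rescore=lambda seq, score: score,                    # identity adjustment for the demo
)
print(converter.convert(("src",)))
```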
  • As can be seen from the above, when performing sequence conversion the present invention adjusts the translation probability value of each candidate target end sequence so that the adjusted translation probability value better reflects the degree of coincidence between the target end sequence and the source end sequence. Because the output candidate target end sequence is selected according to the adjusted translation probability value, the selected output target end sequence is more consistent with the source end sequence, the acquired target end sequence is more faithful to the source end sequence, and the accuracy of the target end sequence relative to the source end sequence is improved.
  • In some implementations of the present invention, the obtaining unit 503 in FIG. 7 may be specifically configured to: acquire at least two source end context vectors from the source end vector representation sequence based on an attention mechanism; obtain the decoded intermediate state sequence of each of the at least two source end context vectors; and acquire the candidate target end sequence of each of the at least two decoded intermediate state sequences.
  • the adjusting unit 504 is specifically configured to adjust the respective translation probability values based on the decoded intermediate state sequence of each candidate target sequence.
  • Because acquiring a candidate target end sequence already requires the corresponding decoded intermediate state sequence, adjusting the translation probability value based on the decoded intermediate state sequence does not further increase the processing load of the sequence conversion apparatus; meanwhile, since the decoded intermediate state sequence represents, to a certain extent, the translation accuracy of the corresponding candidate target end sequence, adjusting the translation probability value according to the decoded intermediate state sequence improves the accuracy of the adjusted translation probability value and thus the accuracy of the final target end sequence.
  • As shown in FIG. 8, in a specific implementation the adjusting unit 504 included in the sequence conversion apparatus provided by this embodiment of the present invention may specifically include: an obtaining subunit 5041, configured to obtain a reconstruction probability value of a first candidate target end sequence based on the decoded intermediate state sequence of the first candidate target end sequence, where the first candidate target end sequence is any one of the at least two candidate target end sequences; and an adjusting subunit 5042, configured to adjust the translation probability value of the first candidate target end sequence based on the reconstruction probability value of the first candidate target end sequence.
  • In a specific implementation, the obtaining subunit 5041 may be specifically configured to acquire the reconstruction probability value of the first candidate target end sequence based on an inverse attention mechanism, where the input of the inverse attention mechanism is the decoded intermediate state sequence of the first candidate target end sequence and the output of the inverse attention mechanism is the reconstruction probability value of the first candidate target end sequence.
  • Specifically, the obtaining subunit 5041 may be configured to acquire the reconstruction probability value of the first candidate target end sequence according to functions (3) to (7), in which gR() is the Softmax function and ej,k is the inverse attention score of an element of the source end sequence; these functions appear only as images in the published text, and their assumed forms are reproduced below.
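Based on the surrounding variable descriptions, the assumed forms of functions (3) to (7) are the following (a reconstruction, not a quotation of the original):

```latex
\begin{align}
R(\mathbf{x}\mid\mathbf{s}) &= \prod_{j=1}^{J} g_R\!\left(x_{j-1},\,\hat{h}_j,\,\hat{c}_j;\,\gamma_1\right) \tag{3}\\
\hat{c}_j &= \sum_{k=1}^{I} \hat{\alpha}_{j,k}\, s_k \tag{4}\\
\hat{\alpha}_{j,k} &= \frac{\exp(e_{j,k})}{\sum_{k'=1}^{I}\exp(e_{j,k'})} \tag{5}\\
e_{j,k} &= a\!\left(\hat{h}_{j-1},\, x_{j-1},\, s_k;\,\gamma_2\right) \tag{6}\\
\hat{h}_j &= f_R\!\left(x_{j-1},\,\hat{h}_{j-1},\,\hat{c}_j;\,\gamma_3\right) \tag{7}
\end{align}
```

Here x_{j-1} denotes the previously reconstructed source end word, ĥ_j the reconstruction intermediate state produced by the activation function f_R, ĉ_j the vector summarised from the decoding intermediate states s_k by the inverse attention weights α̂_{j,k}, and γ1, γ2, γ3 the trainable parameters.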
  • In one implementation, the adjusting subunit 5042 is specifically configured to sum the translation probability value and the reconstruction probability value of the first candidate target end sequence by linear interpolation to obtain the adjusted translation probability value of the first candidate target end sequence. Because both the translation probability value and the reconstruction probability value reflect, to a certain extent, the accuracy of the corresponding candidate target end sequence relative to the source end sequence, linear interpolation balances the two well, so that the adjusted translation probability value better reflects the accuracy of the corresponding candidate target end sequence and the finally obtained target end sequence better matches the source end sequence.
  • FIG. 9 depicts the structure of a sequence conversion device 800 provided by another embodiment of the present invention.
  • Compared with the sequence conversion apparatus 500 described in FIG. 7, the sequence conversion apparatus 800 depicted in FIG. 9 adds a training unit 801, configured to acquire the parameters γ1, γ2 and γ3 by training with an end-to-end learning algorithm; the obtaining unit 503 can then acquire the candidate target end sequences using the parameters γ1, γ2 and γ3 acquired by the training unit.
  • For the implementation of the remaining receiving unit 501, converting unit 502, obtaining unit 503, adjusting unit 504, selecting unit 505 and output unit 506, reference may be made to the foregoing description, and details are not described herein again.
  • In one implementation, the training unit 801 is specifically configured to acquire the parameters γ1, γ2 and γ3 by training with function (8) given earlier (its assumed form is shown above), where θ and γ are the parameters of the neural network that need to be acquired by training, γ represents the parameter γ1, γ2 or γ3, N is the number of training sequence pairs in the training sequence set, Xn is the source end sequence in a training sequence pair, Yn is the target end sequence in the training sequence pair, sn is the decoded intermediate state sequence produced when Xn is converted into Yn, and λ is the linear interpolation weight.
  • FIG. 10 illustrates the structure of a sequence conversion apparatus 900 according to another embodiment of the present invention. The apparatus includes at least one processor 902 (e.g., a CPU), at least one network interface 905 or other communication interface, a memory 906, and at least one communication bus 903 used to implement connection and communication between these components. The processor 902 is configured to execute executable modules, such as computer programs, stored in the memory 906. The memory 906 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between this system gateway and at least one other network element is implemented through the at least one network interface 905 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like. The memory 906 stores a program 9061 that can be executed by the processor 902; when the program is executed, the sequence conversion method provided by the present invention can be performed.
  • In the embodiments of the present invention, when the sequence conversion method/apparatus is applied to machine translation, the source end sequence is a text in one natural language or a text file obtained based on that natural language text, and the target end sequence is a text in another natural language or a text file obtained based on that other natural language text.
  • When applied to speech recognition, the source end sequence is human speech content or a speech data file obtained based on the human speech content, and the target end sequence is the natural language text corresponding to the speech content or a text file obtained based on that natural language text.
  • When applied to automatic dialogue, the source end sequence is human speech content or a speech data file obtained based on the human speech content, and the target end sequence is a speech reply to the human speech content or a speech data file obtained based on that speech reply.
  • When applied to automatic summarization, the source end sequence is the natural language text to be summarized, and the target end sequence is a summary of that natural language text, where the summary is natural language text or a text file obtained based on the natural language text.
  • When applied to image caption generation, the source end sequence is an image or an image data file obtained based on the image, and the target end sequence is the natural language caption of the image or a text file obtained based on that caption.
  • When performing sequence conversion, the sequence conversion apparatus provided by this embodiment of the present invention adjusts the translation probability value of each candidate target end sequence so that the adjusted translation probability value better reflects the degree of coincidence between the target end sequence and the source end sequence; because the output candidate target end sequence is selected according to the adjusted translation probability value, the selected output target end sequence is more consistent with the source end sequence, the acquired target end sequence is more faithful to the source end sequence, and the accuracy of the target end sequence relative to the source end sequence is improved.
  • FIG. 11 illustrates the structure of a sequence conversion system 1100 provided by an embodiment of the present invention. As shown in FIG. 11, the system includes an input interface 1101, a sequence conversion apparatus 1102 and an output interface 1103, where the sequence conversion apparatus 1102 may be any sequence conversion apparatus provided by the embodiments of the present invention:
  • the input interface 1101 is configured to receive source data and convert the source data into the source sequence; the source sequence obtained by the conversion may be input to the sequence conversion device 1102;
  • The specific process of converting the source data into the source end sequence differs with the presentation form of the source data: when the source data is human speech, the speech is converted into a speech data file as the source end sequence; when the source data is an image, the image is converted into an image data file as the source end sequence; and when the source data is natural language text, the text is converted into a text file as the source end sequence.
  • The specific conversion process may use existing, commonly known techniques, and the present invention does not limit it; a toy dispatch of this kind is sketched below.
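The sketch below shows one way such a dispatch could look; the file-extension rules and the returned tags are assumptions made purely for illustration.

```python
from pathlib import Path

AUDIO = {".wav", ".flac", ".mp3"}
IMAGE = {".png", ".jpg", ".jpeg", ".bmp"}

def to_source_sequence(path: str) -> dict:
    """Decide, from the input file type, whether the source data becomes a
    speech data file, an image data file or a text file before it is handed
    to the sequence conversion apparatus 1102."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO:
        return {"kind": "speech_data_file", "path": path}
    if suffix in IMAGE:
        return {"kind": "image_data_file", "path": path}
    return {"kind": "text_file", "path": path}

print(to_source_sequence("query.wav"))
print(to_source_sequence("caption_me.png"))
```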
  • the output interface 1103 is configured to output an output target sequence output by the sequence conversion device 1102.
  • The input interface 1101 and the output interface 1103 differ with the specific form of the sequence conversion system. For example, when the sequence conversion system is a server or is deployed in the cloud, the input interface 1101 may be a network interface and the source data comes from a client; the source data may be a speech data file, an image data file, a text file or the like collected by the client, and, correspondingly, the output interface 1103 may also be the foregoing network interface, configured to output the output target end sequence to the client.
  • The information exchange and execution processes between the modules of the above apparatuses and system are based on the same concept as the method embodiments of the present invention; for specific content, refer to the descriptions in the method embodiments of the present invention, and details are not described herein again.
  • A person of ordinary skill in the art may understand that all or some of the procedures of the methods in the foregoing embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a RAM.


Abstract

A sequence conversion method and apparatus, the sequence conversion method comprising: receiving a source end sequence (201); converting the source end sequence into a source end vector representation sequence (202); acquiring, according to the source end vector representation sequence, at least two candidate target end sequences and a translation probability value of each of the at least two candidate target end sequences (203); adjusting the translation probability value of each candidate target end sequence (204); selecting, according to the adjusted translation probability value of each candidate target end sequence, an output target end sequence from the at least two candidate target end sequences (205); and outputting the output target end sequence (206). Using the method and apparatus, the fidelity of the target end sequence to the source end sequence can be improved when performing sequence conversion.

Description

序列转换方法及装置
本申请要求于2016年11月4日提交中国专利局、申请号为201610982039.7、发明名称为“序列转换方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及计算机技术,具体涉及一种序列转换方法及装置。
背景技术
随着计算机技术的飞速发展,深度学习的研究也取得了较大进展,涉及自然语言处理的序列到序列学习(sequence-to-sequence learning)也取得了突破性的进展,序列到序列学习是一种将源端序列映射到目标端序列的学习过程。序列到序列学习的成果主要用于序列转换,典型的序列转换的应用场景包括机器翻译(Machine Translation)、语音识别(speech recognition)、对话系统(dialog system or conversational agent)、自动摘要(automatic summarization)、自动问答(question answering)和图像说明文字生成(image caption generation)等等。
一种典型的序列转换方法包括两个阶段:编码阶段和解码阶段。其中,编码阶段一般会通过循环神经网络(Recurrent Neural Network,RNN)将源端序列转化源端向量表示序列,然后再通过注意力机制(attention mechanism)将源端向量表示序列转换为源端上下文向量,,具体地,每次选择源端序列的一部分转化为源端上下文向量,因此源端序列可以被转化为多个源端上下文向量(从而为解码阶段的每个目标端词生成对应的源端上下文向量)。解码阶段通过每次生成一个目标端词的方式生成目标端序列:在每一步中,解码器根据编码阶段获取的当前源端上下文向量,以及上一步的解码器中间状态和上一步生成的目标端词,来计算当前的解码器中间状态,根据该当前的中间状态及源端上下文向量预测当前步骤的目标端词。
在序列转换方法应用到自然语言处理时,由于源端序列和目标端序列的长度都是不固定的,因此在解码阶段也可以使用RNN进行处理。
由于RNN在预测过程中主要参考的还是已经预测获取的目标端上下文向量,而仅仅将源端上下文向量作为一个额外的输入,导致当前源端上下文向量对应的信息可能不会正确地传递到对应的目标端上下文向量,存在过多的遗漏翻译(under-translation)和过度翻译(over-translation),从而导致预测获取的目标端序列不能忠诚地体现源端序列的信息。
发明内容
本发明实施例提供了序列转换方法及装置,能够在进行序列转换时提高目标端序列相对于源端序列的准确性。
本发明的第一方面提供了一种序列转换方法,包括:
接收源端序列;
将所述源端序列转换为源端向量表示序列;
根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值;
对所述每一个候选目标端序列的翻译概率值进行调整;
根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列;输出所述输出目标端序列。
其中,对候选目标端序列的翻译概率值进行调整时,可以直接使用预先设置好的调整因子,也可以使用预先设置的调整算法。其中,使用调整因子可以提高系统的处理效率,使用调整算法可以提高调整后的翻译概率值的准确性。
其中,具体可以采用词向量化技术将源端序列转换为源端向量表示序列。
结合第一方面,在一种可能的实施方式中,所述根据所述源端向量表示序列获取至少两个候选目标端序列包括:
基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;
获取所述至少两个源端上下文向量各自的解码中间状态序列;
获取所述至少两个解码中间状态序列各自的候选目标端序列;
所述对所述每一个候选目标端序列的翻译概率值进行调整包括:
基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述至少两个候选目标端序列包括第一候选目标端序列,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;
所述基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整包括:
基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值;
基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述基于所述第一候选目标端序列的解码中间状态获取所述第一候选目标端序列的重构概率值包括:
基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述基于反向注意力机制获取所述第一候选目标端序列的重构概率值包括:
根据如下函数获取所述第一候选目标端序列的重构概率值:
Figure PCTCN2017108950-appb-000001
其中,gR()是Softmax函数;
Figure PCTCN2017108950-appb-000002
是通过反向注意力机制总结得到的向量,通过如下的函数获取:
Figure PCTCN2017108950-appb-000003
其中,
Figure PCTCN2017108950-appb-000004
是由反向注意力机制输出的对齐概率,通过如下的函数获取:
Figure PCTCN2017108950-appb-000005
其中,ej,k是源端序列中元素的反向注意力机制得分,通过如下的函数获取:
Figure PCTCN2017108950-appb-000006
Figure PCTCN2017108950-appb-000007
是获取重构概率值时的中间状态,通过如下的函数获取:
Figure PCTCN2017108950-appb-000008
xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
由于在获取重构概率值时,会根据解码中间状态序列重构回源端序列,而源端序列是确定的,因此可以根据重构回源端序列的具体情况获取对应的重构概率值,因此获取的重构概率值也能够体现候选目标端序列的准确性,因此根据重构概率值对翻译概率值进行调整能够确保调整后的翻译概率值的准确性,从而提高输出的目标端序列的准确性。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述参数γ1,γ2和γ3通过端到端学习算法训练获取。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述参数γ1,γ2和γ3通过如下函数训练获取:
Figure PCTCN2017108950-appb-000009
其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整包括:
对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
由于翻译概率值和重构概率值都能够在一定程度上体现对应的候选目标端序列相对于源端序列的准确性,因此使用线性插值的方式对二者求和可以很好地对二者进行平衡,从而使调整后的翻译概率值能够更好地体现对应的候选目标端序列的准确性,从而使输出的目标端序列能够更好地与源端序列吻合。
结合第一方面和前述的第一方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述源端序列是一种自然语言文字或基于所述一种自然语言文字获得的文本文件,所述目标端序列是另一种自然语言文字或基于所述另一种自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述语音内容对应的自然语言文字或基于所述自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述人类的语音内容的语音回复或基于所述语音回复获得的语音数据文件;
所述源端序列是待摘要的自然语言文字,所述目标端序列是待摘要的自然语言文字的摘要,摘要是自然语言文字或基于所述自然语言文字获得的文本文件;或者
所述源端序列是图像或基于所述图像获得的图像数据文件,所述目标端序列是图像的自然语言说明文字或基于所述自然语言说明文字获得的文本文件。
本发明的第二方面提供了一种序列转换装置,包括:
接收单元,用于接收源端序列;
转换单元,用于将所述源端序列转换为源端向量表示序列;
获取单元,用于根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值;
调整单元,用于对所述每一个候选目标端序列的翻译概率值进行调整;
选择单元,用于根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列;
输出单元,用于输出所述输出目标端序列。
其中,对候选目标端序列的翻译概率值进行调整时,可以直接使用预先设置好的调整因子,也可以使用预先设置的调整算法。其中,使用调整因子可以提高系统的处理效率,使用调整算法可以提高调整后的翻译概率值的准确性。
其中,具体可以采用词向量化技术将源端序列转换为源端向量表示序列。
结合第二方面,在一种可能的实施方式中,所述获取单元,具体用于基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;获取所述至少两个源端上下文向量各自的解码中间状态序列;获取所述至少两个解码中间状态序列各自的候选目标端序列;
所述调整单元,具体用于基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述至少两个候选目标端序列包括第一候选目标端序列,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;
所述调整单元包括:
获取子单元,用于基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值;
调整子单元,用于基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述获取子单元具体用于:
基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述重构概率值获取子单元具体用于:
根据如下函数获取所述第一候选目标端序列的重构概率值:
Figure PCTCN2017108950-appb-000010
其中,gR()是Softmax函数;
Figure PCTCN2017108950-appb-000011
是通过反向注意力机制总结得到的向量,通过如下的函数获取:
Figure PCTCN2017108950-appb-000012
其中,
Figure PCTCN2017108950-appb-000013
是由反向注意力机制输出的对齐概率,通过如下的函数获取:
Figure PCTCN2017108950-appb-000014
其中,ej,k是源端序列中元素的反向注意力机制得分,通过如下的函数获取:
Figure PCTCN2017108950-appb-000015
Figure PCTCN2017108950-appb-000016
是获取重构概率值时的中间状态,通过如下的函数获取:
Figure PCTCN2017108950-appb-000017
xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
由于在获取重构概率值时,会根据解码中间状态序列重构回源端序列,而源端序列是确定的,因此可以根据重构回源端序列的具体情况获取对应的重构概率值,因此获取的重构概率值也能够体现候选目标端序列的准确性,因此根据重构概率值对翻译概率值进行调整能够确保调整后的翻译概率值的准确性,从而提高输出的目标端序列的准确性。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述装置还包括训练单元,用于通过端到端学习算法训练获取所述参数γ1,γ2和γ3
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述训练单元,具体用于通过如下函数训练获取所述参数γ1,γ2和γ3
Figure PCTCN2017108950-appb-000018
其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述调整子单元,具体用于:
对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
由于翻译概率值和重构概率值都能够在一定程度上体现对应的候选目标端序列相对于源端序列的准确性,因此使用线性插值的方式对二者求和可以很好地对二者进行平衡,从 而使调整后的翻译概率值能够更好地体现对应的候选目标端序列的准确性,从而使输出的目标端序列能够更好地与源端序列吻合。
结合第二方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,
所述源端序列是一种自然语言文字或基于所述一种自然语言文字获得的文本文件,所述目标端序列是另一种自然语言文字或基于所述另一种自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述语音内容对应的自然语言文字或基于所述自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述人类的语音内容的语音回复或基于所述语音回复获得的语音数据文件;
所述源端序列是待摘要的自然语言文字,所述目标端序列是待摘要的自然语言文字的摘要,摘要是自然语言文字或基于所述自然语言文字获得的文本文件;或者
所述源端序列是图像或基于所述图像获得的图像数据文件,所述目标端序列是图像的自然语言说明文字或基于所述自然语言说明文字获得的文本文件。
本发明的第三方面提供了一种序列转换装置,包括处理器和存储器,所述存储器存储了可执行指令,所述可执行指令用于指示所述处理器执行如下步骤:
接收源端序列;
将所述源端序列转换为源端向量表示序列;
根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值;
对所述每一个候选目标端序列的翻译概率值进行调整;
根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列;
输出所述输出目标端序列。
其中,对候选目标端序列的翻译概率值进行调整时,可以直接使用预先设置好的调整因子,也可以使用预先设置的调整算法。其中,使用调整因子可以提高系统的处理效率,使用调整算法可以提高调整后的翻译概率值的准确性。
其中,具体可以采用词向量化技术将源端序列转换为源端向量表示序列。
结合第三方面,在一种可能的实施方式中,所述处理器在根据所述源端向量表示序列获取至少两个候选目标端序列时用于执行如下步骤:
基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;
获取所述至少两个源端上下文向量各自的解码中间状态序列;
获取所述至少两个解码中间状态序列各自的候选目标端序列;
所述处理器在对所述每一个候选目标端序列的翻译概率值进行调整时用于执行如下步骤:
基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述至少两个候选目标端序列包括第一候选目标端序列,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;
所述处理器在基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整时用于执行如下步骤:
基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值;
基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述处理器在将所述源端序列转换为源端上下文向量序列时用于执行如下步骤:
基于注意力机制将所述源端序列转换为源端上下文向量序列;
所述处理器在基于所述第一候选目标端序列的解码中间状态获取所述第一候选目标端序列的重构概率值时用于执行如下步骤:
基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述处理器在基于反向注意力机制获取所述第一候选目标端序列的重构概率值时用于执行如下步骤:
根据如下函数获取所述第一候选目标端序列的重构概率值:
Figure PCTCN2017108950-appb-000019
其中,gR()是Softmax函数;
Figure PCTCN2017108950-appb-000020
是通过反向注意力机制总结得到的向量,通过如下的函数获取:
Figure PCTCN2017108950-appb-000021
其中,
Figure PCTCN2017108950-appb-000022
是由反向注意力机制输出的对齐概率,通过如下的函数获取:
Figure PCTCN2017108950-appb-000023
其中,ej,k是源端序列中元素的反向注意力机制得分,通过如下的函数获取:
Figure PCTCN2017108950-appb-000024
Figure PCTCN2017108950-appb-000025
是获取重构概率值的处理的中间状态,通过如下的函数获取:
Figure PCTCN2017108950-appb-000026
xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
由于在获取重构概率值时,会根据解码中间状态序列重构回源端序列,而源端序列是确定的,因此可以根据重构回源端序列的具体情况获取对应的重构概率值,因此获取的重构概率值也能够体现候选目标端序列的准确性,因此根据重构概率值对翻译概率值进行调整能够确保调整后的翻译概率值的准确性,从而提高输出的目标端序列的准确性。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述可执行指令还用于指示所述处理器执行如下步骤:通过端到端学习算法训练获取参数γ1,γ2和γ3
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述在通过如下函数训练获取参数γ1,γ2和γ3时执行如下步骤:
Figure PCTCN2017108950-appb-000027
其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,所述处理器在基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整时用于执行如下步骤:
对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
由于翻译概率值和重构概率值都能够在一定程度上体现对应的候选目标端序列相对于源端序列的准确性,因此使用线性插值的方式对二者求和可以很好地对二者进行平衡,从而使调整后的翻译概率值能够更好地体现对应的候选目标端序列的准确性,从而使输出的 目标端序列能够更好地与源端序列吻合。
结合第三方面和前述的第二方面的可能的实施方式中的任意一个,在一种可能的实施方式中,
所述源端序列是一种自然语言文字或基于所述一种自然语言文字获得的文本文件,所述目标端序列是另一种自然语言文字或基于所述另一种自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述语音内容对应的自然语言文字或基于所述自然语言文字获得的文本文件;
所述源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,所述目标端序列是所述人类的语音内容的语音回复或基于所述语音回复获得的语音数据文件;
所述源端序列是待摘要的自然语言文字,所述目标端序列是待摘要的自然语言文字的摘要,摘要是自然语言文字或基于所述自然语言文字获得的文本文件;或者
所述源端序列是图像或基于所述图像获得的图像数据文件,所述目标端序列是图像的自然语言说明文字或基于所述自然语言说明文字获得的文本文件。
本发明的第四方面提供了一种序列转换系统,包括输入接口,输出接口以及本发明的第二方面,第二方面的可能的实施方式,第三方面,第三方面的可能的实施方式中的任意一个提供的序列转换装置;
其中,所述输入接口用于接收源端数据并将所述源端数据转换为所述源端序列;
所述输出接口,用于输出所述序列转换装置输出的输出目标端序列。
其中,输入接口和输出接口根据序列转换系统的具体表现形式的不同会有不同,例如在序列转换系统是服务器或部署在云端时,输入接口可以是网络接口,源端数据来自于客户端,源端数据可以是经客户端采集获取的语音数据文件,图像数据文件,文本文件等等;相应的,输出接口也可以是前述的网络接口,用于将所述输出目标端序列输出给所述客户端。
在序列转换系统是手机,终端等本地设备时,输入接口根据需要源端数据的类型不同会有不同,例如在源端数据是自然语言文字时,输入接口可以是键盘,鼠标,触摸屏,手写板等手动输入设备,输出接口可以是网络接口或显示接口等。在源端数据是人类语言时,输入接口可以是麦克风等声音采集设备,输出接口可以是扬声器,网络接口或显示接口等(具体根据输出目标端序列的呈现形式的不同会有不同)。在源端数据是图像数据时,输入接口可以是摄像头等图像采集设备,输出接口可以是网络接口或显示接口等。
本发明的第五方面提供了一种计算机存储介质,用于存储可执行指令,所述可执行指令被执行时可以实现第一方面以及第一方面的可能的实施方式中的任意一种方法。
从本发明实施例提供的以上技术方案可以看出,由于本发明实施例在进行序列转换时,会对候选目标端序列的翻译概率值进行调整,使得调整后的翻译概率值更能够体现目标端序列与源端序列的吻合度,因此在根据调整后的翻译概率值选择输出候选目标端序列时, 能够使得选择的输出目标端向量序列更能够与源端序列吻合,从而使得获取的目标端序列能够更好地忠于源端序列,从而提高目标端序列相对于源端序列的准确性。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获取其他的附图。
图1为本发明的一种计算机的结构图;
图2为本发明一个实施例提供的序列转换方法的流程图;
图3为本发明的一种神经元的结构图;
图4为本发明另一个实施例提供的序列转换方法的流程图;
图5为本发明一个实施例提供的调整方法的流程图;
图6为本发明一个实施例提供的序列转换方法中序列的转换流程图;
图7为本发明一个实施例提供的序列转换装置的结构图;
图8为本发明一个实施例提供的调整单元的结构图;
图9为本发明另一个实施例提供的序列转换装置的结构图;
图10为本发明另一个实施例提供的序列转换装置的结构图;
图11位本发明一个实施例提供的序列转换系统的结构图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获取的所有其他实施例,都属于本发明保护的范围。
本发明实施例提供了序列转换方法,可以应用在机器翻译、语音识别、对话系统、自动摘要、自动问答和图像说明文字生成等等任何需要使用序列转换的应用场景。
其中,机器翻译(Machine Translation,经常简写为MT,俗称机翻)属于计算语言学的范畴,其研究借由计算机程序将文字或演说从一种自然语言(Natural language)翻译成另一种自然语言。自然语言通常是指自然地随文化演化的语言,例如英语、汉语、法语、西班牙语和日语等等都是自然语言。其中,输入的自然语言文字可以由用户手动通过键盘,鼠标,触摸屏,手写板等手动输入设备输入,也可以是由远端设备通过网络输入;输出的自然语言文字可以通过显示屏直接进行呈现,也可以通过网络输出给远端设备由远端设备呈现。
语音识别是通过计算机将人类语音内容转换为自然语言文字。输入的人类语音内容可以由麦克风等声音采集设备采集后输入,也可以由远端设备采集后通过网络输入;输出的 自然语言文字可以通过显示屏直接进行呈现,也可以通过网络输出给远端设备由远端设备呈现。
对话系统是通过计算机与人类进行语音对话,其输入的是人类语音内容,输出的是与输入语音内容对应的语音回复。输入的人类语音内容可以由麦克风等声音采集设备采集后输入,也可以由远端设备采集后通过网络输入;输出的语音回复可以通过扬声器等放音设备直接呈现,也可以通过网络输出给远端设备由远端设备呈现。
自动问答是通过计算机对人类语言问题进行回答。自动问答的输入输出可以与对话系统类似,此处不再赘述。
现在主流商用的语音助手涉及到的应用场景就包括语音识别,对话系统和自动问答等。
自动摘要是通过计算机生成一段自然文字的大意,通常用于提供已知领域的文章摘要,例如生成报纸上某篇文章的摘要。输入的一段自然文字可以由用户手动通过键盘,鼠标,触摸屏,手写板等手动输入设备输入,也可以是由远端设备通过网络输入,也可以通过光学字符识别(OCR,Optical Character Recognition)技术识别输入;输出的摘要由自然语言文字组成,可以通过显示屏直接进行呈现,也可以通过网络输出给远端设备由远端设备呈现。
图像说明文字生成是通过计算机产生一幅图像的说明文字。输入的图像可以由远端设备采集后通过网络输入,也可以由摄像头等图像采集设备采集输入;输出的说明文字由自然语言文字组成,可以通过显示屏直接进行呈现,也可以通过网络输出给远端设备由远端设备呈现。
本发明实施例提供的序列转换方法可以通过计算机实现,具体地可以是由一台通用计算机或专用计算机来实现,也可以通过计算机集群来实现,当然也可以通过云来实现。可以理解的是,不管采用前述哪种方式实现,其最终都可以认为是通过计算机来实现,图1描述了一种计算机100的结构,包括至少一个处理器101,至少一个网络接口104,存储器105,和至少一个通信总线102,用于实现这些装置之间的连接通信。处理器101用于执行存储器105中存储的可执行模块来实现本发明的序列转换方法,其中的可执行模块可以是计算机程序。其中,根据计算机100在系统中的作用,以及序列转换方法的应用场景,该计算机100还可以包括有至少一个输入接口106和至少一个输出接口107。
其中,输入接口106和输出接口107根据序列转换方法的应用场景不同会有不同。例如,在序列转换方法应用于机器翻译时,如果输入的自然语言文字由用户手动通过键盘,鼠标,触摸屏,手写板等手动输入设备输入,则输入接口106需要包括与键盘,鼠标,触摸屏,手写板等手动输入设备进行通信的接口;如果输出的自然语言文字通过显示屏直接进行呈现,则输出接口107需要包括与显示屏通信的接口。
在序列转换方法应用于语音识别时,如果输入的人类语音内容由麦克风等声音采集设备采集后输入,则输入接口106需要包括与麦克风等声音采集设备通信的接口;如果输出的自然语言文字通过显示屏直接进行呈现,则输出接口107需要包括与显示屏通信的接口。
在序列转换方法应用于对话系统时,如果输入的人类语音内容由麦克风等声音采集设备采集后输入,则输入接口106需要包括与麦克风等声音采集设备通信的接口;如果输出 的语音回复通过扬声器等放音设备直接呈现,则输出接口107需要包括与扬声器等放音设备通信的接口。
在序列转换方法应用于自动问答时,如果输入的人类语音内容由麦克风等声音采集设备采集后输入,则输入接口106需要包括与麦克风等声音采集设备通信的接口;如果输出的语音回复通过扬声器等放音设备直接呈现,则输出接口107需要包括与扬声器等放音设备通信的接口。
在序列转换方法应用于自动摘要时,如果输入的一段自然文字由用户手动通过键盘,鼠标,触摸屏,手写板等手动输入设备输入时,则输入接口106需要包括与键盘,鼠标,触摸屏,手写板等手动输入设备进行通信的接口;如果输入的一段自然文字由OCR模块识别输入时,则输出接口107需要包括与OCR模块通信的接口;如果输出的摘要通过显示屏直接进行呈现,则输入输出接口106需要包括与显示屏通信的接口。
在序列转换方法应用于图像说明文字生成时,如果输入的图像由摄像头等图像采集设备采集输入,则输入接口106需要包括与摄像头等图像采集设备通信的接口;如果输出的说明文字通过显示屏直接进行呈现,则输出接口107需要包括与显示屏通信的接口。
如下先介绍本发明实施例提供的序列转换方法,图2描述了本发明一个实施例提供的序列转换方法的流程,该方法可以基于递归神经网络技术实现,如图2所示,该方法包括:
201、接收源端序列。
源端序列根据应用场景的不同会有不同,例如在应用于机器翻译时,源端序列是一种自然语言的文字,可以是一个短语,也可以是一句话,甚至可以是一段文字。在应用于语音识别时,源端序列是一段人类语音内容。在应用于对话系统时,源端序列是一段人类语音内容。在应用于自动问答时,源端序列是一段人类语音内容。在应用于自动摘要时,源端序列是待摘要的一段自然文字。在应用于图像说明文字生成时,源端序列是待生成说明文字的一幅图像。
源端序列的具体接收过程已经在前面进行了详细的描述,此处不再赘述。
在如下的实施例描述中,源端序列用(x1,x2,x3,…,xJ)表示,其中J是源端序列中元素的数量。
202、将源端序列转换为源端向量表示序列。
其中,源端向量表示序列中的元素(源端向量表示)的数量可以为一个或多个,具体数量根据源端序列的情况以及所使用的转换算法的不同会有不同。
具体地,在一种实施方式中,可以通过循环神经网络(Recurrent Neural Network,RNN)将源端序列(x1,x2,x3,…,xJ)转换成源端向量表示序列(h1,h2,h3,…,hJ)。具体可以采用词向量化技术将源端序列转换成源端向量表示序列。
其中,在应用场景为机器翻译时,在将源端序列转换为源端向量表示序列时,可以采用词向量化的方式,词向量化是指通过对大量文本的学习(不需要标注),根据前后文自动学习到每个词的语义,然后把每个词映射为一个实数向量形式的表达,在这种情况下,每个词向量会被转换成一个源端向量表示,例如vec(中国)减去vec(北京)约等于vec(英国)减 去vec(伦敦)。在词向量的基础上,可扩展地把一句话映射为一个向量,甚至把一段文字映射为一个向量。
可以理解的是,在应用场景不是机器翻译时,也可以采用与上述词向量类似的方式将源端序列转换为源端向量表示序列。
203、根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值。
具体地,可以通过RNN来获取候选目标端序列,以及候选目标端序列的翻译概率值。
在一种可能的实施方式中,可以基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;再获取所述至少两个源端上下文向量各自的解码中间状态序列;然后再获取所述至少两个解码中间状态序列各自的候选目标端序列。
其中,在基于注意力机制将源端向量表示序列(h1,h2,h3,…,hJ)转换成源端上下文向量(c1,c2,…,cI)时,具体可以通过注意力机制为源端向量表示序列(h1,h2,h3,…,hJ)中每一个向量赋予一个权重(该权重可以从训练语料中自动学习到的),该权重表示源端向量表示序列(h1,h2,h3,…,hJ)中的向量与即将生成的目标端词的对齐概率,对源端向量表示序列中每一个向量加权后获取向量组成的序列即为源端上下文序列。其中,I和J的取值可以不同。
在另一些实施方式中,可以将源端向量表示序列中最后一个向量作为源端上下文向量;也可以使用注意力机制根据具体需要在不同解码时刻将源端向量表示序列中的所有向量使用加权和的方式总结成一个源端上下文向量;也可以使用卷积神经网络(Convolutional Neural Network)将源端序列总结成一个源端上下文向量。
其中,在一些实施方式中,具体的候选目标端序列中目标端词的获取过程可以包括如下两个步骤:
1.更新当前的解码中间状态。其中,解码中间状态是过去翻译信息的总结,具体可以使用通用的RNN模型来进行更新。该更新步骤的输入包括:当前的源端上下文向量(ci),前一个的解码中间状态(si-1),以及前一个的目标端词(yi-1);该更新步骤的输出包括当前的解码中间状态(si)。其中,需要说明的是,在当前的解码中间状态是s1时,由于没有前一个解码中间状态可以输入,因此在实际实现过程中,可以预先设置一个初始化解码中间状态,该初始化解码中间状态携带的信息可以为空(即全零),也可以是预先设置的信息。
2.生成当前的目标端词。该生成步骤的输入包括:当前的源端上下文向量(ci),当前的解码中间状态(si)和前一个的目标端词(yi-1);该生成步骤的输出包括当前的目标端词(yi)和翻译概率值。其中,翻译概率值表示的是对应的目标端词与源端上下文向量的吻合度,根据具体的算法不同,翻译概率值可以是越高越好,也可以是越低越好,还可以是与预先设置的一个基准值越接近越好。其中,需要说明的是,在当前的目标端词是y1时,由于没有前一个目标端词可以输入,因此在实际实现过程中,可以预先设置一个初始化目标端词,该初始化目标端词携带的信息可以为空(即全零),也可以是预先设置的信息。
在目标端序列中的每一个目标端词都确定了之后,可以直接将每一个目标端词对应的翻译概率值相乘得到该目标端序列的翻译概率值;也可以将各个目标端词对应的翻译概率 值的乘积,再基于目标端词的数量对乘积进行归一化获得的值作为该目标端序列的翻译概率值。
可以理解但是,如上两个步骤可能会循环执行多次,循环执行的次数根据源端上下文向量序列中元素(即源端上下文向量)的数量的不同会有不同。
其中,目标端序列中元素的表现形式跟目标端序列的表现形式的不同会有不同,例如,在目标端序列是自然语言文字序列时,目标端序列中元素的表现形式可以是词向量,对应的是一个或一组目标端词。
具体的目标端序列的获取过程根据应用场景的不同会有不同,在各个应用场景下都有对应的主流通用技术可以使用,本发明并不对具体的预测方法进行限定。
需要说明的是,为了所述至少两个候选目标端序列可以是并行获取的,即可以同时执行获取至少两个候选目标端序列的获取过程;也可以是串行获取的,即在获取了一个候选目标端序列后再去获取另一个候选目标端序列。本发明实施例不对具体的获取顺序进行限定,只要能够获取至少两个候选目标端序列都不会影响本发明实施例的实现。
204.对每一个候选目标端序列的翻译概率值进行调整。
其中,在对候选目标端序列的翻译概率值进行调整时,可以根据预先设置的方式进行。一种可能的实施方式是预先设置好一些调整因子,在对翻译概率值进行调整时可以直接使用这些预先设置好的调整因子对翻译概率值进行调整,从而可以获取调整后的翻译概率值;另一种可能的实施方式是预先设置好调整因子的获取算法,该获取算法的输入可以是候选目标端序列,或候选目标端序列的翻译概率值,或者也可以是获取候选目标端序列的过程中的一些中间信息,所述的中间信息可以是解码中间状态序列,和/或目标端序列等。在本发明的一种实施方式中,该中间信息可以是解码中间状态序列,解码中间状态序列与候选目标端序列时一一对应的关系,解码中间状态序列包括了获取对应的候选目标端序列时获取的解码中间状态。
其中,使用调整因子可以提高系统的处理效率,使用调整算法可以进一步提高调整的准确性,从而提高输出目标端序列与源端序列的吻合度。
需要说明的是,在对至少两个候选目标端序列的翻译概率值进行调整时,可以同时执行该至少两个翻译概率值的调整过程,也可以在调整完一个翻译概率值后再调整另一个翻译概率值。本发明实施例并不对具体的调整顺序进行限定。
205.根据每一个候选目标端序列的调整后的翻译概率值,从所述的至少两个候选目标端序列中选择输出目标端序列。
具体地,由于翻译概率值表示了对应的候选目标端序列与源端序列的吻合度(忠诚度越好,以及流畅度越高则吻合度越高),因此在从至少两个候选目标端序列中选择输出目标端序列时,根据翻译概率值的大小与吻合度的对应关系,可以直接选择翻译概率值最高的候选目标端序列,或者选择翻译概率值最低的候选目标端序列,或者选择翻译概率值与预先设置的基准值最接近的候选目标端序列。
206、输出选择的输出目标端序列。
具体的输出过程根据应用场景的不同会有不同,在各个应用场景下都有对应的主流通用技术可以使用,本发明并不对具体的输出方法进行限定。
从上可知,本发明在进行序列转换时,会对候选目标端序列的翻译概率值进行调整,使得调整后的翻译概率值更能够体现目标端序列与源端序列的吻合度,因此在根据调整后的翻译概率值选择输出候选目标端序列时,能够使得选择的输出目标端序列更能够与源端序列吻合,从而使得获取的目标端序列能够更好地忠于源端序列同时不会影响流畅度,从而提高目标端序列相对于源端序列的准确性,其中,准确性包括忠诚度和流畅度。
神经元是一种最简单的神经网络,神经网络是20世纪80年代以来人工智能领域兴起的研究热点。它从信息处理角度对人脑神经元网络进行抽象,建立某种简单模型,按不同的连接方式组成不同的网络。在工程与学术界也常直接简称为神经网络或类神经网络。神经网络是一种运算模型,由大量的神经元(节点)相互联接构成。每个节点代表一种特定的输出函数,称为激活函数(activation function)。每两个节点间的连接都代表一个对于通过该连接信号的加权值,称之为权重,这相当于人工神经网络的记忆。网络的输出则依网络的连接方式,权重值和激活函数的不同而不同。
图3描述了一种神经元的结构。如图3所示,该神经元是一个以x1,x2,x3及截距+1为输入值的运算单元,神经元的具体表现形式可以为激活函数,例如,图3所示的神经元的输出可以表示成如下的函数(1)的形式:
Figure PCTCN2017108950-appb-000028
其中,Wi为权重向量,b为偏置单元,函数f为激活函数,在一些实施方式中,激活函数可以用sigmoid函数实现,一种典型的sigmoid函数的表现形式如下的函数(2)所示:
Figure PCTCN2017108950-appb-000029
在涉及自然语言文字处理的序列到序列转换中,源端序列的长度和目标端序列的长度都不是固定的。本发明的一个实施例采用循环神经网络(RNN,Recurrent Neural Network)来处理这种变长的源端序列和目标端序列。RNN的主要思想是循环地将输入的序列压缩成一个固定维度的向量,该固定维度向量也称为中间状态。具体地,RNN循环读取输入序列,根据当前源端上下文向量和前一解码中间状态计算当前源端上下文向量对应的解码中间状态,再根据当前解码中间状态和当前源端上下文向量来预测当前源端上下文向量对应的目标端上词。在本发明的一个实施例中,还可以使用当前源端上下文向量,前一解码中间状态以及前一目标端词来计算当前源端上下文向量对应的解码中间状态,再根据当前解码中间状态,当前源端上下文向量以及前一目标端词来预测当前源端上下文向量对应的目标端词。最后可以根据获得的所有目标端词获取最终的目标端序列。
图4描述了本发明另一个实施例提供的序列转换方法的流程,包括:
401、接收源端序列。
402、将源端序列转换为源端向量表示序列。
403、根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值以及解码中间状态序列。
其中,步骤401-403的具体实现可以参考步骤201-203的具体实现,此处不再赘述。
404、基于每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
具体地,可以根据预先设置的方法对翻译概率值进行调整。
例如,图5描述了本发明一个实施例提供的调整方法的流程,包括:
4041.基于第一候选目标端序列的解码中间状态序列获取第一候选目标端序列的重构概率值;该第一候选目标端序列是所述的至少两个候选目标端序列中的任意一个。
其中,在获取重构概率值时具体可以通过注意力机制来获取,例如可以采用主流的是反向注意力(inverse attention)机制来获取重构概率值。
在通过反向注意力机制来获取重构概率值时,可以将第一候选目标端序列的解码中间状态序列(s1,s2,…,sI)作为该反向注意力机制的输入,该反向注意力机制的输出即是第一候选目标端序列的重构概率值。其中,重构是指的将解码中间状态序列(s1,s2,…,sI)强制反向解码至源端序列(x1,x2,…,xJ),由于源端序列是已知的,因此可以确定将解码中间状态序列(s1,s2,…,sI)强制反向解码至源端序列(x1,x2,…,xJ)的重构概率值,在具体的重构过程中,可以分别获得重构回x1,x2,…,xJ的J个重构概率值,然后基于该J个重构概率值得到最终的重构概率值。重构概率值表示的是解码中间状态序列与源端序列之间的吻合度,具体可以是重构概率值越高吻合度越高,或者是重构概率值越低则吻合度越高,或者是重构概率值与预先设置的基准值越接近则吻合度越高。
在一种实施方式中,具体可以根据如下函数(3)来获取第一候选目标端序列的重构概率值:
Figure PCTCN2017108950-appb-000030
其中,gR()是Softmax函数,Softmax函数可以对重构获取的源端序列中所有词向量进行归一化操作获取每个词向量的重构概率值,再通过上述函数(3)就可以确定第一候选目标端序列的重构概率值。其中,Softmax函数是神经网络中通用的一个函数,此处不对其进行赘述。
Figure PCTCN2017108950-appb-000031
是通过反向注意力机制总结得到的向量,可以通过如下的函数(4)获取:
Figure PCTCN2017108950-appb-000032
其中,
Figure PCTCN2017108950-appb-000033
是由反向注意力机制输出的对齐概率,可以通过如下的函数(5)获取:
Figure PCTCN2017108950-appb-000034
其中,ej,k是源端序列中元素的反向注意力机制得分,,可以通过如下的函数(6)获取:
Figure PCTCN2017108950-appb-000035
Figure PCTCN2017108950-appb-000036
是重构过程的中间状态,可以通过如下的函数(7)获取:
Figure PCTCN2017108950-appb-000037
xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1是函数(3)的参数,γ2是函数(6)的参数,γ3是函数(7)的参数;在一些实施方式中,γ1,γ2和γ3可以是不同的参数,在一些实施方式中,γ1,γ2和γ3也可以是部分不同的参数,在一些实施例中γ1,γ2和γ3还可以是相同的参数;γ1,γ2和γ3可以通过端到端学习算法训练获取。具体的端到端学习算法可以预先设定。
4042.基于第一候选目标端序列的重构概率值对第一候选目标端序列的翻译概率值进行调整。
其中,具体可以对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值;在一些实施方式中,可以直接将线性插值求和的结果作为调整后的翻译概率值,在一些实施方式中,也可以对线性插值求和的结果做进一步的处理,将处理后的结果作为调整后的翻译概率值。也可以对所述第一候选目标端序列的翻译概率值和重构概率值使用加权平均的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。还可以直接将第一候选目标端序列的翻译概率值和重构概率值相加,将得到的和作为所述第一候选目标端序列的调整后的翻译概率值。
由于翻译概率值和重构概率值都能够在一定程度上体现对应的候选目标端序列相对于源端序列的准确性,因此使用线性插值的方式对二者求和可以很好地对二者进行平衡,从而使调整后的翻译概率值能够更好地体现对应的候选目标端序列的准确性,从而使最终得到的目标端序列能够更好地与源端序列吻合。
其中,步骤4041和4042可以针对每一个候选目标端序列均执行一次,即有几个候选目标端序列就循环执行几次,从而获取每个候选目标端序列的调整后的翻译概率值。
405.根据每一个候选目标端序列的调整后的翻译概率值,从所述的至少两个候选目标端序列中选择输出目标端序列。
406、输出选择的输出目标端序列。
其中,步骤405和406的具体实现可以参考步骤205和206的具体实现,此处不再赘述。
从上可知,本实施例在进行序列转换时,会对候选目标端序列的翻译概率值进行调整,使得调整后的翻译概率值更能够体现目标端序列与源端序列的吻合度,因此在根据调整后 的翻译概率值选择输出候选目标端序列时,能够使得选择的输出目标端序列更能够与源端序列吻合,从而使得获取的目标端序列能够更好地忠于源端序列,从而提高目标端序列相对于源端序列的准确性;同时,由于候选目标端序列的获取需要基于对应的解码中间状态序列,因此增加获取所述每一个候选目标端序列的解码中间状态序列的步骤并不会实质增加序列转换装置的处理负荷;同时,由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
如上所述,所述的参数γ1,γ2和γ3是通过端到端的学习算法训练获取的,在本发明的一种实施方式中,所述的参数γ1,γ2和γ3具体可以是通过如下的函数(8)训练获取的:
Figure PCTCN2017108950-appb-000038
其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。其中,λ可以人工提前设定,也可以通过函数控制通过训练得到。可以看出,函数(8)包含两部分:(likelihood)概率和重构(reconstruction)概率,其中似然概率可以很好地评估翻译的流畅度,而重构概率可以评价翻译的忠诚度。将两者结合起来可以更好地评估翻译的质量,从而有效引导参数训练以生成更好的翻译结果。其中,训练序列对的具体表现形式根据序列转换方法应用的具体场景不同会有不同;例如,在应用场景为机器翻译时,每一个训练序列对都是一对互为翻译的自然语言句子。
在一些实施方式中,为了在进行序列转换时能够更好地将解码中间状态重构回源端序列,在训练获取参数γ1,γ2和γ3时,可以鼓励解码器中间状态尽可能包含完整的源端信息,从而提高目标端序列的忠诚度。
图6描述了本发明一个实施例提供的序列转换方法中序列的转换流程,如图6所示,
在执行本发明的序列转换方法过程中,包括了如下的序列转换:
A,源端序列(x1,x2,x3,…,xJ)先被转换成了源端向量表示序列(h1,h2,h3,…,hJ)。该过程具体可以采用词向量化技术实现。
B,源端向量表示序列(h1,h2,h3,…,hJ)通过注意力机制转换成了源端上下文向量(c1,c2,…,cI);需要注意的是,I和J的取值可以相同也可以不同。
C,通过源端上下文向量(c1,c2,…,cI)获得了对应的解码中间状态序列(s1,s2,…,sI),从图中可以看出,在获得当前的解码中间状态si时参考了前一个的解码中间状态si-1其中,1≤i≤I。需要说明的是,由于在获得解码中间状态s1时并没有可以参考的前一个的解码中间状态,此时参考预先设置好的初始化解码中间状态,该初始化解码中间状态可以携带空信息(即全零),也可以是预先设置好的信息,本发明实施例不对携带的信息进行具体限定。
D,基于解码中间状态序列(s1,s2,…,sI)和源端上下文向量序列(c1,c2,…,cI)获取候选目标 端序列(y1,y2,…,yI);同时,在这个步骤会同时输出该候选目标端序列(y1,y2,…,yI)的翻译概率值,具体地,在获取目标端序列(y1,y2,…,yI)的过程中,可以分别计算通过s1和c1得到目标端词y1的翻译概率值1,通过s2和c2得到目标端词y2的翻译概率值2,……,通过sJ和cJ得到目标端词yJ的翻译概率值J,然后就可以基于得到的翻译概率值1,翻译概率值2,……,翻译概率值J获得候选目标端序列(y1,y2,…,yI)的翻译概率值。具体地,可以将翻译概率值1,翻译概率值2,……,和翻译概率值J相乘得到最终的翻译概率值;也可以在得到翻译概率值1,翻译概率值2,……,和翻译概率值J的乘积后,再基于J的值对乘积进行归一化获得最终的翻译概率值。
E,基于反向注意力机制将解码中间状态序列(s1,s2,…,sI)转换为重构源端上下文向量
Figure PCTCN2017108950-appb-000039
Figure PCTCN2017108950-appb-000040
F,基于重构源端上下文向量
Figure PCTCN2017108950-appb-000041
获取重构中间状态序列
Figure PCTCN2017108950-appb-000042
其中,如图6所示,在获取当前的重构源端向量表示
Figure PCTCN2017108950-appb-000043
时参考了前一个重构中间状态
Figure PCTCN2017108950-appb-000044
其中,1≤j≤J。需要说明的是,由于在获取重构中间状态
Figure PCTCN2017108950-appb-000045
时并没有可以参考的前一个重构中间状态,此时参考预先设置好的初始化重构中间状态,该初始化重构中间状态可以携带空信息(即全零),也可以是预先设置好的信息,本发明实施例不对携带的信息进行具体限定。
G,基于重构中间状态序列
Figure PCTCN2017108950-appb-000046
和重构源端上下文向量
Figure PCTCN2017108950-appb-000047
获取源端序列(x1,x2,x3,…,xJ)。由于源端序列(x1,x2,x3,…,xJ)是已知的,因此在通过重构中间状态序列
Figure PCTCN2017108950-appb-000048
Figure PCTCN2017108950-appb-000049
和重构源端上下文向量
Figure PCTCN2017108950-appb-000050
输出源端序列(x1,x2,x3,…,xJ)时,可以分别计算通过
Figure PCTCN2017108950-appb-000051
Figure PCTCN2017108950-appb-000052
得到x1的重构概率值1,通过
Figure PCTCN2017108950-appb-000053
Figure PCTCN2017108950-appb-000054
得到x2的重构概率值2,……,通过
Figure PCTCN2017108950-appb-000055
Figure PCTCN2017108950-appb-000056
得到xJ的重构概率值J,然后就可以基于得到的重构概率值1,重构概率值2,……,重构概率值J获得通过解码中间状态序列重构回源端序列的重构概率值。具体地,可以将重构概率值1,重构概率值2,……,和重构概率值J相乘得到最终的重构概率值;也可以在得到重构概率值1,重构概率值2,……,和重构概率值J的乘积后,再基于J的值对乘积进行归一化获得最终的重构概率值。
可以理解的是,为了获取至少两个候选目标端序列,步骤B-D需要执行至少两次,该至少两次执行可以是同时进行的,也可以是顺序进行的,本发明不对具体的执行顺序进行限定,只要能够得到至少两个候选目标端序列都不会影响本发明实施例的实现。同理,为了获得与所述两个候选目标端序列分别对应的至少两个重构概率值,步骤E-G也需要执行至少两次,该至少两次执行可以是同时进行的,也可以是顺序进行的,本发明不对具体的执行顺序进行限定,只要能够得到至少两个重构概率值都不会影响本发明实施例的实现。
如上对本发明提供的序列转换方法进行了详细描述,为了验证本发明提供的序列转换方法的有效性,发明人在应用场景为机器翻译的情况下对本发明提供的序列转换方法进行了测试,以测试将汉语翻译成英语的准确性。为了表示测试的公平,现有技术和本发明都是在神经网络下实现的,其中,现有技术采用的是标准神经网络机器翻译(NMT)系统,本发明测试结果是在NMT的基础上增加实现本发明方案获取的。
表1描述了在具有125万个训练序列对的训练序列集合进行了训练获取了参数γ1,γ2 或γ3,然后在标准的公开测试集上使用测试本发明技术和现有技术获取的双语评估替代(BLEU,Bilingual Evaluation Understudy)得分。
表1
Figure PCTCN2017108950-appb-000057
其中,表1中Beam是搜索空间,Tuning是开发集,MT05,MT06和MT08是三种不同的测试集,All表示的是测试集序列,Oracle表示的是理论最优值。
从表1可以看出,本发明技术的BLEU得分在每一个条件下都高于现有技术,平均来看,相比现有技术提高了2.3个BLEU得分。同时,需要注意的是,现有技术在增加搜索空间(表1中的Beam)时,翻译质量反而下降,而本发明也很好地克服了现有技术的这个缺陷,即搜索空间越大,翻译质量也越好。
为了进一步地验证本发明技术的效果,发明人测试了将本发明技术应用在训练和在线测试时分别的效果,结果如表2所示。
表2
Figure PCTCN2017108950-appb-000058
从表2可以看出,在仅将本发明技术应用在训练时,BLEU得分就已经高于现有技术了,即仅仅将本发明技术应用在训练就可以提高序列转换的质量。在将将本发明技术应用在训练和测试时,可以进一步提高序列转换的质量。
为了更全面的评估本发明技术的效果,发明人评估了序列转换(机器翻译)时遗漏翻译和过度翻译问题的情况,结果如表3所示。
表3
Model 遗漏翻译 过度翻译
现有技术 18.2% 3.9%
本发明技术 16.2% 2.4%
从表3可以看出,使用本发明技术后,相对现有技术可以减少11.0%的遗漏翻译,以及减少38.5%的过度翻译,效果提升显著。
进一步地,发明人测试了本发明技术与现有的相关增强技术的兼容度,发明人具体测 试了本发明技术与覆盖率模型(Coverage Model)和上下文门(Context Gates)机制的兼容度,结果如表4所示。
表4
Figure PCTCN2017108950-appb-000059
从表4可以看出,本发明技术与现有的相关增强技术,即覆盖率模型和上下文门机制的技术兼容性比较好,在应用本发明技术之后都能够提高BLEU得分,因此能够与现有的相关增强技术互补,进一步提高序列转换(机器翻译)的质量。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
图7描述了本发明一个实施例提供的序列转换装置500的结构,如图7所示,该序列转换装置500包括:
接收单元501,用于接收源端序列。具体实现可以参考步骤201的描述,此处不再赘述。
转换单元502,用于将所述源端序列转换为源端向量表示序列。具体实现可以参考步骤202的描述,此处不再赘述。
获取单元503,用于根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值。具体实现可以参考步骤203的描述,此处不再赘述。
调整单元504,用于对所述每一个候选目标端序列的翻译概率值进行调整。具体实现可以参考步骤204的描述,此处不再赘述。
选择单元505,用于根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列。具体实现可以参考步骤205的描述,此处不再赘述。
输出单元506,用于输出所述输出目标端序列。具体实现可以参考步骤206的描述,此处不再赘述。
从上可知,本发明在进行序列转换时,会对候选目标端序列的翻译概率值进行调整,使得调整后的翻译概率值更能够体现目标端序列与源端序列的吻合度,因此在根据调整后的翻译概率值选择输出候选目标端序列时,能够使得选择的输出目标端序列更能够与源端序列吻合,从而使得获取的目标端序列能够更好地忠于源端序列,从而提高目标端序列相对于源端序列的准确性。
在本发明的一些实施方式中,图7中的获取单元503可以具体用于基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;获取所述至少两个源端上下文向量各自的解码中间状态序列;获取所述至少两个解码中间状态序列各自的候选目标端序列。相应的,调整单元504,可以具体用于基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
由于候选目标端序列的获取需要基于对应的解码中间状态序列,因此基于解码中间状态序列对翻译概率值进行调整并不会进一步增加序列转换装置的处理负荷;同时,由于解码中间状态序列可以在一定程度上代表对应的候选目标端序列的翻译准确度,因此根据解码中间状态序列对翻译概率值的调整可以提高调整后的翻译概率值的准确性,从而提高最终的目标端序列的准确性。
如图8所示,在一种具体实施方式中,本发明实施例提供的序列转换装置所包括的调整单元504具体可以包括:获取子单元5041,用于基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;调整子单元5042,用于基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
其中,在一种具体实施方式中,获取子单元5041可以具体用于:基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
在一种实施方式中,获取子单元5041可以用于根据如下函数获取所述第一候选目标端序列的重构概率值:
Figure PCTCN2017108950-appb-000060
其中,gR()是Softmax函数;
Figure PCTCN2017108950-appb-000061
是通过反向注意力机制总结得到的向量,可以通过如下的函数获取:
Figure PCTCN2017108950-appb-000062
其中,
Figure PCTCN2017108950-appb-000063
是由反向注意力机制输出的对齐概率,可以通过如下的函数获取:
Figure PCTCN2017108950-appb-000064
其中,ej,k是源端序列中元素的反向注意力机制得分,可以通过如下的函数获取:
Figure PCTCN2017108950-appb-000065
Figure PCTCN2017108950-appb-000066
是获取重构概率值时的中间状态,可以通过如下的函数获取:
Figure PCTCN2017108950-appb-000067
xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
其中,在一种实施方式中,所述的调整子单元5042,具体用于对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
由于翻译概率值和重构概率值都能够在一定程度上体现对应的候选目标端序列相对于源端序列的准确性,因此使用线性插值的方式对二者求和可以很好地对二者进行平衡,从而使调整后的翻译概率值能够更好地体现对应的候选目标端序列的准确性,从而使最终得到的目标端序列能够更好地与源端序列吻合。
图9描述了本发明另一个实施例提供的序列转换装置800的结构,图9描述的序列转换装置800与图7描述的序列转换装置500相比增加了训练单元801,用于通过端到端学习算法训练获取所述参数γ1,γ2和γ3;获取单元503就可以通过训练单元训练获取的参数γ1,γ2和γ3来获取候选目标端序列。其余的输入单元501,转换单元502,获取单元503,调整单元504,选择单元505,和输出单元506的实现可以参考前面的描述,不再赘述。
在一种实施方式中,训练单元801,具体用于通过如下函数训练获取所述参数γ1,γ2和γ3
Figure PCTCN2017108950-appb-000068
其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
图10描述了本发明另一个实施例提供的序列转换装置900的结构,包括至少一个处理器902(例如CPU),至少一个网络接口905或者其他通信接口,存储器906,和至少一个通信总线903,用于实现这些装置之间的连接通信。处理器902用于执行存储器906中存储的可执行模块,例如计算机程序。存储器906可能包含高速随机存取存储器(RAM:Random Access Memory),也可能还包括非不稳定的存储器(non-volatile memory),例如至 少一个磁盘存储器。通过至少一个网络接口705(可以是有线或者无线)实现该系统网关与至少一个其他网元之间的通信连接,可以使用互联网,广域网,本地网,城域网等。
在一些实施方式中,存储器906存储了程序9061,程序9061可以被处理器902执行,这个程序被执行时可以执行上述本发明提供的序列转换方法。
在本发明的实施例中,在序列转换方法/装置应用于机器翻译时,源端序列是一种自然语言文字或基于所述一种自然语言文字获得的文本文件,目标端序列是另一种自然语言文字或基于所述另一种自然语言文字获得的文本文件;
在序列转换方法/装置应用于语音识别时,源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,目标端序列是所述语音内容对应的自然语言文字或基于所述自然语言文字获得的文本文件;
在序列转换方法/装置应用于自动对话时,源端序列是人类的语音内容或基于所述人类的语音内容获得的语音数据文件,目标端序列是所述人类的语音内容的语音回复或基于所述语音回复获得的语音数据文件;
在序列转换方法/装置应用于自动摘要时,源端序列是待摘要的自然语言文字,目标端序列是待摘要的自然语言文字的摘要,摘要是自然语言文字或基于所述自然语言文字获得的文本文件;
在序列转换方法/装置应用于图像说明文字生成时,源端序列是图像或基于所述图像获得的图像数据文件,目标端序列是图像的自然语言说明文字或基于所述自然语言说明文字获得的文本文件。
本发明实施例提供的序列转换装置在进行序列转换时,会对候选目标端序列的翻译概率值进行调整,使得调整后的翻译概率值更能够体现目标端序列与源端序列的吻合度,因此在根据调整后的翻译概率值选择输出候选目标端序列时,能够使得选择的输出目标端序列更能够与源端序列吻合,从而使得获取的目标端序列能够更好地忠于源端序列,从而提高目标端序列相对于源端序列的准确性。
图11描述了本发明一个实施例提供的序列转换系统1100的结构,如图11所示,该系统包括:
输入接口1101,输出接口1103以及序列转换装置1102;其中,序列转换装置1102可以是本发明实施例提供的任意一个序列转换装置,此处不再赘述其功能和实现。
其中,所述输入接口1101用于接收源端数据并将所述源端数据转换为所述源端序列;转换获得的源端序列可以输入所述序列转换装置1102;
其中将源端数据转换为源端序列的具体处理过程根据源端数据的呈现形式的不同会有不同,例如源端数据是人类语音时,则会将人类语音转换成语音数据文件作为源端序列;源端数据是图像时,则会将图像转换成图像数据文件作为源端序列;源端数据是自然语言文字时,则会将自然语言文字转换为文本文件作为源端序列。可以理解的是,具体的转换过程可以使用现有的通用公知技术,本发明并不对具体的转换过程进行限定。
所述输出接口1103,用于输出所述序列转换装置1102输出的输出目标端序列。
其中,输入接口1101和输出接口1103根据序列转换系统的具体表现形式的不同会有不同,例如在序列转换系统是服务器或部署在云端时,输入接口1101可以是网络接口,源端数据来自于客户端,源端数据可以是经客户端采集获取的语音数据文件,图像数据文件,文本文件等等;相应的,输出接口1103也可以是前述的网络接口,用于将所述输出目标端序列输出给所述客户端。
上述装置和系统内的各模块之间的信息交互、执行过程等内容,由于与本发明方法实施例基于同一构思,具体内容可参见本发明方法实施例中的叙述,此处不再赘述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,上述的存储介质可为磁碟、光盘、只读存储记忆体(ROM:Read-Only Memory)或RAM等。
本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本发明的限制。

Claims (17)

  1. 一种序列转换方法,其特征在于,包括:
    接收源端序列;
    将所述源端序列转换为源端向量表示序列;
    根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值;
    对所述每一个候选目标端序列的翻译概率值进行调整;
    根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列;
    输出所述输出目标端序列。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述源端向量表示序列获取至少两个候选目标端序列包括:
    基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;
    获取所述至少两个源端上下文向量各自的解码中间状态序列;
    获取所述至少两个解码中间状态序列各自的候选目标端序列;
    所述对所述每一个候选目标端序列的翻译概率值进行调整包括:
    基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
  3. 根据权利要求2所述的方法,其特征在于,所述至少两个候选目标端序列包括第一候选目标端序列,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;
    所述基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整包括:
    基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值;
    基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述第一候选目标端序列的解码中间状态获取所述第一候选目标端序列的重构概率值包括:
    基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
  5. 根据权利要求4所述的方法,其特征在于,所述基于反向注意力机制获取所述第一候选目标端序列的重构概率值包括:
    根据如下函数获取所述第一候选目标端序列的重构概率值:
    Figure PCTCN2017108950-appb-100001
    其中,gR()是Softmax函数;
    Figure PCTCN2017108950-appb-100002
    是通过反向注意力机制总结得到的向量,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100003
    其中,
    Figure PCTCN2017108950-appb-100004
    是由反向注意力机制输出的对齐概率,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100005
    其中,ej,k是源端序列中元素的反向注意力机制得分,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100006
    Figure PCTCN2017108950-appb-100007
    是获取重构概率值时的中间状态,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100008
    xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
  6. 根据权利要求5所述的方法,其特征在于,所述参数γ1,γ2和γ3通过端到端学习算法训练获取。
  7. 根据权利要求6所述的方法,其特征在于,所述参数γ1,γ2和γ3通过如下函数训练获取:
    Figure PCTCN2017108950-appb-100009
    其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
  8. 根据权利要求3至7任一所述的方法,其特征在于,所述基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整包括:
    对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
  9. 一种序列转换装置,其特征在于,包括:
    接收单元,用于接收源端序列;
    转换单元,用于将所述源端序列转换为源端向量表示序列;
    获取单元,用于根据所述源端向量表示序列获取至少两个候选目标端序列,以及所述至少两个候选目标端序列中每一个候选目标端序列的翻译概率值;
    调整单元,用于对所述每一个候选目标端序列的翻译概率值进行调整;
    选择单元,用于根据所述每一个候选目标端序列的调整后的翻译概率值,从所述至少两个候选目标端序列中选择输出目标端序列;
    输出单元,用于输出所述输出目标端序列。
  10. 根据权利要求9所述的装置,其特征在于,所述获取单元,具体用于基于注意力机制根据所述源端向量表示序列获取至少两个源端上下文向量;获取所述至少两个源端上下文向量各自的解码中间状态序列;获取所述至少两个解码中间状态序列各自的候选目标端序列;
    所述调整单元,具体用于基于所述每一个候选目标端序列的解码中间状态序列对各自的翻译概率值进行调整。
  11. 根据权利要求10所述的装置,其特征在于,所述至少两个候选目标端序列包括第一候选目标端序列,所述第一候选目标端序列是所述至少两个候选目标端序列中的任意一个;
    所述调整单元包括:
    获取子单元,用于基于所述第一候选目标端序列的解码中间状态序列获取所述第一候选目标端序列的重构概率值;
    调整子单元,用于基于所述第一候选目标端序列的重构概率值对所述第一候选目标端序列的翻译概率值进行调整。
  12. 根据权利要求11所述的装置,其特征在于,所述获取子单元具体用于:
    基于反向注意力机制获取所述第一候选目标端序列的重构概率值,所述反向注意力机制的输入是所述第一候选目标端序列的解码中间状态序列,所述反向注意力机制的输出是所述第一候选目标端序列的重构概率值。
  13. 根据权利要求12所述的装置,其特征在于,所述获取子单元具体用于:
    根据如下函数获取所述第一候选目标端序列的重构概率值:
    Figure PCTCN2017108950-appb-100010
    其中,gR()是Softmax函数;
    Figure PCTCN2017108950-appb-100011
    是通过反向注意力机制总结得到的向量,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100012
    其中,
    Figure PCTCN2017108950-appb-100013
    是由反向注意力机制输出的对齐概率,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100014
    其中,ej,k是源端序列中元素的反向注意力机制得分,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100015
    Figure PCTCN2017108950-appb-100016
    是获取重构概率值时的中间状态,通过如下的函数获取:
    Figure PCTCN2017108950-appb-100017
    xj是所述源端序列中的元素,J表示所述源端序列中元素的数量;si表示所述第一候选目标端序列的解码中间状态序列中的元素,I表示所述第一候选目标端序列的解码中间状态序列中元素的数量;fR是激活函数,R是重构概率值;γ1,γ2和γ3是参数。
  14. 根据权利要求13所述的装置,其特征在于,所述装置还包括训练单元,用于通过端到端学习算法训练获取所述参数γ1,γ2和γ3
  15. 根据权利要求14所述的装置,其特征在于,所述训练单元,具体用于通过如下函数训练获取所述参数γ1,γ2和γ3
    Figure PCTCN2017108950-appb-100018
    其中,θ和γ是需要训练获取的神经系统的参数,γ表示所述参数γ1,γ2或γ3,N是训练序列集合中训练序列对的数量,Xn是训练序列对中的源端序列,Yn是训练序列对中的目标端序列,sn是Xn转换成Yn时的解码中间状态序列,λ是线性插值。
  16. 根据权利要求11至15任一所述的装置,其特征在于,所述调整子单元,具体用于:
    对所述第一候选目标端序列的翻译概率值和重构概率值使用线性插值的方式求和,以获取所述第一候选目标端序列的调整后的翻译概率值。
  17. 一种序列转换系统,其特征在于,包括输入接口,输出接口以及如权利要求9至16任一所述的序列转换装置;
    所述输入接口,用于接收源端数据并将所述源端数据转换为所述源端序列;
    所述输出接口,用于输出所述序列转换装置输出的输出目标端序列。
PCT/CN2017/108950 2016-11-04 2017-11-01 序列转换方法及装置 WO2018082569A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17868103.7A EP3534276A4 (en) 2016-11-04 2017-11-01 SEQUENCE CONVERSION PROCESS AND DEVICE
US16/396,172 US11132516B2 (en) 2016-11-04 2019-04-26 Sequence translation probability adjustment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610982039.7A CN108021549B (zh) 2016-11-04 2016-11-04 序列转换方法及装置
CN201610982039.7 2016-11-04

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/396,172 Continuation US11132516B2 (en) 2016-11-04 2019-04-26 Sequence translation probability adjustment

Publications (1)

Publication Number Publication Date
WO2018082569A1 true WO2018082569A1 (zh) 2018-05-11

Family

ID=62075696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108950 WO2018082569A1 (zh) 2016-11-04 2017-11-01 序列转换方法及装置

Country Status (4)

Country Link
US (1) US11132516B2 (zh)
EP (1) EP3534276A4 (zh)
CN (1) CN108021549B (zh)
WO (1) WO2018082569A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408626B (zh) * 2018-11-09 2021-09-21 思必驰科技股份有限公司 对自然语言进行处理的方法及装置
US11106873B2 (en) * 2019-01-22 2021-08-31 Sap Se Context-based translation retrieval via multilingual space
CN109948166B (zh) * 2019-03-25 2021-03-02 腾讯科技(深圳)有限公司 文本翻译方法、装置、存储介质和计算机设备
CN110209801B (zh) * 2019-05-15 2021-05-14 华南理工大学 一种基于自注意力网络的文本摘要自动生成方法
CN110377902B (zh) * 2019-06-21 2023-07-25 北京百度网讯科技有限公司 描述文本生成模型的训练方法和装置
TWI724644B (zh) * 2019-11-22 2021-04-11 中華電信股份有限公司 基於類神經網路之語音或文字文件摘要系統及方法
US11586833B2 (en) 2020-06-12 2023-02-21 Huawei Technologies Co., Ltd. System and method for bi-directional translation using sum-product networks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1679621A1 (en) * 2004-12-17 2006-07-12 Xerox Corporation Method and appartus for explaining categorization decisions
CN103207899A (zh) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 文本文件推荐方法及系统
CN104965822A (zh) * 2015-07-29 2015-10-07 中南大学 一种基于计算机信息处理技术的中文文本情感分析方法
CN105260361A (zh) * 2015-10-28 2016-01-20 南京邮电大学 一种生物医学事件的触发词标注系统及方法

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7006881B1 (en) * 1991-12-23 2006-02-28 Steven Hoffberg Media recording device with remote graphic user interface
US7904187B2 (en) * 1999-02-01 2011-03-08 Hoffberg Steven M Internet appliance system and method
DE102004036474A1 (de) 2004-07-28 2006-03-23 Roche Diagnostics Gmbh Analysesystem zur Analyse einer Probe auf einem Testelement
JP4050755B2 (ja) * 2005-03-30 2008-02-20 株式会社東芝 コミュニケーション支援装置、コミュニケーション支援方法およびコミュニケーション支援プログラム
KR101356417B1 (ko) * 2010-11-05 2014-01-28 고려대학교 산학협력단 병렬 말뭉치를 이용한 동사구 번역 패턴 구축 장치 및 그 방법
US10453479B2 (en) * 2011-09-23 2019-10-22 Lessac Technologies, Inc. Methods for aligning expressive speech utterances with text and systems therefor
US8873813B2 (en) * 2012-09-17 2014-10-28 Z Advanced Computing, Inc. Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
JP5528420B2 (ja) 2011-12-05 2014-06-25 シャープ株式会社 翻訳装置、翻訳方法及びコンピュータプログラム
CN104391842A (zh) 2014-12-18 2015-03-04 苏州大学 一种翻译模型构建方法和系统
US10846589B2 (en) * 2015-03-12 2020-11-24 William Marsh Rice University Automated compilation of probabilistic task description into executable neural network specification
EP3338221A4 (en) * 2015-08-19 2019-05-01 D-Wave Systems Inc. DISCRETE VARIATION SELF-ENCODING SYSTEMS AND METHODS FOR MACHINE LEARNING USING ADIABATIC QUANTUM COMPUTERS
US10776712B2 (en) * 2015-12-02 2020-09-15 Preferred Networks, Inc. Generative machine learning systems for drug design

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1679621A1 (en) * 2004-12-17 2006-07-12 Xerox Corporation Method and appartus for explaining categorization decisions
CN103207899A (zh) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 文本文件推荐方法及系统
CN104965822A (zh) * 2015-07-29 2015-10-07 中南大学 一种基于计算机信息处理技术的中文文本情感分析方法
CN105260361A (zh) * 2015-10-28 2016-01-20 南京邮电大学 一种生物医学事件的触发词标注系统及方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3534276A4

Also Published As

Publication number Publication date
US11132516B2 (en) 2021-09-28
US20190251178A1 (en) 2019-08-15
CN108021549B (zh) 2019-08-13
EP3534276A4 (en) 2019-11-13
CN108021549A (zh) 2018-05-11
EP3534276A1 (en) 2019-09-04

Similar Documents

Publication Publication Date Title
WO2018082569A1 (zh) 序列转换方法及装置
US11481562B2 (en) Method and apparatus for evaluating translation quality
WO2018032765A1 (zh) 序列转换方法及装置
CN113892135A (zh) 多语言语音合成和跨语言话音克隆
US10319368B2 (en) Meaning generation method, meaning generation apparatus, and storage medium
CA3119529A1 (en) Reconciliation between simulated data and speech recognition output using sequence-to-sequence mapping
KR20200019740A (ko) 번역 방법, 타깃 정보 결정 방법, 관련 장치 및 저장 매체
CN110428820B (zh) 一种中英文混合语音识别方法及装置
US10810993B2 (en) Sample-efficient adaptive text-to-speech
KR102577589B1 (ko) 음성 인식 방법 및 음성 인식 장치
US20220300718A1 (en) Method, system, electronic device and storage medium for clarification question generation
KR101666930B1 (ko) 심화 학습 모델을 이용한 목표 화자의 적응형 목소리 변환 방법 및 이를 구현하는 음성 변환 장치
US20230230571A1 (en) Audio processing method and apparatus based on artificial intelligence, device, storage medium, and computer program product
KR20200044388A (ko) 음성을 인식하는 장치 및 방법, 음성 인식 모델을 트레이닝하는 장치 및 방법
WO2022141842A1 (zh) 基于深度学习的语音训练方法、装置、设备以及存储介质
CN112837669B (zh) 语音合成方法、装置及服务器
KR20220064940A (ko) 음성 생성 방법, 장치, 전자기기 및 저장매체
Nagaraj et al. Kannada to English Machine Translation Using Deep Neural Network.
JP6243072B1 (ja) 入出力システム、入出力プログラム、情報処理装置、チャットシステム
KR20210045217A (ko) 감정 이식 장치 및 감정 이식 방법
Dida et al. ChatGPT and Big Data: Enhancing Text-to-Speech Conversion
CN111797220A (zh) 对话生成方法、装置、计算机设备和存储介质
Kumar et al. Towards building text-to-speech systems for the next billion users
US20230274751A1 (en) Audio signal conversion model learning apparatus, audio signal conversion apparatus, audio signal conversion model learning method and program
CN111241830B (zh) 对语词向量生成方法、对语生成模型训练方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17868103

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017868103

Country of ref document: EP

Effective date: 20190529