WO2021184769A1 - Method, apparatus, device, and medium for running a neural network text translation model - Google Patents

Method, apparatus, device, and medium for running a neural network text translation model

Info

Publication number
WO2021184769A1
WO2021184769A1 (PCT/CN2020/125431)
Authority
WO
WIPO (PCT)
Prior art keywords
vocabulary
sequence
unknown
language vocabulary
attention
Prior art date
Application number
PCT/CN2020/125431
Other languages
English (en)
French (fr)
Inventor
单杰
Original Assignee
江苏省舜禹信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏省舜禹信息技术有限公司
Publication of WO2021184769A1 publication Critical patent/WO2021184769A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the embodiments of the present disclosure relate to the technical field of natural language processing, and in particular to an operating method, device, electronic device, and storage medium of a neural network text translation model.
  • SMT: Statistical Machine Translation
  • NMT: Neural Machine Translation
  • RNN: Recurrent Neural Network
  • LSTM: Long Short-Term Memory
  • GRU: Gated Recurrent Unit
  • the embodiments of the present disclosure provide an operating method, device, electronic device, and storage medium for a neural network text translation model, so as to reduce unknown words in the translation result.
  • the embodiments of the present disclosure provide a method for operating a neural network text translation model.
  • the neural network text translation model includes an encoder layer, an attention mechanism layer, and a decoder layer, and the method includes: inputting a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating, according to the attention information, a vocabulary alignment table for unknown-word replacement, wherein the vocabulary alignment table has no repeated words; inputting the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence; obtaining an unknown word in the target language vocabulary sequence, and determining according to the vocabulary alignment table the source language word in the source language vocabulary sequence to which the unknown word corresponds; translating the source language word to obtain a target language word; and replacing the unknown word in the target language vocabulary sequence with the target language word.
  • generating a vocabulary alignment table for unknown-word replacement according to the attention information includes:
  • according to the attention information, the source language vocabulary sequence is associated, through an intersection algorithm, with the vocabulary unit with the highest attention in the target language vocabulary sequence, and a vocabulary alignment table for unknown-word replacement is generated according to the association result, wherein a vocabulary unit includes one or more adjacent words.
  • before generating the vocabulary alignment table for unknown-word replacement according to the association result, the method further includes: establishing, through the intersection algorithm, a second association between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence;
  • the generating of a vocabulary alignment table for unknown-word replacement according to the association result then includes:
  • generating a vocabulary alignment table for unknown-word replacement according to the association result and the second association result.
  • the method further includes: determining, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and determining, according to the attention information, the unit with the highest attention to establish a third association with the first target language word; determining, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and determining, according to the attention information, the unit with the highest attention to establish a fourth association with the first source language word; and generating a vocabulary alignment table for unknown-word replacement according to the association result, the second association result, the third association result, and the fourth association result.
  • controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-word replacement according to the attention information includes: controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word in the source language vocabulary sequence, and generating a vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.
  • controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word in the source language vocabulary sequence, and generating a vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word includes: when translating each word, determining the sequence number of the target language vocabulary sequence currently being translated, obtaining the position to be attended to when translating that word, computing an attention probability for each word in the source language vocabulary sequence, multiplying the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determining the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associating the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generating a vocabulary alignment table for unknown-word replacement according to the association result.
  • translating the source language word to obtain the target language word includes: translating the source language word with the IBM alignment model to obtain the target language word; or translating the source language word through an external dictionary to obtain the target language word.
  • the embodiments of the present disclosure also provide an operating device for a neural network text translation model.
  • the neural network text translation model includes an encoder layer, an attention mechanism layer, and a decoder layer.
  • an encoding unit is used to input the source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; an attention control unit is used to control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and to generate, according to the attention information, a vocabulary alignment table for unknown-word replacement, wherein the vocabulary alignment table has no repeated words;
  • a decoding unit inputs the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence; an unknown-word positioning unit is used to obtain an unknown word in the target language vocabulary sequence and determine, according to the vocabulary alignment table, the source language word in the source language vocabulary sequence to which the unknown word corresponds; an unknown-word translation unit is used to translate the source language word to obtain a target language word; and a word replacement unit is used to replace the unknown word in the target language vocabulary sequence with the target language word.
  • the attention control unit is configured to associate, according to the attention information and through an intersection algorithm, the source language vocabulary sequence with the vocabulary unit with the highest attention in the target language vocabulary sequence, and to generate a vocabulary alignment table for unknown-word replacement according to the association result, wherein a vocabulary unit includes one or more adjacent words.
  • the attention control unit is configured to, before generating a vocabulary alignment table for unknown-word replacement according to the association result: establish, through an intersection algorithm, a second association between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence; and generate a vocabulary alignment table for unknown-word replacement according to the association result and the second association result.
  • the attention control unit is configured to: after generating the vocabulary alignment table for unknown-word replacement according to the association result and the second association result, determine, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a third association with the first target language word; determine, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a fourth association with the first source language word; and generate a vocabulary alignment table for unknown-word replacement according to the association result, the second association result, the third association result, and the fourth association result.
  • the attention control unit is configured to control the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word in the source language vocabulary sequence, and to generate a vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.
  • the attention control unit is configured to: when translating each word, determine the sequence number of the target language vocabulary sequence currently being translated, obtain the position to be attended to when translating that word, compute an attention probability for each word in the source language vocabulary sequence, multiply the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determine the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associate the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generate a vocabulary alignment table for unknown-word replacement according to the association result.
  • the unknown text translation unit is used to: use the IBM alignment model to translate the source language vocabulary to obtain the target language vocabulary; or to translate the source language vocabulary through an external dictionary to obtain the target language vocabulary.
  • an embodiment of the present disclosure also provides an electronic device, including: one or more processors; and a memory used to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the instructions of the method according to any one of the first aspects.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method described in any one of the first aspects are implemented.
  • the embodiment of the present disclosure generates an aligned vocabulary through the attention mechanism, finds an unknown word in the target language vocabulary sequence, determines the word in the source language vocabulary sequence to which the unknown word corresponds, translates that word, and then replaces the unknown word with the translated word, thereby eliminating the unknown word.
  • a vocabulary alignment table without repeated words is created, the source language word to which an unknown word in the output corresponds is determined, and the unknown word is then replaced with an appropriate word, which can reduce or even completely eliminate unknown words in the translation result.
  • FIG. 1 is a schematic flowchart of a method for operating a neural network text translation model provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another method for running a neural network text translation model provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a running device of a neural network text translation model provided by an embodiment of the present disclosure
  • Fig. 4 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure.
  • Figure 1 shows a schematic flowchart of a method for operating a neural network text translation model provided by an embodiment of the present disclosure. This embodiment is applicable to the case of text translation through a neural network machine translation model.
  • the method may be executed by an operating device for a neural network text translation model configured in an electronic device.
  • the neural network text translation model includes an encoder layer, an attention mechanism layer, and a decoder layer.
  • the operating method of the neural network text translation model described in this embodiment includes:
  • in step S110, the source language vocabulary sequence is input into the encoder layer for processing to form a hidden structure vector.
  • this step can be implemented in a variety of ways. For example, an encoder can convert each word (word embedding) into a distributed representation vector, which carries its semantics. Using a forward RNN and a backward RNN, the obtained distributed representation vectors are combined to generate the hidden structure vector.
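  • By way of illustration only, the following sketch shows one way this encoding step could look in code: embed each source word, run a forward and a backward RNN over the embeddings, and concatenate the two passes into a hidden structure vector per position. It is a minimal sketch, not the patent's implementation; all sizes, weights, and the plain tanh recurrence (the experiments described later use GRU) are assumptions.

```python
import numpy as np

# Minimal sketch of step S110: embeddings -> forward RNN + backward RNN ->
# concatenated hidden structure vectors h_j. Shapes are illustrative.
rng = np.random.default_rng(0)
vocab_size, emb_dim, hid_dim = 1000, 32, 64

E = rng.normal(scale=0.1, size=(vocab_size, emb_dim))           # word embeddings
W_f = rng.normal(scale=0.1, size=(hid_dim, emb_dim + hid_dim))  # forward RNN weights
W_b = rng.normal(scale=0.1, size=(hid_dim, emb_dim + hid_dim))  # backward RNN weights

def rnn_states(x_embs, W):
    h, states = np.zeros(hid_dim), []
    for x in x_embs:
        h = np.tanh(W @ np.concatenate([x, h]))  # h_j = h(x_j, h_{j-1})
        states.append(h)
    return states

def encode(source_ids):
    x = [E[i] for i in source_ids]        # distributed representation vectors
    fwd = rnn_states(x, W_f)              # left-to-right pass
    bwd = rnn_states(x[::-1], W_b)[::-1]  # right-to-left pass
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

H = encode([3, 17, 42, 7])  # one hidden structure vector per source word
```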
  • in step S120, the attention mechanism layer is controlled to generate attention information according to the internal states of the encoder layer and the decoder layer, and a vocabulary alignment table for unknown-word replacement is generated according to the attention information. There are no repeated words in the vocabulary alignment table.
  • the attention probability α_ij is calculated from e_ij = a(s_{i-1}, h_j), normalized as α_ij = exp(e_ij) / Σ_{j'} exp(e_ij'); it represents the probability that x_j is associated with y_i.
  • in step S130, the hidden structure vector and the context vector used when each word is translated are input into the decoder layer for processing to generate a target language vocabulary sequence.
  • in step S140, an unknown word in the target language vocabulary sequence is acquired, and the source language word in the source language vocabulary sequence to which the unknown word corresponds is determined according to the vocabulary alignment table.
  • in step S150, the source language word is translated to obtain the target language word.
  • for example, the IBM alignment model is used to translate the source language word to obtain the target language word.
  • as another example, the source language word is translated through an external dictionary to obtain the target language word.
  • in step S160, the unknown word in the target language vocabulary sequence is replaced with the target language word.
  • the aforementioned correlation functions a(), f(), g(), and h() are functions that use the nonlinear function tanh to transform a weighted linear sum of the input variables.
  • with the input variables set to v_1, v_2, ..., v_n, the weight of each variable set to w_1, w_2, ..., w_n, and the intercept set to c: a(v_1, v_2, ..., v_n) = tanh(Σ_i w_i v_i + c).
  • a variety of methods can be used to generate a vocabulary alignment table for unknown-word replacement according to the attention information. For example, according to the attention information, the source language vocabulary sequence can be associated, through an intersection algorithm, with the vocabulary unit with the highest attention in the target language vocabulary sequence, and a vocabulary alignment table for unknown-word replacement is generated according to the association result, wherein a vocabulary unit includes one or more adjacent words.
  • further, before generating the vocabulary alignment table for unknown-word replacement according to the association result, a second association may also be established, through the intersection algorithm, between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence.
  • a vocabulary alignment table for unknown-word replacement is then generated according to the association result and the second association result.
  • still further, a first target language word in the target language vocabulary sequence that has no correspondence may also be determined based on the vocabulary alignment table, and, according to the attention information, the unit with the highest attention is determined to establish a third association with the first target language word; likewise, a first source language word in the source language vocabulary sequence that has no correspondence is determined based on the vocabulary alignment table, and, according to the attention information, the unit with the highest attention is determined to establish a fourth association with the first source language word; a vocabulary alignment table for unknown-word replacement is generated according to the association result, the second association result, the third association result, and the fourth association result.
  • controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-word replacement according to the attention information, can be done by controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word in the source language vocabulary sequence, and generating a vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.
  • this can be performed as follows: when translating each word, determine the sequence number of the target language vocabulary sequence currently being translated, obtain the position to be attended to when translating that word, compute an attention probability for each word in the source language vocabulary sequence, multiply the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determine the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associate the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generate a vocabulary alignment table for unknown-word replacement according to the association result.
  • an attention mechanism is used to generate an aligned vocabulary, find an unknown word in the target language vocabulary sequence, determine the word in the source language vocabulary sequence to which the unknown word corresponds, translate that word, and then replace the unknown word with the translated word, thereby reducing unknown words in the translation result.
  • Fig. 2 shows a schematic flow chart of another method for operating a neural network text translation model provided by an embodiment of the present disclosure. This embodiment is based on the foregoing embodiment and has been improved and optimized. As shown in Fig. 2, the operation method of the neural network text translation model described in this embodiment includes:
  • in step S210, a vocabulary alignment table is made based on the attention mechanism.
  • this step can use the intersection algorithm (intersection; see Koehn et al. 2003), correction algorithm 1, and correction algorithm 2 to create the vocabulary alignment table for unknown-word replacement.
  • the attention probability corresponding to the i-th target language word and the j-th source language word (represented by a_ij in the formulas) and the corresponding element of the vocabulary alignment table constitute a unit.
  • first, through the intersection algorithm, the unit with the highest attention value is associated for the source language and the target language.
  • the value b_ij of each unit can be calculated according to the following formula: b_ij = 1 if i = argmax_{i'} a_{i'j} and j = argmax_{j'} a_{ij'}; otherwise b_ij = 0,
  • where a_ij is the attention value, and argmax (arguments of the maxima) computes the i and j at which the attention value is highest, thereby determining the corresponding unit.
  • the vocabulary alignment table b'_ij obtained by correction algorithm 1 is calculated using a function that finds, among the upper, lower, left, and right adjacent units of b_pq, the number of units whose value is 1: neighbor(b_pq) = b_(p-1)q + b_(p+1)q + b_p(q-1) + b_p(q+1).
  • the formula for correction algorithm 1 is: b'_pq = 1 if b_pq = 1, or if neighbor(b_pq) ≥ 1 and (p = argmax_{p'} a_{p'q} or q = argmax_{q'} a_{pq'}); otherwise b'_pq = 0.
  • in correction algorithm 2, for a source language word that has no corresponding target language word, the unit with the highest attention value is used to establish the correspondence; conversely, for a target language word that has no corresponding source language word, the unit with the highest attention value is likewise used to establish the correspondence.
  • the resulting vocabulary alignment table b''_ij can be calculated by correction algorithm 2 with the following formula: b''_ij = 1 if (j ∈ I and i = argmax_{i'} a_{i'j}) or (i ∈ J and j = argmax_{j'} a_{ij'}); otherwise b''_ij = b'_ij,
  • where I and J are the set of source language words without a corresponding target language word and the set of target language words without a corresponding source language word; the argmax function is as described above.
  • under correction algorithm 2, each word in the target language vocabulary sequence corresponds to at least one word in the source language vocabulary sequence.
  • in other words, all unknown words in the target language vocabulary sequence can be assigned corresponding words in the source language vocabulary sequence.
  • in step S220, the word corresponding to the unknown word is determined according to the vocabulary alignment table, and the unknown word is replaced with that word.
  • this step uses the created vocabulary alignment table to set the word row in the source language vocabulary sequence corresponding to the unknown word e_i in the target language vocabulary sequence as f_i = {f_j | b_ij = 1}, determines the translation word row, and replaces e_i with the corresponding word.
  • the translation word row can be determined with the IBM alignment model or by importing an external dictionary.
  • the embodiment of the present disclosure can completely eliminate unknown words when correction algorithm 2 is used to create the vocabulary alignment table.
  • the BLEU value (see Papineni, Roukos, Ward, and Zhu 2002) and the METEOR value (Banerjee and Lavie 2005) are also improved.
  • further, by adopting this method and importing a more specialized external dictionary, translation accuracy can be further improved, achieving better translation results for scientific and technical documents and patent documents with higher requirements for term translation.
  • in the linguistic sense, the method uses the characteristic that corresponding relationships exist between neighboring words to calculate the aligned vocabulary according to the attention mechanism. The generated aligned vocabulary is then used to replace unknown words, while exploiting the advantages of the attention mechanism of neural network machine translation and its linguistic characteristics, so as to solve the unknown-word problem.
  • the implementation of the present disclosure creates a vocabulary alignment table without repeated words, determines the source language word to which an unknown word in the output corresponds, and then replaces the unknown word with an appropriate word, which can reduce or even completely eliminate unknown words in the translation result.
  • when correction algorithm 2 is used to create the word alignment table, unknown words can be completely eliminated, and the BLEU and METEOR values are also improved. Furthermore, by adopting the method of the present invention and importing a more specialized external dictionary, translation accuracy can be further improved, achieving better translation results for scientific and technical documents and patent documents with higher requirements for term translation.
  • the present invention utilizes the characteristic that correspondences exist between adjacent words, calculates the aligned word list according to the attention mechanism, and then uses the generated aligned word list to replace unknown words, while exploiting the advantages of the attention mechanism of neural network machine translation and its linguistic characteristics, so as to solve the unknown-word problem.
  • the corpora used for neural network machine translation are NIST and WMT; other types of corpora can also be used.
  • in the verification environment, the parallel corpora are NIST and WMT;
  • the learning model and decoder are nematus;
  • the hidden layer size is 1000;
  • the word vector dimension is 512;
  • the RNN uses GRU;
  • the learning algorithm is Adam with a learning rate of 0.0001;
  • the batch size (Batch_Size) is 40; no dropout is added; learning is performed in this environment.
  • Stanford Parser is used for English syntactic analysis;
  • KyTea is used for Chinese tokenization;
  • the IBM model is implemented with GIZA++;
  • mosesdecoder is used to extract phrase tables;
  • EDict is used as the external dictionary for replacing unknown words.
  • using NIST, the number of words in the training text ranges from 10,000 to 50,000; based on this, the BLEU value of the translation result is calculated for each increase of 10,000 words.
  • the BLEU value of the translation result for each increase of 10,000 words is shown in Table 2.
  • based on these results, the number of words in the test is set to 40,000.
  • BLEU: Bilingual Evaluation Understudy
  • METEOR: METEOR standard, language-specific translation evaluation for any target language
  • Baseline is the model obtained by learning with the default settings of the neural network machine translation system nematus.
  • BPE and PosUNK use the algorithms proposed by Sennrich et al. 2016 and Luong et al. 2015, respectively. Intersection is the intersection algorithm, and Dict is the imported external dictionary, the Webster Dictionary (other commonly used dictionaries can also be used). When the number of words is set to 40,000, the results are shown in the tables in the Description below.
  • the neural network machine translation method of the embodiment of the present disclosure creates, based on the attention generated by neural network machine translation, a word alignment table without repeated words, determines the source language word to which an unknown word in the output corresponds, and then uses the SMT model to replace the unknown word with an appropriate word.
  • unknown words can be completely eliminated, and the BLEU and METEOR values are also improved.
  • this application provides an embodiment of a device for running a neural network text translation model.
  • Figure 3 shows the device for running a neural network text translation model provided by this embodiment.
  • the device embodiment corresponds to the method embodiments shown in FIG. 1 and FIG. 2.
  • the device can be applied to various electronic devices.
  • the neural network text translation model in this embodiment includes an encoder layer, an attention mechanism layer, and a decoder layer.
  • the operating device of the neural network text translation model in this embodiment includes an encoding unit 310, an attention control unit 320, a decoding unit 330, an unknown-word positioning unit 340, an unknown-word translation unit 350, and a word replacement unit 360.
  • the encoding unit 310 is configured to input the source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector.
  • the attention control unit 320 is configured to control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and to generate, according to the attention information, a vocabulary alignment table for unknown-word replacement, wherein the vocabulary alignment table has no repeated words.
  • the decoding unit 330 is configured to input the hidden structure vector and the context vector when each vocabulary is translated into the decoder layer for processing, so as to generate a target language vocabulary sequence.
  • the unknown character positioning unit 340 is configured to obtain unknown characters in the target language vocabulary sequence, and determine according to the vocabulary alignment table that the unknown characters correspond to the source language vocabulary in the source language vocabulary sequence.
  • the unknown text translation unit 350 is configured to translate the source language vocabulary to obtain the target language vocabulary.
  • the vocabulary replacement unit 360 is configured to replace the unknown characters in the target language vocabulary sequence with the target language vocabulary.
  • the attention control unit 320 is configured to associate, according to the attention information and through an intersection algorithm, the source language vocabulary sequence with the vocabulary unit with the highest attention in the target language vocabulary sequence, and to generate a vocabulary alignment table for unknown-word replacement according to the association result, wherein a vocabulary unit includes one or more adjacent words.
  • the attention control unit 320 is configured to, before generating a vocabulary alignment table for unknown-word replacement according to the association result: establish, through an intersection algorithm, a second association between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence; and generate a vocabulary alignment table for unknown-word replacement according to the association result and the second association result.
  • the attention control unit 320 is configured to, after generating a vocabulary alignment table for unknown-word replacement according to the association result and the second association result: determine, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a third association with the first target language word; determine, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a fourth association with the first source language word; and generate a vocabulary alignment table for unknown-word replacement according to the association result, the second association result, the third association result, and the fourth association result.
  • the attention control unit 320 is configured to control the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word in the source language vocabulary sequence, and to generate a vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.
  • the attention control unit 320 is configured to: when translating each word, determine the sequence number of the target language vocabulary sequence currently being translated, obtain the position to be attended to when translating that word, compute an attention probability for each word in the source language vocabulary sequence, multiply the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determine the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associate the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generate a vocabulary alignment table for unknown-word replacement according to the association result.
  • the unknown text translation unit 350 is configured to use the IBM alignment model to translate the source language vocabulary to obtain the target language vocabulary. Or it is used to translate the source language vocabulary through an external dictionary to obtain the target language vocabulary.
  • the running device of the neural network text translation model provided in this embodiment can execute the running method of the neural network text translation model provided in the method embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 400 may include a processing device (such as a central processing unit or a graphics processor) 401, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403.
  • the RAM 403 also stores various programs and data required for the operation of the electronic device 400.
  • the processing device 401, ROM 402, and RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to the bus 404.
  • the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409.
  • the communication device 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 4 shows an electronic device 400 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 409, or installed from the storage device 408, or installed from the ROM 402.
  • when the computer program is executed by the processing device 401, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs.
  • when the above-mentioned one or more programs are executed by the electronic device, the electronic device is caused to: input the source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generate a vocabulary alignment table for unknown-word replacement according to the attention information, wherein the vocabulary alignment table has no repeated words; input the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence; obtain an unknown word in the target language vocabulary sequence, and determine according to the vocabulary alignment table the source language word in the source language vocabulary sequence to which the unknown word corresponds; translate the source language word to obtain a target language word; and replace the unknown word in the target language vocabulary sequence with the target language word.
  • the computer program code used to perform the operations of the embodiments of the present disclosure can be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to pass Internet connection).
  • LAN: local area network
  • WAN: wide area network
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • it should also be noted that, in some alternative implementations, the functions noted in the blocks can occur in an order different from the order marked in the drawings; for example, two blocks shown one after another can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart can be implemented by a dedicated hardware-based system that performs the specified functions or operations Or it can be realized by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure can be implemented in software or hardware. Wherein, the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first obtaining unit can also be described as "a unit for obtaining at least two Internet Protocol addresses.”

Abstract

The embodiments of the present disclosure disclose a method, an apparatus, an electronic device, and a storage medium for running a neural network text translation model. The neural network text translation model includes an encoder layer, an attention mechanism layer, and a decoder layer. The method includes: inputting a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; controlling the attention mechanism layer to generate a vocabulary alignment table; inputting the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence; obtaining an unknown word in the target language vocabulary sequence, and determining, according to the vocabulary alignment table, the source language word in the source language vocabulary sequence to which the unknown word corresponds; translating the source language word to obtain a target language word; and replacing the unknown word in the target language vocabulary sequence with the target language word, which can reduce or even completely eliminate unknown words in the translation result.

Description

Method, Apparatus, Device, and Medium for Running a Neural Network Text Translation Model

Technical Field

The embodiments of the present disclosure relate to the technical field of natural language processing, and in particular to a method, an apparatus, an electronic device, and a storage medium for running a neural network text translation model.

Background Art

Traditional SMT (Statistical Machine Translation) obtains the probabilities of translation rules from a parallel corpus and converts words or phrases of the source language into words or phrases of the target language according to those probabilities. However, the SMT approach does not capture relationships between phrases that are far apart, so its translations often lack fluency.

Compared with SMT, NMT (Neural Machine Translation) represents the source language as distributed representations based on numerical vectors, transforms them with a neural network, and derives the target language word string from the resulting numerical vectors to perform the translation. By using an RNN (Recurrent Neural Network) with LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) units, translation is performed while taking into account the relationships among words or phrases over longer spans, so fluency improves markedly. However, the resulting translations still frequently contain unknown words or nonsense words.
Summary of the Invention

In view of this, the embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for running a neural network text translation model, so as to reduce unknown words in translation results.

Other features and advantages of the embodiments of the present disclosure will become apparent from the following detailed description, or will in part be learned by practicing the embodiments of the present disclosure.

In a first aspect, the embodiments of the present disclosure provide a method for running a neural network text translation model, the neural network text translation model including an encoder layer, an attention mechanism layer, and a decoder layer, the method including: inputting a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-word replacement according to the attention information, wherein the vocabulary alignment table contains no repeated words; inputting the hidden structure vector and the context vector used when translating each word into the decoder layer for processing, to generate a target language vocabulary sequence; obtaining an unknown word in the target language vocabulary sequence, and determining, according to the vocabulary alignment table, the source language word in the source language vocabulary sequence to which the unknown word corresponds; translating the source language word to obtain a target language word; and replacing the unknown word in the target language vocabulary sequence with the target language word.

In an embodiment, generating a vocabulary alignment table for unknown-word replacement according to the attention information includes:

according to the attention information, associating the source language vocabulary sequence, through an intersection algorithm, with the vocabulary unit with the highest attention in the target language vocabulary sequence, and generating the vocabulary alignment table for unknown-word replacement according to the association result, wherein a vocabulary unit includes one or more adjacent words.

In an embodiment, before generating the vocabulary alignment table for unknown-word replacement according to the association result, the method further includes:

establishing, through the intersection algorithm, a second association between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence;

and generating the vocabulary alignment table for unknown-word replacement according to the association result includes:

generating the vocabulary alignment table for unknown-word replacement according to the association result and the second association result.

In an embodiment, after generating the vocabulary alignment table for unknown-word replacement according to the association result and the second association result, the method further includes: determining, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and determining, according to the attention information, the unit with the highest attention to establish a third association with the first target language word; determining, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and determining, according to the attention information, the unit with the highest attention to establish a fourth association with the first source language word; and generating the vocabulary alignment table for unknown-word replacement according to the association result, the second association result, the third association result, and the fourth association result.

In an embodiment, controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-word replacement according to the attention information includes: controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generating the vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.

In an embodiment, controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generating the vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word includes: when translating each word, determining the sequence number of the target language vocabulary sequence currently being translated, obtaining the position to be attended to when translating that word, computing an attention probability for each word in the source language vocabulary sequence, multiplying the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determining the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associating the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generating the vocabulary alignment table for unknown-word replacement according to the association result.

In an embodiment, translating the source language word to obtain a target language word includes: translating the source language word with the IBM alignment model to obtain the target language word; or translating the source language word through an external dictionary to obtain the target language word.
In a second aspect, the embodiments of the present disclosure further provide an apparatus for running a neural network text translation model, the neural network text translation model including an encoder layer, an attention mechanism layer, and a decoder layer, the apparatus including: an encoding unit, configured to input a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; an attention control unit, configured to control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and to generate a vocabulary alignment table for unknown-word replacement according to the attention information, wherein the vocabulary alignment table contains no repeated words; a decoding unit, configured to input the hidden structure vector and the context vector used when translating each word into the decoder layer for processing, to generate a target language vocabulary sequence; an unknown-word locating unit, configured to obtain an unknown word in the target language vocabulary sequence and determine, according to the vocabulary alignment table, the source language word in the source language vocabulary sequence to which the unknown word corresponds; an unknown-word translation unit, configured to translate the source language word to obtain a target language word; and a word replacement unit, configured to replace the unknown word in the target language vocabulary sequence with the target language word.

In an embodiment, the attention control unit is configured to: according to the attention information, associate the source language vocabulary sequence, through an intersection algorithm, with the vocabulary unit with the highest attention in the target language vocabulary sequence, and generate the vocabulary alignment table for unknown-word replacement according to the association result, wherein a vocabulary unit includes one or more adjacent words.

In an embodiment, the attention control unit is configured to, before generating the vocabulary alignment table for unknown-word replacement according to the association result: establish, through the intersection algorithm, a second association between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence; and generate the vocabulary alignment table for unknown-word replacement according to the association result and the second association result.

In an embodiment, the attention control unit is configured to: after generating the vocabulary alignment table for unknown-word replacement according to the association result and the second association result, determine, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a third association with the first target language word; determine, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and determine, according to the attention information, the unit with the highest attention to establish a fourth association with the first source language word; and generate the vocabulary alignment table for unknown-word replacement according to the association result, the second association result, the third association result, and the fourth association result.

In an embodiment, the attention control unit is configured to: control the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generate the vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.

In an embodiment, the attention control unit is configured to: when translating each word, determine the sequence number of the target language vocabulary sequence currently being translated, obtain the position to be attended to when translating that word, compute an attention probability for each word in the source language vocabulary sequence, multiply the distributed representation vector corresponding to each word in the source language vocabulary sequence by that word's attention probability, and then determine the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value; when translating each word, associate the sequence number of the target language vocabulary sequence currently being translated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and generate the vocabulary alignment table for unknown-word replacement according to the association result.

In an embodiment, the unknown-word translation unit is configured to: translate the source language word with the IBM alignment model to obtain the target language word; or translate the source language word through an external dictionary to obtain the target language word.

In a third aspect, the embodiments of the present disclosure further provide an electronic device, including: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the instructions of the method according to any one of the first aspect.

In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method according to any one of the first aspect.

The beneficial technical effects of the technical solutions proposed by the embodiments of the present disclosure are as follows:

The embodiments of the present disclosure generate an aligned vocabulary through the attention mechanism, find an unknown word in the target language vocabulary sequence, determine the word in the source language vocabulary sequence to which the unknown word corresponds, translate that word, and then replace the unknown word with the translated word, thereby eliminating the unknown word. Specifically, based on the attention generated by neural network machine translation, a vocabulary alignment table without repeated words is created, the source language word to which an unknown word in the output corresponds is determined, and the unknown word is then replaced with an appropriate word, which can reduce or even completely eliminate unknown words in the translation result.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only some of the embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on the content of the embodiments of the present disclosure and these drawings without creative effort.

Fig. 1 is a schematic flowchart of a method for running a neural network text translation model provided by an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of another method for running a neural network text translation model provided by an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of an apparatus for running a neural network text translation model provided by an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure.

Detailed Description

To make the technical problems solved, the technical solutions adopted, and the technical effects achieved by the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the embodiments of the present disclosure.

It should be noted that the terms "system" and "network" are often used interchangeably herein in the embodiments of the present disclosure. The "and/or" mentioned in the embodiments of the present disclosure means any and all combinations including one or more of the associated listed items. The terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish different objects, not to define a particular order.

It should also be noted that the following embodiments of the present disclosure may be implemented individually, or may be implemented in combination with one another; the embodiments of the present disclosure do not specifically limit this.

The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or information.

The technical solutions of the embodiments of the present disclosure are further explained below through specific implementations with reference to the drawings.
Fig. 1 shows a schematic flowchart of a method for running a neural network text translation model provided by an embodiment of the present disclosure. This embodiment is applicable to translating text with a neural network machine translation model. The method may be executed by an apparatus for running a neural network text translation model configured in an electronic device. The neural network text translation model includes an encoder layer, an attention mechanism layer, and a decoder layer. As shown in Fig. 1, the method for running a neural network text translation model described in this embodiment includes:

In step S110, a source language vocabulary sequence is input into the encoder layer for processing to form a hidden structure vector. This step can be implemented in a variety of ways. For example, the encoder can convert each word (word embedding) into a distributed representation vector, which carries its semantics. Using a forward RNN and a backward RNN, the obtained distributed representation vectors are combined to generate the hidden structure vector.

Specifically, the source language vocabulary sequence f = (f_1, f_2, ..., f_J) with its distributed representation (one-hot encoding) x = (x_1, x_2, ..., x_J), and the target language vocabulary sequence e = (e_1, e_2, ..., e_I) with its distributed representation y = (y_1, y_2, ..., y_I), are used for learning.

The context vector h_j of the j-th source language word is calculated according to h_j = h(x_j, h_{j-1}).

In step S120, the attention mechanism layer is controlled to generate attention information according to the internal states of the encoder layer and the decoder layer, and a vocabulary alignment table for unknown-word replacement is generated according to the attention information, wherein the vocabulary alignment table contains no repeated words.
Specifically, the attention probability α_ij is calculated according to

α_ij = exp(e_ij) / Σ_{j'=1}^{J} exp(e_ij'), with e_ij = a(s_{i-1}, h_j),

where α_ij represents the probability that x_j is associated with y_i;

the context vector c_i of the i-th target language word is calculated according to

c_i = Σ_{j=1}^{J} α_ij h_j;

and the hidden structure vector s_i of the i-th target language word is calculated according to s_i = f(s_{i-1}, y_{i-1}, c_i).
In step S130, the hidden structure vector and the context vector used when translating each word are input into the decoder layer for processing, to generate the target language vocabulary sequence.

Specifically, the generation probability of e_i is calculated according to p(e_i | e_1, e_2, ..., e_{i-1}, x) = g(y_{i-1}, s_i, c_i);

and the phrase translation probability is calculated according to

p(e | f) = Π_{i=1}^{I} p(e_i | e_1, e_2, ..., e_{i-1}, x).
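As a concrete illustration of the attention and decoding formulas above, the following sketch computes e_ij = a(s_{i-1}, h_j), normalizes it into the attention probabilities α_ij, and sums the encoder states into the context vector c_i. It is a sketch under assumptions: the concatenation form of a() and all weight shapes are illustrative, consistent with the tanh-of-weighted-sum definition of a() given below.

```python
import numpy as np

# Sketch of the attention step: e_ij = a(s_{i-1}, h_j);
# alpha_ij = exp(e_ij) / sum_j' exp(e_ij');  c_i = sum_j alpha_ij * h_j.
rng = np.random.default_rng(1)
hid = 8                                # assumed state size
W_a = rng.normal(size=(hid, 2 * hid))  # assumed alignment weights
v_a = rng.normal(size=hid)

def attention_step(s_prev, H):
    e = np.array([v_a @ np.tanh(W_a @ np.concatenate([s_prev, h])) for h in H])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # attention probabilities alpha_ij
    c = sum(a * h for a, h in zip(alpha, H))  # context vector c_i
    return alpha, c

H = [rng.normal(size=hid) for _ in range(4)]  # encoder states h_1 .. h_J
alpha, c = attention_step(rng.normal(size=hid), H)
```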
In step S140, an unknown word in the target language vocabulary sequence is obtained, and the source language word in the source language vocabulary sequence to which the unknown word corresponds is determined according to the vocabulary alignment table.

In step S150, the source language word is translated to obtain a target language word. For example, the source language word is translated with the IBM alignment model to obtain the target language word. As another example, the source language word is translated through an external dictionary to obtain the target language word.

In step S160, the unknown word in the target language vocabulary sequence is replaced with the target language word.

The correlation functions a(), f(), g(), and h() above are functions that transform a weighted linear sum of their input variables with the nonlinear function tanh. For example, with input variables v_1, v_2, ..., v_n, variable weights w_1, w_2, ..., w_n, and intercept c: a(v_1, v_2, ..., v_n) = tanh(Σ_i w_i v_i + c).
In an embodiment, a variety of methods may be used to generate the vocabulary alignment table for unknown-word replacement according to the attention information. For example, according to the attention information, the source language vocabulary sequence may be associated, through an intersection algorithm, with the vocabulary unit with the highest attention in the target language vocabulary sequence, and the vocabulary alignment table for unknown-word replacement is generated according to the association result, wherein a vocabulary unit includes one or more adjacent words.

Further, before the vocabulary alignment table for unknown-word replacement is generated according to the association result, a second association may also be established, through the intersection algorithm, between the source language vocabulary sequence and the adjacent units of the vocabulary unit with the highest attention in the target language vocabulary sequence, so that the vocabulary alignment table for unknown-word replacement is generated according to the association result and the second association result.

Still further, after the vocabulary alignment table for unknown-word replacement is generated according to the association result and the second association result, a first target language word in the target language vocabulary sequence that has no correspondence may also be determined based on the vocabulary alignment table, and the unit with the highest attention is determined according to the attention information to establish a third association with the first target language word; likewise, a first source language word in the source language vocabulary sequence that has no correspondence is determined based on the vocabulary alignment table, and the unit with the highest attention is determined according to the attention information to establish a fourth association with the first source language word, so that the vocabulary alignment table for unknown-word replacement is generated according to the association result, the second association result, the third association result, and the fourth association result.
In an embodiment, controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating the vocabulary alignment table for unknown-word replacement according to the attention information, may be done by controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generating the vocabulary alignment table for unknown-word replacement according to the context vector used when translating each word.

Here, this may be performed as follows:

When translating each word, the sequence number of the target language vocabulary sequence currently being translated is determined, the position to be attended to when translating that word is obtained, an attention probability is computed for each word in the source language vocabulary sequence, the distributed representation vector corresponding to each word in the source language vocabulary sequence is multiplied by that word's attention probability, and the sequence number, in the source language vocabulary sequence, of the word corresponding to the maximum value is then determined; when translating each word, the sequence number of the target language vocabulary sequence currently being translated is associated with the sequence number, in the source language vocabulary sequence, of the word corresponding to the determined maximum value, and the vocabulary alignment table for unknown-word replacement is generated according to the association result.
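A minimal sketch of this per-step association follows: at target step i, each source word's distributed representation vector x_j is scaled by its attention probability α_ij, the source position with the largest result is found, and the pair of sequence numbers is recorded in the alignment table. Reducing each scaled vector with its norm is an assumption made for illustration; the text does not spell out the reduction.

```python
import numpy as np

# Sketch: record, for target step i, the source position whose scaled
# representation alpha_ij * x_j is largest (vector norm used as the size
# criterion, which is an assumption).
def align_step(i, alpha, X, table):
    scaled = [a * x for a, x in zip(alpha, X)]
    j_max = max(range(len(scaled)), key=lambda j: np.linalg.norm(scaled[j]))
    table[i] = j_max  # target sequence number -> source sequence number
    return table
```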
This embodiment generates an aligned vocabulary through the attention mechanism, finds an unknown word in the target language vocabulary sequence, determines the word in the source language vocabulary sequence to which the unknown word corresponds, translates that word, and then replaces the unknown word with the translated word, thereby reducing unknown words in the translation result.

Fig. 2 shows a schematic flowchart of another method for running a neural network text translation model provided by an embodiment of the present disclosure. This embodiment builds on, and improves and optimizes, the foregoing embodiment. As shown in Fig. 2, the method for running a neural network text translation model described in this embodiment includes:

In step S210, a vocabulary alignment table is created based on the attention mechanism.

In this step, the vocabulary alignment table for unknown-word replacement can be created using the intersection algorithm (intersection; see Koehn et al. 2003), correction algorithm 1, and correction algorithm 2.
As described below, the attention probability corresponding to the i-th target language word and the j-th source language word (denoted a_ij in the formulas) and the corresponding element of the vocabulary alignment table constitute a unit.

First, through the intersection algorithm, the unit with the highest attention value is associated for the source language and the target language. According to the intersection algorithm, the value b_ij of each unit can be calculated according to the following formula:

b_ij = 1, if i = argmax_{i'} a_{i'j} and j = argmax_{j'} a_{ij'}; b_ij = 0, otherwise,

where a_ij is the attention value, and argmax (arguments of the maxima) is used to compute the i and j at which the attention value is highest, thereby determining the corresponding unit.
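The intersection criterion can be illustrated with the following sketch, which sets b_ij = 1 exactly when target word i is the argmax of column j and source word j is the argmax of row i of the attention matrix (rows indexed by target words, columns by source words; this matrix layout is an assumption of the sketch).

```python
import numpy as np

# Sketch of the intersection algorithm: b_ij = 1 iff a_ij is the maximum of
# both its column and its row of the attention matrix a.
def intersection(a):
    b = np.zeros_like(a, dtype=int)
    for i in range(a.shape[0]):        # target words
        for j in range(a.shape[1]):    # source words
            if i == a[:, j].argmax() and j == a[i, :].argmax():
                b[i, j] = 1
    return b
```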
Correction algorithm 1 (algorithm 1) extracts the units adjacent to the units whose b_ij obtained by the intersection algorithm is 1 as candidate units; the initial value of every candidate unit is 0. When the attention value of a candidate unit is greater than the attention values corresponding to the other source language words, the value of that unit is set to 1; likewise, when the attention value of a candidate unit is greater than the attention values corresponding to the other target language words, the value of that unit is set to 1. This algorithm reflects the observation that, when one source language word corresponds to several target language words, those words are usually adjacent in the target language, so words adjacent to words already entered in the vocabulary alignment table may also be entered in the table. The vocabulary alignment table b'_ij obtained by correction algorithm 1 is calculated using a function that counts, among the upper, lower, left, and right neighboring units of b_pq, the number of units whose value is 1. The formula of correction algorithm 1 is as follows:

b'_pq = 1, if b_pq = 1, or if neighbor(b_pq) ≥ 1 and (p = argmax_{p'} a_{p'q} or q = argmax_{q'} a_{pq'}); b'_pq = 0, otherwise,

where:

neighbor(b_pq) = b_(p-1)q + b_(p+1)q + b_p(q-1) + b_p(q+1); this function counts how many of the four units adjacent to a unit with b_ij = 1 have the value 1. The argmax function is as described above.
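The following sketch follows the prose of correction algorithm 1 (the published formula itself is reproduced as an image, so the exact form here is a reconstruction): candidates are the four-neighbours of cells set to 1 by the intersection step, and a candidate is kept when its attention value is the maximum of its row or of its column.

```python
# Sketch of correction algorithm 1, reconstructed from the description above.
# `a` is the attention matrix, `b` the intersection result (NumPy arrays).
def neighbor(b, p, q):
    cells = [(p - 1, q), (p + 1, q), (p, q - 1), (p, q + 1)]
    return sum(b[x, y] for x, y in cells
               if 0 <= x < b.shape[0] and 0 <= y < b.shape[1])

def correction1(a, b):
    b1 = b.copy()
    for p in range(a.shape[0]):
        for q in range(a.shape[1]):
            if b[p, q] == 0 and neighbor(b, p, q) >= 1:
                if p == a[:, q].argmax() or q == a[p, :].argmax():
                    b1[p, q] = 1   # candidate adjacent cell promoted to 1
    return b1
```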
Further, a further correction algorithm (algorithm 2) is applied on the basis of the vocabulary alignment table b'_ij: for a source language word that has no corresponding target language word, the unit with the highest attention value is taken to establish a correspondence; conversely, for a target language word that has no corresponding source language word, the unit with the highest attention value is likewise taken to establish a correspondence. The resulting vocabulary alignment table b''_ij can be calculated by correction algorithm 2 with the following formula:

b''_ij = 1, if (j ∈ I and i = argmax_{i'} a_{i'j}) or (i ∈ J and j = argmax_{j'} a_{ij'}); b''_ij = b'_ij, otherwise,

where I and J are, respectively, the set of source language words that have no corresponding target language word and the set of target language words that have no corresponding source language word; the argmax function is as described above.

Under correction algorithm 2, every word in the target language vocabulary sequence corresponds to at least one word in the source language vocabulary sequence. In other words, through correction algorithm 2, every unknown word in the target language vocabulary sequence can be assigned a corresponding word in the source language vocabulary sequence.
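A sketch of correction algorithm 2, matching the description above: any source word still without a target counterpart (and any target word without a source counterpart) is linked through the cell with the highest attention in its column (or row), so every target word ends up aligned to at least one source word.

```python
# Sketch of correction algorithm 2: complete the table b1 so that no row
# (target word) and no column (source word) is left empty.
def correction2(a, b1):
    b2 = b1.copy()
    for j in range(a.shape[1]):          # source words with no target word
        if b1[:, j].sum() == 0:
            b2[a[:, j].argmax(), j] = 1
    for i in range(a.shape[0]):          # target words with no source word
        if b1[i, :].sum() == 0:
            b2[i, a[i, :].argmax()] = 1
    return b2
```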
In step S220, the word corresponding to the unknown word is determined according to the vocabulary alignment table, and the unknown word is replaced with that word.

In this step, using the created vocabulary alignment table, the word row f_i in the source language vocabulary sequence corresponding to the unknown word e_i in the target language vocabulary sequence is set as f_i = {f_j | b_ij = 1}, the translation word row is determined, and e_i is replaced with the corresponding word. The translation word row may be determined using the IBM alignment model, by importing an external dictionary, or the like.
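The replacement step can be sketched as follows: for each unknown token e_i in the decoder output, collect the source words f_i = {f_j | b_ij = 1} from the alignment table and substitute a translation for the first one found in a dictionary. The dictionary mapping and the "<UNK>" token name are hypothetical placeholders for whichever lookup method (IBM model or external dictionary) is used.

```python
# Sketch of step S220: replace each unknown token using the alignment table
# b2 and a hypothetical {source word: target word} dictionary.
def replace_unknowns(target, source, b2, dictionary, unk="<UNK>"):
    out = list(target)
    for i, word in enumerate(target):
        if word == unk:
            f_i = [source[j] for j in range(len(source)) if b2[i, j] == 1]
            for f in f_i:                 # f_i = {f_j | b_ij = 1}
                if f in dictionary:
                    out[i] = dictionary[f]
                    break
    return out
```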
The IBM alignment model (see Hashimoto et al. 2016; Arthur et al. 2016) is applied to the parallel corpus to obtain the word translation probability p(e | f), from which the word with the highest probability among the words of the source language vocabulary sequence is selected: e_highest = argmax_e p(e | f_i).

Alternatively, ChangePhrase (see Koehn et al. 2003) is adopted: a phrase table is built from the parallel corpus with statistical machine translation, and, with reference to that phrase table, the phrase translation probability is calculated from the corpus as

P(e | f) = c(e, f) / c(f),

and the phrase with the highest phrase translation probability is selected: e_highest = argmax_e P(e | f_i), where c(f) is the number of occurrences of phrase f in the corpus and c(e, f) is the number of times phrases e and f occur together.

Alternatively, the translation word may be selected by looking it up in an externally imported dictionary.
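For the ChangePhrase-style choice above, a sketch of selecting the highest-probability candidate from co-occurrence counts is given below; the count tables are hypothetical inputs assumed to have been extracted beforehand (for example from a phrase table built with mosesdecoder).

```python
from collections import Counter

# Sketch: P(e | f) = c(e, f) / c(f); pick e_highest = argmax_e P(e | f).
def best_translation(f, c_ef: Counter, c_f: Counter):
    candidates = {e: c_ef[(e, ff)] / c_f[f]
                  for (e, ff) in c_ef if ff == f and c_f[f] > 0}
    return max(candidates, key=candidates.get) if candidates else None
```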
When correction algorithm 2 is used to create the vocabulary alignment table, the embodiments of the present disclosure can completely eliminate unknown words, while the BLEU value (see Papineni, Roukos, Ward, and Zhu 2002) and the METEOR value (Banerjee and Lavie 2005) are also improved. Furthermore, by adopting the method of the present invention and importing a more specialized external dictionary, translation accuracy can be further improved, achieving better translation results for scientific and technical literature, patent literature, and other texts with higher requirements for terminology translation. In the linguistic sense, the method exploits the property that there are correspondences between neighboring words, and computes the aligned vocabulary according to the attention mechanism. The generated aligned vocabulary is then used to replace unknown words, drawing at the same time on the advantages of the attention mechanism of neural network machine translation and on linguistic properties, thereby solving the unknown-word problem.

The embodiments of the present disclosure create, from the attention generated by neural network machine translation, a vocabulary alignment table without repeated words, determine the source language word to which an unknown word in the output corresponds, and then replace the unknown word with an appropriate word, which can reduce or even completely eliminate unknown words in the translation result.

Specifically, when correction algorithm 2 is used to create the word alignment table, unknown words can be completely eliminated, while the BLEU and METEOR values are also improved. Furthermore, by adopting the method of the present invention and importing a more specialized external dictionary, translation accuracy can be further improved, achieving better translation results for scientific and technical literature, patent literature, and other texts with higher requirements for terminology translation.

In addition, in the linguistic sense the present invention exploits the property that there are correspondences between adjacent words, and computes the aligned word list according to the attention mechanism. The generated aligned word list is then used to replace unknown words, drawing at the same time on the advantages of the attention mechanism of neural network machine translation and on linguistic properties, thereby solving the unknown-word problem.

The corpora used for neural network machine translation are NIST and WMT; other types of corpora may also be used.

The following describes how the effect of the algorithms adopted in the method according to the embodiments of the present disclosure was verified. In the verification environment, the parallel corpora are NIST and WMT; the learning model and decoder are nematus; the hidden layer size is 1000; the word vector dimension is 512; the RNN uses GRU; the learning algorithm is Adam with a learning rate of 0.0001; the batch size (Batch_Size) is 40; no dropout is added; learning is performed in this environment. Stanford Parser is used for English syntactic analysis, KyTea is used for Chinese tokenization, the IBM model is implemented with GIZA++, mosesdecoder is used to extract the phrase table, and EDict is used as the external dictionary for replacing unknown words.
Using NIST, the number of words in the training text ranges from 10,000 to 50,000, and the BLEU value of the translation result is calculated for each increase of 10,000 words.

The amount of text and the number of words of each corpus are shown in Table 1 (the values appear as an image in the original publication).

The BLEU value of the translation result for each increase of 10,000 words is shown in Table 2.

Word count | 10,000 | 20,000 | 30,000 | 40,000 | 50,000
BLEU       | 23.02  | 24.11  | 24.45  | 24.89  | 24.73

Table 2

Based on these results, the word count is set to 40,000 in the verification.
As for evaluation metrics, translation accuracy is usually evaluated with BLEU (Bilingual Evaluation Understudy) (Papineni, Roukos, Ward, and Zhu 2002) and METEOR (METEOR standard: language-specific translation evaluation for any target language) (Banerjee and Lavie 2005).

The verification results are as follows:

Baseline is the model obtained by learning with the default settings of the neural network machine translation system nematus; BPE and PosUNK use the algorithms proposed by Sennrich et al. 2016 and Luong et al. 2015, respectively; Intersection is the intersection algorithm; and Dict is the imported external dictionary, the Webster Dictionary (other commonly used dictionaries may also be adopted). With the word count set to 40,000, the results are shown in the tables below:
The translation accuracy results for the NIST corpus are shown in Table 3 (the values appear as an image in the original publication).

The translation accuracy results for the WMT corpus are shown in Table 4 (the values appear as an image in the original publication).
From the results in Table 4 above, the translation results obtained with the IBM algorithm are better than those obtained with the ChangePhrase algorithm. The experimental results show that with ChangePhrase, several consecutive unknown words may form a phrase; if that phrase cannot be translated, several unknown words ultimately remain untranslated. The IBM algorithm, by contrast, replaces words one by one, so a word can be replaced as long as it appears in the corpus.

When gdfa-f and IBM are used together, all unknown words can be replaced, and the BLEU value is not inferior to that of the existing intersection method.

The existing BPE method (Sennrich et al. 2016) and PosUNK method (Luong et al. 2015) can reduce unknown words, but translation quality declines accordingly.

In summary, the neural network machine translation method of the embodiments of the present disclosure creates, from the attention generated by neural network machine translation, a word alignment table without repeated words, determines the source language word to which an unknown word in the output corresponds, and then uses an SMT model to replace the unknown word with an appropriate word. When gdfa-f is used to create the word alignment table, unknown words can be completely eliminated, while the BLEU and METEOR values are also improved.

By adopting the method of the embodiments of the present disclosure and importing a more specialized external dictionary, translation accuracy can be further improved, which is of particular significance for scientific and technical literature, patent literature, and other texts with higher requirements for terminology translation.
As an implementation of the methods shown in the figures above, the present application provides an embodiment of an apparatus for running a neural network text translation model. Fig. 3 shows a schematic structural diagram of the apparatus for running a neural network text translation model provided by this embodiment. This apparatus embodiment corresponds to the method embodiments shown in Fig. 1 and Fig. 2, and the apparatus can be applied to various electronic devices. The neural network text translation model in this embodiment includes an encoder layer, an attention mechanism layer, and a decoder layer. As shown in Fig. 3, the apparatus for running a neural network text translation model described in this embodiment includes an encoding unit 310, an attention control unit 320, a decoding unit 330, an unknown-word locating unit 340, an unknown-word translation unit 350, and a word replacement unit 360.

The encoding unit 310 is configured to input a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector.

The attention control unit 320 is configured to control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and to generate a vocabulary alignment table for unknown-word replacement according to the attention information, wherein the vocabulary alignment table contains no repeated words.

The decoding unit 330 is configured to input the hidden structure vector and the context vector used when translating each word into the decoder layer for processing, to generate a target language vocabulary sequence.

The unknown-word locating unit 340 is configured to obtain an unknown word in the target language vocabulary sequence and determine, according to the vocabulary alignment table, the source language word in the source language vocabulary sequence to which the unknown word corresponds.

The unknown-word translation unit 350 is configured to translate the source language word to obtain a target language word.

The word replacement unit 360 is configured to replace the unknown word in the target language vocabulary sequence with the target language word.
根据本公开的一个或多个实施例,所述注意力控制单元320被配置为,用于根据所述注意力信息,通过交集算法将所述源语言词汇序列与所述目标语言词汇序列中注意力最高的词汇单元建立关联,根据关联结果生成未知文字替换用词汇对齐表,其中所述词汇单元包括一个或一个以上相邻的词汇。
根据本公开的一个或多个实施例,所述注意力控制单元320被配置为,用于在根据关联结果生成未知文字替换用词汇对齐表之前:通过交集算法将所述源语言词汇序列与所述目标语言词汇序列中注意力最高的词汇单元的邻接单元建立第二关联;根据关联结果和所述第二关联结果生成未知文字替换用词汇对齐表。
根据本公开的一个或多个实施例,所述注意力控制单元320被配置为,用于在根据关联结果和所述第二关联结果生成未知文字替换用词汇对齐表之后,基于所述词汇对齐表,确定所述目标语言词汇序列中没有对应关系的第一目标语言词汇,根据所述注意力信息,确定注意力最高的单元与所述第一目标语言词汇建立第三关联;基于所述词汇对齐表,确定所述源语言词汇序列中没有对应关系的第一源语言词汇,根据所述注意力信息,确定注意力最高的单元与所述第一源语言词汇建立第四关联;根据关联结果、所述第二关联结果、所述第三关联结果、以及所述第四关联结果生成未知文字替换用词汇对齐表。
根据本公开的一个或多个实施例,所述注意力控制单元320被配置为,用于控制所述注意力机制层根据所述编码器层和所述解码器层的内部状态,确定翻译所述源语言词汇序列中各个词汇时的上下文向量,以及根据翻译各个词汇时的上下文向量生成未知文字替换用词汇对齐表。
根据本公开的一个或多个实施例,所述注意力控制单元320被配置为,用于在翻译各个词汇时,确定当前翻译的目标语言词汇序列的序号,获取翻译该 词汇时应注意的位置,对所述源语言词汇序列中各词汇计算注意力概率,将所述源语言词汇序列中各词汇对应的分布表示向量乘以该词汇的注意力概率之后,确定最大值对应的词汇在所述源语言词汇序列的序号;根据翻译各个词汇时,将当前翻译的目标语言词汇序列的序号,和所确定的最大值对应的词汇在所述源语言词汇序列的序号建立关联,根据关联结果生成未知文字替换用词汇对齐表。
According to one or more embodiments of the present disclosure, the unknown-character translation unit 350 is configured to translate the source language words with an IBM alignment model to obtain the target language words, or to translate the source language words through an external dictionary to obtain the target language words.
The apparatus for operating a neural network text translation model provided in this embodiment can execute the method for operating a neural network text translation model provided in the method embodiments of the present disclosure, and has functional modules corresponding to the executed method together with the corresponding beneficial effects.
Referring now to FIG. 4, a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 4 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processing apparatus (for example, a central processing unit, a graphics processing unit, etc.) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage apparatus 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing apparatus 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following apparatuses may be connected to the I/O interface 405: an input apparatus 406 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output apparatus 407 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage apparatus 408 including, for example, a magnetic tape, a hard disk, and the like; and a communication apparatus 409. The communication apparatus 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 400 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 409, or installed from the storage apparatus 408, or installed from the ROM 402. When the computer program is executed by the processing apparatus 401, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), and the like, or any suitable combination of the above.
The computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: input a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector; control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generate a vocabulary alignment table for unknown-character replacement according to the attention information, wherein the vocabulary alignment table has no repeated words; input the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence; obtain the unknown characters in the target language vocabulary sequence, and determine, according to the vocabulary alignment table, the source language words in the source language vocabulary sequence to which the unknown characters correspond; translate the source language words to obtain target language words; and replace the unknown characters in the target language vocabulary sequence with the target language words.
Computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architectures, functions, and operations of possible implementations of the systems, methods, and computer program products according to the various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not in some cases constitute a limitation on the unit itself; for example, the first obtaining unit may also be described as "a unit that obtains at least two Internet Protocol addresses".
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (10)

  1. A method for operating a neural network text translation model, the neural network text translation model including an encoder layer, an attention mechanism layer, and a decoder layer, characterized by comprising:
    inputting a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector;
    controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-character replacement according to the attention information, wherein the vocabulary alignment table has no repeated words;
    inputting the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence;
    obtaining unknown characters in the target language vocabulary sequence, and determining, according to the vocabulary alignment table, the source language words in the source language vocabulary sequence to which the unknown characters correspond;
    translating the source language words to obtain target language words; and
    replacing the unknown characters in the target language vocabulary sequence with the target language words.
  2. The method for operating a neural network text translation model according to claim 1, characterized in that generating a vocabulary alignment table for unknown-character replacement according to the attention information comprises:
    according to the attention information, associating, by an intersection algorithm, the source language vocabulary sequence with the vocabulary units of highest attention in the target language vocabulary sequence, and generating the vocabulary alignment table for unknown-character replacement according to the association result, wherein a vocabulary unit includes one or more adjacent words.
  3. The method for operating a neural network text translation model according to claim 2, characterized by further comprising, before generating the vocabulary alignment table for unknown-character replacement according to the association result:
    establishing, by the intersection algorithm, a second association between the source language vocabulary sequence and the units adjacent to the vocabulary units of highest attention in the target language vocabulary sequence;
    wherein generating the vocabulary alignment table for unknown-character replacement according to the association result comprises:
    generating the vocabulary alignment table for unknown-character replacement according to the association result and the second association result.
  4. The method for operating a neural network text translation model according to claim 3, characterized by further comprising, after generating the vocabulary alignment table for unknown-character replacement according to the association result and the second association result:
    determining, based on the vocabulary alignment table, a first target language word in the target language vocabulary sequence that has no correspondence, and establishing, according to the attention information, a third association between the unit of highest attention and the first target language word;
    determining, based on the vocabulary alignment table, a first source language word in the source language vocabulary sequence that has no correspondence, and establishing, according to the attention information, a fourth association between the unit of highest attention and the first source language word; and
    generating the vocabulary alignment table for unknown-character replacement according to the association result, the second association result, the third association result, and the fourth association result.
  5. The method for operating a neural network text translation model according to claim 1, characterized in that controlling the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and generating a vocabulary alignment table for unknown-character replacement according to the attention information, comprises:
    controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generating the vocabulary alignment table for unknown-character replacement according to the context vectors used when translating each word.
  6. The method for operating a neural network text translation model according to claim 5, characterized in that controlling the attention mechanism layer to determine, according to the internal states of the encoder layer and the decoder layer, the context vector used when translating each word of the source language vocabulary sequence, and generating the vocabulary alignment table for unknown-character replacement according to the context vectors used when translating each word, comprises:
    when translating each word, determining the index of the currently translated word in the target language vocabulary sequence, obtaining the positions to be attended to when translating that word, computing an attention probability for each word in the source language vocabulary sequence, multiplying the distributed representation vector of each word in the source language vocabulary sequence by that word's attention probability, and then determining the index, in the source language vocabulary sequence, of the word corresponding to the maximum value; and
    for each translated word, associating the index of the currently translated word in the target language vocabulary sequence with the determined index, in the source language vocabulary sequence, of the word corresponding to the maximum value, and generating the vocabulary alignment table for unknown-character replacement according to the association result.
  7. The method for operating a neural network text translation model according to claim 1, characterized in that translating the source language words to obtain target language words comprises:
    translating the source language words with an IBM alignment model to obtain the target language words; or
    translating the source language words through an external dictionary to obtain the target language words.
  8. An apparatus for operating a neural network text translation model, the neural network text translation model including an encoder layer, an attention mechanism layer, and a decoder layer, characterized by comprising:
    an encoding unit, configured to input a source language vocabulary sequence into the encoder layer for processing to form a hidden structure vector;
    an attention control unit, configured to control the attention mechanism layer to generate attention information according to the internal states of the encoder layer and the decoder layer, and to generate a vocabulary alignment table for unknown-character replacement according to the attention information, wherein the vocabulary alignment table has no repeated words;
    a decoding unit, configured to input the hidden structure vector and the context vector used when translating each word into the decoder layer for processing to generate a target language vocabulary sequence;
    an unknown-character locating unit, configured to obtain unknown characters in the target language vocabulary sequence and to determine, according to the vocabulary alignment table, the source language words in the source language vocabulary sequence to which the unknown characters correspond;
    an unknown-character translation unit, configured to translate the source language words to obtain target language words; and
    a vocabulary replacement unit, configured to replace the unknown characters in the target language vocabulary sequence with the target language words.
  9. An electronic device, characterized by comprising:
    one or more processors; and
    a memory for storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/125431 2020-03-17 2020-10-30 Method, apparatus, device, and medium for operating a neural network text translation model WO2021184769A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010187586.2A CN111401078A (zh) 2020-03-17 2020-03-17 Method, apparatus, device, and medium for operating a neural network text translation model
CN202010187586.2 2020-03-17

Publications (1)

Publication Number Publication Date
WO2021184769A1 true WO2021184769A1 (zh) 2021-09-23

Family

ID=71430926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125431 2020-03-17 2020-10-30 Method, apparatus, device, and medium for operating a neural network text translation model WO2021184769A1 (zh)

Country Status (2)

Country Link
CN (1) CN111401078A (zh)
WO (1) WO2021184769A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971837A (zh) * 2021-10-27 2022-01-25 Knowledge-based dynamic graph neural sign language translation method with multimodal feature fusion
CN114898595A (zh) * 2022-06-14 2022-08-12 Integrated bridge communication device for ships

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401078A (zh) 2020-03-17 2020-07-10 Method, apparatus, device, and medium for operating a neural network text translation model
CN111814496B (zh) * 2020-08-04 2023-11-28 Text processing method, apparatus, device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391842A (zh) * 2014-12-18 2015-03-04 Translation model construction method and system
CN105446958A (zh) * 2014-07-18 2016-03-30 Word alignment method and word alignment device
CN108647214A (zh) * 2018-03-29 2018-10-12 Decoding method based on a deep neural network translation model
CN111401078A (zh) 2020-03-17 2020-07-10 Method, apparatus, device, and medium for operating a neural network text translation model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113480B2 (en) * 2016-09-26 2021-09-07 Google Llc Neural machine translation systems
CN107967262B (zh) * 2017-11-02 2018-10-30 Neural network Mongolian-Chinese machine translation method
CN109684648B (zh) * 2019-01-14 2020-09-01 Multi-feature-fusion automatic translation method between ancient and modern Chinese

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105446958A (zh) * 2014-07-18 2016-03-30 Word alignment method and word alignment device
CN104391842A (zh) * 2014-12-18 2015-03-04 Translation model construction method and system
CN108647214A (zh) * 2018-03-29 2018-10-12 Decoding method based on a deep neural network translation model
CN111401078A (zh) 2020-03-17 2020-07-10 Method, apparatus, device, and medium for operating a neural network text translation model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971837A (zh) * 2021-10-27 2022-01-25 Knowledge-based dynamic graph neural sign language translation method with multimodal feature fusion
CN114898595A (zh) * 2022-06-14 2022-08-12 Integrated bridge communication device for ships
CN114898595B (zh) * 2022-06-14 2024-03-01 Integrated bridge communication device for ships

Also Published As

Publication number Publication date
CN111401078A (zh) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021184769A1 (zh) Method, apparatus, device, and medium for operating a neural network text translation model
JP7166322B2 (ja) Method, apparatus, electronic device, storage medium, and computer program for training a model
US11157698B2 (en) Method of training a descriptive text generating model, and method and apparatus for generating descriptive text
Tan et al. Neural machine translation: A review of methods, resources, and tools
US11314946B2 (en) Text translation method, device, and storage medium
CN109219812B (zh) Natural language generation in spoken dialogue systems
US20210397780A1 (en) Method, device, and storage medium for correcting error in text
CN107861954B (zh) Artificial-intelligence-based information output method and apparatus
EP4116861A2 (en) Method and apparatus for pre-training semantic representation model and electronic device
US20230023789A1 (en) Method for identifying noise samples, electronic device, and storage medium
US20220215177A1 (en) Method and system for processing sentence, and electronic device
CN111563390B (zh) Text generation method, apparatus, and electronic device
CN109408834B (zh) Assisted machine translation method, apparatus, device, and storage medium
JP7133002B2 (ja) Punctuation prediction method and apparatus
JP7395553B2 (ja) Text translation method, apparatus, electronic device, and storage medium
WO2023061106A1 (zh) Method, device, apparatus, and medium for language translation
US20230178067A1 (en) Method of training speech synthesis model and method of synthesizing speech
WO2023082931A1 (zh) Method, device, and storage medium for punctuation restoration in speech recognition
JP2022059021A (ja) Model training method and apparatus, text prediction method and apparatus, electronic device, computer-readable storage medium, and computer program
CN111339788A (zh) Interactive machine translation method, apparatus, device, and medium
US11461549B2 (en) Method and apparatus for generating text based on semantic representation, and medium
US20210232775A1 (en) Language generation method and apparatus, electronic device and storage medium
CN111460224A (zh) Quality labeling method, apparatus, device, and storage medium for comment data
US20220083745A1 (en) Method, apparatus and electronic device for determining word representation vector
CN115640815A (zh) Translation method, apparatus, readable medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925152

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20925152

Country of ref document: EP

Kind code of ref document: A1