CN110008482A - Text processing method and apparatus, computer-readable storage medium, and computer device - Google Patents

Text processing method and apparatus, computer-readable storage medium, and computer device

Info

Publication number
CN110008482A
Authority
CN
China
Prior art keywords
vector
layer
word
target
shallow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910308349.4A
Other languages
Chinese (zh)
Other versions
CN110008482B (en)
Inventor
王星
涂兆鹏
王龙跃
史树明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910308349.4A
Priority to CN202010164622.3A (CN111368564B)
Publication of CN110008482A
Application granted
Publication of CN110008482B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

This application relates to a text processing method and apparatus, a computer-readable storage medium, and a computer device. The method includes: obtaining an input sequence of a source text; semantically encoding the input sequence to obtain a source vector sequence; obtaining a first weight vector corresponding to each word in the source vector sequence; generating a target-side vector for each word according to the source vector sequence and the first weight vector corresponding to each word; obtaining a target sentence vector according to the source vector sequence; determining a target word for each word according to the word's target-side vector and the target sentence vector; and generating a target text corresponding to the source text according to the target word corresponding to each word. With this scheme, each word is translated using sentence-level information, which improves translation accuracy.

Description

Text processing method and apparatus, computer-readable storage medium, and computer device
Technical field
This application relates to the field of computer technology, and in particular to a text processing method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the continuous development of machine learning techniques, machine translation technology has emerged. Neural machine translation is the latest generation of translation technology, and current neural machine translation research and applications typically use an attention mechanism to select words in the source sentence for decoding and translation.
However, when the attention mechanism of current neural machine translation architectures selects a suitable word to translate, it cannot fully take the information of the entire source sentence into account, so the translated text is not accurate enough. For example, for ambiguous words, failing to fully consider the context may cause translation errors.
Summary of the invention
Accordingly, it is necessary to provide a text processing method and apparatus, a computer-readable storage medium, and a computer device that address the technical problem of translation errors caused by the inability to draw on context.
A text processing method, comprising:
obtaining an input sequence of a source text;
semantically encoding the input sequence to obtain a source vector sequence;
obtaining a first weight vector corresponding to each word in the source vector sequence;
generating a target-side vector for each word according to the source vector sequence and the first weight vector corresponding to each word;
obtaining a target sentence vector according to the source vector sequence;
determining a target word corresponding to each word according to the target-side vector of each word and the target sentence vector;
generating a target text corresponding to the source text according to the target word corresponding to each word.
A text processing apparatus, the apparatus comprising:
a retrieval module, configured to obtain an input sequence of a source text;
an encoding module, configured to semantically encode the input sequence to obtain a source vector sequence;
a weight acquisition module, configured to obtain a first weight vector corresponding to each word in the source vector sequence;
a target-side vector generation module, configured to generate a target-side vector for each word according to the source vector sequence and the first weight vector corresponding to each word;
a target sentence vector determination module, configured to obtain a target sentence vector according to the source vector sequence;
a target word determination module, configured to determine a target word corresponding to each word according to the target-side vector of each word and the target sentence vector;
a target text generation module, configured to generate a target text corresponding to the source text according to the target word corresponding to each word.
A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the steps of any of the methods described above.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods described above.
With the above text processing method and apparatus, computer-readable storage medium, and computer device, an input sequence of a source text is obtained and semantically encoded into a source vector sequence. A first weight vector is obtained for each word in the source vector sequence, and a target-side vector is generated for each word from the source vector sequence and that word's first weight vector, converting the source-side vector of each word into a target-side vector. A target sentence vector is obtained from the source vector sequence, so that it fuses the key information of every source word and associates each word with the words around it. The target word for each word is then determined from the word's target-side vector and the target sentence vector, and finally the target text is generated from the target words. This solves the problem in traditional text translation methods where the target word is determined only from the target-side vector of a single word, ignoring the meaning each word carries within the sentence and thus producing inaccurate translations. With this scheme, each word is translated using sentence-level information, which improves translation accuracy.
Brief description of the drawings
Fig. 1 is a diagram of an application environment of a text processing method in one embodiment;
Fig. 2 is a schematic flowchart of a text processing method in one embodiment;
Fig. 3 is a schematic flowchart of a translation model processing text in one embodiment;
Fig. 4 is a schematic flowchart of the step of obtaining a target sentence vector in one embodiment;
Fig. 5 is a schematic flowchart of the step of obtaining a target sentence vector in another embodiment;
Fig. 6 is a schematic diagram of deep sentence vector modeling in one embodiment;
Fig. 7 is a schematic flowchart of the step of generating a deep sentence vector in one embodiment;
Fig. 8 is a schematic diagram of deep sentence vector modeling in another embodiment;
Fig. 9 is a schematic flowchart of the step of generating a shallow sentence vector in one embodiment;
Fig. 10 is a schematic flowchart of the step of determining a target word in one embodiment;
Fig. 11 is an architecture diagram of a neural machine translation system in one embodiment;
Fig. 12 is a structural block diagram of a text processing apparatus in another embodiment;
Fig. 13 is a structural block diagram of a computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application, not to limit it.
Fig. 1 is a diagram of the application environment of the text processing method in one embodiment. Referring to Fig. 1, the text processing method is applied to a text processing system. The text processing system includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as an independent server or as a cluster of multiple servers. In this embodiment, the terminal 110 may obtain a source text to be translated, segment the source text, and obtain the word vector of each word, thereby obtaining the input sequence corresponding to the source text. The terminal 110 may then perform the operation of translating the source text into a target text according to the input sequence. Alternatively, after obtaining the input sequence of the source text, the terminal 110 may send it to the server 120, and the server 120 performs the translation operation according to the input sequence.
As shown in Fig. 2, in one embodiment, a text processing method is provided. This embodiment is mainly illustrated by applying the method to the computer device in Fig. 1 above, which may be a terminal or a server. Referring to Fig. 2, the text processing method specifically includes the following steps:
Step 202: obtain the input sequence of a source text.
Here, the source text is the text to be translated. It may be a sentence, a paragraph, a chapter, and the like, and may be, but is not limited to, Chinese text or English text. The input sequence is the sequence of word vectors corresponding to the words obtained by segmenting the source text.
Specifically, the computer device obtains the text to be translated and segments it using a word segmentation method. The computer device then obtains the word vector corresponding to each segmented word, yielding the input sequence corresponding to the text to be translated.
In this embodiment, the computer device may segment the source text using semantic segmentation, character-matching segmentation, statistical segmentation, and similar methods. After segmentation, the word vector for each word is determined from a vocabulary, which records one word vector per word. After segmenting the source text, the computer device searches the vocabulary for the entry identical to each word of the source text and uses that entry's word vector as the word vector of the source word.
In this embodiment, the computer device may segment the source text directly to obtain the input sequence. Alternatively, the source text may be sent to a third-party device that performs the word segmentation, after which the computer device directly receives the input sequence corresponding to the source text from the third-party device.
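A minimal sketch of this step follows, assuming a hypothetical pre-built `vocab` table mapping each word to its word vector; the whitespace split stands in for the segmentation method, which the description leaves open.

```python
import numpy as np

def to_input_sequence(source_text, vocab, dim=4):
    # Stand-in for the word segmentation step; real systems would use a
    # semantic, character-matching, or statistical segmenter.
    words = source_text.split()
    unk = np.zeros(dim)  # fallback for words missing from the vocabulary
    return [vocab.get(w, unk) for w in words]

# Hypothetical two-entry vocabulary with 4-dimensional word vectors.
vocab = {"machine": np.array([0.1, 0.4, 0.2, 0.0]),
         "translation": np.array([0.3, 0.1, 0.5, 0.2])}
input_sequence = to_input_sequence("machine translation", vocab)
```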
Step 204: semantically encode the input sequence to obtain a source vector sequence.
Here, semantic encoding means processing information word by word, organizing and summarizing linguistically presented material by word meaning, word-class systems, or linguistic form, extracting its main points, arguments, and logical structure, and encoding words according to their semantic features. Classifying by word-class systems means sorting word information into category systems such as sports, news, entertainment, and so on. The source vector sequence is the sequence obtained by semantically encoding the input sequence.
Specifically, the computer device inputs the input sequence into the multi-layer neural network of an encoder, which semantically encodes the input sequence layer by layer, producing the source vector sequence output by each neural network layer.
In this embodiment, the computer device feeds the input sequence into the encoder's first neural network layer, which semantically encodes it and outputs the first layer's source vector sequence. The source vector sequence output by the first layer then serves as the input of the second layer, which encodes it and outputs the second layer's source vector sequence. Likewise, the source vector sequence output by the current layer is used as the input of the next layer until the source vector sequence output by the last layer is obtained.
In this embodiment, the input sequence fed into the encoder's first layer consists of the word vectors of the words of the source text. The first layer semantically encodes each word vector in the input sequence, yielding the source-side vector of each word output by the first layer. Once the first layer has encoded the word vectors of all words in the input sequence, the source-side vectors of all words are obtained, and together they constitute the source vector sequence output by the first layer.
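The layer-by-layer encoding can be sketched as below. The per-layer transform here is a placeholder linear map with a tanh nonlinearity, chosen only for illustration; the description does not fix the concrete layer type.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, dim = 3, 4
# One placeholder weight matrix per encoder layer.
weights = [rng.normal(size=(dim, dim)) for _ in range(num_layers)]

def encode(input_sequence):
    h = np.stack(input_sequence)        # (seq_len, dim) input layer
    per_layer = []
    for W in weights:                   # layer n encodes layer n-1's output
        h = np.tanh(h @ W)              # placeholder semantic encoding
        per_layer.append(h)
    return per_layer                    # source vector sequence of every layer

layer_outputs = encode([rng.normal(size=dim) for _ in range(5)])
```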
Step 206: obtain the first weight vector corresponding to each word in the source vector sequence.
Here, the first weight vectors are obtained by applying attention-mechanism operations according to the source vector sequence of the preceding layer.
Specifically, each word corresponds to one source-side vector. Attention is applied to the source-side vectors input to the first layer using a preset initial value or a randomly generated one, producing the weight vector corresponding to each source-side vector output by the first layer; that weight vector is the weight vector of the corresponding word. In the same way, the weight vector of each word output by the first layer is obtained. Then, from the second layer onward, attention is applied to the source vector sequence of the encoder's last layer according to the word weight vectors output by the previous layer, yielding the weight vector of each word output by the current layer. This continues until the weight vectors of the words output by the last layer are obtained, and these are used as the first weight vectors of the words in the source vector sequence.
In this embodiment, each neural network layer of the encoder corresponds to a layer of the decoder, and each word corresponds to one source-side vector. The computer device obtains the source vector sequence output by the encoder's last layer and feeds it into the decoder's first layer. Using a preset initial value or a randomly generated one, the decoder's first layer applies attention to a source-side vector in that sequence and outputs the corresponding weight vector, which is the weight vector of the corresponding word. Likewise, the weight vector corresponding to every source-side vector output by the decoder's first layer can be obtained, and hence the weight vector of every word output by the decoder's first layer.
Then, the computer device feeds the word weight vectors output by the decoder's first layer, together with the source vector sequence, into the decoder's second layer. Attention is applied to the source vector sequence using the word weight vectors output by the first layer, yielding the word weight vectors output by the decoder's second layer. From the second decoder layer onward, the word weight vectors output by the previous layer and the source vector sequence serve as the current layer's input, and applying attention to the source vector sequence with the previous layer's word weight vectors produces the word weight vectors output by the current layer. In this way the word weight vectors output by the decoder's last layer are obtained and used as the first weight vectors of the words in the source vector sequence.
It should be noted that in this embodiment the source vector sequence fed into each decoder layer is the one output by the encoder's last layer.
Step 208: generate the target-side vector of each word according to the source vector sequence and the first weight vector corresponding to each word.
Here, the target-side vector is the vector computed in the decoder's hidden layers from the source-side vector of each word in the input sequence. The hidden layers may comprise multiple neural network layers.
Specifically, the computer device takes the dot product of the source vector sequence with the weight vector of each word output by the decoder's last layer, obtaining the target-side vector corresponding to each word.
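A minimal sketch of step 208, under the assumption that each word's first weight vector holds one attention weight per source position, so the target-side vector is a weighted sum over the source vector sequence:

```python
import numpy as np

def target_side_vectors(H, A):
    # H: (seq_len, dim) source vector sequence from the encoder's last layer
    # A: (num_words, seq_len) first weight vectors, one row per word
    return A @ H  # (num_words, dim) target-side vectors

H = np.array([[1.2, 1.4, 0.6], [0.2, 1.3, 1.6], [0.5, 0.2, 1.7]])
A = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])  # each row sums to 1
targets = target_side_vectors(H, A)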
Step 210: obtain a target sentence vector according to the source vector sequence.
Here, the target sentence vector is a vector obtained by processing the source vector sequence according to specific rules. Target sentence vectors include shallow sentence vectors and deep sentence vectors.
Specifically, the computer device processes the source vector sequence according to preset rules to obtain the target sentence vector corresponding to the source vector sequence. The shallow sentence vector and the deep sentence vector result from the computer device processing the source vector sequence under different preset rules.
Step 212: determine the target word corresponding to each word according to the target-side vector of each word and the target sentence vector.
Step 214: generate the target text corresponding to the source text according to the target word corresponding to each word.
Here, a target word is the word obtained by translating a word in the source text, and the target text is the translation obtained by translating the source text.
Specifically, the computer device may linearly superimpose the target-side vector of each word and the target sentence vector, obtain the word vectors of the target-side candidate words, and match the superimposed vector against the candidate words' word vectors to obtain the target word corresponding to each word, thereby generating the target text corresponding to the source text.
In this embodiment, when the target sentence vector is a shallow sentence vector, the target-side vector of each word may be linearly superimposed with the shallow sentence vector, and the superimposed vector is passed to the next step of processing. When the target sentence vector is a deep sentence vector, the target-side vector of each word may be linearly superimposed with the deep sentence vector, and the superimposed vector is passed to the next step of processing.
With the above text processing method, the input sequence of the source text is obtained and semantically encoded into a source vector sequence. The first weight vector of each word in the source vector sequence is obtained, and the target-side vector of each word is generated from the source vector sequence and that word's first weight vector, converting the source-side vector of each word into a target-side vector. A target sentence vector obtained from the source vector sequence fuses the key information of every source word and associates each word with its surrounding words. The target word of each word is determined from the word's target-side vector and the target sentence vector, and finally the target text is generated from the target words. This solves the problem in traditional translation methods where the target word is determined only from the target-side vector of a single word, ignoring the meaning each word carries within the sentence and leading to inaccurate translations. With this scheme, each word is translated using sentence-level information, improving translation accuracy.
In one embodiment, obtaining the first weight vector of each word in the source vector sequence, and generating the target-side vector of each word from the source vector sequence and the first weight vectors, can be implemented as follows:
Specifically, the computer device obtains the source vector sequence output by the encoder's last layer and feeds it into the decoder's first layer. Using a preset initial value or a randomly generated one, the decoder's first layer applies attention to a source-side vector in that sequence and outputs the corresponding weight vector, which is the first weight vector of the corresponding word. Likewise, the first weight vector corresponding to every source-side vector output by the decoder's first layer can be obtained, and hence the first weight vector of every word output by the decoder's first layer. The computer device then computes a weighted sum of the source vector sequence with the first weight vector of each word output by the first layer, obtaining the target-side vector of each word output by the decoder's first layer.
Then, the computer device feeds the target-side vectors of the words output by the decoder's first layer, together with the source vector sequence, into the decoder's second layer and computes the first weight vector of each word in the second layer. The first weight vectors of the second layer are again combined with the source vector sequence by weighted summation, yielding the target-side vector of each word output by the second layer. Likewise, the target-side vector of each word output by the decoder's last layer can be obtained. The target word corresponding to each word is then determined from the target-side vectors of the words output by the decoder's last layer and the target sentence vector.
In one embodiment, generating the target-side vector of each word further includes: determining the target-side vector output at the current time step according to the target-side vector output at the previous time step and the source vector sequence.
Specifically, the decoder decodes the target word corresponding to each source word one at a time; that is, at each time step it decodes the target word of one source word. The computer device feeds the source vector sequence output by the encoder's last layer and a preset initial value into the decoder's first layer, obtaining the target-side vector output by the first layer at the first time step. The computer device then applies attention to the source vector sequence using the target-side vector output at the first time step, obtaining the target-side vector output by the decoder's first layer at the second time step. Similarly, the target-side vector output at the previous time step is used to apply attention to the source vector sequence, yielding the target-side vector output at the current time step, and thus the target-side vector of each source word output by the decoder's first layer at each time step. The same procedure yields the target-side vector of each source word output by the decoder's last layer at each time step. From the target-side vectors of the source words output by the decoder's last layer at each time step and the target sentence vector, the target word corresponding to each word can be determined.
In the above text processing method, attention is applied to the source vector sequence using the target-side vector output at the previous time step, so that the output probability at the current time step is predicted from the previous output, yielding the target-side vector output at the current time step. The target word at the current time step can therefore be predicted from the target word at the previous time step, completing the translation of the source text.
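The step-by-step decoding can be sketched as follows. The scaled dot-product scoring and softmax normalization are assumptions (the text only says "attention operation"), and `d0` stands for the preset initial value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_steps(H, d0, steps):
    # H: (seq_len, dim) source vector sequence; d0: preset initial value
    d, outputs = d0, []
    for _ in range(steps):
        scores = H @ d / np.sqrt(H.shape[1])  # attend with previous target-side vector
        d = softmax(scores) @ H               # target-side vector of the current step
        outputs.append(d)
    return outputs

H = np.array([[1.2, 1.4, 0.6], [0.2, 1.3, 1.6], [0.5, 0.2, 1.7]])
outs = decode_steps(H, d0=np.zeros(3), steps=3)
```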
Fig. 3 shows a schematic flowchart of a neural machine translation model processing text in one embodiment. The source text is fed into the encoder, where the encoding module semantically encodes it to obtain the source vector sequence. The source vector sequence and the word vector of the target word output at the previous step are then fed into the attention module, which applies the attention mechanism to the source vector sequence to obtain the source content vector of the current step, i.e., the source context at the current time step. The current source context is then fed into the decoder, whose decoding module decodes it and outputs the target word of the current time step.
In one embodiment, the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source vector sequence includes: obtaining the source vector sequence of the encoder's output layer; and determining the mean of that source vector sequence in each dimension to generate the shallow sentence vector.
Here, the shallow sentence vector is the vector obtained by processing the source vector sequence output by the encoder's output layer according to preset rules. The preset rule may be taking the mean of the source vector sequence in each dimension.
Specifically, the computer device obtains the source vector sequence output by the encoder's output layer, computes the average of that sequence in each dimension, and uses the per-dimension averages as the shallow sentence vector corresponding to the source vector sequence of the encoder's output layer.
For example, suppose the source vector sequence output by the encoder's output layer is H = [(1.2, 1.4, 0.6), (0.2, 1.3, 1.6), (0.5, 0.2, 1.7)], where each vector is the source-side vector obtained by semantically encoding one word. Each vector here is three-dimensional. To obtain the shallow sentence vector corresponding to H, first compute the averages of the three vectors in the first, second, and third dimensions, giving h = [(1.2+0.2+0.5)/3, (1.4+1.3+0.2)/3, (0.6+1.6+1.7)/3]. The resulting vector h is the shallow sentence vector corresponding to the source vector sequence H of the encoder's output layer. When the vector of each word is 512-dimensional, the source vector sequence is a sequence of 512-dimensional vectors; averaging the sequence over dimensions 1 to 512 yields a 512-dimensional vector, i.e., the shallow sentence vector is 512-dimensional. Note that in practical application scenarios the dimensionality of each word's vector is not limited to 512 and can be set as needed.
With the above method, by averaging the source vector sequence of the encoder's output layer in each dimension, each average fuses the information that every word carries in that dimension, so the resulting shallow sentence vector fuses the information of every word, converting the information represented by single words into information expressed by the whole sentence.
In one embodiment, the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source vector sequence includes: obtaining the source vector sequence of the encoder's output layer; and determining the maximum of that source vector sequence in each dimension to generate the shallow sentence vector.
Here, the shallow sentence vector is the vector obtained by processing the source vector sequence output by the encoder's output layer according to preset rules. The preset rule may be taking the maximum of the source vector sequence in each dimension.
Specifically, the computer device obtains the source vector sequence output by the encoder's output layer, determines the maximum of that sequence in each dimension, and forms the vector of per-dimension maxima, which serves as the shallow sentence vector corresponding to the source vector sequence of the encoder's output layer.
For example, suppose the source vector sequence output by the encoder's output layer is H = [(1.2, 1.4, 0.6), (0.2, 1.3, 1.6), (0.5, 0.2, 1.7)], where each vector is the source-side vector obtained by semantically encoding one word. Each vector here is three-dimensional. To obtain the shallow sentence vector corresponding to H, first compute the maxima of the three vectors in the first, second, and third dimensions, giving h = (1.2, 1.4, 1.7). The resulting vector h is the shallow sentence vector corresponding to the source vector sequence H of the encoder's output layer. When the vector of each word is 512-dimensional, the source vector sequence is a sequence of 512-dimensional vectors; taking the maxima over dimensions 1 to 512 yields a 512-dimensional vector, i.e., the shallow sentence vector is 512-dimensional. Similarly, in practical application scenarios, the dimensionality of each word's vector is not limited to 512.
With the above method, the element of each word in a dimension reflects the word's importance in that dimension. Determining the maximum of the encoder output layer's source vector sequence in each dimension identifies the most representative information in each dimension. Using the per-dimension maxima of the output layer's source vector sequence as the shallow sentence vector, each component of the resulting shallow sentence vector represents the most important information in its dimension, so the shallow sentence vector retains the global information of every word.
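The two pooling variants reproduce directly in code; the sketch below reuses the worked example above (note the per-dimension maxima are (1.2, 1.4, 1.7)):

```python
import numpy as np

# Source vector sequence from the worked examples, one row per encoded word.
H = np.array([[1.2, 1.4, 0.6],
              [0.2, 1.3, 1.6],
              [0.5, 0.2, 1.7]])

mean_sentence_vector = H.mean(axis=0)  # [0.633..., 0.966..., 1.3]
max_sentence_vector = H.max(axis=0)    # [1.2, 1.4, 1.7]
```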
In one embodiment, as shown in Fig. 4, the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source vector sequence includes:
Step 402: obtain the input sequence of the encoder's input layer.
Step 404: determine the maximum of the input sequence of the input layer in each dimension to obtain an intermediate vector.
Here, the intermediate vector is the vector obtained by computing the per-dimension maxima of the input sequence.
Specifically, the computer device obtains the input sequence of the encoder's input layer, determines the maximum of that input sequence in each dimension, and forms the vector of per-dimension maxima, obtaining an intermediate vector.
Step 406: obtain the source vector sequence of the encoder's output layer.
Step 408: determine the similarity vector between the intermediate vector and the source vector sequence of the output layer.
Here, the similarity vector records the logical similarity between the query and each key-value pair in the source vector sequence.
Specifically, the computer device obtains the source vector sequence of the encoder's output layer and computes the similarity between the intermediate vector and the output layer's source vector sequence by dot-product modeling, obtaining the similarity vector. For example, the similarity between the intermediate vector q and the source vector sequence H of the output layer is computed as e = q·H^T/√d, where H^T denotes the transpose of the output layer's source vector sequence and d is the dimensionality of the model's hidden state vector; d is a constant, and when the source vector sequence is a sequence of 512-dimensional vectors, d is 512.
Step 410: obtain the weight vector corresponding to the similarity according to the similarity vector.
Here, the weight vector is the vector obtained by normalizing the similarity vector. Normalization transforms a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector in which each element lies between 0 and 1 and all elements sum to 1.
Specifically, the computer device obtains the similarity vector between the intermediate vector and the output layer's source vector sequence and normalizes it to obtain the weight vector corresponding to the similarity vector.
Step 412: generate the shallow sentence vector according to the weight vector and the source vector sequence of the output layer.
Specifically, the computer device takes the dot product of the weight vector and the output layer's source vector sequence to obtain the shallow sentence vector corresponding to the output layer's source vector sequence. This shallow sentence vector is the target sentence vector.
For example, the similarity vector e is normalized to obtain the corresponding weight vector E, and the dot product of E and the output layer's source vector sequence H gives the shallow sentence vector g, i.e., g = E·H.
With the above method, the input sequence of the encoder's input layer is obtained and its per-dimension maxima determine an intermediate vector, extracting the key information of the source words. The source vector sequence of the encoder's output layer is obtained, and the similarity vector between the intermediate vector and the output layer's source vector sequence is determined, establishing the logical similarity between the source words and the semantically encoded sequence. The weight vector corresponding to the similarity is obtained from the similarity vector, and the shallow sentence vector is generated from the weight vector and the output layer's source vector sequence, so that the information of the source words and of the encoded sequence is integrated and the key information of the words is incorporated into the sentence.
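Steps 402 to 412 compose into a short sketch; shapes are illustrative, and softmax is the assumed normalization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def shallow_sentence_vector(X, H):
    # X: (seq_len, d) input layer word vectors; H: (seq_len, d) output layer
    q = X.max(axis=0)                 # intermediate vector (step 404)
    e = H @ q / np.sqrt(H.shape[1])   # similarity vector e = q·H^T/sqrt(d)
    E = softmax(e)                    # weight vector (step 410)
    return E @ H                      # g = E·H (step 412)
```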
In one embodiment, as shown in Fig. 5, the target sentence vector is a deep sentence vector, and obtaining the target sentence vector according to the source vector sequence includes:
Step 502: obtain the source vector sequence of every encoder layer.
Step 504: obtain the shallow sentence vector of every layer according to each layer's source vector sequence.
Specifically, the computer device obtains the source vector sequence output by each neural network layer of the encoder, yielding every layer's source vector sequence. Attention is then applied to the source vector sequence output by the first layer, yielding the first layer's shallow sentence vector. The first layer's shallow sentence vector serves as the input of the second layer: attention is applied to the source vector sequence through the first layer's shallow sentence vector, yielding the second layer's shallow sentence vector. Likewise, from the second layer onward, the output of the previous layer serves as the current layer's input, attention is applied to the current layer's source vector sequence, and the current layer's shallow sentence vector is obtained, so that every layer's shallow sentence vector is obtained.
Step 506: generate the deep sentence vector according to the shallow sentence vectors of all layers.
Here, the deep sentence vector is modeled from the shallow sentence vectors of all layers.
Specifically, the computer device uses the first layer's shallow sentence vector as input to update the first layer's hidden state. The updated hidden state of the first layer and the second layer's shallow sentence vector then serve as the input of the second layer to update the second layer's hidden state. Likewise, this continues until the last layer's hidden state has been updated, and the updated hidden state of the last layer is used as the deep sentence vector.
In the above text processing method, the source vector sequence of every encoder layer is obtained, and each layer's shallow sentence vector is obtained from its source vector sequence. The deep sentence vector is then generated from the shallow sentence vectors of all layers, so that it fuses the information of every layer's shallow sentence vector. The deep sentence vector thus retains the global information of the source text, making the translated target text more accurate.
In one embodiment, generating the deep sentence vector according to the shallow sentence vector of every layer includes:
feeding the shallow sentence vector of every layer into a recurrent neural network and obtaining the deep sentence vector output by the recurrent neural network's output layer. Each layer of the recurrent neural network corresponds to a layer of the encoder, and the input of each recurrent layer includes the shallow sentence vector of the corresponding encoder layer and the hidden state vector output by the previous recurrent layer, the hidden state vector being obtained by the recurrent neural network's processing of the shallow sentence vector input to the previous layer.
Here, a recurrent neural network (RNN) is an artificial neural network whose nodes are connected in a directed cycle. Its internal state can exhibit dynamic temporal behavior, and an RNN can use its internal memory to process input sequences of arbitrary length.
Specifically, the computer device feeds the first layer's shallow sentence vector, together with a preset empirical initial value or a randomly generated one, into the first layer of the recurrent neural network, updating the first recurrent layer's hidden state vector. The updated hidden state vector of the first recurrent layer and the second layer's shallow sentence vector then serve as the input of the second recurrent layer, updating the second recurrent layer's hidden state vector. Likewise, from the second recurrent layer onward, the updated hidden state vector of the previous recurrent layer and the current layer's shallow sentence vector serve as the current recurrent layer's input to update its hidden state vector. Once the hidden state vector of the last recurrent layer has been updated, it is used as the deep sentence vector. By feeding every layer's shallow sentence vector into the recurrent neural network and taking the output of its output layer, the resulting deep sentence vector fuses the information of every layer's shallow sentence vector and thus retains the global information of the source text.
As shown in Fig. 6, suppose the encoder has three layers with shallow sentence vectors (g_1, g_2, g_3), where g_n is the shallow sentence vector of the n-th encoder layer. The recurrent neural network also has three layers, and the deep sentence vector is its final hidden state vector r_3. A preset empirical initial value r_0, or a randomly generated initial value r_0, is obtained, and the first encoder layer's shallow sentence vector g_1 is fed into the first layer of the recurrent neural network together with r_0. Weighting g_1 with r_0 yields the updated hidden state vector of the first recurrent layer, i.e., r_1 = g_1·r_0. Then the updated hidden state vector r_1 of the first recurrent layer and the second encoder layer's shallow sentence vector g_2 serve as the input of the second recurrent layer, yielding the updated hidden state vector r_2 = g_2·r_1. Likewise, following the method above, the updated hidden state vector r_3 of the last recurrent layer is obtained and used as the deep sentence vector.
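Below is a minimal sketch of this recurrent fusion, assuming the elementwise update r_n = g_n·r_(n-1) shown in the Fig. 6 example; a learned RNN cell (e.g., a GRU) would normally take the place of this update:

```python
import numpy as np

def deep_sentence_vector(shallow_vectors, r0):
    r = r0
    for g in shallow_vectors:  # one recurrent step per encoder layer
        r = g * r              # hidden state update, as in r_1 = g_1·r_0
    return r                   # last hidden state = deep sentence vector

g_layers = [np.array([1.0, 0.5]), np.array([0.2, 2.0]), np.array([1.5, 1.0])]
deep = deep_sentence_vector(g_layers, r0=np.ones(2))
```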
In one embodiment, as shown in Fig. 7, generating the deep sentence vector according to the shallow sentence vector of every layer includes:
Step 702: determine the similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector.
Step 704: determine, according to the similarity vector, the weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector.
Step 706: generate the deep sentence vector according to the weight vectors and the shallow sentence vectors of all layers.
Specifically, each neural network layer of the encoder corresponds to a layer of the decoder. The computer device obtains the shallow sentence vector output by each encoder layer and the decoder's initial target-side vector. The shallow sentence vector output by the encoder's first layer and the decoder's initial target-side vector are fed into the decoder's first layer, and the logical similarity between the initial target-side vector and the first layer's shallow sentence vector is computed, giving the similarity vector output by the first layer. Normalizing this similarity vector yields the weight vector output by the first layer.
Then, the computer device obtains the shallow sentence vector output by the encoder's second layer and the target-side vector output by the decoder's first layer. The encoder's second-layer shallow sentence vector and the decoder's first-layer target-side vector are fed into the decoder's second layer, and the logical similarity between the first-layer target-side vector and the second-layer shallow sentence vector is computed, giving the similarity vector output by the second layer. Normalizing it yields the weight vector output by the decoder's second layer. Likewise, from the second layer onward, the target-side vector output by the previous layer and the current layer's shallow sentence vector serve as the current layer's input; the similarity vector between them is computed, each layer's similarity vector is normalized, and each layer's weight vector is obtained. The computer device then takes the dot product of each layer's weight vector with the corresponding shallow sentence vector, obtaining the deep sentence vector.
In the above text processing method, determining the similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector establishes the logical similarity between them. The weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector is determined from the similarity vector, and the deep sentence vector is generated from the weight vectors and the shallow sentence vectors of all layers. The key information of every layer's shallow sentence vector can thus be integrated, avoiding the loss of key information and preventing translation errors.
In one embodiment, as shown in Fig. 8, attention is applied to the source vector sequence H_i of every encoder layer to form each layer's shallow sentence vector g_i, yielding the shallow sentence vector sequence G composed of the layer-wise shallow sentence vectors:
G = {g_1, ..., g_N}
g_i = Global(H_i)
After the shallow sentence vector sequence G is obtained, in the decoder the target-side vector d_(i-1) serves as the query, and attention is applied to the encoder's shallow sentence vector sequence G = {g_1, ..., g_N}, i.e., g_i = Att(d_(i-1), G), forming the deep sentence vector.
The specific operation for obtaining the deep sentence vector is as follows: take the dot product of the target-side vector d_(i-1) with G to obtain the similarity vector e_i between the query and each key-value pair:
e_i = d_(i-1)·G^T/√d
where G^T denotes the transpose of the stack of per-layer shallow sentence vectors and d is the dimensionality of the model's hidden state vector. The similarity vector e_i is then normalized to obtain the weight vector β_i, and the dot product of β_i with the per-layer shallow sentence vectors is taken, i.e., g = β_i·G, yielding the deep sentence vector g.
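A sketch of this attention-based deep sentence vector, assuming softmax normalization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def deep_sentence_vector(d_prev, G):
    # d_prev: (d,) previous target-side vector (the query)
    # G: (num_layers, d) shallow sentence vectors of every encoder layer
    e = G @ d_prev / np.sqrt(G.shape[1])  # e_i = d_(i-1)·G^T/sqrt(d)
    beta = softmax(e)                     # weight vector beta_i
    return beta @ G                       # g = beta_i·G
```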
In one embodiment, as shown in Fig. 9, generating each layer's shallow sentence vector according to each layer's source vector sequence includes:
Step 902: obtain the source vector sequence of every encoder layer.
Specifically, the computer device feeds the source text's input sequence into the encoder's first layer for semantic encoding, obtaining the source vector sequence output by the encoder's first layer. The first layer's source vector sequence is then fed into the encoder's second layer for semantic encoding, giving the second layer's source vector sequence. In the same way, the source vector sequence output by every neural network layer of the encoder is obtained.
Step 904: taking each layer in turn as the current layer, determine the similarity vector between the current layer's source vector sequence and the previous layer's shallow sentence vector.
Step 906: obtain the corresponding weight vector according to the similarity vector.
Step 908: generate the current layer's shallow sentence vector according to the weight vector and the current layer's source vector sequence.
Specifically, the computer device takes each layer in turn as the current layer. It obtains the current layer's source vector sequence and the previous layer's shallow sentence vector, computes the logical similarity between them, and obtains the similarity vector. The similarity vector is then normalized to give the corresponding weight vector. A weighted sum of the weight vector and the current layer's source vector sequence yields the current layer's shallow sentence vector.
In this embodiment, the first layer has no previous layer and therefore no previous shallow sentence vector, so an initial shallow sentence vector must be preset. When the first layer is the current layer, the initial shallow sentence vector serves as the first layer's previous shallow sentence vector, from which the first layer's shallow sentence vector is computed.
In the above text processing method, each layer's source vector sequence is converted into that layer's shallow sentence vector, and the current layer's shallow sentence vector serves as the next layer's input, from which the next layer's shallow sentence vector is computed. Each current layer's shallow sentence vector is thus obtained from the previous layer's shallow sentence vector and the current layer's source vector sequence, guaranteeing the propagation of the source words' information and letting the current layer's shallow sentence vector incorporate the information of all layers below it, so that the information of every word of the source text is integrated into the sentence.
In one embodiment, a shallow sentence vector learning operation is applied to the source vector sequence of each encoder layer. For example, attention is applied to the n-th layer's source vector sequence to form the n-th layer's shallow sentence vector:
g_n = Global(H_n)
Specifically, with the (n-1)-th layer's shallow sentence vector g_(n-1) as the n-th layer's input, attention is applied to the encoder's n-th layer source vector sequence H_n through g_(n-1), computed as follows:
g_n = Att(g_(n-1), H_n)
Further, the concrete operation is to take the dot product of g_(n-1) with H_n to obtain the similarity vector e_n between the request and each key-value pair:
e_n = g_(n-1)·H_n^T/√d
where H_n^T denotes the transpose of the n-th layer's source vector sequence and d is the dimensionality of the model's hidden state vector. The similarity vector e_n is then normalized to obtain the corresponding weight vector E_n.
The dot product of the weight vector with the encoder's n-th layer source vector sequence H_n then gives the n-th layer's shallow sentence vector g_n:
g_n = E_n·H_n
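A sketch of g_n = Att(g_(n-1), H_n) iterated over the layers, assuming softmax normalization and a preset g_0 for the first layer:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def layer_shallow_vectors(H_layers, g0):
    # H_layers: list of (seq_len, d) source vector sequences, layers 1..N
    # g0: preset initial shallow sentence vector for the first layer
    g, out = g0, []
    for H in H_layers:
        e = H @ g / np.sqrt(H.shape[1])  # e_n = g_(n-1)·H_n^T/sqrt(d)
        g = softmax(e) @ H               # g_n = E_n·H_n
        out.append(g)
    return out
```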
In one embodiment, the computer device applies attention to the source vector sequence output by the first layer, obtaining the first layer's shallow sentence vector. The first layer's shallow sentence vector serves as the second layer's input: attention is applied to the source vector sequence through the first layer's shallow sentence vector, and the second layer's shallow sentence vector is obtained. Likewise, from the second layer onward, the output of the previous layer serves as the current layer's input, attention is applied to the current layer's source vector sequence, and the current layer's shallow sentence vector is obtained, yielding every layer's shallow sentence vector. Note that the first layer has no previous layer and hence no previous shallow sentence vector. Therefore, when applying attention to the first layer's source vector sequence in the first layer, an initial value must be preset or generated at random; this initial value is fed into the first layer, and attention is applied to the first layer's source vector sequence through it.
In one embodiment, as shown in Figure 10, the target side vector sum target sentences vector according to each word, Determine the corresponding target word of each word, comprising:
Step 1002, according to the target side vector sum of each word target sentences vector, the corresponding prediction of each word is obtained Term vector.
Wherein, prediction term vector is handled according to the corresponding target side vector sum target sentences vector of a word Vector.The corresponding prediction term vector of one word.
Specifically, computer equipment obtains the corresponding target side vector of a word, and obtains target sentences vector, by the word Corresponding target side vector sum target sentences vector carries out linear superposition, obtains the corresponding prediction term vector of the word.According to identical Mode, obtain the corresponding prediction term vector of each word.
Step 1004: obtain the word vectors of the target-side candidate words.
Step 1006: determine the similarity between each word's prediction word vector and the word vectors of the target-side candidate words.
Here, candidate words are the words in a preset dictionary; each word in the dictionary corresponds to a word vector.
Specifically, the computer device obtains the word vector of each candidate word from the target-side dictionary. It then selects one word's prediction word vector and computes the similarity between that prediction word vector and the word vector of each target-side candidate word. The similarity between every other prediction word vector and the candidate words' word vectors is computed in the same way.
Step 1008: take the candidate word with the highest similarity to each word's prediction word vector as that word's target word.
Specifically, after determining the similarities between the selected prediction word vector and the word vector of each target-side candidate word, the computer device determines the candidate word with the highest similarity to the selected prediction word vector and takes that candidate word as the target word for that prediction word vector. The target word for every other prediction word vector is obtained in the same way, yielding the target word for each source-side word.
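A compact sketch of steps 1004–1008, assuming cosine similarity over a candidate embedding matrix (the embodiments leave the exact similarity measure open):

```python
import numpy as np

def pick_target_words(pred_vecs, cand_embed, cand_words):
    # pred_vecs: (num_words, d) prediction word vectors
    # cand_embed: (vocab, d) word vectors of the target-side candidate words
    p = pred_vecs / np.linalg.norm(pred_vecs, axis=1, keepdims=True)
    c = cand_embed / np.linalg.norm(cand_embed, axis=1, keepdims=True)
    sim = p @ c.T                       # similarity of every word/candidate pair
    best = sim.argmax(axis=1)           # highest-similarity candidate per word
    return [cand_words[i] for i in best]
```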
In this embodiment, after the computer device determines the similarities between a selected prediction word vector and the word vectors of the target-side candidate words, it can directly determine the target word for that prediction word vector. It then selects another word's prediction word vector, computes its similarities to the candidate words' word vectors, and determines that word's target word, and so on.
In the above text processing method, the prediction word vector corresponding to each word is obtained from the word's target-side vector and the target sentence vector. The word vectors of the target-side candidate words are obtained, and the similarity between each word's prediction word vector and those word vectors is determined. The candidate word with the highest similarity to each word's prediction word vector is taken as the word's target word, thereby obtaining the target word for every word of the source text.
In one embodiment, the computer device may use a position-wise fully connected feed-forward network to generate the output layer, compare the obtained prediction word vectors with the word vectors of all target-side candidate words, and select the candidate word with the highest similarity as the target word for each prediction word vector, thereby obtaining the target word for the corresponding source-text word.
In one embodiment, determining the target word corresponding to each word according to the target-side vector of each word and the target sentence vector comprises: obtaining the word vectors of the target-side candidate words; determining the similarity between each word's prediction word vector and those word vectors; and outputting a preset number of high-similarity candidate words, these candidate words serving as the target words corresponding to each word.
Accordingly, generating the target text corresponding to the source text from the target words corresponding to each word comprises: generating candidate texts from the preset number of candidate words corresponding to each word, and taking the candidate text with the highest output probability as the target text corresponding to the source text.
Here, the output probability is the probability computed for each candidate text from the preset number of candidate words corresponding to each word.
Specifically, after determining the similarities between a selected prediction word vector and the word vectors of the target-side candidate words, the computer device can sort the candidate words by similarity and choose a preset number of high-similarity candidates, which serve as multiple target words for the selected prediction word vector. Once every word of the source text has its preset number of candidate words, the computer device can generate multiple candidate texts from them. Further, the computer device computes the output probability of each candidate text from the candidate words of each word in the source text, determines the candidate text with the highest output probability, and takes it as the target text corresponding to the source text. Ways to compute a candidate text's output probability include, but are not limited to, beam search.
In the above text processing method, multiple candidate texts for the source text are obtained by determining the candidate words corresponding to each word in the source text, and the candidate text with the highest output probability is taken as the target text corresponding to the source text. The target text closest to the source text is thereby obtained, making the translation more accurate.
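A small sketch of the top-k selection and candidate-text scoring described above, assuming per-word similarity scores are summed to rank candidate texts (the embodiment names beam search only as one option for computing output probabilities):

```python
import numpy as np

def topk_candidates(sim_row, cand_words, k=3):
    # sim_row: similarities of one prediction word vector to all candidates
    idx = np.argsort(sim_row)[::-1][:k]        # k most similar candidate words
    return [(cand_words[i], float(sim_row[i])) for i in idx]

def best_text(candidate_texts):
    # candidate_texts: list of candidate sentences, each a list of
    # (word, score) pairs; the summed score stands in for output probability
    scored = [(sum(s for _, s in text), " ".join(w for w, _ in text))
              for text in candidate_texts]
    return max(scored)[1]
```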
In one embodiment, the text processing method includes:
The computer device obtains the input sequence of the source text.
Then, the computer device semantically encodes the input sequence to obtain the source-side vector sequence.
Then, the computer device obtains the first weight vector corresponding to each word in the source-side vector sequence.
Further, the computer device generates the target-side vector of each word according to the source-side vector sequence and the first weight vector corresponding to each word.
Then, the computer device obtains the source-side vector sequence of every encoder layer.
Further, taking each layer in turn as the current layer, the computer device determines the similarity vector between the current layer's source-side vector sequence and the previous layer's shallow sentence vector.
Then, the computer device obtains the corresponding weight vector according to that similarity vector.
Then, the computer device generates the current layer's shallow sentence vector according to the weight vector and the current layer's source-side vector sequence.
Then, the computer device determines the similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector.
Further, the computer device determines, according to that similarity vector, the weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector.
Then, the computer device generates the deep sentence vector according to the weight vectors and every layer's shallow sentence vector.
Further, the computer device obtains the prediction word vector corresponding to each word according to the target-side vector of each word and the deep sentence vector.
Then, the computer device obtains the word vectors of the target-side candidate words.
Then, the computer device determines the similarity between each word's prediction word vector and the word vectors of the target-side candidate words.
Further, the computer device takes the candidate word with the highest similarity to each word's prediction word vector as that word's target word.
Then, the computer device generates the target text corresponding to the source text according to the target word corresponding to each word.
In the above text processing method, the input sequence of the source text is obtained and semantically encoded into the source-side vector sequence. Each layer's source-side vector sequence is converted into that layer's shallow sentence vector, and the shallow sentence vector output by the current layer serves as input for computing the next layer's. Each current layer's shallow sentence vector is thus obtained from the previous layer's shallow sentence vector and the current layer's source-side vector sequence, which guarantees the transfer of source-word information and lets the current layer's shallow sentence vector incorporate the information of all preceding layers, so that the information of every word of the source text is integrated into the sentence.
By obtaining the first weight vector corresponding to each word in the source-side vector sequence and generating each word's target-side vector from the source-side vector sequence and those first weight vectors, the vector of each source-side word is converted into the vector of a target-side word.
By determining the similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector, the similarity between the two is established. From that similarity vector, the weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector is determined, and the deep sentence vector is generated from those weight vectors and every layer's shallow sentence vector. The key information of every layer's shallow sentence vector is thereby integrated, which avoids loss of key information and prevents translation errors.
The target sentence vector is obtained from the source-side vector sequence, so that the target sentence vector fuses the key information of every source-side word and associates each word with the words before and after it.
According to the target-side vector of each word and the target sentence vector, the target word corresponding to each word is determined. This solves the problem in existing text translation methods where the target word is determined only from a single word's target-side vector, ignoring the meaning each word carries within the sentence and thus causing inaccurate translation. Finally, the target text corresponding to the source text is generated from the target words. With this scheme, every word is translated with the help of sentence information, improving translation accuracy.
Figure 11 is the architecture diagram of a neural machine translation system in one embodiment; it illustrates the structure of one encoder layer and one decoder layer of the neural machine translation model.
As shown in Figure 11, the Nx on the left denotes the structure of one encoder layer, which contains two sublayers: the first is a multi-head attention layer and the second is a feed-forward layer. The input and output of each sublayer are linked, with the output of the current sublayer serving as an input of the next. Each sublayer is followed by a normalization operation, which improves the model's convergence speed. The Nx on the right denotes the structure of one decoder layer, which contains three sublayers. The first is a multi-head attention sublayer controlled by a mask matrix and used to model the generated target-side sentence vector; during training, the mask matrix ensures that each multi-head attention computation only attends to the first t-1 words. The second is a multi-head attention sublayer implementing the attention mechanism between encoder and decoder, i.e., it looks up the relevant semantic information in the source text; its computation uses dot products. The third is a feed-forward sublayer whose computation is the same as that of the feed-forward sublayer in the encoder. The sublayers of the decoder are likewise linked, with the output of the current sublayer serving as an input of the next, and each decoder sublayer is also followed by a normalization operation to accelerate model convergence.
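To make the mask-matrix control concrete, here is a minimal sketch of an additive causal mask of the kind commonly used for this purpose. The exact construction is an assumption; the embodiment only states that attention must be restricted to the already generated words.

```python
import numpy as np

def causal_mask(t):
    # additive mask: -inf above the diagonal, so position i can only
    # attend to positions 0..i and never to future (ungenerated) words
    upper = np.triu(np.ones((t, t)), k=1)
    return np.where(upper == 1, -np.inf, 0.0)

scores = np.random.randn(4, 4) + causal_mask(4)   # masked attention scores
e = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)       # future positions get weight 0
```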
In one embodiment, this scheme was tested on the WMT2017 German-English machine translation task, with the following results:
Here, scheme model 1 performs a mean-pooling operation on the source-side vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. Scheme model 2 performs a max-pooling operation on the source-side vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. Scheme model 3 takes the input sequence of the encoder input layer as the query and performs an attention operation on the source-side vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. When the target sentence vector is a deep sentence vector, it is obtained in the manner of scheme model 4 or scheme model 5. Scheme model 4 is the static way of obtaining the deep sentence vector: a recurrent neural network models every layer's shallow sentence vector recurrently, and its final state is taken as the deep sentence vector. Scheme model 5 is the dynamic way: the decoder's target-side vector at each moment serves as the query, and an attention operation over every layer's shallow sentence vector forms the target-side vector of each moment. The method proposed in this embodiment is simple and easy to model, requires few additional computing resources, barely slows down computation, and can efficiently use sentence information to help neural machine translation improve translation performance.
In the table above, a BLEU improvement of more than 0.5 points is generally regarded as significant, and the Δ column gives the absolute improvement. As the table shows, the Δ of scheme models 2, 3, 4 and 5 all exceed 0.5 points, indicating that the proposed method can noticeably improve translation quality.
Besides the above embodiments, the shallow sentence vector in this scheme may also be obtained by processing the source-side vector sequence output by the encoder output layer with other machine-learning methods. Likewise, besides the above embodiments, the deep sentence vector may be obtained by operating on every encoder layer's shallow sentence vector with other machine-learning methods (for example, convolutional neural networks).
This scheme does not restrict the type or topology of the neural network model; the neural network in the scheme can be replaced with various other novel model structures, such as recurrent neural networks and their variants, or with other network structures such as convolutional neural networks.
It should also be noted that the method provided by this scheme can be used in all mainstream neural machine translation systems and is applicable to translation tasks for all languages.
Figures 2 to 10 are flow diagrams of the text processing method in various embodiments. It should be understood that although the steps in the flow charts of Figures 2 to 10 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figures 2 to 10 may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different moments, and whose execution order need not be sequential; they may be executed in turn or in alternation with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in Figure 12, a text processing apparatus is provided. The text processing apparatus includes:
an information acquisition module 1202 for obtaining the input sequence of the source text;
a coding module 1204 for semantically encoding the input sequence to obtain the source-side vector sequence;
a weight acquisition module 1206 for obtaining the first weight vector corresponding to each word in the source-side vector sequence;
a target-side vector generation module 1208 for generating the target-side vector of each word according to the source-side vector sequence and the first weight vector corresponding to each word;
a target sentence vector determining module 1210 for obtaining the target sentence vector according to the source-side vector sequence;
a target word determining module 1212 for determining the target word corresponding to each word according to the target-side vector of each word and the target sentence vector; and
a target text generation module 1214 for generating the target text corresponding to the source text according to the target word corresponding to each word.
The above text processing apparatus obtains the input sequence of the source text and semantically encodes it into the source-side vector sequence. It obtains the first weight vector corresponding to each word in the source-side vector sequence and generates each word's target-side vector from the source-side vector sequence and those first weight vectors, thereby converting the vector of each source-side word into the vector of a target-side word. It obtains the target sentence vector from the source-side vector sequence, so that the target sentence vector fuses the key information of every source-side word and associates each word with the words before and after it. According to the target-side vector of each word and the target sentence vector, it determines the target word corresponding to each word, solving the problem in existing text translation methods where the target word is determined only from a single word's target-side vector, ignoring the meaning each word carries within the sentence and thus causing inaccurate translation. Finally, it generates the target text corresponding to the source text from the target words. With this scheme, every word is translated with the help of sentence information, improving translation accuracy.
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determining module 1210 is further configured to: obtain the source-side vector sequence of the encoder output layer; and determine the mean of that vector sequence in each dimension to generate the shallow sentence vector. By averaging the encoder output layer's source-side vector sequence in each dimension, each average fuses the element information of every word in that dimension, so the resulting shallow sentence vector fuses the information of every word and converts the information carried by individual words into information expressed by the whole sentence.
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determining module 1210 is further configured to: obtain the source-side vector sequence of the encoder output layer; and determine the maximum of that vector sequence in each dimension to generate the shallow sentence vector. In this apparatus, a word's element in a given dimension reflects the word's importance in that dimension, so taking the maximum of the encoder output layer's source-side vector sequence in each dimension identifies the most representative information in that dimension. With these maxima as the shallow sentence vector, each of its components represents the most important information in its dimension, and the shallow sentence vector retains the global information of every word.
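A minimal sketch of these two pooling variants, assuming the output-layer vectors are stacked row-wise (sequence length × hidden size):

```python
import numpy as np

H = np.random.randn(5, 512)   # hypothetical output-layer vectors for 5 words

g_mean = H.mean(axis=0)       # mean pooling: per-dimension average over words
g_max = H.max(axis=0)         # max pooling: per-dimension maximum keeps the
                              # most salient word component in each dimension
```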
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determining module 1210 is further configured to: obtain the input sequence of the encoder input layer; determine the maximum of the input sequence in each dimension to obtain an intermediate vector; obtain the source-side vector sequence of the encoder output layer; determine the similarity vector between the intermediate vector and the output layer's source-side vector sequence; obtain the corresponding weight vector from the similarity vector; and generate the shallow sentence vector from the weight vector and the output layer's source-side vector sequence. Taking the maximum of the input layer's input sequence in each dimension extracts the key information of the source-side words. Determining the similarity vector between the intermediate vector and the output layer's source-side vector sequence establishes the similarity between the source words and the semantically encoded sequence. Obtaining the weight vector from the similarity vector and generating the shallow sentence vector from it and the output layer's source-side vector sequence integrates the information of the source words with the information of the encoded sequence, so that the key information of the words is integrated into the sentence.
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determining module 1210 is further configured to: obtain the source-side vector sequence of every encoder layer; obtain every layer's shallow sentence vector from its source-side vector sequence; and generate the deep sentence vector from every layer's shallow sentence vector. The resulting deep sentence vector fuses the information of every layer's shallow sentence vector, so the global information of the source text is retained in the deep sentence vector and the translated target text is more accurate.
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determining module 1210 is further configured to input every layer's shallow sentence vector into a recurrent neural network and obtain the deep sentence vector output by the recurrent neural network's output layer. Each layer of the recurrent neural network corresponds to a layer of the encoder; the input of each layer of the recurrent neural network comprises the corresponding encoder layer's shallow sentence vector and the hidden-state vector output by the previous layer of the recurrent neural network, the hidden-state vector being obtained after the recurrent neural network processes the shallow sentence vector input to the previous layer. By feeding every layer's shallow sentence vector into the recurrent neural network, the deep sentence vector output by its output layer fuses the information of every layer's shallow sentence vector and thereby retains the global information of the source text.
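A minimal sketch of this static fusion, assuming a plain tanh recurrent cell with hypothetical trained parameters Wx, Wh and b (the embodiment does not fix the recurrent cell type):

```python
import numpy as np

def deep_sentence_vector(shallow_vecs, Wx, Wh, b):
    # one recurrent step per encoder layer; the hidden state carries the
    # information of all lower layers forward
    h = np.zeros(Wh.shape[0])
    for g in shallow_vecs:
        h = np.tanh(Wx @ g + Wh @ h + b)
    return h                  # final state serves as the deep sentence vector

d = 512
Wx, Wh, b = (np.random.randn(d, d) * 0.01,
             np.random.randn(d, d) * 0.01,
             np.zeros(d))
deep = deep_sentence_vector([np.random.randn(d) for _ in range(6)], Wx, Wh, b)
```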
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determining module 1210 is further configured to: determine the similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector; determine, from that similarity vector, the weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector; and generate the deep sentence vector from those weight vectors and every layer's shallow sentence vector. In this apparatus, determining the similarity vector establishes the similarity between the current layer's shallow sentence vector and the previous layer's target-side vector; the weight vector is determined from it, and the deep sentence vector is generated from the weight vectors and every layer's shallow sentence vector. The key information of every layer's shallow sentence vector is thereby integrated, which avoids loss of key information and prevents translation errors.
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determining module 1210 is further configured to: obtain the source-side vector sequence of every encoder layer; taking each layer in turn as the current layer, determine the similarity vector between the current layer's source-side vector sequence and the previous layer's shallow sentence vector; obtain the corresponding weight vector from the similarity vector; and generate the current layer's shallow sentence vector from the weight vector and the current layer's source-side vector sequence. Each layer's source-side vector sequence is thus converted into that layer's shallow sentence vector, and the shallow sentence vector output by the current layer serves as input for computing the next layer's. Each current layer's shallow sentence vector is obtained from the previous layer's shallow sentence vector and the current layer's source-side vector sequence, which guarantees the transfer of source-word information and lets the current layer's shallow sentence vector incorporate the information of all preceding layers, so that the information of every word of the source text is integrated into the sentence.
In one embodiment, the target word determining module 1212 is further configured to: obtain the prediction word vector corresponding to each word according to the target-side vector of each word and the target sentence vector; obtain the word vectors of the target-side candidate words; determine the similarity between each word's prediction word vector and those word vectors; and take the candidate word with the highest similarity to each word's prediction word vector as the word's target word. The prediction word vector of each word is obtained from the word's target-side vector and the target sentence vector; the word vectors of the target-side candidate words are obtained and the similarities determined, and the candidate word with the highest similarity to each word's prediction word vector is taken as the word's target word, thereby obtaining the target word for every word of the source text.
Figure 13 shows the internal structure diagram of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in Figure 1. As shown in Figure 13, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected via a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the text processing method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the text processing method. The display screen of the computer device may be a liquid-crystal display or an electronic-ink display; the input device may be a touch layer covering the display screen, a key, trackball or trackpad provided on the housing of the computer device, or an external keyboard, trackpad, mouse, or the like.
Those skilled in the art will understand that the structure shown in Figure 13 is only a block diagram of the part of the structure relevant to this application's scheme and does not limit the computer device to which the scheme is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, the text processing apparatus provided by this application may be implemented in the form of a computer program that can run on a computer device as shown in Figure 13. The memory of the computer device may store the program modules constituting the text processing apparatus, for example the information acquisition module 1202, coding module 1204, weight acquisition module 1206, target-side vector generation module 1208, target sentence vector determining module 1210, target word determining module 1212 and target text generation module 1214 shown in Figure 12. The computer program constituted by these program modules causes the processor to perform the steps of the text processing method of each embodiment of this application described in this specification.
For example, the computer device shown in Figure 13 may perform the step of obtaining the input sequence of the source text through the information acquisition module in the text processing apparatus shown in Figure 12; the step of semantically encoding the input sequence into the source-side vector sequence through the coding module; the step of obtaining the first weight vector corresponding to each word in the source-side vector sequence through the weight acquisition module; the step of generating each word's target-side vector from the source-side vector sequence and the first weight vectors through the target-side vector generation module; the step of obtaining the target sentence vector from the source-side vector sequence through the target sentence vector determining module; the step of determining each word's target word from its target-side vector and the target sentence vector through the target word determining module; and the step of generating the target text corresponding to the source text from the target words through the target text generation module.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above text processing method. The steps of the text processing method here may be the steps of the text processing method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above text processing method. The steps of the text processing method here may be the steps of the text processing method of each of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of this application's patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this application's patent shall be subject to the appended claims.

Claims (12)

1. A text processing method, comprising:
obtaining an input sequence of a source text;
semantically encoding the input sequence to obtain a source-side vector sequence;
obtaining a first weight vector corresponding to each word in the source-side vector sequence;
generating a target-side vector of each word according to the source-side vector sequence and the first weight vector corresponding to each word;
obtaining a target sentence vector according to the source-side vector sequence;
determining a target word corresponding to each word according to the target-side vector of each word and the target sentence vector; and
generating a target text corresponding to the source text according to the target word corresponding to each word.
2. The method according to claim 1, wherein the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source-side vector sequence comprises:
obtaining the source-side vector sequence of an encoder output layer; and
determining a mean of the output layer's source-side vector sequence in each dimension to generate the shallow sentence vector.
3. The method according to claim 1, wherein the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source-side vector sequence comprises:
obtaining the source-side vector sequence of an encoder output layer; and
determining a maximum of the output layer's source-side vector sequence in each dimension to generate the shallow sentence vector.
4. The method according to claim 1, wherein the target sentence vector is a shallow sentence vector, and obtaining the target sentence vector according to the source-side vector sequence comprises:
obtaining an input sequence of an encoder input layer;
determining a maximum of the input layer's input sequence in each dimension to obtain an intermediate vector;
obtaining the source-side vector sequence of an encoder output layer;
determining a similarity vector between the intermediate vector and the output layer's source-side vector sequence;
obtaining a corresponding weight vector according to the similarity vector; and
generating the shallow sentence vector according to the weight vector and the output layer's source-side vector sequence.
5. The method according to claim 1, wherein the target sentence vector is a deep sentence vector, and obtaining the target sentence vector according to the source-side vector sequence comprises:
obtaining the source-side vector sequence of every encoder layer;
obtaining every layer's shallow sentence vector according to the layer's source-side vector sequence; and
generating the deep sentence vector according to every layer's shallow sentence vector.
6. The method according to claim 5, wherein generating the deep sentence vector according to every layer's shallow sentence vector comprises:
inputting every layer's shallow sentence vector into a recurrent neural network, and obtaining the deep sentence vector output by the recurrent neural network's output layer, wherein each layer of the recurrent neural network corresponds to a layer of the encoder, the input of each layer of the recurrent neural network comprises the corresponding encoder layer's shallow sentence vector and a hidden-state vector output by the previous layer of the recurrent neural network, and the hidden-state vector is obtained after the recurrent neural network processes the shallow sentence vector input to the previous layer.
7. The method according to claim 5, wherein generating the deep sentence vector according to every layer's shallow sentence vector comprises:
determining a similarity vector between the current layer's shallow sentence vector and the previous layer's target-side vector;
determining, according to the similarity vector, a weight vector between the current layer's shallow sentence vector and the previous layer's target-side vector; and
generating the deep sentence vector according to the weight vectors and every layer's shallow sentence vector.
8. The method according to claim 5, wherein obtaining every layer's shallow sentence vector according to the layer's source-side vector sequence comprises:
obtaining the source-side vector sequence of every encoder layer;
taking each layer in turn as the current layer, determining a similarity vector between the current layer's source-side vector sequence and the previous layer's shallow sentence vector;
obtaining a corresponding weight vector according to the similarity vector; and
generating the current layer's shallow sentence vector according to the weight vector and the current layer's source-side vector sequence.
9. The method according to claim 1, wherein determining the target word corresponding to each word according to the target-side vector of each word and the target sentence vector comprises:
obtaining a prediction word vector corresponding to each word according to the target-side vector of each word and the target sentence vector;
obtaining word vectors of target-side candidate words;
determining a similarity between each word's prediction word vector and the word vectors of the target-side candidate words; and
taking the candidate word with the highest similarity to each word's prediction word vector as the target word corresponding to the word.
10. A text processing apparatus, comprising:
an information acquisition module for obtaining an input sequence of a source text;
a coding module for semantically encoding the input sequence to obtain a source-side vector sequence;
a weight acquisition module for obtaining a first weight vector corresponding to each word in the source-side vector sequence;
a target-side vector generation module for generating a target-side vector of each word according to the source-side vector sequence and the first weight vector corresponding to each word;
a target sentence vector determining module for obtaining a target sentence vector according to the source-side vector sequence;
a target word determining module for determining a target word corresponding to each word according to the target-side vector of each word and the target sentence vector; and
a target text generation module for generating a target text corresponding to the source text according to the target word corresponding to each word.
11. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
12. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
CN201910308349.4A 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment Active CN110008482B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910308349.4A CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment
CN202010164622.3A CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910308349.4A CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010164622.3A Division CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110008482A true CN110008482A (en) 2019-07-12
CN110008482B CN110008482B (en) 2021-03-09

Family

ID=67172480

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010164622.3A Active CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment
CN201910308349.4A Active CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010164622.3A Active CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Country Status (1)

Country Link
CN (2) CN111368564B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111142728A (en) * 2019-12-26 2020-05-12 腾讯科技(深圳)有限公司 Vehicle-mounted environment intelligent text processing method and device, electronic equipment and storage medium
CN111259113A (en) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Text matching method and device, computer readable storage medium and computer equipment
CN112115718A (en) * 2020-09-29 2020-12-22 腾讯科技(深圳)有限公司 Content text generation method and device and music comment text generation method
CN112395832A (en) * 2020-11-17 2021-02-23 上海金桥信息股份有限公司 Text quantitative analysis and generation method and system based on sequence-to-sequence


Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484682B (en) * 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 Machine translation method, device and electronic equipment based on statistics
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
CN106547735B (en) * 2016-10-25 2020-07-07 复旦大学 Construction and use method of context-aware dynamic word or word vector based on deep learning
CN108388561B (en) * 2017-02-03 2022-02-25 百度在线网络技术(北京)有限公司 Neural network machine translation method and device
KR102342066B1 (en) * 2017-06-21 2021-12-22 삼성전자주식회사 Method and apparatus for machine translation using neural network and method for learning the appartus
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
CN108874785B (en) * 2018-06-01 2020-11-03 清华大学 Translation processing method and system
CN109062897A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109062910A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109213995B (en) * 2018-08-02 2022-11-18 哈尔滨工程大学 Cross-language text similarity evaluation technology based on bilingual word embedding
CN109492223B (en) * 2018-11-06 2020-08-04 北京邮电大学 Chinese missing pronoun completion method based on neural network reasoning
CN111428520B (en) * 2018-11-30 2021-11-23 腾讯科技(深圳)有限公司 Text translation method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018197921A1 (en) * 2017-04-25 2018-11-01 Systran A translation system and a method thereof
CN108304388A (en) * 2017-09-12 2018-07-20 腾讯科技(深圳)有限公司 Machine translation method and device
CN109034378A (en) * 2018-09-04 2018-12-18 腾讯科技(深圳)有限公司 Network representation generation method, device, storage medium and the equipment of neural network
CN109271646A (en) * 2018-09-04 2019-01-25 腾讯科技(深圳)有限公司 Text interpretation method, device, readable storage medium storing program for executing and computer equipment
CN109145315A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN109446534A (en) * 2018-09-21 2019-03-08 清华大学 Machine translation method and device
CN109543195A (en) * 2018-11-19 2019-03-29 腾讯科技(深圳)有限公司 A kind of method, the method for information processing and the device of text translation
CN109543199A (en) * 2018-11-28 2019-03-29 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of text translation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAJUN ZHANG: "Local Translation Prediction with Global Sentence Representation", 《PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI)》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111142728A (en) * 2019-12-26 2020-05-12 腾讯科技(深圳)有限公司 Vehicle-mounted environment intelligent text processing method and device, electronic equipment and storage medium
CN111142728B (en) * 2019-12-26 2022-06-03 腾讯科技(深圳)有限公司 Vehicle-mounted environment intelligent text processing method and device, electronic equipment and storage medium
CN111259113A (en) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Text matching method and device, computer readable storage medium and computer equipment
CN111259113B (en) * 2020-01-15 2023-09-19 腾讯科技(深圳)有限公司 Text matching method, text matching device, computer readable storage medium and computer equipment
CN112115718A (en) * 2020-09-29 2020-12-22 腾讯科技(深圳)有限公司 Content text generation method and device and music comment text generation method
CN112395832A (en) * 2020-11-17 2021-02-23 上海金桥信息股份有限公司 Text quantitative analysis and generation method and system based on sequence-to-sequence
CN112395832B (en) * 2020-11-17 2024-05-21 上海金桥信息股份有限公司 Text quantitative analysis and generation method and system based on sequence-to-sequence

Also Published As

Publication number Publication date
CN110008482B (en) 2021-03-09
CN111368564B (en) 2022-04-08
CN111368564A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
CN111368565B (en) Text translation method, text translation device, storage medium and computer equipment
Yao et al. An improved LSTM structure for natural language processing
US11113479B2 (en) Utilizing a gated self-attention memory network model for predicting a candidate answer match to a query
CN107506414B (en) Code recommendation method based on long-term and short-term memory network
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN110008482A (en) Text handling method, device, computer readable storage medium and computer equipment
CN111090461B (en) Code annotation generation method based on machine translation model
CN110210032B (en) Text processing method and device
CN111144110B (en) Pinyin labeling method, device, server and storage medium
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN109271646A (en) Text interpretation method, device, readable storage medium storing program for executing and computer equipment
CN109800434B (en) Method for generating abstract text title based on eye movement attention
CN109284397A (en) A kind of construction method of domain lexicon, device, equipment and storage medium
Lee et al. Deep learning-based context-sensitive spelling typing error correction
Ali et al. Boosting Arabic named-entity recognition with multi-attention layer
CN110807335A (en) Translation method, device, equipment and storage medium based on machine learning
CN114064852A (en) Method and device for extracting relation of natural language, electronic equipment and storage medium
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
CN112579739A (en) Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN115374252B (en) Native Bert architecture-based text classification method and device
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN116227597A (en) Biomedical knowledge extraction method, device, computer equipment and storage medium
Du et al. Hierarchical multi-layer transfer learning model for biomedical question answering
CN114880485A (en) Reading comprehension answer generation method and device, computer equipment and storage medium
CN114358021A (en) Task type dialogue statement reply generation method based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant