CN109492232A - A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information - Google Patents
A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information
- Publication number
- CN109492232A (application CN201811231017.2A)
- Authority
- CN
- China
- Prior art keywords
- sublayer
- similarity
- semantic
- indicate
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
This patent presents a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information. First, starting from the linguistic characteristics of Mongolian, the invention identifies the features of stems, affixes, and additional case components, and incorporates these linguistic features into the training of the model. Second, against the research background of distributed representations for measuring the degree of similarity between two words, the invention comprehensively analyzes the influence of depth, density, and semantic overlap on concept semantic similarity. In the translation process, the invention uses a Transformer model: a multi-layer encoder-decoder architecture built with trigonometric positional encoding and an enhanced multi-head attention mechanism, relying entirely on attention to draw global dependencies between input and output, eliminating recurrence and convolution.
Description
Technical field
The invention belongs to the field of machine translation technology, and in particular relates to a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information.
Background art
Mongolian is an agglutinative language belonging to the Altaic family. Written Mongolian includes traditional Mongolian and Todo Mongolian; the "Mongolian" in the Mongolian-Chinese translation system studied here refers to translation from traditional Mongolian into Chinese. Traditional Mongolian is an alphabetic script whose letter forms are not unique: a letter's form varies with its position in the word, which may be standalone, word-initial, word-medial, or word-final. A Mongolian word is formed as root + suffix, and suffixes fall into two classes: derivational suffixes, attached after the root to give the original word a new meaning (a root followed by one or more derivational suffixes forms a stem), and inflectional suffixes, attached after the stem to express grammatical meaning. Mongolian nouns and verbs exhibit many variations such as tense, number, and case, all realized through suffixes, so Mongolian morphology is extremely complex. In addition, Mongolian word order differs greatly from Chinese: the Mongolian verb follows the subject and object and sits at the end of the sentence, whereas in Chinese the verb sits between subject and object.
A one-hot representation differs in only a single vector dimension; a distributed word representation instead expresses a word as a low-dimensional dense real-valued vector. In that low-dimensional vector space, the degree of similarity between two words can conveniently be measured by distance, angle, or similar metrics. On the technical side, against the background of research on statistical language models, Google released Word2vec in 2013, a software tool for training word vectors. Given a corpus, Word2vec can quickly and effectively express a word in vector form through an optimized training model, providing a new tool for applied research in natural language processing. Word2vec relies on the skip-gram or continuous bag-of-words (CBOW) model to build neural word embeddings. However, word2vec currently has limitations in computing semantic relatedness. On the one hand, it uses only the local context of the translation to be generated as the basis for predicting the translation, without exploiting global contextual information, so its use of context is insufficient and its extraction of semantic features leaves room for improvement. On the other hand, the structure of the framework itself limits the parallelization of computation, so computational efficiency also stands to be improved.
Most traditional machine translation systems are based on recurrent neural networks (RNN), long short-term memory (LSTM), or gated recurrent units (GRU). In the past few years these methods have become the state of the art for sequence modeling and transduction problems such as machine translation. Recurrent models, however, typically factor computation along the symbol positions of the input and output sequences. Aligning positions to steps in computation time, they generate a sequence of hidden states ht at position t, each a function of the previous hidden state ht-1. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks and conditional computation, while also improving model performance in the latter case. The fundamental constraint of sequential computation, however, remains.
The current encoder-decoder framework is the dominant model for solving sequence-to-sequence problems. The model uses an encoder to compress the source-language sentence into a representation, and a decoder to generate the target-language sentence from that compressed source representation. The benefit of this structure is that it models the mapping between two sentences end to end, with all parameters trained jointly under one objective function, giving good model performance. Fig. 1 illustrates the structure of the encoder-decoder model, a bottom-up machine translation process.
The encoder and decoder can use neural networks of different structures, such as RNNs or CNNs. An RNN compresses the sequence step by step along time. When an RNN is used, a bidirectional RNN structure is generally adopted: one RNN compresses the sequence from left to right, another from right to left, and the two representations are concatenated as the final distributed representation of the sequence. Because this structure processes the elements of the sequence in order, the interaction distance between two words can be regarded as their relative distance; as sentences grow longer and relative distances increase, there is a clear theoretical ceiling on how well information can be processed.
When a CNN structure is used, multiple layers are generally stacked to move from local representations of the sequence to a global representation. Modeling a sentence with an RNN takes a temporal view; modeling it with a CNN takes a structural view. Sequence-to-sequence models with RNN structure mainly include RNNSearch and GNMT; those with CNN structure mainly include ConvS2S. These embody a local-to-global feature extraction process in which the interaction distance between words is proportional to their relative distance: words far apart can only interact at higher CNN nodes, a process that can lose more information.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information. The system is based entirely on an attention mechanism, completely eliminating recurrence and convolution. Experiments show that the system is superior in quality while being more parallelizable and requiring less time to train; it reaches 45.4 BLEU on a translation task over a 1.2-million-sentence Mongolian-Chinese parallel corpus, achieving higher translation quality.
To achieve the above goal, the technical solution adopted by the present invention is: a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information, characterized in that a Transformer model is used in the translation process. The Transformer model is a multi-layer encoder-decoder architecture built with trigonometric positional encoding and an enhanced multi-head attention mechanism, relying entirely on attention to draw global dependencies between input and output, eliminating recurrence and convolution.
Before translation, to help the deep neural network extract features, the data is first preprocessed. Preprocessing the data means segmenting and separating the stems, affixes, and additional case components in the Mongolian corpus, to reduce data sparsity, while the Chinese side is segmented into characters. The linguistic features of Mongolian stems, affixes, and additional case components are identified, and these linguistic features are incorporated into training.

The segmentation includes fine-grained affix segmentation, coarse-grained stem segmentation, and small-scale segmentation of the additional case components.
After the data is preprocessed, the influence of depth, density, and semantic overlap on concept semantic similarity is analyzed comprehensively, and similarity algorithms based on semantic distance and information content are integrated to build a similarity matrix. Principal component analysis then converts the similarity matrix into a principal-component transform matrix; the principal component contribution rates are computed and used as weights in a weighted combination, yielding the final concept semantic similarity.
The similarity matrix is expressed as

Xsim = (xi1, xi2, xi3, xi4, xi5)T, i = 1, 2, 3, ..., n

and the final concept semantic similarity is computed as

δsim = r1·ysim1 + r2·ysim2 + r3·ysim3 + r4·ysim4 + r5·ysim5

where Xsim is the similarity matrix; xi1 denotes Ds, xi2 denotes Ks, xi3 denotes Zs, xi4 denotes Ss, and xi5 denotes Is; n is the number of concept-word pairs in the comparison set; and xi = (Dsi, Ksi, Zsi, Ssi, Isi) is one vector in the principal-component input sample set, each dimension of which is the result of one part of the comprehensive similarity computation: Dsi is the similarity from the semantic distance of the i-th element, Ksi the semantic similarity in terms of the depth of the i-th element, Zsi the density impact factor of the concept word c of the i-th element, Ssi the similarity in terms of the semantic overlap of the i-th element, and Isi the similarity in terms of the information content of the i-th element. δsim is the concept semantic similarity; ysim1, ysim2, ysim3, ysim4, ysim5 are the principal components extracted by principal component analysis of the similarity matrix Xsim; and r1, r2, r3, r4, r5 are the contribution rates of the respective principal components.
The multi-head attention mechanism can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
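As a concrete illustration of the query/key-value mapping described above, the following is a minimal single-head NumPy sketch under the standard scaled dot-product assumption; it is not the patented implementation, and the array sizes are arbitrary toy values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Output = weighted sum of values; weights come from a compatibility
    (scaled dot-product + softmax) function of each query with the keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # query-key compatibility
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# toy example: 2 queries, 3 key-value pairs, dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention would run several such heads on linearly projected queries, keys, and values and concatenate the results.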
The encoder is composed of N identical layers, each with two sublayers: the first sublayer is a multi-head attention sublayer, and the second is a feed-forward sublayer. A residual connection links the input and output of each sublayer, and each sublayer is followed by a layer normalization step to speed model convergence.

The decoder is composed of N identical layers, each with three sublayers. The first sublayer is a mask-controlled multi-head attention sublayer for modeling the already-generated target-side sentence: during training, a mask matrix ensures that each multi-head attention computation only attends to the first t-1 words. The second sublayer is a multi-head attention sublayer implementing the attention mechanism between encoder and decoder, that is, looking up relevant semantic information in the source language. The third sublayer is a feed-forward sublayer, identical to the one in the encoder. Each sublayer has a residual connection between its input and output, followed by a layer normalization step, to speed model convergence.
The multi-layer encoder-decoder architecture is constructed as follows: in the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm(·) is the layer normalization function, Sublayer(x) is the function implemented by the sublayer itself with residual connections around multi-head attention, and x is the vector input to the current layer. The Mongolian sentence is turned into corresponding vectors using word2vec and used as the input of the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension dmodel = 512.
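The sublayer output LayerNorm(x + Sublayer(x)) can be sketched as follows; this is a minimal NumPy illustration with a placeholder sublayer function standing in for attention or feed-forward, not the actual model code.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each position's vector to zero mean and unit scale."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer_output(x, sublayer):
    # residual connection around the sublayer, then layer normalization
    return layer_norm(x + sublayer(x))

d_model = 512
x = np.random.default_rng(1).normal(size=(5, d_model))  # 5 positions
# placeholder sublayer: any function preserving the d_model dimension
y = sublayer_output(x, lambda h: 0.5 * h)
```

Because every sublayer and the embedding layer emit dmodel-dimensional outputs, the residual addition x + Sublayer(x) is always well-defined.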
The encoder's feed-forward sublayer consists of two linear transformations with a ReLU nonlinearity between them; the computation is:

FFN(x) = max(0, xW1 + b1)W2 + b2

where x is the encoder input information; W1 and b1 are the weight and bias of the first linear transformation; max(0, xW1 + b1) is the activated output of the first transformation, i.e. the input to the second; and W2 and b2 are the weight and bias of the second transformation, with max(0, ·) serving as the nonlinear activation function of the encoder layer.
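As an illustration of the feed-forward formula, the following NumPy sketch uses the standard Transformer inner dimension d_ff = 2048, a value assumed here rather than stated by the patent, with random weights standing in for trained parameters.

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward sublayer: two linear transformations
    with a ReLU (max(0, .)) nonlinearity between them."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

d_model, d_ff = 512, 2048   # d_ff = 2048 is an assumption from the standard Transformer
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.01, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.01, np.zeros(d_model)
x = rng.normal(size=(3, d_model))   # 3 positions
y = ffn(x, W1, b1, W2, b2)
```

The same transformation is applied independently at every position, so the output keeps the (positions, d_model) shape required by the residual connection.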
Positional encoding with trigonometric functions computes the encoding with the absolute position as the variable of the trigonometric function, as follows:

PE(pos, 2i) = sin(pos / 10000^(2i/dmodel))
PE(pos, 2i+1) = cos(pos / 10000^(2i/dmodel))

where pos is the position and i is the dimension; each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π. dmodel is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of dmodel.
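The sinusoidal positional encoding described above can be sketched as follows; even dimensions use sine and odd dimensions cosine, following the standard Transformer formulation.

```python
import numpy as np

def positional_encoding(max_pos, d_model):
    """Sinusoidal positional encoding: wavelengths form a geometric
    progression from 2*pi to 10000*2*pi across the dimensions."""
    pos = np.arange(max_pos)[:, None]          # (max_pos, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_pos, d_model))
    pe[:, 0::2] = np.sin(angle)                # even dims: sine
    pe[:, 1::2] = np.cos(angle)                # odd dims: cosine
    return pe

pe = positional_encoding(50, 512)   # 50 positions, d_model = 512
```

Each row of `pe` is added to the corresponding word embedding so that absolute position information enters the otherwise order-blind attention mechanism.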
Compared with the prior art, the advantages of the present invention are as follows:

1. The present invention uses a Transformer-based sequence modeling method. The sequence-to-sequence model retains the classical encoder-decoder structure, but instead of using RNNs or CNNs as the sequence modeling mechanism it uses multi-head attention, making it easier to capture long-distance dependency information.

2. The present invention splits the stems, affixes, and additional case components in the Mongolian corpus. The additional case components are special affixes in Mongolian; their difference from ordinary affixes lies first in that they express only grammatical meaning, without any meaning at the semantic level. Segmenting the case components of the corpus on the one hand reduces data sparsity, and on the other hand better preserves Mongolian stem information.

3. Targeting the severe data sparsity caused by Mongolian word-formation characteristics, the present invention proposes three segmentation schemes of different granularity: fine-grained affix segmentation, coarse-grained stem segmentation, and small-scale segmentation of the additional case components. Experiments show that combining stem segmentation with case-component segmentation maximally improves translation quality.

4. Against the research background of distributed representations for measuring the degree of similarity between two words, the present invention comprehensively analyzes the influence of depth, density, and semantic overlap on concept semantic similarity, and integrates traditional semantic-distance and information-content similarity algorithms to build a similarity matrix. By applying principal component analysis, the original similarity matrix is converted into a new principal-component transform matrix; the principal component contribution rates are computed and used as weights in a weighted combination, yielding the final concept semantic similarity.
Description of the drawings
Fig. 1 is the framework diagram of the Transformer-based Mongolian-Chinese machine translation of the present invention.
Fig. 2 is the model diagram of sequence modeling based on the multi-head attention mechanism of the present invention.
Fig. 3 is the "soft" attention model diagram of the present invention.
Fig. 4 is the multi-head attention model diagram of the present invention.
Fig. 5 is the morpheme segmentation flowchart of the present invention.
Fig. 6 is the computation model of weights by the multi-head attention mechanism of the present invention.
Fig. 7 is a schematic diagram of modeling a sequence with a bidirectional RNN in the present invention.
Fig. 8 is a schematic diagram of modeling a sequence with a multi-layer CNN in the present invention.
Fig. 9 is the similarity distribution of 65 randomly selected word pairs under the distributed algorithm of the comprehensive concept semantic similarity computation of the present invention.
Specific embodiment
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.
In the Transformer-based Mongolian-Chinese machine translation method of the present invention, the Mongolian corpus is first preprocessed. Then, taking the word2vec word-vector model as research background, the influences of depth, density, and semantic overlap on concept semantic similarity, together with semantic-distance and information-content similarity algorithms, are integrated to build a similarity matrix. Principal component analysis converts the similarity matrix into a principal-component transform matrix; the principal component contribution rates are computed and used as weights in a weighted combination, yielding the final concept semantic similarity. Finally, a Transformer model is used in the translation process, relying entirely on attention to draw global dependencies between input and output and eliminating recurrence and convolution; the Transformer model is a multi-layer encoder-decoder architecture built with trigonometric positional encoding and an enhanced multi-head attention mechanism.
Mongolian corpus preprocessing: for dictionary-based morpheme segmentation, the dictionary of the Mongolian corpus is first generated with the word-frequency statistics tool OpenNMT.dict. After the dictionary is generated, stems are searched for in the dictionary and collected into a stem table; the parts other than the stems form the corresponding affix table. Based on the stem table and affix table, a reverse maximum matching algorithm performs morpheme segmentation on each Mongolian word form; the segmentation process is shown in Fig. 5. For each Mongolian word to be processed, all dictionary records are matched one by one; if the word contains a matching record, it is segmented so that the additional case component is detached, and the word is finally separated into two parts: the additional case component as one part, and the remainder as the other.
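The reverse-maximum-matching segmentation described above can be sketched as follows. The Latin-transliterated stem and suffix entries here are purely hypothetical placeholders, not the patent's actual Mongolian dictionary, and the matching policy (longest suffix from the right end) is an assumption about the algorithm's details.

```python
def reverse_max_match(word, stems, suffixes):
    """Split a word into stem + suffixes by repeatedly matching the longest
    suffix at the right end, then checking the remainder against the stem table."""
    parts = []
    rest = word
    while rest and rest not in stems:
        for cut in range(1, len(rest)):      # smallest cut = longest suffix
            if rest[cut:] in suffixes:
                parts.insert(0, rest[cut:])  # detach the matched suffix
                rest = rest[:cut]
                break
        else:
            break                            # no suffix matches; stop
    return ([rest] if rest else []) + parts

stems = {"sur"}              # hypothetical stem entry
suffixes = {"gagul", "san"}  # hypothetical suffix entries
print(reverse_max_match("surgagulsan", stems, suffixes))
```

On the toy word this yields `["sur", "gagul", "san"]`: the stem followed by its detached suffixes, mirroring the stem/affix/case-component separation described in the text.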
After the Mongolian-Chinese bilingual corpus is uniformly encoded, a bilingual dictionary is built on this basis. The modeling of the Transformer model comprises building the multi-layer encoder-decoder structure, positional encoding with trigonometric functions, and model construction based on the enhanced multi-head attention mechanism, with improvements to the model's training optimization method and regularization strategy.
For the information-content-based algorithm, the present invention finds, through analysis of distributed word representations, that the more sub-concepts a concept contains, the less information content the concept carries, and gives a computation model of I for distributed word representations:

I(c) = 1 - log(h(c) + 1) / log(maxwn)

where h(c) is the number of all sub-concept nodes of the concept-word node c, and maxwn is a constant denoting the total number of concept nodes in the semantic classification tree.
The present invention proposes a comprehensive weighting method based on principal component analysis, introducing principal component analysis into weight computation and using the principal component contribution rates as weights in a weighted combination of similarities. This alleviates the curse of dimensionality and the gradient explosion problem and helps the model converge quickly.

The PCA-based comprehensive weighting algorithm of the present invention consists of three parts: multi-angle similarity computation, similarity matrix extraction, and PCA-based weight computation.
Part 1: multi-angle similarity computation

Semantic similarity is analyzed from semantic distance, depth and density, semantic overlap, and information content, and a computation formula is given for each part.
(1) Semantic distance

The relationship between semantic distance and similarity is expressed as

Ds(c1, c2) = a / (D(c1, c2) + a)

where c1 and c2 are the distributed vectors of the two concepts being compared; a is an adjustable parameter, here taken as the average semantic distance of the comparison set; and D(c1, c2) is the semantic distance between the two concepts, i.e. the shortest path between c1 and c2.
(2) Depth and density

The higher a node sits in the semantic tree, the more abstract the concept word it represents; the lower its level, the more specific the concept word. Let the maximum depths of the semantic trees containing the compared concept words c1 and c2 be Kmax(c1) and Kmax(c2), and the node depths of c1 and c2 be K(c1) and K(c2); the depth-based semantic similarity is then computed from these depths.

In a semantic hierarchy tree, the greater the density of a local region, the more finely that region divides concepts, and the relatively larger the semantic similarity between concept words within the region. The density impact factor of a concept word c is defined in terms of n(c), the number of direct descendants of the node of concept word c, and n(O), the maximum number of direct descendants over the nodes of the sub-semantic-tree O in which c is located. From these, the density-based semantic similarity of the compared concept words c1 and c2 is obtained.
(3) Semantic overlap

Define the root node of the semantic hierarchy tree as R, and let c1 and c2 be any two concept-word nodes. S(c1) is the number of nodes in the set of nodes passed through from c1 to the root node R, and S(c2) the number of nodes in the set passed through from c2 to R. S(c1)∩S(c2) denotes the set of nodes passed through jointly from c1 and c2 to R (the intersection), and S(c1)∪S(c2) the union of the node set from c1 to R and the node set from c2 to R. The overlap-based similarity is then expressed as

Ss(c1, c2) = |S(c1)∩S(c2)| / |S(c1)∪S(c2)|
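The overlap-based similarity amounts to a Jaccard ratio over the two root-path node sets. A small sketch, using a hypothetical toy hierarchy rather than any real semantic tree:

```python
def path_to_root(node, parent):
    """Collect the set of nodes on the path from `node` up to the root."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path

def overlap_similarity(c1, c2, parent):
    """Jaccard ratio of the two root-path node sets."""
    p1 = path_to_root(c1, parent)
    p2 = path_to_root(c2, parent)
    return len(p1 & p2) / len(p1 | p2)

# toy hierarchy: R -> A -> {c1, c2}
parent = {"A": "R", "c1": "A", "c2": "A", "R": None}
print(overlap_similarity("c1", "c2", parent))  # shared {A, R} of {c1, c2, A, R}
```

Siblings under the same parent share everything on their paths except themselves, so here the similarity is 2/4 = 0.5; identical nodes score 1.0.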
(4) Information content

To define the similarity in terms of information content, the following algorithm is proposed to compute the value of I, where c1 and c2 are the distributed vectors of the two concepts being compared, I(c1) is the sum of the vector dimensions of all child nodes having concept vector c1 as parent node, and I(c2) is the sum of the vector dimensions of all child nodes having concept vector c2 as parent node.
Part 2: similarity matrix extraction

Suppose the comparison set contains n pairs of concept words, and let xi = (Dsi, Ksi, Zsi, Ssi, Isi) be one vector in the principal-component input sample set, where each dimension is the result of one part of the comprehensive similarity computation: Dsi is the similarity from the semantic distance of the i-th element, Ksi the semantic similarity in terms of the depth of the i-th element, Zsi the density impact factor of the concept word c of the i-th element, Ssi the similarity in terms of the semantic overlap of the i-th element, and Isi the similarity in terms of the information content of the i-th element.

The similarity matrix is then expressed as

Xsim = (xi1, xi2, xi3, xi4, xi5)T, i = 1, 2, 3, ..., n
Part 3: PCA-based weight computation

Principal component analysis is a multivariate statistical method that converts multiple indicators into a few comprehensive indicators while losing little information. The comprehensive indicators it produces are called principal components; each principal component is a linear combination of the original variables, and the principal components are mutually uncorrelated, which gives them certain advantages over the original variables. In the PCA algorithm, the weight of each principal component is assigned according to its contribution rate rather than determined by hand, overcoming the defect of subjectively assigned weights in multivariate analysis and making the result objective and reasonable.

Applying principal component analysis to the constructed similarity matrix Xsim, the extracted principal components are

Y = (ysim1, ysim2, ysim3, ysim4, ysim5)

With contribution rates (r1, r2, r3, r4, r5), the final concept semantic similarity is computed as

δsim = r1·ysim1 + r2·ysim2 + r3·ysim3 + r4·ysim4 + r5·ysim5
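The PCA weighting of Part 3 can be sketched end to end in NumPy. Random numbers stand in for the real n x 5 similarity matrix, and an eigendecomposition of the covariance matrix plays the role of principal component extraction; this is an illustrative sketch, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((65, 5))                 # n pairs x 5 similarity measures (stand-in data)
Xc = X - X.mean(axis=0)                 # center each similarity column
cov = Xc.T @ Xc / (len(X) - 1)          # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

r = eigvals / eigvals.sum()             # contribution rates r1..r5 (sum to 1)
Y = Xc @ eigvecs                        # principal component scores y_sim1..y_sim5
delta_sim = Y @ r                       # weighted combination: one score per pair
```

Using the explained-variance shares as weights means the combination emphasizes the directions along which the five similarity measures disagree most, without any hand-set weighting.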
The method for constructing the multi-layer encoder-decoder structure is: in the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm(·) is the layer normalization function, Sublayer(x) is the function implemented by the sublayer itself with residual connections around multi-head attention, and x is the vector input to the current layer. The Mongolian sentence is turned into corresponding vectors using word2vec and used as the input of the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension dmodel = 512.
Fig. 1 illustrates the structure of the Transformer with one encoder layer and one decoder layer.
Referring to Fig. 1, the Nx on the left represents one encoder layer, which contains two sublayers: the first is a multi-head attention sublayer and the second is a feed-forward sublayer. A residual connection links the input and output of each sublayer, which in theory allows gradients to propagate back well. Each sublayer is followed by a layer normalization step, which accelerates the convergence of the model. The computation of the multi-head attention sublayer is detailed below, in the model construction based on the enhanced multi-head attention mechanism. The feed-forward sublayer applies two linear transformations with one ReLU nonlinear activation between them, computed as follows:

FFN(x) = max(0, xW1 + b1)W2 + b2

Here x denotes the encoder input information, W1 and b1 are the weight and bias of the first linear transformation, W2 and b2 are the weight and bias of the second, and max(0, ·) is the ReLU nonlinear activation of the encoder layer. The encoder input information is the vector obtained by adding the positional encoding to the embedding-layer information; the feed-forward sublayer takes as input the output of the encoder's first sublayer.
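The feed-forward computation above can be sketched in pure Python (a minimal illustration, not the invention's implementation; the weight shapes are toy-sized rather than d_model = 512):

```python
def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward sublayer:
    FFN(x) = max(0, x*W1 + b1)*W2 + b2, i.e. linear -> ReLU -> linear."""
    # first linear transformation followed by the ReLU nonlinearity
    hidden = [max(0.0, sum(x[j] * W1[j][k] for j in range(len(x))) + b1[k])
              for k in range(len(b1))]
    # second linear transformation (no activation)
    return [sum(hidden[k] * W2[k][m] for k in range(len(hidden))) + b2[m]
            for m in range(len(b2))]
```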
Referring to Fig. 1, the Nx on the right represents one layer of the decoder, which contains three sublayers. The first is a multi-head attention sublayer controlled by a mask matrix, used to model the target-side sentence generated so far; during training, the mask matrix ensures that each multi-head attention computation attends only to the first t-1 words. The second is a multi-head attention sublayer forming the attention mechanism between encoder and decoder, i.e. looking up the relevant semantic information in the source language; this computation is the same as in other sequence-to-sequence attention models, and the Transformer uses the dot-product form. The third is a feed-forward sublayer, identical to the one in the encoder. Each sublayer again has a residual connection and a layer normalization step to accelerate model convergence.
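The mask matrix described above can be sketched as follows (an illustrative construction: 0 marks allowed positions and -inf blocked ones, so adding the mask to the attention scores before softmax drives the blocked weights to zero):

```python
def causal_mask(t):
    """Build a t-by-t mask in which position i may attend only to
    positions j <= i; future positions receive -inf so that softmax
    assigns them zero weight."""
    neg_inf = float("-inf")
    return [[0.0 if j <= i else neg_inf for j in range(t)] for i in range(t)]
```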
The method by which the present invention performs positional encoding using trigonometric functions is as follows:
The multi-head attention mechanism models a sequence neither with the sequential character of an RNN nor with the structural character of a CNN, but rather as a kind of bag of words. Put differently, the mechanism treats a sequence as a flat structure: however far apart two words are, their distance under multi-head attention is effectively 1. This modeling style in fact loses the relative-distance relationships between words. For example, in the three sentences "the ox ate the grass", "the grass ate the ox" and "ate the grass, the ox", the modeled representation of each word would come out identical.
To alleviate this problem, the present invention maps each word's position in the sentence to a vector and adds it to the word's embedding in the Transformer. This idea is not new: CNN models likewise have difficulty modeling relative position (timing information), and Facebook proposed a positional encoding method for them. One direct approach is to model absolute position in the embedding layer, i.e. map the position i of each word Wi to a vector and add it to the embedding; the drawback of this approach is that it can only model sequences of finite length.
The present invention adopts a new way of modeling timing information: the periodicity of trigonometric functions is used to model the relative positional relationships between words. Concretely, the absolute position is used as the variable of trigonometric functions:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

Here pos is the position and i the dimension; that is, each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π. The present invention chose this function because it allows the model to easily learn relative positions: for any constant offset k, PE_(pos+k) can be expressed as a linear function of PE_pos. d_model is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of d_model.
Trigonometric functions have good periodicity: their values repeat at fixed intervals, a property that can be used to model relative distance. Moreover, their range is [-1, 1], which provides well-scaled values for the embedding-layer elements.
The method of model construction based on the enhanced multi-head attention mechanism is as follows:
Fig. 2 illustrates the sequence modeling method based on multi-head attention. Note that, to keep the figure legible, some connecting lines are omitted: every word in the "source sentence vector" layer (i.e. the source-language morpheme vectors) is fully connected to every node in the first multi-head attention layer, and the nodes of the first and second multi-head attention layers are likewise fully connected. In this modeling scheme the interaction distance between any two words is 1, regardless of their relative distance in the sentence. Under this scheme, the meaning of each word is determined by considering its relationship with every word in the whole sentence, and multi-head attention makes this global interaction richer, capturing more information.

In summary, when modeling sequence problems, the multi-head attention mechanism can capture long-distance dependency knowledge and rests on a sounder theoretical basis.
The mathematical formalization of the multi-head attention mechanism is described below, beginning with the attention mechanism itself.
1. Attention mechanism (model)
When a neural network processes a large amount of input information, it can, like the human brain, select only some key pieces of the input to process, improving its efficiency. In current neural network models, max pooling and gating mechanisms can be viewed approximately as bottom-up, saliency-based attention. Beyond these, top-down, task-driven attention is also an effective information selection mechanism. Take reading comprehension as an example: given a long article and a question about its content, the question usually concerns only one or two sentences of one paragraph, and the rest is irrelevant. To reduce the computational burden of the network, only the relevant segments need to be picked out for the downstream network to process, rather than feeding it the entire article.
Let x_{1:N} = [x_1, …, x_N] denote N pieces of input information. To save computing resources, it is not necessary to feed all N inputs into the neural network; it suffices to select from x_{1:N} the information relevant to the task and input only that. Given a task-related query vector q, let the attention variable z ∈ [1, N] denote the index position of the selected information, so that z = i means the i-th input is selected. For ease of computation, a "soft" information selection mechanism is adopted: given q and x_{1:N}, the probability α_i of selecting the i-th input is computed as

α_i = softmax(s(x_i, q)) = exp(s(x_i, q)) / Σ_{j=1}^{N} exp(s(x_j, q))
where s(x_i, q) is a scoring function that can be computed in any of the following three ways:

Additive model: s(x_i, q) = vᵀ tanh(W x_i + U q)
Dot-product model: s(x_i, q) = x_iᵀ q
Bilinear (multiplicative) model: s(x_i, q) = x_iᵀ W q

where W, U, v are learnable network parameters and ᵀ denotes matrix transposition.
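The three scoring functions can be sketched in pure Python (toy vector sizes; in practice W, U, v would be learned parameters):

```python
import math

def s_additive(x, q, W, U, v):
    """Additive model: s(x, q) = v^T tanh(W x + U q)."""
    h = [math.tanh(sum(W[r][j] * x[j] for j in range(len(x))) +
                   sum(U[r][j] * q[j] for j in range(len(q))))
         for r in range(len(v))]
    return sum(v[r] * h[r] for r in range(len(v)))

def s_dot(x, q):
    """Dot-product model: s(x, q) = x^T q."""
    return sum(a * b for a, b in zip(x, q))

def s_bilinear(x, q, W):
    """Bilinear model: s(x, q) = x^T W q."""
    return sum(x[r] * sum(W[r][j] * q[j] for j in range(len(q)))
               for r in range(len(x)))
```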
The attention distribution α_i can be interpreted as the degree to which the i-th piece of information is attended to given the query q. Using this "soft" information selection mechanism, the input information is encoded as

att(x_{1:N}, q) = Σ_{i=1}^{N} α_i x_i

Fig. 3 gives an example of the "soft" attention mechanism.
2. Variants of the attention mechanism
2.1 Key-value attention
More generally, the input information can be represented in key-value pair format, where the "keys" K are used to compute the attention distribution α_i and the "values" V are used to generate the selected information. Let (k, v)_{1:N} = [(k_1, v_1), …, (k_N, v_N)] denote the N inputs; given the task-related query vector q, the attention function is

att((k, v)_{1:N}, q) = Σ_{i=1}^{N} softmax(s(k_i, q)) v_i

where s(k_i, q) denotes the scoring function. Fig. 4 gives an example of the key-value attention mechanism. If k_i = v_i for every pair, this reduces to the ordinary attention mechanism.
2.2 Scaled dot-product attention
The scaled dot-product attention algorithm is described abstractly in terms of key-value pairs K-V and a query vector q. Here we assume that the "keys" K and "values" V of the key-value pairs correspond to the same vectors, i.e. K = V, as shown in Fig. 6, and that the query vector q corresponds to the word vectors of the target sentence. The operation has three steps:
1. Each query vector q takes a dot product with every "key" in K.
2. The resulting scores are normalized with softmax so that they lie in the [0, 1] interval as probability values.
3. The normalized weights are multiplied by the "values" V to obtain the final attention vector.
The mathematical expression is

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

where 1/√d_k is the scaling factor and ᵀ denotes matrix transposition.
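The three steps above can be sketched in pure Python (a minimal illustration of Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; real implementations batch this as matrix operations):

```python
import math

def scaled_dot_attention(Q, K, V):
    """Step 1: dot each query with every key and scale by 1/sqrt(d_k);
    step 2: softmax-normalize the scores into [0, 1];
    step 3: take the weighted sum of the values."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)                      # subtract max for numerical stability
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        out.append([sum(e[i] / z * V[i][j] for i in range(len(V)))
                    for j in range(len(V[0]))])
    return out
```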
2.3 Multi-head attention
Multi-head attention uses multiple queries q_{1:M} = {q_1, …, q_M} to select, in parallel, multiple pieces of information from the input, with each head attending to a different part of the input information.
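A minimal sketch of the parallel heads follows (illustrative only: here each head simply attends over its own slice of the vectors and the head outputs are concatenated; the learned per-head projection matrices of the real model are omitted):

```python
import math

def _attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [sum(e[i] / z * values[i][j] for i in range(len(values)))
            for j in range(len(values[0]))]

def multi_head(queries, keys, values, num_heads=2):
    """Split every vector into num_heads slices, attend per slice in
    parallel, then concatenate the per-head results."""
    h = len(keys[0]) // num_heads
    out = []
    for q in queries:
        parts = []
        for head in range(num_heads):
            sl = slice(head * h, (head + 1) * h)
            parts.extend(_attend(q[sl], [k[sl] for k in keys],
                                 [v[sl] for v in values]))
        out.append(parts)
    return out
```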
The method by which the present invention improves the model's training optimization and regularization strategy is as follows:
The model is trained with the Adam method, and the present invention adopts a warm-up learning-rate schedule, as shown in the formula:

lrate = d_model^(-0.5) · min(step_num^(-0.5), step_num · warmup_steps^(-1.5))

The formula means that training requires presetting a hyperparameter warmup_steps.
a. While the training step step_num is below this value, the learning rate is determined by the second term in the bracket, a linear function of step_num with positive slope.
b. Once step_num exceeds warmup_steps, the learning rate is determined by the first term in the bracket, a power function with a negative exponent.
Overall, then, the learning rate first rises and then falls, which favors fast convergence of the model.
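The schedule can be sketched as a direct transcription of the formula (warmup_steps = 4000 is an assumed setting here, not specified in the text):

```python
def warmup_lr(step_num, d_model=512, warmup_steps=4000):
    """lrate = d_model^-0.5 * min(step_num^-0.5, step_num * warmup_steps^-1.5):
    linear growth while step_num < warmup_steps, then decay as step_num^-0.5."""
    return d_model ** -0.5 * min(step_num ** -0.5,
                                 step_num * warmup_steps ** -1.5)
```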
Two important regularization methods are also used in the model. One is the common dropout method, applied after each sublayer and inside the attention computation. The other is label smoothing: when computing the cross entropy during training, the reference is no longer a one-hot model answer; instead, each 0 entry is filled with a small non-zero value. This enhances the robustness of the model and raises its BLEU score.
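Label smoothing can be sketched as follows (an illustrative version that keeps 1 - eps on the correct class and spreads eps uniformly over the rest; eps = 0.1 is a commonly used value assumed here, not taken from the text):

```python
def smooth_labels(one_hot, eps=0.1):
    """Replace the one-hot reference with a distribution that keeps
    1 - eps on the true class and fills every 0 entry with the small
    non-zero value eps / (V - 1)."""
    v = len(one_hot)
    return [1.0 - eps if y == 1 else eps / (v - 1) for y in one_hot]
```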
In summary, the present invention's Transformer-based sequence modeling method retains the classical encoder-decoder structure of sequence-to-sequence models; the difference is that neither an RNN nor a CNN is used as the sequence modeling mechanism, but rather multi-head attention. The theoretical advantage of multi-head attention is that it more easily captures "long-distance dependency information", which can be understood as follows: 1) a word is a symbol that can express multiple meanings (the ambiguity problem); 2) the meaning of a word is determined by the context in which it occurs (contextual disambiguation); 3) some words need only a small context window to determine their meaning (short-distance dependency), while others may need a large context window (long-distance dependency).
For example, consider the following two sentences:
"There are many 杜鹃 on the mountain; when spring comes, they bloom all over the hills and are very beautiful."
"There are many 杜鹃 on the mountain; when spring comes, their calls echo all over the hills, very melodious."
In these two sentences, the Chinese word 杜鹃 (dujuan) refers to a flower (azalea) and to a bird (cuckoo) respectively. In machine translation, this word cannot be translated correctly without looking at words far away from it. This is a deliberately clear example, in which the long-distance dependency between words is plainly visible. Of course, the meaning of most words can be determined within a small context window, and cases like the one above are relatively rare in language. What is desired is a model that learns short-distance dependency knowledge well while also learning long-distance dependencies.
The multi-head attention mechanism in the Transformer of the present invention can, in theory, better capture such long- and short-distance dependency knowledge. Below, the three sequence modeling approaches based on RNN, CNN and Transformer are compared with respect to the interaction distance between any two words.
Fig. 7 shows sequence modeling with a bidirectional RNN. Because the elements of the sequence are processed in order, the interaction distance between two words can be taken to be their relative distance; the interaction distance between W1 and Wn is n-1. In theory, an RNN with a gating mechanism can selectively store and forget historical information and performs better than a plain RNN structure, but with a fixed number of gating parameters this capacity is limited: as sentences grow longer, the growing relative distance runs into a clear theoretical ceiling.
Fig. 8 illustrates sequence modeling with a multi-layer CNN. The CNN units of the first layer cover a small semantic context; those of the second layer cover a larger one, and so on: the deeper the CNN unit, the larger the semantic context it covers. A given word first interacts with its nearby words in the bottom CNN units and only interacts with more distant words in higher-level CNN units. The multi-layer CNN structure thus embodies a local-to-global feature extraction process. The interaction distance between words is proportional to their relative distance: distant words can only meet, and hence interact, at higher CNN nodes, a process in which more information may be lost.
By contrast, Fig. 2 shows that the sequence modeling method of the present invention based on multi-head attention is clearly superior to both of these approaches and can capture more information.
A concrete Mongolian-Chinese translation example follows.
Experiments were run on a 1.2-million-sentence-pair Mongolian-Chinese parallel corpus as the data set to verify the effect of the invention.
To address the severe data sparsity in the Mongolian corpus, three processing schemes were applied: affix segmentation, stem segmentation, and segmentation of the case-marking components. Affix segmentation has the smallest granularity, stem segmentation is coarser, and case-component segmentation, whose procedure resembles stem segmentation, has the largest granularity.
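As a purely hypothetical illustration of suffix-style segmentation (the example word, the suffix inventory and the `##` join marker are invented for this sketch; the actual Mongolian stem/affix/case segmentation relies on morphological knowledge of the language):

```python
def segment(word, suffixes):
    """Greedily strip the longest matching suffix from the word and mark
    it with '##' so the original word can be restored after translation."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return [word[:-len(suf)], "##" + suf]
    return [word]  # no known suffix: keep the word whole
```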
The present invention tested these three segmentation methods on the corpus separately; the experimental results are shown in Table 1.
Table 1
The experimental results in the table show that all segmentation methods improve translation quality. Stem segmentation raises the BLEU score by 1.02; although segmentation of the case components alone brings little improvement, applying it together with stem segmentation raises the BLEU score by 1.14. The reason affix segmentation does not match stem segmentation is believed to be that it is too fine-grained: sentence length grows considerably after segmentation, and neural machine translation handles long sentences poorly, so the effect suffers. After adding the distributed-representation concept semantic similarity computation, BLEU improves by 5.88. We then randomly selected 65 word pairs and, taking word pair and similarity value as coordinates, plotted the distribution of the similarity values produced by the distributed concept semantic similarity algorithm. As Fig. 9 shows, the continuity of the algorithm's output is good, indicating that the similarity values computed by this algorithm correlate well with manual similarity ratings. Finally, after the two preprocessing steps above, this experiment split the preprocessed data in fixed proportions into training, validation and test sets and used them to train the Transformer model; the BLEU score improves by 10.16, and the training effect is clearly better than that of the RNN.
Claims (10)
1. A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information, characterized in that a Transformer model is used during translation, the Transformer model being a multi-layer encoder-decoder structure that performs positional encoding with trigonometric functions and is constructed on an enhanced multi-head attention mechanism, so that global dependencies between input and output are drawn entirely by the attention mechanism, dispensing with recurrence and convolution.
2. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that, before translation, the data are first preprocessed: the stems, affixes and case-marking components of the words in the Mongolian corpus are segmented and separated to reduce the sparsity of the data, while the linguistic features of Mongolian stems, affixes and case-marking components are identified and incorporated into training.
3. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 2, characterized in that the segmentation and separation comprise fine-grained affix segmentation, coarse-grained stem segmentation and small-scale segmentation of the case-marking components.
4. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that, after the data are preprocessed, the influence of depth, density and semantic overlap on concept semantic similarity is considered comprehensively; a similarity matrix is built by integrating similarity algorithms based on semantic distance and information content; principal component analysis is then performed to convert the similarity matrix into a principal component transformation matrix; the principal component contribution rates are computed and used as weights in a weighted combination to obtain the final concept semantic similarity.
5. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 4, characterized in that the similarity matrix is expressed as

X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)ᵀ, i = 1, 2, 3, …, n

and the final concept semantic similarity is computed as

δ_sim = r1·y_sim1 + r2·y_sim2 + r3·y_sim3 + r4·y_sim4 + r5·y_sim5

where X_sim denotes the similarity matrix, x_i1 denotes D_s, x_i2 denotes K_s, x_i3 denotes Z_s, x_i4 denotes S_s, x_i5 denotes I_s, n is the number of compared concept-word pairs in the set, and x_i = (D_si, K_si, Z_si, S_si, I_si) is one vector in the principal component input sample set, each dimension representing the result of one part of the semantic similarity computation in the comprehensive similarity module: D_si denotes the relation between semantic distance and similarity for the i-th dimension of the vector, K_si the semantic similarity in terms of depth, Z_si the density influence factor of the concept word c, S_si the similarity in terms of semantic overlap, and I_si the similarity in terms of information content; δ_sim denotes the concept semantic similarity, y_sim1, y_sim2, y_sim3, y_sim4, y_sim5 are the principal components extracted by principal component analysis of the similarity matrix X_sim, and r1, r2, r3, r4, r5 denote the contribution rates of the respective principal components.
6. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that the multi-head attention mechanism is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors; the output is computed as a weighted sum of the values, the weight assigned to each value being computed by a compatibility function between the query and the corresponding key.
7. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that
the encoder consists of N identical layers, each with two sublayers: the first is a multi-head attention sublayer and the second a feed-forward sublayer; a residual connection links the input and output of each sublayer, and each sublayer is followed by a layer normalization step to accelerate model convergence;
the decoder consists of N identical layers, each with three sublayers: the first is a masked multi-head attention sublayer that models the target-side sentence generated so far, a mask matrix ensuring during training that each multi-head attention computation attends only to the first t-1 words; the second is a multi-head attention sublayer forming the attention mechanism between encoder and decoder, i.e. looking up the relevant semantic information in the source language; the third is a feed-forward sublayer identical to the one in the encoder; each sublayer again has a residual connection between its input and output, followed by a layer normalization step, likewise to accelerate model convergence.
8. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1 or 7, characterized in that the multi-layer encoder-decoder structure is constructed as follows:
in the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm(·) denotes the layer normalization function, Sublayer(·) is the function implemented by the residually connected sublayer itself based on the multi-head attention mechanism, and x denotes the vector input to the current layer; the Mongolian sentence is converted into vectors with the word2vec technique and used as the input of the first encoder layer; to facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
9. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1 or 7, characterized in that
the feed-forward sublayer of the encoder applies two linear transformations with one ReLU nonlinear activation between them, computed as follows:

FFN(x) = max(0, xW1 + b1)W2 + b2

where x denotes the encoder input information, W1 and b1 are the weight and bias of the first linear transformation, W2 and b2 are the weight and bias of the second, and max(0, ·) is the ReLU nonlinear activation of the encoder layer.
10. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1 or 7, characterized in that the positional encoding with trigonometric functions uses the absolute position as the variable of trigonometric functions, as follows:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

In the formulas, pos is the position and i the dimension; that is, each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π; d_model is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of d_model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811231017.2A CN109492232A (en) | 2018-10-22 | 2018-10-22 | A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109492232A true CN109492232A (en) | 2019-03-19 |
Family
ID=65692441
2018
- 2018-10-22 CN application CN201811231017.2A filed; published as CN109492232A (status: Pending)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105957518A (en) * | 2016-06-16 | 2016-09-21 | 内蒙古大学 | Mongolian large vocabulary continuous speech recognition method |
CN107967262A (en) * | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A neural-network Mongolian-Chinese machine translation method |
CN108681539A (en) * | 2018-05-07 | 2018-10-19 | 内蒙古工业大学 | A Mongolian-Chinese neural machine translation method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
Ashish Vaswani et al.: "Attention Is All You Need", 31st Conference on Neural Information Processing Systems (NIPS 2017) * |
Wang Tong et al.: "An Integrated Concept Semantic Similarity Computation Method in WordNet", Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》) * |
Cited By (72)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110597947B (en) * | 2019-03-20 | 2023-03-28 | 桂林电子科技大学 | Reading understanding system and method based on global and local attention interaction |
CN110597947A (en) * | 2019-03-20 | 2019-12-20 | 桂林电子科技大学 | Reading understanding system and method based on global and local attention interaction |
CN110083826A (en) * | 2019-03-21 | 2019-08-02 | 昆明理工大学 | A Lao-Chinese bilingual alignment method based on the Transformer model |
CN112037776A (en) * | 2019-05-16 | 2020-12-04 | 武汉Tcl集团工业研究院有限公司 | Voice recognition method, voice recognition device and terminal equipment |
CN110196946A (en) * | 2019-05-29 | 2019-09-03 | 华南理工大学 | Personalized recommendation method based on deep learning |
CN110196946B (en) * | 2019-05-29 | 2021-03-30 | 华南理工大学 | Personalized recommendation method based on deep learning |
CN110349676A (en) * | 2019-06-14 | 2019-10-18 | 华南师范大学 | Time-series physiological data classification method, device, storage medium and processor |
CN110349676B (en) * | 2019-06-14 | 2021-10-29 | 华南师范大学 | Time-series physiological data classification method and device, storage medium and processor |
WO2020253060A1 (en) * | 2019-06-17 | 2020-12-24 | 平安科技(深圳)有限公司 | Speech recognition method, model training method, apparatus and device, and storage medium |
CN110297887A (en) * | 2019-06-26 | 2019-10-01 | 山东大学 | Service robot personalization conversational system and method based on cloud platform |
CN110619034A (en) * | 2019-06-27 | 2019-12-27 | 中山大学 | Text keyword generation method based on Transformer model |
CN110321962B (en) * | 2019-07-09 | 2021-10-08 | 北京金山数字娱乐科技有限公司 | Data processing method and device |
CN110321962A (en) * | 2019-07-09 | 2019-10-11 | 北京金山数字娱乐科技有限公司 | Data processing method and device |
CN110321961A (en) * | 2019-07-09 | 2019-10-11 | 北京金山数字娱乐科技有限公司 | Data processing method and device |
CN110427493B (en) * | 2019-07-11 | 2022-04-08 | 新华三大数据技术有限公司 | Electronic medical record processing method, model training method and related device |
CN110427493A (en) * | 2019-07-11 | 2019-11-08 | 新华三大数据技术有限公司 | Electronic medical record processing method, model training method and related device |
CN110390340A (en) * | 2019-07-18 | 2019-10-29 | 暗物智能科技(广州)有限公司 | The training method and detection method of feature coding model, vision relationship detection model |
CN111488742A (en) * | 2019-08-19 | 2020-08-04 | 北京京东尚科信息技术有限公司 | Method and device for translation |
CN111488742B (en) * | 2019-08-19 | 2021-06-29 | 北京京东尚科信息技术有限公司 | Method and device for translation |
CN110704587A (en) * | 2019-08-22 | 2020-01-17 | 平安科技(深圳)有限公司 | Text answer searching method and device |
CN110704587B (en) * | 2019-08-22 | 2023-10-20 | 平安科技(深圳)有限公司 | Text answer searching method and device |
CN110598221A (en) * | 2019-08-29 | 2019-12-20 | 内蒙古工业大学 | Method for improving Mongolian-Chinese translation quality by constructing a Mongolian-Chinese parallel corpus with a generative adversarial network |
CN110543551A (en) * | 2019-09-04 | 2019-12-06 | 北京香侬慧语科技有限责任公司 | question and statement processing method and device |
CN110543551B (en) * | 2019-09-04 | 2022-11-08 | 北京香侬慧语科技有限责任公司 | Question and statement processing method and device |
CN110717343B (en) * | 2019-09-27 | 2023-03-14 | 电子科技大学 | Optimal alignment method based on transformer attention mechanism output |
CN110674647A (en) * | 2019-09-27 | 2020-01-10 | 电子科技大学 | Layer fusion method based on Transformer model and computer equipment |
CN110717343A (en) * | 2019-09-27 | 2020-01-21 | 电子科技大学 | Optimal alignment method based on transformer attention mechanism output |
CN110765768A (en) * | 2019-10-16 | 2020-02-07 | 北京工业大学 | Optimized text abstract generation method |
US11556723B2 (en) | 2019-10-24 | 2023-01-17 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Neural network model compression method, corpus translation method and device |
CN110795535A (en) * | 2019-10-28 | 2020-02-14 | 桂林电子科技大学 | Reading understanding method for depth separable convolution residual block |
CN110827219A (en) * | 2019-10-31 | 2020-02-21 | 北京小米智能科技有限公司 | Training method, device and medium of image processing model |
CN110827219B (en) * | 2019-10-31 | 2023-04-07 | 北京小米智能科技有限公司 | Training method, device and medium of image processing model |
CN111105423A (en) * | 2019-12-17 | 2020-05-05 | 北京小白世纪网络科技有限公司 | Deep learning-based kidney segmentation method in CT image |
CN111105423B (en) * | 2019-12-17 | 2021-06-29 | 北京小白世纪网络科技有限公司 | Deep learning-based kidney segmentation method in CT image |
CN111080032A (en) * | 2019-12-30 | 2020-04-28 | 成都数之联科技有限公司 | Load prediction method based on Transformer structure |
CN111080032B (en) * | 2019-12-30 | 2023-08-29 | 成都数之联科技股份有限公司 | Load prediction method based on Transformer structure |
CN111353315A (en) * | 2020-01-21 | 2020-06-30 | 沈阳雅译网络技术有限公司 | Deep neural machine translation system based on random residual algorithm |
CN111353315B (en) * | 2020-01-21 | 2023-04-25 | 沈阳雅译网络技术有限公司 | Deep neural machine translation system based on random residual algorithm |
CN111382583A (en) * | 2020-03-03 | 2020-07-07 | 新疆大学 | Multi-strategy hybrid Chinese-Uyghur name translation system |
CN111428509A (en) * | 2020-03-05 | 2020-07-17 | 北京一览群智数据科技有限责任公司 | Latin letter-based Uygur language processing method and system |
CN111310485B (en) * | 2020-03-12 | 2022-06-21 | 南京大学 | Machine translation method, device and storage medium |
CN111310485A (en) * | 2020-03-12 | 2020-06-19 | 南京大学 | Machine translation method, device and storage medium |
CN111444695B (en) * | 2020-03-25 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Text generation method, device and equipment based on artificial intelligence and storage medium |
CN111444695A (en) * | 2020-03-25 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Text generation method, device and equipment based on artificial intelligence and storage medium |
CN111507328A (en) * | 2020-04-13 | 2020-08-07 | 北京爱咔咔信息技术有限公司 | Text recognition and model training method, system, equipment and readable storage medium |
CN111581987A (en) * | 2020-04-13 | 2020-08-25 | 广州天鹏计算机科技有限公司 | Disease classification code recognition method, device and storage medium |
CN111428443A (en) * | 2020-04-15 | 2020-07-17 | 中国电子科技网络信息安全有限公司 | Entity linking method based on entity context semantic interaction |
CN111428443B (en) * | 2020-04-15 | 2022-09-13 | 中国电子科技网络信息安全有限公司 | Entity linking method based on entity context semantic interaction |
CN111401052A (en) * | 2020-04-24 | 2020-07-10 | 南京莱科智能工程研究院有限公司 | Semantic understanding-based multilingual text matching method and system |
CN111626062B (en) * | 2020-05-29 | 2023-05-30 | 思必驰科技股份有限公司 | Text semantic coding method and system |
CN111626062A (en) * | 2020-05-29 | 2020-09-04 | 苏州思必驰信息科技有限公司 | Text semantic coding method and system |
CN112185104A (en) * | 2020-08-22 | 2021-01-05 | 南京理工大学 | Traffic big data restoration method based on countermeasure autoencoder |
CN112185104B (en) * | 2020-08-22 | 2021-12-10 | 南京理工大学 | Traffic big data restoration method based on countermeasure autoencoder |
CN112084794A (en) * | 2020-09-18 | 2020-12-15 | 西藏大学 | Tibetan-Chinese translation method and device |
CN112507733A (en) * | 2020-11-06 | 2021-03-16 | 昆明理工大学 | Dependency graph network-based Chinese-Vietnamese neural machine translation method |
CN112329760A (en) * | 2020-11-17 | 2021-02-05 | 内蒙古工业大学 | Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network |
CN112329760B (en) * | 2020-11-17 | 2021-12-21 | 内蒙古工业大学 | Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network |
CN112580373A (en) * | 2020-12-26 | 2021-03-30 | 内蒙古工业大学 | High-quality Mongolian unsupervised neural machine translation method |
CN112580373B (en) * | 2020-12-26 | 2023-06-27 | 内蒙古工业大学 | High-quality Mongolian unsupervised neural machine translation method |
CN112947930A (en) * | 2021-01-29 | 2021-06-11 | 南通大学 | Method for automatically generating Python pseudo code based on Transformer |
CN113065432A (en) * | 2021-03-23 | 2021-07-02 | 内蒙古工业大学 | Handwritten Mongolian recognition method based on data enhancement and ECA-Net |
CN113076398B (en) * | 2021-03-30 | 2022-07-29 | 昆明理工大学 | Cross-language information retrieval method based on bilingual dictionary mapping guidance |
CN113076398A (en) * | 2021-03-30 | 2021-07-06 | 昆明理工大学 | Cross-language information retrieval method based on bilingual dictionary mapping guidance |
CN113095091A (en) * | 2021-04-09 | 2021-07-09 | 天津大学 | Chapter machine translation system and method capable of selecting context information |
CN113761841A (en) * | 2021-04-19 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Method for converting text data into acoustic features |
CN113177546A (en) * | 2021-04-30 | 2021-07-27 | 中国科学技术大学 | Target detection method based on sparse attention module |
CN113297841A (en) * | 2021-05-24 | 2021-08-24 | 哈尔滨工业大学 | Neural machine translation method based on pre-training double-word vectors |
CN113255597A (en) * | 2021-06-29 | 2021-08-13 | 南京视察者智能科技有限公司 | Transformer-based behavior analysis method and device and terminal equipment thereof |
CN116186249A (en) * | 2022-10-24 | 2023-05-30 | 数采小博科技发展有限公司 | Item prediction robot for electronic commerce commodity and implementation method thereof |
CN116186249B (en) * | 2022-10-24 | 2023-10-13 | 数采小博科技发展有限公司 | Item prediction robot for electronic commerce commodity and implementation method thereof |
CN117711417A (en) * | 2024-02-05 | 2024-03-15 | 武汉大学 | Voice quality enhancement method and system based on frequency domain self-attention network |
CN117711417B (en) * | 2024-02-05 | 2024-04-30 | 武汉大学 | Voice quality enhancement method and system based on frequency domain self-attention network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492232A (en) | A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information | |
CN110334219B (en) | Knowledge graph representation learning method based on attention mechanism integrated with text semantic features | |
JP7468929B2 (en) | How to acquire geographical knowledge | |
CN106202010B (en) | Method and apparatus for building legal-text syntax trees based on deep neural networks | |
CN111581401B (en) | Local citation recommendation system and method based on depth correlation matching | |
CN111444343B (en) | Cross-border national culture text classification method based on knowledge representation | |
CN109492227A (en) | A machine reading comprehension method based on multi-head attention mechanism and dynamic iteration | |
CN110377686A (en) | A kind of address information Feature Extraction Method based on deep neural network model | |
CN107729311B (en) | Chinese text feature extraction method fusing text moods | |
CN110222140A (en) | A cross-modal retrieval method based on adversarial learning and asymmetric hashing | |
CN107480132A (en) | An image-content-based classical poetry generation method | |
CN106055675B (en) | A relation extraction method based on convolutional neural networks and distant supervision | |
CN106650789A (en) | Image description generation method based on deep LSTM network | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN105938485A (en) | Image description method based on a convolutional-recurrent hybrid model | |
CN103778227A (en) | Method for screening useful images from retrieved images | |
CN108268449A (en) | A text semantic label extraction method based on term clustering | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
CN111881677A (en) | Address matching algorithm based on deep learning model | |
Tang et al. | Deep sequential fusion LSTM network for image description | |
CN111291556A (en) | Chinese entity relation extraction method based on character and word feature fusion using entity senses | |
CN110765755A (en) | Semantic similarity feature extraction method based on double selection gates | |
CN113515632B (en) | Text classification method based on graph path knowledge extraction | |
CN110795565A (en) | Semantic recognition-based alias mining method, device, medium and electronic equipment | |
CN113553440A (en) | Medical entity relationship extraction method based on hierarchical reasoning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190319 |