CN109492232A - Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information - Google Patents

Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information Download PDF

Info

Publication number
CN109492232A
Authority
CN
China
Prior art keywords
sublayer
similarity
semantic
indicate
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811231017.2A
Other languages
Chinese (zh)
Inventor
苏依拉
张振
高芬
王宇飞
孙晓骞
牛向华
赵亚平
卞乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN201811231017.2A priority Critical patent/CN109492232A/en
Publication of CN109492232A publication Critical patent/CN109492232A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

This invention presents a Mongolian-Chinese machine translation method with enhanced semantic feature information based on the Transformer model. First, starting from the linguistic features of Mongolian, the method identifies the characteristics of stems, affixes and case-marking components, and incorporates these linguistic features into the training of the model. Second, against the background of distributed word representations used to measure the degree of similarity between two words, it comprehensively analyzes the influence of depth, density and semantic overlap on concept semantic similarity. During translation the invention uses a Transformer model, a multi-layer encoder-decoder architecture built with trigonometric-function positional encoding and an enhanced multi-head attention mechanism, so that it relies entirely on attention to draw global dependencies between input and output, eliminating recurrence and convolution.

Description

Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information
Technical field
The invention belongs to the field of machine translation technology, and in particular to a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information.
Background art
Mongolian is an agglutinative language belonging to the Altaic family. Written Mongolian includes traditional Mongolian and Cyrillic Mongolian; the "Mongolian" in the Mongolian-Chinese translation system studied here refers to translation from traditional Mongolian into Chinese. Traditional Mongolian is an alphabetic script whose letter forms are not unique: the form of a letter depends on its position in the word, which may be isolated, word-initial, word-medial or word-final. Mongolian words are formed as root + suffix, and suffixes fall into two classes: derivational suffixes, which attach to the root and give the word a new meaning (a root followed by one or more derivational suffixes forms a stem), and inflectional suffixes, which attach to the stem and express grammatical meaning. Mongolian nouns and verbs undergo many changes of tense, number and case, all realized by attaching suffixes, so Mongolian morphology is extremely complex. In addition, Mongolian word order differs greatly from Chinese: the Mongolian verb follows the subject and object and stands at the end of the sentence, whereas in Chinese the verb stands between subject and object.
Compared with one-hot representations, which differ in only a single vector dimension, the distributed representation of a word uses a low-dimensional dense real-valued vector. In this low-dimensional vector space the similarity between two words can conveniently be measured by distance or angle. On the technical side, against the background of statistical language modeling, Google released Word2vec in 2013, a software tool for training word vectors. Given a corpus, Word2vec can quickly and effectively express a word in vector form through an optimized training model, providing a new tool for applied research in natural language processing. Word2vec relies on skip-gram or continuous bag-of-words (CBOW) models to build neural word embeddings. However, Word2vec still has limitations when computing semantic relatedness. On the one hand it uses only the local context of the translation to be generated as the basis for prediction, without exploiting global context, so contextual information is used insufficiently and there is room for improvement in semantic feature extraction. On the other hand, the structure of the framework itself limits parallel computation, so computational efficiency also needs to be improved.
Traditional machine translation systems are mostly based on recurrent neural networks (RNN), long short-term memory (LSTM) or gated recurrent units (GRU). In the past few years these methods have become the state of the art for sequence modeling and transduction problems such as machine translation. However, recurrent models compute along the symbol positions of the input and output sequences. Aligning positions with steps in computation time, they generate a sequence of hidden states h_t at position t as a function of the previous hidden state h_{t-1}. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths because memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks and conditional computation, while also improving model performance in the latter case. The fundamental constraint of sequential computation, however, remains.
The current encoder-decoder framework is the main model for solving sequence-to-sequence problems. The model uses an encoder to compress the source-language sentence into a representation, and a decoder to generate the target-language sentence from that compressed representation. The benefit of this structure is that it models the mapping between the two sentences end to end: all parameters of the model are trained under a single objective function, and model performance is good. Fig. 1 illustrates the structure of the encoder-decoder model as a bottom-up machine translation process.
The encoder and decoder can use neural networks of different structures, such as RNNs or CNNs. An RNN compresses the sequence step by step along time. When an RNN is used, a bidirectional RNN structure is generally adopted: one RNN compresses the sequence from left to right, another from right to left, and the two representations are concatenated as the final distributed representation of the sequence. In this structure, because the elements of the sequence are processed in order, the interaction distance between two words can be regarded as their relative distance. As sentences grow longer and relative distances increase, there is a clear theoretical limit on how well information can be processed.
When a CNN structure is used, a multi-layer structure is generally adopted to move from local to global representations of the sequence. Modeling a sentence with an RNN takes a temporal view; modeling it with a CNN takes a structural view. Sequence-to-sequence models built on RNNs include RNNSearch and GNMT; sequence-to-sequence models built on CNNs include ConvS2S, which embodies a local-to-global feature extraction process in which the interaction distance between words is proportional to their distance in the sentence: words far apart can only meet at higher CNN nodes before interacting, and this process can lose more information.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information. The system is based entirely on the attention mechanism and completely eliminates recurrence and convolution. Experiments show that the system is superior in quality and easier to parallelize, requires less time to train, and reaches 45.4 BLEU on a translation task over a 1.2-million-sentence Mongolian-Chinese parallel corpus, achieving higher translation quality.
To achieve the above goal, the technical solution adopted by the present invention is: a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information, characterized in that a Transformer model is used during translation. The Transformer model is a multi-layer encoder-decoder architecture built with trigonometric-function positional encoding and an enhanced multi-head attention mechanism, so that it relies entirely on attention to draw global dependencies between input and output, eliminating recurrence and convolution.
Before translation, the data is first preprocessed so that the deep neural network can better extract features. Preprocessing means segmenting the stems, affixes and case-marking components in the Mongolian corpus to reduce data sparsity, while the Chinese side is segmented into characters; the linguistic features of Mongolian stems, affixes and case-marking components are identified and incorporated into training.
The segmentation includes fine-grained affix segmentation, coarse-grained stem segmentation, and small-scale segmentation of case-marking components.
After the data has been preprocessed, the influence of depth, density and semantic overlap on concept semantic similarity is combined with similarity algorithms based on semantic distance and information content to build a similarity matrix. Principal component analysis is then applied: the similarity matrix is transformed into a principal-component matrix, the contribution rate of each principal component is computed and used as a weight, and the weighted result gives the final concept semantic similarity.
The similarity matrix is expressed as
X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)^T, i = 1, 2, 3, ..., n
and the final concept semantic similarity is computed as
δ_sim = r_1·y_sim1 + r_2·y_sim2 + r_3·y_sim3 + r_4·y_sim4 + r_5·y_sim5
where X_sim denotes the similarity matrix; x_i1 denotes Ds_i, x_i2 denotes Ks_i, x_i3 denotes Zs_i, x_i4 denotes Ss_i, x_i5 denotes Is_i; n is the number of concept-word pairs in the comparison set; x_i = (Ds_i, Ks_i, Zs_i, Ss_i, Is_i) is a vector in the principal-component input sample set, where each dimension represents the result of one partial similarity computation in the combined similarity module: Ds_i denotes the relationship between the semantic distance and similarity of the i-th element, Ks_i the semantic similarity in terms of depth, Zs_i the density impact factor of the concept word c, Ss_i the similarity in terms of semantic overlap, and Is_i the similarity in terms of information content; δ_sim denotes the concept semantic similarity; y_sim1, y_sim2, y_sim3, y_sim4, y_sim5 are the principal components extracted by principal component analysis of the similarity matrix X_sim, and r_1, r_2, r_3, r_4, r_5 are the contribution rates of the respective principal components.
The multi-head attention mechanism is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
The encoder consists of N identical layers, each with two sublayers: the first is a multi-head attention sublayer and the second is a feed-forward sublayer. The input and output of each sublayer are connected by a residual connection, and each sublayer is followed by a layer-normalization step to speed up model convergence;
The decoder consists of N identical layers, each with three sublayers. The first is a masked multi-head attention sublayer for modeling the target-side sentence generated so far; during training a mask matrix ensures that each multi-head attention computation only attends to the first t-1 words. The second is a multi-head attention sublayer over the encoder output, i.e. the attention mechanism between encoder and decoder, which looks up the relevant semantic information in the source language. The third is a feed-forward sublayer identical to the one in the encoder. The input and output of each sublayer are connected by a residual connection and followed by a layer-normalization step to speed up model convergence.
The multi-layer encoder-decoder architecture is constructed as follows:
In the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm() denotes the layer-normalization function, Sublayer() is the function implemented by the residually connected sublayer itself based on multi-head attention, and x denotes the vector input to the current layer. The Mongolian sentence is converted into vectors with word2vec and used as input to the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
The feed-forward sublayer of the encoder is implemented with two linear transformations and one ReLU non-linear activation, computed as follows:
FFN(x) = γ(0, xW_1 + b_1)W_2 + b_2
where x denotes the encoder input information, W_1 the weight of the input vector, b_1 the bias factor of the multi-head attention mechanism, γ(0, xW_1 + b_1) the input of the feed-forward sublayer after the ReLU activation, W_2 the weight of that vector, b_2 the bias factor of the feed-forward function, and γ the non-linear activation function of the encoder layer.
The positional encoding uses trigonometric functions, taking the absolute position as the variable of the trigonometric function:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position and i the dimension, i.e. each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π; d_model is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of d_model.
Compared with the prior art, the advantages of the present invention are as follows:
1. The present invention uses a Transformer-based sequence modeling method. The sequence-to-sequence model still follows the classical encoder-decoder structure, but instead of RNNs or CNNs it uses multi-head attention as the sequence modeling mechanism, so that long-distance dependency information is captured more easily.
2. The present invention segments the stems, affixes and case-marking components in the Mongolian corpus. Case-marking components are a special kind of affix in Mongolian; unlike ordinary affixes, they express only grammatical meaning and carry no semantic content. Segmenting the case-marking components in the corpus reduces data sparsity on the one hand, and better preserves Mongolian stem information on the other.
3. For the severe data sparsity caused by Mongolian word formation, the present invention proposes three segmentation schemes of different granularity: fine-grained affix segmentation, coarse-grained stem segmentation, and small-scale segmentation of case-marking components. Experiments show that combining stem segmentation with case-component segmentation improves translation quality the most.
4. Against the background of distributed representations used to measure the similarity between two words, the present invention analyzes the influence of depth, density and semantic overlap on concept semantic similarity, integrates the traditional semantic-distance and information-content similarity algorithms to build a similarity matrix, applies principal component analysis to transform the original similarity matrix into a new principal-component matrix, computes the principal component contribution rates, uses them as weights, and obtains the final concept semantic similarity.
Description of the drawings
Fig. 1 is the Transformer-based Mongolian-Chinese machine translation framework of the present invention.
Fig. 2 is the model diagram of sequence modeling based on multi-head attention of the present invention.
Fig. 3 is the "soft" attention model diagram of the present invention.
Fig. 4 is the multi-head attention model diagram of the present invention.
Fig. 5 is the morpheme segmentation flowchart of the present invention.
Fig. 6 is the computation model of the weights in the multi-head attention mechanism of the present invention.
Fig. 7 is a schematic diagram of sequence modeling with a bidirectional RNN in the present invention.
Fig. 8 is a schematic diagram of sequence modeling with a multi-layer CNN in the present invention.
Fig. 9 is the similarity distribution of 65 randomly selected word pairs under the distributed algorithm for combined concept semantic similarity of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.
In the Transformer-based Mongolian-Chinese machine translation method of the present invention, the Mongolian corpus is first preprocessed. Then, against the background of the word-vector models of word2vec, the influence of depth, density and semantic overlap on concept semantic similarity is combined with similarity algorithms based on semantic distance and information content to build a similarity matrix; principal component analysis is applied, the similarity matrix is transformed into a principal-component matrix, the contribution rates are computed and used as weights, and the final concept semantic similarity is obtained. Finally, a Transformer model is used during translation, relying entirely on attention to draw global dependencies between input and output and eliminating recurrence and convolution, where the Transformer model is a multi-layer encoder-decoder architecture built with trigonometric-function positional encoding and an enhanced multi-head attention mechanism.
Mongolian corpus preprocessing: dictionary-based morpheme segmentation. To perform segmentation, a dictionary of the Mongolian corpus is first generated with the word-frequency statistics tool OpenNMT.dict. After the dictionary is generated, the stems in it are collected into a stem table; the parts other than the stem table form the affix table. Based on the stem table and affix table, a reverse maximum matching algorithm segments each Mongolian word into morphemes; the segmentation procedure is shown in Fig. 5. For each Mongolian word to be processed, all dictionary records are matched one by one; if the word contains a record, it is segmented so that the case-marking component is detached, and the word is finally separated into two parts: the case-marking component as one part and the remainder as the other.
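The reverse maximum matching step can be sketched as follows. This is a minimal illustration rather than the patented implementation: the dictionary contents, the Latin-transliterated word form and the maximum window length are made-up placeholders.

```python
# A minimal sketch of reverse maximum matching over a stem/affix dictionary;
# the vocabulary entries and the example word are hypothetical placeholders.
def reverse_max_match(word, vocab, max_len=8):
    """Segment `word` from right to left, always taking the longest
    suffix substring that appears in `vocab`."""
    pieces = []
    end = len(word)
    while end > 0:
        start = max(0, end - max_len)
        # shrink the window from the left until a dictionary entry matches
        while start < end - 1 and word[start:end] not in vocab:
            start += 1
        pieces.append(word[start:end])
        end = start
    return list(reversed(pieces))

# hypothetical dictionary built from the corpus (stems + affixes + case markers)
vocab = {"yabu", "na", "bar", "ogei"}
print(reverse_max_match("yabuna", vocab))   # -> ['yabu', 'na']
```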
After the Mongolian-Chinese bilingual corpus is processed with a unified encoding, a bilingual dictionary is built on this basis. The modeling of the Transformer model comprises building the multi-layer encoder-decoder structure, positional encoding with trigonometric functions, and model construction based on the enhanced multi-head attention mechanism; the training optimization method and regularization strategy of the model are also improved.
For the information-content-based algorithm, the present invention analyzes the distributed representations of words and finds that the more sub-concepts a concept contains, the less information content it carries, and gives a model for computing the information content I of a distributed word representation:
where h(c) denotes the number of all sub-concept nodes of the concept-word node c, and max_wn is a constant denoting the total number of concept nodes in the semantic classification tree.
The present invention proposes an aggregated weighting method based on principal component analysis: principal component analysis is introduced into the weight computation, and the contribution rate of each principal component is used as the weight in the aggregated weighted similarity computation. This alleviates the curse of dimensionality and the gradient explosion problem and helps the model converge quickly.
The aggregated weighting algorithm based on principal component analysis in the present invention consists of three parts: multi-angle similarity computation, similarity matrix extraction, and weight computation based on principal component analysis.
Part 1: multi-angle similarity computation
Semantic similarity is analyzed from the perspectives of semantic distance, depth and density, semantic overlap, and information content, and the calculation formula of each partial semantic similarity is given.
(1) Semantic distance
The relationship between semantic distance and similarity is expressed as
where c_1 and c_2 are the distributed vectors of the two concepts being compared; a is an adjustable parameter, here taken as the average semantic distance of the concept pairs in the comparison set; and D(c_1, c_2) is the semantic distance between the two concepts, i.e. the shortest path between c_1 and c_2.
(2) Depth and density
The higher the level of a node in the semantic tree, the more abstract the concept word it represents; the lower the level, the more concrete the concept word. If the maximum depths of the semantic trees containing the compared concept words c_1 and c_2 are K_max(c_1) and K_max(c_2), and the node depths of c_1 and c_2 are K(c_1) and K(c_2), then the semantic similarity in terms of depth is computed as
In a semantic hierarchy tree, the denser a local region, the more finely that region divides the concepts, and the larger the semantic similarity between the concept words in the region. The density impact factor of the concept word c is
where n(c) is the number of direct descendants of the concept-word node c, and n(O) is the maximum number of direct descendants over all nodes of the sub-semantic-tree O containing c. Based on this, the formula for the semantic similarity of the compared concept words c_1 and c_2 in terms of density is obtained.
(3) Semantic overlap
Let the root node of the semantic hierarchy tree be R, and let c_1 and c_2 be any two concept-word nodes. S(c_1) is the number of nodes on the path from c_1 to the root node R, and S(c_2) is the number of nodes on the path from c_2 to R. S(c_1) ∩ S(c_2) denotes the set of nodes passed through jointly by c_1 and c_2 on the way to R (the intersection), and S(c_1) ∪ S(c_2) denotes the union of the node set from c_1 to R and the node set from c_2 to R. The similarity in terms of semantic overlap is then expressed as
(4) Information content
To define the similarity in terms of information content, the following algorithm is proposed to compute the I value. The calculation formula is
where c_1 and c_2 denote the distributed vectors of the two concepts being compared, I(c_1) denotes the sum of the vector dimensions of all child nodes with the concept vector c_1 as parent node, and I(c_2) denotes the sum of the vector dimensions of all child nodes with the concept vector c_2 as parent node.
Part 2: similarity matrix extraction
Suppose there are n pairs of concept words in the comparison set, and let x_i = (Ds_i, Ks_i, Zs_i, Ss_i, Is_i) be a vector in the principal-component input sample set, where each dimension represents the result of one partial similarity computation in the combined similarity module: Ds_i denotes the relationship between the semantic distance and similarity of the i-th element, Ks_i the semantic similarity in terms of depth, Zs_i the density impact factor of the concept word c, Ss_i the similarity in terms of semantic overlap, and Is_i the similarity in terms of information content.
The similarity matrix is then expressed as
X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)^T, i = 1, 2, 3, ..., n
Part 3: weight computation based on principal component analysis
Principal component analysis is a multivariate statistical method that converts multiple indicators into a few comprehensive indicators while losing little information. The comprehensive indicators are called principal components; each principal component is a linear combination of the original variables, and the principal components are mutually uncorrelated, which gives them certain advantages over the original variables. In principal component analysis the weight of each principal component is assigned according to its contribution rate rather than determined subjectively, which overcomes the defect of subjectively assigned weights in multivariate analysis and makes the result objective and reasonable.
Principal component analysis of the constructed similarity matrix X_sim extracts the principal components
Y = (y_sim1, y_sim2, y_sim3, y_sim4, y_sim5)
With the contribution rates of the principal components (r_1, r_2, r_3, r_4, r_5), the final concept semantic similarity is computed as
δ_sim = r_1·y_sim1 + r_2·y_sim2 + r_3·y_sim3 + r_4·y_sim4 + r_5·y_sim5
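The PCA-based aggregated weighting can be sketched in a few lines of numpy. This is a minimal sketch under assumptions: the five per-aspect similarity scores (Ds, Ks, Zs, Ss, Is) are taken as already computed, and the numbers below are invented stand-ins, not values from the patent's experiments.

```python
import numpy as np

# rows = concept pairs, columns = (Ds, Ks, Zs, Ss, Is); values are made up
X = np.array([
    [0.82, 0.75, 0.60, 0.70, 0.66],
    [0.40, 0.55, 0.35, 0.45, 0.50],
    [0.91, 0.80, 0.72, 0.85, 0.78],
    [0.55, 0.60, 0.48, 0.52, 0.57],
])

Xc = X - X.mean(axis=0)                      # center each column
cov = np.cov(Xc, rowvar=False)               # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition (ascending)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Y = Xc @ eigvecs                             # principal-component scores y_sim1..y_sim5
r = eigvals / eigvals.sum()                  # contribution rates r1..r5 used as weights

delta_sim = Y @ r                            # weighted combination per concept pair
print(delta_sim)
```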
The multi-layer encoder-decoder structure is constructed as follows:
In the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm() denotes the layer-normalization function, Sublayer() is the function implemented by the residually connected sublayer itself based on multi-head attention, and x denotes the vector input to the current layer. The Mongolian sentence is converted into vectors with word2vec and used as input to the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
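A minimal numpy sketch of the LayerNorm(x + Sublayer(x)) wrapper follows; the sublayer itself is stubbed out, and only the dimension d_model = 512 is taken from the description.

```python
import numpy as np

d_model = 512

def layer_norm(x, eps=1e-6):
    # normalize each position's vector to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def residual_block(x, sublayer):
    # residual connection followed by layer normalization
    return layer_norm(x + sublayer(x))

x = np.random.randn(10, d_model)             # 10 positions of a sentence
out = residual_block(x, lambda h: 0.1 * h)   # stand-in for an attention / FFN sublayer
print(out.shape)                             # (10, 512)
```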
Fig. 1 illustrates one encoder layer and one decoder layer of the Transformer structure.
With reference to Fig. 1, the Nx on the left represents one encoder layer, which contains two sublayers: the first is a multi-head attention sublayer and the second a feed-forward sublayer. The input and output of each sublayer are connected by a residual connection, which in theory allows gradients to flow back well. Each sublayer is followed by a layer-normalization step, whose use speeds up the convergence of the model. The computation of the multi-head attention sublayer is discussed in detail in the model construction for the enhanced multi-head attention mechanism. The feed-forward sublayer is implemented with two linear transformations and one ReLU non-linear activation, computed as follows:
FFN(x) = γ(0, xW_1 + b_1)W_2 + b_2
where x denotes the encoder input information, W_1 the weight of the input vector, b_1 the bias factor of the multi-head attention mechanism, γ(0, xW_1 + b_1) the input of the feed-forward sublayer after the ReLU activation, W_2 the weight of that vector, b_2 the bias factor of the feed-forward function, and γ the non-linear activation function of the encoder layer. The encoder input information is the vector obtained after adding the positional encoding to the embedding-layer information; the input of the feed-forward sublayer is the output of the first encoder sublayer.
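The feed-forward sublayer can be sketched as follows. The inner dimension d_ff = 2048 is the value used in the original Transformer and is an assumption here, since the text does not state it; the weights are random stand-ins.

```python
import numpy as np

# Position-wise feed-forward sublayer: FFN(x) = ReLU(x W1 + b1) W2 + b2
d_model, d_ff = 512, 2048
W1, b1 = np.random.randn(d_model, d_ff) * 0.02, np.zeros(d_ff)
W2, b2 = np.random.randn(d_ff, d_model) * 0.02, np.zeros(d_model)

def ffn(x):
    hidden = np.maximum(0.0, x @ W1 + b1)    # gamma(0, xW1 + b1): the ReLU activation
    return hidden @ W2 + b2

x = np.random.randn(10, d_model)             # 10 positions, each a d_model vector
print(ffn(x).shape)                          # (10, 512)
```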
With reference to Fig. 1, the Nx on the right represents one decoder layer, which contains three sublayers. The first is a masked multi-head attention sublayer for modeling the target-side sentence generated so far; during training a mask matrix is needed so that each multi-head attention computation only attends to the first t-1 words. The second is a multi-head attention sublayer, the attention mechanism between encoder and decoder, which looks up the relevant semantic information in the source language; this part is computed in the same way as the attention of other sequence-to-sequence models, and Transformer uses the dot-product form. The third is a feed-forward sublayer, identical to the feed-forward sublayer in the encoder. Each sublayer again has residual connections and layer-normalization operations to speed up model convergence.
The positional encoding method of the present invention using trigonometric functions is as follows:
The way multi-head attention models a sequence has neither the sequential character of an RNN nor the structural character of a CNN, but rather the character of a bag of words. Put another way, the mechanism treats a sequence as a flat structure: no matter how far apart two words are, their distance in multi-head attention is 1. Such a modeling approach actually loses the relative-distance relationships between words. For example, three sentences such as "the ox ate the grass", "the grass ate the ox" and "ate the ox the grass", which permute the same words, would be modeled with identical representations for each word.
To alleviate this problem, in Transformer the present invention maps the position of each word in the sentence to a vector and adds it to the embedding layer. This idea is not new: CNN models also have difficulty modeling relative position (temporal information), and Facebook proposed a positional encoding method for them. One direct approach is to model absolute position directly in the embedding layer, i.e. to map the position i of word W_i to a vector and add it to the embedding layer, but the drawback of this approach is that it can only model sequences of finite length.
The present invention uses a new way of modeling temporal information: the periodicity of trigonometric functions is used to model the relative positional relationships between words. Specifically, the absolute position is used as the variable of the trigonometric functions:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position and i the dimension; that is, each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π. The present invention chose this function because it allows the model to learn relative positions easily: for any fixed offset k, PE_(pos+k) can be expressed as a linear function of PE_pos. d_model is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of d_model.
Trigonometric functions have good periodicity: at fixed intervals the value of the dependent variable repeats, a property that can be used to model relative distance. On the other hand, the range of trigonometric functions is [-1, 1], which provides suitable values for the embedding-layer elements.
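A minimal sketch of the sinusoidal positional encoding described above, assuming the standard sin/cos form with wavelengths from 2π to 10000·2π:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # even dimension 2i uses sin(pos / 10000^(2i/d_model)); odd dimension 2i+1 uses cos
    pos = np.arange(max_len)[:, None]                       # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]                   # (1, d_model/2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(max_len=50, d_model=512)
print(pe.shape)                                             # (50, 512)
# the encoding is added to the embedding layer before the first encoder layer
```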
The model construction method based on the enhanced multi-head attention mechanism is as follows:
Fig. 2 illustrates the sequence modeling method based on multi-head attention. Note that, for clarity, some connecting lines are omitted in the figure: every word in the "source-language sentence vector" layer (i.e. the source-language morpheme vectors) is fully connected to every node in the first multi-head attention layer, and the nodes between the first and second multi-head attention layers are also fully connected. It can be seen that in this modeling approach the interaction distance between any two words is 1, regardless of their relative distance. In this mode the semantics of each word is determined by considering its relationship with all words in the whole sentence. Multi-head attention makes this global interaction more complex and able to capture more information.
In summary, multi-head attention can capture long-distance dependency knowledge when modeling sequence problems and has a better theoretical basis.
The mathematical formulation of multi-head attention is described below, starting from the attention mechanism itself.
1. Attention mechanism (model)
When a neural network processes a large amount of input information, it can borrow the attention mechanism of the human brain and select only some key information for processing, improving the efficiency of the neural network. In current neural network models, max pooling and gating mechanisms can be regarded approximately as bottom-up, saliency-based attention. In addition, top-down convergent attention is also an effective way of selecting information. Take reading comprehension as an example: given a long article, a question is asked about its content. The question may be related to only one or two sentences in a paragraph, and the rest is irrelevant. To reduce the computational burden of the neural network, only the relevant passages need to be picked out for subsequent processing, without feeding the whole article into the network.
Let x_{1:N} = [x_1, ..., x_N] denote N pieces of input information. To save computing resources, not all N inputs need to be fed into the neural network; only the information relevant to the task need be selected from x_{1:N}. Given a task-related query vector q, the attention variable z ∈ [1, N] denotes the index of the selected information, i.e. z = i means the i-th input is selected. For convenience of computation, a "soft" information selection mechanism is adopted: first, given q and x_{1:N}, the probability α_i of selecting the i-th input is computed,
where s(x_i, q) is a scoring function that can be computed in the following three ways:
Additive model: s(x_i, q) = v^T tanh(W x_i + U q)
Dot-product model: s(x_i, q) = x_i^T q
Multiplicative model:
where W, U and v are learnable network parameters and T denotes matrix transposition.
The attention distribution α_i can be interpreted as the degree to which the i-th piece of information is attended to under the query q. The "soft" information selection mechanism then encodes the input information as a weighted sum of the inputs under this attention distribution.
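The "soft" selection described above can be sketched with the dot-product scoring function s(x_i, q) = x_i^T q; the input vectors and query below are random stand-ins for illustration.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

N, d = 6, 8
X = np.random.randn(N, d)        # input information x_1 ... x_N
q = np.random.randn(d)           # task-related query vector

alpha = softmax(X @ q)           # attention distribution alpha_i over the N inputs
context = alpha @ X              # soft selection: weighted sum of the inputs
print(alpha.round(3), context.shape)
```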
Fig. 3 gives an example of the "soft" attention mechanism.
2. Variants of the attention mechanism
2.1 Key-value pair attention
More generally, the input information can be represented in key-value pair format, where the "key" K is used to compute the attention distribution α_i and the "value" V is used to generate the selected information. With (k, v)_{1:N} = [(k_1, v_1), ..., (k_N, v_N)] denoting the N pieces of input information, the attention function for a given task-related query vector q is
where s(k_i, q) denotes the scoring function.
Fig. 4 gives an example of the key-value pair attention mechanism. If k_i = v_i in the key-value pair mode, it is equivalent to the ordinary attention mechanism.
2.2 Scaled dot-product attention
The scaled dot-product attention algorithm is described in terms of key-value pairs K-V and a query vector q in a very abstract way. Here we assume that the "key" K and the "value" V in the key-value pair correspond to the same vector, i.e. K = V, as shown in Fig. 6; the query vector q corresponds to the word vectors of the target sentence.
The operation has three steps:
1. Each query vector q makes a dot product with the "keys" K.
2. The results are normalized with softmax so that the weights stay in the probability range [0, 1].
3. The weights are finally multiplied by the "values" V to obtain the attention vector.
The mathematical expression is as follows,
where the scaling factor is 1/√d_k (d_k being the key dimension) and T denotes matrix transposition.
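A minimal sketch of scaled dot-product attention under the K = V assumption made above; the shapes and random inputs are illustrative only.

```python
import numpy as np

def softmax(s, axis=-1):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # step 1: dot products, scaled by 1/sqrt(d_k)
    weights = softmax(scores, axis=-1)       # step 2: normalize weights into [0, 1]
    return weights @ V                       # step 3: weighted sum of the values

Q = np.random.randn(4, 64)                   # 4 target-side positions (queries)
K = V = np.random.randn(6, 64)               # 6 source-side positions (keys = values)
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 64)
```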
2.3 Multi-head attention
Multi-head attention uses multiple queries q_{1:M} = {q_1, ..., q_M} to select multiple pieces of information from the input in parallel; each attention head attends to a different part of the input information.
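Multi-head attention can be sketched by running several independently projected scaled dot-product attentions in parallel and concatenating the heads; h = 8 and the random projection matrices are assumptions for illustration, not values taken from the patent.

```python
import numpy as np

def multi_head_attention(Q, K, V, h=8):
    d_model = Q.shape[-1]
    d_k = d_model // h
    heads = []
    for _ in range(h):
        # each head uses its own learned projections; random stand-ins here
        Wq, Wk, Wv = (np.random.randn(d_model, d_k) * 0.02 for _ in range(3))
        q, k, v = Q @ Wq, K @ Wk, V @ Wv
        scores = q @ k.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        heads.append(weights @ v)            # each head attends to a different part
    Wo = np.random.randn(d_model, d_model) * 0.02
    return np.concatenate(heads, axis=-1) @ Wo   # concatenate heads, project back

Q = np.random.randn(4, 512)
K = V = np.random.randn(6, 512)
print(multi_head_attention(Q, K, V).shape)   # (4, 512)
```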
The present invention improves the training optimization method and regularization strategy of the model as follows:
The model is trained with the Adam method, and the present invention adopts a warm-up learning-rate schedule, as shown in the formula:
The formula means that a hyperparameter warmup_steps must be set in advance.
A. When the training step step_num is smaller than this value, the learning rate is determined by the second term in the brackets, a linear function of step_num with positive slope.
B. When the training step step_num is greater than warmup_steps, the learning rate is determined by the first term in the brackets, a power function with a negative exponent.
Overall, therefore, the learning rate first rises and then decays, which is conducive to fast model convergence.
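The warm-up schedule can be sketched as below. The exact formula is not reproduced in this text, so the sketch assumes the original Transformer form lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5), which matches the rising-then-decaying behaviour described in A and B; warmup_steps = 4000 is an assumed value.

```python
def learning_rate(step_num, d_model=512, warmup_steps=4000):
    rising = step_num * warmup_steps ** -1.5      # linear term, dominant while step_num < warmup_steps
    decaying = step_num ** -0.5                   # negative-exponent power term, dominant afterwards
    return d_model ** -0.5 * min(rising, decaying)

for step in (100, 4000, 20000):
    print(step, round(learning_rate(step), 6))
```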
Two important regularization methods are also used in the model. One is the common dropout method, applied after each sublayer and in the attention computation. The other is label smoothing: during training, when the cross entropy is computed, the target is no longer a one-hot reference answer; instead, each zero entry is filled with a small non-zero value. This enhances the robustness of the model and raises its BLEU score.
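Label smoothing as described, filling each zero of the one-hot target with a small non-zero value, can be sketched as follows; the smoothing value epsilon = 0.1 is an assumption for illustration.

```python
import numpy as np

def smooth_labels(target_index, vocab_size, epsilon=0.1):
    # spread epsilon over the non-target entries and keep 1 - epsilon on the target
    dist = np.full(vocab_size, epsilon / (vocab_size - 1))
    dist[target_index] = 1.0 - epsilon
    return dist

print(smooth_labels(target_index=2, vocab_size=5))
# cross entropy is then computed against this smoothed distribution instead of one-hot
```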
In summary, the Transformer-based sequence modeling method of the present invention still follows the classical encoder-decoder structure of sequence-to-sequence models, but differs in that it uses multi-head attention rather than an RNN or CNN as the sequence modeling mechanism. The theoretical advantage of multi-head attention is that it captures "long-distance dependency information" more easily. "Long-distance dependency information" can be understood as follows: 1) a word is a symbol that can express diverse semantic information (the ambiguity problem); 2) the meaning of a word is determined by the context in which it appears (disambiguation by context); 3) some words need only a small context window to determine their meaning (short-distance dependency), while others need a large context window (long-distance dependency).
For example, consider the following two sentences:
"There are many dujuan on the mountain; when spring comes they bloom all over the hills, and they are very beautiful."
"There are many dujuan on the mountain; when spring comes they call all over the hills, and the sound carries far."
In these two sentences the Chinese word "杜鹃" (dujuan) refers to the azalea flower and to the cuckoo bird respectively. In machine translation, without looking at words far away from it, this word is difficult to translate correctly. This is an obvious example in which the long-distance dependency between words can clearly be seen. Of course, most word meanings can be determined within a small contextual window, so cases like the example above account for a relatively small proportion of language. What we want is a model that can learn short-distance dependency knowledge well and can also learn long-distance dependencies.
The multi-head attention mechanism in the Transformer of the present invention can, in theory, better capture such long- and short-distance dependency knowledge. Below, the three sequence modeling methods based on RNN, CNN and Transformer are compared in terms of the interaction distance between any two words.
Fig. 7 shows sequence modeling with a bidirectional RNN. Because the elements of the sequence are processed in order, the interaction distance between two words can be regarded as their relative distance: the interaction distance between W1 and Wn is n-1. In theory an RNN model with gating can selectively store and forget historical information and performs better than a plain RNN structure, but with a fixed number of gating parameters this ability is limited. As sentences grow longer and relative distances increase, there is a clear theoretical upper limit.
Fig. 8 illustrates sequence modeling with a multi-layer CNN. The CNN units of the first layer cover a small semantic context; the second layer covers a larger one, and so on: the deeper the CNN unit, the larger the context it covers. A word first interacts with nearby words on the bottom CNN units, and then with more distant words on higher-level CNN units. The multi-layer CNN structure therefore embodies a local-to-global feature extraction process in which the interaction distance between words is proportional to their distance in the sentence. Words far apart can only meet on higher CNN nodes before interacting, and this process can lose more information.
Fig. 2 shows that the multi-head attention based sequence modeling method of the present invention is clearly superior to these two approaches and can capture more information.
A specific Mongolian-Chinese translation example follows.
Experiments use a 1.2-million-sentence Mongolian-Chinese parallel corpus as the data set to verify the effect of the invention.
To address the severe data sparsity in the Mongolian corpus, three processing schemes are applied: affix segmentation, stem segmentation and segmentation of case-marking components. The granularity of affix segmentation is small, the granularity of stem segmentation is relatively large, and the segmentation of case-marking components is similar to stem segmentation but with an even larger granularity.
The present invention tests these three segmentation methods on the corpus separately; the experimental results are shown in Table 1.
Table 1
The experimental results in the table show that all segmentation methods improve translation quality. Stem segmentation raises the BLEU value by 1.02; although the improvement from case-component segmentation alone is not obvious, when it acts together with stem segmentation the BLEU gain reaches 1.14. The reason affix segmentation does worse than stem segmentation, we believe, is mainly that affix segmentation is too fine-grained, so the sentence length after segmentation grows considerably, and neural machine translation handles long sentences poorly, so the effect suffers. After adding the distributed-representation-based combined concept semantic similarity computation, BLEU improves by 5.88. We then randomly select 65 word pairs and set up a coordinate system with the word pairs and similarity values as coordinates to analyze the distribution of the similarity values computed by the distributed combined concept semantic similarity algorithm. As can be seen from Fig. 9, the continuity of the values obtained by this algorithm is fairly good, indicating that the similarity values computed by the algorithm correlate well with human similarity ratings. Finally, after the two preprocessing steps above, the preprocessed data is split into training, validation and test sets in a fixed proportion and used to train the Transformer model; the BLEU value improves by 10.16, and the training effect is clearly better than the RNN.

Claims (10)

1. A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information, characterized in that a Transformer model is used during translation, the Transformer model being a multi-layer encoder-decoder architecture built with trigonometric-function positional encoding and an enhanced multi-head attention mechanism, so that it relies entirely on attention to draw global dependencies between input and output, eliminating recurrence and convolution.
2. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that, before translation, the data is first preprocessed; the preprocessing segments the stems, affixes and case-marking components in the Mongolian corpus to reduce data sparsity, identifies the linguistic features of Mongolian stems, affixes and case-marking components, and incorporates these linguistic features into training.
3. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 2, characterized in that the segmentation includes fine-grained affix segmentation, coarse-grained stem segmentation, and small-scale segmentation of case-marking components.
4. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that, after the data has been preprocessed, the influence of depth, density and semantic overlap on concept semantic similarity is combined with similarity algorithms based on semantic distance and information content to build a similarity matrix; principal component analysis is then applied, the similarity matrix is transformed into a principal-component matrix, the principal component contribution rates are computed and used as weights, and the final concept semantic similarity is obtained.
5. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 4, characterized in that the similarity matrix is expressed as
X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)^T, i = 1, 2, 3, ..., n
and the final concept semantic similarity is computed as
δ_sim = r_1·y_sim1 + r_2·y_sim2 + r_3·y_sim3 + r_4·y_sim4 + r_5·y_sim5
where X_sim denotes the similarity matrix; x_i1 denotes Ds_i, x_i2 denotes Ks_i, x_i3 denotes Zs_i, x_i4 denotes Ss_i, x_i5 denotes Is_i; n is the number of concept-word pairs in the comparison set; x_i = (Ds_i, Ks_i, Zs_i, Ss_i, Is_i) is a vector in the principal-component input sample set, where each dimension represents the result of one partial similarity computation in the combined similarity module: Ds_i denotes the relationship between the semantic distance and similarity of the i-th element, Ks_i the semantic similarity in terms of depth, Zs_i the density impact factor of the concept word c, Ss_i the similarity in terms of semantic overlap, and Is_i the similarity in terms of information content; δ_sim denotes the concept semantic similarity; y_sim1, y_sim2, y_sim3, y_sim4, y_sim5 are the principal components extracted by principal component analysis of the similarity matrix X_sim, and r_1, r_2, r_3, r_4, r_5 are the contribution rates of the respective principal components.
6. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that the multi-head attention mechanism is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors; the output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
7. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1, characterized in that
the encoder consists of N identical layers, each with two sublayers: the first is a multi-head attention sublayer and the second a feed-forward sublayer; the input and output of each sublayer are connected by a residual connection, and each sublayer is followed by a layer-normalization step to speed up model convergence;
the decoder consists of N identical layers, each with three sublayers: the first is a masked multi-head attention sublayer for modeling the target-side sentence generated so far, where during training a mask matrix ensures that each multi-head attention computation only attends to the first t-1 words; the second is a multi-head attention sublayer, the attention mechanism between encoder and decoder, which looks up the relevant semantic information in the source language; the third is a feed-forward sublayer identical to the one in the encoder; the input and output of each sublayer are connected by a residual connection and followed by a layer-normalization step to speed up model convergence.
8. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1 or 7, characterized in that the multi-layer encoder-decoder architecture is constructed as follows:
in the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm() denotes the layer-normalization function, Sublayer() is the function implemented by the residually connected sublayer itself based on multi-head attention, and x denotes the vector input to the current layer; the Mongolian sentence is converted into vectors with word2vec and used as input to the first encoder layer; to facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
9. according to claim 1 or the illiteracy Chinese machine translation side of the 7 enhancing semantic feature information based on Transformer Method, which is characterized in that
The feed-forward sublayer of the encoder is realized with two linear transformations and one ReLU nonlinear activation; the specific calculation formula is as follows:
FFN(x) = γ(0, xW1 + b1)W2 + b2
where x denotes the encoder input information, W1 denotes the weight corresponding to the input vector, b1 denotes the bias factor of the multi-head attention mechanism, (0, xW1 + b1) denotes the input-layer information of the feed-forward sublayer, W2 denotes the weight corresponding to the input of the second linear transformation, b2 denotes the bias factor of the feed-forward function, and γ denotes the nonlinear activation function of the encoder layer (the ReLU), so that γ(0, xW1 + b1) = max(0, xW1 + b1).
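A sketch of the feed-forward sublayer of claim 9, with γ realised as the ReLU activation max(0, ·); the inner dimension 2048 shown in the shape comment is the usual Transformer choice and is an assumption, not taken from the claim.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """FFN(x) = relu(x @ W1 + b1) @ W2 + b2 : two linear transformations with
    one ReLU nonlinearity in between, applied identically at every position."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU, i.e. the γ(0, xW1 + b1) term
    return hidden @ W2 + b2

# Illustrative shapes with d_model = 512 and an assumed inner dimension of 2048:
# x: (seq_len, 512), W1: (512, 2048), b1: (2048,), W2: (2048, 512), b2: (512,)
```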
10. The Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information according to claim 1 or 7, characterized in that the positional encoding is computed with trigonometric functions, the absolute position being used as the variable of the trigonometric function, according to the following formula:
In the formula, pos is the position and i is the dimension, i.e. each dimension of the positional encoding corresponds to a sinusoid; the wavelengths form a geometric progression from 2π to 10000·2π; dmodel is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of dmodel.
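The formula itself is not reproduced in the text above; as a reading aid, the standard sinusoidal positional encoding of the Transformer (Vaswani et al., listed in the non-patent citations below), which matches this description, is:

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)
```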
CN201811231017.2A 2018-10-22 2018-10-22 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer Pending CN109492232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811231017.2A CN109492232A (en) 2018-10-22 2018-10-22 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer

Publications (1)

Publication Number Publication Date
CN109492232A true CN109492232A (en) 2019-03-19

Family

ID=65692441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811231017.2A Pending CN109492232A (en) 2018-10-22 2018-10-22 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer

Country Status (1)

Country Link
CN (1) CN109492232A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105957518A (en) * 2016-06-16 2016-09-21 内蒙古大学 Mongolian large vocabulary continuous speech recognition method
CN107967262A (en) * 2017-11-02 2018-04-27 内蒙古工业大学 A kind of neutral net covers Chinese machine translation method
CN108681539A (en) * 2018-05-07 2018-10-19 内蒙古工业大学 A kind of illiteracy Chinese nerve interpretation method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHISH VASWANI ET AL.: "Attention Is All You Need", 《31ST CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2017)》 *
王桐 等: "WordNet中的综合概念语义相似度计算方法", 《北京邮电大学学报》 *

Cited By (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597947B (en) * 2019-03-20 2023-03-28 桂林电子科技大学 Reading understanding system and method based on global and local attention interaction
CN110597947A (en) * 2019-03-20 2019-12-20 桂林电子科技大学 Reading understanding system and method based on global and local attention interaction
CN110083826A (en) * 2019-03-21 2019-08-02 昆明理工大学 A kind of old man's bilingual alignment method based on Transformer model
CN112037776A (en) * 2019-05-16 2020-12-04 武汉Tcl集团工业研究院有限公司 Voice recognition method, voice recognition device and terminal equipment
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
CN110196946B (en) * 2019-05-29 2021-03-30 华南理工大学 Personalized recommendation method based on deep learning
CN110349676A (en) * 2019-06-14 2019-10-18 华南师范大学 Timing physiological data classification method, device, storage medium and processor
CN110349676B (en) * 2019-06-14 2021-10-29 华南师范大学 Time-series physiological data classification method and device, storage medium and processor
WO2020253060A1 (en) * 2019-06-17 2020-12-24 平安科技(深圳)有限公司 Speech recognition method, model training method, apparatus and device, and storage medium
CN110297887A (en) * 2019-06-26 2019-10-01 山东大学 Service robot personalization conversational system and method based on cloud platform
CN110619034A (en) * 2019-06-27 2019-12-27 中山大学 Text keyword generation method based on Transformer model
CN110321962B (en) * 2019-07-09 2021-10-08 北京金山数字娱乐科技有限公司 Data processing method and device
CN110321962A (en) * 2019-07-09 2019-10-11 北京金山数字娱乐科技有限公司 A kind of data processing method and device
CN110321961A (en) * 2019-07-09 2019-10-11 北京金山数字娱乐科技有限公司 A kind of data processing method and device
CN110427493B (en) * 2019-07-11 2022-04-08 新华三大数据技术有限公司 Electronic medical record processing method, model training method and related device
CN110427493A (en) * 2019-07-11 2019-11-08 新华三大数据技术有限公司 Electronic health record processing method, model training method and relevant apparatus
CN110390340A (en) * 2019-07-18 2019-10-29 暗物智能科技(广州)有限公司 The training method and detection method of feature coding model, vision relationship detection model
CN111488742A (en) * 2019-08-19 2020-08-04 北京京东尚科信息技术有限公司 Method and device for translation
CN111488742B (en) * 2019-08-19 2021-06-29 北京京东尚科信息技术有限公司 Method and device for translation
CN110704587A (en) * 2019-08-22 2020-01-17 平安科技(深圳)有限公司 Text answer searching method and device
CN110704587B (en) * 2019-08-22 2023-10-20 平安科技(深圳)有限公司 Text answer searching method and device
CN110598221A (en) * 2019-08-29 2019-12-20 内蒙古工业大学 Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN110543551A (en) * 2019-09-04 2019-12-06 北京香侬慧语科技有限责任公司 question and statement processing method and device
CN110543551B (en) * 2019-09-04 2022-11-08 北京香侬慧语科技有限责任公司 Question and statement processing method and device
CN110717343B (en) * 2019-09-27 2023-03-14 电子科技大学 Optimal alignment method based on transformer attention mechanism output
CN110674647A (en) * 2019-09-27 2020-01-10 电子科技大学 Layer fusion method based on Transformer model and computer equipment
CN110717343A (en) * 2019-09-27 2020-01-21 电子科技大学 Optimal alignment method based on transformer attention mechanism output
CN110765768A (en) * 2019-10-16 2020-02-07 北京工业大学 Optimized text abstract generation method
US11556723B2 (en) 2019-10-24 2023-01-17 Beijing Xiaomi Intelligent Technology Co., Ltd. Neural network model compression method, corpus translation method and device
CN110795535A (en) * 2019-10-28 2020-02-14 桂林电子科技大学 Reading understanding method for depth separable convolution residual block
CN110827219A (en) * 2019-10-31 2020-02-21 北京小米智能科技有限公司 Training method, device and medium of image processing model
CN110827219B (en) * 2019-10-31 2023-04-07 北京小米智能科技有限公司 Training method, device and medium of image processing model
CN111105423A (en) * 2019-12-17 2020-05-05 北京小白世纪网络科技有限公司 Deep learning-based kidney segmentation method in CT image
CN111105423B (en) * 2019-12-17 2021-06-29 北京小白世纪网络科技有限公司 Deep learning-based kidney segmentation method in CT image
CN111080032A (en) * 2019-12-30 2020-04-28 成都数之联科技有限公司 Load prediction method based on Transformer structure
CN111080032B (en) * 2019-12-30 2023-08-29 成都数之联科技股份有限公司 Load prediction method based on transducer structure
CN111353315A (en) * 2020-01-21 2020-06-30 沈阳雅译网络技术有限公司 Deep neural machine translation system based on random residual algorithm
CN111353315B (en) * 2020-01-21 2023-04-25 沈阳雅译网络技术有限公司 Deep nerve machine translation system based on random residual error algorithm
CN111382583A (en) * 2020-03-03 2020-07-07 新疆大学 Chinese-Uygur name translation system with mixed multiple strategies
CN111428509A (en) * 2020-03-05 2020-07-17 北京一览群智数据科技有限责任公司 Latin letter-based Uygur language processing method and system
CN111310485B (en) * 2020-03-12 2022-06-21 南京大学 Machine translation method, device and storage medium
CN111310485A (en) * 2020-03-12 2020-06-19 南京大学 Machine translation method, device and storage medium
CN111444695B (en) * 2020-03-25 2022-03-01 腾讯科技(深圳)有限公司 Text generation method, device and equipment based on artificial intelligence and storage medium
CN111444695A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Text generation method, device and equipment based on artificial intelligence and storage medium
CN111507328A (en) * 2020-04-13 2020-08-07 北京爱咔咔信息技术有限公司 Text recognition and model training method, system, equipment and readable storage medium
CN111581987A (en) * 2020-04-13 2020-08-25 广州天鹏计算机科技有限公司 Disease classification code recognition method, device and storage medium
CN111428443A (en) * 2020-04-15 2020-07-17 中国电子科技网络信息安全有限公司 Entity linking method based on entity context semantic interaction
CN111428443B (en) * 2020-04-15 2022-09-13 中国电子科技网络信息安全有限公司 Entity linking method based on entity context semantic interaction
CN111401052A (en) * 2020-04-24 2020-07-10 南京莱科智能工程研究院有限公司 Semantic understanding-based multilingual text matching method and system
CN111626062B (en) * 2020-05-29 2023-05-30 思必驰科技股份有限公司 Text semantic coding method and system
CN111626062A (en) * 2020-05-29 2020-09-04 苏州思必驰信息科技有限公司 Text semantic coding method and system
CN112185104A (en) * 2020-08-22 2021-01-05 南京理工大学 Traffic big data restoration method based on countermeasure autoencoder
CN112185104B (en) * 2020-08-22 2021-12-10 南京理工大学 Traffic big data restoration method based on countermeasure autoencoder
CN112084794A (en) * 2020-09-18 2020-12-15 西藏大学 Tibetan-Chinese translation method and device
CN112507733A (en) * 2020-11-06 2021-03-16 昆明理工大学 Dependency graph network-based Hanyue neural machine translation method
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112329760B (en) * 2020-11-17 2021-12-21 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112580373A (en) * 2020-12-26 2021-03-30 内蒙古工业大学 High-quality Mongolian unsupervised neural machine translation method
CN112580373B (en) * 2020-12-26 2023-06-27 内蒙古工业大学 High-quality Mongolian non-supervision neural machine translation method
CN112947930A (en) * 2021-01-29 2021-06-11 南通大学 Method for automatically generating Python pseudo code based on Transformer
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net
CN113076398B (en) * 2021-03-30 2022-07-29 昆明理工大学 Cross-language information retrieval method based on bilingual dictionary mapping guidance
CN113076398A (en) * 2021-03-30 2021-07-06 昆明理工大学 Cross-language information retrieval method based on bilingual dictionary mapping guidance
CN113095091A (en) * 2021-04-09 2021-07-09 天津大学 Chapter machine translation system and method capable of selecting context information
CN113761841A (en) * 2021-04-19 2021-12-07 腾讯科技(深圳)有限公司 Method for converting text data into acoustic features
CN113177546A (en) * 2021-04-30 2021-07-27 中国科学技术大学 Target detection method based on sparse attention module
CN113297841A (en) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-training double-word vectors
CN113255597A (en) * 2021-06-29 2021-08-13 南京视察者智能科技有限公司 Transformer-based behavior analysis method and device and terminal equipment thereof
CN116186249A (en) * 2022-10-24 2023-05-30 数采小博科技发展有限公司 Item prediction robot for electronic commerce commodity and implementation method thereof
CN116186249B (en) * 2022-10-24 2023-10-13 数采小博科技发展有限公司 Item prediction robot for electronic commerce commodity and implementation method thereof
CN117711417A (en) * 2024-02-05 2024-03-15 武汉大学 Voice quality enhancement method and system based on frequency domain self-attention network
CN117711417B (en) * 2024-02-05 2024-04-30 武汉大学 Voice quality enhancement method and system based on frequency domain self-attention network

Similar Documents

Publication Publication Date Title
CN109492232A (en) A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN110334219B (en) Knowledge graph representation learning method based on attention mechanism integrated with text semantic features
JP7468929B2 (en) How to acquire geographical knowledge
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN111581401B (en) Local citation recommendation system and method based on depth correlation matching
CN111444343B (en) Cross-border national culture text classification method based on knowledge representation
CN109492227A (en) It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN110377686A (en) A kind of address information Feature Extraction Method based on deep neural network model
CN107729311B (en) Chinese text feature extraction method fusing text moods
CN110222140A (en) A kind of cross-module state search method based on confrontation study and asymmetric Hash
CN107480132A (en) A kind of classic poetry generation method of image content-based
CN106055675B (en) A kind of Relation extraction method based on convolutional neural networks and apart from supervision
CN106650789A (en) Image description generation method based on depth LSTM network
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN105938485A (en) Image description method based on convolution cyclic hybrid model
CN103778227A (en) Method for screening useful images from retrieved images
CN108268449A (en) A kind of text semantic label abstracting method based on lexical item cluster
CN105528437A (en) Question-answering system construction method based on structured text knowledge extraction
CN111881677A (en) Address matching algorithm based on deep learning model
Tang et al. Deep sequential fusion LSTM network for image description
CN111291556A (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN110765755A (en) Semantic similarity feature extraction method based on double selection gates
CN113515632B (en) Text classification method based on graph path knowledge extraction
CN110795565A (en) Semantic recognition-based alias mining method, device, medium and electronic equipment
CN113553440A (en) Medical entity relationship extraction method based on hierarchical reasoning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190319