CN110688860A - A Weight Allocation Method Based on Transformer Multiple Attention Mechanisms - Google Patents

A Weight Allocation Method Based on Transformer Multiple Attention Mechanisms

Info

Publication number
CN110688860A
CN110688860A
Authority
CN
China
Prior art keywords
output
attention
model
sequence
attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910924914.XA
Other languages
Chinese (zh)
Other versions
CN110688860B (en)
Inventor
闫明明
陈绪浩
罗华成
赵宇
段世豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910924914.XA
Publication of CN110688860A
Application granted
Publication of CN110688860B
Legal status: Active
Anticipated expiration

Links

Landscapes

  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a weight distribution method based on multiple attention mechanisms of the Transformer. The method comprises the following steps: the inputs to the attention mechanism are the word vectors of the target language and the source language, and the output is an alignment tensor. Using several attention-mechanism functions yields several alignment tensors, and each output differs because of random parameter variations during computation. All attention-mechanism models are put into operation, and their outputs are combined by a regularized calculation that approaches the optimal output. This regularized calculation ensures that the resulting value does not deviate too far from the optimum while preserving the optimality of each attention model; if one attention model performs particularly well in experiments, its weight is increased so that it has a larger influence on the final output, thereby improving translation quality.

Description

A Weight Allocation Method Based on Transformer Multiple Attention Mechanisms

Technical Field

The present invention relates to the field of neural machine translation, and in particular to a weight distribution method based on multiple attention mechanisms of the Transformer.

Background Art

Neural machine translation is a machine translation approach proposed in recent years. Compared with traditional statistical machine translation, neural machine translation trains a neural network that maps one sequence to another, and its output can be a variable-length sequence; this achieves very good performance in translation, dialogue, and text summarization. Neural machine translation is in fact an encoder-decoder system: the encoder encodes the source-language sequence and extracts its information, and the decoder converts that information into another language, the target language, thereby completing the translation.

When producing an output, the model generates an attention distribution indicating which parts of the input sequence to focus on for the next output, produces the next output according to the attended region, and repeats this process. The attention mechanism resembles certain human behaviour: when reading a passage, a person usually focuses only on informative words rather than all words, that is, assigns a different attention weight to each word. Although the attention mechanism increases the difficulty of training the model, it improves the quality of the generated text. In this patent, the improvement is made in the attention-mechanism function.

Since neural machine translation systems were proposed in 2013, and with the rapid growth of computing power, neural machine translation has developed quickly; the seq2seq model, the Transformer model, and others were proposed in succession. In 2013, Nal Kalchbrenner and Phil Blunsom proposed a novel end-to-end encoder-decoder structure for machine translation [4]. The model uses a convolutional neural network (CNN) to encode a given piece of source text into a continuous vector, and then uses a recurrent neural network (RNN) as the decoder to convert this state vector into the target language. In 2017, Google released a new machine learning model, the Transformer, which far outperformed existing algorithms on machine translation and other language-understanding tasks.

The conventional technology has the following technical problem:

In the alignment step of the attention-mechanism function, the existing framework first computes the similarity between the word vectors of the two input sentences and then performs a series of calculations to obtain the alignment function. Each alignment function produces an output during computation, and that output is then used as the input of the next computation. Such a single-threaded computation is likely to accumulate errors. We introduce a weight distribution over multiple attention mechanisms precisely to find the optimal solution among the multiple computation processes and so achieve the best translation quality.

Summary of the Invention

Therefore, in order to overcome the above deficiencies, the present invention provides a weight distribution method based on multiple Transformer attention mechanisms, applied to the attention-based Transformer framework. Specifically: the inputs to the attention mechanism are the word vectors of the target language and the source language, and the output is an alignment tensor. Using several attention-mechanism functions yields several alignment tensors, and each output differs because of random parameter variations during computation. Many attention-mechanism models have been proposed, such as self-attention, multi-head attention, global attention, and local attention, and each has different outputs and characteristics. We run all of these attention models and apply a regularized calculation to their outputs to approximate the optimal output.

The present invention is implemented by constructing a weight distribution method based on multiple Transformer attention mechanisms, applied in the attention-based Transformer model, characterized in that it comprises the following steps:

Step 1: In the Transformer model, select the better-performing model outputs for the application scenario.

Step 2: Initialize the weight sequence δ; in the first calculation the weights δ are random numbers subject to δ1 + δ2 + ... + δi = 1.

Step 3: Regularize the model outputs and compute the center point of the outputs (the point closest to all values); using the formula fin_out = δ1·O1 + δ2·O2 + δ3·O3 + ... + δi·Oi, compute the optimal matching value as the final output, where δ1 + δ2 + ... + δi = 1, the δi are the weight parameters we set, and the Oi are the outputs of the various attention models.

Step 4: Substitute the final output into the subsequent computation and compute the change in the loss function relative to the previous training step. If the loss decreases, increase the proportion of the weights in δ corresponding to outputs close to the center point; if the loss increases, increase the proportion of the weights in δ corresponding to the outputs farthest from the center point. Throughout the process the constraint δ1 + δ2 + ... + δi = 1 is strictly maintained.

Step 5: Iterate this computation over many training cycles to finally determine the optimal weight sequence δ.

The present invention has the following advantages. The invention discloses a weight distribution method based on multiple Transformer attention mechanisms, applied to the attention-based Transformer framework. The inputs to the attention mechanism are the word vectors of the target language and the source language, and the output is an alignment tensor. Using several attention-mechanism functions yields several alignment tensors, and each output differs because of random parameter variations during computation. Many attention-mechanism models have been proposed, such as self-attention, multi-head attention, global attention, and local attention, and each has different outputs and characteristics. We run all of these attention models and apply a regularized calculation to their outputs to approximate the optimal output, using the formula fin_out = δ1·O1 + δ2·O2 + δ3·O3 + ... + δi·Oi, where δ1 + δ2 + ... + δi = 1, the δi are the weight parameters we set, and the Oi are the outputs of the various attention models. This regularized calculation ensures that the resulting value does not deviate too far from the optimum and preserves the optimality of each attention model; if one attention model performs particularly well in experiments, its weight is increased so that it has a larger influence on the final output, thereby improving translation quality.

Detailed Description of the Embodiments

The present invention will be described in detail below, and the technical solutions in the embodiments of the present invention will be described clearly and completely. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Through improvement, the present invention provides a weight distribution method based on multiple Transformer attention mechanisms, applied to the attention-based Transformer framework.

Introduction to the Transformer framework:

Encoder: composed of 6 identical layers, each containing two sub-layers. The first sub-layer is a multi-head attention layer, followed by a simple fully connected (feed-forward) layer. Each sub-layer is wrapped with a residual connection and layer normalization.

Decoder: also composed of 6 identical layers, but a decoder layer differs from an encoder layer: it contains three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are based on multi-head attention. A special point here is masking, whose role is to prevent the model from using future output words during training.
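For concreteness, a minimal sketch of the layer structure just described is given below, using PyTorch building blocks (nn.MultiheadAttention, nn.LayerNorm, nn.Linear). The class names and the sizes d_model, n_heads, d_ff are illustrative assumptions and are not specified by the patent; the sketch only mirrors the sub-layer arrangement, residual connections, normalization, and decoder masking described above.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One of the 6 identical encoder layers: multi-head attention + feed-forward,
    each sub-layer wrapped with a residual connection and layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.self_attn(x, x, x)      # self-attention sub-layer
        x = self.norm1(x + attn_out)               # residual + normalization
        return self.norm2(x + self.ff(x))          # feed-forward sub-layer

class DecoderLayer(nn.Module):
    """One decoder layer: masked self-attention, encoder-decoder attention, feed-forward."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, y, memory):
        t = y.size(1)
        # Causal mask: True entries are blocked, so a position cannot attend to future words.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        self_out, _ = self.self_attn(y, y, y, attn_mask=mask)
        y = self.norm1(y + self_out)
        cross_out, _ = self.cross_attn(y, memory, memory)   # encoder-decoder attention
        y = self.norm2(y + cross_out)
        return self.norm3(y + self.ff(y))
```

In the full Transformer, six such encoder layers and six such decoder layers are stacked, as stated above.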

Attention model:

Although the encoder-decoder model is classic, it has significant limitations. The biggest limitation is that the only link between encoding and decoding is a fixed-length semantic vector C; that is, the encoder must compress the information of the whole sequence into a fixed-length vector. This has two drawbacks: first, the semantic vector cannot fully represent the information of the entire sequence; second, the information carried by the earlier inputs is diluted by the later inputs. The longer the input sequence, the more severe this phenomenon. As a result, the decoder does not obtain sufficient information about the input sequence from the start, and decoding accuracy suffers accordingly.

To solve the above problems, the attention model was proposed one year after seq2seq appeared. When producing an output, the model generates an attention distribution indicating which parts of the input sequence to focus on for the next output, produces the next output according to the attended region, and repeats this process. Attention resembles certain human behaviour: when reading a passage, a person usually focuses only on informative words rather than all words, that is, assigns a different attention weight to each word. Although the attention model increases the difficulty of training, it improves the quality of the generated text.

Step 1: generate the semantic vector for the current time step:

[formula image BDA0002218636610000041 in the original: computation of the context vector]

s_t = tanh(W[s_{t-1}, y_{t-1}])

Step 2: pass the hidden-layer information and predict:

[formula images BDA0002218636610000044 and BDA0002218636610000045 in the original: attention weights and output prediction]
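The formula images referenced above cannot be recovered from the text. For reference, the standard attention-model equations that match the surrounding description (alignment score, attention weight, context vector, decoder state, prediction) are written out below in LaTeX; this is an assumed reconstruction, not a reproduction of the original figures.

```latex
% Assumed standard attention formulation, for reference only (requires amsmath)
\begin{aligned}
e_{tj}      &= a(s_{t-1}, h_j)                            && \text{(alignment score)} \\
\alpha_{tj} &= \frac{\exp(e_{tj})}{\sum_{k}\exp(e_{tk})}  && \text{(attention weight)} \\
c_t         &= \sum_{j} \alpha_{tj}\, h_j                 && \text{(context / semantic vector)} \\
s_t         &= \tanh\bigl(W[s_{t-1},\, y_{t-1}]\bigr)     && \text{(decoder state, as in the text)} \\
p(y_t \mid y_{<t}, x) &= \mathrm{softmax}\bigl(V\,[s_t;\, c_t]\bigr) && \text{(prediction)}
\end{aligned}
```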

Many attention-mechanism models have been proposed, such as self-attention, multi-head attention, global attention, and local attention, and each has different outputs and characteristics.

The improvement here is made in the attention function.

Here, all attention-mechanism models are put into operation, and their outputs are combined by a regularized calculation to approximate the optimal output, using the formula fin_out = δ1·O1 + δ2·O2 + δ3·O3 + ... + δi·Oi, where δ1 + δ2 + ... + δi = 1, the δi are the weight parameters we set, and the Oi are the outputs of the various attention models. This regularized calculation ensures that the resulting value does not deviate too far from the optimum and preserves the optimality of each attention model. The specific implementation steps of the present invention are given after the following sketch.
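As an illustration of the fusion formula, the sketch below computes fin_out as the weighted sum of the attention outputs and identifies the "center point" as the output closest to all the others. The helper names (weighted_fusion, center_index) and the use of Euclidean distance to define "closest" are assumptions made for this sketch; the patent does not prescribe a particular distance measure.

```python
import numpy as np

def weighted_fusion(outputs, deltas):
    """fin_out = delta_1*O_1 + ... + delta_i*O_i, with the deltas summing to 1."""
    deltas = np.asarray(deltas, dtype=float)
    deltas = deltas / deltas.sum()                      # enforce sum(delta) = 1
    return sum(d * o for d, o in zip(deltas, outputs))

def center_index(outputs):
    """Index of the output closest to all the others (one reading of the 'center point')."""
    flat = np.stack([o.reshape(-1) for o in outputs])           # (num_models, features)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    return int(np.argmin(dists.sum(axis=1)))                    # smallest total distance

# Example: three attention models producing alignment tensors of the same shape.
outputs = [np.random.randn(4, 6) for _ in range(3)]
deltas = np.random.dirichlet(np.ones(len(outputs)))             # random initial weights, sum to 1
fin_out = weighted_fusion(outputs, deltas)
print(center_index(outputs), fin_out.shape)
```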

Step 1: In the Transformer model, select the better-performing model outputs for the application scenario.

Step 2: Initialize the weight sequence δ; in the first calculation the weights δ are random numbers subject to δ1 + δ2 + ... + δi = 1.

Step 3: Regularize the model outputs and compute the center point of the outputs (the point closest to all values); using the formula fin_out = δ1·O1 + δ2·O2 + δ3·O3 + ... + δi·Oi, compute the optimal matching value as the final output.

Step 4: Substitute the final output into the subsequent computation and compute the change in the loss function relative to the previous training step. If the loss decreases, increase the proportion of the weights in δ corresponding to outputs close to the center point; if the loss increases, increase the proportion of the weights in δ corresponding to the outputs farthest from the center point. Throughout the process the constraint δ1 + δ2 + ... + δi = 1 is strictly maintained.

Step 5: Iterate this computation over many training cycles to finally determine the optimal weight sequence δ (a sketch of this update loop is given below).
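To make Steps 2-5 concrete, the following sketch shows one possible update of the weight sequence δ from the change in the loss. The fixed step size eta and the rule of shifting a small amount of weight toward the center output (or toward the farthest output) are illustrative assumptions; the patent only specifies the direction of the adjustment and the constraint that the weights sum to 1.

```python
import numpy as np

def update_deltas(deltas, outputs, loss_prev, loss_curr, eta=0.05):
    """Steps 4-5: move weight toward the center output when the loss decreases,
    and toward the output farthest from the center when the loss increases,
    always renormalizing so that the weights sum to 1."""
    deltas = np.asarray(deltas, dtype=float)
    flat = np.stack([o.reshape(-1) for o in outputs])
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1).sum(axis=1)
    center, farthest = int(np.argmin(dists)), int(np.argmax(dists))
    target = center if loss_curr < loss_prev else farthest
    deltas[target] += eta                      # raise the chosen weight
    deltas = np.clip(deltas, 1e-6, None)
    return deltas / deltas.sum()               # strictly enforce sum(delta) = 1

# Schematic training loop: outputs_fn returns the attention models' outputs,
# loss_fn evaluates the model once fin_out is substituted into the later layers.
# deltas = np.random.dirichlet(np.ones(num_models)); loss_prev = np.inf
# for step in range(num_iterations):
#     outputs = outputs_fn()
#     fin_out = sum(d * o for d, o in zip(deltas, outputs))
#     loss_curr = loss_fn(fin_out)
#     deltas = update_deltas(deltas, outputs, loss_prev, loss_curr)
#     loss_prev = loss_curr
```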

The above description of the disclosed embodiments enables a person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A weight distribution method based on multiple Transformer attention mechanisms, applied in an attention-based Transformer model, characterized in that it comprises the following steps:
Step 1: In the Transformer model, select the better-performing model outputs for the application scenario.
Step 2: Initialize the weight sequence δ; in the first calculation the weights δ are random numbers subject to δ1 + δ2 + ... + δi = 1.
Step 3: Regularize the model outputs and compute the center point of the outputs (the point closest to all values); using the formula fin_out = δ1·O1 + δ2·O2 + δ3·O3 + ... + δi·Oi, compute the optimal matching value as the final output, where δ1 + δ2 + ... + δi = 1, the δi are the weight parameters we set, and the Oi are the outputs of the various attention models.
Step 4: Substitute the final output into the subsequent computation and compute the change in the loss function relative to the previous training step. If the loss decreases, increase the proportion of the weights in δ corresponding to outputs close to the center point; if the loss increases, increase the proportion of the weights in δ corresponding to the outputs farthest from the center point. Throughout the process the constraint δ1 + δ2 + ... + δi = 1 is strictly maintained.
Step 5: Iterate this computation over many training cycles to finally determine the optimal weight sequence δ.
CN201910924914.XA 2019-09-27 2019-09-27 Weight distribution method based on multiple attention mechanisms of transformer Active CN110688860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910924914.XA CN110688860B (en) 2019-09-27 2019-09-27 Weight distribution method based on multiple attention mechanisms of transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910924914.XA CN110688860B (en) 2019-09-27 2019-09-27 Weight distribution method based on multiple attention mechanisms of transformer

Publications (2)

Publication Number Publication Date
CN110688860A true CN110688860A (en) 2020-01-14
CN110688860B CN110688860B (en) 2024-02-06

Family

ID=69110821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910924914.XA Active CN110688860B (en) 2019-09-27 2019-09-27 Weight distribution method based on multiple attention mechanisms of transformer

Country Status (1)

Country Link
CN (1) CN110688860B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381581A (en) * 2020-11-17 2021-02-19 东华理工大学 Advertisement click rate estimation method based on improved Transformer
CN112381581B (en) * 2020-11-17 2022-07-08 东华理工大学 A CTR Prediction Method Based on Improved Transformer
CN112992129A (en) * 2021-03-08 2021-06-18 中国科学技术大学 A method for preserving the monotonicity of the attention mechanism in speech recognition tasks
CN113505193A (en) * 2021-06-01 2021-10-15 华为技术有限公司 Data processing method and related equipment
WO2022253074A1 (en) * 2021-06-01 2022-12-08 华为技术有限公司 Data processing method and related device
CN114678011A (en) * 2022-03-29 2022-06-28 贝壳找房网(北京)信息技术有限公司 Speech recognition method, speech recognition apparatus, electronic device, storage medium, and program product

Also Published As

Publication number Publication date
CN110688860B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN110688860A (en) A Weight Allocation Method Based on Transformer Multiple Attention Mechanisms
CN107967262B (en) A kind of neural network Mongolian-Chinese machine translation method
CN110598221B (en) Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN110674646A (en) A Mongolian-Chinese Machine Translation System Based on Byte Pair Encoding Technology
CN110046656A (en) Multi-modal scene recognition method based on deep learning
CN111143563A (en) Text classification method based on fusion of BERT, LSTM and CNN
CN110276396B (en) Image description generation method based on object saliency and cross-modal fusion features
CN113807079B (en) A sequence-to-sequence based end-to-end joint entity and relation extraction method
CN108763191A (en) A kind of text snippet generation method and system
CN112348911A (en) Semantic constraint-based method and system for generating fine-grained image by stacking texts
CN110032638A (en) A kind of production abstract extraction method based on coder-decoder
CN111444730A (en) Data-enhanced Uyghur-Chinese machine translation system training method and device based on Transformer model
CN112860904B (en) External knowledge-integrated biomedical relation extraction method
CN116188509A (en) A High Efficiency 3D Image Segmentation Method
CN117034950A (en) Long sentence embedding method and system for introducing condition mask comparison learning
CN117435744A (en) Multi-modal knowledge graph representation learning method based on cross-modal semantic alignment
CN113780350A (en) Image description method based on ViLBERT and BilSTM
CN115599984B (en) Retrieval method
CN111026846B (en) An online short text data stream classification method based on feature extension
CN110717342B (en) Distance parameter alignment translation method based on transformer
CN110674647A (en) A layer fusion method and computer equipment based on Transformer model
CN115422939A (en) Fine-grained commodity named entity identification method based on big data
CN113297374B (en) Text classification method based on BERT and word feature fusion
CN113377901B (en) Mongolian text emotion analysis method based on multi-size CNN and LSTM models
CN114691858A (en) An Improved UNILM Abstract Generation Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant