Best Mode for Carrying Out the Invention
Fig. 2 shows the model structure of a speech recognition device according to an embodiment of the present invention. The recurrent neural network of this embodiment comprises:
A baseline model, formed by connecting two LSTM network layers 2.
An extension model, comprising a plurality of residual network layers 3. Each residual network layer 3 is formed by connecting one LSTM network layer 2 and one addition-function layer. The input of the residual network layer 3 is connected to the output of the preceding network layer; the two inputs of the addition-function layer are connected to the output of the LSTM network layer 2 within the residual network layer 3 and to the output of the preceding network layer, respectively; and the output of the addition-function layer serves as the output of the residual network layer 3.
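The skip connection described above can be sketched in a few lines of Python. This is an illustration only: `toy_lstm` below is an arbitrary placeholder standing in for a real LSTM network layer 2, and all names are hypothetical.

```python
def make_residual_layer(lstm_layer):
    """Wrap an LSTM-like layer with an addition-function (skip) connection:
    the layer output is LSTM(x) + x, element by element."""
    def residual_layer(x):
        return [h + skip for h, skip in zip(lstm_layer(x), x)]
    return residual_layer

# Placeholder for an LSTM network layer 2: here it simply halves each value.
toy_lstm = lambda xs: [0.5 * v for v in xs]

res_layer = make_residual_layer(toy_lstm)
print(res_layer([1.0, 2.0]))  # [1.5, 3.0]
```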
The extension model comprises 1 to 7 residual network layers 3, and the recurrent neural network has a depth of 3 to 9 layers.
The extension depth of the extension model is determined by training: when adding a further residual network layer 3 makes the training result worse, the depth before that added layer is taken as the depth of the recurrent neural network.
In this embodiment, the recurrent neural network is used in a speech recognition device.
The speech recognition device comprises: a convolutional layer 1, the recurrent neural network, a fully connected layer 4, and a CTC layer 5.
The convolutional layer 1 receives the spectral signal of the sound; the output of the convolutional layer 1 is fed into the recurrent neural network, and the recurrent neural network is connected through the fully connected layer 4 to the CTC layer 5. The CTC layer 5 provides the CTC loss function and is used for training on the speech signal.
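The data path just described (spectrum → convolution → recurrent network → fully connected layer → CTC) can be sketched as a simple composition. All stage functions below are placeholder assumptions, not the actual layers:

```python
def speech_recognizer(spectrum, conv, rnn, fc, ctc):
    """Forward path of the device: convolutional layer 1 feeds the
    recurrent neural network, whose output reaches the CTC layer 5
    through the fully connected layer 4."""
    features = conv(spectrum)
    encoded = rnn(features)
    logits = fc(encoded)
    return ctc(logits)

# Placeholder stages that just tag the data to show the ordering.
stage = lambda name: (lambda x: x + [name])
out = speech_recognizer([], stage("conv"), stage("rnn"), stage("fc"), stage("ctc"))
print(out)  # ['conv', 'rnn', 'fc', 'ctc']
```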
The convolutional layer 1 comprises 1 to 3 layers and is usually an invariant convolutional layer.
The fully connected layer 4 comprises one or more layers.
In the recurrent neural network, every network layer is built from identical network nodes: in an LSTM network layer 2, all nodes are LSTM network nodes 6; in a residual network layer 3, all nodes are residual network nodes 8. As shown in Fig. 2, a residual network node 8 consists of one LSTM network node 6 and one addition-function node 9 (labelled ADD in Fig. 2); the addition-function nodes 9 together form the addition-function layer.
Each network layer in the recurrent neural network is a bidirectional network layer; that is, along the width direction of each network layer, different network nodes can pass information to each other, as indicated by the two arrowed lines in the dashed circle 7. In Fig. 2, the nodes of only one network layer are drawn in detail; an ellipsis of three dots indicates that a layer contains further network nodes.
Along the depth direction of the recurrent neural network, the network layers have the same number of network nodes, with a one-to-one correspondence between them.
For a residual network node 8, the output of the preceding network node is fed both to the LSTM network node 6 and to the addition-function node 9; the output of the LSTM network node 6 within the residual network node 8 is also fed to the addition-function node 9, and the output of the addition-function node 9 serves as the output of the residual network node 8. When the (K+1)-th network layer is a residual network layer 3, the output of a residual network node 8 in that layer can be expressed as:
output_{k+1} = LSTM_{k+1}(output_k) + output_k;
where output_{k+1} denotes the output of the residual network node 8 in the (K+1)-th network layer, i.e. the output of its addition-function node 9;
output_k denotes the output of the corresponding residual network node 8 in the K-th network layer, i.e. the output of its addition-function node 9;
LSTM_{k+1}() denotes the function realized by the LSTM network node 6 within the residual network node 8 of the (K+1)-th network layer; and
LSTM_{k+1}(output_k) denotes the output of that LSTM network node 6 when its input is output_k.
For the baseline model, i.e. the first two LSTM network layers 2, the output of each LSTM network node 6 is LSTM_k(output_{k-1}), where LSTM_k() denotes the function realized by the LSTM network node 6 of the K-th LSTM network layer 2, and LSTM_k(output_{k-1}) is its output when the input is output_{k-1}.
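A toy forward pass makes the two formulas concrete. The stand-in `lstm` below is an arbitrary placeholder function chosen for illustration, not a real LSTM node:

```python
lstm = lambda x: 0.5 * x + 1.0   # placeholder for LSTM_k(); arbitrary choice

x = 4.0
h1 = lstm(x)        # baseline layer 1: LSTM_1(x)        = 3.0
h2 = lstm(h1)       # baseline layer 2: LSTM_2(h1)       = 2.5
h3 = lstm(h2) + h2  # residual layer 3: LSTM_3(h2) + h2  = 4.75
print(h1, h2, h3)   # 3.0 2.5 4.75
```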
On the basis of a baseline model composed of two LSTM network layers 2, the recurrent neural network of this embodiment adds residual network layers 3, each formed by connecting an LSTM network layer 2 and an addition-function layer. The residual network layers 3 increase the depth of the recurrent neural network while still allowing it to converge, so a deeper network can be realized, which in turn improves the training effect and performance.
Fig. 3 is a flowchart of the recurrent neural network training method according to an embodiment of the present invention. The method comprises the following steps:
Step 1. Provide a baseline model of the recurrent neural network, the baseline model being formed by connecting two LSTM network layers 2. Step 1 corresponds to the step labelled 301 in Fig. 3.
Step 2. Initialize the baseline model; this corresponds to the step labelled 302 in Fig. 3.
Training of the recurrent neural network starts from the first LSTM network layer 2. In Fig. 3, the training of this first layer is not shown as a separate step but is included in the initialization step. The step labelled 303 in Fig. 3 starts from K = 2; values of K greater than 2 correspond to the subsequent training of the extension model.
Step 3. Add an extension model on top of the baseline model. The extension model comprises a plurality of residual network layers 3; each residual network layer 3 is formed by connecting one LSTM network layer 2 and one addition-function layer, the input of the residual network layer 3 is connected to the output of the preceding network layer, the two inputs of the addition-function layer are connected to the output of the LSTM network layer 2 within the residual network layer 3 and to the output of the preceding network layer, respectively, and the output of the addition-function layer serves as the output of the residual network layer 3.
Each time a residual network layer 3 is added, the recurrent neural network is trained once (the training corresponding to label 303). Adding a residual network layer 3 comprises the following sub-steps:
Step 31. Add a new residual network layer 3 as the (K+1)-th layer. The first K network layers have already been trained and are initialized with the trained model; the (K+1)-th layer is initialized with random parameters.
As shown in the step labelled 307, after a residual network layer 3 has been added, K is usually reset to K = K + 1 to simplify the training loop.
Then, as shown in the step labelled 308, after K has been reset, the first K−1 network layers are initialized with the already trained parameters, and the K-th network layer is initialized with random parameters.
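Sub-step 31 (and its re-indexed form in steps 307/308) amounts to the following initialization rule. The flat per-layer parameter lists are a hypothetical simplification for illustration:

```python
import random

def init_parameters(trained_layers, new_layer_size):
    """First K layers: copy the already trained parameters.
    New layer K+1: random initialization."""
    params = [list(layer) for layer in trained_layers]   # keep trained values
    params.append([random.uniform(-0.1, 0.1) for _ in range(new_layer_size)])
    return params

trained = [[0.2, 0.4], [0.1, 0.3]]            # two trained baseline layers
params = init_parameters(trained, new_layer_size=2)
print(len(params), params[:2])  # 3 [[0.2, 0.4], [0.1, 0.3]]
```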
Step 32. Train the (K+1)-th residual network layer 3, i.e. perform the step labelled 303.
Step 33. Run a performance test and check whether the improvement in the test result is greater than a threshold, i.e. perform the step labelled 304.
As shown at label 304:
If the improvement in the performance test result is greater than the threshold, proceed to Step 34. The threshold in Step 33 is 3%.
If the improvement in the performance test result is smaller than the threshold, proceed to Step 35.
Step 34. Add the (K+1)-th residual network layer 3 to the recurrent neural network, then repeat from Step 31.
Step 35. As shown in the step labelled 309, training ends: no further residual network layer 3 is added, and the existing K network layers are taken as the recurrent neural network.
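Steps 31 to 35 form a greedy depth search, which can be sketched as below. All callables (`train`, `evaluate`, `add_residual_layer`) are placeholders for the actual training pipeline, and the 3% threshold follows Step 33:

```python
def grow_network(train, evaluate, add_residual_layer,
                 base_depth=2, max_depth=9, threshold=0.03):
    """Add residual layers one at a time; keep a new layer only while the
    performance improvement exceeds `threshold` (3% in the embodiment)."""
    depth = base_depth               # baseline: two LSTM network layers
    best = evaluate()
    while depth < max_depth:
        add_residual_layer()         # new layer K+1, randomly initialized
        train()                      # train with the new layer in place
        score = evaluate()
        if score - best <= threshold:
            break                    # gain too small: keep the previous depth
        best = score
        depth += 1
    return depth

# Hypothetical scores: +10% after the first added layer, +2% after the second.
scores = iter([0.50, 0.60, 0.62])
depth = grow_network(lambda: None, lambda: next(scores), lambda: None)
print(depth)  # 3
```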
The method of this embodiment achieves the following: the extension model comprises 1 to 7 residual network layers 3, and the recurrent neural network has a depth of 3 to 9 layers.
In the method of this embodiment, the recurrent neural network is used in a speech recognition device.
The present invention has been described in detail above through specific embodiments, but these embodiments do not limit the invention. Those skilled in the art can make many modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the invention.