CN110458802A - Stereo image quality evaluation method based on projection weight normalization - Google Patents
Stereo image quality evaluation method based on projection weight normalization
- Publication number
- CN110458802A CN110458802A CN201910580586.6A CN201910580586A CN110458802A CN 110458802 A CN110458802 A CN 110458802A CN 201910580586 A CN201910580586 A CN 201910580586A CN 110458802 A CN110458802 A CN 110458802A
- Authority
- CN
- China
- Prior art keywords
- image
- normalization
- convolutional neural
- network
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Abstract
The invention belongs to the field of image processing and proposes a new image quality evaluation method that remains consistent with subjective human evaluation and solves the ill-conditioning problem in the network training process. It provides a research direction for deep-learning-based stereo image quality evaluation and, to a certain extent, promotes the development of stereoscopic imaging technology. To this end, the stereo image quality evaluation method based on projection weight normalization fuses the left and right viewpoint images of a stereo image to obtain a single fused image, which is then preprocessed by block cutting and normalization. A deep convolutional neural network model is built, the preprocessed image blocks are taken as its input, the network structure is optimized with projection weight normalization and batch data normalization, and the quality evaluation result of the stereo image is obtained from the output of the network. The invention is mainly applied to image processing.
Description
Technical Field
The invention belongs to the field of image processing, and relates to application and optimization of image fusion and deep learning in stereo image quality evaluation.
Background
Stereo imaging technology can bring people a better visual experience. However, quality degradation arises throughout the pipeline from acquisition to display of a stereo image [1-2], and the degraded image may affect the perception of the stereo content, so how to evaluate stereo image quality reasonably and efficiently has become one of the research hotspots in the field of stereo information. Stereo image quality evaluation methods mainly comprise subjective evaluation and objective evaluation. Subjective evaluation experiments are time-consuming, labor-intensive, and costly, whereas objective evaluation is far more practical. Establishing a reasonable and efficient objective evaluation mechanism for stereo image quality therefore has great practical significance.
To date, researchers have proposed various methods for evaluating the quality of stereo images, which can be roughly classified into conventional methods and methods based on artificial neural networks. Most conventional methods extract features from the left and right views separately and then weight the quality scores of the two views to obtain the final objective evaluation value [3-7]. However, the features extracted by conventional methods do not necessarily reflect the essential characteristics of the image. To better simulate the feature-extraction mechanism of the human eye, researchers have applied artificial neural networks to stereo image quality assessment. For example, [8-10] apply shallow neural networks to objective stereo image quality evaluation, but with few layers and a simple structure these networks cannot accurately simulate the hierarchical information processing of the human visual system. Compared with shallow networks, deep learning can mimic the way the human brain processes information, extracting features layer by layer through a deep network. The convolutional neural network (CNN) is a classic deep-learning architecture, applicable to computer vision, natural language processing, and other fields. Zhang Wei et al. applied a convolutional neural network to stereo image quality evaluation, performing feature extraction with 2 convolutional layers and 2 pooling layers and introducing a multi-layer perceptron (MLP) at the end of the network to map the learned features to quality scores [11]; Chen Hui et al. adopted a convolutional neural network model with 12 convolutional layers [12]; Ding et al., using a model with 5 convolutional layers, achieved high consistency between objective evaluation scores and subjective human scores [13].
The deep neural network structures currently adopted in the field of stereo image quality evaluation have certain limitations. On one hand, the arrangement of convolution kernels in these networks is simple: the kernels are connected in sequence, so the extracted features are monotonous. On the other hand, the layers composing the networks are only the most basic convolutional, pooling, and fully connected layers, with few functions and no normalization, so the networks cannot handle the gradient-dispersion problem.
In addition, practical research has found that when the human brain perceives a stereoscopic image, the left and right views are fused first, and the fused image is then processed hierarchically [14]. Lin et al. performed quality evaluation on fused stereo images using conventional methods, but fused only a phase map and an amplitude map [15]. To better simulate this characteristic, deep-learning approaches that operate on fused images have also appeared (see [16]), but the fusion method of that work does not take into account the thresholds at which gain enhancement and gain control occur [17].
Aiming at the above problems, the invention proposes a stereo image quality evaluation model based on a deep convolutional neural network, taking the preprocessed fused image as the network input so that the learning process of the network better matches the visual characteristics of the human eye. A batch normalization (BN) layer is introduced into the model to keep the distributions of network output data and input data consistent and to avoid gradient vanishing; a projection-based weight normalization (PBWN) layer is introduced to normalize parameters of different magnitudes and alleviate the ill-conditioning of the Hessian matrix, thereby improving the learning ability of the network. The first stage of the model is a module of convolution kernels connected in parallel; the second stage connects convolution kernels in sequence, with residual units introduced to avoid network degradation; finally, a fully connected layer completes the quality evaluation of the stereo image.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a stereo image quality evaluation method based on projection weight normalization within a deep convolutional neural network. The method performs well and remains consistent with subjective human evaluation; it introduces data batch normalization and projection weight normalization, solving the ill-conditioning problem in the network training process. The method provides a research direction for deep-learning-based stereo image quality evaluation and, to a certain extent, promotes the development of stereoscopic imaging technology. To this end, the technical scheme adopted by the invention is a stereo image quality evaluation method based on projection weight normalization: the left and right viewpoint images of a stereo image are fused to obtain a single fused image, which is then preprocessed by cutting and normalization; a deep convolutional neural network model is constructed, the preprocessed image blocks are taken as its input, the network structure is optimized with projection weight normalization and data batch normalization, and the quality evaluation result of the stereo image is obtained from the output of the network.
Specific steps for obtaining the fused image
A Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7} is used, and the Gabor-filtered left and right views are fused into one image according to formula (1).
Here I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, C(x, y) denotes the pixel value of the fused image, TCE denotes the enhancement component for the current viewpoint, and TCE* denotes the suppression component for the other viewpoint; their calculation is shown in equations (2) and (3):
where t denotes the left or right viewpoint, gc denotes the gain-enhancement threshold, and ge denotes the gain-control threshold. Gabor filtering yields 48 images; the quantities entering equations (2) and (3) are the frequency information of the n-th image of viewpoint t after filtering by the contrast sensitivity function and the weight of the n-th image of viewpoint t, while i and j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7};
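As a concrete illustration of the filter bank described above, the following NumPy sketch builds the 6 × 8 = 48 Gabor responses per view and fuses the two views. It is a sketch only: the exact gain-enhancement/gain-control weights of equations (1)-(3) are not reproduced in the text, so `fuse_views` uses a placeholder energy-based weighting, and the mapping from cycles/degree to cycles/pixel is an arbitrary assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, freq, theta, sigma=2.0):
    # Real-valued Gabor kernel: Gaussian envelope times a cosine carrier.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return gauss * np.cos(2 * np.pi * freq * xr)

def conv2(img, k):
    # 'Same'-size 2-D correlation with edge padding, pure NumPy.
    p = k.shape[0] // 2
    padded = np.pad(img, p, mode='edge')
    win = sliding_window_view(padded, k.shape)
    return np.einsum('ijkl,kl->ij', win, k)

scales = [1.5, 2.5, 3.5, 5, 7, 10]           # cycles/degree, as in the text
thetas = [k * np.pi / 8 for k in range(8)]   # 8 orientations

def fuse_views(I_l, I_r, gc=1.0):
    # Total Gabor energy of each view over the 48 responses.
    def energy(img):
        e = np.zeros_like(img, dtype=float)
        for f in scales:
            for th in thetas:
                # f / 60.0 is an assumed cycles/degree -> cycles/pixel conversion.
                e += np.abs(conv2(img.astype(float), gabor_kernel(7, f / 60.0, th)))
        return e
    E_l, E_r = energy(I_l), energy(I_r)
    # Placeholder gain-control weighting, NOT the patent's equations (2)-(3):
    # each pixel of the fused image is a convex combination of the two views.
    w_l = (gc + E_l) / (2 * gc + E_l + E_r)
    return w_l * I_l + (1.0 - w_l) * I_r
```

Because the weights lie in (0, 1), the fused pixel always stays between the corresponding left- and right-view pixels, which is the qualitative behaviour the gain-control model describes.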
image pre-processing
The normalization calculation process is shown in equation (5):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε) (5)

where I(x, y) denotes the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero.
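A minimal sketch of the normalization of equation (5), assuming for brevity that the statistics are computed globally over a patch (the patent's μ(x, y) and σ(x, y) are local, per-pixel-neighbourhood statistics):

```python
import numpy as np

def normalize_patch(I, eps=1e-8):
    # Subtract the mean and divide by the standard deviation, as in eq. (5).
    # Global per-patch statistics are an assumption; the patent uses local ones.
    return (I - I.mean()) / (I.std() + eps)
```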
Convolutional neural network model
Based on the Inception structure for multi-scale feature extraction and the residual-network Block structure, a deep convolutional neural network model containing both convolution-kernel arrangements is built. The input of the model is the small blocks obtained by cutting. The model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer, and 1 fully connected layer. Within one layer of the Inception structure, convolution kernels of different sizes operate in parallel to extract features of the image at different scales, and 1 × 1 convolution kernels are introduced to reduce the network parameters and thus the computational complexity.
(1) Projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added on the weight matrix W of each layer:

min l(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E, i = 1, 2, …, L (6)

where W = {W_i | i = 1, 2, …, L} is the set of weight matrices of layers 1 through L, l(y, f(x; W)) denotes the loss function, y is the desired output, and f(x; W) is the actual output. The operator ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0, and E denotes the identity matrix.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies

ddiag(ww^T) = E (7)
Solving the constrained problem with the Riemannian optimization theory yields the Riemannian gradient in the manifold space (equation (8)).
Here the unconstrained gradient is the ordinary gradient of the loss. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, substituting into equation (8) gives the Riemannian gradient of equation (9):
Compared with the original gradient, the Riemannian gradient subtracts one extra term; the norm of this subtracted term is analyzed in equation (10):
Since this term is not dominant, the original gradient is adopted for calculation to reduce the computational cost, and formula (11) is adopted for the weight update:
(2) batch normalization of data
The data batch normalization method is shown in formula (12): during training, the mean μ_B and the variance σ_B² are calculated for the data of each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, E[x] is represented by the mean of the means of all training batches and Var[x] by the unbiased estimate of their variances, as shown in formulas (13) and (14), where m is the size of each batch:

E[x] = E_B[μ_B] (13)

Var[x] = (m/(m−1)) · E_B[σ_B²] (14)

Therefore, in the testing stage the data batch normalization formula is shown in formula (15):

y = γ · (x − E[x]) / √(Var[x] + ε) + β (15)

where the parameters γ and β perform scaling and translation, restoring the expressive capability of the model and improving the generalization performance of the network.
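The training and testing phases of batch normalization can be sketched in NumPy as follows. This is an illustrative sketch with scalar γ and β; in the network they are learned per feature map.

```python
import numpy as np

def bn_train(x, gamma, beta, eps=1e-5):
    # Training-time batch normalization (eq. (12)): per-feature batch
    # statistics, then scale by gamma and shift by beta.
    mu, var = x.mean(axis=0), x.var(axis=0)
    y = gamma * (x - mu) / np.sqrt(var + eps) + beta
    return y, mu, var

def bn_test(x, gamma, beta, batch_means, batch_vars, m, eps=1e-5):
    # Test-time normalization (eqs. (13)-(15)): E[x] is the mean of the
    # per-batch means; Var[x] is the unbiased estimate m/(m-1) times the
    # mean of the per-batch variances.
    Ex = np.mean(batch_means, axis=0)
    Varx = m / (m - 1) * np.mean(batch_vars, axis=0)
    return gamma * (x - Ex) / np.sqrt(Varx + eps) + beta
```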
the invention has the characteristics and beneficial effects that:
the invention provides a stereo image quality evaluation method introducing projection weight normalization based on a deep convolutional neural network, and the identification rate of stereo image quality evaluation is high. The CNN model extracts the characteristics of the preprocessed fused stereo image through two modules of direct motion and parallel motion of a convolution kernel, so that the network can learn the image more fully. Compared with the existing deep learning evaluation algorithm, the invention introduces BN and PBWN to carry out network optimization, solves the ill-conditioned problem in the network training process and effectively improves the network evaluation accuracy.
The stereo image quality evaluation method takes human vision mechanism into consideration, takes the preprocessed fusion image as the input of the network, introduces the optimization of the deep convolutional neural network structure, and effectively improves the performance of the network. Experiments show that the evaluation result of the invention has better consistency with subjective quality.
Description of the drawings:
FIG. 1 is a detailed flow diagram of the process.
Detailed Description
The technical scheme adopted by the method fuses the left and right viewpoint images of the stereo image to obtain a single fused image, which is then cut into blocks and normalized. A deep convolutional neural network model is constructed, the preprocessed image blocks are taken as the input of the network, the network structure is optimized with projection weight normalization and data batch normalization, and the quality of the stereo image is obtained from the output of the network.
1. Fusing images
Inspired by the binocular rivalry phenomenon in the human visual system (HVS), the invention fuses the two views into one image, using a Gabor filter bank to filter the images. The filter bank has six scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and eight orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}. After filtering, 48 feature maps are obtained for each channel of each viewpoint. According to the binocular rivalry mechanism, the enhancement component of the current viewpoint and the suppression component of the other viewpoint are combined to calculate the final fused image.
2. Deep learning
The convolutional neural network, an algorithm that emerged early in deep learning and has matured considerably, is selected. Based on the Inception structure and the Block structure [18-19], the invention builds a deep convolutional neural network model containing both of these convolution-kernel arrangements.
3. Optimization of network architecture
A batch normalization (BN) layer is introduced into the network to keep the distributions of the network output data and input data consistent and to avoid gradient vanishing; a projection-based weight normalization (PBWN) layer is introduced to normalize parameters of different magnitudes and alleviate the ill-conditioning of the Hessian matrix, thereby improving the learning ability of the network.
The purpose of projection weight normalization is to solve the ill-conditioning of network training caused by the scaling symmetry of the weight space in deep nonlinear networks [20]. This scaling symmetry drives the Hessian matrix into an ill-conditioned state, so the network easily falls into local optima during training, which is unfavorable for finding the global optimum [21]. To alleviate this problem, unit normalization is performed on the weights, ensuring that the weights of each layer have the same magnitude.
Batch normalization of the data prevents the data distribution from gradually drifting and effectively resolves the distribution mismatch between the source space and the target space [22]. The output of each neuron activation is normalized before being passed to the next layer of neurons, which avoids gradient dispersion and gradient explosion. Learnable reconstruction parameters are also introduced, improving the learning and generalization abilities of the network.
Experiments are performed on the public stereo image databases LIVE 3D Phase I and Phase II. LIVE 3D Phase I has 20 pairs of original images and 365 symmetrically distorted images covering 5 distortion types (Gblur, WN, JPEG, JP2K, and FF); LIVE 3D Phase II has 8 pairs of original images and 360 symmetrically and asymmetrically distorted images covering the same 5 distortion types.
The method is described in detail below with reference to specific examples.
The invention provides a stereo image quality evaluation method that introduces projection weight normalization into a deep convolutional neural network and achieves a high recognition rate in stereo image quality evaluation. The CNN model extracts the characteristics of the preprocessed fused stereo image through serially connected and parallel-connected convolution-kernel modules, so that the network can learn the image more fully. Compared with existing deep-learning evaluation algorithms, the invention introduces BN and PBWN for network optimization, solving the ill-conditioning problem in the network training process and effectively improving the evaluation accuracy of the network. The specific flow chart of the method provided by the invention is shown in figure 1.
The method comprises the following specific steps:
1. Acquisition of the fused image
A Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7} is used. The Gabor-filtered left and right views are fused into one image according to formula (16).
Here I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, and C(x, y) denotes the pixel value of the fused image. TCE denotes the enhancement component for the current viewpoint and TCE* the suppression component for the other viewpoint; their calculation is shown in equations (17) and (18):
where t denotes the left or right viewpoint, gc denotes the gain-enhancement threshold, and ge denotes the gain-control threshold. Gabor filtering yields 48 images; the quantities entering equations (17) and (18) are the frequency information of the n-th image of viewpoint t after filtering by the contrast sensitivity function and the weight of the n-th image of viewpoint t, while i and j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}.
2. Image pre-processing
A single fused image is large, so the original image is cut into 32 × 32 image blocks to reduce the amount of network computation, and normalization is then performed. The normalization calculation process is shown in equation (20):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε) (20)

where I(x, y) denotes the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero.
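The block-cutting step can be sketched as follows. The stride is not specified in the text, so non-overlapping tiling is assumed here, and leftover border pixels are discarded.

```python
import numpy as np

def cut_blocks(img, size=32):
    # Tile the fused image into non-overlapping size x size blocks.
    # Non-overlapping tiling is an assumption; the patent does not
    # state the stride.
    h, w = img.shape[:2]
    return np.stack([img[r:r + size, c:c + size]
                     for r in range(0, h - size + 1, size)
                     for c in range(0, w - size + 1, size)])
```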
3. Convolutional neural network model
Based on the Inception structure and the Block structure, the invention builds a deep convolutional neural network model containing both convolution-kernel arrangements; the input of the model is the small blocks obtained by cutting. The model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer, and 1 fully connected layer, as shown in fig. 1.
Within one layer of the Inception structure, features of different scales of the image can be extracted through the parallel operation of convolution kernels of different sizes, making the extraction process more comprehensive and sufficient, while 1 × 1 convolution kernels are introduced to reduce the network parameters and the computational complexity.
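The parameter saving from the 1 × 1 kernels can be checked with a small count. The channel sizes below are illustrative, not the actual layer sizes of Table 1.

```python
# Weights of a 5x5 convolution mapping c_in=192 channels to c_out=32 maps,
# with and without a 1x1 bottleneck that first reduces to c_mid=16 channels.
c_in, c_mid, c_out = 192, 16, 32
direct = 5 * 5 * c_in * c_out                               # 153,600 weights
bottleneck = 1 * 1 * c_in * c_mid + 5 * 5 * c_mid * c_out   # 15,872 weights
```

With these (illustrative) channel counts the bottleneck needs roughly a tenth of the weights, which is why Inception inserts 1 × 1 convolutions before the larger kernels.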
TABLE 1 network model parameter settings
The Block structure introduces the concept of the residual: an additional channel directly connects the input of the previous layer to the output, which solves the network degradation problem.
4. Optimization of network architecture
In the CNN adopted by the invention, a projection weight normalization (PBWN) layer and a data batch normalization (BN) layer are introduced after each convolutional layer to normalize the weight parameters and the input data of each layer, respectively.
(1) Projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added on the weight matrix W of each layer:

min l(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E, i = 1, 2, …, L (21)

where W = {W_i | i = 1, 2, …, L} is the set of weight matrices of layers 1 through L, l(y, f(x; W)) denotes the loss function, y is the desired output, and f(x; W) is the actual output. The operator ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0, and E denotes the identity matrix.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies

ddiag(ww^T) = E (22)
Solving the constrained problem with the Riemannian optimization theory yields the Riemannian gradient in the manifold space (equation (23)).
Here the unconstrained gradient is the ordinary gradient of the loss. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, substituting into equation (23) gives the Riemannian gradient of equation (24):
Compared with the original gradient, the Riemannian gradient subtracts one extra term; the norm of this subtracted term is analyzed in equation (25):
The analysis shows that this term is not the dominant term in equation (24), and experiments have shown the Riemannian gradient to be nearly as effective as the original gradient. Therefore, the method adopts the original gradient for calculation to reduce the computational cost.
Accordingly, the invention adopts formula (26) to update the weights.
(2) Batch normalization of data
The data batch normalization method is shown in formula (27): during training, the mean μ_B and the variance σ_B² are calculated for each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, E [ x ] is represented by the mean of all training batchs, var [ x ] is represented by the unbiased estimate of the variance of all training batchs, and m is the size of each batch as shown in equations (28) and (29).
E[x]=EB[μB] (28)
In the test stage, the formula of data batch normalization is shown in formula (30), and the function of the parameters gamma and beta is zooming and translation, so that the expression capability of the model is restored, and the network generalization performance is improved.
5. Stereo image quality evaluation results and analysis
The experiments of the invention were performed on two public stereo image databases, LIVE 3D Phase I and Phase II. The evaluation indexes selected are the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), and the root mean square error (RMSE). The larger the PLCC and SROCC and the smaller the RMSE, the stronger the consistency between the model's evaluation results and the subjective results, and the better the effect.
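The three indexes can be sketched in NumPy as follows. This is a sketch; the SROCC rank computation below assumes no tied scores.

```python
import numpy as np

def plcc(x, y):
    # Pearson linear correlation coefficient.
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum()))

def srocc(x, y):
    # Spearman rank-order correlation: PLCC of the ranks (no-tie case).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

def rmse(x, y):
    # Root mean square error.
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2)))
```

SROCC is invariant under any monotone mapping of the scores, which is why it complements PLCC: a model whose predictions are correctly ordered but nonlinearly related to the subjective scores still attains SROCC = 1.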
Table 2 shows the performance of the algorithm of the present invention compared to other methods on the LIVE-I, LIVE-II database.
TABLE 2 Overall Performance comparison of the evaluation methods
Chen [12] gives no overall index values for the LIVE-II database, only values for the individual distortion types, so comparison with [12] appears in Tables 3 and 4. Table 2 shows that the performance of the algorithm is significantly better than that of Heeseok [16], because the invention fully considers both the linear and nonlinear cases when fusing images: when the stimuli received by the two eyes are very small, the stimuli of the left and right eyes are weighted linearly, and when the stimuli reach the thresholds that trigger gain enhancement and gain control, nonlinear weighting is adopted. Compared with Lin [15], the invention fuses the original images, whereas [15] fuses only low-level features of the image, so the indexes obtained here are better than those of [15]. Compared with other deep-learning methods [11,13] and conventional methods [5-7] that do not fuse images, the PLCC and SROCC obtained by the method are significantly improved. On LIVE-II, the PLCC obtained by the invention is second best, 0.0122% lower than that of Ding [13]. The RMSE of the model on the LIVE-I and LIVE-II databases is smaller than that of the other algorithms; taken together, the three indexes show that the algorithm performs well on quality evaluation of both symmetrically and asymmetrically distorted stereo images.
The evaluation effects of the algorithm of the invention on different distortion types are analyzed, as shown in tables 3 and 4.
TABLE 3 Performance comparison of the evaluation methods for quality evaluation of stereo images of different distortion types in the LIVE-3D I database
TABLE 4 Performance comparison of the evaluation methods for quality evaluation of stereo images of different distortion types in LIVE-3D II database
When the network is tested, the PLCC and SROCC indexes in Tables 3 and 4 are generally lower than those of existing algorithms, because the experiment of the invention is a two-class classification: even if only 1 image is misjudged in the test, the PLCC is strongly affected. The experiments show that the algorithm has a good overall evaluation effect on the 5 distortion types; for the FF distortion type in the LIVE-I database and the FF and BLUR distortion types in the LIVE-II database, the recognition rate reaches 100%, so the PLCC and SROCC values also reach 1 and the RMSE value is 0.
Table 5 shows the effect on model performance of adding, or not adding, a PBWN layer after each convolutional layer. The results show that the experimental results improve markedly after PBWN is added: the recognition rate of the LIVE-I image quality evaluation is improved by 2.833%, reaching 98.113%, and the recognition rate of the LIVE-II image quality evaluation is improved by 5.88%, reaching 96.47%.
TABLE 5 recognition rate of this algorithm for stereo image quality evaluation
TABLE 6 Time required for the algorithm test (unit: seconds)
Table 6 compares the test time with and without PBWN. PBWN keeps the magnitudes of the weight parameters of each layer the same and unit-normalizes the weight vector of each neuron, which effectively avoids ill-conditioning of the Hessian matrix during training, improves the learning and generalization ability of the network, accelerates network convergence, and shortens the time required for network testing.
References
[1]Zilly F,Kluger J,Kauff P.Production rules for stereo acquisition[J].Proceedings of the IEEE,2011,99(4):590-606.
[2]Urey H,Chellappan K V,Erden E,et al.State of the art in stereoscopic and autostereoscopic displays[J].Proceedings of the IEEE,2011,99(4):540-555.
[3] Objective evaluation model of stereoscopic image quality based on structural distortion analysis[J].Journal of Computer-Aided Design & Computer Graphics,2012,24(8):1047-1056.
[4] Xu Shuning,et al.Stereoscopic image quality evaluation method based on visual saliency[J].Information Technology,2016(10):91-93.
[5]Bensalma Rafik,Larabi Mohamed-Chaker.A perceptual metric for stereoscopic image quality assessment based on the binocular energy[J].Multidimensional Systems and Signal Processing,2013,24(2):281-316.
[6]Shao Feng,Jiang Gangyi,Yu Mei,et al.Binocular energy response based quality assessment of stereoscopic images[J].Digital Signal Processing,2014,29:45-53.
[7]Shao Feng,Lin Weisi,Wang Shanshan,et al.Learning Receptive Fields and Quality Lookups for Blind Quality Assessment of Stereoscopic Images[J].IEEE Transactions on Cybernetics,2016,46(3):730-743.
[8] Application of extreme learning machine in objective evaluation of stereoscopic image quality[J].Journal of Optoelectronics·Laser,2014(9):1837-1842.
[9] Gu Shanbo,Shao Feng,Jiang Gangyi,et al.A support vector regression based objective quality evaluation model for stereoscopic images[J].Journal of Electronics & Information Technology,2012,34(2):368-374.
[10] Wu Guang,Li Sumei,Cheng Jincui.Objective evaluation of stereoscopic images based on genetic neural networks[J].Information Technology,2013(5):148-153.
[11]Zhang Wei,Qu Chenfei,Lin Ma,et al.Learning structure of stereoscopic image for no-reference quality assessment with convolutional neural network[J].Pattern Recognition,2016,59(C):176-187.
[12] Chen Hui,Li Chaofeng.Stereoscopic color image quality evaluation of deep convolutional neural network[J].Journal of Frontiers of Computer Science and Technology,2018,12(08):1315-1322.
[13]Ding Yong,Deng Ruizhe,Xie Xin,et al.No-reference stereoscopic image quality assessment using convolutional neural network for adaptive feature extraction[J].IEEE Access,2018,6:37595-37603.
[14]Hubel D.H.,Wiesel T.N.Receptive fields of single neurones in the cat's striate cortex[J].The Journal of Physiology,1959,148(3):574-591.
[15]Lin Yancong,Yang Jiachen,Lu Wen,et al.Quality index for stereoscopic images by jointly evaluating cyclopean amplitude and cyclopean phase[J].IEEE Journal of Selected Topics in Signal Processing,2017,11(11):89-101.
[16]Oh Heeseok,Ahn Sewoong,Kim Jongyoo,et al.Blind deep S3D image quality evaluation via local to global feature aggregation[J].IEEE Transactions on Image Processing,2017,26(10):4923-4936.
[17]Ding Jian,Klein S.A.,Levi D.M.Binocular combination of phase and contrast explained by a gain-control and gain-enhancement model[J].Journal of Vision,2013,13(2):13.
[18]Szegedy C,Liu W,Jia Y,et al.Going deeper with convolutions[C].Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2015:1-9.
[19]He K,Zhang X,Ren S,et al.Deep residual learning for image recognition[C].Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2016:770-778.
[20]Huang L.,Liu X.,Lang B.,Li B.Projection based weight normalization for deep neural networks[J].CoRR,abs/1710.02338,2017.
[21]Ian Goodfellow,Yoshua Bengio,and Aaron Courville.Deep Learning.MIT Press,2016.
[22]Ioffe S.,Szegedy C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[C].Proceedings of the 32nd International Conference on Machine Learning,ICML 2015.
Claims (3)
1. A stereo image quality evaluation method based on projection weight normalization, characterized in that the left and right viewpoint images of a stereo image are fused to obtain a single fused image, and the single image is then preprocessed by cutting and normalization; a deep convolutional neural network model is constructed, the preprocessed image blocks are taken as the input of the deep convolutional neural network, the structure of the deep convolutional neural network is optimized by adopting projection weight normalization and data batch normalization, and the quality evaluation result of the stereo image is obtained from the output of the deep convolutional neural network.
2. The stereo image quality evaluation method based on projection weight normalization according to claim 1, wherein:
the specific steps of obtaining the fused image
Using a Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}, the Gabor-filtered left and right views are fused into one image according to formula (1).
Wherein I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, C(x, y) denotes the pixel value of the fused image, TCE denotes the enhancement component of the current viewpoint, and TCE* denotes the suppression component from the other viewpoint; the calculation is as shown in equations (2) and (3):
wherein t represents the left or right viewpoint, gc represents the gain-enhancement threshold, ge represents the gain-control threshold, 48 images are obtained after Gabor filtering, the frequency information of the n-th image of viewpoint t filtered by the contrast sensitivity function is weighted by the weight of the n-th image of viewpoint t, and i, j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7};
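The filter bank described above contains 6 × 8 = 48 kernels. A minimal sketch of building such a bank; the kernel size, the Gaussian envelope width, and the mapping of f_s (cycles/degree) onto kernel coordinates are illustrative assumptions, not the patent's exact parameters:

```python
import numpy as np

def gabor_kernel(freq, theta, size=31, sigma=4.0):
    """One real-valued Gabor kernel: Gaussian envelope times a cosine
    carrier at orientation `theta`. `freq` is treated as cycles per
    kernel width here, a simplification of cycles/degree."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr / size)
    return envelope * carrier

scales = [1.5, 2.5, 3.5, 5, 7, 10]                 # f_s
orientations = [k * np.pi / 8 for k in range(8)]   # theta
bank = [gabor_kernel(f, t) for f in scales for t in orientations]  # 48 kernels
```

Each view would be convolved with every kernel in `bank` to produce the 48 filtered images that enter the fusion of formula (1).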
image pre-processing
The normalization calculation process is shown in equation (5):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε)   (5)

wherein I(x, y) represents the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero;
convolutional neural network model
Based on the multi-scale feature extraction Inception structure and the residual network Block structure, a deep convolutional neural network model with two convolution-kernel arrangements is built. The input of the model is the small blocks obtained after cutting; the model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer and 1 fully-connected layer. Within the same layer of the Inception structure, convolution kernels of different sizes operate in parallel to extract features of the image at different scales, and convolution kernels of size 1 × 1 are introduced to reduce the network parameters and thus the computational complexity.
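The parameter saving from the 1 × 1 bottleneck can be checked by counting weights; the channel numbers below (192 in, 32 out, a 16-channel bottleneck) are illustrative GoogLeNet-style values, not the patent's:

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a conv layer with k x k kernels (biases ignored)."""
    return in_ch * out_ch * k * k

# Direct 5x5 convolution: 192 -> 32 channels
direct = conv_params(192, 32, 5)
# Inception-style: 1x1 bottleneck to 16 channels, then 5x5 to 32
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)
```

The bottleneck path needs roughly a tenth of the weights of the direct 5 × 5 convolution, which is the parameter reduction the Inception structure exploits.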
3. The stereo image quality evaluation method based on projection weight normalization according to claim 1, wherein:
(1) projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added to the weight matrix W of each layer:
min ℓ(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E,  i = 1, 2, …, L
wherein W = {W_i | i = 1, 2, …, L} denotes the set of network weight matrices, whose elements are the weight matrices of layers 1 to L; ℓ(y, f(x; W)) denotes the loss function, y being the desired output and f(x; W) the actual output; ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies
ddiag(ww^T) = E  (7)
Solving the constrained problem by means of Riemannian optimization theory yields the Riemannian gradient in the manifold space:
wherein the first term is the gradient obtained without the constraint. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, the Riemannian gradient is obtained on the basis of equation (8):
Compared with the original gradient, the Riemannian gradient differs by one subtracted term; the norm of this subtracted term is analyzed:
the original gradient is adopted for calculation to reduce the calculation amount, and the formula (11) is adopted for weight updating:
(2) batch normalization of data
The data batch normalization method is shown in formula (12): during training, the mean μ_B and the variance σ_B² are computed for the data of each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, as shown in formulas (13) and (14), E[x] is represented by the mean of the means of all training batches, Var[x] by the unbiased estimate of the variance over all training batches, and m is the size of each batch:
E[x] = E_B[μ_B]  (13)
Var[x] = (m / (m − 1)) · E_B[σ_B²]  (14)
Therefore, in the testing stage, the data batch normalization formula is given by formula (15): y = γ · (x − E[x]) / √(Var[x] + ε) + β. The parameters γ and β perform scaling and translation, restoring the expressive capacity of the model and improving the generalization performance of the network.
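A minimal sketch of the training-time normalization of formula (12) and the test-time form of formulas (13)-(15); γ and β are scalars here for simplicity:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Formula (12): normalize each feature with the batch mean and
    variance, then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)        # mu_B
    var = x.var(axis=0)        # sigma_B^2 (biased, per the BN paper)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mu, var

def batch_norm_test(x, running_mean, running_var, gamma, beta, m, eps=1e-5):
    """Formulas (13)-(15): at test time use E[x] (the mean of the batch
    means) and the unbiased variance estimate m/(m-1) * E_B[sigma_B^2]."""
    var = running_var * m / (m - 1)
    return gamma * (x - running_mean) / np.sqrt(var + eps) + beta
```

During training the per-batch statistics would be accumulated into `running_mean` and `running_var`; the sketch passes a single batch's statistics for illustration.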
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580586.6A CN110458802A (en) | 2019-06-28 | 2019-06-28 | Based on the projection normalized stereo image quality evaluation method of weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458802A true CN110458802A (en) | 2019-11-15 |
Family
ID=68481840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910580586.6A Pending CN110458802A (en) | 2019-06-28 | 2019-06-28 | Based on the projection normalized stereo image quality evaluation method of weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458802A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583377A (en) * | 2020-06-10 | 2020-08-25 | 江苏科技大学 | Volume rendering viewpoint evaluation and selection method for improving wind-driven optimization |
CN111915589A (en) * | 2020-07-31 | 2020-11-10 | 天津大学 | Stereo image quality evaluation method based on hole convolution |
CN112164056A (en) * | 2020-09-30 | 2021-01-01 | 南京信息工程大学 | No-reference stereo image quality evaluation method based on interactive convolution neural network |
CN112257709A (en) * | 2020-10-23 | 2021-01-22 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN113205503A (en) * | 2021-05-11 | 2021-08-03 | 宁波海上鲜信息技术股份有限公司 | Satellite coastal zone image quality evaluation method |
CN117269992A (en) * | 2023-08-29 | 2023-12-22 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108389192A (en) * | 2018-02-11 | 2018-08-10 | 天津大学 | Stereo-picture Comfort Evaluation method based on convolutional neural networks |
CN108537777A (en) * | 2018-03-20 | 2018-09-14 | 西京学院 | A kind of crop disease recognition methods based on neural network |
CN108769671A (en) * | 2018-06-13 | 2018-11-06 | 天津大学 | Stereo image quality evaluation method based on adaptive blending image |
CN109360178A (en) * | 2018-10-17 | 2019-02-19 | 天津大学 | Based on blending image without reference stereo image quality evaluation method |
CN109671023A (en) * | 2019-01-24 | 2019-04-23 | 江苏大学 | A kind of secondary method for reconstructing of face image super-resolution |
CN109714592A (en) * | 2019-01-31 | 2019-05-03 | 天津大学 | Stereo image quality evaluation method based on binocular fusion network |
CN109902202A (en) * | 2019-01-08 | 2019-06-18 | 国家计算机网络与信息安全管理中心 | A kind of video classification methods and device |
2019-06-28: CN CN201910580586.6A patent/CN110458802A/en active Pending
Non-Patent Citations (2)
Title |
---|
Lei Huang et al.: "Projection Based Weight Normalization for Deep Neural Networks", 《ARXIV》 *
Sergey Ioffe et al.: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", 《Proceedings of the 32nd International Conference on Machine Learning》 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583377A (en) * | 2020-06-10 | 2020-08-25 | 江苏科技大学 | Volume rendering viewpoint evaluation and selection method for improving wind-driven optimization |
CN111583377B (en) * | 2020-06-10 | 2024-01-09 | 江苏科技大学 | Improved wind-driven optimized volume rendering viewpoint evaluation and selection method |
CN111915589A (en) * | 2020-07-31 | 2020-11-10 | 天津大学 | Stereo image quality evaluation method based on hole convolution |
CN112164056A (en) * | 2020-09-30 | 2021-01-01 | 南京信息工程大学 | No-reference stereo image quality evaluation method based on interactive convolution neural network |
CN112164056B (en) * | 2020-09-30 | 2023-08-29 | 南京信息工程大学 | No-reference stereoscopic image quality evaluation method based on interactive convolutional neural network |
CN112257709A (en) * | 2020-10-23 | 2021-01-22 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN112257709B (en) * | 2020-10-23 | 2024-05-07 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN113205503A (en) * | 2021-05-11 | 2021-08-03 | 宁波海上鲜信息技术股份有限公司 | Satellite coastal zone image quality evaluation method |
CN117269992A (en) * | 2023-08-29 | 2023-12-22 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
CN117269992B (en) * | 2023-08-29 | 2024-04-19 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458802A (en) | Based on the projection normalized stereo image quality evaluation method of weight | |
CN107633513B (en) | 3D image quality measuring method based on deep learning | |
CN107563422B (en) | A kind of polarization SAR classification method based on semi-supervised convolutional neural networks | |
CN109360178B (en) | Fusion image-based non-reference stereo image quality evaluation method | |
Zhou et al. | Blind quality estimator for 3D images based on binocular combination and extreme learning machine | |
CN109376787B (en) | Manifold learning network and computer vision image set classification method based on manifold learning network | |
CN108389192A (en) | Stereo-picture Comfort Evaluation method based on convolutional neural networks | |
CN111429402B (en) | Image quality evaluation method for fusion of advanced visual perception features and depth features | |
Liu et al. | No-reference quality assessment for contrast-distorted images | |
Wang et al. | GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition | |
CN108389189B (en) | Three-dimensional image quality evaluation method based on dictionary learning | |
Jiang et al. | Learning a referenceless stereopair quality engine with deep nonnegativity constrained sparse autoencoder | |
CN108875655A (en) | A kind of real-time target video tracing method and system based on multiple features | |
Sun et al. | Learning local quality-aware structures of salient regions for stereoscopic images via deep neural networks | |
CN109788275A (en) | Naturality, structure and binocular asymmetry are without reference stereo image quality evaluation method | |
Niu et al. | Siamese-network-based learning to rank for no-reference 2D and 3D image quality assessment | |
CN111882516B (en) | Image quality evaluation method based on visual saliency and deep neural network | |
CN111915589A (en) | Stereo image quality evaluation method based on hole convolution | |
Chang et al. | Blind image quality assessment by visual neuron matrix | |
Liu et al. | A multiscale approach to deep blind image quality assessment | |
Li et al. | MCANet: Multi-channel attention network with multi-color space encoder for underwater image classification | |
CN108428226B (en) | Distortion image quality evaluation method based on ICA sparse representation and SOM | |
CN113810683A (en) | No-reference evaluation method for objectively evaluating underwater video quality | |
CN116664462B (en) | Infrared and visible light image fusion method based on MS-DSC and I_CBAM | |
Guan et al. | No-reference stereoscopic image quality assessment on both complex contourlet and spatial domain via Kernel ELM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191115 |