CN108377387A - Virtual reality method for evaluating video quality based on 3D convolutional neural networks - Google Patents

Virtual reality video quality evaluation method based on 3D convolutional neural networks

Info

Publication number
CN108377387A
CN108377387A (application CN201810240647.XA)
Authority
CN
China
Prior art keywords
video
cnn
videos
frame
virtual reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810240647.XA
Other languages
Chinese (zh)
Inventor
杨嘉琛 (Yang Jiachen)
刘天麟 (Liu Tianlin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810240647.XA priority Critical patent/CN108377387A/en
Publication of CN108377387A publication Critical patent/CN108377387A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00: Diagnosis, testing or measuring for television systems or their details
    • H04N 17/004: Diagnosis, testing or measuring for digital television systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The present invention relates to a VR video quality evaluation method based on 3D CNN, comprising the following steps. Video preprocessing: a VR difference video is obtained from the left-view and right-view videos of a VR video; frames are uniformly sampled from the difference video and each frame is cut into non-overlapping blocks; the blocks at the same position across the sampled frames constitute one VR video patch, so that enough data are generated for training the 3D CNN. Establishing a 3D CNN model. Training the 3D CNN model: using stochastic gradient descent, VR video patches are taken as input, each patch carries the quality score of its original video as its label, and the patches are fed into the network in batches; after many iterations the weights of each layer are fully optimized, finally yielding a convolutional neural network model that can be used to evaluate virtual reality video quality. Obtaining the final result. The present invention improves the accuracy of objective evaluation.

Description

Virtual reality video quality evaluation method based on 3D convolutional neural networks
Technical field
The invention belongs to the field of video processing and relates to a virtual reality video quality evaluation method.
Background technology
As a new simulation and interaction technique, virtual reality (VR) technology is used in many fields such as architecture, gaming, and the military. It can create a virtual environment consistent with the rules of the real world, or build a simulated environment completely detached from reality, bringing people a more realistic audiovisual and immersive experience [1]. As an important carrier of virtual reality, panoramic stereoscopic video is currently the form closest to the definition of VR video and plays a huge role. However, during acquisition, storage, and transmission, VR videos inevitably suffer some distortions due to equipment and processing limitations, which in turn affect their quality. Studying an evaluation method that can effectively evaluate virtual reality video quality is therefore of great importance. Subjective evaluation, however, is easily disturbed by many factors, is time-consuming and laborious, and its results are not stable enough. Compared with subjective evaluation, objective evaluation assesses quality in software form, requires neither participants nor large-scale subjective tests, is easy to operate, correlates highly with subjective evaluation, and is receiving more and more attention from researchers.
Since virtual reality technology has emerged only in recent years, there is currently no specification standard or objective evaluation system for VR video [2]. VR video is characterized by realism, immersion, and stereoscopic perception [3]; among conventional multimedia types, stereoscopic video is closest to VR video in its characteristics, so evaluating VR video needs to draw on current ideas in stereoscopic video quality evaluation. Current objective evaluation methods for stereoscopic video fall mainly into three classes: the first class comprises evaluation methods based on the human visual system (HVS); the second class comprises evaluation methods based on image features combined with machine learning; the third class comprises evaluation methods using deep learning. All of the above provide good reference for the objective evaluation of VR video.
[1] Minderer M, Harvey C D, Donato F, et al. Neuroscience: Virtual reality explored. Nature, 2016, 533(7603): 324.
[2] X. Ge, L. Pan, Q. Li. Multi-Path Cooperative Communications Networks for Augmented and Virtual Reality Transmission. IEEE Transactions on Multimedia, vol. 19, no. 10, pp. 2345-2358, 2017.
[3] Hosseini M, Swaminathan V. Adaptive 360 VR Video Streaming: Divide and Conquer. IEEE International Symposium on Multimedia, 2017: 107-110.
Invention content
The object of the present invention is to establish a VR video quality evaluation method that fully considers virtual reality characteristics. The VR video objective quality evaluation method proposed by the present invention uses the deep learning model 3D convolutional neural network (3D CNN) to let the machine extract VR video features, rather than relying on traditional manual feature extraction, and the 3D CNN model can fully consider the temporal motion information of the video. At the same time, the present invention designs a score fusion strategy fitting the production and playback characteristics of VR video, so as to make an accurate and objective assessment. The technical solution is as follows:
A VR video quality evaluation method based on 3D CNN, comprising the following steps:
1) Video preprocessing: a VR difference video is obtained from the left-view and right-view videos of a VR video; frames are uniformly sampled from the difference video and each frame is cut into non-overlapping blocks; the blocks at the same position across the sampled frames constitute one VR video patch, so as to generate enough data for training the 3D CNN;
2) Establishing the 3D CNN model: the model comprises two convolutional layers, two pooling layers, and two fully connected layers; the activation function is the rectified linear unit (ReLU), and a Dropout strategy is used to prevent overfitting; the layer structure and training parameters of the network are then adjusted to achieve a better classification effect;
3) Training the 3D CNN model: using stochastic gradient descent, VR video patches are taken as input and each patch carries the quality score of its original video as its label; the patches are fed into the network in batches, and after many iterations the weights of each layer of the network are fully optimized, finally yielding a convolutional neural network model that can be used to evaluate virtual reality video quality;
4) Obtaining the final result: the trained 3D CNN produces a score for each VR video patch; the score fusion strategy then assigns different weights to the VR video patches at different positions, and the weighted result is the final objective quality score of the virtual reality video.
The VR video objective quality evaluation method proposed by the present invention uses a recent deep learning model that can extract higher-dimensional features of VR video: it requires no manual extraction of video features, lets the machine itself learn the features it needs, and fully takes the temporal motion information of the video into account. In addition, the present invention combines the production and playback characteristics of VR video, assigning different weights to the scores of different video patches and weighting them through the score fusion strategy to express the objective quality of the VR video as a whole. The video preprocessing method adopted by the present invention is simple and highly practical, and the proposed test model is fast and easy to operate. The VR video objective quality scores obtained by this method are highly consistent with subjective evaluation results and can accurately reflect the quality of VR videos.
Description of the drawings
Fig. 1 Flow chart of VR video preprocessing.
Fig. 2 3D CNN network framework.
Fig. 3 3D convolution diagram.
Fig. 4 Scatter plots of subjective vs. objective scores: (a) symmetric distortion, (b) asymmetric distortion, (c) H.264 distortion, (d) JPEG2000 distortion.
Specific implementation mode
The VR video quality evaluation method based on 3D CNN provided by the present invention treats each distorted VR video as a pair consisting of a left video V_l and a right video V_r. The evaluation method comprises the following steps:
Step 1: Build the difference video V_d according to the stereoscopic perception principle. First, every frame of the original VR videos and of the distorted VR videos is converted to grayscale; the left video V_l and the right video V_r are then used to compute the required difference video. The value of the difference video V_d at video location (x, y, z) is given by formula (1):
V_d(x, y, z) = |V_l(x, y, z) − V_r(x, y, z)|   (1)
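As an illustrative sketch of formula (1) (the function name and the (frames, height, width) array layout are assumptions, not part of the patent), the difference video can be computed in NumPy as:

```python
import numpy as np

def difference_video(v_left: np.ndarray, v_right: np.ndarray) -> np.ndarray:
    """Formula (1): per-pixel absolute difference of the two grayscale views.

    Both inputs are frame stacks of shape (frames, height, width).
    """
    if v_left.shape != v_right.shape:
        raise ValueError("left and right videos must have the same shape")
    # Cast to a signed type so the subtraction cannot wrap around for uint8 input.
    return np.abs(v_left.astype(np.int16) - v_right.astype(np.int16)).astype(np.uint8)

# Tiny synthetic example: 2 frames of 4x4 grayscale video.
left = np.full((2, 4, 4), 100, dtype=np.uint8)
right = np.full((2, 4, 4), 90, dtype=np.uint8)
diff = difference_video(left, right)
```

The signed-type cast matters: subtracting uint8 arrays directly would wrap around wherever the right view is brighter than the left.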
Step 2: Cut the VR difference video into blocks to form video patches and thereby expand the capacity of the data set. Specifically, 1 frame is extracted every 8 frames from each VR difference video, for N frames in total. At each position of every extracted frame, a square image block of 32 × 32 pixels is cut; the image blocks of the same video at the same position then constitute one VR video patch. In order to fully extract the spatial information of the video, each frame is cut uniformly into non-overlapping image blocks, so every frame is cut into M image blocks. Depending on the resolution, M video patches of size 32 × 32 × N can thus be extracted from each VR video.
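The block-cutting of Step 2 can be sketched as follows, assuming the difference video is a (frames, height, width) array; the helper name and the cropping of any ragged border are illustrative assumptions:

```python
import numpy as np

def extract_patches(video: np.ndarray, block: int = 32, step: int = 8) -> np.ndarray:
    """Sample 1 frame every `step` frames, then cut each sampled frame into
    non-overlapping `block` x `block` tiles; tiles sharing a position across
    frames form one patch. Returns an array of shape (M, block, block, N)."""
    frames = video[::step]                            # (N, H, W)
    n, h, w = frames.shape
    h, w = h - h % block, w - w % block               # crop any ragged border
    patches = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = frames[:, y:y + block, x:x + block]   # (N, 32, 32)
            patches.append(np.moveaxis(tile, 0, -1))     # -> (32, 32, N)
    return np.stack(patches)

video = np.zeros((24, 64, 96), dtype=np.uint8)   # 24 frames of 64x96 video
patches = extract_patches(video)
# 24/8 = 3 sampled frames; (64//32) * (96//32) = 6 patches of 32 x 32 x 3
```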
Step 3: Build and train the 3D CNN deep learning model. The model of the present invention consists of two 3D convolutional layers, two 3D pooling layers, and two fully connected layers. On the basis of 2D CNN, 3D CNN considers the information across multiple inputs and can effectively extract the temporal motion information of the video; it is therefore necessary to describe the convolution and pooling processes of 3D CNN. The 3D convolution can be written as
V_i^l = f( Σ_k V_k^{l−1} * W_i^l + b_i^l )
where k indexes the feature maps in layer (l−1) that are connected to the current convolution kernel, V_k^{l−1} is the k-th 3D feature map in layer (l−1), W_i^l is the i-th 3D convolution kernel in layer l, and * denotes 3D convolution over V_k^{l−1}. An additive bias term b_i^l and a nonlinear activation function f(·), such as the sigmoid function, the hyperbolic tangent function, or the rectified linear function, are applied to obtain the final feature map.
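A naive sketch of the 3D convolution just described, computing one output feature map with valid padding and stride 1 (the helper name and the choice of ReLU as f(·) are illustrative assumptions):

```python
import numpy as np

def conv3d_single(feature_maps, kernels, bias):
    """One output feature map of a 3D convolution layer.

    feature_maps: list of 3D arrays V_k^{l-1}, each of shape (T, H, W)
    kernels:      one matching 3D kernel W per input map, shape (t, h, w)
    bias:         scalar additive bias b
    The convolutions over all connected input maps are summed, the bias is
    added, and ReLU is applied as the nonlinearity f.
    """
    t, h, w = kernels[0].shape
    T, H, W = feature_maps[0].shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for v, kern in zip(feature_maps, kernels):
        for z in range(out.shape[0]):
            for y in range(out.shape[1]):
                for x in range(out.shape[2]):
                    out[z, y, x] += np.sum(v[z:z+t, y:y+h, x:x+w] * kern)
    return np.maximum(out + bias, 0.0)   # f = ReLU

v = np.ones((4, 5, 5))        # one input feature map
k = np.ones((2, 3, 3))        # one 2x3x3 kernel
fm = conv3d_single([v], [k], bias=0.0)
# each output value sums 2*3*3 = 18 ones
```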
The 3D pooling can be written as
P_i^l(x, y, z) = max over 0 ≤ a < m, 0 ≤ b < n, 0 ≤ c < j of V_i^l(m·x + a, n·y + b, j·z + c)
where m, n, and j give the size of the region selected in the feature map.
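A sketch of non-overlapping 3D pooling with window (m, n, j), assuming max pooling (the helper name is illustrative):

```python
import numpy as np

def max_pool3d(v: np.ndarray, m: int = 2, n: int = 2, j: int = 2) -> np.ndarray:
    """Non-overlapping 3D max pooling with window (m, n, j) over a (T, H, W) map."""
    T, H, W = v.shape
    T, H, W = T - T % m, H - H % n, W - W % j   # crop to whole windows
    v = v[:T, :H, :W]
    # Reshape so each pooling window gets its own axes, then take the max.
    blocks = v.reshape(T // m, m, H // n, n, W // j, j)
    return blocks.max(axis=(1, 3, 5))

v = np.arange(64, dtype=float).reshape(4, 4, 4)   # v[z, y, x] = 16z + 4y + x
p = max_pool3d(v)
# p[0,0,0] is the max over v[0:2, 0:2, 0:2], i.e. v[1,1,1] = 21
```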
In the present invention the 3D CNN uses stochastic gradient descent for training and ReLU as the activation function. To prevent overfitting, the present invention adopts dropout strategies: a dropout with parameter 0.5 is used after each pooling layer, and a dropout with parameter 0.25 after the first fully connected layer. The mini-batch size of the network is 128, and the learning rate of model training is set to 0.001. In addition, batch normalization follows each convolution and its activation to accelerate network training. The objective function adopted by this model is
L = (1/N) Σ_i (y_i − f(x_i))² + λ‖w‖²
where λ is the regularization parameter, y_i is the ground-truth quality score, and f(x_i) is the predicted score. After the model is built, 80% of the data are used for training and 20% for testing.
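The objective function above can be sketched as mean squared prediction error plus an L2 weight penalty; the value of λ and the helper name are illustrative assumptions:

```python
import numpy as np

def objective(y_true: np.ndarray, y_pred: np.ndarray, weights: np.ndarray,
              lam: float = 0.01) -> float:
    """Mean squared prediction error plus lambda * ||w||^2 regularization."""
    mse = np.mean((y_true - y_pred) ** 2)
    return float(mse + lam * np.sum(weights ** 2))

y = np.array([3.0, 4.0, 5.0])        # ground-truth quality scores y_i
pred = np.array([2.5, 4.0, 5.5])     # predicted scores f(x_i)
w = np.array([0.3, -0.4])            # a stand-in for the network weights
loss = objective(y, pred, w, lam=0.1)
```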
Step 4: After the depth model produces the video patch scores, the final VR video score is obtained through the score fusion strategy. The score fusion strategy used in the present invention assigns different weights to VR video patches at different positions according to the equirectangular projection mode of VR videos, so as to obtain the final objective quality score. The equirectangular projection significantly stretches the polar parts of the video during projection and thus affects the spatial distribution of the VR video in the planar mode. Since objective quality evaluation methods take the planar video as input while the subjective evaluation score is based on the perceptual experience of the spherical video, the present invention designs the score fusion strategy shown in formula (4):
S_f = Σ_{x,y} W_{xy} S_{xy} / Σ_{x,y} W_{xy}   (4)
where S_f is the final score, S_{xy} is the predicted score of the video patch at frame position (x, y), x is the width position, y is the height position, W_{xy} is the weight of the corresponding position, h is the vertical height of the VR video, and h' is the vertical distance of the patch center from the VR video center; the weight W_{xy} is determined by h and h' so that patches closer to the poles (larger h') receive smaller weight.
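A sketch of the score fusion of formula (4). The patent only states that the weight W_xy shrinks as h' grows; the cosine fall-off cos(pi * h'/h) used below is a hypothetical choice common for equirectangular content, not taken from the patent:

```python
import numpy as np

def fuse_scores(patch_scores: np.ndarray) -> float:
    """Weighted average of patch scores laid out on a (rows, cols) grid.

    Each row's weight is cos(pi * h'/h), where h' is the vertical distance of
    the patch row from the video center and h is the video height. This exact
    weight is a hypothetical choice; the patent only requires that the weight
    decrease toward the poles.
    """
    rows, cols = patch_scores.shape
    centers = (np.arange(rows) + 0.5) / rows      # row centers in [0, 1]
    h_prime = np.abs(centers - 0.5)               # distance from the equator
    w_row = np.cos(np.pi * h_prime)               # weight per row
    weights = np.repeat(w_row[:, None], cols, axis=1)
    return float(np.sum(weights * patch_scores) / np.sum(weights))

uniform = np.full((4, 6), 3.5)        # equal patch scores fuse to themselves
final = fuse_scores(uniform)

equator_heavy = np.zeros((4, 6)); equator_heavy[1:3, :] = 10.0
polar_heavy = np.zeros((4, 6)); polar_heavy[0, :] = 10.0; polar_heavy[3, :] = 10.0
```

With this weighting, high scores near the equator count for more than the same scores near the poles, matching the stated intent of formula (4).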
Step 5: Choose the database. To demonstrate that the objective quality scores predicted by the method of the present invention are highly consistent with subjective quality scores and can accurately reflect quality, the method of the present invention is tested on the VRQ-TJU database. The database contains 13 original VR videos and 364 distorted VR videos; the distortion types include H.264 and JPEG2000, covering both symmetric and asymmetric distortion, with 104 symmetrically distorted videos and 260 asymmetrically distorted videos.
Four internationally common indices for objective image quality evaluation algorithms are used to evaluate the performance of the method of the present invention: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SRCC), the Kendall rank-order correlation coefficient (KROCC), and the root mean squared error (RMSE). The closer the three correlation coefficients are to 1 and the smaller the RMSE, the more accurate the algorithm.
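The four indices can be sketched in NumPy as follows (tie handling in the rank correlations is omitted for brevity; the helper names are illustrative):

```python
import numpy as np

def plcc(a, b):
    """Pearson linear correlation coefficient."""
    a, b = a - a.mean(), b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def srcc(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return plcc(rank(a), rank(b))

def krocc(a, b):
    """Kendall rank-order correlation over all pairs (no tie correction)."""
    n, s = len(a), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(2 * s / (n * (n - 1)))

def rmse(a, b):
    """Root mean squared error between subjective and predicted scores."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

subj = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 2.2, 2.9, 4.3, 4.8])
# pred is monotone in subj, so SRCC and KROCC are exactly 1.
```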
Step 6: Analyze and compare algorithm performance. To verify the pertinence and effectiveness of the present invention for VR video quality evaluation, the present invention is compared on the database against one representative method from each of image quality assessment (IQA), stereoscopic image quality assessment (SIQA), video quality assessment (VQA), and stereoscopic video quality assessment (SVQA), corresponding in turn to [1], [2], [3], and [4].
Table 1 Overall performance indices
Table 2 Indices of the present invention for different distortion types
[1] A. Liu, W. Lin, and M. Narwaria. Image quality assessment based on gradient similarity. IEEE Transactions on Image Processing, 21(4): 1500, 2012.
[2] Alexandre Benoit, Patrick Le Callet, Patrizio Campisi, and Romain Cousseau. Using disparity for quality assessment of stereoscopic images. IEEE International Conference on Image Processing, pages 389-392, 2008.
[3] Kalpana Seshadrinathan, Rajiv Soundararajan, Alan Conrad Bovik, and Lawrence K. Cormack. Study of subjective and objective quality assessment of video. IEEE Transactions on Image Processing, 19(6): 1427-1441, 2010.
[4] Nukhet Ozbek and A. Murat Tekalp. Unequal inter-view rate allocation using scalable stereo video coding and an objective stereo video quality measure. In IEEE Intern.

Claims (1)

1. A VR video quality evaluation method based on 3D CNN, comprising the following steps:
1) Video preprocessing: a VR difference video is obtained from the left-view and right-view videos of a VR video; frames are uniformly sampled from the difference video and each frame is cut into non-overlapping blocks; the blocks at the same position across the sampled frames constitute one VR video patch, so as to generate enough data for the training of the 3D CNN;
2) Establishing the 3D CNN model: the model comprises two convolutional layers, two pooling layers, and two fully connected layers; the activation function is the rectified linear unit (ReLU), and a Dropout strategy is used to prevent overfitting; the layer structure and training parameters of the network are then adjusted to achieve a better classification effect;
3) Training the 3D CNN model: using stochastic gradient descent, VR video patches are taken as input and each patch carries the quality score of its original video as its label; the patches are fed into the network in batches, and after many iterations the weights of each layer of the network are fully optimized, finally yielding a convolutional neural network model that can be used to evaluate virtual reality video quality;
4) Obtaining the final result: the trained 3D CNN produces a score for each VR video patch; the score fusion strategy then assigns different weights to the VR video patches at different positions, and the weighted result is the final objective quality score of the virtual reality video.
CN201810240647.XA 2018-03-22 2018-03-22 Virtual reality method for evaluating video quality based on 3D convolutional neural networks Pending CN108377387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810240647.XA CN108377387A (en) 2018-03-22 2018-03-22 Virtual reality method for evaluating video quality based on 3D convolutional neural networks


Publications (1)

Publication Number Publication Date
CN108377387A 2018-08-07

Family

ID=63019046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810240647.XA Pending CN108377387A (en) 2018-03-22 2018-03-22 Virtual reality method for evaluating video quality based on 3D convolutional neural networks

Country Status (1)

Country Link
CN (1) CN108377387A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015017796A2 (en) * 2013-08-02 2015-02-05 Digimarc Corporation Learning systems and methods
CN105898279A (en) * 2016-06-01 2016-08-24 宁波大学 Stereoscopic image quality objective evaluation method
US20170270653A1 (en) * 2016-03-15 2017-09-21 International Business Machines Corporation Retinal image quality assessment, error identification and automatic quality correction
CN107633513A (en) * 2017-09-18 2018-01-26 天津大学 The measure of 3D rendering quality based on deep learning
CN107766249A (en) * 2017-10-27 2018-03-06 广东电网有限责任公司信息中心 A kind of software quality comprehensive estimation method of Kernel-based methods monitoring


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615627A (en) * 2018-12-14 2019-04-12 国网山东省电力公司信息通信公司 A kind of power transmission and transformation inspection image quality evaluating method and system
US11315354B2 (en) 2018-12-24 2022-04-26 Samsung Electronics Co., Ltd. Method and apparatus that controls augmented reality (AR) apparatus based on action prediction
CN109871124A (en) * 2019-01-25 2019-06-11 华南理工大学 Emotion virtual reality scenario appraisal procedure based on deep learning
CN109871124B (en) * 2019-01-25 2020-10-27 华南理工大学 Emotion virtual reality scene evaluation method based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180807)