CN107018400B - A method for converting 2D video into 3D video - Google Patents

A method for converting 2D video into 3D video

Info

Publication number
CN107018400B
CN107018400B CN201710227433.4A
Authority
CN
China
Prior art keywords
convolution
feature matrix
layer
matrix
obtains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710227433.4A
Other languages
Chinese (zh)
Other versions
CN107018400A (en)
Inventor
曹治国
赵富荣
肖阳
李炽
张骁迪
鲜可
李睿博
李然
张润泽
杨佳琪
朱延俊
赵峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710227433.4A priority Critical patent/CN107018400B/en
Publication of CN107018400A publication Critical patent/CN107018400A/en
Application granted granted Critical
Publication of CN107018400B publication Critical patent/CN107018400B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/139Format conversion, e.g. of frame-rate or size

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Complex Calculations (AREA)

Abstract

A method for converting 2D video into 3D video, belonging to the field of pattern recognition and computer vision. Its purpose is to eliminate the unpredictable errors introduced by the prior art's scene-depth estimation and view synthesis, while greatly improving computation speed. The invention comprises a training stage and a usage stage. The training stage consists, in order, of data input, feature extraction, feature fusion, view synthesis and parameter update steps; the usage stage consists, in order, of data input, feature extraction, feature fusion and view synthesis steps. The training stage trains on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, optimizing scene-depth estimation and view synthesis jointly to determine the parameters. This guarantees pixel-level accuracy in the predicted right view and reduces the error introduced when 2D-to-3D conversion is split into two separate tasks. Once training is complete, 2D video can be converted to 3D video directly, which greatly improves conversion efficiency and guarantees the accuracy of the final output 3D stereoscopic video.

Description

A method for converting 2D video into 3D video
Technical field
The invention belongs to the field of pattern recognition and computer vision, and specifically relates to a method for converting 2D video into 3D video, for directly converting planar 2D video shot with an ordinary camera into left-right format 3D stereoscopic video of the kind that can be watched in a cinema.
Background technology
With the development of virtual-reality technology, immersing the audience in a 3D experience has increasingly become a very important direction in the multimedia entertainment field. A 3D experience requires the support of panoramic video and 3D effects; to obtain the 3D viewing experience of a cinema from ordinary planar video, the 2D planar video must be converted into 3D video. 3D video comes in many formats. For left-right format 3D stereoscopic video, 2D-to-3D conversion usually first uses the various geometric relationships and scene-semantic information in the 2D video images to estimate the front-to-back layering of the objects in the images; then, from this spatial layout, a geometric mapping synthesizes the image of another viewpoint; finally the left and right viewpoint images are merged, ultimately generating the 3D stereoscopic video.
Traditional 2D-to-3D video conversion is approached through two routes, hardware and software. The hardware route mainly comprises stereoscopic projectors and glasses-free 3D televisions. These first compress the width of the input 2D video images to half the original, then apply a horizontal offset or mapping with no scene analysis, so that the same object ends up at different positions in the left and right viewpoints, hoping to obtain a 3D effect from the perceptual principles of the human eye. This is a very crude method, and the 3D effect is not significant in most scenes; the 2D-to-3D function of such products on the market is limited by product quality, and their true audience is small. Most manufacturers focus instead on improving hardware performance, i.e., given a well-converted 3D stereoscopic video as input, their hardware produces a relatively good stereoscopic display. The software route realizes the 2D-to-3D function mainly through algorithms: in general it estimates the depth layering of objects in the scene, performs semantic analysis and view synthesis to obtain an additional video channel, and the two different viewpoint images present the spatial structure of the scene in different ways, so that viewers obtain a relatively good stereoscopic experience.
Solutions such as that of Sichuan Changhong Electric Co. (201110239086.X) optimize the above two steps through hardware or software algorithms in the hope of obtaining better 3D visual effects. On the one hand, these schemes consume considerable computing resources and their time cost is high; on the other hand, scene depth extraction and view synthesis techniques are still being explored, carry large computational errors in actual use, and the errors of the two parts easily accumulate and degrade the final viewing experience. Moreover, with the arrival of the big-data era, more and more good stereoscopic movies, documentaries and animations are being produced, yet present-day post-production 2D-to-3D conversion of planar video still consumes enormous manpower and material resources.
The VGG16 model involved in the present invention was proposed by K. Simonyan and A. Zisserman in "Very deep convolutional networks for large-scale image recognition", arXiv:1409.1556, 2014. The open-source project accompanying this paper includes the parameter file vgg16-0001.params, whose contents include the weights of every convolution kernel of each layer of the VGG16 model.
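As an aside for illustration (not part of the patent text): the file name vgg16-0001.params follows the MXNet checkpoint convention, so a file of this kind can be inspected with MXNet; everything beyond the cited file name is an assumption.

    # Minimal sketch: inspect a VGG16 weight file of the kind named above,
    # assuming it is in MXNet .params format (a dict of name -> NDArray).
    import mxnet as mx

    params = mx.nd.load("vgg16-0001.params")
    for name, array in sorted(params.items()):
        # keys typically look like "arg:conv1_1_weight", "arg:conv1_1_bias"
        print(name, array.shape)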
Summary of the invention
The present invention provides a method for converting 2D video into 3D video. Its purpose is to eliminate the unpredictable errors introduced by the prior art's scene-depth estimation and view synthesis, while greatly increasing the computation speed of the stereoscopic video conversion, so as to obtain a better stereoscopic viewing experience.
The method for converting 2D video into 3D video provided by the present invention comprises a training stage and a usage stage, and is characterized in that:
(1) The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) Data input step: obtain, from publicly available video data resources, left-right view format 3D stereoscopic movie clips on the order of 10^6 frames, and select those clips whose disparity range is -15~+16 as the training data set; the image size in the stereoscopic movie clips is H rows by W columns, H = 200~1920, W = 180~1080;
(1.2) Feature extraction step: split the left-right view format stereoscopic video in the training set into stereo image pairs of left and right views. Keep the right view unchanged, and apply, in order, mathematical convolution operations and pooling (down-sampling) operations to each left-view frame to extract features, obtaining the 1st through 21st layer convolution feature matrices of that frame as its image features;
(1.3) Feature fusion step: apply matrix deconvolution operations to the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices respectively, and combine the resulting first group of deconvolution feature matrices D3 through fifth group D21 by cascading to form the fusion feature matrix Dc, of size H × W × 32, i.e., H rows, W columns and 32 tensor dimensions;
(1.4) View synthesis step:
Apply the regression parameter matrix θ to the fusion feature matrix Dc to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view;
From the original left view and the disparity probability matrix Dep, obtain the synthesized right view through view synthesis;
(1.5) Parameter update step:
Compute the error matrix err_R between the synthesized right view and the right view from step (1.2);
For M consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16; propagate the resulting errS_R through the back-propagation algorithm to each sub-step of the view synthesis, feature fusion and feature extraction steps, updating in reverse order the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel of every layer, completing one update of all parameters;
Return to the feature extraction step (1.2) and continue with the remaining left views in the training set, performing the above feature extraction, feature fusion, view synthesis and parameter update steps in turn. When all left and right views in the training set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer is complete;
Following steps (1.1) to (1.5), continue with the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer; repeat in this way, and after the 50th to 200th round of updates is completed, the training stage ends;
(2) The usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step;
(2.1) Data input step: prepare the planar 2D video to be converted;
(2.2) Feature extraction step: split the planar 2D video into images, each treated as the left view of step (1.2). Apply, in order, mathematical convolution operations and pooling (down-sampling) operations to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features; the weights of every convolution kernel of every layer are those obtained after the training stage;
(2.3) Feature fusion step: identical to step (1.3);
(2.4) View synthesis step: identical to step (1.4). From the disparity probability matrix Dep obtained as in step (1.4) and the image of (2.2), obtain the synthesized right view through view synthesis;
Perform the above feature extraction step (2.2), feature fusion step (2.3) and view synthesis step (2.4) on each left-view frame in turn, stitch the original image as the left view side by side with the synthesized right view, and join the frames one after another to obtain the left-right format 3D stereoscopic video.
The feature extraction step (1.2) comprises the following sub-steps:
(1.2.1) Apply a convolution operation to a left-view frame to obtain the first layer convolution feature matrix:
Use a 3 × 3 convolution kernel with stride 1. Starting from the upper-left corner of the image, move right step by step until the right boundary of the image, then move to the next row of the image and continue moving from left to right, until the lower-right corner of the image. At each position, multiply each weight of the convolution kernel by the pixel value of the image at the corresponding position and sum all the products, giving the convolution value of the image region covered by the kernel. Arrange the convolution values of all image regions according to their original positions to form the first-level convolution feature matrix C1_1 of the frame, and set all negative element values of C1_1 to zero;
In total, 64 convolution kernels of size 3 × 3 are applied to the image as above, giving 64 first-level convolution feature matrices C1_1, C1_2, C1_3 … C1_64, which form the first layer convolution feature matrix;
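For illustration only (not part of the patent text): a minimal NumPy sketch of the single-kernel operation of sub-step (1.2.1), a 3 × 3 convolution with stride 1 followed by zeroing of negative values. The function name and the shrunken ("valid") output size are assumptions.

    # Minimal sketch: one 3x3 convolution, stride 1, then zeroing of
    # negative values (i.e. ReLU), as described in sub-step (1.2.1).
    import numpy as np

    def conv3x3_relu(image, kernel):
        """image: H x W array, kernel: 3 x 3 array -> (H-2) x (W-2) array."""
        h, w = image.shape
        out = np.zeros((h - 2, w - 2))
        for i in range(h - 2):            # slide top to bottom
            for j in range(w - 2):        # slide left to right
                region = image[i:i + 3, j:j + 3]
                out[i, j] = np.sum(region * kernel)
        return np.maximum(out, 0.0)       # negative elements set to zero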
(1.2.2) Apply a convolution operation to the first layer convolution feature matrix to obtain the second layer convolution feature matrix:
Use a 3 × 3 convolution kernel with stride 1. Starting from the upper-left corner of the first-level convolution feature matrix C1_1, move right step by step until the right boundary of C1_1, then move to the next row and continue moving from left to right, until the lower-right corner of C1_1. At each position, multiply each weight of the kernel by the matrix element of C1_1 at the corresponding position and sum all the products, giving the convolution value of the region of C1_1 covered by the kernel. Arrange the convolution values of all regions of C1_1 according to their original positions to form a convolution feature matrix, and set all its negative element values to zero;
Then, for the remaining 63 first-level convolution feature matrices C1_2, C1_3 … C1_64, repeat the operation of the preceding paragraph with this layer's corresponding convolution kernels, obtaining 64 convolution feature matrices in total, which are added directly to form the second-level convolution feature matrix C2_1;
In total, 64 convolution kernels of size 3 × 3 are applied as above to the 64 first-level convolution feature matrices C1_1, C1_2, C1_3 … C1_64 of the first layer, giving 64 second-level convolution feature matrices C2_1, C2_2, C2_3 … C2_64, which form the second layer convolution feature matrix;
(1.2.3) Apply the first pooling (down-sampling) operation to the second layer convolution feature matrix to obtain the third layer convolution feature matrix:
For the second-level convolution feature matrix C2_1, use a 2 × 2 sliding window with stride 2. Starting from the upper-left corner of C2_1, move right step by step until the right boundary of C2_1, then move to the next row and continue from left to right, until the lower-right corner of C2_1. At each position, take the maximum matrix element of C2_1 within the 2 × 2 sliding-window region as the pooled sampling feature value of that region. Arrange the pooled sampling feature values of all regions of C2_1 according to their original positions to form the third-level convolution feature matrix C3_1;
Apply the above pooling (down-sampling) operation in turn to the remaining 63 second-level convolution feature matrices C2_2, C2_3 … C2_64, obtaining 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 in total, which form the third layer convolution feature matrix;
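For illustration only: a minimal NumPy sketch of the 2 × 2, stride-2 max pooling of sub-step (1.2.3); the function name is an assumption.

    # Minimal sketch of the 2x2, stride-2 max-pooling of sub-step (1.2.3).
    import numpy as np

    def maxpool2x2(m):
        """m: H x W array with even H, W -> (H/2) x (W/2) array."""
        h, w = m.shape
        out = np.zeros((h // 2, w // 2))
        for i in range(0, h - 1, 2):
            for j in range(0, w - 1, 2):
                out[i // 2, j // 2] = m[i:i + 2, j:j + 2].max()
        return out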
(1.2.4) Apply a convolution operation to the third layer convolution feature matrix to obtain the fourth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64, obtaining 64 convolution feature matrices in total, which are added directly to form the fourth-level convolution feature matrix C4_1;
In total, 128 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 of the third layer, giving 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128, which form the fourth layer convolution feature matrix;
(1.2.5) Apply a convolution operation to the fourth layer convolution feature matrix to obtain the fifth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128, obtaining 128 convolution feature matrices in total, which are added directly to form the fifth-level convolution feature matrix C5_1;
In total, 128 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128 of the fourth layer, giving 128 fifth-level convolution feature matrices C5_1, C5_2, C5_3 … C5_128, which form the fifth layer convolution feature matrix;
(1.2.6) Apply the second pooling (down-sampling) operation to the fifth layer convolution feature matrix to obtain the sixth layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 128 fifth-level convolution feature matrices C5_1, C5_2, C5_3 … C5_128, obtaining 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 in total, which form the sixth layer convolution feature matrix;
(1.2.7) Apply a convolution operation to the sixth layer convolution feature matrix to obtain the seventh layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128, obtaining 128 convolution results, which are added directly to form the seventh-level convolution feature matrix C7_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 of the sixth layer, giving 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256, which form the seventh layer convolution feature matrix;
(1.2.8) Apply a convolution operation to the seventh layer convolution feature matrix to obtain the eighth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256, obtaining 256 convolution results, which are added directly to form the eighth-level convolution feature matrix C8_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256 of the seventh layer, giving 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256, which form the eighth layer convolution feature matrix;
(1.2.9) Apply a convolution operation to the eighth layer convolution feature matrix to obtain the ninth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256, obtaining 256 convolution results, which are added directly to form the ninth-level convolution feature matrix C9_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256 of the eighth layer, giving 256 ninth-level convolution feature matrices C9_1, C9_2, C9_3 … C9_256, which form the ninth layer convolution feature matrix;
(1.2.10) Apply the third pooling (down-sampling) operation to the ninth layer convolution feature matrix to obtain the tenth layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 256 ninth-level convolution feature matrices C9_1, C9_2, C9_3 … C9_256, obtaining 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 in total, which form the tenth layer convolution feature matrix;
(1.2.11) Apply convolution operations in turn to the tenth layer convolution feature matrix to obtain, in order, the 11th, 12th and 13th layer convolution feature matrices:
Following the operation of sub-step (1.2.7), apply a convolution operation with 512 kernels to the 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 of the tenth layer, obtaining 512 eleventh-level convolution feature matrices C11_1, C11_2, C11_3 … C11_512, which form the 11th layer convolution feature matrix;
Following the operation of sub-step (1.2.8), apply a convolution operation with 512 kernels to the 512 eleventh-level convolution feature matrices C11_1, C11_2, C11_3 … C11_512 of the 11th layer, obtaining 512 twelfth-level convolution feature matrices C12_1, C12_2, C12_3 … C12_512, which form the 12th layer convolution feature matrix;
Following the operation of sub-step (1.2.9), apply a convolution operation with 512 kernels to the 512 twelfth-level convolution feature matrices C12_1, C12_2, C12_3 … C12_512 of the 12th layer, obtaining 512 thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512, which form the 13th layer convolution feature matrix;
(1.2.12) Apply the fourth pooling (down-sampling) operation to the thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512 to obtain the 14th layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 512 thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512, obtaining 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 in total, which form the 14th layer convolution feature matrix;
(1.2.13) Apply convolution operations in turn to the 14th layer convolution feature matrix to obtain, in order, the 15th, 16th and 17th layer convolution feature matrices:
Following the operation of sub-step (1.2.11), apply a convolution operation with 512 kernels to the 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 of the 14th layer, obtaining 512 fifteenth-level convolution feature matrices C15_1, C15_2, C15_3 … C15_512, which form the 15th layer convolution feature matrix;
Likewise, apply a convolution operation with 512 kernels to the 512 fifteenth-level convolution feature matrices C15_1, C15_2, C15_3 … C15_512 of the 15th layer, obtaining 512 sixteenth-level convolution feature matrices C16_1, C16_2, C16_3 … C16_512, which form the 16th layer convolution feature matrix;
Likewise, apply a convolution operation with 512 kernels to the 512 sixteenth-level convolution feature matrices C16_1, C16_2, C16_3 … C16_512 of the 16th layer, obtaining 512 seventeenth-level convolution feature matrices C17_1, C17_2, C17_3 … C17_512, which form the 17th layer convolution feature matrix;
(1.2.14) Apply the fifth pooling (down-sampling) operation to the 17th layer convolution feature matrix to obtain the 18th layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 512 seventeenth-level convolution feature matrices C17_1, C17_2, C17_3 … C17_512, obtaining 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512 in total, which form the 18th layer convolution feature matrix;
(1.2.15) Apply a convolution operation to the 18th layer convolution feature matrix to obtain the 19th layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512, obtaining 512 convolution results, which are added directly to form the nineteenth-level convolution feature matrix C19_1;
In total, 4096 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512 of the 18th layer, giving 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096, which form the 19th layer convolution feature matrix;
(1.2.16) Apply a convolution operation to the 19th layer convolution feature matrix to obtain the 20th layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 1 × 1 convolution kernel with stride 1 and apply matrix convolution separately to the 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096, obtaining 4096 convolution results, which are added directly to form the twentieth-level convolution feature matrix C20_1;
In total, 4096 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096 of the 19th layer, giving 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096, which form the 20th layer convolution feature matrix;
(1.2.17) Apply a convolution operation to the 20th layer convolution feature matrix to obtain the 21st layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 1 × 1 convolution kernel with stride 1 and apply matrix convolution separately to the 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096, obtaining 4096 convolution results, which are added directly to form the twenty-first-level convolution feature matrix C21_1;
In total, 32 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096 of the 20th layer, giving 32 twenty-first-level convolution feature matrices C21_1, C21_2, C21_3 … C21_32, which form the 21st layer convolution feature matrix;
In sub-steps (1.2.1) to (1.2.17), the weights of every convolution kernel involved are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter take the weights of each layer's convolution kernels produced by the parameter update step (1.5);
For every left-view frame, sub-steps (1.2.1) to (1.2.17) yield the twenty-one layers of convolution feature matrices extracted at different scales.
The feature fusion step (1.3) comprises the following sub-steps:
(1.3.1) Apply a deconvolution operation to the third layer convolution feature matrix produced by the first pooling (down-sampling) operation, obtaining the first group of deconvolution feature matrices D3:
Using a 1 × 1 convolution kernel, apply matrix deconvolution separately to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64, obtaining 64 convolution outputs, which are added directly to form the first-group deconvolution feature matrix D3_1;
In total, 32 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 of the third layer, giving 32 deconvolution feature matrices D3_1, D3_2, D3_3 … D3_32, which form the first group of deconvolution feature matrices D3;
(1.3.2) Apply a deconvolution operation to the sixth layer convolution feature matrix produced by the second pooling (down-sampling) operation, obtaining the second group of deconvolution feature matrices D6:
Using a 2 × 2 convolution kernel, apply matrix deconvolution separately to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128, obtaining 128 convolution outputs, which are added directly to form the second-group deconvolution feature matrix D6_1;
In total, 32 convolution kernels of size 2 × 2 are applied, as in the preceding paragraph, to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 of the sixth layer, giving 32 deconvolution feature matrices D6_1, D6_2, D6_3 … D6_32, which form the second group of deconvolution feature matrices D6;
(1.3.3) Apply deconvolution operations to the tenth layer convolution feature matrix produced by the third pooling (down-sampling) operation and to the 14th layer convolution feature matrix produced by the fourth pooling (down-sampling) operation, obtaining the third group of deconvolution feature matrices D10 and the fourth group of deconvolution feature matrices D14 respectively:
Following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 4 × 4 are applied in total to the 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 of the tenth layer, obtaining 32 deconvolution feature matrices D10_1, D10_2, D10_3 … D10_32, which form the third group of deconvolution feature matrices D10;
In total, 32 convolution kernels of size 8 × 8 are applied to the 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 of the 14th layer, obtaining 32 deconvolution feature matrices D14_1, D14_2, D14_3 … D14_32, which form the fourth group of deconvolution feature matrices D14;
(1.3.4) Apply matrix deconvolution to the 21st layer convolution feature matrix to obtain the fifth group of deconvolution feature matrices:
Following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 16 × 16 are applied in total to the 32 twenty-first-level convolution feature matrices C21_1, C21_2, C21_3 … C21_32 of the 21st layer, obtaining 32 deconvolution feature matrices D21_1, D21_2, D21_3 … D21_32, which form the fifth group of deconvolution feature matrices D21;
(1.3.5) Combine the first group of deconvolution feature matrices D3 through the fifth group D21 by the cascade of formula (1) to form the fusion feature matrix Dc;
In sub-steps (1.3.1) to (1.3.4), the weights of every convolution kernel involved are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter take the weights of each layer's convolution kernels produced by the parameter update step (1.5).
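For illustration only: a sketch of the fusion of sub-step (1.3.5). The patent's formula (1) is not reproduced in this text; because each group D3 … D21 and the result Dc are all stated to be H × W × 32, the sketch assumes an element-wise sum, which is one possible reading of the cascade.

    # Minimal sketch of feature fusion, assuming element-wise summation
    # of the five deconvolution groups (each H x W x 32) into Dc.
    import numpy as np

    def fuse(groups):
        """groups: list of five H x W x 32 arrays -> H x W x 32 array."""
        dc = np.zeros_like(groups[0])
        for g in groups:
            dc += g
        return dc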
The view synthesis step (1.4) comprises the following sub-steps:
(1.4.1) Disparity map: for the fusion feature matrix Dc, compute formula (2) below to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view:

    Dep_k = p(y = k | x; θ) = exp(θ_k · Dc_k) / Σ_{p=1}^{32} exp(θ_p · Dc_p),   k = 1, 2, …, 32   (2)

evaluated element-wise over the H × W positions. Each Dep_k is itself a matrix, giving the probability that each pixel of the left view takes disparity k − 16 when the regression parameter matrix is θ. The regression parameter matrix θ consists of the regression parameters [θ_1, θ_2, …, θ_32], and exp(θ_p · Dc_p) denotes the logistic-regression value of the p-th tensor dimension of Dc under regression parameter θ_p, p = 1, 2, … 32. Each regression parameter of θ is initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter takes the regression parameters produced by the parameter update step (1.5);
(1.4.2) Form the synthesized right view R, whose pixel value R_{i,j} at row i, column j is given by formula (3):

    R_{i,j} = Σ_{d=-15}^{+16} Dep^k_{i,j} · L^d_{i,j},   k = d + 16   (3)

where L^d is the pan view obtained by translating the original left view by disparity d, and L^d_{i,j} = L_{i,j-d} is its pixel value at row i, column j, i.e., the pixel value of the original left view at position (i, j − d), with L^d_{i,j} = 0 if j − d < 0; d is the disparity, −15 ≤ d ≤ +16; Dep^k_{i,j} is the element of matrix Dep_k at position (i, j), i.e., the probability that the left-view pixel at position (i, j) takes disparity d, with k = d + 16, i = 1~H, j = 1~W.
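For illustration only: a NumPy sketch of sub-steps (1.4.1)-(1.4.2) under the reconstructions of formulas (2) and (3) above (a per-channel softmax over Dc, then a probability-weighted sum of horizontally shifted left views); all names are assumptions.

    # Sketch of view synthesis: softmax disparity probabilities (formula 2)
    # followed by the probability-weighted shifted sum (formula 3).
    import numpy as np

    def view_synthesis(left, dc, theta):
        """left: H x W; dc: H x W x 32; theta: length-32 vector."""
        logits = dc * theta.reshape(1, 1, 32)          # theta_p * Dc_p
        e = np.exp(logits - logits.max(axis=2, keepdims=True))
        dep = e / e.sum(axis=2, keepdims=True)         # formula (2), element-wise
        h, w = left.shape
        right = np.zeros((h, w))
        for k in range(32):                            # zero-based channel index
            d = (k + 1) - 16                           # disparity d = k - 16, 1-based k
            shifted = np.zeros((h, w))
            if d >= 0:
                shifted[:, d:] = left[:, :w - d]       # L[i, j-d]; zero if j-d < 0
            else:
                shifted[:, :w + d] = left[:, -d:]
            right += dep[:, :, k] * shifted            # formula (3)
        return right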
The parameter update step (1.5) comprises the following sub-steps:
(1.5.1) Subtract the right view from step (1.2) from the synthesized right view R to obtain the error matrix err_R; for M consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16;
(1.5.2) Propagate the propagated error matrix errS_R back to the view synthesis step (1.4). First update the disparity probability matrix Dep to the new disparity probability matrix Dep^{l+1}, whose element matrices Dep_k^{l+1} are given by formula (4):

    Dep_k^{l+1} = Dep_k^l − η · errS_R ⊙ L^d,   d = k − 16   (4)

where Dep_k^{l+1} is the (l+1)-th updated value for disparity d (d = k − 16), Dep_k^l is its previous updated value, ⊙ denotes element-wise multiplication, and the learning rate η is initialized to 0.00003~0.00008;
(1.5.3) Next, update the regression parameter matrix θ to θ^{l+1}; each regression parameter θ_p^{l+1} of θ^{l+1} is updated by formula (5):

    θ_p^{l+1} = θ_p^l − λ · Σ_{i,j} [errS_R]_{i,j} [L_p]_{i,j} [Dep_p]_{i,j} (1 − [Dep_p]_{i,j}) [Dc_p]_{i,j}   (5)

where θ_p^{l+1} is the (l+1)-th updated value of regression parameter θ_p and θ_p^l is its previous updated value; Dep_p is the Dep_k of the view synthesis step (1.4) after the parameter update, with p = k; L_p is the L^d of the view synthesis step, with d = p − 16; Dc_p is the Dc_t of step (1.3.5), with t = p; p ranges from 1 to 32; λ is the update rate of θ, 0.00001 ≤ λ ≤ 0.01;
(1.5.4) Convert the propagated error matrix errS_R into the feature error matrix errD_R according to formula (6):

    [errD_R]_p = θ_p · errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p),   p = 1, 2, …, 32   (6)
Then feed the feature error matrix errD_R into the feature extraction step (1.2) and the feature fusion step (1.3), and update the weights of every convolution kernel involved using the iterative scheme of the caffe deep-learning framework.
From the 21st layer down to the first layer, a total of 12608 convolution kernels, comprising 113472 convolution-kernel weights, are updated in turn; sub-steps (1.5.2)~(1.5.4) complete one update of all parameters.
The feature extraction and feature fusion steps share the layer structure of the VGG16 model used for parameter initialization. In actual operation, the weight update of each convolution kernel uses the iterative scheme of the caffe deep-learning framework (Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding. In: Proc. ACM International Conference on Multimedia, 2014: 675-678.), the gradient-descent error back-propagation approach in general use in academia and industry. Its core is as follows: the current value of each convolution-kernel weight plus its corresponding residual value gives the updated weight. A layer's input convolution feature matrix, after convolution with that layer's kernels, yields the output convolution feature matrix, which serves as the next layer's input convolution feature matrix; a weight connection is therefore considered to exist, through that layer's kernels, between corresponding positions of this layer's input convolution feature matrix and the next layer's input convolution feature matrix.
The residual values of the weights of a given layer's convolution kernels form a matrix, solved as follows:
multiply the residual value of each weight on every kernel of the following layer that has a weight connection with this kernel by the weight at the corresponding position, sum all the results to obtain the result matrix, and set any element below zero to zero; the resulting matrix is the residual value matrix.
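For illustration only: a toy sketch of the residual rule just described (propagate the following layer's residuals back through the connecting weights, then zero negative entries, matching the ReLU used in feature extraction); all names are assumptions.

    # Toy sketch of the residual-value rule described above.
    import numpy as np

    def layer_residuals(next_residuals, connecting_weights):
        """next_residuals: residuals of the following layer's weights, one
        row per connection; connecting_weights: same shape, the weights at
        the corresponding positions. Returns the residual value matrix."""
        result = np.sum(next_residuals * connecting_weights, axis=0)
        return np.maximum(result, 0.0)   # entries below zero are set to zero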
The present invention comprises a training stage and a usage stage. Through the four core procedures of feature extraction, feature fusion, view synthesis and parameter update, the training stage trains on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, optimizing the scene-depth estimation and view synthesis tasks jointly to determine the parameters. This guarantees pixel-level accuracy in the predicted right view and reduces the error introduced when 2D-to-3D conversion is, as usual, split into two separate tasks. Once training is complete, 2D video can be converted into 3D video directly, with no separate hand-off from scene-depth estimation to view synthesis in between, because during training the successive steps feed one another and are optimized and trained together. In the usage stage this greatly improves 2D-to-3D conversion efficiency and guarantees the accuracy of the final output 3D stereoscopic video.
Compared with other existing 2D-to-3D video conversion methods, the outstanding effects of the invention are the following:
1. Compared with the previous technical route of first estimating scene depth and then performing view synthesis, the present invention is the first to unify scene-depth estimation and view synthesis in a single framework. On the one hand, the design of the steps is concise and computation after training is fast; on the other hand, it removes the errors caused by the intermediate computations, in particular by the currently insufficient accuracy of scene-depth estimation, and so achieves higher output accuracy;
2. The present invention achieves image output at the pixel level: for each pixel of the original image, it predicts the possible pixel mappings and accurately obtains the pixel's distribution in the other viewpoint. It is trained on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, productions designed from the outset to give viewers a strong sense of visual comfort. This guarantees the accuracy of the final output stereoscopic video while delivering a strong visual impact when the video is watched.
Description of the drawings
Fig. 1 is the flow diagram of the present invention;
Fig. 2 shows input and output images of the present invention: the top row is the input image, and the bottom row the corresponding synthesized right view.
Specific embodiment
The present invention is described in more detail below with reference to the drawings and embodiments.
As shown in Figure 1, the present invention comprises a training stage and a usage stage. The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step; the usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step.
The embodiment of the present invention comprises a training stage and a usage stage:
(1) The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) Data input step: obtain, from publicly available video data resources, left-right view format 3D stereoscopic movie clips on the order of 10^6 frames, and select those clips whose disparity range is -15 to +16 as the training data set; the image size in the stereoscopic movie clips is H rows by W columns, H = 200~1920, W = 180~1080;
The left-right view format 3D stereoscopic movie clips contain different types of stereoscopic film data, including action films, feature films, foreign animation, documentaries, exhibited 3D promotional videos and so on;
The 3D stereoscopic movie clips with a disparity range of -15~+16 are picked out from the 3D stereoscopic movie clips as follows (a sketch of this filtering pipeline follows the list):
(a) convert the video data in the 3D stereoscopic movie clips, clip by clip, into stereo image pairs of left and right view format;
(b) from the left and right views obtained in (a), obtain the corresponding disparity map using a stereo matching algorithm;
(c) apply mean-filter smoothing to the disparity map obtained in (b);
(d) compute a histogram of the smoothed disparity map of (c), and obtain the maximum and minimum disparity values in the histogram;
(e) from the maximum and minimum disparity values obtained in (d), judge whether the stereo frame lies within the -15~+16 disparity range; if so, keep it, otherwise discard the current stereo frame.
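For illustration only: a sketch of steps (a)-(e), using OpenCV's semi-global block matcher as a stand-in for the binocular stereo matching algorithm cited below; the thresholds follow the list above, and everything else is an assumption.

    # Sketch of the disparity-range filter (a)-(e), assuming OpenCV.
    import cv2
    import numpy as np

    def keep_frame(left_gray, right_gray, lo=-15, hi=16):
        # (b) stereo matching -> disparity map
        matcher = cv2.StereoSGBM_create(minDisparity=lo,
                                        numDisparities=32,  # covers -15..+16
                                        blockSize=9)
        disp = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        # (c) mean-filter smoothing
        disp = cv2.blur(disp, (5, 5))
        # (d) extreme disparity values (histogram statistics reduce to min/max)
        d_min, d_max = float(disp.min()), float(disp.max())
        # (e) keep the frame only if the range lies within [-15, +16]
        return lo <= d_min and d_max <= hi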
Every video clip initially chosen undergoes the above processing, guaranteeing no interference from individually doctored video segments. In this embodiment, image data of roughly 1.2 million frames is finally obtained through the above process and used as the training data set for determining the method's parameters.
In this embodiment, the binocular stereo matching algorithm (D. Scharstein, R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, 2002, 47(1-3), pp. 7-42.) is used to stereo-match the left and right channels of the videos in the training set and obtain the disparity distribution of each binocular stereoscopic video. This algorithm is a general-purpose solution, and this embodiment uses its official open-source project.
After the training data set is selected, the formats of the different training videos are unified. Considering that the image sizes of different videos in the training set can differ considerably, the image dimensions must be unified to guarantee the trainability of the model parameters; in this embodiment, the images of all videos are scaled to a uniform 640 × 960.
(1.2) Feature extraction step: split the left-right format stereoscopic video in the training set into stereo image pairs of left- and right-channel views. Keep the right view unchanged, and apply, in order, mathematical convolution operations and pooling (down-sampling) operations to each left-view frame to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features.
The feature extraction step uses a large number of convolution operations and pooling (down-sampling) operations, repeatedly extracting features at different scales and from different regions. For clarity, the operation forming each layer's convolution feature matrix, together with the kernel size and count or sliding-window size and stride, is listed below:
Layer 1: convolution, kernel size 3 × 3, count 64;
Layer 2: convolution, kernel size 3 × 3, count 64;
Layer 3: first pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 4: convolution, kernel size 3 × 3, count 128;
Layer 5: convolution, kernel size 3 × 3, count 128;
Layer 6: second pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 7: convolution, kernel size 3 × 3, count 256;
Layer 8: convolution, kernel size 3 × 3, count 256;
Layer 9: convolution, kernel size 3 × 3, count 256;
Layer 10: third pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 11: convolution, kernel size 3 × 3, count 512;
Layer 12: convolution, kernel size 3 × 3, count 512;
Layer 13: convolution, kernel size 3 × 3, count 512;
Layer 14: fourth pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 15: convolution, kernel size 3 × 3, count 512;
Layer 16: convolution, kernel size 3 × 3, count 512;
Layer 17: convolution, kernel size 3 × 3, count 512;
Layer 18: fifth pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 19: convolution, kernel size 3 × 3, count 4096;
Layer 20: convolution, kernel size 1 × 1, count 4096;
Layer 21: convolution, kernel size 1 × 1, count 32;
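For illustration only: the 21-layer schedule above written as a compact Python table (a description of the listing, not the patent's own code; the "conv"/"pool" labels are assumptions).

    # The 21-layer feature-extraction schedule as a data table:
    # ("conv", kernel_size, kernel_count) or ("pool", window, stride).
    LAYERS = [
        ("conv", 3, 64),  ("conv", 3, 64),  ("pool", 2, 2),
        ("conv", 3, 128), ("conv", 3, 128), ("pool", 2, 2),
        ("conv", 3, 256), ("conv", 3, 256), ("conv", 3, 256), ("pool", 2, 2),
        ("conv", 3, 512), ("conv", 3, 512), ("conv", 3, 512), ("pool", 2, 2),
        ("conv", 3, 512), ("conv", 3, 512), ("conv", 3, 512), ("pool", 2, 2),
        ("conv", 3, 4096), ("conv", 1, 4096), ("conv", 1, 32),
    ]
    # Fusion taps: layers 3, 6, 10, 14 and 21 feed the deconvolution groups.
    FUSED_LAYERS = (3, 6, 10, 14, 21)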
The features obtained from the input left view after several successive convolution feature extractions have strong local-region and global spatial expressive power. For an input picture of size 640 × 960, the left image and the right image are each of size 640 × 480.
After the convolution operations and pooling (down-sampling) operations, the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices of the left view are of sizes 320 × 240, 160 × 120, 80 × 60, 40 × 30 and 20 × 15 respectively.
(1.3) Feature fusion step: apply matrix deconvolution operations to the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices respectively, and combine the resulting first group of deconvolution feature matrices D3 through fifth group D21 by cascading to form the fusion feature matrix Dc; in this example the size of Dc is 640 × 480 × 32, i.e., the matrix has 640 rows, 480 columns and 32 tensor dimensions;
The present invention finally outputs an image at the pixel level. Therefore, to ensure that the pixel-level synthesized right view responds well to different scales, different regions, and local as well as global features, the present invention fuses the convolution feature matrices output by different convolutional layers. After each pooling (down-sampling) operation, the dimensions of the feature matrices shrink to half those of the previous scale; for example, each feature matrix among C3_1, C3_2, C3_3 … C3_64 has twice the length and width of each matrix among C6_1, C6_2, C6_3 … C6_128, i.e., matrix C3_1 is twice as long and twice as wide as C6_1. To keep the matrix dimensions consistent during feature fusion, the present invention applies matrix deconvolution to the matrices output after the different pooling (down-sampling) layers; the deconvolution operations on the outputs of the different pooling layers differ only in the size of the convolution kernel.
Taking the matrix C6_1 to be deconvolved as an example, the concrete deconvolution operation is as follows. First enlarge the length and width of C6_1 by a factor of N, where N is the size of the deconvolution kernel; as noted above, the deconvolution kernel sizes in the present invention are set to 2 × 2, 4 × 4, 8 × 8, 16 × 16 and 32 × 32 respectively, i.e., N takes the values 2, 4, 8, 16 and 32 depending on the layer. For the deconvolution of C6_1, N is 4. After the length and width of the matrix are enlarged N times, the intermediate values are filled in by nearest-neighbor interpolation, giving the matrix C6_1′. Then, using an N × N convolution kernel with stride N/2, start from the upper-left corner of C6_1′, move right step by step until the right boundary of C6_1′, then move to the next row of C6_1′ and continue from left to right, until the lower-right corner of C6_1′. At each position, multiply each weight of the kernel by the value of C6_1′ at the corresponding position and sum all the products, giving the convolution value of that region of C6_1′. Arranging the convolution values of all regions of C6_1′ by their original positions completes the matrix deconvolution operation.
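For illustration only: a NumPy sketch of the deconvolution just described (nearest-neighbor enlargement by N, then an N × N convolution with stride N/2); names and the handling of the border are assumptions.

    # Sketch of the matrix deconvolution: upsample by N with nearest-
    # neighbor interpolation, then convolve with an N x N kernel, stride N/2.
    # Assumes N >= 2, as in the kernel sizes listed above.
    import numpy as np

    def deconv(m, kernel):
        n = kernel.shape[0]                       # deconvolution kernel size N
        up = np.repeat(np.repeat(m, n, axis=0), n, axis=1)  # nearest neighbor
        s = n // 2                                # stride N/2
        h, w = up.shape
        rows = (h - n) // s + 1
        cols = (w - n) // s + 1
        out = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                region = up[i * s:i * s + n, j * s:j * s + n]
                out[i, j] = np.sum(region * kernel)
        return out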
For the left view of input size 640 × 480 in (1.2), after the deconvolution operations on the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices, the resulting first group of deconvolution feature matrices D3 through fifth group D21 are all of size 640 × 480 × 32. Cascading them finally forms the fusion feature matrix of size 640 × 480 × 32.
(1.4) View synthesis step: apply the regression parameter matrix θ to the fusion feature matrix Dc to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view; from the original left view and the disparity probability matrix Dep, obtain the synthesized right view through view synthesis. The 640 × 480 × 32 fusion feature matrix passes through the regression parameter matrix θ for disparity probability regression, giving again a 640 × 480 × 32 matrix. Each tensor dimension, i.e., each 640 × 480 matrix, can be understood intuitively as giving, for every pixel of the left view, the probability of taking the corresponding disparity value d. From the 640 × 480 × 32 disparity probability matrix Dep and the 640 × 480 original left view, this embodiment obtains through view synthesis a synthesized right view of the same size as the original left view, 640 × 480;
(1.5) Parameter update step: compute the error matrix err_R between the synthesized right view and the right view from step (1.2);
For 640 consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R. Propagate the resulting errS_R through the back-propagation algorithm to each sub-step of the view synthesis, feature fusion and feature extraction steps, updating in reverse order the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel of every layer, completing one update of all parameters. The error propagation of the feature fusion and feature extraction steps in the parameter update is realized by calling the corresponding function interfaces of the caffe deep-learning framework. In this embodiment the learning rate η is initialized to 0.00005, with 30% learning-rate decay applied after every 20000 rounds of operation;
Return to the feature extraction step (1.2) and continue with the remaining left views in the training set, performing the above feature extraction, feature fusion, view synthesis and parameter update steps in turn. When all left and right views in the training set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer is complete; in this embodiment each round requires 2000 parameter updates;
In the same way, continue with the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer, and so on; after the 100th round of updates is completed, the training stage of this embodiment ends;
(2) The usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step.
(2.1) Data input step: prepare the planar 2D video to be converted;
(2.2) Feature extraction step: split the planar 2D video into images, each treated as the left view of step (1.2). Apply, in order, mathematical convolution operations and pooling (down-sampling) operations to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features; the weights of every convolution kernel of every layer are those obtained after the training stage;
(2.3) Feature fusion step: identical to step (1.3);
(2.4) View synthesis step: identical to step (1.4). From the disparity probability matrix Dep obtained as in step (1.4) and the image of (2.2), obtain the synthesized right view through view synthesis.
Perform the above feature extraction, feature fusion and view synthesis steps on each left-view frame in turn, stitch the original image as the left view side by side with the synthesized right view, and join the frames one after another to obtain the left-right format 3D stereoscopic video.
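For illustration only: a sketch of the final left-right stitching, assuming OpenCV for video output; the file name, codec and frame rate are assumptions.

    # Sketch of the usage-stage stitching: left view + synthesized right
    # view, side by side, frame after frame, into a left-right 3D video.
    import cv2
    import numpy as np

    def write_side_by_side(frames, synth_right, path="out_3d.avi", fps=24):
        """frames / synth_right: lists of H x W x 3 uint8 images."""
        h, w = frames[0].shape[:2]
        fourcc = cv2.VideoWriter_fourcc(*"XVID")
        out = cv2.VideoWriter(path, fourcc, fps, (2 * w, h))
        for left, right in zip(frames, synth_right):
            out.write(np.hstack([left, right]))   # one left-right format frame
        out.release()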
The experimental results of the output part are shown in Fig. 2: the top row, from left to right, shows three consecutive input frames, and the bottom row shows the corresponding synthesized right views.

Claims (5)

1. A method for converting a 2D video into a 3D video, comprising a training stage and a use stage, characterized in that:
(1) the training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) data input step: obtain, from publicly available video data resources, on the order of 10^6 3D stereoscopic video clips in left-right view format, and select the 3D stereoscopic video clips whose disparity range is -15~+16 as the training data set; the image size in the stereoscopic video clips is H rows by W columns, H = 200~1920, W = 180~1080;
(1.2) feature extraction step: split the stereoscopic video in left-right view format of the training data set into stereo images in left-right view format; keep the right view unchanged, and pass a frame of left view in turn through mathematical convolution operations and pooling down-sampling operations for feature extraction, obtaining the first-layer to twenty-first-layer convolution feature matrices of that frame as its image features;
(1.3) feature fusion step: apply a matrix deconvolution operation to the third-layer, sixth-layer, tenth-layer, fourteenth-layer and twenty-first-layer convolution feature matrices respectively, and cascade the obtained first group of deconvolution feature matrices D3 through fifth group of deconvolution feature matrices D21 to form the fusion feature matrix Dc; the size of Dc is H × W × 32, i.e. the matrix has H rows, W columns and 32 tensor dimensions;
(1.4) view synthesis step:
for the fusion feature matrix Dc, use the regression parameter matrix θ to obtain the predicted probability of each pixel of the corresponding left view taking each possible disparity, forming the disparity probability matrix Dep of the left view;
from the original left view and the disparity probability matrix Dep, obtain the synthesized right view by view synthesis;
(1.5) parameter update step:
calculate the error matrix err_R between the synthesized right view and the right view of step (1.2);
for M consecutive synthesized right views, sum the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16; propagate the obtained propagated error matrix by the back-propagation algorithm to the view synthesis step, the feature fusion step and each sub-step of the feature extraction step, updating in turn the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel in every layer, thereby completing one update of all parameters;
return to the feature extraction step (1.2) and continue with the remaining left views of the training data set, performing in turn the above feature extraction, feature fusion, view synthesis and parameter update steps; when all left and right views of the training data set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and the convolution kernels of every layer is complete;
following steps (1.1) to (1.5), complete the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and the convolution kernels of every layer, and so on; after the 50th to 200th round of updates is completed, the training stage ends;
(2) the use stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step;
(2.1) data input step: prepare the planar 2D video to be converted;
(2.2) feature extraction step: split the planar 2D video into frames, each treated as the left view of step (1.2); pass each frame in turn through mathematical convolution operations and pooling down-sampling operations for feature extraction, obtaining the first-layer to twenty-first-layer convolution feature matrices of that frame as its image features; the weights of every convolution kernel in every layer are those obtained after the training stage;
(2.3) feature fusion step: identical to step (1.3);
(2.4) view synthesis step: identical to step (1.4); from the disparity probability matrix Dep obtained in step (1.4) and the image of (2.2), a synthesized right view is obtained by view synthesis;
perform the above feature extraction step (2.2), feature fusion step (2.3) and view synthesis step (2.4) on each frame of left view in turn; stitch the original image, taken as the left view, side by side with the obtained synthesized right view, and then concatenate the stitched frames one by one to obtain the 3D stereoscopic video in left-right format.
2. The method for converting a 2D video into a 3D video according to claim 1, characterized in that the feature extraction step (1.2) comprises the following sub-steps:
(1.2.1) applying a convolution operation to a frame of left view to obtain the first-layer convolution feature matrix:
using a 3 × 3 convolution kernel with stride 1, starting from the upper-left corner of the image, the kernel moves to the right step by step until the right border of the image, then moves to the next row of the image and again moves from left to right, until the lower-right corner of the image; at each position, the weights of the convolution kernel are multiplied with the pixel values of the image at the corresponding positions and all products are summed, giving the convolution value of the image region covered by the kernel; the convolution values of all regions of the image, arranged according to their original region positions, form the convolution feature matrix C1_1 of that frame of left view; all elements of C1_1 that are smaller than zero are set to zero;
64 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the image yields 64 convolution feature matrices C1_1, C1_2, C1_3, …, C1_64, which form the first-layer convolution feature matrix; a minimal sketch of this operation is given below;
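As a hedged numpy sketch only (the claim does not state the border handling; "valid" convolution with no padding is assumed here, and cross-correlation is used as the usual CNN reading of the moving-kernel description), sub-step (1.2.1) amounts to:

    import numpy as np

    def conv3x3_relu(image, kernel):
        # image: H x W array; kernel: 3 x 3 weights; stride 1, no padding (assumed)
        H, W = image.shape
        out = np.zeros((H - 2, W - 2))
        for i in range(H - 2):
            for j in range(W - 2):
                # multiply the kernel weights with the covered region, then sum all products
                out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
        return np.maximum(out, 0.0)  # elements smaller than zero are set to zero

    # first-layer convolution feature matrices, one per kernel:
    # C1 = [conv3x3_relu(left_view, k) for k in kernels_64]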
(1.2.2) applying a convolution operation to the first-layer convolution feature matrix to obtain the second-layer convolution feature matrix:
using a 3 × 3 convolution kernel with stride 1, starting from the upper-left corner of the convolution feature matrix C1_1, the kernel moves to the right step by step until the right border of C1_1, then moves to the next row and again moves from left to right, until the lower-right corner of C1_1; at each position, the weights of the convolution kernel are multiplied with the elements of C1_1 at the corresponding positions and all products are summed, giving the convolution value of the region of C1_1 covered by the kernel; the convolution values of all regions of C1_1, arranged according to their original region positions, form a convolution feature matrix; all negative elements of the obtained convolution feature matrix are set to zero;
the operation of the preceding paragraph is then repeated for the remaining 63 convolution feature matrices C1_2, C1_3, …, C1_64, using the corresponding convolution kernels of this layer, giving 64 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C2_1;
64 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the two preceding paragraphs to the 64 convolution feature matrices C1_1, C1_2, C1_3, …, C1_64 of the first-layer convolution feature matrix yields 64 convolution feature matrices C2_1, C2_2, C2_3, …, C2_64, which form the second-layer convolution feature matrix;
(1.2.3) applying a first pooling down-sampling operation to the second-layer convolution feature matrix to obtain the third-layer convolution feature matrix:
for the convolution feature matrix C2_1, a 2 × 2 sliding window with stride 2 is used, starting from the upper-left corner of C2_1, moving to the right step by step until the right border of C2_1, then moving to the next row and again from left to right, until the lower-right corner of C2_1; at each position, the maximum of the elements of C2_1 inside the 2 × 2 sliding window is taken as the pooling sampling feature value of the region covered by the window; the pooling sampling feature values of all regions of C2_1, arranged according to their original region positions, form the convolution feature matrix C3_1;
the above pooling down-sampling operation is applied in turn to the remaining 63 convolution feature matrices C2_2, C2_3, …, C2_64, giving 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 in total, which form the third-layer convolution feature matrix; a minimal sketch of this pooling operation is given below;
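A hedged numpy sketch of the pooling down-sampling of sub-step (1.2.3), assuming the input height and width are even so the stride-2 window tiles the matrix exactly:

    import numpy as np

    def maxpool2x2(feat):
        # feat: H x W array with even H and W; 2 x 2 sliding window, stride 2
        H, W = feat.shape
        out = np.zeros((H // 2, W // 2))
        for i in range(0, H, 2):
            for j in range(0, W, 2):
                # maximum of the four elements inside the sliding window
                out[i // 2, j // 2] = feat[i:i + 2, j:j + 2].max()
        return out

    # third-layer convolution feature matrices:
    # C3 = [maxpool2x2(c) for c in C2]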
(1.2.4) applying a convolution operation to the third-layer convolution feature matrix to obtain the fourth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64, giving 64 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C4_1;
128 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 of the third-layer convolution feature matrix yields 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128, which form the fourth-layer convolution feature matrix;
(1.2.5) applying a convolution operation to the fourth-layer convolution feature matrix to obtain the fifth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128, giving 128 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C5_1;
128 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128 of the fourth-layer convolution feature matrix yields 128 convolution feature matrices C5_1, C5_2, C5_3, …, C5_128, which form the fifth-layer convolution feature matrix;
(1.2.6) applying a second pooling down-sampling operation to the fifth-layer convolution feature matrix to obtain the sixth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 128 convolution feature matrices C5_1, C5_2, C5_3, …, C5_128, giving 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 in total, which form the sixth-layer convolution feature matrix;
(1.2.7) applying a convolution operation to the sixth-layer convolution feature matrix to obtain the seventh-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1, a matrix convolution operation is applied to each of the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128, giving 128 convolution results, which are summed directly to form the convolution feature matrix C7_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 of the sixth-layer convolution feature matrix yields 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256, which form the seventh-layer convolution feature matrix;
(1.2.8) applying a convolution operation to the seventh-layer convolution feature matrix to obtain the eighth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256, giving 256 convolution results, which are summed directly to form the convolution feature matrix C8_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256 of the seventh-layer convolution feature matrix yields 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256, which form the eighth-layer convolution feature matrix;
(1.2.9) applying a convolution operation to the eighth-layer convolution feature matrix to obtain the ninth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256, giving 256 convolution results, which are summed directly to form the convolution feature matrix C9_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256 of the eighth-layer convolution feature matrix yields 256 convolution feature matrices C9_1, C9_2, C9_3, …, C9_256, which form the ninth-layer convolution feature matrix;
(1.2.10) applying a third pooling down-sampling operation to the ninth-layer convolution feature matrix to obtain the tenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 256 convolution feature matrices C9_1, C9_2, C9_3, …, C9_256, giving 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 in total, which form the tenth-layer convolution feature matrix;
(1.2.11) applying convolution operations in turn to the tenth-layer convolution feature matrix to obtain, in sequence, the eleventh-layer, twelfth-layer and thirteenth-layer convolution feature matrices:
following an operation similar to sub-step (1.2.7), a convolution operation with 512 convolution kernels is applied to the 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 of the tenth-layer convolution feature matrix, giving 512 convolution feature matrices C11_1, C11_2, C11_3, …, C11_512, which form the eleventh-layer convolution feature matrix;
following an operation similar to sub-step (1.2.8), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C11_1, C11_2, C11_3, …, C11_512 of the eleventh-layer convolution feature matrix, giving 512 convolution feature matrices C12_1, C12_2, C12_3, …, C12_512, which form the twelfth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.9), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C12_1, C12_2, C12_3, …, C12_512 of the twelfth-layer convolution feature matrix, giving 512 convolution feature matrices C13_1, C13_2, C13_3, …, C13_512, which form the thirteenth-layer convolution feature matrix;
(1.2.12) applying a fourth pooling down-sampling operation to the thirteenth-layer convolution feature matrices C13_1, C13_2, C13_3, …, C13_512 to obtain the fourteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 512 convolution feature matrices C13_1, C13_2, C13_3, …, C13_512, giving 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 in total, which form the fourteenth-layer convolution feature matrix;
(1.2.13) applying convolution operations in turn to the fourteenth-layer convolution feature matrix to obtain, in sequence, the fifteenth-layer, sixteenth-layer and seventeenth-layer convolution feature matrices:
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 of the fourteenth-layer convolution feature matrix, giving 512 convolution feature matrices C15_1, C15_2, C15_3, …, C15_512, which form the fifteenth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C15_1, C15_2, C15_3, …, C15_512 of the fifteenth-layer convolution feature matrix, giving 512 convolution feature matrices C16_1, C16_2, C16_3, …, C16_512, which form the sixteenth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C16_1, C16_2, C16_3, …, C16_512 of the sixteenth-layer convolution feature matrix, giving 512 convolution feature matrices C17_1, C17_2, C17_3, …, C17_512, which form the seventeenth-layer convolution feature matrix;
(1.2.14) applying a fifth pooling down-sampling operation to the seventeenth-layer convolution feature matrix to obtain the eighteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 512 convolution feature matrices C17_1, C17_2, C17_3, …, C17_512, giving 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512 in total, which form the eighteenth-layer convolution feature matrix;
(1.2.15) applying a convolution operation to the eighteenth-layer convolution feature matrix to obtain the nineteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512, giving 512 convolution results, which are summed directly to form the convolution feature matrix C19_1;
4096 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512 of the eighteenth-layer convolution feature matrix yields 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096, which form the nineteenth-layer convolution feature matrix;
(1.2.16) applying a convolution operation to the nineteenth-layer convolution feature matrix to obtain the twentieth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 1 × 1 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096, giving 4096 convolution results, which are summed directly to form the convolution feature matrix C20_1;
4096 convolution kernels of size 1 × 1 are used in total; applying the convolution operation of the preceding paragraph to the 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096 of the nineteenth-layer convolution feature matrix yields 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096, which form the twentieth-layer convolution feature matrix;
(1.2.17) applying a convolution operation to the twentieth-layer convolution feature matrix to obtain the twenty-first-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 1 × 1 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096, giving 4096 convolution results, which are summed directly to form the convolution feature matrix C21_1;
32 convolution kernels of size 1 × 1 are used in total; applying the convolution operation of the preceding paragraph to the 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096 of the twentieth-layer convolution feature matrix yields 32 convolution feature matrices C21_1, C21_2, C21_3, …, C21_32, which form the twenty-first-layer convolution feature matrix;
the weights of every convolution kernel involved in sub-steps (1.2.1) to (1.2.17) are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the weights of every convolution kernel in every layer are those obtained after the parameter update step (1.5);
for every frame of left view, sub-steps (1.2.1) to (1.2.17) thus yield 21 layers of convolution feature matrices extracted at different scales; a compact summary of the stack is given below.
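Collecting sub-steps (1.2.1)-(1.2.17), the 21-layer stack can be summarized as the following Python structure (a hedged restatement of the claim, with kernel counts and sizes read directly from the sub-steps; the "conv"/"pool" labels are illustrative):

    layers = [
        ("conv", 64, 3),   ("conv", 64, 3),   ("pool", None, 2),                      # layers 1-3
        ("conv", 128, 3),  ("conv", 128, 3),  ("pool", None, 2),                      # layers 4-6
        ("conv", 256, 3),  ("conv", 256, 3),  ("conv", 256, 3),  ("pool", None, 2),   # layers 7-10
        ("conv", 512, 3),  ("conv", 512, 3),  ("conv", 512, 3),  ("pool", None, 2),   # layers 11-14
        ("conv", 512, 3),  ("conv", 512, 3),  ("conv", 512, 3),  ("pool", None, 2),   # layers 15-18
        ("conv", 4096, 3), ("conv", 4096, 1), ("conv", 32, 1),                        # layers 19-21
    ]
    assert len(layers) == 21  # one entry per convolution feature matrix layer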
3. The method for converting a 2D video into a 3D video according to claim 1 or 2, characterized in that:
the feature fusion step (1.3) comprises the following sub-steps:
(1.3.1) applying a deconvolution operation to the third-layer convolution feature matrix obtained by the first pooling down-sampling operation, to obtain the first group of deconvolution feature matrices D3:
using a 1 × 1 convolution kernel, a matrix deconvolution operation is applied to each of the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64, giving 64 convolution output results, which are summed directly to form the deconvolution feature matrix D3_1;
32 convolution kernels of size 1 × 1 are used in total; applying the operation of the preceding paragraph to the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 of the third-layer convolution feature matrix yields 32 deconvolution feature matrices D3_1, D3_2, D3_3, …, D3_32, which form the first group of deconvolution feature matrices D3;
(1.3.2) applying a deconvolution operation to the sixth-layer convolution feature matrix obtained by the second pooling down-sampling operation, to obtain the second group of deconvolution feature matrices D6:
using a 2 × 2 convolution kernel, a matrix deconvolution operation is applied to each of the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128, giving 128 convolution output results, which are summed directly to form the deconvolution feature matrix D6_1;
32 convolution kernels of size 2 × 2 are used in total; applying the operation of the preceding paragraph to the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 of the sixth-layer convolution feature matrix yields 32 deconvolution feature matrices D6_1, D6_2, D6_3, …, D6_32, which form the second group of deconvolution feature matrices D6;
(1.3.3) applying deconvolution operations, respectively, to the tenth-layer convolution feature matrix obtained by the third pooling down-sampling operation and to the fourteenth-layer convolution feature matrix obtained by the fourth pooling down-sampling operation, to obtain the third group of deconvolution feature matrices D10 and the fourth group of deconvolution feature matrices D14:
here, following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 4 × 4 and their kernel weights are used in total to operate on the 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 of the tenth-layer convolution feature matrix, giving 32 deconvolution feature matrices D10_1, D10_2, D10_3, …, D10_32, which form the third group of deconvolution feature matrices D10;
32 convolution kernels of size 8 × 8 are used in total to operate on the 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 of the fourteenth-layer convolution feature matrix, giving 32 deconvolution feature matrices D14_1, D14_2, D14_3, …, D14_32, which form the fourth group of deconvolution feature matrices D14;
(1.3.4) applying a matrix deconvolution operation to the twenty-first-layer convolution feature matrix to obtain the fifth group of deconvolution feature matrices:
following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 16 × 16 and their kernel weights are used in total to operate on the 32 convolution feature matrices C21_1, C21_2, C21_3, …, C21_32 of the twenty-first-layer convolution feature matrix, giving 32 deconvolution feature matrices D21_1, D21_2, D21_3, …, D21_32, which form the fifth group of deconvolution feature matrices D21;
(1.3.5) cascading the first group of deconvolution feature matrices D3 through the fifth group of deconvolution feature matrices D21 to form the fusion feature matrix Dc, the specific cascading being given by formula (1):

Dc_t = D3_t + D6_t + D10_t + D14_t + D21_t,  t = 1, 2, …, N   (1)

where N = 32 is the number of deconvolution feature matrices in each of the groups obtained by sub-steps (1.3.1)-(1.3.4); the obtained fusion feature matrix Dc has size H × W × 32, i.e. H rows, W columns and 32 tensor dimensions, Dc_1 denoting the first of the 32 tensor dimensions and Dc_t the t-th of the 32 tensor dimensions; a minimal sketch of this cascade is given below;
the weights of every convolution kernel involved in sub-steps (1.3.1) to (1.3.4) are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the weights of every convolution kernel in every layer are those obtained after the parameter update step (1.5).
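A hedged numpy sketch of sub-step (1.3.5), assuming each group has already been brought to the common H × W resolution by its deconvolution kernels (1 × 1, 2 × 2, 4 × 4, 8 × 8 and 16 × 16), and reading formula (1) as the dimension-wise combination above:

    import numpy as np

    def fuse(D3, D6, D10, D14, D21):
        # each argument: array of shape (32, H, W), one group of deconvolution feature matrices
        # formula (1): combine the five groups dimension by dimension into Dc
        return D3 + D6 + D10 + D14 + D21  # Dc: (32, H, W), i.e. 32 tensor dimensions of H rows, W columns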
4. The method for converting a 2D video into a 3D video according to claim 3, characterized in that:
the view synthesis step (1.4) comprises the following sub-steps:
(1.4.1) depth map: the fusion feature matrix Dc is evaluated with the following formula (2) to obtain the predicted probability of each pixel of the corresponding left view taking each possible disparity, forming the disparity probability matrix Dep of the left view:

Dep_k = p(y = k | x; θ) = exp(θ_k Dc_k) / Σ_{p=1}^{32} exp(θ_p Dc_p)   (2)

where each matrix element Dep_k is itself a matrix, k = 1, 2, …, 32, representing the probability value p(y = k | x; θ), i.e. the disparity probability of each pixel of the left view when the disparity is taken as k − 16 and the regression parameter matrix is θ; the regression parameter matrix θ consists of the regression parameters [θ_1, θ_2, …, θ_32]; exp(θ_p Dc_p) represents the logistic regression value of the p-th tensor dimension of Dc under the regression parameter θ_p, p = 1, 2, …, 32; each regression parameter of the regression parameter matrix θ is initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the regression parameters are those of the regression parameter matrix θ obtained after the parameter update step (1.5); a minimal sketch of this evaluation is given below;
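A hedged numpy sketch of sub-step (1.4.1), under the per-dimension reading of formula (2) (θ_p paired with the p-th tensor dimension Dc_p; the max-subtraction is a standard numerical-stability detail added here, not part of the claim):

    import numpy as np

    def disparity_probabilities(Dc, theta):
        # Dc: (32, H, W) fusion feature matrix; theta: (32,) regression parameters
        # formula (2): per-pixel softmax over the 32 disparity classes d = k - 16
        logits = theta[:, None, None] * Dc           # theta_p * Dc_p at every pixel
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability (implementation detail)
        e = np.exp(logits)
        return e / e.sum(axis=0, keepdims=True)      # Dep: (32, H, W), sums to 1 over k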
(1.4.2) forming the synthesized right view R, whose pixel value R_{i,j} at row i, column j is given by formula (3):

R_{i,j} = Σ_{d=-15}^{+16} L^d_{i,j} · Dep^{i,j}_k,  k = d + 16   (3)

where L^d is the translated view of the original left view at disparity d, L^d_{i,j} = L_{i,j−d} is its pixel value at row i, column j, i.e. the pixel value of the original left view at position i, j−d, and L^d_{i,j} = 0 if j − d < 0; d is the disparity, −15 ≤ d ≤ +16; Dep^{i,j}_k is the element at position i, j of the matrix Dep_k, i.e. the probability that the pixel of the left view at position i, j takes disparity d, with k = d + 16, i = 1~H, j = 1~W; a minimal sketch is given below.
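A hedged numpy sketch of sub-step (1.4.2), evaluating formula (3) as a probability-weighted sum of disparity-shifted copies of the left view (zero-fill at the border, as the claim's j − d < 0 case states):

    import numpy as np

    def synthesize_right_view(L, Dep):
        # L: (H, W) left view; Dep: (32, H, W) disparity probabilities from formula (2)
        H, W = L.shape
        R = np.zeros((H, W))
        for d in range(-15, 17):          # disparities -15 .. +16
            k = d + 16                    # class index 1..32 (0-based below: k - 1)
            Ld = np.zeros((H, W))
            if d >= 0:
                Ld[:, d:] = L[:, :W - d]  # L^d[i, j] = L[i, j - d]; zero where j - d < 0
            else:
                Ld[:, :d] = L[:, -d:]     # negative disparity shifts the other way
            R += Dep[k - 1] * Ld
        return R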
5. The method for converting a 2D video into a 3D video according to claim 4, characterized in that:
the parameter update step (1.5) comprises the following sub-steps:
(1.5.1) subtracting the right view of step (1.2) from the synthesized right view R to obtain the error matrix err_R; for M consecutive synthesized right views, the individual error matrices err_R are summed to form the propagated error matrix errS_R, M ≥ 16;
(1.5.2) the propagated error matrix errS_R is propagated backwards to the view synthesis step (1.4); first the disparity probability matrix Dep is updated to the new disparity probability matrix Dep^{l+1}, whose matrix elements Dep_k^{l+1} are given by formula (4):

Dep_k^{l+1} = Dep_k^l − η · errS_R ⊙ L^d,  d = k − 16   (4)

where ⊙ denotes the element-wise product, Dep_k^{l+1} denotes the (l+1)-th updated value for disparity d = k − 16, Dep_k^l denotes its previous updated value, and the learning rate η is initialized to 0.00003~0.00008;
(1.5.3) next, the regression parameter matrix θ is updated to θ^{l+1}; each regression parameter θ_p^{l+1} of θ^{l+1} is updated according to formula (5):

θ_p^{l+1} = θ_p^l − λ · Σ_{i,j} [errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p) ⊙ Dc_p]_{i,j}   (5)

where θ_p^{l+1} denotes the (l+1)-th updated value of the regression parameter θ_p, θ_p^l denotes its previous updated value, Dep_p is the Dep_k after the parameter update of the view synthesis step (1.4), with p = k, L_p is the L^d of the view synthesis step, with d = p − 16, Dc_p is the Dc_t of step (1.3.5), with t = p, the range of p is 1 to 32, and λ is the update rate of θ, 0.00001 ≤ λ ≤ 0.01;
(1.5.4) the propagated error matrix errS_R is converted into the feature error matrix errD_R according to formula (6):

errD_R,p = errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p) · θ_p,  p = 1, 2, …, 32   (6)
the feature error matrix errD_R is then fed into the feature extraction step (1.2) and the feature fusion step (1.3), and the weights of every convolution kernel involved are updated using the iterative mechanism of the caffe deep-learning framework;
in total, 12608 convolution kernels, carrying 113472 convolution kernel weights altogether, are updated in turn from the twenty-first layer to the first layer; sub-steps (1.5.2)~(1.5.4) complete one update of all parameters. A worked check of the kernel count is given below.
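As a hedged arithmetic check of the kernel count (per-layer counts read from claims 2 and 3; the variable names are illustrative):

    # convolution kernels per convolution layer (claim 2):
    # layers 1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17, 19, 20, 21
    conv_kernels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512, 4096, 4096, 32]
    # deconvolution kernels per fusion group (claim 3): D3, D6, D10, D14, D21
    deconv_kernels = [32, 32, 32, 32, 32]
    print(sum(conv_kernels) + sum(deconv_kernels))  # 12608, as stated in sub-step (1.5.4)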
CN201710227433.4A 2017-04-07 2017-04-07 A method for converting a 2D video into a 3D video Active CN107018400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710227433.4A CN107018400B (en) 2017-04-07 2017-04-07 A method for converting a 2D video into a 3D video

Publications (2)

Publication Number Publication Date
CN107018400A CN107018400A (en) 2017-08-04
CN107018400B (en) 2018-06-19

Family

ID=59446292

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant