CN107018400B - A method for converting 2D video into 3D video - Google Patents

A method for converting 2D video into 3D video

Info

Publication number
CN107018400B
CN107018400B CN201710227433.4A
Authority
CN
China
Prior art keywords
convolution
feature matrix
layer
matrix
obtains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710227433.4A
Other languages
Chinese (zh)
Other versions
CN107018400A (en)
Inventor
曹治国
赵富荣
肖阳
李炽
张骁迪
鲜可
李睿博
李然
张润泽
杨佳琪
朱延俊
赵峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710227433.4A priority Critical patent/CN107018400B/en
Publication of CN107018400A publication Critical patent/CN107018400A/en
Application granted granted Critical
Publication of CN107018400B publication Critical patent/CN107018400B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/139Format conversion, e.g. of frame-rate or size

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Complex Calculations (AREA)

Abstract

A method for converting 2D video into 3D video, belonging to the field of pattern recognition and computer vision. Its purpose is to eliminate the unpredictable errors introduced by the prior art's scene-depth estimation and view synthesis, while greatly improving computation speed. The invention comprises a training stage and a usage stage. The training stage consists, in order, of data input, feature extraction, feature fusion, view synthesis and parameter update steps; the usage stage consists, in order, of data input, feature extraction, feature fusion and view synthesis steps. The training stage trains on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, optimizing scene-depth estimation and view synthesis jointly to determine the parameters. This guarantees pixel-level accuracy in the predicted right view and reduces the error introduced when 2D-to-3D conversion is split into two separate tasks. Once training is complete, 2D video can be converted to 3D video directly, which greatly improves conversion efficiency and guarantees the accuracy of the final output 3D stereoscopic video.

Description

A method for converting 2D video into 3D video
Technical field
The invention belongs to the field of pattern recognition and computer vision, and specifically relates to a method for converting 2D video into 3D video, for directly converting planar 2D video shot with an ordinary camera into left-right format 3D stereoscopic video of the kind that can be watched in a cinema.
Background technology
With the development of virtual-reality technology, immersing the audience in a 3D experience has increasingly become a very important direction in the multimedia entertainment field. A 3D experience requires the support of panoramic video and 3D effects; to obtain the 3D viewing experience of a cinema from ordinary planar video, the 2D planar video must be converted into 3D video. 3D video comes in many formats. For left-right format 3D stereoscopic video, 2D-to-3D conversion usually first uses the various geometric relationships and scene-semantic information in the 2D video images to estimate the front-to-back layering of the objects in the images; then, from this spatial layout, a geometric mapping synthesizes the image of another viewpoint; finally the left and right viewpoint images are merged, ultimately generating the 3D stereoscopic video.
Traditional 2D-to-3D video conversion is approached through two routes, hardware and software. The hardware route mainly comprises stereoscopic projectors and glasses-free 3D televisions. These first compress the width of the input 2D video images to half the original, then apply a horizontal offset or mapping with no scene analysis, so that the same object ends up at different positions in the left and right viewpoints, hoping to obtain a 3D effect from the perceptual principles of the human eye. This is a very crude method, and the 3D effect is not significant in most scenes; the 2D-to-3D function of such products on the market is limited by product quality, and their true audience is small. Most manufacturers focus instead on improving hardware performance, i.e., given a well-converted 3D stereoscopic video as input, their hardware produces a relatively good stereoscopic display. The software route realizes the 2D-to-3D function mainly through algorithms: in general it estimates the depth layering of objects in the scene, performs semantic analysis and view synthesis to obtain an additional video channel, and the two different viewpoint images present the spatial structure of the scene in different ways, so that viewers obtain a relatively good stereoscopic experience.
Solutions such as that of Sichuan Changhong Electric Co. (201110239086.X) optimize the above two steps through hardware or software algorithms in the hope of obtaining better 3D visual effects. On the one hand, these schemes consume considerable computing resources and their time cost is high; on the other hand, scene depth extraction and view synthesis techniques are still being explored, carry large computational errors in actual use, and the errors of the two parts easily accumulate and degrade the final viewing experience. Moreover, with the arrival of the big-data era, more and more good stereoscopic movies, documentaries and animations are being produced, yet present-day post-production 2D-to-3D conversion of planar video still consumes enormous manpower and material resources.
The VGG16 model involved in the present invention was proposed by K. Simonyan and A. Zisserman in "Very deep convolutional networks for large-scale image recognition", arXiv:1409.1556, 2014. The open-source project accompanying this paper includes the parameter file vgg16-0001.params, whose contents include the weights of every convolution kernel of each layer of the VGG16 model.
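As an aside for illustration (not part of the patent text): the file name vgg16-0001.params follows the MXNet checkpoint convention, so a file of this kind can be inspected with MXNet; everything beyond the cited file name is an assumption.

    # Minimal sketch: inspect a VGG16 weight file of the kind named above,
    # assuming it is in MXNet .params format (a dict of name -> NDArray).
    import mxnet as mx

    params = mx.nd.load("vgg16-0001.params")
    for name, array in sorted(params.items()):
        # keys typically look like "arg:conv1_1_weight", "arg:conv1_1_bias"
        print(name, array.shape)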
Summary of the invention
The present invention provides a method for converting 2D video into 3D video. Its purpose is to eliminate the unpredictable errors introduced by the prior art's scene-depth estimation and view synthesis, while greatly increasing the computation speed of the stereoscopic video conversion, so as to obtain a better stereoscopic viewing experience.
The method for converting 2D video into 3D video provided by the present invention comprises a training stage and a usage stage, and is characterized in that:
(1) The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) Data input step: obtain, from publicly available video data resources, left-right view format 3D stereoscopic movie clips on the order of 10^6 frames, and select those clips whose disparity range is -15~+16 as the training data set; the image size in the stereoscopic movie clips is H rows by W columns, H = 200~1920, W = 180~1080;
(1.2) Feature extraction step: split the left-right view format stereoscopic video in the training set into stereo image pairs of left and right views. Keep the right view unchanged, and apply, in order, mathematical convolution operations and pooling (down-sampling) operations to each left-view frame to extract features, obtaining the 1st through 21st layer convolution feature matrices of that frame as its image features;
(1.3) Feature fusion step: apply matrix deconvolution operations to the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices respectively, and combine the resulting first group of deconvolution feature matrices D3 through fifth group D21 by cascading to form the fusion feature matrix Dc, of size H × W × 32, i.e., H rows, W columns and 32 tensor dimensions;
(1.4) View synthesis step:
Apply the regression parameter matrix θ to the fusion feature matrix Dc to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view;
From the original left view and the disparity probability matrix Dep, obtain the synthesized right view through view synthesis;
(1.5) Parameter update step:
Compute the error matrix err_R between the synthesized right view and the right view from step (1.2);
For M consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16; propagate the resulting errS_R through the back-propagation algorithm to each sub-step of the view synthesis, feature fusion and feature extraction steps, updating in reverse order the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel of every layer, completing one update of all parameters;
Return to the feature extraction step (1.2) and continue with the remaining left views in the training set, performing the above feature extraction, feature fusion, view synthesis and parameter update steps in turn. When all left and right views in the training set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer is complete;
Following steps (1.1) to (1.5), continue with the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer; repeat in this way, and after the 50th to 200th round of updates is completed, the training stage ends;
(2) The usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step;
(2.1) Data input step: prepare the planar 2D video to be converted;
(2.2) Feature extraction step: split the planar 2D video into images, each treated as the left view of step (1.2). Apply, in order, mathematical convolution operations and pooling (down-sampling) operations to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features; the weights of every convolution kernel of every layer are those obtained after the training stage;
(2.3) Feature fusion step: identical to step (1.3);
(2.4) View synthesis step: identical to step (1.4). From the disparity probability matrix Dep obtained as in step (1.4) and the image of (2.2), obtain the synthesized right view through view synthesis;
Perform the above feature extraction step (2.2), feature fusion step (2.3) and view synthesis step (2.4) on each left-view frame in turn, stitch the original image as the left view side by side with the synthesized right view, and join the frames one after another to obtain the left-right format 3D stereoscopic video.
The feature extraction step (1.2) comprises the following sub-steps:
(1.2.1) Apply a convolution operation to a left-view frame to obtain the first layer convolution feature matrix:
Use a 3 × 3 convolution kernel with stride 1. Starting from the upper-left corner of the image, move right step by step until the right boundary of the image, then move to the next row of the image and continue moving from left to right, until the lower-right corner of the image. At each position, multiply each weight of the convolution kernel by the pixel value of the image at the corresponding position and sum all the products, giving the convolution value of the image region covered by the kernel. Arrange the convolution values of all image regions according to their original positions to form the first-level convolution feature matrix C1_1 of the frame, and set all negative element values of C1_1 to zero;
In total, 64 convolution kernels of size 3 × 3 are applied to the image as above, giving 64 first-level convolution feature matrices C1_1, C1_2, C1_3 … C1_64, which form the first layer convolution feature matrix;
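For illustration only (not part of the patent text): a minimal NumPy sketch of the single-kernel operation of sub-step (1.2.1), a 3 × 3 convolution with stride 1 followed by zeroing of negative values. The function name and the shrunken ("valid") output size are assumptions.

    # Minimal sketch: one 3x3 convolution, stride 1, then zeroing of
    # negative values (i.e. ReLU), as described in sub-step (1.2.1).
    import numpy as np

    def conv3x3_relu(image, kernel):
        """image: H x W array, kernel: 3 x 3 array -> (H-2) x (W-2) array."""
        h, w = image.shape
        out = np.zeros((h - 2, w - 2))
        for i in range(h - 2):            # slide top to bottom
            for j in range(w - 2):        # slide left to right
                region = image[i:i + 3, j:j + 3]
                out[i, j] = np.sum(region * kernel)
        return np.maximum(out, 0.0)       # negative elements set to zero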
(1.2.2) Apply a convolution operation to the first layer convolution feature matrix to obtain the second layer convolution feature matrix:
Use a 3 × 3 convolution kernel with stride 1. Starting from the upper-left corner of the first-level convolution feature matrix C1_1, move right step by step until the right boundary of C1_1, then move to the next row and continue moving from left to right, until the lower-right corner of C1_1. At each position, multiply each weight of the kernel by the matrix element of C1_1 at the corresponding position and sum all the products, giving the convolution value of the region of C1_1 covered by the kernel. Arrange the convolution values of all regions of C1_1 according to their original positions to form a convolution feature matrix, and set all its negative element values to zero;
Then, for the remaining 63 first-level convolution feature matrices C1_2, C1_3 … C1_64, repeat the operation of the preceding paragraph with this layer's corresponding convolution kernels, obtaining 64 convolution feature matrices in total, which are added directly to form the second-level convolution feature matrix C2_1;
In total, 64 convolution kernels of size 3 × 3 are applied as above to the 64 first-level convolution feature matrices C1_1, C1_2, C1_3 … C1_64 of the first layer, giving 64 second-level convolution feature matrices C2_1, C2_2, C2_3 … C2_64, which form the second layer convolution feature matrix;
(1.2.3) Apply the first pooling (down-sampling) operation to the second layer convolution feature matrix to obtain the third layer convolution feature matrix:
For the second-level convolution feature matrix C2_1, use a 2 × 2 sliding window with stride 2. Starting from the upper-left corner of C2_1, move right step by step until the right boundary of C2_1, then move to the next row and continue from left to right, until the lower-right corner of C2_1. At each position, take the maximum matrix element of C2_1 within the 2 × 2 sliding-window region as the pooled sampling feature value of that region. Arrange the pooled sampling feature values of all regions of C2_1 according to their original positions to form the third-level convolution feature matrix C3_1;
Apply the above pooling (down-sampling) operation in turn to the remaining 63 second-level convolution feature matrices C2_2, C2_3 … C2_64, obtaining 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 in total, which form the third layer convolution feature matrix;
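For illustration only: a minimal NumPy sketch of the 2 × 2, stride-2 max pooling of sub-step (1.2.3); the function name is an assumption.

    # Minimal sketch of the 2x2, stride-2 max-pooling of sub-step (1.2.3).
    import numpy as np

    def maxpool2x2(m):
        """m: H x W array with even H, W -> (H/2) x (W/2) array."""
        h, w = m.shape
        out = np.zeros((h // 2, w // 2))
        for i in range(0, h - 1, 2):
            for j in range(0, w - 1, 2):
                out[i // 2, j // 2] = m[i:i + 2, j:j + 2].max()
        return out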
(1.2.4) Apply a convolution operation to the third layer convolution feature matrix to obtain the fourth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64, obtaining 64 convolution feature matrices in total, which are added directly to form the fourth-level convolution feature matrix C4_1;
In total, 128 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 of the third layer, giving 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128, which form the fourth layer convolution feature matrix;
(1.2.5) Apply a convolution operation to the fourth layer convolution feature matrix to obtain the fifth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128, obtaining 128 convolution feature matrices in total, which are added directly to form the fifth-level convolution feature matrix C5_1;
In total, 128 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 128 fourth-level convolution feature matrices C4_1, C4_2, C4_3 … C4_128 of the fourth layer, giving 128 fifth-level convolution feature matrices C5_1, C5_2, C5_3 … C5_128, which form the fifth layer convolution feature matrix;
(1.2.6) Apply the second pooling (down-sampling) operation to the fifth layer convolution feature matrix to obtain the sixth layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 128 fifth-level convolution feature matrices C5_1, C5_2, C5_3 … C5_128, obtaining 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 in total, which form the sixth layer convolution feature matrix;
(1.2.7) Apply a convolution operation to the sixth layer convolution feature matrix to obtain the seventh layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128, obtaining 128 convolution results, which are added directly to form the seventh-level convolution feature matrix C7_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 of the sixth layer, giving 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256, which form the seventh layer convolution feature matrix;
(1.2.8) Apply a convolution operation to the seventh layer convolution feature matrix to obtain the eighth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256, obtaining 256 convolution results, which are added directly to form the eighth-level convolution feature matrix C8_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 256 seventh-level convolution feature matrices C7_1, C7_2, C7_3 … C7_256 of the seventh layer, giving 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256, which form the eighth layer convolution feature matrix;
(1.2.9) Apply a convolution operation to the eighth layer convolution feature matrix to obtain the ninth layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256, obtaining 256 convolution results, which are added directly to form the ninth-level convolution feature matrix C9_1;
In total, 256 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 256 eighth-level convolution feature matrices C8_1, C8_2, C8_3 … C8_256 of the eighth layer, giving 256 ninth-level convolution feature matrices C9_1, C9_2, C9_3 … C9_256, which form the ninth layer convolution feature matrix;
(1.2.10) Apply the third pooling (down-sampling) operation to the ninth layer convolution feature matrix to obtain the tenth layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 256 ninth-level convolution feature matrices C9_1, C9_2, C9_3 … C9_256, obtaining 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 in total, which form the tenth layer convolution feature matrix;
(1.2.11) Apply convolution operations in turn to the tenth layer convolution feature matrix to obtain, in order, the 11th, 12th and 13th layer convolution feature matrices:
Following the operation of sub-step (1.2.7), apply a convolution operation with 512 kernels to the 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 of the tenth layer, obtaining 512 eleventh-level convolution feature matrices C11_1, C11_2, C11_3 … C11_512, which form the 11th layer convolution feature matrix;
Following the operation of sub-step (1.2.8), apply a convolution operation with 512 kernels to the 512 eleventh-level convolution feature matrices C11_1, C11_2, C11_3 … C11_512 of the 11th layer, obtaining 512 twelfth-level convolution feature matrices C12_1, C12_2, C12_3 … C12_512, which form the 12th layer convolution feature matrix;
Following the operation of sub-step (1.2.9), apply a convolution operation with 512 kernels to the 512 twelfth-level convolution feature matrices C12_1, C12_2, C12_3 … C12_512 of the 12th layer, obtaining 512 thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512, which form the 13th layer convolution feature matrix;
(1.2.12) Apply the fourth pooling (down-sampling) operation to the thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512 to obtain the 14th layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 512 thirteenth-level convolution feature matrices C13_1, C13_2, C13_3 … C13_512, obtaining 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 in total, which form the 14th layer convolution feature matrix;
(1.2.13) Apply convolution operations in turn to the 14th layer convolution feature matrix to obtain, in order, the 15th, 16th and 17th layer convolution feature matrices:
Following the operation of sub-step (1.2.11), apply a convolution operation with 512 kernels to the 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 of the 14th layer, obtaining 512 fifteenth-level convolution feature matrices C15_1, C15_2, C15_3 … C15_512, which form the 15th layer convolution feature matrix;
Likewise, apply a convolution operation with 512 kernels to the 512 fifteenth-level convolution feature matrices C15_1, C15_2, C15_3 … C15_512 of the 15th layer, obtaining 512 sixteenth-level convolution feature matrices C16_1, C16_2, C16_3 … C16_512, which form the 16th layer convolution feature matrix;
Likewise, apply a convolution operation with 512 kernels to the 512 sixteenth-level convolution feature matrices C16_1, C16_2, C16_3 … C16_512 of the 16th layer, obtaining 512 seventeenth-level convolution feature matrices C17_1, C17_2, C17_3 … C17_512, which form the 17th layer convolution feature matrix;
(1.2.14) Apply the fifth pooling (down-sampling) operation to the 17th layer convolution feature matrix to obtain the 18th layer convolution feature matrix:
In the same manner as sub-step (1.2.3), apply the pooling (down-sampling) operation in turn to the 512 seventeenth-level convolution feature matrices C17_1, C17_2, C17_3 … C17_512, obtaining 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512 in total, which form the 18th layer convolution feature matrix;
(1.2.15) Apply a convolution operation to the 18th layer convolution feature matrix to obtain the 19th layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 3 × 3 convolution kernel with stride 1 and apply matrix convolution separately to the 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512, obtaining 512 convolution results, which are added directly to form the nineteenth-level convolution feature matrix C19_1;
In total, 4096 convolution kernels of size 3 × 3 are applied, as in the preceding paragraph, to the 512 eighteenth-level convolution feature matrices C18_1, C18_2, C18_3 … C18_512 of the 18th layer, giving 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096, which form the 19th layer convolution feature matrix;
(1.2.16) Apply a convolution operation to the 19th layer convolution feature matrix to obtain the 20th layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 1 × 1 convolution kernel with stride 1 and apply matrix convolution separately to the 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096, obtaining 4096 convolution results, which are added directly to form the twentieth-level convolution feature matrix C20_1;
In total, 4096 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 4096 nineteenth-level convolution feature matrices C19_1, C19_2, C19_3 … C19_4096 of the 19th layer, giving 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096, which form the 20th layer convolution feature matrix;
(1.2.17) Apply a convolution operation to the 20th layer convolution feature matrix to obtain the 21st layer convolution feature matrix:
In the same manner as sub-step (1.2.2), use a 1 × 1 convolution kernel with stride 1 and apply matrix convolution separately to the 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096, obtaining 4096 convolution results, which are added directly to form the twenty-first-level convolution feature matrix C21_1;
In total, 32 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 4096 twentieth-level convolution feature matrices C20_1, C20_2, C20_3 … C20_4096 of the 20th layer, giving 32 twenty-first-level convolution feature matrices C21_1, C21_2, C21_3 … C21_32, which form the 21st layer convolution feature matrix;
In sub-steps (1.2.1) to (1.2.17), the weights of every convolution kernel involved are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter take the weights of each layer's convolution kernels produced by the parameter update step (1.5);
For every left-view frame, sub-steps (1.2.1) to (1.2.17) yield the twenty-one layers of convolution feature matrices extracted at different scales.
The feature fusion step (1.3) comprises the following sub-steps:
(1.3.1) Apply a deconvolution operation to the third layer convolution feature matrix produced by the first pooling (down-sampling) operation, obtaining the first group of deconvolution feature matrices D3:
Using a 1 × 1 convolution kernel, apply matrix deconvolution separately to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64, obtaining 64 convolution outputs, which are added directly to form the first-group deconvolution feature matrix D3_1;
In total, 32 convolution kernels of size 1 × 1 are applied, as in the preceding paragraph, to the 64 third-level convolution feature matrices C3_1, C3_2, C3_3 … C3_64 of the third layer, giving 32 deconvolution feature matrices D3_1, D3_2, D3_3 … D3_32, which form the first group of deconvolution feature matrices D3;
(1.3.2) Apply a deconvolution operation to the sixth layer convolution feature matrix produced by the second pooling (down-sampling) operation, obtaining the second group of deconvolution feature matrices D6:
Using a 2 × 2 convolution kernel, apply matrix deconvolution separately to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128, obtaining 128 convolution outputs, which are added directly to form the second-group deconvolution feature matrix D6_1;
In total, 32 convolution kernels of size 2 × 2 are applied, as in the preceding paragraph, to the 128 sixth-level convolution feature matrices C6_1, C6_2, C6_3 … C6_128 of the sixth layer, giving 32 deconvolution feature matrices D6_1, D6_2, D6_3 … D6_32, which form the second group of deconvolution feature matrices D6;
(1.3.3) Apply deconvolution operations to the tenth layer convolution feature matrix produced by the third pooling (down-sampling) operation and to the 14th layer convolution feature matrix produced by the fourth pooling (down-sampling) operation, obtaining the third group of deconvolution feature matrices D10 and the fourth group of deconvolution feature matrices D14 respectively:
Following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 4 × 4 are applied in total to the 256 tenth-level convolution feature matrices C10_1, C10_2, C10_3 … C10_256 of the tenth layer, obtaining 32 deconvolution feature matrices D10_1, D10_2, D10_3 … D10_32, which form the third group of deconvolution feature matrices D10;
In total, 32 convolution kernels of size 8 × 8 are applied to the 512 fourteenth-level convolution feature matrices C14_1, C14_2, C14_3 … C14_512 of the 14th layer, obtaining 32 deconvolution feature matrices D14_1, D14_2, D14_3 … D14_32, which form the fourth group of deconvolution feature matrices D14;
(1.3.4) Apply matrix deconvolution to the 21st layer convolution feature matrix to obtain the fifth group of deconvolution feature matrices:
Following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 16 × 16 are applied in total to the 32 twenty-first-level convolution feature matrices C21_1, C21_2, C21_3 … C21_32 of the 21st layer, obtaining 32 deconvolution feature matrices D21_1, D21_2, D21_3 … D21_32, which form the fifth group of deconvolution feature matrices D21;
(1.3.5) Combine the first group of deconvolution feature matrices D3 through the fifth group D21 by the cascade of formula (1) to form the fusion feature matrix Dc;
In sub-steps (1.3.1) to (1.3.4), the weights of every convolution kernel involved are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter take the weights of each layer's convolution kernels produced by the parameter update step (1.5).
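For illustration only: a sketch of the fusion of sub-step (1.3.5). The patent's formula (1) is not reproduced in this text; because each group D3 … D21 and the result Dc are all stated to be H × W × 32, the sketch assumes an element-wise sum, which is one possible reading of the cascade.

    # Minimal sketch of feature fusion, assuming element-wise summation
    # of the five deconvolution groups (each H x W x 32) into Dc.
    import numpy as np

    def fuse(groups):
        """groups: list of five H x W x 32 arrays -> H x W x 32 array."""
        dc = np.zeros_like(groups[0])
        for g in groups:
            dc += g
        return dc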
The view synthesis step (1.4) comprises the following sub-steps:
(1.4.1) Disparity map: for the fusion feature matrix Dc, compute formula (2) below to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view:

    Dep_k = p(y = k | x; θ) = exp(θ_k · Dc_k) / Σ_{p=1}^{32} exp(θ_p · Dc_p),   k = 1, 2, …, 32   (2)

evaluated element-wise over the H × W positions. Each Dep_k is itself a matrix, giving the probability that each pixel of the left view takes disparity k − 16 when the regression parameter matrix is θ. The regression parameter matrix θ consists of the regression parameters [θ_1, θ_2, …, θ_32], and exp(θ_p · Dc_p) denotes the logistic-regression value of the p-th tensor dimension of Dc under regression parameter θ_p, p = 1, 2, … 32. Each regression parameter of θ is initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter takes the regression parameters produced by the parameter update step (1.5);
(1.4.2) Form the synthesized right view R, whose pixel value R_{i,j} at row i, column j is given by formula (3):

    R_{i,j} = Σ_{d=-15}^{+16} Dep^k_{i,j} · L^d_{i,j},   k = d + 16   (3)

where L^d is the pan view obtained by translating the original left view by disparity d, and L^d_{i,j} = L_{i,j-d} is its pixel value at row i, column j, i.e., the pixel value of the original left view at position (i, j − d), with L^d_{i,j} = 0 if j − d < 0; d is the disparity, −15 ≤ d ≤ +16; Dep^k_{i,j} is the element of matrix Dep_k at position (i, j), i.e., the probability that the left-view pixel at position (i, j) takes disparity d, with k = d + 16, i = 1~H, j = 1~W.
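For illustration only: a NumPy sketch of sub-steps (1.4.1)-(1.4.2) under the reconstructions of formulas (2) and (3) above (a per-channel softmax over Dc, then a probability-weighted sum of horizontally shifted left views); all names are assumptions.

    # Sketch of view synthesis: softmax disparity probabilities (formula 2)
    # followed by the probability-weighted shifted sum (formula 3).
    import numpy as np

    def view_synthesis(left, dc, theta):
        """left: H x W; dc: H x W x 32; theta: length-32 vector."""
        logits = dc * theta.reshape(1, 1, 32)          # theta_p * Dc_p
        e = np.exp(logits - logits.max(axis=2, keepdims=True))
        dep = e / e.sum(axis=2, keepdims=True)         # formula (2), element-wise
        h, w = left.shape
        right = np.zeros((h, w))
        for k in range(32):                            # zero-based channel index
            d = (k + 1) - 16                           # disparity d = k - 16, 1-based k
            shifted = np.zeros((h, w))
            if d >= 0:
                shifted[:, d:] = left[:, :w - d]       # L[i, j-d]; zero if j-d < 0
            else:
                shifted[:, :w + d] = left[:, -d:]
            right += dep[:, :, k] * shifted            # formula (3)
        return right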
The parameter update step (1.5) comprises the following sub-steps:
(1.5.1) Subtract the right view from step (1.2) from the synthesized right view R to obtain the error matrix err_R; for M consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16;
(1.5.2) Propagate the propagated error matrix errS_R back to the view synthesis step (1.4). First update the disparity probability matrix Dep to the new disparity probability matrix Dep^{l+1}, whose element matrices Dep_k^{l+1} are given by formula (4):

    Dep_k^{l+1} = Dep_k^l − η · errS_R ⊙ L^d,   d = k − 16   (4)

where Dep_k^{l+1} is the (l+1)-th updated value for disparity d (d = k − 16), Dep_k^l is its previous updated value, ⊙ denotes element-wise multiplication, and the learning rate η is initialized to 0.00003~0.00008;
(1.5.3) Next, update the regression parameter matrix θ to θ^{l+1}; each regression parameter θ_p^{l+1} of θ^{l+1} is updated by formula (5):

    θ_p^{l+1} = θ_p^l − λ · Σ_{i,j} [errS_R]_{i,j} [L_p]_{i,j} [Dep_p]_{i,j} (1 − [Dep_p]_{i,j}) [Dc_p]_{i,j}   (5)

where θ_p^{l+1} is the (l+1)-th updated value of regression parameter θ_p and θ_p^l is its previous updated value; Dep_p is the Dep_k of the view synthesis step (1.4) after the parameter update, with p = k; L_p is the L^d of the view synthesis step, with d = p − 16; Dc_p is the Dc_t of step (1.3.5), with t = p; p ranges from 1 to 32; λ is the update rate of θ, 0.00001 ≤ λ ≤ 0.01;
(1.5.4) Convert the propagated error matrix errS_R into the feature error matrix errD_R according to formula (6):

    [errD_R]_p = θ_p · errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p),   p = 1, 2, …, 32   (6)
Then feed the feature error matrix errD_R into the feature extraction step (1.2) and the feature fusion step (1.3), and update the weights of every convolution kernel involved using the iterative scheme of the caffe deep-learning framework.
From the 21st layer down to the first layer, a total of 12608 convolution kernels, comprising 113472 convolution-kernel weights, are updated in turn; sub-steps (1.5.2)~(1.5.4) complete one update of all parameters.
The feature extraction and feature fusion steps share the layer structure of the VGG16 model used for parameter initialization. In actual operation, the weight update of each convolution kernel uses the iterative scheme of the caffe deep-learning framework (Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding. In: Proc. ACM International Conference on Multimedia, 2014: 675-678.), the gradient-descent error back-propagation approach in general use in academia and industry. Its core is as follows: the current value of each convolution-kernel weight plus its corresponding residual value gives the updated weight. A layer's input convolution feature matrix, after convolution with that layer's kernels, yields the output convolution feature matrix, which serves as the next layer's input convolution feature matrix; a weight connection is therefore considered to exist, through that layer's kernels, between corresponding positions of this layer's input convolution feature matrix and the next layer's input convolution feature matrix.
The residual values of the weights of a given layer's convolution kernels form a matrix, solved as follows:
multiply the residual value of each weight on every kernel of the following layer that has a weight connection with this kernel by the weight at the corresponding position, sum all the results to obtain the result matrix, and set any element below zero to zero; the resulting matrix is the residual value matrix.
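For illustration only: a toy sketch of the residual rule just described (propagate the following layer's residuals back through the connecting weights, then zero negative entries, matching the ReLU used in feature extraction); all names are assumptions.

    # Toy sketch of the residual-value rule described above.
    import numpy as np

    def layer_residuals(next_residuals, connecting_weights):
        """next_residuals: residuals of the following layer's weights, one
        row per connection; connecting_weights: same shape, the weights at
        the corresponding positions. Returns the residual value matrix."""
        result = np.sum(next_residuals * connecting_weights, axis=0)
        return np.maximum(result, 0.0)   # entries below zero are set to zero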
The present invention comprises a training stage and a usage stage. Through the four core procedures of feature extraction, feature fusion, view synthesis and parameter update, the training stage trains on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, optimizing the scene-depth estimation and view synthesis tasks jointly to determine the parameters. This guarantees pixel-level accuracy in the predicted right view and reduces the error introduced when 2D-to-3D conversion is, as usual, split into two separate tasks. Once training is complete, 2D video can be converted into 3D video directly, with no separate hand-off from scene-depth estimation to view synthesis in between, because during training the successive steps feed one another and are optimized and trained together. In the usage stage this greatly improves 2D-to-3D conversion efficiency and guarantees the accuracy of the final output 3D stereoscopic video.
Compared with other existing 2D-to-3D video conversion methods, the outstanding effects of the invention are the following:
1. Compared with the previous technical route of first estimating scene depth and then performing view synthesis, the present invention is the first to unify scene-depth estimation and view synthesis in a single framework. On the one hand, the design of the steps is concise and computation after training is fast; on the other hand, it removes the errors caused by the intermediate computations, in particular by the currently insufficient accuracy of scene-depth estimation, and so achieves higher output accuracy;
2. The present invention achieves image output at the pixel level: for each pixel of the original image, it predicts the possible pixel mappings and accurately obtains the pixel's distribution in the other viewpoint. It is trained on left-right format 3D stereoscopic movie clips on the order of 10^6 frames, productions designed from the outset to give viewers a strong sense of visual comfort. This guarantees the accuracy of the final output stereoscopic video while delivering a strong visual impact when the video is watched.
Description of the drawings
Fig. 1 is the flow diagram of the present invention;
Fig. 2 shows input and output images of the present invention: the top row is the input image, and the bottom row the corresponding synthesized right view.
Specific embodiment
The present invention is described in more detail below with reference to the drawings and embodiments.
As shown in Figure 1, the present invention comprises a training stage and a usage stage. The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step; the usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step.
The embodiment of the present invention comprises a training stage and a usage stage:
(1) The training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) Data input step: obtain, from publicly available video data resources, left-right view format 3D stereoscopic movie clips on the order of 10^6 frames, and select those clips whose disparity range is -15 to +16 as the training data set; the image size in the stereoscopic movie clips is H rows by W columns, H = 200~1920, W = 180~1080;
The left-right view format 3D stereoscopic movie clips contain different types of stereoscopic film data, including action films, feature films, foreign animation, documentaries, exhibited 3D promotional videos and so on;
The 3D stereoscopic movie clips with a disparity range of -15~+16 are picked out from the 3D stereoscopic movie clips as follows (a sketch of this filtering pipeline follows the list):
(a) convert the video data in the 3D stereoscopic movie clips, clip by clip, into stereo image pairs of left and right view format;
(b) from the left and right views obtained in (a), obtain the corresponding disparity map using a stereo matching algorithm;
(c) apply mean-filter smoothing to the disparity map obtained in (b);
(d) compute a histogram of the smoothed disparity map of (c), and obtain the maximum and minimum disparity values in the histogram;
(e) from the maximum and minimum disparity values obtained in (d), judge whether the stereo frame lies within the -15~+16 disparity range; if so, keep it, otherwise discard the current stereo frame.
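For illustration only: a sketch of steps (a)-(e), using OpenCV's semi-global block matcher as a stand-in for the binocular stereo matching algorithm cited below; the thresholds follow the list above, and everything else is an assumption.

    # Sketch of the disparity-range filter (a)-(e), assuming OpenCV.
    import cv2
    import numpy as np

    def keep_frame(left_gray, right_gray, lo=-15, hi=16):
        # (b) stereo matching -> disparity map
        matcher = cv2.StereoSGBM_create(minDisparity=lo,
                                        numDisparities=32,  # covers -15..+16
                                        blockSize=9)
        disp = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        # (c) mean-filter smoothing
        disp = cv2.blur(disp, (5, 5))
        # (d) extreme disparity values (histogram statistics reduce to min/max)
        d_min, d_max = float(disp.min()), float(disp.max())
        # (e) keep the frame only if the range lies within [-15, +16]
        return lo <= d_min and d_max <= hi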
Every video clip initially chosen undergoes the above processing, guaranteeing no interference from individually doctored video segments. In this embodiment, image data of roughly 1.2 million frames is finally obtained through the above process and used as the training data set for determining the method's parameters.
In this embodiment, the binocular stereo matching algorithm (D. Scharstein, R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, 2002, 47(1-3), pp. 7-42.) is used to stereo-match the left and right channels of the videos in the training set and obtain the disparity distribution of each binocular stereoscopic video. This algorithm is a general-purpose solution, and this embodiment uses its official open-source project.
After the training data set is selected, the formats of the different training videos are unified. Considering that the image sizes of different videos in the training set can differ considerably, the image dimensions must be unified to guarantee the trainability of the model parameters; in this embodiment, the images of all videos are scaled to a uniform 640 × 960.
(1.2) Feature extraction step: split the left-right format stereoscopic video in the training set into stereo image pairs of left- and right-channel views. Keep the right view unchanged, and apply, in order, mathematical convolution operations and pooling (down-sampling) operations to each left-view frame to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features.
The feature extraction step uses a large number of convolution operations and pooling (down-sampling) operations, repeatedly extracting features at different scales and from different regions. For clarity, the operation forming each layer's convolution feature matrix, together with the kernel size and count or sliding-window size and stride, is listed below:
Layer 1: convolution, kernel size 3 × 3, count 64;
Layer 2: convolution, kernel size 3 × 3, count 64;
Layer 3: first pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 4: convolution, kernel size 3 × 3, count 128;
Layer 5: convolution, kernel size 3 × 3, count 128;
Layer 6: second pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 7: convolution, kernel size 3 × 3, count 256;
Layer 8: convolution, kernel size 3 × 3, count 256;
Layer 9: convolution, kernel size 3 × 3, count 256;
Layer 10: third pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 11: convolution, kernel size 3 × 3, count 512;
Layer 12: convolution, kernel size 3 × 3, count 512;
Layer 13: convolution, kernel size 3 × 3, count 512;
Layer 14: fourth pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 15: convolution, kernel size 3 × 3, count 512;
Layer 16: convolution, kernel size 3 × 3, count 512;
Layer 17: convolution, kernel size 3 × 3, count 512;
Layer 18: fifth pooling (down-sampling) operation, sliding window 2 × 2, stride 2;
Layer 19: convolution, kernel size 3 × 3, count 4096;
Layer 20: convolution, kernel size 1 × 1, count 4096;
Layer 21: convolution, kernel size 1 × 1, count 32;
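For illustration only: the 21-layer schedule above written as a compact Python table (a description of the listing, not the patent's own code; the "conv"/"pool" labels are assumptions).

    # The 21-layer feature-extraction schedule as a data table:
    # ("conv", kernel_size, kernel_count) or ("pool", window, stride).
    LAYERS = [
        ("conv", 3, 64),  ("conv", 3, 64),  ("pool", 2, 2),
        ("conv", 3, 128), ("conv", 3, 128), ("pool", 2, 2),
        ("conv", 3, 256), ("conv", 3, 256), ("conv", 3, 256), ("pool", 2, 2),
        ("conv", 3, 512), ("conv", 3, 512), ("conv", 3, 512), ("pool", 2, 2),
        ("conv", 3, 512), ("conv", 3, 512), ("conv", 3, 512), ("pool", 2, 2),
        ("conv", 3, 4096), ("conv", 1, 4096), ("conv", 1, 32),
    ]
    # Fusion taps: layers 3, 6, 10, 14 and 21 feed the deconvolution groups.
    FUSED_LAYERS = (3, 6, 10, 14, 21)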
The features obtained from the input left view after several successive convolution feature extractions have strong local-region and global spatial expressive power. For an input picture of size 640 × 960, the left image and the right image are each of size 640 × 480.
After the convolution operations and pooling (down-sampling) operations, the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices of the left view are of sizes 320 × 240, 160 × 120, 80 × 60, 40 × 30 and 20 × 15 respectively.
(1.3) Feature fusion step: apply matrix deconvolution operations to the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices respectively, and combine the resulting first group of deconvolution feature matrices D3 through fifth group D21 by cascading to form the fusion feature matrix Dc; in this example the size of Dc is 640 × 480 × 32, i.e., the matrix has 640 rows, 480 columns and 32 tensor dimensions;
The present invention finally outputs an image at the pixel level. Therefore, to ensure that the pixel-level synthesized right view responds well to different scales, different regions, and local as well as global features, the present invention fuses the convolution feature matrices output by different convolutional layers. After each pooling (down-sampling) operation, the dimensions of the feature matrices shrink to half those of the previous scale; for example, each feature matrix among C3_1, C3_2, C3_3 … C3_64 has twice the length and width of each matrix among C6_1, C6_2, C6_3 … C6_128, i.e., matrix C3_1 is twice as long and twice as wide as C6_1. To keep the matrix dimensions consistent during feature fusion, the present invention applies matrix deconvolution to the matrices output after the different pooling (down-sampling) layers; the deconvolution operations on the outputs of the different pooling layers differ only in the size of the convolution kernel.
Taking the matrix C6_1 to be deconvolved as an example, the concrete deconvolution operation is as follows. First enlarge the length and width of C6_1 by a factor of N, where N is the size of the deconvolution kernel; as noted above, the deconvolution kernel sizes in the present invention are set to 2 × 2, 4 × 4, 8 × 8, 16 × 16 and 32 × 32 respectively, i.e., N takes the values 2, 4, 8, 16 and 32 depending on the layer. For the deconvolution of C6_1, N is 4. After the length and width of the matrix are enlarged N times, the intermediate values are filled in by nearest-neighbor interpolation, giving the matrix C6_1′. Then, using an N × N convolution kernel with stride N/2, start from the upper-left corner of C6_1′, move right step by step until the right boundary of C6_1′, then move to the next row of C6_1′ and continue from left to right, until the lower-right corner of C6_1′. At each position, multiply each weight of the kernel by the value of C6_1′ at the corresponding position and sum all the products, giving the convolution value of that region of C6_1′. Arranging the convolution values of all regions of C6_1′ by their original positions completes the matrix deconvolution operation.
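For illustration only: a NumPy sketch of the deconvolution just described (nearest-neighbor enlargement by N, then an N × N convolution with stride N/2); names and the handling of the border are assumptions.

    # Sketch of the matrix deconvolution: upsample by N with nearest-
    # neighbor interpolation, then convolve with an N x N kernel, stride N/2.
    # Assumes N >= 2, as in the kernel sizes listed above.
    import numpy as np

    def deconv(m, kernel):
        n = kernel.shape[0]                       # deconvolution kernel size N
        up = np.repeat(np.repeat(m, n, axis=0), n, axis=1)  # nearest neighbor
        s = n // 2                                # stride N/2
        h, w = up.shape
        rows = (h - n) // s + 1
        cols = (w - n) // s + 1
        out = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                region = up[i * s:i * s + n, j * s:j * s + n]
                out[i, j] = np.sum(region * kernel)
        return out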
For the left view of input size 640 × 480 in (1.2), after the deconvolution operations on the 3rd, 6th, 10th, 14th and 21st layer convolution feature matrices, the resulting first group of deconvolution feature matrices D3 through fifth group D21 are all of size 640 × 480 × 32. Cascading them finally forms the fusion feature matrix of size 640 × 480 × 32.
(1.4) View synthesis step: apply the regression parameter matrix θ to the fusion feature matrix Dc to obtain, for each pixel of the corresponding left view, the predicted probability of taking each disparity value, forming the disparity probability matrix Dep of the left view; from the original left view and the disparity probability matrix Dep, obtain the synthesized right view through view synthesis. The 640 × 480 × 32 fusion feature matrix passes through the regression parameter matrix θ for disparity probability regression, giving again a 640 × 480 × 32 matrix. Each tensor dimension, i.e., each 640 × 480 matrix, can be understood intuitively as giving, for every pixel of the left view, the probability of taking the corresponding disparity value d. From the 640 × 480 × 32 disparity probability matrix Dep and the 640 × 480 original left view, this embodiment obtains through view synthesis a synthesized right view of the same size as the original left view, 640 × 480;
(1.5) Parameter update step: compute the error matrix err_R between the synthesized right view and the right view from step (1.2);
For 640 consecutive synthesized right views, add the individual error matrices err_R to form the propagated error matrix errS_R. Propagate the resulting errS_R through the back-propagation algorithm to each sub-step of the view synthesis, feature fusion and feature extraction steps, updating in reverse order the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel of every layer, completing one update of all parameters. The error propagation of the feature fusion and feature extraction steps in the parameter update is realized by calling the corresponding function interfaces of the caffe deep-learning framework. In this embodiment the learning rate η is initialized to 0.00005, with 30% learning-rate decay applied after every 20000 rounds of operation;
Return to the feature extraction step (1.2) and continue with the remaining left views in the training set, performing the above feature extraction, feature fusion, view synthesis and parameter update steps in turn. When all left and right views in the training set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer is complete; in this embodiment each round requires 2000 parameter updates;
In the same way, continue with the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and every convolution kernel of every layer, and so on; after the 100th round of updates is completed, the training stage of this embodiment ends;
(2) The usage stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step.
(2.1) Data input step: prepare the planar 2D video to be converted;
(2.2) Feature extraction step: split the planar 2D video into images, each treated as the left view of step (1.2). Apply, in order, mathematical convolution operations and pooling (down-sampling) operations to extract features, obtaining the 1st through 21st layer convolution feature matrices of the frame as its image features; the weights of every convolution kernel of every layer are those obtained after the training stage;
(2.3) Feature fusion step: identical to step (1.3);
(2.4) View synthesis step: identical to step (1.4). From the disparity probability matrix Dep obtained as in step (1.4) and the image of (2.2), obtain the synthesized right view through view synthesis.
Perform the above feature extraction, feature fusion and view synthesis steps on each left-view frame in turn, stitch the original image as the left view side by side with the synthesized right view, and join the frames one after another to obtain the left-right format 3D stereoscopic video.
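For illustration only: a sketch of the final left-right stitching, assuming OpenCV for video output; the file name, codec and frame rate are assumptions.

    # Sketch of the usage-stage stitching: left view + synthesized right
    # view, side by side, frame after frame, into a left-right 3D video.
    import cv2
    import numpy as np

    def write_side_by_side(frames, synth_right, path="out_3d.avi", fps=24):
        """frames / synth_right: lists of H x W x 3 uint8 images."""
        h, w = frames[0].shape[:2]
        fourcc = cv2.VideoWriter_fourcc(*"XVID")
        out = cv2.VideoWriter(path, fourcc, fps, (2 * w, h))
        for left, right in zip(frames, synth_right):
            out.write(np.hstack([left, right]))   # one left-right format frame
        out.release()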
The experimental results of the output part are shown in Fig. 2: the top row, from left to right, shows three consecutive input frames, and the bottom row shows the corresponding synthesized right views.

Claims (5)

1. A method for converting a 2D video into a 3D video, comprising a training stage and a use stage, characterized in that:
(1) the training stage comprises, in order, a data input step, a feature extraction step, a feature fusion step, a view synthesis step and a parameter update step;
(1.1) data input step: obtain, from publicly available video data resources, on the order of 10^6 3D stereoscopic video clips in left-right view format, and select the 3D stereoscopic video clips whose disparity range is -15~+16 as the training data set; the image size in the stereoscopic video clips is H rows by W columns, H = 200~1920, W = 180~1080;
(1.2) feature extraction step: split the stereoscopic video in left-right view format of the training data set into stereo images in left-right view format; keep the right view unchanged, and pass a frame of left view in turn through mathematical convolution operations and pooling down-sampling operations for feature extraction, obtaining the first-layer to twenty-first-layer convolution feature matrices of that frame as its image features;
(1.3) feature fusion step: apply a matrix deconvolution operation to the third-layer, sixth-layer, tenth-layer, fourteenth-layer and twenty-first-layer convolution feature matrices respectively, and cascade the obtained first group of deconvolution feature matrices D3 through fifth group of deconvolution feature matrices D21 to form the fusion feature matrix Dc; the size of Dc is H × W × 32, i.e. the matrix has H rows, W columns and 32 tensor dimensions;
(1.4) view synthesis step:
for the fusion feature matrix Dc, use the regression parameter matrix θ to obtain the predicted probability of each pixel of the corresponding left view taking each possible disparity, forming the disparity probability matrix Dep of the left view;
from the original left view and the disparity probability matrix Dep, obtain the synthesized right view by view synthesis;
(1.5) parameter update step:
calculate the error matrix err_R between the synthesized right view and the right view of step (1.2);
for M consecutive synthesized right views, sum the individual error matrices err_R to form the propagated error matrix errS_R, M ≥ 16; propagate the obtained propagated error matrix by the back-propagation algorithm to the view synthesis step, the feature fusion step and each sub-step of the feature extraction step, updating in turn the disparity probability matrix Dep, the regression parameter matrix θ and the weights of every convolution kernel in every layer, thereby completing one update of all parameters;
return to the feature extraction step (1.2) and continue with the remaining left views of the training data set, performing in turn the above feature extraction, feature fusion, view synthesis and parameter update steps; when all left and right views of the training data set have been used, the first round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and the convolution kernels of every layer is complete;
following steps (1.1) to (1.5), complete the second round of updates of the disparity probability matrix Dep, the regression parameter matrix θ and the convolution kernels of every layer, and so on; after the 50th to 200th round of updates is completed, the training stage ends;
(2) the use stage comprises, in order, a data input step, a feature extraction step, a feature fusion step and a view synthesis step;
(2.1) data input step: prepare the planar 2D video to be converted;
(2.2) feature extraction step: split the planar 2D video into frames, each treated as the left view of step (1.2); pass each frame in turn through mathematical convolution operations and pooling down-sampling operations for feature extraction, obtaining the first-layer to twenty-first-layer convolution feature matrices of that frame as its image features; the weights of every convolution kernel in every layer are those obtained after the training stage;
(2.3) feature fusion step: identical to step (1.3);
(2.4) view synthesis step: identical to step (1.4); from the disparity probability matrix Dep obtained in step (1.4) and the image of (2.2), a synthesized right view is obtained by view synthesis;
perform the above feature extraction step (2.2), feature fusion step (2.3) and view synthesis step (2.4) on each frame of left view in turn; stitch the original image, taken as the left view, side by side with the obtained synthesized right view, and then concatenate the stitched frames one by one to obtain the 3D stereoscopic video in left-right format.
2. The method for converting a 2D video into a 3D video according to claim 1, characterized in that the feature extraction step (1.2) comprises the following sub-steps:
(1.2.1) applying a convolution operation to a frame of left view to obtain the first-layer convolution feature matrix:
using a 3 × 3 convolution kernel with stride 1, starting from the upper-left corner of the image, the kernel moves to the right step by step until the right border of the image, then moves to the next row of the image and again moves from left to right, until the lower-right corner of the image; at each position, the weights of the convolution kernel are multiplied with the pixel values of the image at the corresponding positions and all products are summed, giving the convolution value of the image region covered by the kernel; the convolution values of all regions of the image, arranged according to their original region positions, form the convolution feature matrix C1_1 of that frame of left view; all elements of C1_1 that are smaller than zero are set to zero;
64 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the image yields 64 convolution feature matrices C1_1, C1_2, C1_3, …, C1_64, which form the first-layer convolution feature matrix; a minimal sketch of this operation is given below;
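As a hedged numpy sketch only (the claim does not state the border handling; "valid" convolution with no padding is assumed here, and cross-correlation is used as the usual CNN reading of the moving-kernel description), sub-step (1.2.1) amounts to:

    import numpy as np

    def conv3x3_relu(image, kernel):
        # image: H x W array; kernel: 3 x 3 weights; stride 1, no padding (assumed)
        H, W = image.shape
        out = np.zeros((H - 2, W - 2))
        for i in range(H - 2):
            for j in range(W - 2):
                # multiply the kernel weights with the covered region, then sum all products
                out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
        return np.maximum(out, 0.0)  # elements smaller than zero are set to zero

    # first-layer convolution feature matrices, one per kernel:
    # C1 = [conv3x3_relu(left_view, k) for k in kernels_64]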
(1.2.2) applying a convolution operation to the first-layer convolution feature matrix to obtain the second-layer convolution feature matrix:
using a 3 × 3 convolution kernel with stride 1, starting from the upper-left corner of the convolution feature matrix C1_1, the kernel moves to the right step by step until the right border of C1_1, then moves to the next row and again moves from left to right, until the lower-right corner of C1_1; at each position, the weights of the convolution kernel are multiplied with the elements of C1_1 at the corresponding positions and all products are summed, giving the convolution value of the region of C1_1 covered by the kernel; the convolution values of all regions of C1_1, arranged according to their original region positions, form a convolution feature matrix; all negative elements of the obtained convolution feature matrix are set to zero;
the operation of the preceding paragraph is then repeated for the remaining 63 convolution feature matrices C1_2, C1_3, …, C1_64, using the corresponding convolution kernels of this layer, giving 64 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C2_1;
64 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the two preceding paragraphs to the 64 convolution feature matrices C1_1, C1_2, C1_3, …, C1_64 of the first-layer convolution feature matrix yields 64 convolution feature matrices C2_1, C2_2, C2_3, …, C2_64, which form the second-layer convolution feature matrix;
(1.2.3) applying a first pooling down-sampling operation to the second-layer convolution feature matrix to obtain the third-layer convolution feature matrix:
for the convolution feature matrix C2_1, a 2 × 2 sliding window with stride 2 is used, starting from the upper-left corner of C2_1, moving to the right step by step until the right border of C2_1, then moving to the next row and again from left to right, until the lower-right corner of C2_1; at each position, the maximum of the elements of C2_1 inside the 2 × 2 sliding window is taken as the pooling sampling feature value of the region covered by the window; the pooling sampling feature values of all regions of C2_1, arranged according to their original region positions, form the convolution feature matrix C3_1;
the above pooling down-sampling operation is applied in turn to the remaining 63 convolution feature matrices C2_2, C2_3, …, C2_64, giving 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 in total, which form the third-layer convolution feature matrix; a minimal sketch of this pooling operation is given below;
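A hedged numpy sketch of the pooling down-sampling of sub-step (1.2.3), assuming the input height and width are even so the stride-2 window tiles the matrix exactly:

    import numpy as np

    def maxpool2x2(feat):
        # feat: H x W array with even H and W; 2 x 2 sliding window, stride 2
        H, W = feat.shape
        out = np.zeros((H // 2, W // 2))
        for i in range(0, H, 2):
            for j in range(0, W, 2):
                # maximum of the four elements inside the sliding window
                out[i // 2, j // 2] = feat[i:i + 2, j:j + 2].max()
        return out

    # third-layer convolution feature matrices:
    # C3 = [maxpool2x2(c) for c in C2]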
(1.2.4) applying a convolution operation to the third-layer convolution feature matrix to obtain the fourth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64, giving 64 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C4_1;
128 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 of the third-layer convolution feature matrix yields 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128, which form the fourth-layer convolution feature matrix;
(1.2.5) applying a convolution operation to the fourth-layer convolution feature matrix to obtain the fifth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128, giving 128 convolution feature matrices in total, which are summed directly to form the convolution feature matrix C5_1;
128 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 128 convolution feature matrices C4_1, C4_2, C4_3, …, C4_128 of the fourth-layer convolution feature matrix yields 128 convolution feature matrices C5_1, C5_2, C5_3, …, C5_128, which form the fifth-layer convolution feature matrix;
(1.2.6) applying a second pooling down-sampling operation to the fifth-layer convolution feature matrix to obtain the sixth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 128 convolution feature matrices C5_1, C5_2, C5_3, …, C5_128, giving 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 in total, which form the sixth-layer convolution feature matrix;
(1.2.7) applying a convolution operation to the sixth-layer convolution feature matrix to obtain the seventh-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1, a matrix convolution operation is applied to each of the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128, giving 128 convolution results, which are summed directly to form the convolution feature matrix C7_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 of the sixth-layer convolution feature matrix yields 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256, which form the seventh-layer convolution feature matrix;
(1.2.8) applying a convolution operation to the seventh-layer convolution feature matrix to obtain the eighth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256, giving 256 convolution results, which are summed directly to form the convolution feature matrix C8_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 256 convolution feature matrices C7_1, C7_2, C7_3, …, C7_256 of the seventh-layer convolution feature matrix yields 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256, which form the eighth-layer convolution feature matrix;
(1.2.9) applying a convolution operation to the eighth-layer convolution feature matrix to obtain the ninth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256, giving 256 convolution results, which are summed directly to form the convolution feature matrix C9_1;
256 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 256 convolution feature matrices C8_1, C8_2, C8_3, …, C8_256 of the eighth-layer convolution feature matrix yields 256 convolution feature matrices C9_1, C9_2, C9_3, …, C9_256, which form the ninth-layer convolution feature matrix;
(1.2.10) applying a third pooling down-sampling operation to the ninth-layer convolution feature matrix to obtain the tenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 256 convolution feature matrices C9_1, C9_2, C9_3, …, C9_256, giving 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 in total, which form the tenth-layer convolution feature matrix;
(1.2.11) applying convolution operations in turn to the tenth-layer convolution feature matrix to obtain, in sequence, the eleventh-layer, twelfth-layer and thirteenth-layer convolution feature matrices:
following an operation similar to sub-step (1.2.7), a convolution operation with 512 convolution kernels is applied to the 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 of the tenth-layer convolution feature matrix, giving 512 convolution feature matrices C11_1, C11_2, C11_3, …, C11_512, which form the eleventh-layer convolution feature matrix;
following an operation similar to sub-step (1.2.8), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C11_1, C11_2, C11_3, …, C11_512 of the eleventh-layer convolution feature matrix, giving 512 convolution feature matrices C12_1, C12_2, C12_3, …, C12_512, which form the twelfth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.9), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C12_1, C12_2, C12_3, …, C12_512 of the twelfth-layer convolution feature matrix, giving 512 convolution feature matrices C13_1, C13_2, C13_3, …, C13_512, which form the thirteenth-layer convolution feature matrix;
(1.2.12) applying a fourth pooling down-sampling operation to the thirteenth-layer convolution feature matrices C13_1, C13_2, C13_3, …, C13_512 to obtain the fourteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 512 convolution feature matrices C13_1, C13_2, C13_3, …, C13_512, giving 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 in total, which form the fourteenth-layer convolution feature matrix;
(1.2.13) applying convolution operations in turn to the fourteenth-layer convolution feature matrix to obtain, in sequence, the fifteenth-layer, sixteenth-layer and seventeenth-layer convolution feature matrices:
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 of the fourteenth-layer convolution feature matrix, giving 512 convolution feature matrices C15_1, C15_2, C15_3, …, C15_512, which form the fifteenth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C15_1, C15_2, C15_3, …, C15_512 of the fifteenth-layer convolution feature matrix, giving 512 convolution feature matrices C16_1, C16_2, C16_3, …, C16_512, which form the sixteenth-layer convolution feature matrix;
following an operation similar to sub-step (1.2.11), a convolution operation with 512 convolution kernels is applied to the 512 convolution feature matrices C16_1, C16_2, C16_3, …, C16_512 of the sixteenth-layer convolution feature matrix, giving 512 convolution feature matrices C17_1, C17_2, C17_3, …, C17_512, which form the seventeenth-layer convolution feature matrix;
(1.2.14) applying a fifth pooling down-sampling operation to the seventeenth-layer convolution feature matrix to obtain the eighteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.3), the pooling down-sampling operation is applied in turn to the 512 convolution feature matrices C17_1, C17_2, C17_3, …, C17_512, giving 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512 in total, which form the eighteenth-layer convolution feature matrix;
(1.2.15) applying a convolution operation to the eighteenth-layer convolution feature matrix to obtain the nineteenth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 3 × 3 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512, giving 512 convolution results, which are summed directly to form the convolution feature matrix C19_1;
4096 convolution kernels of size 3 × 3 are used in total; applying the convolution operation of the preceding paragraph to the 512 convolution feature matrices C18_1, C18_2, C18_3, …, C18_512 of the eighteenth-layer convolution feature matrix yields 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096, which form the nineteenth-layer convolution feature matrix;
(1.2.16) applying a convolution operation to the nineteenth-layer convolution feature matrix to obtain the twentieth-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 1 × 1 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096, giving 4096 convolution results, which are summed directly to form the convolution feature matrix C20_1;
4096 convolution kernels of size 1 × 1 are used in total; applying the convolution operation of the preceding paragraph to the 4096 convolution feature matrices C19_1, C19_2, C19_3, …, C19_4096 of the nineteenth-layer convolution feature matrix yields 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096, which form the twentieth-layer convolution feature matrix;
(1.2.17) applying a convolution operation to the twentieth-layer convolution feature matrix to obtain the twenty-first-layer convolution feature matrix:
in the same manner as in sub-step (1.2.2), using a 1 × 1 convolution kernel with stride 1 and its kernel weights, a matrix convolution operation is applied to each of the 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096, giving 4096 convolution results, which are summed directly to form the convolution feature matrix C21_1;
32 convolution kernels of size 1 × 1 are used in total; applying the convolution operation of the preceding paragraph to the 4096 convolution feature matrices C20_1, C20_2, C20_3, …, C20_4096 of the twentieth-layer convolution feature matrix yields 32 convolution feature matrices C21_1, C21_2, C21_3, …, C21_32, which form the twenty-first-layer convolution feature matrix;
the weights of every convolution kernel involved in sub-steps (1.2.1) to (1.2.17) are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the weights of every convolution kernel in every layer are those obtained after the parameter update step (1.5);
for every frame of left view, sub-steps (1.2.1) to (1.2.17) thus yield 21 layers of convolution feature matrices extracted at different scales; a compact summary of the stack is given below.
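Collecting sub-steps (1.2.1)-(1.2.17), the 21-layer stack can be summarized as the following Python structure (a hedged restatement of the claim, with kernel counts and sizes read directly from the sub-steps; the "conv"/"pool" labels are illustrative):

    layers = [
        ("conv", 64, 3),   ("conv", 64, 3),   ("pool", None, 2),                      # layers 1-3
        ("conv", 128, 3),  ("conv", 128, 3),  ("pool", None, 2),                      # layers 4-6
        ("conv", 256, 3),  ("conv", 256, 3),  ("conv", 256, 3),  ("pool", None, 2),   # layers 7-10
        ("conv", 512, 3),  ("conv", 512, 3),  ("conv", 512, 3),  ("pool", None, 2),   # layers 11-14
        ("conv", 512, 3),  ("conv", 512, 3),  ("conv", 512, 3),  ("pool", None, 2),   # layers 15-18
        ("conv", 4096, 3), ("conv", 4096, 1), ("conv", 32, 1),                        # layers 19-21
    ]
    assert len(layers) == 21  # one entry per convolution feature matrix layer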
3. The method for converting a 2D video into a 3D video according to claim 1 or 2, characterized in that:
the feature fusion step (1.3) comprises the following sub-steps:
(1.3.1) applying a deconvolution operation to the third-layer convolution feature matrix obtained by the first pooling down-sampling operation, to obtain the first group of deconvolution feature matrices D3:
using a 1 × 1 convolution kernel, a matrix deconvolution operation is applied to each of the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64, giving 64 convolution output results, which are summed directly to form the deconvolution feature matrix D3_1;
32 convolution kernels of size 1 × 1 are used in total; applying the operation of the preceding paragraph to the 64 convolution feature matrices C3_1, C3_2, C3_3, …, C3_64 of the third-layer convolution feature matrix yields 32 deconvolution feature matrices D3_1, D3_2, D3_3, …, D3_32, which form the first group of deconvolution feature matrices D3;
(1.3.2) applying a deconvolution operation to the sixth-layer convolution feature matrix obtained by the second pooling down-sampling operation, to obtain the second group of deconvolution feature matrices D6:
using a 2 × 2 convolution kernel, a matrix deconvolution operation is applied to each of the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128, giving 128 convolution output results, which are summed directly to form the deconvolution feature matrix D6_1;
32 convolution kernels of size 2 × 2 are used in total; applying the operation of the preceding paragraph to the 128 convolution feature matrices C6_1, C6_2, C6_3, …, C6_128 of the sixth-layer convolution feature matrix yields 32 deconvolution feature matrices D6_1, D6_2, D6_3, …, D6_32, which form the second group of deconvolution feature matrices D6;
(1.3.3) applying deconvolution operations, respectively, to the tenth-layer convolution feature matrix obtained by the third pooling down-sampling operation and to the fourteenth-layer convolution feature matrix obtained by the fourth pooling down-sampling operation, to obtain the third group of deconvolution feature matrices D10 and the fourth group of deconvolution feature matrices D14:
here, following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 4 × 4 and their kernel weights are used in total to operate on the 256 convolution feature matrices C10_1, C10_2, C10_3, …, C10_256 of the tenth-layer convolution feature matrix, giving 32 deconvolution feature matrices D10_1, D10_2, D10_3, …, D10_32, which form the third group of deconvolution feature matrices D10;
32 convolution kernels of size 8 × 8 are used in total to operate on the 512 convolution feature matrices C14_1, C14_2, C14_3, …, C14_512 of the fourteenth-layer convolution feature matrix, giving 32 deconvolution feature matrices D14_1, D14_2, D14_3, …, D14_32, which form the fourth group of deconvolution feature matrices D14;
(1.3.4) applying a matrix deconvolution operation to the twenty-first-layer convolution feature matrix to obtain the fifth group of deconvolution feature matrices:
following the matrix deconvolution operation of sub-step (1.3.2), 32 convolution kernels of size 16 × 16 and their kernel weights are used in total to operate on the 32 convolution feature matrices C21_1, C21_2, C21_3, …, C21_32 of the twenty-first-layer convolution feature matrix, giving 32 deconvolution feature matrices D21_1, D21_2, D21_3, …, D21_32, which form the fifth group of deconvolution feature matrices D21;
(1.3.5) cascading the first group of deconvolution feature matrices D3 through the fifth group of deconvolution feature matrices D21 to form the fusion feature matrix Dc, the specific cascading being given by formula (1):

Dc_t = D3_t + D6_t + D10_t + D14_t + D21_t,  t = 1, 2, …, N   (1)

where N = 32 is the number of deconvolution feature matrices in each of the groups obtained by sub-steps (1.3.1)-(1.3.4); the obtained fusion feature matrix Dc has size H × W × 32, i.e. H rows, W columns and 32 tensor dimensions, Dc_1 denoting the first of the 32 tensor dimensions and Dc_t the t-th of the 32 tensor dimensions; a minimal sketch of this cascade is given below;
the weights of every convolution kernel involved in sub-steps (1.3.1) to (1.3.4) are initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the weights of every convolution kernel in every layer are those obtained after the parameter update step (1.5).
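A hedged numpy sketch of sub-step (1.3.5), assuming each group has already been brought to the common H × W resolution by its deconvolution kernels (1 × 1, 2 × 2, 4 × 4, 8 × 8 and 16 × 16), and reading formula (1) as the dimension-wise combination above:

    import numpy as np

    def fuse(D3, D6, D10, D14, D21):
        # each argument: array of shape (32, H, W), one group of deconvolution feature matrices
        # formula (1): combine the five groups dimension by dimension into Dc
        return D3 + D6 + D10 + D14 + D21  # Dc: (32, H, W), i.e. 32 tensor dimensions of H rows, W columns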
4. The method for converting a 2D video into a 3D video according to claim 3, characterized in that:
the view synthesis step (1.4) comprises the following sub-steps:
(1.4.1) depth map: the fusion feature matrix Dc is evaluated with the following formula (2) to obtain the predicted probability of each pixel of the corresponding left view taking each possible disparity, forming the disparity probability matrix Dep of the left view:

Dep_k = p(y = k | x; θ) = exp(θ_k Dc_k) / Σ_{p=1}^{32} exp(θ_p Dc_p)   (2)

where each matrix element Dep_k is itself a matrix, k = 1, 2, …, 32, representing the probability value p(y = k | x; θ), i.e. the disparity probability of each pixel of the left view when the disparity is taken as k − 16 and the regression parameter matrix is θ; the regression parameter matrix θ consists of the regression parameters [θ_1, θ_2, …, θ_32]; exp(θ_p Dc_p) represents the logistic regression value of the p-th tensor dimension of Dc under the regression parameter θ_p, p = 1, 2, …, 32; each regression parameter of the regression parameter matrix θ is initialized with the values in the parameter file vgg16-0001.params of the VGG16 model, and thereafter the regression parameters are those of the regression parameter matrix θ obtained after the parameter update step (1.5); a minimal sketch of this evaluation is given below;
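A hedged numpy sketch of sub-step (1.4.1), under the per-dimension reading of formula (2) (θ_p paired with the p-th tensor dimension Dc_p; the max-subtraction is a standard numerical-stability detail added here, not part of the claim):

    import numpy as np

    def disparity_probabilities(Dc, theta):
        # Dc: (32, H, W) fusion feature matrix; theta: (32,) regression parameters
        # formula (2): per-pixel softmax over the 32 disparity classes d = k - 16
        logits = theta[:, None, None] * Dc           # theta_p * Dc_p at every pixel
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability (implementation detail)
        e = np.exp(logits)
        return e / e.sum(axis=0, keepdims=True)      # Dep: (32, H, W), sums to 1 over k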
(1.4.2) forming the synthesized right view R, whose pixel value R_{i,j} at row i, column j is given by formula (3):

R_{i,j} = Σ_{d=-15}^{+16} L^d_{i,j} · Dep^{i,j}_k,  k = d + 16   (3)

where L^d is the translated view of the original left view at disparity d, L^d_{i,j} = L_{i,j−d} is its pixel value at row i, column j, i.e. the pixel value of the original left view at position i, j−d, and L^d_{i,j} = 0 if j − d < 0; d is the disparity, −15 ≤ d ≤ +16; Dep^{i,j}_k is the element at position i, j of the matrix Dep_k, i.e. the probability that the pixel of the left view at position i, j takes disparity d, with k = d + 16, i = 1~H, j = 1~W; a minimal sketch is given below.
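A hedged numpy sketch of sub-step (1.4.2), evaluating formula (3) as a probability-weighted sum of disparity-shifted copies of the left view (zero-fill at the border, as the claim's j − d < 0 case states):

    import numpy as np

    def synthesize_right_view(L, Dep):
        # L: (H, W) left view; Dep: (32, H, W) disparity probabilities from formula (2)
        H, W = L.shape
        R = np.zeros((H, W))
        for d in range(-15, 17):          # disparities -15 .. +16
            k = d + 16                    # class index 1..32 (0-based below: k - 1)
            Ld = np.zeros((H, W))
            if d >= 0:
                Ld[:, d:] = L[:, :W - d]  # L^d[i, j] = L[i, j - d]; zero where j - d < 0
            else:
                Ld[:, :d] = L[:, -d:]     # negative disparity shifts the other way
            R += Dep[k - 1] * Ld
        return R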
5. The method for converting a 2D video into a 3D video according to claim 4, characterized in that:
the parameter update step (1.5) comprises the following sub-steps:
(1.5.1) subtracting the right view of step (1.2) from the synthesized right view R to obtain the error matrix err_R; for M consecutive synthesized right views, the individual error matrices err_R are summed to form the propagated error matrix errS_R, M ≥ 16;
(1.5.2) the propagated error matrix errS_R is propagated backwards to the view synthesis step (1.4); first the disparity probability matrix Dep is updated to the new disparity probability matrix Dep^{l+1}, whose matrix elements Dep_k^{l+1} are given by formula (4):

Dep_k^{l+1} = Dep_k^l − η · errS_R ⊙ L^d,  d = k − 16   (4)

where ⊙ denotes the element-wise product, Dep_k^{l+1} denotes the (l+1)-th updated value for disparity d = k − 16, Dep_k^l denotes its previous updated value, and the learning rate η is initialized to 0.00003~0.00008;
(1.5.3) next, the regression parameter matrix θ is updated to θ^{l+1}; each regression parameter θ_p^{l+1} of θ^{l+1} is updated according to formula (5):

θ_p^{l+1} = θ_p^l − λ · Σ_{i,j} [errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p) ⊙ Dc_p]_{i,j}   (5)

where θ_p^{l+1} denotes the (l+1)-th updated value of the regression parameter θ_p, θ_p^l denotes its previous updated value, Dep_p is the Dep_k after the parameter update of the view synthesis step (1.4), with p = k, L_p is the L^d of the view synthesis step, with d = p − 16, Dc_p is the Dc_t of step (1.3.5), with t = p, the range of p is 1 to 32, and λ is the update rate of θ, 0.00001 ≤ λ ≤ 0.01;
(1.5.4) the propagated error matrix errS_R is converted into the feature error matrix errD_R according to formula (6):

errD_R,p = errS_R ⊙ L_p ⊙ Dep_p ⊙ (1 − Dep_p) · θ_p,  p = 1, 2, …, 32   (6)
the feature error matrix errD_R is then fed into the feature extraction step (1.2) and the feature fusion step (1.3), and the weights of every convolution kernel involved are updated using the iterative mechanism of the caffe deep-learning framework;
in total, 12608 convolution kernels, carrying 113472 convolution kernel weights altogether, are updated in turn from the twenty-first layer to the first layer; sub-steps (1.5.2)~(1.5.4) complete one update of all parameters. A worked check of the kernel count is given below.
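As a hedged arithmetic check of the kernel count (per-layer counts read from claims 2 and 3; the variable names are illustrative):

    # convolution kernels per convolution layer (claim 2):
    # layers 1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17, 19, 20, 21
    conv_kernels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512, 4096, 4096, 32]
    # deconvolution kernels per fusion group (claim 3): D3, D6, D10, D14, D21
    deconv_kernels = [32, 32, 32, 32, 32]
    print(sum(conv_kernels) + sum(deconv_kernels))  # 12608, as stated in sub-step (1.5.4)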
CN201710227433.4A 2017-04-07 2017-04-07 A method for converting a 2D video into a 3D video Active CN107018400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710227433.4A CN107018400B (en) 2017-04-07 2017-04-07 A method for converting a 2D video into a 3D video

Publications (2)

Publication Number Publication Date
CN107018400A CN107018400A (en) 2017-08-04
CN107018400B (en) 2018-06-19

Family

ID=59446292

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant