CN103379349B - A view synthesis predictive coding method, decoding method, corresponding devices, and bitstream - Google Patents

Info

Publication number
CN103379349B
Authority
CN
China
Prior art keywords
image
synthesis
precision
view synthesis
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210125366.2A
Other languages
Chinese (zh)
Other versions
CN103379349A (en)
Inventor
赵寅 (Yin Zhao)
虞露 (Lu Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201210125366.2A priority Critical patent/CN103379349B/en
Publication of CN103379349A publication Critical patent/CN103379349A/en
Application granted granted Critical
Publication of CN103379349B publication Critical patent/CN103379349B/en


Abstract

The invention provides a view synthesis predictive coding method applied to the field of multimedia communication. In the process of coding an image I in a view V1 of a 3D video sequence, a synthesis precision X is adopted, and a view synthesis image P of a region H1 in view V1 is synthesized from the reconstructed image and reconstructed depth of another, already coded view V2 of the 3D video sequence; the view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of coding image I. The synthesis precision X is written into the bitstream of the 3D video sequence. The invention also discloses a view synthesis prediction decoding method, a view synthesis predictive coding device, a decoding device, and the corresponding bitstream. The present invention can improve the coding efficiency of 3D video sequences.

Description

A view synthesis predictive coding method, decoding method, corresponding devices, and bitstream
Technical field
The present invention relates to the field of multimedia communication, and in particular to a view synthesis predictive coding method, a decoding method, corresponding devices, and a bitstream.
Background technology
A 3D video (3D video) sequence comprises multiple (typically 2) channels of image sequences (carrying texture information) and the corresponding depth (depth) sequences (carrying depth information, i.e., for each pixel of an image, the distance between the corresponding object in 3D space and the camera); this is commonly referred to as the MVD (multi-view video plus depth) format. A 3D video sequence consists of a number of access units (access unit); one access unit contains the images of multiple (two or more) views at a given instant together with their corresponding depths. Coding a 3D video sequence produces a 3D video bitstream (bitstream), which is composed of bits. The coding method may be based on a video coding standard such as MPEG-2, H.264/AVC, HEVC, AVS, or VC-1. It should be noted that a video coding standard defines the syntax (syntax) of the bitstream and the decoding method for bitstreams conforming to the standard; it does not prescribe the coding method that produces the bitstream. However, the coding method adopted must match the decoding method specified by the standard and form a standard-conforming bitstream, so that a decoder can decode it; otherwise the decoding process may fail. An image obtained by decoding is called a decoded (decoded) image or reconstructed (reconstructed) image, and a depth obtained by decoding is called a decoded depth or reconstructed depth; the un-encoded image input to the encoder is called the original (original) image. In other words, the encoder internally also performs the decoding process that turns the information produced by coding into a reconstructed image (or reconstructed depth), and this process is identical to the decoding process the decoder applies to the bitstream. The current coded image (or image being coded) refers to the original image of the frame (frame) containing the macroblock (macroblock) currently being coded; the current decoded image (or image being decoded) refers to the reconstructed image of the frame containing the macroblock currently being decoded. The current coding view is the view containing the macroblock currently being coded, and the current decoding view is the view containing the macroblock currently being decoded. A coded view is a view other than the current coding view in which the frame at the same instant as the frame being coded has already been coded; a decoded view is a view other than the current decoding view in which the frame at the same instant as the frame being decoded has already been decoded.
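As an illustration of the MVD organization described above, the following minimal Python sketch models a sequence of access units; the class and field names are ours, purely illustrative, and not part of the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ViewComponent:
    """One view at one instant: a texture image plus its depth map
    (per-pixel distance between the object and the camera)."""
    texture: List[List[int]]
    depth: List[List[int]]

@dataclass
class AccessUnit:
    """One access unit: the images of all (two or more) views at a given
    instant together with their corresponding depths."""
    views: List[ViewComponent] = field(default_factory=list)

# An MVD-format 3D video sequence is then a list of access units, one per instant.
sequence: List[AccessUnit] = []
```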
Current video coding standards, for instance H.264/AVC, are mostly based on a hybrid (hybrid) coding framework comprising predictive (prediction) coding and transform (transform) coding. Each frame of a video sequence is coded in turn; a frame is divided into a number of macroblocks (Macroblock), and each macroblock may be further subdivided into blocks (block). A typical coding process can be summarized as follows: for each macroblock, a prediction method (corresponding to a prediction mode) is used to obtain the predicted image of the macroblock (formed by a group of prediction pixels). Prediction methods commonly include intra prediction (intra prediction, obtaining the predicted image of a block from the reconstructed pixels surrounding the current block) and inter prediction (inter prediction, obtaining the predicted image of a block from the reconstructed image of another, already coded frame); multi-view video (multi-view video) sequences additionally allow inter-view prediction (inter-view prediction, obtaining the predicted image of a block from the reconstructed image of another, already coded view); and 3D video sequences containing depth information may further employ view synthesis prediction (view synthesis prediction, obtaining the predicted image of a block from a view synthesis image generated from the reconstructed image and reconstructed depth of another, already coded view). Then, the predicted image is subtracted from the original image of the macroblock to obtain the residual (residual); the residual is transformed to obtain transform coefficients; the transform coefficients are quantized; and the quantized transform coefficients, together with side information (side information) such as the prediction mode, are entropy coded to form the bitstream. Owing to the quantization, there is a certain difference, commonly called distortion, between the image after coding (i.e., the reconstructed image) and the un-encoded input image (i.e., the original image, also called the current coded image).
The decoding process can be regarded as the inverse of the coding process and generally comprises the following steps: for each macroblock, the prediction mode information and the quantized transform coefficients of the macroblock are parsed from the bitstream; according to the prediction mode, the corresponding prediction method is used to obtain the predicted image of the macroblock; the quantized transform coefficients are inverse-quantized and inverse-transformed to produce the residual image; and the residual image is added to the predicted image to obtain the reconstructed image.
View synthesis prediction (view synthesis prediction, VSP) is a predictive coding technique for 3D video sequences. View synthesis prediction takes the reconstructed image and the corresponding reconstructed depth of a view V2 at a given instant and, using depth-image-based rendering (depth-image-based rendering, DIBR for short), projects the reconstructed image of V2 onto another view V1 to generate a projection image; one or more of hole filling (hole filling), filtering (filtering), and resampling (resampling) are then applied to produce a view synthesis image P in view V1. This view synthesis image P is highly similar to the original image O of view V1 at the same instant, so it can serve as a predicted image of the original image O when coding view V1. The macroblock currently being coded can select some pixels from the view synthesis image as its predicted image, which is then followed by transform, quantization, and entropy coding to obtain the bitstream; correspondingly, the macroblock currently being decoded can add the corresponding residual information to this predicted image to obtain the reconstructed image. In general, view synthesis prediction resembles conventional inter prediction; the main difference is that the predicted image used by view synthesis prediction is a view synthesis image generated from the reconstructed image and reconstructed depth of a coded (or decoded) view different from the current coding (or decoding) view, whereas the predicted image used by inter prediction is a reconstructed image of the current coding (or decoding) view at another instant.
A key point of view synthesis prediction is producing a high-quality view synthesis image. The closer this view synthesis image is to the original image (i.e., the higher their similarity), the higher the prediction efficiency and the smaller the corresponding residual. In view synthesis, the projection may use integer-pixel precision, projecting the pixels of view V2 into the pixel grid of the projection image in view V1; the projection image then has the same resolution as the image of V2. The usual method of projecting each pixel of V2 into the pixel grid of the projection image in V1 is to round the projection point of the pixel in V1 (obtained by depth-image-based rendering, and usually lying between two pixels of V1) to the horizontally nearest pixel, producing a projected pixel. When a pixel of the grid in V1 has exactly one projected pixel, that projected pixel is taken as its value; when several projected pixels fall on the same grid pixel in V1, the projected pixel with the smallest depth (nearest to the camera) is taken; when a grid pixel in V1 receives no projected pixel, it is called a hole (hole).
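The integer-precision forward projection just described can be sketched in Python under two simplifying assumptions (ours, not the patent's): warping is purely horizontal within a row, and the real-valued disparity of each pixel, which DIBR would derive from the depth value and camera parameters, is supplied directly:

```python
HOLE = -1  # marker for grid positions that receive no projected pixel

def forward_warp_row(src_pixels, src_depths, disparities):
    """Project one row of view V2 into the pixel grid of view V1 at
    integer-pixel precision.  disparities[i] is the real-valued horizontal
    shift of pixel i.  The projected position is rounded to the horizontally
    nearest pixel; when several projected pixels fall on the same grid pixel,
    the one with the smallest depth (nearest the camera, depth being a
    distance here) is kept; grid pixels that receive nothing remain holes."""
    width = len(src_pixels)
    out = [HOLE] * width
    best = [None] * width          # depth of the pixel currently occupying each cell
    for i, (p, d, disp) in enumerate(zip(src_pixels, src_depths, disparities)):
        x = int(round(i + disp))   # round to the nearest integer grid position
        if 0 <= x < width and (best[x] is None or d < best[x]):
            out[x] = p             # nearer pixel overwrites a farther one
            best[x] = d
    return out
```

In the example below, the second and third source pixels collide on the same target cell, and the one with the smaller depth (nearer the camera) wins, while the leftmost cell becomes a hole.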
In view synthesis, the projection may also use sub-pixel synthesis precision, denoted K/L pixel precision (K and L are positive integers; L is usually a multiple of 2 and K is usually 1, for instance 1/2-pixel and 1/4-pixel synthesis precision). With K/L pixel precision, the pixels of view V2 are projected into the pixel grid of the projection image in view V1, and the horizontal resolution of the projection image is L/K times that of the V2 image. One of the following two methods, or a combination of them, is generally used to project a pixel of V2 into this grid: 1) upsample the image (and depth) of V2 horizontally by a factor of L/K (i.e., the horizontal resolution of the upsampled image is L/K times that of the original image) and then project the upsampled image onto V1, forming a projection image whose horizontal resolution is L/K times that of the V2 image; 2) set the horizontal resolution of the projection image in V1 to L/K times that of the V2 image, and multiply the projected position of each pixel of the V2 image, computed at integer-pixel precision, by L/K to obtain the projected position at K/L pixel precision (i.e., a point with projection coordinates (a, b) on the integer-precision projection image has coordinates (aL/K, b) at K/L pixel precision); then, as in integer-precision view synthesis, round the projected position to the nearest grid pixel of the projection image to obtain the projected pixel, thereby generating a projection image whose horizontal resolution is L/K times that of the V2 image. The projection image is turned into a view synthesis image through processes such as hole filling. In the coding or decoding process, a macroblock or block using view synthesis prediction obtains some pixels from the view synthesis image (whose resolution may differ from that of the current coded image) to produce its predicted image (e.g., extracting M × K pixels from an M × L region of the view synthesis image as the predicted image of the current block); alternatively, when the resolution of the projection image differs from that of the current coded image, the hole-filled projection image is upsampled or downsampled to obtain a view synthesis image with the same resolution as the current coded image, and the current coding (or decoding) block obtains its prediction pixels from this resampled view synthesis image (e.g., taking a block of pixels from it as the predicted image of the current block).
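Method 2) above — scaling the integer-precision projected position by L/K and rounding on the finer grid — can be sketched as follows, again assuming per-row horizontal warping with the real-valued disparities supplied directly (DIBR would derive them from depth and camera parameters); the function names are ours:

```python
HOLE = -1  # marker for grid positions that receive no projected pixel

def warp_row_subpel(src_pixels, src_depths, disparities, K=1, L=4):
    """Project one row of V2 onto a grid whose horizontal resolution is L/K
    times that of V2, i.e. K/L-pixel synthesis precision: the full-pel
    projected position is multiplied by L/K and rounded to the nearest cell
    of the finer grid; collisions are resolved toward the camera (smallest
    depth, depth being a distance)."""
    width = len(src_pixels) * L // K
    out = [HOLE] * width
    best = [None] * width
    for i, (p, d, disp) in enumerate(zip(src_pixels, src_depths, disparities)):
        x = int(round((i + disp) * L / K))   # position scaled by L/K, then rounded
        if 0 <= x < width and (best[x] is None or d < best[x]):
            out[x] = p
            best[x] = d
    return out
```

With K = 1, L = 2 the output row is twice as wide as the source row, and unclaimed half-pel cells remain holes to be filled in the subsequent hole-filling step.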
View synthesis images generated with different synthesis precisions are generally different, and their efficiency when used for prediction also differs. In general, sub-pixel synthesis precision yields a higher-quality view synthesis image than integer-pixel synthesis precision, but at a higher computational complexity. To improve compression performance, the encoder can adaptively select, among view synthesis images of several different synthesis precisions, the one with the highest similarity to the current coded image (i.e., the original image) as the view synthesis image used to produce the prediction pixels in view synthesis prediction, and write the corresponding synthesis precision information into the bitstream, so that the decoder uses the same synthesis precision to generate a view synthesis image matching the encoder's.
The information produced by coding (such as prediction modes and transform coefficients) is converted into the bitstream by entropy coding. An item of information (such as the synthesis precision) can be quantized into values, i.e., the content it describes is represented by numbers in a certain range (usually integers); for example, an item of information describing three cases can represent them by 0, 1, and 2 respectively. The value of an item of information forms a syntax element (syntax element); according to the range and distribution of its values, a syntax element can be encoded with a suitable code to form a codeword (codeword, composed of one or more bits), i.e., the syntax element becomes a string of bits after coding. Common codes include the n-bit fixed-length code, the Exp-Golomb code, and arithmetic coding. In particular, the 1-bit unsigned integer code has the two codewords 0 and 1; the 2-bit unsigned integer code has the four codewords 00, 01, 10, and 11; and the 0-order Exp-Golomb code has the codewords 1, 010, 011, and so on. The corresponding decoding method (e.g., looking up in a code table the syntax element value corresponding to a codeword) recovers from a codeword the value of the syntax element. An item of information can also be jointly valued together with other information as a single syntax element, and thus correspond to one codeword; for instance, the combinations of two items of information can be numbered and the number used as a syntax element. Writing information into the bitstream usually means converting the information value into a syntax element and writing the codeword corresponding to the syntax element into the bitstream.
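The 0-order Exp-Golomb codewords listed above (1, 010, 011, …) follow a simple rule — the value v + 1 in binary, prefixed by one leading zero per bit beyond the first — which can be sketched together with its inverse:

```python
def exp_golomb_encode(v):
    """0-order Exp-Golomb codeword (as a bit string) for an unsigned integer
    v: write v + 1 in binary and prepend one '0' for every bit after the
    first.  v=0 -> '1', v=1 -> '010', v=2 -> '011', matching the codewords
    listed above."""
    bits = bin(v + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def exp_golomb_decode(bitstring):
    """Inverse mapping: count the leading zeros n, then read the next n + 1
    bits as the binary value v + 1."""
    n = bitstring.index("1")
    return int(bitstring[n:2 * n + 1], 2) - 1
```

A synthesis precision quantized to a small integer could be written to the bitstream as such a codeword, though the patent leaves the exact code choice open.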
Summary of the invention
To overcome the above drawbacks of the prior art, the object of the present invention is to provide a view synthesis predictive coding method with improved prediction efficiency, a decoding method, corresponding devices, and a bitstream.
The first technical scheme of the present invention provides a view synthesis predictive coding method: when coding an image I in a view V1 of a 3D video sequence, a synthesis precision X is adopted, and a view synthesis image P of a region H1 in view V1 is synthesized from the reconstructed image and reconstructed depth of another, already coded view V2 of the 3D video sequence; the view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of coding image I. Meanwhile, the synthesis precision X is written into the bitstream of the 3D video sequence, view V1 being the current coding view.
Preferably, the synthesis precision X is obtained by one of the following methods:
1) Synthesize, with each of N synthesis precisions, a view synthesis image Pn of a region H2 in the current coding view V1, where N ≥ 2, 1 ≤ n ≤ N, and N, n are integers; compute the similarity between each view synthesis image Pn and the co-located region of image I, and select, among all view synthesis images Pn, the synthesis precision corresponding to the view synthesis image of highest similarity as the synthesis precision X;
2) Compute the variance Vk of the pixels in each of K regions H3k of image I, take the average V of the variances Vk, and derive the synthesis precision X from a function f(V) of V, where 1 ≤ k ≤ K and k is an integer. The function f(V) yields, for example, a value D from V by means of a constant E (e.g., E = 15) and the floor operation ⌊x⌋ (rounding x down); the synthesis precision X is then 1/D pixel precision.
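The texture-complexity measure V of method 2) can be sketched as follows; the region tuples and function name are ours, and since the exact mapping f(V) to the precision denominator D is given only by example, the sketch stops at computing V:

```python
def average_block_variance(image, regions):
    """Average V of the per-region pixel variances Vk used by method 2.

    `image` is a 2-D list of samples; `regions` is a list of
    (top, left, height, width) rectangles H3k."""
    variances = []
    for top, left, h, w in regions:
        vals = [image[r][c] for r in range(top, top + h)
                            for c in range(left, left + w)]
        mean = sum(vals) / len(vals)
        variances.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return sum(variances) / len(variances)
```

A larger V indicates a more textured image; the encoder would then map V through f(V) to choose a finer or coarser synthesis precision.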
Preferably, "computing the similarity between each view synthesis image Pn and the co-located region of image I" in 1) comprises one of the following methods:
Method one: for M1 pixels Q1m of the view synthesis image Pn, compute the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is at most the total number of pixels in region H2; compute the average ValA of the absolute values of D1m, or the average ValB of the squared values of D1m; the smaller ValA or ValB is, the higher the similarity between the view synthesis image Pn and the co-located region of image I;
Method two: for M2 pixels Q2m of the view synthesis image Pn, compute the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in image I; the larger C is, the higher the similarity between the view synthesis image Pn and the co-located region of image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, n, N, m, M2 are positive integers, and M2 is at most the total number of pixels in region H2.
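Method one (mean absolute difference) and method two (linear correlation coefficient), together with the precision selection of 1), can be sketched on flattened pixel lists; the function names are ours, and a real encoder would run this over the co-located region H2:

```python
def mean_abs_diff(synth, orig):
    """ValA of method one: mean absolute difference between co-located
    pixels; smaller means higher similarity to the original."""
    return sum(abs(q - r) for q, r in zip(synth, orig)) / len(synth)

def correlation(synth, orig):
    """Linear correlation coefficient C of method two; larger means more similar."""
    n = len(synth)
    mq, mr = sum(synth) / n, sum(orig) / n
    cov = sum((q - mq) * (r - mr) for q, r in zip(synth, orig))
    sq = sum((q - mq) ** 2 for q in synth) ** 0.5
    sr = sum((r - mr) ** 2 for r in orig) ** 0.5
    return cov / (sq * sr)

def pick_precision(candidates, orig):
    """Given {precision label: synthesized pixels}, choose the precision
    whose synthesis has the smallest mean absolute difference from the
    original -- i.e., the highest similarity under method one."""
    return min(candidates, key=lambda x: mean_abs_diff(candidates[x], orig))
```

The chosen precision label is what the encoder would then convert to a syntax element and write into the bitstream.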
The second technical scheme of the present invention provides a view synthesis prediction decoding method: when decoding an image I in a view V1 of a 3D video sequence, the synthesis precision X corresponding to image I is parsed from the bitstream of the 3D video sequence; using the synthesis precision X, a view synthesis image P of a region H1 in view V1 is synthesized from the reconstructed image and reconstructed depth of another, already decoded view V2 of the 3D video sequence; the view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I.
The third technical scheme of the present invention provides a view synthesis predictive coding device, characterized by comprising the following two modules:
A view synthesis image generation module, which generates a view synthesis image with a synthesis precision X; its input comprises the synthesis precision X, the reconstructed image and reconstructed depth of a coded view V2 of the 3D video sequence, and the camera parameter information of the coded view V2 and the current coding view V1; its output comprises the view synthesis image P of a region H1 in view V1; the processing it performs comprises, in coding an image I of view V1, synthesizing with the synthesis precision X the view synthesis image P of region H1 in view V1 from the reconstructed image and reconstructed depth of the coded view V2, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of coding image I;
A synthesis precision writing module, which writes the synthesis precision X into the bitstream of the 3D video sequence; its input comprises the synthesis precision X and the bitstream of the 3D video sequence; its output comprises the 3D video sequence bitstream containing the synthesis precision X; the processing it performs comprises writing the synthesis precision X into the bitstream of the 3D video sequence.
Preferably, the view synthesis predictive coding device further comprises one of the following modules:
A synthesis precision decision module based on image similarity, whose input comprises the reconstructed image and reconstructed depth of the coded view V2 and the camera parameter information of the coded view V2 and the current coding view V1, and whose output is the synthesis precision X; the processing it performs comprises synthesizing, with each of N synthesis precisions, a view synthesis image Pn of a region H2 in the current coding view V1, where N ≥ 2, n and N are integers, 1 ≤ n ≤ N; computing the similarity between each view synthesis image Pn and the co-located region of image I; and selecting, among the Pn, the synthesis precision corresponding to the view synthesis image of highest similarity as the synthesis precision X;
A synthesis precision decision module based on texture complexity, whose input comprises the image I and whose output is the synthesis precision X; the processing it performs comprises computing the variance Vk of the pixels in each of K regions H3k of image I, taking the average V of the Vk, and deriving the synthesis precision X from a function f(V) of V, where 1 ≤ k ≤ K and k, K are integers. The function f(V) yields, for example, a value D from V by means of a constant E (e.g., E = 15) and the floor operation ⌊x⌋ (rounding x down); the synthesis precision X is then 1/D pixel precision.
Preferably, in the synthesis precision decision module based on image similarity, computing the similarity between each view synthesis image Pn and the co-located region of image I comprises one of the following processes:
Process one: for M1 pixels Q1m of the view synthesis image Pn, compute the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is at most the total number of pixels in region H2; compute the average ValA of the absolute values of D1m, or the average ValB of the squared values of D1m; the smaller ValA or ValB is, the higher the similarity between the view synthesis image Pn and the co-located region of image I;
Process two: for M2 pixels Q2m of the view synthesis image Pn, compute the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in image I; the larger C is, the higher the similarity between the view synthesis image Pn and the co-located region of the original image O, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, m and M2 are positive integers, and M2 is at most the total number of pixels in region H2.
The fourth technical scheme of the present invention provides a view synthesis prediction decoding device, comprising the following two modules:
A synthesis precision parsing module, which parses the synthesis precision X from the 3D video sequence bitstream; its input is the 3D video sequence bitstream and its output is the synthesis precision X; the function it performs is, when decoding an image I in a view V1 of the 3D video sequence, parsing the synthesis precision X corresponding to image I from the bitstream of the 3D video sequence;
A view synthesis image generation module, which synthesizes the view synthesis image P in view V1 with the synthesis precision X; its input comprises the synthesis precision X, the reconstructed image and reconstructed depth of a decoded view V2, and the camera parameters of the decoded view V2 and of view V1; its output is the view synthesis image P of a region H1 in view V1; the processing it performs comprises synthesizing the view synthesis image P of region H1 in view V1 from the reconstructed image and reconstructed depth of the decoded view V2, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I.
The fifth technical scheme of the present invention provides a bitstream for view synthesis prediction, containing information corresponding to a synthesis precision X; the synthesis precision is the precision with which the view synthesis image is generated in view synthesis prediction.
Beneficial effects: compared with the prior art, the view synthesis predictive coding method, decoding method, corresponding devices, and bitstream of the present invention generate the view synthesis image with a best synthesis precision, providing the prediction pixels needed in view synthesis prediction and thereby improving coding prediction efficiency.
Accompanying drawing explanation
The other features and advantages of the invention will become more apparent from the following description of preferred embodiments, which explains the principles of the invention by way of example with reference to the accompanying drawings.
Fig. 1 is a structural schematic diagram of an embodiment of the view synthesis predictive coding device of the present invention;
Fig. 2 is a structural schematic diagram of another embodiment of the view synthesis predictive coding device of the present invention;
Fig. 3 is a structural schematic diagram of a further embodiment of the view synthesis predictive coding device of the present invention;
Fig. 4 is a structural schematic diagram of an embodiment of the view synthesis prediction decoding device of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings:
Embodiment 1
The first embodiment of the present invention relates to a view synthesis predictive coding method. In the process of coding an image I (image I being the original image of frame T of the image sequence of view V1) in one view V1 of a 3D video sequence (which contains at least two views, each view comprising an image sequence and a depth sequence), a synthesis precision X is adopted and, using depth-image-based rendering, the view synthesis image P of a region H1 in view V1 is synthesized from the reconstructed image and reconstructed depth of another, already coded view V2 of the 3D video sequence at the same instant (or at another instant, e.g., frame T−1). The region H1 may be the whole of image I, a designated region of image I (e.g., a rectangular window centered on the center of image I and sized to 1/2 of image I), a macroblock or block of image I, and so on. The view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of coding image I; that is, in the process of coding image I there is at least one macroblock or block using view synthesis prediction, whose prediction pixels are pixels chosen from the view synthesis image P. For example, with integer-pixel synthesis precision, the pixels of a rectangular region of P with the same size as the macroblock or block may be taken as its prediction pixels; when P uses sub-pixel precision, M × J pixels are extracted from an M × L region of P, or an M × L region of P is resampled (i.e., upsampled or downsampled) to M × J pixels, as the prediction pixels of the macroblock or block (of size M × J pixels).
Meanwhile, the synthesis precision X is written into the bitstream of the 3D video sequence. The synthesis precision X may be integer-pixel precision or sub-pixel precision; sub-pixel precisions include 1/2-pixel precision, 1/4-pixel precision, 1/3-pixel precision, k/l-pixel precision (k, l positive integers), and so on.
When the synthesis precision X used in view synthesis is integer-pixel precision, the pixels of view V2 are projected into the pixel grid of the projection image in view V1, and the projection image has the same resolution as the image of V2. The usual method of projecting each pixel of V2 into the pixel grid of the projection image in V1 is to round the projection point of the pixel in V1 (obtained by depth-image-based rendering, and usually lying between two pixels of V1) to the horizontally nearest pixel, producing a projected pixel. When a pixel of the grid in V1 has exactly one projected pixel, that projected pixel is taken as its value; when several projected pixels fall on the same grid pixel in V1, the projected pixel with the smallest depth (nearest to the camera) is taken.
If the synthesis precision X adopted in view synthesis is sub-pixel precision, X is denoted K/L-pixel precision (K and L positive integers; L is usually a multiple of 2 and K is usually 1, for instance 1/2-pixel and 1/4-pixel synthesis precision). With K/L-pixel precision, the pixels of viewpoint V2 are projected onto the pixel grid of the projection image in viewpoint V1, and the horizontal resolution of the projection image is L/K times the horizontal resolution of the image of viewpoint V2. A pixel of viewpoint V2 is generally projected onto that grid by one, or a combination, of the following two methods: 1) up-sample the image (and depth) of viewpoint V2 horizontally by a factor of L/K (i.e. the horizontal resolution of the up-sampled image is L/K times that of the original image), then project the up-sampled image to viewpoint V1, forming a projection image whose horizontal resolution is L/K times the horizontal resolution of the image of viewpoint V2; 2) set the horizontal resolution of the projection image in V1 to L/K times that of the image of V2, multiply the projected position of each pixel of the V2 image, computed at integer-pixel precision, by L/K to obtain its projected position at K/L-pixel precision (i.e. a point with projection coordinate (a, b) on the integer-precision projection image has projection coordinate (aL/K, b) at K/L-pixel precision), then round the projected position to the nearest position of the pixel grid of the projection image to obtain a projected pixel, thereby producing a projection image whose horizontal resolution is L/K times that of the image of viewpoint V2.
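Method 2 above (scaling an integer-precision projected position by L/K and rounding on the finer grid) reduces to a one-line mapping; the function name is illustrative:

```python
def subpel_target_column(x_proj, K, L):
    """Map a projected column x_proj, computed at integer-pixel
    precision, to the nearest column of a K/L-pixel-precision
    projection image whose horizontal resolution is L/K times that
    of the reference: column a lands at a*L/K on the fine grid and
    is rounded to the nearest fine-grid column."""
    return int(round(x_proj * L / K))
```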
When holes exist in the projection image (pixel positions of the projection image that received no projected pixel), hole filling also needs to be performed to fill in the pixels missing from the hole regions, yielding the view synthesis image.
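Hole filling can take many forms; the following deliberately simple sketch fills each hole from the nearest already-projected pixel in the same row (practical renderers usually prefer the background, i.e. farther-depth, side — that refinement is omitted here):

```python
import numpy as np

def fill_holes_horizontal(proj, filled):
    """Fill hole pixels (positions that received no projected pixel)
    by propagating the nearest non-hole pixel along each row."""
    out = proj.copy()
    h, w = proj.shape
    for y in range(h):
        last = None
        for x in range(w):
            if filled[y, x]:
                last = out[y, x]
            elif last is not None:
                out[y, x] = last          # propagate from the left
        # leading holes: back-fill from the first filled pixel
        first = next((out[y, x] for x in range(w) if filled[y, x]), None)
        for x in range(w):
            if filled[y, x]:
                break
            if first is not None:
                out[y, x] = first
    return out
```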
Embodiment 2
The second embodiment of the present invention relates to a view synthesis prediction encoding method. For a 3D video sequence, in the process of encoding an image I in one viewpoint V1 (image I being the original image of frame T of the image sequence in viewpoint V1), a synthesis precision X is adopted and, using depth-image-based rendering, the view synthesis image P of a region H1 in viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of the same moment in another, already encoded viewpoint V2 of the 3D video sequence.
The synthesis precision X is determined as follows. According to the reconstructed image and reconstructed depth of the same moment in viewpoint V2 (and other information such as camera parameters), depth-image-based rendering is used to synthesize, with each of N synthesis precisions (N ≥ 2, N integer; for example N = 2, namely integer-pixel synthesis precision and 1/2-pixel synthesis precision), a view synthesis image Pn (1 ≤ n ≤ N, n integer) of a region H2 in the current encoding viewpoint V1. The region H2 may be the whole of image I, may be the region H1, or may be a set of several scattered regions (such as macroblocks) whose total area is smaller than image I.
For the N view synthesis images Pn, the similarity between each Pn and the image of the corresponding region U in image I is computed. With integer-pixel synthesis precision, the position of region U in image I is identical to the position of Pn in the projection image. With sub-pixel synthesis precision, the corresponding region U is the region, in image I resampled (up-sampled or down-sampled) according to the adopted synthesis precision, whose position is identical to that of Pn in the projection image.
From all Pn, the synthesis precision corresponding to the view synthesis image with the highest similarity is selected as the synthesis precision X. The similarity is computed by one of the following two methods:
Method one: for M1 pixels Q1m (1 ≤ m ≤ M1; m, M1 positive integers; M1 not exceeding the total number of pixels in region H2) in the view synthesis image Pn (1 ≤ n ≤ N), compute the difference D1m (1 ≤ m ≤ M1) between each pixel Q1m and the pixel R1m (1 ≤ m ≤ M1) at the corresponding position (i.e. the same coordinates within region U) in image I; compute the mean ValA of the absolute values of D1m, or the mean ValB of the squared values of D1m (1 ≤ m ≤ M1). The smaller the mean ValA or mean ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region U in image I;
Method two: for M2 pixels Q2m (1 ≤ m ≤ M2; m, M2 positive integers; M2 not exceeding the total number of pixels in region H2) in the view synthesis image Pn (1 ≤ n ≤ N), compute the linear correlation coefficient C between the pixels Q2m and the pixels R2m (1 ≤ m ≤ M2) at the corresponding positions (i.e. the same coordinates within region U) in image I, namely:
C = [ Σ_{m=1}^{M} (Q_m − Q_mean)(R_m − R_mean) ] / √( Σ_{m=1}^{M} (Q_m − Q_mean)² · Σ_{m=1}^{M} (R_m − R_mean)² )
where Q_mean and R_mean are the means of Q_m (1 ≤ m ≤ M) and R_m (1 ≤ m ≤ M), respectively. The larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region U in image I.
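The two similarity measures, and the selection of the best precision, can be sketched as follows; the function names are illustrative, and `pick_precision` uses method two (with method one, the smallest ValA would be chosen instead):

```python
import numpy as np

def similarity_costs(Pn_region, I_region):
    """Method one: mean absolute difference ValA and mean squared
    difference ValB between the synthesized pixels and the co-located
    original pixels; smaller values mean higher similarity."""
    d = np.asarray(Pn_region, np.float64) - np.asarray(I_region, np.float64)
    return float(np.mean(np.abs(d))), float(np.mean(d * d))

def linear_correlation(Q, R):
    """Method two: linear correlation coefficient C; a larger C means
    higher similarity."""
    Q = np.asarray(Q, dtype=np.float64).ravel()
    R = np.asarray(R, dtype=np.float64).ravel()
    dq, dr = Q - Q.mean(), R - R.mean()
    return float(np.sum(dq * dr) / np.sqrt(np.sum(dq**2) * np.sum(dr**2)))

def pick_precision(candidates, I_region):
    """Among {precision label: synthesized region} candidates, pick
    the precision whose synthesis correlates best with the original."""
    return max(candidates, key=lambda p: linear_correlation(candidates[p], I_region))
```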
Meanwhile, the synthesis precision corresponding to the view synthesis image with the highest similarity is written into the bitstream of the 3D video sequence. The synthesis precision may be quantized into a syntax element taking the two values 0 and 1, where 0 and 1 indicate integer-pixel projection precision and 1/2-pixel projection precision, respectively. The synthesis precision syntax element may be coded with an order-0 exponential-Golomb code, or with a 1-bit unsigned integer code.
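An order-0 exponential-Golomb code writes ⌊log₂(v+1)⌋ zeros followed by the binary form of v+1; a minimal sketch (the 1-bit unsigned alternative would simply emit '0' or '1'):

```python
def ue_v(value):
    """Order-0 exponential-Golomb code (ue(v)) for a non-negative
    syntax element, e.g. a 0/1 synthesis precision indicator
    (0 -> integer-pixel, 1 -> 1/2-pixel)."""
    assert value >= 0
    code = value + 1
    num_bits = code.bit_length()
    # prefix of (num_bits - 1) zeros, then `code` in binary
    return "0" * (num_bits - 1) + bin(code)[2:]
```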
Embodiment 3
The third embodiment of the present invention relates to a view synthesis prediction encoding method. For a 3D video sequence, in the process of encoding an image I in one viewpoint V1 (the pixels of image I in this embodiment being original pixels), a synthesis precision X is adopted and, using depth-image-based rendering, the view synthesis image P of a region H1 in viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of the same moment in another, already encoded viewpoint V2 of the 3D video sequence.
The synthesis precision X is determined by the following method.
The variance Vk (1 ≤ k ≤ K, k integer) of the pixels in each of K regions H3k in image I is computed, the mean V of the Vk is taken, and the synthesis precision X is derived through a function f(V) of V; f(V) is, for example:
1) a function of V involving a constant E (for instance E = 15) and the floor operation ⌊x⌋, which rounds x down; the synthesis precision X is 1/D-pixel precision;
2) a function of V involving a constant F (for instance F = 10); the synthesis precision X is 1/D-pixel precision.
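A sketch of the texture-complexity decision above: compute the variance of each block-sized region H3k, average the variances to obtain V, and map V to a denominator D via f(V). The exact f(V) is given here only through its constants, so the floor-based mapping below (D = ⌊V/E⌋ + 1, clipped to [1, 4]) is purely an illustrative assumption:

```python
import numpy as np

def decide_precision_from_variance(I, block=16, E=15):
    """Texture-complexity decision sketch. Returns (D, V) where the
    synthesis precision X would be 1/D-pixel precision; the mapping
    from V to D is an assumption of this sketch."""
    h, w = I.shape
    variances = [
        np.var(I[y:y + block, x:x + block])     # variance of region H3k
        for y in range(0, h - block + 1, block)
        for x in range(0, w - block + 1, block)
    ]
    V = float(np.mean(variances))               # mean variance
    D = int(V // E) + 1                         # illustrative f(V)
    return min(max(D, 1), 4), V
```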
Meanwhile, the synthesis precision X is written into the bitstream of the 3D video sequence. The synthesis precision may be quantized into a syntax element taking N non-negative integer values, where the value Y indicates that the synthesis precision X is 1/Y-pixel precision.
Embodiment 4
The fourth embodiment of the present invention relates to a view synthesis prediction decoding method. When decoding an image I in a viewpoint V1 of a 3D video sequence (the pixels of image I in this embodiment being reconstructed pixels), the synthesis precision X corresponding to image I is parsed from the bitstream of the 3D video sequence; the synthesis precision X is adopted and, using depth-image-based rendering, the view synthesis image P of a region H1 in the viewpoint V1 is synthesized from the reconstructed image and reconstructed depth (and corresponding camera parameters, such as focal length and camera coordinates) of another, already decoded viewpoint V2 in the 3D video sequence. The region H1 may be the whole of image I, a specified region within image I (for example a rectangular window whose center coincides with the center of image I and whose size is 1/2 the size of image I), or a macroblock or block within image I, etc. The view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I; that is, at least one macroblock or block in the decoding of image I adopts view synthesis prediction, taking pixels chosen from the view synthesis image P as the prediction pixels of that macroblock or block. For example, with integer-pixel synthesis precision, the pixels of a rectangular region of P with the same size as the macroblock or block may be taken as its prediction pixels; with sub-pixel synthesis precision, M × J pixels are extracted from an M × L region of P, or an M × L region of P is resampled (i.e. up-sampled or down-sampled) to M × J pixels, and used as the prediction pixels of the macroblock or block (of size M × J pixels).
The synthesis precision X may be integer-pixel precision or sub-pixel precision; examples of sub-pixel precision include 1/2-pixel precision, 1/4-pixel precision, 1/3-pixel precision, 3/8-pixel precision, and in general k/l-pixel precision (k, l positive integers).
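Decoder-side parsing of an order-0 exponential-Golomb-coded synthesis precision can be sketched as follows (the bit-string input is an illustrative simplification of a real bit reader):

```python
def parse_ue_v(bits, pos=0):
    """Parse an order-0 exp-Golomb syntax element from a bit string:
    count leading zeros, then read that many more bits after the
    terminating '1'. Returns (value, next bit position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    code = int(bits[pos + zeros:pos + 2 * zeros + 1], 2)
    return code - 1, pos + 2 * zeros + 1
```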
Embodiment 5
The fifth embodiment of the present invention relates to a view synthesis prediction encoding apparatus. Fig. 1 is a schematic structural diagram of an embodiment of a view synthesis prediction encoding apparatus. The apparatus includes the following two modules: a view synthesis image generation module that generates a view synthesis image with a synthesis precision, and a synthesis precision writing module that writes the synthesis precision information into the 3D video sequence bitstream.
The input of the view synthesis image generation module includes the reconstructed image and reconstructed depth of an already encoded viewpoint V2 in the 3D video sequence, the synthesis precision X, and information such as the camera parameters of the encoded viewpoint V2 and the current encoding viewpoint V1; its output includes the view synthesis image P of the current encoding viewpoint V1. The function it performs is identical to that in the above view synthesis prediction encoding method and embodiments: adopting a synthesis precision X and, using depth-image-based rendering, synthesizing the view synthesis image P of a region H1 in viewpoint V1 from the reconstructed image and reconstructed depth of the same moment (or of another moment, such as frame T-1) in the encoded viewpoint V2 of the 3D video sequence.
The input of the synthesis precision writing module includes the synthesis precision X and the 3D video sequence bitstream; its output includes the 3D video sequence bitstream containing the synthesis precision X. The function it performs is identical to writing the synthesis precision X into the bitstream of the 3D video sequence in the above view synthesis prediction encoding method and embodiments.
Embodiment 6
The sixth embodiment of the present invention relates to a view synthesis prediction encoding apparatus. Fig. 2 is a schematic structural diagram of another embodiment of a view synthesis prediction encoding apparatus. This apparatus differs from the apparatus of Embodiment 5 in that it further includes, before the view synthesis image generation module, a synthesis precision decision module based on image similarity. Its input includes the reconstructed image and reconstructed depth of the encoded viewpoint V2, information such as the camera parameters of the encoded viewpoint V2 and the current encoding viewpoint V1, and the currently encoded image I of the current encoding viewpoint V1 (the pixels of image I in this embodiment being original pixels); its output is the synthesis precision X (serving as an input of the view synthesis image generation module). The function it performs is identical to that in the above view synthesis prediction encoding method and embodiments: generating, with different synthesis precisions, multiple view synthesis images Pn (1 ≤ n ≤ N) of a region H2 in the current encoding viewpoint V1, computing the similarity between each view synthesis image Pn and the image of the corresponding region in image I, and selecting from all Pn the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X.
Embodiment 7
The seventh embodiment of the present invention relates to a view synthesis prediction encoding apparatus. Fig. 3 is a schematic structural diagram of another embodiment of a view synthesis prediction encoding apparatus. This apparatus differs from the apparatus of Embodiment 5 in that it further includes, before the view synthesis image generation module, a synthesis precision decision module based on texture complexity. Its input includes the currently encoded image I (the pixels of image I in this embodiment being original pixels); its output is the synthesis precision X. The function it performs is identical to that in the above view synthesis prediction encoding method and embodiments: computing the variance Vk (1 ≤ k ≤ K) of the pixels in each of K regions H3k (1 ≤ k ≤ K, k integer) in image I, taking the mean V of the Vk, and deriving the synthesis precision X through a function f(V) of V.
Embodiment 8
The eighth embodiment of the present invention relates to a view synthesis prediction decoding apparatus. Fig. 4 is a schematic structural diagram of an embodiment of a view synthesis prediction decoding apparatus. The apparatus includes the following two modules: a synthesis precision parsing module that parses the synthesis precision X from the 3D video sequence bitstream, and a view synthesis image generation module that synthesizes the view synthesis image of the viewpoint V1 with the synthesis precision X.
The input of the synthesis precision parsing module is the 3D video sequence bitstream; its output is the synthesis precision X. The function it performs is identical to that in the above view synthesis prediction decoding method and embodiments: when decoding an image I in a viewpoint V1 of the 3D video sequence (the pixels of image I in this embodiment being reconstructed pixels), parsing the synthesis precision X corresponding to image I from the bitstream of the 3D video sequence;
The input of the view synthesis image generation module includes the synthesis precision information X, the reconstructed image and reconstructed depth of an already decoded viewpoint V2, and the camera parameters of the decoded viewpoint V2 and the currently decoded viewpoint V1; its output is the view synthesis image P of a region H1 in the viewpoint V1. The function it performs is identical to synthesizing the view synthesis image P of a region H1 in the viewpoint V1 from the reconstructed image and reconstructed depth of the decoded viewpoint V2 in the above view synthesis prediction decoding method and embodiments. The view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I.
The view synthesis prediction encoding apparatus and decoding apparatus may be realized in various ways, for instance:
Method one: an electronic computer with additional hardware, running a software program whose functions are identical to those of the view synthesis prediction encoding and decoding methods.
Method two: a single-chip microcomputer with additional hardware, running a software program whose functions are identical to those of the view synthesis prediction encoding and decoding methods.
Method three: a digital signal processor with additional hardware, running a software program whose functions are identical to those of the view synthesis prediction encoding and decoding methods.
Method four: a circuit designed to have functions identical to those of the view synthesis prediction encoding and decoding methods.
The view synthesis prediction encoding apparatus and decoding apparatus may also be realized by other methods, not limited to the above four.
Embodiment 9
The ninth embodiment of the invention relates to a bitstream for view synthesis prediction, which contains information corresponding to a synthesis precision; the synthesis precision is the synthesis precision used to generate the view synthesis image in view synthesis prediction.
Information corresponding to one synthesis precision in the bitstream may apply to the view synthesis prediction of all frames of all viewpoints in a 3D video sequence (indicating the synthesis precision for generating the view synthesis image in view synthesis prediction), or to the view synthesis prediction of all frames (or some frames) of a certain viewpoint in a 3D video sequence, or to the view synthesis prediction of a certain frame of a certain viewpoint in a 3D video sequence, or to the view synthesis prediction of a certain region (such as a macroblock) within a certain frame of a certain viewpoint in a 3D video sequence.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those of ordinary skill in the art may make various variations or modifications within the scope of the appended claims.

Claims (7)

1. A view synthesis prediction encoding method, characterized in that, when encoding an image I in a viewpoint V1 of a 3D video sequence, a synthesis precision X is adopted and the view synthesis image P of a region H1 in the viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another, already encoded viewpoint V2 in the 3D video sequence, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of encoding image I; meanwhile, the synthesis precision X of region H1 is written into the bitstream of the 3D video sequence, the viewpoint V1 being the current encoding viewpoint; the synthesis precision X is obtained by one of the following methods:
1) synthesizing, with each of N synthesis precisions, a view synthesis image Pn of a region H2 in the current encoding viewpoint V1, where N ≥ 2, 1 ≤ n ≤ N, and N, n are integers; computing the similarity between each view synthesis image Pn and the image of the corresponding region in image I, and selecting from all view synthesis images Pn the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X;
2) computing the variance Vk of the pixels in each of K regions H3k in image I, taking the mean V of the variances Vk, and deriving the synthesis precision X through a function f(V) of the mean V, where 1 ≤ k ≤ K, k is an integer, the function f(V) maps V to an integer value D = f(V), and the synthesis precision X is 1/D-pixel precision.
2. The view synthesis prediction encoding method according to claim 1, characterized in that "computing the similarity between each view synthesis image Pn and the image of the corresponding region in image I" in 1) includes one of the following methods:
Method one: for M1 pixels Q1m in the view synthesis image Pn, computing the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 does not exceed the total number of pixels in the region H2; computing the mean ValA of the absolute values of D1m or the mean ValB of the squared values of D1m; the smaller the mean ValA or mean ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I;
Method two: for M2 pixels Q2m in the view synthesis image Pn, computing the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in image I; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, n, N, m, M2 are positive integers, and M2 does not exceed the total number of pixels in the region H2.
3. A view synthesis prediction decoding method, characterized in that, when decoding an image I in a viewpoint V1 of a 3D video sequence, the synthesis precision X corresponding to image I is parsed from the bitstream of the 3D video sequence; the synthesis precision X is adopted and the view synthesis image P of a region H1 in the viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another, already decoded viewpoint V2 in the 3D video sequence, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I; for different synthesis precisions X, the method of producing the prediction pixels differs.
4. A view synthesis prediction encoding apparatus, characterized by including the following two modules:
a view synthesis image generation module, which generates a view synthesis image with a synthesis precision X; its input includes the synthesis precision X, the reconstructed image and reconstructed depth of an already encoded viewpoint V2 in a 3D video sequence, and camera parameter information of the encoded viewpoint V2 and the current encoding viewpoint V1; its output includes the view synthesis image P of a region H1 in the viewpoint V1; the process it performs includes, when encoding an image I in the viewpoint V1, adopting the synthesis precision X and synthesizing the view synthesis image P of a region H1 in the viewpoint V1 from the reconstructed image and reconstructed depth of the encoded viewpoint V2, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of encoding image I;
a synthesis precision writing module, which writes the synthesis precision X into the bitstream of the 3D video sequence; its input includes the synthesis precision X and the bitstream of the 3D video sequence; its output includes the 3D video sequence bitstream containing the synthesis precision X; the process it performs includes writing the synthesis precision X into the bitstream of the 3D video sequence.
5. The view synthesis prediction encoding apparatus according to claim 4, characterized by further including one of the following modules:
a synthesis precision decision module based on image similarity, whose input includes the reconstructed image and reconstructed depth of the encoded viewpoint V2 and camera parameter information of the encoded viewpoint V2 and the current encoding viewpoint V1, and whose output is the synthesis precision X; the process it performs includes synthesizing, with each of N synthesis precisions, a view synthesis image Pn of a region H2 in the current encoding viewpoint V1, where N ≥ 2, n and N are integers, and 1 ≤ n ≤ N; computing the similarity between each view synthesis image Pn and the image of the corresponding region in image I, and selecting from the Pn the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X;
a synthesis precision decision module based on texture complexity, whose input includes the image I and whose output is the synthesis precision X; the process it performs includes computing the variance Vk of the pixels in each of K regions H3k in image I, taking the mean V of the Vk, and deriving the synthesis precision X through a function f(V) of the mean V, where 1 ≤ k ≤ K, k and K are integers, the function f(V) maps V to an integer value D = f(V), and the synthesis precision X is 1/D-pixel precision.
6. The view synthesis prediction encoding apparatus according to claim 5, characterized in that, in the synthesis precision decision module based on image similarity, the computing of the similarity between each view synthesis image Pn and the image of the corresponding region in image I includes one of the following processes:
Process one: for M1 pixels Q1m in the view synthesis image Pn, computing the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 does not exceed the total number of pixels in the region H2; computing the mean ValA of the absolute values of D1m or the mean ValB of the squared values of D1m; the smaller the mean ValA or mean ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I;
Process two: for M2 pixels Q2m in the view synthesis image Pn, computing the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in image I; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, m and M2 are positive integers, and M2 does not exceed the total number of pixels in the region H2.
7. A view synthesis prediction decoding apparatus, characterized by including the following two modules:
a synthesis precision parsing module, which parses the synthesis precision X from a 3D video sequence bitstream; its input is the 3D video sequence bitstream and its output is the synthesis precision X; the function it performs is, when decoding an image I in a viewpoint V1 of the 3D video sequence, parsing the synthesis precision X corresponding to image I from the bitstream of the 3D video sequence;
a view synthesis image generation module, which synthesizes the view synthesis image P in the viewpoint V1 with the synthesis precision X; its input includes the synthesis precision X, the reconstructed image and reconstructed depth of an already decoded viewpoint V2, and the camera parameters of the decoded viewpoint V2 and the viewpoint V1; its output is the view synthesis image P of a region H1 in the viewpoint V1; the process it performs includes synthesizing the view synthesis image P of a region H1 in the viewpoint V1 from the reconstructed image and reconstructed depth of the decoded viewpoint V2, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of decoding image I.
CN201210125366.2A 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream Active CN103379349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210125366.2A CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210125366.2A CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Publications (2)

Publication Number Publication Date
CN103379349A CN103379349A (en) 2013-10-30
CN103379349B true CN103379349B (en) 2016-06-29

Family

ID=49463834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210125366.2A Active CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Country Status (1)

Country Link
CN (1) CN103379349B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768013B (en) * 2014-01-02 2018-08-28 浙江大学 A kind of candidate pattern queue processing method and device
US10372968B2 (en) * 2016-01-22 2019-08-06 Qualcomm Incorporated Object-focused active three-dimensional reconstruction
EP3799433A1 (en) * 2019-09-24 2021-03-31 Koninklijke Philips N.V. Coding scheme for immersive video with asymmetric down-sampling and machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102326391A (en) * 2009-02-23 2012-01-18 日本电信电话株式会社 Multi-view image coding method, multi-view image decoding method, multi-view image coding device, multi-view image decoding device, multi-view image coding program, and multi-view image decoding program
CN102413332A (en) * 2011-12-01 2012-04-11 武汉大学 Multi-viewpoint video coding method based on time-domain-enhanced viewpoint synthesis prediction

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8854486B2 (en) * 2004-12-17 2014-10-07 Mitsubishi Electric Research Laboratories, Inc. Method and system for processing multiview videos for view synthesis using skip and direct modes
CA2790268A1 (en) * 2010-02-24 2011-09-01 Nippon Telegraph And Telephone Corporation Multiview video encoding method, multiview video decoding method, multiview video encoding apparatus, multiview video decoding apparatus, and program

Also Published As

Publication number Publication date
CN103379349A (en) 2013-10-30

Similar Documents

Publication Publication Date Title
CN111837396B (en) Error suppression in video coding based on sub-image code stream view correlation
US7848425B2 (en) Method and apparatus for encoding and decoding stereoscopic video
CN102439976B (en) Method for reconstructing depth image and decoder for reconstructing depth image
EP3700213B1 (en) Method and apparatus for encoding or decoding an image with inter layer motion information prediction according to motion information compression scheme
CN101888566B (en) Estimation method of distortion performance of stereo video encoding rate
CN103260018B (en) Intra-frame image prediction decoding method and Video Codec
CN102598663A (en) Method and apparatus for encoding and decoding image by using rotational transform
CN102640492A (en) Method and apparatus for encoding and decoding coding unit of picture boundary
CN104919804A (en) Frame packing and unpacking higher-resolution chroma sampling formats
US20230080852A1 (en) Use of tiered hierarchical coding for point cloud compression
CN109996073B (en) Image compression method, system, readable storage medium and computer equipment
CN103098471A (en) Method and apparatus of layered encoding/decoding a picture
CN1523893A (en) Video encoding method and apparatus
CN101584220B (en) Method and system for encoding a video signal, encoded video signal, method and system for decoding a video signal
CN1319382C (en) Method for designing architecture of scalable video coder decoder
CN104380746A (en) Multiview video encoding method and device, and multiview video decoding method and device
KR101557163B1 (en) Apparatus and method for converting image in an image processing system
KR20030081403A (en) Image coding and decoding method, corresponding devices and applications
CN115314710A (en) Encoding method, decoding method, encoder, decoder, and storage medium
CN103379349B (en) A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream
JP2000217103A (en) Object unit video signal coder/decoder and its method
CN105519108A (en) Quantization matrix (qm) coding based on weighted prediction
CN103200405B (en) A 3DV video coding method and encoder
EP2839437B1 (en) View synthesis using low resolution depth maps
CN103997635B (en) Synthesized-view distortion prediction method and coding method for free viewpoint video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant