Disclosure of Invention
The technical problem to be solved by the invention is to provide a depth image post-processing method which can effectively improve the rendering performance of a virtual viewpoint image on the basis of keeping the compression efficiency of the depth image.
The technical scheme adopted by the invention for solving the technical problems is as follows: a method for post-processing a depth image is characterized in that the processing process comprises the following steps: firstly, coding an obtained color image and a depth image corresponding to the color image to obtain a coded code stream; then, obtaining coding distortion compensation parameters of the depth image, and coding the coding distortion compensation parameters of the depth image to obtain a parameter code stream; then decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter of the depth image; and then, compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensation image, and performing filtering processing on the depth compensation image to obtain a depth filtering image, wherein the depth filtering image is used for drawing a virtual viewpoint image.
The post-processing method comprises the following specific steps:
acquiring K color images with YUV color spaces of K reference viewpoints at t moment and K depth images corresponding to the color images, and recording the color image of the kth reference viewpoint at t moment as
Record the depth image of the kth reference viewpoint at the time t as
Wherein, K is not less than 1 and not more than K, the initial value of K is 1, i is 1,2,3 respectively represent three components of YUV color space, the 1 st component of YUV color space is a brightness component and is marked as Y, the 2 nd component is a first chroma component and is marked as U, and the 3 rd component is a second chroma component and is marked as V, (x, Y) represents the coordinate position of the pixel point in the color image and the depth image, x is not less than 1 and not more than W, Y is not less than 1 and not more than H, W represents the width of the color image and the depth image, H represents the height of the color image and the depth image,
color image representing the kth reference viewpoint at time t
The value of the ith component of the pixel point with the middle coordinate position of (x, y),
depth image representing kth reference viewpoint at time t
The middle coordinate position is the depth value of the pixel point of (x, y);
respectively coding K color images with YUV color spaces of K reference viewpoints at the time t and K depth images corresponding to the color images according to a set coding prediction structure, outputting color image code streams and depth image code streams frame by frame to obtain coding code streams, and transmitting the coding code streams to a user terminal by a server through a network;
thirdly, according to the K depth images of the K reference viewpoints at the time t and the K depth images of the K reference viewpoints at the time t obtained by decoding after encoding, predicting and obtaining encoding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t by adopting a wiener filter, then respectively encoding the encoding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t by adopting a CABAC lossless compression method, outputting parameter code streams frame by frame, and finally transmitting the parameter code streams to a user terminal by a service terminal through a network;
decoding the coded code stream sent by the server by the user end to respectively obtain K color images and corresponding K depth images of K reference viewpoints at t moment after decoding, and correspondingly recording the color image and the corresponding depth image of the kth reference viewpoint at t moment after decoding as
And
wherein,
color image representing k-th reference viewpoint at time t after decoding
The middle coordinate position is (x, y)The value of the ith component of the pixel point of (a),
depth image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y);
fifthly, the user end decodes the parameter code stream sent by the server end to obtain the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment are utilized to compensate the K depth images of the K reference viewpoints at the t moment after decoding, the K depth compensation images of the K reference viewpoints at the t moment after decoding are obtained, and the depth compensation image of the kth reference viewpoint at the t moment after decoding is recorded as the depth compensation image of the kth reference viewpoint at the t moment
Wherein,
depth compensated image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y);
sixthly, respectively carrying out bidirectional filtering processing on the decoded K depth compensation images of the K reference viewpoints at the t moment by adopting a bidirectional filter to obtain K depth filtering images of the K reference viewpoints at the t moment, and recording the depth filtering image of the kth reference viewpoint at the t moment as a decoded depth filtering image
Wherein,
indicating time t after decodingDepth filtered image of kth reference viewpoint
The middle coordinate position is the depth value of the pixel point of (x, y).
The specific process of obtaining the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment in the third step is as follows:
③ 1, the depth image of the kth reference viewpoint currently processed in the K depth images of the K reference viewpoints at the time t
Defining the depth image as a current depth image;
③ 2, for the current depth image
Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y);
thirdly, 3, the depth image of the kth reference viewpoint at the t moment obtained by decoding after encoding
Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y);
thirdly, predicting and obtaining the depth image of the kth reference viewpoint at the t moment after decoding by adopting a wiener filter
The coding distortion compensation parameters of the wavelet coefficient matrix of each directional subband of each level of wavelet transform are
Is recorded as a coding distortion compensation parameter
Wherein L represents the filtering length range of the wiener filter,
expression solution
Mathematical expectation ofThe value of the one or more of,
to represent
The middle coordinate position is the wavelet coefficient at (X + p, y + q), argmin (X) represents the parameter that minimizes the function X;
③ -5, according to the depth image of the kth reference viewpoint at the moment t after decoding
The coding distortion compensation parameters of the wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transformation are obtained to obtain the current depth image
Coding distortion compensation parameter of
And (3) taking the depth image of the next to-be-processed reference viewpoint in the K depth images of the K reference viewpoints at the time t as the current depth image, and then returning to the step (2) to continue executing until the depth images of all the reference viewpoints in the K depth images of the K reference viewpoints at the time t are processed, wherein the initial value of K' is 0.
The fifth step obtains the depth compensation image of the kth reference viewpoint at the t moment after decoding
The specific process comprises the following steps:
fifthly-1, decoding the depth image of the kth reference viewpoint at the t moment
Implementing 3-level wavelet transform to obtain 3 directional sub-band minimums of each level of wavelet transformA wave coefficient matrix, 3 directional subbands including a horizontal subband, a vertical subband and a diagonal subband
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y);
fifthly-2, calculating the depth image of the kth reference viewpoint at the t moment after decoding
The wavelet coefficient matrixes of all direction sub-bands of each level of wavelet transform are respectively compensated to obtain wavelet coefficient matrixes
The compensated wavelet coefficient matrix is recorded as
Wherein,
to represent
The middle coordinate position is a wavelet coefficient at (x + p, y + q);
fifthly, 3, decoding the depth image of the kth reference viewpoint at the t moment
The wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transform are respectively subjected to inverse wavelet transform after compensation to obtain a depth compensation image of the kth reference viewpoint at the t moment after decoding, and the depth compensation image is recorded as the depth compensation image
Wherein,
depth compensated image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y).
The step of compensating the depth of the kth reference viewpoint of the decoded t time
The specific process of performing the bidirectional filtering processing is as follows:
sixthly-1, defining the depth compensation image of the kth reference viewpoint at the t moment after decoding
The currently processed pixel point is the current pixel point;
sixthly-2, recording the coordinate position of the current pixel point as p ', recording the coordinate position of the neighborhood pixel point of the current pixel point as q', and then adopting a gradient template G
xFor the depth value of the current pixel point
Performing convolution operation to obtain gradient value gx (p') of current pixel point,
then judging whether | gx (p') | is more than or equal to T, if so, executing a step (c-3), otherwise, executing a step (c-4), wherein,
"+" is convolution operation symbol, "| |" is operation symbol for solving absolute value, T is gradient amplitude threshold value;
sixthly-3, adopting standard deviation of (sigma)
s1,σ
r1) The depth value of the two-way filter to the neighborhood pixel point of the current pixel point
Filtering operation is carried out to obtain the depth value of the current pixel point after filtering, and the depth value is recorded as
Wherein,
<math>
<mrow>
<msub>
<mi>r</mi>
<mrow>
<mi>s</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mn>1</mn>
<mo>/</mo>
<munder>
<mi>Σ</mi>
<mrow>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>∈</mo>
<mi>N</mi>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
</mrow>
</munder>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<mo>|</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>-</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>|</mo>
<mo>|</mo>
<mo>)</mo>
</mrow>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>)</mo>
</mrow>
<mo>,</mo>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<mo>|</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>-</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>|</mo>
<mo>|</mo>
<mo>)</mo>
</mrow>
</mrow>
</math> denotes the standard deviation as σ
s1The function of the gaussian function of (a) is,
| p '-q' | | represents the euclidean distance between the coordinate position p 'and the coordinate position q', "| | | |" is a euclidean distance symbol,
denotes the standard deviation as σ
r1The function of the gaussian function of (a) is,
<math>
<mrow>
<msub>
<mi>G</mi>
<mrow>
<mi>σr</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mi>exp</mi>
<mrow>
<mo>(</mo>
<mo>-</mo>
<mfrac>
<msup>
<mrow>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
<mn>2</mn>
</msup>
<msup>
<msub>
<mrow>
<mn>2</mn>
<mi>σ</mi>
</mrow>
<mrow>
<mi>r</mi>
<mn>1</mn>
</mrow>
</msub>
<mn>2</mn>
</msup>
</mfrac>
<mo>)</mo>
</mrow>
<mo>,</mo>
</mrow>
</math> "| |" is an operation symbol for absolute value,
color image representing k-th reference viewpoint at time t after decoding
The value of the i-th component of the pixel point with the middle coordinate position p',
color image representing k-th reference viewpoint at time t after decoding
The value of the i-th component of the pixel point with the middle coordinate position q',
depth compensated image representing kth reference viewpoint at decoded t-time
The depth value of a pixel point with a middle coordinate position of q ', exp () represents an exponential function with e as a base, e =2.71828183, and N (p ') represents a 7 × 7 neighborhood window with the pixel point with the coordinate position of p ' as a center; then executing the step (sixthly-5);
sixthly, 4, calculating the depth value of the current pixel point
As filtered depth values
Namely, it is
Wherein,
"=" in (1) is an assigned symbol; then executing the step (sixthly-5);
sixthly-5, depth compensation image of k-th reference viewpoint at t moment after decoding
Taking the next pixel point to be processed as the current pixel point, then returning to the step (c) -2 to continue executing until the depth compensation image of the kth reference viewpoint at the moment t after decoding
After all the pixel points in the image are processed, a filtered depth filtering image is obtained and recorded as
And the coding prediction structure set in the step two is an HBP coding prediction structure.
Compared with the prior art, the invention has the advantages that:
1) according to the method, the coding distortion compensation parameters of the depth image are obtained, the decoded depth image is compensated by using the coding distortion compensation parameters, the depth compensation image obtained after compensation is filtered, and the depth filtering image obtained after filtering is used for drawing the virtual viewpoint image, so that the influence of coding distortion on the drawing of the virtual viewpoint image is reduced on the basis of keeping the compression efficiency of the depth image, and the drawing performance of the virtual viewpoint image is greatly improved.
2) The method of the invention obtains the coding distortion compensation parameters of the wavelet coefficient matrixes of different sub-bands of the depth image by adopting the wiener filter for prediction, codes the coding distortion compensation parameters by adopting a distortion-free compression mode, and then compensates the decoded depth image at a user terminal, thereby reducing the influence of coding distortion on the drawing of the virtual viewpoint image.
3) The method of the invention considers the characteristic that the edge area of the depth image is discontinuous and the depth distortion of the edge area can generate larger influence on the virtual viewpoint image drawing, and adopts the two-way filter to carry out filtering processing on the depth value of each pixel point of the edge area of the depth compensation image, thus effectively improving the drawing performance of the virtual viewpoint image.
Drawings
FIG. 1 is a block diagram of the basic components of a typical three-dimensional video system;
FIG. 2a is a color image of the 8 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;
FIG. 2b is a color image of the 10 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;
FIG. 2c is a depth image corresponding to the color image shown in FIG. 2 a;
FIG. 2d is a depth image corresponding to the color image shown in FIG. 2 b;
FIG. 3a is a color image of the 8 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;
FIG. 3b is a color image of the 10 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;
FIG. 3c is a depth image corresponding to the color image shown in FIG. 3 a;
FIG. 3d is a depth image corresponding to the color image shown in FIG. 3 b;
fig. 4a is a decoded depth image of the 8 th reference view of the "bookangular" three-dimensional video test sequence;
FIG. 4b is a depth filtering image obtained by the method of the present invention for the 8 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;
FIG. 5a is a decoded depth image of the 8 th reference view of the "Altmoabit" three-dimensional video test sequence;
FIG. 5b is a depth filtered image obtained by the method of the present invention for the 8 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;
fig. 6a is a virtual viewpoint image obtained by drawing the original depth image of the 9 th reference viewpoint of the "bookarrrival" three-dimensional video test sequence;
fig. 6b is a virtual viewpoint image obtained by drawing a 9 th reference viewpoint of the "bookarrrival" three-dimensional video test sequence by using a decoded depth image;
fig. 6c is a virtual viewpoint image obtained by rendering the 9 th reference viewpoint of the "bookkarrival" three-dimensional video test sequence by using the method of the present invention;
fig. 7a is a virtual viewpoint image obtained by drawing an original depth image of the 9 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;
fig. 7b is a virtual viewpoint image obtained by drawing a 9 th reference viewpoint of the "altmoobit" three-dimensional video test sequence by using a decoded depth image;
FIG. 7c is a virtual viewpoint image obtained by rendering the 9 th reference viewpoint of the Altmoabit three-dimensional video test sequence by the method of the present invention;
FIG. 8a is an enlarged view of a portion of FIG. 6 a;
FIG. 8b is an enlarged view of a portion of FIG. 6 b;
FIG. 8c is an enlarged view of a portion of FIG. 6 c;
FIG. 9a is an enlarged view of a portion of FIG. 7 a;
FIG. 9b is an enlarged view of a portion of FIG. 7 b;
fig. 9c is an enlarged view of a detail of fig. 7 c.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
The invention provides a method for post-processing a depth image, which comprises the following processing procedures: firstly, coding an obtained color image and a depth image corresponding to the color image to obtain a coded code stream; then, obtaining coding distortion compensation parameters of the depth image, and coding the coding distortion compensation parameters of the depth image to obtain a parameter code stream; then decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter of the depth image; and then, compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensation image, and filtering the depth compensation image to obtain a depth filtering image, wherein the depth filtering image is used for drawing a virtual viewpoint image, namely the virtual viewpoint image can be obtained by drawing based on the depth image according to the decoded color image and the depth filtering image. The method specifically comprises the following steps:
acquiring K color images with YUV color spaces of K reference viewpoints at t moment and K depth images corresponding to the color images, and recording the color image of the kth reference viewpoint at t moment as
Record the depth image of the kth reference viewpoint at the time t as
Wherein, K is not less than 1 and not more than K, the initial value of K is 1, i is 1,2,3 respectively represent three components of YUV color space, the 1 st component of YUV color space is a brightness component and is marked as Y, the 2 nd component is a first chroma component and is marked as U, and the 3 rd component is a second chroma component and is marked as V, (x, Y) represents the coordinate position of the pixel point in the color image and the depth image, x is not less than 1 and not more than W, Y is not less than 1 and not more than H, W represents the width of the color image and the depth image, H represents the height of the color image and the depth image,
color image representing the kth reference viewpoint at time t
The value of the ith component of the pixel point with the middle coordinate position of (x, y),
indicating the depth of the kth reference viewpoint at time tImage of a person
The middle coordinate position is the depth value of the pixel point of (x, y).
Here, three-dimensional video test sequences "bookwarrival" and "altmobait" provided by HHI laboratories in germany are used, each of which includes 16 color images of 16 reference viewpoints and corresponding 16 depth images, each of which has a resolution of 1024 × 768 and a frame rate of 15 frames per second, i.e., 15fps, and are standard test sequences recommended by ISO/MPEG. Fig. 2a and 2b show a color image of the 8 th and 10 th reference viewpoints of "bookangular", respectively; fig. 2c and 2d show the depth images corresponding to the color images of the 8 th and 10 th reference viewpoints of "Bookarrival", respectively; FIGS. 3a and 3b show a color image of the 8 th and 10 th reference viewpoints of "Altmoabit", respectively; fig. 3c and 3d show the depth images corresponding to the color images of the 8 th and 10 th reference viewpoints of "Altmoabit", respectively.
And secondly, respectively coding K color images with YUV color spaces of K reference viewpoints at the time t and K depth images corresponding to the color images according to a set coding prediction structure, then outputting the color image code stream and the depth image code stream frame by frame to obtain a coded code stream, and transmitting the coded code stream to the user side by the service side through a network.
Here, the set coding prediction structure is a known HBP coding prediction structure.
The Coding of the depth image can cause the quality of the decoded depth image to be reduced and inevitably cause the drawing performance of the virtual viewpoint image to be reduced, so the invention adopts a wiener filter to predict and obtain the Coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment according to the K depth images of the K reference viewpoints at the t moment and the K depth images of the K reference viewpoints at the t moment after Coding and then decoding, then adopts a CABAC (Context-based Adaptive Binary Arithmetic Coding, previously referenced Adaptive Binary Arithmetic Coding) lossless compression method to respectively code the Coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then outputs parameter code streams frame by frame, and finally transmits the parameter code streams to a user end through a network by a service end.
In this specific embodiment, the specific process of obtaining the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t in the step (c) is as follows:
③ 1, the depth image of the kth reference viewpoint currently processed in the K depth images of the K reference viewpoints at the time t
Defined as the current depth image.
③ 2, for the current depth image
Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y).
Thirdly, 3, the depth image of the kth reference viewpoint at the t moment obtained by decoding after encoding
Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y).
Thirdly, predicting and obtaining the depth image of the kth reference viewpoint at the t moment after decoding by adopting a wiener filter
The coding distortion compensation parameters of the wavelet coefficient matrix of each directional subband of each level of wavelet transform are
Is recorded as a coding distortion compensation parameter
Wherein L represents the filtering length range of the wiener filter,
expression solution
The mathematical expected value of (a) is,
to represent
The wavelet coefficient at the middle coordinate position (X + p, y + q), argmin (X) represents the parameter that minimizes the function X, i.e. the
Is shown to make
The minimum parameter.
③ -5, according to the depth image of the kth reference viewpoint at the moment t after decoding
The coding distortion compensation parameters of the wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transformation are obtained to obtain the current depth image
Coding distortion compensation parameter of
And (3) taking the depth image of the next to-be-processed reference viewpoint in the K depth images of the K reference viewpoints at the time t as the current depth image, and then returning to the step (2) to continue executing until the depth images of all the reference viewpoints in the K depth images of the K reference viewpoints at the time t are processed, wherein the initial value of K' is 0.
Decoding the coded code stream sent by the server by the user end to respectively obtain K color images and corresponding K depth images of K reference viewpoints at the t moment after decoding, and decoding the t moment after decodingRespectively correspond to the color image and the corresponding depth image of the kth reference viewpoint
And
wherein,
color image representing k-th reference viewpoint at time t after decoding
The value of the ith component of the pixel point with the middle coordinate position of (x, y),
depth image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y).
Fifthly, the user end decodes the parameter code stream sent by the server end to obtain the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment are utilized to compensate the K depth images of the K reference viewpoints at the t moment after decoding, the K depth compensation images of the K reference viewpoints at the t moment after decoding are obtained, and the depth compensation image of the kth reference viewpoint at the t moment after decoding is recorded as the depth compensation image of the kth reference viewpoint at the t moment
Wherein,
depth compensated image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y).
In this embodiment, the depth compensation image of the kth reference viewpoint at time t after decoding is obtained in the fifth step
The specific process comprises the following steps:
fifthly-1, decoding the depth image of the kth reference viewpoint at the t moment
Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band
The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as
Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,
to represent
The middle coordinate position is the wavelet coefficient at (x, y).
Fifthly-2, calculating the depth image of the kth reference viewpoint at the t moment after decoding
The wavelet coefficient matrixes of all direction sub-bands of each level of wavelet transform are respectively compensated to obtain wavelet coefficient matrixes
Compensated smallThe wave coefficient matrix is recorded as
Wherein,
to represent
The middle coordinate position is the wavelet coefficient at (x + p, y + q).
Fifthly, 3, decoding the depth image of the kth reference viewpoint at the t moment
The wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transform are respectively subjected to inverse wavelet transform after compensation to obtain a depth compensation image of the kth reference viewpoint at the t moment after decoding, and the depth compensation image is recorded as the depth compensation image
Wherein,
depth compensated image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y).
Sixthly, because of the limitation of the depth image obtaining method, the edge area of the depth image is discontinuous, strong correlation exists between the depth image and the color image, and the moving object boundary of the depth image and the color image are consistent, therefore, the edge information of the color image can be used for assisting the filtering processing of the depth imageCarrying out bidirectional filtering processing on the K depth compensation images of the viewpoints to obtain K depth filtering images of K reference viewpoints at t moment after decoding, and recording the depth filtering image of the kth reference viewpoint at t moment after decoding as
Wherein,
depth filtered image representing kth reference viewpoint at decoded t-time
The middle coordinate position is the depth value of the pixel point of (x, y). When the virtual viewpoint image is rendered, the virtual viewpoint image can be obtained by rendering based on the depth image according to the K color images of the K reference viewpoints at the t moment after decoding and the K depth filtering images of the K reference viewpoints at the t moment after decoding.
In this embodiment, the depth compensated image of the kth reference viewpoint at time t after decoding in step [ ]
The specific process of performing the bidirectional filtering processing is as follows:
sixthly-1, defining the depth compensation image of the kth reference viewpoint at the t moment after decoding
And the currently processed pixel point is the current pixel point.
Sixthly-2, recording the coordinate position of the current pixel point as p ', recording the coordinate position of the neighborhood pixel point of the current pixel point as q', and then adopting a gradient template G
xFor the depth value of the current pixel point
Performing convolution operation to obtain gradient value gx (p') of current pixel point,
then judging whether | gx (p') | ≧ T is true, if true,
the step of sixthly-3 is executed, otherwise, the step of sixthly-4 is executed, wherein, "+" is the convolution operation symbol, "|" is the operation symbol for absolute value, T is the gradient amplitude threshold, in this embodiment, T = 5.
Sixthly-3, adopting standard deviation of (sigma)
s1,σ
r1) The depth value of the two-way filter to the neighborhood pixel point of the current pixel point
Filtering operation is carried out to obtain the depth value of the current pixel point after filtering, and the depth value is recorded as
Wherein,
<math>
<mrow>
<msub>
<mi>r</mi>
<mrow>
<mi>s</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mn>1</mn>
<mo>/</mo>
<munder>
<mi>Σ</mi>
<mrow>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>∈</mo>
<mi>N</mi>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
</mrow>
</munder>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<mo>|</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>-</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>|</mo>
<mo>|</mo>
<mo>)</mo>
</mrow>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>)</mo>
</mrow>
<mo>,</mo>
<msub>
<mi>G</mi>
<mrow>
<mi>σs</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<mo>|</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>-</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>|</mo>
<mo>|</mo>
<mo>)</mo>
</mrow>
</mrow>
</math> denotes the standard deviation as σ
s1The function of the gaussian function of (a) is,
| p '-q' | | represents the euclidean distance between the coordinate position p 'and the coordinate position q', "| | | |" is a euclidean distance symbol,
denotes the standard deviation as σ
r1The function of the gaussian function of (a) is,
<math>
<mrow>
<msub>
<mi>G</mi>
<mrow>
<mi>σr</mi>
<mn>1</mn>
</mrow>
</msub>
<mrow>
<mo>(</mo>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mi>exp</mi>
<mrow>
<mo>(</mo>
<mo>-</mo>
<mfrac>
<msup>
<mrow>
<mo>|</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>p</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msubsup>
<mover>
<mi>I</mi>
<mo>~</mo>
</mover>
<mrow>
<mi>R</mi>
<mo>,</mo>
<mi>t</mi>
<mo>,</mo>
<mi>i</mi>
</mrow>
<mi>k</mi>
</msubsup>
<mrow>
<mo>(</mo>
<msup>
<mi>q</mi>
<mo>′</mo>
</msup>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
<mn>2</mn>
</msup>
<msup>
<msub>
<mrow>
<mn>2</mn>
<mi>σ</mi>
</mrow>
<mrow>
<mi>r</mi>
<mn>1</mn>
</mrow>
</msub>
<mn>2</mn>
</msup>
</mfrac>
<mo>)</mo>
</mrow>
<mo>,</mo>
</mrow>
</math> "| |" is an operation symbol for absolute value,
color image representing k-th reference viewpoint at time t after decoding
The value of the i-th component of the pixel point with the middle coordinate position p',
color image representing k-th reference viewpoint at time t after decoding
The value of the i-th component of the pixel point with the middle coordinate position q',
depth compensated image representing kth reference viewpoint at decoded t-time
The depth value of a pixel point with a middle coordinate position of q ', exp () represents an exponential function with e as a base, e =2.71828183, and N (p ') represents a 7 × 7 neighborhood window with a pixel point with a coordinate position of p ' as a center, and in the actual processing process, neighborhood windows with other sizes can be adopted, but a large number of experiments show that the best effect can be achieved when the 7 × 7 neighborhood window is adopted; then the step of (4) is executed.
In the present embodiment, the standard deviation (σ)s1,σr1)=(5,0.1)。
Sixthly, 4, calculating the depth value of the current pixel point
As filtered depth values
Namely, it is
Wherein "
"in" = "is an assigned symbol; then the step of (4) is executed.
Sixthly-5, depth compensation image of k-th reference viewpoint at t moment after decoding
Taking the next pixel point to be processed as the current pixel point, then returning to the step (c) -2 to continue executing until the depth compensation image of the kth reference viewpoint at the moment t after decoding
After all the pixel points in the image are processed, a filtered depth filtering image is obtained and recorded as
The depth images of the three-dimensional video test sequences of "bookwarrival" and "altmoobit" are subjected to a filtering process experiment, fig. 4a shows a decoded depth image of the 8 th reference viewpoint of "bookwarrival", fig. 4b shows a depth filtered image of the 8 th reference viewpoint of "bookwarfarval" obtained by the method of the present invention, fig. 5a shows a decoded depth image of the 8 th reference viewpoint of "altmoobit", fig. 5b shows a depth filtered image of the 8 th reference viewpoint of "altmoobit" obtained by the method of the present invention, and as can be seen from fig. 4a to 5b, the depth images obtained after the filtering process by the method of the present invention, i.e., the depth filtered images, maintain important geometric features of the depth images, and generate satisfactory edges and smooth contours.
The subjective performance of virtual viewpoint image rendering on the Bookarrival three-dimensional video test sequence and the Altmoabit three-dimensional video test sequence is compared by using the method.
The virtual viewpoint image obtained by the method of the present invention is compared with a virtual viewpoint image obtained without the method of the present invention (directly using the decoded image). Fig. 6a shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with an original depth image, fig. 6b shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with a decoded depth image, fig. 6c shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with the method of the present invention, fig. 7a shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with an original depth image, fig. 7b shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with a decoded depth image, fig. 7c shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with the method of the present invention, fig. 8a, fig. 8b and fig. 8c respectively show partial details of fig. 6a, fig. 6b and fig. 6c, and fig. 9a partial enlarged view of fig. 9a, Fig. 9b and 9c show enlarged partial detail views of fig. 7a, 7b and 7c, respectively. As can be seen from fig. 6a to 9c, the virtual viewpoint image obtained by the method of the present invention can maintain better object contour information, thereby reducing coverage of the background generated in the mapping process to the foreground due to distortion of the depth image, and performing bidirectional filtering processing on the edge area of the depth image according to the edge information of the color image, so as to effectively eliminate stripe noise in the drawn virtual viewpoint image.
The peak signal-to-noise ratio (PSNR) of the virtual viewpoint image obtained by the method of the present invention is compared with the peak signal-to-noise ratio (PSNR) of the virtual viewpoint image obtained by the method without the method of the present invention, and the comparison results are listed in table 1, and it can be seen from table 1 that the quality of the virtual viewpoint image obtained by the method of the present invention is significantly better than the quality of the virtual viewpoint image obtained by the method without the present invention, which is sufficient to show that the method is effective and feasible.
TABLE 1 comparison of peak signal-to-noise ratio using and without the inventive method