CN102769749A

CN102769749A - Post-processing method for depth image

Info

Publication number: CN102769749A
Application number: CN2012102260184A
Authority: CN
Inventors: 邵枫; 蒋刚毅; 郁梅; 彭宗举; 李福翠
Original assignee: Ningbo University
Current assignee: Luyake Fire Vehicle Manufacturing Co ltd
Priority date: 2012-06-29
Filing date: 2012-06-29
Publication date: 2012-11-07
Anticipated expiration: 2032-06-29
Also published as: CN102769749B

Abstract

The invention discloses a post-processing method for a depth image. The method comprises the following steps of: coding an acquired color image and a corresponding depth image to obtain a coded code stream; acquiring a coding distortion compensation parameter of the depth image, and coding the coding distortion compensation parameter of the depth image to obtain a parameter code stream; decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter; and compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensated image, and filtering the depth compensated image to obtain a depth filtered image which is used for drawing a virtual viewpoint image. The method has the advantages that the influence of coding distortion on the drawing of the virtual viewpoint image is lowered on the basis of maintaining the compression efficiency of the depth image, and the drawing performance of the virtual viewpoint image is greatly improved.

Description

Post-processing method of depth image

Technical Field

The present invention relates to an image processing method, and in particular, to a depth image post-processing method.

Background

Three-Dimensional Video (3 DV) is an advanced visual mode, which makes people have stereoscopic impression and immersion when watching images on a screen, and can meet the requirement that people watch Three-Dimensional (3D) scenes from different angles. A typical three-dimensional video system is shown in fig. 1, and mainly includes modules of video capturing, video encoding, transport decoding, virtual viewpoint rendering, and interactive display.

Multi-view video plus depth (MVD) is the 3D scene information representation adopted by current ISO/MPEG recommendations. The MVD data increases the depth information of corresponding viewpoints on the basis of a multi-viewpoint color image, and two basic approaches are mainly used for obtaining the depth information at present: 1) acquiring through a depth camera; 2) depth information is generated from a general two-dimensional (2D) video by a generation method. Depth Image Based Rendering (DIBR) is a method of generating a virtual viewpoint image by Depth image Rendering corresponding to a color image of a reference viewpoint, and synthesizes a virtual viewpoint image of a three-dimensional scene by using the color image of the reference viewpoint and Depth information corresponding to each pixel point in the color image of the reference viewpoint.

However, compared with color images, depth images have simple textures and include more flat regions, but due to the limitation of depth image acquisition methods, depth images generally have the problems of poor time continuity, depth discontinuity and the like, and more importantly, depth images are not directly used for viewing, but are used for assisting DIBR and 3D display. At present, related researchers have proposed some preprocessing methods for depth images, such as symmetric gaussian filtering and asymmetric gaussian filtering, however, these preprocessing methods consider how to improve the performance of coding, and the improvement of the coding performance inevitably sacrifices the rendering performance of virtual viewpoints.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a depth image post-processing method which can effectively improve the rendering performance of a virtual viewpoint image on the basis of keeping the compression efficiency of the depth image.

The technical scheme adopted by the invention for solving the technical problems is as follows: a method for post-processing a depth image is characterized in that the processing process comprises the following steps: firstly, coding an obtained color image and a depth image corresponding to the color image to obtain a coded code stream; then, obtaining coding distortion compensation parameters of the depth image, and coding the coding distortion compensation parameters of the depth image to obtain a parameter code stream; then decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter of the depth image; and then, compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensation image, and performing filtering processing on the depth compensation image to obtain a depth filtering image, wherein the depth filtering image is used for drawing a virtual viewpoint image.

The post-processing method comprises the following specific steps:

acquiring K color images with YUV color spaces of K reference viewpoints at t moment and K depth images corresponding to the color images, and recording the color image of the kth reference viewpoint at t moment as

Record the depth image of the kth reference viewpoint at the time t as

Wherein, K is not less than 1 and not more than K, the initial value of K is 1, i is 1,2,3 respectively represent three components of YUV color space, the 1 st component of YUV color space is a brightness component and is marked as Y, the 2 nd component is a first chroma component and is marked as U, and the 3 rd component is a second chroma component and is marked as V, (x, Y) represents the coordinate position of the pixel point in the color image and the depth image, x is not less than 1 and not more than W, Y is not less than 1 and not more than H, W represents the width of the color image and the depth image, H represents the height of the color image and the depth image,

color image representing the kth reference viewpoint at time t

The value of the ith component of the pixel point with the middle coordinate position of (x, y),

depth image representing kth reference viewpoint at time tThe middle coordinate position is the depth value of the pixel point of (x, y);

respectively coding K color images with YUV color spaces of K reference viewpoints at the time t and K depth images corresponding to the color images according to a set coding prediction structure, outputting color image code streams and depth image code streams frame by frame to obtain coding code streams, and transmitting the coding code streams to a user terminal by a server through a network;

thirdly, according to the K depth images of the K reference viewpoints at the time t and the K depth images of the K reference viewpoints at the time t obtained by decoding after encoding, predicting and obtaining encoding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t by adopting a wiener filter, then respectively encoding the encoding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t by adopting a CABAC lossless compression method, outputting parameter code streams frame by frame, and finally transmitting the parameter code streams to a user terminal by a service terminal through a network;

decoding the coded code stream sent by the server by the user end to respectively obtain K color images and corresponding K depth images of K reference viewpoints at t moment after decoding, and correspondingly recording the color image and the corresponding depth image of the kth reference viewpoint at t moment after decoding as

And

wherein,

color image representing k-th reference viewpoint at time t after decoding

The middle coordinate position is (x, y)The value of the ith component of the pixel point of (a),

depth image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y);

fifthly, the user end decodes the parameter code stream sent by the server end to obtain the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment are utilized to compensate the K depth images of the K reference viewpoints at the t moment after decoding, the K depth compensation images of the K reference viewpoints at the t moment after decoding are obtained, and the depth compensation image of the kth reference viewpoint at the t moment after decoding is recorded as the depth compensation image of the kth reference viewpoint at the t momentWherein,depth compensated image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y);

sixthly, respectively carrying out bidirectional filtering processing on the decoded K depth compensation images of the K reference viewpoints at the t moment by adopting a bidirectional filter to obtain K depth filtering images of the K reference viewpoints at the t moment, and recording the depth filtering image of the kth reference viewpoint at the t moment as a decoded depth filtering imageWherein,

indicating time t after decodingDepth filtered image of kth reference viewpoint

The middle coordinate position is the depth value of the pixel point of (x, y).

The specific process of obtaining the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment in the third step is as follows:

③ 1, the depth image of the kth reference viewpoint currently processed in the K depth images of the K reference viewpoints at the time t

Defining the depth image as a current depth image;

③ 2, for the current depth image

Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-bandThe wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as

Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,

to represent

The middle coordinate position is the wavelet coefficient at (x, y);

thirdly, 3, the depth image of the kth reference viewpoint at the t moment obtained by decoding after encoding

Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band

The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded as

to represent

The middle coordinate position is the wavelet coefficient at (x, y);

thirdly, predicting and obtaining the depth image of the kth reference viewpoint at the t moment after decoding by adopting a wiener filter

The coding distortion compensation parameters of the wavelet coefficient matrix of each directional subband of each level of wavelet transform are

Is recorded as a coding distortion compensation parameter Wherein L represents the filtering length range of the wiener filter,

expression solution

Mathematical expectation ofThe value of the one or more of,

to representThe middle coordinate position is the wavelet coefficient at (X + p, y + q), argmin (X) represents the parameter that minimizes the function X;

③ -5, according to the depth image of the kth reference viewpoint at the moment t after decoding

The coding distortion compensation parameters of the wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transformation are obtained to obtain the current depth image

Coding distortion compensation parameter of

And (3) taking the depth image of the next to-be-processed reference viewpoint in the K depth images of the K reference viewpoints at the time t as the current depth image, and then returning to the step (2) to continue executing until the depth images of all the reference viewpoints in the K depth images of the K reference viewpoints at the time t are processed, wherein the initial value of K' is 0.

The fifth step obtains the depth compensation image of the kth reference viewpoint at the t moment after decoding

The specific process comprises the following steps:

fifthly-1, decoding the depth image of the kth reference viewpoint at the t momentImplementing 3-level wavelet transform to obtain 3 directional sub-band minimums of each level of wavelet transformA wave coefficient matrix, 3 directional subbands including a horizontal subband, a vertical subband and a diagonal subband

Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,to represent

The middle coordinate position is the wavelet coefficient at (x, y);

fifthly-2, calculating the depth image of the kth reference viewpoint at the t moment after decoding

The wavelet coefficient matrixes of all direction sub-bands of each level of wavelet transform are respectively compensated to obtain wavelet coefficient matrixes

The compensated wavelet coefficient matrix is recorded as

Wherein,

to representThe middle coordinate position is a wavelet coefficient at (x + p, y + q);

fifthly, 3, decoding the depth image of the kth reference viewpoint at the t moment

The wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transform are respectively subjected to inverse wavelet transform after compensation to obtain a depth compensation image of the kth reference viewpoint at the t moment after decoding, and the depth compensation image is recorded as the depth compensation image

Wherein,

depth compensated image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y).

The step of compensating the depth of the kth reference viewpoint of the decoded t time

The specific process of performing the bidirectional filtering processing is as follows:

sixthly-1, defining the depth compensation image of the kth reference viewpoint at the t moment after decoding

The currently processed pixel point is the current pixel point;

sixthly-2, recording the coordinate position of the current pixel point as p ', recording the coordinate position of the neighborhood pixel point of the current pixel point as q', and then adopting a gradient template G_xFor the depth value of the current pixel point

Performing convolution operation to obtain gradient value gx (p') of current pixel point,then judging whether | gx (p') | is more than or equal to T, if so, executing a step (c-3), otherwise, executing a step (c-4), wherein,

G_{x} = [\begin{matrix} - 1 & 0 & 1 \\ - 2 & 0 & 2 \\ - 1 & 0 & 1 \end{matrix}],

"+" is convolution operation symbol, "| |" is operation symbol for solving absolute value, T is gradient amplitude threshold value;

sixthly-3, adopting standard deviation of (sigma)_s1,σ_r1) The depth value of the two-way filter to the neighborhood pixel point of the current pixel point

Filtering operation is carried out to obtain the depth value of the current pixel point after filtering, and the depth value is recorded as

Wherein,

<math> <mrow> <msub> <mi>r</mi> <mrow> <mi>s</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>=</mo> <mn>1</mn> <mo>/</mo> <munder> <mi>Σ</mi> <mrow> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>&Element;</mo> <mi>N</mi> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> </mrow> </munder> <msub> <mi>G</mi> <mrow> <mi>σs</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <mo>|</mo> <mo>|</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>-</mo> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>|</mo> <mo>|</mo> <mo>)</mo> </mrow> <msub> <mi>G</mi> <mrow> <mi>σs</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <mo>|</mo> <msubsup> <mover> <mi>I</mi> <mo>~</mo> </mover> <mrow> <mi>R</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> </mrow> <mi>k</mi> </msubsup> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mover> <mi>I</mi> <mo>~</mo> </mover> <mrow> <mi>R</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> </mrow> <mi>k</mi> </msubsup> <mrow> <mo>(</mo> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>|</mo> <mo>)</mo> </mrow> <mo>,</mo> <msub> <mi>G</mi> <mrow> <mi>σs</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <mo>|</mo> <mo>|</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>-</mo> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>|</mo> <mo>|</mo> <mo>)</mo> </mrow> </mrow> </math>

denotes the standard deviation as σ_s1The function of the gaussian function of (a) is,

| p '-q' | | represents the euclidean distance between the coordinate position p 'and the coordinate position q', "| | | |" is a euclidean distance symbol,

denotes the standard deviation as σ_r1The function of the gaussian function of (a) is,

"| |" is an operation symbol for absolute value,

color image representing k-th reference viewpoint at time t after decoding

The value of the i-th component of the pixel point with the middle coordinate position p',

color image representing k-th reference viewpoint at time t after decoding

The value of the i-th component of the pixel point with the middle coordinate position q',

depth compensated image representing kth reference viewpoint at decoded t-time

The depth value of a pixel point with a middle coordinate position of q ', exp () represents an exponential function with e as a base, e =2.71828183, and N (p ') represents a 7 × 7 neighborhood window with the pixel point with the coordinate position of p ' as a center; then executing the step (sixthly-5);

sixthly, 4, calculating the depth value of the current pixel point

As filtered depth values

Namely, it isWherein,

"=" in (1) is an assigned symbol; then executing the step (sixthly-5);

sixthly-5, depth compensation image of k-th reference viewpoint at t moment after decodingTaking the next pixel point to be processed as the current pixel point, then returning to the step (c) -2 to continue executing until the depth compensation image of the kth reference viewpoint at the moment t after decoding

After all the pixel points in the image are processed, a filtered depth filtering image is obtained and recorded as

And the coding prediction structure set in the step two is an HBP coding prediction structure.

Compared with the prior art, the invention has the advantages that:

1) according to the method, the coding distortion compensation parameters of the depth image are obtained, the decoded depth image is compensated by using the coding distortion compensation parameters, the depth compensation image obtained after compensation is filtered, and the depth filtering image obtained after filtering is used for drawing the virtual viewpoint image, so that the influence of coding distortion on the drawing of the virtual viewpoint image is reduced on the basis of keeping the compression efficiency of the depth image, and the drawing performance of the virtual viewpoint image is greatly improved.

2) The method of the invention obtains the coding distortion compensation parameters of the wavelet coefficient matrixes of different sub-bands of the depth image by adopting the wiener filter for prediction, codes the coding distortion compensation parameters by adopting a distortion-free compression mode, and then compensates the decoded depth image at a user terminal, thereby reducing the influence of coding distortion on the drawing of the virtual viewpoint image.

3) The method of the invention considers the characteristic that the edge area of the depth image is discontinuous and the depth distortion of the edge area can generate larger influence on the virtual viewpoint image drawing, and adopts the two-way filter to carry out filtering processing on the depth value of each pixel point of the edge area of the depth compensation image, thus effectively improving the drawing performance of the virtual viewpoint image.

Drawings

FIG. 1 is a block diagram of the basic components of a typical three-dimensional video system;

FIG. 2a is a color image of the 8 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;

FIG. 2b is a color image of the 10 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;

FIG. 2c is a depth image corresponding to the color image shown in FIG. 2 a;

FIG. 2d is a depth image corresponding to the color image shown in FIG. 2 b;

FIG. 3a is a color image of the 8 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;

FIG. 3b is a color image of the 10 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;

FIG. 3c is a depth image corresponding to the color image shown in FIG. 3 a;

FIG. 3d is a depth image corresponding to the color image shown in FIG. 3 b;

fig. 4a is a decoded depth image of the 8 th reference view of the "bookangular" three-dimensional video test sequence;

FIG. 4b is a depth filtering image obtained by the method of the present invention for the 8 th reference viewpoint of the "Bookarrival" three-dimensional video test sequence;

FIG. 5a is a decoded depth image of the 8 th reference view of the "Altmoabit" three-dimensional video test sequence;

FIG. 5b is a depth filtered image obtained by the method of the present invention for the 8 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;

fig. 6a is a virtual viewpoint image obtained by drawing the original depth image of the 9 th reference viewpoint of the "bookarrrival" three-dimensional video test sequence;

fig. 6b is a virtual viewpoint image obtained by drawing a 9 th reference viewpoint of the "bookarrrival" three-dimensional video test sequence by using a decoded depth image;

fig. 6c is a virtual viewpoint image obtained by rendering the 9 th reference viewpoint of the "bookkarrival" three-dimensional video test sequence by using the method of the present invention;

fig. 7a is a virtual viewpoint image obtained by drawing an original depth image of the 9 th reference viewpoint of the "Altmoabit" three-dimensional video test sequence;

fig. 7b is a virtual viewpoint image obtained by drawing a 9 th reference viewpoint of the "altmoobit" three-dimensional video test sequence by using a decoded depth image;

FIG. 7c is a virtual viewpoint image obtained by rendering the 9 th reference viewpoint of the Altmoabit three-dimensional video test sequence by the method of the present invention;

FIG. 8a is an enlarged view of a portion of FIG. 6 a;

FIG. 8b is an enlarged view of a portion of FIG. 6 b;

FIG. 8c is an enlarged view of a portion of FIG. 6 c;

FIG. 9a is an enlarged view of a portion of FIG. 7 a;

FIG. 9b is an enlarged view of a portion of FIG. 7 b;

fig. 9c is an enlarged view of a detail of fig. 7 c.

Detailed Description

The invention is described in further detail below with reference to the accompanying examples.

The invention provides a method for post-processing a depth image, which comprises the following processing procedures: firstly, coding an obtained color image and a depth image corresponding to the color image to obtain a coded code stream; then, obtaining coding distortion compensation parameters of the depth image, and coding the coding distortion compensation parameters of the depth image to obtain a parameter code stream; then decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter of the depth image; and then, compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensation image, and filtering the depth compensation image to obtain a depth filtering image, wherein the depth filtering image is used for drawing a virtual viewpoint image, namely the virtual viewpoint image can be obtained by drawing based on the depth image according to the decoded color image and the depth filtering image. The method specifically comprises the following steps:

Record the depth image of the kth reference viewpoint at the time t as

color image representing the kth reference viewpoint at time t

indicating the depth of the kth reference viewpoint at time tImage of a personThe middle coordinate position is the depth value of the pixel point of (x, y).

Here, three-dimensional video test sequences "bookwarrival" and "altmobait" provided by HHI laboratories in germany are used, each of which includes 16 color images of 16 reference viewpoints and corresponding 16 depth images, each of which has a resolution of 1024 × 768 and a frame rate of 15 frames per second, i.e., 15fps, and are standard test sequences recommended by ISO/MPEG. Fig. 2a and 2b show a color image of the 8 th and 10 th reference viewpoints of "bookangular", respectively; fig. 2c and 2d show the depth images corresponding to the color images of the 8 th and 10 th reference viewpoints of "Bookarrival", respectively; FIGS. 3a and 3b show a color image of the 8 th and 10 th reference viewpoints of "Altmoabit", respectively; fig. 3c and 3d show the depth images corresponding to the color images of the 8 th and 10 th reference viewpoints of "Altmoabit", respectively.

And secondly, respectively coding K color images with YUV color spaces of K reference viewpoints at the time t and K depth images corresponding to the color images according to a set coding prediction structure, then outputting the color image code stream and the depth image code stream frame by frame to obtain a coded code stream, and transmitting the coded code stream to the user side by the service side through a network.

Here, the set coding prediction structure is a known HBP coding prediction structure.

The Coding of the depth image can cause the quality of the decoded depth image to be reduced and inevitably cause the drawing performance of the virtual viewpoint image to be reduced, so the invention adopts a wiener filter to predict and obtain the Coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment according to the K depth images of the K reference viewpoints at the t moment and the K depth images of the K reference viewpoints at the t moment after Coding and then decoding, then adopts a CABAC (Context-based Adaptive Binary Arithmetic Coding, previously referenced Adaptive Binary Arithmetic Coding) lossless compression method to respectively code the Coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then outputs parameter code streams frame by frame, and finally transmits the parameter code streams to a user end through a network by a service end.

In this specific embodiment, the specific process of obtaining the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the time t in the step (c) is as follows:

Defined as the current depth image.

③ 2, for the current depth image

Wherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,to representThe middle coordinate position is the wavelet coefficient at (x, y).

to represent

The middle coordinate position is the wavelet coefficient at (x, y).

Is recorded as a coding distortion compensation parameter

Wherein L represents the filtering length range of the wiener filter,

expression solution

The mathematical expected value of (a) is,

to represent

The wavelet coefficient at the middle coordinate position (X + p, y + q), argmin (X) represents the parameter that minimizes the function X, i.e. theIs shown to make

The minimum parameter.

③ -5, according to the depth image of the kth reference viewpoint at the moment t after decodingThe coding distortion compensation parameters of the wavelet coefficient matrixes of the sub-bands in all directions of each level of wavelet transformation are obtained to obtain the current depth image

Coding distortion compensation parameter of

Decoding the coded code stream sent by the server by the user end to respectively obtain K color images and corresponding K depth images of K reference viewpoints at the t moment after decoding, and decoding the t moment after decodingRespectively correspond to the color image and the corresponding depth image of the kth reference viewpointAnd

wherein,

color image representing k-th reference viewpoint at time t after decoding

depth image representing kth reference viewpoint at decoded t-timeThe middle coordinate position is the depth value of the pixel point of (x, y).

Fifthly, the user end decodes the parameter code stream sent by the server end to obtain the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment, then the coding distortion compensation parameters of the K depth images of the K reference viewpoints at the t moment are utilized to compensate the K depth images of the K reference viewpoints at the t moment after decoding, the K depth compensation images of the K reference viewpoints at the t moment after decoding are obtained, and the depth compensation image of the kth reference viewpoint at the t moment after decoding is recorded as the depth compensation image of the kth reference viewpoint at the t moment

Wherein,

depth compensated image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y).

In this embodiment, the depth compensation image of the kth reference viewpoint at time t after decoding is obtained in the fifth step

The specific process comprises the following steps:

fifthly-1, decoding the depth image of the kth reference viewpoint at the t moment

to represent

The middle coordinate position is the wavelet coefficient at (x, y).

Compensated smallThe wave coefficient matrix is recorded as

Wherein,to represent

The middle coordinate position is the wavelet coefficient at (x + p, y + q).

Wherein,

depth compensated image representing kth reference viewpoint at decoded t-timeThe middle coordinate position is the depth value of the pixel point of (x, y).

Sixthly, because of the limitation of the depth image obtaining method, the edge area of the depth image is discontinuous, strong correlation exists between the depth image and the color image, and the moving object boundary of the depth image and the color image are consistent, therefore, the edge information of the color image can be used for assisting the filtering processing of the depth imageCarrying out bidirectional filtering processing on the K depth compensation images of the viewpoints to obtain K depth filtering images of K reference viewpoints at t moment after decoding, and recording the depth filtering image of the kth reference viewpoint at t moment after decoding as

Wherein,

depth filtered image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y). When the virtual viewpoint image is rendered, the virtual viewpoint image can be obtained by rendering based on the depth image according to the K color images of the K reference viewpoints at the t moment after decoding and the K depth filtering images of the K reference viewpoints at the t moment after decoding.

In this embodiment, the depth compensated image of the kth reference viewpoint at time t after decoding in step [ ]

And the currently processed pixel point is the current pixel point.

Sixthly-2, recording the coordinate position of the current pixel point as p ', recording the coordinate position of the neighborhood pixel point of the current pixel point as q', and then adopting a gradient template G_xFor the depth value of the current pixel pointPerforming convolution operation to obtain gradient value gx (p') of current pixel point,

then judging whether | gx (p') | ≧ T is true, if true,

the step of sixthly-3 is executed, otherwise, the step of sixthly-4 is executed, wherein,

G_{x} = [\begin{matrix} - 1 & 0 & 1 \\ - 2 & 0 & 2 \\ - 1 & 0 & 1 \end{matrix}],

"+" is the convolution operation symbol, "|" is the operation symbol for absolute value, T is the gradient amplitude threshold, in this embodiment, T = 5.

Wherein,

| p '-q' | | represents the euclidean distance between the coordinate position p 'and the coordinate position q', "| | | |" is a euclidean distance symbol,denotes the standard deviation as σ_r1The function of the gaussian function of (a) is,

"| |" is an operation symbol for absolute value,

color image representing k-th reference viewpoint at time t after decoding

color image representing k-th reference viewpoint at time t after decoding

depth compensated image representing kth reference viewpoint at decoded t-time

The depth value of a pixel point with a middle coordinate position of q ', exp () represents an exponential function with e as a base, e =2.71828183, and N (p ') represents a 7 × 7 neighborhood window with a pixel point with a coordinate position of p ' as a center, and in the actual processing process, neighborhood windows with other sizes can be adopted, but a large number of experiments show that the best effect can be achieved when the 7 × 7 neighborhood window is adopted; then the step of (4) is executed.

In the present embodiment, the standard deviation (σ)_s1,σ_r1)=(5,0.1)。

Sixthly, 4, calculating the depth value of the current pixel point

As filtered depth values

Namely, it isWherein "

"in" = "is an assigned symbol; then the step of (4) is executed.

Sixthly-5, depth compensation image of k-th reference viewpoint at t moment after decoding

Taking the next pixel point to be processed as the current pixel point, then returning to the step (c) -2 to continue executing until the depth compensation image of the kth reference viewpoint at the moment t after decoding

The depth images of the three-dimensional video test sequences of "bookwarrival" and "altmoobit" are subjected to a filtering process experiment, fig. 4a shows a decoded depth image of the 8 th reference viewpoint of "bookwarrival", fig. 4b shows a depth filtered image of the 8 th reference viewpoint of "bookwarfarval" obtained by the method of the present invention, fig. 5a shows a decoded depth image of the 8 th reference viewpoint of "altmoobit", fig. 5b shows a depth filtered image of the 8 th reference viewpoint of "altmoobit" obtained by the method of the present invention, and as can be seen from fig. 4a to 5b, the depth images obtained after the filtering process by the method of the present invention, i.e., the depth filtered images, maintain important geometric features of the depth images, and generate satisfactory edges and smooth contours.

The subjective performance of virtual viewpoint image rendering on the Bookarrival three-dimensional video test sequence and the Altmoabit three-dimensional video test sequence is compared by using the method.

The virtual viewpoint image obtained by the method of the present invention is compared with a virtual viewpoint image obtained without the method of the present invention (directly using the decoded image). Fig. 6a shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with an original depth image, fig. 6b shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with a decoded depth image, fig. 6c shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "bookwarrival" with the method of the present invention, fig. 7a shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with an original depth image, fig. 7b shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with a decoded depth image, fig. 7c shows a virtual viewpoint image obtained by rendering a 9 th reference viewpoint of "altmoebit" with the method of the present invention, fig. 8a, fig. 8b and fig. 8c respectively show partial details of fig. 6a, fig. 6b and fig. 6c, and fig. 9a partial enlarged view of fig. 9a, Fig. 9b and 9c show enlarged partial detail views of fig. 7a, 7b and 7c, respectively. As can be seen from fig. 6a to 9c, the virtual viewpoint image obtained by the method of the present invention can maintain better object contour information, thereby reducing coverage of the background generated in the mapping process to the foreground due to distortion of the depth image, and performing bidirectional filtering processing on the edge area of the depth image according to the edge information of the color image, so as to effectively eliminate stripe noise in the drawn virtual viewpoint image.

The peak signal-to-noise ratio (PSNR) of the virtual viewpoint image obtained by the method of the present invention is compared with the peak signal-to-noise ratio (PSNR) of the virtual viewpoint image obtained by the method without the method of the present invention, and the comparison results are listed in table 1, and it can be seen from table 1 that the quality of the virtual viewpoint image obtained by the method of the present invention is significantly better than the quality of the virtual viewpoint image obtained by the method without the present invention, which is sufficient to show that the method is effective and feasible.

TABLE 1 comparison of peak signal-to-noise ratio using and without the inventive method

Claims

1. A method for post-processing a depth image is characterized in that the processing process comprises the following steps: firstly, coding an obtained color image and a depth image corresponding to the color image to obtain a coded code stream; then, obtaining coding distortion compensation parameters of the depth image, and coding the coding distortion compensation parameters of the depth image to obtain a parameter code stream; then decoding the coded code stream and the parameter code stream to obtain a decoded color image, a decoded depth image and a decoded coding distortion compensation parameter of the depth image; and then, compensating the decoded depth image by using the coding distortion compensation parameter of the depth image to obtain a depth compensation image, and performing filtering processing on the depth compensation image to obtain a depth filtering image, wherein the depth filtering image is used for drawing a virtual viewpoint image.

2. The method for post-processing the depth image according to claim 1, comprising the following steps:

Record the depth image of the kth reference viewpoint at the time t as

color image representing the kth reference viewpoint at time t

depth image representing kth reference viewpoint at time t

The middle coordinate position is the depth value of the pixel point of (x, y);

And

wherein,

color image representing k-th reference viewpoint at time t after decoding

The value of the ith component of the pixel point with the middle coordinate position of (x, y),representation decodingDepth image of kth reference viewpoint at subsequent t timeThe middle coordinate position is the depth value of the pixel point of (x, y);

Wherein,

depth compensated image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y);

sixthly, respectively carrying out bidirectional filtering processing on the decoded K depth compensation images of the K reference viewpoints at the t moment by adopting a bidirectional filter to obtain K depth filtering images of the K reference viewpoints at the t moment, and recording the depth filtering image of the kth reference viewpoint at the t moment as a decoded depth filtering image

Wherein,

depth filtered image representing kth reference viewpoint at decoded t-time

The middle coordinate position is the depth value of the pixel point of (x, y).

3. The method of claim 2, wherein the specific process of obtaining the coding distortion compensation parameters of K depth images of K reference viewpoints at time t comprises:

Defining the depth image as a current depth image;

③ 2, for the current depth image

Implementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-bandThe wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded asWherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,

to represent

The middle coordinate position is the wavelet coefficient at (x, y);

The wavelet coefficient matrix of the nth direction sub-band obtained after the mth level wavelet transformation is carried out is recorded asWherein m is more than or equal to 1 and less than or equal to 3, n is more than or equal to 1 and less than or equal to 3,to represent

The middle coordinate position is the wavelet coefficient at (x, y);

thirdly, predicting and obtaining the depth image of the kth reference viewpoint at the t moment after decoding by adopting a wiener filterThe coding distortion compensation parameters of the wavelet coefficient matrix of each directional subband of each level of wavelet transform are

Is recorded as a coding distortion compensation parameter

Wherein L represents the filtering length range of the wiener filter,expression solution

The mathematical expected value of (a) is,to representThe middle coordinate position is the wavelet coefficient at (X + p, y + q), argmin (X) represents the parameter that minimizes the function X;

Coding distortion compensation parameter of

4. A method for post-processing depth image as claimed in claim 2 or 3, wherein said step (c) obtains the depth compensation image of the kth reference viewpoint at the time t after decoding

The specific process comprises the following steps:

⑤-1、for the depth image of the k-th reference viewpoint at the decoded t timeImplementing 3-level wavelet transform to obtain wavelet coefficient matrix of 3 directional sub-bands of each level of wavelet transform, the 3 directional sub-bands including horizontal sub-band, vertical sub-band and diagonal sub-band

to represent

The middle coordinate position is the wavelet coefficient at (x, y);

fifthly-2, calculating the depth image of the kth reference viewpoint at the t moment after decodingThe wavelet coefficient matrixes of all direction sub-bands of each level of wavelet transform are respectively compensated to obtain wavelet coefficient matrixes

The compensated wavelet coefficient matrix is recorded as

Wherein,

to represent

The middle coordinate position is a wavelet coefficient at (x + p, y + q);

Wherein,

5. The method of claim 4, wherein the step of compensating the depth of the kth reference viewpoint for the decoded t-th time is performed

The currently processed pixel point is the current pixel point;

then judging whether | gx (p') | is more than or equal to T, if so, executing a step (c-3), otherwise, executing a step (c-4), wherein,

G_{x} = [\begin{matrix} - 1 & 0 & 1 \\ - 2 & 0 & 2 \\ - 1 & 0 & 1 \end{matrix}],

Filtering operation is carried out to obtain the depth value of the current pixel point after filtering, and the depth value is recorded as Wherein,

<math> <mrow> <msub> <mi>r</mi> <mrow> <mi>s</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>=</mo> <mn>1</mn> <mo>/</mo> <munder> <mi>Σ</mi> <mrow> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>&Element;</mo> <mi>N</mi> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> </mrow> </munder> <msub> <mi>G</mi> <mrow> <mi>σs</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <mo>|</mo> <mo>|</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>-</mo> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>|</mo> <mo>|</mo> <mo>)</mo> </mrow> <msub> <mi>G</mi> <mrow> <mi>σs</mi> <mn>1</mn> </mrow> </msub> <mrow> <mo>(</mo> <mo>|</mo> <msubsup> <mover> <mi>I</mi> <mo>~</mo> </mover> <mrow> <mi>R</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> </mrow> <mi>k</mi> </msubsup> <mrow> <mo>(</mo> <msup> <mi>p</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mover> <mi>I</mi> <mo>~</mo> </mover> <mrow> <mi>R</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> </mrow> <mi>k</mi> </msubsup> <mrow> <mo>(</mo> <msup> <mi>q</mi> <mo>′</mo> </msup> <mo>)</mo> </mrow> <mo>|</mo> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>

G_σs1(| p '-q' |) represents a standard deviation of σ_s1The function of the gaussian function of (a) is,

"| |" is an operation symbol for absolute value,

color image representing k-th reference viewpoint at time t after decoding

color image representing k-th reference viewpoint at time t after decoding

The value of the i-th component of the pixel point with the middle coordinate position q',depth compensated image representing kth reference viewpoint at decoded t-time

sixthly, 4, calculating the depth value of the current pixel point

As filtered depth valuesNamely, it isWherein "

"in" = "is an assigned symbol; then executing the step (sixthly-5);

Taking the next pixel point to be processed as the current pixel point, then returning to the step (c) -2 to continue executing until the depth compensation image of the kth reference viewpoint at the moment t after decodingAfter all the pixel points in the image are processed, a filtered depth filtering image is obtained and recorded as

6. The method as claimed in claim 5, wherein the coding prediction structure set in step (ii) is an HBP coding prediction structure.