
CN117440158A - MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion

Info

Publication number: CN117440158A
Application number: CN202311759886.3A
Authority: CN
Inventors: 曾焕强; 孔庆玮; 陈婧; 朱建清; 施一帆; 林琦; 郑惠洁
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-01-23
Anticipated expiration: 2043-12-20
Also published as: CN117440158B

Abstract

The invention discloses an MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion, which relates to the field of video coding and comprises the following steps: S1, encoding an immersive video sequence on an MIV encoding platform and, after the atlas is generated, calculating the depth mapping range coefficients; S2, when a two-dimensional video encoder supporting the MIV standard encodes the geometry atlas of the immersive video, constructing a relation model between three-dimensional geometric distortion and mean square error; S3, calculating a three-dimensional geometric distortion coefficient from that relation model; S4, calculating a new Lagrangian multiplier for the rate distortion optimization model from the three-dimensional geometric distortion coefficient, and encoding the current CTU with the adjusted rate distortion optimization model, so as to improve the rate distortion performance with respect to immersive video rendering quality. The final rendered immersive video achieves a better trade-off between quality and bit rate.

Description

MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion

Technical Field

The invention relates to the field of video coding, in particular to an MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion.

Background

With the development of acquisition, interaction and display technologies, immersive video, as a new generation of video media, has gradually become a research hotspot. Immersive video lets users view a three-dimensional scene with six degrees of freedom (Degrees of Freedom, DoF), giving them a sense of immersion. Immersive video therefore needs to carry three-dimensional information about the scene; its data forms include multi-view plus depth map video, light fields, point clouds and the like. These forms are complex and the data volume is huge, so transmission and storage face serious challenges. The Moving Picture Experts Group (MPEG) has therefore proposed a new coding standard for immersive video, MIV (MPEG Immersive Video). MIV mainly takes multi-view plus depth map video (Multi-View plus Depth, MVD) as its signal source, removes redundancy between views by a view-rendering-based approach to obtain a compact representation of the multi-view data called an atlas, and then removes spatio-temporal redundancy with a 2D video encoder to produce a binary bitstream.

The goal of video coding is to reduce the number of bits consumed by the video bitstream as far as possible while preserving video quality; however, the output bit rate and the reconstructed video quality constrain and conflict with each other. Rate distortion optimization in video coding searches for the optimal coding parameters on the basis of rate distortion theory so as to balance bit rate against quality, and is one of the main means of improving encoder performance. The rate distortion optimization process can be expressed as minimizing the distortion of the reconstructed video subject to a bit rate limit. The rate distortion process of existing video encoders uses the sum of squared errors (Sum of Squared Errors, SSE) as the distortion metric. SSE measures the pixel fidelity of the geometry atlas video, but it cannot measure the degree of distortion of the geometric information in the immersive video, nor, ultimately, the degree of distortion of the rendered video.
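The constrained problem described above is conventionally handled in encoders by minimizing a Lagrangian cost J = D + λR over the candidate coding choices. The following is a minimal sketch of that selection rule only; the candidate names and the distortion and rate figures are hypothetical illustration values, not data from the patent.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return the (name, distortion, rate_bits) tuple with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: (mode name, distortion as SSE, rate in bits).
candidates = [("intra", 100.0, 50), ("inter", 140.0, 20), ("skip", 300.0, 2)]

print(best_mode(candidates, lam=1.0)[0])  # intra (costs 150, 160, 302)
print(best_mode(candidates, lam=5.0)[0])  # inter (costs 350, 240, 310)
```

A larger multiplier penalizes rate more heavily, so the choice shifts toward cheaper modes; this is the lever that the method adjusts per CTU.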

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide an MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion, which can improve the rate distortion performance of immersive video rendering quality and obtain better rendering video quality under the same code rate.

The invention adopts the following technical scheme:

An MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion, comprising:

S1, encoding an immersive video sequence on an MIV encoding platform and, after the atlas is generated, calculating the depth mapping range coefficients;

S2, when a two-dimensional video encoder supporting the MIV standard encodes the geometry atlas of the immersive video, constructing a relation model between three-dimensional geometric distortion and mean square error;

S3, calculating a three-dimensional geometric distortion coefficient from the relation model between three-dimensional geometric distortion and mean square error;

S4, calculating a new Lagrangian multiplier for the rate distortion optimization model from the three-dimensional geometric distortion coefficient, and encoding the current CTU with the adjusted rate distortion optimization model, so as to improve the rate distortion performance with respect to immersive video rendering quality.

Preferably, in the step S1, calculating the depth mapping range comprises:

calculating, from the camera intrinsic parameters of the source view to which each patch in the generated atlas belongs, the depth mapping range coefficients of each pixel of the patch in the corresponding camera coordinate system.

Preferably, the depth mapping range coefficients of each pixel of a patch in the corresponding camera coordinate system are calculated from the camera intrinsic parameters of the source view to which the patch belongs, as follows:

；

where the symbols denote, in order: the minimum depth mapping coefficient corresponding to the pixel; the maximum depth mapping coefficient corresponding to the pixel; the nearest distance of the source-view depth map; and the farthest distance of the source-view depth map.

Preferably, in the step S2, a relationship model between the three-dimensional geometric distortion and the mean square error is constructed, and the relationship model is expressed as follows:

；

where the symbols denote, in order: the three-dimensional geometric distortion of the CU; the number of sub-blocks into which the CU is divided, the sub-block size being determined by the minimum patch size in the MIV coding parameters; the Euclidean distance coefficient of each sub-block; the normal vector coefficient of each sub-block; and the mean square error distortion of the CU.

Preferably, in the step S3, the specific calculation method of the three-dimensional geometric distortion coefficient is as follows:

S31, calculating the Euclidean distance coefficient of each sub-block from the sub-block weight mean, the mean depth mapping range coefficient, the pixel mean and the pixel variance;

S32, calculating the normal vector coefficient of each sub-block by the least squares method;

S33, calculating the three-dimensional geometric distortion coefficient from the Euclidean distance coefficients and the normal vector coefficients.

Preferably, the step S31 specifically includes:

S311, calculating the sub-block weight mean;

the pixel weight under the perspective projection model is expressed as:

；

where the symbols denote, in order: the abscissa of the pixel in the image coordinate system; the ordinate of the pixel in the image coordinate system; the horizontal component of the projection focal length in the image coordinate system; the vertical component of the projection focal length in the image coordinate system; the horizontal component of the principal point in the image coordinate system; and the vertical component of the principal point in the image coordinate system;

the pixel weight under the ERP projection and orthographic projection models is defined as follows:

；

the weight mean of the sub-block is expressed as:

；

where the symbol denotes the number of pixels contained in the sub-block;

S312, calculating the mean of the depth mapping range coefficients of the sub-block pixels;

the depth mapping range coefficient of a pixel is defined as:

；

where the symbols denote the minimum and the maximum depth mapping coefficient corresponding to the pixel;

the mean of the depth mapping range coefficients of the sub-block is:

；

S313, calculating the Euclidean distance coefficient;

the Euclidean distance coefficient of the sub-block is defined by the following formula:

；

where the symbols denote, in order: the mean of the squared pixel values of the sub-block; the variance of the squared pixel values of the sub-block; and two constant coefficients, each with the value 2.

Preferably, the step S32 specifically includes:

S321, calculating the depth value corresponding to each pixel value, as follows:

；

where the maximum pixel value depends on the bit depth as follows:

；

where the symbols denote the bit depth of the image pixels and 2 raised to the power of that bit depth;

S322, constructing the sub-block constant matrix according to the sub-block size, as follows:

；

where the symbols denote, in order: the row index of the pixel within the sub-block; the column index of the pixel within the sub-block; and the number of pixels contained in the sub-block;

S323, calculating the approximate plane equation of the sub-block from the sub-block constant matrix and the column vector formed by the depth values, as follows:

the column vector formed by the depth values is defined as:

；

the approximate plane equation of the sub-block is defined as:

；

where the three symbols are the coefficients of the plane equation to be determined;

the plane equation is solved by the least squares method:

；

S324, calculating the normal vector coefficient of the sub-block from the approximate plane equation, as follows:

the normal vector coefficient is defined as:

；

where the symbol denotes the normal vector coefficient of the sub-block.
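The least squares plane fit of steps S322 and S323 can be sketched as below. This is an illustration under stated assumptions, not the patent's formula: the plane is taken to be z = a·x + b·y + c with x the column index and y the row index inside the sub-block, each row of the constant matrix is taken to be (x, y, 1), and the function names `det3` and `fit_plane` are hypothetical. The patent's own definition of the normal vector coefficient is not reproduced here.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(depth_block):
    """Fit z = a*x + b*y + c to a 2-D block of depth values.

    Solves the normal equations (A^T A) p = A^T z, where each row of A is
    (x, y, 1) with x = column index and y = row index (an assumed layout).
    """
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    for y, line in enumerate(depth_block):
        for x, z in enumerate(line):
            sxx += x * x
            sxy += x * y
            sx += x
            syy += y * y
            sy += y
            n += 1.0
            sxz += x * z
            syz += y * z
            sz += z
    ata = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    atz = [sxz, syz, sz]
    det = det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("block too small or degenerate for a plane fit")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of A^T A with A^T z.
        mk = [row[:] for row in ata]
        for i in range(3):
            mk[i][k] = atz[i]
        coeffs.append(det3(mk) / det)
    return tuple(coeffs)  # (a, b, c)
```

For a block that is exactly planar, the fit returns the generating coefficients; for real depth blocks the residual indicates how far the surface departs from a plane.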

Preferably, in S33, the three-dimensional geometric distortion coefficient is calculated from the Euclidean distance coefficients and the normal vector coefficients, as follows:

；

where the symbols denote, in order: the number of sub-blocks into which the CU is divided, the sub-block size being determined by the minimum patch size in the MIV coding parameters; the Euclidean distance coefficient of each sub-block; and the normal vector coefficient of each sub-block.

Preferably, in S4, the new Lagrangian multiplier is expressed as follows:

；

where the symbols denote the Lagrangian multiplier of the original rate distortion cost computation and the three-dimensional geometric distortion coefficient.

Compared with the prior art, the invention has the following beneficial effects:

(1) The 2D video encoder used by existing MIV coding of immersive video takes into account neither the characteristics of geometry atlas video nor the relation between the coding quality of the geometry atlas video and the rendering quality of the immersive video, and therefore cannot optimize the rate distortion performance between rendering quality and bit rate. When encoding the geometry atlas, the present method accounts for the three-dimensional geometric distortion caused by pixel distortion in the geometry atlas and changes the Lagrangian multiplier of the original rate distortion model, so that the rate distortion cost is calculated more reasonably;

(2) Exploiting the characteristics of the geometry atlas, the method weights the distortion measure of different regions with different distortion weights during rate distortion optimization, according to the three-dimensional geometric distortion coefficients of the different sub-blocks, and thereby establishes a rate distortion optimization model based on three-dimensional geometric distortion. For each region, this model reflects the correspondence between the pixel value distortion of the region and the distortion of the three-dimensional geometric information recovered during rendering, which improves the rate distortion performance between immersive video rendering quality and bit rate and yields a better coding result.

Drawings

FIG. 1 is a flow chart of an MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to an embodiment of the present invention;

FIG. 2 is a block diagram of an overall encoding scheme provided in an embodiment of the present invention;

FIG. 3 is a flowchart of generating relevant parameters when an MIV coding platform provided by an embodiment of the present invention codes a multi-view plus depth map video sequence;

FIG. 4 is a flowchart of the 2D video encoder encoding an atlas according to an embodiment of the present invention.

Detailed Description

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.

In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Referring to FIG. 1, the MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to the present embodiment includes the following steps:

Specifically, referring to FIG. 2 and FIG. 4, the overall encoding procedure of the present embodiment is as follows.

Step one: the encoding platform takes the immersive video sequence and the related configuration information as input; after MIV has generated the atlas, the depth mapping range is calculated, following the flow shown in FIG. 3. All pixel positions of the atlas are traversed; if the current pixel is a valid pixel, the information of the patch to which it belongs is obtained, including the depth range of the patch's source view, the camera projection type and the camera intrinsic parameters, and the depth mapping range is calculated as follows.

The depth mapping range is then calculated from the depth range:

；

Step two: the VVC encoding platform takes the geometry atlas data as input and, before encoding each frame, reads the corresponding depth mapping range and the per-pixel camera type flags; the frame being encoded is referred to below as the current frame.

Step three: the current frame is divided into coding tree units (Coding Tree Unit, CTU) of a fixed size; the total number of CTUs in the frame, the CTU currently being encoded, and the index of that CTU are each assigned a symbol.

Step four: the current CTU is divided into sub-blocks of a fixed size; the total number of sub-blocks and the index of the current sub-block are each assigned a symbol.
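The partitioning in steps three and four follows the same pattern at both levels: a two-dimensional array is tiled into fixed-size blocks that are then visited in index order. A minimal sketch follows, in which the function name `split_into_subblocks` and the raster-scan ordering are assumptions made for illustration; blocks at the right and bottom edges are simply truncated when the size does not divide evenly.

```python
def split_into_subblocks(block, size):
    """Tile a 2-D list into size x size sub-blocks in raster-scan order."""
    rows, cols = len(block), len(block[0])
    subblocks = []
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            # Slice out the rows, then the columns, of one sub-block.
            subblocks.append([row[c0:c0 + size] for row in block[r0:r0 + size]])
    return subblocks


# A 4x4 toy "CTU" whose samples are numbered 0..15 in raster order.
ctu = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = split_into_subblocks(ctu, 2)
print(len(subs))   # 4
print(subs[0])     # [[0, 1], [4, 5]]
```

Each sub-block returned here would then be passed to the per-sub-block computations of steps five and six.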

Step five: the Euclidean distance coefficient of the current sub-block is calculated from the sub-block's pixel weight mean, mean depth mapping range coefficient, mean of squared pixel values and variance of squared pixel values, as follows:

The pixel weight mean of the sub-block is calculated first. The pixel weight under the perspective projection model is expressed as:

；

where the symbols denote, in order: the abscissa of the pixel in the image coordinate system; the ordinate of the pixel in the image coordinate system; the horizontal component of the projection focal length in the image coordinate system; the vertical component of the projection focal length in the image coordinate system; the horizontal component of the principal point in the image coordinate system; and the vertical component of the principal point in the image coordinate system;

the pixel weight under the ERP projection and orthographic projection models is defined separately.

The weight mean of the sub-block is expressed as:

；

where the symbol denotes the number of pixels contained in the sub-block.

The mean of the depth mapping range coefficients of the sub-block pixels is then calculated;

the depth mapping range coefficient of a pixel is defined as:

；

。

The Euclidean distance coefficient is then calculated; the Euclidean distance coefficient of the sub-block is defined by the following formula:

；

Step six: the normal vector coefficient of the sub-block is calculated by the least squares method, as follows.

The depth value corresponding to each pixel value is calculated as follows:

；

where the maximum pixel value depends on the bit depth as follows:

；
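As an illustration of the pixel-to-depth conversion in this step, the sketch below uses a mapping that is common for depth-map video, in which the sample value is linear in inverse depth between the nearest and farthest distances of the source view. Both this mapping and the maximum sample value of 2^bit_depth − 1 are assumptions made for the example; the patent's exact formulas are not reproduced in this text, and the function name `sample_to_depth` is hypothetical.

```python
def sample_to_depth(value, bit_depth, z_near, z_far):
    """Convert a geometry sample to a depth in the camera coordinate system.

    Assumes the sample is linear in 1/z: the maximum sample value maps to
    z_near and zero maps to z_far.
    """
    v_max = (1 << bit_depth) - 1          # assumed maximum sample value
    inv_near = 1.0 / z_near
    inv_far = 1.0 / z_far
    inv_z = inv_far + (value / v_max) * (inv_near - inv_far)
    return 1.0 / inv_z


# 10-bit samples with a hypothetical depth range of 0.5 m to 25 m.
for v in (0, 512, 1023):
    print(v, sample_to_depth(v, 10, 0.5, 25.0))
```

Under this mapping, equal steps in sample value correspond to much larger depth steps far from the camera than near it, which is one reason a plain pixel-domain error is a poor proxy for geometric error.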

The sub-block constant matrix is constructed according to the sub-block size:

；

The parameters of the approximate plane of the current sub-block are calculated by the least squares method:

；

where the three symbols are the coefficients of the plane equation to be determined, and the column vector formed by the depth values is defined as follows:

；

The normal vector coefficient of the sub-block is calculated from the obtained plane parameters and is defined as follows:

；

where the symbols denote the normal vector coefficient of the sub-block and the number of pixels contained in the sub-block;

Step seven: step five and step six are repeated until the sub-block index equals the total number of sub-blocks; the three-dimensional geometric distortion coefficient of the current CTU is then calculated from the Euclidean distance coefficients and normal vector coefficients of all its sub-blocks.

The definition of the three-dimensional geometric distortion coefficient is as follows:

；

where the symbols denote, in order: the number of sub-blocks into which the CU is divided, the sub-block size being determined by the minimum patch size in the MIV coding parameters; the Euclidean distance coefficient of each sub-block; and the normal vector coefficient of each sub-block.

Step eight: a new Lagrangian multiplier is calculated from the three-dimensional geometric distortion coefficient so as to improve the rate distortion performance between MIV rendering quality and depth map bit rate. The specific process is as follows:

the new Lagrangian multiplier is defined as:

；

where the symbols denote the Lagrangian multiplier of the original rate distortion cost computation and the three-dimensional geometric distortion coefficient. The improved rate distortion cost is calculated as:

；

and carrying out rate distortion optimization on the current coding unit according to the rate distortion cost.
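The link between weighting the distortion and changing the Lagrangian multiplier can be seen from a short identity. It is offered here as a sketch under an assumption: that the relation model is linear, so the three-dimensional geometric distortion equals a positive coefficient $k$ times the mean-square-error distortion. The symbols $k$, $D_{\mathrm{MSE}}$, $\lambda$ and $R$ are notation introduced for this illustration; the patent's own expression for the new multiplier is not reproduced in this text, so the $\lambda/k$ form below is an inference, not a quotation.

```latex
J_{3D} \;=\; k\,D_{\mathrm{MSE}} + \lambda R
       \;=\; k\left( D_{\mathrm{MSE}} + \frac{\lambda}{k}\,R \right),
\qquad k > 0
\;\;\Longrightarrow\;\;
\arg\min J_{3D} \;=\; \arg\min \left( D_{\mathrm{MSE}} + \lambda' R \right),
\quad \lambda' = \frac{\lambda}{k}.
```

Under this reading, a CTU whose pixel errors translate into large geometric errors (large $k$) receives a smaller effective multiplier, so the encoder spends more bits there, while the distortion measure inside the encoder can remain the ordinary pixel-domain one.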

Step nine: the next CTU is taken as the current CTU, and execution returns to step four and continues until all CTUs in the current frame have been encoded; step ten is then executed.

Step ten: the next frame to be encoded is taken as the current frame, and execution returns to step three and continues until all frames have been encoded.

To further illustrate the feasibility and effectiveness of the method of the invention, the following experiments were performed.

The method is implemented on the TMIV 14.1 test platform for the MIV coding standard and the VVenC 0.3.1 test platform for the VVC coding standard. The test mode is the MIV Anchor mode; the quantization parameters are those fixed by the standard test conditions, and the standard test sequences are used, as listed in Table 1.

TABLE 1 Experimental sequences

To demonstrate the improvement in immersive video coding performance provided by the method, it was compared with the original TMIV 14.1 and VVenC 0.3.1 platforms. In the experiment, two objective image quality metrics, immersive video peak signal-to-noise ratio (Immersive Video Peak Signal to Noise Ratio, IV-PSNR) and multi-scale structural similarity (Multi-Scale Structural Similarity, MS-SSIM), are used to evaluate the quality of the video sequences after decoding and rendering. Table 2 compares the coding results of the method with those of the original platform. BDBR in Table 2 denotes the change in bit rate of the method relative to the original platform at equal decoded rendering quality; the smaller the value, the more bit rate is saved. The total bit rate is the sum of the texture atlas bit rate, the geometry atlas bit rate and the metadata bit rate, whereas the depth map bit rate counts only the geometry (depth) atlas bit rate. As can be seen from the experimental data in Table 2, at equal IV-PSNR the method saves at most … of the total bit rate and … of the depth map bit rate, and on average … of the total bit rate and … of the depth map bit rate; at equal MS-SSIM it saves at most … of the total bit rate and … of the depth map bit rate, and on average … of the total bit rate and … of the depth map bit rate.
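BDBR (Bjøntegaard delta bit rate) is conventionally obtained by fitting log bit rate as a cubic polynomial of quality for each codec and averaging the gap between the two curves over their common quality range. The sketch below follows that convention for exactly four rate points per curve, where the cubic is the interpolating polynomial and Simpson's rule integrates it exactly. The function names and the sample figures are hypothetical; this is not the evaluation script used for Table 2.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_rate(anchor, test):
    """BD-rate in percent from four (bitrate, quality) points per curve.

    Negative values mean the test codec needs less bit rate than the
    anchor at equal quality.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects exactly four points per curve")
    qa = [q for _, q in anchor]
    la = [math.log(r) for r, _ in anchor]
    qt = [q for _, q in test]
    lt = [math.log(r) for r, _ in test]
    lo = max(min(qa), min(qt))
    hi = min(max(qa), max(qt))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    mid = 0.5 * (lo + hi)

    def mean_log_rate(qs, ls):
        # Simpson's rule is exact for cubics, so this is the exact mean
        # of the interpolating cubic over [lo, hi].
        return (_lagrange(qs, ls, lo) + 4.0 * _lagrange(qs, ls, mid)
                + _lagrange(qs, ls, hi)) / 6.0

    diff = mean_log_rate(qt, lt) - mean_log_rate(qa, la)
    return (math.exp(diff) - 1.0) * 100.0
```

As a sanity check, a test curve that reaches every quality level of the anchor with 10% fewer bits yields a BD-rate of about −10%.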

Table 2 experimental results

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. An MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion is characterized by comprising the following steps:

2. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 1, wherein in S1, calculating a depth map range comprises:

3. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 2, wherein the depth mapping range coefficients of each pixel of a patch in the corresponding camera coordinate system are calculated from the camera intrinsic parameters of the source view to which the patch belongs in the generated atlas, as follows:

；

4. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 1, wherein in S2, a relation model of three-dimensional geometric distortion and mean square error is constructed, which is represented as follows:

；

5. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 1, wherein in S3, the specific calculation method of the three-dimensional geometric distortion coefficient is as follows:

6. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 5, wherein S31 specifically comprises:

S311, calculating the sub-block weight mean;

the pixel weight under the perspective projection model is expressed as:

；

where the symbols denote, in order: the abscissa of the pixel in the image coordinate system; the ordinate of the pixel in the image coordinate system; the horizontal component of the projection focal length in the image coordinate system; the vertical component of the projection focal length in the image coordinate system; the horizontal component of the principal point in the image coordinate system; and the vertical component of the principal point in the image coordinate system;

the pixel weight under the ERP projection and orthographic projection models is defined separately;

the weight mean of the sub-block is expressed as:

；

where the symbol denotes the number of pixels contained in the sub-block;

the depth mapping range coefficient of a pixel is defined as:

；

S313, calculating the Euclidean distance coefficient;

the Euclidean distance coefficient of the sub-block is defined by the following formula:

；

where the symbols denote, in order: the mean of the squared pixel values of the sub-block; the variance of the squared pixel values of the sub-block; and two constant coefficients, each with the value 2.

7. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 5, wherein S32 specifically comprises:

；

S322, constructing the sub-block constant matrix according to the sub-block size, as follows:

；

the column vector formed by the depth values is defined as:

；

the approximate plane equation of the sub-block is defined as:

；

where the three symbols are the coefficients of the plane equation to be determined;

the plane equation is solved by the least squares method:

；

the normal vector coefficient is defined as:

；

where the symbol denotes the normal vector coefficient of the sub-block.

8. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 5, wherein in S33, the three-dimensional geometric distortion coefficient is calculated from the Euclidean distance coefficients and the normal vector coefficients as follows:

；

where the symbols denote, in order: the number of sub-blocks into which the CU is divided, the sub-block size being determined by the minimum patch size in the MIV coding parameters; the Euclidean distance coefficient of each sub-block; and the normal vector coefficient of each sub-block.

9. The MIV immersive video coding rate distortion optimization method based on three-dimensional geometric distortion according to claim 5, wherein in S4, the new Lagrangian multiplier is expressed as follows:

；