CN103220532B - Joint predictive coding method and system for stereoscopic video - Google Patents
Joint predictive coding method and system for stereoscopic video
- Publication number
- CN103220532B CN103220532B CN201310158699.XA CN201310158699A CN103220532B CN 103220532 B CN103220532 B CN 103220532B CN 201310158699 A CN201310158699 A CN 201310158699A CN 103220532 B CN103220532 B CN 103220532B
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention proposes a joint predictive coding method and system for stereoscopic video. The method includes: S1: inputting a stereoscopic video and dividing it into a plurality of coded macroblocks; S2: predicting the depth-predicted disparity of the current coded macroblock by a depth prediction method, and performing depth-assisted inter-view predictive coding on the current coded macroblock; S3: performing conventional inter-view predictive coding on the current macroblock; S4: performing temporal predictive coding on the current coded macroblock; S5: respectively calculating the rate-distortion performance of the current coded macroblock under the depth-assisted inter-view, conventional inter-view, and temporal predictive coding modes; S6: selecting the predictive coding mode with the optimal rate-distortion performance as the prediction mode of the current coded macroblock and coding. According to the method of the embodiments of the present invention, estimating the disparity of a coded macroblock from depth for inter-view compensated prediction reduces the bit rate required for disparity coding in stereoscopic video coding and improves the efficiency of stereoscopic video coding.
Description
Technical Field
The present invention relates to the field of video coding, and in particular, to a joint predictive coding method and system for a stereoscopic video.
Background
With the continuous development of video technology, stereoscopic video has gained wide attention for its vivid visual effect. In stereoscopic video, the video data is composed of a video sequence and a depth map sequence, where the video sequence typically comprises two or more views, and the depth map sequence includes a depth map corresponding to each video view. Therefore, in stereoscopic video applications, how to effectively compress and transmit the massive video and depth-map data becomes one of the key technical bottlenecks.
In order to achieve efficient compression of stereoscopic video data, researchers have proposed multi-view video coding schemes. In the scheme, one path of video in the multi-view video is used as a basic view, and the redundancy in a time domain is compressed by adopting a traditional video coding scheme. For videos of other viewpoints, the coding scheme introduces an inter-view prediction mode, and compresses time domains and inter-view redundancies of the multi-view video through time domain prediction and inter-view prediction, so that the code rate required by coding the multi-view video is effectively reduced. Since the depth map can be viewed as a multi-view grayscale video sequence, the multi-view video coding scheme is also used to code the depth map. In the current mainstream stereo video coding scheme, an encoder respectively compresses a multi-view video and a depth map by adopting the multi-view video coding scheme to obtain two paths of code streams of the video and the depth map, and simultaneously transmits the two paths of code streams to a decoding end to reconstruct a multi-view video and depth map sequence. The decoding end further draws the virtual viewpoint according to the user requirement, thereby forming a stereoscopic video sequence required by the user and playing the stereoscopic video sequence on a corresponding stereoscopic video display.
Although multi-view video coding can efficiently compress the temporal and inter-view redundancies of multi-view video and depth maps, the redundancy between the multi-view video and the depth maps cannot be efficiently compressed. In stereoscopic video, a depth map characterizes the depth information of corresponding points in the video sequence, so given the shooting conditions, the disparity of each coded macroblock can be predicted from its depth values. The depth map can thus be regarded as side information for multi-view video coding: the disparity can be calculated from the depth instead of being obtained by disparity search, which reduces the rate required to code the disparity and compresses the redundancy between the multi-view video and the depth map.
At present, there are two stereo video coding modes based on multi-view video and depth map joint coding. One is that the encoder renders a virtual reference frame according to a depth map corresponding to a current video frame to be encoded and a reference video frame thereof, thereby reducing redundant information existing in the depth map and disparity encoding. The other method is a prediction method for obtaining the correlation between the time domain motion information and the parallax information through the geometric constraint relation between the time domain motion information and the inter-view parallax information.
The disadvantages of the prior art include:
(1) additional codec frame buffering is required, which increases the spatial complexity of the encoder and decoder;
(2) the computational complexity is high, which increases the time complexity of the encoder and decoder.
Disclosure of Invention
The object of the present invention is to solve at least one of the technical drawbacks mentioned above.
To this end, an object of the present invention is to provide a method for joint predictive coding of stereoscopic video.
Another objective of the present invention is to provide a joint predictive coding system for stereoscopic video.
To achieve the above object, an embodiment of an aspect of the present invention provides a joint prediction encoding method for stereoscopic video, including the following steps: s1: inputting a stereoscopic video and dividing the stereoscopic video into a plurality of encoded macro blocks; s2: predicting the depth prediction parallax of the current coding macro block by a depth prediction method, and carrying out depth-assisted inter-view prediction coding on the current coding macro block according to the depth prediction parallax; s3: obtaining a disparity vector by an inter-view matching method, and performing traditional inter-view predictive coding on the current macroblock according to the disparity vector; s4: obtaining a motion vector by a time domain motion estimation method, and performing time domain predictive coding on the current coding macro block according to the motion vector; s5: respectively calculating the rate-distortion performance of the current coding macro block under the depth-assisted inter-view prediction coding mode, the traditional inter-view prediction coding mode and the time domain prediction coding mode; and S6: and selecting the predictive coding mode with the optimal rate distortion performance as the predictive mode of the current coding macro block and coding.
According to the method provided by the embodiment of the invention, the parallax of the coded macro block is estimated through the depth to perform inter-view compensation prediction, so that the code rate required by parallax coding in stereoscopic video coding is reduced, and the efficiency of stereoscopic video coding is improved.
In one embodiment of the invention, the method further comprises: S7: judging whether all the coded macroblocks have been coded; S8: if not, repeating steps S1-S5 for the macroblocks not yet coded until all macroblocks have been coded.
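The per-macroblock flow above (predict under each mode, compare rate-distortion costs, repeat until every macroblock is coded) can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation; all names are ours, and the three predictors are passed in as plain cost functions.

```python
def encode_stereo_frame(macroblocks, predictors):
    """Choose the best-RD prediction mode for each macroblock (S5-S8).

    macroblocks -- iterable of macroblocks from the division step (S1)
    predictors  -- dict: mode name -> function(mb) returning an RD cost (S2-S4)
    """
    chosen = []
    for mb in macroblocks:  # repeat until all macroblocks are coded (S7-S8)
        costs = {name: cost(mb) for name, cost in predictors.items()}  # S5
        chosen.append(min(costs, key=costs.get))  # S6: minimum-RD-cost mode
    return chosen

# Toy illustration with constant-cost stand-ins for the three predictors:
modes = encode_stereo_frame(
    [0, 1],
    {"temporal": lambda mb: 125.0,
     "conventional inter-view": lambda mb: 87.0,
     "depth-assisted inter-view": lambda mb: 81.5})
print(modes)  # ['depth-assisted inter-view', 'depth-assisted inter-view']
```

In a real encoder the cost functions would run motion estimation, inter-view matching, and depth-based disparity prediction respectively; here they only return fixed costs to show the selection logic.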
In one embodiment of the present invention, the rate-distortion performance of the temporal predictive coding is obtained by the formula
$$J_{motion}(\vec{mv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_m}(X + \vec{mv}) \right| + \lambda_{motion}\,(r_m + r_h),$$
wherein $\vec{mv}$ is the motion vector, $B_k$ is the current coded macroblock, $ref_m$ is the reference frame pointed to by $\vec{mv}$, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $I_{ref_m}(X + \vec{mv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $\lambda_{motion}$ is the Lagrange multiplier for temporal prediction, $r_m$ is the rate required to code the motion vector, and $r_h$ is the rate required to code the remaining macroblock header information.
In one embodiment of the present invention, the rate-distortion performance of the conventional inter-view predictive coding is obtained by the formula
$$J_{disp}(\vec{dv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_d}(X + \vec{dv}) \right| + \lambda_{motion}\,(r_d + r_h),$$
wherein $\vec{dv}$ is the disparity obtained by inter-view matching, $B_k$ is the current coded macroblock, $ref_d$ is the reference frame pointed to by $\vec{dv}$, $I_{ref_d}(X + \vec{dv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for conventional inter-view prediction, and $r_d$ is the rate required to code the searched disparity vector.
In one embodiment of the present invention, the rate-distortion performance of the depth-assisted inter-view predictive coding is obtained by the formula
$$J_{depth}(\vec{dv}_z) = \sum_{X \in B_k} \left| I(X) - I_{ref_z}(X + \vec{dv}_z) \right| + \lambda_{motion}\,r_h',$$
wherein $\vec{dv}_z$ is the disparity calculated from depth, $B_k$ is the current coded macroblock, $ref_z$ is the reference frame pointed to by $\vec{dv}_z$, $I_{ref_z}(X + \vec{dv}_z)$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for depth-assisted inter-view prediction, and $r_h'$ is the rate required to code the macroblock header information in the disparity-compensated prediction mode based on depth-predicted disparity.
In order to achieve the above object, another aspect of the present invention provides a system for joint predictive coding of stereoscopic video, including: a dividing module for inputting a stereoscopic video and dividing the stereoscopic video into a plurality of encoded macro blocks; the first prediction module is used for predicting the depth prediction parallax of the current coding macro block by a depth prediction method and carrying out depth-assisted inter-view prediction coding on the current coding macro block according to the depth prediction parallax; a second prediction module for performing conventional inter-view predictive coding on the current macroblock; a third prediction module, configured to perform time-domain prediction coding on the current coding macroblock; a calculation module, configured to calculate rate-distortion performance of the current coded macroblock in the depth-assisted inter-view prediction coding mode, the conventional inter-view prediction coding mode, and the temporal prediction coding mode, respectively; and the selection module is used for selecting the predictive coding mode with the optimal rate distortion performance as the predictive mode of the current coding macro block and coding the predictive coding mode.
According to the system provided by the embodiment of the invention, the parallax of the coded macro block is estimated through the depth to perform inter-view compensation prediction, so that the code rate required by parallax coding in stereoscopic video coding is reduced, and the efficiency of stereoscopic video coding is improved.
In one embodiment of the invention, the system further comprises: the judging module is used for judging whether all the coding macro blocks are coded; and the processing module is used for repeatedly using the dividing module, the first prediction module, the second prediction module, the third prediction module, the calculation module and the selection module until all the coding macro blocks are coded when the coding is not finished.
In one embodiment of the present invention, the rate-distortion performance of the temporal predictive coding is obtained by the formula
$$J_{motion}(\vec{mv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_m}(X + \vec{mv}) \right| + \lambda_{motion}\,(r_m + r_h),$$
wherein $\vec{mv}$ is the motion vector, $B_k$ is the current coded macroblock, $ref_m$ is the reference frame pointed to by $\vec{mv}$, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $I_{ref_m}(X + \vec{mv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $\lambda_{motion}$ is the Lagrange multiplier for temporal prediction, $r_m$ is the rate required to code the motion vector, and $r_h$ is the rate required to code the remaining macroblock header information.
In one embodiment of the present invention, the rate-distortion performance of the conventional inter-view predictive coding is obtained by the formula
$$J_{disp}(\vec{dv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_d}(X + \vec{dv}) \right| + \lambda_{motion}\,(r_d + r_h),$$
wherein $\vec{dv}$ is the disparity obtained by inter-view matching, $B_k$ is the current coded macroblock, $ref_d$ is the reference frame pointed to by $\vec{dv}$, $I_{ref_d}(X + \vec{dv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for conventional inter-view prediction, and $r_d$ is the rate required to code the searched disparity vector.
In one embodiment of the present invention, the rate-distortion performance of the depth-assisted inter-view predictive coding is obtained by the formula
$$J_{depth}(\vec{dv}_z) = \sum_{X \in B_k} \left| I(X) - I_{ref_z}(X + \vec{dv}_z) \right| + \lambda_{motion}\,r_h',$$
wherein $\vec{dv}_z$ is the disparity calculated from depth, $B_k$ is the current coded macroblock, $ref_z$ is the reference frame pointed to by $\vec{dv}_z$, $I_{ref_z}(X + \vec{dv}_z)$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for depth-assisted inter-view prediction, and $r_h'$ is the rate required to code the macroblock header information in the disparity-compensated prediction mode based on depth-predicted disparity.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a method for joint predictive coding of stereoscopic video according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of virtual viewpoint rendering according to one embodiment of the present invention;
FIG. 3 is a diagram of a coding prediction structure according to an embodiment of the present invention; and
fig. 4 is a block diagram illustrating a structure of a system for joint predictive coding of stereoscopic video according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Fig. 1 is a flowchart illustrating a method for joint predictive coding of stereoscopic video according to an embodiment of the present invention. As shown in fig. 1, the method for joint predictive coding of stereoscopic video according to an embodiment of the present invention includes the following steps:
step S101 inputs a stereoscopic video and divides the stereoscopic video into a plurality of encoded macroblocks.
Specifically, a stereoscopic video is input and subjected to preprocessing such as correction and alignment, and the processed stereoscopic video is divided into a plurality of coding macro blocks.
Step S102, predicting the depth prediction parallax of the current coding macro block by a depth prediction method, and carrying out depth-assisted inter-view prediction coding on the current coding macro block according to the depth prediction parallax.
Specifically, it is assumed that the stereoscopic video sequence contains only the videos of the left and right viewpoints and their depth map sequences, that the baseline distance between the left and right viewpoints is $c$, and that the cameras of both viewpoints have focal length $f$. Let the current coded macroblock be $B_k$, containing $n_j$ pixels whose corresponding depth values are $z_k^{(j)}$, $j = 1, \dots, n_j$. The depth-predicted disparity of the current coded macroblock $B_k$ is predicted from these per-pixel depth values: the depth value of $B_k$ is taken as the maximum-likelihood value $z_k$ of the depth values of all the pixels it contains, which can be expressed as
$$z_k = \mathop{\arg\max}_{z}\ \#\left\{\, j : z_k^{(j)} = z \,\right\},$$
i.e., the most frequently occurring of the per-pixel depth values $z_k^{(j)}$.
Fig. 2 is a schematic diagram of virtual viewpoint rendering according to an embodiment of the present invention. As shown in fig. 2, once the depth value corresponding to $B_k$ is obtained, the disparity of the current coded macroblock can be calculated through the mapping between depth and disparity. The predicted disparity of the current coded macroblock can be expressed as
$$d_k = \frac{f \cdot c}{z_k},$$
where $d_k$ is the calculated disparity, $f$ is the focal length, and $c$ is the baseline distance between the left and right viewpoints. For coding modes of quarter-pixel precision, $d_k$ is rounded to the nearest quarter-pixel position and used as the depth-predicted disparity of the current coded macroblock.
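The depth-prediction step above can be sketched as follows, assuming the maximum-likelihood depth value is taken as the most frequent per-pixel depth in the block; function and parameter names are illustrative, not from the patent.

```python
from collections import Counter

def depth_predicted_disparity(depth_values, f, c):
    """Predict a block's disparity from its per-pixel depth values.

    depth_values -- per-pixel depth samples z_k^(j) of block B_k
    f            -- camera focal length (identical for both views)
    c            -- baseline distance between the left and right viewpoints
    """
    # Maximum-likelihood depth z_k: the most frequent depth value in the block.
    z_k = Counter(depth_values).most_common(1)[0][0]
    # Depth-to-disparity mapping: d_k = f * c / z_k.
    d_k = f * c / z_k
    # Round to the nearest quarter-pixel for quarter-pel coding modes.
    return round(d_k * 4) / 4

# With the document's camera parameters (f = 100, c = 10), a block whose
# dominant depth value is 80 maps to a disparity of 1000 / 80 = 12.5.
print(depth_predicted_disparity([80, 80, 80, 79, 81], 100, 10))  # 12.5
```

The depth value 80 here is a made-up illustration; in the patent's example the same mapping (with the macroblock's actual depth matrix) yields the predicted disparity 16.25.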
Step S103, obtaining a disparity vector by an inter-view matching method, and performing conventional inter-view predictive coding on the current macroblock according to the disparity vector.
And step S104, obtaining a motion vector by a time domain motion estimation method, and performing time domain predictive coding on the current coding macro block according to the motion vector.
Step S105, respectively calculating the rate-distortion performance of the current coded macroblock under the depth-assisted inter-view prediction, conventional inter-view prediction, and temporal prediction coding modes.
Specifically, the encoder calculates the rate-distortion performance under the different prediction modes. Let the current coded macroblock $B_k$ have motion vector $\vec{mv}$, searched disparity $\vec{dv}$, and depth-predicted disparity $\vec{dv}_z$.
The rate-distortion performance of the temporal predictive coding of the current macroblock is obtained by the formula
$$J_{motion}(\vec{mv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_m}(X + \vec{mv}) \right| + \lambda_{motion}\,(r_m + r_h),$$
wherein $\vec{mv}$ is the motion vector, $B_k$ is the current coded macroblock, $ref_m$ is the reference frame pointed to by $\vec{mv}$, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $I_{ref_m}(X + \vec{mv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $\lambda_{motion}$ is the Lagrange multiplier for temporal prediction, $r_m$ is the rate required to code the motion vector, and $r_h$ is the rate required to code the remaining macroblock header information.
The rate-distortion performance of the conventional inter-view predictive coding of the current macroblock with the searched disparity is obtained by the formula
$$J_{disp}(\vec{dv}) = \sum_{X \in B_k} \left| I(X) - I_{ref_d}(X + \vec{dv}) \right| + \lambda_{motion}\,(r_d + r_h),$$
wherein $\vec{dv}$ is the disparity obtained by inter-view matching, $B_k$ is the current coded macroblock, $ref_d$ is the reference frame pointed to by $\vec{dv}$, $I_{ref_d}(X + \vec{dv})$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for conventional inter-view prediction, and $r_d$ is the rate required to code the searched disparity vector.
In stereoscopic video, depth information may be considered side information for video coding. We can therefore assume that the encoder and the decoder obtain the same reconstructed depth map, so the depth-predicted disparity does not need to be coded into the bitstream. The rate-distortion performance of depth-assisted inter-view predictive coding of the current macroblock with the depth-predicted disparity can thus be expressed as
$$J_{depth}(\vec{dv}_z) = \sum_{X \in B_k} \left| I(X) - I_{ref_z}(X + \vec{dv}_z) \right| + \lambda_{motion}\,r_h',$$
wherein $\vec{dv}_z$ is the disparity calculated from depth, $B_k$ is the current coded macroblock, $ref_z$ is the reference frame pointed to by $\vec{dv}_z$, $I_{ref_z}(X + \vec{dv}_z)$ is the luminance or chrominance component value of the corresponding pixel in the reference frame, $X$ is a pixel of $B_k$, $I(X)$ is the luminance or chrominance component value corresponding to $X$, $\lambda_{motion}$ is the Lagrange multiplier for depth-assisted inter-view prediction, and $r_h'$ is the rate required to code the macroblock header information in the disparity-compensated prediction mode based on depth-predicted disparity.
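The three cost expressions can be written out as simple helper functions (an illustrative sketch with our own names); note that the depth-assisted mode charges only the header rate r_h', since the decoder re-derives the disparity from the shared reconstructed depth map.

```python
def j_motion(sad, lam, r_m, r_h):
    # Temporal prediction: residual distortion plus motion-vector and header bits.
    return sad + lam * (r_m + r_h)

def j_disp(sad, lam, r_d, r_h):
    # Conventional inter-view prediction: the searched disparity vector must be coded.
    return sad + lam * (r_d + r_h)

def j_depth(sad, lam, r_h_z):
    # Depth-assisted inter-view prediction: the disparity is derivable at the
    # decoder from the reconstructed depth map, so only header bits are charged.
    return sad + lam * r_h_z

# With the worked example's numbers (lambda = 1.5, r_m = 10, r_d = 8,
# r_h = 20, r_h' = 21, and residual sums 80, 45, 50):
print(j_motion(80, 1.5, 10, 20),   # 125.0
      j_disp(45, 1.5, 8, 20),      # 87.0
      j_depth(50, 1.5, 21))        # 81.5
```

Here `sad` stands for the sum of absolute differences (the distortion term of the formulas above); in a real encoder it would be computed from the prediction residual of the macroblock.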
Step S106, selecting the prediction mode with the minimum rate-distortion cost as the prediction mode of the current coded macroblock, and coding.
Specifically, the encoder selects the prediction mode with the best rate-distortion performance as the prediction mode of the current coded macroblock. The selection process can be expressed as
$$mode^{*} = \arg\min\left\{ J_{motion}(\vec{mv}),\ J_{disp}(\vec{dv}),\ J_{depth}(\vec{dv}_z) \right\},$$
wherein $J_{motion}(\vec{mv})$, $J_{disp}(\vec{dv})$ and $J_{depth}(\vec{dv}_z)$ respectively denote the rate-distortion performance of temporal prediction, of conventional inter-view prediction, and of depth-assisted inter-view prediction.
In one embodiment of the invention, the video sequence encoded as stereoscopic video is the standard test sequence named "Book Arrival", with a resolution of 1024 × 768 pixels. The encoder uses JMVC (Joint Multiview Video Coding), the reference software of the multi-view extension (MVC) of the H.264/AVC standard. The GOP (Group of Pictures) length of the encoder is 8 frames, and the temporal predictive coding adopts the hierarchical-B prediction structure (hierarchical bi-directionally predicted frames). Fig. 3 is a schematic diagram of the coding prediction structure according to an embodiment of the present invention. As shown in fig. 3, virtual viewpoint rendering uses the two color videos and depth maps adjacent to the virtual viewpoint. In this embodiment, two views of the "Book Arrival" sequence, viewpoint 10 and viewpoint 8, serve as the multi-view video input, where viewpoint 10 is called the left reference viewpoint and viewpoint 8 the right reference viewpoint. The quantization parameter QP for coding the multi-view video and the multi-view depth maps is an integer in the range 0 to 51. The baseline distance between the left and right viewpoints is 10 and the focal length of the cameras is 100.
Let the current coded macroblock $B_k$ be an 8 × 8 macroblock in a frame of the viewpoint-8 video of the "Book Arrival" sequence; its corresponding depth values are shown in the 8 × 8 matrix below.
The encoder then compares the rate-distortion performance of the different inter prediction modes for the macroblock. Let the number of bits required to code the motion vector of the current macroblock $B_k$ be $r_m = 10$, the number of bits required to code the disparity obtained by block-matching search be $r_d = 8$, and the number of bits required to code the header information be $r_h = 20$. In the inter-view prediction based on depth-predicted disparity, coding the header information of $B_k$ requires $r_h' = 21$ bits: one additional bit identifies that the current macroblock uses inter-view prediction with the depth-predicted disparity. In the rate-distortion optimization, the Lagrange multiplier $\lambda_{motion}$ is set to 1.5.
Thus, for macroblock $B_k$, the rate-distortion performance of temporal prediction is
$$J_{motion} = 80 + 1.5 \times (10 + 20) = 125,$$
the rate-distortion performance of conventional inter-view prediction of $B_k$ is
$$J_{disp} = 45 + 1.5 \times (8 + 20) = 87,$$
and, when predictively coded with the depth-predicted disparity, the rate-distortion performance of the depth-assisted inter-view predictive coding of $B_k$ is
$$J_{depth} = 50 + 1.5 \times 21 = 81.5.$$
then, the encoder selects the optimal inter-prediction encoding mode by comparing the rate-distortion performance in different prediction modes. For the current macroblock Bk, Therefore, the optimal inter-prediction coding mode is depth-assisted inter-prediction coding. After the optimal inter prediction mode is obtained, the encoder will perform a second time rate distortion optimization selection. The encoder further compares the rate distortion performance of the inter-frame prediction mode and the intra-frame prediction mode, and finally selects the mode with the optimal rate distortion to encode the current macro block.
According to the method provided by the embodiment of the invention, the parallax of the coded macro block is estimated through the depth to perform inter-view compensation prediction, so that the code rate required by parallax coding in stereoscopic video coding is reduced, and the efficiency of stereoscopic video coding is improved.
Fig. 4 is a block diagram illustrating a structure of a system for joint predictive coding of stereoscopic video according to an embodiment of the present invention. As shown in fig. 4, the joint predictive coding system for stereoscopic video includes a partitioning module 100, a first prediction module 200, a second prediction module 300, a third prediction module 400, a calculation module 500, and a selection module 600.
The partitioning module 100 is used to input a stereoscopic video and partition the stereoscopic video into a plurality of encoded macroblocks.
Specifically, a stereoscopic video is input and subjected to preprocessing such as correction and alignment, and the processed stereoscopic video is divided into a plurality of coding macro blocks.
The first prediction module 200 is configured to predict a depth prediction disparity of a current coded macroblock through a depth prediction method, and perform depth-assisted inter-view prediction coding on the current coded macroblock according to the depth prediction disparity.
Specifically, it is assumed that the stereoscopic video sequence contains only the videos of the left and right viewpoints and their depth map sequences, that the baseline distance between the left and right viewpoints is $c$, and that the cameras of both viewpoints have focal length $f$. Let the current coded macroblock be $B_k$, containing $n_j$ pixels whose corresponding depth values are $z_k^{(j)}$, $j = 1, \dots, n_j$. The depth-predicted disparity of the current coded macroblock $B_k$ is predicted from these per-pixel depth values: the depth value of $B_k$ is taken as the maximum-likelihood value $z_k$ of the depth values of all the pixels it contains, which can be expressed as
$$z_k = \mathop{\arg\max}_{z}\ \#\left\{\, j : z_k^{(j)} = z \,\right\},$$
i.e., the most frequently occurring of the per-pixel depth values $z_k^{(j)}$.
Fig. 2 is a schematic diagram of virtual viewpoint rendering according to an embodiment of the present invention. As shown in fig. 2, once the depth value corresponding to $B_k$ is obtained, the disparity of the current coded macroblock can be calculated through the mapping between depth and disparity. The predicted disparity of the current coded macroblock can be expressed as
$$d_k = \frac{f \cdot c}{z_k},$$
where $d_k$ is the calculated disparity, $f$ is the focal length, and $c$ is the baseline distance between the left and right viewpoints. For coding modes of quarter-pixel precision, $d_k$ is rounded to the nearest quarter-pixel position and used as the depth-predicted disparity of the current coded macroblock.
The second prediction module 300 is configured to obtain a disparity vector through an inter-view matching method, and perform conventional inter-view prediction coding on a current macroblock according to the disparity vector.
The third prediction module 400 is configured to obtain a motion vector through a temporal motion estimation method, and perform temporal prediction coding on the current coded macroblock according to the motion vector.
The calculation module 500 is configured to calculate the rate-distortion performance of the current coded macroblock under the depth-assisted inter-view prediction, conventional inter-view prediction, and temporal prediction coding modes.
Specifically, the encoder calculates the rate-distortion performance under the different prediction modes. Let the current coded macroblock $B_k$ have motion vector $\vec{mv}$, searched disparity $\vec{dv}$, and depth-predicted disparity $\vec{dv}_z$.
The rate-distortion performance of motion compensated prediction of the current macroblock can be expressed as, the rate-distortion performance of motion compensated prediction is obtained by the following formula, wherein,is a motion vector, BkFor the current coded macroblock, refmIs composed ofPointed to as reference frame, X is BkI is the luminance or chrominance component value corresponding to X,is composed ofThe value of the luminance or chrominance component, λ, of the corresponding pixel in the pointed reference framemotionLagrange multiplier, r, for time domain predictionmThe coding rate, r, required for coding motion vectorshThe code rate required for encoding other macroblock header information besides the motion vector.
The rate-distortion performance of the search disparity compensated prediction of the current macroblock is obtained by the following formula, wherein,to match the resulting parallaxes between views, BkFor the current coded macroblock, refdIs composed ofThe reference frame that is pointed to is,is composed ofThe brightness or chroma component value of the corresponding pixel point in the pointed reference frame, X is BkI is the luminance or chrominance component value corresponding to X, λmotionLagrange multiplier, r, for conventional inter-view predictiondThe coding rate required to search for the disparity vector for coding.
In stereoscopic video, depth information may be regarded as side information for video coding, so the encoder and decoder can be assumed to obtain the same reconstructed depth map. The depth-predicted disparity therefore need not be coded into the bitstream, and the rate-distortion cost of disparity-compensated prediction of the current macroblock using the depth-predicted disparity can be expressed as:

J_z = Σ_{X∈B_k} |I(X) − I_ref_z(X + dz(X))| + λ_motion·r_h′

wherein dz(X) is the depth-calculated disparity at pixel X, B_k is the current coded macroblock, ref_z is the reference frame pointed to by dz(X), I_ref_z(X + dz(X)) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, λ_motion is the Lagrange multiplier for depth-assisted inter-view prediction, and r_h′ is the code rate required to code the macroblock header information in the disparity-compensated prediction mode based on the depth-predicted disparity.
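The three cost formulas above can be sketched as follows. This is an illustrative sketch, not part of the patent: the function and parameter names (sad, lambda_motion, r_m, r_d, r_h, r_h_prime) are assumptions chosen to mirror the symbols in the text, and the SAD term stands for the summed absolute residuals Σ|I(X) − I_ref(·)|.

```python
# Illustrative sketch of the three rate-distortion costs described above.
# "sad" is the sum of absolute prediction residuals over the macroblock.

def rd_cost_temporal(sad, lambda_motion, r_m, r_h):
    """J_m = SAD + lambda_motion * (r_m + r_h): motion-vector bits plus header bits."""
    return sad + lambda_motion * (r_m + r_h)

def rd_cost_interview(sad, lambda_motion, r_d, r_h):
    """J_d = SAD + lambda_motion * (r_d + r_h): searched-disparity bits plus header bits."""
    return sad + lambda_motion * (r_d + r_h)

def rd_cost_depth_assisted(sad, lambda_motion, r_h_prime):
    """J_z = SAD + lambda_motion * r_h_prime: the depth-predicted disparity is
    derived identically at the decoder, so no disparity bits are spent."""
    return sad + lambda_motion * r_h_prime
```

Note how the depth-assisted cost carries no disparity-rate term, which is the source of the bit savings claimed by the invention.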
The selection module 600 is configured to select the prediction mode with the minimum rate-distortion cost as the prediction mode of the current coded macroblock and to perform encoding accordingly.
Specifically, the encoder selects the prediction mode with the best rate-distortion performance as the prediction mode for the current coded macroblock. The selection process can be expressed as

J* = min(J_m, J_d, J_z)

wherein J_m, J_d, and J_z respectively denote the rate-distortion cost of temporal prediction, of conventional inter-view prediction, and of depth-assisted inter-view prediction.
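The mode decision above can be sketched as a minimum-cost selection. The mode labels below are illustrative, not the patent's terminology:

```python
# Hedged sketch of the encoder's mode decision: pick the prediction mode
# whose rate-distortion cost J is smallest.
def select_mode(j_temporal, j_interview, j_depth):
    costs = {"temporal": j_temporal,
             "inter-view": j_interview,
             "depth-assisted": j_depth}
    return min(costs, key=costs.get)
```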
In one embodiment of the present invention, assume the current coded macroblock B_k is an 8 × 8 macroblock in a frame of view 8 of the "Book Arrival" sequence; its corresponding depth values are shown in the 8 × 8 matrix below.
For coding modes of quarter-pixel precision, d_k is rounded to the nearest quarter-pixel, giving the quantized disparity d_k′ = [d_k] = 16.25. The encoder then performs inter-view prediction based on this predicted disparity: for the current coded macroblock, the prediction disparity is 16.25, and the encoder finds the corresponding reference macroblock in the corresponding frame of view 10 for prediction. The sum of the absolute values of the prediction residuals is assumed to be 50. In addition, the encoder also performs the competing predictions, namely temporal prediction and conventional inter-view prediction, on the current macroblock. For temporal prediction, let the motion vector of the current macroblock be 32 and the sum of absolute residual values be 80. For conventional inter-view prediction, assume the disparity obtained by the encoder through block-matching search is 16 and the sum of absolute residual values is 45.
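The quarter-pixel quantization step above can be sketched as rounding to the nearest multiple of 0.25. This is an illustrative sketch under that assumption, not the patent's reference implementation:

```python
# Quantize a depth-predicted disparity to quarter-pixel precision by
# rounding to the nearest multiple of 0.25 (so a raw disparity near
# 16.25 quantizes to exactly 16.25, as in the example above).
def quantize_quarter_pel(d):
    return round(d * 4) / 4.0
```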
In one embodiment of the present invention, the calculation module 500 calculates the rate-distortion cost of each predictive coding mode. For macroblock B_k, the rate-distortion cost of temporal prediction is J_m = 80 + λ_motion·(r_m + r_h).
B_k's rate-distortion cost of conventional inter-view prediction is J_d = 45 + λ_motion·(r_d + r_h).
When predictive coding with the depth-predicted disparity, B_k's rate-distortion cost of depth-assisted inter-view prediction coding is J_z = 50 + λ_motion·r_h′.
The selection module 600 compares, via the encoder, the rate-distortion costs of the different prediction modes and selects the optimal predictive coding mode. For the current macroblock B_k, J_z is the smallest of the three costs; therefore, the optimal inter-frame predictive coding mode is depth-assisted inter-view prediction coding. After the optimal inter-frame prediction mode is obtained, the encoder performs a second rate-distortion-optimized selection: it further compares the rate-distortion cost of the inter-prediction mode with that of the intra-prediction mode, and finally selects the mode with the optimal rate-distortion to encode the current macroblock.
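The comparison can be worked through numerically using the residual sums given in the embodiment (SAD of 80 for temporal prediction, 45 for conventional inter-view prediction, 50 for depth-assisted prediction). The Lagrange multiplier and bit costs below are assumed for illustration only; the patent does not specify them in this excerpt:

```python
# Worked comparison of the three costs using the example's residual sums.
# lambda_motion and the bit counts are ASSUMED values for illustration.
lambda_motion = 4.0
r_m, r_d, r_h, r_h_prime = 6, 6, 4, 4   # assumed bit costs

j_temporal = 80 + lambda_motion * (r_m + r_h)   # temporal prediction
j_interview = 45 + lambda_motion * (r_d + r_h)  # conventional inter-view
j_depth = 50 + lambda_motion * r_h_prime        # depth-assisted (no disparity bits)

# Even though its SAD (50) exceeds the searched-disparity SAD (45),
# depth-assisted prediction can win because it spends no disparity bits.
```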
According to the system provided by the embodiment of the invention, the disparity of the coded macroblock is estimated from depth to perform inter-view compensated prediction, which reduces the code rate required for disparity coding in stereoscopic video coding and improves the efficiency of stereoscopic video coding.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (8)
1. A joint predictive coding method for stereoscopic video, comprising the steps of:
s0: coding a depth map sequence by adopting a multi-view video coding method, and inputting depth map information obtained by decoding as side information of three-dimensional video color image coding;
s1: inputting a stereoscopic video color image sequence and dividing each frame image in the stereoscopic video color image sequence into a plurality of coding macro blocks;
s2: predicting the depth prediction parallax of the current coding macro block by a depth prediction method, and carrying out depth-assisted inter-view prediction coding on the current coding macro block according to the depth prediction parallax;
s3: obtaining a disparity vector of the current coding macro block by an inter-view matching method, and carrying out traditional inter-view predictive coding on the current macro block according to the disparity vector;
s4: obtaining a motion vector of the current coding macro block by a time domain motion estimation method, and performing time domain predictive coding on the current coding macro block according to the motion vector;
s5: respectively calculating the rate-distortion performance of the current coding macro block under the depth-assisted inter-view prediction coding mode, the traditional inter-view prediction coding mode and the time domain prediction coding mode;
s6: selecting a predictive coding mode with optimal rate distortion performance as a predictive mode of a current coding macro block and coding;
s7: judging whether all the coding macro blocks are coded;
s8: if not, repeating the steps S1-S5 for the non-coded macroblocks until all the coded macroblocks are coded;
the depth-assisted inter-view prediction coding specifically includes calculating the depth prediction disparity for each pixel point in the current coding macroblock, performing quarter-pixel precision quantization on the depth prediction disparity vector of the current coding macroblock, and performing pixel-by-pixel prediction coding according to the depth prediction disparity to obtain a prediction coding residual.
2. The joint predictive coding method of stereoscopic video according to claim 1, wherein the rate-distortion performance of the temporal predictive coding is obtained by the following formula,
wherein J_m = Σ_{X∈B_k} |I(X) − I_ref_m(X + mv_k)| + λ_motion·(r_m + r_h), where mv_k is the motion vector, B_k is the current coded macroblock, ref_m is the reference frame pointed to by mv_k, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, I_ref_m(X + mv_k) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, λ_motion is the Lagrange multiplier for motion-compensated prediction, r_m is the code rate required to code the motion vector, and r_h is the code rate required to code the remaining macroblock header information.
3. The joint predictive coding method of stereoscopic video according to claim 1, wherein the rate-distortion performance of the conventional inter-view predictive coding is obtained by the following formula,
wherein J_d = Σ_{X∈B_k} |I(X) − I_ref_d(X + dv_k)| + λ_motion·(r_d + r_h), where dv_k is the disparity obtained by inter-view matching, B_k is the current coded macroblock, ref_d is the reference frame pointed to by dv_k, I_ref_d(X + dv_k) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, λ_motion is the Lagrange multiplier for motion-compensated prediction, and r_d is the code rate required to code the searched disparity vector.
4. The joint predictive coding method of stereoscopic video according to claim 1, wherein the rate-distortion performance of the depth-assisted inter-view predictive coding is obtained by the following formula,
wherein J_z = Σ_{X∈B_k} |I(X) − I_ref_z(X + dz(X))| + λ_motion·r_h′, where B_k is the current coded macroblock, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, dz(X) is the quantized depth-calculated disparity vector corresponding to pixel X, ref_z is the reference frame pointed to by dz(X), I_ref_z(X + dz(X)) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, λ_motion is the Lagrange multiplier for motion-compensated prediction, and r_h′ is the code rate required to code the macroblock header information in the disparity-compensated prediction mode based on the depth-predicted disparity.
5. A system for joint predictive coding of stereoscopic video, comprising:
a depth map coding and decoding module for coding and decoding a depth map sequence;
a dividing module for inputting a stereoscopic video and dividing the stereoscopic video into a plurality of encoded macro blocks;
the first prediction module is used for predicting the depth prediction parallax of each pixel in the current coding macro block by a depth prediction method and carrying out depth-assisted inter-view prediction coding on the current coding macro block according to the depth prediction parallax;
the second prediction module is used for obtaining the disparity vector of the current coding module by an inter-view matching method and carrying out traditional inter-view prediction coding on the current macro block according to the disparity vector;
the third prediction module is used for obtaining the motion vector of the current coding module by a time domain motion estimation method and carrying out time domain prediction coding on the current coding macro block according to the motion vector;
a calculation module, configured to calculate rate-distortion performance of the current coded macroblock in the depth-assisted inter-view prediction coding mode, the conventional inter-view prediction coding mode, and the temporal prediction coding mode, respectively;
the selection module is used for selecting the predictive coding mode with the optimal rate distortion performance as the predictive mode of the current coding macro block and coding the predictive coding mode;
the judging module is used for judging whether all the coding macro blocks are coded; and
the processing module is used for repeatedly using the dividing module, the first prediction module, the second prediction module, the third prediction module, the calculation module and the selection module until all the coding macro blocks are coded when the coding is not finished;
the first prediction module calculates the depth prediction parallax for each pixel point in the current coding macro block, quantizes the depth prediction parallax vector of the current coding macro block with quarter-pixel precision, and performs pixel-by-pixel prediction coding according to the depth prediction parallax to obtain a prediction coding residual.
6. The joint predictive coding system for stereoscopic video according to claim 5, wherein the rate-distortion performance of the temporal predictive coding is obtained by the following formula,
wherein J_m = Σ_{X∈B_k} |I(X) − I_ref_m(X + mv_k)| + λ_motion·(r_m + r_h), where mv_k is the motion vector, B_k is the current coded macroblock, ref_m is the reference frame pointed to by mv_k, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, I_ref_m(X + mv_k) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, λ_motion is the Lagrange multiplier for motion-compensated prediction, r_m is the code rate required to code the motion vector, and r_h is the code rate required to code the remaining macroblock header information.
7. The joint predictive coding system for stereoscopic video according to claim 5, wherein the rate-distortion performance of the conventional inter-view predictive coding is obtained by the following formula,
wherein J_d = Σ_{X∈B_k} |I(X) − I_ref_d(X + dv_k)| + λ_motion·(r_d + r_h), where dv_k is the disparity obtained by stereo matching, ref_d is the reference frame pointed to by dv_k, I_ref_d(X + dv_k) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, X is a pixel in B_k, B_k is the current coded macroblock, I(X) is the luminance or chrominance component value at X, λ_motion is the Lagrange multiplier for motion-compensated prediction, and r_d is the code rate required to code the searched disparity vector.
8. The joint predictive coding system for stereoscopic video according to claim 5, wherein the rate-distortion performance of the depth-assisted inter-view predictive coding is obtained by the following formula,
wherein J_z = Σ_{X∈B_k} |I(X) − I_ref_z(X + dz(X))| + λ_motion·r_h′, where B_k is the current coded macroblock, X is a pixel in B_k, I(X) is the luminance or chrominance component value at X, dz(X) is the quantized depth-predicted disparity vector corresponding to pixel X, ref_z is the reference frame pointed to by dz(X), I_ref_z(X + dz(X)) is the luminance or chrominance component value of the corresponding pixel in the pointed-to reference frame, λ_motion is the Lagrange multiplier for motion-compensated prediction, and r_h′ is the code rate required to code the macroblock header information in the disparity-compensated prediction mode based on the depth-predicted disparity.
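The pixel-by-pixel depth-assisted prediction recited in claims 1 and 5 can be sketched as follows. This is an illustrative sketch only: the linear depth-to-disparity model (scale, offset) and all array shapes are assumptions; real systems derive the disparity from camera parameters.

```python
# Hedged sketch of pixel-wise depth-assisted disparity prediction: for each
# pixel, derive a disparity from the reconstructed depth, quantize it to
# quarter-pixel precision, and predict from the inter-view reference frame.

def depth_assisted_residual(block, ref_row, xs, depth, scale, offset):
    """Return per-pixel prediction residuals for one row of a macroblock.

    block   : current-pixel values
    ref_row : reference-frame pixel values (same row, other view)
    xs      : horizontal positions of the block pixels in ref_row coordinates
    depth   : per-pixel reconstructed depth values
    scale, offset : ASSUMED linear depth-to-disparity model d = scale*z + offset
    """
    residuals = []
    for val, x, z in zip(block, xs, depth):
        d = round((scale * z + offset) * 4) / 4.0   # quarter-pel quantized disparity
        xr = int(round(x + d))                      # nearest reference sample
        xr = max(0, min(len(ref_row) - 1, xr))      # clamp to frame bounds
        residuals.append(val - ref_row[xr])
    return residuals
```

A perfectly matching reference row yields all-zero residuals; in practice the residuals are what the encoder transforms and codes.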
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310158699.XA CN103220532B (en) | 2013-05-02 | 2013-05-02 | The associated prediction coded method of three-dimensional video-frequency and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103220532A CN103220532A (en) | 2013-07-24 |
CN103220532B true CN103220532B (en) | 2016-08-10 |
Family
ID=48817935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310158699.XA Active CN103220532B (en) | 2013-05-02 | 2013-05-02 | The associated prediction coded method of three-dimensional video-frequency and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103220532B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103763557B (en) * | 2014-01-03 | 2017-06-27 | 华为技术有限公司 | A kind of Do NBDV acquisition methods and video decoder |
CN104125469B (en) * | 2014-07-10 | 2017-06-06 | 中山大学 | A kind of fast encoding method for HEVC |
CN106303547B (en) * | 2015-06-08 | 2019-01-01 | 中国科学院深圳先进技术研究院 | 3 d video encoding method and apparatus |
CN108235018B (en) * | 2017-12-13 | 2019-12-27 | 北京大学 | Point cloud intra-frame coding optimization method and device based on Lagrange multiplier model |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101222639B (en) * | 2007-01-09 | 2010-04-21 | 华为技术有限公司 | Inter-view prediction method, encoder and decoder of multi-viewpoint video technology |
CN101170702B (en) * | 2007-11-23 | 2010-08-11 | 四川虹微技术有限公司 | Multi-view video coding method |
CN101754042B (en) * | 2008-10-30 | 2012-07-11 | 华为终端有限公司 | Image reconstruction method and image reconstruction system |
CN102238391B (en) * | 2011-05-25 | 2016-12-07 | 深圳市云宙多媒体技术有限公司 | A kind of predictive coding method, device |
Also Published As
Publication number | Publication date |
---|---|
CN103220532A (en) | 2013-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yea et al. | View synthesis prediction for multiview video coding | |
JP5234586B2 (en) | Video encoding method and decoding method, apparatus thereof, program thereof, and storage medium storing program | |
JP2022123085A (en) | Partial cost calculation | |
CN104412597B (en) | The method and device that unified difference vector for 3D Video codings is derived | |
KR101747434B1 (en) | Apparatus and method for encoding and decoding motion information and disparity information | |
US20120189060A1 (en) | Apparatus and method for encoding and decoding motion information and disparity information | |
KR20120080122A (en) | Apparatus and method for encoding and decoding multi-view video based competition | |
CN103907346A (en) | Method and apparatus of motion and disparity vector derivation for 3D video coding and HEVC | |
CN102790892A (en) | Depth map coding method and device | |
CN102801995B (en) | A kind of multi-view video motion based on template matching and disparity vector prediction method | |
KR101893559B1 (en) | Apparatus and method for encoding and decoding multi-view video | |
CN103051894B (en) | A kind of based on fractal and H.264 binocular tri-dimensional video compression & decompression method | |
CN104995916A (en) | Video data decoding method and video data decoding apparatus | |
WO2012077634A9 (en) | Multiview image encoding method, multiview image decoding method, multiview image encoding device, multiview image decoding device, and programs of same | |
KR101598855B1 (en) | Apparatus and Method for 3D video coding | |
CN102291579A (en) | Rapid fractal compression and decompression method for multi-cast stereo video | |
CN103220532B (en) | The associated prediction coded method of three-dimensional video-frequency and system | |
CN102316323B (en) | Rapid binocular stereo-video fractal compressing and uncompressing method | |
CN102917233A (en) | Stereoscopic video coding optimization method in space teleoperation environment | |
KR20120083209A (en) | Depth map coding/decoding apparatus and method | |
Mallik et al. | HEVC based multi-view video codec using frame interleaving technique | |
KR20080006494A (en) | A method and apparatus for decoding a video signal | |
CN102263952B (en) | Quick fractal compression and decompression method for binocular stereo video based on object | |
Yea et al. | View synthesis prediction for rate-overhead reduction in ftv | |
CN102263953B (en) | Quick fractal compression and decompression method for multicasting stereo video based on object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |