CN108886598B

CN108886598B - Compression method and device for panoramic stereoscopic video system

Info

Publication number: CN108886598B
Application number: CN201680078558.9A
Authority: CN
Inventors: 虞晶怡; 马毅
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2016-01-12
Filing date: 2016-01-18
Publication date: 2020-08-25
Anticipated expiration: 2036-01-18
Also published as: WO2017120981A1; EP3403400B1; US10636121B2; EP3403401A1; CN109076200A; EP3403401B1; EP3403403A1; US10643305B2; US20190035055A1; WO2017120802A1; WO2017120776A1; CN108886598A; EP3403403B1; EP3403401A4; EP3403403A4; US20190028707A1; US20190028693A1; CN108886611A; EP3403400A1; CN108886611B

Abstract

Provided is a method for compressing a stereoscopic video containing a left-view frame and a right-view frame, the method comprising: determining, through intra-frame prediction, a texture saliency value of a first sub-block in the left-view frame (1101); Through motion estimation, determine the motion saliency value of the first sub-block (1102); determine the disparity saliency value between the first sub-block and the corresponding second sub-block in the right-view frame (1103) ; determining a quantization parameter according to the disparity saliency value, the texture saliency value and the motion saliency value (1104); and according to the quantization parameter, quantizing the first partition (1105).

Description

Compression method and device for panoramic stereoscopic video system

相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS

本申请要求申请号为PCT/CN2016/070712，名称为“全景立体视频系统的校准方法和装置”，申请日为2016年1月12日的国际专利申请，以及申请号为PCT/CN2016/070823，名称为“全景立体视频系统的拼接方法和装置”，申请日为2016年1月13日的国际专利申请的权益和优先权。此两申请的全部公开内容通过引用并入本文。This application requires an international patent application with an application number of PCT/CN2016/070712, titled "Calibration method and device for a panoramic stereoscopic video system", an application date of January 12, 2016, and an application number of PCT/CN2016/070823, The title is "Splicing Method and Device for Panoramic Stereoscopic Video System", the application date is the rights and priority of the international patent application on January 13, 2016. The entire disclosures of both applications are incorporated herein by reference.

技术领域technical field

本发明涉及用于拍摄、处理、压缩及显示3D全景立体视频的全景立体视频系统，尤其涉及在全景立体视频系统内进行3D全景立体视频压缩的方法和装置。The present invention relates to a panoramic stereoscopic video system for shooting, processing, compressing and displaying 3D panoramic stereoscopic video, in particular to a method and device for compressing 3D panoramic stereoscopic video in the panoramic stereoscopic video system.

背景技术Background technique

上述文献中提出的全景立体视频系统通过将立体全景视频显示于头戴式显示器(HMD)以实现沉浸式3D体验。立体视频的分辨率和持续性为决定用户体验的两个主要特征。该系统将16个高分辨率(HD)摄像头拍摄的图像相互拼接，从而生成立体视频，而且每一视野的分辨率至少为3840×2160(4K)。由于该系统的帧速率为50fps，因此其可大幅降低运动模糊及闪动效应。然而，另一方面，其超高的分辨率及高刷新率导致产生巨量的视频数据，从而给3D视频服务和广播造成了难题。The panoramic stereoscopic video system proposed in the above-mentioned documents realizes an immersive 3D experience by displaying the stereoscopic panoramic video on a head-mounted display (HMD). The resolution and persistence of stereoscopic video are the two main characteristics that determine the user experience. The system stitches images from 16 high-resolution (HD) cameras to each other to generate stereoscopic video with a resolution of at least 3840 x 2160 (4K) per field of view. Since the system has a frame rate of 50fps, it greatly reduces motion blur and flickering effects. However, on the other hand, its ultra-high resolution and high refresh rate result in a huge amount of video data, which poses a problem for 3D video services and broadcasting.

如H.264、VC-1及HEVC等的现有混合视频编码方法的视频编码效率在过去十年中获得了显著提升，而且通过实施密集时空预测，大大降低了视频序列中的时间和空间冗余度。如MV-HEVC和3D-HEVC等的3D技术的最新进展还进一步研究了不同视场之间的视差预测。然而，为了实现立体全景视频更好的压缩性能，还需要通过对人类视觉特性及针对全景的特性加以考虑以提高主观视频质量。The video coding efficiency of existing hybrid video coding methods such as H.264, VC-1 and HEVC has improved significantly over the past decade, and by implementing dense spatiotemporal prediction, the temporal and spatial redundancy in video sequences has been greatly reduced redundancy. Recent advances in 3D technologies such as MV-HEVC and 3D-HEVC have further investigated disparity prediction between different fields of view. However, in order to achieve better compression performance of stereoscopic panoramic video, it is also necessary to improve the subjective video quality by taking into account the characteristics of human vision and the characteristics of panorama.

一般而言，360度的全景图像包含拉长的视场，而且大部分视场很有可能仅为背景。然而，用户可能仅关注视场当中颜色、纹理、运动或深度对比较为显著的一小部分。In general, a 360-degree panoramic image contains an elongated field of view, and most of the field of view is likely to be background only. However, the user may only focus on a small portion of the field of view where color, texture, motion, or depth contrast is significant.

基于人类视觉特征的压缩方法的基本原理在于，仅对少量具有高优先度的被选关注区域进行编码，以获得高主观视频质量，与此同时，以低优先度处理较不关注的区域，以节省比特。为了实现这一点，通常使用关注预测方法对用户可能关注的区域进行预测。The basic principle of compression methods based on human visual features is to encode only a small number of selected areas of interest with high priority to obtain high subjective video quality, while processing less attention areas with low priority to obtain high subjective video quality. Save bits. To achieve this, attention prediction methods are usually used to predict the areas that users may pay attention to.

现有的2D图像显著性计算主要考虑如颜色、形状、朝向、纹理、曲率等的特征的对比反差。在图像序列或视频中，关注区域检测侧重于可将前景与背景区别开来的运动信息。然而，由于现有视频压缩方法不考虑立体视频中的立体视觉反差，因此不适合用于立体视频。此外，当显著性对象在空间层面上不具有视觉独特性，而且在时间层面上不发生运动时，该现有方法难以对其关注区域进行检测。Existing 2D image saliency calculations mainly consider the contrast of features such as color, shape, orientation, texture, curvature, etc. In image sequences or videos, region-of-interest detection focuses on motion information that distinguishes foreground from background. However, since existing video compression methods do not consider the stereoscopic contrast in stereoscopic videos, they are not suitable for stereoscopic videos. Furthermore, when the salient objects are not visually distinctive at the spatial level and do not move at the temporal level, this existing method is difficult to detect their regions of interest.

因此，需要提供一种同时将纹理、运动和立体视觉反差用于显著性分析的新的立体视频压缩方法。Therefore, there is a need to provide a new stereoscopic video compression method that simultaneously uses texture, motion and stereoscopic contrast for saliency analysis.

发明内容SUMMARY OF THE INVENTION

为了解决现有技术中的问题，本发明实施方式提供一种同时将纹理、运动和立体视觉反差用于显著性分析的新的立体视频压缩方法。具体而言，通过采用基于分块的立体视觉检测，进一步提供在人类视觉中具有重要作用的深度线索。In order to solve the problems in the prior art, embodiments of the present invention provide a new stereoscopic video compression method that simultaneously uses texture, motion and stereoscopic contrast for saliency analysis. Specifically, by employing block-based stereo vision detection, we further provide depth cues that play an important role in human vision.

根据本发明一种实施方式，提供一种对含左视帧和右视帧的立体视频进行压缩的方法，该方法包括：通过帧内预测，确定所述左视帧内的第一分块的纹理显著性值；通过运动估算，确定该第一分块的运动显著性值；确定所述第一分块与所述右视帧内的相应的第二分块之间的视差显著性值；以及根据所述视差显著性值、纹理显著性值以及运动显著性值，确定量化参数。According to an embodiment of the present invention, a method for compressing a stereoscopic video containing left-view frames and right-view frames is provided, the method comprising: determining, by intra-frame prediction, a Texture saliency value; determine the motion saliency value of the first sub-block through motion estimation; determine the disparity saliency value between the first sub-block and the corresponding second sub-block in the right-view frame; and determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value.

优选地，所述方法还包括：根据所述量化参数，对所述第一分块进行量化。Preferably, the method further comprises: quantizing the first sub-block according to the quantization parameter.

优选地，所述方法还包括：确定所述左视帧的混合立体显著性映射；将该混合立体显著性映射的尺寸缩减至与变换单元(TU)的尺寸相匹配；确定该变换单元的第二量化参数；以及根据该第二量化参数，对所述变换单元进行量化。Preferably, the method further comprises: determining a hybrid stereo saliency map of the left-view frame; reducing the size of the hybrid stereo saliency map to match the size of a transform unit (TU); binary quantization parameter; and quantizing the transform unit according to the second quantization parameter.

优选地，所述方法还包括：根据高效视频编码(HEVC)的DC模式帧内预测输出，确定所述纹理显著性值。Advantageously, the method further comprises: determining the texture saliency value from a DC mode intra prediction output of High Efficiency Video Coding (HEVC).

优选地，所述方法还包括：根据高效视频编码(HEVC)的运动估算输出，确定所述第一分块的运动显著性值。Advantageously, the method further comprises: determining a motion saliency value of the first partition according to a motion estimation output of High Efficiency Video Coding (HEVC).

优选地，所述方法还包括：通过将所述视差显著性值、纹理显著性值及运动显著性值与加权参数叠加，确定所述第一分块的混合立体显著性值。Preferably, the method further comprises: determining the hybrid stereo saliency value of the first segment by superimposing the parallax saliency value, the texture saliency value and the motion saliency value with a weighting parameter.

优选地，在第一方向上对所述左视帧和右视帧进行修正，而且所述方法还包括：在与所述第一方向垂直的第二方向上搜索所述视差显著性值。Preferably, the left-view frame and the right-view frame are modified in a first direction, and the method further comprises: searching for the parallax saliency value in a second direction perpendicular to the first direction.

优选地，所述视差显著性值包括非整数值。Preferably, the disparity saliency value includes a non-integer value.

优选地，所述方法还包括：根据由高效视频编码(HEVC)的子像素运动估算生成的1/4像素样本，确定所述视差显著性值。Advantageously, the method further comprises determining the disparity saliency value from 1/4 pixel samples generated by high efficiency video coding (HEVC) sub-pixel motion estimation.

根据本发明另一实施方式，提供一种非暂时性计算机可读介质，该介质上存有计算机可执行指令，该计算机可执行指令包括对含左视帧和右视帧的立体视频进行压缩的方法，该方法包括：通过帧内预测，确定所述左视帧内的第一分块的纹理显著性值；通过运动估算，确定该第一分块的运动显著性值；确定所述第一分块与所述右视帧内的相应的第二分块之间的视差显著性值；以及根据所述视差显著性值、纹理显著性值及运动显著性值，确定量化参数。According to another embodiment of the present invention, there is provided a non-transitory computer-readable medium having computer-executable instructions stored thereon, the computer-executable instructions comprising a method for compressing a stereoscopic video including left-view frames and right-view frames The method includes: determining a texture saliency value of a first sub-block in the left-view frame through intra-frame prediction; determining a motion saliency value of the first sub-block through motion estimation; determining the first sub-block a disparity saliency value between a partition and a corresponding second partition within the right-view frame; and determining a quantization parameter based on the disparity saliency value, texture saliency value, and motion saliency value.

优选地，所述方法还包括：根据高效视频编码(HEVC)的DC模式帧内预测的输出，确定所述纹理显著性值。Advantageously, the method further comprises determining the texture saliency value from an output of DC mode intra prediction of High Efficiency Video Coding (HEVC).

优选地，所述方法还包括：根据高效视频编码(HEVC)的运动估算的输出，确定所述第一分块的运动显著性值。Advantageously, the method further comprises: determining a motion saliency value of the first partition according to an output of high efficiency video coding (HEVC) motion estimation.

优选地，所述方法还包括：通过将所述视差显著性值、纹理显著性值及运动显著性值与加权参数叠加，以确定所述第一分块的混合立体显著性值。Preferably, the method further comprises: determining the hybrid stereo saliency value of the first segment by superimposing the parallax saliency value, the texture saliency value and the motion saliency value with a weighting parameter.

优选地，所述方法还包括：根据由高效视频编码(HEVC)的子像素运动估算所生成的1/4像素样本，确定所述视差显著性值。Advantageously, the method further comprises determining the disparity saliency value from 1/4 pixel samples generated by high efficiency video coding (HEVC) sub-pixel motion estimation.

根据本发明实施方式，采用基于关注区域的视频编码方案，该方案采用基于视觉关注的比特分配法。其中，具体而言，在视频关注预测中将空间、时间及立体线索考虑在内。所述空间和时间对比特征直接提取自现有的视频编码过程，无需引入额外计算。此外，还采用子像素视差强度估算来提高立体系统的视觉显著性精度。如此，可在高效压缩立体视频的同时，不影响最终用户的感知质量。According to an embodiment of the present invention, a video coding scheme based on a region of interest is adopted, and the scheme adopts a bit allocation method based on visual attention. Specifically, spatial, temporal and stereo cues are taken into account in video attention prediction. The spatial and temporal contrast features are directly extracted from the existing video coding process without introducing additional computation. In addition, sub-pixel disparity intensity estimation is also employed to improve the visual saliency accuracy of stereo systems. In this way, stereoscopic video can be compressed efficiently without affecting the perceived quality of the end user.

附图说明Description of drawings

为了更好地说明本发明实施方式的技术特征，以下结合附图对本发明各种实施方式进行简单描述。In order to better illustrate the technical features of the embodiments of the present invention, various embodiments of the present invention are briefly described below with reference to the accompanying drawings.

图1为根据本发明实施方式的全景立体视频系统的例示示意图。FIG. 1 is a schematic diagram illustrating a panoramic stereoscopic video system according to an embodiment of the present invention.

图2为根据本发明实施方式的全景立体视频系统的摄像头阵列的例示示意图。FIG. 2 is a schematic diagram illustrating a camera array of a panoramic stereoscopic video system according to an embodiment of the present invention.

图3为根据本发明实施方式的全景立体视频系统的数据处理单元的例示示意图。FIG. 3 is a schematic diagram illustrating an example of a data processing unit of a panoramic stereoscopic video system according to an embodiment of the present invention.

图4为根据本发明实施方式的全景立体视频拼接方法的例示流程图。FIG. 4 is an exemplary flowchart of a panoramic stereoscopic video stitching method according to an embodiment of the present invention.

图5为根据本发明实施方式的全景立体视频显示方法的例示流程图。FIG. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention.

图6为根据本发明实施方式的HEVC空间预测模式的例示示意图。FIG. 6 is a schematic diagram illustrating an HEVC spatial prediction mode according to an embodiment of the present invention.

图7为根据本发明实施方式采用运动矢量预测的基于分块的运动估算的例示示意图。FIG. 7 is a schematic diagram illustrating a block-based motion estimation using motion vector prediction according to an embodiment of the present invention.

图8为根据本发明实施方式通过运动估算所获取的运动强度映射的例示示意图。FIG. 8 is a schematic diagram illustrating a motion intensity map obtained by motion estimation according to an embodiment of the present invention.

图9为根据本发明实施方式的立体视频编码所采用的基于分块的视差估算的例示示意图。FIG. 9 is a schematic diagram illustrating a block-based disparity estimation employed in stereoscopic video coding according to an embodiment of the present invention.

图10为根据本发明实施方式的基于混合关注区域的立体视频压缩系统的例示示意图。FIG. 10 is a schematic diagram illustrating a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention.

图11为根据本发明实施例的基于混合关注区域的立体视频压缩方法示例性流程图。FIG. 11 is an exemplary flowchart of a method for compressing a stereoscopic video based on a mixed region of interest according to an embodiment of the present invention.

具体实施方式Detailed ways

为了更好地说明本发明实施方式的目的、技术特征及优点，以下结合附图对本发明各种实施方式进行进一步的描述。容易理解的是，附图仅用于说明本发明的例示实施方式，本领域技术人员可在不脱离本发明原理的前提下获得其他附图。In order to better illustrate the objectives, technical features and advantages of the embodiments of the present invention, various embodiments of the present invention will be further described below with reference to the accompanying drawings. It is readily understood that the accompanying drawings are only used to illustrate exemplary embodiments of the present invention, and other drawings may be obtained by those skilled in the art without departing from the principles of the present invention.

根据本发明的实施方式，提供一种具有多摄像头视频拍摄、数据处理、立体视频编码、传输及3D显示功能的全景立体视频系统。该全景立体视频系统采用实时多视场视频拍摄，图像修正和预处理，以及基于关注区域(ROI)的立体视频压缩。在传输和解码过程之后，使用头戴式显示器(HMD)耳机显示左右视场。According to an embodiment of the present invention, a panoramic stereoscopic video system with functions of multi-camera video shooting, data processing, stereoscopic video encoding, transmission and 3D display is provided. The panoramic stereoscopic video system adopts real-time multi-field video shooting, image correction and preprocessing, and region-of-interest (ROI)-based stereoscopic video compression. After the transmission and decoding process, the left and right fields of view are displayed using a head mounted display (HMD) headset.

1.系统概述1. System overview

图1为根据本发明实施方式的全景立体视频系统的例示示意图。全景立体视频系统采用摄像头阵列拍摄3D全景视频，并将所拍摄的3D全景视频显示于3D电视或头戴式虚拟现实显示装置上。如图1所示，全景立体视频系统包括数据采集单元200，数据处理单元300及数据显示单元400。数据采集单元200包括摄像头阵列210内的多个摄像头以及摄像头校准单元220。数据处理单元300包括数据预处理单元310及高级立体视频转码单元320。数据显示单元400包括解码单元410及显示器耳机420。FIG. 1 is a schematic diagram illustrating a panoramic stereoscopic video system according to an embodiment of the present invention. The panoramic stereoscopic video system uses a camera array to shoot 3D panoramic video, and displays the captured 3D panoramic video on a 3D TV or a head-mounted virtual reality display device. As shown in FIG. 1 , the panoramic stereoscopic video system includes a data acquisition unit 200 , a data processing unit 300 and a data display unit 400 . The data acquisition unit 200 includes a plurality of cameras in the camera array 210 and a camera calibration unit 220 . The data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320 . The data display unit 400 includes a decoding unit 410 and a display earphone 420 .

2.数据采集单元2. Data acquisition unit

如图1所示，数据采集单元200包括摄像头阵列210内的多个摄像头，以及对摄像头阵列210进行校准的摄像头校准单元220。As shown in FIG. 1 , the data acquisition unit 200 includes a plurality of cameras in the camera array 210 , and a camera calibration unit 220 for calibrating the camera array 210 .

2.1摄像头阵列2.1 Camera array

图2为根据本发明实施方式的全景立体视频系统内的摄像头阵列的例示示意图。2 is a schematic diagram illustrating a camera array in a panoramic stereoscopic video system according to an embodiment of the present invention.

如图2所示，摄像头阵列210具有安装于正八边形安装框架上的16个高清晰度摄像头c1～c16，该八边形的每条边上安装有一对摄像头。每条边上的两个摄像头，如c1和c2，具有平行光轴且相互间间隔距离d。摄像头阵列210所采集的原始视频数据经线缆发送至计算机以供进一步处理。所述摄像头的参数列于下表1。As shown in FIG. 2, the camera array 210 has 16 high-definition cameras c1-c16 mounted on a regular octagon mounting frame, and a pair of cameras is mounted on each side of the octagon. The two cameras on each side, such as c1 and c2, have parallel optical axes and are spaced apart from each other by a distance d. The raw video data captured by the camera array 210 is sent over a cable to a computer for further processing. The parameters of the camera are listed in Table 1 below.

表1Table 1

需要注意的是，虽然所述摄像头阵列在图2中示为正八边形，但是在本发明其他实施方式中，该摄像头阵列也可设置为其他形状。具体而言，在本发明一种实施方式中，各摄像头安装于刚性框架上，从而使得该多个摄像头之间的相对位置基本恒定。在本发明的一种其他实施方式中，所述摄像头基本设于同一平面上，例如设于多边形的各条边上。It should be noted that although the camera array is shown as a regular octagon in FIG. 2 , in other embodiments of the present invention, the camera array may also be set in other shapes. Specifically, in an embodiment of the present invention, each camera is mounted on a rigid frame, so that the relative positions among the plurality of cameras are substantially constant. In another embodiment of the present invention, the cameras are basically arranged on the same plane, for example, arranged on each side of a polygon.

2.2摄像头校准2.2 Camera Calibration

为了将各摄像头拍摄的图像拼接在一起并生成3D效果，需要同时获得这些摄像头的内部和外部参数。该外部参数包括各摄像头之间的旋转和平移，从而使得由不同摄像头所拍摄的图像能沿水平方向得到修正和对齐。此外，各摄像头所拍摄的图像可能存在失真，为了获得无失真的图像，需要了解各摄像头的失真参数。这些参数可在摄像头的校准过程中获得。In order to stitch together the images captured by each camera and generate a 3D effect, both the internal and external parameters of these cameras need to be obtained. The external parameters include rotation and translation between the cameras, so that the images captured by the different cameras can be corrected and aligned in the horizontal direction. In addition, the images captured by each camera may be distorted. In order to obtain a distortion-free image, it is necessary to know the distortion parameters of each camera. These parameters are available during the calibration of the camera.

2.2.1内部和失真参数校准2.2.1 Internal and Distortion Parameter Calibration

各摄像头的内部和失真参数可通过各种方法获得，例如由Zhengyou Zhang所提出的校准方法。此外，MatLab等工具可用于获得此类参数。The internal and distortion parameters of each camera can be obtained by various methods, such as the calibration method proposed by Zhengyou Zhang. Furthermore, tools such as MatLab can be used to obtain such parameters.

2.2.2外部参数校准2.2.2 External parameter calibration

在获得各摄像头的内部参数之后，采用基于运动恢复结构的方法获得各摄像头之间的旋转和平移。该方法具有如下优点：After obtaining the internal parameters of each camera, the rotation and translation between the cameras are obtained by a method based on the motion recovery structure. This method has the following advantages:

高效性：无需逐对校准摄像头。相反，所有摄像头在校准过程中均同时拍摄一个场景，而且可同时获得所有摄像头的外部参数。Efficiency: No need to calibrate cameras pair by pair. Instead, all cameras capture a scene at the same time during the calibration process, and the extrinsic parameters of all cameras are obtained at the same time.

精确性：在基于图案的校准方法中，需要由两个相邻的摄像头对图案进行拍摄，这往往造成该图案的分辨率及校准精度降低。在本发明基于运动恢复结构的方法中，对每一摄像头的运动进行独立估算以获得上述参数，而且相邻摄像头无需具有重叠视场。因此，各摄像头可放置于离待拍摄场景更近的位置，从而实现更高的精确度。Accuracy: In the pattern-based calibration method, the pattern needs to be photographed by two adjacent cameras, which often reduces the resolution and calibration accuracy of the pattern. In the method based on the motion recovery structure of the present invention, the motion of each camera is independently estimated to obtain the above parameters, and adjacent cameras do not need to have overlapping fields of view. As a result, each camera can be placed closer to the scene to be captured, allowing for greater accuracy.

可扩展性：由于本发明方法的相邻摄像头不需要重叠视场，因此其甚至可以适用于以背靠背姿态放置的摄像机。Scalability: Since the adjacent cameras of the method of the present invention do not need to overlap fields of view, it can even be applied to cameras placed in a back-to-back posture.

2.3数据采集方法2.3 Data collection method

来自所述16个摄像头的数据在采集后通过软件保存，然后提供于所述数据处理单元。由各摄像头所拍摄的每一帧的图像数据可通过FFmpeg和DirectShow(或称DShow)等的软件收集。每个摄像头所拍摄的各帧在压缩后被保存为视频文件。由于摄像头的个数为多个，因此需要利用时间戳等使得各摄像头所拍摄的各帧同步。例如，各摄像头所拍摄的每一帧都可在添加时间戳后置于队列中，从而使得其与具有相同时间戳的其他各帧同步。同步后的各帧被编码为视频流后本地保存，或者经网络同时传输。The data from the 16 cameras are saved by software after being collected and then provided to the data processing unit. The image data of each frame captured by each camera can be collected by software such as FFmpeg and DirectShow (or DShow). Each frame captured by each camera is compressed and saved as a video file. Since there are a plurality of cameras, it is necessary to synchronize the frames captured by the cameras by using a time stamp or the like. For example, each frame captured by each camera can be time stamped and placed in a queue so that it is synchronized with every other frame with the same time stamp. The synchronized frames are encoded as video streams and stored locally, or transmitted simultaneously over the network.

3.数据处理单元3. Data processing unit

如1图所示，数据处理单元300包括数据预处理单元310和高级立体视频转码单元320。As shown in FIG. 1 , the data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320 .

图3为根据本发明实施方式的全景立体视频系统内的数据处理单元的例示示意图。如图3所示，数据预处理单元310包括：用于使各摄像头所拍摄的图像同步的时间轴同步311；用于解码原始视频流的若干解码器312；用于原始视频修正的若干修正器313；用于实施包括降噪和编辑在内的视频处理的编码器314；用于将各视频进行拼接以生成全景视频的拼接单元。数据预处理单元310向高级立体视频转码单元320输出左眼视频和右眼视频。高级立体视频转码单元320生成视频的运动映射321和纹理映射322。混合关注区域(ROI)生成单元323根据该运动映射321和纹理映射322识别视频中的关注区域。比特分配单元324根据所识别出的关注区域分配比特，HEVC编码单元325对视频进行编码。H.265打包器326将编码后的视频进行打包，以供传输。FIG. 3 is a schematic diagram illustrating an example of a data processing unit in a panoramic stereoscopic video system according to an embodiment of the present invention. As shown in FIG. 3 , the data preprocessing unit 310 includes: a time axis synchronization 311 for synchronizing images captured by each camera; several decoders 312 for decoding the original video stream; several modifiers for modifying the original video 313; an encoder for implementing video processing including noise reduction and editing 314; a stitching unit for stitching each video to generate a panoramic video. The data preprocessing unit 310 outputs the left-eye video and the right-eye video to the advanced stereoscopic video transcoding unit 320 . The advanced stereoscopic video transcoding unit 320 generates a motion map 321 and a texture map 322 of the video. A hybrid region of interest (ROI) generation unit 323 identifies a region of interest in the video based on the motion map 321 and the texture map 322 . The bit allocation unit 324 allocates bits according to the identified region of interest, and the HEVC encoding unit 325 encodes the video. The H.265 packer 326 packs the encoded video for transmission.

3.1失真校正和预处理3.1 Distortion Correction and Preprocessing

根据校准过程中所获得的失真参数，将各摄像头所拍摄的各帧进行卷曲，以获得无失真帧。为了提高图像对齐和拼接精确度，需首先对各帧进行滤波以减少噪声。According to the distortion parameters obtained during the calibration process, each frame captured by each camera is curled to obtain a distortion-free frame. To improve image alignment and stitching accuracy, each frame is first filtered to reduce noise.

3.2图像对齐3.2 Image Alignment

对设于所述八边形每条边上的每对摄像头进行图像对齐，而且将每对摄像头所拍摄的图像沿水平方向对齐。根据本的发明一种实施方式，各对摄像头所拍摄的每一帧均卷曲至与该对摄像头的光轴平行的平面。Image alignment is performed on each pair of cameras disposed on each side of the octagon, and the images captured by each pair of cameras are aligned in the horizontal direction. According to an embodiment of the present invention, each frame captured by each pair of cameras is curled to a plane parallel to the optical axis of the pair of cameras.

4.全景视频拼接4. Panoramic video stitching

所述摄像头阵列具有8对摄像头。在将所有左侧摄像头所拍摄的各帧均投影至圆柱体上后，将其拼接成全景图像。通过在各左侧摄像头所拍摄的所有帧上重复上述步骤，可获得全景视频。通过以相同方式对各右侧摄像头所拍摄的帧进行处理，可获得另一全景视频。此两全景视频形成全景立体视频。The camera array has 8 pairs of cameras. After each frame from all the left cameras is projected onto the cylinder, it is stitched into a panoramic image. A panoramic video is obtained by repeating the above steps on all frames captured by each left camera. Another panoramic video can be obtained by processing the frames captured by each right camera in the same way. The two panoramic videos form a panoramic stereoscopic video.

5.数据显示单元5. Data display unit

如1图所示，数据显示单元400包括解码单元410和显示器耳机420。在通过编解码系统后，所述全景立体视频播放于显示器耳机420之上，该显示器耳机可以为可穿戴虚拟现实(VR)设备，例如由Oculus VR公司提供的此类设备。所述全景立体视频分别渲染于该Oculus设备的左眼显示器和右眼显示器上。该全景立体视频的显示区域可根据检测装置的移动进行调节，以模拟虚拟现实中的视角变化。As shown in FIG. 1 , the data display unit 400 includes a decoding unit 410 and a display earphone 420 . After passing through the codec system, the panoramic stereoscopic video is played on a display headset 420, which may be a wearable virtual reality (VR) device such as that provided by Oculus VR. The panoramic stereoscopic video is rendered on the left-eye display and the right-eye display of the Oculus device, respectively. The display area of the panoramic stereoscopic video can be adjusted according to the movement of the detection device, so as to simulate the change of the viewing angle in virtual reality.

图5为根据本发明实施方式的全景立体视频显示方法的例示流程图。如图5所示，在步骤501中，首先将编码视频流解码为YUV。在步骤502中，根据Oculus传感器数据，进行位置计算和视场选择。在步骤503中，对左眼和右眼图像进行分别渲染。在步骤504中，将所渲染的图像显示于所述Oculus显示器耳机上。FIG. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention. As shown in FIG. 5, in step 501, the encoded video stream is first decoded into YUV. In step 502, position calculation and field of view selection are performed according to the Oculus sensor data. In step 503, the left-eye and right-eye images are rendered separately. In step 504, the rendered image is displayed on the Oculus display headset.

6.立体视频压缩6. Stereoscopic video compression

在该立体全景视频系统中，由视频处理模块对左右超分辨率视频进行拼接，然而巨量视频数据成为视频压缩和传输的难题。根据本发明实施方式，提供一种基于关注区域的视频编码方案，该方案采用基于视觉关注的比特分配法。其中，具体而言，在视频关注预测中将空间、时间及立体线索考虑在内。空间和时间对比特征直接提取自视频编码过程，无需引入额外计算。此外，还采用子像素视差强度估算提高立体系统的视觉显著性精度。子像素样本的重复使用及基于分块的匹配保证本发明算法能够以良好的性能实施实时检测。总体而言，该方案在极大提高视频压缩率的同时不影响最终用户的感知质量。In the stereoscopic panoramic video system, the left and right super-resolution videos are spliced by the video processing module. However, the huge amount of video data becomes a difficult problem in video compression and transmission. According to an embodiment of the present invention, a video coding scheme based on a region of interest is provided, and the scheme adopts a bit allocation method based on visual attention. Specifically, spatial, temporal and stereo cues are taken into account in video attention prediction. Spatial and temporal contrast features are directly extracted from the video encoding process without introducing additional computations. In addition, sub-pixel disparity intensity estimation is also employed to improve the visual saliency accuracy of stereo systems. The repeated use of sub-pixel samples and block-based matching ensure that the algorithm of the present invention can perform real-time detection with good performance. Overall, this scheme greatly improves the video compression rate without affecting the end user's perceived quality.

6.1关注区域检测6.1 Region of Interest Detection

6.1.1通过帧内预测提取空间特征6.1.1 Extracting Spatial Features by Intra Prediction

在HEVC编码标准中，帧内预测(或称空间预测)用于对需要以独立于之前所编码的各帧的方式被压缩的分块进行编码，而且通过之前编码及重建的分块的相邻样本获得像素级的空间相关性。在此之后，从原始像素值中减去预测样本，以获得残差分块。获得自帧内预测的该残差含有纹理对比信息，并用于生成空间显著性映射。In the HEVC coding standard, intra-frame prediction (or spatial prediction) is used to encode blocks that need to be compressed independently of previously coded frames, and the adjacent blocks of previously coded and reconstructed blocks are coded. Samples obtain pixel-level spatial correlation. After this, the predicted samples are subtracted from the original pixel values to obtain a residual block. This residual, obtained from intra prediction, contains texture contrast information and is used to generate a spatial saliency map.

在HEVC视频编码中，空间预测包括供预测单元(PU)选择的33种定向预测模式(H.264中只有8种此类模式)，DC预测模式(总体平均)及平面(表面拟合)预测模式。图6为根据本发明实施方式的HEVC空间预测模式的例示示意图。图6中示出了所有的35种预测模式。HEVC预测单元的大小选自64×64至8×8，而且所有的35种模式均可实现最佳分块分割及最佳残差。为了降低复杂度，在一种实施方式中，所述基于分块的残差映射对在固定的8×8分块上实施DC模式预测所产生的结果进行重复使用。分块k的残差的计算如下所示：In HEVC video coding, spatial prediction includes 33 directional prediction modes for prediction unit (PU) selection (there are only 8 such modes in H.264), DC prediction mode (overall average) and planar (surface fitting) prediction model. FIG. 6 is a schematic diagram illustrating an HEVC spatial prediction mode according to an embodiment of the present invention. All 35 prediction modes are shown in FIG. 6 . The size of the HEVC prediction unit is selected from 64×64 to 8×8, and all 35 modes can achieve the best block segmentation and the best residual. To reduce complexity, in one embodiment, the block-based residual mapping reuses the result of implementing DC mode prediction on fixed 8x8 blocks. The residuals for block k are calculated as follows:

其中，C_ij和R_ij为当前原始分块C和重构分块R的第(i,j)个元素。之后，可根据每个分块的残差计算其纹理显著性值S_T，并将其归一化至[0，1]范围：Among them, C _ij and R _ij are the (i, j)th elements of the current original block C and reconstructed block R. After that, the texture saliency value S _T can be calculated from the residual of each block and normalized to the range [0, 1]:

其中，N为帧内的分块数。由于采用HEVC帧内预测方法进行8×8分块的空间残差检测，因此无需引入额外计算。Among them, N is the number of blocks in the frame. Since the HEVC intra-frame prediction method is used for the spatial residual detection of the 8×8 block, there is no need to introduce additional computation.

在其他实施方式中：每一帧可分割为64×64或16×16像素等的不同大小的非重叠分块；并可根据与帧内预测方法类似或具有可比性的其他视频编码方法的结果计算纹理显著性映射；而且可根据H.264/AVC或AVS等其他编码标准进行压缩。优选地，所述帧内预测或其他视频处理基于分割自所述帧的相同大小的分块。In other embodiments: each frame may be divided into non-overlapping blocks of different sizes of 64x64 or 16x16 pixels, etc.; and may be based on the results of other video coding methods similar or comparable to intra prediction methods Calculates texture saliency maps; and can be compressed according to other encoding standards such as H.264/AVC or AVS. Preferably, the intra prediction or other video processing is based on partitions of the same size from the frame.

6.1.2通过运动估算提取时间特征6.1.2 Extracting Temporal Features by Motion Estimation

一个快速移动的物体能够引起视觉关注。然而，由于视频序列由处于运动状态的摄像头所拍摄，因此其中存在全局运动。所以，需要通过以HEVC帧间预测运动估算法估算运动矢量差(MVD)的方式测量局部运动显著性。A fast-moving object can attract visual attention. However, since the video sequence is captured by a camera that is in motion, there is global motion in it. Therefore, there is a need to measure local motion saliency by estimating motion vector difference (MVD) in HEVC inter-predictive motion estimation.

大多数视频编码标准中的运动估算技术一般都基于分块匹配，其中，由2D平移模式表示运动矢量，而且将每个分块与预定搜索区域内的所有候选位置相匹配。由于相邻分块的运动矢量通常高度相关，因此在HEVC所采用的运动矢量预测技术中，根据附近已编码的分块的运动矢量，预测当前分块的运动矢量。Motion estimation techniques in most video coding standards are generally based on block matching, where a motion vector is represented by a 2D translation pattern, and each block is matched to all candidate positions within a predetermined search area. Since the motion vectors of adjacent blocks are usually highly correlated, in the motion vector prediction technique adopted by HEVC, the motion vector of the current block is predicted based on the motion vectors of nearby coded blocks.

图7为根据本发明实施方式采用运动矢量(MV)预测的基于分块的运动估算的例示示意图。如图7所示，对于当前帧710中的当前分块711，根据相邻分块的运动矢量预测矢量mv_pred712，而且将相应分块721与预定搜索区域725内的所有的候选位置相匹配。最后，将最佳矢量mv_best723与预测矢量mv_pred721之间的矢量差编码后发送。7 is a schematic diagram illustrating a block-based motion estimation employing motion vector (MV) prediction according to an embodiment of the present invention. As shown in FIG. 7, for the current block 711 in the current frame 710, the vector mv _pred 712 is predicted according to the motion vector of the adjacent block, and the corresponding block 721 is matched with all the candidate positions in the predetermined search area 725 . Finally, the vector difference between the best vector mv _best 723 and the predicted vector mv _pred 721 is encoded and sent.

在一种实施方式中，采用8×8分块运动估算所生成的运动矢量差。其中，运动矢量差的大小可定义为：In one embodiment, motion vector differences generated by 8x8 block motion estimation are used. Among them, the magnitude of the motion vector difference can be defined as:

MVD_k＝||mv_best(k)-mv_pred(k)|| (3)MVD _k =||mv _best (k)-mv _pred (k)|| (3)

然后，可通过将同一帧内的运动矢量差进行归一化而计算运动显著性映射S_M：The motion saliency map S _M can then be computed by normalizing the motion vector differences within the same frame:

可根据运动估算的结果计算该运动显著性映射，该运动估算为HEVC视频编码的主要过程。因此，该方法可在不引入任何额外处理的同时提取运动特征。图8为根据本发明实施方式通过运动估算所获取的运动强度映射的例示示意图。The motion saliency map can be calculated from the results of motion estimation, which is the main process of HEVC video coding. Therefore, this method can extract motion features without introducing any additional processing. FIG. 8 is a schematic diagram illustrating a motion intensity map obtained by motion estimation according to an embodiment of the present invention.

在其他实施方式中，每一帧可分割为64×64或16×16像素等的不同大小的非重叠分块；并可根据与帧间预测运动估算类似或具有可比性的其他视频编码方法的结果计算运动显著性映射；而且可根据H.264/AVC或AVS等的其他编码标准来进行压缩。优选地，所述运动估算或其他视频处理基于分割自所述帧的相同大小的分块。In other embodiments, each frame may be divided into non-overlapping blocks of different sizes of 64x64 or 16x16 pixels, etc.; and may be based on other video coding methods similar or comparable to inter-predictive motion estimation. The result calculates a motion saliency map; and can be compressed according to other coding standards such as H.264/AVC or AVS. Preferably, the motion estimation or other video processing is based on equally sized blocks divided from the frame.

6.1.3通过视差预测进行视差估计6.1.3 Disparity estimation via disparity prediction

我们还在显著性分析中采用立体视觉，以进一步提供深度线索，而且该立体视觉在立体全景视频中发挥重要作用。其中，在视差映射处理中引入基于分块的视差估算方法。We also employ stereo vision in our saliency analysis to further provide depth cues, and this stereo vision plays an important role in stereo panoramic videos. Among them, a block-based disparity estimation method is introduced in the disparity mapping process.

图9为根据本发明实施方式立体视频编码所采用的基于分块的视差估算的例示示意图。如图9所示，在所述高分辨率视频系统中，所述立体图像的左视场910和右视场920均得到了良好修正。每一视场均分割为大小为8×8像素的非重叠分块，而且分块内的所有像素均假定具有相同的视差。如此，可预期与右视场分块相匹配的左视场分块能在同一扫描线上找到，而且视差922成为一维矢量(垂直分量等于零)。该视差匹配方案类似于帧间预测中的运动估算。具体而言，搜索区域925仅限定于水平方向上±32之内的范围。初始搜索位置设为右视场920的相应分块921的位置，而且将绝对差值和(SAD)用作匹配标准。FIG. 9 is a schematic diagram illustrating a block-based disparity estimation used in stereoscopic video coding according to an embodiment of the present invention. As shown in FIG. 9 , in the high-resolution video system, both the left field of view 910 and the right field of view 920 of the stereoscopic image are well corrected. Each field of view is divided into non-overlapping blocks of size 8x8 pixels, and all pixels within a block are assumed to have the same disparity. As such, it is expected that the left field block that matches the right field block can be found on the same scanline, and the disparity 922 becomes a one-dimensional vector (with the vertical component equal to zero). This disparity matching scheme is similar to motion estimation in inter prediction. Specifically, the search area 925 is limited to a range within ±32 in the horizontal direction. The initial search position is set to the position of the corresponding segment 921 of the right field of view 920, and the Sum of Absolute Differences (SAD) is used as the matching criterion.

为了实现更佳的预测精度，我们将非整数值的视差也考虑在内，而且使用HEVC 7/8抽头滤波器对子像素强度进行插值。由于子像素样本插值为最复杂的操作之一，因此本发明的子像素视差搜索直接使用由HEVC子像素运动估算所生成的1/4像素样本。通过重复使用HEVC的7抽头插值，可大幅降低计算复杂度。而且，根据分块视差值d_k，生成逐块视差映射：To achieve better prediction accuracy, we take into account the disparity of non-integer values, and use HEVC 7/8-tap filters to interpolate sub-pixel intensities. Since sub-pixel sample interpolation is one of the most complex operations, the sub-pixel disparity search of the present invention directly uses the 1/4 pixel samples generated by HEVC sub-pixel motion estimation. Computational complexity is greatly reduced by reusing HEVC's 7-tap interpolation. And, based on the block disparity value d _k , a block-wise disparity map is generated:

在其他实施方式中，每一帧可分割为64×64或16×16像素等的不同大小的非重叠分块；并可根据与运动估算类似或具有可比性的其他视频编码方法的结果计算视差映射；而且可根据H.264/AVC或AVS等的其他编码标准进行压缩。优选地，所述运动估算处理基于分割自所述帧的相同大小的分块。In other embodiments, each frame may be divided into non-overlapping blocks of different sizes of 64x64 or 16x16 pixels, etc.; disparity may be calculated from the results of other video coding methods similar or comparable to motion estimation Mapping; and can be compressed according to other coding standards such as H.264/AVC or AVS. Preferably, the motion estimation process is based on partitions of the same size from the frame.

6.1.4混合关注区域确定6.1.4 Determination of mixed areas of interest

在一种实施方式中，通过将时空特征与视差特征，即式(2)中的纹理对比度S_T，式(4)中的运动对比度S_M及式(5)中的视差强度，相结合的方式进行关注区域检测。虽然每种特征都有其自身的优缺点，但通过将所有特征相结合，可实现最佳效果。首先，每一特征映射均归一化至[0，1]的范围。其次，通过将S_M，S_T和S_D相叠加而形成混合立体显著性映射S：In one embodiment, by combining spatiotemporal features and disparity features, that is, the texture contrast S _T in Equation (2), the motion contrast S _M in Equation (4) and the disparity intensity in Equation (5), way to detect the region of interest. While each feature has its own advantages and disadvantages, the best results are achieved by combining all of them. First, each feature map is normalized to the range [0, 1]. Second, a hybrid stereo saliency map S is formed by superimposing S _M , S _T and S _D :

S(b_i)＝λ_TS_T+λ_MS_M+λ_DS_D (6)S(b _i )=λ _T S _T +λ _M S _M +λ _D S _D (6)

其中，λ_T，λ_M和λ_D为加权参数。Among them, λ _T , λ _M and λ _D are weighting parameters.

图10为根据本发明实施方式的基于混合关注区域的立体视频压缩系统的例示示意图。如图10所示，该立体视频压缩系统具有空间预测模块1001，时间预测模块1102及视差预测模块1103。由空间预测模块1001、时间预测模块1102及视差预测模块1103生成的结果输入至混合关注区域生成模块1004，该在识别出显著性区域后，进行相应的比特分配。变换及量化模块1105根据由混合关注区域生成模块1004所确定的比特分配方式实施量化，而熵编码模块1106用于将各帧编码以生成压缩帧1006。FIG. 10 is a schematic diagram illustrating a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention. As shown in FIG. 10 , the stereoscopic video compression system has a spatial prediction module 1001 , a temporal prediction module 1102 and a disparity prediction module 1103 . The results generated by the spatial prediction module 1001, the temporal prediction module 1102, and the disparity prediction module 1103 are input to the mixed region of interest generation module 1004, which performs corresponding bit allocation after identifying the saliency region. Transform and quantization module 1105 performs quantization according to the bit allocation determined by hybrid region of interest generation module 1004 , while entropy encoding module 1106 is used to encode each frame to generate compressed frame 1006 .

6.2基于关注区域的立体视频编码6.2 Stereoscopic Video Coding Based on Region of Interest

基于关注区域的压缩的其中一个概念在于，以有利于显著性区域的方式进行比特分配。所述混合关注区域检测方法可生成高质量高精确性的显著性映射。此外，为了提高视频压缩性能，还选用具有高压缩效率的高级视频标准HEVC。One of the concepts of region-of-interest based compression is to allocate bits in a way that favors saliency regions. The hybrid region-of-interest detection method can generate high-quality and high-accuracy saliency maps. In addition, in order to improve the video compression performance, HEVC, an advanced video standard with high compression efficiency, is also selected.

由于本发明的关注区域检测是基于8×8的分块，因此需要将所估算的显著性映射的尺寸缩减至与当前变换单元的尺寸相匹配，该现有变换单元的尺寸可选为32×32，16×16以及8×8。新的QP值可按照下式计算：Since the ROI detection of the present invention is based on 8×8 blocks, the size of the estimated saliency map needs to be reduced to match the size of the current transform unit, and the size of the existing transform unit can be selected as 32× 32, 16×16 and 8×8. The new QP value can be calculated as:

Q′＝max(Q-ψ·(S-ES)，0) (8)Q'=max(Q-ψ·(S-ES), 0) (8)

其中，

为由x265编码器所选择的原始QP值，in,

is the original QP value chosen by the x265 encoder,

为针对当前帧的尺寸缩减的显著性映射。

is the saliency map for the size reduction of the current frame.

如此，减小了含显著性区域的编码单元的QP值，而增大不含显著性区域的编码单元的QP值。参数ψ可由用户选择，并对显著性区域和非显著性区域之间的比特率分布进行控制：ψ的值越高，显著性区域的比特越多。In this way, the QP value of the coding unit containing the saliency region is decreased, and the QP value of the coding unit without the saliency region is increased. The parameter ψ is user-selectable and controls the bitrate distribution between salient and non-salient regions: the higher the value of ψ, the more bits in the saliency region.

图11为根据本发明实施方式的基于混合关注区域的立体视频压缩方法的例示流程图。如图11所示，该压缩方法包括如下步骤：FIG. 11 is an exemplary flowchart of a method for compressing a stereoscopic video based on a mixed region of interest according to an embodiment of the present invention. As shown in Figure 11, the compression method includes the following steps:

步骤1101：通过帧内预测，确定左视帧内的第一分块的纹理显著性值。优选地，根据高效视频编码(HEVC)的DC模式帧内预测的输出，确定所述纹理显著性值。Step 1101: Determine the texture saliency value of the first sub-block in the left-view frame through intra-frame prediction. Preferably, the texture saliency value is determined from the output of high efficiency video coding (HEVC) DC mode intra prediction.

步骤1102：通过运动估算，确定所述第一分块的运动显著性值。优选地，根据高效视频编码(HEVC)的运动估算的输出，确定所述运动显著性值。步骤1103：确定所述第一分块与右视帧内的相应的第二分块之间的视差。优选地，先在第一方向上对所述左视帧和右视帧进行修正，然后在与该第一方向垂直的第二方向上进行视差搜索。Step 1102: Determine the motion saliency value of the first sub-block through motion estimation. Preferably, the motion saliency value is determined from the output of high efficiency video coding (HEVC) motion estimation. Step 1103: Determine the disparity between the first sub-block and the corresponding second sub-block in the right-view frame. Preferably, the left-view frame and the right-view frame are first corrected in a first direction, and then a parallax search is performed in a second direction perpendicular to the first direction.

步骤1104：根据所述视差、纹理显著性值及运动显著性值，确定量化参数。优选地，通过将所述视差、纹理显著性值及运动显著性值与加权参数进行叠加，以确定混合立体显著性值。Step 1104: Determine a quantization parameter according to the disparity, texture saliency value and motion saliency value. Preferably, the hybrid stereo saliency value is determined by superimposing the disparity, texture saliency value and motion saliency value with a weighting parameter.

步骤1105：根据所述量化参数，对所述第一分块进行量化。其中，如果该分块的尺寸与当前变换单元的尺寸不同时，则将所述混合立体显著性映射的尺寸缩减至与该当前变换单元的尺寸相匹配，并计算新的量化参数。Step 1105: Quantize the first sub-block according to the quantization parameter. Wherein, if the size of the block is different from the size of the current transform unit, the size of the hybrid stereo saliency map is reduced to match the size of the current transform unit, and a new quantization parameter is calculated.

根据本发明实施方式，采用基于关注区域的视频编码方案，该方案采用基于视觉关注的比特分配法。其中，具体而言，在视频关注预测中将空间、时间及立体线索考虑在内。所述空间和时间对比特征直接提取自视频编码过程，无需引入额的外计算。此外，还采用子像素视差强度估算提高视觉显著性精度。如此，可在高效压缩立体视频的同时，不影响最终用户的感知质量。According to an embodiment of the present invention, a video coding scheme based on a region of interest is adopted, and the scheme adopts a bit allocation method based on visual attention. Specifically, spatial, temporal and stereo cues are taken into account in video attention prediction. The spatial and temporal contrast features are directly extracted from the video encoding process without introducing extra computation. In addition, sub-pixel disparity intensity estimation is also employed to improve visual saliency accuracy. In this way, stereoscopic video can be compressed efficiently without affecting the perceived quality of the end user.

上述各种模块、单元和部件可实施为：专用集成电路(ASIC)；电子电路；组合逻辑电路；现场可编程门阵列(FPGA)；执行代码的处理器(共享、专用或成组)；或者提供上述功能的其他合适硬件部件。所述处理器可以为Intel公司的微处理器，或者为IBM公司的大型计算机。The various modules, units, and components described above may be implemented as: application specific integrated circuits (ASICs); electronic circuits; combinational logic circuits; field programmable gate arrays (FPGAs); processors (shared, dedicated, or grouped) executing code; or Other suitable hardware components that provide the above functionality. The processor may be a microprocessor of Intel Corporation, or a mainframe computer of IBM Corporation.

需要注意的是，上述功能当中的一项或多项可由软件或固件实施，该软件或固件存储于存储器内并由处理器执行，或者存储于程序存储器内并由处理器执行。此外，该软件或固件可存储和/或传输于任何计算机可读介质之内，以供指令执行系统、装置或设备使用或与其结合使用，该指令执行系统、装置或设备例如为基于计算机的系统、含处理器的系统或者可从所述指令执行系统、装置或设备中获取指令并对其加以执行的其他系统。在本文语境中，“计算机可读介质”可以为任何可含有或存储供所述指令执行系统、装置或设备使用或用于与其结合使用的程序的介质。该计算机可读介质可包括，但不限于，电子、磁性、光学、电磁、红外或半导体系统、装置或设备，便携式计算机磁盘(磁性)，随机存取存储器(RAM)(磁性)，只读存储器(ROM)(磁性)，可擦除可编程只读存储器(EPROM)(磁性)，CD、CD-R、CD-RW、DVD、DVD-R或DVD-RW等便携式光盘，或袖珍闪存卡、安全数字卡、USB存储装置、记忆棒等闪存。It should be noted that one or more of the above functions may be implemented by software or firmware stored in memory and executed by a processor, or stored in program memory and executed by a processor. Furthermore, the software or firmware may be stored and/or transmitted within any computer-readable medium for use by or in connection with an instruction execution system, apparatus or device, such as a computer-based system , a processor-containing system, or other system that can retrieve and execute instructions from the instruction execution system, apparatus, or device. In this context, a "computer-readable medium" can be any medium that can contain or store a program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, devices or devices, portable computer disks (magnetic), random access memory (RAM) (magnetic), read only memory (ROM) (magnetic), Erasable Programmable Read-Only Memory (EPROM) (magnetic), a portable optical disc such as CD, CD-R, CD-RW, DVD, DVD-R or DVD-RW, or a compact flash card, Flash memory such as secure digital cards, USB storage devices, memory sticks, etc.

上述各种本发明实施方式仅为优选实施方式，并不旨在限制本发明的范围，而且本发明范围涵盖不脱离本发明精神和原则的任何修饰、等同及改进方案。The various embodiments of the present invention described above are only preferred embodiments and are not intended to limit the scope of the present invention, and the scope of the present invention covers any modifications, equivalents and improvements without departing from the spirit and principles of the present invention.

Claims

1. A method for compressing a stereoscopic video containing a left-view frame and a right-view frame, wherein the method comprises:

determining the texture saliency value of the first sub-block in the left-view frame by intra-frame prediction;

Determine the motion saliency value of the first sub-block through motion estimation of inter-frame prediction;

determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame;

determining a quantization parameter based on the disparity saliency value, the texture saliency value, and the motion saliency value; and

quantizing the first sub-block according to the quantization parameter;

Wherein, the mixed stereo saliency value of the first block is determined by weighting the parallax saliency value, the texture saliency value and the motion saliency value, and the following formula is used to determine the Quantization parameter Q':

Q'=max(Q-ψ·(S-ES), 0)

Among them, Q is the original quantization parameter value, S is the mixed stereo saliency value, and ψ is the parameter that controls the bit rate distribution between the saliency area and the non-saliency area.

2. The method of claim 1, further comprising:

The texture saliency value is determined from an output of DC mode intra prediction of high efficiency video coding.

3. The method of claim 1, further comprising:

A motion saliency value for the first partition is determined from the output of the motion estimation of the high efficiency video coding.

4. The method of claim 1, wherein the left-view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on the same size as the first partition.

5. The method of claim 4, further comprising:

The first sub-block is quantized according to the quantization parameter.

6. The method of claim 4, further comprising:

determining a hybrid stereo saliency map for the left-view frame;

reducing the size of the hybrid stereo saliency map to match the size of the transform unit;

determining a second quantization parameter for the transform unit; and

The transform unit is quantized according to the second quantization parameter.

7. The method of claim 1, wherein the left-view frame and the right-view frame are modified in a first direction, and the method further comprises:

The disparity saliency value is searched for in a second direction perpendicular to the first direction.

8. The method of claim 7, wherein the disparity saliency value comprises a non-integer value.

9. The method of claim 8, further comprising:

The disparity saliency value is determined from 1/4 pixel samples generated by sub-pixel motion estimation for high efficiency video coding.

10. A non-transitory computer-readable medium having computer-executable instructions stored thereon, the computer-executable instructions, when executed by a processor, can perform the following stereoscopic analysis of a left-view frame and a right-view frame: A method for compressing video, characterized in that the method includes:

Determine the motion saliency value of the first sub-block by motion estimation;

quantizing the first sub-block according to the quantization parameter;

Wherein, the mixed stereo saliency value of the first sub-block is determined by weighted summation of the parallax saliency value, the texture saliency value and the motion saliency value, and the following formula is used to determine the Quantization parameter Q':

Q'=max(Q-ψ·(S-ES), 0)

11. The computer-readable medium of claim 10, wherein the method further comprises:

12. The computer-readable medium of claim 10, wherein the method further comprises:

13. The computer-readable medium of claim 10, wherein the left-view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on the same partition as the first partition size.

14. The computer-readable medium of claim 13, wherein the method further comprises:

The first sub-block is quantized according to the quantization parameter.

15. The computer-readable medium of claim 13, wherein the method further comprises:

determining a hybrid stereo saliency map for the left-view frame;

determining a second quantization parameter for the transform unit; and

The transform unit is quantized according to the second quantization parameter.

16. The computer-readable medium of claim 10, wherein the left-view and right-view frames are modified in a first direction, and the method further comprises:

17. The computer-readable medium of claim 16, wherein the disparity saliency value comprises a non-integer value.

18. The computer-readable medium of claim 17, wherein the method further comprises: