Cross-validation depth map quality evaluation method combined with a JND model
Technical field
The present invention relates to an image quality evaluation method, and more particularly to a cross-validation depth map quality evaluation method combined with a JND (Just-Noticeable-Distortion) model.
Background art
In recent years, video technology has developed rapidly and many new applications have emerged, such as 3D video and free viewpoint video (FVV, Free Viewpoint Video). Compared with traditional two-dimensional video, 3D video provides depth information and brings a more lifelike visual experience. Depth maps play a fundamental role in many 3D video applications; for example, a depth map can be used to generate an image of an arbitrary new viewpoint by interpolation or extrapolation from existing viewpoints. In addition, high-quality depth maps help solve challenging problems in computer vision. The performance of many 3D video applications rests on the estimation or acquisition of accurate, high-quality depth maps, which can be obtained either by matching rectified color images or by using a depth camera. In stereo matching, occlusion and large homogeneous areas usually produce inaccurate depth maps; a depth camera avoids the intrinsic difficulties of stereo matching algorithms, but unavoidable sensor noise remains and affects the precision of the depth values and the shapes of objects.
One main direction of development of 3D video technology is the free viewpoint video system based on color plus depth. Its basic framework comprises acquisition, preprocessing, coding, transmission, decoding, virtual view rendering and display. A color-plus-depth free viewpoint video system allows the user to freely select a viewpoint at any position for viewing, enhancing human-machine interactivity. A key technology for realizing such a system is virtual view generation, which mainly overcomes the limited ability of cameras to capture real viewpoints and generates a virtual view at an arbitrary position. Two main factors influence virtual view quality: first, the quality of the depth map and of the corresponding color image; second, the virtual view rendering algorithm. At present, Depth Image Based Rendering (DIBR) is the virtual view generation technique most widely used in industry. In depth-map-based rendering, depth information is the key to generating a high-quality virtual view: depth errors lead to disparity errors, which cause pixel position offsets and object distortion in the virtual view and degrade the user's perception. Depth information represents the distance from the scene to the camera imaging plane, with the actual distance values quantized to [0, 255]. Because depth cameras are expensive, the depth maps currently used for testing are mostly obtained by depth estimation software. To promote adoption and reduce cost, the depth information used for virtual view rendering is not suited to being generated by depth estimation at the receiving end; it must be acquired or estimated at the transmitting end, encoded, and sent to the receiver. The limitations of depth map acquisition algorithms and of depth map coding therefore cause inaccurate depth estimation and depth compression distortion.
The core idea of depth-map-based rendering is to project the pixels of a reference image to the target virtual viewpoint using depth information and camera parameters, generally in two steps: first, the pixels of the original reference view are re-projected to their corresponding three-dimensional positions using their depth information; then, according to the position of the virtual view (camera translation, rotation parameters, etc.), these three-dimensional points are re-projected onto the virtual camera plane to obtain the pixels of the virtual view. During virtual view rendering, depth must be converted to disparity; the position of a reference pixel in the virtual view is obtained from the disparity, i.e. the depth value determines how far a reference-view pixel is shifted. If the depth values of adjacent pixels change sharply, a hole appears between the two pixels; the sharper the change, the larger the hole. Since depth values change strongly at foreground-background boundaries, holes usually appear there. When a background area occluded by a foreground object in the reference image becomes visible in the virtual image, a hole appears in the virtual image; when a background area not occluded in the reference image becomes invisible in the virtual image, an occlusion occurs.
Virtual view distortion mostly consists of pixel position offsets and object distortion in the virtual view, and the detected distorted regions are not all equally perceivable by the human eye. An image is composed of edge, texture and flat regions, and distortions of different amplitude in different regions affect human vision differently: regions of high texture complexity or with similar texture characteristics can often tolerate more distortion, while changes near edges are the most noticeable. Research in visual physiology and psychology has found that the characteristics of the human visual system and its masking effects play a very important role in image processing; when image distortion stays below a certain range, the human eye cannot perceive it. On this basis, the just-noticeable-distortion (JND, Just-Noticeable-Distortion) model was proposed. Common masking effects include: 1) luminance masking: the human eye judges the absolute brightness of an observed object poorly but judges relative brightness differences well, and its sensitivity to noise superimposed on bright regions is relatively high; 2) texture masking: the human visual system is significantly more sensitive to smooth regions of an image than to texture regions, and regions of higher texture complexity can often tolerate more distortion.
Because depth maps are so widely used, their quality evaluation becomes essential and can benefit many practical applications. For example, in a free viewpoint video system, detecting depth distortion can support depth enhancement, which further improves virtual view quality and gives viewers a better viewing experience. A simple approach to depth map quality evaluation is to compare the depth map under test with an undistorted reference depth map; this corresponds to full-reference depth quality measurement and can accurately measure the precision of the depth map. However, in most practical applications depth map errors are unavoidable and an undistorted reference depth map is usually unobtainable, so assessing depth maps with a no-reference evaluation method is more reasonable. Xiang et al. proposed a no-reference depth map quality evaluation scheme that detects errors by matching the edges of the color image and the depth map and evaluates depth map quality by computing a bad point rate; it agrees well with the quality of the rendered virtual image, but it considers only errors near edges and ignores the other, smooth regions, it detects only a fraction of the erroneous pixels, and the attributes of the scene and the error distribution strongly affect its performance. A depth map is not viewed directly; it serves as auxiliary information for rendering virtual views, so its quality should be evaluated from the perspective of its application.
Summary of the invention
The technical problem to be solved by the invention is to provide a cross-validation depth map quality evaluation method combined with a JND model that needs no undistorted reference depth map and can effectively improve the consistency between the evaluation result and the quality of the rendered virtual view.
The technical solution adopted by the present invention to solve the above technical problem is a cross-validation depth map quality evaluation method combined with a JND model, characterized by comprising the following steps:
1. Denote the depth map to be evaluated as D_tar and its corresponding color image as T_tar. Define another known viewpoint, different from the viewpoint of D_tar and T_tar, as the auxiliary viewpoint, and denote the color image of the auxiliary viewpoint as T_ref. Then convert the pixel value of every pixel of D_tar into a disparity value and map every pixel of T_tar into T_ref by 3D warping. Here D_tar, T_tar and T_ref each have M pixels in the vertical direction and N pixels in the horizontal direction.
2. Let E_tar denote a difference map of the same size as D_tar, and denote the pixel value of the pixel of E_tar at coordinate (x, y) as E_tar(x, y). When the auxiliary viewpoint lies to the left of the viewpoint of D_tar and T_tar, check whether y + d_tar,p(x, y) exceeds N: if so, set E_tar(x, y) = 0; otherwise let u = x, v = y + d_tar,p(x, y) and set E_tar(x, y) = |I_tar(x, y) − I_ref(u, v)|. When the auxiliary viewpoint lies to the right of the viewpoint of D_tar and T_tar, check whether y − d_tar,p(x, y) is less than 1: if so, set E_tar(x, y) = 0; otherwise let u = x, v = y − d_tar,p(x, y) and set E_tar(x, y) = |I_tar(x, y) − I_ref(u, v)|. Here 1 ≤ x ≤ M, 1 ≤ y ≤ N, 1 ≤ u ≤ M, 1 ≤ v ≤ N; d_tar,p(x, y) is the disparity value converted from the pixel value of the pixel of D_tar at (x, y); the symbol "| |" is the absolute value operator; I_tar(x, y) is the luminance component of the pixel of T_tar at (x, y); and I_ref(u, v) is the luminance component of the pixel of T_ref at (u, v).
3. Let C denote an occlusion mask image of the same size as D_tar, denote the pixel value of its pixel at (x, y) as C(x, y), and initialize every pixel of C to 0. Denote by N_(u,v) the total number of pixels of T_tar mapped by 3D warping to position (u, v) of T_ref. When N_(u,v) = 1, set C(x, y) = 0; when N_(u,v) > 1, set, for 1 ≤ i ≤ N_(u,v),

C(x_(u,v),i, y_(u,v),i) = 0 if D_tar(x_(u,v),i, y_(u,v),i) = max{D_tar(x_(u,v),1, y_(u,v),1), …, D_tar(x_(u,v),N_(u,v), y_(u,v),N_(u,v))}, and 1 otherwise.

Here N_(u,v) takes the value 0, 1 or more than 1; D_tar(x, y) is the pixel value of the pixel of D_tar at (x, y); max() returns the maximum; 1 ≤ x_(u,v),i ≤ M, 1 ≤ y_(u,v),i ≤ N; (x_(u,v),i, y_(u,v),i) is the coordinate in T_tar of the i-th of the N_(u,v) pixels mapped by 3D warping to position (u, v) of T_ref; and D_tar(x_(u,v),i, y_(u,v),i) is the pixel value of the pixel of D_tar at (x_(u,v),i, y_(u,v),i).
4. Use C to remove the occluded pixels of E_tar and obtain the de-occluded difference map, denoted E'_tar, whose pixel value at (x, y) is denoted E'_tar(x, y), with E'_tar(x, y) = E_tar(x, y) × (1 − C(x, y)).
5. Compute the texture estimation factor of every pixel of T_ref, denoting the texture estimation factor of the pixel at (u, v) as z(u, v), z(u, v) = max(z_h(u, v), z_v(u, v)). Here 1 ≤ u ≤ M, 1 ≤ v ≤ N; z_h(u, v) is the horizontal texture estimation factor of the pixel of T_ref at (u, v) and takes the value 1 or 0, z_h(u, v) = 1 meaning the pixel is a horizontal texture pixel and z_h(u, v) = 0 meaning it is not; z_v(u, v) is the vertical texture estimation factor of the pixel of T_ref at (u, v) and likewise takes the value 1 or 0, z_v(u, v) = 1 meaning the pixel is a vertical texture pixel and z_v(u, v) = 0 meaning it is not.
6. Let T denote a region label map of the same size as T_ref, denote the pixel value of its pixel at (u, v) as T(u, v), and initialize every pixel of T to 0. Detect the edge regions of T_ref with the Canny operator; if the pixel of T_ref at (u, v) belongs to an edge region, set T(u, v) = 1. If the texture estimation factor of the pixel of T_ref at (u, v) satisfies z(u, v) = 1 while T(u, v) = 0, the pixel is judged to belong to a texture region and T(u, v) is set to 2. Thus T(u, v) takes the value 0, 1 or 2: T(u, v) = 0 means the pixel of T_ref at (u, v) belongs to a flat region, T(u, v) = 1 means it belongs to an edge region, and T(u, v) = 2 means it belongs to a texture region.
7. Introduce a JND model based on the luminance masking and texture masking effects, and use it, together with the region each pixel of T_ref belongs to, to compute the error visibility threshold of every pixel of T_ref, denoting the threshold of the pixel at (u, v) as Th(u, v). Th(u, v) is obtained from the luminance masking effect LA(u, v) and the texture masking term f(bg(u, v), mg(u, v)). Here max() returns the maximum and min() the minimum; bg(u, v) is the average background luminance of the pixel of T_ref at (u, v); mg(u, v) is the maximum weighted average of the luminance changes around that pixel; LA(u, v) is the luminance masking effect of the pixel of T_ref at (u, v); f(bg(u, v), mg(u, v)) = mg(u, v) × α(bg(u, v)) + β(bg(u, v)), with α(bg(u, v)) = bg(u, v) × 0.0001 + 0.115 and β(bg(u, v)) = 0.5 − bg(u, v) × 0.01.
8. Let E denote a depth error map of the same size as D_tar, and denote the pixel value of its pixel at (x, y) as E(x, y). When E'_tar(x, y) = 0, E(x, y) = 0; when E'_tar(x, y) ≠ 0,

E(x, y) = 1 if E'_tar(x, y) > Th(u, v) with (u, v) = V(x, y), and 0 otherwise.

Here V(x, y) = (u, v) denotes the mapping from the coordinate (x, y) of a pixel of T_tar to the coordinate (u, v) of a pixel of T_ref: when the viewpoint of T_ref lies to the left of the viewpoint of T_tar, u = x and v = y + d_tar,p(x, y); when the viewpoint of T_ref lies to the right of the viewpoint of T_tar, u = x and v = y − d_tar,p(x, y).
9. Count the total number of pixels of E whose pixel value is 1, denoted numE; then compute the proportion of erroneous pixels in D_tar as the quality evaluation value of D_tar, denoted EPR, EPR = numE / (M × N).
In step 1, the pixel values of all pixels of D_tar are converted into disparity values as follows: for the pixel of D_tar at coordinate (x, y), the disparity value converted from its pixel value is denoted d_tar,p(x, y),

d_tar,p(x, y) = b × f × (D_tar(x, y) / 255 × (1 / Z_near − 1 / Z_far) + 1 / Z_far),

where 1 ≤ x ≤ M, 1 ≤ y ≤ N, b is the baseline distance between the cameras, f is the focal length of the cameras, Z_near is the nearest actual depth of field, Z_far is the farthest actual depth of field, and D_tar(x, y) is the pixel value of the pixel of D_tar at (x, y).
In step 5, z_h(u, v) and z_v(u, v) are obtained as follows:

5_1. Compute the horizontal differential signal of every pixel of T_ref, denoting the horizontal differential signal of the pixel at (u, v) as d_h(u, v), d_h(u, v) = I_ref(u, v+1) − I_ref(u, v), where I_ref(u, v+1) is the luminance component of the pixel of T_ref at (u, v+1).

5_2. Compute the characteristic sign of the horizontal differential signal of every pixel of T_ref, denoting the characteristic sign of d_h(u, v) as sym_dh(u, v), sym_dh(u, v) = sign(d_h(u, v)).

5_3. Compute z_h(u, v) from the intermediate variable d_hsym(u, v) = sym_dh(u, v) × sym_dh(u, v+1): z_h(u, v) = 1 when d_hsym(u, v) indicates a sign change between adjacent differential signals, and 0 otherwise, where sym_dh(u, v+1) is the characteristic sign of the horizontal differential signal of the pixel of T_ref at (u, v+1).

5_4. Compute the vertical differential signal of every pixel of T_ref, denoting the vertical differential signal of the pixel at (u, v) as d_v(u, v), d_v(u, v) = I_ref(u+1, v) − I_ref(u, v), where I_ref(u+1, v) is the luminance component of the pixel of T_ref at (u+1, v).

5_5. Compute the characteristic sign of the vertical differential signal of every pixel of T_ref, denoting the characteristic sign of d_v(u, v) as sym_dv(u, v), sym_dv(u, v) = sign(d_v(u, v)).

5_6. Compute z_v(u, v) from the intermediate variable d_vsym(u, v) = sym_dv(u, v) × sym_dv(u+1, v): z_v(u, v) = 1 when d_vsym(u, v) indicates a sign change between adjacent differential signals, and 0 otherwise, where sym_dv(u+1, v) is the characteristic sign of the vertical differential signal of the pixel of T_ref at (u+1, v).
Compared with the prior art, the advantages of the present invention are as follows:
1) the method for the present invention has fully considered that effect of the depth map in virtual viewpoint rendering, depth map are not used to directly
Viewing, and is to provide location of pixels offset information, therefore virtual view distortion caused by being distorted with depth is distorted come mark depths
Region is more reasonable.
2) the method for the present invention has furtherd investigate influence of the depth map distortion to virtual view quality, and depth distortion can cause profit
The offset of location of pixels and object distort in the virtual view drawn with the depth information, and the brightness value of respective pixel is wrong
Accidentally, the distortion of general depth is more serious, and the brightness value error of virtual view pixel is bigger, so as to by the bright of virtual view pixel
Error flag(s) of the error amount as corresponding depth pixel is spent, differential chart is obtained.
3) The method fully considers the occlusion of boundary pixels during virtual view rendering. After the pixels of the color image are mapped onto the auxiliary viewpoint by 3D warping, pixels near object boundaries that are closer to the imaging plane may block pixels farther from it. Since the distortion of occluded pixels has no influence on the quality of the final virtual view, the occluded pixels are marked to obtain an occlusion mask and their error marks are removed from the difference map, making the depth map quality evaluation result more consistent with the objective virtual view quality.
4) the method for the present invention has fully considered human-eye visual characteristic, by the coloured image on auxiliary view be divided into edge,
Three parts of texture and flat site, it is each to obtain different piece using the JND model based on brightness masking and texture masking effect
The error visual threshold value of pixel, in the differential chart after going to block by mapping after be less than the error mark of corresponding error visual threshold value
Note removal, obtains final depth error figure, depth map quality evaluation result is made to be more in line with human eye characteristic.
Description of the drawings
Fig. 1 is the overall block diagram of the method of the present invention;
Fig. 2 is a schematic diagram of the cross-validation process;
Fig. 3 is a schematic diagram of occlusion;
Fig. 4a is the depth map of the 2nd viewpoint of the Cones sequence estimated by the AdaptBP method;
Fig. 4b is the color image corresponding to the depth map of Fig. 4a;
Fig. 4c is the color image of the 3rd viewpoint of the Cones sequence;
Fig. 4d is the difference map of the depth map of Fig. 4a obtained after cross validation;
Fig. 5a is the occlusion mask image obtained by mapping the pixels of the depth map of Fig. 4a to the 3rd viewpoint of the Cones sequence;
Fig. 5b is the de-occluded difference map corresponding to the depth map of Fig. 4a;
Fig. 5c is the depth error map corresponding to the depth map of Fig. 4a.
Detailed description of embodiments
The present invention is described in further detail below with reference to the embodiments shown in the drawings.
The overall block diagram of the proposed cross-validation depth map quality evaluation method combined with a JND model is shown in Fig. 1; the method comprises the following steps:
1. Denote the depth map to be evaluated as D_tar and its corresponding color image as T_tar. Define another known viewpoint, different from the viewpoint of D_tar and T_tar, as the auxiliary viewpoint, and denote the color image of the auxiliary viewpoint as T_ref. Then convert the pixel value of every pixel of D_tar into a disparity value and map every pixel of T_tar into T_ref by 3D warping. Here D_tar, T_tar and T_ref each have M pixels in the vertical direction and N pixels in the horizontal direction.
In this specific embodiment, the pixel values of all pixels of D_tar are converted into disparity values in step 1 as follows: for the pixel of D_tar at coordinate (x, y), the disparity value converted from its pixel value is denoted d_tar,p(x, y),

d_tar,p(x, y) = b × f × (D_tar(x, y) / 255 × (1 / Z_near − 1 / Z_far) + 1 / Z_far),

where 1 ≤ x ≤ M, 1 ≤ y ≤ N, b is the baseline distance between the cameras, f is the focal length of the cameras, Z_near is the nearest actual depth of field, Z_far is the farthest actual depth of field, and D_tar(x, y) is the pixel value of the pixel of D_tar at (x, y).
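A minimal sketch of this conversion in Python/NumPy follows; the function name and array conventions are illustrative, not part of the method:

```python
import numpy as np

def depth_to_disparity(D_tar, b, f, z_near, z_far):
    """Convert the 8-bit depth values of D_tar to disparities (step 1).

    b: baseline distance between the cameras; f: focal length;
    z_near / z_far: nearest / farthest actual depth of field.
    """
    D = D_tar.astype(np.float64)
    # Quantized depth -> inverse metric depth 1/Z, then disparity d = b*f/Z.
    inv_z = (D / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return b * f * inv_z
```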
2. Let E_tar denote a difference map of the same size as D_tar, and denote the pixel value of the pixel of E_tar at coordinate (x, y) as E_tar(x, y). When the auxiliary viewpoint lies to the left of the viewpoint of D_tar and T_tar, check whether y + d_tar,p(x, y) exceeds N: if so, set E_tar(x, y) = 0; otherwise let u = x, v = y + d_tar,p(x, y) and set E_tar(x, y) = |I_tar(x, y) − I_ref(u, v)|. When the auxiliary viewpoint lies to the right of the viewpoint of D_tar and T_tar, check whether y − d_tar,p(x, y) is less than 1: if so, set E_tar(x, y) = 0; otherwise let u = x, v = y − d_tar,p(x, y) and set E_tar(x, y) = |I_tar(x, y) − I_ref(u, v)|. Here 1 ≤ x ≤ M, 1 ≤ y ≤ N, 1 ≤ u ≤ M, 1 ≤ v ≤ N; d_tar,p(x, y) is the disparity value converted from the pixel value of the pixel of D_tar at (x, y); the symbol "| |" is the absolute value operator; I_tar(x, y) is the luminance component of the pixel of T_tar at (x, y); and I_ref(u, v) is the luminance component of the pixel of T_ref at (u, v).
The mapping of all pixels of T_tar into T_ref by 3D warping in step 1, together with step 2, constitutes the cross-validation process, illustrated in Fig. 2, where T_l, T_r, D_l and D_r denote the left-view color image, right-view color image, left-view depth map and right-view depth map respectively. To obtain the difference map of the left-view depth map, cross validation is performed with the right-view color image as auxiliary information. Let the luminance value of the pixel of T_l at (x_l, y_l) be I_l1. Using the depth information of D_l, the pixel is mapped onto the right-view color image T_r by 3D warping. If it falls outside the image range, the pixel of L_d at (x_l, y_l) is assigned 0; if it is mapped to the pixel of T_r at (x_lr, y_l), whose luminance value is I_r1, the difference of the two luminance values, |I_l1 − I_r1|, is assigned to the pixel of L_d at (x_l, y_l). L_d is then the difference map of the left-view depth map. Similarly, to obtain the difference map of the right-view depth map, cross validation is performed with the left-view color image T_l as auxiliary information, yielding the difference map R_d of the right-view depth map.
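Under the same illustrative conventions (0-based indices; the auxiliary view taken to lie to the right of the target view unless stated otherwise), a sketch of the cross-validation difference map of step 2 could look as follows; pixels that map outside the image keep the value 0, as in the method:

```python
def difference_map(I_tar, I_ref, disp, ref_on_right=True):
    """Cross-validation difference map E_tar (step 2).

    I_tar, I_ref: M x N luminance planes of T_tar and T_ref;
    disp: M x N disparity map converted from D_tar.
    """
    M, N = I_tar.shape
    E_tar = np.zeros((M, N))
    for x in range(M):
        for y in range(N):
            d = int(round(disp[x, y]))
            v = y - d if ref_on_right else y + d   # column in the auxiliary view
            if 0 <= v < N:
                E_tar[x, y] = abs(float(I_tar[x, y]) - float(I_ref[x, v]))
            # else: mapped outside T_ref, E_tar(x, y) stays 0
    return E_tar
```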
The depth map of the 2nd viewpoint of the Cones sequence estimated by the AdaptBP method is used as the depth map to be evaluated, as shown in Fig. 4a; Fig. 4b is its corresponding color image; the color image of the 3rd viewpoint of the Cones sequence, shown in Fig. 4c, is used as the color image of the auxiliary viewpoint; the difference map obtained after cross validation is shown in Fig. 4d.
3. Let C denote an occlusion mask image of the same size as D_tar, denote the pixel value of its pixel at (x, y) as C(x, y), and initialize every pixel of C to 0. Denote by N_(u,v) the total number of pixels of T_tar mapped by 3D warping to position (u, v) of T_ref. When N_(u,v) = 1, set C(x, y) = 0; when N_(u,v) > 1, set, for 1 ≤ i ≤ N_(u,v),

C(x_(u,v),i, y_(u,v),i) = 0 if D_tar(x_(u,v),i, y_(u,v),i) = max{D_tar(x_(u,v),1, y_(u,v),1), …, D_tar(x_(u,v),N_(u,v), y_(u,v),N_(u,v))}, and 1 otherwise.

Here N_(u,v) takes the value 0, 1 or more than 1; D_tar(x, y) is the pixel value of the pixel of D_tar at (x, y); max() returns the maximum; 1 ≤ x_(u,v),i ≤ M, 1 ≤ y_(u,v),i ≤ N; (x_(u,v),i, y_(u,v),i) is the coordinate in T_tar of the i-th of the N_(u,v) pixels mapped by 3D warping to position (u, v) of T_ref; and D_tar(x_(u,v),i, y_(u,v),i) is the pixel value of the pixel of D_tar at (x_(u,v),i, y_(u,v),i).
Fig. 3 illustrates occlusion, with the foreground and background boundary points of the left reference view and of the right reference view marked from left to right. During 3D warping, the boundary points of the left reference view are mapped into the virtual view rendered from it, and likewise the boundary points of the right reference view are mapped into the virtual view rendered from it. In the left reference virtual view, the span between the mapped foreground boundary point and the mapped background boundary point receives both foreground pixels and background pixels of the left reference image. Likewise, in the right reference virtual view, the corresponding span receives both foreground pixels and background pixels of the right reference image. In these parts, foreground pixels occlude background pixels. In the difference map obtained after cross validation, the occluded background pixels would also be marked, but not because of depth value errors, so this part of the background pixels must be removed. For the pixels mapped to the same position after 3D warping, their depth values are compared: the pixel with the largest depth value is retained, and the remaining pixels are marked, yielding the occlusion mask image.
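The keep-the-largest-depth rule described above can be sketched as follows; the per-position competition via a dictionary is an implementation choice, not the patent's:

```python
def occlusion_mask(D_tar, disp, ref_on_right=True):
    """Occlusion mask C (step 3): among all pixels of T_tar mapped to the
    same position of T_ref, keep the one with the largest depth value
    (closest to the camera) and mark the others with 1."""
    M, N = D_tar.shape
    C = np.zeros((M, N), dtype=np.uint8)
    best = {}                                      # (u, v) -> (depth, (x, y))
    for x in range(M):
        for y in range(N):
            d = int(round(disp[x, y]))
            v = y - d if ref_on_right else y + d
            if not (0 <= v < N):
                continue
            if (x, v) not in best:
                best[(x, v)] = (D_tar[x, y], (x, y))
            elif D_tar[x, y] > best[(x, v)][0]:    # closer pixel wins
                C[best[(x, v)][1]] = 1             # previous winner is occluded
                best[(x, v)] = (D_tar[x, y], (x, y))
            else:
                C[x, y] = 1                        # this pixel is occluded
    return C
```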
Using the depth map of Fig. 4a as the depth map to be evaluated and the color image of Fig. 4c as the color image of the auxiliary viewpoint, the occlusion mask image shown in Fig. 5a is obtained.
4. Use C to remove the occluded pixels of E_tar and obtain the de-occluded difference map, denoted E'_tar, whose pixel value at (x, y) is denoted E'_tar(x, y), with E'_tar(x, y) = E_tar(x, y) × (1 − C(x, y)).
Fig. 5b is the difference map obtained after removing from the difference map of Fig. 4d the pixels occluded according to the occlusion mask image of Fig. 5a.
5. Compute the texture estimation factor of every pixel of T_ref, denoting the texture estimation factor of the pixel at (u, v) as z(u, v), z(u, v) = max(z_h(u, v), z_v(u, v)). Here 1 ≤ u ≤ M, 1 ≤ v ≤ N; z_h(u, v) is the horizontal texture estimation factor of the pixel of T_ref at (u, v) and takes the value 1 or 0, z_h(u, v) = 1 meaning the pixel is a horizontal texture pixel and z_h(u, v) = 0 meaning it is not; z_v(u, v) is the vertical texture estimation factor of the pixel of T_ref at (u, v) and likewise takes the value 1 or 0, z_v(u, v) = 1 meaning the pixel is a vertical texture pixel and z_v(u, v) = 0 meaning it is not.
In this specific embodiment, z_h(u, v) and z_v(u, v) in step 5 are obtained as follows:

5_1. Compute the horizontal differential signal of every pixel of T_ref, denoting the horizontal differential signal of the pixel at (u, v) as d_h(u, v), d_h(u, v) = I_ref(u, v+1) − I_ref(u, v), where I_ref(u, v+1) is the luminance component of the pixel of T_ref at (u, v+1).

5_2. Compute the characteristic sign of the horizontal differential signal of every pixel of T_ref, denoting the characteristic sign of d_h(u, v) as sym_dh(u, v), sym_dh(u, v) = sign(d_h(u, v)).

5_3. Compute z_h(u, v) from the intermediate variable d_hsym(u, v) = sym_dh(u, v) × sym_dh(u, v+1): z_h(u, v) = 1 when d_hsym(u, v) indicates a sign change between adjacent differential signals, and 0 otherwise, where sym_dh(u, v+1) is the characteristic sign of the horizontal differential signal of the pixel of T_ref at (u, v+1).

5_4. Compute the vertical differential signal of every pixel of T_ref, denoting the vertical differential signal of the pixel at (u, v) as d_v(u, v), d_v(u, v) = I_ref(u+1, v) − I_ref(u, v), where I_ref(u+1, v) is the luminance component of the pixel of T_ref at (u+1, v).

5_5. Compute the characteristic sign of the vertical differential signal of every pixel of T_ref, denoting the characteristic sign of d_v(u, v) as sym_dv(u, v), sym_dv(u, v) = sign(d_v(u, v)).

5_6. Compute z_v(u, v) from the intermediate variable d_vsym(u, v) = sym_dv(u, v) × sym_dv(u+1, v): z_v(u, v) = 1 when d_vsym(u, v) indicates a sign change between adjacent differential signals, and 0 otherwise, where sym_dv(u+1, v) is the characteristic sign of the vertical differential signal of the pixel of T_ref at (u+1, v).
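The exact expressions for z_h and z_v did not survive in the text above; the following sketch implements the sign-change reading of steps 5_1 to 5_6 (a pixel is treated as texture when the characteristic signs of adjacent differential signals disagree) and should be checked against the original formulas:

```python
def texture_factor(I_ref):
    """Texture estimation factor z (step 5), under the sign-change reading."""
    I = I_ref.astype(np.int32)
    M, N = I.shape
    dh = np.zeros((M, N), dtype=np.int32)          # horizontal differential d_h
    dv = np.zeros((M, N), dtype=np.int32)          # vertical differential d_v
    dh[:, :-1] = I[:, 1:] - I[:, :-1]
    dv[:-1, :] = I[1:, :] - I[:-1, :]
    sh, sv = np.sign(dh), np.sign(dv)              # characteristic signs sym_dh, sym_dv
    zh = np.zeros((M, N), dtype=np.uint8)
    zv = np.zeros((M, N), dtype=np.uint8)
    # Intermediate variables d_hsym / d_vsym: product of adjacent signs,
    # negative when the luminance oscillates (texture), positive on ramps.
    zh[:, :-1] = (sh[:, :-1] * sh[:, 1:] < 0).astype(np.uint8)
    zv[:-1, :] = (sv[:-1, :] * sv[1:, :] < 0).astype(np.uint8)
    return np.maximum(zh, zv)                      # z = 1 if texture in either direction
```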
6. Let T denote a region label map of the same size as T_ref, denote the pixel value of its pixel at (u, v) as T(u, v), and initialize every pixel of T to 0. Detect the edge regions of T_ref with the Canny operator; if the pixel of T_ref at (u, v) belongs to an edge region, set T(u, v) = 1. If the texture estimation factor of the pixel of T_ref at (u, v) satisfies z(u, v) = 1 while T(u, v) = 0, the pixel is judged to belong to a texture region and T(u, v) is set to 2. Thus T(u, v) takes the value 0, 1 or 2: T(u, v) = 0 means the pixel of T_ref at (u, v) belongs to a flat region, T(u, v) = 1 means it belongs to an edge region, and T(u, v) = 2 means it belongs to a texture region.
7. Introduce a JND model based on the luminance masking and texture masking effects, and use it, together with the region each pixel of T_ref belongs to, to compute the error visibility threshold of every pixel of T_ref, denoting the threshold of the pixel at (u, v) as Th(u, v). Th(u, v) is obtained from the luminance masking effect LA(u, v) and the texture masking term f(bg(u, v), mg(u, v)). Here max() returns the maximum and min() the minimum; bg(u, v) is the average background luminance of the pixel of T_ref at (u, v), calculated with a weighted low-pass operator; mg(u, v) is the maximum weighted average of the luminance changes around that pixel; LA(u, v) is the luminance masking effect of the pixel of T_ref at (u, v); f(bg(u, v), mg(u, v)) = mg(u, v) × α(bg(u, v)) + β(bg(u, v)), with α(bg(u, v)) = bg(u, v) × 0.0001 + 0.115 and β(bg(u, v)) = 0.5 − bg(u, v) × 0.01.
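The threshold formula itself did not survive in the text; the sketch below fills in bg, mg, LA and their combination with the classic Chou-Li JND model, whose published coefficients match the α and β given above. The window sizes and the max(f, LA) combination are therefore assumptions, and the patent's additional weighting of the threshold by the region type T(u, v) is omitted here:

```python
from scipy.ndimage import maximum_filter, uniform_filter

def jnd_threshold(I_ref):
    """Error visibility threshold Th (step 7), a Chou-Li-style sketch."""
    I = I_ref.astype(np.float64)
    bg = uniform_filter(I, size=5)                 # average background luminance bg
    gy, gx = np.gradient(I)                        # luminance gradients
    mg = maximum_filter(np.hypot(gx, gy), size=3)  # max weighted gradient mg (approx.)
    alpha = bg * 0.0001 + 0.115                    # alpha(bg), as given in the text
    beta = 0.5 - bg * 0.01                         # beta(bg), as given in the text
    f = mg * alpha + beta                          # texture-masking term f(bg, mg)
    la = np.where(bg <= 127,                       # luminance masking LA (assumed
                  17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0,   # Chou-Li form)
                  3.0 / 128.0 * (bg - 127.0) + 3.0)
    return np.maximum(f, la)                       # visibility threshold Th
```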
8. Let E denote a depth error map of the same size as D_tar, and denote the pixel value of its pixel at (x, y) as E(x, y). When E'_tar(x, y) = 0, E(x, y) = 0; when E'_tar(x, y) ≠ 0,

E(x, y) = 1 if E'_tar(x, y) > Th(u, v) with (u, v) = V(x, y), and 0 otherwise.

Here V(x, y) = (u, v) denotes the mapping from the coordinate (x, y) of a pixel of T_tar to the coordinate (u, v) of a pixel of T_ref: when the viewpoint of T_ref lies to the left of the viewpoint of T_tar, u = x and v = y + d_tar,p(x, y); when the viewpoint of T_ref lies to the right of the viewpoint of T_tar, u = x and v = y − d_tar,p(x, y).
Fig. 5c is the depth error map of Fig. 4a obtained after removing from Fig. 5b the pixels whose error marks fall below the corresponding error visibility thresholds.
9. Count the total number of pixels of E whose pixel value is 1, denoted numE; then compute the proportion of erroneous pixels in D_tar as the quality evaluation value of D_tar, denoted EPR, EPR = numE / (M × N).
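Steps 8 and 9 then reduce to a thresholded comparison and a count; the sketch below reuses the mapping V(x, y) of step 8 and returns both the depth error map E and the score EPR (as a percentage, matching the tables below):

```python
def depth_error_and_epr(E_masked, Th, disp, ref_on_right=True):
    """Depth error map E and quality score EPR (steps 8 and 9)."""
    M, N = E_masked.shape
    E = np.zeros((M, N), dtype=np.uint8)
    for x in range(M):
        for y in range(N):
            if E_masked[x, y] == 0:                # removed or error-free pixel
                continue
            d = int(round(disp[x, y]))
            v = y - d if ref_on_right else y + d   # V(x, y) = (x, v)
            if 0 <= v < N and E_masked[x, y] > Th[x, v]:
                E[x, y] = 1                        # error exceeds the JND threshold
    return E, 100.0 * E.sum() / (M * N)            # EPR as a percentage
```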
To test the performance of the method, depth maps estimated by a variety of different algorithms provided with the Middlebury database were used. Four scenes were selected: "Tsukuba", "Venus", "Teddy" and "Cones". For each scene, the depth maps of the 2nd viewpoint estimated by nine different types of stereo matching algorithms were used, giving an evaluation database of 36 depth maps in total. The nine stereo matching algorithms are: AdaptBP, WarpMat, P-LinearS, VSW, BPcompressed, Layered, SNCC, ReliabilityDP and Infection.
Table 1 gives, for "Tsukuba", "Venus", "Teddy" and "Cones" of the evaluation database, the values of the full-reference objective quality index PBMP (Percentage of Bad Matching Pixels). PBMP computes errors by comparing the estimated depth map with the undistorted reference depth map: if the disparity error of a pixel exceeds one pixel width, that pixel is counted as erroneous. Since it uses the undistorted depth map as reference, PBMP is an accurate and reliable full-reference index.
Table 1  PBMP (%) of the depth maps in the evaluation database

Method          Tsukuba   Venus   Teddy   Cones
AdaptBP         1.37      0.21    7.06    7.92
WarpMat         1.35      0.24    9.30    8.47
P-LinearS       1.67      0.89    12.00   8.44
VSW             1.88      0.81    13.3    8.85
BPcompressed    3.63      1.89    13.9    9.85
Layered         1.87      1.85    14.3    14.70
SNCC            6.08      1.73    11.10   9.02
ReliabilityDP   3.39      3.48    16.90   19.90
Infection       9.54      5.53    25.10   21.30
Table 2 gives the quality evaluation values obtained by the method of the present invention for "Tsukuba", "Venus", "Teddy" and "Cones" of the evaluation database. Table 3 gives the correlation coefficients between the evaluation results of the method and the full-reference index PBMP; the correlation coefficients measure the degree of consistency between the two, and values of Pearson's coefficient and of the linear regression coefficient closer to 1 are better. Table 3 shows that the results of the method agree well with PBMP, indicating that the method can accurately detect depth errors and evaluate depth map quality.

Table 2  Quality evaluation values EPR (%) of the depth maps in the evaluation database
Table 3  Correlation between the quality evaluation value EPR and PBMP

                               Tsukuba   Venus   Teddy   Cones
Pearson's coefficient          0.94      0.90    0.84    0.97
Linear regression coefficient  0.89      0.80    0.71    0.93
Table 4 gives the correlation coefficients between the evaluation results of the method and virtual view quality, where virtual view quality is measured with the objective index mean squared error (MSE). Because virtual view synthesis is based on the depth map, worse depth quality leads to more errors in the virtual view; MSE should therefore increase with the quality evaluation value EPR, and the linear regression coefficient between MSE and EPR expresses the accuracy of the measure. For "Tsukuba", "Venus", "Teddy" and "Cones", the linear regression coefficients between EPR and MSE all exceed 0.75; for "Tsukuba" in particular, the coefficient exceeds 0.92. This shows that the quality evaluation value EPR agrees well with the quality of the virtual view.
Table 4  Correlation between the quality evaluation value EPR and virtual view quality

                               Tsukuba   Venus   Teddy   Cones
Linear regression coefficient  0.93      0.76    0.84    0.91