CN114842063A - Depth map optimization method, apparatus, device, and storage medium

Depth map optimization method, apparatus, device, and storage medium

Info

Publication number: CN114842063A
Application number: CN202110131227.XA
Authority: CN (China)
Prior art keywords: depth, map, depth map, segmentation, rgb
Other languages: Chinese (zh)
Inventor: 程载熙
Current/Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Legal status: Pending

Classifications

    • G06T7/55 Depth or shape recovery from multiple images
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/11 Region-based segmentation
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a depth map optimization method, apparatus, device, and storage medium, relating to the field of image processing. The method includes: acquiring a depth map and an RGB image corresponding to a real scene; segmenting the RGB image to obtain a segmentation map corresponding to the RGB image; determining the effective depths in the depth map according to the segmentation map; and fusing the effective depths in the depth map with the segmentation map to obtain an optimized depth map corresponding to the real scene. The method addresses problems such as holes, noise, and environmental interference in depth maps captured by depth cameras, and outputs an optimized depth map of better quality and higher resolution with low power consumption and low latency.

Description

Depth map optimization method, apparatus, device, and storage medium

Technical Field

Embodiments of the present application relate to the field of image processing, and in particular to a depth map optimization method, apparatus, device, and storage medium.

Background

Augmented reality (AR) technology seamlessly fuses virtual information with real scenes. The core of AR is this fusion, which involves handling the occlusion between the virtual information and the real scene. To perform virtual-real occlusion processing, the depth information of the real scene and of the virtual information must be obtained; the occlusion effect between them is then rendered according to both sets of depth information. Since virtual information is generated by computation and simulation, its depth information is known. One key to virtual-real occlusion processing is therefore how to obtain the depth information of the real scene.

At present, the depth information of a real scene is generally obtained by capturing a depth map of the scene with a depth camera (also called a depth sensor); the depth map contains the scene's depth information.

However, the depth map of a real scene obtained by a depth camera is of poor quality and low resolution.

Summary

Embodiments of the present application provide a depth map optimization method, apparatus, device, and storage medium, which address problems such as holes, noise, and environmental interference in depth maps captured by depth cameras, and output an optimized depth map of better quality and higher resolution with low power consumption and low latency.

In a first aspect, an embodiment of the present application provides a depth map optimization method. The method includes: acquiring a depth map and an RGB image corresponding to a real scene; segmenting the RGB image to obtain a segmentation map corresponding to the RGB image; determining the effective depths in the depth map according to the segmentation map; and fusing the effective depths in the depth map with the segmentation map to obtain an optimized depth map corresponding to the real scene.

This depth optimization method preprocesses the RGB image captured by a color camera and the depth map captured by a depth camera, segments the preprocessed RGB image to obtain the segmentation map corresponding to the RGB image, and then fuses the depth map with that segmentation map, yielding an optimized depth map of better quality and higher resolution. At the same time, obtaining the optimized depth map through this method achieves lower latency and power consumption.

For example, depth maps captured by depth cameras suffer from many problems such as holes, noise, and environmental interference; after a captured depth map is optimized with this method, its quality is improved.

As another example, a TOF depth camera generally has a resolution of 240*180; when the application scene requires depth out to 5 meters (or farther), the TOF depth camera needs strong exposure, and high exposure at high resolution inevitably leads to high power consumption. Because this depth map optimization method can raise the resolution of the depth map, the resolution (e.g., 48*30) and frame rate (e.g., 10 fps) at which the TOF depth camera captures depth maps can be reduced to achieve low power consumption. In addition, since the method needs no dedicated super-resolution algorithm for the depth map, the processing is lightweight and simple, achieving low latency.

Optionally, determining the effective depths in the depth map according to the segmentation map includes: traversing the depth map pixel by pixel, obtaining the information of the first object instance corresponding to each pixel in the depth map, and obtaining, for each first object instance in the depth map, the sum of its effective depths and the number of pixels with effective depth. Fusing the effective depths in the depth map with the segmentation map to obtain the optimized depth map corresponding to the real scene includes: determining the average depth of each first object instance from the sum of its effective depths and the number of pixels with effective depth; and upsampling the segmentation map to a first resolution and filling each first object instance with its average depth to obtain the optimized depth map corresponding to the real scene.

Optionally, the segmentation map is a portrait segmentation map, and the first object instance is a portrait instance.

The embodiments of the present application are equally applicable to optimizing the depths of other objects (such as animals, trees, etc.) in the depth map. In other words, the embodiments can be extended to optimize depth maps in more scenes, not limited to portrait depth.

Optionally, before determining the effective depths in the depth map according to the segmentation map, the method further includes: scaling the segmentation map to the same size as the depth map.

Optionally, before segmenting the RGB image, the method further includes: aligning and normalizing the RGB image and the depth map.

Optionally, segmenting the RGB image to obtain the segmentation map corresponding to the RGB image includes: running inference on the RGB image with a trained deep neural network model to obtain a mask map corresponding to the RGB image; and performing connected-region segmentation on the mask map to obtain the segmentation map.

Optionally, the depth map is a depth map captured by a depth camera or a monocular-estimated depth map, where the depth camera is any one of a structured light depth camera, a time-of-flight depth camera, and a binocular depth camera.

That is, the depth map optimization method can optimize both depth maps captured by a depth camera and monocular-estimated depth maps.

In a second aspect, an embodiment of the present application provides a depth map optimization apparatus, which can be used to implement the method described in the first aspect. The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions, for example, an acquisition module, a segmentation module, and a multi-information fusion module.

The acquisition module is configured to acquire a depth map and an RGB image corresponding to a real scene; the segmentation module is configured to segment the RGB image to obtain a segmentation map corresponding to the RGB image; and the multi-information fusion module is configured to determine the effective depths in the depth map according to the segmentation map, and to fuse the effective depths in the depth map with the segmentation map to obtain an optimized depth map corresponding to the real scene.

Optionally, the multi-information fusion module is specifically configured to: traverse the depth map pixel by pixel, obtain the information of the first object instance corresponding to each pixel in the depth map, and obtain, for each first object instance in the depth map, the sum of its effective depths and the number of pixels with effective depth; determine the average depth of each first object instance from the sum of its effective depths and the number of pixels with effective depth; and upsample the segmentation map to a first resolution and fill each first object instance with its average depth to obtain the optimized depth map corresponding to the real scene.

Optionally, the segmentation map is a portrait segmentation map, and the first object instance is a portrait instance.

Optionally, before determining the effective depths in the depth map according to the segmentation map, the multi-information fusion module is further configured to scale the segmentation map to the same size as the depth map.

Optionally, the apparatus further includes a preprocessing module configured to align and normalize the RGB image and the depth map.

Optionally, the segmentation module is specifically configured to run inference on the RGB image with a trained deep neural network model to obtain a mask map corresponding to the RGB image, and to perform connected-region segmentation on the mask map to obtain the segmentation map.

Optionally, the depth map is a depth map captured by a depth camera or a monocular-estimated depth map, where the depth camera is any one of a structured light depth camera, a time-of-flight depth camera, and a binocular depth camera.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory for storing processor-executable instructions; when the processor is configured to execute the instructions, the electronic device implements the depth map optimization method described in the first aspect.

The electronic device may be a mobile terminal such as a mobile phone, tablet computer, wearable device, vehicle-mounted device, AR/VR device, notebook computer, ultra-mobile personal computer, netbook, or personal digital assistant.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by an electronic device, the electronic device implements the depth map optimization method described in the first aspect.

In a fifth aspect, an embodiment of the present application further provides a computer program product including computer-readable code which, when run on an electronic device, causes the electronic device to implement the depth map optimization method described in the first aspect.

For the beneficial effects of the second through fifth aspects, refer to the description of the first aspect; they are not repeated here.

It should be understood that the description of technical features, technical solutions, beneficial effects, or similar language in this application does not imply that all of these features and advantages can be realized in any single embodiment. Rather, a description of a feature or beneficial effect means that at least one embodiment includes that specific technical feature, technical solution, or beneficial effect. Descriptions of technical features, technical solutions, or beneficial effects in this specification therefore do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions, and beneficial effects described in the embodiments may be combined in any suitable manner. Those skilled in the art will understand that an embodiment can be implemented without one or more of the specific technical features, technical solutions, or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in specific embodiments that do not embody all of the embodiments.

Brief Description of the Drawings

FIG. 1 shows a schematic interface diagram of an AR application;

FIG. 2 shows a schematic structural diagram of a terminal device provided by an embodiment of the present application;

FIG. 3 shows a schematic flowchart of a depth map optimization method provided by an embodiment of the present application;

FIG. 4 shows a schematic logic diagram of the depth map optimization method provided by an embodiment of the present application;

FIG. 5 shows a schematic diagram of the effect of the depth map optimization method provided by an embodiment of the present application;

FIG. 6 shows a schematic structural diagram of a depth map optimization apparatus provided by an embodiment of the present application;

FIG. 7 shows another schematic structural diagram of the depth map optimization apparatus provided by an embodiment of the present application.

Detailed Description

Augmented reality (AR) technology seamlessly fuses virtual information with real scenes. It draws on a wide range of techniques, including multimedia, 3D modeling, real-time tracking and registration, intelligent interaction, and sensing: computer-generated text, images, 3D models, music, video, and other virtual information are simulated and applied to the real scene, where the two kinds of information complement each other, thereby "augmenting" the real scene.

Taking an AR application installed on a mobile phone as an example, FIG. 1 shows a schematic interface of an AR application. As shown in FIG. 1, an AR application running on a mobile phone can fuse the image of the real scene captured by the camera with synthesized virtual information and display the result on the phone's screen. The virtual information may include virtual text, pictures, videos, and the like.

As can be seen, the core of AR is the fusion of virtual information with the real scene, which involves handling the occlusion between them. For example, there is a certain spatial relationship between the virtual information and the real scene; from the user's perspective, or in terms of the displayed result, this spatial relationship is the occlusion relationship between them. When the virtual information is a virtual object, the occlusion relationship between the virtual object and the objects in the real scene may be that the virtual object occludes an object in the real scene, or that an object in the real scene occludes the virtual object. Therefore, in the process of fusing virtual information with the real scene, this occlusion relationship must be taken into account and the corresponding virtual-real occlusion processing performed.

To perform virtual-real occlusion processing, the depth information of the real scene and of the virtual information must be obtained; the occlusion effect between them is then rendered according to both sets of depth information. Since virtual information is generated by computation and simulation, its depth information is known. One key to virtual-real occlusion processing is therefore how to obtain the depth information of the real scene.

At present, the depth information of a real scene is generally obtained by capturing a depth map of the scene with a depth camera (also called a depth sensor); the depth map contains the scene's depth information. Depth cameras fall into two categories: active and passive.

Active depth cameras mainly include structured light depth cameras and time-of-flight (TOF) depth cameras. A structured light depth camera uses a near-infrared laser to project light with certain structural features onto the photographed objects (i.e., the objects in the real scene), which a dedicated infrared camera then captures. Light with such structure produces different image phase information in regions at different depths; a computing unit converts these structural changes into depth information, giving the depth map corresponding to the real scene. A TOF depth camera emits modulated light pulses through an infrared transmitter; after the pulses are reflected by objects in the real scene, a receiver picks up the returning pulses, and the distance between the TOF depth camera and the objects is computed from the round-trip time of the pulses, giving the depth map corresponding to the real scene.
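
For reference, the standard time-of-flight relation, made explicit here although the original does not spell it out, is d = c·Δt/2, where d is the distance to the object, c is the speed of light, and Δt is the measured round-trip time of the pulse; the factor of 2 accounts for the pulse traveling to the object and back.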

Passive depth cameras mainly include binocular depth cameras. Based on the parallax principle, a binocular depth camera uses imaging devices to capture two images of the objects in the real environment from different positions, matches image features between the two images, and computes the positional offset between corresponding feature points of the two images to obtain the depth map corresponding to the real scene.

However, the depth maps of real scenes obtained by these depth cameras are of poor quality. For example, a structured light depth camera is easily disturbed by ambient light (such as sunlight), which degrades the depth map. The depth map acquired by a TOF depth camera carries a lot of noise, such as multi-path interference (MPI) noise, crosstalk, and scattering; and because it emits active light, depth is hard to compute for low-reflectivity objects in the real scene, such as black hair. The depth map obtained by a binocular depth camera depends on the quality of the captured images, and depth is hard to compute in bright, dark, or textureless regions of the images.

In addition, the resolution of the depth map of a real scene obtained by a depth camera is also low. For example, a common depth camera produces a depth map with a resolution of 240*180, whereas the color image (i.e., the RGB image) of the real scene used in virtual-real occlusion processing generally has a much higher resolution, such as 1920*1080; the depth map therefore also needs super-resolution processing.

Given the poor quality and low resolution of the depth maps of real scenes currently obtained by depth cameras, an embodiment of the present application provides a depth map optimization method. The method includes: acquiring the RGB image captured by a color camera and the depth map captured by a depth camera; preprocessing the RGB image and the depth map; segmenting the preprocessed RGB image to obtain the segmentation map corresponding to the RGB image; and fusing the depth map with the segmentation map corresponding to the RGB image to obtain an optimized depth map. With this method, the depth map captured by the depth camera can be optimized into a depth map of better quality and higher resolution.

It can be understood that the RGB image captured by the color camera and the depth map captured by the depth camera are the RGB image and the depth map corresponding to the real scene.

Exemplarily, the method may be applied to a terminal device. The terminal device may be a mobile terminal such as a mobile phone, tablet computer, wearable device, vehicle-mounted device, augmented reality (AR)/virtual reality (VR) device, notebook computer, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); the embodiments of the present application do not limit the specific type of the terminal device.

The embodiments of the present application are described in detail below with reference to the accompanying drawings, taking a mobile phone as the terminal device.

It should be noted that in the description of this application, "at least one" means one or more, and "a plurality of" means two or more. Words such as "first" and "second" are used only to distinguish descriptions and do not specially limit a particular feature. "And/or" describes an association between objects and covers three relationships; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.

Taking a mobile phone as the terminal device, FIG. 2 shows a schematic structural diagram of the terminal device provided by an embodiment of the present application. As shown in FIG. 2, the mobile phone may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headphone interface 270D, a sensor module 280, keys 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a subscriber identification module (SIM) card interface 295, and the like.

The processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices or may be integrated into one or more processors.

The controller may be the nerve center and command center of the mobile phone. The controller can generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.

A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache, which may hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs the instruction or data again, it can be fetched directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 210, thereby improving system efficiency.

In some embodiments, the processor 210 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface.

The external memory interface 220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone. The external memory card communicates with the processor 210 through the external memory interface 220 to implement data storage, for example saving files such as music and videos on the external memory card.

The internal memory 221 may be used to store computer-executable program code, which includes instructions. The processor 210 executes the instructions stored in the internal memory 221 to perform the various functional applications and data processing of the mobile phone. The internal memory 221 may include a program storage area and a data storage area. The program storage area may store the operating system and the applications required for at least one function (such as sound playback or image playback). The data storage area may store data created during use of the phone (such as image data and the phone book). In addition, the internal memory 221 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or universal flash storage (UFS).

The charging management module 240 is used to receive charging input from a charger. While charging the battery 242, the charging management module 240 can also supply power to the phone through the power management module 241. The power management module 241 connects the battery 242, the charging management module 240, and the processor 210, and can also receive input from the battery 242 to power the phone.

The wireless communication functions of the phone can be implemented through the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and the like. The antenna 1 and the antenna 2 transmit and receive electromagnetic wave signals. Each antenna in the phone can cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization; for example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antennas may be used in combination with a tuning switch.

The phone can implement audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the headphone interface 270D, the application processor, and the like.

The sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, a barometric pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, and the like.

The display screen 294 is used to display images, videos, and the like, and includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the phone may include 1 or N display screens 294, where N is a positive integer greater than 1. For example, the display screen 294 may be used to display a photo-taking interface, a photo-playback interface, and the like.

The phone implements the display function through the GPU, the display screen 294, and the application processor. The GPU is a microprocessor for image processing, connecting the display screen 294 and the application processor, and performs the mathematical and geometric calculations used for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.

It can be understood that the structure shown in FIG. 2 does not specifically limit the mobile phone. In some embodiments, the phone may include more or fewer components than shown in FIG. 2, combine some components, split some components, or arrange the components differently. Alternatively, some of the components shown in FIG. 2 may be implemented in hardware, software, or a combination of software and hardware.

In addition, when the terminal device is another mobile terminal such as a tablet computer, wearable device, vehicle-mounted device, AR/VR device, notebook computer, UMPC, netbook, or PDA, the specific structure of that device can also follow FIG. 2. Exemplarily, other terminal devices may add or remove components on the basis of the structure given in FIG. 2; these are not enumerated here.

FIG. 3 shows a schematic flowchart of the depth map optimization method provided by an embodiment of the present application. As shown in FIG. 3, the method may include S301-S304.

S301. Acquire the RGB image captured by a color camera and the depth map captured by a depth camera.

The depth camera may be any one of a structured light depth camera, a time-of-flight (TOF) depth camera, and a binocular depth camera, or another type of depth camera; this is not limited here.

S302. Preprocess the RGB image and the depth map.

For example, the fields of view (FOV) of the RGB image and the depth map can be aligned, and preprocessing such as normalization can be performed, yielding the preprocessed RGB image and depth map.

S303. Segment the preprocessed RGB image to obtain a segmentation map.

Exemplarily, the preprocessed RGB image can be fed into a deep convolutional neural network (DCNN) model for portrait segmentation inference; the DCNN model performs portrait mask inference on the preprocessed RGB image and outputs a mask map. Connected-region segmentation is then performed on this mask map, yielding the portrait segmentation map corresponding to the RGB image. That is, the segmentation map obtained in S303 may be a portrait segmentation map.

Optionally, the above DCNN model is a neural network model trained offline. In other implementations, other network models for segmentation inference may also be used to segment the preprocessed RGB image; the method is not limited to a DCNN model for portrait segmentation inference.

S304. Fuse the segmentation map with the preprocessed depth map to obtain the optimized depth map.

Exemplarily, taking the segmentation map obtained in S303 to be a portrait segmentation map, fusing it with the preprocessed depth map may include: traversing the preprocessed depth map pixel by pixel and obtaining the portrait instance information corresponding to each pixel (in the embodiments of this application, a portrait instance or other object instance may be called a first object instance); a background pixel (not a portrait) is ignored, while for a portrait pixel the effective depth is recorded. For each portrait instance in the preprocessed depth map, the sum of its effective depths and the number of pixels with effective depth are accumulated, and the average depth of the instance is computed from them. The portrait segmentation map is then upsampled to a first resolution and each portrait instance is filled with its average depth, yielding a portrait depth map of better quality and higher resolution; this portrait depth map is the optimized depth map.

Taking a portrait segmentation map as the segmentation map, the depth map optimization method is illustrated in more detail below.

Referring to FIG. 4 and FIG. 5, FIG. 4 shows a schematic logic diagram of the depth map optimization method provided by an embodiment of the present application, and FIG. 5 shows a schematic diagram of its effect.

As shown in FIG. 4, in the embodiments of the present application, the preprocessing of the RGB image captured by the color camera and the depth map captured by the depth camera may be performed by a preprocessing module. After the mobile phone obtains the RGB image from the color camera and the depth map from the depth camera, it can feed both into the preprocessing module, which preprocesses them.

Suppose the RGB image is Rgb@1440*1080 (i.e., resolution 1440*1080) and the depth map is DepthLR@48*30 (i.e., resolution 48*30). The preprocessing module then processes the RGB image in the following steps. Rgb@1440*1080 is shown in FIG. 5(a), and DepthLR@48*30 in FIG. 5(b).

1) Scale Rgb@1440*1080 to 288*288, obtaining RgbSmall@288*288.

2) Rotate RgbSmall@288*288 by the rotation angle R to keep the person level, obtaining RgbSmallR@288*288.

3) Normalize the three channels of RgbSmallR@288*288 separately according to the normalization coefficients (r1, r2, g1, g2, b1, b2): subtract r1 from the R channel and divide by r2, subtract g1 from the G channel and divide by g2, and subtract b1 from the B channel and divide by b2, obtaining RgbSmallRN@288*288.

4) Perform channel conversion on RgbSmallRN@288*288, obtaining RgbSmallRNC@288*288.

Specifically, the data layout of RgbSmallRN@288*288 is HWC (H for height, W for width, C for channel), while the DCNN model expects CHW input; RgbSmallRN@288*288 is therefore channel-converted into RgbSmallRNC@288*288 with CHW layout.

After the above preprocessing of Rgb@1440*1080, the resulting RgbSmallRNC@288*288 is shown in FIG. 5(c).
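
The four RGB preprocessing steps above can be sketched in Python with NumPy and OpenCV. This is a minimal illustration under stated assumptions (RGB channel order, linear interpolation for the resize, rotation about the image center); it is not the patent's exact implementation.

```python
import cv2
import numpy as np

def preprocess_rgb(rgb, rot_deg, norm):
    """Sketch of steps 1)-4): resize, rotate, normalize, HWC -> CHW.

    rgb:     1440*1080 uint8 color image (Rgb@1440*1080), assumed RGB order
    rot_deg: rotation angle R that keeps the person level
    norm:    normalization coefficients (r1, r2, g1, g2, b1, b2)
    """
    # 1) Scale to 288*288 (RgbSmall@288*288).
    small = cv2.resize(rgb, (288, 288), interpolation=cv2.INTER_LINEAR)

    # 2) Rotate by R about the image center (RgbSmallR@288*288).
    m = cv2.getRotationMatrix2D((144, 144), rot_deg, 1.0)
    rotated = cv2.warpAffine(small, m, (288, 288))

    # 3) Per-channel normalization: subtract the offset, divide by the
    #    scale (RgbSmallRN@288*288).
    r1, r2, g1, g2, b1, b2 = norm
    x = rotated.astype(np.float32)
    x[..., 0] = (x[..., 0] - r1) / r2  # R channel
    x[..., 1] = (x[..., 1] - g1) / g2  # G channel
    x[..., 2] = (x[..., 2] - b1) / b2  # B channel

    # 4) Channel conversion HWC -> CHW, the layout the DCNN expects
    #    (RgbSmallRNC@288*288, shape (3, 288, 288)).
    return np.transpose(x, (2, 0, 1))
```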

The preprocessing module processes the depth map in the following steps.

1) Crop DepthLR@48*30, obtaining DepthLRCrop@40*30.

Specifically, owing to the data format of DepthLR, although its nominal resolution is 48*30, only 40*30 is actually usable, so the rightmost 8*30 must first be cropped off, giving DepthLRCrop@40*30.

2) Parse the data of DepthLRCrop@40*30, obtaining DepthLRCropT@40*30.

Specifically, the DepthLR data obtained in step 1) is generally stored as ushort values: each value occupies 16 bits, but the real depth is only the lower 13 bits. The data therefore needs to be parsed to extract the lower 13 bits of real depth, giving DepthLRCropT@40*30.

3) Align DepthLRCropT@40*30 with the preprocessed RGB image according to the Rgb-D alignment coefficient (scale), obtaining DepthAlign@54*40. The scale may also be called a scaling coefficient; here it is 1.348.

Specifically, the resolution of the aligned depth map computed from the Rgb-D alignment coefficient should be 54*40, so DepthLRCropT@40*30 needs to be padded into DepthAlign@54*40.

After the above preprocessing of DepthLR@48*30, the resulting DepthAlign@54*40 is shown in FIG. 5(d).
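
A corresponding sketch of the depth preprocessing steps, assuming the raw buffer is a 30*48 uint16 array (rows*columns), that the lower 13 bits of each value carry the real depth as stated above, and that the valid region is centered when padding to 54*40; the original does not specify where the padding goes.

```python
import numpy as np

RGBD_SCALE = 1.348  # Rgb-D alignment coefficient (scale) from the text

def preprocess_depth(depth_lr):
    """Sketch of steps 1)-3): crop, parse the 13-bit depth, align/pad.

    depth_lr: 30x48 uint16 array (DepthLR@48*30, rows x columns)
    """
    # 1) Only the left 40 columns are usable; crop off the rightmost 8*30
    #    (DepthLRCrop@40*30).
    crop = depth_lr[:, :40]

    # 2) Each ushort stores the real depth in its lower 13 bits
    #    (DepthLRCropT@40*30).
    parsed = crop & np.uint16(0x1FFF)

    # 3) Aligning with the preprocessed RGB image gives a 54*40 canvas
    #    (40 * 1.348 ≈ 54, 30 * 1.348 ≈ 40); pad the parsed depth into it
    #    (DepthAlign@54*40). Centering the valid region is an assumption.
    aligned = np.zeros((40, 54), dtype=np.uint16)
    top, left = (40 - 30) // 2, (54 - 40) // 2
    aligned[top:top + 30, left:left + 40] = parsed
    return aligned
```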

It can be understood that the parameters involved in the above preprocessing of the depth map and the RGB image, such as the rotation angle R, the Rgb-D alignment coefficient, and the normalization coefficients, can be determined from the acquired depth map and RGB image, for example from the camera parameters used by the color camera and the depth camera at capture time; this is not detailed further here.

After the RGB image and the depth map have each been preprocessed, the preprocessed RGB image can be segmented to obtain the portrait segmentation map corresponding to the RGB image. This segmentation can be performed by a DCNN segmentation module.

The DCNN segmentation module may include DCNN_PeopleSeg (a deep convolutional neural network model for portrait segmentation inference, trained offline) and a connected-region segmentation submodule. Taking the preprocessed RGB image RgbSmallRNC@288*288 as an example, RgbSmallRNC@288*288 is fed into DCNN_PeopleSeg for inference, producing MaskOut@288*288. The connected-region segmentation submodule then performs connected-region segmentation on MaskOut@288*288, producing Mask@288*288 in ushort format, in which the background has label 0 and portrait instance labels start from 1. For example, Mask@288*288 shown in FIG. 5(e) contains two portrait instances: the first (on the left) has label 1 and the second (on the right) has label 2.
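
The connected-region step can be illustrated with OpenCV's connected-component labeling, assuming the DCNN outputs a binary portrait mask (MaskOut@288*288) whose foreground pixels are nonzero; the segmentation model itself is trained offline and is not reproduced here.

```python
import cv2
import numpy as np

def label_instances(mask_out):
    """Sketch: turn the binary DCNN mask into labeled portrait instances.

    mask_out: 288x288 array, nonzero where the DCNN predicts a person
    returns:  288x288 uint16 label map (Mask@288*288), where the
              background is 0 and each connected portrait instance
              receives a label 1, 2, ...
    """
    binary = (mask_out > 0).astype(np.uint8)
    # cv2.connectedComponents assigns 0 to the background and 1..n to the
    # connected foreground regions, matching the labeling in the text.
    num_labels, labels = cv2.connectedComponents(binary)
    return labels.astype(np.uint16)
```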

After the portrait segmentation map corresponding to the RGB image has been obtained, it can be fused with the preprocessed depth map to obtain the optimized depth map. This fusion can be performed by a multi-information fusion module.

Taking the portrait segmentation map Mask@288*288 and the preprocessed depth map DepthAlign@54*40 as examples, the multi-information fusion module fuses them in the following steps.

1) Scale Mask@288*288 to the size of DepthAlign@54*40, obtaining MaskLR@54*40.

2) Traverse MaskLR@54*40 and DepthAlign@54*40 simultaneously to obtain, for each label i (i = 0, 1, 2, ...), the number of valid depth values num_i and the sum of those depth values value_i. Taking label 2 (the second portrait instance) as an example: traverse every pixel (x, y) of MaskLR@54*40 and DepthAlign@54*40, where x and y are the pixel coordinates; whenever MaskLR(x, y) equals 2 and DepthAlign(x, y) is greater than 0, add 1 to num_2 and add DepthAlign(x, y) to value_2. Traversing the whole image this way yields num_i and value_i for all labels.

3) Compute the average depth value of each label: divide value_i by num_i to obtain avg_i. The avg_i of each label i is then the average depth value corresponding to label i.

4) Initialize DepthT@288*288 and traverse all pixels (x, y) of Mask@288*288; whenever Mask(x, y) equals i, set DepthT(x, y) to avg_i. After all pixels have been traversed, DepthT@288*288 holding real depth values is obtained.

5) Scale DepthT@288*288 to the first resolution and output it; for example, when the first resolution is 640*480, the output is DepthHR@640*480, shown in FIG. 5(f).
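
The five fusion steps can be sketched as follows, again as an illustration under assumptions rather than the exact implementation; nearest-neighbor interpolation is assumed for both resizes so that instance labels and filled depths are not blended across boundaries.

```python
import cv2
import numpy as np

def fuse(mask, depth_align, out_size=(640, 480)):
    """Sketch of steps 1)-5): per-label average depth, fill, upsample.

    mask:        288x288 uint16 label map (Mask@288*288), 0 = background
    depth_align: 40x54 depth map (DepthAlign@54*40), 0 = no valid depth
    out_size:    first resolution as (width, height), e.g. 640*480
    """
    # 1) Scale the label map to the size of the aligned depth map
    #    (MaskLR@54*40).
    h, w = depth_align.shape
    mask_lr = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)

    # 2)+3) For each label i, count the valid depth pixels (num_i), sum
    # their depths (value_i), and average them (avg_i = value_i / num_i).
    avg = {}
    for i in np.unique(mask_lr):
        if i == 0:
            continue  # background pixels are ignored
        valid = (mask_lr == i) & (depth_align > 0)
        num_i = int(valid.sum())
        if num_i > 0:
            avg[int(i)] = float(depth_align[valid].sum()) / num_i

    # 4) Fill every pixel of instance i in the full-size mask with avg_i
    #    (DepthT@288*288).
    depth_t = np.zeros(mask.shape, dtype=np.float32)
    for i, avg_i in avg.items():
        depth_t[mask == i] = avg_i

    # 5) Upsample to the first resolution (DepthHR@640*480).
    return cv2.resize(depth_t, out_size, interpolation=cv2.INTER_NEAREST)
```

With the Mask@288*288 and DepthAlign@54*40 produced by the previous modules, fuse(mask, depth_align) would return the kind of DepthHR@640*480 shown in FIG. 5(f).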

相较于深度相机采集的深度图DepthLR@48*30而言,本申请实施例中,通过前述处理过程,得到的深度图DepthHR@640*480中,人体区域被填充了完整的深度值,人像实例的深度的精度更高,质量会更好,另外,DepthHR@640*480的分辨率也更高。从而,实现了对深度相机采集的深度图的优化。Compared with the depth map DepthLR@48*30 collected by the depth camera, in the embodiment of the present application, in the depth map DepthHR@640*480 obtained through the aforementioned processing process, the human body area is filled with the complete depth value, and the human body area is filled with the complete depth value. The depth of the instance is more accurate and the quality will be better, in addition, the resolution of DepthHR@640*480 is also higher. Thus, optimization of the depth map acquired by the depth camera is achieved.

可选地,本申请前述实施例中虽然是优化深度图中的人像深度为例进行说明的,但是,本申请实施例同样适用于优化深度图中的其他对象(如动物、树木等)的深度。换言之,可以将本申请实施例扩展应用至优化更多场景下的深度图,不限于人像深度。例如,可以将前述DCNN模型训练为用于其他对象分割推理的网络模型,以优化更多场景下的深度图。Optionally, although optimizing the depth of a portrait in a depth map is described as an example in the foregoing embodiments of the present application, the embodiments of the present application are also applicable to optimizing the depths of other objects (such as animals, trees, etc.) in the depth map. . In other words, the embodiments of the present application can be extended and applied to optimize depth maps in more scenes, not limited to portrait depth. For example, the aforementioned DCNN model can be trained as a network model for other object segmentation inference to optimize depth maps in more scenarios.

In addition, it should be noted that the depth map optimization method provided in the embodiments of the present application can be applied not only to the virtual-real fusion scenario of the AR technology mentioned in the foregoing embodiments, but also to more depth-map-based application scenarios such as large-aperture bokeh and collision detection, which is not limited herein.

Optionally, in some other embodiments of the present application, the depth map collected by the depth camera described in S301 may also be replaced with a depth map obtained in another manner, for example, a monocularly estimated depth map, which is likewise not limited in this application.

Further, in some current depth map processing approaches for mobile application scenarios (such as mobile phones, tablet computers, and wearable devices), virtual-real occlusion processing must be performed in real time: for the RGB image corresponding to every frame of the real scene captured by a mobile phone, a corresponding depth map needs to be collected by the depth camera for the occlusion processing, so this capture hardware accounts for a large share of the phone's power consumption. The depth map optimization method provided in the embodiments of the present application can also reduce the power consumption of the mobile phone to a certain extent, realizing virtual-real occlusion processing with low power consumption and low latency.

Take a mobile phone collecting the depth map of a real scene with a TOF depth camera (or TOF depth sensor) as an example. Generally, the resolution of a TOF depth camera is 240*180, and when the application scenario requires a depth of 5 meters (or more), the TOF depth camera requires strong exposure; high exposure combined with high resolution inevitably leads to high power consumption. Since the depth map optimization method provided in the embodiments of the present application can increase the resolution of the depth map, the resolution (e.g., 48*30) and frame rate (e.g., 10 fps) at which the TOF depth camera collects depth maps can be lowered to achieve low power consumption. In addition, because the method does not require a dedicated super-resolution algorithm for the depth map, the processing is more lightweight and simple, so a lower latency can be achieved.

To sum up, the depth map optimization method provided in the embodiments of the present application preprocesses the RGB image collected by the color camera and the depth map collected by the depth camera, segments the preprocessed RGB image to obtain the segmentation map corresponding to the RGB image, and then fuses the depth map with that segmentation map, so that an optimized depth map with better quality and higher resolution can be obtained. At the same time, obtaining the optimized depth map in this way achieves lower latency and power consumption.

That is, the depth map optimization method provided in the embodiments of the present application can solve problems such as holes, noise, and environmental influences in the depth map collected by the depth camera, and output an optimized depth map with better quality and higher resolution at low power consumption and low latency.

It should be understood that the depth map optimization method provided in the embodiments of the present application can also be applied to non-mobile devices such as servers and computers, achieving the same beneficial effects as described in the foregoing embodiments, which will not be repeated here.

Corresponding to the depth map optimization method described in the foregoing embodiments, an embodiment of the present application further provides a depth map optimization apparatus, which can be applied to a terminal device. The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions. For example, FIG. 6 shows a schematic structural diagram of the depth map optimization apparatus provided by an embodiment of the present application. As shown in FIG. 6, the depth map optimization apparatus may include an acquisition module 601, a segmentation module 602, a multi-information fusion module 603, and the like.

The acquisition module 601 is configured to acquire a depth map and an RGB map corresponding to a real scene. The segmentation module 602 is configured to segment the RGB map to obtain a segmentation map corresponding to the RGB map. The multi-information fusion module 603 is configured to determine the effective depth in the depth map according to the segmentation map, and to fuse the effective depth in the depth map with the segmentation map to obtain an optimized depth map corresponding to the real scene.

Optionally, the multi-information fusion module 603 is specifically configured to: traverse the depth map pixel by pixel and acquire the information of the first object instance corresponding to each pixel, obtaining, for each first object instance in the depth map, the sum of its effective depths and the number of pixels with effective depth; determine the average depth of each first object instance according to that sum and pixel count; and upsample the segmentation map to the first resolution, filling each first object instance region with its average depth, to obtain the optimized depth map corresponding to the real scene.

Optionally, the segmentation map is a portrait segmentation map, and the first object instance is a portrait instance.

Optionally, before determining the effective depth in the depth map according to the segmentation map, the multi-information fusion module 603 is further configured to scale the segmentation map to the same size as the depth map.

Optionally, FIG. 7 shows another schematic structural diagram of the depth map optimization apparatus provided by an embodiment of the present application. As shown in FIG. 7, the depth map optimization apparatus further includes a preprocessing module 604, configured to align and normalize the RGB map and the depth map.
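
As a non-authoritative sketch only (the application does not spell out the exact alignment and normalization operations), preprocessing module 604 might look as follows, reducing alignment to a remap of the depth image with a calibration-derived warp and normalization to scaling RGB values into [0, 1]; all names, parameters, and the network input size are assumptions:

import cv2
import numpy as np

def preprocess(rgb, depth, map_x, map_y, net_size=(288, 288)):
    """rgb: color frame; depth: raw low-resolution depth map; map_x/map_y:
    float32 pixel maps from a one-time RGB/depth camera calibration (assumed given)."""
    # Normalize the RGB frame and resize it to the segmentation network's input size.
    rgb_in = cv2.resize(rgb, net_size).astype(np.float32) / 255.0
    # Align: warp each depth pixel into the RGB camera's viewpoint.
    depth_align = cv2.remap(depth, map_x, map_y, interpolation=cv2.INTER_NEAREST)
    return rgb_in, depth_align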

Optionally, the segmentation module 602 is specifically configured to: perform inference on the RGB map by using a trained deep neural network model to obtain a mask map corresponding to the RGB map; and perform connected-region segmentation on the mask map to obtain the segmentation map.

For example, the segmentation module 602 may be the aforementioned DCNN segmentation module.
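
As a minimal sketch of the second stage, assuming OpenCV and a single "person" class probability map output by the trained network (the threshold value and all names are assumptions), the connected-region segmentation could be implemented as follows:

import cv2
import numpy as np

def split_instances(prob, thresh=0.5):
    """prob: per-pixel portrait probability map from the DCNN (assumed given);
    returns an instance label map in which 0 is background and 1, 2, ... are people."""
    binary = (prob > thresh).astype(np.uint8)
    # Each connected foreground region becomes one portrait instance label.
    num_labels, labels = cv2.connectedComponents(binary)
    return labels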

Optionally, the depth map is a depth map collected by a depth camera, or a monocularly estimated depth map; the depth camera includes any one of a structured light depth camera, a time-of-flight depth camera, and a binocular depth camera.

It should be understood that the division of the above apparatus into units or modules (hereinafter referred to as units) is merely a division by logical function; in an actual implementation, they may be fully or partially integrated into one physical entity, or physically separated. The units in the apparatus may all be implemented in the form of software invoked by a processing element, all in the form of hardware, or partly in the form of software invoked by a processing element and partly in the form of hardware.

For example, each unit may be a separately disposed processing element, or may be integrated into a chip of the apparatus; a unit may also be stored in a memory in the form of a program, to be invoked by a processing element of the apparatus to execute the function of that unit. In addition, these units may be fully or partially integrated together, or implemented independently. The processing element described here may also be called a processor and may be an integrated circuit with signal processing capability. During implementation, the steps of the above method or the above units may be implemented by an integrated logic circuit of hardware in the processor element, or in the form of software invoked by the processing element.

In one example, the units in the above apparatus may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.

For another example, when a unit in the apparatus is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a CPU or another processor capable of invoking programs. For yet another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SOC).

In one implementation, the units of the above apparatus that implement the corresponding steps of the above method may be implemented in the form of a processing element scheduling a program. For example, the apparatus may include a processing element and a storage element, where the processing element invokes a program stored in the storage element to execute the method described in the above method embodiments. The storage element may be a storage element on the same chip as the processing element, that is, an on-chip storage element.

In another implementation, the program for executing the above method may be on a storage element on a different chip from the processing element, that is, an off-chip storage element. In this case, the processing element invokes or loads the program from the off-chip storage element onto the on-chip storage element, so as to invoke and execute the method described in the above method embodiments.

For example, an embodiment of the present application may further provide an apparatus, such as an electronic device, which may include a processor and a memory for storing instructions executable by the processor. The processor is configured so that, when executing the instructions, the electronic device implements the depth map optimization method described in the foregoing embodiments. The memory may be located inside or outside the electronic device, and there may be one or more processors.

In yet another implementation, the units of the apparatus that implement the steps of the above method may be configured as one or more processing elements, where the processing elements may be integrated circuits, for example: one or more ASICs, one or more DSPs, one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits may be integrated together to form a chip.

For example, an embodiment of the present application further provides a chip that can be applied to the above electronic device. The chip includes one or more interface circuits and one or more processors, interconnected by lines; the processor receives and executes computer instructions from the memory of the electronic device through the interface circuit, so as to implement the depth map optimization described in the foregoing embodiments.

An embodiment of the present application further provides a computer program product, including computer-readable code, which, when run on an electronic device, causes the electronic device to implement the depth map optimization described in the foregoing embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and there may be other divisions in an actual implementation, for example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, such as a program. The software product is stored in a program product, such as a computer-readable storage medium, and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

For example, an embodiment of the present application may further provide a computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by an electronic device, the electronic device is caused to implement the depth map optimization described in the foregoing embodiments.

The above descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for depth map optimization, the method comprising:
acquiring a depth map and an RGB map corresponding to a real scene;
segmenting the RGB map to obtain a segmentation map corresponding to the RGB map;
determining effective depth in the depth map according to the segmentation map;
and according to the effective depth in the depth map and the segmentation map, fusing to obtain an optimized depth map corresponding to the real scene.
2. The method of claim 1, wherein determining the effective depth in the depth map from the segmentation map comprises:
traversing the depth map pixel by pixel, acquiring information of a first object instance corresponding to each pixel in the depth map, and obtaining the effective depth sum corresponding to each first object instance in the depth map and the number of pixels of the effective depth;
the obtaining of the optimized depth map corresponding to the real scene through fusion according to the effective depth in the depth map and the segmentation map includes:
determining the average depth of each first object instance according to the effective depth sum corresponding to each first object instance in the depth map and the number of pixels of the effective depth;
and upsampling the segmentation map to a first resolution, and filling the average depth of the first object instance for each first object instance to obtain an optimized depth map corresponding to the real scene.
3. The method of claim 2, wherein the segmentation map is a portrait segmentation map and the first object instance is a portrait instance.
4. The method of any of claims 1-3, wherein prior to determining the effective depth in the depth map from the segmentation map, the method further comprises:
scaling the segmentation map to the same size as the depth map.
5. The method according to any of claims 1-4, wherein prior to said segmenting said RGB map, said method further comprises:
aligning and normalizing the RGB map and the depth map.
6. The method according to any one of claims 1 to 5, wherein the segmenting the RGB map to obtain the segmentation map corresponding to the RGB map comprises:
performing inference on the RGB map by using a trained deep neural network model to obtain a mask map corresponding to the RGB map;
and carrying out connected region segmentation on the mask image to obtain the segmentation image.
7. The method according to any of claims 1-6, wherein the depth map is a depth map acquired by a depth camera or a monocular estimated depth map;
wherein the depth camera comprises: any one of a structured light depth camera, a time-of-flight depth camera, and a binocular depth camera.
8. A depth map optimization apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a depth map and an RGB map corresponding to a real scene;
the segmentation module is used for segmenting the RGB image to obtain a segmentation image corresponding to the RGB image;
and the multi-information fusion module is used for determining the effective depth in the depth map according to the segmentation map, and fusing to obtain the optimized depth map corresponding to the real scene according to the effective depth in the depth map and the segmentation map.
9. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor;
the processor is configured to, when executing the instructions, cause the electronic device to implement the method of any of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon, wherein
the computer program instructions, when executed by an electronic device, cause the electronic device to implement the method of any of claims 1-7.
CN202110131227.XA 2021-01-30 2021-01-30 Depth map optimization method, device, equipment and storage medium Pending CN114842063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110131227.XA CN114842063A (en) 2021-01-30 2021-01-30 Depth map optimization method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114842063A true CN114842063A (en) 2022-08-02

Family

ID=82560867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110131227.XA Pending CN114842063A (en) 2021-01-30 2021-01-30 Depth map optimization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114842063A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440662A (en) * 2013-09-04 2013-12-11 清华大学深圳研究生院 Kinect depth image acquisition method and device
CN105096311A (en) * 2014-07-01 2015-11-25 中国科学院科学传播研究中心 Technology for restoring depth image and combining virtual and real scenes based on GPU (Graphic Processing Unit)
KR20200080970A (en) * 2018-12-27 2020-07-07 포항공과대학교 산학협력단 Semantic segmentation method of 3D reconstructed model using incremental fusion of 2D semantic predictions
CN111462164A (en) * 2020-03-12 2020-07-28 深圳奥比中光科技有限公司 Foreground segmentation method and data enhancement method based on image synthesis
WO2020223487A1 (en) * 2019-04-30 2020-11-05 Google Llc Volumetric capture of objects with a single rgbd camera


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bian Xianzhang; Fei Haiping; Li Shiqiang: "Augmented Reality Image Registration Technology Based on Semantic Segmentation", Electronic Technology & Software Engineering, no. 23, 13 December 2018 (2018-12-13) *

Similar Documents

Publication Publication Date Title
US11663691B2 (en) Method and apparatus for restoring image
US9135678B2 (en) Methods and apparatus for interfacing panoramic image stitching with post-processors
US20110148868A1 (en) Apparatus and method for reconstructing three-dimensional face avatar through stereo vision and face detection
WO2021078001A1 (en) Image enhancement method and apparatus
US9589359B2 (en) Structured stereo
WO2022165722A1 (en) Monocular depth estimation method, apparatus and device
CN111833447A (en) Three-dimensional map construction method, three-dimensional map construction device and terminal device
CN116048244B (en) Gaze point estimation method and related equipment
CN110838084A (en) Image style transfer method and device, electronic equipment and storage medium
Ding et al. Real-time stereo vision system using adaptive weight cost aggregation approach
CN114283050A (en) Image processing method, device, equipment and storage medium
CN113709355B (en) Sliding zoom shooting method and electronic equipment
WO2023207379A1 (en) Image processing method and apparatus, device and storage medium
CN112711984A (en) Fixation point positioning method and device and electronic equipment
CN113643343B (en) Training method and device of depth estimation model, electronic equipment and storage medium
CN116152323A (en) Depth estimation method, monocular depth estimation model generation method and electronic device
WO2020155072A1 (en) Mixed layer processing method and apparatus
CN117132515B (en) Image processing method and electronic device
CN115150542A (en) Video anti-shake method and related equipment
CN114842063A (en) Depth map optimization method, device, equipment and storage medium
CN114841863A (en) Image color correction method and device
CN114693538A (en) An image processing method and device
CN115908120B (en) Image processing method and electronic device
WO2024093372A1 (en) Ranging method and device
CN116029951A (en) Image processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination