CN112802081A - Depth detection method and device, electronic equipment and storage medium


Publication number
CN112802081A
Authority
CN
China
Prior art keywords
depth
target object
image
pixel point
target
Prior art date
Legal status
Pending
Application number
CN202110105980.1A
Other languages
Chinese (zh)
Inventor
李雷
李健华
张家旺
史璇珂
王权
钱晨
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202110105980.1A priority Critical patent/CN112802081A/en
Publication of CN112802081A publication Critical patent/CN112802081A/en
Priority to PCT/CN2021/103067 priority patent/WO2022160587A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The disclosure provides a depth detection method, a depth detection device, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring a binocular image of a target scene acquired by a binocular camera; generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image; performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image; and generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.

Description

Depth detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a depth detection method and apparatus, an electronic device, and a storage medium.
Background
In computer vision technology, processing related to depth information recognition is often required; for example, in application scenarios such as Augmented Reality (AR) interaction, virtual photography and short video special effects, depth information recognition needs to be performed on a target object to achieve a more realistic picture effect.
In general, depth information recognition for an object may be accomplished by a monocular camera or a Time of Flight (TOF) camera. For example, the depth information between the object and the monocular camera may be estimated from an image collected by the monocular camera together with a pre-trained depth detection network; alternatively, the depth image of the scene where the object is located may be acquired by the TOF camera based on the time interval between transmitting and receiving light pulses and the propagation speed of the light pulses.
When depth information is identified from images collected by a monocular camera, the reliance on a pre-trained depth detection network tends to introduce large errors; when depth information is determined by a TOF camera, drawbacks such as low resolution, poor performance in outdoor scenes and high hardware cost are likely to arise.
Disclosure of Invention
The disclosed embodiments provide at least one depth detection scheme.
In a first aspect, an embodiment of the present disclosure provides a depth detection method, including:
acquiring a binocular image of a target scene acquired by a binocular camera;
generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image;
and generating a target object depth map of the target object based on target image information belonging to the target object in the color image and the depth image.
In the embodiment of the disclosure, based on the binocular image of the target scene acquired by the binocular camera, a depth image with higher accuracy than can be obtained with a monocular camera or a TOF camera, together with a color image aligned with the depth image, can be generated; semantic segmentation can then be performed on the color image to determine the pixel points matched with the target object in the color image, so that the depth values of the pixel points of the target object can be accurately obtained from the depth image aligned with the color image.
In a possible embodiment, the generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image includes:
adjusting the depth value of a pixel point with the depth value not conforming to the preset depth value range in the depth image to be a target depth value;
determining the position information of the pixel point of the target object in the depth image aligned with the color image based on the position information of the pixel point of the target object in the color image indicated by the semantic segmentation result of the color image;
and adjusting the depth value of the pixel point except the image area where the target object is located in the depth image to the target depth value based on the determined position information of the pixel point, so as to obtain a target object depth map of the target object.
In the embodiment of the present disclosure, the depth values of the pixel points in the depth image whose depth values do not conform to the preset depth value range and the pixel points in the depth image except for the image area where the target object is located are adjusted to be the target depth values, for example, to be 0, so that the depth information of the pixel points except for the target object can be filtered, and thus the target object depth map capable of visually representing the depth information of the target object is obtained.
In a possible implementation, after generating the target object depth map of the target object, the method further includes:
and performing depth value correction on pixel points in the target object depth map whose corresponding depth values are equal to the target depth value, and performing smoothing processing on the corrected target object depth map to obtain an optimized target object depth map.
In the embodiment of the disclosure, the depth values of the pixel points whose depth values are equal to the target depth value in the image area where the target object is located in the target object depth map are corrected, so that these depth values are supplemented, and the corrected target object depth map is then subjected to smoothing processing, which can improve the accuracy and the integrity of the obtained target object depth map.
In a possible implementation manner, the performing depth value correction on a pixel point, in the target object depth map, whose corresponding depth value is equal to the target depth value includes:
traversing pixel points of an image area where the target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value;
under the condition that the depth value of the current pixel point is not equal to the target depth value, whether the depth value of the first associated pixel point is equal to the target depth value or not is detected; the first associated pixel point is a pixel point which is adjacent to the current pixel point and is in the image area where the target object is located;
and under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point to the depth value of the current pixel point.
In the embodiment of the present disclosure, when the depth value of the current pixel is not equal to the target depth value, the depth value of the pixel that is adjacent and in the image area where the target object is located may be complemented by the depth value of the current pixel, and according to this manner, the true depth value of the pixel whose depth value is equal to the target depth value in the image area where the target object is located in the depth map may be quickly determined, so as to obtain a complete depth map of the target object.
In a possible implementation, the smoothing of the modified target object depth map includes:
determining, for any pixel point of the image area where the target object is located in the corrected target object depth map, the average depth value of said any pixel point and second associated pixel points; the second associated pixel points are pixel points which are located in the image area where the target object is located and are spaced from said any pixel point by no more than a preset number of pixel points;
and adjusting the depth value of any pixel point to the average depth value.
In the embodiment of the present disclosure, the depth value of any pixel point in the target object is smoothed using the depth values of that pixel point and of the pixel points within the preset pixel-point-number range around it, so that a smooth target object depth map can be obtained.
In a possible implementation, after generating a target object depth map of the target object, the depth detection method further includes:
determining first position information of the target object in a real scene based on the position information of the target object in the color image, the target object depth map and the pose data of the binocular camera;
determining second position information of a target virtual object in the real scene based on the position information of the target virtual object in a three-dimensional scene model matched with the real scene;
and determining an AR special effect when the AR equipment presents the target virtual object according to the pose data, the first position information and the second position information of the AR equipment in the real scene.
In the embodiment of the disclosure, after the depth map of the target object is determined, the first position information of the target object in the world coordinate system can be further determined, so that the AR special effect of the target virtual object when the target virtual object is presented in the AR device can be determined by combining the second position information of the target virtual object in the world coordinate system, and the sense of reality of the AR special effect is improved.
In a possible embodiment, the semantically segmenting the color image to obtain a semantic segmentation result of the color image includes:
classifying and identifying the pixel points in the color image according to a pre-trained target neural network to obtain the probability that the pixel points in the color image belong to a target object;
and obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
In the embodiment of the disclosure, each pixel point in the color image can be classified through the pre-trained target neural network, the pixel point belonging to the target object in the color image is quickly determined, and the depth value of each pixel point contained in the target object can be conveniently extracted in the depth image based on the pixel position information of the pixel point belonging to the target object in the color image.
In a second aspect, an embodiment of the present disclosure provides a depth detection apparatus, including:
the acquisition module is used for acquiring binocular images of a target scene acquired by the binocular camera;
a first generation module, configured to generate a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
the segmentation module is used for performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, and the semantic segmentation result is used for indicating pixel points matched with the target object in the color image;
and the second generation module is used for generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the depth detection method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the depth detection method according to the first aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below. The drawings herein are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art will be able to derive additional related drawings therefrom without any inventive effort.
Fig. 1 shows a flow chart of a depth detection method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a method for determining a depth map of a target object according to an embodiment of the present disclosure;
FIG. 3a illustrates a target object depth map provided by an embodiment of the present disclosure;
FIG. 3b illustrates a modified target object depth map provided by an embodiment of the present disclosure;
fig. 4 shows a flowchart of a method for correcting a target object depth map according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for smoothing a depth map of a target object according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for determining an AR effect according to an embodiment of the disclosure;
fig. 7 is a schematic structural diagram illustrating a depth detection apparatus provided in an embodiment of the present disclosure;
fig. 8 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C.
Depth detection of a target object, such as human body depth detection, has important applications in various scenarios such as AR interaction, virtual photographing and short video special effects. The existing human body depth detection technology is usually based on a monocular camera or a Time of Flight (TOF) camera, and the errors of the human body depth information determined by these two approaches are large.
Based on this research, in the depth detection method provided by the present disclosure, a binocular image of a target scene acquired by a binocular camera is used to generate a depth image with higher accuracy than can be acquired by a monocular camera or a TOF camera, together with a color image aligned with the depth image; semantic segmentation can then be performed on the color image to determine the pixel points matched with the target object contained in the color image, so that the depth value of each pixel point of the target object can be accurately acquired from the depth image aligned with the color image.
To facilitate understanding of the present embodiment, first, a depth detection method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the depth detection method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a handheld device, a computing device, a vehicle-mounted device, a wearable device, or a server or other processing device. In some possible implementations, the depth detection method may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a depth detection method provided in an embodiment of the present disclosure is shown, where the depth detection method includes the following steps S101 to S104:
and S101, acquiring binocular images of a target scene acquired by a binocular camera.
Illustratively, the target scene includes a target object whose depth is to be extracted, and two images of the same target scene may be acquired by the binocular camera, for example a first image and a second image, where the first image and the second image may be red-green-blue (RGB) images or grayscale images.
And S102, generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image.
For example, after obtaining the binocular images, the depth image of the target scene may be determined according to disparity information between the binocular images, such as the depth image of the target scene and the color image aligned with the depth image may be obtained from the binocular images through a binocular stereo matching algorithm.
Illustratively, the depth image of the target scene includes a depth value corresponding to each pixel point, and the depth value corresponding to each pixel point can represent a distance between a position indicated by the pixel point in the target scene and the binocular camera.
Illustratively, in the case where the binocular image is an RGB image, the color image is also an RGB image, and in the case where the binocular image is a grayscale image, the color image is also a grayscale image.
Illustratively, the depth image of the target scene and the color image aligned with the depth image are the same in size; pixels having the same pixel coordinate in the depth image and the color image represent the same position point in the target scene. For example, the pixel point with pixel coordinate (i, j) in the image coordinate system corresponding to the color image can represent the color feature of a position point A in the target scene, and the pixel point with the same pixel coordinate (i, j) in the image coordinate system corresponding to the depth image aligned with the color image can represent the depth value of the position point A in the target scene, so that the target object depth map of the target object can be determined based on the color image and the depth image.
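As a purely illustrative aid (not part of the disclosure), the following sketch shows how an aligned depth/color pair might be derived from a rectified binocular pair with a standard binocular stereo matching routine; the use of OpenCV's semi-global matcher, the calibration values fx and baseline_m, and all variable names are assumptions made for this example.

```python
# Minimal sketch: depth image from a rectified binocular pair via stereo matching.
# Assumes OpenCV/NumPy; fx (pixels) and baseline_m (meters) come from calibration.
import cv2
import numpy as np

def depth_from_binocular(left_gray, right_gray, fx, baseline_m):
    # Semi-global block matching produces a disparity map (fixed-point, scaled by 16).
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # depth = fx * baseline / disparity; invalid (non-positive) disparities become 0.
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = fx * baseline_m / disparity[valid]
    return depth  # aligned with the left image, which serves as the color image
```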
S103, performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with the target object in the color image.
Exemplarily, semantic segmentation can be performed on the color image according to a pre-trained semantic segmentation model to obtain a semantic segmentation result of the color image, the semantic segmentation result can be represented by a binary image aligned with the color image, a pixel point in the color image, which is matched with the target object, can be represented by a first label value, such as 1, in the binary image, and a pixel point, which is matched with a region outside the target object, can be represented by a second label value, such as 0, in the binary image.
And S104, generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
For example, considering that the depth image and the color image are aligned, the depth value of the pixel point indicated by the pixel position information may be extracted from the depth image corresponding to the color image based on the pixel position information of the pixel point in the color image matching the target object in the color image, and in this way, after obtaining the depth value corresponding to each pixel point matching the target object, a target object depth map of the target object may be generated, which may characterize the distance between the target object and the binocular camera in the target scene.
In the embodiment of the disclosure, based on the binocular image of the target scene acquired by the binocular camera, a depth image with higher accuracy than can be obtained with a monocular camera or a TOF camera, together with a color image aligned with the depth image, can be generated; semantic segmentation can then be performed on the color image to determine the pixel points matched with the target object in the color image, so that the depth value of each pixel point of the target object can be accurately obtained from the depth image aligned with the color image.
In one embodiment, regarding S103, when performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, the following S1031 to S1032 are included:
s1031, classifying and identifying the pixel points in the color image to obtain the probability that the pixel points in the color image belong to the target object;
s1032, obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
Illustratively, the pixel points of the color image can be classified and identified based on a pre-trained target neural network, which may be a semantic segmentation model used for semantic segmentation. Taking binary-classification prediction on the color image as an example, the pixel points of the color image that belong to the target object and the pixel points that do not belong to the target object can be determined: the color image can be input into the target neural network for class prediction, yielding, for each pixel point in the color image, the probability that it belongs to the target object and the probability that it does not belong to the target object, the two probabilities summing to 1, so that the pixel points in the color image that belong to the target object can be determined.
Illustratively, the semantic segmentation result of the color image can indicate pixel points in the color image, which are matched with the target object, for example, the label value of the pixel point belonging to the target object can be set as a first label value, and the label value of the pixel point not belonging to the target object is recorded as a second label value, so that a binary image corresponding to the color image can be obtained, and the binary image can visually represent the pixel points belonging to the target object and the pixel points not belonging to the target object in the color image.
In the embodiment of the disclosure, each pixel point in the color image can be classified through the pre-trained target neural network, the pixel point belonging to the target object in the color image is quickly determined, and the depth value of each pixel point contained in the target object can be conveniently extracted in the depth image based on the pixel position information of the pixel point belonging to the target object in the color image.
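As a purely illustrative aid, the following sketch shows one way such a semantic segmentation result could be turned into the binary mask described above; segmentation_model stands in for the pre-trained target neural network, and the 0.5 threshold is an assumption, since the disclosure does not prescribe a particular network or threshold.

```python
# Minimal sketch: turning per-pixel probabilities into the binary segmentation mask.
# `segmentation_model` is a placeholder for the pre-trained target neural network.
import numpy as np

def semantic_segmentation_mask(color_image, segmentation_model, threshold=0.5):
    # Probability that each pixel belongs to the target object, shape (H, W), values in [0, 1].
    prob_target = segmentation_model(color_image)

    # First label value (1) for target-object pixels, second label value (0) otherwise.
    mask = (prob_target >= threshold).astype(np.uint8)
    return mask
```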
Specifically, when generating a target object depth map of a target object based on a semantic segmentation result of a color image and a depth image, as shown in fig. 2, the following steps S201 to S203 may be included:
s201, adjusting the depth value of a pixel point in the depth image, wherein the depth value of the pixel point does not accord with a preset depth value range, to be a target depth value.
For example, in the process of determining a depth image based on a binocular image acquired by a binocular camera, due to distortion of the binocular camera or errors in the process of determining the depth image, the depth values of some pixel points may not be obtained, or the acquired depth values of some pixel points may exceed a reasonable range, for example exceed the maximum depth value that the binocular camera can acquire, or be negative numbers; this may affect the subsequently determined target object depth map, so after obtaining the depth image of the target scene, an initial correction may be performed on the depth image.
For example, the maximum depth value and the minimum depth value of the preset depth value range may be determined according to the parameters of the binocular camera. After the preset depth value range is obtained, the depth values of the pixel points whose depth values do not conform to the preset depth value range in the depth image may be reset, for example set to a fixed target depth value such as 0; some void regions, that is, regions whose depth value is 0, may exist in the depth image obtained in this way.
S202, determining the position information of pixel points of a target object in a depth image aligned with a color image based on the position information of the pixel points of the target object in the color image indicated by the semantic segmentation result of the color image;
s203, based on the determined position information of the pixel points, the depth values of the pixel points in the depth image except the image area where the target object is located are adjusted to be target depth values, and a target object depth map of the target object is obtained.
Considering that the color image and the depth image are aligned, the pixel position information of the target object in the depth image may be determined based on the pixel position information of the target object in the color image. The pixels in the depth image may then be traversed and, for each one, whether the current pixel belongs to the target object is determined according to the determined pixel position information: when the current pixel is determined to belong to the target object, its depth value is retained, and when the current pixel is determined not to belong to the target object, its depth value is set to the target depth value, for example 0.
In the embodiment of the present disclosure, the depth values of the pixel points in the depth image whose depth values do not conform to the preset depth value range and the pixel points in the depth image except for the image area where the target object is located are adjusted to be the target depth values, for example, to be 0, so that the depth information of the pixel points except for the target object can be filtered, and thus the target object depth map capable of visually representing the depth information of the target object is obtained.
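A minimal sketch of steps S201 to S203 follows, assuming the depth image, the segmentation mask and the target depth value 0 from the description; the depth range bounds are placeholder values, since the actual preset range would be derived from the binocular camera parameters.

```python
# Minimal sketch of S201-S203: reset out-of-range depths, then keep only target-object pixels.
# depth_min/depth_max and the target depth value 0 are illustrative assumptions.
import numpy as np

def target_object_depth_map(depth, mask, depth_min=0.1, depth_max=10.0, target_value=0.0):
    depth = depth.astype(np.float32).copy()

    # S201: depth values outside the preset range are set to the target depth value.
    out_of_range = (depth < depth_min) | (depth > depth_max)
    depth[out_of_range] = target_value

    # S202/S203: since the depth and color images are aligned, the segmentation mask can be
    # applied directly; pixels outside the target object also receive the target depth value.
    depth[mask == 0] = target_value
    return depth
```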
Exemplarily, in the target object depth map obtained in the above manner, there are some pixel points in the image region where the target object is located whose depth values are equal to the target depth value; the region formed by these pixel points may be referred to as a void region, as shown in fig. 3a. Therefore, after the target object depth map of the target object is generated, the depth detection method provided in the embodiment of the present disclosure further includes:
and performing depth value correction on pixel points in the target object depth map whose corresponding depth values are equal to the target depth value, and performing smoothing processing on the corrected target object depth map to obtain an optimized target object depth map.
For example, after performing depth value correction on the target object depth map having the void region, the target object depth map as shown in fig. 3b may be obtained, and in addition, considering that there may be some noise in the target object depth map obtained after performing depth value correction, the depth of the contour of the target object obtained in this way is not smooth, or a region with a drastic change in depth value occurs inside the target object, which affects the accuracy of the target object depth map, so the target object depth map after correction may be subjected to smoothing processing to improve the accuracy of the target object depth map.
In the embodiment of the disclosure, the depth values of the pixel points whose depth values are equal to the target depth value in the image area where the target object is located in the target object depth map are corrected, so that these depth values are supplemented, and the corrected target object depth map is then subjected to smoothing processing, which can improve the accuracy and the integrity of the obtained target object depth map.
Specifically, when performing depth value correction on a pixel point in the target object depth map, where the corresponding depth value is equal to the target depth value, as shown in fig. 4, the method includes the following steps S301 to S303:
s301, traversing pixel points of an image area where a target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value.
Exemplarily, taking the target object depth map shown in fig. 3a as an example, the region formed by the pixel points whose depth values are equal to the target depth value within the image region where the target object is located is regarded as a void region. The depth values of the pixel points in the void region and of the pixel points outside the region where the target object is located are both the target depth value, so the target object depth map is incomplete when representing the depth information of the target object, and the depth values of the pixel points in the void region need to be corrected.
S302, under the condition that the depth value of the current pixel point is not equal to the target depth value, whether the depth value of the first associated pixel point is equal to the target depth value is detected; the first associated pixel point is a pixel point which is adjacent to the current pixel point and is in the image area where the target object is located.
For example, in the case that it is detected that the depth value of the current pixel point is not equal to the target depth value, it may be continuously detected whether the depth value of the pixel point adjacent to the current pixel point and in the image area where the target object is located is equal to the target depth value.
And S303, under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point to the depth value of the current pixel point.
Exemplarily, the depth value of the pixel point adjacent to the current pixel point and in the image region where the target object is located is complemented by the depth value of the current pixel point, so as to obtain the depth value of each pixel point in the hollow region in the target object, and obtain the target object depth map as shown in fig. 3 b.
In the embodiment of the present disclosure, when the depth value of the current pixel is not equal to the target depth value, the depth value of the pixel adjacent to and in the image region where the target object is located may be complemented by the depth value of the current pixel, and according to this manner, the true depth value of the pixel whose depth value is equal to the target depth value in the image region where the target object is located in the depth map may be quickly determined, so as to obtain a complete depth map of the target object.
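A minimal sketch of steps S301 to S303 follows; the iteration until no void pixels remain mirrors step (4-1) of the human-body example later in the description, and the 4-neighbourhood definition of the first associated pixel points is an assumption for illustration.

```python
# Minimal sketch of S301-S303: propagate valid depths into void (target-value) pixels
# inside the target-object region, repeating until no void pixel changes.
import numpy as np

def fill_voids(depth, mask, target_value=0.0):
    depth = depth.copy()
    h, w = depth.shape
    changed = True
    while changed:
        changed = False
        for i in range(h):
            for j in range(w):
                # The current pixel must lie in the target-object area and hold a valid depth.
                if mask[i, j] == 0 or depth[i, j] == target_value:
                    continue
                # First associated pixels: 4-neighbours inside the target-object area.
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] == 1 \
                            and depth[ni, nj] == target_value:
                        depth[ni, nj] = depth[i, j]  # update neighbour with current depth
                        changed = True
    return depth
```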
Specifically, when the corrected target object depth map is subjected to the smoothing processing, as shown in fig. 5, the following steps S401 to S402 may be included:
S401, determining, for any pixel point of the image area where the target object is located in the corrected target object depth map, the average depth value of said any pixel point and second associated pixel points; the second associated pixel points are pixel points which are located in the image area where the target object is located and are spaced from said any pixel point by no more than a preset number of pixel points;
S402, adjusting the depth value of said any pixel point to the average depth value.
For example, for a pixel point with a pixel coordinate of (i, j) in the target depth map, an average depth value between the pixel point and another pixel point in the image region where the target object is located within a preset pixel number range from the pixel point may be determined, and the determined average depth value is used as an updated depth value of the pixel point.
In the embodiment of the present disclosure, the depth value of any pixel point in the target object is smoothed using the depth values of that pixel point and of the pixel points within the preset pixel-point-number range around it, so that a smooth target object depth map can be obtained.
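A minimal sketch of steps S401 to S402 follows; interpreting the preset pixel-point-number range as a square window radius is an assumption for illustration.

```python
# Minimal sketch of S401-S402: replace each target-object depth with the average over a
# small neighbourhood restricted to the target-object area.
import numpy as np

def smooth_depth(depth, mask, radius=2):
    depth = depth.astype(np.float32)
    smoothed = depth.copy()
    h, w = depth.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] == 0:
                continue
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            # Second associated pixel points (plus the pixel itself) inside the target area.
            window_mask = mask[i0:i1, j0:j1] == 1
            smoothed[i, j] = depth[i0:i1, j0:j1][window_mask].mean()
    return smoothed
```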
In an implementation manner, after generating a target object depth map of a target object, as shown in fig. 6, the depth detection method provided by the embodiment of the present disclosure further includes:
s501, determining first position information of the target object in the real scene based on the position information of the target object in the color image, the target object depth map and the pose data of the binocular camera.
For example, the pose data of the binocular camera specifically includes the position and the pose of the binocular camera in the real scene, for example, a world coordinate system may be constructed in advance based on the real scene, and the position and the pose of the binocular camera in the world coordinate system corresponding to the real scene may be determined by calibrating the binocular camera in advance.
For example, a camera coordinate system is constructed by using a binocular camera, and based on the position information of the target object in the color image, the depth map of the target object and the pose data of the binocular camera in the world coordinate system, first position information of each pixel point included in the target object in the color image in the world coordinate system can be determined, that is, the first position information of the target object in the real scene is obtained.
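As a purely illustrative aid, the following sketch shows how a single target-object pixel with a known depth value could be lifted into the world coordinate system; the intrinsic parameters and the rotation/translation pose representation are assumptions, since the disclosure does not prescribe a particular parameterisation.

```python
# Minimal sketch: lifting a target-object pixel (u, v) with depth d to world coordinates.
# Intrinsics (fx, fy, cx, cy) and the camera-to-world pose (R, t) are assumed known
# from calibration of the binocular camera.
import numpy as np

def pixel_to_world(u, v, d, fx, fy, cx, cy, R_cam_to_world, t_cam_to_world):
    # Back-project into the camera coordinate system.
    x_cam = (u - cx) * d / fx
    y_cam = (v - cy) * d / fy
    p_cam = np.array([x_cam, y_cam, d], dtype=np.float64)

    # Transform into the world coordinate system using the binocular camera's pose.
    return R_cam_to_world @ p_cam + t_cam_to_world
```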
S502, determining second position information of the target virtual object in the real scene based on the position information of the target virtual object in the three-dimensional scene model matched with the real scene.
For example, a three-dimensional scene model matched with a real scene may be constructed in advance, the three-dimensional scene model and the real scene are presented in the same coordinate system at a ratio of 1:1, a display position of a target virtual object in a world coordinate system may be generated based on the three-dimensional scene model, for example, the target virtual object is a virtual cartoon character, and second position information of the target virtual object in the world coordinate system corresponding to the real scene may be determined based on position information of the target virtual object in the three-dimensional scene model.
S503, determining the AR special effect when the AR device presents the target virtual object according to the pose data, the first position information and the second position information of the AR device in the real scene.
The AR device may or may not include the aforementioned binocular camera. Specifically, the AR device may include, but is not limited to, devices with a display function and data processing capability such as AR glasses, a tablet computer, a smart phone and a smart wearable device, in which an application program for presenting AR scene content may be installed, and a user may experience AR special effects in the application program.
For example, the pose data of the AR device may include a position and a posture of the AR device in a world coordinate system corresponding to the real scene, and may be determined based on a pose sensor installed on the AR device, or may be determined based on an image of the real scene captured by the AR device, which is not specifically limited herein.
For example, an AR special effect between the target virtual object and the target object when the AR device is rendered, such as whether the target virtual object is occluded by the target object or not, or whether the target virtual object occludes the target object, may be determined through pose data of the AR device in the world coordinate system, first position information of the target object in the world coordinate system, and second position information of the target virtual object in the world coordinate system.
In the embodiment of the disclosure, after the depth map of the target object is determined, the first position information of the target object in the world coordinate system can be further determined, so that the AR special effect of the target virtual object when the target virtual object is presented in the AR device can be determined by combining the second position information of the target virtual object in the world coordinate system, and the sense of reality of the AR special effect is improved.
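As a purely illustrative aid, the following sketch shows one possible per-pixel occlusion test between the target object and the target virtual object once both depths have been expressed in the AR device's view; rendering both into that view is assumed to have been done elsewhere, and all names (including treating depth 0 as "absent") are placeholders.

```python
# Minimal sketch: per-pixel occlusion test for the AR special effect. The target object
# occludes the virtual object wherever its depth in the AR device's view is smaller than
# the virtual object's depth at the same pixel.
import numpy as np

def occlusion_mask(target_object_depth_in_ar_view, virtual_object_depth_in_ar_view,
                   target_value=0.0):
    real = target_object_depth_in_ar_view
    virtual = virtual_object_depth_in_ar_view
    # Only pixels where both the real target object and the virtual object are present count.
    both_present = (real != target_value) & (virtual > 0)
    # True where the virtual object should be hidden behind the target object.
    return both_present & (real < virtual)
```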
In the following, taking a target object as a human body as an example, the depth detection method provided by the present disclosure is introduced, and specifically includes the following processes:
(1) obtaining a depth image of a target scene and an RGB image aligned with the depth image from an input binocular image through a binocular stereo matching algorithm;
(2) performing portrait segmentation on the RGB image to obtain a portrait mask capable of representing a semantic segmentation result of the RGB image;
(3) matching the depth image with the portrait mask to obtain a human body depth map;
(4) and carrying out cavity completion and smoothing treatment on the human body depth map to obtain a target human body depth map.
Wherein the step (4) comprises: (4-1) traversing each pixel point in the human body depth map, and if its depth value is not 0, propagating that depth value to the adjacent pixel points whose depth value is 0; this process is repeated iteratively until all the void regions with depth value 0 in the human body depth map are filled; and (4-2) smoothing the filled human body depth map.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same technical concept, a depth detection device corresponding to the depth detection method is further provided in the embodiment of the present disclosure, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to that of the depth detection method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 7, a schematic diagram of a depth detection apparatus 600 according to an embodiment of the present disclosure is shown, where the depth detection apparatus 600 includes:
the acquisition module 601 is used for acquiring binocular images of a target scene acquired by a binocular camera;
a first generating module 602, configured to generate a depth image of a target scene and a color image aligned with the depth image based on a binocular image;
the segmentation module 603 is configured to perform semantic segmentation on the color image to obtain a semantic segmentation result of the color image, where the semantic segmentation result is used to indicate a pixel point in the color image, where the pixel point is matched with the target object;
and a second generating module 604, configured to generate a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
In a possible implementation, the second generating module 604, when configured to generate a target object depth map of a target object based on a semantic segmentation result of a color image and a depth image, includes:
adjusting the depth value of a pixel point of which the depth value does not conform to a preset depth value range in the depth image to be a target depth value;
determining the position information of pixel points of a target object in a depth image aligned with a color image based on the position information of the pixel points of the target object in the color image indicated by the semantic segmentation result of the color image;
and adjusting the depth values of the pixel points except the image area where the target object is located in the depth image to be target depth values based on the determined position information of the pixel points, so as to obtain a target object depth map of the target object.
In a possible implementation, the second generating module 604, after generating the target object depth map of the target object, is further configured to:
and performing depth value correction on pixel points in the target object depth map whose corresponding depth values are equal to the target depth value, and performing smoothing processing on the corrected target object depth map to obtain an optimized target object depth map.
In a possible implementation manner, the second generating module 604, when configured to perform depth value correction on a pixel point in the target object depth map, where a corresponding depth value is equal to the target depth value, includes:
traversing pixel points of an image area where a target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value;
under the condition that the depth value of the current pixel point is not equal to the target depth value, whether the depth value of the first associated pixel point is equal to the target depth value is detected; the first associated pixel point is a pixel point which is adjacent to the current pixel point and is in the image area of the target object;
and under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point to the depth value of the current pixel point.
In a possible implementation, the second generating module 604, when configured to perform the smoothing process on the modified target object depth map, includes:
determining, for any pixel point of the image area where the target object is located in the corrected target object depth map, the average depth value of said any pixel point and second associated pixel points; the second associated pixel points are pixel points which are located in the image area where the target object is located and are spaced from said any pixel point by no more than a preset number of pixel points;
and adjusting the depth value of any pixel point to be an average depth value.
In a possible implementation, after generating the target object depth map of the target object, the second generating module 604 is further configured to:
determining first position information of a target object in a real scene based on position information of the target object in a color image, a target object depth map and pose data of a binocular camera;
determining second position information of the target virtual object in the real scene based on the position information of the target virtual object in the three-dimensional scene model matched with the real scene;
and determining the AR special effect when the AR equipment presents the target virtual object according to the pose data, the first position information and the second position information of the AR equipment in the real scene.
In a possible implementation, the segmentation module 603, when configured to perform semantic segmentation on the color image to obtain a semantic segmentation result of the color image, includes:
classifying and identifying the pixel points in the color image to obtain the probability that the pixel points in the color image belong to a target object;
and obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Corresponding to the depth detection method in fig. 1, an embodiment of the present disclosure further provides an electronic device 700, as shown in fig. 8, which is a schematic structural diagram of the electronic device 700 provided in the embodiment of the present disclosure, and includes:
a processor 71, a memory 72, and a bus 73; the memory 72 is used for storing execution instructions and includes a memory 721 and an external memory 722; the memory 721 is also referred to as an internal memory, and is used for temporarily storing the operation data in the processor 71 and the data exchanged with the external memory 722 such as a hard disk, the processor 71 exchanges data with the external memory 722 through the memory 721, and when the electronic device 700 is operated, the processor 71 and the memory 72 communicate with each other through the bus 73, so that the processor 71 executes the following instructions: acquiring a binocular image of a target scene acquired by a binocular camera; generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image; performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image; and generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the depth detection method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the depth detection method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A depth detection method, comprising:
acquiring a binocular image of a target scene acquired by a binocular camera;
generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image;
and generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
2. The depth detection method of claim 1, wherein generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image comprises:
adjusting the depth value of a pixel point in the depth image whose depth value does not conform to a preset depth value range to a target depth value;
determining the position information of the pixel point of the target object in the depth image aligned with the color image based on the position information of the pixel point of the target object in the color image indicated by the semantic segmentation result of the color image;
and adjusting, based on the determined position information of the pixel point, the depth values of pixel points in the depth image outside the image area where the target object is located to the target depth value, so as to obtain a target object depth map of the target object.
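A minimal sketch of the two masking steps of claim 2, assuming NumPy arrays, 0.0 as the target depth value and (0.3 m, 5.0 m) as the preset depth value range; all three values are illustrative assumptions.
import numpy as np

def build_target_object_depth_map(depth, object_mask,
                                  depth_range=(0.3, 5.0), target_value=0.0):
    out = depth.copy()
    # Depth values outside the preset depth value range are reset to the target value.
    out[(out < depth_range[0]) | (out > depth_range[1])] = target_value
    # The segmentation result (a mask aligned with the depth image) locates the target
    # object; every pixel outside that image area is also set to the target value.
    out[~object_mask] = target_value
    return out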
3. The depth detection method of claim 2, further comprising, after generating the target object depth map of the target object:
and performing depth value correction on pixel points of the target object depth map whose depth values are equal to the target depth value, and performing smoothing processing on the corrected target object depth map to obtain an optimized target object depth map.
4. The depth detection method according to claim 3, wherein the performing depth value correction on pixel points of the target object depth map whose depth values are equal to the target depth value comprises:
traversing pixel points of an image area where the target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value;
in a case that the depth value of the current pixel point is not equal to the target depth value, detecting whether the depth value of a first associated pixel point is equal to the target depth value, wherein the first associated pixel point is a pixel point that is adjacent to the current pixel point and located within the image area where the target object is located;
and under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point to the depth value of the current pixel point.
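A minimal sketch of the correction traversal of claim 4, assuming a single in-place pass over 4-connected neighbours; the connectivity and traversal order are assumptions.
import numpy as np

def fill_invalid_neighbours(obj_depth, object_mask, target_value=0.0):
    # Traverse pixels of the target-object area; a pixel holding a valid depth
    # copies that depth into 4-connected neighbours that still hold the target value.
    h, w = obj_depth.shape
    for y, x in zip(*np.nonzero(object_mask)):
        if obj_depth[y, x] == target_value:          # current pixel itself invalid -> skip
            continue
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and object_mask[ny, nx]
                    and obj_depth[ny, nx] == target_value):
                obj_depth[ny, nx] = obj_depth[y, x]  # first associated pixel corrected
    return obj_depth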
5. The depth detection method according to claim 3 or 4, wherein the smoothing processing of the corrected target object depth map comprises:
for any pixel point in the image area where the target object is located in the corrected target object depth map, determining an average depth value of the pixel point and second associated pixel points, wherein a second associated pixel point is a pixel point that is spaced from the pixel point by no more than a preset number of pixel points and lies within the image area where the target object is located;
and adjusting the depth value of the pixel point to the average depth value.
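A minimal sketch of the smoothing of claim 5, assuming a square window whose radius stands in for the preset number of pixel points; the radius value of 2 is an assumption.
import numpy as np

def smooth_object_depth(obj_depth, object_mask, radius=2):
    # Each target-object pixel is replaced by the mean depth of itself and the
    # surrounding target-object pixels inside a (2*radius+1) x (2*radius+1) window.
    h, w = obj_depth.shape
    smoothed = obj_depth.copy()
    for y, x in zip(*np.nonzero(object_mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        window_mask = object_mask[y0:y1, x0:x1]
        smoothed[y, x] = obj_depth[y0:y1, x0:x1][window_mask].mean()
    return smoothed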
6. The depth detection method according to any one of claims 1 to 5, wherein after generating the target object depth map of the target object, the depth detection method further comprises:
determining first position information of the target object in a real scene based on the position information of the target object in the color image, the target object depth map and the pose data of the binocular camera;
determining second position information of a target virtual object in the real scene based on the position information of the target virtual object in a three-dimensional scene model matched with the real scene;
and determining, according to pose data of AR equipment in the real scene, the first position information and the second position information, an AR special effect with which the AR equipment presents the target virtual object.
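A minimal sketch of the positioning step of claim 6, assuming a pinhole intrinsic matrix K, a camera-to-world pose T_world_cam and a device pose T_world_device are available as 4x4 matrices; the distance-based occlusion test is a simplified stand-in for the AR special effect and is not taken from the disclosure.
import numpy as np

def object_position_in_world(pixel_xy, depth_m, K, T_world_cam):
    # Back-project a target-object pixel into the camera frame using its depth,
    # then transform it into the real-scene (world) frame with the camera pose.
    u, v = pixel_xy
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m, 1.0])
    return (T_world_cam @ p_cam)[:3]                 # first position information

def virtual_object_occluded(p_object_world, p_virtual_world, T_world_device):
    # Simplified occlusion decision: the virtual object is hidden if the real
    # target object lies closer to the AR device than the virtual object does.
    device_pos = T_world_device[:3, 3]
    return (np.linalg.norm(p_object_world - device_pos)
            < np.linalg.norm(p_virtual_world - device_pos))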
7. The depth detection method according to any one of claims 1 to 6, wherein the semantic segmentation of the color image to obtain the semantic segmentation result of the color image comprises:
performing classification and identification on the pixel points in the color image to obtain the probability that each pixel point in the color image belongs to the target object;
and obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
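A minimal sketch of claim 7, assuming the per-pixel probabilities come from some segmentation network and that a fixed threshold of 0.5 turns them into the segmentation result; both assumptions are illustrative.
import numpy as np

def segmentation_from_probabilities(prob_map, threshold=0.5):
    # prob_map[y, x]: probability that pixel (y, x) belongs to the target object.
    return prob_map >= threshold   # boolean mask used as the semantic segmentation result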
8. A depth detection device, comprising:
the acquisition module is used for acquiring binocular images of a target scene acquired by the binocular camera;
a first generation module, configured to generate a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
the segmentation module is used for performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, the semantic segmentation result being used for indicating pixel points matched with a target object in the color image;
and the second generation module is used for generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the depth detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the depth detection method according to any one of claims 1 to 7.
CN202110105980.1A 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium Pending CN112802081A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110105980.1A CN112802081A (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium
PCT/CN2021/103067 WO2022160587A1 (en) 2021-01-26 2021-06-29 Depth detection method and apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110105980.1A CN112802081A (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112802081A true CN112802081A (en) 2021-05-14

Family

ID=75811939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110105980.1A Pending CN112802081A (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112802081A (en)
WO (1) WO2022160587A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022160587A1 (en) * 2021-01-26 2022-08-04 深圳市商汤科技有限公司 Depth detection method and apparatus, electronic device, storage medium, and program product
CN115393224A (en) * 2022-09-02 2022-11-25 点昀技术(南通)有限公司 Depth image filtering method and device
WO2023005169A1 (en) * 2021-07-27 2023-02-02 贝壳技术有限公司 Depth image generation method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680496A (en) * 2015-03-17 2015-06-03 山东大学 Kinect deep image remediation method based on colorful image segmentation
US20180061068A1 (en) * 2015-05-06 2018-03-01 Peking University Shenzhen Graduate School Depth/Disparity Map Post-processing Method and Device
CN109683699A (en) * 2019-01-07 2019-04-26 深圳增强现实技术有限公司 The method, device and mobile terminal of augmented reality are realized based on deep learning
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device
CN111815696A (en) * 2019-04-11 2020-10-23 曜科智能科技(上海)有限公司 Depth map optimization method, device, equipment and medium based on semantic instance segmentation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802081A (en) * 2021-01-26 2021-05-14 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022160587A1 (en) 2022-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40045370
Country of ref document: HK