CN116168071A - Depth data acquisition method, device, electronic equipment and machine-readable storage medium

Depth data acquisition method, device, electronic equipment and machine-readable storage medium

Info

Publication number: CN116168071A
Application number: CN202310110036.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, frame, map, depth map, gray
Legal status: Pending
Inventors: 周杨, 陈元吉, 邓志辉
Current and original assignee: Hangzhou Hikrobot Co Ltd
Application CN202310110036.4A filed by Hangzhou Hikrobot Co Ltd
Classifications

    • G06T7/55 Depth or shape recovery from multiple images (G Physics; G06 Computing; G06T Image data processing or generation, in general; G06T7/00 Image analysis; G06T7/50 Depth or shape recovery)
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction (G06T5/00 Image enhancement or restoration)
    • G06T7/40 Analysis of texture (G06T7/00 Image analysis)
    • G06T2207/10028 Range image; depth image; 3D point clouds (G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10 Image acquisition modality)
    • Y02A90/30 Assessment of water resources (Y02A Technologies for adaptation to climate change; Y02A90/00 Technologies having an indirect contribution to adaptation to climate change)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application provides a depth data acquisition method, a depth data acquisition device, an electronic device, and a machine-readable storage medium. The depth data acquisition method includes: acquiring, through an image acquisition device, multiple frames of gray maps of a scene to be measured under different textures; determining a background gray map according to the multi-frame gray maps; obtaining multi-frame texture-enhanced gray maps according to the multi-frame gray maps and the background gray map; encoding and synthesizing the multi-frame gray maps to obtain a first synthesized image, and encoding and synthesizing the multi-frame texture-enhanced gray maps to obtain a second synthesized image; determining a first depth map according to the first synthesized image and a second depth map according to the second synthesized image; and fusing the first depth map and the second depth map to obtain a final depth map. The method can obtain a final depth map with both high dynamic range and edge accuracy.

Description

Depth data acquisition method, device, electronic equipment and machine-readable storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a depth data acquisition method, a depth data acquisition device, an electronic device, and a machine-readable storage medium.
Background
Currently, in the field of three-dimensional detection based on gray-level matching, a camera is generally used to capture an image of a scene to be measured illuminated by a coded-pattern generator, and the spatial position relationship of the texture information is then used to recover the depth information of the scene. However, in the reconstruction process, a gray map captured in a single shot is limited by laser power and texture contrast: the textures in the image are often sparse and of limited contrast, the ability to recover depth data for the scene is limited, and it is difficult to obtain depth maps with both high dynamic range and edge accuracy across different scenes.
Disclosure of Invention
In view of this, the present application provides a depth data acquisition method, apparatus, electronic device, and machine-readable storage medium.
According to a first aspect of an embodiment of the present application, there is provided a depth data acquisition method, including:
acquiring multi-frame gray level images of a scene to be tested under different textures through image acquisition equipment;
determining a background gray level map according to the multi-frame gray level map;
obtaining a multi-frame texture enhanced gray scale map according to the multi-frame gray scale map and the background gray scale map;
coding and synthesizing the multi-frame gray level images to obtain a first synthesized image; and performing coding synthesis on the multi-frame texture enhanced gray level image to obtain a second synthesized image;
Determining a first depth map according to the first synthetic map; determining a second depth map according to the second synthetic map;
and fusing the first depth map and the second depth map to obtain a final depth map.
According to a second aspect of embodiments of the present application, there is provided a depth data acquisition apparatus, including:
the acquisition unit is used for acquiring multi-frame gray images of the scene to be tested under different textures through the image acquisition equipment;
the first determining unit is used for determining a background gray level map according to the multi-frame gray level map;
the enhancement unit is used for obtaining a multi-frame texture enhanced gray level image according to the multi-frame gray level image and the background gray level image;
the synthesis unit is used for carrying out coding synthesis on the multi-frame gray level images to obtain a first synthesis image; and performing coding synthesis on the multi-frame texture enhanced gray level image to obtain a second synthesized image;
the second determining unit is used for determining a first depth map according to the first synthetic map; determining a second depth map according to the second synthetic map;
and the fusion unit is used for fusing the first depth map and the second depth map to obtain a final depth map.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor for executing the machine-executable instructions to implement the method provided in the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the method provided in the first aspect.
According to the depth data acquisition method, multiple frames of gray maps of the scene to be measured under different textures are acquired, a background gray map is determined from the acquired multi-frame gray maps, and multi-frame texture-enhanced gray maps are obtained from the multi-frame gray maps and the background gray map so as to enhance the effective textures in the gray maps. On the one hand, the original multi-frame gray maps are encoded and synthesized to obtain a first synthesized image, from which a first depth map is determined; on the other hand, the multi-frame texture-enhanced gray maps are encoded and synthesized to obtain a second synthesized image, from which a second depth map is determined. Encoding and synthesizing the multi-frame gray maps makes full use of the multi-frame information, yielding a depth map with higher edge accuracy, while enhancing the effective textures yields a depth map with high dynamic range; therefore, by fusing the first depth map and the second depth map, a final depth map with both high dynamic range and edge accuracy is obtained.
Drawings
Fig. 1 is a flow chart of a depth data acquisition method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a depth data acquisition method based on a binocular depth camera, achieving both high dynamic range and edge accuracy, according to an embodiment of the present application;
fig. 3 is a flow chart of a depth data acquisition method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a depth data acquiring device according to an embodiment of the present application;
fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to better understand the technical solutions provided by the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more obvious, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It should be noted that the sequence numbers of the steps in the embodiments of the present application do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Referring to fig. 1, a flow chart of a depth data acquisition method provided in an embodiment of the present application, as shown in fig. 1, the depth data acquisition method may include the following steps:
step S100, acquiring multi-frame gray level images of a scene to be tested under different textures through image acquisition equipment.
The image capture device is illustratively an image capture device having depth information acquisition capabilities, which may include, but is not limited to, a monocular depth camera or a multi-view depth camera.
By way of example, the projection of various textures of the measured scene can be realized by adjusting the patterns and/or the projection angles of the textures projected to the measured scene, so that multi-frame gray level diagrams of the measured scene under different textures can be obtained, and the sufficient richness of the texture information can be ensured.
For example, in the case where the image capturing device is a multi-view depth camera, the gray scale map under any texture of the scene under test includes the gray scale maps captured by the plurality of cameras.
For example, taking a binocular depth camera as an example, the gray scale map of the scene under test at any texture includes a left-eye gray scale map and a right-eye gray scale map.
It should be noted that, when the image capturing device captures the multiple frames of gray maps of the measured scene under different textures, the imaging parameters (such as exposure time and gain) of the image capturing device need to be kept consistent, and the brightness of the texture light source needs to be kept consistent when each frame is captured (identical, or with a brightness difference within a preset range).
Step S110, determining a background gray level map according to the acquired multi-frame gray level map.
Step S120, obtaining a multi-frame texture enhanced gray scale map according to the multi-frame gray scale map and the background gray scale map.
In this embodiment of the present application, in order to obtain a depth map with high dynamic range, a background gray map may be determined according to the acquired multi-frame gray maps; according to the background gray map, effective-texture enhancement and background suppression (which may also be referred to as background-texture suppression) may then be performed on the acquired multi-frame gray maps to highlight the effective textures, obtaining multi-frame texture-enhanced (i.e., effective-texture-enhanced) gray maps.
Step S130, coding and synthesizing a multi-frame gray scale image to obtain a first synthesized image; and performing coding synthesis on the multi-frame texture enhanced gray level image to obtain a second synthesized image.
In the embodiment of the application, the texture information of the multiple frames of gray maps can be combined into the same frame of image, so that the information of the multi-frame gray maps is fully utilized and a depth map with higher edge accuracy can be obtained.
Accordingly, the multi-frame gray-scale images acquired in step S100 may be respectively encoded and synthesized to obtain corresponding synthesized images (referred to herein as first synthesized images).
And, performing encoding synthesis on the multi-frame texture enhanced gray scale map obtained in step S120 to obtain a corresponding synthesized map (referred to herein as a second synthesized map).
Step S140, determining a first depth map according to the first synthetic map; and determining a second depth map according to the second composite map.
And step S150, fusing the first depth map and the second depth map to obtain a final depth map.
In the embodiment of the present application, when the first synthesized image is obtained in the above manner, a corresponding depth map (referred to herein as a first depth map) may be determined according to the first synthesized image.
For example, for a binocular depth camera, the parallax of each pixel position of the left and right eye images can be determined according to the matching result of the pixel points of the left eye image and the right eye image, and further, the depth information of each pixel position can be determined, so as to obtain a corresponding depth map.
For a monocular depth camera, depth information of each pixel position can be determined according to a gray level image and a preset calibration image, and a corresponding depth image is obtained.
Similarly, when the second synthesized image is obtained in the above manner, a corresponding depth map (referred to herein as a second depth map) may be determined from the second synthesized image.
In this embodiment of the present application, since the original gray-scale image has both active and passive textures (i.e., the effective textures and the background textures), the edge details of the corresponding depth map (i.e., the first depth map) are usually clear; the effective texture in the texture enhancement image is enhanced, and the dynamic range of the corresponding depth map (i.e., the second depth map) is higher, so that a depth map (which can be called a final depth map) with both high dynamic and edge precision can be obtained by fusing the first depth map and the second depth map.
It can be seen that, in the method flow shown in fig. 1, multiple frames of gray maps of the measured scene under different textures are acquired, a background gray map is determined from them, and multi-frame texture-enhanced gray maps are obtained from the multi-frame gray maps and the background gray map so as to enhance the effective textures in the gray maps. On the one hand, the original multi-frame gray maps are encoded and synthesized to obtain a first synthesized image, from which a first depth map is determined; on the other hand, the multi-frame texture-enhanced gray maps are encoded and synthesized to obtain a second synthesized image, from which a second depth map is determined. The encoding and synthesis make full use of the multi-frame information to obtain a depth map with higher edge accuracy, and the enhancement of the effective textures yields a high-dynamic-range depth map; fusing the first depth map and the second depth map therefore gives a final depth map with both high dynamic range and edge accuracy.
In some embodiments, the image acquisition device comprises at least one camera for performing image acquisition, and the at least one camera performs image acquisition under any texture of the scene to be measured.
The determining the background gray level map according to the multi-frame gray level map may include:
and (3) for a multi-frame gray scale image obtained through the same camera, taking the minimum gray scale value of each pixel position in the multi-frame gray scale image, and reconstructing to obtain a background gray scale image corresponding to the multi-frame gray scale image.
The background gray-scale image may be determined by suppressing the effective texture of the acquired gray-scale image, for example.
For example, in multiple frames of gray maps captured with consistent texture-light-source brightness, the average brightness of the background is generally consistent, while effective texture adds brightness; therefore, the frame in which a pixel position takes its minimum gray value across the multi-frame images is a frame in which that pixel position belongs to the background.
Therefore, for a plurality of frames of gray level images obtained through the same camera, a background gray level image inhibiting effective textures can be reconstructed by taking the minimum gray level value from the same pixel position of a plurality of frames.
For example, assuming that the image capturing apparatus is a binocular depth camera, when capturing images, a left-eye camera (or referred to as a left camera) and a right-eye camera (or referred to as a right camera) respectively capture multi-frame gray maps under different textures of a measured scene.
For a multi-frame gray level image acquired by the left-eye camera, a background gray level image corresponding to the left-eye camera can be obtained by taking the minimum gray level value of each pixel position in the multi-frame gray level image.
For a multi-frame gray level image acquired by the right-eye camera, a background gray level image corresponding to the right-eye camera can be obtained by taking the minimum gray level value of each pixel position in the multi-frame gray level image.
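By way of illustration, this per-pixel-minimum reconstruction can be sketched as follows (a minimal NumPy sketch; the function name and array layout are assumptions for illustration, not taken from the patent):

```python
import numpy as np

def reconstruct_background(gray_frames):
    """Background gray map for one camera: the per-pixel minimum gray
    value across frames captured under different projected textures."""
    stack = np.stack(gray_frames, axis=0)  # shape (N, H, W), one frame per texture
    return stack.min(axis=0)               # shape (H, W): effective texture suppressed
```

For a binocular camera, this would be applied once to the left-camera frames and once to the right-camera frames.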
In some embodiments, the obtaining the multi-frame texture enhanced gray scale map according to the multi-frame gray scale map and the background gray scale map may include:
and reconstructing the gray value of each pixel position in the frame gray map according to the difference value between the gray value of each pixel position in the frame gray map and the gray value of the same pixel position in the corresponding background gray map to obtain the texture enhanced gray map corresponding to the frame gray map.
Illustratively, consider that for any pixel location, the greater the difference in gray value for that pixel location from the gray value for that pixel location in the background gray map, the greater the probability that pixel location is an effective texture; the closer the gray value of the pixel location is to the gray value of the pixel location in the background gray map, the greater the probability that the pixel location is background.
Therefore, when the background gray level map is determined in the above manner, for any frame gray level map, the gray level value of each pixel position in the frame gray level map can be reconstructed according to the difference value between the gray level value of each pixel position in the frame gray level map and the gray level value of the same pixel position in the corresponding background gray level map, the pixel points belonging to the background are suppressed, the pixel points of the effective texture are enhanced, and the texture enhanced gray level map corresponding to the frame gray level map is obtained.
In an example, the reconstructing the gray value of each pixel position in the frame gray scale map according to the difference between the gray value of each pixel position in the frame gray scale map and the gray value of the same pixel position in the corresponding background gray scale map may include:
and for any pixel position of the frame gray scale image, carrying out weighting operation according to the difference value between the gray scale value of the pixel position in the frame gray scale image and the gray scale value of the pixel position in the corresponding background gray scale image and the gray scale value of the pixel position in the frame gray scale image to obtain the reconstruction gray scale value of the pixel position.
For example, for any frame gray scale, the gray scale value for each pixel location in the frame gray scale can be reconstructed by the following formula:
G′(x,y) = w11 · G(x,y) + w12 · (G(x,y) - P(x,y))

wherein G(x,y) is the gray value at pixel position (x, y) in the frame gray map, P(x,y) is the gray value at pixel position (x, y) in the background gray map, G′(x,y) is the reconstructed gray value at pixel position (x, y), and w11 and w12 are weight coefficients with values in the range 0 to 1 satisfying w11 + w12 = 1.
In this embodiment of the present application, for any frame gray scale image, the enhancement of the effective texture and the suppression of the background texture may be directly achieved by amplifying the difference between the gray scale value of each pixel position in the frame gray scale image and the gray scale value of the same pixel position in the corresponding background gray scale image by a preset multiple (the multiple is greater than 1).
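As a minimal sketch of this weighted reconstruction (assuming 8-bit gray maps and an illustrative default weight w11 = 0.5; the patent only constrains the weights to the range 0 to 1 with w11 + w12 = 1):

```python
import numpy as np

def enhance_texture(gray, background, w11=0.5):
    """Texture-enhanced gray map: G' = w11*G + w12*(G - P), where P is
    the background gray map. Background pixels (small G - P) are
    suppressed; effective-texture pixels (large G - P) are enhanced."""
    w12 = 1.0 - w11
    g = gray.astype(np.float32)
    p = background.astype(np.float32)
    enhanced = w11 * g + w12 * (g - p)
    return np.clip(enhanced, 0, 255).astype(np.uint8)  # back to the 8-bit gray range
```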
In some embodiments, the image capturing device includes at least one camera for capturing images, where the at least one camera captures images of the scene under test under any texture.
The above-mentioned encoding synthesis of the multi-frame gray scale image to obtain a first synthesized image may include:
for a multi-frame gray scale map obtained through the same camera, interpolating the multi-frame gray scale map to obtain a multi-frame first to-be-synthesized map with amplified resolution;
image fusion is carried out on the multi-frame first to-be-synthesized image to obtain a first synthesized image;
The encoding synthesis of the multi-frame texture enhanced gray scale image to obtain a second synthesized image may include:
for the multi-frame texture enhanced gray level maps corresponding to the multi-frame gray level maps obtained through the same camera, interpolation is carried out on the multi-frame texture enhanced gray level maps to obtain a multi-frame second to-be-synthesized map with amplified resolution;
and carrying out image fusion on the multi-frame second to-be-synthesized image to obtain a second synthesized image.
For example, since the synthesized image is obtained by encoding and synthesizing multiple frames of images in order to combine their texture information into the same image, if the resolution of the synthesized image were identical to that of the original gray maps, the texture information in the synthesized image could become too dense.
Correspondingly, when the multi-frame gray images acquired by the same camera are coded and synthesized, interpolation, such as bilinear interpolation, can be performed on each frame gray image to obtain a gray image with amplified resolution (which can be called a first to-be-synthesized image), and image fusion is performed on the multi-frame gray images with amplified resolution to obtain the first synthesized image.
Illustratively, the resolution magnification is the same for each frame of gray scale map.
Similarly, for a multi-frame texture enhanced gray scale map corresponding to a multi-frame gray scale map obtained by the same camera, interpolation can be performed on the multi-frame texture enhanced gray scale map to obtain a texture enhanced gray scale map (which can be called a second to-be-synthesized map) with amplified resolution, and image fusion is performed on the multi-frame second to-be-synthesized map to obtain a second synthesized map.
In an example, the image fusion of the multiple frames of the first to-be-synthesized image to obtain the first synthesized image may include:
taking the maximum gray value of each pixel position in the multi-frame first to-be-synthesized image to obtain a first synthesized image;
the image fusion of the multi-frame second to-be-synthesized image to obtain a second synthesized image may include:
and taking the maximum gray value of each pixel position in the multi-frame second to-be-synthesized image to obtain a second synthesized image.
For example, the image fusion may be performed on each image to be synthesized (such as the first image to be synthesized or the second image to be synthesized) by taking the maximum value from pixel point to pixel point.
Taking the first to-be-synthesized image as an example, for the multi-frame first to-be-synthesized images subjected to image fusion, the maximum gray value at each pixel position in the multi-frame first to-be-synthesized images can be taken to obtain the first synthesized image; that is, for any pixel position, the maximum gray value at that pixel position across the multi-frame first to-be-synthesized images is determined to be the gray value of that pixel position in the first synthesized image.
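A minimal sketch of this encoding synthesis for one camera, assuming bilinear interpolation with a magnification of 2 and pixel-wise maximum fusion (function name and defaults are illustrative):

```python
import cv2
import numpy as np

def encode_synthesize(frames, scale=2):
    """Encoding synthesis: bilinearly upsample each frame by `scale`
    (enlarging the area that carries texture information), then fuse
    the upsampled frames by taking the per-pixel maximum gray value."""
    upsampled = [
        cv2.resize(f, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
        for f in frames
    ]
    return np.max(np.stack(upsampled, axis=0), axis=0)
```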
In some embodiments, the image capture device is a multi-view depth camera; the number of image frames included in the first synthetic image and the number of image frames included in the second synthetic image are consistent with the number of cameras of the multi-view depth camera; and the resolution of the first synthesized image and the resolution of the second synthesized image are larger than the resolution of the original gray image;
the determining, according to the first composite map, the first depth map may include:
sampling and matching each frame of image in the first synthetic image to obtain a first depth image with original resolution;
determining a second depth map from the second composite map may include:
and carrying out sampling matching on each frame of image in the second synthetic image to obtain a second depth image with original resolution.
For example, taking an image capturing device as an example of a binocular depth camera, the first composite image obtained in the above manner may include a first left-eye composite image and a first right-eye composite image.
For example, the pixels in the first left-eye synthesized image and the first right-eye synthesized image may be matched to determine the parallax of each pixel in the left-eye image and the right-eye image, so as to obtain a corresponding depth image (i.e., the first depth image).
Similarly, the second synthesized image may include a second left-eye synthesized image and a second right-eye synthesized image, and each pixel point in the second left-eye synthesized image and each pixel point in the second right-eye synthesized image may be matched to determine the parallax of each pixel point of the left-eye image and the right-eye image, so as to obtain a corresponding depth image (i.e., the second depth image).
For example, in order to improve the matching efficiency, sampling matching may be performed on the synthetic graphs (the first synthetic graph or the second synthetic graph), that is, the synthetic graphs are sampled according to a preset sampling manner, and the sampling points are matched.
In an example, the performing sample matching on each frame image in the first composite image to obtain a first depth image with an original resolution may include:
according to the resolution of the first synthetic image and the resolution of the original gray level image, sampling each frame of image in the first synthetic image at intervals, and matching sampling points of each frame of image to obtain a first depth image with the original resolution;
the above-mentioned sampling and matching are carried out on each frame image in the second synthetic image, so as to obtain a second depth image with original resolution, which comprises:
and according to the resolution of the second synthetic image and the resolution of the original gray level image, sampling each frame of image in the second synthetic image at intervals, and matching sampling points of each frame of image to obtain a second depth image with the original resolution.
The sampling rate may be determined, for example, from the ratio of the resolution of the composite map to the original gray map.
For example, assuming that the resolution of the synthesized image (first synthesized image or second synthesized image) is 2 times that of the original gray-scale image, sampling may be performed at intervals, i.e., 1 sampling point is obtained for every 2 pixel points, and the sampling points correspond to the effective points of the original gray-scale image.
In one example, the matching the sampling points of each frame image may include:
and matching sampling points of each frame of image according to the sampling parallax.
Illustratively, since the synthesized image is resolution-enlarged relative to the original gray-scale image, the parallax range of each pixel point in the synthesized image is also enlarged in the same proportion.
When performing sampled matching, according to efficiency requirements, the disparity may either be left unsampled or be sampled, with sampling-point matching performed according to the sampled disparity.
For example, in order to improve the matching efficiency, the parallax may be sampled, and the sampling points of the images of each frame may be matched according to the sampled parallax.
For example, assuming that the resolution of the synthesized image is 2 times that of the original gray-scale image, the parallax may be sampled, and sampling point matching may be performed with a parallax of an integer multiple of 2.
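The interval sampling and disparity sampling described above can be sketched as a brute-force block matcher over the composite maps (purely illustrative: a real implementation would use an optimized stereo matcher, and converting disparity to depth via the camera calibration is omitted):

```python
import numpy as np

def sampled_match(left_comp, right_comp, max_disp, scale=2, win=5):
    """Interval-sampled SAD matching on left/right composite maps: only
    every `scale`-th pixel is matched (these correspond to the valid
    points of the original-resolution gray map), and only disparities
    that are integer multiples of `scale` are tested. Returns a
    disparity map at the original resolution."""
    H, W = left_comp.shape
    h, w = H // scale, W // scale
    half = win // 2
    L = left_comp.astype(np.float32)
    R = right_comp.astype(np.float32)
    disp = np.zeros((h, w), np.float32)
    for yo in range(h):
        y = yo * scale
        if y < half or y >= H - half:
            continue
        for xo in range(w):
            x = xo * scale
            if x < half or x >= W - half:
                continue
            best_cost, best_d = np.inf, 0
            # the disparity range is enlarged with the resolution; sample it in steps of `scale`
            for d in range(0, max_disp * scale + 1, scale):
                if x - d < half:
                    break
                patch_l = L[y-half:y+half+1, x-half:x+half+1]
                patch_r = R[y-half:y+half+1, x-d-half:x-d+half+1]
                cost = np.abs(patch_l - patch_r).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[yo, xo] = best_d / scale  # back to original-resolution disparity units
    return disp
```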
In some embodiments, fusing the first depth map and the second depth map to obtain a final depth map may include:
performing edge erosion processing on the second depth map to obtain a processed second depth map;
and fusing the first depth map and the processed second depth map to obtain a final depth map.
For example, since the second depth map is obtained according to the texture enhanced gray scale map, although the dynamic range is higher, the edge accuracy of the second depth map is reduced to a certain extent, in order to avoid the influence of the second depth map on the edge accuracy of the final depth map, the second depth map may be subjected to edge erosion processing, and the first depth map and the processed second depth map are fused to obtain the final depth map, so as to improve the edge accuracy of the final depth map.
In an example, fusing the first depth map and the processed second depth map to obtain the final depth map may include:
for any pixel position, if the pixel values at that position in the first depth map and the processed second depth map are both greater than 0, performing weighted-average processing on the pixel value at that position in the first depth map and the pixel value at that position in the processed second depth map to obtain the pixel value at that position in the final depth map;
if the pixel values at that position in the first depth map and the processed second depth map are both 0, determining that the pixel value at that position in the final depth map is 0;
if exactly one of the pixel values at that position in the first depth map and the processed second depth map is 0, determining the pixel value at that position in the final depth map according to the non-zero pixel value.
For example, the fusion of the first depth map and the processed second depth map may be achieved by the following formula:
D3(x,y) = w21 · D1(x,y) + w22 · D2(x,y),  if D1(x,y) > 0 and D2(x,y) > 0
D3(x,y) = 0,  if D1(x,y) = 0 and D2(x,y) = 0
D3(x,y) = the non-zero one of D1(x,y) and D2(x,y),  otherwise

wherein D1(x,y) is the pixel value at pixel position (x, y) in the first depth map, D2(x,y) is the pixel value at pixel position (x, y) in the processed second depth map, D3(x,y) is the pixel value at pixel position (x, y) in the final depth map, and w21 and w22 are weighting coefficients with w21 + w22 = 1.
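A minimal sketch of the edge erosion and pixel-wise fusion, assuming depth maps stored as float arrays with 0 marking invalid pixels; modeling the edge processing as a plain morphological erosion is a simplification of the edge/smooth-region classification described in the embodiment below:

```python
import cv2
import numpy as np

def fuse_depth_maps(d1, d2, w21=0.5, ksize=3):
    """Edge-erode the second depth map, then fuse pixel by pixel:
    weighted average where both depths are valid (> 0), zero where
    both are invalid, and the non-zero value where only one is valid."""
    w22 = 1.0 - w21
    d1 = d1.astype(np.float32)
    # gray-scale erosion shrinks valid regions around invalid (0) borders
    d2e = cv2.erode(d2.astype(np.float32), np.ones((ksize, ksize), np.uint8))
    both_valid = (d1 > 0) & (d2e > 0)
    # where exactly one value is non-zero, max() picks it; where both are
    # zero it yields 0, matching the three fusion rules above
    return np.where(both_valid, w21 * d1 + w22 * d2e, np.maximum(d1, d2e))
```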
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below in connection with specific application scenarios.
In this embodiment, a depth data acquisition scheme with both high dynamic range and edge accuracy is provided: multi-frame gray maps are acquired by changing the patterns and/or projection angles of the projected images so as to ensure that the texture information is sufficiently rich; from the multi-frame gray maps, multi-frame gray maps with enhanced effective texture and suppressed background texture are reconstructed; the information of the multi-frame original gray maps and of the enhanced gray maps is encoded into two groups of images respectively; the two groups of encoded images are then each matched with interval sampling, efficiently obtaining two depth maps with different dynamic ranges and good edge accuracy; finally, the two depth maps are fused to obtain a depth map with both high dynamic range and edge accuracy.
Taking a binocular depth camera to collect 4 frames of texture patterns as an example, a description is given below of an implementation flow of the depth data acquisition scheme provided in the embodiment of the present application.
As shown in fig. 2, in this embodiment, the implementation flow of the depth data acquisition scheme, which combines high dynamics and edge accuracy, includes:
1. image acquisition: each frame of image (which can be called a group of images) acquired by the binocular depth camera consists of two images captured by the left camera and the right camera, and the two images are captured simultaneously under the same imaging parameter (the acquisition time is the same or the error of the acquisition time is within an allowable error range); transforming the texture image or changing the projection angle to obtain gray scale images of different textures.
For example, two or more sets of images may be acquired, and the texture light sources of the sets of images are uniform in brightness (the brightness is the same or the brightness difference is within a preset difference range).
As shown in fig. 2, the acquired original gray-scale map may include 4 sets of gray-scale maps: left and right gray maps of texture 1 (including a gray map collected by a left camera under texture 1 and a gray map collected by a right camera, the same applies below), left and right gray maps of texture 2, left and right gray maps of texture 3, and left and right gray maps of texture 4.
The textures 1-4 are different textures obtained by converting texture images or changing projection angles.
2. Background gray scale map: because the average brightness of the background is also consistent in the multi-frame gray level map with the same light source brightness, and the effective texture belongs to the brightness increase; therefore, when the gray value of the same pixel point in the multi-frame image is minimum, the point belongs to the background in the frame, and a background gray image for inhibiting the effective texture can be reconstructed by taking the minimum gray value of the same pixel point in the multi-frame image.
It should be noted that, for the binocular depth camera, in the process of obtaining the background gray level map according to the above manner, the left-eye gray level background map may be obtained according to the image collected by the left camera, and the right-eye gray level background map may be obtained according to the image collected by the right-eye camera.
3. Texture enhanced image: on the basis of the obtained background gray level images, each original gray level image and the background gray level image are subtracted pixel by pixel to obtain the probability that each pixel belongs to an effective texture or a background (the larger the gray level difference value is, the higher the probability of the effective texture is, the smaller the gray level difference value is, the higher the probability of the background is), the pixel belonging to the background is restrained, and the pixel belonging to the effective texture is enhanced, so that a texture enhanced image is obtained.
For any frame gray scale, the gray scale value of each pixel position in the frame gray scale can be reconstructed by the following formula:
G′(x,y) = w11 · G(x,y) + w12 · (G(x,y) - P(x,y))
When the original gray-scale image is subjected to texture enhancement, the gray-scale image collected by the left camera is also required to be subjected to texture enhancement according to the left-eye gray-scale background image, and the gray-scale image collected by the right camera is required to be subjected to texture enhancement according to the right-eye gray-scale background image.
4. Multi-frame image coding synthesis: in order to combine multi-frame texture information into the same image, bilinear interpolation can be carried out on each gray image to obtain an image with amplified resolution, and the bearing area of the information is enlarged; and then the multi-frame amplified images are fused into an image with the same resolution, so that multi-frame information can be combined into the same image.
In addition, when encoding and synthesizing a plurality of frames of images, it is necessary to encode and synthesize images of the left camera and images of the right camera, respectively.
For example, when images of the left and right cameras are coded and synthesized, the magnification of the images may be set according to actual demands. For example, the magnification may be 2, i.e., the resolution of the encoded composite map is 2 times the resolution of the original gray map.
Illustratively, the way the images of the left camera are code synthesized is identical to the way the images of the right camera are code synthesized, e.g. pixel-by-pixel, maximum.
5. Sampling and matching: considering that the resolution of the encoded composite map is larger than that of the required depth map, the composite map can be sampled at intervals to improve matching efficiency, with the sampling points corresponding to the valid points of the original gray map; the disparity range of each point in the composite map is also enlarged in the same proportion, and either unsampled disparity matching or sampled-disparity matching can be selected according to efficiency requirements.
Illustratively, the lower the disparity sampling rate, the higher the depth map accuracy.
6. Matching window: since the images to be matched are composite maps with enlarged resolution that fuse multi-frame information, a matching window of the same pixel size as would be used on a single frame (a single-frame image at the original gray-map resolution) corresponds, after the sampled-matching result is scaled back down to the original resolution, to matching with a smaller window at the original resolution, so the edges of the depth map can be more accurate.
7. Depth map fusion: after multi-frame encoding and matching are performed on the original gray maps and on the texture-enhanced gray maps respectively, two depth maps with different dynamic ranges are obtained. The depth map obtained from the original gray maps (i.e., the first depth map, which may be denoted D1) has quite clear edge details because both active and passive textures are present; the effective texture in the texture-enhanced maps is more pronounced, so the corresponding depth map (i.e., the second depth map, which may be denoted D2) has a higher dynamic range but possibly slightly reduced edge accuracy. Therefore, before fusion, each pixel can be classified into an edge or a smooth region, edge erosion is applied to the second depth map, and the result is fused pixel by pixel with the first depth map to obtain the final depth map D3.
Illustratively, the fusion of the first depth map and the processed second depth map may be achieved by the following formula:
D3(x,y) = w21 · D1(x,y) + w22 · D2(x,y),  if D1(x,y) > 0 and D2(x,y) > 0
D3(x,y) = 0,  if D1(x,y) = 0 and D2(x,y) = 0
D3(x,y) = the non-zero one of D1(x,y) and D2(x,y),  otherwise, with w21 + w22 = 1
as shown in fig. 3, the depth data acquisition procedure in this embodiment may include the following steps:
1. controlling the camera to switch different textures, setting the same imaging parameters, and collecting multi-frame gray images containing left and right gray images (the left gray image is the gray image collected by the left camera, and the right gray image is the gray image collected by the right camera);
2. reconstructing left and right background gray images according to the left and right multi-frame gray images respectively;
3. reconstructing a multi-frame texture enhanced gray scale map according to the background gray scale map and the original multi-frame gray scale map;
4. encoding the original left and right multi-frame gray level images to synthesize a group of left and right images to be matched, and obtaining a first depth image under the original resolution by a sampling matching method;
5. Encoding the left and right multi-frame gray level images after texture enhancement to form a group of left and right images to be matched, and obtaining a second depth image under original resolution by a sampling matching method;
6. classifying the pixel points in the second depth map into edge and smooth regions, and eroding the edge pixel points to obtain an updated second depth map (namely, the processed second depth map);
7. and fusing the first depth map and the updated second depth map pixel by pixel to obtain a final depth map.
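Putting the steps together, the flow of fig. 3 can be sketched end to end by reusing the helper functions sketched above (all names are illustrative assumptions; the disparity maps stand in for depth maps here, and conversion to metric depth via the calibration is omitted):

```python
def depth_pipeline(left_frames, right_frames, max_disp):
    """End-to-end sketch of the flow in fig. 3 for a binocular camera."""
    # Steps 1-2: background gray maps for the left and right cameras
    bg_l = reconstruct_background(left_frames)
    bg_r = reconstruct_background(right_frames)
    # Step 3: texture-enhanced gray maps
    enh_l = [enhance_texture(f, bg_l) for f in left_frames]
    enh_r = [enhance_texture(f, bg_r) for f in right_frames]
    # Step 4: first composite maps and first depth map (from original frames)
    d1 = sampled_match(encode_synthesize(left_frames),
                       encode_synthesize(right_frames), max_disp)
    # Step 5: second composite maps and second depth map (from enhanced frames)
    d2 = sampled_match(encode_synthesize(enh_l),
                       encode_synthesize(enh_r), max_disp)
    # Steps 6-7: edge erosion of the second map and pixel-wise fusion
    return fuse_depth_maps(d1, d2)
```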
The methods provided herein are described above. The apparatus provided in this application is described below:
referring to fig. 4, a schematic structural diagram of a depth data acquiring apparatus according to an embodiment of the present application is shown in fig. 4, where the depth data acquiring apparatus may include:
an acquiring unit 410, configured to acquire, by using an image acquisition device, a multi-frame gray-scale map under different textures of a scene to be measured;
a first determining unit 420, configured to determine a background gray level map according to the multi-frame gray level map;
an enhancement unit 430, configured to obtain a multi-frame texture enhanced gray scale map according to the multi-frame gray scale map and the background gray scale map;
A synthesizing unit 440, configured to code and synthesize the multi-frame gray scale map to obtain a first synthesized map; and performing coding synthesis on the multi-frame texture enhanced gray level image to obtain a second synthesized image;
a second determining unit 450, configured to determine a first depth map according to the first composite map; determining a second depth map according to the second synthetic map;
and a fusion unit 460, configured to fuse the first depth map and the second depth map to obtain a final depth map.
In some embodiments, the image acquisition device includes at least one camera for performing image acquisition, where the at least one camera performs image acquisition under any texture of the scene to be measured;
the first determining unit 420 determines a background gray-scale map according to the multi-frame gray-scale map, including:
and (3) for a multi-frame gray scale image obtained through the same camera, taking the minimum gray scale value of each pixel position in the multi-frame gray scale image, and reconstructing to obtain a background gray scale image corresponding to the multi-frame gray scale image.
In some embodiments, the enhancing unit 430 obtains a multi-frame texture enhanced gray scale map according to the multi-frame gray scale map and the background gray scale map, including:
And reconstructing the gray value of each pixel position in the frame gray map according to the difference value between the gray value of each pixel position in the frame gray map and the gray value of the same pixel position in the corresponding background gray map to obtain the texture enhanced gray map corresponding to the frame gray map.
In some embodiments, the enhancing unit 430 reconstructs the gray value of each pixel position in the frame gray scale map according to the difference between the gray value of each pixel position in the frame gray scale map and the gray value of the same pixel position in the corresponding background gray scale map, including:
and for any pixel position of the frame gray scale image, carrying out weighting operation according to the difference value between the gray scale value of the pixel position in the frame gray scale image and the gray scale value of the pixel position in the corresponding background gray scale image and the gray scale value of the pixel position in the frame gray scale image to obtain the reconstruction gray scale value of the pixel position.
In some embodiments, the image acquisition device includes at least one camera for performing image acquisition, where the at least one camera performs image acquisition under any texture of the scene to be measured;
the synthesizing unit 440 performs encoding synthesis on the multi-frame gray scale image to obtain a first synthesized image, including:
For a multi-frame gray scale map obtained through the same camera, interpolating the multi-frame gray scale map to obtain a multi-frame first to-be-synthesized map with amplified resolution;
image fusion is carried out on the multi-frame first to-be-synthesized image to obtain a first synthesized image;
the synthesizing unit 440 performs encoding synthesis on the multi-frame texture enhanced gray scale map to obtain a second synthesized map, including:
for the multi-frame texture enhanced gray level maps corresponding to the multi-frame gray level maps obtained through the same camera, performing interpolation on the multi-frame texture enhanced gray level maps to obtain a multi-frame second to-be-synthesized map with amplified resolution;
and carrying out image fusion on the multi-frame second to-be-synthesized image to obtain a second synthesized image.
In some embodiments, the synthesizing unit 440 performs image fusion on the multiple frames of the first to-be-synthesized image to obtain a first synthesized image, including:
taking the maximum gray value of each pixel position in the multi-frame first to-be-synthesized image to obtain a first synthesized image;
the synthesizing unit 440 performs image fusion on the multiple frames of second images to be synthesized to obtain a second synthesized image, including:
and taking the maximum gray value of each pixel position in the multi-frame second to-be-synthesized image to obtain a second synthesized image.
In some embodiments, the image capture device is a multi-view depth camera; the number of image frames included in the first synthesized image and the number of image frames included in the second synthesized image are consistent with the number of cameras of the multi-view depth camera; and the resolution of the first synthesized image and the resolution of the second synthesized image are both larger than the resolution of the original gray image;
the second determining unit 450 determines a first depth map according to the first composite map, including:
sampling and matching each frame of image in the first synthetic image to obtain a first depth image with original resolution;
the second determining unit 450 determines a second depth map according to the second composite map, including:
and carrying out sampling matching on each frame of image in the second synthetic image to obtain a second depth image with original resolution.
In some embodiments, the second determining unit 450 performs sample matching on each frame image in the first composite image to obtain a first depth image with an original resolution, including:
according to the resolution of the first synthetic image and the resolution of the original gray level image, sampling each frame of image in the first synthetic image at intervals, and matching sampling points of each frame of image to obtain a first depth image with the original resolution;
The second determining unit 450 performs sampling matching on each frame image in the second composite image to obtain a second depth image with an original resolution, including:
and according to the resolution of the second synthetic image and the resolution of the original gray level image, sampling each frame of image in the second synthetic image at intervals, and matching sampling points of each frame of image to obtain a second depth image with the original resolution.
In some embodiments, the second determining unit 450 matches sampling points of each frame image, including:
and matching sampling points of each frame of image according to the sampling parallax.
In some embodiments, the fusing unit 460 fuses the first depth map and the second depth map to obtain a final depth map, including:
performing edge corrosion treatment on the second depth map to obtain a treated second depth map;
and fusing the first depth map and the processed second depth map to obtain a final depth map.
In some embodiments, the fusing unit 460 fuses the first depth map and the processed second depth map to obtain a final depth map, including:
for any pixel position, if the pixel values of the pixel position in the first depth map and the processed second depth map are both greater than 0, performing weighted average processing on the pixel value of the pixel position in the first depth map and the pixel value of the pixel position in the processed second depth map to obtain the pixel value of the pixel position in the final depth map;
If the pixel value of the pixel position in the first depth map and the processed second depth map is 0, determining that the pixel value of the pixel position in the final depth map is 0;
and if one of the pixel values of the pixel position in the first depth map and the processed second depth map is 0, determining the pixel value of the pixel position in the final depth map according to the non-0 pixel value of the pixel position.
An embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor is configured to execute the machine executable instructions to implement the depth data acquisition method described above.
Fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 501, a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. Also, the processor 501 may perform the depth data acquisition method described above by reading and executing machine executable instructions in the memory 502 corresponding to the depth data acquisition logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
In some embodiments, a machine-readable storage medium, such as memory 502 in fig. 5, is also provided, having stored therein machine-executable instructions that when executed by a processor implement the depth data acquisition method described above. For example, the storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that relational terms such as target and object are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing description is of preferred embodiments of the present application only and is not intended to limit the application to the precise forms disclosed; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present application are intended to be included within its scope of protection.

Claims (15)

1. A depth data acquisition method, comprising:
acquiring multiple frames of grayscale images of a scene to be measured under different textures through an image acquisition device;
determining a background grayscale image according to the multi-frame grayscale images;
obtaining multi-frame texture-enhanced grayscale images according to the multi-frame grayscale images and the background grayscale image;
encoding and synthesizing the multi-frame grayscale images to obtain a first synthesized image, and encoding and synthesizing the multi-frame texture-enhanced grayscale images to obtain a second synthesized image;
determining a first depth map according to the first synthesized image, and determining a second depth map according to the second synthesized image;
and fusing the first depth map and the second depth map to obtain a final depth map.
2. The method according to claim 1, wherein the image acquisition device comprises at least one camera for image acquisition, and the at least one camera performs image acquisition on the scene to be measured under any one of the textures;
wherein determining the background grayscale image according to the multi-frame grayscale images comprises:
for the multi-frame grayscale images obtained by the same camera, taking the minimum gray value at each pixel position across the multi-frame grayscale images, and reconstructing the background grayscale image corresponding to the multi-frame grayscale images.
3. The method according to claim 1, wherein obtaining the multi-frame texture-enhanced grayscale images according to the multi-frame grayscale images and the background grayscale image comprises:
for any frame of grayscale image, reconstructing the gray value of each pixel position in the frame according to the difference between the gray value of each pixel position in the frame and the gray value of the same pixel position in the corresponding background grayscale image, to obtain the texture-enhanced grayscale image corresponding to the frame.
4. The method according to claim 3, wherein reconstructing the gray value of each pixel position in the frame according to the difference between the gray value of each pixel position in the frame and the gray value of the same pixel position in the corresponding background grayscale image comprises:
for any pixel position of the frame, performing a weighting operation on the difference between the gray value of the pixel position in the frame and the gray value of the pixel position in the corresponding background grayscale image, together with the gray value of the pixel position in the frame, to obtain the reconstructed gray value of the pixel position.
5. The method according to claim 1, wherein the image acquisition device comprises at least one camera for image acquisition, and the at least one camera performs image acquisition on the scene to be measured under any one of the textures;
wherein encoding and synthesizing the multi-frame grayscale images to obtain the first synthesized image comprises:
for the multi-frame grayscale images obtained by the same camera, interpolating the multi-frame grayscale images to obtain multiple frames of first images to be synthesized with amplified resolution;
performing image fusion on the multi-frame first images to be synthesized to obtain the first synthesized image;
and wherein encoding and synthesizing the multi-frame texture-enhanced grayscale images to obtain the second synthesized image comprises:
interpolating the multi-frame texture-enhanced grayscale images corresponding to the multi-frame grayscale images obtained by the same camera, to obtain multiple frames of second images to be synthesized with amplified resolution;
and performing image fusion on the multi-frame second images to be synthesized to obtain the second synthesized image.
6. The method according to claim 5, wherein performing image fusion on the multi-frame first images to be synthesized to obtain the first synthesized image comprises:
taking the maximum gray value at each pixel position across the multi-frame first images to be synthesized to obtain the first synthesized image;
and wherein performing image fusion on the multi-frame second images to be synthesized to obtain the second synthesized image comprises:
taking the maximum gray value at each pixel position across the multi-frame second images to be synthesized to obtain the second synthesized image.
7. The method according to claim 1, wherein the image acquisition device is a multi-view depth camera; the number of image frames included in the first synthesized image and in the second synthesized image is consistent with the number of views of the multi-view depth camera; and the resolution of the first synthesized image and the resolution of the second synthesized image are both greater than the resolution of the original grayscale images;
wherein determining the first depth map according to the first synthesized image comprises:
performing sampling and matching on each frame of image in the first synthesized image to obtain the first depth map at the original resolution;
and wherein determining the second depth map according to the second synthesized image comprises:
performing sampling and matching on each frame of image in the second synthesized image to obtain the second depth map at the original resolution.
8. The method according to claim 7, wherein performing sampling and matching on each frame of image in the first synthesized image to obtain the first depth map at the original resolution comprises:
sampling each frame of image in the first synthesized image at intervals according to the resolution of the first synthesized image and the resolution of the original grayscale images, and matching the sampling points of each frame of image to obtain the first depth map at the original resolution;
and wherein performing sampling and matching on each frame of image in the second synthesized image to obtain the second depth map at the original resolution comprises:
sampling each frame of image in the second synthesized image at intervals according to the resolution of the second synthesized image and the resolution of the original grayscale images, and matching the sampling points of each frame of image to obtain the second depth map at the original resolution.
9. The method according to claim 8, wherein matching the sampling points of each frame of image comprises:
matching the sampling points of each frame of image according to the sampling parallax.
10. The method according to claim 1, wherein fusing the first depth map and the second depth map to obtain the final depth map comprises:
performing edge erosion processing on the second depth map to obtain a processed second depth map;
and fusing the first depth map and the processed second depth map to obtain the final depth map.
11. The method according to claim 10, wherein fusing the first depth map and the processed second depth map to obtain the final depth map comprises:
for any pixel position, if the pixel values of the pixel position in both the first depth map and the processed second depth map are greater than 0, performing weighted averaging on the pixel value of the pixel position in the first depth map and the pixel value of the pixel position in the processed second depth map to obtain the pixel value of the pixel position in the final depth map;
if the pixel values of the pixel position in both the first depth map and the processed second depth map are 0, determining that the pixel value of the pixel position in the final depth map is 0;
and if exactly one of the pixel values of the pixel position in the first depth map and the processed second depth map is 0, determining the pixel value of the pixel position in the final depth map according to the non-zero pixel value of the pixel position.
12. A depth data acquisition apparatus, comprising:
an acquisition unit, configured to acquire multiple frames of grayscale images of a scene to be measured under different textures through an image acquisition device;
a first determining unit, configured to determine a background grayscale image according to the multi-frame grayscale images;
an enhancement unit, configured to obtain multi-frame texture-enhanced grayscale images according to the multi-frame grayscale images and the background grayscale image;
a synthesizing unit, configured to encode and synthesize the multi-frame grayscale images to obtain a first synthesized image, and to encode and synthesize the multi-frame texture-enhanced grayscale images to obtain a second synthesized image;
a second determining unit, configured to determine a first depth map according to the first synthesized image and a second depth map according to the second synthesized image;
and a fusion unit, configured to fuse the first depth map and the second depth map to obtain a final depth map.
13. The apparatus according to claim 12, wherein the image acquisition device comprises at least one camera for image acquisition, and the at least one camera performs image acquisition on the scene to be measured under any one of the textures;
wherein the first determining unit determines the background grayscale image according to the multi-frame grayscale images by:
for the multi-frame grayscale images obtained by the same camera, taking the minimum gray value at each pixel position across the multi-frame grayscale images, and reconstructing the background grayscale image corresponding to the multi-frame grayscale images;
and/or,
the enhancement unit obtains the multi-frame texture-enhanced grayscale images according to the multi-frame grayscale images and the background grayscale image by:
for any frame of grayscale image, reconstructing the gray value of each pixel position in the frame according to the difference between the gray value of each pixel position in the frame and the gray value of the same pixel position in the corresponding background grayscale image, to obtain the texture-enhanced grayscale image corresponding to the frame;
wherein the enhancement unit reconstructs the gray value of each pixel position in the frame according to the difference between the gray value of each pixel position in the frame and the gray value of the same pixel position in the corresponding background grayscale image by:
for any pixel position of the frame, performing a weighting operation on the difference between the gray value of the pixel position in the frame and the gray value of the pixel position in the corresponding background grayscale image, together with the gray value of the pixel position in the frame, to obtain the reconstructed gray value of the pixel position;
and/or,
the image acquisition device comprises at least one camera for image acquisition, and the at least one camera performs image acquisition on the scene to be measured under any one of the textures;
wherein the synthesizing unit encodes and synthesizes the multi-frame grayscale images to obtain the first synthesized image by:
for the multi-frame grayscale images obtained by the same camera, interpolating the multi-frame grayscale images to obtain multiple frames of first images to be synthesized with amplified resolution;
performing image fusion on the multi-frame first images to be synthesized to obtain the first synthesized image;
the synthesizing unit encodes and synthesizes the multi-frame texture-enhanced grayscale images to obtain the second synthesized image by:
interpolating the multi-frame texture-enhanced grayscale images corresponding to the multi-frame grayscale images obtained by the same camera, to obtain multiple frames of second images to be synthesized with amplified resolution;
performing image fusion on the multi-frame second images to be synthesized to obtain the second synthesized image;
wherein the synthesizing unit performs image fusion on the multi-frame first images to be synthesized to obtain the first synthesized image by:
taking the maximum gray value at each pixel position across the multi-frame first images to be synthesized to obtain the first synthesized image;
and the synthesizing unit performs image fusion on the multi-frame second images to be synthesized to obtain the second synthesized image by:
taking the maximum gray value at each pixel position across the multi-frame second images to be synthesized to obtain the second synthesized image;
and/or,
the image acquisition device is a multi-view depth camera; the number of image frames included in the first synthesized image and in the second synthesized image is consistent with the number of views of the multi-view depth camera; the resolution of the first synthesized image and the resolution of the second synthesized image are both greater than the resolution of the original grayscale images;
wherein the second determining unit determines the first depth map according to the first synthesized image by:
performing sampling and matching on each frame of image in the first synthesized image to obtain the first depth map at the original resolution;
the second determining unit determines the second depth map according to the second synthesized image by:
performing sampling and matching on each frame of image in the second synthesized image to obtain the second depth map at the original resolution;
wherein the second determining unit performs sampling and matching on each frame of image in the first synthesized image to obtain the first depth map at the original resolution by:
sampling each frame of image in the first synthesized image at intervals according to the resolution of the first synthesized image and the resolution of the original grayscale images, and matching the sampling points of each frame of image to obtain the first depth map at the original resolution;
the second determining unit performs sampling and matching on each frame of image in the second synthesized image to obtain the second depth map at the original resolution by:
sampling each frame of image in the second synthesized image at intervals according to the resolution of the second synthesized image and the resolution of the original grayscale images, and matching the sampling points of each frame of image to obtain the second depth map at the original resolution;
wherein the second determining unit matches the sampling points of each frame of image by:
matching the sampling points of each frame of image according to the sampling parallax;
and/or,
the fusion unit fuses the first depth map and the second depth map to obtain the final depth map by:
performing edge erosion processing on the second depth map to obtain a processed second depth map;
and fusing the first depth map and the processed second depth map to obtain the final depth map;
wherein the fusion unit fuses the first depth map and the processed second depth map to obtain the final depth map by:
for any pixel position, if the pixel values of the pixel position in both the first depth map and the processed second depth map are greater than 0, performing weighted averaging on the pixel value of the pixel position in the first depth map and the pixel value of the pixel position in the processed second depth map to obtain the pixel value of the pixel position in the final depth map;
if the pixel values of the pixel position in both the first depth map and the processed second depth map are 0, determining that the pixel value of the pixel position in the final depth map is 0;
and if exactly one of the pixel values of the pixel position in the first depth map and the processed second depth map is 0, determining the pixel value of the pixel position in the final depth map according to the non-zero pixel value of the pixel position.
14. An electronic device, comprising a processor and a memory, wherein the memory stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to implement the method of any one of claims 1-11.
15. A machine-readable storage medium having machine-executable instructions stored thereon, wherein the machine-executable instructions, when executed by a processor, implement the method of any one of claims 1-11.
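
Read as algorithms, the claims above describe concrete image operations, and several of them lend themselves to short illustrative sketches. The Python sketches that follow are editorial illustrations under stated assumptions, not the application's implementation. First, the per-pixel-minimum rule of claim 2; the numpy stacking convention and the function name background_gray_map are assumptions:

```python
import numpy as np

def background_gray_map(frames: np.ndarray) -> np.ndarray:
    """frames: (N, H, W) uint8 stack of grayscale images captured by one
    camera under N different textures.

    Claim 2 reconstructs the background by taking, at each pixel position,
    the minimum gray value across the N frames."""
    return frames.min(axis=0)
```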
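
The difference-plus-weighting reconstruction of claims 3-4 plausibly reads as a weighted combination of the texture difference and the original gray value; the weights w_diff and w_orig and the clipping to uint8 are hypothetical choices for this sketch, not values given in the application:

```python
import numpy as np

def texture_enhanced_frame(frame: np.ndarray, background: np.ndarray,
                           w_diff: float = 0.7, w_orig: float = 0.3) -> np.ndarray:
    """Reconstruct each pixel from (frame - background) and the frame itself,
    one reading of the weighting operation in claim 4."""
    diff = frame.astype(np.int32) - background.astype(np.int32)  # texture difference
    enhanced = w_diff * diff + w_orig * frame.astype(np.int32)   # weighted combination
    return np.clip(enhanced, 0, 255).astype(np.uint8)
```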
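
A minimal sketch of the encoding synthesis of claims 5-6, assuming OpenCV bilinear interpolation and a 2x resolution amplification; neither the scale factor nor the interpolation mode is specified by the application:

```python
import cv2
import numpy as np

def encode_synthesize(frames, scale: int = 2) -> np.ndarray:
    """Interpolate each frame to an amplified resolution (claim 5), then fuse
    the upscaled frames by taking the maximum gray value at each pixel
    position (claim 6)."""
    upscaled = [cv2.resize(f, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_LINEAR) for f in frames]
    return np.max(np.stack(upscaled, axis=0), axis=0)
```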
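
The interval sampling of claims 8-9 follows from the resolution ratio: if the synthesized image is k times the original resolution, taking every k-th pixel restores an original-resolution grid, and a disparity measured on the synthesized grid corresponds to disparity/k on the original grid. The sketch covers only the sampling; the matching itself (for example, block matching between views) is left as a hypothetical step:

```python
import numpy as np

def interval_sample(synth: np.ndarray, orig_h: int, orig_w: int) -> np.ndarray:
    """Sample the synthesized image at intervals given by the ratio of the
    synthesized resolution to the original resolution (claim 8)."""
    ky = synth.shape[0] // orig_h
    kx = synth.shape[1] // orig_w
    return synth[::ky, ::kx][:orig_h, :orig_w]

# A disparity found between sampling points of the synthesized views would
# then be rescaled to the original grid, e.g. disparity_orig = disparity / kx,
# which is one hypothetical reading of matching "according to the sampling
# parallax" in claim 9.
```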
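
Claims 10-11 state the fusion rules explicitly, so they can be sketched almost verbatim; the erosion kernel size and the weight w1 are assumptions, 0 is taken to mark an invalid depth pixel, and the one-valid-pixel case is read as taking the non-zero value directly:

```python
import cv2
import numpy as np

def fuse_depth_maps(depth1: np.ndarray, depth2: np.ndarray,
                    w1: float = 0.5, kernel_size: int = 3) -> np.ndarray:
    """Edge-erode the second depth map (claim 10), then fuse per pixel
    (claim 11): both valid -> weighted average; both 0 -> 0; exactly one
    valid -> the non-zero value."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    valid2 = (depth2 > 0).astype(np.uint8)
    depth2_eroded = depth2 * cv2.erode(valid2, kernel)  # shrink the valid region's edges
    both_valid = (depth1 > 0) & (depth2_eroded > 0)
    return np.where(both_valid,
                    w1 * depth1 + (1.0 - w1) * depth2_eroded,  # weighted average
                    np.maximum(depth1, depth2_eroded))         # one-valid or both-0
```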
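
Finally, the overall method of claim 1 wires these steps together. The sketch reuses the functions above and collapses the multi-view sampling and matching of claims 7-9 into a single hypothetical match_fn callable, since the application's matching procedure is not reproduced here:

```python
import numpy as np

def depth_pipeline(frames: np.ndarray, match_fn) -> np.ndarray:
    """frames: (N, H, W) uint8 stack from one camera under N textures.
    match_fn: hypothetical callable turning a synthesized image into a
    depth map (stands in for the sampling and matching of claims 7-9)."""
    background = background_gray_map(frames)                            # claim 2
    enhanced = [texture_enhanced_frame(f, background) for f in frames]  # claims 3-4
    first_synth = encode_synthesize(list(frames))                       # claims 5-6
    second_synth = encode_synthesize(enhanced)                          # claims 5-6
    first_depth = match_fn(first_synth)                                 # claims 7-9
    second_depth = match_fn(second_synth)                               # claims 7-9
    return fuse_depth_maps(first_depth, second_depth)                   # claims 10-11
```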
CN202310110036.4A 2023-02-08 2023-02-08 Depth data acquisition method, device, electronic equipment and machine-readable storage medium Pending CN116168071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310110036.4A CN116168071A (en) 2023-02-08 2023-02-08 Depth data acquisition method, device, electronic equipment and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN116168071A (en) 2023-05-26

Family

ID=86419572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310110036.4A Pending CN116168071A (en) 2023-02-08 2023-02-08 Depth data acquisition method, device, electronic equipment and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN116168071A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778398A (en) * 2023-08-21 2023-09-19 中铁十二局集团铁路养护工程有限公司 Bimodal track bed foreign matter detection method and system based on feature reconstruction
CN116778398B (en) * 2023-08-21 2023-11-14 中铁十二局集团铁路养护工程有限公司 Bimodal track bed foreign matter detection method and system based on feature reconstruction

Similar Documents

Publication Publication Date Title
US10944960B2 (en) Free-viewpoint video generating method and free-viewpoint video generating system
KR101643607B1 (en) Method and apparatus for generating of image data
JPWO2008050904A1 (en) High resolution virtual focal plane image generation method
CN107077743A (en) System and method for the dynamic calibration of array camera
US20130272600A1 (en) Range image pixel matching method
JP2013531268A (en) Measuring distance using coded aperture
JP7170224B2 (en) Three-dimensional generation method and three-dimensional generation device
CN116168071A (en) Depth data acquisition method, device, electronic equipment and machine-readable storage medium
WO2022193288A1 (en) Image processing method and apparatus, and computer readable storage medium
JP2015046678A (en) Image processing device, image processing method and imaging device
WO2014171438A1 (en) Three-dimensional shape measurement device, three-dimensional shape measurement method, and three-dimensional shape measurement program
CN115578296B (en) Stereo video processing method
JP2015207090A (en) Image processor, and control method thereof
US11967096B2 (en) Methods and apparatuses of depth estimation from focus information
WO2011086594A1 (en) Image processing apparatus and method therefor
JP2016201788A (en) Image processing system, imaging apparatus, image processing method, and program
CN115880424A (en) Three-dimensional reconstruction method and device, electronic equipment and machine-readable storage medium
CN112950698B (en) Depth estimation method, device, medium and equipment based on binocular defocused image
CN114782507A (en) Asymmetric binocular stereo matching method and system based on unsupervised learning
US9924157B2 (en) Image processing device, image pickup apparatus, image processing method, and storage medium
KR102199071B1 (en) Method and apparatus for reconstructing 4D image based on integrated image
KR102636767B1 (en) Method for capturing and processing a digital panoramic image
KR101501591B1 (en) Apparatus and method for Focus Correction of Stereoscopic Images using Local Warps and Texture Synthesis
CN114677315B (en) Image fusion method, device, equipment and medium based on image and laser point cloud
CN113313646B (en) Image processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination