CN114511608A - Method, device, terminal, imaging system and medium for acquiring depth image - Google Patents

Method, device, terminal, imaging system and medium for acquiring depth image

Info

Publication number
CN114511608A
Authority
CN
China
Prior art keywords
image
speckle
target object
depth
depth image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210092986.4A
Other languages
Chinese (zh)
Inventor
杨晓立
余宇山
刘贤焯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202210092986.4A priority Critical patent/CN114511608A/en
Publication of CN114511608A publication Critical patent/CN114511608A/en
Priority to PCT/CN2022/100636 priority patent/WO2023142352A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The present application relates to the technical field of image processing and provides a method, an apparatus, a terminal, an imaging system, and a medium for acquiring a depth image. The method comprises: acquiring a color image obtained by photographing a target object with a first camera; controlling a speckle projector to project a preset speckle pattern onto the target object, and acquiring a speckle image obtained by photographing the target object with a second camera; determining a first depth image of the target object based on the speckle pattern on the target object in the speckle image; and extracting and matching image features of the color image and the speckle image, and obtaining a second depth image of the target object from the matched image features and the first depth image. Embodiments of the application improve the precision of the depth image while reducing hardware cost.

Description

Method, device, terminal, imaging system and medium for acquiring depth image
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a method, an apparatus, a terminal, an imaging system, and a medium for obtaining a depth image.
Background
A depth image, also known as a range image, is an image whose pixel values are the distances (depths) from the image sensor to points in the scene; it directly reflects the geometry of the scene's visible surfaces. At present, depth images are acquired mainly by passive binocular stereo vision systems, monocular speckle structured light systems, and binocular speckle structured light systems.
A passive binocular stereo vision system consists of two cameras and is sensitive to the surface texture of the measured object: when the surface is weakly textured or textureless, reliable depth estimation is difficult to obtain.
A monocular speckle structured light system typically consists of an infrared (IR) camera and a near-infrared speckle projector. This active three-dimensional imaging approach projects random speckles, invisible to the human eye, onto the surface of the measured object, and can obtain reliable depth estimates even for weakly textured or textureless surfaces. To obtain color information of the measured object, an additional RGB camera is required. Monocular speckle structured light also has limitations. For an object with a low-reflectivity surface, the light reflected back to the IR camera is weak, so holes easily appear in the depth image. For a distant object, light attenuation means the IR camera can barely capture the speckle pattern projected onto the object's surface, which likewise causes holes in the depth image. In addition, monocular speckle structured light systems are susceptible to sunlight interference outdoors.
A binocular speckle structured light system consists of two IR cameras and a speckle projector. It can perform image matching using the speckle pattern projected by the speckle projector and, at the same time, using the texture of the object's surface, so it is usable both indoors and outdoors. As before, an additional RGB camera is required to obtain color information; compared with a monocular speckle structured light system, one more IR camera is needed, which increases hardware cost. Furthermore, to obtain a depth image in the RGB camera coordinate system, the depth image must be re-projected onto the RGB camera image plane. Because there is a certain distance between the optical centers of the RGB camera and the IR cameras, occluded regions of a three-dimensional scene produce new holes in the depth re-projected into the RGB camera coordinate system, which is undesirable in practical applications.
Disclosure of Invention
The embodiment of the application provides a method, a device, a terminal, an imaging system and a medium for obtaining a depth image, which can improve the precision of the depth image and reduce the hardware cost.
A first aspect of an embodiment of the present application provides a method for obtaining a depth image, including:
acquiring a color image obtained by shooting a target object by a first camera;
controlling a speckle projector to project a preset speckle pattern on the target object, and acquiring a speckle image obtained by shooting the target object by a second camera;
determining a first depth image of a target object based on a speckle pattern projected on the target object in the speckle image;
and extracting and matching image features of the color image and the speckle image, and obtaining a second depth image of the target object based on the matched image features and the first depth image.
A second aspect of the embodiments of the present application provides an apparatus for obtaining a depth image, including:
a color image acquisition unit configured to acquire a color image obtained by photographing a target object by a first camera;
the speckle image acquisition unit is used for controlling the speckle projector to project a preset speckle pattern on the target object and then acquiring a speckle image obtained by shooting the target object by a second camera;
a monocular structured light unit to determine a first depth image of a target object based on a speckle pattern projected on the target object in the speckle image;
and the depth image acquisition unit is used for extracting and matching the image characteristics of the color image and the speckle image, and obtaining a second depth image of the target object according to the matched image characteristics and the first depth image.
A third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a depth imaging system, including a first camera, a second camera, a speckle projector, and a terminal provided in the third aspect of the embodiments of the present application, where the first camera, the second camera, and the speckle projector are disposed on a same plane;
the speckle projector is used for projecting a speckle pattern to a target object;
the first camera is used for acquiring a color image of the target object;
the second camera is used for acquiring a speckle image of the target object;
and the terminal is used for acquiring the depth image of the target object by utilizing the color image and the speckle image.
A fifth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the above method.
A sixth aspect of embodiments of the present application provides a computer program product, which when run on a terminal, causes the terminal to perform the steps of the method.
In the embodiments of the present application, a color image of a target object captured by a first camera and a speckle image captured by a second camera are acquired, and a first depth image of the target object is determined based on the speckle pattern projected onto the target object in the speckle image. The first depth image is produced by the monocular structured light subsystem of the imaging system and therefore reconstructs weakly textured or textureless objects well. The color image and the speckle image can serve as the left and right images of a binocular stereo vision system, which is advantageous for measuring distant objects and retains some reconstruction capability for low-reflectivity surfaces. A second depth image of the target object is then obtained from the color image, the speckle image, and the first depth image, so the number of cameras is reduced while the precision of the depth image is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic structural diagram of a depth imaging system provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of an implementation of a method for obtaining a depth image according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a specific implementation of step S204 provided in the embodiment of the present application;
fig. 4 is a schematic flowchart of a stereo matching network model provided in an embodiment of the present application;
fig. 5a is a schematic diagram of a color image acquired by a first camera according to an embodiment of the present application;
FIG. 5b is a schematic diagram of a speckle image captured by a second camera provided by an embodiment of the present application;
FIG. 5c is a schematic diagram of a first depth image provided by an embodiment of the present application;
FIG. 5d is a schematic diagram of a second depth image provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for acquiring a depth image according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any inventive step, are intended to be protected by the present application.
As discussed in the Background above, each existing approach has drawbacks: passive binocular stereo vision fails on weakly textured or textureless surfaces; monocular speckle structured light produces holes for low-reflectivity or distant objects and is vulnerable to outdoor sunlight; and binocular speckle structured light requires an extra IR camera and suffers occlusion holes when its depth is re-projected into the RGB camera coordinate system.
Therefore, there is a need for an imaging system and a corresponding depth image acquisition method that can combine the advantages of a passive binocular stereo vision system, a monocular speckle structured light system, and a binocular speckle structured light system.
In order to explain the technical means of the present application, the following description will be given by way of specific examples.
Fig. 1 illustrates a depth imaging system provided by the present application, suited to scenarios that require improved depth image accuracy at reduced hardware cost.
The imaging system may include a first camera 11, a second camera 12, a speckle projector 13, and a terminal (not shown), wherein the first camera 11, the second camera 12, and the speckle projector 13 are disposed on the same plane.
The speckle projector 13 is configured to project a speckle pattern onto a target object, where a spectral range corresponding to the projected speckle pattern may be a near infrared spectrum. The first camera 11 is used for acquiring a color image of a target object; the second camera 12 is used to acquire speckle images of the target object. The terminal is used for acquiring the depth image of the target object by utilizing the color image and the speckle image.
It should be noted that the second camera 12 and the speckle projector 13 are separated by one baseline length, and the first camera 11 and the second camera 12 by another. In some embodiments of the present application, the first camera 11 should be as close to the speckle projector 13 as possible; ideally, the first camera 11 is mounted directly against the speckle projector 13.
In some embodiments of the present application, the first camera 11 may be an RGB camera, and the spectral range in which it operates may be the visible light spectrum.
In some embodiments of the present application, the second camera 12 may be an IR camera whose operating spectral range covers both the infrared and visible spectra. In a conventional monocular speckle structured light system, the IR camera must be fitted with a narrow-band filter to block ambient light, so that it senses only near-infrared light. In the embodiments of the present application, by contrast, the second camera 12 does not need a narrow-band filter.
In some embodiments of the present application, the terminal in the imaging system may include a processor, a memory, and a computer program stored in the memory and executable on the processor, and may be configured to perform image processing on the color image acquired by the first camera 11 and the speckle image acquired by the second camera 12 to obtain a depth image of the target object.
In the embodiments of the present application, the second camera 12 and the speckle projector 13 form a monocular structured light subsystem, while the first camera 11 and the second camera 12 form a binocular stereo vision subsystem. The monocular structured light subsystem reconstructs weakly textured or textureless objects well, while the binocular stereo vision subsystem is advantageous for measuring distant objects and retains some reconstruction capability for low-reflectivity surfaces. Moreover, because the second camera 12 combines the sensing capabilities of an IR camera and an RGB camera, the number of IR cameras is reduced, and the final output depth image and color image are naturally aligned pixel by pixel.
Therefore, the embodiments of the present application reduce hardware cost while improving depth image precision, combining the respective advantages of the monocular speckle structured light system and the passive binocular stereo vision system while overcoming the drawbacks of the binocular speckle structured light system.
Specifically, fig. 2 shows a schematic flow chart of an implementation of the method for acquiring a depth image according to the embodiments of the present application. The method can be applied to a terminal and suits scenarios that require improved depth image accuracy at reduced hardware cost.
The terminal may be a terminal in the imaging system or a terminal separated from the imaging system. The terminal can be specifically a computer, a tablet computer, a smart phone and other devices.
Specifically, the above-described method for acquiring a depth image may include the following steps S201 to S204.
In step S201, a color image obtained by photographing a target object with a first camera is acquired.
Step S202, after controlling the speckle projector to project the preset speckle pattern on the target object, acquiring a speckle image obtained by shooting the target object by the second camera.
In an embodiment of the present application, the terminal may generate a first control signal to control the speckle projector 13 to project a preset speckle pattern on the target object.
Then, the terminal may generate a second control signal to trigger image capture by the first camera 11 and the second camera 12, and acquire the captured images. Alternatively, after the preset speckle pattern is projected onto the target object, the terminal may receive images that the two cameras capture and upload on their own initiative.
In some specific embodiments of the present application, when moving objects are present in the scene, the terminal may control the first camera 11 and the second camera 12 to capture at the same frequency and derive the depth image of the target object from a color image and a speckle image captured at the same sampling instant. For a static scene, the terminal may instead use a color image and a speckle image captured at different instants.
Step S203 determines a first depth image of the target object based on the speckle pattern projected on the target object in the speckle image.
In some embodiments of the present application, after the speckle projector 13 projects the preset speckle pattern onto the target object, a speckle pattern forms on the object's surface, and the surface geometry affects parameters such as the shape of and spacing between the individual speckles. Accordingly, based on the speckle pattern on the target object in the speckle image, the terminal can determine a first depth image of the target object. Even for a weakly textured or textureless target object, the first depth image reflects the depth of its surface well.
Specifically, the terminal may identify each speckle of the speckle pattern in the speckle image and compute a first depth value for each pixel on the target object's surface based on diffraction grating theory and triangulation. The terminal may instead feed the speckle image into an existing neural network model to obtain the corresponding first depth image, or match each speckle in the speckle image against the preset speckle pattern to obtain the first depth image.
It should be noted that other first depth image obtaining methods applied to the monocular speckle structured light system are also applicable to the present application, and the specific manner adopted in step S203 may be selected according to actual situations, which is not limited in the present application.
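For illustration only, the following sketch shows one common way such a first depth image can be computed in a monocular speckle structured light setup: block-matching the captured speckle image against a reference pattern recorded at a known distance, then triangulating. The function and parameter names, the ZNCC matching score, and the depth relation used here are assumptions made for the sketch, not the specific algorithm of this application.

```python
# Hypothetical sketch: monocular speckle depth by block-matching against a
# reference pattern captured at a known plane, then triangulating.
import numpy as np

def speckle_depth(speckle_img, reference_img, fx, baseline, ref_depth,
                  block=11, max_shift=64):
    """fx: focal length in pixels; baseline: projector-camera distance (m);
    ref_depth: distance of the plane where the reference was recorded (m)."""
    h, w = speckle_img.shape
    half = block // 2
    depth = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_shift, w - half):
            patch = speckle_img[y-half:y+half+1, x-half:x+half+1]
            best, best_d = -1.0, 0
            for d in range(max_shift):  # search along the epipolar line
                cand = reference_img[y-half:y+half+1, x-d-half:x-d+half+1]
                # zero-mean normalized cross-correlation as the match score
                a = patch - patch.mean()
                b = cand - cand.mean()
                denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-8
                score = (a * b).sum() / denom
                if score > best:
                    best, best_d = score, d
            # shift relative to the reference plane -> absolute depth:
            # 1/Z = 1/Z_ref + d/(fx*b); the sign depends on the rig geometry
            depth[y, x] = 1.0 / (1.0 / ref_depth + best_d / (fx * baseline))
    return depth
```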
And step S204, extracting and matching image characteristics of the color image and the speckle image, and obtaining a second depth image of the target object according to the matched image characteristics and the first depth image.
Specifically, as shown in fig. 3, the step S204 may include the following steps S301 to S303.
In step S301, a first characteristic image of the color image and a second characteristic image of the speckle image are extracted.
Specifically, in the embodiments of the present application, the terminal may extract image features of the color image with a first feature extraction algorithm to obtain the first feature image, in which each pixel represents the image features of the pixel at the same position in the color image.
Similarly, image features of the speckle image may be extracted with a second feature extraction algorithm to obtain the second feature image, in which each pixel represents the image features of the pixel at the same position in the speckle image.
The first feature extraction algorithm and the second feature extraction algorithm may be the same or different.
Because the operating spectral range of the second camera 12 covers both the infrared and visible spectra, the second feature image captures visible-spectrum image features, so the first and second feature images can serve as the feature images of the left and right images of a passive binocular stereo vision system. At the same time, the second feature image also captures near-infrared features, i.e., the features of the speckle pattern projected by the speckle projector 13 onto the target object.
It should be noted that, to ensure accurate feature extraction, the terminal may pre-process the color image and the speckle image before extracting their features. The pre-processing may include background removal, alignment (e.g., stereo rectification), and enhancement.
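As a concrete example of the alignment step just mentioned, the following sketch rectifies the color/speckle pair with OpenCV so that matching pixels lie on the same scan line. The calibration inputs (K1, D1, K2, D2, R, T) are assumed to come from an offline calibration and are placeholders, not values from this application.

```python
# Minimal rectification sketch for the "alignment" pre-processing step,
# assuming the two cameras have been calibrated offline.
import cv2

def rectify_pair(color_img, speckle_img, K1, D1, K2, D2, R, T):
    size = (color_img.shape[1], color_img.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_color = cv2.remap(color_img, m1x, m1y, cv2.INTER_LINEAR)
    rect_speckle = cv2.remap(speckle_img, m2x, m2y, cv2.INTER_LINEAR)
    return rect_color, rect_speckle, Q  # Q maps disparity to 3D if needed
```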
Step S302, a first matching image is obtained by matching the first characteristic image and the second characteristic image, and a first fusion image is obtained by fusing the first matching image and the first depth image.
In some embodiments of the present application, the terminal may compute, at each disparity, the matching feature between a first pixel in the first feature image and the corresponding second pixel in the second feature image, yielding a first matching image, i.e., a cost volume c_d. The first pixel is any pixel of the first feature image; traversing every pixel of the first feature image produces the complete first matching image.
In some embodiments of the present application, the first matching image is

c_d(d, x, y) = ⟨f_RGB(x, y), f_IR(x − d, y)⟩ / N_c

wherein ⟨f_RGB(x, y), f_IR(x − d, y)⟩ denotes the inner product of f_RGB(x, y) and f_IR(x − d, y); N_c is the number of feature channels; f_RGB(x, y) denotes the pixel value of a first pixel in the first feature image; and f_IR(x − d, y) denotes the pixel value of the corresponding second pixel in the second feature image at disparity d. The inner product is computed for each disparity d.
In other embodiments of the present application, the first matching image is C_concat(d, x, y) = Concat{f_RGB(x, y), f_IR(x − d, y)}, where Concat{f_RGB(x, y), f_IR(x − d, y)} denotes concatenating the first feature image and the second feature image at each disparity d.
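The two cost-volume constructions above can be sketched as follows in PyTorch, assuming (N, C, H, W) feature maps; the function names and tensor layout are assumptions for illustration.

```python
# Sketch of the two cost-volume constructions described above.
import torch

def corr_cost_volume(f_rgb, f_ir, max_disp):
    """Inner-product cost: c_d(d,x,y) = <f_RGB(x,y), f_IR(x-d,y)> / N_c."""
    n, c, h, w = f_rgb.shape
    cost = f_rgb.new_zeros(n, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (f_rgb * f_ir).mean(dim=1)
        else:
            cost[:, d, :, d:] = (f_rgb[..., d:] * f_ir[..., :-d]).mean(dim=1)
    return cost

def concat_cost_volume(f_rgb, f_ir, max_disp):
    """Concat variant: C_concat(d,x,y) = Concat{f_RGB(x,y), f_IR(x-d,y)}."""
    n, c, h, w = f_rgb.shape
    cost = f_rgb.new_zeros(n, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        cost[:, :c, d] = f_rgb
        if d == 0:
            cost[:, c:, d] = f_ir
        else:
            cost[:, c:, d, :, d:] = f_ir[..., :-d]
    return cost
```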
Further, after obtaining the first matching image, the terminal may fuse features of the first matching image and the first depth image, adjusting the first matching image according to the first depth image to obtain a first fused image.
It should be noted that the first matching image, i.e., the cost volume c_d, expresses the correlation between the first and second feature images at each disparity during stereo matching. Because this cost volume is built by passive binocular stereo matching, its accuracy in textureless regions is lower than that of a structured light system; the first matching image and the first depth image can therefore be fused by using the first depth image to adjust the cost volume.
In one embodiment, the first matching image and the first depth image may be fused in several ways; preferably, the distribution of the first matching image is adjusted with a Gaussian to obtain the first fused image, according to the following formula:
h_d(i, j) = c_d(i, j) · (1 − v_ij + v_ij · k · exp(−(d − g_ij)² / (2σ²)))

wherein h_d is the adjusted cost volume, i.e., the first fused image; g_ij is the corresponding disparity map obtained by re-projecting the first depth image; v_ij is a binary mask, with v_ij = 1 when g_ij ≠ 0 (and 0 otherwise); and k and σ are preset parameters that determine the shape of the Gaussian distribution.
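A minimal sketch of this Gaussian modulation follows, assuming an (N, D, H, W) cost volume and the reconstructed formula above; the default values of k and σ are placeholders.

```python
# Sketch of the Gaussian cost-volume modulation: h_d = c_d*(1 - v + v*k*exp(...)).
# `cost` is (N, D, H, W); `g` is the disparity map re-projected from the
# first depth image, with 0 marking holes; k and sigma are preset parameters.
import torch

def fuse_with_depth(cost, g, k=10.0, sigma=1.0):
    n, dmax, h, w = cost.shape
    d = torch.arange(dmax, device=cost.device).view(1, dmax, 1, 1).float()
    g = g.view(n, 1, h, w)
    v = (g != 0).float()                      # binary mask v_ij
    gauss = k * torch.exp(-(d - g) ** 2 / (2 * sigma ** 2))
    return cost * (1 - v + v * gauss)
```

Where the first depth image is reliable, this sharpens the cost distribution around the structured-light disparity; where it has holes (v_ij = 0), the passive binocular cost is left untouched.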
Step S303, performing disparity estimation on the first fused image to obtain a disparity map of the target object, and determining a second depth image of the target object based on the disparity map.
Specifically, a soft argmin operation is performed on the first fused image to obtain the disparity map. The disparity value in the disparity map is

d̂ = Σ_{d=0}^{D_max − 1} d · σ(c_d)

wherein D_max is the maximum disparity value, d ∈ [0, D_max), and σ(c_d) denotes a softmax performed over the cost volume c_d along the disparity dimension. After the disparity map is obtained, the second depth image can be derived from it (for a rectified camera pair, depth is inversely proportional to disparity). It should be understood that the disparity map may be obtained through a single softmax operation, or refined over multiple iterations after the softmax operation under the supervision of a loss function, which is not limited here.
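A sketch of the soft argmin and the disparity-to-depth conversion follows, under the assumption of a rectified pair with focal length fx and baseline B (both placeholders); negate the cost before the softmax if lower cost should mean a better match.

```python
# Soft-argmin sketch: expected disparity under a softmax over the (fused)
# cost volume, then disparity -> depth via Z = fx * B / d.
import torch
import torch.nn.functional as F

def soft_argmin_disparity(cost):
    prob = F.softmax(cost, dim=1)             # softmax over disparity dim
    d = torch.arange(cost.shape[1],
                     device=cost.device).view(1, -1, 1, 1).float()
    return (prob * d).sum(dim=1)              # (N, H, W) disparity map

def disparity_to_depth(disp, fx, baseline, eps=1e-6):
    return fx * baseline / (disp + eps)       # avoid division by zero
```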
In some embodiments of the present application, as shown in fig. 4, the step S204 may be implemented by a trained stereo matching network model. Specifically, the terminal may input the color image, the speckle image, and the first depth image to the stereo matching network model, and acquire the second depth image output by the stereo matching network model.
Specifically, the stereo matching network may include a first feature extraction layer, a second feature extraction layer, a feature matching layer, a depth feature fusion layer, and a disparity estimation layer.
The first feature extraction layer may extract a first feature image of the color image based on a feature extraction algorithm. The second feature extraction layer may then extract a second feature image of the IR speckle image based on a feature extraction algorithm.
The feature matching layer can calculate matching features of a first pixel point in the first feature image and a second pixel point corresponding to the first pixel point in the second feature image under each parallax to obtain a first matching image.
After the first matching image is obtained, the depth feature fusion layer may fuse features of the first matching image and the first depth image to obtain the first fused image.
The disparity estimation layer may obtain a disparity map of the target object by performing disparity estimation on the first fusion image, and determine the second depth image based on the disparity map.
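Putting the five layers together, a hedged end-to-end skeleton might look as follows; it reuses corr_cost_volume, fuse_with_depth, and soft_argmin_disparity from the sketches above, and all module shapes, channel counts, and the depth-to-disparity conversion are assumptions rather than this application's actual network design.

```python
# Illustrative skeleton of the five layers named above (not the patent's design).
import torch
import torch.nn as nn

class StereoMatchingNet(nn.Module):
    def __init__(self, feat_ch=32, max_disp=64):
        super().__init__()
        self.max_disp = max_disp
        # first / second feature extraction layers (weights not shared,
        # since one input is RGB and the other an IR speckle image)
        self.feat_rgb = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1))
        self.feat_ir = nn.Sequential(
            nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1))
        # simple cost aggregation applied after the fusion layer
        self.agg = nn.Sequential(
            nn.Conv2d(max_disp, max_disp, 3, padding=1), nn.ReLU(),
            nn.Conv2d(max_disp, max_disp, 3, padding=1))

    def forward(self, rgb, ir, depth1, fx, baseline):
        f1, f2 = self.feat_rgb(rgb), self.feat_ir(ir)
        cost = corr_cost_volume(f1, f2, self.max_disp)   # feature matching layer
        # first depth image -> guidance disparity, keeping holes at 0
        valid = depth1 > 0
        g = torch.where(valid, fx * baseline / depth1.clamp(min=1e-6),
                        torch.zeros_like(depth1))
        fused = fuse_with_depth(cost, g)                 # depth feature fusion layer
        disp = soft_argmin_disparity(self.agg(fused))    # disparity estimation layer
        return fx * baseline / disp.clamp(min=1e-6)      # second depth image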
In some embodiments of the present application, training the stereo matching network model may proceed as follows: acquire a sample set in which each sample comprises a sample color image, a sample speckle image, and a sample first depth image; randomly draw samples from the set, feed them into the neural network model to be trained, compute the error between the output depth image and the sample's reference depth image, adjust the model's parameters and weights according to the error, and draw new samples for further training until the number of training iterations reaches a preset threshold. The trained neural network model is then used as the stereo matching network model.
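A minimal training-loop sketch matching the procedure just described appears below, assuming the skeleton above and a data loader yielding "color", "speckle", "depth1", and reference-depth tensors; the L1 loss and all hyper-parameter values are assumptions.

```python
# Hedged training-loop sketch: random batches, L1 loss against the
# reference depth, fixed iteration budget. All names are placeholders.
import torch
import torch.nn.functional as F

def train(model, loader, iters=100000, lr=1e-3, fx=600.0, baseline=0.05):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    step = 0
    while step < iters:
        for batch in loader:                  # randomly sampled batches
            pred = model(batch["color"], batch["speckle"],
                         batch["depth1"], fx, baseline)
            loss = F.l1_loss(pred, batch["ref_depth"])
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= iters:
                break
    return model
```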
In the embodiments of the present application, a color image of the target object captured by the first camera 11 and a speckle image captured by the second camera 12 are acquired, and a first depth image of the target object is determined from the speckle pattern projected onto the target object in the speckle image. The first depth image comes from the monocular structured light subsystem of the imaging system and therefore reconstructs weakly textured or textureless objects well. The color image and the speckle image can serve as the left and right images of a binocular stereo vision system, which is advantageous for measuring distant objects and retains some reconstruction capability for low-reflectivity surfaces. The second depth image of the target object is then obtained from the color image, the speckle image, and the first depth image, reducing the number of cameras while improving depth image precision.
That is, compared with a passive binocular stereo vision system, the scheme provided by the present application outputs reliable depth information for weakly textured or textureless objects as well as richly textured ones. Compared with a monocular speckle structured light system, it covers a larger measurement range and performs better on low-albedo targets; in outdoor environments with strong sunlight interference, it can fall back to operating as an ordinary passive binocular stereo vision system. Compared with a binocular speckle structured light system, it requires one fewer IR camera. Moreover, because the second camera 12 senses both visible and infrared light, the effect is equivalent to the optical centers of the RGB and IR cameras coinciding: there are no occluded regions, no holes appear in the depth image, and the depth image and the RGB image are naturally aligned pixel by pixel.
Fig. 5a shows a color image acquired by the first camera 11 and fig. 5b a speckle image acquired by the second camera 12; fig. 5c shows the first depth image produced by the monocular structured light subsystem, and fig. 5d the second depth image output by the stereo matching network model. As the images show, the imaging system and the corresponding depth image acquisition method of the present application yield a high-precision depth image.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders.
Fig. 6 is a schematic structural diagram of an apparatus 600 for acquiring a depth image according to an embodiment of the present disclosure, where the apparatus 600 for acquiring a depth image is configured on a terminal.
Specifically, the apparatus 600 for acquiring a depth image may include:
a color image acquisition unit 601 configured to acquire a color image obtained by photographing a target object by a first camera;
a speckle image acquiring unit 602, configured to control the speckle projector to project a preset speckle pattern onto the target object, and acquire a speckle image obtained by shooting the target object with a second camera;
a monocular structured light unit 603 configured to determine a first depth image of the target object based on a speckle pattern projected on the target object in the speckle image;
and a depth image obtaining unit 604, configured to extract and match image features of the color image and the speckle image, and obtain a second depth image of the target object based on the matched image features and the first depth image.
In some embodiments of the present application, the depth image obtaining unit 604 may be specifically configured to: extracting a first characteristic image of the color image and a second characteristic image of the speckle image; performing feature matching on the first feature image and the second feature image, and performing feature fusion on a first matched image obtained after the feature matching and the depth image to obtain a first fused image; and performing parallax estimation on the first fusion image to obtain a parallax image of the target object, and determining the second depth image based on the parallax image.
In some embodiments of the present application, the depth image obtaining unit 604 may be specifically configured to: and extracting and matching image features of the color image and the speckle image through a preset stereo matching network model, and obtaining a second depth image of the target object based on the matched image features and the first depth image, wherein the preset stereo matching network model comprises a first feature extraction layer, a second feature extraction layer, a feature matching layer, a depth feature fusion layer and a parallax estimation layer.
In some embodiments of the present application, the depth image obtaining unit 604 may be specifically configured to: and respectively calculating the matching characteristics of a first pixel point in the first characteristic image and a second pixel point corresponding to the first pixel point in the second characteristic image under each parallax to obtain the first matching image.
In some embodiments of the present application, the depth image obtaining unit 604 may be specifically configured to: according to the first depth image, the distribution of the first matching image is adjusted through Gaussian distribution to obtain the first fusion image, and the adjustment formula is as follows:
h_d(i, j) = c_d(i, j) · (1 − v_ij + v_ij · k · exp(−(d − g_ij)² / (2σ²)))

wherein h_d is the adjusted cost volume, i.e., the first fused image; g_ij is the corresponding disparity map obtained by re-projecting the first depth image; v_ij is a binary mask, with v_ij = 1 when g_ij ≠ 0; and k and σ are preset parameters that determine the shape of the Gaussian distribution.
In some embodiments of the present application, the spectral range of the second camera includes an infrared light spectrum and a visible light spectrum.
It should be noted that, for convenience and simplicity of description, the specific working process of the depth image obtaining apparatus 600 may refer to the corresponding process of the method described in fig. 1 to 4 and fig. 5a to 5d, and is not described herein again.
Fig. 7 is a schematic diagram of a terminal according to an embodiment of the present application. The terminal 7 may include: a processor 70, a memory 71, and a computer program 72, such as a depth image acquisition program, stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps in the above embodiments of the method for acquiring a depth image, such as steps S201 to S204 shown in fig. 2. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above apparatus embodiments, such as the color image acquisition unit 601, the speckle image acquisition unit 602, the monocular structured light unit 603, and the depth image acquisition unit 604 shown in fig. 6.
The computer program may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the terminal.
For example, the computer program may be divided into: the device comprises a color image acquisition unit, a speckle image acquisition unit, a monocular structure light unit and a depth image acquisition unit.
The specific functions of each unit are as follows: a color image acquisition unit configured to acquire a color image obtained by photographing a target object by a first camera; the speckle image acquisition unit is used for controlling the speckle projector to project a preset speckle pattern on the target object and acquiring a speckle image obtained by shooting the target object by the second camera; a monocular structured light unit to determine a first depth image of the target object based on a speckle pattern projected on the target object in the speckle image; and the depth image acquisition unit is used for extracting and matching the image characteristics of the color image and the speckle image, and obtaining a second depth image of the target object according to the matched image characteristics and the first depth image.
The terminal may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is only an example of a terminal and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components, for example, the terminal may also include input output devices, network access devices, buses, etc.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 71 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal. The memory 71 is used for storing the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for convenience and simplicity of description, the structure of the terminal may also refer to the detailed description of the structure in the method embodiment, and is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for acquiring a depth image is characterized by comprising the following steps:
acquiring a color image obtained by shooting a target object by a first camera;
controlling a speckle projector to project a preset speckle pattern on the target object, and acquiring a speckle image obtained by shooting the target object by a second camera;
determining a first depth image of the target object based on a speckle pattern on the target object in the speckle image;
and extracting and matching image features of the color image and the speckle image, and obtaining a second depth image of the target object according to the matched image features and the first depth image.
2. The method for acquiring the depth image according to claim 1, wherein the extracting and matching the image features of the color image and the speckle image, and obtaining the second depth image of the target object according to the matched image features and the first depth image, comprises:
extracting a first characteristic image of the color image and a second characteristic image of the speckle image;
matching the first characteristic image and the second characteristic image to obtain a first matching image, and fusing the first matching image and the first depth image to obtain a first fused image;
and performing parallax estimation on the first fusion image to obtain a parallax map of the target object, and determining the second depth image of the target object according to the parallax map.
3. The method for acquiring the depth image according to claim 1, wherein the extracting and matching the image features of the color image and the speckle image, and obtaining the second depth image of the target object according to the matched image features and the first depth image, comprises:
and extracting and matching image features of the color image and the speckle image through a preset stereo matching network model, and obtaining a second depth image of the target object based on the matched image features and the first depth image, wherein the preset stereo matching network model comprises a first feature extraction layer, a second feature extraction layer, a feature matching layer, a depth feature fusion layer and a parallax estimation layer.
4. The method for acquiring the depth image according to claim 2, wherein the matching the first feature image and the second feature image to obtain a first matching image comprises:
and respectively calculating the matching characteristics of a first pixel point in the first characteristic image and a second pixel point corresponding to the first pixel point in the second characteristic image under each parallax to obtain the first matching image.
5. The method for obtaining the depth image according to claim 2, wherein the obtaining a first fused image by fusing the first matching image and the first depth image includes:
according to the first depth image, the distribution of the first matching image is adjusted through Gaussian distribution to obtain the first fusion image, and the adjustment formula is as follows:
h_d(i, j) = c_d(i, j) · (1 − v_ij + v_ij · k · exp(−(d − g_ij)² / (2σ²)))

wherein h_d is the adjusted cost volume, i.e., the first fused image; g_ij is the corresponding disparity map obtained by re-projecting the first depth image; v_ij is a binary mask, with v_ij = 1 when g_ij ≠ 0; and k and σ are preset parameters that determine the shape of the Gaussian distribution.
6. The method of acquiring a depth image of any one of claims 1 to 5, wherein the spectral range of the second camera includes an infrared light spectrum and a visible light spectrum.
7. An apparatus for obtaining a depth image, comprising:
a color image acquisition unit configured to acquire a color image obtained by photographing a target object by a first camera;
the speckle image acquisition unit is used for controlling the speckle projector to project a preset speckle pattern on the target object and then acquiring a speckle image obtained by shooting the target object by a second camera;
a monocular structured light unit to determine a first depth image of the target object based on a speckle pattern projected on the target object in the speckle image;
and the depth image acquisition unit is used for extracting and matching the image characteristics of the color image and the speckle image, and obtaining a second depth image of the target object according to the matched image characteristics and the first depth image.
8. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.
9. A depth imaging system comprising a first camera, a second camera, a speckle projector, and the terminal of claim 8, the first camera, the second camera, and the speckle projector disposed on a same plane;
the speckle projector is used for projecting a preset speckle pattern to a target object;
the first camera is used for acquiring a color image of the target object;
the second camera is used for acquiring a speckle image of the target object;
and the terminal is used for acquiring the depth image of the target object by utilizing the color image and the speckle image.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202210092986.4A 2022-01-26 2022-01-26 Method, device, terminal, imaging system and medium for acquiring depth image Pending CN114511608A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210092986.4A CN114511608A (en) 2022-01-26 2022-01-26 Method, device, terminal, imaging system and medium for acquiring depth image
PCT/CN2022/100636 WO2023142352A1 (en) 2022-01-26 2022-06-23 Depth image acquisition method and device, terminal, imaging system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210092986.4A CN114511608A (en) 2022-01-26 2022-01-26 Method, device, terminal, imaging system and medium for acquiring depth image

Publications (1)

Publication Number Publication Date
CN114511608A true CN114511608A (en) 2022-05-17

Family

ID=81548879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210092986.4A Pending CN114511608A (en) 2022-01-26 2022-01-26 Method, device, terminal, imaging system and medium for acquiring depth image

Country Status (2)

Country Link
CN (1) CN114511608A (en)
WO (1) WO2023142352A1 (en)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10152798B2 (en) * 2017-04-10 2018-12-11 Wisconsin Alumni Research Foundation Systems, methods and, media for determining object motion in three dimensions using speckle images
CN108234984A (en) * 2018-03-15 2018-06-29 百度在线网络技术(北京)有限公司 Binocular depth camera system and depth image generation method
CN112150528A (en) * 2019-06-27 2020-12-29 Oppo广东移动通信有限公司 Depth image acquisition method, terminal and computer readable storage medium
CN112700484A (en) * 2020-12-31 2021-04-23 南京理工大学智能计算成像研究院有限公司 Depth map colorization method based on monocular depth camera
CN113592754A (en) * 2021-07-28 2021-11-02 维沃移动通信有限公司 Image generation method and electronic equipment
CN114511608A (en) * 2022-01-26 2022-05-17 奥比中光科技集团股份有限公司 Method, device, terminal, imaging system and medium for acquiring depth image

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023142352A1 (en) * 2022-01-26 2023-08-03 奥比中光科技集团股份有限公司 Depth image acquisition method and device, terminal, imaging system and medium
CN115294375A (en) * 2022-10-10 2022-11-04 南昌虚拟现实研究院股份有限公司 Speckle depth estimation method and system, electronic device and storage medium
CN115294375B (en) * 2022-10-10 2022-12-13 南昌虚拟现实研究院股份有限公司 Speckle depth estimation method and system, electronic device and storage medium

Also Published As

Publication number Publication date
WO2023142352A1 (en) 2023-08-03

Similar Documents

Publication Publication Date Title
CN107025635B (en) Depth-of-field-based image saturation processing method and device and electronic device
CN111354042B (en) Feature extraction method and device of robot visual image, robot and medium
CN110473242B (en) Texture feature extraction method, texture feature extraction device and terminal equipment
CN111160178B (en) Image processing method and device, processor, electronic equipment and storage medium
EP4042101A1 (en) Systems and methods for surface normals sensing with polarization
CN112232109B (en) Living body face detection method and system
CN113592989B (en) Three-dimensional scene reconstruction system, method, equipment and storage medium
CN110689577B (en) Active rigid body pose positioning method in single-camera environment and related equipment
CN107560592B (en) Precise distance measurement method for photoelectric tracker linkage target
CN104634276A (en) Three-dimensional measuring system, photographing device, photographing method, depth calculation method and depth calculation device
CN109640066B (en) Method and device for generating high-precision dense depth image
CN106570899B (en) Target object detection method and device
CN114511608A (en) Method, device, terminal, imaging system and medium for acquiring depth image
CN103093479A (en) Target positioning method based on binocular vision
CN110520768B (en) Hyperspectral light field imaging method and system
CN106155299B (en) A kind of pair of smart machine carries out the method and device of gesture control
CN110619660A (en) Object positioning method and device, computer readable storage medium and robot
CN104424640A (en) Method and device for carrying out blurring processing on images
CN109035307B (en) Set area target tracking method and system based on natural light binocular vision
CN111798507A (en) Power transmission line safety distance measuring method, computer equipment and storage medium
CN112802081B (en) Depth detection method and device, electronic equipment and storage medium
CN107590444A (en) Detection method, device and the storage medium of static-obstacle thing
CN112802114B (en) Multi-vision sensor fusion device, method thereof and electronic equipment
CN104243970A (en) 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity
US10096113B2 (en) Method for designing a passive single-channel imager capable of estimating depth of field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination