WO2023155353A1 - Depth image acquisition method and apparatus, and depth system, terminal and storage medium - Google Patents

Depth image acquisition method and apparatus, and depth system, terminal and storage medium

Info

Publication number
WO2023155353A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth image
initial
target
image
features
Prior art date
Application number
PCT/CN2022/100593
Other languages
French (fr)
Chinese (zh)
Inventor
杨晓立
余宇山
赵鑫
Original Assignee
奥比中光科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 奥比中光科技集团股份有限公司
Publication of WO2023155353A1 publication Critical patent/WO2023155353A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • The present application belongs to the technical field of image processing, and in particular relates to a depth image acquisition method and apparatus, a depth system, a terminal, and a storage medium.
  • In recent years, with the development of computer vision technology in fields such as autonomous driving, robotics, and AR applications, depth estimation has become a popular research area.
  • Commonly used depth perception methods, such as structured light, TOF, binocular stereo, and lidar, have matured technically after years of development and are widely used in many fields.
  • However, these methods are limited by cost and by the technology itself, and can only obtain reliable but sparse depth point clouds or low-resolution depth maps. Neural-network-based depth completion has therefore received extensive attention in recent years.
  • Embodiments of the present application provide a depth image acquisition method, device, depth system, terminal, and storage medium, which can improve the reliability of acquired dense depth images.
  • A first aspect of the embodiments of the present application provides a method for acquiring a depth image, including:
  • acquiring a color image and a sparse depth image of a target scene;
  • extracting color features and initial features of the color image, and obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
  • performing at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirming target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
  • performing depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  • A second aspect of the embodiments of the present application provides a depth image acquisition apparatus, including: an image acquisition unit, configured to acquire a color image and a sparse depth image of a target scene;
  • an initial densification unit configured to extract color features and initial features of the color image, and obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
  • an iterative optimization unit, configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and to confirm target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
  • a target densification unit, configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  • A third aspect of the embodiments of the present application provides a depth system, including a color module, a depth module, and the acquisition apparatus described in the second aspect, wherein:
  • the color module is used to collect a color image of a target scene
  • the depth module is used to scan the target scene to obtain point cloud data, and obtain a sparse depth image according to the point cloud data;
  • the acquisition device uses the color image and the sparse depth image to obtain a target dense depth image.
  • A fourth aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the above method are implemented when the processor executes the computer program.
  • A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the steps of the above method are implemented when the computer program is executed by a processor.
  • A sixth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal, causes the terminal to execute the steps of the above method.
  • Since each iterative optimization operation refers to the sparse depth image and the color features to determine the hidden features to be confirmed, the process of every iterative optimization operation is guided by both the RGB image information and the sparse depth image information, which improves the reliability of the obtained dense depth image.
  • FIG. 1 is a schematic diagram of an implementation flow of a depth image acquisition method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a specific implementation process for determining an initial hidden feature provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a depth image model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a feedback module provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a specific implementation process of training a depth image model provided by an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a device for acquiring a depth image provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • The existing depth completion technology generally uses an RGB color image to provide scene-information guidance, so as to densify the sparse depth image. Some methods directly concatenate the RGB image and the sparse depth map, and then input the concatenated image into a neural network for depth completion; other methods input the sparse depth map into a neural network to obtain an initial dense depth image, and then fuse the dense depth map with the RGB image to obtain a more accurate depth completion result.
  • FIG. 1 shows a schematic flow chart of a method for acquiring a depth image provided by an embodiment of the present application.
  • the method can be applied to a terminal, and is applicable to situations where the reliability of an acquired dense depth image needs to be improved.
  • the above-mentioned terminal may be a device capable of image processing such as a computer, a smart phone, and a tablet device.
  • the above method for acquiring a depth image may include the following steps S101 to S104.
  • Step S101: acquiring a color image and a sparse depth image of a target scene.
  • the terminal can acquire the color image and point cloud data of the target scene, and then project the point cloud data onto the imaging plane of the color image to obtain a sparse depth image.
  • More specifically, the terminal can obtain a color image captured of the target scene by a color camera, obtain point cloud data produced by a depth sensor scanning the target scene, and project the point cloud data onto the imaging plane of the color camera, resulting in a sparse depth image.
  • The depth sensor may include, but is not limited to, lidar, direct time-of-flight (dToF) sensors, speckle-based indirect time-of-flight (iToF) sensors, and the like.
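  • For illustration, a minimal NumPy sketch of this projection step is given below; it assumes a pinhole color camera with intrinsic matrix K and a sensor-to-camera transform (R, t), none of which are specified in the patent.

```python
import numpy as np

def project_to_sparse_depth(points, K, R, t, height, width):
    """Project depth-sensor points (N, 3) onto the color camera's imaging
    plane to form a sparse depth image; pixels with no measurement stay 0."""
    cam = points @ R.T + t                    # sensor frame -> camera frame
    cam = cam[cam[:, 2] > 0]                  # keep points in front of the camera
    uv = cam @ K.T                            # pinhole projection
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = cam[:, 2]
    depth = np.zeros((height, width), dtype=np.float32)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # write farthest points first so the nearest one wins per pixel
    order = np.argsort(-z[ok])
    depth[v[ok][order], u[ok][order]] = z[ok][order]
    return depth
```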
  • Step S102: extracting color features and initial features of the color image, and obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image.
  • In the embodiments of the present application, the terminal may extract the color features and the initial features of the color image through feature extraction algorithms; the algorithms used for the color features and for the initial features may be the same or different.
  • the terminal may perform a convolution operation on a color image through different convolution kernels to extract color features and initial features of the color image.
  • the terminal can use the color features and initial features acquired from the color image, as well as the sparse depth image to determine the initial hidden features for optimization feedback.
  • the initial hidden features can be used to perform feature optimization feedback on the initial dense depth image, thereby preliminarily realizing the completion of the depth image.
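  • As a sketch of how such features might be extracted, the following PyTorch fragment runs two convolutional branches with different kernels over the same RGB image; the layer sizes and channel widths are illustrative assumptions, not the patent's exact network.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Two convolutional branches over one RGB image: 'color features'
    used as guidance, and 'initial features' used for the first depth
    estimate. Channel widths are assumptions."""
    def __init__(self, feat_ch=32):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.color_branch = branch()   # separately initialized kernels
        self.init_branch = branch()

    def forward(self, rgb):            # rgb: (B, 3, H, W)
        return self.color_branch(rgb), self.init_branch(rgb)
```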
  • In one embodiment, as shown in FIG. 2, the initial hidden features in step S102 can be obtained through steps S201 to S203.
  • Step S201: performing depth estimation on the initial features to obtain an initial dense depth image.
  • the terminal may perform convolution regression on the initial features to obtain an initial dense depth image. It should be noted that, the foregoing depth estimation manner may be selected according to actual conditions, and no limitation is set here.
  • Step S202: fusing the initial dense depth image with the sparse depth image to obtain an initial fused feature image.
  • Specifically, the terminal can realize the fusion through a concatenation (Concat) operation, giving the initial fused feature image F_concat(d_sparse, d_dense) = Concat{d_sparse, d_dense},
  • where d_sparse is the sparse depth image and d_dense is the initial dense depth image.
  • Step S203: using the initial fused feature image, the initial features, and the color features to determine the initial hidden features.
  • the terminal may perform a convolution operation on the initial fusion feature image, the initial feature, and the color feature to determine the initial hidden feature.
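  • Steps S201 to S203 can be pictured as a small head on top of the extracted features. A minimal PyTorch sketch follows; the single-convolution depth head and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class InitialDensification(nn.Module):
    """S201-S203: regress an initial dense depth from the initial features,
    concat-fuse it with the sparse depth, then derive the initial hidden
    features from fusion + initial + color features."""
    def __init__(self, feat_ch=32, hidden_ch=32):
        super().__init__()
        self.depth_head = nn.Conv2d(feat_ch, 1, 3, padding=1)   # S201: conv regression
        self.hidden_head = nn.Sequential(                       # S203
            nn.Conv2d(2 + 2 * feat_ch, hidden_ch, 3, padding=1), nn.ReLU())

    def forward(self, init_feat, color_feat, d_sparse):
        d_dense = self.depth_head(init_feat)                    # initial dense depth
        fused = torch.cat([d_sparse, d_dense], dim=1)           # S202: F_concat
        hidden = self.hidden_head(
            torch.cat([fused, init_feat, color_feat], dim=1))
        return d_dense, hidden
```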
  • Step S103: performing at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirming the target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation.
  • In the embodiments of the present application, after the initial hidden features are obtained, the color features, the initial hidden features, and the sparse depth image serve as the input of the first iterative optimization operation; the hidden features to be confirmed obtained by the first iterative optimization operation, together with the color features and the sparse depth image, then serve as the input of the second iterative optimization operation, and so on.
  • Each iterative optimization operation can obtain a hidden feature to be confirmed.
  • After at least one iterative optimization operation, the target hidden features used to determine the target dense depth image can be selected from the hidden features to be confirmed obtained by the iterative optimization operations.
  • The target dense depth image is a depth image whose densification effect meets the requirements.
  • In some embodiments, the hidden features to be confirmed output by the first iterative optimization operation in step S103 can be obtained by the following steps:
  • Step S204: performing depth estimation on the initial hidden features to obtain a first dense depth image;
  • Step S205: fusing the first dense depth image with the sparse depth image to obtain a first fused feature image;
  • Step S206: using the first fused feature image, the color features, and the initial hidden features to determine the hidden features to be confirmed output by the first iterative optimization operation.
  • When performing the second iterative optimization operation, the terminal can perform depth estimation on the hidden features to be confirmed output by the first iterative optimization operation to obtain a second dense depth image, fuse the second dense depth image with the sparse depth image to obtain a second fused feature image, and use the second fused feature image, the color features, and the hidden features to be confirmed output by the first iterative optimization operation to determine the hidden features to be confirmed output by the second iterative optimization operation.
  • Iterating N times in this way yields N sets of hidden features to be confirmed, where N is a positive integer greater than or equal to 1.
  • It should be noted that the specific methods of steps S204 to S206 are similar to those of steps S201 to S203, respectively, and are not repeated here.
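  • One iterative optimization operation can thus be written as a single recurrent step. In the sketch below, a ConvGRU-style gated update stands in for the hidden-feature update; the patent does not specify this exact form, so it is an assumption.

```python
import torch
import torch.nn as nn

class FeedbackStep(nn.Module):
    """One iterative optimization operation (S204-S206): estimate depth
    from the current hidden features, concat-fuse with the sparse depth,
    and update the hidden features from fusion + color + hidden."""
    def __init__(self, feat_ch=32, hidden_ch=32):
        super().__init__()
        self.depth_head = nn.Conv2d(hidden_ch, 1, 3, padding=1)
        in_ch = 2 + feat_ch + hidden_ch
        self.gate = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)
        self.cand = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)

    def forward(self, hidden, color_feat, d_sparse):
        d_dense = self.depth_head(hidden)
        fused = torch.cat([d_sparse, d_dense], dim=1)
        x = torch.cat([fused, color_feat, hidden], dim=1)
        z = torch.sigmoid(self.gate(x))                 # update gate
        hidden = (1 - z) * hidden + z * torch.tanh(self.cand(x))
        return d_dense, hidden
```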
  • In addition, the larger the value of N, the better the densification effect of the target hidden features output at the end of the iterative optimization; correspondingly, the required time and amount of computation increase.
  • In general, the first few iterations bring the most obvious densification gains.
  • Therefore, the specific value of N can be set according to practical conditions such as hardware constraints and densification requirements.
  • In some embodiments of the present application, after each iterative optimization operation produces hidden features to be confirmed, the terminal may calculate an error indicator between the hidden features to be confirmed output by the current iterative optimization operation and those output by the previous iterative optimization operation, and judge whether the error indicator is within a preset error threshold range. If the error indicator is within the preset error threshold range, the iterative optimization stops and the hidden features to be confirmed obtained by the current operation are taken as the target hidden features. If the error indicator is outside the preset error threshold range, the hidden features to be confirmed output by the current operation are used as the input of the next iterative optimization operation, which then proceeds.
  • the range of the error threshold can be adjusted according to the actual situation, which is not limited in this application.
  • That is, after the first iterative optimization operation, the terminal can judge whether the error indicator between the hidden features to be confirmed output by the first iterative optimization operation and the initial hidden features is within the error threshold range, so as to decide whether the next iterative optimization operation is required. If the error indicator is within the range, the iterative optimization stops and the hidden features to be confirmed output by the first iterative optimization operation are taken as the target hidden features.
  • Otherwise, the iterations continue; once the N-th iterative optimization operation has been performed, the iterative optimization stops and the hidden features to be confirmed output by the N-th iterative optimization operation are taken as the target hidden features.
  • In other embodiments, the error indicator may also be determined based on the dense depth image corresponding to the current iterative optimization operation and the dense depth image corresponding to the previous iterative optimization operation.
  • the dense depth image corresponding to the current iterative optimization operation refers to the dense depth image obtained by performing depth estimation on hidden features output by the current iterative optimization operation.
  • Specifically, the terminal may subtract the dense depth image corresponding to the previous iterative optimization operation from the dense depth image obtained by the current iterative optimization operation, and then calculate the mean absolute error (MAE) as the error indicator.
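  • A short sketch of this adaptive stopping rule is shown below; it reuses the FeedbackStep class sketched earlier, and the tolerance and iteration cap are illustrative values, not the patent's.

```python
def run_until_converged(step, hidden, color_feat, d_sparse,
                        max_iters=8, tol=1e-3):
    """Iterate the feedback step until the MAE between successive dense
    depth estimates falls inside the threshold, or max_iters is reached."""
    prev = None
    for _ in range(max_iters):
        d_dense, hidden = step(hidden, color_feat, d_sparse)
        if prev is not None:
            mae = (d_dense - prev).abs().mean().item()
            if mae < tol:            # error indicator within threshold
                break
        prev = d_dense
    return d_dense, hidden
```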
  • Step S104: performing depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  • the terminal performs depth estimation on the target hidden features confirmed in step S103 to obtain the target dense depth image of the target scene.
  • The target dense depth image is an image obtained by performing depth completion on the sparse depth image, that is, a depth image whose densification effect meets requirements.
  • In this way, each iterative optimization operation refers to the sparse depth image and the color features to determine the hidden features to be confirmed; that is, every iterative optimization operation is guided by both the RGB image information and the sparse depth image information, which improves the reliability of the obtained dense depth image.
  • each iterative optimization operation will use the hidden features to be confirmed output by the previous iterative optimization operation as a guide, so the densification degree of the dense depth image will be further improved after each iterative optimization operation.
  • FIG. 3 shows a schematic diagram of the structure of a depth image model.
  • the terminal can input the color image and the sparse depth image into the depth image model, and obtain the target dense depth image output by the depth image model.
  • the depth image model may include a feature extraction module, N feedback modules and a target depth estimation module.
  • the terminal can extract the color features and initial features of the color image through the feature extraction module, and obtain the initial dense depth image and initial hidden features through the first feedback module.
  • the terminal can perform an iterative optimization operation through the remaining feedback modules in turn, and finally through the target depth estimation module, perform depth estimation on the hidden features of the target output by the last feedback module to obtain the target dense depth image of the target scene.
  • each feedback module may include an intermediate depth estimation module, a fusion module and a sequence model module.
  • The step of performing an iterative optimization operation in a single feedback module may specifically include: performing depth estimation on the previous hidden features (that is, the hidden features to be confirmed output by the previous feedback module) through the intermediate depth estimation module of the current feedback module to obtain the dense depth image output by the current feedback module; fusing, through the fusion module of the current feedback module, the dense depth image output by the current feedback module with the sparse depth image to obtain the fusion feature map of the current feedback module; and determining, through the sequence model module of the current feedback module, the current hidden features to be confirmed (that is, the hidden features to be confirmed output by the current feedback module) from the color features, the fusion feature map of the current feedback module, and the previous hidden features.
  • the above number of feedback modules can be set according to actual conditions.
  • In some embodiments, the depth image model may also output a parameter characterizing the densification effect. If the parameter is greater than a preset threshold, the output of the current feedback module continues to be input into the next feedback module; if the parameter is smaller than the threshold, the hidden features to be confirmed output by the current feedback module are taken as the target hidden features, the target depth estimation module performs depth estimation on them, and the target dense depth image of the target scene is output.
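  • Putting the pieces together, the overall model of FIG. 3 can be sketched as follows, reusing the FeatureExtractor, InitialDensification, and FeedbackStep classes above; the number of feedback modules and the target depth head are illustrative assumptions.

```python
import torch.nn as nn

class DepthImageModel(nn.Module):
    """FIG. 3: feature extraction -> first feedback module (initial
    densification) -> remaining feedback modules -> target depth head."""
    def __init__(self, n_feedback=4, feat_ch=32, hidden_ch=32):
        super().__init__()
        self.features = FeatureExtractor(feat_ch)
        self.first = InitialDensification(feat_ch, hidden_ch)
        self.steps = nn.ModuleList(
            FeedbackStep(feat_ch, hidden_ch) for _ in range(n_feedback - 1))
        self.target_head = nn.Conv2d(hidden_ch, 1, 3, padding=1)

    def forward(self, rgb, d_sparse):
        color_feat, init_feat = self.features(rgb)
        d, hidden = self.first(init_feat, color_feat, d_sparse)
        depths = [d]                  # per-module outputs, kept for training
        for step in self.steps:
            d, hidden = step(hidden, color_feat, d_sparse)
            depths.append(d)
        return self.target_head(hidden), depths
```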
  • Before using the depth image model, the terminal needs to train it. During training, the number of iterations performed through the feedback modules is fixed: if it were not, the dense depth images produced by the feedback modules would change as the network parameters being trained are adjusted, leaving two variables in the training process and making it impossible to obtain an accurate training error. When the depth image model is used for inference, by contrast, the number of iterations need not be fixed and may depend on the error between the dense depth image obtained by the current iterative optimization operation and that obtained by the previous one.
  • the above-mentioned training process of the depth image model may include steps S501 to S503.
  • Step S501: acquiring a sample color image, a sample sparse depth image, and a corresponding reference dense depth image.
  • For the manner of acquiring the sample color image and the sample sparse depth image, refer to the description of step S101.
  • the reference dense depth image is an ideal dense depth image.
  • For example, artificially synthesized depth images can be used, generated for instance with the Unreal Engine 4 (UE4), or depth images collected by other depth sensors (such as high-precision TOF depth cameras) can be used.
  • Step S502: inputting the sample color image and the sample sparse depth image into the network to be trained, and obtaining the sample dense depth image output by each feedback module in the network to be trained as well as the sample target dense depth image output by the target depth estimation module in the network to be trained.
  • The model structure and working process of the network to be trained are as described with reference to FIG. 1 to FIG. 4, and are not repeated here.
  • Step S503: calculating a target error value according to the sample target dense depth image, each sample dense depth image, and the reference dense depth image; if the target error value is greater than an error threshold, adjusting the parameters of the network to be trained to iteratively optimize it, until the target error value is less than or equal to the error threshold, at which point the network to be trained is used as the depth image model.
  • the error threshold refers to the maximum value of the target error value allowed when the model converges, which can be adjusted according to the actual situation.
  • Specifically, the terminal may calculate initial error values between the reference dense depth image and, respectively, the sample target dense depth image and each sample dense depth image, and then take a weighted average of these initial error values to obtain the target error value, thereby ensuring a better densification effect from the iterative optimization.
  • If the target error value is greater than the error threshold, the network to be trained has not converged, so its parameters are readjusted and the target error value is recalculated; this iterates until the target error value is less than or equal to the error threshold, indicating that the network can output a reliable dense depth image and can be used as the depth image model.
  • The training process can be implemented using gradient descent, and the corresponding loss function can be an L1-norm loss, an L2-norm loss, or another loss function.
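  • A minimal training-step sketch under these choices is given below; the L1 loss and uniform weights over the per-module outputs are assumptions, as is the DepthImageModel interface sketched above.

```python
import torch.nn.functional as F

def training_step(model, optimizer, rgb, d_sparse, d_ref, weights=None):
    """One gradient-descent step: the target error is a weighted average
    of L1 errors between the reference depth and every per-module output
    plus the final target output."""
    target, depths = model(rgb, d_sparse)
    outputs = depths + [target]
    if weights is None:
        weights = [1.0 / len(outputs)] * len(outputs)
    loss = sum(w * F.l1_loss(d, d_ref) for w, d in zip(weights, outputs))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```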
  • FIG. 6 is a schematic structural diagram of a depth image acquisition apparatus 600 provided in an embodiment of the present application, and the depth image acquisition apparatus 600 is configured on a terminal.
  • The depth image acquisition apparatus 600 may include:
  • An image acquisition unit 601 configured to acquire a color image and a sparse depth image of a target scene
  • An initial densification unit 602 configured to extract color features and initial features of the color image, and obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
  • an iterative optimization unit 603, configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and to confirm the target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
  • a target densification unit 604, configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  • The depth image acquisition apparatus 600 may include the aforementioned depth image model (see FIG. 3): the initial densification unit 602 may correspond to the feature extraction module and the first feedback module of the depth image model, the iterative optimization unit 603 may correspond to the feedback modules other than the first feedback module, and the target densification unit 604 may correspond to the target depth estimation module of the depth image model.
  • the initial densification unit 602 may be specifically configured to: perform depth estimation on the initial features to obtain an initial dense depth image; fuse the initial dense depth image with the sparse depth image , to obtain an initial fusion feature image; using the initial fusion feature image, the initial feature, and the color feature to determine the initial hidden feature.
  • the above-mentioned image acquisition unit 601 may be specifically configured to: acquire the color image and point cloud data of the target scene; project the point cloud data onto the imaging plane of the color image to obtain The sparse depth image.
  • The iterative optimization unit 603 may be specifically configured to: perform at least one iterative optimization operation on the initial dense depth image using the initial hidden features; calculate the error indicator between the hidden features to be confirmed output by the current iterative optimization operation and those output by the previous iterative optimization operation; if the error indicator is outside the error threshold range, continue with the next iterative optimization operation; and once the error indicator is within the error threshold range, stop the iterative optimization and take the hidden features to be confirmed output by the current iterative optimization operation as the target hidden features.
  • In some embodiments, the depth image acquisition apparatus 600 may further include a training unit, which may be used to: acquire a sample color image, a sample sparse depth image, and a corresponding reference dense depth image; input the sample color image and the sample sparse depth image into the network to be trained, and obtain the sample dense depth image output by each feedback module in the network to be trained as well as the sample target dense depth image output by the target depth estimation module; calculate a target error value according to the sample target dense depth image, each sample dense depth image, and the reference dense depth image; and, if the target error value is greater than an error threshold, adjust the parameters of the network to be trained to iteratively optimize it until the target error value is less than or equal to the error threshold, at which point the network to be trained is used as the depth image model.
  • The training unit may be specifically configured to: extract the color features and initial features of the sample color image through the feature extraction module in the network to be trained; obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image through the first feedback module; and perform iterative optimization operations through the other feedback modules in the network to be trained, outputting the sample dense depth image obtained by each iterative optimization operation.
  • Further, the training unit may be specifically configured to: calculate initial error values between the reference dense depth image and, respectively, the sample target dense depth image and each sample dense depth image; and take a weighted average of the initial error values to obtain the target error value.
  • the specific working process of the depth image acquisition apparatus 600 can refer to the corresponding process of the methods described in FIG. 1 to FIG. 5 , which will not be repeated here.
  • the embodiment of the present application also provides a depth system.
  • The system specifically includes a color module, a depth module, and the aforementioned depth image acquisition apparatus 600, wherein the color module is used to collect a color image of the target scene; the depth module is used to scan the target scene to obtain point cloud data and to obtain a sparse depth image according to the point cloud data; and the acquisition apparatus uses the color image and the sparse depth image to obtain the target dense depth image.
  • the color module includes a color camera
  • The depth module includes, but is not limited to, a lidar, a direct time-of-flight (dToF) camera, or a speckle-based indirect time-of-flight (iToF) camera.
  • The color module, the depth module, and the acquisition apparatus can be integrated in one device or be independent devices, and data between the components can be transmitted by wire or wirelessly, which is not limited here.
  • FIG. 7 is a schematic diagram of a terminal provided in an embodiment of the present application.
  • the terminal 7 may include: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and operable on the processor 70, such as a depth image acquisition program.
  • When the processor 70 executes the computer program 72, the steps in the above embodiments of the depth image acquisition method are implemented, such as steps S101 to S104 shown in FIG. 1.
  • Alternatively, when the processor 70 executes the computer program 72, the functions of the modules/units in the above apparatus embodiments are realized, such as the image acquisition unit 601, the initial densification unit 602, the iterative optimization unit 603, and the target densification unit 604.
  • the computer program can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 71 and executed by the processor 70 to complete the present application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal.
  • the computer program can be divided into: an image acquisition unit, an initial densification unit, an iterative optimization unit and a target densification unit.
  • the image acquisition unit is used to acquire the color image and the sparse depth image of the target scene;
  • the initial densification unit is used to extract the color features and initial features of the color image, and to obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
  • an iterative optimization unit is configured to perform at least one iterative optimization operation on the initial dense depth image by using the color features, the sparse depth image, and the initial hidden features, and confirm the target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
  • the target densification unit is configured to use the target hidden feature to perform depth estimation to obtain a target dense depth image of the target scene.
  • The terminal may include, but is not limited to, a processor 70 and a memory 71.
  • Those skilled in the art can understand that FIG. 7 is only an example of a terminal and does not constitute a limitation; the terminal may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal may also include input and output devices, network access devices, buses, and so on.
  • The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 71 may be an internal storage unit of the terminal, such as a hard disk or internal memory of the terminal.
  • The memory 71 may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal.
  • the memory 71 may also include both an internal storage unit of the terminal and an external storage device.
  • the memory 71 is used to store the computer program and other programs and data required by the terminal.
  • the memory 71 can also be used to temporarily store data that has been output or will be output.
  • the disclosed device/terminal and method may be implemented in other ways.
  • the device/terminal embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing relevant hardware through a computer program.
  • The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application is applicable to the technical field of image processing. Provided are a depth image acquisition method and apparatus, and a depth system, a terminal and a storage medium. The depth image acquisition method specifically comprises: acquiring a color image and a sparse depth image of a target scene; extracting a color feature and an initial feature of the color image, and acquiring an initial dense depth image and an initial hidden feature on the basis of the color feature, the initial feature and the sparse depth image; performing at least one iterative optimization operation, so as to confirm a target hidden feature on the basis of a hidden feature to be confirmed that is acquired by means of each iterative optimization operation; and performing depth estimation using the target hidden feature, so as to obtain a target dense depth image of the target scene. The embodiments of the present application can improve the reliability of an acquired dense depth image.

Description

Depth image acquisition method, device, depth system, terminal and storage medium
This application claims priority to the Chinese patent application No. 202210142004.8, filed with the China Patent Office on February 16, 2022 and entitled "Depth image acquisition method, device, depth system, terminal and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the technical field of image processing, and in particular relates to a depth image acquisition method and apparatus, a depth system, a terminal, and a storage medium.
Background
In recent years, with the development of computer vision technology in fields such as autonomous driving, robotics, and AR applications, depth estimation has become a popular research area. Commonly used depth perception methods, such as structured light, TOF, binocular stereo, and lidar, have matured technically after years of development and are widely used in many fields. However, these methods are limited by cost and by the technology itself, and can only obtain reliable but sparse depth point clouds or low-resolution depth maps. Neural-network-based depth completion has therefore received extensive attention in recent years.
At present, how to effectively fuse RGB images and sparse depth maps remains an open problem, and existing depth completion techniques often fail to make good use of the sparse depth image, so the resulting dense depth images are not reliable enough.
Summary of the Invention
Embodiments of the present application provide a depth image acquisition method, apparatus, depth system, terminal, and storage medium, which can improve the reliability of acquired dense depth images.
A first aspect of the embodiments of the present application provides a method for acquiring a depth image, including:
acquiring a color image and a sparse depth image of a target scene;
extracting color features and initial features of the color image, and obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
performing at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirming target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
performing depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
A second aspect of the embodiments of the present application provides a depth image acquisition apparatus, including:
an image acquisition unit, configured to acquire a color image and a sparse depth image of a target scene;
an initial densification unit, configured to extract color features and initial features of the color image, and to obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
an iterative optimization unit, configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and to confirm target hidden features according to the hidden features to be confirmed obtained by each iterative optimization operation;
a target densification unit, configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
A third aspect of the embodiments of the present application provides a depth system, including a color module, a depth module, and the acquisition apparatus described in the second aspect, wherein:
the color module is used to collect a color image of a target scene;
the depth module is used to scan the target scene to obtain point cloud data, and to obtain a sparse depth image according to the point cloud data;
the acquisition apparatus uses the color image and the sparse depth image to obtain a target dense depth image.
A fourth aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the above method are implemented when the processor executes the computer program.
A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the steps of the above method are implemented when the computer program is executed by a processor.
A sixth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal, causes the terminal to execute the steps of the above method.
In the embodiments of the present application, a color image and a sparse depth image of a target scene are acquired; color features and initial features of the color image are extracted; an initial dense depth image and initial hidden features are obtained according to the color features, the initial features, and the sparse depth image; at least one iterative optimization operation is performed, so that target hidden features are confirmed according to the hidden features to be confirmed obtained by each iterative optimization operation; and depth estimation is performed using the target hidden features to obtain a target dense depth image of the target scene. Since each iterative optimization operation refers to the sparse depth image and the color features to determine the hidden features to be confirmed, every iterative optimization operation is guided by both the RGB image information and the sparse depth image information, which improves the reliability of the obtained dense depth image.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the implementation of a depth image acquisition method provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a specific implementation for determining initial hidden features provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a depth image model provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a feedback module provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart of a specific implementation for training a depth image model provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a depth image acquisition apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection of the present application.
The existing depth completion technology generally uses an RGB color image to provide scene-information guidance, so as to densify the sparse depth image. Some methods directly concatenate the RGB image and the sparse depth map, and then input the concatenated image into a neural network for depth completion; other methods input the sparse depth map into a neural network to obtain an initial dense depth image, and then fuse the dense depth map with the RGB image to obtain a more accurate depth completion result.
However, these methods often fail to make good use of the sparse depth image; the resulting dense depth images are insufficiently densified and of low reliability.
To illustrate the technical solutions of the present application, specific embodiments are described below.
FIG. 1 shows a schematic flow chart of the implementation of a depth image acquisition method provided by an embodiment of the present application. The method can be applied to a terminal and is applicable to situations where the reliability of an acquired dense depth image needs to be improved.
The above terminal may be a device capable of image processing, such as a computer, a smartphone, or a tablet device.
Specifically, the above depth image acquisition method may include the following steps S101 to S104.
Step S101: acquiring a color image and a sparse depth image of a target scene.
In some embodiments of the present application, the terminal can acquire a color image and point cloud data of the target scene, and then project the point cloud data onto the imaging plane of the color image to obtain a sparse depth image.
More specifically, the terminal can obtain a color image captured of the target scene by a color camera, obtain point cloud data produced by a depth sensor scanning the target scene, and project the point cloud data onto the imaging plane of the color camera to obtain a sparse depth image. The depth sensor may include, but is not limited to, lidar, direct time-of-flight (dToF) sensors, speckle-based indirect time-of-flight (iToF) sensors, and the like.
It should be noted that the above target scene can be selected according to the actual situation, which is not limited in this application.
Step S102: extracting color features and initial features of the color image, and obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image.
In the embodiments of the present application, the terminal may extract the color features and initial features through feature extraction algorithms; the algorithms used for the color features and for the initial features may be the same or different. For example, the terminal may convolve the color image with different convolution kernels to extract its color features and initial features.
Further, the terminal can use the color features and initial features obtained from the color image, together with the sparse depth image, to determine the initial hidden features used for optimization feedback. The initial hidden features can be used to provide feature optimization feedback on the initial dense depth image, thereby preliminarily realizing depth image completion.
In one embodiment, as shown in FIG. 2, the initial hidden features in step S102 can be obtained through steps S201 to S203.
Step S201: performing depth estimation on the initial features to obtain an initial dense depth image.
In some implementations, the terminal may perform convolution regression on the initial features to obtain the initial dense depth image. It should be noted that the above depth estimation manner can be selected according to the actual situation and is not limited here.
Step S202: fusing the initial dense depth image with the sparse depth image to obtain an initial fused feature image.
Specifically, the terminal can realize the fusion through a concatenation (Concat) operation, giving the initial fused feature image F_concat(d_sparse, d_dense) = Concat{d_sparse, d_dense},
where d_sparse is the sparse depth image and d_dense is the initial dense depth image.
Step S203: using the initial fused feature image, the initial features, and the color features to determine the initial hidden features.
Specifically, the terminal may perform a convolution operation on the initial fused feature image, the initial features, and the color features to determine the initial hidden features.
步骤S103,利用彩色特征、稀疏深度图像和初始隐藏特征,对初始稠密深度图像进行至少一次迭代优化操作,并根据每次迭代优化操作分别获取到的待确认隐藏特征确认目标隐藏特征。Step S103, using the color feature, the sparse depth image and the initial hidden feature, to perform at least one iterative optimization operation on the initial dense depth image, and confirm the target hidden feature according to the hidden features to be confirmed obtained by each iterative optimization operation.
在本申请的实施方式中,在得到初始隐藏特征之后,将彩色特征、初始隐藏特征及稀疏深度图像作为第一次迭代优化操作的输入,第一次迭代优化操作得到的待确认隐藏特征将与彩色特征及稀疏深度图像作为第二次迭代优化操作的输入,以此类推。每次迭代优化操作均可得到一待确认隐藏特征,经过至少一次的迭代优化操作之后,可以从每次迭代优化操作得到的待确认隐藏特征中,确定出用于确定目标稠密深度图像的目标隐藏特征。其中,目标稠密深度图像也即稠密化效果能够满足需求的深度图像。In the embodiment of the present application, after obtaining the initial hidden features, the color features, the initial hidden features and the sparse depth image are used as the input of the first iterative optimization operation, and the hidden features to be confirmed obtained by the first iterative optimization operation will be compared with Color features and sparse depth images are used as input for the second iterative optimization operation, and so on. Each iterative optimization operation can obtain a hidden feature to be confirmed. After at least one iterative optimization operation, the target hidden feature used to determine the target dense depth image can be determined from the hidden features to be confirmed obtained by each iterative optimization operation. feature. Wherein, the target dense depth image is the depth image whose densification effect can meet the requirements.
在一些实施例中,步骤S103中第一次迭代优化操作输出的待确认隐藏特征可由以下步骤得到:In some embodiments, the hidden features to be confirmed output by the first iterative optimization operation in step S103 can be obtained by the following steps:
步骤S204,对初始隐藏特征进行深度估计,得到第一稠密深度图像;Step S204, performing depth estimation on the initial hidden features to obtain a first dense depth image;
步骤S205,将第一稠密深度图像与稀疏深度图像进行融合,得到第一融合特征图像;Step S205, fusing the first dense depth image and the sparse depth image to obtain a first fused feature image;
步骤S206,利用第一融合特征图像、彩色特征及初始隐藏特征,确定第一次迭代优化操作输出的待确认隐藏特征。Step S206, using the first fused feature image, color features and initial hidden features to determine hidden features to be confirmed output by the first iterative optimization operation.
在进行第二次迭代优化操作时,终端可以对第一次迭代优化操作输出的待确认隐藏特征进行深度估计,得到第二稠密深度图像,然后,将第二稠密深度图像与稀疏深度图像进行融合,得到第二融合特征图像,并利用第二融合特征 图像、彩色特征及第一次迭代优化操作输出的待确认隐藏特征,确定第二次迭代优化操作输出的待确认隐藏特征。依次类推,迭代N次,即可获取N个待确认隐藏特征,其中N为大于或等于1的正整数。When performing the second iterative optimization operation, the terminal can perform depth estimation on the hidden features to be confirmed output by the first iterative optimization operation to obtain the second dense depth image, and then fuse the second dense depth image with the sparse depth image , to obtain the second fused feature image, and use the second fused feature image, color features, and hidden features to be confirmed output by the first iterative optimization operation to determine the hidden features to be confirmed output by the second iterative optimization operation. By analogy and so on, iterating N times, N hidden features to be confirmed can be obtained, where N is a positive integer greater than or equal to 1.
It should be noted that the specific methods of steps S204 to S206 are similar to those of steps S201 to S203, respectively, and are not repeated here.

In addition, the larger the value of N, the better the densification of the target hidden features finally output by the iterative optimization; correspondingly, the required time and computation increase. In general, the first few iterations bring the most noticeable densification gains, so the specific value of N can be set according to practical considerations such as hardware conditions and densification requirements.

In some embodiments of the present application, after the hidden features to be confirmed are obtained in each iterative optimization operation, the terminal may compute an error indicator between the hidden features to be confirmed output by the current iterative optimization operation and those output by the previous iterative optimization operation, and judge whether the error indicator falls within a preset error threshold range. If the error indicator is within the preset error threshold range, the iterative optimization is stopped, and the hidden features to be confirmed obtained in the current operation are taken as the target hidden features. If the error indicator is outside the preset error threshold range, the hidden features to be confirmed output by the current operation serve as the input of the next iterative optimization operation, and the iteration continues.

The error threshold range can be adjusted according to the actual situation, which is not limited in the present application.

That is, after the first iterative optimization operation, the terminal may judge whether the error indicator between the hidden features to be confirmed output by the first operation and the initial hidden features is within the error threshold range, so as to decide whether the next iterative optimization operation is needed. If the error indicator is within the error threshold range, the iterative optimization is stopped, and the hidden features to be confirmed output by the first operation are taken as the target hidden features.

Otherwise, the second iterative optimization operation is performed, and after it is completed, the terminal judges whether the error indicator between the hidden features to be confirmed output by the second operation and those output by the first operation is within the error threshold range, so as to decide whether the next iterative optimization operation is needed.

This continues until the error indicator between the hidden features to be confirmed output by the N-th iterative optimization operation and those output by the (N-1)-th operation falls within the error threshold range; the iterative optimization is then stopped, and the hidden features to be confirmed output by the N-th operation are taken as the target hidden features.

In some embodiments of the present application, the error indicator may also be determined from the dense depth image corresponding to the current iterative optimization operation and the dense depth image corresponding to the previous iterative optimization operation, where the dense depth image corresponding to the current operation is the one obtained by performing depth estimation on the hidden features it outputs. Preferably, the terminal may subtract the dense depth image of the previous operation from that of the current operation and compute the mean absolute error (MAE) of the difference as the error indicator. In this way, the number of iterative optimization operations balances densification quality against efficiency.
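A minimal sketch of this stopping criterion, assuming dense depth images of equal size represented as arrays (the threshold value is illustrative):

```python
import numpy as np

def mae(depth_a, depth_b):
    """Mean absolute error between two dense depth images."""
    return float(np.mean(np.abs(depth_a - depth_b)))

def should_stop(current_dense, previous_dense, threshold=1e-3):
    """Stop iterating once consecutive dense depth images differ by no
    more than the preset error threshold."""
    return mae(current_dense, previous_dense) <= threshold
```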
Step S104: perform depth estimation using the target hidden features to obtain the target dense depth image of the target scene.

In the embodiments of the present application, the terminal performs depth estimation on the target hidden features confirmed in step S103 to obtain the target dense depth image of the target scene. This dense depth image is the image obtained by performing depth completion on the sparse depth image, that is, a depth image whose densification quality meets the requirements.

In the embodiments of the present application, the color image and the sparse depth image of the target scene are acquired; the color features and initial features of the color image are extracted; the initial dense depth image and the initial hidden features are obtained from the color features, the initial features, and the sparse depth image; at least one iterative optimization operation is performed, so that the target hidden features are confirmed from the hidden features to be confirmed obtained in each operation; and depth estimation is performed using the target hidden features to obtain the target dense depth image of the target scene. Since each iterative optimization operation determines the hidden features to be confirmed with reference to the sparse depth image and the color features, each operation is guided by the information of the RGB image and of the sparse depth image, which can improve the reliability of the obtained dense depth image.

Moreover, each iterative optimization operation uses the hidden features to be confirmed output by the previous operation as guidance, so the densification of the dense depth image is further improved after each operation.

It should be noted that the above depth image acquisition method may be implemented through a network model. Fig. 3 shows a schematic structural diagram of the depth image model. The terminal may input the color image and the sparse depth image into the depth image model and obtain the target dense depth image output by the model.

The depth image model may include a feature extraction module, N feedback modules, and a target depth estimation module.

The terminal may extract the color features and initial features of the color image through the feature extraction module, and obtain the initial dense depth image and the initial hidden features through the first feedback module.

Then, the terminal may perform one iterative optimization operation through each of the remaining feedback modules in turn, and finally, through the target depth estimation module, perform depth estimation on the target hidden features output by the last feedback module to obtain the target dense depth image of the target scene.

As shown in Fig. 4, each feedback module may include an intermediate depth estimation module, a fusion module, and a sequence model module.

The steps of the iterative optimization operation performed by the terminal within a single feedback module may specifically include: performing, through the intermediate depth estimation module of the current feedback module, depth estimation on the previous hidden features (i.e., the hidden features to be confirmed output by the previous feedback module) to obtain the dense depth image output by the current feedback module; fusing, through the fusion module of the current feedback module, the dense depth image output by the current feedback module with the sparse depth image to obtain the fused feature map of the current feedback module; and determining, through the sequence model module of the current feedback module, the current hidden features to be confirmed (i.e., the hidden features to be confirmed output by the current feedback module) using the color features, the fused feature map of the current feedback module, and the previous hidden features.
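For illustration, one feedback module could be organized as in the sketch below. The gated ConvGRU-style update used as the sequence model module and the concatenation-based fusion are assumptions; the embodiment only requires an intermediate depth estimation module, a fusion module, and a sequence model module:

```python
import torch
import torch.nn as nn

class FeedbackModule(nn.Module):
    """Illustrative sketch of one feedback module (Fig. 4). All layer
    shapes are assumed for illustration."""
    def __init__(self, hidden_ch=64, color_ch=32):
        super().__init__()
        # Intermediate depth estimation: hidden features -> dense depth.
        self.depth_head = nn.Conv2d(hidden_ch, 1, 3, padding=1)
        # Fusion: concatenated dense and sparse depth -> fused feature map.
        self.fuse = nn.Conv2d(2, hidden_ch, 3, padding=1)
        # Assumed sequence model: a ConvGRU-style gated update.
        self.gates = nn.Conv2d(hidden_ch * 2 + color_ch, hidden_ch * 2, 3, padding=1)
        self.cand = nn.Conv2d(hidden_ch * 2 + color_ch, hidden_ch, 3, padding=1)

    def forward(self, prev_hidden, sparse_depth, color_feat):
        dense = self.depth_head(prev_hidden)                        # dense depth of this module
        fused = self.fuse(torch.cat([dense, sparse_depth], dim=1))  # fused feature map
        x = torch.cat([fused, color_feat, prev_hidden], dim=1)
        z, r = torch.sigmoid(self.gates(x)).chunk(2, dim=1)         # update / reset gates
        xc = torch.cat([fused, color_feat, r * prev_hidden], dim=1)
        h_new = torch.tanh(self.cand(xc))
        hidden = (1 - z) * prev_hidden + z * h_new                  # hidden features to be confirmed
        return hidden, dense
```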
The number of feedback modules can be set according to the actual situation. In addition, the depth image model may also output a parameter characterizing the densification quality: if the parameter is greater than a preset threshold, the output of the current feedback module continues to be fed into the next feedback module; if the parameter is smaller than the threshold, the hidden features to be confirmed output by the current feedback module are taken as the target hidden features, depth estimation is performed on them through the target depth estimation module, and the target dense depth image of the target scene is output.

It should be noted that, for the specific working process of the above depth image model, reference may be made to the description of the methods shown in Fig. 1 and Fig. 2, which is not repeated in the present application.

Before using the depth image model, the terminal needs to train it. Further, during training of the depth image model, the number of iterations of the iterative optimization performed through the feedback modules is fixed: if the number of iterations were not fixed, the dense depth image obtained by the iterative optimization of the feedback modules would change as the parameters of the network under training are adjusted, so that two variables would exist in the training process and an accurate training error could not be obtained. During use of the depth image model, by contrast, the number of iterations of the feedback modules need not be fixed and may depend on the error between the dense depth image to be optimized obtained by the current iterative optimization operation and that obtained by the previous one.

Specifically, as shown in Fig. 5, the training process of the above depth image model may include steps S501 to S503.

Step S501: acquire a sample color image, a sample sparse depth image, and a corresponding reference dense depth image.

For the manner of acquiring the sample color image and the sample sparse depth image, reference may be made to the description of step S101.

The reference dense depth image is an ideal dense depth image. In some embodiments, an artificially synthesized depth image may be used, for example one generated with the Unreal Engine 4 (UE4) engine; alternatively, a depth image collected by another depth sensor (such as a high-precision TOF depth camera) may be used.

Step S502: input the sample color image and the sample sparse depth image into the network to be trained, and obtain the sample dense depth image output by each feedback module of the network to be trained and the sample target dense depth image output by the target depth estimation module of the network to be trained.

For the model structure and working process of the network to be trained, reference may be made to the descriptions of Figs. 1 to 4, which are not repeated in the present application.

Step S503: compute a target error value from the sample target dense depth image, each sample dense depth image, and the reference dense depth image; if the target error value is greater than an error threshold, adjust the parameters of the network to be trained so as to iteratively optimize it, until the target error value is less than or equal to the error threshold, at which point the network to be trained is taken as the depth image model.

The error threshold is the maximum target error value allowed when the model is considered to have converged, and can be adjusted according to the actual situation.

Specifically, in some embodiments of the present application, the terminal may compute the initial error values between the sample target dense depth image and the reference dense depth image and between each sample dense depth image and the reference dense depth image, and then take a weighted average of the initial error values to obtain the target error value, thereby ensuring a better densification result from the iterative optimization.
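A minimal sketch of this weighted target error, with uniform weights as an assumed default since the embodiment does not fix the weighting scheme:

```python
import numpy as np

def target_error(sample_target_dense, sample_denses, reference_dense, weights=None):
    """Weighted average of the initial error values of every sample dense
    depth image and of the sample target dense depth image against the
    reference dense depth image."""
    errors = [float(np.mean(np.abs(d - reference_dense))) for d in sample_denses]
    errors.append(float(np.mean(np.abs(sample_target_dense - reference_dense))))
    if weights is None:
        weights = np.full(len(errors), 1.0 / len(errors))  # assumed uniform weights
    return float(np.dot(weights, errors))
```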
By comparison with the preset error threshold: if the target error value is greater than the error threshold, the network to be trained has not converged, so its parameters need to be readjusted and the target error value recomputed, iterating until the target error value is less than or equal to the error threshold, which indicates that the network to be trained can output a reliable dense depth image and can be taken as the depth image model and put into use.

It should be noted that there may be multiple sample color images, sample sparse depth images, and corresponding reference dense depth images, and any one or more of them may be drawn for each iteration of training. The training process may be implemented with gradient descent, and the corresponding loss function may be an L1-norm loss function, an L2-norm loss function, or another loss function.
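Under these choices, one training step might look like the following sketch; the optimizer and the model interface (returning the per-feedback-module dense depth images plus the final output) are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, color_img, sparse_depth, reference_dense):
    """One gradient-descent step with an L1-norm loss averaged over the
    intermediate and final dense depth images."""
    optimizer.zero_grad()
    sample_denses, target_dense = model(color_img, sparse_depth)
    losses = [F.l1_loss(d, reference_dense) for d in sample_denses]
    losses.append(F.l1_loss(target_dense, reference_dense))
    loss = torch.stack(losses).mean()  # uniform weights assumed
    loss.backward()
    optimizer.step()
    return loss.item()
```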
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application, certain steps may be performed in other orders.

Fig. 6 is a schematic structural diagram of a depth image acquisition apparatus 600 provided in an embodiment of the present application, the depth image acquisition apparatus 600 being configured on a terminal.

Specifically, the depth image acquisition apparatus 600 may include:

an image acquisition unit 601, configured to acquire a color image and a sparse depth image of a target scene;

an initial densification unit 602, configured to extract color features and initial features of the color image, and obtain an initial dense depth image and initial hidden features from the color features, the initial features, and the sparse depth image;

an iterative optimization unit 603, configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirm the target hidden features from the hidden features to be confirmed obtained in each iterative optimization operation; and

a target densification unit 604, configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.

The above depth image acquisition apparatus 600 may include the aforementioned depth image model. Referring to Fig. 3, the initial densification unit 602 may correspond to the feature extraction module and the first feedback module of the depth image model, the iterative optimization unit 603 may correspond to the feedback modules other than the first, and the target densification unit 604 may correspond to the target depth estimation module of the depth image model.

In some embodiments of the present application, the initial densification unit 602 may be specifically configured to: perform depth estimation on the initial features to obtain the initial dense depth image; fuse the initial dense depth image with the sparse depth image to obtain an initial fused feature image; and determine the initial hidden features using the initial fused feature image, the initial features, and the color features.

In some embodiments of the present application, the image acquisition unit 601 may be specifically configured to: acquire the color image and point cloud data of the target scene; and project the point cloud data onto the imaging plane of the color image to obtain the sparse depth image.
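As an illustration of this projection, a pinhole-camera sketch follows; the intrinsic matrix K, the assumption that the points are already expressed in the color camera's coordinate frame, and the nearest-point tie-breaking are all assumptions not fixed by the embodiment:

```python
import numpy as np

def project_to_sparse_depth(points, K, height, width):
    """Project 3D points (N, 3) onto the color image plane with camera
    intrinsics K (3, 3), producing a sparse depth image in which pixels
    with no projected point remain 0."""
    depth = np.zeros((height, width), dtype=np.float32)
    z = points[:, 2]
    valid = z > 0                           # keep points in front of the camera
    uvw = (K @ points[valid].T).T           # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, zv = u[inside], v[inside], z[valid][inside]
    order = np.argsort(-zv)                 # write far points first ...
    depth[v[order], u[order]] = zv[order]   # ... so the nearest one wins
    return depth
```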
In some embodiments of the present application, the iterative optimization unit 603 may be specifically configured to: perform at least one iterative optimization operation on the initial dense depth image using the initial hidden features; after each iterative optimization operation, compute the error indicator between the hidden features to be confirmed output by the current operation and those output by the previous operation; if the error indicator is outside the error threshold range, continue with the next iterative optimization operation; and once the error indicator falls within the error threshold range, stop the iterative optimization and take the hidden features to be confirmed output by the current operation as the target hidden features.

In some embodiments of the present application, the depth image acquisition apparatus 600 may further include a training unit, which may be configured to: acquire a sample color image, a sample sparse depth image, and a corresponding reference dense depth image; input the sample color image and the sample sparse depth image into the network to be trained, and obtain the sample dense depth image output by each feedback module of the network to be trained and the sample target dense depth image output by the target depth estimation module of the network to be trained; and compute a target error value from the sample target dense depth image, each sample dense depth image, and the reference dense depth image, adjusting the parameters of the network to be trained to iteratively optimize it if the target error value is greater than the error threshold, until the target error value is less than or equal to the error threshold, at which point the network to be trained is taken as the depth image model.

In some embodiments of the present application, the training unit may be specifically configured to: extract the color features and initial features of the color image through the feature extraction module of the network to be trained; obtain the initial dense depth image and the initial hidden features from the color features, the initial features, and the sparse depth image through the first feedback module of the network to be trained; and perform one iterative optimization operation through each of the other feedback modules of the network to be trained, outputting the sample dense depth image obtained in each iterative optimization operation.

In some embodiments of the present application, the training unit may be specifically configured to: compute the initial error values between the sample target dense depth image and the reference dense depth image and between each sample dense depth image and the reference dense depth image; and take a weighted average of the initial error values to obtain the target error value.

It should be noted that, for convenience and brevity of description, for the specific working process of the above depth image acquisition apparatus 600, reference may be made to the corresponding processes of the methods described in Figs. 1 to 5, which are not repeated here.

An embodiment of the present application further provides a depth system. The system specifically includes a color module, a depth module, and the aforementioned depth image acquisition apparatus 600, where the color module is configured to collect a color image of the target scene; the depth module is configured to scan the target scene to obtain point cloud data and obtain a sparse depth image from the point cloud data; and the acquisition apparatus obtains a target dense depth image using the color image and the sparse depth image. It should be noted that the color module includes a color camera, and the depth module includes, but is not limited to, any one of a lidar, a direct time-of-flight (dToF) camera, and a speckle-based indirect time-of-flight (iToF) camera; the color module, the depth module, and the acquisition apparatus may be an integrated device or arranged independently, and data between the components may be transmitted by wire or wirelessly, which is not limited here. For the specific working process of the depth system, reference may be made to the descriptions of Figs. 1 to 6, which is not repeated in the present application.
Fig. 7 is a schematic diagram of a terminal provided in an embodiment of the present application. The terminal 7 may include: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70, such as a depth image acquisition program. When the processor 70 executes the computer program 72, the steps in the above embodiments of the depth image acquisition method are implemented, for example steps S101 to S104 shown in Fig. 1. Alternatively, when the processor 70 executes the computer program 72, the functions of the modules/units in the above apparatus embodiments are implemented, for example the image acquisition unit 601, the initial densification unit 602, the iterative optimization unit 603, and the target densification unit 604 shown in Fig. 6.

The computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 71 and executed by the processor 70 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments being used to describe the execution process of the computer program in the terminal.

For example, the computer program may be divided into: an image acquisition unit, an initial densification unit, an iterative optimization unit, and a target densification unit.

The specific functions of each unit are as follows:

the image acquisition unit is configured to acquire a color image and a sparse depth image of a target scene; the initial densification unit is configured to extract color features and initial features of the color image, and obtain an initial dense depth image and initial hidden features from the color features, the initial features, and the sparse depth image; the iterative optimization unit is configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirm the target hidden features from the hidden features to be confirmed obtained in each iterative optimization operation; and the target densification unit is configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.

The terminal may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is merely an example of the terminal and does not constitute a limitation on the terminal, which may include more or fewer components than shown, or combine certain components, or include different components; for example, the terminal may also include input and output devices, network access devices, buses, and so on.

The so-called processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.

The memory 71 may be an internal storage unit of the terminal, such as a hard disk or memory of the terminal. The memory 71 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card equipped on the terminal. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal. The memory 71 is used to store the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.

It should be noted that, for convenience and brevity of description, for the structure of the above terminal, reference may also be made to the specific description of the structure in the method embodiments, which is not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional units and modules is used only as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the apparatus/terminal embodiments described above are merely illustrative; for instance, the division of the modules or units is only a logical functional division, and there may be other division manners in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may also be completed by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included within the protection scope of the present application.

Claims (11)

  1. A method for acquiring a depth image, comprising:
    acquiring a color image and a sparse depth image of a target scene;
    extracting color features and initial features of the color image, and obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
    performing at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirming target hidden features according to the hidden features to be confirmed obtained in each iterative optimization operation; and
    performing depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  2. The method for acquiring a depth image according to claim 1, wherein the obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image comprises:
    performing depth estimation on the initial features to obtain the initial dense depth image;
    fusing the initial dense depth image with the sparse depth image to obtain an initial fused feature image; and
    determining the initial hidden features using the initial fused feature image, the initial features, and the color features.
  3. The method for acquiring a depth image according to claim 1 or 2, wherein the acquiring a color image and a sparse depth image of a target scene comprises:
    acquiring the color image and point cloud data of the target scene; and
    projecting the point cloud data onto an imaging plane of the color image to obtain the sparse depth image.
  4. The method for acquiring a depth image according to claim 1 or 2, wherein the performing at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirming target hidden features according to the hidden features to be confirmed obtained in each iterative optimization operation comprises:
    performing at least one iterative optimization operation on the initial dense depth image using the initial hidden features; after each iterative optimization operation, calculating an error indicator between the hidden features to be confirmed output by the current iterative optimization operation and the hidden features to be confirmed output by the previous iterative optimization operation; if the error indicator is outside an error threshold range, continuing with the next iterative optimization operation; and once the error indicator falls within the error threshold range, stopping the iterative optimization and taking the hidden features to be confirmed output by the current iterative optimization operation as the target hidden features.
  5. The method for acquiring a depth image according to claim 1 or 2, wherein the method is performed by a pre-trained depth image model;
    wherein a training process of the depth image model comprises:
    acquiring a sample color image, a sample sparse depth image, and a corresponding reference dense depth image;
    inputting the sample color image and the sample sparse depth image into a network to be trained, and obtaining a sample dense depth image output by each feedback module of the network to be trained and a sample target dense depth image output by a target depth estimation module of the network to be trained; and
    calculating a target error value according to the sample target dense depth image, each sample dense depth image, and the reference dense depth image; if the target error value is greater than an error threshold, adjusting parameters of the network to be trained so as to iteratively optimize the network to be trained, until the target error value is less than or equal to the error threshold, and taking the network to be trained as the depth image model.
  6. The method for acquiring a depth image according to claim 5, wherein the obtaining a sample dense depth image output by each feedback module of the network to be trained comprises:
    extracting color features and initial features of the color image through a feature extraction module of the network to be trained;
    obtaining an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image through a first feedback module of the network to be trained; and
    performing one iterative optimization operation through each of the other feedback modules of the network to be trained, and outputting the sample dense depth image obtained in each iterative optimization operation.
  7. The method for acquiring a depth image according to claim 5, wherein the calculating a target error value according to the sample target dense depth image, each sample dense depth image, and the reference dense depth image comprises:
    calculating initial error values between the sample target dense depth image and the reference dense depth image and between each sample dense depth image and the reference dense depth image; and
    performing a weighted average on the initial error values to obtain the target error value.
  8. An apparatus for acquiring a depth image, comprising:
    an image acquisition unit, configured to acquire a color image and a sparse depth image of a target scene;
    an initial densification unit, configured to extract color features and initial features of the color image, and obtain an initial dense depth image and initial hidden features according to the color features, the initial features, and the sparse depth image;
    an iterative optimization unit, configured to perform at least one iterative optimization operation on the initial dense depth image using the color features, the sparse depth image, and the initial hidden features, and confirm target hidden features according to the hidden features to be confirmed obtained in each iterative optimization operation; and
    a target densification unit, configured to perform depth estimation using the target hidden features to obtain a target dense depth image of the target scene.
  9. A depth system, comprising a color module, a depth module, and the acquisition apparatus according to claim 8, wherein:
    the color module is configured to collect a color image of a target scene;
    the depth module is configured to scan the target scene to obtain point cloud data, and obtain a sparse depth image according to the point cloud data; and
    the acquisition apparatus obtains a target dense depth image using the color image and the sparse depth image.
  10. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the acquisition method according to any one of claims 1 to 7.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the acquisition method according to any one of claims 1 to 7.
PCT/CN2022/100593 2022-02-16 2022-06-23 Depth image acquisition method and apparatus, and depth system, terminal and storage medium WO2023155353A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210142004.8 2022-02-16
CN202210142004.8A CN114638869A (en) 2022-02-16 2022-02-16 Method and device for acquiring depth image, depth system, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2023155353A1 true WO2023155353A1 (en) 2023-08-24

Family

ID=81945910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100593 WO2023155353A1 (en) 2022-02-16 2022-06-23 Depth image acquisition method and apparatus, and depth system, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114638869A (en)
WO (1) WO2023155353A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638869A (en) * 2022-02-16 2022-06-17 奥比中光科技集团股份有限公司 Method and device for acquiring depth image, depth system, terminal and storage medium
CN115100360B (en) * 2022-07-28 2023-12-01 中国电信股份有限公司 Image generation method and device, storage medium and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200410699A1 (en) * 2018-03-13 2020-12-31 Magic Leap, Inc. Image-enhanced depth sensing using machine learning
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN112102472A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN112541482A (en) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Deep information completion model training method, device, equipment and storage medium
CN112560875A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Deep information completion model training method, device, equipment and storage medium
CN114638869A (en) * 2022-02-16 2022-06-17 奥比中光科技集团股份有限公司 Method and device for acquiring depth image, depth system, terminal and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422629A (en) * 2023-12-19 2024-01-19 华南理工大学 Instance-aware monocular semantic scene completion method, medium and device
CN117422629B (en) * 2023-12-19 2024-04-26 华南理工大学 Instance-aware monocular semantic scene completion method, medium and device

Also Published As

Publication number Publication date
CN114638869A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
WO2023155353A1 (en) Depth image acquisition method and apparatus, and depth system, terminal and storage medium
CN111563923B (en) Method for obtaining dense depth map and related device
US11532094B2 (en) Systems and methods for three-dimensional pose determination
US20220383549A1 (en) Multi-mode three-dimensional scanning method and system
US9189889B2 (en) Method for building a three-dimensional model and apparatus thereof
US8588516B2 (en) Interpolation image generation apparatus, reconstructed image generation apparatus, method of generating interpolation image, and computer-readable recording medium storing program
JP2017520050A (en) Local adaptive histogram flattening
CN109672871B (en) White balance information synchronization method, white balance information synchronization device and computer readable medium
KR20190112894A (en) Method and apparatus for 3d rendering
CN110378944B (en) Depth map processing method and device and electronic equipment
CN112862877A (en) Method and apparatus for training image processing network and image processing
WO2022135588A1 (en) Image correction method, apparatus and system, and electronic device
CN114494388B (en) Three-dimensional image reconstruction method, device, equipment and medium in large-view-field environment
WO2023142352A1 (en) Depth image acquisition method and device, terminal, imaging system and medium
WO2021115061A1 (en) Image segmentation method and apparatus, and server
WO2020124374A1 (en) Image processing method, terminal device and storage medium
WO2022036539A1 (en) Color consistency correction method and device for multiple cameras
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN114078093A (en) Image correction method, intelligent terminal and storage medium
CN110060264B (en) Neural network training method, video frame processing method, device and system
CN112184828B (en) Laser radar and camera external parameter calibration method and device and automatic driving vehicle
CN113269823A (en) Depth data acquisition method and device, storage medium and electronic equipment
US20230384085A1 (en) Phase unwrapping method based on multi-view constraints of light field and related components
CN116778091A (en) Deep learning multi-view three-dimensional reconstruction algorithm based on path aggregation
JP7321772B2 (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22926662

Country of ref document: EP

Kind code of ref document: A1