CN114862933A - Depth image prediction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114862933A
Authority
CN
China
Prior art keywords
image
depth image
depth
rgb
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210540033.XA
Other languages
Chinese (zh)
Inventor
刘钰纯
杨帆
岳鸣
崔兴旺
肖永钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN202210540033.XA priority Critical patent/CN114862933A/en
Publication of CN114862933A publication Critical patent/CN114862933A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a depth image prediction method, device, equipment, and storage medium. The method includes: acquiring an RGB image and a first depth image of a target, where the first depth image contains depth information of the target and the acquisition range of the RGB image is larger than that of the first depth image; and predicting, according to the RGB image, the first depth image, and a preset prediction algorithm, the part of the first depth image in which depth information is missing, thereby determining a second depth image corresponding to the RGB image, whose acquisition range is substantially the same as that of the RGB image. With this method, the accuracy of the predicted depth image can be improved.

Description

Depth image prediction method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a depth image prediction method, apparatus, device, and storage medium.
Background
To capture an image of a target, two types of cameras, an RGB (Red, Green, Blue) camera and a depth camera, are typically integrated into the hardware of an image acquisition module so that their outputs can be combined. However, with the continuous development of camera hardware and rising user requirements, the field angle of most RGB cameras in practice is now larger than that of the accompanying depth camera. In this case, the scene information in the RGB image obtained by the RGB camera is relatively complete, while part of the scene information is missing from the depth image obtained by the depth camera, which affects the subsequent accurate positioning of the target in the 3D scene.
To address this problem, the related art inputs a single RGB image obtained by the RGB camera into a trained neural network model to obtain a predicted depth image; such a neural network model is generally trained on single RGB images from multiple scenes.
However, depth images predicted using the above techniques are not sufficiently accurate.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a depth image prediction method, device, apparatus, and storage medium capable of improving the accuracy of a predicted depth image.
In a first aspect, the present application provides a depth image prediction method, including:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
In one embodiment, the predicting, according to the RGB image, the first depth image and a preset prediction algorithm, a portion of the first depth image with missing depth information, and determining the second depth image corresponding to the RGB image includes:
performing image preprocessing operation on the RGB image and the first depth image, and determining a preprocessed image; the preprocessing operation comprises at least one of image alignment processing and image merging processing;
and inputting the preprocessed image into a prediction algorithm, predicting the part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
In one embodiment, the prediction algorithm is a neural network model; the inputting the preprocessed image into a prediction algorithm, predicting a part with missing depth information in the first depth image, and determining the second depth image corresponding to the RGB image includes:
inputting the preprocessed image into a neural network model for prediction, and determining a second depth image corresponding to the RGB image;
the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In one embodiment, the training method of the neural network model includes:
preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation comprises at least one of image alignment processing and image merging processing;
inputting each preprocessed sample image into an initial neural network model for prediction, and determining a prediction depth image corresponding to each sample depth image;
and training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
In one embodiment, the training of the initial neural network model according to each predicted depth image and each corresponding standard depth image includes:
calculating first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image;
the initial neural network model is trained based on the first loss.
In one embodiment, the training the initial neural network model according to the first loss includes:
determining a prediction normal map corresponding to each prediction depth image according to each prediction depth image;
determining a standard normal map corresponding to each standard depth image according to each sample depth image;
calculating a second loss between each predicted normal map and the corresponding standard normal map;
and training the initial neural network model according to the first loss and the second loss.
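The second loss described above can be sketched as follows. The finite-difference normal estimate and the cosine form of the loss are illustrative assumptions (the patent does not fix the exact formulas), and all function names are hypothetical:

```python
import numpy as np

def depth_to_normals(depth):
    """Approximate a surface normal map from a depth image using finite
    differences (an orthographic approximation, used here only as a sketch)."""
    dzdy, dzdx = np.gradient(depth)
    # Normal proportional to (-dz/dx, -dz/dy, 1), then normalized per pixel.
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def normal_loss(pred_depth, std_depth):
    """Second loss: mean (1 - cosine similarity) between the predicted
    normal map and the standard normal map."""
    n_pred = depth_to_normals(pred_depth)
    n_std = depth_to_normals(std_depth)
    cos = np.sum(n_pred * n_std, axis=-1)
    return float(np.mean(1.0 - cos))
```

A flat depth image yields normals of (0, 0, 1) everywhere, and identical depth images yield a second loss of zero, as expected.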
In one embodiment, the method further includes:
and denoising the second depth image.
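The patent does not specify the denoising method; as one possible sketch, a small median filter suppresses isolated outliers in the second depth image (function name and window size are illustrative):

```python
import numpy as np

def median_denoise(depth, k=3):
    """Denoise a depth image with a k x k median filter.
    Border pixels are handled by edge padding; output shape equals input shape."""
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    # All k x k windows, shape (H, W, k, k); take the median of each window.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1))
```

A single impulse ("salt") pixel in an otherwise smooth region is removed by the median, while large flat areas pass through unchanged.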
In a second aspect, the present application further provides an apparatus for predicting a depth image, the apparatus comprising:
the acquisition module is used for acquiring an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
the prediction module is used for predicting the part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
In a third aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
In a fourth aspect, the present application also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
According to the depth image prediction method, device, equipment, and storage medium above, the part of the first depth image in which depth information is missing is predicted according to the acquired RGB image of the target, the first depth image, and a preset prediction algorithm, and a second depth image corresponding to the RGB image is determined. The first depth image contains depth information of the target, the acquisition range of the RGB image is larger than that of the first depth image, and the acquisition range of the second depth image is substantially the same as that of the RGB image. By combining the RGB image with the larger acquisition range and the first depth image with the smaller acquisition range, the missing part of the depth information can be predicted comprehensively, so the second depth image, determined together with the existing partial depth information, is relatively accurate; that is, a second depth image with the same acquisition range as the RGB image is obtained, and the accuracy of the predicted second depth image is improved. In addition, with this method a depth image with a large acquisition range can be obtained without replacing the original RGB camera and depth camera, which removes the bottleneck that an insufficient field angle would impose on subsequent software and hardware development of the whole product and greatly saves the time and cost of testing and replacing depth cameras.
Drawings
FIG. 1 is a diagram of an application environment of a depth image prediction method in one embodiment;
FIG. 2 is a flow diagram illustrating a method for predicting depth images in one embodiment;
FIG. 3 is an exemplary diagram of field ranges corresponding to an RGB camera and a depth camera in one embodiment;
FIG. 4 is a flowchart illustrating a depth image prediction method according to another embodiment;
FIG. 5 is a flowchart illustrating a depth image prediction method according to another embodiment;
FIG. 6 is a flowchart illustrating a depth image prediction method according to another embodiment;
FIG. 7 is a block diagram showing the structure of a depth image prediction apparatus according to an embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The depth image prediction method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. The computer device 102 may communicate with the acquisition device 104 over a wired or wireless connection. The acquisition device 104 may be a monocular camera, a binocular camera, a depth camera capable of capturing depth information, a video camera, or the like. The computer device 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, or tablet computer. Specifically, an image captured by the acquisition device 104 may be transmitted to the computer device 102 so that the computer device 102 can process the captured image.
In one embodiment, as shown in FIG. 2, a depth image prediction method is provided. Taking its application to the computer device in FIG. 1 as an example, the method may include the following steps:
s202, collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image.
RGB refers to the three primary colors of light: R for Red, G for Green, and B for Blue. An RGB image is generally a two-dimensional image containing a number of pixels, each with a pixel value composed of the three RGB components. An RGB image can be acquired with an ordinary camera or video camera.
The depth image may be acquired by a depth camera with a depth-sensing capability, for example a 3D camera or 3D video camera, or a camera rig with depth-sensing capability formed by a binocular camera, or the like.
Specifically, in this step, two different cameras may capture images of the target simultaneously: one captures an RGB image of the target (for convenience of description, this camera is hereinafter called the RGB camera), and the other captures a depth image of the target (hereinafter called the depth camera). A camera that integrates the RGB camera and the depth camera may also be used; no particular limitation is made here.
The field angle of the RGB camera is larger than that of the depth camera; that is, the information in the RGB image acquired by the RGB camera is relatively comprehensive, while the depth image acquired by the depth camera may be missing part of the depth information. A specific example is shown in FIG. 3. Because part of the depth information is missing, the accurate position information of the target in the 3D scene, and thus its accurate localization, may be affected. In other words, the acquired RGB image has a larger acquisition range than the depth image, and this depth image is referred to as the first depth image.
In addition, the target here may be a static target or a dynamic target, such as a human body, a car, an animal, or a plant.
S204, predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
The prediction algorithm may be a neural network model, a statistical model, a machine learning model, or the like, or may be a correlation mathematical calculation method, or the like.
In this step, after the RGB image with the large acquisition range and the first depth image with the small acquisition range are obtained, the prediction algorithm may operate on them directly: it predicts the part of the first depth image in which depth information is missing and merges the predicted part with the existing depth information, yielding a second depth image with complete depth information. Alternatively, the RGB image and the first depth image may first be processed, and the prediction algorithm then applied to the processed image to predict the missing part, which is again merged with the existing depth information to obtain the second depth image with complete depth information. The prediction algorithm may also directly output a second depth image whose depth information has been completed.
The second depth image, obtained by completing the depth information, has an acquisition range substantially the same as that of the RGB image. "Substantially the same" here means that the acquisition range of the second depth image may be approximately equal to, slightly smaller than, or slightly larger than the acquisition range of the RGB image.
According to the above depth image prediction method, the part of the first depth image in which depth information is missing is predicted according to the acquired RGB image of the target, the first depth image, and a preset prediction algorithm, and a second depth image corresponding to the RGB image is determined. The first depth image contains depth information of the target, the acquisition range of the RGB image is larger than that of the first depth image, and the acquisition range of the second depth image is substantially the same as that of the RGB image. By combining the RGB image with the larger acquisition range and the first depth image with the smaller acquisition range, the missing part of the depth information can be predicted comprehensively, so the second depth image, determined together with the existing partial depth information, is relatively accurate; that is, a second depth image with the same acquisition range as the RGB image is obtained, and the accuracy of the predicted second depth image is improved. In addition, with this method a depth image with a large acquisition range can be obtained without replacing the original RGB camera and depth camera, which removes the bottleneck that an insufficient field angle would impose on subsequent software and hardware development of the whole product and greatly saves the time and cost of testing and replacing depth cameras.
The above embodiment mentioned that the prediction algorithm may operate on the RGB image and the first depth image directly, or that the two images may be processed first and the prediction algorithm then applied to the processed image. The latter case is described in the following embodiment.
In another embodiment, as shown in fig. 4, another depth image prediction method is provided, and based on the above embodiment, the above S204 may include the following steps:
s402, performing image preprocessing operation on the RGB image and the first depth image, and determining a preprocessed image; the preprocessing operation includes at least one of an image alignment process and an image merging process.
In this step, the preprocessing operation may include not only image alignment and image merging but also operations such as image cropping.
After the RGB image and the first depth image are obtained, in one possible implementation an image alignment process may be performed on them to obtain an aligned RGB image and an aligned first depth image. Image alignment is similar to image registration or image calibration: using the intrinsic and extrinsic parameters of the RGB camera and of the depth camera, obtained in advance, the first depth image can be projected into the RGB image, that is, transformed into the coordinate system of the RGB image for subsequent data processing. Through alignment, images in several different coordinate systems are brought into the same coordinate system for processing, which saves computation and improves efficiency.
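As a minimal sketch of this alignment step, the projection can be written with NumPy. The intrinsic matrices K_d and K_rgb and the extrinsic rotation R and translation t (mapping depth-frame points into the RGB frame) are assumed known from a prior calibration; all function and parameter names are illustrative, not taken from the patent:

```python
import numpy as np

def align_depth_to_rgb(depth, K_d, K_rgb, R, t, rgb_shape):
    """Project a depth image into the RGB camera's coordinate system.

    depth: (H, W) depth map from the depth camera, 0 marking missing pixels.
    K_d, K_rgb: 3x3 intrinsic matrices; R (3x3), t (3,): depth-to-RGB extrinsics.
    Returns a depth map of rgb_shape aligned with the RGB image.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    u, v, z = us[valid], vs[valid], depth[valid]

    # Back-project valid depth pixels to 3-D points in the depth camera frame.
    pts = np.linalg.inv(K_d) @ np.vstack([u * z, v * z, z])
    # Transform into the RGB camera frame and re-project with K_rgb.
    pts_rgb = R @ pts + t[:, None]
    proj = K_rgb @ pts_rgb
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)

    # Scatter the transformed depths into an image on the RGB pixel grid.
    aligned = np.zeros(rgb_shape, dtype=depth.dtype)
    inside = (u2 >= 0) & (u2 < rgb_shape[1]) & (v2 >= 0) & (v2 < rgb_shape[0])
    aligned[v2[inside], u2[inside]] = pts_rgb[2, inside]
    return aligned
```

With identity intrinsics and extrinsics the projection is the identity mapping, which provides a quick sanity check of the geometry.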
In another possible implementation, the RGB image and the first depth image may be merged into a single image, recorded as an RGBD image. Because the merged image fuses information from several different images, the accuracy of the subsequent prediction of the missing depth information can be improved.
In addition, when performing the merge, the RGB values of each first pixel on the RGB image and the depth information of the second pixel at the corresponding position on the first depth image may be obtained, and the depth information of each second pixel combined with the RGB value of the first pixel at the corresponding position to obtain the RGBD image. Alternatively, feature extraction may be performed on the RGB image to obtain a first feature and on the first depth image to obtain a second feature, and the two features combined to obtain the RGBD image. Both merging modes combine the RGB image and the first depth image accurately and quickly, improving the efficiency and accuracy of image merging.
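The first (pixel-wise) merging mode amounts to a channel concatenation. The sketch below assumes the two images have already been aligned; the function name merge_rgbd is illustrative:

```python
import numpy as np

def merge_rgbd(rgb, depth):
    """Pixel-wise merge: append each pixel's depth value to its RGB value,
    giving a 4-channel RGBD image. Assumes rgb is (H, W, 3) and depth is
    (H, W), already aligned to the same pixel grid."""
    assert rgb.shape[:2] == depth.shape, "images must be aligned first"
    return np.concatenate([rgb, depth[..., None]], axis=-1)
```

The result has shape (H, W, 4), with the depth channel last; downstream code can then treat the RGBD image as a single 4-channel input.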
In another possible implementation, the RGB image and the first depth image may be subjected to image alignment processing to obtain an aligned RGB image and an aligned first depth image, and the aligned RGB image and the aligned first depth image are subjected to image merging processing to obtain a merged image, which is recorded as an RGBD image.
Of course, other combination preprocessing methods may be adopted, such as performing image merging processing first and then performing image alignment processing, or sequentially performing image cropping processing, image alignment processing, image merging processing, and the like.
S404, inputting the preprocessed image into a prediction algorithm, predicting a part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
In this step, the preprocessed image, which may be one image or several, contains both the information of the RGB image and the depth information of the first depth image. It can therefore be combined with the prediction algorithm to predict the part of the first depth image in which depth information is missing, yielding a second depth image in which all the depth information has been completed.
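The data flow of this step can be sketched as follows. The real predictor is a trained neural network; here it is replaced by a trivial placeholder (filling missing pixels with the mean of the valid depth) purely to illustrate how the prediction is merged with the measured depth. All names are hypothetical:

```python
import numpy as np

def predict_second_depth(rgbd, predictor=None):
    """Run a prediction algorithm on a preprocessed (H, W, 4) RGBD image and
    keep the measured depth where it exists, filling only the missing part
    (depth == 0) from the prediction."""
    depth = rgbd[..., 3]
    missing = depth == 0
    if predictor is None:
        # Placeholder "prediction": the mean of all valid depth values.
        predictor = lambda x: np.full(x.shape[:2], x[..., 3][x[..., 3] > 0].mean())
    predicted = predictor(rgbd)
    second = depth.copy()
    second[missing] = predicted[missing]  # merge prediction with measured depth
    return second
```

Measured pixels pass through unchanged; only the region with missing depth information is taken from the predictor's output.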
In this embodiment, the second depth image corresponding to the RGB image is obtained by performing preprocessing operations such as image alignment and image merging on the RGB image and the first depth image, and performing depth information prediction on the preprocessed image through a prediction algorithm. By performing depth information prediction after preprocessing operation, the efficiency and accuracy of prediction of missing depth information can be improved.
In the above embodiments, it is mentioned that the prediction algorithm may be various, and the following description is made specifically for the case where the prediction algorithm is a neural network model.
In another embodiment, another depth image prediction method is provided, and on the basis of the above embodiment, the above S404 may include the following step a:
and step A, inputting the preprocessed image into a neural network model for prediction, and determining a second depth image corresponding to the RGB image.
The neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In this step, the neural network model may be a supervised neural network model, and the specific network model type is not specifically limited here.
In addition, the sample data set may be obtained directly, as a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image, and a standard depth image corresponding to each sample depth image. Alternatively, a plurality of sample RGB images and the standard depth image corresponding to each may be obtained first, and each standard depth image then processed to produce the corresponding sample depth image, finally giving the sample data set. The image processing here may be, for example, cropping the standard depth image or setting part of its depth information to zero; in short, any processing that yields a sample depth image corresponding to each standard depth image.
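One such processing option, zeroing the border of the standard depth image to simulate the depth camera's smaller acquisition range, can be sketched as follows (the function name and the margin parameter are illustrative):

```python
import numpy as np

def make_sample_depth(standard_depth, margin):
    """Create a training sample depth image from a standard (full-range) depth
    image by zeroing a border of `margin` pixels, so the sample's acquisition
    range is smaller than the standard image's, as the training data requires."""
    sample = np.zeros_like(standard_depth)
    sample[margin:-margin, margin:-margin] = standard_depth[margin:-margin,
                                                            margin:-margin]
    return sample
```

The zeroed border then plays the role of the "missing depth information" the network learns to predict, with the untouched standard image as supervision.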
Before the neural network model is used for depth information prediction, the neural network model may be trained in advance, as shown in fig. 5, and optionally, the training mode of the neural network model may include the following steps:
s502, preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation includes at least one of an image alignment process and an image merging process.
In this step, each sample RGB image corresponds to one sample depth image, and each sample depth image corresponds to one standard depth image; that is, each sample RGB image also corresponds to one standard depth image, and a sample RGB image together with its corresponding sample depth image and standard depth image forms a group of images. Within each group, the acquisition range of the sample RGB image is larger than that of the sample depth image and equal to that of the standard depth image.
In addition, before the alignment and merging preprocessing, other preprocessing operations may be applied to the samples in the sample data set, such as converting the sample RGB image to grayscale, modifying the values of the sample depth image with a specific mathematical model, or rotating and then cropping the sample images.
For the preprocessing operation performed on the sample RGB image and the corresponding sample depth image in each group of images, reference may be made to the explanation in S402, and details are not described here. The resulting pre-processed image may be recorded as a pre-processed sample image.
And S504, inputting the preprocessed sample images into the initial neural network model for prediction, and determining the predicted depth images corresponding to the sample depth images.
In this step, the preprocessed sample image (there may be one or a plurality of them) contains both the information of the sample RGB image and the depth information of the sample depth image. It may therefore be input into the initial neural network model, which predicts the portion of the sample depth image where depth information is missing, yielding a predicted depth image in which all the depth information is filled in.
S506, training the initial neural network model according to the predicted depth images and the corresponding standard depth images, and determining the neural network model.
In this step, each group of images yields a corresponding predicted depth image through the initial neural network model. A loss is calculated between the predicted depth image of each group and the corresponding standard depth image, the losses of all groups are summed, and the summed value is used to adjust the parameters of the initial neural network model, thereby training it and finally obtaining the trained neural network model.
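The loss-summing and parameter-update loop described above can be sketched with a toy stand-in for the model. The real predictor is a deep neural network; the single learnable scale, the learning rate, and the data below are purely illustrative assumptions that only demonstrate summing per-group losses before the update.

```python
import numpy as np

# Toy stand-in for the "initial neural network model": a single learnable
# scale applied to the input depth. Real models are deep networks; this
# only illustrates the loss-summing / parameter-update loop.
def predict(depth, scale):
    return scale * depth

groups = [  # (sample depth, standard depth) pairs, one per image group
    (np.full((2, 2), 1.0), np.full((2, 2), 2.0)),
    (np.full((2, 2), 3.0), np.full((2, 2), 6.0)),
]

scale, lr = 1.0, 0.01
for _ in range(200):
    grad = 0.0
    for sample, standard in groups:
        pred = predict(sample, scale)
        # gradient of the summed squared error of this group w.r.t. scale
        grad += np.sum(2.0 * (pred - standard) * sample)
    scale -= lr * grad  # one update using the gradient summed over all groups
```

After training, `scale` approaches 2.0, the value that makes the toy predictions match the standard depth images.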
In this embodiment, the preprocessed image is input into the neural network model for depth information prediction to obtain a second depth image corresponding to the RGB image. Because the prediction combines the neural network model with the partial depth information that is already available, both the efficiency of depth information prediction and the accuracy of the predicted depth information can be improved. Furthermore, training the neural network model with the predicted depth images it outputs and the corresponding standard depth images can further improve the efficiency and accuracy of its training.
The above embodiment briefly mentions that the neural network model can be trained through the predicted depth images and the corresponding standard depth images in each set of images, and the training process is specifically described below.
In another embodiment, another depth image prediction method is provided, and based on the above embodiment, as shown in fig. 6, the above S506 may include the following steps:
S602, calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image.
In this step, after the predicted depth image of each group of images is obtained (taking one group as an example), each pixel on the predicted depth image and the pixel at the corresponding position on the standard depth image can be compared pixel by pixel: a preset loss function is used to calculate the loss between each pair of corresponding pixels, the losses of all pixels in the group are summed, and the resulting sum is recorded as the first loss. The loss function here may calculate a standard deviation, a variance, a mean square error, or the like, or may be another loss function, for example one that calculates similarity.
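Taking the squared-error variant mentioned above as one possible choice, the pixel-by-pixel first loss of a group can be sketched as follows; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def first_loss(predicted, standard):
    """Pixel-wise first loss: sum over all pixels of the squared difference
    between the predicted depth image and the standard depth image
    (one of the loss choices the text mentions)."""
    return float(np.sum((predicted - standard) ** 2))

pred = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
std = np.array([[1.0, 2.0],
                [3.0, 6.0]])
loss = first_loss(pred, std)  # only the bottom-right pixel differs
```

Summing these per-group values across all groups gives the quantity used to update the model.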
And S604, training the initial neural network model according to the first loss.
In this step, after the first loss is obtained, the initial neural network model may be trained directly with the first loss, or it may be trained in combination with other losses; the latter case is described below.
Optionally, the predicted normal map corresponding to each predicted depth image may be determined according to each predicted depth image, and the standard normal map corresponding to each standard depth image may be determined according to each standard depth image; a second loss is calculated between each predicted normal map and the corresponding standard normal map; and the initial neural network model is trained according to the first loss and the second loss.
A normal map records the surface normal at each point of the uneven surface of the original object; here, the normal directions are computed from the pixel values of a depth image. The normal map computed from the pixel values of the predicted depth image in each group is recorded as a predicted normal map, and the normal map computed from the pixel values of the standard depth image in the same group is recorded as a standard normal map.
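One common way to compute normals from depth values is via finite differences of the depth image; the sketch below follows that construction under the assumption of unit pixel spacing (the application does not fix the exact formula).

```python
import numpy as np

def depth_to_normals(depth):
    """Estimate a per-pixel surface normal map from a depth image.

    Uses finite differences of the depth values as the x/y slopes and
    assumes unit spacing between pixels; this is one common construction,
    not necessarily the exact one used in the application.
    Returns an (H, W, 3) array of unit normals.
    """
    dz_dy, dz_dx = np.gradient(depth)              # depth slope along rows / columns
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / norm                          # normalise to unit length

flat = np.full((4, 4), 5.0)                        # a flat surface...
n = depth_to_normals(flat)                         # ...has normals along +z
```

A second loss can then be computed between the normal maps of a predicted depth image and of the corresponding standard depth image.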
After the predicted normal map and the standard normal map corresponding to each group of images are obtained, the loss between each group of predicted normal maps and the corresponding standard normal map can be calculated through a preset loss function and is recorded as a second loss. The loss function may be the same as or different from the loss function for calculating the first loss, and is not particularly limited herein.
After the first loss and the second loss corresponding to each group of images are obtained, the first loss and the second loss may be summed directly or summed with weights, and the initial neural network model is trained according to the resulting sum, so as to obtain the trained neural network model.
In this embodiment, the first loss between each pixel on the predicted depth image and the corresponding pixel on the standard depth image is calculated, and the neural network model is trained with this first loss; training with a pixel-by-pixel loss can improve the accuracy of the trained model. Furthermore, combining the pixel-wise loss with a loss calculated from normal maps constructs a normal-map-based loss function, which introduces geometric information as a supervision signal during training. This compensates for the shortcomings of purely pixel-wise training, further improving the accuracy of the trained neural network model and hence the accuracy of its depth information prediction.
In another embodiment, another depth image prediction method is provided, which involves processing operations after obtaining a second depth image, and on the basis of the above embodiment, the method may further include the following step B:
and B, denoising the second depth image.
In this step, the denoising process is mainly directed at noise caused by hardware; the hardware may be, for example, an RGB camera, a depth camera, or a processing chip. The denoising process may employ, but is not limited to, a specific algorithm, a mathematical expression, a statistical model, a machine learning model, a neural network model, and the like.
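As one concrete possibility among the options listed, a median filter suppresses isolated sensor speckles; the sketch below is an illustrative pure-NumPy implementation, not the application's prescribed algorithm.

```python
import numpy as np

def median_denoise(depth, k=3):
    """Denoise a depth image with a k x k median filter (one simple choice
    for suppressing sensor noise; the application does not fix the method).

    The image is edge-padded so the output keeps the input's shape.
    """
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    out = np.empty_like(depth, dtype=float)
    h, w = depth.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 100.0                      # a single speckle of sensor noise
clean = median_denoise(noisy)
```

The isolated speckle is replaced by the median of its neighbourhood while the flat background is left unchanged.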
In this embodiment, the second depth image with the complete depth information is denoised, so that problems caused by hardware can be eliminated, and the accuracy of the finally determined depth image is improved.
To describe the technical solution of the present application in detail, it is explained below with reference to a specific embodiment; on the basis of the above embodiments, the method may include the following steps:
S1, collecting an RGB image of the target through an RGB camera and collecting a first depth image of the target through a depth camera; the acquisition range of the RGB image is larger than that of the first depth image.
And S2, carrying out image alignment processing on the RGB image and the first depth image to obtain an aligned RGB image and an aligned first depth image.
And S3, carrying out image merging processing on the aligned RGB image and the aligned first depth image to obtain a merged RGBD image, wherein the RGBD image is the RGB image comprising the depth information in the first depth image.
And S4, inputting the RGBD image into the neural network model for prediction, and determining a second depth image corresponding to the RGB image, wherein the acquisition range of the second depth image is substantially the same as that of the RGB image.
And S5, denoising the second depth image.
And S6, carrying out image alignment processing on the sample RGB image and the sample depth image to obtain an aligned sample RGB image and an aligned sample depth image.
And S7, carrying out image merging processing on the aligned sample RGB image and the aligned sample depth image to obtain a merged sample RGBD image, wherein the sample RGBD image is the sample RGB image including the depth information in the sample depth image.
And S8, inputting the sample RGBD image into the neural network model for prediction, and determining a prediction depth image corresponding to the sample RGB image.
And S9, calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image.
And S10, determining a prediction normal map corresponding to each prediction depth image according to each prediction depth image, and determining a standard normal map corresponding to each standard depth image according to each standard depth image.
And S11, calculating a second loss between each predicted normal map and the corresponding standard normal map.
And S12, training the initial neural network model after summing the first loss and the second loss.
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in those flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a depth image prediction apparatus for implementing the depth image prediction method described above. The solution provided by the apparatus is similar to that described in the method embodiments above, so for the specific limitations in the one or more depth image prediction apparatus embodiments below, reference may be made to the limitations on the depth image prediction method above; details are not repeated here.
In one embodiment, as shown in fig. 7, there is provided a depth image prediction apparatus including: an acquisition module 11 and a prediction module 12, wherein:
the acquisition module 11 is used for acquiring an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image;
the prediction module 12 is configured to predict a portion of the first depth image with missing depth information according to the RGB image, the first depth image, and a preset prediction algorithm, and determine a second depth image corresponding to the RGB image; the capture range of the second depth image is substantially the same as the capture range of the RGB image.
In another embodiment, another depth image prediction apparatus is provided, and on the basis of the above embodiment, the prediction module 12 may include a preprocessing unit and a prediction unit, wherein:
the preprocessing unit is used for carrying out image preprocessing operation on the RGB image and the first depth image and determining a preprocessed image; the preprocessing operation comprises at least one of image alignment processing and image merging processing;
and the prediction unit is used for inputting the preprocessed image into a prediction algorithm, predicting the part with missing depth information in the first depth image and determining a second depth image corresponding to the RGB image.
In another embodiment, another depth image prediction apparatus is provided, and on the basis of the above embodiment, the prediction algorithm is a neural network model; the prediction unit is specifically configured to input the preprocessed image into a neural network model for prediction, and determine a second depth image corresponding to the RGB image; the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In another embodiment, another depth image prediction apparatus is provided, and on the basis of the above embodiment, the apparatus may further include a training module, where the training module may include a sample preprocessing unit, a sample prediction unit, and a training unit, where:
the sample preprocessing unit is used for preprocessing each sample RGB image and the corresponding sample depth image and determining a preprocessed sample image; the preprocessing operation comprises at least one of image alignment processing and image merging processing;
the sample prediction unit is used for inputting each preprocessed sample image into the initial neural network model for prediction and determining a prediction depth image corresponding to each sample depth image;
and the training unit is used for training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
Optionally, the training unit may include a loss calculation subunit and a training subunit, wherein:
the loss calculation subunit is used for calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image;
and the training subunit is used for training the initial neural network model according to the first loss.
Optionally, the training subunit is specifically configured to determine, according to each predicted depth image, a predicted normal map corresponding to each predicted depth image; determine, according to each standard depth image, a standard normal map corresponding to each standard depth image; calculate a second loss between each predicted normal map and the corresponding standard normal map; and train the initial neural network model according to the first loss and the second loss.
In another embodiment, another depth image prediction apparatus is provided, and on the basis of the above embodiment, the apparatus may further include a post-processing module, configured to perform denoising processing on the second depth image.
The respective modules in the depth image prediction apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server or a terminal. Taking the terminal as an example, its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus, wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a depth image prediction method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
Those skilled in the art will appreciate that the structure shown in fig. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image; predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the capture range of the second depth image is substantially the same as the capture range of the RGB image.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
performing image preprocessing operation on the RGB image and the first depth image, and determining a preprocessed image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; and inputting the preprocessed image into a prediction algorithm, predicting the part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the preprocessed image into a neural network model for prediction, and determining a second depth image corresponding to the RGB image; the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; inputting each preprocessed sample image into an initial neural network model for prediction, and determining a prediction depth image corresponding to each sample depth image; and training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image; and training the initial neural network model according to the first loss.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining a prediction normal map corresponding to each prediction depth image according to each prediction depth image; determining a standard normal map corresponding to each standard depth image according to each standard depth image; calculating a second loss between each predicted normal map and the corresponding standard normal map; and training the initial neural network model according to the first loss and the second loss.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and denoising the second depth image.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image; predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the capture range of the second depth image is substantially the same as the capture range of the RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing image preprocessing operation on the RGB image and the first depth image, and determining a preprocessed image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; and inputting the preprocessed image into a prediction algorithm, predicting the part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the preprocessed image into a neural network model for prediction, and determining a second depth image corresponding to the RGB image; the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; inputting each preprocessed sample image into an initial neural network model for prediction, and determining a prediction depth image corresponding to each sample depth image; and training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image; and training the initial neural network model according to the first loss.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a prediction normal map corresponding to each prediction depth image according to each prediction depth image; determining a standard normal map corresponding to each standard depth image according to each standard depth image; calculating a second loss between each predicted normal map and the corresponding standard normal map; and training the initial neural network model according to the first loss and the second loss.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and denoising the second depth image.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of a target, and the acquisition range of the RGB image is larger than that of the first depth image; predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the capture range of the second depth image is substantially the same as the capture range of the RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing image preprocessing operation on the RGB image and the first depth image, and determining a preprocessed image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; and inputting the preprocessed image into a prediction algorithm, predicting the part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the preprocessed image into a neural network model for prediction, and determining a second depth image corresponding to the RGB image; the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation comprises at least one of image alignment processing and image merging processing; inputting each preprocessed sample image into an initial neural network model for prediction, and determining a prediction depth image corresponding to each sample depth image; and training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image; and training the initial neural network model according to the first loss.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a prediction normal map corresponding to each prediction depth image according to each prediction depth image; determining a standard normal map corresponding to each standard depth image according to each standard depth image; calculating a second loss between each predicted normal map and the corresponding standard normal map; and training the initial neural network model according to the first loss and the second loss.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and denoising the second depth image.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; the non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as there is no contradiction among them, such combinations should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the present application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A method for predicting a depth image, the method comprising:
collecting an RGB image and a first depth image of a target; the first depth image comprises depth information of the target, and the acquisition range of the RGB image is larger than that of the first depth image;
predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
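The interface described by claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `predictor` callable stands in for the "preset prediction algorithm" (e.g. a trained network), missing depth is assumed to be encoded as zero, and the toy mean-fill predictor is purely hypothetical.

```python
import numpy as np

def predict_second_depth(rgb, first_depth, predictor):
    """Claim 1 interface: given an RGB image whose acquisition range
    exceeds that of the first depth image, return a second depth image
    covering the full RGB range. `predictor` stands in for the preset
    prediction algorithm (a hypothetical callable, not specified here)."""
    missing = (first_depth == 0)              # part with missing depth information
    second = first_depth.astype(np.float64).copy()
    # Fill only the missing part with the predictor's output.
    second[missing] = predictor(rgb, first_depth)[missing]
    return second

# Toy predictor: fill gaps with the mean of the measured depth values.
mean_fill = lambda rgb, d: np.full(d.shape, d[d > 0].mean())
d = np.array([[0.0, 2.0], [4.0, 0.0]])
rgb = np.zeros((2, 2, 3))
out = predict_second_depth(rgb, d, mean_fill)
```

Measured pixels pass through unchanged; only the zero-valued (missing) pixels are replaced by the predictor's estimate.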
2. The method according to claim 1, wherein the predicting, according to the RGB image, the first depth image, and a preset prediction algorithm, a portion of the first depth image where depth information is missing, and determining a second depth image corresponding to the RGB image comprises:
performing image preprocessing operation on the RGB image and the first depth image to determine a preprocessed image; the preprocessing operation comprises at least one of an image alignment process and an image merging process;
inputting the preprocessed image into the prediction algorithm, predicting a part with missing depth information in the first depth image, and determining a second depth image corresponding to the RGB image.
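One plausible reading of the preprocessing in claim 2 is: align the smaller depth map into the RGB frame, then merge the two by channel concatenation. The sketch below assumes the alignment offset is already known from calibration and that missing depth is marked with zero; both are illustrative assumptions, not details fixed by the claims.

```python
import numpy as np

def preprocess(rgb, depth, offset=(0, 0)):
    """Image alignment + image merging for an RGB/depth pair.

    rgb:    (H, W, 3) image; depth: (h, w) map with h <= H, w <= W.
    offset: (row, col) of the depth map's top-left corner within the
            RGB frame (hypothetical, assumed known from calibration).
    """
    H, W, _ = rgb.shape
    h, w = depth.shape
    r, c = offset
    aligned = np.zeros((H, W), dtype=np.float32)   # 0 marks missing depth
    aligned[r:r + h, c:c + w] = depth              # image alignment process
    # Image merging process, read here as channel concatenation to RGBD.
    return np.concatenate([rgb.astype(np.float32), aligned[..., None]], axis=-1)

rgb = np.ones((8, 8, 3))
depth = np.full((4, 4), 2.0)
rgbd = preprocess(rgb, depth, offset=(2, 2))
```

The merged 4-channel image is what would then be fed to the prediction algorithm.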
3. The method of claim 2, wherein the predictive algorithm is a neural network model; the inputting the preprocessed image into the prediction algorithm, predicting a part of the first depth image with missing depth information, and determining a second depth image corresponding to the RGB image includes:
inputting the preprocessed image into the neural network model for prediction, and determining a second depth image corresponding to the RGB image;
the neural network model is obtained by training a sample data set, the sample data set comprises a plurality of sample RGB images, a sample depth image corresponding to each sample RGB image and a standard depth image corresponding to each sample depth image, the acquisition range of each sample RGB image is larger than that of the corresponding sample depth image, and the acquisition range of each standard depth image is the same as that of the corresponding sample RGB image.
4. The method of claim 3, wherein the neural network model is trained by:
preprocessing each sample RGB image and the corresponding sample depth image to determine a preprocessed sample image; the preprocessing operation comprises at least one of an image alignment process and an image merging process;
inputting each preprocessed sample image into an initial neural network model for prediction, and determining a prediction depth image corresponding to each sample depth image;
training the initial neural network model according to each predicted depth image and each corresponding standard depth image to determine the neural network model.
5. The method of claim 4, wherein training the initial neural network model based on each of the predicted depth images and each of the corresponding standard depth images comprises:
calculating a first loss between each pixel point on each predicted depth image and the corresponding pixel point on the corresponding standard depth image;
training the initial neural network model according to the first loss.
6. The method of claim 5, wherein training the initial neural network model based on the first loss comprises:
determining a prediction normal map corresponding to each predicted depth image according to each predicted depth image;
determining a standard normal map corresponding to each standard depth image according to each standard depth image;
calculating a second loss between each of said predicted normal maps and the corresponding standard normal map;
training the initial neural network model according to the first loss and the second loss.
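The two losses of claims 5 and 6 can be illustrated concretely. The claims do not fix a formula, so this sketch makes two common assumptions: the first loss is a mean absolute per-pixel depth error, and normal maps are derived from depth by finite differences via n = normalize([-dz/dx, -dz/dy, 1]).

```python
import numpy as np

def normal_map(depth):
    """Surface normals from a depth map via finite differences
    (one common construction; the claims do not fix a formula)."""
    dzdy, dzdx = np.gradient(depth.astype(np.float64))
    n = np.stack([-dzdx, -dzdy, np.ones_like(dzdx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def first_loss(pred, gt):
    """Claim 5: per-pixel error between predicted and standard depth."""
    return np.abs(pred - gt).mean()

def second_loss(pred, gt):
    """Claim 6: error between predicted and standard normal maps."""
    return np.abs(normal_map(pred) - normal_map(gt)).mean()

pred = np.zeros((4, 4))
gt = np.ones((4, 4))
# Two flat planes at different heights: every normal is [0, 0, 1] in
# both, so the normal loss vanishes while the depth loss does not.
l1 = first_loss(pred, gt)
l2 = second_loss(pred, gt)
```

The example shows why the second loss adds information: it penalizes disagreement in local surface orientation, which a pure per-pixel depth loss can miss.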
7. The method of claim 1, further comprising:
denoising the second depth image.
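Claim 7 leaves the denoising method open. A median filter is a common choice for speckle noise in depth maps, so the following is one illustrative possibility rather than the claimed method:

```python
import numpy as np

def denoise_depth(depth, k=3):
    """Denoise a depth image with a k x k median filter
    (one possible realization of claim 7's denoising step)."""
    pad = k // 2
    padded = np.pad(depth, pad, mode='edge')
    out = np.empty_like(depth, dtype=np.float64)
    H, W = depth.shape
    for i in range(H):
        for j in range(W):
            # Median of the k x k neighborhood suppresses lone outliers.
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

noisy = np.ones((5, 5))
noisy[2, 2] = 100.0          # a single speckle outlier
clean = denoise_depth(noisy)
```

The lone outlier is removed because it is never the median of any 3 x 3 neighborhood of mostly valid values.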
8. An apparatus for predicting a depth image, the apparatus comprising:
the acquisition module is used for acquiring an RGB image and a first depth image of a target; the first depth image comprises depth information of the target, and the acquisition range of the RGB image is larger than that of the first depth image;
the prediction module is used for predicting a part with missing depth information in the first depth image according to the RGB image, the first depth image and a preset prediction algorithm, and determining a second depth image corresponding to the RGB image; the acquisition range of the second depth image is substantially the same as the acquisition range of the RGB image.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202210540033.XA 2022-05-18 2022-05-18 Depth image prediction method, device, equipment and storage medium Pending CN114862933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210540033.XA CN114862933A (en) 2022-05-18 2022-05-18 Depth image prediction method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114862933A 2022-08-05

Family

ID=82639678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210540033.XA Pending CN114862933A (en) 2022-05-18 2022-05-18 Depth image prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114862933A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination