CN115272709B - Training method, device, equipment and medium of depth completion model - Google Patents

Training method, device, equipment and medium of depth completion model

Info

Publication number
CN115272709B
Authority
CN
China
Prior art keywords
depth
image
module
training
downsampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210908195.4A
Other languages
Chinese (zh)
Other versions
CN115272709A (en)
Inventor
崔致豪
丁有爽
邵天兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mech Mind Robotics Technologies Co Ltd
Original Assignee
Mech Mind Robotics Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mech Mind Robotics Technologies Co Ltd filed Critical Mech Mind Robotics Technologies Co Ltd
Priority to CN202210908195.4A priority Critical patent/CN115272709B/en
Publication of CN115272709A publication Critical patent/CN115272709A/en
Application granted granted Critical
Publication of CN115272709B publication Critical patent/CN115272709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/54 Extraction of image or video features relating to texture
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a training method, device, equipment and medium for a depth completion model. The training method includes: acquiring a training image and a first depth image corresponding to the training image, the training image being a two-dimensional image; adding a depth defect to the first depth image to generate a second depth image corresponding to the training image; and performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth. By building the depth completion model and performing deep learning, the trained model is used to complete depth maps with missing depth, which provides a depth completion solution for depth maps suffering from depth deficiency.

Description

Training method, device, equipment and medium of depth completion model
Technical Field
The disclosure relates to the field of artificial intelligence, and in particular relates to a training method, device, equipment and medium of a depth completion model.
Background
With the development of artificial intelligence technology, its fields of application keep broadening; for example, with the popularization of deep-learning models, image completion is increasingly performed with deep-learning methods.
In practical applications, taking an intelligent sorting model as an example, the depth map of an object needs to be analyzed in order to grasp the object accurately. However, occlusion between objects and similar factors cause depth values to be missing from the depth map, which hinders the intelligent sorting model when grasping objects. Therefore, completing a depth map that has missing depth is a problem to be solved.
Disclosure of Invention
The disclosure provides a training method, device, equipment and medium for a depth completion model, which are used to complete a depth map with missing depth.
In one aspect, the present disclosure provides a training method of a depth completion model, including:
acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
adding a depth defect into the first depth image to generate a second depth image corresponding to the training image;
and performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth.
In one embodiment, the adding a depth defect to the first depth image, and generating a second depth image corresponding to the training image, includes:
randomly removing depth information of a local area of the first depth image to generate a second depth image corresponding to the training image; or
using an object depth image with a depth deficiency to occlude a local area of the first depth image, and generating a second depth image corresponding to the training image.
In one embodiment, the performing depth complement training on the depth complement model according to the training image, the first depth image, and the second depth image includes:
inputting the training image and the second depth image into a current depth complement model to obtain a third depth image corresponding to the training image output by the depth complement model;
determining a loss value of a current depth completion model based on a loss function according to the third depth image and the first depth image;
and carrying out parameter adjustment on the current depth completion model according to the loss value until the current depth completion model meets the preset convergence condition, and obtaining the trained depth completion model.
In one embodiment, the loss function is a mean square loss function.
In one embodiment, the depth completion model includes: the first branch, the second branch and the fusion layer;
the first branch is used for inputting the training image, and comprises a plurality of first sub-modules which are sequentially connected, wherein the first sub-modules are used for executing feature extraction on color information, texture information, edge information and space information;
the second branch is used for inputting the second depth image, the second branch comprises a downsampling module and an upsampling module, the downsampling module is used for executing downsampling, and extracting the characteristics of depth information and spatial information from the downsampling result; the up-sampling module is used for extracting the characteristics of the depth information and the space information and executing up-sampling on the result of the characteristic extraction; wherein the number of the downsampling modules is the same as the number of the upsampling modules;
and the fusion layer is used for carrying out feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
In one embodiment, the downsampling modules are in one-to-one correspondence with the upsampling modules, wherein the output size of a downsampling module is the same as the input size of a corresponding upsampling module, and at least one downsampling module provides cross-layer connection to a corresponding upsampling module;
A downsampling module provided with a cross-layer connection is used for transmitting the shallow features output by the downsampling module to the corresponding upsampling module through the cross-layer connection.
In one embodiment, an upsampling module provided with a cross-layer connection is specifically configured to perform feature extraction of depth information and spatial information on the superposition of the shallow features transmitted through the cross-layer connection and the deep features output by the preceding module of the upsampling module, and to perform upsampling on the result of the feature extraction.
In one embodiment, each of the upsampling modules and each of the downsampling modules includes a second sub-module for performing feature extraction of depth information and spatial information.
In one embodiment, the first sub-module and the second sub-module are both residual channel attention block models.
In another aspect, the present disclosure provides a depth complement image generating method, including:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing region;
inputting the depth image to be processed into a depth completion model to obtain a depth image subjected to depth completion; wherein the depth completion model is generated by training with the training method of the depth completion model described above.
In yet another aspect, the present disclosure provides a training device for a depth completion model, including:
the acquisition module is used for acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
the processing module is used for adding depth defects into the first depth image and generating a second depth image corresponding to the training image;
and the training module is used for performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth.
In yet another aspect, the present disclosure provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the training method of the depth-completion model as described in any one of the preceding claims or the depth-completion image generation method as described above.
In yet another aspect, the present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement a training method of a depth-complement model as set forth in any one of the preceding claims or a depth-complement image generation method as set forth in the preceding claims.
In a further aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the training method of the depth complement model as set forth in any one of the preceding claims or the depth complement image generation method as set forth in the preceding claims.
According to the training method, device, equipment and medium of the depth completion model provided by the disclosure, a training image and a first depth image corresponding to the training image are acquired, a depth defect is added to the first depth image to generate a second depth image, and depth completion training is performed on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth. By building the depth completion model and performing deep learning, the trained model is used to complete depth maps with missing depth, which provides a depth completion solution for such depth maps and allows a depth-complete depth map to be obtained.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic view of an application scenario of an example of the present disclosure;
FIG. 2 is a schematic flow chart of a training method of a depth completion model according to an embodiment of the disclosure;
FIG. 3 is a flow chart of another training method of a depth completion model according to an embodiment of the disclosure;
FIG. 4 is a flow chart of a training method of a depth completion model according to a first embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a depth-completion model according to an embodiment of the disclosure;
FIG. 6 is a schematic structural diagram of another depth-completion model according to an embodiment of the disclosure;
FIG. 7 is a schematic structural diagram of yet another depth-completion model according to a first embodiment of the disclosure;
FIG. 8 is a schematic structural diagram of yet another depth-completion model according to a first embodiment of the disclosure;
FIG. 9 is a schematic view of a depth-completion model according to the first embodiment of the present disclosure;
FIG. 10 is a training device for a depth completion model according to a third embodiment of the present disclosure;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Reference numerals illustrate:
51: a first sub-module;
52: a downsampling module;
53: an up-sampling module;
61: and a second sub-module.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
At present, artificial intelligence is usually realized through machine learning: a model is trained from learning samples, a refined intelligent model is built, and intelligent operation of a machine is achieved. Taking an intelligent sorting scene as an example, accurate grasping of objects requires analyzing the depth maps of the objects; however, depth values are missing from the depth maps because objects occlude one another, so an untrained machine cannot judge the accurate position and shape of objects from a depth map with missing depth, and obstacles to grasping arise.
In order to complete a depth map with missing depth, a depth completion model is first established for the sorting machine and trained on learning samples to obtain an intelligent depth completion model. Fig. 1 is a schematic view of an application scenario of an example of the present disclosure. As shown in the figure, the depth completion model can perform depth completion on a depth map with missing depth; specifically, the depth map with missing depth is input into the depth completion model to obtain a depth map after depth completion. Optionally, the completed depth map can be used by the intelligent sorting model to locate the position and shape of an object accurately and grasp the object precisely.
It should be noted that the brief description of the terms in the present disclosure is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present disclosure. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The technical scheme of the present disclosure and the technical scheme of the present disclosure are described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. In the description of the present disclosure, the terms are to be construed broadly in the art, unless explicitly stated or defined otherwise. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Example 1
Fig. 2 is a flow chart of a training method of a depth completion model according to an embodiment of the disclosure. As shown in fig. 2, the method includes:
step 101, acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
step 102, adding a depth defect into the first depth image, and generating a second depth image corresponding to the training image;
and 103, performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth.
The execution subject of this embodiment is a training device of a depth completion model, which may be implemented by a computer program, for example application software; alternatively, it may be implemented as a medium storing the related computer program, for example a USB disk or a cloud disk; or it may be implemented by a physical device, for example a chip, in which the related computer program is integrated or installed.
Training an intelligent model generally requires taking training samples as the input of the model and then determining the parameters of the model according to an optimization algorithm, so that the intelligent model achieves the desired effect. For example, the training samples of the depth completion model mainly include a training image, a first depth image and a second depth image. The training image is a two-dimensional image, and the manner in which it is acquired is not limited; for example, images can be captured of objects as they are actually placed, using an image acquisition device such as a camera or a video camera. The first depth image is a depth image corresponding to the training image and has complete depth information; it may be obtained, for example, with a depth camera. It should be noted that the first depth image corresponds to the training image, for example the first depth image depicts the same scene as the training image. The second depth image is obtained by adding a depth defect to the first depth image. In practical applications, there are various ways to add depth defects.
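As a concrete illustration of how such training samples can be organized, the following is a minimal PyTorch Dataset sketch, assuming each sample pairs a two-dimensional training image with its complete first depth image and a defect-injected second depth image; the class name, array layouts, and the helper add_depth_defect (sketched below) are illustrative assumptions rather than the reference implementation of this disclosure.

    import torch
    from torch.utils.data import Dataset

    class DepthCompletionDataset(Dataset):
        """Pairs a 2D training image with its complete and defect-injected depth maps."""

        def __init__(self, rgb_maps, depth_maps):
            # rgb_maps: list of (H, W, 3) uint8 arrays; depth_maps: list of (H, W) float arrays
            self.rgb_maps = rgb_maps
            self.depth_maps = depth_maps

        def __len__(self):
            return len(self.rgb_maps)

        def __getitem__(self, idx):
            rgb = torch.from_numpy(self.rgb_maps[idx]).permute(2, 0, 1).float() / 255.0
            full_depth = torch.from_numpy(self.depth_maps[idx]).unsqueeze(0).float()
            # second depth image: the first depth image with a synthetic depth defect added
            defective = torch.from_numpy(add_depth_defect(self.depth_maps[idx])).unsqueeze(0).float()
            return rgb, full_depth, defective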
Optionally, fig. 3 is a flow chart of another training method of a depth completion model according to the first embodiment of the present disclosure, as shown in fig. 3, step 102 includes:
step 201, randomly removing depth information of a local area of the first depth image, and generating a second depth image corresponding to the training image; or
step 202, using an object depth image with a depth deficiency to occlude a local area of the first depth image, and generating a second depth image corresponding to the training image.
Specifically, the ways of adding a depth defect may include, but are not limited to, removing depth information and occluding depth information, thereby achieving the effect of adding a depth defect. Taking removal of depth information as an example, the depth information of a certain local area in the first depth image can be removed at random, so that the removed part carries no depth information, yielding a second depth image with a depth deficiency added. Taking occlusion of depth information as an example, a certain local area in the first depth image can be occluded with an object depth image that carries little depth information, yielding a second depth image with a depth deficiency added. Optionally, the object image used to occlude the depth information may be the depth image of a transparent object.
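The two ways of adding a depth defect described above can be sketched as follows in NumPy; the function name, region sizes, and the convention that zero marks missing depth are illustrative assumptions (the occluder patch is assumed to be at least as large as the chosen region).

    import numpy as np

    def add_depth_defect(depth, mode="remove", occluder=None, rng=None):
        """Return a copy of `depth` with a synthetic depth defect in a random local area."""
        rng = rng or np.random.default_rng()
        out = depth.copy()
        H, W = depth.shape
        # choose a random local area
        h = int(rng.integers(H // 8, H // 3))
        w = int(rng.integers(W // 8, W // 3))
        y = int(rng.integers(0, H - h))
        x = int(rng.integers(0, W - w))
        if mode == "remove":
            out[y:y + h, x:x + w] = 0.0          # zero marks removed (missing) depth
        elif mode == "occlude":
            # occlude the area with a depth-deficient object patch, e.g. a transparent object
            out[y:y + h, x:x + w] = occluder[:h, :w]
        return out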
In practical applications, the intelligent model is required to achieve the desired effect, and model training is required to be performed based on training samples, so that a trained model is obtained. For example, after the second depth image is obtained, the training image, the first depth image, and the second depth image may be used as training samples of the depth completion model for training.
In an example, fig. 4 is a flow chart of a training method of a depth completion model according to the first embodiment of the present disclosure, as shown in fig. 4, in step 103, the performing depth completion training on the depth completion model according to the training image, the first depth image, and the second depth image includes:
step 301, inputting the training image and the second depth image into a current depth complement model to obtain a third depth image corresponding to the training image output by the depth complement model;
step 302, determining a loss value of a current depth completion model based on a loss function according to the third depth image and the first depth image;
and 303, carrying out parameter adjustment on the current depth completion model according to the loss value until the current depth completion model meets the preset convergence condition, and obtaining the trained depth completion model.
In combination with the scene example, before the depth completion model is trained, the parameters in the model take default values, and suitable parameter values need to be determined by training on the training samples. The training image and the defective second depth image are first used as input to obtain a third depth image output by the current depth completion model. Whether the model meets the accuracy requirement is judged from the difference between the first depth image and the third depth image; as an example, a smaller difference between the first depth image and the third depth image indicates better model accuracy. Then, based on the current difference between the first depth image and the third depth image, the parameter values in the depth completion model are adjusted using a loss function, and a training image and second depth image are selected again and input into the model after parameter adjustment, until the difference between the third depth image output by the current model and the first depth image is small enough, which indicates that the depth completion model at this point is accurate and can be used as the trained depth completion model. Optionally, the loss function is a mean square loss function. Any function may in principle serve as the loss function, but the mean square loss function computed from the mean square error is the most commonly used; its purpose is to make the difference between the first depth image and the third depth image as small as possible and to provide a reference index for the optimal parameter solution.
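A minimal PyTorch training-loop sketch of steps 301 to 303 follows; it assumes a model that takes the training image and the second depth image as input, a data loader yielding (training image, first depth image, second depth image) triples, and a fixed epoch count standing in for the convergence condition, all of which are illustrative assumptions.

    import torch
    import torch.nn as nn

    def train_depth_completion(model, loader, epochs=50, lr=1e-4, device="cuda"):
        model.to(device).train()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.MSELoss()                            # mean square loss function
        for _ in range(epochs):
            for rgb, first_depth, second_depth in loader:
                rgb = rgb.to(device)
                first_depth = first_depth.to(device)
                second_depth = second_depth.to(device)
                third_depth = model(rgb, second_depth)      # step 301: current model output
                loss = criterion(third_depth, first_depth)  # step 302: loss value
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                            # step 303: parameter adjustment
        return model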
In one example, fig. 5 is a schematic structural diagram of a depth complement model according to a first embodiment of the disclosure, where the depth complement model includes: the first branch, the second branch and the fusion layer;
the first branch is used for inputting the training image, the first branch comprises a plurality of first sub-modules 51 which are sequentially connected, and the first sub-modules 51 are used for executing feature extraction on color information, texture information, edge information and space information;
the second branch is used for inputting the second depth image, the second branch comprises a downsampling module 52 and an upsampling module 53, the downsampling module 52 is used for executing downsampling, and extracting the characteristics of depth information and spatial information from the downsampling result; the up-sampling module 53 is configured to perform feature extraction of depth information and spatial information, and perform up-sampling on a result of the feature extraction; wherein the number of downsampling modules 52 is the same as the number of upsampling modules 53;
and the fusion layer is used for carrying out feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
In combination with a scene example, the training image and the second depth image are input into the depth completion model, and a depth-completed depth map is obtained through the processing of the model. During this processing, features must be extracted from the input training image and second depth image, and the extracted image features must be fused.
Because feature information needs to be extracted separately from the training image and the second depth image, two branches, namely a first branch and a second branch, can be arranged in the depth completion model; feature extraction is performed on the training image through the first branch and on the second depth image through the second branch.
The first sub-module 51 is the feature extraction module of the first branch; the first sub-module 51 extracts features of the training image, and the feature information of the image mainly comprises color information, texture information, edge information and spatial information.
The second branch is respectively provided with a downsampling module 52 and an upsampling module 53, and the number of the downsampling modules 52 is the same as the number of the upsampling modules 53. Because the second depth image is a depth map, features of the second depth image include depth information and spatial information.
Optionally, fig. 6 is a schematic structural diagram of another depth-complement model according to the first embodiment of the disclosure, where each up-sampling module 53 and each down-sampling module 52 include a second sub-module 61, and the second sub-module 61 is configured to perform feature extraction of depth information and spatial information.
The downsampling module 52 is composed of two parts, namely a downsampling operation module and a second submodule 61, wherein the downsampling operation module is responsible for downsampling the second depth image, and the second submodule 61 is responsible for extracting features of the second depth image. Similarly, the upsampling module 53 is also formed by two parts, which are an upsampling operation module and a second sub-module 61, where the upsampling operation module is responsible for upsampling the second depth image, and the second sub-module 61 is responsible for extracting features of the second depth image.
Optionally, fig. 7 is a schematic structural diagram of yet another depth-completion model provided in the first embodiment of the disclosure. As shown in fig. 7, the first sub-module 51 and the second sub-module 61 are both residual channel attention blocks (residual channel attention block, abbreviated as RCAB). Because the first sub-module 51 and the second sub-module 61 both perform feature extraction on the image, both may be chosen as the residual channel attention block model. Each RCAB has a plurality of channels, each channel can extract different feature information of the image, and the features of the information channels can be adaptively recalibrated, which improves the representation capability of the network and allows the image features to be fully extracted through multi-stage feature fusion.
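The following is a minimal PyTorch sketch of a residual channel attention block of the kind referred to above; the channel count, reduction ratio, and layer choices are illustrative assumptions rather than the exact block used in this disclosure.

    import torch.nn as nn

    class RCAB(nn.Module):
        """Residual channel attention block: convolutions plus per-channel re-weighting."""

        def __init__(self, channels=64, reduction=16):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            # channel attention: squeeze to per-channel statistics, then re-weight channels
            self.attention = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            feat = self.body(x)
            feat = feat * self.attention(feat)   # adaptively recalibrate each channel
            return x + feat                      # residual connection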
In the process in which the second branch extracts the depth information and spatial information of the second depth image, the data passes through each downsampling module 52 in turn; within each downsampling module 52, downsampling is performed first by the downsampling operation module, and the depth information and spatial information are then extracted by the RCAB in the downsampling module 52. After the downsampling modules 52 have completed feature extraction of the second depth image in turn, the data passes through each upsampling module 53 in turn. Within each upsampling module 53, the depth information and spatial information of the second depth image are first extracted through the RCAB, and the upsampling operation module then upsamples the second depth image.
After the first branch and the second branch respectively extract the features of the training image and the second depth image, the extracted feature information is input into a fusion layer, the fusion layer fuses the feature information obtained by the two branches, and then the depth image subjected to depth complementation is output.
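One common way to realize the fusion layer described above is to concatenate the features of the two branches along the channel dimension and project them to a one-channel depth map; the sketch below follows that reading, and the channel widths are illustrative assumptions.

    import torch
    import torch.nn as nn

    class FusionLayer(nn.Module):
        def __init__(self, rgb_channels=64, depth_channels=64):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(rgb_channels + depth_channels, 64, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 1, 3, padding=1),   # single-channel completed depth map
            )

        def forward(self, rgb_features, depth_features):
            # feature fusion of the first-branch and second-branch outputs
            return self.fuse(torch.cat([rgb_features, depth_features], dim=1))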
In one example, fig. 8 is a schematic structural diagram of still another depth-complement model provided in the first embodiment of the disclosure, where downsampling modules 52 are in one-to-one correspondence with upsampling modules 53, and an output size of a downsampling module 52 is the same as an input size of a corresponding upsampling module 53, and at least one downsampling module 52 provides a cross-layer connection to the corresponding upsampling module 53;
A downsampling module 52 provided with a cross-layer connection is used for transmitting the shallow features output by the downsampling module 52 to the corresponding upsampling module via the cross-layer connection.
In combination with a scene example, the modules that extract features in the first branch and the second branch are RCABs; an RCAB contains a plurality of channels, and different features of the picture are extracted in different channels. The number of downsampling modules 52 is the same as the number of upsampling modules 53. Before performing feature extraction on the second depth image, the downsampling module 52 first reduces the image size of the second depth image with the downsampling operation module, so as to better adapt to the multi-channel feature extraction of the RCAB and increase the number of channels in which the RCAB can perform feature extraction. After the second depth image is reduced in size, the reduced image is duplicated several times and the copies are input into different channels of the RCAB for feature extraction.
As shown in fig. 8, two downsampling modules 52 are illustrated in the second branch, and as can be seen from the figure, after the feature extraction performed by the first downsampling module 52, a second depth image that is size-compressed is output and input to the second downsampling module 52. The downsampling process module in the second downsampling module 52 performs the downscaling operation on the downscaled second depth image again based on the output of the first downsampling module 52, and then performs feature extraction by the RCAB in the second downsampling module 52, so that the features pass through all the downsampling modules 52.
After the data has passed through all downsampling modules 52 in turn, it passes through each upsampling module 53 in turn. In passing through each upsampling module 53, features of the second depth image are first extracted through the RCAB in the upsampling module 53, and the data then passes through the upsampling processing module in the upsampling module 53, which stretches the size of the image. The degree to which the upsampling processing module stretches the image is the same as the degree to which the downsampling operation module compresses it; for example, the downsampling operation module may compress the image to half of its original size, and the upsampling processing module may stretch the image to twice its size. Since the upsampling modules 53 and the downsampling modules 52 in the second branch are the same in number, the size of the second depth image finally output by the second branch is the same as that of the original input.
Since the up-sampling modules 53 and the down-sampling modules 52 in the second branch are the same in number, and the up-sampling modules 53 and the down-sampling modules 52 are sequentially connected, the up-sampling modules 53 and the down-sampling modules 52 are in one-to-one correspondence with each other with the central axis of the second branch as a symmetry line. For example, taking fig. 8 as an example, the second downsampling module 52 corresponds to the first upsampling module 53 in the order from left to right, and the first downsampling module 52 corresponds to the second upsampling module 53, so that it can be seen that the image sizes of feature extraction performed by the mutually corresponding upsampling modules 53 and the RCABs in the downsampling modules 52 are the same.
A cross-layer connection is established between an upsampling module 53 and the downsampling module 52 that corresponds to it, so that the shallow image features extracted by the downsampling module 52 are transmitted to the corresponding upsampling module 53 and superposed with the deep image features extracted by the upsampling module 53, giving relatively comprehensive image feature information. Shallow features and deep features are defined by their relative order of extraction: features extracted earlier are shallow features, and features extracted later are deep features. For example, in the second branch, the downsampling module 52 extracts features of the second depth image first, so the features it extracts are called shallow features; similarly, the upsampling module 53 extracts features of the second depth image later, so the features it extracts are called deep features.
In one example, an upsampling module 53 provided with a cross-layer connection is specifically configured to perform feature extraction of depth information and spatial information on the superposition of the shallow features transmitted through the cross-layer connection and the deep features output by the preceding module of the upsampling module 53, and to perform upsampling on the result of the feature extraction.
As can be seen from fig. 8, the input of each up-sampling module 53 provided with a cross-layer connection includes two parts, namely, the shallow layer feature output by the down-sampling module 52 corresponding to the cross-layer connection and the deep layer feature output by the previous up-sampling module 53, so that each up-sampling module 53 needs to superimpose the received deep layer feature and the shallow layer feature, and according to the result of the superposition, perform feature extraction of depth information and spatial information on the second depth image again, and perform up-sampling processing on the extracted feature result, and transmit the extracted feature result to the next up-sampling module 53 until all up-sampling modules 53 in the second branch are completed. The present example extracts the features of the second depth image through the downsampling module 52 and the upsampling module 53, and fuses the extracted different image features, so that the extracted image features are richer and more comprehensive, and sufficient image features are provided for depth complement of the second depth image.
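A minimal sketch of a paired downsampling module and upsampling module with a cross-layer connection is given below, reusing the RCAB sketch above; modeling the superposition as element-wise addition (rather than, say, concatenation), the strided convolution for downsampling, and the bilinear upsampling are all illustrative assumptions.

    import torch.nn as nn

    class DownModule(nn.Module):
        """Downsampling module: halve the spatial size, then extract features with an RCAB."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.down = nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1)
            self.rcab = RCAB(out_channels)

        def forward(self, x):
            return self.rcab(self.down(x))

    class UpModule(nn.Module):
        """Upsampling module: fuse skip features, extract features with an RCAB, then upsample."""

        def __init__(self, channels):
            super().__init__()
            self.rcab = RCAB(channels)
            self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

        def forward(self, deep, shallow=None):
            if shallow is not None:
                deep = deep + shallow            # superpose cross-layer shallow features
            return self.up(self.rcab(deep))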
Optionally, fig. 9 is a schematic structural diagram of still another depth-completion model provided in the first embodiment of the present disclosure. As shown in fig. 9, one or more multi-layer perceptrons (Multilayer Perceptron, abbreviated as MLP) may additionally be disposed between the second downsampling module 52 and the first upsampling module 53, counting from left to right. The MLP, also referred to as a fully connected neural network or fully connected layer, is mainly used to adjust the dimension of the extracted image features so as to facilitate re-establishing the depth information of the image. When an MLP is present between the second downsampling module 52 and the first upsampling module 53, the image feature information transmitted directly from the second downsampling module 52 to the first upsampling module 53 is processed by the MLP, so it differs from the image feature information transmitted to the first upsampling module 53 through the cross-layer connection. The first upsampling module 53 thus receives two different pieces of image feature information and fuses them, so that the image feature information it obtains and can extract is more sufficient and rich.
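If the MLP between the second downsampling module and the first upsampling module is applied per pixel across the channel dimension, it can be written with 1x1 convolutions as below; treating the fully connected layers this way, and the channel and hidden widths, are illustrative assumptions rather than the disclosure's own configuration.

    import torch.nn as nn

    class BottleneckMLP(nn.Module):
        """Per-pixel MLP over channels, used to adjust the dimension of the extracted features."""

        def __init__(self, channels=128, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv2d(channels, hidden, 1),   # fully connected across channels at each pixel
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, 1),
            )

        def forward(self, x):
            return self.mlp(x)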
In this embodiment, a depth completion model is first established; a training image and a first depth image corresponding to the training image are then acquired, a depth defect is added to the first depth image to generate a second depth image, and depth completion training is performed on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth. By building the depth completion model and performing deep learning, the trained model is used to complete depth maps with missing depth, which provides a depth completion solution for such depth maps and allows a depth-complete depth map to be obtained.
Example two
The present disclosure also provides a depth complement image generating method, including:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing region;
inputting the depth image to be processed into a depth complement model to obtain a depth complement image subjected to depth complement; the depth completion model is generated by training by the training method of the depth completion model.
In practical applications, with the development of artificial intelligence, more and more scenarios satisfy people's needs by adopting artificial intelligence. Taking an intelligent sorting model as an example, the sorting model needs to analyze the depth map of an object in order to grasp it accurately; however, because objects occlude one another or are placed at random, depth values of the objects are missing when the intelligent sorting model observes them, and obstacles arise in the process of grasping the objects.
In combination with a scene example, the embodiment provides a method for completing the depth of an image by establishing and training a depth completion model, a depth map with depth defects is input into the depth completion model, and the depth map after the depth completion operation of the depth completion model can be obtained.
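A minimal inference sketch of this generation method follows, assuming a trained two-input model of the kind trained above (that is, assuming the two-dimensional image of the scene is also supplied to the first branch at inference time); the preprocessing, tensor shapes, and the convention that zeros mark missing depth are illustrative assumptions.

    import torch

    @torch.no_grad()
    def complete_depth(model, rgb, depth_with_missing, device="cuda"):
        """Run the trained depth completion model on a depth image containing a missing region."""
        model.eval().to(device)
        rgb = rgb.unsqueeze(0).to(device)                   # (1, 3, H, W) two-dimensional image
        depth = depth_with_missing.unsqueeze(0).to(device)  # (1, 1, H, W), zeros mark missing depth
        completed = model(rgb, depth)                       # depth image after depth completion
        return completed.squeeze(0).cpu()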
Specifically, in the process of training the depth completion model, a training sample first needs to be obtained; the training sample is taken as input, a depth-completed depth image is output, and the difference between the depth image output by the depth completion model and the standard depth image in the training sample is reduced by optimizing the parameters of the model. Two branches are arranged in the depth completion model, namely a first branch and a second branch. First, a training image is acquired; the training image is an image captured of a real object, and it is input into the first branch to extract features of the image, including color information, texture information, edge information and spatial information. A first depth image corresponding to the training image is also acquired; no depth is missing in the first depth image. A defective second depth image is obtained by randomly removing depth values in the first depth image or by occluding the first depth image, and the second depth image is input into the second branch to extract depth information and spatial information. Finally, the feature information extracted by the first branch and the second branch is fused to obtain a completed depth image, and the parameters of the depth completion model are determined from the completed depth image and the defect-free first depth image according to the mean square loss function, completing the training of the depth completion model.
In this embodiment, the depth image with missing depth is input into the depth completion model, and the depth completion model completes the defective depth image, so that a complete depth image can be obtained.
Example III
In one example, fig. 10 is a depth complement model training apparatus provided in a third embodiment of the disclosure, including:
an acquiring module 71, configured to acquire a training image and a first depth image corresponding to the training image, where the training image is a two-dimensional image;
a processing module 72, configured to add a depth defect to the first depth image, and generate a second depth image corresponding to the training image;
the training module 73 is configured to perform depth-complement training on the depth-complement model according to the training image, the first depth image, and the second depth image, so that the depth-complement model can output a depth-complement-processed depth image based on the input depth-missing image.
In connection with the scenario example, training of the smart model generally requires the acquisition module 71 to acquire training samples as inputs to the model, and then determine parameters of the model according to an optimization algorithm, thereby achieving the effect that the smart model is required to achieve. For example, the training sample of the depth completion model mainly includes a training image, a first depth image and a second depth image. The training image is a two-dimensional image, and the acquisition mode of the training image is not limited. For example, the image acquisition can be performed on the objects which are actually placed. Alternatively, the image capturing manner is not limited, and may be performed by an image capturing device such as a camera or a video camera. The first depth image is a depth image corresponding to the training image, and is a depth image with perfect depth information. Similarly, the first depth image may be obtained by a depth camera, for example, and it should be noted that the first depth image corresponds to the training image, for example, the first depth image is the same as a picture scene in the training image. Wherein the second depth image is obtained by adding a depth defect to the first depth image by the processing module 72. In practical applications, there are various ways to add depth defects.
Specifically, the method of adding the depth defect by the processing module 72 may include, but is not limited to, removing the depth information and blocking the depth information, thereby achieving the effect of adding the depth defect. Taking a mode of removing depth information as an example, the depth information corresponding to a certain local area in the first depth image can be randomly removed, so that the removed part does not have the depth information, and a second depth image added with depth deletion is obtained. Taking a manner of shielding depth information as an example, a certain local area in the first depth image can be shielded by using an object depth image with less depth information, so as to obtain a second depth image added with depth deletion. Alternatively, the object image used to block the depth information may be a depth image of a transparent object.
In practice, the smart model is intended to achieve the desired effect, and the training module 73 is required to perform model training based on the training samples, thereby obtaining a trained model. For example, after the second depth image is obtained, the training image, the first depth image, and the second depth image may be used as training samples of the depth completion model for training.
In combination with the scene example, before the depth completion model is trained, the parameters in the model take default values, and suitable parameter values need to be determined by training on the training samples. The training image and the defective second depth image are first used as input to obtain a third depth image output by the current depth completion model. Whether the model meets the accuracy requirement is judged from the difference between the first depth image and the third depth image; as an example, a smaller difference indicates better model accuracy. Then, based on the current difference between the first depth image and the third depth image, the parameter values in the depth completion model are adjusted, and a training image and second depth image are selected again and input into the model after parameter adjustment, until the difference between the third depth image output by the current model and the first depth image is small enough, which indicates that the depth completion model at this point is accurate and can be used as the trained depth completion model. The loss function is a mean square loss function; any function may in principle serve as the loss function, but the mean square loss function computed from the mean square error is the most commonly used, and its purpose is to make the difference between the first depth image and the third depth image as small as possible and to provide a reference index for the optimal parameter solution.
In combination with a scene example, the training image and the second depth image are input into the depth completion model, and a depth-completed depth map is obtained through the processing of the model. During this processing, features must be extracted from the input training image and second depth image, and the extracted image features must be fused.
Because feature information needs to be extracted separately from the training image and the second depth image, two branches, namely a first branch and a second branch, can be arranged in the depth completion model; feature extraction is performed on the training image through the first branch and on the second depth image through the second branch.
The first sub-module 51 is the feature extraction module of the first branch; the first sub-module 51 extracts features of the training image, and the feature information of the image mainly comprises color information, texture information, edge information and spatial information.
The second branch is respectively provided with a downsampling module 52 and an upsampling module 53, and the number of the downsampling modules 52 is the same as the number of the upsampling modules 53. Because the second depth image is a depth map, features of the second depth image include depth information and spatial information.
Optionally, the downsampling module 52 is formed by two parts, that is, a downsampling operation module and a second sub-module 61, where the downsampling operation module is responsible for downsampling the second depth image, and the second sub-module 61 is responsible for extracting features of the second depth image. Similarly, the upsampling module 53 is also formed by two parts, which are an upsampling operation module and a second sub-module 61, where the upsampling operation module is responsible for upsampling the second depth image, and the second sub-module 61 is responsible for extracting features of the second depth image.
Optionally, the first sub-module 51 and the second sub-module 61 are both residual channel attention blocks (residual channel attention block, abbreviated as RCAB). Because the first sub-module 51 and the second sub-module 61 both perform feature extraction on the image, both may be chosen as the residual channel attention block model. Each RCAB has a plurality of channels, each channel can extract different feature information of the image, and the features of the information channels can be adaptively recalibrated, which improves the representation capability of the network and allows the image features to be fully extracted through multi-stage feature fusion.
In the process in which the second branch extracts the depth information and spatial information of the second depth image, the data passes through each downsampling module 52 in turn; within each downsampling module 52, downsampling is performed first by the downsampling operation module, and the depth information and spatial information are then extracted by the RCAB in the downsampling module 52. After the downsampling modules 52 have completed feature extraction of the second depth image in turn, the data passes through each upsampling module 53 in turn. Within each upsampling module 53, the depth information and spatial information of the second depth image are first extracted through the RCAB, and the upsampling operation module then upsamples the second depth image.
After the first branch and the second branch respectively extract the features of the training image and the second depth image, the extracted feature information is input into a fusion layer, the fusion layer fuses the feature information obtained by the two branches, and then the depth image subjected to depth complementation is output.
Optionally, in combination with the scenario example, the modules that extract features in the first branch and the second branch are RCABs; an RCAB contains a plurality of channels, and different features of the picture are extracted in different channels. The number of downsampling modules 52 is the same as the number of upsampling modules 53. Before performing feature extraction on the second depth image, the downsampling module 52 first reduces the image size of the second depth image with the downsampling operation module, so as to better adapt to the multi-channel feature extraction of the RCAB and increase the number of channels in which the RCAB can perform feature extraction. After the second depth image is reduced in size, the reduced image is duplicated several times and the copies are input into different channels of the RCAB for feature extraction.
Fig. 8 illustrates two downsampling modules 52 in the second branch. As the figure shows, after the first downsampling module 52 performs feature extraction, it outputs a size-compressed second depth image, which is fed into the second downsampling module 52. Based on the output of the first downsampling module 52, the downsampling operation module in the second downsampling module 52 reduces the image size again, and the RCAB in the second downsampling module 52 then performs feature extraction, so that the features pass through all downsampling modules 52.
After passing through all downsampling modules 52 in turn, the features pass through each upsampling module 53 in turn. In each upsampling module 53, the features of the second depth image are first extracted by the RCAB in the module, and the upsampling operation module in the module then stretches the image size. The degree to which the upsampling operation module stretches the image matches the degree to which the downsampling operation module compresses it; for example, if compression halves the image size, stretching doubles it. Since the second branch contains the same number of upsampling modules 53 and downsampling modules 52, the second depth image finally output by the second branch has the same size as the original input.
Since the second branch contains the same number of upsampling modules 53 and downsampling modules 52, and the modules are connected in sequence, the upsampling modules 53 and the downsampling modules 52 correspond one to one, mirrored about the central axis of the second branch. Taking fig. 8 as an example, in left-to-right order the second downsampling module 52 corresponds to the first upsampling module 53, and the first downsampling module 52 corresponds to the second upsampling module 53; the RCABs in mutually corresponding upsampling and downsampling modules therefore perform feature extraction on images of the same size.
A cross-layer connection is established between each pair of mutually corresponding upsampling module 53 and downsampling module 52, so that the shallow image features extracted by the downsampling module 52 are transmitted to the corresponding upsampling module 53 and superimposed on the deep image features extracted there, yielding relatively comprehensive image feature information.
The input of each upsampling module 53 provided with a cross-layer connection therefore has two parts: the shallow features output by the downsampling module 52 at the other end of the cross-layer connection, and the deep features output by the preceding upsampling module 53. Each such upsampling module 53 superimposes the received deep and shallow features, extracts depth-information and spatial-information features from the second depth image again on the superimposed result, upsamples the extracted features, and passes them to the next upsampling module 53, until all upsampling modules 53 in the second branch have been traversed. In this example, the features of the second depth image are extracted by the downsampling modules 52 and upsampling modules 53 and the different extracted image features are fused, so that the extracted features are richer and more comprehensive, providing sufficient image features for depth completion of the second depth image.
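Reusing the DownBlock and UpBlock sketches above, the cross-layer superposition in a two-stage second branch (as in fig. 8) may be sketched as follows; element-wise addition is assumed for the superposition (concatenation would be another common choice), and the channel widths are illustrative.

```python
class SecondBranch(nn.Module):
    """Two downsampling and two upsampling modules, mirroring fig. 8. The
    upsampling module that mirrors the first downsampling module receives its
    shallow features over a cross-layer connection and superimposes them on
    the deep features arriving from the previous stage."""

    def __init__(self, ch=64):
        super().__init__()
        self.down1 = DownBlock(ch, ch)
        self.down2 = DownBlock(ch, 2 * ch)
        self.up1 = UpBlock(2 * ch, ch)   # mirrors down2
        self.up2 = UpBlock(ch, ch)       # mirrors down1

    def forward(self, x):
        s1 = self.down1(x)      # shallow features, 1/2 spatial size
        s2 = self.down2(s1)     # deepest encoder features, 1/4 spatial size
        d1 = self.up1(s2)       # deep features, back to 1/2 spatial size
        d2 = self.up2(d1 + s1)  # cross-layer connection: superimpose shallow s1 on deep d1
        return d2               # same spatial size as the input x
```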
Optionally, one or more MLPs may be disposed between the second downsampling module 52 and the first upsampling module 53 (counting from left to right). The MLPs, also referred to as fully connected neural networks or fully connected layers, are mainly used to adjust the dimensions of the extracted image features so as to re-establish the depth information of the image. When an MLP is present between the second downsampling module 52 and the first upsampling module 53, the image feature information passed directly from the second downsampling module 52 to the first upsampling module 53 is processed by the MLP, and therefore differs from the image feature information delivered to the first upsampling module 53 through the cross-layer connection. The first upsampling module 53 thus receives two different sets of image feature information and fuses them, making the image feature information available to it more sufficient and rich.
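The MLP is described only as a fully connected layer that adjusts the dimension of the extracted features; the per-pixel formulation below is one plausible reading and is offered purely as an assumption.

```python
import torch.nn as nn


class BottleneckMLP(nn.Module):
    """Fully connected layers placed between the last downsampling module and
    the first upsampling module; the feature (channel) dimension is re-mapped
    at every spatial location of the encoder output."""

    def __init__(self, in_ch, hidden_ch, out_ch):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_ch, hidden_ch),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_ch, out_ch),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        n, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (N, H*W, C): one feature vector per pixel
        tokens = self.mlp(tokens)                # adjust the feature dimension
        return tokens.transpose(1, 2).reshape(n, -1, h, w)
```

Under that reading, the first upsampling module 53 would receive both the MLP-processed features via the direct path and the unmodified features via the cross-layer connection, and fuse the two.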
In this embodiment, a depth completion model is first built; the acquisition module 71 then acquires a training image and a first depth image corresponding to the training image; the processing module 72 adds a depth defect to the first depth image to generate a second depth image; and the training module 73 performs depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on a depth image with missing depth. By building the depth completion model, performing deep learning, and using the trained depth completion model to complete a depth map with missing depth, the present disclosure provides a depth completion solution for depth maps with missing depth, so that such maps can be depth-completed into depth maps with complete depth.
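Putting the pieces together, one training iteration as described in this embodiment might look like the following sketch; the model stands for the assembled two-branch network, the optimizer is an arbitrary choice, and the mean square loss follows the loss function named in claim 4.

```python
import torch.nn.functional as F


def train_step(model, optimizer, training_image, first_depth, second_depth):
    """One training iteration: the RGB training image and the defect-injected
    second depth image are fed to the model, the complete first depth image is
    the supervision target, and the mean square loss drives the update."""
    model.train()
    optimizer.zero_grad()
    third_depth = model(training_image, second_depth)  # predicted, depth-completed image
    loss = F.mse_loss(third_depth, first_depth)        # loss between third and first depth images
    loss.backward()
    optimizer.step()
    return loss.item()
```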
Example IV
Fig. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the disclosure. As shown in fig. 11, the electronic device includes:
a processor 291 and a memory 292; a communication interface 293 and a bus 294 may also be included. The processor 291, the memory 292, and the communication interface 293 may communicate with one another via the bus 294. The communication interface 293 may be used for information transfer. The processor 291 may call logic instructions in the memory 292 to perform the methods of the above embodiments.
Further, the above logic instructions in the memory 292 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium.
The memory 292 is a computer-readable storage medium that may be used to store a software program, a computer-executable program, and program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 291 executes functional applications and data processing by running software programs, instructions and modules stored in the memory 292, i.e., implements the methods of the method embodiments described above.
The memory 292 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for the functions, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 292 may include high-speed random access memory and may also include non-volatile memory.
The disclosed embodiments provide a non-transitory computer-readable storage medium in which computer-executable instructions are stored; when executed by a processor, the instructions are used to implement the methods of the foregoing embodiments.
Example V
The embodiments of the present disclosure provide a computer program product, which includes a computer program; when the computer program is executed by a processor, it implements the training method of the depth completion model or the depth completion image generation method provided in any of the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A training method for a depth completion model, comprising:
acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
adding a depth defect into the first depth image to generate a second depth image corresponding to the training image;
performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth;
the depth completion model includes: the first branch, the second branch and the fusion layer;
the first branch is used for receiving the training image as input, and comprises a plurality of first sub-modules connected in sequence, wherein the first sub-modules are used for performing feature extraction on color information, texture information, edge information and spatial information;
The second branch is used for receiving the second depth image as input, and comprises a downsampling module and an upsampling module, wherein the downsampling module is used for performing downsampling and extracting features of depth information and spatial information from the downsampling result; the upsampling module is used for extracting features of depth information and spatial information and performing upsampling on the result of the feature extraction; the number of downsampling modules is the same as the number of upsampling modules, at least one fully connected neural network is arranged between the last downsampling module and the first upsampling module, and the fully connected neural network is used for adjusting the dimensions corresponding to the extracted image features;
and the fusion layer is used for carrying out feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
2. The method of claim 1, wherein adding a depth defect to the first depth image to generate a second depth image corresponding to the training image comprises:
randomly removing depth information of a local area of the first depth image to generate a second depth image corresponding to the training image; or
occluding a local area of the first depth image with an object depth image having missing depth, to generate a second depth image corresponding to the training image.
3. The method of claim 1, wherein performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image comprises:
inputting the training image and the second depth image into a current depth completion model to obtain a third depth image, corresponding to the training image, output by the depth completion model;
determining a loss value of a current depth completion model based on a loss function according to the third depth image and the first depth image;
and adjusting parameters of the current depth completion model according to the loss value until the current depth completion model meets a preset convergence condition, to obtain the trained depth completion model.
4. A method according to claim 3, wherein the loss function is a mean square loss function.
5. The method of claim 1, wherein the downsampling modules are in one-to-one correspondence with the upsampling modules, the output dimension of a downsampling module is the same as the input dimension of the corresponding upsampling module, and at least one of the downsampling modules is provided with a cross-layer connection to the corresponding upsampling module;
and the downsampling module provided with a cross-layer connection is used for transmitting the shallow features it outputs to the corresponding upsampling module through the cross-layer connection.
6. The method of claim 5, wherein
the upsampling module provided with a cross-layer connection is specifically used for performing feature extraction of depth information and spatial information on the result of superimposing the shallow features transmitted through the cross-layer connection and the deep features output by the preceding module, and for upsampling the result of the feature extraction.
7. The method of claim 1, wherein
each upsampling module and each downsampling module comprises a second sub-module, and the second sub-module is used for extracting features of depth information and spatial information.
8. The method of claim 7, wherein the first sub-module and the second sub-module are each a residual channel attention block model.
9. A depth completion image generation method, comprising:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing region;
inputting the depth image to be processed into a depth completion model to obtain a depth-completed image; wherein the depth completion model is trained according to the training method of any one of claims 1 to 8.
10. A training device for a depth completion model, comprising:
the acquisition module is used for acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
the processing module is used for adding depth defects into the first depth image and generating a second depth image corresponding to the training image;
the training module is used for performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on the input depth image with missing depth;
the depth completion model includes: the first branch, the second branch and the fusion layer;
the first branch is used for receiving the training image as input, and comprises a plurality of first sub-modules connected in sequence, wherein the first sub-modules are used for performing feature extraction on color information, texture information, edge information and spatial information;
the second branch is used for receiving the second depth image as input, and comprises a downsampling module and an upsampling module, wherein the downsampling module is used for performing downsampling and extracting features of depth information and spatial information from the downsampling result; the upsampling module is used for extracting features of depth information and spatial information and performing upsampling on the result of the feature extraction; the number of downsampling modules is the same as the number of upsampling modules, at least one fully connected neural network is arranged between the last downsampling module and the first upsampling module, and the fully connected neural network is used for adjusting the dimensions corresponding to the extracted image features;
and the fusion layer is used for carrying out feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the training method of the depth completion model of any of claims 1-8 or the depth completion image generation method of claim 9.
12. A computer readable storage medium, wherein computer-executable instructions are stored in the computer readable storage medium, which, when executed by a processor, are used to implement the training method of the depth completion model according to any one of claims 1-8 or the depth completion image generation method according to claim 9.
CN202210908195.4A 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model Active CN115272709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210908195.4A CN115272709B (en) 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model

Publications (2)

Publication Number Publication Date
CN115272709A CN115272709A (en) 2022-11-01
CN115272709B (en) 2023-08-15

Family

ID=83772496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210908195.4A Active CN115272709B (en) 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model

Country Status (1)

Country Link
CN (1) CN115272709B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541482A (en) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Deep information completion model training method, device, equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN113408662A (en) * 2021-07-19 2021-09-17 北京百度网讯科技有限公司 Image recognition method and device, and training method and device of image recognition model
CN113538249A (en) * 2021-09-03 2021-10-22 中国矿业大学 Image super-resolution reconstruction method and device for video monitoring high-definition presentation
CN113763447A (en) * 2021-08-24 2021-12-07 北京的卢深视科技有限公司 Method for completing depth map, electronic device and storage medium
CN114004754A (en) * 2021-09-13 2022-02-01 北京航空航天大学 Scene depth completion system and method based on deep learning
CN114445475A (en) * 2022-01-21 2022-05-06 中山大学·深圳 Depth completion method for sparse depth map, computer device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11657522B2 (en) * 2020-11-11 2023-05-23 Toyota Research Institute, Inc. Sparse auxiliary network for depth completion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Human posture recognition using an improved generative adversarial network; Wu Chunmei; Hu Junhao; Yin Jianghua; Computer Engineering and Applications (Issue 08); full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant