CN112991416A - Depth estimation method, model training method, device, equipment and storage medium

Depth estimation method, model training method, device, equipment and storage medium

Info

Publication number: CN112991416A
Authority: CN (China)
Prior art keywords: depth value, depth, gradient, value, pixel point
Legal status: Pending
Application number: CN202110396926.7A
Other languages: Chinese (zh)
Inventors: 董怀琴, 吴宇斌, 尹康, 王慧, 朱志鹏
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to: CN202110396926.7A
Publication of: CN112991416A

Classifications

    • G06T 7/50 Depth or shape recovery (G06T 7/00 Image analysis)
    • G06N 3/045 Combinations of networks (G06N 3/04 Architecture, e.g. interconnection topology; G06N 3/02 Neural networks)
    • G06N 3/08 Learning methods (G06N 3/02 Neural networks)
    • G06T 2207/10024 Color image (G06T 2207/10 Image acquisition modality)
    • G06T 2207/10028 Range image; Depth image; 3D point clouds (G06T 2207/10 Image acquisition modality)
    • G06T 2207/20081 Training; Learning (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084 Artificial neural networks [ANN] (G06T 2207/20 Special algorithmic details)

Abstract

Embodiments of the present application provide a depth estimation method, a model training method, a device, equipment and a storage medium, relating to the technical field of computer vision. The method comprises the following steps: acquiring an original image; calling a depth estimation model; and estimating the depth values of the original image through the depth estimation model to obtain an estimated depth value set of the original image. The loss function of the depth estimation model includes a depth loss function and/or a gradient loss function: the depth loss function characterizes the degree of difference between the estimated depth values output by the depth estimation model and the real depth values, and the gradient loss function characterizes the degree of difference between the gradient of the estimated depth values and the gradient of the real depth values, where the gradient of the estimated depth values is determined based on the estimated depth values and a step size, and the gradient of the real depth values is determined based on the real depth values and the step size. Because the depth estimation model of the embodiments adds a planar constraint, the predicted depth values of the image are more accurate.

Description

Depth estimation method, model training method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a depth estimation method, a model training method, a device, equipment and a storage medium.
Background
The gray value of the pixel point in the depth image can indicate the distance between the scene displayed by the pixel point and the shooting equipment. Thus, the depth image may be used to represent three-dimensional scene information.
In the related art, methods for image depth estimation mainly include monocular depth estimation and binocular depth estimation. Monocular depth estimation relies on a single camera and estimates the depth of an image from the image itself. Binocular depth estimation images the scene with two cameras; because there is a certain distance between the two cameras, the images of the same scene formed by the two lenses differ slightly, i.e., exhibit parallax, and the depth of the image is estimated from this parallax.
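For reference, binocular methods convert parallax to depth with the standard triangulation relation Z = f · B / d. A minimal sketch of this well-known relation (not something defined by this application; the function name is illustrative):

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Standard stereo triangulation: Z = f * B / d.

    focal_length: camera focal length in pixels; baseline: distance between
    the two cameras; disparity: pixel offset of the same scene point."""
    return focal_length * baseline / disparity
```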
Disclosure of Invention
The embodiment of the application provides a depth estimation method, a model training method, a device, equipment and a storage medium. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a depth value estimation method, where the method includes:
acquiring an original image;
calling a depth estimation model;
estimating depth values of the original image through the depth estimation model to obtain an estimated depth value set of the original image, wherein the estimated depth value set comprises at least one estimated depth value;
wherein the loss function of the depth estimation model comprises a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between an estimated depth value output based on the depth estimation model and a true depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the estimated depth value and the gradient of the true depth value, where the gradient of the estimated depth value is determined based on the estimated depth value and a step size, and the gradient of the true depth value is determined based on the true depth value and the step size.
In another aspect, an embodiment of the present application provides a model training method, where the method includes:
obtaining training data for a depth estimation model, the training data comprising at least one training sample, the training sample comprising a training image and a set of standard depth values for the training image, the set of standard depth values comprising at least one standard depth value;
estimating the depth value of the training image through the depth estimation model to obtain a predicted depth value set of the training image, wherein the predicted depth value set comprises at least one predicted depth value;
determining a value of a loss function based on the set of standard depth values for the training image and the set of predicted depth values for the training image;
training the depth estimation model based on the value of the loss function to obtain a trained depth estimation model;
wherein the loss function includes a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between a predicted depth value output based on the depth estimation model and a standard depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the predicted depth value and the gradient of the standard depth value, where the gradient of the predicted depth value is determined based on the predicted depth value and a step size, and the gradient of the standard depth value is determined based on the standard depth value and the step size.
In another aspect, an embodiment of the present application provides a depth value estimation apparatus, including:
the image acquisition module is used for acquiring an original image;
the model calling module is used for calling a depth estimation model;
an image estimation module, configured to perform depth value estimation on the original image through the depth estimation model to obtain an estimated depth value set of the original image, where the estimated depth value set includes at least one estimated depth value;
wherein the loss function of the depth estimation model comprises a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between an estimated depth value output based on the depth estimation model and a true depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the estimated depth value and the gradient of the true depth value, where the gradient of the estimated depth value is determined based on the estimated depth value and a step size, and the gradient of the true depth value is determined based on the true depth value and the step size.
In another aspect, an embodiment of the present application provides a model training apparatus, where the apparatus includes:
a data acquisition module, configured to acquire training data of a depth estimation model, where the training data includes at least one training sample, where the training sample includes a training image and a standard depth value set of the training image, and the standard depth value set includes at least one standard depth value;
the image estimation module is used for carrying out depth value estimation on the training image through the depth estimation model to obtain a predicted depth value set of the training image, wherein the predicted depth value set comprises at least one predicted depth value;
a loss determination module to determine a value of a loss function based on the set of standard depth values for the training image and the set of predicted depth values for the training image;
the model training module is used for training the depth estimation model based on the value of the loss function to obtain a trained depth estimation model;
wherein the loss function includes a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between a predicted depth value output based on the depth estimation model and a standard depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the predicted depth value and the gradient of the standard depth value, where the gradient of the predicted depth value is determined based on the predicted depth value and a step size, and the gradient of the standard depth value is determined based on the standard depth value and the step size.
In another aspect, embodiments of the present application provide a computer device, which includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the depth value estimation method according to the above aspect, or to implement the model training method according to the above aspect.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, the computer program being loaded and executed by a processor to implement the depth value estimation method according to the above aspect or to implement the model training method according to the above aspect.
In yet another aspect, embodiments of the present application provide a computer program product including computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the depth value estimation method of the above aspect or to implement the model training method of the above aspect.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
the depth values of an image are estimated by a depth estimation model to obtain the depth values of the image; the loss function of the depth estimation model includes a depth loss function and a gradient loss function, and the gradient information in the loss function constrains the depth estimation of planar regions, yielding smoother planar depth values. The depth estimation model trained with this loss function therefore adds a constraint on planes, so that the predicted depth values of the image are more accurate.
Drawings
FIG. 1 is a flow chart of a depth value estimation method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an original image provided by one embodiment of the present application;
FIG. 3 is a schematic diagram of a depth value visualization provided by the related art;
FIG. 4 is a schematic diagram of a depth value visualization provided by an embodiment of the present application;
FIG. 5 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 6 is a block diagram of a depth value estimation apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a model training apparatus provided in one embodiment of the present application;
fig. 8 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Please refer to fig. 1, which shows a flowchart of a depth value estimation method according to an embodiment of the present application. The method may be executed by a computer device, i.e., an electronic device with computing and processing capabilities. The computer device may include a terminal or a server: the terminal may be an electronic device such as a mobile phone, a tablet computer, a personal computer (PC) or an intelligent wearable device, and the server may be one server or a server cluster including multiple servers, which is not limited in the embodiments of the present application. The method may include the following steps.
Step 101, acquiring an original image.
The original image refers to an image for which depth values need to be determined. The depth value is used for indicating the distance between the scene displayed by the pixel points in the original image and the shooting device.
In a possible implementation, the original image may be captured by a monocular camera. The original image may be an RGB (Red Green Blue) image, that is, the original image may be a color image.
In a possible implementation, the original image may be obtained from a network.
When the execution subject of the depth value estimation method is a terminal, the terminal may obtain the original image by shooting with a monocular camera, or obtain the original image from the Internet.
When the execution subject of the depth value estimation method is a server, the original image may be sent to the server after being captured by a terminal, or the server obtains the original image locally.
Step 102, calling a depth estimation model.
The depth estimation model refers to a model for estimating a depth value of an image. Illustratively, the depth estimation model is a monocular depth estimation model.
In a possible implementation manner, the depth estimation model may include a convolutional neural network, and the convolutional neural network may include a ResNet50 network. Of course, in other possible implementations, the depth estimation model may also be a neural network of another form, which is not limited in the embodiments of the present application.
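As an illustration only, a minimal PyTorch sketch of such a model might pair a ResNet50 encoder with a small upsampling decoder. The torchvision dependency, the decoder layout, the channel widths and the DepthNet name are assumptions for illustration, not an architecture fixed by this application:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DepthNet(nn.Module):
    """Monocular depth estimator sketch: ResNet50 encoder + upsampling decoder."""

    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep all residual stages, drop the avgpool/fc classification head.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.decoder = nn.Sequential(
            nn.Conv2d(2048, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 1, kernel_size=3, padding=1),  # one depth value per pixel
        )

    def forward(self, x):
        # x: B x 3 x H x W RGB image (H, W divisible by 32 assumed)
        return self.decoder(self.encoder(x))  # B x 1 x H x W estimated depth values
```

Calling model = DepthNet() and model(image) would then yield the estimated depth value set described in step 103 below.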
When the execution subject of the depth value estimation method is a terminal, the depth estimation model can be stored locally in the terminal, and at the moment, the terminal can directly call the depth estimation model from the local; or the depth estimation model can be stored in the server, and at this time, the terminal sends a model acquisition request to the server; the server responds to the model acquisition request and sends the depth estimation model to the terminal.
When the execution subject of the depth value estimation method is a server, the depth estimation model may be stored locally at the server, and at this time, the server may directly call the depth estimation model from the local.
Step 103, estimating the depth value of the original image through the depth estimation model to obtain an estimated depth value set of the original image.
The input of the depth estimation model is an image, and the output of the depth estimation model is an estimated depth value set of the image. The estimated depth value set includes at least one estimated depth value; an estimated depth value indicates the distance, as predicted by the depth estimation model, between the scene displayed by a pixel point in the image and the shooting device.
Depth value estimation refers to an operation of estimating a depth value of an image.
In a possible implementation, the depth image may be derived based on an estimated set of depth values for the original image. The number of the estimated depth values included in the estimated depth value set of the original image is the same as the number of the pixel points in the original image, that is, each pixel point in the original image corresponds to one estimated depth value.
Wherein the loss function of the depth estimation model includes a depth loss function and/or a gradient loss function; the depth loss function is used for characterizing the degree of difference between the estimated depth value output based on the depth estimation model and the real depth value, and the gradient loss function is used for characterizing the degree of difference between the gradient of the estimated depth value and the gradient of the real depth value, where the gradient of the estimated depth value is determined based on the estimated depth value and a step size, and the gradient of the real depth value is determined based on the real depth value and the step size.
In a possible implementation manner, before the computer device performs depth value estimation on the training image through the depth estimation model to obtain the predicted depth values of the training image, the computer device may perform enhancement processing on the training image to obtain an enhanced training image, where the enhancement processing includes at least one of: random rotation, random left-right flipping, random cropping and gamma transformation; the enhanced training image is then used as input to the depth estimation model. Optionally, the random rotation includes horizontal flipping, vertical flipping, and combined horizontal and vertical flipping. Gamma transformation refers to correcting a washed-out (overexposed) image or an overly dark (underexposed) image during image processing.
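A hedged sketch of such an enhancement pipeline, applied jointly to the training image and its standard depth map (the torchvision dependency, the 0.5 probabilities and the gamma range are illustrative assumptions, not values fixed by this application):

```python
import random
import torchvision.transforms.functional as TF

def enhance(image, depth, gamma_range=(0.8, 1.2)):
    """Jointly augment a C x H x W training image and its depth map."""
    if random.random() < 0.5:  # random left-right (horizontal) flip
        image, depth = TF.hflip(image), TF.hflip(depth)
    if random.random() < 0.5:  # random vertical flip
        image, depth = TF.vflip(image), TF.vflip(depth)
    if random.random() < 0.5:  # random crop to half size
        h, w = image.shape[-2] // 2, image.shape[-1] // 2
        top = random.randint(0, image.shape[-2] - h)
        left = random.randint(0, image.shape[-1] - w)
        image = TF.crop(image, top, left, h, w)
        depth = TF.crop(depth, top, left, h, w)
    # Gamma transformation: corrects over-/under-exposure, applied to the RGB image only.
    image = TF.adjust_gamma(image, gamma=random.uniform(*gamma_range))
    return image, depth
```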
For the training procedure of the depth estimation model, reference may be made to the following embodiments; it is not detailed here.
Fig. 2 is a schematic diagram of an original image according to an embodiment of the present application. Fig. 3 is a schematic diagram of a depth value visualization provided in the related art, in which the depth values are obtained by estimating the depth values of the original image with a depth estimation model trained using only a depth loss function. Fig. 4 is a schematic diagram of a depth value visualization provided by an embodiment of the present application, in which the depth values are obtained by estimating the depth values of the original image with a depth estimation model trained using both a depth loss function and a gradient loss function. Comparing fig. 3 and fig. 4, the edges of the depth value visualization shown in fig. 4 are clearer, the image is sharper and of better quality, whereas the edges of the depth value visualization shown in fig. 3 are blurred and unsmooth.
To sum up, in the technical solution provided by this application, the depth values of an image are estimated by a depth estimation model. The loss function of the depth estimation model includes a depth loss function and a gradient loss function, and the gradient information in the loss function constrains the depth estimation of planar regions, yielding smoother planar depth values. The depth estimation model trained with this loss function therefore adds a constraint on planes, making the predicted depth values of the image more accurate.
Referring to fig. 5, a flowchart of a model training method according to an embodiment of the present application is shown. The method may be performed by a computer device, which may include a server. The method may include several steps as follows.
Step 501, obtaining training data of a depth estimation model, where the training data includes at least one training sample, and the training sample includes a training image and a standard depth value set of the training image.
The training images refer to images used in the model training process. The training image may be an RGB image having a size H x W x 3, wherein H, W, 3 are the height, width and number of channels of the training image, respectively. In particular, if the training image is a single channel, i.e., a grayscale image, the pixel values of the training image need to be repeated three times in the channel dimension to keep the format uniform.
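For the single-channel case, the repetition along the channel dimension is a one-liner; a sketch with an assumed H x W grayscale array:

```python
import numpy as np

gray = np.random.rand(480, 640).astype(np.float32)       # H x W grayscale image
rgb_like = np.repeat(gray[:, :, np.newaxis], 3, axis=2)  # H x W x 3: values repeated 3x
```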
In a possible implementation, the computer device obtains a plurality of training images and a standard depth value set of the training images through the depth camera; alternatively, the computer device acquires a plurality of training images through the monocular camera and acquires a set of standard depth values for the training images through the depth sensor.
The standard depth value set comprises at least one standard depth value, the number of the standard depth values is the same as the number of pixel points of the training image, namely, each pixel point of the training image corresponds to one standard depth value. The standard depth value is used for indicating the real distance between a scene displayed by a pixel point of the training image and the shooting device.
Step 502, estimating the depth value of the training image through the depth estimation model to obtain a predicted depth value set of the training image.
The predicted depth value set includes at least one predicted depth value; a predicted depth value indicates the distance, as predicted by the depth estimation model, between the scene displayed by a pixel point and the shooting device.
The number of predicted depth values is the same as the number of standard depth values, and both are equal to the number of pixel points in the training image.
Step 503, determining the value of the loss function based on the standard depth value set of the training image and the predicted depth value set of the training image.
The loss function comprises a depth loss function and/or a gradient loss function; the depth loss function is used for characterizing the degree of difference between the predicted depth value output based on the depth estimation model and the standard depth value, and the gradient loss function is used for characterizing the degree of difference between the gradient of the predicted depth value and the gradient of the standard depth value, where the gradient of the predicted depth value is determined based on the predicted depth value and a step size, and the gradient of the standard depth value is determined based on the standard depth value and the step size.
The step size is used for indicating the variation of the position of a pixel point and is a positive integer.
The smaller the depth loss function is, the closer the predicted depth value is to the standard depth value; the smaller the gradient loss function is, the closer the gradient of the predicted depth value is to the gradient of the standard depth value.
Step 504, training the depth estimation model based on the value of the loss function to obtain the trained depth estimation model.
Illustratively, the network parameters of the depth estimation model are updated through back propagation based on gradient descent. The depth estimation model is trained based on the value of the loss function until the network converges, yielding the trained depth estimation model.
In a possible implementation manner, when the value of the loss function is smaller than a threshold, network convergence is determined and the trained depth estimation model is obtained.
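A minimal training-loop sketch under these assumptions (back propagation via gradient descent, stopping once the loss falls below a threshold; the Adam optimizer, learning rate and threshold values are illustrative choices, not prescribed by this application):

```python
import torch

def train(model, loader, loss_fn, threshold=1e-3, lr=1e-4, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for image, standard_depth in loader:
            predicted_depth = model(image)                   # step 502
            loss = loss_fn(predicted_depth, standard_depth)  # step 503
            optimizer.zero_grad()
            loss.backward()                                  # back propagation
            optimizer.step()                                 # update network parameters
            if loss.item() < threshold:                      # convergence condition
                return model
    return model
```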
In a possible implementation manner, after the trained depth estimation model is obtained, the depth estimation model may further be tested with test samples until the test result reaches a preset result, yielding the final depth estimation model. A test sample is a sample used in the model test stage, and may include a test image and the standard depth values of the test image.
To sum up, in the technical solution provided by this application, the depth values of an image are estimated by a depth estimation model. The loss function of the depth estimation model includes a depth loss function and a gradient loss function, and the gradient information in the loss function constrains the depth estimation of planar regions, yielding smoother planar depth values. The depth estimation model trained with this loss function therefore adds a constraint on planes, making the predicted depth values of the image more accurate.
In an exemplary embodiment, the value of the loss function may be determined as follows.
First, the value of the depth loss function is determined based on the standard depth value set and the predicted depth value set.
In a possible implementation, the value of the depth loss function is determined as follows: the sum of the absolute values of the depth value differences corresponding to all pixel points in the training image is determined as the value of the depth loss function, where the depth value difference corresponding to a pixel point is the difference between the predicted depth value of the pixel point and the standard depth value of the pixel point.
Taking the sum of the absolute values of these differences may also be described as taking the L1 norm of the differences between the predicted depth values and the standard depth values over the pixel points of the training image.
Illustratively, the depth loss function L1 can be determined by:

L1 = Σ(i,j) | ξ(i, j) - ξ*(i, j) |

where ξ(i, j) represents the predicted depth value of the pixel point in the i-th row and j-th column, and ξ*(i, j) represents the standard depth value of the pixel point in the i-th row and j-th column.
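A sketch of this depth loss (an elementwise L1-style sum matching the formula above; tensor shapes are assumed to agree):

```python
import torch

def depth_loss(pred, std):
    """L1: sum over all pixel points of |predicted depth - standard depth|."""
    return torch.sum(torch.abs(pred - std))
```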
Second, based on the standard depth value and the step size, the gradient of the standard depth value is determined.
In a possible implementation, the gradient of the standard depth value is determined as follows:
1. For any pixel point at a first position in the training image, determine the initial gradient of the standard depth value of the pixel point at the first position based on the standard depth value of the pixel point at the first position, the standard depth value of the pixel point at the second position, and the standard depth value of the pixel point at the third position.
The pixel point at the first position may be a pixel point at any position in the training image. If the pixel point at the first position is in row i and column j and the step size is h, the pixel point at the second position is in row i + h and column j, and the pixel point at the third position is in row i and column j + h.
Illustratively, the initial gradient g_h[ξ*](i, j) of the standard depth value of the pixel point at the first position can be determined by the following formula:

g_h[ξ*](i, j) = ( ξ*(i + h, j) - ξ*(i, j), ξ*(i, j + h) - ξ*(i, j) )^T

where g_h[ξ*](i, j) represents the initial gradient of the standard depth value of the pixel point in the i-th row and j-th column (the pixel point at the first position), ξ*(i + h, j) represents the standard depth value of the pixel point in the (i + h)-th row and j-th column (the pixel point at the second position), ξ*(i, j) represents the standard depth value of the pixel point at the first position, ξ*(i, j + h) represents the standard depth value of the pixel point in the i-th row and (j + h)-th column (the pixel point at the third position), T denotes a transposition operation, h denotes the step size, and i and j are positive integers.
Note that the first component in the bracket is the gradient in the x direction, and the second component is the gradient in the y direction.
In an exemplary embodiment, the step size may be selected from a step size set containing at least one step size, for example {1, 2, 4, 8, 16}. Of course, in other possible implementations, the step size set may contain a different number of step sizes or step sizes with other values; the embodiments of the present application limit neither the number of step sizes in the set nor their values.
Illustratively, the initial gradient may also be referred to as a first order gradient.
2. Determine the initial gradient of the standard depth value of the pixel point at the second position based on the standard depth value of the pixel point at the second position, the standard depth value of the pixel point at the fourth position, and the standard depth value of the pixel point at the fifth position.
If the pixel point at the second position is in row i + h and column j, the pixel point at the fourth position is in row i + 2h and column j, and the pixel point at the fifth position is in row i + h and column j + h.
Illustratively, the initial gradient g_h[ξ*](i + h, j) of the standard depth value of the pixel point at the second position can be determined by the following formula:

g_h[ξ*](i + h, j) = ( ξ*(i + 2h, j) - ξ*(i + h, j), ξ*(i + h, j + h) - ξ*(i + h, j) )^T

where g_h[ξ*](i + h, j) represents the initial gradient of the standard depth value of the pixel point in the (i + h)-th row and j-th column (the pixel point at the second position), ξ*(i + 2h, j) represents the standard depth value of the pixel point in the (i + 2h)-th row and j-th column (the pixel point at the fourth position), ξ*(i + h, j) represents the standard depth value of the pixel point at the second position, and ξ*(i + h, j + h) represents the standard depth value of the pixel point in the (i + h)-th row and (j + h)-th column (the pixel point at the fifth position).
3. Determine the initial gradient of the standard depth value of the pixel point at the third position based on the standard depth value of the pixel point at the third position, the standard depth value of the pixel point at the sixth position, and the standard depth value of the pixel point at the seventh position.
If the pixel point at the third position is in row i and column j + h, the pixel point at the sixth position is in row i + h and column j + h, and the pixel point at the seventh position is in row i and column j + 2h.
Illustratively, the initial gradient g_h[ξ*](i, j + h) of the standard depth value of the pixel point at the third position can be determined by the following formula:

g_h[ξ*](i, j + h) = ( ξ*(i + h, j + h) - ξ*(i, j + h), ξ*(i, j + 2h) - ξ*(i, j + h) )^T

where g_h[ξ*](i, j + h) represents the initial gradient of the standard depth value of the pixel point in the i-th row and (j + h)-th column (the pixel point at the third position), ξ*(i + h, j + h) represents the standard depth value of the pixel point in the (i + h)-th row and (j + h)-th column (the pixel point at the sixth position), ξ*(i, j + h) represents the standard depth value of the pixel point at the third position, and ξ*(i, j + 2h) represents the standard depth value of the pixel point in the i-th row and (j + 2h)-th column (the pixel point at the seventh position).
4. Determine the gradient of the standard depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position, and the initial gradient of the pixel point at the third position.
Illustratively, the gradient g_h^2[ξ*](i, j) of the standard depth value can be determined by the following formula:

g_h^2[ξ*](i, j) = ( g_h[ξ*](i + h, j) - g_h[ξ*](i, j), g_h[ξ*](i, j + h) - g_h[ξ*](i, j) )^T
the gradient of the standard depth value may also be referred to as a second order gradient.
Third, the gradient of the predicted depth value is determined based on the predicted depth value and the step size.
In a possible implementation, the gradient of the predicted depth value is determined as follows:
1. Determine the initial gradient of the predicted depth value of the pixel point at the first position based on the predicted depth value of the pixel point at the first position, the predicted depth value of the pixel point at the second position, and the predicted depth value of the pixel point at the third position.
The pixel point at the first position may be a pixel point at any position in the training image. If the pixel point at the first position is in row i and column j and the step size is h, the pixel point at the second position is in row i + h and column j, and the pixel point at the third position is in row i and column j + h.
Illustratively, the initial gradient g_h[ξ](i, j) of the predicted depth value of the pixel point at the first position can be determined by the following formula:

g_h[ξ](i, j) = ( ξ(i + h, j) - ξ(i, j), ξ(i, j + h) - ξ(i, j) )^T

where g_h[ξ](i, j) represents the initial gradient of the predicted depth value of the pixel point in the i-th row and j-th column (the pixel point at the first position), ξ(i + h, j) represents the predicted depth value of the pixel point in the (i + h)-th row and j-th column (the pixel point at the second position), ξ(i, j) represents the predicted depth value of the pixel point at the first position, ξ(i, j + h) represents the predicted depth value of the pixel point in the i-th row and (j + h)-th column (the pixel point at the third position), h represents the step size, and i and j are positive integers.
2. Determine the initial gradient of the predicted depth value of the pixel point at the second position based on the predicted depth value of the pixel point at the second position, the predicted depth value of the pixel point at the fourth position, and the predicted depth value of the pixel point at the fifth position.
If the pixel point at the second position is in row i + h and column j, the pixel point at the fourth position is in row i + 2h and column j, and the pixel point at the fifth position is in row i + h and column j + h.
Illustratively, the initial gradient g_h[ξ](i + h, j) of the predicted depth value of the pixel point at the second position can be determined by the following formula:

g_h[ξ](i + h, j) = ( ξ(i + 2h, j) - ξ(i + h, j), ξ(i + h, j + h) - ξ(i + h, j) )^T

where g_h[ξ](i + h, j) represents the initial gradient of the predicted depth value of the pixel point in the (i + h)-th row and j-th column (the pixel point at the second position), ξ(i + 2h, j) represents the predicted depth value of the pixel point in the (i + 2h)-th row and j-th column (the pixel point at the fourth position), ξ(i + h, j) represents the predicted depth value of the pixel point at the second position, and ξ(i + h, j + h) represents the predicted depth value of the pixel point in the (i + h)-th row and (j + h)-th column (the pixel point at the fifth position).
3. Determine the initial gradient of the predicted depth value of the pixel point at the third position based on the predicted depth value of the pixel point at the third position, the predicted depth value of the pixel point at the sixth position, and the predicted depth value of the pixel point at the seventh position.
If the pixel point at the third position is in row i and column j + h, the pixel point at the sixth position is in row i + h and column j + h, and the pixel point at the seventh position is in row i and column j + 2h.
Illustratively, the initial gradient g_h[ξ](i, j + h) of the predicted depth value of the pixel point at the third position can be determined by the following formula:

g_h[ξ](i, j + h) = ( ξ(i + h, j + h) - ξ(i, j + h), ξ(i, j + 2h) - ξ(i, j + h) )^T

where g_h[ξ](i, j + h) represents the initial gradient of the predicted depth value of the pixel point in the i-th row and (j + h)-th column (the pixel point at the third position), ξ(i + h, j + h) represents the predicted depth value of the pixel point in the (i + h)-th row and (j + h)-th column (the pixel point at the sixth position), ξ(i, j + h) represents the predicted depth value of the pixel point at the third position, and ξ(i, j + 2h) represents the predicted depth value of the pixel point in the i-th row and (j + 2h)-th column (the pixel point at the seventh position).
4. Determine the gradient of the predicted depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position, and the initial gradient of the pixel point at the third position.
Illustratively, the gradient g_h^2[ξ](i, j) of the predicted depth value can be determined by the following formula:

g_h^2[ξ](i, j) = ( g_h[ξ](i + h, j) - g_h[ξ](i, j), g_h[ξ](i, j + h) - g_h[ξ](i, j) )^T
fourth, a value of a gradient loss function is determined based on the gradient of the standard depth value and the gradient of the predicted depth value.
In a possible implementation manner, the computer device determines the value of the gradient loss function based on the square root of the sum of squares of the gradient differences corresponding to the pixel points in the training image, where the gradient difference corresponding to a pixel point is the difference between the gradient of the standard depth value of the pixel point and the gradient of the predicted depth value of the pixel point.
Illustratively, the gradient loss function L2 can be determined by the following formula:

L2 = sqrt( Σ(i,j) || g_h^2[ξ*](i, j) - g_h^2[ξ](i, j) ||^2 )

Taking the square root of the sum of squares of the gradient differences corresponding to the pixel points of the training image may also be referred to as taking the L2 norm of the gradient differences.
Fifth, the value of the loss function is determined based on the value of the depth loss function weighted by a first multiple and the value of the gradient loss function weighted by a second multiple.
Illustratively, the loss function Loss may be determined by the following equation:

Loss = λ1 L1 + λ2 L2

where λ1 denotes the first multiple, λ2 denotes the second multiple, and λ1 and λ2 are non-negative numbers.
In a possible implementation, λ1 + λ2 = 1.
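Combining the two terms with the λ1 + λ2 = 1 convention (the 0.5/0.5 split is an assumed example; depth_loss and gradient_loss are the sketches above, and in practice the gradient term could also be accumulated over the step size set {1, 2, 4, 8, 16} mentioned earlier):

```python
def total_loss(pred, std, h=1, lam1=0.5, lam2=0.5):
    """Loss = lambda1 * L1 + lambda2 * L2, with lambda1 + lambda2 = 1."""
    return lam1 * depth_loss(pred, std) + lam2 * gradient_loss(pred, std, h)
```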
The embodiments of the present application provide a loss function based on a second-order gradient, which alleviates the problem of unsmooth planar depth estimation, so that the depth estimation model is applicable to more scenarios. In addition, the loss function provided by the embodiments of the present application can be applied to other depth estimation and reconstruction algorithms that need to improve planar depth estimation, and thus has great practical value.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 6, a block diagram of a depth value estimation apparatus provided in an embodiment of the present application is shown, where the apparatus has a function of implementing the above depth value estimation method, and the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus 600 may include:
an image acquisition module 610, configured to acquire an original image;
a model calling module 620, configured to call a depth estimation model;
an image estimation module 630, configured to perform depth value estimation on the original image through the depth estimation model, to obtain an estimated depth value set of the original image, where the estimated depth value set includes at least one estimated depth value;
wherein the loss function of the depth estimation model comprises a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between an estimated depth value output based on the depth estimation model and a true depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the estimated depth value and the gradient of the true depth value, where the gradient of the estimated depth value is determined based on the estimated depth value and a step size, and the gradient of the true depth value is determined based on the true depth value and the step size.
To sum up, in the technical solution provided by this application, the depth values of an image are estimated by a depth estimation model. The loss function of the depth estimation model includes a depth loss function and a gradient loss function, and the gradient information in the loss function constrains the depth estimation of planar regions, yielding smoother planar depth values. The depth estimation model trained with this loss function therefore adds a constraint on planes, making the predicted depth values of the image more accurate.
Referring to fig. 7, a block diagram of a model training apparatus provided in an embodiment of the present application is shown, where the apparatus has a function of implementing the above example of the model training method, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus 700 may include:
a data obtaining module 710, configured to obtain training data of a depth estimation model, where the training data includes at least one training sample, where the training sample includes a training image and a set of standard depth values of the training image, and the set of standard depth values includes at least one standard depth value;
an image estimation module 720, configured to perform depth value estimation on the training image through the depth estimation model to obtain a set of predicted depth values of the training image, where the set of predicted depth values includes at least one predicted depth value;
a loss determination module 730 for determining a value of a loss function based on the set of standard depth values of the training image and the set of predicted depth values of the training image;
the model training module 740 is configured to train the depth estimation model based on the value of the loss function to obtain a trained depth estimation model;
wherein the loss function includes a depth loss function and/or a gradient loss function, the depth loss function being used for characterizing the degree of difference between a predicted depth value output based on the depth estimation model and a standard depth value, and the gradient loss function being used for characterizing the degree of difference between the gradient of the predicted depth value and the gradient of the standard depth value, where the gradient of the predicted depth value is determined based on the predicted depth value and a step size, and the gradient of the standard depth value is determined based on the standard depth value and the step size.
To sum up, in the technical solution provided by this application, the depth values of an image are estimated by a depth estimation model. The loss function of the depth estimation model includes a depth loss function and a gradient loss function, and the gradient information in the loss function constrains the depth estimation of planar regions, yielding smoother planar depth values. The depth estimation model trained with this loss function therefore adds a constraint on planes, making the predicted depth values of the image more accurate.
In an exemplary embodiment, the loss determination module 730 includes:
a first determining unit (not shown in the figures) for determining a value of the depth loss function based on the set of standard depth values and the set of predicted depth values;
a second determination unit (not shown in the figure) for determining a gradient of the standard depth value based on the standard depth value and the step size;
a third determining unit (not shown in the drawings) for determining a gradient of the predicted depth value based on the predicted depth value and the step size;
a fourth determination unit (not shown in the drawings) for determining a value of the gradient loss function based on a gradient of the standard depth value and a gradient of the predicted depth value;
a fifth determining unit (not shown in the figures) for determining the value of the loss function based on the value of the depth loss function at the first multiple and the value of the gradient loss function at the second multiple.
In an exemplary embodiment, the first determining unit is configured to:
and determining the sum of the absolute values of the depth value difference values corresponding to all pixel points in the training image as the value of the depth loss function, wherein the depth value difference value corresponding to the pixel point is the difference value between the predicted depth value of the pixel point and the standard depth value of the pixel point.
In an exemplary embodiment, the second determining unit is configured to:
for any pixel point at a first position in the training image, determining an initial gradient of a standard depth value of the pixel point at the first position based on the standard depth value of the pixel point at the first position, the standard depth value of the pixel point at a second position and the standard depth value of the pixel point at a third position;
determining an initial gradient of the standard depth value of the pixel point at the second position based on the standard depth value of the pixel point at the second position, the standard depth value of the pixel point at the fourth position and the standard depth value of the pixel point at the fifth position;
determining an initial gradient of the standard depth value of the pixel point at the third position based on the standard depth value of the pixel point at the third position, the standard depth value of the pixel point at the sixth position and the standard depth value of the pixel point at the seventh position;
determining the gradient of the standard depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position and the initial gradient of the pixel point at the third position;
wherein, if the pixel point at the first position is in row i and column j and the step size is h, the pixel point at the second position is in row i + h and column j, the pixel point at the third position is in row i and column j + h, the pixel point at the fourth position is in row i + 2h and column j, the pixel point at the fifth position is in row i + h and column j + h, the pixel point at the sixth position is in row i + h and column j + h, and the pixel point at the seventh position is in row i and column j + 2h.
In an exemplary embodiment, the third determining unit is configured to:
determining an initial gradient of the predicted depth value of the pixel point at the first position based on the predicted depth value of the pixel point at the first position, the predicted depth value of the pixel point at the second position and the predicted depth value of the pixel point at the third position;
determining an initial gradient of the predicted depth value of the pixel point at the second position based on the predicted depth value of the pixel point at the second position, the predicted depth value of the pixel point at the fourth position and the predicted depth value of the pixel point at the fifth position;
determining an initial gradient of the predicted depth value of the pixel point at the third position based on the predicted depth value of the pixel point at the third position, the predicted depth value of the pixel point at the sixth position and the predicted depth value of the pixel point at the seventh position;
determining the gradient of the predicted depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position and the initial gradient of the pixel point at the third position;
wherein, if the pixel point at the first position is in row i and column j and the step size is h, the pixel point at the second position is in row i + h and column j, the pixel point at the third position is in row i and column j + h, the pixel point at the fourth position is in row i + 2h and column j, the pixel point at the fifth position is in row i + h and column j + h, the pixel point at the sixth position is in row i + h and column j + h, and the pixel point at the seventh position is in row i and column j + 2h.
In an exemplary embodiment, the fourth determining unit is configured to:
and determining the value of the gradient loss function based on the square root of the sum of squares of the gradient value difference values corresponding to each pixel point in the training image, wherein the gradient value difference value corresponding to the pixel point is the difference value between the gradient of the standard depth value of the pixel point and the gradient of the predicted depth value of the pixel point.
In an exemplary embodiment, the apparatus further comprises:
an image enhancement module (not shown in the figure), configured to perform enhancement processing on the training image to obtain an enhanced training image, where the enhancement processing includes at least one of: random rotation, random left-right flipping, random cropping and gamma transformation, and the enhanced training image is used as input to the depth estimation model.
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, the division into the functional modules described above is only an example; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
Referring to fig. 8, a block diagram of a computer device according to an embodiment of the present application is shown.
The computer device in the embodiment of the application can comprise one or more of the following components: a processor 810 and a memory 820.
Processor 810 may include one or more processing cores. The processor 810 connects the various parts of the computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 820 and by invoking the data stored in the memory 820. Optionally, the processor 810 may be implemented in hardware in the form of at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 810 may integrate one or a combination of a Central Processing Unit (CPU) and a modem, where the CPU mainly handles the operating system, application programs, and the like, and the modem handles wireless communication. It can be understood that the modem may alternatively not be integrated into the processor 810 but instead be implemented by a separate chip.
Optionally, the processor 810, when executing the program instructions in the memory 820, implements the methods provided by the various method embodiments described above.
The memory 820 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 820 includes a non-transitory computer-readable medium. The memory 820 may be used to store instructions, programs, code sets, or instruction sets. The memory 820 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and the like; the data storage area may store data created according to the use of the computer device, and the like.
The structure of the computer device described above is merely illustrative; in actual implementation, the computer device may include more or fewer components, such as a display screen, which is not limited in this embodiment.
Those skilled in the art will appreciate that the structure shown in FIG. 8 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored, which is loaded and executed by a processor of a computer device to implement the respective steps in the above-described depth value estimation method embodiments.
In an exemplary embodiment, a computer readable storage medium is also provided, in which a computer program is stored, which is loaded and executed by a processor of a computer device to implement the steps in the above-described model training method embodiments.
In an exemplary embodiment, a computer program product is also provided, the computer program product including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them to cause the computer device to perform the depth value estimation method.
In an exemplary embodiment, a computer program product is also provided, the computer program product including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them to cause the computer device to perform the model training method.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A depth value estimation method, characterized in that the method comprises:
acquiring an original image;
calling a depth estimation model;
estimating depth values of the original image through the depth estimation model to obtain an estimated depth value set of the original image, wherein the estimated depth value set comprises at least one estimated depth value;
wherein the loss function of the depth estimation model comprises a depth loss function for characterizing a degree of difference between an estimated depth value output by the depth estimation model and a true depth value, and/or a gradient loss function for characterizing a degree of difference between a gradient of the estimated depth value determined based on the estimated depth value and a step size and a gradient of the true depth value determined based on the true depth value and the step size.
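By way of example and not limitation, the three claimed steps map onto a few lines of PyTorch; the tiny convolutional network below is a hypothetical stand-in, since the claim does not fix a particular architecture:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the depth estimation model: any network that
# regresses one estimated depth value per pixel point fits the claim.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
model.eval()

image = torch.rand(1, 3, 224, 224)      # stands in for the acquired original image
with torch.no_grad():
    depth_set = model(image)            # estimated depth value set, shape (1, 1, 224, 224)
```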
2. A method of model training, the method comprising:
obtaining training data for a depth estimation model, the training data comprising at least one training sample, the training sample comprising a training image and a set of standard depth values for the training image, the set of standard depth values comprising at least one standard depth value;
estimating the depth value of the training image through the depth estimation model to obtain a predicted depth value set of the training image, wherein the predicted depth value set comprises at least one predicted depth value;
determining a value of a loss function based on the set of standard depth values for the training image and the set of predicted depth values for the training image;
training the depth estimation model based on the value of the loss function to obtain a trained depth estimation model;
wherein the loss function includes a depth loss function for characterizing a degree of difference between a predicted depth value output by the depth estimation model and a standard depth value, and/or a gradient loss function for characterizing a degree of difference between a gradient of the predicted depth value determined based on the predicted depth value and a step size and a gradient of the standard depth value determined based on the standard depth value and the step size.
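A minimal single-step sketch of the claimed training flow, again with a hypothetical stand-in network and random tensors in place of real training samples:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))    # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

train_image = torch.rand(1, 3, 224, 224)      # training image
standard_depth = torch.rand(1, 1, 224, 224)   # standard depth value set

predicted_depth = model(train_image)          # predicted depth value set
# Depth loss term only, for brevity; per the following claims the full
# loss may also add a weighted gradient loss term.
loss = (predicted_depth - standard_depth).abs().sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```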
3. The method of claim 2, wherein determining the value of the loss function based on the set of standard depth values for the training image and the set of predicted depth values for the training image comprises:
determining a value of the depth loss function based on the set of standard depth values and the set of predicted depth values;
determining a gradient of the standard depth value based on the standard depth value and the step size;
determining a gradient of the predicted depth value based on the predicted depth value and the step size;
determining a value of the gradient loss function based on the gradient of the standard depth value and the gradient of the predicted depth value;
determining a value of the loss function based on a first multiple of the value of the depth loss function and a second multiple of the value of the gradient loss function.
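By way of illustration, writing the first and second multiples as weights $\alpha$ and $\beta$ (symbols chosen here for exposition, not fixed by the claim), the combined loss is:

$$ L = \alpha \cdot L_{\text{depth}} + \beta \cdot L_{\text{grad}} $$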
4. The method of claim 3, wherein determining the value of the depth loss function based on the set of standard depth values and the set of predicted depth values comprises:
determining the sum of the absolute values of the depth value differences corresponding to each pixel point in the training image as the value of the depth loss function, wherein the depth value difference corresponding to a pixel point is the difference between the predicted depth value of the pixel point and the standard depth value of the pixel point.
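In formula form this is an L1 loss; with $\hat{d}_p$ and $d_p$ denoting the predicted and standard depth values of pixel point $p$ (symbols chosen for exposition):

$$ L_{\text{depth}} = \sum_{p} \left| \hat{d}_p - d_p \right| $$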
5. The method of claim 3, wherein determining the gradient of the standard depth value based on the standard depth value and the step size comprises:
for any pixel point at a first position in the training image, determining an initial gradient of a standard depth value of the pixel point at the first position based on the standard depth value of the pixel point at the first position, the standard depth value of the pixel point at a second position and the standard depth value of the pixel point at a third position;
determining an initial gradient of the standard depth value of the pixel point at the second position based on the standard depth value of the pixel point at the second position, the standard depth value of the pixel point at the fourth position and the standard depth value of the pixel point at the fifth position;
determining an initial gradient of the standard depth value of the pixel point at the third position based on the standard depth value of the pixel point at the third position, the standard depth value of the pixel point at the sixth position and the standard depth value of the pixel point at the seventh position;
determining the gradient of the standard depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position and the initial gradient of the pixel point at the third position;
wherein the number of rows of the pixel point at the second position is the number of rows of the pixel point at the first position plus the step size, and the number of columns of the pixel point at the second position is consistent with the number of columns of the pixel point at the first position; the number of rows of the pixel point at the third position is consistent with the number of rows of the pixel point at the first position, and the number of columns of the pixel point at the third position is the number of columns of the pixel point at the first position plus the step size; the number of rows of the pixel point at the fourth position is the number of rows of the pixel point at the second position plus the step size, and the number of columns of the pixel point at the fourth position is consistent with the number of columns of the pixel point at the second position; the number of rows of the pixel point at the fifth position is consistent with the number of rows of the pixel point at the second position, and the number of columns of the pixel point at the fifth position is the number of columns of the pixel point at the second position plus the step size; the number of rows of the pixel point at the sixth position is the number of rows of the pixel point at the third position plus the step size, and the number of columns of the pixel point at the sixth position is consistent with the number of columns of the pixel point at the third position; and the number of rows of the pixel point at the seventh position is consistent with the number of rows of the pixel point at the third position, and the number of columns of the pixel point at the seventh position is the number of columns of the pixel point at the third position plus the step size.
6. The method of claim 3, wherein determining the gradient of the predicted depth value based on the predicted depth value and the step size comprises:
for any pixel point at a first position in the training image, determining an initial gradient of the predicted depth value of the pixel point at the first position based on the predicted depth value of the pixel point at the first position, the predicted depth value of the pixel point at a second position and the predicted depth value of the pixel point at a third position;
determining an initial gradient of the predicted depth value of the pixel point at the second position based on the predicted depth value of the pixel point at the second position, the predicted depth value of the pixel point at the fourth position and the predicted depth value of the pixel point at the fifth position;
determining an initial gradient of the predicted depth value of the pixel point at the third position based on the predicted depth value of the pixel point at the third position, the predicted depth value of the pixel point at the sixth position and the predicted depth value of the pixel point at the seventh position;
determining the gradient of the predicted depth value based on the initial gradient of the pixel point at the first position, the initial gradient of the pixel point at the second position and the initial gradient of the pixel point at the third position;
wherein the number of rows of the pixel point at the second position is the number of rows of the pixel point at the first position plus the step size, and the number of columns of the pixel point at the second position is consistent with the number of columns of the pixel point at the first position; the number of rows of the pixel point at the third position is consistent with the number of rows of the pixel point at the first position, and the number of columns of the pixel point at the third position is the number of columns of the pixel point at the first position plus the step size; the number of rows of the pixel point at the fourth position is the number of rows of the pixel point at the second position plus the step size, and the number of columns of the pixel point at the fourth position is consistent with the number of columns of the pixel point at the second position; the number of rows of the pixel point at the fifth position is consistent with the number of rows of the pixel point at the second position, and the number of columns of the pixel point at the fifth position is the number of columns of the pixel point at the second position plus the step size; the number of rows of the pixel point at the sixth position is the number of rows of the pixel point at the third position plus the step size, and the number of columns of the pixel point at the sixth position is consistent with the number of columns of the pixel point at the third position; and the number of rows of the pixel point at the seventh position is consistent with the number of rows of the pixel point at the third position, and the number of columns of the pixel point at the seventh position is the number of columns of the pixel point at the third position plus the step size.
7. The method of claim 3, wherein determining the value of the gradient loss function based on the gradient of the standard depth value and the gradient of the predicted depth value comprises:
determining the value of the gradient loss function based on the square root of the sum of the squares of the gradient difference values corresponding to each pixel point in the training image, wherein the gradient difference value corresponding to a pixel point is the difference between the gradient of the standard depth value of the pixel point and the gradient of the predicted depth value of the pixel point.
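Equivalently, with $g_p$ and $\hat{g}_p$ denoting the gradients of the standard and predicted depth values at pixel point $p$ (symbols chosen for exposition), this is an L2 norm of the per-pixel gradient residual:

$$ L_{\text{grad}} = \sqrt{ \sum_{p} \left( g_p - \hat{g}_p \right)^2 } $$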
8. The method according to any one of claims 2 to 7, wherein before estimating the depth value of the training image through the depth estimation model to obtain the predicted depth value set of the training image, the method further comprises:
performing enhancement processing on the training image to obtain an enhanced training image, wherein the enhancement processing comprises at least one of: random rotation, random left-right flipping, random cropping, and gamma transformation, and the enhanced training image is used as an input to the depth estimation model.
9. A depth value estimation apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an original image;
the model calling module is used for calling a depth estimation model;
an image estimation module, configured to perform depth value estimation on the original image through the depth estimation model to obtain an estimated depth value set of the original image, where the estimated depth value set includes at least one estimated depth value;
wherein the loss function of the depth estimation model comprises a depth loss function for characterizing a degree of difference between an estimated depth value output by the depth estimation model and a true depth value, and/or a gradient loss function for characterizing a degree of difference between a gradient of the estimated depth value determined based on the estimated depth value and a step size and a gradient of the true depth value determined based on the true depth value and the step size.
10. A model training apparatus, the apparatus comprising:
a data acquisition module, configured to acquire training data of a depth estimation model, where the training data includes at least one training sample, where the training sample includes a training image and a standard depth value set of the training image, and the standard depth value set includes at least one standard depth value;
the image estimation module is used for carrying out depth value estimation on the training image through the depth estimation model to obtain a predicted depth value set of the training image, wherein the predicted depth value set comprises at least one predicted depth value;
a loss determination module to determine a value of a loss function based on the set of standard depth values for the training image and the set of predicted depth values for the training image;
the model training module is used for training the depth estimation model based on the value of the loss function to obtain a trained depth estimation model;
wherein the loss function includes a depth loss function for characterizing a degree of difference between a predicted depth value output by the depth estimation model and a standard depth value, and/or a gradient loss function for characterizing a degree of difference between a gradient of the predicted depth value determined based on the predicted depth value and a step size and a gradient of the standard depth value determined based on the standard depth value and the step size.
11. A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing a computer program which is loaded and executed by the processor to implement the depth value estimation method of claim 1 or to implement the model training method of any one of claims 2 to 8.
12. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the depth value estimation method of claim 1 or to implement the model training method of any one of claims 2 to 8.
CN202110396926.7A 2021-04-13 2021-04-13 Depth estimation method, model training method, device, equipment and storage medium Pending CN112991416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110396926.7A CN112991416A (en) 2021-04-13 2021-04-13 Depth estimation method, model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112991416A 2021-06-18

Family

ID=76338298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110396926.7A Pending CN112991416A (en) 2021-04-13 2021-04-13 Depth estimation method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112991416A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019223382A1 (en) * 2018-05-22 2019-11-28 深圳市商汤科技有限公司 Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN109191514A (en) * 2018-10-23 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating depth detection model
CN110189372A (en) * 2019-05-30 2019-08-30 北京百度网讯科技有限公司 Depth map model training method and device
CN112241976A (en) * 2019-07-19 2021-01-19 杭州海康威视数字技术股份有限公司 Method and device for training model
CN110599532A (en) * 2019-09-18 2019-12-20 厦门美图之家科技有限公司 Depth estimation model optimization and depth estimation processing method and device for image
CN110738697A (en) * 2019-10-10 2020-01-31 福州大学 Monocular depth estimation method based on deep learning
CN112488104A (en) * 2020-11-30 2021-03-12 华为技术有限公司 Depth and confidence estimation system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ABDERRAHIM HALIMI et al., "Restoration of Depth and Intensity Images Using a Graph Laplacian Regularization", 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 1-5 *
独识, "The Formula for Step Size in Gradient Algorithms, Machine Learning Series (4): Gradient Descent", https://blog.csdn.net/weixin_34433661/article/details/113709569, pages 1-9 *

Similar Documents

Publication Publication Date Title
KR102295403B1 (en) Depth estimation method and apparatus, electronic device, program and medium
US11416781B2 (en) Image processing method and apparatus, and computer-readable medium, and electronic device
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
CN108898567B (en) Image noise reduction method, device and system
CN108694705B (en) Multi-frame image registration and fusion denoising method
JP6902122B2 (en) Double viewing angle Image calibration and image processing methods, equipment, storage media and electronics
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN111835983B (en) Multi-exposure-image high-dynamic-range imaging method and system based on generation countermeasure network
CN113688907B (en) A model training and video processing method, which comprises the following steps, apparatus, device, and storage medium
JP5911292B2 (en) Image processing apparatus, imaging apparatus, image processing method, and image processing program
CN113723317B (en) Reconstruction method and device of 3D face, electronic equipment and storage medium
CN111445487A (en) Image segmentation method and device, computer equipment and storage medium
CN114445302A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113658196A (en) Method and device for detecting ship in infrared image, electronic equipment and medium
CN108734712B (en) Background segmentation method and device and computer storage medium
CN113159229A (en) Image fusion method, electronic equipment and related product
CN110136085B (en) Image noise reduction method and device
CN113658091A (en) Image evaluation method, storage medium and terminal equipment
CN111369435A (en) Color image depth up-sampling method and system based on self-adaptive stable model
CN112991416A (en) Depth estimation method, model training method, device, equipment and storage medium
CN111292234A (en) Panoramic image generation method and device
WO2019090580A1 (en) System and method for image dynamic range adjusting
CN113344832A (en) Image processing method and device, electronic equipment and storage medium
CN113706400A (en) Image correction method, image correction device, microscope image correction method, and electronic apparatus

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination