CN114937072A - Image processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN114937072A
Authority
CN
China
Prior art keywords
pixel point
offset
estimated
depth
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210592041.9A
Other languages
Chinese (zh)
Inventor
刘继文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210592041.9A
Publication of CN114937072A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/77 Determining position or orientation of objects or cameras using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Abstract

The present disclosure relates to an image processing method and apparatus, an electronic device, and a computer-readable storage medium, the image processing method including: carrying out depth estimation processing on the target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image; determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image; taking the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as the corrected depth value of the corresponding pixel point; and obtaining a depth map of the target image based on the corrected depth value of each pixel point of the target image. The method and the apparatus can predict the depth map and the offsets of the depth map at the same time, and correct the distribution of the depth values on the predicted depth map through the predicted offsets, so that the transition zone is eliminated at the cost of only a small amount of calculation and the definition of the depth map is improved.

Description

Image processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Monocular depth estimation refers to predicting, based on deep learning, the distance from each point on an RGB (Red-Green-Blue color system, in which various colors are obtained by varying and superposing the red, green, and blue color channels) image to the camera plane, that is, the depth of each point, so as to obtain a depth map of the RGB image. However, the depth map obtained by a depth estimation method based on deep learning often suffers from transition zones at boundaries and insufficient definition.
In the related art, there are two types of solutions to this problem. One is to adopt two-dimensional guided filtering to improve depth details; however, a depth map carries three-dimensional information, and this scheme may destroy the three-dimensional geometry. For example, for an inclined wall whose depths are distributed from near to far, this scheme may change the inclination angle of the wall; it may also turn a two-dimensional image boundary into a boundary on the depth map, causing a false-edge problem. The other is to increase the input size of the depth estimation model to improve depth details, but this greatly increases the amount of calculation, which is inconvenient for a mobile terminal.
Disclosure of Invention
The present disclosure provides an image processing method and apparatus, an electronic device, and a computer-readable storage medium, so as to at least solve the problem in the related art of how to improve the definition of a depth map without greatly increasing the amount of computation; the present disclosure is not required to solve any of the problems described above.
According to a first aspect of the present disclosure, there is provided an image processing method including: carrying out depth estimation processing on a target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image; determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image; taking the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as a corrected depth value of the corresponding pixel point; and obtaining a depth map of the target image based on the corrected depth value of each pixel point of the target image.
Optionally, the determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image includes: and aiming at each pixel point of the target image, adding the coordinate value of the corresponding pixel point and the estimated offset of the corresponding pixel point to obtain the coordinate value of the offset pixel point of the corresponding pixel point.
Optionally, each pixel point has a first direction coordinate value and a second direction coordinate value, and the pre-estimated offset of each pixel point includes a first direction pre-estimated offset and a second direction pre-estimated offset, wherein, for each pixel point of the target image, the coordinate value of the offset pixel point of the corresponding pixel point is obtained by adding the coordinate value of the corresponding pixel point and the pre-estimated offset of the corresponding pixel point, and the method includes: and for each pixel point of the target image, adding the first direction coordinate value of the corresponding pixel point and the first direction pre-estimated offset of the corresponding pixel point to obtain a first direction coordinate value of the offset pixel point of the corresponding pixel point, and adding the second direction coordinate value of the corresponding pixel point and the second direction pre-estimated offset of the corresponding pixel point to obtain a second direction coordinate value of the offset pixel point of the corresponding pixel point.
Optionally, in a case that the target image is in a rectangular coordinate system, the first direction is an x direction of the rectangular coordinate system, and the second direction is a y direction of the rectangular coordinate system; or in the case that the target image is in a polar coordinate system, the first direction is a radial direction of the polar coordinate system, and the second direction is a polar angle direction of the polar coordinate system.
Optionally, the performing depth estimation processing on the target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image includes: inputting the target image into a depth estimation model to obtain an output result of the depth estimation model; the output result comprises the estimated depth value and the estimated offset of each pixel point of the target image.
Optionally, the depth estimation model is obtained by training through the following steps: acquiring a sample image and a reference depth value of each pixel point of the sample image; inputting the sample image into the depth estimation model to obtain a sample estimated depth value and a sample estimated offset of each pixel point of the sample image; determining the reference offset of each pixel point of the sample image according to the reference depth value and the sample pre-estimated depth value of each pixel point of the sample image; determining a loss value based on the reference depth value, the sample pre-estimated depth value, the reference offset and the sample pre-estimated offset of each pixel point of the sample image; adjusting parameters of the depth estimation model based on the loss values to train the depth estimation model.
Optionally, the determining a reference offset of each pixel point of the sample image according to the reference depth value and the sample pre-estimated depth value of each pixel point of the sample image includes: traversing all pixel points of the sample image, and determining a plurality of reference points around the current pixel point; determining a weight of a corresponding reference point of the current pixel point according to the reference depth value of the current pixel point and the sample estimated depth value of each reference point of the current pixel point; determining the offset of each reference point of the current pixel point relative to the current pixel point; and calculating a weighted average of the offsets of all the reference points of the current pixel point according to the weight of each reference point of the current pixel point, and taking the weighted average as the reference offset of the current pixel point.
Optionally, the determining, according to the reference depth value of the current pixel point and the sample estimated depth value of each reference point of the current pixel point, a weight of the corresponding reference point of the current pixel point includes: determining an absolute value of an error between the sample estimated depth value of each reference point of the current pixel point and the reference depth value of the current pixel point as an error of the corresponding reference point; and determining the weight of each reference point of the current pixel point according to the error of each reference point of the current pixel point, wherein the weight of each reference point is negatively correlated with the error of the corresponding reference point.
Optionally, the determining the weight of each reference point of the current pixel point according to the error of each reference point of the current pixel point includes: determining the sum of the error of each reference point of the current pixel point and a set value, and determining the ratio of the set value to the sum as the weight of the corresponding reference point; wherein the set value is a positive number.
Optionally, the determining a plurality of reference points around the current pixel point includes: determining pixel points, of all pixel points of the sample image, of which the distances between the pixel points and the current pixel point in the first direction and the second direction are smaller than corresponding reference values, and using the pixel points as reference points of the current pixel points; or determining the pixel points with the linear distance from the current pixel point to be less than the reference value in all the pixel points of the sample image.
Optionally, the determining a loss value based on the reference depth value, the sample predicted depth value, the reference offset, and the sample predicted offset of each pixel point of the sample image includes: calculating a first loss value based on the reference depth value and the sample estimated depth value of each pixel point of the sample image; calculating a second loss value based on the reference offset and the sample prediction offset of each pixel point of the sample image; determining the loss value based on the first loss value and the second loss value.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including: a prediction unit configured to: carrying out depth estimation processing on a target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image; an offset unit configured to: determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image; a correction unit configured to: taking the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as a corrected depth value of the corresponding pixel point; a summarization unit configured to: and obtaining a depth map of the target image based on the corrected depth value of each pixel point of the target image.
Optionally, the offset unit is further configured to: and aiming at each pixel point of the target image, adding the coordinate value of the corresponding pixel point and the estimated offset of the corresponding pixel point to obtain the coordinate value of the offset pixel point of the corresponding pixel point.
Optionally, each pixel point has a first direction coordinate value and a second direction coordinate value, the predicted offset of each pixel point includes a first direction predicted offset and a second direction predicted offset, and the offset unit is further configured to: and aiming at each pixel point of the target image, adding the first direction coordinate value of the corresponding pixel point and the first direction pre-estimated offset of the corresponding pixel point to obtain a first direction coordinate value of the offset pixel point of the corresponding pixel point, and adding the second direction coordinate value of the corresponding pixel point and the second direction pre-estimated offset of the corresponding pixel point to obtain a second direction coordinate value of the offset pixel point of the corresponding pixel point.
Optionally, in a case that the target image is in a rectangular coordinate system, the first direction is an x direction of the rectangular coordinate system, and the second direction is a y direction of the rectangular coordinate system; or in the case that the target image is in a polar coordinate system, the first direction is a radial direction of the polar coordinate system, and the second direction is a polar angle direction of the polar coordinate system.
Optionally, the pre-estimating unit is further configured to: inputting the target image into a depth estimation model to obtain an output result of the depth estimation model; the output result comprises the estimated depth value and the estimated offset of each pixel point of the target image.
Optionally, the depth estimation model is obtained by training through the following steps: acquiring a sample image and a reference depth value of each pixel point of the sample image; inputting the sample image into the depth estimation model to obtain a sample estimated depth value and a sample estimated offset of each pixel point of the sample image; determining the reference offset of each pixel point of the sample image according to the reference depth value and the sample pre-estimated depth value of each pixel point of the sample image; determining a loss value based on the reference depth value, the sample pre-estimated depth value, the reference offset and the sample pre-estimated offset of each pixel point of the sample image; adjusting parameters of the depth estimation model based on the loss values to train the depth estimation model.
Optionally, the determining a reference offset of each pixel point of the sample image according to the reference depth value and the sample pre-estimated depth value of each pixel point of the sample image includes: traversing all pixel points of the sample image, and determining a plurality of reference points around the current pixel point; determining a weight of a corresponding reference point of the current pixel point according to the reference depth value of the current pixel point and the sample estimated depth value of each reference point of the current pixel point; determining the offset of each reference point of the current pixel point relative to the current pixel point; and calculating a weighted average of the offsets of all the reference points of the current pixel point according to the weight of each reference point of the current pixel point, and taking the weighted average as the reference offset of the current pixel point.
Optionally, the determining, according to the reference depth value of the current pixel point and the sample estimated depth value of each reference point of the current pixel point, a weight of the corresponding reference point of the current pixel point includes: determining an absolute value of an error between the sample estimated depth value of each reference point of the current pixel point and the reference depth value of the current pixel point as an error of the corresponding reference point; and determining the weight of each reference point of the current pixel point according to the error of each reference point of the current pixel point, wherein the weight of each reference point is negatively related to the error of the corresponding reference point.
Optionally, the determining, according to the error of each reference point of the current pixel point, the weight of each reference point of the current pixel point includes: determining the sum of the error of each reference point of the current pixel point and a set value, and determining the ratio of the set value to the sum as the weight of the corresponding reference point; wherein the set value is a positive number.
Optionally, the determining a plurality of reference points around the current pixel point includes: determining pixel points, of all pixel points of the sample image, of which the distances between the pixel points and the current pixel point in the first direction and the second direction are smaller than corresponding reference values, and using the pixel points as reference points of the current pixel points; or determining the pixel points with the linear distance from the current pixel point to be less than the reference value in all the pixel points of the sample image.
Optionally, the determining a loss value based on the reference depth value, the sample predicted depth value, the reference offset, and the sample predicted offset of each pixel point of the sample image includes: calculating a first loss value based on the reference depth value and the sample pre-estimated depth value of each pixel point of the sample image; calculating a second loss value based on the reference offset and the sample prediction offset of each pixel point of the sample image; determining the loss value based on the first loss value and the second loss value.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform an image processing method according to the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium in which instructions, when executed by at least one processor, cause the at least one processor to perform an image processing method according to the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement an image processing method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the image processing method and the image processing device disclosed by the embodiment of the disclosure, the offset of the depth map and the offset of the depth map can be estimated at the same time, and the distribution of the depth values on the estimated depth map is corrected through the estimated offset, so that a transition zone is eliminated under the condition of consuming a small amount of calculation, and the definition of the depth map is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1a is a schematic diagram illustrating a target image according to an exemplary embodiment of the present disclosure.
Fig. 1b is a schematic diagram of a depth map of a target image obtained by applying an image processing method of the related art.
Fig. 2 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
Fig. 4a is a schematic diagram illustrating a pre-estimated depth map according to an exemplary embodiment of the present disclosure.
Fig. 4b is a schematic diagram illustrating an x-direction estimated offset according to an exemplary embodiment of the present disclosure.
Fig. 4c is a schematic diagram illustrating a y-direction predicted offset amount according to an exemplary embodiment of the present disclosure.
Fig. 4d is a schematic diagram illustrating a depth value replacement operation according to an exemplary embodiment of the present disclosure.
Fig. 4e is a schematic diagram illustrating a modified depth map according to an exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a training method of a depth estimation model according to an exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating a training method of a depth estimation model according to an exemplary embodiment of the present disclosure.
Fig. 7 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" covers three parallel cases: "any one of the items", "any combination of several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. For another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Monocular depth estimation refers to predicting the distance of each point on an RGB image from a camera plane, i.e., the depth of each point, based on deep learning, thereby obtaining a depth map of the RGB image. Referring to fig. 1a and 1b, it can be seen that the depth map obtained by the depth estimation method based on the deep learning often has a transition zone at the boundary and insufficient definition.
In the related art, there are two types of solutions to this problem.
One is to improve depth details by guided filtering post-processing: guided filtering filters the depth map while taking a clear RGB image as the guide map, so that the depth map is smoother in areas where the RGB image has consistent textures, and the depth boundaries are brought closer to the boundaries of the RGB image. However, guided filtering is a two-dimensional operation while a depth map carries three-dimensional information; for a planar region, the smoothing effect of guided filtering may destroy the three-dimensional geometry. For example, for an inclined wall surface whose depths are distributed from near to far, smooth filtering may destroy the geometric relationship and change the inclination angle of the wall surface. Furthermore, at depth boundaries, guided filtering can improve depth details but also causes false-edge problems: for example, for poster boundaries on walls or zebra crossings on roads, the boundaries of these regions in the RGB image can cause false boundaries on the filtered depth map.
The other type is a deep-learning-based method that mainly increases the input size of the depth estimation model to improve depth details. A larger input retains more detail information in the image, so the predicted depth map is clearer, but this undoubtedly causes a large increase in the amount of calculation; for example, doubling both the length and the width of the input quadruples the amount of calculation.
Hereinafter, an image processing method and an image processing apparatus according to an exemplary embodiment of the present disclosure will be described in detail with reference to fig. 2 to 8.
Fig. 2 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. Fig. 3 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. Fig. 4a to 4e are a series of images generated in the course of performing an image processing method according to an exemplary embodiment of the present disclosure. It should be understood that the image processing method according to the exemplary embodiment of the present disclosure may be implemented in a terminal device such as a smartphone, a tablet computer, a Personal Computer (PC), or may be implemented in a device such as a server.
Referring to fig. 2 and 3, in step 201, a depth estimation process is performed on the target image to obtain an estimated depth value and an estimated offset of each pixel of the target image.
The estimated depth values of all pixel points of the target image are collected together to form an estimated depth map. An obvious difference between the image processing method according to the exemplary embodiment of the present disclosure and the depth estimation method in the related art is that, in addition to the estimated depth value, an estimated offset is obtained, which is used in the subsequent steps to correct the distribution of the depth values on the estimated depth map.
As an example, referring to the target image shown in fig. 1a, fig. 4a shows the pre-estimated depth map with blurred boundary obtained in step 201.
Optionally, step 201 specifically includes inputting the target image into the depth estimation model to obtain an output result of the depth estimation model; the output result comprises the pre-estimated depth value and the pre-estimated offset of each pixel point of the target image. The estimated depth value and the estimated offset can be output by utilizing one depth estimation model at the same time, and the estimated depth value and the estimated offset do not need to be calculated separately, so that the calculation process is facilitated to be simplified. The training procedure of the depth estimation model will be described later.
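The present disclosure does not prescribe a concrete network architecture for the depth estimation model. As a non-limiting illustration only, the two outputs can be produced in a single forward pass by attaching two prediction heads to a shared backbone; the following PyTorch-style sketch assumes a hypothetical backbone that yields 64-channel features, and the channel counts and kernel sizes are assumptions:

```python
import torch.nn as nn

class DepthWithOffsetHead(nn.Module):
    """Hypothetical sketch: one head predicts the depth map, another predicts the (dx, dy) offsets."""
    def __init__(self, in_channels=64):  # in_channels is an assumed backbone feature width
        super().__init__()
        self.depth_head = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
        self.offset_head = nn.Conv2d(in_channels, 2, kernel_size=3, padding=1)

    def forward(self, features):
        depth = self.depth_head(features)      # estimated depth map D1, shape (B, 1, H, W)
        offsets = self.offset_head(features)   # estimated offsets (dx, dy), shape (B, 2, H, W)
        return depth, offsets
```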
Referring back to fig. 2, in step 202, according to the estimated offset of each pixel of the target image, an offset pixel of the corresponding pixel is determined. And in combination with the pre-estimated offset of each pixel point, the corresponding pixel point can be offset to the offset pixel point, so that the depth value of the corresponding pixel point can be corrected based on the offset pixel point in the subsequent steps.
Optionally, step 202 specifically includes: and aiming at each pixel point of the target image, adding the coordinate value of the corresponding pixel point and the estimated offset of the corresponding pixel point to obtain the coordinate value of the offset pixel point of the corresponding pixel point. By executing the coordinate value calculation, the accurate offset pixel point can be obtained, and the simplicity and reliability of the calculation process are guaranteed.
Optionally, the target image is a two-dimensional plane image, each pixel point has a first direction coordinate value and a second direction coordinate value, the estimated offset of each pixel point correspondingly comprises a first direction estimated offset and a second direction estimated offset, and independent offset in two directions can be achieved. Based on this, step 202 specifically includes: and aiming at each pixel point of the target image, adding the first direction coordinate value of the corresponding pixel point and the first direction pre-estimated offset of the corresponding pixel point to obtain a first direction coordinate value of the offset pixel point of the corresponding pixel point, and adding the second direction coordinate value of the corresponding pixel point and the second direction pre-estimated offset of the corresponding pixel point to obtain a second direction coordinate value of the offset pixel point of the corresponding pixel point. The coordinate values of the offset pixel points can be obtained by respectively executing offset calculation on the two coordinate directions and summarizing the calculation results, so that the accuracy of the offset calculation is guaranteed.
Optionally, in a case that the target image is in a rectangular coordinate system, the first direction is an x direction of the rectangular coordinate system, and the second direction is a y direction of the rectangular coordinate system, that is, the rectangular coordinate system is used to describe coordinate values of the pixel points, which is suitable for calculation of a conventional rectangular image. Referring to fig. 3, the x-direction estimated offset may be denoted as dx, and the y-direction estimated offset may be denoted as dy. As an example, referring to the target image shown in fig. 1a, fig. 4b shows a schematic diagram of the estimated offset in the x direction obtained in step 202, and fig. 4c shows a schematic diagram of the estimated offset in the y direction obtained in step 202.
Optionally, in a case that the target image is in a polar coordinate system, the first direction is a radial direction of the polar coordinate system, and the second direction is a polar angle direction of the polar coordinate system, that is, the polar coordinate system is used to describe coordinate values of the pixel points, which is suitable for calculation and description of the circular image.
It should be understood that in actual calculations, a suitable coordinate system may be employed as desired, and the present disclosure is not limited thereto.
Referring back to fig. 2 and 3, in step 203, the estimated depth value corresponding to the offset pixel point of each pixel point in the target image is used as the corrected depth value of the corresponding pixel point. In this step, a depth value replacement operation is performed: the estimated depth value of each pixel point of the estimated depth map obtained in step 201 is replaced with the estimated depth value corresponding to the offset pixel point of that pixel point, so that the distribution of the depth values on the estimated depth map can be adjusted to obtain the corrected depth values without recalculating each depth value. Therefore, the input size of the depth estimation model does not need to be increased, the amount of calculation is small, and the method is applicable to a mobile terminal. Meanwhile, because guided filtering is not adopted, real depth edges and false edges can be distinguished through the feature extraction of the deep learning network, so that the definition is improved without introducing false edges.
Referring to fig. 3, for the pixel p (i, j), D1(i, j) represents the predicted depth value, dx (i, j) represents the predicted offset in the x direction, dy (i, j) represents the predicted offset in the y direction, the pixel p' (i + dx (i, j), j + dy (i, j)) represents the offset pixel, and the corrected depth value D2(i, j) of the pixel p (i, j) is D1(i + dx (i, j), j + dy (i, j)). Fig. 4d shows a schematic diagram of the depth value replacement operation. It can be seen from fig. 4b to 4d that the predicted offset at the boundary is obvious, which can be understood as replacing the depth value of the pixel in the boundary transition zone with the depth value of the pixel in the surrounding non-transition zone, thereby improving the definition at the boundary, and for the pixel in the non-transition zone, the predicted offset is basically 0, that is, the depth value replacement is not needed, which not only pointedly improves the definition of the boundary, but also reduces the calculation amount of the depth value replacement operation.
Based on this, optionally, step 203 may be performed only when the offset pixel point obtained in step 202 does not coincide with the corresponding pixel point. Further, steps 202 and 203 may be performed only when the estimated offset obtained in step 201 is not 0. It should be understood that, for the latter, since the estimated offset usually includes the estimated offset in the first direction and the estimated offset in the second direction, specifically, when the estimated offset in the first direction and the estimated offset in the second direction are not both 0, steps 202 and 203 are performed on the pixel.
Referring back to fig. 2, in step 204, a depth map of the target image is obtained based on the corrected depth value of each pixel point of the target image. The corrected depth values of all pixel points of the target image are collected together to obtain the depth map produced by the image processing method according to the exemplary embodiment of the present disclosure. As an example, referring to the target image shown in fig. 1a, fig. 4e shows the depth map obtained in step 204; comparison with fig. 4a shows that the transition zone is eliminated and the definition of the boundary is significantly improved.
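For illustration, the correction performed in steps 202 to 204 can be sketched as follows. This is a minimal NumPy sketch, assuming the estimated offsets are rounded to the nearest pixel and offset coordinates are clipped at the image border, details the present disclosure leaves open:

```python
import numpy as np

def correct_depth(d1, dx, dy):
    """Replace each pixel's depth with the depth at its offset pixel:
    D2(i, j) = D1(i + dx(i, j), j + dy(i, j)).

    d1: (H, W) estimated depth map; dx, dy: (H, W) estimated offsets in the two coordinate directions.
    """
    h, w = d1.shape
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Coordinates of the offset pixel points, rounded and clipped (assumed handling of borders).
    src_i = np.clip(np.rint(i + dx).astype(int), 0, h - 1)
    src_j = np.clip(np.rint(j + dy).astype(int), 0, w - 1)
    return d1[src_i, src_j]  # corrected depth map D2
```

Pixels whose estimated offsets are 0 keep their original depth values, which matches the observation above that the replacement mainly acts in boundary transition zones.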
Next, a training method of the depth estimation model is described.
Fig. 5 is a flowchart illustrating a training method of a depth estimation model according to an exemplary embodiment of the present disclosure. Fig. 6 is a flowchart illustrating a training method of a depth estimation model according to an exemplary embodiment of the present disclosure.
Referring to fig. 5, the depth estimation model is trained by the following steps:
in step 501, a sample image and a reference depth value of each pixel point of the sample image are obtained. The step is a training sample acquisition step, and the supervised training of the depth estimation model can be realized. Wherein, the reference depth value can be collected by the depth sensor.
In step 502, the sample image is input into the depth estimation model to obtain a sample estimated depth value and a sample estimated offset of each pixel point of the sample image. The sample estimated offset is used for determining the offset pixel point of the corresponding pixel point, and the sample estimated depth value of the offset pixel point of each pixel point is used as the corrected depth value of the corresponding pixel point. In this step, the depth estimation model is run on the sample image so that corresponding estimated values are obtained, and these estimated values are compared with the reference values of the sample image to realize supervised training.
In step 503, a reference offset of each pixel point of the sample image is determined according to the reference depth value and the sample estimated depth value of each pixel point of the sample image. The training samples only include the sample images and the reference depth values, whereas the output of the depth estimation model includes both the sample estimated depth values and the sample estimated offsets, so the reference offsets need to be determined in this step. By combining the reference depth value and the sample estimated depth values of each pixel point, the pixel point whose sample estimated depth value is closest to the reference depth value of the current pixel point can be determined, so that this closest pixel point is taken as the offset pixel point of the current pixel point and its offset relative to the current pixel point is taken as the reference offset; in this way, a reliable reference offset can be obtained to guide model training.
Optionally, step 503 specifically includes the following four steps:
the method comprises the steps of firstly, traversing all pixel points of a sample image, and determining a plurality of reference points around the current pixel point. Since the offset pixel point is necessarily a pixel point closer to the current pixel point, the offset pixel point can be determined by determining a plurality of reference points around the current pixel point. It should be understood that the more reference points are selected, the greater the possibility of obtaining accurate offset pixel points is, but the greater the number of reference points is, the greater calculation load is also caused, so that the balance between the accuracy and the calculation load can be realized by reasonably controlling the number of reference points, and the calculation efficiency is improved. In practice, the number of reference points to be selected may be determined according to the degree of sharpness of the sample image itself to improve the calculation efficiency of step 503, or a reference point determination criterion with higher universality may be selected to reduce the calculation amount of the previous preparation work (herein, the work of determining the number of reference points) and this disclosure is not limited thereto.
Optionally, the operation of determining a plurality of reference points around the current pixel point may specifically include: determining, among all pixel points of the sample image, the pixel points whose distances from the current pixel point in the first direction and the second direction are both smaller than the corresponding reference values, as the reference points of the current pixel point; or determining, among all pixel points of the sample image, the pixel points whose linear distance from the current pixel point is smaller than the reference value, as the reference points of the current pixel point. By configuring reference values in the first direction and the second direction respectively, or configuring a reference value on the linear distance, a criterion is provided for determining the reference points, and pixel points in all directions around the current pixel point can serve as reference points, which improves the possibility of obtaining an accurate offset pixel point. It should be understood that, in a case that the sample image is in a rectangular coordinate system, the first direction is the x direction of the rectangular coordinate system and the second direction is the y direction of the rectangular coordinate system; the reference values in the x direction and the y direction determine a rectangular neighborhood around the current pixel point, while a reference value on the linear distance determines a circular neighborhood around the current pixel point, and either may be selected as needed in practice. Of course, the neighborhood of a pixel point near the sample image boundary is cut off by the boundary and may not be a complete regular shape.
And secondly, determining the weight of the corresponding reference point of the current pixel point according to the reference depth value of the current pixel point and the sample estimated depth value of each reference point of the current pixel point.
And thirdly, determining the offset of each reference point of the current pixel point relative to the current pixel point. The offset of each reference point relative to the current pixel point is the difference value between the coordinate value of the reference point and the coordinate value of the current pixel point, and can be directly calculated.
Although, theoretically, the reference point whose sample estimated depth value is closest to the reference depth value of the current pixel point could also be taken directly as the offset pixel point of the current pixel point, it was found in testing that the reference offset obtained in this way is strongly affected by noise. According to the exemplary embodiment of the present disclosure, the weight of each reference point is determined, and the weighted average of the offsets of all reference points of the current pixel point is obtained in the fourth step to serve as the reference offset, so that the influence of noise can be weakened and the accuracy of the obtained reference offset is improved.
Optionally, the operation of determining the weight of each reference point of the current pixel specifically includes: determining an absolute value of an error between a sample estimated depth value of each reference point of the current pixel point and a reference depth value of the current pixel point as an error of the corresponding reference point; and determining the weight of each reference point of the current pixel point according to the error of each reference point of the current pixel point, wherein the weight of each reference point is negatively related to the error of the corresponding reference point. By calculating the errors of the reference points and determining the weights negatively related to the errors, the similarity between the sample estimated depth value of each reference point and the reference depth value of the current pixel point can be visually reflected by using the weights, the influence of local noise can be reduced by weighting, and the accuracy of the obtained reference offset can be effectively improved.
Specifically, the operation of determining the weight of each reference point of the current pixel point according to the error of each reference point of the current pixel point may be specifically executed as: determining the sum of the error of each reference point of the current pixel point and a set value, and determining the ratio of the set value to the sum as the weight of the corresponding reference point; wherein the set value is a positive number. That is to say, for each reference point, the weight w, the error diff, and the set value a satisfy w = a/(diff + a), which can make the weight and the error inversely correlated, and can ensure that when the error diff is 0, the weight w is not only meaningful, but also exactly equal to 1, so as to be convenient for visually reflecting the error magnitude through the weight. It should be understood that the set value also affects the weight value when the error is not 0, and the larger the set value is, the larger the weight corresponding to the same error is, and the reasonable selection can be performed according to the actual calculation requirement and the calculation effect. As an example, the set value is 1, i.e., the weight w = 1/(diff + 1).
And fourthly, according to the weight of each reference point of the current pixel point, obtaining a weighted average value of the offsets of all the reference points of the current pixel point, and taking the weighted average value as the reference offset of the current pixel point. For a rectangular coordinate system, the reference offset can be formulated as:
dx_gt(i,j)=∑(m-i)*w(m,n)/∑w(m,n)
dy_gt(i,j)=∑(n-j)*w(m,n)/∑w(m,n)
In accordance with the foregoing, the formulas indicate that, for a current pixel point p(i, j), dx_gt(i, j) represents the x-direction reference offset of the current pixel point p(i, j), dy_gt(i, j) represents the y-direction reference offset of the current pixel point p(i, j), any reference point of the current pixel point p(i, j) is denoted p''(m, n), (m - i) represents the x-direction offset of the reference point p''(m, n) relative to the current pixel point p(i, j), (n - j) represents the y-direction offset of the reference point p''(m, n) relative to the current pixel point p(i, j), w(m, n) represents the weight of the reference point p''(m, n), and the sums run over all reference points of the current pixel point p(i, j).
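As an illustration of the reference offset generation described above, the following is a minimal (non-vectorized) NumPy sketch, assuming the first reference-point criterion (a rectangular neighborhood) with a hypothetical radius and a set value a of 1; none of these concrete values are prescribed by the present disclosure:

```python
import numpy as np

def reference_offsets(d_gt, d_pred, radius=4, a=1.0):
    """Sketch of step 503: weighted-average reference offsets.

    d_gt: (H, W) reference depth values; d_pred: (H, W) sample estimated depth values.
    radius (neighborhood half-width) and a (the set value) are hypothetical choices.
    """
    h, w = d_gt.shape
    dx_gt = np.zeros((h, w))
    dy_gt = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Rectangular neighborhood of reference points, clipped at the image border.
            m0, m1 = max(0, i - radius), min(h, i + radius + 1)
            n0, n1 = max(0, j - radius), min(w, j + radius + 1)
            m, n = np.meshgrid(np.arange(m0, m1), np.arange(n0, n1), indexing="ij")
            diff = np.abs(d_pred[m, n] - d_gt[i, j])  # error of each reference point
            wgt = a / (diff + a)                      # weight, negatively correlated with the error
            dx_gt[i, j] = np.sum((m - i) * wgt) / np.sum(wgt)
            dy_gt[i, j] = np.sum((n - j) * wgt) / np.sum(wgt)
    return dx_gt, dy_gt
```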
In step 504, a loss value is determined based on the reference depth value, the sample predicted depth value, the reference offset, and the sample predicted offset of each pixel point of the sample image.
Optionally, considering that the depth estimation model has two outputs, the estimated depth value and the estimated offset, step 504 may specifically include: calculating a first loss value based on the reference depth value and the sample estimated depth value of each pixel point of the sample image; calculating a second loss value based on the reference offset and the sample estimated offset of each pixel point of the sample image; and determining the loss value based on the first loss value and the second loss value. By determining the first loss value and the second loss value separately, the loss for the depth values and the loss for the offsets can each be determined explicitly, which ensures that model training proceeds in an orderly and reliable manner.
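The present disclosure does not fix the concrete loss functions or the way the two loss values are combined. As an illustration only, one could use L1 losses and a weighted sum, as in the following sketch; the offset_weight parameter is an assumption:

```python
import torch.nn.functional as F

def training_loss(d_pred, d_gt, offset_pred, offset_gt, offset_weight=1.0):
    """Step 504 sketch: combine a depth loss and an offset loss (L1 forms assumed)."""
    first_loss = F.l1_loss(d_pred, d_gt)             # depth loss: reference vs. sample estimated depth
    second_loss = F.l1_loss(offset_pred, offset_gt)  # offset loss: reference vs. sample estimated offset
    return first_loss + offset_weight * second_loss
```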
In step 505, parameters of the depth estimation model are adjusted based on the loss values to train the depth estimation model. In particular, the parameters of the depth estimation model may be adjusted using a back propagation algorithm.
Referring to fig. 6, in order to determine the second loss value, a reference offset generation module needs to be introduced to perform step 503, determining a reference offset. To ensure that only the parameters of the depth estimation model are adjusted, gradient truncation may be performed on the reference offset generation module during training, i.e., the parameters of the reference offset generation module are not adjusted.
Fig. 7 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment of the present disclosure. It should be understood that the image processing apparatus according to the exemplary embodiments of the present disclosure may be implemented in a terminal device such as a smartphone, a tablet computer, a Personal Computer (PC) in software, hardware, or a combination of software and hardware, and may also be implemented in a device such as a server.
Referring to fig. 7, image processing apparatus 700 includes estimation section 701, offset section 702, correction section 703, and summary section 704.
The estimation unit 701 may perform depth estimation processing on the target image to obtain an estimated depth value and an estimated offset of each pixel of the target image.
The estimated depth values of all pixel points of the target image are collected together to form an estimated depth map. An obvious difference between the image processing method according to the exemplary embodiment of the present disclosure and the depth estimation method in the related art is that, in addition to the estimated depth value, an estimated offset is obtained, which is used by the other units to correct the distribution of the depth values on the estimated depth map.
As an example, referring to the target image shown in fig. 1a, fig. 4a shows a predicted depth map with a blurred boundary obtained by the prediction unit 701.
Optionally, the pre-estimation unit 701 performs image processing on the target image, specifically, inputs the target image into the depth estimation model to obtain an output result of the depth estimation model; the output result comprises the estimated depth value and the estimated offset of each pixel point of the target image. The estimated depth value and the estimated offset can be simultaneously output by utilizing one depth estimation model, and the estimated depth value and the estimated offset do not need to be separately calculated, so that the calculation process is facilitated to be simplified. The training process of the depth estimation model is described in the foregoing, and will not be described in detail here.
The offset unit 702 may determine offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image. And in combination with the predicted offset of each pixel point, the corresponding pixel point can be offset to the offset pixel point, so that other units can correct the depth value of the corresponding pixel point based on the offset pixel point.
Optionally, the offset unit 702 may obtain, for each pixel point of the target image, a coordinate value of an offset pixel point of the corresponding pixel point by adding the coordinate value of the corresponding pixel point to the estimated offset of the corresponding pixel point. By executing the coordinate value calculation, the accurate offset pixel point can be obtained, and the simplicity and reliability of the calculation process are ensured.
Optionally, the target image is a two-dimensional plane image, each pixel point has a first direction coordinate value and a second direction coordinate value, the estimated offset of each pixel point correspondingly comprises a first direction estimated offset and a second direction estimated offset, and independent offset in two directions can be achieved. Based on this, the offset unit 702 may obtain, for each pixel point of the target image, a first direction coordinate value of the offset pixel point of the corresponding pixel point by adding the first direction coordinate value of the corresponding pixel point and the first direction predicted offset of the corresponding pixel point, and obtain a second direction coordinate value of the offset pixel point of the corresponding pixel point by adding the second direction coordinate value of the corresponding pixel point and the second direction predicted offset of the corresponding pixel point. The coordinate values of the offset pixel points can be obtained by respectively executing offset calculation on the two coordinate directions and summarizing the calculation results, so that the accuracy of the offset calculation is guaranteed.
Optionally, in a case that the target image is in a rectangular coordinate system, the first direction is an x direction of the rectangular coordinate system, and the second direction is a y direction of the rectangular coordinate system, that is, the rectangular coordinate system is used to describe coordinate values of the pixel points, which is suitable for calculation of a conventional rectangular image. Referring to fig. 3, the x-direction estimated offset may be denoted as dx, and the y-direction estimated offset may be denoted as dy. As an example, referring to the target image shown in fig. 1a, fig. 4b shows a schematic diagram of the estimated x-direction offset amount obtained by the offset unit 702, and fig. 4c shows a schematic diagram of the estimated y-direction offset amount obtained by the offset unit 702.
Optionally, in a case that the target image is in a polar coordinate system, the first direction is a radial direction of the polar coordinate system, and the second direction is a polar angle direction of the polar coordinate system, that is, the polar coordinate system is used to describe coordinate values of the pixel points, which is suitable for calculation and description of the circular image.
It should be understood that in actual calculations, a suitable coordinate system may be employed as desired, and the present disclosure is not limited thereto.
The correction unit 703 may use the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as the corrected depth value of the corresponding pixel point. The correction unit 703 performs a depth value replacement operation, and by replacing the estimated depth value of each pixel point of the estimated depth map obtained by the estimation unit 701 with the estimated depth value corresponding to the offset pixel point of the pixel point, the distribution of the depth values on the estimated depth map can be adjusted to obtain the corrected depth value without recalculating each depth value, and thus the input size of the depth estimation model does not need to be increased, the calculation amount is small, and the method is applicable to a mobile terminal. Meanwhile, because a guiding filtering method is not adopted, real depth edges and false edges can be distinguished through feature extraction of a deep learning network, and the false edges can not be brought while the definition is improved.
Referring to fig. 3, for the pixel p (i, j), D1(i, j) represents the predicted depth value, dx (i, j) represents the predicted offset in the x direction, dy (i, j) represents the predicted offset in the y direction, the pixel p' (i + dx (i, j), j + dy (i, j)) represents the offset pixel, and the corrected depth value D2(i, j) of the pixel p (i, j) is D1(i + dx (i, j), j + dy (i, j)). Fig. 4d shows a schematic diagram of the depth value replacement operation. It can be seen from fig. 4b to 4d that the predicted offset at the boundary is obvious, which can be understood as replacing the depth value of the pixel in the boundary transition zone with the depth value of the pixel in the surrounding non-transition zone, thereby improving the definition at the boundary, and for the pixel in the non-transition zone, the predicted offset is basically 0, that is, the depth value replacement is not needed, which not only pointedly improves the definition of the boundary, but also reduces the calculation amount of the depth value replacement operation.
Based on this, optionally, the correction unit 703 may operate on a pixel point only when the offset pixel point obtained by the offset unit 702 does not coincide with the corresponding pixel point. Furthermore, the offset unit 702 and the correction unit 703 may operate on a pixel point only when the estimated offset obtained by the estimation unit 701 is not 0. It should be understood that, for the latter case, since the estimated offset usually comprises the first direction estimated offset and the second direction estimated offset, the offset unit 702 and the correction unit 703 operate on a pixel point specifically when the first direction estimated offset and the second direction estimated offset of that pixel point are not both 0.
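For illustration only, the depth value replacement operation over the whole estimated depth map, together with the optimization of skipping pixel points whose offset pixel points coincide with themselves, might be sketched in Python with NumPy as follows; the rounding and clamping of the offsets, and all names, are assumptions of this sketch:

import numpy as np

def correct_depth(d1, dx, dy):
    # d1: estimated depth map of shape (height, width); dx, dy: estimated offsets per pixel point.
    h, w = d1.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Offset pixel point coordinates: round to integer indices and clamp to the image bounds.
    oi = np.clip(np.rint(ii + dx), 0, h - 1).astype(int)
    oj = np.clip(np.rint(jj + dy), 0, w - 1).astype(int)
    # Replace depth values only where the offset pixel point differs from the pixel point itself.
    d2 = d1.copy()
    moved = (oi != ii) | (oj != jj)
    d2[moved] = d1[oi[moved], oj[moved]]
    return d2

The resulting array d2 holds the corrected depth values of all pixel points.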
The summarization unit 704 may obtain the depth map of the target image based on the corrected depth values of each pixel point of the target image: the corrected depth values of all pixel points of the target image are collected together to obtain the depth map estimated by the image processing apparatus 700 according to the exemplary embodiment of the disclosure. As an example, referring to the target image shown in fig. 1a, fig. 4e shows the depth map obtained by the summarization unit 704; comparison with fig. 4a shows that the transition zone is eliminated and the definition of the boundary is significantly improved.
Fig. 8 is a block diagram of an electronic device according to an example embodiment of the present disclosure.
Referring to fig. 8, an electronic device 800 includes at least one memory 801 and at least one processor 802, the at least one memory 801 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 802, cause the at least one processor 802 to perform an image processing method according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 800 may be a PC, a tablet device, a personal digital assistant, a smart phone, or any other device capable of executing the above set of instructions. Here, the electronic device 800 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or in combination. The electronic device 800 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote devices (e.g., via wireless transmission).
In the electronic device 800, the processor 802 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 802 may execute instructions or code stored in the memory 801, wherein the memory 801 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 801 may be integrated with the processor 802, for example, with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 801 may include a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 801 and the processor 802 may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the processor 802 can read files stored in the memory.
Further, the electronic device 800 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 800 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium in which instructions, when executed by at least one processor, cause the at least one processor to perform an image processing method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above computer-readable storage medium can run in an environment deployed in computer equipment, such as a client, a host, a proxy device, a server, and the like; further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product including computer instructions which, when executed by at least one processor, cause the at least one processor to perform an image processing method according to an exemplary embodiment of the present disclosure.
According to the image processing method and apparatus of the exemplary embodiments of the disclosure, the depth map and the offset of the depth map can be estimated at the same time, and the distribution of depth values on the estimated depth map is corrected through the estimated offset, so that the transition zone is eliminated and the definition of the depth map is improved. Meanwhile, the calculation amount of correcting the depth map with the offset is small, since the correction is only a depth value replacement operation, so the method can be applied to a mobile terminal. In addition, compared with a method using RGB image guided filtering, the offset in the method is obtained through deep learning training, real depth edges and false edges can be distinguished through the feature extraction of a deep learning network, and false edges are not introduced while the depth map is made clearer.
According to the training method of the depth estimation model of the exemplary embodiment of the disclosure, in addition to the beneficial technical effects of the image processing method, a method for determining the reference offset is also provided, which comprehensively considers the depth distribution trend within a local area, can effectively reduce the influence of noise, improves the accuracy of the determined reference offset, and thus guarantees the estimation accuracy of the trained depth estimation model.
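As a purely illustrative sketch of the training procedure of the depth estimation model referred to above, one training iteration might look as follows in Python (PyTorch); the use of L1 losses, the assumption that the model returns a per-pixel depth and a two-channel offset, and the helper reference_offset_fn that derives the reference offset from the reference depth value and the sample estimated depth value are all assumptions of this sketch, not details fixed by the present disclosure:

import torch.nn.functional as F

def training_step(model, optimizer, sample_image, reference_depth, reference_offset_fn):
    # Input the sample image into the depth estimation model to obtain the sample
    # estimated depth value and the sample estimated offset of each pixel point.
    sample_depth, sample_offset = model(sample_image)
    # Determine the reference offset of each pixel point from the reference depth
    # value and the (detached) sample estimated depth value.
    reference_offset = reference_offset_fn(reference_depth, sample_depth.detach())
    # Determine a loss value based on the reference depth value, the sample estimated
    # depth value, the reference offset, and the sample estimated offset.
    loss = F.l1_loss(sample_depth, reference_depth) + F.l1_loss(sample_offset, reference_offset)
    # Adjust the parameters of the depth estimation model based on the loss value.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()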
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image processing method, characterized in that the image processing method comprises:
carrying out depth estimation processing on a target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image;
determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image;
taking the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as a corrected depth value of the corresponding pixel point;
and obtaining a depth map of the target image based on the corrected depth value of each pixel point of the target image.
2. The image processing method of claim 1, wherein said determining offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image comprises:
for each pixel point of the target image, adding the coordinate value of the corresponding pixel point and the estimated offset of the corresponding pixel point to obtain the coordinate value of the offset pixel point of the corresponding pixel point.
3. The image processing method of claim 2, wherein each pixel point has a first direction coordinate value and a second direction coordinate value, and the estimated offset of each pixel point comprises a first direction estimated offset and a second direction estimated offset, wherein the obtaining, for each pixel point of the target image, the coordinate value of the offset pixel point of the corresponding pixel point by adding the coordinate value of the corresponding pixel point and the estimated offset of the corresponding pixel point comprises:
for each pixel point of the target image, adding the first direction coordinate value of the corresponding pixel point and the first direction estimated offset of the corresponding pixel point to obtain the first direction coordinate value of the offset pixel point of the corresponding pixel point, and adding the second direction coordinate value of the corresponding pixel point and the second direction estimated offset of the corresponding pixel point to obtain the second direction coordinate value of the offset pixel point of the corresponding pixel point.
4. The image processing method according to claim 3,
under the condition that the target image is in a rectangular coordinate system, the first direction is the x direction of the rectangular coordinate system, and the second direction is the y direction of the rectangular coordinate system; or
under the condition that the target image is in a polar coordinate system, the first direction is the radial direction of the polar coordinate system, and the second direction is the polar angle direction of the polar coordinate system.
5. The image processing method according to any one of claims 1 to 4, wherein the performing depth estimation processing on the target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image comprises:
inputting the target image into a depth estimation model to obtain an output result of the depth estimation model, wherein the output result comprises the estimated depth value and the estimated offset of each pixel point of the target image.
6. The image processing method of claim 5, wherein the depth estimation model is trained by:
acquiring a sample image and a reference depth value of each pixel point of the sample image;
inputting the sample image into the depth estimation model to obtain a sample estimated depth value and a sample estimated offset of each pixel point of the sample image;
determining a reference offset of each pixel point of the sample image according to the reference depth value and the sample estimated depth value of each pixel point of the sample image;
determining a loss value based on the reference depth value, the sample estimated depth value, the reference offset, and the sample estimated offset of each pixel point of the sample image;
adjusting parameters of the depth estimation model based on the loss value to train the depth estimation model.
7. An image processing apparatus characterized by comprising:
a prediction unit configured to: carry out depth estimation processing on a target image to obtain an estimated depth value and an estimated offset of each pixel point of the target image;
an offset unit configured to: determine offset pixel points of corresponding pixel points according to the estimated offset of each pixel point of the target image;
a correction unit configured to: take the estimated depth value corresponding to the offset pixel point of each pixel point in the target image as a corrected depth value of the corresponding pixel point;
a summarization unit configured to: obtain a depth map of the target image based on the corrected depth value of each pixel point of the target image.
8. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image processing method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the image processing method of any of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement the image processing method of any of claims 1 to 6.
CN202210592041.9A 2022-05-27 2022-05-27 Image processing method and device, electronic equipment and computer readable storage medium Pending CN114937072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210592041.9A CN114937072A (en) 2022-05-27 2022-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210592041.9A CN114937072A (en) 2022-05-27 2022-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114937072A true CN114937072A (en) 2022-08-23

Family

ID=82867109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210592041.9A Pending CN114937072A (en) 2022-05-27 2022-05-27 Image processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114937072A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012262A (en) * 2023-03-09 2023-04-25 荣耀终端有限公司 Image processing method, model training method and electronic equipment
CN116012262B (en) * 2023-03-09 2023-08-15 荣耀终端有限公司 Image processing method, model training method and electronic equipment

Similar Documents

Publication Publication Date Title
AU2016349518B2 (en) Edge-aware bilateral image processing
EP2277317B1 (en) Systems and methods for enhanced image encoding
US20120075433A1 (en) Efficient information presentation for augmented reality
CN104899853A (en) Image region dividing method and device
CN104995657A (en) Systems and methods for resizing an image
US20130314409A1 (en) Coarse-to-fine multple disparity candidate stereo matching
CN112785635A (en) Method and apparatus for depth image generation
JP2011210246A (en) Method for handling occlusion in stereo image
Haq et al. An edge-aware based adaptive multi-feature set extraction for stereo matching of binocular images
CN114937072A (en) Image processing method and device, electronic equipment and computer readable storage medium
US8824778B2 (en) Systems and methods for depth map generation
CN114565768A (en) Image segmentation method and device
KR20200052206A (en) Plenoptic image processing apparatus, system having the same, and object segmentation method thereof
US20150235399A1 (en) Variable Patch Shape Synthesis
US8462176B1 (en) Automatic annotation generation
CN107918936B (en) High frequency offset using tag tracking for block matching algorithms
CN116157846A (en) Machine learning model for analyzing pathology data from a metastatic site
WO2021025906A1 (en) Topology optimization with local overhang constraints for 3d printing
CN113538467A (en) Image segmentation method and device and training method and device of image segmentation model
CN114598610B (en) Network business rule identification
CN113223017A (en) Training method of target segmentation model, target segmentation method and device
CN115619924A (en) Method and apparatus for light estimation
US8571342B2 (en) Image processing and generation of focus information
CN112907459B (en) Image processing method and device
US10277912B2 (en) Methods and apparatus for storing data related to video decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination