CN110691228A - Three-dimensional transformation-based depth image noise marking method and device and storage medium

Info

Publication number: CN110691228A
Application number: CN201910986081.XA
Authority: CN (China)
Prior art keywords: image, reference viewpoint, pixel, depth image, noise
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 杨露 (Yang Lu)
Current assignee: Beijing Megvii Technology Co Ltd (Beijing Maigewei Technology Co Ltd)
Original assignee: Beijing Maigewei Technology Co Ltd
Filing date: 2019-10-17
Priority date: 2019-10-17
Publication date: 2020-01-14
Application filed by Beijing Maigewei Technology Co Ltd
Priority: CN201910986081.XA, published as CN110691228A (en)

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/246 Calibration of cameras
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a depth image noise marking method and device based on three-dimensional transformation, and a storage medium. The method comprises the following steps: acquiring a depth image and a texture image of a first reference viewpoint; determining the parallax of calibration pixels on the texture image when they are projected from the first reference viewpoint to a second reference viewpoint; projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining the pixel difference value between each calibration pixel on the texture image and the corresponding pixel on the projection image of the texture image; and determining the noise pixels in the depth image according to the pixel difference values. The technical scheme of the invention can accurately determine the noise pixels in the depth image, thereby ensuring the accuracy of applications such as three-dimensional view construction.

Description

Three-dimensional transformation-based depth image noise marking method and device and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a depth image noise marking method and device based on three-dimensional transformation and a storage medium.
Background
For a specific target object, a three-dimensional view tends to have a more intuitive and realistic display effect than a two-dimensional view. When a three-dimensional view is constructed, a left camera and a right camera are generally adopted to acquire images of the same target object, obtaining information such as texture images representing texture and depth images representing depth. Viewpoint mapping is then performed on the images of the two cameras in combination with the camera parameters: the image-plane coordinates mapped from a reference viewpoint to a virtual viewpoint are obtained through a coordinate transformation relation, after which three-dimensional synthesis and other operations are performed. However, noise pixels exist in the depth image, and at present they cannot be determined accurately and quantitatively, so viewpoint mapping may be performed on a depth image with relatively high noise, which affects the accuracy of viewpoint mapping and, in turn, the accuracy of three-dimensional view construction.
Disclosure of Invention
The invention provides a depth image noise marking method and device based on three-dimensional transformation and a storage medium, aiming at accurately determining noise pixels in a depth image.
In a first aspect, the present invention provides a depth image noise labeling method based on three-dimensional transformation, including the following steps:
acquiring a depth image and a texture image of a first reference viewpoint;
determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint;
projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining a pixel difference value between the calibration pixel on the texture image and a corresponding pixel on a projection image of the texture image;
and determining a noise pixel in the depth image according to the pixel difference value.
In a second aspect, the present invention provides a depth image noise labeling apparatus based on three-dimensional transformation, including:
the acquisition module is used for acquiring a depth image and a texture image of a first reference viewpoint;
the processing module is used for determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint; projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining a pixel difference value between the calibration pixel on the texture image and a corresponding pixel on a projection image of the texture image;
and the judging module is used for determining the noise pixel in the depth image according to the pixel difference value.
In a third aspect, the present invention provides a depth image noise marking apparatus based on three-dimensional transformation, the apparatus comprising a memory and a processor; the memory for storing a computer program; the processor is configured to implement the three-dimensional transformation based depth image noise labeling method as described above when the computer program is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the three-dimensional transform-based depth image noise labeling method as described above.
With the three-dimensional-transformation-based depth image noise marking method, device and storage medium, after the camera corresponding to the first reference viewpoint captures an image, a texture image representing the texture information of each pixel and a depth image representing the depth information can be obtained from it. When each pixel on the texture image is projected from the first reference viewpoint to the second reference viewpoint, there is usually a parallax between the pixel and its projection, and this parallax is affected by depth noise. From the parallax, a projection image of the texture image at the second reference viewpoint can be determined, and each calibration pixel on the texture image produces a corresponding pixel on the projection image. Because the pixels of the texture image and of the depth image correspond one-to-one, the pixel difference between a calibration pixel of the texture image and its corresponding pixel on the projection image can serve as the basis for judging whether the depth image pixel corresponding to that calibration pixel is a noise pixel in the image captured by the first-reference-viewpoint camera. After all pixels are traversed, all noise pixels of that depth image are identified, so information such as the number, positions and proportion of noise pixels can be determined efficiently and accurately. The depth image can thereby be judged quantitatively, determining whether its accuracy meets the requirements of applications such as viewpoint mapping, which in turn ensures the accuracy of three-dimensional view construction. In application scenarios that depend heavily on depth image accuracy, this noise marking provides an important reference for depth image quality evaluation and subsequent depth image filtering.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a depth image noise labeling method based on three-dimensional transformation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a viewpoint mapping process;
FIG. 3 is a schematic diagram of a viewpoint mapping process in the presence of depth noise;
FIG. 4 is a schematic diagram of imaging projected from a left reference viewpoint to a right reference viewpoint according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a marked noisy pixel in an embodiment of the present invention;
fig. 6 is a block diagram of a depth image noise labeling apparatus based on three-dimensional transformation according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, a depth image noise labeling method based on three-dimensional transformation according to an embodiment of the present invention includes the following steps:
and S1, acquiring the depth image and the texture image of the first reference viewpoint.
Specifically, a left camera and a right camera are used to simulate human eyes and respectively acquire images of a specific target object, so that a texture image representing the texture information of each pixel point in the acquired images and a depth image representing the depth information can be obtained correspondingly. The first reference viewpoint may be the image reference viewpoint corresponding to either the left camera or the right camera.
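As a minimal illustration of this acquisition step, the sketch below loads a texture/depth pair with OpenCV. The file names and the depth container are assumptions for illustration, not part of the patent.

    import cv2

    # Hypothetical file names; a 16-bit PNG is one common container for depth.
    texture_l = cv2.imread("left_texture.png", cv2.IMREAD_GRAYSCALE)
    depth_l = cv2.imread("left_depth.png", cv2.IMREAD_UNCHANGED)
    assert texture_l.shape == depth_l.shape[:2]  # pixels correspond one-to-one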
And S2, determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint.
Specifically, take the image reference viewpoint corresponding to the left camera as the first reference viewpoint (the left reference viewpoint) and the image reference viewpoint corresponding to the right camera as the second reference viewpoint (the right reference viewpoint). Based on the information in the depth image and the camera parameters, each pixel on the texture image can be projected from the left reference viewpoint to the right reference viewpoint; there is then a parallax between each pixel and its projected pixel, and this parallax is affected by depth noise.
And S3, projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining the pixel difference value between the calibration pixel on the texture image and the corresponding pixel on the projection image of the texture image.
Specifically, a projection image that projects the texture image from the left reference viewpoint to the right reference viewpoint is determined according to the parallax, and each calibration pixel on the texture image produces a corresponding pixel on the projection image. Because the pixels of the texture image and of the depth image are in one-to-one correspondence, the pixel difference between a calibration pixel of the texture image and the corresponding pixel of its projection image can be used as the basis for judging whether the depth image pixel corresponding to that calibration pixel is a noise pixel in the image captured by the left-reference-viewpoint camera.
And S4, determining noise pixels in the depth image according to the pixel difference values.
Specifically, after traversing all pixels, all the noise pixels of the depth image in the image acquired by the left-reference-viewpoint camera can be confirmed, so that information such as the number, positions and proportion of the noise pixels in the depth image can be determined efficiently and accurately.
In this embodiment, quantitative determination may be performed on the depth image to determine whether the accuracy of the depth image meets the requirements of applications such as viewpoint mapping, and thus the accuracy of the three-dimensional view construction is ensured. In an application scene which is very dependent on the accuracy of the depth image, the noise marking of the depth image has important reference value for the quality evaluation of the depth image and the filtering of the subsequent depth image.
Similarly, for the image acquired by the right-reference-viewpoint camera, the noise pixels in the corresponding depth image can be labeled in the same manner to determine their quantitative information.
It should be noted that, if a plurality of cameras are used for image acquisition, that is, each camera corresponds to one reference viewpoint, for each reference viewpoint image, the noise pixel in the depth image corresponding to the reference viewpoint image may be labeled in a manner similar to that described above, so as to determine the quantitative information of the noise pixel.
Optionally, the method further comprises the steps of:
and acquiring camera parameters corresponding to the first reference viewpoint and the second reference viewpoint.
The determining the disparity of the scaled pixels on the texture image when projected from the first reference viewpoint to the second reference viewpoint comprises:
determining the disparity from the depth image and the camera parameters.
Optionally, the camera parameters include a camera focal length corresponding to the first reference viewpoint and a distance between the camera optical axis corresponding to the first reference viewpoint and the camera optical axis corresponding to the second reference viewpoint.
The determining the disparity from the depth image and the camera parameters comprises:
and determining the measured depth value of the calibration pixel according to the depth image.
Determining the disparity according to a first formula, the first formula being:

Δd_p = (f_x · t_x) / z_l

where Δd_p represents the parallax, f_x represents the camera focal length corresponding to the first reference viewpoint, t_x represents the distance between the camera optical axis corresponding to the first reference viewpoint and the camera optical axis corresponding to the second reference viewpoint, and z_l represents the measured depth value in the depth image.
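As a numerical illustration of the first formula, the sketch below evaluates the disparity for every pixel of a depth map at once; it assumes the quantized depth image has already been converted to measured depth values z_l in the same units as t_x, and the function and variable names are illustrative, not from the patent.

    import numpy as np

    def disparity_map(depth_l, f_x, t_x):
        # First formula, evaluated per pixel: delta_d_p = f_x * t_x / z_l.
        z_l = np.asarray(depth_l, dtype=np.float64)
        with np.errstate(divide="ignore"):
            d = f_x * t_x / z_l
        d[~np.isfinite(d)] = 0.0  # guard pixels with zero or invalid measured depth
        return d

Note that the disparity grows as the measured depth shrinks, so depth noise on near objects displaces projected pixels the most.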
In particular, when constructing a three-dimensional view, viewpoint mapping is required. Viewpoint mapping is also called three-dimensional transformation (3D warping); the mapping process mainly uses the texture image information and depth image information of a reference viewpoint image, together with the camera parameters of the acquired image, to map each pixel point in the reference viewpoint image plane to a pixel point in a virtual viewpoint image plane, also called a projection viewpoint image plane. It mainly comprises two steps: first, using the transformation relations from the image coordinate system to the camera coordinate system and then to the world coordinate system, the pixel coordinates in the current reference viewpoint image plane are mapped to position coordinates in three-dimensional space; the second step is the reverse of the first, mapping the obtained three-dimensional position coordinates to the image plane coordinates of the virtual viewpoint using the coordinate transformation relation.
As shown in fig. 2, in the three-dimensional space XYZ, u_l O_l v_l is the reference viewpoint image plane, i.e., the texture image plane of the first reference viewpoint, and u_v O_v v_v is the projection viewpoint image plane (also called the virtual viewpoint image plane), i.e., the texture image plane of the second reference viewpoint. The mapping of a reference pixel point p_l(u_l, v_l) in the reference viewpoint image plane to a virtual pixel point p_v(u_v, v_v) in the projection viewpoint image plane is taken as an example to describe the point mapping process. The coordinates of each pixel point can be understood as texture image information. In addition, in this embodiment, to explain the projection relationship, the first reference viewpoint is referred to as the reference viewpoint and the second reference viewpoint as the projection viewpoint.
Let the internal parameter matrix of the camera coordinate system corresponding to the reference viewpoint be A_l, the rotation matrix of the transformation from the world coordinate system to the camera coordinate system be R_l, and the translation matrix be T_l. According to the transformation relation between the image coordinate system and the camera coordinate system, A_l can be expressed as:

A_l = [ f_x^l   0       O_x^l
        0       f_y^l   O_y^l
        0       0       1     ]

where f_x^l and f_y^l respectively represent the focal lengths of the reference viewpoint camera in the horizontal and vertical directions, and (O_x^l, O_y^l) represents the intersection of the optical axis and the reference viewpoint image plane, i.e., the principal point. The relation by which a position coordinate P_w in the world coordinate system is mapped to the reference pixel point p_l(u_l, v_l) on the reference viewpoint image plane is then:

z_l · p_l = A_l · (R_l · P_w + T_l)
where z_l represents the measured depth value of the pixel point in the depth image corresponding to the reference viewpoint, which, if the depth measurement is accurate, can be calculated from the quantized depth value in the depth image. The actual depth of field may range from zero to infinity, while the depth values in the depth image are quantized values, typically ranging from 0 to 255. After the camera captures an image, the corresponding texture image and depth image can be obtained, and the measured depth value follows. Inverting both sides of the above formula gives the coordinates in the world coordinate system:

P_w(X, Y, Z) = R_l^{-1} · (z_l · A_l^{-1} · p_l - T_l)
Similarly, the transformation relation from the projection viewpoint image plane coordinate system to the world coordinate system can be expressed by analogy as:

P_w(X, Y, Z) = R_v^{-1} · (z_v · A_v^{-1} · p_v - T_v)
Combining the above two formulas gives:

R_l^{-1} · (z_l · A_l^{-1} · p_l - T_l) = R_v^{-1} · (z_v · A_v^{-1} · p_v - T_v)

Transforming this formula yields the coordinates of the virtual pixel point mapped in the projection viewpoint image plane:

p_v = (1 / z_v) · A_v · (R_v · R_l^{-1} · (z_l · A_l^{-1} · p_l - T_l) + T_v)
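The two mapping steps can be coded directly from these formulas. The sketch below is a per-pixel illustration under the coordinate conventions above; the function and argument names are assumptions for illustration.

    import numpy as np

    def warp_pixel(p_l, z_l, A_l, R_l, T_l, A_v, R_v, T_v):
        # p_l is the homogeneous pixel (u_l, v_l, 1) of the reference viewpoint.
        # Step 1: back-project to world coordinates,
        #   P_w = R_l^{-1} (z_l A_l^{-1} p_l - T_l).
        P_w = np.linalg.inv(R_l) @ (z_l * np.linalg.inv(A_l) @ p_l - T_l)
        # Step 2: re-project into the projection viewpoint image plane,
        #   z_v p_v = A_v (R_v P_w + T_v).
        q = A_v @ (R_v @ P_w + T_v)
        return q[:2] / q[2]  # divide out z_v to obtain (u_v, v_v)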
Since the cameras are usually arranged in parallel, the synthesized projection viewpoint and the reference viewpoint should be parallel to each other and their camera parameters completely consistent, so that:

A_v = A_l = A,  R_v = R_l = R,  T_v - T_l = (-t_x, 0, 0)^T
where t_x represents the distance between the reference viewpoint and the projection viewpoint; more specifically, the horizontal distance between the reference viewpoint and the projection viewpoint in the X direction, i.e., the distance between the optical axis of the camera corresponding to the reference viewpoint and that of the camera corresponding to the projection viewpoint. Since the cameras are usually horizontally arranged, there is only parallax in the horizontal direction, and the horizontal distance in the X direction corresponds to the distance between the reference viewpoint and the projection viewpoint.
Substituting these relations, the above formula for p_v simplifies to:

z_v · p_v = z_l · p_l + A · (T_v - T_l)

which is equivalent to:

z_v · u_v = z_l · u_l - f_x · t_x,   z_v · v_v = z_l · v_l,   z_v = z_l

Since there is only horizontal parallax when the cameras are arranged in parallel, v_l = v_v; and since the depth values of corresponding pixels at the projection viewpoint and the reference viewpoint are equal, z_l = z_v. Only the camera focal length in the horizontal direction then needs to be considered, not the vertical one, so the formula further simplifies to:

u_v = u_l - Δd_p,   where Δd_p = (f_x · t_x) / z_l

Here Δd_p expresses the disparity between the reference viewpoint image and the projection viewpoint image at the corresponding pixel. Taking the left reference viewpoint corresponding to the left camera as an example, it is the disparity at the corresponding pixel between the left reference viewpoint image and the projection image obtained by projecting it.
It can be seen that, since the cameras are generally arranged in parallel, Δd_p represents the difference in abscissa between a given pixel in the reference viewpoint image and the corresponding pixel in the projection viewpoint image. On the basis of the known pixel coordinates of the reference viewpoint image, the abscissa can therefore be adjusted by this difference to obtain the pixel coordinates at the projection viewpoint, i.e., the projection image. It should be noted that this difference is usually taken as an absolute value, i.e., it is not negative.
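Under this parallel-camera simplification, the projection image can be formed by shifting each row of the texture image by its disparity. The following sketch does this for a single-channel texture; leaving unmapped positions at -1 to mark holes, and rounding disparities to integer pixel positions, are implementation choices for illustration, not specified by the patent.

    import numpy as np

    def project_texture(texture_l, disparity):
        # Shift each reference-viewpoint pixel horizontally by its disparity;
        # positions nothing maps to remain holes (-1), like the bright
        # occlusion regions described for fig. 4.
        h, w = disparity.shape
        proj = np.full((h, w), -1, dtype=np.int32)
        u_l = np.arange(w)
        for v in range(h):
            u_v = u_l - np.rint(disparity[v]).astype(int)
            ok = (u_v >= 0) & (u_v < w)
            proj[v, u_v[ok]] = texture_l[v, u_l[ok]]
        return proj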
However, noise usually exists in the depth image corresponding to the reference viewpoint image. As shown in fig. 3, this causes a pixel point p_l(u_l, v_l) in the reference viewpoint image to be mapped in the world three-dimensional coordinate system from P_w(X, Y, Z) to P_e(X, Y, Z), and finally its mapped position at the projection viewpoint changes from p_v(u_v, v_v) to p_e(u_e, v_e). If noise appears at the boundary of the foreground and the background, this positional difference may cause original foreground pixels to be mapped into the background area, producing background noise, and background pixels to be mapped to foreground positions, eroding the foreground, thereby degrading the quality of the projection image.
In this embodiment, taking image acquisition by left and right cameras as an example, the disparity between each pixel in the texture image and its projection pixel is determined from the relation among disparity, camera parameters and depth image information in the viewpoint mapping process. Since the camera parameters are fixed, the factor affecting the parallax is mainly the noise pixels in the depth image. The texture image is projected between the two reference viewpoints according to the parallax, and the pixel difference produced by the projection is determined; this difference reflects the noise condition of the corresponding pixels in the texture image and the depth image, from which the noise pixels in the depth image can be determined. The depth image can thus be judged quantitatively, to decide whether its accuracy meets the requirements of applications such as viewpoint mapping, or to provide a reference for subsequent filtering. This ensures that the depth image used in the viewpoint mapping process meets the precision requirement when the three-dimensional view is constructed, and finally ensures the accuracy of the three-dimensional view.
Specifically, referring to fig. 4, both example pictures are projection images of a specific texture image projected from the left reference viewpoint to the right reference viewpoint. It can be seen that the high-brightness regions in the pictures are holes produced by occlusion as a result of the pixel differences after projection, that is, places where the projected texture does not match the texture of the original image.
Optionally, the determining a noise pixel in the depth image according to the pixel difference value includes:
and comparing the pixel difference value with a preset threshold value, and when the pixel difference value aiming at the calibration pixel is greater than or equal to the preset threshold value, marking the pixel in the depth image corresponding to the calibration pixel as the noise pixel.
Specifically, taking image acquisition by the left and right cameras as an example: after the parallax between a specific pixel at the left reference viewpoint and the corresponding pixel projected to the right reference viewpoint is determined by the first formula, the projection image of the texture image from the left reference viewpoint to the right reference viewpoint is determined according to the parallax, and the pixel difference of each corresponding pixel point between the projection image and the texture image is then determined. Each pixel difference is compared with a preset threshold; if it is greater than or equal to the preset threshold, the reliability of the measured depth value of that specific pixel in the image acquired at the left reference viewpoint is insufficient, and the pixel in the depth image corresponding to that specific pixel is marked as a noise pixel.
In this embodiment, experimental verification shows that with a standard 50 mm medium-focus lens, a reference viewpoint spacing of 0.5 m and a preset threshold of 14, the method accurately screens out depth noise in most scenes. As shown in fig. 5, the white dots on the black background in (a) to (g) are all marked noise pixels. It can be seen that information on the noise pixels, especially the number, proportion and distribution position of boundary noise pixels, can be obtained, which has important reference value for depth image quality evaluation and subsequent depth image filtering.
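A sketch of this comparison step, under one reading of the patent in which the texture image and its projection are compared at the same coordinates (consistent with the mismatch described for fig. 4); it assumes the projected image from the earlier sketch, with holes marked -1 and skipped, and the empirically suggested threshold of 14. All names are illustrative.

    import numpy as np

    def mark_noise(texture_l, projected, threshold=14):
        # Pixel difference between each calibration pixel and the projection;
        # a difference >= threshold marks the corresponding depth-image pixel
        # as noise. Hole positions (-1) are excluded from the comparison.
        diff = np.abs(texture_l.astype(np.int32) - projected)
        return (diff >= threshold) & (projected >= 0)  # boolean noise mask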
Optionally, the method further comprises the steps of:
and determining the illumination difference between the reference image of the first reference viewpoint and the reference image of the second reference viewpoint, and adjusting the preset threshold value according to the illumination difference.
Optionally, the adjusting the preset threshold according to the illumination difference includes:
increasing the preset threshold value when the difference between the illumination difference and a preset reference value increases.
Decreasing the preset threshold when the difference between the illumination difference and a preset reference value decreases.
Specifically, the preset threshold has an overall positive correlation with the illumination difference; for example, the larger the illumination difference between the left and right reference viewpoint images, the larger the preset threshold. A preset reference value, which may be a fixed value or a range, can be set first. When the illumination difference equals that value or falls within that range, a default preset threshold may be employed. When the environment changes, or some other condition alters the illumination difference, the preset threshold may be adjusted according to the relation between the changed illumination difference and the preset reference value: if the illumination difference grows, or the gap between the illumination difference and the preset reference value grows, the preset threshold needs to be increased; conversely, it is decreased. The preset threshold may vary proportionally, exponentially, step-wise, or according to another function that maintains a positive correlation with the illumination difference. In particular, different cameras and different shooting scenarios may lead to different choices of preset threshold.
It should be noted that when the gap between the illumination difference and the preset reference value reaches a certain magnitude, indicating that the illumination difference is too large, the preset threshold may no longer be adjusted, since some non-noise depth pixels may be detected as noise.
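The description above fixes only a positive correlation and a cut-off once the illumination difference becomes too large; the linear rule below is one illustrative choice of such a function, with every parameter hypothetical rather than taken from the patent.

    def adjust_threshold(base, illum_diff, reference=0.0, gain=0.5, max_gap=None):
        # Grow the threshold with the gap between the illumination difference
        # and the preset reference value; beyond max_gap, stop adjusting.
        gap = abs(illum_diff - reference)
        if max_gap is not None:
            gap = min(gap, max_gap)
        return base + gain * gap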
Optionally, the method further comprises the steps of:
and determining ratio information of the noise pixels in all the pixels in the depth image and/or distribution area information of the noise pixels in the depth image according to the noise pixels.
Specifically, the depth image is composed of a plurality of pixel points, and after the noise pixels in the depth image are determined, information such as the distribution quantity, the position, the proportion and the like of the noise pixels in the depth image can be further determined, so that the depth image is quantitatively judged, whether the accuracy of the depth image meets the requirements of application such as viewpoint mapping and the like is determined, and the accuracy of the three-dimensional view construction is further ensured.
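Given a boolean noise mask such as the one produced by the earlier sketch, this quantitative information reduces to a few array reductions; a minimal sketch with illustrative names:

    import numpy as np

    def noise_statistics(mask):
        # Proportion of noise pixels among all depth-image pixels, and the
        # bounding box of their distribution area (top, left, bottom, right).
        count = int(mask.sum())
        ratio = count / mask.size
        if count == 0:
            return ratio, None
        ys, xs = np.nonzero(mask)
        return ratio, (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))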
As shown in fig. 6, the depth image noise labeling apparatus based on three-dimensional transformation according to an embodiment of the present invention includes:
and the acquisition module is used for acquiring the depth image and the texture image of the first reference viewpoint.
The processing module is used for determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint; and projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining a pixel difference value between the calibration pixel on the texture image and a corresponding pixel on a projection image of the texture image.
And the judging module is used for determining the noise pixel in the depth image according to the pixel difference value.
Optionally, the obtaining module is further configured to: and acquiring camera parameters corresponding to the first reference viewpoint and the second reference viewpoint.
The processing module is specifically configured to: determining the disparity from the depth image and the camera parameters.
Optionally, the camera parameters include a camera focal length corresponding to the first reference viewpoint and a distance between the camera optical axis corresponding to the first reference viewpoint and the camera optical axis corresponding to the second reference viewpoint.
The processing module is specifically configured to:
and determining the measured depth value of the calibration pixel according to the depth image.
Determining the disparity according to a first formula, the first formula being:

Δd_p = (f_x · t_x) / z_l

where Δd_p represents the parallax, f_x represents the camera focal length corresponding to the first reference viewpoint, t_x represents the distance between the camera optical axis corresponding to the first reference viewpoint and the camera optical axis corresponding to the second reference viewpoint, and z_l represents the measured depth value in the depth image.
Optionally, the determining module is specifically configured to: and comparing the pixel difference value with a preset threshold value, and when the pixel difference value aiming at the calibration pixel is greater than or equal to the preset threshold value, marking the pixel in the depth image corresponding to the calibration pixel as the noise pixel.
Optionally, the processing module is further configured to: and determining the illumination difference between the reference image of the first reference viewpoint and the reference image of the second reference viewpoint, and adjusting the preset threshold value according to the illumination difference.
Optionally, the processing module is specifically configured to: increasing the preset threshold value when the difference between the illumination difference and a preset reference value increases; decreasing the preset threshold when the difference between the illumination difference and a preset reference value decreases.
Optionally, the processing module is further configured to: and determining ratio information of the noise pixels in all the pixels in the depth image and/or distribution area information of the noise pixels in the depth image according to the noise pixels.
In another embodiment of the present invention, an apparatus for depth image noise labeling based on three-dimensional transformation includes a memory and a processor. The memory is used for storing the computer program. The processor is configured to implement the three-dimensional transformation based depth image noise labeling method as described above when the computer program is executed.
It should be noted that the device may be a computer device such as a server or a mobile terminal.
In another embodiment of the present invention, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the three-dimensional transformation-based depth image noise labeling method as described above.
The reader should understand that in the description of this specification, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example" or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A depth image noise marking method based on three-dimensional transformation is characterized by comprising the following steps:
acquiring a depth image and a texture image of a first reference viewpoint;
determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint;
projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining a pixel difference value between the calibration pixel on the texture image and a corresponding pixel on a projection image of the texture image;
and determining a noise pixel in the depth image according to the pixel difference value.
2. The three-dimensional transform-based depth image noise labeling method of claim 1, further comprising:
acquiring camera parameters corresponding to the first reference viewpoint and the second reference viewpoint;
the determining the disparity of the scaled pixels on the texture image when projected from the first reference viewpoint to the second reference viewpoint comprises:
determining the disparity from the depth image and the camera parameters.
3. The three-dimensional transformation-based depth image noise labeling method according to claim 2, wherein the camera parameters comprise a camera focal length corresponding to the first reference viewpoint and a distance between a camera optical axis corresponding to the first reference viewpoint and a camera optical axis corresponding to the second reference viewpoint;
the determining the disparity from the depth image and the camera parameters comprises:
determining the measured depth value of the calibration pixel according to the depth image;
determining the disparity according to a first formula, the first formula being:

Δd_p = (f_x · t_x) / z_l

wherein Δd_p represents the parallax, f_x represents the camera focal length corresponding to the first reference viewpoint, t_x represents the distance between the camera optical axis corresponding to the first reference viewpoint and the camera optical axis corresponding to the second reference viewpoint, and z_l represents the measured depth value.
4. The method according to any one of claims 1 to 3, wherein the determining the noise pixel in the depth image according to the pixel difference value comprises:
and comparing the pixel difference value with a preset threshold value, and when the pixel difference value aiming at the calibration pixel is greater than or equal to the preset threshold value, marking the pixel in the depth image corresponding to the calibration pixel as the noise pixel.
5. The three-dimensional transform-based depth image noise labeling method of claim 4, further comprising:
and determining the illumination difference between the reference image of the first reference viewpoint and the reference image of the second reference viewpoint, and adjusting the preset threshold value according to the illumination difference.
6. The three-dimensional transformation-based depth image noise labeling method according to claim 5, wherein the adjusting the preset threshold according to the illumination difference comprises:
increasing the preset threshold value when the difference between the illumination difference and a preset reference value increases;
decreasing the preset threshold when the difference between the illumination difference and a preset reference value decreases.
7. The three-dimensional transform-based depth image noise labeling method of claim 1, further comprising:
and determining ratio information of the noise pixels in all the pixels in the depth image and/or distribution area information of the noise pixels in the depth image according to the noise pixels.
8. A depth image noise marking device based on three-dimensional transformation is characterized by comprising:
the acquisition module is used for acquiring a depth image and a texture image of a first reference viewpoint;
the processing module is used for determining the parallax of the calibrated pixels on the texture image when the calibrated pixels are projected from the first reference viewpoint to the second reference viewpoint; projecting the texture image from the first reference viewpoint to the second reference viewpoint according to the parallax, and determining a pixel difference value between the calibration pixel on the texture image and a corresponding pixel on a projection image of the texture image;
and the judging module is used for determining the noise pixel in the depth image according to the pixel difference value.
9. The device for marking the noise of the depth image based on three-dimensional transformation is characterized by comprising a memory and a processor;
the memory for storing a computer program;
the processor, when executing the computer program, is configured to implement the three-dimensional transform-based depth image noise labeling method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the three-dimensional transform-based depth image noise labeling method according to any one of claims 1 to 7.
CN201910986081.XA 2019-10-17 2019-10-17 Three-dimensional transformation-based depth image noise marking method and device and storage medium Pending CN110691228A (en)

Priority Applications (1)

Application Number: CN201910986081.XA; Priority Date: 2019-10-17; Filing Date: 2019-10-17; Title: Three-dimensional transformation-based depth image noise marking method and device and storage medium

Applications Claiming Priority (1)

Application Number: CN201910986081.XA; Priority Date: 2019-10-17; Filing Date: 2019-10-17; Title: Three-dimensional transformation-based depth image noise marking method and device and storage medium

Publications (1)

Publication Number: CN110691228A; Publication Date: 2020-01-14

Family ID: 69113056

Family Applications (1)

Application Number: CN201910986081.XA; Status: Pending; Publication: CN110691228A (en)

Country Status (1)

Country: CN; Link: CN110691228A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150146030A1 (en) * 2013-11-26 2015-05-28 Pelican Imaging Corporation Array Camera Configurations Incorporating Constituent Array Cameras and Constituent Cameras
CN104935909A (en) * 2015-05-14 2015-09-23 清华大学深圳研究生院 Multi-image super-resolution method based on depth information
CN106803952A (en) * 2017-01-20 2017-06-06 宁波大学 With reference to the cross validation depth map quality evaluating method of JND model
CN109859124A (en) * 2019-01-11 2019-06-07 深圳奥比中光科技有限公司 A kind of depth image noise reduction method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524075A (en) * 2020-03-26 2020-08-11 北京迈格威科技有限公司 Depth image filtering method, image synthesis method, device, equipment and medium
CN111524075B (en) * 2020-03-26 2023-08-22 北京迈格威科技有限公司 Depth image filtering method, image synthesizing method, device, equipment and medium
CN116597111A (en) * 2023-03-15 2023-08-15 磅客策(上海)智能医疗科技有限公司 Processing method and processing device for three-dimensional image
CN116597111B (en) * 2023-03-15 2024-04-26 磅客策(上海)智能医疗科技有限公司 Processing method and processing device for three-dimensional image
CN117086500A (en) * 2023-08-17 2023-11-21 深圳市大德激光技术有限公司 Electrical control system of laser etching equipment

Similar Documents

Publication Publication Date Title
CN110568447B (en) Visual positioning method, device and computer readable medium
US8355565B1 (en) Producing high quality depth maps
JP6363863B2 (en) Information processing apparatus and information processing method
US20190012804A1 (en) Methods and apparatuses for panoramic image processing
US20180075592A1 (en) Multi view camera registration
WO2021031781A1 (en) Method and device for calibrating projection image and projection device
CN106815869B (en) Optical center determining method and device of fisheye camera
CN110691228A (en) Three-dimensional transformation-based depth image noise marking method and device and storage medium
CN106651870B (en) Segmentation method of image out-of-focus fuzzy region in multi-view three-dimensional reconstruction
JP7123736B2 (en) Image processing device, image processing method, and program
TWI722638B (en) Method and electronic device for a point cloud fusion, and computer storage medium thereof
CN113643414A (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN116527863A (en) Video generation method, device, equipment and medium based on virtual reality
US20220405968A1 (en) Method, apparatus and system for image processing
CN109685856A (en) Medical scanning object of which movement amplitude calculation method, device, equipment and storage medium
CN112970044A (en) Disparity estimation from wide-angle images
CN114788254B (en) Auxiliary focusing method, device and system
JP2018009927A (en) Image processing device, image processing method and program
CN113793392A (en) Camera parameter calibration method and device
JP6065670B2 (en) Three-dimensional measurement system, program and method.
TWI662694B (en) 3d image capture method and system
JP5926626B2 (en) Image processing apparatus, control method therefor, and program
US11120572B2 (en) Method, system and apparatus for associating a target object in images
KR20210112263A (en) Method for generating virtual viewpoint image nad apparatus for the same
KR20210084339A (en) Image association method, system and device

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20200114)