WO2024083006A1 - Three-dimensional imaging method, apparatus, device and storage medium - Google Patents
Three-dimensional imaging method, apparatus, device and storage medium
- Publication number
- WO2024083006A1 (PCT/CN2023/123920)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- depth map
- target
- depth
- normal vector
- prediction result
- Prior art date
Links
- 238000003384 imaging method Methods 0.000 title claims abstract description 54
- 239000013598 vector Substances 0.000 claims abstract description 127
- 238000005457 optimization Methods 0.000 claims abstract description 60
- 238000012545 processing Methods 0.000 claims abstract description 60
- 230000011218 segmentation Effects 0.000 claims abstract description 51
- 238000000034 method Methods 0.000 claims abstract description 23
- 230000006870 function Effects 0.000 claims description 26
- 238000009499 grossing Methods 0.000 claims description 26
- 238000010008 shearing Methods 0.000 claims description 10
- 238000004590 computer program Methods 0.000 claims description 5
- 238000000605 extraction Methods 0.000 claims description 3
- 230000000694 effects Effects 0.000 description 8
- 238000010586 diagram Methods 0.000 description 7
- 230000003287 optical effect Effects 0.000 description 7
- 238000010276 construction Methods 0.000 description 6
- 238000003491 array Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 238000012935 Averaging Methods 0.000 description 2
- 239000003086 colorant Substances 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Definitions
- Embodiments of the present application relate to computer technology, for example, to a three-dimensional imaging method, apparatus, device and storage medium.
- 3D cameras can be used to optically acquire three-dimensional images of scenes of interest.
- 3D imaging methods can include passive binocular 3D imaging, 3D imaging based on the time-of-flight principle, and structured light projection 3D imaging.
- passive binocular 3D imaging uses triangulation to indirectly calculate 3D information through image feature matching.
- 3D imaging based on the time-of-flight principle directly measures the flight time of light.
- Structured light projection 3D imaging actively projects known coded patterns to improve feature matching effects.
- Passive binocular 3D imaging has high requirements for the surface texture features of the object being measured, and cannot measure scenes with unclear textures, so it is not suitable as the eyes of industrial composite robots.
- the measurement accuracy of 3D imaging based on the time-of-flight principle depends on the accuracy of the light beam detection time.
- the imaging resolution and accuracy in close-range scenes are poor, so it is more used in long-range scenes, such as autonomous driving and long-distance search and detection.
- in structured light projection 3D imaging, the projected light is easily transmitted and reflected on transparent objects, so it is impossible to effectively perform 3D imaging on transparent objects.
- the embodiments of the present application provide a three-dimensional imaging method, device, equipment and storage medium to effectively improve the 3D imaging effect of transparent objects in the shooting scene.
- an embodiment of the present application provides a three-dimensional imaging method, comprising:
- acquiring a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object;
- inputting the target color image into a preset image processing model for image processing, and obtaining a transparent object segmentation result, a boundary prediction result and a normal vector prediction result in the target shooting scene according to an output of the preset image processing model;
- shearing the original depth map based on the transparent object segmentation result to obtain a first depth map without the depth information of the transparent object;
- performing global optimization of depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, and determining an optimized second depth map;
- determining a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- an embodiment of the present application further provides a three-dimensional imaging device, including:
- An image acquisition module configured to acquire a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene includes at least one transparent object;
- a target color image input module is configured to input the target color image into a preset image processing model for image processing, and obtain a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result in the target shooting scene according to an output of the preset image processing model;
- a shearing processing module configured to perform shearing processing on the original depth map based on the transparent object segmentation result to obtain a first depth map without the depth information of the transparent object;
- a global optimization module configured to perform global optimization of depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, and determine an optimized second depth map;
- the target three-dimensional image determination module is configured to determine a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- an embodiment of the present application further provides an electronic device, the electronic device comprising:
- at least one processor;
- a memory configured to store at least one program;
- when the at least one program is executed by the at least one processor, the at least one processor implements the three-dimensional imaging method provided in any embodiment of the present application.
- an embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a three-dimensional imaging method as provided in any embodiment of the present application.
- FIG1 is a flow chart of a three-dimensional imaging method provided by an embodiment of the present application.
- FIG2 is an exemplary diagram of a three-dimensional imaging process involved in an embodiment of the present application.
- FIG3 is a flow chart of another three-dimensional imaging method provided by an embodiment of the present application.
- FIG4 is a schematic structural diagram of a three-dimensional imaging device provided by an embodiment of the present application.
- FIG5 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present application.
- FIG1 is a flow chart of a three-dimensional imaging method provided by an embodiment of the present application. This embodiment is applicable to the case of three-dimensional imaging of a shooting scene containing transparent objects.
- the method can be performed by a three-dimensional imaging device, which can be implemented by software and/or hardware and integrated into an electronic device, such as a 3D camera, or a visual sensor of a composite robot in an industrial scene, so as to assist the robot in completing tasks such as scene recognition and target detection under complex working conditions.
- the method includes the following steps:
- S110 Acquire a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene includes at least one transparent object.
- transparent objects can include fully transparent objects that allow light to be fully transmitted and semi-transparent objects that allow light to be partially transmitted, such as glass cups and plastic bottles.
- the target shooting scene may refer to the scene area currently being shot. There may be at least one transparent object in the target shooting scene. There may be only transparent objects in the target shooting scene, or there may be other non-transparent objects in addition to transparent objects.
- the target color image may be an RGB image synthesized by using the three colors of red, blue and green.
- the original depth map may be a depth map containing the depth information of transparent objects.
- a 2D camera may be used to shoot a target shooting scene to obtain a target color image corresponding to the target shooting scene.
- a 3D camera may be used to shoot a target shooting scene to obtain an original depth map corresponding to the target shooting scene.
- the preset image processing model can be a neural network model for segmenting and extracting transparent objects from color images, predicting the boundaries between various objects in the image, and predicting the normal vectors of various element positions in the image.
- the transparent object segmentation result can be represented by a grayscale image, for example, the transparent object location area is a white area with a grayscale value of 255, and other areas except the transparent object are black areas with a grayscale value of 0.
- the boundary prediction result can refer to a boundary prediction image that is consistent with the size of the input target color image.
- the boundary prediction image includes the boundary between the transparent object and the background represented by lines and the boundary between the transparent object and the non-transparent object.
- the normal vector prediction result can refer to a normal vector prediction image that is consistent with the size of the input target color image.
- different colors can be used to represent the normal vector corresponding to each element position in the image.
- the normal vector corresponding to each element position can refer to the normal vector of the plane formed by the element position and other adjacent element positions.
- the preset image processing model is obtained by pre-training the model based on sample data.
- the sample data includes a sample color image containing at least one transparent object, and a transparent object segmentation label, a boundary label, and a normal vector label obtained by calibrating the sample color image.
- FIG2 shows an example diagram of a three-dimensional imaging process.
- the target color image can be input into a pre-trained preset image processing model.
- the preset image processing model can process the input target color image, and simultaneously determine and output the transparent object segmentation result, boundary prediction result and normal vector prediction result in the target shooting scene, so that these results can be quickly obtained by using the preset image processing model.
- the preset image processing model may include: a coding sub-model, a first decoding branch sub-model, a second decoding branch sub-model, and a third decoding branch sub-model.
- S120 may include: inputting the target color image into the coding sub-model for feature extraction to obtain extracted target image feature information; inputting the target image feature information into the first decoding branch sub-model for position prediction of transparent objects to determine the transparent object segmentation result; inputting the target image feature information into the second decoding branch sub-model for boundary prediction of transparent objects and non-transparent objects to determine the boundary prediction result; inputting the target image feature information into the third decoding branch sub-model for prediction of normal vectors corresponding to pixel point positions to determine the normal vector prediction result.
- the encoding sub-model may refer to a network model that performs image encoding on a color image and extracts image features from the color image.
- for example, only the first two inference stages of the original Swin Transformer network structure may be used as the encoding sub-model, and the third and fourth inference stages may be deleted, thereby reducing the overall time overhead of the model.
- the single-branch decoding network in the Swin Transformer network structure may be modified into a three-branch decoding network model, namely, the first decoding branch sub-model, the second decoding branch sub-model, and the third decoding branch sub-model.
- the first decoding branch sub-model, the second decoding branch sub-model, and the third decoding branch sub-model are three parallel decoding network models for predicting different information, so that the transparent object segmentation results, boundary prediction results, and normal vector prediction results in the input image can be simultaneously predicted through the first decoding branch sub-model, the second decoding branch sub-model, and the third decoding branch sub-model, thereby avoiding repeated inference of the encoding network, improving the information prediction efficiency, and also improving the three-dimensional imaging efficiency.
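- The following is a minimal, illustrative PyTorch sketch (not part of the patent) of the shared-encoder, three-parallel-decoder layout described above; the small convolutional encoder merely stands in for the first two Swin Transformer stages, and the head names are hypothetical.

```python
import torch
import torch.nn as nn

class TriBranchModel(nn.Module):
    """Toy stand-in for the preset image processing model:
    one shared encoder, three parallel decoding branches."""
    def __init__(self, feat: int = 32):
        super().__init__()
        # Shared encoder (the patent uses the first two Swin Transformer stages).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        def head(out_ch):  # minimal decoder branch: upsample back to input resolution
            return nn.Sequential(
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
                nn.Conv2d(feat, out_ch, 1),
            )
        self.seg_head = head(1)       # transparent-object segmentation
        self.boundary_head = head(1)  # boundary prediction
        self.normal_head = head(3)    # per-pixel normal vector prediction

    def forward(self, rgb):
        f = self.encoder(rgb)                          # encode once, decode three times
        seg = torch.sigmoid(self.seg_head(f))
        boundary = torch.sigmoid(self.boundary_head(f))
        normals = nn.functional.normalize(self.normal_head(f), dim=1)
        return seg, boundary, normals

# usage: seg, boundary, normals = TriBranchModel()(torch.rand(1, 3, 480, 640))
```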
- the original depth map is sheared to obtain a first depth map with the depth information of the transparent object removed.
- the first depth map may refer to a depth map that does not include the depth information of the transparent object.
- when the original depth map is collected, the transmission and reflection of the projected light on the transparent object make the collected depth information of the transparent object inaccurate, so it is necessary to cut out the erroneously predicted depth information at the transparent object position in the original depth map.
- according to the transparent object position information in the transparent object segmentation result, the depth information at the position of the transparent object in the original depth map is determined, and the transparent object depth information in the original depth map is cut and removed, thereby obtaining a first depth map from which the erroneously predicted transparent object depth information has been removed.
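- As a minimal sketch of this shearing step (array names are assumptions, not the patent's code), the depth values inside the predicted transparent-object mask can simply be invalidated:

```python
import numpy as np

def shear_depth(original_depth: np.ndarray, transparent_mask: np.ndarray) -> np.ndarray:
    """Cut out the unreliable depth readings at transparent-object pixels.

    original_depth:   (H, W) float array, 0 = no measurement
    transparent_mask: (H, W) segmentation result, nonzero/True = transparent object
    """
    first_depth = original_depth.copy()
    first_depth[transparent_mask.astype(bool)] = 0.0  # mark as missing / invalid
    return first_depth
```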
- the second depth map may be an optimal depth map obtained after completing the depth information of the first depth map, that is, a depth map containing the most accurate depth information of the transparent object.
- the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result can be input into a global optimizer.
- the global optimizer solves the optimal solution for the depth information based on the input information and outputs the obtained optimal depth map, i.e., the second depth map, thereby completing the depth information of the depth missing area caused by the transparency of the object.
- S140 may include: taking the target depth map as the optimization object, constructing a target optimization function corresponding to the target depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result; minimizing the target optimization function, and determining the target depth map corresponding to the minimum solution as the optimized second depth map.
- the target depth map can be a depth map obtained after completing the depth information of the first depth map. That is to say, the target depth map is a complete depth map with the same size as the original depth map.
- the target depth map may refer to the optimization object in the target optimization function.
- the depth information of each pixel in the target depth map can be adjusted and optimized so as to determine the optimal depth information corresponding to each pixel.
- the position of each pixel in the target depth map can be modeled based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, and a target optimization function with the target depth map as the optimization object is constructed, thereby converting the optimization of the entire depth map into the mathematical process of solving the minimum of an n-ary linear polynomial; the sparse square root Cholesky decomposition method can be used to minimize the target optimization function, where each solution corresponds to a target depth map containing specific depth values and the minimum solution obtained is the optimal solution.
- the target depth map corresponding to the minimum solution is used as the optimal depth map, that is, the second depth map, so as to achieve depth completion of the depth missing area caused by the transparency of the object.
- S150 Determine a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- the complete second depth map after depth completion optimization can be converted according to the pre-calibrated camera parameters to obtain the final three-dimensional point cloud data in the target shooting scene, and based on the three-dimensional point cloud data, the target three-dimensional image corresponding to the target shooting scene containing transparent objects can be more accurately constructed.
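- The conversion from the optimized second depth map to three-dimensional point cloud data can be illustrated with a standard pinhole back-projection using pre-calibrated camera intrinsics; the sketch below is not from the patent, and the intrinsic values in the usage comment are placeholders.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project a (H, W) depth map into an (N, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth

# usage with placeholder intrinsics:
# cloud = depth_to_point_cloud(second_depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```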
- on the basis of the target color image and the original depth map, a post-processing optimization process is implemented by using the preset image processing model and the global optimization method, so that the three-dimensional imaging effect of the imaging system for transparent objects can be effectively improved under limited computing power overhead.
- the technical solution of the present embodiment is to input the target color image corresponding to the target shooting scene into a preset image processing model for image processing, obtain the transparent object segmentation result, boundary prediction result and normal vector prediction result in the target shooting scene, and based on the transparent object segmentation result, perform shearing processing on the original depth map corresponding to the target shooting scene to obtain a first depth map with the depth information of the transparent object removed, and perform global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, so as to perform depth completion on the depth missing area caused by the transparency of the object, and based on the complete second depth map after completion and optimization, the target three-dimensional image corresponding to the target shooting scene containing the transparent object can be more accurately constructed, thereby effectively improving the three-dimensional imaging effect of the transparent object.
- FIG3 is a flow chart of another three-dimensional imaging method provided in an embodiment of the present application. Based on the above embodiment, this embodiment describes in detail the process of constructing the target optimization function corresponding to the target depth map. The explanations of the terms that are the same or corresponding to the above embodiments are not repeated here.
- another three-dimensional imaging method includes the following steps:
- S310 Acquire a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene includes at least one transparent object.
- S320 Input the target color image into the preset image processing model for image processing, and obtain the transparent object segmentation result, the boundary prediction result and the normal vector prediction result in the target shooting scene according to the output of the preset image processing model.
- S330 Based on the transparent object segmentation result, shear the original depth map to obtain a first depth map with the depth information of the transparent object removed.
- S340 Taking the target depth map as the optimization object, construct a depth deviation sub-function, i.e., a pixel-level depth deviation loss, corresponding to the target depth map based on the first depth map.
- a depth deviation subfunction can be constructed based on the depth value of each pixel in the first depth map and the target depth map, and it can be required that the deviation between the predicted depth value of the current pixel in the target depth map and the original depth value of the pixel in the first depth map is as small as possible.
- S340 may include: obtaining a first depth value corresponding to each first pixel in the first depth map; determining a depth deviation between a first depth value corresponding to the same first pixel and an optimized depth value in the first depth map and the target depth map; and constructing a depth deviation sub-function corresponding to the target depth map based on multiple depth deviations.
- the first pixel is the other pixel in the first depth map except the transparent object pixel.
- the first depth map cuts out the transparent object depth information.
- the transparent object pixel in the first depth map is an invalid pixel without a first depth value.
- each valid pixel with a first depth value in the first depth map can be used as the first pixel, that is, p_1 ∈ T_obs.
- the depth value corresponding to each pixel in the target depth map is called an optimized depth value.
- the depth deviation between the first depth value D_0(p_1) corresponding to the same first pixel p_1 and the optimized depth value D(p_1) is determined, and the square values of multiple depth deviations corresponding to multiple first pixels can be added, and the obtained addition result is used as the constructed depth deviation sub-function E_D.
- the constructed depth deviation sub-function is: E_D = Σ_{p_1 ∈ T_obs} (D(p_1) − D_0(p_1))^2.
- the invalid pixels in the first depth map are not calculated in the depth deviation sub-function. Therefore, for the cut-off transparent object area, the deviation of the optimized depth compared to the original depth is not required to be as small as possible. It is only required that the deviation of the optimized depth of the effective pixel points compared to the original depth is as small as possible, so as to ensure the accuracy of the depth optimization of the effective pixel points.
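- For illustration only (assumed array names), the depth deviation sub-function E_D can be evaluated for a candidate target depth map as follows, skipping the invalid (cut-out) pixels exactly as described above:

```python
import numpy as np

def depth_deviation_energy(target_depth: np.ndarray, first_depth: np.ndarray) -> float:
    """E_D: sum of squared deviations over valid (non-transparent) pixels of the first depth map."""
    valid = first_depth > 0  # T_obs: pixels that still carry an observed depth value
    diff = target_depth[valid] - first_depth[valid]
    return float(np.sum(diff ** 2))
```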
- S350 construct a normal vector deviation sub-function corresponding to the target depth map based on the boundary prediction result and the normal vector prediction result.
- the normal vector deviation sub-function is a pixel-level normal vector deviation loss.
- a normal vector deviation subfunction can be constructed based on the normal vectors calculated for each pixel point and adjacent pixel points in the target depth map and the predicted normal vector corresponding to each pixel point in the normal vector prediction result, and it can be required that the angle between the normal vector calculated for each pixel point and adjacent pixel point except the boundary pixels in the target depth map and the predicted normal vector of the pixel in the normal vector prediction result be as small as possible.
- S350 may include: obtaining boundary pixel points in the boundary prediction result; determining the first normal vector corresponding to each second pixel point in the target depth map; obtaining the second normal vector corresponding to each second pixel point in the normal vector prediction result; determining the vector angle corresponding to each second pixel point based on the first normal vector and the second normal vector corresponding to each second pixel point; and constructing a normal vector deviation subfunction corresponding to the target depth map based on multiple vector angles.
- the second pixel points are other pixel points in the target depth map except the boundary pixel points.
- all pixel points on each boundary in the boundary prediction result can be used as boundary pixel points, and all pixel points in the target depth map except all boundary pixel points can be determined as second pixel points, that is, p_2 ∈ N.
- the first normal vector v(p_2, q) corresponding to the second pixel point can be determined in the plane formed by the second pixel point and its adjacent pixel points q in the four directions of up, down, left and right in the target depth map, and the second normal vector N(p_2) of the second pixel point in the normal vector prediction result is obtained, and based on the first normal vector and the second normal vector corresponding to the second pixel point, the vector angle <v(p_2, q), N(p_2)> corresponding to the second pixel point is determined.
- the square values of multiple vector angles corresponding to multiple second pixel points can be added, and the obtained addition result can be used as the constructed normal vector deviation subfunction.
- the constructed normal vector deviation sub-function is: E_N = Σ_{p_2 ∈ N} <v(p_2, q), N(p_2)>^2, where <·,·> denotes the angle between the two vectors.
- the normal vector deviation sub-function does not calculate the pixel points predicted as boundaries in the target depth map, thereby allowing the normal vector at the boundary position in the target shooting scene to change dramatically, thereby effectively ensuring the accuracy of depth optimization.
- determining the first normal vector corresponding to each second pixel in the target depth map may include: determining multiple normal vectors corresponding to each second pixel based on the pixel position of each second pixel in the target depth map and the two pixel positions of two adjacent pixels in each two adjacent directions, wherein the adjacent directions include four directions of up, down, left and right; averaging the multiple normal vectors corresponding to each second pixel to determine the first normal vector corresponding to each second pixel.
- the pixel point position of the second pixel point can be connected with the two pixel point positions of the two adjacent pixel points in each of the two adjacent directions, and the planes corresponding to each of the four directions of up, down, left, and right can be constructed, and the normal vector of each plane can be determined, and the normal vectors corresponding to the second pixel point can be averaged, and the obtained average normal vector can be determined as the first normal vector corresponding to the second pixel point.
- the normal vector deviation can be more accurately characterized, and the global optimization effect of the depth information can be improved.
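- The following sketch (assumed inputs: back-projected 3D points of the target depth map and the predicted normal map) illustrates computing the averaged first normal vector from the planes formed with the up/down/left/right neighbours and the vector angle against the predicted second normal; it is an illustration, not the patent's implementation.

```python
import numpy as np

def first_normal_and_angle(points: np.ndarray, normals_pred: np.ndarray, v: int, u: int) -> float:
    """points:       (H, W, 3) back-projected 3D points of the target depth map
    normals_pred: (H, W, 3) second normals from the normal vector prediction result
    Returns the vector angle <v(p2, q), N(p2)> in radians for the pixel at (v, u)."""
    p = points[v, u]
    up, down = points[v - 1, u], points[v + 1, u]
    left, right = points[v, u - 1], points[v, u + 1]
    # Normals of the planes spanned by the pixel and its neighbours in every two adjacent directions.
    candidates = [np.cross(up - p, right - p),
                  np.cross(right - p, down - p),
                  np.cross(down - p, left - p),
                  np.cross(left - p, up - p)]
    n1 = np.mean(candidates, axis=0)
    n1 /= (np.linalg.norm(n1) + 1e-12)            # averaged first normal vector
    n2 = normals_pred[v, u]
    n2 = n2 / (np.linalg.norm(n2) + 1e-12)        # predicted second normal vector
    cos = np.clip(np.dot(n1, n2), -1.0, 1.0)
    return float(np.arccos(cos))                  # vector angle used in E_N
```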
- S360 Based on the boundary prediction result, construct a depth smoothing sub-function corresponding to the target depth map; the depth smoothing sub-function is a pixel-level depth smoothing loss.
- a depth smoothing subfunction can be constructed based on the depth values of each pixel and adjacent pixels in the target depth map, and the depth value change between each pixel and adjacent pixels except for boundary pixels in the target depth map can be required to be as small as possible.
- S360 may include: obtaining boundary pixel points in the boundary prediction result; determining the change depth between each second pixel point and the adjacent pixel point in each adjacent direction according to the optimized depth value corresponding to each second pixel point in the target depth map and the adjacent depth value corresponding to the adjacent pixel point in each adjacent direction; and constructing a depth smoothing sub-function corresponding to the target depth map based on the change depths corresponding to the plurality of second pixel points in each adjacent direction.
- the second pixel points are other pixel points in the target depth map except the boundary pixel points.
- the adjacent directions include four directions: up, down, left, and right.
- all pixel points on each boundary in the boundary prediction result can be used as boundary pixel points
- all pixel points in the target depth map except all boundary pixel points can be determined as second pixel points, that is, p_2 ∈ N.
- the optimized depth value D(p_2) corresponding to the second pixel point in the target depth map can be subtracted from the adjacent depth value D(q) corresponding to the adjacent pixel point in each adjacent direction to obtain the change depth between the second pixel point and the adjacent pixel point in each adjacent direction.
- for each adjacent direction, the square values of the change depths corresponding to the second pixel points in that direction can be added, and the obtained addition result is used as the depth smoothing sub-function constructed for that direction, i.e.: E_S = Σ_{p_2 ∈ N} (D(p_2) − D(q))^2, where q is the adjacent pixel point of p_2 in the given adjacent direction.
- the multiple depth smoothing sub-functions corresponding to the multiple adjacent directions may be solved jointly, or the multiple depth smoothing sub-functions corresponding to the multiple adjacent directions may be averaged, and the obtained average function is used as the depth smoothing sub-function.
- the depth smoothing subfunction does not consider the consistency of depth changes of the pixels predicted as boundaries in the target depth map, thereby allowing the depth at the boundary position in the target shooting scene to jump, thereby effectively ensuring the accuracy of depth optimization.
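- A minimal sketch of the depth smoothing term (assumed array names), here simply summing the per-direction sub-functions over the four adjacent directions while skipping predicted boundary pixels; averaging the per-direction terms, as also mentioned above, would work analogously:

```python
import numpy as np

def smoothness_energy(target_depth: np.ndarray, boundary_mask: np.ndarray) -> float:
    """E_S summed over the four adjacent directions, excluding predicted boundary pixels."""
    non_boundary = ~boundary_mask.astype(bool)
    interior = non_boundary.copy()
    interior[0, :] = interior[-1, :] = interior[:, 0] = interior[:, -1] = False  # avoid wrap-around
    total = 0.0
    for dv, du in ((-1, 0), (1, 0), (0, -1), (0, 1)):          # the four adjacent directions
        neighbour = np.roll(target_depth, shift=(dv, du), axis=(0, 1))
        total += float(np.sum((target_depth[interior] - neighbour[interior]) ** 2))
    return total
```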
- S370 construct a target optimization function corresponding to the target depth map based on the depth deviation sub-function, the normal vector deviation sub-function and the depth smoothing sub-function.
- based on the depth deviation weight λ_D, the normal vector deviation weight λ_N and the depth smoothing weight λ_S, the depth deviation sub-function, the normal vector deviation sub-function and the depth smoothing sub-function are weighted and summed, and the obtained summation result is used as the constructed target optimization function E, that is, E = λ_D·E_D + λ_N·E_N + λ_S·E_S. The depth deviation weight λ_D can be greater than the normal vector deviation weight λ_N and the depth smoothing weight λ_S to ensure the accuracy of the depth optimization.
- for example, the depth deviation weight λ_D can be 1000, and the normal vector deviation weight λ_N and the depth smoothing weight λ_S are both 1.
- the sparse square root Cholesky decomposition method can be used to minimize the target optimization function, each solution corresponds to a target depth map, the minimum solution obtained is the optimal solution, and the target depth map corresponding to the minimum solution is used as the optimal depth map, that is, the second depth map, thereby achieving depth completion of the depth missing area caused by the transparency of the object.
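- To make the linear least-squares formulation concrete, the sketch below assembles the data and smoothness terms as rows of a sparse system and solves the normal equations with SciPy; since the system matrix is symmetric positive (semi-)definite, a sparse Cholesky factorization as named in this embodiment could be substituted. The normal-vector term is omitted for brevity (it would contribute additional linear rows after linearization), and the weights and array names are assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def optimize_depth(first_depth: np.ndarray, boundary_mask: np.ndarray,
                   lambda_d: float = 1000.0, lambda_s: float = 1.0) -> np.ndarray:
    """Minimize lambda_d*E_D + lambda_s*E_S over all pixel depths (normal term omitted)."""
    h, w = first_depth.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    r = 0

    # Data term rows: sqrt(lambda_d) * (D(p) - D0(p)) for pixels with an observed depth.
    for i in np.flatnonzero(first_depth.ravel() > 0):
        rows.append(r); cols.append(i); vals.append(np.sqrt(lambda_d))
        rhs.append(np.sqrt(lambda_d) * first_depth.ravel()[i])
        r += 1

    # Smoothness term rows: sqrt(lambda_s) * (D(p) - D(q)) for non-boundary pixels and
    # their right / bottom neighbours (each neighbouring pair counted once).
    non_boundary = ~boundary_mask.astype(bool)
    for dv, du in ((0, 1), (1, 0)):
        keep = non_boundary[: h - dv, : w - du]
        for pi, qi in zip(idx[: h - dv, : w - du][keep], idx[dv:, du:][keep]):
            rows += [r, r]; cols += [pi, qi]
            vals += [np.sqrt(lambda_s), -np.sqrt(lambda_s)]
            rhs.append(0.0)
            r += 1

    A = sp.csr_matrix((vals, (rows, cols)), shape=(r, n))
    b = np.asarray(rhs)
    # Normal equations; the small diagonal term keeps fully unconstrained pixels solvable.
    lhs = (A.T @ A + 1e-8 * sp.identity(n)).tocsc()
    depth = spla.spsolve(lhs, A.T @ b)
    return depth.reshape(h, w)
```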
- S390 Determine a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- the technical solution of this embodiment by respectively constructing a depth deviation sub-function, a normal vector deviation sub-function and a depth smoothing sub-function corresponding to the target depth map, can perform global optimization on the depth deviation loss, the normal vector deviation loss and the depth smoothing loss, and thus can perform depth completion more accurately, thereby ensuring the global optimization effect of the depth information and improving the three-dimensional imaging effect of transparent objects.
- the following is an embodiment of a three-dimensional imaging device provided in an embodiment of the present application.
- the device and the three-dimensional imaging methods of the above-mentioned embodiments belong to the same inventive concept.
- FIG4 is a schematic diagram of the structure of a three-dimensional imaging device provided in an embodiment of the present application. This embodiment is applicable to the case of performing three-dimensional imaging on a shooting scene containing a transparent object.
- the device includes: an image acquisition module 410, a target color image input module 420, a shearing processing module 430, a global optimization module 440, and a target three-dimensional image determination module 450.
- the image acquisition module 410 is configured to obtain a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object;
- the target color image input module 420 is configured to input the target color image into a preset image processing model for image processing, and obtain the transparent object segmentation result, boundary prediction result and normal vector prediction result in the target shooting scene according to the output of the preset image processing model;
- the shearing processing module 430 is configured to perform shearing processing on the original depth map based on the transparent object segmentation result to obtain a first depth map without the depth information of the transparent object;
- the global optimization module 440 is configured to perform global optimization of depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, and determine an optimized second depth map;
- a target three-dimensional image determination module 450 is configured to determine a target three-dimensional image corresponding to the target shooting scene based on the second depth map
- the technical solution of the present embodiment is to input the target color image corresponding to the target shooting scene into a preset image processing model for image processing, obtain the transparent object segmentation result, boundary prediction result and normal vector prediction result in the target shooting scene, and based on the transparent object segmentation result, perform shearing processing on the original depth map corresponding to the target shooting scene to obtain a first depth map with the depth information of the transparent object removed, and perform global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result, so as to perform depth completion on the depth missing area caused by the transparency of the object, and based on the complete second depth map after completion and optimization, the target three-dimensional image corresponding to the target shooting scene containing the transparent object can be more accurately constructed, thereby effectively improving the three-dimensional imaging effect of the transparent object.
- the global optimization module 440 includes:
- a target optimization function construction unit configured to take a target depth map as an optimization object, and to construct a target optimization function corresponding to the target depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, wherein the target depth map is a depth map obtained after completing the depth information of the first depth map;
- the target optimization function solving unit is configured to minimize the target optimization function and determine the target depth map corresponding to the minimum solution as the optimized second depth map.
- the target optimization function construction unit includes:
- a depth deviation sub-function construction sub-unit configured to take the target depth map as an optimization object and construct a depth deviation sub-function corresponding to the target depth map based on the first depth map;
- a normal vector deviation sub-function construction sub-unit configured to construct a normal vector deviation sub-function corresponding to the target depth map based on the boundary prediction result and the normal vector prediction result;
- a depth smoothing sub-function construction sub-unit configured to construct a depth smoothing sub-function corresponding to the target depth map based on the boundary prediction result
- the target optimization function construction subunit is configured to construct a target optimization function corresponding to the target depth map based on the depth deviation subfunction, the normal vector deviation subfunction and the depth smoothing subfunction.
- the depth deviation sub-function construction sub-unit is configured to:
- obtain a first depth value corresponding to each first pixel in the first depth map, wherein the first pixel is a pixel other than a transparent object pixel in the first depth map; determine a depth deviation between the first depth value corresponding to the same first pixel and the optimized depth value in the first depth map and the target depth map; and construct a depth deviation sub-function corresponding to the target depth map based on multiple depth deviations.
- the normal vector deviation sub-function construction sub-unit is configured to:
- obtain boundary pixel points in the boundary prediction result; determine a first normal vector corresponding to each second pixel point in the target depth map, wherein the second pixel point is a pixel point other than the boundary pixel points in the target depth map; obtain a second normal vector corresponding to each second pixel point in the normal vector prediction result; determine a vector angle corresponding to each second pixel point based on the first normal vector and the second normal vector corresponding to each second pixel point; and construct a normal vector deviation sub-function corresponding to the target depth map based on multiple vector angles.
- the normal vector deviation sub-function construction sub-unit is further configured to:
- determine multiple normal vectors corresponding to each second pixel based on the pixel position of each second pixel in the target depth map and the two pixel positions of the two adjacent pixels in every two adjacent directions, wherein the adjacent directions include the four directions of up, down, left and right; and average the multiple normal vectors corresponding to each second pixel to determine the first normal vector corresponding to each second pixel.
- the depth smoothing sub-function construction sub-unit is configured to:
- obtain boundary pixel points in the boundary prediction result; determine the change depth between each second pixel point and the adjacent pixel point in each adjacent direction according to the optimized depth value corresponding to each second pixel point in the target depth map and the adjacent depth value corresponding to the adjacent pixel point in each adjacent direction, wherein the second pixel points are the pixel points in the target depth map other than the boundary pixel points; and construct a depth smoothing sub-function corresponding to the target depth map based on the change depths corresponding to the plurality of second pixel points in each adjacent direction.
- the preset image processing model includes: a coding sub-model, a first decoding branch sub-model, a second decoding branch sub-model and a third decoding branch sub-model;
- the target color image input module 420 is configured to: input the target color image into the encoding sub-model for feature extraction to obtain the extracted target image feature information; input the target image feature information into the first decoding branch sub-model to predict the position of transparent objects and determine the transparent object segmentation result; input the target image feature information into the second decoding branch sub-model to predict the boundary between transparent objects and non-transparent objects and determine the boundary prediction result; input the target image feature information into the third decoding branch sub-model to predict the normal vector corresponding to the pixel point position and determine the normal vector prediction result.
- the three-dimensional imaging device provided in the embodiments of the present application can execute the three-dimensional imaging method provided in any embodiment of the present application, and has corresponding functional modules for executing the three-dimensional imaging method.
- FIG5 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- FIG5 shows a block diagram of an exemplary electronic device 12 suitable for implementing the embodiments of the present application.
- the electronic device 12 shown in FIG5 is only an example and should not bring any limitation to the functions and scope of use of the embodiments of the present application.
- the electronic device 12 is in the form of a general purpose computing device.
- the components of the electronic device 12 may include, but are not limited to, at least one processor or processing unit 16, a system memory 28, and a bus 18 connecting various system components (including the system memory 28 and the processing unit 16).
- Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
- these architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
- the electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media, removable and non-removable media.
- the system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
- the electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 5 , commonly referred to as a “hard drive”).
- a disk drive for reading and writing removable non-volatile magnetic disks (such as "floppy disks") and an optical disk drive for reading and writing removable non-volatile optical disks (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc (DVD-ROM) or other optical media) may also be provided.
- each drive may be connected to the bus 18 via at least one data medium interface.
- the system memory 28 may include at least one program product having a set (e.g., at least one) of program modules. These program modules are configured to perform the functions of various embodiments of the present application.
- a program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, at least one application program, other program modules, and program data, each of which or some combination may include an implementation of a network environment.
- Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
- the electronic device 12 may also communicate with at least one external device 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with at least one device that enables a user to interact with the electronic device 12, and/or with any device (such as a network card or a modem) that enables the electronic device 12 to communicate with at least one other computing device. Such communication can be performed through an input/output (I/O) interface 22.
- the electronic device 12 can also communicate with at least one network (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through a network adapter 20.
- the network adapter 20 communicates with other modules of the electronic device 12 through the bus 18. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
- the processing unit 16 executes various functional applications and data processing by running the program stored in the system memory 28, for example, implementing a three-dimensional imaging method provided in an embodiment of the present application, the method comprising:
- acquiring a target color image and an original depth map corresponding to a target shooting scene that contains at least one transparent object; inputting the target color image into a preset image processing model and obtaining a transparent object segmentation result, a boundary prediction result and a normal vector prediction result; shearing the original depth map based on the transparent object segmentation result to obtain a first depth map without the depth information of the transparent object; performing global optimization of depth information to determine an optimized second depth map; and determining a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- the processor can also implement the technical solution of the three-dimensional imaging method provided in any embodiment of the present application.
- This embodiment provides a computer-readable storage medium having a computer program stored thereon.
- when the program is executed by a processor, the steps of the three-dimensional imaging method provided in any embodiment of the present application are implemented.
- the method includes: acquiring a target color image and an original depth map corresponding to a target shooting scene that contains at least one transparent object; inputting the target color image into a preset image processing model and obtaining a transparent object segmentation result, a boundary prediction result and a normal vector prediction result; shearing the original depth map based on the transparent object segmentation result to obtain a first depth map without the depth information of the transparent object; performing global optimization of depth information to determine an optimized second depth map; and determining a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
- the computer storage medium of the embodiment of the present application may adopt any combination of at least one computer-readable medium.
- the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
- the computer-readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
- a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus or device.
- a computer-readable signal medium may include a data signal transmitted in a baseband or as part of a carrier wave, in which a computer-readable program code is carried. Such a transmitted data signal may be in a variety of formats.
- the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
- the program code contained on the computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
- Computer program code for performing the operation of the present application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages, such as Java, Smalltalk, C++, and also conventional procedural programming languages-such as "C" language or similar programming languages.
- the program code can be executed entirely on the user's computer, partially on the user's computer, as an independent software package, partially on the user's computer and partially on a remote computer, or completely on a remote computer or server.
- the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., using an Internet service provider to connect through the Internet).
- the modules or steps of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module.
- the present application is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
Abstract
Embodiments of the present application disclose a three-dimensional imaging method, apparatus, device and storage medium. The method includes: acquiring a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object; inputting the target color image into a preset image processing model, and obtaining a transparent object segmentation result, a boundary prediction result and a normal vector prediction result according to an output of the preset image processing model; shearing the original depth map based on the transparent object segmentation result to obtain a first depth map with the depth information of the transparent object removed; performing global optimization of the depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result and the normal vector prediction result to determine an optimized second depth map; and determining a target three-dimensional image corresponding to the target shooting scene based on the second depth map.
Description
This application claims priority to Chinese Patent Application No. 202211281815.2, filed with the Chinese Patent Office on October 19, 2022, the entire contents of which are incorporated herein by reference.
本申请实施例涉及计算机技术,例如涉及一种三维成像方法、装置、设备和存储介质。
随着计算机技术的快速发展,可以利用3D相机以光学的方式获取感兴趣场景的三维图像。
目前,三维成像方式可以包括被动双目三维成像、基于时间飞行原理的三维成像和结构光投影三维成像。其中,被动双目三维成像是通过图像特征匹配的方式,利用三角法间接计算三维信息。基于时间飞行原理的三维成像是根据光的飞行时间直接测量。结构光投影三维成像是主动投射已知编码图案,提高特征匹配效果。
相关技术中至少存在如下问题:
被动双目三维成像中对被测物体表面纹理特征要求较高,无法测量纹理不明显场景,从而不适合作为工业复合机器人的眼睛。基于时间飞行原理的三维成像的测量精度取决于对光束探测时间的精度,在近距离场景中的成像分辨率和精度较差,从而更多地应用到远距离场景中,比如自动驾驶和远距离搜索探测等。结构光投影三维成像中由于投射光线在透明物体上容易发生投射和反射,从而无法对透明物体进行有效地三维成像。
发明内容
本申请实施例提供了一种三维成像方法、装置、设备和存储介质,以有效
提升拍摄场景中的透明物体的三维成像效果。
第一方面,本申请实施例提供了一种三维成像方法,包括:
获取目标拍摄场景对应的目标彩色图像和原始深度图,其中,所述目标拍摄场景中包含至少一个透明物体;
将所述目标彩色图像输入至预设图像处理模型中进行图像处理,并根据所述预设图像处理模型的输出,获得所述目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果;
基于所述透明物体分割结果,对所述原始深度图进行剪切处理,获得去除透明物体深度信息的第一深度图;
基于所述第一深度图、所述透明物体分割结果、所述边界预测结果和所述法向量预测结果进行深度信息的全局优化,确定优化后的第二深度图;
基于所述第二深度图,确定所述目标拍摄场景对应的目标三维图像。
第二方面,本申请实施例还提供了一种三维成像装置,包括:
图像获取模块,设置为获取目标拍摄场景对应的目标彩色图像和原始深度图,其中,所述目标拍摄场景中包含至少一个透明物体;
目标彩色图像输入模块,设置为将所述目标彩色图像输入至预设图像处理模型中进行图像处理,并根据所述预设图像处理模型的输出,获得所述目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果;
剪切处理模块,设置为基于所述透明物体分割结果,对所述原始深度图进行剪切处理,获得去除透明物体深度信息的第一深度图;
全局优化模块,设置为基于所述第一深度图、所述透明物体分割结果、所述边界预测结果和所述法向量预测结果进行深度信息的全局优化,确定优化后的第二深度图;
目标三维图像确定模块,设置为基于所述第二深度图,确定所述目标拍摄场景对应的目标三维图像。
第三方面,本申请实施例还提供了一种电子设备,所述电子设备包括:
至少一个处理器;
存储器,设置为存储至少一个程序;
当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现如本申请任意实施例所提供的三维成像方法。
第四方面,本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本申请任意实施例所提供的三维成像方法。
图1是本申请一个实施例所提供的一种三维成像方法的流程图;
图2是本申请一个实施例所涉及的一种三维成像过程的示例图;
图3是本申请一个实施例提供的另一种三维成像方法的流程图;
图4是本申请一个实施例提供的一种三维成像装置的结构示意图;
图5是本申请一个实施例提供的一种电子设备的结构示意图。
下面结合附图和实施例对本申请作详细说明。
图1为本申请一个实施例所提供的一种三维成像方法的流程图,本实施例可适用于对包含有透明物体的拍摄场景进行三维成像的情况。该方法可以由三维成像装置来执行,该装置可以由软件和/或硬件的方式来实现,集成于电子设备中,比如3D相机,或者工业场景中的复合机器人的视觉传感器,以便辅助机器人完成复杂工况下的场景识别、目标检测等任务。如图1所示,该方法包括以下步骤:
S110、获取目标拍摄场景对应的目标彩色图像和原始深度图,其中,目标拍摄场景中包含至少一个透明物体。
其中,透明物体可以包括允许光完全透射的全透明物体和允许光部分透射
的半透明物体,比如,玻璃杯和塑料瓶等。目标拍摄场景可以是指当前拍摄的场景区域。目标拍摄场景中可以存在至少一个透明物体。目标拍摄场景中可以仅存在透明物体,还可以除了透明物体之外还存在其他非透明物体。目标彩色图像可以是利用红蓝绿三种颜色合成的RGB图像。原始深度图可以是包含有透明物体深度信息的深度图。
示例性地,可以利用2D相机对目标拍摄场景进行拍摄,获得目标拍摄场景对应的目标彩色图像。可以利用3D相机对目标拍摄场景进行拍摄,获得目标拍摄场景对应的原始深度图。
S120、将目标彩色图像输入至预设图像处理模型中进行图像处理,并根据预设图像处理模型的输出,获得目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果。
其中,预设图像处理模型可以是用于对彩色图像进行透明物体分割提取、预测图像中的各个物体之间的边界、以及预测图像中的各个元素位置的法向量的神经网络模型。透明物体分割结果可以利用灰度图像进行表征,例如,透明物体所在位置区域为灰度值255的白色区域,除了透明物体之外的其他区域为灰度值0的黑色区域。边界预测结果可以是指与输入的目标彩色图像大小一致的边界预测图像。边界预测图像中包括利用线条表征出的透明物体与背景的边界以及透明物体与非透明物体之间的边界。法向量预测结果可以是指与输入的目标彩色图像大小一致的法向量预测图像。法向量预测图像中可以利用不同的颜色表征出图像中的每个元素位置对应的法向量。每个元素位置对应的法向量可以是指元素位置与其相邻的其他元素位置所构成的平面的法向量。需要说明的是,预设图像处理模型是预先基于样本数据进行模型训练获得的。样本数据包括包含有至少一个透明物体的样本彩色图像,以及对样本彩色图像进行标定获得的透明物体分割标签、边界标签和法向量标签。
示例性地,图2给出了一种三维成像过程的示例图。如图2所示,可以将目标彩色图像输入至预先训练好的预设图像处理模型中,预设图像处理模型可
以对输入的目标彩色图像进行图像处理,同时确定出目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果并输出,从而利用预设图像处理模型可以快速地获得透明物体分割结果、边界预测结果和法向量预测结果。
示例性地,预设图像处理模型可以包括:编码子模型、第一解码分支子模型、第二解码分支子模型和第三解码分支子模型。相应地,S120可以包括:将目标彩色图像输入至编码子模型中进行特征提取,获得提取出的目标图像特征信息;将目标图像特征信息输入至第一解码分支子模型中进行透明物体的位置预测,确定透明物体分割结果;将目标图像特征信息输入至第二解码分支子模型中进行透明物体和非透明物体的边界预测,确定边界预测结果;将目标图像特征信息输入至第三解码分支子模型中进行像素点位置对应的法向量的预测,确定法向量预测结果。
示例性地,编码子模型可以是指对彩色图像进行图像编码,提取出彩色图像中的图像特征的网络模型。例如,可以仅将原始的Swin Transformer整个网络结构中的前两个推理阶段作为编码子模型,删除第三个和第四个两个推理阶段,从而可以降低模型的整体时间开销。可以将Swin Transformer网络结构中的单分支解码网络修改为三分支解码网络模型,即第一解码分支子模型、第二解码分支子模型和第三解码分支子模型。第一解码分支子模型、第二解码分支子模型和第三解码分支子模型是三个并列的用于预测不同信息的解码网络模型,从而通过第一解码分支子模型、第二解码分支子模型和第三解码分支子模型可以同时预测出输入图像中的透明物体分割结果、边界预测结果和法向量预测结果,避免了编码网络的重复推理,提高了信息预测效率,也提高了三维成像效率。
S130、基于透明物体分割结果,对原始深度图进行剪切处理,获得去除透明物体深度信息的第一深度图。
其中,第一深度图可以是指不包含有透明物体深度信息的深度图。在采集原始深度图时,由于投射光线在透明物体上的透射和反射,导致采集到的透明物体深度信息不准确,从而需要对原始深度图中的透明物体位置预测错误的深
度信息进行剪切。
示例性地,如图2所示,根据透明物体分割结果中的透明物体位置信息,确定原始深度图中的透明物体所在位置处的深度信息,并将原始深度图中的透明物体深度信息进行剪切去除,从而获得去除预测错误的透明物体深度信息后的第一深度图。
S140、基于第一深度图、透明物体分割结果、边界预测结果和法向量预测结果进行深度信息的全局优化,确定优化后的第二深度图。
其中,第二深度图可以是对第一深度图进行深度信息补全后获得的最优深度图,也就是包含有最准确的透明物体深度信息的深度图。
示例性地,如图2所示,可以将第一深度图、透明物体分割结果、边界预测结果和法向量预测结果输入一个全局优化器,全局优化器基于输入的信息求解出深度信息的最优解,并将获得的最优深度图,即第二深度图进行输出,从而实现对因物体透明而导致的深度缺失区域的深度信息补全。
示例性地,S140可以包括:以目标深度图作为优化对象,基于第一深度图、透明物体分割结果、边界预测结果和法向量预测结果,构建目标深度图所对应的目标优化函数;对目标优化函数进行最小化求解,并将最小解所对应的目标深度图确定为优化后的第二深度图。
其中,目标深度图可以是对第一深度图进行深度信息补全后获得的深度图。也就是说,目标深度图是一个与原始深度图尺寸大小一致的完整深度图。目标深度图可以是指目标优化函数中的优化对象。目标深度图中的每个像素点的深度信息均是可以调整优化的,以便确定出每个像素点对应的最优深度信息。示例性地,在深度信息的全局优化时可以基于第一深度图、透明物体分割结果、边界预测结果和法向量预测结果,对目标深度图中的每个像素点位置进行建模,构建出以目标深度图为优化对象的目标优化函数,从而将整张深度图的优化转换为求解一个n元一次多项式的最小解的数学过程,并可以采用稀疏平方根Cholesky分解方式,对目标优化函数进行最小化求解,每个解对应一个包含具
体深度值的目标深度图,获得的最小解为最优解,将最小解所对应的目标深度图作为最优深度图,即第二深度图,从而实现对因物体透明而导致的深度缺失区域的深度补全。
S150、基于第二深度图,确定目标拍摄场景对应的目标三维图像。
示例性地,如图2所示,可以根据预先标定好的相机参数,对深度补全优化后的完整的第二深度图进行转换,获得目标拍摄场景中最终的三维点云数据,并基于三维点云数据可以更加精确地构建出包含有透明物体的目标拍摄场景对应的目标三维图像。通过在目标彩色图像和原始深度图的基础上,利用预设图像处理模型和全局优化方式,实现了一套后处理优化流程,从而在有限的算力开销下,可以有效改善成像系统对透明物体的三维成像效果。
本实施例的技术方案,通过将目标拍摄场景对应的目标彩色图像输入至预设图像处理模型中进行图像处理,获得目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果,并基于透明物体分割结果,对目标拍摄场景对应的原始深度图进行剪切处理,获得去除透明物体深度信息的第一深度图,并基于第一深度图、透明物体分割结果、边界预测结果和法向量预测结果进行深度信息的全局优化,从而对因物体透明而导致的深度缺失区域进行深度补全,并基于补全优化后完整的第二深度图,可以更加精确地构建出包含有透明物体的目标拍摄场景所对应的目标三维图像,从而有效提升了透明物体的三维成像效果。
图3为本申请实施例提供的另一种三维成像方法的流程图,本实施例在上述实施例的基础上,对目标深度图对应的目标优化函数的构建过程进行了详细描述。其中与上述各实施例相同或相应的术语的解释在此不再赘述。
参见图3,本实施例提供的另一种三维成像方法包括以下步骤:
S310、获取目标拍摄场景对应的目标彩色图像和原始深度图,其中,目标拍摄场景中包含至少一个透明物体。
S320、将目标彩色图像输入至预设图像处理模型中进行图像处理,并根据预设图像处理模型的输出,获得目标拍摄场景中的透明物体分割结果、边界预测结果和法向量预测结果。
S330、基于透明物体分割结果,对原始深度图进行剪切处理,获得去除透明物体深度信息的第一深度图。
S340、以目标深度图作为优化对象,基于第一深度图,构建目标深度图对应的深度偏差子函数。
示例性地,可以基于第一深度图和目标深度图中的每个像素点的深度值,构建出像素级别的深度偏差损失,即深度偏差子函数,并可以要求目标深度图中的当前像素的预测深度值与第一深度图中该像素的原始深度值的偏差尽可能地小。
示例性地,S340可以包括:获取第一深度图中的每个第一像素点对应的第一深度值;在第一深度图和目标深度图中,确定同一个第一像素点对应的第一深度值与优化深度值之间的深度偏差;基于多个深度偏差,构建目标深度图对应的深度偏差子函数。
其中,第一像素点是第一深度图中除了透明物体像素点之外的其他像素点。第一深度图剪切去除了透明物体深度信息,第一深度图中的透明物体像素点为不存在第一深度值的无效像素点,此时可以将第一深度图中存在第一深度值的每个有效像素点作为第一像素点,即p1∈Tobs。目标深度图中每个像素点对应的深度值称为优化深度值。
示例性地,针对每个第一像素点而言,在第一深度图和目标深度图中,确定同一个第一像素点p1对应的第一深度值D0(p1)与优化深度值D(p1)之间的深度偏差,并可以将多个第一像素点对应的多个深度偏差的平方值进行相加,获得的相加结果作为构建出的深度偏差子函数ED。例如,构建出的深度偏差子函数为:
需要说明的是,深度偏差子函数中不对第一深度图中的无效像素点进行计算,从而对于剪切掉的透明物体区域,其优化后的深度相比于原始深度的偏差不要求尽可能小,仅要求有效像素点的优化后深度相比于原始深度的偏差要求尽可能小,从而保证有效像素点深度优化的准确性。
S350、基于边界预测结果和法向量预测结果,构建目标深度图对应的法向量偏差子函数。
示例性地,可以基于目标深度图中的每个像素点和相邻像素点计算出的法向量和法向量预测结果中的每个像素点对应的预测法向量,构建出像素级别的法向量偏差损失,即法向量偏差子函数,并可以要求目标深度图中的除了边界像素点之外的每个像素点和相邻像素点计算出的法向量与法向量预测结果中该像素的预测法向量的夹角尽可能地小。
示例性地,S350可以包括:获取边界预测结果中的边界像素点;确定目标深度图中的每个第二像素点对应的第一法向量;获取法向量预测结果中的每个第二像素点对应的第二法向量;基于每个第二像素点对应的第一法向量和第二法向量,确定每个第二像素点对应的向量夹角;根据多个向量夹角,构建目标深度图对应的法向量偏差子函数。
其中,第二像素点是目标深度图中除了边界像素点之外的其他像素点。示例性地,可以将边界预测结果中的每个边界上的所有像素点作为边界像素点,并将目标深度图中除了所有边界像素点之外的所有像素点确定为第二像素点,即p2∈N。针对每个第二像素点p2而言,可以在目标深度图中该第二像素点与其相邻的上下左右四个方向的相邻像素点q构成的平面,确定出该第二像素点对应的第一法向量v(p2,q),并获取该第二像素点在法向量预测结果中的第二法向量N(p2),基于该第二像素点对应的第一法向量和第二法向量,确定出该第二像素点对应的向量夹角<v(p2,q),N(p2)>。可以将多个第二像素点对应的多个向量夹角的平方值进行相加,获得的相加结果作为构建出的法向量偏差子函数。例
如,构建出的法向量偏差子函数为:
需要说明的是,法向量偏差子函数中不对目标深度图中被预测为边界像素点进行计算,从而允许目标拍摄场景中边界位置处的法向量发生剧烈变化,进而有效保证深度优化的准确性。
示例性地,确定目标深度图中的每个第二像素点对应的第一法向量,可以包括:根据目标深度图中的每个第二像素点所在的像素点位置以及在每两个相邻方向上的两个相邻像素点分别所在的两个像素点位置,确定每个第二像素点对应的多个法向量,其中,相邻方向包括上下左右四个方向;对每个第二像素点对应的多个法向量进行平均处理,确定每个第二像素点对应的第一法向量。
示例性地,对于每个第二像素点而言,可以将该第二像素点所在的像素点位置与其每两个相邻方向的两个相邻像素点所在的两个像素点位置进行连线处理,构建出上下左右四个方向中每两个相邻方向所对应的平面,确定每个平面的法向量,并对该第二像素点对应的各个法向量进行平均处理,将获得的平均法向量确定为该第二像素点对应的第一法向量。通过对每两个相邻方向所对应的法向量进行平均处理,可以更加准确地表征出法向量偏差,提高深度信息的全局优化效果。
S360: constructing a depth smoothing sub-function corresponding to the target depth map based on the boundary prediction result.
Exemplarily, a pixel-level depth smoothing loss, i.e., the depth smoothing sub-function, may be constructed based on the depth values of each pixel and its neighboring pixels in the target depth map, requiring that, for every pixel of the target depth map other than the boundary pixels, the change in depth value between the pixel and its neighboring pixels be as small as possible.
Exemplarily, S360 may include: obtaining the boundary pixels in the boundary prediction result; determining, according to the optimized depth value corresponding to each second pixel in the target depth map and the neighboring depth value corresponding to the neighboring pixel in each adjacent direction, the depth change between the second pixel and the neighboring pixel in each adjacent direction; and constructing the depth smoothing sub-function corresponding to the target depth map based on the depth changes corresponding to the multiple second pixels in each adjacent direction.
The second pixels are the pixels of the target depth map other than the boundary pixels, and the adjacent directions include the four directions of up, down, left, and right. Exemplarily, all pixels lying on each boundary in the boundary prediction result may be taken as boundary pixels, and all pixels of the target depth map other than the boundary pixels are determined as second pixels, i.e., p2∈N. For each second pixel p2, the neighboring depth value D(q) corresponding to the neighboring pixel in each adjacent direction may be subtracted from the optimized depth value D(p2) corresponding to the second pixel in the target depth map, so as to obtain the depth change between the second pixel and the neighboring pixel in that adjacent direction. For each adjacent direction, the squared values of the depth changes corresponding to the second pixels in that direction may be summed, the summation result being taken as the constructed depth smoothing sub-function for that direction. For example, consistently with this construction, the depth smoothing sub-function for one adjacent direction may be written as: ES = Σ_{p2∈N} (D(p2) − D(q))². The multiple depth smoothing sub-functions corresponding to the multiple adjacent directions may be solved jointly, or they may be averaged and the resulting average function taken as the depth smoothing sub-function.
It should be noted that the depth smoothing sub-function does not consider the consistency of the depth change at the pixels of the target depth map that are predicted as boundary pixels, which allows the depth to jump at boundary locations in the target shooting scene and thereby effectively ensures the accuracy of the depth optimization.
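A minimal sketch of the depth smoothing term ES over the four adjacent directions, assuming a non-boundary mask derived from the boundary prediction result; border wrap-around of np.roll is ignored for brevity, and all names are illustrative.

```python
import numpy as np

def depth_smoothness(D: np.ndarray, non_boundary: np.ndarray) -> float:
    """ES: sum of squared depth differences between every non-boundary pixel
    and its neighbour in each of the four adjacent directions."""
    mask = non_boundary.astype(bool)
    total = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        neighbour = np.roll(D, shift=(-dr, -dc), axis=(0, 1))  # D at (i+dr, j+dc)
        diff = (D - neighbour)[mask]
        total += float(np.sum(diff ** 2))
    return total
```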
S370: constructing a target optimization function corresponding to the target depth map based on the depth deviation sub-function, the normal vector deviation sub-function, and the depth smoothing sub-function.
Exemplarily, based on a depth deviation weight λD, a normal vector deviation weight λN, and a depth smoothing weight λS, a weighted sum of the depth deviation sub-function, the normal vector deviation sub-function, and the depth smoothing sub-function is computed, and the summation result is taken as the constructed target optimization function E, i.e., E = λD·ED + λN·EN + λS·ES. The depth deviation weight λD may be greater than the normal vector deviation weight λN and the depth smoothing weight λS, so as to ensure the accuracy of the depth optimization; for example, the depth deviation weight λD may be 1000, and the normal vector deviation weight λN and the depth smoothing weight λS may both be 1.
S380: minimizing the target optimization function, and determining the target depth map corresponding to the minimum solution as the optimized second depth map.
Exemplarily, a sparse square-root Cholesky factorization may be used to minimize the target optimization function. Each solution corresponds to a target depth map; the minimum solution obtained is the optimal solution, and the target depth map corresponding to the minimum solution is taken as the optimal depth map, i.e., the second depth map, thereby completing the depth in the regions where depth is missing because the objects are transparent.
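The global solve can be illustrated as a sparse linear least-squares problem: each data and smoothness residual becomes one row of a sparse matrix A so that the objective is ||A·x − b||², and the normal equations AᵀA·x = Aᵀb are solved with a sparse factorization (a sparse Cholesky such as CHOLMOD could be used; SciPy's generic sparse solver stands in for it here). The normal-deviation rows of EN would be appended in the same way and are omitted only to keep the sketch short; the weights default to the exemplary λD = 1000 and λS = 1, and all names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_second_depth(D0, valid, non_boundary, lam_d=1000.0, lam_s=1.0):
    """Minimise lam_d*ED + lam_s*ES over all pixel depths of the target depth
    map by assembling one sparse row per residual and solving A^T A x = A^T b."""
    h, w = D0.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    r = 0

    # data rows: sqrt(lam_d) * (D(p) - D0(p)) for pixels with an observed depth
    for p in idx[valid.astype(bool)]:
        rows.append(r); cols.append(p); vals.append(np.sqrt(lam_d))
        rhs.append(np.sqrt(lam_d) * D0.flat[p]); r += 1

    # smoothness rows: sqrt(lam_s) * (D(p) - D(q)) for non-boundary pixels p
    nb = non_boundary.astype(bool)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        for i in range(max(0, -dr), h - max(0, dr)):
            for j in range(max(0, -dc), w - max(0, dc)):
                if not nb[i, j]:
                    continue
                p, q = idx[i, j], idx[i + dr, j + dc]
                rows += [r, r]; cols += [p, q]
                vals += [np.sqrt(lam_s), -np.sqrt(lam_s)]
                rhs.append(0.0); r += 1

    A = sp.csr_matrix((vals, (rows, cols)), shape=(r, n))
    b = np.asarray(rhs)
    # assumes every pixel is touched by at least one residual; otherwise add a
    # small Tikhonov term to keep A^T A non-singular
    x = spla.spsolve((A.T @ A).tocsc(), A.T @ b)
    return x.reshape(h, w)
```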
S390: determining, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
In the technical solution of this embodiment, the depth deviation sub-function, the normal vector deviation sub-function, and the depth smoothing sub-function corresponding to the target depth map are constructed separately, so that the global optimization can be carried out over the depth deviation loss, the normal vector deviation loss, and the depth smoothing loss. Depth completion can therefore be performed more accurately, the global optimization of the depth information is ensured, and the three-dimensional imaging of transparent objects is improved.
The following is an embodiment of a three-dimensional imaging apparatus provided in the embodiments of the present application. The apparatus and the three-dimensional imaging methods of the foregoing embodiments belong to the same inventive concept; for details not exhaustively described in the apparatus embodiment, reference may be made to the foregoing embodiments of the three-dimensional imaging method.
FIG. 4 is a schematic structural diagram of a three-dimensional imaging apparatus provided in an embodiment of the present application. This embodiment is applicable to three-dimensional imaging of a shooting scene containing transparent objects. As shown in FIG. 4, the apparatus includes: an image acquisition module 410, a target color image input module 420, a clipping processing module 430, a global optimization module 440, and a target three-dimensional image determination module 450.
The image acquisition module 410 is configured to acquire a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object. The target color image input module 420 is configured to input the target color image into a preset image processing model for image processing and to obtain, according to the output of the preset image processing model, a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result of the target shooting scene. The clipping processing module 430 is configured to clip the original depth map based on the transparent object segmentation result to obtain a first depth map from which transparent-object depth information has been removed. The global optimization module 440 is configured to perform global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, and to determine an optimized second depth map. The target three-dimensional image determination module 450 is configured to determine, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
In the technical solution of this embodiment, the target color image corresponding to the target shooting scene is input into the preset image processing model for image processing to obtain the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result of the target shooting scene; the original depth map corresponding to the target shooting scene is clipped based on the transparent object segmentation result to obtain a first depth map from which the transparent-object depth information has been removed; global optimization of the depth information is performed based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, so that the regions where depth is missing because the objects are transparent are depth-completed; and based on the complete, completion-optimized second depth map, the target three-dimensional image corresponding to the target shooting scene containing the transparent objects can be constructed more accurately, thereby effectively improving the three-dimensional imaging of transparent objects.
Optionally, the global optimization module 440 includes:
a target optimization function construction unit configured to take a target depth map as the optimization object and to construct a target optimization function corresponding to the target depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, wherein the target depth map is a depth map obtained by performing depth-information completion on the first depth map; and
a target optimization function solving unit configured to minimize the target optimization function and to determine the target depth map corresponding to the minimum solution as the optimized second depth map.
Optionally, the target optimization function construction unit includes:
a depth deviation sub-function construction subunit configured to take the target depth map as the optimization object and to construct the depth deviation sub-function corresponding to the target depth map based on the first depth map;
a normal vector deviation sub-function construction subunit configured to construct the normal vector deviation sub-function corresponding to the target depth map based on the boundary prediction result and the normal vector prediction result;
a depth smoothing sub-function construction subunit configured to construct the depth smoothing sub-function corresponding to the target depth map based on the boundary prediction result; and
a target optimization function construction subunit configured to construct the target optimization function corresponding to the target depth map based on the depth deviation sub-function, the normal vector deviation sub-function, and the depth smoothing sub-function.
Optionally, the depth deviation sub-function construction subunit is configured to:
obtain a first depth value corresponding to each first pixel in the first depth map, wherein the first pixels are pixels of the first depth map other than transparent-object pixels; determine, in the first depth map and the target depth map, the depth deviation between the first depth value and the optimized depth value corresponding to the same first pixel; and construct the depth deviation sub-function corresponding to the target depth map based on the multiple depth deviations.
Optionally, the normal vector deviation sub-function construction subunit is configured to:
obtain the boundary pixels in the boundary prediction result; determine a first normal vector corresponding to each second pixel in the target depth map, wherein the second pixels are pixels of the target depth map other than the boundary pixels; obtain a second normal vector corresponding to each second pixel in the normal vector prediction result; determine, based on the first normal vector and the second normal vector corresponding to each second pixel, a vector angle corresponding to each second pixel; and construct the normal vector deviation sub-function corresponding to the target depth map according to the multiple vector angles.
Optionally, the normal vector deviation sub-function construction subunit is further configured to:
determine multiple normal vectors corresponding to each second pixel according to the pixel position of the second pixel in the target depth map and the two pixel positions of its two neighboring pixels in every two adjacent directions, wherein the adjacent directions include the four directions of up, down, left, and right; and average the multiple normal vectors corresponding to each second pixel to determine the first normal vector corresponding to the second pixel.
Optionally, the depth smoothing sub-function construction subunit is configured to:
obtain the boundary pixels in the boundary prediction result; determine, according to the optimized depth value corresponding to each second pixel in the target depth map and the neighboring depth value corresponding to the neighboring pixel in each adjacent direction, the depth change between the second pixel and the neighboring pixel in each adjacent direction, wherein the second pixels are pixels of the target depth map other than the boundary pixels; and construct the depth smoothing sub-function corresponding to the target depth map based on the depth changes corresponding to the multiple second pixels in each adjacent direction.
Optionally, the preset image processing model includes: an encoding sub-model, a first decoding branch sub-model, a second decoding branch sub-model, and a third decoding branch sub-model.
The target color image input module 420 is configured to: input the target color image into the encoding sub-model for feature extraction to obtain extracted target image feature information; input the target image feature information into the first decoding branch sub-model for position prediction of transparent objects to determine the transparent object segmentation result; input the target image feature information into the second decoding branch sub-model for boundary prediction between transparent and non-transparent objects to determine the boundary prediction result; and input the target image feature information into the third decoding branch sub-model for prediction of the normal vectors corresponding to the pixel positions to determine the normal vector prediction result.
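The embodiments do not fix a particular network architecture; the following PyTorch sketch merely illustrates the described structure of one shared encoding sub-model with three decoding branch sub-models. All layer sizes and names are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TransparencyNet(nn.Module):
    """Illustrative stand-in for the preset image processing model: a shared
    encoder and three decoding branches producing the transparent-object
    segmentation, the boundary prediction and the per-pixel normal prediction."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        def branch(out_ch):
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
            )
        self.seg_branch = branch(1)       # transparent-object mask logits
        self.boundary_branch = branch(1)  # boundary logits
        self.normal_branch = branch(3)    # per-pixel normal vector

    def forward(self, rgb):
        feat = self.encoder(rgb)
        seg = torch.sigmoid(self.seg_branch(feat))
        boundary = torch.sigmoid(self.boundary_branch(feat))
        normal = nn.functional.normalize(self.normal_branch(feat), dim=1)
        return seg, boundary, normal
```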
The three-dimensional imaging apparatus provided in the embodiments of the present application can perform the three-dimensional imaging method provided in any embodiment of the present application and has the functional modules corresponding to performing the three-dimensional imaging method.
It is worth noting that, in the above embodiment of the three-dimensional imaging apparatus, the units and modules included are divided merely according to functional logic; the division is not limited thereto, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are merely intended to facilitate distinguishing them from one another and are not intended to limit the protection scope of the present application.
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. FIG. 5 shows a block diagram of an exemplary electronic device 12 suitable for implementing the embodiments of the present application. The electronic device 12 shown in FIG. 5 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 5, the electronic device 12 is embodied in the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: at least one processor or processing unit 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor bus or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a Random Access Memory (RAM) 30 and/or a cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a Compact Disc-Read Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through at least one data-medium interface. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the system memory 28; such program modules 42 include, but are not limited to, an operating system, at least one application program, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 12 may also communicate with at least one external device 14 (such as a keyboard, a pointing device, or a display 24), with at least one device that enables a user to interact with the electronic device 12, and/or with any device (such as a network card or a modem) that enables the electronic device 12 to communicate with at least one other computing device. Such communication may take place via an Input/Output (I/O) interface 22. In addition, the electronic device 12 may communicate with at least one network (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example implementing the steps of a three-dimensional imaging method provided in the embodiments of the present application, the method including:
acquiring a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object;
inputting the target color image into a preset image processing model for image processing, and obtaining, according to the output of the preset image processing model, a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result of the target shooting scene;
clipping the original depth map based on the transparent object segmentation result to obtain a first depth map from which transparent-object depth information has been removed;
performing global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, and determining an optimized second depth map; and
determining, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
Of course, those skilled in the art will understand that the processor may also implement the technical solution of the three-dimensional imaging method provided in any embodiment of the present application.
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the three-dimensional imaging method provided in any embodiment of the present application, the method including:
acquiring a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object;
inputting the target color image into a preset image processing model for image processing, and obtaining, according to the output of the preset image processing model, a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result of the target shooting scene;
clipping the original depth map based on the transparent object segmentation result to obtain a first depth map from which transparent-object depth information has been removed;
performing global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, and determining an optimized second depth map; and
determining, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
The computer storage medium of the embodiments of the present application may adopt any combination of at least one computer-readable medium. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having at least one conductor, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, the computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wireline, optical cable, Radio Frequency (RF), and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Those of ordinary skill in the art should understand that the above modules or steps of the present application may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they may be made into individual integrated-circuit modules, or multiple of the modules or steps may be made into a single integrated-circuit module. In this way, the present application is not limited to any particular combination of hardware and software.
Note that the above are merely optional embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, the present application is not limited to the above embodiments; without departing from the concept of the present application, it may also include more other equivalent embodiments, and the scope of the present application is determined by the scope of the appended claims.
Claims (11)
- A three-dimensional imaging method, comprising: acquiring a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object; inputting the target color image into a preset image processing model for image processing, and obtaining, according to an output of the preset image processing model, a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result of the target shooting scene; clipping the original depth map based on the transparent object segmentation result to obtain a first depth map from which transparent-object depth information has been removed; performing global optimization of depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, and determining an optimized second depth map; and determining, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
- The method according to claim 1, wherein performing the global optimization of the depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result and determining the optimized second depth map comprises: taking a target depth map as an optimization object, constructing a target optimization function corresponding to the target depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result, wherein the target depth map is a depth map obtained by performing depth-information completion on the first depth map; and minimizing the target optimization function, and determining the target depth map corresponding to the minimum solution as the optimized second depth map.
- The method according to claim 2, wherein taking the target depth map as the optimization object and constructing the target optimization function corresponding to the target depth map based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result comprises: taking the target depth map as the optimization object, constructing a depth deviation sub-function corresponding to the target depth map based on the first depth map; constructing a normal vector deviation sub-function corresponding to the target depth map based on the boundary prediction result and the normal vector prediction result; constructing a depth smoothing sub-function corresponding to the target depth map based on the boundary prediction result; and constructing the target optimization function corresponding to the target depth map based on the depth deviation sub-function, the normal vector deviation sub-function, and the depth smoothing sub-function.
- The method according to claim 3, wherein constructing the depth deviation sub-function corresponding to the target depth map based on the first depth map comprises: obtaining a first depth value corresponding to each first pixel in the first depth map, wherein the first pixels are pixels of the first depth map other than transparent-object pixels; determining, in the first depth map and the target depth map, a depth deviation between the first depth value and an optimized depth value corresponding to the same first pixel; and constructing the depth deviation sub-function corresponding to the target depth map based on the multiple depth deviations.
- The method according to claim 3, wherein constructing the normal vector deviation sub-function corresponding to the target depth map based on the boundary prediction result and the normal vector prediction result comprises: obtaining boundary pixels in the boundary prediction result; determining a first normal vector corresponding to each second pixel in the target depth map, wherein the second pixels are pixels of the target depth map other than the boundary pixels; obtaining a second normal vector corresponding to each second pixel in the normal vector prediction result; determining, based on the first normal vector and the second normal vector corresponding to each second pixel, a vector angle corresponding to each second pixel; and constructing the normal vector deviation sub-function corresponding to the target depth map according to the multiple vector angles.
- The method according to claim 5, wherein determining the first normal vector corresponding to each second pixel in the target depth map comprises: determining multiple normal vectors corresponding to each second pixel according to the pixel position of the second pixel in the target depth map and the two pixel positions of its two neighboring pixels in every two adjacent directions, wherein the adjacent directions comprise the four directions of up, down, left, and right; and averaging the multiple normal vectors corresponding to each second pixel to determine the first normal vector corresponding to the second pixel.
- The method according to claim 3, wherein constructing the depth smoothing sub-function corresponding to the target depth map based on the boundary prediction result comprises: obtaining boundary pixels in the boundary prediction result; determining, according to an optimized depth value corresponding to each second pixel in the target depth map and a neighboring depth value corresponding to the neighboring pixel in each adjacent direction, a depth change between the second pixel and the neighboring pixel in each adjacent direction, wherein the second pixels are pixels of the target depth map other than the boundary pixels; and constructing the depth smoothing sub-function corresponding to the target depth map based on the depth changes corresponding to the multiple second pixels in each adjacent direction.
- The method according to any one of claims 1-7, wherein the preset image processing model comprises an encoding sub-model, a first decoding branch sub-model, a second decoding branch sub-model, and a third decoding branch sub-model; and inputting the target color image into the preset image processing model for image processing comprises: inputting the target color image into the encoding sub-model for feature extraction to obtain extracted target image feature information; inputting the target image feature information into the first decoding branch sub-model for position prediction of transparent objects to determine the transparent object segmentation result; inputting the target image feature information into the second decoding branch sub-model for boundary prediction between transparent objects and non-transparent objects to determine the boundary prediction result; and inputting the target image feature information into the third decoding branch sub-model for prediction of the normal vectors corresponding to the pixel positions to determine the normal vector prediction result.
- A three-dimensional imaging apparatus, comprising: an image acquisition module configured to acquire a target color image and an original depth map corresponding to a target shooting scene, wherein the target shooting scene contains at least one transparent object; a target color image input module configured to input the target color image into a preset image processing model for image processing and to obtain, according to an output of the preset image processing model, a transparent object segmentation result, a boundary prediction result, and a normal vector prediction result of the target shooting scene; a clipping processing module configured to clip the original depth map based on the transparent object segmentation result to obtain a first depth map from which transparent-object depth information has been removed; a global optimization module configured to perform global optimization of depth information based on the first depth map, the transparent object segmentation result, the boundary prediction result, and the normal vector prediction result and to determine an optimized second depth map; and a target three-dimensional image determination module configured to determine, based on the second depth map, a target three-dimensional image corresponding to the target shooting scene.
- An electronic device, comprising: at least one processor; and a memory configured to store at least one program; wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the three-dimensional imaging method according to any one of claims 1-8.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the three-dimensional imaging method according to any one of claims 1-8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211281815.2 | 2022-10-19 | ||
CN202211281815.2A CN115578516A (zh) | 2022-10-19 | 2022-10-19 | Three-dimensional imaging method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024083006A1 true WO2024083006A1 (zh) | 2024-04-25 |
Family
ID=84586600
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/123920 WO2024083006A1 (zh) | 2022-10-19 | 2023-10-11 | 一种三维成像方法、装置、设备和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115578516A (zh) |
WO (1) | WO2024083006A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115578516A (zh) * | 2022-10-19 | 2023-01-06 | 京东科技控股股份有限公司 | Three-dimensional imaging method, apparatus, device, and storage medium |
CN117058384B (zh) * | 2023-08-22 | 2024-02-09 | 山东大学 | Method and system for semantic segmentation of three-dimensional point clouds |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110109617A1 (en) * | 2009-11-12 | 2011-05-12 | Microsoft Corporation | Visualizing Depth |
CN109903372A (zh) * | 2019-01-28 | 2019-06-18 | 中国科学院自动化研究所 | Depth map super-resolution completion method and high-quality three-dimensional reconstruction method and system |
CN113160313A (zh) * | 2021-03-03 | 2021-07-23 | 广东工业大学 | Transparent object grasping control method, apparatus, terminal, and storage medium |
CN113269689A (zh) * | 2021-05-25 | 2021-08-17 | 西安交通大学 | Depth image completion method and system based on normal vector and Gaussian weight constraints |
CN113313810A (zh) * | 2021-06-18 | 2021-08-27 | 广东工业大学 | Method for computing 6D pose parameters of a transparent object |
CN114004972A (zh) * | 2021-12-03 | 2022-02-01 | 京东鲲鹏(江苏)科技有限公司 | Image semantic segmentation method, apparatus, device, and storage medium |
CN115578516A (zh) * | 2022-10-19 | 2023-01-06 | 京东科技控股股份有限公司 | Three-dimensional imaging method, apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115578516A (zh) | 2023-01-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23878997; Country of ref document: EP; Kind code of ref document: A1 |