CN113808063A - Depth map optimization method and device for large-scale scene reconstruction and storage medium - Google Patents

Depth map optimization method and device for large-scale scene reconstruction and storage medium

Info

Publication number
CN113808063A
Authority
CN
China
Prior art keywords
depth
reference image
map
depth map
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111117916.1A
Other languages
Chinese (zh)
Inventor
何娇
董林佳
王江安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tudou Data Technology Group Co ltd
Original Assignee
Tudou Data Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tudou Data Technology Group Co ltd filed Critical Tudou Data Technology Group Co ltd
Priority to CN202111117916.1A priority Critical patent/CN113808063A/en
Publication of CN113808063A publication Critical patent/CN113808063A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration by the use of local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The application discloses a depth map optimization method for large-scale scene reconstruction, which relates to the technical field of image processing and comprises the following steps: preprocessing N multi-view images and determining the depth estimation range of each multi-view image; selecting one of the N multi-view images as a reference image and M images as source images, and determining an initial depth map and an initial depth confidence map of the reference image according to the depth estimation ranges of the reference image and the source images; calculating confidence mask values of the pixel points in the reference image according to the initial depth confidence map of the reference image; optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map; and filtering the optimized depth map to determine the final depth map. The method solves the problem of low fusion quality in weak-texture regions: missing parts of the depth map in weak-texture regions can be filled in, and large outdoor scenes are effectively reconstructed.

Description

Depth map optimization method and device for large-scale scene reconstruction and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a depth map optimization method and apparatus for large-scale scene reconstruction, and a storage medium.
Background
Traditional depth-map-fusion-based methods achieve relatively high accuracy in certain environments, but because they rely heavily on scene texture to compute photo consistency, the degree of fusion drops when depth fusion is performed on weakly textured physical scenes, for example buildings with large amounts of glass, or water surfaces; in such cases traditional depth map fusion cannot meet the requirement of high-accuracy fusion.
At present, how to reconstruct weak-texture regions is a problem that urgently needs to be solved.
Disclosure of Invention
The depth map optimization method for large-scale scene reconstruction provided by the application solves the prior-art problem of low fusion quality in weak-texture regions: missing positions in the depth map of weak-texture regions can be filled in, and large-scale outdoor scenes are effectively reconstructed.
In a first aspect, an embodiment of the present invention provides a depth map optimization method for large-scale scene reconstruction, where the method includes:
preprocessing N multi-view images, and determining the depth estimation range of each multi-view image;
selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and filtering the optimized depth map to determine the depth map.
With reference to the first aspect, in a possible implementation manner, the preprocessing of the N multi-view images includes:
estimating the depth range by adopting an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
With reference to the first aspect, in a possible implementation manner, the determining an initial depth map of a reference image includes:
calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
With reference to the first aspect, in a possible implementation manner, the determining an initial depth confidence map of a reference image includes:
calculating the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determining a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
With reference to the first aspect, in a possible implementation manner, the calculating a confidence mask value of a pixel point in the reference image includes:
determining constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determining an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel points in the reference image.
With reference to the first aspect, in a possible implementation manner, the filtering the optimized depth map includes: and filtering the depth map by adopting an iterative filtering method.
In a second aspect, an embodiment of the present invention provides a depth map optimization apparatus for large-scale scene reconstruction, where the apparatus includes:
the depth range estimation module is used for preprocessing N multi-view images and determining the depth estimation range of each multi-view image;
the image selecting module is used for selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
the initial depth map and initial depth confidence map determining module is used for determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
the confidence mask determining module is used for calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the optimized depth map determining module is used for optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and the depth map determining module is used for filtering the optimized depth map to determine the depth map.
With reference to the second aspect, in a possible implementation manner, the depth range estimation module is configured to estimate the depth range by using an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
With reference to the second aspect, in a possible implementation manner, the initial depth map and the initial depth confidence map determining module are configured to calculate a sampling number of each pixel point in the reference image in a depth direction of a reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
With reference to the second aspect, in a possible implementation manner, the initial depth map and initial depth confidence map determining module is configured to calculate the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determine a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
With reference to the second aspect, in a possible implementation manner, the confidence mask determining module is configured to determine constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determine an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel points in the reference image.
With reference to the second aspect, in a possible implementation manner, the depth map determining module is configured to perform depth map filtering by using an iterative filtering method.
In a third aspect, an embodiment of the present invention provides a depth map optimization server for large-scale scene reconstruction, including a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method provided by the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores executable instructions, and a computer can implement the method provided in the first aspect when executing the executable instructions.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
the embodiment of the invention adopts a depth map optimization method for large-scale scene reconstruction, and the method comprises the steps of preprocessing N multi-view images and determining the depth estimation range of each multi-view image; selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N; determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image; calculating confidence coefficient mask values of pixel points in the reference image according to the initial depth confidence coefficient map of the reference image; optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence coefficient mask value to determine an optimized depth map; the optimized depth map is filtered, the depth map is determined, the confidence value of the pixel point of the reference image is added to optimize the depth value of the pixel point, the problem that the fusion degree of a texture area is not high in the prior art is solved, the depth map of a weak texture area can be filled in the missing part, and a large outdoor scene is effectively reconstructed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart illustrating steps of a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 2 is a flowchart of image preprocessing steps in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a step of determining an initial depth map in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 4 is a flowchart of the steps of determining an initial depth confidence map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 5 is a flowchart of confidence mask value calculation steps in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a depth map optimization apparatus for large-scale scene reconstruction provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a depth map optimization server for large-scale scene reconstruction according to an embodiment of the present application;
fig. 8A is an input image in a depth map optimization method for large-scale scene reconstruction provided in an embodiment of the present application;
fig. 8B is an initial depth map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8C is a confidence mask map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8D is a surface normal map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8E is an optimized depth map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8F is a fused point cloud image in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a large-scale outdoor scene, the photo consistency measurement error of weak-texture areas is large, and the traditional depth-fusion-based method depends heavily on the texture environment to calculate photo consistency; consequently, for scenes containing many weakly textured objects, such as buildings with large amounts of glass and water surfaces, the original depth fusion method often yields incomplete reconstructions.
The embodiment of the invention provides a depth map optimization method for large-scale scene reconstruction, which comprises the following steps as shown in figure 1:
step S101, preprocessing N multi-view images, and determining the depth estimation range of each multi-view image.
Step S102, selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N.
And step S103, determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image.
And step S104, calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image.
And step S105, optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values, and determining an optimized depth map.
And step S106, filtering the optimized depth map to determine the depth map.
The above steps address the low depth-fusion accuracy of the traditional depth-fusion-based method in certain environments: the confidence value of each pixel point of the reference image is incorporated to optimize that pixel's depth value, missing parts of the depth map in weak-texture areas can be filled in, and large outdoor scenes can be effectively reconstructed.
With reference to the first aspect, in a possible implementation manner, the preprocessing the N multiview images includes the following steps as shown in fig. 2:
step S201, an incremental motion recovery structure algorithm is adopted to estimate a depth range.
Step S202, selecting an image for stereo matching.
In the above step S201, the N multi-view images of the same area are preprocessed, an incremental structure-from-motion algorithm is used to estimate the camera intrinsic and extrinsic parameters, and the depth estimation range [d_min, d_max] of each pixel point on each image is obtained. The principle of the incremental structure-from-motion algorithm is as follows: feature point detection and matching are performed between pairs of views to solve the geometric relations among the N images; each image pair is then scored with a piecewise Gaussian function, and the pair with the highest score is selected for initial reconstruction. A random sample consensus (RANSAC) algorithm is usually adopted to remove erroneous matching points, and bundle adjustment is applied to optimize the initial camera pose and 3D points. On this basis, new images are continually added to solve the camera pose and triangulate the feature points; during each addition, bundle adjustment must again be performed to reduce error accumulation, which improves the robustness of the incremental structure-from-motion algorithm.
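As an illustration of the two-view initialization stage described above, the following minimal sketch (assuming OpenCV; the function name and parameter choices are ours, not the patent's) matches SIFT features between a candidate image pair, removes erroneous matches with RANSAC, and recovers the relative camera pose:

```python
import cv2
import numpy as np

def initialize_two_view(img1_gray, img2_gray, K):
    """Two-view SfM initialization sketch: SIFT matching, RANSAC outlier
    rejection via the essential matrix, and relative pose recovery."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1_gray, None)
    kp2, des2 = sift.detectAndCompute(img2_gray, None)

    # Lowe's ratio test discards ambiguous feature matches.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC inside findEssentialMat removes the remaining wrong matches.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    keep = inliers.ravel() == 1
    return R, t, pts1[keep], pts2[keep]
```

The recovered pose and triangulated points would then be refined by bundle adjustment, with further images added incrementally as described above.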
With reference to the first aspect, in a possible implementation manner, determining an initial depth map of a reference image, as shown in fig. 3, includes the following steps:
step S301, calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view.
Step S302, calculating the initial depth value of the pixel point according to the sampling number, and determining the initial depth map of the reference image.
Before step S301 is executed, the N multi-view images are grouped based on the feature point matching performed in step S101 (the higher the degree of feature point matching between images, the higher the probability that they belong to the same group), and distortion correction is performed on all N images.
In step S301, one image in a group is selected as the reference image, and the remaining images in the group are used as source images.
The sampling quantity D_num of a pixel point in the reference image in the depth direction of the pixel point in the source image is computed from the pixel's depth range [the formula appears only as an image in the source text], where d_min denotes the minimum depth value of the pixel point in the reference image, d_max denotes the maximum depth value of the pixel point in the reference image, and ρ denotes the minimum distance of the corresponding pixel point in the source image projected into the horizontal coordinate system.

The initial depth value D_0(p) of the reference image is then computed from d_min, d_max, and the sampling quantity D_num [the formula appears only as an image in the source text].
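Because the patent's sampling formulas are given only as images, the following sketch assumes the inverse-depth sampling commonly used in plane-sweep multi-view stereo; both the formula and the function name are our assumptions, not the patent's:

```python
import numpy as np

def inverse_depth_samples(d_min, d_max, rho):
    """Hypothetical sampling scheme: sample candidate depths uniformly in
    inverse depth, with the sample count D_num driven by the depth range
    and a pixel-footprint term rho. Not the patent's own formula."""
    d_num = int(np.ceil((1.0 / d_min - 1.0 / d_max) / rho))
    inv_depths = np.linspace(1.0 / d_max, 1.0 / d_min, max(d_num, 2))
    return 1.0 / inv_depths  # candidate depths, ordered far to near

# inverse_depth_samples(2.0, 100.0, 1e-3) yields ~490 candidate depths
```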
With reference to the first aspect, in a possible implementation manner, determining the initial depth confidence map of the reference image includes the following steps, as shown in fig. 4:
step S401, calculating the matching correlation value of each pixel point in the reference image corresponding to the pixel points in the M source images, and determining a plurality of matching correlation values of each pixel point in the reference image.
Step S402, calculating an average value of a plurality of matching correlation values of each pixel point in the M source images, and determining an initial depth confidence value of each pixel point of the reference image.
Step S403, determining an initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
In step S401, a plurality of matching correlation values are calculated for each pixel point in the reference image in order to normalize the correlation between matching targets. A 3×3 neighborhood matching window $W_p$ centered on a pixel point $I_r(x, y)$ of the reference image $I_r$ is compared with the matching window $W_{p'}$ constructed around the corresponding matching point in the matched source image $I_s$. The premise for constructing the matching windows is that baseline rectification has already been performed between the two matched images. The matching correlation value between matched pixel points is the normalized cross-correlation (NCC):

$$\mathrm{NCC}(p, p') = \frac{\sum_{(x,y) \in W_p} \bigl(I_r(x,y) - \bar{I}_r\bigr)\bigl(I_s(x',y') - \bar{I}_s\bigr)}{\sqrt{\sum_{(x,y) \in W_p} \bigl(I_r(x,y) - \bar{I}_r\bigr)^2 \sum_{(x',y') \in W_{p'}} \bigl(I_s(x',y') - \bar{I}_s\bigr)^2}}$$

where $I_r(x, y)$ denotes a pixel within the reference matching window $W_p$, $I_s(x', y')$ denotes the corresponding pixel within the source matching window $W_{p'}$, $\bar{I}_r$ denotes the mean of the pixel values within the reference matching window, and $\bar{I}_s$ denotes the mean of the pixel values within the source matching window.

In the method, to improve efficiency, the average of the NCC values between a pixel point in the reference image and its matched pixel points in the M source images is used as the depth confidence value of that pixel in the reference image, namely:

$$C(p) = \frac{1}{M} \sum_{i=1}^{M} \mathrm{NCC}(p, p_i)$$

where $p$ denotes the pixel point in the reference image whose depth confidence value is to be solved, and $p_i$ denotes the pixel point matched with $p$ in the $i$-th source image.
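A minimal sketch of this correlation and confidence computation, assuming NumPy and leaving the matching step itself outside the sketch (the helper names are ours):

```python
import numpy as np

def ncc(win_ref, win_src):
    """Normalized cross-correlation between two equally sized windows."""
    a = win_ref - win_ref.mean()
    b = win_src - win_src.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def depth_confidence(ref, sources, matches, x, y, r=1):
    """Average NCC of reference pixel (x, y) against its matched pixels
    in the M source images, using (2r+1)x(2r+1) windows (r=1 gives 3x3).
    `matches[i]` holds the matched (x', y') in the i-th source image."""
    win_ref = ref[y - r:y + r + 1, x - r:x + r + 1]
    scores = []
    for i, src in enumerate(sources):
        xs, ys = matches[i]
        scores.append(ncc(win_ref, src[ys - r:ys + r + 1, xs - r:xs + r + 1]))
    return float(np.mean(scores))
```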
With reference to the first aspect, in a possible implementation manner, calculating a confidence mask value of a pixel point in a reference image includes the following steps as shown in fig. 5:
step S501, determining constraint conditions including depth constraint, smooth constraint and normal constraint, and determining an objective function.
Step S502, according to the initial depth confidence map of the reference image, confidence mask values of pixel points in the reference image are calculated.
In step S503, the confidence mask values of the pixel points in the reference image are used to calculate the minimum value of the objective function.
Step S504, the minimum value of the objective function is determined to be the optimized depth value of the pixel point in the reference image.
In the present application, before step S501 is performed, a fully convolutional network based on the VGG architecture (Visual Geometry Group), with symmetric encoding and decoding, is trained to predict the surface normal of each pixel from all images. In step S501, a system of equations is defined to complete the optimized depth map, and the objective function is defined as a weighted sum of three constraint conditions: a depth constraint, a smoothness constraint and a normal constraint.
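Before the three constraints are detailed, the following is a minimal sketch of such a normal-prediction network (assuming PyTorch; the layer sizes and names are ours, and a real network would be much deeper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalNet(nn.Module):
    """Toy VGG-style fully convolutional encoder-decoder that predicts a
    unit surface normal for every pixel of an RGB image."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        f1 = self.enc1(x)                      # encode at full resolution
        f2 = self.enc2(F.max_pool2d(f1, 2))    # encode at 1/2 resolution
        up = F.interpolate(f2, scale_factor=2, mode='bilinear',
                           align_corners=False)  # symmetric decode
        return F.normalize(self.dec(up), dim=1)  # unit-length normals

# normals = NormalNet()(torch.rand(1, 3, 64, 64))  # shape (1, 3, 64, 64)
```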
The depth constraint $E_D$ represents the distance between the initial depth value of the pixel point to be calculated and the estimated depth value, so that the estimated depth value stays close to the initial depth value.

The smoothness constraint $E_S$ represents the depth consistency of adjacent pixels, encouraging them to have the same depth.

The normal constraint $E_N$ represents the correspondence between the predicted surface normal $N(p)$ in the reference image and the predicted normal of the matching point in the source image.
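For concreteness, one common instantiation of these three terms, consistent with the descriptions above but an assumption on our part rather than the patent's verbatim formulas, is:

$$E_D = \sum_{p} G(p)\,\bigl(D(p) - D_0(p)\bigr)^2, \quad E_S = \sum_{p} \sum_{q \in \mathcal{N}(p)} \bigl(D(p) - D(q)\bigr)^2, \quad E_N = \sum_{p} \sum_{q \in \mathcal{N}(p)} \bigl\langle v(p, q),\, N(p) \bigr\rangle^2$$

where $\mathcal{N}(p)$ are the neighbors of $p$, $v(p, q)$ is the tangent vector from the 3D point at $p$ to the 3D point at $q$, and $G(p)$ is the confidence mask defined in the next step.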
In step S502, the depth map calculated by MVS inevitably contains noise, outliers and large holes, especially in weak-texture regions, so it is difficult to complete the depth map directly. To address this problem, a confidence mask is designed according to the confidence output by MVS and applied to the depth constraint $E_D$ to indicate the reliability of each depth point: if the confidence value of a pixel point $p$ is low, its depth is considered unreliable and the depth constraint $E_D$ is weighted downward. The confidence mask value proposed by the invention is:

$$G(p) = \begin{cases} 1, & C(p) > \tau \\ \exp\!\left(-\frac{(C(p) - \mu)^2}{2\sigma^2}\right), & C(p) \le \tau \end{cases}$$

where $\mu$ denotes the mean of the Gaussian distribution, $\sigma^2$ denotes the variance of the Gaussian distribution, $C(p)$ denotes the depth confidence value of the pixel point, and $\tau$ denotes the depth confidence threshold (set in this application to a fixed value that appears only as an image in the source text). When the confidence of a pixel point is greater than $\tau$, the reliability of its depth value is high; when it is less than $\tau$, the reliability of its depth value is limited. These uncertain pixels are not discarded directly, but are down-weighted by applying the Gaussian distribution.
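A minimal sketch of this mask, assuming NumPy; the piecewise form follows the description above, and the parameter values are placeholders since the patent gives them only as images:

```python
import numpy as np

def confidence_mask(conf, tau=0.5, mu=0.5, sigma=0.2):
    """Full weight for confident pixels; Gaussian down-weighting (rather
    than discarding) for pixels whose confidence falls below tau."""
    gauss = np.exp(-((conf - mu) ** 2) / (2.0 * sigma ** 2))
    return np.where(conf > tau, 1.0, gauss)
```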
The objective function in step S501 is:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N$$

where $D_0(p)$ denotes the initial depth value of a pixel point of the reference image, $D(p)$ denotes the estimated depth value of the reference image, $N(p)$ denotes the surface normal of each pixel point in the reference image, $v$ denotes the tangent vector required for the dot product with the surface normal, and $p$ is a pixel point on the reference image; $E_D$, $E_S$ and $E_N$ are the depth constraint, smoothness constraint and normal constraint, respectively, and $\lambda_D$, $\lambda_S$, $\lambda_N$ are the weights of the respective constraints, with initial parameter settings $\lambda_D = 10^3$, $\lambda_S = 1$, $\lambda_N = 10^{-3}$. An error matrix of the objective function is established and solved by sparse Cholesky factorization; the final solution approximates the global minimum of the objective function, and the resulting $D(p)$ is the optimized depth value.
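Because the system is linear in the depths, it can be assembled and solved as a sparse least-squares problem. The sketch below (assuming NumPy/SciPy; names are ours) includes only the depth and smoothness terms for brevity, with SciPy's sparse solver standing in for the sparse Cholesky factorization mentioned above:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def optimize_depth(d0, mask, lam_d=1e3, lam_s=1.0):
    """Minimize lam_d * sum G(p)(D(p)-D0(p))^2 + lam_s * sum (D(p)-D(q))^2
    over 4-neighbor pairs via the normal equations of a sparse system.
    The normal constraint (weight 1e-3 in the patent) is omitted here."""
    h, w = d0.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    eq = 0
    # Depth term: sqrt(lam_d * G(p)) * (D(p) - D0(p)) = 0
    wd = np.sqrt(lam_d * mask).ravel()
    d0f = d0.ravel()
    for p in range(n):
        rows.append(eq); cols.append(p); vals.append(wd[p])
        rhs.append(wd[p] * d0f[p]); eq += 1
    # Smoothness term: sqrt(lam_s) * (D(p) - D(q)) = 0 for right/down pairs
    ws = np.sqrt(lam_s)
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    rows += [eq, eq]; cols += [idx[y, x], idx[yy, xx]]
                    vals += [ws, -ws]; rhs.append(0.0); eq += 1
    A = sp.csr_matrix((vals, (rows, cols)), shape=(eq, n))
    d = spsolve((A.T @ A).tocsc(), A.T @ np.asarray(rhs))
    return d.reshape(h, w)
```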
With reference to the first aspect, in a possible implementation manner, the filtering the optimized depth map includes: and filtering the depth map by adopting an iterative filtering method.
After the initial depth values of all pixel points in the reference image have been optimized, the optimized depth map is determined. The optimized depth map is then further filtered. For a point $p$ in the reference image $I_r$, $D(p)$ is the depth of $p$ in $I_r$. Using the known camera intrinsic parameters and pose information, $p$ is back-projected to world coordinates to obtain the 3D coordinates of the corresponding point, which is then projected into the neighboring image $I_{r+1}$; let $d$ be the depth of the projected 3D point in the camera frame of $I_{r+1}$, and let $d'$ be the depth value observed at the projection position in $I_{r+1}$. If

$$\frac{|d - d'|}{d'} < t$$

with $t = 0.01$, the 3D point $p$ is considered consistent between $I_r$ and $I_{r+1}$. If $p$ in $I_r$ satisfies this equation, the pixel point $p$ is considered accurate and retained; otherwise it is removed. This depth map filtering process can fill many small holes effectively, but for some large missing regions in weak-texture areas the fused point cloud still contains unfilled parts. The invention therefore proposes an iterative filtering and completion method.

All retained depth points are cross-validated in the filtering process, which means they are consistent across adjacent images, so the confidence mask value of each newly added pixel point is set to 1, and the newly generated pixel points are used by nearby pixel points to construct the smoothness constraint, which helps those pixel points obtain more accurate depth values. This completion-and-filtering process is repeated until the number of points in the point cloud becomes stable. Directly comparing the number of points in the point cloud would be wasteful because it requires an additional fusion step; instead, the growth rate of the average number of pixel points in the optimized depth maps is calculated after each iteration. If the current count $n_i$ and the count $n_{i-1}$ from the previous iteration satisfy

$$\frac{|n_i - n_{i-1}|}{n_{i-1}} < \varepsilon$$

the iteration is stopped. As a compromise between model completeness and processing time, the application sets $\varepsilon$ to 0.01. Finally, using these iteratively completed depth maps, all depth maps are back-projected to 3D points and the points are merged together, fusing them to obtain the final 3D point cloud.
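A minimal sketch of the per-pixel consistency check and the growth-rate stopping rule, assuming NumPy; the helper names and coordinate conventions are ours:

```python
import numpy as np

def consistent(x, y, d_ref, K, R_rel, t_rel, depth_nbr, tol=0.01):
    """Back-project reference pixel (x, y) with depth d_ref to 3D, project
    it into the neighboring view (R_rel, t_rel map reference-camera to
    neighbor-camera coordinates), and keep it only if the neighbor's
    observed depth agrees within relative tolerance tol."""
    X_ref = d_ref * (np.linalg.inv(K) @ np.array([x, y, 1.0]))
    X_nbr = R_rel @ X_ref + t_rel
    if X_nbr[2] <= 0:
        return False  # point behind the neighboring camera
    u = K @ X_nbr
    u = u / u[2]
    ui, vi = int(round(u[0])), int(round(u[1]))
    h, w = depth_nbr.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return False
    d_proj, d_obs = X_nbr[2], depth_nbr[vi, ui]
    return abs(d_obs - d_proj) / max(d_proj, 1e-9) < tol

def iterate_until_stable(step_fn, eps=0.01, max_iter=20):
    """Repeat one filter-and-complete pass (step_fn returns the average
    valid-pixel count) until |n_i - n_(i-1)| / n_(i-1) < eps."""
    n_prev = None
    for _ in range(max_iter):
        n_cur = step_fn()
        if n_prev and abs(n_cur - n_prev) / n_prev < eps:
            break
        n_prev = n_cur
```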
In a second aspect, an embodiment of the present invention provides a depth map optimization apparatus 600 for large scale scene reconstruction, as shown in fig. 6, the apparatus includes: a depth range estimation module 601, an image selection module 602, an initial depth map and initial depth confidence map determination module 603, a confidence mask determination module 604, an optimized depth map determination module 605, a depth map determination module 606.
A depth range estimation module 601, configured to preprocess the N multi-view images and determine the depth estimation range of each multi-view image; the depth range is estimated by using an incremental structure-from-motion algorithm, and images are selected for stereo matching.
An image selecting module 602, configured to select one of N multi-view images as a reference image and M images as source images, where M < N;
the initial depth map and initial depth confidence map determining module 603 determines an initial depth map and an initial depth confidence map of a reference image according to the depth estimation range of the reference image and the depth estimation range of the source image; calculating the sampling number of each pixel point in the reference image in the depth direction of the reference visual angle; and calculating the initial depth value of the pixel point according to the sampling number, and determining the initial depth map of the reference image. Calculating the matching correlation value of each pixel point in the reference image corresponding to the pixel points in the M source images, and determining a plurality of matching correlation values of each pixel point in the reference image; calculating an average value of a plurality of matching correlation values of each pixel point in the M source images, and determining an initial depth confidence value of each pixel point of a reference image; and determining an initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
The confidence mask determining module 604 is configured to calculate confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image. It determines constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determines an objective function; the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function, and the minimum value of the objective function is determined as the optimized depth value of the pixel points in the reference image.
And the optimized depth map determining module 605 is configured to optimize the depth value of each pixel point in the initial depth map of the reference image according to the confidence coefficient mask value, and determine an optimized depth map.
A depth map determining module 606, configured to filter the optimized depth map and determine a depth map; the method is used for filtering the depth map by adopting an iterative filtering method.
In a third aspect, an embodiment of the present invention provides a depth map optimization server for large-scale scene reconstruction, as shown in fig. 7, including a memory 701 and a processor 702; the memory 701 is used to store computer executable instructions; the processor 702 is configured to execute computer-executable instructions to implement the methods provided above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where executable instructions are stored in the computer-readable storage medium, and when the computer executes the executable instructions, the method described above can be implemented.
In a specific embodiment of the present application, fig. 8A is a specific input image, fig. 8B is an initial depth map, fig. 8C is a confidence mask map, fig. 8D is a surface normal map, fig. 8E is an optimized depth map, and fig. 8F is a fused point cloud map.
The storage medium includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a cache, a hard disk drive (HDD), or a memory card. The memory may be used to store computer program instructions.
Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The sequence of steps recited in this embodiment is only one of many steps performed and does not represent a unique order of execution. When an actual apparatus or client product executes, it can execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the methods shown in this embodiment or the figures.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. The functionality of the modules may be implemented in the same one or more software and/or hardware implementations of the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or sub-units in combination.
The methods, apparatus or modules described herein may be implemented with computer-readable program code in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers and embedded microcontrollers. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component, or even regarded as both software modules for performing the method and structures within the hardware component.
Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary hardware. Based on such understanding, the technical solutions of the present application may be embodied in the form of software products or in the implementation process of data migration, which essentially or partially contributes to the prior art. The computer software product may be stored in a storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. All or portions of the present application are operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the present application; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure.

Claims (9)

1. A depth map optimization method for large scale scene reconstruction, comprising:
preprocessing N multi-view images, and determining the depth estimation range of each multi-view image;
selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and filtering the optimized depth map to determine the depth map.
2. The method according to claim 1, wherein the preprocessing the N multiview images comprises:
estimating a depth range by adopting an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
3. The method of claim 1, wherein determining the initial depth map of the reference image comprises:
calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
4. The method of claim 1, wherein determining the initial depth confidence map for the reference image comprises:
calculating the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determining a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
5. The method of claim 1, wherein calculating confidence mask values for pixels in the reference image comprises:
determining constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determining an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel point in the reference image.
6. The method of claim 1, wherein the filtering the optimized depth map comprises: and filtering the depth map by adopting an iterative filtering method.
7. A depth map optimization apparatus for large scale scene reconstruction, comprising:
the depth range estimation module is used for preprocessing N multi-view images and determining the depth estimation range of each multi-view image;
the image selecting module is used for selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
the initial depth map and initial depth confidence map determining module is used for determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
the confidence mask determining module is used for calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the optimized depth map determining module is used for optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and the depth map determining module is used for filtering the optimized depth map to determine the depth map.
8. A depth map optimization server for large scale scene reconstruction, comprising a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-6.
9. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-6.
CN202111117916.1A 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium Pending CN113808063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111117916.1A CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111117916.1A CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Publications (1)

Publication Number Publication Date
CN113808063A (en) 2021-12-17

Family

ID=78896393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117916.1A Pending CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Country Status (1)

Country Link
CN (1) CN113808063A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066779A (en) * 2022-01-13 2022-02-18 杭州蓝芯科技有限公司 Depth map filtering method and device, electronic equipment and storage medium
CN114066779B (en) * 2022-01-13 2022-05-06 杭州蓝芯科技有限公司 Depth map filtering method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10540576B1 (en) Panoramic camera systems
US10630956B2 (en) Image processing method and apparatus
WO2021174939A1 (en) Facial image acquisition method and system
US9519954B2 (en) Camera calibration and automatic adjustment of images
Wei et al. Fisheye video correction
US8896665B2 (en) Camera calibration method and medium and 3D object reconstruction method and medium using the same
EP2064675A1 (en) Method for determining a depth map from images, device for determining a depth map
CN114785996A (en) Virtual reality parallax correction
WO2021000390A1 (en) Point cloud fusion method and apparatus, electronic device, and computer storage medium
CN109685879B (en) Method, device, equipment and storage medium for determining multi-view image texture distribution
Rossi et al. Joint graph-based depth refinement and normal estimation
US10121259B2 (en) System and method for determining motion and structure from optical flow
CN113673400A (en) Real scene three-dimensional semantic reconstruction method and device based on deep learning and storage medium
CN116977596A (en) Three-dimensional modeling system and method based on multi-view images
CN113808063A (en) Depth map optimization method and device for large-scale scene reconstruction and storage medium
Coorg Pose imagery and automated three-dimensional modeling of urban environments
CN115937002B (en) Method, apparatus, electronic device and storage medium for estimating video rotation
CN115345990A (en) Oblique photography three-dimensional reconstruction method and device for weak texture scene
CN115345897A (en) Three-dimensional reconstruction depth map optimization method and device
Murayama et al. Depth Image Noise Reduction and Super-Resolution by Pixel-Wise Multi-Frame Fusion
KR102181832B1 (en) Apparatus and method for 4d image reconstruction
Fujimura et al. Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation
Graber Realtime 3D reconstruction
Liao et al. High completeness multi-view stereo for dense reconstruction of large-scale urban scenes
Aguilar-Gonzalez Monocular-SLAM dense mapping algorithm and hardware architecture for FPGA acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination