CN113808063A - Depth map optimization method and device for large-scale scene reconstruction and storage medium - Google Patents

Depth map optimization method and device for large-scale scene reconstruction and storage medium

Info

Publication number
CN113808063A
Authority
CN
China
Prior art keywords
depth
reference image
map
depth map
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111117916.1A
Other languages
Chinese (zh)
Inventor
何娇
董林佳
王江安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tudou Data Technology Group Co ltd
Original Assignee
Tudou Data Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tudou Data Technology Group Co ltd filed Critical Tudou Data Technology Group Co ltd
Priority to CN202111117916.1A priority Critical patent/CN113808063A/en
Publication of CN113808063A publication Critical patent/CN113808063A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration by the use of local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The application discloses a depth map optimization method for large-scale scene reconstruction, which relates to the technical field of image processing and comprises the following steps: preprocessing N multi-view images and determining the depth estimation range of each multi-view image; selecting one of the N multi-view images as a reference image and M images as source images, and determining an initial depth map and an initial depth confidence map of the reference image according to the depth estimation ranges of the reference image and the source images; calculating confidence mask values of the pixel points in the reference image according to the initial depth confidence map of the reference image; optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map; and filtering the optimized depth map to determine the final depth map. The method solves the problem of low fusion quality in weak-texture regions: missing parts of the depth map in weak-texture regions can be filled in, and large outdoor scenes are effectively reconstructed.

Description

Depth map optimization method and device for large-scale scene reconstruction and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a depth map optimization method and apparatus for large-scale scene reconstruction, and a storage medium.
Background
Traditional depth-map-fusion-based methods achieve relatively high accuracy in certain environments, but because they rely heavily on scene texture to compute photo consistency, the degree of fusion drops when depth fusion is performed on weakly textured physical scenes, for example buildings with large amounts of glass, or water surfaces; in such cases traditional depth map fusion cannot meet the requirement of high-accuracy fusion.
At present, how to reconstruct weak-texture regions is a problem that urgently needs to be solved.
Disclosure of Invention
The depth map optimization method for large-scale scene reconstruction provided by the application solves the prior-art problem of low fusion quality in weak-texture regions: missing positions in the depth map of weak-texture regions can be filled in, and large-scale outdoor scenes are effectively reconstructed.
In a first aspect, an embodiment of the present invention provides a depth map optimization method for large-scale scene reconstruction, where the method includes:
preprocessing N multi-view images, and determining the depth estimation range of each multi-view image;
selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and filtering the optimized depth map to determine the depth map.
With reference to the first aspect, in a possible implementation manner, the preprocessing of the N multi-view images includes:
estimating the depth range by adopting an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
With reference to the first aspect, in a possible implementation manner, the determining an initial depth map of a reference image includes:
calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
With reference to the first aspect, in a possible implementation manner, the determining an initial depth confidence map of a reference image includes:
calculating the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determining a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
With reference to the first aspect, in a possible implementation manner, the calculating a confidence mask value of a pixel point in the reference image includes:
determining constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determining an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel points in the reference image.
With reference to the first aspect, in a possible implementation manner, the filtering the optimized depth map includes: and filtering the depth map by adopting an iterative filtering method.
In a second aspect, an embodiment of the present invention provides a depth map optimization apparatus for large-scale scene reconstruction, where the apparatus includes:
the depth range estimation module is used for preprocessing N multi-view images and determining the depth estimation range of each multi-view image;
the image selecting module is used for selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
the initial depth map and initial depth confidence map determining module is used for determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
the confidence mask determining module is used for calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the optimized depth map determining module is used for optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and the depth map determining module is used for filtering the optimized depth map to determine the depth map.
With reference to the second aspect, in a possible implementation manner, the depth range estimation module is configured to estimate the depth range by using an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
With reference to the second aspect, in a possible implementation manner, the initial depth map and the initial depth confidence map determining module are configured to calculate a sampling number of each pixel point in the reference image in a depth direction of a reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
With reference to the second aspect, in a possible implementation manner, the initial depth map and initial depth confidence map determining module is configured to calculate the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determine a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
With reference to the second aspect, in a possible implementation manner, the confidence mask determining module is configured to determine constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determine an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel points in the reference image.
With reference to the second aspect, in a possible implementation manner, the depth map determining module is configured to perform depth map filtering by using an iterative filtering method.
In a third aspect, an embodiment of the present invention provides a depth map optimization server for large-scale scene reconstruction, including a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method provided by the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores executable instructions, and a computer can implement the method provided in the first aspect when executing the executable instructions.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
the embodiment of the invention adopts a depth map optimization method for large-scale scene reconstruction, and the method comprises the steps of preprocessing N multi-view images and determining the depth estimation range of each multi-view image; selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N; determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image; calculating confidence coefficient mask values of pixel points in the reference image according to the initial depth confidence coefficient map of the reference image; optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence coefficient mask value to determine an optimized depth map; the optimized depth map is filtered, the depth map is determined, the confidence value of the pixel point of the reference image is added to optimize the depth value of the pixel point, the problem that the fusion degree of a texture area is not high in the prior art is solved, the depth map of a weak texture area can be filled in the missing part, and a large outdoor scene is effectively reconstructed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart illustrating steps of a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 2 is a flowchart of image preprocessing steps in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a step of determining an initial depth map in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 4 is a flowchart of the steps of determining an initial depth confidence map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 5 is a flowchart of confidence mask value calculation steps in a depth map optimization method for large-scale scene reconstruction according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a depth map optimization apparatus for large-scale scene reconstruction provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a depth map optimization server for large-scale scene reconstruction according to an embodiment of the present application;
fig. 8A is an input image in a depth map optimization method for large-scale scene reconstruction provided in an embodiment of the present application;
fig. 8B is an initial depth map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8C is a confidence mask map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8D is a surface normal map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8E is an optimized depth map in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application;
fig. 8F is a fused point cloud image in the depth map optimization method for large-scale scene reconstruction provided in the embodiment of the present application.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a large-scale outdoor scene, the photo consistency measurement error of weak-texture areas is large, and the traditional depth-fusion-based method depends heavily on the texture environment to calculate photo consistency; consequently, for scenes containing many weakly textured objects, such as buildings with large amounts of glass and water surfaces, the original depth fusion method often yields incomplete reconstructions.
The embodiment of the invention provides a depth map optimization method for large-scale scene reconstruction, which comprises the following steps as shown in figure 1:
step S101, preprocessing N multi-view images, and determining the depth estimation range of each multi-view image.
Step S102, selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N.
And step S103, determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image.
And step S104, calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image.
And step S105, optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values, and determining an optimized depth map.
And step S106, filtering the optimized depth map to determine the depth map.
The above steps address the low depth-fusion accuracy of the traditional depth-fusion-based method in certain environments: the confidence value of each pixel point of the reference image is incorporated to optimize that pixel's depth value, missing parts of the depth map in weak-texture areas can be filled in, and large outdoor scenes can be effectively reconstructed.
With reference to the first aspect, in a possible implementation manner, the preprocessing the N multiview images includes the following steps as shown in fig. 2:
step S201, an incremental motion recovery structure algorithm is adopted to estimate a depth range.
Step S202, selecting an image for stereo matching.
In the above step S201, the N multi-view images of the same area are preprocessed, an incremental structure-from-motion algorithm is used to estimate the camera intrinsic and extrinsic parameters, and the depth estimation range [d_min, d_max] of each pixel point on each image is obtained. The principle of the incremental structure-from-motion algorithm is as follows: feature point detection and matching are performed between pairs of views to solve the geometric relations among the N images; each image pair is then scored with a piecewise Gaussian function, and the pair with the highest score is selected for initial reconstruction. A random sample consensus (RANSAC) algorithm is usually adopted to remove erroneous matching points, and bundle adjustment is applied to optimize the initial camera pose and 3D points. On this basis, new images are continually added to solve the camera pose and triangulate the feature points; during each addition, bundle adjustment must again be performed to reduce error accumulation, which improves the robustness of the incremental structure-from-motion algorithm.
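As an illustration of the two-view initialization stage described above, the following minimal sketch (assuming OpenCV; the function name and parameter choices are ours, not the patent's) matches SIFT features between a candidate image pair, removes erroneous matches with RANSAC, and recovers the relative camera pose:

```python
import cv2
import numpy as np

def initialize_two_view(img1_gray, img2_gray, K):
    """Two-view SfM initialization sketch: SIFT matching, RANSAC outlier
    rejection via the essential matrix, and relative pose recovery."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1_gray, None)
    kp2, des2 = sift.detectAndCompute(img2_gray, None)

    # Lowe's ratio test discards ambiguous feature matches.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC inside findEssentialMat removes the remaining wrong matches.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    keep = inliers.ravel() == 1
    return R, t, pts1[keep], pts2[keep]
```

The recovered pose and triangulated points would then be refined by bundle adjustment, with further images added incrementally as described above.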
With reference to the first aspect, in a possible implementation manner, determining an initial depth map of a reference image, as shown in fig. 3, includes the following steps:
step S301, calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view.
Step S302, calculating the initial depth value of the pixel point according to the sampling number, and determining the initial depth map of the reference image.
Before step S301 is executed, the N multi-view images are grouped based on the feature point matching performed in step S101 (the higher the degree of feature point matching between images, the higher the probability that they belong to the same group), and distortion correction is performed on all N images.
In step S301, one image in a group is selected as the reference image, and the remaining images in the group are used as source images.
The sampling quantity D_num of a pixel point in the reference image in the depth direction of the pixel point in the source image is computed from the pixel's depth range [the formula appears only as an image in the source text], where d_min denotes the minimum depth value of the pixel point in the reference image, d_max denotes the maximum depth value of the pixel point in the reference image, and ρ denotes the minimum distance of the corresponding pixel point in the source image projected into the horizontal coordinate system.

The initial depth value D_0(p) of the reference image is then computed from d_min, d_max, and the sampling quantity D_num [the formula appears only as an image in the source text].
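Because the patent's sampling formulas are given only as images, the following sketch assumes the inverse-depth sampling commonly used in plane-sweep multi-view stereo; both the formula and the function name are our assumptions, not the patent's:

```python
import numpy as np

def inverse_depth_samples(d_min, d_max, rho):
    """Hypothetical sampling scheme: sample candidate depths uniformly in
    inverse depth, with the sample count D_num driven by the depth range
    and a pixel-footprint term rho. Not the patent's own formula."""
    d_num = int(np.ceil((1.0 / d_min - 1.0 / d_max) / rho))
    inv_depths = np.linspace(1.0 / d_max, 1.0 / d_min, max(d_num, 2))
    return 1.0 / inv_depths  # candidate depths, ordered far to near

# inverse_depth_samples(2.0, 100.0, 1e-3) yields ~490 candidate depths
```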
With reference to the first aspect, in a possible implementation manner, determining the initial depth confidence map of the reference image includes the following steps, as shown in fig. 4:
step S401, calculating the matching correlation value of each pixel point in the reference image corresponding to the pixel points in the M source images, and determining a plurality of matching correlation values of each pixel point in the reference image.
Step S402, calculating an average value of a plurality of matching correlation values of each pixel point in the M source images, and determining an initial depth confidence value of each pixel point of the reference image.
Step S403, determining an initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
In step S401, a plurality of matching correlation values are calculated for each pixel point in the reference image in order to normalize the correlation between matching targets. A 3×3 neighborhood matching window $W_p$ centered on a pixel point $I_r(x, y)$ of the reference image $I_r$ is compared with the matching window $W_{p'}$ constructed around the corresponding matching point in the matched source image $I_s$. The premise for constructing the matching windows is that baseline rectification has already been performed between the two matched images. The matching correlation value between matched pixel points is the normalized cross-correlation (NCC):

$$\mathrm{NCC}(p, p') = \frac{\sum_{(x,y) \in W_p} \bigl(I_r(x,y) - \bar{I}_r\bigr)\bigl(I_s(x',y') - \bar{I}_s\bigr)}{\sqrt{\sum_{(x,y) \in W_p} \bigl(I_r(x,y) - \bar{I}_r\bigr)^2 \sum_{(x',y') \in W_{p'}} \bigl(I_s(x',y') - \bar{I}_s\bigr)^2}}$$

where $I_r(x, y)$ denotes a pixel within the reference matching window $W_p$, $I_s(x', y')$ denotes the corresponding pixel within the source matching window $W_{p'}$, $\bar{I}_r$ denotes the mean of the pixel values within the reference matching window, and $\bar{I}_s$ denotes the mean of the pixel values within the source matching window.

In the method, to improve efficiency, the average of the NCC values between a pixel point in the reference image and its matched pixel points in the M source images is used as the depth confidence value of that pixel in the reference image, namely:

$$C(p) = \frac{1}{M} \sum_{i=1}^{M} \mathrm{NCC}(p, p_i)$$

where $p$ denotes the pixel point in the reference image whose depth confidence value is to be solved, and $p_i$ denotes the pixel point matched with $p$ in the $i$-th source image.
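A minimal sketch of this correlation and confidence computation, assuming NumPy and leaving the matching step itself outside the sketch (the helper names are ours):

```python
import numpy as np

def ncc(win_ref, win_src):
    """Normalized cross-correlation between two equally sized windows."""
    a = win_ref - win_ref.mean()
    b = win_src - win_src.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def depth_confidence(ref, sources, matches, x, y, r=1):
    """Average NCC of reference pixel (x, y) against its matched pixels
    in the M source images, using (2r+1)x(2r+1) windows (r=1 gives 3x3).
    `matches[i]` holds the matched (x', y') in the i-th source image."""
    win_ref = ref[y - r:y + r + 1, x - r:x + r + 1]
    scores = []
    for i, src in enumerate(sources):
        xs, ys = matches[i]
        scores.append(ncc(win_ref, src[ys - r:ys + r + 1, xs - r:xs + r + 1]))
    return float(np.mean(scores))
```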
With reference to the first aspect, in a possible implementation manner, calculating a confidence mask value of a pixel point in a reference image includes the following steps as shown in fig. 5:
step S501, determining constraint conditions including depth constraint, smooth constraint and normal constraint, and determining an objective function.
Step S502, according to the initial depth confidence map of the reference image, confidence mask values of pixel points in the reference image are calculated.
In step S503, the confidence mask values of the pixel points in the reference image are used to calculate the minimum value of the objective function.
Step S504, the minimum value of the objective function is determined to be the optimized depth value of the pixel point in the reference image.
In the present application, before step S501 is performed, a fully convolutional network based on the VGG architecture (Visual Geometry Group), with symmetric encoding and decoding, is trained to predict the surface normal of each pixel from all images. In step S501, a system of equations is defined to complete the optimized depth map, and the objective function is defined as a weighted sum of three constraint conditions: a depth constraint, a smoothness constraint and a normal constraint.
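Before the three constraints are detailed, the following is a minimal sketch of such a normal-prediction network (assuming PyTorch; the layer sizes and names are ours, and a real network would be much deeper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalNet(nn.Module):
    """Toy VGG-style fully convolutional encoder-decoder that predicts a
    unit surface normal for every pixel of an RGB image."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        f1 = self.enc1(x)                      # encode at full resolution
        f2 = self.enc2(F.max_pool2d(f1, 2))    # encode at 1/2 resolution
        up = F.interpolate(f2, scale_factor=2, mode='bilinear',
                           align_corners=False)  # symmetric decode
        return F.normalize(self.dec(up), dim=1)  # unit-length normals

# normals = NormalNet()(torch.rand(1, 3, 64, 64))  # shape (1, 3, 64, 64)
```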
The depth constraint $E_D$ represents the distance between the initial depth value of the pixel point to be calculated and the estimated depth value, so that the estimated depth value stays close to the initial depth value.

The smoothness constraint $E_S$ represents the depth consistency of adjacent pixels, encouraging them to have the same depth.

The normal constraint $E_N$ represents the correspondence between the predicted surface normal $N(p)$ in the reference image and the predicted normal of the matching point in the source image.
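For concreteness, one common instantiation of these three terms, consistent with the descriptions above but an assumption on our part rather than the patent's verbatim formulas, is:

$$E_D = \sum_{p} G(p)\,\bigl(D(p) - D_0(p)\bigr)^2, \quad E_S = \sum_{p} \sum_{q \in \mathcal{N}(p)} \bigl(D(p) - D(q)\bigr)^2, \quad E_N = \sum_{p} \sum_{q \in \mathcal{N}(p)} \bigl\langle v(p, q),\, N(p) \bigr\rangle^2$$

where $\mathcal{N}(p)$ are the neighbors of $p$, $v(p, q)$ is the tangent vector from the 3D point at $p$ to the 3D point at $q$, and $G(p)$ is the confidence mask defined in the next step.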
In step S502, the depth map calculated by MVS inevitably contains noise, outliers and large holes, especially in weak-texture regions, so it is difficult to complete the depth map directly. To address this problem, a confidence mask is designed according to the confidence output by MVS and applied to the depth constraint $E_D$ to indicate the reliability of each depth point: if the confidence value of a pixel point $p$ is low, its depth is considered unreliable and the depth constraint $E_D$ is weighted downward. The confidence mask value proposed by the invention is:

$$G(p) = \begin{cases} 1, & C(p) > \tau \\ \exp\!\left(-\frac{(C(p) - \mu)^2}{2\sigma^2}\right), & C(p) \le \tau \end{cases}$$

where $\mu$ denotes the mean of the Gaussian distribution, $\sigma^2$ denotes the variance of the Gaussian distribution, $C(p)$ denotes the depth confidence value of the pixel point, and $\tau$ denotes the depth confidence threshold (set in this application to a fixed value that appears only as an image in the source text). When the confidence of a pixel point is greater than $\tau$, the reliability of its depth value is high; when it is less than $\tau$, the reliability of its depth value is limited. These uncertain pixels are not discarded directly, but are down-weighted by applying the Gaussian distribution.
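A minimal sketch of this mask, assuming NumPy; the piecewise form follows the description above, and the parameter values are placeholders since the patent gives them only as images:

```python
import numpy as np

def confidence_mask(conf, tau=0.5, mu=0.5, sigma=0.2):
    """Full weight for confident pixels; Gaussian down-weighting (rather
    than discarding) for pixels whose confidence falls below tau."""
    gauss = np.exp(-((conf - mu) ** 2) / (2.0 * sigma ** 2))
    return np.where(conf > tau, 1.0, gauss)
```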
The objective function in step S501 is:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N$$

where $D_0(p)$ denotes the initial depth value of a pixel point of the reference image, $D(p)$ denotes the estimated depth value of the reference image, $N(p)$ denotes the surface normal of each pixel point in the reference image, $v$ denotes the tangent vector required for the dot product with the surface normal, and $p$ is a pixel point on the reference image; $E_D$, $E_S$ and $E_N$ are the depth constraint, smoothness constraint and normal constraint, respectively, and $\lambda_D$, $\lambda_S$, $\lambda_N$ are the weights of the respective constraints, with initial parameter settings $\lambda_D = 10^3$, $\lambda_S = 1$, $\lambda_N = 10^{-3}$. An error matrix of the objective function is established and solved by sparse Cholesky factorization; the final solution approximates the global minimum of the objective function, and the resulting $D(p)$ is the optimized depth value.
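Because the system is linear in the depths, it can be assembled and solved as a sparse least-squares problem. The sketch below (assuming NumPy/SciPy; names are ours) includes only the depth and smoothness terms for brevity, with SciPy's sparse solver standing in for the sparse Cholesky factorization mentioned above:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def optimize_depth(d0, mask, lam_d=1e3, lam_s=1.0):
    """Minimize lam_d * sum G(p)(D(p)-D0(p))^2 + lam_s * sum (D(p)-D(q))^2
    over 4-neighbor pairs via the normal equations of a sparse system.
    The normal constraint (weight 1e-3 in the patent) is omitted here."""
    h, w = d0.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    eq = 0
    # Depth term: sqrt(lam_d * G(p)) * (D(p) - D0(p)) = 0
    wd = np.sqrt(lam_d * mask).ravel()
    d0f = d0.ravel()
    for p in range(n):
        rows.append(eq); cols.append(p); vals.append(wd[p])
        rhs.append(wd[p] * d0f[p]); eq += 1
    # Smoothness term: sqrt(lam_s) * (D(p) - D(q)) = 0 for right/down pairs
    ws = np.sqrt(lam_s)
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    rows += [eq, eq]; cols += [idx[y, x], idx[yy, xx]]
                    vals += [ws, -ws]; rhs.append(0.0); eq += 1
    A = sp.csr_matrix((vals, (rows, cols)), shape=(eq, n))
    d = spsolve((A.T @ A).tocsc(), A.T @ np.asarray(rhs))
    return d.reshape(h, w)
```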
With reference to the first aspect, in a possible implementation manner, the filtering the optimized depth map includes: and filtering the depth map by adopting an iterative filtering method.
After the initial depth values of all pixel points in the reference image have been optimized, the optimized depth map is determined. The optimized depth map is then further filtered. For a point $p$ in the reference image $I_r$, $D(p)$ is the depth of $p$ in $I_r$. Using the known camera intrinsic parameters and pose information, $p$ is back-projected to world coordinates to obtain the 3D coordinates of the corresponding point, which is then projected into the neighboring image $I_{r+1}$; let $d$ be the depth of the projected 3D point in the camera frame of $I_{r+1}$, and let $d'$ be the depth value observed at the projection position in $I_{r+1}$. If

$$\frac{|d - d'|}{d'} < t$$

with $t = 0.01$, the 3D point $p$ is considered consistent between $I_r$ and $I_{r+1}$. If $p$ in $I_r$ satisfies this equation, the pixel point $p$ is considered accurate and retained; otherwise it is removed. This depth map filtering process can fill many small holes effectively, but for some large missing regions in weak-texture areas the fused point cloud still contains unfilled parts. The invention therefore proposes an iterative filtering and completion method.

All retained depth points are cross-validated in the filtering process, which means they are consistent across adjacent images, so the confidence mask value of each newly added pixel point is set to 1, and the newly generated pixel points are used by nearby pixel points to construct the smoothness constraint, which helps those pixel points obtain more accurate depth values. This completion-and-filtering process is repeated until the number of points in the point cloud becomes stable. Directly comparing the number of points in the point cloud would be wasteful because it requires an additional fusion step; instead, the growth rate of the average number of pixel points in the optimized depth maps is calculated after each iteration. If the current count $n_i$ and the count $n_{i-1}$ from the previous iteration satisfy

$$\frac{|n_i - n_{i-1}|}{n_{i-1}} < \varepsilon$$

the iteration is stopped. As a compromise between model completeness and processing time, the application sets $\varepsilon$ to 0.01. Finally, using these iteratively completed depth maps, all depth maps are back-projected to 3D points and the points are merged together, fusing them to obtain the final 3D point cloud.
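A minimal sketch of the per-pixel consistency check and the growth-rate stopping rule, assuming NumPy; the helper names and coordinate conventions are ours:

```python
import numpy as np

def consistent(x, y, d_ref, K, R_rel, t_rel, depth_nbr, tol=0.01):
    """Back-project reference pixel (x, y) with depth d_ref to 3D, project
    it into the neighboring view (R_rel, t_rel map reference-camera to
    neighbor-camera coordinates), and keep it only if the neighbor's
    observed depth agrees within relative tolerance tol."""
    X_ref = d_ref * (np.linalg.inv(K) @ np.array([x, y, 1.0]))
    X_nbr = R_rel @ X_ref + t_rel
    if X_nbr[2] <= 0:
        return False  # point behind the neighboring camera
    u = K @ X_nbr
    u = u / u[2]
    ui, vi = int(round(u[0])), int(round(u[1]))
    h, w = depth_nbr.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return False
    d_proj, d_obs = X_nbr[2], depth_nbr[vi, ui]
    return abs(d_obs - d_proj) / max(d_proj, 1e-9) < tol

def iterate_until_stable(step_fn, eps=0.01, max_iter=20):
    """Repeat one filter-and-complete pass (step_fn returns the average
    valid-pixel count) until |n_i - n_(i-1)| / n_(i-1) < eps."""
    n_prev = None
    for _ in range(max_iter):
        n_cur = step_fn()
        if n_prev and abs(n_cur - n_prev) / n_prev < eps:
            break
        n_prev = n_cur
```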
In a second aspect, an embodiment of the present invention provides a depth map optimization apparatus 600 for large scale scene reconstruction, as shown in fig. 6, the apparatus includes: a depth range estimation module 601, an image selection module 602, an initial depth map and initial depth confidence map determination module 603, a confidence mask determination module 604, an optimized depth map determination module 605, a depth map determination module 606.
A depth range estimation module 601, configured to preprocess the N multi-view images and determine the depth estimation range of each multi-view image; the depth range is estimated by using an incremental structure-from-motion algorithm, and images are selected for stereo matching.
An image selecting module 602, configured to select one of N multi-view images as a reference image and M images as source images, where M < N;
the initial depth map and initial depth confidence map determining module 603 determines an initial depth map and an initial depth confidence map of a reference image according to the depth estimation range of the reference image and the depth estimation range of the source image; calculating the sampling number of each pixel point in the reference image in the depth direction of the reference visual angle; and calculating the initial depth value of the pixel point according to the sampling number, and determining the initial depth map of the reference image. Calculating the matching correlation value of each pixel point in the reference image corresponding to the pixel points in the M source images, and determining a plurality of matching correlation values of each pixel point in the reference image; calculating an average value of a plurality of matching correlation values of each pixel point in the M source images, and determining an initial depth confidence value of each pixel point of a reference image; and determining an initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
The confidence mask determining module 604 is configured to calculate confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image. It determines constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determines an objective function; the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function, and the minimum value of the objective function is determined as the optimized depth value of the pixel points in the reference image.
And the optimized depth map determining module 605 is configured to optimize the depth value of each pixel point in the initial depth map of the reference image according to the confidence coefficient mask value, and determine an optimized depth map.
A depth map determining module 606, configured to filter the optimized depth map and determine a depth map; the method is used for filtering the depth map by adopting an iterative filtering method.
In a third aspect, an embodiment of the present invention provides a depth map optimization server for large-scale scene reconstruction, as shown in fig. 7, including a memory 701 and a processor 702; the memory 701 is used to store computer executable instructions; the processor 702 is configured to execute computer-executable instructions to implement the methods provided above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where executable instructions are stored in the computer-readable storage medium, and when the computer executes the executable instructions, the method described above can be implemented.
In a specific embodiment of the present application, fig. 8A is a specific input image, fig. 8B is an initial depth map, fig. 8C is a confidence mask map, fig. 8D is a surface normal map, fig. 8E is an optimized depth map, and fig. 8F is a fused point cloud map.
The storage medium includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a cache, a hard disk drive (HDD), or a memory card. The memory may be used to store computer program instructions.
Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The sequence of steps recited in this embodiment is only one of many steps performed and does not represent a unique order of execution. When an actual apparatus or client product executes, it can execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the methods shown in this embodiment or the figures.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. The functionality of the modules may be implemented in the same one or more software and/or hardware implementations of the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or sub-units in combination.
The methods, apparatus or modules described herein may be implemented with computer-readable program code in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers and embedded microcontrollers. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component, or even regarded as both software modules for performing the method and structures within the hardware component.
Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary hardware. Based on such understanding, the technical solutions of the present application may be embodied in the form of software products or in the implementation process of data migration, which essentially or partially contributes to the prior art. The computer software product may be stored in a storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. All or portions of the present application are operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the present application; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure.

Claims (9)

1. A depth map optimization method for large scale scene reconstruction, comprising:
preprocessing N multi-view images, and determining the depth estimation range of each multi-view image;
selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and filtering the optimized depth map to determine the depth map.
2. The method according to claim 1, wherein the preprocessing the N multiview images comprises:
estimating a depth range by adopting an incremental structure-from-motion algorithm;
and selecting an image for stereo matching.
3. The method of claim 1, wherein determining the initial depth map of the reference image comprises:
calculating the sampling number of each pixel point in the reference image in the depth direction of the reference view;
and calculating the initial depth value of the pixel point according to the sampling number, and determining a reference image initial depth map.
4. The method of claim 1, wherein determining the initial depth confidence map for the reference image comprises:
calculating the matching correlation value of each pixel point in the reference image with the corresponding pixel point in each of the M source images, and determining a plurality of matching correlation values for each pixel point in the reference image;
calculating an average value of the plurality of matching correlation values of each pixel point over the M source images, and determining an initial depth confidence value of each pixel point of the reference image;
and determining the initial depth confidence map of the reference image according to the initial depth confidence value of each pixel point of the reference image.
5. The method of claim 1, wherein calculating confidence mask values for pixels in the reference image comprises:
determining constraint conditions including a depth constraint, a smoothness constraint and a normal constraint, and determining an objective function;
calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the confidence mask values of the pixel points in the reference image are used for calculating the minimum value of the objective function;
and determining the minimum value of the objective function as the optimized depth value of the pixel point in the reference image.
6. The method of claim 1, wherein the filtering the optimized depth map comprises: and filtering the depth map by adopting an iterative filtering method.
7. A depth map optimization apparatus for large scale scene reconstruction, comprising:
the depth range estimation module is used for preprocessing N multi-view images and determining the depth estimation range of each multi-view image;
the image selecting module is used for selecting one image of N multi-view images as a reference image and M images as source images, wherein M is less than N;
the initial depth map and initial depth confidence map determining module is used for determining a reference image initial depth map and a reference image initial depth confidence map according to the depth estimation range of the reference image and the depth estimation range of the source image;
the confidence mask determining module is used for calculating confidence mask values of pixel points in the reference image according to the initial depth confidence map of the reference image;
the optimized depth map determining module is used for optimizing the depth value of each pixel point in the initial depth map of the reference image according to the confidence mask values to determine an optimized depth map;
and the depth map determining module is used for filtering the optimized depth map to determine the depth map.
8. A depth map optimization server for large scale scene reconstruction, comprising a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-6.
9. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-6.
CN202111117916.1A 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium Pending CN113808063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111117916.1A CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111117916.1A CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Publications (1)

Publication Number Publication Date
CN113808063A (en) 2021-12-17

Family

ID=78896393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117916.1A Pending CN113808063A (en) 2021-09-24 2021-09-24 Depth map optimization method and device for large-scale scene reconstruction and storage medium

Country Status (1)

Country Link
CN (1) CN113808063A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066779A (en) * 2022-01-13 2022-02-18 杭州蓝芯科技有限公司 Depth map filtering method and device, electronic equipment and storage medium
CN114066779B (en) * 2022-01-13 2022-05-06 杭州蓝芯科技有限公司 Depth map filtering method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10540576B1 (en) Panoramic camera systems
US10630956B2 (en) Image processing method and apparatus
WO2021174939A1 (en) Facial image acquisition method and system
US9519954B2 (en) Camera calibration and automatic adjustment of images
Wei et al. Fisheye video correction
US8896665B2 (en) Camera calibration method and medium and 3D object reconstruction method and medium using the same
EP2064675A1 (en) Method for determining a depth map from images, device for determining a depth map
CN114785996A (en) Virtual reality parallax correction
WO2021000390A1 (en) Point cloud fusion method and apparatus, electronic device, and computer storage medium
CN109685879B (en) Method, device, equipment and storage medium for determining multi-view image texture distribution
Rossi et al. Joint graph-based depth refinement and normal estimation
US10121259B2 (en) System and method for determining motion and structure from optical flow
CN113673400A (en) Real scene three-dimensional semantic reconstruction method and device based on deep learning and storage medium
CN116977596A (en) Three-dimensional modeling system and method based on multi-view images
CN113808063A (en) Depth map optimization method and device for large-scale scene reconstruction and storage medium
Coorg Pose imagery and automated three-dimensional modeling of urban environments
CN115937002B (en) Method, apparatus, electronic device and storage medium for estimating video rotation
CN115345990A (en) Oblique photography three-dimensional reconstruction method and device for weak texture scene
CN115345897A (en) Three-dimensional reconstruction depth map optimization method and device
Murayama et al. Depth Image Noise Reduction and Super-Resolution by Pixel-Wise Multi-Frame Fusion
KR102181832B1 (en) Apparatus and method for 4d image reconstruction
Fujimura et al. Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation
Graber Realtime 3D reconstruction
Liao et al. High completeness multi-view stereo for dense reconstruction of large-scale urban scenes
Aguilar-Gonzalez Monocular-SLAM dense mapping algorithm and hardware architecture for FPGA acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination