CN112927251B - Morphology-based scene dense depth map acquisition method, system and device

Info

Publication number: CN112927251B (granted from application CN202110327446.5A; published as CN112927251A)
Authority: CN (China)
Legal status: Active
Original language: Chinese (zh)
Inventors: 蒋永实, 李至, 于海涛, 朱晓阳
Assignees: Zhongke Qichuang Tianjin Technology Co ltd; Institute of Automation of Chinese Academy of Science
Prior art keywords: frame, edge, image, IMU, scene

Classifications

    • G06T7/13 Image analysis; Segmentation; Edge detection
    • G06T5/70 Image enhancement or restoration; Denoising; Smoothing
    • G06T7/136 Image analysis; Segmentation; Edge detection involving thresholding
    • G06T7/155 Image analysis; Segmentation; Edge detection involving morphological operators


Abstract

The invention belongs to the fields of augmented reality and computer vision, and specifically relates to a morphology-based scene dense depth map acquisition method, system and device, aiming to solve the problems that monocular-camera pose estimation has low accuracy, tracking is easily lost in dynamic scenes, robustness is weak, and a scene dense depth map is difficult to acquire at low computational cost. The invention comprises: initializing IMU parameters; acquiring image pose estimates and a scene sparse point map by visual-inertial joint nonlinear optimization; extracting a coarse morphological edge by morphological dilation and erosion operations; modifying the edge judgment criterion and refining the morphological edge with a Canny edge detector to obtain the final depth edge; and establishing constraint conditions, constructing an optimization problem for sparse depth propagation, and accelerating its solution to obtain the final dense depth map of the scene. The method estimates pose with high accuracy and strong robustness in complex dynamic scenes. The obtained dense depth map has sharp edges and smooth regions, with high accuracy and real-time performance.

Description

Morphology-based scene dense depth map acquisition method, system and device
Technical Field
The invention belongs to the field of augmented reality and computer vision, and particularly relates to a method, a system and equipment for acquiring a scene dense depth map based on morphology.
Background
Augmented Reality (AR) establishes a mapping relationship between a virtual scene and the real physical world so as to intelligently fuse and present relevant information in front of the user in the real physical world in real time, thereby enhancing the user's context-awareness. With the popularization of intelligent terminals and 4G/5G communication, augmented reality technology is gradually expanding from high-end applications such as national defense and security, industrial production, medical health and city management to mass applications such as electronic commerce, cultural heritage protection, tourism and entertainment, becoming a fundamental tool for people to understand and transform the world.
Dense depth maps are key to the realism of augmented reality systems. Virtual-real occlusion means that a correct occlusion relationship exists between virtual objects and real objects, i.e., one that conforms to the front-back relationships between objects in the real physical world; it is one of the core problems in creating a realistic augmented reality scene. Achieving a correct virtual-real occlusion effect requires a dense depth map of the scene, so that the spatial depth hierarchy of real and virtual objects can be accurately determined. Dense depth maps also play an important role in computer vision, as three-dimensional object reconstruction likewise requires dense depth maps of images.
In recent years, researchers have often adopted Simultaneous Localization and Mapping (SLAM) technology to acquire the device's own pose in a real scene in real time and to build a dense map of the scene's spatial geometric structure. However, methods that use SLAM to obtain dense depth maps still have the following problems: monocular/binocular-based methods fail in fast-moving and weakly textured environments, and mostly can only obtain the sparse depth of a scene; IMU-based methods are strongly affected by noise, accumulate large errors, and cannot directly provide dense depth; RGBD-camera-based methods can generally obtain dense depth, but the computation is heavy, real-time performance is poor, the equipment is expensive, and they are difficult to use in outdoor scenes.
In general, monocular cameras are currently the most widely used, but existing SLAM methods find it difficult to accurately estimate the camera pose trajectory in dynamic scenes and to construct a dense depth map of the scene at low computational cost.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that monocular-camera pose estimation has low accuracy, tracking is easily lost in dynamic scenes, robustness is weak, and a scene dense depth map is difficult to acquire at low computational cost, the invention provides a morphology-based scene dense depth map acquisition method, which comprises the following steps:
Step S10, initializing an IMU (Inertial Measurement Unit) based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters comprise a gravity rotation matrix, a scale factor, velocities and IMU biases;
Step S20, solving the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, constructing the inertial error for the t-th to (t+1)-th frame images, constructing the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combining the inertial error and the reprojection error into a visual-inertial joint error optimization function;
Step S30, solving the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtaining the camera pose and scene sparse point map of the (t+1)-th frame image;
Step S40, downsampling the (t+1)-th frame image through a Gaussian pyramid, extracting a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsampling through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame;
Step S50, modifying the edge judgment criterion with a morphological threshold constraint to improve the Canny edge detector, and obtaining the precise depth edge map of the (t+1)-th frame through the improved Canny edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame;
Step S60, constructing a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and obtaining the initial scene dense depth map of the (t+1)-th frame;
Step S70, based on the initial scene dense depth map of the (t+1)-th frame, smoothing the residual texture edges through a bilateral filter to obtain the final scene dense depth map of the (t+1)-th frame (a minimal pipeline sketch in code follows this list).
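For orientation only, the following is a minimal sketch of how the per-frame flow of steps S10-S70 could be organized in code. The function names, signatures and stub bodies are illustrative assumptions and are not part of the patent text; each stub stands in for the corresponding sub-system described above.

```python
import numpy as np

# Hypothetical stand-ins for the real sub-systems of steps S10-S70; each is a stub here.
def initialize_imu(keyframes, imu_meas): return {"scale": 1.0}
def track_visual_inertial(frame, imu_meas, imu_params): return np.eye(4), {}
def extract_morphological_edge(frame): return np.zeros(frame.shape[:2], np.float32)
def improved_canny(frame, morph_edge): return morph_edge > 0
def propagate_sparse_depth(sparse, edge, prev): return np.zeros(edge.shape, np.float32)
def smooth_residual_edges(dense): return dense

def process_frame(frame, imu_meas, state):
    """Per-frame flow of steps S10-S70 (data flow only; stubs above are placeholders)."""
    if state.get("imu_params") is None:
        state["imu_params"] = initialize_imu(state.get("keyframes", []), imu_meas)   # S10
    pose, sparse = track_visual_inertial(frame, imu_meas, state["imu_params"])       # S20-S30
    morph_edge = extract_morphological_edge(frame)                                   # S40
    depth_edge = improved_canny(frame, morph_edge)                                   # S50
    dense = propagate_sparse_depth(sparse, depth_edge, state.get("prev_dense"))      # S60
    dense = smooth_residual_edges(dense)                                             # S70
    state["prev_dense"] = dense
    return pose, dense
```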
In some preferred embodiments, step S10 includes:
Step S11, acquiring the camera poses of the k key frames among the 0th to t-th frame images by pure visual BA (bundle adjustment) optimization, and obtaining initial velocity values in the world coordinate system;
Step S12, denoting the IMU measurements between the 0th to k-th key frames as I_{0:k} and the IMU parameters to be initialized as y_k = {s, R_wg, b, v}, constructing a graph optimization function for the inertial motion and expressing it as a maximum a posteriori probability:

$$p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$$
Step S13, based on the maximum a posteriori representation and the independence of the IMU measurements, converting the problem into a maximum likelihood estimation:

$$y_k^{*} = \arg\max_{y_k}\; p(y_k)\prod_{i=1}^{k} p\!\left(I_{i-1,i} \mid s, R_{wg}, b, v_{i-1}, v_i\right)$$

where y_k is the parameter set to be solved in IMU initialization and p(y_k) is its prior probability; R_wg represents the gravity rotation matrix from the inertial coordinate system to the world coordinate system; s is the scale factor; b is the IMU bias, including the gyroscope bias and the accelerometer bias; v is the set of velocities in the inertial coordinate system of the 0th to k-th key frames, with v_{i-1} and v_i the velocities in the (i-1)-th and i-th key frame inertial coordinate systems; and I_{i-1,i} represents the IMU measurements between the (i-1)-th and i-th key frames;
Step S14, taking the negative logarithm converts the estimation into minimizing the sum of errors:

$$y_k^{*} = \arg\min_{y_k}\left(\left\|r_p\right\|_{\Sigma_p}^{2} + \sum_{i=1}^{k}\left\|r_{I_{i-1,i}}\right\|_{\Sigma_{I_{i-1,i}}}^{2}\right)$$

where r_p is the prior residual, r_{I_{i-1,i}} is the measurement residual of the IMU, and Σ represents the corresponding information matrix;
Step S15, when the camera moves at a constant speed, solving the minimized error subject to a preset first constraint condition to obtain the gravity rotation matrix, scale factor, velocities and IMU biases; when the camera moves slowly, fixing the IMU bias b and the gravity rotation matrix R_wg, solving the minimized error subject to the preset first constraint condition to obtain the scale factor, updating the current camera pose based on the scale factor, and performing IMU initialization again until the camera moves at a constant speed; the preset first constraint condition is that both the IMU pre-integration and the prior probability satisfy a Gaussian distribution.
In some preferred embodiments, step S40 includes:
Step S41, performing interlaced downsampling of the (t+1)-th frame image through a Gaussian pyramid, resizing the image to half its original size to obtain the (t+1)-th frame downsampled image;
Step S42, convolving the (t+1)-th frame downsampled image with a 5×5 rectangular kernel by a dilation operation and an erosion operation respectively, taking local maxima and minima within the region, to obtain the (t+1)-th frame dilated image and the (t+1)-th frame eroded image;
Step S43, taking the difference between the (t+1)-th frame dilated image and the (t+1)-th frame eroded image to obtain the (t+1)-th frame coarse morphological edge;
Step S44, upsampling the (t+1)-th frame coarse morphological edge through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame, with the same size as the original image.
In some preferred embodiments, step S50 includes:
Step S51, based on the preliminary morphological edge of the (t+1)-th frame and the (t+1)-th frame image, denoising with a 7×7 Gaussian smoothing filter and extracting the x-direction and y-direction gradients of the denoised images with the Sobel operator;
Step S52, selecting the maximum value over the 3 channels in the x direction and the y direction as the gradient value of that direction, and computing the morphological edge gradient value M_c and the image gradient value M_i;
Step S53, applying non-maximum suppression to the (t+1)-th frame image, and searching for local gradient maxima within each pixel's neighborhood as the candidate edge pixel set;
Step S54, judging the category of each pixel in the candidate edge pixel set by combining the morphological edge gradient value M_c and the image gradient value M_i with the morphological threshold τ_mor and the strong edge threshold τ_high and weak edge threshold τ_low of the image:
if M_i > τ_high and M_c > τ_mor, it is a strong edge pixel; if τ_high ≥ M_i ≥ τ_low, it is a weak edge pixel; otherwise it is a non-edge pixel;
Step S55, performing edge hysteresis tracking on the weak edge pixels: if a strong edge pixel exists in the 8-neighborhood of a weak edge pixel, it is converted into a strong edge pixel; otherwise it is converted into a non-edge pixel;
Step S56, traversing the candidate edge pixel set to obtain the precise depth edge map of the (t+1)-th frame.
In some preferred embodiments, the depth alignment constraint is:
$$E_d = \sum_{p} w_s(p)\left(D(p) - D_s(p)\right)^{2}$$

where D(p) denotes the depth value of pixel p in the scene dense depth map and D_s(p) denotes the depth value of pixel p in the scene sparse point map; w_s is a weight that takes the value 1 when pixel p is present in the scene sparse point map (and 0 otherwise).
In some preferred embodiments, the edge constraint is:
$$E_e = \sum_{p}\sum_{q \in N(p)} w_e(p, q)\left(D(p) - D(q)\right)^{2}$$

where p and q denote two adjacent pixels, D(p) and D(q) are the depth values of pixels p and q in the scene dense depth map, D_e denotes the depth value of a pixel in the precise depth edge map of the (t+1)-th frame, and w_e is a weight determined from D_e and from the gradient values m(p) and m(q) of pixels p and q: it is set to 0 where D_e(p) ≠ D_e(q) (a depth edge, which is not smoothed) and to a smoothing coefficient that increases as the gradients decrease elsewhere.
In some preferred embodiments, the temporal coherence constraint is:

$$E_t = \sum_{p} w_t\left(D(p) - D_t(p)\right)^{2}$$

where D_t(p) denotes the depth value of pixel p in the previous frame image, D(p) denotes the depth value of pixel p in the scene dense depth map, and w_t is a weight. A compact illustration combining the three constraints is given below.
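As a compact illustration of how the three constraints combine into a single objective, the following sketch evaluates the sum E_d + E_e + E_t for a candidate dense depth map D. The 4-neighborhood structure and the weight arrays are assumptions for illustration, not the patent's exact formulation.

```python
import numpy as np

def total_energy(D, D_s, mask_s, w_e_h, w_e_v, D_t=None, w_t=0.1):
    """Sum of the depth-alignment, edge and temporal-coherence terms."""
    # Depth alignment: only where a sparse depth value exists (w_s = 1 there).
    E_d = np.sum(mask_s * (D - D_s) ** 2)

    # Edge term over horizontal and vertical neighbors, weighted by w_e.
    E_e = np.sum(w_e_h * (D[:, :-1] - D[:, 1:]) ** 2)
    E_e += np.sum(w_e_v * (D[:-1, :] - D[1:, :]) ** 2)

    # Temporal coherence against the (warped) previous-frame depth, if available.
    E_t = 0.0 if D_t is None else w_t * np.sum((D - D_t) ** 2)
    return E_d + E_e + E_t
```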
In another aspect of the present invention, a morphology-based scene dense depth map acquisition system is provided, which includes the following modules:
the IMU initialization module, configured to initialize the IMU based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters comprise a gravity rotation matrix, a scale factor, velocities and IMU biases;
the visual-inertial joint optimization module, configured to solve the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, construct the inertial error for the t-th to (t+1)-th frame images, construct the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combine the inertial error and the reprojection error into a visual-inertial joint error optimization function;
the solution iteration module, which solves the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtains the camera pose and scene sparse point map of the (t+1)-th frame image;
the morphological edge extraction module, configured to downsample the (t+1)-th frame image through a Gaussian pyramid, extract a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsample through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame;
the precise depth edge extraction module, configured to modify the edge judgment criterion with a morphological threshold constraint to improve the edge detector, and to obtain the precise depth edge map of the (t+1)-th frame through the improved edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame;
the densification module, configured to construct a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and to obtain the initial scene dense depth map of the (t+1)-th frame;
and the smoothing module, configured to smooth the residual texture edges through a bilateral filter based on the initial scene dense depth map of the (t+1)-th frame to obtain the final scene dense depth map of the (t+1)-th frame.
In a third aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the morphology-based scene dense depth map acquisition method described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions to be executed by the computer to implement the above-mentioned morphology-based scene dense depth map acquisition method.
The invention has the beneficial effects that:
(1) The morphology-based scene dense depth map acquisition method of the invention introduces an inertial sensor on top of a monocular camera and proposes a new IMU initialization strategy to solve the IMU parameters; a visual-inertial joint optimization function is constructed by combining the reprojection error between monocular-camera image frames with the inertial error obtained from inertial-sensor pre-integration, and the pose estimates and sparse map point set of the tracking process are solved from it. This improves the accuracy and robustness of six-degree-of-freedom pose estimation and scene sparse depth estimation during tracking and localization in complex scenes, and thereby improves the accuracy and precision of the finally acquired scene dense depth map.
(2) The morphology-based scene dense depth map acquisition method of the invention modifies the edge judgment criterion with a morphological threshold constraint to improve the edge detector, effectively reducing false edge information in the depth edge map and yielding a depth edge map that contains few texture edges and is aligned with the image edges, which further improves the accuracy and precision of the finally acquired scene dense depth map.
(3) The morphology-based scene dense depth map acquisition method of the invention performs sparse depth propagation over the precise depth edges and the scene sparse depth values by constructing a depth alignment constraint, an edge constraint and a temporal coherence constraint, and obtains a dense depth map with clear boundaries, smooth regions and spatio-temporal coherence by iterative solution with an acceleration strategy, improving the accuracy and real-time performance of dense map construction overall.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart diagram of a morphology-based scene dense depth map acquisition method of the present invention;
FIG. 2 is a schematic diagram of the depth edge extraction flow of an embodiment of the morphology-based scene dense depth map acquisition method according to the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a morphology-based scene dense depth map acquisition method that combines an inertial sensor with a monocular camera, making full use of the monocular camera's insensitivity to noise and small accumulated error and of the inertial sensor's ability to accurately capture rapid motion. It requires no expensive equipment, can acquire dense depth maps of indoor and outdoor scenes at relatively low computational cost, and has high accuracy and good robustness.
The invention discloses a morphology-based scene dense depth map acquisition method, which comprises the following steps:
Step S10, initializing an IMU (Inertial Measurement Unit) based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters comprise a gravity rotation matrix, a scale factor, velocities and IMU biases;
Step S20, solving the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, constructing the inertial error for the t-th to (t+1)-th frame images, constructing the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combining the inertial error and the reprojection error into a visual-inertial joint error optimization function;
Step S30, solving the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtaining the camera pose and scene sparse point map of the (t+1)-th frame image;
Step S40, downsampling the (t+1)-th frame image through a Gaussian pyramid, extracting a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsampling through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame;
Step S50, modifying the edge judgment criterion with a morphological threshold constraint to improve the Canny edge detector, and obtaining the precise depth edge map of the (t+1)-th frame through the improved Canny edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame;
Step S60, constructing a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and obtaining the initial scene dense depth map of the (t+1)-th frame;
Step S70, based on the initial scene dense depth map of the (t+1)-th frame, smoothing the residual texture edges through a bilateral filter to obtain the final scene dense depth map of the (t+1)-th frame.
In order to more clearly describe the scene dense depth map acquisition method based on morphology, the following describes each step in the embodiment of the present invention in detail with reference to fig. 1.
The morphology-based scene dense depth map acquisition method of the first embodiment of the invention comprises the following steps S10-S70, each of which is described in detail below:
Step S10, initializing an IMU (Inertial Measurement Unit) based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters include a gravity rotation matrix, a scale factor, velocities and IMU biases.
The IMU initialization process and the construction and iterative solution of the visual-inertial joint optimization function can run simultaneously. In that case, the inertial error used to construct the visual-inertial joint optimization function is obtained from prior information, and the joint function is iteratively optimized while IMU initialization proceeds. Once IMU initialization is completed, the IMU pre-integration is solved with the initialized IMU parameters, the inertial matrix is updated with the solved parameters, and the iterative solution of the visual-inertial joint optimization function continues with new IMU data and video images, yielding the scene sparse depth values and the six-degree-of-freedom pose estimate of the current frame.
For clarity, the following description first covers IMU initialization separately, and then the construction and iterative solution of the visual-inertial joint optimization function.
IMU initialization covers two cases, initialization under constant-speed camera motion and initialization under slow camera motion, and specifically comprises the following steps:
Step S11, acquiring the camera poses of the k key frames among the 0th to t-th frame images by pure visual BA optimization, and solving initial velocity values in the world coordinate system from the translation vectors of these poses; the subsequent computation considers only the inertial-only optimization problem, whose IMU parameters are shown in formula (1):

y_k = {s, R_wg, b, v}   (1)
Step S12, denoting the IMU measurements between the 0th to k-th key frames as I_{0:k}, constructing a graph optimization function for the inertial motion and expressing it as a maximum a posteriori probability, as shown in formula (2):

p(y_k | I_{0:k}) ∝ p(I_{0:k} | y_k) p(y_k)   (2)
Step S13, based on the maximum a posteriori representation and the independence of the IMU measurements, formula (2) is converted into the maximum likelihood estimation of formula (3):

$$y_k^{*} = \arg\max_{y_k}\; p(y_k)\prod_{i=1}^{k} p\!\left(I_{i-1,i} \mid s, R_{wg}, b, v_{i-1}, v_i\right) \quad (3)$$

where y_k is the parameter set to be solved in IMU initialization and p(y_k) is its prior probability; R_wg represents the gravity rotation matrix from the inertial coordinate system to the world coordinate system; s is the scale factor; b is the IMU bias, including the gyroscope bias and the accelerometer bias; v is the set of velocities in the inertial coordinate system of the 0th to k-th key frames, with v_{i-1} and v_i the velocities in the (i-1)-th and i-th key frame inertial coordinate systems; and I_{i-1,i} represents the IMU measurements between the (i-1)-th and i-th key frames;
Step S14, taking the negative logarithm converts the estimation into minimizing the sum of errors, as shown in formula (4):

$$y_k^{*} = \arg\min_{y_k}\left(\left\|r_p\right\|_{\Sigma_p}^{2} + \sum_{i=1}^{k}\left\|r_{I_{i-1,i}}\right\|_{\Sigma_{I_{i-1,i}}}^{2}\right) \quad (4)$$

where r_p is the prior residual, r_{I_{i-1,i}} is the measurement residual of the IMU, and Σ represents the corresponding information matrix;
Step S15, when the camera moves at a constant speed, solving the minimized error subject to the preset first constraint condition to obtain the gravity rotation matrix, scale factor, velocities and IMU biases; after inertial-sensor initialization finishes, the current camera pose is updated according to the gravity direction and the scale, and the IMU pre-integration is updated according to the new IMU biases;
when the camera moves slowly, the IMU bias b and the gravity rotation matrix R_wg are fixed, the minimized error is solved subject to the preset first constraint condition to obtain the scale factor, the current camera pose is updated based on the scale factor, and IMU initialization is performed again until the camera moves at a constant speed;
the preset first constraint condition is that both the IMU pre-integration and the prior probability satisfy a Gaussian distribution. A minimal numerical sketch of the minimized error of formula (4) is given below.
Step S20, solving the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, constructing the inertial error for the t-th to (t+1)-th frame images, constructing the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combining the inertial error and the reprojection error into a visual-inertial joint error optimization function.
The visual-inertial joint optimization function can be constructed by referring to the method in ORB-SLAM3 and is not described in detail here.
Step S30, solving the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtaining the camera pose and scene sparse point map of the (t+1)-th frame image.
The extraction of the precise depth edge map can also be performed simultaneously with the IMU initialization and the iterative solution that yield the optimal scene sparse depth values and the six-degree-of-freedom pose estimate.
Step S40, downsampling the (t+1)-th frame image through a Gaussian pyramid, extracting a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsampling through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame.
Step S41, performing interlaced downsampling of the (t+1)-th frame image through a Gaussian pyramid, resizing the image to half its original size to obtain the (t+1)-th frame downsampled image;
Step S42, convolving the (t+1)-th frame downsampled image with a 5×5 rectangular kernel by a dilation operation and an erosion operation respectively, taking local maxima and minima within the region, to obtain the (t+1)-th frame dilated image and the (t+1)-th frame eroded image;
the dilation and erosion operations are independent of each other; the two convolutions of the (t+1)-th frame downsampled image can be performed one after the other or in parallel, which the invention does not limit;
Step S43, taking the difference between the (t+1)-th frame dilated image and the (t+1)-th frame eroded image to obtain the (t+1)-th frame coarse morphological edge;
Step S44, upsampling the (t+1)-th frame coarse morphological edge through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame, with the same size as the original image. A code sketch of steps S41-S44 is given below.
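A minimal sketch of steps S41-S44 using OpenCV is given below; it assumes an 8-bit color input image and uses cv2.pyrDown/cv2.pyrUp for the Gaussian-pyramid downsampling and Gaussian-convolution upsampling, which is one possible realization rather than the patent's exact implementation.

```python
import cv2
import numpy as np

def preliminary_morphological_edge(img_bgr):
    """Steps S41-S44: coarse morphological edge at half resolution, then upsampled."""
    # S41: Gaussian-pyramid downsampling to half the original size.
    small = cv2.pyrDown(img_bgr)

    # S42: dilation and erosion with a 5x5 rectangular kernel
    # (local maximum and local minimum over the kernel region).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    dilated = cv2.dilate(small, kernel)
    eroded = cv2.erode(small, kernel)

    # S43: difference of the dilated and eroded images = coarse morphological edge
    # (the morphological gradient).
    coarse_edge = cv2.subtract(dilated, eroded)

    # S44: Gaussian-convolution upsampling back to the original image size.
    h, w = img_bgr.shape[:2]
    prelim_edge = cv2.pyrUp(coarse_edge, dstsize=(w, h))
    return prelim_edge
```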
Step S50, modifying the edge judgment criterion with a morphological threshold constraint to improve the Canny edge detector, and obtaining the precise depth edge map of the (t+1)-th frame through the improved Canny edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame.
Step S51, based on the preliminary morphological edge of the (t+1)-th frame and the (t+1)-th frame image, denoising with a 7×7 Gaussian smoothing filter and extracting the x-direction and y-direction gradients of the denoised images with the Sobel operator;
Step S52, selecting the maximum value over the 3 channels in the x direction and the y direction as the gradient value of that direction, and computing the morphological edge gradient magnitude M_coarse and the image gradient magnitude M_intensity, as shown in formulas (5) and (6):

$$M_{coarse}(x, y) = \sqrt{g_x(x, y)^{2} + g_y(x, y)^{2}} \quad (5)$$

$$M_{intensity}(x', y') = \sqrt{g'_x(x', y')^{2} + g'_y(x', y')^{2}} \quad (6)$$

where g_x and g_y denote the x- and y-direction gradients of the pixel at position (x, y) in the preliminary morphological edge of the (t+1)-th frame, and g'_x and g'_y denote the x- and y-direction gradients of the pixel at position (x', y') in the (t+1)-th frame image;
the maximum morphological edge gradient M_cmax and the maximum image gradient M_imax over the (t+1)-th frame image are taken as denominators and the gradient computed at each pixel as numerator, so the gradients can be expressed as proportions, denoted the morphological edge gradient M_c and the image gradient M_i;
Step S53, applying non-maximum suppression to the (t+1)-th frame image, and searching for local gradient maxima within each pixel's neighborhood as the candidate edge pixel set;
non-maximum suppression is applied to the color image (i.e., the (t+1)-th frame image): the local gradient maximum in each pixel's neighborhood is kept as a candidate edge pixel and the remaining non-maximum responses are suppressed, which eliminates most non-edge points; the remaining pixels form the candidate edge pixel set for further classification;
Step S54, judging the category of each pixel in the candidate edge pixel set by combining the morphological edge gradient value M_c and the image gradient value M_i with the morphological threshold τ_mor and the strong edge threshold τ_high and weak edge threshold τ_low of the image:
if M_i > τ_high and M_c > τ_mor, it is a strong edge pixel; if τ_high ≥ M_i ≥ τ_low, it is a weak edge pixel; otherwise it is a non-edge pixel;
Step S55, performing edge hysteresis tracking on the weak edge pixels: if a strong edge pixel exists in the 8-neighborhood of a weak edge pixel, it is converted into a strong edge pixel; otherwise it is converted into a non-edge pixel;
Step S56, traversing the candidate edge pixel set to obtain the precise depth edge map of the (t+1)-th frame.
The finally obtained precise depth edge map of the (t+1)-th frame contains most of the depth edges of the scene while suppressing most of the complex texture edges. A code sketch of steps S51-S55 is given below.
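The following is a minimal sketch of the gradient normalization of steps S51-S52 and the modified classification and hysteresis of steps S54-S55, operating on the normalized gradients M_i and M_c and a boolean mask of candidate pixels left by non-maximum suppression. The threshold values, the iterative hysteresis loop and the use of SciPy for the 8-neighborhood test are illustrative assumptions, not values or choices fixed by the patent.

```python
import cv2
import numpy as np
from scipy import ndimage

def normalized_gradient(img):
    """Max-over-channels Sobel gradient magnitude, scaled by its frame-wide maximum."""
    blurred = cv2.GaussianBlur(img, (7, 7), 0)           # 7x7 Gaussian denoising (S51)
    gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3)
    if gx.ndim == 3:                                      # color input: max over 3 channels (S52)
        gx, gy = np.max(np.abs(gx), axis=2), np.max(np.abs(gy), axis=2)
    else:
        gx, gy = np.abs(gx), np.abs(gy)
    mag = np.sqrt(gx ** 2 + gy ** 2)                      # magnitude as in formulas (5)/(6)
    return mag / (mag.max() + 1e-12)                      # proportion form: M_c or M_i

def classify_edges(M_i, M_c, candidates, tau_high=0.2, tau_low=0.1, tau_mor=0.15):
    """S54-S55: strong/weak classification with the morphological constraint, then hysteresis."""
    # S54: a strong edge needs a high image gradient AND a high morphological edge gradient.
    strong = candidates & (M_i > tau_high) & (M_c > tau_mor)
    weak = candidates & ~strong & (M_i >= tau_low) & (M_i <= tau_high)
    eight = np.ones((3, 3), dtype=bool)
    changed = True
    while changed:                                        # promote weak pixels touching strong ones
        near_strong = ndimage.binary_dilation(strong, structure=eight)
        promoted = weak & near_strong
        changed = bool(promoted.any())
        strong |= promoted
        weak &= ~promoted
    return strong                                         # boolean precise depth edge map

# Usage: M_c = normalized_gradient(prelim_edge); M_i = normalized_gradient(frame)
# depth_edge = classify_edges(M_i, M_c, candidates_from_non_maximum_suppression)
```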
Step S60, constructing a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and obtaining the initial scene dense depth map of the (t+1)-th frame.
Depth alignment: the dense depth is aligned with the sparse depth values; the depth alignment constraint is shown in formula (7):

$$E_d = \sum_{p} w_s(p)\left(D(p) - D_s(p)\right)^{2} \quad (7)$$

where D(p) denotes the depth value of pixel p in the scene dense depth map and D_s(p) denotes the depth value of pixel p in the scene sparse point map; w_s is a weight that takes the value 1 when pixel p is present in the scene sparse point map (and 0 otherwise).
Sharp edges, smooth regions: at the positions of depth edges the depth should remain well distinguishable, i.e., the edges stay sharp; in the other regions of an object, away from the depth edges, the depth should remain highly smooth, i.e., the regions stay smooth. The edge constraint is shown in formula (8):

$$E_e = \sum_{p}\sum_{q \in N(p)} w_e(p, q)\left(D(p) - D(q)\right)^{2} \quad (8)$$

where p and q denote two adjacent pixels, D(p) and D(q) are the depth values of pixels p and q in the scene dense depth map, D_e denotes the depth value of a pixel in the precise depth edge map of the (t+1)-th frame, w_e is a weight, and m(p) and m(q) are the gradient values of pixels p and q.
D_e denotes the depth value of a pixel in the precise depth edge map of the (t+1)-th frame: if the D_e values of two pixels differ, there is a depth edge at that position and no smoothing is performed there. In the smooth regions, the weight is set according to the gradient magnitude m(p) of the pixel; the smaller the gradient, the more likely the pixel belongs to a non-edge part, and a high smoothing coefficient is applied, so that texture edges and the other regions away from the edges can be smoothed.
Temporal coherence: in a video sequence the images have high temporal correlation; the depth values of the previous frame and the current frame are relatively close and should not exhibit excessive depth jumps, which would make the depth map flicker. The temporal coherence constraint is shown in formula (9):

$$E_t = \sum_{p} w_t\left(D(p) - D_t(p)\right)^{2} \quad (9)$$

where D_t(p) denotes the depth value of pixel p in the previous frame image, D(p) denotes the depth value of pixel p in the scene dense depth map, and w_t is a weight.
The three constraints are combined to construct an overall optimization function. For the overall optimization function, different maximum iteration counts are set for different frames and the depth is solved with the Conjugate Gradient (CG) method. For key frames a higher number of iterations is set, 50 iterations in some preferred embodiments; for non-key frames the inter-frame variation is usually less severe, so the maximum number of iterations is lower, 30 iterations in some preferred embodiments.
In the iterative solution, a hierarchical iterative solution method and an effective initialization strategy are used to accelerate the iteration towards the initial scene dense depth. Hierarchical iteration means that a pyramid is used for downsampling to obtain depth-map solutions at different resolutions, after which the final dense depth values are fused following the idea of weighted voting.
The initialized depth map fuses two sources: first, the sparse depth map of the current frame; second, since a video sequence is temporally consistent and the content between adjacent frames usually only moves slightly (except in challenging environments), the dense depth map of the previous frame after pose transformation is also fused into the initialization. All frames except the first use this initialization strategy for the depth solution, which improves the convergence speed. A sketch of the linear-system formulation and CG solve is given below.
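Since the three constraints (7)-(9) are quadratic in the unknown dense depth, their sum can be minimized by solving a sparse linear system; the sketch below assembles such a system and solves it with SciPy's conjugate gradient routine. The weight values, the 4-neighborhood choice and the flattening scheme are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def propagate_depth(sparse_depth, sparse_mask, we_h, we_v, prev_depth=None,
                    w_t=0.1, max_iter=50, x0=None):
    """Minimize the quadratic energy of constraints (7)-(9) on a 4-neighborhood grid.

    sparse_depth, sparse_mask : D_s and where it is valid (w_s = 1 there).
    we_h[i, j] : edge weight between pixel (i, j) and (i, j+1)   (shape h x (w-1)).
    we_v[i, j] : edge weight between pixel (i, j) and (i+1, j)   (shape (h-1) x w).
    prev_depth : warped previous-frame dense depth D_t, or None.
    """
    h, w = sparse_depth.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)

    # Data terms (7) and (9) contribute to the diagonal and to the right-hand side.
    diag = sparse_mask.ravel().astype(np.float64)
    rhs = (sparse_mask * sparse_depth).ravel().astype(np.float64)
    if prev_depth is not None:
        diag = diag + w_t
        rhs = rhs + w_t * prev_depth.ravel()

    # Smoothness term (8): weighted graph Laplacian over horizontal/vertical neighbors.
    p = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    q = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    we = np.concatenate([we_h.ravel(), we_v.ravel()]).astype(np.float64)
    lap = (sp.coo_matrix((we, (p, p)), shape=(n, n))
           + sp.coo_matrix((we, (q, q)), shape=(n, n))
           - sp.coo_matrix((we, (p, q)), shape=(n, n))
           - sp.coo_matrix((we, (q, p)), shape=(n, n)))

    A = sp.diags(diag) + lap
    x, _ = cg(A.tocsr(), rhs, x0=None if x0 is None else x0.ravel(), maxiter=max_iter)
    return x.reshape(h, w)
```

In a hierarchical scheme as described above, this solve would run at each pyramid level, warm-started (via x0) with the upsampled coarser-level result or the warped previous-frame depth.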
Step S70, based on the initial scene dense depth map of the (t+1)-th frame, smoothing the residual texture edges through a bilateral filter to obtain the final scene dense depth map of the (t+1)-th frame; a brief sketch follows.
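A minimal sketch of step S70 with OpenCV's bilateral filter; the filter diameter and the two sigma values are illustrative assumptions only.

```python
import cv2
import numpy as np

def smooth_residual_edges(dense_depth):
    """Step S70: edge-preserving smoothing of residual texture edges in the depth map."""
    depth32 = dense_depth.astype(np.float32)
    # d = 9 pixel neighborhood; sigmaColor / sigmaSpace chosen for illustration only.
    return cv2.bilateralFilter(depth32, d=9, sigmaColor=0.1, sigmaSpace=5.0)
```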
The foregoing steps describe in detail only the IMU initialization over the 0th to t-th frames and the scene dense depth map acquisition for the (t+1)-th frame; after the final scene dense depth map of the (t+1)-th frame is obtained, the process may return to step S20 and continue with the scene dense depth map of the next frame until all image frames are processed.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
The morphology-based scene dense depth map acquisition system of the second embodiment of the present invention includes the following modules:
the IMU initialization module, configured to initialize the IMU based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters comprise a gravity rotation matrix, a scale factor, velocities and IMU biases;
the visual-inertial joint optimization module, configured to solve the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, construct the inertial error for the t-th to (t+1)-th frame images, construct the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combine the inertial error and the reprojection error into a visual-inertial joint error optimization function;
the solution iteration module, which solves the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtains the camera pose and scene sparse point map of the (t+1)-th frame image;
the morphological edge extraction module, configured to downsample the (t+1)-th frame image through a Gaussian pyramid, extract a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsample through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame;
the precise depth edge extraction module, configured to modify the edge judgment criterion with a morphological threshold constraint to improve the edge detector, and to obtain the precise depth edge map of the (t+1)-th frame through the improved edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame;
the densification module, configured to construct a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and to obtain the initial scene dense depth map of the (t+1)-th frame;
and the smoothing module, configured to smooth the residual texture edges through a bilateral filter based on the initial scene dense depth map of the (t+1)-th frame to obtain the final scene dense depth map of the (t+1)-th frame.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the scene dense depth map acquisition system based on morphology provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. Names of the modules and steps related in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic apparatus according to a third embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein, the first and the second end of the pipe are connected with each other,
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the morphology-based scene dense depth map acquisition method described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions to be executed by the computer to implement the above-mentioned morphology-based scene dense depth map acquisition method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that programs corresponding to the software modules and method steps may be located in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can be within the protection scope of the invention.

Claims (10)

1. A morphology-based scene dense depth map acquisition method, characterized by comprising the following steps:
step S10, initializing an IMU (Inertial Measurement Unit) based on the 0th to t-th frame images of a monocular camera and the corresponding IMU measurements to obtain initialized IMU parameters; the IMU parameters comprise a gravity rotation matrix, a scale factor, velocities and IMU biases;
step S20, solving the IMU pre-integration based on the initialized IMU parameters to obtain the rotation R, velocity v and position p pre-integrated measurements, constructing the inertial error for the t-th to (t+1)-th frame images, constructing the reprojection error based on the ORB matching result of the t-th to (t+1)-th frame images, and combining the inertial error and the reprojection error into a visual-inertial joint error optimization function;
step S30, solving the visual-inertial joint error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, and obtaining the camera pose and scene sparse point map of the (t+1)-th frame image;
step S40, downsampling the (t+1)-th frame image through a Gaussian pyramid, extracting a coarse morphological edge from the downsampled image using morphological dilation and erosion operations, and upsampling through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame;
step S50, modifying the edge judgment criterion with a morphological threshold constraint to improve the Canny edge detector, and obtaining the precise depth edge map of the (t+1)-th frame through the improved Canny edge detector based on the (t+1)-th frame image and the preliminary morphological edge of the (t+1)-th frame;
step S60, constructing a depth alignment constraint, an edge constraint and a temporal coherence constraint for sparse depth propagation based on the scene sparse point map of the (t+1)-th frame and the precise depth edge map of the (t+1)-th frame, and obtaining the initial scene dense depth map of the (t+1)-th frame;
step S70, based on the initial scene dense depth map of the (t+1)-th frame, smoothing the residual texture edges through a bilateral filter to obtain the final scene dense depth map of the (t+1)-th frame.
2. The morphology-based scene dense depth map acquisition method according to claim 1, wherein step S10 includes:
step S11, acquiring the camera poses of the k key frames among the 0th to t-th frame images by pure visual BA optimization, and obtaining initial velocity values in the world coordinate system;
step S12, denoting the IMU measurements between the 0th to k-th key frames as I_{0:k} and the IMU parameters to be initialized as y_k = {s, R_wg, b, v}, constructing a graph optimization function for the inertial motion and expressing it as a maximum a posteriori probability:

$$p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$$

step S13, based on the maximum a posteriori representation and the independence of the IMU measurements, converting the problem into a maximum likelihood estimation:

$$y_k^{*} = \arg\max_{y_k}\; p(y_k)\prod_{i=1}^{k} p\!\left(I_{i-1,i} \mid s, R_{wg}, b, v_{i-1}, v_i\right)$$

where y_k is the parameter set to be solved in IMU initialization and p(y_k) is its prior probability; R_wg represents the gravity rotation matrix from the inertial coordinate system to the world coordinate system; s is the scale factor; b is the IMU bias, including the gyroscope bias and the accelerometer bias; v is the set of velocities in the inertial coordinate system of the 0th to k-th key frames, with v_{i-1} and v_i the velocities in the (i-1)-th and i-th key frame inertial coordinate systems; and I_{i-1,i} represents the IMU measurements between the (i-1)-th and i-th key frames;
step S14, taking the negative logarithm converts the estimation into minimizing the sum of errors:

$$y_k^{*} = \arg\min_{y_k}\left(\left\|r_p\right\|_{\Sigma_p}^{2} + \sum_{i=1}^{k}\left\|r_{I_{i-1,i}}\right\|_{\Sigma_{I_{i-1,i}}}^{2}\right)$$

where r_p is the prior residual, r_{I_{i-1,i}} is the measurement residual of the IMU, and Σ represents the corresponding information matrix;
step S15, when the camera moves at a constant speed, solving the minimized error subject to a preset first constraint condition to obtain the gravity rotation matrix, scale factor, velocities and IMU biases; when the camera moves slowly, fixing the IMU bias b and the gravity rotation matrix R_wg, solving the minimized error subject to the preset first constraint condition to obtain the scale factor, updating the current camera pose based on the scale factor, and performing IMU initialization again until the camera moves at a constant speed; the preset first constraint condition is that both the IMU pre-integration and the prior probability satisfy a Gaussian distribution.
3. The morphology-based scene dense depth map acquisition method according to claim 1, wherein step S40 includes:
step S41, performing interlaced downsampling of the (t+1)-th frame image through a Gaussian pyramid, resizing the image to half its original size to obtain the (t+1)-th frame downsampled image;
step S42, convolving the (t+1)-th frame downsampled image with a 5×5 rectangular kernel by a dilation operation and an erosion operation respectively, taking local maxima and minima within the region, to obtain the (t+1)-th frame dilated image and the (t+1)-th frame eroded image;
step S43, taking the difference between the (t+1)-th frame dilated image and the (t+1)-th frame eroded image to obtain the (t+1)-th frame coarse morphological edge;
step S44, upsampling the (t+1)-th frame coarse morphological edge through Gaussian convolution to obtain the preliminary morphological edge of the (t+1)-th frame, with the same size as the original image.
4. The morphology-based scene dense depth map acquisition method according to claim 1, wherein step S50 includes:
step S51, based on the preliminary morphological edge of the (t+1)-th frame and the (t+1)-th frame image, denoising with a 7×7 Gaussian smoothing filter and extracting the x-direction and y-direction gradients of the denoised images with the Sobel operator;
step S52, selecting the maximum value over the 3 channels in the x direction and the y direction as the gradient value of that direction, and computing the morphological edge gradient value M_c and the image gradient value M_i;
step S53, applying non-maximum suppression to the (t+1)-th frame image, and searching for local gradient maxima within each pixel's neighborhood as the candidate edge pixel set;
step S54, judging the category of each pixel in the candidate edge pixel set by combining the morphological edge gradient value M_c and the image gradient value M_i with the morphological threshold τ_mor and the strong edge threshold τ_high and weak edge threshold τ_low of the image:
if M_i > τ_high and M_c > τ_mor, it is a strong edge pixel; if τ_high ≥ M_i ≥ τ_low, it is a weak edge pixel; otherwise it is a non-edge pixel;
step S55, performing edge hysteresis tracking on the weak edge pixels: if a strong edge pixel exists in the 8-neighborhood of a weak edge pixel, it is converted into a strong edge pixel; otherwise it is converted into a non-edge pixel;
step S56, traversing the candidate edge pixel set to obtain the precise depth edge map of the (t+1)-th frame.
5. The morphology-based scene dense depth map acquisition method according to claim 1, wherein the depth alignment constraint conditions are:
Figure FDA0002995174610000041
wherein D(p) represents the depth value corresponding to pixel point p in the scene dense depth map, and D_s(p) represents the depth value corresponding to pixel point p in the scene sparse point map; w_s is the weight, which takes the value 1 if pixel point p is present in the scene sparse point map.
6. The morphology-based scene dense depth map acquisition method according to claim 1, wherein the edge constraint condition is:
$\sum_{p} \sum_{q \in \mathcal{N}(p)} w_{e}\,\bigl(D(p) - D(q)\bigr)^{2}$
[weight formula: w_e is defined from the depth edge map D_e and the gradient values m(p) and m(q)]
wherein p and q represent two adjacent pixel points; D(p) and D(q) represent the depth values corresponding to pixel point p and pixel point q in the scene dense depth map respectively; D_e represents the depth value of the pixel point in the t+1 th frame accurate depth edge map; w_e is the weight; and m(p) and m(q) are the gradient values of pixel points p and q respectively.
7. The morphology-based scene dense depth map acquisition method of claim 1, wherein the temporal coherence constraint is:
$\sum_{p} w_{t}\,\bigl(D(p) - D_{t}(p)\bigr)^{2}$
wherein D_t(p) represents the depth value of pixel point p in the previous frame image, D(p) represents the depth value corresponding to pixel point p in the scene dense depth map, and w_t is the weight.
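As an illustration of how the three constraint conditions of claims 5-7 can be combined, the sketch below assembles them into one quadratic objective and solves it as a sparse linear system; the exponential edge weighting, the function name and all parameter values are assumptions, not the patent's exact formulation:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def densify(D_s, sparse_mask, edge_map, D_prev, w_t=0.1, lam=1.0):
    # Minimise  sum_p w_s (D - D_s)^2 + lam * sum_{p~q} w_e (D(p) - D(q))^2
    #         + w_t * sum_p (D - D_prev)^2   by solving the normal equations A D = b
    h, w = D_s.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    b = np.zeros(n)

    # depth alignment: anchor pixels that carry a sparse depth (w_s = 1)
    w_s = sparse_mask.ravel().astype(float)
    rows += list(range(n)); cols += list(range(n)); vals += list(w_s)
    b += w_s * D_s.ravel()

    # temporal coherence: pull every pixel towards the previous frame's depth
    rows += list(range(n)); cols += list(range(n)); vals += [w_t] * n
    b += w_t * D_prev.ravel()

    # edge-aware smoothness: weak coupling across depth edges, strong inside regions
    w_pix = lam * np.exp(-edge_map.ravel().astype(float))
    for p, q in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        p, q = p.ravel(), q.ravel()
        w_e = np.minimum(w_pix[p], w_pix[q])
        rows += list(p) + list(q) + list(p) + list(q)
        cols += list(p) + list(q) + list(q) + list(p)
        vals += list(w_e) * 2 + list(-w_e) * 2

    A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return spsolve(A, b).reshape(h, w)

A direct sparse solve is used here for clarity; the acceleration mentioned in the abstract would typically come from a coarse-to-fine pyramid or a preconditioned conjugate-gradient solver, though the patent's exact scheme is not reproduced here.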
8. A morphology-based scene dense depth map acquisition system, characterized by comprising the following modules:
The IMU initialization module is configured to initialize the IMU based on the 0th to t-th frame images of the monocular camera and the corresponding IMU measurement values to obtain the initialized IMU parameters; the IMU parameters comprise the gravity rotation matrix, the scale factor, the velocity and the IMU bias;
The vision-inertia combined optimization module is configured to solve the IMU pre-integration based on the initialized IMU parameters to obtain the integral quantities of the rotation matrix R, the velocity v and the position p, construct the inertial error corresponding to the t to t+1 frame images, construct the reprojection error based on the ORB matching result of the t to t+1 frame images, and construct the vision-inertia combined error optimization function by combining the inertial error and the reprojection error;
The solution iteration module is configured to solve the vision-inertia combined error optimization function with the nonlinear optimization framework g2o, iterating by the Levenberg-Marquardt method until the error function value is smaller than a set value, to obtain the camera pose and the scene sparse point map of the t+1 th frame image;
The morphological edge extraction module is configured to downsample the t+1 th frame image through a Gaussian pyramid, extract a morphological coarse edge from the downsampled image using morphological dilation and erosion operations, and upsample through Gaussian convolution to obtain the t+1 th frame preliminary morphological edge;
The accurate depth edge extraction module is configured to modify the edge judgment criterion with the morphological threshold constraint to improve the edge detector, and to obtain the t+1 th frame accurate depth edge map through the improved edge detector based on the t+1 th frame image and the t+1 th frame preliminary morphological edge;
The densification module is configured to construct the depth alignment constraint condition, the edge constraint condition and the temporal coherence constraint condition for sparse depth propagation based on the t+1 th frame scene sparse point map and the t+1 th frame accurate depth edge map, and to obtain the t+1 th frame initial scene dense depth map;
and the smoothing processing module is configured to smooth residual texture edges through a bilateral filter based on the t+1 th frame initial scene dense depth map, to obtain the t+1 th frame final scene dense depth map.
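A rough per-frame wiring of the modules in claim 8, purely as a sketch; every class and method name below is invented for illustration:

class DenseDepthPipeline:
    """Chains the modules of the morphology-based dense depth system per frame."""

    def __init__(self, imu_init, vio, solver, morph_edge, depth_edge, densifier, smoother):
        self.imu_init = imu_init        # IMU initialization module
        self.vio = vio                  # vision-inertia combined optimization module
        self.solver = solver            # g2o / Levenberg-Marquardt solution iteration module
        self.morph_edge = morph_edge    # morphological edge extraction module
        self.depth_edge = depth_edge    # accurate depth edge extraction module
        self.densifier = densifier      # densification (sparse depth propagation) module
        self.smoother = smoother        # bilateral-filter smoothing module

    def process_frame(self, frame, imu_measurements, prev_depth):
        params = self.imu_init.update(frame, imu_measurements)
        cost = self.vio.build_cost(frame, params)
        pose, sparse_points = self.solver.solve(cost)
        coarse = self.morph_edge.extract(frame)
        edges = self.depth_edge.refine(frame, coarse)
        dense = self.densifier.propagate(sparse_points, edges, prev_depth)
        return pose, self.smoother.bilateral_smooth(dense)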
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the morphology-based scene dense depth map acquisition method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to implement the morphology-based scene dense depth map acquisition method of any one of claims 1-7.
CN202110327446.5A 2021-03-26 2021-03-26 Morphology-based scene dense depth map acquisition method, system and device Active CN112927251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110327446.5A CN112927251B (en) 2021-03-26 2021-03-26 Morphology-based scene dense depth map acquisition method, system and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110327446.5A CN112927251B (en) 2021-03-26 2021-03-26 Morphology-based scene dense depth map acquisition method, system and device

Publications (2)

Publication Number Publication Date
CN112927251A CN112927251A (en) 2021-06-08
CN112927251B true CN112927251B (en) 2022-10-14

Family

ID=76176283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110327446.5A Active CN112927251B (en) 2021-03-26 2021-03-26 Morphology-based scene dense depth map acquisition method, system and device

Country Status (1)

Country Link
CN (1) CN112927251B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298796B (en) * 2021-06-10 2024-04-19 西北工业大学 Line characteristic SLAM initialization method based on maximum posterior IMU
CN115880347B (en) * 2021-09-27 2023-10-20 荣耀终端有限公司 Image processing method, electronic device, storage medium, and program product
CN114266816B (en) * 2021-12-23 2023-03-24 合肥瑞识智能科技有限公司 Dense depth map construction method, device, equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416840A (en) * 2018-03-14 2018-08-17 大连理工大学 Dense three-dimensional scene reconstruction method based on a monocular camera
CN109300190A (en) * 2018-09-06 2019-02-01 百度在线网络技术(北京)有限公司 Processing method, device, equipment and the storage medium of three-dimensional data
CN111079765A (en) * 2019-12-13 2020-04-28 电子科技大学 Sparse point cloud densification and pavement removal method based on depth map
CN112435325A (en) * 2020-09-29 2021-03-02 北京航空航天大学 Unmanned aerial vehicle scene dense reconstruction method based on VI-SLAM and a depth estimation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Edge-Aware Monocular Dense Depth Estimation with Morphology; Zhi Li et al.; 《2020 25th International Conference on Pattern Recognition》; 20210115; full text *
A Survey of Event-Based Vision Sensors and Their Applications; Kong Delei et al.; 《Information and Control》; 20210131; Vol. 50, No. 1; full text *

Also Published As

Publication number Publication date
CN112927251A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN112927251B (en) Morphology-based scene dense depth map acquisition method, system and device
CN106651938B (en) A kind of depth map Enhancement Method merging high-resolution colour picture
WO2021077720A1 (en) Method, apparatus, and system for acquiring three-dimensional model of object, and electronic device
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
WO2022088982A1 (en) Three-dimensional scene constructing method, apparatus and system, and storage medium
CN109961506B (en) Local scene three-dimensional reconstruction method for fusion improved Census diagram
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
Kolev et al. Turning mobile phones into 3D scanners
KR101195942B1 (en) Camera calibration method and 3D object reconstruction method using the same
CN113140011B (en) Infrared thermal imaging monocular vision distance measurement method and related components
CN108010081B (en) RGB-D visual odometer method based on Census transformation and local graph optimization
CN112258658B (en) Augmented reality visualization method based on depth camera and application
EP2656309B1 (en) Method for determining a parameter set designed for determining the pose of a camera and for determining a three-dimensional structure of the at least one real object
US20140072175A1 (en) Fast articulated motion tracking
CN108564616A (en) Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust
CN109472820B (en) Monocular RGB-D camera real-time face reconstruction method and device
CN110211169B (en) Reconstruction method of narrow baseline parallax based on multi-scale super-pixel and phase correlation
CN107038758B (en) Augmented reality three-dimensional registration method based on ORB operator
WO2023071790A1 (en) Pose detection method and apparatus for target object, device, and storage medium
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
US10229508B2 (en) Dynamic particle filter parameterization
CN113674400A (en) Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
CN114519772A (en) Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
CN117274515A (en) Visual SLAM method and system based on ORB and NeRF mapping

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant