CN110021041B - Unmanned scene incremental gridding structure reconstruction method based on binocular camera


Info

Publication number
CN110021041B
CN110021041B (application CN201910156872.XA)
Authority
CN
China
Prior art keywords
scene
visual
frame
gridding
camera
Prior art date
Legal status
Active
Application number
CN201910156872.XA
Other languages
Chinese (zh)
Other versions
CN110021041A (en)
Inventor
朱建科
李昱辰
章国锋
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910156872.XA
Publication of CN110021041A
Application granted
Publication of CN110021041B

Classifications

    • G06T 7/579: Image analysis; depth or shape recovery from multiple images, from motion
    • G06T 2207/10016: Image acquisition modality; video or image sequence
    • G06T 2207/20228: Special algorithmic details; disparity calculation for image-based rendering

Abstract

The invention discloses a binocular-camera-based incremental gridding structure reconstruction method for unmanned-driving scenes. An input binocular video frame is preprocessed, its disparity map is computed, and the corresponding depth map is derived from the disparity map; an initial scene gridding structure is then built through triangulation and mesh subdivision stages. After mesh "spiking" artifacts are eliminated, the mesh structure is smoothed. Positioning information acquired by satellite navigation equipment is used to compute the motion between the current frame and the previous frame; combined with the current frame's scene gridding structure reconstruction result, the new mesh patches to be added to the global scene gridding structure are computed, and the splicing of the scene gridding structures is finally completed by a local re-triangulation method. The invention innovatively uses the visual difference between two image frames to find newly visible areas and incrementally updates the overall scene reconstruction result in gridding-structure form, yielding a three-dimensional scene reconstruction method with better performance and higher robustness across a variety of environments.

Description

Unmanned scene incremental gridding structure reconstruction method based on binocular camera
Technical Field
The invention belongs to the technical field of three-dimensional scene reconstruction, and particularly relates to an unmanned scene incremental gridding structure reconstruction method based on a binocular camera.
Background
Unmanned vehicles such as self-driving cars and autonomous drones are developing rapidly, and for them automatic navigation and path planning during driving or flight are key technical links. Almost all automatic navigation and path planning techniques depend on some type of scene structure map, and conventional scene reconstruction generally adopts an offline reconstruction scheme based on various kinds of high-precision hardware. For example, in the field of autonomous driving, researchers often collect and reconstruct scene structure data with a manned vehicle carrying a high-precision RTK (real-time kinematic, carrier-phase differential) device and a multi-beam LiDAR. This acquisition method has two drawbacks: 1) the constructed scene structure consists of dense laser point clouds and is essentially a scattered-point structure, whereas real scene structures consist of continuous lines or surfaces, so a scattered-point structure loses important continuous structural information; 2) the scheme requires high hardware cost and is not affordable for all research institutions and development teams.
To address the shortcomings of offline reconstruction schemes based on high-precision hardware, and with the development of SLAM (simultaneous localization and mapping), some advanced SLAM techniques can directly construct a dense scene map through state estimation; however, this approach still has defects that are currently hard to solve perfectly: 1) the constructed scene structure consists of dense colored point clouds and is essentially a scattered-point structure, which again loses important continuous structural information; 2) owing to the diversity of scenes and various unexpected events within a scene, even the most advanced SLAM techniques cannot maintain excellent state estimation in all types of environments, so a dense scene map built entirely by SLAM often cannot reach satisfactory accuracy.
In summary, the present invention needs to solve the following problems:
1. Discontinuous scene structure reconstruction results: the dense maps built by many conventional reconstruction schemes are scattered-point "pseudo-dense" maps, which clearly cannot directly support tasks such as automatic navigation and path planning for unmanned vehicles;
2. Excessive reconstruction cost: schemes based on high-precision RTK devices and multi-beam LiDAR require very expensive hardware such as high-precision satellite navigation equipment and 64-beam LiDAR; for most research and development teams this hardware cost is a significant economic burden, and for companies such a scheme obviously cannot meet mass-production requirements because of its cost;
3. Insufficient accuracy of reconstruction results: many conventional schemes, especially those based on motion-information solving techniques, often produce scene structures of unsatisfactory accuracy; among the many causes, a fundamental one is that a single type of algorithm usually cannot produce satisfactory results in all situations, a widely inherent disadvantage of purely algorithmic solutions.
Disclosure of Invention
In order to solve the problems in the background art, the invention combines stereoscopic vision, triangulation, mesh subdivision, mesh optimization and related techniques to develop a complete incremental scene gridding structure reconstruction method that maintains high stability in outdoor open environments, and the effectiveness of the system has been verified by a large number of experiments.
The technical scheme adopted by the invention comprises the following steps:
1) A single-frame binocular video frame of the scene captured by the vehicle-mounted binocular camera is input, and scene gridding structure reconstruction is performed on the input frame to obtain mesh structure features.
The method applies to unmanned driving and to other indoor or outdoor scenes with distinct visual texture features; the unmanned-driving scenes specifically include common road scenes such as urban roads, rural roads and expressways.
The step 1) is specifically as follows:
1.1) Visual feature points are extracted separately on the two images of a single-frame binocular video frame acquired by the binocular camera. Various visual feature point schemes are possible, such as FAST, ORB or BRIEF feature points, chosen according to the environmental characteristics, the system frame-rate requirements, the hardware devices and so on; the best-performing feature type may differ between environment types. The extracted FAST, ORB or BRIEF feature points serve as the visual feature points.
1.2) Visual feature point matching stage: the visual feature points obtained in step 1.1) are matched by a brute-force matching method to obtain visual feature point pairs, and the disparity value of each pair is calculated;
1.3) visual feature point depth estimation stage: establishing a camera coordinate system by taking the initial position of the left eye camera as a coordinate origin, and calculating the three-dimensional position of the visual feature point of the left eye image in the visual feature point pair in the camera coordinate system according to the parallax value of the visual feature point pair in the step 1.2), wherein the specific calculation process is as follows:
1.3.1) calculating the depth value of each visual characteristic point, thereby obtaining a sparse depth map of the left eye image:
z = f · b / d
wherein d is the disparity value, b is the baseline length of the binocular camera, f is the camera focal length, and z is the depth value;
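For illustration, a minimal sketch of this depth-from-disparity relation (function and variable names here are illustrative, not taken from the patent):

```python
import numpy as np

def disparity_to_depth(d, baseline, focal):
    """Depth from disparity on a rectified stereo pair: z = f * b / d.
    Non-positive disparities are mapped to infinite depth."""
    d = np.asarray(d, dtype=np.float64)
    z = np.full(d.shape, np.inf)
    valid = d > 0
    z[valid] = focal * baseline / d[valid]
    return z
```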
1.3.2) calculating the three-dimensional position of the visual characteristic point of the left eye image in a camera coordinate system:
X = (u − c_x) · z / f_x
Y = (v − c_y) · z / f_y
Z = z
P_w = z · K⁻¹ · P_uv, where P_uv = (u, v, 1)ᵀ and K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is the internal parameter matrix of the binocular camera; P_uv is the homogeneous coordinate of the visual feature point in the image pixel coordinate system, a two-dimensional coordinate system whose origin is the upper-left corner of the image; P_w is the coordinate of the visual feature point in the camera coordinate system; u and v are the positions of the visual feature point on the horizontal and vertical coordinate axes of the image pixel coordinate system; and X, Y and Z are the positions of the visual feature point on the x, y and z axes of the camera coordinate system;
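The back-projection P_w = z · K⁻¹ · P_uv can be sketched in the same spirit, assuming a standard pinhole intrinsic matrix K (the helper name is hypothetical):

```python
import numpy as np

def backproject(u, v, z, K):
    """Back-project pixel (u, v) with depth z into the camera frame:
    P_w = z * K^-1 * P_uv, with P_uv the homogeneous pixel coordinate."""
    P_uv = np.array([u, v, 1.0])
    return z * (np.linalg.inv(K) @ P_uv)
```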
1.4) Triangulation stage: the interior of the visual feature point set, i.e. the set of visual feature points on the left-eye image, is triangulated by a Delaunay triangulation method to obtain a triangular mesh structure;
1.5) Mesh subdivision stage: depth interpolation is performed over the current frame's triangular mesh structure according to the mesh generated in step 1.4) and the depth value of each mesh vertex; the visual feature points on the left-eye image in the top 5% of depth error over the whole mesh, measured with Census descriptors, are iteratively selected as feature points to be updated, re-matched according to the Hamming distance, and added to the visual feature point set; Delaunay triangulation is then performed again inside the enlarged set, yielding the subdivided scene gridding structure.
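The triangulation and re-triangulation of the feature point set can be sketched with an off-the-shelf Delaunay routine; the snippet below assumes SciPy is available and is only a sketch of the idea, not the disclosed embodiment:

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate(points_2d):
    """Delaunay triangulation of the visual feature points on the left image.
    points_2d: (N, 2) array of (u, v) pixel coordinates.
    Returns an (M, 3) array of vertex indices, one row per triangular patch."""
    return Delaunay(np.asarray(points_2d, dtype=np.float64)).simplices

# subdivision then amounts to appending the re-matched points and triangulating again:
# tri = triangulate(np.vstack([points_2d, new_points_2d]))
```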
2) The positions of the mesh "spiking" phenomenon in the scene gridding structure are found using the mesh structure features, the spiking is eliminated, and the smoothness of the whole scene gridding structure is improved by an approximate Laplacian smoothing method.
The "spiking" phenomenon refers to a phenomenon that the depths of a single or few grid vertexes appearing at local positions of the scene gridding structure are excessively different from those of surrounding grids, and is one of the most main reasons for inaccurate estimation of the whole scene gridding structure.
The step 2) is specifically as follows:
2.1) grid "spiking" removal stage: processing each mesh vertex on the scene meshing structure in a traversal mode, and replacing the depth of each mesh vertex by the average depth of each adjacent vertex when the depth of each mesh vertex is larger than or smaller than the depth of each adjacent mesh vertex and the absolute value of the difference between the average depth of each adjacent vertex and the average depth of each adjacent vertex is larger than a threshold value;
2.2) Mesh structure smoothing stage: the mesh vertices in the scene gridding structure are smoothed one by one with an approximate Laplacian smoothing method, computed as follows:
P_o = (((1 − α) · Z_c + α · Z_n) / Z_c) · P_w
wherein Z_c is the depth value of the mesh vertex to be processed, Z_n is the average depth value of all adjacent vertices of that vertex, alpha is a manually set damping parameter, P_w is the position of the vertex in the camera coordinate system, and P_o is the optimized position of the vertex in the camera coordinate system.
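Read this way, the smoothing relaxes the vertex depth toward the neighbour average while moving the vertex only along its viewing ray, so its pixel position is preserved; a one-function sketch under that reading:

```python
import numpy as np

def smooth_vertex(P_w, Z_c, Z_n, alpha):
    """Approximate Laplacian smoothing of one vertex (P_w: 3-vector in the
    camera frame). The damped depth update keeps the projection fixed."""
    Z_o = (1.0 - alpha) * Z_c + alpha * Z_n  # damped depth toward neighbour mean
    return np.asarray(P_w) * (Z_o / Z_c)     # scale along the viewing ray
```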
3) Position information of the scene is acquired by the vehicle-mounted satellite navigation equipment; the motion between the previous frame and the current frame is computed from this position information, giving a transformation matrix describing the continuous inter-frame motion, and a new visual area is then detected. The detection is: a virtual image is constructed from the transformation matrix and the previous frame's scene gridding structure, and the new visual area of the current frame relative to the previous frame is determined from the visual difference between the virtual image and the current frame.
The method for detecting the new visual area in step 3) is to detect the new visual area in the current frame in a virtual image-based manner, and specifically includes the following steps:
each pixel point of a left eye image in a previous single-frame binocular video frame is processed as follows to construct a virtual image:
z_c · P_c = K · T · (z_p · K⁻¹ · P_p)
wherein P_p is the homogeneous coordinate of the pixel in the pixel coordinate system of the left-eye image of the previous single-frame binocular video frame, z_p is the depth value of the pixel at the previous-frame moment, P_c is the homogeneous coordinate of the pixel in the pixel coordinate system of the left-eye image of the current single-frame binocular video frame, z_c is the depth value of the pixel at the current-frame moment, T is the motion matrix corresponding to the inter-frame motion information, and K is the internal parameter matrix of the binocular camera;
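A sketch of this forward warp for a single pixel, assuming T is the 4x4 homogeneous motion matrix from the previous to the current camera (the matrix layout is an assumption):

```python
import numpy as np

def warp_to_current(P_p_uv, z_p, T, K):
    """Forward-warp one pixel of the previous left image into the current
    frame, following z_c * P_c = K * T * (z_p * K^-1 * P_p)."""
    ray = z_p * (np.linalg.inv(K) @ np.array([P_p_uv[0], P_p_uv[1], 1.0]))
    X = T @ np.append(ray, 1.0)      # 3D point carried into the current frame
    z_c = X[2]
    P_c = (K @ X[:3]) / z_c          # homogeneous pixel coordinate (u, v, 1)
    return P_c[:2], z_c
```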
the method comprises the steps of utilizing a Navier-Stokes equation to carry out image restoration on a virtual image, then comparing visual differences on each triangular patch area in the restored virtual image and a current frame left-eye image to obtain a visual difference value, wherein the specific calculation mode of the visual difference value is as follows:
D = (1/n) · Σ_{i=1..n} |g_pi − g_ci|
wherein n is the total number of pixels in the triangular area, g_pi is the gray value at the i-th pixel position in the virtual image, g_ci is the gray value at the i-th pixel position in the current frame's left-eye image, and D is the visual difference value;
finally, the triangular patches whose visual difference value is higher than the average visual difference value are selected as component patches of the new visual area, completing its detection; the average visual difference value is the mean of the visual difference values of all triangular patch areas compared between the restored virtual image and the current frame's left-eye image.
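A sketch of the per-patch visual difference D, assuming 8-bit grayscale images and OpenCV for rasterizing the triangle mask (helper names are hypothetical):

```python
import numpy as np
import cv2

def patch_difference(virtual_gray, current_gray, triangle):
    """Mean absolute gray-level difference D over one triangular patch.
    triangle: (3, 2) array of the patch's pixel-coordinate vertices."""
    mask = np.zeros(current_gray.shape, dtype=np.uint8)
    cv2.fillConvexPoly(mask, triangle.astype(np.int32), 1)
    px = mask.astype(bool)
    diff = np.abs(virtual_gray[px].astype(np.float64) - current_gray[px])
    return diff.mean() if diff.size else 0.0

# patches whose D exceeds the mean D over all patches form the new visual area
```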
4) The position of the local scene gridding structure to be updated is determined from the accumulated motion of the current frame's camera position relative to the initial frame's camera position and the scene patches in the constructed overall scene gridding structure, combined with the new visual area from step 3).
The constructed overall scene gridding structure is the scene gridding structure built from all frames before the current frame; the overall scene gridding structure is the one built from the current frame together with all frames before it.
The step 4) is as follows: projecting a triangular patch in the constructed overall scene gridding structure onto a current frame by adopting the following calculation method:
z · P_uv = K · T · P_w
wherein P_w is the position of the triangular patch vertex in the world coordinate system with the first frame as origin, z is the depth, T is the accumulated motion information of the current frame's camera position relative to the initial frame's camera position, K is the internal parameter matrix of the binocular camera, and P_uv is the homogeneous coordinate, in the pixel coordinate system, of the patch vertex projected onto the current frame;
in the constructed overall scene gridding structure, the triangular patches that overlap or partially overlap the new visual area in the current frame and lie within 5 meters' spatial distance of that area are regarded as the positions of the local scene gridding structure to be updated.
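A vectorized sketch of this projection, under the same assumption that the accumulated motion T is a 4x4 homogeneous matrix:

```python
import numpy as np

def project_patches(V_w, T, K):
    """Project global-map patch vertices into the current frame at once,
    following z * P_uv = K * T * P_w; V_w is an (N, 3) array of world points."""
    Vh = np.hstack([V_w, np.ones((len(V_w), 1))])  # homogeneous world points
    Xc = (T @ Vh.T).T[:, :3]                       # points in the current camera frame
    uvw = (K @ Xc.T).T
    return uvw[:, :2] / uvw[:, 2:3], Xc[:, 2]      # pixel coordinates and depth z
```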
5) Incremental splicing of the scene gridding structure: after the new visual area is associated with the positions of the local scene gridding structure to be updated, the overall scene gridding structure is stitched by a Delaunay triangulation method.
The incremental splicing in step 5) is specifically: a connected-region search is performed over all triangular patches in the current frame's new visual area and all connected regions are sorted out; each connected region, together with the positions of the local scene gridding structure to be updated projected onto the current frame's left image, forms a position set over which Delaunay triangulation is performed again to update the overall scene gridding structure.
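The connected-region search can be sketched as a breadth-first traversal over patches that share an edge; this is one plausible reading of the step, not the patent's exact procedure:

```python
from collections import defaultdict, deque

def connected_regions(triangles):
    """Group new-visual-area patches into connected regions; two patches are
    connected when they share an edge (triangles: list of vertex-index triples)."""
    edge_owner = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (a, c)):
            edge_owner[tuple(sorted(e))].append(t)
    adj = defaultdict(set)                          # patch -> edge-sharing patches
    for owners in edge_owner.values():
        for x in owners:
            adj[x].update(o for o in owners if o != x)
    seen, regions = set(), []
    for start in range(len(triangles)):
        if start in seen:
            continue
        region, queue = [], deque([start])
        seen.add(start)
        while queue:
            cur = queue.popleft()
            region.append(cur)
            for nxt in adj[cur] - seen:
                seen.add(nxt)
                queue.append(nxt)
        regions.append(region)
    return regions
```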
6) Steps 1)-5) are iterated until all single-frame binocular video frames obtained by the vehicle-mounted binocular camera have been processed, finally yielding an overall scene gridding structure reconstruction result that meets the requirements of unmanned driving.
The invention has the beneficial effects that:
the invention creatively utilizes the visual difference between two frames of images to find a new visual area and incrementally updates the global scene reconstruction result in a gridding structure form, thereby obtaining the three-dimensional scene reconstruction method with better performance and higher robustness under various environments.
The invention significantly improves on the following three defects of conventional scene structure reconstruction schemes:
1. Discontinuous reconstruction results: the scene structure constructed by the invention is a strictly continuous mesh structure without structural holes, satisfying the scene-map requirements of unmanned vehicles for tasks such as automatic navigation and path planning.
2. Excessive reconstruction cost: the method does not depend on expensive multi-beam LiDAR equipment at all, significantly reducing the hardware cost required by the reconstruction system.
3. Insufficient accuracy of reconstruction results: the invention addresses accuracy from two angles: 1) the precision of scene depth calculation is improved by the mesh subdivision and mesh optimization techniques; 2) the inter-frame motion information obtained from the satellite navigation equipment is more accurate than that obtained by motion-information solving techniques.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 illustrates a scene data collection module;
FIG. 3 shows a single frame scene gridding structure solution module;
FIG. 4 is a mapping of a single frame mesh structure onto a single frame image;
FIG. 5 illustrates a single frame scene gridding structure optimization module;
FIG. 6 illustrates the scene gridding structure optimization effect reflected by a depth map; (a) gridding the scene before optimization reflected by the depth map; (b) gridding the optimized scene reflected by the depth map;
FIG. 7 illustrates a scene gridding structure incremental update module;
FIG. 8 illustrates a generated virtual image; (a) the image is a virtual image before image restoration processing; (b) the virtual image is obtained after image restoration processing;
fig. 9 shows new visual area detection results;
FIG. 10 illustrates a partial map of a new visual area; (a) (c) is the part with inaccurate depth value or incomplete visual information in the previous frame; (b) (d) new visual area detection results;
FIG. 11 shows scene gridding structure reconstruction results; (a) in the form of a grid; (b) in RGB rendering form.
Detailed Description
As shown in fig. 1, the present invention mainly includes the following four technical modules, and each technical module is further described with reference to the accompanying drawings:
first, scene data acquisition module (as shown in fig. 2)
The first stage, video frame acquisition stage based on binocular camera:
the method is characterized in that equipment such as a vehicle-mounted binocular camera, an unmanned aerial vehicle-mounted binocular camera or a handheld binocular camera is utilized to perform mobile exploration in a scene needing to construct a continuous gridding structure and acquire scene continuous binocular video frames, and particularly, the used binocular camera is strictly calibrated. In this stage, it is necessary to ensure that the binocular camera is kept horizontal during the acquisition process and the frame rate is kept stable, so as to ensure the availability of the acquired scene visual information.
And a second stage, namely a motion information acquisition stage based on the satellite navigation equipment:
and synchronously recording position information in a world coordinate system in the process of recording continuous binocular video frames of a scene by utilizing various satellite navigation equipment, and resolving the position information to obtain interframe motion information. It should be noted that, due to different frame numbers or other errors that are difficult to completely eliminate, scene visual information recorded by a binocular camera and position information recorded by satellite navigation equipment cannot be completely synchronized in a strict sense, and the solution adopted by the present invention is: assuming that the motion of the mobile carrier is a uniform motion within a short time (<0.1s), then performing uniform position compensation according to the difference between the corresponding scene visual information timestamp and the position information timestamp. Experiments have shown that this solution is effective.
Two, single frame scene gridding structure resolving module (as shown in figure 3)
The first stage, visual feature point extraction stage:
in the invention, the structure is presented in a grid form, so that the grid vertex is expected to more efficiently reflect the remarkable structural characteristics of the scene, and the accurate pixel point matching between the two images is realized by adopting a visual characteristic point extraction and matching mode. There may be various visual feature point selection schemes, such as FAST feature points, ORB feature points, BRIEF feature points, etc., and the selection may be flexible according to different environment characteristics, different system frame rate requirements, different hardware devices, etc., and experiments prove that the feature point types with the best effect may be different in different environments.
The second stage, visual feature point matching stage:
since the basic idea of scene depth estimation in the present invention is a stereoscopic vision idea, after the visual feature points on two images in a single frame of binocular video frame are extracted, the visual feature points on the two images need to be accurately matched. The process can be realized by various schemes, and as the scene structure reconstruction of the invention is not on-line reconstruction but off-line reconstruction, the invention has no strict requirement on the real-time performance of the matching scheme, so that the invention adopts a Brute-Force matching (Brute-Force mather) mode, directly calculates the Euclidean distance or the Hamming distance between each visual feature point between two images according to the type of the selected visual feature point, and then performs the visual feature point matching work between the two images, and experiments prove that the matching strategy is effective.
The third stage, a visual feature point depth estimation stage:
after the accurate matching of the visual feature points between the two images in the single-frame binocular video frame is completed, the parallax of the corresponding position can be obtained according to the pixel difference in the horizontal direction between the matched visual feature point pairs. In the invention, the left camera in the binocular camera is used as the basis of scene structure perception, so that the sparse disparity map of the left eye image can be obtained through the processing of the stage, and the position with disparity information is the position of the visual feature point on the left eye image. After the disparity value and the parameters of the binocular camera are obtained, the accurate depth value of the corresponding point can be obtained according to the principle of a binocular camera model, and the calculation mode is as follows:
z = f · b / d
wherein d is a parallax value, b is a binocular camera baseline length, f is a camera focal length, and z is a depth result;
thus, a sparse depth map of the left eye image is obtained.
After the sparse depth map of the left-eye image is obtained, the accurate position in three-dimensional space of each pixel carrying a depth value is still needed; it is computed from the pinhole camera model as follows:
X = (u − c_x) · z / f_x
Y = (v − c_y) · z / f_y
Z = z
P_w = z · K⁻¹ · P_uv, where P_uv = (u, v, 1)ᵀ and K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is the internal parameter matrix of the binocular camera, P_uv is the homogeneous coordinate of the visual feature point in the image pixel coordinate system, and P_w is the coordinate of the visual feature point in three-dimensional space.
Therefore, the accurate position of each visual characteristic point in the left eye image in the three-dimensional space is obtained.
A fourth stage, a triangulation stage:
the invention needs to reconstruct a scene gridding structure with complete continuity, and the accurate three-dimensional position of the sparse scene characteristic points is not enough. Here, the positions of the visual feature points representing the salient features of the scene in the left eye image in the three-dimensional space are obtained, and the scattered points need to form a continuous mesh through the connection between the points through a triangulation scheme. The invention is realized by adopting a Delaunay triangulation scheme, and the main reason for selecting the triangulation scheme is that the Delaunay triangulation scheme maximizes a minimum angle and can form a triangulation network which is closest to regularization, which is very helpful for truly embodying a scene structure.
The fifth stage, the mesh subdivision stage:
the scene gridding structure obtained through Delaunay triangulation is an initial scene gridding structure, the accuracy is not enough to support practical use, and the mesh needs to be further subdivided through a mesh subdivision scheme, so that the mesh achieves higher accuracy. Determining the mesh vertices newly added to the existing mesh is the main driving force of the mesh refinement scheme. In the third stage of the module, a sparse disparity map of the left eye image is obtained, in the fourth stage of the module, an initial scene gridding structure corresponding to the left eye image is obtained, based on the data bases, a dense disparity map of the left eye image can be obtained through a triangle internal interpolation method, according to the dense disparity map of the left eye image, any pixel position in the left eye image can be corresponding to the right eye image, and the error degree of the corresponding disparity is judged by measuring the hamming distance between Census descriptors (a descriptor calculation mode based on pixel neighborhood statistics) of paired pixels. After a dense error degree graph of a left eye image is obtained, the image is uniformly divided into a plurality of parts, a plurality of pixel positions with the highest error degree are taken in each part as positions of grid vertexes newly added into an initial scene grid, but the points do not have corresponding accurate positions in a three-dimensional space, for the points, as the whole system is based on a binocular camera system and the binocular camera is kept horizontal when scene visual information is recorded, only the corresponding positions in the right eye image with the highest matching degree are searched on a horizontal line according to Census descriptors, then parallax is calculated, the depth values of the points can be obtained, and the accurate positions of the points in the three-dimensional space can be obtained according to the depth values and a parameter matrix in the camera. After the mesh vertex position of a newly added mesh is determined, the original Delaunay triangulation result is emptied, and the Delaunay triangulation is carried out again according to all the mesh vertices to obtain the subdivided scene meshing structure. According to the actual precision requirement, the grid subdivision process is iterated for multiple times, and the scene grid structure reconstruction result which meets the precision requirement and is shown in fig. 4 can be obtained.
Three, single frame scene gridding structure optimization module (as shown in figure 5)
The first stage, the grid nail removing stage:
after the reconstruction result of the grid structure of the single frame scene is obtained, because the grid vertexes are only added to the grid structure iteratively and are not deleted according to strict logic in the grid subdivision stage, the grid structure of the scene still has errors, wherein the most significant error is the nail sticking error. The expression form of the error is that a few triangular mesh vertexes with very high prominence suddenly exist in certain parts of the mesh, the vertexes form a visual effect similar to 'nailing', the vertex mainly comes from matching errors in the process of resolving depth based on stereoscopic vision, and the phenomenon of 'nailing' can be caused when the parallax obtained according to the matching result is too large or too small. In the invention, the scheme of removing the grid nail prick is direct, whether a grid vertex is a key vertex causing the nail prick phenomenon is judged, the absolute value of the difference between the depth of the grid vertex and the average depth of all adjacent vertexes is mainly calculated, the depth of the grid vertex is compared with the depth of all adjacent vertexes, when the absolute value of the depth difference is larger than a certain threshold set according to practical experience and the depth of the grid vertex is larger than or smaller than the depth of all adjacent vertexes, the point is considered to be the key vertex causing the nail prick phenomenon, and the average depth of all adjacent vertexes of the point is used for replacing the original depth of the point, so that the single nail prick phenomenon can be solved. Experiments prove that after the process is applied to the whole grid area, all 'nailing' phenomena in the scene grid structure can be solved.
And a second stage, namely a grid structure smoothing stage:
in a conventional mesh structure smoothing scheme, a laplacian smoothing scheme is often used to smooth the mesh structure, which can significantly improve the smoothness of the mesh structure and also change the positions of a plurality of mesh vertices inside the mesh. However, in the present invention, in order to prevent the positions of the mesh vertices whose positions are changed from being changed on the image pixel coordinate system, an approximately laplacian smoothing scheme is adopted, which is still processed mesh-vertex-by-mesh vertex, according to the following formula:
P_o = (((1 − α) · Z_c + α · Z_n) / Z_c) · P_w
wherein Z_c is the depth value of the mesh vertex to be processed, Z_n is the average depth value of all adjacent vertices of that vertex, alpha is a manually set damping parameter, P_w is the position of the vertex in three-dimensional space, and P_o is the optimized position of the vertex in three-dimensional space.
Experiments prove that after the process is applied to the whole grid area, the smoothness of the scene gridding structure can be obviously improved to be closer to the real scene structure, as shown in fig. 6, the optimization effect of the scene gridding structure reflected by a depth map is shown, and fig. 6(a) and 6(b) are the scene gridding structures before and after optimization reflected by the depth map respectively.
Fourth, incremental updating module of scene gridding structure (as shown in FIG. 7)
The first stage, scene gridding structure updating detection stage:
after the single-frame scene gridding structure is processed by the first-time single-frame scene gridding structure optimizing module, a single-frame scene gridding structure reconstruction result is obtained, and all three-dimensional triangular surface patches contained in the reconstruction result are stored in a scene gridding structure library in a unified mode. Firstly, after a new frame of binocular image is input into the system, the system can construct a scene gridding structure corresponding to a new binocular image frame according to the three modules, then judge which areas on a left eye image in the new binocular image belong to new visual areas, and take the new visual areas and related triangular patches as new triangular patches which need to be added into an original scene gridding structure library. In the invention, a virtual image scheme is adopted to complete the selection work of a new triangular patch. The system obtains a scene gridding structure corresponding to a previous frame of binocular image, a scene gridding structure corresponding to a current frame of binocular image and motion information between the two frames of binocular images, and according to the computer vision principle, the position of each point on a left eye image in the previous frame of binocular image projected on the left eye image in the current frame of binocular image after the action of the motion information can be obtained, and a specific calculation mode aiming at a single point on the left eye image in the previous frame of binocular image is as follows:
setting the homogeneous coordinate of the point on the pixel coordinate system in the left eye image in the previous frame of binocular image as PpThe depth value of the point at the time of the last frame is zpThe homogeneous coordinate of the point on the pixel coordinate system in the left eye image in the current frame binocular image is PcThe depth value of the point at the current frame time is zcThe motion matrix corresponding to the interframe motion information is T, the intrinsic parameter matrix of the camera is K, and the calculation is carried out according to the following formula:
z_c · P_c = K · T · (z_p · K⁻¹ · P_p)
after the process is applied to all pixel points with depth values in a left-eye image in a previous frame of binocular image, a virtual image generated according to the depth values and inter-frame motion information can be obtained, so that the virtual image is an image with holes and needs to be restored in an interpolation mode. After the restored virtual image is obtained, comparing the gray value of the image with the gray value of the left eye image in the current frame binocular image, counting the average gray value error, selecting a triangular patch with the gray value error higher than the average gray value error as a component patch of a new visual area (for example, a white patch area shown in fig. 9 shows a new visual area detection result), and adding the triangular patch into a scene gridding structure library to prepare for the later scene gridding structure updating.
Fig. 10 shows a partial correspondence map of a new visual region, wherein (a) (c) is a portion of a previous frame where depth values are inaccurate or visual information is incomplete; (b) and (d) the new visual region detection results corresponding to (a) and (c).
And a second stage, namely a scene gridding structure updating stage:
in order to make the incrementally generated scene grid structure be a continuous grid structure without redundant patches, after the triangular patches representing the new visual area obtained by the previous stage processing are obtained, it is also necessary to select which scene triangular patches in the original scene grid structure library need to be optimized. In the invention, the scheme based on the interframe mapping is still adopted to complete the work. The specific implementation process comprises the following steps: mapping a triangular patch in a scene gridding structure library corresponding to a plurality of frames (ten frames are set in the specific embodiment) before the current frame to the current frame according to the motion information, and regarding the triangular patch which is overlapped or partially overlapped with a new visual area in the current frame as the triangular patch to be optimized; then, searching communicated areas of all triangular patches in the new visual area in the current frame, and sorting out all the communicated areas; secondly, according to the image coordinate vertexes of the triangular patches to be optimized after being mapped to the current frame and the image coordinate vertexes corresponding to all triangular patches in the new visual area in the current frame, the triangular patches are gathered together, and through a Delaunay triangulation technology, each connected area is taken as a reference to carry out re-triangulation work; and finally, deleting the triangular patches to be optimized from the scene gridding structure library, and adding the triangular patches corresponding to the new Delaunay triangulation area into the scene gridding structure library. The operation is carried out frame by frame, and the reconstruction result of the gridding structure of the continuous scene on the global scope can be obtained. Fig. 11 shows a reconstruction result of a gridding structure of a scene, where fig. 11(a) is in a grid form, a structure above a certain depth is replaced by scattered points for visualization, and fig. 11(b) is in an RGB rendering form.
Finally, it should be noted that the above embodiments are merely representative examples of the present invention. Obviously, the technical solution of the present invention is not limited to the above-described embodiments, and many variations are possible. A person skilled in the art may make modifications or changes to the embodiments described above without departing from the inventive idea of the present invention, and therefore the scope of protection of the present invention is not limited by the embodiments described above, but should be accorded the widest scope of the innovative features set forth in the claims.

Claims (5)

1. A binocular camera-based unmanned scene incremental gridding structure reconstruction method is characterized by comprising the following steps:
1) inputting a single-frame binocular video frame in a scene collected by an on-board binocular camera, and carrying out scene gridding structure reconstruction on the input single-frame binocular video frame;
the step 1) is specifically as follows:
1.1) respectively extracting visual feature points on two images in a single-frame binocular video frame acquired by a binocular camera;
1.2) visual feature point matching stage: matching the visual feature points obtained in step 1.1) by a brute-force matching method to obtain visual feature point pairs, and calculating the disparity value of each pair of visual feature points;
1.3) visual feature point depth estimation stage: establishing a camera coordinate system by taking the initial position of the left eye camera as a coordinate origin, and calculating the three-dimensional position of the visual feature point of the left eye image in the visual feature point pair in the camera coordinate system according to the parallax value of the visual feature point pair in the step 1.2), wherein the specific calculation process is as follows:
1.3.1) calculating the depth value of each visual characteristic point, thereby obtaining a sparse depth map of the left eye image:
z = f · b / d
wherein d is the disparity value, b is the baseline length of the binocular camera, f is the camera focal length, and z is the depth value;
1.3.2) calculating the three-dimensional position of the visual characteristic point of the left eye image in a camera coordinate system:
X = (u − c_x) · z / f_x
Y = (v − c_y) · z / f_y
Z = z
P_w = z · K⁻¹ · P_uv, where P_uv = (u, v, 1)ᵀ and K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is the internal parameter matrix of the binocular camera; P_uv is the homogeneous coordinate of the visual feature point in the image pixel coordinate system; P_w is the coordinate of the visual feature point in the camera coordinate system; u and v are the positions of the visual feature point on the horizontal and vertical coordinate axes of the image pixel coordinate system; and X, Y and Z are the positions of the visual feature point on the x, y and z axes of the camera coordinate system;
1.4) triangulation stage: adopting a Delaunay triangulation method to subdivide the interior of a visual characteristic point set to obtain a triangular mesh structure, wherein the visual characteristic point set is a set of visual characteristic points on a left eye image;
1.5) mesh subdivision stage: performing depth interpolation on the current frame's triangular mesh structure according to the triangular mesh structure generated in step 1.4) and the depth value of each mesh vertex, iteratively selecting the visual feature points on the left-eye image in the top 5% of depth error over the whole mesh, measured with Census descriptors, as feature points to be updated, re-matching them according to the Hamming distance, adding them to the visual feature point set, and performing Delaunay triangulation again inside the set, thereby obtaining the subdivided scene gridding structure;
2) searching for the position of a grid spiking phenomenon in the scene gridding structure by using the network structure characteristics, eliminating the grid spiking phenomenon, and improving the smoothness of the whole scene gridding structure by using an approximate Laplace smoothing method;
3) acquiring position information of a scene by adopting vehicle-mounted satellite navigation equipment, calculating motion information between a previous frame and a current frame through the position information to obtain a transformation matrix for describing continuous motion between frames, and then detecting a new visual area; the new visual area is detected as: constructing a virtual image according to the transformation matrix and the scene gridding structure of the previous frame, and determining a new visual area of the previous frame compared with the current frame by using the visual difference between the virtual image and the current frame;
4) determining the position of a local scene gridding structure to be updated by utilizing the accumulated motion information of the camera position of the current frame relative to the camera position of the initial frame and the scene facet in the constructed overall scene gridding structure and combining the new visual area in the step 3);
5) the incremental splicing method of the scene gridding structure comprises the following steps: after the new visual area is associated with the position of the grid structure of the local scene to be updated, splicing the grid structure of the whole scene by a Delaunay triangulation method;
6) repeating steps 1)-5) until all single-frame binocular video frames obtained by the vehicle-mounted binocular camera have been processed, finally obtaining the overall scene gridding structure reconstruction result.
2. The binocular camera-based unmanned scene incremental gridding structure reconstruction method according to claim 1, wherein the binocular camera-based unmanned scene incremental gridding structure reconstruction method comprises the following steps:
the step 2) is specifically as follows:
2.1) grid "spiking" removal stage: processing each mesh vertex on the scene meshing structure in a traversal mode, and replacing the depth of each mesh vertex by the average depth of each adjacent vertex when the depth of each mesh vertex is larger than or smaller than the depth of each adjacent mesh vertex and the absolute value of the difference between the average depth of each adjacent vertex and the average depth of each adjacent vertex is larger than a threshold value;
2.2) mesh structure smoothing stage: smoothing the mesh vertices in the scene gridding structure one by one with an approximate Laplacian smoothing method, computed as follows:
P_o = (((1 − α) · Z_c + α · Z_n) / Z_c) · P_w
wherein Z_c is the depth value of the mesh vertex to be processed, Z_n is the average depth value of all adjacent vertices of that vertex, alpha is a manually set damping parameter, P_w is the position of the vertex in the camera coordinate system, and P_o is the optimized position of the vertex in the camera coordinate system.
3. The binocular camera-based unmanned scene incremental gridding structure reconstruction method according to claim 1, wherein the binocular camera-based unmanned scene incremental gridding structure reconstruction method comprises the following steps: the method for detecting the new visual area in step 3) is to detect the new visual area in the current frame in a virtual image-based manner, and specifically includes the following steps:
each pixel point of a left eye image in a previous single-frame binocular video frame is processed as follows to construct a virtual image:
z_c · P_c = K · T · (z_p · K⁻¹ · P_p)
wherein P_p is the homogeneous coordinate of the pixel in the pixel coordinate system of the left-eye image of the previous single-frame binocular video frame, z_p is the depth value of the pixel at the previous-frame moment, P_c is the homogeneous coordinate of the pixel in the pixel coordinate system of the left-eye image of the current single-frame binocular video frame, z_c is the depth value of the pixel at the current-frame moment, T is the motion matrix corresponding to the inter-frame motion information, and K is the internal parameter matrix of the binocular camera;
the method comprises the steps of utilizing a Navier-Stokes equation to carry out image restoration on a virtual image, then comparing visual differences on each triangular patch area in the restored virtual image and a current frame left-eye image to obtain a visual difference value, wherein the specific calculation mode of the visual difference value is as follows:
D = (1/n) · Σ_{i=1..n} |g_pi − g_ci|
wherein n is the total number of pixels in the triangular area, g_pi is the gray value at the i-th pixel position in the virtual image, g_ci is the gray value at the i-th pixel position in the current frame's left-eye image, and D is the visual difference value;
and finally, selecting a triangular patch with the visual difference value higher than the average visual difference value as a composition patch of the new visual area to finish the detection of the new visual area.
4. The binocular camera-based unmanned scene incremental gridding structure reconstruction method according to claim 1,
the step 4) is as follows: projecting a triangular patch in the constructed overall scene gridding structure onto a current frame by adopting the following calculation method:
z · P_uv = K · T · P_w
wherein P_w is the position of the triangular patch vertex in the world coordinate system with the first frame as origin, z is the depth, T is the accumulated motion information of the current frame's camera position relative to the initial frame's camera position, K is the internal parameter matrix of the binocular camera, and P_uv is the homogeneous coordinate, in the pixel coordinate system, of the patch vertex projected onto the current frame;
in the constructed overall scene gridding structure, the triangular patches that overlap or partially overlap the new visual area in the current frame and lie within 5 meters' spatial distance of that area are regarded as the positions of the local scene gridding structure to be updated.
5. The binocular camera-based unmanned scene incremental gridding structure reconstruction method according to claim 1, wherein the binocular camera-based unmanned scene incremental gridding structure reconstruction method comprises the following steps:
the incremental splicing method for the scene gridding structure in the step 5) specifically comprises the following steps: and performing connected region search on all triangular patches in the new visual region of the current frame, sorting out all connected regions, projecting each connected region and the position of the local scene gridding structure to be updated to a position set on the left image of the current frame, and then performing Delaunay triangulation again to update the whole scene gridding structure.
CN201910156872.XA 2019-03-01 2019-03-01 Unmanned scene incremental gridding structure reconstruction method based on binocular camera Active CN110021041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910156872.XA CN110021041B (en) 2019-03-01 2019-03-01 Unmanned scene incremental gridding structure reconstruction method based on binocular camera


Publications (2)

Publication Number Publication Date
CN110021041A CN110021041A (en) 2019-07-16
CN110021041B true CN110021041B (en) 2021-02-12

Family

ID=67189161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910156872.XA Active CN110021041B (en) 2019-03-01 2019-03-01 Unmanned scene incremental gridding structure reconstruction method based on binocular camera

Country Status (1)

Country Link
CN (1) CN110021041B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100621A (en) * 2022-08-25 2022-09-23 北京中科慧眼科技有限公司 Ground scene detection method and system based on deep learning network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102074052A (en) * 2011-01-20 2011-05-25 山东理工大学 Sampling point topological neighbor-based method for reconstructing surface topology of scattered point cloud
US9256496B1 (en) * 2008-12-15 2016-02-09 Open Invention Network, Llc System and method for hybrid kernel—and user-space incremental and full checkpointing
CN106780735A (en) * 2016-12-29 2017-05-31 深圳先进技术研究院 A kind of semantic map constructing method, device and a kind of robot

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1235185B1 (en) * 2001-02-21 2011-11-23 Boly Media Communications Inc. Method of compressing digital images
CN100533484C (en) * 2007-05-25 2009-08-26 同济大学 System and method for converting disordered point cloud to triangular net based on adaptive flatness
CN101581575B (en) * 2009-06-19 2010-11-03 南昌航空大学 Three-dimensional rebuilding method based on laser and camera data fusion
CN101866497A (en) * 2010-06-18 2010-10-20 北京交通大学 Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system
CN102496184B (en) * 2011-12-12 2013-07-31 南京大学 Increment three-dimensional reconstruction method based on bayes and facial model
CN107610228A (en) * 2017-07-05 2018-01-19 山东理工大学 Curved surface increment topology rebuilding method based on massive point cloud
CN108876909A (en) * 2018-06-08 2018-11-23 桂林电子科技大学 A kind of three-dimensional rebuilding method based on more image mosaics


Also Published As

Publication number Publication date
CN110021041A (en) 2019-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant