CN110675483A - Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle

Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle

Info

Publication number
CN110675483A
CN110675483A (application CN201910646511.3A)
Authority
CN
China
Prior art keywords
image
reconstruction
data
node
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910646511.3A
Other languages
Chinese (zh)
Other versions
CN110675483B (en)
Inventor
黄方
杨浩
铁博
陆俊
彭思远
陈胤杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910646511.3A
Publication of CN110675483A
Application granted
Publication of CN110675483B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods involving reference images or patches
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The invention discloses a method for rapidly reconstructing a three-dimensional map for an unmanned aerial vehicle based on dense visual SLAM. A CUDA parallel programming model is introduced, the algorithm hot spots in three-dimensional map reconstruction are analyzed, a parallel algorithm is designed and implemented on an embedded GPU platform, and the improvement in processing speed is verified. Based on the ROS distributed computing framework, a prototype multi-process-node visual SLAM three-dimensional map reconstruction system is designed and implemented. The practicability of the visual-SLAM-based three-dimensional map reconstruction algorithm is verified on real-time image data from the drone's onboard visual sensor, and the method has value for popularization and application.

Description

Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a dense vision SLAM-based three-dimensional map rapid reconstruction method for an unmanned aerial vehicle.
Background
In recent years, natural disasters such as earthquakes, landslides and debris flows have occurred frequently, damaging affected areas to varying degrees, so rapid emergency rescue is critical when such disasters strike. Because roads may be damaged, rescue workers often cannot reach the site immediately. Unmanned aerial vehicles, being flexible, mobile, low-cost and little constrained by ground terrain, are widely used in disaster relief and, thanks to their quick-response capability, play a major role in emergency rescue. When three-dimensional terrain information of a disaster area is required, an unmanned aerial vehicle can also enter the danger zone in place of surveying and mapping personnel. Because the unmanned aerial vehicle can cruise below the cloud layer, the acquired image data has high resolution and can be used for three-dimensional map reconstruction after light processing. From the resulting three-dimensional map, the extent of the damage can be assessed, and decision makers can rapidly plan rescue schemes and routes from the spatial information provided. At present, there are mainly the following three methods for constructing a three-dimensional map.
Survey-based modeling: field workers acquire image data of buildings, office workers process the data, a white model is built through data fusion, and texture mapping is finally applied to the white model to generate a three-dimensional model of the measured area. The production process is complex, the labor intensity is high, and both field and office staff require professional expertise.
Oblique photogrammetry: five high-resolution cameras are mounted on an unmanned aerial vehicle to acquire image data from five angles simultaneously (one vertical and four oblique), from which an accurate three-dimensional map can be constructed through processing. Oblique photogrammetry greatly reduces the cost of constructing three-dimensional city maps.
Simultaneous localization and mapping:
Visual SLAM (Simultaneous Localization And Mapping) is an emerging three-dimensional map construction technology: as the sensor moves, its motion trajectory is estimated while three-dimensional information of the surrounding environment is constructed. Visual SLAM uses a camera as the data acquisition sensor and offers a large amount of information, high flexibility, low cost, good timeliness and ease of embedded development. The classical SLAM algorithms currently include the Extended Kalman Filter (EKF), Extended Information Filter (EIF) and Particle Filter (PF) methods, but these cannot handle three-dimensional map reconstruction over large scales well. With the great improvement of hardware performance, the development of parallel computing technology and the appearance of efficient solvers, graph-optimization-based methods have again become a research hot spot in SLAM.
The traditional three-dimensional map construction methods realize three-dimensional geometric reconstruction from image data of the measured area taken from different viewing angles, but the production cycle is long, the cost is high and the process is complex. In certain specific application scenarios such as natural disasters, the need for rapid emergency rescue is urgent and a three-dimensional map must be generated quickly or even in real time, which the conventional methods cannot do well. In general, the poor timeliness of the conventional methods has three main causes: (1) many preparatory steps precede data acquisition, and the requirements on data precision are high; (2) in most unmanned aerial vehicle three-dimensional reconstruction workflows, the drone is used only as a data-acquisition tool, the data can be processed only after acquisition and the flight are finished, and three-dimensional information cannot be fed back in real time, so the convenience of the drone platform cannot be fully exploited to make three-dimensional map reconstruction timely; (3) when processing drone image data, the traditional methods are relatively time-consuming because of their high algorithmic complexity, and since drone images have high resolution and large data volume, fast computation from an image sequence to a three-dimensional map is difficult. How to realize efficient and rapid three-dimensional reconstruction for unmanned aerial vehicles using visual SLAM technology is therefore a direction worthy of invention and attention.
Disclosure of Invention
The invention aims to solve the above problems and provides a method for rapidly reconstructing a three-dimensional map for an unmanned aerial vehicle based on dense visual SLAM.
The invention achieves this purpose through the following technical scheme:
The method comprises: rapid reconstruction of the three-dimensional map on the unmanned aerial vehicle platform; a rapid reconstruction algorithm for the unmanned aerial vehicle three-dimensional map; a rapid reconstruction system for the unmanned aerial vehicle three-dimensional map; an ROS-based visual SLAM node communication system; and an ROS-based visual SLAM node communication algorithm;
the three-dimensional map of the unmanned aerial vehicle platform is quickly reconstructed:
The hardware bottom layer of the system is abstracted based on the ROS as follows:
sensor data are received using the sensor_msgs/Image message type, which covers both RAW and compressed images; a RAW image is the original data of the CMOS or CCD image sensor converting the captured light signals into digital signals;
after sensor data are successfully acquired, the ROS provides camera calibration, distortion correction and color decoding for the image pipeline; the ROS image pipeline runs through the image_proc package and provides conversion functions for obtaining monochrome and color images from the RAW image acquired by the camera; once the camera is calibrated, the image pipeline can extract CameraInfo messages and correct radial and tangential distortion of the image, i.e. preprocessing is completed while data are acquired;
for complex image processing tasks, the OpenCV, cv_bridge and image_transport libraries are used to convert and route messages and to process subscribed and published image Topics; conversely, when OpenCV has finished processing an image, cv_bridge converts it into an ROS image message for publishing;
to perform image processing with OpenCV inside a node, a separate OpenCV library must be installed, and the OpenCV2 dependency packages needed for compiling and running must be specified in the package.xml file under the workspace; then, for each node using OpenCV, the OpenCV lib files must be added to target_link_libraries; finally, the required OpenCV header files are added to the node's cpp files;
through the above processes and steps, hardware abstraction is realized: data are acquired by the sensor and published to the ROS network, and image processing is performed by subscribing to the RAW image messages and processing them with OpenCV, as sketched below;
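As a minimal illustration of this hardware-abstraction pipeline, the following C++ node sketch subscribes to a RAW image topic, converts the message with cv_bridge, processes it with OpenCV, and republishes the result. The node and topic names (/usb_cam/image_raw, /camera/image_processed) are illustrative assumptions, not fixed by the invention.

```cpp
#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/image_encodings.h>
#include <opencv2/imgproc.hpp>

image_transport::Publisher pub;

void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
  // Convert the ROS image message into an OpenCV BGR8 image.
  cv_bridge::CvImagePtr cv_ptr =
      cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8);

  // Placeholder processing step: convert to grayscale.
  cv::Mat gray;
  cv::cvtColor(cv_ptr->image, gray, cv::COLOR_BGR2GRAY);

  // Wrap the result back into a ROS image message and republish it.
  pub.publish(cv_bridge::CvImage(msg->header, "mono8", gray).toImageMsg());
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "image_proc_node");
  ros::NodeHandle nh;
  image_transport::ImageTransport it(nh);
  pub = it.advertise("/camera/image_processed", 1);
  image_transport::Subscriber sub =
      it.subscribe("/usb_cam/image_raw", 1, imageCallback);
  ros::spin();
  return 0;
}
```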
algorithm decoupling and distributed computation:
when the three-dimensional reconstruction algorithm runs, multiple nodes are operated, each implementing its own function; the nodes communicate through ROS messages, each node can run on a different device, and the decoupling of the algorithm's functions is realized through the network constructed by ROS;
an algorithm acceleration method based on CUDA parallel computing comprises the following steps:
an NVIDIA Jetson TX2 embedded development module is selected as the experimental processing platform; when a parallel program runs on the TX2 module, every 32 parallel threads form a warp, and thread scheduling in the GPU is performed with the warp as the scheduling unit; each SM contains 2 warp schedulers, each warp scheduler drives 32 CUDA (Compute Unified Device Architecture) cores, and the GPU has 256 CUDA cores in total;
topic based messaging and communication:
(1) the Talker starts and registers its information with the Master through a port, including the topic name of the messages it publishes and its node address; the Master adds this information to a registration list;
(2) the Listener starts and registers with the Master the topic it subscribes to and its own address information;
(3) the Master matches the information of publishers and subscribers; if a publisher and a subscriber publish/subscribe the same message topic, a connection is established, otherwise the Master keeps waiting; when a match is found and the Talker is publishing a message, the Talker's address is sent to the Listener;
(4) after receiving the Talker's address from the Master, the Listener sends a connection request to the Talker, together with the topic name it wants to subscribe to and the communication protocol required to complete the communication;
(5) after receiving the Listener's request, the Talker returns a connection confirmation containing its TCP address;
(6) after receiving the Talker's TCP address, the Listener establishes a network connection with the Talker over TCP;
(7) the Talker sends data to the Listener through the established TCP connection (see the sketch after this list);
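The handshake above is what the ROS client library performs behind ros::init, advertise and subscribe. A minimal Talker sketch with illustrative node and topic names follows; a Listener mirrors it with nh.subscribe("chatter", 10, callback).

```cpp
#include <ros/ros.h>
#include <std_msgs/String.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "talker");   // registers this node with the Master
  ros::NodeHandle nh;
  // advertise() reports the topic name and node address to the Master (step 1).
  ros::Publisher pub = nh.advertise<std_msgs::String>("chatter", 10);

  ros::Rate rate(10);
  while (ros::ok()) {
    std_msgs::String msg;
    msg.data = "hello";
    pub.publish(msg);  // flows over the TCP connection negotiated in steps 3-7
    rate.sleep();
  }
  return 0;
}
```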
The unmanned aerial vehicle three-dimensional map rapid reconstruction system comprises: hardware, namely a visual sensor, an unmanned aerial vehicle, an embedded processor and a GPU; and software that implements the core visual SLAM algorithm; the software and hardware are linked to each other through a distributed framework and communication protocols to form a complete software and hardware system; the process is as follows:
(1) the unmanned aerial vehicle carries an embedded vision sensor to acquire a real-time image sequence;
(2) the image sequence is transmitted to the front end and the rear end respectively;
(3) the front end provides an initial pose value for the rear end through feature extraction and matching, key frame extraction and pose estimation;
(4) the rear end carries out closed-loop detection on the images received in step (2) and optimizes the pose acquired in step (3);
(5) the rear end performs dense reconstruction based on the pose and the images;
(6) the observation end monitors in real time through an ROS visualization tool;
the unmanned aerial vehicle three-dimensional map rapid reconstruction algorithm comprises the following steps:
the flow of the visual SLAM front-end algorithm is as follows:
(1) the SLAM system provides an interface, receives a continuous image sequence acquired by a sensor and transmits the image sequence to a VO end;
(2) judging whether the initialization of the SLAM system is completed or not, if not, performing the step (3), and if the initialization is completed, performing the step (4);
(3) initializing the first keyframe and the initial pose T0;
(4) continuously carrying out ORB feature extraction on the image sequence and selecting the next key frame Fi (i >= 1);
(5) matching the ORB features of Fi and Fi-1 and screening out reliable matches;
(6) based on the reliable feature point pairs, solving the inter-frame motion through epipolar constraint to calculate the pose Ti of the key frame;
(7) if the frame stream is finished, stopping; otherwise, returning to step (4);
the flow of the reconstruction part algorithm is as follows:
(1) selecting a target pixel point p whose depth needs to be calculated;
(2) calculating the epipolar line position from the plane formed by the camera motion, i.e. the translation vector between the camera optical centers and the line connecting the key frame's optical center with p;
(3) traversing the obtained epipolar line and searching for the point matching p;
(4) calculating the actual spatial position of point p by triangulation;
(5) updating the depth information of the pixel;
The SLAM front-end feature extraction and matching method comprises the following steps: firstly, the Oriented FAST corner positions are detected, and then the BRIEF descriptors are computed based on the corner positions; the BRIEF descriptors in the two images are matched and their Hamming distances, i.e. the number of differing bits of the binary strings, are recorded; the minimum distance dmin among all matches is recorded, and if the Hamming distance between descriptors is greater than 2 times dmin the match is deleted, otherwise it is kept, as sketched below;
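A sketch of this filter with the OpenCV ORB API is given below; the 2 × dmin criterion follows the text, while the function name and surrounding details are illustrative.

```cpp
#include <opencv2/features2d.hpp>
#include <algorithm>
#include <vector>

std::vector<cv::DMatch> matchORB(const cv::Mat& img1, const cv::Mat& img2)
{
  // Oriented FAST corners + BRIEF descriptors = ORB.
  cv::Ptr<cv::ORB> orb = cv::ORB::create();
  std::vector<cv::KeyPoint> kp1, kp2;
  cv::Mat desc1, desc2;
  orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
  orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

  // Brute-force matching with Hamming distance (number of differing bits).
  cv::BFMatcher matcher(cv::NORM_HAMMING);
  std::vector<cv::DMatch> matches;
  matcher.match(desc1, desc2, matches);

  // Find the minimum distance dmin among all matches.
  double d_min = 1e9;
  for (const auto& m : matches) d_min = std::min(d_min, (double)m.distance);

  // Keep a match only if its distance is at most 2 * dmin.
  std::vector<cv::DMatch> good;
  for (const auto& m : matches)
    if (m.distance <= 2.0 * d_min) good.push_back(m);
  return good;
}
```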
The key frame extraction method comprises the following steps: the first image is processed first, detecting FAST feature points and edge features; if the number of feature points in the middle of the image exceeds a set threshold, the image is taken as the first key frame; the continuous images after the first are then processed, tracking the feature points continuously; when the number of matching points is larger than a threshold, the essential matrix E is calculated if the median of the parallax is larger than a threshold, otherwise the homography matrix H is calculated; if enough inliers remain after computing H or E, the frame is taken as a key frame;
Camera motion estimation: the motion estimation of the camera calculates the translation vector t and the rotation matrix R between adjacent key frames, where t and R can be obtained by decomposing the essential matrix E; when the key frame camera only rotates without translating, the epipolar constraint of the two views does not hold, the fundamental matrix F is a zero matrix, and the homography matrix H must be decomposed instead; the decomposition method is singular value decomposition (SVD):
$t^{\wedge}R = E = UDV^{T}$ (4-1)
where U, V are third-order orthogonal matrices and D is the diagonal matrix D = diag(r, s, t); the decomposition yields the diagonal entries (singular values), of which two are positive and equal and the third is 0; four different groups of solutions are obtained in the decomposition, corresponding to four relative poses of the key frame; a decomposition sketch follows;
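The following OpenCV-based sketch illustrates the SVD decomposition just described; it is the standard decomposition of E into its four (R, t) candidates, not necessarily the exact routine of the invention. In practice cv::recoverPose(E, pts1, pts2, K, R, t, mask) bundles this decomposition with the positive-depth check.

```cpp
#include <opencv2/core.hpp>

// Decompose an essential matrix E into its two rotation candidates and the
// translation direction; combined with +-t this yields the four poses.
void decomposeEssential(const cv::Mat& E, cv::Mat R[2], cv::Mat& t)
{
  cv::SVD svd(E, cv::SVD::FULL_UV);
  cv::Mat U = svd.u, Vt = svd.vt;

  // W encodes a 90-degree rotation about the z axis.
  cv::Mat W = (cv::Mat_<double>(3, 3) << 0, -1, 0,
                                         1,  0, 0,
                                         0,  0, 1);
  R[0] = U * W * Vt;      // if det(R) < 0, negate R to keep a proper rotation
  R[1] = U * W.t() * Vt;
  t = U.col(2);           // translation, up to sign and global scale
}
```

Only the candidate that places the triangulated scene depth in front of both cameras is physically meaningful, matching the discussion of the four solutions above.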
Dense reconstruction: to estimate the depth of any pixel across two adjacent images, consider a pixel p1 and its corresponding space point P; the optical centers of the two adjacent frames and the lines O1P, O2P and O1O2 are coplanar, where O1P intersects the first frame at p1 and O2P intersects the second frame at p2; p2 must therefore fall on the line where the plane O1O2P intersects the second frame, i.e. the position of p1 in the second frame lies on its epipolar line;
since the brightness of a single pixel is not discriminative, pixel-block similarity is compared instead: a small w x w block is taken around pixel p1 and matched against the w x w block around each point traversed along the epipolar line; the algorithm's assumption thus changes from grayscale invariance of a pixel to grayscale invariance of an image block;
for the matching of two patches, their Normalized Cross-Correlation (NCC) is calculated:
$S(A,B)_{NCC} = \dfrac{\sum_{i,j} A(i,j)B(i,j)}{\sqrt{\sum_{i,j} A(i,j)^{2}\,\sum_{i,j} B(i,j)^{2}}}$ (4-2)
after computing each similarity measure along the epipolar line, an NCC distribution along the epipolar line is obtained, i.e. the probability distribution of pixel p1 occurring in the second frame; the higher the NCC value, the higher the probability that the point is the homonymous point of p1; however, computing a pixel's depth from only two images is unreliable, so multiple images are needed for the estimation; we assume that the depth d of a pixel follows a Gaussian distribution:
$P(d) = N(\mu, \sigma^{2})$ (4-3)
whenever new data arrive, the depth of the observation will also follow a Gaussian distribution:
$P(d_{obs}) = N(\mu_{obs}, \sigma_{obs}^{2})$ (4-4)
then updating the depth of observation point p1 becomes an information fusion problem, and the fused depth satisfies:
$\mu_{fuse} = \dfrac{\sigma_{obs}^{2}\mu + \sigma^{2}\mu_{obs}}{\sigma^{2} + \sigma_{obs}^{2}}, \qquad \sigma_{fuse}^{2} = \dfrac{\sigma^{2}\sigma_{obs}^{2}}{\sigma^{2} + \sigma_{obs}^{2}}$ (4-5)
after multiple iterations, the peak of the probability distribution of p1 along the second frame's epipolar line moves ever closer to its true position; thus, the complete steps of dense reconstruction are as follows:
(1) assuming that the depths of all pixels satisfy some initial Gaussian distribution;
(2) when new data is generated, determining the position of a projection point through epipolar line search and block matching;
(3) calculating the triangulated depth and its uncertainty from the geometric relationship;
(4) fusing the current observation into the previous estimate; if converged, stopping the calculation, otherwise returning to step (2); the fusion update is sketched below;
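A minimal sketch of the fusion step (4), i.e. eq. (4-5); the type and function names are assumptions for illustration.

```cpp
struct DepthHypothesis {
  double mu;      // current depth mean
  double sigma2;  // current depth variance
};

// Fuse a new observation N(mu_obs, sigma2_obs) into the running estimate.
void fuseDepth(DepthHypothesis& d, double mu_obs, double sigma2_obs)
{
  // Product of two Gaussians: the fused variance is smaller than either
  // input, so the estimate tightens with every key frame until convergence.
  double denom = d.sigma2 + sigma2_obs;
  double mu_f  = (sigma2_obs * d.mu + d.sigma2 * mu_obs) / denom;
  double s2_f  = (d.sigma2 * sigma2_obs) / denom;
  d.mu = mu_f;
  d.sigma2 = s2_f;
}
```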
ROS-based visual SLAM node communication system: an ROS distributed computing architecture is adopted to reduce the coupling among the system's modules and improve availability and reliability in complex environments; data acquisition, pose estimation, dense reconstruction and visualization in the visual SLAM reconstruction process are run by four computing nodes respectively, each playing its own role, and the nodes communicate via the Topic subscription mode; when the system works, 4 nodes participate in communication: the sensor captures data and its raw data are published as the topic /sensor_msgs; the visual sensor is directly connected to the Jetson TX2 through a USB or CSI interface, converting the stream data into image sequence messages that are republished as the topic /usb_cam;
the VO node subscribes to /usb_cam, performs pose estimation and publishes the result as pose data; the reconstruction node subscribes to /usb_cam and the pose data simultaneously, and publishes the processed depth map data and point cloud data to the ROS network, where the visualization node subscribes to them to realize the system's incremental map-construction visualization;
The ROS-based visual SLAM node communication algorithm comprises: the SLAM front end (the VO node) and the back-end reconstruction node subscribe to the topic /usb_cam at the same time; the VO node performs pose solving on the data published by the /usb_cam node through the interface of the SLAM front-end library and then publishes the result in message form, to which the SLAM back-end reconstruction node subscribes, i.e. the reconstruction node subscribes to frame data and pose data simultaneously; to visualize the three-dimensional map reconstruction result in real time, the adopted method proceeds as follows:
(1) when the depth map of the selected reference frame has converged, the depth map result is published to a topic;
(2) the visualization node acquires the pose published by the VO node, matched by the reference frame and its timestamp;
(3) calculating the spatial position of the optical center of the reference frame and the normal direction of the frame based on the initialization result;
(4) a point cloud is constructed pixel-by-pixel based on the depth map.
The invention has the following beneficial effects:
The invention provides a method for rapidly reconstructing a three-dimensional map for an unmanned aerial vehicle based on dense visual SLAM. A CUDA parallel programming model is introduced, the algorithm hot spots in three-dimensional map reconstruction are analyzed, a parallel algorithm is designed and implemented on an embedded GPU platform, and the improvement in processing speed is verified. Based on the ROS distributed computing framework, a prototype multi-process-node visual SLAM three-dimensional map reconstruction system is designed and implemented. The practicability of the visual-SLAM-based three-dimensional map reconstruction algorithm is verified on real-time image data from the drone's onboard visual sensor, and the method has value for popularization and application.
Drawings
FIG. 1 is the overall design framework diagram of the invention;
FIG. 2 is the system flow diagram of the invention;
FIG. 3 is the algorithm module dependency graph of the invention;
FIG. 4 is the design flow chart of the visual SLAM algorithm front end of the invention;
FIG. 5 is the design flow chart of the visual SLAM algorithm reconstruction end of the invention;
FIG. 6 is the feature extraction and matching flow chart of the invention;
FIG. 7 shows the registration result based on ORB features (mismatches not culled) of the invention;
FIG. 8 shows the registration result based on ORB features (mismatches culled) of the invention;
FIG. 9 is the key frame extraction flow chart of the invention;
FIG. 10 shows the key frame relative poses corresponding to the four groups of solutions of the invention;
FIG. 11 shows the unmanned aerial vehicle image matching and motion estimation results of the invention;
FIG. 12 plots the number of iterated key frames against the number of pixels updated per iteration of the invention;
FIG. 13 plots the number of iterated key frames against the computation time per iteration of the invention;
FIG. 14 shows the reconstruction results when NCC is 0.90, in which (a) is the original image; (b) 1 frame; (c) 5 frames; (d) 10 frames; (e) 15 frames; (f) 25 frames;
FIG. 15 is the parallelization strategy diagram of the dense reconstruction algorithm of the invention;
FIG. 16 is the ROS-based hardware and software system architecture diagram of the invention;
FIG. 17 shows the binding of device IPs to host names of the invention;
FIG. 18 is the schematic diagram of running the nodes through launch files according to the invention;
FIG. 19 is the GPU detail diagram of the invention;
FIG. 20 shows partial images of the sensor data of the invention;
FIG. 21 is a view of a building from the unmanned aerial vehicle of the invention;
FIG. 22 shows the motion trajectory estimation effect based on the sensor data of the invention;
FIG. 23 shows the incremental three-dimensional reconstruction results based on the sensor data of the invention;
FIG. 24 shows the three-dimensional map reconstruction result of the invention;
FIG. 25 compares the pose estimation of the sensor data with the three-dimensional map in the experiment of the invention;
FIG. 26 shows the point cloud result of the three-dimensional reconstruction experiment of the building of the invention;
FIG. 27 shows the reconstruction accuracy evaluation result (error variance versus effective pixel percentage) of the invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
On the basis of fully analyzing the three-dimensional map reconstruction method and the implementation principle of the visual SLAM algorithm, the invention combines the characteristics of the unmanned aerial vehicle and the embedded computing platform to realize rapid reconstruction of the three-dimensional map on an unmanned aerial vehicle platform. The invention addresses key technologies such as multi-node distributed computation, communication and data transmission of the visual SLAM algorithm on the unmanned aerial vehicle platform, and CUDA parallel computing.
(1) The three-dimensional map reconstruction method is analyzed, its implementation process and mathematical principle are understood, the bottleneck limiting the efficiency of three-dimensional reconstruction on the unmanned aerial vehicle platform is analyzed, and key technologies and corresponding ideas for solving these problems are proposed.
(2) The characteristics of the ROS distributed computing architecture are studied; modes and strategies of multi-node distributed computing and communication are designed in combination with the visual SLAM algorithm; an ROS network is then built, and the running state, images and three-dimensional information of the nodes under ROS are monitored in real time with a visualization tool.
(3) A CUDA parallel programming model is introduced, the algorithm hot spots in three-dimensional map reconstruction are analyzed, a parallel algorithm is designed and implemented on the embedded GPU platform, and its improvement in processing speed is verified.
(4) A software and hardware integrated system is built based on the ROS distributed computing framework, the multi-process-node visual SLAM three-dimensional reconstruction algorithm is designed, and real-time incremental visualization is realized.
(5) The practicability of the visual-SLAM-based three-dimensional map reconstruction algorithm is verified by replaying image stream data acquired by the drone's onboard visual sensor.
The main function of the invention is the rapid reconstruction of the three-dimensional map. To achieve this, image data are first acquired, then processed by the visual SLAM algorithm, and finally the generated result is displayed in visual form; the overall design framework of the invention is shown in fig. 4-1.
According to its functional realization, the invention can be divided into a hardware part and a software part: the hardware comprises the visual sensor, the unmanned aerial vehicle, the embedded processor and the GPU, while the software realizes the core visual SLAM algorithm. The software and hardware are interconnected through a distributed framework and communication protocols to form a complete software and hardware system.
Designing a system flow:
Based on the above overall design, the operation flow of the software and hardware integrated system of the invention is shown in fig. 4-2 and proceeds as follows:
(1) the unmanned aerial vehicle carries an embedded vision sensor to acquire a real-time image sequence;
(2) the image sequence is transmitted to the front end and the rear end respectively;
(3) the front end provides an initial pose value for the rear end through feature extraction and matching, key frame extraction and pose estimation;
(4) the rear end carries out closed-loop detection on the images received in step (2) and optimizes the pose acquired in step (3);
(5) the rear end performs dense reconstruction based on the pose and the images;
(6) the observation end monitors in real time through an ROS visualization tool.
In order to reduce the complexity of the designed system and to make the task division and testing of the front and rear ends easier to define, the algorithm part of the system is divided into a VO front end and a mapping rear end. The two ends can thus work relatively independently, which reduces the coupling between code modules and makes later optimization or replacement of a module easier. The VO front end mainly realizes feature extraction and matching, motion estimation and key frame extraction; the mapping rear end mainly realizes closed-loop detection, pose optimization and dense reconstruction. The dependency relationship between the two ends and each function is shown in fig. 4-3.
In fig. 4-3, the specific functions and design ideas of each part are as follows:
(1) Feature extraction and matching: the main function of this part is to establish data association between adjacent images; since the system of the invention is assisted by no positioning mode other than vision, the same observation points in adjacent images must be identified first, and the accuracy of this module directly affects the reliability of pose estimation.
(2) Motion estimation: the main function of this part is to solve the camera motion from the simultaneous equations of the epipolar constraint, based on the result of the feature extraction and matching module, i.e. the pixel positions of multiple homonymous feature point pairs in different images.
(3) Key frame extraction: the main function of this part is to eliminate redundant and invalid images and screen out data sufficient for dense reconstruction, which greatly reduces the computation and improves the robustness of the algorithm to a certain extent.
(4) Closed-loop detection: the main function of this part is to detect whether the unmanned aerial vehicle has returned to a previously visited position, providing constraint conditions for pose optimization. This part is implemented with the open-source library DBoW3.
(5) Pose optimization: the main function of this part is to optimize the poses of all key frames through bundle adjustment after a closed loop is detected, so as to reduce the accumulated error produced by estimating motion only from adjacent frames. This part is implemented with the open-source library g2o.
(6) Dense reconstruction: the main function of the part is to estimate the spatial position of each pixel in the key frame based on the key frame and the pose data thereof, and the result is directly fed back to the constructed three-dimensional map.
Designing an algorithm flow:
As mentioned earlier, the visual SLAM algorithm is computationally intensive. The feature-based visual SLAM front end has long been considered the mainstream VO method; it runs stably and is insensitive to illumination and dynamic objects, so it has gradually become a mature solution. The VO front-end algorithm flow designed by the invention is shown in fig. 4-4.
From the above fig. 4-4, the flow of the visual SLAM front-end algorithm is as follows:
(1) the SLAM system provides an interface, receives a continuous image sequence acquired by a sensor and transmits the image sequence to a VO end;
(2) judging whether the initialization of the SLAM system is completed or not, if not, performing the step (3), and if the initialization is completed, performing the step (4);
(3) initialize the first keyframe and the initial pose T0;
(4) continuously perform ORB feature extraction on the image sequence and select the next key frame Fi (i >= 1);
(5) match the ORB features of Fi and Fi-1 and screen out reliable matches;
(6) based on the reliable feature point pairs, solve the inter-frame motion through epipolar constraint to calculate the pose Ti of the key frame;
(7) if the frame stream is finished, stop; otherwise return to step (4).
In dense reconstruction, the distances of most pixels of the acquired image data need to be known; with a monocular camera, pixel distances are estimated by triangulation. The dense three-dimensional map reconstruction algorithm designed by the invention iteratively updates the depth map of the key frame until it converges; the flow of each iteration is shown in figs. 4-5.
In fig. 4-5, the flow of the reconstruction part algorithm is as follows:
(1) selecting a target pixel point p whose depth needs to be calculated;
(2) calculating the epipolar line position from the plane formed by the camera motion, i.e. the translation vector between the camera optical centers and the line connecting the key frame's optical center with p;
(3) traversing the obtained epipolar line and searching for the point matching p;
(4) calculating the actual spatial position of point p by triangulation;
(5) updating the depth information of the pixel;
SLAM front end design:
the design of the feature extraction and matching method comprises the following steps:
the flow of the present invention for feature extraction and matching is shown in FIGS. 4-6 below.
First the Oriented FAST corner locations are detected, and then their BRIEF descriptors are computed based on the corner locations. The BRIEF descriptors in the two images are matched and their Hamming distances, i.e. the number of differing bits of the binary strings, are recorded. The minimum value dmin among all matches is recorded; if the Hamming distance between descriptors is greater than 2 times dmin, the match is deleted, otherwise the match is retained.
The 2 × dmin criterion for eliminating mismatches is an engineering empirical value. Figs. 4-7 show the ORB-based registration result without mismatch culling; a large number of mismatches between the two images are clearly visible.
After eliminating matches whose Hamming distance is greater than 2 times dmin, the remaining matches are shown in figs. 4-8; the results show that where there are obvious texture features, the matches are relatively accurate.
At this point we have obtained the data association between adjacent frames of the image sequence, i.e. the different pixel positions at which the homonymous feature points appear in the images. The actual spatial positions of the homonymous feature points are identical; their projections onto the two image planes form multiple homonymous point pairs, which the motion estimation module uses for pose estimation.
The key frame extraction method comprises the following steps:
Before pose estimation, consider that for three-dimensional reconstruction with visual SLAM on an unmanned aerial vehicle platform, the vision sensor can acquire data at dozens of frames per second; owing to limited computing resources, however, not all frames acquired by the sensor can be sent to the SLAM system for processing. Meanwhile, the acquired image data also contain a large number of redundant and invalid frames, so not all frame data need to participate in pose estimation and computation. We therefore need to extract representative key frames from the image sequence, calculate the poses of the key frames, and perform relatively reliable pixel-depth estimation in a local range with the key frames as reference. The strategy we designed for extracting key frames is shown in figs. 4-9.
The first image is processed first, detecting FAST feature points and edge features; if the number of feature points in the middle of the image exceeds a set threshold, the image is taken as the first key frame. The continuous images after it are then processed, tracking the feature points continuously. When the number of matching points exceeds a threshold, the essential matrix E is computed if the median of the parallaxes exceeds a threshold; otherwise the homography matrix H is computed. If enough inliers remain after computing H or E, the frame is taken as a key frame, as sketched below.
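A sketch of this decision rule with the OpenCV API; all thresholds here are illustrative placeholders, not values fixed by the invention.

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

bool isKeyFrame(const std::vector<cv::Point2f>& pts_prev,
                const std::vector<cv::Point2f>& pts_cur,
                const cv::Mat& K, double disparity_median,
                int min_matches = 100, double min_disparity = 10.0,
                int min_inliers = 50)
{
  if ((int)pts_cur.size() < min_matches) return false;

  cv::Mat inlier_mask;
  if (disparity_median > min_disparity)
    // Enough translation: fit the essential matrix E.
    cv::findEssentialMat(pts_prev, pts_cur, K, cv::RANSAC, 0.999, 1.0,
                         inlier_mask);
  else
    // Near-pure rotation or a planar scene: fit the homography H.
    cv::findHomography(pts_prev, pts_cur, cv::RANSAC, 3.0, inlier_mask);

  // Accept the frame as a key frame only if enough inliers survive.
  return cv::countNonZero(inlier_mask) >= min_inliers;
}
```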
Camera motion estimation:
The motion estimation of the camera calculates the translation vector t and the rotation matrix R between adjacent key frames; t and R can be obtained by decomposing the essential matrix E. When the key frame camera only rotates without translating, the epipolar constraint of the two views does not hold, the fundamental matrix F is a zero matrix, and the homography matrix H must be decomposed instead. The decomposition method is singular value decomposition (SVD):
$t^{\wedge}R = E = UDV^{T}$ (4-1)
where U, V are third-order orthogonal matrices and D is the diagonal matrix D = diag(r, s, t). The decomposition yields the diagonal entries (singular values), of which two are positive and equal and the third is 0. Four different groups of solutions are obtained in the decomposition, corresponding to the four relative poses of the key frame, as shown in figs. 4-10.
As can be seen from figs. 4-10, only the first of the four solutions has actual physical meaning, i.e. the scene depth corresponding to the key frame is positive, while the remaining three solutions have only theoretical meaning and cannot yield the correct camera motion. In summary, the correct group of solutions can be determined by the sign of the scene depth; the pose estimation results based on figs. 4-8 are shown in figs. 4-11.
Dense reconstruction design: through the design of the visual SLAM front end, the spatial position and attitude of each key frame can be estimated efficiently and relatively reliably; based on this pose and the key frame data, the spatial position of any pixel can in turn be deduced, i.e. the depth of any pixel can be predicted.
The method introduces an epipolar line search and block matching algorithm in the construction of the dense three-dimensional point cloud map so as to estimate and acquire the depth information of most pixels in the image. For estimating the depth of any pixel in two adjacent images, for a spatial point P corresponding to a pixel P1, the camera optical centers of two adjacent frames are coplanar with the connecting lines O1P, O2P and O1O2, where O1P intersects the first frame at P1 and O2P intersects the second frame at P2, then P2 always falls on the intersecting line of the plane O1O2P and the second frame, i.e. the position of P1 in the second frame is on the epipolar line.
Since the brightness of a single pixel is not differentiated, the similarity of pixel blocks is considered to be compared, that is, a small block w is taken around the pixel point p1, and then each point on the epipolar line is traversed to match with the small block w around the point, so that the assumption of the algorithm is changed from the gray-scale invariance of the pixel into the gray-scale invariance of the image block.
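A small OpenCV sketch of locating that epipolar line from the fundamental matrix F between the two frames (the function name is illustrative); candidate points p2 are then sampled along this line and scored with the NCC defined next.

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Returns (a, b, c) of the line a*x + b*y + c = 0 in the second image on
// which the match of p1 must lie.
cv::Vec3f epipolarLine(const cv::Mat& F, const cv::Point2f& p1)
{
  std::vector<cv::Point2f> pts{p1};
  std::vector<cv::Vec3f> lines;
  // whichImage = 1: p1 lives in the first image, the line is in the second.
  cv::computeCorrespondEpilines(pts, 1, F, lines);
  return lines[0];
}
```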
For the matching of two patches, their Normalized Cross-Correlation (NCC) is calculated, as in eq. (4-2):
$S(A,B)_{NCC} = \dfrac{\sum_{i,j} A(i,j)B(i,j)}{\sqrt{\sum_{i,j} A(i,j)^{2}\,\sum_{i,j} B(i,j)^{2}}}$ (4-2)
After computing each similarity measure along the epipolar line, an NCC distribution along the epipolar line is obtained, i.e. the probability distribution of pixel p1 occurring in the second frame; the higher the NCC value, the higher the probability that the point is the homonymous point of p1. However, computing a pixel's depth from only two images is unreliable, so multiple images are required for the estimation. We assume that the depth d of a pixel follows a Gaussian distribution:
$P(d) = N(\mu, \sigma^{2})$ (4-3)
Whenever new data arrive, the depth of the observation will also follow a Gaussian distribution:
$P(d_{obs}) = N(\mu_{obs}, \sigma_{obs}^{2})$ (4-4)
then updating the depth of observation point p1 becomes an information fusion problem, and the fused depth satisfies:
$\mu_{fuse} = \dfrac{\sigma_{obs}^{2}\mu + \sigma^{2}\mu_{obs}}{\sigma^{2} + \sigma_{obs}^{2}}, \qquad \sigma_{fuse}^{2} = \dfrac{\sigma^{2}\sigma_{obs}^{2}}{\sigma^{2} + \sigma_{obs}^{2}}$ (4-5)
Over multiple iterations, the peak of the probability distribution of p1 on the second frame's epipolar line moves ever closer to its true position. Thus, the complete steps of dense reconstruction are as follows:
(1) assuming that the depths of all pixels satisfy some initial Gaussian distribution;
(2) when new data is generated, determining the position of a projection point through epipolar line search and block matching;
(3) calculating the triangulated depth and its uncertainty according to the geometric relationship;
(4) fusing the current observation into the previous estimate; if converged, stopping the calculation, otherwise returning to step (2). The NCC score used in step (2) is sketched below.
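A sketch of the NCC score of eq. (4-2) used in step (2), over two w × w grayscale patches; the half-width of 3 is an assumed value, and demeaned NCC variants are also common in practice.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Both images are assumed to be CV_8U grayscale; the patches must lie fully
// inside the image borders.
double ncc(const cv::Mat& ref, const cv::Mat& cur,
           const cv::Point& p_ref, const cv::Point& p_cur, int half_w = 3)
{
  double sum_ab = 0, sum_aa = 0, sum_bb = 0;
  for (int dy = -half_w; dy <= half_w; ++dy)
    for (int dx = -half_w; dx <= half_w; ++dx) {
      double a = ref.at<uchar>(p_ref.y + dy, p_ref.x + dx);
      double b = cur.at<uchar>(p_cur.y + dy, p_cur.x + dx);
      sum_ab += a * b;
      sum_aa += a * a;
      sum_bb += b * b;
    }
  return sum_ab / std::sqrt(sum_aa * sum_bb + 1e-10);  // avoid divide-by-zero
}
```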
Dense reconstruction serial algorithm implementation and testing:
Through epipolar search and block matching, once motion estimation is achieved we can estimate the depth of a large number of non-feature pixels in the image. For visual SLAM on an unmanned aerial vehicle platform, however, the following problem remains: the computation is heavy; taking a 640 × 480 image as an example, about 300,000 pixels must be processed in each depth estimation, so to meet the system's requirement for fast, real-time operation, the algorithm needs parallel optimization.
Before parallelizing the dense reconstruction module of the SLAM system, a suitable parallelization mode is selected and a reasonable parallelization strategy is formulated, laying a theoretical basis for the implementation in the next step. The parallelization approach is analyzed by combining the dense reconstruction principle, the characteristics of the embedded GPU platform and common parallelization methods.
As can be seen from the above, the dense reconstruction needs to be iterated for many times, it is assumed that the depths of all pixels satisfy a certain initial gaussian distribution, when new data is generated, the positions of projection points are determined by epipolar search and block matching, then the triangulated depths and uncertainties are calculated according to the geometric relationships, finally the current observations are fused into the last estimation, if convergence is reached, the calculation is stopped, otherwise the positions of the projection points are determined again;
In order to realize fast and efficient three-dimensional reconstruction, parameters such as the NCC threshold and the number of iteration frames are determined before dense reconstruction. The invention therefore uses 50 key frames to iteratively compute the depth of a reference frame, counting the number of updated pixels and the computation time per iteration for NCC thresholds of 0.85, 0.90 and 0.95. The experimental platform is a ThinkPad T470p with an Intel i7-7700HQ processor and 16 GB of RAM. The data used are the REMODE dataset acquired by a drone, with an image size of 640 × 480; the experimental results are shown in figs. 4-12 and 4-13.
As can be seen from fig. 4-12, when the number of key frames is around 20, the number of pixels updated per iteration becomes much smaller than the total number of pixels in the image. Moreover, the higher the NCC threshold, the fewer pixels are updated per iteration, and conversely the lower the threshold, the more pixels are updated, consistent with the theory that lowering the matching threshold makes pixel updates easier. A higher NCC value gives a more reliable dense reconstruction, but a sparser result.
As can be seen from fig. 4-13, the NCC value also affects the dense reconstruction time: before about 17 frames the NCC has little influence on the reconstruction time, while after about 17 frames the time per iteration stabilizes, and the higher the NCC, the longer each iteration takes.
Comprehensively considering the density requirement of the point cloud, the reliability of the reconstruction result and the requirement for fast execution, the chosen NCC parameter is 0.9. The iteration results after 1, 5, 10, 15 and 25 frames are shown in figs. 4-14; the depth map tends to be stable after about 15 iterated frames.
Figs. 4-14: reconstruction results at NCC = 0.90, in which (a) is the original image; (b) 1 frame; (c) 5 frames; (d) 10 frames; (e) 15 frames; (f) 25 frames.
Overall parallelization strategy of the dense reconstruction algorithm:
In experiments, estimating the dense depth map is very time-consuming, because the points to be estimated per image grow from the original few hundred feature points to hundreds of thousands or even millions of pixels, which even current mainstream CPUs cannot compute in real time. However, the dense reconstruction process has another property: the depth estimates of these hundreds of thousands of pixels are independent of one another, which makes parallelization worthwhile.
In the serial implementation, a double loop traverses each pixel of the image and performs an epipolar search pixel by pixel. On a CPU this process is serial: the next pixel is computed after the previous one finishes. In fact, there is no inherent link between consecutive pixels, so there is no need to wait for the previous pixel's result at all.
The parallelization strategy of the dense reconstruction algorithm of the invention is shown in figs. 4-15: the CPU of the TX2 undertakes reading the key frames and initializing the depth map and its variance matrix, while the GPU is responsible for the depth estimation of each pixel. After the GPU finishes updating the pixels, the CPU copies the data back to host memory.
As the preceding description of the algorithm principle shows, the process suits a data-parallel design: different pixel data are mapped to different work items of the processor for parallel processing, and the depth estimation iterations of different pixels are performed on different work items. The data allocation strategy uses contiguous allocation; under this scheme the numbers of pixels assigned to different work items differ by no more than 1, so the task loads of the work items remain essentially uniform.
After all work items have processed their assigned pixel data, the host copies the processing results from the device's global memory into host memory to output the final depth map matrix data. A kernel sketch of this per-pixel mapping follows.
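A minimal CUDA C++ kernel sketch (assumed names) of the per-pixel data-parallel mapping: one GPU thread owns one pixel and performs the Gaussian fusion of eq. (4-5); mu_obs and sigma2_obs are assumed to have been filled by a per-pixel epipolar-search kernel of the same shape.

```cpp
__global__ void fuseDepthKernel(const float* mu_obs, const float* sigma2_obs,
                                float* mu, float* sigma2, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;                 // guard threads past the last pixel

  // Pixels are mutually independent, so no inter-thread synchronization.
  float s2 = sigma2[i], so2 = sigma2_obs[i];
  float denom = s2 + so2;
  mu[i]     = (so2 * mu[i] + s2 * mu_obs[i]) / denom;
  sigma2[i] = (s2 * so2) / denom;
}

// Host side (sketch): one thread per pixel, block size a multiple of the
// 32-thread warp described earlier; results are copied back afterwards.
// int n = width * height;
// fuseDepthKernel<<<(n + 255) / 256, 256>>>(d_mu_obs, d_s2_obs, d_mu, d_s2, n);
// cudaMemcpy(h_mu, d_mu, n * sizeof(float), cudaMemcpyDeviceToHost);
```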
Implementation of dense reconstruction parallel algorithm:
The host program designed by the invention is an ROS node responsible for program initialization, parameter setting and similar work; when the node iterates the depth map of a key frame, a kernel function is invoked and executed on the GPU.
The parallel code is tested with the same dataset on the same platform that ran the serial code; the reference frame is iterated over 20 key frames, and the average over 5 experiments is taken.
ROS-based visual SLAM node communication design and algorithm implementation:
visual SLAM node communication design:
In the practical application scenarios of visual SLAM (such as unmanned aerial vehicles and self-driving cars), any module of the system may fail when the platform is in a complex environment, so the invention adopts an ROS distributed computing architecture to reduce the coupling among the system's modules and improve its availability and reliability in complex environments. Data acquisition, pose estimation, dense reconstruction and visualization in the visual SLAM reconstruction process are run by four computing nodes respectively, each playing its own role; communication between the nodes uses the Topic subscription mode, and the system structure is shown in figs. 4-16.
As can be seen from figs. 4-16, 4 nodes in total participate in communication when the system operates: the sensor captures data, whose raw data are published as the topic /sensor_msgs; the visual sensor is directly connected to the Jetson TX2 through a USB or CSI interface, converting the stream data into image sequence messages that are republished as the topic /usb_cam.
The VO node subscribes to /usb_cam, performs pose estimation and publishes the result as pose data; the reconstruction node subscribes to /usb_cam and the pose data simultaneously, and publishes the processed depth map and point cloud data to the ROS network, where the visualization node subscribes to them to realize the system's incremental map-construction visualization.
The visual SLAM algorithm node is realized as follows:
the SLAM front end, i.e., the VO node, subscribes to the topic/usb _ cam simultaneously with the back-end reconstruction node. And the VO node performs pose resolving on the data issued by the/usb _ cam node through an interface of the SLAM front-end library, then issues the data in a message form, and subscribes by a reconstruction node at the back end of the SLAM, namely the reconstruction node subscribes frame data and pose data at the same time.
Three-dimensional point cloud visualization based on depth maps:
in order to visualize the result of the three-dimensional map reconstruction in real time, the method adopted by the invention has the following procedures:
(1) when the depth map of the selected reference frame has converged, the depth map result is published to a topic;
(2) the visualization node acquires the pose published by the VO node, matched by the reference frame and its timestamp;
(3) the spatial position of the reference frame's optical center and the normal direction of the frame are calculated based on the initialization result;
(4) a point cloud is constructed pixel by pixel based on the depth map, as sketched below.
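A sketch of step (4): each converged depth pixel is back-projected with the pinhole model and transformed by the key frame pose. T_wc (world-from-camera, 4 × 4) and the intrinsics fx, fy, cx, cy are assumed inputs from the VO node and the camera calibration.

```cpp
#include <opencv2/core.hpp>
#include <vector>

std::vector<cv::Point3d> depthMapToCloud(const cv::Mat& depth,  // CV_32F
                                         const cv::Mat& T_wc,   // 4x4 CV_64F
                                         double fx, double fy,
                                         double cx, double cy)
{
  std::vector<cv::Point3d> cloud;
  for (int v = 0; v < depth.rows; ++v)
    for (int u = 0; u < depth.cols; ++u) {
      double d = depth.at<float>(v, u);
      if (d <= 0) continue;                 // skip unconverged pixels
      // Camera-frame point from the inverse pinhole projection.
      cv::Mat p_c = (cv::Mat_<double>(4, 1) << (u - cx) * d / fx,
                                               (v - cy) * d / fy,
                                               d, 1.0);
      cv::Mat p_w = T_wc * p_c;             // into the world frame
      cloud.emplace_back(p_w.at<double>(0), p_w.at<double>(1),
                         p_w.at<double>(2));
    }
  return cloud;
}
```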
System environment and operation:
ROS network:
The operating system platforms supported by ROS include Ubuntu, OS X, Fedora, Gentoo, OpenSUSE, Debian, Arch Linux, Windows and others. Since Ubuntu provides rich visualization functions and ROS installs and deploys conveniently and quickly under it, Ubuntu 16.04 is chosen as the ROS operating platform; common ROS commands are shown in Table 4-1.
Table 4-1 Common ROS commands

Command     Function
rosmake     Compile a ROS package
roscore     Run the ROS core process
rosrun      Run a node from a ROS package
roslaunch   Run an ROS startup (launch) script
rosnode     Inspect and manage ROS nodes
rostopic    Inspect and manage ROS topics
ROS can quickly establish a distributed computing system. In the invention, the distributed computing system consists of the onboard NVIDIA Jetson TX2 and a portable PC connected through a wireless local area network. Before the two machines communicate, ifconfig is first entered in each device's terminal to check the two devices' IP information; then the hosts file under the /etc directory is modified to bind the two devices' IPs to their host names, as shown in figs. 4-17, so that the ROS bottom layer can resolve the other device's host name. After modification, the network is restarted on both devices.
When each node of the SLAM system is operated, a node manager is selected firstly, a roscore is input by a device terminal where the node manager is located to start an ROS core process, and when the device needs to operate other processes, the node can be started directly through a rossrun or a roslaunch. The node manager of the present invention operates on an onboard TX 2.
To enable the portable PC to join the ROS network, ROS_MASTER_URI must be configured on the PC by entering at its terminal:
export ROS_MASTER_URI=http://tx2:11311
where tx2 is the host name bound above and 11311 is the port number of the ROS node manager. At this point, the configuration of the two-device ROS network is complete.
Owing to the distributed operation of ROS, the system constructed by the invention must control multiple process nodes at run time. Since the unmanned aerial vehicle cannot be controlled directly, SSH is used for remote connection. Under Ubuntu, SSH is very convenient: one simply opens a new terminal and, when the onboard device TX2 and the PC used for control are in the same local area network, enters the IP address of TX2 and related information at the terminal to establish a session; after the connection succeeds, the nodes can be controlled through instructions or by running scripts.
Algorithm deployment and operation:
the VO node and the reconstruction node can run either on the onboard device or on the ground-side portable PC. Since both nodes subscribe to the messages published by the sensor node, considerable communication overhead exists between them, and the NVIDIA Jetson TX2 has sufficient computing power to run both nodes simultaneously; the two nodes are therefore run on the TX2, while the ground-side PC starts the rviz visualization tool to observe the three-dimensional map reconstructed in real time.
As in the design and implementation of the previous section, four nodes and the ROS core process must be started simultaneously when the system runs. The sensor node involves the camera's intrinsic parameters, and since the intrinsics of each camera are not exactly alike, the camera must be calibrated before use. Once the intrinsic parameter matrix of the sensor has been obtained it can be conveniently reused: a ROS launch script carrying the camera intrinsics is created so that they need not be re-entered, and the sensor node can be started by running the script after the ROS core process is up.
For convenient observation, the rviz visualization process can be started at the remote end after the sensor node is running; the start instruction is rosrun rviz rviz. Once the rviz process has registered with the ROS Master, visualization can be performed, such as observing the input frames, the selected key frames, and the depth maps computed from the key frames in the SLAM system.
The ROS nodes in the invention need some initialization parameters at run time, such as the length, width and channel information of the image. Running a node directly would require typing a long instruction, so the invention adopts launch files: the run-time parameters of a node are written into its launch file, and executing the launch file runs the process node with its parameters and loads data such as the camera parameters. The start-up process of the whole system is shown in figs. 4-18.
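As a hedged sketch of this mechanism, a node might read its launch-file parameters at start-up as follows (parameter names and default values are illustrative, not the invention's actual configuration):

import rospy

if __name__ == "__main__":
    rospy.init_node("reconstruction_node")
    # Values come from <param> tags in the launch file rather than from a
    # long command line; names and defaults here are examples only.
    width = rospy.get_param("~image_width", 752)
    height = rospy.get_param("~image_height", 480)
    channels = rospy.get_param("~image_channels", 1)
    rospy.loginfo("reconstruction node init: %dx%d, %d channel(s)",
                  width, height, channels)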
When the system runs, each terminal can control only one node, and each node requires setting an environment variable and then executing its corresponding launch file; repeating this many times invites omissions and typing errors. To save time, a Shell script under Ubuntu is therefore introduced at system start-up: all commands to be executed are written into one script, the script is named with the .sh suffix and given executable permission, and finally the script is run to operate the system of the invention.
Summary:
the invention has specifically introduced the implementation and steps of the three-dimensional map reconstruction system based on visual SLAM. First, the overall software and hardware system is designed, followed by the system flow, the algorithm flow and the functional modules. Then, combining the characteristics of the unmanned aerial vehicle platform and building on the ROS distributed computing framework, the visual SLAM node functions and communication are designed and realized. Finally, the ROS environment, network configuration, algorithm deployment and operation are introduced.
The invention aims to realize rapid reconstruction of a three-dimensional map for an unmanned aerial vehicle based on visual SLAM, the result being an integrated software and hardware system that realizes this function. Accordingly, the algorithm is verified on the unmanned airborne embedded computing platform NVIDIA Jetson TX2 using both image sequences and sensor data, and the results are presented through the rviz visualization tool.
Experimental platform and configuration
The invention carries out verification experiments on pose estimation and three-dimensional map reconstruction based on the visual SLAM algorithm; the tests are performed on unmanned airborne embedded computing devices, and the details of the hardware and software configuration are given in Table 5.1.
Table 5.1: Experimental environment hardware and software configuration
The detailed GPU information of the NVIDIA Jetson TX2 is shown in fig. 5-1: the CUDA version is 9.0, the number of CUDA cores is 256, the warp size is 32, and the maximum number of threads per block is 1024.
The experimental contents are as follows:
unmanned aerial vehicle motion estimation and increment visualization experiment:
this experiment tests the pose-estimation and dense-reconstruction nodes of the algorithm on a Raspberry Pi 3B platform and an NVIDIA Jetson TX2 platform respectively, using unmanned aerial vehicle sensor image data of a ruin environment from an emergency disaster-relief scene.
Because the Raspberry Pi 3B platform has limited computing power and carries no GPU to accelerate the dense reconstruction algorithm, only the result of running the VO node is tested on it, and the motion trajectory of the unmanned aerial vehicle is observed and analyzed through the rviz visualization tool.
The NVIDIA Jetson TX2 platform runs the VO node and the reconstruction node simultaneously, and the three-dimensional map reconstructed from the unmanned aerial vehicle images is observed through the rviz visualization tool. Finally, the densely reconstructed three-dimensional map is compared with the reconstruction result of the Smart3D three-dimensional reconstruction software.
The experimental steps for testing the motion-estimation node on the Raspberry Pi 3B platform are as follows:
(1) the Raspberry Pi 3B starts the ROS core process roscore;
(2) the notebook used for observation and the Raspberry Pi 3B are configured in the same wireless local area network, and ROS_MASTER_URI is set at the observation end to the 11311 port of the Raspberry Pi 3B;
(3) the Raspberry Pi 3B runs the VO node, and the observation end runs the rviz visualization tool;
(4) the experimental data are played back on the Raspberry Pi 3B through rosbag play;
the experimental steps for testing simultaneous localization and rapid three-dimensional map reconstruction on the NVIDIA Jetson TX2 platform are as follows:
(1) the NVIDIA Jetson TX2 starts the ROS core process roscore;
(2) the notebook used for observation and the NVIDIA Jetson TX2 are configured in the same wireless local area network, and ROS_MASTER_URI is set at the observation end to the 11311 port of the TX2;
(3) the VO node and the reconstruction node are run on the TX2, and the rviz visualization tool is run at the observation end;
(4) the experimental data are played back on the TX2 through rosbag play;
(5) the three-dimensional map reconstruction based on the unmanned airborne vision sensor data is observed at the observation end.
Building three-dimensional reconstruction experiment:
in this experiment, the dense reconstruction algorithm is verified on image data taken from the unmanned aerial vehicle's viewing angle; the generated three-dimensional point-cloud map is stored as a point-cloud file in pcd format and visualized through the PCL point-cloud library.
Reconstruction time and accuracy evaluation experiment:
in order to quantitatively evaluate the method of the present invention, evaluation experiments were performed on the reconstruction time and the reconstruction accuracy of the invention.
The reconstruction-time evaluation compares the running time of the serial algorithm with that of the parallel algorithm of the invention, and compares the reconstruction time of the Smart3D software (SfM algorithm) with the system reconstruction time of the invention.
The reconstruction-accuracy evaluation compares each reference frame with the corresponding true depth map during reconstruction and plots curves of valid pixels against error variance.
Experimental data:
(1) unmanned aerial vehicle motion estimation and increment visualization experiment data
The image size of sensor rosbag experimental data group 1 is 752 × 480 with calibrated camera intrinsics. The images acquired by the sensor are published directly, without compression, into the ROS network in message form by the unmanned aerial vehicle carrying the embedded computing device, and all topics and message information at the current moment are recorded and stored through rosbag for experimental reproduction. During reproduction, running 'rosbag play xx.bag' suffices to simulate the state of the sensor at that time.
The sensor data cover an outdoor ruin scene with a storage size of 2.1 GB; the messages are published on /camera/image_raw, 6387 messages in total, and part of the data is shown in fig. 5-2 below.
(2) Building three-dimensional reconstruction result experimental data
To verify the three-dimensional reconstruction effect of the invention on buildings, single-view top-down data of the Innovation Center of the Qingshuihe campus of the University of Electronic Science and Technology of China were acquired by an unmanned aerial vehicle; a reference frame is shown in fig. 5-3 below.
(3) Reconstruction of time and accuracy evaluation experimental data
The data selected for the reconstruction-time evaluation are the REMODE data set used in the invention and the sensor data used in the unmanned aerial vehicle incremental visualization experiment; the images in the sensor-data rosbag are extracted by script sampling at a rate of one frame every 2 seconds.
The reconstruction-accuracy evaluation data set consists of views generated by ray tracing a three-dimensional synthetic model. Each view has an associated accurate camera pose and depth map; part of the information of this data set is shown in Table 5.2.
Table 5.2: Data set information

Number of frames | Dimension (m) | Mean value (m) | Maximum movement (m)
200              | 0.827-2.84    | 1.531          | 4.576
Experimental results and analysis:
the experimental results are as follows:
(1) unmanned aerial vehicle motion estimation and increment visualization experiment result
The motion-estimation effect of the algorithm's VO node running on the Raspberry Pi 3B platform is shown in figs. 5-4.
Figs. 5-4 are top-down views after adjusting the observation perspective: the line segment is the fitted camera motion trajectory, and the endpoints on the fitted segment are the estimated spatial positions of the sensor at the times of the corresponding key frames. During execution the algorithm estimates the pose synchronously with the sensor's motion, and the fitted trajectory matches the motion observed directly in the sensor data.
Based on the estimated pose, the three-dimensional map is rapidly reconstructed on the NVIDIA Jetson TX2 platform; the incremental three-dimensional reconstruction process from a top-down angle is shown in figs. 5-5.
Figs. 5-5 show the incrementally reconstructed three-dimensional map observed in real time through the visualization tool while the algorithm runs. During execution, the dense three-dimensional map is reconstructed synchronously with the sensor's movement, the reconstruction result lagging the sensor motion by a few seconds; as the timestamp below the visualization tool shows, reconstructing this area took 54 seconds, which basically meets the real-time requirement.
FIGS. 5-6(a), (b), (c) are the results of the dense three-dimensional map reconstruction of the present invention.
As can be seen in figs. 5-6(a), 5-6(b) and 5-6(c), the three-dimensional environment map constructed by the algorithm of the invention still guarantees a good reconstruction result although the computation time is greatly shortened, and the reconstructed dense three-dimensional point cloud reflects the three-dimensional information of the surveyed area well; the correspondence between the motion trajectory and the three-dimensional map is good, as shown in figs. 5-7.
(2) Building three-dimensional reconstruction experimental result
The reconstruction results of this experiment are shown in figs. 5-8, where 5-8(a) to 5-8(c) are the front, top and right views of the three-dimensional point cloud obtained in the experiment; the storage format is pcd, and pcd_viewer is used to display it in three-dimensional point-cloud space.
(3) Reconstruction time and precision evaluation experiment result
The results of comparing the serial reconstruction run time of the invention with the parallel run time of the invention, and the Smart3D software reconstruction time (SfM algorithm) with the system reconstruction time of the invention, are shown in Table 5.3 below.
Table 5.3: Reconstruction-time evaluation results of the invention

Experimental data     | Compared algorithm                | Compared time (s) | Time of the invention (s) | Speed-up
Table top (20 frames) | Serial algorithm of the invention | 63.53             | 8.41                      | 7.55
Table top (50 frames) | SfM (Smart3D)                     | 136               | 22                        | 6.18
Ruin scene            | SfM (Smart3D)                     | 436               | 54                        | 8.07
The reconstruction-accuracy evaluation compares each reference frame with the corresponding true depth map during reconstruction and plots curves of valid pixels against error variance. When the error of the algorithm's per-pixel depth-map estimate lies within 5 cm and within 15 cm respectively, the relationship between the percentage of valid pixels and the error variance σ² is shown by the green and light-blue lines in figs. 5-9. The variance is computed as

σ² = (1/n) · Σ_{i=1}^{w·h} (d_i − d_i^gt)² · 1(|d_i − d_i^gt| < err)

where w is the width of the image, h is the height of the image, d_i is the estimate of the i-th pixel, d_i^gt is the corresponding true depth, n is the number of valid pixels within the error tolerance, err is the set error bound, and 1(·) is the indicator selecting the valid pixels.
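A minimal sketch of this metric, assuming the estimated and true depth maps are available as NumPy arrays (function and variable names are illustrative, not the invention's actual code):

import numpy as np

def depth_accuracy(d_est, d_gt, err=0.05):
    # err: tolerance in metres (0.05 and 0.15 in the experiments above).
    diff = d_est - d_gt
    valid = np.abs(diff) < err                 # pixels inside the tolerance
    n = valid.sum()
    if n == 0:
        return 0.0, float("nan")
    percent = 100.0 * n / d_est.size           # share of valid pixels
    sigma2 = np.square(diff[valid]).sum() / n  # variance over valid pixels
    return percent, sigma2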
And (3) analyzing an experimental result:
(1) unmanned aerial vehicle motion estimation and increment visualization experiment result analysis
From this pose-estimation experiment the following conclusions can be drawn: the ROS distributed computing architecture, integrating hardware such as the sensor and the embedded computing device with the algorithm, can realize such applications on the unmanned aerial vehicle platform; the algorithm of the invention achieves good real-time motion estimation on low-performance embedded computing devices; and the VO node based on the visual SLAM algorithm realizes state estimation of the unmanned aerial vehicle's pose well.
From this three-dimensional map reconstruction experiment the following conclusions can be drawn: the reconstruction method based on visual SLAM greatly accelerates three-dimensional map reconstruction and can quickly present an intuitive three-dimensional map on a higher-performance embedded computing platform, giving three-dimensional reconstruction from unmanned aerial vehicle images better real-time behaviour; the motion estimation of the invention is consistent with the sensor's movement, and the local three-dimensional map built incrementally during sensor motion presents good reconstruction of terrain and landforms and provides rich three-dimensional information; and compared with three-dimensional reconstruction over the global set of images, incremental mapping during motion avoids, to a certain extent, the spatial disorder caused by mismatching of large numbers of similar feature points, so the reliability and usability of the three-dimensional map are higher.
(2) Building three-dimensional reconstruction experimental analysis
From the analysis of this experiment's results the following conclusions can be drawn: the algorithm of the invention guarantees a good three-dimensional map reconstruction result, and the reconstructed dense point cloud reflects the three-dimensional information of the surveyed area well; the reconstruction works better in richly textured areas, but point-cloud loss and holes can appear when reconstructing specular surfaces such as glass.
(3) Reconstruction time and precision evaluation experiment result analysis
The following conclusions can be drawn from the analysis of the results of this experiment:
in terms of reconstruction-time evaluation, the parallel algorithm shows a good speed-up in running time over the serial algorithm, providing a better real-time guarantee for rapid reconstruction of the unmanned aerial vehicle's three-dimensional map; moreover, the three-dimensional reconstruction time of the system is greatly improved over the SfM-based Smart3D software.
In terms of reconstruction-accuracy evaluation, the accuracy of the invention's three-dimensional map depends to a great extent on the accuracy of depth-map estimation. The quantitative experiments lead to the following conclusions: on the data set adopted, the proportion of valid pixels with error below 5 cm in the densely reconstructed depth map exceeds 40% of all pixels, and the proportion with error below 15 cm reaches nearly 60%; the depth-map estimation result is good and provides a sound reference for visualizing the three-dimensional map.
In addition, the method performs visual SLAM motion estimation and dense three-dimensional map reconstruction simultaneously on the ROS distributed computing architecture and runs on an unmanned airborne computing platform, offering a reusable scheme for deploying other complex applications on unmanned aerial vehicle platforms.
The feasibility of the ROS distributed computing architecture for unmanned aerial vehicle applications is verified, and the algorithm of the invention is tested on the Raspberry Pi 3B and NVIDIA Jetson TX2 platforms respectively; the motion-estimation algorithm maintains good real-time performance even on the low-performance embedded device, while the dense reconstruction algorithm rapidly reconstructs a dense three-dimensional map on the high-performance embedded device. The incremental three-dimensional reconstruction of the unmanned aerial vehicle platform can be observed synchronously at the observation end, and the result is more intuitive than raw image data; the accuracy evaluation shows that the reconstruction error of the three-dimensional map is small and the depth-map estimation is good, and the three-dimensional map visualization embodies the spatial characteristics of the observed scene.
The foregoing shows and describes the basic principles and main features of the present invention, together with its advantages. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above: the embodiments and the description merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A dense vision SLAM-based rapid unmanned aerial vehicle three-dimensional map reconstruction method is characterized by comprising the following steps: the method comprises the steps of fast reconstruction of a three-dimensional map of an unmanned aerial vehicle platform, a fast reconstruction algorithm of the three-dimensional map of the unmanned aerial vehicle, a fast reconstruction system of the three-dimensional map of the unmanned aerial vehicle, a visual SLAM node communication system based on ROS and a visual SLAM node communication algorithm based on ROS;
the three-dimensional map of the unmanned aerial vehicle platform is quickly reconstructed:
the method for abstracting the hardware bottom layer of the system based on the ROS comprises the following steps:
receiving sensor data using the sensor_msgs/Image message type, which comprises RAW images and compressed images, a RAW image being the raw data of the light-source signal captured by a CMOS or CCD image sensor and converted into a digital signal;
after sensor data are successfully acquired, ROS provides camera calibration, distortion correction and color decoding for the image pipeline; the ROS image pipeline runs through the image_proc package and provides conversion functions for obtaining monochrome and color images from the RAW images acquired by the camera; once the camera is calibrated, the image pipeline can extract the CameraInfo message and correct the radial and tangential distortion of the image, i.e., preprocessing is completed while the data are acquired;
for complex image-processing tasks, OpenCV, cv_bridge and the image_transport library are used to convert and connect messages and to process subscribed and published image topics; conversely, when OpenCV performs the image processing, cv_bridge converts the image into a ROS Image message for publication;
in order to perform image processing with OpenCV inside a node, a separate OpenCV library needs to be installed; the OpenCV2 dependency packages needed for compiling and running must then be specified in the package.xml file under the workspace; next, for each node using OpenCV, the OpenCV lib files must be added to target_link_libraries; finally, the required OpenCV header files are added to the node's cpp file;
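As an illustration of this build configuration, a CMakeLists.txt fragment might read (target and source names are hypothetical):

find_package(OpenCV REQUIRED)                  # separately installed OpenCV
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(vo_node src/vo_node.cpp)
target_link_libraries(vo_node ${OpenCV_LIBS} ${catkin_LIBRARIES})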
through the above processes and steps, hardware abstraction is realized: data are acquired by the sensor and published to the ROS network, and image processing is performed by subscribing to the RAW image messages and using OpenCV;
algorithm decoupling and distributed computation:
when the three-dimensional reconstruction algorithm runs, a plurality of nodes operate, each realizing its own function; the nodes communicate through ROS messages and can run on different devices, and the decoupling of the algorithm functions is realized through the network constructed by ROS;
an algorithm acceleration method based on CUDA parallel computing comprises the following steps:
the NVIDIA Jetson TX2 embedded development module is selected as the experimental processing platform; when a parallel program runs on the TX2 module, every 32 parallel threads form a warp, and thread scheduling inside the GPU uses the warp as the scheduling unit; each SM contains 2 warp schedulers, each warp scheduler serves 32 CUDA cores, and the GPU has 256 CUDA cores in total;
topic based messaging and communication:
(1) the Talker is started and registers its information, including the topic name of the messages it publishes and its node address, with the Master through a port; the Master adds this information to a registration list;
(2) the Listener is started and registers its subscribed topic and its own address information with the Master;
(3) the Master matches the information of publishers and subscribers; if no publisher and subscriber share the same message topic it continues to wait, and if a match is found while the Talker is publishing messages, the Master sends the Talker's address to the Listener;
(4) after receiving the Talker's address from the Master, the Listener sends a connection request to the Talker, together with the topic name it wants to subscribe to and the communication protocol to be used;
(5) after receiving the Listener's request, the Talker returns connection confirmation information containing its own TCP address;
(6) after receiving the Talker's TCP address, the Listener establishes a network connection with the Talker over TCP;
(7) the Talker sends data to the Listener over the established TCP connection;
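This handshake is what the ROS client libraries perform behind the scenes for an ordinary publisher/subscriber pair; a minimal rospy sketch (the topic name chatter is illustrative, and talker() and listener() would run in separate processes):

import rospy
from std_msgs.msg import String

def talker():
    rospy.init_node("talker")
    pub = rospy.Publisher("chatter", String, queue_size=10)  # registers with the Master
    rate = rospy.Rate(1)                       # publish at 1 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data="hello"))      # sent over the negotiated TCP link
        rate.sleep()

def listener():
    rospy.init_node("listener")
    rospy.Subscriber("chatter", String, lambda m: rospy.loginfo(m.data))
    rospy.spin()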
the unmanned aerial vehicle three-dimensional map rapid reconstruction system comprises, as hardware, a vision sensor, an unmanned aerial vehicle, an embedded processor and a GPU, while the software realizes the core algorithm of visual SLAM; software and hardware are linked through the distributed framework and communication protocol to form a complete integrated system; the flow is as follows:
(1) the unmanned aerial vehicle carries an embedded module with a vision sensor to obtain a real-time image sequence;
(2) the image sequences are respectively transmitted to the front end and the rear end;
(3) the front end provides an initial pose value for the back end through feature extraction and matching, key-frame extraction and pose estimation;
(4) the back end performs closed-loop detection on the images received in step (2) and optimizes the pose acquired in step (3);
(5) the back end performs dense reconstruction based on the pose and the images;
(6) monitoring in real time by an observation end through an ROS visualization tool;
the unmanned aerial vehicle three-dimensional map rapid reconstruction algorithm comprises the following steps:
the flow of the visual SLAM front-end algorithm is as follows:
(1) the SLAM system provides an interface, receives a continuous image sequence acquired by a sensor and transmits the image sequence to a VO end;
(2) judging whether the initialization of the SLAM system is completed or not, if not, performing the step (3), and if the initialization is completed, performing the step (4);
(3) initializing the first key frame and the initial pose T_0;
(4) continuously extracting ORB features from the image sequence and selecting the next key frame F_i (i >= 1);
(5) matching the ORB features of F_i and F_{i-1} and screening out reliable matches;
(6) solving the inter-frame motion from the reliable feature point pairs through the epipolar constraint to compute the key-frame pose T_i;
(7) stopping if the frame stream is finished, otherwise returning to step (4);
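A hedged OpenCV sketch of steps (4)-(5), using the 2·d_min screening rule given in the feature extraction and matching clause further below (the function name and structure are illustrative, not the literal implementation):

import cv2

def match_orb(img_prev, img_cur):
    # Oriented FAST + BRIEF (ORB) features on key frames F_{i-1} and F_i.
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_cur, None)
    # Brute-force matching on the Hamming distance of the BRIEF descriptors.
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    d_min = min(m.distance for m in matches)
    # Keep a match only if its distance does not exceed 2 * d_min.
    good = [m for m in matches if m.distance <= 2 * d_min]
    return kp1, kp2, good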
the flow of the reconstruction part algorithm is as follows:
(1) selecting a target pixel point p with the depth required to be calculated;
(2) calculating the epipolar line position from the plane formed by the camera motion, i.e., by the translation vector between the camera optical centres and the line joining the key frame's optical centre to p;
(3) traversing the obtained epipolar line and searching for the point that matches p;
(4) calculating the actual spatial position of the point p by triangulation;
(5) updating the depth information of the pixel;
the SLAM front-end feature extraction and matching method comprises: first detecting Oriented FAST corner positions, then computing BRIEF descriptors based on the corner positions; the BRIEF descriptors in the two images are matched and their Hamming distances, i.e., the number of differing bits of the binary strings, are recorded; the minimum distance d_min among all matches is recorded, and if the Hamming distance between two descriptors is greater than 2 times d_min the match is deleted, otherwise it is kept;
the key frame extraction method comprises: first processing the first image by detecting FAST feature points and edge features, and taking the image as the first key frame if the number of feature points in the middle of the image exceeds a set threshold; then processing the subsequent continuous images and continuously tracking the feature points; when the number of matched points is greater than a threshold, the essential matrix E is computed if the median disparity exceeds its threshold, otherwise the homography matrix H is computed; if enough inliers remain after computing H or E, the frame is used as a key frame;
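A minimal sketch of this model selection, assuming (N, 2) arrays of matched point positions in pixel coordinates (the disparity threshold value is illustrative):

import cv2
import numpy as np

def select_model(pts_prev, pts_cur, K, disp_thresh=40.0):
    # Median disparity decides between the essential matrix E (enough
    # translation) and the homography H (small or pure-rotation motion).
    disparity = np.linalg.norm(pts_cur - pts_prev, axis=1)
    if np.median(disparity) > disp_thresh:
        E, inliers = cv2.findEssentialMat(pts_prev, pts_cur, K, cv2.RANSAC)
        return "E", E, inliers
    H, inliers = cv2.findHomography(pts_prev, pts_cur, cv2.RANSAC)
    return "H", H, inliers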
camera motion estimation: the camera's motion estimation computes the translation vector t and rotation matrix R between adjacent key frames, where t and R can be obtained by decomposing the essential matrix E; when the camera between key frames only rotates without translating, the two-view epipolar constraint does not hold and the fundamental matrix F is a zero matrix, so the homography matrix H must be decomposed instead; the decomposition method is singular value decomposition (SVD):
t^∧ R = E = U D V^T (4-1)
where t^∧ denotes the skew-symmetric matrix of t, U and V are third-order orthogonal matrices, and D is the diagonal matrix D = diag(r, s, t); the decomposition yields the singular values on the diagonal, of which two of the three are positive and equal while the remaining one is 0; the decomposition process yields four different solutions, corresponding to four candidate relative poses of the key frame;
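A sketch of recovering the relative pose with OpenCV: decomposeEssentialMat exposes the four candidate solutions noted above, and recoverPose selects the one that places the triangulated points in front of both cameras:

import cv2

def pose_from_essential(E, pts_prev, pts_cur, K):
    R1, R2, t = cv2.decomposeEssentialMat(E)
    candidates = [(R1, t), (R1, -t), (R2, t), (R2, -t)]  # the four solutions
    # The cheirality check picks the physically valid (R, t).
    _, R, t_best, _ = cv2.recoverPose(E, pts_prev, pts_cur, K)
    return R, t_best, candidates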
dense reconstruction: to estimate the depth of any pixel across two adjacent images, consider a pixel p_1 with corresponding space point P; the lines O_1P and O_2P from the optical centres of the two adjacent frames and the baseline O_1O_2 are coplanar, where O_1P intersects the first frame at p_1 and O_2P intersects the second frame at p_2; p_2 must therefore fall on the intersection of the plane O_1O_2P with the second frame, i.e., the position of p_1 in the second frame lies on the epipolar line;
since the brightness of a single pixel is not discriminative, the similarity of pixel blocks is compared instead: a small w × w block is taken around the pixel p_1, each point along the epipolar line is traversed, and the w × w block around that point is matched; the assumption of the algorithm thus changes from per-pixel grayscale invariance to grayscale invariance of image blocks;
for the matching of two patches, their Normalized Cross-Correlation (NCC) is calculated:
S(A, B) = Σ_{i,j} (A(i,j) − Ā)·(B(i,j) − B̄) / sqrt( Σ_{i,j} (A(i,j) − Ā)² · Σ_{i,j} (B(i,j) − B̄)² ) (4-2)
where A and B are the two w × w image blocks and Ā, B̄ are their mean intensities;
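A minimal sketch of the zero-mean NCC score in (4-2), assuming two equally sized patches as NumPy arrays:

import numpy as np

def ncc(patch_a, patch_b):
    # Returns a score in [-1, 1]; higher means more similar patches.
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0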
after each similarity measure along the epipolar line is computed, an NCC distribution along the epipolar line is obtained, i.e., the probability distribution of the pixel p_1 appearing in the second frame; the higher the NCC value, the more likely the point is the correspondence of p_1; however, a depth value computed from only two images is unreliable, so several images are needed for the estimation; the depth d of a pixel is assumed to follow a Gaussian distribution:
P(d) = N(μ, σ²) (4-3)
whenever new data arrive, the observed depth also follows a Gaussian distribution:
P(d_obs) = N(μ_obs, σ_obs²) (4-4)
then updating the depth of the observed point p_1 becomes an information fusion problem, and the fused depth satisfies:
μ_fuse = (σ_obs²·μ + σ²·μ_obs) / (σ² + σ_obs²),  σ_fuse² = (σ²·σ_obs²) / (σ² + σ_obs²) (4-5)
after several iterations, the peak of the probability distribution of p_1 along the epipolar line in the second frame approaches the true position ever more closely; the complete steps of dense reconstruction are thus as follows:
(1) assuming that the depths of all pixels satisfy some initial gaussian distribution;
(2) when new data is generated, determining the position of a projection point through epipolar line search and block matching;
(3) calculating the depth and uncertainty of the triangulated depth according to the geometric relationship;
(4) fusing the current observation into the previous estimate; if converged, stopping the calculation, otherwise returning to step (2);
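Step (4) is the Gaussian fusion of equation (4-5); a minimal sketch, assuming scalar per-pixel estimates:

def fuse_depth(mu, sigma2, mu_obs, sigma2_obs):
    # Fuse the current estimate N(mu, sigma2) with a new observation
    # N(mu_obs, sigma2_obs); the fused variance shrinks toward the more
    # certain of the two.
    mu_fused = (sigma2_obs * mu + sigma2 * mu_obs) / (sigma2 + sigma2_obs)
    sigma2_fused = (sigma2 * sigma2_obs) / (sigma2 + sigma2_obs)
    return mu_fused, sigma2_fused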
the ROS-based visual SLAM node communication system: an ROS distributed computing architecture is adopted to reduce the coupling among the modules of the system and to improve its availability and reliability in complex environments; data acquisition, pose estimation, dense reconstruction and visualization in the visual SLAM reconstruction process are run by four computing nodes respectively, each node playing its own role and the nodes communicating via Topic subscription; when the system works, 4 nodes participate in communication: the sensor captures data and its raw data are published as the topic /sensor_msgs, and the visual sensor, connected directly to the Jetson TX2 through a USB or CSI interface, converts the stream data into image-sequence data messages which are re-published as the topic /usb_cam;
the VO node subscribes to /usb_cam and performs pose estimation, publishing the result as pose data; the reconstruction node subscribes to /usb_cam and the pose data simultaneously and publishes the processed depth-map and point-cloud data to the ROS network, to which the visualization node subscribes, realizing the incremental map-building visualization function of the system;
the ROS-based visual SLAM node communication algorithm comprises: the SLAM front end, namely the VO node, and the back-end reconstruction node subscribe to the topic /usb_cam simultaneously; the VO node solves the pose for the data published on /usb_cam through the interface of the SLAM front-end library and then publishes it in message form, to which the SLAM back-end reconstruction node subscribes, i.e., the reconstruction node subscribes to frame data and pose data simultaneously; in order to visualize the result of the three-dimensional map reconstruction in real time, the adopted method comprises the following flow:
(1) when the depth map of the selected reference frame is converged, publishing the depth map result to a topic;
(2) the visualization node acquires the pose published by the VO node corresponding to the reference frame, matched via the reference frame's timestamp;
(3) calculating the spatial position of the optical center of the reference frame and the normal direction of the frame based on the initialization result;
(4) a point cloud is constructed pixel-by-pixel based on the depth map.
CN201910646511.3A 2019-07-17 2019-07-17 Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle Active CN110675483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910646511.3A CN110675483B (en) 2019-07-17 2019-07-17 Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910646511.3A CN110675483B (en) 2019-07-17 2019-07-17 Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle

Publications (2)

Publication Number Publication Date
CN110675483A true CN110675483A (en) 2020-01-10
CN110675483B CN110675483B (en) 2022-09-09

Family

ID=69068877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910646511.3A Active CN110675483B (en) 2019-07-17 2019-07-17 Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN110675483B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111413691A (en) * 2020-03-10 2020-07-14 杭州电子科技大学 Semantic positioning and mapping method adopting distributed structure
CN111596665A (en) * 2020-05-29 2020-08-28 浙江大学 Dense height map construction method suitable for leg-foot robot planning
CN112052300A (en) * 2020-08-05 2020-12-08 浙江大华技术股份有限公司 SLAM back-end processing method, device and computer readable storage medium
CN112612476A (en) * 2020-12-28 2021-04-06 吉林大学 SLAM control method, equipment and storage medium based on GPU
CN113110534A (en) * 2021-03-16 2021-07-13 国营芜湖机械厂 Unmanned aerial vehicle control and perception system
CN113325870A (en) * 2021-06-04 2021-08-31 南开大学 Three-dimensional visual coverage track planning method for unmanned aerial vehicle under photographic geometric constraint
CN113465602A (en) * 2021-05-26 2021-10-01 北京三快在线科技有限公司 Navigation method, navigation device, electronic equipment and readable storage medium
CN114217618A (en) * 2021-12-14 2022-03-22 重庆富沛和科技有限公司 Method for performing automatic cruise within selected range in three-dimensional map

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678754A (en) * 2015-12-31 2016-06-15 西北工业大学 Unmanned aerial vehicle real-time map reconstruction method
CN108303099A (en) * 2018-06-14 2018-07-20 江苏中科院智能科学技术应用研究院 Autonomous navigation method in unmanned plane room based on 3D vision SLAM
CN108648270A (en) * 2018-05-12 2018-10-12 西北工业大学 Unmanned plane real-time three-dimensional scene reconstruction method based on EG-SLAM

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678754A (en) * 2015-12-31 2016-06-15 西北工业大学 Unmanned aerial vehicle real-time map reconstruction method
CN108648270A (en) * 2018-05-12 2018-10-12 西北工业大学 Unmanned plane real-time three-dimensional scene reconstruction method based on EG-SLAM
CN108303099A (en) * 2018-06-14 2018-07-20 江苏中科院智能科学技术应用研究院 Autonomous navigation method in unmanned plane room based on 3D vision SLAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUN YUZHU et al.: "Real-time three-dimensional scene reconstruction based on monocular visual SLAM", Information Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111413691A (en) * 2020-03-10 2020-07-14 杭州电子科技大学 Semantic positioning and mapping method adopting distributed structure
CN111596665A (en) * 2020-05-29 2020-08-28 浙江大学 Dense height map construction method suitable for leg-foot robot planning
CN112052300A (en) * 2020-08-05 2020-12-08 浙江大华技术股份有限公司 SLAM back-end processing method, device and computer readable storage medium
CN112612476A (en) * 2020-12-28 2021-04-06 吉林大学 SLAM control method, equipment and storage medium based on GPU
CN113110534A (en) * 2021-03-16 2021-07-13 国营芜湖机械厂 Unmanned aerial vehicle control and perception system
CN113465602A (en) * 2021-05-26 2021-10-01 北京三快在线科技有限公司 Navigation method, navigation device, electronic equipment and readable storage medium
CN113325870A (en) * 2021-06-04 2021-08-31 南开大学 Three-dimensional visual coverage track planning method for unmanned aerial vehicle under photographic geometric constraint
CN114217618A (en) * 2021-12-14 2022-03-22 重庆富沛和科技有限公司 Method for performing automatic cruise within selected range in three-dimensional map
CN114217618B (en) * 2021-12-14 2024-04-16 重庆富沛和科技有限公司 Method for automatically cruising in selected range in three-dimensional map

Also Published As

Publication number Publication date
CN110675483B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN110675483B (en) Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
Arrigoni et al. Robust absolute rotation estimation via low-rank and sparse matrix decomposition
Schöning et al. Evaluation of multi-view 3D reconstruction software
WO2020007483A1 (en) Method, apparatus and computer program for performing three dimensional radio model construction
CN104537707A (en) Image space type stereo vision on-line movement real-time measurement system
CN109102563A (en) A kind of outdoor scene three-dimensional modeling method
Ai et al. Deep learning for omnidirectional vision: A survey and new perspectives
CN113096250A (en) Three-dimensional building model library system construction method based on unmanned aerial vehicle aerial image sequence
CN116051747A (en) House three-dimensional model reconstruction method, device and medium based on missing point cloud data
KR101869605B1 (en) Three-Dimensional Space Modeling and Data Lightening Method using the Plane Information
WO2024007485A1 (en) Aerial-ground multi-vehicle map fusion method based on visual feature
Yao et al. Relative camera refinement for accurate dense reconstruction
CN103886595A (en) Catadioptric camera self-calibration method based on generalized unified model
Özdemir et al. A multi-purpose benchmark for photogrammetric urban 3D reconstruction in a controlled environment
CN112580428A (en) Power distribution network design method and device
CN110349209A (en) Vibrating spear localization method based on binocular vision
Bao et al. Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations
CN113409404A (en) CUDA architecture parallel optimization three-dimensional deformation measurement method based on novel correlation function constraint
CN112270748A (en) Three-dimensional reconstruction method and device based on image
Apollonio et al. Bologna porticoes project: A 3D repository for WHL UNESCO nomination
Skuratovskyi et al. Outdoor mapping framework: from images to 3d model
Li et al. Learning dense consistent features for aerial-to-ground structure-from-motion
CN115344113A (en) Multi-view human motion capture method, device, system, medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant