CN113808152A - Unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2 - Google Patents

Unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2

Info

Publication number
CN113808152A
CN113808152A
Authority
CN
China
Prior art keywords
camera
point
points
orb
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111075423.6A
Other languages
Chinese (zh)
Inventor
范嘉宇
刘云平
王炎
袁玉雯
苏东彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202111075423.6A
Publication of CN113808152A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2, belonging to the field of visual simultaneous localization and mapping. The method mainly comprises a visual front end, visual back-end global optimization and mapping, and three-dimensional path planning. The visual front end mainly comprises ORB feature point extraction and matching and a visual odometer that calculates the current pose of the unmanned aerial vehicle; this part realizes the positioning function. The visual back-end global optimization and mapping mainly comprise BA global map optimization and dense point cloud mapping, realizing the global optimization and mapping functions. The three-dimensional path planning enables the unmanned aerial vehicle to avoid obstacles indoors and outdoors; this part realizes the path planning function. The method extracts ORB feature points with an improved FAST corner detector and optimizes them block by block, and performs pose estimation with an improved ICP-PnP algorithm.

Description

Unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2
Technical Field
The invention relates to an unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2, and belongs to the field of visual simultaneous localization and mapping (SLAM).
Background
An Unmanned Aerial Vehicle (UAV) is a mobile robot that carries an onboard controller, allowing an operator on the ground to control its flight. Simultaneous Localization and Mapping (SLAM) is an algorithm that aims to localize a robot without GPS signals while building a map of the environment. Autonomous navigation of a UAV with an RGB-D camera is mainly divided into three questions: "Where am I?", "What does my environment look like?", and "How do I reach the task destination?". These three questions correspond to positioning, mapping, and path planning. SLAM solves the first two, which is why SLAM-based positioning and mapping is widely regarded by researchers in the field as the key to realizing autonomous UAV navigation.
Patent CN108520554A, a binocular three-dimensional dense mapping method based on ORB_SLAM2, proposes four threads mainly for SLAM mapping with ORB (Oriented FAST and Rotated BRIEF) features: a tracking thread, a local mapping thread, a loop-closure detection thread, and a dense mapping thread. ORB feature point extraction and matching is placed in the tracking thread; if the ORB feature points cluster together, the pose estimated by the visual odometer deviates considerably, and without block-wise optimization of the ORB feature points a large positioning error results.
In conventional UAV SLAM systems, the positioning module does not consider, after extracting the ORB feature points, that clustered feature points degrade the pose estimated by the visual odometer; the mapping module introduces errors in the pixel coordinate transformation because no global optimization is performed after the dense point cloud map is generated; and the conventional RRT algorithm used by the path planning module may leave the UAV unable to avoid obstacles at corners, or make it avoid them by an excessive margin.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2, so that the navigation process of the unmanned aerial vehicle is smoother and related tasks are better completed.
The invention adopts the following technical scheme for solving the technical problems:
An autonomous navigation method for an unmanned aerial vehicle based on ORB_SLAM2 comprises the following steps:
(1) ORB feature point extraction
The ORB feature point extraction consists of two parts: the FAST corner and the BRIEF descriptor.
When detecting corner points, for each candidate pixel the gray levels of the pixels at positions 1, 5, 9 and 13 on the surrounding circle are checked directly; the candidate pixel can be a corner only if at least three of these four pixels differ sufficiently in gray level from it;
the rotational invariance of the ORB features is realized by the gray centroid method, which proceeds as follows: a small image block B centered on the corner point is selected, and the moment of the image block is defined as:
m_pq = Σ_{(x,y)∈B} x^p · y^q · I(x,y)
where m_pq denotes the moment of order (p, q) of the image block, x^p and y^q are the p-th and q-th powers of the x and y pixel coordinates, and I(x, y) is the gray value of the image at pixel (x, y);
the centroid of the image block is found from the moments:
C = ( m_10 / m_00 , m_01 / m_00 )
where C denotes the centroid of the image block, m_00 the moment of order (0,0), m_01 the moment of order (0,1), and m_10 the moment of order (1,0);
assuming the geometric center of the image block B is O and connecting O and C, the direction of the key point is the vector OC (pointing from O to C), expressed as an angle:
θ = arctan(m_01 / m_10)
after the ORB feature key points are extracted, the other part of the feature point, namely the descriptor of the key point, is calculated;
after the image is converted to grayscale it is divided into 64x64 local pixel blocks, key point detection is then performed on each block, the Harris response values of the candidate key points satisfying the detection conditions are calculated, and finally the two key points with the largest response values in each block are retained as the final key point set;
(2) ORB brute-force matching
for brute-force feature point matching, the descriptors are extracted and then matched against each other;
after the feature matching is completed, the three-dimensional coordinates of the feature points in the camera frame are obtained simply by using the depth values of the feature points and applying one inverse transformation of the camera pinhole imaging model; the transformation is:
x = (u − c_x) · d / f_x,  y = (v − c_y) · d / f_y,  z = d
wherein d represents the depth value of the feature point; u and v denote the pixel coordinates along the x and y axes, c_x and c_y the offsets of the origins of the two coordinate systems, and f_x and f_y the camera focal lengths;
let the coordinates of a point P in space be [x, y, z]^T and its imaging position P′ be [x′, y′, z′]^T, with f the camera focal length; according to the triangle similarity theorem, the following relationship obviously holds:
z / f = x / x′ = y / y′
rearranging slightly yields:
x′ = f · x / z,  y′ = f · y / z
this formula only gives the transformation model from a spatial point to the imaging plane; since the camera outputs discrete pixels, the pixel coordinates of the spatial point P are obtained by one further coordinate transformation:
u = α · x′ + c_x,  v = β · y′ + c_y
where α and β denote the scaling factors of the two coordinate axes and c_x and c_y the offsets of the origins of the two coordinate systems; writing αf and βf as f_x and f_y respectively, the whole imaging process is written as:
u = f_x · x / z + c_x,  v = f_y · y / z + c_y
the camera pinhole imaging model has thus been obtained; writing the pixel coordinates in homogeneous form, the whole projection process is written in matrix form:
z · [u, v, 1]^T = [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ] · [x, y, z]^T = K · P
in the formula, z is the depth of the pixel;
the PnP problem is solved by the direct linear transformation method; assume the homogeneous form of the point's coordinates is P = (x, y, z, 1)^T and the corresponding homogeneous pixel coordinates in the image are (u, v, 1)^T; to solve for the camera pose R and t, the rotation matrix and translation vector are written in matrix form, i.e. the augmented matrix [R | t] is constructed;
expanding the augmented matrix with the imaging projection model of the camera yields the following relations:
s · [u, v, 1]^T = [ t1 t2 t3 t4 ; t5 t6 t7 t8 ; t9 t10 t11 t12 ] · [x, y, z, 1]^T
u = (t1·x + t2·y + t3·z + t4) / (t9·x + t10·y + t11·z + t12)
v = (t5·x + t6·y + t7·z + t8) / (t9·x + t10·y + t11·z + t12)
thus, each pair of 3D-to-2D feature matching points provides two linear constraints on the augmented matrix; define the row vectors of the augmented matrix [R | t]:
T1 = (t1, t2, t3, t4)^T,  T2 = (t5, t6, t7, t8)^T,  T3 = (t9, t10, t11, t12)^T
the formula is then simplified to:
T1^T·P − T3^T·P·u = 0
T2^T·P − T3^T·P·v = 0
assuming a total of N matching point pairs, the system of equations is set forth:
[ P1^T 0 −u1·P1^T ; 0 P1^T −v1·P1^T ; … ; PN^T 0 −uN·PN^T ; 0 PN^T −vN·PN^T ] · [ T1 ; T2 ; T3 ] = 0
the matrix [ R | t ] has 12 dimensions in total, and one feature matching point provides two linear constraints for the matrix [ R | t ], so that the pose is solved as long as the number of the feature matching point pairs is more than 6 pairs;
(3) BA global map optimization and three-dimensional dense point cloud map construction
Visual SLAM back-end optimization and mapping comprise BA-based global map optimization and three-dimensional dense point cloud map construction;
the SLAM process is a process of continuously observing the environment, where each observation is made at a specific time; over a continuous period, the motion of the unmanned aerial vehicle is converted into the motion of the camera at discrete times t = 1, 2, ..., k−1, k, and the camera poses at the corresponding times are denoted x_1, x_2, ..., x_k; connecting these camera poses forms the trajectory of the camera over that period; the environmental feature points observed during this period, i.e. the landmarks, are recorded to complete the construction of the sparse map; assuming there are N observed landmarks, each landmark is denoted m_1, ..., m_N; considering the motion of the camera from time k−1 to time k, i.e. the change of pose of the drone, this is abstracted as a function f:
x_k = f(x_{k−1}, u_k, ω_k)
where x_{k−1} and x_k are the poses of the camera at times k−1 and k, u_k is the reading at time k of a motion measurement sensor carried by the unmanned aerial vehicle, such as an encoder or an IMU, and ω_k is the noise present in the motion measurement; this formula, describing the state change of the unmanned aerial vehicle, is its motion equation;
the process by which the camera observes the environment is abstracted as a function h:
z_{k,j} = h(m_j, x_k, v_{k,j})
where m_j denotes the j-th landmark, v_{k,j} the noise generated in the process, and z_{k,j} the observation the camera makes of the environment at pose x_k; this formula is the observation equation of SLAM;
ignoring the noise first, the observation equation is written as:
z=h(x,y)
wherein x refers to the current pose of the camera, represented by R and t, whose corresponding Lie algebra form is ε; the pixel coordinates of the landmark y are written as p;
The observed error is written as:
e=z-h(ε,p)
accumulating all error terms, we can get the global BA optimized objective function:
(1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij‖² = (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖z_ij − h(ε_i, p_j)‖²
define camera pose and all landmarks as arguments:
x = [ε_1, ε_2, ..., ε_m, p_1, p_2, ..., p_n]^T
adding a slight increment to the argument and performing a taylor expansion, the objective function is written as:
(1/2) · ‖f(x + Δx)‖² ≈ (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij + F_ij·Δε_i + E_ij·Δp_j‖²
F_ij represents the partial derivative of the error term with respect to the camera pose part of the argument, and E_ij the partial derivative with respect to the feature point position:
F_ij = ∂e_ij / ∂ε_i
E_ij = ∂e_ij / ∂p_j
the Jacobian matrix of the overall objective function is then assembled from F and E:
J = [F E]
the incremental equation H matrix is then:
H = J^T·J = [ F^T·F  F^T·E ; E^T·F  E^T·E ]
thus obtaining incremental function representation of global optimization of BA;
constructing the three-dimensional dense point cloud map: four threads are built with the ORB_SLAM2 framework to design a SLAM system that constructs a dense map; the first thread completes the visual odometer, estimating an accurate pose and performing feature point extraction and brute-force feature point matching for each frame of image captured by the camera; the second thread completes the local map construction and is mainly used for positioning; the third thread accelerates the feature matching between the map and the current frame; finally, the fourth thread completes the construction of the three-dimensional dense point cloud map;
(4) improved RRT algorithm in three-dimensional environment
In the initial sampling, reducing the range of sampling points to reduce the memory requirement, and adding connected domain sampling and target deviation bounded sampling; after the initial path is found, continuous and rapid optimization of the path is realized by utilizing elliptical sampling.
The invention has the following beneficial effects:
1. The improved FAST (Features from Accelerated Segment Test) corner extraction method of the invention extracts corners with gray-level changes more accurately. Because feature points tend to cluster during extraction, the feature points are extracted after the image has been partitioned into blocks, so that they are more evenly dispersed, which benefits feature matching.
2. The invention estimates the camera pose with an improved ICP-PnP (Iterative Closest Point – Perspective-n-Point) algorithm, which solves the problem that no effective pose estimate can be given when few feature points are available and makes the camera pose under the 3D-to-2D feature matching relationship more accurate.
3. Through the improved unmanned aerial vehicle path planning algorithm R-RRT, the invention addresses the slow convergence and poor path quality caused by the complete randomness of the conventional RRT, which explores too many directions; the random tree is given directionality and the convergence speed is greatly improved.
Drawings
Fig. 1 is a general principle block diagram of a visual camera.
Fig. 2 is a flow chart of an algorithm for determining pose by a visual odometer.
FIG. 3 is a four thread flow diagram of the creation of a graph.
Fig. 4 is a flow chart of the R-RRT* algorithm for unmanned aerial vehicle path planning.
Fig. 5(a) is a FAST corner extraction effect diagram; Fig. 5(b) is an ORB descriptor brute-force matching effect diagram; Fig. 5(c) is a diagram of the pinhole imaging model underlying the visual odometer.
FIG. 6 is a three-dimensional simulated dense map.
Fig. 7(a) is a schematic diagram of the RRT* random tree; Fig. 7(b) is a simulation diagram of the improved R-RRT* algorithm.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
An autonomous unmanned aerial vehicle navigation method based on ORB_SLAM2 includes, as shown in Fig. 1, the visual SLAM front-end design, the visual SLAM back-end global optimization and mapping, and the unmanned aerial vehicle path planning algorithm design.
(1) Visual SLAM front end design
The visual SLAM front-end design comprises ORB feature point extraction, brute-force matching, and pose estimation based on the ICP algorithm; the overall front-end design flow is shown in Fig. 2. The ORB feature point extraction consists of two parts: the FAST corner and the BRIEF descriptor.
The FAST corner is a rather special pixel: a local pixel block with an obvious gray-level change represents a distinctive feature of the image. The idea of FAST corner detection is as follows: the gray level of a pixel is compared with those of its neighbouring pixels, and if it is markedly brighter or darker than them, it is likely to be a corner. The invention improves this test: when detecting corners, for each candidate pixel the gray levels of the pixels at positions 1, 5, 9 and 13 on the surrounding circle are checked directly, and the candidate can be a corner only if at least three of these four pixels differ sufficiently in gray level; the effect of the FAST corner extraction is shown in Fig. 5(a).
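As a non-limiting illustration of this four-pixel pre-test (the synthetic image, the threshold value t and the function name below are assumptions made for the example, not part of the claimed method), a minimal Python sketch is:

```python
import numpy as np

def fast_pretest(img, r, c, t=20):
    """Quick FAST pre-test at pixel (r, c).

    Only the circle pixels at positions 1, 5, 9 and 13 (offsets of radius 3
    up, right, down and left) are compared with the centre pixel; the centre
    is kept as a corner candidate only if at least three of the four probes
    are brighter than I + t or darker than I - t.
    """
    center = int(img[r, c])
    probes = [img[r - 3, c], img[r, c + 3], img[r + 3, c], img[r, c - 3]]
    brighter = sum(int(p) > center + t for p in probes)
    darker = sum(int(p) < center - t for p in probes)
    return max(brighter, darker) >= 3

img = np.zeros((20, 20), dtype=np.uint8)
img[10, 10] = 200                      # isolated bright pixel
print(fast_pretest(img, 10, 10))       # True: passes the pre-test
print(fast_pretest(img, 5, 5))         # False: flat region
```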
The rotational invariance of the ORB features can be achieved by the gray centroid method, which proceeds as follows: a small image block B centered on the corner point is selected, and the moment of the image block is defined as:
m_pq = Σ_{(x,y)∈B} x^p · y^q · I(x,y)
where m_pq denotes the moment of order (p, q) of the image block, x^p and y^q are the p-th and q-th powers of the x and y pixel coordinates, and I(x, y) is the gray value of the image at pixel (x, y);
the centroid of the image block can be found from the moments:
C = ( m_10 / m_00 , m_01 / m_00 )
where C denotes the centroid of the image block, m_00 the moment of order (0,0), m_01 the moment of order (0,1), and m_10 the moment of order (1,0);
assuming the geometric center of the image block B is O and connecting O and C, the direction of the key point is the vector OC (pointing from O to C), which can be expressed as an angle:
θ = arctan(m_01 / m_10)
After scale invariance and rotation invariance are added to the ORB key points, the robustness of their representation across different images is greatly improved. After the ORB feature key points have been extracted, the other part of the feature point, namely the descriptor of the key point, must be computed. The descriptor is the key to feature matching and guarantees the rotation invariance of the feature points. It is computed as follows: the key point is taken as the origin and a coordinate frame is established along the key point's direction; two pixels p and q are selected at random around the key point and their gray values are compared; if p is greater than q, the value of the current bit is 1, otherwise 0. Repeating this operation 256 times yields the descriptor of the ORB feature point.
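A minimal sketch of such a 256-bit binary descriptor is given below for illustration only; the random sampling pattern, patch size and seed are assumptions, and the steering of the pattern by the key point orientation is omitted:

```python
import numpy as np

def brief_descriptor(img, kp, n_bits=256, patch=31, seed=0):
    """Compute a binary BRIEF-style descriptor at keypoint kp = (row, col).

    Pairs of offsets (p, q) are drawn once from a fixed random pattern;
    bit i is 1 if intensity(p) > intensity(q), else 0.
    """
    rng = np.random.default_rng(seed)
    half = patch // 2
    offsets = rng.integers(-half, half + 1, size=(n_bits, 2, 2))  # (p, q) pairs
    r, c = kp
    bits = np.zeros(n_bits, dtype=np.uint8)
    for i, ((pr, pc), (qr, qc)) in enumerate(offsets):
        bits[i] = img[r + pr, c + pc] > img[r + qr, c + qc]
    return np.packbits(bits)              # 32-byte (256-bit) descriptor

img = np.arange(100 * 100, dtype=np.uint8).reshape(100, 100)   # toy image
desc = brief_descriptor(img, (50, 50))
print(desc.shape)                         # (32,)
```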
Key points of the ORB features tend to appear where pixel changes are obvious and thus to cluster together, because the neighbourhood of a pixel with a marked gray-level change satisfies the FAST corner criterion particularly well; the extracted ORB feature points are therefore highly concentrated, which increases the camera pose error. To solve this clustering problem, the invention partitions the image into regions, as follows: after the image is converted to grayscale it is divided into 64x64 local pixel blocks, key point detection is then performed on each block, the Harris response values of the candidate key points satisfying the detection conditions are calculated, and finally the two key points with the largest response values in each block are retained as the final key point set.
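The following sketch illustrates one possible reading of this block-wise extraction (tiles of 64x64 pixels, two strongest Harris-scored responses kept per tile) using OpenCV; the detector parameters are assumptions made for the example, not the patented implementation:

```python
import cv2
import numpy as np

def blocked_orb_keypoints(gray, block=64, per_block=2):
    """Detect ORB keypoints tile by tile so they do not cluster.

    The grayscale image is split into block x block tiles; ORB keypoints
    (FAST corners scored with the Harris response) are detected in each
    tile and only the per_block strongest ones are kept.
    """
    orb = cv2.ORB_create(nfeatures=50, scoreType=cv2.ORB_HARRIS_SCORE)
    kept = []
    h, w = gray.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            roi = gray[y:y + block, x:x + block]
            kps = sorted(orb.detect(roi, None),
                         key=lambda k: k.response, reverse=True)[:per_block]
            # shift tile coordinates back into full-image coordinates
            kept += [cv2.KeyPoint(k.pt[0] + x, k.pt[1] + y, k.size,
                                  k.angle, k.response, k.octave) for k in kps]
    return kept

gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
print(len(blocked_orb_keypoints(gray)))   # at most 2 keypoints per tile
```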
Brute-force feature point matching, in which descriptors are extracted and then matched, is useful when there are few feature points. The idea of brute-force feature point matching is: given two sets of descriptors P = {p1, ..., pM} and Q = {q1, ..., qN}, for each descriptor in P the descriptor in Q with the minimum distance is found, i.e. a match is computed. Because matches that do not satisfy the detection requirement must still be eliminated, a distance threshold d_max is generally set, i.e. the distance of feature points considered to match should not exceed d_max; the effect of ORB feature point brute-force matching is shown in Fig. 5(b).
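A possible sketch of this brute-force matching with a distance threshold, using OpenCV's Hamming-distance matcher, is shown below; the threshold value d_max = 50 and the random descriptors are assumptions for the example:

```python
import cv2
import numpy as np

def match_orb(desc_p, desc_q, d_max=50.0):
    """Brute-force match descriptor set P against descriptor set Q.

    Every descriptor in P is compared (Hamming distance) with every
    descriptor in Q, the closest one is taken as its match, and matches
    whose distance exceeds d_max are discarded as unreliable.
    """
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.match(desc_p, desc_q)
    return [m for m in matches if m.distance <= d_max]

rng = np.random.default_rng(0)
P = rng.integers(0, 256, (100, 32), dtype=np.uint8)   # 100 ORB descriptors
Q = rng.integers(0, 256, (120, 32), dtype=np.uint8)   # 120 ORB descriptors
print(len(match_orb(P, Q)))   # likely 0 for random data; real images keep more
```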
Pose estimation based on the ICP-PnP algorithm: owing to the complexity of the environment, the depth map captured by the RGB-D camera contains noise, which can leave too few inliers among the feature matches. The invention therefore fuses the PnP algorithm into the front end to improve the pose estimate, and names the fused, improved pose calculation method the ICP-PnP algorithm.
After the feature matching is completed, the three-dimensional coordinates of the feature points under the camera coordinates can be obtained by only using the depth values of the feature points and carrying out inverse transformation on the feature points once by utilizing the camera pinhole imaging principle, and the transformation is as follows:
x = (u − c_x) · d / f_x,  y = (v − c_y) · d / f_y,  z = d
where d denotes the depth value of the feature point, x, y and z the coordinates along the three camera axes, u and v the pixel coordinates, c_x and c_y the offsets of the origins of the two coordinate systems, and f_x and f_y the camera focal lengths.
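For illustration, a minimal sketch of this back-projection (with made-up intrinsic parameters, not calibrated values from the invention) and a round-trip check against the forward projection is:

```python
import numpy as np

def back_project(u, v, d, fx, fy, cx, cy):
    """Recover the 3D camera-frame point from pixel (u, v) and depth d."""
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.array([x, y, d])

# illustrative intrinsics (not values from the patent)
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
P = back_project(400.0, 300.0, 2.0, fx, fy, cx, cy)
print(P)                           # 3D point in the camera frame

# forward pinhole projection should give back the original pixel
u = fx * P[0] / P[2] + cx
v = fy * P[1] / P[2] + cy
print(u, v)                        # ~400, ~300
```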
Let the coordinates of a point P in space be [x, y, z]^T and its imaging position P′ be [x′, y′, z′]^T, with f the camera focal length. According to the triangle similarity theorem, the following relationship is obvious:
z / f = x / x′ = y / y′
Rearranging slightly yields:
x′ = f · x / z,  y′ = f · y / z
This formula only gives the transformation model from a spatial point to the imaging plane; since the camera outputs discrete pixels, the pixel coordinates of the spatial point P are obtained by one further coordinate transformation:
u = α · x′ + c_x,  v = β · y′ + c_y
where α and β denote the scaling factors of the two coordinate axes and c_x and c_y the offsets of the origins of the two coordinate systems. Writing αf and βf as f_x and f_y respectively, the whole imaging process can be written as:
u = f_x · x / z + c_x,  v = f_y · y / z + c_y
We have thus obtained the camera pinhole imaging model; writing the pixel coordinates in homogeneous form, the whole projection process is written in matrix form:
z · [u, v, 1]^T = [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ] · [x, y, z]^T = K · P
in the formula, z is the depth of the pixel.
The present invention solves the PnP problem using the Direct Linear Transformation (DLT) method. Assume the homogeneous form of the point's coordinates is P = (x, y, z, 1)^T and the corresponding homogeneous pixel coordinates in the image are (u, v, 1)^T. To solve for the camera pose R and t, the rotation matrix and translation vector are written in matrix form, i.e. the augmented matrix [R | t] is constructed.
Using the imaging projection model of the camera, as shown in Fig. 5(c), the augmented matrix can be expanded to obtain the following relations:
s · [u, v, 1]^T = [ t1 t2 t3 t4 ; t5 t6 t7 t8 ; t9 t10 t11 t12 ] · [x, y, z, 1]^T
u = (t1·x + t2·y + t3·z + t4) / (t9·x + t10·y + t11·z + t12)
v = (t5·x + t6·y + t7·z + t8) / (t9·x + t10·y + t11·z + t12)
thus, each pair of 3D to 2D feature matching points may provide two linear constraints for the augmented matrix. Define the row vector of the augmented matrix [ R | t ]:
T1 = (t1, t2, t3, t4)^T,  T2 = (t5, t6, t7, t8)^T,  T3 = (t9, t10, t11, t12)^T
the equation can then be simplified as:
T1^T·P − T3^T·P·u = 0
T2^T·P − T3^T·P·v = 0
assuming a total of N matching point pairs, the system of equations can be set forth:
[ P1^T 0 −u1·P1^T ; 0 P1^T −v1·P1^T ; … ; PN^T 0 −uN·PN^T ; 0 PN^T −vN·PN^T ] · [ T1 ; T2 ; T3 ] = 0
the matrix [ R | t ] has 12 dimensions in total, and one feature matching point can provide two linear constraints for the matrix [ R | t ], so that the pose can be solved as long as the number of the feature matching point pairs is more than 6 pairs. (2) Visual SLAM back-end optimization and mapping
(2) Visual SLAM back-end optimization and mapping
The visual SLAM back-end optimization and mapping comprise BA (bundle adjustment)-based global map optimization and three-dimensional dense point cloud map construction; the overall process is shown in Fig. 3.
The SLAM process is a process of continuously observing the environment, where each observation is made at a specific time. Over a continuous period, the motion of the unmanned aerial vehicle is converted into the motion of the camera at discrete times t = 1, 2, ..., k, and the camera poses at the corresponding times are denoted x_1, x_2, ..., x_k; connecting these camera poses forms the trajectory of the camera over that period. The environmental feature points observed during this period, i.e. the landmarks, are recorded to complete the construction of the sparse map; assuming there are N observed landmarks, each landmark is denoted m_1, ..., m_N. Considering the motion of the camera from time k−1 to time k, i.e. the change of pose of the drone, this is abstracted as a function f:
x_k = f(x_{k−1}, u_k, ω_k)
where x_{k−1} and x_k are the poses of the camera at times k−1 and k, u_k is the reading at time k of a motion measurement sensor carried by the unmanned aerial vehicle, such as an encoder or an IMU, and ω_k is the noise present when measuring the motion of the drone. This formula, describing the state change of the drone, is its equation of motion.
The process of the camera observation environment is abstracted into a function h:
z_{k,j} = h(m_j, x_k, v_{k,j})
where m_j denotes the j-th landmark, v_{k,j} the noise generated in the process, and z_{k,j} the observation the camera makes of the environment at pose x_k; this formula is the observation equation of SLAM.
BA plays a crucial role in the back-end global optimization. Ignoring the noise first, the observation equation can be written as:
z=h(x,y)
wherein x is the current pose of the camera, which can be represented by R and t, and its corresponding Lie algebra form is ε; the pixel coordinates of the landmark y can be written as p.
The observed error can be written as:
e=z-h(ε,p)
accumulating all error terms, we can get the global BA optimized objective function:
(1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij‖² = (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖z_ij − h(ε_i, p_j)‖²
define camera pose and all landmarks as arguments:
x = [ε_1, ε_2, ..., ε_m, p_1, p_2, ..., p_n]^T
by adding a slight increment to the argument and performing a taylor expansion, the objective function can be written as:
(1/2) · ‖f(x + Δx)‖² ≈ (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij + F_ij·Δε_i + E_ij·Δp_j‖²
F_ij represents the partial derivative of the error term with respect to the camera pose part of the argument, and E_ij the partial derivative with respect to the feature point position:
F_ij = ∂e_ij / ∂ε_i
E_ij = ∂e_ij / ∂p_j
The Jacobian matrix of the overall objective function is then assembled from F and E:
J = [F E]
the incremental equation H matrix is then:
H = J^T·J = [ F^T·F  F^T·E ; E^T·F  E^T·E ]
thus, the incremental function representation of global optimization of BA is obtained.
The three-dimensional dense point cloud map is constructed by building four threads with the ORB_SLAM2 framework, giving a SLAM system that builds a dense map. The first thread completes the visual odometer, estimating an accurate pose and performing feature point extraction and brute-force feature point matching for each frame of image captured by the camera. The second thread completes the local map construction and is mainly used for positioning. The third thread accelerates the feature matching between the map and the current frame. Finally, the fourth thread completes the construction of the three-dimensional dense point cloud map; the map built by these four threads is shown in Fig. 6.
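A highly simplified, non-limiting skeleton of this four-thread division of labour is sketched below; the queue names and the placeholder processing inside each thread are assumptions made for the example and do not reproduce the actual ORB_SLAM2 classes:

```python
import queue
import threading

def tracking(frames, kf_queue):
    """Thread 1: visual odometry - extract/match features, estimate pose,
    decide which frames become keyframes (placeholder: every frame)."""
    for frame in frames:
        kf_queue.put(frame)
    kf_queue.put(None)                 # sentinel: no more frames

def local_mapping(kf_queue, loop_queue, dense_queue):
    """Thread 2: insert keyframes into the local map and refine it."""
    while (kf := kf_queue.get()) is not None:
        loop_queue.put(kf)
        dense_queue.put(kf)
    loop_queue.put(None)
    dense_queue.put(None)

def loop_closing(loop_queue):
    """Thread 3: match keyframes against the map to detect loop closures."""
    while loop_queue.get() is not None:
        pass                           # placeholder: bag-of-words query would go here

def dense_mapping(dense_queue, cloud):
    """Thread 4: back-project RGB-D keyframes into a dense point cloud."""
    while (kf := dense_queue.get()) is not None:
        cloud.append(kf)               # placeholder: fuse per-pixel 3D points

frames = [f"frame-{i}" for i in range(5)]
kf_q, loop_q, dense_q, cloud = queue.Queue(), queue.Queue(), queue.Queue(), []
threads = [threading.Thread(target=tracking, args=(frames, kf_q)),
           threading.Thread(target=local_mapping, args=(kf_q, loop_q, dense_q)),
           threading.Thread(target=loop_closing, args=(loop_q,)),
           threading.Thread(target=dense_mapping, args=(dense_q, cloud))]
for t in threads: t.start()
for t in threads: t.join()
print(len(cloud))                      # 5 keyframes reached the dense-mapping thread
```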
Improved RRT algorithm in three-dimensional environment
The main idea of path planning with the RRT algorithm is: the task starting point is taken as the root node of a random tree in the map, random points are then obtained by random sampling in the planning space, and the tree node closest to a random point grows towards it. A schematic diagram of RRT is shown in Fig. 7(a), and the planning process of the RRT algorithm can be divided into five steps:
In the first step, the task starting point is taken as the root node q_init of the random tree. In the second step, a random point z_rand is obtained by random sampling in the planning space, and the tree node z_nearest closest to the random point is found in the random tree. In the third step, starting from the tree node z_nearest and moving along the direction z_nearest→z_rand by the random tree step length, the candidate tree node z_new closest to z_nearest is obtained. In the fourth step, z_nearest and z_new are connected to obtain a line segment, which is tested for collisions; if no obstacle lies on the segment, z_new becomes a real tree node, otherwise a new candidate tree node is sought. In the fifth step, tree nodes are added continuously until the tree grows to the target point or its vicinity. However, the RRT algorithm has the following disadvantages:
(1) in the initial iteration, the beneficial samples near the target area are ignored.
(2) The sampling is explored throughout the configuration space, so the sampling space is too large.
(3) As the random tree grows, it accumulates more and more nodes, and a large amount of time is spent exploring the space beyond the target point, which increases the memory requirement.
(4) The random tree converges slowly.
To solve the above problems, the invention provides a new offline algorithm, the reset-boundary rapidly-exploring random tree (R-RRT); the flow chart of the unmanned aerial vehicle path planning R-RRT algorithm is shown in Fig. 4. In the initial sampling, the range of sampling points is reduced to lower the memory requirement, and connected-domain sampling and target-deviation bounded sampling are added, so that the algorithm converges quickly to an optimal solution. These two methods allow valuable samples near the target area to be acquired rapidly, so that the optimal path can be obtained effectively in a relatively short time; after the initial path is found, continuous and rapid optimization of the path is achieved by elliptical sampling.
[Algorithm listing: pseudocode of the R-RRT* path planning procedure, presented as images in the original publication.]
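Since the listing itself is only available as an image, the following Python sketch gives one possible rendering of the main loop with bounded, target-biased sampling and RRT*-style ChooseParent/Rewire steps; obstacle checking, connected-domain growth and the elliptical refinement stage are simplified or omitted, and all names and parameter values are assumptions:

```python
import math
import random

def rrt_star_sketch(z_init, z_goal, bounds, iters=2000, step=0.5,
                    goal_bias=0.1, radius=1.0, goal_tol=0.5, seed=0):
    """Simplified R-RRT*-style planner in an obstacle-free 2D workspace.

    Sampling is restricted to `bounds` (the bounded region C_Reg) and with
    probability goal_bias the goal itself is sampled (target deviation).
    ChooseParent / Rewire follow the usual RRT* rules; cost changes are not
    propagated to descendants (omitted for brevity).
    """
    random.seed(seed)
    nodes, parent, cost, best = [z_init], {0: None}, {0: 0.0}, None
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])

    for _ in range(iters):
        z_rand = z_goal if random.random() < goal_bias else \
            (random.uniform(*bounds[0]), random.uniform(*bounds[1]))
        i_near = min(range(len(nodes)), key=lambda i: dist(nodes[i], z_rand))
        d = dist(nodes[i_near], z_rand)
        if d == 0.0:
            continue
        s = min(step, d) / d
        z_new = (nodes[i_near][0] + s * (z_rand[0] - nodes[i_near][0]),
                 nodes[i_near][1] + s * (z_rand[1] - nodes[i_near][1]))
        near = [i for i in range(len(nodes)) if dist(nodes[i], z_new) <= radius]
        # ChooseParent: connect z_new through the cheapest nearby node
        i_min = min(near, key=lambda i: cost[i] + dist(nodes[i], z_new))
        nodes.append(z_new)
        j = len(nodes) - 1
        parent[j], cost[j] = i_min, cost[i_min] + dist(nodes[i_min], z_new)
        # Rewire: let z_new become the parent of nearby nodes if that is cheaper
        for i in near:
            c = cost[j] + dist(z_new, nodes[i])
            if c < cost[i]:
                parent[i], cost[i] = j, c
        if dist(z_new, z_goal) <= goal_tol:
            c = cost[j] + dist(z_new, z_goal)
            best = c if best is None or c < best else best
    return best

c_best = rrt_star_sketch((0.0, 0.0), (8.0, 8.0), bounds=((0, 10), (0, 10)))
print(c_best)      # length of the best path found to the goal region
```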
The main operation of the process is explained as follows:
GoalFirstNotFound: this function determines whether the first path planning has been completed; once the connected-domain sampling is complete, the elliptical sampling area is planned.
ConnectivityReg: this function uses D_scale to identify the connected region, denoted C_Reg, which covers the search space between Z_init and Z_goal.
BoundedSample: this function randomly selects a point Z_rand ∈ C_Reg ⊂ Z_free.
Nearest: the function Nearest(T, Z_rand) returns the point of the tree T = (V, E) closest to Z_rand.
Steer: the steering function Steer(Z_nearest, Z_rand) provides a control input U_new for the drive system, moving from Z(0) = Z_rand to Z(1) = Z_nearest along the path z[0, 1], and returns the point Z_new at a fixed distance Δq.
CollisionCheck: this function determines whether the path Z(t) ∈ Z_free is feasible in Z from t = 0 to t = 1.
Near: the function Near(T, Z_new, V) returns the neighbouring points within a given range.
ChooseParent: this function selects from the nearby points the least costly tree node Z_min.
InsertPoint: this function adds the point Z_new to V of the tree T = (V, E) and connects it to the point Z_min as its tree node; the distance assigned to Z_new equals the distance of its tree node plus the Euclidean distance between Z_new and Z_min.
Rewire: the function Rewire(T, Z_near, Z_min, Z_new) inspects the connections from Z_new to the points in Z_near and, if going through Z_new lowers the cost of a particular point, redefines Z_new as that point's parent tree node.
CompleteScan: this function returns true if a complete scan of C_Reg has been finished; if the complete scan finds no path, the algorithm must grow C_Reg.
MinAxis: this function defines the length of the major axis of the ellipse; after sampling, the ellipse iteratively changes the length of its major axis to optimize the path.
EllipseSample: this function defines and samples the elliptical area.
The algorithm takes Z_init as the root node and then uses an intelligently bounded sampling random tree, selecting points at random within a limited area called C_Reg.
C_Reg limits the search space in which new random points are sampled and is set according to the size of the search map; that is, the extended distance D_scale is used to define the search space between Z_init and Z_goal.
D_scale = E/m, where E is the size of the environment map and m is the spreading factor. By default m = 8, which limits the size of the environment-map area over which C_Reg is generated; m may be increased to obtain a narrower or wider area. The target deviation introduced by the intelligent bounded sampling heuristically selects points at random within the C_Reg range.
If bounded sampling completes a full scan of C_Reg without finding a path, D_scale is gradually increased and C_Reg is enlarged until a path is found.
Once an initial path has been found within C_Reg, the function C_best returns the cost of that path, defined by the Euclidean distance:
C_best = Σ_{i=1}^{n−1} ‖z_{i+1} − z_i‖₂
where z_1, ..., z_n are the nodes of the found path.
After the first path has been found through connected-domain sampling, the sampling area continues to be set and elliptical sampling is carried out. An elliptical subset of states is added to the connected region and sampled uniformly, so that the planned region shrinks towards the theoretical minimum as the path solution improves.
The elliptical sampling space is defined, once an optimal initial path has been obtained from the connected-domain sampling, as:
Z_f = { z ∈ Z_free | ‖z − Z_init‖₂ + ‖Z_goal − z‖₂ ≤ C_best }
c denotes the cost of a path connecting Z_init to Z_goal in the given free space; for a state z, the cost of the best path constrained to pass through z is estimated with the Euclidean heuristic as:
f(z) = ‖z − Z_init‖₂ + ‖Z_goal − z‖₂
f(z) estimates the cost of going from Z_init to Z_goal through z, so the solution subset Z_f ⊆ Z can be refined as the current solution C_best improves:
Z_f = { z ∈ Z | f(z) < C_best }
The heuristic sampling domain Z_f is an ellipse with the starting point Z_init and the target point Z_goal at its foci, used for the minimum-path problem. The shape of the ellipse depends on the initial and target points, the theoretical minimum distance C_min between them, and the cost C_best of the best solution found so far.
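The membership test implied by this elliptical sampling domain can be sketched as follows (Euclidean cost assumed; the numeric values are only for illustration):

```python
import math

def in_informed_set(z, z_init, z_goal, c_best):
    """Accept a sample z only if a path through z could beat c_best.

    With Euclidean cost the set {z : |z - z_init| + |z_goal - z| <= c_best}
    is exactly the ellipse with foci z_init, z_goal and major axis c_best.
    """
    d = math.dist
    return d(z, z_init) + d(z, z_goal) <= c_best

z_init, z_goal, c_best = (0.0, 0.0), (8.0, 8.0), 13.0
print(in_informed_set((4.0, 4.0), z_init, z_goal, c_best))   # True: on the axis
print(in_informed_set((0.0, 9.0), z_init, z_goal, c_best))   # False: outside
```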
The unmanned aerial vehicle path planning algorithm R-RRT adds heuristic planning on the basis of the RRT algorithm; the specific effect is shown in Fig. 7(b).

Claims (1)

1. An unmanned aerial vehicle autonomous navigation method based on ORB_SLAM2, characterized by comprising the following steps:
(1) ORB feature point extraction
The ORB feature point extraction consists of two parts: the FAST corner and the BRIEF descriptor.
When detecting corner points, for each candidate pixel the gray levels of the pixels at positions 1, 5, 9 and 13 on the surrounding circle are checked directly; the candidate pixel can be a corner only if at least three of these four pixels differ sufficiently in gray level from it;
the rotational invariance of the ORB features is realized by the gray centroid method, which proceeds as follows: a small image block B centered on the corner point is selected, and the moment of the image block is defined as:
m_pq = Σ_{(x,y)∈B} x^p · y^q · I(x,y)
where m_pq denotes the moment of order (p, q) of the image block, x^p and y^q are the p-th and q-th powers of the x and y pixel coordinates, and I(x, y) is the gray value of the image at pixel (x, y);
the centroid of the image block is found from the moments:
C = ( m_10 / m_00 , m_01 / m_00 )
where C denotes the centroid of the image block, m_00 the moment of order (0,0), m_01 the moment of order (0,1), and m_10 the moment of order (1,0);
assuming the geometric center of the image block B is O and connecting O and C, the direction of the key point is the vector OC (pointing from O to C), expressed as an angle:
θ = arctan(m_01 / m_10)
after the ORB feature key points are extracted, the other part of the feature point, namely the descriptor of the key point, is calculated;
after the image is converted to grayscale it is divided into 64x64 local pixel blocks, key point detection is then performed on each block, the Harris response values of the candidate key points satisfying the detection conditions are calculated, and finally the two key points with the largest response values in each block are retained as the final key point set;
(2) ORB brute-force matching
for brute-force feature point matching, the descriptors are extracted and then matched against each other;
after the feature matching is completed, the three-dimensional coordinates of the feature points in the camera frame are obtained simply by using the depth values of the feature points and applying one inverse transformation of the camera pinhole imaging model; the transformation is:
x = (u − c_x) · d / f_x,  y = (v − c_y) · d / f_y,  z = d
wherein d represents the depth value of the feature point; u and v denote the pixel coordinates along the x and y axes, c_x and c_y the offsets of the origins of the two coordinate systems, and f_x and f_y the camera focal lengths;
let the coordinates of a point P in space be [x, y, z]^T and its imaging position P′ be [x′, y′, z′]^T, with f the camera focal length; according to the triangle similarity theorem, the following relationship obviously holds:
z / f = x / x′ = y / y′
rearranging slightly yields:
x′ = f · x / z,  y′ = f · y / z
this formula only gives the transformation model from a spatial point to the imaging plane; since the camera outputs discrete pixels, the pixel coordinates of the spatial point P are obtained by one further coordinate transformation:
u = α · x′ + c_x,  v = β · y′ + c_y
where α and β denote the scaling factors of the two coordinate axes and c_x and c_y the offsets of the origins of the two coordinate systems; writing αf and βf as f_x and f_y respectively, the whole imaging process is written as:
u = f_x · x / z + c_x,  v = f_y · y / z + c_y
the camera pinhole imaging model has thus been obtained; writing the pixel coordinates in homogeneous form, the whole projection process is written in matrix form:
z · [u, v, 1]^T = [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ] · [x, y, z]^T = K · P
in the formula, z is the depth of the pixel;
the PnP problem is solved by the direct linear transformation method; assume the homogeneous form of the point's coordinates is P = (x, y, z, 1)^T and the corresponding homogeneous pixel coordinates in the image are (u, v, 1)^T; to solve for the camera pose R and t, the rotation matrix and translation vector are written in matrix form, i.e. the augmented matrix [R | t] is constructed;
expanding the augmented matrix with the imaging projection model of the camera yields the following relations:
s · [u, v, 1]^T = [ t1 t2 t3 t4 ; t5 t6 t7 t8 ; t9 t10 t11 t12 ] · [x, y, z, 1]^T
u = (t1·x + t2·y + t3·z + t4) / (t9·x + t10·y + t11·z + t12)
v = (t5·x + t6·y + t7·z + t8) / (t9·x + t10·y + t11·z + t12)
thus, each pair of 3D-to-2D feature matching points provides two linear constraints on the augmented matrix; define the row vectors of the augmented matrix [R | t]:
T1 = (t1, t2, t3, t4)^T,  T2 = (t5, t6, t7, t8)^T,  T3 = (t9, t10, t11, t12)^T
the formula is then simplified to:
T1^T·P − T3^T·P·u = 0
T2^T·P − T3^T·P·v = 0
assuming a total of N matching point pairs, the system of equations is set forth:
[ P1^T 0 −u1·P1^T ; 0 P1^T −v1·P1^T ; … ; PN^T 0 −uN·PN^T ; 0 PN^T −vN·PN^T ] · [ T1 ; T2 ; T3 ] = 0
the matrix [ R | t ] has 12 dimensions in total, and one feature matching point provides two linear constraints for the matrix [ R | t ], so that the pose is solved as long as the number of the feature matching point pairs is more than 6 pairs;
(3) BA global map optimization and three-dimensional dense point cloud map construction
Visual SLAM back-end optimization and mapping comprise BA-based global map optimization and three-dimensional dense point cloud map construction;
the SLAM process is a process of continuously observing the environment, where each observation is made at a specific time; over a continuous period, the motion of the unmanned aerial vehicle is converted into the motion of the camera at discrete times t = 1, 2, ..., k−1, k, and the camera poses at the corresponding times are denoted x_1, x_2, ..., x_k; connecting these camera poses forms the trajectory of the camera over that period; the environmental feature points observed during this period, i.e. the landmarks, are recorded to complete the construction of the sparse map; assuming there are N observed landmarks, each landmark is denoted m_1, ..., m_N; considering the motion of the camera from time k−1 to time k, i.e. the change of pose of the drone, this is abstracted as a function f:
x_k = f(x_{k−1}, u_k, ω_k)
where x_{k−1} and x_k are the poses of the camera at times k−1 and k, u_k is the reading at time k of a motion measurement sensor carried by the unmanned aerial vehicle, such as an encoder or an IMU, and ω_k is the noise present in the motion measurement; this formula, describing the state change of the unmanned aerial vehicle, is its motion equation;
the process by which the camera observes the environment is abstracted as a function h:
z_{k,j} = h(m_j, x_k, v_{k,j})
where m_j denotes the j-th landmark, v_{k,j} the noise generated in the process, and z_{k,j} the observation the camera makes of the environment at pose x_k; this formula is the observation equation of SLAM;
ignoring the noise first, the observation equation is written as:
z=h(x,y)
wherein x refers to the current pose of the camera, represented by R and t, whose corresponding Lie algebra form is ε; the pixel coordinates of the landmark y are written as p;
The observed error is written as:
e=z-h(ε,p)
accumulating all error terms, we can get the global BA optimized objective function:
(1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij‖² = (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖z_ij − h(ε_i, p_j)‖²
define camera pose and all landmarks as arguments:
x = [ε_1, ε_2, ..., ε_m, p_1, p_2, ..., p_n]^T
adding a slight increment to the argument and performing a taylor expansion, the objective function is written as:
(1/2) · ‖f(x + Δx)‖² ≈ (1/2) · Σ_{i=1}^{m} Σ_{j=1}^{n} ‖e_ij + F_ij·Δε_i + E_ij·Δp_j‖²
F_ij represents the partial derivative of the error term with respect to the camera pose part of the argument, and E_ij the partial derivative with respect to the feature point position:
F_ij = ∂e_ij / ∂ε_i
E_ij = ∂e_ij / ∂p_j
the Jacobian matrix of the overall objective function is then assembled from F and E:
J = [F E]
the incremental equation H matrix is then:
H = J^T·J = [ F^T·F  F^T·E ; E^T·F  E^T·E ]
thus obtaining incremental function representation of global optimization of BA;
constructing the three-dimensional dense point cloud map: four threads are built with the ORB_SLAM2 framework to design a SLAM system that constructs a dense map; the first thread completes the visual odometer, estimating an accurate pose and performing feature point extraction and brute-force feature point matching for each frame of image captured by the camera; the second thread completes the local map construction and is mainly used for positioning; the third thread accelerates the feature matching between the map and the current frame; finally, the fourth thread completes the construction of the three-dimensional dense point cloud map;
(4) improved RRT algorithm in three-dimensional environment
In the initial sampling, reducing the range of sampling points to reduce the memory requirement, and adding connected domain sampling and target deviation bounded sampling; after the initial path is found, continuous and rapid optimization of the path is realized by utilizing elliptical sampling.
CN202111075423.6A 2021-09-14 2021-09-14 Unmanned aerial vehicle autonomous navigation method based on ORB _ SLAM2 Pending CN113808152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111075423.6A CN113808152A (en) 2021-09-14 2021-09-14 Unmanned aerial vehicle autonomous navigation method based on ORB _ SLAM2

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111075423.6A CN113808152A (en) 2021-09-14 2021-09-14 Unmanned aerial vehicle autonomous navigation method based on ORB _ SLAM2

Publications (1)

Publication Number Publication Date
CN113808152A true CN113808152A (en) 2021-12-17

Family

ID=78895270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111075423.6A Pending CN113808152A (en) 2021-09-14 2021-09-14 Unmanned aerial vehicle autonomous navigation method based on ORB _ SLAM2

Country Status (1)

Country Link
CN (1) CN113808152A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115451920A (en) * 2022-10-27 2022-12-09 南京航空航天大学 Relative pose measurement method for unmanned autonomous landing
CN117274620A (en) * 2023-11-23 2023-12-22 东华理工大学南昌校区 Visual SLAM method based on self-adaptive uniform division feature point extraction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766758A (en) * 2018-12-12 2019-05-17 北京计算机技术及应用研究所 A kind of vision SLAM method based on ORB feature
CN111667506A (en) * 2020-05-14 2020-09-15 电子科技大学 Motion estimation method based on ORB feature points

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766758A (en) * 2018-12-12 2019-05-17 北京计算机技术及应用研究所 A kind of vision SLAM method based on ORB feature
CN111667506A (en) * 2020-05-14 2020-09-15 电子科技大学 Motion estimation method based on ORB feature points

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wang Hai (王海): "UAV Path Planning Based on Visual SLAM Mapping", China Master's Theses Full-text Database (Engineering Science and Technology II), no. 2012, pages 031-189 *
Pei Yijian et al. (裴以建 等): "Path Planning Algorithm for Mobile Robots Based on Improved RRT*", Computer Engineering, vol. 45, no. 5, pages 285-297 *
Zheng Kaiyuan (郑凯元): "Research on Key Technologies of Indoor Service Robots", China Master's Theses Full-text Database (Information Science and Technology), no. 2020, pages 140-115 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115451920A (en) * 2022-10-27 2022-12-09 南京航空航天大学 Relative pose measurement method for unmanned autonomous landing
CN115451920B (en) * 2022-10-27 2023-03-14 南京航空航天大学 Relative pose measurement method for unmanned autonomous landing
CN117274620A (en) * 2023-11-23 2023-12-22 东华理工大学南昌校区 Visual SLAM method based on self-adaptive uniform division feature point extraction
CN117274620B (en) * 2023-11-23 2024-02-06 东华理工大学南昌校区 Visual SLAM method based on self-adaptive uniform division feature point extraction

Similar Documents

Publication Publication Date Title
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
Rozenberszki et al. LOL: Lidar-only Odometry and Localization in 3D point cloud maps
Huang Review on LiDAR-based SLAM techniques
Yang et al. Multi-camera visual SLAM for off-road navigation
Mu et al. Research on SLAM algorithm of mobile robot based on the fusion of 2D LiDAR and depth camera
CN110726406A (en) Improved nonlinear optimization monocular inertial navigation SLAM method
Lv et al. ORB-SLAM-based tracing and 3D reconstruction for robot using Kinect 2.0
Sujiwo et al. Monocular vision-based localization using ORB-SLAM with LIDAR-aided mapping in real-world robot challenge
CN114419147A (en) Rescue robot intelligent remote human-computer interaction control method and system
CN113808152A (en) Unmanned aerial vehicle autonomous navigation method based on ORB _ SLAM2
CN113674412B (en) Pose fusion optimization-based indoor map construction method, system and storage medium
Li et al. Automatic targetless LiDAR–camera calibration: a survey
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
CN111998862A (en) Dense binocular SLAM method based on BNN
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
Shao et al. Monocular object slam using quadrics and landmark reference map for outdoor UAV applications
Wei et al. Overview of visual slam for mobile robots
Zhang et al. A Robust Lidar SLAM System Based on Multi-Sensor Fusion
CN113379915B (en) Driving scene construction method based on point cloud fusion
Li et al. An SLAM Algorithm Based on Laser Radar and Vision Fusion with Loop Detection Optimization
Liu et al. Laser 3D tightly coupled mapping method based on visual information
Li et al. BA-LIOM: tightly coupled laser-inertial odometry and mapping with bundle adjustment
Zhang et al. Object depth measurement from monocular images based on feature segments
Liu et al. 6-DOF motion estimation using optical flow based on dual cameras
Chen et al. SLAM system based on tightly coupled visual-inertial

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination