CN111998862B - BNN-based dense binocular SLAM method - Google Patents

BNN-based dense binocular SLAM method

Info

Publication number
CN111998862B
Authority
CN
China
Prior art keywords
map
binocular
depth
camera
bnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010627927.3A
Other languages
Chinese (zh)
Other versions
CN111998862A (en)
Inventor
陈刚
吴志慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010627927.3A priority Critical patent/CN111998862B/en
Publication of CN111998862A publication Critical patent/CN111998862A/en
Application granted granted Critical
Publication of CN111998862B publication Critical patent/CN111998862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30Map- or contour-matching
    • G01C21/32Structuring or formatting of map data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a BNN-based dense binocular SLAM method comprising the following steps: step one, extracting ORB features from the rectified left image; step two, acquiring the depth map of the left image; step three, obtaining the depth of the ORB feature points of the left image; step four, estimating the current camera motion pose; step five, generating key frames and performing local mapping with the key frames; step six, building a dense map; step seven, loop detection and loop correction. The invention uses a BNN to perform binocular matching, which improves matching accuracy and matching speed and thereby improves localization accuracy and mapping speed. This not only overcomes the inability of feature-based visual SLAM to build a dense map, but also achieves real-time dense mapping, making feature-based visual SLAM practical to deploy.

Description

BNN-based dense binocular SLAM method
Technical Field
The invention relates to the field of simultaneous localization and mapping for robots, and in particular to a BNN-based dense binocular SLAM method.
Background
Simultaneous localization and mapping (SLAM) refers to a mobile robot estimating its own motion in an unknown environment by means of onboard sensors while building a map of that environment.
Visual SLAM can be divided into two categories: feature-based methods and direct methods. Feature-based visual SLAM extracts image feature points and can only create a sparse map. It is robust to illumination changes and aggressive motion, but performs poorly in texture-less scenes, and the sparse map it builds cannot be used further for tasks such as obstacle avoidance and motion planning.
Binocular feature-based visual SLAM usually involves a binocular matching step. Traditional binocular matching methods are computationally expensive and not very accurate. The patent application CN201811416964.9 discloses a robot localization system and method based on binocular vision and a convolutional neural network, in which the convolutional neural network improves binocular matching accuracy so that more accurate data are obtained for building a dense map. However, the convolutional neural network still makes binocular matching computationally heavy, real-time performance is poor, a dense map cannot be built in real time, and the system is difficult to apply in practice.
Disclosure of Invention
The invention provides a BNN-based dense binocular SLAM method that addresses the low binocular matching accuracy of prior binocular feature-based visual SLAM methods and their difficulty in building a dense map in real time; it achieves high-precision localization, reduces the amount of computation, and builds a dense map in real time.
To solve the above technical problems, the invention adopts the following technical scheme: a BNN-based dense binocular SLAM method that employs a system comprising a BNN (binary neural network) module, an ORB-SLAM2 framework and a dense mapping module. The BNN module receives the left and right images of a binocular camera and computes, in real time, the disparity map corresponding to the left image through a trained binary neural network. ORB-SLAM2 is an open-source feature-based SLAM method supporting monocular, binocular and RGB-D cameras; fusing the BNN module with the ORB-SLAM2 framework provides high-precision localization. The dense mapping module builds a dense three-dimensional environment map in real time from the images input by the binocular camera, the disparity map obtained by the BNN module and the localization result obtained by the ORB-SLAM2 framework.
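For illustration only, the division of work between the three components can be sketched as follows in Python; the class and method names are illustrative and not part of the invention, and the actual system is built on the ORB-SLAM2 code base in C++.

```python
# Hypothetical sketch of the per-frame data flow between the three modules.
class DenseBinocularSLAM:
    def __init__(self, bnn_module, orb_slam2, dense_mapper):
        self.bnn = bnn_module       # binary neural network stereo matcher (GPU)
        self.tracker = orb_slam2    # ORB-SLAM2 tracking / local mapping / loop closing
        self.mapper = dense_mapper  # incremental dense point-cloud mapping

    def process(self, left_img, right_img):
        disparity = self.bnn.match(left_img, right_img)            # real-time disparity of the left image
        pose, keyframe = self.tracker.track(left_img, disparity)   # high-precision pose estimate
        if keyframe is not None:                                   # the dense map grows only on key frames
            self.mapper.insert(keyframe, disparity, pose)
        return pose
```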
The BNN module comprises a feature extraction unit, a matching cost calculation unit and a disparity map optimization unit;
the feature extraction unit extracts, through a twin (Siamese) binary network, a binary feature vector b_i^L (respectively b_i^R) for each image block of the left (respectively right) image, together with a weight vector w_i for each image block feature, yielding a weighted binary descriptor; the weights are learned by the network;
the matching cost calculation unit measures the similarity between image blocks by computing the weighted Hamming distance of the weighted binary descriptors:

C(b_i^L, b_i^R) = Σ_j w̃_j · H( sign(b_{i,j}^L), sign(b_{i,j}^R) )

where w̃ denotes the weight vector learned by the network, w̃_j denotes its j-th element, and the H() function denotes the Hamming distance.
The disparity map optimization unit refines the matching cost with semi-global matching (SGM) to obtain the final disparity map.
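For illustration only, the weighted-Hamming matching cost over a disparity search range can be sketched as follows with NumPy; the feature tensors, the bit length and the learned weight vector are placeholders, and the subsequent SGM refinement is only indicated in a comment.

```python
import numpy as np

def weighted_hamming_cost_volume(feat_left, feat_right, weights, max_disp):
    """feat_left, feat_right: (H, W, n) binary features in {0, 1} from the twin binary network.
    weights: (n,) learned weight vector w~.  Returns a cost volume of shape (H, W, max_disp)."""
    H, W, n = feat_left.shape
    cost = np.full((H, W, max_disp), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # XOR of binary descriptors gives the per-bit Hamming distance,
        # which is weighted by w~ and summed over the bits.
        xor = feat_left[:, d:, :] != feat_right[:, :W - d, :]
        cost[:, d:, d] = (xor * weights).sum(axis=-1)
    return cost  # the cost volume would then be refined by SGM to obtain the disparity map
```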
The ORB-SLAM2 framework comprises a camera motion tracking module, a local mapping module and a loop detection module;
the camera motion tracking module is used to estimate the motion pose of the camera;
the local mapping module is used to build a sparse environment map;
the loop detection module is used to detect loops, i.e. to determine whether the camera has returned to a previously visited scene, and, once a loop is detected, to perform loop correction, correcting the poses of the relevant cameras and the positions of the map points;
the camera motion tracking module, the local mapping module and the loop detection module run on three CPU threads in parallel.
The system runs in real time on four CPU threads and is accelerated with a GPU: the BNN module is contained in the camera motion tracking module of the ORB-SLAM2 framework and runs on the GPU; the camera motion tracking module, the local mapping module and the loop detection module of the ORB-SLAM2 framework run on three CPU threads; and the dense mapping module runs on one CPU thread.
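For illustration only, the four-thread layout can be sketched as follows; Python's threading module is used purely to show the division of work, the thread bodies are left empty, and the queue names are placeholders.

```python
import threading
import queue

keyframe_q = queue.Queue()   # camera motion tracking -> local mapping / loop detection
dense_q = queue.Queue()      # camera motion tracking -> dense mapping

def camera_motion_tracking():   # CPU thread 1; the BNN stereo matching runs on the GPU here
    pass

def local_mapping():            # CPU thread 2; sparse local map from key frames
    pass

def loop_detection():           # CPU thread 3; loop detection and loop correction
    pass

def dense_mapping():            # CPU thread 4; incremental dense point-cloud map
    pass

threads = [threading.Thread(target=f, daemon=True)
           for f in (camera_motion_tracking, local_mapping, loop_detection, dense_mapping)]
for t in threads:
    t.start()
```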
A dense binocular SLAM method based on BNN comprises the following specific steps:
step one: extracting ORB features from the rectified left image; an ORB feature consists of an oriented multi-scale FAST corner and the 256-bit BRIEF (Binary Robust Independent Elementary Features) descriptor associated with that corner;
step two: acquiring the depth map of the left image; the left and right images are fed to the binary neural network to compute the disparity map of the left image, and the depth map of the left image is then obtained from this disparity map according to the binocular ranging principle;
step three: obtaining the depth of the ORB (Oriented FAST and Rotated BRIEF) feature points of the left image; the positions of the ORB feature points are located within the trusted region of the depth map and the depths of the corresponding ORB feature points are read out;
step four: estimating the current camera motion pose;
step five: generating a key frame, and carrying out local mapping through the key frame;
step six: establishing a dense map;
step seven: loop detection and loop correction.
Preferably, in step two, the binary neural network has four layers and is trained on a binocular image dataset. To obtain the depth map of the left image, the similarity of image blocks is measured with the weighted Hamming distance to obtain the matching cost, the matching cost is then optimized by semi-global matching (SGM) to obtain the disparity map, and the depth map of the left image is finally computed according to the binocular ranging principle.
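For illustration only, the disparity-to-depth conversion by the binocular ranging principle can be sketched as follows with NumPy; the focal length, baseline and minimum-disparity values are placeholders.

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline, min_disp=0.1):
    """Binocular ranging: depth Z = fx * B / d for each pixel of the left image.
    disparity: (H, W) disparity map in pixels; fx: focal length in pixels; baseline: B in metres."""
    d = np.maximum(disparity, min_disp)   # avoid division by zero for invalid pixels
    depth = fx * baseline / d
    depth[disparity < min_disp] = 0.0     # mark pixels with no valid disparity
    return depth
```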
Preferably, for the extracted image block features b_i^L and b_i^R, the cosine similarity is:

s(b_i^L, b_i^R) = ( Σ_{j=1}^{n} sign(b_{i,j}^L) · sign(b_{i,j}^R) ) / n

where b_i^L denotes the feature vector of the i-th image block of the left image; b_i^R denotes the feature vector of the i-th image block of the right image; s(b_i^L, b_i^R) denotes the cosine similarity of the two feature vectors; b_{i,j}^L and b_{i,j}^R denote the j-th elements of these feature vectors; n is the descriptor length; and the sign() function denotes binarization.
The cosine similarity is expressed by the weighted Hamming distance:

s(b_i^L, b_i^R) ≈ 1 − ( 2 / Σ_j w̃_j ) · Σ_j w̃_j · H( sign(b_{i,j}^L), sign(b_{i,j}^R) )

where w̃ denotes the weight vector learned by the network, w̃_j denotes its j-th element, and the H() function denotes the Hamming distance.
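For illustration only, the unweighted special case of this relation (all weights equal) can be checked numerically; the following sketch draws two random ±1 descriptors and verifies that their cosine similarity equals 1 − 2H/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
bl = np.sign(rng.standard_normal(n))   # binarized left descriptor, entries in {-1, +1}
br = np.sign(rng.standard_normal(n))   # binarized right descriptor

cosine = bl.dot(br) / n                # cosine similarity of two {-1, +1} vectors
hamming = np.count_nonzero(bl != br)   # Hamming distance H
assert np.isclose(cosine, 1.0 - 2.0 * hamming / n)
```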
The optimal weights for the weighted Hamming distance are solved as follows:

J(w̃) = Σ_i Σ_j ( w_{i,j} − w̃_j )²

w̃* = argmin_{w̃} J(w̃)

where w_{i,j} denotes the j-th element of the i-th weight vector; w̃ denotes the weight vector learned by the network and w̃_j its j-th element; w̃* denotes the optimal weight; the H() function denotes the Hamming distance; and J() denotes the least-squares optimization problem that is built.

Taking the derivative of J with respect to w̃_j and setting it to 0 gives the optimal weight solution:

w̃_j* = ( 1 / N ) · Σ_{i=1}^{N} w_{i,j}

where w_{i,j} denotes the j-th element of the i-th weight vector and N is the number of image blocks.
Preferably, the specific flow of step three is as follows: a trusted region of the left depth map is selected whose length and width are, respectively, a and b times those of the original depth map; the parameters a and b are set according to the parameters of the binocular camera, and the shorter the baseline of the binocular camera, the smaller a and b are chosen. Unreliable depths are rejected: depths that are too large or too small are regarded as unreliable, upper and lower depth thresholds are set, and depths outside the threshold range are discarded; the depth thresholds can be set as the baseline length of the binocular camera multiplied by constant factors. The positions of the ORB feature points are then located within the trusted region of the depth map and the depths of the corresponding ORB feature points are obtained.
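For illustration only, the trusted-region selection and depth thresholding of step three can be sketched as follows; the values of a, b, the baseline and the threshold constants are placeholders, and the trusted region is assumed to be centred in the depth map.

```python
import numpy as np

def feature_depths(depth_map, keypoints, a=0.8, b=0.8, baseline=0.11, k_min=1.0, k_max=40.0):
    """Return the depth of each ORB feature point lying inside the trusted region of the left
    depth map.  a, b: trusted-region size relative to the full map; k_min/k_max * baseline:
    lower/upper depth thresholds (all values are placeholders)."""
    H, W = depth_map.shape
    y0, y1 = int((1 - b) / 2 * H), int((1 + b) / 2 * H)   # centred trusted region
    x0, x1 = int((1 - a) / 2 * W), int((1 + a) / 2 * W)
    lo, hi = k_min * baseline, k_max * baseline           # depth thresholds from the baseline
    depths = {}
    for idx, (u, v) in enumerate(keypoints):              # keypoints as (u, v) pixel coordinates
        if x0 <= u < x1 and y0 <= v < y1:
            z = depth_map[int(v), int(u)]
            if lo < z < hi:                               # reject unreliable (too small/large) depths
                depths[idx] = z
    return depths
```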
Preferably, the specific flow of the fourth step is as follows:
S4.1: if the camera moves at constant velocity, the motion pose of the current frame is predicted from the motion pose of the previous frame according to a constant-velocity motion model; if the camera does not move at constant velocity, feature matching is performed between the current frame and its reference key frame, a Bag-of-Words (BoW) dictionary is used to accelerate the matching, and the motion pose of the current frame is optimized according to the matching result. The optimization method is the Levenberg-Marquardt method in g2o, and the same optimization method can be applied to the other optimizations of this scheme.
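For illustration only, the constant-velocity prediction of S4.1 can be written with 4x4 homogeneous pose matrices, as in the following sketch; the world-to-camera pose convention is an assumption of the sketch.

```python
import numpy as np

def predict_pose(T_prev, T_prev2):
    """Constant-velocity model: predict the current camera pose from the last two poses.
    Poses are 4x4 homogeneous world-to-camera transforms."""
    velocity = T_prev @ np.linalg.inv(T_prev2)   # relative motion between the last two frames
    return velocity @ T_prev                     # apply the same motion once more
```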
S4.2: feature matching is performed between the current frame and the local map, and the matching result is used to refine the camera motion pose estimate of step S4.1, yielding the final camera motion pose estimate. The local map consists of the map points of a series of key frames that share some map points with the current frame or that are, in the common view, neighbours of a key frame sharing map points with the current frame. The common view is made up of a series of key frames, and the connection weight between two key frames is the number of map points they share.
Preferably, the specific steps of establishing the local map in the fifth step are as follows:
S5.1: establishing the common view and the spanning tree; the common view consists of key frames, an edge exists between two key frames if they share at least 15 map points, and the weight of the edge is the number of shared map points; the spanning tree is a subgraph of the common view whose edges are required to have a weight of at least 100;
S5.2: inserting key frames; the key frames sent by the camera motion tracking thread are processed, and the common view and the spanning tree are updated;
S5.3: updating map points, which includes generating and culling map points; new map points are generated by triangulation from ORB feature points in key frames of the common view, and map points with too few observations are removed;
S5.4: local BA optimization; the motion poses of the current frame and of all key frames connected to it in the common view, together with the spatial positions of the map points corresponding to these key frames, are optimized;
S5.5: culling local key frames; key frames 90% of whose map points can be observed by other key frames are removed, so that the local map does not grow too large (see the sketch after this list).
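For illustration only, the common-view construction of S5.1 and the key frame culling rule of S5.5 can be sketched as follows; key frames are represented simply by the sets of map-point ids they observe, and the helper names are illustrative.

```python
def covisibility_edges(keyframes, min_shared=15):
    """keyframes: dict kf_id -> set of observed map-point ids.  An edge exists between two
    key frames if they share at least 15 map points; the edge weight is the number shared."""
    edges = {}
    ids = sorted(keyframes)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            shared = len(keyframes[a] & keyframes[b])
            if shared >= min_shared:
                edges[(a, b)] = shared
    return edges

def spanning_tree_candidates(edges, min_weight=100):
    """Edges eligible for the spanning tree: only connections with weight at least 100 are kept
    (the tree itself would then pick one parent per key frame)."""
    return {e: w for e, w in edges.items() if w >= min_weight}

def redundant_keyframes(keyframes, ratio=0.9):
    """S5.5: a key frame is culled when at least 90% of its map points are also
    observed by other key frames."""
    redundant = []
    for kf, points in keyframes.items():
        other_points = set()
        for other, pts in keyframes.items():
            if other != kf:
                other_points |= pts
        if points and len(points & other_points) / len(points) >= ratio:
            redundant.append(kf)
    return redundant
```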
Preferably, in the sixth step, the specific steps of dense map establishment are:
S6.1: for an input key frame, the depth map of the key frame is computed from its disparity map and the camera intrinsics according to the binocular ranging principle;
S6.2: the image pixel coordinates in the key frame are mapped to three-dimensional world point coordinates according to the depth map of the key frame, the camera motion pose of the key frame and the camera intrinsics;
s6.3, representing and storing three-dimensional world points by using RGB point clouds, and incrementally constructing a three-dimensional environment map;
s6.4: and performing filtering optimization on the three-dimensional environment map by using voxel filtering to obtain an optimized three-dimensional environment map.
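For illustration only, the back-projection of S6.2 and the voxel filtering of S6.4 can be sketched as follows with NumPy; the pinhole intrinsics fx, fy, cx, cy, the world-to-camera pose convention and the voxel size are assumptions of the sketch, and the voxel grid is approximated by coordinate rounding rather than a full point-cloud-library voxel filter.

```python
import numpy as np

def backproject(depth, rgb, pose_cw, fx, fy, cx, cy):
    """Map pixels of a key frame to coloured 3-D world points.
    depth: (H, W) depth map; rgb: (H, W, 3); pose_cw: 4x4 world-to-camera transform."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    z = depth.ravel()
    valid = z > 0
    x = (u.ravel() - cx) / fx * z
    y = (v.ravel() - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    pts_world = (np.linalg.inv(pose_cw) @ pts_cam.T).T[:, :3]   # camera frame -> world frame
    colors = rgb.reshape(-1, 3)[valid]
    return pts_world, colors

def voxel_filter(points, colors, voxel=0.05):
    """Keep one representative point per voxel to bound the size of the dense map."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx], colors[idx]
```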
Preferably, the specific flow of step seven is as follows: the closed-loop key frame selection strategy of the ORB-SLAM2 framework is used, and if a closed loop is formed, the corresponding map points and key frame motion poses are corrected; a new thread is then started to perform global optimization, and after the global optimization is completed the optimized key frame motion poses and map points are fused with the initial ones.
Preferably, in step one: to make the descriptors scale invariant, the left image is Gaussian-blurred and downsampled to form an 8-level Gaussian pyramid; each pyramid level is divided into grid cells and FAST corners are extracted within each cell, with no fewer than 5 corners per cell; for each corner the corresponding descriptor is computed: in the image block around the corner, 256 pixel pairs are drawn at random according to a Gaussian distribution and their gray values are compared, the bit being set to 1 when the first pixel is brighter, which finally yields a 256-bit descriptor; to make the descriptors rotation invariant, the orientation of each corner is computed with the gray centroid method and the descriptor orientation is updated accordingly.
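For illustration only, the 256-bit BRIEF test and the gray-centroid orientation described above can be sketched as follows with NumPy; the patch size, the Gaussian sampling pattern and the centroid radius are placeholders, and a production implementation would typically use the precomputed ORB sampling pattern (e.g. OpenCV's ORB) rather than fresh random pairs.

```python
import numpy as np

def brief_descriptor(gray, corner, rng, patch=31, n_bits=256, sigma=None):
    """Compute a 256-bit BRIEF descriptor around one FAST corner.
    gray: 2-D grayscale image; corner: (u, v); pixel pairs are drawn from a Gaussian
    centred on the corner (a simplification of the real ORB sampling pattern)."""
    if sigma is None:
        sigma = patch / 5.0
    u, v = corner
    bits = np.zeros(n_bits, dtype=np.uint8)
    for k in range(n_bits):
        (u1, v1), (u2, v2) = rng.normal([u, v], sigma, size=(2, 2)).round().astype(int)
        u1, v1 = np.clip(u1, 0, gray.shape[1] - 1), np.clip(v1, 0, gray.shape[0] - 1)
        u2, v2 = np.clip(u2, 0, gray.shape[1] - 1), np.clip(v2, 0, gray.shape[0] - 1)
        bits[k] = 1 if gray[v1, u1] > gray[v2, u2] else 0   # bit is 1 when the first pixel is brighter
    return bits

def gray_centroid_orientation(gray, corner, radius=15):
    """Gray-centroid method: the corner orientation is the angle of the vector from the patch
    centre to its intensity centroid (assumes the corner is at least `radius` pixels from the border)."""
    u, v = corner
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    patch = gray[v - radius:v + radius + 1, u - radius:u + radius + 1].astype(np.float64)
    m00 = patch.sum()
    m10 = (xs * patch).sum()
    m01 = (ys * patch).sum()
    return np.arctan2(m01, m10) if m00 > 0 else 0.0
```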
Compared with the prior art, the invention has the following beneficial effects: the invention uses a BNN to perform binocular matching, which improves matching accuracy and matching speed and thereby improves localization accuracy and mapping speed. This not only overcomes the inability of feature-based visual SLAM to build a dense map, but also achieves real-time dense mapping, making feature-based visual SLAM practical to deploy. The invention provides the complete functionality of SLAM, offering high-precision localization and dense mapping, with a wide range of applications and good robustness.
Drawings
FIG. 1 is a schematic overall flow diagram of a BNN-based dense binocular SLAM method of the present invention;
FIG. 2 is a flowchart of the operation of a binary neural network of the BNN-based dense binocular SLAM method of the present invention;
FIG. 3 is a graphical illustration of the positioning accuracy of the present invention on an EuRoC dataset;
FIG. 4 shows the dense map built by the present invention on the EuRoC dataset.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent.
The technical scheme of the invention is further specifically described by the following specific embodiments with reference to the accompanying drawings:
example 1
As shown in fig. 1-2, an embodiment of a BNN-based dense binocular SLAM method includes the steps of:
Step one: extracting ORB features from the rectified left image; an ORB feature consists of an oriented multi-scale FAST corner and the 256-bit BRIEF descriptor associated with that corner. To make the descriptors scale invariant, the left image is Gaussian-blurred and downsampled to form an 8-level Gaussian pyramid; each pyramid level is divided into grid cells and FAST corners are extracted within each cell, with no fewer than 5 corners per cell; for each corner the corresponding descriptor is computed: in the image block around the corner, 256 pixel pairs are drawn at random according to a Gaussian distribution and their gray values are compared, the bit being set to 1 when the first pixel is brighter, which finally yields a 256-bit descriptor; to make the descriptors rotation invariant, the orientation of each corner is computed with the gray centroid method and the descriptor orientation is updated accordingly.
Step two: acquiring the depth map of the left image; the left and right images are fed to the binary neural network to compute the disparity map of the left image, and the depth map of the left image is obtained from this disparity map according to the binocular ranging principle. Specifically, the binary neural network has four layers and is trained on a binocular image dataset; to obtain the depth map of the left image, the similarity of image blocks is measured with the weighted Hamming distance to obtain the matching cost, the matching cost is optimized by semi-global matching (SGM) to obtain the disparity map, and the depth map of the left image is then computed according to the binocular ranging principle.
For the extracted image block features b_i^L and b_i^R, the cosine similarity is:

s(b_i^L, b_i^R) = ( Σ_{j=1}^{n} sign(b_{i,j}^L) · sign(b_{i,j}^R) ) / n

where b_i^L denotes the feature vector of the i-th image block of the left image; b_i^R denotes the feature vector of the i-th image block of the right image; s(b_i^L, b_i^R) denotes the cosine similarity of the two feature vectors; b_{i,j}^L and b_{i,j}^R denote the j-th elements of these feature vectors; n is the descriptor length; and the sign() function denotes binarization.

The cosine similarity is expressed by the weighted Hamming distance:

s(b_i^L, b_i^R) ≈ 1 − ( 2 / Σ_j w̃_j ) · Σ_j w̃_j · H( sign(b_{i,j}^L), sign(b_{i,j}^R) )

where w̃ denotes the weight vector learned by the network, w̃_j denotes its j-th element, and the H() function denotes the Hamming distance.

The optimal weights for the weighted Hamming distance are solved as follows:

J(w̃) = Σ_i Σ_j ( w_{i,j} − w̃_j )²

w̃* = argmin_{w̃} J(w̃)

where w_{i,j} denotes the j-th element of the i-th weight vector; w̃ denotes the weight vector learned by the network and w̃_j its j-th element; w̃* denotes the optimal weight; the H() function denotes the Hamming distance; and J() denotes the least-squares optimization problem that is built.

Taking the derivative of J with respect to w̃_j and setting it to 0 gives the optimal weight solution:

w̃_j* = ( 1 / N ) · Σ_{i=1}^{N} w_{i,j}

where w_{i,j} denotes the j-th element of the i-th weight vector and N is the number of image blocks.
Step three: obtaining the depth of the ORB feature points of the left image. A trusted region of the left depth map is selected whose length and width are, respectively, a and b times those of the original depth map; the parameters a and b are set according to the parameters of the binocular camera, and the shorter the baseline of the binocular camera, the smaller a and b are chosen. Unreliable depths are rejected: depths that are too large or too small are regarded as unreliable, upper and lower depth thresholds are set, and depths outside the threshold range are discarded; the depth thresholds can be set as the baseline length of the binocular camera multiplied by constant factors. The positions of the ORB feature points are then located within the trusted region of the depth map and the depths of the corresponding ORB feature points are obtained.
Step four: estimating the current camera motion pose.
S4.1: if the camera moves at constant velocity, the motion pose of the current frame is predicted from the motion pose of the previous frame according to a constant-velocity motion model; if the camera does not move at constant velocity, feature matching is performed between the current frame and its reference key frame, a BoW dictionary is used to accelerate the matching, and the motion pose of the current frame is optimized according to the matching result. The optimization method is the Levenberg-Marquardt method in g2o, and the same optimization method can be applied to the other optimizations of this scheme.
S4.2: feature matching is performed between the current frame and the local map, and the matching result is used to refine the camera motion pose estimate of step S4.1, yielding the final camera motion pose estimate. The local map consists of the map points of a series of key frames that share some map points with the current frame or that are, in the common view, neighbours of a key frame sharing map points with the current frame. The common view is made up of a series of key frames, and the connection weight between two key frames is the number of map points they share.
Step five: generating a key frame, and carrying out local mapping through the key frame; the specific steps are as follows:
S5.1: establishing the common view and the spanning tree; the common view consists of key frames, an edge exists between two key frames if they share at least 15 map points, and the weight of the edge is the number of shared map points; the spanning tree is a subgraph of the common view whose edges are required to have a weight of at least 100;
S5.2: inserting key frames; the key frames sent by the camera motion tracking thread are processed, and the common view and the spanning tree are updated;
S5.3: updating map points, which includes generating and culling map points; new map points are generated by triangulation from ORB feature points in key frames of the common view, and map points with too few observations are removed;
S5.4: local BA optimization; the motion poses of the current frame and of all key frames connected to it in the common view, together with the spatial positions of the map points corresponding to these key frames, are optimized;
S5.5: culling local key frames; key frames 90% of whose map points can be observed by other key frames are removed, so that the local map does not grow too large.
Step six: the method for establishing the dense map comprises the following specific steps:
S6.1: for an input key frame, the depth map of the key frame is computed from its disparity map and the camera intrinsics according to the binocular ranging principle;
S6.2: the image pixel coordinates in the key frame are mapped to three-dimensional world point coordinates according to the depth map of the key frame, the camera motion pose of the key frame and the camera intrinsics;
s6.3, representing and storing three-dimensional world points by using RGB point clouds, and incrementally constructing a three-dimensional environment map;
s6.4: and performing filtering optimization on the three-dimensional environment map by using voxel filtering to obtain an optimized three-dimensional environment map.
Step seven: loop detection and loop correction. The closed-loop key frame selection strategy of the ORB-SLAM2 framework is used, and if a closed loop is formed, the corresponding map points and key frame motion poses are corrected; a new thread is then started to perform global optimization, and after the global optimization is completed the optimized key frame motion poses and map points are fused with the initial ones.
To further illustrate the effect of the method presented in this example, the method was tested on the V101 sequence of the EuRoC dataset. The EuRoC dataset consists of image sequences captured by a binocular camera mounted on an unmanned aerial vehicle and is a common dataset for evaluating binocular visual SLAM. FIG. 3 compares the localization accuracy of the method of this embodiment with that of the open-source ORB-SLAM2 method on a sequence segment with stronger camera motion and higher localization difficulty; the localization accuracy of the method provided in this embodiment is higher than that of ORB-SLAM2. FIG. 4 shows the dense three-dimensional point-cloud map built by the method provided in this embodiment: the outlines of objects are clearly visible and the scene is reconstructed well.
The beneficial effects of this embodiment are: the invention uses a BNN to perform binocular matching, which improves matching accuracy and matching speed and thereby improves localization accuracy and mapping speed. This not only overcomes the inability of feature-based visual SLAM to build a dense map, but also achieves real-time dense mapping, making feature-based visual SLAM practical to deploy. The invention provides the complete functionality of SLAM, offering high-precision localization and dense mapping, with a wide range of applications and good robustness.
Example 2
Another embodiment of a BNN-based dense binocular SLAM method: the method in this embodiment employs a system comprising a BNN (binary neural network) module, an ORB-SLAM2 framework and a dense mapping module. The BNN module receives the left and right images of a binocular camera and computes, in real time, the disparity map corresponding to the left image through a trained binary neural network. ORB-SLAM2 is an open-source feature-based SLAM method supporting monocular, binocular and RGB-D cameras; fusing the BNN module with the ORB-SLAM2 framework provides high-precision localization. The dense mapping module builds a dense three-dimensional environment map in real time from the images input by the binocular camera, the disparity map obtained by the BNN module and the localization result obtained by the ORB-SLAM2 framework.
The BNN module comprises a feature extraction unit, a matching cost calculation unit and a disparity map optimization unit;
the feature extraction unit extracts, through a twin (Siamese) binary network, a binary feature vector b_i^L (respectively b_i^R) for each image block of the left (respectively right) image, together with a weight vector w_i for each image block feature, yielding a weighted binary descriptor; the weights are learned by the network;
the matching cost calculation unit measures the similarity between image blocks by computing the weighted Hamming distance of the weighted binary descriptors:

C(b_i^L, b_i^R) = Σ_j w̃_j · H( sign(b_{i,j}^L), sign(b_{i,j}^R) )

where w̃ denotes the weight vector learned by the network, w̃_j denotes its j-th element, and the H() function denotes the Hamming distance;
the disparity map optimization unit refines the matching cost with semi-global matching (SGM) to obtain the final disparity map.
The ORB-SLAM2 framework comprises a camera motion tracking module, a local mapping module and a loop detection module;
the camera motion tracking module is used to estimate the motion pose of the camera;
the local map building module is used for building a sparse environment map;
the loop detection module is used to detect loops, i.e. to determine whether the camera has returned to a previously visited scene, and, once a loop is detected, to perform loop correction, correcting the poses of the relevant cameras and the positions of the map points;
the camera motion tracking module, the local mapping module and the loop detection module run on three CPU threads in parallel.
It should be understood that the above examples of the present invention are provided by way of illustration only and do not limit the embodiments of the invention. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.

Claims (8)

1. A BNN-based dense binocular SLAM method, comprising the following steps:
step one: extracting ORB features from the rectified left image; an ORB feature consists of an oriented multi-scale FAST corner and the 256-bit BRIEF descriptor associated with that corner;
step two: acquiring the depth map of the left image; the left and right images are fed to the binary neural network to compute the disparity map of the left image, and the depth map of the left image is obtained from this disparity map according to the binocular ranging principle; the binary neural network has four layers and is trained on a binocular image dataset; to obtain the depth map of the left image, the similarity of image blocks is measured with the weighted Hamming distance to obtain the matching cost, the matching cost is optimized by semi-global matching (SGM) to obtain the disparity map, and the depth map of the left image is then computed according to the binocular ranging principle;
for the image block features b_i^L and b_i^R extracted by the network, the cosine similarity is:

s(b_i^L, b_i^R) = ( Σ_{j=1}^{n} sign(b_{i,j}^L) · sign(b_{i,j}^R) ) / n

where b_i^L denotes the feature vector of the i-th image block of the left image; b_i^R denotes the feature vector of the i-th image block of the right image; s(b_i^L, b_i^R) denotes the cosine similarity of the two feature vectors; b_{i,j}^L and b_{i,j}^R denote the j-th elements of these feature vectors; n is the descriptor length; and the sign() function denotes binarization;

the cosine similarity is expressed by the weighted Hamming distance:

s(b_i^L, b_i^R) ≈ 1 − ( 2 / Σ_j w̃_j ) · Σ_j w̃_j · H( sign(b_{i,j}^L), sign(b_{i,j}^R) )

where w̃ denotes the weight vector learned by the network, w̃_j denotes its j-th element, and the H() function denotes the Hamming distance;

the optimal weights for the weighted Hamming distance are solved as follows:

J(w̃) = Σ_i Σ_j ( w_{i,j} − w̃_j )²

w̃* = argmin_{w̃} J(w̃)

where w_{i,j} denotes the j-th element of the i-th weight vector; w̃ denotes the weight vector learned by the network and w̃_j its j-th element; w̃* denotes the optimal weight; the H() function denotes the Hamming distance; and J() denotes the least-squares optimization problem that is built;

taking the derivative of J with respect to w̃_j and setting it to 0 gives the optimal weight solution:

w̃_j* = ( 1 / N ) · Σ_{i=1}^{N} w_{i,j}

where w_{i,j} denotes the j-th element of the i-th weight vector and N is the number of image blocks;
step three: obtaining the depth of the ORB feature points of the left image; the positions of the ORB feature points are located within the trusted region of the depth map and the depths of the corresponding ORB feature points are obtained;
step four: estimating the current camera motion pose;
step five: generating key frames, and performing local mapping with the key frames;
step six: building a dense map;
step seven: loop detection and loop correction.
2. The BNN-based dense binocular SLAM method of claim 1, wherein the specific flow of step three is as follows: a trusted region of the left depth map is selected whose length and width are, respectively, a and b times those of the original depth map; the parameters a and b are set according to the parameters of the binocular camera, and the shorter the baseline of the binocular camera, the smaller a and b are chosen; unreliable depths are rejected: depths that are too large or too small are regarded as unreliable, upper and lower depth thresholds are set, and depths outside the threshold range are discarded, the depth thresholds being settable as the baseline length of the binocular camera multiplied by constant factors; the positions of the ORB feature points are then located within the trusted region of the depth map and the depths of the corresponding ORB feature points are obtained.
3. The BNN-based dense binocular SLAM method of claim 2, wherein the specific flow of step four is as follows:
S4.1: if the camera moves at constant velocity, the motion pose of the current frame is predicted from the motion pose of the previous frame according to a constant-velocity motion model; if the camera does not move at constant velocity, feature matching is performed between the current frame and its reference key frame, a BoW dictionary is used to accelerate the matching, and the motion pose of the current frame is optimized according to the matching result;
S4.2: feature matching is performed between the current frame and the local map, and the matching result is used to refine the camera motion pose estimate of step S4.1, yielding the final camera motion pose estimate.
4. The BNN-based dense binocular SLAM method of claim 3, wherein the specific steps of establishing the local map in step five are as follows:
S5.1: establishing the common view and the spanning tree; the common view consists of key frames, an edge exists between two key frames if they share at least 15 map points, and the weight of the edge is the number of shared map points; the spanning tree is a subgraph of the common view whose edges are required to have a weight of at least 100;
S5.2: inserting key frames; the key frames sent by the camera motion tracking thread are processed, and the common view and the spanning tree are updated;
S5.3: updating map points, which includes generating and culling map points; new map points are generated by triangulation from ORB feature points in key frames of the common view, and map points with too few observations are removed;
S5.4: local BA optimization; the motion poses of the current frame and of all key frames connected to it in the common view, together with the spatial positions of the map points corresponding to these key frames, are optimized;
S5.5: culling local key frames; key frames 90% of whose map points can be observed by other key frames are removed, so that the local map does not grow too large.
5. The BNN-based dense binocular SLAM method of claim 4, wherein in step six the dense map is built by the following specific steps:
S6.1: for an input key frame, the depth map of the key frame is computed from its disparity map and the camera intrinsics according to the binocular ranging principle;
S6.2: the image pixel coordinates in the key frame are mapped to three-dimensional world point coordinates according to the depth map of the key frame, the camera motion pose of the key frame and the camera intrinsics;
S6.3: the three-dimensional world points are represented and stored as an RGB point cloud, and the three-dimensional environment map is built incrementally;
S6.4: the three-dimensional environment map is filtered with voxel filtering to obtain the optimized three-dimensional environment map.
6. The BNN-based dense binocular SLAM method of any of claims 1-5, wherein a system comprising a BNN module, an ORB-SLAM2 framework and a dense mapping module is employed; the BNN module receives the left and right images of a binocular camera and computes, in real time, the disparity map corresponding to the left image through a trained binary neural network; ORB-SLAM2 is an open-source feature-based SLAM method supporting monocular, binocular and RGB-D cameras, and fusing the BNN module with the ORB-SLAM2 framework provides high-precision localization; the dense mapping module builds a dense three-dimensional environment map in real time from the images input by the binocular camera, the disparity map obtained by the BNN module and the localization result obtained by the ORB-SLAM2 framework.
7. The BNN-based dense binocular SLAM method of claim 6, wherein the BNN module comprises a feature extraction unit, a matching cost calculation unit, and a disparity map optimization unit.
8. The BNN-based dense binocular SLAM method of claim 6, wherein the ORB-SLAM2 framework comprises a camera motion tracking module, a local mapping module, a loop detection module;
the camera motion tracking module is used to estimate the motion pose of the camera; the local mapping module is used to build a sparse environment map; the loop detection module is used to detect loops, i.e. to determine whether the camera has returned to a previously visited scene, and, once a loop is detected, to perform loop correction, correcting the poses of the relevant cameras and the positions of the map points.
CN202010627927.3A 2020-07-02 2020-07-02 BNN-based dense binocular SLAM method Active CN111998862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010627927.3A CN111998862B (en) 2020-07-02 2020-07-02 BNN-based dense binocular SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010627927.3A CN111998862B (en) 2020-07-02 2020-07-02 BNN-based dense binocular SLAM method

Publications (2)

Publication Number Publication Date
CN111998862A CN111998862A (en) 2020-11-27
CN111998862B true CN111998862B (en) 2023-05-16

Family

ID=73467329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010627927.3A Active CN111998862B (en) 2020-07-02 2020-07-02 BNN-based dense binocular SLAM method

Country Status (1)

Country Link
CN (1) CN111998862B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112904901B (en) * 2021-01-14 2022-01-21 吉林大学 Path planning method based on binocular vision slam and fusion algorithm
CN112862959B (en) * 2021-03-23 2022-07-12 清华大学 Real-time probability monocular dense reconstruction method and system based on semantic prior
CN114359388A (en) * 2022-01-06 2022-04-15 闽都创新实验室 Binocular vision SLAM dense image construction method based on DNN stereo matching module
CN115619740B (en) * 2022-10-19 2023-08-08 广西交科集团有限公司 High-precision video speed measuring method, system, electronic equipment and storage medium
CN117456124B (en) * 2023-12-26 2024-03-26 浙江大学 Dense SLAM method based on back-to-back binocular fisheye camera

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808407A (en) * 2017-10-16 2018-03-16 亿航智能设备(广州)有限公司 Unmanned plane vision SLAM methods, unmanned plane and storage medium based on binocular camera
CN108416840A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of dense method for reconstructing of three-dimensional scenic based on monocular camera
CN108520554A (en) * 2018-04-12 2018-09-11 无锡信捷电气股份有限公司 A kind of binocular three-dimensional based on ORB-SLAM2 is dense to build drawing method
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN110050243A (en) * 2016-12-21 2019-07-23 英特尔公司 It is returned by using the enhancing nerve of the middle layer feature in autonomous machine and carries out camera repositioning
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN110599545A (en) * 2019-09-06 2019-12-20 电子科技大学中山学院 Feature-based dense map construction system
CN110738241A (en) * 2019-09-24 2020-01-31 中山大学 binocular stereo vision matching method based on neural network and operation frame thereof

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110050243A (en) * 2016-12-21 2019-07-23 英特尔公司 It is returned by using the enhancing nerve of the middle layer feature in autonomous machine and carries out camera repositioning
CN107808407A (en) * 2017-10-16 2018-03-16 亿航智能设备(广州)有限公司 Unmanned plane vision SLAM methods, unmanned plane and storage medium based on binocular camera
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN108416840A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of dense method for reconstructing of three-dimensional scenic based on monocular camera
CN108520554A (en) * 2018-04-12 2018-09-11 无锡信捷电气股份有限公司 A kind of binocular three-dimensional based on ORB-SLAM2 is dense to build drawing method
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN110599545A (en) * 2019-09-06 2019-12-20 电子科技大学中山学院 Feature-based dense map construction system
CN110738241A (en) * 2019-09-24 2020-01-31 中山大学 binocular stereo vision matching method based on neural network and operation frame thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
3D Semantic Mapping Based on Convolutional Neural Networks;Jing Li等;《2018 37th Chinese Control Conference (CCC)》;第9303-9308页 *
Relative Pose Estimation of Visual SLAM Based on Convolutional Neural Networks;Xiaogang Ruan等;《2019 Chinese Control Conference (CCC)》;第8827-8832页 *
Semantic simultaneous localization and map construction method based on convolutional neural networks; Liu Zhijie et al.; Science Technology and Engineering; vol. 19, no. 9, pp. 148-153 *

Also Published As

Publication number Publication date
CN111998862A (en) 2020-11-27

Similar Documents

Publication Publication Date Title
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN111998862B (en) BNN-based dense binocular SLAM method
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN112132972B (en) Three-dimensional reconstruction method and system for fusing laser and image data
CN106940704B (en) Positioning method and device based on grid map
US20210390329A1 (en) Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
CN110176032B (en) Three-dimensional reconstruction method and device
CN109102547A (en) Robot based on object identification deep learning model grabs position and orientation estimation method
CN110334701B (en) Data acquisition method based on deep learning and multi-vision in digital twin environment
CN110853075A (en) Visual tracking positioning method based on dense point cloud and synthetic view
CN111897349A (en) Underwater robot autonomous obstacle avoidance method based on binocular vision
CN111768449B (en) Object grabbing method combining binocular vision with deep learning
CN113744315B (en) Semi-direct vision odometer based on binocular vision
CN110570474B (en) Pose estimation method and system of depth camera
CN113393524B (en) Target pose estimation method combining deep learning and contour point cloud reconstruction
CN115032648B (en) Three-dimensional target identification and positioning method based on laser radar dense point cloud
Ma et al. Crlf: Automatic calibration and refinement based on line feature for lidar and camera in road scenes
CN114494644A (en) Binocular stereo matching-based spatial non-cooperative target pose estimation and three-dimensional reconstruction method and system
Yong-guo et al. The navigation of mobile robot based on stereo vision
CN111198563B (en) Terrain identification method and system for dynamic motion of foot type robot
CN117197333A (en) Space target reconstruction and pose estimation method and system based on multi-view vision
WO2023030062A1 (en) Flight control method and apparatus for unmanned aerial vehicle, and device, medium and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant