CN114140527A - Dynamic environment binocular vision SLAM method based on semantic segmentation - Google Patents

Dynamic environment binocular vision SLAM method based on semantic segmentation Download PDF

Info

Publication number
CN114140527A
Authority
CN
China
Prior art keywords
feature points
dynamic
binocular
semantic
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111373890.7A
Other languages
Chinese (zh)
Inventor
沈晔湖
李星
卢金斌
王其聪
赵冲
蒋全胜
朱其新
谢鸥
牛福洲
牛雪梅
付贵忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University of Science and Technology
Original Assignee
Suzhou University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University of Science and Technology filed Critical Suzhou University of Science and Technology
Priority to CN202111373890.7A priority Critical patent/CN114140527A/en
Publication of CN114140527A publication Critical patent/CN114140527A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C 21/3804 Creation or updating of map data
    • G01C 21/3833 Creation or updating of map data characterised by the source of data
    • G01C 21/3841 Data obtained from two or more sources, e.g. probe vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 Stereo camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps: obtaining semantic masks of objects, wherein the semantic masks are generated by a deep learning network; acquiring multiple consecutive binocular image frames with a binocular camera; extracting feature points in each binocular frame and matching the feature points between adjacent binocular frames; removing the feature points located on the semantic masks and calculating the camera pose from the remaining feature points; separating the dynamic objects and static objects in the binocular images based on the camera pose; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. Using a binocular camera and guided by semantically segmented images, the method can distinguish dynamic from static objects in the scene and construct a map.

Description

Dynamic environment binocular vision SLAM method based on semantic segmentation
Technical Field
The invention relates to the technical field of visual spatial localization, and in particular to a dynamic environment binocular vision SLAM method based on semantic segmentation.
Background
With the development of computer technology and artificial intelligence, intelligent autonomous mobile robots have become an important research direction and hotspot in the field of robotics. As mobile robots become progressively more intelligent, their requirements for self-localization and environment maps grow ever higher. At present, intelligent mobile robots can already perform self-localization and mapping in known environments in some practical applications, but many challenges remain in unknown environments. The technology for accomplishing localization and mapping in such environments is called SLAM (Simultaneous Localization and Mapping); its goal is to enable a robot to complete self-localization and incremental map construction while moving through an unknown environment.
Traditional SLAM algorithms rely mainly on relatively stable range sensors such as lidar. However, the range data obtained by lidar are very sparse, so the environment map constructed by SLAM contains only a very small number of landmark points. Such a map can only be used to improve the localization accuracy of the robot and cannot be used for other aspects of robot navigation such as path planning. Moreover, the high price, large size, weight, and power consumption of lidar limit its application in certain fields. A camera can, to a certain extent, overcome the disadvantages of lidar in price, size, weight, and power consumption, and can also acquire rich information; however, it has its own problems, such as sensitivity to illumination changes and high computational complexity. Multi-sensor fusion SLAM algorithms have been proposed that can effectively alleviate the problems caused by the deficiencies of a single sensor, but they further increase the cost and the complexity of the algorithm.
Most existing visual SLAM algorithms are based on the static-environment assumption, that is, the scene is static and contains no relatively moving objects. However, in real outdoor scenes, dynamic objects such as pedestrians and vehicles are present in large numbers, which limits the operation of SLAM systems based on this assumption in practical scenes. To address the reduced localization accuracy and stability of visual SLAM in dynamic environments, existing algorithms use methods based on probability statistics or geometric constraints to reduce the influence of dynamic objects on accuracy and stability. For example, when there are only a few dynamic objects in the scene, probabilistic algorithms such as RANSAC (Random Sample Consensus) can be used to reject them. However, when a large number of dynamic objects appear in the scene, such algorithms can no longer distinguish them reliably. Other algorithms use optical flow to distinguish dynamic objects, which does work in scenes with many dynamic objects, but computing dense optical flow is time-consuming and reduces the execution efficiency of the SLAM algorithm.
Therefore, how to provide a dynamic environment binocular vision SLAM method based on semantic segmentation that is simple to operate, low in cost, and applicable to most practical scenes is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, which aims to solve the above technical problem.
In order to solve the technical problem, the invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps:
obtaining a semantic mask of an object, wherein the semantic mask is generated through a deep learning network;
acquiring a plurality of continuous binocular images by using a binocular camera;
extracting feature points on each frame of binocular image, and matching the feature points on the adjacent frames of binocular images;
removing the feature points on the semantic mask, and calculating the pose of the camera according to the remaining feature points;
separating dynamic objects and static objects on the binocular image based on the camera pose;
estimating the motion parameters of the dynamic object based on the separated dynamic object;
recalculating the camera pose based on the separated static object;
and constructing a static map based on the updated camera pose and the feature points on the static object.
Preferably, the deep learning network for generating the semantic Mask is a Mask R-CNN model.
Preferably, the method for extracting the feature points on the binocular images of each frame and matching the feature points on the binocular images of the adjacent frames comprises:
extracting the characteristic points by adopting an ORB method;
obtaining the descriptors of each feature point on each frame of binocular image, calculating the Hamming distance between two descriptors of one feature point on two adjacent frames of binocular images, and forming a group of matched feature points by two feature points with the minimum Hamming distance.
Preferably, the method for determining whether a feature point is located on the semantic mask comprises: the semantic mask at least comprises the bounding box of the object, and if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask.
Preferably, the method for calculating the camera pose according to the remaining feature points includes: and solving the pose of the camera by adopting a PnP algorithm.
Preferably, the method for separating the dynamic objects and the static objects in the binocular images based on the camera pose, and for estimating the motion parameters of the dynamic objects based on the separated dynamic objects, comprises the following steps:
separating the dynamic object: calculating the motion probability of an object corresponding to the semantic mask based on the camera pose and the position relation between the binocular images of the adjacent frames and the semantic mask, and if the motion probability is greater than a first threshold value, judging that the object corresponding to the semantic mask is a dynamic object;
dynamic object matching: for each dynamic object, calculating the Hu moments, the Euclidean distance between the center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in adjacent binocular frames, calculating the matching probability of the dynamic object between the adjacent binocular frames based on the Hu moments, the center-point Euclidean distance, and the histogram distributions, and if the probability is greater than a second threshold value, judging that the two dynamic objects in the adjacent binocular frames are the same object; and
and (3) dynamic object motion estimation: and completing the association of the dynamic object between the continuous frames through the dynamic object matching, and estimating the motion parameters of the dynamic object through a PnP algorithm.
Preferably, the step of separating the dynamic object comprises:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating three-dimensional coordinates of all feature points on the semantic mask after projection by using a disparity map, wherein the disparity map is obtained by calculating through the binocular image;
calculating errors of the corresponding feature points of the previous frame and the current frame in the x, y and z directions, wherein the maximum value of the errors is used as an error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
Preferably, the method for recalculating the camera pose based on the separated static object comprises: and eliminating the feature points on the semantic mask corresponding to the dynamic object, and updating the camera pose by adopting a PnP algorithm according to the remaining feature points.
Preferably, the method for constructing the static map based on the updated camera pose and the feature points located on the static object comprises:
determining a plurality of keyframes based on the updated camera poses and the feature points located on the static object;
matching the feature points on the plurality of key frames, and eliminating unmatched feature points;
checking whether the matched feature points meet epipolar geometric constraint or not, and eliminating the feature points which are not met;
checking whether the remaining feature points have positive depth, sufficient parallax, acceptable reprojection error, and consistent scale, eliminating the feature points that fail these checks, and generating map points based on the remaining feature points;
constructing the static map based on the map points.
Preferably, before the static map is constructed, a step of optimizing the generated map points by bundle adjustment is further included.
Compared with the prior art, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention uses a binocular camera and, guided by semantically segmented images, can distinguish dynamic from static objects in the scene and construct a map.
Drawings
FIG. 1 is a schematic flow chart of a dynamic environment binocular vision SLAM method based on semantic segmentation according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of separating dynamic objects according to an embodiment of the present invention.
Detailed Description
In order to more thoroughly express the technical scheme of the invention, the following specific examples are listed to demonstrate the technical effect; it is emphasized that these examples are intended to illustrate the invention and are not to be construed as limiting the scope of the invention.
The invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps as shown in figure 1:
in the embodiment, the deep learning network used for generating the semantic Mask is a Mask R-CNN model, so that high-quality semantic segmentation is realized.
A binocular camera is used to acquire multiple consecutive binocular image frames, from which the three-dimensional depth information of the two-dimensional image pixels can be recovered. The intrinsic and extrinsic parameters of the binocular camera mainly include the focal length f of the camera, the optical center (u, v) of the camera, and the radial distortion coefficients kc1 and kc2 of the camera lens; these parameters can be obtained by calibration with Zhang Zhengyou's calibration method.
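A minimal calibration sketch, assuming a 9x6 chessboard with 25 mm squares and hypothetical image files; it only illustrates how the focal length, optical center, and radial distortion coefficients mentioned above could be obtained with OpenCV's implementation of Zhang's method:

```python
import cv2
import numpy as np

# Minimal sketch: intrinsic calibration of one camera of the stereo pair with
# Zhang Zhengyou's chessboard method via OpenCV. The 9x6 board, the 25 mm
# square size, and the image file names are illustrative assumptions.
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 0.025

obj_pts, img_pts = [], []
for path in ["calib_left_00.png", "calib_left_01.png"]:  # hypothetical files
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K holds fx, fy and the optical centre (u, v); dist holds kc1, kc2 (and higher terms).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("fx, fy =", K[0, 0], K[1, 1], " optical centre =", (K[0, 2], K[1, 2]))
```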
Feature points are extracted from each binocular frame and matched between adjacent binocular frames. The specific method comprises the following steps:
extracting the feature points with the ORB (Oriented FAST and Rotated BRIEF) method;
obtaining the descriptor of each feature point in each binocular frame, calculating the Hamming distance between the descriptors of feature points in two adjacent binocular frames, and taking the two feature points with the minimum Hamming distance as a pair of matched feature points.
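The extraction and matching step could look roughly as follows; the feature count and the cross-check option are illustrative choices, not values specified by the patent:

```python
import cv2

# Minimal sketch: ORB feature extraction on two adjacent frames and descriptor
# matching by Hamming distance. The feature count (2000) and the use of
# cross-checking are illustrative choices.
orb = cv2.ORB_create(nfeatures=2000)

def match_adjacent(img_prev, img_curr):
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    # Brute-force matcher with Hamming norm: each query descriptor is paired
    # with the train descriptor of minimum Hamming distance.
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(des1, des2)
    return kp1, kp2, sorted(matches, key=lambda m: m.distance)
```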
The feature points located on the semantic masks are removed, and the camera pose is calculated from the remaining feature points. Whether a feature point is located on a semantic mask is determined as follows: the semantic mask at least comprises the bounding box of the object; if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask, otherwise it is not. The camera pose is calculated from the remaining feature points by solving the PnP (Perspective-n-Point) problem, constructing and optimizing the reprojection error shown in formula (1):

T* = argmin_T (1/2) Σ_{i=1..n} ‖ u_i − (1/s_i) K T P_i ‖²   (1)

where P_i is the three-dimensional coordinate of the i-th feature point, u_i its observed pixel coordinate, s_i its depth, K the camera intrinsic matrix, and T the camera pose. The optimal solution obtained by minimizing the reprojection error is the required camera pose.
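A sketch of the mask filtering and PnP step, under the assumption that matched 3D-2D correspondences and the intrinsic matrix K are already available; the RANSAC variant of PnP is an implementation choice, not mandated by the text:

```python
import cv2
import numpy as np

# Minimal sketch: discard feature points that fall on any semantic mask, then
# solve the camera pose from the remaining 3D-2D correspondences with PnP.
# `pts3d_prev` (from the previous frame's disparity) and `pts2d_curr` are
# assumed to be aligned arrays of matched points; `masks` is an iterable of
# (binary mask array, class id) pairs; K is the intrinsic matrix.
def pose_from_static_points(pts3d_prev, pts2d_curr, masks, K):
    keep = []
    for i, (u, v) in enumerate(pts2d_curr):
        on_mask = any(m[int(v), int(u)] for m, _ in masks)  # point inside a mask?
        if not on_mask:
            keep.append(i)
    p3 = np.asarray(pts3d_prev, np.float32)[keep]
    p2 = np.asarray(pts2d_curr, np.float32)[keep]
    # PnP with RANSAC minimises the reprojection error of formula (1).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(p3, p2, K, None)
    return ok, rvec, tvec
```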
Separating the dynamic object and the static object on the binocular image based on the camera pose, and the specific method comprises the following steps:
separating the dynamic object: and calculating the motion probability of the object corresponding to the semantic mask based on the camera pose and the position relation between the binocular images of the adjacent frames and the semantic mask, and if the motion probability is greater than a first threshold value, judging that the object corresponding to the semantic mask is a dynamic object. The specific steps are shown in fig. 2, and include:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating the three-dimensional coordinates of all feature points on the semantic mask after projection by using a disparity map, wherein the disparity map is computed from the binocular image; specifically, the disparity map can be calculated with the ELAS (Efficient Large-scale Stereo) algorithm;
calculating errors of the corresponding feature points of the previous frame and the current frame in the x, y and z directions, wherein the maximum value of the errors is used as an error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
According to the camera imaging model, the conversion between the three-dimensional coordinate system and the pixel (two-dimensional) coordinate system, and between depth and disparity, is:

u = fx·X/Z + cx,  v = fy·Y/Z + cy   (2)

Z = f·b/d   (3)

where (u, v) is the pixel coordinate, (cx, cy) the optical center, fx and fy (f) the focal length, b the baseline of the binocular camera, and d the disparity.

Denote the set of pixel coordinates of the j-th semantic mask of frame t−1 as M_{t−1}^j. Through formulas (2) and (3), the corresponding set of three-dimensional coordinates P_{t−1}^j of the semantic mask is obtained. The set of three-dimensional points after the camera motion is then obtained by formula (4):

P̂_t^j = T_{t,t−1} · P_{t−1}^j   (4)

where T_{t,t−1} is the camera pose estimated above. P̂_t^j is converted into the pixel-coordinate set M̂_t^j by formula (2); then, using M̂_t^j together with the disparity map of frame t, the actually observed three-dimensional point set P_t^j is obtained through formulas (2) and (3). Denote p̂_i as the i-th point of P̂_t^j and p_i as the i-th point of P_t^j. The error δ_i between the two points is calculated as:

δ_i = max( |x̂_i − x_i|, |ŷ_i − y_i|, |ẑ_i − z_i| )   (5)

The error Δ_j of the object corresponding to these feature points is then obtained by aggregating the per-point errors δ_i over the mask (formula (6)), and Δ_j is mapped to the motion probability S(Δ_j) (formula (7)); if S(Δ_j) exceeds the first threshold, the object corresponding to the semantic mask is judged to be dynamic.
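The dynamic check of formulas (2)-(5) could be sketched as follows; since formulas (6) and (7) are not reproduced above, the mean aggregation and the logistic mapping used here are stand-in assumptions:

```python
import numpy as np

# Minimal sketch of the dynamic check in formulas (2)-(5). fx, fy, cx, cy and
# the baseline b are the calibrated parameters. The mean aggregation and the
# logistic mapping below stand in for formulas (6) and (7), whose exact form
# is not reproduced in this text; they are assumptions.
def backproject(u, v, disp, fx, fy, cx, cy, b):
    z = fx * b / disp                                         # formula (3)
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])  # formula (2)

def motion_probability(mask_pts_prev, disp_prev, disp_curr, T_curr_prev,
                       fx, fy, cx, cy, b, k=5.0):
    deltas = []
    for (u, v) in mask_pts_prev:
        d1 = disp_prev[v, u]
        if d1 <= 0:
            continue
        P_prev = backproject(u, v, d1, fx, fy, cx, cy, b)
        P_pred = T_curr_prev[:3, :3] @ P_prev + T_curr_prev[:3, 3]   # formula (4)
        u2 = int(round(fx * P_pred[0] / P_pred[2] + cx))             # formula (2)
        v2 = int(round(fy * P_pred[1] / P_pred[2] + cy))
        if not (0 <= v2 < disp_curr.shape[0] and 0 <= u2 < disp_curr.shape[1]):
            continue
        d2 = disp_curr[v2, u2]
        if d2 <= 0:
            continue
        P_obs = backproject(u2, v2, d2, fx, fy, cx, cy, b)
        deltas.append(np.max(np.abs(P_pred - P_obs)))                # formula (5)
    delta_obj = float(np.mean(deltas)) if deltas else 0.0            # stand-in for (6)
    # Stand-in for (7): maps zero error to probability 0, large error towards 1.
    return 2.0 / (1.0 + np.exp(-k * delta_obj)) - 1.0
```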
Dynamic object matching: for each dynamic object, the Hu moments (i.e., image moments), the Euclidean distance between the center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in adjacent binocular frames are calculated; the matching probability of the dynamic object between the adjacent frames is computed from these three quantities, and if the probability is greater than a second threshold, the two dynamic objects in the adjacent frames are judged to be the same object. Specifically, the Hu moments of an image are image features invariant to translation, rotation, and scale.
The raw moment of an image is calculated as:

m_pq = Σ_x Σ_y x^p · y^q · I(x, y)   (8)

Calculating the Hu moments requires the central moments; first the centroid coordinates are calculated:

x̄ = m10 / m00,  ȳ = m01 / m00   (9)

The central moments are then constructed:

μ_pq = Σ_x Σ_y (x − x̄)^p · (y − ȳ)^q · I(x, y)   (10)

The central moments are then normalized:

η_pq = μ_pq / μ00^(1 + (p+q)/2)   (11)

The Hu moments are constructed from the normalized central moments and consist of 7 invariant moments, with the specific formulas:

Φ1 = η20 + η02
Φ2 = (η20 − η02)² + 4η11²
Φ3 = (η30 − 3η12)² + (3η21 − η03)²
Φ4 = (η30 + η12)² + (η21 + η03)²
Φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
Φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
Φ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]   (12)

Denote Φ_{t−1}^j as the Hu-moment vector of the j-th semantic mask of frame t−1; the distance between the Hu moments of two semantic masks in adjacent frames is then computed. The center position of each semantic mask is calculated, and the Euclidean distance between the center positions of corresponding masks in the two frames is computed. The histogram distribution of each semantic mask is calculated and normalized, and the KL divergence (also called relative entropy) between the histograms of the semantic masks of the two frames is computed:

D_KL(P‖Q) = Σ_i P(i) · log( P(i) / Q(i) )

Finally, the matching probability is estimated by combining the Hu-moment distance, the center-point Euclidean distance, and the histogram divergence.
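A sketch of the three matching cues (Hu moments, center distance, histogram KL divergence), computed on grey-level images; how the three cues are fused into the final matching probability is an assumption, since the exact combination formula is not reproduced above:

```python
import cv2
import numpy as np

# Minimal sketch of the three cues used to match masks across frames: Hu-moment
# distance, centre-point Euclidean distance, and KL divergence of the grey-level
# histograms inside the masks. The final fusion into a single score is an
# illustrative assumption, not the patent's exact combination formula.
def match_score(mask_a, mask_b, img_a, img_b):
    hu_a = cv2.HuMoments(cv2.moments(mask_a.astype(np.uint8))).flatten()
    hu_b = cv2.HuMoments(cv2.moments(mask_b.astype(np.uint8))).flatten()
    # Compare Hu moments on a log scale, which is common practice.
    d_hu = np.linalg.norm(np.sign(hu_a) * np.log1p(np.abs(hu_a)) -
                          np.sign(hu_b) * np.log1p(np.abs(hu_b)))

    ca = np.mean(np.argwhere(mask_a), axis=0)      # mask centre (row, col)
    cb = np.mean(np.argwhere(mask_b), axis=0)
    d_center = np.linalg.norm(ca - cb)

    ha = cv2.calcHist([img_a], [0], mask_a.astype(np.uint8), [32], [0, 256]).flatten()
    hb = cv2.calcHist([img_b], [0], mask_b.astype(np.uint8), [32], [0, 256]).flatten()
    ha = ha / (ha.sum() + 1e-9) + 1e-9
    hb = hb / (hb.sum() + 1e-9) + 1e-9
    d_kl = float(np.sum(ha * np.log(ha / hb)))     # KL divergence of histograms

    # Illustrative fusion: smaller distances -> higher matching probability.
    return np.exp(-(d_hu + 0.01 * d_center + d_kl))
```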
the method for estimating the motion parameters of the dynamic object based on the separated dynamic object comprises the following steps: and (3) dynamic object motion estimation: and completing the association of the dynamic object between the continuous frames through the dynamic object matching, and estimating the motion parameters of the dynamic object through a PnP algorithm.
The camera pose is then recalculated based on the separated static objects. Specifically, the feature points on the semantic masks corresponding to dynamic objects are removed, and the camera pose is updated with the PnP algorithm using the remaining feature points; the calculation follows the method used for the first pose estimation.
Constructing a static map based on the updated camera pose and the feature points on the static object, wherein the specific method comprises the following steps:
determining a plurality of keyframes based on the updated camera poses and the feature points located on the static object;
matching the feature points across the plurality of key frames and triangulating the matched feature points; attempting to match the remaining unmatched feature points in other key frames until all possible matches are found, and rejecting the feature points that remain unmatched;
checking whether the matched feature points satisfy the epipolar geometric constraint, and eliminating those that do not;
checking whether the remaining feature points have positive depth, sufficient parallax, acceptable reprojection error, and consistent scale, eliminating the feature points that fail these checks, and generating map points based on the remaining feature points;
constructing the static map based on the map points.
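A sketch of the map-point creation step for one keyframe pair, assuming calibrated poses and matched pixel coordinates are available; the 2-pixel reprojection threshold is illustrative, and the parallax and scale-consistency checks described above are omitted for brevity:

```python
import cv2
import numpy as np

# Minimal sketch: triangulate matched feature points between two keyframes and
# keep only candidates with positive depth and small reprojection error. The
# 2-pixel error threshold is an illustrative assumption.
def make_map_points(K, pose1, pose2, pts1, pts2, max_err=2.0):
    P1 = K @ pose1[:3, :]                                  # 3x4 projection matrices
    P2 = K @ pose2[:3, :]
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)    # homogeneous 4xN
    X = (X_h[:3] / X_h[3]).T
    keep = []
    for i, Xw in enumerate(X):
        for P, pt in ((P1, pts1[i]), (P2, pts2[i])):
            x = P @ np.append(Xw, 1.0)
            if x[2] <= 0:                                  # positive-depth check
                break
            err = np.linalg.norm(x[:2] / x[2] - pt)
            if err > max_err:                              # reprojection-error check
                break
        else:
            keep.append(Xw)
    return np.array(keep)
```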
Preferably, before the static map is constructed, a step of optimizing the generated map points by Bundle Adjustment (BA) is further included.
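A minimal bundle-adjustment sketch, treating BA as a nonlinear least-squares problem over poses and map points; the variable packing and the use of SciPy's solver are illustrative assumptions, not the patent's specific implementation:

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

# Minimal sketch of bundle adjustment: jointly refine camera poses and map
# points by minimising the total reprojection error with a nonlinear
# least-squares solver. Each camera is packed as rvec (3) + tvec (3).
def ba_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, obs_uv):
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, obs_uv):
        R, _ = cv2.Rodrigues(cams[c, :3])
        Xc = R @ pts[p] + cams[c, 3:]
        proj = K @ Xc
        res.extend(proj[:2] / proj[2] - uv)     # 2D reprojection residual
    return np.asarray(res)

def bundle_adjust(cams0, pts0, K, cam_idx, pt_idx, obs_uv):
    x0 = np.hstack([cams0.ravel(), pts0.ravel()])
    sol = least_squares(ba_residuals, x0, method="trf",
                        args=(len(cams0), len(pts0), K, cam_idx, pt_idx, obs_uv))
    return sol.x
```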
By processing the binocular images in this way, the dynamic objects present in the binocular images are identified, the camera pose and the poses of the dynamic objects are estimated, and an environment map is constructed, meeting the mobile robot's requirements for a three-dimensional map.
In summary, the semantic segmentation based binocular vision SLAM method for dynamic environments provided by the invention comprises the following steps: obtaining semantic masks of objects, wherein the semantic masks are generated by a deep learning network; acquiring multiple consecutive binocular image frames with a binocular camera; extracting feature points in each binocular frame and matching the feature points between adjacent binocular frames; removing the feature points located on the semantic masks and calculating the camera pose from the remaining feature points; separating the dynamic objects and static objects in the binocular images based on the camera pose; estimating the motion parameters of the dynamic objects based on the separated dynamic objects; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. Using a binocular camera and guided by semantically segmented images, the method can distinguish dynamic from static objects in the scene and construct a map.
It will be apparent to those skilled in the art that various changes and modifications may be made in the invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A dynamic environment binocular vision SLAM method based on semantic segmentation is characterized by comprising the following steps:
obtaining a semantic mask of an object, wherein the semantic mask is generated through a deep learning network;
acquiring a plurality of continuous binocular images by using a binocular camera;
extracting feature points on each frame of binocular image, and matching the feature points on the adjacent frames of binocular images;
removing the feature points on the semantic mask, and calculating the pose of the camera according to the remaining feature points;
separating dynamic objects and static objects on the binocular image based on the camera pose;
estimating the motion parameters of the dynamic object based on the separated dynamic object;
recalculating the camera pose based on the separated static object;
and constructing a static map based on the updated camera pose and the feature points on the static object.
2. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the deep learning network used to generate the semantic Mask is a Mask R-CNN model.
3. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the method of extracting feature points on the binocular images of each frame, matching feature points on binocular images of adjacent frames comprises:
extracting the characteristic points by adopting an ORB method;
obtaining the descriptors of each feature point on each frame of binocular image, calculating the Hamming distance between two descriptors of one feature point on two adjacent frames of binocular images, and forming a group of matched feature points by two feature points with the minimum Hamming distance.
4. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the method of determining whether a feature point is located on the semantic mask comprises: the semantic mask at least comprises the bounding box of the object, and if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask.
5. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the method of calculating camera pose from remaining feature points comprises: and solving the pose of the camera by adopting a PnP algorithm.
6. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the separation of dynamic objects and static objects on the binocular images based on the camera pose; the method for estimating the motion parameters of the dynamic object based on the separated dynamic object comprises the following steps:
separating the dynamic object: calculating the motion probability of an object corresponding to the semantic mask based on the camera pose and the position relation between the binocular images of the adjacent frames and the semantic mask, and if the motion probability is greater than a first threshold value, judging that the object corresponding to the semantic mask is a dynamic object;
dynamic object matching: for each dynamic object, calculating the Hu moments, the Euclidean distance between the center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in adjacent binocular frames, calculating the matching probability of the dynamic object between the adjacent binocular frames based on the Hu moments, the center-point Euclidean distance, and the histogram distributions, and if the probability is greater than a second threshold value, judging that the two dynamic objects in the adjacent binocular frames are the same object; and
and (3) dynamic object motion estimation: and completing the association of the dynamic object between the continuous frames through the dynamic object matching, and estimating the motion parameters of the dynamic object through a PnP algorithm.
7. The semantic segmentation based dynamic ambient binocular vision SLAM method of claim 6 wherein the step of separating dynamic objects comprises:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating three-dimensional coordinates of all feature points on the semantic mask after projection by using a disparity map, wherein the disparity map is obtained by calculating through the binocular image;
calculating errors of the corresponding feature points of the previous frame and the current frame in the x, y and z directions, wherein the maximum value of the errors is used as an error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
8. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the method of recalculating camera pose based on separated static objects comprises: and eliminating the feature points on the semantic mask corresponding to the dynamic object, and updating the camera pose by adopting a PnP algorithm according to the remaining feature points.
9. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the method of constructing a static map based on updated camera poses and feature points located on the static object comprises:
determining a plurality of keyframes based on the updated camera poses and the feature points located on the static object;
matching the feature points on the plurality of key frames, and eliminating unmatched feature points;
checking whether the matched feature points meet epipolar geometric constraint or not, and eliminating the feature points which are not met;
checking whether the remaining feature points have positive depth, sufficient parallax, acceptable reprojection error, and consistent scale, eliminating the feature points that fail these checks, and generating map points based on the remaining feature points;
constructing the static map based on the map points.
10. The semantic segmentation based dynamic ambient binocular vision SLAM method of claim 9 further comprising the step of optimizing the generated map points by bundle adjustment prior to constructing the static map.
CN202111373890.7A 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation Pending CN114140527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373890.7A CN114140527A (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373890.7A CN114140527A (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Publications (1)

Publication Number Publication Date
CN114140527A true CN114140527A (en) 2022-03-04

Family

ID=80390414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373890.7A Pending CN114140527A (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN114140527A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524026A (en) * 2023-05-08 2023-08-01 哈尔滨理工大学 Dynamic vision SLAM method based on frequency domain and semantics
CN116524026B (en) * 2023-05-08 2023-10-27 哈尔滨理工大学 Dynamic vision SLAM method based on frequency domain and semantics
CN116958265A (en) * 2023-09-19 2023-10-27 交通运输部天津水运工程科学研究所 Ship pose measurement method and system based on binocular vision

Similar Documents

Publication Publication Date Title
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
WO2021233029A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN111201451B (en) Method and device for detecting object in scene based on laser data and radar data of scene
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
CN110827395B (en) Instant positioning and map construction method suitable for dynamic environment
CN110322511B (en) Semantic SLAM method and system based on object and plane features
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN112396595B (en) Semantic SLAM method based on point-line characteristics in dynamic environment
CN108537844B (en) Visual SLAM loop detection method fusing geometric information
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112435262A (en) Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN110070578B (en) Loop detection method
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
WO2021114776A1 (en) Object detection method, object detection device, terminal device, and medium
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
CN111899345B (en) Three-dimensional reconstruction method based on 2D visual image
CN111998862A (en) Dense binocular SLAM method based on BNN
CN115410167A (en) Target detection and semantic segmentation method, device, equipment and storage medium
CN114088081A (en) Map construction method for accurate positioning based on multi-segment joint optimization
CN112634305B (en) Infrared visual odometer implementation method based on edge feature matching
Shi et al. Dense semantic 3D map based long-term visual localization with hybrid features
CN116468786A (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN114648639B (en) Target vehicle detection method, system and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination