CN116468786B - Semantic SLAM method based on point-line combination and oriented to dynamic environment

Semantic SLAM method based on point-line combination and oriented to dynamic environment

Info

Publication number
CN116468786B
CN116468786B (application CN202211619407.3A)
Authority
CN
China
Prior art keywords
point
matching
line
points
lines
Prior art date
Legal status
Active
Application number
CN202211619407.3A
Other languages
Chinese (zh)
Other versions
CN116468786A (en)
Inventor
杨健
董军宇
范浩
饶源
时正午
杨凯
李丛
刘伊美
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202211619407.3A
Publication of CN116468786A
Application granted
Publication of CN116468786B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a semantic SLAM method based on point-line combination and oriented to a dynamic environment, which improves on the basis of ORB-SLAM3. The method extracts point and line features and uses them for robust and accurate matching and relocalization in scenes lacking texture and under illumination change, so as to estimate the camera pose and reduce localization and relocalization errors; the algorithm thus addresses the problems of failed feature-point detection and difficult localization in weak-texture regions and scenes with illumination change.

Description

Semantic SLAM method based on point-line combination and oriented to dynamic environment
Technical Field
The invention relates to the field of computer vision, in particular to a semantic SLAM method based on point-line combination and oriented to a dynamic environment.
Background
Simultaneous Localization and Mapping (SLAM) technology refers to a robot collecting information about its surroundings with on-board sensors in an unknown environment, estimating its own position by an algorithm, and building a map of the surroundings. Visual SLAM mainly uses cameras to acquire data, including monocular, binocular and RGB-D cameras. Because camera sensors offer high cost-effectiveness, small size, low power consumption and rich environmental information, visual SLAM has become a popular research field in recent years.
Traditional visual SLAM algorithms can obtain good feature matching in static scenes, but mismatches occur in dynamic scenes, causing large errors in the localization and mapping of the SLAM system. Therefore, to address the reduced localization accuracy and robustness of a SLAM system when dynamic moving objects are present in the application scene, a semantic SLAM method and system based on feature points and feature lines is proposed.
Existing semantic SLAM techniques mainly target scenes containing dynamic objects. They typically either delete all pixels belonging to prior dynamic objects and use the remaining pixels for feature extraction and subsequent localization, or delete all dynamic feature points and use only static feature points for feature matching and back-end processing. These methods can improve camera localization accuracy in texture-rich dynamic scenes, but in dynamic scenes with low texture and strong illumination change, relying only on feature points and semantic information makes it difficult to obtain enough data, which easily causes the SLAM system to lose tracking and reduces localization accuracy.
At present, research on vision-based SLAM algorithms has made great progress, for example ORB-SLAM2 (Oriented FAST and Rotated BRIEF SLAM), LSD-SLAM (Large-Scale Direct monocular SLAM) and the like. However, these algorithms generally rest on a strong assumption: a static working environment with many features and no obvious illumination change, which strictly limits the applicable environments. This assumption affects the applicability of visual SLAM systems in real scenes; when the environment is a dynamic weak-texture area with illumination change, feature points are sensitive to the scene and hard to detect, the accuracy and robustness of camera pose estimation decrease, vision-based localization accumulates errors, and the three-dimensional reconstruction result deviates greatly.
During localization and mapping by a mobile robot, the camera is typically in motion, which makes classical motion-segmentation methods such as background subtraction unusable in visual SLAM. Early SLAM systems mostly employed data-optimization methods to reduce the influence of dynamic objects. A Random Sample Consensus (RANSAC) algorithm is used to roughly estimate the fundamental matrix between two frames; semantic information is combined with a motion-consistency detection result to build a two-stage semantic knowledge base, and all feature points inside dynamic contours are deleted as noise or outliers. Inter-frame feature-point matches on dynamic objects are eliminated with the RANSAC algorithm, reducing the influence of dynamic objects on the SLAM system to a certain extent. These methods all implicitly assume that the objects in the image are mostly static, and fail when the data generated by dynamic objects exceeds a certain threshold.
In the prior art, research on visual localization, robot navigation and the like in feature-rich scenes such as cities and indoor environments has made some progress, but much of this research remains insufficient. For low-texture scenes with geometric features and illumination change, visual localization still faces the following problems:
(1) In feature detection, existing methods are affected by object occlusion and missing parts, making it difficult to detect complete geometric features from the image and hence to compute the camera pose;
(2) Existing methods are affected by the scarcity of texture and feature points in low-texture images, so image features are hard to extract or are mismatched, SLAM tracking and relocalization fail, and camera pose estimation deteriorates;
(3) In areas with obvious illumination change, feature-point detection is sensitive, and problems such as undetectable or unmatched feature points readily occur, making the camera pose inaccurate.
To address this, the method combines Mask R-CNN with multi-view geometry to realize instance segmentation and rejection of dynamic targets, identifies dynamic feature points at the same time, eliminates the interference of dynamic targets on feature matching, and removes their influence on the SLAM system.
Disclosure of Invention
The invention improves on the basis of ORB-SLAM3 and proposes a semantic SLAM method based on point-line features. Compared with point features, lines provide more geometric structure information about the environment, and jointly optimizing the camera pose with points and lines improves camera localization accuracy and robustness. The method extracts point and line features and uses them for robust and accurate matching and relocalization in scenes lacking texture and under illumination change, in order to estimate the camera pose and reduce localization and relocalization errors; the algorithm thus solves the problems of failed feature-point detection and difficult localization in weak-texture regions and scenes with illumination change.
The invention is realized by the following technical scheme: a semantic SLAM method based on point-line combination and oriented to a dynamic environment, which specifically comprises the following steps:
step S1: acquiring an image stream of the scene, feeding it frame by frame into a CNN network, segmenting objects with prior dynamic properties pixel by pixel, separating the dynamic objects in the scene to obtain key frame images, and completing the static scene occluded by dynamic targets using information from the previous frames;
step S2: for step S1: extracting feature points and feature lines from the obtained key frame image, constructing a local map related to the current frame image, including a key frame image sharing a common view point with the current frame image and adjacent frame images of the key frame image, searching feature points and line segments matched with the current frame image in the key frame image and the adjacent frame images of the key frame image, then carrying out dynamic consistency check on the prior dynamic object, removing the feature points and the feature lines on the dynamic object, reserving the feature points and the feature lines on the static object, and carrying out matching by utilizing the rest static feature points and the rest static lines;
step S3: matching the characteristic points and the characteristic lines in the step S2, filtering at the same time, removing the points and the lines which are incorrectly matched to obtain correct matching point pairs and line pairs, and obtaining the initial camera pose by using the matching point pairs;
step S4: calculating the camera pose of the current frame through the matching point pair and the line pair obtained in the step S3, and obtaining accurate camera pose estimation by minimizing the re-projection error of the point pair and the line pair;
step S5: constructing a local map about a scene by utilizing a key frame image, carrying out instance segmentation on each frame image, merging characteristic points and characteristic lines in each instance into corresponding instances, positioning a camera pose by utilizing the characteristic points and the characteristic lines, and calculating point clouds of objects and the scene to obtain a sparse point cloud map;
step S6: performing pose optimization using loop detection, correcting drift errors, and obtaining a more accurate camera pose estimate.
As a preferred scheme, step S1 further extracts feature points and feature lines from the static region of the key frame image, which specifically comprises the following steps: extracting ORB feature points from the static region of the image and computing ORB descriptors at the same time to obtain the feature points and descriptors of the static region; extracting line features from the image with dynamic objects removed, where the line-feature extraction adopts a Transformer network structure and fuses feature information at different scales through a series of up-sampling and down-sampling operations to obtain the line features in the static region of the image.
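As a concrete illustration of this step, the sketch below restricts ORB feature extraction to the static region using OpenCV; the mask construction and parameter values are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (not the patented implementation): ORB feature extraction
# restricted to the static region of a frame, assuming OpenCV is available.
import cv2
import numpy as np

def extract_static_orb(gray_image: np.ndarray, static_mask: np.ndarray):
    """Detect ORB keypoints and descriptors only inside the static region.

    `static_mask` is assumed to be a uint8 mask where dynamic-object pixels
    (e.g. from the CNN segmentation of step S1) have been set to 0.
    """
    orb = cv2.ORB_create(nfeatures=1000)  # feature budget is an illustrative choice
    keypoints, descriptors = orb.detectAndCompute(gray_image, mask=static_mask)
    return keypoints, descriptors
```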
Further, the extracted line features use a horizontal distance $d_x$ and a vertical distance $d_y$ to generate a vector $\vec{v}$ that predicts the positions of the two end points of a single line segment, thereby obtaining the line feature, where $(x_1, y_1)$ and $(x_2, y_2)$ represent the coordinates of the left and right end points of the segment, $(x_m, y_m)$ is the midpoint coordinate of the segment, and $\vec{v}$ represents the vector relating the right end-point coordinates $(x_2, y_2)$ to the midpoint coordinates $(x_m, y_m)$. In the present method $d_x$ and $d_y$ are expressed as: $d_x = x_2 - x_m$, $d_y = y_2 - y_m$.
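Under the reconstruction of $d_x$ and $d_y$ given above, the two end points of a segment could be recovered from the predicted midpoint and displacement as in the following illustrative sketch; the symmetry of the left end point about the midpoint is our assumption, not stated in the patent.

```python
# Illustrative sketch only: recovering the two end points of a line segment
# from a predicted midpoint (x_m, y_m) and displacement vector (d_x, d_y),
# assuming the reconstruction d_x = x2 - x_m, d_y = y2 - y_m used above.
def endpoints_from_midpoint(x_m: float, y_m: float, d_x: float, d_y: float):
    right = (x_m + d_x, y_m + d_y)   # (x2, y2)
    left = (x_m - d_x, y_m - d_y)    # (x1, y1), by symmetry about the midpoint
    return left, right
```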
As a preferred solution, the matching of feature points and feature lines in step S3 specifically comprises the following steps: for feature-point matching, ORB descriptors are generated and a fast nearest-neighbor search finds, in the current frame, the feature point with the closest descriptor distance as the matching point; mismatched pairs are then rejected: when the descriptor distance of the best match is larger than a threshold γ, or the ratio of the best to the second-best match distance is close to 1 (meaning the second-best match is nearly as good as the best), the match pair is considered prone to mismatching and is rejected. For feature-line matching, 2D-2D matching line pairs are obtained through geometric constraints, mapped to 3D space directly after outlier rejection, and accurate 2D-3D line matching pairs are then obtained by minimizing the reprojection error.
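A minimal sketch of the point-matching filter described above, assuming OpenCV's brute-force Hamming matcher; the threshold γ and ratio values are illustrative, not the patent's.

```python
# Hedged sketch of the point-matching filter: brute-force Hamming matching of
# ORB descriptors with a distance threshold `gamma` and a Lowe-style ratio test.
import cv2

def match_orb_descriptors(desc_query, desc_train, gamma=64, ratio=0.8):
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc_query, desc_train, k=2)
    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        best, second = pair
        # Reject if the best match is too far, or nearly as bad as the second best.
        if best.distance > gamma:
            continue
        if best.distance > ratio * second.distance:
            continue
        good.append(best)
    return good
```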
As a preferred solution, the optimization of the camera pose by minimizing the re-projection errors of the point pairs and the line pairs in step S4 is specifically implemented as follows:
The camera pose is jointly optimized with points and lines, and the minimized reprojection error is defined as:

$$E(T) = \sum_{j} \lambda_p \left\| p_j - f_p(T, P_j) \right\|^2 + \sum_{i=1}^{N} \lambda_l \, e_\theta\!\left(l_i, f_l(T, L_i)\right)^2$$

where the first sum runs over the matched point pairs and the second over the $N$ 2D-3D matching line pairs, the function $f_l$ projects the 3D line $L_i$ onto the 2D image plane, the angle error $e_\theta$ is defined by the two planes $\pi_1$ and $\pi_2$, the function $f_p$ projects the 3D point $P_j$ onto the 2D image plane, and $\lambda_p$ and $\lambda_l$ are given weights. The camera pose $T$ is optimized by minimizing this reprojection error.
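For illustration only, the sketch below evaluates point and line residuals of the kind that enter the reprojection error above; it assumes a standard pinhole projection for $f_p$ and substitutes an image-plane angle for the plane-based error $e_\theta$, so it is a simplified stand-in rather than the patent's exact formulation.

```python
# Illustrative residuals for a joint point-line reprojection error.
# The projection model and the 2D angle error are assumptions, not the
# patent's plane-based definition of e_theta.
import numpy as np

def project_point(K, R, t, P):
    """Pinhole projection f_p: 3D point P (world frame) -> 2D pixel."""
    Pc = R @ P + t
    uv = K @ Pc
    return uv[:2] / uv[2]

def point_residual(K, R, t, P, p_obs):
    """Difference between observed pixel p_obs and the reprojected point."""
    return p_obs - project_point(K, R, t, P)

def line_angle_residual(K, R, t, L_start, L_end, l_obs_start, l_obs_end):
    """Angle between the projected 3D line and the observed 2D line segment."""
    proj = project_point(K, R, t, L_end) - project_point(K, R, t, L_start)
    obs = np.asarray(l_obs_end, float) - np.asarray(l_obs_start, float)
    cos = np.dot(proj, obs) / (np.linalg.norm(proj) * np.linalg.norm(obs) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```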
As a preferred scheme, in step S5 the point cloud is processed through local mapping and the camera pose is optimized by global relocalization to obtain a sparse point-cloud reconstruction map, which specifically comprises the following steps:
computing the BoW vector of each frame of the data stream, computing the current frame's information including its BoW vector and co-visibility relations, inserting the current frame into the map, and updating the co-visibility graph; during tracking, each key frame carries information including feature points, feature lines and descriptors, and map points are then created by triangulation; judging whether other key frames exist in the key-frame queue, and if not, optimizing the map points and performing local BA optimization using the current frame, the key frames sharing co-visible map points with the current frame, and their adjacent frames;
finding the candidate key frames corresponding to the current frame; for each candidate key frame, matching the current frame with the key frame using the BoW dictionary, initializing with the matching relation between the current frame and the candidate key frame, and estimating the pose for each candidate key frame using EPnP.
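A hedged sketch of the EPnP pose estimation used for each candidate key frame during relocalization, assuming OpenCV; the RANSAC parameters are illustrative.

```python
# Sketch of EPnP pose estimation for relocalization (illustrative parameters).
import cv2
import numpy as np

def estimate_pose_epnp(points_3d, points_2d, K, dist_coeffs=None):
    """points_3d: Nx3 map points matched via BoW; points_2d: Nx2 pixel coords."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float32),
        np.asarray(points_2d, np.float32),
        K, dist_coeffs,
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=3.0, iterationsCount=100)
    return ok, rvec, tvec, inliers
```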
Further, in step S6, optimizing the pose of the camera through loop detection specifically includes the following steps:
Loop detection is performed on key frames using both point and line features. When three consecutive closed-loop candidate key frames all have high similarity with the current key frame, they are taken as loop candidate frames. The feature points and feature lines of each candidate loop frame are first matched with the current frame, and the similarity transformation matrix is then solved using the three-dimensional information corresponding to these feature points and feature lines. If the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed, loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
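For reference, a Sim(3) similarity transform with scale $s$, rotation $R$ and translation $t$ acts on map points as sketched below; this is generic notation for the similarity transformation solved above, not code from the patent.

```python
# Minimal sketch of applying a Sim(3) similarity transform during loop
# correction: X' = s * R @ X + t (purely illustrative notation).
import numpy as np

def apply_sim3(s: float, R: np.ndarray, t: np.ndarray, points: np.ndarray):
    """Map Nx3 points through the similarity transform with scale s."""
    return s * (points @ R.T) + t
```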
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: (1) The invention improves on the basis of ORB-SLAM3 and proposes a SLAM algorithm based on feature points, feature lines and semantic information; it combines Mask R-CNN with multi-view geometry to realize instance segmentation and rejection of dynamic targets, identifies dynamic feature points and feature lines at the same time, eliminates the interference of dynamic targets on feature matching and their influence on the SLAM system, and completes the static scene occluded by dynamic targets using information from previous frames;
(2) The invention provides a semantic SLAM system based on feature points and feature lines, which adopts a Transformer structure to extract line features; the line features extracted in this way are more accurate than those extracted by traditional methods;
Compared with point features, lines provide more geometric structure information about the environment. By extracting both point and line features, matching can be performed more accurately and robustly in weak-texture scenes and under illumination change, camera pose estimation is realized, and localization and relocalization errors are reduced; the algorithm thus solves the problem of difficult localization in low-texture scenes.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a feature line detection diagram;
FIG. 2 is a flow chart of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The semantic SLAM method based on the dotted line combination for the dynamic environment according to the embodiment of the present invention is specifically described below with reference to fig. 1 to 2.
As shown in fig. 1 and fig. 2, the invention provides a semantic SLAM method based on point-line combination and oriented to a dynamic environment, which is characterized by comprising the following steps:
step S1: acquiring an image stream of a scene, transmitting the image stream into a CNN network frame by frame, dividing objects with priori dynamic properties such as pedestrians, vehicles, fish and the like pixel by pixel, and dividing dynamic objects in the scene to obtainThe key frame image is used for complementing the static scene shielded by the dynamic target by utilizing the information of the previous frames; extracting feature points and feature lines of a static region on a key frame image, and extracting the feature points and the feature lines of the static region of the key frame image, wherein the method specifically comprises the following steps of: and extracting the characteristics of the image static region by using ORB characteristic points, simultaneously calculating ORB descriptors to obtain characteristic points and descriptors of the image static region, extracting line characteristics of the image from which the dynamic object is removed, wherein the extraction of the line characteristics adopts a network structure of a transducer, and the line characteristics on the image static region are obtained by fusing characteristic information under different scales through a series of up-sampling and down-sampling operations. Extracting line characteristics by using length of line segmentAnd the angle theta acquires two endpoints of the line segment, and for the long line segment, the small change of the angle can greatly influence the position of the endpoint of the line segment, so that larger line error is caused, and the method adopts horizontal distance +.>And vertical distance->Generating vector->To predict the positions of the two end points of a single line segment to obtain line characteristics, wherein +.>And->Representing coordinates of left and right end points of the line segment, < >>Is the midpoint coordinate of the line segment, ">Represent right endpoint->Coordinates and midpoint->A vector of the relation between coordinates, in the method +.>And->Expressed as: />,/>
Step S2: for step S1: extracting feature points and feature lines from the obtained key frame image, constructing a local map related to the current frame image, including a key frame image sharing a common view point with the current frame image and adjacent frame images of the key frame image, searching feature points and line segments matched with the current frame image in the key frame image and the adjacent frame images of the key frame image, then carrying out dynamic consistency check on the prior dynamic object, removing the feature points and the feature lines on the dynamic object, reserving the feature points and the feature lines on the static object, and carrying out matching by utilizing the rest static feature points and the rest static lines;
Step S3: matching the feature points and feature lines from step S2 while filtering, removing incorrectly matched points and lines to obtain correct matching point pairs and line pairs, and obtaining the initial camera pose from the matching point pairs. The matching of feature points and feature lines specifically comprises the following steps: for feature-point matching, ORB descriptors are generated and a fast nearest-neighbor search finds, in the current frame, the feature point with the closest descriptor distance as the matching point; mismatched pairs are then rejected: when the descriptor distance of the best match is larger than a threshold γ, or the ratio of the best to the second-best match distance is close to 1 (meaning the second-best match is nearly as good as the best), the match pair is considered prone to mismatching and is rejected. For feature-line matching, 2D-2D matching line pairs are obtained through geometric constraints, mapped to 3D space directly after outlier rejection, and accurate 2D-3D line matching pairs are then obtained by minimizing the reprojection error. The initial camera pose calculation specifically comprises the following steps: the fundamental matrix and essential matrix are computed from the feature points and feature lines, and a relatively accurate pose transformation matrix between cameras is obtained through SVD decomposition.
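A minimal sketch of the initial pose recovery mentioned at the end of step S3 (essential matrix followed by SVD-based decomposition), assuming OpenCV, which performs the decomposition inside recoverPose; the RANSAC parameters are illustrative.

```python
# Illustrative sketch: estimate the essential matrix from static point matches
# and recover the relative camera pose (OpenCV decomposes E via SVD internally).
import cv2
import numpy as np

def initial_pose(pts_prev, pts_curr, K):
    """pts_prev, pts_curr: Nx2 matched static feature points in two frames."""
    E, inlier_mask = cv2.findEssentialMat(
        pts_prev, pts_curr, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inlier_mask)
    return R, t, inlier_mask
```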
Step S4: calculating the camera pose of the current frame through the matching point pair and the line pair obtained in the step S3, and obtaining accurate camera pose estimation by minimizing the re-projection error of the point pair and the line pair; the specific implementation of optimizing the camera pose by minimizing the reprojection error of the point pair and the line pair is as follows:
The camera pose is jointly optimized with points and lines, and the minimized reprojection error is defined as:

$$E(T) = \sum_{j} \lambda_p \left\| p_j - f_p(T, P_j) \right\|^2 + \sum_{i=1}^{N} \lambda_l \, e_\theta\!\left(l_i, f_l(T, L_i)\right)^2$$

where the first sum runs over the matched point pairs and the second over the $N$ 2D-3D matching line pairs, the function $f_l$ projects the 3D line $L_i$ onto the 2D image plane, the angle error $e_\theta$ is defined by the two planes $\pi_1$ and $\pi_2$, the function $f_p$ projects the 3D point $P_j$ onto the 2D image plane, and $\lambda_p$ and $\lambda_l$ are given weights. The camera pose $T$ is optimized by minimizing this reprojection error.
Step S5: constructing a local map about a scene by utilizing a key frame image, carrying out instance segmentation on each frame image, merging characteristic points and characteristic lines in each instance into corresponding instances, positioning a camera pose by utilizing the characteristic points and the characteristic lines, calculating point clouds of an object and the scene, carrying out point cloud processing by utilizing the local map, and optimizing the camera pose by utilizing global repositioning, thereby obtaining a sparse point cloud reconstruction map, and specifically comprising the following steps:
computing the BoW vector of each frame of the data stream, computing the current frame's information including its BoW vector and co-visibility relations, inserting the current frame into the map, and updating the co-visibility graph; during tracking, each key frame carries information including feature points, feature lines and descriptors, but not all feature points become 3D map points, so unqualified feature points and feature lines need to be removed, after which map points are created by triangulation; judging whether other key frames exist in the key-frame queue, and if not, optimizing the map points and performing local BA optimization using the current frame, the key frames sharing co-visible map points with the current frame, and their adjacent frames;
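A brief sketch of the triangulation used above to create map points between two keyframes, assuming OpenCV and projection matrices built from the keyframe poses; variable names are illustrative.

```python
# Hedged sketch: triangulate matched pixel observations from two keyframes
# into 3D map points, with P = K [R|t] for each keyframe.
import cv2
import numpy as np

def triangulate_map_points(K, R1, t1, R2, t2, pts1, pts2):
    """pts1, pts2: 2xN matched pixel coordinates in the two keyframes."""
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4xN result
    return (X_h[:3] / X_h[3]).T                       # Nx3 map points
```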
finding the candidate key frames corresponding to the current frame; for each candidate key frame, matching the current frame with the key frame using the BoW dictionary, initializing with the matching relation between the current frame and the candidate key frame, and estimating the pose for each candidate key frame using EPnP.
Step S6: performing pose optimization using loop detection, correcting drift errors, and obtaining a more accurate camera pose estimate, which specifically comprises the following steps:
Loop detection is performed on key frames using both point and line features. When three consecutive closed-loop candidate key frames all have high similarity with the current key frame, they are taken as loop candidate frames. The feature points and feature lines of each candidate loop frame are first matched with the current frame, and the similarity transformation matrix is then solved using the three-dimensional information corresponding to these feature points and feature lines. If the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed, loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
In the description of the present invention, the term "plurality" means two or more, unless explicitly defined otherwise, the orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention; the terms "coupled," "mounted," "secured," and the like are to be construed broadly, and may be fixedly coupled, detachably coupled, or integrally connected, for example; can be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. The semantic SLAM method based on the point-line combination for the dynamic environment is characterized by comprising the following steps of:
step S1: acquiring an image stream of the scene, feeding it frame by frame into a CNN network, segmenting objects with prior dynamic properties pixel by pixel, separating the dynamic objects in the scene to obtain key frame images, and completing the static scene occluded by dynamic targets using information from the previous frames; extracting feature points and feature lines of the static region of the key frame image, which specifically comprises the following steps: extracting ORB feature points from the static region of the image and computing ORB descriptors at the same time to obtain the feature points and descriptors of the static region; extracting line features from the image with dynamic objects removed, where the line-feature extraction adopts a Transformer network structure and fuses feature information at different scales through a series of up-sampling and down-sampling operations to obtain the line features in the static region; the line features are extracted by using a horizontal distance $d_x$ and a vertical distance $d_y$ to generate a vector $\vec{v}$ that predicts the positions of the two end points of a single line segment, thereby obtaining the line feature, where $(x_1, y_1)$ and $(x_2, y_2)$ represent the coordinates of the left and right end points of the segment, $(x_m, y_m)$ is the midpoint coordinate of the segment, and $\vec{v}$ represents the vector relating the right end-point coordinates to the midpoint coordinates; in the present method $d_x$ and $d_y$ are expressed as: $d_x = x_2 - x_m$, $d_y = y_2 - y_m$;
Step S2: for step S1: extracting feature points and feature lines from the obtained key frame image, constructing a local map related to the current frame image, including a key frame image sharing a common view point with the current frame image and adjacent frame images of the key frame image, searching feature points and line segments matched with the current frame image in the key frame image and the adjacent frame images of the key frame image, then carrying out dynamic consistency check on the prior dynamic object, removing the feature points and the feature lines on the dynamic object, reserving the feature points and the feature lines on the static object, and carrying out matching by utilizing the rest static feature points and the rest static lines;
step S3: matching the feature points and feature lines from step S2 while filtering, removing incorrectly matched points and lines to obtain correct matching point pairs and line pairs, and obtaining the initial camera pose from the matching point pairs; the matching of feature points and feature lines specifically comprises the following steps: for feature-point matching, ORB descriptors are generated and a fast nearest-neighbor search finds, in the current frame, the feature point with the closest descriptor distance as the matching point; mismatched pairs are then rejected: when the descriptor distance of the best match is larger than a threshold γ, or the ratio of the best to the second-best match distance is close to 1 (meaning the second-best match is nearly as good as the best), the match pair is considered prone to mismatching and is rejected; for feature-line matching, 2D-2D matching line pairs are obtained through geometric constraints, mapped to 3D space directly after outlier rejection, and accurate 2D-3D line matching pairs are then obtained by minimizing the reprojection error;
step S4: calculating the camera pose of the current frame through the matching point pair and the line pair obtained in the step S3, and obtaining accurate camera pose estimation by minimizing the re-projection error of the point pair and the line pair; the specific implementation of optimizing the camera pose by minimizing the reprojection error of the point pair and the line pair is as follows:
The camera pose is jointly optimized with points and lines, and the minimized reprojection error is defined as:

$$E(T) = \sum_{j} \lambda_p \left\| p_j - f_p(T, P_j) \right\|^2 + \sum_{i=1}^{N} \lambda_l \, e_\theta\!\left(l_i, f_l(T, L_i)\right)^2$$

where the first sum runs over the matched point pairs and the second over the $N$ 2D-3D matching line pairs, the function $f_l$ projects the 3D line $L_i$ onto the 2D image plane, the angle error $e_\theta$ is defined by the two planes $\pi_1$ and $\pi_2$, the function $f_p$ projects the 3D point $P_j$ onto the 2D image plane, and $\lambda_p$ and $\lambda_l$ are given weights; the camera pose $T$ is optimized by minimizing this reprojection error;
Step S5: constructing a local map about a scene by utilizing a key frame image, carrying out instance segmentation on each frame image, merging characteristic points and characteristic lines in each instance into corresponding instances, positioning a camera pose by utilizing the characteristic points and the characteristic lines, and calculating point clouds of objects and the scene to obtain a sparse point cloud map;
step S6: performing pose optimization using loop detection, correcting drift errors, and obtaining a more accurate camera pose estimate.
2. The semantic SLAM method based on point-line combination for dynamic environment according to claim 1, wherein in step S5, the point cloud processing is performed by local mapping, and the camera pose is optimized by global repositioning, so as to obtain a sparse point cloud reconstruction map, which specifically comprises the following steps:
computing the BoW vector of each frame of the data stream, computing the current frame's information including its BoW vector and co-visibility relations, inserting the current frame into the map, and updating the co-visibility graph; during tracking, each key frame carries information including feature points, feature lines and descriptors, and map points are then created by triangulation; judging whether other key frames exist in the key-frame queue, and if not, optimizing the map points and performing local BA optimization using the current frame, the key frames sharing co-visible map points with the current frame, and their adjacent frames;
finding the candidate key frames corresponding to the current frame; for each candidate key frame, matching the current frame with the key frame using the BoW dictionary, initializing with the matching relation between the current frame and the candidate key frame, and estimating the pose for each candidate key frame using EPnP.
3. The semantic SLAM method based on point-line combination for dynamic environment according to claim 2, wherein optimizing the camera pose by loop detection in step S6 specifically comprises the steps of:
Loop detection is performed on key frames using both point and line features. When three consecutive closed-loop candidate key frames all have high similarity with the current key frame, they are taken as loop candidate frames. The feature points and feature lines of each candidate loop frame are first matched with the current frame, and the similarity transformation matrix is then solved using the three-dimensional information corresponding to these feature points and feature lines. If the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed, loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
CN202211619407.3A 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment Active CN116468786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211619407.3A CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211619407.3A CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Publications (2)

Publication Number Publication Date
CN116468786A CN116468786A (en) 2023-07-21
CN116468786B (en) 2023-12-26

Family

ID=87181281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211619407.3A Active CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Country Status (1)

Country Link
CN (1) CN116468786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173342A (en) * 2023-11-02 2023-12-05 中国海洋大学 Underwater monocular and binocular camera-based natural light moving three-dimensional reconstruction device and method

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489501A (en) * 2019-07-24 2019-11-22 西北工业大学 SLAM system rapid relocation algorithm based on line feature
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 Visual SLAM method based on deep learning semantic segmentation
CN112381890A (en) * 2020-11-27 2021-02-19 上海工程技术大学 RGB-D vision SLAM method based on dotted line characteristics
CN112396595A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on point-line characteristics in dynamic environment
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112446882A (en) * 2020-10-28 2021-03-05 北京工业大学 Robust visual SLAM method based on deep learning in dynamic scene
CN113837277A (en) * 2021-09-24 2021-12-24 东南大学 Multisource fusion SLAM system based on visual point-line feature optimization
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN114283199A (en) * 2021-12-29 2022-04-05 北京航空航天大学 Dynamic scene-oriented dotted line fusion semantic SLAM method
CN114627309A (en) * 2022-03-11 2022-06-14 长春工业大学 Visual SLAM method based on dotted line features in low texture environment
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11017545B2 (en) * 2018-06-07 2021-05-25 Uisee Technologies (Beijing) Ltd. Method and device of simultaneous localization and mapping

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489501A (en) * 2019-07-24 2019-11-22 西北工业大学 SLAM system rapid relocation algorithm based on line feature
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 Visual SLAM method based on deep learning semantic segmentation
CN112446882A (en) * 2020-10-28 2021-03-05 北京工业大学 Robust visual SLAM method based on deep learning in dynamic scene
CN112396595A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on point-line characteristics in dynamic environment
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112381890A (en) * 2020-11-27 2021-02-19 上海工程技术大学 RGB-D vision SLAM method based on dotted line characteristics
CN113837277A (en) * 2021-09-24 2021-12-24 东南大学 Multisource fusion SLAM system based on visual point-line feature optimization
CN114283199A (en) * 2021-12-29 2022-04-05 北京航空航天大学 Dynamic scene-oriented dotted line fusion semantic SLAM method
CN114627309A (en) * 2022-03-11 2022-06-14 长春工业大学 Visual SLAM method based on dotted line features in low texture environment
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Monocular Visual Simultaneous Localization and Mapping Algorithm Based on Point and Line Features; Wang Dan et al.; Robot (Issue 03); full text *

Also Published As

Publication number Publication date
CN116468786A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN110389348B (en) Positioning and navigation method and device based on laser radar and binocular camera
CN108986037B (en) Monocular vision odometer positioning method and positioning system based on semi-direct method
US9330471B2 (en) Camera aided motion direction and speed estimation
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
CN110322511B (en) Semantic SLAM method and system based on object and plane features
CN109579825B (en) Robot positioning system and method based on binocular vision and convolutional neural network
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN108776989B (en) Low-texture planar scene reconstruction method based on sparse SLAM framework
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN107862735B (en) RGBD three-dimensional scene reconstruction method based on structural information
CN112484746B (en) Monocular vision auxiliary laser radar odometer method based on ground plane
US10991105B2 (en) Image processing device
CN113658337B (en) Multi-mode odometer method based on rut lines
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN113223045A (en) Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN111998862A (en) Dense binocular SLAM method based on BNN
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN114088081A (en) Map construction method for accurate positioning based on multi-segment joint optimization
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112037282B (en) Aircraft attitude estimation method and system based on key points and skeleton
CN116385538A (en) Visual SLAM method, system and storage medium for dynamic scene
CN116128966A (en) Semantic positioning method based on environmental object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant