CN110930519A - Semantic ORB-SLAM sensing method and device based on environment understanding - Google Patents

Semantic ORB-SLAM sensing method and device based on environment understanding

Info

Publication number
CN110930519A
Authority
CN
China
Prior art keywords
frame
sequence
orb
key frame
slam
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911113708.7A
Other languages
Chinese (zh)
Other versions
CN110930519B (en)
Inventor
柯晶晶
周广兵
蒙仕格
郑辉
林飞堞
陈惠纲
王珏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Robotics Innovation Research Institute
Original Assignee
South China Robotics Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Robotics Innovation Research Institute filed Critical South China Robotics Innovation Research Institute
Priority to CN201911113708.7A
Publication of CN110930519A
Application granted
Publication of CN110930519B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/003 Navigation within 3D models or images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic ORB-SLAM perception method and device based on environment understanding, wherein the method comprises the following steps: inputting sequence frames into an ORB-SLAM front-end Tracking thread for key frame extraction processing to acquire key frame data; inputting the key frame data into an adjacent key frame graph optimization thread for key frame data optimization processing to acquire graph-optimized key frame data; calculating an error value between the graph-optimized key frame data and generating a candidate set based on the error value; and performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on the correction result. In the embodiment of the invention, the robot's environment perception is markedly improved, and the robot can obtain higher-level cognitive information about a scene, which provides a more natural application mode for application fields including robot navigation, augmented reality and autonomous driving.

Description

Semantic ORB-SLAM sensing method and device based on environment understanding
Technical Field
The invention relates to the technical field of intelligent robot perception, in particular to a semantic ORB-SLAM perception method and device based on environment understanding.
Background
Simultaneous Localization and Mapping (SLAM) is the basis on which a mobile robot achieves autonomous navigation in an unknown environment, and is one of the preconditions for autonomy and intelligence. At present, visual SLAM can perform real-time positioning and three-dimensional map construction in a static environment within a certain range; however, the map generated by traditional visual SLAM contains only simple geometric information (points, lines and the like) or low-level pixel information (color, brightness and the like), and no semantic information. Although such simple geometric and pixel-level information can support autonomous navigation of a robot in a single environment, it cannot satisfy the requirements of a mobile robot performing higher-level tasks.
Patent CN201811514700 discloses a visual SLAM method based on ORB features, which merely replaces traditional SIFT feature extraction with ORB features in the front-end stage and judges feature matches by Hamming distance; this reduces the amount of computation to a certain extent and improves the real-time performance of visual SLAM. In the back-end module, a graph optimization idea is adopted, and the accuracy of loop detection is improved by a point cloud fusion optimization scheme that combines local and global loops.
However, although replacing traditional SIFT feature extraction with ORB features effectively improves computation speed, this visual SLAM method works only in static scenes or scenes with a small number of dynamic objects: if a large number of feature points fall on dynamic objects, the SLAM tracking and positioning result drifts with the motion of those objects, which greatly degrades the robot's mapping and positioning accuracy and may even cause pose calculation to fail. Moreover, most of the pixel information in the original picture is discarded while feature points are generated, so effective semantic information is lacking, which seriously limits the robot's understanding of its perceived environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a semantic ORB-SLAM perception method and device based on environment understanding, which markedly improve the robot's environment perception, enable the robot to obtain higher-level cognitive information about a scene, and provide a more natural application mode for application fields including robot navigation, augmented reality and autonomous driving.
In order to solve the above technical problem, an embodiment of the present invention provides a semantic ORB-SLAM sensing method based on environment understanding, where the method includes:
inputting the sequence frame into an ORB-SLAM front end Tracking thread to carry out key frame extraction processing, and acquiring key frame data;
inputting the key frame data into an adjacent key frame graph optimization thread to perform key frame data optimization processing, and acquiring graph-optimized key frame data;
calculating an error value between the graph optimized key frame data, and generating a candidate set based on the error value;
and performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on a correction result.
Optionally, the inputting the sequence frame into an ORB-SLAM front end Tracking thread to perform key frame extraction processing, and acquiring key frame data includes:
an ORB-SLAM front end Tracking thread adopts an interframe difference method to carry out dynamic background removal processing on an input sequence frame, and a sequence frame with a dynamic background removed is obtained;
establishing a mapping relation between the sequence frame with the dynamic background removed and the object characteristic points, and acquiring a sequence frame in the mapping relation with the object characteristic points;
carrying out ORB feature extraction processing on the sequence frame in mapping relation with the object feature points to obtain ORB features of the sequence frame;
matching ORB characteristics of the current frame with ORB characteristics of the previous frame to obtain matched characteristic point pairs;
performing pose estimation and repositioning processing based on the matching feature point pairs to obtain pose estimation and repositioning results;
and performing pose estimation and repositioning processing according to the matched adjacent sequence frames to obtain pose optimization of adjacent frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames.
Optionally, the performing, by the ORB-SLAM front end Tracking thread, dynamic background removal processing on the input sequence frame by using an inter-frame difference method to obtain a sequence frame with a dynamic background removed includes:
performing difference operation on adjacent frames in the continuous time interval in the sequence frames, and performing change detection by using strong correlation of the adjacent frames in the sequence frames to obtain a moving target;
and based on the selected threshold value, eliminating the dynamic background of the moving target in the sequence frame, and acquiring the sequence frame with the dynamic background removed.
Optionally, the establishing a mapping relationship between the sequence frame without the dynamic background and the object feature point to obtain a sequence frame in a mapping relationship with the object feature point includes:
according to the image points observed by the current dynamic-background-removed sequence frame, finding the next dynamic-background-removed sequence frame that observes those image points, and using it as an adjacent sequence frame of the current dynamic-background-removed sequence frame;
taking the sequence frame of the current frame without the dynamic background as a root node, and taking the adjacent sequence frame as a child node to generate a node tree;
and constructing a mapping relation between the sequence frame without the dynamic background and the object characteristic points based on the node tree, and acquiring the sequence frame in the mapping relation with the object characteristic points.
Optionally, the performing pose estimation and repositioning processing based on the matching feature point pairs includes:
and calculating the relative displacement of the sequence frame of the current frame and the sequence frame of the previous frame by utilizing the minimized reprojection error according to the matched feature point pairs.
Optionally, the method further includes:
when pose estimation and repositioning processing fail based on the matched feature point pairs, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points;
obtaining the ORB features of the most similar sequence frame, and matching the ORB features of the current frame's sequence frame with them to obtain first matching feature point pairs;
and performing pose estimation and repositioning calculation again by using the first matching feature point pairs to obtain a pose estimation and repositioning result.
Optionally, the obtaining a sequence of key frames based on pose optimization of the adjacent frames includes:
calculating the minimum re-projection error between the adjacent frames, and establishing a common view based on the minimum re-projection error;
and extracting the sequence frame in the common view as a key sequence frame.
Optionally, the inputting the key frame data into an adjacent key frame graph optimization thread to perform key frame data optimization processing, and acquiring the graph-optimized key frame data, includes:
inputting the key frame data into the adjacent key frame graph optimization thread, and then sequentially performing redundant point elimination processing, semantic extraction processing, new image point creation processing and adjacent frame optimization processing on the key frame data to obtain the graph-optimized key frame data.
Optionally, the semantic extraction processing is performed on the key frame data after the redundant point elimination processing, and includes:
performing object detection on the key frame data subjected to the redundant point elimination processing based on a YOLO-v3 algorithm to obtain an object detection result;
performing semantic association processing on the object detection result by using a conditional random field to obtain combined object class probability and scene context information;
correcting and optimizing the combined object class probability and scene context information to generate a temporary object information candidate set;
judging whether the temporary object information in the temporary object information candidate set is a new object or an existing object, searching each point information of each temporary object information in the temporary object information candidate set in a corresponding neighborhood thereof, and acquiring a three-dimensional point closest to the point;
and calculating the Euclidean distance between the point and the three-dimensional point; if the distance is smaller than a preset threshold value, the point and the three-dimensional point are considered the same point.
In addition, an embodiment of the present invention further provides a semantic ORB-SLAM sensing apparatus based on environment understanding, where the apparatus includes:
the key frame extraction module: used for inputting sequence frames into the ORB-SLAM front-end Tracking thread for key frame extraction processing to obtain key frame data;
the key frame optimization module: used for inputting the key frame data into the adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data;
the error calculation module: used for calculating an error value between the graph-optimized key frame data and generating a candidate set based on the error value;
the synchronous positioning and map building module: used for performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on the correction result.
In the embodiment of the invention, aiming at the defects that traditional visual ORB-SLAM is easily disturbed by dynamic targets during feature extraction and that the extracted feature points contain only color, brightness and geometric information while lacking object-level environment semantic information, the ORB-SLAM front-end Tracking thread first performs a difference operation on adjacent frames of the sequence using the inter-frame difference method, sets a threshold and eliminates dynamic objects; the mapping relation between sequence frames and object feature points is then constructed and ORB feature extraction performed, and the object environment information extracted by deep-learning semantics is integrated into the ORB-SLAM system, realizing a semantic ORB-SLAM perception method with environment understanding that is stable in performance, resistant to environmental interference, accurate in matching and deeper in environment understanding. The robot's environment perception is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural application mode is provided for application fields including robot navigation, augmented reality and autonomous driving.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a semantic ORB-SLAM perception method based on environment understanding in an embodiment of the present invention;
fig. 2 is a schematic structural composition diagram of a semantic ORB-SLAM perception device based on environment understanding in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a flowchart illustrating a semantic ORB-SLAM sensing method based on environment understanding according to an embodiment of the present invention.
As shown in fig. 1, a semantic ORB-SLAM perception method based on environment understanding, the method comprising:
s11, inputting the sequence frame into an ORB-SLAM front end Tracking thread to perform key frame extraction processing, and acquiring key frame data;
in the specific implementation process of the present invention, the inputting the sequence frame into the ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data includes: an ORB-SLAM front end Tracking thread adopts an interframe difference method to carry out dynamic background removal processing on an input sequence frame, and a sequence frame with a dynamic background removed is obtained; establishing a mapping relation between the sequence frame with the dynamic background removed and the object characteristic points, and acquiring a sequence frame in the mapping relation with the object characteristic points; carrying out ORB feature extraction processing on the sequence frame in mapping relation with the object feature points to obtain ORB features of the sequence frame; matching ORB characteristics of the current frame with ORB characteristics of the previous frame to obtain matched characteristic point pairs; performing pose estimation and repositioning processing based on the matching feature point pairs to obtain pose estimation and repositioning results; and performing pose estimation and repositioning processing according to the matched adjacent sequence frames to obtain pose optimization of adjacent frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames.
Further, the ORB-SLAM front end Tracking thread performs dynamic background removal processing on the input sequence frame by using an inter-frame difference method to obtain a sequence frame with a dynamic background removed, including: performing difference operation on adjacent frames in the continuous time interval in the sequence frames, and performing change detection by using strong correlation of the adjacent frames in the sequence frames to obtain a moving target; and based on the selected threshold value, eliminating the dynamic background of the moving target in the sequence frame, and acquiring the sequence frame with the dynamic background removed.
Further, the establishing a mapping relation between the dynamic-background-removed sequence frames and the object feature points to obtain the sequence frames in mapping relation with the object feature points includes: according to the image points observed by the current dynamic-background-removed sequence frame, finding the next dynamic-background-removed sequence frame that observes those image points, and using it as an adjacent sequence frame of the current dynamic-background-removed sequence frame; taking the current dynamic-background-removed sequence frame as a root node and the adjacent sequence frames as child nodes to generate a node tree; and constructing the mapping relation between the dynamic-background-removed sequence frames and the object feature points based on the node tree, and acquiring the sequence frames in mapping relation with the object feature points.
Further, the performing pose estimation and repositioning processing based on the matching feature point pairs includes: and calculating the relative displacement of the sequence frame of the current frame and the sequence frame of the previous frame by utilizing the minimized reprojection error according to the matched feature point pairs.
Further, the method further comprises: when pose estimation and repositioning processing fail based on the matched feature point pairs, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points; obtaining the ORB features of the most similar sequence frame, and matching the ORB features of the current frame's sequence frame with them to obtain first matching feature point pairs; and performing pose estimation and repositioning calculation again by using the first matching feature point pairs to obtain a pose estimation and repositioning result.
Further, the obtaining of the key frame sequence based on the pose optimization of the adjacent frames comprises: calculating the minimum re-projection error between the adjacent frames, and establishing a common view based on the minimum re-projection error; and extracting the sequence frame in the common view as a key sequence frame.
Specifically, in the ORB-SLAM front-end Tracking thread, the sequence frames are input and dynamic background removal is performed first, eliminating noise interference and the influence of dynamic objects on the subsequent feature point extraction and matching process. Using the inter-frame difference method, adjacent frames within a continuous time interval of the sequence are extracted for the difference operation, and change detection is performed using the strong correlation of adjacent frames, so that moving targets are detected; the moving regions in the sequence frames are then removed by selecting a threshold. Among the sequence frames, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) can be represented by the binarized difference value D(x, y) as follows:
D(x, y) = 1 if |f_{k+1}(x, y) - f_k(x, y)| > T, and D(x, y) = 0 otherwise;
where T is the selected binarization difference threshold; the '1' regions of the binarized difference consist of pixels whose gray values change between the two frames, usually comprising moving objects and noise, and the '0' regions consist of pixels whose gray values do not change.
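For illustration only, the inter-frame difference step above can be sketched in Python with OpenCV; the threshold value T = 25 and the way the moving regions are masked out are assumptions, since the text only states that a threshold is selected:

```python
import cv2

def remove_dynamic_background(prev_gray, curr_gray, T=25):
    """Binarized inter-frame difference D(x, y) as defined above.

    Pixels whose gray value changes by more than the threshold T between
    frame k and frame k+1 are marked 1 (moving object or noise); pixels
    whose gray value does not change are marked 0.
    """
    diff = cv2.absdiff(curr_gray, prev_gray)             # |f_{k+1} - f_k|
    _, moving = cv2.threshold(diff, T, 1, cv2.THRESH_BINARY)
    static = curr_gray * (1 - moving)                    # keep only the static background
    return static, moving
```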
In the front-end Tracking thread, in order to integrate the extracted semantic information into the ORB-SLAM framework, a mapping relation between the dynamic-background-removed sequence frames and the object feature points needs to be established. In ORB-SLAM, each dynamic-background-removed sequence frame stores the image points it observes, and each image point in turn stores the dynamic-background-removed sequence frames that observe it; a spanning tree of ORB-SLAM is established from this relation between frames and image points. To construct the spanning tree, the image points observed by the current dynamic-background-removed sequence frame are first used to find the other dynamic-background-removed sequence frames observing them; such a frame is an adjacent sequence frame of the current frame and shares a large number of image points with it. At the same time, the current dynamic-background-removed sequence frames hold their map points, and each map point holds its associated dynamic-background-removed sequence frames. A spanning tree can therefore be generated with the current dynamic-background-removed sequence frame as the root node and the adjacent sequence frames as child nodes; within it, the relation between a child node and its parent node is determined by the number of common map points. Through the spanning tree, the current dynamic-background-removed sequence frame can find its adjacent sequence frames and thus more associated image points. The mapping relation between the dynamic-background-removed sequence frames and the objects is established as follows:
each object OiComprises the following steps:
point cloud data which are contained in an object under a world coordinate system and are obtained through calculation according to camera projection; the number of object classes and the probability of the corresponding object classes, wherein the probability is iteratively updated through an iterative Bayesian process; observing a set of keyframes for the object; the class to which the object belongs corresponds to the class of the object with the highest probability; the number of times the object is observed.
The color image corresponding to the dynamic-background-removed sequence frame is used for object detection; the depth image corresponding to the frame is used to generate object point cloud data; and the frame itself is stored as object observation information. Based on the mapping between image points and dynamic-background-removed sequence frames, once the relation between objects and frames has been constructed, each dynamic-background-removed sequence frame can find its associated objects, and each object can find its associated dynamic-background-removed sequence frames.
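A minimal sketch of the per-object record O_i described above, including the iterative Bayesian update of the class probabilities; all field and method names are illustrative, not the patent's actual data structures:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SemanticObject:
    points_world: np.ndarray                         # point cloud in world coordinates
    class_probs: dict = field(default_factory=dict)  # class name -> probability
    keyframes: set = field(default_factory=set)      # key frames observing the object
    observations: int = 0                            # number of times observed

    @property
    def label(self):
        # The class the object belongs to is the one with the highest probability.
        return max(self.class_probs, key=self.class_probs.get)

    def bayes_update(self, detection_probs):
        """Iteratively fuse a new detection's class distribution (Bayes rule)."""
        for c, p in detection_probs.items():
            self.class_probs[c] = self.class_probs.get(c, 1e-3) * p
        z = sum(self.class_probs.values())           # normalize back to a distribution
        for c in self.class_probs:
            self.class_probs[c] /= z
        self.observations += 1
```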
ORB feature extraction is performed on the sequence frames in mapping relation with the object feature points; extracting ORB feature points in place of SIFT feature points effectively reduces the amount of computation and speeds up operation.
In the ORB-SLAM front-end Tracking thread, pose estimation and repositioning are performed with respect to the previous frame's dynamic-background-removed sequence frame: the ORB features of the current frame's sequence frame are matched with those of the previous frame's sequence frame to obtain matched feature point pairs, and the relative displacement between the current frame and the previous frame is then calculated by minimizing the reprojection error over the currently matched feature point pairs. If tracking and positioning fail, the sequence frame most similar to the current frame is found through scene recognition, the current frame is matched against it to obtain matched image points, and the pose of the current frame is recalculated from these matched image points.
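The pose estimation step, which computes the relative displacement by minimizing the reprojection error over the matched feature point pairs, can be approximated with OpenCV's RANSAC PnP solver. This is a sketch under the assumption that the map points seen in the previous frame are available in 3-D; it is not the patent's exact solver:

```python
import cv2
import numpy as np

def estimate_relative_pose(pts3d_prev, pts2d_curr, K):
    """pts3d_prev: Nx3 map points seen in the previous frame;
    pts2d_curr: Nx2 matched ORB keypoints in the current frame;
    K: 3x3 camera intrinsic matrix."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_prev.astype(np.float64),
        pts2d_curr.astype(np.float64),
        K, None)                      # None: assume undistorted images
    if not ok:
        return None                   # tracking lost -> fall back to relocalization
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec, inliers
```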
Generally, two adjacent frames can simultaneously observe a portion of the same image points. The minimum reprojection error between two adjacent frames is calculated; the smaller the reprojection error, the stronger the correlation between the two adjacent sequence frames. A corresponding preset threshold is therefore set and the projection error compared against it: the error must be smaller than or equal to the preset threshold, otherwise the corresponding adjacent sequence frames are removed. On this premise a common view can be established, pose optimization between adjacent frames is formed, and the sequence frames in the common view are obtained as the key sequence frames.
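A sketch of the common-view construction: adjacent frames are linked only when their reprojection error stays within the preset threshold, and the frames retained in the view are taken as key frames; the reprojection_error function is a hypothetical caller-supplied helper:

```python
def build_common_view(frames, reprojection_error, err_threshold):
    """frames: time-ordered sequence frames; reprojection_error(a, b):
    hypothetical helper returning the minimum reprojection error between
    two adjacent frames."""
    edges = []
    for a, b in zip(frames, frames[1:]):
        if reprojection_error(a, b) <= err_threshold:
            edges.append((a, b))      # strongly correlated adjacent frames
        # pairs above the threshold are removed from the common view
    keyframes = {f for edge in edges for f in edge}
    return edges, keyframes
```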
S12: inputting the key frame data into an adjacent key frame graph optimization thread to perform key frame data optimization processing, and acquiring graph-optimized key frame data;
In a specific implementation process of the present invention, the inputting the key frame data into an adjacent key frame graph optimization thread to perform key frame data optimization processing, and acquiring the graph-optimized key frame data, includes: inputting the key frame data into the adjacent key frame graph optimization thread, and then sequentially performing redundant point elimination processing, semantic extraction processing, new image point creation processing and adjacent frame optimization processing on the key frame data to obtain the graph-optimized key frame data.
Further, the semantic extraction processing performed on the key frame data after the redundant point elimination processing includes: performing object detection on the key frame data after redundant point elimination based on the YOLO-v3 algorithm to obtain object detection results; performing semantic association processing on the object detection results using a conditional random field to combine object class probability and scene context information; correcting and optimizing the combined object class probability and scene context information to generate a temporary object information candidate set; judging whether each piece of temporary object information in the candidate set is a new object or an existing object by searching, for each point of the temporary object information, its corresponding neighborhood to acquire the closest three-dimensional point; and calculating the Euclidean distance between the point and that three-dimensional point, the two being considered the same point if the distance is smaller than a preset threshold.
Specifically, after the key frames are obtained, they are input into the adjacent key frame graph optimization thread, redundant points are removed, and a designed semantic extraction algorithm realizes the graph optimization process between adjacent key frames. The designed semantic extraction algorithm comprises object detection, object semantic association, temporary object generation, object association, object model updating and other functions: object detection is responsible for extracting object information from the images using a deep learning network; the extracted object information undergoes semantic association, through which the detected objects are corrected and optimized and stored in a temporary object information set; object association and updating is responsible for associating the temporary object information with existing object information in the object database according to the mapping relation among key frames, object information and map points, and for updating and fusing the temporary object information into the corresponding object information.
Here, the YOLO-v3 algorithm is used for object detection: it divides each picture into N × N grid cells, performs the object detection operation only once per cell, and finally fuses the detection results together.
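For illustration, YOLO-v3 detection of this kind can be run through OpenCV's DNN module; the configuration and weight file paths, the 416 × 416 input size and the confidence threshold below are assumptions:

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # assumed paths

def detect_objects(bgr_image, conf_threshold=0.5):
    h, w = bgr_image.shape[:2]
    blob = cv2.dnn.blobFromImage(bgr_image, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    detections = []
    for out in outputs:               # one output tensor per YOLO detection scale
        for det in out:               # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            cls = int(np.argmax(scores))
            if float(scores[cls]) > conf_threshold:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                detections.append((cls, float(scores[cls]),
                                   cx - bw / 2, cy - bh / 2, bw, bh))
    return detections                 # fused results from all grid scales
```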
Semantic detection is performed on the key frames with the YOLO-v3 algorithm, the objects extracted by deep learning detection are further semantically associated using a conditional random field, and detection classification accuracy is improved by combining object class probability with scene context information. The energy equation of the designed conditional random field combining object class probability and context information is:
P(x) = (1/Z) exp(-E(x));
E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_P(x_i, x_j);
where x is the random variable of the object class; i and j range from 1 to k, with k the number of objects detected in the image; Z is a normalization factor ensuring that the result is a probability; E(x) is the energy function of the conditional random field; the unary potential function ψ_u describes the probability of the label category of a random field graph node, and the binary potential function ψ_P characterizes the correlation between the nodes of the random field graph.
The unary potential function ψ_u is as follows:
ψ_u(x_i) = -log p(x_i);
The binary potential function ψ_P is as follows:
ψ_P(x_i, x_j) = μ(x_i, x_j) Σ_m ω_m k_m(f_i, f_j);
where p(x_i) is the probability distribution over the class of the i-th object given by the YOLO-v3 model, ω_m are the linear combination weights over the kernel functions k_m of the node features f_i and f_j, and μ is the label compatibility function, representing the likelihood of different classes occurring simultaneously within a neighborhood.
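A numeric sketch of evaluating the energy E(x) above for a candidate labeling, assuming the reconstructed pairwise form ψ_P(x_i, x_j) = μ(x_i, x_j) Σ_m ω_m k_m(f_i, f_j); the argument layouts are illustrative:

```python
import numpy as np

def crf_energy(probs, mu, omega, kernels):
    """probs: k x C matrix of per-object class probabilities from YOLO-v3;
    mu: C x C label compatibility matrix; omega: M kernel weights;
    kernels: M x k x k array with kernels[m, i, j] = k_m(f_i, f_j)."""
    x = probs.argmax(axis=1)                          # candidate labeling x
    k = len(x)
    energy = -np.log(probs[np.arange(k), x]).sum()    # sum_i psi_u(x_i)
    for i in range(k):
        for j in range(i + 1, k):                     # sum over i < j
            pair = sum(omega[m] * kernels[m, i, j] for m in range(len(omega)))
            energy += mu[x[i], x[j]] * pair           # psi_P(x_i, x_j)
    return energy
```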
Semantic association is performed on the detected objects through the conditional random field, the detection results are corrected and optimized, and a temporary object information candidate set is generated; each temporary object is then judged to determine whether it is a new object or an object already present in the candidate set. For the data of each candidate object, each point of the temporary object is searched within its neighborhood, the three-dimensional point closest to it is found in the candidate object's point cloud data, and the Euclidean distance between the two points is calculated; if the distance is smaller than the set threshold, the two points are considered the same point.
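The object association test, a nearest 3-D point search plus a Euclidean distance threshold, might look as follows; the 0.05 m threshold and the majority-overlap fusion rule are assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def is_same_object(temp_points, candidate_points, dist_threshold=0.05):
    """For every point of the temporary object, find the closest 3-D point
    in the candidate object's point cloud; points closer than the preset
    threshold are treated as the same point."""
    tree = cKDTree(candidate_points)     # accelerates nearest-neighbour search
    dists, _ = tree.query(temp_points)   # Euclidean distance to the closest point
    same = dists < dist_threshold
    return bool(same.mean() > 0.5)       # assumed rule: mostly-overlapping points
```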
S13: calculating an error value between the graph optimized key frame data, and generating a candidate set based on the error value;
in the specific implementation process of the invention, the error value between the optimized key frame data of the graph is calculated, and the candidate set can be generated according to the error value.
S14: and performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on a correction result.
In the specific implementation process of the method, closed-loop correction processing is carried out on a candidate set through global map optimization and loop-back fusion; closed loop detection is realized, the positioning precision is improved, and errors are reduced; and synchronous positioning and map construction are carried out based on the correction result.
In the embodiment of the invention, aiming at the defects that traditional visual ORB-SLAM is easily disturbed by dynamic targets during feature extraction and that the extracted feature points contain only color, brightness and geometric information while lacking object-level environment semantic information, the ORB-SLAM front-end Tracking thread first performs a difference operation on adjacent frames of the sequence using the inter-frame difference method, sets a threshold and eliminates dynamic objects; the mapping relation between sequence frames and object feature points is then constructed and ORB feature extraction performed, and the object environment information extracted by deep-learning semantics is integrated into the ORB-SLAM system, realizing a semantic ORB-SLAM perception method with environment understanding that is stable in performance, resistant to environmental interference, accurate in matching and deeper in environment understanding. The robot's environment perception is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural application mode is provided for application fields including robot navigation, augmented reality and autonomous driving.
Examples
Referring to fig. 2, fig. 2 is a schematic structural composition diagram of a semantic ORB-SLAM sensing apparatus based on environment understanding according to an embodiment of the present invention.
As shown in fig. 2, a semantic ORB-SLAM aware device based on environment understanding, the device comprising:
the key frame extraction module 21: used for inputting sequence frames into the ORB-SLAM front-end Tracking thread for key frame extraction processing to obtain key frame data;
in the specific implementation process of the present invention, the inputting the sequence frame into the ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data includes: an ORB-SLAM front end Tracking thread adopts an interframe difference method to carry out dynamic background removal processing on an input sequence frame, and a sequence frame with a dynamic background removed is obtained; establishing a mapping relation between the sequence frame with the dynamic background removed and the object characteristic points, and acquiring a sequence frame in the mapping relation with the object characteristic points; carrying out ORB feature extraction processing on the sequence frame in mapping relation with the object feature points to obtain ORB features of the sequence frame; matching ORB characteristics of the current frame with ORB characteristics of the previous frame to obtain matched characteristic point pairs; performing pose estimation and repositioning processing based on the matching feature point pairs to obtain pose estimation and repositioning results; and performing pose estimation and repositioning processing according to the matched adjacent sequence frames to obtain pose optimization of adjacent frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames.
Further, the ORB-SLAM front end Tracking thread performs dynamic background removal processing on the input sequence frame by using an inter-frame difference method to obtain a sequence frame with a dynamic background removed, including: performing difference operation on adjacent frames in the continuous time interval in the sequence frames, and performing change detection by using strong correlation of the adjacent frames in the sequence frames to obtain a moving target; and based on the selected threshold value, eliminating the dynamic background of the moving target in the sequence frame, and acquiring the sequence frame with the dynamic background removed.
Further, the establishing a mapping relation between the dynamic-background-removed sequence frames and the object feature points to obtain the sequence frames in mapping relation with the object feature points includes: according to the image points observed by the current dynamic-background-removed sequence frame, finding the next dynamic-background-removed sequence frame that observes those image points, and using it as an adjacent sequence frame of the current dynamic-background-removed sequence frame; taking the current dynamic-background-removed sequence frame as a root node and the adjacent sequence frames as child nodes to generate a node tree; and constructing the mapping relation between the dynamic-background-removed sequence frames and the object feature points based on the node tree, and acquiring the sequence frames in mapping relation with the object feature points.
Further, the performing pose estimation and repositioning processing based on the matching feature point pairs includes: and calculating the relative displacement of the sequence frame of the current frame and the sequence frame of the previous frame by utilizing the minimized reprojection error according to the matched feature point pairs.
Further, the method further comprises: when pose estimation and repositioning processing fail based on the matched feature point pairs, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points; obtaining the ORB features of the most similar sequence frame, and matching the ORB features of the current frame's sequence frame with them to obtain first matching feature point pairs; and performing pose estimation and repositioning calculation again by using the first matching feature point pairs to obtain a pose estimation and repositioning result.
Further, the obtaining of the key frame sequence based on the pose optimization of the adjacent frames comprises: calculating the minimum re-projection error between the adjacent frames, and establishing a common view based on the minimum re-projection error; and extracting the sequence frame in the common view as a key sequence frame.
Specifically, in the ORB-SLAM front-end Tracking thread, the sequence frames are input and dynamic background removal is performed first, eliminating noise interference and the influence of dynamic objects on the subsequent feature point extraction and matching process. Using the inter-frame difference method, adjacent frames within a continuous time interval of the sequence are extracted for the difference operation, and change detection is performed using the strong correlation of adjacent frames, so that moving targets are detected; the moving regions in the sequence frames are then removed by selecting a threshold. Among the sequence frames, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) can be represented by the binarized difference value D(x, y) as follows:
D(x, y) = 1 if |f_{k+1}(x, y) - f_k(x, y)| > T, and D(x, y) = 0 otherwise;
where T is the selected binarization difference threshold; the '1' regions of the binarized difference consist of pixels whose gray values change between the two frames, usually comprising moving objects and noise, and the '0' regions consist of pixels whose gray values do not change.
In the front-end Tracking thread, in order to integrate the extracted semantic information into the ORB-SLAM framework, a mapping relation between the dynamic-background-removed sequence frames and the object feature points needs to be established. In ORB-SLAM, each dynamic-background-removed sequence frame stores the image points it observes, and each image point in turn stores the dynamic-background-removed sequence frames that observe it; a spanning tree of ORB-SLAM is established from this relation between frames and image points. To construct the spanning tree, the image points observed by the current dynamic-background-removed sequence frame are first used to find the other dynamic-background-removed sequence frames observing them; such a frame is an adjacent sequence frame of the current frame and shares a large number of image points with it. At the same time, the current dynamic-background-removed sequence frames hold their map points, and each map point holds its associated dynamic-background-removed sequence frames. A spanning tree can therefore be generated with the current dynamic-background-removed sequence frame as the root node and the adjacent sequence frames as child nodes; within it, the relation between a child node and its parent node is determined by the number of common map points. Through the spanning tree, the current dynamic-background-removed sequence frame can find its adjacent sequence frames and thus more associated image points. The mapping relation between the dynamic-background-removed sequence frames and the objects is established as follows:
each object OiComprises the following steps:
point cloud data which are contained in an object under a world coordinate system and are obtained through calculation according to camera projection; the number of object classes and the probability of the corresponding object classes, wherein the probability is iteratively updated through an iterative Bayesian process; observing a set of keyframes for the object; the class to which the object belongs corresponds to the class of the object with the highest probability; the number of times the object is observed.
The color image corresponding to the dynamic-background-removed sequence frame is used for object detection; the depth image corresponding to the frame is used to generate object point cloud data; and the frame itself is stored as object observation information. Based on the mapping between image points and dynamic-background-removed sequence frames, once the relation between objects and frames has been constructed, each dynamic-background-removed sequence frame can find its associated objects, and each object can find its associated dynamic-background-removed sequence frames.
ORB feature extraction is performed on the sequence frames in mapping relation with the object feature points; extracting ORB feature points in place of SIFT feature points effectively reduces the amount of computation and speeds up operation.
In the ORB-SLAM front-end Tracking thread, pose estimation and repositioning are performed with respect to the previous frame's dynamic-background-removed sequence frame: the ORB features of the current frame's sequence frame are matched with those of the previous frame's sequence frame to obtain matched feature point pairs, and the relative displacement between the current frame and the previous frame is then calculated by minimizing the reprojection error over the currently matched feature point pairs. If tracking and positioning fail, the sequence frame most similar to the current frame is found through scene recognition, the current frame is matched against it to obtain matched image points, and the pose of the current frame is recalculated from these matched image points.
Generally, two adjacent frames can simultaneously observe a portion of the same image points. The minimum reprojection error between two adjacent frames is calculated; the smaller the reprojection error, the stronger the correlation between the two adjacent sequence frames. A corresponding preset threshold is therefore set and the projection error compared against it: the error must be smaller than or equal to the preset threshold, otherwise the corresponding adjacent sequence frames are removed. On this premise a common view can be established, pose optimization between adjacent frames is formed, and the sequence frames in the common view are obtained as the key sequence frames.
the key frame optimization module 22: used for inputting the key frame data into the adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data;
In a specific implementation process of the present invention, the inputting the key frame data into an adjacent key frame graph optimization thread to perform key frame data optimization processing, and acquiring the graph-optimized key frame data, includes: inputting the key frame data into the adjacent key frame graph optimization thread, and then sequentially performing redundant point elimination processing, semantic extraction processing, new image point creation processing and adjacent frame optimization processing on the key frame data to obtain the graph-optimized key frame data.
Further, the semantic extraction processing performed on the key frame data after the redundant point elimination processing includes: performing object detection on the key frame data after redundant point elimination based on the YOLO-v3 algorithm to obtain object detection results; performing semantic association processing on the object detection results using a conditional random field to combine object class probability and scene context information; correcting and optimizing the combined object class probability and scene context information to generate a temporary object information candidate set; judging whether each piece of temporary object information in the candidate set is a new object or an existing object by searching, for each point of the temporary object information, its corresponding neighborhood to acquire the closest three-dimensional point; and calculating the Euclidean distance between the point and that three-dimensional point, the two being considered the same point if the distance is smaller than a preset threshold.
Specifically, after the key frames are obtained, they are input into the adjacent key frame graph optimization thread, redundant points are removed, and a designed semantic extraction algorithm realizes the graph optimization process between adjacent key frames. The designed semantic extraction algorithm comprises object detection, object semantic association, temporary object generation, object association, object model updating and other functions: object detection is responsible for extracting object information from the images using a deep learning network; the extracted object information is associated with corresponding semantics via semantic labels and then corrected and optimized through semantic association, so that the extracted detected objects are more accurate and reliable, and stored in a temporary object information set; object association and updating is responsible for associating the temporary object information with existing object information in the object database according to the mapping relation among key frames, object information and map points, and for updating and fusing the temporary object information into the corresponding object information.
Object detection is based on the YOLO algorithm: each picture is divided into N × N grid cells, the object detection operation is performed only once per cell, and the detection results are finally fused together; this design of YOLO avoids the problem of duplicate detection.
Semantic detection is performed on the key frames with the YOLO algorithm, the objects extracted by deep learning detection are further semantically associated using a conditional random field, and detection classification accuracy is improved by combining object class probability with scene context information. The energy equation of the designed conditional random field combining object class probability and context information is:
P(x) = (1/Z) exp(-E(x));
E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_P(x_i, x_j);
where x is the random variable of the object class; i and j range from 1 to k, with k the number of objects detected in the image; Z is a normalization factor ensuring that the result of the computation is a probability; E(x) is the energy function of the conditional random field; the unary potential function ψ_u describes the probability of the label category of a random field graph node, and the binary potential function ψ_P characterizes the correlation between the nodes of the random field graph.
The unary potential function ψ_u is as follows:
ψ_u(x_i) = -log p(x_i);
The binary potential function ψ_P is as follows:
ψ_P(x_i, x_j) = μ(x_i, x_j) Σ_m ω_m k_m(f_i, f_j);
wherein, p (x)i) Representing the probability distribution, ω, of the class to which the ith object belongs given by the YOLO modelmIs a linear combination weight and μ is a tag compatibility function, representing the likelihood of the simultaneous occurrence of different classes within the neighborhood.
Semantic association is performed on the detected objects through the conditional random field, the detection results are corrected and optimized, and a temporary object information candidate set is generated; each temporary object is then judged to determine whether it is a new object or an object already present in the candidate set. For the data of each candidate object, each point of the temporary object is searched within its neighborhood, the three-dimensional point closest to it is found in the candidate object's point cloud data, and the Euclidean distance between the two points is calculated; if the distance is smaller than the set threshold, the two points are considered the same point.
the error calculation module 23: used for calculating an error value between the graph-optimized key frame data and generating a candidate set based on the error value;
in the specific implementation process of the invention, the error value between the optimized key frame data of the graph is calculated, and the candidate set can be generated according to the error value.
the synchronous positioning and map building module 24: used for performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on the correction result.
In the specific implementation process of the method, closed-loop correction processing is carried out on a candidate set through global map optimization and loop-back fusion; closed loop detection is realized, the positioning precision is improved, and errors are reduced; and synchronous positioning and map construction are carried out based on the correction result.
In the embodiment of the invention, aiming at the defects that traditional visual ORB-SLAM is easily disturbed by dynamic targets during feature extraction and that the extracted feature points contain only color, brightness and geometric information while lacking object-level environment semantic information, the ORB-SLAM front-end Tracking thread first performs a difference operation on adjacent frames of the sequence using the inter-frame difference method, sets a threshold and eliminates dynamic objects; the mapping relation between sequence frames and object feature points is then constructed and ORB feature extraction performed, and the object environment information extracted by deep-learning semantics is integrated into the ORB-SLAM system, realizing a semantic ORB-SLAM perception method with environment understanding that is stable in performance, resistant to environmental interference, accurate in matching and deeper in environment understanding. The robot's environment perception is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural application mode is provided for application fields including robot navigation, augmented reality and autonomous driving.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
In addition, the semantic ORB-SLAM sensing method and apparatus based on environment understanding provided by the embodiments of the present invention are described in detail above. A specific example has been used herein to explain the principles and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A semantic ORB-SLAM perception method based on environmental understanding, the method comprising:
inputting the sequence frame into an ORB-SLAM front end Tracking thread to carry out key frame extraction processing, and acquiring key frame data;
inputting the key frame data into an adjacent key frame graph optimization thread for key frame data optimization processing, and acquiring graph-optimized key frame data;
calculating an error value between the graph-optimized key frame data, and generating a candidate set based on the error value;
and performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on a correction result.
2. The semantic ORB-SLAM sensing method of claim 1, wherein the inputting of the sequence frames into an ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data comprises:
an ORB-SLAM front end Tracking thread adopts an interframe difference method to carry out dynamic background removal processing on an input sequence frame, and a sequence frame with a dynamic background removed is obtained;
establishing a mapping relation between the sequence frame with the dynamic background removed and the object characteristic points, and acquiring a sequence frame in the mapping relation with the object characteristic points;
carrying out ORB feature extraction processing on the sequence frame in mapping relation with the object feature points to obtain ORB features of the sequence frame;
matching the ORB features of the current frame with the ORB features of the previous frame to obtain matched feature point pairs;
performing pose estimation and repositioning processing based on the matched feature point pairs to obtain pose estimation and repositioning results;
and performing pose optimization of the adjacent frames according to the matched adjacent sequence frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames.
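The matching step of claim 2 could look roughly like the following sketch, which pairs ORB descriptors of the previous and current frames by Hamming distance. The Lowe ratio test is an added robustness heuristic of this sketch, not a step the claim specifies.

    import cv2

    def match_orb(desc_prev, desc_curr, ratio=0.75):
        # Brute-force Hamming matching with k=2 nearest neighbours, then a
        # ratio test to keep only unambiguous matched feature point pairs.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        good = []
        for pair in matcher.knnMatch(desc_prev, desc_curr, k=2):
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good.append(pair[0])
        return good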
3. The semantic ORB-SLAM sensing method of claim 2, wherein the ORB-SLAM front end Tracking thread performs dynamic background removal processing on the input sequence frame by using an inter-frame difference method to obtain a sequence frame with a dynamic background removed, comprising:
performing a difference operation on adjacent frames within a continuous time interval of the sequence frames, and performing change detection by using the strong correlation between adjacent frames of the sequence to obtain a moving target;
and based on the selected threshold value, eliminating the dynamic background of the moving target in the sequence frame, and acquiring the sequence frame with the dynamic background removed.
4. The semantic ORB-SLAM perception method according to claim 2, wherein the establishing a mapping relationship between the sequence frames with the dynamic background removed and object feature points to obtain the sequence frames with the mapping relationship with the object feature points comprises:
determining the image points observed in the sequence frame of the current frame with the dynamic background removed, and taking a subsequent sequence frame with the dynamic background removed that observes the same image points as an adjacent sequence frame of the current frame;
taking the sequence frame of the current frame without the dynamic background as a root node, and taking the adjacent sequence frame as a child node to generate a node tree;
and constructing a mapping relation between the sequence frame without the dynamic background and the object characteristic points based on the node tree, and acquiring the sequence frame in the mapping relation with the object characteristic points.
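One plausible reading of the node tree in claim 4 is sketched below: the current frame becomes the root, and frames observing enough of the same points become its children. The FrameNode layout and the min_shared threshold are hypothetical, introduced only for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class FrameNode:
        # A sequence frame together with the ids of the feature points it observes.
        frame_id: int
        observed_points: set
        children: list = field(default_factory=list)

    def build_frame_tree(current, neighbours, min_shared=20):
        # Root the tree at the current frame; frames that share at least
        # min_shared observed points become its child (adjacent) frames.
        root = FrameNode(current.frame_id, set(current.observed_points))
        for nb in neighbours:
            if len(root.observed_points & set(nb.observed_points)) >= min_shared:
                root.children.append(FrameNode(nb.frame_id, set(nb.observed_points)))
        return root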
5. The semantic ORB-SLAM perception method of claim 2, wherein the pose estimation and repositioning based on the matching feature point pairs comprises:
and calculating the relative displacement between the sequence frame of the current frame and the sequence frame of the previous frame from the matched feature point pairs by minimizing the reprojection error.
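A reprojection-error pose estimate of the kind named in claim 5 can be sketched with OpenCV's PnP solver, which minimizes reprojection error over 3-D/2-D correspondences. The RANSAC variant and the zero-distortion assumption are choices of this sketch, not requirements of the claim.

    import cv2
    import numpy as np

    def estimate_relative_pose(points_3d, points_2d, K):
        # points_3d: map points matched from the previous frame (N x 3);
        # points_2d: their pixel locations in the current frame (N x 2);
        # K: 3 x 3 camera intrinsic matrix. Distortion is assumed zero here.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            points_3d.astype(np.float32), points_2d.astype(np.float32), K, None)
        if not ok:
            return None  # caller falls back to the repositioning branch
        R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
        return R, tvec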
6. The semantic ORB-SLAM perception method of claim 2, wherein the method further comprises:
when pose estimation and repositioning based on the matched feature point pairs fails, obtaining the sequence frame most similar to the sequence frame of the current frame based on the mapping relation with the object feature points;
obtaining the ORB features of the most similar sequence frame, and matching the ORB features of the sequence frame of the current frame with them to obtain first matching feature point pairs;
and performing the pose estimation and repositioning calculation again by using the first matching feature point pairs to obtain a pose estimation and repositioning result.
7. The semantic ORB-SLAM perception method of claim 2, wherein the obtaining a sequence of key frames based on pose optimization of the neighboring frames comprises:
calculating the minimum re-projection error between the adjacent frames, and establishing a common view based on the minimum re-projection error;
and extracting the sequence frame in the common view as a key sequence frame.
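The common view of claim 7 corresponds to what ORB-SLAM calls a covisibility graph. The toy sketch below connects frames by the number of points they co-observe, a common proxy, whereas the claim itself derives the graph from the minimum reprojection error; the data layout and threshold are hypothetical.

    def build_common_view(frames, min_shared=15):
        # frames: {frame_id: set of observed point ids} (hypothetical layout).
        # Connect two frames when they co-observe at least min_shared points;
        # frames appearing in the graph are kept as key sequence frames.
        edges, keyframes = [], set()
        ids = sorted(frames)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if len(frames[a] & frames[b]) >= min_shared:
                    edges.append((a, b))
                    keyframes.update((a, b))
        return edges, sorted(keyframes)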
8. The semantic ORB-SLAM sensing method of claim 1, wherein the entering the keyframe data into an adjacent keyframe graph optimization thread for keyframe data optimization processing to obtain graph-optimized keyframe data comprises:
and inputting the key frame data into the adjacent key frame graph optimization thread, and then sequentially performing redundant point elimination, semantic extraction, new map point creation and adjacent frame optimization on the key frame data to obtain the graph-optimized key frame data.
9. The semantic ORB-SLAM sensing method of claim 8, wherein the semantic extraction processing of the key frame data after the redundant point elimination processing comprises:
performing object detection on the key frame data subjected to the redundant point elimination processing based on a YOLO-v3 algorithm to obtain an object detection result;
performing semantic association processing on the object detection result by using a conditional random field to obtain combined object class probability and scene context information;
correcting and optimizing the combined object class probability and scene context information to generate a temporary object information candidate set;
judging whether the temporary object information in the temporary object information candidate set is a new object or an already existing object; searching, for each point of each piece of temporary object information in the candidate set, within its corresponding neighborhood, and acquiring the closest three-dimensional point;
and calculating the Euclidean distance between the point and that three-dimensional point, and if the Euclidean distance is smaller than a preset threshold value, regarding the two as the same point.
10. A semantic ORB-SLAM aware apparatus based on environmental understanding, the apparatus comprising:
the key frame extraction module: used for inputting the sequence frames into the ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data;
a key frame optimization module: used for inputting the key frame data into the adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data;
an error calculation module: used for calculating error values between the graph-optimized key frame data and generating a candidate set based on the error values;
and a synchronous positioning and map building module: used for performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on the correction result.
CN201911113708.7A 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding Active CN110930519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911113708.7A CN110930519B (en) 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding

Publications (2)

Publication Number Publication Date
CN110930519A true CN110930519A (en) 2020-03-27
CN110930519B CN110930519B (en) 2023-06-20

Family

ID=69852948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911113708.7A Active CN110930519B (en) 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding

Country Status (1)

Country Link
CN (1) CN110930519B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373141A (en) * 2016-09-14 2017-02-01 上海航天控制技术研究所 Tracking system and tracking method of relative movement angle and angular velocity of slowly rotating space fragment
CN110125928A (en) * 2019-03-27 2019-08-16 浙江工业大学 A kind of binocular inertial navigation SLAM system carrying out characteristic matching based on before and after frames
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method
CN110378345A (en) * 2019-06-04 2019-10-25 广东工业大学 Dynamic scene SLAM method based on YOLACT example parted pattern
CN110378997A (en) * 2019-06-04 2019-10-25 广东工业大学 A kind of dynamic scene based on ORB-SLAM2 builds figure and localization method
US20200218929A1 (en) * 2017-09-22 2020-07-09 Huawei Technologies Co., Ltd. Visual slam method and apparatus based on point and line features

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375869A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Robot repositioning method, robot and computer-readable storage medium
CN115375869B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Robot repositioning method, robot and computer-readable storage medium

Also Published As

Publication number Publication date
CN110930519B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
CN111060115B (en) Visual SLAM method and system based on image edge features
CN106909877B (en) Visual simultaneous mapping and positioning method based on dotted line comprehensive characteristics
CN111724439B (en) Visual positioning method and device under dynamic scene
CN110781262B (en) Semantic map construction method based on visual SLAM
CN111080659A (en) Environmental semantic perception method based on visual information
CN109584302B (en) Camera pose optimization method, camera pose optimization device, electronic equipment and computer readable medium
CN109974743B (en) Visual odometer based on GMS feature matching and sliding window pose graph optimization
CN109815847B (en) Visual SLAM method based on semantic constraint
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN107369183A (en) Towards the MAR Tracing Registration method and system based on figure optimization SLAM
CN111899334A (en) Visual synchronous positioning and map building method and device based on point-line characteristics
CN110852241B (en) Small target detection method applied to nursing robot
KR20200063368A (en) Unsupervised stereo matching apparatus and method using confidential correspondence consistency
Xue et al. Boundary-induced and scene-aggregated network for monocular depth prediction
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
Pu et al. Visual SLAM integration with semantic segmentation and deep learning: A review
Zhang et al. Improved feature point extraction method of ORB-SLAM2 dense map
CN110930519B (en) Semantic ORB-SLAM sensing method and device based on environment understanding
CN117036653A (en) Point cloud segmentation method and system based on super voxel clustering
CN113570713B (en) Semantic map construction method and device for dynamic environment
Tao et al. 3d semantic vslam of indoor environment based on mask scoring rcnn
Sun et al. Kinect depth recovery via the cooperative profit random forest algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A semantic ORB-SLAM perception method and device based on environmental understanding
Effective date of registration: 20231130
Granted publication date: 20230620
Pledgee: Guangdong Shunde Rural Commercial Bank Co.,Ltd. science and technology innovation sub branch
Pledgor: SOUTH CHINA ROBOTICS INNOVATION Research Institute
Registration number: Y2023980068232