CN110827395B - Instant positioning and map construction method suitable for dynamic environment - Google Patents

Instant positioning and map construction method suitable for dynamic environment

Info

Publication number
CN110827395B
Authority
CN
China
Prior art keywords
map
frame
points
key frame
image
Prior art date
Legal status
Active
Application number
CN201910848375.6A
Other languages
Chinese (zh)
Other versions
CN110827395A (en)
Inventor
陈文峰
蔡述庭
李丰
徐伟峰
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910848375.6A
Publication of CN110827395A
Application granted
Publication of CN110827395B
Status: Active


Classifications

    • G06T17/005 Tree description, e.g. octree, quadtree (three dimensional [3D] modelling)
    • G06T17/05 Geographic models (three dimensional [3D] modelling)
    • G06T7/10 Segmentation; Edge detection (image analysis)
    • G06T2207/10028 Range image; Depth image; 3D point clouds (image acquisition modality)
    • G06T2207/20081 Training; Learning (special algorithmic details)


Abstract

The invention discloses an instant positioning and map construction (SLAM) method suitable for a dynamic environment, which comprises the following steps: firstly, the depth camera and the encoder are extrinsically calibrated to obtain the transformation matrix from encoder coordinates to depth camera coordinates; then each frame of image collected by the depth camera is layered with an image pyramid, and ORB feature points are extracted from every layer of the pyramid; the data obtained from the encoder serve as the constant-velocity model of the tracking module: the feature points of the previous keyframe are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched keypoints. The tracking part of the invention uses the encoder data as the initial pose for camera re-projection, achieving accurate positioning and tracking in a dynamic environment and saving a large amount of time compared with traditional semantic SLAM tracking modules that must remove dynamic pixels from every frame.

Description

Instant positioning and map construction method suitable for dynamic environment
Technical Field
The invention relates to the technical field of positioning, navigation and mapping of mobile robots, and in particular to an instant positioning and map construction method suitable for a dynamic environment.
Background
SLAM (simultaneous localization and mapping, rendered here as "instant positioning and map construction") is widely applied in mobile robotics, unmanned aerial vehicles, augmented reality and other fields. In a dynamic environment, however, a conventional SLAM system is not robust. In a real scene full of dynamic objects, the tracking module of a classic SLAM system such as ORB_SLAM fails; worse, the classic system builds a static map from which dynamic objects cannot be removed, so the map becomes unusable. To adapt SLAM to dynamic scenes, many papers propose different solutions. DS_SLAM is one effective solution: it performs semantic segmentation on every input frame, computes an essential matrix from the segmented image, uses the essential matrix to judge whether pixels belong to dynamic objects, and eliminates them if so. However, because semantic segmentation is carried out on every frame, the system can hardly run in real time on a mobile platform, and the dense point-cloud map it builds occupies a large amount of memory.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an instant positioning and map construction method suitable for a dynamic environment.
The purpose of the invention is realized by the following technical scheme:
An instant positioning and map construction method suitable for a dynamic environment comprises the following steps:
firstly, the depth camera and the encoder are extrinsically calibrated to obtain the transformation matrix from encoder coordinates to depth camera coordinates; then each frame of image collected by the depth camera is layered with an image pyramid, and ORB feature points are extracted from every layer of the pyramid; the data obtained from the encoder serve as the constant-velocity model of the tracking module: the feature points of the previous keyframe are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched keypoints;
step two, the tracking module judges whether the current frame is selected as a keyframe, according to whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame; the keyframe is then transmitted to the dynamic object removal module, which performs semantic segmentation on the keyframe with a network;
thirdly, projecting the key frame selected by the tracking module in a local map optimization module to obtain map points, and then optimizing by using lacol BA; then, using a DBow2 bag-of-words model to perform loop detection in a loop detection module, and performing global optimization on all map points and key frames in the past by using BA (building block), so that the pose of the robot is optimized and the map points are also optimized;
and step four, the globally optimized keyframes and the semantic segmentation results are received, the keyframes are projected into a 3D point cloud, the point cloud is colored according to the different semantic segmentation labels, and a semantic octree map is then built in the semantic octree map module.
Preferably, the tracking module works as follows:
the depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to the two topics to receive the collected color and depth images. The mobile robot publishes encoder data at 90 Hz; the SLAM system obtains the encoder data by subscribing to the topic, and then preprocesses the encoder data and the images separately.
Preferably, the preprocessing of the encoder data is specifically:
firstly, at time t the pose of the robot is ξ_t = [x_t, y_t, θ_t]^T, and according to the odometer motion model the pose of the robot at time t+1 is obtained as:

ξ_{t+1} = ξ_t + [ Δs·cos(θ_t + Δθ/2),  Δs·sin(θ_t + Δθ/2),  Δθ ]^T

where Δs is the distance moved by the center of the robot and Δθ is the angle through which the robot turns. The readings of the left and right code wheels of the encoder are obtained by subscribing to the encoder topic, and the readings and the robot motion satisfy the following relations:

Δs = (Δs_r + Δs_l) / 2,  Δθ = (Δs_r − Δs_l) / b

where b is the wheelbase between the two wheels, and Δs_r and Δs_l are the reading increments of the right and left code wheels from t to t+1. The transformation matrix of the robot coordinates from t to t+1 is therefore obtained as:

T_rr =
[ cos Δθ   −sin Δθ   Δs·cos(Δθ/2) ]
[ sin Δθ    cos Δθ   Δs·sin(Δθ/2) ]
[ 0         0        1            ]
preferably, the preprocessing the image is divided into preprocessing a color image and a depth image, specifically:
the color image obtained from the topic is a three-channel color image, and the color image must be converted into a single-channel gray image by using an OpenCV library function; meanwhile, the depth map is also a three-channel image, and the depth map needs to be converted into a single-channel image of CV _32F by a mapping factor;
after the data are preprocessed, the initialization part of the tracking module is entered, and the first frame image is selected as a keyframe; then, each time a new frame arrives, the main thread of the tracking module is entered:
(1) Use the image pyramid to extract ORB feature points, including ORB keypoints and descriptors, at 12 pyramid scales;
(2) Re-project the feature points using the encoder model. The transformation matrix of the robot coordinates from t to t+1 was obtained in the data preprocessing; the acquisition frequency of the encoder is 3 times that of the camera, and the encoder data and camera data carry the same timestamps and can be put in correspondence, so the transformation matrix T_cr from the current frame to the previous frame is obtained as

T_cr = T_rc · T_rr · T_rc⁻¹

Let p_c be the camera coordinate of a pixel of the current frame, and let π⁻¹ denote the pinhole-model operation that converts a 2D pixel coordinate into the camera coordinates of a 3D point. The result of projecting the current-frame pixel p_c onto the previous frame can then be expressed as

p_c′ = π(T_cr · π⁻¹(p_c))

where T_rc is the extrinsic parameter matrix of the robot, determined by the relative poses of the robot and the camera. Matching feature points can then be searched within a 3σ pixel region around p_c′; the encoder error obeys a normal distribution, and σ is the standard deviation of that distribution;
(3) Then select the keypoint with the highest descriptor matching score in the region as the matching point. All matching points are traversed and their depth values are checked by re-projection to eliminate mismatches caused by the dynamic environment. First the matched feature-point pair is projected to world coordinates to obtain a 3D point p; this 3D point is then projected into the nearest keyframe of the frame, reconstructing the map point on that keyframe as

p′ = T_kw · p

where T_kw is the transformation from world coordinates to the keyframe camera coordinates. If the map point corresponding to this matching point is static, the depth values of p′ and p are unchanged; therefore, if the difference between the depth values is larger than an empirical threshold d_th, the matching point is considered a dynamic point and its depth value is set to zero. The threshold is obtained from d_th = d_b + k_d·z′, where k_d is scale-related and d_b is a base threshold; typically d_b = 0.2 m and k_d = 0.025;
(4) Call the PnP function provided in OpenCV and compute the pose between the two frames of pictures with EPnP and the matched keypoints;
(5) Finally, perform the keyframe judgment: decide whether the current frame satisfies the conditions for being a keyframe, and regard it as a keyframe if it meets all of the following; the map points are obtained by converting the feature points on keyframes into world coordinates;
(1) more than 20 frames have passed since the last global relocalization;
(2) the local mapping thread is idle, or more than 20 frames have passed since the last keyframe was inserted;
(3) the current frame tracks at least 50 map points;
(4) the map points tracked by the current frame are fewer than 90% of the map points in the reference keyframe.
Preferably, the local map optimization module mainly manages map points and keyframes and optimizes the transformation matrices between map points and keyframes through local BA; the working steps of the local map optimization module are as follows:
(1) Map point culling: remove from the system the map points that do not meet the following conditions:
(1) the point must be tracked in more than 25% of the keyframes that are predicted to observe it;
(2) it must be observed by more than three key frames;
(2) Create new map points: after a new keyframe is generated, feature matching is performed between this keyframe and the keyframes connected to it, and the matched feature points are triangulated to obtain 3D points, which are the new map points;
(3) Local-map BA optimization: a cost function between map points and feature points is built from the re-projection relationship between the map points and the keyframe feature points, and the camera extrinsic parameters are then adjusted to minimize the error value of the cost function;
(4) Keyframe elimination: if more than 90% of the keypoints of a keyframe can be observed by more than 3 other keyframes, that keyframe is considered redundant and is eliminated.
Preferably, the dynamic object elimination module comprises the following working steps:
when the tracking module generates a new keyframe, the color image and depth image corresponding to the keyframe are published as a ROS topic, and the dynamic object removal module thread receives the topic, thereby obtaining the color and depth images from which dynamic pixels must be removed; because the semantic segmentation network used in the keyframe dynamic object removal module is implemented on the PyTorch platform and must be written in Python, while the other modules are written in C++, the ROS communication mechanism is used to bridge the two platforms;
before semantic segmentation can be carried out, the segmentation network needs to be trained; an ICNet network is trained on the PASCAL VOC 2012 dataset, chosen because it labels the background as one class and labels different objects such as people, tables and chairs as other classes, which makes it convenient to separate the background from the dynamic objects during semantic segmentation of the image;
after the color and depth images are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted into a three-channel color image with OpenCV; the labels marked on the color image are then colored with a pre-made color bar, and the final semantic segmentation result is obtained after coloring is finished;
semantic segmentation divides the picture with pre-defined labels: label class 0 is the background and label class 15 is "person", and for an indoor dynamic scene the dynamic objects to be removed are mainly people; therefore the image is first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with the OpenCV region-dilation function; then, in the depth map, the depth values of pixels whose value is 255 in the corresponding binary image are set to zero, eliminating the dynamic pixels;
finally, the depth map with dynamic pixels removed and the semantically segmented color map are sent back to the SLAM system in topic form.
Preferably, the working steps of the loop detection module are as follows:
after the local map optimization module finishes, the keyframes are transmitted to the loop detection module, which compares the bag-of-words vector of the current keyframe against the bag-of-words vectors of the other keyframes with a similarity test, as in the following formula:

s(v_1, v_2) = 1 − (1/2)·‖ v_1/‖v_1‖ − v_2/‖v_2‖ ‖

where v_1 and v_2 are bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the library functions of the bag-of-words model. If the similarity between the current keyframe and some previous keyframe exceeds 1.1 times the similarity threshold, the two keyframes are considered to have the same pose, loop detection succeeds, and the result is used as a condition for global keyframe optimization. The similarity threshold is obtained by computing the similarity between the current keyframe and its four neighboring keyframes and taking the highest similarity value.
Preferably, the semantic octree map module works as follows:
after loop detection finishes, the semantic octree map module receives the color and depth images produced by semantic segmentation. Let p_a = [x_a, y_a, 1]^T be a pixel of the color image; p_a is converted to world coordinates as

P_w = T_wc · z_a · K⁻¹ · p_a

where K is the camera intrinsic matrix, z_a is the depth value at p_a, and T_wc is the camera-to-world transformation.
Every pixel of the color map is traversed and converted into world coordinates, yielding 3D points in the world coordinate system. The 3D points are then converted into the PCL color point-cloud type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point and whose last three dimensions are its color. After segmentation, different labels have different colors in the color image, and the point-cloud data are colored accordingly: class-0 labels are the background, so after the pixels corresponding to class-0 labels are converted into point-cloud data the points are set to silvery white, and the other labels are set to different colors in turn.
Once the point-cloud data containing semantic information exist, they can be converted directly into an octree map: the octomap library is called in C++, a ColorOcTree octree object is created, the point-cloud data are inserted into the octree map, and finally the resolution of the octree map is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the finished octree is then published on the ROS topic /octomap.
Compared with the prior art, the invention has the following beneficial effects:
(1) The tracking part of the invention uses encoder data as the initial pose for camera re-projection, achieving accurate positioning and tracking in a dynamic environment and saving a large amount of time compared with traditional semantic SLAM tracking modules that must remove dynamic pixels from every frame. Semantic segmentation is applied only to keyframes, which still removes dynamic pixels from the map and builds a reliable static map. Meanwhile, the semantic segmentation part is innovatively placed on the TensorFlow platform rather than the caffe platform of traditional semantic SLAM, which both shortens the segmentation time and improves the segmentation accuracy;
(2) Unlike the usual encoder-fusing SLAM schemes, the encoder here is used only as the constant-velocity model when the color-image feature points are re-projected. Traditional dynamic-environment SLAM that fuses encoder data treats the encoder as the main sensor, and the encoder data are then optimized during local map optimization and loop detection; but because the wheels slip easily and the data may fail, using the encoder as the main sensor can cause considerable errors in some scenarios. Using the encoder only as the constant-velocity model effectively avoids this problem.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph showing the performance comparison between an ICNet network and a conventional semantic segmentation network according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
The invention provides an instant positioning and map construction method suitable for a dynamic environment. On the basis of the DS_SLAM system, encoder data are integrated so that the tracking part is more robust in a dynamic environment; image segmentation is applied only to keyframes, and the segmentation result is used only for the mapping part, which greatly shortens the running time. In addition, a semantic segmentation network on the TensorFlow platform is innovatively adopted; compared with segmentation networks under the traditional caffe framework, the new network performs better. The new network introduces the multi-node communication mechanism of the ROS system, and a separate node thread is opened for semantic segmentation. Finally, a semantic octree map is built from the semantic segmentation result.
Specifically, as shown in FIG. 1 and FIG. 2, an instant positioning and map construction method suitable for a dynamic environment includes the following steps:
firstly, the depth camera and the encoder are extrinsically calibrated to obtain the transformation matrix from encoder coordinates to depth camera coordinates; then each frame of image collected by the depth camera is layered with an image pyramid, and ORB feature points are extracted from every layer of the pyramid; the data obtained from the encoder serve as the constant-velocity model of the tracking module: the feature points of the previous keyframe are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched keypoints;
the tracking module comprises the following working steps:
the depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to the two topics to receive the collected color and depth images. The mobile robot publishes encoder data at 90 Hz; the SLAM system obtains the encoder data by subscribing to the topic, and then preprocesses the encoder data and the images separately.
The encoder data preprocessing specifically comprises:
firstly, at time t the pose of the robot is ξ_t = [x_t, y_t, θ_t]^T, and according to the odometer motion model the pose of the robot at time t+1 is obtained as:

ξ_{t+1} = ξ_t + [ Δs·cos(θ_t + Δθ/2),  Δs·sin(θ_t + Δθ/2),  Δθ ]^T

where Δs is the distance moved by the center of the robot and Δθ is the angle through which the robot turns. The readings of the left and right code wheels of the encoder are obtained by subscribing to the encoder topic, and the readings and the robot motion satisfy the following relations:

Δs = (Δs_r + Δs_l) / 2,  Δθ = (Δs_r − Δs_l) / b

where b is the wheelbase between the two wheels, and Δs_r and Δs_l are the reading increments of the right and left code wheels from t to t+1. The transformation matrix of the robot coordinates from t to t+1 is therefore obtained as:

T_rr =
[ cos Δθ   −sin Δθ   Δs·cos(Δθ/2) ]
[ sin Δθ    cos Δθ   Δs·sin(Δθ/2) ]
[ 0         0        1            ]
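A minimal C++ sketch of this wheel-odometry update, assuming a differential-drive robot; all names are illustrative:

    #include <cmath>

    struct Pose2D { double x, y, theta; };

    // ds_r, ds_l: reading increments of the right/left code wheels from t to t+1;
    // b: wheelbase between the two wheels.
    Pose2D integrateOdometry(const Pose2D& xi, double ds_r, double ds_l, double b)
    {
        const double ds     = 0.5 * (ds_r + ds_l);  // distance moved by the robot centre
        const double dtheta = (ds_r - ds_l) / b;    // change of heading
        Pose2D next;
        next.x     = xi.x + ds * std::cos(xi.theta + 0.5 * dtheta);
        next.y     = xi.y + ds * std::sin(xi.theta + 0.5 * dtheta);
        next.theta = xi.theta + dtheta;
        return next;
    }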
the image preprocessing comprises the following steps of preprocessing a color image and a depth image:
the color image obtained from the topic is a three-channel color image, and the color image must be converted into a single-channel gray image by using an OpenCV library function; meanwhile, the depth map is also a three-channel image, and the depth map needs to be converted into a single-channel image of CV _32F by a mapping factor;
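A minimal C++ sketch of this preprocessing with OpenCV; the mapping factor depthScale and the choice of channel are assumptions, not taken from the patent:

    #include <opencv2/opencv.hpp>

    // color: 3-channel image from the iai_kinect2 topic; depthRaw: 3-channel depth image.
    void preprocessImages(const cv::Mat& color, const cv::Mat& depthRaw,
                          cv::Mat& gray, cv::Mat& depth, double depthScale)
    {
        cv::cvtColor(color, gray, cv::COLOR_BGR2GRAY);        // 3-channel colour -> grey

        cv::Mat depthOneChannel;
        cv::extractChannel(depthRaw, depthOneChannel, 0);     // keep one channel
        depthOneChannel.convertTo(depth, CV_32F, depthScale); // -> single-channel CV_32F
    }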
after the data are preprocessed, the initialization part of the tracking module is entered, and the first frame image is selected as a keyframe; then, each time a new frame arrives, the main thread of the tracking module is entered:
(1) Use the image pyramid to extract ORB feature points, including ORB keypoints and descriptors, at 12 pyramid scales;
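A minimal C++ sketch of step (1) with OpenCV's ORB at 12 pyramid levels; nfeatures = 1000 and scaleFactor = 1.2 are assumptions, not taken from the patent:

    #include <opencv2/features2d.hpp>
    #include <vector>

    void extractOrbFeatures(const cv::Mat& gray,
                            std::vector<cv::KeyPoint>& keypoints, cv::Mat& descriptors)
    {
        static cv::Ptr<cv::ORB> orb =
            cv::ORB::create(/*nfeatures=*/1000, /*scaleFactor=*/1.2f, /*nlevels=*/12);
        orb->detectAndCompute(gray, cv::noArray(), keypoints, descriptors);
    }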
(2) Re-project the feature points using the encoder model. The transformation matrix of the robot coordinates from t to t+1 was obtained in the data preprocessing; the acquisition frequency of the encoder is 3 times that of the camera, and the encoder data and camera data carry the same timestamps and can be put in correspondence, so the transformation matrix T_cr from the current frame to the previous frame is obtained as

T_cr = T_rc · T_rr · T_rc⁻¹

Let p_c be the camera coordinate of a pixel of the current frame, and let π⁻¹ denote the pinhole-model operation that converts a 2D pixel coordinate into the camera coordinates of a 3D point. The result of projecting the current-frame pixel p_c onto the previous frame can then be expressed as

p_c′ = π(T_cr · π⁻¹(p_c))

where T_rc is the extrinsic parameter matrix of the robot, determined by the relative poses of the robot and the camera. Matching feature points can then be searched within a 3σ pixel region around p_c′; the encoder error obeys a normal distribution, and σ is the standard deviation of that distribution;
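A minimal C++ sketch of the 3σ window search of step (2), assuming ORB descriptors compared by Hamming distance; all function and variable names are illustrative:

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <limits>
    #include <vector>

    // Look for candidate keypoints inside a 3-sigma window around the projected
    // pixel p_c' and return the best descriptor match, or -1 if none is found.
    int findMatchInWindow(const cv::Point2f& projected, float sigma,
                          const std::vector<cv::KeyPoint>& keypoints,
                          const cv::Mat& descriptors, const cv::Mat& refDescriptor)
    {
        const float radius = 3.0f * sigma;
        int best = -1;
        double bestDist = std::numeric_limits<double>::max();
        for (int i = 0; i < static_cast<int>(keypoints.size()); ++i) {
            const cv::Point2f d = keypoints[i].pt - projected;
            if (std::abs(d.x) > radius || std::abs(d.y) > radius)
                continue;                                  // outside the 3-sigma region
            const double dist =
                cv::norm(refDescriptor, descriptors.row(i), cv::NORM_HAMMING);
            if (dist < bestDist) { bestDist = dist; best = i; }
        }
        return best;
    }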
(3) Then select the keypoint with the highest descriptor matching score in the region as the matching point. All matching points are traversed and their depth values are checked by re-projection to eliminate mismatches caused by the dynamic environment. First the matched feature-point pair is projected to world coordinates to obtain a 3D point p; this 3D point is then projected into the nearest keyframe of the frame, reconstructing the map point on that keyframe as

p′ = T_kw · p

where T_kw is the transformation from world coordinates to the keyframe camera coordinates. If the map point corresponding to this matching point is static, the depth values of p′ and p are unchanged; therefore, if the difference between the depth values is larger than an empirical threshold d_th, the matching point is considered a dynamic point and its depth value is set to zero. The threshold is obtained from d_th = d_b + k_d·z′, where k_d is scale-related and d_b is a base threshold; typically d_b = 0.2 m and k_d = 0.025;
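A minimal C++ sketch of the depth-consistency test of step (3); zObserved and zPrime are illustrative names for the observed and re-projected depths:

    #include <cmath>

    // Returns true if the matched point should be treated as dynamic.
    bool isDynamicPoint(double zObserved, double zPrime,
                        double d_b = 0.2, double k_d = 0.025)
    {
        const double d_th = d_b + k_d * zPrime;      // empirical threshold d_th
        return std::abs(zObserved - zPrime) > d_th;  // dynamic if the depth changed
    }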
(4) Call the PnP function provided in OpenCV and compute the pose between the two frames of pictures with EPnP and the matched keypoints;
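A minimal C++ sketch of step (4) using OpenCV's EPnP solver; wrapping it in solvePnPRansac and the RANSAC parameters are assumptions, since the patent only states that EPnP is called:

    #include <opencv2/calib3d.hpp>
    #include <vector>

    // Pose between two frames from matched 3D map points and 2D keypoints.
    bool estimatePose(const std::vector<cv::Point3f>& mapPoints,
                      const std::vector<cv::Point2f>& pixels,
                      const cv::Mat& K, cv::Mat& rvec, cv::Mat& tvec)
    {
        return cv::solvePnPRansac(mapPoints, pixels, K, cv::noArray(),
                                  rvec, tvec, /*useExtrinsicGuess=*/false,
                                  /*iterationsCount=*/100, /*reprojectionError=*/8.0f,
                                  /*confidence=*/0.99, cv::noArray(),
                                  cv::SOLVEPNP_EPNP);
    }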
(5) Finally, perform the keyframe judgment: decide whether the current frame satisfies the conditions for being a keyframe, and regard it as a keyframe if it meets all of the following (a minimal sketch of this test is given after the list); the map points are obtained by converting the feature points on keyframes into world coordinates;
(1) more than 20 frames have passed since the last global relocalization;
(2) the local mapping thread is idle, or more than 20 frames have passed since the last keyframe was inserted;
(3) the current frame tracks at least 50 map points;
(4) the map points tracked by the current frame are fewer than 90% of the map points in the reference keyframe.
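A minimal C++ sketch of the keyframe test, implementing the four conditions above; the counters are illustrative:

    bool needNewKeyFrame(int framesSinceRelocalisation, int framesSinceKeyFrame,
                         bool localMappingIdle, int trackedMapPoints,
                         int refKeyFrameMapPoints)
    {
        const bool c1 = framesSinceRelocalisation > 20;               // condition (1)
        const bool c2 = localMappingIdle || framesSinceKeyFrame > 20; // condition (2)
        const bool c3 = trackedMapPoints >= 50;                       // condition (3)
        const bool c4 = trackedMapPoints < 0.9 * refKeyFrameMapPoints;// condition (4)
        return c1 && c2 && c3 && c4;
    }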
Step two, the tracking module judges whether the current frame is selected as a keyframe, according to whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame; the keyframe is then transmitted to the dynamic object removal module, which performs semantic segmentation on the keyframe with a network;
the dynamic object removing module comprises the following working steps:
when the tracking module generates a new keyframe, the color image and depth image corresponding to the keyframe are published as a ROS topic, and the dynamic object removal module thread receives the topic, thereby obtaining the color and depth images from which dynamic pixels must be removed; because the semantic segmentation network used in the keyframe dynamic object removal module is implemented on the PyTorch platform and must be written in Python, while the other modules are written in C++, the ROS communication mechanism is used to bridge the two platforms;
before semantic segmentation can be carried out, the segmentation network needs to be trained; an ICNet network is trained on the PASCAL VOC 2012 dataset, chosen because it labels the background as one class and labels different objects such as people, tables and chairs as other classes, which makes it convenient to separate the background from the dynamic objects during semantic segmentation of the image;
after the color and depth images are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted into a three-channel color image with OpenCV; the labels marked on the color image are then colored with a pre-made color bar, and the final semantic segmentation result is obtained after coloring is finished;
semantic segmentation divides the picture with pre-defined labels: label class 0 is the background and label class 15 is "person", and for an indoor dynamic scene the dynamic objects to be removed are mainly people; therefore the image is first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with the OpenCV region-dilation function; then, in the depth map, the depth values of pixels whose value is 255 in the corresponding binary image are set to zero, eliminating the dynamic pixels;
finally, the depth map with dynamic pixels removed and the semantically segmented color map are sent back to the SLAM system in topic form.
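A minimal C++ sketch of the removal step just described: binarize the "person" label (class 15), dilate the region, and zero the corresponding depth values; the 15x15 dilation kernel size is an assumption:

    #include <opencv2/opencv.hpp>

    void removeDynamicPixels(const cv::Mat& labelImage, cv::Mat& depth)
    {
        cv::Mat mask = (labelImage == 15);                 // class-15 pixels -> 255
        const cv::Mat kernel =
            cv::getStructuringElement(cv::MORPH_RECT, cv::Size(15, 15));
        cv::dilate(mask, mask, kernel);                    // expand the person region
        depth.setTo(0.0f, mask);                           // zero depth under the mask
    }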
Thirdly, the keyframe selected by the tracking module is projected in the local map optimization module to obtain map points, which are then optimized with local BA (bundle adjustment); loop detection is then performed in the loop detection module with the DBoW2 bag-of-words model, and all past map points and keyframes are globally optimized with BA, so that both the robot pose and the map points are optimized;
the local map optimization module is mainly used for managing map points and key frames and optimizing a conversion matrix between the map points and the key frames through local BA; the working steps of the local map optimization module are as follows:
(1) Map point culling: remove from the system the map points that do not meet the following conditions:
(1) the point must be tracked in more than 25% of the keyframes that are predicted to observe it;
(2) it must be observed by more than three key frames;
(2) Create new map points: after a new keyframe is generated, feature matching is performed between this keyframe and the keyframes connected to it, and the matched feature points are triangulated to obtain 3D points, which are the new map points;
(3) Local-map BA optimization: a cost function between map points and feature points is built from the re-projection relationship between the map points and the keyframe feature points, and the camera extrinsic parameters are then adjusted to minimize the error value of the cost function;
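In its standard form, the cost minimized by the local BA of step (3) is the robust sum of squared re-projection errors; the patent does not reproduce its formula, so the robust kernel ρ (e.g. Huber) is an assumption here:

    \min_{T_i,\; X_j} \sum_{(i,j)} \rho\left( \left\| p_{ij} - \pi\!\left( T_i X_j \right) \right\|^2 \right)

where T_i are the keyframe poses in the local window, X_j the map points, p_{ij} the feature point observed for X_j in keyframe i, and π the pinhole projection.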
(4) Keyframe elimination: if more than 90% of the keypoints of a keyframe can be observed by more than 3 other keyframes, that keyframe is considered redundant and is eliminated.
The working steps of the loop detection module are as follows:
after the local map optimization module finishes, the keyframes are transmitted to the loop detection module, which compares the bag-of-words vector of the current keyframe against the bag-of-words vectors of the other keyframes with a similarity test, as in the following formula:

s(v_1, v_2) = 1 − (1/2)·‖ v_1/‖v_1‖ − v_2/‖v_2‖ ‖

where v_1 and v_2 are bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the library functions of the bag-of-words model. If the similarity between the current keyframe and some previous keyframe exceeds 1.1 times the similarity threshold, the two keyframes are considered to have the same pose, loop detection succeeds, and the result is used as a condition for global keyframe optimization. The similarity threshold is obtained by computing the similarity between the current keyframe and its four neighboring keyframes and taking the highest similarity value.
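A minimal C++ sketch of the similarity score above, written over dense vectors for brevity (DBoW2 itself stores sparse bag-of-words vectors):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // L1 similarity between two normalised bag-of-words vectors, in [0, 1].
    double bowScore(const std::vector<double>& v1, const std::vector<double>& v2)
    {
        double n1 = 0.0, n2 = 0.0;
        for (double x : v1) n1 += std::abs(x);
        for (double x : v2) n2 += std::abs(x);
        if (n1 == 0.0 || n2 == 0.0) return 0.0;  // empty vector: no similarity
        double d = 0.0;
        for (std::size_t i = 0; i < v1.size(); ++i)
            d += std::abs(v1[i] / n1 - v2[i] / n2);
        return 1.0 - 0.5 * d;                    // s(v1, v2)
    }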
And step four, receiving the globally optimized key frame and the result of semantic segmentation, projecting the key frame to the 3d point cloud, coloring the point cloud according to different semantic segmentation labels, and then establishing a semantic octree map in a semantic octree map module.
The semantic octree map module comprises the following working steps:
after loop detection finishes, the semantic octree map module receives the color and depth images produced by semantic segmentation. Let p_a = [x_a, y_a, 1]^T be a pixel of the color image; p_a is converted to world coordinates as

P_w = T_wc · z_a · K⁻¹ · p_a

where K is the camera intrinsic matrix, z_a is the depth value at p_a, and T_wc is the camera-to-world transformation.
Every pixel of the color map is traversed and converted into world coordinates, yielding 3D points in the world coordinate system. The 3D points are then converted into the PCL color point-cloud type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point and whose last three dimensions are its color. After segmentation, different labels have different colors in the color image, and the point-cloud data are colored accordingly: class-0 labels are the background, so after the pixels corresponding to class-0 labels are converted into point-cloud data the points are set to silvery white, and the other labels are set to different colors in turn.
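A minimal C++ sketch of the per-pixel back-projection and coloring, assuming pinhole intrinsics fx, fy, cx, cy and a camera-to-world transform Twc; the RGB value used for "silvery white" and the per-label colors are assumptions:

    #include <Eigen/Dense>
    #include <pcl/point_types.h>

    pcl::PointXYZRGB pixelToWorldPoint(float u, float v, float z, int label,
                                       float fx, float fy, float cx, float cy,
                                       const Eigen::Matrix4f& Twc)
    {
        // Back-project the pixel with its depth, then move it to the world frame.
        const Eigen::Vector4f pCamera((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0f);
        const Eigen::Vector4f pWorld = Twc * pCamera;

        pcl::PointXYZRGB p;
        p.x = pWorld.x(); p.y = pWorld.y(); p.z = pWorld.z();
        if (label == 0) {
            p.r = p.g = p.b = 230;  // class 0 (background) -> silvery white
        } else {
            // distinct colour per label, standing in for the pre-made colour bar
            p.r = static_cast<unsigned char>((37 * label) % 256);
            p.g = static_cast<unsigned char>((91 * label) % 256);
            p.b = static_cast<unsigned char>((151 * label) % 256);
        }
        return p;
    }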
Once the point-cloud data containing semantic information exist, they can be converted directly into an octree map: the octomap library is called in C++, a ColorOcTree octree object is created, the point-cloud data are inserted into the octree map, and finally the resolution of the octree map is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the finished octree is then published on the ROS topic /octomap.
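A minimal C++ sketch of the octree construction with the parameters stated above; names other than the octomap and PCL APIs are illustrative:

    #include <octomap/ColorOcTree.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>

    void buildSemanticOctree(const pcl::PointCloud<pcl::PointXYZRGB>& cloud,
                             octomap::ColorOcTree& tree)
    {
        tree.setProbHit(0.7);    // hit probability, as stated above
        tree.setProbMiss(0.55);  // miss probability, as stated above
        for (const auto& p : cloud.points) {
            octomap::ColorOcTreeNode* node =
                tree.updateNode(octomap::point3d(p.x, p.y, p.z), /*occupied=*/true);
            node->setColor(p.r, p.g, p.b);  // keep the semantic colour in the voxel
        }
        tree.updateInnerOccupancy();        // propagate occupancy to inner nodes
    }

    // usage: octomap::ColorOcTree tree(0.05); buildSemanticOctree(cloud, tree);
    // the tree can then be published on the ROS topic /octomap (e.g. via octomap_msgs).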
Traditional semantic segmentation SLAM is generally based on the caffe framework, under which few semantic segmentation networks are available, so the segmentation effect is poor even though the speed is low. As shown in FIG. 2, the abscissa is the number of frames processed per second and the ordinate is the segmentation accuracy in mIoU (%); ICNet is far ahead of the SegNet network used in the traditional semantic segmentation framework in both forward running time and accuracy, and its performance reaches real time.
Compared with prior-art methods such as LiDAR-based SLAM in a dynamic environment, robot SLAM object-state detection in a dynamic sparse environment, and indoor personnel autonomous positioning based on the integration of SLAM and gait IMU, the main innovation points of the method are:
1. An innovative fusion of encoder data: the encoder data are used only to form the transformation matrix for re-projection between the current frame and the keyframe of the camera, realizing a shallow fusion of encoder and camera data and avoiding the large impact on the system of encoder errors caused by wheel slip.
2. Multi-node communication between C++ and Python is achieved with the ROS system, so a semantic segmentation network under the TensorFlow framework is innovatively added to the SLAM system, effectively improving the performance of semantic segmentation.
3. A semantic octree map is innovatively built: the image information obtained by semantic segmentation is fully utilized, and a more complete and more practical semantic map is established.
The tracking part of the invention uses encoder data as the initial pose for camera re-projection, achieving accurate positioning and tracking in a dynamic environment and saving a large amount of time compared with traditional semantic SLAM tracking modules that must remove dynamic pixels from every frame. Semantic segmentation is applied only to keyframes, which still removes dynamic pixels from the map and builds a reliable static map. Meanwhile, the semantic segmentation part is innovatively placed on the TensorFlow platform rather than the caffe platform of traditional semantic SLAM, which both shortens the segmentation time and improves the segmentation accuracy.
Unlike the usual encoder-fusing SLAM schemes, the encoder here is used only as the constant-velocity model when the color-image feature points are re-projected. Traditional dynamic-environment SLAM that fuses encoder data treats the encoder as the main sensor, and the encoder data are then optimized during local map optimization and loop detection; but because the wheels slip easily and the data may fail, using the encoder as the main sensor can cause considerable errors in some scenarios. Using the encoder only as the constant-velocity model effectively avoids this problem.
The present invention is not limited to the above embodiments; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention are regarded as equivalent replacements and are included in the scope of protection of the present invention.

Claims (4)

1. An instant positioning and mapping method suitable for dynamic environment, characterized by comprising the following steps:
firstly, the depth camera and the encoder are extrinsically calibrated to obtain the transformation matrix from encoder coordinates to depth camera coordinates; then each frame of image collected by the depth camera is layered with an image pyramid, and ORB feature points are extracted from every layer of the pyramid; the data obtained from the encoder serve as the constant-velocity model of the tracking module: the feature points of the previous keyframe are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched keypoints;
step two, the tracking module judges whether the current frame is selected as a keyframe, according to whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame, and then transmits the keyframe to the dynamic object removal module, which performs semantic segmentation on the keyframe with a network;
when the tracking module generates a new keyframe, the color image and depth image corresponding to the keyframe are published as a ROS topic, and the dynamic object removal module thread receives the topic, thereby obtaining the color and depth images from which dynamic pixels must be removed; because the semantic segmentation network used in the keyframe dynamic object removal module is implemented on the PyTorch platform and must be written in Python, while the other modules are written in C++, the ROS communication mechanism is used to bridge the two platforms;
before semantic segmentation can be carried out, the segmentation network needs to be trained; an ICNet network is trained on the PASCAL VOC 2012 dataset, chosen because it labels the background as one class and labels different objects including people, tables and chairs as other classes, which makes it convenient to separate the background from the dynamic objects during semantic segmentation of the image;
after the color and depth images are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted into a three-channel color image with OpenCV; the labels marked on the color image are then colored with a pre-made color bar, and the final semantic segmentation result is obtained after coloring is finished;
semantic segmentation divides the picture with pre-defined labels: label class 0 is the background and label class 15 is "person", and for an indoor dynamic scene the dynamic objects to be removed are people; therefore the image is first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with the OpenCV region-dilation function; then, in the depth map, the depth values of pixels whose value is 255 in the corresponding binary image are set to zero, realizing the elimination of the dynamic pixels;
finally, the depth map with dynamic pixels removed and the semantically segmented color map are sent back to the SLAM system in topic form;
thirdly, the keyframe selected by the tracking module is projected in the local map optimization module to obtain map points, which are then optimized with local-map BA (bundle adjustment); loop detection is then performed in the loop detection module with the DBoW2 bag-of-words model, and all past map points and keyframes are globally optimized with BA, so that both the robot pose and the map points are optimized;
step four, the globally optimized keyframes and the semantic segmentation results are received, the keyframes are projected into a 3D point cloud, the point cloud is colored according to the different semantic segmentation labels, and a semantic octree map is then built in the semantic octree map module;
the working steps of the tracking module are as follows:
the depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to the two topics to receive the collected color and depth images; the mobile robot publishes encoder data at 90 Hz, the SLAM system obtains the encoder data by subscribing to the topic, and the system then preprocesses the encoder data and the images separately;
the image preprocessing is divided into preprocessing of the color image and of the depth image:
the color image obtained from the topic has three channels and must be converted into a single-channel grayscale image with an OpenCV library function; the depth map as received also has three channels and must be converted into a single-channel CV_32F image through a mapping factor;
after the data are preprocessed, the initialization part of the tracking module is entered and the first frame image is selected as a keyframe; then, each time a new frame arrives, the main thread of the tracking module, comprising the following steps (1) to (5), is entered:
(1) Use the image pyramid to extract ORB feature points, including ORB keypoints and descriptors, at 12 pyramid scales;
(2) Re-project the feature points using the encoder model. The transformation matrix of the robot coordinates from time t to t+1 is obtained in the data preprocessing; the acquisition frequency of the encoder is 3 times that of the camera, and the encoder data and camera data carry the same timestamps and correspond to each other, so the transformation matrix from the current frame to the previous frame is obtained as

T_cr = T_rc · T_rr · T_rc⁻¹

where T_rr denotes the transformation matrix of the robot coordinates from t to t+1. Let p_c be the camera coordinate of a pixel of the current frame, and let π⁻¹ denote the pinhole-model operation that converts a 2D pixel coordinate into the camera coordinates of a 3D point; the result of projecting the current-frame pixel p_c onto the previous frame is expressed as

p_c′ = π(T_cr · π⁻¹(p_c))

where T_rc is the extrinsic parameter matrix of the robot, determined by the relative poses of the robot and the camera. Matching feature points are then searched within a 3σ pixel region around p_c′; the encoder error obeys a normal distribution, and σ is the standard deviation of that distribution;
(3) Then select the keypoint with the highest descriptor matching score in the region as the matching point; traverse all matching points and check their depth values by re-projection to eliminate mismatches caused by the dynamic environment. First project the matched feature-point pair to world coordinates to obtain a 3D point p, then project this 3D point into the nearest keyframe of the frame, reconstructing the map point on that keyframe as

p′ = T_kw · p

where T_kw is the transformation from world coordinates to the keyframe camera coordinates. If the map point corresponding to this matching point is static, the depth values of p′ and p are unchanged; therefore, if the difference between the depth values is larger than an empirical threshold d_th, the matching point is considered a dynamic point and its depth value is set to zero; d_th is obtained from the formula d_th = 0.2 + 0.025·z′;
(4) Call the EPnP function provided in OpenCV and compute the pose between the two frames of pictures with the EPnP function and the matched keypoints;
(5) Finally, perform the keyframe judgment: decide whether the current frame satisfies the conditions for being a keyframe, and regard it as a keyframe if it meets all of the following; the map points are obtained by converting the feature points on keyframes into world coordinates;
(1) more than 20 frames have passed since the last global relocalization;
(2) the local mapping thread is idle, or more than 20 frames have passed since the last keyframe was inserted;
(3) the current frame tracks at least 50 map points;
(4) the map points tracked by the current frame are fewer than 90% of the map points in the reference keyframe;
the semantic octree map module comprises the following working steps:
after loop detection finishes, the semantic octree map module receives the color image and depth image obtained after semantic segmentation; let p_a = [x_a, y_a, 1]^T be a pixel of the color image; p_a is converted to world coordinates as

P_w = T_wc · z_a · K⁻¹ · p_a

where K is the camera intrinsic matrix, z_a is the depth value at p_a, and T_wc is the camera-to-world transformation;
each pixel of the color map is traversed and converted into world coordinates, yielding 3D points in the world coordinate system; the 3D points are then converted into the PCL color point-cloud type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point cloud and whose last three dimensions are its color; after segmentation, different labels have different colors in the color image, and the point-cloud data are colored accordingly; class-0 labels are the background, so after the pixels corresponding to class-0 labels are converted into point-cloud data the points are set to silvery white, and the other labels are set to different colors in turn;
the point-cloud data containing semantic information are converted directly into an octree map: the octomap library is called in C++, a ColorOcTree octree object is created, the point-cloud data are inserted into the octree map, and finally the resolution of the octree map is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the finished octree is then published on the ROS topic /octomap.
2. The method of claim 1, wherein the pre-processing of the encoder data is specifically:
firstly, at time t the pose of the robot is ξ_t = [x_t, y_t, θ_t]^T, and according to the odometer motion model the pose of the robot at time t+1 is obtained as:

ξ_{t+1} = ξ_t + [ Δs·cos(θ_t + Δθ/2),  Δs·sin(θ_t + Δθ/2),  Δθ ]^T

where Δs is the distance moved by the center of the robot and Δθ is the angle through which the robot turns; the readings of the left and right code wheels of the encoder are obtained by subscribing to the encoder topic, and the readings and the robot motion satisfy the following relations:

Δs = (Δs_r + Δs_l) / 2,  Δθ = (Δs_r − Δs_l) / b

where b is the wheelbase between the two wheels, Δs_r is the reading increment of the right code wheel from t to t+1 and Δs_l is the reading increment of the left code wheel from t to t+1; the transformation matrix of the robot coordinates from t to t+1 is therefore obtained as:

T_rr =
[ cos Δθ   −sin Δθ   Δs·cos(Δθ/2) ]
[ sin Δθ    cos Δθ   Δs·sin(Δθ/2) ]
[ 0         0        1            ]
3. The method of claim 1, wherein the local map optimization module is configured to manage map points and keyframes and to optimize the transformation matrices between map points and keyframes through local-map BA; the working steps of the local map optimization module are as follows:
A. Map point culling: remove from the system the map points that do not meet the following conditions:
(1) the point must be tracked in more than 25% of the keyframes that are predicted to observe it;
(2) it must be observed by more than three key frames;
B. Create new map points: after a new keyframe is generated, feature matching is performed between this keyframe and the keyframes connected to it, and the matched feature points are triangulated to obtain 3D points, which are the new map points;
C. Local-map BA optimization: a cost function between map points and feature points is built from the re-projection relationship between the map points and the keyframe feature points, and the camera extrinsic parameters are then adjusted to minimize the error value of the cost function;
D. Keyframe elimination: if more than 90% of the keypoints of a keyframe can be observed by more than 3 other keyframes, that keyframe is considered redundant and is eliminated.
4. The method of claim 1, wherein the loop detection module comprises the following steps:
after the local map optimization module finishes, the keyframes are transmitted to the loop detection module, which compares the bag-of-words vector of the current keyframe against the bag-of-words vectors of the other keyframes with a similarity test, as in the following formula:

s(v_1, v_2) = 1 − (1/2)·‖ v_1/‖v_1‖ − v_2/‖v_2‖ ‖

where s(v_1, v_2) is the result of the similarity test, and v_1 and v_2 are bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the library functions of the bag-of-words model; if the similarity between the current keyframe and some previous keyframe exceeds 1.1 times the similarity threshold, the two keyframes are considered to have the same pose, loop detection succeeds, and the result is used as a condition for global keyframe optimization; the similarity threshold is obtained by computing the similarity between the keyframe and its four neighboring keyframes and taking the highest similarity value.
CN201910848375.6A 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment Active CN110827395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910848375.6A CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910848375.6A CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Publications (2)

Publication Number Publication Date
CN110827395A CN110827395A (en) 2020-02-21
CN110827395B true CN110827395B (en) 2023-01-20

Family

ID=69547981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910848375.6A Active CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Country Status (1)

Country Link
CN (1) CN110827395B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100877071B1 (en) * 2007-07-18 2009-01-07 삼성전자주식회사 Method and apparatus of pose estimation in a mobile robot based on particle filter
KR101572851B1 (en) * 2008-12-22 2015-11-30 삼성전자 주식회사 Method for building map of mobile platform in dynamic environment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106123890A (en) * 2016-06-14 2016-11-16 Hefei Institutes of Physical Science, Chinese Academy of Sciences A robot localization method based on fusion
CN107516326A (en) * 2017-07-14 2017-12-26 Institute of Computing Technology, Chinese Academy of Sciences Robot localization method and system fusing monocular vision and encoder information
CN108596974A (en) * 2018-04-04 2018-09-28 Tsinghua University Dynamic scene robot localization and mapping system and method
CN108594816A (en) * 2018-04-23 2018-09-28 Changsha University A method and system for localization and mapping based on an improved ORB-SLAM algorithm
CN109753913A (en) * 2018-12-28 2019-05-14 Southeast University Computationally efficient multi-modal video semantic segmentation method
CN110097553A (en) * 2019-04-10 2019-08-06 Southeast University Semantic mapping system based on simultaneous localization and 3D semantic segmentation
CN110095111A (en) * 2019-05-10 2019-08-06 Guangdong University of Technology Map scene construction method, construction system, and related apparatus

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DRE-SLAM: Dynamic RGB-D Encoder SLAM for a Differential-Drive Robot; Dongsheng Yang et al.; Remote Sensing; 2019-02-13; pp. 1-29 *
ICNet for Real-Time Semantic Segmentation on High-Resolution Images; Zhao H et al.; Proceedings of the European Conference on Computer Vision (ECCV); 2018-12-31; pp. 405-420 *
Research and Implementation of Simultaneous Localization and Sparse Map Construction Based on Monocular Vision; Wu Yong; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15; pp. I138-1304 *
Research on Fruit Tree Trunk Detection and Navigation for LiDAR-based Agricultural Mobile Robots; Zhang Yingying; China Master's Theses Full-text Database, Information Science and Technology; 2017-06-15; pp. I140-174 *
Research on Key Technologies of Vision-based Semantic SLAM; Wang Zemin; China Master's Theses Full-text Database, Information Science and Technology; 2019-05-15; pp. I138-1503 *
Research on Real-time Image Processing Algorithms in Mobile Systems; Ma Chaoqun; China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2018-04-15; pp. I138-2363 *

Also Published As

Publication number Publication date
CN110827395A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110827395B (en) Instant positioning and map construction method suitable for dynamic environment
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN108229366B (en) Deep learning vehicle-mounted obstacle detection method based on radar and image data fusion
CN112734852B (en) Robot mapping method and device and computing equipment
WO2019153245A1 (en) Systems and methods for deep localization and segmentation with 3d semantic map
CN110706248A (en) Visual perception mapping algorithm based on SLAM and mobile robot
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN114413881B (en) Construction method, device and storage medium of high-precision vector map
CN112825192B (en) Object identification system and method based on machine learning
CN112101160B (en) Binocular semantic SLAM method for automatic driving scene
CN111179344A (en) Efficient mobile robot SLAM system for repairing semantic information
CN109934873A (en) Mark image acquiring method, device and equipment
CN112541423A (en) Synchronous positioning and map construction method and system
CN116643291A (en) SLAM method for removing dynamic targets by combining vision and laser radar
CN117523461B (en) Moving target tracking and positioning method based on airborne monocular camera
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance
WO2020199072A1 (en) Autonomous driving dataset generation with automatic object labelling methods and apparatuses
CN113628334A (en) Visual SLAM method, device, terminal equipment and storage medium
CN112509110A (en) Automatic image data set acquisition and labeling framework for land confrontation intelligent agent
CN117132620A (en) Multi-target tracking method, system, storage medium and terminal for automatic driving scene
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant