CN110827395A - Instant positioning and map construction method suitable for dynamic environment

Instant positioning and map construction method suitable for dynamic environment

Info

Publication number
CN110827395A
Authority
CN
China
Prior art keywords
map
frame
image
key
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910848375.6A
Other languages
Chinese (zh)
Other versions
CN110827395B (en)
Inventor
陈文峰
蔡述庭
李丰
徐伟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910848375.6A priority Critical patent/CN110827395B/en
Publication of CN110827395A publication Critical patent/CN110827395A/en
Application granted granted Critical
Publication of CN110827395B publication Critical patent/CN110827395B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/005 Tree description, e.g. octree, quadtree
    • G06T 17/05 Geographic models
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Abstract

The invention discloses an instant positioning and map construction (SLAM) method suitable for a dynamic environment, which comprises the following steps: first, extrinsic calibration is performed between a depth camera and an encoder to obtain the transformation matrix from encoder coordinates to depth-camera coordinates; each frame of image collected by the depth camera is then layered with an image pyramid, and ORB feature points are extracted from every pyramid layer; the encoder data serve as the constant-velocity model of the tracking module, the feature points of the previous key frame are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched key points. The tracking part of the invention uses the encoder data as the initial pose for camera re-projection, achieving accurate localization and tracking in a dynamic environment while saving a large amount of time compared with conventional semantic SLAM tracking modules, which must remove dynamic pixels in every frame.

Description

Instant positioning and map construction method suitable for dynamic environment
Technical Field
The invention relates to the technical field of positioning navigation and map building of mobile robots, in particular to an instant positioning and map building method suitable for a dynamic environment.
Background
SLAM (simultaneous localization and mapping) is widely applied in mobile robotics, unmanned aerial vehicles, augmented reality and other fields. In a dynamic environment, however, a conventional SLAM system is not robust. In a real scene full of dynamic objects, the tracking module of a classic SLAM system such as ORB_SLAM fails; worse, the classic system builds a static map from which dynamic objects cannot be removed, so the map becomes unusable. To adapt SLAM systems to dynamic scenes, the literature proposes various solutions. DS_SLAM is one effective solution: it performs semantic segmentation on every input frame, computes an essential matrix from the segmented image, uses the essential matrix to decide whether image regions belong to dynamic objects, and rejects them if they do. However, because every frame is semantically segmented, the system can hardly run in real time on a mobile platform, and the dense point cloud map it builds occupies a large amount of memory.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an instant positioning and map construction method suitable for a dynamic environment.
The purpose of the invention is realized by the following technical scheme:
An instant positioning and mapping method suitable for a dynamic environment comprises the following steps:
Step one, extrinsic calibration is performed between the depth camera and the encoder to obtain the transformation matrix from encoder coordinates to depth-camera coordinates; each frame of image collected by the depth camera is then layered with an image pyramid, and ORB feature points are extracted from every pyramid layer; the encoder data serve as the constant-velocity model of the tracking module, the feature points of the previous key frame are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched key points;
Step two, the tracking module decides whether the current frame is selected as a key frame based on whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame; the key frame is then passed to the dynamic object removal module, which performs semantic segmentation of the key frame with a neural network;
Step three, the key frame selected by the tracking module is projected in the local map optimization module to obtain map points, which are then optimized with local BA; the loop detection module then uses a DBoW2 bag-of-words model for loop detection, all map points and key frames are globally optimized with BA, and the robot pose and the map points are refined;
Step four, the globally optimized key frames and the semantic segmentation results are received, the key frames are projected into a 3D point cloud, the point cloud is colored according to the different semantic segmentation labels, and a semantic octree map is then built in the semantic octree map module.
Preferably, the tracking module works as follows:
The depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to these two topics to receive the collected color and depth images. The mobile robot publishes encoder data at 90 Hz, which the SLAM system also obtains by subscribing to the corresponding topic; the system then preprocesses the encoder data and the images separately.
Preferably, the preprocessing of the encoder data is specifically:
First, at time t the robot pose is ξ_t = [x_t  y_t  z_t]^T, and from the odometer motion model the robot pose at time t+1 is obtained as follows (equation image not reproduced),
where Δs is the distance moved by the robot center and Δθ is the angle through which the robot turns. The readings of the left and right encoder discs are obtained by subscribing to the encoder topic, and they are related to the robot motion by
(equation image not reproduced)
where b is the wheel base between the two wheels, and Δs_r and Δs_l are the changes in the right and left encoder disc readings from time t to t+1. The transformation matrix of the robot coordinates from t to t+1 is therefore
(equation image not reproduced)
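The standard differential-drive odometry relations that these missing equations presumably correspond to are sketched below; treating the third pose component as the heading θ_t is our assumption, not a quotation of the patent:

    Δs = (Δs_r + Δs_l) / 2,        Δθ = (Δs_r − Δs_l) / b

    x_{t+1} = x_t + Δs·cos(θ_t + Δθ/2)
    y_{t+1} = y_t + Δs·sin(θ_t + Δθ/2)
    θ_{t+1} = θ_t + Δθ

The corresponding transformation of the robot coordinates from t to t+1 would then be a planar rotation by Δθ combined with the translation (x_{t+1} − x_t, y_{t+1} − y_t).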
preferably, the preprocessing the image is divided into preprocessing a color image and a depth image, specifically:
The color image obtained from the topic has three channels and must be converted to a single-channel grayscale image with an OpenCV library function; the depth map is likewise a three-channel image and needs to be converted by a mapping factor into a single-channel CV_32F image;
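As a minimal illustration, this conversion could be done with OpenCV in C++ roughly as follows; the 1/1000 scale factor (millimetres to metres) and the single-channel extraction are assumptions, since the text only states that a mapping factor is used:

    #include <opencv2/opencv.hpp>

    // Convert incoming frames to the formats the tracker expects.
    void preprocessFrames(const cv::Mat& colorIn, const cv::Mat& depthIn,
                          cv::Mat& grayOut, cv::Mat& depthOut) {
        // Three-channel BGR color image -> single-channel grayscale image.
        cv::cvtColor(colorIn, grayOut, cv::COLOR_BGR2GRAY);

        // Depth: keep a single channel, then rescale into a CV_32F image.
        cv::Mat depthSingle = depthIn;
        if (depthIn.channels() > 1)
            cv::extractChannel(depthIn, depthSingle, 0);
        // Assumed mapping factor: raw depth in millimetres -> metres.
        depthSingle.convertTo(depthOut, CV_32F, 1.0 / 1000.0);
    }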
After the data are preprocessed, the initialization part of the tracking module is entered and the first frame is selected as a key frame; then, whenever a new frame arrives, the main thread of the tracking module runs the following steps:
(1) Using the image pyramid, extract ORB feature points, i.e. ORB key points and descriptors, at 12 scale levels;
(2) Re-project the feature points using the encoder model. The data preprocessing yields the transformation matrix of the robot coordinates from t to t+1; because the encoder acquisition frequency is three times the camera frequency and the encoder and camera data carry matching time stamps, the transformation matrix from the current frame to the previous frame can be obtained (equation image not reproduced).
Let p_c be the camera coordinate of a pixel of the current frame; in the camera pinhole model, converting a 2D pixel coordinate into the camera coordinate of a 3D point is a function operation. The result of projecting the current-frame pixel p_c into the previous frame can then be expressed as p_c' (equation image not reproduced), where the matrix in that expression (equation image not reproduced) is the extrinsic matrix of the robot, determined by the relative pose of the robot and the camera. Matched feature points are then searched within a 3σ pixel region around p_c'; the encoder error is assumed to follow a normal distribution with standard deviation σ;
(3) Select the key point with the highest descriptor matching score within the region as the matching point. All matching points are then traversed and their depth values are re-projected to reject mismatches caused by the dynamic environment: the matched feature point pair is first projected to world coordinates to obtain a 3D point p, which is then projected into the key frame nearest to this frame, and the map point p' is reconstructed on that key frame (equation image not reproduced).
If the map point corresponding to the matching point is static, the depth value between p' and p is unchanged; therefore, if the difference between the depth values exceeds an empirical threshold d_th, the matching point is a dynamic point and its depth value is set to zero. The threshold is obtained from d_th = d_b + k_d·z', where k_d is related to scale and d_b is a base threshold; typically d_b = 0.2 m and k_d = 0.025 (a minimal sketch of this test is given after this list);
(4) Call the PnP function provided by OpenCV and compute the pose between the two frames from the matched key points using EPnP;
(5) Finally, key-frame selection: the current frame is judged against the conditions for being a key frame and is taken as one if it satisfies all of the following (map points are obtained by transforming the feature points of key frames into world coordinates):
① more than 20 frames have passed since the last global relocalization;
② the local mapping thread is idle, or more than 20 frames have passed since the last key-frame insertion;
③ the current frame tracks at least 50 map points;
④ the current frame tracks less than 90% of the map points of the reference key frame.
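A minimal C++ sketch of the dynamic-point depth test in step (3); the data structure and the example values are illustrative assumptions, and only the threshold formula and the constants d_b = 0.2 m, k_d = 0.025 come from the text:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Simplified stand-in for one matched feature pair.
    struct MatchedPoint {
        float zMapPoint;   // depth z' of the map point re-projected onto the key frame
        float zCurrent;    // depth of the matched point p in the current frame
    };

    // A match is considered static only if the two depths agree within the
    // adaptive threshold d_th = d_b + k_d * z'.
    bool isStaticMatch(const MatchedPoint& m, float d_b = 0.2f, float k_d = 0.025f) {
        const float d_th = d_b + k_d * m.zMapPoint;
        return std::fabs(m.zMapPoint - m.zCurrent) <= d_th;
    }

    int main() {
        std::vector<MatchedPoint> matches = {{2.0f, 2.05f}, {2.0f, 1.2f}};
        for (const auto& m : matches) {
            // Dynamic matches have their depth set to zero and are excluded
            // from the subsequent EPnP pose estimation.
            std::printf("match is %s\n", isStaticMatch(m) ? "static" : "dynamic");
        }
        return 0;
    }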
Preferably, the local map optimization module mainly manages map points and key frames, and optimizes a transformation matrix between the map points and the key frames through local BA; the working steps of the local map optimization module are as follows:
(1) Map point culling: map points that do not satisfy the following conditions are removed from the system:
① of the key frames predicted to be able to observe the point, more than 25% actually track it;
② the point must be observed by more than three key frames;
(2) New map point creation: after a new key frame is generated, it is feature-matched with the key frames connected to it, and the matched feature points are triangulated into 3D points, which become new map points;
(3) Local map BA optimization: using the re-projection relationship between map points and key-frame feature points, a cost function between map points and feature points is established, and the camera extrinsic parameters are adjusted so that the error value of the cost function is minimized (a hedged form of this cost is sketched after this list);
(4) Key frame culling: if more than 90% of the key points of a key frame can be observed by more than 3 other key frames, the key frame is considered redundant and is removed.
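For reference, a common form of the local BA cost implied in step (3) is sketched below with our own notation (an assumption, not the patent's formula): with K the camera intrinsics, T_j the pose of key frame j, P_i a map point, u_{ij} the observed feature location, and ρ a robust kernel,

    min over {T_j}, {P_i} of  Σ_{i,j}  ρ( ‖ u_{ij} − π(K, T_j · P_i) ‖² )

where π(·) denotes pinhole projection of a 3D point into the image.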
Preferably, the dynamic object elimination module comprises the following working steps:
When the tracking module generates a new key frame, the color image and depth image corresponding to that key frame are published as a ROS topic, and the dynamic object removal thread receives the topic, thereby obtaining the color image and depth image from which dynamic pixels must be removed. Because the semantic segmentation network used in the key-frame dynamic object removal module runs on the PyTorch platform and must be implemented in Python, while the other modules are implemented in C++, ROS communication is used to bridge the two platforms;
Before semantic segmentation can be performed, the segmentation network must be trained; the ICNet network is trained on the PASCAL VOC 2012 dataset. This dataset is chosen because it labels the background as one class and other objects, such as people, tables and chairs, as separate classes, which makes it convenient to separate the background from dynamic objects when an image is semantically segmented;
After the color image and the depth image are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted to a three-channel color image with OpenCV, and the labels on the color image are then colored with a pre-made color bar; when the coloring is finished, the final semantic segmentation result is obtained;
Semantic segmentation partitions the picture using pre-defined labels, in which class 0 is the background and class 15 is a person; for an indoor dynamic scene the dynamic object to be removed is mainly a person. The image is therefore first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with an OpenCV region-expansion function; finally, in the depth map, the depth value of every pixel that is 255 in the corresponding binary image is set to zero, removing the dynamic pixels;
Finally, the depth map with the dynamic pixels removed and the semantically segmented color map are published back to the SLAM system as a topic; a minimal sketch of the binarize-dilate-and-zero step is given below.
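A minimal OpenCV/C++ sketch of that step; the assumption is that the segmentation result arrives as a single-channel label image with person = class 15, and the 15×15 structuring element is an illustrative choice not stated in the patent:

    #include <opencv2/opencv.hpp>

    // Remove dynamic (person) pixels from the depth map using the segmentation labels.
    void removeDynamicPixels(const cv::Mat& labels, cv::Mat& depth) {
        // Binarize: person pixels (class 15) -> 255, everything else -> 0.
        cv::Mat personMask = (labels == 15);

        // Dilate the person region so that boundary pixels are removed as well.
        cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(15, 15));
        cv::dilate(personMask, personMask, kernel);

        // Zero the depth wherever the mask is set, excluding these pixels from mapping.
        depth.setTo(0.0f, personMask);
    }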
Preferably, the working steps of the loop detection module are as follows:
After the local map optimization module finishes, the key frame is passed to the loop detection module, which compares the bag-of-words vector of the current key frame against the bag-of-words vectors of the other key frames with the similarity test shown in the following formula
(equation image not reproduced)
where v_1 and v_2 are the bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the bag-of-words model's library functions. If the similarity between the current key frame and some earlier key frame is more than 1.1 times the similarity threshold, the two key frames are considered to have the same pose, loop detection succeeds, and this is used as a condition for global key-frame optimization. The similarity threshold is obtained by computing the similarity between the key frame and its four neighbouring key frames and taking the highest of these values.
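The missing similarity formula is presumably the standard DBoW2 score between two normalized bag-of-words vectors, given here as an assumption rather than a quotation of the patent:

    s(v_1, v_2) = 1 − (1/2) · ‖ v_1/‖v_1‖ − v_2/‖v_2‖ ‖₁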
Preferably, the semantic octree map module works as follows:
After loop detection finishes, the semantic octree map module receives the color image and depth image obtained after semantic segmentation. Let p_a = [x_a  y_a  1]^T be a pixel in the color image; p_a is converted to world coordinates as follows
(equation image not reproduced)
Every pixel of the color map is traversed and converted to world coordinates, giving 3D points in the world coordinate system. Each 3D point is then converted into the PCL colored point type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point and whose last three dimensions are its color. After segmentation, different labels have different colors in the color image, and the point cloud is colored accordingly: class-0 (background) pixels are set to silvery white after conversion to point cloud data, and the remaining labels are assigned distinct colors in turn;
Once point cloud data containing semantic information exist, they can be converted directly into an octree map: the octomap library is called from C++, a ColorOcTree octree object is created, the point cloud data are inserted into the octree map, and the map resolution is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the constructed octree is then published on the ROS topic /octomap.
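A minimal C++ sketch of this map construction step, assuming the semantic point cloud arrives as a PCL cloud of PointXYZRGB; the octomap calls reflect our reading of the ColorOcTree API and are not quoted from the patent (publishing on /octomap is omitted):

    #include <octomap/ColorOcTree.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>

    // Insert a colored semantic point cloud into an octree map.
    // Probabilities and resolution follow the values stated in the text.
    void buildSemanticOctree(const pcl::PointCloud<pcl::PointXYZRGB>& cloud,
                             octomap::ColorOcTree& tree) {
        tree.setProbHit(0.7);
        tree.setProbMiss(0.55);
        for (const auto& pt : cloud) {
            // Mark the voxel as occupied, then attach the semantic color.
            tree.updateNode(octomap::point3d(pt.x, pt.y, pt.z), true);
            tree.integrateNodeColor(pt.x, pt.y, pt.z, pt.r, pt.g, pt.b);
        }
        tree.updateInnerOccupancy();   // propagate occupancy and color to inner nodes
    }

    // Usage: octomap::ColorOcTree tree(0.05);  buildSemanticOctree(cloud, tree);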
Compared with the prior art, the invention has the following beneficial effects:
(1) The tracking part of the invention uses encoder data as the initial pose for camera re-projection, achieving accurate localization and tracking in a dynamic environment and saving a large amount of time compared with conventional semantic SLAM tracking modules, which must remove dynamic pixels in every frame. Semantic segmentation is applied only to key frames, which is sufficient to remove dynamic pixels from the map and build a reliable static map. At the same time, the semantic segmentation part is innovatively placed on a TensorFlow platform rather than the caffe platform of conventional semantic SLAM, which improves both the speed and the accuracy of semantic segmentation;
(2) Unlike existing encoder-fusion SLAM schemes, the encoder here is used only as the constant-velocity model when the color-image feature points are re-projected; in conventional dynamic-environment SLAM that fuses encoder data, the encoder serves as the main sensor and the encoder data are also optimized in local map optimization and loop detection. Because the encoder is prone to wheel slip, its data may become invalid, so using it as the main sensor can introduce considerable error in some scenarios; using the encoder only as the constant-velocity model effectively avoids this problem.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph comparing the performance of the ICNet network used in the present invention with that of a conventional semantic segmentation network.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The invention provides an instant positioning and map construction method suitable for a dynamic environment. Encoder data are integrated on the basis of the DS_SLAM system so that the tracking part is more robust in a dynamic environment; only key frames are subjected to image segmentation and the segmentation result is used only for map building, which greatly shortens the running time. In addition, a semantic segmentation network on the TensorFlow platform is innovatively combined, and the new network performs better than semantic segmentation networks under the traditional caffe framework; a ROS multi-node communication mechanism is introduced so that a separate node thread is opened for semantic segmentation; finally, a semantic octree map is built from the semantic segmentation result.
Specifically, as shown in fig. 1 to 2, a method for instant positioning and map construction suitable for a dynamic environment includes the following steps:
Step one, extrinsic calibration is performed between the depth camera and the encoder to obtain the transformation matrix from encoder coordinates to depth-camera coordinates; each frame of image collected by the depth camera is then layered with an image pyramid, and ORB feature points are extracted from every pyramid layer; the encoder data serve as the constant-velocity model of the tracking module, the feature points of the previous key frame are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched key points;
the tracking module comprises the following working steps:
The depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to these two topics to receive the collected color and depth images. The mobile robot publishes encoder data at 90 Hz, which the SLAM system also obtains by subscribing to the corresponding topic; the system then preprocesses the encoder data and the images separately.
The encoder data preprocessing specifically comprises:
First, at time t the robot pose is ξ_t = [x_t  y_t  z_t]^T, and from the odometer motion model the robot pose at time t+1 is obtained as follows
(equation image not reproduced)
where Δs is the distance moved by the robot center and Δθ is the angle through which the robot turns. The readings of the left and right encoder discs are obtained by subscribing to the encoder topic, and they are related to the robot motion by
(equation image not reproduced)
where b is the wheel base between the two wheels, and Δs_r and Δs_l are the changes in the right and left encoder disc readings from time t to t+1. The transformation matrix of the robot coordinates from t to t+1 is therefore
(equation image not reproduced)
The image preprocessing is divided into preprocessing of the color image and of the depth image:
The color image obtained from the topic has three channels and must be converted to a single-channel grayscale image with an OpenCV library function; the depth map is likewise a three-channel image and needs to be converted by a mapping factor into a single-channel CV_32F image;
After the data are preprocessed, the initialization part of the tracking module is entered and the first frame is selected as a key frame; then, whenever a new frame arrives, the main thread of the tracking module runs the following steps:
(1) Using the image pyramid, extract ORB feature points, i.e. ORB key points and descriptors, at 12 scale levels;
(2) Re-project the feature points using the encoder model. The data preprocessing yields the transformation matrix of the robot coordinates from t to t+1; because the encoder acquisition frequency is three times the camera frequency and the encoder and camera data carry matching time stamps, the transformation matrix from the current frame to the previous frame can be obtained (equation image not reproduced).
Let p_c be the camera coordinate of a pixel of the current frame; in the camera pinhole model, converting a 2D pixel coordinate into the camera coordinate of a 3D point is a function operation. The result of projecting the current-frame pixel p_c into the previous frame can then be expressed as p_c' (equation image not reproduced), where the matrix in that expression (equation image not reproduced) is the extrinsic matrix of the robot, determined by the relative pose of the robot and the camera. Matched feature points are then searched within a 3σ pixel region around p_c'; the encoder error is assumed to follow a normal distribution with standard deviation σ;
(3) Select the key point with the highest descriptor matching score within the region as the matching point. All matching points are then traversed and their depth values are re-projected to reject mismatches caused by the dynamic environment: the matched feature point pair is first projected to world coordinates to obtain a 3D point p, which is then projected into the key frame nearest to this frame, and the map point p' is reconstructed on that key frame (equation image not reproduced).
If the map point corresponding to the matching point is static, the depth value between p' and p is unchanged; therefore, if the difference between the depth values exceeds an empirical threshold d_th, the matching point is a dynamic point and its depth value is set to zero. The threshold is obtained from d_th = d_b + k_d·z', where k_d is related to scale and d_b is a base threshold; typically d_b = 0.2 m and k_d = 0.025;
(4) Call the PnP function provided by OpenCV and compute the pose between the two frames from the matched key points using EPnP;
(5) Finally, key-frame selection: the current frame is judged against the conditions for being a key frame and is taken as one if it satisfies all of the following (map points are obtained by transforming the feature points of key frames into world coordinates):
① more than 20 frames have passed since the last global relocalization;
② the local mapping thread is idle, or more than 20 frames have passed since the last key-frame insertion;
③ the current frame tracks at least 50 map points;
④ the current frame tracks less than 90% of the map points of the reference key frame.
Step two, the tracking module decides whether the current frame is selected as a key frame based on whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame; the key frame is then passed to the dynamic object removal module, which performs semantic segmentation of the key frame with a neural network;
the dynamic object removing module comprises the following working steps:
When the tracking module generates a new key frame, the color image and depth image corresponding to that key frame are published as a ROS topic, and the dynamic object removal thread receives the topic, thereby obtaining the color image and depth image from which dynamic pixels must be removed. Because the semantic segmentation network used in the key-frame dynamic object removal module runs on the PyTorch platform and must be implemented in Python, while the other modules are implemented in C++, ROS communication is used to bridge the two platforms;
Before semantic segmentation can be performed, the segmentation network must be trained; the ICNet network is trained on the PASCAL VOC 2012 dataset. This dataset is chosen because it labels the background as one class and other objects, such as people, tables and chairs, as separate classes, which makes it convenient to separate the background from dynamic objects when an image is semantically segmented;
After the color image and the depth image are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted to a three-channel color image with OpenCV, and the labels on the color image are then colored with a pre-made color bar; when the coloring is finished, the final semantic segmentation result is obtained;
Semantic segmentation partitions the picture using pre-defined labels, in which class 0 is the background and class 15 is a person; for an indoor dynamic scene the dynamic object to be removed is mainly a person. The image is therefore first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with an OpenCV region-expansion function; finally, in the depth map, the depth value of every pixel that is 255 in the corresponding binary image is set to zero, removing the dynamic pixels;
Finally, the depth map with the dynamic pixels removed and the semantically segmented color map are published back to the SLAM system as a topic.
Step three, the key frame selected by the tracking module is projected in the local map optimization module to obtain map points, which are then optimized with local BA; the loop detection module then uses a DBoW2 bag-of-words model for loop detection, all map points and key frames are globally optimized with BA, and the robot pose and the map points are refined;
the local map optimization module is mainly used for managing map points and key frames and optimizing a conversion matrix between the map points and the key frames through local BA; the working steps of the local map optimization module are as follows:
(1) Map point culling: map points that do not satisfy the following conditions are removed from the system:
① of the key frames predicted to be able to observe the point, more than 25% actually track it;
② the point must be observed by more than three key frames;
(2) New map point creation: after a new key frame is generated, it is feature-matched with the key frames connected to it, and the matched feature points are triangulated into 3D points, which become new map points;
(3) Local map BA optimization: using the re-projection relationship between map points and key-frame feature points, a cost function between map points and feature points is established, and the camera extrinsic parameters are adjusted so that the error value of the cost function is minimized;
(4) Key frame culling: if more than 90% of the key points of a key frame can be observed by more than 3 other key frames, the key frame is considered redundant and is removed.
The working steps of the loop detection module are as follows:
After the local map optimization module finishes, the key frame is passed to the loop detection module, which compares the bag-of-words vector of the current key frame against the bag-of-words vectors of the other key frames with the similarity test shown in the following formula
(equation image not reproduced)
where v_1 and v_2 are the bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the bag-of-words model's library functions. If the similarity between the current key frame and some earlier key frame is more than 1.1 times the similarity threshold, the two key frames are considered to have the same pose, loop detection succeeds, and this is used as a condition for global key-frame optimization. The similarity threshold is obtained by computing the similarity between the key frame and its four neighbouring key frames and taking the highest of these values.
Step four, the globally optimized key frames and the semantic segmentation results are received, the key frames are projected into a 3D point cloud, the point cloud is colored according to the different semantic segmentation labels, and a semantic octree map is then built in the semantic octree map module.
The semantic octree map module comprises the following working steps:
After loop detection finishes, the semantic octree map module receives the color image and depth image obtained after semantic segmentation. Let p_a = [x_a  y_a  1]^T be a pixel in the color image; p_a is converted to world coordinates as follows
(equation image not reproduced)
Every pixel of the color map is traversed and converted to world coordinates, giving 3D points in the world coordinate system. Each 3D point is then converted into the PCL colored point type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point and whose last three dimensions are its color. After segmentation, different labels have different colors in the color image, and the point cloud is colored accordingly: class-0 (background) pixels are set to silvery white after conversion to point cloud data, and the remaining labels are assigned distinct colors in turn;
Once point cloud data containing semantic information exist, they can be converted directly into an octree map: the octomap library is called from C++, a ColorOcTree octree object is created, the point cloud data are inserted into the octree map, and the map resolution is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the constructed octree is then published on the ROS topic /octomap.
Conventional semantic segmentation SLAM is generally based on the caffe framework, under which few semantic segmentation networks are available to choose from, and even then the segmentation is slow and the effect is poor. In the present invention, as shown in FIG. 2, the abscissa is the number of frames processed per second and the ordinate is the segmentation accuracy in mIoU (%); in terms of both forward running time and accuracy, ICNet is far superior to the SegNet network used in conventional semantic segmentation frameworks, and its performance reaches real time.
Compared with prior-art approaches such as the SLAM method based on laser radar in a dynamic environment, the robot SLAM object-state detection method in a dynamic sparse environment, and the indoor personnel autonomous positioning method based on the fusion of SLAM and a gait IMU, the main innovations of the invention are:
1. An innovative way of fusing encoder data is used: the encoder data serve only as the transformation matrix for re-projection between the current frame and the key frame of the camera, realizing a shallow fusion of encoder data and camera data and avoiding the large impact on the system that encoder errors caused by wheel slip would otherwise have.
2. C++/Python multi-node communication is realized with the ROS system, so that a semantic segmentation network under the TensorFlow framework is innovatively added to the SLAM system, effectively improving the semantic segmentation performance.
3. A semantic octree map is innovatively constructed, making full use of the image information obtained from semantic segmentation and building a more complete and practical semantic map.
The tracking part of the invention uses encoder data as the initial pose for camera re-projection, achieving accurate localization and tracking in a dynamic environment and saving a large amount of time compared with conventional semantic SLAM tracking modules, which must remove dynamic pixels in every frame. Semantic segmentation is applied only to key frames, which is sufficient to remove dynamic pixels from the map and build a reliable static map. At the same time, the semantic segmentation part is innovatively placed on a TensorFlow platform rather than the caffe platform of conventional semantic SLAM, which improves both the speed and the accuracy of semantic segmentation.
Unlike existing encoder-fusion SLAM schemes, the encoder here is used only as the constant-velocity model when the color-image feature points are re-projected; in conventional dynamic-environment SLAM that fuses encoder data, the encoder serves as the main sensor and the encoder data are also optimized in local map optimization and loop detection. Because the encoder is prone to wheel slip, its data may become invalid, so using it as the main sensor can introduce considerable error in some scenarios; using the encoder only as the constant-velocity model effectively avoids this problem.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.

Claims (8)

1. An instant positioning and mapping method suitable for a dynamic environment, characterized by comprising the following steps:
Step one, extrinsic calibration is performed between the depth camera and the encoder to obtain the transformation matrix from encoder coordinates to depth-camera coordinates; each frame of image collected by the depth camera is then layered with an image pyramid, and ORB feature points are extracted from every pyramid layer; the encoder data serve as the constant-velocity model of the tracking module, the feature points of the previous key frame are projected onto the current frame and matched against the feature points of the current frame, and the pose between the two frames is obtained by triangulating the successfully matched key points;
Step two, the tracking module decides whether the current frame is selected as a key frame based on whether more than 20 frames have passed since the last global optimization and whether at least 50 map points can be detected in the current frame; the key frame is then passed to the dynamic object removal module, which performs semantic segmentation of the key frame with a neural network;
Step three, the key frame selected by the tracking module is projected in the local map optimization module to obtain map points, which are then optimized with local BA; the loop detection module then uses a DBoW2 bag-of-words model for loop detection, all map points and key frames are globally optimized with BA, and the robot pose and the map points are refined;
Step four, the globally optimized key frames and the semantic segmentation results are received, the key frames are projected into a 3D point cloud, the point cloud is colored according to the different semantic segmentation labels, and a semantic octree map is then built in the semantic octree map module.
2. The method of claim 1, wherein the tracking module comprises the following steps:
The depth camera collects color images and depth images at 30 frames per second; they are published as topics by the iai_kinect2 package in ROS, and the SLAM system subscribes to these two topics to receive the collected color and depth images. The mobile robot publishes encoder data at 90 Hz, which the SLAM system also obtains by subscribing to the corresponding topic; the system then preprocesses the encoder data and the images separately.
3. The method of claim 2, wherein the pre-processing of the encoder data is specifically:
First, at time t the robot pose is ξ_t = [x_t  y_t  z_t]^T, and from the odometer motion model the robot pose at time t+1 is obtained as follows
(equation image not reproduced)
where Δs is the distance moved by the robot center and Δθ is the angle through which the robot turns. The readings of the left and right encoder discs are obtained by subscribing to the encoder topic, and they are related to the robot motion by
(equation image not reproduced)
where b is the wheel base between the two wheels, and Δs_r and Δs_l are the changes in the right and left encoder disc readings from time t to t+1. The transformation matrix of the robot coordinates from t to t+1 is therefore
(equation image not reproduced)
4. The method of claim 2, wherein the image preprocessing is divided into preprocessing of the color image and of the depth image, specifically:
The color image obtained from the topic has three channels and must be converted to a single-channel grayscale image with an OpenCV library function; the depth map is likewise a three-channel image and needs to be converted by a mapping factor into a single-channel CV_32F image;
After the data are preprocessed, the initialization part of the tracking module is entered and the first frame is selected as a key frame; then, whenever a new frame arrives, the main thread of the tracking module runs the following steps:
(1) Using the image pyramid, extract ORB feature points, i.e. ORB key points and descriptors, at 12 scale levels;
(2) Re-project the feature points using the encoder model. The data preprocessing yields the transformation matrix of the robot coordinates from t to t+1; because the encoder acquisition frequency is three times the camera frequency and the encoder and camera data carry matching time stamps, the transformation matrix from the current frame to the previous frame can be obtained (equation image not reproduced).
Let p_c be the camera coordinate of a pixel of the current frame; in the camera pinhole model, converting a 2D pixel coordinate into the camera coordinate of a 3D point is a function operation. The result of projecting the current-frame pixel p_c into the previous frame can then be expressed as p_c' (equation image not reproduced), where the matrix in that expression (equation image not reproduced) is the extrinsic matrix of the robot, determined by the relative pose of the robot and the camera. Matched feature points are then searched within a 3σ pixel region around p_c'; the encoder error is assumed to follow a normal distribution with standard deviation σ;
(3) Select the key point with the highest descriptor matching score within the region as the matching point. All matching points are then traversed and their depth values are re-projected to reject mismatches caused by the dynamic environment: the matched feature point pair is first projected to world coordinates to obtain a 3D point p, which is then projected into the key frame nearest to this frame, and the map point p' is reconstructed on that key frame (equation image not reproduced).
If the map point corresponding to the matching point is static, the depth value between p' and p is unchanged; therefore, if the difference between the depth values exceeds an empirical threshold d_th, the matching point is a dynamic point and its depth value is set to zero. The threshold is obtained from d_th = d_b + k_d·z', where k_d is related to scale and d_b is a base threshold; typically d_b = 0.2 m and k_d = 0.025;
(4) Call the PnP function provided by OpenCV and compute the pose between the two frames from the matched key points using EPnP;
(5) Finally, key-frame selection: the current frame is judged against the conditions for being a key frame and is taken as one if it satisfies all of the following (map points are obtained by transforming the feature points of key frames into world coordinates):
① more than 20 frames have passed since the last global relocalization;
② the local mapping thread is idle, or more than 20 frames have passed since the last key-frame insertion;
③ the current frame tracks at least 50 map points;
④ the current frame tracks less than 90% of the map points of the reference key frame.
5. The method of claim 1, wherein the local map optimization module mainly manages map points and key frames, and optimizes a transformation matrix between map points and key frames through local BA; the working steps of the local map optimization module are as follows:
(1) Map point culling: map points that do not satisfy the following conditions are removed from the system:
① of the key frames predicted to be able to observe the point, more than 25% actually track it;
② the point must be observed by more than three key frames;
(2) New map point creation: after a new key frame is generated, it is feature-matched with the key frames connected to it, and the matched feature points are triangulated into 3D points, which become new map points;
(3) Local map BA optimization: using the re-projection relationship between map points and key-frame feature points, a cost function between map points and feature points is established, and the camera extrinsic parameters are adjusted so that the error value of the cost function is minimized;
(4) Key frame culling: if more than 90% of the key points of a key frame can be observed by more than 3 other key frames, the key frame is considered redundant and is removed.
6. The method for instant location and mapping in a dynamic environment as claimed in claim 1, wherein the dynamic object elimination module comprises the following steps:
When the tracking module generates a new key frame, the color image and depth image corresponding to that key frame are published as a ROS topic, and the dynamic object removal thread receives the topic, thereby obtaining the color image and depth image from which dynamic pixels must be removed. Because the semantic segmentation network used in the key-frame dynamic object removal module runs on the PyTorch platform and must be implemented in Python, while the other modules are implemented in C++, ROS communication is used to bridge the two platforms;
Before semantic segmentation can be performed, the segmentation network must be trained; the ICNet network is trained on the PASCAL VOC 2012 dataset. This dataset is chosen because it labels the background as one class and other objects, such as people, tables and chairs, as separate classes, which makes it convenient to separate the background from dynamic objects when an image is semantically segmented;
After the color image and the depth image are received, the color image is segmented with ICNet; the segmentation result is a single-channel grayscale image, which is converted to a three-channel color image with OpenCV, and the labels on the color image are then colored with a pre-made color bar; when the coloring is finished, the final semantic segmentation result is obtained;
Semantic segmentation partitions the picture using pre-defined labels, in which class 0 is the background and class 15 is a person; for an indoor dynamic scene the dynamic object to be removed is mainly a person. The image is therefore first binarized by setting the gray value of class-15 pixels to 255, and the class-15 pixel region is then dilated with an OpenCV region-expansion function; finally, in the depth map, the depth value of every pixel that is 255 in the corresponding binary image is set to zero, removing the dynamic pixels;
Finally, the depth map with the dynamic pixels removed and the semantically segmented color map are published back to the SLAM system as a topic.
7. The method of claim 1, wherein the loop detection module comprises the following steps:
After the local map optimization module finishes, the key frame is passed to the loop detection module, which compares the bag-of-words vector of the current key frame against the bag-of-words vectors of the other key frames with the similarity test shown in the following formula
(equation image not reproduced)
where v_1 and v_2 are the bag-of-words vectors; a bag-of-words vector reflects the feature-point information of one image and is obtained by calling the bag-of-words model's library functions. If the similarity between the current key frame and some earlier key frame is more than 1.1 times the similarity threshold, the two key frames are considered to have the same pose, loop detection succeeds, and this is used as a condition for global key-frame optimization. The similarity threshold is obtained by computing the similarity between the key frame and its four neighbouring key frames and taking the highest of these values.
8. The method for instant location and mapping in a dynamic environment of claim 1, wherein the semantic octree map module works as follows:
After loop detection finishes, the semantic octree map module receives the color image and depth image obtained after semantic segmentation. Let p_a = [x_a  y_a  1]^T be a pixel in the color image; p_a is converted to world coordinates as follows
(equation image not reproduced)
Every pixel of the color map is traversed and converted to world coordinates, giving 3D points in the world coordinate system. Each 3D point is then converted into the PCL colored point type PointXYZRGB, a six-dimensional variable whose first three dimensions are the world coordinates of the point and whose last three dimensions are its color. After segmentation, different labels have different colors in the color image, and the point cloud is colored accordingly: class-0 (background) pixels are set to silvery white after conversion to point cloud data, and the remaining labels are assigned distinct colors in turn;
Once point cloud data containing semantic information exist, they can be converted directly into an octree map: the octomap library is called from C++, a ColorOcTree octree object is created, the point cloud data are inserted into the octree map, and the map resolution is set to 0.05, the hit probability to 0.7 and the miss probability to 0.55; the constructed octree is then published on the ROS topic /octomap.
CN201910848375.6A 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment Active CN110827395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910848375.6A CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910848375.6A CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Publications (2)

Publication Number Publication Date
CN110827395A true CN110827395A (en) 2020-02-21
CN110827395B CN110827395B (en) 2023-01-20

Family

ID=69547981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910848375.6A Active CN110827395B (en) 2019-09-09 2019-09-09 Instant positioning and map construction method suitable for dynamic environment

Country Status (1)

Country Link
CN (1) CN110827395B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111652933A (en) * 2020-05-06 2020-09-11 Oppo广东移动通信有限公司 Monocular camera-based repositioning method and device, storage medium and electronic equipment
CN111664842A (en) * 2020-05-07 2020-09-15 苏州品坤智能科技有限公司 Instant positioning and map building system of unmanned sweeper
CN112116657A (en) * 2020-08-07 2020-12-22 中国科学院深圳先进技术研究院 Table retrieval-based simultaneous positioning and mapping method and device
CN112132857A (en) * 2020-09-18 2020-12-25 福州大学 Dynamic object detection and static map reconstruction method of dynamic environment hybrid vision system
CN112200874A (en) * 2020-10-30 2021-01-08 中国科学院自动化研究所 Multilayer scene reconstruction and rapid segmentation method, system and device in narrow space
CN112233180A (en) * 2020-10-23 2021-01-15 上海影谱科技有限公司 Map-based SLAM rapid initialization method and device and electronic equipment
CN112378409A (en) * 2020-12-01 2021-02-19 杭州宇芯机器人科技有限公司 Robot RGB-D SLAM method based on geometric and motion constraint in dynamic environment
CN113009453A (en) * 2020-03-20 2021-06-22 青岛慧拓智能机器有限公司 Mine road edge detection and map building method and device
CN113326716A (en) * 2020-02-28 2021-08-31 北京创奇视界科技有限公司 Loop detection method for guiding AR application positioning by assembling in-situ environment
CN113362363A (en) * 2021-06-18 2021-09-07 广东工业大学 Automatic image annotation method and device based on visual SLAM and storage medium
CN113763551A (en) * 2021-09-08 2021-12-07 北京易航远智科技有限公司 Point cloud-based rapid repositioning method for large-scale mapping scene
CN113902847A (en) * 2021-10-11 2022-01-07 岱悟智能科技(上海)有限公司 Monocular depth image pose optimization method based on three-dimensional feature constraint
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN114926536A (en) * 2022-07-19 2022-08-19 合肥工业大学 Semantic-based positioning and mapping method and system and intelligent robot
CN115120346A (en) * 2022-08-30 2022-09-30 中国科学院自动化研究所 Target point positioning method and device, electronic equipment and bronchoscope system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024251A1 (en) * 2007-07-18 2009-01-22 Samsung Electronics Co., Ltd. Method and apparatus for estimating pose of mobile robot using particle filter
US20100161225A1 (en) * 2008-12-22 2010-06-24 Samsung Electronics Co., Ltd. Method of building map of mobile platform in dynamic environment
CN106123890A (en) * 2016-06-14 2016-11-16 中国科学院合肥物质科学研究院 A kind of robot localization method of Fusion
CN107516326A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 Merge monocular vision and the robot localization method and system of encoder information
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
CN108594816A (en) * 2018-04-23 2018-09-28 长沙学院 A kind of method and system for realizing positioning and composition by improving ORB-SLAM algorithms
CN109753913A (en) * 2018-12-28 2019-05-14 东南大学 Calculate efficient multi-mode video semantic segmentation method
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN110095111A (en) * 2019-05-10 2019-08-06 广东工业大学 A kind of construction method of map scene, building system and relevant apparatus

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DONGSHENG YANG ET AL: "DRE-SLAM: Dynamic RGB-D Encoder SLAM for a Differential-Drive Robot", Remote Sensing *
ZHAO H ET AL: "ICNet for Real-Time Semantic Segmentation on High-Resolution Images", Proceedings of the European Conference on Computer Vision (ECCV) *
吴勇: "Research and Implementation of Simultaneous Localization and Sparse Map Construction Based on Monocular Vision", China Master's Theses Full-text Database, Information Science and Technology *
张莹莹: "Research on Fruit-Tree Trunk Detection and Navigation for a LiDAR-based Agricultural Mobile Robot", China Master's Theses Full-text Database, Information Science and Technology *
王泽民: "Research on Key Technologies of Vision-based Semantic SLAM", China Master's Theses Full-text Database, Information Science and Technology *
马超群: "Research on Real-time Image Processing Algorithms in Mobile Systems", China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326716B (en) * 2020-02-28 2024-03-01 北京创奇视界科技有限公司 Loop detection method for AR application positioning of assembly guidance of assembly site environment
CN113326716A (en) * 2020-02-28 2021-08-31 北京创奇视界科技有限公司 Loop detection method for guiding AR application positioning by assembling in-situ environment
CN113009453A (en) * 2020-03-20 2021-06-22 青岛慧拓智能机器有限公司 Mine road edge detection and map building method and device
CN111402336B (en) * 2020-03-23 2024-03-12 中国科学院自动化研究所 Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111563442B (en) * 2020-04-29 2023-05-02 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111652933A (en) * 2020-05-06 2020-09-11 Oppo广东移动通信有限公司 Monocular camera-based repositioning method and device, storage medium and electronic equipment
CN111652933B (en) * 2020-05-06 2023-08-04 Oppo广东移动通信有限公司 Repositioning method and device based on monocular camera, storage medium and electronic equipment
CN111664842A (en) * 2020-05-07 2020-09-15 苏州品坤智能科技有限公司 Instant positioning and map building system of unmanned sweeper
CN112116657A (en) * 2020-08-07 2020-12-22 中国科学院深圳先进技术研究院 Table retrieval-based simultaneous positioning and mapping method and device
CN112116657B (en) * 2020-08-07 2023-12-19 中国科学院深圳先进技术研究院 Simultaneous positioning and mapping method and device based on table retrieval
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN112132857B (en) * 2020-09-18 2023-04-07 福州大学 Dynamic object detection and static map reconstruction method of dynamic environment hybrid vision system
CN112132857A (en) * 2020-09-18 2020-12-25 福州大学 Dynamic object detection and static map reconstruction method of dynamic environment hybrid vision system
CN112233180B (en) * 2020-10-23 2024-03-15 上海影谱科技有限公司 Map-based SLAM rapid initialization method and device and electronic equipment
CN112233180A (en) * 2020-10-23 2021-01-15 上海影谱科技有限公司 Map-based SLAM rapid initialization method and device and electronic equipment
CN112200874B (en) * 2020-10-30 2022-06-21 中国科学院自动化研究所 Multilayer scene reconstruction and rapid segmentation method, system and device in narrow space
CN112200874A (en) * 2020-10-30 2021-01-08 中国科学院自动化研究所 Multilayer scene reconstruction and rapid segmentation method, system and device in narrow space
CN112378409A (en) * 2020-12-01 2021-02-19 杭州宇芯机器人科技有限公司 Robot RGB-D SLAM method based on geometric and motion constraint in dynamic environment
CN112378409B (en) * 2020-12-01 2022-08-12 杭州宇芯机器人科技有限公司 Robot RGB-D SLAM method based on geometric and motion constraint in dynamic environment
CN113362363A (en) * 2021-06-18 2021-09-07 广东工业大学 Automatic image annotation method and device based on visual SLAM and storage medium
CN113763551A (en) * 2021-09-08 2021-12-07 北京易航远智科技有限公司 Point cloud-based rapid repositioning method for large-scale mapping scene
CN113763551B (en) * 2021-09-08 2023-10-27 北京易航远智科技有限公司 Rapid repositioning method for large-scale map building scene based on point cloud
CN113902847A (en) * 2021-10-11 2022-01-07 岱悟智能科技(上海)有限公司 Monocular depth image pose optimization method based on three-dimensional feature constraint
CN113902847B (en) * 2021-10-11 2024-04-16 岱悟智能科技(上海)有限公司 Monocular depth image pose optimization method based on three-dimensional feature constraint
CN114926536B (en) * 2022-07-19 2022-10-14 合肥工业大学 Semantic-based positioning and mapping method and system and intelligent robot
CN114926536A (en) * 2022-07-19 2022-08-19 合肥工业大学 Semantic-based positioning and mapping method and system and intelligent robot
CN115120346B (en) * 2022-08-30 2023-02-17 中国科学院自动化研究所 Target point positioning device, electronic equipment and bronchoscope system
CN115120346A (en) * 2022-08-30 2022-09-30 中国科学院自动化研究所 Target point positioning method and device, electronic equipment and bronchoscope system

Also Published As

Publication number Publication date
CN110827395B (en) 2023-01-20

Similar Documents

Publication Publication Date Title
CN110827395B (en) Instant positioning and map construction method suitable for dynamic environment
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN108229366B (en) Deep learning vehicle-mounted obstacle detection method based on radar and image data fusion
CN112734852B (en) Robot mapping method and device and computing equipment
CN111563442A (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN110706248A (en) Visual perception mapping algorithm based on SLAM and mobile robot
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN110765922A (en) AGV is with two mesh vision object detection barrier systems
CN111028267B (en) Monocular vision following system and method for mobile robot
CN110533720A (en) Semantic SLAM system and method based on joint constraint
CN111179344A (en) Efficient mobile robot SLAM system for repairing semantic information
CN110533716B (en) Semantic SLAM system and method based on 3D constraint
WO2020237516A1 (en) Point cloud processing method, device, and computer readable storage medium
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN113743391A (en) Three-dimensional obstacle detection system and method applied to low-speed autonomous driving robot
CN115410167A (en) Target detection and semantic segmentation method, device, equipment and storage medium
Yu et al. Accurate and robust visual localization system in large-scale appearance-changing environments
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112509110A (en) Automatic image data set acquisition and labeling framework for land confrontation intelligent agent
CN116642490A (en) Visual positioning navigation method based on hybrid map, robot and storage medium
CN116804744A (en) Target detection method based on vehicle-mounted laser radar and image information
Priya et al. 3DYOLO: Real-time 3D Object Detection in 3D Point Clouds for Autonomous Driving
CN116309817A (en) Tray detection and positioning method based on RGB-D camera
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance
CN115965673B (en) Centralized multi-robot positioning method based on binocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant