CN116524026B - Dynamic vision SLAM method based on frequency domain and semantics - Google Patents
- Publication number
- CN116524026B (application CN202310505675.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/37—Determination of transform parameters for the alignment of images, i.e. image registration using transform domain methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a dynamic SLAM method based on the frequency domain and semantics, used to complete localization and mapping tasks in highly dynamic environments with complex illumination. First, to accurately obtain the motion region of an object, images are registered in the frequency domain using the Fourier-Mellin transform to compensate for camera motion, and an inter-frame difference algorithm is then applied to obtain a motion mask. At the same time, the image is semantically segmented by a Short-Term Dense Concatenate (STDC) network to obtain a mask of potentially moving objects. The motion mask and the object mask are combined to obtain the final object motion region, and feature points falling in this region are eliminated. Finally, tracking and optimization are performed on the remaining stable static feature points, improving pose accuracy. Test results on a public dataset and in a real environment show that the method achieves good localization accuracy and robustness in complex dynamic scenes and effectively reduces the influence of motion blur and illumination change on motion detection.
Description
(I) Technical field
The invention belongs to the field of computer vision and relates to simultaneous localization and mapping (SLAM), in particular to a dynamic visual SLAM method based on the frequency domain and semantics.
(II) background art
Simultaneous localization and mapping (SLAM) refers to building a map of the surrounding environment from sensor data in real time, without any prior knowledge, while inferring the sensor's own pose within that map. SLAM based on visual sensors is known as visual SLAM (VSLAM). With the availability of RGB-D cameras offering fast acquisition, rich information, and relatively low price, VSLAM has been widely applied in many fields.
Over the past 30 years, many scholars have studied SLAM and produced notable systems such as ORB-SLAM2 and RGBD-SLAM-V2. However, most conventional SLAM systems rest on the assumption of a static environment, while dynamic objects inevitably appear in real operating environments; the feature points on such objects are unstable and interfere with SLAM, degrading its performance. In a feature-based SLAM system, tracking unstable feature points severely affects pose estimation, leading to large trajectory errors or even system failure. Performance degradation and lack of robustness in dynamic scenes have therefore become major obstacles to practical application.
In the paper 'Visual simultaneous localization and mapping based on semantic and optical-flow constraints in dynamic scenes', semantic and optical-flow information is used to eliminate the feature points of dynamic objects in the scene, reducing their interference with SLAM and thereby improving its accuracy and robustness. However, the optical-flow method rests on the brightness-constancy assumption and cannot be applied to scenes with changing illumination. In the paper 'Research on mapping technology for indoor mobile robots based on dynamic target detection', epipolar constraints are used to screen dynamic feature points, and semantic information together with these feature points is used to filter out the dynamic parts of the scene, improving the accuracy of pose estimation. The epipolar constraint, however, assumes that static regions occupy the overwhelming majority of the scene, which does not hold in most dynamic scenes, especially those with motion blur. The present invention also uses a deep-learning method to acquire semantic information, but mainly improves the motion-detection algorithm: images are registered by the Fourier-Mellin transform before motion detection, so the method remains robust under severe illumination change and motion blur.
To address the lack of robustness of the prior art under severe illumination change and motion blur, the invention provides a dynamic visual SLAM method based on the frequency domain and semantics, which effectively improves the accuracy and robustness of SLAM in such environments.
(III) summary of the invention
The invention exploits the unique advantages of the Fourier-Mellin transform in image registration, combines it with an inter-frame difference (Temporal Difference, TD) algorithm to realize a highly robust motion-detection algorithm, and integrates this with ORB-SLAM2 and the STDC semantic-segmentation network to obtain a visual SLAM algorithm for dynamic scenes based on the Fourier-Mellin transform. First, to accurately obtain the motion region of an object, the Fourier-Mellin transform is used for registration to compensate for camera motion, after which an inter-frame difference algorithm yields a motion mask. At the same time, the image is passed through the STDC semantic-segmentation network to obtain a mask of potentially moving objects. The motion mask and the object mask are combined to obtain the final object motion region, and the feature points falling in this region are eliminated. Finally, tracking and optimization on the stable static feature points improve pose accuracy.
In order to achieve the above purpose, the invention adopts the following technical scheme:
s1, acquiring an input image sequence comprising RGB images and corresponding depth images;
s2, extracting ORB feature points from the RGB image of the input frame, specifically comprising the following substeps:
s21, converting the input RGB image into a grayscale image;
s22, initializing the image pyramid parameters, including the number of feature points to extract, the pyramid scaling factor, the number of pyramid layers, the number of feature points pre-allocated to each layer, the initial FAST feature-point extraction parameters, and so on;
s23, constructing the image pyramid, scaling each pyramid layer during construction and padding its borders;
s24, traversing the images of all pyramid layers, dividing each image into a grid, and calling OpenCV functions within each grid cell to extract FAST corners;
s25, culling the feature points with an octree method according to the number pre-allocated to each layer, and computing the orientation of each feature point by the gray centroid method;
s26, waiting for motion detection to finish in order to obtain the moving-object region;
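The gray centroid method of step S25 can be sketched as follows (a minimal NumPy illustration, not the patent's implementation; the function name `patch_orientation` and the square-patch simplification are assumptions, as ORB proper uses a circular patch):

```python
import numpy as np

def patch_orientation(patch: np.ndarray) -> float:
    """Orientation of a feature point by the gray (intensity) centroid method.

    Moments m10 and m01 locate the intensity centroid relative to the
    patch centre; the angle from centre to centroid is the orientation.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # coordinates relative to the patch centre
    xs = xs - (w - 1) / 2.0
    ys = ys - (h - 1) / 2.0
    m10 = float(np.sum(xs * patch))
    m01 = float(np.sum(ys * patch))
    return float(np.arctan2(m01, m10))  # radians

# A patch brighter on its right half should point along +x (angle near 0).
patch = np.zeros((7, 7))
patch[:, 4:] = 1.0
theta = patch_orientation(patch)
```

Keypoints rotated by this angle make the subsequent BRIEF-style descriptor rotation-invariant, which is why ORB computes it per feature point.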
s3, taking the input current frame as the image to be registered and the previous frame as the reference image, and registering the two in the frequency domain using the Fourier-Mellin transform, specifically comprising the following substeps:
s31, converting the RGB reference image and the RGB image to be registered into grayscale images;
s32, performing the discrete Fourier transform on both grayscale images and applying a high-pass filter to the resulting frequency-domain images;
s33, applying a log-polar transformation to the high-pass-filtered frequency-domain images and feeding the transformed images into the phase-correlation step to obtain response coordinates (x, y);
s34, converting the response coordinates (x, y) obtained from the phase-correlation step into a rotation angle θ and a scale factor s, and rotating and scaling the image to be registered accordingly;
s35, feeding the rotated and scaled image to be registered together with the reference image into the phase-correlation step again to obtain response coordinates (x, y), and translating the image to be registered accordingly to obtain the final registered image;
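The phase-correlation core used in steps S33 and S35 can be sketched with NumPy FFTs (a translation-only sketch; the full Fourier-Mellin pipeline additionally applies high-pass filtering and log-polar resampling so that rotation and scale also appear as translations; `phase_correlate` is an illustrative name):

```python
import numpy as np

def phase_correlate(f0: np.ndarray, f1: np.ndarray):
    """Recover the (dy, dx) translation between two same-sized images.

    The normalized cross-power spectrum keeps only phase information;
    its inverse FFT peaks at the translation offset (response coordinates).
    """
    F0 = np.fft.fft2(f0)
    F1 = np.fft.fft2(f1)
    cross = F0 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    response = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    # wrap offsets larger than half the image into negative shifts
    h, w = f0.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))
dy, dx = phase_correlate(shifted, img)
```

Because the phase term is independent of image contrast, this peak search is far less sensitive to illumination change than direct intensity matching, which is the registration property the method relies on.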
s4, performing motion detection between the registered image and the previous frame by the inter-frame difference method, and removing noise through thresholding, edge detection, contour clustering, and related operations, specifically comprising the following substeps:
s41, feeding the registered image and the previous frame together into the inter-frame difference module to obtain a difference image, where the inter-frame difference formula is:

D_i(x, y) = | f_i(x, y) - f_{i+1}(x, y) |

where D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
s42, thresholding the difference image according to:

R_i(x, y) = 255 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise

where R_i(x, y) is the i-th frame threshold map and T = 40 is the binarization threshold; that is, pixels of the difference image with value greater than 40 are set to 255 and pixels with value less than 40 are set to 0, giving the threshold map;
s43, applying a Canny edge detection operator to the threshold map to perform edge detection to obtain an edge mask;
s44, fitting oriented rectangles to the contours of the edge mask, computing the aspect ratio of each rectangle, classifying a rectangle as an afterimage (ghost) if its aspect ratio is smaller than 0.1, and setting the pixels of afterimage regions to 0 to eliminate them, yielding the final motion mask;
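Steps S41, S42, and the aspect-ratio test of S44 can be sketched as follows (NumPy only, with an axis-aligned bounding box standing in for the patent's oriented rectangle fit; the Canny edge detection and contour extraction of S43, omitted here, would typically come from OpenCV):

```python
import numpy as np

T = 40  # binarization threshold from step S42

def motion_mask(prev_gray: np.ndarray, reg_gray: np.ndarray) -> np.ndarray:
    """Inter-frame difference D_i = |f_i - f_{i+1}| followed by thresholding."""
    diff = np.abs(prev_gray.astype(np.int16) - reg_gray.astype(np.int16))
    return np.where(diff > T, 255, 0).astype(np.uint8)

def is_ghost(mask_region: np.ndarray, min_aspect: float = 0.1) -> bool:
    """Step S44's afterimage test, simplified to an axis-aligned box:
    very elongated responses (aspect ratio below 0.1) are treated as
    registration ghosts rather than real moving objects."""
    ys, xs = np.nonzero(mask_region)
    if len(ys) == 0:
        return True
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return min(h, w) / max(h, w) < min_aspect

prev = np.zeros((32, 32), dtype=np.uint8)
curr = prev.copy()
curr[10:20, 10:20] = 200            # a moving block appears
mask = motion_mask(prev, curr)
```

The aspect-ratio test exploits the fact that residual misalignment after registration leaves thin strips along object edges, whereas genuinely moving regions are roughly as tall as they are wide.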
s5, feeding the RGB image of the input frame into a Short-Term Dense Concatenate (STDC) network for semantic segmentation to obtain an object mask carrying object semantic information;
s6, judging object motion by combining the object mask obtained from the STDC network with the motion mask, specifically comprising the following substeps:
s61, for the object mask and the motion mask, calculating the motion probability ρ_i of each object as:

ρ_i = m_i / M_i

where M_i is the total number of pixels of the i-th object in the object mask and m_i is the number of pixels in the corresponding region of the motion mask;
s62, setting a threshold ε = 0.1: if the motion probability ρ_i exceeds ε, the object is regarded as moving, otherwise as static; setting the pixels of static-object regions to 0 yields a prior dynamic-object mask;
s63, fusing the prior dynamic object mask and the motion mask to obtain a final dynamic object mask;
s64, feeding the dynamic-object mask into step S26 and eliminating the feature points that fall in the dynamic-object region according to the mask.
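The object-motion decision of steps S61 to S64 reduces to ρ_i = m_i / M_i compared against ε = 0.1; a minimal sketch, assuming boolean masks and (x, y) keypoint tuples (all names illustrative):

```python
import numpy as np

EPSILON = 0.1  # motion-probability threshold from step S62

def is_dynamic(object_mask: np.ndarray, motion_mask: np.ndarray) -> bool:
    """rho_i = m_i / M_i: fraction of the object's pixels flagged as moving."""
    M_i = int(np.count_nonzero(object_mask))
    if M_i == 0:
        return False
    m_i = int(np.count_nonzero(object_mask & motion_mask))
    return m_i / M_i > EPSILON

def cull_features(keypoints, dynamic_mask: np.ndarray):
    """Step S64: drop feature points that fall inside the dynamic region."""
    return [(x, y) for (x, y) in keypoints if not dynamic_mask[y, x]]

obj = np.zeros((8, 8), dtype=bool); obj[2:6, 2:6] = True   # 16-pixel object
mov = np.zeros((8, 8), dtype=bool); mov[2:6, 2:4] = True   # 8 of them moving
dyn = is_dynamic(obj, mov)          # rho = 8/16 = 0.5 > 0.1
kept = cull_features([(3, 3), (0, 0)], obj)
```

Requiring only a 10% overlap means an object is discarded as soon as any substantial part of it moves, which is conservative in favour of pose accuracy over feature count.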
The invention has the following beneficial effects:
(1) The invention registers images through an improved Fourier-Mellin transform to realize motion compensation and obtains a motion mask by inter-frame difference, reducing the influence of motion blur and illumination change on motion detection;
(2) The invention combines motion detection and semantic segmentation into a dynamic feature-point filtering method that effectively eliminates the interference of dynamic objects with pose estimation and mapping;
(3) Compared with conventional dynamic SLAM, the invention obtains better results in highly dynamic environments. On the high-dynamic sequences, the absolute trajectory error of the invention is reduced on average by more than 95% relative to ORB-SLAM2 and by more than 30% relative to DS-SLAM, showing higher accuracy and robustness in dynamic environments.
(IV) description of the drawings
FIG. 1 is a general flow diagram of a SLAM system;
FIG. 2 is a flowchart of Fourier-Mellin transform image registration;
FIG. 3 is an example of image registration;
FIG. 4 is a flow chart of motion detection;
FIG. 5 is an exemplary diagram of motion detection;
FIG. 6 is a graph of mask extraction effect under motion blur;
FIG. 7 is a graph of the mask extraction effect under illumination change.
(V) detailed description of the invention
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the invention is described in further detail below with reference to the accompanying drawings and test examples. It should be understood that the specific embodiments described here serve only to illustrate the invention and are not intended to limit its scope. The overall flow chart of the system is shown in FIG. 1.
s1, acquiring an input image sequence comprising RGB images and corresponding depth images;
s2, extracting ORB feature points from the RGB image of the input frame, specifically comprising the following substeps:
s21, converting the input RGB image into a grayscale image;
s22, initializing the image pyramid parameters, including the number of feature points to extract, the pyramid scaling factor, the number of pyramid layers, the number of feature points pre-allocated to each layer, the initial FAST feature-point extraction parameters, and so on;
s23, constructing the image pyramid, scaling each pyramid layer during construction and padding its borders;
s24, traversing the images of all pyramid layers, dividing each image into a grid, and calling OpenCV functions within each grid cell to extract FAST corners;
s25, culling the feature points with an octree method according to the number pre-allocated to each layer, and computing the orientation of each feature point by the gray centroid method;
s26, waiting for motion detection to finish in order to obtain the moving-object region;
s3, taking the input current frame as the image to be registered and the previous frame as the reference image, and registering the two in the frequency domain using the Fourier-Mellin transform; the image-registration flow chart is shown in FIG. 2, and the step specifically comprises the following substeps:
s31, converting the RGB reference image and the RGB image to be registered into grayscale images;
s32, performing the discrete Fourier transform on both grayscale images and applying a high-pass filter to the resulting frequency-domain images;
s33, applying a log-polar transformation to the high-pass-filtered frequency-domain images and feeding the transformed images into the phase-correlation step to obtain response coordinates (x, y);
s34, converting the response coordinates (x, y) obtained from the phase-correlation step into a rotation angle θ and a scale factor s, and rotating and scaling the image to be registered accordingly;
s35, feeding the rotated and scaled image to be registered together with the reference image into the phase-correlation step again to obtain response coordinates (x, y), and translating the image to be registered accordingly to obtain the final registered image; an example of image registration is shown in FIG. 3;
s4, performing motion detection between the registered image and the previous frame by the inter-frame difference method, and removing noise through thresholding, edge detection, contour clustering, and related operations; the motion-detection flow chart is shown in FIG. 4, and the step specifically comprises the following substeps:
s41, feeding the registered image and the previous frame together into the inter-frame difference module to obtain a difference image, where the inter-frame difference formula is:

D_i(x, y) = | f_i(x, y) - f_{i+1}(x, y) |

where D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
s42, thresholding the difference image according to:

R_i(x, y) = 255 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise

where R_i(x, y) is the i-th frame threshold map and T = 40 is the binarization threshold; that is, pixels of the difference image with value greater than 40 are set to 255 and pixels with value less than 40 are set to 0, giving the threshold map;
s43, applying a Canny edge detection operator to the threshold map to perform edge detection to obtain an edge mask;
s44, fitting oriented rectangles to the contours of the edge mask, computing the aspect ratio of each rectangle, classifying a rectangle as an afterimage (ghost) if its aspect ratio is smaller than 0.1, and setting the pixels of afterimage regions to 0 to eliminate them, yielding the final motion mask; an example of motion detection is shown in FIG. 5;
s5, feeding the RGB image of the input frame into a Short-Term Dense Concatenate (STDC) network for semantic segmentation to obtain an object mask carrying object semantic information;
s6, judging object motion by combining the object mask obtained from the STDC network with the motion mask, specifically comprising the following substeps:
s61, for the object mask and the motion mask, calculating the motion probability ρ_i of each object as:

ρ_i = m_i / M_i

where M_i is the total number of pixels of the i-th object in the object mask and m_i is the number of pixels in the corresponding region of the motion mask;
s62, setting a threshold ε = 0.1: if the motion probability ρ_i exceeds ε, the object is regarded as moving, otherwise as static; setting the pixels of static-object regions to 0 yields a prior dynamic-object mask;
s63, fusing the prior dynamic object mask and the motion mask to obtain a final dynamic object mask;
s64, feeding the dynamic-object mask into step S26 and eliminating the feature points that fall in the dynamic-object region according to the mask.
The present invention uses the absolute trajectory error (ATE) and relative pose error (RPE) to evaluate method performance, with the root mean square error (RMSE) and standard deviation (SD) as evaluation indicators. Performance on the TUM dataset is shown in Tables 1 and 2.
TABLE 1
TABLE 2
As can be seen from Tables 1 and 2, on the high-dynamic sequences the RMSE and SD of the absolute trajectory error of the invention are significantly better than those of ORB-SLAM2 and DS-SLAM, indicating higher accuracy and a more compact error distribution. The improvement is most pronounced on the fr3/w/xyz and fr3/w/half sequences. On fr3/w/xyz, the RMSE and SD of the invention are reduced by about 98.39% and 97.79% relative to ORB-SLAM2, and by about 35.33% and 42.94% relative to DS-SLAM. On fr3/w/half, they are reduced by about 97.71% and 97.13% relative to ORB-SLAM2 and by about 39.91% and 53.37% relative to DS-SLAM. This shows that the invention is more robust than ORB-SLAM2 and DS-SLAM in dynamic scenes.
On the low-dynamic sequences, however, the RMSE and SD of the absolute trajectory error of the invention are reduced by only 17.86% and 15.82% relative to ORB-SLAM2, and are 7.80% and 3.98% higher than those of DS-SLAM. The reason is that in low-dynamic sequences the potentially moving object is stationary part of the time: its feature points are used for localization while it is stationary but removed once it moves, which affects the accuracy of the subsequent global optimization. In addition, although high-pass filtering and edge-mask construction are used to suppress the environmental noise caused by registration error, this noise is difficult to eliminate completely in some cases, especially under severe camera motion, which is another reason the invention is slightly less accurate than the DS-SLAM algorithm on low-dynamic sequences.
Compared with ORB-SLAM2 and the classical DS-SLAM, the method significantly improves localization accuracy in dynamic scenes. For low-dynamic scenes, the accuracy of the invention improves by about 15% over ORB-SLAM2. For high-dynamic scenes the effect is more pronounced: the accuracy improves consistently by more than 95% over ORB-SLAM2 and by more than 30% over DS-SLAM. These results show that the method accurately eliminates the interference of dynamic targets and thereby reduces pose error during optimization. Because the invention refines the dynamic-mask extraction strategy, it does not require the assumption that static regions dominate the scene or the region of interest, unlike DS-SLAM, which obtains dynamic feature points by computing outliers with RANSAC. As shown in FIGS. 6 and 7, the invention can accurately extract the dynamic region even when motion blur covers a large part of the image or the scene undergoes severe illumination change.
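The ATE statistics cited above can be computed from trajectories as follows (a minimal sketch of RMSE and SD over per-frame translational errors; a real evaluation would first align the estimated trajectory to ground truth, e.g. with a Horn or Umeyama fit, which is omitted here):

```python
import numpy as np

def ate_rmse_sd(gt: np.ndarray, est: np.ndarray):
    """Absolute trajectory error: per-frame translational error norms,
    summarized by root mean square error and standard deviation.

    gt, est: (N, 3) arrays of corresponding, pre-aligned positions.
    """
    errors = np.linalg.norm(gt - est, axis=1)     # one error per frame
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    sd = float(np.std(errors))
    return rmse, sd

gt = np.zeros((5, 3))
est = np.full((5, 3), 0.1)          # constant 0.1 m offset on every axis
rmse, sd = ate_rmse_sd(gt, est)
```

RMSE captures the overall error magnitude, while SD captures how tightly the errors cluster, which is why the text reports both to argue for a "more compact error distribution".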
The above embodiments further illustrate the objects, technical solutions and advantageous effects of the present invention, and the above examples are only for illustrating the technical solutions of the present invention, but not for limiting the scope of protection of the present invention, and it should be understood by those skilled in the art that modifications, equivalents and alternatives to the technical solutions of the present invention are included in the scope of protection of the present invention.
Claims (1)
1. A dynamic visual SLAM method based on the frequency domain and semantics, characterized by comprising the following steps:
s1, acquiring an input image sequence, wherein the input image sequence comprises RGB images and corresponding depth images;
s2, extracting ORB feature points from the RGB image of the input frame, specifically comprising the following substeps:
s21, converting the input RGB image into a gray scale image;
s22, initializing image pyramid parameters, wherein the parameters comprise the number of extracted feature points, pyramid scaling factors, pyramid layers, the number of feature points pre-allocated for each layer and the extraction parameters of initial FAST feature points;
s23, constructing an image pyramid, scaling each layer of pyramid image in the construction process, and filling the periphery;
s24, traversing the images of all pyramid layers, gridding each image, and calling opencv functions in the grids to extract FAST corner points;
s25, culling the feature points with an octree method according to the number pre-allocated to each layer, and computing the orientation of each feature point by the gray centroid method;
s26, waiting for the completion of the motion detection to acquire a moving object region;
s3, taking the input current frame as the image to be registered and the previous frame as the reference image, and registering the two in the frequency domain using the Fourier-Mellin transform, specifically comprising the following substeps:
s31, converting RGB images of an input registration image and an image to be registered into a gray scale image;
s32, performing discrete Fourier transform on the registered gray level images, and performing high-pass filtering on the frequency domain images subjected to the discrete Fourier transform;
s33, carrying out logarithmic polar coordinate transformation on the frequency domain diagram after high-pass filtering, and inputting the image subjected to the logarithmic polar coordinate transformation into a phase correlation step to obtain response coordinates (x, y);
s34, carrying out coordinate transformation on response coordinates (x, y) obtained in the phase correlation step to obtain a rotation angle theta and a scale factor S, and carrying out rotation and scaling on the image to be registered according to the rotation angle theta and the scale factor S;
s35, inputting the rotated and scaled image to be registered and the registration image into the phase correlation step again to obtain response coordinates (x, y), and translating the image to be registered according to the response coordinates (x, y) to obtain a final registration image;
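The phase correlation step used in S33 and S35 can be sketched in NumPy as follows (a minimal, translation-only sketch; the full Fourier-Mellin pipeline additionally applies the high-pass filter and log-polar resampling of S32-S34, which are omitted here):

```python
import numpy as np

def phase_correlation(ref, mov):
    """Return the (dy, dx) circular shift that aligns `mov` with `ref`,
    i.e. np.roll(mov, (dy, dx), axis=(0, 1)) best matches `ref`."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(mov)
    cross = F_ref * np.conj(F_mov)
    cross /= np.abs(cross) + 1e-12           # normalize: keep phase only
    corr = np.real(np.fft.ifft2(cross))      # impulse at the relative shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the midpoint correspond to negative shifts (wrap-around)
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```

Feeding the log-polar magnitude spectra of the two frames into the same routine turns the recovered shift into the rotation angle θ and scale factor S of S34.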
S4, performing motion detection on the registered image and the previous frame using the inter-frame difference method, and removing noise through thresholding, edge detection, and contour-fitting operations, specifically comprising the following sub-steps:
S41, inputting the registered image and the previous frame together into the inter-frame difference module to obtain a difference image, wherein the inter-frame difference formula is:

D_i(x, y) = |f_i(x, y) - f_{i+1}(x, y)|

wherein D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
S42, thresholding the difference map, wherein the thresholding formula is:

R_i(x, y) = 255 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise

wherein R_i(x, y) is the i-th frame threshold map and T = 40 is the binarization threshold, that is, pixels in the difference map with a value greater than 40 are set to 255 and pixels with a value less than 40 are set to 0, yielding the threshold map;
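Sub-steps S41 and S42 amount to an absolute difference followed by fixed-threshold binarization; a minimal NumPy sketch (T = 40 as in the claim; the function name is illustrative):

```python
import numpy as np

def frame_difference_mask(f_i, f_next, T=40):
    """S41/S42: |f_i - f_{i+1}| thresholded at T -> binary threshold map."""
    # widen the dtype so uint8 subtraction cannot wrap around
    diff = np.abs(f_i.astype(np.int16) - f_next.astype(np.int16))
    return np.where(diff > T, 255, 0).astype(np.uint8)

f0 = np.array([[10, 200], [50, 50]], dtype=np.uint8)
f1 = np.array([[10, 100], [95, 50]], dtype=np.uint8)
mask = frame_difference_mask(f0, f1)   # per-pixel diffs: 0, 100, 45, 0
```

The dtype widening matters: subtracting two `uint8` images directly would wrap modulo 256 and corrupt the difference map.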
S43, applying the Canny edge-detection operator to the threshold map to perform edge detection and obtain an edge mask;
S44, fitting oriented rectangular boxes to the contours of the edge mask, computing an aspect ratio for each rectangular box, classifying a box as an afterimage if its aspect ratio is smaller than 0.1, and setting the pixels of the afterimage region to 0 to eliminate it, thereby obtaining the final motion mask;
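The aspect-ratio test of S44 can be isolated as a small predicate (the 0.1 threshold is from the claim; taking the short/long side ratio so that box orientation does not matter is an assumption):

```python
def is_afterimage(width, height, ratio_thresh=0.1):
    """S44: a fitted box is classified as an afterimage when its short
    side is less than ratio_thresh times its long side, i.e. the box is
    a thin sliver typical of registration ghosting."""
    short, long_ = sorted((width, height))
    return long_ > 0 and short / long_ < ratio_thresh
```

A 200×5 sliver is rejected as ghosting, while an 80×60 box survives as a plausible moving-object contour.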
S5, inputting the RGB image of the input frame into a Short-Term Dense Concatenate (STDC) network for semantic segmentation to obtain an object mask containing object semantic information;
S6, judging the motion of each object by combining the object mask obtained from the STDC network with the motion mask, specifically comprising the following sub-steps:
S61, for the object mask and the motion mask, computing the motion probability ρ_i of each object by the following formula:

ρ_i = M_i / F_i

wherein F_i is the total number of pixels of the i-th object in the object mask, and M_i is the total number of pixels of the corresponding region in the motion mask;
S62, setting a threshold ε = 0.1; if the motion probability ρ_i is greater than the threshold ε, the object is regarded as a moving object, otherwise it is regarded as a static object, and the prior dynamic-object mask is obtained by setting the pixels of the static-object regions to 0;
S63, fusing the prior dynamic-object mask with the motion mask to obtain the final dynamic-object mask;
S64, inputting the dynamic-object mask into step S26, and eliminating the feature points falling within the dynamic-object region according to the dynamic-object mask.
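Sub-steps S61 through S63 can be sketched as follows, assuming ρ_i is the fraction of an object's pixels that the motion mask marks as moving (the published formula is garbled, so ρ_i = M_i / F_i is inferred from the variable definitions; label values, the background convention, and function names are illustrative):

```python
import numpy as np

def motion_probability(object_mask, motion_mask, label):
    """S61: rho_i = M_i / F_i for the object with the given label."""
    obj = object_mask == label                       # F_i region
    F_i = int(obj.sum())
    M_i = int(np.logical_and(obj, motion_mask > 0).sum())
    return M_i / F_i if F_i else 0.0

def dynamic_object_mask(object_mask, motion_mask, eps=0.1):
    """S62/S63: keep objects with rho_i > eps, then fuse with the motion mask."""
    prior = np.zeros_like(motion_mask)
    for label in np.unique(object_mask):
        if label == 0:                               # 0 = background (assumed)
            continue
        if motion_probability(object_mask, motion_mask, label) > eps:
            prior[object_mask == label] = 255
    return np.maximum(prior, motion_mask)            # union of prior and motion mask

# toy example: object 1 has one moving pixel (rho = 0.25), object 2 none
object_mask = np.array([[1, 1, 0, 2],
                        [1, 1, 0, 2],
                        [0, 0, 0, 2],
                        [0, 0, 0, 2]])
motion_mask = np.zeros((4, 4), dtype=np.uint8)
motion_mask[0, 0] = 255
dyn = dynamic_object_mask(object_mask, motion_mask)
```

With ε = 0.1, the whole of object 1 is promoted to the dynamic mask while object 2 stays static, which is exactly the region S64 then uses to discard feature points.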
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310505675.0A CN116524026B (en) | 2023-05-08 | 2023-05-08 | Dynamic vision SLAM method based on frequency domain and semantics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116524026A CN116524026A (en) | 2023-08-01 |
CN116524026B true CN116524026B (en) | 2023-10-27 |
Family
ID=87402762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310505675.0A Active CN116524026B (en) | 2023-05-08 | 2023-05-08 | Dynamic vision SLAM method based on frequency domain and semantics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524026B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117036408B (en) * | 2023-08-22 | 2024-03-29 | 哈尔滨理工大学 | Object SLAM method combining multi-target tracking under dynamic environment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102970528A (en) * | 2012-12-28 | 2013-03-13 | 北京航空航天大学 | Video object division method based on change detection and frame difference accumulation |
CN106127801A (en) * | 2016-06-16 | 2016-11-16 | 乐视控股(北京)有限公司 | A kind of method and apparatus of moving region detection |
CN110334762A (en) * | 2019-07-04 | 2019-10-15 | 华南师范大学 | A kind of feature matching method combining ORB and SIFT based on quaternary tree |
CN110942484A (en) * | 2019-11-26 | 2020-03-31 | 福州大学 | Camera self-motion estimation method based on occlusion perception and feature pyramid matching |
CN112465858A (en) * | 2020-12-10 | 2021-03-09 | 武汉工程大学 | Semantic vision SLAM method based on probability grid filtering |
JP2021082265A (en) * | 2019-11-15 | 2021-05-27 | Guangdong University Of Technology | Drone visual travel distance measuring method based on depth point line feature |
CN114140527A (en) * | 2021-11-19 | 2022-03-04 | 苏州科技大学 | Dynamic environment binocular vision SLAM method based on semantic segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||