CN116524026B - Dynamic vision SLAM method based on frequency domain and semantics - Google Patents

Dynamic vision SLAM method based on frequency domain and semantics

Info

Publication number
CN116524026B
CN116524026B CN202310505675.0A
Authority
CN
China
Prior art keywords
image
mask
motion
frame
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310505675.0A
Other languages
Chinese (zh)
Other versions
CN116524026A (en)
Inventor
栾添添
吕奉坤
班喜程
孙明晓
吕重阳
张晓霜
吴宝奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202310505675.0A priority Critical patent/CN116524026B/en
Publication of CN116524026A publication Critical patent/CN116524026A/en
Application granted granted Critical
Publication of CN116524026B publication Critical patent/CN116524026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/37 Determination of transform parameters for the alignment of images, i.e. image registration using transform domain methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic SLAM method based on the frequency domain and semantics, used for completing localization and mapping tasks in highly dynamic environments with complex illumination. First, to accurately obtain the motion region of objects, images are registered in the frequency domain using the Fourier-Mellin algorithm to compensate for camera motion, and an inter-frame difference algorithm is then applied to obtain a motion mask. At the same time, the image is semantically segmented by a Short-Term Dense Concatenate (STDC) network to obtain a mask of potentially moving objects. The motion mask and the object mask are combined to obtain the final object motion region, and feature points falling in that region are removed. Finally, tracking and optimization are performed on the remaining stable static feature points, improving pose accuracy. Test results on a public dataset and in a real environment show that the method achieves good localization accuracy and robustness in complex dynamic scenes and effectively reduces the influence of motion blur and illumination change on motion detection.

Description

Dynamic vision SLAM method based on frequency domain and semantics
(I) Technical field
The invention belongs to the field of computer vision and relates to simultaneous localization and mapping technology, and more particularly to a dynamic visual SLAM method based on the frequency domain and semantics.
(II) background art
Simultaneous localization and mapping (SLAM) refers to building a map of the surrounding environment from sensor data in real time, without any prior knowledge, while simultaneously inferring the sensor's own pose within that map. SLAM based on visual sensors is known as visual SLAM (VSLAM). With the availability of RGB-D cameras, which offer fast acquisition, rich information and relatively low cost, VSLAM has been widely applied in a variety of fields.
Over the past 30 years, many researchers have studied SLAM and achieved notable results, such as ORB-SLAM2 and RGBD-SLAM-V2. However, most conventional SLAM systems rely on the assumption of a static environment, whereas dynamic objects inevitably appear in real operating environments; feature points on such objects are unstable, interfere with SLAM and degrade its performance. In a feature-point-based SLAM system, tracking unstable feature points severely affects pose estimation, resulting in large trajectory errors or even system failure. Performance degradation and lack of robustness in dynamic scenes have therefore become major obstacles to practical application.
In the paper 'Visual simultaneous localization and mapping based on semantic and optical flow constraints in dynamic scenes', semantic and optical-flow information is used to eliminate the feature points of dynamic objects in the scene, reducing their interference with SLAM and thereby improving its accuracy and robustness. However, the optical flow method relies on the brightness-constancy assumption and cannot be applied to scenes with changing illumination. In the paper 'Research on mapping technology for indoor mobile robots based on dynamic target detection', epipolar constraints are used to screen dynamic feature points, and semantic information combined with these dynamic feature points is used to filter out dynamic parts, improving the accuracy of pose estimation. The epipolar constraint, however, relies on the assumption that static regions occupy the overwhelming majority of the scene, which does not hold in most dynamic scenes, especially those with motion blur. The present invention also uses deep learning to acquire semantic information, but mainly improves the motion detection algorithm: images are registered with the Fourier-Mellin transform before motion detection, so the method remains robust under severe illumination change and motion blur.
To address the lack of robustness of the prior art in environments with severe illumination change and motion blur, the invention provides a dynamic visual SLAM method based on the frequency domain and semantics, which effectively improves the accuracy and robustness of SLAM in such environments.
(III) Summary of the invention
The invention exploits the particular advantages of the Fourier-Mellin transform in image registration and combines it with an inter-frame difference (Temporal Difference, TD) algorithm to realize a highly robust motion detection algorithm; combined with ORB-SLAM2 and the STDC semantic segmentation network, a visual SLAM algorithm based on the Fourier-Mellin transform for dynamic scenes is proposed. First, to accurately obtain the motion region of objects, the Fourier-Mellin algorithm is used for registration to compensate for camera motion, and an inter-frame difference algorithm is then used to obtain a motion mask. Meanwhile, the image is passed through the STDC semantic segmentation network to obtain a mask of potentially moving objects. The motion mask and the object mask are combined to obtain the final object motion region, and feature points falling in that region are removed. Finally, tracking and optimization are performed on stable static feature points to improve pose accuracy.
In order to achieve the above purpose, the invention adopts the following technical scheme:
s1, acquiring an input image sequence, wherein the input image sequence comprises RGB images and corresponding depth images;
s2, extracting ORB characteristic points of an RGB image of an input frame, and specifically comprising the following substeps:
s21, converting the input RGB image into a gray scale image;
s22, initializing pyramid parameters of the image, wherein the pyramid parameters comprise the number of extracted feature points, pyramid scaling factors, pyramid layers, the number of feature points pre-allocated for each layer, the extraction parameters of initial FAST feature points and the like;
s23, constructing an image pyramid, scaling each layer of pyramid image in the construction process, and filling the periphery;
s24, traversing the images of all pyramid layers, gridding each image, and calling opencv functions in the grids to extract FAST corner points;
s25, eliminating the characteristic points by using an octree method according to the characteristic points pre-distributed in each layer, and calculating the direction of each characteristic point by using a gray centroid method;
s26, waiting for the completion of the motion detection to acquire a moving object region;
s3, taking the input current frame as an image to be registered, taking the previous frame image as a registration image, and registering the registration image and the image to be registered in a frequency domain by utilizing Fourier Merlin transformation, wherein the method specifically comprises the following substeps:
s31, converting RGB images of an input registration image and an image to be registered into a gray scale image;
s32, performing discrete Fourier transform on the registered gray level images, and performing high-pass filtering on the frequency domain images subjected to the discrete Fourier transform;
s33, carrying out logarithmic polar coordinate transformation on the frequency domain diagram after high-pass filtering, and inputting the image subjected to the logarithmic polar coordinate transformation into a phase correlation step to obtain response coordinates (x, y);
s34, carrying out coordinate transformation on response coordinates (x, y) obtained in the phase correlation step to obtain a rotation angle theta and a scale factor S, and carrying out rotation and scaling on the image to be registered according to the rotation angle theta and the scale factor S;
s35, inputting the rotated and scaled image to be registered and the registration image into the phase correlation step again to obtain response coordinates (x, y), and translating the image to be registered according to the response coordinates (x, y) to obtain a final registration image;
s4, performing motion detection on the registration image and the previous frame image through an inter-frame difference method, removing noise through thresholding, edge detection, contour clustering and other operations, and specifically comprising the following sub-steps:
s41, inputting the registered image and the previous frame image into an inter-frame difference module together to obtain a difference image, wherein an inter-frame difference formula is as follows:
D_i(x, y) = |f_i(x, y) - f_{i+1}(x, y)|
where D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
s42, thresholding is carried out on the differential graph, and a thresholding formula is as follows:
wherein R is i (x, y) is the i-th frame threshold map, and t=40 is the binarization threshold. Namely, setting the pixel value of a point with the pixel value larger than 40 in the difference map as 255, and setting the pixel value of a point with the pixel value smaller than 40 as 0 to obtain a threshold map;
s43, applying a Canny edge detection operator to the threshold map to perform edge detection to obtain an edge mask;
s44, applying directional rectangular frame fitting to the edge mask to obtain a contour, calculating an aspect ratio for each rectangular frame, classifying the rectangular frame as an afterimage if the aspect ratio is smaller than 0.1, and setting 0 for pixel points of an afterimage area to realize elimination to obtain a final motion mask;
s5, inputting RGB images of an input frame into a short-time dense connection (STDC) network for semantic segmentation to obtain an object mask containing object semantic information;
s6, according to an object mask obtained by the STDC network, combining the motion mask to judge the motion of the object, and specifically comprising the following substeps:
s61, calculating the motion probability rho of the object for the object mask and the motion mask through the following formula i
Wherein M is i M is the total pixel number of the ith object in the object mask i The total pixel number of the corresponding region of the motion mask;
s62, a threshold epsilon=0.1 is set, if the motion probability ρ is i If the pixel point is larger than the threshold epsilon, the object is regarded as a moving object, otherwise, the object is regarded as a static object, and a priori dynamic object mask is obtained by setting 0 to the pixel point of the static object area;
s63, fusing the prior dynamic object mask and the motion mask to obtain a final dynamic object mask;
s64, inputting the dynamic object mask into the step S26, and eliminating the characteristic points falling in the dynamic object area according to the dynamic object mask.
The invention has the following beneficial effects:
(1) The invention registers images with an improved Fourier-Mellin transform to achieve motion compensation and obtains a motion mask by inter-frame differencing, reducing the influence of motion blur and illumination change on motion detection;
(2) The invention combines motion detection and semantic segmentation into a dynamic feature point filtering method that effectively eliminates the interference of dynamic objects with pose estimation and mapping;
(3) Compared with conventional dynamic SLAM, the invention obtains better results in highly dynamic environments. On high-dynamic sequences, the absolute trajectory error of the invention is reduced on average by more than 95% compared with ORB-SLAM2 and by more than 30% compared with DS-SLAM, showing higher accuracy and robustness in dynamic environments.
(IV) Description of the drawings
FIG. 1 is a general flow diagram of the SLAM system;
FIG. 2 is a flowchart of Fourier-Mellin transform image registration;
FIG. 3 is an example of image registration;
FIG. 4 is a flowchart of motion detection;
FIG. 5 is an example diagram of motion detection;
FIG. 6 is a diagram of the mask extraction effect under motion blur;
FIG. 7 is a diagram of the mask extraction effect under illumination change.
(V) Detailed description of the invention
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and test examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. An overall flow chart of the system of the present invention is shown in fig. 1.
S1, acquiring an input image sequence, wherein the input image sequence comprises RGB images and corresponding depth images;
S2, extracting ORB feature points from the RGB image of the input frame, specifically comprising the following substeps:
S21, converting the input RGB image into a grayscale image;
S22, initializing the image pyramid parameters, including the number of feature points to extract, the pyramid scaling factor, the number of pyramid layers, the number of feature points pre-allocated to each layer, the initial FAST feature extraction parameters, and the like;
S23, constructing the image pyramid, scaling each pyramid layer during construction and padding its borders;
S24, traversing the images of all pyramid layers, dividing each image into grids, and calling OpenCV functions within each grid to extract FAST corners;
S25, culling the feature points with an octree method according to the number of feature points pre-allocated to each layer, and computing the orientation of each feature point with the gray centroid method;
S26, waiting for motion detection to finish in order to acquire the moving object region;
S3, taking the input current frame as the image to be registered and the previous frame as the registration image, and registering the two in the frequency domain using the Fourier-Mellin transform, the image registration flowchart being shown in FIG. 2, specifically comprising the following substeps:
S31, converting the RGB images of the input registration image and the image to be registered into grayscale images;
S32, performing a discrete Fourier transform on the grayscale images and applying high-pass filtering to the resulting frequency-domain images;
S33, applying a log-polar transform to the high-pass-filtered frequency-domain images and feeding the transformed images into a phase correlation step to obtain response coordinates (x, y);
S34, converting the response coordinates (x, y) obtained in the phase correlation step into a rotation angle θ and a scale factor S, and rotating and scaling the image to be registered according to θ and S;
S35, feeding the rotated and scaled image to be registered together with the registration image into the phase correlation step again to obtain response coordinates (x, y), and translating the image to be registered according to these coordinates to obtain the final registered image, an example of image registration being shown in FIG. 3;
S4, performing motion detection on the registered image and the previous frame image using the inter-frame difference method, and removing noise through thresholding, edge detection, contour fitting and related operations, the motion detection flowchart being shown in FIG. 4, specifically comprising the following substeps:
S41, feeding the registered image and the previous frame image into the inter-frame difference module to obtain a difference image, where the inter-frame difference formula is:
D_i(x, y) = |f_i(x, y) - f_{i+1}(x, y)|
where D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
S42, thresholding the difference image, where the thresholding formula is:
R_i(x, y) = 255 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise,
where R_i(x, y) is the i-th frame threshold map and T = 40 is the binarization threshold; that is, pixels of the difference image with value greater than 40 are set to 255 and pixels with value less than 40 are set to 0, giving the threshold map;
S43, applying the Canny edge detection operator to the threshold map to obtain an edge mask;
S44, fitting oriented rectangles to the contours of the edge mask, computing an aspect ratio for each rectangle, classifying a rectangle as an afterimage if its aspect ratio is smaller than 0.1, and setting the pixels of afterimage regions to 0 to remove them, giving the final motion mask, an example of motion detection being shown in FIG. 5;
S5, feeding the RGB image of the input frame into a Short-Term Dense Concatenate (STDC) network for semantic segmentation to obtain an object mask containing object semantic information;
S6, judging the motion of objects by combining the object mask obtained from the STDC network with the motion mask, specifically comprising the following substeps:
S61, for the object mask and the motion mask, calculating the motion probability ρ_i of each object as ρ_i = m_i / M_i,
where M_i is the total number of pixels of the i-th object in the object mask and m_i is the number of pixels of the corresponding region in the motion mask;
S62, setting a threshold ε = 0.1; if the motion probability ρ_i is greater than ε, the object is regarded as a moving object, otherwise as a static object, and the prior dynamic object mask is obtained by setting the pixels of static object regions to 0;
S63, fusing the prior dynamic object mask with the motion mask to obtain the final dynamic object mask;
S64, feeding the dynamic object mask into step S26 and removing the feature points that fall in the dynamic object region according to the dynamic object mask.
The present invention uses the absolute trajectory error (Absolute Trajectory Error, ATE) and the relative pose error (Relative Pose Error, RPE) to evaluate method performance, using the root mean square error (Root Mean Square Error, RMSE) and the standard deviation (Standard Deviation, SD) as evaluation indicators. Performance on the TUM dataset is shown in Tables 1 and 2.
TABLE 1
TABLE 2
As can be seen from Tables 1 and 2, on the high-dynamic sequences the root mean square error and the standard deviation of the absolute trajectory error of the invention are significantly better than those of ORB-SLAM2 and DS-SLAM, indicating higher accuracy and a more compact error distribution on high-dynamic sequences. In particular, the invention provides a significant improvement on both the fr3/w/xyz and fr3/w/half sequences. On the fr3/w/xyz sequence, the root mean square error and standard deviation of the invention were reduced by about 98.39% and 97.79% relative to ORB-SLAM2 and by about 35.33% and 42.94% relative to DS-SLAM. On the fr3/w/half sequence, they were reduced by about 97.71% and 97.13% relative to ORB-SLAM2 and by about 39.91% and 53.37% relative to DS-SLAM. This shows that, in dynamic scenarios, the invention is more robust than the ORB-SLAM2 and DS-SLAM methods.
On the low-dynamic sequences, however, the root mean square error and standard deviation of the absolute trajectory error of the invention are reduced by only 17.86% and 15.82% relative to ORB-SLAM2, and are 7.80% and 3.98% higher than those of DS-SLAM. This is because in low-dynamic sequences the potentially moving object is stationary for part of the time: its feature points are used for localization while it is stationary and removed while it moves, which affects the accuracy of the system in subsequent global optimization. In addition, although the method uses high-pass filtering, edge-mask construction and similar operations to suppress environmental noise caused by registration errors, in some cases, especially under severe camera motion, this noise is difficult to eliminate completely, which is another reason the method is slightly less accurate than the DS-SLAM algorithm on low-dynamic sequences.
Compared with ORB-SLAM2 and the classical DS-SLAM, the method significantly improves localization accuracy in dynamic scenes. Specifically, for low-dynamic scenes the accuracy of the invention is improved by about 15% over ORB-SLAM2. For high-dynamic scenes the improvement is more pronounced: the accuracy of the invention is consistently improved by more than 95% over ORB-SLAM2 and by more than 30% over DS-SLAM. The results show that the method accurately eliminates the interference of dynamic targets and thereby reduces pose error during optimization. Because the invention optimizes the dynamic-mask extraction strategy, it does not require the assumption that static regions occupy the main area or the region of interest, unlike DS-SLAM, which obtains dynamic feature points by computing outliers with the RANSAC method. As shown in FIGS. 6 and 7, the invention accurately extracts the dynamic region even when the motion-blurred region occupies a large part of the image or the scene undergoes severe illumination change.
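For reference, the ATE statistics quoted above can be computed as in the following sketch, which assumes the estimated and ground-truth trajectories have already been time-associated and aligned (e.g. with Horn's method, as in the standard TUM evaluation tools).

```python
import numpy as np

def ate_rmse_sd(gt_xyz, est_xyz):
    """ATE statistics for time-associated, already-aligned trajectories given
    as Nx3 arrays of camera positions."""
    errors = np.linalg.norm(gt_xyz - est_xyz, axis=1)  # per-frame translational error
    rmse = np.sqrt(np.mean(errors ** 2))               # Root Mean Square Error
    sd = np.std(errors)                                # Standard Deviation
    return rmse, sd
```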
The above embodiments further illustrate the objects, technical solutions and advantageous effects of the present invention, and the above examples are only for illustrating the technical solutions of the present invention, but not for limiting the scope of protection of the present invention, and it should be understood by those skilled in the art that modifications, equivalents and alternatives to the technical solutions of the present invention are included in the scope of protection of the present invention.

Claims (1)

1. The dynamic vision SLAM method based on the frequency domain and the semantics is characterized by comprising the following steps:
s1, acquiring an input image sequence, wherein the input image sequence comprises RGB images and corresponding depth images;
s2, extracting ORB characteristic points of an RGB image of an input frame, and specifically comprising the following substeps:
s21, converting the input RGB image into a gray scale image;
s22, initializing image pyramid parameters, wherein the parameters comprise the number of extracted feature points, pyramid scaling factors, pyramid layers, the number of feature points pre-allocated for each layer and the extraction parameters of initial FAST feature points;
s23, constructing an image pyramid, scaling each layer of pyramid image in the construction process, and filling the periphery;
s24, traversing the images of all pyramid layers, gridding each image, and calling opencv functions in the grids to extract FAST corner points;
s25, eliminating the characteristic points by using an octree method according to the characteristic points pre-distributed in each layer, and calculating the direction of each characteristic point by using a gray centroid method;
s26, waiting for the completion of the motion detection to acquire a moving object region;
S3, taking the input current frame as the image to be registered and the previous frame image as the registration image, and registering the registration image and the image to be registered in the frequency domain using the Fourier-Mellin transform, specifically comprising the following substeps:
s31, converting RGB images of an input registration image and an image to be registered into a gray scale image;
s32, performing discrete Fourier transform on the registered gray level images, and performing high-pass filtering on the frequency domain images subjected to the discrete Fourier transform;
s33, carrying out logarithmic polar coordinate transformation on the frequency domain diagram after high-pass filtering, and inputting the image subjected to the logarithmic polar coordinate transformation into a phase correlation step to obtain response coordinates (x, y);
s34, carrying out coordinate transformation on response coordinates (x, y) obtained in the phase correlation step to obtain a rotation angle theta and a scale factor S, and carrying out rotation and scaling on the image to be registered according to the rotation angle theta and the scale factor S;
s35, inputting the rotated and scaled image to be registered and the registration image into the phase correlation step again to obtain response coordinates (x, y), and translating the image to be registered according to the response coordinates (x, y) to obtain a final registration image;
s4, performing motion detection on the registration image and the previous frame image through an inter-frame difference method, and removing noise through thresholding, edge detection and contour fitting operation, wherein the method specifically comprises the following sub-steps:
s41, inputting the registered image and the previous frame image into an inter-frame difference module together to obtain a difference image, wherein an inter-frame difference formula is as follows:
D_i(x, y) = |f_i(x, y) - f_{i+1}(x, y)|
where D_i(x, y) is the i-th frame difference image, f_i(x, y) is the i-th frame grayscale image, and f_{i+1}(x, y) is the registered (i+1)-th frame grayscale image;
S42, thresholding the difference image, where the thresholding formula is:
R_i(x, y) = 255 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise, where R_i(x, y) is the i-th frame threshold map and T = 40 is the binarization threshold; that is, pixels of the difference image with value greater than 40 are set to 255 and pixels with value less than 40 are set to 0, giving the threshold map;
s43, applying a Canny edge detection operator to the threshold map to perform edge detection to obtain an edge mask;
s44, applying directional rectangular frame fitting to the edge mask to obtain a contour, calculating an aspect ratio for each rectangular frame, classifying the rectangular frame as an afterimage if the aspect ratio is smaller than 0.1, and setting 0 for pixel points of an afterimage area to realize elimination to obtain a final motion mask;
S5, feeding the RGB image of the input frame into a Short-Term Dense Concatenate (STDC) network for semantic segmentation to obtain an object mask containing object semantic information;
s6, according to an object mask obtained by the STDC network, combining the motion mask to judge the motion of the object, and specifically comprising the following substeps:
S61, for the object mask and the motion mask, calculating the motion probability ρ_i of each object as ρ_i = m_i / M_i,
where M_i is the total number of pixels of the i-th object in the object mask and m_i is the number of pixels of the corresponding region in the motion mask;
S62, setting a threshold ε = 0.1; if the motion probability ρ_i is greater than ε, the object is regarded as a moving object, otherwise as a static object, and the prior dynamic object mask is obtained by setting the pixels of static object regions to 0;
s63, fusing the prior dynamic object mask and the motion mask to obtain a final dynamic object mask;
s64, inputting the dynamic object mask into the step S26, and eliminating the characteristic points falling in the dynamic object area according to the dynamic object mask.
CN202310505675.0A 2023-05-08 2023-05-08 Dynamic vision SLAM method based on frequency domain and semantics Active CN116524026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310505675.0A CN116524026B (en) 2023-05-08 2023-05-08 Dynamic vision SLAM method based on frequency domain and semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310505675.0A CN116524026B (en) 2023-05-08 2023-05-08 Dynamic vision SLAM method based on frequency domain and semantics

Publications (2)

Publication Number Publication Date
CN116524026A CN116524026A (en) 2023-08-01
CN116524026B (en) 2023-10-27

Family

ID=87402762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310505675.0A Active CN116524026B (en) 2023-05-08 2023-05-08 Dynamic vision SLAM method based on frequency domain and semantics

Country Status (1)

Country Link
CN (1) CN116524026B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036408B (en) * 2023-08-22 2024-03-29 哈尔滨理工大学 Object SLAM method combining multi-target tracking under dynamic environment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102970528A (en) * 2012-12-28 2013-03-13 北京航空航天大学 Video object division method based on change detection and frame difference accumulation
CN106127801A (en) * 2016-06-16 2016-11-16 乐视控股(北京)有限公司 A kind of method and apparatus of moving region detection
CN110334762A (en) * 2019-07-04 2019-10-15 华南师范大学 A kind of feature matching method combining ORB and SIFT based on quaternary tree
CN110942484A (en) * 2019-11-26 2020-03-31 福州大学 Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN112465858A (en) * 2020-12-10 2021-03-09 武汉工程大学 Semantic vision SLAM method based on probability grid filtering
JP2021082265A (en) * 2019-11-15 2021-05-27 広東工業大学Guangdong University Of Technology Drone visual travel distance measuring method based on depth point line feature
CN114140527A (en) * 2021-11-19 2022-03-04 苏州科技大学 Dynamic environment binocular vision SLAM method based on semantic segmentation


Also Published As

Publication number Publication date
CN116524026A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN106780576B (en) RGBD data stream-oriented camera pose estimation method
JP6095018B2 (en) Detection and tracking of moving objects
CN103325112B (en) Moving target method for quick in dynamic scene
CN109767454B (en) Unmanned aerial vehicle aerial video moving target detection method based on time-space-frequency significance
CN104036524A (en) Fast target tracking method with improved SIFT algorithm
CN114782691A (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN116524026B (en) Dynamic vision SLAM method based on frequency domain and semantics
CN112364865B (en) Method for detecting small moving target in complex scene
Yang et al. Robust RGB-D SLAM in dynamic environment using faster R-CNN
CN103428408A (en) Inter-frame image stabilizing method
CN112927251A (en) Morphology-based scene dense depth map acquisition method, system and device
CN111950599B (en) Dense visual odometer method for fusing edge information in dynamic environment
CN111914832B (en) SLAM method of RGB-D camera under dynamic scene
Zhang et al. A robust visual odometry based on RGB-D camera in dynamic indoor environments
CN114707611B (en) Mobile robot map construction method, storage medium and equipment based on graph neural network feature extraction and matching
CN113592947B (en) Method for realizing visual odometer by semi-direct method
CN116067374A (en) Dynamic scene SLAM positioning method based on target detection algorithm YOLOv4 and geometric constraint
Min et al. COEB-SLAM: A Robust VSLAM in Dynamic Environments Combined Object Detection, Epipolar Geometry Constraint, and Blur Filtering
CN113837243A (en) RGB-D camera dynamic visual odometer method based on edge information
CN115272450A (en) Target positioning method based on panoramic segmentation
Mei et al. An Algorithm for Automatic Extraction of Moving Object in the Image Guidance
Wang et al. Semi-direct Sparse Odometry with Robust and Accurate Pose Estimation for Dynamic Scenes
Wei et al. Matching Filter-Based Vslam Optimization in Indoor Environments
Wang et al. An improved particle filter tracking algorithm based on motion and appearance features
Adachi et al. Improvement of Visual Odometry Using Classic Features by Semantic Information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant