CN111311708B - Visual SLAM method based on semantic optical flow and inverse depth filtering - Google Patents

Visual SLAM method based on semantic optical flow and inverse depth filtering

Info

Publication number
CN111311708B
Authority
CN
China
Prior art keywords
map
semantic
inverse depth
point
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010065930.0A
Other languages
Chinese (zh)
Other versions
CN111311708A (en)
Inventor
崔林艳
马朝伟
郭政航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010065930.0A
Publication of CN111311708A
Application granted
Publication of CN111311708B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Abstract

The invention relates to a visual SLAM method based on semantic optical flow and inverse depth filtering, which comprises the following steps: (1) a vision sensor collects images, and feature extraction and semantic segmentation are performed on the collected images to obtain feature points and semantic segmentation results; (2) the map is initialized by a semantic optical flow method according to the feature points and the segmentation results, dynamic feature points are removed, and a reliable initial map is created; (3) an inverse depth filter evaluates whether the 3D map points in the map are dynamic points, and the map is expanded according to the evaluation result of the inverse depth filter; (4) tracking, local mapping and loop detection are carried out continuously in sequence on the map expanded by the inverse depth filter, finally realizing a visual SLAM for dynamic scenes based on semantic optical flow and inverse depth filtering.

Description

Visual SLAM method based on semantic optical flow and inverse depth filtering
Technical Field
The invention relates to a visual SLAM method based on semantic optical flow and inverse depth filtering. It is a new visual SLAM method that combines semantic optical flow with inverse depth filtering, and it addresses problems of traditional visual SLAM systems such as failure in highly dynamic scenes and a lack of scene understanding.
Background
Simultaneous localization and mapping (SLAM) means estimating the pose of a robot from acquired sensor data, with no prior information about the environment, while simultaneously constructing a globally consistent map of that environment. A SLAM system based on a visual sensor is called visual SLAM; its advantages include low hardware cost, high positioning accuracy, and fully autonomous positioning and navigation, so the technology has received wide attention in fields such as artificial intelligence and virtual reality, and many excellent visual SLAM systems, such as RTAB-MAP, DVO-SLAM and ORB-SLAM2, have emerged.
Traditional visual SLAM systems usually assume that the environment is static and have difficulty coping with situations common in daily life, such as long durations, large spatial scales and highly dynamic scenes. In a highly dynamic scene in particular, visual SLAM based on the static-world assumption can neither recognize that the scene is dynamic nor distinguish the dynamic objects in it, so the accuracy of the SLAM system drops sharply in a dynamic environment, and in severe cases the whole system fails, which hinders the wide application of visual SLAM in daily life. How to improve the accuracy and stability of a visual SLAM system in dynamic scenes and enhance its understanding of the surrounding environment is therefore very important, and has become an urgent problem in the visual SLAM field.
In recent years, with progress in deep learning algorithms and improvements in computing power, computers have become increasingly capable of image tasks such as image classification and semantic segmentation. Combining traditional visual SLAM with deep-learning-based semantic segmentation can greatly improve the robustness and practicality of a SLAM system. SLAM algorithms that incorporate semantic information are generally called semantic SLAM, an emerging research field; there is as yet no mature, agreed-upon scheme for how the semantic information should be used. The current difficulties are: (1) how to ensure the accuracy and stability of a semantic visual SLAM system in highly dynamic scenes; (2) how to strengthen the system's ability to cope with highly dynamic scenes while keeping good performance in static scenes.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a visual SLAM method based on semantic optical flow and inverse depth filtering that improves the SLAM system's ability to cope with dynamic scenes, its understanding of the scene, and its positioning accuracy in dynamic scenes.
The technical scheme of the invention is a visual SLAM method based on semantic optical flow and inverse depth filtering, which comprises the following steps:
step (1): a vision sensor collects images, and feature extraction and semantic segmentation are performed on the collected images to obtain the extracted feature points and the semantic segmentation results;
step (2): map initialization is carried out with the semantic optical flow method according to the feature points and the segmentation results, dynamic feature points are removed, and a reliable initial map is created;
step (3): an inverse depth filter evaluates whether the 3D map points in the initial map are dynamic points, and the map is expanded according to the evaluation result of the inverse depth filter;
step (4): tracking, local mapping and loop detection are carried out continuously in sequence on the map expanded by the inverse depth filter, an accurate map in the dynamic scene is thereby constructed, and the dynamic-scene-oriented visual SLAM based on semantic optical flow and inverse depth filtering is finally realized.
Further, in the step (1), the image feature extraction and semantic segmentation method includes:
after the image data acquired by the sensor is obtained, image feature points are extracted, and semantic segmentation is performed on the RGB image of the current frame with a SegNet semantic segmentation network; the feature points are divided into three classes, static, potentially dynamic and dynamic, according to the semantic information; the SegNet comprises an encoder network and a decoder network: the input image is first fed to the encoder network, each encoder generates a series of feature maps through convolution followed by batch normalization and ReLU activation, and each decoder in the decoder network then up-samples its input feature map using the max-pooling indices stored for the corresponding encoder feature map, generating a sparse feature map; the sparse feature maps are then passed through trainable convolution modules to generate dense feature maps; and the high-dimensional feature representation output by the last decoder of the decoder network is fed to the softmax classifier, which generates a semantic label for each pixel, completing the semantic segmentation of the image.
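For concreteness, the following minimal PyTorch sketch illustrates one encoder/decoder stage of this kind: the encoder saves its max-pooling indices, the decoder reuses them for sparse up-sampling, a trainable convolution densifies the result, and a per-pixel softmax produces the semantic labels. The layer widths, the single-stage depth and the three-class head (static / potentially dynamic / dynamic) are illustrative assumptions, not the exact network trained for this method.

```python
# Minimal SegNet-style stage (sketch; sizes are illustrative assumptions).
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),   # batch normalization
            nn.ReLU(inplace=True),   # ReLU activation
        )
        # return_indices=True stores the max-pooling index map for the decoder
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

    def forward(self, x):
        x = self.conv(x)
        x, indices = self.pool(x)
        return x, indices

class DecoderStage(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # MaxUnpool2d places values back at the stored indices -> sparse map
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        # trainable convolution densifies the sparse feature map
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, indices):
        return self.conv(self.unpool(x, indices))

class TinySegNet(nn.Module):
    def __init__(self, num_classes=3):  # static / potentially dynamic / dynamic
        super().__init__()
        self.enc = EncoderStage(3, 64)
        self.dec = DecoderStage(64, 64)
        self.head = nn.Conv2d(64, num_classes, 1)  # per-pixel logits

    def forward(self, x):
        f, idx = self.enc(x)
        f = self.dec(f, idx)
        return self.head(f).softmax(dim=1)  # per-pixel class probabilities

labels = TinySegNet()(torch.randn(1, 3, 64, 64)).argmax(dim=1)  # label map
```

The real SegNet stacks five such encoder stages (mirroring VGG16) and five matching decoders; reusing the pooling indices is what lets the decoder up-sample without learned transposed convolutions.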
Further, in the step (2), map initialization is performed by a semantic optical flow method and a reliable initial map is created; the method comprises:
firstly, on the basis of the feature points of the acquired image having been divided into the three classes static, potentially dynamic and dynamic by the semantic segmentation method, sparse optical flow is computed for the semantically static feature points of the current frame using the image data of the current frame and of the previous frame; subsequently, the fundamental matrix F, which is the key to the epipolar geometric constraint, is computed; finally, the motion character of the static, potentially dynamic and dynamic feature points is judged again according to the epipolar constraint, and the judgment is checked against the fundamental matrix F just computed: a threshold of one pixel is set, and if the distance from a feature point in the current frame image to its corresponding epipolar line exceeds this threshold, the feature point is judged to be a genuinely dynamic feature point, whereby a reliable initial map is obtained.
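As an illustration of the epipolar check just described, here is a short sketch using standard OpenCV calls (cv2.calcOpticalFlowPyrLK, cv2.findFundamentalMat, cv2.computeCorrespondEpilines); the one-pixel threshold follows the text above, while the function boundary, the RANSAC settings and the point bookkeeping are illustrative assumptions.

```python
# Sketch of the semantic-optical-flow epipolar check (assumptions noted above).
import cv2
import numpy as np

def dynamic_point_mask(prev_gray, cur_gray, static_pts, thresh_px=1.0):
    """static_pts: Nx1x2 float32 pixel coordinates of semantically static
    feature points in prev_gray. Returns True where a tracked point is
    judged to be a genuinely dynamic feature point."""
    # 1. Sparse LK optical flow from the previous frame to the current frame.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                  static_pts, None)
    ok = status.ravel() == 1
    p0 = static_pts[ok].reshape(-1, 2)
    p1 = cur_pts[ok].reshape(-1, 2)

    # 2. Fundamental matrix F from the tracked points (RANSAC; in practice,
    #    check that F is not None before using it).
    F, _ = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 1.0, 0.99)

    # 3. Epipolar lines in the current image for the previous-frame points.
    lines = cv2.computeCorrespondEpilines(p0.reshape(-1, 1, 2), 1, F)
    a, b, c = lines.reshape(-1, 3).T  # each line: ax + by + c = 0

    # 4. Point-to-epipolar-line distance; beyond the threshold => dynamic.
    dist = np.abs(a * p1[:, 0] + b * p1[:, 1] + c) / np.sqrt(a**2 + b**2)
    return dist > thresh_px
```

This check is what re-examines the potentially dynamic points: a person sitting still, for instance, passes the epipolar test even though its semantic class is potentially dynamic.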
Further, in the step (3), an inverse depth filter is adopted to evaluate the 3D map points in the initial map and to expand the map; the method comprises:
a depth filter based on the Gaussian-uniform mixture distribution assumption is applied to SLAM: first, the observations of the inverse depth of a map point are modeled with a mixture of a Gaussian distribution and a uniform distribution:
p(x|Z,π) = π·N(x|Z,τ²) + (1−π)·U(x|Z_min, Z_max)
the meaning of the respective quantities in the above formula is:
x is an observation of the inverse depth of the map point and is a random variable; Z is the true inverse depth of the map point, the value to be estimated; π is the probability that the map point is an inlier, referred to as the inlier rate for short, where an inlier is a static map point in the map whose depth is obtained by triangulating a correct feature match; p(x|Z,π) is the distribution of the inverse depth observations of the map point; N(x|Z,τ²) is a Gaussian distribution with the true inverse depth Z of the map point as its mean and τ² as its variance; U(x|Z_min,Z_max) is a uniform distribution whose lower and upper bounds Z_min and Z_max are the minimum and maximum inverse depths;
The posterior probability distribution of (Z,π) at the current time is calculated as:
p(Z,π|x_1,…,x_n) ∝ p(Z,π|x_1,…,x_{n−1})·p(x_n|Z,π)
wherein x_1,…,x_n are a series of mutually independent observations of the inverse depth of the map point, n being the index of the observation; p(Z,π|x_1,…,x_n) is the posterior probability distribution of (Z,π) at the current time, p(Z,π|x_1,…,x_{n−1}) is the posterior probability distribution of (Z,π) at the previous time, and p(x_n|Z,π) is the likelihood of the depth measurement at the current time; to estimate the parameters Z and π and to simplify the computation, p(Z,π|x_1,…,x_n) is approximated by a distribution of Gaussian-Beta form:
q(Z,π|a,b,μ,σ²) = N(Z|μ,σ²)·Beta(π|a,b)
wherein q(Z,π|a,b,μ,σ²) denotes that (Z,π) follows a Gaussian-Beta distribution with parameters (a,b,μ,σ²), N(Z|μ,σ²) is a Gaussian distribution, and Beta(π|a,b) is a Beta distribution. The Gaussian-Beta distribution has 4 parameters (a,b,μ,σ²), wherein a and b are the two positive parameters of a Beta distribution in probability theory, and μ and σ² are the expectation and variance of the Gaussian distribution; each time a new inverse depth observation is obtained, these 4 parameters are updated to give a new Gaussian-Beta distribution. First, the un-normalized posterior

q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1})·p(x_n|Z,π)

is used to find the first and second moments of Z and π, where q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) indicates that (Z,π) follows the Gaussian-Beta distribution with parameters (a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) at this time; then, by the moment-matching method, the first and second moments of Z and π under p(Z,π|x_1,…,x_n) and under q(Z,π|a,b,μ,σ²) are computed and compared, yielding the new parameters

(a_n, b_n, μ_n, σ²_n).

When σ_n is smaller than a set threshold, the inverse depth of the map point is considered to have converged. The first moment of the inlier rate π can be used as an estimate of π:

E[π] = a_n / (a_n + b_n)

When the inverse depth of a map point has converged but the inlier rate π is lower than a set threshold, the map point is still considered a dynamic point and is removed; only when the inverse depth of the map point has converged and the inlier rate π is higher than the set threshold is the map point considered a reliable static map point, and the previously obtained reliable initial map is updated with these reliable static map points.
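A minimal sketch of one such update is given below. It implements the standard closed-form moment-matching step for the Gaussian-Beta approximation (the Vogiatzis-Hernández parametric filter also used in SVO-style depth filters), which matches the model above; the variable names and the two thresholds in is_reliable_static() are illustrative assumptions.

```python
# Sketch: one Gaussian-Beta inverse depth filter update by moment matching.
import math

def update_seed(x, tau2, mu, sigma2, a, b, z_min, z_max):
    """Fuse one inverse depth observation x (variance tau2) into the
    Gaussian-Beta parameters (mu, sigma2, a, b); returns updated values."""
    # Gaussian fusion of the new observation with the current estimate.
    s2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
    m = s2 * (mu / sigma2 + x / tau2)

    # Responsibilities of the inlier (Gaussian) and outlier (uniform) models.
    norm2 = sigma2 + tau2  # predictive variance of an inlier observation
    c1 = (a / (a + b)) * math.exp(-0.5 * (x - mu) ** 2 / norm2) \
         / math.sqrt(2.0 * math.pi * norm2)
    c2 = (b / (a + b)) / (z_max - z_min)
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)

    # First and second moments of pi under the true (un-normalized) posterior.
    f = c1 * (a + 1) / (a + b + 1) + c2 * a / (a + b + 1)
    e = (c1 * (a + 1) * (a + 2) / ((a + b + 1) * (a + b + 2))
         + c2 * a * (a + 1) / ((a + b + 1) * (a + b + 2)))

    # Moment matching for the Gaussian part (mean and variance of Z).
    mu_new = c1 * m + c2 * mu
    sigma2_new = c1 * (s2 + m * m) + c2 * (sigma2 + mu * mu) - mu_new ** 2

    # Moment matching for the Beta part: recover (a, b) from E[pi], E[pi^2].
    a_new = (e - f) / (f - e / f)
    b_new = a_new * (1.0 - f) / f
    return mu_new, sigma2_new, a_new, b_new

def is_reliable_static(sigma2, a, b, sigma_thresh=1e-3, pi_thresh=0.5):
    """Converged inverse depth + high inlier rate -> reliable static point."""
    converged = math.sqrt(sigma2) < sigma_thresh     # sigma_n below threshold
    inlier_rate = a / (a + b)                        # E[pi] = a/(a+b)
    return converged and inlier_rate > pi_thresh
```

A converged point whose inlier rate stays low is exactly the small, isolated dynamic map point case: its observations keep falling in the uniform (outlier) component, driving b up relative to a.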
Further, in the step (4), the tracking and local mapping threads in the dynamic scene are run on the map expanded by the inverse depth filter; the method comprises:
initial pose estimation or relocalization of the system is carried out on the initial map obtained by semantic optical flow and inverse depth filtering, the reconstructed local map is tracked, the pose is optimized, and new keyframes are determined; after a keyframe is determined, the local mapping thread performs keyframe insertion, culls redundant map points and keyframes, and then carries out local bundle adjustment; the loop detection thread comprises candidate frame detection, Sim3 computation, loop fusion and loop optimization; finally, an accurate map in the dynamic scene is constructed, realizing the dynamic-scene-oriented visual SLAM based on semantic optical flow and inverse depth filtering.
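For orientation, the following runnable sketch shows one way the three threads named above (tracking, local mapping, loop detection) can be organised around queues, in the manner of ORB-SLAM2; all SLAM work is stubbed out with comments, and the keyframe decision is a stand-in, so only the thread structure itself should be read from it.

```python
# Sketch of the three-thread organisation (all SLAM steps stubbed out).
import queue
import threading

kf_queue = queue.Queue()    # tracking -> local mapping
loop_queue = queue.Queue()  # local mapping -> loop detection

def tracking_thread(frames):
    for frame in frames:
        # initial pose estimation / relocalization, local-map tracking and
        # pose optimization would run here
        if frame % 5 == 0:          # stand-in for the keyframe decision
            kf_queue.put(frame)
    kf_queue.put(None)              # shutdown signal

def local_mapping_thread():
    while (kf := kf_queue.get()) is not None:
        # keyframe insertion, culling of redundant map points/keyframes and
        # local bundle adjustment would run here
        loop_queue.put(kf)
    loop_queue.put(None)

def loop_detection_thread():
    while (kf := loop_queue.get()) is not None:
        # candidate frame detection, Sim3 computation, loop fusion and loop
        # optimization would run here
        print(f"loop thread processed keyframe {kf}")

threads = [threading.Thread(target=tracking_thread, args=(range(20),)),
           threading.Thread(target=local_mapping_thread),
           threading.Thread(target=loop_detection_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```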
Compared with the prior art, the invention has the advantages that:
(1) The invention adopts the semantic optical flow method, in which semantic information and optical flow information are integrated into the visual SLAM system in a tightly coupled manner, solving the problems that traditional visual SLAM can neither understand scene information nor cope with dynamic scenes. The pose estimation accuracy in dynamic scenes is improved and is superior to that of existing methods.
(2) The invention adopts the inverse depth filtering method, which considers all image frames in which a map point is observed and continuously accumulates new observations in a probabilistic framework, so that even isolated and small dynamic map points can be detected and handled.
In summary, the method of the invention performs well in highly dynamic scenes and achieves accurate positioning of the visual SLAM system in dynamic scenes.
Drawings
FIG. 1 is a flow chart of a visual SLAM method based on semantic optical flow and inverse depth filtering according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in fig. 1, the specific implementation steps of the present invention are as follows:
step 1, acquiring image data acquired by a sensor, extracting image characteristic points, and performing semantic segmentation on an RGB image of a current frame by using a SegNet semantic segmentation network. The characteristic points are classified into three categories of static, latent dynamic and dynamic through semantic information. The SegNet comprises two modules of an encoder network and a decoder network. The input image is firstly sent to an encoder network, each encoder in the encoder network generates a series of feature maps through convolution operation, and then after batch normalization processing, ReLU activation function activation and other operations are carried out, a decoder in a decoder network uses the maximum pooling index value stored in the corresponding encoder feature map to carry out up-sampling on the input feature map, so that a sparse feature map is generated. The feature maps are then passed through a trainable convolution module to generate dense feature maps. And the high-dimensional feature representation output by the last decoder of the decoder network is transmitted to the softmax classifier, a semantic label of each pixel is generated, and the semantic segmentation process of the image is completed.
Step 2: the semantic optical flow method detects dynamic feature points by tightly coupling semantic information with geometric information, remedying the shortcomings of traditional dynamic feature point detection algorithms. On the basis of the feature points of the acquired image having been divided into the three classes static, potentially dynamic and dynamic by the semantic segmentation method, the semantic optical flow method first computes sparse optical flow for the semantically static feature points of the current frame using the image data of the current frame and of the previous frame. The fundamental matrix F, which is the key to the epipolar geometric constraint, is then computed. Finally, the motion character of the static, potentially dynamic and dynamic feature points is judged again according to the epipolar constraint, and the judgment is checked against the fundamental matrix F just computed: a threshold of one pixel is set, and if the distance from a feature point in the current frame image to its corresponding epipolar line exceeds this threshold, the feature point is judged to be a genuinely dynamic feature point. This yields a reliable initial map.
Step 3, the 3D map points in the initial map are evaluated with an inverse depth filter and the map is expanded. Applying a depth filter based on the Gaussian-uniform mixture distribution assumption to SLAM lets the system handle both the effect of wrong matches and the effect of moving elements on map point construction.
The observations of the inverse depth of a map point are modeled with a mixture of a Gaussian distribution and a uniform distribution:
p(x|Z,π) = π·N(x|Z,τ²) + (1−π)·U(x|Z_min, Z_max)
the meaning of the respective quantities in the above formula is:
x is an observation of the inverse depth of the map point and is a random variable; Z is the true inverse depth of the map point, the value to be estimated; π is the probability that the map point is an inlier, where an inlier is a static map point in the map whose depth is obtained by triangulating a correct feature match; p(x|Z,π) is the distribution of the inverse depth observations of the map point; N(x|Z,τ²) is a Gaussian distribution with the true inverse depth Z of the map point as its mean and τ² as its variance; U(x|Z_min,Z_max) is a uniform distribution whose lower and upper bounds Z_min and Z_max are the minimum and maximum inverse depths.
Calculating the posterior probability distribution of the current time (Z, pi) to obtain:
p(Z,π|x1,...,xn)∝p(Z,π|x1,...,xn-1)p(xn|Z,π)
wherein x_1,…,x_n are a series of mutually independent observations of the inverse depth of the map point, n being the index of the observation; p(Z,π|x_1,…,x_n) is the posterior probability distribution of (Z,π) at the current time, p(Z,π|x_1,…,x_{n−1}) is the posterior probability distribution of (Z,π) at the previous time, and p(x_n|Z,π) is the likelihood of the depth measurement at the current time. To estimate the parameters Z and π and to simplify the computation, p(Z,π|x_1,…,x_n) is approximated by a Gaussian-Beta distribution:
q(Z,π|a,b,μ,σ²) = N(Z|μ,σ²)·Beta(π|a,b)
wherein q(Z,π|a,b,μ,σ²) denotes that (Z,π) follows a Gaussian-Beta distribution with parameters (a,b,μ,σ²), N(Z|μ,σ²) is a Gaussian distribution, and Beta(π|a,b) is a Beta distribution. The Gaussian-Beta distribution has 4 parameters (a,b,μ,σ²), wherein a and b are the two positive parameters of a Beta distribution in probability theory, and μ and σ² are the expectation and variance of the Gaussian distribution, so that each time a new inverse depth observation is obtained, the 4 parameters are updated to give a new Gaussian-Beta distribution. First, the un-normalized posterior

q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1})·p(x_n|Z,π)

is used to find the first and second moments of Z and π, where q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) indicates that (Z,π) follows the Gaussian-Beta distribution with parameters (a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) at this time. The moment-matching method then compares the first and second moments of Z and π obtained in these two ways, giving the new parameters

(a_n, b_n, μ_n, σ²_n).

When σ_n is smaller than a set threshold, the inverse depth of the map point is considered to have converged. The first moment of the inlier rate π can be used as an estimate of π:

E[π] = a_n / (a_n + b_n)

When the inverse depth of a map point has converged but the inlier rate π is lower than the set threshold, the map point is still considered a dynamic point and is removed; only when the inverse depth of the map point has converged and the inlier rate π is higher than the set threshold is the map point considered a reliable static map point, and the previously obtained reliable initial map is updated with these reliable static map points.
Step 4: initial pose estimation or relocalization of the system is performed using the initial map obtained by semantic optical flow and inverse depth filtering, the reconstructed local map is tracked, the pose is optimized, and new keyframes are determined. After a keyframe is determined, the local mapping thread mainly completes keyframe insertion, culling of redundant map points and keyframes, and local bundle adjustment. The loop detection thread includes candidate frame detection, Sim3 computation, loop fusion and loop optimization. Through these threads, an accurate map in the dynamic scene is finally constructed, realizing the dynamic-scene-oriented visual SLAM based on semantic optical flow and inverse depth filtering.
As shown in Table 1, the method of the present invention is compared quantitatively on the TUM RGB-D dataset with existing visual SLAM systems for dynamic scenes (the 4 most representative algorithms are selected here: DS-SLAM, DynaSLAM, Detect-SLAM, and the algorithm proposed by L. Zhang et al.). The TUM RGB-D dataset includes one low-dynamic video sequence, s_static, and four high-dynamic video sequences, w_halfsphere, w_rpy, w_static and w_xyz. The quantitative comparison shows that the method has the highest accuracy in both low-dynamic and high-dynamic scenes, more effectively improving the visual SLAM system's ability to cope with dynamic scenes and its positioning accuracy in them.
Table 1 compares the accuracy of the results obtained on five dynamic-scene video sequences of the TUM RGB-D dataset using the method of the present invention and other classical visual SLAM methods.
TABLE 1 (reproduced as an image in the original publication; values not recoverable here)
(Note: the percentages in the table indicate the percentage improvement in accuracy of the visual SLAM method in that column over classical ORB-SLAM2; "-" indicates that the corresponding algorithm was not tested on that video sequence.)
The invention combines traditional visual SLAM with deep-learning-based semantic optical flow and inverse depth filtering, providing a new visual SLAM method based on semantic optical flow and inverse depth filtering. It is of strong practical value for the innovation and improvement of SLAM systems based on visual sensors, and of great significance for the wider application of visual SLAM systems in the future.
Those skilled in the art will appreciate that the invention may be practiced without these specific details.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the present invention, the present invention is not limited to the scope of those embodiments. Various changes will be apparent to those skilled in the art, and all inventions making use of the inventive concept set forth herein fall under protection, provided they do not depart from the spirit and scope of the present invention as defined by the appended claims.

Claims (4)

1. A visual SLAM method based on semantic optical flow and inverse depth filtering, characterized by comprising the following steps:
step (1): a vision sensor collects images, and feature extraction and semantic segmentation are performed on the collected images to obtain the extracted feature points and the semantic segmentation results;
step (2): map initialization is carried out with the semantic optical flow method according to the feature points and the segmentation results, dynamic feature points are removed, and a reliable initial map is created;
step (3): an inverse depth filter evaluates whether the 3D map points in the initial map are dynamic points, and the map is expanded according to the evaluation result of the inverse depth filter;
in the step (3), the inverse depth filter is adopted to evaluate the 3D map points in the initial map and to expand the map, the method comprising:
a depth filter based on the Gaussian-uniform mixture distribution assumption is applied to SLAM: first, the observations of the inverse depth of a map point are modeled with a mixture of a Gaussian distribution and a uniform distribution:
p(x|Z,π) = π·N(x|Z,τ²) + (1−π)·U(x|Z_min, Z_max)
the meaning of the respective quantities in the above formula is:
x is an observation of the inverse depth of the map point and is a random variable; Z is the true inverse depth of the map point, the value to be estimated; π is the probability that the map point is an inlier, referred to as the inlier rate for short, where an inlier is a static map point in the map whose depth is obtained by triangulating a correct feature match; p(x|Z,π) is the distribution of the inverse depth observations of the map point; N(x|Z,τ²) is a Gaussian distribution with the true inverse depth Z of the map point as its mean and τ² as its variance; U(x|Z_min,Z_max) is a uniform distribution whose lower and upper bounds Z_min and Z_max are the minimum and maximum inverse depths;
the posterior probability distribution of (Z,π) at the current time is calculated as:
p(Z,π|x_1,…,x_n) ∝ p(Z,π|x_1,…,x_{n−1})·p(x_n|Z,π)
wherein x_1,…,x_n are a series of mutually independent observations of the inverse depth of the map point, n being the index of the observation; p(Z,π|x_1,…,x_n) is the posterior probability distribution of (Z,π) at the current time, p(Z,π|x_1,…,x_{n−1}) is the posterior probability distribution of (Z,π) at the previous time, and p(x_n|Z,π) is the likelihood of the depth measurement at the current time; to estimate the parameters Z and π and to simplify the computation, p(Z,π|x_1,…,x_n) is approximated by a distribution of Gaussian-Beta form:
q(Z,π|a,b,μ,σ²) = N(Z|μ,σ²)·Beta(π|a,b)
wherein q(Z,π|a,b,μ,σ²) denotes that (Z,π) follows a Gaussian-Beta distribution with parameters (a,b,μ,σ²), N(Z|μ,σ²) is a Gaussian distribution, and Beta(π|a,b) is a Beta distribution, which has a total of 4 parameters (a,b,μ,σ²), wherein a and b are the two positive parameters of a Beta distribution in probability theory, and μ and σ² are the expectation and variance of the Gaussian distribution; each time a new inverse depth observation is obtained, these 4 parameters are updated to give a new Gaussian-Beta distribution; first, the un-normalized posterior

q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1})·p(x_n|Z,π)

is used to find the first and second moments of Z and π, where q(Z,π|a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) indicates that (Z,π) follows the Gaussian-Beta distribution with parameters (a_{n−1},b_{n−1},μ_{n−1},σ²_{n−1}) at this time; then, by the moment-matching method, the first and second moments of Z and π under p(Z,π|x_1,…,x_n) and under q(Z,π|a,b,μ,σ²) are computed and compared, yielding the new parameters

(a_n, b_n, μ_n, σ²_n);

when σ_n is smaller than a set threshold, the inverse depth of the map point is considered to have converged; the first moment of the inlier rate π can be used as an estimate of π:

E[π] = a_n / (a_n + b_n)

when the inverse depth of a map point has converged but the inlier rate π is lower than a set threshold, the map point is still considered a dynamic point and is removed; only when the inverse depth of the map point has converged and the inlier rate π is higher than the set threshold is the map point considered a reliable static map point, and the previously obtained reliable initial map is updated with these reliable static map points;
step (4): tracking, local mapping and loop detection are carried out continuously in sequence on the map expanded by the inverse depth filter, an accurate map in the dynamic scene is thereby constructed, and the dynamic-scene-oriented visual SLAM based on semantic optical flow and inverse depth filtering is finally realized.
2. The visual SLAM method based on semantic optical flow and inverse depth filtering of claim 1, wherein: in the step (1), the image feature extraction and semantic segmentation method comprises the following steps:
after the image data acquired by the sensor is obtained, image feature points are extracted, and semantic segmentation is performed on the RGB image of the current frame with a SegNet semantic segmentation network; the feature points are divided into three classes, static, potentially dynamic and dynamic, according to the semantic information; the SegNet comprises an encoder network and a decoder network: the input image is first fed to the encoder network, each encoder generates a series of feature maps through convolution followed by batch normalization and ReLU activation, and each decoder in the decoder network then up-samples its input feature map using the max-pooling indices stored for the corresponding encoder feature map, generating a sparse feature map; the sparse feature maps are then passed through trainable convolution modules to generate dense feature maps; and the high-dimensional feature representation output by the last decoder of the decoder network is fed to the softmax classifier, which generates a semantic label for each pixel, completing the semantic segmentation of the image.
3. The visual SLAM method based on semantic optical flow and inverse depth filtering of claim 1, wherein: in the step (2), map initialization is performed by the semantic optical flow method and a reliable initial map is created, the method comprising:
firstly, on the basis of the feature points of the acquired image having been divided into the three classes static, potentially dynamic and dynamic by the semantic segmentation method, sparse optical flow is computed for the semantically static feature points of the current frame using the image data of the current frame and of the previous frame; subsequently, the fundamental matrix F, which is the key to the epipolar geometric constraint, is computed; finally, the motion character of the static, potentially dynamic and dynamic feature points is judged again according to the epipolar constraint, and the judgment is checked against the fundamental matrix F just computed: a threshold of one pixel is set, and if the distance from a feature point in the current frame image to its corresponding epipolar line exceeds this threshold, the feature point is judged to be a genuinely dynamic feature point, whereby a reliable initial map is obtained.
4. The visual SLAM method based on semantic optical flow and inverse depth filtering of claim 1, wherein: in the step (4), the tracking and local mapping threads in the dynamic scene are run on the map expanded by the inverse depth filter, the method comprising:
initial pose estimation or relocalization of the system is carried out on the initial map obtained by semantic optical flow and inverse depth filtering, the reconstructed local map is tracked, the pose is optimized, and new keyframes are determined; after a keyframe is determined, the local mapping thread performs keyframe insertion, culls redundant map points and keyframes, and then carries out local bundle adjustment; the loop detection thread comprises candidate frame detection, Sim3 computation, loop fusion and loop optimization; finally, an accurate map in the dynamic scene is constructed, realizing the dynamic-scene-oriented visual SLAM based on semantic optical flow and inverse depth filtering.
CN202010065930.0A 2020-01-20 2020-01-20 Visual SLAM method based on semantic optical flow and inverse depth filtering Active CN111311708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065930.0A CN111311708B (en) 2020-01-20 2020-01-20 Visual SLAM method based on semantic optical flow and inverse depth filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010065930.0A CN111311708B (en) 2020-01-20 2020-01-20 Visual SLAM method based on semantic optical flow and inverse depth filtering

Publications (2)

Publication Number Publication Date
CN111311708A CN111311708A (en) 2020-06-19
CN111311708B (en) 2022-03-11

Family

ID=71160787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065930.0A Active CN111311708B (en) 2020-01-20 2020-01-20 Visual SLAM method based on semantic optical flow and inverse depth filtering

Country Status (1)

Country Link
CN (1) CN111311708B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132893B (en) * 2020-08-31 2024-01-09 同济人工智能研究院(苏州)有限公司 Visual SLAM method suitable for indoor dynamic environment
CN112037268B (en) * 2020-09-02 2022-09-02 中国科学技术大学 Environment sensing method based on probability transfer model in dynamic scene
CN112884835A (en) * 2020-09-17 2021-06-01 中国人民解放军陆军工程大学 Visual SLAM method for target detection based on deep learning
CN112446885A (en) * 2020-11-27 2021-03-05 广东电网有限责任公司肇庆供电局 SLAM method based on improved semantic optical flow method in dynamic environment
CN112465858A (en) * 2020-12-10 2021-03-09 武汉工程大学 Semantic vision SLAM method based on probability grid filtering
CN113781574B (en) * 2021-07-19 2024-04-12 长春理工大学 Dynamic point removing method for binocular refraction and reflection panoramic system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110176033A (en) * 2019-05-08 2019-08-27 北京航空航天大学 A kind of mixing probability based on probability graph is against depth estimation method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101619076B1 (en) * 2009-08-25 2016-05-10 삼성전자 주식회사 Method of detecting and tracking moving object for mobile platform
CN107833236B (en) * 2017-10-31 2020-06-26 中国科学院电子学研究所 Visual positioning system and method combining semantics under dynamic environment
CN108648270B (en) * 2018-05-12 2022-04-19 西北工业大学 Unmanned aerial vehicle real-time three-dimensional scene reconstruction method capable of realizing real-time synchronous positioning and map construction
CN108986136B (en) * 2018-07-23 2020-07-24 南昌航空大学 Binocular scene flow determination method and system based on semantic segmentation
CN110084850B (en) * 2019-04-04 2023-05-23 东南大学 Dynamic scene visual positioning method based on image semantic segmentation
CN110335319B (en) * 2019-06-26 2022-03-18 华中科技大学 Semantic-driven camera positioning and map reconstruction method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110176033A (en) * 2019-05-08 2019-08-27 北京航空航天大学 A kind of mixing probability based on probability graph is against depth estimation method

Also Published As

Publication number Publication date
CN111311708A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN111311708B (en) Visual SLAM method based on semantic optical flow and inverse depth filtering
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
CN112184752A (en) Video target tracking method based on pyramid convolution
CN110490158B (en) Robust face alignment method based on multistage model
CN111462210B (en) Monocular line feature map construction method based on epipolar constraint
CN113012122B (en) Category-level 6D pose and size estimation method and device
CN114782691A (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN110335299B (en) Monocular depth estimation system implementation method based on countermeasure network
Kong et al. A method for learning matching errors for stereo computation.
CN110119768B (en) Visual information fusion system and method for vehicle positioning
CN112785636B (en) Multi-scale enhanced monocular depth estimation method
CN115880720A (en) Non-labeling scene self-adaptive human body posture and shape estimation method based on confidence degree sharing
CN112801945A (en) Depth Gaussian mixture model skull registration method based on dual attention mechanism feature extraction
CN114372523A (en) Binocular matching uncertainty estimation method based on evidence deep learning
Kong et al. Local stereo matching using adaptive cross-region-based guided image filtering with orthogonal weights
Singh et al. Fusing semantics and motion state detection for robust visual SLAM
Rodríguez-Puigvert et al. Bayesian deep neural networks for supervised learning of single-view depth
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
Bouaynaya et al. A complete system for head tracking using motion-based particle filter and randomly perturbed active contour
CN113570713B (en) Semantic map construction method and device for dynamic environment
Min et al. COEB-SLAM: A Robust VSLAM in Dynamic Environments Combined Object Detection, Epipolar Geometry Constraint, and Blur Filtering
JP2023065296A (en) Planar surface detection apparatus and method
CN111583331B (en) Method and device for simultaneous localization and mapping
CN113888603A (en) Loop detection and visual SLAM method based on optical flow tracking and feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant