CN113379842B - RGBD camera-based weak texture and dynamic scene vision SLAM positioning method - Google Patents

RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Info

Publication number
CN113379842B
CN113379842B (application number CN202110695592.3A)
Authority
CN
China
Prior art keywords
frame
image
current frame
frame image
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110695592.3A
Other languages
Chinese (zh)
Other versions
CN113379842A (en)
Inventor
梅天灿
姜祚鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110695592.3A priority Critical patent/CN113379842B/en
Publication of CN113379842A publication Critical patent/CN113379842A/en
Application granted granted Critical
Publication of CN113379842B publication Critical patent/CN113379842B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/10024 - Color image

Abstract

The invention provides a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera. Images are collected in real time by the RGBD camera, the sequence number of the current frame image is compared with a preset sliding window length, and the grid size used for feature extraction in the current frame image is calculated. A set of pixel blocks is constructed from the feature point set of the previous frame image and projected into the current frame, and the dynamic feature points in the previous frame image are removed according to the gray level difference between each pixel block in the previous frame image and its corresponding projected pixel block in the current frame image, yielding the static feature point set of the previous frame image. A pose solving objective function is then constructed from the static feature point set of the previous frame image, a first-stage optimization of the current frame camera pose is performed, and the pose is further optimized based on the key frame common view to obtain the optimized current frame camera pose. The method works stably in weak texture scenes and in dynamic scenes with moving targets, and has strong environmental adaptability.

Description

RGBD camera-based weak texture and dynamic scene vision SLAM positioning method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera.
Background
With the continuous progress of computer science, electronic engineering, virtual reality, artificial intelligence and related technologies, robotics has gradually become a hot topic of scientific research. Service robots are appearing in medical care, military, agricultural, industrial and everyday settings. As production and daily life become more automated and intelligent, service robots can be expected to have very broad application prospects and to penetrate many fields of human activity. To move autonomously, a robot needs to acquire its own position and information about its surroundings in real time, and SLAM technology developed to meet this need. SLAM is short for Simultaneous Localization and Mapping, and refers to a robot or mobile platform continuously localizing itself while building a map of the surrounding environment.
The main sensor of traditional SLAM schemes is lidar, but lidar is expensive, its point clouds only reflect the distance and angle of surrounding object points while discarding rich information such as the texture and semantics of the environment, and it interacts poorly with people. Cameras have therefore become the mainstream sensor for SLAM in recent years, and a SLAM scheme whose main sensor is a camera is called visual SLAM.
In practical applications, many factors interfere with the accuracy and stability of a visual SLAM system, the most significant being the characteristics of the scene. Because most visual SLAM schemes rely strongly on feature extraction and matching, weak scene texture or moving objects in the scene greatly affect feature extraction and matching between images, and thus the stable operation of the visual SLAM system. The patent documents "An RGB-D SLAM method combined with monocular vision" (publication No. CN106127739B) and "A method and related device for realizing SLAM positioning based on monocular vision" (publication No. CN111928842A) both extract features from images and establish data association through feature matching between two adjacent frames in order to localize the camera, but neither addresses the operation of visual SLAM in scenes with weak texture and dynamic objects. The patent documents "A visual SLAM method and device based on dynamic target detection" (application publication No. CN112435278A) and "A method and related device for realizing SLAM positioning in a dynamic environment" (application publication No. CN111928857A) both use a deep learning network to detect potential moving targets in the image and eliminate potentially dynamic feature points, thereby improving the operation of a visual SLAM system in dynamic scenes. However, such methods require the network model to be trained in advance; when the environment changes, the types of dynamic objects covered by the model must be changed, and in practice the deep learning network occupies considerable computing resources. In addition, deep-learning-based methods remove all objects that might move even when they are not actually moving (such as a stationary vehicle or person), while they cannot remove objects outside the trained categories that have in fact been moved (such as a moved book or cabinet). Therefore, although such methods can solve the operation problem of visual SLAM systems in dynamic scenes to a certain extent, they still have shortcomings and limited applicability in real scenes.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera that does not depend on a deep learning network, has good real-time performance, and adapts well to different environments.
In order to achieve the above object, the technical solution of the present invention provides an RGBD camera-based weak texture and dynamic scene vision SLAM positioning method, which comprises the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
step 3, initializing a camera pose of the current frame, constructing a plurality of pixel blocks according to the feature point set of the previous frame image, projecting all the pixel blocks to the current frame in sequence according to the initial camera pose of the current frame, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projected pixel block in the current frame image, and removing dynamic feature points from the feature point set of the previous frame image according to the gray level difference to obtain the static feature point set of the previous frame image;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame image and the initial camera pose of the current frame, performing a first-stage optimization of the current frame camera pose, and then further optimizing the current frame camera pose based on the key frame common view to obtain the optimized current frame camera pose;
preferably, the scale factor in step 1 is the ratio of the physical distance of a spatial point from the camera center to the depth value of that point in the depth image.
In step 1, the number of preset feature points is the number of feature points that the user expects to extract from each frame image, denoted N;
in step 1, the length of the sliding window is denoted L;
preferably, the step 2 specifically comprises:
let the sequence number of the current frame be k; if k = 1, calculating the grid size of the current frame image in step 2 specifically comprises:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, and w and h represent the length and width of the k-th frame image, respectively;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, the grid size of the current frame image (the k-th frame image) is Gk-1 + θ; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, L represents the length of the sliding window, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ represents the weighting scale factor, σ represents the grid size adjustment factor, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, where numk is the number of grids in the k-th frame and j is the grid number.
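To make the per-grid selection above concrete, the following Python sketch (OpenCV and NumPy are used for illustration only; the patent does not name a library, and the FAST threshold, grid size and variable names are assumptions) detects FAST corners, scores them with the Shi-Tomasi minimum-eigenvalue response, and keeps the best and second-best corner of each grid cell as the main selected and candidate feature points.

```python
import cv2
import numpy as np

def select_grid_features(gray, grid_size):
    """Pick the best (main) and second-best (candidate) FAST corner per grid cell,
    scored with the Shi-Tomasi minimum-eigenvalue response."""
    fast = cv2.FastFeatureDetector_create(threshold=10)
    keypoints = fast.detect(gray, None)
    # Shi-Tomasi response map: minimum eigenvalue of the structure tensor per pixel.
    response = cv2.cornerMinEigenVal(gray, blockSize=3)

    best, second = {}, {}                       # grid cell -> (score, (x, y))
    for kp in keypoints:
        x, y = kp.pt
        cell = (int(y // grid_size), int(x // grid_size))
        score = float(response[int(y), int(x)])
        if cell not in best or score > best[cell][0]:
            second[cell] = best.get(cell)       # old best becomes the candidate
            best[cell] = (score, (x, y))
        elif second.get(cell) is None or score > second[cell][0]:
            second[cell] = (score, (x, y))

    main_pts = [pt for _, pt in best.values()]
    cand_pts = [v[1] for v in second.values() if v is not None]
    return main_pts, cand_pts

# Example usage (file name and grid size are assumptions):
# gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
# main_pts, cand_pts = select_grid_features(gray, grid_size=16)
```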
Preferably, the step 3 of constructing a plurality of pixel blocks according to the previous frame image feature point set specifically includes:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw and width bh is constructed centered on the feature point's coordinates;
in step 3, projecting all pixel blocks to the current frame in sequence according to the initial camera pose of the current frame specifically comprises:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters of the configuration file in step 1, the camera pose of the previous frame and the initial camera pose of the current frame;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p'n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn.
In step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th are marked as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points form the static feature point set of the previous frame image in step 3, denoted Rk-1.
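A minimal sketch of the dynamic-point rejection described above, assuming 4x4 camera-to-world pose matrices, a pinhole intrinsic matrix K and NumPy; the block-mean comparison and all function names are illustrative, not the patent's implementation.

```python
import numpy as np

def bilinear(img, x, y):
    """Gray value of img at sub-pixel location (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    patch = img[y0:y0 + 2, x0:x0 + 2].astype(np.float64)
    w = np.array([[(1 - dx) * (1 - dy), dx * (1 - dy)],
                  [(1 - dx) * dy,       dx * dy]])
    return float((patch * w).sum())

def block_mean(img, cx, cy, bw=4, bh=4):
    """Mean gray value of a bw x bh pixel block centred at (cx, cy)."""
    return np.mean([bilinear(img, cx + ox, cy + oy)
                    for oy in range(-(bh // 2), bh - bh // 2)
                    for ox in range(-(bw // 2), bw - bw // 2)])

def static_points(prev_gray, cur_gray, prev_pts, prev_depths, K, T_prev, T_ini_cur,
                  bw=4, bh=4, th=35.0):
    """Keep only the previous-frame feature points whose pixel-block gray difference
    against the projected block in the current frame is at most th."""
    K_inv = np.linalg.inv(K)
    T_rel = np.linalg.inv(T_ini_cur) @ T_prev      # previous camera -> current camera
    h, w = cur_gray.shape
    kept = []
    for (u, v), d in zip(prev_pts, prev_depths):
        if not (bw <= u < prev_gray.shape[1] - bw and bh <= v < prev_gray.shape[0] - bh):
            continue
        p_cam = d * (K_inv @ np.array([u, v, 1.0]))  # back-project with the depth value
        p_cur = T_rel[:3, :3] @ p_cam + T_rel[:3, 3]
        if p_cur[2] <= 0:
            continue
        u2, v2, _ = K @ (p_cur / p_cur[2])           # project into the current image
        if not (bw <= u2 < w - bw and bh <= v2 < h - bh):
            continue
        diff = abs(block_mean(cur_gray, u2, v2, bw, bh) -
                   block_mean(prev_gray, u, v, bw, bh))
        if diff <= th:
            kept.append((u, v))                      # static feature point
    return kept
```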
Preferably, the step 4 of constructing a pose solving objective function based on the previous frame image static feature point set and the current frame initial camera pose is as follows:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
Tk' = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted Tk';
In step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is found in the key frame common view and used as the reference frame; a pixel block of length bw and width bh is constructed centered on each static feature point of the reference frame, and each pixel block on the reference frame is projected to the current frame to construct a pose optimization function. Taking Tk' as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm; the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
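The two-stage photometric pose refinement can be sketched as follows. The sketch assumes 4x4 camera-to-world poses, a pinhole intrinsic matrix K, single-pixel sampling in place of the bw x bh block average, and SciPy's generic least-squares solver standing in for the Gauss-Newton iteration named in the text; all function names are assumptions.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def photometric_residuals(x, pts, depths, gray_ref, gray_cur, K, T_ref):
    """Gray differences between reference-frame points and their projections in the
    current frame for a current-camera pose x = (rvec, tvec), camera-to-world."""
    R, _ = cv2.Rodrigues(x[:3])
    T_cur = np.eye(4); T_cur[:3, :3] = R; T_cur[:3, 3] = x[3:]
    T_rel = np.linalg.inv(T_cur) @ T_ref          # reference camera -> current camera
    K_inv = np.linalg.inv(K)
    h, w = gray_cur.shape
    res = []
    for (u, v), d in zip(pts, depths):
        p = T_rel[:3, :3] @ (d * (K_inv @ np.array([u, v, 1.0]))) + T_rel[:3, 3]
        if p[2] <= 0:
            res.append(0.0); continue
        q = K @ (p / p[2])
        u2, v2 = int(round(q[0])), int(round(q[1]))
        if not (0 <= u2 < w and 0 <= v2 < h):
            res.append(0.0); continue
        res.append(float(gray_cur[v2, u2]) - float(gray_ref[int(v), int(u)]))
    return np.array(res)

def refine_pose(T_init, pts, depths, gray_ref, gray_cur, K, T_ref):
    """Minimise the photometric residuals starting from T_init."""
    rvec, _ = cv2.Rodrigues(T_init[:3, :3])
    x0 = np.hstack([rvec.ravel(), T_init[:3, 3]])
    sol = least_squares(photometric_residuals, x0,
                        args=(pts, depths, gray_ref, gray_cur, K, T_ref))
    R, _ = cv2.Rodrigues(sol.x[:3])
    T = np.eye(4); T[:3, :3] = R; T[:3, 3] = sol.x[3:]
    return T

# Stage 1: refine against frame k-1; Stage 2: refine against the reference key frame
# from the common view, starting from the stage-1 result:
# T_k1 = refine_pose(T_ini_k, pts_km1, depths_km1, gray_km1, gray_k, K, T_km1)
# T_k  = refine_pose(T_k1, pts_ref, depths_ref, gray_ref, gray_k, K, T_ref)
```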
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is then judged: if the current frame has at least Th_ft recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh, the current frame is inserted into the key frame common view as a key frame with camera pose Tk, where the pose difference is computed by the formula given as an image in the original publication.
a map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
where Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose.
The redundancy of the other key frames and map points in the key frame common view is then checked in turn. A key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a% of the elements in its map point set are contained in the map point sets of at least b other key frames, and the key frame is more than c key frames away from the current frame.
A map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two key frames that follow its first creation.
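A schematic of the key frame insertion and culling rules above, with the thresholds written as parameters; the data structures, the use of frame indices to measure key-frame separation, and the fraction a = 0.9 standing for a% are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class KeyFrame:
    frame_id: int
    pose: object                        # 4x4 camera-to-world pose (e.g. a NumPy array)
    map_point_ids: set = field(default_factory=set)

def should_insert_keyframe(frame_gap, n_coviewed, n_recoverable, pose_diff,
                           T=30, th_pt=30, th_ft=15, th_sh=0.1):
    """First check whether insertion should be considered, then whether the
    current frame is good enough, following the two checks in the text."""
    considered = frame_gap > T or n_coviewed < th_pt
    good = n_recoverable >= th_ft and pose_diff > th_sh
    return considered and good

def cull_keyframes(keyframes, current_id, a=0.9, b=3, c=10):
    """Drop key frames whose map points are mostly seen by >= b other key frames
    and that are more than c key frames away from the current frame."""
    kept = []
    for kf in keyframes:
        redundant = 0
        for mp in kf.map_point_ids:
            seen_by = sum(1 for other in keyframes
                          if other is not kf and mp in other.map_point_ids)
            if seen_by >= b:
                redundant += 1
        ratio = redundant / max(len(kf.map_point_ids), 1)
        far = (current_id - kf.frame_id) > c
        if ratio > a and far:
            continue                    # delete this key frame
        kept.append(kf)
    return kept
```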
The method greatly improves the positioning capability of a visual SLAM system in weak texture and dynamic scenes. Dynamically adjusting the grid size used for feature extraction in the current frame according to the number of feature points of already-processed frames greatly improves positioning accuracy and stability in weak texture scenes; constructing pixel blocks from the feature point set of the previous frame image and eliminating dynamic feature points according to the gray level difference between each pixel block in the previous frame image and its corresponding projected pixel block in the current frame image greatly improves positioning accuracy and stability in dynamic scenes.
Drawings
FIG. 1: is a flow chart of an embodiment of the method of the present invention.
Detailed Description
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings and embodiments, so that the objects, features and effects of the invention can be fully understood. These examples are for illustrative purposes only and do not limit the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof that may occur to those skilled in the art upon reading the present specification.
The technical scheme adopted by the invention is as follows. The flow of a specific embodiment of the RGBD camera-based weak texture and dynamic scene vision SLAM method is shown in FIG. 1, and mainly comprises the following steps:
a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera comprises the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
in step 1, the scale factor is the ratio of the physical distance of a spatial point from the camera center to the depth value of that point in the depth image.
In step 1, the number of preset feature points is the number of feature points that the user expects to extract from each frame image, set here to N = 1200;
in step 1, the length of the sliding window is set to L = 5;
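For step 1, the principal point, focal length and distortion parameters could be obtained with OpenCV's implementation of Zhang Zhengyou's calibration, as sketched below; the checkerboard geometry, square size and image folder are assumptions, not values given in the patent.

```python
import glob
import cv2
import numpy as np

# Checkerboard with 9 x 6 inner corners and 25 mm squares (assumed values).
pattern, square = (9, 6), 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):                 # assumed image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]

# K holds the focal length and principal point; dist holds the distortion parameters.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print("reprojection RMS:", rms, "\nK =\n", K, "\ndist =", dist.ravel())
```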
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
the step 2 specifically comprises the following steps:
let the sequence number of the current frame be k; if k = 1, calculating the grid size of the current frame image in step 2 specifically comprises:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N = 1200, w = 640, and h = 480;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, it is Gk-1 + θ, with θ = 2; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N = 1200, L = 5, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ = 0.8, σ = 100, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, where numk is the number of grids in the k-th frame and j is the grid number.
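The grid-size logic of step 2 can be sketched as below. The square-cell form of the initial grid size is an assumed reconstruction (the original formula is only available as an image), and the weighted sliding-window update used when k > L is omitted because its exact form is not reproduced in the text.

```python
import math

def initial_grid_size(width, height, n_target):
    """Assumed reconstruction: square cells sized so that roughly n_target
    cells tile the image (640 x 480 with N = 1200 gives 16 pixels)."""
    return int(round(math.sqrt(width * height / n_target)))

def adjust_grid_size(prev_grid, n_main_prev, n_target, theta=2):
    """Rule for 1 < k <= L: shrink the grid when the previous frame produced too
    few main selected feature points, enlarge it when it produced enough."""
    return prev_grid - theta if n_main_prev < n_target else prev_grid + theta

# Example with the embodiment's values (N = 1200, 640 x 480 images, theta = 2):
g1 = initial_grid_size(640, 480, 1200)                        # 16
g2 = adjust_grid_size(g1, n_main_prev=1100, n_target=1200)    # 14
```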
Step 3, initializing a camera pose of a current frame, constructing a plurality of pixel blocks according to a feature point set of a previous frame of image, projecting all the pixel blocks to the current frame in sequence by combining with the initial camera pose of the current frame, calculating the gray difference between each pixel block in the previous frame of image and a corresponding projected pixel block in the current frame of image, and removing dynamic feature points in the feature point set of the previous frame of image by combining with the gray difference to obtain a static feature point set of the previous frame of image;
step 3, constructing a plurality of pixel blocks according to the previous frame image feature point set, specifically:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw = 4 and width bh = 4 is constructed centered on the feature point's coordinates;
in step 3, projecting all pixel blocks to the current frame in sequence according to the initial camera pose of the current frame specifically comprises:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters of the configuration file in step 1, the camera pose of the previous frame and the initial camera pose of the current frame;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
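The pose prediction T(ini,k) = Tk-1·ΔT(k-2,k-1) amounts to a constant-velocity model; the sketch below assumes camera-to-world poses and takes the increment as ΔT(k-2,k-1) = Tk-2^-1·Tk-1, which is an interpretation, since the patent does not spell out how the increment is formed.

```python
import numpy as np

def predict_pose(T_km2, T_km1):
    """Constant-velocity prediction: reapply the last inter-frame motion."""
    delta = np.linalg.inv(T_km2) @ T_km1    # assumed increment from frame k-2 to k-1
    return T_km1 @ delta                    # T_(ini,k) = T_(k-1) * deltaT_(k-2,k-1)
```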
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p'n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn.
In step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th = 35 are taken as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points form the static feature point set of the previous frame image in step 3, denoted Rk-1.
Step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the initial camera pose of the current frame, performing first-stage optimization solving on the camera pose of the current frame, further performing further optimization solving on the camera pose of the current frame based on the key frame common view to obtain the camera pose of the current frame after optimization solving;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the pose of the initial camera of the current frame, wherein the pose solving objective function comprises the following steps:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
Tk' = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted Tk';
In step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is searched for in the key frame common view and used as the reference frame; a pixel block of length bw = 4 and width bh = 4 is constructed centered on each static feature point of the reference frame, and, similarly to the previous-stage pose solving, each pixel block on the reference frame is projected to the current frame to construct a pose optimization function. Taking Tk' as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm; the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T = 30 frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt = 30, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is then judged: if the current frame has at least Th_ft = 15 recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh = 0.1, the current frame is inserted into the key frame common view as a key frame with camera pose Tk.
A map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
wherein Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose.
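The map point creation formula MPk = {Tk·π^-1(pv, dpv) | pv ∈ Rk} back-projects each static feature point with its depth and moves it into the world frame; a small sketch, assuming Tk is a 4x4 camera-to-world matrix and K the pinhole intrinsic matrix:

```python
import numpy as np

def create_map_points(static_pts, depths, K, T_k):
    """MPk: back-project each static feature point (u, v) with depth d through the
    intrinsics, then move it into the world frame with the key-frame pose T_k."""
    K_inv = np.linalg.inv(K)
    points = []
    for (u, v), d in zip(static_pts, depths):
        p_cam = d * (K_inv @ np.array([u, v, 1.0]))   # pi^-1(p_v, dp_v)
        p_world = T_k[:3, :3] @ p_cam + T_k[:3, 3]    # T_k applied to the camera point
        points.append(p_world)
    return np.array(points)
```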
The redundancy of the other key frames and map points in the key frame common view is then checked in turn. A key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a = 90% of the elements in its map point set are contained in the map point sets of at least b = 3 other key frames, and the key frame is more than c = 10 key frames away from the current frame.
A map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two consecutive key frames that follow its first creation.
Step 5, outputting the result: the timestamp and pose information of the current frame image are written into a text file for storage, with the pose expressed in quaternion form.
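Writing the timestamp and quaternion pose could look like the sketch below; the TUM-trajectory-style field order "timestamp tx ty tz qx qy qz qw" is an assumption, since the patent only states that the pose is stored as a quaternion.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def write_pose(f, timestamp, T):
    """Append one line 'timestamp tx ty tz qx qy qz qw' for a 4x4 pose T."""
    q = Rotation.from_matrix(T[:3, :3]).as_quat()      # (qx, qy, qz, qw)
    t = T[:3, 3]
    f.write(f"{timestamp:.6f} {t[0]:.6f} {t[1]:.6f} {t[2]:.6f} "
            f"{q[0]:.6f} {q[1]:.6f} {q[2]:.6f} {q[3]:.6f}\n")

# with open("trajectory.txt", "w") as f:
#     write_pose(f, 1305031102.175304, np.eye(4))
```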
Experiment: a visual SLAM system (hereinafter referred to as the system) was built from the method on a Linux platform, and positioning experiments were carried out on the dynamic-scene and weak-texture-scene image sequences of the public RGB-D dataset TUM, in comparison with two other open-source RGB-D visual SLAM schemes. The experimental data are shown in Tables 1 and 2 (in the tables, "-" indicates that the corresponding system cannot complete the positioning task on that sequence):
Table 1. Comparison of system positioning accuracy (ATE) in dynamic scenes (unit: m)
(The table contents are provided as an image in the original publication and are not reproduced here.)
Table 2. Comparison of system positioning accuracy (ATE) in weak texture scenes (unit: m)
(The table contents are provided as an image in the original publication and are not reproduced here.)
In summary, aiming at the reduced positioning accuracy of existing SLAM technology in weak texture and dynamic scenes, the invention combines the image characteristics of weak texture and dynamic scenes and proposes a dynamic point elimination method based on a pixel gray error threshold, which overcomes mismatching in dynamic scenes. A back-end optimization framework based on graph optimization is constructed; a key frame common view is created and maintained, and key frames and map points are created and deleted with a simple and effective strategy, effectively correcting the drift produced during system operation. The experimental results show that the proposed method performs better than existing methods in weak texture and dynamic scenes, and can meet the requirements of positioning and mapping applications in challenging scenes.
The above-described examples are merely illustrative of embodiments of the present invention and do not limit the scope of the present invention. Various modifications and improvements of the technical solution of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, and the technical solution of the present invention is to be covered by the protection scope defined by the claims.

Claims (3)

1. A weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera is characterized by comprising the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
step 3, initializing a camera pose of a current frame, constructing a plurality of pixel blocks according to a feature point set of a previous frame image, constructing all the pixel blocks by combining the initial camera pose of the current frame image, projecting the pixel blocks to the current frame in sequence, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projected pixel block in the current frame image, and removing dynamic feature points in the feature point set of the previous frame image by combining the gray level difference to obtain a static feature point set of the previous frame image;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the initial camera pose of the current frame, performing first-stage optimization solving on the camera pose of the current frame, further performing further optimization solving on the camera pose of the current frame based on the key frame common view to obtain the camera pose of the current frame after optimization solving;
step 1, the scale factors are as follows: the ratio of the physical distance of a spatial point from the center of the camera to the depth value of the point in the depth image;
step 1, the number of the preset feature points is as follows: the number of the characteristic points which are preset by a user and expected to be extracted from each frame of image is defined as N;
step 1, the length of the sliding window is defined as: l;
the step 2 specifically comprises the following steps:
if the current frame number is k, and if k is 1, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, and w and h represent the length and width of the k-th frame image, respectively;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, the grid size of the current frame image (the k-th frame image) is Gk-1 + θ; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, L represents the length of the sliding window, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ represents the weighting scale factor, σ represents the grid size adjustment factor, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, wherein numk is the number of grids in the k-th frame and j is the grid number.
2. The method for positioning the weak texture and dynamic scene vision SLAM based on the RGBD camera as claimed in claim 1, wherein the step 3 constructs a plurality of pixel blocks according to the feature point set of the previous frame image, specifically:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw and width bh is constructed centered on the feature point's coordinates;
step 3, constructing all pixel blocks to project to the current frame in sequence by combining the initial camera pose of the current frame, specifically:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters, the camera pose of the previous frame and the initial camera pose of the current frame of the configuration file in the step 1;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p′n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn;
in step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th are taken as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points are defined as the static feature point set of the previous frame image in step 3, denoted Rk-1.
3. The method for visually positioning the SLAM based on the weak texture and the dynamic scene of the RGBD camera in the claim 1, wherein the step 4 of constructing the pose solving objective function based on the static feature point set of the previous frame image and the initial camera pose of the current frame image is as follows:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
T'k = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted T'k;
in step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is searched for in the key frame common view and used as the reference frame; a pixel block of length bw and width bh is constructed centered on each static feature point of the reference frame, and each pixel block on the reference frame is projected to the current frame to construct a pose optimization function; taking T'k as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm, the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is judged: if the current frame has at least Th_ft recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh, the current frame is inserted into the key frame common view as a key frame with camera pose Tk;
a map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
wherein Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose;
the redundancy of the other key frames and map points in the key frame common view is judged in turn; a key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a% of the elements in its map point set are contained in the map point sets of at least b other key frames, and the key frame is more than c key frames away from the current frame;
a map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two consecutive key frames that follow its first creation.
CN202110695592.3A 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method Active CN113379842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695592.3A CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110695592.3A CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Publications (2)

Publication Number Publication Date
CN113379842A CN113379842A (en) 2021-09-10
CN113379842B true CN113379842B (en) 2022-06-14

Family

ID=77578459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110695592.3A Active CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Country Status (1)

Country Link
CN (1) CN113379842B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114812540B (en) * 2022-06-23 2022-11-29 深圳市普渡科技有限公司 Picture construction method and device and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN111060115A (en) * 2019-11-29 2020-04-24 中国科学院计算技术研究所 Visual SLAM method and system based on image edge features
CN112200879A (en) * 2020-10-30 2021-01-08 浙江大学 Map lightweight compression transmission method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015194864A1 (en) * 2014-06-17 2015-12-23 (주)유진로봇 Device for updating map of mobile robot and method therefor
EP3474230B1 (en) * 2017-10-18 2020-07-22 Tata Consultancy Services Limited Systems and methods for edge points based monocular visual slam

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN111060115A (en) * 2019-11-29 2020-04-24 中国科学院计算技术研究所 Visual SLAM method and system based on image edge features
CN112200879A (en) * 2020-10-30 2021-01-08 浙江大学 Map lightweight compression transmission method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A New RGB-D SLAM Method with Moving Object Detection for Dynamic Indoor Scenes; Wang, Runzhi; Remote Sensing; 2019-05-14; Vol. 11, No. 10; pp. 1-19 *
A single-line laser odometer based on PL-ICP and NDT point cloud matching; 姜祚鹏, 梅天灿; Laser Journal (激光杂志); 2020-03-31; Vol. 41, No. 3; pp. 21-24 *

Also Published As

Publication number Publication date
CN113379842A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
KR100793838B1 (en) Appratus for findinng the motion of camera, system and method for supporting augmented reality in ocean scene using the appratus
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN106846467B (en) Entity scene modeling method and system based on optimization of position of each camera
CN109974743B (en) Visual odometer based on GMS feature matching and sliding window pose graph optimization
KR20160138062A (en) Eye gaze tracking based upon adaptive homography mapping
US9158963B2 (en) Fitting contours to features
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
JP2009093644A (en) Computer-implemented method for tacking 3d position of object moving in scene
TWI686771B (en) 3d model reconstruction method, electronic device, and non-transitory computer readable storage medium
US9202138B2 (en) Adjusting a contour by a shape model
US20170064279A1 (en) Multi-view 3d video method and system
WO2023060964A1 (en) Calibration method and related apparatus, device, storage medium and computer program product
WO2019157922A1 (en) Image processing method and device and ar apparatus
Chen et al. A particle filtering framework for joint video tracking and pose estimation
CN112652020B (en) Visual SLAM method based on AdaLAM algorithm
WO2021027543A1 (en) Monocular image-based model training method and apparatus, and data processing device
CN113379842B (en) RGBD camera-based weak texture and dynamic scene vision SLAM positioning method
CN108416800A (en) Method for tracking target and device, terminal, computer readable storage medium
CN114266823A (en) Monocular SLAM method combining SuperPoint network characteristic extraction
CN112131991B (en) Event camera-based data association method
CN110111341B (en) Image foreground obtaining method, device and equipment
Petersen A comparison of 2d-3d pose estimation methods
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
CN112508168B (en) Frame regression neural network construction method based on automatic correction of prediction frame
CN114882106A (en) Pose determination method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant