CN113379842B - RGBD camera-based weak texture and dynamic scene vision SLAM positioning method - Google Patents

RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Info

Publication number
CN113379842B
CN113379842B (application number CN202110695592.3A)
Authority
CN
China
Prior art keywords
frame
image
current frame
frame image
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110695592.3A
Other languages
Chinese (zh)
Other versions
CN113379842A (en)
Inventor
梅天灿
姜祚鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110695592.3A priority Critical patent/CN113379842B/en
Publication of CN113379842A publication Critical patent/CN113379842A/en
Application granted granted Critical
Publication of CN113379842B publication Critical patent/CN113379842B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/10024 - Color image

Abstract

The invention provides a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera. Images are collected in real time by the RGBD camera, the sequence number of the current frame image is compared with a preset sliding window length, and the grid size used for feature extraction in the current frame image is calculated. A set of pixel blocks is constructed from the feature point set of the previous frame image and projected into the current frame, and the dynamic feature points in the previous frame image are removed according to the gray level difference between each pixel block in the previous frame image and its corresponding projected pixel block in the current frame image, yielding the static feature point set of the previous frame image. A pose solving objective function is then constructed from the static feature point set of the previous frame image, a first-stage optimization of the current frame camera pose is performed, and the pose is further optimized based on the key frame common view to obtain the optimized current frame camera pose. The method works stably in weak texture scenes and in dynamic scenes with moving targets, and has strong environmental adaptability.

Description

RGBD camera-based weak texture and dynamic scene vision SLAM positioning method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera.
Background
With the continuous progress of computer science, electronic engineering, virtual reality, artificial intelligence and related technologies, robotics has gradually become a hot topic of scientific research. Service robots are appearing in medical care, military, agricultural, industrial and everyday settings. As production and daily life become more automated and intelligent, service robots can be expected to have very broad application prospects and to penetrate many fields of human activity. To move autonomously, a robot needs to acquire its own position and information about its surroundings in real time, and SLAM technology developed to meet this need. SLAM is short for Simultaneous Localization and Mapping, and refers to a robot or mobile platform continuously localizing itself while building a map of the surrounding environment.
The main sensor of traditional SLAM schemes is lidar, but lidar is expensive, its point clouds only reflect the distance and angle of surrounding object points while discarding rich information such as the texture and semantics of the environment, and it interacts poorly with people. Cameras have therefore become the mainstream sensor for SLAM in recent years, and a SLAM scheme whose main sensor is a camera is called visual SLAM.
In practical applications, many factors interfere with the accuracy and stability of a visual SLAM system, the most significant being the characteristics of the scene. Because most visual SLAM schemes rely strongly on feature extraction and matching, weak scene texture or moving objects in the scene greatly affect feature extraction and matching between images, and thus the stable operation of the visual SLAM system. The patent documents "An RGB-D SLAM method combined with monocular vision" (publication No. CN106127739B) and "A method and related device for realizing SLAM positioning based on monocular vision" (publication No. CN111928842A) both extract features from images and establish data association through feature matching between two adjacent frames in order to localize the camera, but neither addresses the operation of visual SLAM in scenes with weak texture and dynamic objects. The patent documents "A visual SLAM method and device based on dynamic target detection" (application publication No. CN112435278A) and "A method and related device for realizing SLAM positioning in a dynamic environment" (application publication No. CN111928857A) both use a deep learning network to detect potential moving targets in the image and eliminate potentially dynamic feature points, thereby improving the operation of a visual SLAM system in dynamic scenes. However, such methods require the network model to be trained in advance; when the environment changes, the types of dynamic objects covered by the model must be changed, and in practice the deep learning network occupies considerable computing resources. In addition, deep-learning-based methods remove all objects that might move even when they are not actually moving (such as a stationary vehicle or person), while they cannot remove objects outside the trained categories that have in fact been moved (such as a moved book or cabinet). Therefore, although such methods can solve the operation problem of visual SLAM systems in dynamic scenes to a certain extent, they still have shortcomings and limited applicability in real scenes.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera that does not depend on a deep learning network, has good real-time performance, and adapts well to different environments.
In order to achieve the above object, the technical solution of the present invention provides an RGBD camera-based weak texture and dynamic scene vision SLAM positioning method, which comprises the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
step 3, initializing a camera pose of the current frame, constructing a plurality of pixel blocks according to the feature point set of the previous frame image, projecting all the pixel blocks to the current frame in sequence according to the initial camera pose of the current frame, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projected pixel block in the current frame image, and removing dynamic feature points from the feature point set of the previous frame image according to the gray level difference to obtain the static feature point set of the previous frame image;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame image and the initial camera pose of the current frame, performing a first-stage optimization of the current frame camera pose, and then further optimizing the current frame camera pose based on the key frame common view to obtain the optimized current frame camera pose;
preferably, the scale factor in step 1 is the ratio of the physical distance of a spatial point from the camera center to the depth value of that point in the depth image.
In step 1, the number of preset feature points is the number of feature points that the user expects to extract from each frame image, denoted N;
in step 1, the length of the sliding window is denoted L;
preferably, the step 2 specifically comprises:
let the sequence number of the current frame be k; if k = 1, calculating the grid size of the current frame image in step 2 specifically comprises:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, and w and h represent the length and width of the k-th frame image, respectively;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, the grid size of the current frame image (the k-th frame image) is Gk-1 + θ; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, L represents the length of the sliding window, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ represents the weighting scale factor, σ represents the grid size adjustment factor, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, where numk is the number of grids in the k-th frame and j is the grid number.
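To make the per-grid selection above concrete, the following Python sketch (OpenCV and NumPy are used for illustration only; the patent does not name a library, and the FAST threshold, grid size and variable names are assumptions) detects FAST corners, scores them with the Shi-Tomasi minimum-eigenvalue response, and keeps the best and second-best corner of each grid cell as the main selected and candidate feature points.

```python
import cv2
import numpy as np

def select_grid_features(gray, grid_size):
    """Pick the best (main) and second-best (candidate) FAST corner per grid cell,
    scored with the Shi-Tomasi minimum-eigenvalue response."""
    fast = cv2.FastFeatureDetector_create(threshold=10)
    keypoints = fast.detect(gray, None)
    # Shi-Tomasi response map: minimum eigenvalue of the structure tensor per pixel.
    response = cv2.cornerMinEigenVal(gray, blockSize=3)

    best, second = {}, {}                       # grid cell -> (score, (x, y))
    for kp in keypoints:
        x, y = kp.pt
        cell = (int(y // grid_size), int(x // grid_size))
        score = float(response[int(y), int(x)])
        if cell not in best or score > best[cell][0]:
            second[cell] = best.get(cell)       # old best becomes the candidate
            best[cell] = (score, (x, y))
        elif second.get(cell) is None or score > second[cell][0]:
            second[cell] = (score, (x, y))

    main_pts = [pt for _, pt in best.values()]
    cand_pts = [v[1] for v in second.values() if v is not None]
    return main_pts, cand_pts

# Example usage (file name and grid size are assumptions):
# gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
# main_pts, cand_pts = select_grid_features(gray, grid_size=16)
```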
Preferably, the step 3 of constructing a plurality of pixel blocks according to the previous frame image feature point set specifically includes:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw and width bh is constructed centered on the feature point's coordinates;
in step 3, projecting all pixel blocks to the current frame in sequence according to the initial camera pose of the current frame specifically comprises:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters of the configuration file in step 1, the camera pose of the previous frame and the initial camera pose of the current frame;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p'n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn.
In step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th are marked as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points form the static feature point set of the previous frame image in step 3, denoted Rk-1.
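A minimal sketch of the dynamic-point rejection described above, assuming 4x4 camera-to-world pose matrices, a pinhole intrinsic matrix K and NumPy; the block-mean comparison and all function names are illustrative, not the patent's implementation.

```python
import numpy as np

def bilinear(img, x, y):
    """Gray value of img at sub-pixel location (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    patch = img[y0:y0 + 2, x0:x0 + 2].astype(np.float64)
    w = np.array([[(1 - dx) * (1 - dy), dx * (1 - dy)],
                  [(1 - dx) * dy,       dx * dy]])
    return float((patch * w).sum())

def block_mean(img, cx, cy, bw=4, bh=4):
    """Mean gray value of a bw x bh pixel block centred at (cx, cy)."""
    return np.mean([bilinear(img, cx + ox, cy + oy)
                    for oy in range(-(bh // 2), bh - bh // 2)
                    for ox in range(-(bw // 2), bw - bw // 2)])

def static_points(prev_gray, cur_gray, prev_pts, prev_depths, K, T_prev, T_ini_cur,
                  bw=4, bh=4, th=35.0):
    """Keep only the previous-frame feature points whose pixel-block gray difference
    against the projected block in the current frame is at most th."""
    K_inv = np.linalg.inv(K)
    T_rel = np.linalg.inv(T_ini_cur) @ T_prev      # previous camera -> current camera
    h, w = cur_gray.shape
    kept = []
    for (u, v), d in zip(prev_pts, prev_depths):
        if not (bw <= u < prev_gray.shape[1] - bw and bh <= v < prev_gray.shape[0] - bh):
            continue
        p_cam = d * (K_inv @ np.array([u, v, 1.0]))  # back-project with the depth value
        p_cur = T_rel[:3, :3] @ p_cam + T_rel[:3, 3]
        if p_cur[2] <= 0:
            continue
        u2, v2, _ = K @ (p_cur / p_cur[2])           # project into the current image
        if not (bw <= u2 < w - bw and bh <= v2 < h - bh):
            continue
        diff = abs(block_mean(cur_gray, u2, v2, bw, bh) -
                   block_mean(prev_gray, u, v, bw, bh))
        if diff <= th:
            kept.append((u, v))                      # static feature point
    return kept
```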
Preferably, the step 4 of constructing a pose solving objective function based on the previous frame image static feature point set and the current frame initial camera pose is as follows:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
Tk' = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted Tk';
In step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is found in the key frame common view and used as the reference frame; a pixel block of length bw and width bh is constructed centered on each static feature point of the reference frame, and each pixel block on the reference frame is projected to the current frame to construct a pose optimization function. Taking Tk' as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm; the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
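The two-stage photometric pose refinement can be sketched as follows. The sketch assumes 4x4 camera-to-world poses, a pinhole intrinsic matrix K, single-pixel sampling in place of the bw x bh block average, and SciPy's generic least-squares solver standing in for the Gauss-Newton iteration named in the text; all function names are assumptions.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def photometric_residuals(x, pts, depths, gray_ref, gray_cur, K, T_ref):
    """Gray differences between reference-frame points and their projections in the
    current frame for a current-camera pose x = (rvec, tvec), camera-to-world."""
    R, _ = cv2.Rodrigues(x[:3])
    T_cur = np.eye(4); T_cur[:3, :3] = R; T_cur[:3, 3] = x[3:]
    T_rel = np.linalg.inv(T_cur) @ T_ref          # reference camera -> current camera
    K_inv = np.linalg.inv(K)
    h, w = gray_cur.shape
    res = []
    for (u, v), d in zip(pts, depths):
        p = T_rel[:3, :3] @ (d * (K_inv @ np.array([u, v, 1.0]))) + T_rel[:3, 3]
        if p[2] <= 0:
            res.append(0.0); continue
        q = K @ (p / p[2])
        u2, v2 = int(round(q[0])), int(round(q[1]))
        if not (0 <= u2 < w and 0 <= v2 < h):
            res.append(0.0); continue
        res.append(float(gray_cur[v2, u2]) - float(gray_ref[int(v), int(u)]))
    return np.array(res)

def refine_pose(T_init, pts, depths, gray_ref, gray_cur, K, T_ref):
    """Minimise the photometric residuals starting from T_init."""
    rvec, _ = cv2.Rodrigues(T_init[:3, :3])
    x0 = np.hstack([rvec.ravel(), T_init[:3, 3]])
    sol = least_squares(photometric_residuals, x0,
                        args=(pts, depths, gray_ref, gray_cur, K, T_ref))
    R, _ = cv2.Rodrigues(sol.x[:3])
    T = np.eye(4); T[:3, :3] = R; T[:3, 3] = sol.x[3:]
    return T

# Stage 1: refine against frame k-1; Stage 2: refine against the reference key frame
# from the common view, starting from the stage-1 result:
# T_k1 = refine_pose(T_ini_k, pts_km1, depths_km1, gray_km1, gray_k, K, T_km1)
# T_k  = refine_pose(T_k1, pts_ref, depths_ref, gray_ref, gray_k, K, T_ref)
```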
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is then judged: if the current frame has at least Th_ft recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh, the current frame is inserted into the key frame common view as a key frame with camera pose Tk, where the pose difference is computed by the formula given as an image in the original publication.
a map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
where Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose.
The redundancy of the other key frames and map points in the key frame common view is then checked in turn. A key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a% of the elements in its map point set are contained in the map point sets of at least b other key frames, and the key frame is more than c key frames away from the current frame.
A map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two key frames that follow its first creation.
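A schematic of the key frame insertion and culling rules above, with the thresholds written as parameters; the data structures, the use of frame indices to measure key-frame separation, and the fraction a = 0.9 standing for a% are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class KeyFrame:
    frame_id: int
    pose: object                        # 4x4 camera-to-world pose (e.g. a NumPy array)
    map_point_ids: set = field(default_factory=set)

def should_insert_keyframe(frame_gap, n_coviewed, n_recoverable, pose_diff,
                           T=30, th_pt=30, th_ft=15, th_sh=0.1):
    """First check whether insertion should be considered, then whether the
    current frame is good enough, following the two checks in the text."""
    considered = frame_gap > T or n_coviewed < th_pt
    good = n_recoverable >= th_ft and pose_diff > th_sh
    return considered and good

def cull_keyframes(keyframes, current_id, a=0.9, b=3, c=10):
    """Drop key frames whose map points are mostly seen by >= b other key frames
    and that are more than c key frames away from the current frame."""
    kept = []
    for kf in keyframes:
        redundant = 0
        for mp in kf.map_point_ids:
            seen_by = sum(1 for other in keyframes
                          if other is not kf and mp in other.map_point_ids)
            if seen_by >= b:
                redundant += 1
        ratio = redundant / max(len(kf.map_point_ids), 1)
        far = (current_id - kf.frame_id) > c
        if ratio > a and far:
            continue                    # delete this key frame
        kept.append(kf)
    return kept
```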
The method greatly improves the positioning capability of a visual SLAM system in weak texture and dynamic scenes. Dynamically adjusting the grid size used for feature extraction in the current frame according to the number of feature points of already-processed frames greatly improves positioning accuracy and stability in weak texture scenes; constructing pixel blocks from the feature point set of the previous frame image and eliminating dynamic feature points according to the gray level difference between each pixel block in the previous frame image and its corresponding projected pixel block in the current frame image greatly improves positioning accuracy and stability in dynamic scenes.
Drawings
FIG. 1: is a flow chart of an embodiment of the method of the present invention.
Detailed Description
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings and embodiments, so that the objects, features and effects of the invention can be fully understood. These examples are for illustrative purposes only and do not limit the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof that may occur to those skilled in the art upon reading the present specification.
The technical scheme adopted by the invention is as follows. The flow of a specific embodiment of the RGBD camera-based weak texture and dynamic scene vision SLAM method is shown in FIG. 1, and mainly comprises the following steps:
a weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera comprises the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
in step 1, the scale factor is the ratio of the physical distance of a spatial point from the camera center to the depth value of that point in the depth image.
In step 1, the number of preset feature points is the number of feature points that the user expects to extract from each frame image, set here to N = 1200;
in step 1, the length of the sliding window is set to L = 5;
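For step 1, the principal point, focal length and distortion parameters could be obtained with OpenCV's implementation of Zhang Zhengyou's calibration, as sketched below; the checkerboard geometry, square size and image folder are assumptions, not values given in the patent.

```python
import glob
import cv2
import numpy as np

# Checkerboard with 9 x 6 inner corners and 25 mm squares (assumed values).
pattern, square = (9, 6), 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):                 # assumed image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]

# K holds the focal length and principal point; dist holds the distortion parameters.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print("reprojection RMS:", rms, "\nK =\n", K, "\ndist =", dist.ravel())
```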
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
the step 2 specifically comprises the following steps:
let the sequence number of the current frame be k; if k = 1, calculating the grid size of the current frame image in step 2 specifically comprises:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N = 1200, w = 640, and h = 480;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, it is Gk-1 + θ, with θ = 2; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N = 1200, L = 5, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ = 0.8, σ = 100, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, where numk is the number of grids in the k-th frame and j is the grid number.
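The grid-size logic of step 2 can be sketched as below. The square-cell form of the initial grid size is an assumed reconstruction (the original formula is only available as an image), and the weighted sliding-window update used when k > L is omitted because its exact form is not reproduced in the text.

```python
import math

def initial_grid_size(width, height, n_target):
    """Assumed reconstruction: square cells sized so that roughly n_target
    cells tile the image (640 x 480 with N = 1200 gives 16 pixels)."""
    return int(round(math.sqrt(width * height / n_target)))

def adjust_grid_size(prev_grid, n_main_prev, n_target, theta=2):
    """Rule for 1 < k <= L: shrink the grid when the previous frame produced too
    few main selected feature points, enlarge it when it produced enough."""
    return prev_grid - theta if n_main_prev < n_target else prev_grid + theta

# Example with the embodiment's values (N = 1200, 640 x 480 images, theta = 2):
g1 = initial_grid_size(640, 480, 1200)                        # 16
g2 = adjust_grid_size(g1, n_main_prev=1100, n_target=1200)    # 14
```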
Step 3, initializing a camera pose of a current frame, constructing a plurality of pixel blocks according to a feature point set of a previous frame of image, projecting all the pixel blocks to the current frame in sequence by combining with the initial camera pose of the current frame, calculating the gray difference between each pixel block in the previous frame of image and a corresponding projected pixel block in the current frame of image, and removing dynamic feature points in the feature point set of the previous frame of image by combining with the gray difference to obtain a static feature point set of the previous frame of image;
step 3, constructing a plurality of pixel blocks according to the previous frame image feature point set, specifically:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw = 4 and width bh = 4 is constructed centered on the feature point's coordinates;
in step 3, projecting all pixel blocks to the current frame in sequence according to the initial camera pose of the current frame specifically comprises:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters of the configuration file in step 1, the camera pose of the previous frame and the initial camera pose of the current frame;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
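The pose prediction T(ini,k) = Tk-1·ΔT(k-2,k-1) amounts to a constant-velocity model; the sketch below assumes camera-to-world poses and takes the increment as ΔT(k-2,k-1) = Tk-2^-1·Tk-1, which is an interpretation, since the patent does not spell out how the increment is formed.

```python
import numpy as np

def predict_pose(T_km2, T_km1):
    """Constant-velocity prediction: reapply the last inter-frame motion."""
    delta = np.linalg.inv(T_km2) @ T_km1    # assumed increment from frame k-2 to k-1
    return T_km1 @ delta                    # T_(ini,k) = T_(k-1) * deltaT_(k-2,k-1)
```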
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p'n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn.
In step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th = 35 are taken as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points form the static feature point set of the previous frame image in step 3, denoted Rk-1.
Step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the initial camera pose of the current frame, performing first-stage optimization solving on the camera pose of the current frame, further performing further optimization solving on the camera pose of the current frame based on the key frame common view to obtain the camera pose of the current frame after optimization solving;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the pose of the initial camera of the current frame, wherein the pose solving objective function comprises the following steps:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
Tk' = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted Tk';
In step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is searched for in the key frame common view and used as the reference frame; a pixel block of length bw = 4 and width bh = 4 is constructed centered on each static feature point of the reference frame, and, similarly to the previous-stage pose solving, each pixel block on the reference frame is projected to the current frame to construct a pose optimization function. Taking Tk' as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm; the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T = 30 frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt = 30, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is then judged: if the current frame has at least Th_ft = 15 recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh = 0.1, the current frame is inserted into the key frame common view as a key frame with camera pose Tk.
A map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
wherein Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose.
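The map point creation formula MPk = {Tk·π^-1(pv, dpv) | pv ∈ Rk} back-projects each static feature point with its depth and moves it into the world frame; a small sketch, assuming Tk is a 4x4 camera-to-world matrix and K the pinhole intrinsic matrix:

```python
import numpy as np

def create_map_points(static_pts, depths, K, T_k):
    """MPk: back-project each static feature point (u, v) with depth d through the
    intrinsics, then move it into the world frame with the key-frame pose T_k."""
    K_inv = np.linalg.inv(K)
    points = []
    for (u, v), d in zip(static_pts, depths):
        p_cam = d * (K_inv @ np.array([u, v, 1.0]))   # pi^-1(p_v, dp_v)
        p_world = T_k[:3, :3] @ p_cam + T_k[:3, 3]    # T_k applied to the camera point
        points.append(p_world)
    return np.array(points)
```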
The redundancy of the other key frames and map points in the key frame common view is then checked in turn. A key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a = 90% of the elements in its map point set are contained in the map point sets of at least b = 3 other key frames, and the key frame is more than c = 10 key frames away from the current frame.
A map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two consecutive key frames that follow its first creation.
Step 5, outputting the result: the timestamp and pose information of the current frame image are written into a text file for storage, with the pose expressed in quaternion form.
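Writing the timestamp and quaternion pose could look like the sketch below; the TUM-trajectory-style field order "timestamp tx ty tz qx qy qz qw" is an assumption, since the patent only states that the pose is stored as a quaternion.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def write_pose(f, timestamp, T):
    """Append one line 'timestamp tx ty tz qx qy qz qw' for a 4x4 pose T."""
    q = Rotation.from_matrix(T[:3, :3]).as_quat()      # (qx, qy, qz, qw)
    t = T[:3, 3]
    f.write(f"{timestamp:.6f} {t[0]:.6f} {t[1]:.6f} {t[2]:.6f} "
            f"{q[0]:.6f} {q[1]:.6f} {q[2]:.6f} {q[3]:.6f}\n")

# with open("trajectory.txt", "w") as f:
#     write_pose(f, 1305031102.175304, np.eye(4))
```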
Experiment: a visual SLAM system (hereinafter referred to as the system) was built from the method on a Linux platform, and positioning experiments were carried out on the dynamic-scene and weak-texture-scene image sequences of the public RGB-D dataset TUM, in comparison with two other open-source RGB-D visual SLAM schemes. The experimental data are shown in Tables 1 and 2 (in the tables, "-" indicates that the corresponding system cannot complete the positioning task on that sequence):
Table 1. Comparison of system positioning accuracy (ATE) in dynamic scenes (unit: m)
(The table contents are provided as an image in the original publication and are not reproduced here.)
Table 2. Comparison of system positioning accuracy (ATE) in weak texture scenes (unit: m)
(The table contents are provided as an image in the original publication and are not reproduced here.)
In summary, aiming at the reduced positioning accuracy of existing SLAM technology in weak texture and dynamic scenes, the invention combines the image characteristics of weak texture and dynamic scenes and proposes a dynamic point elimination method based on a pixel gray error threshold, which overcomes mismatching in dynamic scenes. A back-end optimization framework based on graph optimization is constructed; a key frame common view is created and maintained, and key frames and map points are created and deleted with a simple and effective strategy, effectively correcting the drift produced during system operation. The experimental results show that the proposed method performs better than existing methods in weak texture and dynamic scenes, and can meet the requirements of positioning and mapping applications in challenging scenes.
The above-described examples are merely illustrative of embodiments of the present invention and do not limit the scope of the present invention. Various modifications and improvements of the technical solution of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, and the technical solution of the present invention is to be covered by the protection scope defined by the claims.

Claims (3)

1. A weak texture and dynamic scene vision SLAM positioning method based on an RGBD camera is characterized by comprising the following steps:
step 1, calibrating a camera principal point, a focal length and distortion parameters by a Zhang Zhengyou camera calibration method to obtain a calibrated camera principal point, a calibrated focal length and calibrated distortion parameters; further setting a scale factor, the number of preset feature points and the length of a sliding window;
step 2, collecting a plurality of frames of images in real time through a camera, comparing the sequence number of the current frame of image with the length of a sliding window, and further calculating the grid size of the current frame of image;
step 3, initializing a camera pose of a current frame, constructing a plurality of pixel blocks according to a feature point set of a previous frame image, constructing all the pixel blocks by combining the initial camera pose of the current frame image, projecting the pixel blocks to the current frame in sequence, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projected pixel block in the current frame image, and removing dynamic feature points in the feature point set of the previous frame image by combining the gray level difference to obtain a static feature point set of the previous frame image;
step 4, constructing a pose solving objective function based on the static feature point set of the previous frame of image and the initial camera pose of the current frame, performing first-stage optimization solving on the camera pose of the current frame, further performing further optimization solving on the camera pose of the current frame based on the key frame common view to obtain the camera pose of the current frame after optimization solving;
step 1, the scale factors are as follows: the ratio of the physical distance of a spatial point from the center of the camera to the depth value of the point in the depth image;
step 1, the number of the preset feature points is as follows: the number of the characteristic points which are preset by a user and expected to be extracted from each frame of image is defined as N;
step 1, the length of the sliding window is defined as: l;
the step 2 specifically comprises the following steps:
if the current frame number is k, and if k is 1, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image according to the size of the current frame image and the number of the preset feature points in the step 1, wherein the specific calculation formula is as follows:
Gk = √(w·h/N)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, and w and h represent the length and width of the k-th frame image, respectively;
if k is more than 1 and less than or equal to L, the step 2 of calculating the grid size of the current frame image specifically comprises the following steps:
uniformly dividing the (k-1)-th frame image according to the grid size of the (k-1)-th frame image to obtain a plurality of grids in the (k-1)-th frame image, the set of all grids of the (k-1)-th frame being denoted Grk-1;
Performing FAST feature extraction on the (k-1) frame image to obtain a plurality of feature points of the (k-1) frame image;
calculating the score of each feature point of the image of the (k-1) th frame based on a Shi-Tomasi algorithm, selecting the feature point with the highest score of each grid in the image of the (k-1) th frame as a main selection feature point of the grid in the image of the (k-1) th frame, and selecting the feature point with the next highest score of each grid in the image of the (k-1) th frame as a candidate feature point of the grid in the image of the (k-1) th frame;
the highest-scoring feature point of grid j in the (k-1)-th frame image is recorded as the main selected feature point of grid j, and the second-highest-scoring feature point of grid j as the candidate feature point of grid j;
counting the number of main selected feature points of all grids in the (k-1)-th frame image, denoted Nk-1;
Comparing the number of the main feature points of all grids in the k-1 frame image with the number N of the preset feature points:
when Nk-1 ≥ N, the feature point set of the (k-1)-th frame image consists of the main selected feature points of all grids; otherwise it consists of the main selected feature points and the candidate feature points of all grids;
if Nk-1 < N, the grid size of the current frame image (the k-th frame image) is Gk-1 - θ; if Nk-1 ≥ N, the grid size of the current frame image (the k-th frame image) is Gk-1 + θ; the grid size of the current frame image (the k-th frame image) is denoted Gk;
When k is greater than L, the step 2 of calculating the grid size of the current frame image specifically includes:
calculating the grid size of the current frame image, wherein the specific calculation formula is as follows:
(The four formulas are given as images in the original publication; they define the weight sum m, the weighted prediction Nk,pre of the main feature point count over the sliding window, the weighted prediction Gk,pre of the grid size over the sliding window, and the resulting grid size Gk.)
wherein Gk represents the grid size of the k-th frame image, N represents the number of preset feature points set in step 1, L represents the length of the sliding window, i ∈ [1, L], Nk,pre represents the weighted prediction of the number of main selected feature points for the k-th frame image, Gk,pre represents the weighted prediction of the grid size for the k-th frame image, ρ represents the weighting scale factor, σ represents the grid size adjustment factor, Gk-i represents the grid size of the (k-i)-th frame image, Nk-i represents the number of main selected feature points of all grids in the (k-i)-th frame image, and m represents the sum of the weights;
the main selected feature points of all numk grids form the main feature point set of the k-th frame image, the candidate feature points of all grids form its candidate feature point set, and the feature point set of the k-th frame image, assembled from these sets as described above, is denoted FPk, wherein numk is the number of grids in the k-th frame and j is the grid number.
2. The method for positioning the weak texture and dynamic scene vision SLAM based on the RGBD camera as claimed in claim 1, wherein the step 3 constructs a plurality of pixel blocks according to the feature point set of the previous frame image, specifically:
for each feature point in the feature point set FPk-1 of the previous frame image in turn, a pixel block of length bw and width bh is constructed centered on the feature point's coordinates;
step 3, constructing all pixel blocks to project to the current frame in sequence by combining the initial camera pose of the current frame, specifically:
sequentially projecting each pixel block of the previous frame to the current frame k according to the camera parameters, the camera pose of the previous frame and the initial camera pose of the current frame of the configuration file in the step 1;
T(ini,k)=Tk-1·ΔT(k-2,k-1)
wherein T(ini,k) is the initial camera pose of the current frame (i.e., the k-th frame), and ΔT(k-2,k-1) is the camera pose increment from the (k-2)-th frame to the (k-1)-th frame;
step 3, calculating the gray level difference between each pixel block in the previous frame image and the corresponding projection pixel block in the current frame image as follows:
ΔL(pn)=Ik(p′n)-Ik-1(pn)
wherein pn is the feature point with sequence number n on the (k-1)-th frame, n ∈ [1, o(FPk-1)], o(FPk-1) is the number of feature points of the (k-1)-th frame, Ik-1(pn) is the gray value of the pixel block centered at pn, obtained by bilinear interpolation, and Ik(p'n) is the gray value of the projected pixel block on the k-th frame corresponding to the pixel block centered at pn;
in step 3, removing the dynamic feature points from the feature point set of the previous frame image according to the gray level difference specifically comprises:
the feature points in the previous frame image whose pixel blocks have a gray level difference larger than the threshold Th are taken as dynamic feature points and deleted from the feature point set of the previous frame image; the remaining feature points are defined as the static feature point set of the previous frame image in step 3, denoted Rk-1.
3. The method for visually positioning the SLAM based on the weak texture and the dynamic scene of the RGBD camera in the claim 1, wherein the step 4 of constructing the pose solving objective function based on the static feature point set of the previous frame image and the initial camera pose of the current frame image is as follows:
projecting each static point in the static feature point set of the previous frame image to the current frame image according to the step 3 by combining the initial camera pose of the current frame;
in step 4, the optimization target of the pose solving function is to find a current frame camera pose Tk such that, after the pixel blocks in the (k-1)-th frame image are projected according to this pose, the sum of the gray level differences between the pixel blocks in the previous frame image and their corresponding projected pixel blocks in the current frame image is minimized;
step 4, the pose solution objective function is as follows:
ΔL(Tk, ps) = Ik(π(Tk^-1·Tk-1·π^-1(ps, dps))) - Ik-1(ps)
T'k = argmin_{Tk} Σ_{ps∈Rk-1} ‖ΔL(Tk, ps)‖²
wherein Ik-1(ps) denotes the gray value of the pixel block centered at ps on the (k-1)-th frame, π denotes the conversion of a point from the camera coordinate system to the image coordinate system according to the camera intrinsic parameters, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system, ps denotes the feature point with sequence number s in the static feature point set Rk-1 of the (k-1)-th frame, and dps denotes the depth value of ps;
in step 4, the first-stage optimization of the current frame camera pose is performed as follows:
taking T(ini,k) as the iteration initial value of the current frame camera pose, the pose solving objective function in step 4 is solved iteratively with the Gauss-Newton algorithm; the resulting current frame camera pose is denoted T'k;
in step 4, the current frame camera pose is further optimized based on the key frame common view as follows:
a key frame having a common-view relation with the current frame image is searched for in the key frame common view and used as the reference frame; a pixel block of length bw and width bh is constructed centered on each static feature point of the reference frame, and each pixel block on the reference frame is projected to the current frame to construct a pose optimization function; taking T'k as the iteration initial value of the camera pose of the current frame image, the camera pose of the current frame image is optimized with the Gauss-Newton algorithm, the optimized camera pose of the current frame image is denoted Tk, and the optimization function is defined as follows:
Tk = argmin_{Tk} Σ_{pu∈Rre} ‖ΔL(Tk, pu)‖²
wherein ΔL(Tk, pu) denotes the gray level difference between the pixel block centered on the static feature point pu of the reference frame and its corresponding projected pixel block on the current frame (the k-th frame), pu denotes the feature point with sequence number u in the static feature point set of the reference frame, and Rre denotes the static feature point set of the reference frame;
step 4, the key frame common view is obtained by the following specific steps:
when the current frame is more than T frames away from the last key frame in the key frame common view, or the number of common viewpoints between the current frame and the reference frame is less than the threshold Th_pt, inserting a new key frame into the key frame common view is considered;
the quality of the current frame is judged: if the current frame has at least Th_ft recoverable map points and the pose difference between the current frame and the previous key frame is greater than Th_sh, the current frame is inserted into the key frame common view as a key frame with camera pose Tk;
a map point set MPk is created for the current key frame according to the current frame camera pose and the camera intrinsic parameters of step 1, with the creation formula as follows:
MPk={Tk·π-1(pv,dpv)|pv∈Rk}
wherein Rk represents the static feature point set of the k-th frame image, pv denotes the feature point with sequence number v in Rk, dpv denotes the depth value of point pv, π^-1 denotes the conversion of a point from the image coordinate system to the camera coordinate system using the camera intrinsics, and Tk denotes the current frame camera pose;
the redundancy of the other key frames and map points in the key frame common view is judged in turn; a key frame in the map is deleted when it simultaneously satisfies the following conditions: more than a% of the elements in its map point set are contained in the map point sets of at least b other key frames, and the key frame is more than c key frames away from the current frame;
a map point in the map is deleted when it satisfies either of the following: it is not contained in the map point set of any key frame in the map, or it cannot be observed by the two consecutive key frames that follow its first creation.
CN202110695592.3A 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method Active CN113379842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695592.3A CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110695592.3A CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Publications (2)

Publication Number Publication Date
CN113379842A CN113379842A (en) 2021-09-10
CN113379842B true CN113379842B (en) 2022-06-14

Family

ID=77578459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110695592.3A Active CN113379842B (en) 2021-06-23 2021-06-23 RGBD camera-based weak texture and dynamic scene vision SLAM positioning method

Country Status (1)

Country Link
CN (1) CN113379842B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114812540B (en) * 2022-06-23 2022-11-29 深圳市普渡科技有限公司 Picture construction method and device and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN111060115A (en) * 2019-11-29 2020-04-24 中国科学院计算技术研究所 Visual SLAM method and system based on image edge features
CN112200879A (en) * 2020-10-30 2021-01-08 浙江大学 Map lightweight compression transmission method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015194864A1 (en) * 2014-06-17 2015-12-23 (주)유진로봇 Device for updating map of mobile robot and method therefor
EP3474230B1 (en) * 2017-10-18 2020-07-22 Tata Consultancy Services Limited Systems and methods for edge points based monocular visual slam

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583457A (en) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 A kind of method and robot of robot localization and map structuring
CN111060115A (en) * 2019-11-29 2020-04-24 中国科学院计算技术研究所 Visual SLAM method and system based on image edge features
CN112200879A (en) * 2020-10-30 2021-01-08 浙江大学 Map lightweight compression transmission method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A New RGB-D SLAM Method with Moving Object Detection for Dynamic Indoor Scenes; Wang, Runzhi; Remote Sensing; 2019-05-14; Vol. 11, No. 10; pp. 1-19 *
A single-line laser odometer based on PL-ICP and NDT point cloud matching; 姜祚鹏, 梅天灿; Laser Journal (激光杂志); 2020-03-31; Vol. 41, No. 3; pp. 21-24 *

Also Published As

Publication number Publication date
CN113379842A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
KR100793838B1 (en) Appratus for findinng the motion of camera, system and method for supporting augmented reality in ocean scene using the appratus
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN106846467B (en) Entity scene modeling method and system based on optimization of position of each camera
CN109974743B (en) Visual odometer based on GMS feature matching and sliding window pose graph optimization
KR20160138062A (en) Eye gaze tracking based upon adaptive homography mapping
US9158963B2 (en) Fitting contours to features
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
JP2009093644A (en) Computer-implemented method for tacking 3d position of object moving in scene
TWI686771B (en) 3d model reconstruction method, electronic device, and non-transitory computer readable storage medium
US9202138B2 (en) Adjusting a contour by a shape model
US20170064279A1 (en) Multi-view 3d video method and system
WO2023060964A1 (en) Calibration method and related apparatus, device, storage medium and computer program product
WO2019157922A1 (en) Image processing method and device and ar apparatus
Chen et al. A particle filtering framework for joint video tracking and pose estimation
CN112652020B (en) Visual SLAM method based on AdaLAM algorithm
WO2021027543A1 (en) Monocular image-based model training method and apparatus, and data processing device
CN113379842B (en) RGBD camera-based weak texture and dynamic scene vision SLAM positioning method
CN108416800A (en) Method for tracking target and device, terminal, computer readable storage medium
CN114266823A (en) Monocular SLAM method combining SuperPoint network characteristic extraction
CN112131991B (en) Event camera-based data association method
CN110111341B (en) Image foreground obtaining method, device and equipment
Petersen A comparison of 2d-3d pose estimation methods
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
CN112508168B (en) Frame regression neural network construction method based on automatic correction of prediction frame
CN114882106A (en) Pose determination method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant