WO2022002039A1 - Visual positioning method and device based on visual map - Google Patents

Visual positioning method and device based on visual map

Info

Publication number
WO2022002039A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
current frame
frame
matching
current
Prior art date
Application number
PCT/CN2021/103073
Other languages
French (fr)
Chinese (zh)
Inventor
李建禹
易雨亭
龙学雄
Original Assignee
杭州海康机器人技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康机器人技术有限公司
Publication of WO2022002039A1 publication Critical patent/WO2022002039A1/en

Classifications

    • G — PHYSICS
    • G01 — MEASURING; TESTING
    • G01C — MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 — Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 — Instruments for performing navigational calculations

Definitions

  • the present application relates to the field of visual navigation, and in particular, to a visual positioning method and device based on a visual map.
  • Visual navigation collects images of the surrounding environment through a camera and performs calculations on those images to determine a mobile robot's position and recognize its path.
  • Taking a map based on ground texture as an example of a constructed environment map: when the mobile robot moves over a texture point, the current pose of the robot can be calculated by registering the features of the current image against the map data collected at that texture point, enabling ground-texture-based positioning and navigation.
  • the present application provides a visual positioning method based on a visual map to improve the robustness of visual positioning.
  • a visual positioning method based on a visual map provided by this application is implemented as follows:
  • feature point extraction is performed to obtain the feature points of the current frame
  • the feature points in the current frame are matched with the map points in the map to obtain the matching feature points
  • the pose of the current frame is calculated according to the matching feature points, and the positioning result is obtained.
  • a random sampling consensus algorithm is used to determine the best matching feature point set
  • the positioning state is determined according to the positioning result of the previous frame and the current number of consecutive frames for which positioning has failed; the positioning states include an uninitialized state, a positioning-successful state, and a relocation state;
  • the positioning strategy includes initialization positioning in an uninitialized state, normal positioning in a successful positioning state, relocation in a relocation state, and conversion relationships between the positioning states;
  • the conversion relationship between the various positioning states includes:
  • when the current positioning state is the uninitialized state, initialization positioning is performed;
  • if initialization positioning succeeds, the state switches to the positioning-successful state; if it fails, the current uninitialized state is maintained;
  • when the current positioning state is the positioning-successful state, normal positioning is performed;
  • if normal positioning succeeds, the current positioning-successful state is maintained; if it fails, the state switches to the positioning-lost state;
  • when the current positioning state is the positioning-lost state, relocation is performed; if relocation succeeds, the state switches to the positioning-successful state. If relocation fails, it is judged whether the number of consecutive failed-positioning frames exceeds a set frame-number threshold, or whether the distance between the current frame and the pose of the last successful positioning exceeds a set first distance threshold; if either threshold is exceeded, the state switches to the uninitialized state, and otherwise the current positioning-lost state is maintained;
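The transition rules above can be sketched as a small state machine. This is an illustrative sketch only: the state names, the `next_state` helper, and the threshold values are assumptions, not part of the disclosure.

```python
from enum import Enum

class State(Enum):
    UNINITIALIZED = 0   # no prior positioning
    LOCALIZED = 1       # "positioning-successful" state
    LOST = 2            # "positioning-lost" / relocation state

def next_state(state, success, lost_frames=0, drift=0.0,
               max_lost_frames=30, max_drift=2.0):
    """Return the next positioning state given whether the current frame's
    positioning (initialization, normal, or relocation) succeeded."""
    if state is State.UNINITIALIZED:
        return State.LOCALIZED if success else State.UNINITIALIZED
    if state is State.LOCALIZED:
        return State.LOCALIZED if success else State.LOST
    # LOST: successful relocation recovers; otherwise fall back to
    # UNINITIALIZED once too many frames failed or the robot drifted too far
    # from the last successfully positioned pose.
    if success:
        return State.LOCALIZED
    if lost_frames > max_lost_frames or drift > max_drift:
        return State.UNINITIALIZED
    return State.LOST
```

The frame-number threshold (`max_lost_frames`) and first distance threshold (`max_drift`) here are placeholder values.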
  • determining the positioning strategy according to the current positioning state includes generating positioning logic from each positioning state and the transitions between the positioning states; the logical content of the positioning logic includes executing, for the current frame, the positioning process indicated by the corresponding positioning strategy, i.e. the strategy of the current positioning state;
  • matching the feature points in the current frame with the map points in the map based on the positioning strategy includes: based on the positioning logic, and according to the positioning process indicated by the strategy in each state, determining a candidate map range in the map for the current frame, and matching the feature points in the current frame against the map points within that candidate map range.
  • generating the positioning logic according to each positioning state and the conversion relationship between the respective positioning states including,
  • the current frame will be positioned normally
  • the current frame will be initialized and positioned.
  • the current frame is relocated.
  • initializing and positioning the current frame further comprising,
  • if the current frame is the first frame, the current positioning state is determined to be the uninitialized state,
  • if initialization positioning of the current frame succeeds, the success is recorded and the state switches to the positioning-successful state; if it fails, the failure is recorded and the current uninitialized state is maintained;
  • when the current positioning state is the positioning-successful state,
  • if normal positioning of the current frame succeeds, the success is recorded and the current positioning-successful state is maintained; if it fails, the failure is recorded and the state switches to the positioning-lost state;
  • the current frame is initialized and positioned, further comprising,
  • if the current number of consecutive failed-positioning frames exceeds the frame-number threshold, or the distance between the current frame and the most recent successfully positioned pose exceeds the set first distance threshold, the current positioning state is determined to be the uninitialized state,
  • if initialization positioning of the current frame succeeds, the success is recorded and the state switches to the positioning-successful state; if initialization positioning fails, the failure is recorded and the current uninitialized state is maintained;
  • if the current number of consecutive failed-positioning frames does not exceed the frame-number threshold and the distance between the current frame and the most recent successful positioning does not exceed the set first distance threshold, the current frame is relocated, further comprising,
  • the current positioning state is a positioning loss state.
  • if relocation of the current frame succeeds, the success is recorded and the state switches to the positioning-successful state; if relocation fails, the failure is recorded, the current positioning-lost state is maintained, and the process returns to judging whether the number of consecutive failed-positioning frames exceeds the set frame-number threshold, or whether the distance between the current frame and the last successfully positioned pose exceeds the set first distance threshold.
  • the candidate map range in the map for the current frame including,
  • all map points in the map are used as the first candidate map range, or auxiliary information is used to obtain the first candidate map range; the map points within the first candidate map range are then screened by brute-force matching: if the feature matching degree between any two map points exceeds a set second matching threshold, one of the two is deleted at random, yielding the revised first candidate map range;
  • pose prediction is performed on the current frame according to the inter-frame motion information from the previous frame to the current frame, obtaining the predicted pose of the current frame; the second candidate map range in the map is then determined according to that predicted pose;
  • the pose prediction is performed on the current frame to obtain the predicted pose of the current frame, including,
  • the first method: obtain the inter-frame pose transformation from the previous frame to the current frame through a wheel odometer or an inertial measurement unit, and obtain the predicted pose of the current frame from that inter-frame pose transformation and the positioning result of the previous frame;
  • the second method: obtain the inter-frame pose transformation from the previous frame to the current frame through visual odometry, and obtain the predicted pose of the current frame from that inter-frame pose transformation and the positioning result of the previous frame;
  • the third method: predict the inter-frame pose transformation from the previous frame to the current frame from the historical frames for which positioning results have been obtained, and obtain the predicted pose of the current frame from that inter-frame pose transformation and the positioning result of the previous frame;
  • the fourth method: adopt at least two of the first, second, and third methods to obtain first predicted poses of the current frame, yielding at least two first predicted poses;
  • a Kalman filter is used to filter the at least two first predicted poses, and the filtered second predicted pose is taken as the final predicted pose of the current frame; alternatively, a nonlinear optimization method is used to optimize over the at least two first predicted poses, and the optimized second predicted pose is taken as the final predicted pose of the current frame;
  • the objective function of the nonlinear optimization is the sum of the error terms obtained from the various methods; optimizing over the at least two first predicted poses includes substituting, as initial values, the pose of the previous frame, the first predicted poses of the current frame obtained in the different ways, and the inter-frame pose transformations into the objective function, and taking the pose that minimizes the objective function as the second predicted pose.
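As a minimal sketch of fusing the first predicted poses: if one assumes, purely for illustration, that each method's error term is the squared difference between the optimized pose and that method's prediction, the minimizer of the objective reduces to a weighted mean, with the heading averaged on the circle to avoid wrap-around. The function name and (x, y, theta) pose representation are assumptions.

```python
import math

def fuse_predictions(poses, weights=None):
    """Fuse several first predicted poses (x, y, theta) into one second
    predicted pose by minimizing a sum of squared pose differences, whose
    closed-form solution is a weighted mean."""
    if weights is None:
        weights = [1.0] * len(poses)
    w = sum(weights)
    x = sum(wi * p[0] for wi, p in zip(weights, poses)) / w
    y = sum(wi * p[1] for wi, p in zip(weights, poses)) / w
    # Average the heading via its sine/cosine so that angles near +/-pi fuse correctly.
    s = sum(wi * math.sin(p[2]) for wi, p in zip(weights, poses))
    c = sum(wi * math.cos(p[2]) for wi, p in zip(weights, poses))
    return (x, y, math.atan2(s, c))
```

A Kalman filter or a full nonlinear solver would replace this closed form in practice; the weighted mean only illustrates the structure of the fusion step.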
  • determining the second candidate map range in the map according to the predicted pose of the current frame including,
  • the first neighborhood of the center is determined as the second candidate map range
  • the second neighborhood of the center is determined as the third candidate map range
  • the range of the second neighborhood is greater than the range of the first neighborhood.
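Selecting a candidate map range as a neighborhood of the predicted pose can be sketched as below. The 2-D point representation and the circular neighborhood are assumptions; the disclosure does not fix the neighborhood shape, only that the second neighborhood is larger than the first.

```python
def candidate_map_range(map_points, center, radius):
    """Return the map points whose (x, y) position lies within `radius`
    of the predicted pose `center`; calling this with a larger radius
    yields the wider neighborhood (the third candidate map range)."""
    cx, cy = center
    return [p for p in map_points
            if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2]
```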
  • a random sampling consistency algorithm is used to determine the best matching feature point set, including,
  • the current pose is calculated, and the fitted pose estimation of the current matching feature point subset is obtained,
  • for each matching feature point in the matching feature point set, according to the spatial position of its projected point, it is judged whether the distance between the projected point of the matching feature point in the current frame and the map point it matches in the map is less than a set second distance threshold; if so, the matching feature point is determined to be an interior point. This judgment is repeated until every matching feature point in the current matching feature point set has been checked as an interior-point candidate;
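The random-sampling-consensus selection of interior points can be sketched as follows. This is a simplified stand-in: a pure 2-D translation replaces the full pose fit of the method, and the iteration count, seed, and second distance threshold are illustrative values.

```python
import random

def ransac_best_matches(matches, iterations=100, inlier_thresh=0.5, seed=0):
    """RANSAC over matches [((u, v) frame point, (x, y) map point), ...].
    Each iteration fits a candidate translation from one sampled match and
    keeps the matches whose residual is below `inlier_thresh` as interior
    points; the largest interior-point set found is returned."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (u, v), (x, y) = rng.choice(matches)
        dx, dy = x - u, y - v            # candidate pose (translation only)
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) < inlier_thresh
                   and abs(m[1][1] - m[0][1] - dy) < inlier_thresh]
        if len(inliers) > len(best):
            best = inliers
    return best
```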
  • judging whether the best matching feature point set satisfies the conditions for the current-frame pose calculation includes,
  • at least two best matching feature point sets are each given a weight measuring the matching degree between the current frame and the candidate map range; the weight is determined by one, or any combination, of: the number of matching feature points in the best matching feature point set, the number of feature points extracted from the current frame, the distribution of the feature points, and the initial number of matching feature points,
  • using the set weight threshold and the maximum weight, it is judged whether the best matching feature point set satisfies the conditions for the current-frame pose calculation.
  • the method further includes:
  • sliding-window-based nonlinear optimization is used to calculate the pose of the current frame;
  • the optimized variables are the poses of the image frames in the sliding window, which includes the current frame, and
  • the optimization constraints are the inter-frame matching constraints between the feature points of the current frame and those of the previous key frame, and/or the map matching constraints between the current-frame feature points and the map points in the map,
  • the inter-frame matching error and/or the map matching error is minimized, and the optimized pose of the current frame is obtained as the positioning result;
  • the map matching constraint is: the error between the pixel position of the first matching map point back-projected onto the current frame and the pixel position of the first matching feature point matching that map point in the current frame, or, in the current frame
  • the first matching feature point is: a feature point in the current frame that has been successfully matched with a map point in the map;
  • the first matching map point is: the map point successfully matched by the first matching feature point;
  • the inter-frame matching constraints are: the error between the spatial position in the world coordinate system of the first matching feature point of the current frame and the spatial position in the world coordinate system of the second matching feature point, in the previous key frame of the current frame, that matches the first matching feature point; or the error between the pixel position of the second matching map point (the map point matching the second matching feature point) back-projected onto the current frame and its pixel position back-projected onto the previous key frame;
  • after the current image is collected and the current frame is obtained, and before feature point extraction is performed on the current frame to obtain its feature points, the method includes performing image preprocessing on the current frame.
  • the least squares method is used to minimize the inter-frame matching error and/or the map matching error to obtain the optimized current frame pose, including,
  • the objective function for optimization is the sum of: a first result, obtained by weighting with the first weight the sum of the map matching errors of all first matching feature points of all frames in the current sliding window; and a second result, obtained by weighting with the second weight the sum of the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame;
  • the initial value of the map matching error is the map matching error computed from the pose of the current frame, the spatial position information of the first matching map point, the camera intrinsics, and the pixel coordinates of the first matching feature point that matches the first matching map point in the current frame;
  • the iterative solution finds the pose of the current frame at which the objective function attains its minimum value, yielding the optimized current pose;
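The weighted objective described above can be sketched as follows. Squared residuals and per-frame error lists are assumptions here; the disclosure specifies only the two weighted sums accumulated over the window.

```python
def window_objective(map_errors_per_frame, interframe_errors_per_frame,
                     w_map=1.0, w_frame=1.0):
    """Evaluate the sliding-window objective: for every frame in the window,
    the map matching errors are summed and weighted by the first weight
    (`w_map`), the inter-frame matching errors by the second weight
    (`w_frame`), and the two results are accumulated over the window."""
    total = 0.0
    for map_e, frame_e in zip(map_errors_per_frame,
                              interframe_errors_per_frame):
        total += w_map * sum(e * e for e in map_e)      # first result
        total += w_frame * sum(e * e for e in frame_e)  # second result
    return total
```

An iterative least-squares solver would repeatedly re-evaluate this objective while adjusting the frame poses until the minimum is reached.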
  • the method further includes:
  • the current frame is a key frame when: the number of first matching feature points in the current frame is less than a first threshold, and the number of second matching feature points in the current frame is less than a second threshold;
  • if the current frame is not a key frame, it is deleted from the sliding window. If the current frame is a key frame, it is judged whether the number of frames in the current sliding window reaches the set first frame-number threshold: if it does, the earliest key frame added to the sliding window is deleted; if not, no key frame is deleted;
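The key-frame management of the sliding window can be sketched as below; the class name, `max_frames` parameter, and frame-id representation are hypothetical.

```python
from collections import deque

class SlidingWindow:
    """Hold at most `max_frames` key frames: a non-key current frame is
    removed from the window after optimization, and adding a key frame to a
    full window evicts the earliest key frame added."""
    def __init__(self, max_frames=10):
        self.max_frames = max_frames   # the "first frame-number threshold"
        self.frames = deque()

    def push(self, frame_id, is_keyframe):
        self.frames.append(frame_id)       # current frame joins the window
        if not is_keyframe:
            self.frames.pop()              # non-key frame is deleted again
        elif len(self.frames) > self.max_frames:
            self.frames.popleft()          # evict earliest key frame
```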
  • the performing image preprocessing on the current frame includes,
  • the current frame is de-distorted to obtain a de-distorted image
  • if the pixel value distribution of the foreground image is uniform, the foreground image is used as the current frame after image preprocessing; if it is not uniform, the foreground image is stretched to obtain the current frame after image preprocessing.
  • performing stretching processing on the foreground image includes:
  • the pixel value of the foreground image is less than or equal to the set minimum gray value, the pixel value of the foreground image is the minimum value within the pixel value range;
  • when the pixel value of the foreground image is greater than the minimum gray value and less than the set maximum gray value, a pixel value proportional to the maximum pixel value is used as the new pixel value; the proportion is the ratio of the difference between the pixel value and the minimum gray value to the difference between the maximum and minimum gray values;
  • the pixel value of the foreground image is greater than or equal to the maximum gray value, the pixel value of the foreground image is the maximum value within the range of pixel values;
  • feature point extraction is performed to obtain the feature points of the current frame, including,
  • the feature points in each grid are sorted in descending order of their response values and the first Q feature points are retained, obtaining the filtered feature points; Q is determined from the number of feature points in the target image frame, the set upper limit on the total number of feature points, and the total number of feature points in the grid;
  • Feature descriptors are calculated separately for each feature point after screening.
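The grid-based screening can be sketched as follows. Because the exact formula for Q is not reproduced in this record, an even split of the total feature cap across grids is assumed; the function name and feature tuple layout are also hypothetical.

```python
def filter_features_by_grid(features, cols, rows, width, height, total_cap):
    """Bucket features (u, v, response) into a cols x rows grid over a
    width x height image, then keep the Q strongest features per grid,
    sorted in descending order of response value."""
    grids = {}
    for u, v, resp in features:
        key = (min(int(u * cols / width), cols - 1),
               min(int(v * rows / height), rows - 1))
        grids.setdefault(key, []).append((u, v, resp))
    q = max(1, total_cap // (cols * rows))   # assumed even split of the cap
    kept = []
    for pts in grids.values():
        pts.sort(key=lambda p: p[2], reverse=True)   # descending response
        kept.extend(pts[:q])
    return kept
```

Descriptors would then be computed only for the retained features.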
  • a visual positioning device based on a visual map provided by the application includes:
  • the image acquisition module collects the current image and obtains the current frame
  • the feature extraction module performs feature point extraction based on the current frame to obtain the feature points of the current frame;
  • the positioning module determines the positioning strategy according to the current positioning state, matches the feature points in the current frame with the map points in the map based on the positioning strategy to obtain matching feature points, and, when the matching feature points satisfy the conditions for the current-frame pose calculation, calculates the pose of the current frame from the matching feature points to obtain the positioning result.
  • the present application also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of any of the above-mentioned visual map-based visual positioning methods are implemented.
  • the present application also provides a mobile robot, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to implement the steps of any of the above visual-map-based visual positioning methods.
  • the present application also provides a computer program product containing instructions, when the computer program product containing instructions is run on a computer, the computer program product can cause the computer to execute any of the above-mentioned steps of the visual map-based visual positioning method.
  • the present application also provides a computer program, which, when running on a computer, causes the computer to execute the steps of any of the above-mentioned visual map-based visual positioning methods.
  • the visual positioning method based on the visual map provided by the present application performs feature matching and pose calculation for the current frame according to a positioning strategy determined by the current positioning state, and does not require accurate calibration points. Compared with the prior art, different positioning strategies are used in different positioning states, so positioning can be achieved in each state, giving better robustness in unmapped areas or when positioning fails. Moreover, compared with prior-art schemes that set calibration points and can position accurately only on images passing through those points, the positioning process of this scheme continuously matches against the map without significant jumps in the positioning results.
  • FIG. 1 is a schematic flowchart of visual positioning in this embodiment.
  • Figure 2 is a schematic diagram of feature point screening.
  • FIG. 3 is a schematic diagram of a transition relationship between positioning states.
  • FIG. 4 is a schematic diagram of positioning logic.
  • FIG. 5 is a schematic flow chart of initializing positioning.
  • FIG. 6 is a schematic diagram of map point screening.
  • FIG. 7 is a schematic flow chart of normal positioning.
  • FIG. 8 is a schematic flowchart of a relocation.
  • FIGS. 9a and 9b are schematic diagrams of determining a candidate map range.
  • FIG. 10 is a schematic diagram of a visual positioning device based on a visual map of the present application.
  • FIG. 11 is a schematic diagram of an image preprocessing module.
  • FIG. 12 is a schematic diagram of map matching constraints and inter-frame matching constraints of the current frame in an image coordinate system.
  • FIG. 13 is a schematic diagram of map matching constraints and inter-frame matching constraints of the current frame in the world coordinate system.
  • the present application determines the positioning logic according to the different positioning strategies adopted in different positioning states, so that the positioning state transitions when the positioning logic is satisfied; the positioning strategies differ in how pose prediction and feature matching are handled, which increases the overall robustness of positioning.
  • the following description is based on visual positioning using a visual map, where the visual map is a feature map constructed in advance whose map points carry three-dimensional spatial information; that is, the world coordinates and descriptor information of the feature points are stored in the map,
  • and the map points in the map can also be called feature points in the map;
  • the descriptor information of a feature point is information describing the image features of that point, such as color features or texture features, and is used in the feature matching process; descriptor information may also be called a feature descriptor, or simply a descriptor.
  • the visual map is a texture map constructed from collected ground texture information; it can be a two-dimensional or three-dimensional point cloud map with feature descriptors, and its coverage can be continuous or discrete.
  • the following takes a 3D point cloud map as an example.
  • FIG. 1 is a schematic flowchart of the visual positioning in this embodiment. After loading the texture map, the mobile robot performs the following steps:
  • Step 101 collecting a current image to obtain a current frame
  • Step 102 Perform image preprocessing on the current frame to make the texture in the image prominent, including but not limited to one or more of optional processes such as image de-distortion, image filtering, and image enhancement.
  • Step 102 is optional and depends on image quality; for example, whether to add image preprocessing is determined by whether the current frame is over-distorted and whether its texture is significant. If, based on image quality, the current frame does not need image preprocessing, the feature points of the current frame can be extracted directly after the frame is collected, and the extraction process is the same as for a preprocessed current frame.
  • Step 1021 perform de-distortion processing on the current frame according to the distortion coefficient of the camera, to obtain a de-distorted image I(u, v), where u and v represent pixel coordinates.
  • Step 1022: determine whether the pixel value of each pixel in the de-distorted image is greater than a set first pixel threshold; if so, perform an inversion operation on the pixels whose values exceed the threshold and then filter the inverted de-distorted image, and otherwise directly filter the de-distorted image I(u, v), obtaining the background image I_b(u, v).
  • the inversion operation subtracts the original pixel value from 255 to obtain the new pixel value; compared with the original image, the inverted image has black and white exchanged and light and dark reversed.
  • Step 1023: subtract the background image from the de-distorted image to obtain the foreground image I_f(u, v), expressed as:
  • I_f(u, v) = I(u, v) - I_b(u, v)
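Steps 1021–1023 can be sketched on a plain 2-D list of gray values as follows. A box (mean) filter stands in for the unspecified image filter, and the threshold and radius values are illustrative.

```python
def foreground(image, pixel_thresh=200, radius=1):
    """Compute the foreground I_f = I - I_b for a 2-D list of gray values:
    invert pixels brighter than the threshold, estimate the background I_b
    with a box mean filter, and subtract it from the (inverted) image."""
    h, w = len(image), len(image[0])
    # Step 1022 (first half): invert pixels above the first pixel threshold.
    inv = [[255 - p if p > pixel_thresh else p for p in row]
           for row in image]
    # Step 1022 (second half): box mean filter -> background image I_b.
    bg = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [inv[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            bg[y][x] = sum(vals) // len(vals)
    # Step 1023: foreground I_f(u, v) = I(u, v) - I_b(u, v).
    return [[inv[y][x] - bg[y][x] for x in range(w)] for y in range(h)]
```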
  • Step 1024: determine whether the pixel values of the foreground image I_f(u, v) are uniformly distributed; if so, the foreground image is used as the preprocessed current frame, and if not, the foreground image is stretched to obtain the preprocessed current frame;
  • whether the pixel values of an image are uniformly distributed can be judged by whether they span the entire grayscale range, which prevents contrast from being reduced by an image that is too dark or overexposed.
  • a grayscale histogram of the image is computed and the number of pixels in each grayscale interval is counted; if the gray values are concentrated in a few larger or smaller grayscale intervals, the pixel values of the image are judged to be non-uniformly distributed, and if every grayscale interval contains pixels, they are judged to be uniformly distributed.
  • stretching the foreground image may be:
  • when the pixel value of the foreground image is less than or equal to the set minimum gray value, the pixel value is set to the minimum of the pixel value range, i.e. 0; the minimum and maximum gray values may be preset gray-value thresholds, or, respectively, the minimum and maximum of the gray values of the foreground image's pixels.
  • the ratio is: the difference between the pixel value of the foreground image and the minimum gray value, divided by the difference between the maximum gray value and the minimum gray value.
  • the pixel value of the foreground image is the maximum value within the pixel value range, for example, the maximum value of the pixel is 255.
  • I min is the minimum gray value
  • I max is the maximum gray value.
  • the pixel value ranges from 0 to 255.
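The piecewise stretch described above, applied to a single pixel value, can be written directly from the three cases (values at or below I_min map to 0, values at or above I_max map to the range maximum, and values in between scale linearly by the stated ratio):

```python
def stretch(pixel, i_min, i_max, max_val=255):
    """Stretch one foreground pixel value: clamp below i_min to 0, clamp
    above i_max to max_val, and scale linearly in between by the ratio
    (pixel - i_min) / (i_max - i_min)."""
    if pixel <= i_min:
        return 0
    if pixel >= i_max:
        return max_val
    return round((pixel - i_min) / (i_max - i_min) * max_val)
```

Applying `stretch` to every pixel of the foreground image yields the preprocessed current frame when the distribution is non-uniform.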
  • Step 103: extract the image feature points from the preprocessed current frame and compute a descriptor for each feature point, thereby obtaining the feature points and descriptors of the current frame.
  • the descriptor form is the same as the descriptor form of the map point in the map.
  • features such as ORB (Oriented FAST and Rotated BRIEF), Scale Invariant Feature Transform (SIFT), SURF (Speeded Up Robust Features) and the like can be used.
  • the feature point of the current frame is also the ORB feature point.
  • ORB is a feature point that builds on the FAST (Features from Accelerated Segment Test) detector and the BRIEF descriptor, adding an orientation computation to FAST and rotation invariance to BRIEF.
  • the image feature points in the preprocessed current frame are extracted in the same form as the feature points of the constructed map.
  • ORB feature points may be used.
  • since the collected ground texture images are usually scale-stable, multi-scale features need not be constructed to enhance scale invariance; the construction of pyramid images can therefore be abandoned, and feature extraction on the current frame image is equivalent to extraction on the source image.
  • the pyramid image is also called an image pyramid, and the so-called image pyramid is a collection of images of different resolutions derived from the same image.
  • the extracted feature points can be screened.
  • Figure 2 is a schematic diagram of feature point screening.
  • the FAST feature is used as an example of the feature point screening process. Specifically: after FAST (Features from Accelerated Segment Test) features are extracted, the current frame can be divided into a predetermined number of grids and each grid filtered, retaining the Q feature points with the highest FAST response values to obtain the screened feature points. Q is determined from the number of feature points in one frame of the target image, the set upper limit on the total number of feature points, and the total number of feature points in the grid; different grids may retain different numbers of feature points.
  • the determination of Q is expressed mathematically as:
  • the FAST response value refers to the score value extracted by the FAST corner point, that is, the score of the point becoming a FAST corner point, and the higher the score, the more significant the FAST feature is.
  • other types of feature points also have similar score values (response values), which are used to measure the significance of the point as a feature point.
  • FAST can be regarded as an algorithm for extracting corner points.
• FAST corner points are defined as follows: if a pixel differs significantly from enough pixels in its surrounding neighborhood, the pixel may be a corner point. A corner point is a point with particularly prominent attributes in some respect, such as an isolated point of maximum or minimum intensity on some attribute, or the end point of a line segment. Corner points play a very important role in the understanding and analysis of images: while retaining the important features of an image, they effectively reduce the amount of data, so the information density is very high, which improves computation speed, facilitates reliable image matching, and makes real-time processing possible.
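For illustration only, the grid-based screening described above can be sketched as follows. The function name, the tuple layout of key points, and the per-cell quota rule for Q are assumptions for the sketch, not the patented formula.

```python
# Hypothetical sketch of grid-based feature screening: divide the frame into
# a grid and keep only the Q highest-response corners per cell.

def screen_features(keypoints, img_w, img_h, grid_cols, grid_rows, max_total):
    """keypoints: list of (x, y, response). Returns the screened list."""
    cells = {}
    for kp in keypoints:
        x, y, _ = kp
        col = min(int(x * grid_cols / img_w), grid_cols - 1)
        row = min(int(y * grid_rows / img_h), grid_rows - 1)
        cells.setdefault((row, col), []).append(kp)

    # Per-cell quota Q (assumed rule): proportional to the cell's share of
    # all detected points, so the overall total stays near max_total.
    total = len(keypoints)
    kept = []
    for pts in cells.values():
        q = max(1, round(max_total * len(pts) / total))
        pts.sort(key=lambda kp: kp[2], reverse=True)  # highest response first
        kept.extend(pts[:q])
    return kept
```

The retained points are the most significant FAST corners of each grid cell, spreading features evenly across the image.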
• Step 104: In the positioning process, if the ground texture at the current position has not been mapped, or the ground texture has changed, the current frame cannot be properly matched with the map points, and frame positioning loss will occur. Therefore, during positioning, different positioning strategies are used for the current frame according to the current positioning state. That is:
  • the positioning state includes an uninitialized state, a successful positioning state, and a relocation state
• the positioning strategies corresponding to each positioning state include: initialization positioning in the uninitialized state, normal positioning in the positioning-successful state, relocation in the relocation state, and the transition relationships between the states.
  • the relocation state may also be referred to as a location loss state.
• FIG. 3 is a schematic diagram of the transition relationships between positioning states, wherein:
• when the current positioning state is the uninitialized state, initialization positioning is performed: if it succeeds, the state switches to the positioning-successful state; if it fails, the current uninitialized state is maintained;
• when the current positioning state is the positioning-successful state, normal positioning is performed: if it succeeds, the positioning-successful state is maintained; if it fails, the state switches to the positioning-lost state;
• when the current positioning state is the positioning-lost state, relocation is performed: if it succeeds, the state switches to the positioning-successful state. If it fails, it is judged whether the number of consecutive frames with failed positioning exceeds the set frame number threshold, or whether the distance between the current frame and the pose of the last successful positioning exceeds the set first distance threshold; if either threshold is exceeded, the state switches to the uninitialized state, otherwise the positioning-lost state is maintained.
• In other words, when the current frame is used for positioning: if the current positioning state is uninitialized, initialization positioning is performed on the current frame; on success the state becomes positioning-successful, and on failure the uninitialized state is kept. If the current positioning state is positioning-successful, normal positioning is performed; on success the positioning-successful state is kept, and on failure the state becomes positioning-lost. If the current positioning state is positioning-lost, relocation is performed; on success the state becomes positioning-successful. If relocation fails, it is judged whether the number of consecutive failed frames exceeds the set frame number threshold, or whether the distance between the current frame and the last successfully positioned pose exceeds the set first distance threshold; if either is exceeded, the state transitions to uninitialized, otherwise the positioning-lost state is maintained.
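The transition relationships of Figure 3 can be sketched as a small state machine. The state names, threshold values, and parameter names below are illustrative assumptions.

```python
# Minimal sketch of the positioning-state transitions of Figure 3.

UNINITIALIZED, LOCATED, LOST = "uninitialized", "located", "lost"

def next_state(state, success, lost_frames=0, dist_from_last_fix=0.0,
               frame_thresh=10, dist_thresh=5.0):
    """Return the next positioning state given this frame's result."""
    if state == UNINITIALIZED:          # initialization positioning
        return LOCATED if success else UNINITIALIZED
    if state == LOCATED:                # normal positioning
        return LOCATED if success else LOST
    # LOST: relocation
    if success:
        return LOCATED
    if lost_frames > frame_thresh or dist_from_last_fix > dist_thresh:
        return UNINITIALIZED            # give up relocation, reinitialize
    return LOST
```

Each frame is processed under the strategy of the current state, then the state is updated from the frame's positioning result.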
  • FIG. 4 is a schematic diagram of positioning logic, including,
  • Step 1041 determine whether the current frame is the first frame
• If the current frame is the first frame, there is no positioning result of a previous frame, so the current positioning state is determined to be the uninitialized state and initialization positioning is performed. If positioning succeeds, the success is recorded for the current frame and the state switches to positioning-successful; if it fails, the failure is recorded and the uninitialized state is kept. That is, if the current frame is the first frame, the current positioning state is determined to be uninitialized, initialization positioning in the uninitialized state is performed on the current frame, and the current positioning state is updated based on the positioning result of the current frame according to the transition relationships between positioning states.
• If the current frame is not the first frame, go to step 1042.
  • Step 1042 determine whether the positioning of the previous frame is successful
• If the positioning of the previous frame succeeded, the current positioning state is determined to be positioning-successful and normal positioning is performed. If positioning succeeds, that is, the matching in Figure 1 succeeds, the success is recorded for the current frame and the positioning-successful state is maintained; if it fails, the failure is recorded and the state switches to positioning-lost. That is, if the positioning of the previous frame succeeded, normal positioning in the positioning-successful state is performed on the current frame, and the current positioning state is updated based on the positioning result of the current frame according to the transition relationships between positioning states.
• If the positioning of the previous frame failed, go to step 1043.
  • Step 1043 judging whether the current continuous positioning failure frame number exceeds the frame number threshold, or whether the distance between the current frame and the most recent successful positioning pose exceeds the set first distance threshold;
• If the frame number threshold or the first distance threshold is exceeded, the current positioning state is determined to be the uninitialized state and initialization positioning is performed. If positioning succeeds, that is, the matching in Figure 1 succeeds, the success of the current frame is recorded and the state switches to positioning-successful; if positioning fails, the failure is recorded and the uninitialized state is kept. In other words, the current state is determined to be uninitialized, initialization positioning in the uninitialized state is performed on the current frame, and the current positioning state is updated based on the positioning result according to the transition relationships between positioning states.
• If the current number of consecutive failed frames does not exceed the frame number threshold, and the distance between the current frame and the most recently successful pose does not exceed the set first distance threshold, the current state is determined to be the relocation state and relocation is performed. If relocation succeeds, that is, the matching in Figure 1 succeeds, the success of the current frame is recorded and the state switches to positioning-successful; if it fails, the failure is recorded, the positioning-lost state is kept, and the process returns to step 1043 for the next frame. That is, relocation in the positioning-lost state is performed, and based on the positioning result of the current frame, the current positioning state is updated according to the transition relationships between positioning states.
• Initial positioning is the positioning strategy performed for the first frame, or when the number of consecutive failed positioning frames exceeds the threshold. Since there is no positioning result of a previous frame at this time, an accurate pose prediction cannot be made.
• Therefore, the first candidate map range is usually obtained by searching map points in the global map or by using auxiliary information, and the feature points of the current frame are matched with the map points in the global map or within the first candidate map range. If matching succeeds, the pose of the current frame is calculated from the matching feature points to obtain the positioning result of the current frame.
• All map points in the map can be used as the first candidate map range, or auxiliary information can be used to obtain the first candidate map range.
• Optionally, the map points are screened by brute-force matching: if the feature matching degree between any two map points exceeds the set second matching threshold, one of the two map points is randomly deleted, yielding a revised first candidate map range. Then the feature points in the current frame are matched with the map points within the first candidate map range; if matching succeeds, the pose of the current frame is calculated from the matching feature points to obtain the positioning result of the current frame.
• The so-called brute-force matching is pairwise matching: the possibility of every pairing is calculated and screened.
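The screening of mutually similar map points described above can be sketched as follows, assuming binary descriptors compared by Hamming distance (smaller distance meaning a higher matching degree). Keeping the earlier of two confusable points stands in for the random deletion in the text; all names are illustrative.

```python
# Hedged sketch: drop map points whose descriptors are so similar that they
# could be confused during matching (pairwise / brute-force screening).

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def screen_map_points(map_points, second_thresh):
    """map_points: list of (id, descriptor-as-int).
    Keeps only points that are at least second_thresh bits apart."""
    kept = []
    for mp in map_points:
        if all(hamming(mp[1], k[1]) >= second_thresh for k in kept):
            kept.append(mp)   # no kept point is confusably similar
    return kept
```

After this screening, a current-frame feature cannot match two nearly identical candidates, which reduces ambiguous initial-positioning matches.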
  • Normal positioning is a positioning strategy performed when the positioning of the previous frame is successful.
• since the positioning of the previous frame succeeded, the pose of the current frame can be predicted relatively accurately; therefore, when performing feature matching, the second candidate map range can be determined according to the first neighborhood of the predicted pose.
  • the feature points of the current frame are matched with the map points in the second candidate map range. If the matching is successful, the pose of the current frame is calculated according to the matching feature points, thereby obtaining the positioning result of the current frame.
  • Relocation is a positioning strategy performed when the previous frame positioning is lost.
• the most recent frame before the current frame whose positioning was not lost is used as a reference frame, and the pose of the current frame is predicted from the reference frame; that is, among the successfully positioned frames, the frame closest to the current frame is used as the reference frame.
  • the pose prediction of the current frame is performed to obtain the predicted pose of the current frame.
• the third candidate map range is determined according to the second neighborhood of the predicted pose, and the feature points of the current frame are matched with the map points in the third candidate map range by brute-force matching, which guarantees the robustness of relocation; of course, the method is not limited to brute-force matching. If the relocation matching succeeds, the pose of the current frame is calculated from the matching feature points to obtain the positioning result of the current frame.
  • the range of the first neighborhood is smaller than the range of the second neighborhood.
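Determining a candidate map range from a neighborhood of the predicted pose can be sketched as a simple radius query over map-point positions. The planar (x, y) layout and the radius values are assumptions; normal positioning would use the smaller first-neighborhood radius and relocation the larger second-neighborhood radius.

```python
# Sketch of selecting a candidate map range around a predicted pose.
import math

def candidate_range(map_points, predicted_xy, radius):
    """map_points: list of (x, y, descriptor). Returns points within radius."""
    px, py = predicted_xy
    return [mp for mp in map_points
            if math.hypot(mp[0] - px, mp[1] - py) <= radius]
```

A smaller radius keeps matching fast when the prediction is trustworthy; a larger radius tolerates the greater pose uncertainty during relocation.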
  • the pose of the current frame is the pose of the mobile robot that collected the current frame
  • the positioning result of the current frame is the current positioning result of the mobile robot.
  • the pose of the mobile robot includes a position and an attitude.
• pose prediction and feature matching can be regarded as two steps in the process of matching the feature points in the current frame with the map points in the map to obtain matching feature points.
  • the initialization positioning only includes the processing flow of feature matching and pose calculation.
  • the feature matching method is different from the feature matching method in normal positioning and relocation.
• the feature matching is a link in the process of matching the feature points in the current frame with the map points in the map to obtain matching feature points.
• the three positioning modes of initial positioning, normal positioning and relocation differ in their matching methods; when the matching feature points satisfy the conditions for pose calculation of the current frame, the specific process of calculating the pose of the current frame from the matching feature points and obtaining the positioning result can be the same in all three modes.
• Step 105: perform graph optimization on the pose of the successfully positioned current frame, so as to optimize the positioning result and improve positioning accuracy.
  • the so-called graph optimization can also be called pose graph optimization, which is used to optimize the positioning results.
  • the purpose of positioning optimization is to make the output positioning results more accurate and smooth, and improve the accuracy and robustness of the entire system.
• the positioning optimization adopts nonlinear optimization based on a sliding window; the optimized variables are the poses of the image frames in the sliding window, the current frame is added to the sliding window, and the optimization constraints are the inter-frame matching constraints and/or map matching constraints, using the Levenberg-Marquardt (LM) method to minimize the inter-frame matching error and/or map projection error in a nonlinear least-squares sense and obtain the optimized pose of the current frame.
  • the map projection error is also called map matching error, and the error obtained by the map matching constraint is the map matching error, and the error obtained by the inter-frame matching constraint is the inter-frame matching error.
• the map matching constraint is: the error between the pixel position of the first matching map point back-projected into the current frame and the pixel position, in the current frame, of the first matching feature point that matches it; that is, the error between two pixel positions, which are respectively the pixel position of the first matching map point back-projected into the current frame and the pixel position of the first matching feature point matching that map point in the current frame. Alternatively, the map matching constraint may be the error between two spatial positions: the spatial position to which the first matching feature point in the current frame is projected in the world coordinate system, and the spatial position of the first matching map point matched by that feature point in the world coordinate system.
• the first matching feature point is: a feature point in the current frame that is successfully matched with a map point in the map;
• the first matching map point is: the map point successfully matched by the first matching feature point;
• the inter-frame matching constraint is: the error between the spatial position to which the first matching feature point in the current frame is projected in the world coordinate system and the spatial position to which the second matching feature point, which matches the first matching feature point in the previous key frame of the current frame, is projected in the world coordinate system; that is, the error between these two spatial positions. Alternatively, the inter-frame matching constraint may be the error between two pixel positions: the pixel position of the second matching map point, matched by the second matching feature point, back-projected into the current frame, and the pixel position of the same second matching map point back-projected into the previous key frame.
  • the previous keyframe is the keyframe that is closest to the collection time of the current frame among the set keyframes.
  • FIG. 12 is a schematic diagram of map matching constraints and inter-frame matching constraints of the current frame.
• the second matching feature points are a subset of the first matching feature point set;
• the second matching map points matching the second matching feature points are a subset of the first matching map point set matching the first matching feature points.
• the error between the pixel position of the first matching map point back-projected into the current frame and the pixel position of the first matching feature point matched by that map point constitutes a map matching constraint; the error between the pixel position of the second matching map point back-projected into the current frame and the pixel position of the same second matching map point back-projected into the previous key frame constitutes an inter-frame matching constraint.
• Although the map matching constraints and inter-frame matching constraints described in the embodiments of the present application are determined in the image coordinate system, they can also be determined in the world coordinate system.
  • FIG. 13 is a schematic diagram of map matching constraints and inter-frame matching constraints of the current frame in the world coordinate system.
• the error between the spatial position to which the first matching feature point in the current frame is projected in the world coordinate system and the spatial position of the first matching map point matching that feature point in the world coordinate system constitutes a map matching constraint; that is, the error between these two spatial positions constitutes the map matching constraint.
• the error between the spatial position to which the first matching feature point in the current frame is projected in the world coordinate system and the spatial position to which the second matching feature point, which matches the first matching feature point in the previous key frame of the current frame, is projected in the world coordinate system constitutes the inter-frame matching constraint; that is, the error between these two spatial positions constitutes the inter-frame matching constraint.
• the spatial position of a matching feature point projected into the world coordinate system is obtained according to the camera model, using the camera internal parameters, the pixel position of the matching feature point, and the pose of the frame in which the matching feature point is located.
  • the first matching feature point is a feature point in the best matching feature point set.
• When the map matching constraint is the error between the spatial position to which the first matching feature point in the current frame is projected in the world coordinate system and the spatial position of the first matching map point in the world coordinate system, and the inter-frame matching constraint is the error between the pixel position of the second matching map point back-projected into the current frame and its pixel position back-projected into the previous key frame, then, correspondingly, minimizing the inter-frame matching error and/or map matching error to obtain the optimized current-frame pose includes constructing an objective function for optimization.
• The objective function for optimization is: a first result obtained by summing the map matching errors of all first matching feature points of all frames in the current sliding window, and/or a second result obtained by summing the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame. That is, the objective function may be the first result, the second result, or the sum of the first result and the second result.
• Based on the above map matching constraints, the map matching error of any first matching feature point in a frame is the error between its projected spatial position and the spatial position, in the world coordinate system, of the first matching map point it matches. Based on the above inter-frame matching constraints, the inter-frame matching error of any frame with its previous key frame is the error between the pixel position of the second matching map point, matched by the second matching feature point, back-projected into that frame and the pixel position of the same map point back-projected into the previous key frame, where the second matching feature point is the feature point in the previous key frame that matches a first matching feature point of the frame.
  • e ik-map is the map matching error between the pixel position of the first matching map point k back-projected on the current frame i and the pixel position of the first matching feature point k that matches the map point in the current frame i;
  • p ik is the pixel coordinate of the first matching feature point k in the current frame i
  • K is the camera internal parameter matrix,
• X k is the three-dimensional coordinate of the first matching map point k in the map; R i and t i are the pose of the current frame i obtained through the first matching feature points.
  • e ijm-frame is the back-projection error of the second matching map point m between the current frame i and the previous key frame j, that is, the matching error between frames
• X m is the three-dimensional coordinate of the second matching map point m matched between the current frame i and the previous key frame j
• R i, t i are the pose of the current frame i
  • R j, t j is the pose of the previous key frame j
  • I is the total number of frames in the sliding window
  • K is the total number of first matching feature points in the current frame
  • j is the previous key frame of each frame in the sliding window
  • M is the total number of second matching map points back projected in the current frame .
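As a hedged numeric sketch of the map matching error e ik-map defined above, the following assumes a standard pinhole projection (pixel = normalized K·(R·X + t)); the exact projection model and function names are assumptions for illustration.

```python
# Sketch: back-project a map point into the current frame with pose (R, t)
# and camera intrinsics K, then compare with the matched feature's pixel.
import numpy as np

def project(K, R, t, X):
    """Project world point X into pixel coordinates with pose (R, t)."""
    Xc = R @ X + t                 # world -> camera coordinates
    uvw = K @ Xc                   # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]        # dehomogenize

def map_matching_error(p_ik, K, R_i, t_i, X_k):
    """e_ik-map: pixel-space residual of map point k in frame i."""
    return p_ik - project(K, R_i, t_i, X_k)
```

The inter-frame matching error has the same form, with the second matching map point projected into both the current frame and the previous key frame and the two residuals combined.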
• the pose of the current frame is solved iteratively until the objective function attains its minimum value, and that pose is taken as the current positioning result.
• Optionally, the objective function for optimization is: the sum of a first result, weighted with a first weight, obtained by summing the map matching errors of all first matching feature points of all frames in the current sliding window, and a second result, weighted with a second weight, obtained by summing the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame.
• the objective function used for optimization can be mathematically expressed as:
• min over the poses {R_i, t_i}: Σ_{i=1..I} ( λ1 · Σ_{k=1..K} ||e_ik-map||² + λ2 · Σ_{m=1..M} ||e_ijm-frame||² )
• where λ1 and λ2 are the first and second weights, respectively.
• the poses of all frames in the sliding window involved in the constraints can also be used as variables to be optimized.
• after optimization, the current sliding window is maintained: it is identified whether the current frame can serve as a key frame and whether it should remain in the sliding window, so as to further improve the accuracy of subsequent positioning using image frames.
  • the current frame is determined as a key frame when one of the following conditions is met: the number of first matching feature points in the current frame is less than the first threshold, or the number of second matching feature points in the current frame is less than the second threshold.
• If the current frame is a non-key frame, the current frame is deleted from the sliding window. If the current frame is a key frame, it is judged whether the number of frames in the current sliding window reaches the set first frame threshold; if it does, the earliest-added key frame in the sliding window is deleted, and if it does not, the earliest-added key frame is not deleted.
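The sliding-window maintenance just described can be sketched as follows; the function name and list-based window are assumptions for illustration.

```python
# Sketch of sliding-window maintenance: discard a non-key current frame;
# append a key frame and, if the window exceeds the first frame threshold,
# evict the earliest-added key frame.

def update_window(window, current_frame, is_keyframe, max_frames):
    """window: list of key frames (oldest first). Returns the updated list."""
    if not is_keyframe:
        return window                 # current frame removed from the window
    window = window + [current_frame]
    if len(window) > max_frames:
        window = window[1:]           # evict the earliest-added key frame
    return window
```

The window therefore always contains the most recent key frames, bounding the size of the optimization problem.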
• Each positioning strategy involved in step 104 is described below.
  • FIG. 5 is a schematic flowchart of initializing positioning. After image preprocessing and feature point extraction of the current frame, the initial positioning includes,
  • Step 501 Match the feature points of the current frame with the map points in the map.
• If a feature point of the current frame matches a map point, a first matching feature point is obtained, and the obtained first matching feature points constitute a first matching feature point set. For each first matching feature point, the spatial position information of the first matching map point it matches is taken as the spatial position information of that feature point, thereby obtaining a match between the two-dimensional feature points of the current frame and the three-dimensional map points in the map.
• The matching can be performed as follows: for any feature point of the current frame, calculate whether the matching degree between the descriptor of the feature point and the descriptor of a map point is less than the set first matching threshold; if it is less, the pair is determined to match, and if it is not less, the pair is determined not to match. The matching degree can be described by the Hamming distance, in which case the matching threshold is a Hamming distance threshold.
• the map points used for matching can be all map points in the map, in which case brute-force matching is used, that is, the possibility of every pairing is calculated during pairwise matching;
• alternatively, auxiliary information is used to obtain a candidate map range, that is, the above-mentioned first candidate map range, and the feature points in the current frame are matched with the map points in the first candidate map range; for example, easily identifiable positions, such as the start and end positions of paths, turning positions, and crossing positions, are set as the candidate map range.
• the auxiliary information can be the position information of such easily identifiable positions.
  • FIG. 6 is a schematic diagram of map point screening, wherein the dotted circles are map points removed by screening, and the solid circles are candidate map positions.
  • Step 501 is repeated until all feature points of the current frame are matched.
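The descriptor matching of step 501 can be sketched as follows, assuming binary descriptors compared by Hamming distance as suggested above; the best-match selection rule and all names are assumptions for illustration.

```python
# Sketch of step 501: match each current-frame descriptor against candidate
# map points by Hamming distance, accepting only matches below the first
# matching threshold.

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_features(frame_descs, map_points, first_thresh):
    """frame_descs: {feature_id: descriptor}; map_points: {map_id: descriptor}.
    Returns {feature_id: map_id} for the best match under the threshold."""
    matches = {}
    for fid, fd in frame_descs.items():
        best = min(map_points.items(), key=lambda kv: hamming(fd, kv[1]))
        if hamming(fd, best[1]) < first_thresh:
            matches[fid] = best[0]
    return matches
```

Feature points whose best candidate is still above the threshold remain unmatched, which is expected when the local texture is unmapped or has changed.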
• Step 502: Given that there may be some mismatched points when feature points are matched using descriptors alone, preferably, a screening method can be used to select the best first matching feature points, improving the accuracy of feature matching and thereby the accuracy of the current frame pose. Therefore, in this step 502, based on the first matching feature points, a random sample consensus (RANSAC) algorithm can be used to determine the best matching feature point set, which specifically includes:
  • Step 5021 from the first matching feature point set obtained in step 501, randomly select the matching feature points for calculating the pose estimation of the current frame, and obtain the current matching feature point subset;
• Step 5022: based on the mapping between spatial position information and pixel position information established by the matching feature points in the current matching feature point subset, calculate the current pose, thereby obtaining the fitted pose estimate of the matching feature point subset; that is, the current pose is calculated from this mapping, and the calculated current pose serves as the fitted pose estimate of the subset.
• calculating the current pose includes but is not limited to the following methods: Perspective-n-Point (PnP, 2D-3D), two-dimensional iterative closest point (2D-ICP, 2D-2D), three-dimensional iterative closest point (3D-ICP, 3D-3D), and the homography matrix H (2D-2D).
• the product of the homography matrix and the spatial position coordinates corresponds to the pixel coordinates, which can be expressed mathematically as: s·[u, v, 1]^T = H·[X, Y, 1]^T, where (u, v) is the pixel position, (X, Y) is the spatial position, and s is a scale factor.
  • the degree of freedom of the homography matrix is 8, and the value of each element in the homography matrix can be obtained through the correspondence between the spatial positions of the four first matching feature points and the pixel positions.
• By performing singular value decomposition (SVD) on the homography matrix, the corresponding rotation matrix R and translation vector t can be obtained, yielding the fitted pose estimate.
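The recovery of the 8-DOF homography from four point correspondences can be sketched by the direct linear transform (DLT), solving the homogeneous system by SVD of the coefficient matrix. The subsequent decomposition of H into R and t is omitted here, and all function names are illustrative assumptions.

```python
# Sketch: estimate H from four (spatial, pixel) correspondences by DLT.
import numpy as np

def estimate_homography(world_pts, pixel_pts):
    """world_pts, pixel_pts: four (x, y) pairs each. Returns 3x3 H."""
    A = []
    for (X, Y), (u, v) in zip(world_pts, pixel_pts):
        # Each correspondence contributes two linear equations in the
        # nine entries of H (from s·[u, v, 1]^T = H·[X, Y, 1]^T).
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # null-space vector = stacked entries of H
    return H / H[2, 2]         # fix the overall scale

def apply_homography(H, pt):
    u, v, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([u / w, v / w])
```

With exactly four non-degenerate correspondences the system has an eight-dimensional rank, so the SVD null vector determines H up to scale, matching the 8 degrees of freedom noted above.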
• Step 5023: since the fitted pose estimate is obtained from the matching feature points in the subset, in order to check whether the other matching feature points in the first matching feature point set also conform to the currently fitted pose estimate, the inlier (interior point) rate needs to be calculated.
• the spatial positions of the projection points of all feature points in the current frame are obtained according to the fitted pose estimate and the camera internal parameters.
  • two-dimensional pixel coordinate points can be mapped to three-dimensional coordinate points.
  • this mapping is called projection; otherwise, from three-dimensional coordinate points to two-dimensional coordinate points, this mapping is called inverse projection. That is, the spatial position of the feature point refers to the spatial position of the projection point of the feature point.
  • the feature point itself exists in the image coordinate system, and the spatial position of the projection point can be obtained by projection.
  • Projection points exist in the world coordinate system.
• in step 5023, all first matching feature points in the current frame are projected to three-dimensional spatial positions, that is, projected into the map, as the spatial positions of the projection points; thus, for any first matching feature point i in the current frame, its three-dimensional space coordinates can be obtained from

s · p_i = K · (R · X_i + t)

where:
• p_i is the pixel coordinate (in homogeneous form) of the first matching feature point i in the current frame;
• R and t are the current fitted pose estimate;
• X_i is the three-dimensional space coordinate of the first matching feature point i projected into the map, i.e., the three-dimensional space coordinate of the projection point;
• K is the camera intrinsic parameter matrix, and s is a scale factor.
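A minimal sketch of this projection, under the added assumption that map points lie on a ground plane Z = 0 (as in a ground-texture map), so that the relation s·p_i = K(R·X_i + t) becomes invertible through a plane-induced homography; the helper name is illustrative.

```python
import numpy as np

def project_pixel_to_map(p, K, R, t):
    """Back-project pixel p = (u, v) of the current frame onto the map plane.

    Assumes the map points lie on the world plane Z = 0, so the projection
    relation reduces to a 3x3 plane-induced homography.
    """
    # for Z = 0: K(R X + t) = K [r1 r2 t] (X, Y, 1)^T
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    Xw = np.linalg.solve(H, np.array([p[0], p[1], 1.0]))
    Xw /= Xw[2]  # remove the projective scale
    return np.array([Xw[0], Xw[1], 0.0])  # 3-D map coordinates of the projection point
```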
• Step 5024: for each first matching feature point in the first matching feature point set, determine, from the spatial position of its projection point, whether the distance between the projection point of the first matching feature point in the current frame and the map point matched by that feature point in the map is less than the set second distance threshold; if it is, the first matching feature point is determined to be an inlier.
• Step 5024 is repeated until every first matching feature point has been judged as an inlier or not.
• Step 5025: count the current number of inliers and take the ratio of the current number of inliers to the number of first matching feature points as the inlier rate; the larger the ratio, the higher the inlier rate, the higher the degree of fit, the better the fitted pose estimate, and the better the randomly selected matching feature points.
• Step 5026: determine whether the number of inliers currently counted is the largest over the iterations so far; if it is, take the set formed by the current inliers as the current best matching feature point set and then execute step 5027; if it is not, do not update the current best matching feature point set and execute step 5027 directly.
• Step 5027: determine whether the end condition is reached; if it is, execute step 503; if it is not, return to step 5021 to randomly select another subset of matching feature points, thereby continuing the loop from fitted pose estimation to best-matching-feature-point-set confirmation, i.e., the loop of steps 5021-5027;
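The loop of steps 5021-5027 can be sketched generically as follows; `fit` and `reproject_error` are caller-supplied stand-ins (e.g. homography fitting and reprojection distance), and all names and the fixed iteration count are illustrative assumptions.

```python
import random

def ransac_best_inlier_set(matches, fit, reproject_error, m, iters, dist_thresh):
    """Generic RANSAC loop over the steps 5021-5027."""
    best_inliers = []
    for _ in range(iters):
        subset = random.sample(matches, m)          # 5021: random subset
        model = fit(subset)                         # 5022: fitted pose estimate
        if model is None:
            continue                                # degenerate subset
        inliers = [mt for mt in matches             # 5023/5024: inlier test
                   if reproject_error(model, mt) < dist_thresh]
        if len(inliers) > len(best_inliers):        # 5025/5026: keep the best
            best_inliers = inliers
    return best_inliers                             # 5027: end after `iters`
```

A toy usage with a one-point "model" (the slope of a line through the origin) in place of a pose fit:

```python
random.seed(0)
matches = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (1.0, 5.0), (2.0, 1.0)]
fit = lambda sub: sub[0][1] / sub[0][0]
err = lambda k, mt: abs(mt[1] - k * mt[0])
best = ransac_best_inlier_set(matches, fit, err, m=1, iters=50, dist_thresh=0.1)
```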
  • the end condition includes at least one of the following conditions:
• the number of iterations satisfies a preset condition; in order that, with confidence α, at least one random selection during the iterative loop yields m points that are all inliers (so that the loop obtains the best fitted pose estimate at least once), the number of iterations k should satisfy:

(1 − η^m)^k ≤ 1 − α, i.e., k ≥ log(1 − α) / log(1 − η^m)
• where m is the size of the subset, i.e., the number of matching feature points in the subset; the confidence α is generally set in the range 0.95 to 0.99;
• η is the inlier rate. In general, η is unknown in advance, so the inlier proportion under the worst case can be taken, or the worst-case proportion can be set in the initial state and then updated to the current maximum inlier rate as the iterations proceed;
• k · η^m represents the expected number of rounds, among k rounds, in which the sampled subset consists entirely of inliers.
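The bound can be computed directly; the helper below just evaluates the formula, rounding up to the next integer iteration count.

```python
import math

def ransac_iterations(m, inlier_rate, confidence=0.99):
    """Smallest k with 1 - (1 - inlier_rate**m)**k >= confidence,
    i.e. at least one all-inlier subset of size m with the given confidence."""
    p_good = inlier_rate ** m  # probability one random subset is all inliers
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

For example, a homography subset (m = 4) at a 0.5 inlier rate and 0.99 confidence needs 72 iterations.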
• Step 503: judge whether the best matching feature point set satisfies the conditions for the current frame pose calculation;
• if it does, the pose calculation of the current frame is performed; calculating the current pose includes but is not limited to the following methods: the perspective-n-point (PnP, 2D-3D) method, 2-dimensional iterative closest point (2D-ICP, 2D-2D), 3-dimensional iterative closest point (3D-ICP, 3D-3D), and the homography matrix H (2D-2D);
• if it does not, the current frame positioning fails. It should be noted that when the best matching feature point set is not used, i.e., when, based on the positioning strategy, the feature points in the current frame are matched with the map points in the map and the matching feature points are obtained, it can be directly determined whether the matching feature points meet the conditions for the current frame pose calculation; when the judgment result is yes, the pose of the current frame is calculated according to the matching feature points and the positioning result is obtained, where the methods of calculating the pose of the current frame include but are not limited to: the perspective-n-point (PnP, 2D-3D) method, 2-dimensional iterative closest point (2D-ICP, 2D-2D), 3-dimensional iterative closest point (3D-ICP, 3D-3D), and the homography matrix H (2D-2D).
• in one embodiment, in view of the fact that different pose calculation methods require different conditions, and taking the homography matrix H (2D-2D) as an example, at least 4 matching feature points are needed to solve the pose; therefore, it is judged whether the number of matching feature points in the best matching feature point set satisfies the conditions for pose calculation.
• each best matching feature point set is given a weight that measures the matching degree between the current frame and the candidate map range; the weight can also be determined based on properties of the current frame.
• using a set weight threshold and the maximum weight, the unique best matching feature point set is determined, thereby determining that the best matching feature point set meets the conditions for pose calculation; for example, the weight threshold and the unique maximum weight are combined: the best matching feature point sets whose weights exceed the weight threshold are filtered out, and the unique best matching feature point set is selected among them according to the principle that the difference between the largest weight and the next-largest weight should be largest.
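One reading of this weight-threshold-plus-unique-maximum rule can be sketched as follows; the `margin` parameter and the tie-breaking behaviour are assumptions, since the text only asks for the gap between the largest and next-largest weights to be decisive.

```python
def select_unique_best(candidate_sets, weight_threshold, margin):
    """Select the unique best matching feature point set.

    candidate_sets: list of (point_set, weight) pairs. Sets whose weight
    exceeds weight_threshold are kept; the top-weighted set is accepted
    only if it beats the runner-up by at least `margin` (an assumed
    tuning parameter, not fixed by the text).
    """
    kept = sorted((c for c in candidate_sets if c[1] > weight_threshold),
                  key=lambda c: c[1], reverse=True)
    if not kept:
        return None
    if len(kept) == 1 or kept[0][1] - kept[1][1] >= margin:
        return kept[0][0]
    return None  # no unique winner: pose-calculation conditions not met
```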
• FIG. 7 is a schematic flowchart of normal positioning. After image preprocessing and feature point extraction of the current frame, normal positioning includes:
• Step 701: given that the positioning result of the previous frame has been obtained, perform pose prediction on the current frame according to the inter-frame motion information from the previous frame to the current frame to obtain the predicted pose, so that the predicted pose can be used to determine the second candidate map range, thereby improving the efficiency of matching;
  • Pose prediction methods include,
• Embodiment 1: obtain the inter-frame pose transformation from the previous frame to the current frame through a wheeled odometer or an inertial measurement unit (IMU), and obtain the predicted pose of the current frame based on the positioning result of the previous frame and the inter-frame pose transformation.
• Embodiment 2: obtain the inter-frame pose transformation from the previous frame to the current frame through visual odometry (VO), and obtain the predicted pose of the current frame based on the positioning result of the previous frame and the inter-frame pose transformation.
  • This implementation requires only image information, no additional inertial information.
• Embodiment 3: according to several historical frames for which positioning results have been obtained, predict the inter-frame pose transformation from the previous frame to the current frame, and obtain the predicted pose of the current frame based on the positioning result of the previous frame and the inter-frame pose transformation. This implementation does not rely on any information from the current frame.
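Embodiment 3 can be sketched with a constant-motion assumption over planar poses (x, y, theta): re-apply the last observed inter-frame transformation. The two-frame history and the SE(2) parameterization are simplifications not mandated by the text.

```python
import math

def predict_pose(prev2, prev1):
    """Predict the current pose from two located historical frames,
    assuming constant inter-frame motion. Poses are (x, y, theta)."""
    x2, y2, th2 = prev2
    x1, y1, th1 = prev1
    # relative motion of prev1 expressed in prev2's frame
    dx, dy = x1 - x2, y1 - y2
    c, s = math.cos(-th2), math.sin(-th2)
    rx, ry = c * dx - s * dy, s * dx + c * dy
    dth = th1 - th2
    # re-apply the same relative motion starting from prev1
    c1, s1 = math.cos(th1), math.sin(th1)
    return (x1 + c1 * rx - s1 * ry, y1 + s1 * rx + c1 * ry, th1 + dth)
```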
• Embodiment 4: use at least two of Embodiments 1 to 3 to obtain first predicted poses of the current frame, yielding at least two first predicted poses. A Kalman filter is used to filter the at least two first predicted poses to obtain a filtered second predicted pose, which is used as the final predicted pose of the current frame; or, a nonlinear optimization method is used to optimize based on the at least two first predicted poses to obtain an optimized second predicted pose, which is used as the final predicted pose of the current frame;
• the objective function of the nonlinear optimization is constructed from the error terms obtained in the various embodiments; its mathematical expression is:

ξ_j* = argmin over ξ_j of Σ_{s=1..S} || e_ijs ||², where e_ijs = ln( (T_i · ΔT_ij)^{-1} · T_j )^∨
• e_ijs represents the error term when embodiment s is adopted;
• T_i represents the pose of the previous frame i, i.e., the positioning result of the previous frame i;
• T_j represents the first predicted pose of the current frame j;
• ΔT_ij represents the inter-frame pose transformation between the previous frame i and the current frame j;
• ξ_j is the Lie algebra representation of the predicted pose of the current frame;
• S is the total number of embodiments adopted.
• the pose T_i of the previous frame i, the first predicted poses T_j of the current frame j obtained by the different embodiments, and the inter-frame pose transformations ΔT_ij are substituted into the objective function as initial values, and the pose at which the objective function attains its minimum is solved for.
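In the special case of planar poses and identity information matrices, the minimizer of a sum of squared vector residuals ||ξ − ξ_s||² reduces to a (weighted) mean of the first predicted poses; the sketch below uses that closed form instead of an iterative solver, averaging the angle via unit vectors to handle wrap-around. This is a simplification of the Lie-algebra objective, not the patent's exact solver.

```python
import math

def fuse_predictions(preds, weights=None):
    """Least-squares fusion of several first predicted poses (x, y, theta)."""
    if weights is None:
        weights = [1.0] * len(preds)
    W = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, preds)) / W
    y = sum(w * p[1] for w, p in zip(weights, preds)) / W
    # average the heading on the unit circle to avoid angle wrap-around
    cs = sum(w * math.cos(p[2]) for w, p in zip(weights, preds)) / W
    sn = sum(w * math.sin(p[2]) for w, p in zip(weights, preds)) / W
    return (x, y, math.atan2(sn, cs))
```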
• Step 702: according to the predicted pose of the current frame, determine the second candidate map range and match the feature points in the current frame with the map points in the second candidate map range; when the matching succeeds, the third matching feature points are obtained, and step 703 is then performed.
• one embodiment is, as shown in FIG. 9a, to take the map position determined by the predicted pose of the current frame as the center and determine the first neighborhood of that center as the second candidate map range; the matching method can use brute-force matching, i.e., exhaustively computing the descriptor distances between all feature points in the current frame and the feature points in the candidate map range and selecting the feature point with the smallest descriptor distance as the matching feature point; this method is suitable when the confidence of the predicted pose is low or the number of feature points to be matched is small.
• in another embodiment, the second candidate map range is determined per feature point: according to the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point of the feature point in the map is calculated, and the first neighborhood centered on that projection point is taken as the second candidate map range of the feature point; in the matching between the current frame feature point and the map points within this range, the feature point pair with the smallest descriptor distance is selected as the matching feature point.
• the formula used to calculate the position of the projection point of the feature point in the map may be the expression concerning p_i given in the description of step 5023.
• the first neighborhood may be a map area covered by a set radius around the center.
  • Step 702 is repeated until all feature points of the current frame are matched.
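The per-feature-point variant of step 702 can be sketched as follows; `project` is assumed to wrap the projection formula of step 5023, and the names, data layout, and thresholds are illustrative.

```python
import numpy as np

def match_by_projection(frame_feats, map_points, project, radius, max_desc_dist):
    """Project each current-frame feature into the map with the predicted
    pose, restrict candidates to the first neighborhood of the projection
    point, and keep the nearest descriptor within it."""
    matches = []
    for feat in frame_feats:                 # feat: (pixel, descriptor)
        proj = project(feat[0])              # projection point in the map
        best, best_d = None, max_desc_dist
        for mp in map_points:                # mp: (position, descriptor)
            if np.linalg.norm(mp[0] - proj) > radius:
                continue                     # outside the candidate map range
            d = np.linalg.norm(mp[1] - feat[1])
            if d < best_d:                   # smallest descriptor distance
                best, best_d = mp, d
        if best is not None:
            matches.append((feat, best))
    return matches
```

A spatial index (grid or k-d tree) over the map points would replace the inner loop in practice.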
• Step 703: based on the third matching feature points, use the random sample consensus algorithm to determine the best matching feature point set; this step 703 is the same as step 502.
• Step 704: determine whether the number of matching feature points in the best matching feature point set satisfies the conditions for the current frame pose calculation; if it does, perform the current frame pose calculation; if it does not, determine that the current frame positioning fails.
• This step 704 is the same as step 503.
• FIG. 8 is a schematic flowchart of relocation. After image preprocessing and feature point extraction of the current frame, relocation includes:
• Step 801: since the positioning of the previous frame failed, trace back to the most recent successfully positioned historical frame as a reference frame, and perform pose prediction on the current frame according to the inter-frame motion information from the reference frame to the current frame to obtain the predicted pose, so that the predicted pose can be used to determine the third candidate map range, thereby improving the efficiency of matching;
• the pose prediction method is the same as in step 701; the reference frame simply takes the place of the previous frame.
• Step 802: according to the predicted pose of the current frame, determine the third candidate map range and match the feature points in the current frame with the map points in the third candidate map range; when the matching succeeds, the fourth matching feature points are obtained, and step 803 is then executed.
• one implementation is to take the map position determined by the predicted pose of the current frame as the center and determine the second neighborhood of that center as the third candidate map range; the matching method can be brute-force matching, to improve the robustness of relocation.
• in another implementation, the third candidate map range is determined per feature point according to its projection: based on the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point of the current frame feature point in the map is calculated, and the second neighborhood centered on that projection point is taken as the third candidate map range of the feature point; in the matching between the current frame feature point and the map points within the third candidate map range, the feature point pair with the smallest descriptor distance is selected as the matching feature point.
• the second neighborhood may be a map range covered by a set radius around the center.
  • Step 802 is repeated until all feature points of the current frame are matched.
• Step 803: based on the fourth matching feature points, use the random sample consensus algorithm to determine the best matching feature point set; this step 803 is the same as step 502.
• Step 804: determine whether the number of matching feature points in the best matching feature point set satisfies the conditions for pose calculation; if it does, perform the pose calculation of the current frame; if it does not, determine that the current frame positioning fails.
• the Levenberg-Marquardt (LM) optimization method is used for the solution.
• the visual positioning method provided in this embodiment does not require modifying the ground in order to perform positioning with texture maps; it can use natural textures for positioning, at low cost;
• the positioning logic determined by the positioning strategy enhances adaptability when local textures change or are lacking. Whether the map is a texture map or a non-textured visual map, there is no need to pre-set accurate calibration points, image acquisition is easy, and during positioning the system can keep matching against the map without large jumps. Within a mapped area, positioning can be performed on any path, not only the path along which the map was built, which improves the robustness of the positioning process. The different pose prediction and feature matching processes of the different positioning strategies not only improve the accuracy of positioning but also help improve its efficiency.
  • FIG. 10 is a schematic diagram of a visual positioning device based on a visual map of the present application.
  • the device includes,
  • the image acquisition module 1001 collects the current image to obtain the current frame
• the feature extraction module 1002 performs feature point extraction based on the current frame to obtain the feature points of the current frame;
• the positioning module 1003 determines a positioning strategy according to the current positioning state; based on the positioning strategy, it determines a candidate map range in the map for the current frame and matches the feature points in the current frame with the map points within the candidate range to obtain matching feature points;
• when the matching feature points meet the conditions for solving the pose of the current frame, the pose of the current frame is calculated according to the matching feature points, and the positioning result is obtained.
  • the positioning module 1003 includes:
• the positioning state sub-module 1004 determines the current positioning state according to the positioning result of the previous frame and the current number of frames with consecutive positioning failures;
• the positioning strategy sub-module 1005 determines a positioning strategy according to the current positioning state, the positioning strategy including: initialization positioning in the uninitialized state, normal positioning in the positioning-successful state, relocation in the positioning-lost state, and the conversion relationships between the positioning states;
• the positioning logic sub-module 1006 generates positioning logic according to the positioning states and the conversion relationships between them, the logical content of the positioning logic including: performing on the current frame the positioning process indicated by the corresponding positioning strategy, i.e., the positioning strategy of the current positioning state;
• the matching and positioning sub-module 1007, based on the positioning logic and according to the positioning process indicated by the positioning strategy in each state, determines the candidate map range in the map for the current frame, matches the feature points in the current frame with the map points within the candidate range to obtain the matching feature points, and calculates the pose of the current frame according to the matching feature points;
  • the pose graph optimization sub-module 1008 performs graph optimization on the pose of the current frame.
• the positioning module 1003 is further configured to, after obtaining the matching feature points, use the random sample consensus algorithm to determine the best matching feature point set based on the matching feature points, and determine whether the best matching feature point set satisfies the conditions for the current frame pose calculation; when it does not, the current frame positioning is determined to have failed.
  • the conversion relationship between the various positioning states includes:
• when the current positioning state is the uninitialized state, initialization positioning is performed: if the initialization positioning succeeds, the state switches to the positioning-successful state; if it fails, the current uninitialized state is maintained;
• when the current positioning state is the positioning-successful state, normal positioning is performed: if the normal positioning succeeds, the current positioning-successful state is maintained; if it fails, the state switches to the positioning-lost state;
• when the current positioning state is the positioning-lost state, relocation is performed: if the relocation succeeds, the state switches to the positioning-successful state; if the relocation fails, it is determined whether the number of frames with consecutive positioning failures exceeds the set frame-number threshold, or whether the distance between the current frame and the most recent successfully positioned pose exceeds the set first distance threshold; if either threshold is exceeded, the state switches to the uninitialized state; if neither is exceeded, the current positioning-lost state is maintained.
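The conversion relationships between the positioning states can be sketched as a small state machine; `max_fail_frames` and `max_drift` stand in for the frame-number threshold and the first distance threshold, whose concrete values the text leaves configurable.

```python
from enum import Enum

class LocState(Enum):
    UNINITIALIZED = 0   # initialization positioning runs here
    LOCATED = 1         # positioning successful: normal positioning runs here
    LOST = 2            # positioning lost: relocation runs here

def next_state(state, success, fail_frames=0, drift=0.0,
               max_fail_frames=10, max_drift=5.0):
    if state is LocState.UNINITIALIZED:
        # initialization: success -> located, failure -> stay uninitialized
        return LocState.LOCATED if success else LocState.UNINITIALIZED
    if state is LocState.LOCATED:
        # normal positioning: success -> stay located, failure -> lost
        return LocState.LOCATED if success else LocState.LOST
    # relocation: success -> located; persistent failure -> uninitialized
    if success:
        return LocState.LOCATED
    if fail_frames > max_fail_frames or drift > max_drift:
        return LocState.UNINITIALIZED
    return LocState.LOST
```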
  • the matching and positioning sub-module 1007 includes:
  • the candidate map determination unit 1009 determines the candidate map range in the map for the current frame according to the positioning process indicated by the positioning strategy in each state,
  • the feature matching unit 1010 matches the feature points in the current frame with the map points in the candidate area,
  • the pose calculation unit 1011 calculates the pose of the current frame according to the matching feature points.
  • the positioning logic sub-module 1006 is specifically used for:
• if the positioning of the previous frame succeeded, the current frame is positioned normally;
• if the current frame is the first frame, the current frame is initialization-positioned;
• if the positioning state is lost, the current frame is relocated.
• the positioning logic sub-module 1006 is further configured to, before initialization-positioning the current frame, determine that the current positioning state is the uninitialized state if the current frame is the first frame;
• if the initialization positioning of the current frame succeeds, the current frame positioning success is recorded and the state switches to the positioning-successful state; if it fails, the current frame positioning failure is recorded and the current uninitialized state is maintained;
• the positioning logic sub-module 1006 is also used to, before normally positioning the current frame, determine that the current positioning state is the positioning-successful state if the positioning of the previous frame succeeded; if the normal positioning of the current frame succeeds, the current frame positioning success is recorded and the current positioning-successful state is maintained; if it fails, the current frame positioning failure is recorded and the state switches to the positioning-lost state;
• the positioning logic sub-module 1006 is also used to determine, before initialization-positioning the current frame, that the current positioning state is the uninitialized state if the current number of consecutive positioning failures exceeds the frame-number threshold, or the distance between the current frame and the most recent successfully positioned pose exceeds the set first distance threshold; if the initialization positioning of the current frame succeeds, the current frame positioning success is recorded and the state switches to the positioning-successful state; if it fails, the current frame positioning failure is recorded and the current uninitialized state is maintained;
• the positioning logic sub-module 1006 is also used to relocate the current frame if the current number of consecutive positioning failures does not exceed the frame-number threshold or the distance between the current frame and the most recent successfully positioned pose does not exceed the set first distance threshold, in which case the current positioning state is the positioning-lost state;
• if the relocation of the current frame succeeds, the current frame positioning success is recorded and the state switches to the positioning-successful state; if it fails, the current frame positioning failure is recorded, the current positioning-lost state is maintained, and the process returns to judging whether the current number of consecutive positioning failures exceeds the set frame-number threshold or whether the distance between the current frame and the most recent successfully positioned pose exceeds the set first distance threshold.
  • the matching and positioning sub-module 1007 is specifically used for:
• map points in the map are used as the first candidate map range, or auxiliary information is used to obtain the first candidate map range; the map points within the first candidate map range are screened by brute-force matching: if the feature matching degree between any two map points exceeds a set second matching threshold, one of them is randomly deleted, yielding the revised first candidate map range;
• pose prediction is performed on the current frame according to the inter-frame motion information from the previous frame to the current frame, and the predicted pose of the current frame is obtained; according to the predicted pose of the current frame, the second candidate map range in the map is determined;
  • the matching positioning sub-module 1007 performs pose prediction on the current frame according to the inter-frame motion information from the previous frame to the current frame, and obtains the predicted pose of the current frame, including:
• Method 1: obtain the inter-frame pose transformation from the previous frame to the current frame through a wheeled odometer or an inertial measurement unit, and obtain the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame; or
• Method 2: obtain the inter-frame pose transformation from the previous frame to the current frame through visual odometry, and obtain the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame; or
• Method 3: according to historical frames for which positioning results have been obtained, predict the inter-frame pose transformation from the previous frame to the current frame, and obtain the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame; or
• Method 4: adopt at least two of Methods 1 to 3 to obtain first predicted poses of the current frame, yielding at least two first predicted poses;
• a Kalman filter is used to filter the at least two first predicted poses to obtain a filtered second predicted pose, which is used as the final predicted pose of the current frame; or, a nonlinear optimization method is used to optimize based on the at least two first predicted poses to obtain an optimized second predicted pose, which is used as the final predicted pose of the current frame;
• the objective function of the nonlinear optimization is the sum of the error terms obtained by the various methods; optimizing based on the at least two first predicted poses with a nonlinear optimization method includes: taking the pose of the previous frame, the first predicted poses of the current frame obtained by the different methods, and the inter-frame pose transformations as initial values, substituting them into the objective function of the nonlinear optimization, and solving for the pose at which that objective function attains its minimum, as the second predicted pose.
  • the matching and positioning sub-module 1007 is specifically used for:
• for normal positioning, the first neighborhood of the center is determined as the second candidate map range;
• for relocation, the second neighborhood of the center is determined as the third candidate map range;
• the range of the second neighborhood is greater than the range of the first neighborhood.
  • the positioning module 1003 is specifically used for:
• the current pose is calculated, and the fitted pose estimate of the current matching feature point subset is obtained;
• for each matching feature point in the matching feature point set, according to the spatial position of the projection point of the matching feature point, it is determined whether the distance between the projection point of the matching feature point in the current frame and the map point matched by that feature point in the map is less than the set second distance threshold; if it is, the matching feature point is determined to be an inlier; this judgment is repeated until all matching feature points in the current matching feature point set have been judged as inliers or not;
• judging whether the best matching feature point set satisfies the conditions for the current frame pose calculation includes:
• giving each of at least two best matching feature point sets a weight that measures the matching degree between the current frame and the candidate map range, where the weight is determined by one of, or any combination of, the number of matching feature points in the best matching feature point set, the number of feature points extracted from the current frame, the distribution of the feature points, and the initial number of matching feature points;
• judging, using the set weight threshold and the maximum weight, whether the best matching feature point set satisfies the conditions for the current frame pose calculation.
• the positioning module 1003 is further configured such that calculating the pose of the current frame according to the matching feature points and obtaining the positioning result further includes:
• using sliding-window-based nonlinear optimization to calculate the pose of the current frame;
• the optimized variables are the poses of the image frames in the sliding window, the sliding window including the current frame, and
• the optimization constraints are the inter-frame matching constraints between the feature points of the current frame and those of the previous key frame, and/or the map matching constraints between the current frame feature points and the map points in the map;
• the inter-frame matching error and/or the map matching error are minimized, and the optimized pose of the current frame is obtained as the positioning result;
• the map matching constraint is: the error between the pixel position at which the first matching map point is back-projected onto the current frame and the pixel position, in the current frame, of the first matching feature point matching that map point; or, the error between the spatial position to which the first matching feature point in the current frame is projected and the spatial position of the first matching map point;
  • the first matching feature point is : the feature point in the current frame is matched with the map point in the map to obtain the successfully matched feature point
  • the first matching map point is: the map point successfully matched by the first matching feature point;
  • the inter-frame matching constraints are: the first matching feature point in the current frame is projected to the spatial position in the world coordinate system and the second matching feature point matching the first matching feature point in the previous key frame of the current frame is projected to the world The error between the spatial positions in the coordinate system, or, the second matching map point matching the second matching feature point is back-projected to the pixel position of the current frame and the second matching map point is back-projected to the pixel of the previous key frame error between positions;
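A minimal sketch of the two pixel-space error terms described above, assuming a pinhole camera model with intrinsic matrix K; the helper names are illustrative, not taken from the patent:

```python
import numpy as np

def back_project(point_world, R, t, K):
    """Back-project a 3-D map point (world frame) into pixel coordinates,
    given camera rotation R (world->camera), translation t, intrinsics K."""
    p_cam = R @ point_world + t
    uv = K @ (p_cam / p_cam[2])   # perspective division, then intrinsics
    return uv[:2]

def map_matching_error(map_point, feat_pixel, R, t, K):
    # Error between the first matching map point back-projected on the
    # current frame and the pixel position of its matching feature point.
    return back_project(map_point, R, t, K) - feat_pixel

def interframe_matching_error(map_point, R_cur, t_cur, R_key, t_key, K):
    # Error between the same (second) matching map point back-projected on
    # the current frame and on the previous key frame.
    return (back_project(map_point, R_cur, t_cur, K)
            - back_project(map_point, R_key, t_key, K))
```

Both terms vanish when the estimated poses and the map are consistent, which is what the sliding-window optimization drives toward.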
• The image acquisition module 1001 is further configured to perform image preprocessing on the current frame after collecting the current image to obtain the current frame, and before feature point extraction is performed on the current frame to obtain its feature points.
  • the positioning module 1003 is specifically used for:
• the objective function for optimization is: the accumulated sum of a first result and a second result, wherein the first result is obtained by weighting, with a first weight, the sum of the map matching errors of all first matching feature points of all frames in the current sliding window, and the second result is obtained by weighting, with a second weight, the sum of the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame;
• the initial value of the map matching error is the map matching error obtained according to the pose of the current frame, the spatial position information of the first matching map point, the camera intrinsic parameters, and the pixel coordinates of the first matching feature point that matches the first matching map point in the current frame;
• the pose of the current frame is solved iteratively so that the objective function used for optimization reaches its minimum value, and the optimized current pose is obtained;
• The positioning module is further configured to, after obtaining the optimized pose of the current frame, judge whether the current frame is a key frame;
• the current frame is a key frame when: the number of first matching feature points in the current frame is less than a first threshold, and the number of second matching feature points in the current frame is less than a second threshold;
• if the current frame is not a key frame, the current frame is deleted from the sliding window; if the current frame is a key frame, it is judged whether the number of frames in the current sliding window reaches a set first frame threshold; if the set first frame threshold is reached, the earliest added key frame in the sliding window is deleted, and if it is not reached, the earliest added key frame in the sliding window is not deleted;
  • the image preprocessing performed by the image acquisition module 1001 on the current frame includes:
• the current frame is de-distorted to obtain a de-distorted image, from which a background image and then a foreground image are obtained;
• if the grayscale distribution of the foreground image is uniform, the foreground image is used as the current frame after image preprocessing; if it is not uniform, the foreground image is stretched to obtain the current frame after image preprocessing.
  • Stretching the foreground image includes:
• when a pixel value of the foreground image is less than or equal to a set minimum gray value, that pixel value is set to the minimum of the pixel value range;
• when a pixel value of the foreground image is greater than the minimum gray value and less than a set maximum gray value, that pixel value is set to the maximum of the pixel value range multiplied by a ratio, the ratio being the difference between the pixel value and the minimum gray value divided by the difference between the maximum gray value and the minimum gray value;
• when a pixel value of the foreground image is greater than or equal to the maximum gray value, that pixel value is set to the maximum of the pixel value range;
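The three cases above amount to a linear stretch with saturation at both ends. A minimal sketch, assuming an 8-bit image with pixel value range [0, 255] and with `g_min`/`g_max` standing for the set minimum and maximum gray values:

```python
import numpy as np

def stretch_foreground(img, g_min, g_max, px_max=255):
    """Piecewise gray-level stretch: values <= g_min map to 0 (range minimum),
    values >= g_max map to px_max (range maximum), and values in between map
    to px_max * (v - g_min) / (g_max - g_min)."""
    out = (img.astype(np.float32) - g_min) / (g_max - g_min) * px_max
    out = np.clip(out, 0, px_max)   # clip covers both saturation cases
    return out.astype(np.uint8)
```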
  • feature point extraction is performed to obtain the feature points of the current frame, including,
• the feature points in each grid cell are sorted in descending order of their response values, and the first Q feature points are retained to obtain the filtered feature points, wherein Q is determined according to the number of feature points in the target image frame, the set upper limit on the total number of feature points, and the total number of feature points in the grid cell;
• a feature descriptor is calculated separately for each filtered feature point.
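A minimal sketch of the per-cell filtering. Allocating Q proportionally to each cell's share of all keypoints is an illustrative simplification; the patent derives Q from the frame's feature count, the set upper limit, and the per-cell totals:

```python
from collections import defaultdict

def filter_by_grid(keypoints, cell_size, max_total):
    """Partition keypoints (x, y, response) into grid cells; within each cell,
    sort by response in descending order and keep the first Q."""
    cells = defaultdict(list)
    for kp in keypoints:
        cells[(int(kp[0] // cell_size), int(kp[1] // cell_size))].append(kp)
    kept = []
    n_all = len(keypoints)
    for pts in cells.values():
        q = max(1, round(max_total * len(pts) / n_all))   # per-cell quota Q
        pts.sort(key=lambda kp: kp[2], reverse=True)      # descending response
        kept.extend(pts[:q])
    return kept
```

This keeps the strongest responses while preventing all retained features from clustering in one texture-rich cell.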
• The visual positioning device further includes an image preprocessing module 1012 for preprocessing the image; see FIG. 11, which is a schematic diagram of the image preprocessing module.
  • the image preprocessing module 1012 includes,
• an image de-distortion sub-module, which performs de-distortion processing on the source image frame according to the distortion coefficient of the camera to obtain a de-distorted image;
• an image filtering sub-module, which performs image filtering on the de-distorted image to obtain a background image;
• an image difference sub-module, which subtracts the background image from the de-distorted image to obtain a foreground image; and
• an image stretching sub-module, which stretches the foreground image to obtain the target image frame.
  • the image filtering sub-module, the image difference sub-module, and the image stretching sub-module can be used to enhance the image texture.
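The four sub-modules form a texture-enhancement pipeline: de-distort, filter to estimate the background, subtract to isolate the foreground texture, and stretch. A minimal sketch follows; the box-filter background, the clipping of negative residues, and the min-max stretch are illustrative choices not fixed by the text, and `undistort` stands in for the camera's distortion correction:

```python
import numpy as np

def preprocess(frame, undistort, k=15):
    """De-distort -> filter (background) -> subtract (foreground) -> stretch."""
    img = undistort(frame).astype(np.float64)       # de-distortion sub-module
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    background = np.empty_like(img)                 # filtering sub-module (box blur)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            background[i, j] = padded[i:i + k, j:j + k].mean()
    foreground = np.clip(img - background, 0, None)  # difference sub-module
    lo, hi = foreground.min(), foreground.max()
    if hi > lo:                                      # stretching sub-module
        foreground = (foreground - lo) / (hi - lo) * 255
    return foreground.astype(np.uint8)
```

In practice the de-distortion and filtering steps would typically use library routines (e.g. OpenCV's undistortion and blurring); the loop here only keeps the sketch dependency-free.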
  • the present application also provides a mobile robot, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the steps of the above-mentioned visual map-based visual positioning method.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
• The above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned visual positioning method based on a visual map are implemented.
  • Embodiments of the present invention also provide a computer program product containing instructions, which, when running on a computer, causes the computer to execute the steps of the above-mentioned visual positioning method based on a visual map.
  • An embodiment of the present invention further provides a computer program, which, when running on a computer, causes the computer to execute the steps of the above-mentioned visual positioning method based on a visual map.

Abstract

A visual positioning method based on a visual map. The method comprises: collecting a current image and obtaining a current frame (101); performing feature point extraction on the basis of the current frame to obtain feature points of the current frame (103); and determining a positioning strategy according to a current positioning state, matching the feature points in the current frame with map points in a map on the basis of the positioning strategy to obtain matched feature points, and, when the matched feature points satisfy conditions for pose calculation of the current frame, calculating a pose of the current frame according to the matched feature points to obtain a positioning result (104). The method does not require calibration points with accurately known positions, is relatively robust in unmapped areas and when positioning fails, and can match against the map continuously during positioning without large jumps.

Description

A visual positioning method and device based on a visual map

This application claims priority to Chinese patent application No. 202010618519.1, titled "A Visual Positioning Method and Device Based on a Visual Map" and filed with the China Patent Office on June 30, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of visual navigation, and in particular, to a visual positioning method and device based on a visual map.

Background

Visual navigation collects images of the surrounding environment through a camera device and performs calculations based on those images to determine the position and recognize the path of a mobile robot. It usually relies on a visual map, that is, an environment map constructed from image feature points and their descriptors. Taking a map based on ground texture as an example, when the mobile robot moves past a texture point, the pose of the mobile robot can be solved from the feature registration between the image collected at that texture point and the map, enabling positioning and navigation based on the ground texture.

Existing positioning methods based on visual maps use the matching result of a single image frame for positioning, which results in poor positioning robustness.
Summary of the Invention

The present application provides a visual positioning method based on a visual map, so as to improve the robustness of visual positioning.

The visual positioning method based on a visual map provided by the present application is implemented as follows:

collecting a current image to obtain a current frame;

performing feature point extraction based on the current frame to obtain feature points of the current frame;

determining a positioning strategy according to a current positioning state;

matching, based on the positioning strategy, the feature points in the current frame with map points in a map to obtain matching feature points; and

when the matching feature points satisfy the conditions for the current frame pose calculation, calculating the pose of the current frame according to the matching feature points to obtain a positioning result.
Optionally, after obtaining the matching feature points, the method further includes:

determining a best matching feature point set from the matching feature points using a random sample consensus (RANSAC) algorithm; and

judging whether the best matching feature point set satisfies the conditions for the current frame pose calculation, and when it does not, determining that positioning of the current frame has failed; the positioning state is determined according to the positioning result of the previous frame and the current number of consecutive positioning-failure frames, and includes an uninitialized state, a positioning-success state, and a relocation state.

The positioning strategy includes initialization positioning in the uninitialized state, normal positioning in the positioning-success state, relocation in the relocation state, and the transition relationships between the positioning states.

The transition relationships between the positioning states include:

when the current positioning state is the uninitialized state, if initialization positioning succeeds, transitioning to the positioning-success state, and if initialization positioning fails, maintaining the uninitialized state;

when the current positioning state is the positioning-success state, if normal positioning succeeds, maintaining the positioning-success state, and if normal positioning fails, transitioning to the positioning-lost state; and

when the current positioning state is the positioning-lost state, if relocation succeeds, transitioning to the positioning-success state; if relocation fails, judging whether the number of consecutive positioning-failure frames exceeds a set frame number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds a set first distance threshold; if either threshold is exceeded, transitioning to the uninitialized state, and otherwise maintaining the positioning-lost state.

Determining the positioning strategy according to the current positioning state includes generating positioning logic according to the positioning states and the transition relationships between them, wherein the positioning logic includes: executing, for the current frame, the positioning process indicated by the positioning strategy of the current positioning state.

Matching, based on the positioning strategy, the feature points in the current frame with map points in the map includes: based on the positioning logic, and following the positioning process indicated by the positioning strategy in each state, determining a candidate map range in the map for the current frame, and matching the feature points in the current frame with map points within the candidate map range.
Optionally, generating the positioning logic according to the positioning states and the transition relationships between them includes:

judging whether the current frame is the first frame;

if the current frame is the first frame, performing initialization positioning on the current frame;

if the current frame is not the first frame, judging whether positioning of the previous frame succeeded;

if positioning of the previous frame succeeded, performing normal positioning on the current frame;

if positioning of the previous frame failed, judging whether the current number of consecutive positioning-failure frames exceeds the set frame number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold;

if the frame number threshold or the set first distance threshold is exceeded, performing initialization positioning on the current frame; and

if neither the frame number threshold nor the set first distance threshold is exceeded, performing relocation on the current frame.
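The branching above, combined with the state transitions described earlier, can be sketched as a small per-frame state machine. The helpers (`initialize`, `normal_locate`, `relocate`, `distance_to_last_success`) are hypothetical stand-ins for the patent's positioning procedures:

```python
from enum import Enum

class State(Enum):
    UNINITIALIZED = 0
    SUCCESS = 1
    LOST = 2

def locate_frame(ctx, frame):
    """Choose the positioning process for one frame and apply the transitions."""
    if frame.is_first:
        ctx.state = State.UNINITIALIZED
        ok = ctx.initialize(frame)                      # initialization positioning
    elif ctx.prev_frame_ok:
        ctx.state = State.SUCCESS
        ok = ctx.normal_locate(frame)                   # normal positioning
    elif (ctx.fail_count > ctx.max_fail_frames or
          ctx.distance_to_last_success(frame) > ctx.max_drift):
        ctx.state = State.UNINITIALIZED
        ok = ctx.initialize(frame)                      # re-initialize after loss
    else:
        ctx.state = State.LOST
        ok = ctx.relocate(frame)                        # relocation
    # state transition and bookkeeping
    ctx.state = State.SUCCESS if ok else (
        State.UNINITIALIZED if ctx.state == State.UNINITIALIZED else State.LOST)
    ctx.fail_count = 0 if ok else ctx.fail_count + 1
    ctx.prev_frame_ok = ok
    return ok
```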
Optionally, performing initialization positioning on the current frame when it is the first frame further includes:

determining, since the current frame is the first frame, that the current positioning state is the uninitialized state; and

if initialization positioning of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state, and if positioning of the current frame fails, recording that positioning of the current frame failed and maintaining the uninitialized state.

Performing normal positioning on the current frame when positioning of the previous frame succeeded further includes:

determining, since positioning of the previous frame succeeded, that the current positioning state is the positioning-success state; and

if normal positioning of the current frame succeeds, recording that positioning of the current frame succeeded and maintaining the positioning-success state, and if positioning of the current frame fails, recording that positioning of the current frame failed and transitioning to the positioning-lost state.

Performing initialization positioning on the current frame when the current number of consecutive positioning-failure frames exceeds the frame number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold, further includes:

determining accordingly that the current positioning state is the uninitialized state; and

if initialization positioning of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state, and if initialization positioning of the current frame fails, recording that positioning of the current frame failed and maintaining the uninitialized state.

Performing relocation on the current frame when the current number of consecutive positioning-failure frames does not exceed the frame number threshold and the distance between the current frame and the pose of the most recent successful positioning does not exceed the set first distance threshold further includes:

determining accordingly that the current positioning state is the positioning-lost state; and

if relocation of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state, and if relocation of the current frame fails, recording that positioning of the current frame failed, maintaining the positioning-lost state, and returning to the step of judging whether the current number of consecutive positioning-failure frames exceeds the set frame number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold.
Optionally, determining, based on the positioning logic and following the positioning process indicated by the positioning strategy in each state, the candidate map range in the map for the current frame includes:

when performing initialization positioning on the current frame, taking all map points in the map as a first candidate map range, or obtaining the first candidate map range using auxiliary information; screening the map points within the first candidate map range by brute-force matching, and if the feature matching degree between any two map points exceeds a set second matching threshold, randomly deleting one of the two map points to obtain a corrected first candidate map range;

when performing normal positioning on the current frame, predicting the pose of the current frame according to the inter-frame motion information from the previous frame to the current frame to obtain a predicted pose of the current frame, and determining a second candidate map range in the map according to the predicted pose of the current frame; and

when performing relocation on the current frame, tracing back through the successfully positioned historical frames to find the frame closest to the current frame as a reference frame, taking the reference frame as the previous frame, predicting the pose of the current frame according to the inter-frame motion information from this previous frame to the current frame to obtain the predicted pose of the current frame, and determining a third candidate map range in the map according to the predicted pose of the current frame.
Optionally, predicting the pose of the current frame according to the inter-frame motion information from the previous frame to the current frame to obtain the predicted pose of the current frame includes:

a first manner: obtaining the inter-frame pose transformation from the previous frame to the current frame through a wheel odometer or an inertial measurement unit, and obtaining the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame;

or

a second manner: obtaining the inter-frame pose transformation from the previous frame to the current frame through visual odometry, and obtaining the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame;

or

a third manner: predicting the inter-frame pose transformation from the previous frame to the current frame according to historical frames for which positioning results have been obtained, and obtaining the predicted pose of the current frame based on the inter-frame pose transformation and the positioning result of the previous frame;

or,

a fourth manner: using at least two of the first, second, and third manners to separately obtain first predicted poses of the current frame, thereby obtaining at least two first predicted poses; and

filtering the at least two first predicted poses with a Kalman filter to obtain a filtered second predicted pose and taking it as the final predicted pose of the current frame; or, optimizing based on the at least two first predicted poses using a nonlinear optimization method to obtain an optimized second predicted pose and taking it as the final predicted pose of the current frame;

wherein the objective function of the nonlinear optimization is the sum of the error terms obtained from the respective manners, and optimizing based on the at least two first predicted poses using the nonlinear optimization method includes: substituting the pose of the previous frame, the first predicted poses of the current frame obtained in the different manners, and the inter-frame pose transformations, as initial values, into the objective function of the nonlinear optimization, and solving for the pose that minimizes the objective function as the second predicted pose.
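For the first and second manners, obtaining the predicted pose amounts to composing the previous frame's positioning result with the measured inter-frame transform. A minimal sketch, assuming a planar 2-D pose (x, y, yaw) as is typical for ground robots; the patent is not limited to this representation:

```python
import math

def predict_pose(prev_pose, delta):
    """Compose the previous frame's pose (x, y, yaw in the map frame) with the
    odometry-measured inter-frame motion (dx, dy, dyaw in the previous frame's
    body frame) to obtain the predicted pose of the current frame."""
    x, y, yaw = prev_pose
    dx, dy, dyaw = delta
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * dx - s * dy,                     # rotate delta into map frame
            y + s * dx + c * dy,
            (yaw + dyaw + math.pi) % (2 * math.pi) - math.pi)  # wrap to (-pi, pi]
```

In the fourth manner, two or more such predictions (from different sensors) would then be fused by a Kalman filter or nonlinear optimization rather than used directly.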
Optionally, determining the second candidate map range in the map according to the predicted pose of the current frame includes:

taking the map position determined by the predicted pose of the current frame as a center, and determining a first neighborhood of that center as the second candidate map range;

or

for each feature point of the current frame, calculating, according to the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point of the feature point in the map, and taking the first neighborhood centered on that projection point position as the second candidate map range of the feature point.

Determining the third candidate map range in the map according to the predicted pose of the current frame includes:

taking the map position determined by the predicted pose of the current frame as a center, and determining a second neighborhood of that center as the third candidate map range;

or

for each feature point of the current frame, calculating, according to the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point of the feature point in the map, and taking the second neighborhood centered on that projection point position as the third candidate map range of the feature point.

The range of the second neighborhood is larger than the range of the first neighborhood.
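Restricting matching to a neighborhood of each projection point can be sketched as a simple radius query; a smaller radius stands for the first neighborhood (normal positioning) and a larger one for the second neighborhood (relocation). Map points are taken here as 2-D positions, an assumption appropriate for a ground-texture map:

```python
import math

def candidate_map_points(map_points, proj_xy, radius):
    """Return the map points within `radius` of a feature point's projection
    `proj_xy`; only these candidates are considered during matching."""
    px, py = proj_xy
    return [p for p in map_points
            if math.hypot(p[0] - px, p[1] - py) <= radius]
```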
Optionally, determining the best matching feature point set from the matching feature points using the random sample consensus algorithm includes:

randomly selecting, from the matching feature point set formed by the matching feature points, matching feature points for calculating the pose estimate of the current frame, to obtain a current matching feature point subset;

calculating the current pose based on the mapping between spatial position information and pixel position information established by the matching feature points in the current matching feature point subset, to obtain a fitted pose estimate of the current matching feature point subset;

obtaining the spatial positions of all feature points in the current frame according to the fitted pose estimate and the camera intrinsic parameters, to obtain the projected spatial positions of all feature points;

for each matching feature point in the matching feature point set, judging, according to the projected spatial position of the matching feature point, whether the distance between the projection of the matching feature point in the current frame and the map point matched by the matching feature point in the map is less than a set second distance threshold, and if so, determining that the matching feature point is an inlier; repeating this judgment until all matching feature points in the current matching feature point set have been judged;

counting the current number of inliers, and judging whether it is the largest among all iterations so far; if so, taking the set formed by the current inliers as the current best matching feature point set; and

judging whether an end condition is reached; if so, taking the current best matching feature point set as the final best matching feature point set, and if not, returning to the step of randomly selecting, from the matching feature point set formed by the matching feature points, matching feature points for calculating the pose estimate of the current frame.
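The loop above is a standard RANSAC inlier-selection scheme. A minimal sketch with a fixed iteration count as the end condition; `fit_pose` and `project` are hypothetical stand-ins for the patent's pose solver and projection, and each match is a (feature, map_point) pair:

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def ransac_best_set(matches, fit_pose, project, dist_thresh,
                    sample_size=4, iterations=100):
    """Sample a subset, fit a pose, project every matched feature point,
    count inliers by distance to the matched map point, and keep the
    largest inlier set seen across all iterations."""
    best = []
    for _ in range(iterations):
        subset = random.sample(matches, min(sample_size, len(matches)))
        pose = fit_pose(subset)                     # fitted pose estimate
        inliers = [(f, mp) for f, mp in matches
                   if dist(project(pose, f), mp) < dist_thresh]
        if len(inliers) > len(best):                # largest inlier count so far
            best = inliers
    return best
```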
The determining of whether the best matching feature point set satisfies the condition for solving the pose of the current frame includes:
determining, according to the number of matching feature points, whether the best matching feature point set satisfies the condition for solving the pose of the current frame;
or,
assigning, to each of at least two best matching feature point sets, a weight that measures how well the current frame matches a candidate map range, the weight being determined by one of, or any combination of: the number of matching feature points in the best matching feature point set, the number of feature points extracted from the current frame, the distribution of the feature points, and the initial number of matching feature points together with the number of matching feature points in the best matching feature point set; and
determining, according to a set weight threshold and the maximum weight, whether the best matching feature point set satisfies the condition for solving the pose of the current frame.
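The select-project-count-keep-best loop described above is a RANSAC-style inlier search. The following is a minimal sketch using planar 2D poses and a two-point rigid fit in place of the patent's full pose estimation; the function names, the sample size, and the thresholds are illustrative assumptions, not from the patent.

```python
import random
import numpy as np

def fit_pose(a, b):
    """Illustrative 2D rigid pose (tx, ty, theta) mapping points a onto b,
    estimated from two correspondences."""
    va, vb = a[1] - a[0], b[1] - b[0]
    th = np.arctan2(vb[1], vb[0]) - np.arctan2(va[1], va[0])
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    t = b[0] - R @ a[0]
    return t[0], t[1], th

def count_inliers(pose, pts_frame, pts_map, dist_thresh):
    """Project the frame points with the candidate pose and keep those whose
    projection lands within dist_thresh of the matched map point (the
    'second distance threshold' of the text)."""
    tx, ty, th = pose
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    proj = pts_frame @ R.T + np.array([tx, ty])
    dist = np.linalg.norm(proj - pts_map, axis=1)
    return np.flatnonzero(dist < dist_thresh)

def best_matching_set(pts_frame, pts_map, dist_thresh=0.05, iters=200, seed=0):
    """Iterate: random minimal sample -> pose estimate -> inlier count;
    keep the largest inlier set seen over all iterations."""
    rng = random.Random(seed)
    best = np.array([], dtype=int)
    for _ in range(iters):
        sample = rng.sample(range(len(pts_frame)), 2)
        pose = fit_pose(pts_frame[sample], pts_map[sample])
        inliers = count_inliers(pose, pts_frame, pts_map, dist_thresh)
        if len(inliers) > len(best):
            best = inliers
    return best
```

The end condition in practice may be a fixed iteration count, as here, or an adaptive bound on the probability of having missed a better sample.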
Optionally, after the pose of the current frame is calculated from the matching feature points and the positioning result is obtained, the method further includes:
applying sliding-window-based nonlinear optimization to the calculated pose of the current frame, where the optimized variables are the poses of the image frames in the sliding window (which includes the current frame), and the optimization constraints are inter-frame matching constraints between feature points of the current frame and feature points of the previous keyframe, and/or map matching constraints between feature points of the current frame and map points in the map; and
minimizing the inter-frame matching error and/or the map matching error by the least squares method, to obtain the optimized pose of the current frame as the positioning result;
wherein,
the map matching constraint is the error between the pixel position at which a first matching map point is back-projected onto the current frame and the pixel position of the first matching feature point in the current frame that matches that map point, or the error between the spatial position in the world coordinate system to which the first matching feature point in the current frame is projected and the spatial position in the world coordinate system of the first matching map point that it matches; a first matching feature point is a feature point of the current frame that has been successfully matched against a map point in the map, and a first matching map point is a map point successfully matched by a first matching feature point;
the inter-frame matching constraint is the error between the spatial position in the world coordinate system to which the first matching feature point in the current frame is projected and the spatial position in the world coordinate system to which the second matching feature point, i.e. the feature point in the previous keyframe of the current frame that matches the first matching feature point, is projected, or the error between the pixel position at which a second matching map point, the map point matched by the second matching feature point, is back-projected onto the current frame and the pixel position at which that same map point is back-projected onto the previous keyframe.
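In symbols (notation ours, not the patent's): write $T_c$ for the current-frame pose, $T_k$ for the previous keyframe's pose, $u_c$ and $u_k$ for the matched pixel positions in the two frames, $\pi(\cdot)$ for camera projection onto the image, and $\pi^{-1}(T, u)$ for back-projection of pixel $u$ into the world under pose $T$. The two constraints, each in its two equivalent forms, can then be sketched as:

```latex
% Map matching constraint, for a first matching feature point u_c
% matched to first matching map point P_1 (image form / world form):
e_{map}^{img}   = \pi\!\left(T_c^{-1} P_1\right) - u_c
e_{map}^{world} = \pi^{-1}(T_c, u_c) - P_1

% Inter-frame matching constraint, where u_k is the second matching
% feature point and P_2 its second matching map point:
e_{frame}^{world} = \pi^{-1}(T_c, u_c) - \pi^{-1}(T_k, u_k)
e_{frame}^{img}   = \pi\!\left(T_c^{-1} P_2\right) - \pi\!\left(T_k^{-1} P_2\right)
```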
After the current image is acquired to obtain the current frame, and before feature point extraction is performed on the current frame to obtain its feature points, the method includes performing image preprocessing on the current frame.
Optionally, the minimizing of the inter-frame matching error and/or the map matching error by the least squares method to obtain the optimized pose of the current frame includes:
constructing an objective function for the optimization, the objective function being the accumulated sum of a first result and a second result, where the first result is obtained by weighting, with a first weight, the sum of the map matching errors of all first matching feature points of all frames in the current sliding window, and the second result is obtained by weighting, with a second weight, the sum of the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous keyframe;
taking, as the initial value of the map matching error, the map matching error computed from the pose of the current frame, the spatial position information of the first matching map point, the camera intrinsics, and the pixel coordinates of the first matching feature point in the current frame that matches the first matching map point;
taking, as the initial value of the inter-frame matching error, the inter-frame matching error computed from the pose of the current frame, the spatial position information of the second matching map point, the pose of the previous keyframe, and the camera intrinsic matrix; and
iteratively solving for the pose of the current frame at which the objective function attains its minimum, to obtain the optimized current pose.
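As a toy version of this weighted least-squares problem, the sketch below optimizes a planar pose against combined map and inter-frame terms with Gauss-Newton and a numerical Jacobian. The 2D parameterization, the weights, and all names are illustrative stand-ins for the patent's full formulation, not the method itself.

```python
import numpy as np

def residuals(pose, pts_cur, pts_map, pts_prev, w_map=1.0, w_frame=1.0):
    """Stacked residual: sqrt(w1) * (map matching errors) and
    sqrt(w2) * (inter-frame errors), both in world coordinates."""
    tx, ty, th = pose
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    proj = pts_cur @ R.T + np.array([tx, ty])   # frame points in the world
    r_map = np.sqrt(w_map) * (proj - pts_map)
    r_frame = np.sqrt(w_frame) * (proj - pts_prev)
    return np.concatenate([r_map.ravel(), r_frame.ravel()])

def gauss_newton(pose0, *args, iters=30, eps=1e-6):
    """Minimize the summed squared residuals by Gauss-Newton with a
    forward-difference Jacobian."""
    pose = np.asarray(pose0, dtype=float)
    for _ in range(iters):
        r = residuals(pose, *args)
        J = np.empty((r.size, 3))
        for j in range(3):
            step = np.zeros(3)
            step[j] = eps
            J[:, j] = (residuals(pose + step, *args) - r) / eps
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        pose = pose + delta
        if np.linalg.norm(delta) < 1e-12:
            break
    return pose
```

A production system would instead use an analytic Jacobian over the full window of 6-DoF poses and a robust solver, but the structure of the objective is the same.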
After the optimized pose of the current frame is obtained, the method further includes:
determining that the current frame is a keyframe when either of the following conditions is met: the number of first matching feature points in the current frame is less than a first threshold, or the number of second matching feature points in the current frame is less than a second threshold;
if the current frame is not a keyframe, deleting the current frame from the sliding window; if the current frame is a keyframe, determining whether the number of frames in the current sliding window reaches a set first frame-count threshold, and if it does, deleting the earliest-added keyframe from the sliding window, and if it does not, keeping the earliest-added keyframe in the sliding window.
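The window-maintenance rule above can be outlined as follows; the threshold values, the list representation, and the function name are illustrative assumptions.

```python
def prune_window(window, n_map_matches, n_frame_matches,
                 kf_thresh1=30, kf_thresh2=30, max_frames=10):
    """Call after optimizing; `window` is a list of frames ending with the
    current frame. Returns True when the current frame was kept as a keyframe."""
    is_keyframe = n_map_matches < kf_thresh1 or n_frame_matches < kf_thresh2
    if not is_keyframe:
        window.pop()               # non-keyframe: drop the current frame
    elif len(window) >= max_frames:
        window.pop(0)              # keyframe and window full: drop the oldest
    return is_keyframe
```

Note the inversion: a frame with *few* matches becomes a keyframe, because it carries view content not yet well covered by the existing keyframes and map.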
The performing of image preprocessing on the current frame includes:
undistorting the current frame according to the distortion coefficients of the camera, to obtain an undistorted image;
determining whether the pixel value of each pixel in the undistorted image is greater than a first pixel threshold; if so, inverting the pixels whose values exceed the first pixel threshold and then applying image filtering to obtain a background image, and otherwise applying image filtering to the undistorted image to obtain the background image;
subtracting the background image from the undistorted image to obtain a foreground image; and
determining whether the pixel values of the foreground image are uniformly distributed; if they are, taking the foreground image as the preprocessed current frame, and if they are not, stretching the foreground image to obtain the preprocessed current frame.
Optionally, the stretching of the foreground image includes:
if a foreground-image pixel value is less than or equal to a set minimum gray value, setting that pixel to the minimum of the pixel value range;
if a foreground-image pixel value is greater than the minimum gray value and less than a set maximum gray value, setting that pixel to a value in proportion to the maximum pixel value, the proportion being the ratio of the difference between the foreground pixel value and the minimum gray value to the difference between the maximum and minimum gray values; and
if a foreground-image pixel value is greater than or equal to the maximum gray value, setting that pixel to the maximum of the pixel value range.
The performing of feature point extraction based on the current frame to obtain the feature points of the current frame includes:
performing feature detection on the current frame to obtain feature points;
dividing the current frame into a predetermined number of grid cells;
for the feature points in any grid cell, sorting them in descending order of feature response value and retaining the top Q, to obtain the filtered feature points, where Q is determined by the number of feature points in the target image frame, the set upper limit on the total number of feature points, and the total number of feature points in that grid cell; and
computing a feature descriptor for each of the filtered feature points.
The present application provides a visual positioning apparatus based on a visual map, the apparatus including:
an image acquisition module, which acquires the current image to obtain the current frame;
a feature extraction module, which performs feature point extraction based on the current frame to obtain the feature points of the current frame; and
a positioning module, which determines a positioning strategy according to the current positioning state, matches, based on the positioning strategy, the feature points of the current frame against the map points in the map to obtain matching feature points, and, when the matching feature points satisfy the condition for solving the pose of the current frame, calculates the pose of the current frame from the matching feature points to obtain the positioning result.
The present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the visual map-based visual positioning methods described above.
The present application further provides a mobile robot including a memory and a processor, the memory storing a computer program and the processor being configured to execute the computer program so as to implement the steps of any of the visual map-based visual positioning methods described above.
The present application further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the steps of any of the visual map-based visual positioning methods described above.
The present application further provides a computer program which, when run on a computer, causes the computer to execute the steps of any of the visual map-based visual positioning methods described above.
In the visual map-based visual positioning method provided by the present application, feature matching and pose calculation for the current frame are performed according to a positioning strategy determined by the current positioning state, so no precisely surveyed calibration points need to be set up. Moreover, compared with the prior art, because different positioning strategies are used in different positioning states, positioning can be carried out in every state, giving better robustness in unmapped areas or after a positioning failure. At the same time, whereas prior-art schemes based on calibration points can only position accurately on images captured at the calibration points, the present scheme keeps matching against the map throughout the positioning process, so the positioning result does not jump abruptly.
Description of the drawings
To describe the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings required by the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a schematic flowchart of the visual positioning of this embodiment.
Fig. 2 is a schematic diagram of feature point filtering.
Fig. 3 is a schematic diagram of the transitions between positioning states.
Fig. 4 is a schematic diagram of the positioning logic.
Fig. 5 is a schematic flowchart of initialization positioning.
Fig. 6 is a schematic diagram of map point filtering.
Fig. 7 is a schematic flowchart of normal positioning.
Fig. 8 is a schematic flowchart of relocalization.
Fig. 9a and Fig. 9b are schematic diagrams of determining a candidate map range.
Fig. 10 is a schematic diagram of the visual map-based visual positioning apparatus of the present application.
Fig. 11 is a schematic diagram of the image preprocessing module.
Fig. 12 is a schematic diagram of the map matching constraints and inter-frame matching constraints of the current frame in the image coordinate system.
Fig. 13 is a schematic diagram of the map matching constraints and inter-frame matching constraints of the current frame in the world coordinate system.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application fall within the protection scope of the present application.
The present application determines the positioning logic from the different positioning strategies adopted in different positioning states, so that the positioning state transitions whenever the positioning logic is satisfied; the strategies differ in how pose prediction and feature matching are handled, which increases the robustness of positioning as a whole. The following description is based on visual positioning over a visual map, where the visual map is a pre-built feature map whose map points carry three-dimensional spatial information; that is, the map stores the world coordinates and descriptor information of feature points, so the map points may also be called the feature points of the map. The descriptor information of a feature point is information describing the image characteristics of that point, such as color or texture features, and is used during feature matching; it may also be called a feature descriptor, or simply a descriptor.
For ease of understanding, in this embodiment the visual map is, by way of example, a texture map built from collected ground texture information; it may be a two-dimensional or three-dimensional point cloud map with feature descriptors, and its coverage may be continuous or discrete. A three-dimensional point cloud map is taken as the example below.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the visual positioning of this embodiment. After loading the texture map, the mobile robot executes the following steps.
Step 101: acquire the current image to obtain the current frame.
Step 102: perform image preprocessing on the current frame so that the texture in the image becomes salient, including but not limited to one or more of the optional processes of image undistortion, image filtering, and image enhancement. Step 102 is optional and depends on image quality; for example, whether to add preprocessing may be decided by whether the current frame has already been undistorted and whether its texture is already salient. Note that if it is determined from image quality that the current frame needs no preprocessing, feature point extraction can be performed directly on the acquired current frame to obtain its feature points; the process of extracting feature points directly from the current frame is the same as extracting them from the preprocessed current frame.
Optionally, a flow of image preprocessing of the current frame includes the following steps.
Step 1021: undistort the current frame according to the distortion coefficients of the camera, obtaining an undistorted image I(u, v), where u and v are pixel coordinates.
Step 1022: determine whether the pixel value of each pixel in the undistorted image is greater than a set first pixel threshold. If it is, invert the pixels whose values exceed the first pixel threshold and then filter the inverted undistorted image; if it is not, filter the undistorted image I(u, v) directly; either way, the filtering yields a background image I_b(u, v). Taking a color or grayscale image as an example, the inversion operation means using 255 minus the original pixel value as the new pixel value; compared with the original image, the inverted image is reversed in black and white, light and dark.
Step 1023: subtract the background image from the undistorted image to obtain a foreground image I_f(u, v), expressed mathematically as:
I_f(u, v) = I(u, v) - I_b(u, v)
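Steps 1022 and 1023 amount to background subtraction: low-pass filter the (possibly inverted) image to estimate the slowly varying background, then subtract it to leave the texture foreground. A numpy sketch under simplifying assumptions: a box mean filter stands in for the unspecified filter, and the whole image is inverted when its brightest pixel exceeds the threshold, rather than pixel by pixel as the text describes.

```python
import numpy as np

def box_filter(img, k=15):
    """Mean filter of window k x k via a padded 2-D cumulative sum."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def foreground(undistorted, pixel_thresh=200, k=15):
    """Invert bright images, estimate the background by filtering the
    working image, and subtract it: I_f = I - I_b."""
    work = undistorted.astype(float)
    if work.max() > pixel_thresh:       # simplified whole-image test
        work = 255.0 - work             # inversion: 255 minus the pixel value
    return work - box_filter(work, k)
```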
Step 1024: determine whether the pixel values of the foreground image I_f(u, v) are uniformly distributed. If they are, take the foreground image as the preprocessed current frame; if they are not, stretch the foreground image to obtain the preprocessed current frame.
Here, whether the pixel values of an image are uniformly distributed indicates whether they span the whole grayscale range, which prevents the loss of contrast caused by the image being too dark or overexposed. Exemplarily, in one implementation, the grayscale histogram of the image is computed and the number of pixels in each grayscale interval is counted; if the gray values are concentrated in high or low intervals, the pixel values are judged to be non-uniformly distributed, and if every grayscale interval contains pixels, the pixel values are judged to be uniformly distributed.
Exemplarily, the stretching of the foreground image may be performed as follows.
When a foreground pixel value is less than or equal to the minimum gray value, that pixel takes the minimum of the pixel value range, i.e. the pixel value 0. The minimum and maximum gray values may be preset gray-value thresholds; alternatively, the minimum gray value may be the smallest gray value among the foreground pixels and the maximum gray value the largest.
When a foreground pixel value is greater than the minimum gray value and less than the maximum gray value, the contrast of that pixel is increased: optionally, the pixel is set to a value in proportion to the maximum pixel value, the proportion optionally being the ratio of the difference between the foreground pixel value and the minimum gray value to the difference between the maximum and minimum gray values, that is, the difference between the foreground pixel value and the minimum gray value divided by the difference between the maximum and minimum gray values.
When a foreground pixel value is greater than or equal to the maximum gray value, that pixel takes the maximum of the pixel value range, e.g. the maximum pixel value 255.
Mathematically, the stretched foreground image I_f'(u, v) is expressed as:

$$I_f'(u,v)=\begin{cases}0, & I_f(u,v)\le I_{\min}\\[4pt] 255\cdot\dfrac{I_f(u,v)-I_{\min}}{I_{\max}-I_{\min}}, & I_{\min}<I_f(u,v)<I_{\max}\\[4pt] 255, & I_f(u,v)\ge I_{\max}\end{cases}$$

where I_min is the minimum gray value, I_max is the maximum gray value, and the pixel value range in the formula above is 0 to 255.
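The piecewise stretch above is a standard linear contrast stretch and vectorizes to a single clipped expression; a sketch follows (the function name and the float handling are ours).

```python
import numpy as np

def stretch(fg, i_min, i_max):
    """Map [i_min, i_max] linearly onto 0..255; values at or below i_min
    clamp to 0, values at or above i_max clamp to 255."""
    fg = fg.astype(float)
    out = 255.0 * (fg - i_min) / (i_max - i_min)
    return np.clip(out, 0.0, 255.0)
```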
Step 103: extract the image feature points of the preprocessed current frame and compute descriptors from them, obtaining the feature points and descriptors of the current frame; that is, extract the feature points of the preprocessed current frame and, for each feature point, compute its descriptor. The descriptor takes the same form as the descriptors of the map points in the map. Exemplarily, features such as ORB (Oriented FAST and Rotated BRIEF), Scale-Invariant Feature Transform (SIFT), or SURF (Speeded Up Robust Features) may be used; for example, if the map points are ORB feature points, the feature points of the current frame are also ORB feature points. ORB is a feature built by optimizing FAST (Features from Accelerated Segment Test) keypoints and BRIEF descriptors, adding an orientation computation to FAST and rotation invariance to BRIEF.
Optionally, the image feature points of the preprocessed current frame are extracted in the same way as the feature points of the constructed map were formed.
In this embodiment, exemplarily, ORB feature points may be used.
Since the collected ground texture images are usually stable in scale, there is no need to build multi-scale features to enhance scale invariance; the construction of the pyramid image can therefore be dropped and features extracted directly from the current frame image, which is equivalent to extracting from the source image, reducing the computation required for feature extraction and improving efficiency. A pyramid image, also called an image pyramid, is a collection of images at different resolutions derived from the same image. Since uniform and salient feature points reduce the positioning error in camera pose computation, the extracted feature points can be filtered to improve the performance and efficiency of positioning. Fig. 2 is a schematic diagram of feature point filtering. Exemplarily, taking the FAST feature as an example: after FAST (Features from Accelerated Segment Test) features are extracted, the current frame can be divided into a predetermined number of grid cells, and in each cell the Q feature points with the highest FAST response values are retained, yielding the filtered feature points.
Here Q is determined by the number of feature points in the target image frame, the set upper limit on the total number of feature points, and the total number of feature points in the grid cell, so different cells retain different numbers of feature points. For example, if the upper limit on the total number of feature points is set to 100 and the current frame has 2000 feature points, one feature point is selected out of every 20 in that frame; if a grid cell contains 20 feature points, that cell retains 1 feature point, i.e. Q = 1. The determination of Q is expressed mathematically as:
$$Q=\left\lfloor \frac{N_{cell}\cdot N_{\max}}{N_{frame}} \right\rfloor$$

where N_cell is the total number of feature points in the grid cell, N_max is the set upper limit on the total number of feature points, N_frame is the number of feature points in the frame, and the symbol ⌊·⌋ denotes rounding down.
Here, the FAST response value is the score of FAST corner extraction, i.e. how strongly the point scores as a FAST corner; the higher the score, the more salient the FAST feature. Other types of feature points have analogous score (response) values measuring how salient the point is as a feature. FAST can be regarded as a corner extraction algorithm, with a FAST corner defined as follows: if a pixel differs sufficiently from enough pixels in its surrounding neighborhood, that pixel may be a corner. Corners are important image features: points where some attribute is especially prominent, such as isolated points where some attribute reaches its maximum or minimum intensity, or the endpoints of line segments. They play an important role in the understanding and analysis of images: while preserving the important features of the image, they effectively reduce the amount of data, keep the information content high, speed up computation, and favor reliable image matching, making real-time processing possible.
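The per-cell filtering with the Q formula above can be sketched as follows; the grid assignment and all names are illustrative.

```python
import numpy as np

def filter_by_grid(points, responses, img_w, img_h, grid=8, n_max=100):
    """Keep, in each grid cell, the Q strongest-response points, with
    Q = floor(cell_count * n_max / frame_count)."""
    frame_count = len(points)
    cell_w, cell_h = img_w / grid, img_h / grid
    cells = (np.minimum(points[:, 0] // cell_w, grid - 1) * grid
             + np.minimum(points[:, 1] // cell_h, grid - 1)).astype(int)
    keep = []
    for c in np.unique(cells):
        idx = np.flatnonzero(cells == c)
        q = int(len(idx) * n_max // frame_count)      # floor, per the formula
        if q:
            strongest = idx[np.argsort(-responses[idx])][:q]
            keep.extend(strongest.tolist())
    return sorted(keep)
```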
步骤104,鉴于在定位的过程中,如果遇到当前位置的地面纹理未曾建图,或地面纹理发生改变等情况,那么当前帧与地图点不能正常匹配,这时就会出现某一帧定位丢失的情况。因此,在定位过程中根据当前不同的定位状态,在当前帧采用不同的定位策略进行定位。即: Step 104, in view of the fact that in the process of positioning, if the ground texture of the current position has not been mapped, or the ground texture has changed, then the current frame and the map point cannot be properly matched, and then there will be a frame positioning loss. Case. Therefore, in the positioning process, different positioning strategies are used for positioning in the current frame according to different current positioning states. which is:
According to the current positioning state, determine the positioning strategy corresponding to that state. The positioning states are the uninitialized state, the positioning-success state, and the relocation state; the corresponding strategies are initialization positioning in the uninitialized state, normal positioning in the positioning-success state, and relocation in the relocation state, together with the transition relationships between the states. The relocation state may also be called the positioning-lost state. Initialization positioning, normal positioning, and relocation are described below in connection with the specific positioning process for the current frame.
Referring to FIG. 3, FIG. 3 is a schematic diagram of the transition relationships between positioning states, in which:
when the current positioning state is the uninitialized state, if initialization positioning succeeds, transition to the positioning-success state; if it fails, remain in the uninitialized state;
when the current positioning state is the positioning-success state, if normal positioning succeeds, remain in the positioning-success state; if it fails, transition to the positioning-lost state;
when the current positioning state is the positioning-lost state, if relocation succeeds, transition to the positioning-success state; if relocation fails, determine whether the number of consecutive frames with failed positioning exceeds a set frame-count threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds a set first distance threshold; if either threshold is exceeded, transition to the uninitialized state, otherwise remain in the positioning-lost state.
Taking the current frame as an example: when positioning with the current frame, if the current positioning state is the uninitialized state, initialization positioning is performed on the current frame; on success the state transitions to the positioning-success state, on failure it remains uninitialized. If the current positioning state is the positioning-success state, normal positioning is performed on the current frame; on success the state remains positioning-success, on failure it transitions to positioning-lost. If the current positioning state is the positioning-lost state, relocation is performed on the current frame; on success the state transitions to positioning-success, and on failure it is determined whether the number of consecutive frames with failed positioning exceeds the set frame-count threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold; if either threshold is exceeded, the state transitions to uninitialized, otherwise it remains positioning-lost.
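The transition rules above form a small state machine. The following sketch encodes them directly; the state names, threshold values, and parameter names are illustrative placeholders, not taken from the patent:

```python
UNINITIALIZED, SUCCESS, LOST = "uninitialized", "success", "lost"

def next_state(state, localized, lost_frames=0, drift=0.0,
               max_lost_frames=30, max_drift=2.0):
    """One transition of the positioning state machine described above.
    `localized`: whether the strategy run in `state` succeeded on this frame.
    `lost_frames`: consecutive frames with failed positioning.
    `drift`: distance from the last successfully localized pose."""
    if state == UNINITIALIZED:                 # initialization positioning
        return SUCCESS if localized else UNINITIALIZED
    if state == SUCCESS:                       # normal positioning
        return SUCCESS if localized else LOST
    # state == LOST: relocation
    if localized:
        return SUCCESS
    if lost_frames > max_lost_frames or drift > max_drift:
        return UNINITIALIZED                   # give up and re-initialize
    return LOST
```

Each incoming frame thus runs the strategy belonging to the current state, then updates the state from the result.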
Based on the transition relationships between positioning states, a positioning logic flow can be derived, whose content is: perform on the current frame the positioning process indicated by the strategy corresponding to the current positioning state. Referring to FIG. 1 and FIG. 4, FIG. 4 is a schematic diagram of the positioning logic, including:
Step 1041: determine whether the current frame is the first frame.
If the current frame is the first frame, there is no positioning result from a previous frame, so the current positioning state is determined to be the uninitialized state and initialization positioning is performed. If positioning succeeds (i.e., the match succeeds in FIG. 1), record that the current frame was positioned successfully and transition to the positioning-success state; if it fails, record the failure and remain in the uninitialized state. In other words, if the current frame is the first frame, the current positioning state is determined to be uninitialized, initialization positioning for the uninitialized state is performed on the current frame, and the current positioning state is updated according to the transition relationships based on the current frame's positioning result.
If it is not the first frame, go to step 1042.
Step 1042: determine whether the previous frame was positioned successfully.
If the previous frame was positioned successfully, the current positioning state is determined to be the positioning-success state and normal positioning is performed. If positioning succeeds (i.e., the match succeeds in FIG. 1), record that the current frame was positioned successfully and remain in the positioning-success state; if it fails, record the failure and transition to the positioning-lost state. In other words, if the previous frame was positioned successfully, normal positioning for the positioning-success state is performed on the current frame, and the current positioning state is updated according to the transition relationships based on the current frame's positioning result.
If the positioning of the previous frame failed, go to step 1043.
Step 1043: determine whether the current number of consecutive frames with failed positioning exceeds the frame-count threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold.
If the number of consecutive frames with failed positioning exceeds the frame-count threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold, the current positioning state is determined to be the uninitialized state and initialization positioning is performed. On success (i.e., the match succeeds in FIG. 1), record that the current frame was positioned successfully and transition to the positioning-success state; on failure, record the failure and remain in the uninitialized state. That is, the current state is determined to be uninitialized, initialization positioning for the uninitialized state is performed on the current frame, and the current positioning state is updated according to the transition relationships based on the current frame's positioning result. If neither the frame-count threshold nor the first distance threshold is exceeded, the current state is determined to be the relocation state and relocation is performed. On success (i.e., the match succeeds in FIG. 1), record that the current frame was positioned successfully and transition to the positioning-success state; on failure, record the failure, remain in the positioning-lost state, and return to step 1043. That is, the current positioning state is determined to be positioning-lost, relocation for the positioning-lost state is performed on the current frame, and the current positioning state is updated according to the transition relationships based on the current frame's positioning result.
Specifically:
Initialization positioning is the strategy used for the first frame, or when the number of consecutive frames with failed positioning exceeds the threshold. Since there is no positioning result from a previous frame, accurate pose prediction is impossible. During initialization positioning, a first candidate map range is typically obtained by searching map points in the global map or by using auxiliary information, and the feature points of the current frame are matched against the map points of the global map or of the first candidate map range; if the match succeeds, the pose of the current frame is computed from the matched feature points to obtain the positioning result of the current frame. Note that when performing initialization positioning on the current frame, all map points in the map may also be taken as the first candidate map range, or auxiliary information may be used to obtain the first candidate map range. The map points within the first candidate map range are then screened by brute-force matching: if the feature similarity of any two map points exceeds a set second matching threshold, one of the two is deleted at random, yielding a corrected first candidate map range. The feature points in the current frame are then matched against the map points within this range, and on success the pose of the current frame is computed from the matched feature points to obtain the positioning result. Here, brute-force matching means pairwise matching: every possible match is computed and screened.
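The brute-force screening of the candidate map range can be sketched as follows. The descriptor encoding (unit vectors compared by dot product), the threshold value, and the random-seed handling are all illustrative assumptions, not specified by the patent:

```python
import numpy as np

def prune_ambiguous_map_points(descs, match_thresh=0.9, seed=0):
    """All-pairs (brute-force) screening of candidate map points: if two map
    points' descriptors are more similar than `match_thresh`, one of the pair
    is deleted at random, so near-duplicate map points cannot produce
    ambiguous matches during initialization positioning.
    `descs`: array of unit-norm descriptors, one row per map point.
    Returns the indices of the map points that are kept."""
    rng = np.random.default_rng(seed)
    removed = set()
    for i in range(len(descs)):
        if i in removed:
            continue
        for j in range(i + 1, len(descs)):
            if j in removed:
                continue
            if float(descs[i] @ descs[j]) > match_thresh:
                removed.add(int(rng.choice([i, j])))   # drop one of the pair
                if i in removed:
                    break
    return [k for k in range(len(descs)) if k not in removed]
```

Matching the current frame's feature points against the pruned range then proceeds pairwise in the same exhaustive fashion.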
Normal positioning is the strategy used when the previous frame was positioned successfully. In that case the current frame's pose can be predicted fairly accurately through pose prediction, so during feature matching a second candidate map range is determined from a first neighborhood of the predicted pose, and the feature points of the current frame are matched against the map points of the second candidate map range; if the match succeeds, the pose of the current frame is computed from the matched feature points to obtain the positioning result of the current frame.
Relocation is the strategy used when positioning was lost in the previous frame. In that case, the most recent frame before the current one whose positioning was not lost is traced back and used as a reference frame, and the pose of the current frame is predicted from it; that is, the most recent successfully positioned historical frame closest to the current frame serves as the reference frame, and pose prediction for the current frame is performed from the inter-frame motion information between the reference frame and the current frame, yielding the predicted pose of the current frame. During feature matching, a third candidate map range is determined from a second neighborhood of the predicted pose, and the feature points of the current frame are matched against the map points of the third candidate map range by brute-force matching, which helps guarantee the robustness of relocation; the method is, of course, not limited to brute-force matching. If relocation matching succeeds, the pose of the current frame is computed from the matched feature points to obtain the positioning result of the current frame.
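The pose prediction used by relocation (compose the reference frame's pose with the accumulated inter-frame motion) can be sketched for a ground robot with planar poses (x, y, theta). The SE(2) representation is an illustrative assumption; the patent does not fix a pose parameterization:

```python
import numpy as np

def predict_pose(ref_pose, rel_motion):
    """Predict the current-frame pose by composing the reference frame's pose
    (the last successfully localized frame) with the frame-to-frame motion
    accumulated since it, both given as SE(2) tuples (x, y, theta).
    `rel_motion` is expressed in the reference frame's own axes."""
    xr, yr, tr = ref_pose
    dx, dy, dt = rel_motion
    c, s = np.cos(tr), np.sin(tr)
    return (xr + c * dx - s * dy,   # rotate the motion into world axes
            yr + s * dx + c * dy,
            tr + dt)
```

The second neighborhood of this predicted pose then bounds the third candidate map range for matching.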
Optionally, the range of the first neighborhood is smaller than the range of the second neighborhood.
It should be emphasized that the pose of the current frame is the pose of the mobile robot that captured the current frame, and the positioning result of the current frame is the mobile robot's current positioning result. The pose of the mobile robot includes position and orientation.
Moreover, normal positioning and relocation both follow the processing flow of pose prediction, feature matching, and pose computation; they differ in the specific pose-prediction and feature-matching methods. Pose prediction and feature matching can be regarded as two stages of the process of matching the feature points in the current frame against the map points in the map to obtain matched feature points. Initialization positioning, by contrast, comprises only feature matching and pose computation, and its feature-matching method differs from those of normal positioning and relocation; for initialization positioning, feature matching is the single stage of that matching process. In other words, for the process of matching the current frame's feature points against the map points to obtain matched feature points, the three modes (initialization positioning, normal positioning, and relocation) use different matching methods; but once the matched feature points satisfy the conditions for solving the current frame's pose, the concrete procedure of computing the current frame's pose from the matched feature points to obtain the positioning result can be the same in all three modes.
Referring to FIG. 1, in step 105, graph optimization is performed on the successfully positioned current-frame pose, so as to optimize the positioning result and improve positioning accuracy.
Graph optimization here may also be called pose-graph optimization; it optimizes the positioning result so that the output is more accurate and smooth, improving the accuracy and robustness of the whole system. Optionally, the optimization is a sliding-window nonlinear optimization: the optimized variables are the poses of the image frames within the sliding window, the current frame is added to the window, and the optimization constraints are inter-frame matching constraints and/or map matching constraints. Using the LM (Levenberg-Marquardt) method for nonlinear least squares, the inter-frame matching error and/or map projection error is minimized to obtain the optimized current-frame pose. The map projection error is also called the map matching error; the error derived from a map matching constraint is the map matching error, and the error derived from an inter-frame matching constraint is the inter-frame matching error. Specifically:
The map matching constraint is the error between two pixel positions: the pixel position at which a first matched map point is back-projected onto the current frame, and the pixel position in the current frame of the first matched feature point that matches that map point; or, equivalently, the error between two spatial positions: the spatial position in the world coordinate system to which a first matched feature point of the current frame is projected, and the spatial position in the world coordinate system of the first matched map point that matches that feature point. A first matched feature point is a feature point of the current frame that was successfully matched against a map point in the map; a first matched map point is the map point successfully matched by a first matched feature point.
The inter-frame matching constraint is the error between two spatial positions: the spatial position in the world coordinate system to which a first matched feature point of the current frame is projected, and the spatial position in the world coordinate system to which the second matched feature point (the point in the previous key frame of the current frame that matches the first matched feature point) is projected; or, equivalently, the error between two pixel positions: the pixel position at which the second matched map point (the map point matching the second matched feature point) is back-projected onto the current frame, and the pixel position at which that second matched map point is back-projected onto the previous key frame. In addition, to keep the map from growing too large, some frames may be designated key frames and saved into the map; the previous key frame is then the designated key frame whose capture time is closest to that of the current frame. The criterion for whether a video frame can serve as a key frame is described below, in the discussion of whether the current frame can serve as a key frame.
Referring to FIG. 12, FIG. 12 is a schematic diagram of the map matching constraint and inter-frame matching constraint of the current frame. The second matched feature points are a subset of the set of first matched feature points, and the second matched map points matching them are a subset of the set of first matched map points. The error between the pixel position at which a first matched map point is back-projected onto the current frame and the pixel position of the first matched feature point matching that map point constitutes the map matching constraint; the error between the pixel position at which a second matched map point is back-projected onto the current frame and the pixel position at which it is back-projected onto the previous key frame of the current frame constitutes the inter-frame matching constraint.
It should be understood that, although the map matching constraints and inter-frame matching constraints in the embodiments of the present application are determined in the image coordinate system, they may also be determined in the world coordinate system.
Referring to FIG. 13, FIG. 13 is a schematic diagram of the map matching constraint and inter-frame matching constraint of the current frame in the world coordinate system. The error between the spatial position to which a first matched feature point of the current frame is projected in the world coordinate system and the spatial position in the world coordinate system of the first matched map point matching it constitutes the map matching constraint; the error between the spatial position to which a first matched feature point of the current frame is projected in the world coordinate system and the spatial position to which the second matched feature point (the point in the previous key frame of the current frame that matches the first matched feature point) is projected in the world coordinate system constitutes the inter-frame matching constraint. The spatial position to which a matched feature point is projected in the world coordinate system is obtained according to the camera model, from the camera intrinsics, the pixel position of the matched feature point, and the pose of the frame containing it.
Optionally, the first matched feature points are the feature points in the best-matched feature point set.
Illustratively, in one implementation, the map matching constraint is the error between the spatial position to which a first matched feature point of the current frame is projected in the world coordinate system and the spatial position in the world coordinate system of the first matched map point matching it, and the inter-frame matching constraint is the error between the pixel position at which a second matched map point (matching a second matched feature point) is back-projected onto the current frame and the pixel position at which that second matched map point is back-projected onto the previous key frame. Correspondingly,
minimizing the inter-frame matching error and/or map matching error by least squares to obtain the optimized current-frame pose includes constructing an objective function for the optimization. That objective function is: a first result, obtained as the sum of the map matching errors of all first matched feature points of all frames in the current sliding window, and/or a second result, obtained as the sum of the inter-frame matching errors of all second matched map points between each frame in the current sliding window and its previous key frame. That is, the objective function may be the first result alone, the second result alone, or the sum of the first and second results.
In addition, from the map matching constraint above, the map matching error of all first matched feature points of any frame is the error between the spatial position to which each first matched feature point of that frame is projected in the world coordinate system and the spatial position in the world coordinate system of the first matched map point matching it; and from the inter-frame matching constraint above, the inter-frame matching error of all second matched map points between any frame and its previous key frame is the error between the pixel position at which each second matched map point (matching a second matched feature point) is back-projected onto that frame and the pixel position at which it is back-projected onto the previous key frame, where the second matched feature points are the feature points in the previous key frame of that frame that match the first matched feature points of that frame.
Taking the case where the optimization constraints are the inter-frame matching constraints and the map matching constraints as an example, the objective function is expressed mathematically as:
min_{ {R_i, t_i} }  Σ_{i=1}^{I} ( Σ_{k=1}^{K} || e_{ik-map} ||² + Σ_{m=1}^{M} || e_{ijm-frame} ||² )
where
e_{ik-map} = p_{ik} − K(R_i X_k + t_i),   e_{ijm-frame} = K(R_i X_m + t_i) − K(R_j X_m + t_j)
e_{ik-map} is the map matching error between the pixel position at which the first matched map point k is back-projected onto the current frame i and the pixel position in frame i of the first matched feature point k that matches that map point; p_{ik} is the pixel coordinate of the first matched feature point k in the current frame i, K is the camera intrinsic matrix, X_k is the three-dimensional coordinate of the first matched map point k in the map, and R_i, t_i are the pose of the current frame i obtained from the first matched feature points.
e_{ijm-frame} is the back-projection error of the second matched map point m between the current frame i and the previous key frame j, i.e., the inter-frame matching error; X_m is the three-dimensional coordinate of the second matched map point m matched by the second matched feature points in the current frame i and the previous key frame j; R_i, t_i are the pose of the current frame i, and R_j, t_j are the pose of the previous key frame j, which can be obtained from the second matched feature points.
I为滑动窗口中帧的总数,K为当前帧内第一匹配特征点的总数,j为滑动窗口中各帧的上一关键帧,M为当前帧中所反投影的第二匹配地图点总数。I is the total number of frames in the sliding window, K is the total number of first matching feature points in the current frame, j is the previous key frame of each frame in the sliding window, M is the total number of second matching map points back projected in the current frame .
In the objective function used for optimization,

the map matching constraint obtained from the pose of the current frame i, the three-dimensional coordinates of the first matching map point k in the map, the camera intrinsic matrix K, and the pixel coordinates of the first matching feature point k in the current frame i is substituted as the initial value of the map matching constraint;

the inter-frame matching constraint obtained from the pose of the current frame i, the three-dimensional coordinates of the second matching map point m in the map, the pose of the previous key frame, and the camera intrinsic matrix K is substituted as the initial value of the inter-frame matching constraint; and

the pose of the current frame that minimizes the objective function is solved for iteratively, and that pose is taken as the current positioning result.
Further, different weights can be assigned to the map matching constraint and the inter-frame matching constraint when constructing the objective function, to further improve positioning accuracy. In this case, the objective function for optimization is: the sum of a first result, obtained by weighting with a first weight the sum of the map matching errors of all first matching feature points of all frames in the current sliding window, and a second result, obtained by weighting with a second weight the sum of the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame. That is, the objective function is the sum of the first result weighted by the first weight and the second result weighted by the second weight, expressed mathematically as:
$$\min_{R_{i},\,t_{i}}\ \sum_{i=1}^{I}\left(\gamma_{1}\sum_{k=1}^{K}\left\|e_{ik\text{-}map}\right\|^{2}+\gamma_{2}\sum_{m=1}^{M}\left\|e_{ijm\text{-}frame}\right\|^{2}\right)$$
where γ_1 and γ_2 are the weights.
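As an illustration only (not the implementation of the present application), the two error terms and the weighted objective can be sketched numerically as follows; the intrinsic matrix, the poses, the points, and the weights are all assumed example values:

```python
import numpy as np

# Illustrative camera intrinsic matrix K (assumed values).
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0,   0.0,   1.0]])

def backproject(K, R, t, X):
    # pixel position of 3-D point X in a frame with pose (R, t): K(RX + t), dehomogenized
    ph = K @ (R @ X + t)
    return ph[:2] / ph[2]

def map_matching_error(p_ik, K, R_i, t_i, X_k):
    # e_ik-map = p_ik - K(R_i X_k + t_i)
    return p_ik - backproject(K, R_i, t_i, X_k)

def interframe_error(K, R_i, t_i, R_j, t_j, X_m):
    # e_ijm-frame = K(R_i X_m + t_i) - K(R_j X_m + t_j)
    return backproject(K, R_i, t_i, X_m) - backproject(K, R_j, t_j, X_m)

def weighted_cost(map_errors, frame_errors, gamma1, gamma2):
    # gamma1 * sum of ||e_map||^2 plus gamma2 * sum of ||e_frame||^2
    return (gamma1 * sum(float(e @ e) for e in map_errors)
            + gamma2 * sum(float(e @ e) for e in frame_errors))
```

In practice these residuals would be fed to an iterative least-squares solver over the frame poses; the sketch only shows how the cost is assembled.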
In addition, when iteratively solving for the pose of the current frame, the poses of all of the constraint frames can also be treated as variables to be optimized.
After the optimization, the current sliding window is maintained, that is, it is determined whether the current frame can serve as a key frame and whether it should remain in the sliding window, so as to further improve the accuracy of positioning, i.e., the accuracy of subsequent positioning using image frames. Specifically:

The current frame is determined to be a key frame when one of the following conditions is met: the number of first matching feature points in the current frame is less than a first threshold, or the number of second matching feature points in the current frame is less than a second threshold.

If the current frame is not a key frame, the current frame is deleted from the sliding window. If the current frame is a key frame, it is determined whether the number of frames in the current sliding window has reached a set first frame threshold; if the first frame threshold is reached, the earliest added key frame in the sliding window is deleted, and if it is not reached, the earliest added key frame is retained.
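The key-frame decision and sliding-window maintenance described above can be sketched as follows; the thresholds and window size are assumed example values, not values from the present application:

```python
FIRST_THRESHOLD = 30    # assumed first threshold on first matching feature points
SECOND_THRESHOLD = 20   # assumed second threshold on second matching feature points
MAX_WINDOW_FRAMES = 5   # assumed "first frame threshold" on window length

def is_keyframe(num_first_matches, num_second_matches):
    # the current frame is a key frame when either match count is below its threshold
    return (num_first_matches < FIRST_THRESHOLD
            or num_second_matches < SECOND_THRESHOLD)

def update_window(window, frame_id, num_first, num_second):
    window = window + [frame_id]         # current frame enters the window
    if not is_keyframe(num_first, num_second):
        window.remove(frame_id)          # non-key frame: delete the current frame
    elif len(window) >= MAX_WINDOW_FRAMES:
        window.pop(0)                    # window full: delete the earliest key frame
    return window
```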
Each positioning strategy involved in step 104 is described below.

Referring to FIG. 5, FIG. 5 is a schematic flowchart of initialization positioning. After image preprocessing and feature point extraction of the current frame, initialization positioning includes the following steps.
Step 501: match the feature points of the current frame with the map points in the map. When the matching succeeds, first matching feature points are obtained, and the obtained first matching feature points constitute a first matching feature point set. In this way, for each first matching feature point, the spatial position information of the first matching map point matched with it is determined as the spatial position information of that first matching feature point, thereby obtaining a match between the two-dimensional feature points of the current frame and the three-dimensional map points in the map.
The matching can be performed as follows: for any feature point of the current frame, calculate whether the matching degree between the descriptor of that feature point and the descriptor of a map point in the map is less than a set first matching threshold; if it is less than the set first matching threshold, the pair is determined to match, and if it is not, the pair is determined not to match. The matching degree can be described by the Hamming distance, in which case the matching threshold is a Hamming distance threshold.
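The descriptor matching with a Hamming-distance threshold can be sketched as follows; this is a minimal sketch assuming binary descriptors packed into Python integers (for example 256-bit ORB-style descriptors), and the threshold value is illustrative:

```python
HAMMING_THRESHOLD = 40  # assumed first matching threshold

def hamming(a, b):
    # number of differing bits between two binary descriptors
    return bin(a ^ b).count("1")

def match_feature(feat_desc, map_descs):
    # return the index of the closest map descriptor if its distance is below
    # the threshold, otherwise None (no match)
    best_idx, best_dist = None, HAMMING_THRESHOLD
    for idx, d in enumerate(map_descs):
        dist = hamming(feat_desc, d)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx
```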
In step 501, the map points in the map may be all of the map points in the map, in which case brute-force matching is used, that is, every possible pairing is evaluated when performing pairwise matching.
Optionally, in order to reduce the amount of matching computation, auxiliary information is used to obtain a candidate map range, i.e., the above-mentioned first candidate map range, and the feature points in the current frame are matched against the map points within the first candidate map range. For example, easily identifiable positions can be set as the candidate map range, such as path start and end positions, turning positions, and intersection positions. Taking a grid path map as an example, the intersection and corner positions of all paths can be set as the candidate map range; correspondingly, the auxiliary information can be the position information of these easily identifiable positions.
Further, since actual downward-looking texture sites vary, similar or repeated textures (for example, carpets or floor tiles) are very likely to occur. In view of this, the map points within the candidate map range can be screened to prevent map points with similar or identical features. Map point screening can be performed by brute-force matching, that is, matching the features of every pair of map points; if the matching degree exceeds a set second matching threshold, the two map points are very similar, and one of them is deleted at random to avoid the risk of mismatching during positioning initialization. Referring to FIG. 6, FIG. 6 is a schematic diagram of map point screening, in which the dotted circles are the map points removed by screening and the solid circles are the candidate map positions.
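The pairwise screening of similar map points can be sketched as follows; for testability this sketch deterministically drops the later of two similar points, whereas the text deletes one at random, and the similarity function and threshold are assumptions:

```python
def screen_map_points(points, similarity, threshold):
    # compare every candidate against the points already kept; drop a point
    # whose similarity to a kept point exceeds the second matching threshold
    kept = []
    for p in points:
        if all(similarity(p, q) <= threshold for q in kept):
            kept.append(p)
    return kept
```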
Step 501 is repeated until all feature points of the current frame have been matched.
Step 502: since some mismatched points may exist when feature point matching relies only on the descriptors of the feature points, it is preferable to screen for the best first matching feature points by a suitable method, to improve the accuracy of the matched feature points and thus the accuracy of the pose of the current frame. Therefore, in step 502, based on the first matching feature points, a random sample consensus (RANSAC) algorithm can be used to determine the best matching feature point set, which specifically includes the following steps.
Step 5021: from the first matching feature point set obtained in step 501, randomly select matching feature points for computing the pose estimate of the current frame, obtaining a current matching feature point subset.

Step 5022: compute the current pose based on the mapping between spatial position information and pixel position information established by the matching feature points in the current matching feature point subset, thereby obtaining a fitted pose estimate for this subset. That is, the current pose is computed from this mapping, and the computed pose serves as the fitted pose estimate of the subset.
In step 5022, computing the current pose includes, but is not limited to, the following methods: perspective-n-point PnP (2D-3D), 2D iterative closest point 2D-ICP (2D-2D), 3D iterative closest point 3D-ICP (3D-3D), and the homography matrix H (2D-2D).

Take computing the pose through the homography matrix as an example.

Since the mobile robot moves in a plane, the spatial position coordinates lie in a common plane z = 0, so the product of the homography matrix and the spatial position coordinate matrix corresponds to the pixel coordinate matrix, expressed mathematically as:
$$\begin{pmatrix}u\\ v\\ 1\end{pmatrix}=\begin{pmatrix}h_{1}&h_{2}&h_{3}\\ h_{4}&h_{5}&h_{6}\\ h_{7}&h_{8}&h_{9}\end{pmatrix}\begin{pmatrix}x\\ y\\ 1\end{pmatrix}$$
The homography matrix has 8 degrees of freedom, and the value of each of its elements can be solved from the correspondence between the spatial positions and pixel positions of 4 first matching feature points. By performing singular value decomposition (SVD) on the homography matrix, the corresponding rotation matrix R and translation vector t can be obtained, yielding the fitted pose estimate.
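The 8-degree-of-freedom homography solution from 4 correspondences can be sketched by direct linear transform as follows; this sketch fixes h9 = 1, omits the SVD decomposition of H into R and t for brevity, and all point values are illustrative:

```python
import numpy as np

def solve_homography(src, dst):
    # src: 4 planar points (x, y) with z = 0; dst: their pixel positions (u, v).
    # Each correspondence contributes two linear equations in h1..h8 (h9 fixed to 1).
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    # map a planar point through H and dehomogenize
    q = H @ np.array([pt[0], pt[1], 1.0])
    return q[:2] / q[2]
```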
Step 5023: since the fitted pose estimate is obtained from the matching feature points in the subset, the inlier rate needs to be computed in order to assess whether the other matching feature points in the first matching feature point set also conform to the currently computed fitted pose estimate. In step 5023, the spatial positions of all feature points in the current frame are obtained according to the fitted pose estimate and the camera intrinsics, giving the spatial positions of the projection points of all feature points.

Using the camera projection model, a two-dimensional pixel coordinate point can be mapped to a three-dimensional coordinate point; herein, this mapping is called projection. Conversely, the mapping from a three-dimensional coordinate point to a two-dimensional coordinate point is called back-projection. That is, the spatial position of a feature point refers to the spatial position of its projection point: the feature point itself exists in the image coordinate system, while its projection point exists in the world coordinate system and its spatial position is obtained by projection.

In step 5023, all first matching feature points in the current frame are projected to three-dimensional spatial positions, i.e., projected into the map, as the spatial positions of the projection points. Thus, for any first matching feature point i in the current frame, its three-dimensional space coordinates can be obtained.
This is expressed mathematically as:

$$p_{i}=K\left(RX_{i}+t\right)$$

where p_i is the pixel coordinate of the first matching feature point i in the current frame, R and t are the current fitted pose estimate, X_i is the three-dimensional space coordinate of the projection of the first matching feature point i into the map, i.e., of the projection point, and K is the camera intrinsic matrix.
Step 5024: for each first matching feature point in the first matching feature point set, determine whether the distance between the projection point of that first matching feature point in the current frame and the map point it matches in the map is less than a set second distance threshold; if it is, the first matching feature point is determined to be an inlier. In this determination, the spatial position of the projection point of the first matching feature point can be used.

Step 5024 is repeated until all first matching feature points have been tested for being inliers.
Step 5025: count the current number of inliers, and take the ratio of the current inlier count to the number of first matching feature points as the inlier rate. The larger the ratio, the higher the inlier rate, indicating a better fit, a better fitted pose estimate, and a better random selection of matching feature points.
Step 5026: determine whether the currently counted number of inliers is the largest over all iterations so far. If it is, take the set formed by the current inliers as the current best matching feature point set and then perform step 5027; if it is not, do not update the current best matching feature point set and perform step 5027 directly.

Step 5027: determine whether an end condition is met. If it is, perform step 503; if it is not, return to step 5021 to randomly select a new matching feature point subset for fitted pose estimation, thereby looping from fitted pose estimation to confirmation of the best matching feature point set, i.e., looping through steps 5021 to 5027.
The end condition includes at least one of the following:

1) the inlier rate meets a preset condition;

2) the number of iterations meets a preset condition. In order that, at confidence η, at least one random selection during the iterative loop picks m points that are all inliers, which helps the loop attain the best value of the fitted pose estimate at least once, the number of iterations α should satisfy:
$$\alpha\ \geq\ \frac{\log\left(1-\eta\right)}{\log\left(1-\varepsilon^{m}\right)}$$
where m is the size of the subset, i.e., the number of matching feature points in the subset, and the confidence is generally set in the range 0.95 to 0.99. ε is the inlier rate; in general, ε is unknown, so the worst-case inlier proportion can be used, or ε can be initialized to the worst-case proportion and then continuously updated to the current maximum inlier rate as the iterations proceed.
3) the probability that a subset consists entirely of inliers meets the required confidence. Specifically, each selected subset is regarded as a Bernoulli trial with the two outcomes "all inliers" or "not all inliers", the probability of the former being p = ε^m. When p is sufficiently small, the process can be approximated by a Poisson distribution; therefore, over i iterations, the probability of selecting θ subsets that are entirely inliers can be expressed as:
$$p\left(\theta,\lambda\right)=\frac{\lambda^{\theta}}{\theta!}\,e^{-\lambda}$$
where λ denotes the expected number of times a subset that is entirely inliers is selected over the i iterations.

For example, suppose the probability that none of the subsets selected over these i iterations is entirely inliers should be less than a given confidence bound, i.e., p(0, λ) = e^{-λ} < 1 - η. At a confidence of 95%, λ is approximately equal to 3, meaning that at 95% confidence, on average 3 "good" subsets can be selected over the i iterations.
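The two stopping-condition formulas above can be checked numerically as follows; the parameter values used in the accompanying test are illustrative:

```python
import math

def ransac_iterations(eta, eps, m):
    # smallest integer alpha with (1 - eps**m)**alpha <= 1 - eta, i.e.
    # alpha >= log(1 - eta) / log(1 - eps**m)
    return math.ceil(math.log(1.0 - eta) / math.log(1.0 - eps ** m))

def poisson(theta, lam):
    # p(theta, lam) = lam**theta * exp(-lam) / theta!
    return (lam ** theta) * math.exp(-lam) / math.factorial(theta)

# With 95% confidence, requiring p(0, lam) = exp(-lam) < 1 - 0.95 gives lam ~ 3:
lam_95 = -math.log(1.0 - 0.95)
```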
Step 503: determine whether the best matching feature point set satisfies the conditions for solving the pose of the current frame.

If the conditions for solving the current frame pose are satisfied, the pose of the current frame is computed; computing the current pose includes, but is not limited to, the following methods: perspective-n-point PnP (2D-3D), 2D iterative closest point 2D-ICP (2D-2D), 3D iterative closest point 3D-ICP (3D-3D), and the homography matrix H (2D-2D).

If the conditions for solving the current frame pose are not satisfied, it is determined that positioning of the current frame has failed. It should be noted that, when the best matching feature point set is not used, that is, when, based on the positioning strategy, the feature points in the current frame are matched with the map points in the map to obtain matching feature points, it is possible to directly determine whether the matching feature points satisfy the conditions for solving the current frame pose and, if so, to compute the pose of the current frame from the matching feature points to obtain the positioning result, where the ways of computing the pose of the current frame from the matching feature points include, but are not limited to: perspective-n-point PnP (2D-3D), 2D iterative closest point 2D-ICP (2D-2D), 3D iterative closest point 3D-ICP (3D-3D), and the homography matrix H (2D-2D).
In step 503, in one implementation, given that different pose computation methods require different conditions for pose solving (taking the homography matrix H (2D-2D) as an example, at least 4 matching feature points are required to solve for the pose), it is determined whether the number of matching feature points in the best matching feature point set satisfies the condition for pose solving.

In another implementation, when there are multiple best matching feature point sets due to multiple candidate map ranges, a unique best matching feature point set needs to be determined, so as to judge, based on the matching between the current frame and the multiple candidate map ranges, whether some candidate map range can satisfy the condition for a successful match. Specifically:
According to the number of matching feature points in each best matching feature point set, i.e., the number of inliers, each best matching feature point set is assigned a weight measuring the degree of matching between the current frame and the corresponding candidate map range. The weight can also be determined from one, or any combination, of: the number of feature points extracted from the current frame, the distribution of the feature points, the initial number of matching feature points, and the number of matching feature points in the best matching feature point set (the number of inliers obtained by screening).

According to a set weight threshold and the maximum weight, the unique best matching feature point set is determined, whereby the best matching feature point set is judged to satisfy the condition for pose solving. For example, the weight threshold is combined with the unique maximum weight: the best matching feature point sets whose weights exceed the weight threshold are filtered out, and the unique best matching feature point set is then selected from them on the principle that the difference between the largest and second-largest weights is maximal.
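The weight-threshold plus unique-maximum selection can be sketched as follows; for simplicity this sketch only requires the largest weight to strictly exceed the second largest, and the weights and threshold in the test are assumed example values:

```python
def select_unique_best(weights, weight_threshold):
    # weights: candidate-map-range index -> weight of its best matching set
    passed = {i: w for i, w in weights.items() if w > weight_threshold}
    if not passed:
        return None                      # no candidate passes the threshold
    ranked = sorted(passed.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]              # unique maximum weight found
    return None                          # ambiguous: no unique best set
```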
Referring to FIG. 7, FIG. 7 is a schematic flowchart of normal positioning. After image preprocessing and feature point extraction of the current frame, normal positioning includes the following steps.

Step 701: since the positioning result of the previous frame has been obtained, predict the pose of the current frame according to the inter-frame motion information from the previous frame to the current frame, obtaining a predicted pose, so that the predicted pose can be used to determine the second candidate map range and thereby improve matching efficiency.
The pose prediction methods include the following.

Implementation 1: obtain the inter-frame pose transformation from the previous frame to the current frame through a wheel odometer or an inertial measurement unit (IMU), and compute the predicted pose of the current frame from the positioning result of the previous frame and the inter-frame pose transformation.

Implementation 2: obtain the inter-frame pose transformation from the previous frame to the current frame through visual odometry (VO), and compute the predicted pose of the current frame from the positioning result of the previous frame and the inter-frame pose transformation. This implementation requires only image information and no additional inertial information.

Implementation 3: predict the inter-frame pose transformation from the previous frame to the current frame from several historical frames for which positioning results have been obtained, and compute the predicted pose of the current frame from the positioning result of the previous frame and the inter-frame pose transformation. This implementation does not rely on any information from the current frame.
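Implementations 1 to 3 all compose the positioning result of the previous frame with a measured or predicted inter-frame pose transformation. For a planar robot this composition can be sketched in SE(2) as follows; this is an illustrative sketch, not the implementation of the present application:

```python
import math

def compose(pose, delta):
    # pose, delta: (x, y, theta) in SE(2); returns the previous-frame pose
    # composed with the inter-frame transformation, i.e. the predicted pose
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)
```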
Implementation 4: use at least two of Implementations 1 to 3 to compute first predicted poses of the current frame, obtaining at least two first predicted poses; then either filter the at least two first predicted poses with a Kalman filter to obtain a filtered second predicted pose, which is taken as the final predicted pose of the current frame, or use a nonlinear optimization method to optimize over the at least two first predicted poses to obtain an optimized second predicted pose, which is taken as the final predicted pose of the current frame.
For example, a nonlinear optimization objective function is constructed from the error terms obtained from the respective implementations, expressed mathematically as:
$$\min_{\xi_{j}}\ \sum_{s=1}^{S}\left\|e_{ijs}\right\|^{2}$$
where e_ijs denotes the error term when implementation s is used,

$$e_{ijs}=\ln\left(\left(T_{i}\,\Delta T_{ij}\right)^{-1}T_{j}\right)^{\vee}$$

with ΔT_ij taken from implementation s. T_i denotes the pose of the previous frame i, i.e., the positioning result of the previous frame i; T_j denotes the first predicted pose of the current frame j; ΔT_ij denotes the inter-frame pose transformation from the previous frame i to the current frame j; ξ_j is the Lie algebra representation of the predicted pose of the current frame; and S is the total number of implementations used.

The pose T_i of the previous frame i, the first predicted poses T_j of the current frame j obtained by the different implementations, and the inter-frame pose transformations ΔT_ij are substituted into the objective function as initial values, and the pose that minimizes the objective function is solved for.
Step 702: determine the second candidate map range according to the predicted pose of the current frame, and match the feature points in the current frame with the map points within the second candidate map range; when the matching succeeds, third matching feature points are obtained, and step 703 is then performed.

In step 702, in one implementation, referring to FIG. 9a, the map position determined by the predicted pose of the current frame is taken as the center, and the first neighborhood of this center is determined as the second candidate map range. The matching can be brute-force, that is, the descriptor distances between all feature points of the current frame and the feature points within the candidate map range are computed exhaustively, and the feature point with the smallest descriptor distance is selected as the matching feature point. This method is suitable when the confidence of the predicted pose is low or the number of feature points to be matched is small.
In another implementation, referring to FIG. 9b, for each feature point of the current frame, the second candidate map range of that feature point is determined according to its projection range. That is, according to the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point of the feature point in the map is computed; the first neighborhood centered on this projection point position serves as the second candidate map range of the feature point; the current-frame feature point is matched against the map points within this range; and the feature point pair with the smallest descriptor distance is selected as the matching feature point. The formula for computing the projection point position from the predicted pose of the current frame and the pixel coordinates of the feature point can be the expression for p_i given in the description of step 5023.

Preferably, the first neighborhood can be the map area covered by a set radius centered on that center.

Step 702 is repeated until all feature points of the current frame have been matched.
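Restricting the matching to a neighborhood of set radius can be sketched as follows; this is an illustrative sketch, and the coordinates and radius in the test are example values:

```python
def candidate_map_points(center, map_points, radius):
    # keep only map points whose planar distance to the position given by the
    # predicted pose is within the set radius (the candidate map range)
    cx, cy = center
    return [p for p in map_points
            if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2]
```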
Step 703: based on the third matching feature points, use the random sample consensus algorithm to determine the best matching feature point set; step 703 is the same as step 502.

Step 704: determine whether the number of matching feature points in the best matching feature point set satisfies the conditions for solving the current frame pose; if it does, the pose of the current frame is computed, and if it does not, it is determined that positioning of the current frame has failed. Step 704 is the same as step 503.
Referring to FIG. 8, FIG. 8 is a schematic flowchart of relocalization. After image preprocessing and feature point extraction of the current frame, relocalization includes the following steps.
Step 801: since positioning of the previous frame failed, the most recent successfully positioned historical frame is traced back and taken as a reference frame, and the pose of the current frame is predicted from the inter-frame motion information from the reference frame to the current frame. The resulting predicted pose is used to determine a third candidate map range, thereby improving matching efficiency.
The pose prediction method is the same as that in step 701, with the reference frame simply treated as the previous frame.
Step 802: a third candidate map range is determined according to the predicted pose of the current frame, and the feature points in the current frame are matched against the map points within the third candidate map range. When the matching succeeds, fourth matching feature points are obtained, and step 803 is executed.
In step 802, one embodiment takes the map position determined by the predicted pose of the current frame as a center and determines a second neighborhood of that center as the third candidate map range; brute-force matching may be used to improve the robustness of relocalization.
In another embodiment, for each feature point of the current frame, the third candidate map range of that feature point is determined according to the projection range of the feature point. That is, the position of the projection point of the feature point in the map is calculated according to the predicted pose of the current frame and the pixel coordinates of the feature point, a second neighborhood centered on the projection point position is taken as the third candidate map range of that feature point, the current-frame feature points are matched against the map points within the third candidate map range, and the feature point pair with the smallest descriptor distance is selected as a matching feature point.
Optionally, the second neighborhood may be the map area covered by a set radius around the center.
Step 802 is repeated until all feature points of the current frame have been matched.
Step 803: based on the fourth matching feature points, a random sample consensus algorithm is used to determine the best matching feature point set. This step 803 is the same as step 502.
Step 804: it is determined whether the number of matching feature points in the best matching feature point set satisfies the condition for pose solving. If the condition is satisfied, the pose of the current frame is calculated; otherwise, positioning of the current frame is determined to have failed.
When relocalization succeeds in calculating the pose of the current frame, since the best matching feature point set is determined through a reference frame, the probability of a large error is relatively high; preferably, the Levenberg-Marquardt (LM) optimization method is used for the solution.
For positioning with a texture map, the visual positioning method provided in this embodiment requires no modification of the ground, unlike traditional two-dimensional code positioning: natural textures suffice, at lower cost. The positioning logic determined by the positioning strategies of the different positioning states enhances adaptability to local texture changes or local texture loss. Whether the map is a texture map or a non-texture visual map, no precisely calibrated landmark points need to be set in advance, and image acquisition is simple. During positioning, the current frame can be continuously matched against the map without large jumps, and within a partially mapped area the device can be positioned along any path rather than only along the mapped path, which improves the robustness of the positioning process. The differences among the positioning strategies in pose prediction and feature matching improve both the accuracy and the efficiency of positioning.
Referring to FIG. 10, FIG. 10 is a schematic diagram of a visual-map-based visual positioning apparatus of the present application. The apparatus includes:
an image acquisition module 1001, configured to acquire a current image to obtain a current frame;
a feature extraction module 1002, configured to extract feature points from the current frame to obtain the feature points of the current frame; and
a positioning module 1003, configured to determine a positioning strategy according to the current positioning state; based on the positioning strategy, determine a candidate map range in the map for the current frame; match the feature points in the current frame against the map points within the candidate map range to obtain matching feature points; and, when the matching feature points satisfy the condition for solving the pose of the current frame, calculate the pose of the current frame from the matching feature points to obtain a positioning result.
The positioning module 1003 includes:
a positioning state submodule 1004, configured to determine the current positioning state according to the positioning result of the previous frame and the current number of consecutive positioning-failure frames;
a positioning strategy submodule 1005, configured to determine a positioning strategy according to the current positioning state, the positioning strategies including initialization positioning in the uninitialized state, normal positioning in the positioning-success state, relocalization in the relocalization state, and the transition relations among the positioning states;
a positioning logic submodule 1006, configured to generate positioning logic according to the positioning states and the transition relations among them, where the logical content of the positioning logic includes executing, on the current frame, the positioning process indicated by the positioning strategy corresponding to the current positioning state;
a matching positioning submodule 1007, configured to, based on the positioning logic and following the positioning process indicated by the positioning strategy of each state, determine a candidate map range in the map for the current frame, match the feature points in the current frame against the map points within the candidate map range to obtain matching feature points, and calculate the pose of the current frame from the matching feature points; and
a pose graph optimization submodule 1008, configured to perform graph optimization on the pose of the current frame. Optionally, the positioning module 1003 is further configured to, after obtaining the matching feature points, determine a best matching feature point set from the matching feature points using a random sample consensus algorithm, determine whether the best matching feature point set satisfies the condition for solving the pose of the current frame, and determine that positioning of the current frame has failed when it does not.
Optionally, the transition relations among the positioning states include:
when the current positioning state is the uninitialized state, transitioning to the positioning-success state if initialization positioning succeeds, and remaining in the uninitialized state if it fails;
when the current positioning state is the positioning-success state, remaining in the positioning-success state if normal positioning succeeds, and transitioning to the positioning-lost state if it fails; and
when the current positioning state is the positioning-lost state, transitioning to the positioning-success state if relocalization succeeds; if relocalization fails, determining whether the number of consecutive positioning-failure frames exceeds a set frame-number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds a set first distance threshold; transitioning to the uninitialized state if either threshold is exceeded; and otherwise remaining in the positioning-lost state.
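The three states and the transitions above can be summarized as a small state machine. The state names, function signature, and threshold defaults below are illustrative assumptions, not identifiers from the patent.

```python
# Hypothetical state labels for the three positioning states.
UNINITIALIZED, LOCATED, LOST = "uninitialized", "located", "lost"

def next_state(state, success, lost_frames=0, drift=0.0,
               max_lost_frames=10, max_drift=5.0):
    """Return the next positioning state. `lost_frames` counts consecutive
    positioning failures and `drift` is the distance from the pose of the
    most recent successful positioning; both thresholds are illustrative
    stand-ins for the frame-number threshold and first distance threshold."""
    if state == UNINITIALIZED:
        # initialization positioning either succeeds or keeps waiting
        return LOCATED if success else UNINITIALIZED
    if state == LOCATED:
        # normal positioning; a failure moves to the positioning-lost state
        return LOCATED if success else LOST
    # state == LOST: a relocalization attempt was made
    if success:
        return LOCATED
    if lost_frames > max_lost_frames or drift > max_drift:
        return UNINITIALIZED  # give up on relocalization, re-initialize
    return LOST
```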
The matching positioning submodule 1007 includes:
a candidate map determination unit 1009, configured to, based on the positioning logic and following the positioning process indicated by the positioning strategy of each state, determine a candidate map range in the map for the current frame;
a feature matching unit 1010, configured to match the feature points in the current frame against the map points within the candidate map range; and
a pose calculation unit 1011, configured to calculate the pose of the current frame from the matching feature points.
Optionally, the positioning logic submodule 1006 is specifically configured to:
determine whether the current frame is the first frame;
if the current frame is the first frame, perform initialization positioning on the current frame;
if the current frame is not the first frame, determine whether positioning of the previous frame succeeded;
if positioning of the previous frame succeeded, perform normal positioning on the current frame;
if positioning of the previous frame failed, determine whether the current number of consecutive positioning-failure frames exceeds the set frame-number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold;
if the number of consecutive positioning-failure frames exceeds the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold, perform initialization positioning on the current frame; and
if neither threshold is exceeded, perform relocalization on the current frame.
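The decision flow above can be sketched as a single selection function. The strategy labels and threshold defaults are illustrative.

```python
def choose_strategy(is_first_frame, prev_success, lost_frames, drift,
                    max_lost_frames=10, max_drift=5.0):
    """Pick which positioning procedure to run on the current frame,
    following the positioning-logic submodule's decision tree. Threshold
    values are illustrative stand-ins for the frame-number threshold and
    the first distance threshold."""
    if is_first_frame:
        return "initialize"          # first frame: initialization positioning
    if prev_success:
        return "normal"              # previous frame located: normal positioning
    if lost_frames > max_lost_frames or drift > max_drift:
        return "initialize"          # lost too long or drifted too far
    return "relocalize"              # otherwise attempt relocalization
```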
Optionally, the positioning logic submodule 1006 is further configured to, before performing initialization positioning on the current frame when the current frame is the first frame, determine that the current positioning state is the uninitialized state according to the current frame being the first frame;
if initialization positioning of the current frame succeeds, record the positioning success of the current frame and transition to the positioning-success state; if it fails, record the positioning failure of the current frame and remain in the uninitialized state.
The positioning logic submodule 1006 is further configured to, before performing normal positioning on the current frame when positioning of the previous frame succeeded, determine that the current positioning state is the positioning-success state according to the success of the previous frame; if normal positioning of the current frame succeeds, record the positioning success and remain in the positioning-success state; if it fails, record the positioning failure and transition to the positioning-lost state.
The positioning logic submodule 1006 is further configured to:
before performing initialization positioning on the current frame when the number of consecutive positioning-failure frames exceeds the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold,
determine that the current positioning state is the uninitialized state according to the frame-number threshold or the first distance threshold being exceeded; and
if initialization positioning of the current frame succeeds, record the positioning success and transition to the positioning-success state; if it fails, record the positioning failure and remain in the uninitialized state.
The positioning logic submodule 1006 is further configured to:
before performing relocalization on the current frame when the number of consecutive positioning-failure frames does not exceed the frame-number threshold and the distance between the current frame and the pose of the most recent successful positioning does not exceed the set first distance threshold,
determine that the current positioning state is the positioning-lost state accordingly; and
if relocalization of the current frame succeeds, record the positioning success and transition to the positioning-success state; if it fails, record the positioning failure, remain in the positioning-lost state, and return to the step of determining whether the number of consecutive positioning-failure frames exceeds the set frame-number threshold or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold.
Optionally, the matching positioning submodule 1007 is specifically configured to:
when performing initialization positioning on the current frame, take all map points in the map as the first candidate map range, or obtain the first candidate map range using auxiliary information; and screen the map points within the first candidate map range using brute-force matching, where, if the feature matching degree between any two map points exceeds a set second matching threshold, one of the two map points is randomly deleted to obtain a corrected first candidate map range;
when performing normal positioning on the current frame, predict the pose of the current frame from the inter-frame motion information from the previous frame to the current frame to obtain the predicted pose of the current frame, and determine the second candidate map range in the map according to the predicted pose of the current frame; and
when performing relocalization on the current frame, trace back among the successfully positioned historical frames to the frame closest to the current frame and take it as a reference frame, treat the reference frame as the previous frame, predict the pose of the current frame from the inter-frame motion information from that previous frame to the current frame to obtain the predicted pose of the current frame, and determine the third candidate map range in the map according to the predicted pose of the current frame.
Optionally, the matching positioning submodule 1007 predicts the pose of the current frame from the inter-frame motion information from the previous frame to the current frame in one of the following ways:
a first way: obtaining the inter-frame pose transformation from the previous frame to the current frame through a wheel odometer or an inertial measurement unit, and obtaining the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame; or
a second way: obtaining the inter-frame pose transformation from the previous frame to the current frame through visual odometry, and obtaining the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame; or
a third way: predicting the inter-frame pose transformation from the previous frame to the current frame from the historical frames whose positioning results have been obtained, and obtaining the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame; or
a fourth way: using at least two of the first, second, and third ways to separately obtain first predicted poses of the current frame, yielding at least two first predicted poses;
filtering the at least two first predicted poses with a Kalman filter to obtain a filtered second predicted pose, which is taken as the final predicted pose of the current frame; or optimizing based on the at least two first predicted poses with a nonlinear optimization method to obtain an optimized second predicted pose, which is taken as the final predicted pose of the current frame;
where the objective function of the nonlinear optimization is the sum of the error terms obtained from the respective ways, and optimizing based on the at least two first predicted poses with the nonlinear optimization method includes: substituting the pose of the previous frame, the first predicted poses of the current frame obtained in the different ways, and the inter-frame pose transformations as initial values into the objective function of the nonlinear optimization, and solving for the pose that minimizes the objective function as the second predicted pose.
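A minimal sketch of the fourth way: when each error term is a simple quadratic on the difference from one source's prediction, the minimizer of the summed objective is a weighted mean, with a circular mean for the heading. The per-source weights are an illustrative assumption; a real system would run the full Kalman filter or an iterative nonlinear solver as described above.

```python
import numpy as np

def fuse_predictions(preds, weights):
    """Fuse several first predicted poses (x, y, theta) into one second
    predicted pose by minimizing a weighted sum of quadratic error terms.
    `preds` is an (n, 3) array of per-source predictions; `weights` are
    illustrative confidences (e.g. higher for wheel odometry on smooth
    floors, higher for visual odometry on textured ground)."""
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    xy = (w[:, None] * preds[:, :2]).sum(axis=0)  # weighted mean position
    # circular mean for the heading, so angles near ±pi fuse correctly
    theta = np.arctan2((w * np.sin(preds[:, 2])).sum(),
                       (w * np.cos(preds[:, 2])).sum())
    return np.array([xy[0], xy[1], theta])
```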
The matching positioning submodule 1007 is specifically configured to:
take the map position determined by the predicted pose of the current frame as a center and determine a first neighborhood of that center as the second candidate map range;
or
for each feature point of the current frame, calculate the position of the projection point of the feature point in the map according to the predicted pose of the current frame and the pixel coordinates of the feature point, and take a first neighborhood centered on the projection point position as the second candidate map range of that feature point.
Determining the third candidate map range in the map according to the predicted pose of the current frame includes:
taking the map position determined by the predicted pose of the current frame as a center and determining a second neighborhood of that center as the third candidate map range;
or
for each feature point of the current frame, calculating the position of the projection point of the feature point in the map according to the predicted pose of the current frame and the pixel coordinates of the feature point, and taking a second neighborhood centered on the projection point position as the third candidate map range of that feature point.
The range of the second neighborhood is larger than the range of the first neighborhood.
Optionally, the positioning module 1003 is specifically configured to:
randomly select, from the matching feature point set formed by the matching feature points, matching feature points for calculating the pose estimate of the current frame, obtaining a current subset of matching feature points;
calculate the current pose based on the mapping between spatial position information and pixel position information established by the matching feature points in the current subset, obtaining a fitted pose estimate for the current subset;
obtain, from the fitted pose estimate and the camera intrinsics, the spatial positions of all feature points in the current frame, i.e., the projection-point spatial positions of all feature points;
for each matching feature point in the matching feature point set, determine, from the projection-point spatial position of that matching feature point, whether the distance between the projection point of the matching feature point in the current frame and the map point matched by that matching feature point in the map is smaller than a set second distance threshold, and if so, determine that the matching feature point is an inlier; repeat this determination until every matching feature point in the current matching feature point set has been judged;
count the current number of inliers and determine whether it is the largest over all iterations so far; if so, take the set formed by the current inliers as the current best matching feature point set; and
determine whether a termination condition is reached; if so, take the current best matching feature point set as the final best matching feature point set; otherwise, return to the step of randomly selecting, from the matching feature point set, matching feature points for calculating the pose estimate of the current frame.
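The random-sample-consensus loop above can be sketched generically. The minimal sample size, the fixed iteration count used as the termination condition, and the pluggable `fit_pose`/`project` callbacks standing in for the pose solver and the projection model are illustrative assumptions.

```python
import random

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def ransac_best_set(matches, fit_pose, project, threshold, iters=100, seed=0):
    """Sample a minimal subset of matches, fit a candidate pose, count the
    matches whose projection lands within `threshold` of the matched map
    point (the inliers), and keep the largest inlier set over all
    iterations. Each match is (feature, map_point)."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        sample = rng.sample(matches, min(3, len(matches)))
        pose = fit_pose(sample)               # fitted pose estimate
        inliers = [m for m in matches
                   if dist(project(pose, m[0]), m[1]) < threshold]
        if len(inliers) > len(best):
            best = inliers                    # current best matching set
    return best
```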
Determining whether the best matching feature point set satisfies the condition for solving the pose of the current frame includes:
determining, from the number of matching feature points, whether the best matching feature point set satisfies the condition for solving the pose of the current frame;
or
assigning, to each of at least two best matching feature point sets, a weight measuring the degree of matching between the current frame and the candidate map range, the weight being determined by one, or any combination, of the number of matching feature points in the best matching feature point set, the number of feature points extracted from the current frame, the distribution of the feature points, and the relation between the initial number of matching feature points and the number of matching feature points in the best matching feature point set; and
determining, from a set weight threshold and the maximum weight, whether the best matching feature point set satisfies the condition for solving the pose of the current frame.
Optionally, the positioning module 1003 is further configured to, after calculating the pose of the current frame from the matching feature points to obtain the positioning result:
apply sliding-window-based nonlinear optimization to the calculated pose of the current frame, where the optimized variables are the poses of the image frames within the sliding window, the sliding window includes the current frame, and the optimization constraints are inter-frame matching constraints between the feature points of the current frame and the feature points of the previous keyframe, and/or map matching constraints between the feature points of the current frame and the map points in the map; and
minimize the inter-frame matching error and/or the map matching error using the least-squares method to obtain the optimized pose of the current frame as the positioning result;
where
the map matching constraint is the error between the pixel position at which a first matching map point is back-projected onto the current frame and the pixel position of the first matching feature point in the current frame that matches that first matching map point, or the error between the spatial position in the world coordinate system to which a first matching feature point in the current frame is projected and the spatial position in the world coordinate system of the first matching map point that matches that first matching feature point; a first matching feature point is a feature point of the current frame successfully matched against a map point in the map, and a first matching map point is a map point successfully matched by a first matching feature point; and
the inter-frame matching constraint is the error between the spatial position in the world coordinate system to which a first matching feature point in the current frame is projected and the spatial position in the world coordinate system to which the second matching feature point in the previous keyframe that matches the first matching feature point is projected, or the error between the pixel position at which the second matching map point that matches a second matching feature point is back-projected onto the current frame and the pixel position at which that second matching map point is back-projected onto the previous keyframe.
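A minimal sketch of the first form of the map matching constraint: back-project the matched map point onto the current frame and compare against the matched feature's pixel position. A pinhole camera with intrinsic matrix K and a world-to-camera pose (R, t) is assumed; the variable names are illustrative.

```python
import numpy as np

def map_matching_residual(pose, map_point_w, feature_px, K):
    """Residual of one map matching constraint: the map point (in world
    coordinates) is transformed into the camera frame with the pose
    (R, t), projected to pixels through the intrinsics K, and compared
    against the matched feature's pixel position. The optimizer adjusts
    the pose to drive this 2-vector toward zero."""
    R, t = pose                      # world-to-camera rotation and translation
    p_cam = R @ map_point_w + t      # map point in camera coordinates
    uvw = K @ p_cam
    proj = uvw[:2] / uvw[2]          # perspective division to pixel coordinates
    return proj - feature_px         # reprojection error in pixels
```

The inter-frame matching constraint has the same shape, except the same map point is back-projected into both the current frame and the previous keyframe, coupling the two poses.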
The image acquisition module 1001 is further configured to perform image preprocessing on the current frame after acquiring the current image to obtain the current frame and before feature points are extracted from the current frame to obtain the feature points of the current frame.
所述定位模块1003具体用于:The positioning module 1003 is specifically used for:
construct an objective function for optimization, which accumulates, over the frames in the current sliding window, the sum of a first result and a second result, where the first result is the sum of the map matching errors of all first matching feature points of the frame weighted by a first weight, and the second result is the sum of the inter-frame matching errors of all second matching map points between the frame and its previous key frame weighted by a second weight;
take, as the initial value of the map matching error, the map matching error computed from the pose of the current frame, the spatial position information of the first matching map point, the camera intrinsic parameters, and the pixel coordinates of the first matching feature point in the current frame that matches the first matching map point;
take, as the initial value of the inter-frame matching error, the inter-frame matching error computed from the pose of the current frame, the spatial position information of the second matching map point, the pose of the previous key frame, and the camera intrinsic matrix; and
iteratively solve for the pose of the current frame that minimizes the objective function, obtaining the optimized current pose.
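The accumulated objective described above can be sketched as follows. This is an illustrative evaluation only — the squared errors and the weights w1 and w2 are assumptions — and the patent's iterative minimization would in practice be carried out by a nonlinear least-squares solver (e.g. Gauss-Newton or Levenberg-Marquardt).

```python
# Hypothetical sketch of the sliding-window objective: for every frame in the
# window, the map matching errors are weighted by w1 (first result) and the
# inter-frame matching errors by w2 (second result), and the two results are
# accumulated over all frames. Only the cost evaluation is shown here.

def window_cost(frames, w1, w2):
    total = 0.0
    for frame in frames:
        map_term = sum(e ** 2 for e in frame["map_errors"])           # first result
        inter_term = sum(e ** 2 for e in frame["interframe_errors"])  # second result
        total += w1 * map_term + w2 * inter_term
    return total

window = [
    {"map_errors": [1.0, 2.0], "interframe_errors": [0.5]},  # older key frame
    {"map_errors": [1.5], "interframe_errors": [1.0, 0.5]},  # current frame
]
print(window_cost(window, w1=1.0, w2=0.5))  # 8.0
```

An iterative solver would repeatedly re-linearize the error terms around the current window poses and step toward the minimum of this cost.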
The positioning module is further configured to, after the optimized pose of the current frame is obtained:
determine that the current frame is a key frame when either of the following conditions is met: the number of first matching feature points in the current frame is less than a first threshold, or the number of second matching feature points in the current frame is less than a second threshold; and
if the current frame is not a key frame, delete the current frame from the sliding window; if the current frame is a key frame, judge whether the number of frames in the current sliding window has reached a set first frame threshold: if it has, delete the earliest-added key frame in the sliding window; if it has not, keep the earliest-added key frame.
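The key-frame decision and window maintenance above can be sketched as follows. The threshold values and names are invented for illustration and are not from the patent.

```python
# Illustrative sketch of the key-frame decision and sliding-window
# maintenance: a frame is a key frame when either match count drops below
# its threshold, and the window keeps at most `max_frames` key frames.

FIRST_THRESHOLD = 30    # min. number of first matching feature points (assumed)
SECOND_THRESHOLD = 20   # min. number of second matching feature points (assumed)

def is_key_frame(n_first_matches, n_second_matches):
    return n_first_matches < FIRST_THRESHOLD or n_second_matches < SECOND_THRESHOLD

def update_window(window, frame, n_first, n_second, max_frames=10):
    window.append(frame)
    if not is_key_frame(n_first, n_second):
        window.pop()                 # non-key frame: drop the current frame
    elif len(window) > max_frames:
        window.pop(0)                # window full: drop the earliest key frame
    return window

w = update_window([], "frame0", n_first=10, n_second=50)
print(w)  # ['frame0'] — kept, since 10 < 30 makes it a key frame
```

Counter-intuitively, few matches make a frame *more* likely to be kept: a frame that matched poorly carries information the map and recent key frames do not already cover.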
The image preprocessing performed by the image acquisition module 1001 on the current frame includes:
de-distorting the current frame according to the distortion coefficients of the camera to obtain a de-distorted image;
judging whether the pixel value of each pixel in the de-distorted image is greater than a first pixel threshold; if so, inverting the pixels whose values exceed the first pixel threshold and then filtering the image to obtain a background image; if not, filtering the de-distorted image directly to obtain the background image;
subtracting the background image from the de-distorted image to obtain a foreground image; and
judging whether the pixel values in the foreground image are uniformly distributed; if they are, using the foreground image as the preprocessed current frame; if they are not, stretching the foreground image to obtain the preprocessed current frame.
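A toy sketch of the background/foreground steps above, run on a tiny grayscale image. De-distortion is omitted (it depends on the camera's distortion coefficients), the filter is a simple 3x3 mean filter, and the threshold and image values are invented for illustration.

```python
T1 = 200          # first pixel threshold (assumed value)
MAXV = 255        # 8-bit pixel range assumed

def invert_bright(img):
    """Invert pixels whose value exceeds the first pixel threshold."""
    return [[MAXV - p if p > T1 else p for p in row] for row in img]

def box_filter(img):
    """3x3 mean filter with replicated borders -> background image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = sum(vals) // 9
    return out

def foreground(img):
    """De-distorted image minus the filtered background, clamped at 0."""
    bg = box_filter(invert_bright(img))
    return [[max(p - b, 0) for p, b in zip(r1, r2)] for r1, r2 in zip(img, bg)]

img = [[100, 100, 100],
       [100, 250, 100],   # one bright pixel above T1
       [100, 100, 100]]
fg = foreground(img)
print(fg[1][1])  # 161
```

The inversion keeps an over-bright pixel (e.g. a specular highlight) from dominating the background estimate, so the differencing step isolates local texture rather than lighting.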
Stretching the foreground image includes:
if a foreground pixel value is less than or equal to a set minimum gray value, setting that pixel to the minimum of the pixel value range;
if a foreground pixel value is greater than the minimum gray value and less than a set maximum gray value, setting that pixel to the pixel maximum scaled by a ratio, where the ratio is the difference between the pixel value and the minimum gray value divided by the difference between the maximum gray value and the minimum gray value; and
if a foreground pixel value is greater than or equal to the maximum gray value, setting that pixel to the maximum of the pixel value range.
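The piecewise stretch above can be written out directly. The sketch assumes 8-bit pixels (range 0-255) and illustrative minimum/maximum gray values g_min and g_max.

```python
def stretch(p, g_min=50, g_max=200, p_max=255):
    """Piecewise contrast stretch: clamp below g_min, clamp above g_max,
    scale linearly in between by (p - g_min) / (g_max - g_min)."""
    if p <= g_min:
        return 0                                         # minimum of the pixel range
    if p >= g_max:
        return p_max                                     # maximum of the pixel range
    return round(p_max * (p - g_min) / (g_max - g_min))  # proportional value

print(stretch(50), stretch(110), stretch(200))  # 0 102 255
```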
Extracting feature points from the current frame to obtain the feature points of the current frame includes:
performing feature detection on the current frame to obtain feature points;
dividing the current frame into a predetermined number of grid cells;
for the feature points in any grid cell, sorting the feature points in the cell in descending order of feature response value and keeping the top Q, obtaining the filtered feature points, where Q is determined from the number of feature points in the target image frame, the set upper limit on the total number of feature points, and the total number of feature points in that grid cell; and
computing a feature descriptor for each filtered feature point.
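The grid-based filtering above can be sketched as follows. The exact formula for Q is not spelled out in the text, so a proportional split of the global feature budget is assumed here; all names are illustrative.

```python
# Hypothetical sketch: features are bucketed into grid cells, each cell keeps
# its top-Q responses, and Q is the cell's proportional share of the cap.

def filter_by_grid(features, img_w, img_h, nx, ny, total_cap):
    """features: list of (x, y, response). Returns the kept features."""
    cells = {}
    for f in features:
        cx = min(int(f[0] * nx / img_w), nx - 1)
        cy = min(int(f[1] * ny / img_h), ny - 1)
        cells.setdefault((cx, cy), []).append(f)
    kept = []
    n_total = len(features)
    for cell in cells.values():
        # assumed rule: share of the cap proportional to the cell's count,
        # at least one feature per occupied cell
        q = max(1, round(total_cap * len(cell) / n_total))
        cell.sort(key=lambda f: f[2], reverse=True)  # descending response value
        kept.extend(cell[:q])
    return kept

feats = [(10, 10, 0.9), (12, 11, 0.5), (14, 12, 0.7), (300, 200, 0.8)]
kept = filter_by_grid(feats, img_w=320, img_h=240, nx=2, ny=2, total_cap=2)
print(sorted(f[2] for f in kept))  # [0.7, 0.8, 0.9]
```

The per-cell cap spreads features across the image, which keeps the later pose solve from being dominated by one texture-rich corner.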
To improve positioning efficiency, the visual positioning device further includes an image preprocessing module 1012 for preprocessing images. Preferably, referring to FIG. 11, which is a schematic diagram of the image preprocessing module, the image preprocessing module 1012 includes:
an image de-distortion sub-module, which de-distorts the source image frame according to the distortion coefficients of the camera to obtain a de-distorted image;
an image filtering sub-module, which filters the de-distorted image to obtain a background image;
an image differencing sub-module, which subtracts the background image from the de-distorted image to obtain a foreground image; and
an image stretching sub-module, which stretches the foreground image to obtain the target image frame.
When the visual map is a texture map, the image filtering, image differencing, and image stretching sub-modules can be used to enhance the image texture.
The present application further provides a mobile robot including a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to implement the steps of the above visual map-based visual positioning method.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above visual map-based visual positioning method.
Embodiments of the present invention further provide a computer program product containing instructions which, when run on a computer, cause the computer to execute the steps of the above visual map-based visual positioning method.
Embodiments of the present invention further provide a computer program which, when run on a computer, causes the computer to execute the steps of the above visual map-based visual positioning method. Since the apparatus/network-side device/storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for related details, refer to the corresponding parts of the method embodiments.
In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprising", "including", or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (16)

  1. A visual positioning method based on a visual map, the method comprising:
    capturing a current image to obtain a current frame;
    extracting feature points from the current frame to obtain feature points of the current frame;
    determining a positioning strategy according to a current positioning state;
    matching, based on the positioning strategy, the feature points in the current frame against map points in the map to obtain matching feature points; and
    when the matching feature points satisfy a condition for solving the pose of the current frame, computing the pose of the current frame from the matching feature points to obtain a positioning result.
  2. The method according to claim 1, wherein after the matching feature points are obtained, the method further comprises:
    determining a best matching feature point set from the matching feature points using a random sample consensus algorithm; and
    judging whether the best matching feature point set satisfies the condition for solving the pose of the current frame, and determining that positioning of the current frame fails when it does not; wherein the positioning state is determined from the positioning result of the previous frame and the current number of consecutive positioning-failure frames, and includes an uninitialized state, a positioning-success state, and a relocalization state;
    wherein the positioning strategy includes initialization positioning in the uninitialized state, normal positioning in the positioning-success state, relocalization in the relocalization state, and the transition relationships between the positioning states;
    wherein the transition relationships between the positioning states include:
    when the current positioning state is the uninitialized state, transitioning to the positioning-success state if initialization positioning succeeds, and remaining in the uninitialized state if initialization positioning fails;
    when the current positioning state is the positioning-success state, remaining in the positioning-success state if normal positioning succeeds, and transitioning to a positioning-lost state if normal positioning fails; and
    when the current positioning state is the positioning-lost state, transitioning to the positioning-success state if relocalization succeeds; if relocalization fails, judging whether the number of consecutive positioning-failure frames exceeds a set frame-number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds a set first distance threshold: if either threshold is exceeded, transitioning to the uninitialized state; otherwise, remaining in the positioning-lost state;
    wherein determining the positioning strategy according to the current positioning state includes generating positioning logic according to the positioning states and the transition relationships between them, the content of the positioning logic including performing on the current frame the positioning process indicated by the corresponding positioning strategy, the corresponding positioning strategy being the positioning strategy in the current positioning state; and
    wherein matching the feature points in the current frame against the map points in the map based on the positioning strategy includes: determining, based on the positioning logic and according to the positioning process indicated by the positioning strategy in each state, a candidate map range in the map for the current frame, and matching the feature points in the current frame against the map points within the candidate map range.
  3. The method according to claim 2, wherein generating the positioning logic according to the positioning states and the transition relationships between them comprises:
    judging whether the current frame is the first frame;
    if the current frame is the first frame, performing initialization positioning on the current frame;
    if the current frame is not the first frame, judging whether the previous frame was positioned successfully;
    if the previous frame was positioned successfully, performing normal positioning on the current frame;
    if the previous frame was not positioned successfully, judging whether the current number of consecutive positioning-failure frames exceeds a set frame-number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds a set first distance threshold;
    if the current number of consecutive positioning-failure frames exceeds the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold, performing initialization positioning on the current frame; and
    if the current number of consecutive positioning-failure frames does not exceed the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning does not exceed the set first distance threshold, performing relocalization on the current frame.
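The per-frame decision flow above can be read as a small dispatch function. The sketch below is an illustrative reading, not the patent's code; the positioning routines are reduced to labels and the thresholds are invented.

```python
FRAME_FAIL_LIMIT = 5    # frame-number threshold (assumed)
DIST_LIMIT = 2.0        # first distance threshold (assumed units)

def choose_action(is_first_frame, prev_success, consec_failures, drift):
    """Return which positioning process to run on the current frame.
    drift = distance from the most recent successfully positioned pose."""
    if is_first_frame:
        return "initialize"
    if prev_success:
        return "normal"
    # previous frame failed: fall back to initialization once the failure
    # streak or the drift grows too large, otherwise try relocalization
    if consec_failures > FRAME_FAIL_LIMIT or drift > DIST_LIMIT:
        return "initialize"
    return "relocalize"

print(choose_action(True, False, 0, 0.0))   # initialize
print(choose_action(False, True, 0, 0.0))   # normal
print(choose_action(False, False, 2, 0.5))  # relocalize
print(choose_action(False, False, 9, 0.5))  # initialize
```

The split reflects the cost of each routine: relocalization searches a bounded neighborhood of the last known pose, while initialization falls back to the whole map once that neighborhood can no longer be trusted.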
  4. The method according to claim 3, wherein performing initialization positioning on the current frame if the current frame is the first frame further comprises:
    determining, based on the current frame being the first frame, that the current positioning state is the uninitialized state; and
    if initialization positioning of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state; if initialization positioning of the current frame fails, recording that positioning of the current frame failed and remaining in the uninitialized state;
    wherein performing normal positioning on the current frame if the previous frame was positioned successfully further comprises:
    determining, based on the previous frame being positioned successfully, that the current positioning state is the positioning-success state; and
    if normal positioning of the current frame succeeds, recording that positioning of the current frame succeeded and remaining in the positioning-success state; if normal positioning of the current frame fails, recording that positioning of the current frame failed and transitioning to the positioning-lost state;
    wherein performing initialization positioning on the current frame if the current number of consecutive positioning-failure frames exceeds the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold, further comprises:
    determining, based on the current number of consecutive positioning-failure frames exceeding the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning exceeding the set first distance threshold, that the current positioning state is the uninitialized state; and
    if initialization positioning of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state; if initialization positioning of the current frame fails, recording that positioning of the current frame failed and remaining in the uninitialized state;
    wherein performing relocalization on the current frame if the current number of consecutive positioning-failure frames does not exceed the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning does not exceed the set first distance threshold, further comprises:
    determining, based on the current number of consecutive positioning-failure frames not exceeding the frame-number threshold, or the distance between the current frame and the pose of the most recent successful positioning not exceeding the set first distance threshold, that the current positioning state is the positioning-lost state; and
    if relocalization of the current frame succeeds, recording that positioning of the current frame succeeded and transitioning to the positioning-success state; if relocalization of the current frame fails, recording that positioning of the current frame failed, remaining in the positioning-lost state, and returning to the step of judging whether the number of consecutive positioning-failure frames exceeds the set frame-number threshold, or whether the distance between the current frame and the pose of the most recent successful positioning exceeds the set first distance threshold.
  5. The method according to claim 4, wherein determining, based on the positioning logic and according to the positioning process indicated by the positioning strategy in each state, the candidate map range in the map for the current frame comprises:
    when performing initialization positioning on the current frame, taking all map points in the map as a first candidate map range, or obtaining the first candidate map range using auxiliary information; and screening the map points within the first candidate map range by brute-force matching, wherein if the feature similarity between any two map points exceeds a set second matching threshold, one of the two map points is randomly deleted, obtaining a corrected first candidate map range;
    when performing normal positioning on the current frame, predicting the pose of the current frame from the inter-frame motion information from the previous frame to the current frame to obtain a predicted pose of the current frame, and determining a second candidate map range in the map from the predicted pose of the current frame; and
    when performing relocalization on the current frame, tracing back through the successfully positioned historical frames to the frame closest to the current frame, taking that frame as a reference frame and treating it as the previous frame; predicting the pose of the current frame from the inter-frame motion information from that previous frame to the current frame to obtain a predicted pose of the current frame; and determining a third candidate map range in the map from the predicted pose of the current frame.
  6. The method according to claim 5, wherein predicting the pose of the current frame from the inter-frame motion information from the previous frame to the current frame to obtain the predicted pose of the current frame comprises:
    a first manner: obtaining the inter-frame pose transformation from the previous frame to the current frame via a wheel odometer or an inertial measurement unit, and computing the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame;
    or
    a second manner: obtaining the inter-frame pose transformation from the previous frame to the current frame via visual odometry, and computing the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame;
    or
    a third manner: predicting the inter-frame pose transformation from the previous frame to the current frame from historical frames for which positioning results have been obtained, and computing the predicted pose of the current frame from the inter-frame pose transformation and the positioning result of the previous frame;
    or
    a fourth manner: computing first predicted poses of the current frame separately by at least two of the first, second, and third manners, obtaining at least two first predicted poses; and
    filtering the at least two first predicted poses with a Kalman filter to obtain a filtered second predicted pose, and taking the second predicted pose as the final predicted pose of the current frame; or optimizing based on the at least two first predicted poses by a nonlinear optimization method to obtain an optimized second predicted pose, and taking the second predicted pose as the final predicted pose of the current frame;
    wherein the objective function of the nonlinear optimization is the sum of the error terms obtained in the respective manners, and optimizing based on the at least two first predicted poses by the nonlinear optimization method comprises: substituting the pose of the previous frame, the first predicted poses of the current frame obtained in the different manners, and the inter-frame pose transformations as initial values into the objective function of the nonlinear optimization, and solving for the pose that minimizes the objective function as the second predicted pose.
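The fourth manner fuses several first predicted poses into one. As a simple stand-in for the Kalman filter mentioned above, the sketch below fuses two 1-D pose predictions by inverse-variance weighting; the 1-D state and the variances are invented for illustration, and a real filter would operate on full SE(2)/SE(3) poses.

```python
def fuse(preds):
    """preds: list of (value, variance). Returns the inverse-variance
    weighted mean, which is the minimum-variance linear fusion."""
    wsum = sum(1.0 / var for _, var in preds)
    return sum(v / var for v, var in preds) / wsum

# e.g. a wheel-odometry prediction vs. a constant-velocity model prediction
odom_pred = (1.00, 0.04)   # trusted more (smaller variance)
model_pred = (1.20, 0.16)
print(fuse([odom_pred, model_pred]))
```

The fused value lands closer to the lower-variance prediction, which is exactly the behavior the Kalman update (or the weighted nonlinear objective) provides for pose fusion.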
  7. The method according to claim 5 or 6, wherein determining the second candidate map range in the map from the predicted pose of the current frame comprises:
    taking the map position determined by the predicted pose of the current frame as a center, and determining a first neighborhood of that center as the second candidate map range;
    or
    for each feature point of the current frame, computing, from the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point at which the feature point is projected into the map, and taking a first neighborhood centered on that projection point position as the second candidate map range for that feature point;
    wherein determining the third candidate map range in the map from the predicted pose of the current frame comprises:
    taking the map position determined by the predicted pose of the current frame as a center, and determining a second neighborhood of that center as the third candidate map range;
    or
    for each feature point of the current frame, computing, from the predicted pose of the current frame and the pixel coordinates of the feature point, the position of the projection point at which the feature point is projected into the map, and taking a second neighborhood centered on that projection point position as the third candidate map range for that feature point;
    wherein the range of the second neighborhood is larger than the range of the first neighborhood.
  8. 如权利要求2所述的方法,其中,所述基于匹配特征点,采用随机抽样一致性算法确定最佳匹配特征点集合,包括,The method according to claim 2, wherein, based on the matching feature points, using a random sampling consistency algorithm to determine the best matching feature point set, comprising:
    从匹配特征点所构成的匹配特征点集合中,随机选择用于计算当前帧位姿估计的匹配特征点,得到当前匹配特征点子集;From the set of matching feature points formed by the matching feature points, randomly select the matching feature points for calculating the pose estimation of the current frame, and obtain a subset of the current matching feature points;
    基于利用当前匹配特征点子集中的匹配特征点所建立的空间位置信息与像素位置信息的映射,计算当前位姿,得到该当前匹配特征点子集的拟合位姿估计,Based on the mapping between the spatial position information and the pixel position information established by the matching feature points in the current matching feature point subset, the current pose is calculated, and the fitted pose estimation of the current matching feature point subset is obtained,
    根据拟合位姿估计以及相机内参,获取当前帧中所有的特征点的空间位置,得到所有特征点的投影点空间位置;Obtain the spatial positions of all feature points in the current frame according to the fitted pose estimation and camera internal parameters, and obtain the projected point spatial positions of all feature points;
    对于匹配特征点集合中的每个匹配特征点,根据该匹配特征点的投影点空间位置,判断当前帧中该匹配特征点的投影点与地图中该匹配特征点所匹配地图点的距离是否小于设定的第二距离阈值,如果小于设定的第二距离阈值,则判定该匹配特征点为内点;反复执行所述判断当前帧中该匹配特征点的投影点与地图中该匹配特征点所匹配地图点的距离是否小于设定的第二距离阈值的步骤,直至当前匹配特征点集合中所有匹配特征点都被进行了内点的判定;For each matching feature point in the matching feature point set, according to the spatial position of the projected point of the matching feature point, determine whether the distance between the projected point of the matching feature point in the current frame and the map point matched by the matching feature point in the map is less than If the set second distance threshold is less than the set second distance threshold, then determine that the matching feature point is an inner point; repeatedly perform the described judgment of the projection point of the matching feature point in the current frame and the matching feature point in the map Whether the distance of the matched map points is less than the set second distance threshold, until all matching feature points in the current matching feature point set have been judged as interior points;
    统计当前内点数量,判断当前统计的内点数量是否为历次迭代中的最大,如果是历次迭代中的最大,则将当前内点所构成的集合作为当前最佳匹配特征点集合;Count the number of current interior points, and judge whether the current number of interior points is the largest in the previous iterations. If it is the largest in the previous iterations, the set formed by the current interior points is used as the current best matching feature point set;
    判断是否达到结束条件,如果达到约束条件,则将当前最佳匹配特征点集合作为最终的最佳匹配特征点集合,如果未达到约束条件,返回所述从匹配特征点所构成的匹配特征点集合中随机选择用于计算当前帧位姿估计的匹配特征点的步骤;Judging whether the end condition is reached, if the constraint condition is met, the current best matching feature point set is used as the final best matching feature point set, if the constraint condition is not met, the matching feature point set formed from the matching feature points is returned. The step of randomly selecting matching feature points for calculating the pose estimation of the current frame;
    The determining whether the best matching feature point set satisfies the condition for solving the pose of the current frame comprises:
    determining, according to the number of matching feature points, whether the best matching feature point set satisfies the condition for solving the pose of the current frame;
    or,
    assigning, to each of at least two best matching feature point sets, a weight that measures how well the current frame matches a candidate map range, the weight being determined by one of, or any combination of, the number of matching feature points in the best matching feature point set, the number of feature points extracted from the current frame, the distribution of the feature points, and the initial number of matching feature points together with the number of matching feature points in the best matching feature point set;
    determining, according to a set weight threshold and the maximum weight, whether the best matching feature point set satisfies the condition for solving the pose of the current frame.
  9. The method according to claim 1, wherein after the pose of the current frame is calculated according to the matching feature points and the positioning result is obtained, the method further comprises:
    applying sliding-window-based nonlinear optimization to the calculated pose of the current frame; the variables to be optimized are the poses of the image frames within the sliding window, which includes the current frame, and the optimization constraints are inter-frame matching constraints between feature points of the current frame and feature points of the previous key frame, and/or map matching constraints between feature points of the current frame and map points in the map;
    minimizing the inter-frame matching error and/or the map matching error by the least squares method, to obtain the optimized pose of the current frame as the positioning result;
    wherein,
    the map matching constraint is: the error between the pixel position at which a first matching map point is back-projected onto the current frame and the pixel position of the first matching feature point in the current frame that matches that first matching map point, or the error between the spatial position in the world coordinate system to which the first matching feature point in the current frame is projected and the spatial position in the world coordinate system of the first matching map point matched by that first matching feature point; the first matching feature point is a feature point of the current frame successfully matched against a map point in the map, and the first matching map point is the map point successfully matched by the first matching feature point;
    the inter-frame matching constraint is: the error between the spatial position in the world coordinate system to which the first matching feature point in the current frame is projected and the spatial position in the world coordinate system to which the second matching feature point, matched with that first matching feature point, in the previous key frame of the current frame is projected, or the error between the pixel position at which a second matching map point, matched with the second matching feature point, is back-projected onto the current frame and the pixel position at which that second matching map point is back-projected onto the previous key frame;
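As an illustration of these error terms, a pinhole back-projection residual can be sketched as follows; the tuple pose `(R, t)` (world-to-camera), the four-parameter intrinsics, and the function names are simplifying assumptions (no distortion), not the patent's exact formulation:

```python
import math

def reproject(pose, K, point_w):
    # Back-project a 3-D map point (world coordinates) into a frame with
    # pose (R, t); K = (fx, fy, cx, cy), simple pinhole model.
    R, t = pose
    xc = sum(R[0][j] * point_w[j] for j in range(3)) + t[0]
    yc = sum(R[1][j] * point_w[j] for j in range(3)) + t[1]
    zc = sum(R[2][j] * point_w[j] for j in range(3)) + t[2]
    fx, fy, cx, cy = K
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def map_matching_error(pose, K, map_point_w, feature_px):
    # Map matching constraint: pixel distance between the back-projected
    # first matching map point and the first matching feature point.
    u, v = reproject(pose, K, map_point_w)
    return math.hypot(u - feature_px[0], v - feature_px[1])
```

The inter-frame term is analogous, comparing the back-projections of the same second matching map point into the current frame and the previous key frame.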
    After the current image is captured to obtain the current frame, and before feature point extraction is performed based on the current frame to obtain its feature points, the method comprises performing image preprocessing on the current frame.
  10. The method according to claim 9, wherein minimizing the inter-frame matching error and/or the map matching error by the least squares method to obtain the optimized pose of the current frame comprises:
    constructing an objective function for optimization, the objective function being the accumulated sum, over the sliding window, of a first result, obtained by weighting with a first weight the sum of the map matching errors of all first matching feature points of all frames in the current sliding window, and a second result, obtained by weighting with a second weight the sum of the inter-frame matching errors of all second matching map points between each frame in the current sliding window and its previous key frame;
    taking, as the initial value of the map matching error, the map matching error obtained from the pose of the current frame, the spatial position information of the first matching map point, the camera intrinsic parameters, and the pixel coordinates of the first matching feature point in the current frame that matches the first matching map point;
    taking, as the initial value of the inter-frame matching error, the inter-frame matching error obtained from the pose of the current frame, the spatial position information of the second matching map point, the pose of the previous key frame, and the camera intrinsic parameter matrix;
    iteratively solving for the pose of the current frame at which the objective function reaches its minimum, to obtain the optimized current pose;
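Assuming the individual error terms have already been evaluated, the objective function of this claim reduces to accumulating the two weighted sums per window frame; a toy sketch with illustrative names:

```python
def objective(map_errors_per_frame, interframe_errors_per_frame, w1, w2):
    # For each frame in the sliding window: first weight times the sum of its
    # map matching errors, plus second weight times the sum of its inter-frame
    # matching errors; the objective accumulates these over the window.
    total = 0.0
    for map_errs, frame_errs in zip(map_errors_per_frame, interframe_errors_per_frame):
        total += w1 * sum(map_errs) + w2 * sum(frame_errs)
    return total
```

In the actual optimization, the error terms are functions of the window poses, and a nonlinear least squares solver iterates on those poses until this objective is minimized.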
    After the optimized pose of the current frame is obtained, the method further comprises:
    determining that the current frame is a key frame when one of the following conditions is met: the number of first matching feature points in the current frame is less than a first threshold, or the number of second matching feature points in the current frame is less than a second threshold;
    if the current frame is not a key frame, deleting the current frame from the sliding window; if the current frame is a key frame, determining whether the number of frames in the current sliding window reaches a set first frame threshold: if it does, deleting the earliest-added key frame from the sliding window; if it does not, keeping the earliest-added key frame in the sliding window;
    The performing image preprocessing on the current frame comprises:
    performing de-distortion on the current frame according to the distortion coefficients of the camera, to obtain a de-distorted image;
    determining whether the pixel value of each pixel in the de-distorted image is greater than a first pixel threshold: if so, inverting the pixels whose values exceed the first pixel threshold and then performing image filtering to obtain a background image; if not, performing image filtering on the de-distorted image to obtain the background image;
    subtracting the background image from the de-distorted image to obtain a foreground image;
    determining whether the pixel values in the foreground image are evenly distributed: if they are, taking the foreground image as the preprocessed current frame; if they are not, stretching the foreground image to obtain the preprocessed current frame.
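The preprocessing steps above (invert over-bright pixels, filter to estimate a background, subtract) can be sketched as follows on a grayscale image given as a list of rows; the 3x3 mean filter is an assumption, since the claim does not fix the filter type, and the uniformity check/stretch step is omitted:

```python
def preprocess(img, first_pixel_thresh=200, max_val=255):
    h, w = len(img), len(img[0])
    # Invert pixels brighter than the first pixel threshold.
    work = [[max_val - p if p > first_pixel_thresh else p for p in row]
            for row in img]
    # Image filtering (here: 3x3 mean) to estimate the background image.
    def mean3(r, c):
        vals = [work[i][j]
                for i in range(max(0, r - 1), min(h, r + 2))
                for j in range(max(0, c - 1), min(w, c + 2))]
        return sum(vals) / len(vals)
    background = [[mean3(r, c) for c in range(w)] for r in range(h)]
    # Foreground = de-distorted image minus background image.
    return [[img[r][c] - background[r][c] for c in range(w)] for r in range(h)]
```

On a uniform image the background estimate equals the input, so the foreground is zero; a saturated pixel is inverted before filtering, leaving its full value in the foreground.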
  11. The method according to claim 10, wherein stretching the foreground image comprises:
    if a foreground image pixel value is less than or equal to a set minimum gray value, setting that pixel value to the minimum of the pixel value range;
    if a foreground image pixel value is greater than the minimum gray value and less than a set maximum gray value, taking as that pixel value a value in a certain proportion to the maximum pixel value, the proportion being the ratio of the difference between the foreground image pixel value and the minimum gray value to the difference between the maximum gray value and the minimum gray value;
    if a foreground image pixel value is greater than or equal to the maximum gray value, setting that pixel value to the maximum of the pixel value range;
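The piecewise stretch just described maps a pixel value into the pixel value range as in this sketch (`gmin`/`gmax` stand for the set minimum and maximum gray values; all names are illustrative):

```python
def stretch_pixel(p, gmin, gmax, out_min=0, out_max=255):
    # At or below the minimum gray value: clamp to the bottom of the range.
    if p <= gmin:
        return out_min
    # At or above the maximum gray value: clamp to the top of the range.
    if p >= gmax:
        return out_max
    # Otherwise scale the pixel maximum by (p - gmin) / (gmax - gmin).
    return out_max * (p - gmin) / (gmax - gmin)
```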
    The performing feature point extraction based on the current frame to obtain the feature points of the current frame comprises:
    performing feature detection on the current frame to obtain feature points;
    dividing the current frame into a predetermined number of grid cells;
    for the feature points in any grid cell, sorting them in descending order of feature point response value and retaining the first Q feature points to obtain the filtered feature points, where Q is determined from the number of feature points in the target image frame, a set upper limit on the total number of feature points, and the total number of feature points in that grid cell;
    computing a feature descriptor for each of the filtered feature points.
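The grid screening can be sketched as below, with keypoints as (x, y, response) tuples; the exact formula for Q is left open by the claim, so making Q proportional to each cell's share of a global cap is one plausible, assumed reading:

```python
def filter_by_grid(keypoints, img_w, img_h, grid=(4, 4), total_cap=100):
    gx, gy = grid
    cells = {}
    # Assign each keypoint to its grid cell.
    for kp in keypoints:
        cx = min(int(kp[0] * gx / img_w), gx - 1)
        cy = min(int(kp[1] * gy / img_h), gy - 1)
        cells.setdefault((cx, cy), []).append(kp)
    kept = []
    n_total = len(keypoints)
    for pts in cells.values():
        # Q scales with this cell's share of all detected points, under the
        # set upper limit on the total number of feature points.
        q = max(1, round(total_cap * len(pts) / n_total))
        pts.sort(key=lambda kp: kp[2], reverse=True)  # descending response
        kept.extend(pts[:q])
    return kept
```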
  12. A visual positioning apparatus based on a visual map, the apparatus comprising:
    an image acquisition module, configured to capture a current image to obtain a current frame;
    a feature extraction module, configured to perform feature point extraction based on the current frame to obtain the feature points of the current frame;
    a positioning module, configured to determine a positioning strategy according to the current positioning state, match, based on the positioning strategy, the feature points in the current frame against map points in the map to obtain matching feature points, and, when the matching feature points satisfy the condition for solving the pose of the current frame, calculate the pose of the current frame according to the matching feature points to obtain a positioning result.
  13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the visual-map-based visual positioning method according to any one of claims 1 to 11.
  14. A mobile robot comprising a memory and a processor, the memory storing a computer program, the processor being configured to execute the computer program to implement the steps of the method according to any one of claims 1 to 11.
  15. A computer program product containing instructions which, when run on a computer, cause the computer to perform the method steps of any one of claims 1 to 11.
  16. A computer program which, when run on a computer, causes the computer to perform the method steps of any one of claims 1 to 11.
PCT/CN2021/103073 2020-06-30 2021-06-29 Visual positioning method and device based on visual map WO2022002039A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010618519.1 2020-06-30
CN202010618519.1A CN111780763B (en) 2020-06-30 2020-06-30 Visual positioning method and device based on visual map

Publications (1)

Publication Number Publication Date
WO2022002039A1 true WO2022002039A1 (en) 2022-01-06

Family

ID=72759967

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103073 WO2022002039A1 (en) 2020-06-30 2021-06-29 Visual positioning method and device based on visual map

Country Status (2)

Country Link
CN (1) CN111780763B (en)
WO (1) WO2022002039A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524192B (en) * 2020-04-20 2023-10-03 阿波罗智能技术(北京)有限公司 Calibration method, device and system for external parameters of vehicle-mounted camera and storage medium
CN111780763B (en) * 2020-06-30 2022-05-06 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN112509027B (en) * 2020-11-11 2023-11-21 深圳市优必选科技股份有限公司 Repositioning method, robot, and computer-readable storage medium
CN112488007B (en) * 2020-12-04 2023-10-13 深圳市优必选科技股份有限公司 Visual positioning method, device, robot and storage medium
CN113160270A (en) * 2021-02-24 2021-07-23 广州视源电子科技股份有限公司 Visual map generation method, device, terminal and storage medium
CN112950710A (en) * 2021-02-24 2021-06-11 广州极飞科技股份有限公司 Pose determination method and device, electronic equipment and computer readable storage medium
CN112950715B (en) * 2021-03-04 2024-04-30 杭州迅蚁网络科技有限公司 Visual positioning method and device of unmanned aerial vehicle, computer equipment and storage medium
CN112990003B (en) * 2021-03-11 2023-05-19 深圳市无限动力发展有限公司 Image sequence repositioning judging method, device and computer equipment
CN113034595A (en) * 2021-03-19 2021-06-25 浙江商汤科技开发有限公司 Visual positioning method and related device, equipment and storage medium
CN113239072A (en) * 2021-04-27 2021-08-10 华为技术有限公司 Terminal equipment positioning method and related equipment thereof
CN113252033B (en) * 2021-06-29 2021-10-15 长沙海格北斗信息技术有限公司 Positioning method, positioning system and robot based on multi-sensor fusion
CN115267796B (en) * 2022-08-17 2024-04-09 深圳市普渡科技有限公司 Positioning method, positioning device, robot and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106885574A (en) * 2017-02-15 2017-06-23 北京大学深圳研究生院 A kind of monocular vision robot synchronous superposition method based on weight tracking strategy
CN107990899A (en) * 2017-11-22 2018-05-04 驭势科技(北京)有限公司 A kind of localization method and system based on SLAM
US20200166944A1 (en) * 2016-08-29 2020-05-28 Trifo, Inc. Fault-Tolerance to Provide Robust Tracking for Autonomous and Non-Autonomous Positional Awareness
CN111750864A (en) * 2020-06-30 2020-10-09 杭州海康机器人技术有限公司 Repositioning method and device based on visual map
CN111780763A (en) * 2020-06-30 2020-10-16 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN112541970A (en) * 2020-11-30 2021-03-23 北京华捷艾米科技有限公司 Relocation method and device in centralized cooperative SLAM

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732518B (en) * 2015-01-19 2017-09-01 北京工业大学 A kind of PTAM improved methods based on intelligent robot terrain surface specifications
US10054445B2 (en) * 2016-05-16 2018-08-21 Northrop Grumman Systems Corporation Vision-aided aerial navigation
CN107390245B (en) * 2017-07-13 2020-07-10 广东小天才科技有限公司 Positioning method, device, equipment and storage medium
CN109544636B (en) * 2018-10-10 2022-03-15 广州大学 Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
CN109633725B (en) * 2018-10-31 2021-03-30 百度在线网络技术(北京)有限公司 Processing method and device for positioning initialization and readable storage medium
CN111322993B (en) * 2018-12-13 2022-03-04 杭州海康机器人技术有限公司 Visual positioning method and device
CN110108258B (en) * 2019-04-09 2021-06-08 南京航空航天大学 Monocular vision odometer positioning method
CN110361005B (en) * 2019-06-26 2021-03-26 达闼机器人有限公司 Positioning method, positioning device, readable storage medium and electronic equipment
CN111156984B (en) * 2019-12-18 2022-12-09 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114111803A (en) * 2022-01-26 2022-03-01 中国人民解放军战略支援部队航天工程大学 Visual navigation method of indoor satellite platform
CN114577215A (en) * 2022-03-10 2022-06-03 山东新一代信息产业技术研究院有限公司 Method, device and medium for updating feature map of mobile robot
CN114577215B (en) * 2022-03-10 2023-10-27 山东新一代信息产业技术研究院有限公司 Method, equipment and medium for updating characteristic map of mobile robot
CN114898084B (en) * 2022-04-18 2023-08-25 荣耀终端有限公司 Visual positioning method, device and storage medium
CN114898084A (en) * 2022-04-18 2022-08-12 荣耀终端有限公司 Visual positioning method, device and storage medium
CN117036663A (en) * 2022-04-18 2023-11-10 荣耀终端有限公司 Visual positioning method, device and storage medium
CN114782495A (en) * 2022-06-16 2022-07-22 西安中科立德红外科技有限公司 Multi-target tracking method, system and computer storage medium
CN115049847A (en) * 2022-06-21 2022-09-13 上海大学 Characteristic point local neighborhood characteristic matching method based on ORB descriptor
CN115049847B (en) * 2022-06-21 2024-04-16 上海大学 ORB descriptor-based feature point local neighborhood feature matching method
WO2024001849A1 (en) * 2022-06-28 2024-01-04 中兴通讯股份有限公司 Visual-localization-based pose determination method and apparatus, and electronic device
CN115439536A (en) * 2022-08-18 2022-12-06 北京百度网讯科技有限公司 Visual map updating method and device and electronic equipment
CN115439536B (en) * 2022-08-18 2023-09-26 北京百度网讯科技有限公司 Visual map updating method and device and electronic equipment
CN115451996A (en) * 2022-08-30 2022-12-09 华南理工大学 Homography vision mileometer method for indoor environment
CN115451996B (en) * 2022-08-30 2024-03-29 华南理工大学 Homography visual odometer method facing indoor environment
CN115493612A (en) * 2022-10-12 2022-12-20 中国第一汽车股份有限公司 Vehicle positioning method and device based on visual SLAM
CN116147618A (en) * 2023-01-17 2023-05-23 中国科学院国家空间科学中心 Real-time state sensing method and system suitable for dynamic environment
CN116147618B (en) * 2023-01-17 2023-10-13 中国科学院国家空间科学中心 Real-time state sensing method and system suitable for dynamic environment
CN116778141A (en) * 2023-08-28 2023-09-19 深圳联友科技有限公司 ORB algorithm-based method for rapidly identifying and positioning picture
CN116778141B (en) * 2023-08-28 2023-12-22 深圳联友科技有限公司 ORB algorithm-based method for rapidly identifying and positioning picture
CN117419690B (en) * 2023-12-13 2024-03-12 陕西欧卡电子智能科技有限公司 Pose estimation method, device and medium of unmanned ship
CN117419690A (en) * 2023-12-13 2024-01-19 陕西欧卡电子智能科技有限公司 Pose estimation method, device and medium of unmanned ship
CN117710469A (en) * 2024-02-06 2024-03-15 四川大学 Online dense reconstruction method and system based on RGB-D sensor
CN117710469B (en) * 2024-02-06 2024-04-12 四川大学 Online dense reconstruction method and system based on RGB-D sensor

Also Published As

Publication number Publication date
CN111780763B (en) 2022-05-06
CN111780763A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2022002039A1 (en) Visual positioning method and device based on visual map
WO2022002150A1 (en) Method and device for constructing visual point cloud map
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
Labbé et al. Cosypose: Consistent multi-view multi-object 6d pose estimation
CN111780764B (en) Visual positioning method and device based on visual map
CN111750864B (en) Repositioning method and device based on visual map
CN109035304B (en) Target tracking method, medium, computing device and apparatus
JP4964159B2 (en) Computer-implemented method for tracking an object in a sequence of video frames
JP5940453B2 (en) Method, computer program, and apparatus for hybrid tracking of real-time representations of objects in a sequence of images
US8503760B2 (en) System and method for real-time object recognition and pose estimation using in-situ monitoring
Touati et al. An energy-based model encoding nonlocal pairwise pixel interactions for multisensor change detection
Chen et al. A hierarchical model incorporating segmented regions and pixel descriptors for video background subtraction
JP2010238226A (en) Method and system for tracking object
CN110097584A (en) The method for registering images of combining target detection and semantic segmentation
WO2019080743A1 (en) Target detection method and apparatus, and computer device
JP2019185787A (en) Remote determination of containers in geographical region
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
Di et al. A unified framework for piecewise semantic reconstruction in dynamic scenes via exploiting superpixel relations
Zhao et al. Learning probabilistic coordinate fields for robust correspondences
CN113112479A (en) Progressive target detection method and device based on key block extraction
US20220311910A1 (en) Corner detection method and corner detection device
CN107067411B (en) Mean-shift tracking method combined with dense features
KR20170037804A (en) Robust visual odometry system and method to irregular illumination changes
Kim et al. Ep2p-loc: End-to-end 3d point to 2d pixel localization for large-scale visual localization
Wang et al. Feature extraction algorithm based on improved ORB with adaptive threshold

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21833195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21833195

Country of ref document: EP

Kind code of ref document: A1