CN117451052A - Positioning method, device, equipment and storage medium based on vision and wheel speed meter - Google Patents

Positioning method, device, equipment and storage medium based on vision and wheel speed meter

Info

Publication number
CN117451052A
Authority
CN
China
Prior art keywords
image frame
map data
current image
positioning
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202311409697.3A
Other languages
Chinese (zh)
Inventor
王家麟
胡楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Middle School
Original Assignee
Shenzhen Middle School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Middle School filed Critical Shenzhen Middle School
Priority to CN202311409697.3A priority Critical patent/CN117451052A/en
Publication of CN117451052A publication Critical patent/CN117451052A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 11/00 Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C 11/02 Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 Navigation by using measurements of speed or acceleration
    • G01C 21/12 Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 Navigation by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 Inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C 21/3804 Creation or updating of map data

Abstract

The application relates to the technical field of visual positioning, and provides a positioning method, a device, equipment and a storage medium based on vision and a wheel speed meter. The positioning method based on vision and a wheel speed meter comprises the following steps: acquiring a current image frame; loading pre-constructed prior map data; matching the current image frame with the prior map data; if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data; and if the matching of the current image frame with the prior map data is unsuccessful, positioning based on the wheel speed meter mileage data. By combining visual detection with a wheel speed meter for positioning, compared with a visual SLAM technology using a three-dimensional positioning and map construction algorithm, the method does not require a large amount of computing resources, avoids the problem of error accumulation, and can realize indoor high-precision positioning.

Description

Positioning method, device, equipment and storage medium based on vision and wheel speed meter
Technical Field
The application relates to the technical field of visual positioning, in particular to a positioning method, a positioning device, positioning equipment and a storage medium based on vision and a wheel speed meter.
Background
With the continuous progress of science and technology and the expansion of application scenes of robots, the robots need to be able to accurately sense and understand environments so as to realize tasks such as autonomous navigation, target tracking, scene reconstruction and the like.
Robot vision positioning technology enables a robot to determine its own position and posture from visual information and to accurately perceive the three-dimensional structure and object attributes of the surrounding environment. Therefore, robot vision positioning technology is one of the important fields of modern robot research. Currently, the core of research on robot vision positioning technology is Simultaneous Localization And Mapping (SLAM) technology. Most common SLAM technologies use three-dimensional positioning and map construction algorithms, which can build semi-dense point cloud maps, adapt to different exposure conditions and photometric differences, and are not easily lost when the camera moves quickly. However, the three-dimensional positioning and map construction algorithm needs to perform loop detection many times, and each loop detection introduces errors. Therefore, the visual SLAM technology using the three-dimensional positioning and map construction algorithm not only requires a large amount of computing resources but also suffers from error accumulation.
Disclosure of Invention
In view of this, the embodiments of the present application provide a positioning method, apparatus, device and storage medium based on vision and a wheel speed meter. By combining visual detection with wheel speed meter odometry for positioning, compared with a visual SLAM technology using a three-dimensional positioning and map construction algorithm, indoor high-precision positioning can be realized without requiring a large amount of computing resources and without the problem of error accumulation.
In a first aspect, embodiments of the present application provide a positioning method based on vision and a wheel speed meter, including:
acquiring a current image frame;
loading pre-constructed prior map data;
matching the current image frame with the prior map data;
if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data;
and if the matching of the current image frame and the prior map data is unsuccessful, positioning based on the mileage data of the wheel speed meter.
In one embodiment, the a priori map data includes: track data and first key feature points of image frames corresponding to each track point in the track data;
matching the current image frame with the prior map data, including:
extracting a second key feature point from the current image frame;
And respectively calculating the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
In one embodiment, if the matching of the current image frame with the prior map data is successful, performing the visual repositioning based on the matched image data includes:
if the similarity between the first key feature point and the second key feature point of the image frame corresponding to the track point is larger than a preset similarity threshold value, determining that the current image frame is successfully matched with the prior map data;
and performing visual repositioning based on the first key feature point and the second key feature point of the image frame corresponding to the track point.
In an embodiment, the current image frame includes left camera image data and right camera image data;
based on the first key feature point and the second key feature point of the image frame corresponding to the track point, performing visual repositioning, including:
respectively extracting a first characteristic point in the left camera image data and a second characteristic point in the right camera image data, wherein the first key characteristic point comprises the first characteristic point and the second characteristic point;
performing binocular polar line matching on the first characteristic points and the second characteristic points, and calculating three-dimensional space coordinates of the matched first matching characteristic points;
Extracting a second matching feature point matched with the second key feature point from the first matching feature point, converting the three-dimensional space coordinates of the second matching feature point based on a predetermined coordinate projection rule, and obtaining projection coordinates of the second matching feature point in the two-dimensional image;
and positioning according to the three-dimensional space coordinates of the second matching feature points and the projection coordinates of the second matching feature points in the two-dimensional image.
In one embodiment, if the matching of the current image frame with the prior map data is unsuccessful, locating based on the wheel speed mileage data comprises:
if the similarity of the first key feature points and the second key feature points of the image frames corresponding to all the track points is smaller than or equal to a preset similarity threshold value, determining that the current image frame is unsuccessfully matched with the prior map data;
respectively acquiring pose information of the current moment and the last moment and motion increment of the current moment and the last moment;
and positioning according to the pose information and the motion increment.
In an embodiment, before loading the pre-constructed prior map data, further comprising:
analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame;
And carrying out loop calibration on all image frames between the current image frame and the loop image frame, and constructing the prior map data based on the image frames after the loop calibration.
In an embodiment, loop calibration is performed for all image frames between a current image frame and a loop image frame, comprising:
and carrying out loop calibration on all image frames between the current image frame and the loop image frame based on the preset adjacent frame constraint factor and the preset loop constraint factor.
In a second aspect, embodiments of the present application provide a positioning device based on vision and wheel speed meters, comprising:
the acquisition module is used for acquiring the current image frame;
the loading module is used for loading the pre-constructed prior map data;
the matching module is used for matching the current image frame with the prior map data;
the first positioning module is used for performing visual repositioning based on the matched image data if the current image frame is successfully matched with the prior map data;
and the second positioning module is used for positioning based on the wheel speed meter mileage data if the current image frame is not successfully matched with the prior map data.
In one embodiment, the a priori map data includes: track data and first key feature points of image frames corresponding to each track point in the track data;
A matching module, comprising:
an extracting unit for extracting a second key feature point from the current image frame;
and the computing unit is used for computing the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
In one embodiment, a first positioning module includes:
the first determining unit is used for determining that the matching between the current image frame and the prior map data is successful if the similarity between the first key feature point and the second key feature point of the image frame corresponding to the track point is greater than a preset similarity threshold value;
and the first positioning unit is used for performing visual repositioning based on the first key feature points and the second key feature points of the image frames corresponding to the track points.
In an embodiment, the current image frame includes left camera image data and right camera image data;
a first positioning unit comprising:
an extraction subunit, configured to extract a first feature point in the left camera image data and a second feature point in the right camera image data, where the first key feature point includes a first feature point and a second feature point;
the matching subunit is used for carrying out binocular polar line matching on the first characteristic points and the second characteristic points, and calculating three-dimensional space coordinates of the matched first matching characteristic points;
The conversion subunit is used for extracting a second matching characteristic point matched with the second key characteristic point from the first matching characteristic point, converting the three-dimensional space coordinates of the second matching characteristic point based on a predetermined coordinate projection rule, and obtaining projection coordinates of the second matching characteristic point in the two-dimensional image;
and the positioning subunit is used for positioning according to the three-dimensional space coordinates of the second matching characteristic points and the projection coordinates of the second matching characteristic points in the two-dimensional image.
In one embodiment, the second positioning module includes:
the second determining unit is used for determining that the matching between the current image frame and the prior map data is unsuccessful if the similarity between the first key feature points and the second key feature points of the image frames corresponding to all the track points is smaller than or equal to a preset similarity threshold value;
the acquisition unit is used for respectively acquiring pose information of the current moment and the last moment and motion increment of the current moment and the last moment;
and the second positioning unit is used for positioning according to the pose information and the motion increment.
In an embodiment, further comprising:
the analysis module is used for analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame;
And the calibration module is used for carrying out loop calibration on all the image frames between the current image frame and the loop image frame, and constructing the prior map data based on the image frames after the loop calibration.
In one embodiment, the calibration module is specifically configured to:
and carrying out loop calibration on all image frames between the current image frame and the loop image frame based on a preset adjacent frame constraint factor and a preset loop constraint factor.
A third aspect of the present application provides a positioning apparatus, comprising: a memory and a processor; the memory is used for storing a computer program; a processor for executing the computer program and for implementing the vision and wheel speed meter based positioning method as described in the first aspect above when the computer program is executed.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program; the computer program, when executed by one or more processors, causes the one or more processors to perform the vision and wheel speed meter based positioning method as described in the first aspect above.
The embodiments of the present application provide a positioning method, a device, equipment and a storage medium based on vision and a wheel speed meter, wherein the positioning method based on vision and a wheel speed meter comprises the following steps: acquiring a current image frame; loading pre-constructed prior map data; matching the current image frame with the prior map data; if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data; and if the matching of the current image frame with the prior map data is unsuccessful, positioning based on the wheel speed meter mileage data. By combining visual detection with a wheel speed meter for positioning, compared with a visual SLAM technology using a three-dimensional positioning and map construction algorithm, the method does not require a large amount of computing resources, avoids the problem of error accumulation, and can realize indoor high-precision positioning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a visual and wheel speed meter based positioning method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a reprojection error of a three-dimensional coordinate to a two-dimensional image obtained by using a PNP algorithm according to an embodiment of the present application;
FIG. 3 is a flow chart of a vision and wheel speed meter based positioning method according to another embodiment of the present application;
FIG. 4 is a schematic structural view of a positioning device based on vision and wheel speed gauges according to an embodiment of the present application;
fig. 5 is a schematic block diagram of a positioning device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Before describing the technical solution provided in the present application, it should be noted that robot vision positioning technology is one of the important fields in modern robot research. With the continuous progress of technology and the expansion of robot application scenes, robots need to be able to accurately sense and understand their environment in order to accomplish tasks such as autonomous navigation, target tracking and scene reconstruction. Simultaneous Localization And Mapping (SLAM), as a core technology for these applications, has developed rapidly since it was first proposed in the 1980s; it has a solid theoretical basis and broad application value, has an international research history of about 40 years, and has long been a research hotspot in fields such as robotics and computer vision. The goal of SLAM technology is to enable a robot, by carrying specific sensors, to autonomously sense its motion state and external environment information during motion, thereby accomplishing the tasks of self-positioning and environment map construction. In addition, SLAM emphasizes real-time operation, i.e. all tasks need to be completed online as the robot moves, which is why it is called "simultaneous localization and mapping" and is divided into the two tasks of localization and mapping. More specifically, the localization in SLAM is more precisely called "pose estimation", which is the process of estimating a high-precision 6-degree-of-freedom (DoF) pose from the observation data of the sensors, where the pose includes the XYZ coordinates and the orientation of the robot in three-dimensional space.
Common sensors implementing SLAM technology include cameras, lidar, inertial measurement units (IMUs, inertial Measurement Unit), wheel speed meters, and the like. The camera is used as a vision sensor, and the collected image contains very rich environmental information. Vision-based localization and scene reconstruction tasks can be achieved using techniques such as multi-view geometry, image processing, and the like. The camera has the advantages of low cost, low power consumption, easy installation, rich sensing information and the like, so the camera becomes the most widely applied sensor in SLAM technology. The wheel speed meter is a sensor directly mounted on the wheel, and the angle of rotation of the wheel is measured through a photoelectric encoder or a rotary encoder to further calculate the advancing distance of the wheel. It has the advantages of simplicity, easy use, small volume, etc.
In a 2D scene, a positioning system based on a wheel speed meter has the advantages of simple and stable operation and freedom from environmental interference, but its positioning error grows as the travelled distance accumulates. Accordingly, a method that uses visual images to assist wheel speed meter positioning through fusion is presented herein to address this problem.
The technical solutions provided in the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flow chart of a positioning method based on vision and a wheel speed meter according to an embodiment of the present application. The vision and wheel speed meter based positioning method may be implemented by a positioning device including a wheel speed meter, including but not limited to a mobile device such as a robot, a garbage truck, or the like.
As can be seen from fig. 1, the positioning method based on vision and wheel speed meter provided in the present embodiment includes steps S101 to S105, which are described in detail below.
S101: a current image frame is acquired.
The current image frame is acquired by a binocular camera, including left camera image data and right camera image data. In this embodiment, the binocular camera is fixed directly in front of the top of a positioning apparatus, such as a robot, for capturing images. Specifically, the model of the binocular camera is not limited in this embodiment, and it can be flexibly selected according to the application scene requirement.
It should be noted that, the positioning device may be connected with other devices in a communication manner, and the other devices are used as a master control device of the positioning device to control the positioning device to move in a preset space, for example, a room according to a preset path, and in the moving process, the positioning device collects image data in a corresponding environment in real time. Specifically, the master control device of the positioning device includes, but is not limited to, a notebook computer, a server, and the like. Of course, the positioning device can be controlled by the controller to move in the preset space, and image data under the corresponding environment can be acquired in real time in the moving process.
S102: and loading the pre-constructed prior map data.
Wherein the prior map data comprises: track data and first key feature points of image frames corresponding to each track point in the track data.
Specifically, each track point in the track data identifies the positioning result, i.e. the pose, of its corresponding image frame. The first key feature points of the image frame corresponding to each track point include the key points extracted from that image frame, the depth value of each key point, the descriptor of each key point, and the bag-of-word vector formed by the descriptors of the key points in the same image frame.
The key points are a preset number of feature points, for example 500 feature points, uniformly extracted from the corresponding image frame using a preset feature point extraction algorithm such as ORB (Oriented FAST and Rotated BRIEF). ORB has the advantages of high computation speed and stable feature matching, and can meet real-time requirements in use.
In this embodiment, each key point has a 256-bit (32-byte) descriptor, and each descriptor encodes characteristics of the area surrounding the key point, so the similarity of two key points can be determined from their descriptors. The bag-of-word vector is a vector formed from the descriptors and is used to describe the appearance information of the corresponding image frame. Matching of feature points can be completed by calculating the similarity between descriptors, and matching of image frames can be completed by calculating the similarity of bag-of-word vectors.
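As an illustrative sketch only (not taken from the original disclosure), the following Python snippet shows how ORB key points with 32-byte binary descriptors can be extracted and matched by Hamming distance using OpenCV; the feature budget of 500 follows the example above, while the image paths and variable names are hypothetical.

```python
import cv2

# Extract a fixed budget of ORB key points (e.g. 500, as in the example above);
# each descriptor is a 32-byte (256-bit) binary vector.
orb = cv2.ORB_create(nfeatures=500)

img_prev = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)  # hypothetical paths
img_curr = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

kp_prev, desc_prev = orb.detectAndCompute(img_prev, None)
kp_curr, desc_curr = orb.detectAndCompute(img_curr, None)

# Binary descriptors are compared with the Hamming distance; cross-checking keeps
# only mutually consistent matches between the two frames.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(desc_prev, desc_curr), key=lambda m: m.distance)
print(f"{len(matches)} descriptor matches")
```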
The depth value of each key point can be obtained by solving according to the pose relation of the frame. Specifically, for a binocular camera, matching feature points in left image data and right image data are determined by calculating the similarity between the left image data and the right image data, and for the feature points in the left image data and the right image data that have been successfully matched, since the relative pose between the images is known, the depth values of the corresponding matching feature points can be solved.
The process of solving the depth value (or 3D coordinate) of a matched point from the relative pose between images is called triangulation (or triangularization). Specifically, assume that the depth of a matched feature point in the k-th frame is λ_k, that its coordinate on the normalized plane is p_k = [x_k, y_k, 1]^T, that its 3D coordinate in the world coordinate system is P_w (in homogeneous form), and that the rotation and translation from the world coordinate system to the camera coordinate system are R_k and t_k. The following relationship holds:
λ_k p_k = T_kw P_w, where T_kw = [R_k | t_k]
is the projection matrix from the world coordinate system to the camera coordinate system.
Taking different rows of T_kw yields constraint equations between the 3D coordinate P_w of the feature point and the corresponding image frame; in these constraint equations only P_w is unknown. Each image frame contributes two such constraint equations. Therefore, in binocular vision a successfully matched feature point is observed in two image frames, 4 corresponding constraint equations can be listed, and a linear system built from these 4 equations can be solved for the 3D coordinate P_w.
For example, taking the third row of T_kw gives: λ_k = T_kw,3 P_w
where T_kw,3 denotes the third row of T_kw. Substituting this λ_k into the first two rows of T_kw eliminates λ_k and gives:
x_k T_kw,3 P_w - T_kw,1 P_w = 0
y_k T_kw,3 P_w - T_kw,2 P_w = 0
These two equations express the constraint between the 3D coordinate P_w of the feature point and its corresponding image frame, with P_w as the only unknown. For a successfully matched feature point there are two image frames, the left image data and the right image data, so 4 equations of this form can be listed and stacked into a linear system A P_w = 0.
Here the solution for P_w is a non-zero element of the null space of A and can be obtained by Singular Value Decomposition (SVD). The details of the SVD computation are not described here.
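A minimal NumPy sketch of the linear triangulation described above is given below; it assumes the two 3x4 projection matrices from the world coordinate system to the left and right cameras and a matched point on each normalized image plane are already available, and the function and variable names are illustrative.

```python
import numpy as np

def triangulate_point(T_left, T_right, p_left, p_right):
    """Solve A @ P_w = 0 built from the four constraint equations above.
    T_left, T_right: 3x4 projection matrices from the world frame to each camera.
    p_left, p_right: matched points on the normalized image planes, as (x, y).
    Returns the 3D point P_w in world coordinates."""
    A = np.stack([
        p_left[0] * T_left[2] - T_left[0],
        p_left[1] * T_left[2] - T_left[1],
        p_right[0] * T_right[2] - T_right[0],
        p_right[1] * T_right[2] - T_right[1],
    ])
    # The solution is the right singular vector with the smallest singular value,
    # i.e. a non-zero element of the (numerical) null space of A.
    _, _, vt = np.linalg.svd(A)
    P_h = vt[-1]
    return P_h[:3] / P_h[3]  # de-homogenize
```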
It should be noted that the purpose of constructing the prior map data is to save the prior information of the positioning device. By making the positioning device traverse the application environment from as many viewpoints as possible, enough prior information is obtained to ensure that, when the positioning device moves in the corresponding scene for a long time, loop calibration can be performed using the corresponding prior information and the accumulated error of the wheel speed meter can be corrected and eliminated, so that the positioning of the positioning device in the corresponding scene does not drift.
S103: the current image frame is matched with the prior map data.
Matching the current image frame with the prior map data, including: extracting a second key feature point from the current image frame; and respectively calculating the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
Specifically, the first key feature points of each image frame in the prior map data and the second key feature points of the current image frame can each form a corresponding bag-of-word vector. The bag-of-word vector has been explained in detail in the previous steps and is not described again here. Calculating the similarity between the first key feature points and the second key feature points of the image frame corresponding to each track point includes: respectively calculating the similarity between the bag-of-word vector formed by the first key feature points of the image frame corresponding to each track point and the bag-of-word vector formed by the second key feature points. Specifically, algorithms for calculating the similarity between vectors include, but are not limited to, the cosine similarity method, the Euclidean distance method, the Manhattan distance method, the Minkowski distance method and the Jaccard similarity method. The algorithm used for calculating the similarity between vectors is not limited in this embodiment.
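For example, the cosine similarity variant mentioned above can be computed directly on two bag-of-word vectors; the sketch below assumes the vectors have already been built from the descriptors, and the function name and the final selection step are illustrative.

```python
import numpy as np

def bow_cosine_similarity(v_query, v_map):
    """Cosine similarity between the bag-of-word vector of the current frame and
    that of one map image frame; values close to 1 indicate a likely match."""
    denom = np.linalg.norm(v_query) * np.linalg.norm(v_map)
    return float(np.dot(v_query, v_map) / denom) if denom > 0.0 else 0.0

# The candidate track point is the one whose image frame maximizes this score;
# it is accepted only if the score exceeds the preset similarity threshold.
```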
S104: and if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data.
In one embodiment, if the matching of the current image frame with the prior map data is successful, performing the visual repositioning based on the matched image data includes: if the similarity between the first key feature point and the second key feature point of the image frame corresponding to the track point is larger than a preset similarity threshold value, determining that the current image frame is successfully matched with the prior map data; and performing visual repositioning based on the first key feature point and the second key feature point of the image frame corresponding to the track point.
Specifically, the current image frame includes left camera image data and right camera image data. Performing visual repositioning based on the first key feature points and the second key feature points of the image frame corresponding to the track point includes: respectively extracting first feature points from the left camera image data and second feature points from the right camera image data, the first key feature points including the first feature points and the second feature points; performing binocular epipolar matching on the first feature points and the second feature points, and calculating the three-dimensional space coordinates of the matched first matching feature points; extracting, from the first matching feature points, second matching feature points that match the second key feature points, and converting the three-dimensional space coordinates of the second matching feature points based on a predetermined coordinate projection rule to obtain the projection coordinates of the second matching feature points in the two-dimensional image; and positioning according to the three-dimensional space coordinates of the second matching feature points and their projection coordinates in the two-dimensional image.
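The binocular matching step can be sketched as follows, under the assumption of a rectified stereo pair so that epipolar lines coincide with image rows; the function name and the row tolerance are illustrative and not part of the original disclosure.

```python
import cv2

def stereo_epipolar_match(kp_left, desc_left, kp_right, desc_right, max_row_diff=2.0):
    """Illustrative binocular matching: keep descriptor matches whose key points lie
    on (almost) the same image row, i.e. the epipolar constraint for a rectified pair.
    Returns (left_index, right_index) pairs into kp_left / kp_right."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    pairs = []
    for m in matcher.match(desc_left, desc_right):
        y_left = kp_left[m.queryIdx].pt[1]
        y_right = kp_right[m.trainIdx].pt[1]
        if abs(y_left - y_right) <= max_row_diff:   # same-row epipolar check
            pairs.append((m.queryIdx, m.trainIdx))
    return pairs
```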
The process of calculating the three-dimensional space coordinates of the matched first matching feature points can refer to the description of the previous steps and is not repeated here. After the three-dimensional space coordinates of the matched first matching feature points are obtained, the projection coordinates of the second matching feature points in the two-dimensional image are obtained based on a predetermined coordinate projection rule, for example based on a Perspective-n-Point (PnP) pose estimation algorithm applied to the three-dimensional world coordinates. The objective of the PnP algorithm is to minimize the re-projection error from the three-dimensional coordinates to the two-dimensional image; the re-projection error from the three-dimensional coordinates to the two-dimensional image obtained using the PnP algorithm is shown in FIG. 2.
The principle of solving the pose by using the PNP algorithm is as follows: after the matching relation between the three-dimensional coordinates and the two-dimensional image is established, all the three-dimensional coordinates are projected onto the image plane of the image frame to be solved, and the difference between the coordinates of the projection points and the coordinates of the matching points is calculated, wherein the difference is the re-projection error.
In FIG. 2, q1 to q5 respectively represent the projected points of the three-dimensional coordinates, p1 to p5 respectively represent the matched points, and the line segments s1 to s5 respectively represent the re-projection errors. The pose of the image frame is adjusted so that the sum of the re-projection errors between all projection points and their matching points is minimized; the pose at that moment is then the true pose of the corresponding image frame. Thus, PnP can be defined as the following nonlinear optimization problem:
T_iw* = argmin over T_iw of Σ_k ρ( || π(T_iw P_k) - p_k ||² )
where T_iw represents the pose from the world system to the camera system to be solved, P_k and p_k respectively represent the three-dimensional coordinate of the k-th feature point and the coordinate of its matching point on the image plane of the frame, π(·) denotes projection onto the image plane, ρ(·) is a kernel function, and the summation over k accumulates all the re-projection errors in the image frame.
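As one possible way to obtain the pose from such 3D-2D matches, OpenCV's RANSAC-based PnP solver can be used for an initial estimate, which a full implementation would then refine by minimizing the robust re-projection cost above; the camera intrinsic matrix K, the absence of lens distortion, and the function name are assumptions of this sketch.

```python
import cv2
import numpy as np

def relocalize_pnp(points_3d, points_2d, K):
    """points_3d: Nx3 world coordinates of matched map points.
    points_2d: Nx2 pixel coordinates of their matches in the current frame.
    K:         3x3 camera intrinsic matrix (lens distortion assumed negligible).
    Returns (R, t) taking world coordinates into the camera frame, or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec
```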
S105: and if the matching of the current image frame and the prior map data is unsuccessful, positioning based on the mileage data of the wheel speed meter.
In one embodiment, if the matching of the current image frame with the prior map data is unsuccessful, locating based on the wheel speed mileage data comprises: if the similarity of the first key feature points and the second key feature points of the image frames corresponding to all the track points is smaller than or equal to a preset similarity threshold value, determining that the current image frame is unsuccessfully matched with the prior map data; respectively acquiring pose information of the current moment and the last moment and motion increment of the current moment and the last moment; and positioning according to the pose information and the motion increment.
Specifically, the pose information at the current moment and at the last moment is pose information in the world coordinate system, expressed respectively as (x', y', θ') and (x, y, θ); the motion increment between the last moment and the current moment is the motion increment in the positioning device coordinate system, for example the robot coordinate system, expressed as (dx, dy, dθ). Before positioning is performed according to the pose information and the motion increment, the motion increment needs to be converted from the positioning device coordinate system to the world coordinate system. Specifically, the process of converting the motion increment to the world coordinate system and updating the pose may be expressed as follows:
x' = x + dx cos θ - dy sin θ
y' = y + dx sin θ + dy cos θ
θ' = θ + dθ
where (x, y) and (x', y') respectively denote horizontal positions in the world coordinate system, and θ and θ' respectively denote headings in the world coordinate system.
Further, measurement error can be accounted for by adding noise terms, i.e. the same update is applied with the increments (dx, dy, dθ) replaced by their noise-perturbed values (dx + ε_x, dy + ε_y, dθ + ε_θ), where ε_x, ε_y and ε_θ denote the noise terms.
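A minimal sketch of the dead-reckoning update above is shown below, assuming the increment (dx, dy, dθ) has been read from the wheel speed meter in the positioning device's own coordinate system; noise handling is omitted.

```python
import math

def propagate_pose(x, y, theta, dx, dy, dtheta):
    """Convert the body-frame increment (dx, dy, dtheta) to the world frame and add
    it to the previous pose (x, y, theta); returns the new pose (x', y', theta')."""
    x_new = x + dx * math.cos(theta) - dy * math.sin(theta)
    y_new = y + dx * math.sin(theta) + dy * math.cos(theta)
    theta_new = theta + dtheta
    # Wrap the heading into (-pi, pi] for numerical consistency.
    theta_new = math.atan2(math.sin(theta_new), math.cos(theta_new))
    return x_new, y_new, theta_new
```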
in the positioning task of the two-dimensional scene, the odometer is realized by a wheel speed meter, is not influenced by the environment, and ensures the stability of the system; the visual information is used for loop detection, namely judging whether the robot reaches the place where the robot passes through previously, so that a pose relation is constructed, the accumulated error of the wheel speed meter is corrected, and the positioning accuracy is improved. Compared with a purely visual positioning scheme, the method has lower computational complexity and can be applied to embedded equipment with lower computational effort.
As can be seen from the above analysis, the positioning method based on vision and a wheel speed meter provided in the embodiments of the present application includes: acquiring a current image frame; loading pre-constructed prior map data; matching the current image frame with the prior map data; if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data; and if the matching of the current image frame with the prior map data is unsuccessful, positioning based on the wheel speed meter mileage data. By combining visual detection with a wheel speed meter for positioning, compared with a visual SLAM technology using a three-dimensional positioning and map construction algorithm, the method does not require a large amount of computing resources, avoids the problem of error accumulation, and can realize indoor high-precision positioning.
Referring to fig. 3, fig. 3 is a flow chart of a positioning method based on vision and a wheel speed meter according to another embodiment of the present application. The specific implementation procedures of S301 and S101 and S304 to S307 and S102 to S105 are the same in this embodiment compared with the embodiment shown in fig. 1, except that S302 and S303 are further included before S304. The details are as follows:
s301: a current image frame is acquired.
S302: and analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame.
Specifically, the similarity between the current image frame and each of the preset number of historical image frames is calculated, and if the similarity between the historical image frame and the current image frame is greater than a preset similarity threshold, the historical image frame is determined to be a loop image frame of the current image frame. The algorithm for calculating the similarity between the image frames can be referred to in the embodiment of fig. 1, and will not be described herein.
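The loop detection step can be sketched as comparing the bag-of-word vector of the current frame against those of the preset number of history frames; the cosine similarity and the threshold value used here are illustrative choices, not requirements of the method.

```python
import numpy as np

def detect_loop(query_bow, history_bows, sim_threshold=0.75):
    """Return the index of the most similar history frame if its similarity with the
    current frame's bag-of-word vector exceeds the threshold, otherwise None."""
    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

    best_idx, best_sim = None, 0.0
    for idx, hist_bow in enumerate(history_bows):
        sim = cosine(query_bow, hist_bow)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx if best_sim > sim_threshold else None
```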
S303: and carrying out loop calibration on all the image frames between the current image frame and the loop image frame, and constructing prior map data based on the image frames after the loop calibration.
Wherein loop checking is performed on all image frames between the current image frame and the loop image frame, including: and carrying out loop calibration on all image frames between the current image frame and the loop image frame based on the preset adjacent frame constraint factor and the preset loop constraint factor.
Specifically, the preset adjacent frame constraint factor is used to optimize the pose change between two adjacent frames, and the preset loop constraint factor is used to optimize the pose relation between the current frame and the loop frame. Assume that the original poses of the i-th frame and the (i+1)-th frame have orientations θ_i^o and θ_{i+1}^o and translations t_i^o and t_{i+1}^o, and that the corresponding variables being optimized are θ_i, θ_{i+1}, t_i and t_{i+1}. Let θ_{i,i+1} and t_{i,i+1} denote the relative rotation and translation between the two frames computed from the variables being optimized, and let θ_{i,i+1}^o and t_{i,i+1}^o denote the relative rotation and translation computed from the original poses. The constraint residual between the adjacent frames (also called the adjacent frame constraint factor) is then defined as follows:
r_{i,i+1} = [ θ_{i,i+1} - θ_{i,i+1}^o ; t_{i,i+1} - t_{i,i+1}^o ]
where the first component is the rotation residual and the second component is the translation residual. The parameters to be optimized include {θ_{i,i+1}, t_i, t_{i+1}}. In addition, all the poses in the above formula are expressed in the camera coordinate system: the original poses are obtained from the wheel speed meter, and the poses in the wheel speed meter coordinate system are transformed into the camera system using parameters calibrated in advance.
For the loop constraint, let the loop residual be r_loop. It is defined in the same way as the adjacent frame constraint factor, except that the relative rotation and translation it is compared against are obtained from the pose T_ij estimated visually between the current frame and the loop frame. Thus, the process of performing loop calibration on all image frames between the current image frame and the loop image frame can be defined as the following nonlinear optimization problem:
min Σ_k ||r_{k,k+1}||² + ||r_loop||²
where r_{k,k+1} represents the constraint residual between the k-th frame and the (k+1)-th frame, and r_loop is the loop residual.
It should be understood that in practice a plurality of loop constraints may occur, in which case the above optimization takes the residuals of all adjacent frames and all loop residuals into account. In addition, the optimization problem has three degrees of freedom that are not observable in practice, so null-space drift can occur in the optimization result, namely the optimized poses can be subjected to an arbitrary two-dimensional Euclidean transformation and the result still satisfies the optimization condition. Therefore, after the optimization is completed, all the poses need to be transformed so that the first frame returns to its original pose T_1^o, which eliminates the drift introduced in the optimization. Let the pose of the k-th frame after drift removal be T_k'; then:
T_k' = T_1^o T_1^{-1} T_k
where T_1 and T_k are the optimized poses of the first frame and the k-th frame. The poses obtained after removing the null-space drift are taken as the final result of the loop correction.
S304: the a priori map data is loaded.
S305: the current image frame is matched with the prior map data.
S306: and if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data.
S307: and if the matching of the current image frame and the prior map data is unsuccessful, positioning based on the mileage data of the wheel speed meter.
According to the positioning method based on the vision and the wheel speed meter, which is provided by the embodiment of the application, the loop checking technology is introduced in the process of constructing the prior map data, so that more comprehensive and accurate prior map data can be obtained, and the positioning accuracy is improved in the process of positioning based on the prior map data.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a positioning device based on vision and a wheel speed meter according to an embodiment of the present application. As can be seen from fig. 4, the device 40 comprises:
an acquisition module 401, configured to acquire a current image frame;
a loading module 402, configured to load pre-constructed prior map data;
a matching module 403, configured to match the current image frame with the prior map data;
a first positioning module 404, configured to perform visual repositioning based on the matched image data if the current image frame is successfully matched with the prior map data;
and a second positioning module 405, configured to perform positioning based on the wheel speed meter mileage data if the current image frame is not successfully matched with the prior map data.
In one embodiment, the a priori map data includes: track data and first key feature points of image frames corresponding to each track point in the track data;
A matching module 403, comprising:
an extracting unit for extracting a second key feature point from the current image frame;
and the computing unit is used for computing the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
In one embodiment, the first positioning module 404 includes:
the first determining unit is used for determining that the matching between the current image frame and the prior map data is successful if the similarity between the first key feature point and the second key feature point of the image frame corresponding to the track point is greater than a preset similarity threshold value;
and the first positioning unit is used for performing visual repositioning based on the first key feature points and the second key feature points of the image frames corresponding to the track points.
In an embodiment, the current image frame includes left camera image data and right camera image data;
a first positioning unit comprising:
an extraction subunit, configured to extract a first feature point in the left camera image data and a second feature point in the right camera image data, where the first key feature point includes a first feature point and a second feature point;
the matching subunit is used for carrying out binocular polar line matching on the first characteristic points and the second characteristic points, and calculating three-dimensional space coordinates of the matched first matching characteristic points;
The conversion subunit is used for extracting a second matching characteristic point matched with the second key characteristic point from the first matching characteristic point, converting the three-dimensional space coordinates of the second matching characteristic point based on a predetermined coordinate projection rule, and obtaining projection coordinates of the second matching characteristic point in the two-dimensional image;
and the positioning subunit is used for positioning according to the three-dimensional space coordinates of the second matching characteristic points and the projection coordinates of the second matching characteristic points in the two-dimensional image.
In one embodiment, the second positioning module 405 includes:
the second determining unit is used for determining that the matching between the current image frame and the prior map data is unsuccessful if the similarity between the first key feature points and the second key feature points of the image frames corresponding to all the track points is smaller than or equal to a preset similarity threshold value;
the acquisition unit is used for respectively acquiring pose information of the current moment and the last moment and motion increment of the current moment and the last moment;
and the second positioning unit is used for positioning according to the pose information and the motion increment.
In an embodiment, further comprising:
the analysis module is used for analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame;
And the calibration module is used for carrying out loop calibration on all the image frames between the current image frame and the loop image frame, and constructing the prior map data based on the image frames after the loop calibration.
In one embodiment, the calibration module is specifically configured to:
and carrying out loop calibration on all image frames between the current image frame and the loop image frame based on a preset adjacent frame constraint factor and a preset loop constraint factor.
Referring to fig. 5, fig. 5 is a schematic block diagram of a positioning device provided in an embodiment of the present application. The positioning device 50 includes a binocular camera 501, a processor 502 and a memory 503.
The processor 502 and the memory 503 are illustratively connected by a bus 504, the bus 504 being, for example, an I2C (Inter-integrated Circuit) bus. The processor 502 and the memory 503 may be integrated within a touch-operated screen to form an integrated device.
Specifically, the processor 502 may be a Micro-controller Unit (MCU), a central processing Unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), or the like.
Specifically, the Memory 503 may be a Flash chip, a Read-Only Memory (ROM) disk, an optical disk, a U-disk, a removable hard disk, or the like.
Wherein the processor 502 is adapted to run a computer program stored in the memory 503 and to carry out the steps of the above-mentioned vision-and wheel speed meter-based positioning method when the computer program is executed.
The processor 502 is for example arranged to run a computer program stored in the memory 503 and to implement the following steps when executing the computer program:
acquiring a current image frame;
loading pre-constructed prior map data;
matching the current image frame with the prior map data;
if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data;
and if the matching of the current image frame and the prior map data is unsuccessful, positioning based on the mileage data of the wheel speed meter.
In one embodiment, the a priori map data includes: track data and first key feature points of image frames corresponding to each track point in the track data;
matching the current image frame with the prior map data, including:
extracting a second key feature point from the current image frame;
and respectively calculating the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
In one embodiment, if the matching of the current image frame with the prior map data is successful, performing the visual repositioning based on the matched image data includes:
If the similarity between the first key feature point and the second key feature point of the image frame corresponding to the track point is larger than a preset similarity threshold value, determining that the current image frame is successfully matched with the prior map data;
and performing visual repositioning based on the first key feature point and the second key feature point of the image frame corresponding to the track point.
In an embodiment, the current image frame includes left camera image data and right camera image data;
based on the first key feature point and the second key feature point of the image frame corresponding to the track point, performing visual repositioning, including:
respectively extracting a first characteristic point in the left camera image data and a second characteristic point in the right camera image data, wherein the first key characteristic point comprises the first characteristic point and the second characteristic point;
performing binocular polar line matching on the first characteristic points and the second characteristic points, and calculating three-dimensional space coordinates of the matched first matching characteristic points;
extracting a second matching feature point matched with the second key feature point from the first matching feature point, converting the three-dimensional space coordinates of the second matching feature point based on a predetermined coordinate projection rule, and obtaining projection coordinates of the second matching feature point in the two-dimensional image;
And positioning according to the three-dimensional space coordinates of the second matching feature points and the projection coordinates of the second matching feature points in the two-dimensional image.
In one embodiment, if the matching of the current image frame with the prior map data is unsuccessful, locating based on the wheel speed mileage data comprises:
if the similarity of the first key feature points and the second key feature points of the image frames corresponding to all the track points is smaller than or equal to a preset similarity threshold value, determining that the current image frame is unsuccessfully matched with the prior map data;
respectively acquiring pose information of the current moment and the last moment and motion increment of the current moment and the last moment;
and positioning according to the pose information and the motion increment.
In an embodiment, before loading the pre-constructed prior map data, further comprising:
analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame;
and carrying out loop calibration on all image frames between the current image frame and the loop image frame, and constructing the prior map data based on the image frames after the loop calibration.
In an embodiment, loop calibration is performed for all image frames between a current image frame and a loop image frame, comprising:
And carrying out loop calibration on all image frames between the current image frame and the loop image frame based on the preset adjacent frame constraint factor and the preset loop constraint factor.
Furthermore, the present application also provides a computer-readable storage medium storing a computer program; the computer program, when executed by one or more processors, causes the one or more processors to perform the steps of a vision and wheel speed meter based positioning method.
The computer readable storage medium may be an internal storage unit of the positioning device, such as a hard disk or a memory of the positioning device, among others. The computer readable storage medium may also be an external storage device of the positioning device, such as a plug-in hard disk provided on the positioning device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), etc.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It should also be understood that the term "and/or" as used in this application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
While the present application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A vision and wheel speed meter based positioning method, comprising:
acquiring a current image frame;
loading pre-constructed prior map data;
matching the current image frame with the prior map data;
if the current image frame is successfully matched with the prior map data, performing visual repositioning based on the matched image data;
and if the current image frame is not successfully matched with the prior map data, positioning based on the wheel speed meter mileage data.
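For orientation only, the claimed flow can be read as a simple dispatch between the two positioning branches. The sketch below passes the matching, visual repositioning and odometry fallback steps in as callables, since their internals are detailed in the dependent claims; every name in it is a hypothetical placeholder rather than language from the claims.

```python
from typing import Any, Callable, Optional

def locate(current_frame: Any,
           prior_map: Any,
           match_fn: Callable[[Any, Any], Optional[Any]],
           relocalize_fn: Callable[[Any], Any],
           fallback_fn: Callable[[], Any]) -> Any:
    """Sketch of the overall flow: map matching first, wheel odometry as fallback."""
    match = match_fn(current_frame, prior_map)   # matching against prior map data (claims 2-3)
    if match is not None:
        return relocalize_fn(match)              # visual repositioning (claim 4)
    return fallback_fn()                         # wheel speed meter odometry (claim 5)
```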
2. The vision and wheel speed meter based positioning method of claim 1, wherein the prior map data comprises: track data and first key feature points of image frames corresponding to each track point in the track data;
the matching the current image frame with the prior map data comprises:
extracting a second key feature point from the current image frame;
and respectively calculating the similarity between the first key feature point and the second key feature point of the image frame corresponding to each track point.
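One concrete reading of this similarity computation is a descriptor-matching score: the second key feature points of the current image frame are described by binary descriptors and compared against the first key feature points stored for each track point. The sketch below uses a Lowe ratio test over Hamming distances; the descriptor type, ratio and normalization are assumptions for illustration.

```python
import cv2

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def similarity_to_track_point(des_current, des_map, ratio=0.75):
    """Similarity between the current frame's key feature descriptors (des_current)
    and the key feature descriptors stored for one track point (des_map)."""
    if des_current is None or len(des_current) == 0 or des_map is None or len(des_map) < 2:
        return 0.0
    pairs = matcher.knnMatch(des_current, des_map, k=2)
    good = sum(1 for m, n in pairs if m.distance < ratio * n.distance)
    return good / max(1, len(des_current))   # fraction of confidently matched descriptors
```

Under this reading, the success test of claim 3 amounts to checking whether the best such score over all track points exceeds the preset similarity threshold.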
3. The vision and wheel speed meter based positioning method of claim 2, wherein the performing visual repositioning based on the matched image data if the current image frame is successfully matched with the prior map data comprises:
if the similarity between the first key feature points of the image frames corresponding to the track points and the second key feature points is larger than a preset similarity threshold, determining that the current image frame is successfully matched with the prior map data;
and performing visual repositioning based on the first key feature points of the image frames corresponding to the track points and the second key feature points.
4. The vision and wheel speed meter based positioning method of claim 3, wherein the current image frame comprises left camera image data and right camera image data;
the performing visual repositioning based on the first key feature point of the image frame corresponding to the track point and the second key feature point comprises:
respectively extracting a first feature point in the left camera image data and a second feature point in the right camera image data, wherein the first key feature point comprises the first feature point and the second feature point;
performing binocular epipolar matching on the first feature points and the second feature points, and calculating three-dimensional space coordinates of the matched first matching feature points;
extracting a second matching feature point matched with the second key feature point from the first matching feature point, converting the three-dimensional space coordinates of the second matching feature point based on a predetermined coordinate projection rule, and obtaining projection coordinates of the second matching feature point in a two-dimensional image;
and positioning according to the three-dimensional space coordinates of the second matching characteristic points and the projection coordinates of the second matching characteristic points in the two-dimensional image.
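The binocular half of this claim (epipolar matching of left and right feature points followed by depth recovery) corresponds to standard stereo triangulation. A minimal sketch using OpenCV, assuming the 3x4 projection matrices of the two cameras are known from stereo calibration; the function name and argument layout are illustrative assumptions.

```python
import numpy as np
import cv2

def stereo_triangulate(pts_left, pts_right, P_left, P_right):
    """Compute three-dimensional space coordinates for left/right matched feature points.

    pts_left, pts_right: (N, 2) pixel coordinates of the matched feature points
    P_left, P_right:     3x4 projection matrices of the left and right cameras
    """
    pts4d = cv2.triangulatePoints(
        P_left, P_right,
        np.asarray(pts_left, dtype=np.float64).T,
        np.asarray(pts_right, dtype=np.float64).T)
    return (pts4d[:3] / pts4d[3]).T   # (N, 3) Euclidean coordinates in the camera frame
```

The remaining steps of the claim then reduce to the reprojection-based solve sketched after the corresponding embodiment above.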
5. The vision and wheel speed meter based positioning method of claim 2, wherein the positioning based on the wheel speed meter mileage data if the current image frame is not successfully matched with the prior map data comprises:
if the similarities between the first key feature points of the image frames corresponding to all the track points and the second key feature points are smaller than or equal to a preset similarity threshold, determining that the current image frame is unsuccessfully matched with the prior map data;
respectively acquiring the pose information at the current moment and the last moment, and the motion increment between the last moment and the current moment;
and positioning according to the pose information and the motion increment.
6. The vision and wheel speed meter based positioning method according to any one of claims 1 to 5, further comprising, before the loading of the pre-constructed prior map data:
analyzing the current image frame based on a visual loop detection algorithm to obtain a loop image frame;
and carrying out loop calibration on all image frames between the current image frame and the loop image frame, and constructing the prior map data based on the image frames after the loop calibration.
7. The vision and wheel speed meter based positioning method of claim 6, wherein the carrying out loop calibration on all image frames between the current image frame and the loop image frame comprises:
and carrying out loop calibration on all image frames between the current image frame and the loop image frame based on a preset adjacent frame constraint factor and a preset loop constraint factor.
8. A vision and wheel speed meter based positioning device, comprising:
an acquisition module, used for acquiring the current image frame;
a loading module, used for loading the pre-constructed prior map data;
a matching module, used for matching the current image frame with the prior map data;
a first positioning module, used for performing visual repositioning based on the matched image data if the current image frame is successfully matched with the prior map data;
and a second positioning module, used for positioning based on the wheel speed meter mileage data if the current image frame is not successfully matched with the prior map data.
9. A positioning apparatus, comprising:
a memory and a processor;
the memory is used for storing a computer program;
the processor is used for executing the computer program and implementing, when executing the computer program, the steps of the vision and wheel speed meter based positioning method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program;
the computer program, when executed by one or more processors, causes the one or more processors to perform the steps of the vision and wheel speed meter based positioning method as claimed in any one of claims 1 to 7.
CN202311409697.3A (priority date 2023-10-26, filing date 2023-10-26): Positioning method, device, equipment and storage medium based on vision and wheel speed meter. Status: Withdrawn. Published as CN117451052A (en).

Priority Applications (1)

Application Number: CN202311409697.3A
Priority Date: 2023-10-26
Filing Date: 2023-10-26
Title: Positioning method, device, equipment and storage medium based on vision and wheel speed meter

Publications (1)

Publication Number: CN117451052A
Publication Date: 2024-01-26

Family ID: 89590500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311409697.3A Withdrawn CN117451052A (en) 2023-10-26 2023-10-26 Positioning method, device, equipment and storage medium based on vision and wheel speed meter

Country Status (1)

Country Link
CN (1) CN117451052A (en)

Legal Events

PB01: Publication
WW01: Invention patent application withdrawn after publication (application publication date: 2024-01-26)