CN115936029A - SLAM positioning method and device based on two-dimensional code - Google Patents

SLAM positioning method and device based on two-dimensional code

Info

Publication number
CN115936029A
Authority
CN
China
Prior art keywords
dimensional code
pose
camera
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211602729.7A
Other languages
Chinese (zh)
Other versions
CN115936029B (en)
Inventor
秦兆博
李琦
谢国涛
王晓伟
秦洪懋
秦晓辉
徐彪
丁荣军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Institute Of Intelligent Control Hunan University
Original Assignee
Wuxi Institute Of Intelligent Control Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Institute Of Intelligent Control Hunan University filed Critical Wuxi Institute Of Intelligent Control Hunan University
Priority to CN202211602729.7A priority Critical patent/CN115936029B/en
Publication of CN115936029A publication Critical patent/CN115936029A/en
Application granted granted Critical
Publication of CN115936029B publication Critical patent/CN115936029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a SLAM positioning method and device based on two-dimensional codes, wherein the method comprises: step 1, acquiring images around a vehicle with a vehicle-mounted camera, and extracting a two-dimensional code and natural feature points from the images; step 2, parsing the two-dimensional code to obtain the two-dimensional code pose, and acquiring the position information of the natural feature points; step 3, determining the camera pose according to the two-dimensional code pose and the position information of the natural feature points. Step 3 comprises: constructing a residual function comprising a natural-feature-point projection error and a two-dimensional code positioning error, constructing a least squares problem from the residual function, solving for the minimum of the residual function, and determining the camera pose corresponding to that minimum as the camera pose. By fusing visual SLAM with two-dimensional code positioning, the adopted scheme realizes more accurate and stable positioning, with better precision, stability, and robustness.

Description

SLAM positioning method and device based on two-dimensional code
Technical Field
The invention relates to the technical field of automatic driving, in particular to a SLAM positioning method and device based on two-dimensional codes.
Background
With the development of automatic driving technology, more and more enterprises and research institutions have begun to focus on bringing automatic driving to practical deployment. Automated Valet Parking (AVP) enables a vehicle to drive itself within a specific area, from the parking-lot entrance/exit to a parking space. Because of its low driving speed and the controlled operating environment in which safety can be guaranteed to the greatest extent, it is widely regarded in industry as the L4-level automatic-driving scenario likely to be deployed first. Positioning is an essential part of automated valet parking and is a prerequisite for vehicle control and decision-making. At present, positioning in parking-lot environments mainly relies on SLAM (Simultaneous Localization And Mapping) based on sensors such as cameras and lidar, which solves the problems of localizing the vehicle and building a map while the vehicle moves through the parking lot.
Currently, SLAM is mainly divided into two types according to sensor configuration: 1) laser SLAM (Lidar SLAM) based on lidar; 2) visual SLAM (VSLAM) based on cameras.
However, using laser SLAM for the positioning and mapping of automated valet parking requires installing and deploying a lidar on the vehicle, and lidar is expensive; moreover, laser SLAM cannot exploit the abundant semantic texture information of parking spaces in the lot and cannot distinguish the relative positions of parking spaces. Visual SLAM uses image information extracted by a camera, is strongly affected by ambient light, and cannot work in areas with weak illumination or poor texture; an indoor parking lot may be dimly lit and consist largely of texture-less regions such as white walls, so purely feature-point-based visual SLAM performs poorly there. In addition, owing to calibration errors, sensor errors, noise, and other influences, the accumulated error grows gradually with running time and distance, degrading positioning accuracy.
Therefore, it is desirable to provide a new SLAM positioning scheme.
Disclosure of Invention
The present invention is directed to a SLAM positioning method and device based on two-dimensional codes that overcome, or at least alleviate, at least one of the above-mentioned drawbacks of the prior art.
To achieve the above object, the present invention provides a SLAM positioning method based on two-dimensional codes, applied to an automated valet parking vehicle, comprising:
step 1, acquiring images around the vehicle with a vehicle-mounted camera, and extracting a two-dimensional code and natural feature points from the images;
step 2, parsing the two-dimensional code to obtain the two-dimensional code pose, and acquiring the position information of the natural feature points;
step 3, determining the camera pose according to the two-dimensional code pose and the position information of the natural feature points; step 3 comprises:
constructing a residual function comprising a natural-feature-point projection error and a two-dimensional code positioning error, the residual function being represented by the following formula:
f(x) = Σ_{i=1}^{k} Σ_{j=1}^{n} e_{P,i,j}^T Ω_{P,i,j} e_{P,i,j} + Σ_{i=1}^{k} Σ_{s=1}^{t} e_{M,i,s}^T Ω_{M,i,s} e_{M,i,s}
wherein f(x) denotes the residual function; i denotes the current i-th frame image, with k key-frame images in total; j denotes the current j-th map point, with n map points in total; s denotes the s-th two-dimensional code, and t denotes the number of two-dimensional codes currently detected;
e_{P,i,j} denotes the natural-feature-point projection error, with the expression
e_{P,i,j} = l_{ij} - (1/s_i) K T_i P_j
wherein P_j = [X_j, Y_j, Z_j]^T denotes the j-th map point in the world coordinate system, with X_j, Y_j, Z_j its coordinates in the world coordinate system; T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T denotes the actual pixel coordinate of point P_j in the i-th frame image; (1/s_i) K T_i P_j is the projected pixel coordinate of point P_j in the i-th frame image; s_i denotes the camera scale depth corresponding to the i-th frame image; and K denotes the camera intrinsic matrix;
Ω_{P,i,j} denotes the information matrix, i.e. the inverse of the preset visual-observation covariance matrix;
e_{M,i,s} denotes the two-dimensional code positioning error, with the expression e_{M,i,s} = T_{WMis} T_{MCis} - T_i, wherein T_{WMis} denotes the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image; T_{MCis} denotes the camera pose recovered by positioning from the s-th two-dimensional code at the i-th frame image; and T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image;
Ω_{M,i,s} denotes the information matrix, i.e. the inverse of the preset covariance matrix between the two-dimensional code pose and the camera pose variables;
constructing a least squares problem according to the residual function, solving for the minimum of the residual function, and determining the camera pose T_i corresponding to that minimum as the camera pose in the world coordinate system at the i-th frame image.
Preferably, step 2 comprises:
acquiring the identifier of the two-dimensional code and the coordinates of its four corner points.
Preferably, the method further comprises: when a previously seen two-dimensional code is detected, determining that a loop closure has occurred, and performing loop detection;
the loop detection comprises the following steps:
constructing a second residual function based on the two-dimensional code positioning constraint, the natural-feature-point projection constraint, and the loop detection constraint, expressed by the following formula:
f2(x) = f(x) + Σ_i || x_i - (1/s_i) K T P_i ||^2
wherein T denotes the camera pose of the frame being optimized; x_i denotes the pixel coordinate in the current frame corresponding to the i-th map point;
s_i denotes the depth of the i-th map point;
P_i denotes the world coordinate of the i-th map point in the loop frame, the loop frame being the frame at which the two-dimensional code last appeared;
and solving for the minimum of the second residual function for each frame image from the loop frame to the current frame, obtaining the camera pose corresponding to each minimum, and updating the camera pose of each frame from the loop frame to the current frame with the obtained camera poses.
The embodiment of the invention also provides a SLAM positioning device based on two-dimensional codes, applied to an automated valet parking vehicle, comprising:
an acquisition module, configured to acquire images around the vehicle with the vehicle-mounted camera and extract the two-dimensional code and natural feature points from the images;
a processing module, configured to parse the two-dimensional code to obtain the two-dimensional code pose, acquire the position information of the natural feature points, and determine the camera pose according to the two-dimensional code pose and the position information of the natural feature points;
the processing module determines the camera pose according to the two-dimensional code pose and the position information of the natural feature points as follows:
constructing a residual function comprising a natural-feature-point projection error and a two-dimensional code positioning error, the residual function being represented by the following formula:
f(x) = Σ_{i=1}^{k} Σ_{j=1}^{n} e_{P,i,j}^T Ω_{P,i,j} e_{P,i,j} + Σ_{i=1}^{k} Σ_{s=1}^{t} e_{M,i,s}^T Ω_{M,i,s} e_{M,i,s}
wherein f(x) denotes the residual function; i denotes the current i-th frame image, with k key-frame images in total; j denotes the current j-th map point, with n map points in total; s denotes the s-th two-dimensional code, and t denotes the number of two-dimensional codes currently detected;
e_{P,i,j} denotes the natural-feature-point projection error, with the expression
e_{P,i,j} = l_{ij} - (1/s_i) K T_i P_j
wherein P_j = [X_j, Y_j, Z_j]^T denotes the j-th map point in the world coordinate system, with X_j, Y_j, Z_j its coordinates in the world coordinate system; T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T denotes the actual pixel coordinate of point P_j in the i-th frame image; (1/s_i) K T_i P_j is the projected pixel coordinate of point P_j in the i-th frame image; s_i denotes the camera scale depth corresponding to the i-th frame image; and K denotes the camera intrinsic matrix;
Ω_{P,i,j} denotes the information matrix, i.e. the inverse of the preset visual-observation covariance matrix;
e_{M,i,s} denotes the two-dimensional code positioning error, with the expression e_{M,i,s} = T_{WMis} T_{MCis} - T_i, wherein T_{WMis} denotes the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image; T_{MCis} denotes the camera pose recovered by positioning from the s-th two-dimensional code at the i-th frame image; and T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image;
Ω_{M,i,s} denotes the information matrix, i.e. the inverse of the preset covariance matrix between the two-dimensional code pose and the camera pose variables;
constructing a least squares problem according to the residual function, solving for the minimum of the residual function, and determining the camera pose T_i corresponding to that minimum as the camera pose in the world coordinate system at the i-th frame image.
Preferably, the processing module is configured to:
acquire the identifier of the two-dimensional code and the coordinates of its four corner points.
Preferably, the processing module is further configured to: when a previously seen two-dimensional code is detected, determine that a loop closure has occurred and perform loop detection;
the loop detection comprises the following steps:
constructing a second residual function based on the two-dimensional code positioning constraint, the natural-feature-point projection constraint, and the loop detection constraint, expressed by the following formula:
f2(x) = f(x) + Σ_i || x_i - (1/s_i) K T P_i ||^2
wherein T denotes the camera pose of the frame being optimized; x_i denotes the pixel coordinate in the current frame corresponding to the i-th map point;
s_i denotes the depth of the i-th map point;
P_i denotes the world coordinate of the i-th map point in the loop frame, the loop frame being the frame at which the two-dimensional code last appeared;
and solving for the minimum of the second residual function for each frame image from the loop frame to the current frame, obtaining the camera pose corresponding to each minimum, and updating the camera pose of each frame from the loop frame to the current frame with the obtained camera poses.
Owing to the above technical scheme, the invention has the following advantages:
By fusing visual SLAM with two-dimensional code positioning, arranging visual two-dimensional codes, and fusing the artificial-tag information with natural features, the adopted scheme realizes more accurate and stable positioning. It can effectively reduce the accumulated error of classical SLAM positioning in indoor scenes such as parking lots, can reach centimeter-level positioning accuracy, and offers better accuracy, stability, and robustness. When no two-dimensional code is detected, the vehicle pose can be recovered by the visual SLAM method; when a two-dimensional code is detected, the vehicle pose is initialized and a more accurate vehicle pose is recovered from the two-dimensional code. Moreover, when the vehicle passes a previously visited position or tracking fails, loop detection and relocalization based on two-dimensional code features enable fast, stable, and accurate positioning of the vehicle.
Drawings
Fig. 1 is a schematic flowchart of a SLAM positioning method based on a two-dimensional code according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of a SLAM positioning method based on two-dimensional codes according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating extraction of natural feature points in the method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a two-dimensional code in the method according to an embodiment of the present invention.
Fig. 5 is a schematic view of a loop detection scenario in the method according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of coordinate transformation in a method according to another embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a SLAM positioning device based on a two-dimensional code according to an embodiment of the present invention.
Detailed Description
In the drawings, the same or similar reference numerals are used to designate the same or similar elements or elements having the same or similar functions. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
In the description of the present invention, the terms "central", "longitudinal", "lateral", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and therefore, should not be construed as limiting the scope of the present invention.
In the present invention, the technical features of the embodiments and implementations may be combined with each other without conflict, and the present invention is not limited to the embodiments or implementations in which the technical features are located.
The present invention will be further described with reference to the accompanying drawings and specific embodiments, it should be noted that the technical solutions and design principles of the present invention are described in detail in the following only by way of an optimized technical solution, but the scope of the present invention is not limited thereto.
The visual SLAM positioning method mainly comprises four parts: 1) a front-end visual odometer; 2) back-end nonlinear optimization; 3) loop detection; 4) mapping. First, parking-lot image information is collected by a monocular, binocular, or RGB-D depth camera and transmitted to the front-end visual odometer; through feature extraction and matching and motion estimation, the visual odometer is initialized, the relative pose between image frames is calculated, and the camera position and attitude are estimated. The back-end nonlinear-optimization thread generally adopts bundle adjustment (BA): it receives the camera poses measured by the visual odometer at different times together with loop-detection information, and eliminates and optimizes errors to obtain a globally consistent trajectory and map. Loop detection determines whether the vehicle has returned to a previous position; if a loop is detected, the information is provided to the back end for processing. The mapping procedure establishes and stores a map matching the task requirements according to the trajectory estimated by the camera.
For convenience of explanation, some concepts involved in the present invention are explained below.
Natural feature points: points in the matched point pairs extracted from the images acquired by the camera, corresponding to spatial three-dimensional points.
Map points: spatial points calculated and recovered by the method provided by the invention, matched to and corresponding to actual spatial three-dimensional points.
World coordinate system: the camera coordinate system at the initial frame is taken as the world coordinate system, with the camera position as its origin.
Camera coordinate system: the current camera position is the origin; the forward direction along the optical axis is the Z axis, leftward is the X axis, and upward is the Y axis.
Scale depth of the camera: the scale of the distance between map points and the camera; generally, the median of the distances from all map points to the camera in the initial frame is taken as the initial scale, which is then treated as an optimization variable.
Depth of a map point: the distance from the map point to the camera.
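The scale-depth initialization described above can be sketched in a few lines. This is a minimal illustration; the array values and the function name are invented for the example:

```python
import numpy as np

def initial_scale(map_points: np.ndarray, camera_center: np.ndarray) -> float:
    """Median distance from all initial-frame map points to the camera,
    used as the initial value of the scale-depth optimization variable."""
    dists = np.linalg.norm(map_points - camera_center, axis=1)
    return float(np.median(dists))

pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 4.0], [0.0, 1.0, 6.0]])
cam = np.zeros(3)   # world origin: the camera position at the initial frame
s0 = initial_scale(pts, cam)
```

The returned median is then refined as an optimization variable during bundle adjustment, as the definition states.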
The embodiment of the invention provides a SLAM positioning method based on two-dimensional codes, applied to an automated valet parking vehicle, comprising the following steps:
step 1, acquiring images around the vehicle with a vehicle-mounted camera, and extracting a two-dimensional code and natural feature points from the images;
step 2, parsing the two-dimensional code to obtain the two-dimensional code pose, and acquiring the position information of the natural feature points;
step 3, determining the camera pose according to the two-dimensional code pose and the position information of the natural feature points; step 3 comprises:
constructing a residual function comprising a natural-feature-point projection error and a two-dimensional code positioning error, the residual function being represented by the following formula:
f(x) = Σ_{i=1}^{k} Σ_{j=1}^{n} e_{P,i,j}^T Ω_{P,i,j} e_{P,i,j} + Σ_{i=1}^{k} Σ_{s=1}^{t} e_{M,i,s}^T Ω_{M,i,s} e_{M,i,s}
wherein f(x) denotes the residual function; i denotes the current i-th frame image, with k key-frame images in total; j denotes the current j-th map point, with n map points in total; s denotes the current s-th two-dimensional code, with t two-dimensional codes in total;
e_{P,i,j} denotes the natural-feature-point projection error, with the expression
e_{P,i,j} = l_{ij} - (1/s_i) K T_i P_j
wherein P_j = [X_j, Y_j, Z_j]^T denotes the j-th map point in the world coordinate system, with X_j, Y_j, Z_j its coordinates in the world coordinate system; T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T denotes the actual pixel coordinate of point P_j in the i-th frame image; (1/s_i) K T_i P_j is the projected pixel coordinate of point P_j in the i-th frame image; s_i denotes the camera scale depth corresponding to the i-th frame image; and K denotes the camera intrinsic matrix;
Ω_{P,i,j} denotes the information matrix, i.e. the inverse Q_{P,i,j}^{-1} of the visual-observation covariance matrix Q_{P,i,j};
Q_{P,i,j} is the covariance matrix whose entries σ(e_{P,1,1}, e_{P,1,1}) through σ(e_{P,i,j}, e_{P,i,j}) are preset variances;
e_{M,i,s} denotes the two-dimensional code positioning error, with the expression e_{M,i,s} = T_{WMis} T_{MCis} - T_i, wherein T_{WMis} denotes the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image; T_{MCis} denotes the camera pose recovered by positioning from the s-th two-dimensional code at the i-th frame image; and T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image;
Ω_{M,i,s} denotes the information matrix, i.e. the inverse Q_{M,i,s}^{-1} of the covariance matrix Q_{M,i,s} between the two-dimensional code pose and the camera pose;
Q_{M,i,s} is the covariance matrix whose entries σ(e_{M,1,1}, e_{M,1,1}) through σ(e_{M,i,s}, e_{M,i,s}) are preset variances;
constructing a least squares problem according to the residual function, solving for the minimum of the residual function, and determining the camera pose T_i corresponding to that minimum as the camera pose in the world coordinate system at the i-th frame image.
Step 2 comprises:
acquiring the identifier of the two-dimensional code and the coordinates of its four corner points.
The method further comprises: when a previously seen two-dimensional code is detected, determining that a loop closure has occurred, and performing loop detection;
the loop detection comprises:
constructing a second residual function based on the two-dimensional code positioning constraint, the natural-feature-point projection constraint, and the loop detection constraint, expressed by the following formula:
f2(x) = f(x) + Σ_i || x_i - (1/s_i) K T P_i ||^2
wherein T denotes the camera pose of the frame being optimized; x_i denotes the pixel coordinate in the current frame corresponding to the i-th map point;
s_i denotes the depth of the i-th map point;
P_i denotes the world coordinate of the i-th map point in the loop frame, the loop frame being the frame at which the two-dimensional code last appeared;
and solving for the minimum of the second residual function for each frame image from the loop frame to the current frame, obtaining the camera pose corresponding to each minimum, and updating the camera pose of each frame from the loop frame to the current frame with the obtained camera poses.
The SLAM positioning method based on two-dimensional codes provided by the invention is described below with a specific embodiment. As shown in Fig. 2, the method comprises:
Step 21: extracting natural feature points from the camera image.
Natural feature points are points in matched point pairs derived from images captured by the camera. For example, while the vehicle is driving, a visual odometer is used for positioning: feature points are extracted and matched from monocular camera images, and the points in the matched pairs are the natural feature points. Assume two frames I1 and I2 with a well-matched feature-point pair p1 and p2. As shown in Fig. 3, O1 and O2 are the camera centers, P is a map point, p1 and p2 are the projections of P onto the imaging planes I1 and I2 respectively, the line O1O2 intersects the two image planes at e1 and e2, and l1 and l2 are the intersection lines of the image planes with the plane O1O2P.
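The geometry of Fig. 3 can be checked numerically. The sketch below is an illustration under assumptions not stated in the patent: it uses the standard essential-matrix epipolar constraint x2^T E x1 = 0 with E = [t]x R, which any correctly matched pair p1, p2 observing the same map point P must satisfy:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Camera 1 (center O1) at the origin; camera 2 (center O2) rotated about Y
# and translated along X, so that X_c2 = R @ X_c1 + t.
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.0])

P = np.array([0.5, -0.2, 5.0])   # map point P in the camera-1 frame
x1 = P / P[2]                    # normalized image coordinates of p1 in I1
P2 = R @ P + t                   # P expressed in the camera-2 frame
x2 = P2 / P2[2]                  # normalized image coordinates of p2 in I2

E = skew(t) @ R                  # essential matrix
residual = float(x2 @ E @ x1)    # epipolar constraint: should be ~0
```

The residual vanishes because (RP + t)^T [t]x (RP) expands into two terms that are each identically zero; this is the constraint the odometer's feature matching relies on.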
Step 22: detecting the two-dimensional code to obtain its position information.
The two-dimensional code position information comprises the code ID and the pixel coordinates of the code's four corner points. The two-dimensional code tag adopted by the invention is a binary square fiducial marker usable for camera pose estimation, including but not limited to ArUco and AprilTag codes. It consists mainly of a wide black border and an internal binary matrix that determines its identifier (id), as shown in Fig. 4. The black border facilitates fast detection in the image, and the internal binary code identifies the marker and provides error detection and correction. The marker size determines the size of the internal matrix; for example, a 6x6 marker consists of 36 binary bits. Its main advantages are simple, fast detection and high robustness.
The detected image is first binarized with an adaptive-threshold method: the gray value of every pixel is split into 0 or 255 according to a local threshold, so that the tag region contains only black and white. The contour of the two-dimensional code is then extracted from the thresholded image, and after contour extraction the pixel coordinates of the four corner points are obtained in clockwise order. Because the environment, the sensor, and transmission introduce noise that blurs the image and produces isolated pixels, contours that are too large or too small must be discarded by filtering. The marker image is then perspective-transformed into a standard front view (a square), white and black regions are separated by thresholding, the resulting black-and-white grid of cells is matched against a dictionary, and the two-dimensional code ID is identified in the dictionary.
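The adaptive-threshold binarization step can be illustrated as follows. This is a minimal mean-of-neighborhood sketch; the concrete threshold rule, window size, and constant are assumptions (a real implementation would typically use a library routine such as OpenCV's ArUco detector):

```python
import numpy as np

def adaptive_binarize(gray: np.ndarray, block: int = 3, c: float = 2.0) -> np.ndarray:
    """Split each pixel's gray value into 0 or 255 by comparing it with the
    mean of its (2*block+1)^2 neighborhood minus a small constant c."""
    h, w = gray.shape
    padded = np.pad(gray.astype(float), block, mode="edge")
    out = np.zeros_like(gray, dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 2 * block + 1, x:x + 2 * block + 1]
            out[y, x] = 255 if gray[y, x] > win.mean() - c else 0
    return out

# Toy image: a dark square (marker border) on a bright background.
img = np.full((12, 12), 200, dtype=np.uint8)
img[3:9, 3:9] = 30
binary = adaptive_binarize(img)
```

On the toy image the dark marker region maps to 0 and the bright background to 255, which is exactly the black/white separation that the subsequent contour extraction relies on.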
Step 23: determining the camera pose based on the natural feature points and the two-dimensional code position information.
In this step, a residual function comprising the natural-feature-point projection error and the two-dimensional code positioning error is constructed and represented by the following formula:
f(x) = Σ_{i=1}^{k} Σ_{j=1}^{n} e_{P,i,j}^T Ω_{P,i,j} e_{P,i,j} + Σ_{i=1}^{k} Σ_{s=1}^{t} e_{M,i,s}^T Ω_{M,i,s} e_{M,i,s}
wherein f(x) denotes the residual function; i denotes the current i-th frame image, with k key-frame images in total; j denotes the current j-th map point, with n map points in total; s denotes the current s-th two-dimensional code, with t two-dimensional codes in total;
e_{P,i,j} denotes the natural-feature-point projection error, with the expression
e_{P,i,j} = l_{ij} - (1/s_i) K T_i P_j
wherein P_j = [X_j, Y_j, Z_j]^T denotes the j-th map point in the world coordinate system, with X_j, Y_j, Z_j its coordinates in the world coordinate system; T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T denotes the actual pixel coordinate of point P_j in the i-th frame image; (1/s_i) K T_i P_j is the projected pixel coordinate of point P_j in the i-th frame image; s_i denotes the camera scale depth corresponding to the i-th frame image; and K denotes the camera intrinsic matrix;
Q_{P,i,j} denotes the preset visual-observation covariance matrix, and Ω_{P,i,j} = Q_{P,i,j}^{-1} is the corresponding information matrix;
e_{M,i,s} denotes the two-dimensional code positioning error, with the expression e_{M,i,s} = T_{WMis} T_{MCis} - T_i, wherein T_{WMis} denotes the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image; T_{MCis} denotes the camera pose recovered by positioning from the s-th two-dimensional code at the i-th frame image; and T_i denotes the camera pose in the world coordinate system corresponding to the i-th frame image;
Q_{M,i,s} denotes the preset covariance matrix between the two-dimensional code pose and the camera pose, and Ω_{M,i,s} = Q_{M,i,s}^{-1} is the corresponding information matrix;
A least squares problem is constructed according to the residual function; the minimum of the residual function is solved, and the camera pose T_i corresponding to that minimum is determined as the camera pose in the world coordinate system at the i-th frame image.
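A toy numeric evaluation of the two residual terms may clarify the construction. The values below are invented for illustration, and identity information matrices are assumed; e_P is computed as a pixel difference and e_M as the matrix difference the expressions above define:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])              # illustrative intrinsic matrix

def project(T, P, s):
    """Projected pixel (1/s) K (R P + t) of world point P under pose T."""
    Pc = T[:3, :3] @ P + T[:3, 3]
    return (K @ Pc / s)[:2]

T_i = np.eye(4)                              # camera pose for frame i
P_j = np.array([0.2, -0.1, 4.0])             # map point j (world frame)
s_i = 4.0                                    # scale depth of frame i

# Observed pixel l_ij: true projection plus a small measurement error.
l_ij = project(T_i, P_j, s_i) + np.array([0.5, -0.3])

e_P = l_ij - project(T_i, P_j, s_i)          # natural-feature projection error

T_WM = np.eye(4); T_WM[0, 3] = 2.0           # code pose in world frame (map)
T_MC = np.linalg.inv(T_WM) @ T_i             # camera pose recovered from the code
e_M = T_WM @ T_MC - T_i                      # two-dimensional code positioning error

f = float(e_P @ e_P + np.sum(e_M * e_M))     # residual, identity weights
```

Here the code-derived pose is consistent with the current estimate, so e_M vanishes and the residual reduces to the squared pixel error; in the full method, the solver adjusts T_i until both terms are jointly minimized.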
The projection error of the natural characteristic point and the positioning error of the two-dimensional code can be obtained through the following analysis.
Based on the camera constraint model, let a point in the camera coordinate system be [X, Y, Z]^T with image pixel coordinates [u, v]^T. The pinhole camera imaging model then gives:

$$ s\begin{bmatrix}u\\v\\1\end{bmatrix}=K\begin{bmatrix}X\\Y\\Z\end{bmatrix} $$

Then, for a feature map point P_w in the world coordinate system and camera pose T, there is

$$ s\,l=K\,T\,P_w $$

namely:

$$ s\begin{bmatrix}u\\v\\1\end{bmatrix}=K\,T\,P_w $$

wherein the matrix

$$ K=\begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1\end{bmatrix} $$

is the camera intrinsic matrix, obtainable by camera calibration or from factory parameters; T is the camera pose, and P_w is the coordinate of the feature map point in the world coordinate system.
If a point in space P_i = [X_i, Y_i, Z_i]^T has actual pixel coordinates l_i = [u_i, v_i]^T and the camera pose is T, the projected pixel coordinate is

$$ \hat{l}_i=\frac{1}{s_i}K\,T\,P_i $$

where s_i represents the scale depth. The scale depth is the scale of the distance between a map point and the camera; generally the median of the distances from all map points to the camera in the initial frame is taken as the initial scale, which is then refined as an optimization variable. Because the observations are noisy, to find the optimal pose T a visual reprojection error is constructed and minimized:

$$ T^{*}=\arg\min_{T}\frac{1}{2}\sum_{i=1}^{n}\left\|l_i-\frac{1}{s_i}K\,T\,P_i\right\|_2^2 \tag{14} $$
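The reprojection residual inside (14) can be written as a short numpy helper. This is a hedged sketch, not the patent's implementation; the intrinsic values are illustrative, and the scale depth s is taken as the point's z-coordinate in the camera frame:

```python
import numpy as np

def reprojection_error(l_obs, K, T, P_w):
    """e = l - (1/s) K (T P_w): observed pixel minus projected pixel."""
    P_c = T[:3, :3] @ P_w + T[:3, 3]      # map point transformed into the camera frame
    s = P_c[2]                            # scale depth of this observation
    proj = (K @ P_c) / s                  # homogeneous pinhole projection
    return l_obs - proj[:2]               # 2D pixel residual

# assumed intrinsics; identity pose; a point 2 m ahead projects to the principal point
K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
T = np.eye(4)
err = reprojection_error(np.array([320., 240.]), K, T, np.array([0., 0., 2.]))
```

Summing such residuals over all observed map points gives exactly the cost minimized in (14).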
Similarly, based on the prearranged two-dimensional code, the known pose of the two-dimensional code in the world coordinate system is T_WM, the camera pose recovered by positioning against the two-dimensional code is T_MC, the world pose of the camera is T, and the projected pose is T_WM T_MC. The visual two-dimensional code positioning constraint is then:

$$ e_m=T_{WM}\,T_{MC}-T \tag{15} $$
The process of constructing a least squares problem from the residual function and solving for its minimum may include:
constructing the least squares problem from the residual function above, with the expression:

$$ x^{*}=\arg\min_{x}\frac{1}{2}\|f(x)\|_2^2 \tag{16} $$

and solving this least squares problem to finally obtain the camera pose corresponding to the minimum residual.
The method for solving the least squares problem may include solving it with a nonlinear optimization algorithm, as follows:
repeatedly find a descent increment \Delta x_k such that

$$ \|f(x_k+\Delta x_k)\|_2^2 $$

converges to a minimum. With the Gauss-Newton algorithm, a first-order Taylor expansion of f(x) gives

$$ f(x+\Delta x)\approx f(x)+J(x)^{T}\Delta x \tag{17} $$

wherein J(x) is the first-order coefficient, i.e. the first-derivative (Jacobian) matrix of f(x). Substituting (17) into (16) gives

$$ \frac{1}{2}\left\|f(x)+J(x)^{T}\Delta x\right\|_2^2=\frac{1}{2}\left(\|f(x)\|_2^2+2f(x)^{T}J(x)^{T}\Delta x+\Delta x^{T}J(x)J(x)^{T}\Delta x\right) \tag{18} $$

Setting the derivative of the above with respect to \Delta x to zero:

$$ J(x)f(x)+J(x)J(x)^{T}\Delta x=0 \tag{19} $$

the following system of equations is obtained:

$$ J(x)J(x)^{T}\Delta x=-J(x)f(x) \tag{20} $$

called the incremental equation or Gauss-Newton equation. Defining the coefficient on the left as H and the right-hand side as g, the above becomes

$$ H\Delta x=g \tag{21} $$

Then, by solving the incremental equation and iterating \Delta x until \|f(x)+J(x)^{T}\Delta x\|_2^2 reaches its minimum, the optimal solution of the state variable x is obtained. Taking x as the camera pose yields the optimal solution of the camera pose.
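The iteration in (17)-(21) can be sketched in Python. This is a minimal illustration, not the patent's solver; it uses the conventional Jacobian layout J = df/dx (the transpose of the text's convention), so the normal equations read J^T J \Delta x = -J^T f, and the exponential curve-fitting problem is an assumed toy example:

```python
import numpy as np

def gauss_newton(x0, residual, jacobian, iters=50):
    """Minimize 0.5*||f(x)||^2 by repeatedly solving H dx = g, cf. eqs. (20)-(21)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                 # residual vector f(x)
        J = jacobian(x)                 # Jacobian df/dx, shape (m, n)
        H = J.T @ J                     # approximate Hessian (the "H")
        g = -J.T @ r                    # right-hand side (the "g")
        dx = np.linalg.solve(H, g)      # incremental (Gauss-Newton) equation
        x = x + dx
        if np.linalg.norm(dx) < 1e-12:  # converged: increment negligible
            break
    return x

# assumed toy problem (not from the patent): fit y = exp(a*t + b) to noisy samples
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
y = np.exp(1.0 * t + 2.0) + rng.normal(0.0, 0.05, t.size)

res = lambda p: np.exp(p[0] * t + p[1]) - y
jac = lambda p: np.stack([t * np.exp(p[0] * t + p[1]),
                          np.exp(p[0] * t + p[1])], axis=1)
p = gauss_newton([0.8, 1.8], res, jac)
```

In the patent's setting the state x is the camera pose (and scale depths), and the residual stacks the e_P and e_M terms weighted by their information matrices.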
Step 24, performing loop detection based on the two-dimensional code position information.
When the vehicle moves into a scene it has already traversed, the camera detects a two-dimensional code that appeared before, and it is judged that a loop has occurred.
Constructing a residual function based on two-dimensional code constraint, visual re-projection constraint and loop detection constraint to obtain the following formula:
Figure BDA0003995885440000121
wherein x is i Representing the pixel coordinate of the ith map point corresponding to the current frame;
s i representing the ith map point depth;
P i representing world coordinates of the ith map point in a loop frame coordinate system;
constructing a least squares problem, the expression is as follows:
Figure BDA0003995885440000122
and solving the residual function for each frame image from the loop frame to the current frame, and updating the pose of the camera.
Wherein the meaning of the loop detection is as follows:
the visual odometry is computed by the visual feature-point method, recovering the frame-to-frame motion pose transformation of the camera. Motion estimation between adjacent key frames carries a certain error, and the accumulated odometry error grows as key frames accumulate. A loop detection mechanism is therefore needed to eliminate the accumulated error, perform global pose optimization, and build globally consistent trajectories and maps. With the information contained in the visual two-dimensional codes, the judgment of loop frames is completed more efficiently, the correlation strength between current data and historical data is increased, and relocalization accuracy is improved, as shown in fig. 5.
Taking fig. 5 as an example, the specific process of loop detection is as follows:
1) Loop candidate frame decision
When the vehicle moves into a scene it has already traversed, i.e. the vehicle pose goes from X_{n-1}, X_n to X_{k-1}, X_k, the camera detects the previously seen two-dimensional codes M_{t-1}, M_t, and the two-dimensional code IDs and corresponding corner points are identified and detected. When the ID information of a two-dimensional code seen several frames earlier reappears, a loop has occurred. This two-dimensional-code-based loop decision method can effectively improve the accuracy and recall rate of loop detection and the robustness of the algorithm.
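The ID-based loop candidate decision can be sketched as follows. This is a minimal illustration, not the patent's implementation; the class name and the `gap` parameter (minimum frame separation before a reappearing tag counts as a loop) are assumptions:

```python
class LoopDetector:
    """Declare a loop candidate when a tag ID reappears after at least `gap` frames."""
    def __init__(self, gap=100):
        self.gap = gap
        self.last_seen = {}            # tag id -> frame index of its last observation

    def update(self, frame_idx, tag_ids):
        loop_frame = None
        for tid in tag_ids:
            prev = self.last_seen.get(tid)
            if prev is not None and frame_idx - prev >= self.gap:
                loop_frame = prev      # candidate loop frame for pose optimization
            self.last_seen[tid] = frame_idx
        return loop_frame
```

Because tag IDs are unique and decoded exactly, this check avoids the appearance-based ambiguity of bag-of-words loop detection.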
2) Loop pose optimization
Based on the 3D map point coordinates of the loop frames X_{n-1}, X_n and the 2D image coordinates of the current frames X_{k-1}, X_k, the pose of the current frame is updated with a motion-recovery method based on 3D point to 2D pixel correspondences (PnP, which describes how to estimate the camera pose given n 3D space points and their projection positions). Further constraints come from the current frame against the historical frames and from the loop two-dimensional codes M_{t-1}, M_t. The residual terms are the two-dimensional code loop error and the visual odometry error; a least squares problem built from these residuals optimizes the overall pose, eliminates the accumulated error and improves positioning accuracy. The specific process is as follows:
First, assume the world coordinate of an observed map point under the loop frame is P_i = (X_i, Y_i, Z_i, 1)^T, and its corresponding image coordinate in the current frame is x_i = (u_i, v_i, 1)^T. A PnP problem is constructed to solve the motion from the 3D points to the 2D points. Define the augmented matrix T = [R | t], a 3x4 matrix containing the rotation R and the translation information t. Then

$$ s_i\,x_i=K\,T\,P_i $$

wherein s_i is the depth and K is the camera intrinsic matrix. In matrix form:

$$ s_i\begin{pmatrix}u_i\\v_i\\1\end{pmatrix}=K\begin{pmatrix}t_1&t_2&t_3&t_4\\t_5&t_6&t_7&t_8\\t_9&t_{10}&t_{11}&t_{12}\end{pmatrix}\begin{pmatrix}X_i\\Y_i\\Z_i\\1\end{pmatrix} $$

A least squares problem can then be constructed to find the best camera pose T by minimizing the error:

$$ T^{*}=\arg\min_{T}\frac{1}{2}\sum_{i=1}^{n}\left\|x_i-\frac{1}{s_i}K\,T\,P_i\right\|_2^2 $$

The formula above is the optimization problem obtained from the two-dimensional code constraint and the visual reprojection constraint in the back-end optimization described earlier. In loop detection, the visual reprojection error constraint between the loop frame and the current frame yields a more accurate pose transformation T between them:

$$ e_{L,i}=x_i-\frac{1}{s_i}K\,T\,P_i $$

Combining the three error terms finally gives, in the loop detection process, the least squares problem based on the three residuals of the two-dimensional code constraint, the visual reprojection constraint and the loop detection constraint, namely:

$$ x^{*}=\arg\min_{x}\frac{1}{2}\left(\sum_{i,j}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i,s}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s}+\sum_{i}\|e_{L,i}\|_2^2\right) $$
A Gauss-Newton method from the nonlinear optimization algorithms is adopted to construct the Gauss-Newton equation HΔx = g, which is iteratively solved to minimize the cost function and obtain the optimized pose transformation T of each frame; that is, global pose optimization is performed from the loop frame to the current frame, thereby reducing the influence of accumulated errors.
In the embodiment of the invention, the position of the camera can be repositioned:
in the tracking and positioning process of the slam algorithm, if tracking is lost, the pose of the current frame in the world coordinate system can be quickly obtained through repositioning based on the two-dimensional code, and tracking and positioning are restarted.
In one aspect of the present invention, if the camera does not detect a natural feature point, the camera pose may be determined directly from the detected two-dimensional code position information. The SLAM positioning method based on two-dimensional codes provided by the present invention is described below by another specific embodiment. In the SLAM positioning method based on the two-dimensional code, provided by the embodiment of the invention, the automatic parking vehicle detects the two-dimensional code, acquires the position information of the two-dimensional code, and determines the pose of the camera according to the position information of the two-dimensional code.
Fig. 6 shows the coordinate conversion in this embodiment. When the two-dimensional code is detected, the accurate pixel coordinates of its four corner points are obtained. From the 3D positions of the corner points in the two-dimensional code coordinate system, a 2D-3D pose-recovery PnP algorithm yields the rotation vector r and translation vector t of the camera relative to the two-dimensional code coordinate system. The rotation vector r is converted into a 3x3 rotation matrix R via the Rodrigues formula, giving the absolute pose transformation matrix T_MC between the camera and the two-dimensional code coordinate system. With the known pose T_WM of the prearranged two-dimensional code in the world coordinate system, there is

$$ T_{WC}=T_{WM}\cdot T_{MC},\qquad R_{WC}=R_{WM}R_{MC},\quad t_{WC}=R_{WM}t_{MC}+t_{WM} \tag{61} $$

wherein R_MC and t_MC are respectively the rotation matrix and translation vector between the camera coordinate system and the two-dimensional code coordinate system. R_MC and t_MC can be computed via the PnP algorithm; the specific calculation process is as follows:
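Because the four tag corners are coplanar, the 6-point spatial DLT derived below does not apply to them directly; a homography-based planar pose decomposition is one common route. The following is a hedged numpy sketch under assumed intrinsics and tag size (illustrative values, not the patent's exact algorithm):

```python
import numpy as np

def tag_pose_from_corners(obj_xy, img_pts, K):
    """Planar pose from tag corners via a DLT homography, H ~ K [r1 r2 t]."""
    A = []
    for (X, Y), (u, v) in zip(obj_xy, img_pts):
        A += [[X, Y, 1, 0, 0, 0, -u*X, -u*Y, -u],
              [0, 0, 0, X, Y, 1, -v*X, -v*Y, -v]]
    H = np.linalg.svd(np.asarray(A, float))[2][-1].reshape(3, 3)
    B = np.linalg.inv(K) @ H
    if B[2, 2] < 0:                       # the tag must lie in front of the camera
        B = -B
    B /= (np.linalg.norm(B[:, 0]) + np.linalg.norm(B[:, 1])) / 2
    r1, r2, t = B[:, 0], B[:, 1], B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)           # snap the rotation block onto SO(3)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return R, t                           # tag pose in the camera frame

# synthetic check with assumed intrinsics and a 0.16 m tag
K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
corners = [(-0.08, 0.08), (0.08, 0.08), (0.08, -0.08), (-0.08, -0.08)]
cx, sx = np.cos(0.25), np.sin(0.25)
R_true = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])   # small tilt about x
t_true = np.array([0.05, -0.03, 1.2])
img = []
for X, Y in corners:
    p = K @ (R_true @ np.array([X, Y, 0.0]) + t_true)
    img.append((p[0] / p[2], p[1] / p[2]))
R_est, t_est = tag_pose_from_corners(corners, img, K)
```

The returned transform maps tag-frame points into the camera frame; the T_MC of equation (61) is its inverse, which is then composed with the tag's known world pose T_WM.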
Consider that the homogeneous coordinate of a corner point of the two-dimensional code in the two-dimensional code coordinate system is P = (X, Y, Z, 1)^T, projected in the image to a feature point x_1 = (u_1, v_1, 1)^T (in normalized homogeneous coordinates). At this moment the pose R, t of the camera is unknown. Define the augmented matrix T = [R | t], a 3x4 matrix containing the rotation and translation information. The relationship between the camera coordinate system and the two-dimensional code coordinate system expands to the form:

$$ s\begin{pmatrix}u_1\\v_1\\1\end{pmatrix}=\begin{pmatrix}t_1&t_2&t_3&t_4\\t_5&t_6&t_7&t_8\\t_9&t_{10}&t_{11}&t_{12}\end{pmatrix}\begin{pmatrix}X\\Y\\Z\\1\end{pmatrix} $$

Eliminating s with the last row, two constraints are obtained:

$$ u_1=\frac{t_1X+t_2Y+t_3Z+t_4}{t_9X+t_{10}Y+t_{11}Z+t_{12}},\qquad v_1=\frac{t_5X+t_6Y+t_7Z+t_8}{t_9X+t_{10}Y+t_{11}Z+t_{12}} $$

To simplify the representation, define the row vectors of T:

$$ T_1=(t_1,t_2,t_3,t_4)^{T},\quad T_2=(t_5,t_6,t_7,t_8)^{T},\quad T_3=(t_9,t_{10},t_{11},t_{12})^{T} $$

Then

$$ T_1^{T}P-T_3^{T}P\,u_1=0,\qquad T_2^{T}P-T_3^{T}P\,v_1=0 $$

Each pair of feature points (3D point and 2D matching point) provides two linear constraints on T. If there are N pairs of feature points in total, the following linear system can be listed:

$$ \begin{pmatrix}P_1^{T}&0&-u_1P_1^{T}\\0&P_1^{T}&-v_1P_1^{T}\\\vdots&\vdots&\vdots\\P_N^{T}&0&-u_NP_N^{T}\\0&P_N^{T}&-v_NP_N^{T}\end{pmatrix}\begin{pmatrix}T_1\\T_2\\T_3\end{pmatrix}=0 $$

T has 12 dimensions in total, so the matrix T can be solved linearly with at least 6 pairs of matching points; this is called the Direct Linear Transform (DLT).
In the DLT solution the matrix T is treated directly as 12 unknowns, but the solution obtained by DLT does not necessarily satisfy the constraint that the rotation matrix R ∈ SO(3); it is a general matrix. Therefore, for the 3x3 matrix block on the left of T, the best approximating rotation matrix is sought. This process can be accomplished by QR decomposition.
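One concrete way to realize this projection step: the text mentions QR decomposition; the SVD-based orthogonal Procrustes solution below is a commonly used equivalent, shown here as an assumed alternative rather than the patent's exact procedure:

```python
import numpy as np

def nearest_rotation(M):
    """Closest matrix in SO(3) to M under the Frobenius norm (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(M)
    # the diag(1, 1, det) factor guards against an improper reflection (det = -1)
    return U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
```

Applied to the left 3x3 block of the DLT solution, this yields a proper rotation matrix while leaving the translation column to be rescaled consistently.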
In another case of the present invention, if the camera does not detect a two-dimensional code, the camera pose is determined as follows:
during vehicle driving, when no two-dimensional code is detected, the visual odometry is used for positioning. Natural feature points are extracted and matched from the monocular camera images, and from the matched point pairs the camera motion is estimated through the epipolar geometric constraint, yielding the rotation matrix R and translation vector t between two frames.
As shown in fig. 3, assume the two consecutive frame images are I1 and I2, from which a pair of matched natural feature points p1 and p2 is obtained.
If there are several pairs of such matching points, the motion of the camera between the two frames can be recovered from the correspondences of these two-dimensional image points.
Taking this pair of matching points as an example: O1 and O2 are the camera centers, P is a map point, p1 and p2 are the projections of P on the camera imaging planes I1 and I2, respectively; the line O1O2 intersects the image planes at e1 and e2 (the epipoles), and l1 and l2 are the intersection lines of the image planes with the plane O1O2P (the epipolar lines). Assuming the motion from the first frame to the second frame is R, t, the algebraic solution is as follows:
In the first frame coordinate system, let the spatial position of P be

$$ P=[X,Y,Z]^{T} \tag{71} $$

Then, by the pinhole camera model, the pixel positions of the two projections p_1 and p_2 are

$$ s_1p_1=KP \tag{72} $$

$$ s_2p_2=K(RP+t) \tag{73} $$

wherein K is the camera intrinsic matrix, R, t describe the camera motion between the two coordinate systems, and s_1, s_2 are the two depths. Up to scale, s_1 p_1 and p_1 are equivalent in projection, so the above can be written as equalities up to scale:

$$ p_1\simeq KP \tag{74} $$

$$ p_2\simeq K(RP+t) \tag{75} $$

Now take:

$$ x_1=K^{-1}p_1 \tag{76} $$

$$ x_2=K^{-1}p_2 \tag{77} $$

wherein x_1, x_2 are the coordinates of the two pixels on the normalized plane. Substituting into the formulas above gives:

$$ x_2\simeq Rx_1+t \tag{78} $$

Multiply both sides on the left by t^, the skew-symmetric matrix of t; this is equivalent to taking the outer product of both sides with t:

$$ t^{\wedge}x_2\simeq t^{\wedge}Rx_1 \tag{79} $$

Then multiply both sides on the left by x_2^T:

$$ x_2^{T}t^{\wedge}x_2=x_2^{T}t^{\wedge}Rx_1 \tag{80} $$

Since t^ x_2 is perpendicular to x_2, their inner product is 0, so the left side of the above equation is strictly equal to 0, from which it can be derived that:

$$ x_2^{T}t^{\wedge}Rx_1=0 \tag{81} $$

Substituting p_1, p_2 back gives

$$ p_2^{T}K^{-T}t^{\wedge}RK^{-1}p_1=0 \tag{82} $$

The two equations above are the epipolar constraints. Recording their middle parts as two matrices, the fundamental matrix F and the essential matrix E, the epipolar constraint can be further simplified:

$$ E=t^{\wedge}R \tag{83} $$

$$ F=K^{-T}EK^{-1} \tag{84} $$

$$ x_2^{T}Ex_1=p_2^{T}Fp_1=0 \tag{85} $$

Thus, the camera pose estimation problem of the visual SLAM odometry becomes the following two steps: 1) obtain E or F from the pixel positions of the matched points; 2) solve R and t from E or F.
Considering the scale equivalence of E, using 8 pairs of points to estimate E, called eight-point method, the following describes solving E by using eight-point method, and the solving process of eight-point method is as follows:
considering a pair of matching points, the normalized coordinates are x respectively 1 =[u 1 ,v 1 ,1] T ,x 2 =[u 2 ,v 2 ,1] T According to a pair of level constraints, have
Figure BDA0003995885440000178
The matrix E is expanded and written in vector form:
e=[e 1 ,e 2 ,e 3 ,e 4 ,e 5 ,e 6 ,e 7 ,e 8 ,e 9 ] T (87)
the pair level constraint can be written as follows:
[u 1 u 2 ,u 1 v 2 ,u 1 ,v 1 u 2 ,v 1 v 2 ,v 1 ,u 2 ,v 2 ,1]·e=0 (88)
similarly, the equations with eight pairs of points are put into one equation set, and the equations with the same expression for other characteristic points have
Figure BDA0003995885440000181
The E matrix can be obtained by the above equation system, and then by E = t ^ R, by Singular Value Decomposition (SVD), R, t can be obtained.
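The linear estimate of E from (88)-(89) can be sketched in numpy. This is a hedged illustration on synthetic, noise-free normalized correspondences; the relative pose and the 3D points are assumed values, not from the patent:

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate E (up to scale) from N>=8 normalized correspondences, x2^T E x1 = 0."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    A = np.stack([u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1,
                  np.ones_like(u1)], axis=1)          # one row per eq. (88)
    e = np.linalg.svd(A)[2][-1]                       # null vector of system (89)
    U, S, Vt = np.linalg.svd(e.reshape(3, 3))
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt          # enforce essential-matrix rank 2

# synthetic two-view check
rng = np.random.default_rng(1)
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])      # assumed yaw between frames
t = np.array([0.2, 0.05, 0.1])                        # assumed nonzero baseline
P = rng.uniform([-1, -1, 4], [1, 1, 6], (10, 3))      # points in front of both cameras
x1 = np.column_stack([P[:, 0] / P[:, 2], P[:, 1] / P[:, 2], np.ones(10)])
Q = P @ R.T + t
x2 = np.column_stack([Q[:, 0] / Q[:, 2], Q[:, 1] / Q[:, 2], np.ones(10)])
E = eight_point(x1, x2)
resid = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1)).max()
```

Decomposing the returned E via SVD then yields the four (R, t) candidates, of which the one placing the points in front of both cameras is kept.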
The embodiment of the invention also provides a SLAM positioning device based on the two-dimensional code, which is applied to automatic parking vehicles and is used for realizing the method in any one of the embodiments.
As shown in fig. 7, the apparatus includes:
the acquisition module 10 is used for acquiring images around the vehicle by using a vehicle-mounted camera and acquiring two-dimensional codes and natural characteristic points from the images;
the processing module 20 is configured to analyze the two-dimensional code to obtain a two-dimensional code pose and acquire position information of the natural feature points; determining a camera pose according to the two-dimensional code pose and the position information of the natural characteristic points;
wherein, the processing module 20 is configured to determine the camera pose according to the two-dimensional code pose and the position information of the natural feature point by:
constructing a residual function, wherein the residual function comprises a natural characteristic point projection error and a two-dimensional code positioning error, and the residual function is represented by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s} $$

wherein f(x) represents the residual function, i represents the current i-th frame image, k represents that there are k key frame images in total, j represents the current j-th map point, and n represents that there are n map points in total; s represents the current s-th two-dimensional code, and t represents that there are t two-dimensional codes in total;
e_{P,i,j} represents the projection error of the natural feature point, with the expression

$$ e_{P,i,j}=l_{ij}-\frac{1}{s_i}K\,T_i\,P_j $$

wherein P_j represents the j-th map point in the world coordinate system, P_j = [X_j, Y_j, Z_j]^T, with X_j, Y_j, Z_j the coordinates in the world coordinate system; T_i is the pose of the camera in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T represents the actual pixel coordinates of point P_j in the i-th frame image, and \hat{l}_{ij} = \frac{1}{s_i} K T_i P_j is the projected pixel coordinate of point P_j in the image; s_i represents the scale depth of the camera corresponding to the i-th frame image, and K represents the intrinsic matrix of the camera;
\Omega_{P,i,j} represents the information matrix, i.e. the inverse of the covariance matrix Q_{P,i,j} of the visual observation;
e_{M,i,s} represents the positioning error of the two-dimensional code, with the expression e_{M,i,s} = T_{WM_{is}} T_{MC_{is}} - T_i, wherein T_{WM_{is}} represents the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image, T_{MC_{is}} represents the camera pose recovered by positioning against the s-th two-dimensional code at the i-th frame image, and T_i is the pose of the camera corresponding to the i-th frame image in the world coordinate system;
\Omega_{M,i,s} represents the information matrix, i.e. the inverse of the covariance matrix Q_{M,i,s} between the two-dimensional code pose and the camera pose;
a least squares problem is constructed from the residual function, the minimum of the residual function is solved, and the camera pose T_i corresponding to the minimum is determined as the camera pose in the world coordinate system at the i-th frame image.
Wherein the processing module 20 may be configured to:
and acquiring the identifier of the two-dimensional code and the coordinates of the four corner points.
Wherein, the processing module 20 is further configured to: when the two-dimensional code appearing before is detected, judging that loop appears, and executing loop detection;
the loop detection comprises the following steps:
constructing a second residual function based on two-dimensional code positioning constraint, natural feature point projection constraint and loop detection constraint, and expressing by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s}+\sum_{i}\left\|x_i-\frac{1}{s_i}K\,T\,P_i\right\|_2^2 $$

wherein x_i represents the pixel coordinates in the current frame corresponding to the i-th map point;
s_i represents the depth of the i-th map point;
P_i represents the world coordinates of the i-th map point under the loop frame; the loop frame is the frame corresponding to the last time the two-dimensional code appeared;
and solving the minimum value of the second residual function for each frame image from the loop frame to the current frame to respectively obtain the camera pose corresponding to the minimum value, and updating the camera pose of each frame from the loop frame to the current frame by using the obtained camera pose.
The scheme adopted by the invention fuses visual SLAM with two-dimensional code positioning: visual two-dimensional codes are arranged, and the artificial label information is fused with the natural features to realize more accurate and stable positioning. The method can effectively reduce the accumulated error of classical SLAM positioning in indoor scenes such as parking lots, can achieve centimeter-level positioning accuracy, and has better accuracy, stability and robustness. When no two-dimensional code is detected, the vehicle pose can be recovered with the visual SLAM method; when a two-dimensional code is detected, the vehicle pose is initialized and a more accurate vehicle pose is recovered through the two-dimensional code. Moreover, when passing a previously visited position or when tracking fails, the loop detection and relocalization based on the two-dimensional code features enable fast, stable and accurate positioning of the vehicle.
Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present invention, which is defined by the appended claims.

Claims (6)

1. A SLAM positioning method based on two-dimensional codes is applied to automatic parking vehicles and is characterized by comprising the following steps:
the method comprises the following steps that 1, an image around a vehicle is collected by a vehicle-mounted camera, and a two-dimensional code and natural feature points are obtained from the image;
step 2, analyzing the two-dimensional code to obtain a two-dimensional code pose, and acquiring position information of natural characteristic points;
step 3, determining the pose of the camera according to the pose of the two-dimensional code and the position information of the natural characteristic points; the method comprises the following steps:
constructing a residual function, wherein the residual function comprises a natural characteristic point projection error and a two-dimensional code positioning error, and the residual function is represented by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s} $$

wherein f(x) represents the residual function, i represents the current i-th frame image, k represents that there are k key frame images in total, j represents the current j-th map point, and n represents that there are n map points in total; s represents the s-th two-dimensional code, and t represents that t two-dimensional codes are detected at present;
e_{P,i,j} represents the projection error of the natural feature point, with the expression

$$ e_{P,i,j}=l_{ij}-\frac{1}{s_i}K\,T_i\,P_j $$

wherein P_j represents the j-th map point in the world coordinate system, P_j = [X_j, Y_j, Z_j]^T, with X_j, Y_j, Z_j the coordinates in the world coordinate system; T_i is the pose of the camera in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T represents the actual pixel coordinates of point P_j in the i-th frame image, and \hat{l}_{ij} = \frac{1}{s_i} K T_i P_j is the projected pixel coordinate of point P_j in the image; s_i represents the scale depth of the camera corresponding to the i-th frame image, and K represents the intrinsic matrix of the camera;
\Omega_{P,i,j} represents the information matrix, i.e. the inverse of the preset covariance matrix of the visual observation;
e_{M,i,s} represents the positioning error of the two-dimensional code, with the expression e_{M,i,s} = T_{WM_{is}} T_{MC_{is}} - T_i, wherein T_{WM_{is}} represents the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image, T_{MC_{is}} represents the camera pose recovered by positioning against the s-th two-dimensional code at the i-th frame image, and T_i is the pose of the camera corresponding to the i-th frame image in the world coordinate system;
\Omega_{M,i,s} represents the information matrix, i.e. the inverse of the preset covariance matrix of the two-dimensional code and camera pose variables;
constructing a least squares problem from the residual function, solving the minimum of the residual function, and determining the camera pose T_i corresponding to the minimum as the camera pose in the world coordinate system at the i-th frame image.
2. The SLAM positioning method based on two-dimensional codes of claim 1, wherein the step 2 comprises:
and acquiring the identifier of the two-dimensional code and the coordinates of the four corner points.
3. The SLAM positioning method based on two-dimensional codes as claimed in claim 1 or 2, further comprising: when the two-dimension code appearing before is detected, judging that loop appears, and executing loop detection;
the loop detection comprises the following steps:
constructing a second residual function based on two-dimensional code positioning constraint, natural feature point projection constraint and loop detection constraint, and expressing by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s}+\sum_{i}\left\|x_i-\frac{1}{s_i}K\,T\,P_i\right\|_2^2 $$

wherein x_i represents the pixel coordinates in the current frame corresponding to the i-th map point;
s_i represents the depth of the i-th map point;
P_i represents the world coordinates of the i-th map point under the loop frame; the loop frame is the frame corresponding to the last time the two-dimensional code appeared;
and solving the minimum value of the second residual function for each frame image from the loop frame to the current frame to respectively obtain the camera pose corresponding to the minimum value, and updating the camera pose of each frame from the loop frame to the current frame by using the obtained camera pose.
4. A SLAM positioning device based on two-dimensional codes is applied to automatic parking vehicles and is characterized by comprising:
the acquisition module is used for acquiring images around the vehicle by using the vehicle-mounted camera and acquiring the two-dimensional codes and the natural characteristic points from the images;
the processing module is used for analyzing the two-dimensional code to obtain the pose of the two-dimensional code and acquiring the position information of the natural characteristic points; determining the pose of the camera according to the pose of the two-dimensional code and the position information of the natural feature points;
the processing module is used for determining the pose of the camera according to the two-dimensional code pose and the position information of the natural feature points in the following mode:
constructing a residual function which comprises a natural characteristic point projection error and a two-dimensional code positioning error, and representing the residual function by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s} $$

wherein f(x) represents the residual function, i represents the current i-th frame image, k represents that there are k key frame images in total, j represents the current j-th map point, and n represents that there are n map points in total; s represents the s-th two-dimensional code, and t represents that t two-dimensional codes are detected at present;
e_{P,i,j} represents the projection error of the natural feature point, with the expression

$$ e_{P,i,j}=l_{ij}-\frac{1}{s_i}K\,T_i\,P_j $$

wherein P_j represents the j-th map point in the world coordinate system, P_j = [X_j, Y_j, Z_j]^T, with X_j, Y_j, Z_j the coordinates in the world coordinate system; T_i is the pose of the camera in the world coordinate system corresponding to the i-th frame image; l_{ij} = [u_{ij}, v_{ij}]^T represents the actual pixel coordinates of point P_j in the i-th frame image, and \hat{l}_{ij} = \frac{1}{s_i} K T_i P_j is the projected pixel coordinate of point P_j in the image; s_i represents the scale depth of the camera corresponding to the i-th frame image, and K represents the intrinsic matrix of the camera;
\Omega_{P,i,j} represents the information matrix, i.e. the inverse of the preset covariance matrix of the visual observation;
e_{M,i,s} represents the positioning error of the two-dimensional code, with the expression e_{M,i,s} = T_{WM_{is}} T_{MC_{is}} - T_i, wherein T_{WM_{is}} represents the pose of the s-th two-dimensional code in the world coordinate system at the i-th frame image, T_{MC_{is}} represents the camera pose recovered by positioning against the s-th two-dimensional code at the i-th frame image, and T_i is the pose of the camera corresponding to the i-th frame image in the world coordinate system;
\Omega_{M,i,s} represents the information matrix, i.e. the inverse of the preset covariance matrix of the two-dimensional code and camera pose variables;
constructing a least squares problem from the residual function, solving the minimum of the residual function, and determining the camera pose T_i corresponding to the minimum as the camera pose in the world coordinate system at the i-th frame image.
5. The SLAM positioning apparatus based on two-dimensional codes of claim 4, wherein the processing module is configured to:
and acquiring the identifier of the two-dimensional code and the coordinates of the four corner points.
6. The SLAM positioning apparatus based on two-dimensional codes of claim 4 or 5, wherein the processing module is further configured to: when the two-dimensional code appearing before is detected, judging that loop appears, and executing loop detection;
the loop detection comprises the following steps:
constructing a second residual function based on two-dimensional code positioning constraint, natural feature point projection constraint and loop detection constraint, and expressing by the following formula:
$$ f(x)=\sum_{i=1}^{k}\sum_{j=1}^{n}e_{P,i,j}^{T}\,\Omega_{P,i,j}\,e_{P,i,j}+\sum_{i=1}^{k}\sum_{s=1}^{t}e_{M,i,s}^{T}\,\Omega_{M,i,s}\,e_{M,i,s}+\sum_{i}\left\|x_i-\frac{1}{s_i}K\,T\,P_i\right\|_2^2 $$

wherein x_i represents the pixel coordinates in the current frame corresponding to the i-th map point;
s_i represents the depth of the i-th map point;
P_i represents the world coordinates of the i-th map point under the loop frame; the loop frame is the frame corresponding to the last time the two-dimensional code appeared;
and for each frame image from the loop frame to the current frame, solving for the minimum of the second residual function to obtain the camera pose corresponding to that minimum, and updating the camera pose of each frame from the loop frame to the current frame with the poses so obtained.
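The natural-feature-point projection constraint in the second residual can be sketched as a standard reprojection residual, assuming a pinhole intrinsics matrix K and a world-to-camera pose; this is a generic formulation consistent with the symbols x_i, s_i, P_i above, not necessarily the exact form in the unrecoverable equation image:

```python
import numpy as np

def reprojection_residual(K, T_cw, P_w, x_obs):
    """Pixel residual of one map point: transform world point P_w into the
    camera frame with T_cw (world -> camera, 4x4), project through intrinsics
    K, and compare against the observed pixel x_obs."""
    P_c = T_cw[:3, :3] @ P_w + T_cw[:3, 3]  # camera-frame point; depth s_i = P_c[2]
    x_proj = K @ (P_c / P_c[2])             # divide by depth, apply intrinsics
    return x_obs - x_proj[:2]

# Hypothetical setup: camera at the world origin, point 2 m straight ahead.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cw = np.eye(4)
P_w = np.array([0.0, 0.0, 2.0])
assert np.allclose(reprojection_residual(K, T_cw, P_w, np.array([320.0, 240.0])), 0.0)
```

During loop correction, residuals of this kind for the map points, together with the tag and loop constraints, would be minimized per frame from the loop frame to the current frame to refresh each camera pose.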
CN202211602729.7A 2022-12-13 2022-12-13 SLAM positioning method and device based on two-dimensional code Active CN115936029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211602729.7A CN115936029B (en) 2022-12-13 2022-12-13 SLAM positioning method and device based on two-dimensional code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211602729.7A CN115936029B (en) 2022-12-13 2022-12-13 SLAM positioning method and device based on two-dimensional code

Publications (2)

Publication Number Publication Date
CN115936029A true CN115936029A (en) 2023-04-07
CN115936029B CN115936029B (en) 2024-02-09

Family

ID=86553603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211602729.7A Active CN115936029B (en) 2022-12-13 2022-12-13 SLAM positioning method and device based on two-dimensional code

Country Status (1)

Country Link
CN (1) CN115936029B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228870A (en) * 2023-05-05 2023-06-06 山东省国土测绘院 Mapping method and system based on two-dimensional code SLAM precision control
CN117422764A (en) * 2023-12-19 2024-01-19 深圳大学 Vehicle-mounted system positioning method, terminal and storage medium with air-ground visual angle image collaboration
CN117830604A (en) * 2024-03-06 2024-04-05 成都睿芯行科技有限公司 Two-dimensional code anomaly detection method and medium for positioning
CN118015074A (en) * 2024-04-08 2024-05-10 湖南大学无锡智能控制研究院 Transfer platform docking method, device and system based on planar two-dimensional code array

Citations (5)

Publication number Priority date Publication date Assignee Title
US20160117824A1 (en) * 2013-09-12 2016-04-28 Toyota Jidosha Kabushiki Kaisha Posture estimation method and robot
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN110345937A (en) * 2019-08-09 2019-10-18 东莞市普灵思智能电子有限公司 Appearance localization method and system are determined in a kind of navigation based on two dimensional code
CN113706626A (en) * 2021-07-30 2021-11-26 西安交通大学 Positioning and mapping method based on multi-sensor fusion and two-dimensional code correction
CN114970790A (en) * 2022-05-18 2022-08-30 中国计量大学 Traffic sign, manufacturing method thereof and vehicle pose estimation method

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20160117824A1 (en) * 2013-09-12 2016-04-28 Toyota Jidosha Kabushiki Kaisha Posture estimation method and robot
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN110345937A (en) * 2019-08-09 2019-10-18 东莞市普灵思智能电子有限公司 Appearance localization method and system are determined in a kind of navigation based on two dimensional code
CN113706626A (en) * 2021-07-30 2021-11-26 西安交通大学 Positioning and mapping method based on multi-sensor fusion and two-dimensional code correction
CN114970790A (en) * 2022-05-18 2022-08-30 中国计量大学 Traffic sign, manufacturing method thereof and vehicle pose estimation method

Non-Patent Citations (2)

Title
Yang Youliang et al.: "Design of a two-wheel differential-steering AGV based on two-dimensional-code positioning and navigation", Logistics Sci-Tech, no. 10, pp. 46-48 *
Ge Wenfei et al.: "Research on a method for smartphone positioning and attitude determination using two-dimensional-code landmarks", Chinese Journal of Sensors and Actuators, no. 12, pp. 58-65 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN116228870A (en) * 2023-05-05 2023-06-06 山东省国土测绘院 Mapping method and system based on two-dimensional code SLAM precision control
CN117422764A (en) * 2023-12-19 2024-01-19 深圳大学 Vehicle-mounted system positioning method, terminal and storage medium with air-ground visual angle image collaboration
CN117422764B (en) * 2023-12-19 2024-04-16 深圳大学 Vehicle-mounted system positioning method, terminal and storage medium with air-ground visual angle image collaboration
CN117830604A (en) * 2024-03-06 2024-04-05 成都睿芯行科技有限公司 Two-dimensional code anomaly detection method and medium for positioning
CN117830604B (en) * 2024-03-06 2024-05-10 成都睿芯行科技有限公司 Two-dimensional code anomaly detection method and medium for positioning
CN118015074A (en) * 2024-04-08 2024-05-10 湖南大学无锡智能控制研究院 Transfer platform docking method, device and system based on planar two-dimensional code array
CN118015074B (en) * 2024-04-08 2024-06-07 湖南大学无锡智能控制研究院 Transfer platform docking method, device and system based on planar two-dimensional code array

Also Published As

Publication number Publication date
CN115936029B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111983639B (en) Multi-sensor SLAM method based on Multi-Camera/Lidar/IMU
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
US10546387B2 (en) Pose determination with semantic segmentation
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN115936029B (en) SLAM positioning method and device based on two-dimensional code
Won et al. Sweepnet: Wide-baseline omnidirectional depth estimation
EP2948927B1 (en) A method of detecting structural parts of a scene
EP3977346A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN113706626B (en) Positioning and mapping method based on multi-sensor fusion and two-dimensional code correction
EP2757524B1 (en) Depth sensing method and system for autonomous vehicles
CN111024066A (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN112396650A (en) Target ranging system and method based on fusion of image and laser radar
CN113223045B (en) Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation
Fiala et al. Visual odometry using 3-dimensional video input
CN112734765A (en) Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion
CN111862673A (en) Parking lot vehicle self-positioning and map construction method based on top view
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN112541423A (en) Synchronous positioning and map construction method and system
CN113744315A (en) Semi-direct vision odometer based on binocular vision
Maier et al. Appearance-based traversability classification in monocular images using iterative ground plane estimation
Braillon et al. Occupancy grids from stereo and optical flow data
CN117409386A (en) Garbage positioning method based on laser vision fusion
Kim et al. Fast stereo matching of feature links
CN114972491A (en) Visual SLAM method, electronic device, storage medium and product
CN115700507A (en) Map updating method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant