CN113034584A - Mobile robot visual positioning method based on object semantic road sign - Google Patents

Mobile robot visual positioning method based on object semantic road sign

Info

Publication number
CN113034584A
CN113034584A (application CN202110411557.4A)
Authority
CN
China
Prior art keywords
map
landmark
coordinate system
road sign
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110411557.4A
Other languages
Chinese (zh)
Other versions
CN113034584B (en)
Inventor
He Li (何力)
Lin Xubin (林旭滨)
Yang Yinen (杨益枘)
Guan Yisheng (管贻生)
Zhang Hong (张宏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110411557.4A
Publication of CN113034584A
Application granted
Publication of CN113034584B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a mobile robot visual positioning method based on object semantic landmarks. The method first constructs an offline global map of object landmarks and then an online local map of object landmarks; both maps are expressed mathematically as graph structures, the sub-map matching problem is cast as a graph-matching equation, and the sub-graph matching solution is used to register the local object landmark map within the global object landmark map, finally realizing visual positioning of the mobile robot. Because the robot is positioned visually from object semantic landmarks, exploiting the viewpoint invariance of three-dimensional geometric structure, the invention removes the restriction that the positioning system places on the robot's motion posture, improves the robustness of the positioning system to external environment conditions, and ultimately improves the autonomy and flexibility of the mobile robot.

Description

Mobile robot visual positioning method based on object semantic road sign
Technical Field
The invention relates to the technical field of mobile robots, in particular to a mobile robot visual positioning method based on object semantic road signs.
Background
Autonomous navigation is a core technical problem in the field of mobile robots. An autonomous navigation system generally comprises three main modules: positioning, path planning, and control. The positioning module supplies input information to the real-time control module and the local path planning module and, as the information feedback link, plays a vital role in the autonomous navigation system. The real-time performance, accuracy, and robustness of the positioning module are decisive factors in the performance of the autonomous navigation system.
Existing positioning systems based on visual sensors mainly rely on matching visual feature keypoints, i.e., associating the feature keypoints of the current frame image with keypoints in a feature map database. When matching succeeds, the three-dimensional coordinates of the keypoints in the map database and the camera projection geometry model are used to estimate the three-dimensional pose of the camera coordinate system for the current frame, which is equivalent to estimating the pose of the sensor platform, thereby realizing the positioning function. In this process, successful keypoint matching is a prerequisite for accurate pose estimation, and the matching relies mainly on descriptor-extraction algorithms that depend, in essence, on the local pixel intensity distribution of the image. Even when the texture and illumination of the external environment are strictly identical, the local pixel intensity distribution is affected by the camera's capture viewpoint; thus, in the same scene, an inconsistency between the observation angle and that of the map-construction process causes feature point matching to fail, and the robot positioning system cannot work reliably.
Disclosure of Invention
Aiming at the insufficient illumination invariance and viewpoint invariance of visual feature point extraction and matching in existing vision-based global positioning for mobile robots, the invention provides a mobile robot visual positioning method based on object semantic landmarks. It aims to remove the restriction that the positioning system places on the robot's motion posture, to improve the robustness of the positioning system to external environment conditions, and ultimately to improve the autonomy and flexibility of the mobile robot.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a mobile robot visual positioning method based on object semantic road signs comprises the following steps:
S1, constructing an offline object landmark global map;
S2, constructing an online object road sign local map;
S3, expressing the object landmark local map and the object landmark global map mathematically with the graph structure as the mathematical model, casting the sub-map matching problem as a graph-matching equation, using the sub-graph matching solution to register the object landmark local map within the object landmark global map, and finally realizing the visual positioning of the mobile robot.
Further, the specific process of constructing the offline object landmark global map in step S1 is as follows:
S1-1, acquiring an input image, and preprocessing the input image;
S1-2, performing multi-target object detection with the trained convolutional neural network, with the final output in the form of 2D rectangular bounding boxes: for all target objects in the image, the class vector c_i, the position (x_i, y_i) of the bounding box in the pixel coordinate system, and its size (w_i, h_i), i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height;
S1-3, performing pixel-level registration between the current image and the previous frame using an optical-flow method, and estimating the camera motion pose by minimizing the gray-value cost function of the registered pixels between the two frames:

ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,2}) ‖²  (1)

In equation (1), the (·)^ operator transforms a 6-dimensional vector of the Lie algebra se(3) into 4×4 matrix form, i.e. for ξ = (ρ, φ),

ξ^ = [ φ^  ρ ; 0ᵀ  0 ] ∈ ℝ^(4×4),

and exp(·) is the exponential map onto the special Euclidean group SE(3), so that exp(ξ^) ∈ SE(3) is the rigid motion pose between adjacent frames. P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the representations of the corresponding spatial three-dimensional points in the respective camera coordinate systems, and I_1(·) and I_2(·) are the gray values of the corresponding pixel points;
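To make the direct pose-estimation step concrete, the following Python sketch assembles the residual of equation (1) over a set of tracked points and hands it to a generic least-squares solver. It is a minimal illustration under stated assumptions (grayscale images as NumPy arrays, nearest-neighbour gray-value sampling, matrix exponential via scipy, and X_{i,2} taken as exp(ξ^)X_{i,1}); names such as photometric_residual are illustrative and not part of the patent.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares

def hat(xi):
    """Map xi = (rho, phi) in se(3) to its 4x4 matrix form xi^."""
    rho, phi = xi[:3], xi[3:]
    phi_hat = np.array([[0.0, -phi[2], phi[1]],
                        [phi[2], 0.0, -phi[0]],
                        [-phi[1], phi[0], 0.0]])
    M = np.zeros((4, 4))
    M[:3, :3] = phi_hat
    M[:3, 3] = rho
    return M

def project(P, X):
    """Project homogeneous 3D points (4xN) with a 3x4 matrix P to pixels (2xN)."""
    x = P @ X
    return x[:2] / x[2]

def gray_at(img, uv):
    """Nearest-neighbour gray lookup (a real system would interpolate)."""
    u = np.clip(np.rint(uv[0]).astype(int), 0, img.shape[1] - 1)
    v = np.clip(np.rint(uv[1]).astype(int), 0, img.shape[0] - 1)
    return img[v, u].astype(float)

def photometric_residual(xi, I1, I2, P1, P2, X1):
    """Residuals I1(P1 X_i) - I2(P2 exp(xi^) X_i) over all tracked points."""
    T = expm(hat(xi))                     # exponential map se(3) -> SE(3)
    return gray_at(I1, project(P1, X1)) - gray_at(I2, project(P2, T @ X1))

# Usage sketch:
# xi0 = np.zeros(6)
# sol = least_squares(photometric_residual, xi0, args=(I1, I2, P1, P2, X1))
```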
S1-4, enveloping objects with dual ellipsoid quadrics as the object landmark map representation: first, an inscribed ellipse is fitted to each image object detection box and written in its dual matrix form. For view i, given the camera pose (R_i, t_i) of that view and the camera intrinsic matrix K, the dual ellipsoid quadric Q* in three-dimensional space and its projected contour C_i* in this view are related through the projection matrix P_i = K·[R_i t_i] by

C_i* ∼ P_i Q* P_iᵀ  (2)

where the scalar relation ∼ in equation (2) indicates that the two sides are equal up to a scale factor;
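A minimal sketch of the dual-form computations in S1-4 and equation (2) follows, assuming the detection box is given by its centre (cx, cy) and size (w, h); the scale-normalization convention is one common choice, not mandated by the patent.

```python
import numpy as np

def dual_ellipse_from_box(cx, cy, w, h):
    """Dual matrix C* of the ellipse inscribed in an axis-aligned detection
    box centred at (cx, cy) with width w and height h."""
    a, b = w / 2.0, h / 2.0
    C0 = np.diag([a * a, b * b, -1.0])                  # dual conic at the origin
    T = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    return T @ C0 @ T.T                                 # translate to the box centre

def project_dual_ellipsoid(P, Q_star):
    """Equation (2): the dual ellipsoid projects as C* ~ P Q* P^T (up to scale)."""
    C = P @ Q_star @ P.T
    return C / -C[2, 2]                                 # fix the arbitrary scale
```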
S1-5, in homogeneous coordinates, the dual form Q* of the ellipsoid is a symmetric 4×4 matrix and the dual form C* of the ellipse is a symmetric 3×3 matrix, i.e. Q* and C* have 10 and 6 independent elements, respectively. Equation (2) is quadratic in P_i; writing the quadratic form of P_i in vectorized form, denoted B_i, the original equation is expressed linearly as

B_i q̂ = ĉ_i  (3)

where q̂ ∈ ℝ¹⁰ stacks the independent elements of Q* and ĉ_i ∈ ℝ⁶ those of C_i*;
S1-6, for multiple views, several equations of form (3) are stacked simultaneously to build a linear system; solving this over-constrained system is a linear least-squares problem, solved through SVD to obtain the variable q̂. Restoring it to a symmetric matrix yields the dual form of the ellipsoid Q*,
and reconstructing an object envelope ellipsoid to form an object landmark global map.
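The stacked linear system and its SVD solution in S1-6 can be sketched as follows. This is a minimal illustration under an assumed formulation in which each view i contributes B_i q̂ = s_i ĉ_i with an unknown per-view scale s_i, and the homogeneous system is solved jointly for q̂ and the scales (three or more views recommended); helper names such as fit_dual_ellipsoid are not from the patent.

```python
import numpy as np

def vech(S):
    """Stack the upper-triangular (incl. diagonal) entries of a symmetric matrix."""
    n = S.shape[0]
    return np.array([S[i, j] for i in range(n) for j in range(i, n)])

def unvech(v, n):
    """Inverse of vech: rebuild the symmetric n x n matrix."""
    S = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            S[i, j] = S[j, i] = v[k]
            k += 1
    return S

def B_matrix(P):
    """6x10 matrix B with vech(P Q* P^T) = B vech(Q*), built column by column,
    exploiting that P Q P^T is linear in Q."""
    cols = []
    for k in range(10):
        e = np.zeros(10); e[k] = 1.0
        cols.append(vech(P @ unvech(e, 4) @ P.T))
    return np.stack(cols, axis=1)

def fit_dual_ellipsoid(Ps, Cs):
    """Solve B_i q = s_i c_i over all views jointly via SVD.
    Unknowns: 10 elements of vech(Q*) plus one scale s_i per view."""
    m = len(Ps)
    M = np.zeros((6 * m, 10 + m))
    for i, (P, C) in enumerate(zip(Ps, Cs)):
        M[6 * i:6 * i + 6, :10] = B_matrix(P)
        M[6 * i:6 * i + 6, 10 + i] = -vech(C)
    _, _, Vt = np.linalg.svd(M)
    q = Vt[-1, :10]                      # smallest right singular vector
    return unvech(q, 4)                  # recovered Q*, up to global scale
```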
Further, in step S1-1, the preprocessing includes: filtering and denoising, conversion to a grayscale image, and scaling to fit the input dimensions of the neural network.
Further, the process of constructing the on-line object landmark local map in step S2 is as follows:
After the visual image is preprocessed, object detection and data association are performed; pose tracking is then carried out with the optical-flow method, i.e. the function of a visual odometer, which fuses the pose optimized at the previous moment so as to suppress accumulated error as far as possible. A local object landmark map is then built from multiple views under a sliding-window local map scale-control strategy: the object landmark envelope-surface coordinate systems are rigidly transformed and expressed in the camera coordinate system, finally forming a scale-controllable object landmark local map tied to the camera coordinate system.
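A minimal sketch of the sliding-window local map follows, assuming landmarks arrive as (label, dual quadric) pairs; it relies on the identity that a dual quadric transforms under a rigid motion T as Q' = T Q Tᵀ. The class name and window size are illustrative assumptions.

```python
import numpy as np
from collections import deque

class LocalLandmarkMap:
    """Sliding-window local map: keeps the K most recent object landmarks,
    expressed in the current camera coordinate system."""
    def __init__(self, window_size=10):
        self.window = deque(maxlen=window_size)   # oldest entries drop out

    def add_landmark(self, label, Q_world, T_cw):
        # Re-express the world-frame envelope in the camera frame:
        # a dual quadric transforms under a rigid motion T as Q' = T Q T^T.
        Q_cam = T_cw @ Q_world @ T_cw.T
        self.window.append((label, Q_cam))
```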
Further, the specific process of step S3 is as follows:
Matching the object landmark local map against the object landmark global map is mathematically a sub-graph matching problem. The object landmark global map is represented as a graph structure G = (V, E), where the nodes V each contain the three-dimensional coordinates and label of a landmark i, and the edges E represent the mutual three-dimensional relation between two landmarks; similarly, the local map is represented as a graph structure H = (V', E') with nodes V' and edges E';
The graph matching problem is defined as follows: given a graph G = (V, E) with n nodes and a graph H = (V', E') with m nodes, seek the node correspondence

X ∈ {0, 1}^(n×m)

between graph G and graph H that maximizes the consistency of graph attributes and structure:

X* = argmax_X [ Σ_{i1,i2} v(i1, i2) X_{i1 i2} + Σ_{i1 j1, i2 j2} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2} ]
where v(i1, i2) is the consistency measure between node i1 of graph G and node i2 of graph H, derived from the three-dimensional scale consistency of the landmarks and the consistency of their semantic labels, and d(i1 j1, i2 j2) is the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, derived from the consistency of the three-dimensional distances between landmarks within each map. Nodes carrying the same label in the two maps are matched using the semantic information, reducing the number of optimization iterations; the optimal matching solution finally obtained determines the one-to-one correspondence of landmarks between the two maps;
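The patent does not name a specific solver for this objective; one standard relaxation is spectral graph matching (leading eigenvector of a pairwise-affinity matrix followed by greedy one-to-one discretisation), with the semantic labels used exactly as described above to prune candidate assignments. A sketch under those assumptions, with illustrative function names:

```python
import numpy as np

def spectral_match(nodes_g, nodes_h, sigma=0.5):
    """Spectral relaxation of the sub-graph matching objective.
    nodes_g / nodes_h: lists of (label, xyz) tuples, xyz as np.ndarray."""
    # Candidate assignments: only landmarks sharing a semantic label.
    cand = [(i, j) for i, (lg, _) in enumerate(nodes_g)
                   for j, (lh, _) in enumerate(nodes_h) if lg == lh]
    n = len(cand)
    M = np.zeros((n, n))
    for a, (i1, i2) in enumerate(cand):
        M[a, a] = 1.0                                   # node consistency v
        for b, (j1, j2) in enumerate(cand):
            if a == b or i1 == j1 or i2 == j2:
                continue
            # Edge consistency d: pairwise 3D distances should agree.
            dg = np.linalg.norm(nodes_g[i1][1] - nodes_g[j1][1])
            dh = np.linalg.norm(nodes_h[i2][1] - nodes_h[j2][1])
            M[a, b] = np.exp(-(dg - dh) ** 2 / sigma ** 2)
    # Leading eigenvector, then greedy one-to-one discretisation.
    _, V = np.linalg.eigh(M)
    score = np.abs(V[:, -1])
    matches, used_g, used_h = [], set(), set()
    for a in np.argsort(-score):
        i, j = cand[a]
        if i not in used_g and j not in used_h:
            matches.append((i, j)); used_g.add(i); used_h.add(j)
    return matches
```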
After the object landmark local map has been matched to the object landmark global map, pose propagation is performed with the landmark three-dimensional information of the global map as reference: following the chain (global-map landmark 3D information → matched local-map landmark 3D information → reference coordinate system of the local map → camera coordinate system), the transformation T_wc ∈ SE(3) of the camera coordinate system in the world coordinate system is obtained, and finally the pose of the robot at the current moment relative to the map world coordinate system:

T_wr = T_wc T_cr ∈ SE(3)

where T_cr ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system and T_wc ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system; multiplying the two according to the chain rule of transformation matrices yields the transformation matrix T_wr ∈ SE(3) of the robot relative to the world coordinate system, i.e. the positioning pose information of the robot.
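Numerically, this final step is a single matrix product; the sketch below uses illustrative placeholder poses (the real T_wc comes from the sub-map registration and T_cr from extrinsic calibration).

```python
import numpy as np

# Illustrative placeholder poses (4x4 homogeneous matrices in SE(3)).
T_wc = np.eye(4); T_wc[:3, 3] = [2.0, 1.0, 0.0]   # camera in the map world frame
T_cr = np.eye(4); T_cr[:3, 3] = [0.0, 0.0, -0.1]  # robot base in the camera frame

# Chain rule of transformation matrices: T_wr = T_wc T_cr.
T_wr = T_wc @ T_cr
print(T_wr[:3, 3])   # robot position in the map world coordinate system
```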
Compared with the prior art, the principle and the advantages of the scheme are as follows:
This scheme first constructs an offline object landmark global map and an online object landmark local map, then expresses both maps mathematically as graph structures with the graph structure as the mathematical model, casts the sub-map matching problem as a graph-matching equation, and uses the sub-graph matching solution to register the local map within the global map, finally realizing visual positioning of the mobile robot. Because the robot is positioned visually from object semantic landmarks, exploiting the viewpoint invariance of three-dimensional geometric structure, the scheme removes the restriction the positioning system places on the robot's motion posture, improves the robustness of the positioning system to external environment conditions, and ultimately improves the autonomy and flexibility of the mobile robot.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a mobile robot vision positioning method based on semantic object landmarks according to the present invention;
FIG. 2 is a global map of object landmarks based on an ellipsoid envelope surface representation in an embodiment;
fig. 3 is a local map of object signposts under the sliding window strategy in the embodiment.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the method for visually positioning a mobile robot based on semantic object landmarks in this embodiment includes the following steps:
S1, constructing an offline object landmark global map, wherein the process is as follows:
S1-1, acquiring an input image and preprocessing it, where the preprocessing includes: filtering and denoising, conversion to a grayscale image, and scaling to fit the input dimensions of the neural network;
S1-2, performing multi-target object detection with the trained convolutional neural network, with the final output in the form of 2D rectangular bounding boxes: for all target objects in the image, the class vector c_i, the position (x_i, y_i) of the bounding box in the pixel coordinate system, and its size (w_i, h_i), i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height;
S1-3, performing pixel-level registration between the current image and the previous frame using an optical-flow method, and estimating the camera motion pose by minimizing the gray-value cost function of the registered pixels between the two frames:

ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,2}) ‖²  (1)

In equation (1), the (·)^ operator transforms a 6-dimensional vector of the Lie algebra se(3) into 4×4 matrix form, i.e. for ξ = (ρ, φ),

ξ^ = [ φ^  ρ ; 0ᵀ  0 ] ∈ ℝ^(4×4),

and exp(·) is the exponential map onto the special Euclidean group SE(3), so that exp(ξ^) ∈ SE(3) is the rigid motion pose between adjacent frames. P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the representations of the corresponding spatial three-dimensional points in the respective camera coordinate systems, and I_1(·) and I_2(·) are the gray values of the corresponding pixel points;
S1-4, enveloping objects with dual ellipsoid quadrics as the object landmark map representation: first, an inscribed ellipse is fitted to each image object detection box and written in its dual matrix form. For view i, given the camera pose (R_i, t_i) of that view and the camera intrinsic matrix K, the dual ellipsoid quadric Q* in three-dimensional space and its projected contour C_i* in this view are related through the projection matrix P_i = K·[R_i t_i] by

C_i* ∼ P_i Q* P_iᵀ  (2)

where the scalar relation ∼ in equation (2) indicates that the two sides are equal up to a scale factor;
S1-5, in homogeneous coordinates, the dual form Q* of the ellipsoid is a symmetric 4×4 matrix and the dual form C* of the ellipse is a symmetric 3×3 matrix, i.e. Q* and C* have 10 and 6 independent elements, respectively. Equation (2) is quadratic in P_i; writing the quadratic form of P_i in vectorized form, denoted B_i, the original equation is expressed linearly as

B_i q̂ = ĉ_i  (3)

where q̂ ∈ ℝ¹⁰ stacks the independent elements of Q* and ĉ_i ∈ ℝ⁶ those of C_i*;
S1-6, for multiple views, several equations of form (3) are stacked simultaneously to build a linear system; solving this over-constrained system is a linear least-squares problem, solved through SVD to obtain the variable q̂. Restoring it to a symmetric matrix yields the dual form of the ellipsoid Q*,
and reconstructing an object envelope ellipsoid to form an object landmark global map, as shown in fig. 2.
S2, constructing an online object road sign local map;
This step is preprocessed in the same way as step S1: after the visual image is preprocessed, object detection and data association are performed; pose tracking is then carried out with the optical-flow method, i.e. the function of a visual odometer, which fuses the pose optimized at the previous moment so as to suppress accumulated error as far as possible. A local object landmark map is then built from multiple views under a sliding-window local map scale-control strategy: the object landmark envelope-surface coordinate systems are rigidly transformed and expressed in the camera coordinate system, finally forming a scale-controllable object landmark local map tied to the camera coordinate system, as shown in fig. 3.
And S3, matching the object landmark local map with the object landmark global map, and finally realizing the visual positioning of the mobile robot.
The specific process of the step is as follows:
Matching the object landmark local map against the object landmark global map is mathematically a sub-graph matching problem. The object landmark global map is represented as a graph structure G = (V, E), where the nodes V each contain the three-dimensional coordinates and label of a landmark i, and the edges E represent the mutual three-dimensional relation between two landmarks; similarly, the local map is represented as a graph structure H = (V', E') with nodes V' and edges E';
The graph matching problem is defined as follows: given a graph G = (V, E) with n nodes and a graph H = (V', E') with m nodes, seek the node correspondence

X ∈ {0, 1}^(n×m)

between graph G and graph H that maximizes the consistency of graph attributes and structure:

X* = argmax_X [ Σ_{i1,i2} v(i1, i2) X_{i1 i2} + Σ_{i1 j1, i2 j2} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2} ]
where v(i1, i2) is the consistency measure between node i1 of graph G and node i2 of graph H, derived from the three-dimensional scale consistency of the landmarks and the consistency of their semantic labels, and d(i1 j1, i2 j2) is the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, derived from the consistency of the three-dimensional distances between landmarks within each map. Nodes carrying the same label in the two maps are matched using the semantic information, reducing the number of optimization iterations; the optimal matching solution finally obtained determines the one-to-one correspondence of landmarks between the two maps;
After the object landmark local map has been matched to the object landmark global map, pose propagation is performed with the landmark three-dimensional information of the global map as reference: following the chain (global-map landmark 3D information → matched local-map landmark 3D information → reference coordinate system of the local map → camera coordinate system), the transformation T_wc ∈ SE(3) of the camera coordinate system in the world coordinate system is obtained, and finally the pose of the robot at the current moment relative to the map world coordinate system:

T_wr = T_wc T_cr ∈ SE(3)

where T_cr ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system and T_wc ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system; multiplying the two according to the chain rule of transformation matrices yields the transformation matrix T_wr ∈ SE(3) of the robot relative to the world coordinate system, i.e. the positioning pose information of the robot.
This embodiment positions the robot visually from object semantic landmarks, exploiting the viewpoint invariance of three-dimensional geometric structure; it removes the restriction the positioning system places on the robot's motion posture, improves the robustness of the positioning system to external environment conditions, and ultimately improves the autonomy and flexibility of the mobile robot.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (5)

1. A mobile robot visual positioning method based on object semantic road signs is characterized by comprising the following steps:
S1, constructing an offline object landmark global map;
S2, constructing an online object road sign local map;
S3, expressing the object landmark local map and the object landmark global map mathematically with the graph structure as the mathematical model, casting the sub-map matching problem as a graph-matching equation, using the sub-graph matching solution to register the object landmark local map within the object landmark global map, and finally realizing the visual positioning of the mobile robot.
2. The mobile robot visual positioning method based on object semantic road signs of claim 1, wherein the specific process of constructing the offline object landmark global map in step S1 is as follows:
S1-1, acquiring an input image, and preprocessing the input image;
S1-2, performing multi-target object detection with the trained convolutional neural network, with the final output in the form of 2D rectangular bounding boxes: for all target objects in the image, the class vector c_i, the position (x_i, y_i) of the bounding box in the pixel coordinate system, and its size (w_i, h_i), i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height;
S1-3, performing pixel-level registration between the current image and the previous frame using an optical-flow method, and estimating the camera motion pose by minimizing the gray-value cost function of the registered pixels between the two frames:

ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,2}) ‖²  (1)

In equation (1), the (·)^ operator transforms a 6-dimensional vector of the Lie algebra se(3) into 4×4 matrix form, i.e. for ξ = (ρ, φ),

ξ^ = [ φ^  ρ ; 0ᵀ  0 ] ∈ ℝ^(4×4),

and exp(·) is the exponential map onto the special Euclidean group SE(3), so that exp(ξ^) ∈ SE(3) is the rigid motion pose between adjacent frames. P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the representations of the corresponding spatial three-dimensional points in the respective camera coordinate systems, and I_1(·) and I_2(·) are the gray values of the corresponding pixel points;
S1-4, enveloping objects with dual ellipsoid quadrics as the object landmark map representation: first, an inscribed ellipse is fitted to each image object detection box and written in its dual matrix form. For view i, given the camera pose (R_i, t_i) of that view and the camera intrinsic matrix K, the dual ellipsoid quadric Q* in three-dimensional space and its projected contour C_i* in this view are related through the projection matrix P_i = K·[R_i t_i] by

C_i* ∼ P_i Q* P_iᵀ  (2)

where the scalar relation ∼ in equation (2) indicates that the two sides are equal up to a scale factor;
S1-5, in homogeneous coordinates, the dual form Q* of the ellipsoid is a symmetric 4×4 matrix and the dual form C* of the ellipse is a symmetric 3×3 matrix, i.e. Q* and C* have 10 and 6 independent elements, respectively. Equation (2) is quadratic in P_i; writing the quadratic form of P_i in vectorized form, denoted B_i, the original equation is expressed linearly as

B_i q̂ = ĉ_i  (3)

where q̂ ∈ ℝ¹⁰ stacks the independent elements of Q* and ĉ_i ∈ ℝ⁶ those of C_i*;
S1-6, for multiple views, several equations of form (3) are stacked simultaneously to build a linear system; solving this over-constrained system is a linear least-squares problem, solved through SVD to obtain the variable q̂. Restoring it to a symmetric matrix yields the dual form of the ellipsoid Q*,
and reconstructing an object envelope ellipsoid to form an object landmark global map.
3. The mobile robot visual positioning method based on object semantic road signs of claim 2, wherein the preprocessing in step S1-1 comprises: filtering and denoising, conversion to a grayscale image, and scaling to fit the input dimensions of the neural network.
4. The mobile robot visual positioning method based on object semantic road signs of claim 1, wherein step S2 comprises the following process:
After the visual image is preprocessed, object detection and data association are performed; pose tracking is then carried out with the optical-flow method, i.e. the function of a visual odometer, which fuses the pose optimized at the previous moment so as to suppress accumulated error as far as possible. A local object landmark map is then built from multiple views under a sliding-window local map scale-control strategy: the object landmark envelope-surface coordinate systems are rigidly transformed and expressed in the camera coordinate system, finally forming a scale-controllable object landmark local map tied to the camera coordinate system.
5. The mobile robot visual positioning method based on object semantic road signs of claim 1, wherein the specific process of step S3 is as follows:
Matching the object landmark local map against the object landmark global map is mathematically a sub-graph matching problem. The object landmark global map is represented as a graph structure G = (V, E), where the nodes V each contain the three-dimensional coordinates and label of a landmark i, and the edges E represent the mutual three-dimensional relation between two landmarks; similarly, the local map is represented as a graph structure H = (V', E') with nodes V' and edges E';
The graph matching problem is defined as follows: given a graph G = (V, E) with n nodes and a graph H = (V', E') with m nodes, seek the node correspondence

X ∈ {0, 1}^(n×m)

between graph G and graph H that maximizes the consistency of graph attributes and structure:

X* = argmax_X [ Σ_{i1,i2} v(i1, i2) X_{i1 i2} + Σ_{i1 j1, i2 j2} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2} ]
where v(i1, i2) is the consistency measure between node i1 of graph G and node i2 of graph H, derived from the three-dimensional scale consistency of the landmarks and the consistency of their semantic labels, and d(i1 j1, i2 j2) is the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, derived from the consistency of the three-dimensional distances between landmarks within each map. Nodes carrying the same label in the two maps are matched using the semantic information, reducing the number of optimization iterations; the optimal matching solution finally obtained determines the one-to-one correspondence of landmarks between the two maps;
After the object landmark local map has been matched to the object landmark global map, pose propagation is performed with the landmark three-dimensional information of the global map as reference: following the chain (global-map landmark 3D information → matched local-map landmark 3D information → reference coordinate system of the local map → camera coordinate system), the transformation T_wc ∈ SE(3) of the camera coordinate system in the world coordinate system is obtained, and finally the pose of the robot at the current moment relative to the map world coordinate system:

T_wr = T_wc T_cr ∈ SE(3)

where T_cr ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system and T_wc ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system; multiplying the two according to the chain rule of transformation matrices yields the transformation matrix T_wr ∈ SE(3) of the robot relative to the world coordinate system, i.e. the positioning pose information of the robot.
CN202110411557.4A 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign Active CN113034584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411557.4A CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110411557.4A CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Publications (2)

Publication Number Publication Date
CN113034584A true CN113034584A (en) 2021-06-25
CN113034584B CN113034584B (en) 2022-08-30

Family

ID=76457968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110411557.4A Active CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Country Status (1)

Country Link
CN (1) CN113034584B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115655262A (en) * 2022-12-26 2023-01-31 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device
CN115683129A (en) * 2023-01-04 2023-02-03 苏州尚同墨方智能科技有限公司 Long-term repositioning method and device based on high-definition map


Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN106441319A (en) * 2016-09-23 2017-02-22 中国科学院合肥物质科学研究院 System and method for generating lane-level navigation map of unmanned vehicle
CN111462135A (en) * 2020-03-31 2020-07-28 华东理工大学 Semantic mapping method based on visual S L AM and two-dimensional semantic segmentation
CN111780771A (en) * 2020-05-12 2020-10-16 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium

Non-Patent Citations (2)

Title
Li He et al.: "Local Feature Descriptor for Image Matching: A Survey", IEEE Access *
Ding Shuaihua et al. (丁帅华 等): "SLAM Method Based on Local Sub-graph Matching" (基于局部子图匹配的SLAM方法), Robot (《机器人》) *


Also Published As

Publication number Publication date
CN113034584B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN107741234B (en) Off-line map construction and positioning method based on vision
Wan et al. Teaching robots to do object assembly using multi-modal 3d vision
CN110462343A (en) The automated graphics for vehicle based on map mark
CN112258618A (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN113034584B (en) Mobile robot visual positioning method based on object semantic road sign
CN109325979B (en) Robot loop detection method based on deep learning
CN112734852A (en) Robot mapping method and device and computing equipment
CN103192397A (en) Off-line visual programming method and system for robot
CN114332360A (en) Collaborative three-dimensional mapping method and system
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
US6175648B1 (en) Process for producing cartographic data by stereo vision
CN112629520A (en) Robot navigation and positioning method, system, equipment and storage medium
CN115272596A (en) Multi-sensor fusion SLAM method oriented to monotonous texture-free large scene
CN116518984B (en) Vehicle road co-location system and method for underground coal mine auxiliary transportation robot
Zhen et al. LiDAR-enhanced structure-from-motion
CN113570662A (en) System and method for 3D localization of landmarks from real world images
KR20230003803A (en) Automatic calibration through vector matching of the LiDAR coordinate system and the camera coordinate system
Giordano et al. 3D structure identification from image moments
CN111145267A (en) IMU (inertial measurement unit) assistance-based 360-degree panoramic view multi-camera calibration method
CN115239822A (en) Real-time visual identification and positioning method and system for multi-module space of split type flying vehicle
CN111784798B (en) Map generation method and device, electronic equipment and storage medium
CN115496873A (en) Monocular vision-based large-scene lane mapping method and electronic equipment
Madsen et al. A robustness analysis of triangulation-based robot self-positioning
Leishman et al. Robust Motion Estimation with RGB-D Cameras
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Lin Xubin; He Li; Yang Yinen; Guan Yisheng; Zhang Hong

Inventor before: He Li; Lin Xubin; Yang Yinen; Guan Yisheng; Zhang Hong

GR01 Patent grant