CN109579825B - Robot positioning system and method based on binocular vision and convolutional neural network - Google Patents

Robot positioning system and method based on binocular vision and convolutional neural network Download PDF

Info

Publication number
CN109579825B
CN109579825B CN201811416964.9A CN201811416964A CN109579825B
Authority
CN
China
Prior art keywords
landmark
mobile robot
parallax
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811416964.9A
Other languages
Chinese (zh)
Other versions
CN109579825A (en)
Inventor
马国军
倪朋
郑威
朱琎
李效龙
曾庆军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN201811416964.9A priority Critical patent/CN109579825B/en
Publication of CN109579825A publication Critical patent/CN109579825A/en
Application granted granted Critical
Publication of CN109579825B publication Critical patent/CN109579825B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/04Interpretation of pictures
    • G01C11/30Interpretation of pictures by triangulation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/36Videogrammetry, i.e. electronic processing of video signals from a single source or from different sources to give parallax or range information
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Abstract

The invention discloses a robot positioning system and positioning method based on binocular vision and a convolutional neural network. The system comprises a mobile robot equipped with an inertial sensor module, a binocular vision module and an upper computer, each connected to a sensor control module. The inertial sensor module measures the motion state of the mobile robot, the binocular vision module captures images of the surrounding environment, a convolutional neural network performs stereo matching on the images, and the landmark positions are obtained by calculation. The sensor control module controls the operation of the inertial sensor module and the binocular vision module, receives their data, and transmits the data to the upper computer, which processes it to obtain the position of the mobile robot. The proposed method enables efficient self-positioning of a mobile robot.

Description

Robot positioning system and method based on binocular vision and convolutional neural network
Technical Field
The invention relates to the field of robot simultaneous localization and mapping (SLAM), and in particular to a robot positioning system and positioning method based on binocular vision and a convolutional neural network.
Background
For an autonomous mobile robot operating in a complex environment with an unknown or missing prior map, simultaneous localization and mapping (SLAM) technology provides a feasible scheme for achieving truly autonomous navigation. SLAM allows the robot to build a map of the surrounding environment in real time and, at the same time, estimate its own position within that map.
The mobile robot acquires environmental information through sensors, chiefly laser sensors and vision sensors. A laser sensor is bulky, power-hungry and expensive, and is therefore unsuitable for smaller robots. In addition, in highly cluttered or crowded scenes it is difficult for a laser sensor to extract corner or line features; the sensor suffers from perceptual drift, its perceptual resolution is low, and it is hard to associate observed features with known features. A vision sensor, by contrast, provides rich information, is easy to mount and is inexpensive. It performs well in extracting and matching environmental landmark features and in representing and managing environment maps, and vision can solve the data-association problem in SLAM well. Compared with a stereoscopic vision system, monocular vision obtains less image information, so the uncertainty in direction measurement is large and depth recovery is difficult, which limits its range of applications to a certain extent.
Binocular vision is a method of passively perceiving distance with a computer by simulating the principle of human vision. An object is observed from two or more viewpoints to acquire images under different viewing angles; from the pixel correspondences between the images, the offset (disparity) between matched pixels is computed, and the triangulation principle is used to recover the three-dimensional information of the object. Once the depth information of the object is obtained, the actual distance between the object and the camera, the three-dimensional size of the object and the actual distance between any two points can be calculated.
A convolutional neural network is a particular type of artificial neural network that has become one of the most commonly used tools in speech analysis and image recognition. By combining local receptive fields, shared weights and spatial or temporal down-sampling, a convolutional neural network fully exploits the locality and other structure present in the data to optimize the network architecture. Structurally it is a special multi-layer perceptron that is highly invariant to translation, scaling, tilting, rotation and other transformations.
Extended Kalman filtering (EKF) applied to simultaneous localization and mapping (SLAM) of a mobile robot suffers from problems such as poor real-time performance, low accuracy and susceptibility to interference.
Disclosure of Invention
The invention aims to provide a robot positioning system and positioning method based on binocular vision and a convolutional neural network that address the problems and shortcomings of the prior art. The invention adopts an augmented extended Kalman filtering algorithm to enhance the robustness of robot positioning and to improve the positioning accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
A robot positioning system based on binocular vision and a convolutional neural network comprises a mobile robot equipped with an inertial sensor module, a binocular vision module and an upper computer, each connected to a sensor control module. The inertial sensor module obtains the motion state of the mobile robot, the binocular vision module obtains images of the surrounding environment, and stereo matching with a convolutional neural network yields the landmark positions. The sensor control module controls the operation of the inertial sensor module and the binocular vision module, receives their data, and transmits the data to the upper computer, which processes it to obtain the position of the mobile robot. The proposed system enables efficient self-positioning of a mobile robot.
Further, the binocular vision module is composed of a binocular stereo camera, the inertial sensor module is composed of an inertial sensor, the sensor control module is composed of a single chip microcomputer, and the upper computer is a notebook computer.
In order to achieve the above object, the present invention adopts another technical solution as follows.
A positioning method of a robot positioning system based on binocular vision and a convolutional neural network comprises the following steps:
(1) acquiring the motion state of the mobile robot, including acceleration and angular velocity, through an inertial sensor;
(2) acquiring surrounding images through a binocular stereo camera and calculating the position of a landmark according to the surrounding images;
2.1 acquiring binocular images of the surrounding environment by using a binocular camera;
2.2 image correction
In practice it is difficult for a binocular camera to realize a binocular vision model that is perfectly in the parallel-alignment state, so a binocular correction operation is required: the image pair acquired by the binocular camera is re-projected so that it satisfies the parallel-alignment state. Image correction transforms the two images, which are in fact not coplanar and row-aligned, into coplanar, row-aligned images.
Binocular correction eliminates distortion and performs row alignment on the left and right views according to the monocular intrinsic parameters (focal length, imaging origin, distortion coefficients) and the relative binocular pose (rotation matrix and translation vector) obtained from camera calibration, so that the imaging-origin coordinates of the left and right views coincide, the optical axes of the two cameras are parallel, the left and right imaging planes are coplanar, and the epipolar lines are row-aligned. The left and right views are thus adjusted to the ideal, completely parallel-aligned form, in which any point in one image and its corresponding point in the other image lie on the same row, so the corresponding point can be found by a one-dimensional search along that row.
Binocular correction is typically achieved using the standard binocular correction algorithm, the Bouguet algorithm.
The principle of the Bouguet algorithm is to maximize the common observation area while minimizing the reprojection distortion. Given the relative pose of the two cameras, i.e. the rotation matrix R and the translation vector T, each camera is rotated by half of R; that is, R is decomposed into r_l and r_r, so that the optical axes of the two cameras become parallel and point along the composite direction of the original principal optical axes. The rotated image planes are then coplanar but not yet row-aligned. To achieve row alignment, i.e. to send the epipoles to infinity and make the epipolar lines horizontal, a transformation matrix R_rect is constructed as:

R_rect = [ e_1^T
           e_2^T
           e_3^T ]

where R_rect is the transformation matrix and e_1, e_2, e_3 are three vectors.
Since the matrix R_rect sends the epipole of the left camera to infinity and makes the epipolar lines horizontal, the vector e_1 is taken along the direction between the principal points of the two cameras:

e_1 = T / ||T||

where T is the translation vector between the two cameras and ||T|| is its norm.
The vector e_2 only needs to be orthogonal to e_1. The direction orthogonal to the principal optical axis is usually chosen; it is obtained from the cross product of e_1 and the principal optical-axis direction (0, 0, 1), normalized as:

e_2 = (-T_y, T_x, 0)^T / sqrt(T_x^2 + T_y^2)

where T_x, T_y, T_z are the components obtained by decomposing the translation vector T.
Third vector e 3 And vector e 1 And e 2 Quadrature, by cross product:
e 3 =e 1 ×e 2
The correction matrices of the binocular camera are then obtained from the half-rotation matrices of the left and right cameras and the transformation matrix, yielding an ideal parallel-optical-axis binocular vision model:

R_l = R_rect · r_l
R_r = R_rect · r_r

where R_rect is the transformation matrix, r_l and r_r are the synthetic (half) rotation matrices, and R_l and R_r are the overall rotation matrices of the left and right cameras.
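As an illustration only, the construction above can be sketched in a few lines of Python with NumPy and OpenCV. This is a minimal sketch under the assumption that R and T are the calibrated rotation and translation between the cameras; it is not the patent's own implementation.

    import numpy as np
    import cv2

    def bouguet_rectify(R, T):
        # Split the inter-camera rotation in half so that each camera rotates
        # by half the angle and the optical axes become parallel.
        rvec, _ = cv2.Rodrigues(R)
        r_l, _ = cv2.Rodrigues(-0.5 * rvec)
        r_r, _ = cv2.Rodrigues(0.5 * rvec)

        T = np.asarray(T, dtype=float).reshape(3)
        e1 = T / np.linalg.norm(T)                        # along the baseline
        e2 = np.array([-T[1], T[0], 0.0]) / np.hypot(T[0], T[1])
        e3 = np.cross(e1, e2)                             # completes the basis
        R_rect = np.vstack([e1, e2, e3])                  # row-aligning rotation

        R_l = R_rect @ r_l                                # overall left rotation
        R_r = R_rect @ r_r                                # overall right rotation
        return R_l, R_r

In practice, OpenCV's cv2.stereoRectify provides an off-the-shelf implementation of this kind of rectification.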
2.3, carrying out stereo matching by using a convolutional neural network;
2.3.1 Calculating the matching cost at each disparity for the binocular images using a convolutional neural network;
The essence of stereo matching is to compare the similarity of two pixels in the binocular image pair; the matching cost is the mathematical expression of that similarity. The lower the matching cost between two pixels, the more similar they are and the higher the degree of match. A patch-based image matching method is used, with a convolutional neural network finding accurate correspondences between patches: the image is divided into patches, the similarity of patches is compared to find corresponding pixels, and each candidate correspondence is scored according to its degree of similarity. The negative of the similarity is defined as the stereo matching cost for the subsequent cost aggregation and disparity computation. The steps of the convolutional-neural-network-based binocular stereo matching cost calculation can be summarized as follows:
1) extract feature information from the left and right image patches using a 4-layer convolutional branch for each patch;
2) concatenate the left and right feature information and classify it with a 4-layer fully connected network, where the loss function is the cross-entropy cost function:

t·log(s) + (1 − t)·log(1 − s)

where s is the output of the similarity comparison network and t is the sample label: t = 1 for a positive sample and t = 0 for a negative sample;
3) output the similar/dissimilar decision as a similarity score for the subsequent stereo matching;
4) construct the cost function from the negative of the similarity score to obtain the matching cost.
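For illustration, a minimal PyTorch sketch of such a patch-similarity network is given below. The layer widths, patch size and use of separate (non-shared) branches are assumptions; the patent only specifies four convolutional layers per branch, feature concatenation, four fully connected layers and a cross-entropy loss.

    import torch
    import torch.nn as nn

    class MatchingCostNet(nn.Module):
        # Patch-similarity network: 4 conv layers per branch, concatenation,
        # then 4 fully connected layers ending in a sigmoid similarity score.
        def __init__(self, channels=64):
            super().__init__()
            def branch():
                layers, in_ch = [], 1
                for _ in range(4):                          # 4 convolutional layers
                    layers += [nn.Conv2d(in_ch, channels, 3), nn.ReLU(inplace=True)]
                    in_ch = channels
                return nn.Sequential(*layers)
            self.left, self.right = branch(), branch()
            self.head = nn.Sequential(                      # 4 fully connected layers
                nn.Linear(2 * channels, 384), nn.ReLU(inplace=True),
                nn.Linear(384, 384), nn.ReLU(inplace=True),
                nn.Linear(384, 384), nn.ReLU(inplace=True),
                nn.Linear(384, 1), nn.Sigmoid())

        def forward(self, patch_l, patch_r):
            # For 9x9 grayscale patches, four 3x3 convolutions leave a 1x1 map.
            f_l = self.left(patch_l).flatten(1)
            f_r = self.right(patch_r).flatten(1)
            return self.head(torch.cat([f_l, f_r], dim=1)).squeeze(1)

    # Training uses binary cross-entropy on positive/negative patch pairs; at
    # test time the matching cost is taken as the negative of the similarity.
    net, criterion = MatchingCostNet(), nn.BCELoss()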
2.3.2 Cost aggregation
Selecting the optimal disparity from the matching cost of a single pixel is easily corrupted by image noise, which makes the algorithm unstable. Local matching algorithms therefore aggregate, around each pixel to be matched, the matching costs of all pixels in the surrounding neighbourhood (the aggregation window). For one candidate disparity the aggregation window is a two-dimensional plane; over the whole disparity search range the aggregation windows form a three-dimensional volume.
Aggregating the matching costs is equivalent to convolving the aggregation window with the candidate disparity plane in the disparity space, as shown in the following formula:

C_A(p, d) = Σ_{q ∈ W(p)} w(q, d) · C_0(q, d)

where C_A(p, d) is the aggregated cost of the reference point p at candidate disparity d, C_0(q, d) is the original matching cost at candidate disparity d, w(q, d) is the aggregation-window weight on the disparity-d plane, and W(p) is the aggregation window centred on p. The simplest choice of window weights is the mean: assuming a fixed 5 × 5 square aggregation window, each weight is set to 1/25.
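A minimal sketch of this mean-window aggregation over a cost volume is given below; the (H, W, D) cost-volume layout and the function names are assumptions for illustration.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def aggregate_costs(cost_volume, window=5):
        # Average the raw matching cost over a window x window neighbourhood,
        # independently for every candidate disparity plane d.
        aggregated = np.empty_like(cost_volume)
        for d in range(cost_volume.shape[2]):
            aggregated[:, :, d] = uniform_filter(cost_volume[:, :, d], size=window)
        return aggregated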
2.3.3 Disparity calculation
In the disparity calculation, local and global matching algorithms differ considerably. A local matching algorithm concentrates on the matching-cost calculation and cost-aggregation steps; the final disparity value is then easy to determine and a Winner-Take-All (WTA) strategy is usually adopted. The WTA strategy selects, over the whole disparity search range, the disparity d with the optimal matching cost as the final disparity of the point p to be matched:

d_p = argmin_{d ∈ D} C(p, d)

where d_p is the final disparity of point p and D is the disparity search range.
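A short NumPy sketch of the WTA selection over the aggregated cost volume (same assumed (H, W, D) layout as above):

    import numpy as np

    def winner_take_all(aggregated_costs):
        # For every pixel, pick the candidate disparity with the lowest cost.
        return np.argmin(aggregated_costs, axis=2).astype(np.float32)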
Among these four steps, the disparity calculation is the key step of a global matching algorithm. Most global matching algorithms first construct an energy function and then obtain the final disparity by optimizing that energy function, continually adjusting the disparity values:

E(d) = E_data(d) + λ·E_smooth(d)

where E_data(d) is the data term, which measures how similar two pixels are at disparity d, and E_smooth(d) is the smoothness term, which encodes the smoothness assumption of the global matching algorithm; usually E_smooth(d) only measures the difference between the disparities of neighbouring pixels.
2.3.4 Disparity refinement to obtain the final disparity value
The local and global matching algorithms directly yield integer disparity values within the search range, which is sufficient for applications with low accuracy requirements. For applications that demand higher precision, however, such disparities are often not good enough. Starting from the initial disparity, mathematical methods such as sub-pixel enhancement can refine the disparity further to sub-pixel accuracy. At the same time, mismatched points in the disparity map are removed with methods such as the left-right consistency check and median filtering, yielding more accurate disparity values.
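Two of the refinement steps mentioned above are sketched below under assumed array layouts: parabolic sub-pixel interpolation around the WTA disparity, and a left-right consistency check that invalidates mismatched pixels.

    import numpy as np

    def subpixel_refine(costs, d):
        # Fit a parabola through the costs at d-1, d, d+1 and take its minimum.
        H, W, D = costs.shape
        d = np.clip(d.astype(int), 1, D - 2)
        rows, cols = np.indices((H, W))
        c0 = costs[rows, cols, d - 1]
        c1 = costs[rows, cols, d]
        c2 = costs[rows, cols, d + 1]
        denom = c0 - 2.0 * c1 + c2
        offset = np.where(np.abs(denom) > 1e-6, 0.5 * (c0 - c2) / denom, 0.0)
        return d + np.clip(offset, -0.5, 0.5)

    def left_right_check(disp_l, disp_r, tol=1.0):
        # A left pixel is kept only if the right disparity at the matched column
        # agrees with the left disparity within tol pixels.
        H, W = disp_l.shape
        rows, cols = np.indices((H, W))
        cols_r = np.clip(cols - np.round(disp_l).astype(int), 0, W - 1)
        valid = np.abs(disp_l - disp_r[rows, cols_r]) <= tol
        return np.where(valid, disp_l, np.nan)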
2.4 Obtaining the position of the landmark from the disparity value using the triangulation principle.
Let the space point P(X_c, Y_c, Z_c) project to the image coordinates p_l(u_l, v_l) in the left camera and p_r(u_r, v_r) in the right camera. Since the cameras are in the parallel-alignment configuration, v = v_l = v_r. The triangles P O_l O_r and P p_l p_r are similar, which gives the relation:

(B − (u_l − u_r)) / B = (Z_c − f) / Z_c

which simplifies to:

Z_c = B·f / (u_l − u_r) = B·f / d
where B is the baseline length, i.e. the distance between the optical centres of the two cameras, f is the focal length, and d = u_l − u_r is the disparity. The conversion between the camera coordinates and the image coordinates is:

u = f·X_c / Z_c,  v = f·Y_c / Z_c

from which the coordinate conversion of the parallel-optical-axis vision model is obtained as:

X_c = B·u_l / d,  Y_c = B·v / d,  Z_c = B·f / d
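For illustration, the conversion can be written as a small Python helper; the simplified pinhole model above (no principal-point offset) is assumed.

    def triangulate(u_l, v, disparity, f, B):
        # B: baseline between the optical centres, f: focal length (in pixels),
        # disparity: d = u_l - u_r for the matched pixel pair.
        if disparity <= 0:
            raise ValueError("disparity must be positive")
        Z = B * f / disparity
        X = B * u_l / disparity
        Y = B * v / disparity
        return X, Y, Z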
(3) Performing augmented extended Kalman filtering on the acquired data.
3.1 Prediction process
The state vector and covariance of the mobile robot at the current moment are predicted from the pose vector and covariance of the mobile robot at the previous moment.
3.2 Observation process
Different landmark positions are obtained by observation, and the pose of the mobile robot and the system error-correction parameters are updated using the Kalman gain and the innovation matrix.
3.3 Data association
The observation result is judged: if the observed landmark is a new landmark, the method enters the state-augmentation stage; if the observed landmark is a known landmark, the method proceeds to the update stage.
3.4 State augmentation process
When the mobile robot observes a new landmark, after verification, the position state of the newly observed landmark is added to the system state vector.
3.5 Update stage
The new system state vector and covariance matrix are updated by applying the augmentation model.
(4) Updating the map: the position of the mobile robot is drawn and updated according to the result of the augmented extended Kalman filtering.
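As a sketch only, one iteration of stages 3.1-3.5 could be chained as follows; the function names are hypothetical placeholders for the concrete equations given in the detailed description below.

    def slam_step(state, cov, control, observations, known_landmarks):
        state, cov = predict(state, cov, control)                    # 3.1 prediction
        for z in observations:
            landmark_id = associate(state, cov, z, known_landmarks)  # 3.3 data association
            if landmark_id is None:
                state, cov = augment(state, cov, z)                  # 3.4 state augmentation
            else:
                state, cov = update(state, cov, z, landmark_id)      # 3.2/3.5 observation update
        return state, cov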
The invention has the following advantages:
1. The invention uses a binocular camera to collect images, and the accuracy of image acquisition is high.
2. Stereo matching with a convolutional neural network yields an accurate disparity map and hence more accurate landmark positions, which improves the accuracy of the mobile robot's position estimate.
3. The invention positions the mobile robot using augmented extended Kalman filtering, thereby improving the positioning accuracy.
Drawings
FIG. 1 is a system diagram of a robot positioning system based on binocular vision and a convolutional neural network;
FIG. 2 is a flow chart of a method for a robot positioning system based on binocular vision and convolutional neural networks;
FIG. 3 is a flow chart of stereo matching;
FIG. 4 is a flowchart of a landmark acquisition portion;
FIG. 5 is a partial flow diagram of an augmented extended Kalman filter.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Fig. 1 is a schematic diagram of the robot positioning system based on binocular vision and a convolutional neural network. The robot positioning system based on binocular vision and a convolutional neural network comprises a mobile robot carrying an inertial sensor module, a binocular vision module and an upper computer, each connected to a sensor control module.
The inertial sensor acquires the motion state of the mobile robot; the inertial sensor can be an MPU6050 or an MPU6500, and the inertial sensor module is connected to the sensor control module through a USB data cable. A three-axis gyroscope and a three-axis accelerometer are integrated in the inertial sensor, so when the robot carrying the module moves, the chip detects and outputs the angular velocity and acceleration in real time along the detection axes and directions specified for the sensor. The binocular vision module consists of a binocular camera, which can be a ZED or a Bumblebee2 binocular camera and is connected to the control module through a USB data cable.
The sensor control module uses an STM32 as its core processor and communicates with the computer in real time through an RS232 serial port. The sensor control module controls the operation of the inertial sensor module and the binocular vision module, receives their data, and transmits the data to the upper computer, which processes it to obtain the position of the mobile robot.
The upper computer is a notebook computer, which draws and displays the position of the mobile robot according to the result of the augmented extended Kalman filtering. The mobile robot is a small TurtleBot mobile robot; it is compact and highly flexible and can carry the inertial sensor module, the binocular vision module, the sensor control module and the notebook computer.
The positioning method of the robot positioning system based on binocular vision and a convolutional neural network is illustrated in Fig. 2 (overall flow of the positioning method), Fig. 3 (stereo matching flow), Fig. 4 (landmark acquisition flow) and Fig. 5 (augmented extended Kalman filtering flow), and comprises the following steps:
(1) Prediction
The dynamic model of the mobile robot is:

X_v(k) = f(X_v(k−1), u(k)) = [ x_v(k−1) + dt·v·cos(φ_v(k−1) + γ)
                               y_v(k−1) + dt·v·sin(φ_v(k−1) + γ)
                               φ_v(k−1) + dt·v·sin(γ)/L ]

where f(·) is the dynamic model equation of the mobile robot, dt is the control time step, and L is the wheelbase between the front and rear axles. At time k−1 the pose vector of the mobile robot is X_v(k−1) = [x_v(k−1), y_v(k−1), φ_v(k−1)]^T, and the control vector applied from time k−1 to k is u(k) = [v, γ]^T, where v is the velocity correction parameter and γ is the heading angle.
Then, at time k, the state space and covariance predictions are:
X_v^-(k) = f(X_v(k−1), u(k))
P^-(k) = ∇f_x · P(k−1) · (∇f_x)^T + ∇f_u · Q_k · (∇f_u)^T

where X_v^-(k) is the predicted pose vector of the mobile robot, f(·) is the dynamic model equation of the mobile robot, ∇f_x and ∇f_u are the Jacobians of the motion model with respect to the pose vector X_v of the mobile robot and the control vector u(k), respectively, and Q_k is the error covariance of the process noise, which is assumed to be zero-mean Gaussian noise.
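A minimal sketch of this prediction step is given below, assuming the state vector is laid out as [x, y, phi, landmark coordinates ...]; the Jacobians are written out for the motion model above rather than taken verbatim from the patent.

    import numpy as np

    def predict(x, P, v, gamma, dt, L, Q):
        phi = x[2]
        x = x.copy()
        x[0] += dt * v * np.cos(phi + gamma)          # motion model f(x, u)
        x[1] += dt * v * np.sin(phi + gamma)
        x[2] += dt * v * np.sin(gamma) / L

        # Jacobians of f w.r.t. the robot pose and the control input (v, gamma)
        Fx = np.eye(3)
        Fx[0, 2] = -dt * v * np.sin(phi + gamma)
        Fx[1, 2] = dt * v * np.cos(phi + gamma)
        Fu = np.array([
            [dt * np.cos(phi + gamma), -dt * v * np.sin(phi + gamma)],
            [dt * np.sin(phi + gamma),  dt * v * np.cos(phi + gamma)],
            [dt * np.sin(gamma) / L,    dt * v * np.cos(gamma) / L]])

        # Only the robot-pose block of the covariance changes directly;
        # landmark blocks are propagated through the cross terms.
        P = P.copy()
        P[:3, 3:] = Fx @ P[:3, 3:]
        P[3:, :3] = P[:3, 3:].T
        P[:3, :3] = Fx @ P[:3, :3] @ Fx.T + Fu @ Q @ Fu.T
        return x, P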
(2) Observation
Assuming the landmarks are stationary in the environment, let X_i = (x_i, y_i) be the position of the i-th landmark observed by the mobile robot. The sensors carried on the mobile robot return the distance and the angle between the robot and the landmark, so the observation model can be expressed as:

z_i(k) = h(X(k)) = [ sqrt((x_i − x_v)^2 + (y_i − y_v)^2)
                     arctan((y_i − y_v)/(x_i − x_v)) − φ_v ]
The innovation matrix is:

S(k) = ∇h · P^-(k) · (∇h)^T + R_k

where S(k) is the innovation matrix, ∇h is the Jacobian of the observation model h(·) with respect to the state space X(k), and P^-(k) is the predicted covariance matrix. S(k) is then processed as follows:

S_new(k) = 0.5 · (S(k) + S(k)^T)
S_Chol(k) = chol(S_new(k))

where S_new(k) is the symmetrized innovation matrix and S_Chol(k) denotes the Cholesky decomposition of the newly generated symmetric innovation matrix; this treatment is numerically more stable in the recursive computation than simply using the innovation matrix.
The Kalman gain K_k then becomes:

K_k = P^-(k) · (∇h)^T · S_Chol^(-1)(k) · (S_Chol^(-1)(k))^T

where K_k is the Kalman gain and P^-(k) is the predicted covariance matrix.
The recursive update equation is:
X^+(k) = X^-(k) + K_k · (z(k) − h(X^-(k)))
P^+(k) = (I − K_k · ∇h) · P^-(k)

where ∇h is the Jacobian of the observation model h with respect to the state space X(k), and R_k is the error covariance of the observation noise, which is assumed to be zero-mean Gaussian.
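A minimal sketch of this update step, including the symmetrization and Cholesky handling of the innovation matrix, is given below; h and its Jacobian H are assumed to be supplied by the caller, and SciPy's cho_factor/cho_solve are used instead of forming S_Chol^(-1) explicitly.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def update(x, P, z, h, H, R):
        innovation = z - h(x)                # z(k) - h(X^-(k))
        S = H @ P @ H.T + R                  # innovation matrix
        S = 0.5 * (S + S.T)                  # enforce symmetry

        c, low = cho_factor(S)               # Cholesky factorization of S
        K = cho_solve((c, low), H @ P).T     # Kalman gain K = P H^T S^(-1)

        x_new = x + K @ innovation
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new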
(3) Data association, namely judging an observation result, and if the observed landmark is a new landmark, entering the step (4); if the observed landmark is a known landmark, step (5) is entered.
(4) State augmentation
When the mobile robot observes a new landmark, after verification, the position state of the newly observed landmark needs to be added into the system state vector.
(5) The new system state vector and covariance matrix are updated by applying the augmentation model. The position of a newly observed landmark, X_new(k), can be expressed as:

X_new(k) = g(X_v(k), z(k))

where X_new(k) is the newly observed landmark position and g(·) is the model equation for the new landmark.
Then the system state space containing the new landmark
Figure BDA0001879709310000108
Comprises the following steps:
Figure BDA0001879709310000107
corresponding covariance matrix P * (k) Comprises the following steps:
Figure BDA0001879709310000111
wherein:
Figure BDA0001879709310000112
in the formula:
Figure BDA0001879709310000113
is g (-) a Jacobian determinant on the pose vector and the observation vector of the mobile robot.
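A minimal sketch of the augmentation step, assuming a range-bearing observation z = (r, θ) and the state layout used in the sketches above, is:

    import numpy as np

    def augment(x, P, z, R):
        r, theta = z
        xv, yv, phi = x[0], x[1], x[2]

        # Inverse observation model g(X_v, z): initial position of the new landmark
        lx = xv + r * np.cos(phi + theta)
        ly = yv + r * np.sin(phi + theta)
        x_new = np.concatenate([x, [lx, ly]])

        n = len(x)
        # Jacobian of g w.r.t. the full state (non-zero only in the pose block)
        Gx = np.zeros((2, n))
        Gx[:, :3] = np.array([[1.0, 0.0, -r * np.sin(phi + theta)],
                              [0.0, 1.0,  r * np.cos(phi + theta)]])
        # Jacobian of g w.r.t. the observation (r, theta)
        Gz = np.array([[np.cos(phi + theta), -r * np.sin(phi + theta)],
                       [np.sin(phi + theta),  r * np.cos(phi + theta)]])

        P_new = np.zeros((n + 2, n + 2))
        P_new[:n, :n] = P
        P_new[n:, :n] = Gx @ P
        P_new[:n, n:] = (Gx @ P).T
        P_new[n:, n:] = Gx @ P @ Gx.T + Gz @ R @ Gz.T
        return x_new, P_new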
While the invention has been described with reference to a specific embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A positioning method of a robot positioning system based on binocular vision and a convolutional neural network, the positioning system comprising a mobile robot provided with an inertial sensor module, a binocular vision module and an upper computer which are respectively connected with a sensor control module; the method is characterized by comprising the following steps:
(1) acquiring data through an inertial sensor and a binocular vision sensor, acquiring the acceleration and the angular velocity of the mobile robot through the inertial sensor, and acquiring images around the mobile robot through the binocular vision sensor;
(2) stereo matching calculation is carried out through the acquired image by utilizing a convolutional neural network to obtain the position of the landmark;
(3) according to the motion state of the mobile robot obtained by the inertial sensor and the obtained landmark position, carrying out augmentation Kalman filtering processing to obtain the position of the mobile robot;
(4) drawing a motion map according to the position change of the mobile robot;
the step (2) of obtaining the position of the landmark by stereo matching calculation of the acquired image by using a convolutional neural network comprises the following specific steps:
(1) acquiring an environment image around the robot by using a binocular camera;
(2) correcting the binocular images, and re-projecting the image pairs acquired by the binocular camera to enable the image pairs to meet a parallel alignment state;
(3) stereo matching is carried out on the collected images by using a convolutional neural network to obtain a parallax value;
(4) calculating to obtain the position of the landmark according to the parallax value and the triangulation principle;
the specific method for carrying out stereo matching on the acquired image by utilizing the convolutional neural network comprises the following steps:
(1) calculating parallax matching cost of the binocular images by using a convolutional neural network;
(2) cost aggregation, namely aggregating the matching cost in a support window by a summing and averaging method to obtain the accumulated cost of a point on the image at a specific parallax;
(3) calculating the parallax, selecting a point with the optimal accumulated cost in the parallax search range as a corresponding matching point, wherein the parallax corresponding to the point is the required parallax;
(4) and (5) refining the parallax, and optimizing the obtained parallax image to obtain a final parallax value.
2. The positioning method according to claim 1, wherein the step (3) of performing extended Kalman filter processing according to the acquired data includes:
(1) predicting the state matrix and the covariance matrix of the robot at the current moment through the state matrix and the covariance matrix of the robot at the previous moment;
(2) in the observation process, the landmark of the surrounding environment is observed, and the state of the robot and the system error correction parameters are updated through Kalman gain and the innovation matrix;
(3) data association, namely judging the relationship between the observation landmark and the known landmark according to the observation result, if the observation landmark is a new landmark, entering the step (4), and if the observation landmark is the known landmark, entering the step (5);
(4) in the state augmentation process, if a new landmark is observed, the position of the observed new landmark is added into a system state vector;
(5) and updating the new system state matrix and covariance if the observed landmark is a known landmark.
3. The positioning method according to claim 1, wherein the binocular vision module is composed of binocular stereo cameras; the inertial sensor module consists of an inertial sensor; the sensor control module consists of a single chip microcomputer; the upper computer is a notebook computer.
CN201811416964.9A 2018-11-26 2018-11-26 Robot positioning system and method based on binocular vision and convolutional neural network Active CN109579825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811416964.9A CN109579825B (en) 2018-11-26 2018-11-26 Robot positioning system and method based on binocular vision and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811416964.9A CN109579825B (en) 2018-11-26 2018-11-26 Robot positioning system and method based on binocular vision and convolutional neural network

Publications (2)

Publication Number Publication Date
CN109579825A CN109579825A (en) 2019-04-05
CN109579825B true CN109579825B (en) 2022-08-19

Family

ID=65924557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811416964.9A Active CN109579825B (en) 2018-11-26 2018-11-26 Robot positioning system and method based on binocular vision and convolutional neural network

Country Status (1)

Country Link
CN (1) CN109579825B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110118556A (en) * 2019-04-12 2019-08-13 浙江工业大学 A kind of robot localization method and device based on covariance mixing together SLAM
CN110189366B (en) * 2019-04-17 2021-07-06 北京迈格威科技有限公司 Laser coarse registration method and device, mobile terminal and storage medium
CN110519582A (en) * 2019-08-16 2019-11-29 哈尔滨工程大学 A kind of crusing robot data collection system and collecting method
US11531107B2 (en) * 2019-11-19 2022-12-20 Volvo Car Corporation Long range LIDAR-based speed estimation
CN111174781B (en) * 2019-12-31 2022-03-04 同济大学 Inertial navigation positioning method based on wearable device combined target detection
CN111325794B (en) * 2020-02-23 2023-05-26 哈尔滨工业大学 Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN111504331B (en) * 2020-04-29 2021-09-14 杭州环峻科技有限公司 Method and device for positioning panoramic intelligent vehicle from coarse to fine
CN111913484B (en) * 2020-07-30 2022-08-12 杭州电子科技大学 Path planning method of transformer substation inspection robot in unknown environment
CN114862955B (en) * 2022-07-07 2022-09-02 诺伯特智能装备(山东)有限公司 Rapid visual positioning method for industrial robot

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526744B2 (en) * 2016-07-09 2022-12-13 Doxel, Inc. Monitoring construction of a structure

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102042835A (en) * 2010-11-05 2011-05-04 中国海洋大学 Autonomous underwater vehicle combined navigation system
CN102288176A (en) * 2011-07-07 2011-12-21 中国矿业大学(北京) Coal mine disaster relief robot navigation system based on information integration and method
CN103983263A (en) * 2014-05-30 2014-08-13 东南大学 Inertia/visual integrated navigation method adopting iterated extended Kalman filter and neural network
CN108648161A (en) * 2018-05-16 2018-10-12 江苏科技大学 The binocular vision obstacle detection system and method for asymmetric nuclear convolutional neural networks
CN108734143A (en) * 2018-05-28 2018-11-02 江苏迪伦智能科技有限公司 A kind of transmission line of electricity online test method based on binocular vision of crusing robot

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Deep Convolutional Neural Networks for pedestrian detection";Tomè, D.1 等;《Signal Processing: Image Communication》;20161231;第47卷(第7期);正文第482-489页 *
卷积神经网络双目视觉路面障碍物检测;胡颖等;《计算机工程与设计》;20181016(第10期);正文第3278-3282页 *

Also Published As

Publication number Publication date
CN109579825A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109579825B (en) Robot positioning system and method based on binocular vision and convolutional neural network
CN110125928B (en) Binocular inertial navigation SLAM system for performing feature matching based on front and rear frames
EP2833322B1 (en) Stereo-motion method of three-dimensional (3-D) structure information extraction from a video for fusion with 3-D point cloud data
CN107590827A (en) A kind of indoor mobile robot vision SLAM methods based on Kinect
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
CN109029433A (en) Join outside the calibration of view-based access control model and inertial navigation fusion SLAM on a kind of mobile platform and the method for timing
US20170161901A1 (en) System and Method for Hybrid Simultaneous Localization and Mapping of 2D and 3D Data Acquired by Sensors from a 3D Scene
KR100855657B1 (en) System for estimating self-position of the mobile robot using monocular zoom-camara and method therefor
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN106595659A (en) Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
CN107677274B (en) Unmanned plane independent landing navigation information real-time resolving method based on binocular vision
US11062475B2 (en) Location estimating apparatus and method, learning apparatus and method, and computer program products
CN110322507B (en) Depth reprojection and space consistency feature matching based method
CN104281148A (en) Mobile robot autonomous navigation method based on binocular stereoscopic vision
CN111998862A (en) Dense binocular SLAM method based on BNN
Alcantarilla et al. Large-scale dense 3D reconstruction from stereo imagery
Xian et al. Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach
Yong-guo et al. The navigation of mobile robot based on stereo vision
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN116128966A (en) Semantic positioning method based on environmental object
Aggarwal Machine vision based SelfPosition estimation of mobile robots
CN115578417A (en) Monocular vision inertial odometer method based on feature point depth
Billy et al. Adaptive SLAM with synthetic stereo dataset generation for real-time dense 3D reconstruction
CN113554754A (en) Indoor positioning method based on computer vision
Chen et al. Binocular vision localization based on vision SLAM system with multi-sensor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant