CN111612823A

CN111612823A - Robot autonomous tracking method based on vision

Info

Publication number: CN111612823A
Application number: CN202010436066.0A
Authority: CN
Inventors: 范江波; 郑昆; 徐云水; 赵泽彪; 邱平; 李锐
Original assignee: Zhaotong Power Supply Bureau of Yunnan Power Grid Co Ltd
Current assignee: Zhaotong Power Supply Bureau of Yunnan Power Grid Co Ltd
Priority date: 2020-05-21
Filing date: 2020-05-21
Publication date: 2020-09-01

Abstract

The invention discloses a vision-based robot autonomous tracking method and belongs to the technical field of automatic robots. The method uses a GOTURN tracking model to determine the target position and a SegNet-based scene segmentation model to perceive the surrounding environment, so that a path avoiding obstacles can be planned. The system establishes a mapping between pixel coordinates and two-dimensional plane coordinates through parallel projection, constructs a local grid map in polar coordinates by fusing visual positioning with a mathematical model of the environment, and finally plans an optimal trajectory between the robot and the target with the A* algorithm. The method achieves stronger tracking capability and a longer tracking distance in indoor and corridor environments.

Description

Robot autonomous tracking method based on vision

Technical Field

The invention relates to a vision-based robot autonomous tracking method, and belongs to the technical field of automatic robots.

Background

In recent studies, Sidenbladh et al. used skin-color segmentation to detect a target person in an RGB image and track the person's head, but the detection result is susceptible to ambient lighting, target motion and other factors, which degrades system performance. Adiwahono et al. proposed a flexible leg-feature detection method, but it is sensitive to changes in human pose. Gockley et al. used a standard particle filtering algorithm to segment laser scanner data and track potential targets, and designed two tracking schemes, direction tracking and path tracking; however, the system lacked recognizable visual information and its effective range was less than 3.5 m. Wang et al. used three-dimensional reconstruction based on a binocular camera to obtain the relative position between the robot and the person, and kept the distance between the person and the robot between 1 m and 2.5 m in order to obtain sufficient reconstruction accuracy. These conventional methods concentrate on tracking the target and its trajectory more accurately but do not employ path planning, which means that the robot can only track nearby targets.

Disclosure of Invention

The invention provides a vision-based robot autonomous tracking method.

The technical scheme of the invention is as follows. In a vision-based robot autonomous tracking method, a tracking target is first selected manually, and the input of the CNN is set to the target and its surrounding background area; the environment is divided into different categories according to the training data set; a local map is established in polar coordinates with the robot at the center of the polar coordinate system, a conversion relation from pixel coordinate points to plane coordinate points to grid coordinates is established, and the pixels falling in each grid are counted to decide whether the grid is an obstacle; the starting grid coordinate of the robot is set, the target position is obtained through the target tracking algorithm, and the optimal path search is finally completed with the A* algorithm.

The method comprises the following specific steps:

Step1, manually selecting the target and setting the CNN input to the target area and the background area around the target: if the tracked object in frame t-1 has a bounding box centered at (C_x, C_y) with width w and height h, then at the next time t a region centered at the frame t-1 center (C_x, C_y), with width k₁w and height k₁h, is extracted as the search area; wherein C_x and C_y are the coordinates of the center point, t is the time, k₁w and k₁h are the width and height of the background region, and k₁ is a coefficient;

Step2, after the target has been found, segmenting the scene into different categories, such as people, places, trees or other categories, according to the training data set;

Step3, after the scene segmentation model is obtained, establishing a mapping between the image coordinates of the target and the actual plane coordinates through a visual positioning algorithm: the camera coordinate system {C} is taken as the reference coordinate system; a plane π exists in space, and three-dimensional points p_i = [x_i, y_i, z_i, 1]^T, i = 1, …, n, lie on the plane, where n is the total number of three-dimensional points; the normalized projection coordinates of p_i and the homogeneous image coordinates of p_i are derived from the pinhole (aperture) imaging model;

Step4, establishing a local map in polar coordinates centered on the robot:

the polar coordinates are expressed in the form (r, t); the polar radius and the polar angle of each grid are obtained according to the following formulas:

R_c = C_res · r₁, r₁ ∈ (1, R_dim)

C_const = R_dim · C_res

wherein x and y represent the abscissa and ordinate of point p_i in the reference coordinate system; R_c represents the radial length from a grid in the linear region to the coordinate origin, also called the grid radius; correspondingly, R_n is the radial length of a grid in the non-linear region; r₁ and r₂ are the radial index numbers of a grid; C_res represents the length of a single grid in the linear region; h_cam represents the distance from the optical center of the camera to the ground; hR_min represents the distance from the top of the camera's field of view to the ground at 1.8 m; R_dim denotes the number of grids in the linear region; hR_dim denotes the number of grids in the non-linear region; T_dim denotes the number of parts into which the space around the robot is divided in the angular dimension, each grid occupying 2 degrees;

the obtained polar radius r_p is substituted into the formula to obtain the grid coordinate in the radial direction, i.e. the polar radius; t_p is the coordinate in the angular direction, i.e. the polar angle; when the ratio λ of the pixels marked as obstacles in a grid to all pixels in that grid is greater than a threshold, the grid is an obstacle, otherwise it is flat ground;

Step5, using the A* algorithm to complete the optimal path search: during path searching, the starting point is set at the origin of the polar coordinates and a polar angle of 90 degrees is set as the initial orientation of the robot; the target position is obtained through the autonomous tracking algorithm, and the optimal path search is finally completed with the A* algorithm.

In Step2, SegNet is used to perform scene segmentation.

The invention has the following beneficial effects: vision-based robot tracking is one of the key technologies of intelligent mobile robots and has broad application prospects in fields such as museum guidance, pedestrian tracking and hospital assistance. The method uses a GOTURN tracking model to determine the target position and a SegNet-based scene segmentation model to perceive the surrounding environment, so that a path avoiding obstacles can be planned. The system establishes a mapping between pixel coordinates and two-dimensional plane coordinates through parallel projection, constructs a local grid map in polar coordinates by fusing visual positioning with a mathematical model of the environment, and finally plans an optimal trajectory between the robot and the target with the A* algorithm. The method achieves stronger tracking capability and a longer tracking distance in indoor and corridor environments.

Drawings

FIG. 1 is a schematic diagram of a CNN target tracking model;

FIG. 2 is a diagram of a scene segmentation network architecture;

FIG. 3 is a perspective projection view;

FIG. 4 is a schematic diagram of dividing a vertical grid;

FIG. 5 is a block diagram of a grid map;

FIG. 6 is a target occlusion test chart;

FIG. 7 shows the robot indoor following experiment.

Detailed Description

Example 1: as shown in FIGS. 1-7, in a vision-based robot autonomous tracking method, a tracking target is first selected manually, and the input of the CNN is set to the target and its surrounding background area; the environment is divided into different categories according to the training data set; a local map is established in polar coordinates with the robot at the center of the polar coordinate system, a conversion relation from pixel coordinate points to plane coordinate points to grid coordinates is established, and the pixels falling in each grid are counted to decide whether the grid is an obstacle; the starting grid coordinate of the robot is set, the target position is obtained through the target tracking algorithm, and the optimal path search is finally completed with the A* algorithm. The method selects a GOTURN-like tracking model. As shown in FIG. 1, a CNN model extracts basic features from the current frame and from the previous frame; the image features are then fused and fed into three fully connected layers, which estimate the relative motion of the target by comparing the target features of the previous frame with the features of the current image; the last layer outputs the coordinates of the bounding box of the tracked target.

Further, the specific steps of the method may be set as follows:

Step1, manually selecting the target and setting the CNN input to the target area and the background area around the target: if the tracked object in frame t-1 has a bounding box centered at (C_x, C_y) with width w and height h, then at the next time t a region centered at the frame t-1 center (C_x, C_y), with width k₁w and height k₁h, is extracted as the search area; wherein C_x and C_y are the coordinates of the center point, t is the time, k₁w and k₁h are the width and height of the background region, and k₁ is a coefficient;
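The search-area rule in Step1 can be illustrated with a minimal sketch. The function name `search_region`, the value k₁ = 2, the image size and the clipping of the region to the image border are all assumptions made for illustration; the source does not state how k₁ is chosen or how regions that leave the image are handled.

```python
def search_region(cx, cy, w, h, k1, image_w, image_h):
    """Return (x0, y0, x1, y1) of the search area for frame t.

    The area is centered at the frame t-1 bounding-box center (cx, cy)
    and has width k1 * w and height k1 * h. Clipping to the image
    border is an assumption, not something the source specifies.
    """
    half_w = k1 * w / 2.0
    half_h = k1 * h / 2.0
    x0 = max(0.0, cx - half_w)
    y0 = max(0.0, cy - half_h)
    x1 = min(float(image_w), cx + half_w)
    y1 = min(float(image_h), cy + half_h)
    return x0, y0, x1, y1


# Hypothetical example: a 40 x 20 box centered at (100, 80) in a 640 x 480 image.
region = search_region(100.0, 80.0, 40.0, 20.0, k1=2.0, image_w=640, image_h=480)
clipped = search_region(10.0, 10.0, 40.0, 20.0, k1=2.0, image_w=640, image_h=480)
```

With these example values the first region spans 80 × 40 pixels around the previous center, while the second is cut off at the top-left corner of the image.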

Step2, after the target has been found, segmenting the scene into different categories, such as people, places, trees or other categories, according to the training data set. To improve model performance and robustness, 2200 additional images were captured in the test environment, extracted from a video sequence lasting about 1 hour. The images were annotated with LabelMe to obtain their labels: the floor is marked 1 and obstacles are marked 0. 2000 images in the data set were randomly selected as the training set, and the remaining 200 images were used as the test set. The initial learning rate was 0.0001, and the model was trained on the PyTorch platform.
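The random 2000/200 split described in Step2 could be reproduced along the following lines. This is only a sketch: the file names, the helper `split_dataset` and the fixed seed are hypothetical, and the source does not say how the random selection was actually performed.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of items and return (training set, test set)."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 2200 annotated images.
image_names = [f"frame_{i:04d}.png" for i in range(2200)]
train_set, test_set = split_dataset(image_names, n_train=2000)
```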

Step3, after the scene segmentation model is obtained, establishing a mapping between the image coordinates of the target and the actual plane coordinates through a visual positioning algorithm: the camera coordinate system {C} is taken as the reference coordinate system; a plane π exists in space, and three-dimensional points p_i = [x_i, y_i, z_i, 1]^T, i = 1, …, n, lie on the plane, where n is the total number of three-dimensional points; the normalized projection coordinates of p_i and the homogeneous image coordinates of p_i are derived from the pinhole (aperture) imaging model; specifically:

using the camera intrinsic parameters, the normalized projection coordinates of a point in space and the homogeneous image coordinates of p_i can be converted into each other.

In the coordinate system {A}, the view plane can be expressed as Z = 0, or in projective-geometry form as π^A = [(n^A)^T d^A]^T = [0 0 1 0]^T, where n^A denotes the normal vector of the plane π^A in the coordinate system {A} and d^A denotes the distance from the origin of the coordinate system {A} to the plane π^A. If the camera coordinate system {C} is taken as the reference coordinate system, the view plane Z = 0 can be expressed as π^c = [(n^c)^T d^c]^T, wherein:

In FIG. 3, the camera optical center A and the normalized projection coordinate m define a space line L, and m is written in homogeneous coordinate form as:

B = [m^T 1]^T

The Plücker matrix of the line L is L = AB^T - BA^T. Computing the intersection of the line L with the plane π^c gives the projection coordinate P of the two-dimensional projection coordinate m on the plane π^c:

The coordinates in the camera coordinate system are then converted to the ground coordinate system {A} by a homogeneous coordinate transformation:

After the coordinates of point P are found, P^A can be calculated. This step establishes the mapping between image coordinates and actual plane coordinates;
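The line-plane intersection used in Step3 can be sketched as follows. The sketch relies on the standard projective-geometry result that a line with Plücker matrix L meets a plane π at the homogeneous point Lπ. Everything numeric is an assumption made for illustration: the intrinsics (focal length 500 px, principal point (320, 240)), a camera frame with x to the right, y downward and z forward, and a horizontal optical axis, so that the ground lies at y = 0.72 m (the h_cam value given in Step4). The final transformation into the ground coordinate system {A} is not shown.

```python
def normalized_coords(u, v, fx, fy, cx, cy):
    """Pixel (u, v) -> normalized projection coordinate m = [x, y, 1] (pinhole model)."""
    return [(u - cx) / fx, (v - cy) / fy, 1.0]


def plucker_matrix(a, b):
    """4x4 Plucker matrix L = A B^T - B A^T of the line through homogeneous points A and B."""
    return [[a[i] * b[j] - b[i] * a[j] for j in range(4)] for i in range(4)]


def line_plane_intersection(line, plane):
    """Intersect a Plucker line with the plane [n^T d]^T; returns a 3D point."""
    x = [sum(line[i][j] * plane[j] for j in range(4)) for i in range(4)]
    if abs(x[3]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    return [x[k] / x[3] for k in range(3)]


# Assumed intrinsics and ground plane n.X + d = 0 in the camera frame {C}.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
GROUND_PLANE_C = [0.0, 1.0, 0.0, -0.72]

optical_center = [0.0, 0.0, 0.0, 1.0]      # A: origin of {C} in homogeneous form
m = normalized_coords(320.0, 440.0, FX, FY, CX, CY)
b = m + [1.0]                              # B = [m^T 1]^T
ground_point = line_plane_intersection(plucker_matrix(optical_center, b), GROUND_PLANE_C)
```

Under these assumptions the pixel (320, 440) maps to a ground point about 1.8 m straight ahead of the camera, which coincides with the outer edge of the linear grid region.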

Step4, the grid map is the most widely used map form in the field of path planning, but a conventional grid map is not well suited to robot navigation because angle and distance are usually discretized; therefore a local map in polar coordinates centered on the robot is established.

As shown in FIG. 4, the area within 1.8 m of the camera is divided into 9 grids, each 0.2 m long. Beyond 1.8 m, the grid length increases non-linearly, so the grid map is divided into a linear interval and a non-linear interval. The radius of each grid is determined by the following system of equations:

R_c = C_res · r₁, r₁ ∈ (1, R_dim)

C_const = R_dim · C_res

wherein R_c represents the radial length from a grid in the linear region to the coordinate origin, also called the grid radius; correspondingly, R_n is the radial length of a grid in the non-linear region; r₁ and r₂ are the radial index numbers of a grid; C_res = 0.2 m represents the length of a single grid in the linear region; h_cam = 0.72 m denotes the distance from the optical center of the camera to the ground; hR_min represents the distance from the top of the camera's field of view to the ground at 1.8 m; R_dim = 9 denotes the number of grids in the linear region; hR_dim denotes the number of grids in the non-linear region; T_dim = 180 denotes that one full turn around the robot is divided into 180 parts in the angular dimension, i.e. each grid occupies 2 degrees; C_const indicates the length of the linear region; and Δh indicates 1/10 of the distance from the camera's optical center to the ground.
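As a worked example of the two linear-region formulas, the sketch below evaluates R_c = C_res · r₁ and C_const = R_dim · C_res with the stated values C_res = 0.2 m and R_dim = 9. Treating r₁ as the integers 1 to R_dim inclusive is an assumption about the range notation, and the non-linear region is left out because its formula is not reproduced in the text.

```python
C_RES = 0.2   # length of one linear-region grid, in meters (from the text)
R_DIM = 9     # number of linear-region grids (from the text)


def linear_grid_radius(r1, c_res=C_RES, r_dim=R_DIM):
    """Grid radius R_c = C_res * r1 for a radial index r1 in the linear region."""
    if not 1 <= r1 <= r_dim:
        raise ValueError("r1 is outside the linear region")
    return c_res * r1


linear_radii = [linear_grid_radius(r1) for r1 in range(1, R_DIM + 1)]
c_const = R_DIM * C_RES   # total length of the linear region
```

The outermost linear radius equals C_const, about 1.8 m, which matches the 1.8 m boundary between the linear and non-linear intervals.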

wherein x and y represent the abscissa and ordinate of point p_i in the reference coordinate system. The obtained polar radius r_p is substituted into equation (2) to obtain the grid coordinate in the radial direction, i.e. the polar radius; t_p is the coordinate in the angular direction, i.e. the polar angle. This establishes the conversion relation pixel coordinate point -> plane coordinate point -> grid coordinate, and the mapping between grid coordinates and pixel coordinates is stored. When the ratio λ of the pixels marked as obstacles in a grid to all pixels in that grid is greater than 0.1, the grid is an obstacle; otherwise it is flat ground. Therefore, each time the robot receives a frame of image, the local map can be constructed through scene segmentation and coordinate mapping.
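The obstacle test on each grid can be sketched as below, assuming the stored grid-to-pixel mapping is available as a dictionary from grid coordinates to lists of (row, column) pixel positions. The names `classify_grids` and `grid_to_pixels` are hypothetical, and skipping grids that contain no pixels is an assumption. The label convention (floor 1, obstacle 0) and the 0.1 threshold come from the text.

```python
OBSTACLE_LABEL = 0   # segmentation label for obstacles (from the text)
FLOOR_LABEL = 1      # segmentation label for the floor (from the text)


def classify_grids(label_image, grid_to_pixels, threshold=0.1):
    """Return {grid: True if obstacle, False if flat ground}.

    A grid is an obstacle when the ratio of obstacle pixels to all
    pixels mapped to it exceeds the threshold.
    """
    occupancy = {}
    for grid, pixels in grid_to_pixels.items():
        if not pixels:
            continue  # assumption: grids with no mapped pixels are left undecided
        obstacle_count = sum(
            1 for row, col in pixels if label_image[row][col] == OBSTACLE_LABEL
        )
        occupancy[grid] = obstacle_count / len(pixels) > threshold
    return occupancy


# Tiny hypothetical segmentation result and grid-to-pixel mapping.
label_image = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
grid_to_pixels = {
    (0, 45): [(0, 0), (0, 1), (1, 0), (1, 1)],
    (0, 46): [(0, 2), (0, 3), (1, 2), (1, 3)],
}
occupancy = classify_grids(label_image, grid_to_pixels)
```

In this example grid (0, 45) contains only floor pixels and is flat ground, while half of the pixels in grid (0, 46) are obstacle pixels, so it is marked as an obstacle.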

Step5, using the A* algorithm to complete the optimal path search: as shown in FIG. 5, a polar grid map with dimensions 16 × 90 is constructed. During path searching, the starting point is set at the origin of the polar coordinates and a polar angle of 90 degrees is set as the initial orientation of the robot, so the starting grid coordinate of the robot is (0, 45); the target position is obtained through the autonomous tracking algorithm, and the optimal path search is finally completed with the A* algorithm.
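A minimal A* search over the 16 × 90 grid might look like the sketch below. The source does not specify the neighborhood, the step cost or the heuristic, so 4-connectivity, a unit cost per step in grid-index space and a Manhattan-distance heuristic are assumptions; in particular, unit costs ignore the real metric size of polar grids. The goal cell and the obstacle position are hypothetical.

```python
import heapq


def astar(grid, start, goal):
    """A* on a grid of booleans (True = obstacle). Returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for 4-connected moves with unit cost.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry; a cheaper route was already expanded
        for dr, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dt)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# 16 x 90 polar grid (radial index, angular index); start cell from the text.
free_map = [[False] * 90 for _ in range(16)]
straight_path = astar(free_map, (0, 45), (10, 45))

blocked_map = [[False] * 90 for _ in range(16)]
blocked_map[5][45] = True                  # hypothetical obstacle on the straight line
detour_path = astar(blocked_map, (0, 45), (10, 45))
```

With an empty map the path runs straight along angular index 45, consistent with the straight-line trajectory described for a target directly ahead; with the obstacle in place the path sidesteps by one angular cell and is two steps longer.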

In Step2, SegNet is used to perform scene segmentation.

To test the robustness of the robot to target occlusion, a pedestrian walks back and forth in front of the tracked target while the robot is moving forward normally, simulating a situation in which the target is occluded in practice. FIG. 6 shows that when the target is temporarily occluded, the tracker is not easily disturbed by the occluding object, and after the occlusion disappears the robot can still track the target pedestrian stably. Since the target is located directly in front of the robot, the planned path is a straight-line trajectory pointing directly at the target. When the target is lost or occluded for a long time, the robot enters an emergency stop state to protect itself.

The robot's target-following task is to move toward the target's position until a certain safe distance from the target is reached. The laboratory area is about 8 m × 10 m and contains pedestrians, tables and chairs, garbage cans, sofas and other objects, so the indoor environment is relatively complex. The experimental results are shown in FIG. 7, which shows path generation, scene segmentation and the grid map, respectively. Because of the robot's volume, the planned path needs to be fine-tuned manually. There are two ways to deal with this problem. The first is to artificially enlarge the boundaries of the obstacles so that the segmentation map accounts for the size of the robot, which effectively avoids collisions with obstacles while keeping the model of the robot itself unchanged. The second is to manually add an offset term to the planned path, where the number of offset grids is determined by the size of the grids and their distance. The second strategy is adopted because it is easier to implement, so the planned path shown is the path with the offset added. The experimental results also show that the robot can move stably to the target position while effectively avoiding obstacles, and the indoor experimental results preliminarily meet the system requirements.

While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims

1. A vision-based robot autonomous tracking method, characterized in that: a tracking target is first selected manually, and the input of the CNN is set to the target and its surrounding background area; the environment is divided into different categories according to the training data set; a local map is established in polar coordinates with the robot at the center of the polar coordinate system, a conversion relation from pixel coordinate points to plane coordinate points to grid coordinates is established, and the pixels falling in each grid are counted to decide whether the grid is an obstacle; the starting grid coordinate of the robot is set, the target position is obtained through the target tracking algorithm, and the optimal path search is finally completed with the A* algorithm.

2. The vision-based autonomous robot tracking method according to claim 1, characterized in that: the method comprises the following specific steps:

Step1, manually selecting the target and setting the CNN input to the target area and the background area around the target: if the tracked object in frame t-1 has a bounding box centered at (C_x, C_y) with width w and height h, then at the next time t a region centered at the frame t-1 center (C_x, C_y), with width k₁w and height k₁h, is extracted as the search area; wherein C_x and C_y are the coordinates of the center point, t is the time, k₁w and k₁h are the width and height of the background region, and k₁ is a coefficient;

Step3, after the scene segmentation model is obtained, establishing a mapping between the image coordinates of the target and the actual plane coordinates through a visual positioning algorithm: the camera coordinate system {C} is taken as the reference coordinate system; a plane π exists in space, and three-dimensional points p_i = [x_i, y_i, z_i, 1]^T, i = 1, …, n, lie on the plane, where n is the total number of three-dimensional points; the normalized projection coordinates of p_i and the homogeneous image coordinates of p_i are derived from the pinhole (aperture) imaging model;

Step4, establishing a local map in polar coordinates centered on the robot:

R_c = C_res · r₁, r₁ ∈ (1, R_dim)

C_const = R_dim · C_res

Step5, using the A* algorithm to complete the optimal path search: during path searching, the starting point is set at the origin of the polar coordinates and a polar angle of 90 degrees is set as the initial orientation of the robot; the target position is obtained through the target tracking algorithm, and the optimal path search is finally completed with the A* algorithm.

3. The vision-based autonomous robot tracking method according to claim 2, characterized in that: in Step2, SegNet is used to perform scene segmentation.