CN113343875A - Driving region sensing method for robot - Google Patents

Driving region sensing method for robot

Info

Publication number
CN113343875A
Authority
CN
China
Prior art keywords
plane
model
area
points
travelable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110677857.7A
Other languages
Chinese (zh)
Inventor
程敏
李奇
罗作煌
徐伟
郭永春
张淼
沈林鹏
杨顺
尹帅
唐海健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yijiahe Technology R & D Co ltd
Original Assignee
Shenzhen Yijiahe Technology R & D Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yijiahe Technology R & D Co ltd
Priority to CN202110677857.7A
Publication of CN113343875A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30256Lane; Road marking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a travelable area sensing method for a robot, comprising the following steps: step one, the robot collects a road surface depth image and an RGB image through an RGB-D camera and aligns the depth image with the RGB image; step two, the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned using step one; step three, the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area. Based on a deep learning model, the invention takes an RGB image and a depth map as input and outputs a travelable region; under normal conditions the travelable area perception technology works together with the positioning system to enhance the stability of the system.

Description

Driving region sensing method for robot
Technical Field
The invention relates to the technical field of robots, in particular to a drivable area sensing method for a robot.
Background
Conventional positioning algorithms typically perform combined navigation or SLAM positioning based on GNSS signals, lidar, and similar sensors. However, GNSS signals are generally only usable in open scenes, and positioning performance can degrade drastically under occlusion and multipath. Lidar is applicable to a wider range of scenes, but is prone to positioning loss in open, dynamic, or long-corridor environments.
The positioning system is the basis on which the inspection robot can correctly execute its other tasks, and a stable and reliable travelable area perception technology provides a solid guarantee that the whole robot system executes commands correctly and effectively. However, when the positioning system suffers problems such as positioning loss or positioning drift, the inspection robot may follow a wrong trajectory, so that all subsequent logic processing goes wrong; in severe cases the robot may tip over or collide with people.
Disclosure of Invention
Purpose of the invention: aiming at the above defects, the invention provides a drivable area sensing method for a robot that, based on a deep learning model, takes an RGB image and a depth map as input and outputs a drivable area.
The technical scheme is as follows:
a travelable region perception method for a robot, comprising the steps of:
Step one: the robot collects a road surface depth image and an RGB image, and the depth image and the RGB image are aligned;
Step two: the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned via step one;
Step three: the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area.
In step one, the robot collects the road surface depth image and the RGB image through an RGB-D camera; alternatively, the RGB image is collected by an RGB camera and the road surface depth image is collected by a depth camera or a lidar.
Before step three, whether the two travelable areas coincide is judged from their intersection-over-union; if they coincide, go to step three; if they do not coincide, the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized to obtain a plane detection error value, a road surface segmentation error value is calculated from the confidence output by the road surface segmentation model, the two error values are compared, and the fusion in step three is performed with the travelable area having the smaller error value as the reference.
The road surface segmentation model in step two is trained as follows:
(11) the robot collects RGB images under different working conditions, in different time periods and at different positions through a camera mounted on the robot, and the road surface in the RGB images is annotated;
(12) a neural network is trained on the images obtained in step (11) to obtain an annotation model, and the collected RGB images are annotated with the annotation model;
(13) a road surface segmentation model is designed; the parameters of the annotation model obtained in step (12) are taken as the initial parameters of the road surface segmentation model by a transfer learning method, and the model is trained on the annotated RGB images by a distillation learning method to obtain the road surface segmentation model.
The designed road surface segmentation model specifically uses depthwise convolution, pointwise convolution, channel shuffle or Inception-style convolution structures with small 3×3 convolution kernels; in the network structure, a detail branch uses wide channels and shallow convolutions, a semantic branch uses narrow channels and deep convolutions, and a fusion layer uses an attention mechanism to fuse the two levels of information of different dimensionality.
In step two, processing the RGB image with the pre-trained road surface segmentation model to obtain the travelable area specifically comprises:
(21) inputting the RGB image into the trained road surface segmentation model to obtain a road surface mask of the travelable area;
(22) down-sampling the road surface mask obtained in step (21) to obtain a number of discrete points on the road surface mask contour;
(23) filtering the discrete points on the road surface mask contour obtained after the down-sampling in step (22) with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area.
After the travelable area is obtained in step two, the area change of the travelable area between consecutive frames is detected in real time and compared with a preset area change threshold; when the area change between the previous and current frames is larger than the preset threshold, a false detection is judged to be present and the current frame image is fed back into training of the previously trained road surface segmentation model.
In step two, performing plane detection on the depth image to obtain the travelable area specifically comprises:
(31) converting the depth image into a 3D point cloud, and assuming a plane model Ax + By + Cz + D = 0;
(32) randomly selecting three points from the 3D point cloud to construct a plane model, and calculating the parameters and plane normal vector of the plane model;
(33) substituting all points of the 3D point cloud into the plane model constructed in step (32) and counting the inliers; a point is judged to be an inlier if its distance to the plane is smaller than a set threshold;
(34) judging whether the number of inliers of the plane model reaches a set number; if not, go to step (35); if it does, take this plane model as the final plane model; the set number is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(35) repeating steps (31) to (32), comparing the number of inliers of the current plane model with that of the plane model from the previous iteration, taking the plane model with the larger number of inliers as the candidate plane model, and recording its parameters and inlier count;
(36) repeating step (34) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained.
Step three is specifically as follows:
(41) multi-frame superposition:
(411) the SLAM system feeds back the translation and rotation relation TF of the current frame image relative to the initial frame image;
(412) a travelable area is obtained for each frame image, converted into the vehicle body coordinate system through the mounting parameters of the RGB-D camera, and stitched via TF to obtain a stitched travelable area;
(42) weight accumulation:
(421) the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1;
(422) after running for a set time, cells with weight greater than or equal to the set threshold 3 are travelable areas, and cells with weight less than the set threshold 3 are non-travelable areas;
(43) the fused travelable area is thereby finally obtained.
The method further comprises an alignment step:
(51) a calibration board is installed at a point on the map, and the calibration point information on the calibration board is collected by the RGB-D camera and the lidar respectively;
(52) the rotation matrix is calculated:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
(53) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point;
(54) a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a corresponding feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system using the rotation and translation from steps (52) and (53);
(55) in the lidar coordinate system, the line connecting the two feature points must be perpendicular to the normal vector of the plane of the calibration board, i.e. the dot product of the connecting line and the normal vector is 0; step (54) is repeated to select several feature points, yielding a system of equations that is solved to obtain RT;
(56) the travelable area in the SLAM system coordinate system is calculated via the RT obtained in step (55).
Beneficial effects: unlike a common combined navigation or SLAM system, the method takes an RGB image and a depth map as input to a deep learning model and outputs a travelable region. Under normal conditions the travelable area perception technology works together with the positioning system to enhance system stability, and in special situations, for example: 1) when dynamic factors such as people or vehicles appear, the module is responsible for detecting the dynamic area; 2) when positioning loss occurs, the module is responsible for detecting the travelable road surface to prevent the robot from tipping over; 3) when positioning deviation occurs on a narrow road, the module is responsible for road edge detection to prevent the robot from falling.
Drawings
FIG. 1 is a flow chart of the CNN model training of the present invention;
FIG. 2 is a flow chart of the deployment and runtime algorithm of the present invention;
FIG. 3 is a flow chart of the post-processing of the present invention;
FIG. 4 is a schematic diagram of the network structure of the road surface segmentation model of the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
The invention relates to a drivable area sensing method for a robot, which comprises the following steps:
step one, training a road surface segmentation model;
the pavement segmentation model adopts a CNN model, the input of the CNN model training is a labeled RGB image, the labeling cost is reduced by introducing semi-automatic labeling considering that the high-quality labeling data is usually high in cost, time and labor are consumed, the CNN model is iteratively upgraded by a feedback labeling technology, and the size of the CNN model is controlled by combining model lightweight design and a model compression technology to reduce the computing resource consumption of the CNN model during actual deployment and operation so as to ensure that the CNN model can output the pavement mask in real time, as shown in FIG. 1.
The method comprises the following specific steps:
(11) data acquisition: the robot collects RGB images through a camera mounted on the robot; to ensure the diversity of data samples, data are collected under different working conditions, in different time periods and at different positions, for example in different weather (sunny, cloudy, rainy, snowy or foggy days), at different times of day (morning, noon or evening), and at substations of different ages, locations and sizes;
(12) data annotation: manual annotation is performed with the annotation tool CVAT; the road surface in the collected RGB images is marked with a mask, and the road surface segmentation model is trained after annotation;
when the number of annotated RGB images reaches a certain size, a large model is trained to perform semi-automatic annotation and reduce the annotation cost;
the large model is a deep learning model, specifically a CNN; the RGB images are used as model input, the model is trained, and a large model for annotating RGB images is finally obtained;
(13) lightweight design of the road surface segmentation model:
because the deep learning model must be deployed on the robot, which usually has limited computing power and strict requirements on power consumption, running time and resource consumption, a road surface segmentation model with low computational requirements and high deployment performance needs to be designed;
in the choice of convolution structure, the road surface segmentation model uses lightweight convolutions such as depthwise convolution, pointwise convolution, channel shuffle and Inception-style structures, uses small 3×3 convolution kernels, and reduces cross-channel connections as much as possible.
In terms of network structure, as shown in FIG. 4, the network of the road surface segmentation model adopts two branches: the detail branch uses wide channels and shallow convolutions to obtain low-level image detail features and rich feature expression; the semantic branch uses narrow channels and deep convolutions, obtaining high-level semantic context information such as contours, boundaries and corner points from a larger receptive field; a fusion layer then fuses the two levels of information of different dimensionality with an attention mechanism, achieving high segmentation accuracy at low computational cost.
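By way of illustration only, a minimal sketch of such a two-branch structure is given below (Python/PyTorch; the module names DetailBranch, SemanticBranch and AttentionFusion, the channel widths and the depths are assumptions for illustration and are not specified by this disclosure):

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    # 3x3 convolution + BN + ReLU, the basic block used in both branches
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class DetailBranch(nn.Module):
    # wide channels, few layers: preserves low-level spatial detail
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 64, stride=2),
            conv_bn_relu(64, 64),
            conv_bn_relu(64, 128, stride=2),
        )
    def forward(self, x):
        return self.layers(x)          # 1/4 resolution, 128 channels

class SemanticBranch(nn.Module):
    # narrow channels, more layers: large receptive field for context
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 16, stride=2),
            conv_bn_relu(16, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
        )
    def forward(self, x):
        return self.layers(x)          # 1/16 resolution, 128 channels

class AttentionFusion(nn.Module):
    # fuse the two branches with a simple channel-attention weighting
    def __init__(self, channels=128):
        super().__init__()
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, detail, semantic):
        semantic = nn.functional.interpolate(
            semantic, size=detail.shape[2:], mode="bilinear", align_corners=False)
        return detail * self.att(semantic) + semantic

class RoadSegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.detail = DetailBranch()
        self.semantic = SemanticBranch()
        self.fusion = AttentionFusion(128)
        self.head = nn.Conv2d(128, num_classes, 1)
    def forward(self, x):
        out = self.fusion(self.detail(x), self.semantic(x))
        out = self.head(out)
        # upsample back to input resolution to get a per-pixel road mask
        return nn.functional.interpolate(
            out, size=x.shape[2:], mode="bilinear", align_corners=False)
```

In this sketch the detail branch keeps a wide 1/4-resolution feature map while the semantic branch reaches 1/16 resolution with narrow channels; the attention weights derived from the semantic branch re-weight the detail features before the segmentation head.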
(14) training and compression of the road surface segmentation model: the initial parameters of the road surface segmentation model are obtained by transfer learning from the large model preliminarily trained in step (12); to allow deployment on the robot, the model is further compressed, and the compressed road surface segmentation model is obtained by a distillation learning method; the specific steps are as follows:
(141) data enhancement is performed on the annotated images; during training, operations such as random cropping, random masking, Gaussian noise, random affine transformation and resizing are applied to increase the diversity of the data;
(142) based on the large model preliminarily trained in step (12), the initial parameters of the road surface segmentation model are obtained by transfer learning, giving faster training and better results;
(143) CNN training is performed from the initial parameters obtained in step (142) to obtain a large teacher model (without regard to its computational cost or size), ensuring a good segmentation effect (clear travelable-area edges and a segmentation mAP of 90%); the teacher model parameters are then fixed;
(144) a student model is built from the road surface segmentation model designed in step (13) and the initial parameters of step (142); the intermediate-layer outputs and final output of the teacher model obtained in step (143) are added to the loss of the student model during training, so that the student model is easier to train and reaches a good segmentation effect, finally yielding the compressed road surface segmentation model; after training of the road surface segmentation model is finished, the original float32 weights are converted to int8 to reduce the running time and memory consumption of the model;
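A minimal sketch of such a distillation objective is given below (Python/PyTorch; the function name, the temperature T and the weights alpha and beta are assumptions for illustration, and it is assumed that teacher and student expose comparable intermediate feature maps):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, student_feat,
                      teacher_logits, teacher_feat,
                      labels, T=2.0, alpha=0.5, beta=0.1):
    """Combine hard-label loss, soft-label (output) distillation and
    intermediate-feature distillation when training the student."""
    # supervised loss against the annotated road mask
    ce = F.cross_entropy(student_logits, labels)
    # soft-target loss: match the teacher's softened output distribution
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * (T * T)
    # feature loss: match an intermediate layer of the teacher
    feat = F.mse_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * feat
```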
(15) model porting: the framework used for training the road surface segmentation model is PyTorch; when the road surface segmentation model is actually deployed in the inference framework on the robot, OpenVINO, the model first needs to be converted to ONNX and then from ONNX to the bin file supported by OpenVINO;
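A hedged sketch of the PyTorch-to-ONNX step is given below (the file names, input resolution and opset version are illustrative; RoadSegNet refers to the illustrative two-branch sketch above; the subsequent ONNX-to-OpenVINO conversion uses OpenVINO's own conversion tooling and is not shown):

```python
import torch

model = RoadSegNet(num_classes=2)          # the trained segmentation model
model.load_state_dict(torch.load("road_seg.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 480, 640)        # one RGB frame at deployment resolution
torch.onnx.export(
    model, dummy, "road_seg.onnx",
    input_names=["rgb"], output_names=["mask"],
    opset_version=11)
```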
Step two, as shown in FIG. 2, the travelable area is obtained;
(21) the robot collects the road surface depth image and RGB image through an RGB-D camera mounted on the robot, and aligns the depth image with the RGB image; in the invention, the RGB image may also be collected by an RGB camera and the road surface depth image by a depth camera or a lidar;
(22) as shown in FIG. 3, the RGB image is input into the road surface segmentation model trained in step one to obtain the road surface mask of the travelable area;
(23) post-processing:
(231) contour fitting: to reduce the computation of travelable-area merging and to handle the occasional false detection, the obtained road surface mask is down-sampled to obtain a number of discrete points on the road surface mask contour;
(232) the discrete points on the road surface mask contour obtained after the down-sampling in step (231) are filtered with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area;
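A minimal sketch of the contour down-sampling and point smoothing is given below (Python with OpenCV and NumPy; the epsilon value, the noise variances and the constant-position smoothing model are assumptions for illustration):

```python
import cv2
import numpy as np

def contour_points(mask, epsilon=5.0):
    """Down-sample the road mask contour to a sparse set of 2D points."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.empty((0, 2), dtype=np.float32)
    largest = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(largest, epsilon, True)
    return approx.reshape(-1, 2).astype(np.float32)

class PointSmoother:
    """Constant-position Kalman-style smoothing of contour points over frames."""
    def __init__(self, process_var=4.0, meas_var=9.0):
        self.x = None            # state: smoothed point coordinates
        self.p = None            # state variance (one value per point)
        self.q, self.r = process_var, meas_var

    def update(self, pts):
        if self.x is None or len(pts) != len(self.x):
            self.x, self.p = pts.copy(), np.full(len(pts), 1.0)
            return self.x
        self.p = self.p + self.q                         # predict
        k = self.p / (self.p + self.r)                   # Kalman gain
        self.x = self.x + k[:, None] * (pts - self.x)    # correct
        self.p = (1.0 - k) * self.p
        return self.x
```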
In the invention, after the travelable area is obtained, the area change between the travelable areas of consecutive frames can be detected in real time against a preset area-change threshold; when the area change between the previous and current frames is larger than the threshold, a false detection is judged to be present, the processed image is saved and fed into training of the already-trained road surface segmentation model for the feedback annotation described above, and the collection, recognition and segmentation of the road surface depth image and RGB image are carried out again;
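A sketch of this false-detection check on consecutive road masks (the relative threshold value is an assumption for illustration, as the text does not give a numeric value):

```python
def is_false_detection(prev_mask, cur_mask, rel_threshold=0.3):
    """Flag a frame for feedback annotation when the travelable area
    changes too much between consecutive frames."""
    prev_area = float(prev_mask.sum())
    cur_area = float(cur_mask.sum())
    if prev_area == 0:
        return False
    return abs(cur_area - prev_area) / prev_area > rel_threshold
```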
(24) plane detection on the depth image: it is first assumed that the travelable area must be a plane, which matches the actual substation scene; the travelable area can therefore be obtained through a plane detection algorithm: the depth image is converted into a 3D point cloud, and plane detection is performed on the 3D point cloud with the RANSAC algorithm to obtain the travelable area; the specific steps are as follows:
(241) assume the plane model Ax + By + Cz + D = 0; randomly select three points from the 3D point cloud to construct a plane model, and calculate its parameters and plane normal vector;
(242) substitute all points of the 3D point cloud into the plane model constructed in step (241) and count the inliers; the criterion for a point being an inlier can be set manually, and in the invention a point is an inlier when its distance to the plane is smaller than a set threshold, here set to 0.5 mm;
(243) judge whether the number of inliers of the plane model reaches a set threshold; if not, go to step (244); if it does, take this plane model as the final plane model; the threshold is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(244) repeat steps (241) to (242), compare the inlier counts of the current plane model and the plane model from the previous iteration, take the plane model with the larger inlier count as the candidate plane model, and record its parameters and inlier count;
(245) repeat step (243) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained;
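A minimal sketch of this RANSAC plane fit is given below (Python/NumPy; the depth-to-point-cloud conversion assumes a pinhole camera with intrinsics fx, fy, cx, cy, and the iteration cap is an assumption; the 0.5 mm inlier distance and 80% inlier ratio follow the text above):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into a 3D point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

def ransac_plane(points, dist_thresh=0.0005, inlier_ratio=0.8, max_iters=500):
    """Fit Ax + By + Cz + D = 0 by RANSAC; return (A, B, C, D) and the inlier mask."""
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(max_iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                 # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p1)
        dist = np.abs(points @ normal + d)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (*normal, d), inliers
            if inliers.sum() >= inlier_ratio * len(points):
                break                   # early exit: enough inliers found
    return best_model, best_inliers
```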
(25) the travelable area obtained in step (23) and the travelable area obtained in step (24) are aligned according to the intrinsic parameters of the RGB-D camera;
(26) for the two aligned travelable areas obtained through the above steps: in most cases the two areas coincide, but not every plane is travelable, and the road surface segmentation model has a certain probability of false detection;
(261) whether the two travelable areas coincide is judged by comparing their intersection-over-union with a set IoU threshold of 0.6–1; if they coincide, go to step three with the travelable area obtained in step (245) as the reference; if not, go to step (262);
(262) the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized by a standard threshold to obtain the plane detection error value; the road surface segmentation error value is calculated from the confidence output by the road surface segmentation model in step (22); the two error values are compared and the travelable area with the smaller error value is taken as the reference; the standard threshold is the preset average distance from all points of a plane to that plane;
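A hedged sketch of this consistency check (the normalization constant and the mapping from model confidence to an error value are assumptions for illustration; the IoU threshold follows the range above):

```python
import numpy as np

def choose_reference(mask_seg, mask_plane, plane_dists, seg_confidence,
                     iou_thresh=0.6, dist_norm=0.01):
    """Return which travelable area to use as the fusion reference."""
    inter = np.logical_and(mask_seg, mask_plane).sum()
    union = np.logical_or(mask_seg, mask_plane).sum()
    iou = inter / union if union > 0 else 0.0
    if iou >= iou_thresh:
        return "plane"                                    # areas coincide
    plane_err = float(np.mean(plane_dists)) / dist_norm   # normalized inlier distance
    seg_err = 1.0 - float(seg_confidence)                 # lower confidence -> larger error
    return "plane" if plane_err < seg_err else "segmentation"
```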
Step three, fusion of the travelable areas:
(31) multi-frame superposition: for multi-frame superposition, the SLAM system must feed back the translation and rotation relation, abbreviated TF, of the current frame relative to the initial frame; when a travelable area is detected, it is converted into the vehicle body coordinate system through the calibrated extrinsic parameters, and the travelable areas detected in multiple frames are then stitched via TF;
(32) weight accumulation: weight accumulation effectively suppresses single-frame false detection; first, the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1; after running for a set time, with the threshold set to 3, cells with weight greater than or equal to 3 are travelable and cells with weight less than 3 are non-travelable (a sketch of this grid accumulation is given after step (33) below);
(33) the fused travelable area is thereby finally obtained;
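A minimal sketch of the weight-accumulation grid of steps (32)–(33) (Python/NumPy; the map size, the placement of the map origin and the class name are assumptions for illustration; the 0.05 m resolution and the threshold of 3 follow the text):

```python
import numpy as np

class TravelableGrid:
    """Weight-accumulation grid: 0.05 m cells, +1 for travelable, -1 otherwise."""
    def __init__(self, size_m=100.0, resolution=0.05, threshold=3):
        n = int(size_m / resolution)
        self.weights = np.zeros((n, n), dtype=np.int32)
        self.resolution = resolution
        self.origin = size_m / 2.0      # map origin at the grid centre (assumption)
        self.threshold = threshold

    def _cell(self, x, y):
        return (int((x + self.origin) / self.resolution),
                int((y + self.origin) / self.resolution))

    def update(self, points_xy, travelable):
        delta = 1 if travelable else -1
        for x, y in points_xy:          # points already in the map frame (via TF)
            i, j = self._cell(x, y)
            if 0 <= i < self.weights.shape[0] and 0 <= j < self.weights.shape[1]:
                self.weights[i, j] += delta

    def travelable_mask(self):
        return self.weights >= self.threshold
```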
Step four, calibration of the RGB-D camera and the lidar: after the fused travelable area is obtained, it can only be used by the positioning system once it is aligned with the coordinate system of the existing SLAM system, so the extrinsic parameters RT between the RGB-D camera and the lidar need to be calibrated;
(41) before calibration, a calibration board is prepared in advance for detecting plane feature points and computing the plane normal vector; the calibration board is installed at a point on the map, and the calibration point information on the board is collected by the RGB-D camera and the lidar respectively;
(42) calculating a rotation matrix:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
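Given the plane normals of several board poses stacked as 3×k matrices N (camera frame) and M (lidar frame), the closed-form solve above can be sketched as follows (Python/NumPy; the final projection onto the nearest rotation matrix via SVD is an addition made here as an assumption, since the raw least-squares solution need not be exactly orthogonal):

```python
import numpy as np

def rotation_from_normals(N, M):
    """Solve R* · N = M in the least-squares sense; N, M are 3 x k normal matrices."""
    if np.linalg.matrix_rank(N @ N.T) < 3:
        raise ValueError("need normals spanning 3D (rank(N·N') = 3)")
    R = (M @ N.T) @ np.linalg.inv(N @ N.T)
    # project onto the nearest rotation matrix (SVD orthogonalization)
    U, _, Vt = np.linalg.svd(R)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```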
(43) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point.
(44) In the lidar coordinate system, the line connecting two feature points on a plane must be perpendicular to the plane normal vector. First, a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system through the extrinsic rotation and translation. In theory, if the extrinsics are accurate, the line connecting the two feature points must be perpendicular to the normal vector of the calibration board plane and their dot product is 0, so the extrinsic RT can be computed by selecting several groups of feature points and minimizing the above expression; a sketch of this refinement is given after step (45) below;
(45) the travelable area in the SLAM system coordinate system is computed through the extrinsic RT obtained in step (44).
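A sketch of the refinement in step (44) posed as a nonlinear least-squares problem (Python/SciPy; the Euler-angle parameterization, function name and initial-guess handling are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_extrinsics(p_cam, n_cam, q_lidar, rt0):
    """Refine (alpha, beta, gamma, x, y, z) so that the lidar plane points,
    transformed by (R, T), lie on the calibration-board plane defined by
    point p_cam[i] and normal n_cam[i] in the camera frame."""
    def residuals(params):
        R = Rotation.from_euler("xyz", params[:3]).as_matrix()
        t = params[3:]
        res = []
        for p, n, Q in zip(p_cam, n_cam, q_lidar):
            # point-to-plane distance for every lidar point of this board pose
            res.extend(((Q @ R.T + t) - p) @ n)
        return np.asarray(res)
    sol = least_squares(residuals, rt0)   # rt0: 6-vector, e.g. from the closed-form R
    return sol.x
```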
The traditional plane detection method carries no road surface semantic information and cannot be used directly for travelable area detection, while a CNN model alone usually cannot be applied directly in engineering because of the limited diversity of training samples and insufficient generalization. The travelable area perception technology of the invention therefore combines the CNN deep learning method with the traditional plane detection method: semantic information is obtained through the CNN model, and plane detection is then combined with it to eliminate occasional false detections and improve detection robustness. Falsely detected RGB images of the CNN model are obtained in the post-processing module through an adaptive threshold and then fed back into model training, forming a feedback annotation closed loop that guarantees the iterative upgrade capability of the method.
Although the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the details of the foregoing embodiments; various equivalent changes (for example in number, shape or position) may be made to the technical solution of the present invention within its technical spirit, and these equivalent changes all fall within the protection scope of the present invention.

Claims (10)

1. A travelable region perception method for a robot, characterized by comprising the following steps:
step one, the robot collects a road surface depth image and an RGB image, and the depth image and the RGB image are aligned;
step two, the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned via step one;
step three, the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area.
2. The travelable region perception method according to claim 1, characterized in that: in step one, the robot collects the road surface depth image and the RGB image through an RGB-D camera; alternatively, the RGB image is collected by an RGB camera and the road surface depth image is collected by a depth camera or a lidar.
3. The travelable region perception method according to claim 1, characterized in that: before step three, whether the two travelable areas coincide is judged from their intersection-over-union; if they coincide, go to step three; if they do not coincide, the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized to obtain a plane detection error value, a road surface segmentation error value is calculated from the confidence output by the road surface segmentation model, the two error values are compared, and the fusion in step three is performed with the travelable area having the smaller error value as the reference.
4. The travelable region perception method according to claim 1, characterized in that: the road surface segmentation model in step two is trained as follows:
(11) the robot collects RGB images under different working conditions, in different time periods and at different positions through a camera mounted on the robot, and the road surface in the RGB images is annotated;
(12) a neural network is trained on the images obtained in step (11) to obtain an annotation model, and the collected RGB images are annotated with the annotation model;
(13) a road surface segmentation model is designed; the parameters of the annotation model obtained in step (12) are taken as the initial parameters of the road surface segmentation model by a transfer learning method, and the model is trained on the annotated RGB images by a distillation learning method to obtain the road surface segmentation model.
5. The travelable region perception method according to claim 4, characterized in that: the designed road surface segmentation model specifically uses depthwise convolution, pointwise convolution, channel shuffle or Inception-style convolution structures with small 3×3 convolution kernels; in the network structure, a detail branch uses wide channels and shallow convolutions, a semantic branch uses narrow channels and deep convolutions, and a fusion layer uses an attention mechanism to fuse the two levels of information of different dimensionality.
6. The travelable region perception method according to claim 1, characterized in that: in step two, processing the RGB image with the pre-trained road surface segmentation model to obtain the travelable area specifically comprises:
(21) inputting the RGB image into the trained road surface segmentation model to obtain a road surface mask of the travelable area;
(22) down-sampling the road surface mask obtained in step (21) to obtain a number of discrete points on the road surface mask contour;
(23) filtering the discrete points on the road surface mask contour obtained after the down-sampling in step (22) with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area.
7. The travelable region perception method according to claim 1, characterized in that: after the travelable area is obtained in step two, the area change of the travelable area between consecutive frames is detected in real time and compared with a preset area change threshold; when the area change between the previous and current frames is larger than the preset threshold, a false detection is judged to be present and the current frame image is fed back into training of the previously trained road surface segmentation model.
8. The travelable region perception method according to claim 1, characterized in that: in step two, performing plane detection on the depth image to obtain the travelable area specifically comprises:
(31) converting the depth image into a 3D point cloud, and assuming a plane model Ax + By + Cz + D = 0;
(32) randomly selecting three points from the 3D point cloud to construct a plane model, and calculating the parameters and plane normal vector of the plane model;
(33) substituting all points of the 3D point cloud into the plane model constructed in step (32) and counting the inliers; a point is judged to be an inlier if its distance to the plane is smaller than a set threshold;
(34) judging whether the number of inliers of the plane model reaches a set number; if not, go to step (35); if it does, take this plane model as the final plane model; the set number is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(35) repeating steps (31) to (32), comparing the number of inliers of the current plane model with that of the plane model from the previous iteration, taking the plane model with the larger number of inliers as the candidate plane model, and recording its parameters and inlier count;
(36) repeating step (34) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained.
9. The travelable region perception method according to claim 1, characterized in that: step three is specifically as follows:
(41) multi-frame superposition:
(411) the SLAM system feeds back the translation and rotation relation TF of the current frame image relative to the initial frame image;
(412) a travelable area is obtained for each frame image, converted into the vehicle body coordinate system through the mounting parameters of the RGB-D camera, and stitched via TF to obtain a stitched travelable area;
(42) weight accumulation:
(421) the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1;
(422) after running for a set time, cells with weight greater than or equal to the set threshold 3 are travelable areas, and cells with weight less than the set threshold 3 are non-travelable areas;
(43) the fused travelable area is thereby finally obtained.
10. The travelable region perception method according to claim 1, characterized in that the method further comprises an alignment step:
(51) a calibration board is installed at a point on the map, and the calibration point information on the calibration board is collected by the RGB-D camera and the lidar respectively;
(52) the rotation matrix is calculated:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
(53) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point;
(54) a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a corresponding feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system using the rotation and translation from steps (52) and (53);
(55) in the lidar coordinate system, the line connecting the two feature points must be perpendicular to the normal vector of the plane of the calibration board, i.e. the dot product of the connecting line and the normal vector is 0; step (54) is repeated to select several feature points, yielding a system of equations that is solved to obtain RT;
(56) the travelable area in the SLAM system coordinate system is calculated via the RT obtained in step (55).
CN202110677857.7A 2021-06-18 2021-06-18 Driving region sensing method for robot Pending CN113343875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110677857.7A CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110677857.7A CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Publications (1)

Publication Number Publication Date
CN113343875A true CN113343875A (en) 2021-09-03

Family

ID=77476314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110677857.7A Pending CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Country Status (1)

Country Link
CN (1) CN113343875A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445415A (en) * 2021-12-14 2022-05-06 中国科学院深圳先进技术研究院 Method for dividing a drivable region and associated device
CN116385949A (en) * 2023-03-23 2023-07-04 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116385949B (en) * 2023-03-23 2023-09-08 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116343159A (en) * 2023-05-24 2023-06-27 之江实验室 Unstructured scene passable region detection method, device and storage medium

Similar Documents

Publication Publication Date Title
US20210142095A1 (en) Image disparity estimation
US10408939B1 (en) Learning method and learning device for integrating image acquired by camera and point-cloud map acquired by radar or LiDAR corresponding to image at each of convolution stages in neural network and testing method and testing device using the same
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN108802785B (en) Vehicle self-positioning method based on high-precision vector map and monocular vision sensor
US20200250468A1 (en) Learning method and learning device for sensor fusion to integrate information acquired by radar capable of distance estimation and information acquired by camera to thereby improve neural network for supporting autonomous driving, and testing method and testing device using the same
CN113343875A (en) Driving region sensing method for robot
CN111201451A (en) Method and device for detecting object in scene based on laser data and radar data of scene
US10402978B1 (en) Method for detecting pseudo-3D bounding box based on CNN capable of converting modes according to poses of objects using instance segmentation and device using the same
CN106548173A (en) A kind of improvement no-manned plane three-dimensional information getting method based on classification matching strategy
CN104537709A (en) Real-time three-dimensional reconstruction key frame determination method based on position and orientation changes
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN111736586B (en) Method for automatically driving vehicle position for path planning and device thereof
CN113658337B (en) Multi-mode odometer method based on rut lines
US10728461B1 (en) Method for correcting misalignment of camera by selectively using information generated by itself and information generated by other entities and device using the same
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
US10748032B1 (en) Method for providing robust object distance estimation based on camera by performing pitch calibration of camera more precisely with fusion of information acquired through camera and information acquired through V2V communication and device using the same
CN109740584A (en) Automatic parking parking space detection method based on deep learning
CN110599497A (en) Drivable region segmentation method based on deep neural network
Zhao et al. A robust stereo feature-aided semi-direct SLAM system
EP3686776A1 (en) Method for detecting pseudo-3d bounding box to be used for military purpose, smart phone or virtual driving based-on cnn capable of converting modes according to conditions of objects and device using the same
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
CN112132745A (en) Multi-sub-map splicing feature fusion method based on geographic information
CN110135387B (en) Image rapid identification method based on sensor fusion
CN116403186A (en) Automatic driving three-dimensional target detection method based on FPN Swin Transformer and Pointernet++
Velat et al. Vision based vehicle localization for autonomous navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination