CN113343875A - Driving region sensing method for robot - Google Patents

Driving region sensing method for robot

Info

Publication number
CN113343875A
Authority
CN
China
Prior art keywords
plane
model
area
points
travelable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110677857.7A
Other languages
Chinese (zh)
Inventor
程敏
李奇
罗作煌
徐伟
郭永春
张淼
沈林鹏
杨顺
尹帅
唐海健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yijiahe Technology R & D Co ltd
Original Assignee
Shenzhen Yijiahe Technology R & D Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yijiahe Technology R & D Co ltd
Priority to CN202110677857.7A
Publication of CN113343875A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30256Lane; Road marking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a travelable area sensing method for a robot, comprising the following steps: step one, the robot collects a road surface depth image and an RGB image through an RGB-D camera and aligns the depth image with the RGB image; step two, the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned using step one; step three, the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area. Based on a deep learning model, the invention takes an RGB image and a depth map as input and outputs a travelable region; under normal conditions the travelable area perception technology works together with the positioning system to enhance the stability of the system.

Description

Driving region sensing method for robot
Technical Field
The invention relates to the technical field of robots, in particular to a drivable area sensing method for a robot.
Background
Conventional positioning algorithms typically perform combined navigation or SLAM positioning based on GNSS signals, lidar, and similar sensors. However, GNSS signals are generally only usable in open scenes, and positioning performance can degrade drastically under occlusion and multipath. Lidar is applicable to a wider range of scenes, but is prone to positioning loss in open, dynamic, or long-corridor environments.
The positioning system is the basis on which the inspection robot can correctly execute its other tasks, and a stable and reliable travelable area perception technology provides a solid guarantee that the whole robot system executes commands correctly and effectively. However, when the positioning system suffers problems such as positioning loss or positioning drift, the inspection robot may follow a wrong trajectory, so that all subsequent logic processing goes wrong; in severe cases the robot may tip over or collide with people.
Disclosure of Invention
Purpose of the invention: aiming at the above defects, the invention provides a drivable area sensing method for a robot that, based on a deep learning model, takes an RGB image and a depth map as input and outputs a drivable area.
The technical scheme is as follows:
a travelable region perception method for a robot, comprising the steps of:
Step one: the robot collects a road surface depth image and an RGB image, and the depth image and the RGB image are aligned;
Step two: the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned via step one;
Step three: the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area.
In step one, the robot collects the road surface depth image and the RGB image through an RGB-D camera; alternatively, the RGB image is collected by an RGB camera and the road surface depth image is collected by a depth camera or a lidar.
Before step three, whether the two travelable areas coincide is judged from their intersection-over-union; if they coincide, go to step three; if they do not coincide, the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized to obtain a plane detection error value, a road surface segmentation error value is calculated from the confidence output by the road surface segmentation model, the two error values are compared, and the fusion in step three is performed with the travelable area having the smaller error value as the reference.
The road surface segmentation model in step two is trained as follows:
(11) the robot collects RGB images under different working conditions, in different time periods and at different positions through a camera mounted on the robot, and the road surface in the RGB images is annotated;
(12) a neural network is trained on the images obtained in step (11) to obtain an annotation model, and the collected RGB images are annotated with the annotation model;
(13) a road surface segmentation model is designed; the parameters of the annotation model obtained in step (12) are taken as the initial parameters of the road surface segmentation model by a transfer learning method, and the model is trained on the annotated RGB images by a distillation learning method to obtain the road surface segmentation model.
The designed road surface segmentation model specifically uses depthwise convolution, pointwise convolution, channel shuffle or Inception-style convolution structures with small 3×3 convolution kernels; in the network structure, a detail branch uses wide channels and shallow convolutions, a semantic branch uses narrow channels and deep convolutions, and a fusion layer uses an attention mechanism to fuse the two levels of information of different dimensionality.
In step two, processing the RGB image with the pre-trained road surface segmentation model to obtain the travelable area specifically comprises:
(21) inputting the RGB image into the trained road surface segmentation model to obtain a road surface mask of the travelable area;
(22) down-sampling the road surface mask obtained in step (21) to obtain a number of discrete points on the road surface mask contour;
(23) filtering the discrete points on the road surface mask contour obtained after the down-sampling in step (22) with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area.
After the travelable area is obtained in step two, the area change of the travelable area between consecutive frames is detected in real time and compared with a preset area change threshold; when the area change between the previous and current frames is larger than the preset threshold, a false detection is judged to be present and the current frame image is fed back into training of the previously trained road surface segmentation model.
In step two, performing plane detection on the depth image to obtain the travelable area specifically comprises:
(31) converting the depth image into a 3D point cloud, and assuming a plane model Ax + By + Cz + D = 0;
(32) randomly selecting three points from the 3D point cloud to construct a plane model, and calculating the parameters and plane normal vector of the plane model;
(33) substituting all points of the 3D point cloud into the plane model constructed in step (32) and counting the inliers; a point is judged to be an inlier if its distance to the plane is smaller than a set threshold;
(34) judging whether the number of inliers of the plane model reaches a set number; if not, go to step (35); if it does, take this plane model as the final plane model; the set number is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(35) repeating steps (31) to (32), comparing the number of inliers of the current plane model with that of the plane model from the previous iteration, taking the plane model with the larger number of inliers as the candidate plane model, and recording its parameters and inlier count;
(36) repeating step (34) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained.
Step three is specifically as follows:
(41) multi-frame superposition:
(411) the SLAM system feeds back the translation and rotation relation TF of the current frame image relative to the initial frame image;
(412) a travelable area is obtained for each frame image, converted into the vehicle body coordinate system through the mounting parameters of the RGB-D camera, and stitched via TF to obtain a stitched travelable area;
(42) weight accumulation:
(421) the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1;
(422) after running for a set time, cells with weight greater than or equal to the set threshold 3 are travelable areas, and cells with weight less than the set threshold 3 are non-travelable areas;
(43) the fused travelable area is thereby finally obtained.
The method further comprises an alignment step:
(51) a calibration board is installed at a point on the map, and the calibration point information on the calibration board is collected by the RGB-D camera and the lidar respectively;
(52) the rotation matrix is calculated:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
(53) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point;
(54) a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a corresponding feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system using the rotation and translation from steps (52) and (53);
(55) in the lidar coordinate system, the line connecting the two feature points must be perpendicular to the normal vector of the plane of the calibration board, i.e. the dot product of the connecting line and the normal vector is 0; step (54) is repeated to select several feature points, yielding a system of equations that is solved to obtain RT;
(56) the travelable area in the SLAM system coordinate system is calculated via the RT obtained in step (55).
Beneficial effects: unlike a common combined navigation or SLAM system, the method takes an RGB image and a depth map as input to a deep learning model and outputs a travelable region. Under normal conditions the travelable area perception technology works together with the positioning system to enhance system stability, and in special situations, for example: 1) when dynamic factors such as people or vehicles appear, the module is responsible for detecting the dynamic area; 2) when positioning loss occurs, the module is responsible for detecting the travelable road surface to prevent the robot from tipping over; 3) when positioning deviation occurs on a narrow road, the module is responsible for road edge detection to prevent the robot from falling.
Drawings
FIG. 1 is a flow chart of the CNN model training of the present invention;
FIG. 2 is a flow chart of the deployment and runtime algorithm of the present invention;
FIG. 3 is a flow chart of the post-processing of the present invention;
FIG. 4 is a schematic diagram of the network structure of the road surface segmentation model of the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
The invention relates to a drivable area sensing method for a robot, which comprises the following steps:
step one, training a road surface segmentation model;
the pavement segmentation model adopts a CNN model, the input of the CNN model training is a labeled RGB image, the labeling cost is reduced by introducing semi-automatic labeling considering that the high-quality labeling data is usually high in cost, time and labor are consumed, the CNN model is iteratively upgraded by a feedback labeling technology, and the size of the CNN model is controlled by combining model lightweight design and a model compression technology to reduce the computing resource consumption of the CNN model during actual deployment and operation so as to ensure that the CNN model can output the pavement mask in real time, as shown in FIG. 1.
The method comprises the following specific steps:
(11) data acquisition: the robot collects RGB images through a camera mounted on the robot; to ensure the diversity of data samples, data are collected under different working conditions, in different time periods and at different positions, for example in different weather (sunny, cloudy, rainy, snowy or foggy days), at different times of day (morning, noon or evening), and at substations of different ages, locations and sizes;
(12) data annotation: manual annotation is performed with the annotation tool CVAT; the road surface in the collected RGB images is marked with a mask, and the road surface segmentation model is trained after annotation;
when the number of annotated RGB images reaches a certain size, a large model is trained to perform semi-automatic annotation and reduce the annotation cost;
the large model is a deep learning model, specifically a CNN; the RGB images are used as model input, the model is trained, and a large model for annotating RGB images is finally obtained;
(13) lightweight design of the road surface segmentation model:
because the deep learning model must be deployed on the robot, which usually has limited computing power and strict requirements on power consumption, running time and resource consumption, a road surface segmentation model with low computational requirements and high deployment performance needs to be designed;
in the choice of convolution structure, the road surface segmentation model uses lightweight convolutions such as depthwise convolution, pointwise convolution, channel shuffle and Inception-style structures, uses small 3×3 convolution kernels, and reduces cross-channel connections as much as possible.
In terms of network structure, as shown in FIG. 4, the network of the road surface segmentation model adopts two branches: the detail branch uses wide channels and shallow convolutions to obtain low-level image detail features and rich feature expression; the semantic branch uses narrow channels and deep convolutions, obtaining high-level semantic context information such as contours, boundaries and corner points from a larger receptive field; a fusion layer then fuses the two levels of information of different dimensionality with an attention mechanism, achieving high segmentation accuracy at low computational cost.
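By way of illustration only, a minimal sketch of such a two-branch structure is given below (Python/PyTorch; the module names DetailBranch, SemanticBranch and AttentionFusion, the channel widths and the depths are assumptions for illustration and are not specified by this disclosure):

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    # 3x3 convolution + BN + ReLU, the basic block used in both branches
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class DetailBranch(nn.Module):
    # wide channels, few layers: preserves low-level spatial detail
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 64, stride=2),
            conv_bn_relu(64, 64),
            conv_bn_relu(64, 128, stride=2),
        )
    def forward(self, x):
        return self.layers(x)          # 1/4 resolution, 128 channels

class SemanticBranch(nn.Module):
    # narrow channels, more layers: large receptive field for context
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 16, stride=2),
            conv_bn_relu(16, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
        )
    def forward(self, x):
        return self.layers(x)          # 1/16 resolution, 128 channels

class AttentionFusion(nn.Module):
    # fuse the two branches with a simple channel-attention weighting
    def __init__(self, channels=128):
        super().__init__()
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, detail, semantic):
        semantic = nn.functional.interpolate(
            semantic, size=detail.shape[2:], mode="bilinear", align_corners=False)
        return detail * self.att(semantic) + semantic

class RoadSegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.detail = DetailBranch()
        self.semantic = SemanticBranch()
        self.fusion = AttentionFusion(128)
        self.head = nn.Conv2d(128, num_classes, 1)
    def forward(self, x):
        out = self.fusion(self.detail(x), self.semantic(x))
        out = self.head(out)
        # upsample back to input resolution to get a per-pixel road mask
        return nn.functional.interpolate(
            out, size=x.shape[2:], mode="bilinear", align_corners=False)
```

In this sketch the detail branch keeps a wide 1/4-resolution feature map while the semantic branch reaches 1/16 resolution with narrow channels; the attention weights derived from the semantic branch re-weight the detail features before the segmentation head.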
(14) training and compression of the road surface segmentation model: the initial parameters of the road surface segmentation model are obtained by transfer learning from the large model preliminarily trained in step (12); to allow deployment on the robot, the model is further compressed, and the compressed road surface segmentation model is obtained by a distillation learning method; the specific steps are as follows:
(141) data enhancement is performed on the annotated images; during training, operations such as random cropping, random masking, Gaussian noise, random affine transformation and resizing are applied to increase the diversity of the data;
(142) based on the large model preliminarily trained in step (12), the initial parameters of the road surface segmentation model are obtained by transfer learning, giving faster training and better results;
(143) CNN training is performed from the initial parameters obtained in step (142) to obtain a large teacher model (without regard to its computational cost or size), ensuring a good segmentation effect (clear travelable-area edges and a segmentation mAP of 90%); the teacher model parameters are then fixed;
(144) a student model is built from the road surface segmentation model designed in step (13) and the initial parameters of step (142); the intermediate-layer outputs and final output of the teacher model obtained in step (143) are added to the loss of the student model during training, so that the student model is easier to train and reaches a good segmentation effect, finally yielding the compressed road surface segmentation model; after training of the road surface segmentation model is finished, the original float32 weights are converted to int8 to reduce the running time and memory consumption of the model;
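A minimal sketch of such a distillation objective is given below (Python/PyTorch; the function name, the temperature T and the weights alpha and beta are assumptions for illustration, and it is assumed that teacher and student expose comparable intermediate feature maps):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, student_feat,
                      teacher_logits, teacher_feat,
                      labels, T=2.0, alpha=0.5, beta=0.1):
    """Combine hard-label loss, soft-label (output) distillation and
    intermediate-feature distillation when training the student."""
    # supervised loss against the annotated road mask
    ce = F.cross_entropy(student_logits, labels)
    # soft-target loss: match the teacher's softened output distribution
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * (T * T)
    # feature loss: match an intermediate layer of the teacher
    feat = F.mse_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * feat
```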
(15) model porting: the framework used for training the road surface segmentation model is PyTorch; when the road surface segmentation model is actually deployed in the inference framework on the robot, OpenVINO, the model first needs to be converted to ONNX and then from ONNX to the bin file supported by OpenVINO;
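A hedged sketch of the PyTorch-to-ONNX step is given below (the file names, input resolution and opset version are illustrative; RoadSegNet refers to the illustrative two-branch sketch above; the subsequent ONNX-to-OpenVINO conversion uses OpenVINO's own conversion tooling and is not shown):

```python
import torch

model = RoadSegNet(num_classes=2)          # the trained segmentation model
model.load_state_dict(torch.load("road_seg.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 480, 640)        # one RGB frame at deployment resolution
torch.onnx.export(
    model, dummy, "road_seg.onnx",
    input_names=["rgb"], output_names=["mask"],
    opset_version=11)
```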
Step two, as shown in FIG. 2, the travelable area is obtained;
(21) the robot collects the road surface depth image and RGB image through an RGB-D camera mounted on the robot, and aligns the depth image with the RGB image; in the invention, the RGB image may also be collected by an RGB camera and the road surface depth image by a depth camera or a lidar;
(22) as shown in FIG. 3, the RGB image is input into the road surface segmentation model trained in step one to obtain the road surface mask of the travelable area;
(23) post-processing:
(231) contour fitting: to reduce the computation of travelable-area merging and to handle the occasional false detection, the obtained road surface mask is down-sampled to obtain a number of discrete points on the road surface mask contour;
(232) the discrete points on the road surface mask contour obtained after the down-sampling in step (231) are filtered with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area;
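A minimal sketch of the contour down-sampling and point smoothing is given below (Python with OpenCV and NumPy; the epsilon value, the noise variances and the constant-position smoothing model are assumptions for illustration):

```python
import cv2
import numpy as np

def contour_points(mask, epsilon=5.0):
    """Down-sample the road mask contour to a sparse set of 2D points."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.empty((0, 2), dtype=np.float32)
    largest = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(largest, epsilon, True)
    return approx.reshape(-1, 2).astype(np.float32)

class PointSmoother:
    """Constant-position Kalman-style smoothing of contour points over frames."""
    def __init__(self, process_var=4.0, meas_var=9.0):
        self.x = None            # state: smoothed point coordinates
        self.p = None            # state variance (one value per point)
        self.q, self.r = process_var, meas_var

    def update(self, pts):
        if self.x is None or len(pts) != len(self.x):
            self.x, self.p = pts.copy(), np.full(len(pts), 1.0)
            return self.x
        self.p = self.p + self.q                         # predict
        k = self.p / (self.p + self.r)                   # Kalman gain
        self.x = self.x + k[:, None] * (pts - self.x)    # correct
        self.p = (1.0 - k) * self.p
        return self.x
```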
In the invention, after the travelable area is obtained, the area change between the travelable areas of consecutive frames can be detected in real time against a preset area-change threshold; when the area change between the previous and current frames is larger than the threshold, a false detection is judged to be present, the processed image is saved and fed into training of the already-trained road surface segmentation model for the feedback annotation described above, and the collection, recognition and segmentation of the road surface depth image and RGB image are carried out again;
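A sketch of this false-detection check on consecutive road masks (the relative threshold value is an assumption for illustration, as the text does not give a numeric value):

```python
def is_false_detection(prev_mask, cur_mask, rel_threshold=0.3):
    """Flag a frame for feedback annotation when the travelable area
    changes too much between consecutive frames."""
    prev_area = float(prev_mask.sum())
    cur_area = float(cur_mask.sum())
    if prev_area == 0:
        return False
    return abs(cur_area - prev_area) / prev_area > rel_threshold
```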
(24) plane detection on the depth image: it is first assumed that the travelable area must be a plane, which matches the actual substation scene; the travelable area can therefore be obtained through a plane detection algorithm: the depth image is converted into a 3D point cloud, and plane detection is performed on the 3D point cloud with the RANSAC algorithm to obtain the travelable area; the specific steps are as follows:
(241) assume the plane model Ax + By + Cz + D = 0; randomly select three points from the 3D point cloud to construct a plane model, and calculate its parameters and plane normal vector;
(242) substitute all points of the 3D point cloud into the plane model constructed in step (241) and count the inliers; the criterion for a point being an inlier can be set manually, and in the invention a point is an inlier when its distance to the plane is smaller than a set threshold, here set to 0.5 mm;
(243) judge whether the number of inliers of the plane model reaches a set threshold; if not, go to step (244); if it does, take this plane model as the final plane model; the threshold is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(244) repeat steps (241) to (242), compare the inlier counts of the current plane model and the plane model from the previous iteration, take the plane model with the larger inlier count as the candidate plane model, and record its parameters and inlier count;
(245) repeat step (243) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained;
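A minimal sketch of this RANSAC plane fit is given below (Python/NumPy; the depth-to-point-cloud conversion assumes a pinhole camera with intrinsics fx, fy, cx, cy, and the iteration cap is an assumption; the 0.5 mm inlier distance and 80% inlier ratio follow the text above):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into a 3D point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

def ransac_plane(points, dist_thresh=0.0005, inlier_ratio=0.8, max_iters=500):
    """Fit Ax + By + Cz + D = 0 by RANSAC; return (A, B, C, D) and the inlier mask."""
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(max_iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                 # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p1)
        dist = np.abs(points @ normal + d)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (*normal, d), inliers
            if inliers.sum() >= inlier_ratio * len(points):
                break                   # early exit: enough inliers found
    return best_model, best_inliers
```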
(25) the travelable area obtained in step (23) and the travelable area obtained in step (24) are aligned according to the intrinsic parameters of the RGB-D camera;
(26) for the two aligned travelable areas obtained through the above steps: in most cases the two areas coincide, but not every plane is travelable, and the road surface segmentation model has a certain probability of false detection;
(261) whether the two travelable areas coincide is judged by comparing their intersection-over-union with a set IoU threshold of 0.6–1; if they coincide, go to step three with the travelable area obtained in step (245) as the reference; if not, go to step (262);
(262) the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized by a standard threshold to obtain the plane detection error value; the road surface segmentation error value is calculated from the confidence output by the road surface segmentation model in step (22); the two error values are compared and the travelable area with the smaller error value is taken as the reference; the standard threshold is the preset average distance from all points of a plane to that plane;
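A hedged sketch of this consistency check (the normalization constant and the mapping from model confidence to an error value are assumptions for illustration; the IoU threshold follows the range above):

```python
import numpy as np

def choose_reference(mask_seg, mask_plane, plane_dists, seg_confidence,
                     iou_thresh=0.6, dist_norm=0.01):
    """Return which travelable area to use as the fusion reference."""
    inter = np.logical_and(mask_seg, mask_plane).sum()
    union = np.logical_or(mask_seg, mask_plane).sum()
    iou = inter / union if union > 0 else 0.0
    if iou >= iou_thresh:
        return "plane"                                    # areas coincide
    plane_err = float(np.mean(plane_dists)) / dist_norm   # normalized inlier distance
    seg_err = 1.0 - float(seg_confidence)                 # lower confidence -> larger error
    return "plane" if plane_err < seg_err else "segmentation"
```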
Step three, fusion of the travelable areas:
(31) multi-frame superposition: for multi-frame superposition, the SLAM system must feed back the translation and rotation relation, abbreviated TF, of the current frame relative to the initial frame; when a travelable area is detected, it is converted into the vehicle body coordinate system through the calibrated extrinsic parameters, and the travelable areas detected in multiple frames are then stitched via TF;
(32) weight accumulation: weight accumulation effectively suppresses single-frame false detection; first, the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1; after running for a set time, with the threshold set to 3, cells with weight greater than or equal to 3 are travelable and cells with weight less than 3 are non-travelable (a sketch of this grid accumulation is given after step (33) below);
(33) the fused travelable area is thereby finally obtained;
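A minimal sketch of the weight-accumulation grid of steps (32)–(33) (Python/NumPy; the map size, the placement of the map origin and the class name are assumptions for illustration; the 0.05 m resolution and the threshold of 3 follow the text):

```python
import numpy as np

class TravelableGrid:
    """Weight-accumulation grid: 0.05 m cells, +1 for travelable, -1 otherwise."""
    def __init__(self, size_m=100.0, resolution=0.05, threshold=3):
        n = int(size_m / resolution)
        self.weights = np.zeros((n, n), dtype=np.int32)
        self.resolution = resolution
        self.origin = size_m / 2.0      # map origin at the grid centre (assumption)
        self.threshold = threshold

    def _cell(self, x, y):
        return (int((x + self.origin) / self.resolution),
                int((y + self.origin) / self.resolution))

    def update(self, points_xy, travelable):
        delta = 1 if travelable else -1
        for x, y in points_xy:          # points already in the map frame (via TF)
            i, j = self._cell(x, y)
            if 0 <= i < self.weights.shape[0] and 0 <= j < self.weights.shape[1]:
                self.weights[i, j] += delta

    def travelable_mask(self):
        return self.weights >= self.threshold
```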
Step four, calibration of the RGB-D camera and the lidar: after the fused travelable area is obtained, it can only be used by the positioning system once it is aligned with the coordinate system of the existing SLAM system, so the extrinsic parameters RT between the RGB-D camera and the lidar need to be calibrated;
(41) before calibration, a calibration board is prepared in advance for detecting plane feature points and computing the plane normal vector; the calibration board is installed at a point on the map, and the calibration point information on the board is collected by the RGB-D camera and the lidar respectively;
(42) calculating a rotation matrix:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
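Given the plane normals of several board poses stacked as 3×k matrices N (camera frame) and M (lidar frame), the closed-form solve above can be sketched as follows (Python/NumPy; the final projection onto the nearest rotation matrix via SVD is an addition made here as an assumption, since the raw least-squares solution need not be exactly orthogonal):

```python
import numpy as np

def rotation_from_normals(N, M):
    """Solve R* · N = M in the least-squares sense; N, M are 3 x k normal matrices."""
    if np.linalg.matrix_rank(N @ N.T) < 3:
        raise ValueError("need normals spanning 3D (rank(N·N') = 3)")
    R = (M @ N.T) @ np.linalg.inv(N @ N.T)
    # project onto the nearest rotation matrix (SVD orthogonalization)
    U, _, Vt = np.linalg.svd(R)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```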
(43) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point.
(44) In the lidar coordinate system, the line connecting two feature points on a plane must be perpendicular to the plane normal vector. First, a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system through the extrinsic rotation and translation. In theory, if the extrinsics are accurate, the line connecting the two feature points must be perpendicular to the normal vector of the calibration board plane and their dot product is 0, so the extrinsic RT can be computed by selecting several groups of feature points and minimizing the above expression; a sketch of this refinement is given after step (45) below;
(45) the travelable area in the SLAM system coordinate system is computed through the extrinsic RT obtained in step (44).
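A sketch of the refinement in step (44) posed as a nonlinear least-squares problem (Python/SciPy; the Euler-angle parameterization, function name and initial-guess handling are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_extrinsics(p_cam, n_cam, q_lidar, rt0):
    """Refine (alpha, beta, gamma, x, y, z) so that the lidar plane points,
    transformed by (R, T), lie on the calibration-board plane defined by
    point p_cam[i] and normal n_cam[i] in the camera frame."""
    def residuals(params):
        R = Rotation.from_euler("xyz", params[:3]).as_matrix()
        t = params[3:]
        res = []
        for p, n, Q in zip(p_cam, n_cam, q_lidar):
            # point-to-plane distance for every lidar point of this board pose
            res.extend(((Q @ R.T + t) - p) @ n)
        return np.asarray(res)
    sol = least_squares(residuals, rt0)   # rt0: 6-vector, e.g. from the closed-form R
    return sol.x
```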
The traditional plane detection method carries no road surface semantic information and cannot be used directly for travelable area detection, while a CNN model alone usually cannot be applied directly in engineering because of the limited diversity of training samples and insufficient generalization. The travelable area perception technology of the invention therefore combines the CNN deep learning method with the traditional plane detection method: semantic information is obtained through the CNN model, and plane detection is then combined with it to eliminate occasional false detections and improve detection robustness. Falsely detected RGB images of the CNN model are obtained in the post-processing module through an adaptive threshold and then fed back into model training, forming a feedback annotation closed loop that guarantees the iterative upgrade capability of the method.
Although the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the details of the foregoing embodiments; various equivalent changes (for example in number, shape or position) may be made to the technical solution of the present invention within its technical spirit, and these equivalent changes all fall within the protection scope of the present invention.

Claims (10)

1. A travelable region perception method for a robot, characterized by comprising the following steps:
step one, the robot collects a road surface depth image and an RGB image, and the depth image and the RGB image are aligned;
step two, the RGB image is processed by a pre-trained road surface segmentation model to obtain a travelable area, plane detection is performed on the depth image to obtain a second travelable area, and the two travelable areas are aligned via step one;
step three, the travelable areas are fused by a multi-frame superposition and weight accumulation method to obtain the final travelable area.
2. The travelable region perception method according to claim 1, characterized in that: in step one, the robot collects the road surface depth image and the RGB image through an RGB-D camera; alternatively, the RGB image is collected by an RGB camera and the road surface depth image is collected by a depth camera or a lidar.
3. The travelable region perception method according to claim 1, characterized in that: before step three, whether the two travelable areas coincide is judged from their intersection-over-union; if they coincide, go to step three; if they do not coincide, the average distance from all inliers of the plane-detected travelable area to the plane is calculated and normalized to obtain a plane detection error value, a road surface segmentation error value is calculated from the confidence output by the road surface segmentation model, the two error values are compared, and the fusion in step three is performed with the travelable area having the smaller error value as the reference.
4. The travelable region perception method according to claim 1, characterized in that: the road surface segmentation model in step two is trained as follows:
(11) the robot collects RGB images under different working conditions, in different time periods and at different positions through a camera mounted on the robot, and the road surface in the RGB images is annotated;
(12) a neural network is trained on the images obtained in step (11) to obtain an annotation model, and the collected RGB images are annotated with the annotation model;
(13) a road surface segmentation model is designed; the parameters of the annotation model obtained in step (12) are taken as the initial parameters of the road surface segmentation model by a transfer learning method, and the model is trained on the annotated RGB images by a distillation learning method to obtain the road surface segmentation model.
5. The travelable region perception method according to claim 4, characterized in that: the designed road surface segmentation model specifically uses depthwise convolution, pointwise convolution, channel shuffle or Inception-style convolution structures with small 3×3 convolution kernels; in the network structure, a detail branch uses wide channels and shallow convolutions, a semantic branch uses narrow channels and deep convolutions, and a fusion layer uses an attention mechanism to fuse the two levels of information of different dimensionality.
6. The travelable region perception method according to claim 1, characterized in that: in step two, processing the RGB image with the pre-trained road surface segmentation model to obtain the travelable area specifically comprises:
(21) inputting the RGB image into the trained road surface segmentation model to obtain a road surface mask of the travelable area;
(22) down-sampling the road surface mask obtained in step (21) to obtain a number of discrete points on the road surface mask contour;
(23) filtering the discrete points on the road surface mask contour obtained after the down-sampling in step (22) with a Kalman filter to obtain the 2D coordinates of all the points, and thereby the travelable area.
7. The travelable region perception method according to claim 1, characterized in that: after the travelable area is obtained in step two, the area change of the travelable area between consecutive frames is detected in real time and compared with a preset area change threshold; when the area change between the previous and current frames is larger than the preset threshold, a false detection is judged to be present and the current frame image is fed back into training of the previously trained road surface segmentation model.
8. The travelable region perception method according to claim 1, characterized in that: in step two, performing plane detection on the depth image to obtain the travelable area specifically comprises:
(31) converting the depth image into a 3D point cloud, and assuming a plane model Ax + By + Cz + D = 0;
(32) randomly selecting three points from the 3D point cloud to construct a plane model, and calculating the parameters and plane normal vector of the plane model;
(33) substituting all points of the 3D point cloud into the plane model constructed in step (32) and counting the inliers; a point is judged to be an inlier if its distance to the plane is smaller than a set threshold;
(34) judging whether the number of inliers of the plane model reaches a set number; if not, go to step (35); if it does, take this plane model as the final plane model; the set number is satisfied when the inliers account for 80% of all points in the 3D point cloud;
(35) repeating steps (31) to (32), comparing the number of inliers of the current plane model with that of the plane model from the previous iteration, taking the plane model with the larger number of inliers as the candidate plane model, and recording its parameters and inlier count;
(36) repeating step (34) until all points in the 3D point cloud have been traversed; the iteration then ends and the final candidate plane model is taken as the final plane model, from which the travelable area is obtained.
9. The travelable region perception method according to claim 1, characterized in that: step three is specifically as follows:
(41) multi-frame superposition:
(411) the SLAM system feeds back the translation and rotation relation TF of the current frame image relative to the initial frame image;
(412) a travelable area is obtained for each frame image, converted into the vehicle body coordinate system through the mounting parameters of the RGB-D camera, and stitched via TF to obtain a stitched travelable area;
(42) weight accumulation:
(421) the whole map is virtualized as a grid map with a cell resolution of 0.05 m × 0.05 m; when the robot judges the cell corresponding to an area to be travelable, the weight of that cell is increased by 1; conversely, when it is judged non-travelable, the weight of that cell is decreased by 1;
(422) after running for a set time, cells with weight greater than or equal to the set threshold 3 are travelable areas, and cells with weight less than the set threshold 3 are non-travelable areas;
(43) the fused travelable area is thereby finally obtained.
10. The travelable region perception method according to claim 1, characterized in that the method further comprises an alignment step:
(51) a calibration board is installed at a point on the map, and the calibration point information on the calibration board is collected by the RGB-D camera and the lidar respectively;
(52) the rotation matrix is calculated:
R* · N = M
R* · (N · N′) = M · N′
R* = (M · N′) · (N · N′)⁻¹   (if rank(N · N′) = 3)
in the above formulas, R* is the rotation matrix, N is the normal vector, in the camera coordinate system, of the plane containing the calibration points on the calibration board, and M is the normal vector of the same plane in the lidar coordinate system;
(53) calculating a translation vector:
(α*, β*, γ*, x*, y*, z*) = argmin Σ_i Σ_j [ n_i · (R·q_{i,j} + T − p_i) ]²
in the above formula, α, β, γ are the Euler angles of the three rotational degrees of freedom, x, y, z are the translation distances of the three translational degrees of freedom, T is the translation vector, p_i is the coordinate of the marker point in the camera coordinate system, q_{i,j} are the coordinates of the feature points captured by the lidar in the lidar coordinate system, and n_i is the normal vector, in the camera coordinate system, of the plane containing the marker point;
(54) a feature point on the plane of the calibration board is detected in the lidar coordinate system; at the same time, a corresponding feature point on the plane of the calibration board is selected in the camera coordinate system and converted into the lidar coordinate system using the rotation and translation from steps (52) and (53);
(55) in the lidar coordinate system, the line connecting the two feature points must be perpendicular to the normal vector of the plane of the calibration board, i.e. the dot product of the connecting line and the normal vector is 0; step (54) is repeated to select several feature points, yielding a system of equations that is solved to obtain RT;
(56) the travelable area in the SLAM system coordinate system is calculated via the RT obtained in step (55).
CN202110677857.7A 2021-06-18 2021-06-18 Driving region sensing method for robot Pending CN113343875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110677857.7A CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110677857.7A CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Publications (1)

Publication Number Publication Date
CN113343875A true CN113343875A (en) 2021-09-03

Family

ID=77476314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110677857.7A Pending CN113343875A (en) 2021-06-18 2021-06-18 Driving region sensing method for robot

Country Status (1)

Country Link
CN (1) CN113343875A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445415A (en) * 2021-12-14 2022-05-06 中国科学院深圳先进技术研究院 Method for dividing a drivable region and associated device
CN116385949A (en) * 2023-03-23 2023-07-04 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116385949B (en) * 2023-03-23 2023-09-08 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116343159A (en) * 2023-05-24 2023-06-27 之江实验室 Unstructured scene passable region detection method, device and storage medium

Similar Documents

Publication Publication Date Title
US20210142095A1 (en) Image disparity estimation
US10408939B1 (en) Learning method and learning device for integrating image acquired by camera and point-cloud map acquired by radar or LiDAR corresponding to image at each of convolution stages in neural network and testing method and testing device using the same
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN108802785B (en) Vehicle self-positioning method based on high-precision vector map and monocular vision sensor
US20200250468A1 (en) Learning method and learning device for sensor fusion to integrate information acquired by radar capable of distance estimation and information acquired by camera to thereby improve neural network for supporting autonomous driving, and testing method and testing device using the same
CN113343875A (en) Driving region sensing method for robot
CN111201451A (en) Method and device for detecting object in scene based on laser data and radar data of scene
US10402978B1 (en) Method for detecting pseudo-3D bounding box based on CNN capable of converting modes according to poses of objects using instance segmentation and device using the same
CN106548173A (en) A kind of improvement no-manned plane three-dimensional information getting method based on classification matching strategy
CN104537709A (en) Real-time three-dimensional reconstruction key frame determination method based on position and orientation changes
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN111736586B (en) Method for automatically driving vehicle position for path planning and device thereof
CN113658337B (en) Multi-mode odometer method based on rut lines
US10728461B1 (en) Method for correcting misalignment of camera by selectively using information generated by itself and information generated by other entities and device using the same
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
US10748032B1 (en) Method for providing robust object distance estimation based on camera by performing pitch calibration of camera more precisely with fusion of information acquired through camera and information acquired through V2V communication and device using the same
CN109740584A (en) Automatic parking parking space detection method based on deep learning
CN110599497A (en) Drivable region segmentation method based on deep neural network
Zhao et al. A robust stereo feature-aided semi-direct SLAM system
EP3686776A1 (en) Method for detecting pseudo-3d bounding box to be used for military purpose, smart phone or virtual driving based-on cnn capable of converting modes according to conditions of objects and device using the same
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
CN112132745A (en) Multi-sub-map splicing feature fusion method based on geographic information
CN110135387B (en) Image rapid identification method based on sensor fusion
CN116403186A (en) Automatic driving three-dimensional target detection method based on FPN Swin Transformer and Pointernet++
Velat et al. Vision based vehicle localization for autonomous navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination