CN111757021A - Multi-sensor real-time fusion method for mobile robot remote takeover scene

Multi-sensor real-time fusion method for mobile robot remote takeover scene

Info

Publication number
CN111757021A
Authority
CN
China
Prior art keywords
fusion
mval
camera
images
image
Prior art date
Legal status
Granted
Application number
CN202010642390.8A
Other languages
Chinese (zh)
Other versions
CN111757021B (en
Inventor
李红
杨国青
朱春林
吕攀
吴朝晖
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010642390.8A priority Critical patent/CN111757021B/en
Publication of CN111757021A publication Critical patent/CN111757021A/en
Application granted granted Critical
Publication of CN111757021B publication Critical patent/CN111757021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Abstract

The invention discloses a multi-sensor real-time fusion method for a mobile robot remote takeover scene. To address the stitching seams and ghosting produced by plain linear fusion, the invention proposes a thresholded linear fusion method that eliminates seams and ghosting while preserving real-time performance, thereby improving the stitching quality. Taking the practical constraints of an intelligent mobile robot into account, the invention achieves real-time fusion of multiple high-definition video streams carrying depth information through offline parameter updating and a dedicated parallelization scheme, removing redundant multi-sensor information while reducing the transmission delay, and therefore has practical application value.

Description

Multi-sensor real-time fusion method for mobile robot remote takeover scene
Technical Field
The invention belongs to the technical field of multi-sensor fusion, and in particular relates to a multi-sensor real-time fusion method for a mobile robot remote takeover scene.
Background
With the development of science and technology in recent years, intelligent mobile robots have found increasingly wide application in everyday life; they can complete specific tasks in well-defined scenarios and partially replace human labor, for example AGV sorting robots, cleaning robots and autonomous trucks. At the current stage, however, intelligent mobile robots have not yet reached the level of handling problems fully autonomously. When such a robot is placed in a complex or unknown scene, its task-decision system may be unable to cope with the new situation; without remote assistance the situation can become uncontrollable and may even lead to accidents and losses. A robot may also suffer an abnormality in a functional module and be unable to move itself to a safe position to await maintenance, which causes considerable inconvenience. A remote takeover system combines the autonomous mobility of the intelligent mobile robot with the cognitive flexibility of a human operator: the operator provides assisted control, which safeguards the stability and safety of the intelligent mobile robot.
At present, a remote takeover system usually transmits the multiple video streams directly to the client for display, but this approach has the following problems: 1. only a single type of sensor is used, namely cameras that collect the environmental information, so important distance information is lost; 2. the transmitted data volume is large and highly redundant, placing high demands on network bandwidth and transmission speed; 3. the display of the multiple video streams is not synchronized; 4. the raw video streams are shown side by side, which makes it inconvenient for the operator to make decisions. Problem 1 is solved by adding a lidar sensor to collect point-cloud data; problems 2, 3 and 4 are addressed through multi-sensor fusion.
Chinese patent publication No. CN108495060A provides a real-time stitching method for high-definition video: after initialization, every step of the video-stitching stage is accelerated in parallel, so that the stitching frame rate for two video streams reaches 30 fps or more while problems such as ghosting and blur are suppressed.
Chinese patent publication No. CN103856727B proposes a real-time multi-channel video stream stitching method that shortens execution time by means of offline parameter updating and achieves real-time stitching of multi-camera video streams, but the method has several problems: 1. the image resolution used is 704 × 576, which is too low for a remote takeover system and hinders recognition and decision making; 2. multi-image fusion is performed pairwise, which greatly increases the system delay; 3. the method is only suitable for relatively static surveillance scenes and does not handle dynamic objects crossing the stitching seam.
Chinese patent publication No. CN106526605B proposes a data fusion method for a lidar and a depth camera: the angles and distances of the first pixel points of the depth image and of the lidar point cloud are computed to obtain two different sets of polar-coordinate strings, which are then fused in order of angle. Although this method extends the detection range and improves information accuracy, it has the following problems: 1. it fuses a single binocular camera with the lidar, yet binocular cameras are expensive and a single camera captures limited visual information; 2. the fusion method cannot meet real-time requirements and is therefore unsuitable for the remote takeover of an intelligent mobile robot.
Disclosure of Invention
In view of the above, the invention provides a multi-sensor real-time fusion method for a mobile robot remote takeover scene. A lidar and multiple cameras are used to acquire visual and distance information, compensating for the shortcomings of a single sensor; by fusing the lidar with the cameras, a video stream with depth information is generated so that the operator can perceive the distance of surrounding objects. To address the stitching seams, ghosting and similar problems of plain linear fusion, the invention proposes a thresholded linear fusion method that eliminates seams and ghosting while preserving real-time performance and improves the stitching quality. Exploiting the fact that the cameras are mounted at fixed relative positions, the method achieves real-time fusion of the multiple high-definition video streams through an offline parameter-updating technique and a dedicated parallelization scheme, removing redundant multi-sensor information while reducing the transmission delay; it therefore has practical application value.
A multi-sensor real-time fusion method for a mobile robot remote takeover scene comprises the following steps:
(1) equipping the mobile robot with three monocular cameras and a lidar, computing the global homography matrices between the cameras by offline parameter updating, and storing them;
(2) fusing the point-cloud data acquired by the lidar with the video images acquired by the cameras to generate three video streams with depth information;
(3) preprocessing every frame of each depth-augmented video stream, including distortion correction and cylindrical projection;
(4) applying a projective transformation to the preprocessed video images using the global homography matrices, fusing the images with the thresholded linear fusion method, and fusing the three video streams in real time with a dedicated parallelization scheme to obtain the final fusion result.
A camera captures two-dimensional visual information but loses important distance information and performs poorly in harsh environments and weather; a lidar provides accurate distance information but has low resolution. The invention therefore uses a lidar and three monocular cameras together, compensating for the shortcomings of a single sensor. The three cameras and the lidar are mounted at fixed relative positions: one camera faces straight ahead, the other two are placed on the left and right sides, and the lidar is mounted above the front camera where it is not occluded by other objects. Multi-stream fusion requires a certain overlap between images; the larger the overlap, the better the fusion result but the less total information is covered. To obtain a good fusion result while still capturing enough of the left and right environment, the left and right cameras are angled at 45 to 60 degrees relative to the front camera.
During real-time fusion of the video streams, the projective transformation between images must be carried out with the global homography matrices between the cameras; computing these matrices for every fused frame cannot meet the real-time requirement. Since the relative positions of the cameras are fixed, the global homography matrices are also essentially fixed, so they are computed and stored during the offline parameter-updating stage, which speeds up multi-camera fusion. The offline parameter updating in step (1) is implemented as follows: first, the video images collected by the three cameras are read; SURF feature detection and descriptor extraction are then performed on the images with the SurfFeatureDetector and SurfDescriptorExtractor functions; matched feature-point pairs are extracted with a FlannBasedMatcher matcher; finally, the global homography matrices between the cameras are computed with the findHomography function.
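For illustration, the following sketch reproduces this offline stage in Python with OpenCV. The patent names the OpenCV C++ classes, so the Python equivalents (cv2.xfeatures2d.SURF_create, cv2.FlannBasedMatcher, cv2.findHomography) are used here; the image file names, the Lowe ratio test used to discard mismatches and the .npz storage format are illustrative assumptions rather than details fixed by the patent.

```python
# Offline parameter update: estimate and store the global homographies between
# the front camera and the left/right cameras. Requires opencv-contrib-python
# for the SURF implementation.
import cv2
import numpy as np

def compute_global_homography(img_ref, img_src, ratio=0.75):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp_ref, des_ref = surf.detectAndCompute(img_ref, None)
    kp_src, des_src = surf.detectAndCompute(img_src, None)

    # FLANN-based matching; the ratio test stands in for the mismatch
    # filtering described in the embodiment.
    matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = matcher.knnMatch(des_src, des_ref, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]

    src_pts = np.float32([kp_src[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # H maps points of img_src into the coordinate frame of img_ref
    H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    return H

if __name__ == "__main__":
    front = cv2.imread("front.png", cv2.IMREAD_GRAYSCALE)   # assumed file names
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
    np.savez("homographies.npz",
             H_left=compute_global_homography(front, left),
             H_right=compute_global_homography(front, right))
```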
To give the operator a reliable grasp of distance information while avoiding interference between the multi-sensor data, the invention fuses the lidar with the cameras to obtain video streams with depth information. Step (2) is implemented as follows: the lidar is first jointly calibrated with each camera to obtain the rotation matrix and translation vector between the lidar coordinate system and the camera coordinate system; then, for point-cloud data and a video image captured at the same moment, the point cloud is projected onto the video image and colored according to its depth, producing a video stream with depth information. The coloring of the point cloud depends on the range over which the object depth values vary; this range differs between cameras, so the same distance would otherwise be rendered in different colors, and the depth ranges of the different cameras therefore have to be unified. The invention sets a threshold according to the actual scene, which keeps the point-cloud coloring consistent across the video images of the three cameras and avoids color fluctuation at the seams after the three video streams are fused.
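The projection-and-coloring step can be sketched as follows. It assumes a pinhole model with intrinsic matrix K together with the rotation matrix R and translation vector t obtained from the joint calibration; the fixed depth range [D_MIN, D_MAX] plays the role of the unified threshold, and its values are illustrative only.

```python
# Project lidar points into a camera image and color them by depth so that
# all three cameras share the same color scale.
import cv2
import numpy as np

D_MIN, D_MAX = 0.5, 30.0  # meters; assumed scene-dependent limits

def render_depth_overlay(image, points_lidar, R, t, K):
    # transform lidar points into the camera frame: X_cam = R @ X_lidar + t
    pts_cam = (R @ points_lidar.T + t.reshape(3, 1)).T
    pts_cam = pts_cam[pts_cam[:, 2] > 0]            # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)       # perspective division -> pixel coords

    h, w = image.shape[:2]
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, depth = uv[ok], pts_cam[ok, 2]

    # identical fixed range for every camera -> identical color for identical distance
    norm = np.clip((depth - D_MIN) / (D_MAX - D_MIN), 0.0, 1.0)
    colors = cv2.applyColorMap((norm * 255).astype(np.uint8).reshape(-1, 1),
                               cv2.COLORMAP_JET)    # shape (N, 1, 3)
    out = image.copy()
    for (u, v), c in zip(uv, colors[:, 0, :]):
        cv2.circle(out, (int(u), int(v)), 2, tuple(int(x) for x in c), -1)
    return out
```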
After the depth-augmented video streams have been acquired, they have to be fused. Image distortion would degrade the fusion result, so each camera is calibrated with a checkerboard and its intrinsic and extrinsic parameters are used to correct the distortion. Because the images do not lie in a common planar coordinate system, stitching them directly would distort the result; the planar images in their different coordinate systems are therefore projected onto a common cylindrical surface. In step (3), the cylindrical projection of a video image is computed with the following formulas:
x’=f*θ
y’=y*cos(θ)
θ=arctan(x/f)
wherein: for any pixel point, x and y are its horizontal and vertical coordinates in the video image (measured relative to the optical axis), x' and y' are its horizontal and vertical coordinates in the projected image, f is the focal length of the camera, and θ is the deflection angle of the pixel in the video image relative to the optical axis.
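A possible implementation of this projection as a lookup table for cv2.remap is sketched below. It assumes the pixel coordinates are taken relative to the image center and uses the inverse mapping (output pixel back to source pixel) that remap requires, a detail the formulas above leave implicit.

```python
# Cylindrical projection via a precomputed remap table, following
# x' = f*theta, y' = y*cos(theta), theta = arctan(x/f) with coordinates
# measured from the image center.
import cv2
import numpy as np

def build_cylinder_maps(width, height, f):
    cx, cy = width / 2.0, height / 2.0
    xs, ys = np.meshgrid(np.arange(width, dtype=np.float32),
                         np.arange(height, dtype=np.float32))
    theta = (xs - cx) / f                                # deflection angle of each output column
    map_x = (f * np.tan(theta) + cx).astype(np.float32)             # x = f*tan(theta)
    map_y = ((ys - cy) / np.cos(theta) + cy).astype(np.float32)     # y = y'/cos(theta)
    return map_x, map_y

def cylindrical_project(frame, f):
    h, w = frame.shape[:2]
    map_x, map_y = build_cylinder_maps(w, h, f)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

In practice the table only needs to be built once per camera (and can be composed with the distortion-correction maps), after which every frame costs just one remap call.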
The preprocessing operations, distortion correction and cylindrical projection, are independent across the video streams, so they can be accelerated with multiple threads and the several video streams can be preprocessed simultaneously.
After each video stream has been preprocessed, the images undergo the projective transformation given by the global homography matrices obtained in the offline parameter-updating stage and are then fused. The commonly used image fusion methods are linear fusion and Laplacian fusion: linear fusion is fast but its stitching quality is poor, with seams and ghosting, while Laplacian fusion stitches well but is slow and cannot meet the real-time requirement. The thresholded linear fusion method of step (4) is implemented as follows:
For the overlap region of the two images, let p be any pixel in that region and let pval1 and pval2 be its values in the two images; mval is the linearly weighted fusion value of p, Mval is the maximum of pval1 and pval2, and pval is the final fused value of p, where:
mval=w*pval1+(1-w)*pval2
Mval=max(pval1,pval2)
pm1=|pval1-mval|
pm2=|pval2-mval|
The overlap region is divided from left to right into three equal areas A1, A2 and A3. In area A1, if pm1 is less than 5, pval is set to mval; otherwise pval is set to 0.5(pval1+mval).
In area A2, when pm1 and pm2 are both less than 5, pval is set to mval; when pm1 and pm2 are both greater than 5, pval is set to Mval; otherwise pval is set to 0.5(mval+Mval).
In area A3, if pm2 is less than 5, pval is set to mval; otherwise pval is set to 0.5(pval2+mval). Here w is a weighting coefficient between 0 and 1 whose value decreases gradually from left to right across the overlap region.
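The rules above can be expressed as the following vectorized NumPy sketch. The linear ramp chosen for w, the per-channel application and the threshold value of 5 (taken from the embodiment below) are assumptions where the patent text leaves the details open.

```python
# Thresholded linear fusion of the overlap region of two aligned images.
import numpy as np

def fuse_overlap(ov1, ov2, thresh=5.0):
    """ov1, ov2: overlap regions (H x W x C) taken from the two warped images."""
    ov1 = ov1.astype(np.float32)
    ov2 = ov2.astype(np.float32)
    h, w_cols, _ = ov1.shape

    # weight w decreases from 1 to 0 from left to right across the overlap
    w = np.linspace(1.0, 0.0, w_cols, dtype=np.float32)[None, :, None]
    mval = w * ov1 + (1.0 - w) * ov2        # linear weighted fusion
    Mval = np.maximum(ov1, ov2)             # per-pixel maximum
    pm1 = np.abs(ov1 - mval)
    pm2 = np.abs(ov2 - mval)

    third = w_cols // 3
    out = np.empty_like(mval)

    a1 = slice(0, third)                    # A1: lean towards the left image
    out[:, a1] = np.where(pm1[:, a1] < thresh, mval[:, a1],
                          0.5 * (ov1[:, a1] + mval[:, a1]))

    a2 = slice(third, 2 * third)            # A2: keep mval / Mval / their mean
    small = (pm1[:, a2] < thresh) & (pm2[:, a2] < thresh)
    large = (pm1[:, a2] > thresh) & (pm2[:, a2] > thresh)
    out[:, a2] = np.where(small, mval[:, a2],
                 np.where(large, Mval[:, a2], 0.5 * (mval[:, a2] + Mval[:, a2])))

    a3 = slice(2 * third, w_cols)           # A3: lean towards the right image
    out[:, a3] = np.where(pm2[:, a3] < thresh, mval[:, a3],
                          0.5 * (ov2[:, a3] + mval[:, a3]))
    return np.clip(out, 0, 255).astype(np.uint8)
```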
For the three images A, B and C to be fused, the conventional multi-image fusion approach first fuses A and B to obtain D and then fuses D with C to obtain the final result, which greatly increases the system delay. Taking the actual configuration of the intelligent mobile robot into account, the invention accelerates this step with a dedicated parallelization scheme: in step (4), the image of the front camera is used as the reference image and is fused with the images of the left and right cameras respectively. Because the two fusions do not interfere with each other, they can be accelerated with multithreading; and since both fusion results contain the same front-camera image content, the final fused image is obtained by simple cropping and concatenation.
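A structural sketch of this parallelization is given below. Here fuse_pair stands for the warp-and-fuse step described above, and the assumption that the reference (front) image keeps its original width at the adjoining edge of the right-hand partial panorama is illustrative; the exact cropping depends on the camera geometry.

```python
# Fuse front+left and front+right in two independent threads, then drop the
# duplicated front-camera region and concatenate the results.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fuse_three(front, left, right, H_left, H_right, fuse_pair):
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_l = pool.submit(fuse_pair, front, left, H_left)    # -> [ left | front ]
        fut_r = pool.submit(fuse_pair, front, right, H_right)  # -> [ front | right ]
        pano_left, pano_right = fut_l.result(), fut_r.result()

    # both partial panoramas contain the same front-camera content; keep it once
    # (assumes both partial panoramas share the same height)
    return np.hstack([pano_left, pano_right[:, front.shape[1]:]])
```

Because the submitted calls spend most of their time inside OpenCV's native routines, which normally release the Python GIL, plain threads usually give a real speed-up here; a process pool or GPU streams could be substituted if needed.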
Further, the image fusion in step (4) is accelerated with the CUDA technology.
Based on the technical scheme, the invention has the following beneficial technical effects:
1. The invention provides a multi-sensor real-time fusion method for the scene of remote takeover of a mobile robot, applying multi-sensor fusion to the mobile robot and overcoming the shortcomings of a single sensor. By fusing the lidar with multiple cameras to generate a video stream with depth information, the operator can perceive the distance of surrounding objects; at the same time, redundant data from the multiple sensors are removed, which speeds up data transmission.
2. The invention remedies the problems of plain linear fusion by fusing the images with a thresholded linear fusion method, which greatly improves the fusion quality while preserving timeliness.
3. Considering the actual situation of the intelligent mobile robot and the high delay of conventional multi-stream fusion, the invention performs multi-camera fusion through offline parameter updating and a dedicated parallelization scheme and accelerates it with the CUDA (Compute Unified Device Architecture) technology, meeting the real-time requirement on an embedded platform.
Drawings
FIG. 1 is a schematic flow chart of a multi-sensor real-time fusion method according to the present invention.
FIG. 2 is a schematic comparison of the delay of the parallelized fusion scheme and the traditional pairwise fusion scheme.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
As shown in FIG. 1, the multi-sensor real-time fusion method for a mobile robot remote takeover scene of the present invention specifically comprises the following steps:
step 1: the intelligent mobile robot adopts three cameras, and the positions of the cameras are relatively fixed.
In a remote takeover scene, a camera can acquire visual information but loses important distance information and performs poorly in harsh environments and weather, while a lidar provides accurate distance information but has low resolution; the lidar and three monocular cameras are therefore used together to compensate for the shortcomings of a single sensor. One camera faces straight ahead and the other two are placed on the left and right sides. Multi-stream fusion requires a certain overlap between images; the larger the overlap, the better the fusion result but the less total information is covered. To obtain a good fusion result while still capturing enough of the left and right environment, the left and right cameras are angled at 45 to 60 degrees relative to the front camera. The lidar is mounted above the front camera where it is not occluded by other objects. In this embodiment the intelligent mobile robot uses a platform built around a Jetson TX2 development board, which provides a multi-core CPU and a high-performance GPU.
Step 2: the global homography matrices between the cameras are computed by offline parameter updating and stored.
During multi-stream fusion, the projective transformation between images must be carried out with the global homography matrices between the cameras; computing these matrices for every fused frame cannot meet the real-time requirement. Since the camera positions are fixed relative to one another, the global homography matrices are also essentially fixed, so they are computed and stored in the offline parameter-updating stage, which speeds up multi-camera fusion.
The real-time multi-camera fusion method is divided into an offline parameter-updating stage and a real-time fusion stage, where the offline parameter-updating stage comprises: reading the images, extracting SURF features, computing the global homography matrices and storing the parameters. This embodiment is implemented with OpenCV: a SURF feature detector and descriptor extractor are created with the SurfFeatureDetector and SurfDescriptorExtractor functions, feature-point pairs are extracted with a FlannBasedMatcher matcher, matched pairs whose distance exceeds three times the minimum distance over all matches are treated as mismatches and removed, and the global homography matrices between the cameras are then computed with the findHomography function.
Step 3: the lidar point-cloud data and the data of the multiple cameras are fused to generate multiple video streams with depth information.
To give the operator a reliable grasp of distance information while avoiding interference between the multi-sensor data, the lidar and the cameras are fused to obtain images with depth information. The lidar is first jointly calibrated with each of the cameras to obtain the rotation matrix and translation vector between the lidar coordinate system and the camera coordinate system; then, for point-cloud data and an image captured at the same moment, the point cloud is projected onto the image and colored according to its depth. The coloring of the point cloud depends on the range over which the object depth values vary; this range differs between cameras, so the same distance would otherwise be rendered in different colors, and the depth ranges of the different cameras therefore have to be unified. The invention sets a threshold according to the actual scene, which keeps the point-cloud coloring of the depth images generated by fusing the different cameras with the lidar consistent and ensures that the multi-camera video streams show no excessive color fluctuation at the stitching seams.
Step 4: distortion correction and cylindrical projection are applied to every frame of the depth-augmented video streams.
After the depth-augmented video streams have been acquired they need to be fused, and image distortion would degrade the fusion result; each camera is therefore calibrated with a checkerboard and its intrinsic and extrinsic parameters are used to correct the distortion. Because the images do not lie in a common planar coordinate system, stitching them directly would distort the result, so the planar images in their different coordinate systems are projected onto a common cylindrical surface. Distortion correction and cylindrical projection are independent across the video streams, so they can be accelerated with multiple threads and the streams can be preprocessed simultaneously. In the concrete implementation this is combined with the ROS system: the depth-augmented video streams are read through the multi-node subscription mechanism of ROS, which provides soft synchronization of the data; for cylindrical projection and distortion correction, a projection transformation map is computed from the camera's intrinsic and extrinsic parameters and the cylindrical-projection formulas, after which the preprocessing of each image is completed with the OpenCV remap function, which itself brings a certain speed-up.
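A minimal rospy sketch of this soft synchronization and remap-based preprocessing is shown below. The topic names, queue size, slop and the precomputed-map file are assumptions, not values given in the patent; the maps are expected to combine distortion correction and cylindrical projection as described above.

```python
# Soft-synchronize the three camera streams with message_filters, then apply
# the precomputed remap tables to each frame.
import cv2
import numpy as np
import rospy
import message_filters
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()
maps = np.load("precomputed_maps.npz")          # assumed file produced offline
MAP_X, MAP_Y = maps["map_x"], maps["map_y"]

def preprocess(frame):
    return cv2.remap(frame, MAP_X, MAP_Y, cv2.INTER_LINEAR)

def callback(msg_left, msg_front, msg_right):
    frames = [preprocess(bridge.imgmsg_to_cv2(m, "bgr8"))
              for m in (msg_left, msg_front, msg_right)]
    # ... hand the three preprocessed frames to the fusion stage ...

if __name__ == "__main__":
    rospy.init_node("multi_camera_fusion")
    topics = ("/camera_left/image", "/camera_front/image", "/camera_right/image")
    subs = [message_filters.Subscriber(t, Image) for t in topics]
    sync = message_filters.ApproximateTimeSynchronizer(subs, queue_size=10, slop=0.05)
    sync.registerCallback(callback)
    rospy.spin()
```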
Step 5: the preprocessed images undergo the projective transformation and are fused with the thresholded linear fusion method.
After each video stream has been preprocessed, the images undergo the projective transformation given by the global homography matrices obtained in the offline parameter-updating stage and are then fused. The commonly used image fusion methods are linear fusion and Laplacian fusion: linear fusion is fast but its stitching quality is poor, with seams and ghosting, while Laplacian fusion stitches well but is slow and cannot meet the real-time requirement. To address these problems, the invention improves the linear fusion method and proposes a thresholded linear fusion algorithm, whose principle is as follows:
Let area1 and area2 be the overlap regions of the two images to be stitched, with pixel values pval1 and pval2; pval is the pixel value in the final fused image, mval is the pixel value obtained by linear weighted fusion, Mval is the maximum of pval1 and pval2, and θ1 and θ2 are the absolute differences between the overlap pixel values and mval. The formulas are as follows, where w is a weight in the range 0 to 1 that decreases gradually from left to right:
mval=w*pval1+(1-w)*pval2
Mval=max(pval1,pval2)
θ1=|pval1-mval|
θ2=|pval2-mval|
The overlap region is divided into three equal areas A1, A2 and A3. Area A1 leans towards the left image: when θ1 is smaller than θ, pval is mval; otherwise pval is 0.5(pval1+mval). In area A2, when θ1 and θ2 are both smaller than θ, pval is mval; when θ1 and θ2 are both larger than θ, pval is Mval; otherwise pval is 0.5(mval+Mval). Area A3 leans towards the right stitched image: when θ2 is smaller than θ, pval is mval; otherwise pval is 0.5(pval2+mval). In this embodiment θ is set to 5.
Step 6: the depth-augmented video streams are fused in real time with the dedicated parallelization scheme to obtain the final fusion result.
For the three images A, B and C to be stitched, the conventional multi-image fusion approach first fuses A and B to obtain D and then fuses D with C to obtain the final result, which greatly increases the system delay. Taking the actual configuration of the intelligent mobile robot into account, the invention accelerates this step with a dedicated parallelization scheme: the image of the front camera is used as the reference image and is fused with the left and right camera images respectively, and because the two fusions do not interfere with each other they can be accelerated with multithreading. Since both fusion results contain the same front-camera image content, the final image is obtained by cropping and concatenation.
In addition, this embodiment accelerates the relevant steps with the CUDA technology, implemented through the OpenCV CUDA module, with data exchanged between GpuMat and Mat matrices.
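A minimal sketch of this Mat/GpuMat exchange is shown below. It assumes an OpenCV build with the CUDA modules enabled; the GPU perspective warp merely illustrates the kind of step that is offloaded in this way.

```python
# Upload a frame to the GPU, warp it with the global homography, download the result.
import cv2

def warp_on_gpu(frame, H, out_size):
    gpu_src = cv2.cuda_GpuMat()
    gpu_src.upload(frame)                                   # Mat (numpy) -> GpuMat
    gpu_dst = cv2.cuda.warpPerspective(gpu_src, H, out_size)
    return gpu_dst.download()                               # GpuMat -> Mat (numpy)
```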
The traditional multi-stream fusion method and the parallelization scheme designed for the intelligent mobile robot were compared by recording the delay of 30 consecutive frames, with both schemes accelerated through offline parameter updating and the CUDA technology. The results are shown in FIG. 2: the average delay of pairwise multi-camera fusion is 0.0334 s, while the average delay of the parallelized scheme is 0.0283 s, so the parallelization reduces the stitching delay by 15.2%. The average delay of the lidar-camera fusion and preprocessing stages is 0.0523 s; the complete stitching pipeline with the parallelized scheme can therefore process 12 frames per second, which meets the real-time requirement of the intelligent mobile robot scene.
For the scene of remote takeover of the intelligent mobile robot, the invention acquires data with a lidar and multiple cameras, compensating for the shortcomings of a single sensor, and fuses the multi-sensor data, which reduces the transmitted data volume and speeds up data transmission. The stitching seams, ghosting and related problems of linear fusion are addressed by fusing with the thresholded linear fusion method, which improves the stitching quality while preserving timeliness. Exploiting the fixed relative positions of the cameras, real-time fusion of the high-definition depth-augmented video streams is achieved by accelerating with the offline parameter-updating technique and the dedicated parallelization scheme.
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. The invention is therefore not limited to the embodiments above; improvements and modifications made by those skilled in the art on the basis of this disclosure fall within the protection scope of the invention.

Claims (8)

1. A multi-sensor real-time fusion method for a mobile robot remote takeover scene comprises the following steps:
(1) equipping the mobile robot with three monocular cameras and a lidar, computing the global homography matrices between the cameras by offline parameter updating, and storing them;
(2) fusing the point-cloud data acquired by the lidar with the video images acquired by the cameras to generate three video streams with depth information;
(3) preprocessing every frame of each depth-augmented video stream, including distortion correction and cylindrical projection;
(4) applying a projective transformation to the preprocessed video images using the global homography matrices, fusing the images with the thresholded linear fusion method, and fusing the three video streams in real time with a dedicated parallelization scheme to obtain the final fusion result.
2. The multi-sensor real-time fusion method according to claim 1, characterized in that: the three cameras and the lidar are mounted at fixed relative positions; one camera faces straight ahead, the other two cameras are placed on the left and right sides respectively, the lidar is mounted above the front camera, and the left and right cameras are angled at 45 to 60 degrees relative to the front camera.
3. The multi-sensor real-time fusion method according to claim 1, characterized in that: the offline parameter updating in step (1) is implemented as follows: first, the video images collected by the three cameras are read; SURF feature detection and descriptor extraction are then performed on the images with the SurfFeatureDetector and SurfDescriptorExtractor functions; matched feature-point pairs are extracted with a FlannBasedMatcher matcher; finally, the global homography matrices between the cameras are computed with the findHomography function.
4. The multi-sensor real-time fusion method according to claim 1, characterized in that: step (2) is implemented as follows: the lidar is first jointly calibrated with each camera to obtain the rotation matrix and translation vector between the lidar coordinate system and the camera coordinate system; for point-cloud data and a video image captured at the same moment, the point cloud is projected onto the video image and colored according to its depth, producing a video stream with depth information; a threshold is then set according to the actual scene, which keeps the point-cloud coloring projected into the video images of the three cameras consistent and ensures that there is no color fluctuation at the seams after the three video streams are fused.
5. The multi-sensor real-time fusion method according to claim 1, characterized in that: the cylindrical projection of a video image in step (3) is computed with the following formulas:
x’=f*θ
y’=y*cos(θ)
θ=arctan(x/f)
wherein: for any pixel point, x and y are its horizontal and vertical coordinates in the video image (measured relative to the optical axis), x' and y' are its horizontal and vertical coordinates in the projected image, f is the focal length of the camera, and θ is the deflection angle of the pixel in the video image relative to the optical axis.
6. The multi-sensor real-time fusion method according to claim 1, characterized in that: the thresholded linear fusion method of step (4) is implemented as follows:
For the overlap region of the two images, let p be any pixel in that region and let pval1 and pval2 be its values in the two images; mval is the linearly weighted fusion value of p, Mval is the maximum of pval1 and pval2, and pval is the final fused value of p, where:
mval=w*pval1+(1-w)*pval2
Mval=max(pval1,pval2)
pm1=|pval1-mval|
pm2=|pval2-mval|
The overlap region is divided from left to right into three equal areas A1, A2 and A3. In area A1, if pm1 is less than 5, pval is set to mval; otherwise pval is set to 0.5(pval1+mval).
In area A2, when pm1 and pm2 are both less than 5, pval is set to mval; when pm1 and pm2 are both greater than 5, pval is set to Mval; otherwise pval is set to 0.5(mval+Mval).
In area A3, if pm2 is less than 5, pval is set to mval; otherwise pval is set to 0.5(pval2+mval). Here w is a weighting coefficient between 0 and 1 whose value decreases gradually from left to right across the overlap region.
7. The multi-sensor real-time fusion method according to claim 1, characterized in that: in the parallelization scheme of step (4), the image of the front camera is used as the reference image and is fused with the images of the left and right cameras respectively; the two fusions do not interfere with each other and can be accelerated with multithreading, and since both fusion results contain the same front-camera image content, the final fused image is obtained by simple cropping and concatenation.
8. The multi-sensor real-time fusion method according to claim 1, characterized in that: the image fusion in step (4) is accelerated with the CUDA (Compute Unified Device Architecture) technology.
CN202010642390.8A 2020-07-06 2020-07-06 Multi-sensor real-time fusion method for mobile robot remote takeover scene Active CN111757021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010642390.8A CN111757021B (en) 2020-07-06 2020-07-06 Multi-sensor real-time fusion method for mobile robot remote takeover scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010642390.8A CN111757021B (en) 2020-07-06 2020-07-06 Multi-sensor real-time fusion method for mobile robot remote takeover scene

Publications (2)

Publication Number Publication Date
CN111757021A true CN111757021A (en) 2020-10-09
CN111757021B CN111757021B (en) 2021-07-20

Family

ID=72680428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010642390.8A Active CN111757021B (en) 2020-07-06 2020-07-06 Multi-sensor real-time fusion method for mobile robot remote takeover scene

Country Status (1)

Country Link
CN (1) CN111757021B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856727A (en) * 2014-03-24 2014-06-11 北京工业大学 Multichannel real-time video splicing processing system
CN106556825A (en) * 2015-09-29 2017-04-05 北京自动化控制设备研究所 A kind of combined calibrating method of panoramic vision imaging system
CN106526605A (en) * 2016-10-28 2017-03-22 北京康力优蓝机器人科技有限公司 Data fusion method and data fusion system for laser radar and depth camera
US20190228537A1 (en) * 2018-01-23 2019-07-25 Ricoh Company, Ltd. Image processing method, image processing apparatus, and on-board device
CN108495060A (en) * 2018-03-26 2018-09-04 浙江大学 A kind of real-time joining method of HD video
CN110163968A (en) * 2019-05-28 2019-08-23 山东大学 RGBD camera large-scale three dimensional scenario building method and system
CN111163350A (en) * 2019-12-06 2020-05-15 Oppo广东移动通信有限公司 Image processing method, terminal and computer storage medium
CN111045017A (en) * 2019-12-20 2020-04-21 成都理工大学 Method for constructing transformer substation map of inspection robot by fusing laser and vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李彬 (Li Bin): "Research on Real-Time Panoramic Video Stitching Technology" (全景视频实时拼接技术研究), Wanfang Data *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379831A (en) * 2021-06-22 2021-09-10 北京航空航天大学青岛研究院 Augmented reality method based on binocular camera and humanoid robot
CN113379831B (en) * 2021-06-22 2022-09-09 北京航空航天大学青岛研究院 Augmented reality method based on binocular camera and humanoid robot
CN115690219A (en) * 2023-01-03 2023-02-03 山东矩阵软件工程股份有限公司 Method and system for detecting three-dimensional information of running train in complex environment

Also Published As

Publication number Publication date
CN111757021B (en) 2021-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant