CN112396562B - Disparity map enhancement method based on fusion of RGB and DVS images in high dynamic range scene - Google Patents
- Publication number: CN112396562B (application CN202011283187.2A)
- Authority: CN (China)
- Prior art keywords: image, images, dvs, camera, fusion
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/90
- G06T5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
- G06T7/13 — Edge detection
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
- G06T2207/20221 — Image fusion; image merging
- Y02T10/40 — Engine management systems
Abstract
The invention belongs to the field of robot perception, and particularly relates to a disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene, comprising the following steps: S1, deploying a binocular RGB camera and a DVS camera and calibrating them; S2, acquiring RGB images of the binocular camera and DVS images of the scene, registering them, and performing multi-scale weighted fusion; S3, generating from the fused image an HDR image oriented to computer vision; S4, generating a disparity map from the HDR image of step S3 using an improved binocular stereo matching algorithm (SGM). In scenes with a large imaging dynamic range, such as tunnels, the method overcomes camera underexposure and overexposure and improves the quality of the generated image; at the same time, to address discontinuity and instability in image edge regions, it enriches edge detail information as much as possible by introducing additional information sources, improving the accuracy of the finally generated disparity map at image edges.
Description
Technical Field
The invention belongs to the field of robot perception, and particularly relates to a disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene.
Background
HDR (High Dynamic Range) images provide a greater dynamic range and more image detail than ordinary images. An HDR image is synthesized from LDR (Low Dynamic Range) images captured at different exposure times, using the LDR image with the best detail at each exposure level.
Camera calibration determines the parameters of the sensor's imaging geometric model, which relates the three-dimensional position of a point on the surface of a spatial object to its corresponding pixel in the image. Calibration is divided into intrinsic and extrinsic calibration: intrinsic calibration obtains the projection relation between the camera coordinate system and the image coordinate system, while extrinsic calibration obtains the coordinate transformation between the world coordinate system and the camera coordinate system, generally described by a rotation matrix R and a translation vector T.
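The two-stage model can be sketched in a few lines of Python. The intrinsic matrix K and extrinsics R, T below are illustrative values only, not parameters of any camera used in the patent:

```python
import numpy as np

# Pinhole model: a world point X_w maps to a pixel via the extrinsics (R, T)
# found by extrinsic calibration and the intrinsic matrix K found by
# intrinsic calibration. All numeric values here are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # focal lengths and principal point
R = np.eye(3)                           # rotation, world -> camera
T = np.array([0.0, 0.0, 5.0])           # translation, world -> camera

def project(X_w):
    """Project a 3D world point to pixel coordinates (u, v)."""
    X_c = R @ X_w + T                   # world -> camera coordinates
    u, v, w = K @ X_c                   # homogeneous image coordinates
    return u / w, v / w
```

A point on the optical axis 5 units in front of this camera projects exactly onto the principal point (320, 240).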
The image fusion is to integrate two or more images into a new image by using a specific algorithm, so that the fused image contains more information. Image fusion algorithms that are currently in common use include: mathematical morphology, HIS transformation, laplacian pyramid fusion, wavelet transformation, and the like.
The binocular stereo matching algorithm obtains a parallax image through left and right viewpoint images of the same scene, and further obtains a depth image. The most commonly used algorithm at present is the semi-global matching (SGM) algorithm.
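For illustration, a brute-force sum-of-absolute-differences block matcher — a much simpler relative of SGM, shown here only to make the notion of disparity concrete; the function name and parameters are invented for this sketch, and SGM additionally aggregates costs along scanline paths with smoothness penalties:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, win=3):
    """Naive SAD block matching: for each left-image pixel, find the
    horizontal shift into the right image that minimizes the sum of
    absolute differences over a small window. SGM builds a cost volume
    like this and then adds smoothness-aware path aggregation."""
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = np.pad(left.astype(np.float64), pad, mode="edge")
    R = np.pad(right.astype(np.float64), pad, mode="edge")
    for y in range(h):
        for x in range(w):
            best, best_d = np.inf, 0
            lw = L[y:y + win, x:x + win]          # left window around (y, x)
            for d in range(min(max_disp, x + 1)): # candidate disparities
                rw = R[y:y + win, x - d:x - d + win]
                cost = np.abs(lw - rw).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```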
Chinese patent CN111833393A, published 2020.10.27, discloses a binocular stereo matching method based on edge information, which partitions pixels with a superpixel segmentation algorithm driven by image edge information and thereby obtains a more accurate disparity map in occluded regions and regions where edge information is discontinuous. However, that method is only suitable for high-quality images with a low dynamic range; in scenes with a large dynamic range, such as a tunnel entrance, the camera produces underexposed and overexposed images, and the accuracy of the disparity map estimated by the method drops sharply.
Disclosure of Invention
The invention aims to overcome at least one defect in the prior art, and provides a disparity map enhancement method based on fusion of RGB and DVS images in a scene with a high dynamic range, which can realize disparity map enhancement more accurately, reliably and effectively in a scene with a larger imaging dynamic range.
In order to solve the technical problems, the invention adopts the following technical scheme: a disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene comprises the following steps:
s1, deploying a binocular RGB camera and a DVS camera, and calibrating the binocular RGB camera and the DVS camera;
s2, acquiring RGB images and DVS images of a binocular camera in a scene, and performing multi-scale weighted fusion after registration;
s3, generating an HDR image aiming at computer vision for the fused image;
s4, generating a parallax image based on the HDR image generated in the step S3 by using an improved binocular stereo matching algorithm SGM.
Further, after the binocular RGB camera and the DVS camera are deployed, the positions of the sensors are ensured to be relatively unchanged in the process of acquiring data for many times, and only one calibration is needed in the whole process; the calibration mainly comprises the internal parameter calibration of a binocular RGB camera and a DVS camera and the external parameter calibration of the combination of the RGB camera and the DVS camera.
Further, the DVS camera is a sensor triggered by changes in illumination intensity: it outputs a pulse signal when the illumination intensity changes, and because the response of each pixel depends on the change in illumination intensity rather than on its absolute value, a single pixel has a high dynamic range. In addition, object edges are captured well by the DVS camera owing to the illumination-intensity difference between an object and its background. The DVS camera is therefore used in the present invention to enhance the edge information of the picture.
Further, the data acquired by the DVS camera is an asynchronous event stream, with no notion of the frame rate of a conventional camera, so the DVS does not output standard image frames. Image frames are instead obtained by setting a fixed time-slice length Δt, continuously accumulating the trigger events within each slice, stacking the event streams accumulated over that period, and finally passing them through an event screen plane of thickness d.
Further, when the RGB camera uses the checkerboard calibration plate to calibrate internal parameters, the position of the camera is fixed, then the checkerboard calibration plate is fixed to capture a group of images, and a plurality of groups of images are obtained by moving the checkerboard calibration plate; the DVS camera is only sensitive to illumination intensity change, so that when the DVS camera and the checkerboard calibration plate are fixed, the DVS cannot acquire and output image information, a continuously refreshed display screen is used as an event triggering source when DVS internal reference calibration is carried out, and the internal reference calibration of the DVS camera and the RGB camera is simultaneously carried out by displaying the checkerboard calibration plate on the screen.
Further, if a positional deviation occurs between the images of different sensors, fusion can further amplify the deviation across the whole image; therefore, in step S2 the matching process is as follows: feature points are first detected with a feature extraction algorithm such as SIFT, ORB, or SURF and then matched; once the matching corresponding points between the images are obtained, the homography matrix of the two images is computed from the corresponding-point information and used to align the images; the homography matrix can be computed from four pairs of coordinate points, and after it is obtained the source images can be registered and aligned.
Further, as described in step S2, before fusion, a feature-based image registration method is used to reduce geometrical space differences between different sensors, a mapping transformation model between different sensor images is established, and pixels of an image are mapped to pixels of another image through the mapping model.
Further, the image fusion part adopts a pyramid transformation method: the image is filtered or sampled to obtain a pyramid-like layered structure, and data fusion is performed on each layer of the pyramid using a weighted fusion method to obtain a pyramid-shaped fused image layer. As the resolution of the sampled image layers gradually decreases, the lower-resolution but higher-frequency layers are fused by the rule

C_F(i, j) = C_A(i, j) if |C_A(i, j)| ≥ |C_B(i, j)|, otherwise C_B(i, j)

where C_A(i, j) and C_B(i, j) denote the pixel values of the two source layers at (i, j) and C_F(i, j) denotes the pixel value of the fused layer at (i, j), i.e. the coefficient with the larger absolute value is taken as the fusion result; for the higher-resolution but lower-frequency layers the rule is

C_F(i, j) = (C_A(i, j) + C_B(i, j)) / 2

taking the average value as the fusion result. After the fusion result of each layer is obtained, the layers are inverse-transformed and superposed to obtain the overall fused image.
Further, in the step S3, the quality of the fused image is seriously affected by the overexposure and underexposure caused by the severe change of the light conditions, two sets of images with different exposure degrees are obtained by the automatic multi-exposure control method, and then the Mertens algorithm with the improved pixel weight calculation formula is applied to the two sets of images to obtain the HDR image.
Further, generating the HDR image specifically comprises the following steps:

the exposure-response measurement E_{i,j,k} of each pixel is first calculated from the gray values, where k = 0 denotes the low-exposure image, k = 1 the high-exposure image, I_{i,j} is the gray value of the image at (i, j), and δ is a constant;

the initial weight of each pixel of the low-exposure and high-exposure images is then calculated as

W_{i,j,k} = min{ w_C·C_{i,j,k} + w_E·E_{i,j,k}, 1 }

where C_{i,j,k} denotes the contrast weight of the low- or high-exposure image at (i, j), and w_C and w_E are the respective weight coefficients;

the numerical stability of the result is enhanced by normalizing the weights:

W̄_{i,j,k} = W_{i,j,k} / Σ_{k′=0..N−1} W_{i,j,k′}

where N denotes the number of differently exposed images and W_{i,j,k′} the initial weight of the k′-th exposure image at (i, j);

finally the HDR image is obtained through the weighted sum

I^{HDR}_{i,j} = Σ_{k′=0..N−1} W̄_{i,j,k′}·I_{i,j,k′}

where I_{i,j,k′} denotes the gray value of the k′-th exposure image at (i, j).
Further, in the improved SGM algorithm, census transformation with illumination invariance is used for replacing mutual information MI in the original SGM algorithm to calculate parallax matching cost, then cost aggregation is carried out based on cross neighborhood to improve performance of the algorithm under the condition of discontinuous gray information and depth information, and finally parallax calculation and parallax optimization are carried out according to steps in the SGM algorithm.
According to the invention, the binocular RGB camera and the DVS camera are calibrated, and the relative positions among the sensors are kept unchanged during data acquisition after calibration is completed, so that the conversion relation among the sensor coordinate systems is prevented from being damaged. Before fusing the RGB images and DVS images, feature-based registration of the images is performed to reduce the geometrical spatial differences between the different sensor images. The registered images are subjected to pyramid transformation to obtain a layered structure, each layer is fused by using a weighted fusion method, and the whole fused image is obtained by an inverse transformation superposition mode. On the basis of the fusion, the HDR image is obtained from two differently exposed images using the Mertens algorithm after modification. Finally, a modified binocular stereo matching algorithm, namely a semi-global matching algorithm (SGM), is used for generating a high-quality disparity map from the HDR image.
The method effectively solves camera underexposure and overexposure in scenes with a large imaging dynamic range, such as tunnels, and improves the quality of the generated image; at the same time, to address discontinuity and instability in image edge regions, it enriches edge detail information as much as possible by introducing additional information sources, improving the accuracy of the finally generated disparity map at image edges.
Compared with the prior art, the beneficial effects are that:
1. the edge information of the image is enhanced by fusing the binocular RGB image and the DVS image, and the accuracy of the generated parallax image at the edge of the image is improved;
2. the HDR image is obtained through two different exposure images by using the improved Mertens algorithm, so that the invention can generate a high-quality parallax image in a scene with a larger imaging dynamic range;
3. in the binocular matching algorithm, census transformation is used for replacing original mutual information calculation in the SGM algorithm, so that illumination invariance is ensured, and meanwhile, the execution speed of the algorithm is improved.
Drawings
FIG. 1 is a schematic overall flow chart of the method of the present invention.
Fig. 2 is an illustration of the DVS imaging principle of the present invention.
FIG. 3 is a flow chart of a multi-scale weighted fusion method based on pyramid transformation.
FIG. 4 is a schematic diagram of the image pyramid structure model of the present invention.
Fig. 5 is a flowchart illustration of the present invention generating an HDR image.
Fig. 6 is a schematic flow chart of the SGM algorithm modified in the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationship described in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1, a disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene includes the following steps:
s1, deploying a binocular RGB camera and a DVS camera, and calibrating the binocular RGB camera and the DVS camera;
s2, acquiring RGB images and DVS images of a binocular camera in a scene, and performing multi-scale weighted fusion after registration;
s3, generating an HDR image aiming at computer vision for the fused image;
s4, generating a parallax image based on the HDR image generated in the step S3 by using an improved binocular stereo matching algorithm SGM.
In one embodiment, after deployment of the binocular RGB camera and the DVS camera, the positions of the sensors are ensured to be relatively unchanged in the process of acquiring data for multiple times, and only one calibration is needed in the whole process; the calibration mainly comprises the internal parameter calibration of a binocular RGB camera and a DVS camera and the external parameter calibration of the combination of the RGB camera and the DVS camera.
Further, as shown in fig. 2, the DVS camera is a sensor triggered by changes in illumination intensity: it outputs a pulse signal when the illumination intensity changes, and because the response of each pixel depends on the change in illumination intensity rather than on its absolute value, a single pixel has a high dynamic range. In addition, object edges are captured well by the DVS camera owing to the illumination-intensity difference between an object and its background. The DVS camera is therefore used in the present invention to enhance the edge information of the picture.
The data acquired by the DVS camera is an asynchronous event stream, with no notion of the frame rate of a conventional camera, so the DVS does not output standard image frames. Image frames are instead obtained by setting a fixed time-slice length Δt, continuously accumulating the trigger events within each slice, stacking the event streams accumulated over that period, and finally passing them through an event screen plane of thickness d.
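The accumulation step can be sketched as follows; the event tuple layout (t, x, y, polarity) and the function name are assumptions made for illustration, not the patent's data format:

```python
import numpy as np

def events_to_frame(events, t0, dt, shape):
    """Accumulate asynchronous DVS events (t, x, y, polarity) that fall
    inside the time slice [t0, t0 + dt) into a single 2D frame. Each ON
    event (+1) increments a pixel, each OFF event (-1) decrements it."""
    frame = np.zeros(shape, dtype=np.int32)
    for t, x, y, p in events:
        if t0 <= t < t0 + dt:
            frame[y, x] += 1 if p > 0 else -1
    return frame
```

Stacking several such slices reproduces the "event streams accumulated over a period of time" described above.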
When the RGB camera uses the checkerboard calibration board to calibrate internal references, fixing the camera position, then fixing the checkerboard calibration board to capture a group of images, and obtaining a plurality of groups of images by moving the checkerboard calibration board; the DVS camera is only sensitive to illumination intensity change, so that when the DVS camera and the checkerboard calibration plate are fixed, the DVS cannot acquire and output image information, a continuously refreshed display screen is used as an event triggering source when DVS internal reference calibration is carried out, and the internal reference calibration of the DVS camera and the RGB camera is simultaneously carried out by displaying the checkerboard calibration plate on the screen.
In addition, if a positional deviation occurs between the different sensor images, fusion can further amplify the deviation across the whole image; therefore, in step S2 the matching process is as follows: feature points are first detected with a feature extraction algorithm such as SIFT, ORB, or SURF and then matched; once the matching corresponding points between the images are obtained, the homography matrix of the two images is computed from the corresponding-point information and used to align the images; the homography matrix can be computed from four pairs of coordinate points, and after it is obtained the source images can be registered and aligned.
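The four-point homography estimate mentioned above can be sketched with the standard direct linear transform (DLT); this is a generic illustration, not the patent's implementation:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H (up to scale) mapping src -> dst from
    four point correspondences by solving the standard DLT linear system:
    each correspondence contributes two rows of A, and H is the right
    singular vector of A with the smallest singular value."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]            # normalize so H[2, 2] == 1

def apply_h(H, pt):
    """Map a 2D point through a homography (homogeneous coordinates)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[0] / p[2], p[1] / p[2]
```

With exact correspondences the recovered H maps unseen points correctly; real pipelines add RANSAC over many noisy matches.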
In the step S2, before fusion, a feature-based image registration method is used to reduce geometrical space differences between different sensors, a mapping transformation model between different sensor images is established, and pixels of an image are mapped to pixels of another image through the mapping model.
In some embodiments, as shown in fig. 3 and fig. 4, the image fusion part adopts a pyramid transformation method: the image is filtered or sampled to obtain a pyramid-like layered structure, and data fusion is performed on each layer of the pyramid using a weighted fusion method to obtain a pyramid-shaped fused image layer. As the resolution of the sampled image layers gradually decreases, the lower-resolution but higher-frequency layers are fused by the rule

C_F(i, j) = C_A(i, j) if |C_A(i, j)| ≥ |C_B(i, j)|, otherwise C_B(i, j)

where C_A(i, j) and C_B(i, j) denote the pixel values of the two source layers at (i, j) and C_F(i, j) denotes the pixel value of the fused layer at (i, j), i.e. the coefficient with the larger absolute value is taken as the fusion result; for the higher-resolution but lower-frequency layers the rule is

C_F(i, j) = (C_A(i, j) + C_B(i, j)) / 2

taking the average value as the fusion result. After the fusion result of each layer is obtained, the layers are inverse-transformed and superposed to obtain the overall fused image.
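A minimal sketch of the multi-scale fusion described above, assuming 2x2 averaging for downsampling and nearest-neighbour upsampling in place of a real Gaussian pyramid (image sides must be divisible by 2^levels in this simplified version):

```python
import numpy as np

def down(img):
    """Halve resolution by 2x2 averaging (stand-in for a Gaussian pyrDown)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):
    """Double resolution by nearest-neighbour repetition (stand-in for pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def fuse_pyramid(a, b, levels=2):
    """Two-source multi-scale fusion: detail layers keep the coefficient
    with the larger absolute value, the coarsest layer is averaged, and
    the pyramid is collapsed by inverse transform (upsample + add)."""
    la, lb = [], []
    for _ in range(levels):
        da, db = down(a), down(b)
        la.append(a - up(da))              # detail layer of a
        lb.append(b - up(db))              # detail layer of b
        a, b = da, db
    fused = (a + b) / 2.0                  # average the coarse layers
    for det_a, det_b in zip(reversed(la), reversed(lb)):
        det = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)
        fused = up(fused) + det            # collapse the pyramid
    return fused
```

When both inputs are identical the pipeline reconstructs the input exactly, which is a handy sanity check on the inverse transform.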
In another embodiment, in step S3, because overexposure and underexposure caused by drastic changes in lighting conditions seriously affect the quality of the fused image, two sets of images with different exposure levels are obtained by an automatic multi-exposure control method, and the Mertens algorithm with an improved pixel-weight calculation formula is then applied to them to obtain the HDR image. As shown in fig. 5, generating the HDR image specifically comprises the following steps:

the exposure-response measurement E_{i,j,k} of each pixel is first calculated from the gray values, where k = 0 denotes the low-exposure image, k = 1 the high-exposure image, I_{i,j} is the gray value of the image at (i, j), and δ is a constant;

the initial weight of each pixel of the low-exposure and high-exposure images is then calculated as

W_{i,j,k} = min{ w_C·C_{i,j,k} + w_E·E_{i,j,k}, 1 }

where C_{i,j,k} denotes the contrast weight of the low- or high-exposure image at (i, j), and w_C and w_E are the respective weight coefficients;

the numerical stability of the result is enhanced by normalizing the weights:

W̄_{i,j,k} = W_{i,j,k} / Σ_{k′=0..N−1} W_{i,j,k′}

where N denotes the number of differently exposed images and W_{i,j,k′} the initial weight of the k′-th exposure image at (i, j);

finally the HDR image is obtained through the weighted sum

I^{HDR}_{i,j} = Σ_{k′=0..N−1} W̄_{i,j,k′}·I_{i,j,k′}

where I_{i,j,k′} denotes the gray value of the k′-th exposure image at (i, j).
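The weighting scheme can be sketched as below. The contrast term (Laplacian magnitude) and well-exposedness term (closeness to mid-gray) are common Mertens-style stand-ins, since the patent's exact response formula is not reproduced here; w_c, w_e, and the Gaussian width are illustrative values:

```python
import numpy as np

def fuse_exposures(images, w_c=0.5, w_e=0.5, eps=1e-6):
    """Blend N differently exposed gray images (values in [0, 1]):
    per-pixel initial weights W = min(w_c*C + w_e*E, 1) are normalized
    over the exposures and used to weight the gray values. C and E are
    assumed Mertens-style terms, not the patent's improved formula."""
    ws = []
    for img in images:
        # contrast: absolute Laplacian response, scaled to [0, 1]
        lap = np.abs(4 * img
                     - np.roll(img, 1, 0) - np.roll(img, -1, 0)
                     - np.roll(img, 1, 1) - np.roll(img, -1, 1))
        C = lap / (lap.max() + eps)
        # well-exposedness: Gaussian closeness to mid-gray 0.5
        E = np.exp(-((img - 0.5) ** 2) / (2 * 0.2 ** 2))
        ws.append(np.minimum(w_c * C + w_e * E, 1.0))
    ws = np.stack(ws)
    ws = (ws + eps) / (ws + eps).sum(axis=0)   # normalize over exposures
    return (ws * np.stack(images)).sum(axis=0)
```

Two flat exposures symmetric around mid-gray receive equal weight, so the fused result sits exactly between them.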
In some embodiments, as shown in fig. 6, the modified SGM algorithm uses Census transform with illumination invariance to replace mutual information MI in the original SGM algorithm to calculate parallax matching cost, then performs cost aggregation based on crisscross neighborhood to improve performance of the algorithm under the condition of discontinuous gray information and depth information, and finally performs parallax calculation and parallax optimization according to steps in the SGM algorithm.
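The Census transform and its Hamming-distance matching cost, which replace mutual information in the improved SGM, can be sketched as follows (a generic NumPy illustration, not the patent's implementation):

```python
import numpy as np

def census(img, win=3):
    """Census transform: encode each pixel as a bit string recording
    whether each window neighbour is darker than the centre. Only the
    ordering of intensities matters, hence illumination invariance."""
    pad = win // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint32)
    for dy in range(win):
        for dx in range(win):
            if dy == pad and dx == pad:
                continue                 # skip the centre pixel itself
            bit = (p[dy:dy + h, dx:dx + w] < img).astype(np.uint32)
            codes = (codes << 1) | bit
    return codes

def hamming_cost(c1, c2):
    """Matching cost between census codes = Hamming distance."""
    x = np.bitwise_xor(c1, c2)
    cost = np.zeros_like(x)
    while np.any(x):
        cost += x & 1
        x >>= 1
    return cost
```

Any monotonically increasing intensity change (e.g. a gain and offset) leaves the census codes untouched, which is exactly the invariance the improved SGM relies on.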
According to the invention, the binocular RGB camera and the DVS camera are calibrated, and the relative positions among the sensors are kept unchanged during data acquisition after calibration is completed, so that the conversion relation among the sensor coordinate systems is prevented from being damaged. Before fusing the RGB images and DVS images, feature-based registration of the images is performed to reduce the geometrical spatial differences between the different sensor images. The registered images are subjected to pyramid transformation to obtain a layered structure, each layer is fused by using a weighted fusion method, and the whole fused image is obtained by an inverse transformation superposition mode. On the basis of the fusion, the HDR image is obtained from two differently exposed images using the Mertens algorithm after modification. Finally, a modified binocular stereo matching algorithm, namely a semi-global matching algorithm (SGM), is used for generating a high-quality disparity map from the HDR image.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention; variations, modifications, substitutions and alterations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.
Claims (7)
1. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene is characterized by comprising the following steps:
s1, deploying a binocular RGB camera and a DVS camera, and calibrating the binocular RGB camera and the DVS camera;
s2, acquiring RGB images and DVS images of a binocular camera in a scene, and performing multi-scale weighted fusion after registration;
s3, generating an HDR image aiming at computer vision for the fused image; the quality of the fused image is seriously affected due to the problems of overexposure and underexposure which occur in severe light condition change, two groups of images with different exposure degrees are obtained through an automatic multi-exposure control method, and then an HDR image is obtained by applying a Mertens algorithm with an improved pixel weight calculation formula to the two groups of images;
the generation of the HDR image specifically comprises the following steps:

the exposure-response measurement E_{i,j,k} of each pixel is first calculated from the gray values, where k = 0 denotes the low-exposure image, k = 1 the high-exposure image, I_{i,j} is the gray value of the image at (i, j), and δ is a constant;

the initial weight of each pixel of the low-exposure and high-exposure images is then calculated as

W_{i,j,k} = min{ w_C·C_{i,j,k} + w_E·E_{i,j,k}, 1 }

where C_{i,j,k} denotes the contrast weight of the low- or high-exposure image at (i, j), and w_C and w_E are the respective weight coefficients;

the numerical stability of the result is enhanced by normalizing the weights:

W̄_{i,j,k} = W_{i,j,k} / Σ_{k′=0..N−1} W_{i,j,k′}

where N denotes the number of differently exposed images and W_{i,j,k′} the initial weight of the k′-th exposure image at (i, j);

finally the HDR image is obtained through the weighted sum

I^{HDR}_{i,j} = Σ_{k′=0..N−1} W̄_{i,j,k′}·I_{i,j,k′}

where I_{i,j,k′} denotes the gray value of the k′-th exposure image at (i, j);
s4, generating a parallax image based on the HDR image generated in the step S3 by using an improved binocular stereo matching algorithm SGM; in the improved SGM algorithm, census transformation with illumination invariance is used for replacing mutual information MI in the original SGM algorithm to calculate parallax matching cost, then, based on cross neighborhood, cost aggregation is carried out to improve the performance of the algorithm under the condition that gray information and depth information are discontinuous, and finally, parallax calculation and parallax optimization are carried out according to steps in the SGM algorithm.
2. The parallax image enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 1, wherein after deployment of a binocular RGB camera and a DVS camera, the positions of the sensors are ensured to be relatively unchanged in the process of acquiring data for multiple times, and only one calibration is needed in the whole process; the calibration mainly comprises the internal parameter calibration of a binocular RGB camera and a DVS camera and the external parameter calibration of the combination of the RGB camera and the DVS camera.
3. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 2, wherein edge information of the picture is enhanced using a DVS camera; since the data acquired by the DVS camera is asynchronous event stream data, the DVS outputs image frames that are not standard, by setting a fixed time slice length Δt and accumulating trigger events continuously during the time slice, then stacking the event streams accumulated over a period of time, and finally, obtaining the image frames after passing through an event screen plane with a thickness d.
4. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 2, wherein when the RGB camera is calibrated for intrinsic parameters using a checkerboard calibration plate, the position of the camera is fixed, the checkerboard calibration plate is then held still to capture one group of images, and multiple groups of images are obtained by moving the checkerboard calibration plate; for DVS intrinsic calibration, a continuously refreshing display screen is used as the trigger source of events, and by displaying the checkerboard calibration plate on the screen, intrinsic calibration of the DVS camera and the RGB camera is carried out simultaneously.
5. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 1, wherein in step S2 the matching process comprises: first detecting feature points with a feature extraction algorithm such as SIFT, ORB, or SURF, and then matching the feature points; after the matching corresponding points between the images are obtained, the homography matrix of the two images is calculated from the corresponding point information, and the images are aligned through the homography matrix; the homography matrix is calculated from four coordinate point pairs, and after it is obtained, the left source image is registered and aligned.
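The computation of a homography from four coordinate point pairs can be sketched with the direct linear transform (DLT). This is an illustrative reconstruction of the standard technique, not the patent's specific registration code:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points from
    four correspondences via the direct linear transform (DLT): each pair
    contributes two rows to a homogeneous system A h = 0, solved by SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        A.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = vt[-1].reshape(3, 3)   # right-singular vector of the smallest value
    return H / H[2, 2]         # fix the arbitrary scale of the solution

def apply_homography(H, pt):
    """Map a single (x, y) point through the homography H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

With the homography in hand, every pixel of the source image can be mapped into the coordinate frame of the target image, which is the alignment step the claim describes.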
6. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 5, wherein in step S2, before the fusion, a feature-based image registration method is used to reduce the geometric spatial differences between the different sensors: a mapping transformation model between the images of the different sensors is established, and the pixels of one image are mapped to the pixels of the other image through the mapping model.
7. The disparity map enhancement method based on fusion of RGB and DVS images in a high dynamic range scene according to claim 6, wherein the image fusion part adopts a pyramid transformation method: the image is filtered or sampled to obtain a pyramid-like layered structure, and data fusion is performed on each layer of the pyramid using a weighted fusion method to obtain a pyramid-shaped set of fused image layers; since the resolution of the sampled image layers gradually decreases, the following formula is used to fuse the lower-resolution but higher-frequency image layers:

C_F(i,j) = C_A(i,j) if |C_A(i,j)| ≥ |C_B(i,j)|, otherwise C_F(i,j) = C_B(i,j)

wherein C_A(i,j) and C_B(i,j) represent the pixel values of the two sets of images at (i, j), and C_F(i,j) represents the pixel value of the fused image at (i, j);
for the higher-resolution but lower-frequency image layers, the formula

C_F(i,j) = (C_A(i,j) + C_B(i,j))/2

is used, taking the average value as the fusion result; after the fusion result of each layer is obtained, each layer is inverse-transformed and superimposed to obtain the overall fused image.
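A minimal NumPy sketch of pyramid-based fusion in the spirit of this claim. The max-absolute-value selection for the detail (high-frequency) layers is an assumption, and the simple 2x2 block averaging and pixel replication stand in for proper Gaussian filtering and interpolation:

```python
import numpy as np

def down(img):
    # 2x downsampling by 2x2 block averaging (a stand-in for Gaussian
    # filtering followed by decimation); image sides must be even
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    # 2x upsampling by pixel replication
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    pyr, cur = [], img.astype(np.float64)
    for _ in range(levels):
        low = down(cur)
        pyr.append(cur - up(low))  # detail (high-frequency) layer
        cur = low
    pyr.append(cur)                # coarsest base (low-frequency) layer
    return pyr

def fuse_pyramids(pa, pb):
    fused = [np.where(np.abs(a) >= np.abs(b), a, b)  # keep stronger detail
             for a, b in zip(pa[:-1], pb[:-1])]
    fused.append((pa[-1] + pb[-1]) / 2.0)            # average the base layer
    return fused

def collapse(pyr):
    # inverse transform: upsample and superimpose layer by layer
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = up(cur) + detail
    return cur
```

Because this `down`/`up` pair makes each level exactly invertible, collapsing an unfused pyramid (or the fusion of an image with itself) reproduces the input image exactly.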
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011283187.2A CN112396562B (en) | 2020-11-17 | 2020-11-17 | Disparity map enhancement method based on fusion of RGB and DVS images in high dynamic range scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112396562A CN112396562A (en) | 2021-02-23 |
CN112396562B true CN112396562B (en) | 2023-09-05 |
Family
ID=74599972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011283187.2A Active CN112396562B (en) | 2020-11-17 | 2020-11-17 | Disparity map enhancement method based on fusion of RGB and DVS images in high dynamic range scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112396562B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11841926B2 (en) * | 2021-02-10 | 2023-12-12 | Apple Inc. | Image fusion processor circuit for dual-mode image fusion architecture |
CN114967907A (en) * | 2021-02-26 | 2022-08-30 | 华为技术有限公司 | Identification method and electronic equipment |
CN113033382B (en) * | 2021-03-23 | 2021-10-01 | 哈尔滨市科佳通用机电股份有限公司 | Method, system and device for identifying large-area damage fault of wagon floor |
JP2024510899A (en) * | 2021-03-26 | 2024-03-12 | ハーマン インターナショナル インダストリーズ インコーポレイテッド | Methods, devices and systems for automatic labeling |
CN113781470A (en) * | 2021-09-24 | 2021-12-10 | 商汤集团有限公司 | Parallax information acquisition method, device and equipment and binocular camera system |
CN113947549B (en) * | 2021-10-22 | 2022-10-25 | 深圳国邦信息技术有限公司 | Self-shooting video decoration prop edge processing method and related product |
WO2023143708A1 (en) * | 2022-01-26 | 2023-08-03 | Huawei Technologies Co., Ltd. | Hdr reconstruction from bracketed exposures and events |
CN115150561B (en) * | 2022-05-23 | 2023-10-31 | 中国人民解放军国防科技大学 | High dynamic imaging system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105933617A (en) * | 2016-05-19 | 2016-09-07 | 中国人民解放军装备学院 | High dynamic range image fusion method used for overcoming influence of dynamic problem |
CN111062873A (en) * | 2019-12-17 | 2020-04-24 | 大连理工大学 | Parallax image splicing and visualization method based on multiple pairs of binocular cameras |
CN111260597A (en) * | 2020-01-10 | 2020-06-09 | 大连理工大学 | Parallax image fusion method of multiband stereo camera |
CN111833393A (en) * | 2020-07-05 | 2020-10-27 | 桂林电子科技大学 | Binocular stereo matching method based on edge information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112396562B (en) | Disparity map enhancement method based on fusion of RGB and DVS images in high dynamic range scene | |
WO2021120406A1 (en) | Infrared and visible light fusion method based on saliency map enhancement | |
CN110023810B (en) | Digital correction of optical system aberrations | |
CN103973989B (en) | Obtain the method and system of high-dynamics image | |
CN110717942B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
CN113313661A (en) | Image fusion method and device, electronic equipment and computer readable storage medium | |
CN111027415B (en) | Vehicle detection method based on polarization image | |
CN111986106B (en) | High-dynamic image reconstruction method based on neural network | |
CN110956661A (en) | Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix | |
CN113643214B (en) | Image exposure correction method and system based on artificial intelligence | |
CN109166076B (en) | Multi-camera splicing brightness adjusting method and device and portable terminal | |
CN113436130B (en) | Intelligent sensing system and device for unstructured light field | |
CN114885074A (en) | Event camera denoising method based on space-time density | |
CN111047636A (en) | Obstacle avoidance system and method based on active infrared binocular vision | |
EP4050553A1 (en) | Method and device for restoring image obtained from array camera | |
CN111798484B (en) | Continuous dense optical flow estimation method and system based on event camera | |
CN117058183A (en) | Image processing method and device based on double cameras, electronic equipment and storage medium | |
CN112184807A (en) | Floor type detection method and system for golf balls and storage medium | |
JP7133979B2 (en) | Image processing device, image processing method, image processing program, and storage medium | |
CN110910457A (en) | Multispectral three-dimensional camera external parameter calculation method based on angular point characteristics | |
CN113670268B (en) | Binocular vision-based unmanned aerial vehicle and electric power tower distance measurement method | |
CN115578273A (en) | Image multi-frame fusion method and device, electronic equipment and storage medium | |
Ye et al. | LFIENet: Light field image enhancement network by fusing exposures of LF-DSLR image pairs | |
CN115035175A (en) | Three-dimensional model construction data processing method and system | |
CN109191396B (en) | Portrait processing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||