CN109360179B - Image fusion method and device and readable storage medium - Google Patents

Image fusion method and device and readable storage medium Download PDF

Info

Publication number
CN109360179B
Authority
CN
China
Prior art keywords
image
map
fusion
structural similarity
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811214128.2A
Other languages
Chinese (zh)
Other versions
CN109360179A (en)
Inventor
程永翔
刘坤
于晟焘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN201811214128.2A priority Critical patent/CN109360179B/en
Publication of CN109360179A publication Critical patent/CN109360179A/en
Application granted granted Critical
Publication of CN109360179B publication Critical patent/CN109360179B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image fusion method, an image fusion device and a readable storage medium, applied to the technical field of image processing. The image fusion method comprises the following steps: first, a registered first image and second image are obtained; after processing by a trained convolutional neural network, a first score map and a second score map are output by classification; corresponding pixels of the first score map and the second score map are compared to obtain a binary map; a first fused image is obtained; a first structural similarity map and a second structural similarity map are calculated; a difference map of the first structural similarity map and the second structural similarity map is obtained; and a second fused image is obtained based on the difference map, the first image and the second image. By applying the embodiment of the invention, the fused image of the infrared image and the visible light image is obtained through a two-channel convolutional neural network; as a deep learning algorithm, the convolutional neural network selects image features automatically, overcoming the singleness of hand-crafted feature extraction and avoiding the shortcomings of existing infrared and visible light image fusion methods.

Description

Image fusion method and device and readable storage medium
Technical Field
The invention relates to the technical field of image fusion, and in particular to an image fusion method, an image fusion device and a readable storage medium.
Background
An infrared sensor is sensitive to the infrared thermal signature of a target area, can work day and night, and can find targets despite poor illumination, but the infrared image often lacks rich detail information and has a blurred background. The visible light image contains richer texture features and detail information, but its imaging places higher demands on illumination. If the complementary information of the infrared image and the visible light image is fused effectively, the resulting fused image is richer in information and more robust, laying a good foundation for subsequent image segmentation, detection and recognition. Infrared and visible light image fusion technology is therefore widely applied in military and security monitoring fields.
Image fusion is divided into three levels: pixel level, feature level and decision level. Pixel-level fusion is the most basic and preserves the most image information. Methods based on multi-scale transforms (MST) and sparse representation (SR) are the most common pixel-level fusion methods, but their image feature extractors must be designed by hand, so operational efficiency is low; moreover, the single kind of extracted feature cannot adapt well to varied and complex imaging environments, and misjudgment easily occurs in regions of uniform gray level.
Disclosure of Invention
The embodiment of the invention aims to provide an image fusion method, an image fusion device and a readable storage medium, in which the fused image of an infrared image and a visible light image is obtained through a two-channel convolutional neural network. As a deep learning algorithm, the convolutional neural network selects image features automatically, overcoming the singleness of feature extraction and the defects of existing infrared and visible light image fusion methods. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present invention provides an image fusion method, including:
registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image;
inputting the first image and the second image into a trained convolutional neural network, and outputting a first score map and a second score map in a classified manner after the convolutional neural network is trained;
comparing corresponding pixels of the first score map and the second score map to obtain a binary map;
obtaining a first fusion image based on the binary image, the first image and the second image;
calculating a first structural similarity graph of the first image and the first fused image, and calculating a second structural similarity graph of the second image and the first fused image;
obtaining a difference map of the first structural similarity map and the second structural similarity map;
and obtaining a second fusion image based on the difference image, the first image and the second image.
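For orientation, the steps above can be summarized in a minimal Python sketch. Every helper name here (register, score_maps, ssim_map, extract_target_region) is a hypothetical stand-in for the operations detailed in the embodiments below, not part of the disclosure itself.

```python
# Minimal sketch of the claimed pipeline; helper names are illustrative only.
import numpy as np

def fuse(infrared, visible, cnn):
    A, B = register(infrared, visible)        # registered first and second images
    S_A, S_B = score_maps(cnn, A, B)          # per-pixel score maps from the trained CNN
    D1 = (S_A > S_B).astype(np.float32)       # binary decision map
    F1 = D1 * A + (1.0 - D1) * B              # first (preliminary) fused image
    S = np.abs(ssim_map(A, F1) - ssim_map(B, F1))  # difference of structural similarity maps
    D2 = extract_target_region(S)             # salient target feature extraction map
    return D2 * A + (1.0 - D2) * B            # second (final) fused image
```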
In one implementation, the step of comparing corresponding pixels of the first score map and the second score map to obtain a binary map includes:
judging whether the pixel value of a first pixel point on the first score map is larger than the pixel value of a second pixel point, wherein the first pixel point is any one pixel point on the first score map, and the second pixel point is the pixel point on the second score map corresponding to the first pixel point;
if yes, the pixel value of a third pixel point on the binary image is 1; and otherwise, the pixel value of a third pixel point is 0, wherein the third pixel point is a pixel point on the binary image corresponding to the first pixel point.
In one implementation, the first fused image is specifically expressed by the following formula:
F1(x, y) = D1(x, y)A(x, y) + (1 - D1(x, y))B(x, y)
wherein D1 is the binary map, A is the first image, B is the second image, F1 is the first fused image, and x and y are the coordinate values of the pixel points.
In one implementation, the step of obtaining a difference map of the first structural similarity map and the second structural similarity map includes:
obtaining a difference value between the first structural similarity graph and the second structural similarity graph;
and taking the absolute value of the difference value as a difference map of the first structural similarity map and the second structural similarity map.
In one implementation, the step of obtaining a second fused image based on the difference map, the first image and the second image includes:
based on the target area, removing an area irrelevant to the target in the difference image to obtain a target feature extraction image;
and extracting an image, the first image and the second image according to the target feature to obtain a second fusion image.
In one implementation, the second fused image is specifically expressed by the following formula:
F2(x, y) = D2(x, y)A(x, y) + (1 - D2(x, y))B(x, y)
wherein D2 is the target feature extraction image, A is the first image, B is the second image, x and y are the coordinate values of the pixel points, and F2 is the second fused image.
The binary map is used as a decision map and a preliminary fused image is obtained with a weighted fusion rule; finally, a salient image of the target region is extracted using SSIM and fused again to obtain the final fused image.
in one implementation, the training step of the convolutional neural network includes:
extracting a first number of original images of size 32 × 32 from a first image set, and adding a second number of visible light images from a second image set;
converting the original images and the visible light images to grayscale and cutting them into 16 × 16 sub-blocks as the high-resolution image set;
and performing Gaussian blur processing on the first number of original images from the first image set, adding a second number of infrared images from the second image set, and cutting them all into 16 × 16 sub-blocks as the blurred image set.
The convolutional neural network structure is then trained on the blurred image set and high-resolution image set so constructed.
In one implementation, the convolutional neural network is a two-channel network; each channel is a 5-layer convolutional neural network comprising 3 convolutional layers, 1 max-pooling layer and 1 fully connected layer, and the final output layer is a softmax classifier.
In addition, an embodiment of the present invention further provides an image fusion apparatus, including:
the registration module is used for registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image;
the classification module is used for inputting the first image and the second image into a trained convolutional neural network, and classifying and outputting a first score map and a second score map after the convolutional neural network is trained;
the comparison module is used for comparing corresponding pixels of the first score map and the second score map to obtain a binary map;
the first fusion module is used for obtaining a first fusion image based on the binary image, the first image and the second image;
the calculation module is used for calculating a first structural similarity graph of the first image and the first fusion image and calculating a second structural similarity graph of the second image and the first fusion image;
an obtaining module, configured to obtain a difference map of the first structural similarity map and the second structural similarity map;
and the second fusion module is used for obtaining a second fusion image based on the difference image, the first image and the second image.
And, providing a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the image fusion methods.
By applying the image fusion method, the image fusion device and the readable storage medium provided by the embodiment of the invention, the fused image of the infrared image and the visible light image is obtained through a convolutional neural network, image features are selected automatically, the singleness of feature extraction is overcome, and the shortcomings of existing infrared and visible light image fusion methods are avoided. To address the case where binary segmentation does not separate the target region and the background region completely and accurately, causing shadows in the later fused image, a salient target region image is obtained from the difference in structural similarity between the infrared and visible source images and the initial fused image, and a secondary fusion step is adopted to improve the quality of the fused image. This saliency-based fusion method preserves the integrity of the highlighted target region, improves the visual quality of the fused image, and better serves subsequent image understanding, recognition and other tasks.
Drawings
Fig. 1 is a schematic flowchart of an image fusion method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a first effect provided by the embodiment of the invention;
FIG. 3 is a diagram illustrating a second effect provided by the embodiment of the invention;
FIG. 4 is a schematic diagram of a third effect provided by the embodiment of the invention;
FIG. 5 is a diagram illustrating a fourth effect provided by the embodiment of the present invention;
FIG. 6 is a diagram illustrating a fifth effect provided by the embodiment of the present invention;
FIG. 7 is a diagram illustrating a sixth effect provided by the embodiment of the present invention;
fig. 8 is a diagram illustrating a seventh effect provided by the embodiment of the present invention;
fig. 9 is a schematic diagram of an eighth effect provided by the embodiment of the invention;
fig. 10 is a diagram illustrating a ninth effect provided by the embodiment of the present invention;
fig. 11 is a schematic diagram illustrating a tenth effect provided by the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in image processing, the thermal radiation of a target is strong in the infrared image, so its gray level differs greatly from, and may even be opposite to, that of the visible light image; the background of the infrared image shows no obvious thermal effect and has low contrast, and compared with the visible light image the infrared image lacks spectral information and contains less detail information. Therefore, when the images are fused, the fusion effect can be improved only by retaining more information from the original images.
Referring to fig. 1, an embodiment of the present invention provides an image fusion method, including the following steps:
s101, registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image.
It should be noted that geometric registration refers to the operation of geometrically transforming images (data) of the same region obtained by different remote sensing systems at different times and in different wavebands so that corresponding image points coincide in position and orientation. The specific geometric registration procedure is prior art and is not described here.
It can be understood that the sliding window is a tool commonly used in image processing; specifically, the sliding window may be 3 × 3, 5 × 5 or 16 × 16 in size, and the embodiment of the present invention is not limited in this respect.
For example, taking the first image, a 16 × 16 sliding window may start from the first pixel point at the top left corner, treat it as the first central pixel point of the window, and then move the window sequentially. In this way every pixel point in the first image in turn serves as a central pixel point, and the same holds for the second image; according to this principle, the structural similarity between any central pixel point in the first image and the corresponding central pixel point in the second image can be calculated.
The sliding window is defined as 16 × 16 with a step size of 1, and sliding is performed from left to right and from top to bottom on the registered infrared image and visible light image respectively; the infrared image sub-block is the first image, as shown in Fig. 2, and the visible light image sub-block is the second image, as shown in Fig. 3.
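A minimal NumPy sketch of this sliding-window extraction is given below (step size 1, no padding, which is an assumption since border handling is not specified in the text).

```python
import numpy as np

def sliding_patches(img, size=16, stride=1):
    """Yield (row, col, patch) for every size x size window,
    scanning left to right and top to bottom."""
    h, w = img.shape
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, img[r:r + size, c:c + size]
```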
S102, inputting the first image and the second image into a trained convolutional neural network, and classifying and outputting a first score map and a second score map after training of the convolutional neural network.
It should be noted that, in machine learning, a convolutional neural network is a deep feedforward artificial neural network that has been applied successfully to image recognition. Its artificial neurons respond to surrounding units, it can process large-scale images, and it comprises convolutional layers and pooling layers.
In one implementation, the training step of the convolutional neural network includes: extracting a first number of original images of size 32 × 32 from a first image set and adding a second number of visible light images from a second image set; converting the original images and the visible light images to grayscale and cutting them into 16 × 16 sub-blocks as the high-resolution image set; and performing Gaussian blur processing on the first number of original images from the first image set, adding a second number of infrared images from the second image set, and cutting them all into 16 × 16 sub-blocks as the blurred image set.
Illustratively, 2000 original clear images of size 32 × 32 are extracted from the Cifar-10 image set and 200 visible light images from the TNO_Image_Fusion_Dataset image set are added; these are converted to grayscale and all cut into 16 × 16 sub-blocks as the high-resolution image set. Next, all sub-blocks from Cifar-10 undergo Gaussian blur processing (because the background region of an infrared image has lower resolution than that of a visible light image), and 200 infrared images from the TNO_Image_Fusion_Dataset image set (all cut into 16 × 16 sub-blocks) are added to form the blurred image set.
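A sketch of this data preparation is shown below, assuming the images are already loaded as NumPy arrays; the Gaussian kernel size and sigma are assumptions, since the patent does not state them.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def to_blocks(gray, size=16):
    """Cut a grayscale image into non-overlapping size x size sub-blocks."""
    h, w = gray.shape
    return [gray[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def build_training_sets(cifar_images, tno_visible, tno_infrared):
    """cifar_images: 32x32 colour arrays; tno_visible / tno_infrared: grayscale arrays."""
    high_res, blurred = [], []
    for img in cifar_images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        high_res += to_blocks(gray)
        # Gaussian blur simulates the lower resolution of infrared backgrounds;
        # kernel size 5x5 and sigma 1.5 are assumptions.
        blurred += to_blocks(cv2.GaussianBlur(gray, (5, 5), 1.5))
    for img in tno_visible:
        high_res += to_blocks(img)
    for img in tno_infrared:
        blurred += to_blocks(img)
    return np.asarray(high_res), np.asarray(blurred)
```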
A two-channel network is adopted; each channel is a 5-layer convolutional neural network comprising 3 convolutional layers, 1 max-pooling layer and 1 fully connected layer, and the final output layer is a softmax classifier. The input image block size is 16 × 16, the convolution kernel size of the convolutional layers is set to 3 × 3 with a stride of 1, the max-pooling kernel size is 2 × 2 with a stride of 2, and the activation function is ReLU. The momentum and weight decay are set to 0.9 and 0.0005 respectively, and the learning rate is 0.0001.
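A PyTorch sketch consistent with these hyperparameters is given below, following the layer order suggested by Fig. 4 (two convolutions, max pooling, a convolution, a fully connected layer). The channel widths and the way the two branches are combined before the softmax classifier are assumptions, as the patent does not specify them.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One channel: 3 conv layers (3x3, stride 1), 1 max-pool (2x2, stride 2), 1 fully
    connected layer. Channel widths are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, stride=2),                       # 16x16 -> 8x8
            nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 8 * 8, 64)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class TwoChannelNet(nn.Module):
    """Two branches whose outputs are concatenated and fed to a 2-way softmax classifier."""
    def __init__(self):
        super().__init__()
        self.a, self.b = Branch(), Branch()
        self.classifier = nn.Linear(128, 2)

    def forward(self, xa, xb):
        z = torch.cat([self.a(xa), self.b(xb)], dim=1)
        return torch.softmax(self.classifier(z), dim=1)

# Optimiser settings quoted above: momentum 0.9, weight decay 0.0005, learning rate 0.0001.
net = TwoChannelNet()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4, momentum=0.9, weight_decay=5e-4)
```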
It can be understood that the first image is input into the trained convolutional neural network, which produces a score for each pixel point in the first image; once all pixel points have been processed, the first score map S_A is obtained, and similarly the second score map S_B corresponding to the second image is obtained. Referring to Fig. 4, after two convolutions, max pooling, a further convolution and the fully connected layer, the network outputs the processed result.
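The assembly of per-window scores into full score maps is not spelled out in the text; the sketch below makes the assumption that the two softmax outputs of the two-channel network for the window centred at each pixel are written into S_A and S_B at that pixel, reusing the sliding_patches and TwoChannelNet sketches above.

```python
import numpy as np
import torch

def score_maps(net, A, B, size=16):
    """Hypothetical assembly of the two score maps from per-window softmax scores."""
    h, w = A.shape
    S_A = np.zeros((h, w), dtype=np.float32)
    S_B = np.zeros((h, w), dtype=np.float32)
    with torch.no_grad():
        for r, c, pa in sliding_patches(A, size):            # from the earlier sketch
            pb = B[r:r + size, c:c + size]
            xa = torch.from_numpy(pa).float().view(1, 1, size, size)
            xb = torch.from_numpy(pb).float().view(1, 1, size, size)
            p = net(xa, xb)[0]                               # two softmax scores
            S_A[r + size // 2, c + size // 2] = p[0].item()
            S_B[r + size // 2, c + size // 2] = p[1].item()
    return S_A, S_B
```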
S103, comparing corresponding pixels of the first score map and the second score map to obtain a binary map.
Specifically, it is judged whether the pixel value of a first pixel point on the first score map is larger than the pixel value of a second pixel point, wherein the first pixel point is any one pixel point on the first score map and the second pixel point is the pixel point on the second score map corresponding to the first pixel point; if yes, the pixel value of the third pixel point on the binary map is 1, and otherwise it is 0, wherein the third pixel point is the pixel point on the binary map corresponding to the first pixel point.
For the binary map T, the first score map and the second score map are compared pixel by pixel: for any position (m, n), if the value of S_A at (m, n) is greater than the value of S_B at (m, n), the corresponding position of the binary map is set to 1; otherwise it is set to 0, as shown in the following formula. For example, based on Fig. 2 and Fig. 3, the binary map obtained after passing through the neural network shown in Fig. 4 is shown in Fig. 5.
T(m, n) = 1 if S_A(m, n) > S_B(m, n), and T(m, n) = 0 otherwise
Thus, a binary image of the target area and the background area is obtained, wherein the white area represents the target area of the infrared image, the black area is the background area, and the binary image can be used as a decision map for image fusion.
S104, obtaining a first fusion image based on the binary image, the first image and the second image.
The first image and the second image are weighted according to the binary map to obtain a preliminary fusion result; this primary fusion integrates the target region of the infrared image and the background region of the high-resolution visible light image into one image. Based on Fig. 2, Fig. 3 and Fig. 5, the first fused image shown in Fig. 6 is obtained.
In one implementation, the first fused image is specifically expressed by the following formula:
F1(x, y) = D1(x, y)A(x, y) + (1 - D1(x, y))B(x, y)
wherein D1 is the binary map, A is the first image, B is the second image, F1 is the first fused image, and x and y are the coordinate values of the pixel points.
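In NumPy this weighted fusion is a single expression, assuming A, B and the binary map D1 are float arrays of the same size:

```python
import numpy as np

def weighted_fusion(A, B, D):
    """F(x, y) = D(x, y) * A(x, y) + (1 - D(x, y)) * B(x, y)."""
    return D * A + (1.0 - D) * B
```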
S105, calculating a first structural similarity graph of the first image and the first fused image, and calculating a second structural similarity graph of the second image and the first fused image.
There is a strong correlation between infrared image pixels and visible light image pixels, and this correlation carries a large amount of structural information; SSIM (structural similarity index) is an index for evaluating image quality. From the perspective of image composition, the structural similarity index measures luminance, contrast and structure, thereby reflecting the structural characteristics of objects in the image. For two images C and D, the similarity measure function of the two images is defined as:
SSIM(C, D) = [l(C, D)]^α · [c(C, D)]^β · [s(C, D)]^γ, where l(C, D) = (2μ_aμ_b + C1)/(μ_a² + μ_b² + C1), c(C, D) = (2σ_aσ_b + C2)/(σ_a² + σ_b² + C2), s(C, D) = (σ_ab + C3)/(σ_aσ_b + C3)
where μ_a and μ_b are the mean gray levels of images C and D, σ_a and σ_b are the standard deviations of images C and D, σ_ab is the covariance of images C and D, and C1, C2 and C3 are very small positive constants introduced to avoid instability when a denominator approaches 0. α, β, γ > 0 are weights used to adjust the luminance, contrast and structure functions.
Thus, the first structural similarity map S_AF of the first image A and the first fused image F1 is calculated; illustratively, the first structural similarity map shown in Fig. 7 is obtained from Fig. 2 and Fig. 6. Likewise, the second structural similarity map S_BF of the second image B and the first fused image F1 is calculated; from Fig. 3 and Fig. 6, the second structural similarity map shown in Fig. 8 is obtained.
And S106, obtaining a difference diagram of the first structural similarity diagram and the second structural similarity diagram.
In one implementation, the step of obtaining a difference map of the first structural similarity map and the second structural similarity map includes: obtaining a difference value of the first structural similarity graph and the second structural similarity graph; and taking the absolute value of the difference as a difference map of the first structural similarity map and the second structural similarity map. Specifically, the difference graph between the first structural similarity graph and the second structural similarity graph is as follows:
S = |S_AF - S_BF|
where S_AF is the first structural similarity map, S_BF is the second structural similarity map, and S is the difference map; an exemplary difference map obtained from Fig. 7 and Fig. 8 is shown in Fig. 9.
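A sketch of steps S105 and S106 using scikit-image's structural_similarity is shown below; it assumes the images are scaled to [0, 1], and uses a 15 × 15 window because the library requires an odd window size (an approximation of the 16 × 16 window used above).

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_difference_map(A, B, F1):
    """Per-pixel structural similarity maps S_AF, S_BF and their absolute difference S."""
    _, S_AF = structural_similarity(A, F1, win_size=15, data_range=1.0, full=True)
    _, S_BF = structural_similarity(B, F1, win_size=15, data_range=1.0, full=True)
    return np.abs(S_AF - S_BF)
```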
S107, obtaining a second fusion image based on the difference image, the first image and the second image.
Because the first fused image obtained by the primary fusion does not segment the target region and the background region completely and accurately, shadows appear in the later fused image; a secondary fusion step is therefore adopted to improve the quality of the fused image.
In one implementation, the step of obtaining a second fused image based on the difference map, the first image and the second image includes: based on the target area, removing an area irrelevant to the target in the difference map to obtain a target feature extraction image; and extracting an image, the first image and the second image according to the target feature to obtain a second fusion image.
Illustratively, based on the difference map shown in fig. 9, a target feature extraction image as shown in fig. 10 is obtained.
In one implementation, the second fused image is specifically expressed by the following formula:
F2(x, y) = D2(x, y)A(x, y) + (1 - D2(x, y))B(x, y)
wherein D2 is the target feature extraction image, A is the first image, B is the second image, x and y are the coordinate values of the pixel points, and F2 is the second fused image.
The secondary fusion can be regarded as the fusion of the infrared image extracted on the basis of the salient target with the visible light image. The difference map S contains the salient regions of the infrared image. A morphological image processing method is used to remove regions of the difference map that are irrelevant to the target, yielding the target feature extraction map; it can be understood that the target region is the infrared signature of the target person captured by the infrared sensor, so the saliency of the target region is enhanced and more detail information is retained in the fused image. The second fused image shown in Fig. 11 is obtained from Fig. 10, Fig. 2 and Fig. 3.
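A sketch of this morphological cleanup and the second fusion is given below; the threshold value and structuring-element size are assumptions, since the patent only states that regions irrelevant to the target are removed by morphological processing.

```python
import numpy as np
import cv2

def extract_target_region(S, thresh=0.5, kernel_size=5):
    """Hypothetical cleanup of the difference map S (values in [0, 1]) into D2."""
    mask = (S > thresh).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small, isolated blobs
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill holes inside the target
    return mask.astype(np.float32)

# Second fusion, reusing weighted_fusion from above:
# D2 = extract_target_region(S); F2 = weighted_fusion(A, B, D2)
```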
By adopting the idea of binary segmentation, the fused image of the infrared image and the visible light image is obtained through the two-channel convolutional neural network. As a deep learning algorithm, the convolutional neural network selects image features automatically, overcoming the singleness of feature extraction and the defects of existing infrared and visible light image fusion methods (most of which require hand-designed feature extraction, so that the extracted features are single and easily lost). Secondly, to address the case where binary segmentation does not separate the target region and the background region completely and accurately, causing shadows in the later fused image, a salient target region image is obtained from the difference in structural similarity between the infrared and visible source images and the initial fused image, and a secondary fusion step is adopted to improve the quality of the fused image. This saliency-based fusion method preserves the integrity of the salient target region, improves the visual quality of the fused image, and better serves subsequent image understanding, recognition and other tasks.
In addition, an embodiment of the present invention further provides an image fusion apparatus, including:
the registration module is used for registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image;
the classification module is used for inputting the first image and the second image into a trained convolutional neural network, and classifying and outputting a first score map and a second score map after the convolutional neural network is trained;
the comparison module is used for comparing corresponding pixels of the first score map and the second score map to obtain a binary map;
the first fusion module is used for obtaining a first fusion image based on the binary image, the first image and the second image;
the calculation module is used for calculating a first structural similarity graph of the first image and the first fused image and calculating a second structural similarity graph of the second image and the first fused image;
an obtaining module, configured to obtain a difference map of the first structural similarity map and the second structural similarity map;
and the second fusion module is used for obtaining a second fusion image based on the difference image, the first image and the second image.
And, providing a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the image fusion methods.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. An image fusion method, comprising:
registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image;
inputting the first image and the second image into a trained convolutional neural network, and classifying and outputting a first score map and a second score map after training of the convolutional neural network;
comparing corresponding pixels of the first score map and the second score map to obtain a binary map; obtaining a first fused image based on the binary map, the first image and the second image, wherein the concrete expression formula of the first fused image is as follows: F1(x, y) = D1(x, y)A(x, y) + (1 - D1(x, y))B(x, y)
wherein D1 is the binary map, A is the first image, B is the second image, F1 is the first fused image, and x and y are the coordinate values of the pixel points;
calculating a first structural similarity graph of the first image and the first fused image, and calculating a second structural similarity graph of the second image and the first fused image;
obtaining a difference map of the first structural similarity map and the second structural similarity map; obtaining a second fusion image based on the difference image, the first image and the second image;
the step of obtaining a difference map of the first structural similarity map and the second structural similarity map comprises:
obtaining a difference value between the first structural similarity graph and the second structural similarity graph;
taking the absolute value of the difference as a difference map of the first structural similarity map and the second structural similarity map;
the step of obtaining a second fused image based on the disparity map, the first image and the second image comprises:
based on the target area, removing an area irrelevant to the target in the difference image to obtain a target feature extraction image;
extracting an image, the first image and the second image according to the target feature to obtain a second fused image;
the specific expression formula of the second fusion image is as follows: f2(x, y) ═ D2(x, y) a (x, y) + (1-D2(x, y) B (x, y))
Wherein D2 is the target feature extraction image, A is the first image, B is the second image, x, y are the coordinate values of the constituent pixels, F2 is the second fusion image.
2. The image fusion method according to claim 1, wherein the step of comparing corresponding pixels of the first score map and the second score map to obtain a binary map comprises:
aiming at a first pixel point on the first score map, judging whether the pixel value is larger than the pixel value of a second pixel point, wherein the first pixel point is any one pixel point on the first score map, and the second pixel point is a pixel point on the second score map corresponding to the first pixel point;
if yes, the pixel value of a third pixel point on the binary image is 1; and otherwise, the pixel value of a third pixel point is 0, wherein the third pixel point is a pixel point on the binary image corresponding to the first pixel point.
3. The image fusion method according to claim 1, wherein the training step of the convolutional neural network comprises:
extracting a first number of original images of size 32 × 32 from the first image set, and adding a second number of visible light images from the second image set;
converting the original images and the visible light images into grayscale images and cutting them into 16 × 16 sub-blocks as a high-resolution image set;
and performing Gaussian blur processing on the first number of original images from the first image set, adding a second number of infrared images from the second image set, and cutting them all into 16 × 16 sub-blocks as a blurred image set.
4. An image fusion method according to claim 1 or 3, characterized in that the convolutional neural network is a two-channel network, each channel is a 5-layer convolutional neural network comprising 3 convolutional layers, 1 max-pooling layer and 1 fully connected layer, and the final output layer is a softmax classifier.
5. An image fusion apparatus, comprising:
the registration module is used for registering an infrared image and a visible light image to obtain a first image and a second image after registration, wherein the first image is a partial image in the infrared image, and the second image is a partial image in the visible light image;
the classification module is used for inputting the first image and the second image into a trained convolutional neural network, and classifying and outputting a first score map and a second score map after the convolutional neural network is trained;
the comparison module is used for comparing corresponding pixels of the first score map and the second score map to obtain a binary map;
a first fusion module, configured to obtain a first fusion image based on the binary map, the first image, and the second image, where the specific expression formula of the first fusion image is: F1(x, y) = D1(x, y)A(x, y) + (1 - D1(x, y))B(x, y)
wherein D1 is the binary map, A is the first image, B is the second image, F1 is the first fused image, and x and y are the coordinate values of the pixel points;
the calculation module is used for calculating a first structural similarity graph of the first image and the first fused image and calculating a second structural similarity graph of the second image and the first fused image;
an obtaining module, configured to obtain a difference map of the first structural similarity map and the second structural similarity map;
the second fusion module is used for obtaining a second fusion image based on the difference image, the first image and the second image;
the obtaining module is specifically configured to: obtaining a difference value between the first structural similarity graph and the second structural similarity graph; taking the absolute value of the difference as a difference map of the first structural similarity map and the second structural similarity map;
the second fusion module is specifically configured to: based on the target area, removing an area irrelevant to the target in the difference image to obtain a target feature extraction image; extracting an image, the first image and the second image according to the target feature to obtain a second fused image; the specific expression formula of the second fusion image is as follows:
F2(x, y) = D2(x, y)A(x, y) + (1 - D2(x, y))B(x, y)
wherein D2 is the target feature extraction image, A is the first image, B is the second image, x and y are the coordinate values of the pixel points, and F2 is the second fused image.
6. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image fusion method according to any one of claims 1 to 4.
CN201811214128.2A 2018-10-18 2018-10-18 Image fusion method and device and readable storage medium Active CN109360179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811214128.2A CN109360179B (en) 2018-10-18 2018-10-18 Image fusion method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811214128.2A CN109360179B (en) 2018-10-18 2018-10-18 Image fusion method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN109360179A CN109360179A (en) 2019-02-19
CN109360179B true CN109360179B (en) 2022-09-02

Family

ID=65345711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811214128.2A Active CN109360179B (en) 2018-10-18 2018-10-18 Image fusion method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN109360179B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415200B (en) * 2019-07-26 2022-03-08 西南科技大学 Method for interpolating among CT (computed tomography) image layers of bone cement implant
CN110555820A (en) * 2019-08-28 2019-12-10 西北工业大学 Image fusion method based on convolutional neural network and dynamic guide filtering
CN112686274B (en) * 2020-12-31 2023-04-18 上海智臻智能网络科技股份有限公司 Target object detection method and device
CN113378009B (en) * 2021-06-03 2023-12-01 上海科技大学 Binary decision diagram-based binary neural network quantitative analysis method
CN114782296B (en) * 2022-04-08 2023-06-09 荣耀终端有限公司 Image fusion method, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103700075A (en) * 2013-12-25 2014-04-02 浙江师范大学 Tetrolet transform-based multichannel satellite cloud picture fusing method
US8755597B1 (en) * 2011-02-24 2014-06-17 Exelis, Inc. Smart fusion of visible and infrared image data
CN106530266A (en) * 2016-11-11 2017-03-22 华东理工大学 Infrared and visible light image fusion method based on area sparse representation

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673396B (en) * 2009-09-07 2012-05-23 南京理工大学 Image fusion method based on dynamic object detection
CN103578092A (en) * 2013-11-11 2014-02-12 西北大学 Multi-focus image fusion method
CN103793896B (en) * 2014-01-13 2017-01-18 哈尔滨工程大学 Method for real-time fusion of infrared image and visible image
CN106709477A (en) * 2017-02-23 2017-05-24 哈尔滨工业大学深圳研究生院 Face recognition method and system based on adaptive score fusion and deep learning
CN107194904B (en) * 2017-05-09 2019-07-19 西北工业大学 NSCT area image fusion method based on supplement mechanism and PCNN
CN107578432B (en) * 2017-08-16 2020-08-14 南京航空航天大学 Target identification method fusing visible light and infrared two-band image target characteristics

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8755597B1 (en) * 2011-02-24 2014-06-17 Exelis, Inc. Smart fusion of visible and infrared image data
CN103700075A (en) * 2013-12-25 2014-04-02 浙江师范大学 Tetrolet transform-based multichannel satellite cloud picture fusing method
CN106530266A (en) * 2016-11-11 2017-03-22 华东理工大学 Infrared and visible light image fusion method based on area sparse representation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Adaptive Fusion of Multimodal Surveillance Image Sequences in Visual Sensor Networks;Dejan Drajic等;《IEEE Transactions on Consumer Electronics》;20071130;第53卷(第4期);第1456-1462页 *
Image Fusion Method Based on Contourlet; 王建 et al.; Microprocessors (《微处理机》); Oct. 2016 (No. 5); pp. 43-47 *
Research on Image Fusion Technology Based on Multi-scale Analysis; 马丽娟; China Master's Theses Full-text Database, Information Science and Technology; Feb. 15, 2018 (No. 02); pp. I138-1545 *

Also Published As

Publication number Publication date
CN109360179A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN109360179B (en) Image fusion method and device and readable storage medium
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN107680054B (en) Multi-source image fusion method in haze environment
CN110941594B (en) Splitting method and device of video file, electronic equipment and storage medium
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN109948566B (en) Double-flow face anti-fraud detection method based on weight fusion and feature selection
CN107563433B (en) Infrared small target detection method based on convolutional neural network
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN110298297B (en) Flame identification method and device
CN107909081B (en) Method for quickly acquiring and quickly calibrating image data set in deep learning
CN111611874B (en) Face mask wearing detection method based on ResNet and Canny
CN105930822A (en) Human face snapshot method and system
CN110298227B (en) Vehicle detection method in unmanned aerial vehicle aerial image based on deep learning
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN109034184B (en) Grading ring detection and identification method based on deep learning
CN107944354B (en) Vehicle detection method based on deep learning
CN111160407A (en) Deep learning target detection method and system
CN108320294B (en) Intelligent full-automatic portrait background replacement method for second-generation identity card photos
CN113435407B (en) Small target identification method and device for power transmission system
CN112614136A (en) Infrared small target real-time instance segmentation method and device
CN111401145A (en) Visible light iris recognition method based on deep learning and DS evidence theory
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN116030396B (en) Accurate segmentation method for video structured extraction
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
Widyantara et al. Gamma correction-based image enhancement and canny edge detection for shoreline extraction from coastal imagery

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant