CN116823694B - Infrared and visible light image fusion method and system based on multi-focus information integration - Google Patents

Infrared and visible light image fusion method and system based on multi-focus information integration

Info

Publication number
CN116823694B
CN116823694B (application CN202311108029.7A)
Authority
CN
China
Prior art keywords
visible light
light source
image
source image
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311108029.7A
Other languages
Chinese (zh)
Other versions
CN116823694A (en)
Inventor
李小松
黎熹来
王晓磐
张莞宁
刘洋
谭海曙
王茗祎
陈健伸
杨乐淼
易鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan University
Original Assignee
Foshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan University filed Critical Foshan University
Priority to CN202311108029.7A priority Critical patent/CN116823694B/en
Publication of CN116823694A publication Critical patent/CN116823694A/en
Application granted granted Critical
Publication of CN116823694B publication Critical patent/CN116823694B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an infrared and visible light image fusion method and system based on multi-focus information integration, wherein the method comprises the following steps: acquiring an infrared image, a first visible light source image and a second visible light source image; decomposing the infrared image, the first visible light source image and the second visible light source image with an image filter based on semi-sparsity; performing salient information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer; and adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result. With the method and the system, the detail information in the visible light images and the infrared thermal radiation target information can be effectively extracted. The method and the system for fusing infrared and visible light images based on multi-focus information integration can be widely applied in the technical field of image fusion.

Description

Infrared and visible light image fusion method and system based on multi-focus information integration
Technical Field
The application relates to the technical field of image fusion, in particular to an infrared and visible light image fusion method and system based on multi-focus information integration.
Background
Infrared and visible light image fusion techniques play a key role in the field of computer vision by integrating valuable information obtained from different sensors. Such techniques can provide a more comprehensive and detailed scene interpretation for advanced visual tasks such as object detection, semantic segmentation and person re-identification. In a fully lit scene, the visible light sensor is adept at capturing rich texture and detail information, providing a finer scene description. However, in low-light conditions or in challenging environments such as smoke or rain, visible light sensors often suffer from serious information loss. Infrared cameras, on the other hand, capture the thermal radiation emitted by targets, making them less susceptible to external disturbances. Nevertheless, they lack sensitivity to complex scene details and mainly provide feature information that helps to pinpoint targets.
In recent years, many effective feature information detection algorithms have been developed in the field of infrared and visible light image fusion; they can be roughly classified into deep learning-based methods and conventional fusion methods. Deep learning-based methods adopt neural networks to imitate the functions of the human brain: by learning from broad datasets, these models build connections and use depth features to reconstruct a fused image rich in complex details. Conventional fusion algorithms can be divided into algorithms based on multi-scale transformations and algorithms based on saliency. The former decompose the source images into multiple scales and directions, integrate the coefficients with fusion rules and reconstruct the fused image from the fusion coefficients. Saliency-based algorithms aim to preserve the salient regions identified in the different source images; they typically use various feature extraction operators to calculate a saliency map and build fusion weights on this basis, which helps to reduce the redundancy of pixel information and improve the visual quality of the fusion result. While many algorithms can provide high-quality fusion results in infrared and visible light image fusion tasks, they generally assume that the scene information captured by the visible light imaging device is always fully in focus. In practice, however, only objects within the depth of field can be in focus due to the limitations of the optical lens. Thus, when the camera cannot capture all of the target information in the scene at the same time, partial regions within the scene become blurred and a single fully focused image of the scene cannot be obtained, so multiple sets of data must be captured to ensure that all of the target information lies within a focus region.
Disclosure of Invention
In order to solve the technical problems, the application aims to provide an infrared and visible light image fusion method and system based on multi-focus information integration, which can effectively extract the detail information in the visible light images and the infrared thermal radiation target information.
The first technical scheme adopted by the application is as follows: the infrared and visible light image fusion method based on multi-focus information integration comprises the following steps:
acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
decomposing the infrared image, the first visible light source image and the second visible light source image by using an image filter based on semi-sparsity to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
performing salient information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer;
and adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
Further, the step of decomposing the infrared image, the first visible light source image and the second visible light source image by the image filter based on semi-sparsity to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image specifically includes:
performing smoothing operation on the infrared image, the first visible light source image and the second visible light source image by using an image filter based on semi-sparsity to obtain a structural layer of the infrared image, a structural layer of the first visible light source image and a structural layer of the second visible light source image;
performing subtraction processing on the infrared image and the structural layer of the infrared image, performing subtraction processing on the first visible light source image and the structural layer of the first visible light source image, and performing subtraction processing on the second visible light source image and the structural layer of the second visible light source image to obtain a texture layer of the infrared image, a texture layer of the first visible light source image and a texture layer of the second visible light source image;
and integrating the structural layer of the infrared image and the texture layer of the infrared image, the structural layer of the first visible light source image and the texture layer of the first visible light source image, and the structural layer of the second visible light source image and the texture layer of the second visible light source image respectively to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image.
Further, the expression of the semi-sparsity image filter is specifically as follows:
$$u=\arg\min_{u}\;\lVert u-f\rVert_2^2+\alpha\sum_{k=1}^{n-1}\lVert\nabla^{k}u\rVert_2^2+\lambda\,\lVert Z\odot\nabla^{n}u\rVert_0$$

In the above formula, f represents the input image, u represents the output image, α and λ represent the balance weights, Z represents a confidence map specifying the spatially varying smoothness of u, ∇^n represents the nth differential operator, ∇^k u represents the kth higher-order gradient of image u, and ∇^k represents the kth differential operator, where k = 1, ..., n-1.
Further, the step of performing significant information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer specifically includes:
performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image through a feature extraction operator to obtain a salient feature map of the texture layer of the infrared image, a salient feature map of the texture layer of the first visible light source image and a salient feature map of the texture layer of the second visible light source image;
performing absolute value maximization processing on the salient feature map of the infrared image texture layer, the salient feature map of the first visible light source image texture layer and the salient feature map of the second visible light source image texture layer to obtain a focusing decision map of the infrared image texture layer, a focusing decision map of the first visible light source image texture layer and a focusing decision map of the second visible light source image texture layer;
integrating the focusing decision diagram of the infrared image texture layer, the focusing decision diagram of the first visible light source image texture layer and the focusing decision diagram of the second visible light source image texture layer to obtain a focusing texture diagram;
performing salient feature extraction processing on the focusing texture map to obtain a salient feature map of the focusing texture map;
based on texture layer fusion rules, carrying out fusion processing on the salient feature images of the focused texture images to obtain fusion texture layers;
and taking the distribution of energy information into consideration, carrying out fusion processing on the structural layer of the infrared image, the structural layer of the first visible light source image and the structural layer of the second visible light source image to obtain a fusion structural layer.
Further, the expression of performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image by the feature extraction operator is specifically as follows:
$$TM_m=SM_m\odot\sqrt{\big(G_x^m\big)^2+\big(G_y^m\big)^2}$$

In the above formula, TM_m represents the saliency map of the texture layer, SM_m represents the salient pixel information of the image texture layer, G_x^m represents the gradient information of the image texture layer in the x direction, and G_y^m represents the gradient information of the image texture layer in the y direction.
Further, the expression of the texture layer fusion rule is specifically as follows:
$$FT(x,y)=\begin{cases}T_4(x,y), & TM_4(x,y)\ge TM_3(x,y)\\ T_3(x,y), & \text{otherwise}\end{cases}$$

In the above formula, FT represents the fused texture layer, T_4 represents the focused texture map and TM_4 represents its salient feature map, T_m represents the texture layer of an image and TM_m its salient feature map, and TM_3 represents the salient feature map of the infrared texture layer T_3.
Further, the step of performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image by using a feature extraction operator to obtain a salient feature map of the texture layer of the infrared image, a salient feature map of the texture layer of the first visible light source image and a salient feature map of the texture layer of the second visible light source image specifically includes:
global gradient information detection is carried out on a texture layer of the infrared image, a texture layer of the first visible light source image and a texture layer of the second visible light source image, and gradient information of the texture layer of the infrared image, gradient information of the texture layer of the first visible light source image and gradient information of the texture layer of the second visible light source image are obtained;
decomposing the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image based on the Gaussian pyramid to obtain an infrared image texture layer with a plurality of scales, a first visible light source image texture layer with a plurality of scales and a second visible light source image texture layer with a plurality of scales;
performing image salient pixel information detection processing on an infrared image texture layer with a plurality of scales, a first visible light source image texture layer with a plurality of scales and a second visible light source image texture layer with a plurality of scales through a Laplacian pyramid to obtain characteristic information of the infrared image texture layer, characteristic information of the first visible light source image texture layer and characteristic information of the second visible light source image texture layer;
and integrating the gradient information of the infrared image texture layer and the characteristic information of the infrared image texture layer, the gradient information of the first visible light source image texture layer and the characteristic information of the first visible light source image texture layer, the gradient information of the second visible light source image texture layer and the characteristic information of the second visible light source image texture layer respectively to obtain a salient feature map of the infrared image texture layer, a salient feature map of the first visible light source image texture layer and a salient feature map of the second visible light source image texture layer.
The step of fusing the structural layer of the infrared image, the structural layer of the first visible light source image and the structural layer of the second visible light source image to obtain a fused structural layer by considering the distribution of the energy information specifically comprises the following steps:
respectively calculating the frequency variance of the discrete cosine transform block of the infrared image structure layer, the frequency variance of the discrete cosine transform block of the first visible light source image structure layer and the frequency variance of the discrete cosine transform block of the second visible light source image structure layer;
averaging the frequency variance of the discrete cosine transform block of the infrared image structure layer, the frequency variance of the discrete cosine transform block of the first visible light source image structure layer and the frequency variance of the discrete cosine transform block of the second visible light source image structure layer to obtain a first characteristic value of the fusion structure layer;
respectively calculating the entropy of the infrared image structure layer, the entropy of the first visible light source image structure layer and the entropy of the second visible light source image structure layer to obtain a second characteristic value of the fusion structure layer;
and based on the structural layer fusion rule, carrying out fusion processing on the first characteristic value of the fusion structural layer and the second characteristic value of the fusion structural layer to obtain the fusion structural layer.
The expression of the structural layer fusion rule is specifically as follows:
$$FS=\sum_{m}w_m\,S_m,\qquad w_m=\frac{E_m\,\psi_m}{\sum_{k}E_k\,\psi_k}$$

In the above formula, FS represents the fusion structural layer, E_m represents the entropy of the image structure layer, ψ_m represents the frequency variance of the discrete cosine transform blocks of the image structure layer, S_m represents the structural layer of the image, and w_m is the normalized fusion weight.
The second technical scheme adopted by the application is as follows: an infrared and visible light image fusion system based on multi-focus information integration, comprising:
the acquisition module is used for acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
the decomposition module is used for decomposing the infrared image, the first visible light source image and the second visible light source image based on the semi-sparsity image filter to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
the fusion module is used for carrying out salient information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer;
and the adding module is used for adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
The method and the system have the beneficial effects that: according to the application, the infrared image and the visible light source images are obtained, the infrared image, the first visible light source image and the second visible light source image are decomposed based on the semi-sparsity image filter, and the gradient information and the salient pixel feature information of the three images are extracted by utilizing the complementary information among images of different modalities; the distribution of the image energy information is taken into account when the images are fused. The method can detect important information from images of different modalities while considering the focusing characteristics of the pixel points, can effectively handle various complex scenes, and can effectively distinguish the effective pixel information coming from images of different modalities.
Drawings
FIG. 1 is a flow chart showing the steps of an infrared and visible image fusion method based on multi-focus information integration according to an embodiment of the present application;
FIG. 2 is a block diagram of an infrared and visible image fusion system based on multi-focus information integration in accordance with an embodiment of the present application;
FIG. 3 is a schematic flow diagram of a method for fusing infrared and visible images according to an embodiment of the present application;
fig. 4 is a schematic diagram of the results of a comparison experiment between the method of the present application and 7 existing image fusion algorithms.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1 and 3, the present application provides an infrared and visible light image fusion method based on multi-focusing information integration, the method comprising the steps of:
s1, acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
Specifically, a registered infrared image I_3 and visible light source images I_1 and I_2 with different focus areas are acquired. Registered means that the resolutions of the input infrared image and the visible light images are consistent, the relative positions of all objects in the image scene are consistent, and the geometric shapes are aligned; since the application addresses registered image fusion, the different input images are required to be registered, i.e. images acquired at different times or from different angles can be aligned correctly. Having different focus areas means that, in the same scene, the imaged picture is often partly focused and partly blurred due to the camera's depth-of-field limitation (e.g. the foreground in focus with the background blurred, or the background in focus with the foreground blurred); such an image is referred to as a multi-focus image. Different multi-focus images of the same scene are focused on different areas, i.e. they are focus-complementary: for two multi-focus images, the blurred region of either image is clear in the other, differently focused image. Through multi-focus image fusion technology, the focus information of multiple multi-focus images of the same scene can be integrated to obtain a fully focused image.
S2, decomposing the infrared image, the first visible light source image and the second visible light source image by using an image filter based on semi-sparsity to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
specifically, the edge preserving filter can effectively smooth most textures and detail information in the source image and simultaneously preserve the strength of structural edges, and in order to better distinguish pixel points representing different characteristic information of the image, the application introduces an image filter based on semi-sparsity to carry out smoothing treatment on the source image, wherein the filter can be expressed as follows:
$$u=\arg\min_{u}\;\lVert u-f\rVert_2^2+\alpha\sum_{k=1}^{n-1}\lVert\nabla^{k}u\rVert_2^2+\lambda\,\lVert Z\odot\nabla^{n}u\rVert_0$$

In the above formula, f represents the input image, u represents the output image, α and λ represent the balance weights, Z represents a confidence map specifying the spatially varying smoothness of u, ∇^n represents the nth differential operator, ∇^k u represents the kth higher-order gradient of image u, and ∇^k represents the kth differential operator.

In the present application, n is set to 2 as the highest order of regularization, and α and λ are set to 0.8 and 0.05, respectively. Finally, the semi-sparsity filter smoothing operation is expressed as

$$u=F_s(f)$$

where F_s(·) represents the semi-sparsity filtering operator.

Inputting the visible light images I_1 and I_2 with different focus areas and the corresponding infrared image I_3, the structure layers S_m are obtained by the operation

$$S_m=F_s(I_m)$$

where S_m represents a structure layer and m = 1, 2, 3.

The texture layers of the infrared image, the first visible light source image and the second visible light source image are then calculated as

$$T_m=I_m-S_m$$

where T_m represents the texture layer.
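For illustration, the decomposition of step S2 can be sketched as follows. Since the semi-sparsity solver itself is not reproduced here, the sketch substitutes OpenCV's bilateral filter for F_s(·) as an assumed edge-preserving surrogate; all function names and parameter values are illustrative assumptions rather than the patent's exact implementation.

```python
import cv2
import numpy as np

def smooth_structure(img: np.ndarray) -> np.ndarray:
    """Stand-in for the semi-sparsity filter F_s: an edge-preserving smoother
    that flattens fine texture while keeping strong structural edges."""
    # Bilateral filter used purely as an illustrative surrogate for F_s.
    return cv2.bilateralFilter(np.float32(img), d=9, sigmaColor=25, sigmaSpace=9)

def decompose(img: np.ndarray):
    """Split an image into a structure layer S = F_s(I) and a texture layer T = I - S."""
    s = smooth_structure(img)
    t = np.float32(img) - s
    return s, t

# I1, I2: registered visible images with different focus areas; I3: infrared image.
# (S1, T1), (S2, T2), (S3, T3) = decompose(I1), decompose(I2), decompose(I3)
```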
S3, performing significant information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer;
s31, fusing the infrared image, the first visible light source image and the second visible light source image texture layer;
Specifically, the fusion of texture information relies on detecting focused and sharp detail information and on exploiting the complementary information between images of different modalities; to this end, the application designs a novel feature extraction operator to meet this challenge. The operator consists of two important parts, namely global gradient information detection and multi-scale feature extraction. First, the gradient information of the texture layer is obtained through the global gradient detection operator:

$$G_m=\sqrt{\big(G_x^m\big)^2+\big(G_y^m\big)^2}$$

In the above formula, G_x^m and G_y^m represent the gradients of the texture layer T_m in the x and y directions, the balance parameter ε is set to the constant 0.8, and G_m represents the gradient map.
The image pyramid is an efficient multi-scale representation method, and the multi-scale feature extraction operator is constructed on the basis of a Gaussian pyramid and a Laplacian pyramid. Given an input image I, it is decomposed into N scales using the Gaussian pyramid {P^1, ..., P^N}, and each corresponding Laplacian pyramid layer is L^n = P^n − Up(P^{n+1}) for n < N, with L^N = P^N. The object of the application is to detect the salient pixel information of the image at each scale n; the specific procedure is as follows:

$$SM=\sum_{n=1}^{N}\mathrm{Up}_n\big(SF(L^{n})\big)$$

In the above formula, SF(·) represents the spatial frequency operator, SM represents the significance measure map, P^n represents the nth-layer Gaussian pyramid of image I, L^n represents the nth-layer Laplacian pyramid of image I, and Up_n(·) upsamples the nth-layer result back to the input resolution.
Finally, the global gradient measurement and the significance measurement are integrated, and the novel feature extraction operator provided by the application can be expressed as follows:

$$TM_m=SM_m\odot G_m=SM_m\odot\sqrt{\big(G_x^m\big)^2+\big(G_y^m\big)^2}$$

In the above formula, TM_m represents the saliency map of the texture layer, SM_m represents the salient pixel information of the image texture layer, and G_x^m and G_y^m represent the gradient information of the image texture layer in the x and y directions.
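A minimal sketch of this operator follows, assuming Sobel gradients, a three-level pyramid and a 7×7 spatial frequency window; none of these specific choices is fixed by the text above.

```python
import cv2
import numpy as np

def spatial_frequency(img: np.ndarray, win: int = 7) -> np.ndarray:
    """SF: windowed spatial frequency from row and column first differences."""
    dx = np.zeros_like(img); dx[:, 1:] = img[:, 1:] - img[:, :-1]
    dy = np.zeros_like(img); dy[1:, :] = img[1:, :] - img[:-1, :]
    rf2 = cv2.boxFilter(dx * dx, -1, (win, win))   # row-frequency energy
    cf2 = cv2.boxFilter(dy * dy, -1, (win, win))   # column-frequency energy
    return np.sqrt(rf2 + cf2)

def significance_measure(T: np.ndarray, levels: int = 3) -> np.ndarray:
    """SM: spatial frequency of each Laplacian pyramid layer, summed at full size."""
    gp = [np.float32(T)]
    for _ in range(levels - 1):
        gp.append(cv2.pyrDown(gp[-1]))
    h, w = gp[0].shape
    sm = np.zeros((h, w), np.float32)
    for n in range(levels):
        if n < levels - 1:   # L^n = P^n - Up(P^(n+1)); the last layer is P^N itself
            lap = gp[n] - cv2.resize(gp[n + 1], (gp[n].shape[1], gp[n].shape[0]))
        else:
            lap = gp[n]
        sm += cv2.resize(spatial_frequency(lap), (w, h))
    return sm

def texture_saliency(T: np.ndarray) -> np.ndarray:
    """TM = SM * sqrt(Gx^2 + Gy^2), the combined feature extraction operator."""
    gx = cv2.Sobel(np.float32(T), cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(np.float32(T), cv2.CV_32F, 0, 1, ksize=3)
    return significance_measure(T) * np.sqrt(gx * gx + gy * gy)
```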
Under the guidance of TM_m, the application first obtains a focus decision map D through the absolute-maximum rule, expressed as follows:

$$D(x,y)=\begin{cases}1, & TM_1(x,y)\ge TM_2(x,y)\\ 0, & \text{otherwise}\end{cases}$$

In the above formula, D represents the focus decision map, TM_1 represents the salient feature map of texture layer T_1, and TM_2 represents the salient feature map of texture layer T_2.
According to D, the application integrates the focused, sharp details to obtain the focused texture map T_4, expressed as follows:

$$T_4=D\odot T_1+(1-D)\odot T_2$$

In the above formula, T_1 represents the texture layer of source image I_1 and T_2 represents the texture layer of source image I_2.

After obtaining T_4 and its salient feature map TM_4, the final fused texture layer FT is constructed by the following rule:

$$FT(x,y)=\begin{cases}T_4(x,y), & TM_4(x,y)\ge TM_3(x,y)\\ T_3(x,y), & \text{otherwise}\end{cases}$$

In the above formula, FT represents the fused texture layer, TM_4 represents the salient feature map of the focused texture layer T_4, T_3 represents the texture layer of the infrared image, and TM_3 represents its salient feature map.
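The three rules above combine into a short texture-fusion routine; `texture_saliency` is the assumed helper from the previous sketch, not the patent's exact operator.

```python
import numpy as np

def fuse_texture(T1, T2, T3, saliency=texture_saliency):
    """Fuse the visible texture layers T1/T2 (different focus areas) with the
    infrared texture layer T3 following the decision rules above."""
    TM1, TM2 = saliency(T1), saliency(T2)
    D = (TM1 >= TM2).astype(np.float32)      # focus decision map D
    T4 = D * T1 + (1.0 - D) * T2             # focused texture map T4
    TM3, TM4 = saliency(T3), saliency(T4)
    FT = np.where(TM4 >= TM3, T4, T3)        # fused texture layer FT
    return FT
```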
S32, fusing the infrared image, the first visible light source image and the second visible light source image structural layer.
Specifically, the structural layer contains the low-frequency information of a source image. The distribution of energy information is considered from the aspects of the entropy and the multi-directional frequency variance of the image. First, the frequency variance ψ_m of the fixed-size discrete cosine transform (DCT) blocks of each structural layer is calculated:

$$\psi=\frac{1}{4}\sum_{d=1}^{4}\big(\sigma_d-\mu\big)^2$$

In the above formula, σ_d represents the standard deviation of the DCT block coefficients in the dth of the four directions, and μ represents the mean of σ_d over the four directions.

The variance of every block is calculated and the average of these variances serves as the first characteristic value of the structural layer, while the entropy serves as the second characteristic value; the fusion rule of the structural layer is designed from both.

Thus, the fused structural layer FS is constructed as follows:

$$FS=\sum_{m=1}^{3}w_m\,S_m,\qquad w_m=\frac{E_m\,\psi_m}{\sum_{k=1}^{3}E_k\,\psi_k}$$

In the above formula, w_m represents the fusion weight of structural layer S_m, E_m represents the entropy of structural layer S_m, FS represents the fused structural layer, ψ_m represents the frequency variance of the DCT blocks of the image structural layer, and S_m represents the structural layer of the image.
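A sketch of the structural-layer fusion is given below. The 8×8 block size and the reading of the "four directions" as the first row, first column and the two diagonals of each DCT block are assumptions made for illustration; the text above does not pin these details down.

```python
import cv2
import numpy as np

def dct_frequency_variance(S: np.ndarray, block: int = 8) -> float:
    """psi_m: variance of four directional DCT standard deviations per block,
    averaged over all blocks of the structural layer."""
    S = np.float32(S)
    h = S.shape[0] - S.shape[0] % block
    w = S.shape[1] - S.shape[1] % block
    block_vars = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            c = cv2.dct(S[i:i + block, j:j + block])
            sd = np.array([c[0, :].std(), c[:, 0].std(),    # horizontal, vertical
                           np.diag(c).std(),                 # main diagonal
                           np.diag(np.fliplr(c)).std()])     # anti-diagonal
            block_vars.append(((sd - sd.mean()) ** 2).mean())
    return float(np.mean(block_vars))

def image_entropy(S: np.ndarray, bins: int = 256) -> float:
    """E_m: Shannon entropy of the structural layer's intensity histogram."""
    hist, _ = np.histogram(S, bins=bins)
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fuse_structure(structs):
    """FS = sum_m w_m * S_m with w_m proportional to E_m * psi_m."""
    w = np.array([image_entropy(S) * dct_frequency_variance(S) for S in structs])
    w /= w.sum()
    return sum(wi * np.float32(S) for wi, S in zip(w, structs))
```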
And S4, adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
Specifically, the expression of the final image fusion result is as follows:

$$F=FT+FS$$

In the above formula, F represents the final image fusion result, FT represents the fused texture layer, and FS represents the fused structural layer.
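Tying the sketches together, the whole pipeline reduces to a few lines; every helper name below comes from the illustrative sketches above, not from the patent itself.

```python
def fuse_images(I1, I2, I3):
    """End-to-end sketch: decompose, fuse texture and structure, add the layers."""
    (S1, T1), (S2, T2), (S3, T3) = decompose(I1), decompose(I2), decompose(I3)
    FT = fuse_texture(T1, T2, T3)        # fused texture layer
    FS = fuse_structure([S1, S2, S3])    # fused structural layer
    return FT + FS                       # final result F = FT + FS
```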
In summary, the present application first decomposes an image into structure and texture components based on semi-sparsity image smoothing filtering. For the fusion of the texture components, the application designs a novel multi-scale salient information detection operator which simultaneously considers the focus information of pixels and the useful pixel information from images of different modalities. In addition, the application considers the distribution of energy information in the structure components from the angles of multi-directional frequency variance and information entropy, thereby effectively capturing scene brightness information and maintaining reasonable contrast.
Referring to fig. 2, an infrared and visible light image fusion system based on multi-focus information integration, comprising:
the acquisition module is used for acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
the decomposition module is used for decomposing the infrared image, the first visible light source image and the second visible light source image based on the semi-sparsity image filter to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
the fusion module is used for carrying out salient information fusion processing on the decomposed infrared image, the decomposed first visible light source image and the decomposed second visible light source image to obtain a fusion texture layer and a fusion structure layer;
and the adding module is used for adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
In order to further demonstrate the advantages and effectiveness of the present application, a set of comparative experiments was performed against 7 existing image fusion algorithms, analyzing the performance of each algorithm in terms of subjective visual assessment. The comparative methods, shown as (a)-(g) in fig. 4, are: (a) infrared and visible image fusion using a latent low-rank representation method; (b) infrared and visible image fusion based on target-enhanced multi-scale transform decomposition; (c) a generative adversarial network with multi-classification constraints for infrared and visible image fusion; (d) learning a deep multi-scale feature ensemble and edge-attention guidance for image fusion; (e) image fusion in the loop of high-level vision tasks: a semantic-aware real-time infrared and visible image fusion network; (f) semantically supervised infrared and visible image fusion via a dual-discriminator generative adversarial network; (g) a unified unsupervised image fusion network. (h) is the algorithm provided by the application. As can be seen from fig. 4, in method (b) the person information is completely masked by the smoke, and methods (d) and (e) are also severely disturbed by the smoke and cannot maintain reasonable contrast. Evidently, the pixel information of methods (a) and (c) is mostly derived from the infrared image, which loses part of the detail information of the visible image. The algorithm provided by the application is undisturbed by the smoke while retaining good detail-preservation capability, and can effectively distinguish the useful pixel information coming from images of different modalities.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (9)

1. The infrared and visible light image fusion method based on multi-focus information integration is characterized by comprising the following steps of:
acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
decomposing the infrared image, the first visible light source image and the second visible light source image by using an image filter based on semi-sparsity to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image through a feature extraction operator to obtain a salient feature map of the texture layer of the infrared image, a salient feature map of the texture layer of the first visible light source image and a salient feature map of the texture layer of the second visible light source image;
performing absolute value maximization processing on the salient feature map of the infrared image texture layer, the salient feature map of the first visible light source image texture layer and the salient feature map of the second visible light source image texture layer to obtain a focusing decision map of the infrared image texture layer, a focusing decision map of the first visible light source image texture layer and a focusing decision map of the second visible light source image texture layer;
integrating the focusing decision diagram of the infrared image texture layer, the focusing decision diagram of the first visible light source image texture layer and the focusing decision diagram of the second visible light source image texture layer to obtain a focusing texture diagram;
performing salient feature extraction processing on the focusing texture map to obtain a salient feature map of the focusing texture map;
based on texture layer fusion rules, carrying out fusion processing on the salient feature images of the focused texture images to obtain fusion texture layers;
taking the distribution of energy information into consideration, carrying out fusion treatment on the structural layer of the infrared image, the structural layer of the first visible light source image and the structural layer of the second visible light source image to obtain a fusion structural layer;
and adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
2. The method for integrating infrared and visible light images based on multi-focusing information according to claim 1, wherein the step of decomposing the infrared image, the first visible light source image and the second visible light source image by the image filter based on semi-sparsity to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image specifically comprises the steps of:
performing smoothing operation on the infrared image, the first visible light source image and the second visible light source image by using an image filter based on semi-sparsity to obtain a structural layer of the infrared image, a structural layer of the first visible light source image and a structural layer of the second visible light source image;
performing subtraction processing on the infrared image and the structural layer of the infrared image, performing subtraction processing on the first visible light source image and the structural layer of the first visible light source image, and performing subtraction processing on the second visible light source image and the structural layer of the second visible light source image to obtain a texture layer of the infrared image, a texture layer of the first visible light source image and a texture layer of the second visible light source image;
and integrating the structural layer of the infrared image and the texture layer of the infrared image, the structural layer of the first visible light source image and the texture layer of the first visible light source image, and the structural layer of the second visible light source image and the texture layer of the second visible light source image respectively to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image.
3. The method for integrating infrared and visible light images based on multi-focusing information according to claim 2, wherein the expression of the semi-sparsity image filter is specifically as follows:
$$u=\arg\min_{u}\;\lVert u-f\rVert_2^2+\alpha\sum_{k}\lVert\nabla^{k}u\rVert_2^2+\lambda\,\lVert Z\odot\nabla^{n}u\rVert_0$$

In the above formula, f represents the input image, u represents the output image, α and λ represent the balance weights, Z represents a parameter controlling the smoothness of the output image u, ∇^n represents the nth derivative operator, ∇^k u represents the kth higher-order gradient of image u, and ∇^k represents the kth differential operator, where k = 1, 2, 3, 4, 5.
4. The method for integrating infrared and visible light images based on multi-focusing information according to claim 3, wherein the expression of the feature extraction operator for performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image is specifically as follows:
$$TM_m=SM_m\odot\sqrt{\big(G_x^m\big)^2+\big(G_y^m\big)^2}$$

In the above formula, TM_m represents the saliency map of the texture layer, SM_m represents the salient pixel information of the image texture layer, G_x^m represents the gradient information of the image texture layer in the x direction, and G_y^m represents the gradient information of the image texture layer in the y direction.
5. The method for integrating infrared and visible light images based on multi-focus information according to claim 4, wherein the expression of the texture layer integration rule is specifically as follows:
$$FT(x,y)=\begin{cases}T_4(x,y), & TM_4(x,y)\ge TM_3(x,y)\\ T_3(x,y), & \text{otherwise}\end{cases}$$

In the above formula, FT represents the fused texture layer, TM_4 represents the salient feature map of the focused texture layer T_4, TM_m represents the salient feature map of texture layer T_m, and TM_3 represents the salient feature map of texture layer T_3.
6. The method for integrating infrared and visible light images based on multi-focusing information according to claim 5, wherein the step of performing feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image, and the texture layer of the second visible light source image by the feature extraction operator to obtain a salient feature map of the texture layer of the infrared image, a salient feature map of the texture layer of the first visible light source image, and a salient feature map of the texture layer of the second visible light source image specifically comprises:
global gradient information detection is carried out on a texture layer of the infrared image, a texture layer of the first visible light source image and a texture layer of the second visible light source image, and gradient information of the texture layer of the infrared image, gradient information of the texture layer of the first visible light source image and gradient information of the texture layer of the second visible light source image are obtained;
decomposing the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image based on the Gaussian pyramid to obtain an infrared image texture layer with a plurality of scales, a first visible light source image texture layer with a plurality of scales and a second visible light source image texture layer with a plurality of scales;
performing image salient pixel information detection processing on an infrared image texture layer with a plurality of scales, a first visible light source image texture layer with a plurality of scales and a second visible light source image texture layer with a plurality of scales through a Laplacian pyramid to obtain characteristic information of the infrared image texture layer, characteristic information of the first visible light source image texture layer and characteristic information of the second visible light source image texture layer;
and integrating the gradient information of the infrared image texture layer and the characteristic information of the infrared image texture layer, the gradient information of the first visible light source image texture layer and the characteristic information of the first visible light source image texture layer, the gradient information of the second visible light source image texture layer and the characteristic information of the second visible light source image texture layer respectively to obtain a salient feature map of the infrared image texture layer, a salient feature map of the first visible light source image texture layer and a salient feature map of the second visible light source image texture layer.
7. The method for fusing infrared and visible light images based on multi-focus information integration according to claim 6, wherein the step of fusing the structural layer of the infrared image, the structural layer of the first visible light source image, and the structural layer of the second visible light source image to obtain a fused structural layer by considering the distribution of energy information specifically comprises:
respectively calculating the frequency variance of the discrete cosine transform block of the infrared image structure layer, the frequency variance of the discrete cosine transform block of the first visible light source image structure layer and the frequency variance of the discrete cosine transform block of the second visible light source image structure layer;
averaging the frequency variance of the discrete cosine transform block of the infrared image structure layer, the frequency variance of the discrete cosine transform block of the first visible light source image structure layer and the frequency variance of the discrete cosine transform block of the second visible light source image structure layer to obtain a first characteristic value of the fusion structure layer;
respectively calculating the entropy of the infrared image structure layer, the entropy of the first visible light source image structure layer and the entropy of the second visible light source image structure layer to obtain a second characteristic value of the fusion structure layer;
and based on the structural layer fusion rule, carrying out fusion processing on the first characteristic value of the fusion structural layer and the second characteristic value of the fusion structural layer to obtain the fusion structural layer.
8. The method for integrating infrared and visible light images based on multi-focusing information according to claim 7, wherein the expression of the structural layer integration rule is specifically as follows:
$$FS=\sum_{m}w_m\,S_m,\qquad w_m=\frac{E_m\,\psi_m}{\sum_{k}E_k\,\psi_k}$$

In the above formula, FS represents the fusion structural layer, E_m represents the entropy of the image structure layer, ψ_m represents the frequency variance of the discrete cosine transform blocks of the image structure layer, S_m represents the structural layer of the image, and w_m is the normalized fusion weight.
9. The infrared and visible light image fusion system based on multi-focus information integration is characterized by comprising the following modules:
the acquisition module is used for acquiring an infrared image, a first visible light source image and a second visible light source image, wherein the first visible light source image and the second visible light source image have different focusing areas;
the decomposition module is used for decomposing the infrared image, the first visible light source image and the second visible light source image based on the semi-sparsity image filter to obtain a decomposed infrared image, a decomposed first visible light source image and a decomposed second visible light source image;
the fusion module is used for carrying out feature detection processing on the texture layer of the infrared image, the texture layer of the first visible light source image and the texture layer of the second visible light source image through the feature extraction operator to obtain a salient feature map of the texture layer of the infrared image, a salient feature map of the texture layer of the first visible light source image and a salient feature map of the texture layer of the second visible light source image; performing absolute value maximization processing on the salient feature map of the infrared image texture layer, the salient feature map of the first visible light source image texture layer and the salient feature map of the second visible light source image texture layer to obtain a focusing decision map of the infrared image texture layer, a focusing decision map of the first visible light source image texture layer and a focusing decision map of the second visible light source image texture layer; integrating the focusing decision diagram of the infrared image texture layer, the focusing decision diagram of the first visible light source image texture layer and the focusing decision diagram of the second visible light source image texture layer to obtain a focusing texture diagram; performing salient feature extraction processing on the focusing texture map to obtain a salient feature map of the focusing texture map; based on texture layer fusion rules, carrying out fusion processing on the salient feature images of the focused texture images to obtain fusion texture layers; taking the distribution of energy information into consideration, carrying out fusion treatment on the structural layer of the infrared image, the structural layer of the first visible light source image and the structural layer of the second visible light source image to obtain a fusion structural layer;
and the adding module is used for adding the fusion texture layer and the fusion structure layer to obtain a final image fusion result.
CN202311108029.7A 2023-08-31 2023-08-31 Infrared and visible light image fusion method and system based on multi-focus information integration Active CN116823694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311108029.7A CN116823694B (en) 2023-08-31 2023-08-31 Infrared and visible light image fusion method and system based on multi-focus information integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311108029.7A CN116823694B (en) 2023-08-31 2023-08-31 Infrared and visible light image fusion method and system based on multi-focus information integration

Publications (2)

Publication Number Publication Date
CN116823694A CN116823694A (en) 2023-09-29
CN116823694B true CN116823694B (en) 2023-11-24

Family

ID=88127877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311108029.7A Active CN116823694B (en) 2023-08-31 2023-08-31 Infrared and visible light image fusion method and system based on multi-focus information integration

Country Status (1)

Country Link
CN (1) CN116823694B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218048B (en) * 2023-11-07 2024-03-08 天津市测绘院有限公司 Infrared and visible light image fusion method based on three-layer sparse smooth model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223265A (en) * 2019-05-05 2019-09-10 中国地质大学(武汉) The significant image interfusion method of target and system decomposed based on multi-scale transform
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN113674319A (en) * 2021-08-23 2021-11-19 浙江大华技术股份有限公司 Target tracking method, system, equipment and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069768B (en) * 2015-08-05 2017-12-29 武汉高德红外股份有限公司 A kind of visible images and infrared image fusion processing system and fusion method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223265A (en) * 2019-05-05 2019-09-10 中国地质大学(武汉) The significant image interfusion method of target and system decomposed based on multi-scale transform
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN113674319A (en) * 2021-08-23 2021-11-19 浙江大华技术股份有限公司 Target tracking method, system, equipment and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Infrared and visible image fusion using a sparse representation model with structural and consistency constraints; Wang Fan; Wang Yi; Liu Yang; Journal of Signal Processing (04); 102-113 *
Infrared and visible image fusion based on convolutional neural networks; Dong Anyong; Du Qingzhi; Su Bin; Zhao Wenbo; Yu Wen; Infrared Technology (07); 60-69 *

Also Published As

Publication number Publication date
CN116823694A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
Yang et al. Infrared and visible image fusion using visual saliency sparse representation and detail injection model
Bhat et al. Multi-focus image fusion techniques: a survey
Işık et al. SWCD: a sliding window and self-regulated learning-based background updating method for change detection in videos
Chen et al. Visual depth guided color image rain streaks removal using sparse coding
Shin et al. Radiance–reflectance combined optimization and structure-guided $\ell _0 $-Norm for single image dehazing
CN112052831B (en) Method, device and computer storage medium for face detection
CN109509164B (en) Multi-sensor image fusion method and system based on GDGF
CN109685045B (en) Moving target video tracking method and system
CN110956661B (en) Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN109509163B (en) FGF-based multi-focus image fusion method and system
CN116823694B (en) Infrared and visible light image fusion method and system based on multi-focus information integration
Liu et al. A computationally efficient denoising and hole-filling method for depth image enhancement
Chen et al. Visual depth guided image rain streaks removal via sparse coding
CN105608674B (en) A kind of image enchancing method based on image registration, interpolation and denoising
Yang et al. Raindrop removal with light field image using image inpainting
Jung et al. Multispectral fusion of rgb and nir images using weighted least squares and convolution neural networks
Farhood et al. 3D point cloud reconstruction from a single 4D light field image
CN115937021A (en) Polarization defogging method based on frequency domain feature separation and iterative optimization of atmospheric light
Zhou et al. Scale-aware multispectral fusion of RGB and NIR images based on alternating guidance
Zhang et al. Light field salient object detection via hybrid priors
Cao et al. Depth image vibration filtering and shadow detection based on fusion and fractional differential
CN104732503B (en) Image defogging Enhancement Method and device
Nayagi et al. An efficiency correlation between various image fusion techniques
Ukai et al. Facial skin blood perfusion change based liveness detection using video images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant