CN115170810A - Visible light and infrared image fusion target detection and instance segmentation method

Visible light and infrared image fusion target detection and instance segmentation method

Info

Publication number
CN115170810A
CN115170810A
Authority
CN
China
Prior art keywords
image
infrared
visible light
target
fusion
Prior art date
Legal status
Granted
Application number
CN202211095330.4A
Other languages
Chinese (zh)
Other versions
CN115170810B (en)
Inventor
任侃
赵俊逸
钱惟贤
顾国华
陈钱
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211095330.4A priority Critical patent/CN115170810B/en
Publication of CN115170810A publication Critical patent/CN115170810A/en
Application granted granted Critical
Publication of CN115170810B publication Critical patent/CN115170810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a visible light and infrared image fusion target detection and instance segmentation method. The infrared image and the visible light image are registered: the same targets are calibrated in both images, and the infrared image is translated and scaled according to the calibration result to obtain an infrared registration image. Target position coordinates are extracted from the infrared registration image with a yolov5 neural network, and masks are extracted from the infrared registration image with a Mask R-CNN neural network. A fusion range is determined from the target position coordinates, and the visible light image and the infrared registration image are fused within that range using the ADF image fusion method. The fused image is given a pseudo color, and instance segmentation is performed using the masks. For the background, the method retains the wide field of view of the visible light image; for the targets, it combines the robustness of infrared imaging under poor visibility with the spatial detail of visible light.

Description

Visible light and infrared image fusion target detection and instance segmentation method
Technical Field
The invention relates to the field of computer vision, and in particular to a visible light and infrared image fusion target detection and instance segmentation method.
Background
Vision is an important information channel for human beings: about 83% of the information humans acquire is visual. Vision sensors are important tools that help people obtain visual information in production and daily life. However, in complex and changeable natural environments the information obtained by a single vision sensor is severely limited, which motivated the development of image fusion technology.
Infrared imaging devices and visible light imaging devices have received a great deal of attention in the imaging field. A visible light image is formed from light reflected by objects; it is well suited to human observation and has high spatial resolution with considerable detail and contrast, but it is easily degraded by poor illumination, fog and other adverse weather. An infrared image is formed from the thermal radiation of objects; it performs well under poor illumination, fog and other adverse weather, but has lower spatial resolution and less detail than a visible light image. Fusing the infrared image with the visible light image combines the advantages of both, producing an image that is rich in detail and robust to adverse weather. However, fusing the whole image is time-consuming. In urban traffic scenes, for example, usually only targets such as pedestrians, vehicles and motorcycles are of interest; fusing background regions such as roads, billboards and sky is largely redundant.
Disclosure of Invention
The invention aims to provide a visible light and infrared image fusion target detection and instance segmentation method which segments out instances only for the targets of interest.
The technical solution for realizing the purpose of the invention is as follows: a visible light and infrared image fusion target detection and instance segmentation method comprising the following steps:
step 1, registering an infrared image and a visible light image: calibrating the same targets in the infrared image and the visible light image, and translating and scaling the infrared image according to the calibration result to obtain an infrared registration image;
step 2, extracting target position coordinates from the infrared registration image by using a yolov5 neural network;
step 3, extracting masks from the infrared registration image by using a Mask R-CNN neural network;
step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting the ADF image fusion method;
step 5, giving a pseudo color to the fused image, and performing instance segmentation by using the masks.
Further, in step 1 the infrared image and the visible light image are registered: the same targets are calibrated in the infrared image and the visible light image, and the infrared image is translated and scaled according to the calibration result to obtain the infrared registration image. The specific method is as follows:
a visible light image and an infrared image shot simultaneously are selected, and 12 pairs of calibration points are selected in the two images; their coordinates are recorded as (x_ir_i, y_ir_i) and (x_vis_i, y_vis_i), i ∈ [1, 12], where the subscript ir denotes calibration points of the infrared image, vis denotes calibration points of the visible light image, and i denotes the pair index. Each pair of calibration points must correspond to the same real spatial position; of the 12 pairs, 4 are taken from near-field targets, 4 from mid-range targets and 4 from distant targets;
the sum of the Euclidean distances of the calibration points is calculated:

D = Σ_{i=1}^{12} √((x_ir_i - x_vis_i)² + (y_ir_i - y_vis_i)²)

the translation and scaling of the infrared image are controlled so that the sum of the Euclidean distances of the calibration points is minimized; the translation and scaling amounts are recorded, and the same transformation is applied to all infrared images to obtain the infrared registration images.
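A minimal numpy sketch of this registration step follows; it assumes the 12 point pairs have already been picked by hand, and the coarse grid search over translation and scale is only illustrative, not the exact optimizer used in the invention.

    import numpy as np

    def registration_cost(ir_pts, vis_pts, dx, dy, s):
        """Sum of Euclidean distances between scaled/shifted IR points and visible-light points."""
        moved = ir_pts * s + np.array([dx, dy])
        return np.sqrt(((moved - vis_pts) ** 2).sum(axis=1)).sum()

    def fit_translation_scale(ir_pts, vis_pts):
        """Coarse grid search for the scale and shift that minimise the registration cost."""
        best = (0.0, 0.0, 1.0, np.inf)
        for s in np.linspace(0.8, 1.2, 41):
            for dx in range(-60, 61, 2):
                for dy in range(-60, 61, 2):
                    c = registration_cost(ir_pts, vis_pts, dx, dy, s)
                    if c < best[3]:
                        best = (dx, dy, s, c)
        return best  # (dx, dy, scale, residual distance sum)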
Further, in step 2 the target position coordinates in the infrared registration image are obtained by using a yolov5 neural network. The specific method is as follows:
(a) Training the yolov5 neural network with the FLIR dataset;
the yolov5 neural network is trained with the three labels car, bicycle and pedestrian from the FLIR dataset. Mosaic data enhancement is adopted during training: 4 infrared images are randomly selected and combined into a brand-new picture by random scaling, random cropping and random arrangement. The original FLIR dataset images and the images obtained by Mosaic data enhancement form the training set, which is input into the yolov5 neural network for training to obtain the training weights;
(b) Carrying out target detection with the yolov5 neural network;
minimum black-edge scaling is applied to the registered infrared image; the minimum black-edge scaling is calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network input, and a_1, a_2 are the scaling factors of the original image width and height relative to the image side length required by the yolov5 neural network;
the registered infrared image is scaled to m_new × n_new using the smaller scaling factor:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

where a_min is the smaller of a_1 and a_2;
black borders of width c are filled at the top and bottom; the padding width is calculated as:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black edge;
the weights obtained from the training in step (a) are loaded, the scaled and padded infrared registration image is input into the yolov5 neural network, and the target position coordinates (x_1, y_1, x_2, y_2, c) are output, where (x_1, y_1) are the coordinates of the top-left corner of the target, (x_2, y_2) are the coordinates of the bottom-right corner of the target, and c is the target class.
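A short Python sketch of this resize-and-pad step is given below; the mod-64 stride mirrors common yolov5 letterboxing and, like the grey padding value, is an assumption rather than a detail stated above.

    import cv2

    def letterbox(img, t=608, stride=64, pad_value=114):
        """Resize with the smaller scale factor, then pad to the network input size."""
        m, n = img.shape[1], img.shape[0]          # source width m and height n
        a1, a2 = t / m, t / n                      # scale factors for width and height
        a_min = min(a1, a2)
        m_new, n_new = int(round(m * a_min)), int(round(n * a_min))
        img = cv2.resize(img, (m_new, n_new))
        dw, dh = (t - m_new) % stride, (t - n_new) % stride
        left, right = dw // 2, dw - dw // 2
        top, bottom = dh // 2, dh - dh // 2
        return cv2.copyMakeBorder(img, top, bottom, left, right,
                                  cv2.BORDER_CONSTANT, value=(pad_value,) * 3)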
Further, in step 3 masks are extracted from the infrared registration image with a Mask R-CNN neural network, including:
performing Mask R-CNN neural network training with the coco dataset to obtain training weights;
loading the training weights and inputting the infrared registration image into the Mask R-CNN neural network to obtain the masks.
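A sketch of the mask-extraction step, using torchvision's coco-pretrained Mask R-CNN as a stand-in for the network trained in the invention; the score threshold and binarisation level are assumptions.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

    def extract_masks(ir_registered_rgb, score_thresh=0.5):
        """Return binary masks for detections above the score threshold."""
        with torch.no_grad():
            out = model([to_tensor(ir_registered_rgb)])[0]
        keep = out["scores"] > score_thresh
        # masks come out as (N, 1, H, W) soft masks; binarise at 0.5
        return (out["masks"][keep, 0] > 0.5).cpu().numpy()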
Further, in step 4 a fusion range is determined according to the target position coordinates, and the visible light image and the infrared registration image are fused by the ADF image fusion method. The specific method is as follows:
step 4.1, a rectangular box of target position coordinates is constructed with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner; the visible light image to be fused and the infrared image to be fused are selected from the visible light image and the infrared registration image with this rectangular box, and the ADF image fusion method is applied;
step 4.2, anisotropic filtering is applied to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir, where the anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator, and t is the set number of iterations;
step 4.3, the visible light base layer is subtracted from the visible light image to be fused to obtain the visible light detail layer, and the infrared base layer is subtracted from the infrared image to be fused to obtain the infrared detail layer; the infrared detail layer is calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image to be fused and the infrared registration image, respectively;
step 4.4, the visible light detail layer and the infrared registration detail layer are arranged as X_vis and X_ir, respectively; the covariance matrix C_xx of X_vis and X_ir is computed, and the eigenvalues w_1, w_2 and eigenvectors of C_xx are obtained:

C_xx · v_i = w_i · v_i, i = 1, 2

the larger of the eigenvalues w_1, w_2 is selected as w_max, and its corresponding eigenvector v_max is selected; the uncorrelated-component coefficients of the detail layers are then obtained, where KL1 is the uncorrelated-component coefficient of the visible light detail layer and KL2 that of the infrared detail layer:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

the fused detail layer is:

D_f = KL1 · D_vis + KL2 · D_ir

the fused base layer is:

B_f = (B_vis + B_ir) / 2

and the fused image is:

F = B_f + D_f
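Steps 4.2 to 4.4 can be sketched as follows. The anisotropic-diffusion helper anisodiff is assumed to be available (a fuller sketch appears in the detailed description below), and the equal-weight averaging of the base layers follows the standard ADF formulation.

    import numpy as np

    def adf_fuse(I_vis, I_ir, anisodiff, iters=10):
        """Local ADF fusion of a visible-light crop and the registered infrared crop."""
        B_vis, B_ir = anisodiff(I_vis, iters), anisodiff(I_ir, iters)   # base layers
        D_vis, D_ir = I_vis - B_vis, I_ir - B_ir                        # detail layers
        X = np.stack([D_vis.ravel(), D_ir.ravel()])
        w, V = np.linalg.eigh(np.cov(X))            # eigenvalues in ascending order
        v_max = V[:, np.argmax(w)]                  # eigenvector of the larger eigenvalue
        # components of v_max are assumed to share a sign, so the sum is non-zero
        KL1, KL2 = v_max[0] / v_max.sum(), v_max[1] / v_max.sum()
        D_f = KL1 * D_vis + KL2 * D_ir              # fused detail layer
        B_f = 0.5 * (B_vis + B_ir)                  # fused base layer
        return B_f + D_f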
Further, in step 5 a pseudo color is given to the fused image and instance segmentation is performed using the masks. The specific method is as follows:
pseudo-color processing is applied to the fused image with COLORMAP_JET to obtain a color fused image; the image inside the rectangular box of target position coordinates in the visible light image is replaced with the color fused image to obtain a global color fused image; the mask is inverted and overlaid on the visible light image to obtain the visible light background image; the mask is overlaid on the global color fused image to obtain the instance-segmented part of the global color fused image; and the instance-segmented part and the visible light background image are superimposed to obtain the final result.
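A minimal OpenCV sketch of step 5; it assumes the fused crop is an 8-bit grayscale image, the box is (x1, y1, x2, y2) in pixel coordinates, and the masks are binary uint8 arrays of the full image size.

    import cv2
    import numpy as np

    def composite(vis_bgr, fused_gray_crop, box, masks):
        """Pseudo-colour the fused crop, paste it into the visible frame, then cut out instances."""
        x1, y1, x2, y2 = box
        colored = cv2.applyColorMap(fused_gray_crop, cv2.COLORMAP_JET)
        global_color = vis_bgr.copy()
        global_color[y1:y2, x1:x2] = colored            # global colour-fused image
        mask = np.clip(np.sum(masks, axis=0), 0, 1).astype(np.uint8)[..., None]
        background = vis_bgr * (1 - mask)               # visible-light background image
        instances = global_color * mask                 # colour-fused instance regions
        return background + instances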
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, target instance segmentation based on fusion of a visible light image and an infrared image is realized according to the visible light and infrared image fusion target detection and instance segmentation method described above.
A computer readable storage medium stores a computer program which, when executed by a processor, realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method described above.
Compared with the prior art, the invention has the following notable advantages: (1) compared with common visible light and infrared image fusion methods, the method automatically detects the targets of interest in the image and performs instance segmentation on them; (2) compared with common visible light and infrared image fusion methods, local fusion based on the target position coordinates is adopted, which both guarantees detection of the targets of interest and shortens the time consumed by image fusion; (3) the Mosaic data enhancement method is adopted for training the Mask R-CNN neural network, which improves the network's detection performance on small targets.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the invention.
FIG. 2 is a structural schematic diagram of yolov5.
FIG. 3 is a schematic diagram of the specific yolov5 modules.
FIG. 4 is a schematic structural diagram of the slice operation in yolov5.
FIG. 5 is a schematic diagram of the Mask R-CNN structure.
FIG. 6 is a diagram of the ResNet-FPN structure.
FIG. 7 is a diagram of the RPN network structure.
FIG. 8 is the ADF image fusion flow chart.
FIG. 9 is a diagram of the detection results of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The invention discloses a visible light and infrared image fusion target detection and instance segmentation method, the overall flow of which is shown in FIG. 1. The method comprises the following steps:
step 1, registering an infrared image and a visible light image, calibrating the same target in the infrared image and the visible light image, and translating and scaling the infrared image according to a calibration result to obtain an infrared registration image;
the original visible light image and the infrared image have the problem of different view fields, the phenomenon of target dislocation can occur when the images are directly fused, and image registration operation is needed to ensure that the target positions of the fused images are strictly corresponding. Selecting a visible light image and an infrared image which are shot simultaneously, and respectively selecting 12 groups of calibration points from the two images to record the coordinatesMarked x ir_i, y ir_i And x vis_i, y vis_i (i is ∈ [1, 12 ]]) Wherein the subscript is ir to represent the mark points of the infrared image, vis to represent the mark points of the visible image, i to represent the number of groups, and for each mark point, the real space position is required to be the same, 4 groups of 12 groups of mark points are from the close view target, 4 groups are from the medium view target, and 4 groups are from the distant view target. Calculating the sum of Euclidean distances of the calibration points
Figure 301237DEST_PATH_IMAGE015
The sum of Euclidean distances of the calibration points is minimized by controlling the translation and the scaling of the infrared image, the translation and scaling amount is recorded, and the position of a target on the infrared image is the same as the position of a pixel coordinate on the same target on the visible light image on the image by changing according to the translation and scaling amount after the infrared image is input.
Step 2, training a yolov5 neural network, and extracting a target position coordinate from the infrared registration image by using the yolov5 neural network;
the specific structure of Yolov5 neural network is shown in fig. 2 and fig. 3, and includes:
an input end for performing minimum black edge filling;
inputting an infrared image to perform feature extraction to obtain a feature map;
the Neck part is used for performing up-sampling on the input characteristic diagram to obtain three groups of characteristic diagrams with different scales;
the prediction part is used for carrying out convolution operation on the three groups of characteristic images with different scales to obtain a target position coordinate, and the three groups of characteristic images are integrated on an input image to obtain a final target position coordinate;
Before the input image enters the yolov5 neural network, minimum black-edge scaling is applied to bring images to a uniform size. In the invention t is 608, and for a registered infrared image of size 640 × 512 the minimum black-edge scaling is calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network, and a_1, a_2 are the scaling factors; the smaller scaling factor is selected:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

The padding value is then calculated:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black border. The registered infrared image is scaled to m_new × n_new with this scale factor, and black borders of width c are filled at the top and bottom, so that the black edges at the two ends of the image height are reduced and the amount of computation during inference is reduced.
The Backbone is composed of Focus, CSP and SPP modules arranged in sequence and extracts feature maps from the image; it is the backbone network of yolov5. The Focus structure divides the picture into four different slices and stacks them by Concat, as shown in FIG. 4. A 304 × 304 feature map is obtained through the first Focus module, and a 152 × 152 feature map through the following CBL module; after the CSP1_1 and CBL modules a 76 × 76 feature map is obtained. After the next CSP1_3 module, part of the feature map is passed to the Neck and part is input into a CBL module to obtain a 38 × 38 feature map; after another CSP1_3 module the feature map is again partly passed to the Neck and partly input into a CBL module to obtain a 19 × 19 feature map, which is input into the SPP module and then into the Neck part.
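The Focus slicing described above can be sketched as a single tensor operation (a simplification that omits the convolution following the concatenation):

    import torch

    def focus_slice(x):
        """Split an (N, C, H, W) tensor into four pixel-interleaved slices and stack on channels,
        halving the spatial size while quadrupling the channel count, as in yolov5's Focus module."""
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)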
The Neck adopts an FPN + PAN structure; it resamples the feature maps, extracts target features of different scales and provides them to the prediction layer. The FPN part extracts feature layers of sizes 76 × 76, 38 × 38 and 19 × 19. Two feature layers are fused by up-sampling the upper-layer feature by a factor of 2, changing the channel number of the lower-layer feature through a 1 × 1 convolution, and adding the up-sampled result and the CBL-convolved result element-wise. In the PAN structure, the bottom layer of the FPN feature pyramid is copied into a new feature pyramid bottom layer, the up-sampled bottom layer is superimposed with the upper layer in the FPN, and the predicted target coordinate positions and confidences are output through the CSP2_1 convolution structure.
In the prediction part, target screening is performed with NMS (non-maximum suppression): all rectangular boxes are grouped by class label and sorted within each group by confidence. The box with the highest confidence in a group is taken out, the remaining boxes with the same label are traversed, the intersection-over-union with the selected box is calculated, and the remaining boxes whose IOU exceeds the set IOU threshold are deleted.
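A plain-numpy sketch of the per-class non-maximum suppression described above; the IOU threshold value is an assumption, and boxes are given as (x1, y1, x2, y2).

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.45):
        """Greedy non-maximum suppression; returns the indices of the kept boxes."""
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_thresh]
        return keep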
The key of this embodiment is the model training process; the yolov5 neural network model used for target detection in the invention is described below. The FLIR dataset, a visible light and infrared image dataset of automobile traffic roads released in July 2018 and containing about 14,000 images, is selected to train the yolov5 neural network. Three training labels are selected: person, bicycle and car. Mosaic data enhancement is adopted: 4 images are spliced by random scaling, random cropping and random arrangement. CIOU_Loss is selected as the loss function:

CIOU_Loss = 1 - IOU + ρ²(b_p, b_gt) / c² + α · v

where IOU is the ratio of the intersection area of the two boxes to their union area, c is the diagonal length of the minimum enclosing rectangle of the two boxes, ρ(b_p, b_gt) is the Euclidean distance between the center points of the two boxes, and v is a parameter measuring the consistency of the aspect ratios, calculated as

v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w_p / h_p))², α = v / ((1 - IOU) + v)

where w_gt, h_gt are the width and height of the ground-truth box and w_p, h_p are the width and height of the predicted box.
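A sketch of the CIOU loss as written above, for boxes in (x1, y1, x2, y2) form; the epsilon terms are an implementation choice for numerical safety.

    import math
    import torch

    def ciou_loss(pred, target, eps=1e-7):
        """CIoU loss between predicted and ground-truth boxes given as (x1, y1, x2, y2)."""
        # intersection and union
        iw = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
        ih = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
        inter = iw * ih
        area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
        area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
        union = area_p + area_t - inter + eps
        iou = inter / union
        # squared centre distance over squared enclosing-box diagonal
        cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
        ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
        c2 = cw ** 2 + ch ** 2 + eps
        rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
                (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
        # aspect-ratio consistency term
        wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
        wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
        v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
        alpha = v / (1 - iou + v + eps)
        return 1 - iou + rho2 / c2 + alpha * v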
After training to obtain the training weights, the registered infrared image is input to obtain the position coordinates of the predicted targets, where the targets include pedestrians, bicycles and cars.
Step 3, a Mask R-CNN neural network is trained, and the Mask R-CNN neural network is used to extract masks from the infrared registration image.
FIG. 5 is a schematic diagram of the Mask R-CNN neural network; its structure comprises:
a ResNet-FPN part, which extracts feature maps of different scales and constructs a feature pyramid;
an RPN part, which extracts the required regions from the feature maps;
an ROI Align part, which pools the extracted regions of the feature maps;
a mask part, which takes the feature maps as input and outputs the masks;
the ResNet-FPN part comprises five convolutional layers; the second to fifth layers output four feature maps of different scales, extracting feature maps down-scaled by 4, 8, 16 and 32 times relative to the original image, as shown in FIG. 6.
After a picture is input to the RPN part, it passes through a 1 × 1 convolution layer and then through reshape, softmax and reshape operations, where the softmax module selects the required regions from the extracted feature maps, as shown in FIG. 7;
ROI Align divides each bin of the region of interest into cells, samples four points in each cell by bilinear interpolation and applies max pooling to them, giving the final ROI Align result.
The mask part uses an FCN fully convolutional network for image segmentation.
The invention uses the coco dataset for Mask R-CNN training. The coco dataset is a large-scale dataset for object detection, segmentation and captioning, containing about 330k images. The three labels car, bicycle and pedestrian are selected, irrelevant pictures are removed for training, and the Mosaic data enhancement method is also adopted.
After training to obtain the corresponding weights, instance segmentation is performed on the registered infrared image to obtain the masks.
Step 4, a fusion range is determined according to the target position coordinates, the ADF image fusion method is applied within the fusion range of the visible light image and the infrared registration image, pseudo-color processing is applied, and instance segmentation is performed on the pseudo-color image using the masks;
the visible light image is registered with the infrared image; a rectangular box of target position coordinates is constructed with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner; the visible light image to be fused and the infrared registration image to be fused are selected with this rectangular box, and the ADF image fusion method is applied.
The ADF image fusion flow is shown in FIG. 8. Anisotropic filtering is applied to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir. The anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator and t is the iteration index; the number of iterations is set to 10. The original input image I_0 gives the output I_1, which is fed back as the next input; after 10 iterations I_10 is obtained, i.e. the base layer image. Expanding the formula above in discrete form gives:

I_{t+1,i,j} = I_{t,i,j} + λ · (c_N · ∇_N I_{t,i,j} + c_S · ∇_S I_{t,i,j} + c_E · ∇_E I_{t,i,j} + c_W · ∇_W I_{t,i,j})

where ∇ denotes the one-sided gradient in the single direction indicated by the subscripts N, S, E, W; c_N, c_S, c_W, c_E are the conduction coefficients of the input image towards the north, south, west and east directions; I_{t,i,j} denotes the input image with i, j the coordinates of a single pixel; I_{t+1,i,j} is the output image; the number of iterations is set to 10; and λ takes values in [0, 1/4]. The directional gradients ∇_N I_{t,i,j}, ∇_S I_{t,i,j}, ∇_W I_{t,i,j}, ∇_E I_{t,i,j} in the formula are expanded as:

∇_N I_{i,j} = I_{i-1,j} - I_{i,j}
∇_S I_{i,j} = I_{i+1,j} - I_{i,j}
∇_W I_{i,j} = I_{i,j-1} - I_{i,j}
∇_E I_{i,j} = I_{i,j+1} - I_{i,j}

i.e. for each pixel of the image, the current pixel is subtracted from its neighbouring pixel in the given direction, with a zero value taken at the boundary. The conduction coefficients c_N, c_S, c_W, c_E in the formula are calculated as:

c_N = g(|∇_N I_{t,i,j}|), c_S = g(|∇_S I_{t,i,j}|), c_W = g(|∇_W I_{t,i,j}|), c_E = g(|∇_E I_{t,i,j}|)

where the directional gradients are as defined above and the function g(·) must be monotonically decreasing with g(0) = 1. Two calculation options are adopted:

g(x) = exp(-(x / k)²)   or   g(x) = 1 / (1 + (x / k)²)

where k is the gradient threshold. The visible light original image is taken as I_0 and its output I_1 is fed back as input; after 10 iterations I_10 is obtained, i.e. the visible light base layer B_vis. The same operation applied to the infrared registration image yields the infrared registration base layer B_ir.
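A compact numpy sketch of the diffusion iteration written above, using the exponential conduction function; λ is chosen within the stated [0, 1/4] range and the gradient threshold k is an assumption.

    import numpy as np

    def anisodiff(img, iters=10, lam=0.20, k=30.0):
        """Perona-Malik anisotropic diffusion; returns the smoothed base layer."""
        u = img.astype(np.float64)

        def g(d):
            # conduction function, exponential option
            return np.exp(-(d / k) ** 2)

        for _ in range(iters):
            dN = np.zeros_like(u)
            dS = np.zeros_like(u)
            dW = np.zeros_like(u)
            dE = np.zeros_like(u)
            dN[1:, :] = u[:-1, :] - u[1:, :]    # north neighbour minus current pixel
            dS[:-1, :] = u[1:, :] - u[:-1, :]   # south neighbour minus current pixel
            dW[:, 1:] = u[:, :-1] - u[:, 1:]    # west neighbour minus current pixel
            dE[:, :-1] = u[:, 1:] - u[:, :-1]   # east neighbour minus current pixel
            u = u + lam * (g(dN) * dN + g(dS) * dS + g(dW) * dW + g(dE) * dE)
        return u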
The visible light base layer is subtracted from the visible light image to obtain the visible light detail layer; the infrared registration detail layer is calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image and the infrared registration image, respectively.
The visible light detail layer and the infrared detail layer are arranged as X_vis and X_ir, respectively; the covariance matrix C_xx of X_vis and X_ir is computed, and its eigenvalues w_1, w_2 and eigenvectors are obtained:

C_xx · v_i = w_i · v_i, i = 1, 2

The uncorrelated-component coefficients are computed: the larger eigenvalue w_max is selected, its corresponding eigenvector v_max is taken, and the uncorrelated-component coefficients of the detail layers are:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

The fused detail layer is:

D_f = KL1 · D_vis + KL2 · D_ir

The fused base layer is:

B_f = (B_vis + B_ir) / 2

and the fused image is:

F = B_f + D_f
Pseudo-color processing is applied to the fused image region with the COLORMAP_JET strategy to obtain a color fused image; the image inside the rectangular box of target position coordinates in the visible light image is replaced with the color fused image to obtain a global color fused image; the mask is inverted and overlaid on the visible light image to obtain the visible light background image; the mask is overlaid on the global color fused image to obtain the instance-segmented part of the global color fused image; and the instance-segmented part and the visible light background image are superimposed to obtain the final result.
The experimental scene is an urban traffic road. A Hikvision GigE industrial area-array camera serves as the visible light detector and a Xenics Gobi+640 thermal infrared camera as the infrared detector, arranged as a binocular rig; real-time computation results (60w power) are obtained on a Jetson-Nano development platform. The capture lasted 3 minutes 23 seconds and comprises 4067 pictures.
FIG. 9 is a sample of the final detection results: pedestrians and vehicles in urban road traffic are accurately detected and segmented, and infrared and visible light fusion is performed within the detection regions.
Table 1 compares the per-picture time consumption of full-image fusion and local fusion, broken down by processing stage; the time consumed by local fusion is clearly reduced compared with global fusion. Table 2 compares the precision of different segmentation methods: for medium and large targets the segmentation performance of the networks is similar, but for small targets at a distance the Mask R-CNN network has an obvious advantage.
TABLE 1. Comparison of per-picture time consumption using full fusion and local fusion (the table is reproduced as an image in the original).
TABLE 2. Precision comparison of different segmentation methods (the table is reproduced as an image in the original).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (8)

1. A visible light and infrared image fusion target detection and instance segmentation method, characterized by comprising the following steps:
step 1, registering an infrared image and a visible light image: calibrating the same targets in the infrared image and the visible light image, and translating and scaling the infrared image according to the calibration result to obtain an infrared registration image;
step 2, extracting target position coordinates from the infrared registration image by using a yolov5 neural network;
step 3, extracting masks from the infrared registration image by using a Mask R-CNN neural network;
step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting the ADF image fusion method;
step 5, giving a pseudo color to the fused image, and performing instance segmentation by using the masks.
2. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 1 the infrared image and the visible light image are registered, the same targets are calibrated in the infrared image and the visible light image, and the infrared image is translated and scaled according to the calibration result to obtain the infrared registration image; the specific method is as follows:
selecting a visible light image and an infrared image shot simultaneously, selecting 12 pairs of calibration points in the two images and recording their coordinates as (x_ir_i, y_ir_i) and (x_vis_i, y_vis_i), i ∈ [1, 12], where the subscript ir denotes calibration points of the infrared image, vis denotes calibration points of the visible light image, and i denotes the pair index; each pair of calibration points must correspond to the same real spatial position, and of the 12 pairs 4 are taken from near-field targets, 4 from mid-range targets and 4 from distant targets;
calculating the sum of the Euclidean distances of the calibration points:

D = Σ_{i=1}^{12} √((x_ir_i - x_vis_i)² + (y_ir_i - y_vis_i)²)

and controlling the translation and scaling of the infrared image so that the sum of the Euclidean distances of the calibration points is minimized, recording the translation and scaling amounts, and applying the same transformation to all infrared images to obtain the infrared registration images.
3. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 2 the target position coordinates in the infrared registration image are acquired by using a yolov5 neural network; the specific method is as follows:
(a) training the yolov5 neural network with the FLIR dataset;
training the yolov5 neural network with the three labels car, bicycle and pedestrian from the FLIR dataset, adopting Mosaic data enhancement in the training: randomly selecting 4 infrared images and forming a brand-new picture by random scaling, random cropping and random arrangement; forming a training set from the original FLIR dataset images and the images obtained by Mosaic data enhancement, and inputting the training set into the yolov5 neural network for training to obtain the training weights;
(b) carrying out target detection with the yolov5 neural network;
applying minimum black-edge scaling to the registered infrared image, the minimum black-edge scaling being calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network input, and a_1, a_2 are the scaling factors of the original image width and height relative to the image side length required by the yolov5 neural network;
scaling the registered infrared image to m_new × n_new using the scale factor:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

where a_min is the smaller of a_1 and a_2;
filling black borders of width c at the top and bottom, the padding width being calculated as:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black edge;
loading the weights obtained from the training in step (a), inputting the scaled and padded infrared registration image into the yolov5 neural network, and outputting the target position coordinates (x_1, y_1, x_2, y_2, c), where (x_1, y_1) are the coordinates of the top-left corner of the target, (x_2, y_2) are the coordinates of the bottom-right corner of the target, and c is the target class.
4. The visible light and infrared image fusion target detection and instance segmentation method according to claim 3, characterized in that: in step 3 masks are extracted from the infrared image by using a Mask R-CNN neural network, which comprises:
performing Mask R-CNN neural network training with the coco dataset to obtain training weights;
loading the training weights and inputting the infrared registration image into the Mask R-CNN neural network to obtain the masks.
5. The visible light and infrared image fusion target detection and instance segmentation method according to claim 3, characterized in that: in step 4 a fusion range is determined according to the target position coordinates, and the visible light image and the infrared registration image are fused by the ADF image fusion method; the specific method is as follows:
step 4.1, constructing a rectangular box of target position coordinates with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner, selecting the visible light image to be fused and the infrared image to be fused from the visible light image and the infrared registration image with this rectangular box, and applying the ADF image fusion method;
step 4.2, applying anisotropic filtering to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir, where the anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator, and t is the set number of iterations;
step 4.3, subtracting the visible light base layer from the visible light image to be fused to obtain the visible light detail layer, and subtracting the infrared base layer from the infrared image to be fused to obtain the infrared detail layer, the infrared detail layer being calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image to be fused and the infrared registration image, respectively;
step 4.4, arranging the visible light detail layer and the infrared registration detail layer as X_vis and X_ir, respectively, finding the covariance matrix C_xx of X_vis and X_ir, and calculating the eigenvalues w_1, w_2 and eigenvectors of C_xx:

C_xx · v_i = w_i · v_i, i = 1, 2

selecting the larger of the eigenvalues w_1, w_2 as w_max, selecting its corresponding eigenvector v_max, and obtaining the uncorrelated-component coefficients of the detail layers, where KL1 is the uncorrelated-component coefficient of the visible light detail layer and KL2 that of the infrared detail layer:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

fusing the image detail layers:

D_f = KL1 · D_vis + KL2 · D_ir

fusing the image base layers:

B_f = (B_vis + B_ir) / 2

and fusing the images:

F = B_f + D_f
6. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 5 a pseudo color is given to the fused image and instance segmentation is performed using the masks; the specific method is as follows:
applying pseudo-color processing to the fused image with COLORMAP_JET to obtain a color fused image; replacing the image inside the rectangular box of target position coordinates in the visible light image with the color fused image to obtain a global color fused image; inverting the mask and overlaying it on the visible light image to obtain the visible light background image; overlaying the mask on the global color fused image to obtain the instance-segmented part of the global color fused image; and superimposing the instance-segmented part and the visible light background image to obtain the final result.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method of any one of claims 1 to 6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method of any one of claims 1 to 6.
CN202211095330.4A 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method Active CN115170810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211095330.4A CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211095330.4A CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Publications (2)

Publication Number Publication Date
CN115170810A true CN115170810A (en) 2022-10-11
CN115170810B CN115170810B (en) 2022-12-13

Family

ID=83482397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211095330.4A Active CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Country Status (1)

Country Link
CN (1) CN115170810B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895094A (en) * 2023-09-11 2023-10-17 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN118038499A (en) * 2024-04-12 2024-05-14 北京航空航天大学 Cross-mode pedestrian re-identification method based on mode conversion

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN105447838A (en) * 2014-08-27 2016-03-30 北京计算机技术及应用研究所 Method and system for infrared and low-level-light/visible-light fusion imaging
US20180227509A1 (en) * 2015-08-05 2018-08-09 Wuhan Guide Infrared Co., Ltd. Visible light image and infrared image fusion processing system and fusion method
US20190279371A1 (en) * 2018-03-06 2019-09-12 Sony Corporation Image processing apparatus and method for object boundary stabilization in an image of a sequence of images
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN111062905A (en) * 2019-12-17 2020-04-24 大连理工大学 Infrared and visible light fusion method based on saliency map enhancement
CN111145133A (en) * 2019-12-05 2020-05-12 南京理工大学 ZYNQ-based infrared and visible light co-optical axis image fusion system and method
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
CN111275759A (en) * 2020-01-16 2020-06-12 国网江苏省电力有限公司 Transformer substation disconnecting link temperature detection method based on unmanned aerial vehicle double-light image fusion
CN111432172A (en) * 2020-03-20 2020-07-17 浙江大华技术股份有限公司 Fence alarm method and system based on image fusion
CN111611905A (en) * 2020-05-18 2020-09-01 沈阳理工大学 Visible light and infrared fused target identification method
CN111915546A (en) * 2020-08-04 2020-11-10 西安科技大学 Infrared and visible light image fusion method and system, computer equipment and application
CN112016478A (en) * 2020-08-31 2020-12-01 中国电子科技集团公司第三研究所 Complex scene identification method and system based on multispectral image fusion
CN113344475A (en) * 2021-08-05 2021-09-03 国网江西省电力有限公司电力科学研究院 Transformer bushing defect identification method and system based on sequence modal decomposition
CN114332748A (en) * 2021-11-08 2022-04-12 西安电子科技大学 Target detection method based on multi-source feature joint network and self-generation of transformed image
CN114519808A (en) * 2022-02-21 2022-05-20 烟台艾睿光电科技有限公司 Image fusion method, device and equipment and storage medium

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN105447838A (en) * 2014-08-27 2016-03-30 北京计算机技术及应用研究所 Method and system for infrared and low-level-light/visible-light fusion imaging
US20180227509A1 (en) * 2015-08-05 2018-08-09 Wuhan Guide Infrared Co., Ltd. Visible light image and infrared image fusion processing system and fusion method
US20190279371A1 (en) * 2018-03-06 2019-09-12 Sony Corporation Image processing apparatus and method for object boundary stabilization in an image of a sequence of images
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
CN111145133A (en) * 2019-12-05 2020-05-12 南京理工大学 ZYNQ-based infrared and visible light co-optical axis image fusion system and method
CN111062905A (en) * 2019-12-17 2020-04-24 大连理工大学 Infrared and visible light fusion method based on saliency map enhancement
CN111275759A (en) * 2020-01-16 2020-06-12 国网江苏省电力有限公司 Transformer substation disconnecting link temperature detection method based on unmanned aerial vehicle double-light image fusion
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN111432172A (en) * 2020-03-20 2020-07-17 浙江大华技术股份有限公司 Fence alarm method and system based on image fusion
CN111611905A (en) * 2020-05-18 2020-09-01 沈阳理工大学 Visible light and infrared fused target identification method
CN111915546A (en) * 2020-08-04 2020-11-10 西安科技大学 Infrared and visible light image fusion method and system, computer equipment and application
CN112016478A (en) * 2020-08-31 2020-12-01 中国电子科技集团公司第三研究所 Complex scene identification method and system based on multispectral image fusion
CN113344475A (en) * 2021-08-05 2021-09-03 国网江西省电力有限公司电力科学研究院 Transformer bushing defect identification method and system based on sequence modal decomposition
CN114332748A (en) * 2021-11-08 2022-04-12 西安电子科技大学 Target detection method based on multi-source feature joint network and self-generation of transformed image
CN114519808A (en) * 2022-02-21 2022-05-20 烟台艾睿光电科技有限公司 Image fusion method, device and equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHUNYU XIE 等: "Infrared and Visible Image Fusion: A Region-Based Deep Learning Method", 《INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTICS AND APPLICATIONS》 *
JIAYI MA 等: "STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection", 《IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT》 *
ZHOU JINGWEN 等: "An infrared and visible image fusion method based on VGG-19 network", 《OPTIK》 *
LIU YANJU et al.: "Research on Infrared and Visible Image Fusion Method Based on GAN Network", 《Journal of Shenyang Ligong University》 *
WANG YUJING: "Research on Infrared and Visible Image Fusion Algorithms with Saliency Detection", 《China Master's Theses Full-text Database, Information Science and Technology》 *
GAO YIWEN: "Brain Magnetic Resonance Image Registration Based on Hybrid Corner Detection", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895094A (en) * 2023-09-11 2023-10-17 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN116895094B (en) * 2023-09-11 2024-01-30 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN118038499A (en) * 2024-04-12 2024-05-14 北京航空航天大学 Cross-mode pedestrian re-identification method based on mode conversion

Also Published As

Publication number Publication date
CN115170810B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN115170810B (en) Visible light and infrared image fusion target detection and instance segmentation method
US20220044375A1 (en) Saliency Map Enhancement-Based Infrared and Visible Light Fusion Method
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN109886312B (en) Bridge vehicle wheel detection method based on multilayer feature fusion neural network model
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
US8179393B2 (en) Fusion of a 2D electro-optical image and 3D point cloud data for scene interpretation and registration performance assessment
Vaudrey et al. Differences between stereo and motion behaviour on synthetic and real-world stereo sequences
US10477178B2 (en) High-speed and tunable scene reconstruction systems and methods using stereo imagery
CN113111974A (en) Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN108665496A (en) A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CA3028599A1 (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
CN113673444B (en) Intersection multi-view target detection method and system based on angular point pooling
CN111951306A (en) Target detection method for fusion of laser radar and image video
Min et al. Orfd: A dataset and benchmark for off-road freespace detection
CN114254696A (en) Visible light, infrared and radar fusion target detection method based on deep learning
CN112016478B (en) Complex scene recognition method and system based on multispectral image fusion
CN110070025A (en) Objective detection system and method based on monocular image
CN114972748B (en) Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network
CN114972989A (en) Single remote sensing image height information estimation method based on deep learning algorithm
CN114339185A (en) Image colorization for vehicle camera images
CN113610905A (en) Deep learning remote sensing image registration method based on subimage matching and application
CN111626241A (en) Face detection method and device
Tseng et al. Semi-supervised image depth prediction with deep learning and binocular algorithms
CN116630528A (en) Static scene reconstruction method based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant