CN115170810A - Visible light and infrared image fusion target detection and instance segmentation method

Visible light and infrared image fusion target detection and instance segmentation method

Info

Publication number
CN115170810A
CN115170810A
Authority
CN
China
Prior art keywords
image
infrared
visible light
target
fusion
Prior art date
Legal status
Granted
Application number
CN202211095330.4A
Other languages
Chinese (zh)
Other versions
CN115170810B (en)
Inventor
任侃
赵俊逸
钱惟贤
顾国华
陈钱
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211095330.4A priority Critical patent/CN115170810B/en
Publication of CN115170810A publication Critical patent/CN115170810A/en
Application granted granted Critical
Publication of CN115170810B publication Critical patent/CN115170810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a visible light and infrared image fusion target detection and instance segmentation method. The infrared image and the visible light image are registered: the same targets are calibrated in both images, and the infrared image is translated and scaled according to the calibration result to obtain an infrared registration image. Target position coordinates are extracted from the infrared registration image with a yolov5 neural network, and masks are extracted from the infrared registration image with a Mask R-CNN neural network. A fusion range is determined from the target position coordinates, and the visible light image and the infrared registration image are fused within that range using the ADF image fusion method. The fused image is given a pseudo color, and instance segmentation is performed using the masks. For the background, the method retains the wide field of view of the visible light image; for the targets, it combines the robustness of infrared imaging under poor visibility with the spatial detail of visible light.

Description

Visible light and infrared image fusion target detection and instance segmentation method
Technical Field
The invention relates to the field of computer vision, and in particular to a visible light and infrared image fusion target detection and instance segmentation method.
Background
Vision is an important information channel for human beings: about 83% of the information humans acquire is visual. Vision sensors are important tools that help people obtain visual information in production and daily life. However, in complex and changeable natural environments the information obtained by a single vision sensor is severely limited, which motivated the development of image fusion technology.
Infrared imaging devices and visible light imaging devices have received a great deal of attention in the imaging field. A visible light image is formed from light reflected by objects; it is well suited to human observation and has high spatial resolution with considerable detail and contrast, but it is easily degraded by poor illumination, fog and other adverse weather. An infrared image is formed from the thermal radiation of objects; it performs well under poor illumination, fog and other adverse weather, but has lower spatial resolution and less detail than a visible light image. Fusing the infrared image with the visible light image combines the advantages of both, producing an image that is rich in detail and robust to adverse weather. However, fusing the whole image is time-consuming. In urban traffic scenes, for example, usually only targets such as pedestrians, vehicles and motorcycles are of interest; fusing background regions such as roads, billboards and sky is largely redundant.
Disclosure of Invention
The invention aims to provide a visible light and infrared image fusion target detection and instance segmentation method which segments out instances only for the targets of interest.
The technical solution for realizing the purpose of the invention is as follows: a visible light and infrared image fusion target detection and instance segmentation method comprising the following steps:
step 1, registering an infrared image and a visible light image: calibrating the same targets in the infrared image and the visible light image, and translating and scaling the infrared image according to the calibration result to obtain an infrared registration image;
step 2, extracting target position coordinates from the infrared registration image by using a yolov5 neural network;
step 3, extracting masks from the infrared registration image by using a Mask R-CNN neural network;
step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting the ADF image fusion method;
step 5, giving a pseudo color to the fused image, and performing instance segmentation by using the masks.
Further, in step 1 the infrared image and the visible light image are registered: the same targets are calibrated in the infrared image and the visible light image, and the infrared image is translated and scaled according to the calibration result to obtain the infrared registration image. The specific method is as follows:
a visible light image and an infrared image shot simultaneously are selected, and 12 pairs of calibration points are selected in the two images; their coordinates are recorded as (x_ir_i, y_ir_i) and (x_vis_i, y_vis_i), i ∈ [1, 12], where the subscript ir denotes calibration points of the infrared image, vis denotes calibration points of the visible light image, and i denotes the pair index. Each pair of calibration points must correspond to the same real spatial position; of the 12 pairs, 4 are taken from near-field targets, 4 from mid-range targets and 4 from distant targets;
the sum of the Euclidean distances of the calibration points is calculated:

D = Σ_{i=1}^{12} √((x_ir_i - x_vis_i)² + (y_ir_i - y_vis_i)²)

the translation and scaling of the infrared image are controlled so that the sum of the Euclidean distances of the calibration points is minimized; the translation and scaling amounts are recorded, and the same transformation is applied to all infrared images to obtain the infrared registration images.
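A minimal numpy sketch of this registration step follows; it assumes the 12 point pairs have already been picked by hand, and the coarse grid search over translation and scale is only illustrative, not the exact optimizer used in the invention.

    import numpy as np

    def registration_cost(ir_pts, vis_pts, dx, dy, s):
        """Sum of Euclidean distances between scaled/shifted IR points and visible-light points."""
        moved = ir_pts * s + np.array([dx, dy])
        return np.sqrt(((moved - vis_pts) ** 2).sum(axis=1)).sum()

    def fit_translation_scale(ir_pts, vis_pts):
        """Coarse grid search for the scale and shift that minimise the registration cost."""
        best = (0.0, 0.0, 1.0, np.inf)
        for s in np.linspace(0.8, 1.2, 41):
            for dx in range(-60, 61, 2):
                for dy in range(-60, 61, 2):
                    c = registration_cost(ir_pts, vis_pts, dx, dy, s)
                    if c < best[3]:
                        best = (dx, dy, s, c)
        return best  # (dx, dy, scale, residual distance sum)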
Further, in step 2 the target position coordinates in the infrared registration image are obtained by using a yolov5 neural network. The specific method is as follows:
(a) Training the yolov5 neural network with the FLIR dataset;
the yolov5 neural network is trained with the three labels car, bicycle and pedestrian from the FLIR dataset. Mosaic data enhancement is adopted during training: 4 infrared images are randomly selected and combined into a brand-new picture by random scaling, random cropping and random arrangement. The original FLIR dataset images and the images obtained by Mosaic data enhancement form the training set, which is input into the yolov5 neural network for training to obtain the training weights;
(b) Carrying out target detection with the yolov5 neural network;
minimum black-edge scaling is applied to the registered infrared image; the minimum black-edge scaling is calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network input, and a_1, a_2 are the scaling factors of the original image width and height relative to the image side length required by the yolov5 neural network;
the registered infrared image is scaled to m_new × n_new using the smaller scaling factor:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

where a_min is the smaller of a_1 and a_2;
black borders of width c are filled at the top and bottom; the padding width is calculated as:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black edge;
the weights obtained from the training in step (a) are loaded, the scaled and padded infrared registration image is input into the yolov5 neural network, and the target position coordinates (x_1, y_1, x_2, y_2, c) are output, where (x_1, y_1) are the coordinates of the top-left corner of the target, (x_2, y_2) are the coordinates of the bottom-right corner of the target, and c is the target class.
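A short Python sketch of this resize-and-pad step is given below; the mod-64 stride mirrors common yolov5 letterboxing and, like the grey padding value, is an assumption rather than a detail stated above.

    import cv2

    def letterbox(img, t=608, stride=64, pad_value=114):
        """Resize with the smaller scale factor, then pad to the network input size."""
        m, n = img.shape[1], img.shape[0]          # source width m and height n
        a1, a2 = t / m, t / n                      # scale factors for width and height
        a_min = min(a1, a2)
        m_new, n_new = int(round(m * a_min)), int(round(n * a_min))
        img = cv2.resize(img, (m_new, n_new))
        dw, dh = (t - m_new) % stride, (t - n_new) % stride
        left, right = dw // 2, dw - dw // 2
        top, bottom = dh // 2, dh - dh // 2
        return cv2.copyMakeBorder(img, top, bottom, left, right,
                                  cv2.BORDER_CONSTANT, value=(pad_value,) * 3)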
Further, in step 3 masks are extracted from the infrared registration image with a Mask R-CNN neural network, including:
performing Mask R-CNN neural network training with the coco dataset to obtain training weights;
loading the training weights and inputting the infrared registration image into the Mask R-CNN neural network to obtain the masks.
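A sketch of the mask-extraction step, using torchvision's coco-pretrained Mask R-CNN as a stand-in for the network trained in the invention; the score threshold and binarisation level are assumptions.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

    def extract_masks(ir_registered_rgb, score_thresh=0.5):
        """Return binary masks for detections above the score threshold."""
        with torch.no_grad():
            out = model([to_tensor(ir_registered_rgb)])[0]
        keep = out["scores"] > score_thresh
        # masks come out as (N, 1, H, W) soft masks; binarise at 0.5
        return (out["masks"][keep, 0] > 0.5).cpu().numpy()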
Further, in step 4 a fusion range is determined according to the target position coordinates, and the visible light image and the infrared registration image are fused by the ADF image fusion method. The specific method is as follows:
step 4.1, a rectangular box of target position coordinates is constructed with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner; the visible light image to be fused and the infrared image to be fused are selected from the visible light image and the infrared registration image with this rectangular box, and the ADF image fusion method is applied;
step 4.2, anisotropic filtering is applied to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir, where the anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator, and t is the set number of iterations;
step 4.3, the visible light base layer is subtracted from the visible light image to be fused to obtain the visible light detail layer, and the infrared base layer is subtracted from the infrared image to be fused to obtain the infrared detail layer; the infrared detail layer is calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image to be fused and the infrared registration image, respectively;
step 4.4, the visible light detail layer and the infrared registration detail layer are arranged as X_vis and X_ir, respectively; the covariance matrix C_xx of X_vis and X_ir is computed, and the eigenvalues w_1, w_2 and eigenvectors of C_xx are obtained:

C_xx · v_i = w_i · v_i, i = 1, 2

the larger of the eigenvalues w_1, w_2 is selected as w_max, and its corresponding eigenvector v_max is selected; the uncorrelated-component coefficients of the detail layers are then obtained, where KL1 is the uncorrelated-component coefficient of the visible light detail layer and KL2 that of the infrared detail layer:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

the fused detail layer is:

D_f = KL1 · D_vis + KL2 · D_ir

the fused base layer is:

B_f = (B_vis + B_ir) / 2

and the fused image is:

F = B_f + D_f
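Steps 4.2 to 4.4 can be sketched as follows. The anisotropic-diffusion helper anisodiff is assumed to be available (a fuller sketch appears in the detailed description below), and the equal-weight averaging of the base layers follows the standard ADF formulation.

    import numpy as np

    def adf_fuse(I_vis, I_ir, anisodiff, iters=10):
        """Local ADF fusion of a visible-light crop and the registered infrared crop."""
        B_vis, B_ir = anisodiff(I_vis, iters), anisodiff(I_ir, iters)   # base layers
        D_vis, D_ir = I_vis - B_vis, I_ir - B_ir                        # detail layers
        X = np.stack([D_vis.ravel(), D_ir.ravel()])
        w, V = np.linalg.eigh(np.cov(X))            # eigenvalues in ascending order
        v_max = V[:, np.argmax(w)]                  # eigenvector of the larger eigenvalue
        # components of v_max are assumed to share a sign, so the sum is non-zero
        KL1, KL2 = v_max[0] / v_max.sum(), v_max[1] / v_max.sum()
        D_f = KL1 * D_vis + KL2 * D_ir              # fused detail layer
        B_f = 0.5 * (B_vis + B_ir)                  # fused base layer
        return B_f + D_f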
Further, in step 5 a pseudo color is given to the fused image and instance segmentation is performed using the masks. The specific method is as follows:
pseudo-color processing is applied to the fused image with COLORMAP_JET to obtain a color fused image; the image inside the rectangular box of target position coordinates in the visible light image is replaced with the color fused image to obtain a global color fused image; the mask is inverted and overlaid on the visible light image to obtain the visible light background image; the mask is overlaid on the global color fused image to obtain the instance-segmented part of the global color fused image; and the instance-segmented part and the visible light background image are superimposed to obtain the final result.
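A minimal OpenCV sketch of step 5; it assumes the fused crop is an 8-bit grayscale image, the box is (x1, y1, x2, y2) in pixel coordinates, and the masks are binary uint8 arrays of the full image size.

    import cv2
    import numpy as np

    def composite(vis_bgr, fused_gray_crop, box, masks):
        """Pseudo-colour the fused crop, paste it into the visible frame, then cut out instances."""
        x1, y1, x2, y2 = box
        colored = cv2.applyColorMap(fused_gray_crop, cv2.COLORMAP_JET)
        global_color = vis_bgr.copy()
        global_color[y1:y2, x1:x2] = colored            # global colour-fused image
        mask = np.clip(np.sum(masks, axis=0), 0, 1).astype(np.uint8)[..., None]
        background = vis_bgr * (1 - mask)               # visible-light background image
        instances = global_color * mask                 # colour-fused instance regions
        return background + instances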
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, target instance segmentation based on fusion of a visible light image and an infrared image is realized according to the visible light and infrared image fusion target detection and instance segmentation method described above.
A computer readable storage medium stores a computer program which, when executed by a processor, realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method described above.
Compared with the prior art, the invention has the following notable advantages: (1) compared with common visible light and infrared image fusion methods, the method automatically detects the targets of interest in the image and performs instance segmentation on them; (2) compared with common visible light and infrared image fusion methods, local fusion based on the target position coordinates is adopted, which both guarantees detection of the targets of interest and shortens the time consumed by image fusion; (3) the Mosaic data enhancement method is adopted for training the Mask R-CNN neural network, which improves the network's detection performance on small targets.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the invention.
FIG. 2 is a structural schematic diagram of yolov5.
FIG. 3 is a schematic diagram of the specific yolov5 modules.
FIG. 4 is a schematic structural diagram of the slice operation in yolov5.
FIG. 5 is a schematic diagram of the Mask R-CNN structure.
FIG. 6 is a diagram of the ResNet-FPN structure.
FIG. 7 is a diagram of the RPN network structure.
FIG. 8 is the ADF image fusion flow chart.
FIG. 9 is a diagram of the detection results of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The invention discloses a visible light and infrared image fusion target detection and instance segmentation method, the overall flow of which is shown in FIG. 1. The method comprises the following steps:
step 1, registering an infrared image and a visible light image, calibrating the same target in the infrared image and the visible light image, and translating and scaling the infrared image according to a calibration result to obtain an infrared registration image;
the original visible light image and the infrared image have the problem of different view fields, the phenomenon of target dislocation can occur when the images are directly fused, and image registration operation is needed to ensure that the target positions of the fused images are strictly corresponding. Selecting a visible light image and an infrared image which are shot simultaneously, and respectively selecting 12 groups of calibration points from the two images to record the coordinatesMarked x ir_i, y ir_i And x vis_i, y vis_i (i is ∈ [1, 12 ]]) Wherein the subscript is ir to represent the mark points of the infrared image, vis to represent the mark points of the visible image, i to represent the number of groups, and for each mark point, the real space position is required to be the same, 4 groups of 12 groups of mark points are from the close view target, 4 groups are from the medium view target, and 4 groups are from the distant view target. Calculating the sum of Euclidean distances of the calibration points
Figure 301237DEST_PATH_IMAGE015
The sum of Euclidean distances of the calibration points is minimized by controlling the translation and the scaling of the infrared image, the translation and scaling amount is recorded, and the position of a target on the infrared image is the same as the position of a pixel coordinate on the same target on the visible light image on the image by changing according to the translation and scaling amount after the infrared image is input.
Step 2, training a yolov5 neural network, and extracting a target position coordinate from the infrared registration image by using the yolov5 neural network;
the specific structure of Yolov5 neural network is shown in fig. 2 and fig. 3, and includes:
an input end for performing minimum black edge filling;
inputting an infrared image to perform feature extraction to obtain a feature map;
the Neck part is used for performing up-sampling on the input characteristic diagram to obtain three groups of characteristic diagrams with different scales;
the prediction part is used for carrying out convolution operation on the three groups of characteristic images with different scales to obtain a target position coordinate, and the three groups of characteristic images are integrated on an input image to obtain a final target position coordinate;
Before the input image enters the yolov5 neural network, minimum black-edge scaling is applied to bring images to a uniform size. In the invention t is 608, and for a registered infrared image of size 640 × 512 the minimum black-edge scaling is calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network, and a_1, a_2 are the scaling factors; the smaller scaling factor is selected:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

The padding value is then calculated:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black border. The registered infrared image is scaled to m_new × n_new with this scale factor, and black borders of width c are filled at the top and bottom, so that the black edges at the two ends of the image height are reduced and the amount of computation during inference is reduced.
The Backbone is composed of Focus, CSP and SPP modules arranged in sequence and extracts feature maps from the image; it is the backbone network of yolov5. The Focus structure divides the picture into four different slices and stacks them by Concat, as shown in FIG. 4. A 304 × 304 feature map is obtained through the first Focus module, and a 152 × 152 feature map through the following CBL module; after the CSP1_1 and CBL modules a 76 × 76 feature map is obtained. After the next CSP1_3 module, part of the feature map is passed to the Neck and part is input into a CBL module to obtain a 38 × 38 feature map; after another CSP1_3 module the feature map is again partly passed to the Neck and partly input into a CBL module to obtain a 19 × 19 feature map, which is input into the SPP module and then into the Neck part.
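The Focus slicing described above can be sketched as a single tensor operation (a simplification that omits the convolution following the concatenation):

    import torch

    def focus_slice(x):
        """Split an (N, C, H, W) tensor into four pixel-interleaved slices and stack on channels,
        halving the spatial size while quadrupling the channel count, as in yolov5's Focus module."""
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)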
The Neck adopts an FPN + PAN structure; it resamples the feature maps, extracts target features of different scales and provides them to the prediction layer. The FPN part extracts feature layers of sizes 76 × 76, 38 × 38 and 19 × 19. Two feature layers are fused by up-sampling the upper-layer feature by a factor of 2, changing the channel number of the lower-layer feature through a 1 × 1 convolution, and adding the up-sampled result and the CBL-convolved result element-wise. In the PAN structure, the bottom layer of the FPN feature pyramid is copied into a new feature pyramid bottom layer, the up-sampled bottom layer is superimposed with the upper layer in the FPN, and the predicted target coordinate positions and confidences are output through the CSP2_1 convolution structure.
In the prediction part, target screening is performed with NMS (non-maximum suppression): all rectangular boxes are grouped by class label and sorted within each group by confidence. The box with the highest confidence in a group is taken out, the remaining boxes with the same label are traversed, the intersection-over-union with the selected box is calculated, and the remaining boxes whose IOU exceeds the set IOU threshold are deleted.
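A plain-numpy sketch of the per-class non-maximum suppression described above; the IOU threshold value is an assumption, and boxes are given as (x1, y1, x2, y2).

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.45):
        """Greedy non-maximum suppression; returns the indices of the kept boxes."""
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_thresh]
        return keep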
The key of this embodiment is the model training process; the yolov5 neural network model used for target detection in the invention is described below. The FLIR dataset, a visible light and infrared image dataset of automobile traffic roads released in July 2018 and containing about 14,000 images, is selected to train the yolov5 neural network. Three training labels are selected: person, bicycle and car. Mosaic data enhancement is adopted: 4 images are spliced by random scaling, random cropping and random arrangement. CIOU_Loss is selected as the loss function:

CIOU_Loss = 1 - IOU + ρ²(b_p, b_gt) / c² + α · v

where IOU is the ratio of the intersection area of the two boxes to their union area, c is the diagonal length of the minimum enclosing rectangle of the two boxes, ρ(b_p, b_gt) is the Euclidean distance between the center points of the two boxes, and v is a parameter measuring the consistency of the aspect ratios, calculated as

v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w_p / h_p))², α = v / ((1 - IOU) + v)

where w_gt, h_gt are the width and height of the ground-truth box and w_p, h_p are the width and height of the predicted box.
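A sketch of the CIOU loss as written above, for boxes in (x1, y1, x2, y2) form; the epsilon terms are an implementation choice for numerical safety.

    import math
    import torch

    def ciou_loss(pred, target, eps=1e-7):
        """CIoU loss between predicted and ground-truth boxes given as (x1, y1, x2, y2)."""
        # intersection and union
        iw = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
        ih = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
        inter = iw * ih
        area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
        area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
        union = area_p + area_t - inter + eps
        iou = inter / union
        # squared centre distance over squared enclosing-box diagonal
        cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
        ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
        c2 = cw ** 2 + ch ** 2 + eps
        rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
                (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
        # aspect-ratio consistency term
        wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
        wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
        v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
        alpha = v / (1 - iou + v + eps)
        return 1 - iou + rho2 / c2 + alpha * v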
After training to obtain the training weights, the registered infrared image is input to obtain the position coordinates of the predicted targets, where the targets include pedestrians, bicycles and cars.
Step 3, a Mask R-CNN neural network is trained, and the Mask R-CNN neural network is used to extract masks from the infrared registration image.
FIG. 5 is a schematic diagram of the Mask R-CNN neural network; its structure comprises:
a ResNet-FPN part, which extracts feature maps of different scales and constructs a feature pyramid;
an RPN part, which extracts the required regions from the feature maps;
an ROI Align part, which pools the extracted regions of the feature maps;
a mask part, which takes the feature maps as input and outputs the masks;
the ResNet-FPN part comprises five convolutional layers; the second to fifth layers output four feature maps of different scales, extracting feature maps down-scaled by 4, 8, 16 and 32 times relative to the original image, as shown in FIG. 6.
After a picture is input to the RPN part, it passes through a 1 × 1 convolution layer and then through reshape, softmax and reshape operations, where the softmax module selects the required regions from the extracted feature maps, as shown in FIG. 7;
ROI Align divides each bin of the region of interest into cells, samples four points in each cell by bilinear interpolation and applies max pooling to them, giving the final ROI Align result.
The mask part uses an FCN fully convolutional network for image segmentation.
The invention uses the coco dataset for Mask R-CNN training. The coco dataset is a large-scale dataset for object detection, segmentation and captioning, containing about 330k images. The three labels car, bicycle and pedestrian are selected, irrelevant pictures are removed for training, and the Mosaic data enhancement method is also adopted.
After training to obtain the corresponding weights, instance segmentation is performed on the registered infrared image to obtain the masks.
Step 4, a fusion range is determined according to the target position coordinates, the ADF image fusion method is applied within the fusion range of the visible light image and the infrared registration image, pseudo-color processing is applied, and instance segmentation is performed on the pseudo-color image using the masks;
the visible light image is registered with the infrared image; a rectangular box of target position coordinates is constructed with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner; the visible light image to be fused and the infrared registration image to be fused are selected with this rectangular box, and the ADF image fusion method is applied.
The ADF image fusion flow is shown in FIG. 8. Anisotropic filtering is applied to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir. The anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator and t is the iteration index; the number of iterations is set to 10. The original input image I_0 gives the output I_1, which is fed back as the next input; after 10 iterations I_10 is obtained, i.e. the base layer image. Expanding the formula above in discrete form gives:

I_{t+1,i,j} = I_{t,i,j} + λ · (c_N · ∇_N I_{t,i,j} + c_S · ∇_S I_{t,i,j} + c_E · ∇_E I_{t,i,j} + c_W · ∇_W I_{t,i,j})

where ∇ denotes the one-sided gradient in the single direction indicated by the subscripts N, S, E, W; c_N, c_S, c_W, c_E are the conduction coefficients of the input image towards the north, south, west and east directions; I_{t,i,j} denotes the input image with i, j the coordinates of a single pixel; I_{t+1,i,j} is the output image; the number of iterations is set to 10; and λ takes values in [0, 1/4]. The directional gradients ∇_N I_{t,i,j}, ∇_S I_{t,i,j}, ∇_W I_{t,i,j}, ∇_E I_{t,i,j} in the formula are expanded as:

∇_N I_{i,j} = I_{i-1,j} - I_{i,j}
∇_S I_{i,j} = I_{i+1,j} - I_{i,j}
∇_W I_{i,j} = I_{i,j-1} - I_{i,j}
∇_E I_{i,j} = I_{i,j+1} - I_{i,j}

i.e. for each pixel of the image, the current pixel is subtracted from its neighbouring pixel in the given direction, with a zero value taken at the boundary. The conduction coefficients c_N, c_S, c_W, c_E in the formula are calculated as:

c_N = g(|∇_N I_{t,i,j}|), c_S = g(|∇_S I_{t,i,j}|), c_W = g(|∇_W I_{t,i,j}|), c_E = g(|∇_E I_{t,i,j}|)

where the directional gradients are as defined above and the function g(·) must be monotonically decreasing with g(0) = 1. Two calculation options are adopted:

g(x) = exp(-(x / k)²)   or   g(x) = 1 / (1 + (x / k)²)

where k is the gradient threshold. The visible light original image is taken as I_0 and its output I_1 is fed back as input; after 10 iterations I_10 is obtained, i.e. the visible light base layer B_vis. The same operation applied to the infrared registration image yields the infrared registration base layer B_ir.
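A compact numpy sketch of the diffusion iteration written above, using the exponential conduction function; λ is chosen within the stated [0, 1/4] range and the gradient threshold k is an assumption.

    import numpy as np

    def anisodiff(img, iters=10, lam=0.20, k=30.0):
        """Perona-Malik anisotropic diffusion; returns the smoothed base layer."""
        u = img.astype(np.float64)

        def g(d):
            # conduction function, exponential option
            return np.exp(-(d / k) ** 2)

        for _ in range(iters):
            dN = np.zeros_like(u)
            dS = np.zeros_like(u)
            dW = np.zeros_like(u)
            dE = np.zeros_like(u)
            dN[1:, :] = u[:-1, :] - u[1:, :]    # north neighbour minus current pixel
            dS[:-1, :] = u[1:, :] - u[:-1, :]   # south neighbour minus current pixel
            dW[:, 1:] = u[:, :-1] - u[:, 1:]    # west neighbour minus current pixel
            dE[:, :-1] = u[:, 1:] - u[:, :-1]   # east neighbour minus current pixel
            u = u + lam * (g(dN) * dN + g(dS) * dS + g(dW) * dW + g(dE) * dE)
        return u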
The visible light base layer is subtracted from the visible light image to obtain the visible light detail layer; the infrared registration detail layer is calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image and the infrared registration image, respectively.
The visible light detail layer and the infrared detail layer are arranged as X_vis and X_ir, respectively; the covariance matrix C_xx of X_vis and X_ir is computed, and its eigenvalues w_1, w_2 and eigenvectors are obtained:

C_xx · v_i = w_i · v_i, i = 1, 2

The uncorrelated-component coefficients are computed: the larger eigenvalue w_max is selected, its corresponding eigenvector v_max is taken, and the uncorrelated-component coefficients of the detail layers are:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

The fused detail layer is:

D_f = KL1 · D_vis + KL2 · D_ir

The fused base layer is:

B_f = (B_vis + B_ir) / 2

and the fused image is:

F = B_f + D_f
Pseudo-color processing is applied to the fused image region with the COLORMAP_JET strategy to obtain a color fused image; the image inside the rectangular box of target position coordinates in the visible light image is replaced with the color fused image to obtain a global color fused image; the mask is inverted and overlaid on the visible light image to obtain the visible light background image; the mask is overlaid on the global color fused image to obtain the instance-segmented part of the global color fused image; and the instance-segmented part and the visible light background image are superimposed to obtain the final result.
The experimental scene is an urban traffic road. A Hikvision GigE industrial area-array camera serves as the visible light detector and a Xenics Gobi+640 thermal infrared camera as the infrared detector, arranged as a binocular rig; real-time computation results (60w power) are obtained on a Jetson-Nano development platform. The capture lasted 3 minutes 23 seconds and comprises 4067 pictures.
FIG. 9 is a sample of the final detection results: pedestrians and vehicles in urban road traffic are accurately detected and segmented, and infrared and visible light fusion is performed within the detection regions.
Table 1 compares the per-picture time consumption of full-image fusion and local fusion, broken down by processing stage; the time consumed by local fusion is clearly reduced compared with global fusion. Table 2 compares the precision of different segmentation methods: for medium and large targets the segmentation performance of the networks is similar, but for small targets at a distance the Mask R-CNN network has an obvious advantage.
TABLE 1. Comparison of per-picture time consumption using full fusion and local fusion (the table is reproduced as an image in the original).
TABLE 2. Precision comparison of different segmentation methods (the table is reproduced as an image in the original).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (8)

1. A visible light and infrared image fusion target detection and instance segmentation method, characterized by comprising the following steps:
step 1, registering an infrared image and a visible light image: calibrating the same targets in the infrared image and the visible light image, and translating and scaling the infrared image according to the calibration result to obtain an infrared registration image;
step 2, extracting target position coordinates from the infrared registration image by using a yolov5 neural network;
step 3, extracting masks from the infrared registration image by using a Mask R-CNN neural network;
step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting the ADF image fusion method;
step 5, giving a pseudo color to the fused image, and performing instance segmentation by using the masks.
2. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 1 the infrared image and the visible light image are registered, the same targets are calibrated in the infrared image and the visible light image, and the infrared image is translated and scaled according to the calibration result to obtain the infrared registration image; the specific method is as follows:
selecting a visible light image and an infrared image shot simultaneously, selecting 12 pairs of calibration points in the two images and recording their coordinates as (x_ir_i, y_ir_i) and (x_vis_i, y_vis_i), i ∈ [1, 12], where the subscript ir denotes calibration points of the infrared image, vis denotes calibration points of the visible light image, and i denotes the pair index; each pair of calibration points must correspond to the same real spatial position, and of the 12 pairs 4 are taken from near-field targets, 4 from mid-range targets and 4 from distant targets;
calculating the sum of the Euclidean distances of the calibration points:

D = Σ_{i=1}^{12} √((x_ir_i - x_vis_i)² + (y_ir_i - y_vis_i)²)

and controlling the translation and scaling of the infrared image so that the sum of the Euclidean distances of the calibration points is minimized, recording the translation and scaling amounts, and applying the same transformation to all infrared images to obtain the infrared registration images.
3. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 2 the target position coordinates in the infrared registration image are acquired by using a yolov5 neural network; the specific method is as follows:
(a) training the yolov5 neural network with the FLIR dataset;
training the yolov5 neural network with the three labels car, bicycle and pedestrian from the FLIR dataset, adopting Mosaic data enhancement in the training: randomly selecting 4 infrared images and forming a brand-new picture by random scaling, random cropping and random arrangement; forming a training set from the original FLIR dataset images and the images obtained by Mosaic data enhancement, and inputting the training set into the yolov5 neural network for training to obtain the training weights;
(b) carrying out target detection with the yolov5 neural network;
applying minimum black-edge scaling to the registered infrared image, the minimum black-edge scaling being calculated as:

a_1 = t / m, a_2 = t / n

where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network input, and a_1, a_2 are the scaling factors of the original image width and height relative to the image side length required by the yolov5 neural network;
scaling the registered infrared image to m_new × n_new using the scale factor:

a_min = min(a_1, a_2), m_new = a_min · m, n_new = a_min · n

where a_min is the smaller of a_1 and a_2;
filling black borders of width c at the top and bottom, the padding width being calculated as:

f = (t - n_new) mod 64
c = f / 2

where c is the size of the filled black edge;
loading the weights obtained from the training in step (a), inputting the scaled and padded infrared registration image into the yolov5 neural network, and outputting the target position coordinates (x_1, y_1, x_2, y_2, c), where (x_1, y_1) are the coordinates of the top-left corner of the target, (x_2, y_2) are the coordinates of the bottom-right corner of the target, and c is the target class.
4. The visible light and infrared image fusion target detection and instance segmentation method according to claim 3, characterized in that: in step 3 masks are extracted from the infrared image by using a Mask R-CNN neural network, which comprises:
performing Mask R-CNN neural network training with the coco dataset to obtain training weights;
loading the training weights and inputting the infrared registration image into the Mask R-CNN neural network to obtain the masks.
5. The visible light and infrared image fusion target detection and instance segmentation method according to claim 3, characterized in that: in step 4 a fusion range is determined according to the target position coordinates, and the visible light image and the infrared registration image are fused by the ADF image fusion method; the specific method is as follows:
step 4.1, constructing a rectangular box of target position coordinates with (x_1, y_1) as the top-left corner and (x_2, y_2) as the bottom-right corner, selecting the visible light image to be fused and the infrared image to be fused from the visible light image and the infrared registration image with this rectangular box, and applying the ADF image fusion method;
step 4.2, applying anisotropic filtering to the visible light image and the infrared registration image to obtain the visible light base layer B_vis and the infrared registration base layer B_ir, where the anisotropic filtering used for image decomposition is:

I_{t+1} = I_t + λ · (c · ΔI_t + ∇c · ∇I_t)

where I_{t+1} is the output image, I_t is the input image, c is the diffusion coefficient, ∇ is the gradient operator, Δ is the Laplace operator, and t is the set number of iterations;
step 4.3, subtracting the visible light base layer from the visible light image to be fused to obtain the visible light detail layer, and subtracting the infrared base layer from the infrared image to be fused to obtain the infrared detail layer, the infrared detail layer being calculated as:

D_ir = I_ir - B_ir

and the visible light detail layer as:

D_vis = I_vis - B_vis

where I_vis and I_ir are the visible light image to be fused and the infrared registration image, respectively;
step 4.4, arranging the visible light detail layer and the infrared registration detail layer as X_vis and X_ir, respectively, finding the covariance matrix C_xx of X_vis and X_ir, and calculating the eigenvalues w_1, w_2 and eigenvectors of C_xx:

C_xx · v_i = w_i · v_i, i = 1, 2

selecting the larger of the eigenvalues w_1, w_2 as w_max, selecting its corresponding eigenvector v_max, and obtaining the uncorrelated-component coefficients of the detail layers, where KL1 is the uncorrelated-component coefficient of the visible light detail layer and KL2 that of the infrared detail layer:

KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))

fusing the image detail layers:

D_f = KL1 · D_vis + KL2 · D_ir

fusing the image base layers:

B_f = (B_vis + B_ir) / 2

and fusing the images:

F = B_f + D_f
6. The visible light and infrared image fusion target detection and instance segmentation method according to claim 1, characterized in that: in step 5 a pseudo color is given to the fused image and instance segmentation is performed using the masks; the specific method is as follows:
applying pseudo-color processing to the fused image with COLORMAP_JET to obtain a color fused image; replacing the image inside the rectangular box of target position coordinates in the visible light image with the color fused image to obtain a global color fused image; inverting the mask and overlaying it on the visible light image to obtain the visible light background image; overlaying the mask on the global color fused image to obtain the instance-segmented part of the global color fused image; and superimposing the instance-segmented part and the visible light background image to obtain the final result.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method of any one of claims 1 to 6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, realizes target instance segmentation based on fusion of a visible light image and an infrared image according to the visible light and infrared image fusion target detection and instance segmentation method of any one of claims 1 to 6.
CN202211095330.4A 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method Active CN115170810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211095330.4A CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211095330.4A CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Publications (2)

Publication Number Publication Date
CN115170810A true CN115170810A (en) 2022-10-11
CN115170810B CN115170810B (en) 2022-12-13

Family

ID=83482397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211095330.4A Active CN115170810B (en) 2022-09-08 2022-09-08 Visible light and infrared image fusion target detection and instance segmentation method

Country Status (1)

Country Link
CN (1) CN115170810B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895094A (en) * 2023-09-11 2023-10-17 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN118038499A (en) * 2024-04-12 2024-05-14 北京航空航天大学 Cross-mode pedestrian re-identification method based on mode conversion

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN105447838A (en) * 2014-08-27 2016-03-30 北京计算机技术及应用研究所 Method and system for infrared and low-level-light/visible-light fusion imaging
US20180227509A1 (en) * 2015-08-05 2018-08-09 Wuhan Guide Infrared Co., Ltd. Visible light image and infrared image fusion processing system and fusion method
US20190279371A1 (en) * 2018-03-06 2019-09-12 Sony Corporation Image processing apparatus and method for object boundary stabilization in an image of a sequence of images
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN111062905A (en) * 2019-12-17 2020-04-24 大连理工大学 Infrared and visible light fusion method based on saliency map enhancement
CN111145133A (en) * 2019-12-05 2020-05-12 南京理工大学 ZYNQ-based infrared and visible light co-optical axis image fusion system and method
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
CN111275759A (en) * 2020-01-16 2020-06-12 国网江苏省电力有限公司 Transformer substation disconnecting link temperature detection method based on unmanned aerial vehicle double-light image fusion
CN111432172A (en) * 2020-03-20 2020-07-17 浙江大华技术股份有限公司 Fence alarm method and system based on image fusion
CN111611905A (en) * 2020-05-18 2020-09-01 沈阳理工大学 Visible light and infrared fused target identification method
CN111915546A (en) * 2020-08-04 2020-11-10 西安科技大学 Infrared and visible light image fusion method and system, computer equipment and application
CN112016478A (en) * 2020-08-31 2020-12-01 中国电子科技集团公司第三研究所 Complex scene identification method and system based on multispectral image fusion
CN113344475A (en) * 2021-08-05 2021-09-03 国网江西省电力有限公司电力科学研究院 Transformer bushing defect identification method and system based on sequence modal decomposition
CN114332748A (en) * 2021-11-08 2022-04-12 西安电子科技大学 Target detection method based on multi-source feature joint network and self-generation of transformed image
CN114519808A (en) * 2022-02-21 2022-05-20 烟台艾睿光电科技有限公司 Image fusion method, device and equipment and storage medium

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN105447838A (en) * 2014-08-27 2016-03-30 北京计算机技术及应用研究所 Method and system for infrared and low-level-light/visible-light fusion imaging
US20180227509A1 (en) * 2015-08-05 2018-08-09 Wuhan Guide Infrared Co., Ltd. Visible light image and infrared image fusion processing system and fusion method
US20190279371A1 (en) * 2018-03-06 2019-09-12 Sony Corporation Image processing apparatus and method for object boundary stabilization in an image of a sequence of images
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
CN111145133A (en) * 2019-12-05 2020-05-12 南京理工大学 ZYNQ-based infrared and visible light co-optical axis image fusion system and method
CN111062905A (en) * 2019-12-17 2020-04-24 大连理工大学 Infrared and visible light fusion method based on saliency map enhancement
CN111275759A (en) * 2020-01-16 2020-06-12 国网江苏省电力有限公司 Transformer substation disconnecting link temperature detection method based on unmanned aerial vehicle double-light image fusion
AU2020100178A4 (en) * 2020-02-04 2020-03-19 Huang, Shuying DR Multiple decision maps based infrared and visible image fusion
CN111432172A (en) * 2020-03-20 2020-07-17 浙江大华技术股份有限公司 Fence alarm method and system based on image fusion
CN111611905A (en) * 2020-05-18 2020-09-01 沈阳理工大学 Visible light and infrared fused target identification method
CN111915546A (en) * 2020-08-04 2020-11-10 西安科技大学 Infrared and visible light image fusion method and system, computer equipment and application
CN112016478A (en) * 2020-08-31 2020-12-01 中国电子科技集团公司第三研究所 Complex scene identification method and system based on multispectral image fusion
CN113344475A (en) * 2021-08-05 2021-09-03 国网江西省电力有限公司电力科学研究院 Transformer bushing defect identification method and system based on sequence modal decomposition
CN114332748A (en) * 2021-11-08 2022-04-12 西安电子科技大学 Target detection method based on multi-source feature joint network and self-generation of transformed image
CN114519808A (en) * 2022-02-21 2022-05-20 烟台艾睿光电科技有限公司 Image fusion method, device and equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHUNYU XIE 等: "Infrared and Visible Image Fusion: A Region-Based Deep Learning Method", 《INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTICS AND APPLICATIONS》 *
JIAYI MA 等: "STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection", 《IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT》 *
ZHOU JINGWEN 等: "An infrared and visible image fusion method based on VGG-19 network", 《OPTIK》 *
LIU YANJU et al.: "Research on Infrared and Visible Image Fusion Method Based on GAN Network", 《Journal of Shenyang Ligong University》 *
WANG YUJING: "Research on Infrared and Visible Image Fusion Algorithms with Saliency Detection", 《China Master's Theses Full-text Database, Information Science and Technology》 *
GAO YIWEN: "Brain Magnetic Resonance Image Registration Based on Hybrid Corner Detection", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895094A (en) * 2023-09-11 2023-10-17 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN116895094B (en) * 2023-09-11 2024-01-30 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN118038499A (en) * 2024-04-12 2024-05-14 北京航空航天大学 Cross-mode pedestrian re-identification method based on mode conversion

Also Published As

Publication number Publication date
CN115170810B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN115170810B (en) Visible light and infrared image fusion target detection and instance segmentation method
US20220044375A1 (en) Saliency Map Enhancement-Based Infrared and Visible Light Fusion Method
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN109886312B (en) Bridge vehicle wheel detection method based on multilayer feature fusion neural network model
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
US8179393B2 (en) Fusion of a 2D electro-optical image and 3D point cloud data for scene interpretation and registration performance assessment
Vaudrey et al. Differences between stereo and motion behaviour on synthetic and real-world stereo sequences
US10477178B2 (en) High-speed and tunable scene reconstruction systems and methods using stereo imagery
CN113111974A (en) Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN108665496A (en) A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CA3028599A1 (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
CN113673444B (en) Intersection multi-view target detection method and system based on angular point pooling
CN111951306A (en) Target detection method for fusion of laser radar and image video
Min et al. Orfd: A dataset and benchmark for off-road freespace detection
CN114254696A (en) Visible light, infrared and radar fusion target detection method based on deep learning
CN112016478B (en) Complex scene recognition method and system based on multispectral image fusion
CN110070025A (en) Objective detection system and method based on monocular image
CN114972748B (en) Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network
CN114972989A (en) Single remote sensing image height information estimation method based on deep learning algorithm
CN114339185A (en) Image colorization for vehicle camera images
CN113610905A (en) Deep learning remote sensing image registration method based on subimage matching and application
CN111626241A (en) Face detection method and device
Tseng et al. Semi-supervised image depth prediction with deep learning and binocular algorithms
CN116630528A (en) Static scene reconstruction method based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant