CN115170810A - Visible light and infrared image fusion target detection instance segmentation method - Google Patents
- Publication number: CN115170810A (application CN202211095330.4A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06T2207/10024 — Color image
- G06T2207/10048 — Infrared image
- G06T2207/20221 — Image fusion; image merging
- G06V2201/07 — Target detection
Abstract
The invention provides a visible light and infrared image fusion target detection instance segmentation method. The method registers the infrared image to the visible light image by calibrating the same targets in both images and translating and scaling the infrared image according to the calibration result, yielding an infrared registered image; extracts target position coordinates from the infrared registered image with a yolov5 neural network; extracts a mask from the infrared registered image with a mask_rcnn neural network; determines the fusion range from the target position coordinates and fuses the visible light image with the infrared registered image using the ADF image fusion method; and finally assigns a pseudo color to the fused image and performs instance segmentation with the mask. The background retains the wide field of view of visible light, while the target region combines the robustness of infrared imaging in severe environments with the spatial detail of visible light.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a visible light and infrared image fusion target detection instance segmentation method.
Background
Vision is an important information channel for human beings: about 83% of the information humans acquire is visual. Vision sensors are important tools that assist humans in obtaining visual information in production and daily life. However, in complex and changeable natural environments, the information obtained by a single vision sensor is very limited, which motivated the development of image fusion technology.
Infrared and visible light imaging devices have received a great deal of attention. A visible light image is formed from light reflected by objects; it is well suited to human observation, with high spatial resolution and considerable detail and light-dark contrast, but it is easily degraded by poor illumination and bad weather such as fog. An infrared image is formed from the thermal radiation of objects; it images well under poor illumination and in bad weather, but has lower spatial resolution and less detail than a visible light image. Fusing the two combines their advantages: the fused image is rich in detail and robust to severe weather. However, fusing whole images is time-consuming, and much of the work is wasted. In urban traffic scenes, for example, only targets such as pedestrians, vehicles, and motorcycles matter; fusing background roads, billboards, or sky is largely redundant.
Disclosure of Invention
The invention aims to provide a visible light and infrared image fusion target detection instance segmentation method that segments instances only for the targets of interest.
The technical solution realizing the purpose of the invention is as follows. A visible light and infrared image fusion target detection instance segmentation method comprises the following steps:
Step 1, register the infrared image with the visible light image: calibrate the same targets in both images, then translate and scale the infrared image according to the calibration result to obtain the infrared registered image;
Step 2, extract the target position coordinates from the infrared registered image with a yolov5 neural network;
Step 3, extract a mask from the infrared registered image with a mask_rcnn neural network;
Step 4, determine the fusion range from the target position coordinates and fuse the visible light image with the infrared registered image using the ADF image fusion method;
Step 5, assign a pseudo color to the fused image and perform instance segmentation with the mask.
Further, step 1 registers the infrared image with the visible light image: calibrate the same targets in both images, then translate and scale the infrared image according to the calibration result to obtain the infrared registered image. The specific method is as follows:
Select a visible light image and an infrared image shot simultaneously, and in each image record the coordinates of 12 calibration points, (x_ir_i, y_ir_i) for the infrared image and (x_vis_i, y_vis_i) for the visible light image, i ∈ [1, 12], where the subscript ir marks infrared calibration points, vis marks visible light calibration points, and i is the point index. Each pair of calibration points must correspond to the same real-world position. Of the 12 pairs, 4 are taken from close-range targets, 4 from medium-range targets, and 4 from far-range targets;
Calculate the sum of the Euclidean distances between the calibration point pairs:
d = Σ_{i=1}^{12} sqrt((x_ir_i − x_vis_i)² + (y_ir_i − y_vis_i)²)
Translate and scale the infrared image so that the sum of Euclidean distances of the calibration points is minimized, record the translation and scaling amounts, and apply the same transformation to all infrared images to obtain the infrared registered images.
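The translation-and-scale search of step 1 can be sketched as follows. This is a minimal brute-force sketch, not the patent's implementation: the search-grid ranges, function names, and calibration-point values are illustrative assumptions.

```python
import numpy as np

def registration_cost(ir_pts, vis_pts, dx, dy, s):
    # Sum of Euclidean distances between scaled+translated infrared
    # calibration points and the visible-light calibration points.
    moved = ir_pts * s + np.array([dx, dy], dtype=float)
    return float(np.sqrt(((moved - vis_pts) ** 2).sum(axis=1)).sum())

def register(ir_pts, vis_pts, dxs, dys, scales):
    # Brute-force grid search for the translation/scale minimising the cost.
    best = None
    for s in scales:
        for dx in dxs:
            for dy in dys:
                c = registration_cost(ir_pts, vis_pts, dx, dy, s)
                if best is None or c < best[0]:
                    best = (c, dx, dy, s)
    return best  # (minimum cost, dx, dy, scale)
```

The recorded (dx, dy, s) is then applied to every subsequent infrared frame, as the text describes.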
Further, step 2 obtains the target position coordinates in the infrared registered image with a yolov5 neural network. The specific method is as follows:
(a) Train the yolov5 neural network on the FLIR dataset;
Train the yolov5 neural network on the car, bicycle, and person labels of the FLIR dataset. Mosaic data enhancement is adopted during training: 4 infrared images are selected at random and combined into one new picture by random scaling, random cropping, and random arrangement. The original FLIR images and the Mosaic-enhanced images together form the training set, which is input to the yolov5 neural network to obtain the training weights;
(b) Perform target detection with the yolov5 neural network;
Apply minimum black-edge scaling to the registered infrared image. The minimum black-edge scaling is calculated as:
a1 = t / m, a2 = t / n
where m and n are the width and height of the original input image, t is the image side length required by the yolov5 neural network input, and a1 and a2 are the scaling factors of the original image width and height relative to the required input size;
Scale the registered infrared image to m_new × n_new, where
a_min = min(a1, a2), m_new = m · a_min, n_new = n · a_min
and a_min is the smaller of a1 and a2;
Fill black borders of width c at the top and bottom; the filled border width is calculated as:
c = (t − n_new) / 2
where c is the size of the filled black border.
Load the weights trained in step (a), input the scaled and padded infrared registered image into the yolov5 neural network, and output the target position coordinates (x1, y1, x2, y2, c), where (x1, y1) is the top-left corner of the target, (x2, y2) is the bottom-right corner, and c is the target class.
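The minimum black-edge scaling of step 2(b) can be sketched as follows (a sketch under the assumption that rounding to whole pixels is acceptable; the function name is illustrative):

```python
def min_black_edge(m, n, t):
    # m, n: original image width and height; t: required network input side.
    a1, a2 = t / m, t / n               # width and height scaling factors
    a_min = min(a1, a2)                 # keep aspect ratio with the smaller factor
    m_new, n_new = round(m * a_min), round(n * a_min)
    c = (t - n_new) / 2                 # black border filled at top and bottom
    return m_new, n_new, c
```

For the 640 × 512 infrared image and t = 608 used in the embodiment below, this gives a 608 × 486 scaled image with 61-pixel borders.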
Further, step 3 extracts a mask from the infrared registered image with a mask_rcnn neural network:
(a) Train the mask_rcnn neural network on the coco dataset to obtain the training weights;
(b) Load the training weights from step (a) and input the infrared registered image into the mask_rcnn neural network to obtain the mask.
Further, step 4 determines the fusion range from the target position coordinates and fuses the visible light image with the infrared registered image using the ADF image fusion method. The specific method is as follows:
Step 4.1: construct a target position rectangle with (x1, y1) as the top-left corner and (x2, y2) as the bottom-right corner. Use this rectangle to crop the visible light image to be fused and the infrared image to be fused from the visible light image and the infrared registered image, and apply the ADF image fusion method;
Step 4.2: apply anisotropic filtering to the visible light image and the infrared registered image to obtain the visible light base layer B_vis and the infrared base layer B_ir. The anisotropic filtering used for image decomposition is:
I_{t+1} = I_t + λ · [c_N · ∇_N I_t + c_S · ∇_S I_t + c_W · ∇_W I_t + c_E · ∇_E I_t]
where I_{t+1} is the output image, I_t is the input image, c_N, c_S, c_W, c_E are the diffusion (conduction) coefficients in the four compass directions, ∇ is the gradient operator, Δ is the Laplace operator, and t is the set iteration number;
Step 4.3: subtract the visible light base layer from the visible light image to be fused to obtain the visible light detail layer, and subtract the infrared base layer from the infrared image to be fused to obtain the infrared detail layer. The infrared detail layer is calculated as:
D_ir = I_ir − B_ir
The visible light detail layer is:
D_vis = I_vis − B_vis
where I_vis and I_ir are the visible light image to be fused and the infrared registered image respectively;
Step 4.4: arrange the visible light detail layer and the infrared detail layer as row vectors X_vis and X_ir, compute their covariance matrix C_xx, and compute the eigenvalue set w1, w2 and eigenvector set of C_xx:
C_xx · v_k = w_k · v_k, k = 1, 2
Select the larger of the eigenvalues w1, w2 as w_max and its corresponding eigenvector v_max, and compute the uncorrelated-component coefficients of the detail layers, where KL1 is the coefficient for the visible light detail layer and KL2 the coefficient for the infrared detail layer:
KL1 = v_max(1) / (v_max(1) + v_max(2)), KL2 = v_max(2) / (v_max(1) + v_max(2))
Fuse the detail layers:
D_F = KL1 · X_vis + KL2 · X_ir
Fuse the base layers:
B_F = (B_vis + B_ir) / 2
Fuse the image:
I_F = B_F + D_F
Further, step 5 assigns a pseudo color to the fused image and performs instance segmentation with the mask. The specific method is as follows:
Apply COLORMAP_JET pseudo-color processing to the fused image to obtain a color fused image, and replace the image inside the target position rectangle of the visible light image with the color fused image to obtain a global color fused image. Invert the mask and overlay it on the visible light image to obtain the visible light background image; overlay the mask on the global color fused image to obtain the instance segmentation part of the global color fused image; finally, overlay the instance segmentation part on the visible light background image to obtain the final result.
A computer device comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, it realizes target instance segmentation according to the visible light and infrared image fusion target detection instance segmentation method.
A computer readable storage medium stores a computer program which, when executed by a processor, realizes target instance segmentation based on visible light and infrared image fusion detection according to the visible light and infrared image fusion target detection instance segmentation method.
Compared with the prior art, the invention has notable advantages: (1) compared with common visible light and infrared image fusion methods, it automatically detects the targets of interest in the image and performs instance segmentation on them; (2) it adopts local fusion based on the target position coordinates, which both guarantees detection of the targets of interest and shortens the time consumed by image fusion; (3) the Mosaic data enhancement method is adopted when training the mask_rcnn neural network, improving the network's detection performance on small targets.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the invention.
FIG. 2 is a schematic diagram of the yolov5 structure.
FIG. 3 is a schematic diagram of the specific yolov5 modules.
FIG. 4 is a schematic diagram of the slice structure in yolov5.
FIG. 5 is a schematic diagram of the mask_rcnn structure.
FIG. 6 is a schematic diagram of the ResNet-FPN structure.
FIG. 7 is a schematic diagram of the RPN network structure.
FIG. 8 is a flow chart of ADF image fusion.
FIG. 9 is a diagram of the detection effect of the invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the application and are not intended to limit it.
The invention discloses a visible light and infrared image fusion target detection instance segmentation method whose overall flow is shown in FIG. 1. The method comprises the following steps:
the original visible light image and the infrared image have the problem of different view fields, the phenomenon of target dislocation can occur when the images are directly fused, and image registration operation is needed to ensure that the target positions of the fused images are strictly corresponding. Selecting a visible light image and an infrared image which are shot simultaneously, and respectively selecting 12 groups of calibration points from the two images to record the coordinatesMarked x ir_i, y ir_i And x vis_i, y vis_i (i is ∈ [1, 12 ]]) Wherein the subscript is ir to represent the mark points of the infrared image, vis to represent the mark points of the visible image, i to represent the number of groups, and for each mark point, the real space position is required to be the same, 4 groups of 12 groups of mark points are from the close view target, 4 groups are from the medium view target, and 4 groups are from the distant view target. Calculating the sum of Euclidean distances of the calibration points
Minimize this sum by controlling the translation and scaling of the infrared image and record the resulting translation and scaling amounts. Every subsequent infrared image is transformed by these amounts, so that each target occupies the same pixel coordinates on the infrared image as on the visible light image.
The specific structure of the yolov5 neural network is shown in FIG. 2 and FIG. 3 and comprises:
an input end, which performs minimum black-edge filling;
a Backbone part, which extracts features from the input infrared image to obtain feature maps;
a Neck part, which upsamples the input feature maps to obtain three groups of feature maps at different scales;
a prediction part, which convolves the three groups of feature maps at different scales to obtain target position coordinates and integrates the three groups onto the input image to obtain the final target position coordinates.
Before an input image enters the yolov5 neural network, minimum black-edge scaling brings it to a uniform size. In the invention t = 608, and for a registered infrared image of size 640 × 512 the minimum black-edge scaling is calculated as:
a1 = t / m, a2 = t / n
where m and n are the width and height of the original input image and t is the image side length required by the yolov5 neural network. The smaller scaling factor is selected:
a_min = min(a1, a2), m_new = m · a_min, n_new = n · a_min
and the fill margin value is calculated as:
c = (t − n_new) / 2
where c is the size of the filled black border. The registered infrared image is scaled to m_new × n_new and black borders of width c are filled at the top and bottom, which reduces the black edges along the image height and therefore the amount of computation during inference.
The Backbone, the backbone network of yolov5, is built from Focus, CSP, and SPP modules and extracts the feature maps of the image. The Focus structure slices the picture into four different groups and stacks them with Concat, as shown in FIG. 4. The first Focus module yields a 304 × 304 feature map; a CBL module yields a 152 × 152 feature map; the CSP1_1 and CBL modules yield a 76 × 76 feature map, part of which, after a CSP1_3 module, is input to the Neck while the rest passes through a CBL module to give a 38 × 38 feature map; after another CSP1_3 and CBL module a 19 × 19 feature map is obtained and input through the SPP module into the Neck part.
The Neck adopts an FPN + PAN structure, which extracts target features at different scales and provides them to the prediction layer. The FPN part operates on the 76 × 76, 38 × 38, and 19 × 19 feature layers. Two feature layers are fused by upsampling the upper (coarser) feature 2×, changing the channel number of the lower feature with a 1 × 1 convolution, and then adding the upsampled result and the CBL convolution result element-wise. In the PAN structure, the bottom layer of the FPN feature pyramid is copied into a new feature pyramid bottom layer, its downsampled version is superposed with the upper FPN layer, and the predicted target coordinate position and confidence are output through the CSP2_1 convolution structure.
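The FPN merge step (2× upsampling of the coarser map, 1 × 1 channel change of the lateral map, element-wise addition) can be sketched with numpy. This is a toy sketch: the channel-mixing matrix `w` stands in for the learned 1 × 1 convolution, and nearest-neighbour upsampling is assumed.

```python
import numpy as np

def upsample2(x):
    # 2x nearest-neighbour upsampling of an HxWxC feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(top, lateral, w):
    # top: (H, W, C_out) coarser map; lateral: (2H, 2W, C_in) finer map;
    # w: (C_in, C_out) matrix playing the role of the 1x1 convolution.
    return upsample2(top) + lateral @ w
```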
The prediction part screens targets with nms non-maximum suppression: all rectangular boxes are grouped by class label and sorted within each group by confidence. The box with the highest confidence in a group is taken out, the remaining boxes with the same label are traversed, the intersection-over-union is computed, and any remaining box whose IoU with the taken box exceeds the set IOU threshold is deleted.
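The per-class non-maximum suppression described above can be sketched in plain Python (a standard sketch; the IoU helper and the threshold value are not patent-specific):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, labels, iou_thr):
    # Group boxes by class label, sort each group by confidence,
    # keep the best box, drop overlapping boxes above the IoU threshold.
    keep = []
    for cls in set(labels):
        idx = sorted((i for i, l in enumerate(labels) if l == cls),
                     key=lambda i: scores[i], reverse=True)
        while idx:
            best = idx.pop(0)
            keep.append(best)
            idx = [i for i in idx if iou(boxes[best], boxes[i]) <= iou_thr]
    return sorted(keep)
```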
The key to this embodiment is the model training process; the yolov5 neural network model used for target detection in the invention is described below. The FLIR dataset, a visible light and infrared image dataset of automobile traffic roads released in July 2018 with about 14,000 images, is selected to train the yolov5 neural network with three training labels: person, bicycle, and car. Mosaic data enhancement is adopted, splicing 4 images by random scaling, random cropping, and random arrangement. CIOU_Loss is selected as the loss function:
CIOU_Loss = 1 − IoU + ρ² / c² + α · v
where IoU is the intersection area of the two boxes divided by their union area, c is the diagonal length of the minimum enclosing rectangle of the two boxes, ρ is the Euclidean distance between the center points of the two boxes, and v is a parameter measuring the consistency of the aspect ratios. The specific calculation formulas are:
v = (4 / π²) · (arctan(w_gt / h_gt) − arctan(w_p / h_p))², α = v / ((1 − IoU) + v)
where w_gt and h_gt are the width and height of the ground-truth box and w_p and h_p are the width and height of the predicted box.
After training yields the weights, the registered infrared image is input to obtain the position coordinates of the predicted targets, which comprise pedestrians, bicycles, and automobiles.
Step 3: train the mask_rcnn neural network and use it to extract the mask from the infrared registered image.
Fig. 5 is a schematic diagram of a mask _ rcnn neural network, and the structure includes:
extracting feature graphs of different scales from the ResNet-FPN part, and constructing a feature pyramid;
the RPN part extracts a required part in the feature map;
the ROI Align part is used for pooling the extracted part in the feature map;
a mask part for inputting the characteristic diagram to obtain a mask;
The ResNet-FPN part comprises five convolutional layers; the second through fifth layers output four feature maps at different scales, extracting feature maps downscaled 4, 8, 16, and 32 times from the original image, as shown in FIG. 6.
After a picture is input, the RPN part passes it through a 1 × 1 convolution layer and then through reshape, softmax, and reshape, where the softmax module selects the required regions from the extracted feature maps, as shown in FIG. 7;
ROI Align places four regularly spaced sampling points in each bin of the feature map and max-pools the four sampling points in each bin to obtain the final ROI Align result.
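The four-samples-per-bin max-pooling of ROI Align can be sketched for a single bin of a single-channel feature map (a simplified sketch assuming bilinear interpolation at the sample points, which is how ROI Align avoids the quantisation of ROI Pooling; real mask_rcnn processes whole ROIs across channels):

```python
import numpy as np

def bilinear(feat, y, x):
    # Bilinear interpolation of a 2-D feature map at a fractional point.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align_bin(feat, y1, x1, y2, x2):
    # One ROI Align bin: four regularly placed sample points, max-pooled.
    ys = [y1 + (y2 - y1) * f for f in (0.25, 0.75)]
    xs = [x1 + (x2 - x1) * f for f in (0.25, 0.75)]
    return max(bilinear(feat, y, x) for y in ys for x in xs)
```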
The mask part uses an FCN fully convolutional network for image segmentation.
For mask_rcnn training the invention adopts the coco dataset, a large-scale dataset for object detection, segmentation, and captioning comprising 330k images. The three labels car, bicycle, and pedestrian are selected, irrelevant pictures are removed for training, and the Mosaic data enhancement method is also adopted.
After training yields the corresponding weights, instance segmentation is performed on the registered infrared image to obtain the mask.
Step 4: with the visible light image and the infrared image registered, construct a target position rectangle with (x1, y1) as the top-left corner and (x2, y2) as the bottom-right corner, use it to select the visible light image to be fused and the infrared registered image to be fused, and apply the ADF image fusion method.
The ADF image fusion flow is shown in FIG. 8. Apply anisotropic filtering to the visible light image and the infrared registered image to obtain the visible light base layer B_vis and the infrared base layer B_ir. The anisotropic filtering used for image decomposition is:
I_{t+1}(i,j) = I_t(i,j) + λ · [c_N · ∇_N I_t(i,j) + c_S · ∇_S I_t(i,j) + c_W · ∇_W I_t(i,j) + c_E · ∇_E I_t(i,j)]
where ∇ is the gradient operator, the subscripts N, S, E, W denote the gradient in a single (north, south, east, west) direction, c_N, c_S, c_W, c_E are the conduction coefficients of the input image in the north, south, west, and east directions, I_t(i,j) is the input image with i, j the single-pixel coordinates, I_{t+1}(i,j) is the output image, t is the iteration number (set to 10), and λ has value range [0, 1/4]. The directional gradients ∇_N I_t(i,j), ∇_S I_t(i,j), ∇_W I_t(i,j), ∇_E I_t(i,j) expand as:
∇_N I_t(i,j) = I_t(i−1,j) − I_t(i,j)
∇_S I_t(i,j) = I_t(i+1,j) − I_t(i,j)
∇_W I_t(i,j) = I_t(i,j−1) − I_t(i,j)
∇_E I_t(i,j) = I_t(i,j+1) − I_t(i,j)
that is, for each pixel on the image, the neighbouring pixel in the given direction minus the pixel itself, taking a zero value at the boundary. The conduction coefficients c_N, c_S, c_W, c_E are calculated as:
c_d = g(|∇_d I_t(i,j)|), d ∈ {N, S, W, E}
where the function g must be monotonically decreasing with g(0) = 1; two calculation options are adopted:
g(x) = exp(−(x / K)²) or g(x) = 1 / (1 + (x / K)²)
The visible light original image I_0 is input to produce I_1, I_1 is input in turn, and after 10 iterations I_10 is obtained, which is the visible light base layer B_vis; the same operation on the infrared registered image gives the infrared base layer B_ir.
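The diffusion iteration above can be sketched with numpy (a sketch using the exponential conduction option; λ = 0.15 and K = 30 are illustrative values within the stated ranges, not from the patent):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, lam=0.15, k=30.0):
    # Perona-Malik-style diffusion: one-sided differences in the four compass
    # directions, conduction g(x) = exp(-(x/k)^2), zero gradient at the border.
    I = img.astype(float).copy()
    for _ in range(n_iter):
        dN = np.roll(I, -1, axis=0) - I; dN[-1, :] = 0
        dS = np.roll(I,  1, axis=0) - I; dS[0, :] = 0
        dE = np.roll(I, -1, axis=1) - I; dE[:, -1] = 0
        dW = np.roll(I,  1, axis=1) - I; dW[:, 0] = 0
        cN = np.exp(-(dN / k) ** 2); cS = np.exp(-(dS / k) ** 2)
        cE = np.exp(-(dE / k) ** 2); cW = np.exp(-(dW / k) ** 2)
        I += lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return I  # after n_iter iterations this is the base layer
```

Running it twice, on the visible light image and the infrared registered image, yields B_vis and B_ir; the detail layers then follow by subtraction.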
Subtract the base layers from the images to obtain the detail layers. The infrared detail layer is calculated as:
D_ir = I_ir − B_ir
The visible light detail layer is:
D_vis = I_vis − B_vis
where I_vis and I_ir are the visible light image and the infrared registered image respectively.
The visible light image detail layer and the infrared image detail layer are arranged into X_vis and X_ir respectively; the covariance matrix C_xx of X_vis and X_ir is found, and the eigenvalue set w_1, w_2 and the eigenvector set of C_xx are calculated.
The uncorrelated coefficients are then calculated: the larger eigenvalue w_max is selected, the corresponding eigenvector v_max is chosen, and the coefficients of the uncorrelated components of the detail layers are obtained.
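The covariance–eigenvalue (KL transform) step above can be sketched as follows. Normalising the dominant eigenvector by the sum of its components is how the ADF method is usually formulated; that normalisation is an assumption here, since the patent's formula images are not reproduced:

```python
import numpy as np

def kl_coefficients(detail_vis, detail_ir):
    """KL-transform step: covariance of the two detail layers, eigen-decomposition,
    and normalisation of the dominant eigenvector into coefficients KL1, KL2."""
    X = np.vstack([detail_vis.ravel(), detail_ir.ravel()])  # rows X_vis, X_ir
    Cxx = np.cov(X)                       # 2 x 2 covariance matrix C_xx
    w, V = np.linalg.eigh(Cxx)            # eigenvalues ascending; eigenvectors in columns
    v_max = V[:, np.argmax(w)]            # eigenvector of the larger eigenvalue w_max
    KL1, KL2 = v_max / np.sum(v_max)      # uncorrelated-component coefficients
    return KL1, KL2
```

For strongly correlated detail layers the two coefficients come out close to 0.5 each and sum to 1.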
The fused image part is pseudo-colored using the COLORMAP_JET strategy to obtain a color fused image. The color fused image replaces the image inside the rectangular frame of the target position coordinates of the visible light image, yielding a global color fused image. The mask is inverted and overlaid on the visible light image to obtain the visible light background image; the mask is overlaid on the global color fused image to obtain the instance segmentation part of the global color fused image; and the instance segmentation part is overlaid on the visible light background image to obtain the final result.
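The pseudo-colouring and mask compositing can be sketched as below. In practice the colour map would be `cv2.applyColorMap(img, cv2.COLORMAP_JET)`; `jet_map` here is a pure-NumPy stand-in so the sketch carries no OpenCV dependency, and the box and mask shapes are illustrative assumptions:

```python
import numpy as np

def jet_map(gray):
    """Pure-NumPy stand-in for cv2.applyColorMap(gray, cv2.COLORMAP_JET)."""
    g = gray.astype(np.float64) / 255.0
    b = np.clip(1.5 - np.abs(4.0 * g - 1.0), 0.0, 1.0)
    gn = np.clip(1.5 - np.abs(4.0 * g - 2.0), 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4.0 * g - 3.0), 0.0, 1.0)
    return (np.stack([b, gn, r], axis=-1) * 255).astype(np.uint8)  # BGR order

def composite(visible_bgr, fused_gray, box, mask):
    """Paste the pseudo-coloured fused patch into the target box, then use the
    instance mask so only the segmented object keeps the fused colours."""
    x1, y1, x2, y2 = box
    global_fused = visible_bgr.copy()
    global_fused[y1:y2, x1:x2] = jet_map(fused_gray)   # global colour fused image
    on = mask.astype(bool)[:, :, None]                 # instance mask, broadcast to 3 channels
    background = np.where(on, 0, visible_bgr)          # inverted mask over the visible image
    instance = np.where(on, global_fused, 0)           # mask over the global colour fused image
    return (background + instance).astype(np.uint8)    # final composited result
```

Because the background and instance images are nonzero on disjoint pixels, their sum is a straight overlay with no blending.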
The experimental scene is an urban traffic road scene. A real-time calculation result at 60w power was obtained on a Jetson-Nano development platform using a Hikvision gigabit-Ethernet industrial area-array camera as the visible light detector, a Xenics Gobi+640 thermal imager, and a binocular camera. The footage is 3 minutes 23 seconds long and comprises 4067 frames.
Fig. 9 shows a sample of the final detection results: pedestrians and vehicles in urban road traffic are accurately detected and segmented, and infrared–visible fusion is performed over each detection area.
Table 1 compares the single-picture time consumption of full (global) fusion and local fusion, broken down by processing stage; local fusion clearly takes less time than global fusion. Table 2 compares the accuracy of different segmentation methods: the networks perform similarly on medium and large targets, but the mask_rcnn network has a clear advantage on small targets at a distance.
TABLE 1 comparison of time consumption of Single Picture Using full fusion and partial fusion
TABLE 2 precision comparison of different segmentation methods
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is specific and detailed, but should not be construed as limiting the scope of the application. A person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within its scope of protection. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (8)
1. A visible light infrared image fusion target detection example segmentation method is characterized by comprising the following steps:
step 1, registering an infrared image and a visible light image, calibrating the same target in the infrared image and the visible light image, and translating and scaling the infrared image according to a calibration result to obtain an infrared registration image;
step 2, extracting a target position coordinate from the infrared registration image by using a yolov5 neural network;
step 3, extracting a mask from the infrared registration image by using a mask_rcnn neural network;
step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting an ADF image fusion method;
and 5, giving a pseudo color to the fused image, and performing example segmentation by using a mask.
2. The visible light infrared image fusion target detection instance segmentation method according to claim 1, characterized in that: step 1, registering the infrared image and the visible light image, calibrating the same target in the infrared image and the visible light image, and translating and zooming the infrared image according to a calibration result to obtain an infrared registration image, wherein the specific method comprises the following steps:
selecting a visible light image and an infrared image shot simultaneously, and selecting 12 groups of calibration points in each of the two images, recording their coordinates as (x_ir_i, y_ir_i) and (x_vis_i, y_vis_i), i ∈ [1, 12], where the subscript ir denotes calibration points of the infrared image, vis denotes calibration points of the visible light image, and i denotes the group index; each pair of calibration points is required to have the same real spatial position; of the 12 groups of calibration points, 4 groups come from a close-range target, 4 groups from a medium-range target, and 4 groups from a long-range target;
calculating the sum of Euclidean distances of the calibration points:
and controlling the infrared image translation scaling to enable the sum of Euclidean distances of the calibration points to be minimum, recording the translation scaling amount, and performing the same transformation on all the infrared images to obtain the infrared registration image.
3. The method for segmenting the visible light infrared image fusion target detection example according to claim 1, characterized in that: step 2, acquiring a target position coordinate in the infrared registration image by using a yolov5 neural network, wherein the specific method comprises the following steps:
(a) Training a yolov5 neural network by using the FLIR data set;
training the yolov5 neural network using the three labels of cars, bicycles and pedestrians in the FLIR data set, with Mosaic data enhancement adopted during training: 4 infrared images are randomly selected and combined into a brand-new picture by random scaling, random cropping and random arrangement; the original FLIR data set images and the Mosaic-enhanced images form the training set, which is input into the yolov5 neural network for training to obtain the training weights;
(b) Carrying out target detection by using a yolov5 neural network;
performing minimum black-edge scaling on the registered infrared image, where the minimum black-edge scaling is calculated as follows:
where m and n are the length and width of the original input image, t is the image side length required by the yolov5 neural network input, and a_1, a_2 are the scaling coefficients of the original image's length and width relative to the image size required by the yolov5 neural network input;
the registered infrared image is scaled to the new dimensions m_new, n_new:
where a_min is the smaller of a_1 and a_2;
filling black borders of width c at the top and bottom, where the border width c is calculated as:
wherein c is the size of the filled black edge;
loading the weights obtained from the training in step (a), inputting the scaled and padded infrared registration image into the yolov5 neural network, and outputting the target position coordinates (x_1, x_2, y_1, y_2, c), where x_1, y_1 are the coordinates of the target's upper-left corner point, x_2, y_2 are the coordinates of the target's lower-right corner point, and c is the target classification.
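The minimum black-edge (letterbox) scaling of claim 3 can be sketched as follows. Since the patent's formula images are not reproduced, the expressions a_1 = t/m, a_2 = t/n and the symmetric padding are inferred from the surrounding text and from yolov5's usual letterbox behaviour:

```python
def letterbox_params(m, n, t):
    """Minimum black-edge scaling: compute the common scale factor and pad width.
    m, n: height and width of the source image; t: square side required by the network."""
    a1, a2 = t / m, t / n            # per-axis scaling coefficients
    a_min = min(a1, a2)              # smaller coefficient, so the image fits inside t x t
    m_new, n_new = round(m * a_min), round(n * a_min)   # scaled dimensions
    c = (t - min(m_new, n_new)) / 2  # symmetric black border on the shorter axis
    return a_min, m_new, n_new, c
```

For example, a 512×640 infrared frame scaled for a 640×640 network input keeps its width unchanged and pads 64 pixels of black at the top and bottom.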
4. The visible light infrared image fusion target detection instance segmentation method according to claim 3, characterized in that: step 3, extracting a mask from the infrared registration image by using a mask_rcnn neural network, comprises the following specific method:
performing mask_rcnn neural network training by using the coco data set to obtain training weights;
loading the training weights and inputting the infrared registration image into the mask_rcnn neural network to obtain the mask.
5. The visible-light infrared image fusion target detection instance segmentation method according to claim 3, characterized in that: step 4, determining a fusion range according to the target position coordinates, and fusing the visible light image and the infrared registration image by adopting an ADF image fusion method, wherein the specific method comprises the following steps:
step 4.1, constructing a rectangular frame of the target position coordinates with x_1, y_1 as the upper-left corner coordinates and x_2, y_2 as the lower-right corner coordinates; using this rectangular frame to select, from the visible light image and the infrared registration image, the visible light image to be fused and the infrared image to be fused; and using the ADF image fusion method;
step 4.2, performing anisotropic filtering on the visible light image and the infrared registration image to obtain a visible light image base layer B_vis and an infrared registration image base layer B_ir, wherein the anisotropic filtering used for image decomposition specifically operates as:
where I_{t+1} is the output image, I_t is the input image, c is the diffusion rate, ∇ is the gradient operator, Δ is the Laplacian operator, and t is the set iteration number;
and 4.3, subtracting the visible light image basic layer from the visible light image to be fused to obtain a visible light image detail layer, and subtracting the infrared image basic layer from the infrared image to be fused to obtain an infrared image detail layer, wherein the infrared image detail layer is calculated as follows:
the visible light image detail layers are:
wherein I_vis and I_ir are respectively the visible light image to be fused and the infrared registration image;
step 4.4, arranging the visible light image detail layer and the infrared registration image detail layer into X_vis and X_ir respectively, finding the covariance matrix C_xx of X_vis and X_ir, and calculating the eigenvalue set w_1, w_2 and the eigenvector set of the covariance matrix C_xx:
selecting the larger of the eigenvalues w_1, w_2 as w_max, selecting the eigenvector v_max corresponding to w_max, and obtaining the coefficients of the uncorrelated components of the detail layers, wherein KL1 is the coefficient of the uncorrelated component of the visible light image detail layer and KL2 is the coefficient of the uncorrelated component of the infrared image detail layer:
fusing image detail layers:
fusing image base layers:
fusing images:
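Steps 4.2–4.4 of claim 5 assemble into the fusion sketch below. The base-layer averaging and the final base-plus-detail sum are assumptions, since the patent's fusion formula images are not reproduced; `base_fn` stands for the anisotropic filter of step 4.2 and `kl_fn` for the KL coefficient computation of step 4.4:

```python
import numpy as np

def adf_fuse(vis, ir, base_fn, kl_fn):
    """ADF fusion sketch: base layers via anisotropic filtering (base_fn),
    detail layers by subtraction, detail fusion weighted by KL coefficients
    (kl_fn), base fusion by averaging, final image = fused base + fused detail."""
    B_vis, B_ir = base_fn(vis), base_fn(ir)    # step 4.2: base layers
    D_vis, D_ir = vis - B_vis, ir - B_ir       # step 4.3: detail layers
    KL1, KL2 = kl_fn(D_vis, D_ir)              # step 4.4: uncorrelated-component coefficients
    D_fused = KL1 * D_vis + KL2 * D_ir         # fused detail layer
    B_fused = (B_vis + B_ir) / 2.0             # fused base layer (averaging assumed)
    return B_fused + D_fused                   # fused image
```

With identical inputs and any base filter, the sketch reproduces the input, which is a quick sanity check on the decomposition-and-recombination structure.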
6. The visible light infrared image fusion target detection instance segmentation method according to claim 1, characterized in that: step 5, assigning pseudo color to the fused image and performing instance segmentation by using the mask, comprises the following specific method:
performing pseudo-color processing on the fused image by using COLORMAP_JET to obtain a color fused image; replacing the image inside the rectangular frame of the target position coordinates of the visible light image with the color fused image to obtain a global color fused image; inverting the mask and overlaying it on the visible light image to obtain a visible light background image; overlaying the mask on the global color fused image to obtain the instance segmentation part of the global color fused image; and overlaying the instance segmentation part on the visible light background image to obtain the final result.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements target instance segmentation based on fused detection of visible light and infrared images according to the visible light infrared image fusion target detection instance segmentation method of any one of claims 1 to 6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements target instance segmentation based on fused detection of visible light and infrared images according to the visible light infrared image fusion target detection instance segmentation method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211095330.4A CN115170810B (en) | 2022-09-08 | 2022-09-08 | Visible light infrared image fusion target detection example segmentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211095330.4A CN115170810B (en) | 2022-09-08 | 2022-09-08 | Visible light infrared image fusion target detection example segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115170810A true CN115170810A (en) | 2022-10-11 |
CN115170810B CN115170810B (en) | 2022-12-13 |
Family
ID=83482397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211095330.4A Active CN115170810B (en) | 2022-09-08 | 2022-09-08 | Visible light infrared image fusion target detection example segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170810B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103366353A (en) * | 2013-05-08 | 2013-10-23 | 北京大学深圳研究生院 | Infrared image and visible-light image fusion method based on saliency region segmentation |
CN105447838A (en) * | 2014-08-27 | 2016-03-30 | 北京计算机技术及应用研究所 | Method and system for infrared and low-level-light/visible-light fusion imaging |
US20180227509A1 (en) * | 2015-08-05 | 2018-08-09 | Wuhan Guide Infrared Co., Ltd. | Visible light image and infrared image fusion processing system and fusion method |
US20190279371A1 (en) * | 2018-03-06 | 2019-09-12 | Sony Corporation | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
AU2020100178A4 (en) * | 2020-02-04 | 2020-03-19 | Huang, Shuying DR | Multiple decision maps based infrared and visible image fusion |
CN111062905A (en) * | 2019-12-17 | 2020-04-24 | 大连理工大学 | Infrared and visible light fusion method based on saliency map enhancement |
CN111145133A (en) * | 2019-12-05 | 2020-05-12 | 南京理工大学 | ZYNQ-based infrared and visible light co-optical axis image fusion system and method |
CN111209810A (en) * | 2018-12-26 | 2020-05-29 | 浙江大学 | Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images |
CN111275759A (en) * | 2020-01-16 | 2020-06-12 | 国网江苏省电力有限公司 | Transformer substation disconnecting link temperature detection method based on unmanned aerial vehicle double-light image fusion |
CN111432172A (en) * | 2020-03-20 | 2020-07-17 | 浙江大华技术股份有限公司 | Fence alarm method and system based on image fusion |
CN111611905A (en) * | 2020-05-18 | 2020-09-01 | 沈阳理工大学 | Visible light and infrared fused target identification method |
CN111915546A (en) * | 2020-08-04 | 2020-11-10 | 西安科技大学 | Infrared and visible light image fusion method and system, computer equipment and application |
CN112016478A (en) * | 2020-08-31 | 2020-12-01 | 中国电子科技集团公司第三研究所 | Complex scene identification method and system based on multispectral image fusion |
CN113344475A (en) * | 2021-08-05 | 2021-09-03 | 国网江西省电力有限公司电力科学研究院 | Transformer bushing defect identification method and system based on sequence modal decomposition |
CN114332748A (en) * | 2021-11-08 | 2022-04-12 | 西安电子科技大学 | Target detection method based on multi-source feature joint network and self-generation of transformed image |
CN114519808A (en) * | 2022-02-21 | 2022-05-20 | 烟台艾睿光电科技有限公司 | Image fusion method, device and equipment and storage medium |
Non-Patent Citations (6)
Title |
---|
CHUNYU XIE 等: "Infrared and Visible Image Fusion: A Region-Based Deep Learning Method", 《INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTICS AND APPLICATIONS》 * |
JIAYI MA 等: "STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection", 《IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT》 * |
ZHOU JINGWEN 等: "An infrared and visible image fusion method based on VGG-19 network", 《OPTIK》 * |
刘砚菊 等: "GAN网络的红外与可见光图像融合方法研究", 《沈阳理工大学学报》 * |
王瑜婧: "显著性检测的红外与可见光图像融合算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
高乙文: "基于混合角点检测的脑磁共振图像配准", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116895094A (en) * | 2023-09-11 | 2023-10-17 | 杭州魔点科技有限公司 | Dark environment imaging method, system, device and medium based on binocular fusion |
CN116895094B (en) * | 2023-09-11 | 2024-01-30 | 杭州魔点科技有限公司 | Dark environment imaging method, system, device and medium based on binocular fusion |
CN118038499A (en) * | 2024-04-12 | 2024-05-14 | 北京航空航天大学 | Cross-mode pedestrian re-identification method based on mode conversion |
Also Published As
Publication number | Publication date |
---|---|
CN115170810B (en) | 2022-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108596101B (en) | Remote sensing image multi-target detection method based on convolutional neural network | |
CN115170810B (en) | Visible light infrared image fusion target detection example segmentation method | |
US20220044375A1 (en) | Saliency Map Enhancement-Based Infrared and Visible Light Fusion Method | |
CN109086668B (en) | Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network | |
CN109886312B (en) | Bridge vehicle wheel detection method based on multilayer feature fusion neural network model | |
CN111563415B (en) | Binocular vision-based three-dimensional target detection system and method | |
US8179393B2 (en) | Fusion of a 2D electro-optical image and 3D point cloud data for scene interpretation and registration performance assessment | |
Vaudrey et al. | Differences between stereo and motion behaviour on synthetic and real-world stereo sequences | |
US10477178B2 (en) | High-speed and tunable scene reconstruction systems and methods using stereo imagery | |
CN113111974A (en) | Vision-laser radar fusion method and system based on depth canonical correlation analysis | |
CN108665496A (en) | A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method | |
CA3028599A1 (en) | Systems and methods for correcting a high-definition map based on detection of obstructing objects | |
CN113673444B (en) | Intersection multi-view target detection method and system based on angular point pooling | |
CN111951306A (en) | Target detection method for fusion of laser radar and image video | |
Min et al. | Orfd: A dataset and benchmark for off-road freespace detection | |
CN114254696A (en) | Visible light, infrared and radar fusion target detection method based on deep learning | |
CN112016478B (en) | Complex scene recognition method and system based on multispectral image fusion | |
CN110070025A (en) | Objective detection system and method based on monocular image | |
CN114972748B (en) | Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network | |
CN114972989A (en) | Single remote sensing image height information estimation method based on deep learning algorithm | |
CN114339185A (en) | Image colorization for vehicle camera images | |
CN113610905A (en) | Deep learning remote sensing image registration method based on subimage matching and application | |
CN111626241A (en) | Face detection method and device | |
Tseng et al. | Semi-supervised image depth prediction with deep learning and binocular algorithms | |
CN116630528A (en) | Static scene reconstruction method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||