CN113688722B - Infrared pedestrian target detection method based on image fusion - Google Patents

Infrared pedestrian target detection method based on image fusion

Info

Publication number
CN113688722B
CN113688722B (application CN202110971334.3A)
Authority
CN
China
Prior art keywords
image
infrared
target detection
pedestrian target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971334.3A
Other languages
Chinese (zh)
Other versions
CN113688722A (en)
Inventor
李永军
李耀
李莎莎
李孟军
陈竞
陈立家
李鹏飞
曹雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202110971334.3A priority Critical patent/CN113688722B/en
Publication of CN113688722A publication Critical patent/CN113688722A/en
Application granted granted Critical
Publication of CN113688722B publication Critical patent/CN113688722B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides an infrared pedestrian target detection method based on image fusion, which comprises the following steps: 1. establishing an infrared pedestrian target detection data set; 2. fusing the images using a trained Denseuse network; 3. constructing and training an image fusion-based YOLOv5 infrared pedestrian target detection model. The invention uses the Denseuse network to fuse the visible light and infrared image pairs in the constructed infrared pedestrian target detection data set, which enhances image quality, reduces redundant information, and yields an infrared pedestrian target detection data set with richer information. The image fusion-based YOLOv5 infrared pedestrian target detection model is then trained on the fused data set, giving a well-converged image fusion-based YOLOv5 infrared pedestrian target detection model and improving the accuracy of pedestrian target detection in infrared images.

Description

Infrared pedestrian target detection method based on image fusion
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared pedestrian target detection method based on image fusion.
Background
Infrared imaging offers long detection range, high concealment, and all-weather, day-and-night operation. Target detection in infrared images simultaneously provides an interpretation of image content and target localization, and infrared imaging occupies an irreplaceable position in civil fields such as diagnosis of diseased cells in medicine, industrial flaw detection, and driving assistance, as well as in military fields such as infrared early warning, submarine search, and infrared guidance. However, an infrared image is formed from the thermal radiation of the target scene; compared with a visible light image it has a longer imaging wavelength, stronger noise, lower contrast, and poorer spatial resolution. Image quality directly affects algorithm design and detection accuracy, so infrared target detection in these application fields suffers from low detection rates and slow detection speed. Data enhancement has proven to be an effective way to address the challenges of the infrared pedestrian target detection task. A visible light image reflects the spectral properties of objects, contains more detail, and matches human visual characteristics, whereas the thermal radiation captured in an infrared image is more sensitive to targets and regions of interest and is less disturbed by scene changes. The infrared and visible light images are therefore complementary: an image fusion preprocessing step can merge them into an infrared image with richer information, enhancing image quality, reducing redundant information, and achieving the goal of preprocessing the data set to enhance the images.
A monocular far-infrared pedestrian detection method based on feature fusion is disclosed in the patent "Monocular far infrared pedestrian detection method based on feature fusion" (application number: 2019109437223, publication number: CN 110674779A) owned by South China Agricultural University. The method first scales the original infrared image, obtains preliminary ROIs through threshold segmentation and morphological processing, applies a sliding window to the preliminary ROIs, and finally makes a decision by cascading a linear SVM classifier based on HAAR and LBP features with a linear SVM classifier based on HOG features. The method adapts well to long-, medium-, and short-range pedestrian detection, but because it relies on traditional feature extraction and sliding-window filtering, its region selection strategy is redundant and its time complexity is high; it is not robust to target diversity, and for weak, small targets it is particularly difficult to obtain useful information such as shape, size, structure, and texture.
Bai Yu, Hou Zhijiang, Liu Xiaoyi, et al. propose a target detection algorithm based on decision-level fusion in the paper "Target detection algorithm based on decision-level fusion of visible light images and infrared images" (Journal of Air Force Engineering University (Natural Science Edition), 2020, Vol. 21, No. 6, pp. 53-59). The algorithm retrains the YOLOv3 network on a labeled data set and, before fusion, detects the visible light image and the infrared image separately with the trained YOLOv3 network. During fusion, the detection results of the visible light image and the infrared image are retained, the results of the same target detected simultaneously in both images are fused by weighting, and the individual detection results are combined with the fused results as the detection results for all corresponding targets in the fused image. By combining the visible light detection results, the infrared detection results, and their fusion, the algorithm improves detection accuracy to a certain extent, but because it trains the YOLOv3 network separately on visible light images and infrared images, its computational complexity is relatively high.
Disclosure of Invention
The invention aims to provide an infrared pedestrian target detection method based on image fusion, so as to solve the problems of poor robustness and high computational complexity in existing infrared pedestrian target detection methods.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the infrared pedestrian target detection method based on image fusion comprises the following steps:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: the method comprises the steps that a visible light camera and an infrared camera are used for randomly shooting a scene containing a pedestrian target at the same time, and the visible light images and the infrared images with the same quantity are collected;
step 1.2: preprocessing the visible light image and the infrared image acquired in the step 1.1;
step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting features of the input image using an encoder in the Denseuse network;
step 2.4: reconstructing the input image using a decoder in the Denseuse network according to the features extracted from the input image in step 2.3, thereby obtaining a fixed weight Denseuse network;
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network, and evaluating the fixed-weight Denseuse network in the step 2.4 according to the test result in the step 2.5 to obtain an evaluation result;
step 2.7: judging whether the loss variation in the evaluation result of the step 2.6 tends to be stable, if so, executing the step 2.9, otherwise, executing the step 2.8;
step 2.8: adjusting the learning rate and the iteration times in the training parameters of the Denseuse network, and jumping to the step 2.3 for retraining;
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: according to the average precision change and the loss change trend of the training result in the step 3.3, the learning rate and the iteration times are adjusted until the average precision change and the loss change trend tend to be in a stable state, and a YOLOv5 infrared pedestrian target detection model with good final convergence and based on image fusion is obtained;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
The visible light camera and the infrared camera in the step 1.1 are arranged at the same acquisition position, and the optical axes of the lenses are in the same direction and parallel.
The preprocessing in the step 1.2 is as follows: carrying out feature registration and Gaussian filtering on the visible light image and the infrared image acquired in the step 1.1.
Compared with the prior art, the invention has the beneficial effects that:
According to the infrared pedestrian target detection method based on image fusion, the visible light image and the infrared image are fused by the Denseuse network, which enhances the detail information of the pedestrian target, enriches the data characteristics, and makes the target features more salient and easier to extract; detection is then performed with the YOLOv5 target detection model, whose network structure is flexible, whose convergence is fast, and whose accuracy is high, so that the detection accuracy for pedestrian targets in infrared images is ultimately improved and practical application requirements are met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a Denseuse network according to the present invention;
FIG. 3 is an effect diagram of an image fused by the fixed-weight Denseuse network according to the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings; obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
As shown in FIGS. 1, 2 and 3, the invention discloses an infrared pedestrian target detection method based on image fusion, which comprises the following steps:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: a visible light camera and an infrared camera are arranged at the same acquisition position with the optical axes of their lenses parallel and pointing in the same direction; the visible light camera and the infrared camera simultaneously shoot random scenes containing pedestrian targets, and equal numbers of visible light images and infrared images are collected;
step 1.2: preprocessing the visible light images and the infrared images acquired in the step 1.1, specifically by performing feature registration, Gaussian filtering, and other preprocessing on them, so that interference of the natural environment with the images is minimized and the high quality of the acquired visible light and infrared images is ensured.
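By way of illustration only, the preprocessing of step 1.2 could be realized as in the following Python/OpenCV sketch; the use of ORB keypoints and a RANSAC homography for the feature registration is an assumption of this sketch, since the embodiment only specifies feature registration and Gaussian filtering.

    import cv2
    import numpy as np

    def preprocess_pair(visible_bgr, infrared_gray, ksize=5, sigma=1.0):
        """Register the infrared image to the visible image and denoise both.

        ORB features and a RANSAC homography are only one possible realisation
        of the "feature registration" mentioned in step 1.2 (assumption).
        """
        vis_gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)

        orb = cv2.ORB_create(1000)
        kp_v, des_v = orb.detectAndCompute(vis_gray, None)
        kp_i, des_i = orb.detectAndCompute(infrared_gray, None)

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_i, des_v), key=lambda m: m.distance)[:200]

        src = np.float32([kp_i[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_v[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

        h, w = vis_gray.shape
        ir_registered = cv2.warpPerspective(infrared_gray, H, (w, h))

        # Gaussian filtering to suppress sensor noise, as required by step 1.2
        vis_filtered = cv2.GaussianBlur(visible_bgr, (ksize, ksize), sigma)
        ir_filtered = cv2.GaussianBlur(ir_registered, (ksize, ksize), sigma)
        return vis_filtered, ir_filtered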
Step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs; for example, the visible light image and the infrared image acquired simultaneously by the two cameras in the first shot correspond to each other and can be named as the first pair of images according to their sequence number;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
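Purely as an illustrative sketch of steps 1.3 to 1.5 (the directory layout and the *_vis.jpg / *_ir.jpg naming convention are assumptions), the random 70%/30% split of the image pairs could be performed as follows:

    import random
    from pathlib import Path

    def split_pairs(pair_dir, train_ratio=0.7, seed=0):
        """Split visible/infrared image pairs into a training set and a test set.

        Assumes each pair shares a stem, e.g. 0001_vis.jpg / 0001_ir.jpg, so that
        both images of a pair always end up in the same subset.
        """
        stems = sorted({p.stem.rsplit("_", 1)[0] for p in Path(pair_dir).glob("*_vis.jpg")})
        random.Random(seed).shuffle(stems)
        n_train = int(len(stems) * train_ratio)
        return stems[:n_train], stems[n_train:]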
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting the features of the input image using the encoder in the Denseuse network, specifically: the encoder in the Denseuse network consists of a C1 module and a DenseBlock module; the C1 module is a 3×3 convolution layer with stride 1 used to extract coarse features, and the DenseBlock module comprises three 3×3 convolution layers DC1, DC2 and DC3 with stride 1, where the output of each convolution layer is concatenated and used as the input of the subsequent layers so as to retain the salient features;
step 2.4: reconstructing the input image using the decoder in the Denseuse network from the features extracted from the input image in the step 2.3, thereby obtaining a fixed-weight Denseuse network, specifically: the decoder in the Denseuse network consists of the modules C2, C3, C4 and C5, which are all 3×3 convolution layers with stride 1 used to reconstruct the input image;
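The encoder and decoder of steps 2.3 and 2.4 can be sketched in PyTorch as follows; the channel widths (16 filters per layer), the ReLU activations, and the single-channel input are assumptions of this sketch, since the embodiment fixes only the 3×3 kernels, the stride of 1, and the dense concatenation pattern.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """C1 followed by a DenseBlock of DC1-DC3; every layer output is
        concatenated and fed to the following layers (step 2.3)."""
        def __init__(self, ch=16):
            super().__init__()
            self.c1 = nn.Sequential(nn.Conv2d(1, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc1 = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc2 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc3 = nn.Sequential(nn.Conv2d(3 * ch, ch, 3, 1, 1), nn.ReLU(inplace=True))

        def forward(self, x):
            x1 = self.c1(x)                        # coarse features
            d1 = self.dc1(x1)
            d2 = self.dc2(torch.cat([x1, d1], 1))  # cascade the previous outputs
            d3 = self.dc3(torch.cat([x1, d1, d2], 1))
            return torch.cat([x1, d1, d2, d3], 1)  # 4*ch feature channels

    class Decoder(nn.Module):
        """C2-C5: plain 3x3, stride-1 convolutions reconstructing the image (step 2.4)."""
        def __init__(self, ch=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4 * ch, 64, 3, 1, 1), nn.ReLU(inplace=True),  # C2
                nn.Conv2d(64, 32, 3, 1, 1), nn.ReLU(inplace=True),      # C3
                nn.Conv2d(32, 16, 3, 1, 1), nn.ReLU(inplace=True),      # C4
                nn.Conv2d(16, 1, 3, 1, 1),                              # C5
            )

        def forward(self, f):
            return self.net(f)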
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network: the fixed-weight Denseuse network of the step 2.4 is evaluated according to the test results of the step 2.5 to obtain evaluation results such as the loss change; the loss function H(x, y) is computed as a weighted combination of a structural similarity loss function H_SSIM(x, y) and a pixel loss function H_P(x, y), as follows:
H(x, y) = γ·H_SSIM(x, y) + H_P(x, y)
        = γ·(1 − SSIM(Out(x, y), In(x, y))) + ||Out(x, y) − In(x, y)||_2
wherein Out(x, y) and In(x, y) denote the output image and the input image, respectively, ||Out(x, y) − In(x, y)||_2 is the Euclidean distance between the output image Out(x, y) and the input image In(x, y), SSIM(Out(x, y), In(x, y)) is the structural similarity between the output image Out(x, y) and the input image In(x, y), and γ is a weighting coefficient; given that there are roughly three orders of magnitude between the pixel loss and the structural similarity loss, γ=100 is taken here; x and y denote the abscissa and ordinate of a pixel point in the image, respectively;
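As a minimal sketch of the loss of step 2.6, assuming a differentiable SSIM implementation such as the one provided by the pytorch-msssim package:

    import torch
    from pytorch_msssim import ssim  # assumption: any differentiable SSIM implementation can be used

    def reconstruction_loss(out, inp, gamma=100.0):
        """H(x, y) = gamma * H_SSIM(x, y) + H_P(x, y) of step 2.6."""
        h_ssim = 1.0 - ssim(out, inp, data_range=1.0)  # structural similarity loss
        h_pixel = torch.norm(out - inp, p=2)           # Euclidean (pixel) loss
        return gamma * h_ssim + h_pixel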
step 2.7: judging whether the loss change in the evaluation results of the step 2.6 tends to be stable, that is, whether the loss levels off as the number of training iterations Epoch increases; if the loss change tends to be stable, executing the step 2.9, otherwise executing the step 2.8;
step 2.8: adjusting the learning rate and the number of iterations in the training parameters of the Denseuse network and jumping back to the step 2.3 for retraining; specifically, if the loss is still decreasing but has not yet stabilized, the trained Denseuse network has not converged and the number of training iterations Epoch can be appropriately increased; if the loss is oscillating, the trained Denseuse model has fallen into a local optimum and the learning rate Ir can be appropriately reduced;
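A simple, purely illustrative way of mechanizing the decision rule of steps 2.7 and 2.8 is sketched below; the window length and the tolerance are assumed values, not part of the embodiment.

    def adjust_training(losses, epoch, lr, window=10, tol=1e-3):
        """Inspect the recent loss history and decide how to continue training.

        Returns (converged, new_epoch, new_lr). Thresholds are illustrative only.
        """
        recent = losses[-window:]
        span = max(recent) - min(recent)
        if span < tol:                      # loss change has levelled off: stop (step 2.9)
            return True, epoch, lr
        if recent[-1] < recent[0]:          # still decreasing: enlarge Epoch
            return False, int(epoch * 1.5), lr
        return False, epoch, lr * 0.1       # oscillating: reduce the learning rate Ir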
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
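Combined with the encoder/decoder sketch above, the addition fusion strategy of formula (1) amounts to the following; the default weighting coefficient λ=0.5 is an assumption of this sketch, as the embodiment leaves λ unspecified.

    import torch

    @torch.no_grad()
    def fuse_pair(encoder, decoder, vis, ir, lam=0.5):
        """Fuse one preprocessed visible/infrared pair with the fixed-weight network.

        Implements F_m(x, y) = lam * Vis_m(x, y) + (1 - lam) * Ir_m(x, y) channel by
        channel on the encoder features, then reconstructs the fused image with the decoder.
        """
        f_vis = encoder(vis)                  # Vis_m(x, y): features of the visible light image
        f_ir = encoder(ir)                    # Ir_m(x, y): features of the infrared image
        fused = lam * f_vis + (1.0 - lam) * f_ir
        return decoder(fused)                 # reconstructed fused image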
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
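The training parameters of step 3.2 can be gathered, for example, in a configuration dictionary like the one below before being passed to the YOLOv5 training routine; the key names are assumptions of this sketch and would have to be mapped onto the options of the specific YOLOv5 release being used.

    # Hyperparameters of step 3.2; the key names are illustrative, not official YOLOv5 options.
    train_cfg = {
        "optimizer": "Adam",  # stochastic optimization algorithm
        "batch_size": 64,     # Batch = 64
        "momentum": 0.9,      # Momentum = 0.9
        "lr0": 0.001,         # initial learning rate Ir
        "epochs": 300,        # number of training iterations Epoch
    }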
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: the average precision represents the accuracy of the model's predictions, while the loss change reflects the relation between the predicted values and the true values: the smaller the loss, the closer the predictions are to the true values and the better the model. Therefore, according to the average precision change and the loss change trend of the training results in the step 3.3, the learning rate and the number of iterations are adjusted until both the average precision change and the loss change trend reach a stable state, yielding the final, well-converged image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
According to the infrared pedestrian target detection method based on image fusion, the visible light image and the infrared image are fused by the Denseuse network, which enhances the detail information of the pedestrian target, enriches the data characteristics, and makes the target features more salient and easier to extract; detection is then performed with the YOLOv5 target detection model, whose network structure is flexible, whose convergence is fast, and whose accuracy is high, so that the detection accuracy for pedestrian targets in infrared images is ultimately improved and practical application requirements are met.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (5)

1. The infrared pedestrian target detection method based on image fusion is characterized by comprising the following steps of:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: the method comprises the steps that a visible light camera and an infrared camera are used for randomly shooting a scene containing a pedestrian target at the same time, and the visible light images and the infrared images with the same quantity are collected;
step 1.2: preprocessing the visible light image and the infrared image acquired in the step 1.1;
step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting features of the input image using an encoder in the Denseuse network;
step 2.4: reconstructing the input image using a decoder in the Denseuse network according to the features extracted from the input image in step 2.3, thereby obtaining a fixed weight Denseuse network;
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network, and evaluating the fixed-weight Denseuse network in the step 2.4 according to the test result in the step 2.5 to obtain an evaluation result;
step 2.7: judging whether the loss variation in the evaluation result of the step 2.6 tends to be stable, if so, executing the step 2.9, otherwise, executing the step 2.8;
step 2.8: adjusting the learning rate and the iteration times in the training parameters of the Denseuse network, and jumping to the step 2.3 for retraining;
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: according to the average precision change and the loss change trend of the training result in the step 3.3, the learning rate and the iteration times are adjusted until the average precision change and the loss change trend tend to be in a stable state, and a YOLOv5 infrared pedestrian target detection model with good final convergence and based on image fusion is obtained;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
2. The image fusion-based infrared pedestrian target detection method according to claim 1, wherein: the visible light camera and the infrared camera in the step 1.1 are arranged at the same acquisition position, and the optical axes of the lenses are in the same direction and parallel.
3. The image fusion-based infrared pedestrian target detection method according to claim 2, wherein: the preprocessing in the step 1.2 is as follows: carrying out feature registration and Gaussian filtering on the visible light image and the infrared image acquired in the step 1.1.
4. The image fusion-based infrared pedestrian target detection method as claimed in claim 3, wherein: the loss variation in the step 2.6 is derived from a loss function H(x, y) computed as a weighted combination of a structural similarity loss function H_SSIM(x, y) and a pixel loss function H_P(x, y), as follows:
H(x, y) = γ·H_SSIM(x, y) + H_P(x, y)
        = γ·(1 − SSIM(Out(x, y), In(x, y))) + ||Out(x, y) − In(x, y)||_2
wherein Out(x, y) and In(x, y) denote the output image and the input image, respectively, ||Out(x, y) − In(x, y)||_2 is the Euclidean distance between the output image Out(x, y) and the input image In(x, y), SSIM(Out(x, y), In(x, y)) is the structural similarity between the output image Out(x, y) and the input image In(x, y), and γ is a weighting coefficient; considering that there are roughly three orders of magnitude between the pixel loss and the structural similarity loss, γ=100 is taken here; x and y denote the abscissa and ordinate of a pixel point in the image, respectively.
5. The infrared pedestrian target detection method based on image fusion according to claim 4, wherein: in the step 2.8, the learning rate and the number of iterations in the training parameters of the Denseuse network are adjusted as follows: if the loss change is still decreasing but has not yet stabilized, the number of training iterations Epoch is increased; if the loss change is oscillating, the learning rate Ir is reduced.
CN202110971334.3A 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion Active CN113688722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971334.3A CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971334.3A CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Publications (2)

Publication Number Publication Date
CN113688722A CN113688722A (en) 2021-11-23
CN113688722B true CN113688722B (en) 2024-03-22

Family

ID=78581644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971334.3A Active CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Country Status (1)

Country Link
CN (1) CN113688722B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020102988A1 (en) * 2018-11-20 2020-05-28 西安电子科技大学 Feature fusion and dense connection based infrared plane target detection method
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
WO2020177432A1 (en) * 2019-03-07 2020-09-10 中国科学院自动化研究所 Multi-tag object detection method and system based on target detection network, and apparatuses
CN111563473A (en) * 2020-05-18 2020-08-21 电子科技大学 Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN112070111A (en) * 2020-07-28 2020-12-11 浙江大学 Multi-target detection method and system adaptive to multiband images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Infrared target fusion detection algorithm based on pseudo-modal conversion; An Haonan, Zhao Ming, Pan Shengda, Lin Changqing; Acta Photonica Sinica; 2020-12-31 (No. 08); full text *
Improved YOLOv3 pedestrian detection algorithm for infrared images; Shi Jianting, Zhang Guiqiang; Journal of Heilongjiang University of Science and Technology; 2020-07-30 (No. 04); full text *
Improved YOLOv3 pedestrian detection algorithm for infrared video images; Wang Dianwei, He Yanhui, Li Daxiang, Liu Ying, Xu Zhijie, Wang Jing; Journal of Xi'an University of Posts and Telecommunications; 2018-07-10 (No. 04); full text *

Also Published As

Publication number Publication date
CN113688722A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN110363140B (en) Human body action real-time identification method based on infrared image
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
Hao et al. Learning from synthetic photorealistic raindrop for single image raindrop removal
CN109241982A (en) Object detection method based on depth layer convolutional neural networks
CN112215296B (en) Infrared image recognition method based on transfer learning and storage medium
CN113158943A (en) Cross-domain infrared target detection method
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN111161160B (en) Foggy weather obstacle detection method and device, electronic equipment and storage medium
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN114119586A (en) Intelligent detection method for aircraft skin defects based on machine vision
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
Jiang et al. Unsupervised monocular depth perception: Focusing on moving objects
Hamzeh et al. A review of detection and removal of raindrops in automotive vision systems
Kim et al. Video object detection using object's motion context and spatio-temporal feature aggregation
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN116758117B (en) Target tracking method and system under visible light and infrared images
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN110110606A (en) The fusion method of visible light neural network based and infrared face image
CN113688722B (en) Infrared pedestrian target detection method based on image fusion
Nakamura et al. Few-shot adaptive object detection with cross-domain cutmix
CN115063428B (en) Spatial dim small target detection method based on deep reinforcement learning
Zhao et al. Deep learning-based laser and infrared composite imaging for armor target identification and segmentation in complex battlefield environments
CN115375991A (en) Strong/weak illumination and fog environment self-adaptive target detection method
Liu et al. FSFM: a feature square tower fusion module for multimodal object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant