CN113688722B - Infrared pedestrian target detection method based on image fusion - Google Patents

Infrared pedestrian target detection method based on image fusion

Info

Publication number
CN113688722B
CN113688722B (application CN202110971334.3A)
Authority
CN
China
Prior art keywords
image
infrared
target detection
pedestrian target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971334.3A
Other languages
Chinese (zh)
Other versions
CN113688722A (en)
Inventor
李永军
李耀
李莎莎
李孟军
陈竞
陈立家
李鹏飞
曹雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202110971334.3A priority Critical patent/CN113688722B/en
Publication of CN113688722A publication Critical patent/CN113688722A/en
Application granted granted Critical
Publication of CN113688722B publication Critical patent/CN113688722B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides an infrared pedestrian target detection method based on image fusion, which comprises the following steps: 1. establishing an infrared pedestrian target detection data set; 2. fusing the images using a trained Denseuse network; 3. constructing and training an image fusion-based YOLOv5 infrared pedestrian target detection model. The invention uses the Denseuse network to fuse the visible light and infrared image pairs in the constructed infrared pedestrian target detection data set, which enhances image quality, reduces redundant information, and yields an infrared pedestrian target detection data set with richer information. The image fusion-based YOLOv5 infrared pedestrian target detection model is then trained on the fused data set, giving a well-converged image fusion-based YOLOv5 infrared pedestrian target detection model and improving the accuracy of pedestrian target detection in infrared images.

Description

Infrared pedestrian target detection method based on image fusion
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared pedestrian target detection method based on image fusion.
Background
Infrared imaging offers long detection range, high concealment, and all-weather, day-and-night operation. Target detection in infrared images simultaneously provides an interpretation of image content and target localization, and infrared imaging occupies an irreplaceable position in civil fields such as diagnosis of diseased cells in medicine, industrial flaw detection, and driving assistance, as well as in military fields such as infrared early warning, submarine search, and infrared guidance. However, an infrared image is formed from the thermal radiation of the target scene; compared with a visible light image it has a longer imaging wavelength, stronger noise, lower contrast, and poorer spatial resolution. Image quality directly affects algorithm design and detection accuracy, so infrared target detection in these application fields suffers from low detection rates and slow detection speed. Data enhancement has proven to be an effective way to address the challenges of the infrared pedestrian target detection task. A visible light image reflects the spectral properties of objects, contains more detail, and matches human visual characteristics, whereas the thermal radiation captured in an infrared image is more sensitive to targets and regions of interest and is less disturbed by scene changes. The infrared and visible light images are therefore complementary: an image fusion preprocessing step can merge them into an infrared image with richer information, enhancing image quality, reducing redundant information, and achieving the goal of preprocessing the data set to enhance the images.
A monocular far-infrared pedestrian detection method based on feature fusion is disclosed in the patent "Monocular far infrared pedestrian detection method based on feature fusion" (application number: 2019109437223, publication number: CN 110674779A) owned by South China Agricultural University. The method first scales the original infrared image, obtains preliminary ROIs through threshold segmentation and morphological processing, applies a sliding window to the preliminary ROIs, and finally makes a decision by cascading a linear SVM classifier based on HAAR and LBP features with a linear SVM classifier based on HOG features. The method adapts well to long-, medium-, and short-range pedestrian detection, but because it relies on traditional feature extraction and sliding-window filtering, its region selection strategy is redundant and its time complexity is high; it is not robust to target diversity, and for weak, small targets it is particularly difficult to obtain useful information such as shape, size, structure, and texture.
Bai Yu, Hou Zhijiang, Liu Xiaoyi, et al. propose a target detection algorithm based on decision-level fusion in the paper "Target detection algorithm based on decision-level fusion of visible light images and infrared images" (Journal of Air Force Engineering University (Natural Science Edition), 2020, Vol. 21, No. 6, pp. 53-59). The algorithm retrains the YOLOv3 network on a labeled data set and, before fusion, detects the visible light image and the infrared image separately with the trained YOLOv3 network. During fusion, the detection results of the visible light image and the infrared image are retained, the results of the same target detected simultaneously in both images are fused by weighting, and the individual detection results are combined with the fused results as the detection results for all corresponding targets in the fused image. By combining the visible light detection results, the infrared detection results, and their fusion, the algorithm improves detection accuracy to a certain extent, but because it trains the YOLOv3 network separately on visible light images and infrared images, its computational complexity is relatively high.
Disclosure of Invention
The invention aims to provide an infrared pedestrian target detection method based on image fusion, so as to solve the problems of poor robustness and high computational complexity in existing infrared pedestrian target detection methods.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the infrared pedestrian target detection method based on image fusion comprises the following steps:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: the method comprises the steps that a visible light camera and an infrared camera are used for randomly shooting a scene containing a pedestrian target at the same time, and the visible light images and the infrared images with the same quantity are collected;
step 1.2: preprocessing the visible light image and the infrared image acquired in the step 1.1;
step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting features of the input image using an encoder in the Denseuse network;
step 2.4: reconstructing the input image using a decoder in the Denseuse network according to the features extracted from the input image in step 2.3, thereby obtaining a fixed weight Denseuse network;
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network, and evaluating the fixed-weight Denseuse network in the step 2.4 according to the test result in the step 2.5 to obtain an evaluation result;
step 2.7: judging whether the loss variation in the evaluation result of the step 2.6 tends to be stable, if so, executing the step 2.9, otherwise, executing the step 2.8;
step 2.8: adjusting the learning rate and the iteration times in the training parameters of the Denseuse network, and jumping to the step 2.3 for retraining;
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: according to the average precision change and the loss change trend of the training result in the step 3.3, the learning rate and the iteration times are adjusted until the average precision change and the loss change trend tend to be in a stable state, and a YOLOv5 infrared pedestrian target detection model with good final convergence and based on image fusion is obtained;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
The visible light camera and the infrared camera in the step 1.1 are arranged at the same acquisition position, and the optical axes of the lenses are in the same direction and parallel.
The preprocessing in the step 1.2 is as follows: carrying out feature registration and Gaussian filtering on the visible light image and the infrared image acquired in the step 1.1.
Compared with the prior art, the invention has the beneficial effects that:
According to the infrared pedestrian target detection method based on image fusion, the visible light image and the infrared image are fused by the Denseuse network, which enhances the detail information of the pedestrian target, enriches the data characteristics, and makes the target features more salient and easier to extract; detection is then performed with the YOLOv5 target detection model, whose network structure is flexible, whose convergence is fast, and whose accuracy is high, so that the detection accuracy for pedestrian targets in infrared images is ultimately improved and practical application requirements are met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a Denseuse network according to the present invention;
FIG. 3 is an effect diagram of an image fused by the fixed-weight Denseuse network according to the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings; obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
As shown in FIGS. 1, 2 and 3, the invention discloses an infrared pedestrian target detection method based on image fusion, which comprises the following steps:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: a visible light camera and an infrared camera are arranged at the same acquisition position with the optical axes of their lenses parallel and pointing in the same direction; the visible light camera and the infrared camera simultaneously shoot random scenes containing pedestrian targets, and equal numbers of visible light images and infrared images are collected;
step 1.2: preprocessing the visible light images and the infrared images acquired in the step 1.1, specifically by performing feature registration, Gaussian filtering, and other preprocessing on them, so that interference of the natural environment with the images is minimized and the high quality of the acquired visible light and infrared images is ensured.
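By way of illustration only, the preprocessing of step 1.2 could be realized as in the following Python/OpenCV sketch; the use of ORB keypoints and a RANSAC homography for the feature registration is an assumption of this sketch, since the embodiment only specifies feature registration and Gaussian filtering.

    import cv2
    import numpy as np

    def preprocess_pair(visible_bgr, infrared_gray, ksize=5, sigma=1.0):
        """Register the infrared image to the visible image and denoise both.

        ORB features and a RANSAC homography are only one possible realisation
        of the "feature registration" mentioned in step 1.2 (assumption).
        """
        vis_gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)

        orb = cv2.ORB_create(1000)
        kp_v, des_v = orb.detectAndCompute(vis_gray, None)
        kp_i, des_i = orb.detectAndCompute(infrared_gray, None)

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_i, des_v), key=lambda m: m.distance)[:200]

        src = np.float32([kp_i[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_v[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

        h, w = vis_gray.shape
        ir_registered = cv2.warpPerspective(infrared_gray, H, (w, h))

        # Gaussian filtering to suppress sensor noise, as required by step 1.2
        vis_filtered = cv2.GaussianBlur(visible_bgr, (ksize, ksize), sigma)
        ir_filtered = cv2.GaussianBlur(ir_registered, (ksize, ksize), sigma)
        return vis_filtered, ir_filtered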
Step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs; for example, the visible light image and the infrared image acquired simultaneously by the two cameras in the first shot correspond to each other and can be named as the first pair of images according to their sequence number;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
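Purely as an illustrative sketch of steps 1.3 to 1.5 (the directory layout and the *_vis.jpg / *_ir.jpg naming convention are assumptions), the random 70%/30% split of the image pairs could be performed as follows:

    import random
    from pathlib import Path

    def split_pairs(pair_dir, train_ratio=0.7, seed=0):
        """Split visible/infrared image pairs into a training set and a test set.

        Assumes each pair shares a stem, e.g. 0001_vis.jpg / 0001_ir.jpg, so that
        both images of a pair always end up in the same subset.
        """
        stems = sorted({p.stem.rsplit("_", 1)[0] for p in Path(pair_dir).glob("*_vis.jpg")})
        random.Random(seed).shuffle(stems)
        n_train = int(len(stems) * train_ratio)
        return stems[:n_train], stems[n_train:]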
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting the features of the input image using the encoder in the Denseuse network, specifically: the encoder in the Denseuse network consists of a C1 module and a DenseBlock module; the C1 module is a 3×3 convolution layer with stride 1 used to extract coarse features, and the DenseBlock module comprises three 3×3 convolution layers DC1, DC2 and DC3 with stride 1, where the output of each convolution layer is concatenated and used as the input of the subsequent layers so as to retain the salient features;
step 2.4: reconstructing the input image using the decoder in the Denseuse network from the features extracted from the input image in the step 2.3, thereby obtaining a fixed-weight Denseuse network, specifically: the decoder in the Denseuse network consists of the modules C2, C3, C4 and C5, which are all 3×3 convolution layers with stride 1 used to reconstruct the input image;
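The encoder and decoder of steps 2.3 and 2.4 can be sketched in PyTorch as follows; the channel widths (16 filters per layer), the ReLU activations, and the single-channel input are assumptions of this sketch, since the embodiment fixes only the 3×3 kernels, the stride of 1, and the dense concatenation pattern.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """C1 followed by a DenseBlock of DC1-DC3; every layer output is
        concatenated and fed to the following layers (step 2.3)."""
        def __init__(self, ch=16):
            super().__init__()
            self.c1 = nn.Sequential(nn.Conv2d(1, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc1 = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc2 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, 1, 1), nn.ReLU(inplace=True))
            self.dc3 = nn.Sequential(nn.Conv2d(3 * ch, ch, 3, 1, 1), nn.ReLU(inplace=True))

        def forward(self, x):
            x1 = self.c1(x)                        # coarse features
            d1 = self.dc1(x1)
            d2 = self.dc2(torch.cat([x1, d1], 1))  # cascade the previous outputs
            d3 = self.dc3(torch.cat([x1, d1, d2], 1))
            return torch.cat([x1, d1, d2, d3], 1)  # 4*ch feature channels

    class Decoder(nn.Module):
        """C2-C5: plain 3x3, stride-1 convolutions reconstructing the image (step 2.4)."""
        def __init__(self, ch=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4 * ch, 64, 3, 1, 1), nn.ReLU(inplace=True),  # C2
                nn.Conv2d(64, 32, 3, 1, 1), nn.ReLU(inplace=True),      # C3
                nn.Conv2d(32, 16, 3, 1, 1), nn.ReLU(inplace=True),      # C4
                nn.Conv2d(16, 1, 3, 1, 1),                              # C5
            )

        def forward(self, f):
            return self.net(f)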
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network: the fixed-weight Denseuse network of the step 2.4 is evaluated according to the test results of the step 2.5 to obtain evaluation results such as the loss change; the loss function H(x, y) is computed as a weighted combination of a structural similarity loss function H_SSIM(x, y) and a pixel loss function H_P(x, y), as follows:
H(x, y) = γ·H_SSIM(x, y) + H_P(x, y)
        = γ·(1 − SSIM(Out(x, y), In(x, y))) + ||Out(x, y) − In(x, y)||_2
wherein Out(x, y) and In(x, y) denote the output image and the input image, respectively, ||Out(x, y) − In(x, y)||_2 is the Euclidean distance between the output image Out(x, y) and the input image In(x, y), SSIM(Out(x, y), In(x, y)) is the structural similarity between the output image Out(x, y) and the input image In(x, y), and γ is a weighting coefficient; given that there are roughly three orders of magnitude between the pixel loss and the structural similarity loss, γ=100 is taken here; x and y denote the abscissa and ordinate of a pixel point in the image, respectively;
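As a minimal sketch of the loss of step 2.6, assuming a differentiable SSIM implementation such as the one provided by the pytorch-msssim package:

    import torch
    from pytorch_msssim import ssim  # assumption: any differentiable SSIM implementation can be used

    def reconstruction_loss(out, inp, gamma=100.0):
        """H(x, y) = gamma * H_SSIM(x, y) + H_P(x, y) of step 2.6."""
        h_ssim = 1.0 - ssim(out, inp, data_range=1.0)  # structural similarity loss
        h_pixel = torch.norm(out - inp, p=2)           # Euclidean (pixel) loss
        return gamma * h_ssim + h_pixel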
step 2.7: judging whether the loss change in the evaluation results of the step 2.6 tends to be stable, that is, whether the loss levels off as the number of training iterations Epoch increases; if the loss change tends to be stable, executing the step 2.9, otherwise executing the step 2.8;
step 2.8: adjusting the learning rate and the number of iterations in the training parameters of the Denseuse network and jumping back to the step 2.3 for retraining; specifically, if the loss is still decreasing but has not yet stabilized, the trained Denseuse network has not converged and the number of training iterations Epoch can be appropriately increased; if the loss is oscillating, the trained Denseuse model has fallen into a local optimum and the learning rate Ir can be appropriately reduced;
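A simple, purely illustrative way of mechanizing the decision rule of steps 2.7 and 2.8 is sketched below; the window length and the tolerance are assumed values, not part of the embodiment.

    def adjust_training(losses, epoch, lr, window=10, tol=1e-3):
        """Inspect the recent loss history and decide how to continue training.

        Returns (converged, new_epoch, new_lr). Thresholds are illustrative only.
        """
        recent = losses[-window:]
        span = max(recent) - min(recent)
        if span < tol:                      # loss change has levelled off: stop (step 2.9)
            return True, epoch, lr
        if recent[-1] < recent[0]:          # still decreasing: enlarge Epoch
            return False, int(epoch * 1.5), lr
        return False, epoch, lr * 0.1       # oscillating: reduce the learning rate Ir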
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
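Combined with the encoder/decoder sketch above, the addition fusion strategy of formula (1) amounts to the following; the default weighting coefficient λ=0.5 is an assumption of this sketch, as the embodiment leaves λ unspecified.

    import torch

    @torch.no_grad()
    def fuse_pair(encoder, decoder, vis, ir, lam=0.5):
        """Fuse one preprocessed visible/infrared pair with the fixed-weight network.

        Implements F_m(x, y) = lam * Vis_m(x, y) + (1 - lam) * Ir_m(x, y) channel by
        channel on the encoder features, then reconstructs the fused image with the decoder.
        """
        f_vis = encoder(vis)                  # Vis_m(x, y): features of the visible light image
        f_ir = encoder(ir)                    # Ir_m(x, y): features of the infrared image
        fused = lam * f_vis + (1.0 - lam) * f_ir
        return decoder(fused)                 # reconstructed fused image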
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
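The training parameters of step 3.2 can be gathered, for example, in a configuration dictionary like the one below before being passed to the YOLOv5 training routine; the key names are assumptions of this sketch and would have to be mapped onto the options of the specific YOLOv5 release being used.

    # Hyperparameters of step 3.2; the key names are illustrative, not official YOLOv5 options.
    train_cfg = {
        "optimizer": "Adam",  # stochastic optimization algorithm
        "batch_size": 64,     # Batch = 64
        "momentum": 0.9,      # Momentum = 0.9
        "lr0": 0.001,         # initial learning rate Ir
        "epochs": 300,        # number of training iterations Epoch
    }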
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: the average precision represents the accuracy of the model's predictions, while the loss change reflects the relation between the predicted values and the true values: the smaller the loss, the closer the predictions are to the true values and the better the model. Therefore, according to the average precision change and the loss change trend of the training results in the step 3.3, the learning rate and the number of iterations are adjusted until both the average precision change and the loss change trend reach a stable state, yielding the final, well-converged image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
According to the infrared pedestrian target detection method based on image fusion, the visible light image and the infrared image are fused by the Denseuse network, which enhances the detail information of the pedestrian target, enriches the data characteristics, and makes the target features more salient and easier to extract; detection is then performed with the YOLOv5 target detection model, whose network structure is flexible, whose convergence is fast, and whose accuracy is high, so that the detection accuracy for pedestrian targets in infrared images is ultimately improved and practical application requirements are met.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (5)

1. The infrared pedestrian target detection method based on image fusion is characterized by comprising the following steps of:
step 1: establishing an infrared pedestrian target detection data set, and specifically:
step 1.1: the method comprises the steps that a visible light camera and an infrared camera are used for randomly shooting a scene containing a pedestrian target at the same time, and the visible light images and the infrared images with the same quantity are collected;
step 1.2: preprocessing the visible light image and the infrared image acquired in the step 1.1;
step 1.3: the visible light images and the infrared images preprocessed in the step 1.2 are arranged in acquisition order; each acquired visible light image and its corresponding infrared image are matched and given consistent names, forming visible light and infrared image pairs;
step 1.4: performing bounding-box annotation and classification of the pedestrians in the visible light and infrared image pairs of the step 1.3 with LabelImg software, and generating and exporting the labels;
step 1.5: randomly selecting 70% of the visible light and infrared image pairs of the step 1.4 as a training set, and the remaining 30% of the pairs as a test set;
step 2: fusion of visible and infrared images using a Denseuse network, in particular:
step 2.1: sending the training set in the step 1.5 to a Denseuse network;
step 2.2: setting training parameters, specifically setting the training batch size to Batch=4, initially setting the learning rate to Ir=0.0001, and setting the number of training iterations to Epoch=150;
step 2.3: extracting features of the input image using an encoder in the Denseuse network;
step 2.4: reconstructing the input image using a decoder in the Denseuse network according to the features extracted from the input image in step 2.3, thereby obtaining a fixed weight Denseuse network;
step 2.5: the test set in the step 1.5 is sent to the Denseuse network with fixed weight in the step 2.4 for testing, and a test result is obtained;
step 2.6: evaluating the Denseuse network, and evaluating the fixed-weight Denseuse network in the step 2.4 according to the test result in the step 2.5 to obtain an evaluation result;
step 2.7: judging whether the loss variation in the evaluation result of the step 2.6 tends to be stable, if so, executing the step 2.9, otherwise, executing the step 2.8;
step 2.8: adjusting the learning rate and the iteration times in the training parameters of the Denseuse network, and jumping to the step 2.3 for retraining;
step 2.9: the training set and the test set in the step 1.5 are both sent to the Denseuse network with fixed weight in the step 2.4, the characteristics of the visible light image and the infrared image pair are extracted through the encoder, and the characteristics of the visible light image and the infrared image pair extracted by the encoder are fused by adopting an addition fusion strategy shown in the formula (1); sending the fused features into a decoder for reconstruction to obtain a fused image;
F_m(x, y) = λ·Vis_m(x, y) + (1 − λ)·Ir_m(x, y);  (1)
wherein Vis_m(x, y) denotes the feature of the m-th channel of the visible light image extracted by the encoder, Ir_m(x, y) denotes the feature of the m-th channel of the infrared image extracted by the encoder, F_m(x, y) denotes the fused feature of the m-th channel of the visible light and infrared image pair, λ is a weighting coefficient, and x and y denote the abscissa and ordinate of a pixel point in the input image, respectively;
step 3: training a YOLOv5 infrared pedestrian target detection model based on image fusion, and specifically:
step 3.1: constructing a training set and a test set for the image fusion-based YOLOv5 infrared pedestrian target detection model: randomly selecting 70% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model, and taking the remaining 30% of the images fused in the step 2.9, together with their corresponding labels from the step 1.4, as the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model;
step 3.2: setting training parameters: training with the stochastic optimization algorithm Adam, setting the training batch size to Batch=64, the Momentum to 0.9, the initial learning rate to Ir=0.001, and the number of training iterations to Epoch=300;
step 3.3: the training set of the image fusion-based YOLOv5 infrared pedestrian target detection model in the step 3.1 is sent to the image fusion-based YOLOv5 infrared pedestrian target detection model for training, and average precision change and loss change trend of the training result are obtained;
step 3.4: according to the average precision change and the loss change trend of the training result in the step 3.3, the learning rate and the iteration times are adjusted until the average precision change and the loss change trend tend to be in a stable state, and a YOLOv5 infrared pedestrian target detection model with good final convergence and based on image fusion is obtained;
step 3.5: sending the test set of the image fusion-based YOLOv5 infrared pedestrian target detection model constructed in the step 3.1 into the well-converged image fusion-based YOLOv5 infrared pedestrian target detection model obtained in the step 3.4 for detection.
2. The image fusion-based infrared pedestrian target detection method according to claim 1, wherein: the visible light camera and the infrared camera in the step 1.1 are arranged at the same acquisition position, and the optical axes of the lenses are in the same direction and parallel.
3. The image fusion-based infrared pedestrian target detection method according to claim 2, wherein: the preprocessing in the step 1.2 is as follows: carrying out feature registration and Gaussian filtering on the visible light image and the infrared image acquired in the step 1.1.
4. The image fusion-based infrared pedestrian target detection method as claimed in claim 3, wherein: the loss variation in the step 2.6 is derived from a loss function H(x, y) computed as a weighted combination of a structural similarity loss function H_SSIM(x, y) and a pixel loss function H_P(x, y), as follows:
H(x, y) = γ·H_SSIM(x, y) + H_P(x, y)
        = γ·(1 − SSIM(Out(x, y), In(x, y))) + ||Out(x, y) − In(x, y)||_2
wherein Out(x, y) and In(x, y) denote the output image and the input image, respectively, ||Out(x, y) − In(x, y)||_2 is the Euclidean distance between the output image Out(x, y) and the input image In(x, y), SSIM(Out(x, y), In(x, y)) is the structural similarity between the output image Out(x, y) and the input image In(x, y), and γ is a weighting coefficient; considering that there are roughly three orders of magnitude between the pixel loss and the structural similarity loss, γ=100 is taken here; x and y denote the abscissa and ordinate of a pixel point in the image, respectively.
5. The infrared pedestrian target detection method based on image fusion according to claim 4, wherein: in the step 2.8, the learning rate and the number of iterations in the training parameters of the Denseuse network are adjusted as follows: if the loss change is still decreasing but has not yet stabilized, the number of training iterations Epoch is increased; if the loss change is oscillating, the learning rate Ir is reduced.
CN202110971334.3A 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion Active CN113688722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971334.3A CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971334.3A CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Publications (2)

Publication Number Publication Date
CN113688722A CN113688722A (en) 2021-11-23
CN113688722B true CN113688722B (en) 2024-03-22

Family

ID=78581644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971334.3A Active CN113688722B (en) 2021-08-21 2021-08-21 Infrared pedestrian target detection method based on image fusion

Country Status (1)

Country Link
CN (1) CN113688722B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020102988A1 (en) * 2018-11-20 2020-05-28 西安电子科技大学 Feature fusion and dense connection based infrared plane target detection method
CN111209810A (en) * 2018-12-26 2020-05-29 浙江大学 Bounding box segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time in visible light and infrared images
WO2020177432A1 (en) * 2019-03-07 2020-09-10 中国科学院自动化研究所 Multi-tag object detection method and system based on target detection network, and apparatuses
CN111563473A (en) * 2020-05-18 2020-08-21 电子科技大学 Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN112070111A (en) * 2020-07-28 2020-12-11 浙江大学 Multi-target detection method and system adaptive to multiband images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Infrared target fusion detection algorithm based on pseudo-modal conversion; An Haonan, Zhao Ming, Pan Shengda, Lin Changqing; Acta Photonica Sinica; 2020-12-31 (No. 08); full text *
Improved YOLOv3 pedestrian detection algorithm for infrared images; Shi Jianting, Zhang Guiqiang; Journal of Heilongjiang University of Science and Technology; 2020-07-30 (No. 04); full text *
Improved YOLOv3 pedestrian detection algorithm for infrared video images; Wang Dianwei, He Yanhui, Li Daxiang, Liu Ying, Xu Zhijie, Wang Jing; Journal of Xi'an University of Posts and Telecommunications; 2018-07-10 (No. 04); full text *

Also Published As

Publication number Publication date
CN113688722A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN110363140B (en) Human body action real-time identification method based on infrared image
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
Hao et al. Learning from synthetic photorealistic raindrop for single image raindrop removal
CN109241982A (en) Object detection method based on depth layer convolutional neural networks
CN112215296B (en) Infrared image recognition method based on transfer learning and storage medium
CN113158943A (en) Cross-domain infrared target detection method
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN111161160B (en) Foggy weather obstacle detection method and device, electronic equipment and storage medium
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN114119586A (en) Intelligent detection method for aircraft skin defects based on machine vision
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
Jiang et al. Unsupervised monocular depth perception: Focusing on moving objects
Hamzeh et al. A review of detection and removal of raindrops in automotive vision systems
Kim et al. Video object detection using object's motion context and spatio-temporal feature aggregation
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN116758117B (en) Target tracking method and system under visible light and infrared images
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN110110606A (en) The fusion method of visible light neural network based and infrared face image
CN113688722B (en) Infrared pedestrian target detection method based on image fusion
Nakamura et al. Few-shot adaptive object detection with cross-domain cutmix
CN115063428B (en) Spatial dim small target detection method based on deep reinforcement learning
Zhao et al. Deep learning-based laser and infrared composite imaging for armor target identification and segmentation in complex battlefield environments
CN115375991A (en) Strong/weak illumination and fog environment self-adaptive target detection method
Liu et al. FSFM: a feature square tower fusion module for multimodal object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant