CN116863271A - Lightweight infrared flame detection method based on improved YOLO V5

Lightweight infrared flame detection method based on improved YOLO V5

Info

Publication number
CN116863271A
CN116863271A
Authority
CN
China
Prior art keywords
infrared flame
improved
data set
image
network
Prior art date
Legal status
Pending
Application number
CN202310824468.1A
Other languages
Chinese (zh)
Inventor
胡啸
常宇超
李明杰
刘俊
董炤琛
赵志诚
谢刚
Current Assignee
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Taiyuan University of Science and Technology filed Critical Taiyuan University of Science and Technology
Priority to CN202310824468.1A
Publication of CN116863271A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Fire-Detection Mechanisms (AREA)

Abstract

The application discloses a lightweight infrared flame detection method based on improved YOLO V5, which comprises the following steps: acquiring an infrared flame data set, and constructing an experimental data set based on a CycleGAN model and the infrared flame data set; classifying and labeling the experimental data set to obtain an image data set; constructing an improved YOLO V5 network and processing the image data set with it to obtain tensor data of different scales; acquiring infrared flame target detection boxes based on the tensor data; and evaluating the computational cost, detection accuracy and inference speed of the infrared flame target detection. According to the application, a new ShuffleNetV2-CA-Lite backbone structure is proposed for the YOLO V5 backbone network, the generalization capability of the network is enhanced by replacing the maximum pooling layers in the original SPPF structure with mixed pooling layers, and the C3 structure of the Neck part is further improved, so that fast and accurate detection of infrared flame image targets is realized.

Description

Lightweight infrared flame detection method based on improved YOLO V5
Technical Field
The application belongs to the technical field of deep learning and target detection, and particularly relates to a lightweight infrared flame detection method based on improved YOLO V5.
Background
A transformer substation is a core hub of the power grid at every level, and routine inspection of substation equipment is a key technical means of ensuring safe grid operation. With rapid economic development, the scale of power systems keeps growing and the requirements on system stability keep rising. The existing manual inspection mode suffers from high labor intensity, inconsistent detection quality and strong interference from severe weather, and a fire that is not discovered in time can have consequences that are difficult to measure. At present, fire early warning is concentrated mainly in the visible-light domain; research on the infrared modality is scarce, and flame detection under visible light is strongly affected by illumination conditions. Flame detection therefore remains an urgent research topic.
Traditional flame detection algorithms rely on hand-crafted features, are not robust to variations in flame morphology and illumination, and struggle to achieve good results in real detection tasks. In recent years, with the improvement of GPU performance and the emergence of large-scale data sets, a wide variety of deep-learning-based target detection algorithms have been proposed. They fall into two categories: single-stage and two-stage target detection algorithms. A two-stage algorithm divides detection into two stages: the first generates candidate boxes that may contain the objects to be detected, and the second classifies and refines the objects in those candidates.
Two-stage target detection is represented by Faster R-CNN, and single-stage target detection by the YOLO series. Although two-stage detectors localize targets accurately and detect small targets well, their many components make the pipeline long and detection slow, so they cannot meet real-time application scenarios. Single-stage detectors have a simple structure, high computational efficiency and good detection accuracy, but as the number of model parameters grows to raise accuracy, real-time performance generally suffers on compute-limited embedded devices. Introducing a lightweight network reduces the number of parameters and the computational complexity, but lowers accuracy.
Disclosure of Invention
The application aims to provide a lightweight infrared flame detection method based on improved YOLO V5, so as to solve the problems in the prior art.
In order to achieve the above purpose, the application provides a lightweight infrared flame detection method based on improved YOLO V5, which combines real-time performance with accuracy and comprises the following steps:
acquiring an infrared flame data set, and constructing an experimental data set based on a CycleGAN model and the infrared flame data set;
classifying and labeling the experimental data set to obtain an image data set;
constructing an improved YOLO V5 network;
processing the image dataset based on an improved YOLO V5 network to obtain tensor data of different scales;
acquiring infrared flame target detection boxes based on the multi-scale tensor data;
and evaluating the computational cost, detection accuracy and inference speed of the infrared flame target detection boxes, and carrying out infrared flame detection based on the evaluated model.
Optionally, the infrared flame data set includes infrared flame images and visible-light flame images, and the process of constructing the experimental data set based on the CycleGAN model and the infrared flame data set includes:
constructing a CycleGAN model based on the PyTorch deep learning framework, and training the CycleGAN model on the infrared flame images and the visible-light images;
and converting the visible-light images into infrared flame images with the trained CycleGAN model, and combining them with the Corsican Fire data set and the FLAME data set to construct the experimental data set.
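For illustration only, the conversion step can be sketched as below with a trained visible-to-infrared CycleGAN generator exported as TorchScript; the checkpoint name G_v2i and the file paths are hypothetical and are not part of the application:

    import torch
    from PIL import Image
    from torchvision import transforms

    to_tensor = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale to [-1, 1]
    ])

    # hypothetical TorchScript checkpoint of the visible-to-infrared generator
    G_v2i = torch.jit.load("cyclegan_visible2infrared.pt").eval()

    with torch.no_grad():
        visible = to_tensor(Image.open("visible_flame.jpg").convert("RGB")).unsqueeze(0)
        fake_ir = G_v2i(visible)                  # generator output in [-1, 1]
        fake_ir = (fake_ir.squeeze(0) + 1) / 2    # map back to [0, 1]
        transforms.ToPILImage()(fake_ir).save("fake_infrared.jpg")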
Optionally, the process of constructing the improved YOLO V5 network includes:
replacing the original backbone network of the YOLO V5 network with a ShuffleNetV2-CA-Lite structure;
replacing the maximum pooling layers with mixed pooling to obtain an improved SPPF structure;
and replacing the C3 structure in the Neck part with an S-ConvNeXt structure to obtain the improved Neck part.
Optionally, processing the image data set based on the improved YOLO V5 network includes:
acquiring infrared flame image feature maps of different scales from the image data set based on the improved YOLO V5 network;
optimizing the randomness and diversity of the features of the multi-scale infrared flame image feature maps based on the improved SPPF structure;
and performing upsampling and feature fusion on the optimized infrared flame image feature maps based on the improved Neck part to obtain tensor data of different scales.
Optionally, the ShuffleNetV2-CA-Lite structure comprises a ShuffleNetV2 module and a CA-Lite module;
the CA-Lite module comprises a CA (coordinate attention) mechanism and a Ghost convolution;
the Ghost convolution is defined as:
Y' = X * f'
where X ∈ R^(c×h×w), Y' ∈ R^(h'×w'×m) and f' ∈ R^(c×k×k×m); c is the number of channels of the input feature map, Y' is the output feature map with m channels, f' is the convolution filter, h' and w' are the height and width of the output data, respectively, and k × k is the kernel size of the convolution filter f'.
The output of the CA attention mechanism is:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
where y_c(i, j) is the output of the CA attention mechanism, x_c(i, j) is the input feature map at position (i, j), and g_c^h(i) and g_c^w(j) are the attention weights in the two spatial directions, respectively.
Optionally, the mixed pooling is defined as:
y_mix = f * [max_(x_k ∈ R) x_k ⊕ (1/|R|) Σ_(x_k ∈ R) x_k]
where y_mix is the output of the mixed pooling, x_k are the elements covered by the pooling region R, f is a 1 × 1 convolution filter, * denotes convolution, and ⊕ denotes the depth concatenation of the outputs of the two pooling operations.
Optionally, the improved Neck part includes a convolutional layer module, an upsampling module, a feature fusion module and an improved ConvNeXt module, wherein the improved ConvNeXt module introduces a SiLU activation function into the ConvNeXt module;
the SiLU activation function is defined as:
a_k(S) = z_k · σ(z_k), where z_k = Σ_i w_ik S_i + b_k
where a_k(S) is the output of the SiLU activation function, i is the neuron index, S_i is the input vector, b_k is the bias, σ is the Sigmoid activation function, and w_ik is the weight connected to hidden unit k.
Optionally, the process of evaluating the computational cost, detection accuracy and inference speed of the infrared flame target detection boxes includes:
drawing a PR curve, computing the average precision from the PR curve, and evaluating detection accuracy based on the average precision;
measuring the number of images the detection model can process per second, and evaluating the inference speed;
and counting floating-point operations to obtain the computational complexity and evaluate the computational cost.
The technical effects of the application are as follows:
The application provides a lightweight infrared flame detection method based on improved YOLO V5. First, the data set is expanded with a CycleGAN network and combined with public infrared flame data sets to construct the experimental data set. Then, a new ShuffleNetV2-CA-Lite backbone structure is proposed for the YOLO V5 backbone network, and the generalization capability of the network is enhanced by replacing the maximum pooling layers in the original SPPF structure with mixed pooling layers. The C3 structure of the Neck part is further improved: the proposed S-ConvNeXt structure replaces the original C3 structure for further processing of image features. Finally, target detection performance is evaluated by computational cost, average precision and inference speed. On the basis of improved recognition accuracy and feature extraction performance, the method achieves fast and accurate detection of infrared flame image targets.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic flow chart of a lightweight infrared flame detection method based on improved YOLO V5 according to an embodiment of the application;
FIG. 2 is a schematic structural diagram of the ShuffleNetV2-CA-Lite backbone network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an improved SPPF architecture Mix-SPPF according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the improved Neck structure according to an embodiment of the present application;
FIG. 5 is a graph of average accuracy of flame detection over a data set in accordance with an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in FIGS. 1-5, this embodiment provides a lightweight infrared flame detection method based on improved YOLO V5 which, as shown in FIG. 1, comprises the following steps:
1) To address the shortage of data, the data set is expanded with a CycleGAN network: infrared flame images and visible-light flame images are input into the CycleGAN network, and the visible-light images are converted into infrared flame images; the processed data set, the Corsican Fire data set and the FLAME data set together serve as the data set of this experiment;
2) The data set obtained in step 1) is classified and labeled to obtain an image data set with category labels, which is divided into a training set (80%), a validation set (10%) and a test set (10%);
3) The labeled images from step 2) are input into the improved YOLO V5 network to obtain infrared flame image feature maps of different scales; the improved YOLO V5 network replaces the original backbone network with the ShuffleNetV2-CA-Lite structure;
4) The image feature maps from step 3) are input into the improved SPPF structure, in which mixed pooling replaces the original maximum pooling layers, further enlarging the receptive field of the network and improving its generalization capability;
5) The multi-scale infrared flame image feature maps from step 4) are input into the improved Neck part, where the proposed S-ConvNeXt structure replaces the C3 structure of the original Neck; after upsampling and feature fusion of the multi-scale feature maps, tensor data of different scales are obtained;
6) The multi-scale tensor data from step 5) are input into the prediction layer of the network to obtain detection boxes for infrared flame targets, which are evaluated with three parameters: computational cost, average precision and inference speed.
In step 1) of this embodiment, 300 flame images in the infrared modality are obtained through CycleGAN-based data set expansion; together with images extracted from the Corsican Fire data set and the FLAME data set, 1449 infrared flame images are obtained in total. After labeling, the image data set with the category labels generated by the labeling work is divided into a training set of 1159 images, a validation set of 144 images and a test set of 146 images.
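For illustration only, the 1159/144/146 division can be produced as in the following sketch; the directory layout and the fixed random seed are assumptions made so the split is reproducible:

    import random
    from pathlib import Path

    images = sorted(Path("dataset/images").glob("*.jpg"))  # the 1449 labeled images
    random.Random(0).shuffle(images)                       # fixed seed for reproducibility

    train, val, test = images[:1159], images[1159:1303], images[1303:]
    assert (len(train), len(val), len(test)) == (1159, 144, 146)

    # write one image list per subset
    for name, subset in (("train", train), ("val", val), ("test", test)):
        Path(f"dataset/{name}.txt").write_text("\n".join(str(p) for p in subset))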
further, in step 3), as shown in fig. 2, the backbone network is improved, and the improved backbone network uses the proposed shufflenet V2-CA-Lite module to replace the backbone network of YOLO V5, so that the network improves the detection speed and also improves the detection precision. The improved main network performs the following feature extraction process on the image, firstly, preprocessing the image of the data set, and adjusting the size of the image to 224 multiplied by 3; the preprocessed infrared flame image is input into a convolution and maximum pooling layer of a network head to reduce the dimension of data, and then is input into a proposed shufflenet V2-CA-Lite module for further processing, wherein the module is formed by fusing the shufflenet V2 module and the CA-Lite module, the added CA-Lite module comprises a CA attention mechanism and a Ghost convolution, a large amount of redundant data generated by the traditional convolution while capturing characteristic information is reduced, so that the network focuses on flame areas during characteristic extraction, unimportant areas are restrained, and the occurrence of missed detection under a complex scene is reduced. Wherein the definition of the Ghost convolution is as follows:
Y'=X×f'
wherein X∈Rc×h×w ,Y'∈R h'×w'×m ,f'∈R c×k×k×m C is the number of channels of the input signature, Y ' is the output signature with m channels, f ' represents the convolution filter, h ' and w ' are the height and width of the output data, respectively, and k x k is the kernel size of the convolution filter f '.
The output of the CA attention mechanism is as follows:
wherein xc (i, j) is the size of the input feature map,weights in two spatial directions, respectively.
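For illustration only, the two components of the CA-Lite module can be sketched in PyTorch as below; the channel sizes, the reduction ratio and the exact wiring are assumptions, since the application defines only the formulas:

    import torch
    import torch.nn as nn

    class GhostConv(nn.Module):
        """Y' = X * f': half the outputs come from a cheap depthwise convolution."""
        def __init__(self, c_in, c_out, k=1, ratio=2):
            super().__init__()
            c_primary = c_out // ratio
            self.primary = nn.Sequential(
                nn.Conv2d(c_in, c_primary, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(c_primary), nn.SiLU())
            self.cheap = nn.Sequential(  # depthwise conv generates the "ghost" maps
                nn.Conv2d(c_primary, c_out - c_primary, 3, padding=1,
                          groups=c_primary, bias=False),
                nn.BatchNorm2d(c_out - c_primary), nn.SiLU())

        def forward(self, x):
            y = self.primary(x)
            return torch.cat([y, self.cheap(y)], dim=1)

    class CoordinateAttention(nn.Module):
        """y_c(i,j) = x_c(i,j) * g_h(i) * g_w(j): weights per spatial direction."""
        def __init__(self, c, reduction=32):
            super().__init__()
            c_mid = max(8, c // reduction)
            self.conv1 = nn.Sequential(
                nn.Conv2d(c, c_mid, 1), nn.BatchNorm2d(c_mid), nn.SiLU())
            self.conv_h = nn.Conv2d(c_mid, c, 1)
            self.conv_w = nn.Conv2d(c_mid, c, 1)

        def forward(self, x):
            n, c, h, w = x.shape
            pooled_h = x.mean(dim=3, keepdim=True)                      # N x C x H x 1
            pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N x C x W x 1
            y = self.conv1(torch.cat([pooled_h, pooled_w], dim=2))
            y_h, y_w = torch.split(y, [h, w], dim=2)
            g_h = torch.sigmoid(self.conv_h(y_h))                       # weights along height
            g_w = torch.sigmoid(self.conv_w(y_w)).permute(0, 1, 3, 2)   # weights along width
            return x * g_h * g_w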
Further, in step 4), the SPPF module is improved as shown in FIG. 3: mixed pooling replaces the maximum pooling layers, which increases the randomness and diversity of the features and improves the generalization performance of the network. The mixed pooling is defined as:
y_mix = f * [max_(x_k ∈ R) x_k ⊕ (1/|R|) Σ_(x_k ∈ R) x_k]
where y_mix is the output of the mixed pooling, x_k are the elements covered by the pooling region R, f is a 1 × 1 convolution filter, * denotes convolution, and ⊕ denotes the depth concatenation of the outputs of the two pooling operations.
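For illustration only, the mixed pooling can be sketched as below, matching the definition above (max- and average-pooled maps depth-concatenated and fused by a 1 × 1 convolution); the kernel size of 5 follows the usual SPPF convention and is an assumption:

    import torch
    import torch.nn as nn

    class MixPool(nn.Module):
        def __init__(self, channels, k=5):
            super().__init__()
            self.max_pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
            self.avg_pool = nn.AvgPool2d(k, stride=1, padding=k // 2)
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)  # the 1x1 filter f

        def forward(self, x):
            # depth concatenation of the two pooling outputs, then 1x1 fusion
            return self.fuse(torch.cat([self.max_pool(x), self.avg_pool(x)], dim=1))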
Further, in step 5), as shown in FIG. 4, the Neck part of the network is improved, and the data obtained in step 4) are sent to the improved Neck part, which consists of a convolutional layer module, an upsampling module, a feature fusion module and an improved ConvNeXt module; the improved ConvNeXt module introduces the SiLU activation function into the ConvNeXt module to raise the inference speed of the model. Upsampling and feature fusion of the multi-scale infrared flame image feature maps yield tensor data of different scales. The SiLU function is defined as:
a_k(S) = z_k · σ(z_k), where z_k = Σ_i w_ik S_i + b_k
where a_k(S) is the output of the SiLU activation function, i is the neuron index, S_i is the input vector, b_k is the bias, σ is the Sigmoid activation function, and w_ik is the weight connected to hidden unit k.
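For illustration only, an S-ConvNeXt-style block can be sketched as a standard ConvNeXt block whose GELU activation is replaced by SiLU; the layer sizes follow the public ConvNeXt block and are assumptions, not taken from the application:

    import torch
    import torch.nn as nn

    class SConvNeXtBlock(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
            self.norm = nn.LayerNorm(dim)
            self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expansion
            self.act = nn.SiLU()                    # SiLU replaces GELU: silu(z) = z * sigmoid(z)
            self.pwconv2 = nn.Linear(4 * dim, dim)

        def forward(self, x):
            shortcut = x
            x = self.dwconv(x)
            x = x.permute(0, 2, 3, 1)               # N,C,H,W -> N,H,W,C for LayerNorm/Linear
            x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
            x = x.permute(0, 3, 1, 2)
            return shortcut + x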
Further, in step 6), the average precision is used as the index measuring detection accuracy for multi-label images. It is obtained by drawing the PR curve, a two-dimensional curve with precision on the vertical axis and recall on the horizontal axis. The inference speed is defined as the number of images detectable per second. Floating-point operations (FLOPs) are used to evaluate the computational complexity.
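For illustration only, the three measures can be computed as in the sketch below, where model stands for the trained detector and x for a batch of input images; the use of the third-party thop package for operation counting is an assumption, as the application names no tool:

    import time
    import numpy as np
    import torch

    def average_precision(precision, recall):
        """Area under the PR curve (all-point interpolation)."""
        p = np.concatenate(([0.0], precision, [0.0]))
        r = np.concatenate(([0.0], recall, [1.0]))
        p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
        idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
        return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

    @torch.no_grad()
    def inference_speed(model, x, runs=100):
        """Images processed per second."""
        model.eval()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        return runs * x.shape[0] / (time.perf_counter() - start)

    # computational cost, e.g. with the thop package (an assumption; note that
    # thop counts multiply-accumulate operations):
    # from thop import profile
    # ops, params = profile(model, inputs=(x,))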
Simulation experiment: the platform of this embodiment is the Ubuntu 18.04 operating system, and the model in the experiment is implemented with PyTorch 1.8.1 and Python 3.8 as the deep learning framework. The hardware comprises one Intel(R) Xeon(R) Platinum 8255C CPU and one RTX 2080Ti 11 GB GPU. The specific experimental steps are as follows:
data set selection: 1449 pictures of the expansion data set selected in the training are marked, 1159 pictures are used as training sets, 144 pictures are used as verification sets, and 146 pictures are used as test sets;
improved YOLO V5 training parameter settings: the training number of the wheels is 200, the step length is 16, namely, the data put in once is 16 pictures, and the initial learning rate is as follows: 0.001.
training result analysis: the infrared flame target detection result based on the improved YOLO V5 was evaluated by average accuracy, inference speed and floating point operation. The larger the average accuracy value, the higher the target detection accuracy. The faster the reasoning speed, the better the real-time of the target detection network. The smaller the floating point operation, the less computationally intensive the network. Fig. 5 is a graph of average accuracy parameters of improved YOLO V5, whose vertical axis is the average accuracy value, whose horizontal axis is the training round number, and whose average accuracy value of improved YOLO V5 is 97.4 and whose average accuracy value of unmodified YOLO V5 is 95.4 after about 200 iterations, and whose target detection accuracy is higher. In the aspect of reasoning speed, 143 pictures can be detected by the improved YOLO V5 network in one second, 100 pictures can be detected by the unmodified YOLO V5 network in one second, the detection speed of the improved YOLO V5 algorithm is improved by 31%, and the real-time performance of the improved YOLO V5 network is better. In terms of floating point operand, the operand of the improved YOLO V5 is 5.5G, the operand of the unmodified YOLO V5 is 15.8G, the operand of the improved YOLO V5 algorithm is reduced by 65%, and the operand is smaller.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (8)

1. A lightweight infrared flame detection method based on improved YOLO V5, comprising the following steps:
acquiring an infrared flame data set, and constructing an experimental data set based on a CycleGAN model and the infrared flame data set;
classifying and labeling the experimental data set to obtain an image data set;
constructing an improved YOLO V5 network;
processing the image data set based on the improved YOLO V5 network to obtain tensor data of different scales;
acquiring infrared flame target detection boxes based on the multi-scale tensor data;
and evaluating the computational cost, detection accuracy and inference speed of the infrared flame target detection boxes, and carrying out infrared flame detection based on the evaluated model.
2. The lightweight infrared flame detection method based on improved YOLO V5 of claim 1, wherein the infrared flame data set comprises infrared flame images and visible-light flame images, and constructing the experimental data set based on the CycleGAN model and the infrared flame data set comprises:
constructing a CycleGAN model based on the PyTorch deep learning framework, and training the CycleGAN model on the infrared flame images and the visible-light images;
and converting the visible-light images into infrared flame images with the trained CycleGAN model, and combining them with the Corsican Fire data set and the FLAME data set to construct the experimental data set.
3. The lightweight infrared flame detection method based on improved YOLO V5 of claim 1, wherein the process of constructing the improved YOLO V5 network comprises:
replacing the original backbone network of the YOLO V5 network with a ShuffleNetV2-CA-Lite structure;
replacing the maximum pooling layers with mixed pooling to obtain an improved SPPF structure;
and replacing the C3 structure in the Neck part with an S-ConvNeXt structure to obtain the improved Neck part.
4. The lightweight infrared flame detection method based on improved YOLO V5 of claim 3, wherein processing the image data set based on the improved YOLO V5 network comprises:
acquiring infrared flame image feature maps of different scales from the image data set based on the improved YOLO V5 network;
optimizing the randomness and diversity of the features of the multi-scale infrared flame image feature maps based on the improved SPPF structure;
and performing upsampling and feature fusion on the optimized infrared flame image feature maps based on the improved Neck part to obtain tensor data of different scales.
5. The lightweight infrared flame detection method based on improved YOLO V5 of claim 3, wherein the ShuffleNetV2-CA-Lite structure comprises a ShuffleNetV2 module and a CA-Lite module;
the CA-Lite module comprises a CA attention mechanism and a Ghost convolution;
the Ghost convolution is defined as:
Y' = X * f'
where X ∈ R^(c×h×w), Y' ∈ R^(h'×w'×m) and f' ∈ R^(c×k×k×m); c is the number of channels of the input feature map, Y' is the output feature map with m channels, f' is the convolution filter, h' and w' are the height and width of the output data, respectively, and k × k is the kernel size of the convolution filter f';
the output of the CA attention mechanism is:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
where y_c(i, j) is the output of the CA attention mechanism, x_c(i, j) is the input feature map at position (i, j), and g_c^h(i) and g_c^w(j) are the attention weights in the two spatial directions, respectively.
6. The lightweight infrared flame detection method based on improved YOLO V5 of claim 3, wherein the mixed pooling is defined as:
y_mix = f * [max_(x_k ∈ R) x_k ⊕ (1/|R|) Σ_(x_k ∈ R) x_k]
where y_mix is the output of the mixed pooling, x_k are the elements covered by the pooling region R, f is a 1 × 1 convolution filter, * denotes convolution, and ⊕ denotes the depth concatenation of the outputs of the two pooling operations.
7. The lightweight infrared flame detection method based on improved YOLO V5 of claim 3, wherein the improved Neck part comprises a convolutional layer module, an upsampling module, a feature fusion module and an improved ConvNeXt module, the improved ConvNeXt module introducing a SiLU activation function into the ConvNeXt module;
the SiLU activation function is defined as:
a_k(S) = z_k · σ(z_k), where z_k = Σ_i w_ik S_i + b_k
where a_k(S) is the output of the SiLU activation function, i is the neuron index, S_i is the input vector, b_k is the bias, σ is the Sigmoid activation function, and w_ik is the weight connected to hidden unit k.
8. The lightweight infrared flame detection method based on improved YOLO V5 of claim 1, wherein the process of evaluating the computational cost, detection accuracy and inference speed of the infrared flame target detection boxes comprises:
drawing a PR curve, computing the average precision from the PR curve, and evaluating detection accuracy based on the average precision;
measuring the number of images the detection model can process per second, and evaluating the inference speed;
and counting floating-point operations to obtain the computational complexity and evaluate the computational cost.
CN202310824468.1A 2023-07-06 2023-07-06 Lightweight infrared flame detection method based on improved YOLO V5 Pending CN116863271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310824468.1A CN116863271A (en) 2023-07-06 2023-07-06 Lightweight infrared flame detection method based on improved YOLO V5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310824468.1A CN116863271A (en) 2023-07-06 2023-07-06 Lightweight infrared flame detection method based on improved YOLO V5

Publications (1)

Publication Number Publication Date
CN116863271A (en) 2023-10-10

Family

ID=88227983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310824468.1A Pending CN116863271A (en) 2023-07-06 2023-07-06 Lightweight infrared flame detection method based on improved YOLO V5

Country Status (1)

Country Link
CN (1) CN116863271A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975040A (en) * 2024-03-28 2024-05-03 南昌工程学院 GIS infrared image recognition system and method based on improved YOLOv5


Similar Documents

Publication Publication Date Title
CN112966684B (en) Cooperative learning character recognition method under attention mechanism
CN108038846A (en) Transmission line equipment image defect detection method and system based on multilayer convolutional neural networks
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN113807464B (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
CN114495029B (en) Traffic target detection method and system based on improved YOLOv4
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN112101189B (en) SAR image target detection method and test platform based on attention mechanism
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
Wan et al. Mixed local channel attention for object detection
CN116863274A (en) Semi-supervised learning-based steel plate surface defect detection method and system
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN116863271A (en) Lightweight infrared flame detection method based on improved YOLO V5
CN112465057A (en) Target detection and identification method based on deep convolutional neural network
CN116665148A (en) Marine ship detection method based on synthetic aperture radar data
CN115423995A (en) Lightweight curtain wall crack target detection method and system and safety early warning system
CN110046595A (en) A kind of intensive method for detecting human face multiple dimensioned based on tandem type
Yang et al. An effective and lightweight hybrid network for object detection in remote sensing images
Li et al. Online rail fastener detection based on YOLO network
CN114998866A (en) Traffic sign identification method based on improved YOLOv4
CN115661703A (en) Method for extracting shop signboard information based on deep learning
CN114565753A (en) Unmanned aerial vehicle small target identification method based on improved YOLOv4 network
Yuan et al. GDCP-YOLO: Enhancing steel surface defect detection using lightweight machine learning approach
Wang et al. Underground defects detection based on GPR by fusing simple linear iterative clustering phash (SLIC-phash) and convolutional block attention module (CBAM)-YOLOv8
Li et al. An improved SSD lightweight network with coordinate attention for aircraft target recognition in scene videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination