CN115457428A - Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention

Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention

Info

Publication number: CN115457428A
Application number: CN202210981425.XA
Authority: CN (China)
Prior art keywords: fire, module, network, attention, data
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李晓旭, 张曦, 于春雨
Current Assignee: Shenyang Fire Research Institute of MEM
Original Assignee: Shenyang Fire Research Institute of MEM
Priority date / filing date: 2022-08-16
Publication date: 2022-12-09
Application filed by Shenyang Fire Research Institute of MEM


Classifications

    • G06V 20/40 — Physics; Computing; Image or video recognition or understanding; scenes; scene-specific elements in video content
    • G06N 3/08 — Physics; Computing; Computing arrangements based on specific computational models; biological models; neural networks; learning methods
    • G06T 7/70 — Physics; Computing; Image data processing or generation; image analysis; determining position or orientation of objects or cameras
    • G06V 10/806 — Physics; Computing; Image or video recognition or understanding using pattern recognition or machine learning; fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82 — Physics; Computing; Image or video recognition or understanding using pattern recognition or machine learning; using neural networks
    • G06V 20/44 — Physics; Computing; Image or video recognition or understanding; scenes; scene-specific elements in video content; event detection
    • G06V 2201/07 — Physics; Computing; Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention provides an improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention, wherein the method comprises the following steps: constructing a fire data set, wherein the fire data set comprises video data and first picture data of different fire degrees collected in laboratory ignition experiments, extracting second picture data from the video data, and adding flame and/or smoke marks to the first picture data and the second picture data; establishing an improved YOLOv5 neural network integrating adjustable coordinate residual attention, and training the improved YOLOv5 neural network with the fire data set to serve as a fire detection model; and deploying the fire detection model to a mobile terminal, which, after receiving real-time video data captured by a camera, detects and identifies fire targets in the real-time video data with the fire detection model. The invention can identify and detect not only the flame produced by a fire but also the smoke generated in its early stage, thereby reducing the losses caused by missing the optimal time for remedial action early in a fire.

Description

Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention
Technical Field
The invention relates to the technical field of computer vision and image type fire detection and identification, in particular to an improved YOLOv5 fire detection method and device integrating adjustable coordinate residual error attention.
Background
Accurate early detection of fire is an important means of ensuring fire safety, and it is both valuable and necessary to research fire monitoring and alarm systems with rapid response capability. Such systems have been under investigation for decades.
Chinese patent application CN113869567A describes a control method, device, computer device and storage medium based on fire prediction information for multiple scenes, which mainly performs fire prediction control on fire scene data such as temperature, smoke flow and fire-fighting data. Chinese patent application CN113673748A discloses a fire prediction method based on an XGBoost model; it mainly uses the XGBoost model to predict fires, but the XGBoost model cannot model spatial position and cannot make good use of image data.
Muhammad et al [Muhammad K, Ahmad J et al (2019) Efficient deep CNN-based fire detection and localization in video surveillance applications. IEEE Trans Syst Man Cybern Syst 49(7):1419-1434] classify fire detection methods into two categories: conventional fire alarms and visual-sensor-assisted fire detection. Currently, most fire detection and fire alarm systems are based on conventional fire detectors or fire alarm devices. For example, Xu et al [Xu Y, Zhang J et al (2013) The structure of an automatic fire alarm system based on a visual instrument. J Tianjin Univ Technol 29(3):30-36] propose a fire alarm system based on a fire alarm controller and temperature and smoke detectors. Hu et al [Hu X (2013) Research and production of MIR flame detector system. J Zhejiang Univ 10(1):78] propose a multiband infrared fire detector. However, systems based on these sensors have a limited monitoring range, and their performance is susceptible to environmental changes.
With the popularization of video monitoring systems, research on visual-sensor-assisted fire detection has received much attention. Advantages of image/video-based fire detection include fast response, insensitivity to ambient temperature, and the availability of real-time images or video of the fire scene. In image/video-based fire detectors, fire objects are abstracted into image features generated from color, brightness, texture, shape and motion information. Toptas proposes a network-camera-based remote video monitoring system and image processing technology for fire monitoring and alarm [Toptas B, Hanbay D. A new artificial bee colony algorithm-based color space for fire/flame detection [J]. Soft Computing, 2019(2):1-12]. Wan et al [Wan Z (2020) Fire detection from images based on single shot multibox detector. Hohai University, Nanjing] propose an improved SSD for detecting fires in images by using data augmentation and modifying the proportion and number of default boxes, but with an accuracy of only 84.75%. Shen et al [Shen D, Chen X, Yan W (2018) Flame detection using deep learning. In: Proceedings of the 2018 4th International Conference on Control, Automation and Robotics, pp 416-420] propose an optimized YOLO model for detecting flame objects in video frames. However, the data set employed lacks diversity because the samples come from only 194 images.
The difficulty of fire identification based on digital image processing lies in the segmentation and extraction of the flame target. Previously, flame and smoke targets were extracted mainly by region-extraction methods and contour-tracing techniques. In practical applications, however, the obtained images are noisy and the noise regions vary in size, which often damages the image. This not only takes a lot of time but also leaves a gap between the extracted object and its actual contour; and because no attention mechanism is used to extract more feature information from the training images in a targeted way, mis-recognition or missed recognition can occur, which affects the speed and accuracy of flame recognition.
Disclosure of Invention
In view of the above, the present invention proposes an improved YOLOv5 fire detection method and apparatus incorporating adjustable coordinate residual attention that overcomes or at least partially addresses the above-mentioned problems.
The invention provides an improved YOLOv5 fire detection method integrated with adjustable coordinate residual attention, which comprises the following steps:
constructing a fire data set, wherein the fire data set comprises video data and first picture data of different fire degrees collected in a laboratory ignition experiment, extracting second picture data from the video data, and adding marks of flame and/or smoke to the first picture data and the second picture data;
establishing an improved YOLOv5 neural network integrated with adjustable coordinate residual attention, and training the improved YOLOv5 neural network by using the fire data set to serve as a fire detection model;
and deploying the fire detection model to a mobile terminal, and after the mobile terminal receives real-time video data captured by a camera, detecting and identifying a fire target by the mobile terminal through the fire detection model.
Optionally, the improved YOLOv5 neural network that blends in adjustable coordinate residual attention comprises: backbone network Backbone, neck network Neck and Head network Head;
the Backbone network is mainly used for extracting key features from the input image; the Neck network is mainly used for creating a feature pyramid; the Head network is primarily responsible for the final detection step, which uses anchor boxes to construct the final output vector with class probabilities, objectness scores, and bounding boxes.
Optionally, the establishing of the improved YOLOv5 neural network integrating adjustable coordinate residual attention includes:
an attention mechanism is added to a backbone network for YOLOv5 feature extraction, the attention mechanism is utilized to encode the remote dependency relationship and the position information of the input image from the horizontal and vertical spatial directions respectively, and then the features are aggregated.
Optionally, the final output of the attention mechanism is represented as follows:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

wherein $x$ represents the input feature map, and $g^h$ and $g^w$ respectively represent the attention weights in the two spatial directions, expressed as follows:

$$g^h = \sigma\left(\lambda \cdot F_h(f^h)\right)$$
$$g^w = \sigma\left(\lambda \cdot F_w(f^w)\right)$$

wherein $f^h$ and $f^w$ are respectively the feature tensors obtained by decomposing the information of the feature $F$ in the two directions, $F_h(\cdot)$ and $F_w(\cdot)$ respectively denote convolution operations with 1 × 1 convolution kernels, and $\lambda$ is a hyper-parameter that can automatically adjust the feature weights in the horizontal and vertical directions.

$$[f^h, f^w] = \mathrm{Split}\left(\delta\left(F_1\left(\mathrm{Concat}(z^h, z^w)\right)\right)\right)$$

wherein $z^h$ and $z^w$ respectively represent the original features in the two directions, $\mathrm{Concat}(\cdot)$ represents the splicing operation of the two features, $F_1$ is a shared 1 × 1 convolution and $\delta$ a non-linear activation.

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

wherein $H$ and $W$ are respectively the height and width of the input feature map, $z_c^h(h)$ is the output of the $c$-th channel with height $h$, $z_c^w(w)$ is the output of the $c$-th channel with width $w$, and $x_c$ is the input image feature of the $c$-th channel.
Optionally, the backbone network includes four Bottleneck-CSP-New modules that replace the Bottleneck-CSP modules in the original YOLOv5 neural network;
the Bottleneck-CSP-New module comprises a first module and a second module. The first module uses a 1 × 1 convolution layer to reduce the number of channels by half, then controls the number of hidden-layer channels through a Bottleneck module with a residual structure and its parameters, and then passes through a plain Conv2d layer without BN or an activation function. The second module passes the input features through unchanged and performs a shortcut connection with the output of the first module; the result is finally output after BN + ReLU and an ordinary Conv2d convolution.
Optionally, the loss function of the fire detection model is as follows:

$$L_{\mathrm{CIoU}} = 1 - IoU + \frac{\rho^2(a, a^{gt})}{c^2} + \alpha v$$

which is the total bounding-box loss function of the model, with the aspect-ratio term given by:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein $w^{gt}/h^{gt}$ represents the aspect ratio of the target detection frame and $w/h$ represents the aspect ratio of the prediction detection frame;

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

wherein $A$ and $B$ respectively denote the prediction detection frame and the target detection frame, $a$ and $a^{gt}$ respectively represent the center points of the prediction detection frame and the target detection frame, $\rho(\cdot)$ represents the Euclidean distance between the two center points, $c$ represents the diagonal length of the minimum enclosing region that simultaneously contains the prediction frame and the target frame, and $\alpha$ is an adjustable trade-off hyper-parameter.

$$FL(p) = -\alpha\left[y(1-p)^{\gamma}\log p + (1-y)\,p^{\gamma}\log(1-p)\right]$$

wherein $\alpha$ is set to 1, $\gamma$ is set to 2, $p$ is the predicted probability, and $y$ indicates whether the sample is a positive sample. The focal loss function replaces the cross-entropy loss function as the confidence and classification loss of the network.
The invention also provides an improved YOLOv5 fire detection device incorporating adjustable coordinate residual attention, the device comprising:
the system comprises a data collection module, a data analysis module and a data analysis module, wherein the data collection module is used for constructing a fire data set, the fire data set comprises video data and first picture data of different fire degrees collected in a laboratory ignition experiment, extracting second picture data from the video data, and adding marks of flame and/or smoke for the first picture data and the second picture data;
the model establishing module is used for establishing an improved YOLOv5 neural network integrating adjustable coordinate residual attention, and training the improved YOLOv5 neural network with the fire data set to serve as a fire detection model;
and the model deployment module is used for deploying the fire detection model to a mobile terminal, and after the mobile terminal receives the real-time video data captured by the camera, the mobile terminal utilizes the fire detection model to detect and identify the fire target for the real-time video data.
The invention also provides a computer readable storage medium for storing program code for performing the method of any of the above.
The present invention also provides a computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform any of the methods described above according to instructions in the program code.
Aiming at the problems of low detection precision and low speed in existing fire detection methods and sensors, the invention provides an improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention on the basis of analyzing fire image characteristics. By adopting the YOLOv5 neural network, the invention can automatically extract and learn image features. First, an attention mechanism is added and position information is embedded into the channel attention, so that the network can obtain information over a larger range, improving the detection precision for small targets and fuzzy smoke boundaries. Meanwhile, the Bottleneck-CSP module in the backbone network is improved, reducing the model parameters and the model size, which provides effective support for model deployment. The method can quickly and accurately identify detection objects and visually detect fire in real time. It can identify and detect not only the flame produced by a fire but also the smoke generated in the early stage of a fire, reducing the losses caused by missing the optimal time for remedial action and allowing early fires to be detected in time.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof taken in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flow chart of an improved YOLOv5 fire detection method incorporating adjustable coordinate residual attention according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the overall structure of an improved YOLOv5 network according to an embodiment of the present invention;
FIG. 3 illustrates an overall implementation of an attention mechanism of an embodiment of the present invention;
FIG. 4 is a schematic diagram showing a before-and-after comparison of the original BottleneckCSP module and the improved module according to an embodiment of the present invention;
fig. 5 shows a schematic structural diagram of an improved YOLOv5 fire detection device incorporating adjustable coordinate residual attention according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An embodiment of the present invention provides an improved YOLOv5 fire detection method incorporating adjustable coordinate residual attention, and as shown in fig. 1, the improved YOLOv5 fire detection method incorporating coordinate attention according to the embodiment of the present invention may include at least the following steps S101 to S103.
S101, a fire data set is constructed, the fire data set comprises video data and first picture data of different fire degrees collected in a laboratory ignition experiment, second picture data are extracted from the video data, and marks of flames and/or smoke are added to the first picture data and the second picture data.
Optionally, videos and pictures of different fire degrees can be collected by performing multiple ignition tests in a laboratory, in which the burning-dish sizes follow those specified for image fire detectors in the national standard for special fire detectors; small-target fire picture data and/or video data are collected, and fire pictures in multiple burning states are intercepted from the video data. The fire pictures are marked with a picture marking tool (labelImg): the region of interest of each picture is marked and the flame and smoke parts in each picture are labeled manually. Optionally, the first picture data and the second picture data are divided into pictures containing only a flame target, only a smoke target, or both smoke and flame targets. That is, for any picture data, the target marks can be assigned accordingly, e.g., it is determined whether the picture contains a smoke target, a flame target, or both.
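As an illustration of this data-collection step, the sketch below (not part of the patent) pulls second picture data out of the laboratory ignition videos at a fixed interval so the frames can then be labeled with labelImg; the file paths, the sampling interval `every_n` and the class list are assumptions.

```python
# Illustrative sketch (not from the patent): extract frames from laboratory
# ignition videos at a fixed interval so they can be labelled with labelImg
# in YOLO format. Paths, sampling interval and class order are assumptions.
import cv2
from pathlib import Path

CLASSES = ["flame", "smoke"]  # assumed class order for the YOLO labels

def extract_frames(video_path: str, out_dir: str, every_n: int = 30) -> int:
    """Save every `every_n`-th frame of a fire video as a JPEG picture."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:  # keep one frame per `every_n` frames
            name = f"{Path(video_path).stem}_{idx:06d}.jpg"
            cv2.imwrite(str(out / name), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

if __name__ == "__main__":
    n = extract_frames("data/videos/burn_dish_01.mp4", "data/images/train")
    print(f"extracted {n} second-picture frames for labelling")
```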
S102, establishing an improved YOLOv5 neural network fused with coordinate attention, and training the improved YOLOv5 neural network by using the fire data set to serve as a fire detection model;
s103, deploying the fire detection model to a mobile terminal, and after receiving real-time video data captured by a camera, detecting and identifying a fire target by the mobile terminal through the fire detection model.
In this embodiment, the traditional YOLOv5 neural network is improved and an attention mechanism is added; the marked fire data set is converted into the format required by the neural network and input into the improved YOLOv5 network for training and testing; and the trained model is deployed to a mobile terminal to perform the fire detection and identification task.
As shown in fig. 2, the improved YOLOv5 neural network integrating coordinate attention includes: a Backbone network, a Neck network and a Head network. The Backbone network is mainly used for extracting key features from the input image; the Neck network is mainly used for creating a feature pyramid; the Head network is primarily responsible for the final detection step, which uses anchor boxes to construct the final output vector with class probabilities, objectness scores, and bounding boxes.
The step S102 of building an improved YOLOv5 neural network that incorporates the residual attention of the adjustable coordinate includes: an attention mechanism is added to a backbone network for YOLOv5 feature extraction, the attention mechanism is utilized to encode the remote dependency relationship and the position information of the input image from the horizontal and vertical spatial directions respectively, and then the features are aggregated.
When the attention mechanism is integrated, coordinate attention can be added to the backbone network used for YOLOv5 feature extraction. Coordinate attention is a lightweight and efficient attention mechanism that embeds position information into the channel attention, so that a mobile network can acquire information over a larger range. Fig. 3 shows the overall implementation of the attention mechanism of this embodiment, which is a coordinate attention mechanism. It encodes the remote dependencies and location information from the horizontal and vertical spatial directions respectively and then aggregates the features. To capture position information from space, the features therefore need to be decomposed along the horizontal and vertical directions. For an input feature map $x \in \mathbb{R}^{C \times H \times W}$, pooling kernels of size $(H, 1)$ and $(1, W)$ are used to encode the horizontal-direction and vertical-direction features respectively; the outputs of the $c$-th channel with height $h$ and with width $w$ are represented as:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

wherein $H$ and $W$ are respectively the height and width of the input feature map, $z_c^h(h)$ is the output of the $c$-th channel with height $h$, $z_c^w(w)$ is the output of the $c$-th channel with width $w$, and $x_c$ is the input image feature of the $c$-th channel.

The two transformations above aggregate features along the two spatial directions (X and Y). They generate a pair of direction-aware feature maps that enable the attention mechanism to capture long-range information of the feature map along one spatial path while preserving accurate position information along the other spatial path.

Attention mechanisms are widely used to improve model performance; their inspiration comes from the way the human eye observes things, since the eye always focuses on the most important aspects. Likewise, attention allows the network to focus on important features, which contributes to its accuracy. By applying an attention mechanism to the network model, classification accuracy is further improved. The essence of the attention mechanism is to weight the feature map so that the model attends to important feature information, improving the generalization ability of the model. The SE attention mechanism computes channel attention weights using 2D global pooling and weights the feature information to optimize the model. However, SE attention weights only the channel dimension of the feature map and ignores the spatial dimension, which is crucial in computer vision tasks. CBAM uses channel pooling and convolution to weight the spatial dimension, but convolution cannot capture the relevance of long-range information, which is also crucial for visual tasks. Therefore, the invention fuses a coordinate attention mechanism into the network; the coordinate attention mechanism can obtain cross-directional, position-sensitive and direction-aware information and helps the model focus on useful feature information.

Global Average Pooling (GAP) is commonly used to compute channel attention weights and to globally encode spatial information, applying GAP to each image feature over the spatial dimension H × W. However, it computes the channel attention weight by compressing the global spatial information and thereby loses spatial information. Thus, the two-dimensional global pooling is decomposed into one-dimensional global pooling in the horizontal and vertical directions to use spatial and spectral information efficiently. Specifically, each spectral dimension of the feature map is encoded using 1D horizontal and vertical global pooling with spatial extents (H, 1) and (1, W). The two formulas above allow the correlation of long-range information to be obtained in one spatial direction while preserving location information in the other spatial direction, which helps the network focus on more information useful for classification. The two feature maps generated in the horizontal and vertical directions are then encoded into two attention weights, each weight capturing the relevance of long-range information from the input feature map in one spatial direction.
The two transformed feature maps are spliced in the spatial dimension and the channels are compressed using a 1 × 1 convolution. The spatial information in the vertical and horizontal directions is then encoded using BatchNorm and non-linearities, the encoded information is split, and the attended channels are adjusted to equal the number of channels of the input feature map using a 1 × 1 convolution. Finally, normalization and weighted fusion are performed using the sigmoid function. The final output of the attention mechanism is expressed as follows:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

wherein $x$ represents the input feature map, $c$ represents the $c$-th channel, $h$ and $w$ represent the height and width of the input feature map, and the subscripts $i$ and $j$ index the height and width of the current vector. $g^h$ and $g^w$ respectively denote the attention weights in the two spatial directions, expressed as follows:

$$g^h = \sigma\left(\lambda \cdot F_h(f^h)\right)$$
$$g^w = \sigma\left(\lambda \cdot F_w(f^w)\right)$$

wherein $f^h$ and $f^w$ are respectively the feature tensors obtained by decomposing the information of the feature $F$ in the two directions, $F_h(\cdot)$ and $F_w(\cdot)$ respectively denote convolution operations with 1 × 1 convolution kernels, and $\lambda$ is a hyper-parameter that can automatically adjust the feature weights in the horizontal and vertical directions.

The method performs target detection on flames, and considers that the flame shape changes constantly over time and has different change characteristics in the horizontal and vertical directions. Therefore, the hyper-parameter $\lambda$ is used to adjust the respective influence of the horizontal-direction and vertical-direction changes on recognition. Meanwhile, the initial features of the flame are kept through a residual connection and combined with the coordinate attention features to achieve a better recognition effect.

$$[f^h, f^w] = \mathrm{Split}\left(\delta\left(F_1\left(\mathrm{Concat}(z^h, z^w)\right)\right)\right)$$

wherein $z^h$ and $z^w$ respectively represent the original features in the two directions, $\mathrm{Concat}(\cdot)$ represents the splicing operation of the two features, $F_1$ is the shared 1 × 1 convolution and $\delta$ the non-linear activation mentioned above.
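The following PyTorch sketch shows one plausible reading of the CA-Res (adjustable coordinate residual attention) block described above: (H, 1) and (1, W) pooling, concatenation and 1 × 1 compression, a split into the two directional branches, attention weights scaled by an adjustable hyper-parameter, and a residual connection back to the input. The exact placement of the hyper-parameter `lam`, the reduction ratio and the residual form are assumptions for illustration, not the patented implementation.

```python
# Minimal sketch of an adjustable coordinate residual attention (CA-Res) block.
# The placement of `lam` and the residual connection are assumptions.
import torch
import torch.nn as nn

class CAResAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32, lam: float = 0.5):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (H, 1) pooling: z^h
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (1, W) pooling: z^w
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # F_h
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # F_w
        self.lam = lam                                         # adjustable weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        z_h = self.pool_h(x)                       # N x C x H x 1
        z_w = self.pool_w(x).permute(0, 1, 3, 2)   # N x C x W x 1
        # splice the two directional features and compress channels (1x1 conv)
        f = self.act(self.bn1(self.conv1(torch.cat([z_h, z_w], dim=2))))
        f_h, f_w = torch.split(f, [h, w], dim=2)
        f_w = f_w.permute(0, 1, 3, 2)
        # adjustable attention weights for the two spatial directions
        g_h = torch.sigmoid(self.lam * self.conv_h(f_h))
        g_w = torch.sigmoid(self.lam * self.conv_w(f_w))
        # residual connection keeps the initial flame features
        return x + x * g_h * g_w
```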
Further, the backbone network of the embodiment of the invention comprises four Bottleneck-CSP-New modules that replace the Bottleneck-CSP modules in the original YOLOv5 neural network. The Bottleneck-CSP-New module comprises a first module and a second module: the first module uses a 1 × 1 convolution layer to reduce the number of channels by half, then controls the number of hidden-layer channels through a Bottleneck module with a residual structure and its parameters, and then passes through a plain Conv2d layer without BN or an activation function; the second module performs a shortcut connection between the unchanged input features and the output of the first module, and the result is finally output after BN + ReLU and an ordinary Conv2d convolution.
The size of the modified model should be reduced as much as possible for deployment on a hardware device; therefore, the backbone network of the YOLOv5 model is modified. The backbone network of the YOLOv5s architecture includes four Bottleneck-CSP modules, each with a number of convolutional layers. Although the convolution process can extract image features, a convolution kernel has many parameters, resulting in many parameters in the recognition model. Thus, the convolutional layers on one branch of the original CSP module are deleted, and the input feature map of the Bottleneck-CSP module is connected directly to the output feature map on that branch, which greatly reduces the number of parameters in the module. The four stages of the original backbone network that use the Bottleneck-CSP module are replaced by four Bottleneck-CSP-New modules. The lightweight nature of these modules can weaken deep feature extraction from the image, but combined with the attention mechanism the image feature information can still be extracted well while the model parameters are reduced, which makes the model convenient to deploy.
Fig. 4 shows a before-and-after comparison of the original BottleneckCSP module and the improved module according to an embodiment of the present invention. The original Bottleneck-CSP module is divided into a Bottleneck part and a CSP part, and the input features pass through two different branches. The first branch uses a 1 × 1 convolution layer (Conv2d + BN + Hardswish) to reduce the number of channels by half, then controls the number of hidden-layer channels through a Bottleneck module with a residual structure, and then passes through a Conv2d layer without BN or an activation function. The second branch passes the input features through a Conv2d layer, also without BN or an activation function. The outputs of the two branches are joined by a shortcut connection, and the result is finally output after BN + ReLU and an ordinary Conv2d convolution.
The modified Bottleneck-CSP-New module is also divided into two branches. The first branch is consistent with the original feature-extraction flow; in the second branch, the input features are passed through unchanged and joined with the output of the first branch by a shortcut connection, and the result is finally output after BN + ReLU and an ordinary Conv2d convolution.
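A minimal sketch of the Bottleneck-CSP-New block as described above, under the assumption that the shortcut between the unchanged input (branch 2) and the output of branch 1 is realized by concatenation; channel counts and module names are likewise assumed.

```python
# Illustrative Bottleneck-CSP-New block: branch 1 = 1x1 Conv (halves channels)
# -> Bottleneck -> plain Conv2d (no BN / activation); branch 2 = unchanged
# input; joined by a shortcut (concat), then BN + ReLU and a final Conv2d.
import torch
import torch.nn as nn

class ConvBNH(nn.Module):
    """Conv2d + BN + Hardswish, the standard YOLOv5-style convolution unit."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = ConvBNH(c, c, 1)
        self.cv2 = ConvBNH(c, c, 3)
        self.add = shortcut
    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class BottleneckCSPNew(nn.Module):
    def __init__(self, c1, c2, n=1):
        super().__init__()
        c_ = c1 // 2                                   # hidden channels (halved)
        self.cv1 = ConvBNH(c1, c_, 1)                  # 1x1 conv, halves channels
        self.m = nn.Sequential(*(Bottleneck(c_) for _ in range(n)))
        self.cv2 = nn.Conv2d(c_, c_, 1, bias=False)    # plain Conv2d, no BN/act
        self.bn = nn.BatchNorm2d(c_ + c1)
        self.act = nn.ReLU()
        self.cv3 = nn.Conv2d(c_ + c1, c2, 1)           # final ordinary Conv2d
    def forward(self, x):
        y1 = self.cv2(self.m(self.cv1(x)))             # first branch
        y2 = x                                         # second branch: unchanged input
        return self.cv3(self.act(self.bn(torch.cat((y1, y2), dim=1))))
```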
After the modified YOLOv5 neural network is obtained, the training of the modified YOLOv5 neural network in step S102 may be continued. Training the improved YOLOv5 neural network may be performed in the following manner.
1. First, the labeled data set is input in the format required by the neural network. The input end adopts Mosaic data enhancement: images are spliced by random scaling, random cropping and random arrangement, which noticeably improves detection of small-target fire images. Adaptive anchor-box calculation is performed, adaptively computing the optimal anchor-box values for different training sets at each training run. A simplified Mosaic sketch is given below.
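A simplified Mosaic sketch, assuming a 640 × 640 canvas and reducing the random scale/crop step to a plain resize; label handling is omitted.

```python
# Simplified Mosaic augmentation: four randomly scaled images are pasted into
# the four quadrants of one canvas. Canvas size and scale range are assumptions.
import random
import numpy as np
import cv2

def mosaic4(images, size: int = 640) -> np.ndarray:
    """Combine four images (H x W x 3 uint8 arrays) into one mosaic image."""
    assert len(images) == 4
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)   # grey padding
    cx, cy = (int(random.uniform(0.25, 0.75) * size) for _ in range(2))
    anchors = [(0, 0, cx, cy), (cx, 0, size, cy),
               (0, cy, cx, size), (cx, cy, size, size)]
    for img, (x1, y1, x2, y2) in zip(images, anchors):
        w, h = x2 - x1, y2 - y1
        canvas[y1:y2, x1:x2] = cv2.resize(img, (w, h))        # scale/crop simplified to resize
    return canvas
```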
2. Secondly, in the network training stage, the detailed steps of each module are as follows:
(1) The Backbone network Backbone is mainly used for extracting key features from input images.
The Focus layer is the initial layer of the backbone network and is used to simplify model computation and improve training speed. It works as follows: using a slicing operation, the three-channel image is first divided into four slices, each of size 3 × 320 × 320; the four parts are then concatenated in depth, giving an output feature map of size 12 × 320 × 320 (see the slicing sketch below).
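The slicing step can be restated as the following minimal sketch (tensor shapes as in the text; not the patented code):

```python
# Focus slicing: a 3 x 640 x 640 image becomes four 3 x 320 x 320 slices
# concatenated along the channel dimension into a 12 x 320 x 320 map.
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    # x: N x 3 x 640 x 640 -> N x 12 x 320 x 320
    return torch.cat([x[..., ::2, ::2],    # top-left pixels
                      x[..., 1::2, ::2],   # bottom-left pixels
                      x[..., ::2, 1::2],   # top-right pixels
                      x[..., 1::2, 1::2]], # bottom-right pixels
                     dim=1)

if __name__ == "__main__":
    print(focus_slice(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 12, 320, 320])
```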
The Conv layer is the second layer of the backbone network. It convolves the input feature map with a convolution layer composed of 32 convolution kernels, producing an output feature map of size 32 × 320 × 320. The result is then passed to the next layer through a BN (batch normalization) layer and the Hardswish activation function.
The BottleneckCSP-New module is the third layer of the backbone network and is designed to extract the depth information of the image more effectively. It is built around the Bottleneck module, which connects a convolution layer (Conv2d + BN + ReLU activation) with a 1 × 1 kernel to one with a 3 × 3 kernel; the final output of the Bottleneck module is the sum, through the residual structure, of this output and the initial input. The input of the BottleneckCSP-New module is split into two branches: in branch 1, a convolution halves the number of feature-map channels before the Bottleneck module, while branch 2 carries the input features unchanged. The output feature maps of branch 1 and branch 2 are then concatenated in depth using concat. Finally, after passing through the BN and Conv2d layers, the output feature map of the module is created.
The CA-Res module is the fourth layer of the backbone network and adopts the attention mechanism, which encodes the remote dependencies and position information of the input image from the horizontal and vertical spatial directions respectively, learns the horizontal-direction and vertical-direction features through the adjustable residual structure, and then aggregates them. Specifically, the feature map output by the BottleneckCSP-New module, $x \in \mathbb{R}^{C \times H \times W}$, is encoded with pooling kernels of size $(H, 1)$ and $(1, W)$ along the horizontal and vertical directions respectively; the directional features are learned through the adjustable residual structure, aggregated, and then normalized and fused with the input features by weighted fusion using the sigmoid function.
The Conv module, the BottleneckCSP-New module and the CA module are then repeated twice, and one more Conv convolution is applied to the output image features.
The SPP block (spatial pyramid pooling) is the twelfth layer of the backbone network and is designed to increase the receptive field of the network by converting a feature map of arbitrary size into a feature vector of fixed size. After the convolutional-layer loop, a feature map of size 256 × 20 × 20 is output using a 1 × 1 convolution kernel. The feature map is then sub-sampled by three parallel max-pooling layers, and the results are concatenated in depth with the input feature map, giving an output feature map of size 1024 × 20 × 20 (a minimal SPP sketch follows below).
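A minimal SPP sketch under the usual YOLOv5 assumptions (1 × 1 channel compression, three parallel max-pooling layers with kernels 5/9/13, concatenation with the input of the pooling stage):

```python
# Spatial pyramid pooling block: 1x1 conv, three parallel max-pool layers,
# depth concatenation, and a fusing 1x1 conv. Kernel sizes are assumptions.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, c1: int, c2: int, kernels=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = nn.Conv2d(c1, c_, 1)                       # channel compression
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels)
        self.cv2 = nn.Conv2d(c_ * (len(kernels) + 1), c2, 1)  # fuse concatenated maps

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```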
(2) The Neck model is mainly used to create a feature pyramid. The feature pyramid helps the model generalize over object scaling and helps identify the same object at different sizes and scales.

$$k = \left\lfloor k_0 + \log_2\left(\frac{\sqrt{xy}}{224}\right)\right\rfloor$$

This formula is used to select a feature map, where 224 is the canonical pre-training image size, $x$ and $y$ are respectively the width and height of the RoI (region of interest), and $k_0$ is the target level onto which a region of interest with $x \times y = 224^2$ should be mapped. The feature pyramid is very beneficial in helping the model perform well on unknown data.
(3) The Head model is primarily responsible for the final detection step, which uses anchor boxes to construct the final output vector with class probabilities, objectness scores, and bounding boxes. The detection network of the YOLOv5s structure includes three detection layers, whose input feature maps have sizes of 80 × 80, 40 × 40 and 20 × 20, for detecting image objects of various sizes. Each detection layer outputs a 21-channel vector: for each of its three anchor boxes, two class probabilities, one objectness (confidence) score and four bounding-box position coordinates. The predicted bounding box and category of each target in the original image are then generated and marked, thereby realizing the detection of image targets (an illustrative decoding sketch is given below).
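An illustrative decoding of one such detection layer, assuming the usual YOLOv5 box parameterization; the grid size, stride and anchor sizes in the example are made up for demonstration:

```python
# Decode one 21-channel detection layer: 3 anchors x (4 box offsets +
# 1 objectness + 2 class probabilities). Stride and anchors are assumptions.
import torch

def decode_layer(pred: torch.Tensor, anchors: torch.Tensor, stride: int):
    """pred: N x 21 x H x W -> boxes (N x H x W x 3 x 4, pixels), obj, cls."""
    n, _, h, w = pred.shape
    p = pred.view(n, 3, 7, h, w).permute(0, 3, 4, 1, 2)        # N x H x W x 3 x 7
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((gx, gy), dim=-1).view(1, h, w, 1, 2).float()
    xy = (p[..., 0:2].sigmoid() * 2 - 0.5 + grid) * stride      # box centres
    wh = (p[..., 2:4].sigmoid() * 2) ** 2 * anchors.view(1, 1, 1, 3, 2)
    obj = p[..., 4].sigmoid()                                   # objectness score
    cls = p[..., 5:7].sigmoid()                                 # flame / smoke probabilities
    return torch.cat((xy, wh), dim=-1), obj, cls

if __name__ == "__main__":
    out = torch.randn(1, 21, 20, 20)
    anchors = torch.tensor([[116, 90], [156, 198], [373, 326]], dtype=torch.float)
    boxes, obj, cls = decode_layer(out, anchors, stride=32)
    print(boxes.shape, obj.shape, cls.shape)
```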
The loss function adopted for computation is the CIoU loss; the loss function of the fire detection model is as follows:

$$L_{\mathrm{CIoU}} = 1 - IoU + \frac{\rho^2(a, a^{gt})}{c^2} + \alpha v$$

This loss function is the total bounding-box regression loss of the model.
When the prediction detection frame and the target detection frame do not intersect, IoU cannot reflect the distance between the two boxes; in that case the loss is not differentiable and cannot be optimized. A second problem is that the IoU values can be identical while the positions of the prediction detection frames differ, so IoU_Loss cannot distinguish how the two boxes intersect. DIoU_Loss solves these problems by considering the overlapping area of the two frames and the distance between their center points. CIoU_Loss further introduces the aspect ratios of the two frames on the basis of DIoU_Loss, so that convergence is faster than when the intersection is 0 and the regression result is better. The aspect-ratio term built from $w^{gt}/h^{gt}$ and $w/h$ is expressed as follows:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein $w^{gt}/h^{gt}$ represents the aspect ratio of the target detection frame and $w/h$ represents the aspect ratio of the prediction detection frame;

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

wherein $A$ and $B$ respectively denote the prediction detection frame and the target detection frame, $a$ and $a^{gt}$ respectively represent the center points of the prediction detection frame and the target detection frame, $\rho(\cdot)$ represents the Euclidean distance between the two center points, $c$ represents the diagonal length of the minimum enclosing region that simultaneously contains the prediction frame and the target frame, and $\alpha$ is an adjustable trade-off hyper-parameter.
In order to solve the problem of the imbalance between positive and negative samples, focal loss is adopted to replace the cross-entropy loss function as the confidence and classification loss of the network. It gives a higher loss weight to the foreground, so that the model concentrates more on classifying the foreground.
$$FL(p) = -\alpha\left[y(1-p)^{\gamma}\log p + (1-y)\,p^{\gamma}\log(1-p)\right]$$

wherein $\alpha$ is set to 1, $\gamma$ is set to 2, $p$ is the predicted probability, and $y$ indicates whether the sample is a positive sample.
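A minimal sketch of these two losses (CIoU for box regression, focal loss with α = 1 and γ = 2 for confidence/classification); the (x1, y1, x2, y2) box format and the epsilon terms are implementation assumptions:

```python
# CIoU loss and binary focal loss, sketched for illustration.
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """pred, target: (..., 4) boxes as (x1, y1, x2, y2). Returns 1 - CIoU."""
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared centre distance over squared diagonal of the minimum enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

def focal_loss(p: torch.Tensor, y: torch.Tensor, alpha: float = 1.0, gamma: float = 2.0):
    """Binary focal loss on probabilities p with labels y in {0, 1}."""
    p = p.clamp(1e-7, 1 - 1e-7)
    return -alpha * (y * (1 - p) ** gamma * torch.log(p) +
                     (1 - y) * p ** gamma * torch.log(1 - p))
```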
The trained fire detection model is deployed to the mobile terminal to detect and identify fire targets: the trained weights and model are deployed to the mobile terminal, video is captured by a camera and passed to the mobile terminal, each frame of the input video stream is then analyzed for features by the fire detection model, and it is judged whether a smoke target and/or a flame target exists, so that an emerging fire is detected in real time. When smoke and/or flame is detected, an alarm is output and a detection frame pops up to prompt the relevant personnel to carry out fire-extinguishing measures.
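A sketch of such a deployment loop, assuming the trained weights are loaded through torch.hub and that a fixed confidence threshold triggers the alarm; the weight path and threshold are placeholders:

```python
# Illustrative camera-stream inference loop; weight path and threshold assumed.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="fire_ca_res.pt")
model.conf = 0.4                                            # confidence threshold

cap = cv2.VideoCapture(0)                                   # camera stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(rgb)
    det = results.xyxy[0]                                   # x1, y1, x2, y2, conf, class
    if len(det):                                            # smoke and/or flame target found
        print("FIRE ALARM: smoke/flame detected")
        for *xyxy, conf, cls in det.tolist():               # draw detection frames
            x1, y1, x2, y2 = map(int, xyxy)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.imshow("fire detection", frame)
    if cv2.waitKey(1) == 27:                                # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```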
An image with fire marks is input into the improved network model and passes through three modules: the backbone network, the neck network and the head network. The backbone network includes the slicing network, convolution network, modified BottleneckCSP network, coordinate attention mechanism and SPP network. The three-channel image is first divided into four slices, each of size 3 × 320 × 320, using the slicing operation; the four parts are then concatenated in depth with an output feature-map size of 12 × 320 × 320, the input features of the image are extracted by a series of feature-extraction networks, and the output feature-map size becomes 1024 × 20 × 20. The neck network retains spatial information through up-sampling and down-sampling operations. Finally, the feature maps of different sizes are sampled and processed to the same size, and feature fusion and convolution operations yield three feature layers of 20 × 20 × 255, 40 × 40 × 255 and 80 × 80 × 255; the loss is calculated with the GIoU loss function, and the prediction results are generated. The multiple target frames are screened using non-maximum suppression.
An embodiment of the present invention further provides an improved YOLOv5 fire detection apparatus incorporating adjustable coordinate residual attention, and as shown in fig. 5, the improved YOLOv5 fire detection apparatus incorporating coordinate attention according to an embodiment of the present invention may include:
a data collecting module 510, configured to construct a fire data set, where the fire data set includes video data and first picture data of different fire degrees collected in a laboratory ignition experiment, extract second picture data from the video data, and add signs of flame and/or smoke to the first picture data and the second picture data;
a model establishing module 520, configured to establish an improved YOLOv5 neural network that incorporates coordinate attention, and train the improved YOLOv5 neural network using the fire data set as a fire detection model;
a model deployment module 530, configured to deploy the fire detection model to a mobile terminal, and after the mobile terminal receives real-time video data captured by a camera, the mobile terminal uses the fire detection model to detect and identify a fire target for the real-time video data.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a program code, and the program code is used for executing the method described in the above embodiment.
An embodiment of the present invention further provides a computing device, where the computing device includes a processor and a memory: the memory is used for storing program codes and transmitting the program codes to the processor; the processor is configured to perform any of the methods described above according to instructions in the program code.
It is clear to those skilled in the art that the specific working processes of the above-described systems, devices, modules and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.
Those of ordinary skill in the art will understand that: the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention when the instructions are executed. And the aforementioned storage medium includes: various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Alternatively, all or part of the steps of the method embodiments may be implemented by hardware (such as a personal computer, a server, or a network device) related to program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.

Claims (9)

1. An improved YOLOv5 fire detection method incorporating adjustable coordinate residual attention, the method comprising:
constructing a fire data set, wherein the fire data set comprises video data and first picture data of different fire degrees collected in a laboratory ignition experiment, extracting second picture data from the video data, and adding marks of flame and/or smoke to the first picture data and the second picture data;
establishing an improved YOLOv5 neural network integrated with adjustable coordinate residual attention, and training the improved YOLOv5 neural network by using the fire data set to serve as a fire detection model;
and deploying the fire detection model to a mobile terminal, and after the mobile terminal receives real-time video data captured by a camera, detecting and identifying a fire target by the mobile terminal through the fire detection model.
2. The method of claim 1, wherein the improved YOLOv5 neural network that blends in adjustable coordinate residual attention comprises: backbone network Backbone, neck network Neck and Head network Head;
the Backbone network is mainly used for extracting key features from the input image; the Neck network is mainly used for creating a feature pyramid; the Head network is primarily responsible for the final detection step, which uses anchor boxes to construct the final output vector with class probabilities, objectness scores, and bounding boxes.
3. The method of claim 2, wherein the building of the improved YOLOv5 neural network that blends in adjustable coordinate residual attention comprises:
adding an attention mechanism into a backbone network for YOLOv5 feature extraction, coding remote dependency relationship and position information of an input image from horizontal and vertical spatial directions respectively by using the attention mechanism, learning horizontal and vertical direction features through an adjustable residual error structure, and aggregating the features in the horizontal and vertical directions.
4. The method of claim 3, wherein the final output of the adjustable coordinate residual attention mechanism is represented as follows:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

wherein $x$ represents the input feature map, and $g^h$ and $g^w$ respectively represent the attention weights in the two spatial directions, expressed as follows:

$$g^h = \sigma\left(\lambda \cdot F_h(f^h)\right)$$
$$g^w = \sigma\left(\lambda \cdot F_w(f^w)\right)$$

wherein $f^h$ and $f^w$ are the feature tensors obtained by decomposing the information of the feature $F$ in the two directions, $F_h(\cdot)$ and $F_w(\cdot)$ respectively denote convolution operations with 1 × 1 convolution kernels, and $\lambda$ is a hyper-parameter that can automatically adjust the feature weights in the horizontal and vertical directions;

$$[f^h, f^w] = \mathrm{Split}\left(\delta\left(F_1\left(\mathrm{Concat}(z^h, z^w)\right)\right)\right)$$

wherein $z^h$ and $z^w$ respectively represent the original features in the two directions, and $\mathrm{Concat}(\cdot)$ represents the splicing operation of the two features;

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

wherein $H$ and $W$ are respectively the height and width of the input feature map, $z_c^h(h)$ is the output of the $c$-th channel with height $h$, $z_c^w(w)$ is the output of the $c$-th channel with width $w$, and $x_c$ is the input image feature of the $c$-th channel.
5. The method of claim 2, wherein the backbone network comprises four Bottleneck-CSP-New modules that replace the Bottleneck-CSP modules in the original YOLOv5 neural network;
the Bottleneck-CSP-New module comprises a first module and a second module; the first module uses a 1 × 1 convolution layer to reduce the number of channels by half, then controls the number of hidden-layer channels through a Bottleneck module with a residual structure and its parameters, and then passes through a plain Conv2d layer without BN or an activation function; the second module performs a shortcut connection between the unchanged input features and the output of the first module, and the result is finally output after BN + ReLU and an ordinary Conv2d convolution.
6. The method of claim 2, wherein the loss function $L_{\mathrm{CIoU}}$ of the fire detection model is as follows:

$$L_{\mathrm{CIoU}} = 1 - IoU + \frac{\rho^2(a, a^{gt})}{c^2} + \alpha v$$

the loss function $L_{\mathrm{CIoU}}$ is the total bounding-box loss function of the model, with the aspect-ratio term specifically as follows:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein $w^{gt}/h^{gt}$ represents the aspect ratio of the target detection frame and $w/h$ represents the aspect ratio of the prediction detection frame;

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

wherein $A$ and $B$ respectively denote the prediction detection frame and the target detection frame, $a$ and $a^{gt}$ respectively represent the center points of the prediction detection frame and the target detection frame, $\rho(\cdot)$ represents the Euclidean distance between the two center points, $c$ represents the diagonal length of the minimum enclosing region that simultaneously contains the prediction frame and the target frame, and $\alpha$ is an adjustable trade-off hyper-parameter;

$$FL(p) = -\alpha\left[y(1-p)^{\gamma}\log p + (1-y)\,p^{\gamma}\log(1-p)\right]$$

wherein $\alpha$ is set to 1, $\gamma$ is set to 2, $p$ is the predicted probability, and $y$ indicates whether the sample is a positive sample; the focal loss function replaces the cross-entropy loss function as the confidence and classification loss of the network.
7. An improved YOLOv5 fire detection device incorporating adjustable coordinate residual attention, the device comprising:
the data collection module is used for constructing a fire data set, wherein the fire data set comprises video data and first picture data of different fire severities collected in laboratory ignition experiments, for extracting second picture data from the video data, and for adding flame and/or smoke labels to the first picture data and the second picture data;
the model building module is used for building an improved YOLOv5 neural network which is fused with adjustable coordinate residual attention, and training the improved YOLOv5 neural network by using the fire data set to serve as a fire detection model;
and the model deployment module is used for deploying the fire detection model to a mobile terminal, so that after the mobile terminal receives real-time video data captured by a camera, it uses the fire detection model to detect and identify fire targets in the real-time video data.
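As a rough illustration of the model deployment module's inference loop, the snippet below loads exported detector weights through the public YOLOv5 hub interface and runs them frame by frame on a live camera stream; the weight file name, confidence threshold, and camera index are hypothetical placeholders.

```python
import cv2
import torch

# Load a custom-trained YOLOv5 detector (the weight path is a placeholder).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='fire_detector.pt')
model.conf = 0.4  # confidence threshold for flame / smoke detections

cap = cv2.VideoCapture(0)  # camera supplying the real-time video data
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(rgb)                    # run fire-target detection on the frame
    detections = results.pandas().xyxy[0]   # boxes with class names (flame / smoke)
    if len(detections):
        print(detections[['name', 'confidence']])
cap.release()
```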
8. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program code for performing the method of any of claims 1-6.
9. A computing device, the computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-6 according to instructions in the program code.
CN202210981425.XA 2022-08-16 2022-08-16 Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention Pending CN115457428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210981425.XA CN115457428A (en) 2022-08-16 2022-08-16 Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210981425.XA CN115457428A (en) 2022-08-16 2022-08-16 Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention

Publications (1)

Publication Number Publication Date
CN115457428A true CN115457428A (en) 2022-12-09

Family

ID=84297762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210981425.XA Pending CN115457428A (en) 2022-08-16 2022-08-16 Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention

Country Status (1)

Country Link
CN (1) CN115457428A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416457A (en) * 2023-02-21 2023-07-11 四川轻化工大学 Safety situation sensing and danger early warning method for electric power maintenance vehicle
CN116416457B (en) * 2023-02-21 2023-10-20 四川轻化工大学 Safety situation sensing and danger early warning method for electric power maintenance vehicle
CN116362139A (en) * 2023-04-14 2023-06-30 应急管理部沈阳消防研究所 Multi-parameter fire detection method based on hierarchical long-short-time memory network
CN116362139B (en) * 2023-04-14 2024-01-30 应急管理部沈阳消防研究所 Multi-parameter fire detection method based on hierarchical long-short-time memory network
CN116342894A (en) * 2023-05-29 2023-06-27 南昌工程学院 GIS infrared feature recognition system and method based on improved YOLOv5
CN117197658A (en) * 2023-08-08 2023-12-08 北京科技大学 Building fire multi-target detection method and system based on multi-situation generated image
CN116843999A (en) * 2023-09-04 2023-10-03 四川泓宝润业工程技术有限公司 Gas cylinder detection method in fire operation based on deep learning
CN116863252A (en) * 2023-09-04 2023-10-10 四川泓宝润业工程技术有限公司 Method, device, equipment and storage medium for detecting inflammable substances in live fire operation site
CN116863252B (en) * 2023-09-04 2023-11-21 四川泓宝润业工程技术有限公司 Method, device, equipment and storage medium for detecting inflammable substances in live fire operation site
CN116843999B (en) * 2023-09-04 2023-12-08 四川泓宝润业工程技术有限公司 Gas cylinder detection method in fire operation based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination