CN114842185A - Method, device, equipment and medium for identifying fire - Google Patents
Method, device, equipment and medium for identifying fire
- Publication number
- CN114842185A (application CN202210304551.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- channel
- module
- convolution
- fire
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention provides a method, a device, equipment and a medium for identifying a fire target, relating to the field of target detection, comprising the following steps: acquiring a plurality of fire images containing smoke or flame, and labeling the smoke or flame in the fire images to generate training samples; establishing an initial model based on a YOLO network structure, and configuring parameters; in the initial model, arranging a compression and excitation network module after the convolution layer, and performing weight assignment on the convolved image in the channel dimension; arranging a convolution block attention module after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively; training the initial model with the training samples to obtain a target model for identifying smoke or flame; and acquiring a real-time image and performing target identification on the real-time image with the target model to obtain a target result. The method solves the problem that existing image-processing-based fire prevention is poor in precision and slow in discovery.
Description
Technical Field
The present invention relates to the field of target detection, and in particular, to a method, an apparatus, a device, and a medium for identifying a fire.
Background
Early discovery of fire has long been a major problem in real life. The physical signs of an early flame are tiny or are shielded by solid objects such as buildings, so that the later fire gets out of control and causes large-scale or serious economic loss and casualties; when a disaster such as a fire occurs, its severity is determined by the speed and accuracy of the response. In practice, devices such as smoke sensors and traditional image processing technologies are widely applied in buildings for fire prevention, but these methods suffer from late discovery, serious misjudgment, low precision, applicability limited to narrow spaces, and unsuitability for large-scale outdoor deployment.
Disclosure of Invention
In order to overcome the above technical defects, the invention aims to provide a method, a device, equipment and a medium for identifying a fire target, which solve the problem that existing image-processing-based fire prevention is low in precision and slow in discovery.
The invention discloses a target identification method for fire, which comprises the following steps:
acquiring a plurality of fire images containing smoke or flame, and labeling the smoke or flame in the fire images to generate training samples;
establishing an initial model based on a YOLO network structure, and configuring parameters; the initial model comprises a Focus block, a convolution block consisting of a convolution layer, a batch normalization layer and an activation layer, a CSP bottleneck layer and a spatial pyramid pooling layer;
in the initial model, arranging a compression and excitation network module after the convolution layer, and performing weight assignment on the convolved image in the channel dimension;
arranging a convolution block attention module after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
training the initial model by adopting the training sample to obtain a target model for identifying smoke or flame;
and acquiring a real-time image, and performing target identification on the real-time image by adopting the target model to obtain a target result.
Preferably, arranging the compression and excitation network module after the convolution layer and performing weight assignment on the convolved image in the channel dimension include:
performing compression operation on the convolved image to obtain a channel-level global feature image;
performing excitation operation on the global feature image to generate corresponding relations and weights of all channels;
and carrying out pixel weighting on the convolved image according to the corresponding relation and the weight of each channel.
Preferably, the compressing the convolved image to obtain the global feature image at the channel level includes:
acquiring spatial features on a channel, and coding the spatial features into global features;
and performing global average pooling on the convolved image based on the global features to obtain a channel-level global feature image.
Preferably, performing the excitation operation on the global feature image to generate the correspondence and weight of each channel includes:
reducing the dimension of the global feature image from an initial dimension with a first fully-connected layer, and processing through an activation function with a first scaling parameter to obtain a first processed image;
and raising the first processed image back to the initial dimension with a second fully-connected layer, and obtaining the correspondence and weight of each channel through an activation function with a second scaling parameter.
Preferably, passing the image output by the convolution block through the channel attention module and the spatial attention module in sequence, so as to perform a further weight assignment in the channel dimension and the spatial dimension respectively, includes:
compressing the image output by the convolution block in the spatial dimension with a channel attention module, and summing and combining the average-pooled and max-pooled mappings to generate a channel attention map;
performing weight assignment in the channel dimension on the image output by the convolution block, based on the channel attention map, to obtain an intermediate image;
compressing the intermediate image in the channel dimension with a spatial attention module, and combining average pooling and max pooling to generate a spatial attention map;
and performing weight assignment on the intermediate image in the spatial dimension based on the spatial attention map.
Preferably, sequentially labeling the smoke or flame in the fire images to generate training samples includes:
sorting the fire images according to the position and size of the smoke or flame;
and marking the smoke or flame in each fire image one by one with YOLO text-format labels and arranging them in sequence to generate the training samples.
Preferably, the target model comprises a trained, fixed-parameter Focus block, convolution block attention module, CSP bottleneck layer, spatial pyramid pooling layer and compression and excitation network module.
The present invention also provides a target identification device for fire, including:
a sample generation module, configured to acquire a plurality of fire images containing smoke or flame, label the smoke or flame in the fire images, and generate training samples;
a model establishing module, configured to establish an initial model based on a YOLO network structure and configure parameters; the initial model comprises a Focus block, a convolution block consisting of a convolution layer, a batch normalization layer and an activation layer, a CSP bottleneck layer and a spatial pyramid pooling layer;
in the initial model, a compression and excitation network module is arranged after the convolution layer to perform weight assignment on the convolved image in the channel dimension;
a convolution block attention module is arranged after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
a model training module, configured to train the initial model with the training samples to obtain a target model for identifying smoke or flame;
and an execution module, configured to acquire a real-time image and perform target identification on the real-time image with the target model to obtain a target result.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the above target identification method are implemented when the processor of the computer device executes the computer program.
The invention also provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the above target identification method.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
In this scheme, fire images are acquired to generate training samples, an initial model based on a YOLO network structure is established, and attention mechanisms in the channel dimension and the spatial dimension are added for identifying small targets such as smoke or flame in a fire; the training samples are then imported for training, and the generated target model is used for identification. Small targets such as smoke or flame can thus be identified rapidly and accurately, and an early warning is triggered immediately, solving the problem that existing image-processing-based fire prevention is poor in precision and therefore slow in discovery.
Drawings
FIG. 1 is a flowchart of a first embodiment of a method for identifying a fire according to the present invention;
FIG. 2 is a flowchart of arranging the compression and excitation network module after the convolution layer and performing weight assignment on the convolved image in the channel dimension, in the first embodiment of the target identification method for fire according to the present invention;
FIG. 3 is a block diagram of a compression and excitation network module according to an embodiment of the method for fire target identification according to the present invention;
FIG. 4 is a flowchart showing the image output by the convolution block passing through the channel attention module and the spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively, in the first embodiment of the target identification method for fire according to the present invention;
FIG. 5 is a block diagram illustrating the convolution block attention module in an embodiment of the method for identifying a fire;
FIG. 6 is a visualization result of a target model in an embodiment of the method for identifying a fire according to the present invention;
FIG. 7 is an analysis diagram of the target model visualization in an embodiment of the method for identifying a fire according to the present invention;
FIG. 8 is a schematic diagram of program modules of a second embodiment of the object recognition device for fire according to the present invention;
fig. 9 is a schematic diagram of a hardware structure of a computer device according to a third embodiment of the present invention.
Reference numerals: 7 - target identification device for fire; 71 - sample generation module; 72 - model establishing module; 73 - model training module; 74 - execution module.
Detailed Description
The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining", depending on the context.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The first embodiment is as follows: the present embodiment provides a method for identifying a fire target, referring to fig. 1, including the steps of:
s100: acquiring a plurality of fire images containing smoke or flame, and labeling the smoke or flame in the fire images to generate training samples;
in the embodiment, a large number of actual fire photos of real life are collected as samples before model training is established, specifically, the samples can include but are not limited to images, videos and other image data of an actual fire scene, and the images in the actual scene are adopted, so that a target model obtained after subsequent model training is more suitable for fire early warning.
Specifically, the sequentially labeling of smoke or flames in the fire image to generate a training sample includes:
s110: sequencing the fire images according to the positions and the sizes of the smoke or the flame;
specifically, in the above step, the sorting of the fire images is to determine a process of smoke or flame in the fire from small to large, and determine an order of the smoke or flame from a first image to a last image, so that the smoke or flame is sequentially input into the initial model for training, and is used for the initial model to learn and distinguish the smoke or flame in each state. It should be noted that, in the process, each fire image may also be preprocessed, for example, each fire image may be adjusted to the same size or to a uniform format.
S120: marking the smoke or flame in each fire image one by one with YOLO text-format labels and arranging them in sequence to generate the training samples.
Illustratively, in the above step the smoke or flame is labeled using the open-source labeling tool LabelImg. Labeling formats generally include, but are not limited to, VOC XML, COCO JSON, YOLO TXT and DOTA TXT; this embodiment adopts the YOLO TXT format, as illustrated below.
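As an illustration (an assumption added for clarity, not content from the original filing), a YOLO TXT label file stores one object per line as a class index followed by a bounding box normalized to the image size; the class mapping and coordinates below are hypothetical:

```
# Hypothetical class mapping: 0 = smoke, 1 = flame
# Each line: <class> <x_center> <y_center> <width> <height>, all normalized to [0, 1]
0 0.512 0.340 0.210 0.185
1 0.498 0.620 0.090 0.110
```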
S200: establishing an initial model based on a YOLO network structure, and configuring parameters; the initial model comprises a Foucs block, a convolution block consisting of a convolution layer, a batch standardization layer and an activation layer, a CPS bottleneck layer and a space pyramid pooling layer;
specifically, the initial model of the present embodiment is based on the YOLOV5 framework, and as an explanation, the Focus block has a core of a slicing operation on a picture, and 32 convolution kernels are used in the structure. Convolution block Conv: conv consists of Conv (convolution layer) + BN (batch normalization layer) + Leaky _ relu activation function (activation layer). Module parameter args analysis: 128 of [ -1,1, Conv, [128,3,2] ] is the number of convolution kernels, the final number needs to be multiplied by width 128 x 1 x 128,3 is the convolution kernel 3 x 3,2 is the step size. CPS bottleneck layer, i.e. bottleeckcsp: the system consists of three convolution layers and X Resunint modules, if the system is provided with a False parameter, the Resunint modules are not used, and a conv + BN + Leaky _ relu structure is adopted, so that the size of an input size is not changed. The spatial pyramid pooling layer (SPP) performs multi-scale fusion in a maximal pooling manner of 1 × 1, 5 × 5, 9 × 9, 13 × 13. In the present embodiment, an improvement is made based on the YOLO network structure, specifically, an attention mechanism of a channel dimension and a space dimension is added for identifying a small target such as smoke or flame in a fire, where the small target refers to a state of smoke or flame in an early stage of a fire process.
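The following PyTorch sketch illustrates how the components named above (the Focus slice, the Conv block and the SPP layer) are commonly implemented in YOLOv5-style networks; it is a minimal sketch under our own assumptions rather than the patent's verbatim code, and the class names and default arguments are illustrative:

```python
# Illustrative sketch of YOLOv5-style building blocks; not the patent's verbatim code.
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Convolution block: Conv2d + BatchNorm + LeakyReLU, as described above."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    """Slice the image into 4 interleaved patches, concatenate on channels, then convolve."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = Conv(c_in * 4, c_out, k=3, s=1)

    def forward(self, x):
        # (B, C, H, W) -> (B, 4C, H/2, W/2) via 2x2 spatial slicing
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))

class SPP(nn.Module):
    """Spatial pyramid pooling: identity (1x1) plus parallel 5/9/13 max pooling, fused by 1x1 conv."""
    def __init__(self, c_in, c_out, ks=(5, 9, 13)):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = Conv(c_in, c_hidden, k=1, s=1)
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in ks)
        self.cv2 = Conv(c_hidden * (len(ks) + 1), c_out, k=1, s=1)

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```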
S300: in the initial model, a compression and excitation network module is arranged after the convolution layer, and weight assignment is performed on the convolved image in the channel dimension;
The compression and excitation network module is the SE (Squeeze-and-Excitation) module, which mainly comprises two parts, Squeeze (compression) and Excitation. In essence, the SE module performs an attention or gating operation in the channel dimension; this attention mechanism makes the model pay more attention to the channel features carrying the most information while suppressing unimportant channel features. The module is embedded after a convolution layer. Specifically, arranging the compression and excitation network module after the convolution layer and performing weight assignment on the convolved image in the channel dimension includes the following steps:
s310: performing compression operation on the convolved image to obtain a channel-level global feature image;
the above-mentioned global feature image of the channel level, that is, the global feature of the channel level, is only operated in a local space, and it is difficult to obtain enough information to extract the relationship between channels, and this kind of phenomenon is more serious for the former in the network, and its receptive field is smaller. Therefore, it is required to obtain a global feature image at a channel level, and more specifically, performing a compression operation on the convolved image to obtain a global feature image at a channel level, including:
s311: acquiring spatial features on a channel, and coding the spatial features into global features;
s312: and performing global tie pooling on the convolved image based on the global features to obtain a channel-level global feature image.
That is, in the above steps, global average potential is used for implementation, but it should be noted that, in principle, a more complicated aggregation strategy may also be used, that is, other methods capable of determining global characteristics in the prior art may also be used.
S320: performing an excitation operation on the global feature image to generate the correspondence and weight of each channel;
The Squeeze operation above obtains global features; the relationships between channels then need to be captured. This operation must satisfy two criteria: first, it must be flexible and able to learn the nonlinear relationships between channels; second, the learned relationships must not be mutually exclusive, since multiple channels are allowed to be emphasized simultaneously, rather than a one-hot form (only one activation at a time). A gating mechanism with a sigmoid activation function is therefore adopted (expressed as an element-wise multiplication of two parallel branches), so that performing the excitation operation on the global feature image to generate the correspondence and weight of each channel includes:
S321: reducing the dimension of the global feature image from an initial dimension with a first fully-connected layer, and processing through an activation function with a first scaling parameter to obtain a first processed image;
S322: and raising the first processed image back to the initial dimension with a second fully-connected layer, and obtaining the correspondence and weight of each channel through an activation function with a second scaling parameter.
In the above steps, in order to reduce model complexity and improve generalization, a bottleneck structure containing two fully-connected layers (the first and second fully-connected layers) is adopted. The first fully-connected layer reduces the dimension by a reduction ratio r, a hyperparameter, and is followed by ReLU activation. The second fully-connected layer restores the original dimension. Finally, the learned activation value of each channel (sigmoid activation, valued 0 to 1) is multiplied onto the original features.
S330: and performing pixel weighting on the convolved image according to the correspondence and weight of each channel.
Based on steps S310 to S330, the above operations learn a weight coefficient for each channel, giving the established initial model a stronger ability to discriminate the features of each channel. A minimal code sketch of the module follows.
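A minimal PyTorch sketch of steps S310-S330, assuming a reduction ratio r = 16 for the hyperparameter mentioned above; this illustrates the SE mechanism and is not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-Excitation sketch following steps S310-S330: global average
    pooling (squeeze), two fully-connected layers with reduction ratio r
    (excitation), then channel-wise rescaling of the convolved feature map."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # S310: channel-level global features
        self.excite = nn.Sequential(                    # S320: bottleneck of two FC layers
            nn.Linear(channels, channels // r),         # first FC: reduce dimension by r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),         # second FC: restore dimension
            nn.Sigmoid(),                               # per-channel weight in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c))     # (B, C) channel weights
        return x * w.view(b, c, 1, 1)                   # S330: pixel weighting per channel
```

For example, SEModule(128)(torch.randn(1, 128, 40, 40)) returns a tensor of the same shape with each channel rescaled by its learned weight.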
S400: a convolution block attention module is arranged after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
Step S300 added an attention mechanism in the channel dimension; this embodiment also adds an attention mechanism in the spatial dimension, which differs substantially from the prior art and gives the model higher sensitivity and accuracy. Specifically, referring to fig. 4 and 5, passing the image output by the convolution block through the channel attention module and the spatial attention module in sequence, to perform a further weight assignment in the channel dimension and the spatial dimension respectively, includes:
s410: compressing the image after passing through the rolling block in a space dimension by adopting a channel attention module, and summing and combining by utilizing average pooling and maximum pooling mapping to generate a channel attention diagram;
specifically, the channel attention mechanism is to compress the feature map (i.e., the image after passing through the convolution block) in the spatial dimension to obtain a one-dimensional vector, and then perform the operation. When performing compression in the spatial dimension, not only Average Pooling (Average Pooling) but also maximum Pooling (Max Pooling) is considered. The average pooling and maximum pooling may be used to aggregate the spatial information of the feature maps to a shared network, compress the spatial dimensions of the input feature maps, and sum and combine element-by-element to produce a channel attention map. In the case of a graph alone, the channel is focused on what is important on the graph. The average value pooling has feedback to each pixel point on the feature map, and the maximum value pooling has gradient feedback only at the place with the maximum response in the feature map when the gradient back propagation calculation is performed.
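A minimal sketch of the channel attention module of step S410; the shared network is assumed to be a two-layer bottleneck MLP with reduction ratio r, which is common in convolutional block attention implementations but not spelled out verbatim in the text above:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention sketch per S410: spatial compression by average and max
    pooling, a shared MLP, element-wise summation, then sigmoid."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(                       # shared network for both pooled vectors
            nn.Conv2d(channels, channels // r, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Sum the two pooled descriptors after the shared MLP, then gate (S410)
        return self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
```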
S420: performing weight assignment in the channel dimension on the image output by the convolution block, based on the channel attention map, to obtain an intermediate image;
Specifically, in this embodiment the weight assignment weights each pixel according to the channel attention.
S430: compressing the intermediate image in the channel dimension with a spatial attention module, and combining average pooling and max pooling to generate a spatial attention map;
Specifically, the spatial attention mechanism compresses the channel dimension, performing average pooling and max pooling along the channel axis respectively. The MaxPool operation extracts the maximum value along the channel axis, once for each of the height × width positions; the AvgPool operation extracts the average value along the channel axis, likewise height × width times. The extracted feature maps (each with a single channel) are then concatenated to obtain a 2-channel feature map.
S440: performing weight assignment on the intermediate image in the spatial dimension based on the spatial attention map.
Specifically, as described above, the result output by the convolution block first passes through a channel attention module to obtain a weighted result (i.e., the intermediate image obtained in step S420 is the image that has passed through the channel attention module) and then through a spatial attention module, where the final weighting is performed; thus the feature map is weighted first by the channel attention map and then by the spatial attention map to obtain the final feature map, as sketched below.
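A minimal sketch of the spatial attention module (steps S430-S440) and of the full channel-then-spatial sequence, reusing the ChannelAttention class from the earlier sketch; the 7 × 7 convolution kernel is our assumption, borrowed from common implementations rather than stated in the text:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention sketch per S430: channel-wise average and max pooling,
    concatenated into a 2-channel map, then a convolution and sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_map = torch.mean(x, dim=1, keepdim=True)        # AvgPool along the channel axis
        max_map, _ = torch.max(x, dim=1, keepdim=True)      # MaxPool along the channel axis
        return self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))

class CBAM(nn.Module):
    """Convolution block attention module: channel attention first (S410-S420),
    then spatial attention (S430-S440), each applied multiplicatively.
    ChannelAttention is the class from the previous sketch."""
    def __init__(self, channels, r=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, r)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)                                  # intermediate image (S420)
        return x * self.sa(x)                               # final weighting (S440)
```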
S500: training the initial model with the training samples to obtain a target model for identifying smoke or flame;
After the training samples are obtained, they are imported into the initial model and trained using GPU resources, with the corresponding parameters set. By way of example, the specific operation flow is as follows: train.py is called with --data specifying the dataset configuration file, --cfg specifying the network-structure configuration file, --weights loading the path of the initial model, and --img-size specifying the image size. After training, the generated model files are stored under Weights (the .pt file storage location of PyTorch); in this embodiment, the target model generated from the initial model improved on the YOLOv5 network structure stores two models, the best-performing model best.pt and the last-epoch model last.pt. Specifically, the target model comprises a trained, fixed-parameter Focus block, convolution block attention module, CSP bottleneck layer, spatial pyramid pooling layer and compression and excitation network module.
S600: and acquiring a real-time image, and performing target identification on the real-time image with the target model to obtain a target result.
Specifically, in this embodiment, detect.py is called in the actual scene with --weights pointing to the two models obtained by training (i.e., the above target model) and --source specifying the input, which may be a file (file.jpg, file.mp4), a directory (path/*.jpg), or an RTSP, RTMP or HTTP stream (i.e., the storage address of the real-time image). The real-time image is imported into the target model, which identifies it autonomously and outputs an image with the smoke or flame regions marked as the target result; from this result it can be determined whether smoke or flame that may lead to a fire exists in the current real-time image.
Further, after the target result is obtained, the method further comprises: performing real-time early warning according to the target result. According to steps S100-S600, real-time monitoring can be set up in a target scene, real-time images are obtained from the monitoring data, and an early warning is triggered immediately when the target model identifies smoke or flame in a real-time image. By continuously improving the neural network, the method can adapt to more complex and variable environments; meanwhile, disasters can be captured and prevented through existing public cameras, improving resource efficiency, raising recognition accuracy, shortening recognition time and reducing fire losses.
The target model adopted in this embodiment uses and modifies the YOLOv5 network structure: a compression and excitation network module is arranged after the convolution layer and a convolution block attention module after the convolution block, adding a channel attention mechanism and a spatial attention mechanism, and a large amount of effective data (fire images) is provided, realizing identification of early-stage flame and smoke. The method is suitable for large-area outdoor use; through the processing of the target model, the smoke or flame produced by an early-stage fire in the image can be found accurately and quickly and an early warning given in time, solving the problem that existing image-processing-based fire prevention is poor in precision and therefore slow in discovery.
It should be noted that fig. 6 and fig. 7 are a visualization result and an analysis of an exemplary target model, wherein the parameters in fig. 7 are explained as follows:
GIoU: the mean of the GIoU (bounding-box) loss function in the target model; the smaller the value, the more accurately boxes are localized;
objectness: the mean of the objectness (target detection) loss in the target model; the smaller the value, the more accurate the detection;
classification: the mean of the classification loss in the target model; the smaller the value, the more accurate the classification;
precision: the proportion of predicted positives that are true positives (true positives / all predicted positives);
recall: the proportion of actual positives that are correctly found (true positives / all actual positives);
mAP@0.5 & mAP@0.5:0.95: mAP is the area enclosed under the curve drawn with Precision and Recall as the two axes, and m denotes the mean over classes; the number after @ is the IoU threshold for judging a prediction as a positive or negative sample, and @0.5:0.95 denotes the average of the values obtained at IoU thresholds from 0.5 to 0.95 in steps of 0.05.
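As an illustration of the @0.5:0.95 convention (the per-threshold AP values below are hypothetical, not results reported for this model), the metric is simply the mean of the AP computed at each IoU threshold:

```python
# Hypothetical per-threshold AP values for one class; for illustration only.
thresholds = [0.5 + 0.05 * i for i in range(10)]         # 0.50, 0.55, ..., 0.95
aps = [0.71, 0.68, 0.64, 0.59, 0.53, 0.46, 0.38, 0.29, 0.19, 0.08]

map_50 = aps[0]                                          # mAP@0.5: AP at IoU threshold 0.5
map_50_95 = sum(aps) / len(aps)                          # mAP@0.5:0.95: mean over all thresholds
print(f"mAP@0.5 = {map_50:.3f}, mAP@0.5:0.95 = {map_50_95:.3f}")
```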
Based on the above example, it is further verified that the target model obtained in this embodiment from the modified YOLOv5 network structure achieves high accuracy for smoke or flame in the early stage of a fire.
Example two: the present embodiment provides a target identification device 7 for fire, referring to fig. 8, including:
the sample generation module 71 is configured to collect a plurality of fire images including smoke or flames, label the smoke or flames in the fire images, and generate a training sample;
specifically, a large number of actual fire photos of real life are collected as samples before model training is established, an open source labeling tool Labellmg is used for carrying out smoke or flame training and verification set labeling, and a labeling format adopted in a labeling process is a TXT format of YOLO in the specific implementation mode. In the process, each fire image can be preprocessed, for example, each fire image is adjusted to be under the same size or in a uniform format.
A model establishing module 72, configured to establish an initial model based on a YOLO network structure and configure parameters; the initial model comprises a Focus block, a convolution block consisting of a convolution layer, a batch normalization layer and an activation layer, a CSP bottleneck layer and a spatial pyramid pooling layer; in the initial model, a compression and excitation network module is arranged after the convolution layer to perform weight assignment on the convolved image in the channel dimension; a convolution block attention module is arranged after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
Specifically, the initial model is improved on the basis of the YOLOv5 network structure framework; attention mechanisms in the channel and spatial dimensions are added for identifying small targets such as smoke or flame in a fire, where a small target refers to the state of smoke or flame in the early stage of a fire. Weight assignment in the channel dimension is arranged after the convolution layer, and weight assignment in the channel and spatial dimensions is further arranged after the convolution block, so as to improve the identification accuracy of the target model.
A model training module 73, configured to train the initial model with the training samples to obtain a target model for identifying smoke or flame;
specifically, in the training process in this embodiment, after the training samples are obtained, the training samples are imported into the initial model to be trained by using the GPU resources, and corresponding parameters are set.
And the execution module 74 is configured to acquire a real-time image, perform target identification on the real-time image by using the target model, and obtain a target result.
In an actual scene, the real-time image is imported into the target model, which identifies it autonomously and outputs an image with the smoke or flame regions marked as the target result; an early warning can be triggered immediately when the target model identifies smoke or flame in the real-time image.
In this embodiment, the sample generation module 71 acquires fire images to generate training samples; the model establishing module 72 establishes an initial model based on a YOLO network structure, with attention mechanisms in the channel and spatial dimensions added for identifying small targets such as smoke or flame in a fire; the model training module 73 then imports the training samples for training using GPU resources and sets the corresponding parameters. Finally, the execution module 74 performs identification using the target model generated after training, and an early warning can be triggered immediately when smoke or flame is present. By adding a channel attention mechanism and a spatial attention mechanism, small-target recognition of early-stage flame and smoke is realized, solving the problem that existing image-processing-based fire prevention is poor in accuracy and slow in discovery.
Example three:
In order to achieve the above object, the present invention further provides a computer device 8. As shown in fig. 9, the computer device may be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server or a server cluster formed by multiple servers) that executes programs. The computer device of this embodiment at least includes, but is not limited to: a memory 81 and a processor 82, which may be communicatively coupled to each other via a device bus, as shown in fig. 9. It should be noted that fig. 9 only shows a computer device with these components; it should be understood that not all of the shown components must be implemented, and more or fewer components may be implemented instead.
In this embodiment, the memory 81 includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 81 may be an internal storage unit of the computer device, such as its hard disk or memory. In other embodiments, the memory 81 may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device. Of course, the memory 81 may also include both the internal and the external storage devices of the computer device. In this embodiment, the memory 81 is generally used for storing the operating system installed on the computer device and various types of application software, such as the program code and training data of the target identification method for fire of the first embodiment. Further, the memory 81 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 82 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip in some embodiments. The processor 82 is typically used to control the overall operation of the computer device. In this embodiment, the processor 82 is configured to run the program code stored in the memory 81 or to process data, for example to run the target identification device for fire, so as to implement the target identification method of the first embodiment.
Example four:
To achieve the above objects, the present invention also provides a computer-readable storage medium, which includes multiple storage media such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store and the like, on which a computer program is stored that implements the corresponding functions when executed by the processor 82. The computer-readable storage medium of this embodiment is used for storing the target identification device for fire, and when executed by the processor 82 it implements the target identification method of the first embodiment.
It should be noted that the embodiments of the present invention are described above by way of preferred examples and not limitation, and that those skilled in the art can make modifications and variations to the above-described embodiments without departing from the spirit of the invention.
Claims (10)
1. A method of target identification for a fire, comprising:
acquiring a plurality of fire images containing smoke or flame, and labeling the smoke or flame in the fire images to generate training samples;
establishing an initial model based on a YOLO network structure, and configuring parameters; the initial model comprises a Focus block, a convolution block consisting of a convolution layer, a batch normalization layer and an activation layer, a CSP bottleneck layer and a spatial pyramid pooling layer;
in the initial model, arranging a compression and excitation network module after the convolution layer, and performing weight assignment on the convolved image in the channel dimension;
arranging a convolution block attention module after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
training the initial model by adopting the training sample to obtain a target model for identifying smoke or flame; and acquiring a real-time image, and performing target identification on the real-time image by adopting the target model to obtain a target result.
2. The target identification method of claim 1, wherein arranging the compression and excitation network module after the convolution layer and performing weight assignment on the convolved image in the channel dimension comprises:
performing compression operation on the convolved image to obtain a channel-level global feature image;
performing excitation operation on the global feature image to generate corresponding relations and weights of all channels;
and carrying out pixel weighting on the convolved image according to the corresponding relation and the weight of each channel.
3. The target identification method of claim 2, wherein compressing the convolved image to obtain the channel-level global feature image comprises:
acquiring spatial features on a channel, and coding the spatial features into global features;
and performing global average pooling on the convolved image based on the global features to obtain a channel-level global feature image.
4. The target identification method of claim 2, wherein performing the excitation operation on the global feature image to generate the correspondence and weight of each channel comprises:
reducing the dimension of the global feature image from an initial dimension with a first fully-connected layer, and processing through an activation function with a first scaling parameter to obtain a first processed image;
and raising the first processed image back to the initial dimension with a second fully-connected layer, and obtaining the correspondence and weight of each channel through an activation function with a second scaling parameter.
5. The target identification method of claim 1, wherein passing the image output by the convolution block through the channel attention module and the spatial attention module in sequence, so as to perform a further weight assignment in the channel dimension and the spatial dimension respectively, comprises:
compressing the image output by the convolution block in the spatial dimension with a channel attention module, then summing and combining the average-pooled and max-pooled mappings to generate a channel attention map;
performing weight assignment in the channel dimension on the image output by the convolution block, based on the channel attention map, to obtain an intermediate image; compressing the intermediate image in the channel dimension with a spatial attention module, and combining average pooling and max pooling to generate a spatial attention map;
and performing weight assignment on the intermediate image in the spatial dimension based on the spatial attention map.
6. The target identification method of claim 1, wherein sequentially labeling the smoke or flame in the fire images to generate the training samples comprises:
sorting the fire images according to the position and size of the smoke or flame;
and marking the smoke or flame in each fire image one by one with YOLO text-format labels and arranging them in sequence to generate the training samples.
7. The target identification method of claim 1, wherein:
the target model comprises a trained, fixed-parameter Focus block, convolution block attention module, CSP bottleneck layer, spatial pyramid pooling layer and compression and excitation network module.
8. A target identification device for fire, comprising:
a sample generation module, configured to acquire a plurality of fire images containing smoke or flame, label the smoke or flame in the fire images, and generate training samples;
a model establishing module, configured to establish an initial model based on a YOLO network structure and configure parameters; the initial model comprises a Focus block, a convolution block consisting of a convolution layer, a batch normalization layer and an activation layer, a CSP bottleneck layer and a spatial pyramid pooling layer;
in the initial model, a compression and excitation network module is arranged after the convolution layer to perform weight assignment on the convolved image in the channel dimension;
a convolution block attention module is arranged after the convolution block, so that the image output by the convolution block passes through a channel attention module and a spatial attention module in sequence for a further weight assignment in the channel dimension and the spatial dimension respectively;
a model training module, configured to train the initial model with the training samples to obtain a target model for identifying smoke or flame;
and an execution module, configured to acquire a real-time image and perform target identification on the real-time image with the target model to obtain a target result.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein: the steps of the target identification method of any one of claims 1 to 7 are implemented when the computer program is executed by the processor of the computer device.
10. A computer-readable storage medium having stored thereon a computer program, wherein: the computer program, when executed by a processor, implements the steps of the target identification method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210304551.1A CN114842185A (en) | 2022-03-21 | 2022-03-21 | Method, device, equipment and medium for identifying fire |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210304551.1A CN114842185A (en) | 2022-03-21 | 2022-03-21 | Method, device, equipment and medium for identifying fire |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114842185A true CN114842185A (en) | 2022-08-02 |
Family
ID=82562385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210304551.1A Pending CN114842185A (en) | 2022-03-21 | 2022-03-21 | Method, device, equipment and medium for identifying fire |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842185A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115034375A (en) * | 2022-08-09 | 2022-09-09 | 北京灵汐科技有限公司 | Data processing method and device, neural network model, device and medium |
CN115359615A (en) * | 2022-08-15 | 2022-11-18 | 北京飞讯数码科技有限公司 | Indoor fire alarm early warning method, system, device, equipment and medium |
CN115359615B (en) * | 2022-08-15 | 2023-08-04 | 北京飞讯数码科技有限公司 | Indoor fire alarm early warning method, system, device, equipment and medium |
CN115590584A (en) * | 2022-09-06 | 2023-01-13 | 汕头大学(Cn) | Hair follicle hair taking control method and system based on mechanical arm |
CN115590584B (en) * | 2022-09-06 | 2023-11-14 | 汕头大学 | Hair follicle taking control method and system based on mechanical arm |
JP7475745B1 (en) | 2023-04-27 | 2024-04-30 | 南京郵電大学 | A smart cruise detection method for unmanned aerial vehicles based on binary cooperative feedback |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114842185A (en) | Method, device, equipment and medium for identifying fire | |
WO2017124990A1 (en) | Method, system, device and readable storage medium for realizing insurance claim fraud prevention based on consistency between multiple images | |
CN112183166B (en) | Method and device for determining training samples and electronic equipment | |
CN111667001B (en) | Target re-identification method, device, computer equipment and storage medium | |
CN115424171A (en) | Flame and smoke detection method, device and storage medium | |
CN116092179A (en) | Improved Yolox fall detection system | |
CN110766007A (en) | Certificate shielding detection method, device and equipment and readable storage medium | |
CN113439227A (en) | Capturing and storing magnified images | |
CN114550051A (en) | Vehicle loss detection method and device, computer equipment and storage medium | |
CN117689928A (en) | Unmanned aerial vehicle detection method for improving yolov5 | |
CN112668675B (en) | Image processing method and device, computer equipment and storage medium | |
CN108401106B (en) | Shooting parameter optimization method and device, terminal and storage medium | |
Chen et al. | Pyramid attention object detection network with multi-scale feature fusion | |
CN117953581A (en) | Method and device for identifying actions, electronic equipment and readable storage medium | |
CN111768007B (en) | Method and device for mining data | |
CN113570615A (en) | Image processing method based on deep learning, electronic equipment and storage medium | |
CN112329550A (en) | Weak supervision learning-based disaster-stricken building rapid positioning evaluation method and device | |
CN111881996A (en) | Object detection method, computer device and storage medium | |
CN111382689A (en) | Card punching system and method for online learning by using computer | |
CN116977260A (en) | Target defect detection method and device, electronic equipment and storage medium | |
CN114463685B (en) | Behavior recognition method, behavior recognition device, electronic equipment and storage medium | |
CN111401317B (en) | Video classification method, device, equipment and storage medium | |
CN112883876A (en) | Method, device and equipment for indoor pedestrian detection and computer storage medium | |
CN111400534B (en) | Cover determination method and device for image data and computer storage medium | |
Ogawa et al. | Identifying Parking Lot Occupancy with YOLOv5 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |