CN115861872A - Smoke and fire detection method and device, terminal equipment and storage medium

Smoke and fire detection method and device, terminal equipment and storage medium

Info

Publication number
CN115861872A
CN115861872A
Authority
CN
China
Prior art keywords
image
smoke
detected
firework
detection
Prior art date
Legal status
Pending
Application number
CN202211411434.1A
Other languages
Chinese (zh)
Inventor
李翠
Current Assignee
Xi'an Tianhe Defense Technology Co ltd
Original Assignee
Xi'an Tianhe Defense Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Tianhe Defense Technology Co ltd filed Critical Xi'an Tianhe Defense Technology Co ltd
Priority to CN202211411434.1A
Publication of CN115861872A
Legal status: Pending

Landscapes

  • Fire-Detection Mechanisms (AREA)
  • Fire Alarms (AREA)

Abstract

The application is applicable to the technical field of image recognition, and provides a smoke and fire detection method, a smoke and fire detection device, terminal equipment and a storage medium. The smoke and fire detection method comprises the following steps: inputting an image to be detected into a trained smoke and fire detection model for detection, and obtaining a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, and a first target image and position information of the first target image are output when smoke and fire characteristics exist; and if the detection result indicates that smoke and fire characteristics exist in the image to be detected, performing verification processing based on the first target image to obtain a final detection result, wherein the verification processing is used for determining whether the smoke and fire characteristics exist, and the final detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, with the position information of the smoke and fire characteristics output when they exist. The application can improve the accuracy of automatic smoke and fire detection.

Description

Smoke and fire detection method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to a smoke and fire detection method and device, terminal equipment and a computer-readable storage medium.
Background
Forest fires are among the most dangerous and challenging natural disasters threatening mankind and the environment today. They not only burn large areas of forest and harm the animals living there, but also reduce the regenerative capacity of forests and, in severe cases, can unbalance the ecological environment.
Forest fires are sudden and random. Early discovery can effectively reduce the difficulty of extinguishing them and minimize financial loss and casualties.
Because forests cover large areas, the traditional approach of finding forest fires by manual patrol is inefficient. With the development of computer technology, video monitoring systems have been adopted in the prior art to monitor forest fires remotely, but this approach still requires video images to be watched manually in real time, so monitoring efficiency remains low.
Disclosure of Invention
The embodiments of the application provide a smoke and fire detection method, a smoke and fire detection device, terminal equipment and a storage medium, which can realize automatic smoke and fire detection and improve its accuracy.
In a first aspect, an embodiment of the present application provides a smoke and fire detection method, including:
inputting an image to be detected into a trained smoke and fire detection model for detection to obtain a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, and outputting a first target image and position information of the first target image when the smoke and fire characteristics exist in the image to be detected, and the first target image is an image corresponding to an area which is cut from the image to be detected and contains the smoke and fire characteristics;
if the detection result indicates that the image to be detected has smoke and fire characteristics, performing verification processing based on the first target image to obtain a final detection result, wherein the verification processing is used for determining whether the smoke and fire characteristics exist, the final detection result is used for indicating whether the image to be detected has smoke and fire characteristics, and the position information of the smoke and fire characteristics is output when the image to be detected has smoke and fire characteristics.
In a second aspect, embodiments of the present application provide a smoke and fire detection device, including:
the detection module is used for inputting an image to be detected into a trained smoke and fire detection model for detection to obtain a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, and a first target image and position information of the first target image are output when smoke and fire characteristics exist, the first target image being an image corresponding to an area, cut from the image to be detected, that contains the smoke and fire characteristics;
and the verification module is used for performing verification processing based on the first target image to obtain a final detection result if the detection result indicates that the image to be detected has smoke and fire characteristics, wherein the verification processing is used for determining whether the smoke and fire characteristics exist, the final detection result is used for indicating whether the image to be detected has smoke and fire characteristics, and the position information of the smoke and fire characteristics is output when the image to be detected has smoke and fire characteristics.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the smoke and fire detection method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the smoke and fire detection method described in the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the smoke and fire detection method of any one of the first aspect.
Compared with the prior art, the embodiment of the application has the advantages that:
In the embodiments of the application, the image to be detected is input into the attention-based smoke and fire detection model for multi-scale detection, the detection result output by the model is obtained, and when the detection result indicates that smoke and fire characteristics exist in the image to be detected, the first target image output by the model is verified to obtain the final detection result. Because the smoke and fire detection model performs multi-scale detection based on an attention mechanism, smoke and fire characteristics of different sizes can be detected better, and the attention mechanism makes the model pay more attention to smoke and fire characteristics during detection, so the accuracy of the smoke and fire detection model can be improved. Meanwhile, because the smoke and fire characteristics indicated by the detection result may be false detections, when the detection result indicates that smoke and fire characteristics exist in the image to be detected, verification is performed based on the first target image containing them; the verification processing determines whether the indicated characteristics actually exist, yielding a final detection result that indicates whether smoke and fire characteristics exist in the image to be detected. This reduces false detections and improves the accuracy of smoke and fire detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below.
FIG. 1 is a schematic flow chart diagram of a smoke and fire detection method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a pre-slicing operation provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset area image provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an image before and after brightness adjustment provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a smoke and fire detection model provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a smoke and fire detection device provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise.
The first embodiment is as follows:
With the rapid development of deep learning, automatic detection of forest smoke and fire can be realized with deep-learning-based models. Existing methods either dynamically detect whether a moving target appears in the video picture and track it to judge whether it is smoke or fire, or judge whether a smoke and fire target appears based on the color characteristics of smoke and fire. However, in practical applications there are many interfering objects in forest backgrounds and many types of smoke and fire (such as black smoke, white smoke, green smoke, yellow flame, blue flame, etc.), so detection based on moving targets or smoke and fire colors still produces many false detections, and the accuracy of smoke and fire detection is not high.
In order to improve the accuracy of automatic smoke and fire detection, the embodiments of the application provide a smoke and fire detection method.
In the smoke and fire detection method provided by the embodiments of the application, an image to be detected is input into a smoke and fire detection model that performs multi-scale detection based on an attention mechanism, and the detection result output by the model is obtained. The detection result indicates whether smoke and fire characteristics exist in the image to be detected, and a first target image containing the smoke and fire characteristics is output when they exist. When the detection result indicates that smoke and fire characteristics exist, verification processing is performed based on the first target image output by the model to determine whether the smoke and fire characteristics actually exist, yielding a final detection result. The final detection result indicates whether smoke and fire characteristics exist in the image to be detected, and the position information of the smoke and fire characteristics is output when they exist.
Because the smoke and fire detection model performs multi-scale detection based on an attention mechanism, it can detect targets of different sizes at the same time (for example, large and small targets simultaneously); and because the attention mechanism focuses on the smoke and fire characteristics of interest, automatically detecting whether smoke and fire exist in the image to be detected with such a model can improve detection accuracy. In addition, when the model detects smoke and fire characteristics, verification processing is performed on the first target image containing them; verification determines whether the smoke and fire characteristics indicated in the detection result actually exist and yields a final detection result, which reduces false detections and further improves the accuracy of smoke and fire detection.
The smoke and fire detection method provided by the embodiments of the application is described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow chart of a smoke and fire detection method provided by an embodiment of the present application, detailed as follows:
step S101, inputting an image to be detected into a trained smoke and fire detection model for detection, and obtaining a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, and outputting a first target image and position information of the first target image when the smoke and fire characteristics exist in the image to be detected.
The first target image is an image including the smoke and fire characteristics, which is cut out from the image to be detected. The smoke and fire characteristics include various colors of smoke, flames and other characteristics accompanying fire, such as black smoke and red/yellow flames when the fire is violent.
The above-mentioned attention mechanism stems from the study of human vision. Attention mechanisms generally include two aspects: deciding which part of the input needs attention, and allocating the limited information processing resources to the important part (i.e., the part that needs attention). Because forest resources are abundant, there are many interfering objects such as trees and animals of various colors, and the computing resources of some terminal equipment are limited, detecting smoke and fire characteristics is difficult for the model. Therefore, an attention mechanism is introduced into the smoke and fire detection model in the embodiments of the application, so that the model focuses more on smoke and fire areas during detection, reducing computing cost, speeding up model inference, and thus speeding up smoke and fire detection.
Multi-scale detection obtains feature maps of different scales from the image to be detected. When the image contains both very large and very small objects (that is, targets differ greatly in size), objects of all sizes must be detected, which requires the model to be robust to scale. Semantic information of large objects generally appears in deep feature maps, while that of small objects generally appears in shallow feature maps, so detecting on feature maps of different sizes allows large and small objects to be detected well simultaneously.
Specifically, when detection is carried out, the obtained image to be detected is input into a trained smoke and fire detection model, the smoke and fire detection model carries out multi-scale detection on the image to be detected based on an attention mechanism so as to detect whether smoke and fire characteristics exist in the image to be detected, and a corresponding detection result is output. The detection result indicates whether the image to be detected has smoke and fire characteristics or not, and the first target image containing the detected smoke and fire characteristics and the position information thereof are output when the smoke and fire characteristics are detected, so that whether the smoke and fire characteristics exist in the image to be detected or not can be determined according to the detection result, and further, whether forest fire possibly occurs or not is determined. Optionally, in order to find the forest fire in time and reduce the loss caused by the forest fire, the image to be detected is an image obtained in real time, that is, smoke and fire detection is performed based on the real-time image.
Optionally, when the smoke and fire detection model is constructed in advance, an attention module is embedded in the model. Convolution operations with kernels of different sizes are performed on the image to be detected, or the image is scaled to different sizes, and feature extraction is then performed on these inputs to obtain feature maps of different sizes. After the feature maps of different sizes are obtained, the fusion module fuses them to obtain fused feature maps of different sizes, smoke and fire target detection is performed on each fused feature map, and the detection result of the image to be detected is determined from the detection results of the fused feature maps. For example, the smoke and fire detection model may be constructed based on the network structure of YOLOv5 (the embodiments of the application are described with a YOLOv5-based model; in practical applications the model may also be built on a network structure such as SSD (Single Shot MultiBox Detector) or a self-designed network, which is not limited here). An attention module is embedded in the backbone network of the YOLOv5 network so that the backbone performs feature extraction based on an attention mechanism, and a weighted bidirectional feature pyramid network is used as the path aggregation network so that the importance of different input features is learned through learnable weights. Meanwhile, when the weighted bidirectional feature pyramid network performs feature fusion, ADD fusion (i.e., additive fusion) is used: two feature maps are added along the channel dimension, which increases the information in each dimension without increasing the number of channels of the fused feature map, thereby enriching the information available to subsequent detection and improving detection accuracy.
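To make the ADD fusion point concrete, the following is a minimal PyTorch sketch (tensor shapes are assumed purely for illustration) contrasting additive fusion, which keeps the channel count fixed, with channel concatenation, which doubles it:

```python
import torch

# Two same-shaped feature maps from different pyramid levels (shapes assumed).
a = torch.randn(1, 256, 40, 40)
b = torch.randn(1, 256, 40, 40)

fused_add = a + b                     # ADD fusion: still 256 channels,
                                      # per-element information is combined
fused_cat = torch.cat([a, b], dim=1)  # concat fusion: channel count doubles

print(fused_add.shape)  # torch.Size([1, 256, 40, 40])
print(fused_cat.shape)  # torch.Size([1, 512, 40, 40])
```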
After fused feature maps of different sizes are obtained, the prediction module predicts on each of them to obtain a detection result per fused feature map, then performs a de-duplication operation across these results, removing repeated predictions of the same smoke and fire characteristic, so that the smoke and fire detection model outputs only one first target image and its position information for each smoke and fire characteristic. For example, suppose fused feature maps X1 and X2, of sizes 80 × 80 and 40 × 40, each output a first target image (Y1 and Y2 respectively) with corresponding position information, and comparison shows that Y1 and Y2 contain the same smoke and fire characteristic of the image to be detected. Since X1 is larger than X2, the first target image Y1 is larger than Y2, so the model outputs only Y1 and the position information of Y1.
Optionally, a set of labeled smoke and fire sample images is acquired, wherein each sample image contains labeled smoke and/or flame characteristics. When training the smoke and fire detection model, the sample image set is randomly divided into a training set, a validation set and a test set: the constructed model is trained with the training set, validated with the validation set, and continuously adjusted according to the validation error until it meets user requirements (e.g., the detection accuracy reaches 0.95); the resulting model is then evaluated with the test set to assess the performance of the trained model. Optionally, when the sample image set contains only the two classes of smoke and flame, the proportion of background samples is relatively too large and positive and negative samples are imbalanced; therefore, when building the sample image set, related classes such as lakes and white litter can be added based on the actual application scenario to address the high false detection rate caused by sample imbalance.
Optionally, when the first target image containing the smoke and fire characteristics is output, the area enclosed by the prediction box or bounding box in which the smoke and fire characteristics were detected is output as the first target image. In a target detection task, a large number of regions are usually sampled in the image to be detected, each is judged for whether it contains a target of interest, and the region edges are adjusted to predict the target's true bounding box more accurately. In the sampled regions, multiple bounding boxes (anchor boxes) of different sizes and aspect ratios are generated centered on each pixel; during detection the classes and offsets of these boxes are predicted, and the anchor positions are continuously adjusted according to the predicted offsets to obtain prediction boxes, where the resulting prediction boxes contain the detected smoke and fire targets, i.e., smoke targets and/or flame targets.
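As a rough illustration of these anchor-box mechanics, the sketch below generates anchors of several scales and aspect ratios around a pixel center and decodes predicted offsets into a prediction box; the scale/ratio values and the offset names dx, dy, dw, dh are assumptions, not the patent's exact parameterization:

```python
import math

def make_anchors(cx, cy, scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Generate (x1, y1, x2, y2) anchor boxes centered on one pixel."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * math.sqrt(r), s / math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def decode(anchor, dx, dy, dw, dh):
    """Shift and rescale an anchor by predicted offsets into a prediction box."""
    x1, y1, x2, y2 = anchor
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    cx, cy = cx + dx * w, cy + dy * h            # move the center
    w, h = w * math.exp(dw), h * math.exp(dh)    # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```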
Optionally, when outputting the first target image containing the detected smoke and fire characteristics, if more than one smoke and fire characteristic is detected in the image to be detected, either one first target image containing all detected characteristics may be output, or multiple first target images may be output. When multiple first target images are output, each contains one detected smoke and fire characteristic; the characteristics of different first target images differ (that is, they are not the same characteristic at the same position) and their position information differs, so that the user can see the smoke and fire present in the image more intuitively. Alternatively, since smoke and flames gradually spread when a forest fire occurs, smoke and fire characteristics whose distance does not exceed a distance threshold (e.g., 10 px) are output in the same first target image, i.e., as one smoke and fire characteristic, so that the user can see the situation more clearly.
In the embodiments of the application, the image to be detected is given multi-scale detection by the attention-based smoke and fire detection model, the corresponding detection result is output, and when smoke and fire characteristics exist in the image, the first target image containing the corresponding characteristics and its position information are output, so that the user can see the smoke and fire situation more intuitively. The attention mechanism makes the model pay more attention to smoke and fire characteristics during detection, and multi-scale detection can detect large and small targets at the same time, i.e., smoke and fire of different sizes.
Step S102, if the detection result indicates that the image to be detected has smoke and fire characteristics, performing verification processing based on the first target image to obtain a final detection result.
The final detection result is used for indicating whether the image to be detected has smoke and fire characteristics, and the position information of the smoke and fire characteristics is output when the image to be detected has them, so that the user can learn the fire information corresponding to the smoke and fire characteristics.
Specifically, in order to reduce false detections, when the detection result of the smoke and fire detection model indicates that smoke and fire characteristics exist in the image to be detected, verification processing is performed on the corresponding first target image output by the model; that is, the first target image undergoes secondary verification to determine whether the smoke and fire characteristics in it actually exist, i.e., whether they are false detections, and the final detection result is obtained from the verification result. If the verification result indicates that the smoke and fire characteristics exist in the first target image, the final detection result indicates that they exist in the image to be detected, and the position information of the smoke and fire characteristics is output. If the verification result indicates that no smoke and fire characteristics exist in the first target image, i.e., the detection was a false detection, the final detection result indicates that the image to be detected does not contain the smoke and fire characteristics of that first target image.
It should be noted that during verification processing, the smoke and fire characteristics in each first target image are verified based on that image, so that each characteristic indicated in the detection result is verified accurately; performing verification on each first target image separately also reduces the time required, speeding up detection while preserving the accuracy of the smoke and fire detection result. For example, if the detection result indicates that smoke and fire characteristics A and B exist in the image to be detected and outputs a first target image A1 for characteristic A and a first target image B1 for characteristic B, then A1 and B1 are verified respectively, and whether characteristic A in A1 and characteristic B in B1 exist is determined through verification to obtain the final detection result.
In the embodiments of the application, because the smoke and fire characteristics indicated in the detection result output by the smoke and fire detection model may be false detections, when the detection result indicates that the image to be detected has smoke and fire characteristics, verification processing is performed based on the corresponding first target image to check whether the characteristics actually exist in it, before the final detection result indicating whether the image contains the characteristics is produced. This reduces the possibility of false detection and thus improves the accuracy of smoke and fire detection.
In the embodiments of the application, an attention-based smoke and fire detection model performs multi-scale detection on the image to be detected and outputs the corresponding detection result; when smoke and fire characteristics are detected, the first target image containing them and its position information are output, and verification processing based on that first target image confirms whether the characteristics exist before the final detection result is produced, reducing the possibility of false detection. The attention mechanism makes the model focus more on smoke and fire characteristics during detection, reducing computing cost, and multi-scale detection handles large and small targets simultaneously, so smoke and fire of different sizes can be detected. Therefore, multi-scale detection with the attention-based model speeds up detection while improving its accuracy, enabling users to discover forest fires in time and reduce their impact.
In some embodiments, the above-described smoke and fire detection method further comprises:
and acquiring an image to be detected.
Optionally, real-time video collected by pre-installed cameras is acquired, a video frame sequence is obtained for each camera from the real-time video, and the images to be detected are determined from the video frames. Every video frame may be used as an image to be detected, or a subset may be selected according to a selection rule (for example, selecting one image to be detected every 2 video frames in each camera's frame sequence). Alternatively, the camera may be a fixed camera or a rotatable camera (e.g., a speed dome).
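A minimal sketch of this frame-selection rule follows; the stream URL is hypothetical and the stride of one frame in three is just the example above:

```python
import cv2

cap = cv2.VideoCapture("rtsp://camera-1/stream")  # hypothetical camera source
frame_idx, stride = 0, 3                          # keep every 3rd frame
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % stride == 0:
        pass  # hand `frame` to the smoke and fire detection model here
    frame_idx += 1
cap.release()
```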
Optionally, to let the user learn the forest fire situation more quickly and intuitively, the forest is divided into several areas, each camera and its shooting area within each area are numbered correspondingly, and the correspondence among area, camera and shooting area is stored. When the images to be detected corresponding to each camera's video are acquired, each image is marked (for example, a video frame collected by camera 1 in area 2 is marked 21), so that when smoke and fire characteristics are later detected in an image, the corresponding forest area can be identified quickly.
In the embodiments of the application, video frames of the real-time video collected by the cameras are acquired as images to be detected, and each image is marked correspondingly, so that when a forest fire is detected its area can be determined quickly and accurately and the fire handled promptly.
In some embodiments, the smoke and fire detection model includes an input module, a backbone network, a fusion module, and a prediction module, and the step S101 includes:
a1, preprocessing an input image to be detected based on an input module to obtain an initial characteristic diagram of the image to be detected.
Optionally, before the image to be detected is input into the backbone network for feature extraction, the input module slices it, as shown in fig. 2: a value is taken at every other pixel of the image, similar to downsampling, yielding four images, which are then concatenated in the channel direction. Because the four images are complementary, no information from the image to be detected is lost in the new concatenated image. Splicing the four images along the channel direction concentrates the information of the horizontal and vertical directions (i.e., the H and W dimensions) into the channel space, so the input RGB three-channel image becomes a 12-channel image. A convolution operation on this 12-channel image yields a feature map downsampled by a factor of two without information loss, i.e., the initial feature map; subsequent detection proceeds on this smaller initial feature map, reducing the parameter count and computation of the smoke and fire detection model.
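The slicing operation can be sketched as the following PyTorch module (YOLOv5's "Focus" layer is the well-known instance of this idea; the output channel count is an illustrative assumption):

```python
import torch
import torch.nn as nn

class Slice(nn.Module):
    """Sample every other pixel in H and W, concatenate the four complementary
    sub-images along the channel axis (3 -> 12 channels), then convolve."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, kernel_size=3, padding=1)

    def forward(self, x):
        patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(patches, dim=1))  # no pixel is discarded

out = Slice()(torch.randn(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 32, 320, 320])
```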
Optionally, before the slicing operation, the input module may further perform one or more preprocessing operations on the image to be detected, such as graying, geometric transformation, normalization and image enhancement, to eliminate irrelevant information and recover useful real information, thereby enhancing the detectability of relevant information (such as smoke and flame). For example, images acquired from different cameras may differ in brightness, scale, contrast, etc.; without standardized preprocessing, these differences affect the corresponding detection results and reduce the accuracy of smoke and fire detection. Therefore, the input module standardizes the input image: the mean of the image pixels is subtracted from each pixel value, and the result is divided by the standard deviation, so that the resulting values follow the standard normal distribution, reducing the influence of such scale differences on the detection results.
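A minimal sketch of the standardization step, assuming per-image statistics; the small epsilon is an added numerical safeguard, not from the source:

```python
import numpy as np

def standardize(image: np.ndarray) -> np.ndarray:
    """Subtract the pixel mean and divide by the standard deviation."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + 1e-6)
```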
And A2, performing multi-scale feature extraction on the initial feature map based on the backbone network to obtain first feature maps with different sizes.
Optionally, when the backbone network performs multi-scale feature extraction on the initial feature map, the backbone network may perform scaling processing on the input initial feature map to obtain initial feature maps of different sizes, and then perform feature extraction based on the initial feature maps of different sizes, or perform multi-scale feature extraction on the initial feature maps by using convolution kernels of different sizes to obtain first feature maps of different sizes.
And A3, fusing the first feature maps with different sizes based on the fusion module.
Optionally, after the backbone network extracts first feature maps of different sizes, the fusion module fuses them based on the weighted bidirectional feature pyramid network, outputting fused feature maps of at least two sizes so that multi-scale detection can subsequently be performed on them. For example, if the backbone network outputs first feature maps A and B of two different sizes, the fusion module performs a convolution or upsampling operation on A to obtain a first feature map A1 of the same size as B, then performs weighted fusion of A1 and B to obtain a fused feature map of B's size; similarly, it performs an upsampling or convolution operation on B to obtain B1 of the same size as A, and performs weighted fusion of B1 and A to obtain a fused feature map of A's size, ensuring that the fusion module outputs fused feature maps of different sizes for multi-scale detection.
And A4, detecting the fused feature map obtained by fusion based on the prediction module, and outputting a detection result.
Optionally, the prediction module obtains the fused feature maps of different sizes generated by the fusion module, detects smoke and fire targets on each of them, and outputs the corresponding detection results. If smoke and fire characteristics are detected in a fused feature map, the image at the corresponding position of the image to be detected (or the image inside the detection box) is output as a first target image together with its position information (i.e., that of the corresponding detection box), where the first target image contains the detected smoke and fire characteristic (i.e., the smoke and fire target). When the prediction module detects on at least two fused feature maps of different sizes, the smoke and fire characteristics in each map are detected separately, the per-map detection results are then combined, the first target images of the different maps are compared, and the same characteristic detected from different maps is de-duplicated: if the first target images of several fused feature maps contain the same smoke and fire characteristic, only one first target image (the largest one) is kept, preventing the same characteristic from being output repeatedly. Optionally, when comparing the first target images of the fused feature maps, whether they contain the same smoke and fire characteristic can be judged from the position information of each first target image in its fused feature map. For example, because the fused feature maps differ in size, positions are compared against one map, say the largest fused feature map A: the position information of each first target image is scaled up according to the size ratio of its fused feature map to map A, giving positions relative to A, and first target images whose positions overlap with non-overlapping portions not exceeding a threshold (e.g., 2 px) are regarded as first target images of the same smoke and fire characteristic.
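A hedged sketch of this cross-scale de-duplication is given below; boxes are assumed to be (x1, y1, x2, y2) tuples, and the overlap test is simplified to a per-edge pixel tolerance:

```python
def to_reference_scale(box, map_size, ref_size):
    """Rescale a box from its fused feature map to the largest map's scale."""
    scale = ref_size / map_size
    return tuple(v * scale for v in box)

def same_feature(box_a, box_b, tol=2.0):
    """Treat two rescaled boxes as the same smoke and fire characteristic
    when every edge differs by no more than `tol` pixels."""
    return all(abs(a - b) <= tol for a, b in zip(box_a, box_b))

def deduplicate(boxes_with_sizes):
    """Keep, per group of matching boxes, the one from the largest map.
    `boxes_with_sizes` is a list of (box, map_size) pairs whose positions
    were already rescaled to the reference map."""
    kept = []
    for box, size in sorted(boxes_with_sizes, key=lambda p: -p[1]):
        if not any(same_feature(box, k) for k, _ in kept):
            kept.append((box, size))
    return kept
```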
In the embodiments of the application, the input module preprocesses the image to be detected, enhancing the detectability of useful information. Meanwhile, when fusing feature maps of different sizes, ADD fusion is performed based on a weighted bidirectional feature pyramid network: learnable weights capture the importance of different input features while the information content of the fused feature maps increases, and fused feature maps of different sizes are generated so that detection can be performed on each of them, which facilitates detecting smoke and fire targets of different sizes and thus improves the accuracy of the smoke and fire detection model.
In some embodiments, the step A2 includes:
and A21, encoding each channel of the initial characteristic diagram respectively along the horizontal direction and the vertical direction to obtain attention diagrams in two different directions.
And A22, performing feature extraction on the initial feature map to obtain an intermediate feature map.
And A23, embedding the two attention maps into the extracted intermediate feature map along the horizontal direction and the vertical direction respectively to obtain a first feature map containing different direction perceptions.
Optionally, the backbone network includes at least two feature extraction modules, which extract features at different sizes from the input feature map to obtain feature maps of different sizes, and an attention module is embedded in each feature extraction module. Optionally, the embedded attention module adopts the CA (Coordinate Attention) mechanism. CA attention takes a feature map as input and, through its transformation, outputs a feature map of the same size with enhanced representations.
Specifically, in each feature extraction module, one-dimensional pooling operation is performed on the input initial feature maps in the horizontal direction and the vertical direction (i.e., H-dimension and W-dimension) respectively to aggregate the initial feature maps into independent direction perception feature maps in the horizontal direction and the vertical direction, and the generated direction perception feature maps are encoded into attention maps in two different directions (i.e., the horizontal direction and the vertical direction) to embed spatial coordinate information into the attention maps. Due to the one-dimensional pooling operation of the initial feature maps along different dimensional directions, accurate position information is retained while long-distance dependency between different positions is captured, and therefore an attention map containing direction perception and position sensitivity is obtained. Meanwhile, the feature extraction module performs convolution processing on the initial feature map to obtain an intermediate feature map, and embeds the generated attention maps in two different directions into the intermediate feature map along corresponding directions (namely, the horizontal direction and the vertical direction) to obtain a first feature map containing different direction perceptions (the horizontal direction and the vertical direction).
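A Coordinate Attention block consistent with this description might look like the following sketch; the reduction ratio and layer sizes are illustrative assumptions, not the patent's exact values:

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Pool along H and W separately, encode the two direction-aware
    descriptors jointly, then multiply the resulting horizontal/vertical
    attention maps back into the input features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                      # N,C,H,1
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N,C,W,1
        y = self.shared(torch.cat([pooled_h, pooled_w], dim=2))     # encode
        y_h, y_w = y.split([h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                       # N,C,H,1
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))   # N,C,1,W
        return x * a_h * a_w  # position-sensitive reweighting of features
```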
It should be noted that each feature extraction module may acquire an initial feature map and perform feature extraction on the initial feature map based on an attention mechanism, or a first feature extraction module acquires the initial feature map, and each subsequent feature extraction module acquires a first feature map output by a previous feature extraction module (for example, a second feature extraction module takes the first feature map output by the first feature extraction module as input), and performs feature extraction on the acquired feature map (the initial feature map or the first feature map) based on the attention mechanism. Meanwhile, the attention module may be separately arranged to generate the attention diagrams corresponding to the feature extraction modules, and after the feature extraction modules extract the first feature maps, the attention diagrams may be embedded into at least two corresponding first feature maps, or the attention module may be embedded into the feature extraction module to generate the attention diagrams of the input feature maps of the feature extraction modules, and the attention diagrams may be embedded into the first feature maps extracted by the feature extraction module, which is not limited herein.
In the embodiments of the application, during feature extraction on the initial feature map, the feature extraction module embeds the two generated direction-aware attention maps into the generated intermediate feature map to obtain a first feature map containing different direction perceptions. Since CA attention enhances the expressive power of the features in the feature map, subsequent smoke and fire detection based on the enhanced first feature map improves detection accuracy.
In some embodiments, the step A3 includes:
and acquiring at least two first feature maps with different sizes extracted by the feature extraction module.
And performing upsampling or convolution processing on each first feature map to obtain second feature maps with different sizes corresponding to the first feature maps with different sizes, wherein the size of the second feature map corresponding to the first feature map subjected to the upsampling or convolution processing is the same as that of any other first feature map not subjected to the upsampling or convolution processing.
And respectively carrying out weighted feature fusion on the specified second feature map and the specified first feature map to obtain fused feature maps with different sizes, wherein the sizes of the specified first feature map and the second feature map which are subjected to weighted feature fusion are the same.
Optionally, the fusion module performs fusion processing on the acquired first feature maps of different sizes based on the weighted bidirectional feature pyramid network.
Specifically, the fusion module acquires the first feature map extracted by each feature extraction module. Because the backbone network includes at least two feature extraction modules, the fusion module acquires at least two first feature maps of different sizes and performs upsampling or convolution on each of them to rescale it, obtaining for each first feature map a second feature map of a different size, equal in size to one of the other first feature maps. For example, if first feature maps M1 and M2 have sizes 40 × 40 and 80 × 80, upsampling M1 gives an 80 × 80 second feature map N1, and convolving M2 gives a 40 × 40 second feature map N2, where N1 corresponds to M1 and has the same size as M2, and N2 corresponds to M2 and has the same size as M1. After the second feature maps are obtained, each specified second feature map is weight-fused with a specified first feature map of the same size, giving fused feature maps of different sizes (for example, fusing a 20 × 20 first feature map with a 20 × 20 second feature map gives a 20 × 20 fused feature map). When fusing a second feature map with a first feature map, ADD fusion is used: the information of all dimensions of the two maps is merged, increasing the information in the fused feature map without increasing its channel count or size.
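One weighted fusion node of this kind could be sketched as follows; the "fast normalized fusion" weighting follows the published BiFPN convention, and treating it as the patent's exact scheme is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedAddFusion(nn.Module):
    """Fuse a smaller and a larger feature map by weighted element-wise ADD."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))  # one learnable weight per input

    def forward(self, small, large):
        # resize the smaller map (e.g. 40x40) to the larger one's size (80x80)
        up = F.interpolate(small, size=large.shape[-2:], mode="nearest")
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)          # normalize the learned weights
        return w[0] * up + w[1] * large   # ADD fusion keeps the channel count
```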
In the embodiments of the application, during feature fusion each second feature map is weight-fused with the first feature map of corresponding size; the learnable weights capture the importance of different features in the input feature maps, making the smoke and fire detection model focus more on smoke and fire characteristics, and ADD fusion increases the information in each channel dimension of the fused feature map, which facilitates subsequent detection of smoke and fire characteristics.
In some embodiments, the step S102 includes:
b1, if the detection result indicates that the image to be detected has smoke and fire characteristics, acquiring N frames of verification images, wherein the verification images are images with sampling time adjacent to the sampling time of the image to be detected, and N is greater than or equal to 1.
Specifically, when the detection result indicates that smoke and fire characteristics exist in the image to be detected, verification images are acquired for secondary verification. N frames shot by the same camera, adjacent to the shooting time (i.e., sampling time) of the image to be detected, are acquired as verification images, where N is an integer greater than or equal to 1. For more accurate verification, when acquiring verification images, N frames shot within a preset time threshold (e.g., 1 s) before and after the shooting time of the image to be detected are acquired, and the acquired verification images are divided by shooting time into first verification images (shot before the image to be detected) and second verification images (shot after it). For example, if the shooting time of the image to be detected is 13:15:00 and the preset time threshold is 1 s, the N frames (excluding the image to be detected) shot between 13:14:59 and 13:15:01 are acquired as verification images, where those shot between 13:14:59 and 13:15:00 are first verification images and those shot between 13:15:00 and 13:15:01 are second verification images.
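Splitting the verification frames by capture time can be sketched as follows, assuming frames arrive as (image, timestamp-in-seconds) pairs:

```python
def split_verification(frames, t_detect, window=1.0):
    """Return (first, second) verification images around the detection time:
    frames within `window` seconds before it, and within `window` after it."""
    first = [img for img, t in frames if t_detect - window <= t < t_detect]
    second = [img for img, t in frames if t_detect < t <= t_detect + window]
    return first, second
```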
And B2, acquiring at least one second target image, wherein the second target image is an image corresponding to a target at a target position in each verification image, and the target position is a position corresponding to the position information.
Specifically, according to the position information of each first target image output with the detection result, the image corresponding to the target at each target position in each verification image is determined, and at least one second target image corresponding to each first target image is acquired from the verification images; that is, for each first target image, at least one corresponding second target image is acquired for its verification processing. For example, suppose the detection result outputs first target images A and B with their position information, and two verification images X and Y are acquired. According to the position information of first target image A, the images at the corresponding position are taken from X and Y, giving two second target images Ax and Ay; according to the position information of first target image B, the image at the corresponding position is taken from verification image X, giving a second target image Bx.
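Cropping the second target images can be sketched as below, assuming the position information is pixel coordinates (x1, y1, x2, y2) and frames are NumPy arrays:

```python
def crop_second_targets(position, verification_frames):
    """Cut the first target image's region out of each verification frame."""
    x1, y1, x2, y2 = (int(v) for v in position)
    return [frame[y1:y2, x1:x2] for frame in verification_frames]
```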
And B3, comparing the first target image with each second target image to obtain a final detection result.
Optionally, when the verification images are acquired, since the smoke and fire detection model detects in real time, acquiring video frames whose shooting times are after the image to be detected as verification images requires waiting; therefore, the N frames of images whose shooting times are within the preset time threshold before the image to be detected (i.e., the first verification images) may be acquired directly as the verification images, so that verification is performed in real time.
Optionally, at least one second target image is acquired from the first verification images, and the first target image is compared with each acquired second target image. Since the shooting time of a first verification image is before that of the image to be detected, the image corresponding to the second target image may already have been detected by the smoke and fire detection model before the image to be detected was detected, without any smoke and fire feature being found there. Therefore, if the comparison result indicates that the similarity between the first target image and a second target image satisfies a preset threshold (for example, a distance less than 0.3, where a value closer to 0 indicates greater similarity), the first target image is similar to a second target image taken from a verification image in which no smoke and fire feature was detected, which suggests a false detection.
Optionally, if N second target images are acquired for comparison with the first target image and the similarities between a specified number (e.g., 2N/3) of the second target images and the first target image do not satisfy the preset threshold, the first target image is not similar to those second target images, and it can be determined that the first target image has smoke and fire features; in this case, the final detection result indicates that the smoke and fire features corresponding to the first target image exist in the image to be detected, together with the position information of the first target image, that is, a forest fire has occurred. Otherwise, it is determined that the first target image does not have smoke and fire features, that is, the smoke and fire features in the first target image are a false detection. For example, after 3 verification images are acquired, the target images at the corresponding target positions in the 3 verification images are acquired according to the position information of the first target image, obtaining 3 second target images, and the first target image is compared with the 3 second target images respectively; if the comparison result indicates that the similarities between 2 or more second target images and the first target image do not meet the threshold requirement of being smaller than 0.40, that is, at least 2 second target images have low similarity to the first target image, it is determined that corresponding smoke and fire features exist in the image to be detected; otherwise, it is determined that the smoke and fire features in the first target image are a false detection.
Optionally, if at least one second target image is acquired from the second verification images, each acquired second target image is compared with the first target image. Since the shooting time of a second verification image is after that of the image to be detected corresponding to the first target image, if the comparison result indicates that the similarity between a second target image and the first target image satisfies a preset threshold (e.g., a distance of 0.10), the second target image is highly similar to the first target image, and it is determined that the smoke and fire features also exist in the second target image. Optionally, if N acquired second target images are compared with the first target image and the similarities between a specified number (e.g., 3N/5) of the second target images and the first target image satisfy the preset threshold, it is determined that the smoke and fire features exist in the first target image; otherwise, it is determined that the smoke and fire features in the first target image are a false detection.
Optionally, if at least one corresponding second target image is acquired from the first verification images and the second verification images respectively and compared with the first target image, whether the smoke and fire features in the first target image are a false detection is determined by combining the comparison results of all second target images with the first target image. Optionally, if, among the N second target images obtained from the first verification images, a specified number (e.g., one half) have similarities with the first target image that do not satisfy a preset requirement (e.g., a distance of 0.5), and at the same time, among the M second target images obtained from the second verification images, a specified number (e.g., one half) have similarities with the first target image that satisfy a preset requirement (e.g., a distance of 0.3), it is determined that smoke and fire features exist in the first target image; otherwise, it is determined that the smoke and fire features in the first target image are a false detection. A minimal sketch of this combined decision rule is given after this paragraph.
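As a hedged sketch of the combined rule (the function name, thresholds and vote fractions are the example values from the paragraphs above, assumed for illustration; the distances come from the similarity comparison described next):

    def verify_with_both_sets(dist_first, dist_second,
                              thr_first=0.5, thr_second=0.3,
                              frac_first=0.5, frac_second=0.5):
        """dist_first / dist_second: distances between the first target image and the
        second target images shot before / after the image to be detected
        (a smaller distance means more similar)."""
        # enough earlier crops must be DISSIMILAR (no smoke was detected there before)
        dissimilar_before = sum(d >= thr_first for d in dist_first)
        # enough later crops must be SIMILAR (the smoke persists in later frames)
        similar_after = sum(d <= thr_second for d in dist_second)
        return (dissimilar_before >= frac_first * len(dist_first)
                and similar_after >= frac_second * len(dist_second))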
Optionally, when the first target image is compared with each second target image, the similarity comparison is performed based on a Siamese network. A Siamese network usually adopts two identical network models whose parameters are completely shared; the two input images (the first target image and a second target image) are each mapped to an output vector, and the Euclidean distance between the two output vectors is calculated to obtain the similarity of the two input images.
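A minimal PyTorch sketch of such a weight-sharing comparison (the encoder architecture and embedding size are assumptions; the patent does not specify them):

    import torch
    import torch.nn as nn

    class SiameseNet(nn.Module):
        def __init__(self, embed_dim: int = 128):
            super().__init__()
            # both inputs pass through the SAME encoder, so the parameters are shared
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim),
            )

        def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
            za, zb = self.encoder(img_a), self.encoder(img_b)
            # Euclidean distance between the two output vectors: smaller = more similar
            return torch.norm(za - zb, dim=1)

    # usage: distance = SiameseNet()(first_target_batch, second_target_batch)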
It should be noted that, in an actual application scenario, when the similarity between the first target image and each second target image is compared, whether the detection is false may also be determined according to the actual similarity measurement method and a corresponding threshold; for example, when the comparison is based on cosine similarity, a similarity closer to 1 indicates that the first target image and the second target image are more similar.
In the embodiment of the application, when the detection result indicates that smoke and fire features exist in the image to be detected, the corresponding N second target images are acquired for similarity comparison with the first target image containing the smoke and fire features, so as to verify whether the smoke and fire features exist in the first target image and determine whether corresponding smoke and fire features exist in the image to be detected, thereby reducing the possibility of false detection and improving the accuracy of smoke and fire detection.
In some embodiments, in order to quickly verify whether the smoke and fire features in the first target image are a false detection, when the verification processing is performed based on the first target image, the first target image may be directly used as input to the smoke and fire detection model for a second detection. If the detection result of the smoke and fire detection model indicates that smoke and fire features exist in the first target image, it is determined that the smoke and fire features exist and there is no false detection; if the detection result indicates that no smoke and fire features exist in the first target image, it is determined that the smoke and fire features detected in the first target image of the image to be detected are a false detection, and the final detection result indicates that the image to be detected does not have smoke and fire features. The verification processing may also be performed in other manners (e.g., manual verification), which is not limited herein.
In some embodiments, the step B3 includes:
when the first target image is compared with each second target image, if the size of the first target image indicated by the detection result is smaller than a preset threshold, comparing images of a preset area intercepted based on the center points of the first target image and each second target image to obtain a final detection result.
Optionally, when a rotatable camera is used, the camera rotates while shooting, so that the shooting areas of images captured at successive shooting times have a certain offset; as a result, the positions of the same target in successive images are not completely consistent, and when the target is small, the deviation of its relative position is large. Therefore, in order to reduce the influence of position offset on small targets, when the first target image and the second target images are compared for verification, if the size of the first target image is smaller than a preset threshold (e.g., width or height smaller than 30 pixels), images of a preset area (e.g., 80% of the height and width) are intercepted from the first target image and each second target image based on their center points and compared, so as to obtain the final detection result. For example, as shown in fig. 3, the left side shows the original first target image and second target image, and the right side shows the images of the preset area intercepted based on their center points; here a preset area of two thirds of the width and height of the first target image is used for comparison (the first target image and the second target image have the same size).
In the embodiment of the application, because the shooting areas of different images captured by a rotating camera are not completely consistent, the position offset of a small target in the images is obvious and can affect the verification result. Therefore, when the size of the first target image is smaller than the preset threshold, images of the preset area are intercepted based on the image center points and compared, thereby reducing the error caused by the position offset of small targets, as sketched below.
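A minimal sketch of this center-crop comparison, assuming NumPy H × W × C arrays and the example 30-pixel and 80% values above:

    import numpy as np

    def center_crop(img: np.ndarray, frac: float = 0.8) -> np.ndarray:
        """Crop a centered region covering `frac` of the width and height."""
        h, w = img.shape[:2]
        ch, cw = int(h * frac), int(w * frac)
        top, left = (h - ch) // 2, (w - cw) // 2
        return img[top:top + ch, left:left + cw]

    def maybe_crop_for_small_target(first_img, second_img, min_size=30, frac=0.8):
        # for small targets, compare only the centered regions to absorb position offset
        if min(first_img.shape[0], first_img.shape[1]) < min_size:
            return center_crop(first_img, frac), center_crop(second_img, frac)
        return first_img, second_img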
In some embodiments, before the step B3, the method further includes:
calculating the average brightness of the first target image and of each second target image respectively, and calculating the brightness difference between the first target image and each second target image based on the average brightness, so as to adjust the brightness of each corresponding second target image according to the brightness difference.
Optionally, since there is a certain interval between the shooting times of the first target image and a second target image, changes in sunlight and the like cause the brightness of the captured images to differ, and differences in image brightness change the measured similarity of the images. Therefore, in order to reduce the influence of image brightness and improve the accuracy of smoke and fire detection, when the first target image is compared with each corresponding second target image, the average brightness of the first target image and of each second target image is calculated respectively, and the brightness difference between each second target image and the first target image is calculated based on their average brightness, so that the brightness of each second target image is adjusted according to its brightness difference, reducing the interference of brightness on the similarity between each second target image and the first target image. For example, as shown in fig. 4, the two images in the first row on the left are the first target image and a second target image before brightness adjustment, and the two images in the second row are the corresponding first target image and second target image after brightness adjustment.
In the embodiment of the application, differences in image brightness affect the measured similarity of images, which influences the comparison between the first target image and the second target images and reduces the verification accuracy. Therefore, the brightness of the second target images is adjusted to reduce the brightness difference, thereby reducing the deviation in the verification result caused by image brightness differences. A minimal sketch of such an adjustment is given below.
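This sketch assumes uint8 image arrays and a simple additive mean correction (the patent does not fix the correction formula):

    import numpy as np

    def match_brightness(first_img: np.ndarray, second_img: np.ndarray) -> np.ndarray:
        """Shift the second image so its average brightness matches the first image's."""
        diff = first_img.mean() - second_img.mean()  # brightness difference
        adjusted = np.clip(second_img.astype(np.float32) + diff, 0, 255)
        return adjusted.astype(np.uint8)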
In some embodiments, if the final detection result still indicates that corresponding smoke and fire features exist in the image to be detected, smoke and fire warning information is generated according to the information corresponding to the image to be detected (such as the area where the camera is located and the presence of smoke and fire), the warning information is stored, and the warning information is sent to the corresponding client through HTTP (HyperText Transfer Protocol) for warning, so that the user can handle it in time and the influence of the fire is reduced.
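As an illustrative sketch using only the Python standard library (the endpoint URL and payload fields are assumptions; the text only states that warning information is sent over HTTP):

    import json
    import urllib.request

    def send_fire_alarm(endpoint_url: str, camera_area: str, position, timestamp: str) -> int:
        """POST smoke and fire warning information to a client endpoint over HTTP."""
        payload = {"event": "smoke_fire", "area": camera_area,
                   "position": position, "time": timestamp}
        req = urllib.request.Request(
            endpoint_url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status  # 200 indicates the warning was delivered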
The following illustrates the overall detection steps of the smoke detection model in some embodiments.
As shown in fig. 5, fig. 5 is a network structure of a smoke and fire detection model based on the YOLOv5 network, and the smoke and fire detection model includes an input module, a backbone network, a fusion module, and a prediction module.
The backbone network comprises 3 convolutional layers Conv and 3 feature extraction modules C3-CA. The first convolutional layer is connected to the input module and takes the initial feature map output by the input module as input; the first C3-CA module connects the output of the first convolutional layer to the input of the second convolutional layer, that is, the convolutional layers and the C3-CA modules are connected alternately in sequence. Conv denotes a convolutional layer, used to perform the convolution operation; the C3 module denotes the C3 module of the YOLOv5 network, used for feature extraction; and the C3-CA module denotes a C3 module embedded with CA (coordinate attention), used for feature extraction based on an attention mechanism.
The fusion module comprises 4 fusion units (fusion unit 1, fusion unit 2, fusion unit 3 and fusion unit 4). Fusion unit 1 and fusion unit 2 have the same structure, each comprising, in sequence, a C3 module, a weighted fusion 2 module, an up-sampling module and a Conv module; fusion unit 3 comprises, in sequence, a Conv module, a weighted fusion 3 module and a C3 module; and fusion unit 4 comprises, in sequence, a Conv module, a weighted fusion 2 module and a C3 module. The weighted fusion 2 module indicates that the fusion unit adopts two network layers and trains two weight parameters to perform weighted fusion on the input feature maps (first feature maps or fusion feature maps), and the weighted fusion 3 module indicates that the fusion unit adopts three network layers and trains three weight parameters to perform weighted fusion on the input feature maps. The C3 module of the YOLOv5 network is the prior art and is not described here again.
For example, assuming that the size of the image to be detected is 1280 × 1280 with 3 channels, the input module preprocesses the image to be detected to obtain a 320 × 320 initial feature map with 12 channels, which is input into the backbone network.
In the backbone network, the first convolutional layer performs a convolution operation on the initial feature map to obtain a feature map with the size of 80 × 80, which is input to the first C3-CA module; the C3-CA module performs feature extraction on it based on the attention mechanism and outputs a first feature map with the size of 80 × 80. The second convolutional layer takes the 80 × 80 first feature map and performs a convolution operation on it to obtain a feature map with the size of 40 × 40, and the second C3-CA module performs feature extraction on the 40 × 40 feature map based on the attention mechanism, and so on, so that first feature maps down-sampled by 4 times, 8 times and 16 times, namely first feature maps with the sizes of 80 × 80, 40 × 40 and 20 × 20, are obtained respectively.
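As an illustrative check of the size arithmetic in this walk-through (the layer definitions and channel counts are placeholders assumed for illustration, not the patent's exact layers):

    import torch
    import torch.nn as nn

    def backbone_shape_walkthrough():
        x = torch.zeros(1, 12, 320, 320)  # initial feature map from the input module
        sizes = []
        for stride, out_ch in [(4, 128), (2, 256), (2, 512)]:
            x = nn.Conv2d(x.shape[1], out_ch, 3, stride=stride, padding=1)(x)  # Conv
            # the subsequent C3-CA feature extraction keeps the spatial size unchanged
            sizes.append(tuple(x.shape[2:]))
        return sizes  # [(80, 80), (40, 40), (20, 20)] -> 4x, 8x, 16x down-sampling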
The fusion module acquires the first feature maps of all sizes and performs fusion processing on them. Specifically, fusion unit 1 of the fusion module performs an up-sampling operation on the input 40 × 40 first feature map to obtain a feature map with the size of 80 × 80, and then performs weighted feature fusion, in ADD mode, between this feature map and the 80 × 80 first feature map output by the C3-CA module of the backbone network, obtaining a fusion feature map with the size of 80 × 80 and 256 channels. In fusion unit 2, an up-sampling operation is performed on the 20 × 20 feature map to obtain a feature map with the size of 40 × 40, which is weighted-fused, in ADD mode, with the 40 × 40 first feature map output by the C3-CA module of the backbone network. In fusion unit 3, the 80 × 80 fusion feature map output by fusion unit 1 is acquired and a convolution operation is performed on it to obtain a feature map with the size of 40 × 40, which is weighted-fused, in ADD mode, with the 40 × 40 first feature map output by the backbone network and the 40 × 40 feature map output by fusion unit 2, obtaining a fusion feature map with the size of 40 × 40 and 512 channels. In fusion unit 4, the 40 × 40 fusion feature map obtained by fusion unit 3 is acquired and a convolution operation is performed on it to obtain a 20 × 20 feature map, which is weighted-fused, in ADD mode, with the 20 × 20 first feature map output by the backbone network, obtaining a fusion feature map with the size of 20 × 20 and 1024 channels.
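A minimal sketch of weighted feature fusion in ADD mode (the text says the fusion units train two or three weight parameters; the non-negativity constraint and normalization used here are assumptions):

    import torch
    import torch.nn as nn

    class WeightedAdd(nn.Module):
        """Weighted fusion of n feature maps of identical shape:
        out = sum_i(w_i * x_i) / (sum_i w_i + eps), with the w_i learned in training."""
        def __init__(self, n_inputs: int, eps: float = 1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))
            self.eps = eps

        def forward(self, xs):
            w = torch.relu(self.w)        # keep the weights non-negative
            w = w / (w.sum() + self.eps)  # normalize so the weights sum to ~1
            return sum(wi * xi for wi, xi in zip(w, xs))

    # e.g. a "weighted fusion 2" module fuses two maps, "weighted fusion 3" fuses three:
    fuse2, fuse3 = WeightedAdd(2), WeightedAdd(3)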
The prediction module acquires the fusion feature maps of sizes 80 × 80 × 256, 40 × 40 × 512 and 20 × 20 × 1024 generated by the fusion module, and performs detection based on the fusion feature maps of different sizes respectively to obtain the detection result of the image to be detected.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example two:
fig. 6 shows a block diagram of a smoke and fire detection device provided in an embodiment of the present application, corresponding to the smoke and fire detection method described in the above embodiment, and only the parts related to the embodiment of the present application are shown for convenience of description.
Referring to fig. 6, the apparatus includes: a detection module 61 and a verification module 62.
the detection module 61 is configured to input an image to be detected into a trained smoke and fire detection model for detection, so as to obtain a detection result output by the smoke and fire detection model, where the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used to indicate whether smoke and fire features exist in the image to be detected, and when the smoke and fire features exist in the image to be detected, a first target image and position information of the first target image are output, where the first target image is an image corresponding to an area including the smoke and fire features, and is captured from the image to be detected;
and a verification module 62, configured to perform a verification process based on the first target image to obtain a final detection result if the detection result indicates that the image to be detected has a smoke and fire feature, wherein the verification process is used to determine whether the smoke and fire feature exists, and the final detection result indicates whether the image to be detected has the smoke and fire feature and outputs position information of the smoke and fire feature when the image to be detected has the smoke and fire feature.
In the embodiment of the application, a smoke and fire detection model based on an attention mechanism is adopted to perform multi-scale detection on the image to be detected and output a corresponding detection result; when it is detected that smoke and fire features exist in the image to be detected, the first target image containing the smoke and fire features and the position information of the first target image are output, and verification processing is performed based on the corresponding first target image to determine, through verification, whether smoke and fire features exist in the image to be detected, so as to obtain a final detection result indicating whether smoke and fire features exist in the image to be detected, thereby reducing the possibility of false detection. Because the attention mechanism makes the smoke and fire detection model pay more attention to smoke and fire features during detection, the calculation cost is reduced; moreover, multi-scale detection can detect large targets and small targets simultaneously, so that the smoke and fire detection model can detect smoke and fire of different sizes. Therefore, performing multi-scale detection with the attention-based smoke and fire detection model speeds up detection and improves detection accuracy at the same time, so that a user can discover the occurrence of a forest fire in time through the smoke and fire detection model and the influence of forest fires is reduced.
In some embodiments, the above-described smoke and fire detection device further comprises:
the model base unit is used for constructing a smoke and fire detection model;
and the model training unit is used for training the firework detection model.
In some embodiments, the smoke and fire detection device further comprises:
and the image to be detected acquiring unit is used for acquiring an image to be detected.
In some embodiments, the detection module 61 includes:
the preprocessing unit is used for preprocessing an input image to be detected to obtain an initial characteristic diagram of the image to be detected;
the characteristic extraction unit is used for carrying out multi-scale characteristic extraction on the initial characteristic diagram to obtain first characteristic diagrams with different sizes;
the fusion unit is used for fusing the first feature maps with different sizes to obtain a fusion feature map;
and the prediction unit is used for detecting the fused feature map and outputting a detection result.
In some embodiments, the feature extraction unit includes:
an attention map generation unit, configured to encode each channel of the initial feature map along a horizontal direction and a vertical direction, respectively, to obtain two attention maps in different directions;
the characteristic extraction subunit is used for carrying out characteristic extraction on the initial characteristic diagram to obtain an intermediate characteristic diagram;
and the attention embedding unit is used for embedding the two attention maps into the extracted intermediate feature map along the horizontal direction and the vertical direction respectively to obtain a first feature map containing different direction perceptions.
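A possible realization of this direction-aware attention, following the coordinate-attention design that the C3-CA description suggests (the reduction ratio, activation choices and pooling details are assumptions):

    import torch
    import torch.nn as nn

    class CoordAttention(nn.Module):
        """Encode each channel along the horizontal and vertical directions, then
        embed the two directional attention maps back into the features."""
        def __init__(self, channels: int, reduction: int = 32):
            super().__init__()
            mid = max(8, channels // reduction)
            self.conv1 = nn.Conv2d(channels, mid, 1)
            self.act = nn.ReLU()
            self.conv_h = nn.Conv2d(mid, channels, 1)
            self.conv_w = nn.Conv2d(mid, channels, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            n, c, h, w = x.shape
            x_h = x.mean(dim=3, keepdim=True)                          # pool along W -> (n, c, h, 1)
            x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)      # pool along H -> (n, c, w, 1)
            y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))     # joint encoding
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                      # vertical attention map
            a_w = torch.sigmoid(self.conv_w(y_w)).permute(0, 1, 3, 2)  # horizontal attention map
            return x * a_h * a_w                                       # embed both directions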
In some embodiments, the detection module 61 further includes:
a first feature map acquisition unit, configured to acquire at least two first feature maps of different sizes extracted by the feature extraction module;
a sampling unit, configured to perform upsampling or convolution processing on each of the first feature maps to obtain second feature maps of different sizes corresponding to the first feature maps of different sizes, where the size of the second feature map corresponding to the first feature map subjected to upsampling or convolution processing is the same as the size of one first feature map not subjected to upsampling or convolution processing;
and the weighted fusion unit is used for respectively carrying out weighted feature fusion on the specified second feature map and the specified first feature map to obtain fusion feature maps with different sizes, wherein the sizes of the specified first feature map and the second feature map which are subjected to weighted feature fusion are the same.
In some embodiments, the verification module 62 includes:
a verification image acquisition unit, configured to acquire N verification images if the detection result indicates that the image to be detected has a smoke and fire feature, where the verification images are images whose sampling times are adjacent to those of the image to be detected, and N is greater than or equal to 1;
a second target image obtaining unit, configured to obtain at least one second target image, where the second target image is an image corresponding to a target at a target position in each of the verification images, and the target position is a position corresponding to the position information;
and the comparison unit is used for comparing the first target image with each second target image to obtain a final detection result.
In some embodiments, the verification module 62 further comprises:
and a preset area comparison unit, configured to, when the first target image is compared with each of the second target images, compare images of a preset area captured based on center points of the first target image and each of the second target images to obtain a final detection result if a size of the first target image indicated by the detection result is smaller than a preset threshold.
In some embodiments, the verification module 62 further comprises:
and a brightness adjusting unit for calculating average brightness of the first target image and the second target images, respectively, and calculating brightness difference between the first target image and the second target images based on the average brightness to adjust brightness of the second target images according to the brightness difference.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Example three:
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one processor is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the steps of any of the various method embodiments described above being implemented when the computer program 72 is executed by the processor 70.
Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into the detection module 61 and the verification module 62, and the specific functions of the modules are as follows:
the detection module 61 is configured to input an image to be detected into a trained smoke and fire detection model for detection, and obtain a detection result output by the smoke and fire detection model, where the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used to indicate whether a smoke and fire feature exists in the image to be detected, and when the smoke and fire feature exists in the image to be detected, a first target image and position information of the first target image are output, where the first target image is an image corresponding to an area including the smoke and fire feature, and is captured from the image to be detected;
and a verification module 62, configured to perform verification processing based on the first target image to obtain a final detection result if the detection result indicates that the image to be detected has a smoke and fire feature, where the verification processing is configured to determine whether the smoke and fire feature exists, and the final detection result indicates whether the image to be detected has the smoke and fire feature, and output position information of the smoke and fire feature when the image to be detected has the smoke and fire feature.
The terminal device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 70, a memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal device 7, and does not constitute a limitation to the terminal device 7, and may include more or less components than those shown, or combine some components, or different components, for example, and may further include input/output devices, network access devices, and the like.
The Processor 70 may be a Central Processing Unit (CPU), and the Processor 70 may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may in some embodiments be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal device, recording medium, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signals, telecommunication signals, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In some jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and proprietary practices.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A method of fire and smoke detection, comprising:
inputting an image to be detected into a trained smoke and fire detection model for detection to obtain a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire characteristics exist in the image to be detected, and outputting a first target image and position information of the first target image when the smoke and fire characteristics exist in the image to be detected, and the first target image is an image corresponding to an area which is intercepted from the image to be detected and contains the smoke and fire characteristics;
if the detection result indicates that the image to be detected has the firework characteristic, verification processing is carried out based on the first target image to obtain a final detection result, the verification processing is used for determining whether the firework characteristic exists, the final detection result is used for indicating whether the image to be detected has the firework characteristic, and position information of the firework characteristic is output when the image to be detected has the firework characteristic.
2. The smoke and fire detection method according to claim 1, wherein the smoke and fire detection model comprises an input module, a backbone network, a fusion module and a prediction module, the input module is used for preprocessing the image to be detected to obtain an initial feature map, the backbone network is used for performing multi-scale feature extraction on the initial feature map to obtain first feature maps with different sizes, the fusion module is used for fusing the first feature maps to obtain a fused feature map, and the prediction module is used for detecting the fused feature map;
the image input of waiting to detect detects through the firework detection model of training, obtains the testing result of firework detection model output includes:
preprocessing an input image to be detected based on the input module to obtain an initial characteristic diagram of the image to be detected;
performing multi-scale feature extraction on the initial feature map based on the backbone network to obtain first feature maps with different sizes;
fusing the first feature maps with different sizes based on the fusion module to obtain a fused feature map;
and detecting the fusion characteristic diagram based on the prediction module, and outputting a detection result.
3. The smoke and fire detection method of claim 2, wherein the backbone network comprises at least two feature extraction modules, different ones of the feature extraction modules performing feature extractions of different sizes based on an attention mechanism;
when each feature extraction module performs feature extraction on the initial feature map, the method includes:
coding each channel of the initial characteristic diagram along the horizontal direction and the vertical direction respectively to obtain two attention diagrams in different directions;
extracting the characteristics of the initial characteristic diagram to obtain an intermediate characteristic diagram;
embedding the two attention diagrams into the intermediate feature diagram respectively along the horizontal direction and the vertical direction to obtain a first feature diagram containing different direction perceptions.
4. The smoke and fire detection method of claim 3, wherein the fusing the first feature maps of different sizes based on the fusion module to obtain a fused feature map comprises:
acquiring the first feature maps with different sizes extracted by at least two feature extraction modules;
respectively performing upsampling or convolution processing on each first feature map to obtain a second feature map corresponding to each first feature map, wherein the size of the second feature map corresponding to the first feature map subjected to the upsampling or convolution processing is the same as the size of one first feature map not subjected to the upsampling or convolution processing;
and respectively carrying out weighted feature fusion on the specified second feature map and the specified first feature map to obtain fused feature maps with different sizes, wherein the first feature map subjected to weighted feature fusion and the second feature map are the same in size.
5. The smoke and fire detection method according to any one of claims 1 to 4, wherein if the detection result indicates that smoke and fire characteristics exist in the image to be detected, performing verification processing based on the first target image to obtain a final detection result comprises:
if the detection result indicates that the image to be detected has smoke and fire characteristics, acquiring N frames of verification images, wherein the verification images are images with sampling time adjacent to that of the image to be detected, and N is greater than or equal to 1;
acquiring at least one second target image, wherein the second target image is an image corresponding to a target at a target position in each verification image, and the target position is a position corresponding to the position information;
and comparing the first target image with each second target image to obtain a final detection result.
6. The smoke and fire detection method according to claim 5, wherein when the first target image is compared with each of the second target images, if the size of the first target image is smaller than a preset threshold, images of a preset area intercepted based on the center points of the first target image and each of the second target images are compared to obtain the final detection result.
7. The smoke and fire detection method of claim 5, further comprising, prior to said comparing the first target image to each of the second target images:
and respectively calculating the average brightness of the first target image and each second target image, and calculating the brightness difference of the first target image and each second target image based on the average brightness so as to adjust the brightness of each corresponding second target image according to the brightness difference.
8. A smoke and fire detection device, comprising:
the detection module is used for inputting an image to be detected into a trained smoke and fire detection model for detection to obtain a detection result output by the smoke and fire detection model, wherein the smoke and fire detection model performs multi-scale detection based on an attention mechanism, the detection result is used for indicating whether smoke and fire features exist in the image to be detected, and outputting a first target image and position information of the first target image when the smoke and fire features exist in the image to be detected, and the first target image is an image corresponding to an area which is intercepted from the image to be detected and contains the smoke and fire features;
and the verification module is used for performing verification processing based on the first target image to obtain a final detection result if the detection result indicates that the image to be detected has smoke and fire features, wherein the verification processing is used for determining whether the smoke and fire features exist, and the final detection result is used for indicating whether the image to be detected has smoke and fire features and outputting position information of the smoke and fire features when the image to be detected has smoke and fire features.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.