CN117315551B - Method and computing device for flame alerting - Google Patents

Method and computing device for flame alerting

Info

Publication number
CN117315551B
Authority
CN
China
Prior art keywords
flame
frames
video
target
classification
Prior art date
Legal status
Active
Application number
CN202311609035.0A
Other languages
Chinese (zh)
Other versions
CN117315551A (en)
Inventor
彭一洋
熊超
牛昕宇
Current Assignee
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to CN202311609035.0A
Publication of CN117315551A
Application granted
Publication of CN117315551B


Classifications

    • G06V 20/46: Scenes; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/454: Extraction of image or video features; local feature extraction; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks
    • G06V 20/52: Scenes; context or environment of the image; surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G08B 17/125: Fire alarms; actuation by presence of radiation or particles, by using a video camera to detect fire or smoke

Abstract

The invention provides a method and a computing device for flame alerting. The method comprises the following steps: acquiring a surveillance video; detecting continuous flame frames from the surveillance video through a pre-trained flame detection model; cropping the continuous flame frames to form a predetermined number of target flame video frames; inputting the predetermined number of target flame video frames into a pre-trained flame classification model; classifying the predetermined number of target flame video frames by the flame classification model to obtain a classification result; and judging whether to raise an alarm according to the classification result. The technical scheme of the invention reduces flame misjudgment, saves monitoring cost, has a wider application range, and can effectively reduce the personal and property losses caused by fire.

Description

Method and computing device for flame alerting
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and computing equipment for flame alarm.
Background
With the development of artificial intelligence technology and the increasing popularity of monitoring devices, image data resources are growing rapidly. Images are a core carrier of information and contain rich content, and the rapid development of computer vision technology in recent years has made their fields of application very broad, including security protection, safety supervision, the chemical industry, gas stations and the like. Flame identification and alerting are important measures for safeguarding urban management and safe production, and can effectively avoid the personal and property losses that a fire may cause.
Therefore, a technical solution is needed that can accurately identify flames and raise an alarm in time, safeguard personal and property safety, provide an important technical guarantee for safe production, help discover fires in time and effectively reduce the losses caused by fire.
Disclosure of Invention
The invention aims to provide a method and a computing device for flame alerting that can discover fires in time and effectively reduce the losses caused by fire.
According to an aspect of the present invention, there is provided a method of flame warning, the method comprising:
acquiring a surveillance video;
detecting continuous flame frames from the surveillance video through a pre-trained flame detection model;
cropping the continuous flame frames to form a predetermined number of target flame video frames;
inputting the predetermined number of target flame video frames into a pre-trained flame classification model;
classifying the predetermined number of target flame video frames by the flame classification model to obtain a classification result;
and judging whether to raise an alarm according to the classification result.
According to some embodiments, detecting consecutive flame frames from the surveillance video includes:
decoding the surveillance video into a plurality of frame images;
inputting the plurality of frame images into the flame detection model;
and if the flame detection model outputs a detection box for each of a predetermined number of consecutive frame images and the intersection of these detection boxes is not empty, confirming that the continuous flame frames are detected and outputting the corresponding detection boxes.
According to some embodiments, cropping the continuous flame frames to form a predetermined number of target flame video frames includes:
and cutting the corresponding regions out of the corresponding frame images by using the detection boxes to obtain the predetermined number of target flame video frames, as sketched below.
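As an illustration of this step, the following minimal sketch (with hypothetical helper names; the detector is assumed to return one axis-aligned box per frame as (x1, y1, x2, y2) pixel coordinates, and the predetermined number is 4) first checks that the consecutive detection boxes intersect and then cuts the circumscribed rectangle out of every frame:

    def boxes_intersect(boxes):
        """Return the common intersection of (x1, y1, x2, y2) boxes, or None if it is empty."""
        x1 = max(b[0] for b in boxes); y1 = max(b[1] for b in boxes)
        x2 = min(b[2] for b in boxes); y2 = min(b[3] for b in boxes)
        return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

    def circumscribed_box(boxes):
        """Smallest rectangle enclosing the detection boxes of all frames."""
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))

    def crop_flame_clip(frames, boxes):
        """If every frame has a box and the boxes intersect, cut the same circumscribed
        region out of each frame (frames are H x W x 3 arrays from the video decoder)."""
        if any(b is None for b in boxes) or boxes_intersect(boxes) is None:
            return None                                  # not confirmed as continuous flame frames
        x1, y1, x2, y2 = (int(v) for v in circumscribed_box(boxes))
        return [f[y1:y2, x1:x2] for f in frames]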
According to some embodiments, classifying the predetermined number of target flame video frames by the flame classification model includes:
extracting spatial features from the predetermined number of target flame video frames;
extracting temporal features from the predetermined number of target flame video frames;
fusing the spatial features and the temporal features to calculate a classification score;
and determining a classification result according to the classification score.
According to some embodiments, extracting spatial features for the predetermined number of target flame video frames comprises:
performing a convolution operation on the predetermined number of target flame video frames input in the H×nW×C format and outputting a first feature map, wherein H is the image pixel height, W is the image pixel width, C is the number of channels, and n is the predetermined number;
and performing a first two-dimensional convolution operation on the first feature map to extract the flame spatial features.
According to some embodiments, the convolution kernel of the first two-dimensional convolution operation is a k x k convolution kernel for spatial feature extraction, k being an integer greater than 1.
According to some embodiments, extracting temporal features from the predetermined number of target flame video frames comprises:
reshaping the first feature map, converting the first feature map from the H×nW×C format into the nH×W×C format, where nH represents an alternating interleaving of the H-direction data from the different W portions of the first feature map;
and performing a second two-dimensional convolution operation on the reshaped first feature map and outputting a second feature map.
According to some embodiments, the method further comprises: reshaping the second feature map to obtain a third feature map in the H×nW×C format;
and processing the third feature map with a fully-connected operator so as to fuse the spatial features and the temporal features.
According to some embodiments, the flame detection model is derived by pre-training a yolov4 neural network.
According to another aspect of the present invention, there is provided a computing device comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method of any one of the preceding claims.
According to another aspect of the invention there is provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method of any of the above.
According to embodiments of the invention, flames are identified by a trained flame detection model, the identified flames are classified by a flame classification model, and an alarm decision is made on the classification result. Compared with a method that uses flame detection alone, this better reduces flame misjudgment and identifies flames and raises alarms more accurately and more quickly.
According to some embodiments, the flame classification model extracts spatial and temporal features from the target flame video frames, fuses the spatial and temporal features to calculate a classification score, and determines the classification result from the classification score. Both the spatial appearance features of the flame and its temporal dynamic features are taken into account, which improves the accuracy of flame identification, reduces flame misjudgment and false alarms, and reduces the personal and property losses caused by fire.
According to some embodiments, the first feature map in the H×nW×C format is reshaped into the nH×W×C format, with the data in the height direction interleaved alternately, and a second two-dimensional convolution operation is performed on the reshaped first feature map to output a second feature map. Two-dimensional convolution is used to simulate three-dimensional convolution, which is friendly to most AI inference chips on the market, widens the range of application, and lowers the cost of flame monitoring and alerting.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 illustrates a method flow diagram for flame alerting in accordance with an example embodiment.
FIG. 2 shows a schematic diagram of a module that simulates three-dimensional convolution with two-dimensional convolution, according to an example embodiment.
FIG. 3 shows a schematic diagram of the module combination of the flame video classification network, according to an example embodiment.
FIG. 4 shows a schematic diagram of flame detection and classification according to an example embodiment.
FIG. 5 illustrates a flow chart of flame identification and alarm according to an example embodiment.
Fig. 6a shows a schematic diagram of a first feature map according to an example embodiment.
Fig. 6b shows a schematic diagram of a third feature map according to an example embodiment.
Fig. 7a shows a schematic diagram of an overall network structure according to an example embodiment.
Fig. 7b shows a schematic diagram of a space-time-residual module according to an example embodiment.
Fig. 7c shows a schematic diagram of a space-time-residual/2 module according to an example embodiment.
Fig. 7d shows a schematic diagram of a space-time-residual/4 module according to an example embodiment.
Fig. 7e shows a schematic diagram of a space-time-bottleneck module according to an example embodiment.
FIG. 8 illustrates a block diagram of a computing device in accordance with an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another element. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the present inventive concept. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation entries for the user to select authorization or rejection.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the invention and therefore should not be taken to limit the scope of the invention.
Fire accidents cause avoidable financial losses and casualties, and a fire is easiest to control at the early stage of its occurrence, so effectively monitoring the places where such disasters may occur and reducing the potential losses to a minimum is very important. Traditional flame identification has considerable limitations, such as high requirements on the capture equipment and the data transmission rate, and it is prone to misjudgment.
With the development of computer vision, image processing techniques based on deep learning have matured and are widely used in many aspects of today's society. Such techniques can automatically, quickly and accurately identify fire flames and raise alarms in scenes covered by surveillance cameras, so that a response can be made at the initial stage of a fire, more time is won, and losses are minimized.
Therefore, the invention provides a flame alerting method, which identifies the dynamics and morphology of flames by training a flame detection model and a flame classification model, performs flame classification, calculates a flame score, and then judges whether an early warning is needed. The method considers both the spatial appearance features and the temporal dynamic features of the flame, simulates three-dimensional convolution with two-dimensional convolution, and is friendly to AI inference chips.
Before describing embodiments of the present invention, some terms or concepts related to the embodiments of the present invention are explained.
Reshape operator: the reshape operator is a common operation in deep learning and a basic tensor operation that adjusts a tensor to a specified shape so that the data can match the input requirements of a model. In deep learning, three- or four-dimensional tensors are typically used to represent input data.
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 illustrates a method flow diagram for flame alerting in accordance with an example embodiment.
Referring to fig. 1, in S101, a monitoring video is acquired.
According to some embodiments, a monocular camera is prepared, whose focal length, position and angle are fixed for video capture.
According to some embodiments, the video is decoded and converted into picture frames, for example as sketched below.
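As a sketch of the decoding step (assuming an OpenCV-readable file path or RTSP URL; the frame stride is an illustrative parameter):

    import cv2

    def decode_frames(source, step=1):
        """Decode a surveillance video into BGR picture frames, one every `step` frames."""
        cap = cv2.VideoCapture(source)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                yield frame                  # numpy array of shape (H, W, 3)
            idx += 1
        cap.release()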
At S103, consecutive flame frames are detected from the surveillance video.
According to some embodiments, the flame detection model detects whether 4 consecutive flame frames are present among the picture frames obtained in S101. If at least one of the 4 consecutive frames contains no flame detection result, no alarm is given.
According to some embodiments, each video frame is a complete surveillance image in which at most one region may contain fire, and yolov4 detects the region that may contain fire; when all 4 frames contain detected fire, the circumscribed rectangle of the 4 fire detection boxes is used to cut the fire region out of the 4 frames as the 4 video frames needed for flame classification.
At S105, the continuous flame frames are cropped to form a predetermined number of target flame video frames.
According to some embodiments, the yolov4 flame detection network detects the fire region of each complete surveillance image. When fire is detected in 4 consecutive frames, the region covered by the circumscribed rectangle of the 4 fire detection boxes is cut out of each of the 4 frames to serve as the 4 video frames required for flame classification, and these cropped flame images form the video clip.
According to some embodiments, owing to the chip's constraints on input width and height, inference is performed once per group of 4 frames of 224x224 images composed in the form shown in FIG. 6a.
According to some embodiments, the flame detection model is derived by pre-training a yolov4 neural network. The yolov4 flame detection network is initialized with an open-source yolov4 model trained on the 80-class COCO public data set; the training data are annotated with rectangular flame detection boxes, and after training on these data the model initially has the capability to detect flames.
According to some embodiments, the video clips may be used for flame detection training and flame classification training.
At S107, the predetermined number of target flame video frames are input into a flame classification model.
According to some embodiments, when 4 consecutive frames each have a detection box and the 4 detection boxes intersect, the detection of continuous flame frames is confirmed; the fire region of the 4 frames is then cut out using the circumscribed rectangle and used as the input of the flame classification model.
According to some embodiments, the frames used to classify flames may overlap. For example, in a video containing fire, flame classification may be performed on frames 1, 2, 3, 4 for the first inference, frames 2, 3, 4, 5 for the second inference, frames 3, 4, 5, 6 for the third inference, and so on, as sketched below.
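A sketch of this overlapping four-frame windowing (the window length n = 4 follows the embodiments above; the generator itself is an illustrative helper):

    from collections import deque

    def sliding_windows(frames, n=4):
        """Yield overlapping n-frame windows: (1,2,3,4), (2,3,4,5), (3,4,5,6), ..."""
        window = deque(maxlen=n)
        for frame in frames:
            window.append(frame)
            if len(window) == n:
                yield list(window)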
According to some embodiments, flame classification is performed not on the full original video frames but only on the regions of the video frames inside the detection boxes. When 4 consecutive frames each have a detection box and the 4 detection boxes intersect, the fires in the 4 frames are regarded as one fire, and the fire region of the 4 frames is cut out with the circumscribed rectangle and used as the input of flame classification.
And in S109, classifying the target flame video frames of the preset number through the flame classification model to obtain a classification result.
According to some embodiments, the flame video classification network uses two-dimensional convolution to simulate three-dimensional convolution.
According to some embodiments, spatial features are first extracted for 4 frames of target flame video frames, and then temporal features are extracted for the target flame video frames. The spatial features and the temporal features are fused for calculating a classification score, and a classification result is determined according to the classification score.
At S111, it is determined whether or not to alarm based on the classification result.
The flame classification model classifies targets into flame and non-flame and predicts a score that the detected object is a flame. After sufficient training, the output score for a flame target tends to 1 and the output score for a non-flame target tends to 0.
FIG. 2 shows a schematic diagram of a two-dimensional convolution simulation three-dimensional convolution module in accordance with an example embodiment.
Referring to fig. 2, the module is named the space-time-reshape module. It takes 4 horizontally tiled pictures as input (hence the width 4W), the 4 pictures being arranged in chronological order from left to right. The output of an intermediate layer of the flame classification network is called a feature map; for example, the output of a convolution layer is a feature map.
According to some embodiments, the space-time-reshape module is the fundamental module of the network: the first convolution is intended to extract spatial features, the second convolution after the first reshape operator is intended to extract temporal features, and the second reshape operator simply transforms the feature map back to its original shape.
The S in the figure denotes a variable part: for space-time-reshape/S1 (i.e., S=1) the stride of the second convolution in the height direction is 1, and for space-time-reshape/S2 (i.e., S=2) that stride is 2. The S1 variant preserves the temporal resolution, while the S2 variant downsamples it. In the S2 case (see, e.g., fig. 7a), the 2nd convolution downsamples once in time, the time dimension being hidden in the height dimension, so the shape output by the downsampling convolution is 2H×W×C and the shape output after the 2nd reshape operator is H×2W×C.
As shown in fig. 6a, each square denotes a feature vector of length C, different letters denote frames from different moments, and each frame has H×W pixels. The method of the present invention processes 4 frames simultaneously, so the input is H×4W×C, where H is the image pixel height, W is the image pixel width, and C is the number of channels. The numbers in the squares are row indices; the example of fig. 6a is drawn in simplified form with 5 rows.
According to some embodiments, a convolution operation is performed on the 4 target flame video frames input in the H×nW×C format, and a first feature map is output. A first two-dimensional convolution operation is then performed on the first feature map to extract the flame spatial features.
According to some embodiments, the convolution kernel of the first two-dimensional convolution operation is a k x k convolution kernel used for spatial feature extraction, k being an integer greater than 1. This two-dimensional convolution over the first feature map extracts the flame spatial features.
According to some embodiments, the first feature map is reshaped, converting it from the H×nW×C format into the nH×W×C format, where nH denotes the alternating interleaving of the H-direction data from the different W portions of the first feature map. A second two-dimensional convolution operation is then performed on the reshaped first feature map, and a second feature map is output.
According to some embodiments, the second feature map is reshaped to obtain a third feature map in the H×nW×C format. The third feature map is then processed by a fully-connected operator so as to fuse the spatial features and the temporal features.
The convolution kernel of the 1st two-dimensional convolution has a width and height of 3x3 and slides along the width and height directions of the feature map. When the width and height are relatively large, the 3x3 kernel mainly covers features of the same frame, so this two-dimensional convolution over the first feature map mainly extracts spatial features such as the colour and shape of the flame.
The 1st reshape operator reshapes the H×4W×C features into the shape 4H×W×C. As shown in fig. 6b, within each block row the frame-B row immediately follows the frame-A row, the frame-C row follows the frame-B row, the frame-D row follows the frame-C row, and the blocks of the second row follow those of the first row in the same order. From the point of view of the AI chip the arrangement of the data is unchanged and only the way of reading the data changes, so the reshape operator consumes no computing power, as the following check illustrates.
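Because the H×4W×C buffer is already stored row by row, the interleaving of fig. 6b falls out of a plain reshape with no data movement; the following numpy check on a toy tensor (sizes chosen arbitrarily) illustrates this:

    import numpy as np

    H, W, C, n = 5, 7, 3, 4
    frames = [np.full((H, W, C), f, dtype=np.float32) for f in range(n)]   # frames A, B, C, D
    x = np.concatenate(frames, axis=1)        # first feature map, shape (H, 4W, C)

    y = x.reshape(n * H, W, C)                # the 1st reshape operator: (4H, W, C)

    # Row 4h+f of y is row h of frame f: the rows of the 4 frames are interleaved as in fig. 6b.
    assert all(np.array_equal(y[4 * h + f], frames[f][h]) for h in range(H) for f in range(n))

    # The underlying buffer is untouched, so the operator costs no computing power.
    assert y.base is x                        # y is a view of x, not a copy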
The convolution kernel of the 2nd two-dimensional convolution has a width of 1 and a height of 3, covering for example A1, B1, C1 in fig. 6b; the area it covers consists mainly of features of different frames at the same position, i.e., the same position before reshaping. When this two-dimensional convolution performs its multiply-add operations, features are fused along the time direction, extracting temporal features such as the flickering pattern and regularity of the flame. The temporal features are thus preserved in the network up to its last layer, where the spatial features and temporal features are fused by the fully-connected operator. The two convolutions and two reshape operators are sketched together below.
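As an illustrative reconstruction of the module (not the patented implementation: the channel counts, padding, absence of activations and the stride switch s are assumptions), the two convolutions and two reshape operators can be written in PyTorch as:

    import torch.nn as nn

    class SpaceTimeReshape(nn.Module):
        """Space-time-reshape: two 2D convolutions and two reshape operators emulating a 3D
        convolution over n frames tiled along the width (input layout B x C x H x nW)."""
        def __init__(self, in_ch, out_ch, n=4, s=1):
            super().__init__()
            self.n, self.s = n, s
            # 1st convolution: 3x3 kernel over H x nW; it mostly covers a single frame,
            # so it extracts spatial features (colour and shape of the flame).
            self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            # 2nd convolution: width-1, height-3 kernel on the reshaped nH x W map; it covers
            # the same position in neighbouring frames, so it extracts temporal features.
            # A height stride of 2 (the /S2 variant) downsamples the time hidden in the height.
            self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1),
                                      stride=(s, 1), padding=(1, 0))

        def forward(self, x):                              # x: (B, C, H, n*W)
            x = self.spatial(x)                            # 1st feature map: (B, C', H, n*W)
            b, c, h, nw = x.shape
            w = nw // self.n
            x = x.reshape(b, c, h * self.n, w)             # 1st reshape: frame rows interleaved
            x = self.temporal(x)                           # 2nd feature map: (B, C', n*H/s, W)
            m = self.n // self.s                           # frames left after temporal downsampling
            return x.reshape(b, x.shape[1], x.shape[2] // m, m * w)   # 2nd reshape: back to H x mW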
FIG. 3 shows a schematic diagram of a flame video classification network module combination, according to an example embodiment.
The space-time-reshape module, a conv2d with a 1x1 convolution kernel, and an addition operator are combined into a space-time-residual module, which is repeatedly stacked as shown in fig. 3 to form the flame video classification network; a sketch of such a residual module follows.
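A possible composition of that residual module, reusing the SpaceTimeReshape sketch above (the number of space-time-reshape blocks and the activations are assumptions):

    import torch.nn as nn

    class SpaceTimeResidual(nn.Module):
        """Space-time-residual: a body of space-time-reshape blocks plus a conv2d 1x1 skip path,
        combined by an addition operator in the y = F(x) + x style of a resnet block."""
        def __init__(self, in_ch, out_ch, n=4):
            super().__init__()
            self.body = nn.Sequential(
                SpaceTimeReshape(in_ch, out_ch, n=n, s=1),
                nn.ReLU(inplace=True),
                SpaceTimeReshape(out_ch, out_ch, n=n, s=1),
            )
            # conv2d 1x1 so that the skip path matches the channel count of the body
            self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + self.skip(x))  # the addition operator fuses the two paths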
Other flame detection and identification schemes seeking higher accuracy can also be implemented with the space-time-reshape module of the present invention if they are to be deployed on a chip that does not support three-dimensional convolution.
According to some embodiments, the flame video classification network of the method uses two-dimensional convolution to simulate three-dimensional convolution, both to support more AI inference chips and to reduce the amount of computation; two-dimensional convolution is an operator supported by essentially all AI inference chips and is already highly optimized.
According to some embodiments, if a flame alerting method is to be deployed on a chip that does not support three-dimensional convolution, the space-time-reshape module can be adopted to implement it.
FIG. 4 shows a schematic diagram of flame detection and classification according to an example embodiment.
Referring to fig. 4, a surveillance video is first acquired and decoded into video frames; the video frames are input into the flame detection model, and the yolov4 flame detection network checks frame by frame whether flame-like objects are present.
According to some embodiments, if flames are detected in 4 consecutive frames, the 4 flame images are cached, the circumscribed rectangle of the detection boxes is calculated, and the flame region is cut out of the 4 frames to be used as the 4 video frames required by the flame classification model.
Flame classification is then performed on the 4 video frames and a flame score is calculated; the output score for a flame target tends to 1 and the output score for a non-flame target tends to 0.
According to some embodiments, this figure may be regarded either as video frames or as a feature map in the middle of the network, and the structure of the invention runs through the entire network.
According to some embodiments, the flame classification model processes both video frames and feature maps.
According to some embodiments, flame classification is applied not to the full original video frames but only to the regions of the video frames inside the detection boxes. When 4 consecutive frames each have a detection box and the 4 detection boxes intersect, the fires in the 4 frames are regarded as one fire, and the fire region of the 4 frames is cut out with the circumscribed rectangle and used as the input of flame classification. This input is cut out of the original video, not of a feature map; once the video frames enter the network, for example a convolution layer, they are no longer in video format. After entering the classification network, the video frames are referred to as feature maps.
FIG. 5 illustrates a flow chart of flame identification and alarm according to an example embodiment.
Referring to fig. 5, in S501, a camera shoots a video.
According to some embodiments, a monocular camera is prepared, whose focal length, position and angle are fixed for video capture.
In S503, the video is decoded into picture data, forming picture frames.
According to some embodiments, video is decoded, converting the video into picture frames.
In S505, the flame detection model detects whether there is a flame frame in the picture frames.
According to some embodiments, the flame detection model detects whether 4 consecutive flame frames are present among the picture frames.
In S507, the flame detection model determines a flame detection result.
According to some embodiments, the flame detection model performs flame detection based on the number of frames in which flame frames occur.
According to some embodiments, the flame detection model uses a yolov4 flame detection network.
If no flame is detected, no alarm is given and detection is continued in S509.
According to some embodiments, if at least 1 frame within the consecutive 4 frames does not contain a flame detection result, then no alarm is given.
In S511, if a flame is detected, the flame frames of the last 4 frames of flame target detection results are buffered.
According to some embodiments, when it is detected that flame detection results are contained within consecutive 4 frames, the last 4 frames of flame frames are buffered for flame classification.
At S513, 4 frames of flame frames are input to the flame classification model.
According to some embodiments, the 4 flame frames are input into the flame classification model for extraction of flame spatial features and flame temporal features.
In S515, the flame classification model judges the flame result from the classification result.
According to some embodiments, the flame spatial features and flame temporal features are fused and a flame score is calculated.
If no flame is detected, no alarm is given and detection is continued at S517.
According to some embodiments, when the flame score approaches 0, it is representative that a non-flame is detected and no alarm is raised.
At S519, an alarm is given if the flame score of the flame classification result is greater than the threshold value.
According to some embodiments, when the flame score approaches 1, an alarm is raised on behalf of a detected flame.
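The flow of fig. 5 can be tied together as the following end-to-end sketch, reusing the hypothetical helpers from the earlier sketches; `detector` (one flame box or None per frame), `classifier` (a flame score in [0, 1]) and the 0.5 threshold are illustrative assumptions:

    def flame_alarm_loop(source, detector, classifier, n=4, threshold=0.5):
        """Decode, detect, buffer n frames, classify, and alarm (sketch of the fig. 5 flow)."""
        for window in sliding_windows(decode_frames(source), n=n):
            boxes = [detector(frame) for frame in window]   # S505/S507: flame detection per frame
            clip = crop_flame_clip(window, boxes)           # S511: cache and crop the last n flame frames
            if clip is None:
                continue                                    # S509: no alarm, keep detecting
            score = classifier(clip)                        # S513/S515: flame classification score
            if score > threshold:                           # S519: alarm when the score exceeds the threshold
                print(f"flame alarm, score={score:.2f}")
            # S517 otherwise: no alarm, keep detecting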
Fig. 7a shows a schematic diagram of an overall network structure according to an example embodiment.
Before entering the flame classification network, the data consist of 4 cropped video frames, which are resized to 224x224x3 RGB images and tiled along the width direction into one 224x896x3 RGB image before being sent into the network. The first step inside the network processes the features with a 7x7 kernel and a 2x2 kernel at a stride of 2x2, producing a 112x448x64 feature map. Since a convolution can set the number of output channels freely, changes in the channel count are not explained in detail here.
The data then enter the main part of the network, a series of space-time-residual and space-time-bottleneck modules, and are downsampled from 112x448 to 7x7 through repeated spatial and temporal downsampling. In particular, the length in the temporal direction is maintained until the space-time-reshape/S2 parts, which prevents the temporal features from being fused into the spatial features prematurely and facilitates extraction of high-level semantic features in the temporal dimension (the deeper the network layer, the higher-level the semantic information).
Finally, after average pooling (avgpooling), the video information is condensed into a 512-dimensional feature vector, and a fully-connected layer predicts the flame/non-flame class probabilities; when the flame probability exceeds the non-flame probability, the input is predicted to be flame. In practical tests, the probability score of a real flame is often above 0.9. The input preparation and classification head are sketched below.
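A sketch of this input preparation and classification head, assuming the cropped frames come from the earlier cropping sketch and taking class index 1 as the flame class (an arbitrary choice here):

    import cv2
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_network_input(clip):
        """Resize 4 cropped flame frames to 224x224x3 and tile them along the width
        into one 224x896x3 image, channels-first for the network."""
        frames = [cv2.resize(f, (224, 224)) for f in clip]
        tiled = np.concatenate(frames, axis=1)                   # (224, 896, 3)
        return torch.from_numpy(tiled).float().permute(2, 0, 1).unsqueeze(0)  # (1, 3, 224, 896)

    class FlameHead(nn.Module):
        """Average-pool the last 7x7 backbone feature map into a 512-dimensional vector,
        then a fully-connected layer predicts the flame / non-flame probabilities."""
        def __init__(self, in_ch=512):
            super().__init__()
            self.fc = nn.Linear(in_ch, 2)

        def forward(self, feat):                                 # feat: (B, 512, 7, 7)
            vec = feat.mean(dim=(2, 3))                          # avgpooling -> (B, 512)
            probs = F.softmax(self.fc(vec), dim=1)               # flame / non-flame probabilities
            return probs[:, 1]                                   # flame score, close to 1 for flame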
Fig. 7b shows a schematic diagram of a space-time-residual module according to an example embodiment.
Referring to fig. 7b, the space-time-residual module is built by imitating the residual structure of the resnet network, and its function is similar to that of a resnet residual block: using the form y = F(x) + x, the gradient of the output y can be transmitted smoothly back to the input x, which accelerates and stabilizes training.
Fig. 7c and fig. 7d show schematic diagrams of the space-time-residual/2 and space-time-residual/4 modules according to example embodiments.
Referring to fig. 7c and fig. 7d, the space-time-residual/2 module downsamples once in the spatial dimensions on top of the space-time-residual module. A pooling operator with a stride of 2 in both the width and height directions is added to the S1 branch; under this pooling the width and height are halved, and the shape changes from H×4W×C to H/2×2W×C.
In the space-time-residual/4 module, the 2nd space-time-reshape/S1 module is replaced by a space-time-reshape/S2 module, so that the whole space-time-residual/4 module downsamples once in the spatial directions (width and height) and once in the time direction (hidden in the width), and the shape changes from H×4W×C to H/2×W×C.
Fig. 7e shows a schematic diagram of a space-time-bottleneck module according to an example embodiment.
Referring to fig. 7e, the space-time-bottleneck module adds two 1x1 convolutions to the space-time-residual structure: the 1st conv2d 1x1 reduces the channel dimension from C to C/4, which shrinks the main body of the module, i.e., reduces the computation of the two space-time-reshape/S1 blocks, and the 2nd conv2d 1x1 changes the channel count from C/4 back to C, so that the module finally keeps the same channel count as its input. Because of this narrow-then-wide structure it is visualized as a bottleneck structure, which is useful in many network designs; a sketch follows.
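A sketch of that bottleneck, again reusing the SpaceTimeReshape sketch above (the exact composition and the placement of the residual addition are assumptions):

    import torch.nn as nn

    class SpaceTimeBottleneck(nn.Module):
        """Space-time-bottleneck: conv2d 1x1 squeezes C to C/4, two space-time-reshape/S1 blocks
        run on the cheaper features, and a second conv2d 1x1 expands back to C before the
        residual addition, so the module keeps the same channel count as its input."""
        def __init__(self, ch, n=4):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch // 4, kernel_size=1),           # 1st conv2d 1x1: C -> C/4
                SpaceTimeReshape(ch // 4, ch // 4, n=n, s=1),
                SpaceTimeReshape(ch // 4, ch // 4, n=n, s=1),
                nn.Conv2d(ch // 4, ch, kernel_size=1),           # 2nd conv2d 1x1: C/4 -> C
            )

        def forward(self, x):
            return self.body(x) + x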
FIG. 8 illustrates a block diagram of a computing device according to an example embodiment of the invention.
As shown in fig. 8, computing device 30 includes processor 12 and memory 14. Computing device 30 may also include a bus 22, a network interface 16, and an I/O interface 18. The processor 12, memory 14, network interface 16, and I/O interface 18 may communicate with each other via a bus 22.
The processor 12 may include one or more general purpose CPUs (Central Processing Unit, processors), microprocessors, or application specific integrated circuits, etc. for executing relevant program instructions. According to some embodiments, computing device 30 may also include a high performance display adapter (GPU) 20 that accelerates processor 12.
Memory 14 may include machine-readable media in the form of volatile memory, such as random access memory (RAM), read-only memory (ROM), and/or cache memory. Memory 14 is used to store one or more programs including instructions as well as data. The processor 12 may read instructions stored in the memory 14 to perform the methods according to embodiments of the invention described above.
Computing device 30 may also communicate with one or more networks through network interface 16. The network interface 16 may be a wireless network interface.
Bus 22 may be a bus including an address bus, a data bus, a control bus, etc. Bus 22 provides a path for exchanging information between the components.
It should be noted that, in the implementation, the computing device 30 may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method. The computer readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), network storage devices, cloud storage devices, or any type of media or device suitable for storing instructions and/or data.
Embodiments of the present invention also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It will be clear to a person skilled in the art that the solution according to the invention can be implemented by means of software and/or hardware. "Unit" and "module" in this specification refer to software and/or hardware capable of performing a specific function, either alone or in combination with other components, where the hardware may be, for example, a field programmable gate array, an integrated circuit, or the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, such as a division of units, merely a division of logic functions, and there may be additional divisions in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some service interface, device or unit indirect coupling or communication connection, electrical or otherwise.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present invention.
The exemplary embodiments of the present invention have been particularly shown and described above. It is to be understood that this invention is not limited to the precise arrangements, instrumentalities and instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (8)

1. A method for flame warning, comprising:
acquiring a surveillance video;
detecting continuous flame frames from the surveillance video through a pre-trained flame detection model;
cropping the continuous flame frames to form a predetermined number of target flame video frames;
inputting the predetermined number of target flame video frames into a pre-trained flame classification model;
classifying the predetermined number of target flame video frames by the flame classification model to obtain a classification result;
and judging whether to raise an alarm according to the classification result,
wherein classifying the predetermined number of target flame video frames by the flame classification model comprises:
extracting spatial features from the predetermined number of target flame video frames, comprising: performing a convolution operation on the predetermined number of target flame video frames input in the H×nW×C format and outputting a first feature map, wherein H is the image pixel height, W is the image pixel width, C is the number of channels, and n is the predetermined number; and performing a first two-dimensional convolution operation on the first feature map to extract the flame spatial features;
extracting temporal features from the predetermined number of target flame video frames, comprising: reshaping the first feature map, converting the first feature map from the H×nW×C format into the nH×W×C format, where nH represents an alternating interleaving of the H-direction data from the different W portions of the first feature map so that the time dimension is hidden in the height dimension; and performing a second two-dimensional convolution operation on the reshaped first feature map and outputting a second feature map;
and reshaping the second feature map to obtain a third feature map in the H×nW×C format.
2. The method of claim 1, wherein detecting successive flame frames from the surveillance video comprises:
decoding the surveillance video into a plurality of frame images;
inputting the plurality of frame images into the flame detection model;
and if the flame detection model outputs a detection box for each of a predetermined number of consecutive frame images and the intersection of the detection boxes is not empty, confirming that the continuous flame frames are detected and outputting the corresponding detection boxes.
3. The method of claim 2, wherein cropping the continuous flame frames to form a predetermined number of target flame video frames comprises:
and cutting the corresponding regions out of the corresponding frame images by using the detection boxes to obtain the predetermined number of target flame video frames.
4. The method of claim 1, wherein classifying the predetermined number of target flame video frames by the flame classification model further comprises:
fusing the spatial features and the temporal features for calculating a classification score;
and determining a classification result according to the classification score.
5. The method of claim 1, wherein the convolution kernel of the first two-dimensional convolution operation is a k x k convolution kernel for spatial feature extraction, k being an integer greater than 1.
6. The method according to claim 1, wherein the method further comprises:
and processing the third feature map with a fully-connected operator so as to fuse the spatial features and the temporal features.
7. The method of claim 2, wherein the flame detection model is derived by pre-training a yolov4 neural network.
8. A computing device, comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-7.
CN202311609035.0A 2023-11-29 2023-11-29 Method and computing device for flame alerting Active CN117315551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311609035.0A CN117315551B (en) 2023-11-29 2023-11-29 Method and computing device for flame alerting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311609035.0A CN117315551B (en) 2023-11-29 2023-11-29 Method and computing device for flame alerting

Publications (2)

Publication Number Publication Date
CN117315551A (en) 2023-12-29
CN117315551B (en) 2024-03-19

Family

ID=89260698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311609035.0A Active CN117315551B (en) 2023-11-29 2023-11-29 Method and computing device for flame alerting

Country Status (1)

Country Link
CN (1) CN117315551B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671608B (en) * 2024-02-02 2024-04-26 江苏林洋亿纬储能科技有限公司 Method and system for starting fire-fighting operation of battery energy storage system and computing equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814638A (en) * 2020-06-30 2020-10-23 成都睿沿科技有限公司 Security scene flame detection method based on deep learning
CN113012383A (en) * 2021-03-26 2021-06-22 深圳市安软科技股份有限公司 Fire detection alarm method, related system, related equipment and storage medium
CN113313028A (en) * 2021-05-28 2021-08-27 国网陕西省电力公司电力科学研究院 Flame detection method, system, terminal equipment and readable storage medium
CN114170575A (en) * 2022-02-11 2022-03-11 青岛海尔工业智能研究院有限公司 Flame identification method and device, electronic equipment and storage medium
CN114463681A (en) * 2022-02-10 2022-05-10 天津大学 Fire detection method based on video monitoring platform
CN114782869A (en) * 2022-04-27 2022-07-22 厦门汇利伟业科技有限公司 Method and terminal for improving flame image recognition accuracy
CN116704236A (en) * 2023-05-05 2023-09-05 重庆大学 Target detection method based on mixed attention mechanism
CN116740412A (en) * 2023-05-05 2023-09-12 重庆大学 Small target detection method based on multi-scale information fusion

Also Published As

Publication number Publication date
CN117315551A (en) 2023-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant