CN114841920A - Flame identification method and device based on image processing and electronic equipment

Info

Publication number
CN114841920A
Authority
CN
China
Prior art keywords
flame
model
video image
training set
image frame
Prior art date
Legal status
Pending
Application number
CN202210319552.3A
Other languages
Chinese (zh)
Inventor
赵劲松
吴德阳
吴昊
房晓峰
田健辉
赵泽恒
杨博睿
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210319552.3A
Publication of CN114841920A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T7/00: Image analysis
            • G06T7/0002: Inspection of images, e.g. flaw detection
            • G06T7/136: Segmentation; edge detection involving thresholding
        • G06T2207/00: Indexing scheme for image analysis or image enhancement
            • G06T2207/10016: Image acquisition modality: video; image sequence
            • G06T2207/20081: Special algorithmic details: training; learning
            • G06T2207/20084: Special algorithmic details: artificial neural networks [ANN]
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N3/00: Computing arrangements based on biological models
            • G06N3/045: Neural networks; architecture: combinations of networks
            • G06N3/084: Neural networks; learning methods: backpropagation, e.g. using gradient descent


Abstract

The application discloses a flame identification method and device based on image processing, and electronic equipment. The flame identification method based on image processing comprises the following steps: extracting video image frames from a monitoring video stream; identifying, through a first model, whether a video image frame is a rainy scene; determining a flame detection threshold according to whether the video image frame is a rainy scene; and identifying, through a second model, whether flame exists in the video image frame, where the presence of flame is determined by comparing the result output by the second model against the flame detection threshold. By recognizing rainy scenes and setting different thresholds for the flame recognition model accordingly, the embodiments of the application improve the accuracy of flame identification.

Description

Flame identification method and device based on image processing and electronic equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a flame identification method and device based on image processing and electronic equipment.
Background
Among the many kinds of disasters, fire is one of the most frequent and widespread threats to the safety of people, life, property and social development. Many fire early-warning technologies have been developed, aiming to alert people to a possible fire before it spreads out of control, so that a fire that has not yet become a disaster can be eliminated as early as possible and the harm caused by fire is reduced to the greatest extent. In indoor environments, smoke sensors are an effective means of detecting flame: the combustion of objects is usually accompanied by the production of smoke, and the confined indoor space makes the smoke concentration rise quickly soon after an object starts to burn, which guarantees the early-warning performance of the smoke sensor. In outdoor environments with good air circulation or in wide open spaces, however, the smoke generated by combustion is rapidly diluted and can hardly trigger a smoke sensor, and the placement of sensors in such spaces is difficult to determine. A flame detection method based on computer image processing technology is therefore an important research direction.
Traditional flame detection methods based on image processing mainly judge whether flame exists in an image from visual characteristics of flame such as color, shape and texture, combined with a classification algorithm from pattern recognition. These methods with manually designed features are fast, but their stability and generalization ability are poor: objects whose color is similar to that of flame easily trigger false alarms, while flames whose color falls outside the set thresholds are easily missed.
With the rapid development of deep learning techniques in recent years, deep convolutional neural networks have also been applied to flame detection. They are currently the most widely studied and applied method in the field of computer vision: features are typically extracted from an image automatically by parameterized convolutional layers, pooling layers, batch normalization layers and the like, and classified by fully connected layers to judge whether flame exists in the image. The parameters of the network model are learned from large amounts of image data with the back-propagation algorithm, so features need not be designed manually; once trained on images, a convolutional neural network automatically performs image feature extraction and pattern classification. Compared with traditional flame detection methods, techniques based on deep convolutional neural networks achieve higher accuracy, higher recall and stronger generalization. In addition, image object detection based on deep convolutional neural networks has achieved success in many fields, and related work has applied it to real-time flame detection.
However, problems remain in the related art: the false alarm rate of model-based flame detection can be high, and objects close to flame in shape and color are easily misjudged as flame, so current flame identification methods based on image object detection still leave room for improvement in accuracy.
Disclosure of Invention
The embodiments of the application provide a flame identification method and device based on image processing, and electronic equipment, which can reduce the false alarm rate of flame identification.
In a first aspect, an embodiment of the present application provides a flame identification method based on image processing, where the method includes:
extracting video image frames from a monitoring video stream;
identifying whether the video image frame is a rainy scene through the first model;
determining a flame detection threshold value according to whether the video image frame is a rainy scene;
and identifying whether flame exists in the video image frame through the second model, wherein whether flame exists in the video image frame is obtained based on the comparison between the result output by the second model and the flame detection threshold value.
Optionally, determining the flame detection threshold according to whether the video image frame is a rainy scene includes:
adding a judgment result of whether the video image frame is a rainy scene into a result queue; the result queue is used for storing the judgment result of whether the continuous multi-frame video image frame is a rainy scene;
judging whether raining occurs at present according to the average condition of the result queue;
determining a flame detection threshold based on whether it is currently raining, wherein the flame detection threshold in the case of current raining is greater than the flame detection threshold in the case of not currently raining.
Optionally, before identifying whether a flame exists in the video image frame through the second model, the method further comprises:
and detecting whether a moving object exists in the video image frame, wherein under the condition that the moving object exists in the video image frame, whether flame exists in the video image frame is identified through the second model.
Optionally, before detecting whether a moving object exists in the video image frame, the method further includes:
modeling each pixel value in the video image frame as a mixed distribution of k Gaussian distributions;
sorting according to the ratio of the weight to the standard deviation of the k Gaussian distributions, and taking the first b Gaussian distributions as a background difference model;
judging whether the corresponding pixel value is a foreground pixel or a background pixel according to the difference between each pixel value in the video image frame and the mean value of the background model;
detecting whether a moving object exists in a video image frame, comprising the following steps: judging whether the ratio of the number of foreground pixels to the number of all pixels of the video image frame is greater than a preset ratio threshold value or not; and if so, determining that a moving object exists in the video image frame.
Optionally, before identifying whether the video image frame is a rainy scene through the first model, the method further includes:
acquiring a plurality of images of a rainy scene and a plurality of images of a non-rainy scene, and establishing a raining scene training set;
and training the first deep convolutional neural network model through a raining scene training set to obtain a first model.
Optionally, before identifying whether a flame exists in the video image frame through the second model, the method further comprises:
acquiring a plurality of images with flames, and establishing a flame image training set, wherein each image with flames is marked with a real marking frame which is used for identifying an area with flames;
and training a second deep convolutional neural network model through a flame image training set to obtain a second model, wherein the second model is used for outputting the probability of flame existence of each prediction candidate frame in the multiple prediction candidate frames.
Optionally, training the second deep convolutional neural network model through a flame image training set, including:
acquiring a first image in a flame image training set;
dividing a first image into a plurality of regions;
dividing a corresponding plurality of prediction candidate frames in each region based on the plurality of standard-size frames; the sizes of the standard size frames are obtained by clustering according to the sizes of real marking frames of all pictures in the flame image training set, and the center of each standard size frame is the center of the corresponding area;
identifying, by a second deep convolutional neural network model, a probability of a flame present in each prediction candidate box;
calculating a first loss error of the second deep convolutional neural network model aiming at the first image through a preset loss function;
updating parameters of the second deep convolutional neural network model according to the first error through an error back propagation algorithm;
testing the updated second deep convolutional neural network model through the test set;
calculating the total loss error of the test set;
stopping training if the total loss error is determined to meet the preset condition.
Optionally, after identifying whether a flame exists in the video image frame through the second model, the method further includes:
judging whether the judgment result of the second model is wrong or not through the third model; and the third model is obtained by training a third deep convolution neural network model according to the recognition result of the second model on the flame image training set and the non-flame image training set.
Optionally, before the determining, by the third model, whether the determination result of the second model is incorrect, the method further includes:
identifying image areas with flames in each image of the flame image training set and the non-flame image training set through a second model, and intercepting each image area with the flames identified as a sub-image;
dividing a plurality of sub-images into a real example training set and a false positive example training set, wherein the real example training set is the sub-images with flames identified in the flame image training set, and the false positive example training set is the sub-images with flames identified in the non-flame image training set;
and training by taking the true example training set and the false positive example training set as the training set of the third deep convolutional neural network model.
In a second aspect, the present application provides a flame recognition apparatus based on image processing, including:
an extraction unit for extracting video image frames from the monitoring video stream;
the first identification unit is used for identifying whether the video image frame is a rainy scene or not through the first model;
the determining unit is used for determining a flame detection threshold value according to whether the video image frame is a rainy scene;
and the second identification unit is used for identifying whether flame exists in the video image frame through a second model, wherein the existence of flame in the video image frame is obtained by comparing the result output by the second model with a flame detection threshold value.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing program instructions; the processor, when executing the program instructions, implements the image processing based flame identification method as described in the first aspect.
In a fourth aspect, the present application provides a readable storage medium, on which program instructions are stored, and when executed by a processor, the program instructions implement the flame identification method based on image processing according to the first aspect.
In a fifth aspect, the present application provides a program product; when its instructions are executed by a processor of an electronic device, they enable the electronic device to perform the flame identification method based on image processing according to the first aspect.
The flame identification method, the flame identification device, the electronic equipment, the readable storage medium and the program product based on image processing are realized by extracting video image frames from a monitoring video stream; identifying whether the video image frame is a rainy scene through the first model; determining a flame detection threshold value according to whether the video image frame is a rainy scene; and identifying whether flame exists in the video image frame through the second model, wherein whether flame exists in the video image frame is obtained based on the comparison between the result output by the second model and the flame detection threshold value. According to the embodiment of the application, different threshold values can be set for the model for identifying the flame through identifying the raining scene, so that the accuracy of flame identification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram of a flame identification method based on image processing according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a real labeling box of a flame identification method based on image processing according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a true annotation box, a prediction candidate box, and a standard size box of a flame recognition method based on image processing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a true-example image in a flame recognition method based on image processing according to an embodiment of the present application;
FIG. 5 is a diagram illustrating detection of a moving object in a flame recognition method based on image processing according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a recognition result of a flame recognition method based on image processing according to an embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a method for identifying flames based on image processing according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a flame recognition device based on image processing according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to still another embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In order to solve the problems of the prior art, embodiments of the present application provide a flame identification method, apparatus, device and readable storage medium based on image processing. The following first describes a flame recognition method based on image processing according to an embodiment of the present application.
Fig. 1 shows a flow chart of a flame identification method based on image processing according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step 101, extracting video image frames from a monitoring video stream.
The monitoring video stream may be a continuous video stream extracted from a camera aimed at the target area to be monitored; the camera transmits the video stream over the network to the executing device of the embodiment of the present application, which thereby acquires the monitoring video stream.
Optionally, the resolution of the monitoring video stream may be configured to be 720 × 576 or higher, with a frame rate of at least 20 fps. After the monitoring video stream is obtained, video image frames may be extracted frame by frame and each frame processed according to the following steps.
Optionally, in the embodiments of the present application the standardized width and height of a video image frame are denoted W and H respectively, with channel number Ch = 3; I(ch, x, y) denotes the pixel value of the ch-th channel at coordinates (x, y).
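As an illustrative sketch of this step (assuming OpenCV and a network-camera stream URL, neither of which the application prescribes; the W and H values here are placeholders within the ranges stated below), frame extraction and the standardization and normalization used by the later models might look like:

```python
import cv2
import numpy as np

W, H = 640, 480  # illustrative standardized size; the embodiments only bound W and H

def video_image_frames(stream_url):
    """Yield standardized W x H video image frames from a monitoring video stream."""
    cap = cv2.VideoCapture(stream_url)  # e.g. an RTSP URL of the camera
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield cv2.resize(frame, (W, H))  # standardize to width W, height H
    finally:
        cap.release()

def normalize(frame):
    """Per-pixel normalization I'(ch, x, y) = I(ch, x, y) / 255 applied before the models."""
    return frame.astype(np.float32) / 255.0
```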
Step 102, identifying whether the video image frame is a rainy scene through the first model.
The first model is a model for identifying whether a scene in the video image frame is a rainy scene, and may be referred to as a rainy day determination model in the embodiment of the present application.
Before step 102 is executed to identify whether the video image frame is a raining scene through the first model, the first model needs to be trained. Specifically, a plurality of images of a rainy scene and a plurality of images of a non-rainy scene can be acquired, a raining scene training set is established, and then the first deep convolutional neural network model is trained through the raining scene training set to obtain a first model.
An exemplary procedure for constructing the rainy-day judgment model may include the following steps:

Step 1021: collect N_rain ≥ 1000 color pictures of rainy weather in the detection scene and N_no_rain ≥ 1000 color pictures of non-rainy weather, establishing the rainy-scene dataset F_rain.

Step 1022: randomly partition the rainy-scene dataset F_rain obtained in step 1021 into a training set F_train and a test set F_test, where the proportion a of F_train satisfies 0.6 ≤ a ≤ 0.9.

Step 1023: standardize and normalize each picture of the training and test sets established in step 1022, with standardized width W and height H (400 ≤ W, H ≤ 800) and channel number Ch = 3. Let I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinates (x, y) of any standardized picture; the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, is computed by formula (3-1):

$$I'(t, x, y) = \frac{I(t, x, y)}{255} \qquad (3\text{-}1)$$

Step 1024: construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence; the feature extraction part combines convolutional and pooling layers, containing 3 to 6 pooling layers, 1 to 5 convolutional layers before each pooling layer, and a batch normalization layer after each convolutional layer; the output layer of the model contains 2 neurons, representing the rainy and non-rainy categories respectively.

Step 1025: input each standardized and normalized picture of the training set from step 1023 into the model established in step 1024 in sequence. If the input is picture I, the output vector (z_0, z_1) of the output layer of the deep convolutional neural network represents the non-rainy and rainy scene scores respectively.

Step 1026: compute the error between the model output information and the true annotation information according to the loss function, defined as

$$\mathrm{Loss}_I = -y_I \log \frac{e^{z_1}}{e^{z_0} + e^{z_1}} - (1 - y_I) \log \frac{e^{z_0}}{e^{z_0} + e^{z_1}} \qquad (3\text{-}2)$$

where y_I represents the actual classification label of picture I: y_I = 1 for actual rainy weather and y_I = 0 for actual non-rainy weather.

Step 1027: input each picture of the training set F_train into the model established in step 1024, compute the loss function according to formula (3-2), and update the model parameters through the error back-propagation algorithm, obtaining the current model, denoted Model_old.

Step 1028: test Model_old with the test set F_test: input each picture of F_test into Model_old in sequence and compute, according to formula (3-2), the total loss Loss_old of Model_old over all pictures of F_test.

Step 1029: input each picture of the training set F_train into Model_old in sequence, compute the loss function according to formula (3-2), and update the model parameters through the error back-propagation algorithm, obtaining a new current model, denoted Model_new.

Step 1210: test Model_new with the test set F_test: input each picture of F_test into Model_new in sequence and compute, according to formula (3-2), the total loss Loss_new of Model_new over all pictures of F_test.

Step 1211: judge the training stop condition:

if Loss_new ≤ Loss_old, continue training: update the current Model_new to the new Model_old, update the current Loss_new to the new Loss_old, and return to step 1029;

if Loss_new > Loss_old, stop training and output the current Model_old as the finally trained rainy-day judgment model Model_rain.
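A compact sketch of steps 1024 to 1211 is given below, assuming PyTorch (the application does not name a framework); the layer counts are illustrative values within the stated ranges, and train_loader and test_loader stand for F_train and F_test with integer labels y = 1 (rainy) and y = 0 (non-rainy):

```python
import copy
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # 1 convolutional layer + batch normalization layer + 1 pooling layer
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(), nn.MaxPool2d(2))

class RainNet(nn.Module):
    def __init__(self):
        super().__init__()
        # feature extraction part: 3 pooling stages (the description allows 3 to 6)
        self.features = nn.Sequential(conv_block(3, 16), conv_block(16, 32),
                                      conv_block(32, 64))
        self.head = nn.Linear(64, 2)  # output layer: (z0, z1) = non-rainy / rainy

    def forward(self, x):
        return self.head(self.features(x).mean(dim=(2, 3)))  # global average pooling

def train_rain_model(model, train_loader, test_loader):
    """One pass over F_train per round; stop once the total test loss rises (step 1211)."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss(reduction="sum")  # softmax cross-entropy as in (3-2)
    loss_old, state_old = float("inf"), copy.deepcopy(model.state_dict())
    while True:
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()  # error back-propagation
            opt.step()
        model.eval()
        with torch.no_grad():
            loss_new = sum(loss_fn(model(x), y).item() for x, y in test_loader)
        if loss_new > loss_old:              # Loss_new > Loss_old: stop training
            model.load_state_dict(state_old)  # output Model_old
            return model
        loss_old, state_old = loss_new, copy.deepcopy(model.state_dict())
```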
Step 103, determining a flame detection threshold according to whether the video image frame is a rainy scene.
The flame detection threshold is the threshold used with the second model to decide whether flame exists in a video image frame. When the second model analyzes a frame, its output includes the probability that flame is present, and the decision is made by comparing that probability against the flame detection threshold. The threshold is set to different values depending on whether it rains. In a rainy scene, flame in the video image frame may be blurred and thus falsely recognized, while a fire is less likely to occur; a higher flame detection threshold can therefore be set to avoid false alarms. In non-rainy scenes, a relatively low flame detection threshold may be set.
In an alternative embodiment, determining the flame detection threshold based on whether the video image frame is a rainy scene may include the steps of:
and step 1031, adding the judgment result of whether the video image frame is a rainy scene into a result queue.
The result queue is used for storing the judgment result of whether the continuous multi-frame (for example, 100 frames) video image frames are raining scenes.
And step 1032, judging whether it rains currently or not according to the average condition of the result queue.
Illustratively, a frame identified as a rainy scene is recorded as 1 and a non-rainy scene as 0; if the average value of the result queue is greater than or equal to 0.5, it is considered to be raining, and if the average value is less than 0.5, it is considered not to be raining.
Step 1033, a flame detection threshold is determined based on whether it is currently raining. Wherein the flame detection threshold in the case of current rain is greater than the flame detection threshold in the case of no current rain.
Through this optional implementation, whether it is raining can be judged more reliably on the basis of multiple frames, improving the accuracy of rainy-scene recognition.
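As a sketch of this smoothing logic (the queue length of 100 frames and the 0.5 cutoff are the values used elsewhere in this description; the base threshold value is illustrative):

```python
from collections import deque

class RainSmoother:
    """Result queue over the last 100 per-frame rainy/non-rainy judgments."""
    def __init__(self, length=100):
        self.queue = deque([0] * length, maxlen=length)

    def flame_threshold(self, frame_is_rainy, base=0.5):
        """Append the latest judgment (1 = rainy) and return the flame detection threshold."""
        self.queue.append(1 if frame_is_rainy else 0)
        raining = sum(self.queue) / len(self.queue) >= 0.5
        return base * (1.5 if raining else 1.0)  # larger threshold when raining
```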
Step 104, identifying whether flame exists in the video image frame through the second model, where whether flame exists in the video image frame is obtained by comparing the result output by the second model with the flame detection threshold.
The second model is used for detecting whether flames exist in the image, and in the embodiment of the present application, the second model may be referred to as a flame detection model.
Before the second model is used to identify whether flame exists in a video image frame, the second model can be obtained through training on a flame image training set. Specifically, a number of images containing flame are acquired to establish the flame image training set; each image with flame is marked with a real annotation box that identifies the region containing flame. A second deep convolutional neural network model is then trained on the flame image training set to obtain the second model, which outputs, for each of a number of prediction candidate boxes, the probability that flame exists.
When training the second deep convolutional neural network model through the flame image training set, the following steps may be performed:
step 1041, acquiring a first image in the flame image training set;
step 1042, dividing the first image into a plurality of areas;
step 1043, dividing a plurality of prediction candidate frames in each region based on the plurality of standard size frames; the sizes of the standard size frames are obtained by clustering according to the sizes of real marking frames of all pictures in the flame image training set, and the center of each standard size frame is the center of the corresponding area;
step 1044 of identifying the probability of flame in each prediction candidate box through a second deep convolutional neural network model;
step 1045, calculating a first loss error of the second deep convolutional neural network model for the first image through a preset loss function;
step 1046, updating parameters of the second deep convolutional neural network model according to the first loss error through an error back propagation algorithm;
step 1047, testing the updated second deep convolutional neural network model through the test set;
step 1048, calculating the total loss error of the test set;
and step 1049, stopping training when the total loss error is determined to meet the preset condition.
One embodiment of constructing and training the flame detection model (the second model) is described in detail below:

Step 401: collect N_fire ≥ 5000 color pictures containing flame to establish the flame picture dataset F_image, and manually mark the flame region position of each picture with a rectangular box, obtaining for each picture the real annotation box of the flame region position and its 4 coordinate values

$$\big(x^{(I)}_{\min},\ y^{(I)}_{\min},\ x^{(I)}_{\max},\ y^{(I)}_{\max}\big), \qquad I = 1, 2, \ldots, N_{fire},$$

where $(x^{(I)}_{\min}, y^{(I)}_{\min})$ are the position coordinates of the top-left vertex of the real annotation box of picture I and $(x^{(I)}_{\max}, y^{(I)}_{\max})$ are the position coordinates of its bottom-right vertex. Referring to fig. 2, a schematic diagram of a real annotation box of the flame identification method based on image processing according to an embodiment of the present application is shown. The real annotation box information of each picture is stored as an xml file, and the xml files corresponding to all flame pictures form the flame annotation dataset F_annotation; F_image and F_annotation form the flame detection dataset F_fire.

Step 402: randomly partition the flame detection dataset F_fire obtained in step 401 into a training set F_train and a test set F_test, where the proportion a of F_train in F_fire satisfies 0.6 ≤ a ≤ 0.9.

Step 403: standardize each picture of the training and test sets established in step 402, with standard width W and height H (400 ≤ W, H ≤ 800) and channel number Ch = 3. Let I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinates (x, y) of any standardized picture; the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, is computed as in formula (3-1):

$$I'(t, x, y) = \frac{I(t, x, y)}{255}$$

Step 404: construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence; the feature extraction part combines convolutional and pooling layers, containing 3 to 6 pooling layers, 1 to 5 convolutional layers before each pooling layer, and a batch normalization layer after each convolutional layer.

Step 405: input each standardized and normalized picture of the training set from step 403 into the model established in step 404 in sequence. If the input is picture I, the penultimate convolutional layer of the deep convolutional neural network outputs the feature map corresponding to the picture, of size s × s cells, each cell corresponding to a different position region of the input picture. The output of the penultimate convolutional layer is connected through the last convolutional layer to the output layer of the whole model, whose outputs are, for the s × s position regions of the input picture, the class probabilities $\hat{p}^{(I)}_{ij}$ of k (5 ≤ k ≤ 9) prediction candidate boxes and their coordinate offsets $\big(\Delta\hat{x}^{(I)}_{ij}, \Delta\hat{y}^{(I)}_{ij}, \Delta\hat{w}^{(I)}_{ij}, \Delta\hat{h}^{(I)}_{ij}\big)$ relative to the corresponding standard-size boxes, with $1 \le i \le s^2$ and $1 \le j \le k$, where i denotes the ith position region and j denotes the jth prediction candidate box of that region. The widths and heights $(w^{s}_{j}, h^{s}_{j})$ of the standard-size boxes relative to the whole input picture are obtained by k-means clustering of the widths and heights, relative to their original pictures, of the real annotation boxes of all pictures in F_fire. The center of each standard-size box is the center of the picture position region where it is located, and each standard-size box corresponds to one prediction candidate box.

Here, $\hat{p}^{(I)}_{ij}$ represents the probability that flame exists in the jth prediction candidate box of the ith position region of picture I; $(\Delta\hat{x}^{(I)}_{ij}, \Delta\hat{y}^{(I)}_{ij})$ represents the coordinate offset of the center of that prediction candidate box relative to the center of the corresponding standard-size box; and $(\Delta\hat{w}^{(I)}_{ij}, \Delta\hat{h}^{(I)}_{ij})$ represent its width and height offsets relative to the corresponding standard-size box.

Let G be the real annotation box in each picture and T any standard-size box of the picture; the intersection-over-union of T and G is computed as

$$\mathrm{IOU}(T, G) = \frac{\mathrm{area}(T \cap G)}{\mathrm{area}(T \cup G)}$$

where area denotes the area of a region. Set a positive-sample threshold η, 0.5 ≤ η ≤ 0.7, and judge the s × s × k standard-size boxes corresponding to each picture: if IOU ≥ η, flame exists in the standard-size box and it is a positive sample; if IOU < η, no flame exists in the standard-size box and it is a negative sample.
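As a small illustration of this step, the two geometric helpers involved, k-means clustering of annotation-box sizes into standard-size boxes and the IOU above, might be sketched as follows (plain Euclidean k-means is assumed here; the description does not specify the distance metric):

```python
import numpy as np

def standard_size_boxes(boxes_wh, k=5, iters=100, seed=0):
    """Cluster (w, h) pairs (relative to original picture size) into k
    standard-size boxes (w_j^s, h_j^s) with plain k-means."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(boxes_wh[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)             # nearest center per box
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes_wh[assign == j].mean(axis=0)
    return centers

def iou(t, g):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(t[2], g[2]) - max(t[0], g[0]))   # intersection width
    ih = max(0.0, min(t[3], g[3]) - max(t[1], g[1]))   # intersection height
    inter = iw * ih
    union = ((t[2] - t[0]) * (t[3] - t[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union if union > 0 else 0.0
```

A standard-size box would then be labeled a positive sample when iou(T, G) ≥ η.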
step 406: and calculating the error between the model output information and the real labeling information according to the loss function. According to the model obtained in the step 404, the class probability of s multiplied by k prediction candidate frames corresponding to the I picture and the coordinate offset relative to the corresponding standard size frame are output, and the picture prediction information is calculated
as $\big(\hat{p}^{(I)}_{ij}, \Delta\hat{x}^{(I)}_{ij}, \Delta\hat{y}^{(I)}_{ij}, \Delta\hat{w}^{(I)}_{ij}, \Delta\hat{h}^{(I)}_{ij}\big)$, and the error between it and the corresponding real annotation information $\big(p^{(I)}_{ij}, \Delta x^{(I)}_{ij}, \Delta y^{(I)}_{ij}, \Delta w^{(I)}_{ij}, \Delta h^{(I)}_{ij}\big)$ is computed. The loss function of picture I is defined as follows:

$$\mathrm{Loss}_I = \sum_{i=1}^{s^2} \sum_{j=1}^{k} \Big[ -\mathbb{1}^{\mathrm{obj}}_{ij} \log \hat{p}^{(I)}_{ij} - \lambda_{\mathrm{noobj}}\, \mathbb{1}^{\mathrm{noobj}}_{ij} \log\big(1 - \hat{p}^{(I)}_{ij}\big) + \lambda_{\mathrm{loc}}\, \mathbb{1}^{\mathrm{obj}}_{ij} \Big( \big(\Delta x^{(I)}_{ij} - \Delta\hat{x}^{(I)}_{ij}\big)^{2} + \big(\Delta y^{(I)}_{ij} - \Delta\hat{y}^{(I)}_{ij}\big)^{2} + \big(\Delta w^{(I)}_{ij} - \Delta\hat{w}^{(I)}_{ij}\big)^{2} + \big(\Delta h^{(I)}_{ij} - \Delta\hat{h}^{(I)}_{ij}\big)^{2} \Big) \Big]$$

where $p^{(I)}_{ij}$ represents the real class corresponding to the jth prediction candidate box of the ith position region of picture I: if the standard-size box corresponding to the prediction candidate box is a positive sample, $p^{(I)}_{ij} = 1$; if it is a negative sample, $p^{(I)}_{ij} = 0$. $\Delta x^{(I)}_{ij}, \Delta y^{(I)}_{ij}, \Delta w^{(I)}_{ij}, \Delta h^{(I)}_{ij}$ are the offsets of the real annotation box of picture I relative to the center, width and height of the jth standard-size box of the ith position region; the relative offsets of the real annotation box of picture I are computed as:

$$\Delta x^{(I)}_{ij} = \frac{x^{(I)}_{\min} + x^{(I)}_{\max}}{2\, W^{(I)}_{0}} - x^{s}_{i}, \qquad \Delta y^{(I)}_{ij} = \frac{y^{(I)}_{\min} + y^{(I)}_{\max}}{2\, H^{(I)}_{0}} - y^{s}_{i},$$

$$\Delta w^{(I)}_{ij} = \frac{x^{(I)}_{\max} - x^{(I)}_{\min}}{W^{(I)}_{0}} - w^{s}_{j}, \qquad \Delta h^{(I)}_{ij} = \frac{y^{(I)}_{\max} - y^{(I)}_{\min}}{H^{(I)}_{0}} - h^{s}_{j},$$

where $W^{(I)}_{0}$ and $H^{(I)}_{0}$ are the width and height of the original picture in F_fire corresponding to picture I; $\big(x^{s}_{i}, y^{s}_{i}, w^{s}_{j}, h^{s}_{j}\big)$ is the coordinate information of the jth standard-size box of the ith position region relative to the whole input picture, with $(x^{s}_{i}, y^{s}_{i})$ the coordinates of the center of the ith position region relative to the whole input picture and $(w^{s}_{j}, h^{s}_{j})$ obtained by k-means clustering. $\lambda_{\mathrm{loc}}$ (0.01 ≤ $\lambda_{\mathrm{loc}}$ ≤ 0.2) is the weight coefficient of the position error loss relative to the class error loss. $\mathbb{1}^{\mathrm{obj}}_{ij}$ is a Boolean value indicating whether the jth prediction candidate box of the ith position region is flame: its value is 1 if the standard-size box corresponding to the prediction candidate box is a positive sample and 0 if it is a negative sample. $\mathbb{1}^{\mathrm{noobj}}_{ij}$ is a Boolean value indicating whether that prediction candidate box is not flame: its value is 0 for a positive sample and 1 for a negative sample. $\lambda_{\mathrm{noobj}}$ (0.01 ≤ $\lambda_{\mathrm{noobj}}$ ≤ 0.2) is a weight coefficient adjusting the class-loss proportion of positive and negative samples. Fig. 3 is a schematic diagram of a real annotation box, a prediction candidate box and a standard-size box of the flame identification method based on image processing according to an embodiment of the present application.
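A sketch of this loss, assuming PyTorch (an assumption, not prescribed by the application), with p_hat and the obj mask shaped (s², k) and the offset tensors shaped (s², k, 2):

```python
import torch

def flame_loss(p_hat, obj, dxy_hat, dxy, dwh_hat, dwh,
               lam_loc=0.1, lam_noobj=0.1, eps=1e-8):
    """Class loss for positive and negative samples plus weighted position loss."""
    cls_pos = -(obj * torch.log(p_hat + eps)).sum()            # positive-sample class loss
    cls_neg = -((1.0 - obj) * torch.log(1.0 - p_hat + eps)).sum()  # negative-sample class loss
    loc = (obj.unsqueeze(-1) * ((dxy_hat - dxy) ** 2
                                + (dwh_hat - dwh) ** 2)).sum()  # position loss, positives only
    return cls_pos + lam_noobj * cls_neg + lam_loc * loc
```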
Step 407: repeating the steps 405 and 406 to obtain a training set F train Inputting the Model established in the step 404 once for each picture, updating the Model through an error back propagation algorithm to obtain a current Model and recording the current Model as a Model old
Step 408: using test set F test For Model old Testing is carried out, and a test set F test Each picture is input into a Model in sequence old F is calculated according to step 403 test All pictures in the Model old Total Loss of old
Step 409: repeat step 407, training set F train Each picture is input into a Model in sequence old Obtaining a new current Model and marking as a Model new
Step 410: repeat step 408 using test set F test For Model new Testing was carried out to give F test All pictures in the Model new Loss of total Loss of new
Step 411: training stopping condition judgment:
if Loss new ≤Loss old Continuing to train the Model and setting the current Model new Update to a new Model old Will be the current Loss new Update to New Loss old Returning to the step 409 again;
if Loss new >Loss old When the Model is not trained, the Model stops training and outputs the current Model old As a flame detection Model after final training fire
The flame identification method based on image processing comprises the steps of extracting video image frames from a monitoring video stream; identifying whether the video image frame is a rainy scene through the first model; determining a flame detection threshold value according to whether the video image frame is a rainy scene; and identifying whether flame exists in the video image frame through the second model, wherein whether flame exists in the video image frame is obtained based on the comparison between the result output by the second model and the flame detection threshold value. According to the embodiment of the application, different threshold values can be set for the model for identifying the flame through identifying the raining scene, so that the accuracy of flame identification is improved.
Optionally, before the video image frame is identified as a rainy scene or not through the first model, whether a moving object (referred to as a moving target in this embodiment) exists in the video image frame may also be detected; whether flame exists in the video image frame is identified through the second model only in the case that a moving object exists in the video image frame.

Specifically, a background difference model can be used to detect whether a moving object exists in the video image frame. Before the detection, each pixel value in the video image frame can be modeled as a mixture of k Gaussian distributions; the distributions are sorted according to the ratio of weight to standard deviation, and the first b Gaussian distributions are taken as the background difference model.

Furthermore, each pixel value can be judged to be a foreground or a background pixel according to its difference from the means of the background model. When detecting whether a moving object exists in the video image frame, it is therefore judged whether the ratio of the number of foreground pixels to the number of all pixels of the frame is greater than a preset ratio threshold; if so, a moving object exists in the video image frame, and otherwise none does.
The following describes in detail how the background difference model is established in the embodiments of the present application:

A Gaussian mixture model is used for moving-target detection, and the maximum number of Gaussian distributions K (3 ≤ K ≤ 5) is determined. Every pixel $I_{(ch,x,y)}$ of the video frame is modeled as a mixture of K Gaussian distributions. The probability that a certain pixel takes the value $X_t$ at time t is then:

$$P(X_t) = \sum_{k=1}^{K} w_k \, \eta\big(X_t;\ \mu_k, \Sigma_k\big),$$

where $w_k$ is the weight parameter of the kth Gaussian component, and the Gaussian distribution $\eta$ can be written as:

$$\eta\big(X_t;\ \mu_k, \Sigma_k\big) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_k|^{1/2}} \exp\Big( -\tfrac{1}{2} (X_t - \mu_k)^{\top} \Sigma_k^{-1} (X_t - \mu_k) \Big),$$

where $\mu_k$ is the mean and $\Sigma_k$ is the covariance matrix of the kth Gaussian distribution.

The mixture of K Gaussian distributions is sorted from high to low according to the value of $w_k / \sigma_k$, and the first B Gaussian distributions are taken as the background model of the scene, where B can be obtained by:

$$B = \arg\min_{b} \Big( \sum_{k=1}^{b} w_k > T \Big),$$

where the threshold T is the minimum proportion of the background model.

In the background difference process, whether the difference between a pixel value on the video frame and the means of the B Gaussian distributions lies within 2.5 standard deviations is used as the condition for judging the pixel to be a foreground or a background pixel. If the pixel value differs from the means of all B Gaussian distributions by more than 2.5 standard deviations, the pixel is regarded as a foreground pixel; traversing in sequence, if there exists a Gaussian distribution such that the difference between the pixel value and its mean is within 2.5 standard deviations, the pixel is regarded as a background pixel. The first Gaussian component in the ordering that satisfies the condition has its parameters updated as follows:

$$w_{k,t} = (1 - \alpha)\, w_{k,t-1} + \alpha\, M_{k,t},$$

$$\mu_t = (1 - \rho)\, \mu_{t-1} + \rho\, X_t,$$

$$\sigma_t^{2} = (1 - \rho)\, \sigma_{t-1}^{2} + \rho\, (X_t - \mu_t)^{\top} (X_t - \mu_t),$$

$$\rho = \alpha\, \eta\big(X_t;\ \mu_k, \sigma_k\big),$$

where $M_{k,t}$ is 1 for the matched Gaussian component and 0 for the others, and $\alpha$ is the learning rate. If none of the B Gaussian distributions satisfies the judgment condition, the last-ranked Gaussian component is replaced with a new Gaussian distribution whose mean is the current pixel value, choosing a larger initial variance and a smaller weight parameter.

When detection starts, the background difference model is established with the first $N_{bg}$ ($N_{bg} \geq 100$) frames of the video stream, in preparation for moving-object detection on subsequent video frames.
A video image frame is then input into the background difference model, each pixel is judged in sequence to be a foreground pixel or not, and the background difference model is updated. The proportion of foreground pixels among the pixels of the whole video frame is counted; if the proportion is at least r (0.0001 ≤ r ≤ 0.01), a moving object is considered to exist. Referring to fig. 5, a schematic diagram of moving-object detection in an embodiment of the present application is shown. When a moving object exists, the following steps continue to be executed; otherwise the flow returns to step 101 to detect the next frame.
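In practice, the mixture-of-Gaussians background difference described above is close to what OpenCV's MOG2 background subtractor implements; a minimal motion-gating sketch (the ratio threshold r below is an illustrative value within the stated range):

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=100)  # N_bg >= 100 frames
r = 0.001  # foreground ratio threshold, 0.0001 <= r <= 0.01

def has_moving_object(frame_bgr):
    """True if the foreground pixel ratio of the frame reaches r."""
    mask = subtractor.apply(frame_bgr)  # nonzero entries mark foreground pixels
    return (mask > 0).mean() >= r
```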
Optionally, after identifying whether flame exists in the video image frame through the second model, if flame is determined to exist, it may further be judged whether this is a false alarm. The judgment is performed through a third model, which may be referred to as the false-alarm judgment model in the embodiments of the present application. Specifically, the third model is obtained by training a third deep convolutional neural network model according to the recognition results of the second model on a flame image training set and a non-flame image training set.
In an exemplary embodiment, before the third model is used to judge whether the judgment result of the second model is wrong, the third model is obtained by the following training:
step 501, identifying image areas with flames in each image of a flame image training set and a non-flame image training set through a second model, and intercepting each image area with the flames identified as a sub-image;
step 502, dividing the plurality of sub-images into a true example training set and a false positive example training set, wherein the true example training set consists of the sub-images in which flames were identified in the flame image training set, and the false positive example training set consists of the sub-images in which flames were identified in the non-flame image training set; referring to fig. 4, a schematic diagram of a true-example image in a flame recognition method based on image processing according to an embodiment of the present application is shown;
step 503, training the real example training set and the false example training set as the training set of the third deep convolutional neural network model.
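A sketch of steps 501 to 503, where detect_flames is a hypothetical wrapper around the trained second model returning predicted flame boxes as integer (x_min, y_min, x_max, y_max) tuples:

```python
def build_false_alarm_sets(flame_images, non_flame_images, detect_flames):
    """Crop every region the second model calls flame into true/false-positive sets."""
    true_examples, false_positives = [], []
    for img in flame_images:
        for x0, y0, x1, y1 in detect_flames(img):
            true_examples.append(img[y0:y1, x0:x1])    # flame set -> true examples
    for img in non_flame_images:
        for x0, y0, x1, y1 in detect_flames(img):
            false_positives.append(img[y0:y1, x0:x1])  # non-flame set -> false positives
    return true_examples, false_positives  # the two training classes of the third model
```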
A specific example of the above alternative embodiment is described in detail below:

Step 5011: collect N_negative ≥ 5000 color pictures containing no flame to establish the non-flame picture dataset F_negative. Input all pictures in F_fire and F_negative into the Model_fire trained in step 4 to obtain the flame detection results $\hat{p}^{(I)}_{ij}$ output by the model. Retain all prediction candidate boxes with $\hat{p}^{(I)}_{ij} \geq$ threshold (i.e., judged as flame by Model_fire) and, based on this result, crop the regions judged as flame in all pictures of F_fire and F_negative, obtaining a number of sub-pictures. The sub-pictures obtained from F_fire form the dataset F_true_alarm, i.e. the true positives; the sub-pictures obtained from F_negative form the dataset F_false_alarm, i.e. the false positives. F_true_alarm and F_false_alarm together constitute the false-alarm judgment dataset F_alarm.

Step 5012: randomly partition the false-alarm judgment dataset F_alarm obtained in step 5011 into a training set F_train and a test set F_test, where the proportion a of F_train satisfies 0.6 ≤ a ≤ 0.9.

Step 5013: standardize and normalize each picture of the training and test sets established in step 5012, with standard width W and height H and channel number Ch = 3. Let I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinates (x, y) of any normalized picture; the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, is computed by formula (5-1):

$$I'(t, x, y) = \frac{I(t, x, y)}{255} \qquad (5\text{-}1)$$

Step 5014: construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence; the feature extraction part combines convolutional and pooling layers, containing 3 to 6 pooling layers, 1 to 5 convolutional layers before each pooling layer, and a batch normalization layer after each convolutional layer; the output layer of the model contains 2 neurons, representing the two categories of false flame alarm and real flame alarm.

Step 5015: input each standardized and normalized picture of the training set from step 5013 into the model established in step 5014 in sequence. If the input is picture I, the output vector (z_0, z_1) of the output layer of the deep convolutional neural network represents the false-flame-alarm and real-flame-alarm categories respectively.

Step 5016: compute the error between the model output information and the real annotation information according to the loss function, defined as

$$\mathrm{Loss}_I = -y_I \log \frac{e^{z_1}}{e^{z_0} + e^{z_1}} - (1 - y_I) \log \frac{e^{z_0}}{e^{z_0} + e^{z_1}} \qquad (5\text{-}2)$$

where y_I represents the actual classification label of picture I: y_I = 1 for an actual real flame alarm and y_I = 0 for an actual false flame alarm.

Step 5017: input each picture of the training set F_train into the model established in step 5014, compute the loss function according to formula (5-2), and update the model parameters through the error back-propagation algorithm, obtaining the current model, denoted Model_old.

Step 5018: test Model_old with the test set F_test: input each picture of F_test into Model_old in sequence and compute, according to formula (5-2), the total loss Loss_old of Model_old over all pictures of F_test.

Step 5019: input each picture of F_train into Model_old in sequence, compute the loss function according to formula (5-2), and update the model parameters through the error back-propagation algorithm, obtaining a new current model, denoted Model_new.

Step 5110: test Model_new with the test set F_test: input each picture of F_test into Model_new in sequence and compute, according to formula (5-2), the total loss Loss_new of Model_new over all pictures of F_test.

Step 5111: judge the training stop condition:

if Loss_new ≤ Loss_old, continue training: update the current Model_new to the new Model_old, update the current Loss_new to the new Loss_old, and return to step 5019;

if Loss_new > Loss_old, stop training and output the current Model_old as the finally trained false-alarm judgment model Model_alarm.
Referring to fig. 7, the implementation of the embodiments of the present application in a specific application scenario is described in detail:

Step (1): real-time video transmission. A continuous video stream is extracted from a designated network camera, with a required resolution of 720 × 576 or higher and a frame rate of at least 20 fps, and the video frames are continuously sent frame by frame to the models in the following steps for processing.

Step (2): detect whether a moving object exists through the motion detection model. The video frame is input into the background difference model prepared in advance, each pixel is judged in sequence to be a foreground pixel or not, and the background difference model is updated. The proportion of foreground pixels among all pixels of the whole video frame is counted; if the proportion is at least r (0.0001 ≤ r ≤ 0.01), detection continues, otherwise the flow returns to step (2) to detect the next frame. This step detects whether a moving object is present in the video frame.

Step (3): judge whether it is a rainy day through the rainy-day judgment model. A result queue Q is maintained, and one value is removed from the head of Q before each video frame is detected. After standardization, the video frame is input into the prepared rainy-day judgment model Model_rain. If the model judges the current weather to be rainy, a value "1" is appended to the tail of Q; if it judges it to be non-rainy, a value "0" is appended. A flame detection threshold multiplier λ_fire is set: if the average of the values in Q is at least 0.5, the scene of the current video frame is considered rainy and λ_fire = 1.5; if the average is below 0.5, the scene is considered non-rainy and λ_fire = 1.
Step (4): detect whether flame exists in the video frame through the flame detection model. The video frame is input into the pre-trained flame detection model Model_fire, and for the frame the model outputs the class probability p̂^fire of each of the s × s × k prediction candidate boxes together with its coordinate offsets (t̂_x^fire, t̂_y^fire, t̂_w^fire, t̂_h^fire) relative to the corresponding standard-size box, where the superscript "fire" denotes the flame detection model. A flame judgment threshold, threshold (0 ≤ threshold ≤ 1), is set, and the class probability of every prediction candidate box output for the frame is judged:
if p̂^fire < λ_fire · threshold, no flame is present in the prediction candidate box;
if p̂^fire ≥ λ_fire · threshold, the prediction candidate box contains flame, and from the coordinate offsets (t̂_x^fire, t̂_y^fire, t̂_w^fire, t̂_h^fire) and the corresponding standard-size box coordinates (c_x, c_y, p_w, p_h), the actually predicted flame area is decoded in the YOLO-style manner implied by the s × s grid of standard-size boxes:

b_x = (c_x + σ(t̂_x^fire)) / s, b_y = (c_y + σ(t̂_y^fire)) / s, b_w = p_w · e^{t̂_w^fire}, b_h = p_h · e^{t̂_h^fire},

so that the vertex coordinates of the upper-left and lower-right corners of the actually predicted flame area are

(x_min, y_min) = ((b_x − b_w/2) · W, (b_y − b_h/2) · H),
(x_max, y_max) = ((b_x + b_w/2) · W, (b_y + b_h/2) · H),

where W and H respectively represent the original width and height of the frame picture. If no prediction candidate box of the frame contains flame, the method returns to step (2) to detect the next frame; if any prediction candidate box contains flame, the model outputs the calculated upper-left and lower-right vertex coordinates of the actually predicted flame area in the picture, and the corresponding area is cropped from the video frame according to those coordinates to obtain a sub-picture.
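A minimal sketch of this thresholding, decoding, and cropping, under the YOLO-style decoding assumed above; the array layout and names are illustrative:

```python
import numpy as np

def decode_and_crop(frame, probs, offsets, anchors, lam_fire, threshold, s):
    """probs: (s*s*k,) class probabilities; offsets: (s*s*k, 4) rows (tx, ty, tw, th);
    anchors: (s*s*k, 4) rows (cx, cy, pw, ph) with grid-cell indices cx, cy and
    standard-size width/height pw, ph normalised to the image."""
    H, W = frame.shape[:2]
    crops = []
    for p, (tx, ty, tw, th), (cx, cy, pw, ph) in zip(probs, offsets, anchors):
        if p < lam_fire * threshold:
            continue                                   # no flame in this candidate box
        bx = (cx + 1.0 / (1.0 + np.exp(-tx))) / s      # box centre in [0, 1]
        by = (cy + 1.0 / (1.0 + np.exp(-ty))) / s
        bw, bh = pw * np.exp(tw), ph * np.exp(th)      # box size in [0, 1]
        x1, y1 = int((bx - bw / 2) * W), int((by - bh / 2) * H)
        x2, y2 = int((bx + bw / 2) * W), int((by + bh / 2) * H)
        crops.append(((x1, y1, x2, y2), frame[max(y1, 0):y2, max(x1, 0):x2]))
    return crops  # list of (vertex coordinates, cropped sub-picture)
```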
Step (5): judge whether the flame is a false alarm through the false-alarm judgment model. All flame-area sub-pictures obtained in step (4) are input into the false-alarm judgment model Model_alarm to obtain classification results. If the model judges a flame-area picture to be a true flame alarm, the presence of flame in the current video frame is confirmed; a rectangular box is marked on the video frame according to the area coordinates output by Model_fire in step (4), and the finally synthesized video frame marked with flame is output. If the model judges the flame-area picture to be a false flame alarm, it is confirmed that no flame exists in the current video frame, and the unprocessed video frame is output directly. After the final output is finished, the method returns to step (2) to detect the next frame of image.
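A minimal sketch of this confirmation step; alarm_model.is_true_flame is a hypothetical wrapper around Model_alarm's binary classification, and detections is the output of the decoding sketch above:

```python
import cv2

def confirm_and_draw(frame, detections, alarm_model):
    confirmed = False
    for (x1, y1, x2, y2), crop in detections:
        if alarm_model.is_true_flame(crop):   # hypothetical classifier call
            # Mark only true alarms with a rectangular box on the video frame.
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
            confirmed = True
    return frame, confirmed  # frame is annotated only when a true alarm is confirmed
```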
Different from other flame detection methods, the method and device of the embodiments of the present application combine background difference, target detection, and image classification techniques from the field of computer vision. Before flame detection is performed, the background difference technique is first used to detect moving targets in the video picture, and a picture is examined further only when a moving target exists in it, which reduces the computation required for flame detection and improves real-time detection efficiency. The embodiments of the present application perform end-to-end learning from image data through deep convolutional neural networks, and the entire detection process from the input of an original image to the output of flame position information is realized through the target detection technique. After the flame target detection process, the output image area is further subjected to image classification to confirm whether the flame detected in the target detection step is a true alarm or a false alarm, and finally only true flame alarms confirmed by the model are output, which reduces the false alarm rate of flame detection and improves overall accuracy. In addition, the embodiments of the present application can adapt to image inputs of different sizes, meet the requirements of real-time detection, and generalize well across scenes; they can be applied to fire early-warning tasks in various indoor and outdoor scenes, reminding people to extinguish local flames before they grow into larger disasters in the early stage of a fire, thereby minimizing the harm caused by fire.
Fig. 8 shows a schematic structural diagram of a flame recognition device based on image processing according to an embodiment of the present application. The flame recognition device based on image processing provided by the embodiments of the present application can be used to execute the flame recognition method based on image processing provided by the embodiments of the present application. For details not described in the embodiments of the flame recognition device based on image processing, reference may be made to the description in the embodiments of the flame recognition method based on image processing provided in the embodiments of the present application.
As shown in fig. 8, the flame recognition device based on image processing according to the embodiment of the present application includes an extraction unit 11, a first recognition unit 12, a determination unit 13, and a second recognition unit 14.
The extracting unit 11 is configured to extract video image frames from the monitoring video stream;
the first identification unit 12 is configured to identify whether the video image frame is a rainy scene through a first model;
the determining unit 13 is configured to determine a flame detection threshold according to whether the video image frame is a rainy scene;
the second identifying unit 14 is configured to identify whether a flame exists in the video image frame through the second model, where whether a flame exists in the video image frame is obtained by comparing a result output by the second model with a flame detection threshold.
The flame recognition device based on image processing of the embodiment of the application extracts video image frames from a monitoring video stream; identifies whether the video image frame is a rainy scene through the first model; determines a flame detection threshold according to whether the video image frame is a rainy scene; and identifies whether flame exists in the video image frame through the second model, where the presence of flame in the video image frame is obtained based on comparing the result output by the second model with the flame detection threshold. By identifying rainy scenes, the embodiment of the application can set different thresholds for the flame recognition model, thereby improving the accuracy of flame recognition.
Optionally, the determination unit 13 may be configured to perform the following steps:
adding a judgment result of whether the video image frame is a rainy scene into a result queue; the result queue is used for storing the judgment result of whether the continuous multi-frame video image frame is a rainy scene;
judging whether raining occurs at present according to the average condition of the result queue;
determining a flame detection threshold based on whether it is currently raining, wherein the flame detection threshold in the case of current raining is greater than the flame detection threshold in the case of not currently raining.
Optionally, the device may further include a detection unit configured to detect whether a moving object exists in the video image frame before the video image frame is identified as a rainy scene by the first model, wherein in a case that the moving object exists in the video image frame, whether a flame exists in the video image frame is identified by the second model.
Optionally, the apparatus may further include a first performing unit, configured to perform the following steps before detecting whether a moving object exists in the video image frame:
modeling each pixel value in the video image frame as a mixture of k Gaussian distributions;
sorting the k Gaussian distributions according to the ratio of their weights to their mean values, and taking the first b Gaussian distributions as the background difference model;
judging whether each pixel value in the video image frame is a foreground pixel or a background pixel according to its difference from the mean values of the background difference model;
detecting whether a moving object exists in a video image frame, comprising the following steps: judging whether the ratio of the number of foreground pixels to the number of all pixels of the video image frame is greater than a preset ratio threshold value or not; and if so, determining that a moving object exists in the video image frame.
Optionally, the apparatus may further include a second performing unit, configured to perform the following steps before identifying whether the video image frame is a rainy scene through the first model:
acquiring a plurality of images of a rainy scene and a plurality of images of a non-rainy scene, and establishing a raining scene training set;
and training the first deep convolutional neural network model through a raining scene training set to obtain a first model.
Optionally, the apparatus may further include a third performing unit, before identifying whether a flame exists in the video image frame through the second model, performing the following steps:
acquiring a plurality of images with flames, and establishing a flame image training set, wherein each image with flames is marked with a real marking frame which is used for identifying an area with flames;
and training a second deep convolutional neural network model through a flame image training set to obtain a second model, wherein the second model is used for outputting the probability of flame existence of each prediction candidate frame in the multiple prediction candidate frames.
Optionally, when the third execution unit performs training of the second deep convolutional neural network model through the flame image training set, the method may specifically include performing the following steps:
acquiring a first image in a flame image training set;
dividing a first image into a plurality of regions;
dividing a corresponding plurality of prediction candidate frames in each region based on the plurality of standard-size frames; the sizes of the standard-size frames are obtained by clustering the sizes of the real marking frames of all pictures in the flame image training set (see the clustering sketch after this list), and the center of each standard-size frame is the center of the corresponding area;
identifying, by a second deep convolutional neural network model, a probability of a flame present in each prediction candidate box;
calculating a first loss error of the second deep convolutional neural network model aiming at the first image through a preset loss function;
updating parameters of the second deep convolutional neural network model according to the first loss error through an error back-propagation algorithm;
testing the updated second deep convolutional neural network model through the test set;
calculating the total loss error of the test set;
stopping training if the total loss error is determined to meet the preset condition.
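The standard-size frames act as anchor boxes, and clustering the sizes of the real marking frames is what fixes them before training. A minimal sketch using plain k-means on (width, height) pairs (YOLO-style pipelines often cluster with an IoU-based distance instead; scikit-learn is assumed here):

```python
import numpy as np
from sklearn.cluster import KMeans

def standard_size_boxes(wh_pairs, k=5):
    """wh_pairs: (N, 2) array of normalised (width, height) of the real marking
    frames across the flame image training set; returns k anchor sizes."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.asarray(wh_pairs))
    return km.cluster_centers_  # k representative (width, height) standard sizes
```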
Optionally, the apparatus may further include a fourth execution unit, configured to determine, by the third model, whether the determination result of the second model is incorrect after identifying whether flame exists in the video image frame by the second model; and the third model is obtained by training a third deep convolution neural network model according to the recognition result of the second model on the flame image training set and the non-flame image training set.
Optionally, the apparatus may further include a fifth execution unit, configured to execute the following steps before determining, by the third model, whether the determination result of the second model is incorrect:
identifying image areas with flames in each image of the flame image training set and the non-flame image training set through a second model, and intercepting each image area with the flames identified as a sub-image;
dividing a plurality of sub-images into a true example training set and a false positive example training set, wherein the true example training set is the sub-images in which flames were identified in the flame image training set, and the false positive example training set is the sub-images in which flames were identified in the non-flame image training set;
and training by taking the true example training set and the false positive example training set as the training set of the third deep convolutional neural network model.
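A minimal sketch of assembling that training set; detect_crops is a hypothetical wrapper that runs the second model on an image and returns the sub-images it identifies as flame:

```python
def build_alarm_training_set(flame_images, non_flame_images, detect_crops):
    # Detections on true-flame images become positive (true example) samples...
    true_examples = [c for img in flame_images for c in detect_crops(img)]
    # ...while detections on non-flame images are, by construction, false positives.
    false_positives = [c for img in non_flame_images for c in detect_crops(img)]
    return ([(crop, 1) for crop in true_examples] +
            [(crop, 0) for crop in false_positives])
```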
Fig. 9 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 301 and a memory 302 having stored program instructions.
Specifically, the processor 301 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory.
In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory or a combination of two or more of these.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) readable storage media (e.g., a memory device) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the method according to an aspect of the application.
The processor 301 reads and executes the program instructions stored in the memory 302 to implement any one of the image processing-based flame recognition methods in the above embodiments.
In one example, the electronic device may also include a communication interface 303 and a bus 310. As shown in fig. 9, the processor 301, the memory 302, and the communication interface 303 are connected via a bus 310 to complete communication therebetween.
The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiment of the present application.
Bus 310 includes hardware, software, or both to couple the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In combination with the flame identification method based on image processing in the foregoing embodiments, the embodiments of the present application may be implemented by providing a readable storage medium. The readable storage medium having stored thereon program instructions; the program instructions, when executed by the processor, implement any of the image processing based flame identification methods of the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by program instructions. These program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (13)

1. A flame identification method based on image processing is characterized by comprising the following steps:
extracting video image frames from the monitoring video stream;
identifying whether the video image frame is a rainy scene through a first model;
determining a flame detection threshold according to whether the video image frame is the raining scene;
and identifying whether the video image frame has flames or not through a second model, wherein whether the video image frame has flames or not is obtained based on comparison between the result output by the second model and the flame detection threshold value.
2. The method of claim 1, wherein determining a flame detection threshold based on whether the video image frame is the raining scene comprises:
adding a judgment result of whether the video image frame is in the raining scene into a result queue; the result queue is used for storing the judgment result of whether the continuous multi-frame video image frames are in the raining scene;
judging whether the current raining is performed according to the average condition of the result queue;
determining the flame detection threshold based on whether it is currently raining, wherein the flame detection threshold in the case of current raining is greater than the flame detection threshold in the case of not currently raining.
3. The method of claim 1, further comprising, prior to identifying whether the video image frame is a raining scene via the first model:
and detecting whether a moving object exists in the video image frame, wherein under the condition that the moving object exists in the video image frame, whether flames exist in the video image frame or not is identified through a second model.
4. The method of claim 3, further comprising, prior to detecting whether a moving object is present in the video image frame:
modeling each pixel value in the video image frame as a mixture of k Gaussian distributions;
sorting according to the ratio of the weight of the k Gaussian distributions to the average value, and taking the first b Gaussian distributions as a background difference model;
judging whether the corresponding pixel value is a foreground pixel or a background pixel according to the difference between each pixel value in the video image frame and the mean value of the background model;
the detecting whether a moving object exists in the video image frame comprises the following steps: judging whether the ratio of the number of the foreground pixel points to the number of all pixel points of the video image frame is greater than a preset ratio threshold value or not; and if so, determining that the moving object exists in the video image frame.
5. The method of claim 1, further comprising, prior to identifying whether the video image frame is a raining scene via the first model:
acquiring a plurality of images of a rainy scene and a plurality of images of a non-rainy scene, and establishing a raining scene training set;
and training a first deep convolutional neural network model through the raining scene training set to obtain the first model.
6. The method of claim 1, further comprising, prior to identifying whether a flame is present in the video image frame via the second model:
acquiring a plurality of images with flames, and establishing a flame image training set, wherein each image with flames is marked with a real marking frame which is used for identifying an area with flames;
and training a second deep convolutional neural network model through the flame image training set to obtain the second model, wherein the second model is used for outputting the probability of flame existence of each prediction candidate frame in a plurality of prediction candidate frames.
7. The method of claim 6, wherein training a second deep convolutional neural network model through the flame image training set comprises:
acquiring a first image in the flame image training set;
dividing the first image into a plurality of regions;
dividing a corresponding plurality of prediction candidate frames in each region based on the plurality of standard-size frames; the sizes of the standard size frames are obtained by clustering according to the sizes of the real marking frames of all the pictures in the flame image training set, and the center of each standard size frame is the center of the corresponding area;
identifying, by the second deep convolutional neural network model, a probability of a flame being present in each of the prediction candidate boxes;
calculating a first loss error of the second deep convolutional neural network model for the first image through a preset loss function;
updating parameters of the second deep convolutional neural network model according to the first loss error through an error back propagation algorithm;
testing the updated second deep convolutional neural network model through a test set;
calculating the total loss error of the test set;
stopping training if it is determined that the total loss error satisfies a preset condition.
8. The method of claim 6, after identifying whether a flame is present in the video image frame via the second model, further comprising:
judging whether the judgment result of the second model is wrong or not through a third model; and the third model is obtained by training a third deep convolution neural network model according to the recognition result of the second model on the flame image training set and the non-flame image training set.
9. The method according to claim 8, before determining whether the determination result of the second model is incorrect by a third model, further comprising:
identifying image areas with flames in each image of the flame image training set and the non-flame image training set through the second model, and intercepting each image area with the flames identified as a sub-image;
dividing a plurality of sub-images into a true example training set and a false positive example training set, wherein the true example training set is the sub-images in which flames were identified in the flame image training set, and the false positive example training set is the sub-images in which flames were identified in the non-flame image training set;
and training by taking the real example training set and the false positive example training set as the training set of the third deep convolutional neural network model.
10. An image processing-based flame identification device, the device comprising:
an extraction unit for extracting video image frames from the monitoring video stream;
the first identification unit is used for identifying whether the video image frame is a rainy scene through a first model;
the determining unit is used for determining a flame detection threshold value according to whether the video image frame is the raining scene;
and the second identification unit is used for identifying whether flame exists in the video image frame through a second model, wherein the existence of flame in the video image frame is obtained by comparing the result output by the second model with the flame detection threshold value.
11. An electronic device, characterized in that the electronic device comprises: a processor and a memory storing program instructions;
the processor, when executing the program instructions, implements the image processing based flame recognition method of any of claims 1-9.
12. A readable storage medium, characterized in that the readable storage medium has stored thereon program instructions which, when executed by a processor, implement the image processing-based flame recognition method according to any one of claims 1 to 9.
13. A program product, characterized in that the instructions in the program product, when executed by a processor of an electronic device, cause the electronic device to perform the image processing based flame recognition method according to any of claims 1-9.