CN108537215B - Flame detection method based on image target detection - Google Patents


Info

Publication number
CN108537215B
Authority
CN
China
Prior art keywords
picture
frame
model
flame
standard size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810243014.4A
Other languages
Chinese (zh)
Other versions
CN108537215A (en)
Inventor
赵劲松
吴昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201810243014.4A
Publication of CN108537215A
Application granted
Publication of CN108537215B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a flame detection method based on image target detection, belonging to the fields of image processing, fire detection and video monitoring. First, a flame detection data set containing flame pictures and the annotation information of each picture is established and divided into a training set and a test set. A deep convolutional neural network model is constructed and iteratively updated with the training set; the loss function of the updated model is calculated on the test set, and training finishes once the loss function of the current model no longer decreases. A real-time video is then captured and each frame picture is detected with the trained model; if flame is present, the model outputs the coordinate position of the flame in the picture and marks it with a rectangular frame. The method requires no manually designed features to generate candidate regions of suspected flame: the deep convolutional neural network performs flame detection directly on the whole picture and obtains the position information of the flame, so that a fire can be warned of early and the harm it causes minimized.

Description

Flame detection method based on image target detection
Technical Field
The invention relates to the field of image processing, fire detection and video monitoring, in particular to a flame detection method based on image target detection.
Background
Fire is a frequent disaster that seriously endangers people's lives and property. Early warning of a fire can prevent it from spreading and minimize the harm it causes. For flame detection tasks, especially in outdoor scenes, large indoor spaces and similar settings, common physical sensors cannot perform effectively, so flame detection methods based on image processing technology are currently an important research direction in the field of safety engineering.
The development of flame detection methods based on image processing techniques has gone through two main stages. The first stage judged whether flame exists in an image from visual characteristics of flame such as color, shape and texture, combined with a classification algorithm from pattern recognition. These manually designed feature methods are fast, but their stability and generalization ability are poor: false alarms readily occur for objects whose color resembles flame, and missed detections occur for flames whose color falls outside the set threshold. To address these problems, researchers have recently begun to use deep convolutional neural networks for flame detection. The deep convolutional neural network has been the most widely studied and applied method in computer vision over the past five years; it typically extracts features from an image automatically using convolutional layers, pooling layers, batch normalization (BN) layers and the like, and classifies them with fully connected layers to judge whether flame exists in the image. Its advantage is that image feature extraction and pattern classification can both be carried out by the neural network without manually designing features. Compared with traditional flame detection methods, techniques based on deep convolutional neural networks achieve higher accuracy and recall and stronger generalization ability, and with the development of CPU and GPU computing power in recent years their computing speed can meet the real-time requirements of flame detection. The remaining problem of existing flame detection methods based on deep convolutional neural networks is that candidate regions of suspected flame still need to be generated from the original image, after which the neural network classifies the candidate regions to judge whether flame exists.
One existing flame detection method based on a deep convolutional neural network first extracts suspected flame points from a color image, takes the suspected flame points as foreground to obtain a binary image, obtains a series of connected regions with a connected-region method, screens them to obtain candidate regions, and then uses a convolutional neural network to classify whether flame exists in the candidate regions. The problem with this approach is that the generation of the suspected flame candidate regions is still accomplished with manually designed features: the process still relies on the color and dynamic characteristics of the flame, so the detection method adapts poorly to environmental changes. To date, no method has emerged that achieves image flame detection completely independent of manually designed features.
Disclosure of Invention
The invention aims to overcome the defects of existing methods and provides a flame detection method based on image target detection. The method requires no manually designed features to generate candidate regions of suspected flame; the deep convolutional neural network performs flame detection directly on the whole picture and obtains the position information of the flame, so that a fire can be warned of early and the harm it causes minimized.
The invention provides a flame detection method based on image target detection, which is characterized by comprising the following steps of:
1) Collect N color pictures containing flame to establish a flame picture data set F_image, N ≥ 5000;
2) For the flame picture data set F_image established in step 1), manually mark the flame region position coordinates of each picture with a rectangular frame to obtain the real annotation frame of the flame region and its 4 coordinate values in each picture: (x^min_l, y^min_l, x^max_l, y^max_l), l = 1, 2, ..., N, where (x^min_l, y^min_l) are the position coordinates of the top-left vertex of the real annotation frame of the l-th picture and (x^max_l, y^max_l) are the position coordinates of the bottom-right vertex of the real annotation frame of the l-th picture; store the real annotation frame information of each picture as an xml file; the xml files corresponding to all flame pictures form a flame annotation data set F_annotation; F_image and F_annotation form the flame detection data set F_0;
3) Randomly partition the flame detection data set F_0 obtained in step 2) into a training set F_train and a test set F_test, where the proportion a of the flame detection data set F_0 occupied by the training set F_train satisfies 0.6 ≤ a ≤ 0.9;
4) constructing a deep convolution neural network model and training the model to obtain a trained flame detection model; the method comprises the following specific steps:
4-1) carrying out standardization and normalization processing on each picture of the training set and the test set;
Standardize each picture of the training set and the test set established in step 3) to a standard width W and height H, where 400 ≤ W ≤ 800 and 400 ≤ H ≤ 800, with the number of channels Ch = 3; let I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinate (x, y) of any standardized picture, and calculate the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, by formula (1):

I'(t, x, y) = I(t, x, y) / 255    (1)
4-2) Construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence; the feature extraction part is a combination of convolutional layers and pooling layers and contains 3-6 pooling layers, each pooling layer preceded by 1-5 convolutional layers, with a batch normalization layer added after every convolutional layer;
4-3) training the model established in the step 4-2) to obtain a trained flame detection model; the method comprises the following specific steps:
4-3-1) Input each picture of the training set standardized and normalized in step 4-1) into the model established in step 4-2) in turn; let the input be the l-th picture. The penultimate convolutional layer of the deep convolutional neural network outputs the feature map corresponding to this picture, of size s × s units, each unit corresponding to a different position region of the input picture. The output of the penultimate convolutional layer is then connected through the last convolutional layer to the output layer of the whole model, whose output is, for the s × s different position regions of the input picture, the class probability Ĉ_ij of each of k prediction candidate frames, 5 ≤ k ≤ 9, and the coordinate offsets (Δx_ij, Δy_ij, Δw_ij, Δh_ij) relative to the corresponding standard size frames, 1 ≤ i ≤ s², 1 ≤ j ≤ k, where i denotes the i-th position region and j denotes the j-th prediction candidate frame of that region. The widths and heights of the standard size frames relative to the whole input picture are obtained by k-means clustering of the widths and heights, relative to the original pictures, of the real annotation frames of all pictures in F_image; let (w^a_j, h^a_j) denote the width and height of the j-th standard size frame relative to the whole input picture. The center of each standard size frame is the center of the picture position region in which it lies, and each standard size frame corresponds to one prediction candidate frame.

Here Ĉ_ij denotes the probability that flame exists in the j-th prediction candidate frame of the i-th position region of the l-th picture, with value range 0-1; (Δx_ij, Δy_ij) denotes the coordinate offset of the center of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the center of the corresponding standard size frame, with value range 0-1; and (Δw_ij, Δh_ij) denotes the offset of the width and height of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the width and height of the corresponding standard size frame, with value range 0-1;
4-3-2) Divide the s × s × k standard size frames of the input picture in step 4-3-1) into positive and negative samples;
Let the real annotation frame in each picture be G and any standard size frame of the picture be T; the intersection-over-union IoU of T and G is calculated as:

IoU = area(T ∩ G) / area(T ∪ G)    (2)

where area denotes the region area;

Set a positive/negative sample threshold η, 0.5 ≤ η ≤ 0.7, and judge the s × s × k standard size frames corresponding to each picture: if IoU ≥ η, flame exists in the standard size frame and it is a positive sample; if IoU < η, no flame exists in the standard size frame and it is a negative sample;
4-3-3) calculating the error between the model output information and the real annotation information according to the loss function;
According to the class probabilities of the s × s × k prediction candidate frames corresponding to the l-th picture and their coordinate offsets relative to the corresponding standard size frames output by the model obtained in step 4-3-1), calculate the error between the picture prediction information (Ĉ_ij, Δx_ij, Δy_ij, Δw_ij, Δh_ij) and the corresponding real annotation information (C_ij, Δx^g_ij, Δy^g_ij, Δw^g_ij, Δh^g_ij); the loss function of the l-th picture is defined as:

Loss_l = Σ_{i=1..s²} Σ_{j=1..k} [ 1^obj_ij · (C_ij - Ĉ_ij)² + λ_noobj · 1^noobj_ij · (C_ij - Ĉ_ij)² + λ_loc · 1^obj_ij · ((Δx^g_ij - Δx_ij)² + (Δy^g_ij - Δy_ij)² + (Δw^g_ij - Δw_ij)² + (Δh^g_ij - Δh_ij)²) ]    (3)
where C_ij denotes the real class corresponding to the j-th prediction candidate frame of the i-th position region of the l-th picture: if the standard size frame corresponding to the prediction candidate frame is a positive sample, C_ij takes the value 1; if it is a negative sample, C_ij takes the value 0;

(Δx^g_ij, Δy^g_ij) denotes the coordinate offset of the center of the real annotation frame of the l-th picture relative to the center of the j-th standard size frame of the i-th position region, and (Δw^g_ij, Δh^g_ij) denotes the offset of the width and height of the real annotation frame of the l-th picture relative to the width and height of the j-th standard size frame of the i-th position region; the relative offsets (Δx^g_ij, Δy^g_ij, Δw^g_ij, Δh^g_ij) of the real annotation frame of the l-th picture are calculated as:

Δx^g_ij = (x^min_l + x^max_l) / (2·W^org_l) - x^a_i    (4)
Δy^g_ij = (y^min_l + y^max_l) / (2·H^org_l) - y^a_i    (5)
Δw^g_ij = (x^max_l - x^min_l) / W^org_l - w^a_j    (6)
Δh^g_ij = (y^max_l - y^min_l) / H^org_l - h^a_j    (7)

where W^org_l and H^org_l denote the width and height of the original picture in F_image corresponding to the l-th picture; (x^a_i, y^a_i, w^a_j, h^a_j) is the coordinate information of the j-th standard size frame of the i-th position region of the l-th picture relative to the whole input picture, (x^a_i, y^a_i) being the coordinates of the center of the i-th position region relative to the whole input picture and (w^a_j, h^a_j) being obtained by k-means clustering; λ_loc denotes the weight coefficient of the position error loss relative to the class error loss, 0.01 ≤ λ_loc ≤ 0.2;

1^obj_ij is a Boolean value indicating whether the j-th prediction candidate frame of the i-th position region is flame: if the standard size frame corresponding to the prediction candidate frame is a positive sample, 1^obj_ij takes the value 1; if it is a negative sample, 1^obj_ij takes the value 0. 1^noobj_ij is a Boolean value indicating whether the j-th prediction candidate frame of the i-th position region is not flame: if the standard size frame corresponding to the prediction candidate frame is a positive sample, 1^noobj_ij takes the value 0; if it is a negative sample, 1^noobj_ij takes the value 1. λ_noobj is a weight coefficient used to adjust the class loss ratio of positive and negative samples, 0.01 ≤ λ_noobj ≤ 0.2;
4-3-4) Repeat steps 4-3-1) to 4-3-3), inputting each picture of the training set F_train into the model established in step 4-2) in turn, and update the model through the error back-propagation algorithm to obtain the current model, recorded as Model_old;

4-3-5) Test Model_old with the test set F_test: input each picture of the test set F_test into Model_old in turn and calculate, according to formula (3), the total loss Loss_old of all pictures in F_test under Model_old;

4-3-6) Repeat step 4-3-4): input each picture of the training set F_train into Model_old in turn to obtain a new current model, recorded as Model_new;

4-3-7) Repeat step 4-3-5): test Model_new with the test set F_test to obtain the total loss Loss_new of all pictures in F_test under Model_new;

4-3-8) Judgment of the training stop condition:

If Loss_new ≤ Loss_old, continue training the model: update the current Model_new to the new Model_old, update the current Loss_new to the new Loss_old, and return to step 4-3-6);

If Loss_new > Loss_old, stop training the model, output the current Model_old as the finally trained flame detection model, and go to step 5);
5) carrying out flame detection on the real-time video by using the flame detection model obtained in the step 4);
5-1) Shoot a real-time video and input each frame picture of the real-time video into the flame detection model trained in step 4) for detection; the model outputs the class probabilities Ĉ^o_ij of the s × s × k prediction candidate frames corresponding to the frame picture and their coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) relative to the corresponding standard size frames, where the superscript o denotes the flame detection model;

5-2) Set a flame judgment threshold with value range 0-1 and judge the class probability of each prediction candidate frame output for each frame picture: if Ĉ^o_ij is smaller than the threshold, no flame exists in that prediction candidate frame; if Ĉ^o_ij is greater than or equal to the threshold, flame exists in that prediction candidate frame, and the vertex coordinates of the top-left and bottom-right corners of the actually predicted flame region are calculated from the coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) of the prediction candidate frame relative to the corresponding standard size frame and the corresponding standard size frame coordinates (x^a_i, y^a_i, w^a_j, h^a_j) as:
x^pred_min = (x^a_i + Δx^o_ij - (w^a_j + Δw^o_ij)/2) · W^org    (8)
y^pred_min = (y^a_i + Δy^o_ij - (h^a_j + Δh^o_ij)/2) · H^org    (9)
x^pred_max = (x^a_i + Δx^o_ij + (w^a_j + Δw^o_ij)/2) · W^org    (10)
y^pred_max = (y^a_i + Δy^o_ij + (h^a_j + Δh^o_ij)/2) · H^org    (11)

where W^org and H^org denote the original width and height of the frame picture, respectively;
5-3) If no flame exists in any prediction candidate frame of the frame, return to step 5-1) and carry out flame detection on the next frame; if flame exists in any prediction candidate frame, the model outputs the vertex coordinates, calculated in step 5-2), of the top-left and bottom-right corners of the actually predicted flame region in the picture, marks it with a rectangular frame, and then returns to step 5-1) to continue flame detection on the next frame picture.
The invention has the characteristics and beneficial effects that:
the invention applies the target detection technology in the field of computer vision, is different from other flame detection methods, does not need to extract a candidate area of suspected flame, and converts the classification problem of flame detection into a regression problem. The invention does not depend on the static and dynamic characteristics of the flame needing manual design, can complete the whole detection process from the input of an original image to the output of flame position information through a deep convolution neural network in the whole process, and avoids the situations of false alarm and false alarm of the manually designed characteristics to a certain extent. In addition, the invention can adapt to image input with different sizes, can meet the requirement of real-time detection, has stronger generalization performance in different scenes, and can be applied to fire early warning tasks in various scenes such as indoor scenes, outdoor scenes and the like so as to avoid fire or reduce the harm brought by fire.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention.
FIG. 2 is a diagram of a standard size box, a true label box, and a prediction candidate box according to an embodiment of the invention.
Fig. 3 is a schematic view of a process for labeling a flame picture according to an embodiment of the invention.
Fig. 4 is a schematic diagram of flame detection results in different scenarios according to an embodiment of the present invention.
Detailed Description
The invention provides a flame detection method based on image target detection, which is further described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a flame detection method based on image target detection, the overall flow is shown as figure 1, and the method comprises the following steps:
1) Collect N (N ≥ 5000) color pictures containing flame and establish a flame picture data set F_image. There is no special requirement on the picture source; the pictures of this embodiment are all collected from the Internet. There is no size requirement, but each picture must contain a clearly visible flame.
2) For the flame picture data set F_image established in step 1), manually mark the flame region position coordinates of each picture with a rectangular frame (the rectangular frame encloses the actual position of the flame region in the picture, yielding 4 coordinate values of the frame), obtaining the real annotation frame of the flame region and its 4 coordinate values in each picture: (x^min_l, y^min_l, x^max_l, y^max_l), l = 1, 2, ..., N, where (x^min_l, y^min_l) are the position coordinates of the top-left vertex of the real annotation frame of the l-th picture and (x^max_l, y^max_l) are the position coordinates of the bottom-right vertex of the real annotation frame of the l-th picture. The annotation information is stored as xml files in the format of the VOC2007 data set, each picture corresponding to one xml file; the xml files corresponding to all flame pictures form the flame annotation data set F_annotation. F_image and F_annotation form the complete flame detection data set F_0.
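As a concrete illustration of this annotation step, the following sketch writes one picture's real annotation frame to a VOC2007-style xml file; the field layout follows the usual VOC convention, and the class name "flame" and the file names are illustrative assumptions rather than values prescribed by the method.

```python
import xml.etree.ElementTree as ET

def write_voc_annotation(xml_path, image_filename, width, height, box):
    """Write one flame annotation (x_min, y_min, x_max, y_max) as a VOC2007-style xml file."""
    x_min, y_min, x_max, y_max = box
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = "flame"
    bndbox = ET.SubElement(obj, "bndbox")
    ET.SubElement(bndbox, "xmin").text = str(x_min)
    ET.SubElement(bndbox, "ymin").text = str(y_min)
    ET.SubElement(bndbox, "xmax").text = str(x_max)
    ET.SubElement(bndbox, "ymax").text = str(y_max)
    ET.ElementTree(root).write(xml_path)

# Example: one 1280x720 picture with a single flame annotation frame
write_voc_annotation("fire_0001.xml", "fire_0001.jpg", 1280, 720, (340, 150, 560, 410))
```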
3) Randomly partition the flame detection data set F_0 obtained in step 2) into a training set F_train and a test set F_test (since the pictures in F_image and the xml files in F_annotation correspond one to one, the pictures and xml files obtained by the partition into the training set F_train and the test set F_test also correspond one to one), where the proportion a of the flame detection data set F_0 occupied by the training set F_train satisfies 0.6 ≤ a ≤ 0.9. The training set F_train is used to train the parameters of the model, and the test set F_test is used to test the generalization performance of the model.
4) Constructing a deep convolution neural network model and training the model to obtain a trained flame detection model; the method comprises the following specific steps:
4-1) carrying out standardization and normalization processing on each picture of the training set and the test set;
Since the pictures collected in step 1) may be of any size while the input of the convolutional neural network requires pictures of a fixed size, each original picture must be scaled to the standard size, with standard width W and height H, 400 ≤ W ≤ 800, 400 ≤ H ≤ 800, and the number of channels Ch = 3 unchanged. During training and testing, all pixel values of an input picture sample lie between 0 and 255 and must be normalized to lie between 0 and 1. Let I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinate (x, y) of any standardized picture; the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, is calculated by formula (1):

I'(t, x, y) = I(t, x, y) / 255    (1)
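A minimal sketch of this preprocessing step, assuming OpenCV and NumPy are available; the 416 × 416 standard size is the value used later in this embodiment, not a fixed requirement of the method.

```python
import cv2
import numpy as np

def standardize_and_normalize(image_path, W=416, H=416):
    """Scale a picture to the standard size W x H and normalize pixel values to [0, 1] (formula (1))."""
    img = cv2.imread(image_path)           # uint8 array of shape (H_org, W_org, 3)
    img = cv2.resize(img, (W, H))          # scale to the standard size
    return img.astype(np.float32) / 255.0  # I'(t, x, y) = I(t, x, y) / 255
```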
4-2) Construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence: the normalized picture is fed in through the input layer, which is connected to the feature extraction part; the feature extraction part is a combination of convolutional layers and pooling layers and contains 3-6 pooling layers, each pooling layer preceded by 1-5 convolutional layers, with a batch normalization layer added after every convolutional layer to normalize each batch of input data in the output feature map of that convolutional layer; finally, the feature extraction part is connected to the output layer, which outputs the prediction result.
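The exact layer configuration is left open by the method (3-6 pooling layers, 1-5 convolutional layers before each, batch normalization after every convolution). The sketch below, written with PyTorch as an assumed framework, shows one such configuration with 5 pooling stages and an output head that predicts, for every position region, the class probability and 4 coordinate offsets of k candidate frames; the channel widths and the LeakyReLU activation are illustrative choices, not the specific 22-layer network of the embodiment.

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch):
    # one convolutional layer followed by a batch normalization layer, as required after every convolution
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )

class FlameDetector(nn.Module):
    def __init__(self, k=5):
        super().__init__()
        self.k = k
        # feature extraction part: convolution + batch norm blocks, each group followed by a pooling layer
        self.features = nn.Sequential(
            conv_bn(3, 32),    nn.MaxPool2d(2),
            conv_bn(32, 64),   nn.MaxPool2d(2),
            conv_bn(64, 128),  conv_bn(128, 128), nn.MaxPool2d(2),
            conv_bn(128, 256), conv_bn(256, 256), nn.MaxPool2d(2),
            conv_bn(256, 512), conv_bn(512, 512), nn.MaxPool2d(2),
        )
        # penultimate convolutional layer (its output is the s x s feature map) and the last
        # convolutional layer, which produces 5 values per candidate frame: class prob + 4 offsets
        self.penultimate = conv_bn(512, 1024)
        self.head = nn.Conv2d(1024, 5 * k, kernel_size=1)

    def forward(self, x):                       # x: (batch, 3, H, W), pixel values in [0, 1]
        fmap = self.penultimate(self.features(x))
        out = self.head(fmap)                   # (batch, 5k, s, s)
        b, _, s, _ = out.shape
        out = out.permute(0, 2, 3, 1).reshape(b, s, s, self.k, 5)
        return torch.sigmoid(out)               # class probability and offsets constrained to 0-1

model = FlameDetector(k=5)
pred = model(torch.rand(16, 3, 416, 416))       # -> (16, 13, 13, 5, 5), i.e. s = 13
```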
4-3) training the model established in the step 4-2) to obtain a trained flame detection model; the method comprises the following specific steps:
4-3-1) Randomly select a picture from the training set standardized and normalized in step 4-1) and input it into the model established in step 4-2). Let the input be the l-th picture; the penultimate convolutional layer of the deep convolutional neural network outputs the feature map corresponding to this picture. The feature map corresponding to each picture has a size of s × s units (the output feature map size s × s is determined by the width and height (W, H) of the input picture and the designed network structure), each unit corresponding to a different position region of the input picture, as shown in FIG. 2: the large image on the left of FIG. 2 is the input picture and the small image on the right is the corresponding feature map; when s = 3 the feature map contains 9 units corresponding to 9 different position regions of the picture, the top-left unit of the feature map corresponding to the top-left position region of the input picture. The output of the penultimate convolutional layer is then connected through the last convolutional layer to the output layer of the whole model, whose output is, for the s × s different position regions of the input picture, the class probability Ĉ_ij of each of k prediction candidate frames, 5 ≤ k ≤ 9, and the coordinate offsets (Δx_ij, Δy_ij, Δw_ij, Δh_ij) relative to the corresponding standard size frames, 1 ≤ i ≤ s², 1 ≤ j ≤ k, where i denotes the i-th position region and j denotes the j-th prediction candidate frame of that region. As shown in FIG. 2, the solid-line frame is a real annotation frame (only one is shown in FIG. 2), the dotted-line frames represent prediction candidate frames, with k = 5 prediction candidate frames per position region, and the dashed-line frames represent standard size frames, with k = 5 standard size frames per position region. The relative sizes of the standard size frames (i.e. the width and height of each standard size frame relative to the whole input picture, denoted (w^a_j, h^a_j) for the j-th standard size frame) are obtained by k-means clustering of the widths and heights, relative to the original pictures (the pictures before standardization), of the real annotation frames of all pictures in F_image; the center of each standard size frame is the center of the picture position region in which it lies. Each standard size frame corresponds to one prediction candidate frame, and the coordinates of the prediction candidate frame can be calculated from the coordinate offsets and the coordinates of the standard size frame.

Here Ĉ_ij denotes the probability that flame exists in the j-th prediction candidate frame of the i-th position region of the l-th picture, with value range 0-1; the closer Ĉ_ij is to 1, the more likely it is that flame exists in that prediction candidate frame. (Δx_ij, Δy_ij) denotes the coordinate offset of the center of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the center of the corresponding standard size frame, and (Δw_ij, Δh_ij) denotes the offset of the width and height of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the width and height of the corresponding standard size frame; all value ranges are 0-1.

In actual operation, the input of the model is one batch of picture data with batch size 8 ≤ batch ≤ 64, and the input size is batch × W × H × Ch.
4-3-2) Divide the s × s × k standard size frames of the input picture in step 4-3-1) into positive and negative samples.

Let the real annotation frame in each picture be G and any standard size frame of the picture be T; the intersection-over-union (IoU) of T and G is:

IoU = area(T ∩ G) / area(T ∪ G)    (2)

where area denotes the region area.

Set a positive/negative sample threshold η, 0.5 ≤ η ≤ 0.7, and judge the s × s × k standard size frames corresponding to each picture: if IoU ≥ η, flame exists in the standard size frame and it is considered a positive sample; if IoU < η, no flame exists in the standard size frame and it is considered a negative sample.
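A short sketch of formula (2) and the positive/negative division, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the default threshold η = 0.6 matches the embodiment below.

```python
def iou(box_t, box_g):
    """Intersection-over-union of a standard size frame T and a real annotation frame G (formula (2))."""
    x1 = max(box_t[0], box_g[0]); y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[2], box_g[2]); y2 = min(box_t[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / (area_t + area_g - inter)

def is_positive_sample(standard_box, gt_box, eta=0.6):
    """A standard size frame is a positive sample if IoU >= eta, otherwise a negative sample."""
    return iou(standard_box, gt_box) >= eta
```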
4-3-3) calculating the error between the model output information and the real annotation information according to the loss function;
According to the class probabilities of the s × s × k prediction candidate frames corresponding to the l-th picture and their coordinate offsets relative to the corresponding standard size frames output by the model obtained in step 4-3-1), calculate the error between the picture prediction information (Ĉ_ij, Δx_ij, Δy_ij, Δw_ij, Δh_ij) and the corresponding real annotation information (C_ij, Δx^g_ij, Δy^g_ij, Δw^g_ij, Δh^g_ij); the loss function of the l-th picture is defined as:

Loss_l = Σ_{i=1..s²} Σ_{j=1..k} [ 1^obj_ij · (C_ij - Ĉ_ij)² + λ_noobj · 1^noobj_ij · (C_ij - Ĉ_ij)² + λ_loc · 1^obj_ij · ((Δx^g_ij - Δx_ij)² + (Δy^g_ij - Δy_ij)² + (Δw^g_ij - Δw_ij)² + (Δh^g_ij - Δh_ij)²) ]    (3)
where C_ij denotes the real class corresponding to the j-th prediction candidate frame of the i-th position region of the l-th picture (the same as the real class of the corresponding standard size frame): if the standard size frame corresponding to the prediction candidate frame is a positive sample, C_ij takes the value 1; if it is a negative sample, C_ij takes the value 0.

(Δx^g_ij, Δy^g_ij) denotes the coordinate offset of the center of the real annotation frame of the l-th picture relative to the center of the j-th standard size frame of the i-th position region, and (Δw^g_ij, Δh^g_ij) denotes the offset of the width and height of the real annotation frame of the l-th picture relative to the width and height of the j-th standard size frame of the i-th position region (these offsets are meaningful only when the standard size frame is a positive sample, i.e. when the coordinate information of the real annotation frame applies). The relative offsets (Δx^g_ij, Δy^g_ij, Δw^g_ij, Δh^g_ij) of the real annotation frame are calculated from the annotation information in the xml files of F_annotation: the annotation information of the flame region of the l-th picture is (x^min_l, y^min_l, x^max_l, y^max_l), and the coordinate information of the j-th standard size frame of the i-th position region relative to the whole input picture is (x^a_i, y^a_i, w^a_j, h^a_j), where (w^a_j, h^a_j) is obtained by k-means clustering and (x^a_i, y^a_i) are the coordinates of the center of the i-th position region relative to the whole input picture. The relative offsets of the real annotation frame of the l-th picture are then calculated as:

Δx^g_ij = (x^min_l + x^max_l) / (2·W^org_l) - x^a_i    (4)
Δy^g_ij = (y^min_l + y^max_l) / (2·H^org_l) - y^a_i    (5)
Δw^g_ij = (x^max_l - x^min_l) / W^org_l - w^a_j    (6)
Δh^g_ij = (y^max_l - y^min_l) / H^org_l - h^a_j    (7)

where W^org_l and H^org_l denote the width and height of the original picture in F_image corresponding to the l-th picture. λ_loc denotes the weight coefficient of the position error loss relative to the class error loss and can generally be set to 0.01 ≤ λ_loc ≤ 0.2. 1^obj_ij is a Boolean value indicating whether the j-th prediction candidate frame of the i-th position region is flame (1 for a positive sample, 0 for a negative sample); 1^noobj_ij is a Boolean value indicating whether the j-th prediction candidate frame of the i-th position region is not flame (0 for a positive sample, 1 for a negative sample). Since the number of negative-sample candidate regions in a whole picture is much larger than the number of positive samples, the loss of the 1^noobj_ij term would be much greater than that of the 1^obj_ij term; the weight coefficient λ_noobj is therefore introduced to adjust the class loss ratio of positive and negative samples and can be set to 0.01 ≤ λ_noobj ≤ 0.2.
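The loss of one picture can be written down directly from the indicator and weight definitions above. The sketch below assumes squared-error terms for both the class and the position errors, in the style of YOLO-type detectors; it is an illustrative instance of the structure described here, and the patent's exact formula (3) may differ in detail.

```python
import torch

def picture_loss(pred, gt_offsets, positive_mask, lambda_loc=0.2, lambda_noobj=0.2):
    """
    Loss of one picture, following the structure described for formula (3).
    pred:          (s, s, k, 5) tensor of [class prob, dx, dy, dw, dh] from the model
    gt_offsets:    (s, s, k, 4) tensor of ground-truth offsets (dx_g, dy_g, dw_g, dh_g)
    positive_mask: (s, s, k) boolean tensor, True where the standard size frame is a positive sample
    """
    c_hat = pred[..., 0]                        # predicted class probability
    c_true = positive_mask.float()              # 1 for positive samples, 0 for negative samples
    class_err = (c_true - c_hat) ** 2
    obj_loss = class_err[positive_mask].sum()                   # class loss of positive samples
    noobj_loss = lambda_noobj * class_err[~positive_mask].sum()  # down-weighted negative-sample class loss
    loc_err = ((gt_offsets - pred[..., 1:]) ** 2).sum(dim=-1)
    loc_loss = lambda_loc * loc_err[positive_mask].sum()         # position loss, positive samples only
    return obj_loss + noobj_loss + loc_loss
```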
4-3-4) Repeat steps 4-3-1) to 4-3-3), inputting each picture of the training set F_train into the model established in step 4-2) in turn, iteratively training the whole neural network model and updating it through the error back-propagation algorithm to obtain the current model, recorded as Model_old;

4-3-5) Test Model_old with the test set F_test: input each picture of the test set F_test into Model_old in turn and calculate, according to formula (3), the total loss Loss_old of all pictures in F_test under Model_old;

4-3-6) Repeat step 4-3-4): input each picture of the training set F_train into Model_old in turn to obtain a new current model, recorded as Model_new;

4-3-7) Repeat step 4-3-5): test Model_new with the test set F_test to obtain the total loss Loss_new of all pictures in F_test under Model_new;

4-3-8) Judgment of the training stop condition:

If Loss_new ≤ Loss_old, the test error is still decreasing and Model_new performs better on the test set F_test than Model_old; continue training the model, update the current Model_new to the new Model_old, update the current Loss_new to the new Loss_old, and return to step 4-3-6);

If Loss_new > Loss_old, the test error has begun to rise and Model_old performs better on the test set F_test than Model_new; stop training the model, output the current Model_old as the finally trained flame detection model, and go to step 5).
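Steps 4-3-4) to 4-3-8) amount to early stopping on the test-set loss. A minimal sketch of that control flow, where train_one_epoch() and evaluate_loss() are assumed helper callables standing in for one back-propagation pass over F_train (steps 4-3-4)/4-3-6)) and the total-loss evaluation on F_test by formula (3) (steps 4-3-5)/4-3-7)):

```python
def train_with_early_stopping(train_one_epoch, evaluate_loss):
    """Early stopping on the test loss, following steps 4-3-4) to 4-3-8)."""
    model_old = train_one_epoch()            # Model_old after one pass over F_train
    loss_old = evaluate_loss(model_old)      # Loss_old on F_test
    while True:
        model_new = train_one_epoch()        # continue training -> Model_new
        loss_new = evaluate_loss(model_new)  # Loss_new on F_test
        if loss_new <= loss_old:             # test error still decreasing: keep training
            model_old, loss_old = model_new, loss_new
        else:                                # test error begins to rise: stop
            return model_old                 # the finally trained flame detection model
```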
5) Carrying out flame detection on the real-time video by using the flame detection model obtained in the step 4);
5-1) Shoot a real-time video and input each frame picture of the real-time video into the flame detection model trained in step 4) for detection; the model outputs the class probabilities Ĉ^o_ij of the s × s × k prediction candidate frames corresponding to the frame picture and their coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) relative to the corresponding standard size frames, where the superscript o denotes the flame detection model.

5-2) Set a flame judgment threshold with value range 0-1 and judge the class probability of each prediction candidate frame output for each frame picture: if Ĉ^o_ij is smaller than the threshold, no flame exists in that prediction candidate frame; if Ĉ^o_ij is greater than or equal to the threshold, flame exists in that prediction candidate frame, and the vertex coordinates of the top-left and bottom-right corners of the actually predicted flame region are calculated from the coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) of the prediction candidate frame relative to the corresponding standard size frame and the corresponding standard size frame coordinates (x^a_i, y^a_i, w^a_j, h^a_j) as:
x^pred_min = (x^a_i + Δx^o_ij - (w^a_j + Δw^o_ij)/2) · W^org    (8)
y^pred_min = (y^a_i + Δy^o_ij - (h^a_j + Δh^o_ij)/2) · H^org    (9)
x^pred_max = (x^a_i + Δx^o_ij + (w^a_j + Δw^o_ij)/2) · W^org    (10)
y^pred_max = (y^a_i + Δy^o_ij + (h^a_j + Δh^o_ij)/2) · H^org    (11)

where W^org and H^org denote the original width and height of the frame picture, respectively.
5-3) If no flame exists in any prediction candidate frame of the frame, return to step 5-1) and carry out flame detection on the next frame. If flame exists in any prediction candidate frame, the model outputs the vertex coordinates, calculated in step 5-2), of the top-left and bottom-right corners of the actually predicted flame region in the picture, marks it with a rectangular frame, and then returns to step 5-1) to continue flame detection on the next frame picture; the marked picture and a fire alarm signal can then be sent to the control center.
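For the real-time loop of steps 5-1) to 5-3), the following sketch assumes OpenCV for video capture and drawing, a callable model that maps one normalized W × H picture to an (s, s, k, 5) array of class probabilities and offsets (e.g. a thin wrapper around the network sketched earlier), and anchors holding the k clustered (width, height) pairs of the standard size frames; the box decoding mirrors the offset convention of step 5-2), and the camera index and threshold are illustrative.

```python
import cv2
import numpy as np

def detect_stream(model, anchors, threshold=0.5, W=416, H=416):
    """Run the trained flame detection model on a live video, frame by frame (steps 5-1) to 5-3))."""
    cap = cv2.VideoCapture(0)                       # real-time video source (camera index 0 assumed)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h_org, w_org = frame.shape[:2]
        inp = cv2.resize(frame, (W, H)).astype(np.float32) / 255.0
        pred = model(inp)                           # (s, s, k, 5): class prob + 4 offsets per candidate
        s = pred.shape[0]
        for i_y in range(s):
            for i_x in range(s):
                for j, (w_a, h_a) in enumerate(anchors):
                    c, dx, dy, dw, dh = pred[i_y, i_x, j]
                    if c < threshold:               # no flame in this prediction candidate frame
                        continue
                    # decode the box from the standard size frame center and the predicted offsets
                    x_c = (i_x + 0.5) / s + dx
                    y_c = (i_y + 0.5) / s + dy
                    w_b, h_b = w_a + dw, h_a + dh
                    x1 = int((x_c - w_b / 2) * w_org); y1 = int((y_c - h_b / 2) * h_org)
                    x2 = int((x_c + w_b / 2) * w_org); y2 = int((y_c + h_b / 2) * h_org)
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)  # mark the flame region
        cv2.imshow("flame detection", frame)
        if cv2.waitKey(1) == 27:                    # ESC to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```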
The above method can be implemented by programming by those skilled in the art.
One embodiment of the present invention is as follows:
the flame detection method based on image target detection provided by the embodiment comprises the following steps:
1) Collect N (N = 5075) color pictures containing flame and establish the flame picture data set F_image. The collected pictures are all gathered from the Internet, and all of them contain clearly visible flames.
2) For the flame picture data set F_image of step 1), manually mark the flame region position coordinates of each picture with a rectangular frame, as shown in FIG. 3 (a) and (b), enclosing the flame region with the rectangular frame and obtaining the real annotation frame of the flame region and its 4 coordinate values in each picture: (x^min_l, y^min_l, x^max_l, y^max_l), l = 1, 2, ..., N, where (x^min_l, y^min_l) are the position coordinates of the top-left vertex of the real annotation frame of the l-th picture and (x^max_l, y^max_l) are the position coordinates of the bottom-right vertex of the real annotation frame of the l-th picture. The annotation information is stored as xml files in the format of the VOC2007 data set, each picture corresponding to one xml file; the xml files corresponding to all flame pictures form the flame annotation data set F_annotation. F_image and F_annotation form the complete flame detection data set F_0.
3) Randomly partition the flame detection data set F_0 obtained in step 2) into a training set F_train and a test set F_test (since the pictures in F_image and the xml files in F_annotation correspond one to one, the pictures and xml files obtained by the partition also correspond one to one), where the training set F_train occupies 80% of the flame detection data set F_0: F_train contains 4060 pictures and the corresponding annotation data, and F_test contains 1015 pictures and the corresponding annotation data. The training set F_train is used to train the parameters of the model, and the test set F_test is used to test the generalization performance of the model.
4) Constructing a deep convolution neural network model and training the model to obtain a trained flame detection model; the method comprises the following specific steps:
4-1) carrying out standardization and normalization processing on each picture of the training set and the test set;
In this embodiment, the standard width and height are set to W = 416 and H = 416, and the number of channels of a color picture is Ch = 3. The pixel data of the scaled pictures are normalized according to formula (1) so that the pixel values lie between 0 and 1. The input of the model is picture data of one batch with batch = 16, and the input size is batch × W × H × Ch.
4-2) Construct a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer: the normalized picture is fed in through the input layer, which is connected to the feature extraction part; the feature extraction part consists of 22 convolutional layers and 5 pooling layers, each pooling layer preceded by 1-5 convolutional layers, with a batch normalization layer added after every convolutional layer to normalize each batch of input data in the output feature map of that convolutional layer; finally, the feature extraction part is connected to the output layer, which outputs the prediction result.
4-3) training the model established in the step 4-2) to obtain a trained flame detection model; the method comprises the following specific steps:
4-3-1) Randomly select a picture from the training set standardized and normalized in step 4-1) and input it into the model established in step 4-2). Let the input be the l-th picture; the penultimate convolutional layer of the deep convolutional neural network outputs the feature map corresponding to this picture, of size s × s units (s = 13), each unit corresponding to a different position region of the input picture. The output of the penultimate convolutional layer is connected through the last convolutional layer to the output layer of the whole model, whose output is, for the s × s different position regions of the input picture, the class probability Ĉ_ij of each of k = 5 prediction candidate frames and the coordinate offsets (Δx_ij, Δy_ij, Δw_ij, Δh_ij) relative to the corresponding standard size frames, 1 ≤ i ≤ s², 1 ≤ j ≤ k, where i denotes the i-th position region and j denotes the j-th prediction candidate frame of that region. K-means clustering of the widths and heights, relative to the original pictures, of the real annotation frames of all pictures in F_image gives the relative widths and heights (w^a_j, h^a_j) of the 5 standard size frames as: (0.238, 0.292), (0.754, 0.819), (0.325, 0.572), (0.094, 0.117), (0.597, 0.407).
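The five relative widths and heights above come from k-means clustering of the annotation-frame sizes. A sketch of such clustering, assuming scikit-learn's KMeans with its default Euclidean distance (the patent does not specify the distance measure or implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchor_sizes(boxes, image_sizes, k=5):
    """
    boxes:       list of (x_min, y_min, x_max, y_max) real annotation frames
    image_sizes: list of (W_org, H_org) original picture sizes, one per box
    Returns k (relative width, relative height) pairs for the standard size frames.
    """
    rel = np.array([[(x2 - x1) / w, (y2 - y1) / h]
                    for (x1, y1, x2, y2), (w, h) in zip(boxes, image_sizes)])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(rel)
    return km.cluster_centers_   # e.g. roughly the 5 pairs listed above
```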
4-3-2) dividing the s × s × k standard size frames of the input picture in the step 4-3-1) into positive and negative samples, setting the intersection ratio threshold η of the positive and negative samples to be 0.6, and judging the s × s × k standard size frames corresponding to each picture, wherein if IoU is not less than η, flames exist in the standard size frames and are considered as positive samples, and if IoU is less than η, flames do not exist in the standard size frames and are considered as negative samples.
4-3-3) calculating the error between the model output information and the real annotation information according to the loss function;
According to the class probabilities of the s × s × k prediction candidate frames corresponding to the l-th picture and their coordinate offsets relative to the corresponding standard size frames output by the model obtained in step 4-3-1), calculate the error between the picture prediction information (Ĉ_ij, Δx_ij, Δy_ij, Δw_ij, Δh_ij) and the corresponding real annotation information (C_ij, Δx^g_ij, Δy^g_ij, Δw^g_ij, Δh^g_ij); the loss function of the l-th picture is defined in formula (3).

Here the weight coefficient of the position error loss relative to the class error loss is set to λ_loc = 0.2, and the weight coefficient for adjusting the class loss ratio of positive and negative samples is set to λ_noobj = 0.2.
4-3-4) Repeat steps 4-3-1) to 4-3-3), inputting each picture of the training set F_train into the model established in step 4-2) in turn, iteratively training the whole neural network model and updating it through the error back-propagation algorithm to obtain the current model, recorded as Model_old;

4-3-5) Test Model_old with the test set F_test: input each picture of the test set F_test into Model_old in turn and calculate, according to formula (3), the total loss Loss_old of all pictures in F_test under Model_old;

4-3-6) Repeat step 4-3-4): input each picture of the training set F_train into Model_old in turn to obtain a new current model, recorded as Model_new;

4-3-7) Repeat step 4-3-5): test Model_new with the test set F_test to obtain the total loss Loss_new of all pictures in F_test under Model_new;

4-3-8) Judgment of the training stop condition:

If Loss_new ≤ Loss_old, the test error is still decreasing and Model_new performs better on the test set F_test than Model_old; continue training the model, update the current Model_new to the new Model_old, update the current Loss_new to the new Loss_old, and return to step 4-3-6);

If Loss_new > Loss_old, the test error has begun to rise and Model_old performs better on the test set F_test than Model_new; stop training the model, output the current Model_old as the finally trained flame detection model, and go to step 5).
5) Carrying out flame detection on the real-time video by using the flame detection model obtained in the step 4);
5-1) Shoot a real-time video and input each frame picture of the real-time video into the flame detection model trained in step 4) for detection; the model outputs the class probabilities Ĉ^o_ij of the s × s × k prediction candidate frames corresponding to the frame picture and their coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) relative to the corresponding standard size frames, where the superscript o denotes the flame detection model.

5-2) Set a flame judgment threshold with value range 0-1 and judge the class probability of each prediction candidate frame output for each frame picture: if Ĉ^o_ij is smaller than the threshold, no flame exists in that prediction candidate frame; if Ĉ^o_ij is greater than or equal to the threshold, flame exists in that prediction candidate frame, and the vertex coordinates of the top-left and bottom-right corners of the actually predicted flame region are calculated according to formulas (8) to (11) from the coordinate offsets (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) of the prediction candidate frame relative to the corresponding standard size frame and the corresponding standard size frame coordinates (x^a_i, y^a_i, w^a_j, h^a_j).
5-3) If no flame exists in any prediction candidate frame of the frame, return to step 5-1) and carry out flame detection on the next frame. If flame exists in any prediction candidate frame, the model outputs the vertex coordinates, calculated in step 5-2), of the top-left and bottom-right corners of the actually predicted flame region in the picture and marks it with a rectangular frame, as shown in FIG. 4 (a) to (e); it then returns to step 5-1) to continue flame detection on the next frame picture, and the marked picture and a fire alarm signal can be sent to the control center.

Claims (1)

1. A flame detection method based on image target detection is characterized by comprising the following steps:
1) collecting N color pictures containing flame to establish a flame picture data set F_image, N ≥ 5000;
2) for the flame picture data set F_image established in step 1), manually marking the flame region position coordinates of each picture with a rectangular frame to obtain the real annotation frame of the flame region and its 4 coordinate values in each picture: (x^min_l, y^min_l, x^max_l, y^max_l), l = 1, 2, ..., N, where (x^min_l, y^min_l) are the position coordinates of the top-left vertex of the real annotation frame of the l-th picture and (x^max_l, y^max_l) are the position coordinates of the bottom-right vertex of the real annotation frame of the l-th picture; storing the real annotation frame information of each picture as an xml file, the xml files corresponding to all flame pictures forming a flame annotation data set F_annotation; F_image and F_annotation forming the flame detection data set F_0;
3) randomly partitioning the flame detection data set F_0 obtained in step 2) into a training set F_train and a test set F_test, where the proportion a of the flame detection data set F_0 occupied by the training set F_train satisfies 0.6 ≤ a ≤ 0.9;
4) constructing a deep convolution neural network model and training the model to obtain a trained flame detection model; the method comprises the following specific steps:
4-1) carrying out standardization and normalization processing on each picture of the training set and the test set;
standardizing each picture of the training set and the test set established in step 3) to a standard width W and height H, where 400 ≤ W ≤ 800 and 400 ≤ H ≤ 800, with the number of channels Ch = 3; letting I(t, x, y), 0 ≤ I(t, x, y) ≤ 255, be the pixel value of channel t at coordinate (x, y) of any standardized picture, and calculating the normalized pixel value I'(t, x, y), 0 ≤ I'(t, x, y) ≤ 1, by formula (1):

I'(t, x, y) = I(t, x, y) / 255    (1)
4-2) constructing a deep convolutional neural network model comprising an input layer, a feature extraction part and an output layer connected in sequence; the feature extraction part is a combination of convolutional layers and pooling layers and contains 3-6 pooling layers, each pooling layer preceded by 1-5 convolutional layers, with a batch normalization layer added after every convolutional layer;
4-3) training the model established in the step 4-2) to obtain a trained flame detection model; the method comprises the following specific steps:
4-3-1) sequentially inputting each picture of the training set standardized and normalized in step 4-1) into the model established in step 4-2); letting the input be the l-th picture, the penultimate convolutional layer of the deep convolutional neural network outputs the feature map corresponding to this picture, of size s × s units, each unit corresponding to a different position region of the input picture; the output of the penultimate convolutional layer is then connected through the last convolutional layer to the output layer of the whole model, whose output is, for the s × s different position regions of the input picture, the class probability Ĉ_ij of each of k prediction candidate frames and the coordinate offsets (Δx_ij, Δy_ij, Δw_ij, Δh_ij) relative to the corresponding standard size frames, 1 ≤ i ≤ s², 1 ≤ j ≤ k, where i denotes the i-th position region and j denotes the j-th prediction candidate frame of that region; the widths and heights of the standard size frames relative to the whole input picture are obtained by k-means clustering of the widths and heights, relative to the original pictures, of the real annotation frames of all pictures in F_image; (w^a_j, h^a_j) denotes the width and height of the j-th standard size frame relative to the whole input picture; the center of each standard size frame is the center of the picture position region in which it lies; each standard size frame corresponds to one prediction candidate frame;

wherein Ĉ_ij denotes the probability that flame exists in the j-th prediction candidate frame of the i-th position region of the l-th picture, with value range 0-1; (Δx_ij, Δy_ij) denotes the coordinate offset of the center of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the center of the corresponding standard size frame, with value range 0-1; (Δw_ij, Δh_ij) denotes the offset of the width and height of the j-th prediction candidate frame of the i-th position region of the l-th picture relative to the width and height of the corresponding standard size frame, with value range 0-1;
4-3-2) dividing the s × s × k standard size frames of the input picture in step 4-3-1) into positive and negative samples;
denoting the real labeled frame in each picture as G and any standard size frame of the picture as T, the intersection-over-union IoU of T and G is calculated as follows:

IoU = area(T ∩ G) / area(T ∪ G)      (2)

wherein area(·) denotes the area of a region;
setting a positive/negative sample threshold η with 0.5 ≤ η ≤ 0.7, and judging the s × s × k standard size frames corresponding to each picture: if IoU ≥ η, flame exists in the standard size frame and the frame is a positive sample; if IoU < η, no flame exists in the standard size frame and the frame is a negative sample;
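A minimal sketch of the IoU computation and the positive/negative split, with boxes written as (x1, y1, x2, y2) corner coordinates and a hypothetical threshold η = 0.6 inside the claimed 0.5-0.7 range:

def iou(box_t, box_g):
    # intersection over union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(box_t[0], box_g[0]), max(box_t[1], box_g[1])
    x2, y2 = min(box_t[2], box_g[2]), min(box_t[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / (area_t + area_g - inter)

def split_samples(standard_boxes, gt_box, eta=0.6):
    # True marks a positive sample (IoU >= eta), False a negative sample
    return [iou(t, gt_box) >= eta for t in standard_boxes]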
4-3-3) calculating the error between the model output information and the real annotation information according to the loss function;
according to the class probabilities of the s × s × k prediction candidate frames corresponding to the l-th picture and their coordinate offsets relative to the corresponding standard size frames output by the model in step 4-3-1), calculating the error between the prediction information (p^l_ij, Δx^l_ij, Δy^l_ij, Δw^l_ij, Δh^l_ij) of the picture and the corresponding real label information (p̂^l_ij, Δx̂^l_ij, Δŷ^l_ij, Δŵ^l_ij, Δĥ^l_ij); the loss function for the l-th picture is defined as follows (an illustrative code sketch is given after the symbol definitions below):

Loss_l = Σ_{i=1..s×s} Σ_{j=1..k} [ 1^obj_ij · (p^l_ij − p̂^l_ij)² + λ_noobj · 1^noobj_ij · (p^l_ij − p̂^l_ij)² + λ_loc · 1^obj_ij · ((Δx^l_ij − Δx̂^l_ij)² + (Δy^l_ij − Δŷ^l_ij)² + (Δw^l_ij − Δŵ^l_ij)² + (Δh^l_ij − Δĥ^l_ij)²) ]      (3)
wherein p̂^l_ij represents the real category corresponding to the j-th prediction candidate frame of the i-th position area of the l-th picture: if the standard size frame corresponding to the prediction candidate frame is a positive sample, p̂^l_ij takes the value 1; if the standard size frame corresponding to the prediction candidate frame is a negative sample, p̂^l_ij takes the value 0; (Δx̂^l_ij, Δŷ^l_ij) represents the coordinate offset of the center of the real labeled frame of the l-th picture relative to the center of the j-th standard size frame of the i-th position area, and (Δŵ^l_ij, Δĥ^l_ij) represents the offset of the width and height of the real labeled frame of the l-th picture relative to the width and height of the j-th standard size frame of the i-th position area; the relative offsets of the real labeled frame of the l-th picture are calculated as follows:
Δx̂^l_ij = x̂^l_G / W^l − x_i      Δŷ^l_ij = ŷ^l_G / H^l − y_i

Δŵ^l_ij = ŵ^l_G / W^l − w_j      Δĥ^l_ij = ĥ^l_G / H^l − h_j

wherein (x̂^l_G, ŷ^l_G) and (ŵ^l_G, ĥ^l_G) are the center coordinates and the width and height of the real labeled frame of the l-th picture in its original picture, and W^l and H^l respectively represent the original width and height of the picture in F_image corresponding to the l-th picture;
(x_i, y_i, w_j, h_j) is the coordinate information of the j-th standard size frame of the i-th position area of the l-th picture relative to the whole input picture, wherein (x_i, y_i) is the coordinate of the center of the i-th position area relative to the whole input picture and (w_j, h_j) is obtained by the k-means clustering; λ_loc is the weight coefficient of the position error loss relative to the class error loss, with 0.01 ≤ λ_loc ≤ 0.2;
1^obj_ij is a Boolean value indicating that the j-th prediction candidate frame of the i-th position area is flame: if the standard size frame corresponding to the prediction candidate frame is a positive sample, 1^obj_ij takes the value 1; if the standard size frame corresponding to the prediction candidate frame is a negative sample, 1^obj_ij takes the value 0; 1^noobj_ij is a Boolean value indicating that the j-th prediction candidate frame of the i-th position area is not flame: if the standard size frame corresponding to the prediction candidate frame is a positive sample, 1^noobj_ij takes the value 0; if the standard size frame corresponding to the prediction candidate frame is a negative sample, 1^noobj_ij takes the value 1; λ_noobj is the weight coefficient for adjusting the ratio of the class losses of the positive and negative samples, with 0.01 ≤ λ_noobj ≤ 0.2;
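The following is one possible reading of loss (3), written as a Python/PyTorch sketch under the assumption of squared-error terms; the tensor layout and function name are illustrative, not prescribed by the patent:

def picture_loss(pred, target, obj_mask, lambda_loc=0.1, lambda_noobj=0.1):
    # pred, target: PyTorch tensors of shape (s*s, k, 5) holding (p, dx, dy, dw, dh)
    # for the prediction and the real label of every candidate frame;
    # obj_mask: boolean tensor of shape (s*s, k), True where the standard size frame is a positive sample
    obj = obj_mask.float()
    noobj = 1.0 - obj
    cls_err = (pred[..., 0] - target[..., 0]) ** 2                  # class probability error
    loc_err = ((pred[..., 1:] - target[..., 1:]) ** 2).sum(dim=-1)  # position error
    return (obj * cls_err + lambda_noobj * noobj * cls_err + lambda_loc * obj * loc_err).sum()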
4-3-4) repeating steps 4-3-1) to 4-3-3), inputting each picture of the training set F_train into the model established in step 4-2), and updating the model through the error back propagation algorithm to obtain the current model, recorded as Model_old;
4-3-5) using the test set F_test to test Model_old: inputting each picture of the test set F_test into Model_old in sequence, and calculating according to formula (3) the total loss Loss_old of all pictures in F_test under Model_old;
4-3-6) repeating step 4-3-4): inputting each picture of the training set F_train into Model_old in sequence to obtain a new current model, recorded as Model_new;
4-3-7) repeating step 4-3-5): using the test set F_test to test Model_new, obtaining the total loss Loss_new of all pictures in F_test under Model_new;
4-3-8) judging the training stop condition:
if Loss_new ≤ Loss_old, continuing to train the model: updating the current Model_new to the new Model_old, updating the current Loss_new to the new Loss_old, and returning to step 4-3-6);
if Loss_new > Loss_old, stopping training and outputting the current Model_old as the finally trained flame detection model, then proceeding to step 5);
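A minimal sketch of the training and early-stopping loop of steps 4-3-4) to 4-3-8), assuming PyTorch, an existing model and optimizer, data loaders train_loader/test_loader yielding (picture, target, obj_mask) batches, and the picture_loss sketch above (all of these names are illustrative assumptions):

import copy
import torch

def train_until_test_loss_rises(model, optimizer, train_loader, test_loader):
    def total_test_loss(m):
        # total loss over all pictures of F_test under the given model
        m.eval()
        with torch.no_grad():
            return sum(picture_loss(m(x), t, mask).item() for x, t, mask in test_loader)

    loss_old, model_old = float("inf"), copy.deepcopy(model)
    while True:
        model.train()
        for x, t, mask in train_loader:                 # one pass over F_train
            optimizer.zero_grad()
            picture_loss(model(x), t, mask).backward()  # error back propagation
            optimizer.step()
        loss_new = total_test_loss(model)
        if loss_new > loss_old:                         # test loss no longer decreases: stop
            return model_old                            # the previous model is the final flame detection model
        loss_old, model_old = loss_new, copy.deepcopy(model)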
5) carrying out flame detection on the real-time video by using the flame detection model obtained in the step 4);
5-1) shooting a real-time video and inputting each frame picture of the real-time video into the flame detection model trained in step 4) for detection; the model outputs the class probabilities of the s × s × k prediction candidate frames corresponding to the frame picture and their coordinate offsets relative to the corresponding standard size frames, (p^o_ij, Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij), where the superscript o represents the flame detection model;
5-2) setting a flame judgment threshold p_thr with value range 0-1 and judging the class probability of every prediction candidate frame output for the frame picture: if p^o_ij < p_thr, no flame exists in the prediction candidate frame; if p^o_ij ≥ p_thr, flame exists in the prediction candidate frame, and from the coordinate offset (Δx^o_ij, Δy^o_ij, Δw^o_ij, Δh^o_ij) of the prediction candidate frame relative to the corresponding standard size frame and the corresponding standard size frame coordinates (x_i, y_i, w_j, h_j),
the vertex coordinates of the upper left corner and the lower right corner of the actually predicted flame area are calculated as:

x^o_1 = (x_i + Δx^o_ij − (w_j + Δw^o_ij) / 2) · W^o      y^o_1 = (y_i + Δy^o_ij − (h_j + Δh^o_ij) / 2) · H^o
x^o_2 = (x_i + Δx^o_ij + (w_j + Δw^o_ij) / 2) · W^o      y^o_2 = (y_i + Δy^o_ij + (h_j + Δh^o_ij) / 2) · H^o

wherein W^o and H^o respectively represent the original width and height of the frame picture;
5-3) if no flame exists in any prediction candidate frame of the frame, returning to step 5-1) and carrying out flame detection on the next frame; if flame exists in any prediction candidate frame, the model outputs the vertex coordinates of the upper left corner and the lower right corner of the actually predicted flame area in the picture calculated in step 5-2), marks the area with a rectangular frame, and then returns to step 5-1) to continue flame detection on the next frame picture.
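A minimal sketch of the per-frame detection of step 5), assuming OpenCV for video capture and drawing, a callable flame_model returning an array of shape (s*s, k, 5) with (p, Δx, Δy, Δw, Δh) per candidate frame, and the box decoding written in the reconstructed formulas of step 5-2) (area center plus predicted center offset, standard frame size plus predicted size offset); all names and the decoding itself are illustrative assumptions:

import cv2
import numpy as np

def detect_flames(flame_model, video_path, areas, std_wh, p_thr=0.5, W=416, H=416):
    # areas: (s*s, 2) centers (x_i, y_i) of the position areas, relative to the input picture
    # std_wh: (k, 2) standard size frame widths/heights (w_j, h_j); p_thr: flame judgment threshold
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        h0, w0 = frame.shape[:2]                                   # original height and width of the frame picture
        inp = cv2.resize(frame, (W, H)).astype(np.float32) / 255.0
        out = flame_model(inp)                                     # (s*s, k, 5) predictions
        for i, (xi, yi) in enumerate(areas):
            for j, (wj, hj) in enumerate(std_wh):
                p, dx, dy, dw, dh = out[i, j]
                if p < p_thr:                                      # no flame in this prediction candidate frame
                    continue
                cx, cy, bw, bh = xi + dx, yi + dy, wj + dw, hj + dh
                x1, y1 = int((cx - bw / 2) * w0), int((cy - bh / 2) * h0)
                x2, y2 = int((cx + bw / 2) * w0), int((cy + bh / 2) * h0)
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)  # mark the flame area
        cv2.imshow("flame detection", frame)
        cv2.waitKey(1)
    capture.release()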
CN201810243014.4A 2018-03-23 2018-03-23 Flame detection method based on image target detection Active CN108537215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810243014.4A CN108537215B (en) 2018-03-23 2018-03-23 Flame detection method based on image target detection


Publications (2)

Publication Number Publication Date
CN108537215A CN108537215A (en) 2018-09-14
CN108537215B true CN108537215B (en) 2020-02-21

Family

ID=63483653


Country Status (1)

Country Link
CN (1) CN108537215B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360369B (en) * 2018-09-19 2021-09-28 王杰 Method and device for analyzing fire hazard based on clustering result
CN109522819B (en) * 2018-10-29 2020-08-18 西安交通大学 Fire image identification method based on deep learning
CN109568123B (en) * 2018-11-02 2021-02-02 广东数相智能科技有限公司 Acupuncture point positioning method based on YOLO target detection
CN109614990A (en) * 2018-11-20 2019-04-12 成都通甲优博科技有限责任公司 A kind of object detecting device
CN111325047A (en) * 2018-12-13 2020-06-23 博西华电器(江苏)有限公司 Cooking safety auxiliary method and system, kitchen appliance and combination thereof
CN109684982B (en) * 2018-12-19 2020-11-20 深圳前海中创联科投资发展有限公司 Flame detection method based on video analysis and combined with miscible target elimination
CN109829429B (en) * 2019-01-31 2022-08-09 福州大学 Security sensitive article detection method based on YOLOv3 under monitoring scene
CN110119682A (en) * 2019-04-04 2019-08-13 北京理工雷科电子信息技术有限公司 A kind of infrared remote sensing Image Fire point recognition methods
CN110059753A (en) * 2019-04-19 2019-07-26 北京朗镜科技有限责任公司 Model training method, interlayer are every recognition methods, device, equipment and medium
CN110287982A (en) * 2019-05-08 2019-09-27 中国科学技术大学 A kind of CT images classification method, device and medium based on convolutional neural networks
US10885625B2 (en) 2019-05-10 2021-01-05 Advanced New Technologies Co., Ltd. Recognizing damage through image analysis
CN110569703B (en) * 2019-05-10 2020-09-01 阿里巴巴集团控股有限公司 Computer-implemented method and device for identifying damage from picture
CN110348390B (en) * 2019-07-12 2023-05-16 创新奇智(重庆)科技有限公司 Training method, computer readable medium and system for flame detection model
CN110728186B (en) * 2019-09-11 2023-04-07 中国科学院声学研究所南海研究站 Fire detection method based on multi-network fusion
CN112560834B (en) * 2019-09-26 2024-05-10 武汉金山办公软件有限公司 Coordinate prediction model generation method and device and pattern recognition method and device
CN110826439A (en) * 2019-10-25 2020-02-21 杭州叙简科技股份有限公司 Electric welding construction detection method based on deep learning image processing
CN111127433B (en) * 2019-12-24 2020-09-25 深圳集智数字科技有限公司 Method and device for detecting flame
CN111178275A (en) * 2019-12-30 2020-05-19 浙江中创天成科技有限公司 Fire detection method based on convolutional neural network
CN111310662B (en) * 2020-02-17 2021-08-31 淮阴工学院 Flame detection and identification method and system based on integrated deep network
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN112633061B (en) * 2020-11-18 2023-03-24 淮阴工学院 Lightweight FIRE-DET flame detection method and system
CN113033553B (en) * 2021-03-22 2023-05-12 深圳市安软科技股份有限公司 Multi-mode fusion fire detection method, device, related equipment and storage medium
CN113052226A (en) * 2021-03-22 2021-06-29 淮阴工学院 Time-sequence fire identification method and system based on single-step detector
CN113012383B (en) * 2021-03-26 2022-12-30 深圳市安软科技股份有限公司 Fire detection alarm method, related system, related equipment and storage medium
CN113221768A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Recognition model training method, recognition method, device, equipment and storage medium
CN113657250A (en) * 2021-08-16 2021-11-16 南京图菱视频科技有限公司 Flame detection method and system based on monitoring video
CN114841920A (en) * 2022-03-29 2022-08-02 清华大学 Flame identification method and device based on image processing and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208018A (en) * 2011-06-01 2011-10-05 西安工程大学 Method for recognizing fire disaster of power transmission line based on video variance analysis
CN105678332A (en) * 2016-01-08 2016-06-15 昆明理工大学 Converter steel-making endpoint determination method and system based on flame image CNN recognizing and modeling process
CN106250845A (en) * 2016-07-28 2016-12-21 北京智芯原动科技有限公司 Flame detecting method based on convolutional neural networks and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116746B (en) * 2013-03-08 2016-08-03 中国科学技术大学 A kind of video flame detection method based on multiple features fusion technology
CN104408469A (en) * 2014-11-28 2015-03-11 武汉大学 Firework identification method and firework identification system based on deep learning of image
CN106934404A (en) * 2017-03-10 2017-07-07 深圳市瀚晖威视科技有限公司 A kind of image flame identifying system based on CNN convolutional neural networks
CN107025443A (en) * 2017-04-06 2017-08-08 江南大学 Stockyard smoke monitoring and on-time model update method based on depth convolutional neural networks

Similar Documents

Publication Publication Date Title
CN108537215B (en) Flame detection method based on image target detection
CN110660052B (en) Hot-rolled strip steel surface defect detection method based on deep learning
CN111126136B (en) Smoke concentration quantification method based on image recognition
CN110717481B (en) Method for realizing face detection by using cascaded convolutional neural network
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN112288758B (en) Infrared and visible light image registration method for power equipment
CN112541532B (en) Target detection method based on dense connection structure
CN112861635A (en) Fire and smoke real-time detection method based on deep learning
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN116448019B (en) Intelligent detection device and method for quality flatness of building energy-saving engineering
CN107392251A (en) A kind of method that target detection network performance is lifted using category images
WO2023160666A1 (en) Target detection method and apparatus, and target detection model training method and apparatus
CN115147380A (en) Small transparent plastic product defect detection method based on YOLOv5
CN114841920A (en) Flame identification method and device based on image processing and electronic equipment
CN115294109A (en) Real wood board production defect identification system based on artificial intelligence, and electronic equipment
CN111368625B (en) Pedestrian target detection method based on cascade optimization
CN111178275A (en) Fire detection method based on convolutional neural network
CN106960188A (en) Weather image sorting technique and device
CN113450321B (en) Single-stage target detection method based on edge detection
CN113344005B (en) Image edge detection method based on optimized small-scale features
CN114998222A (en) Automobile differential shell surface detection method, electronic equipment and medium
CN114694090A (en) Campus abnormal behavior detection method based on improved PBAS algorithm and YOLOv5
CN114548376A (en) Intelligent transportation system-oriented vehicle rapid detection network and method
CN113780462A (en) Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant