CN110751089A

CN110751089A - Flame target detection method based on digital image and convolution characteristic

Info

Publication number: CN110751089A
Application number: CN201910992933.6A
Authority: CN
Inventors: 赵亚琴; 卢鹏; 丁志鹏
Original assignee: Nanjing Forestry University
Current assignee: Nanjing Forestry University
Priority date: 2019-10-18
Filing date: 2019-10-18
Publication date: 2020-02-04

Abstract

Because the flame detection model based on the image characteristics has low generalization and the deep neural network model has higher requirement on the number of training samples, the patent provides a flame target detection method based on digital images and convolution characteristics, and firstly, a data set containing video dynamic characteristics is made; then replacing the standard convolution of VGG16 in the classic Faster R-CNN with a depth separable convolution, and reducing the number of convolution layers; then cutting out 256 image blocks from an original image according to a candidate frame generated by RPN, and extracting LBP (local binary pattern) features for each image block; the size of an output characteristic diagram of the ROI pooling layer and the number of neurons of a full connection layer are reduced through convolution, and network parameters are further reduced; and finally, combining the extracted LBP characteristics, the dynamic characteristics in the data set and the characteristic vectors tiled after pooling, and sending the combined dynamic characteristics and the characteristic vectors into a full-connection layer for classification and regression. The flame target detection model that this patent was constructed has higher detection precision, is convenient for improve to the not enough of test result, and the flexibility is high.

Description

Flame target detection method based on digital image and convolution characteristic

Technical Field

The present invention relates to a method for detecting flame.

Background

The rapid detection of the flame has great significance for early warning and timely processing of the fire, the fire monitoring video system is an important means for preventing the fire, and the flame is an important visual sign of the fire, so that the flame identification method based on the image characteristics becomes a hotspot in the fire prevention field. Flame identification methods based on image characteristics are mainly divided into two types, namely flame static characteristics and dynamic characteristics. Flame static characteristics mainly comprise edge and color space information and the like, and the Joe construction strength and the like provide a flame detection method based on edge characteristics, and the edge characteristic information is extracted from the image so as to identify the flame; chentianyan and the like propose a YCbCr color space threshold segmentation-based method, and a flame region is extracted by utilizing the characteristics of flame in a color space. The flame dynamic characteristics comprise a Gaussian mixture background model, frame difference and the like, for example, Liqinghui and the like detect a moving target in a video sequence by using a self-adaptive Gaussian mixture model, and then a clustering algorithm is adopted to divide a suspected flame area and a non-fire area; stadler et al extracts flame candidate regions using a weighted interframe difference method, distinguishes flames from non-flames by high flicker frequency, and applies threshold filtering and a high-pass filter to intensity variations to improve the recognition rate. Since the outer flame of the flame has moving characteristics, and the moving characteristics of the flame kernel are not obvious, it is difficult to distinguish the flame from the interfering objects such as safflower or illumination similar to the flame color and texture by using the common characteristics based on static images such as color, edge and texture, and the detection method based on the dynamic characteristics can make the flame kernel area be wrongly judged as the background.

The deep convolutional neural network can learn and induce the image characteristics, has stronger discrimination capability and generalization capability, and is more and more widely applied to the field of image processing. Girshick et al propose that fast R-CNN is used for target detection, and a candidate region is firstly generated and then classified and regressed; redmon et al propose YOLO, which predicts the frame and class probabilities directly from the entire input image with only one evaluation, and has a lower accuracy but higher detection speed than Faster R-CNN; the SSD and the YOLO proposed by Liu et al belong to a single detector, and meanwhile, the anchor point frame mechanism of Faster R-CNN is used for reference, so that the rapid characteristic is kept under the condition of no precision loss.

The training of the deep convolutional neural network has higher requirements on the capacity of a data set, although a good effect can be achieved sometimes by using transfer learning on small and medium data sets, the similarity between the flame and an object in the current public data set is extremely small, and the effect of the transfer learning is extremely small.

Disclosure of Invention

The invention aims to provide a flame target detection method based on digital images and convolution characteristics, which has the advantages of few convolution layers, few network parameters and high flexibility.

In order to achieve the above object, the method for detecting a flame target based on digital images and convolution features according to the present invention comprises the following steps:

1) dividing a video to be detected into a plurality of video blocks, wherein each video block comprises 31 frames of images, reading the data of the video blocks and extracting dynamic characteristics from each video block: flame area variation characteristics, shape similarity characteristics and flicker frequency characteristics;

2) randomly extracting an image from the video block and carrying out image regularization on the image;

3) convolving the normalized image to generate a characteristic diagram;

4) generating 256 candidate areas on the feature diagram obtained by convolution by adopting an RPN (fast R-CNN) network of the Faster R-CNN, cutting out corresponding image blocks from the normalized image through candidate area coordinates, and extracting LBP (local binary pattern) features for each image block;

5) pooling the characteristic maps ROI corresponding to the candidate regions to a fixed size, and then reducing dimensions by convolution and flattening to a characteristic vector with fixed dimensions;

6) and dividing the network into two branches, wherein one branch is used for correcting the coordinates of the frame, the other branch is used for combining the dynamic characteristics and the LBP characteristics after the dimensionality reduction of the characteristic vector, and the softmax classifier is used for carrying out flame identification.

As a further improvement of the flame target detection method, in step 2), when the image is normalized, the short edge of the image is set to be not more than 300 pixels, and the long edge of the image is set to be not more than 500 pixels, and the image beyond the limit is scaled in an equal ratio.

As a further improvement to the flame target detection method, in step 3), the layer 1 of the convolutional network is a standard convolution, and then the layer 11 is a deep separable convolution; the convolution step size of the convolution network layers 3, 5, 7 and 11 is 2, so that the convolution reduces the length and width of the feature map by half each time, and finally reduces the length and width of the feature map to 1/16 of the normalized image.

As a further improvement of the flame target detection method, in step 5), the feature maps corresponding to the candidate regions are respectively pooled to a fixed size of 5 × 5 × 512, and then subjected to convolution with a step size of 2, so as to reduce the size of the feature maps and reduce the dimension of the flattened feature vectors, wherein the size of the flattened feature vectors is 256 × 2048.

As a further improvement to the flame target detection method, in step 1),

extracting a suspected flame area from a video image, and representing the change of the flame area by using the change of the number of pixel points, wherein the area change rate of the flame in a video frame is as follows:

in the formula, S_tAnd S_t-1Respectively representing the number of pixel points, delta A, in the suspected flame area of the t frame and the t-1 frame in the video sequence_tRepresenting the change rate of the total number of the pixel points in the suspected flame area, namely the change rate of the area of the suspected flame area; each video block has 31 frames of images, and 30-dimensional flame area change feature vectors are extracted.

Flame shape similarity ξ_tThe calculation formula is as follows:

wherein, b_t(x,_y) And b_t+3(x, y) are the image sequences of the t-th frame and the t + 3-th frame, respectively, omega is the suspected flame area, (x,_y) Are the coordinates of the points of the image sequence. Each video block extracts a 10-dimensional shape similarity feature vector.

The flame flicker frequency is used as an important criterion for recognizing the flame of the forest fire. Considering that the size of a video block is fixed, the flame flicker frequency characteristic is indirectly reflected by the change of the pixels at the edge of the suspected flame area, and the formula is as follows:

ΔP_t＝|P_t-P_t-3|

wherein, P_tAnd P_t-3Representing the suspected flame area edge pixel points of the t frame and the t-3 frame in the video sequence; delta P_tRepresenting edge pixel variation; extracting a 10-dimensional flicker frequency characteristic vector from each video block;

and combining the flame area change, the shape similarity and the flicker frequency characteristics, and extracting a 50-dimensional dynamic characteristic vector from each video block.

As a further improvement of the flame target detection method, in step 4), 59-dimensional LBP feature vectors are respectively extracted for the RGB three channels of each image block, and 177-dimensional LBP feature vectors are totally extracted.

As a further improvement to the flame target detection method, in step 6), the class prediction network first uses one fully-connected layer to reduce the feature vector to 256 × 512, then combines the feature vector with the previously extracted 256 × 177 LBP feature vector and the 256 × 50 dynamic feature vector extracted from the video data to form a new feature vector of 256 × 739, then generates feature vectors with dimensions equal to the total number of classes from two fully-connected layers, and finally obtains the classification result by the softmax classifier.

The invention has the beneficial effects that: aiming at the problems that the flame detection model based on the image characteristics is not strong in generalization, the deep neural network model has high requirements on the number of training samples and is difficult to improve aiming at the test result, an improved Faster R-CNN model combining flame space-time characteristics is constructed for flame target detection. Firstly, making a data set containing video dynamic characteristics; then replacing the standard convolution of VGG16 in the classic Faster R-CNN with a depth separable convolution, and reducing the number of convolution layers; then cutting out 256 image blocks from an original image according to a candidate frame generated by RPN, and extracting LBP (local binary pattern) features for each image block; the size of an output characteristic diagram of the ROI pooling layer and the number of neurons of a full connection layer are reduced through convolution, and network parameters are further reduced; and finally, combining the extracted LBP characteristics, the dynamic characteristics in the data set and the characteristic vectors tiled after pooling, and sending the combined dynamic characteristics and the characteristic vectors into a full-connection layer for classification and regression. The flame target detection model that this patent was established has higher detection precision to be convenient for improve to the not enough of test result, have higher flexibility.

Thus, an improved Faster R-CNN model incorporating flame spatiotemporal features was constructed herein for flame target detection. The space-time characteristics comprise time sequence dynamic characteristics (including dynamic characteristics such as flame area change, shape similarity and flicker frequency characteristics) which cannot be extracted by the Faster R-CNN deep convolution neural network and unobvious space texture characteristics, and are combined with the Faster R-CNN deep convolution neural network, so that the target characteristics can be extracted more comprehensively, and a certain generalization of the model is ensured.

Drawings

FIG. 1 is a flow chart of extracting a suspected flame region;

FIG. 2 is a flow chart of the modified Faster R-CNN algorithm;

FIG. 3 is a structural diagram of an improved Faster R-CNN model.

Detailed Description

The following examples are provided to illustrate the flame target detection method based on digital images and convolution features of the present invention.

1 data set production

1.1 training set format

The model detects flames by extracting static features and dynamic features, wherein the static features comprise features extracted by a convolutional network and LBP texture features, and the dynamic features comprise flame area change features, shape similarity features and flicker frequency features.

Static features must be extracted in real time according to candidate frames in the training process, and dynamic features are extracted through video frames and are irrelevant to network parameter changes, so that the dynamic features are extracted from a data set and taken together with images with labels as a training set before model training, and redundant calculation is simplified and simplified for facilitating model training. The model training needs a large amount of flame videos, the videos need to comprise more than ten different scenes, the videos are divided into a plurality of video blocks, each video block comprises 31 frames of images, 1 image is extracted from the videos, a box coordinate and a category label are marked on the images, and then dynamic features are extracted from each video block. Finally, each individual training sample format is shown in table 1.

TABLE 1 training set sample format

Sample numbering

Image data

Coordinates of the square

Object classes

Dynamic feature vector

2.2 dynamic feature extraction

Firstly, a method of combining color and motion information is adopted to extract a suspected flame area, and the extraction process is shown in fig. 1.

Color pixels are detected using three rules:

r, G, B are three channels of red, green and blue, T_RIs the threshold of S channel, S refers to the saturation of the pixel, T_SBeing S-channelAnd (4) a threshold value.

And detecting the motion pixels by adopting a mixed Gaussian background model method.

Next, flame area variation, shape similarity, and flicker frequency characteristics are extracted.

The area of the flame in the early fire presents a continuous and expandable growth trend, so that the change of the area of the flame area along with time can be used as a powerful basis for forest fire detection. In the video image, the change of the flame area can be represented by the change of the number of the pixel points. After color and motion pixel detection, obtaining a binary image of a suspected flame area, and calculating the total number of pixel points of a flame target in the binary image, wherein the area change rate of the flame in the video frame is as follows:

in the formula, S_tAnd S_t-1Respectively representing the number of pixel points, delta A, in the suspected flame area of the t frame and the t-1 frame in the video sequence_tAnd (3) representing the change rate of the total number of the pixel points in the suspected flame area, namely the area change rate of the suspected flame area. Each video block has 31 frames of images, and 30-dimensional flame area change feature vectors (t is 2, 3, 4, 5, … …, 31) are extracted in total.

The difference between two adjacent frames is small, so that the similarity of images separated by 3 frames is calculated, and the shape similarity ξ is obtained_tThe calculation formula is as follows:

wherein, b_t(x, y) and b_t+3And (x, y) are the image sequences of the t frame and the t +3 frame respectively, and omega is a suspected flame area. (x, y) are coordinates of sequence points, and a 10-dimensional shape similarity feature vector (t ═ 1, 4, 7, 10, … …, 28) is extracted for each video block.

When the combustible materials are combusted, flames flicker continuously, the flickering surfaces of the flames look disordered and irregular, and each combustible material has the fixed flickering frequency, so that the flickering frequency of the flames can be selected as an important criterion for identifying the flames of forest fires. Considering that the size of a video block is fixed, the flicker frequency characteristic is indirectly reflected by the edge pixel change, and the formula is as follows:

ΔP_t＝|P_t-P_t-3| (4)

wherein, P_tAnd P_t-3Representing the suspected flame area edge pixel points of the t frame and the t-3 frame in the video sequence; delta P_tIndicating edge pixel variation. Each video block extracts a 10-dimensional flicker frequency feature vector (t ═ 4, 7, 10, 13, … …, 31).

After merging, each video block extracts a 50-dimensional motion feature vector.

3 model training

The flow chart of the improved Faster R-CNN algorithm is shown in FIG. 2, and the steps are as follows: 1) read the data. 2) Then the read image data is subjected to image regularization. 3) Convolving the normalized image to produce a feature map. 4) Generating 256 candidate regions on the feature map obtained by convolution by using an RPN network of Faster R-CNN, and extracting an LBP feature for each candidate region. 5) Pooling the feature maps ROI corresponding to each candidate region to a fixed size, and then reducing dimensions by convolution and flattening to a feature vector of a fixed dimension. 6) And dividing the network into two branches, wherein one branch is used for correcting the coordinates of the frame, and the other branch combines the dynamic features and the LBP features after reducing the dimension of the feature vectors, and then predicts the category.

The specific structure of the model is shown in fig. 3 and table 4. The method comprises the following specific steps:

1) read training set data in the specified format.

2) And performing image normalization on the image data, setting the short edge of the image not to exceed 300 and the long edge of the image not to exceed 500, and performing equal-ratio scaling on the image which exceeds the limit.

3) Feeding the normalized image into a convolution network to generate a feature map, wherein the first layer of the convolution network is standard convolution and the later 11 layers are depth separable convolution as shown in FIG. 3; wherein the convolution step of conv3, conv5, conv7 and conv11 is 2, so that the convolution reduces the length and width of the feature map by half each time, and finally reduces the length and width of the feature map to 1/16 of the normalized image. The size of the finally obtained characteristic diagram is the same as that of the original Faster R-CNN characteristic diagram, but the parameters of the convolutional network are reduced to about 1/9, so that the network is lighter.

4) Generating 256 candidate areas on the feature map obtained by convolution by adopting an RPN network of Faster R-CNN, cutting out corresponding image blocks from the normalized image through candidate area coordinates, and respectively extracting 59-dimensional LBP feature vectors and 177-dimensional LBP feature vectors in total for RGB three channels of each image block.

5) Respectively pooling the feature maps corresponding to the candidate regions to a fixed size of 5 × 5 × 512, performing convolution with a step size of 2 (conv 13 in fig. 3), reducing the size of the feature maps, and reducing the dimension of the flattened feature vectors, wherein the size of the flattened feature vectors is 256 × 2048.

6) The network is divided into two branches, one branch uses two full connection layers to carry out frame regression, and frame coordinates are corrected; the other branch is used to predict the class. The class prediction network firstly uses a layer of full connection layer to reduce the dimension of the feature vector to 256 multiplied by 512, then combines the feature vector with the LBP feature vector of 256 multiplied by 177 extracted before and the dynamic feature vector of 256 multiplied by 50 extracted from the video data to be a new feature vector of 256 multiplied by 739, then connects two layers of full connection layer to reduce the dimension to the total number of classes, and finally obtains the classification result by a softmax classifier.

The parameters for training the convolutional network, RPN network and fully-connected layer with the gradient descent method are shown in table 2.

TABLE 2 fast R-CNN model Structure

4 model prediction

And reading the video block data by the prediction stage model so as to realize the detection of the flame target in the video. Unlike the training phase, what the model predicts is an image patch without block coordinates and class labels input to the network. The model firstly extracts the dynamic characteristics of the video block according to the above mode, and then extracts an image from the video block to perform operations such as regularization and subsequent convolution. Once the model detects a flame, the frame and class of the flame region is labeled on the extracted image.

The invention divides the video into video blocks, extracts dynamic characteristics for each video block and makes a training set according to a fixed format. The network model adopts a light-weight depth separable convolution network to extract an image characteristic diagram, adopts an RPN network of Faster Rcnn to generate candidate regions, extracts LBP characteristic vectors for each candidate region, reduces the dimension of the convolved characteristic vectors, combines the characteristic vectors with the LBP characteristic vectors and the dynamic characteristic vectors, and then obtains a classification result and frame regression respectively through a full connection layer, so that the flame detection identification precision is high.

Claims

1. The flame target detection method based on the digital image and the convolution characteristic is characterized in that: it comprises the following steps:

3) convolving the normalized image to generate a characteristic diagram;

6) and dividing the network into two branches, wherein one branch is used for correcting the coordinates of the frame, and the other branch is used for combining the dynamic characteristics and the LBP characteristics after the dimensionality reduction of the characteristic vector, and performing flame identification by using a softmax classifier.

2. A flame target detection method as defined in claim 1, wherein: and 2) in step 2), setting the short edge of the image not to exceed 300 pixels and the long edge of the image not to exceed 500 pixels when the image is normalized, and scaling the image exceeding the limit in an equal ratio.

3. A flame target detection method as defined in claim 2, wherein: in the step 3), the 1 st layer of the convolution network is standard convolution, and the later 11 layers are depth separable convolution; the convolution step size of the convolution network layers 3, 5, 7 and 11 is 2, so that the convolution reduces the length and width of the feature map by half each time, and finally reduces the length and width of the feature map to 1/16 of the normalized image.

4. A flame target detection method as defined in claim 1, wherein: and 5), pooling the characteristic maps corresponding to the candidate regions to a fixed size of 5 × 5 × 512 in ROI respectively, and performing convolution with a step length of 2 to reduce the size of the characteristic maps and the dimension of the flattened characteristic vectors, wherein the size of the flattened characteristic vectors is 256 × 2048.

5. A flame target detection method as defined in claim 1, wherein: in the step 1), the step (A) is carried out,

in the formula, S_tAnd S_t-1Respectively representing the number of pixel points, delta A, in the suspected flame area of the t frame and the t-1 frame in the video sequence_tRepresenting the change rate of the total number of the pixel points in the suspected flame area, namely the change rate of the area of the suspected flame area; each video block has 31 frames of images, and 30-dimensional flame area change characteristic vectors are extracted in total;

flame shape similarity ξ_tThe calculation formula is as follows:

wherein, b_t(x, y) and b_t+3(x, y) are the image sequences of the t frame and the t +3 frame respectively, omega is a suspected flame area, and each video block extracts 10-dimensional shape similarity characteristic vectors;

the method indirectly reflects the flame flicker frequency characteristics by using the edge pixel change of the suspected flame area, and the formula is as follows:

ΔP_t＝|P_t-P_t-3|

6. A flame target detection method as defined in claim 5, wherein: in step 4), 59-dimensional LBP feature vectors are respectively extracted from the RGB three channels of each image block, and 177-dimensional LBP feature vectors are extracted in total.

7. A flame target detection method as defined in claim 6, wherein: in step 6), the class prediction network firstly uses one full-link layer to reduce the dimension of the feature vector to 256 × 512, then combines the feature vector with the LBP feature vector of 256 × 177 extracted previously and the dynamic feature vector of 256 × 50 extracted from the video data into a new feature vector of 256 × 739, then connects two full-link layers to reduce the dimension to the total number of classes, and finally obtains the classification result by the softmax classifier.