CN109409224B - Method for detecting flame in natural scene - Google Patents

Method for detecting flame in natural scene

Info

Publication number
CN109409224B
CN109409224B (application CN201811108891.7A)
Authority
CN
China
Prior art keywords
value
flame
region
convolution
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811108891.7A
Other languages
Chinese (zh)
Other versions
CN109409224A (en)
Inventor
巫义锐
何粤超
郭红鑫
李子铭
贾柠晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201811108891.7A
Publication of CN109409224A
Application granted
Publication of CN109409224B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/28 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture specially adapted for farming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting flame in a natural scene, and belongs to the field of fire protection. The invention comprises the following steps: step 1, extracting the maximum stable extremum regions of an input image; step 2, filtering redundant extremum regions through the color characteristics and the area of the maximum stable extremum regions to obtain flame candidate regions; step 3, inputting the flame candidate regions into a convolutional neural network model to extract convolution features; step 4, inputting the extracted convolution features into a trained support vector machine classifier for classification, so as to judge whether each candidate region is flame; and step 5, merging the flame regions to obtain the final flame region. The invention has strong robustness and good detection performance, and can accurately complete the flame detection task.

Description

Method for detecting flame in natural scene
Technical Field
The invention relates to a method for detecting flame in a natural scene, and belongs to the field of fire protection.
Background
Fire has always been one of the most damaging and common disasters; from residential houses to field environments, once a fire breaks out and is not extinguished in time, it can cause huge losses. Flame detection technology has therefore attracted wide attention from researchers at home and abroad. Various flame detection techniques already exist; however, owing to the diversity and complexity of application scenarios, conventional flame detection techniques (such as smoke detection and temperature detection) always have certain limitations, so flame detection based on computer vision is a new direction for fire prevention.
Disclosure of Invention
In view of the limitations on the operating environment of smoke detectors and temperature detectors, and the wide deployment of video monitoring systems, the invention provides a natural scene flame detection method based on digital image processing and deep learning that has low detection cost and strong robustness.
The invention adopts the following technical scheme for solving the technical problems:
a method of natural scene flame detection comprising the steps of:
step 1, extracting the maximum stable extremum region of an input image;
step 2, filtering redundant extremum regions through the color characteristics and the area of the maximum stable extremum region to obtain flame candidate regions;
step 3, inputting the flame candidate region into a convolutional neural network model to extract convolutional features;
step 4, inputting the extracted convolution features into a trained support vector machine classifier for classification, so as to judge whether each candidate region is flame;
and step 5, merging the flame regions to obtain the final flame region.
The specific process of the step 1 is as follows:
step 11, converting the input image into a grayscale image, denoted I_gray;
step 12, traversing the threshold values in ascending order and calculating the extremum regions of I_gray under each threshold; an extremum region Q_i is defined as follows:
where I_gray(p) and I_gray(q) denote the values of pixel points p and q in I_gray, i ∈ [0, 255] is the threshold of the extremum region, and the boundary of Q_i is the set of pixel points adjacent to the extremum region Q_i but not belonging to it;
step 13, calculating the rate of change of the extremum regions of I_gray:
the extremum region rate of change is defined as:
where Δ is a small variation of the gray threshold, Q_{i+Δ} is the extremum region obtained after the gray threshold is increased, Q_{i-Δ} is the extremum region obtained after the gray threshold is decreased, and r(i) is the rate of change of region Q_i when the threshold is i;
step 14, finding the maximum stable extremum regions of I_gray:
a region Q_i is considered a maximum stable extremum region when its rate of change r(i) is less than the threshold T.
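By way of illustration, step 1 can be realized with the MSER detector shipped in OpenCV; the sketch below is a minimal, assumption-level implementation in which the library defaults stand in for the threshold step Δ and the stability threshold T, which the patent leaves as parameters:

```python
import cv2

def extract_mser_regions(image_bgr):
    """Step 1 sketch: extract the maximum stable extremum regions of an image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # step 11: grayscale image I_gray
    mser = cv2.MSER_create()                              # delta and area bounds can be tuned here
    regions, bboxes = mser.detectRegions(gray)            # steps 12-14: stable extremum regions
    return regions, bboxes
```

Each returned region is an array of pixel coordinates, which is the form assumed by the filtering sketch given after step 2.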
The specific process of the step 2 is as follows:
step 21, calculating the area S of each maximum stable extremum region, calculating the RGB value of each pixel, and judging whether the pixel is a flame pixel, wherein the calculation formula is as follows:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, B is the value of the B channel in the RGB image, r_t is the threshold for the R channel, b_t is the threshold for the B channel, and g_t is the threshold for the G channel;
step 22, filtering the redundant area by the total number n of flame pixel points and the area of the maximum stable extremum area, wherein the remaining maximum stable extremum area is the flame candidate area, and the filtering conditions are as follows:
where H is the threshold and S is the area of the maximum stable extremum region.
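The color rule and the filtering condition are given in the original as formulas that are not reproduced here; the following sketch therefore assumes a commonly used red-dominance rule for flame pixels and a ratio test n/S > H, so the threshold values and the exact inequalities are illustrative assumptions rather than the patented formulas:

```python
import numpy as np

def is_flame_pixel(r, g, b, r_t=190):
    """Assumed RGB flame rule for step 21: bright, red-dominant pixels (threshold illustrative)."""
    return (r > r_t) and (r >= g) and (g > b)

def filter_candidate_regions(regions, image_rgb, H=0.5):
    """Step 22 sketch: keep regions whose flame-pixel ratio n/S exceeds H (assumed filter form)."""
    candidates = []
    for pts in regions:                               # pts: (x, y) coordinates of one extremum region
        S = len(pts)                                  # area of the maximum stable extremum region
        n = sum(is_flame_pixel(*image_rgb[y, x]) for x, y in pts)   # total number of flame pixels
        if S > 0 and n / S > H:
            candidates.append(pts)
    return candidates
```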
The step of extracting the convolution feature in the step 3 is as follows:
step 31, preprocessing an input picture to obtain 224×224×3 images;
step 32, firstly performing twice a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 64 convolution kernels on the input image, and then performing a pooling operation with a pooling size of 2×2 and a step size of 2 to obtain a 112×112×64 feature map;
step 33, for the feature map obtained in step 32, performing twice a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 128 convolution kernels, and performing a pooling operation again to obtain a 56×56×128 feature map;
step 34, for the feature map obtained in step 33, performing four times a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 256 convolution kernels, and performing a pooling operation again to obtain a 28×28×256 feature map;
step 35, for the feature map obtained in step 34, performing four times a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 512 convolution kernels, and performing a pooling operation to obtain a 14×14×512 feature map;
step 36, for the feature map obtained in step 35, performing four times a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 512 convolution kernels, and performing a pooling operation to obtain a 7×7×512 feature map;
step 37, converting the feature map obtained in step 36 into a 1×1×4096 feature vector;
step 38, converting the feature vector obtained in step 37 into a 1×1×1000 feature vector.
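The stack of steps 31-38 follows the layout of a VGG-style convolutional network; the sketch below uses the pretrained VGG16 model from torchvision as a stand-in feature extractor. The patent does not name VGG16, so this choice, the ImageNet channel means and the use of the first fully connected activation as the 4096-dimensional feature are assumptions made for illustration:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg = models.vgg16(pretrained=True).eval()            # stand-in for the network of steps 32-38

preprocess = T.Compose([
    T.Resize((224, 224)),                              # step 31: scale to 224x224x3
    T.ToTensor(),                                      # normalize pixel values to [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),  # de-averaging (illustrative means)
])

def extract_conv_features(pil_image):
    """Return a 4096-dimensional convolution feature vector for one flame candidate region."""
    x = preprocess(pil_image).unsqueeze(0)             # 1 x 3 x 224 x 224
    with torch.no_grad():
        x = vgg.features(x)                            # steps 32-36: convolution and pooling stack
        x = torch.flatten(x, 1)                        # step 37: flatten to 1 x (7*7*512)
        x = vgg.classifier[:2](x)                      # step 37: first fully connected layer + ReLU
    return x.squeeze(0).numpy()                        # 1 x 4096 feature vector
```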
The specific process of step 31 is as follows:
the first step, scaling the picture to a size of 224×224×3;
the second step, normalize the pixel value of the picture, the computational method is:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, and B is the value of the B channel in the RGB image;
thirdly, carrying out de-averaging on the pixel values of the picture, wherein the calculating method comprises the following steps:
where r_mean, b_mean and g_mean denote the mean values of the R, B and G channels of the picture, respectively.
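The normalization and de-averaging formulas appear in the original as images; a minimal numpy sketch of step 31, assuming division by 255 for the normalization and subtraction of the per-channel means for the de-averaging (both assumptions about the omitted formulas), is:

```python
import cv2
import numpy as np

def preprocess_image(image_rgb):
    """Step 31 sketch: resize, normalize and de-average an input picture."""
    img = cv2.resize(image_rgb, (224, 224)).astype(np.float32)   # first step: 224x224x3
    img /= 255.0                                                 # second step: assumed normalization
    img -= img.mean(axis=(0, 1), keepdims=True)                  # third step: subtract channel means
    return img
```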
The specific process of step 32 is as follows:
(1) Convolution operation:
Let A be a feature plane of the input feature map M_i, and let K be a 3×3 convolution kernel:
where a_00 denotes the pixel value in row 0, column 0 of feature plane A, a_01 the pixel value in row 0, column 1, a_02 the pixel value in row 0, column 2, a_10 the pixel value in row 1, column 0, a_11 the pixel value in row 1, column 1, a_12 the pixel value in row 1, column 2, a_20 the pixel value in row 2, column 0, a_21 the pixel value in row 2, column 1, and a_22 the pixel value in row 2, column 2 of feature plane A; w_00 denotes the weight in row 0, column 0 of the convolution kernel K, w_01 the weight in row 0, column 1, w_02 the weight in row 0, column 2, w_10 the weight in row 1, column 0, w_11 the weight in row 1, column 1, w_12 the weight in row 1, column 2, w_20 the weight in row 2, column 0, w_21 the weight in row 2, column 1, and w_22 the weight in row 2, column 2 of the convolution kernel K;
Under the condition of step size stride = (1, 1),
where a_ij denotes the pixel value in row i, column j of feature plane A, w_ij denotes the weight in row i, column j of the convolution kernel K, A_00 is the value in row 0, column 0 of A*K, A_01 the value in row 0, column 1, A_10 the value in row 1, column 0, and A_11 the value in row 1, column 1 of A*K; K is the convolution kernel; the step size stride = (1, 1) means that the convolution kernel K moves a distance of 1 both horizontally and vertically on feature plane A; when the convolution kernel moves to the edge of the feature plane and the feature plane does not have enough elements, the missing positions are filled with '0' to ensure that the output image obtained after the convolution operation has the same size as the input image.
Let B be the output profile M o Is characterized by:
wherein A is i For inputting feature map M o Is the ith feature plane of (a), the symbol x represents convolutionN is the number of convolution kernels; the convolution operation is performed twice;
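A plain numpy sketch of the single-plane convolution described above, with zero padding so that the output keeps the input size and stride (1, 1); for a multi-channel feature map the same operation is repeated for each input plane and the results are summed (bias terms omitted):

```python
import numpy as np

def conv2d_same(A, K):
    """Convolve feature plane A with a 3x3 kernel K, stride 1, zero padding ('same' size)."""
    kh, kw = K.shape
    ph, pw = kh // 2, kw // 2
    Ap = np.pad(A, ((ph, ph), (pw, pw)), mode='constant')        # fill missing border elements with 0
    out = np.zeros(A.shape, dtype=np.float32)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            out[i, j] = np.sum(Ap[i:i + kh, j:j + kw] * K)       # A_ij = sum of a * w over the window
    return out
```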
(2) Pooling operation:
let B be one of the feature planes in the feature map M obtained by convolution, and C be the feature plane obtained by pooling:
where b_00 denotes the pixel value in row 0, column 0 of feature plane B, b_01 the pixel value in row 0, column 1, b_02 the pixel value in row 0, column 2, b_03 the pixel value in row 0, column 3, b_10 the pixel value in row 1, column 0, b_11 the pixel value in row 1, column 1, b_12 the pixel value in row 1, column 2, b_13 the pixel value in row 1, column 3, b_20 the pixel value in row 2, column 0, b_21 the pixel value in row 2, column 1, b_22 the pixel value in row 2, column 2, b_23 the pixel value in row 2, column 3, b_30 the pixel value in row 3, column 0, b_31 the pixel value in row 3, column 1, b_32 the pixel value in row 3, column 2, and b_33 the pixel value in row 3, column 3 of feature plane B; c_00 is the maximum of b_00, b_01, b_10 and b_11; c_01 is the maximum of b_02, b_03, b_12 and b_13; c_10 is the maximum of b_20, b_21, b_30 and b_31; and c_11 is the maximum of b_22, b_23, b_32 and b_33.
After the pooling operation, the feature plane C is obtained, and its size is reduced to half of the original;
(3) After pooling, a nonlinear transformation is applied to the output result using the ReLU function; the ReLU formula is c_ij = max(0, c_ij),
where c_ij is the value in row i, column j of feature plane C.
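A numpy sketch of the 2×2 max pooling with stride 2 and of the ReLU nonlinearity applied afterwards (ReLU(x) = max(0, x)):

```python
import numpy as np

def max_pool_2x2(B):
    """2x2 max pooling with stride 2: each c_ij is the maximum of a 2x2 block of B."""
    h, w = B.shape[0] // 2 * 2, B.shape[1] // 2 * 2              # drop an odd last row/column if present
    blocks = B[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))                                # halves the size of the feature plane

def relu(C):
    """ReLU nonlinearity: negative values are set to 0."""
    return np.maximum(C, 0)
```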
The specific process of the step 37 is as follows:
1) Converting the feature map obtained in step 36 into a 1×1×(7×7×512) vector;
2) Multiplying the vector obtained in 1) by the trained weights to obtain a 1×1×4096 feature vector, wherein the calculation formula is as follows:
where a_0 is the 0th value of the feature vector, a_1 the 1st value, and a_4095 the 4095th value; w_00 is the value in row 0, column 0 of the weight matrix, w_01 the value in row 0, column 1, w_10 the value in row 1, column 0, and w_11 the value in row 1, column 1 of the weight matrix; b_0 is the 0th value of the bias vector, b_1 the 1st value, and b_4095 the 4095th value; x_0 is the 0th value of the input vector, x_1 the 1st value, and x_512 the 512th value of the input vector.
3) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
where a_i is the i-th value of the feature vector.
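Steps 37 and 38 are ordinary fully connected layers; a numpy sketch in which the weight matrix W and bias vector b stand for the trained parameters (their shapes follow the dimensions given in the text):

```python
import numpy as np

def fully_connected(x, W, b):
    """Fully connected layer with ReLU: a = max(0, W x + b).

    For step 37, x has 7*7*512 values and W has shape (4096, 7*7*512);
    for step 38, x has 4096 values and W has shape (1000, 4096).
    """
    return np.maximum(W @ x + b, 0)

# Usage sketch (feature_map, W1 and b1 are hypothetical trained arrays):
# x = feature_map.reshape(-1)          # step 37, 1): flatten the 7x7x512 feature map
# fc1 = fully_connected(x, W1, b1)     # step 37, 2)-3): 1x1x4096 feature vector
```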
The step 38 includes:
a) Multiplying the feature vector obtained in step 37 by the trained weights to obtain a 1×1×1000 feature vector, wherein the calculation formula is as follows:
where a_0 is the 0th value of the feature vector, a_1 the 1st value, and a_999 the 999th value; w_00 is the value in row 0, column 0 of the weight matrix, w_01 the value in row 0, column 1, w_10 the value in row 1, column 0, and w_11 the value in row 1, column 1 of the weight matrix; b_0 is the 0th value of the bias vector, b_1 the 1st value, and b_999 the 999th value; x_0 is the 0th value of the input vector, x_1 the 1st value, and x_999 the 999th value of the input vector;
b) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
where a_i is the i-th value of the feature vector.
The step of training the support vector machine in the step 4 includes:
step 41, taking a flame training data set, extracting convolution characteristics of all flame candidate areas through the steps 1,2 and 3, and taking the convolution characteristics as a training set;
step 42, inputting the training set into a support vector machine for training the classification problem, wherein the solving step of the learning algorithm of the support vector machine comprises the following steps:
(1) Selecting a proper parameter C, constructing and solving an optimization problem:
constraint conditions
Obtaining an optimal solution α* = (α*_1, α*_2, ..., α*_N), where α*_1 is its 1st value, α*_2 its 2nd value and α*_N its N-th value; α_i is the i-th Lagrange multiplier, α_j is the j-th Lagrange multiplier, x_i is the i-th training sample, x_j is the j-th training sample, y_i ∈ {-1, 1} is the class label corresponding to the i-th sample; y_j ∈ {-1, 1} is the class label corresponding to the j-th sample, and K(x_i, x_j) is a Gaussian kernel function, whose formula is as follows:
wherein: sigma is the standard deviation;
(2) Selecting a positive component of α*, its j-th value α*_j, and calculating
where b* is the required bias;
(3) Constructing a decision function:
where α*_i is the i-th value of the optimal solution, sign() denotes the sign function, f(x) = 1 when the term in the brackets is greater than a certain threshold, and f(x) = 0 when the term in the brackets is less than that threshold.
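A sketch of step 4 using scikit-learn's SVC with a Gaussian (RBF) kernel; the penalty parameter C and the kernel width σ are illustrative values, and the mapping gamma = 1/(2σ²) matches the Gaussian kernel formula above:

```python
from sklearn.svm import SVC

def train_flame_svm(conv_features, labels, C=1.0, sigma=10.0):
    """Steps 41-42 sketch: conv_features is an (n_samples, 4096) array of convolution
    features of flame candidate regions, labels holds +1 (flame) or -1 (non-flame)."""
    clf = SVC(C=C, kernel='rbf', gamma=1.0 / (2.0 * sigma ** 2))  # Gaussian kernel K(x_i, x_j)
    clf.fit(conv_features, labels)
    return clf

def is_flame(clf, feature_vector):
    """Decision step: returns True when the candidate region is classified as flame."""
    return clf.predict(feature_vector.reshape(1, -1))[0] == 1
```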
The specific process of the step 5 is as follows:
step 51, for the flame region set S obtained in step 4, calculating the center point c_i of every s_i ∈ S, where s_i is the i-th region in the flame region set S;
step 52, for any flame regions s_i, s_j ∈ S, where s_j is the j-th region in the flame region set S, if the Euclidean distance between their center points c_i and c_j is smaller than the threshold F, merging the two regions to obtain the final flame region.
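A sketch of the merging rule of step 5 as a simple greedy pass: compute the center point of each flame region and merge any two regions whose centers are closer than the threshold F (an illustrative value here):

```python
import numpy as np

def merge_flame_regions(regions, F=30.0):
    """Step 5 sketch: regions is a list of (x, y) point arrays classified as flame."""
    centers = [pts.mean(axis=0) for pts in regions]               # step 51: center point c_i of s_i
    used = [False] * len(regions)
    merged = []
    for i in range(len(regions)):
        if used[i]:
            continue
        group = np.array(regions[i])
        for j in range(i + 1, len(regions)):
            if not used[j] and np.linalg.norm(centers[i] - centers[j]) < F:   # step 52
                group = np.vstack([group, regions[j]])            # merge the two regions
                used[j] = True
        merged.append(group)
    return merged
```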
The beneficial effects of the invention are as follows:
(1) The invention uses MSER (the maximum stable extremum region algorithm) and RGB color space characteristics to extract flame candidate regions; these characteristics describe the candidate regions through the rate of change of the image gray values and through the RGB values. Because the color of a flame region varies within a small range and follows a definite pattern that can be expressed with an RGB model, the method has strong robustness.
(2) The invention utilizes the convolutional neural network to extract the flame region characteristics, and the convolutional neural network can extract a large number of characteristics and can well describe the flame.
(3) The invention can detect whether flame exists in the image or not, and can accurately mark the flame position.
Drawings
Fig. 1 is a detection flow chart.
Fig. 2 is a flow chart of convolutional neural network extraction of convolutional features.
Detailed Description
The invention will be described in further detail with reference to the accompanying drawings.
The invention relates to a method for detecting flame in a natural scene; its operation flow is shown in figure 1 and comprises the following steps:
step 1: inputting a flame image to be detected;
step 2: the method comprises the following steps:
First, the input image is converted into a grayscale image, denoted I_gray;
Next, the extremum regions of I_gray under each threshold are obtained by traversing the thresholds in ascending order; an extremum region Q_i is defined as follows:
where I_gray(p) and I_gray(q) denote the values of pixel points p and q in I_gray, i ∈ [0, 255] is the threshold of the extremum region, and the boundary of Q_i is the set of pixel points adjacent to the extremum region Q_i but not belonging to it;
While the threshold is varied, the rate of change of the extremum regions of I_gray is calculated; the extremum region rate of change is defined as:
where Δ is a small variation of the gray threshold, Q_{i+Δ} is the extremum region obtained after the gray threshold is increased, Q_{i-Δ} is the extremum region obtained after the gray threshold is decreased, and r(i) is the rate of change of region Q_i when the threshold is i; a region Q_i is considered a maximum stable extremum region when its rate of change r(i) is less than the threshold T;
then, calculating the area S of each maximum stable extremum region, calculating the RGB value of each pixel point, and judging whether the pixel point is a flame pixel point, wherein the calculation formula is as follows:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, B is the value of the B channel in the RGB image, r_t is the threshold for the R channel, b_t is the threshold for the B channel, and g_t is the threshold for the G channel.
Finally, filtering redundant areas through the total number n of flame pixel points and the area of the maximum stable extremum area, wherein the remaining maximum stable extremum area is the flame candidate area, and the filtering conditions are as follows:
where H is the threshold and S is the area of the maximum stable extremum region.
Step 3: the method comprises the following steps:
firstly, preprocessing an input picture to obtain 224×224×3 images, wherein the specific operation process is as follows:
(1) Scaling the picture to a size of 224×224×3;
(2) The pixel value of the picture is normalized, and the calculation method comprises the following steps:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, and B is the value of the B channel in the RGB image;
(3) The pixel values of the pictures are subjected to de-averaging, and the calculation method is as follows:
where r_mean, b_mean and g_mean denote the mean values of the R, B and G channels of the picture, respectively;
secondly, carrying out operations such as convolution, pooling and the like on the image, wherein the specific process is as follows:
(1) Firstly, a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 64 convolution kernels is carried out twice on the input image, and then a pooling operation with a pooling size of 2×2 and a step size of 2 is carried out once to obtain a 112×112×64 feature map;
(2) For the obtained feature map, a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 128 convolution kernels is performed twice, and a pooling operation is performed once to obtain a 56×56×128 feature map;
(3) For the obtained feature map, a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 256 convolution kernels is performed four times, and a pooling operation is performed again to obtain a 28×28×256 feature map;
(4) For the obtained feature map, a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 512 convolution kernels is performed four times, and a pooling operation is performed again to obtain a 14×14×512 feature map;
(5) For the obtained feature map, a convolution operation with a step size of 1, a convolution kernel size of 3×3 and 512 convolution kernels is performed four times, and a pooling operation is performed again to obtain a 7×7×512 feature map;
then, the obtained feature map is converted into feature vectors of 1×1×4096;
finally, converting the obtained feature vector into a feature vector of 1 multiplied by 1000;
the convolution operation process is as follows:
Let A be a feature plane of the input feature map M_i, and let K be a 3×3 convolution kernel:
where a_00 denotes the pixel value in row 0, column 0 of feature plane A, a_01 the pixel value in row 0, column 1, a_02 the pixel value in row 0, column 2, a_10 the pixel value in row 1, column 0, a_11 the pixel value in row 1, column 1, a_12 the pixel value in row 1, column 2, a_20 the pixel value in row 2, column 0, a_21 the pixel value in row 2, column 1, and a_22 the pixel value in row 2, column 2 of feature plane A; w_00 denotes the weight in row 0, column 0 of the convolution kernel K, w_01 the weight in row 0, column 1, w_02 the weight in row 0, column 2, w_10 the weight in row 1, column 0, w_11 the weight in row 1, column 1, w_12 the weight in row 1, column 2, w_20 the weight in row 2, column 0, w_21 the weight in row 2, column 1, and w_22 the weight in row 2, column 2 of the convolution kernel K;
Under the condition of step size stride = (1, 1),
where a_ij denotes the pixel value in row i, column j of feature plane A, w_ij denotes the weight in row i, column j of the convolution kernel K, A_00 is the value in row 0, column 0 of A*K, A_01 the value in row 0, column 1, A_10 the value in row 1, column 0, and A_11 the value in row 1, column 1 of A*K; K is the convolution kernel; the step size stride = (1, 1) means that the convolution kernel K moves a distance of 1 both horizontally and vertically on feature plane A; when the convolution kernel moves to the edge of the feature plane and the feature plane does not have enough elements, the missing positions are filled with '0' to ensure that the output image obtained after the convolution operation has the same size as the input image.
Let B be the output profile M o Is characterized by:
wherein A is i For inputting feature map M o Is the number of convolution kernels. The convolution operation is performed twice.
The pooling operation process is as follows:
let B be one of the feature planes in the feature map M obtained by convolution, and C be the feature plane obtained by pooling:
where b_00 denotes the pixel value in row 0, column 0 of feature plane B, b_01 the pixel value in row 0, column 1, b_02 the pixel value in row 0, column 2, b_03 the pixel value in row 0, column 3, b_10 the pixel value in row 1, column 0, b_11 the pixel value in row 1, column 1, b_12 the pixel value in row 1, column 2, b_13 the pixel value in row 1, column 3, b_20 the pixel value in row 2, column 0, b_21 the pixel value in row 2, column 1, b_22 the pixel value in row 2, column 2, b_23 the pixel value in row 2, column 3, b_30 the pixel value in row 3, column 0, b_31 the pixel value in row 3, column 1, b_32 the pixel value in row 3, column 2, and b_33 the pixel value in row 3, column 3 of feature plane B; c_00 is the maximum of b_00, b_01, b_10 and b_11; c_01 is the maximum of b_02, b_03, b_12 and b_13; c_10 is the maximum of b_20, b_21, b_30 and b_31; and c_11 is the maximum of b_22, b_23, b_32 and b_33.
After the pooling operation, the feature plane C is obtained, and its size is reduced to half of the original.
After pooling, a nonlinear transformation is applied to the output result using the ReLU function; the ReLU formula is c_ij = max(0, c_ij),
where c_ij is the value in row i, column j of feature plane C.
The step of converting the result of the convolution operation into a feature vector is as follows:
(1) Converting the obtained feature map into a 1×1×(7×7×512) vector;
(2) Multiplying the resulting vector by the trained weights to yield a 1×1×4096 feature vector, calculated as:
where a_0 is the 0th value of the feature vector, a_1 the 1st value, and a_4095 the 4095th value; w_00 is the value in row 0, column 0 of the weight matrix, w_01 the value in row 0, column 1, w_10 the value in row 1, column 0, and w_11 the value in row 1, column 1 of the weight matrix; b_0 is the 0th value of the bias vector, b_1 the 1st value, and b_4095 the 4095th value; x_0 is the 0th value of the input vector, x_1 the 1st value, and x_512 the 512th value of the input vector.
(3) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
where a_i is the i-th value of the feature vector.
(4) Multiplying the result obtained in the last step by the trained weights to obtain a 1×1×1000 feature vector, wherein the calculation formula is as follows:
where a_999 is the 999th value of the feature vector, b_999 is the 999th value of the bias vector, and x_999 is the 999th value of the input vector.
(5) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
where a_i is the i-th value of the feature vector.
Step 4, training the support vector machine, which comprises the following steps:
firstly, taking a flame training data set, extracting convolution characteristics of all flame candidate areas through the steps 1,2 and 3, and taking the convolution characteristics as a training set;
then, the training set is input into a support vector machine to carry out classification problem training, wherein the solving step of the learning algorithm of the support vector machine comprises the following steps:
(1) Selecting a proper parameter C, constructing and solving an optimization problem:
constraint conditions
Obtaining an optimal solution α* = (α*_1, α*_2, ..., α*_N), where α*_1 is its 1st value, α*_2 its 2nd value and α*_N its N-th value. α_i is the i-th Lagrange multiplier, α_j is the j-th Lagrange multiplier, x_i is the i-th training sample, x_j is the j-th training sample, y_i ∈ {-1, 1} is the class label corresponding to the i-th sample; y_j ∈ {-1, 1} is the class label corresponding to the j-th sample, and K(x_i, x_j) is a Gaussian kernel function, whose formula is as follows:
wherein: sigma is the standard deviation.
(2) Selecting a positive component of α*, its j-th value α*_j, and calculating
where b* is the required bias.
(3) Constructing a decision function:
where α*_i is the i-th value of the optimal solution, sign() denotes the sign function, f(x) = 1 when the term in the brackets is greater than a certain threshold, and f(x) = 0 when the term in the brackets is less than that threshold. The input feature vectors can be classified using this decision function.
Step 5: the method comprises the following steps:
First, for the flame region set S obtained in step 4, the center point c_i of every s_i ∈ S is calculated, where s_i is the i-th region in the flame region set S;
Then, for any flame regions s_i, s_j ∈ S, where s_j is the j-th region in the flame region set S, if the Euclidean distance between their center points c_i and c_j is smaller than the threshold F, the two regions are merged to obtain the final flame region.
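Putting the steps of figure 1 together, an end-to-end sketch of the detection flow; the helper names used here (extract_mser_regions, filter_candidate_regions, extract_region_features, is_flame, merge_flame_regions) refer to the assumption-level sketches given above and are hypothetical, not terms of the patent:

```python
import cv2
import numpy as np

def detect_flames(image_bgr, svm_classifier):
    """End-to-end sketch of the flow in figure 1 (steps 1-5)."""
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    regions, _ = extract_mser_regions(image_bgr)                  # step 1: MSER extraction
    candidates = filter_candidate_regions(regions, image_rgb)     # step 2: color and area filtering
    flame_regions = []
    for pts in candidates:
        x, y, w, h = cv2.boundingRect(np.asarray(pts, dtype=np.int32))
        crop = image_rgb[y:y + h, x:x + w]                        # crop the candidate region
        feat = extract_region_features(crop)                      # step 3: convolution features
        if is_flame(svm_classifier, feat):                        # step 4: SVM classification
            flame_regions.append(pts)
    return merge_flame_regions(flame_regions)                     # step 5: final flame regions
```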

Claims (6)

1. A method for detecting flame in natural scene is characterized in that: the method comprises the following steps:
step 1, extracting the maximum stable extremum region of an input image;
step 2, filtering redundant extremum regions through the color characteristics and the area of the maximum stable extremum region to obtain flame candidate regions; the specific process of the step 2 is as follows:
step 21, calculating the area S of each maximum stable extremum region, calculating the RGB value of each pixel, and judging whether the pixel is a flame pixel, wherein the calculation formula is as follows:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, B is the value of the B channel in the RGB image, r_t is the threshold for the R channel, b_t is the threshold for the B channel, and g_t is the threshold for the G channel;
step 22, filtering the redundant area by the total number n of flame pixel points and the area of the maximum stable extremum area, wherein the remaining maximum stable extremum area is the flame candidate area, and the filtering conditions are as follows:
wherein H is a threshold value, and S is the area of the maximum stable extremum region;
step 3, inputting the flame candidate region into a convolutional neural network model to extract convolutional features; the step of extracting convolution characteristics in the step 3 is as follows:
step 31, preprocessing an input picture to obtain 224×224×3 images;
step 32, firstly performing convolution operation with the step length of 1, the convolution kernel size of 3 multiplied by 3 and the convolution kernel number of 64 on an input image, and then performing pooling operation with the pooling size of 2 multiplied by 2 and the step length of 2 to obtain a 112 multiplied by 64 feature map;
step 33, for the feature map obtained in step 32, performing convolution operation with step length of 1, convolution kernel size of 3×3 and convolution kernel number of 128 twice, and performing pooling operation again to obtain a 56×56×128 feature map;
step 34, for the feature map obtained in step 33, performing a convolution operation with a step size of 1, a convolution kernel size of 3×3, and a convolution kernel number of 256, and performing a pooling operation again to obtain a feature map with a size of 28×28×256;
step 35, for the feature map obtained in step 34, performing four times a convolution operation with a step size of 1, a convolution kernel size of 3×3 and a convolution kernel number of 512, and performing a pooling operation to obtain a 14×14×512 feature map;
step 36, for the feature map obtained in step 35, performing four times a convolution operation with a step size of 1, a convolution kernel size of 3×3 and a convolution kernel number of 512, and performing a pooling operation to obtain a 7×7×512 feature map;
step 37, converting the feature map obtained in step 36 into feature vectors of 1×1×4096;
step 38, converting the feature vector obtained in step 37 into a feature vector of 1×1×1000;
step 4, inputting the extracted convolution characteristics into a trained support vector machine classifier to classify, and judging whether the convolution characteristics are flame or not;
step 5, combining the flame areas to obtain a final flame area; the specific process of the step 5 is as follows:
step 51, for the flame region set S obtained in step 4, calculating the center point c_i of every s_i ∈ S, where s_i is the i-th region in the flame region set S;
step 52, for any flame regions s_i, s_j ∈ S, where s_j is the j-th region in the flame region set S, if the Euclidean distance between their center points c_i and c_j is smaller than the threshold F, merging the two regions to obtain the final flame region.
2. A method of natural scene flame detection as defined in claim 1, wherein: the specific process of the step 1 is as follows:
step 11, converting the input image into a grayscale image, denoted I_gray;
step 12, traversing the threshold values in ascending order and calculating the extremum regions of I_gray under each threshold; an extremum region Q_i is defined as follows:
where I_gray(p) and I_gray(q) denote the values of pixel points p and q in I_gray, i ∈ [0, 255] is the threshold of the extremum region, and the boundary of Q_i is the set of pixel points adjacent to the extremum region Q_i but not belonging to it;
step 13, calculating the rate of change of the extremum regions of I_gray:
the extremum region rate of change is defined as:
where Δ is a small variation of the gray threshold, Q_{i+Δ} is the extremum region obtained after the gray threshold is increased, Q_{i-Δ} is the extremum region obtained after the gray threshold is decreased, and r(i) is the rate of change of region Q_i when the threshold is i;
step 14, finding the maximum stable extremum regions of I_gray:
a region Q_i is considered a maximum stable extremum region when its rate of change r(i) is less than the threshold T.
3. A method of natural scene flame detection as defined in claim 1, wherein: the specific process of step 31 is as follows:
the first step, scaling the picture to a size of 224×224×3;
the second step, normalize the pixel value of the picture, the computational method is:
where R is the value of the R channel in the RGB image, G is the value of the G channel in the RGB image, and B is the value of the B channel in the RGB image;
thirdly, carrying out de-averaging on the pixel values of the picture, wherein the calculating method comprises the following steps:
where r_mean, b_mean and g_mean denote the mean values of the R, B and G channels of the picture, respectively.
4. The method of natural scene flame detection as recited in claim 1, wherein: the specific process of the step 37 is as follows:
1) Converting the feature map obtained in the step 36 into a vector of 1×1× (7×7×512);
2) Multiplying the vector obtained in the step 36 by trained weights to obtain a feature vector of 1 multiplied by 4096;
3) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
wherein a_i is the i-th value of the feature vector.
5. The method of natural scene flame detection as recited in claim 1, wherein: the step 38 includes:
a) Multiplying the feature vector obtained in the step 37 by trained weights to obtain a feature vector of 1 multiplied by 1000;
b) Nonlinear transformation is carried out on the output result by using a RELU function, and the RELU formula is as follows:
wherein a_i is the i-th value of the feature vector.
6. A method of natural scene flame detection as defined in claim 1, wherein: the step of training the support vector machine in the step 4 includes:
step 41, taking a flame training data set, extracting convolution characteristics of all flame candidate areas through the steps 1,2 and 3, and taking the convolution characteristics as a training set;
step 42, inputting the training set into a support vector machine for training the classification problem, wherein the solving step of the learning algorithm of the support vector machine comprises the following steps:
(1) Selecting a parameter C, constructing and solving an optimization problem:
0 ≤ α_i ≤ C, i = 1, 2, ..., N
obtaining an optimal solution α* = (α*_1, α*_2, ..., α*_N), where α*_1 is its 1st value, α*_2 its 2nd value and α*_N its N-th value; α_i is the i-th Lagrange multiplier, α_j is the j-th Lagrange multiplier, x_i is the i-th training sample, x_j is the j-th training sample, y_i ∈ {-1, 1} is the class label corresponding to the i-th sample; y_j ∈ {-1, 1} is the class label corresponding to the j-th sample, and K(x_i, x_j) is a Gaussian kernel function, whose formula is as follows:
wherein: sigma is the standard deviation;
(2) Selecting a positive component of α*, its j-th value α*_j, and calculating
where b* is the required bias;
(3) Constructing a decision function:
where α*_i is the i-th value of the optimal solution, sign() denotes the sign function, f(x) = 1 when the term in the brackets is greater than a certain threshold, and f(x) = 0 when the term in the brackets is less than that threshold.
CN201811108891.7A 2018-09-21 2018-09-21 Method for detecting flame in natural scene Active CN109409224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811108891.7A CN109409224B (en) 2018-09-21 2018-09-21 Method for detecting flame in natural scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811108891.7A CN109409224B (en) 2018-09-21 2018-09-21 Method for detecting flame in natural scene

Publications (2)

Publication Number Publication Date
CN109409224A CN109409224A (en) 2019-03-01
CN109409224B true CN109409224B (en) 2023-09-05

Family

ID=65465228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811108891.7A Active CN109409224B (en) 2018-09-21 2018-09-21 Method for detecting flame in natural scene

Country Status (1)

Country Link
CN (1) CN109409224B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309808B (en) * 2019-07-09 2021-03-12 北京林业大学 Self-adaptive smoke root node detection method in large-scale space
CN110378421B (en) * 2019-07-19 2021-06-25 西安科技大学 Coal mine fire identification method based on convolutional neural network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853512A (en) * 2010-05-13 2010-10-06 电子科技大学 Flame detection method based on video time and spatial information
CN105336085A (en) * 2015-09-02 2016-02-17 华南师范大学 Remote large-space fire monitoring alarm method based on image processing technology
CN106250845A (en) * 2016-07-28 2016-12-21 北京智芯原动科技有限公司 Flame detecting method based on convolutional neural networks and device
CN107749067A (en) * 2017-09-13 2018-03-02 华侨大学 Fire hazard smoke detecting method based on kinetic characteristic and convolutional neural networks
CN107944359A (en) * 2017-11-14 2018-04-20 中电数通科技有限公司 Flame detecting method based on video
CN108038486A (en) * 2017-12-05 2018-05-15 河海大学 A kind of character detecting method
CN108052865A (en) * 2017-07-06 2018-05-18 同济大学 A kind of flame detecting method based on convolutional neural networks and support vector machines
CN108416316A (en) * 2018-03-19 2018-08-17 中南大学 A kind of detection method and system of black smoke vehicle

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100450793B1 (en) * 2001-01-20 2004-10-01 삼성전자주식회사 Apparatus for object extraction based on the feature matching of region in the segmented images and method therefor
KR101081051B1 (en) * 2010-11-16 2011-11-09 계명대학교 산학협력단 A method for detecting fire-flame using fuzzy finite automata

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853512A (en) * 2010-05-13 2010-10-06 电子科技大学 Flame detection method based on video time and spatial information
CN105336085A (en) * 2015-09-02 2016-02-17 华南师范大学 Remote large-space fire monitoring alarm method based on image processing technology
CN106250845A (en) * 2016-07-28 2016-12-21 北京智芯原动科技有限公司 Flame detecting method based on convolutional neural networks and device
CN108052865A (en) * 2017-07-06 2018-05-18 同济大学 A kind of flame detecting method based on convolutional neural networks and support vector machines
CN107749067A (en) * 2017-09-13 2018-03-02 华侨大学 Fire hazard smoke detecting method based on kinetic characteristic and convolutional neural networks
CN107944359A (en) * 2017-11-14 2018-04-20 中电数通科技有限公司 Flame detecting method based on video
CN108038486A (en) * 2017-12-05 2018-05-15 河海大学 A kind of character detecting method
CN108416316A (en) * 2018-03-19 2018-08-17 中南大学 A kind of detection method and system of black smoke vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Sun et al., "Smoke detection algorithm based on color enhancement transform and MSER detection" (《基于颜色增强变换和MSER检测的烟雾检测算法》), Transactions of Beijing Institute of Technology, 15 Oct. 2016, Vol. 36, No. 10, pp. 2-4 *

Also Published As

Publication number Publication date
CN109409224A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
CN111723693B (en) Crowd counting method based on small sample learning
CN109583315B (en) Multichannel rapid human body posture recognition method for intelligent video monitoring
CN108596010B (en) Implementation method of pedestrian re-identification system
CN112861635B (en) Fire disaster and smoke real-time detection method based on deep learning
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
TWI441096B (en) Motion detection method for comples scenes
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
Ahn et al. Research of multi-object detection and tracking using machine learning based on knowledge for video surveillance system
CN110751195B (en) Fine-grained image classification method based on improved YOLOv3
CN110942471A (en) Long-term target tracking method based on space-time constraint
CN102469302A (en) Background model learning system for lighting change adaptation utilized for video surveillance
CN112149533A (en) Target detection method based on improved SSD model
CN104463869A (en) Video flame image composite recognition method
CN109409224B (en) Method for detecting flame in natural scene
CN111833353B (en) Hyperspectral target detection method based on image segmentation
CN113537226A (en) Smoke detection method based on deep learning
Zeng et al. Steel sheet defect detection based on deep learning method
CN112633179A (en) Farmer market aisle object occupying channel detection method based on video analysis
CN117197746A (en) Safety monitoring system and method based on deep learning
CN106530300A (en) Flame identification algorithm of low-rank analysis
EP3516592A1 (en) Method for object detection in digital image and video using spiking neural networks
Roh et al. Fire image classification based on convolutional neural network for smart fire detection
CN112241682B (en) End-to-end pedestrian searching method based on blocking and multi-layer information fusion
KR101748412B1 (en) Method and apparatus for detecting pedestrian using joint aggregated channel features

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant