CN112052797B - MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system - Google Patents

MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system

Info

Publication number
CN112052797B
CN112052797B CN202010931021.0A
Authority
CN
China
Prior art keywords
maskrcnn
network
target
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010931021.0A
Other languages
Chinese (zh)
Other versions
CN112052797A (en)
Inventor
陈锐
钱廷柱
刘洪奎
郭云正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Kedalian Safety Technology Co ltd
Original Assignee
Hefei Kedalian Safety Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Kedalian Safety Technology Co ltd filed Critical Hefei Kedalian Safety Technology Co ltd
Priority to CN202010931021.0A priority Critical patent/CN112052797B/en
Publication of CN112052797A publication Critical patent/CN112052797A/en
Application granted granted Critical
Publication of CN112052797B publication Critical patent/CN112052797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Fire-Detection Mechanisms (AREA)

Abstract

The invention provides a MaskRCNN-based video fire identification method and system. A MaskRCNN deep learning model detects smoke and flame targets in video images: frames from the video stream are fed into the trained MaskRCNN model, which fully extracts the feature information in the images through a series of convolution and pooling operations and accurately outputs the coordinate information and confidence scores of the predicted smoke and flame targets. Although the MaskRCNN model offers high accuracy and reliability, it still produces a small number of false alarms; to reduce these further, a frame-difference-based dynamic energy detection method filters the MaskRCNN detection results, removing most false alarms caused by static objects. Finally, a deep neural network makes the final classification decision on the image region of each detected target, further reducing the false alarm rate.

Description

MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system
Technical Field
The invention relates to the technical field of fire disaster identification, in particular to a MaskRCNN-based video fire disaster identification method and system.
Background
Currently, existing fire detection methods can be broadly divided into two categories: those based on traditional image processing and those based on deep learning. Early researchers, working from the visual characteristics of images, proposed modeling the color information in an image to extract suspected flame regions; this approach runs in real time but, because it considers only color features, suffers from low accuracy. Later researchers proposed detecting dynamic regions in video through dynamic background modeling, obtaining candidate regions with a color model, and completing the final screening with shape features such as morphology and texture. Compared with a pure color model, this approach performs much better, and dynamic background modeling greatly reduces false alarms from static objects. All of these traditional image-processing fire detection methods rely to some extent on color features; in real scenes, however, factors such as camera color cast, overexposure and illumination can substantially reduce their reliability. Subsequent researchers introduced machine learning algorithms such as support vector machines and artificial neural networks into fire identification, extracting image features from suspected regions and then classifying the regions with a machine learning algorithm. These methods achieve good accuracy under interference factors such as illumination and occlusion and outperform most fire detection algorithms, but they cannot effectively exploit massive datasets to improve performance and require researchers to design features by hand, which is cumbersome. Fire detection methods based on deep learning generally adopt an existing mature algorithm, such as FasterRCNN, MaskRCNN, SSD or YOLO, detecting flame and smoke in a single static image directly with a convolutional neural network and locating suspected flame and smoke targets in the image.
For example, the application numbered 201911323715.X first extracts a suspected smoke region and then performs image classification with a convolutional neural network; this method suffers from low detection accuracy.
In addition, traditional image processing methods require complex manual feature extraction and design, depend heavily on hand-selected features, are generally suited only to a single specific scene, perform poorly in real complex scenes, and commonly suffer from low detection rates and severe false alarms.
Disclosure of Invention
The invention aims to improve the accuracy of video fire identification and to reduce the false alarm rate.
The invention solves the technical problems by the following technical means:
The MaskRCNN-based video fire identification method comprises the following steps (a high-level pipeline sketch follows the list):
S01, training a MaskRCNN network to obtain a target MaskRCNN network;
S02, transmitting the images in the video stream to the target MaskRCNN network and extracting the feature information in the images to obtain an image classification dataset;
S03, constructing an EfficientNet network, initializing its weights, and training it with the image classification dataset constructed in step S02 to obtain a target EfficientNet network;
S04, detecting smoke and flame targets in the video with the target MaskRCNN network to obtain the target frame coordinates, probability values and categories of suspected smoke and flame;
S05, frame-difference energy judgment: for the target frames obtained in step S04, filtering out static-object false alarms through frame-difference energy judgment;
S06, final decision by the classification network: classifying the target frames retained after the frame-difference energy judgment with the EfficientNet network.
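For orientation, the following minimal Python sketch shows how steps S01 to S06 fit together at inference time. The helper functions detect_targets, frame_diff_energy and classify_crop are hypothetical placeholders for the operations detailed below, and the thresholds are illustrative.

```python
# Hypothetical end-to-end sketch of the S01-S06 inference pipeline.
# detect_targets, frame_diff_energy and classify_crop are placeholders
# for the MaskRCNN detection, energy filter and EfficientNet decision
# described in the following sections.

def identify_fire(video_stream, maskrcnn, efficientnet,
                  energy_threshold=0.1, num_frames=3):
    """Yield (frame, box, category) alarms from a decoded video stream."""
    recent = []
    for frame in video_stream:                       # S04: per-frame detection
        recent = (recent + [frame])[-num_frames:]
        if len(recent) < num_frames:
            continue                                 # wait for a full frame buffer
        for box, score, label in detect_targets(maskrcnn, frame):
            # S05: frame-difference energy filter rejects static objects
            if frame_diff_energy(recent, box) <= energy_threshold:
                continue
            # S06: final decision by the EfficientNet classifier
            category = classify_crop(efficientnet, frame, box)
            if category in ("smoke", "flame"):
                yield frame, box, category           # raise an alarm
```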
According to the invention, a MaskRCNN deep learning model detects smoke and flame targets in video images: the images in the video stream are transmitted to the trained MaskRCNN model, which fully extracts the feature information in the images through a series of convolution and pooling operations and accurately outputs the coordinate information and confidence scores of the predicted smoke and flame targets. Although the MaskRCNN model offers high accuracy and reliability, it still produces a small number of false alarms; to reduce these further, a frame-difference-based dynamic energy detection method filters the MaskRCNN detection results, removing most false alarms caused by static objects. Finally, a deep neural network makes the final classification decision on the image region of each detected target, further reducing the false alarm rate.
Further, training the MaskRCNN network in step S01 specifically includes: collecting fire image samples, labeling the samples, constructing a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applying data enhancement to the dataset; then constructing a MaskRCNN network, initializing its weights, and training the MaskRCNN network with the constructed training set to obtain the target MaskRCNN network. Sample labeling is as follows: the upper-left (x1, y1) and lower-right (x2, y2) corner coordinates of each smoke or flame target are marked. The data enhancement operations on the samples in the image dataset include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter and mixup; mixup is computed as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
wherein x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information of image i and image j, respectively;
The MaskRCNN network is trained as follows:
Step S011: initializing the parameters of the MaskRCNN network with network parameters pre-trained on the COCO dataset;
Step S012: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S013: inputting the overall feature map into the RPN network to predict regions of interest (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S014: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps (a ROIAlign sketch follows step S017);
During ROIAlign pooling, the ROI target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S015: classifying the ROI feature maps computed by ROIAlign and performing regression on the target frames;
Step S016: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S017: repeating steps S012 to S016 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network.
Further, step S02 specifically includes: detecting a large amount of video data and image data with the target MaskRCNN network, cropping out suspected smoke and flame targets, and constructing an image classification dataset with 3 categories: smoke, flame and false alarm.
Further, step S03 specifically includes: initializing the parameters of the EfficientNet network with EfficientNet classification network parameters pre-trained on the ImageNet dataset, inputting the image classification dataset for end-to-end training, computing gradients of the classification loss function with the Adam optimization algorithm and updating the EfficientNet network parameters, and stopping training after the set number of rounds to obtain the target EfficientNet network.
Further, step S04 specifically includes: acquiring images from the video stream, scaling each video image to 1024x1024, inputting it into the target MaskRCNN network, and predicting the target frame coordinates, probability values and categories of suspected smoke and flame; NMS non-maximum suppression is then applied to all predicted targets to filter out overlapping invalid target frames.
Further, step S05 specifically includes: for each target frame, cropping the corresponding image region from each of the most recent N adjacent frames, where N > 2; for the N images obtained for each target frame, computing the frame difference between each pair of adjacent images and binarizing it with a threshold T to obtain N-1 binary images; counting the non-zero pixels across all binary images and dividing the count by the target frame area to obtain the final energy value; if the energy value is greater than the set energy threshold the target frame is valid, otherwise it is discarded as a static-object false alarm.
The invention also provides a MaskRCNN-based video fire identification system, which comprises:
The target MaskRCNN network construction module, which trains a MaskRCNN network to obtain a target MaskRCNN network;
the image classification dataset construction module, which transmits images in the video stream to the target MaskRCNN network and extracts the feature information in the images to obtain an image classification dataset;
the target EfficientNet network construction module, which constructs an EfficientNet network, initializes its weights, and trains it with the image classification dataset built by the image classification dataset construction module to obtain a target EfficientNet network;
the smoke and flame target detection module, which detects smoke and flame targets in the video with the target MaskRCNN network to obtain the target frame coordinates, probability values and categories of suspected smoke and flame;
the frame-difference energy judgment module, which performs frame-difference energy judgment on the target frames and filters out static-object false alarms;
and the classification network final decision module, which classifies the target frames retained after the frame-difference energy judgment with the EfficientNet network.
Further, training the MaskRCNN network in the target MaskRCNN network construction module specifically includes: collecting fire image samples, labeling the samples, constructing a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applying data enhancement to the dataset; then constructing a MaskRCNN network, initializing its weights, and training the MaskRCNN network with the constructed training set to obtain the target MaskRCNN network. Sample labeling is as follows: the upper-left (x1, y1) and lower-right (x2, y2) corner coordinates of each smoke or flame target are marked. The data enhancement operations on the samples in the image dataset include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter and mixup; mixup is computed as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
wherein x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information of image i and image j, respectively;
The MaskRCNN network is trained as follows:
Step S011: initializing the parameters of the MaskRCNN network with network parameters pre-trained on the COCO dataset;
Step S012: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S013: inputting the overall feature map into the RPN network to predict regions of interest (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S014: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps;
During ROIAlign pooling, the ROI target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S015: classifying the ROI feature maps computed by ROIAlign and performing regression on the target frames;
Step S016: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S017: repeating steps S012 to S016 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network.
Further, the smoke and flame target detection module specifically: acquires images from the video stream, scales each video image to 1024x1024, inputs it into the target MaskRCNN network, and predicts the target frame coordinates, probability values and categories of suspected smoke and flame; NMS non-maximum suppression is then applied to all predicted targets to filter out overlapping invalid target frames.
Further, the frame-difference energy judgment module specifically: for each target frame, crops the corresponding image region from each of the most recent N adjacent frames, where N > 2; computes the frame difference between each pair of adjacent images and binarizes it with a threshold T to obtain N-1 binary images; counts the non-zero pixels across all binary images and divides the count by the target frame area to obtain the final energy value; if the energy value is greater than the set energy threshold the target frame is valid, otherwise it is discarded as a static-object false alarm.
The invention has the advantages that:
According to the invention, a MaskRCNN deep learning model detects smoke and flame targets in video images: the images in the video stream are transmitted to the trained MaskRCNN model, which fully extracts the feature information in the images through a series of convolution and pooling operations and accurately outputs the coordinate information and confidence scores of the predicted smoke and flame targets. Although the MaskRCNN model offers high accuracy and reliability, it still produces a small number of false alarms; to reduce these further, a frame-difference-based dynamic energy detection method filters the MaskRCNN detection results, removing most false alarms caused by static objects. Finally, a deep neural network makes the final classification decision on the image region of each detected target, further reducing the false alarm rate.
Based on a deep learning algorithm, a convolutional neural network automatically learns diverse features of flame and smoke in images and uses the learned features to identify and locate flame and smoke, giving a high detection rate, a low false alarm rate and strong robustness. In actual deployment, with software and hardware acceleration, the method essentially meets real-time requirements.
Drawings
FIG. 1 is a diagram of the MaskRCNN network and EfficientNet network training process in an embodiment of the invention;
FIG. 2 is a flow chart of fire detection by the video fire identification method according to an embodiment of the invention;
FIG. 3 is a flow chart of fire detection by the MaskRCNN network in the invention;
FIG. 4 is a flow chart of dynamic detection in an embodiment of the invention;
FIG. 5 is a diagram showing the effect of the mixup method according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment provides a MaskRCNN-based video fire identification method, which comprises the following steps:
Step S1: as shown in FIG. 1, collecting fire image samples, labeling the samples, constructing a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applying data enhancement to the dataset; the data enhancement operations include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter, mixup and the like. The embodiment uses the mixup image fusion method, which fuses two different images in a certain proportion and effectively increases the diversity of the training dataset, as shown in FIG. 5;
Specifically, the upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of each smoke and flame target need to be marked (a short augmentation sketch follows the formula below). The mixup processing formula is as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
where x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information (target frame and category information) of image i and image j, respectively.
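A compact sketch of these enhancement operations follows, assuming NumPy and torchvision; the jitter magnitudes are illustrative, and the Beta(1.5, 1.5) mixing coefficient mirrors the formula above.

```python
# Illustrative data-enhancement sketch: colour jitter plus mixup.
import numpy as np
import torchvision.transforms as T

# Random brightness / contrast / saturation / hue jitter (magnitudes illustrative).
color_jitter = T.ColorJitter(brightness=0.3, contrast=0.3,
                             saturation=0.3, hue=0.05)

def mixup(x_i, x_j, y_i, y_j, alpha=1.5):
    """Fuse two images per the formula above, with lambda ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x_i + (1.0 - lam) * x_j        # fused image
    # For detection labels (target frames plus categories), a common choice is
    # to keep both images' annotations and weight their losses by lam and 1-lam.
    return x, (y_i, y_j, lam)
```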
Step S2: as shown in FIG. 2, constructing a MaskRCNN network, initializing its weights, and training the MaskRCNN network with the training set constructed in step S1; training stops after 100 rounds;
The MaskRCNN model is trained as follows (a minimal training-loop sketch follows step S2.7):
Step S2.1: initializing the parameters of the MaskRCNN network with model parameters pre-trained on the COCO dataset;
Step S2.2: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S2.3: inputting the overall feature map into the RPN network to predict candidate regions (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S2.4: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps;
During ROIAlign pooling, the candidate-region target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S2.5: classifying the candidate-region feature maps computed by ROIAlign and performing regression on the target frames;
Step S2.6: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S2.7: repeating steps S2.2 to S2.6 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network model;
Step S3: as shown in FIG. 1, detecting a large amount of video data and image data with the trained MaskRCNN model, cropping out suspected smoke and flame targets, and constructing a three-category image classification dataset containing smoke, flame and false-alarm samples;
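A sketch of this dataset construction step, assuming Pillow and a hypothetical detect() helper that wraps the trained MaskRCNN forward pass and returns (box, score, label) triples:

```python
# Build the three-category classification dataset by cropping detections.
from pathlib import Path
from PIL import Image

def build_classification_dataset(image_paths, detect, out_dir="cls_dataset"):
    """Crop every suspected target into smoke / flame / false_alarm folders."""
    for n, path in enumerate(image_paths):
        image = Image.open(path).convert("RGB")
        for k, (box, score, label) in enumerate(detect(image)):
            x1, y1, x2, y2 = (int(v) for v in box)
            crop = image.crop((x1, y1, x2, y2))
            # Crops are staged under the predicted class; manual review then
            # moves wrong detections into the false_alarm category.
            dest = Path(out_dir) / label
            dest.mkdir(parents=True, exist_ok=True)
            crop.save(dest / f"{n:06d}_{k}.jpg")
```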
Step S4: as shown in FIG. 1, constructing an EfficientNet network, initializing its parameters with a model pre-trained on the ImageNet dataset, and training with the image classification dataset constructed in step S3; training stops after 50 rounds;
Specifically, the parameters of the EfficientNet network are initialized with EfficientNet classification model parameters pre-trained on the ImageNet dataset; the training data images are input for end-to-end training, gradients of the classification loss function are computed by the Adam optimization algorithm, the EfficientNet network parameters are updated, and training stops after 50 rounds;
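A fine-tuning sketch for this step, assuming torchvision's EfficientNet-B0 as a stand-in for the EfficientNet variant used: the ImageNet-pretrained weights initialize the network, the classification head is replaced for the 3 categories (smoke, flame, false alarm), and Adam drives the updates. cls_loader is a hypothetical classification DataLoader and the learning rate is illustrative.

```python
# EfficientNet fine-tuning sketch with the Adam optimizer (step S4).
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)  # 3 categories

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()

for epoch in range(50):                  # stop after 50 rounds of training
    for images, labels in cls_loader:    # hypothetical classification DataLoader
        logits = model(images)
        loss = criterion(logits, labels) # classification loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                 # Adam update of the network parameters
```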
As shown in FIG. 2, after the target MaskRCNN network and the target EfficientNet network are obtained, the video fire identification method proceeds as follows:
Step S5: as shown in FIG. 3, a single frame decoded from the video stream is preprocessed; the overall image features of the preprocessed image are extracted by the ResNet101+FPN network in MaskRCNN; the feature map is input into the RPN network to predict candidate regions; ROIAlign pooling is performed on the candidate-region feature maps to output fixed-size candidate feature maps; and finally the classification-regression network in MaskRCNN classifies the candidate feature maps and regresses the target frames. Because the MaskRCNN predictions contain a large amount of target overlap, NMS must be applied to the MaskRCNN predictions to suppress targets with excessive overlapping area.
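The suppression step can be sketched with torchvision's nms operator, applied per class so that smoke and flame detections do not suppress each other; the IoU threshold is illustrative (torchvision.ops.batched_nms performs the same class-wise suppression in one call):

```python
# Class-wise non-maximum suppression over one frame's raw predictions.
import torch
from torchvision.ops import nms

def suppress_overlaps(boxes, scores, labels, iou_threshold=0.5):
    """boxes: [N, 4] as (x1, y1, x2, y2); scores: [N]; labels: [N] class ids."""
    kept = []
    for cls in labels.unique():
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        keep = nms(boxes[idx], scores[idx], iou_threshold)  # indices to keep
        kept.append(idx[keep])
    return torch.cat(kept)
```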
Step S6, frame-difference energy judgment: after MaskRCNN detection and NMS are completed, the suspected target regions must be checked for motion. As shown in FIG. 4, the target ROI region is first taken from N (N > 2) consecutive adjacent frames, and the frame difference between each pair of adjacent frames is computed; pixels whose absolute difference exceeds the threshold T are set to 1 and all others to 0, yielding binary images; the non-zero pixels across all binary images are counted, and the count is divided by the target frame area to obtain the final energy value. If the energy value is greater than the set threshold, the target frame is judged dynamic and is valid; otherwise it is judged static and discarded.
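A NumPy sketch of this dynamic check, following the description above; the difference threshold T and the shape of the inputs (grayscale frames, box in pixel coordinates) are illustrative assumptions:

```python
# Frame-difference energy of one target frame over the last N frames (step S6).
import numpy as np

def frame_diff_energy(frames, box, T=15):
    """frames: list of N (> 2) grayscale frames; box: (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    crops = [f[y1:y2, x1:x2].astype(np.int16) for f in frames]
    changed = 0
    for a, b in zip(crops[:-1], crops[1:]):        # N-1 frame differences
        diff = np.abs(a - b)
        changed += np.count_nonzero(diff > T)      # binarize at threshold T
    area = max((x2 - x1) * (y2 - y1), 1)
    return changed / area                          # final energy value

# The target frame is kept only if the energy exceeds the preset threshold:
# is_dynamic = frame_diff_energy(last_n_frames, box) > energy_threshold
```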
Step S7, final decision by the classification network: if a suspected target region is judged dynamic, further confirmation is required. The image of the suspected target region is input into the target EfficientNet network, which classifies it and determines whether it is a false alarm, smoke or flame. If smoke or flame is identified, an alarm is raised directly; otherwise no alarm is raised and detection proceeds to the next frame.
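A sketch of this final decision, assuming the fine-tuned EfficientNet from step S4 in eval() mode; the class order, the 128x128 input size used in the system description below, and the preprocessing are illustrative:

```python
# Final classification decision on a dynamic suspected target region (step S7).
import torch
import torchvision.transforms as T

CLASSES = ("smoke", "flame", "false_alarm")        # assumed class order
preprocess = T.Compose([T.ToTensor(), T.Resize((128, 128))])

@torch.no_grad()
def final_decision(model, frame_rgb, box):
    """Return (category, raise_alarm) for the target frame's image region."""
    x1, y1, x2, y2 = box
    crop = frame_rgb[y1:y2, x1:x2]                 # suspected target region
    inputs = preprocess(crop).unsqueeze(0)         # [1, 3, 128, 128]
    probs = torch.softmax(model(inputs), dim=1)[0]
    category = CLASSES[int(probs.argmax())]
    return category, category in ("smoke", "flame")
```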
The embodiment thus provides a deep-learning-based video fire detection method: a MaskRCNN deep learning model detects smoke and flame targets in video images, the images in the video stream are transmitted to the trained MaskRCNN model, and the feature information in the images is fully extracted through a series of convolution and pooling operations so that the coordinate information and confidence scores of the predicted smoke and flame targets are accurately output. Although the MaskRCNN model offers high accuracy and reliability, it still produces a small number of false alarms; to reduce these further, the embodiment uses a frame-difference-based dynamic energy detection method to filter the MaskRCNN detection results, removing most false alarms caused by static objects. Finally, the embodiment uses a deep neural network to make the final classification decision on the image region of each detected target, further reducing the false alarm rate.
Considering that the MaskRCNN algorithm requires mask annotation information in the training data, and that this embodiment does not need the semantic segmentation function, the semantic segmentation branch of the MaskRCNN network is removed in this embodiment.
The embodiment also provides a video fire identification system based on MaskRCNN, which comprises:
The target MaskRCNN network construction module collects fire image samples, labels the samples, constructs a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applies data enhancement to the dataset;
The upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of each smoke and flame target need to be marked; the data enhancement operations on the samples in the image dataset include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter and mixup. The mixup processing formula is as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
where x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information (target frame and category information) of image i and image j, respectively.
The module constructs a MaskRCNN network, initializes its weights, trains the MaskRCNN network with the constructed training set, and stops training after 100 rounds;
The MaskRCNN model is trained as follows:
Step S2.1: initializing the parameters of the MaskRCNN network with model parameters pre-trained on the COCO dataset;
Step S2.2: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S2.3: inputting the overall feature map into the RPN network to predict candidate regions (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S2.4: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps;
During ROIAlign pooling, the candidate-region target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S2.5: classifying the candidate-region feature maps computed by ROIAlign and performing regression on the target frames;
Step S2.6: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S2.7: repeating steps S2.2 to S2.6 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network model;
The image classification dataset construction module detects a large amount of video data and image data with the trained MaskRCNN model, crops out suspected smoke and flame targets, and constructs a three-category image classification dataset of smoke, flame and false alarm;
The target EfficientNet network construction module constructs an EfficientNet network, initializes its weights, trains it with the image classification dataset built by the image classification dataset construction module, and stops training after 50 rounds;
Specifically, the parameters of the EfficientNet network are initialized with EfficientNet classification model parameters pre-trained on the ImageNet dataset; the training data images are input for end-to-end training, gradients of the classification loss function are computed by the Adam optimization algorithm, the EfficientNet network parameters are updated, and training stops after 50 rounds;
After the target MaskRCNN network and the target EfficientNet network are obtained, the system performs video fire identification as follows:
The smoke and flame target detection module detects smoke and flame targets in the video with the target MaskRCNN network: images are acquired from the video stream, each video image is scaled to 1024x1024 and input into the MaskRCNN network, and the target frame coordinates, probability values and categories of suspected smoke and flame are predicted; NMS non-maximum suppression is applied to all predicted targets, filtering out the large number of overlapping invalid target frames;
The frame-difference energy judgment module, for each target frame obtained in the previous step (assuming the target frame has width w and height h), crops the corresponding image region from each of the most recent N adjacent frames, where N > 2; the frame difference between each pair of adjacent images is computed and binarized with a threshold T to obtain N-1 binary images; the non-zero pixels across all binary images are counted, and the count is divided by the target frame area to obtain the final energy value. If the energy value is greater than the set energy threshold the target frame is valid; otherwise it is regarded as a static-object false alarm and discarded;
The classification network final decision module directly crops each target via the target frame retained after the frame-difference energy judgment, scales it to 128x128, and inputs it into the EfficientNet network for final classification; if the prediction is smoke or flame an alarm is raised, otherwise no alarm is raised.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (2)

1. A MaskRCNN-based video fire identification method, characterized by comprising the following steps:
S01, training a MaskRCNN network to obtain a target MaskRCNN network;
S02, transmitting the images in the video stream to the target MaskRCNN network and extracting the feature information in the images to obtain an image classification dataset;
S03, constructing an EfficientNet network, initializing its weights, and training it with the image classification dataset constructed in step S02 to obtain a target EfficientNet network;
S04, detecting smoke and flame targets in the video with the target MaskRCNN network to obtain the target frame coordinates, probability values and categories of suspected smoke and flame;
S05, frame-difference energy judgment: for the target frames obtained in step S04, filtering out static-object false alarms through frame-difference energy judgment;
S06, final decision by the classification network: classifying the target frames retained after the frame-difference energy judgment with the EfficientNet network;
wherein training the MaskRCNN network in step S01 specifically comprises: collecting fire image samples, labeling the samples, constructing a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applying data enhancement to the dataset; constructing a MaskRCNN network, initializing its weights, and training the MaskRCNN network with the constructed training set to obtain the target MaskRCNN network; sample labeling is as follows: the upper-left (x1, y1) and lower-right (x2, y2) corner coordinates of each smoke or flame target are marked, and the data enhancement operations on the samples in the image dataset include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter and mixup; mixup is computed as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
wherein x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information of image i and image j, respectively;
the MaskRCNN network is trained as follows:
Step S011: initializing the parameters of the MaskRCNN network with network parameters pre-trained on the COCO dataset;
Step S012: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S013: inputting the overall feature map into the RPN network to predict regions of interest (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S014: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps;
During ROIAlign pooling, the ROI target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S015: classifying the ROI feature maps computed by ROIAlign and performing regression on the target frames;
Step S016: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S017: repeating steps S012 to S016 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network;
step S02 specifically comprises: detecting a large amount of video data and image data with the target MaskRCNN network, cropping out suspected smoke and flame targets, and constructing an image classification dataset with 3 categories: smoke, flame and false alarm;
step S03 specifically comprises: initializing the parameters of the EfficientNet network with EfficientNet classification network parameters pre-trained on the ImageNet dataset, inputting the image classification dataset for end-to-end training, computing gradients of the classification loss function with the Adam optimization algorithm and updating the EfficientNet network parameters, and stopping training after the set number of rounds to obtain the target EfficientNet network;
step S04 specifically comprises: acquiring images from the video stream, scaling each video image to 1024x1024, inputting it into the target MaskRCNN network, and predicting the target frame coordinates, probability values and categories of suspected smoke and flame; performing NMS non-maximum suppression on all predicted targets and filtering out overlapping invalid target frames;
step S05 specifically comprises: for each target frame, cropping the corresponding image region from each of the most recent N adjacent frames, where N > 2; for the N images obtained for each target frame, computing the frame difference between each pair of adjacent images and binarizing it with a threshold T to obtain N-1 binary images; counting the non-zero pixels across all binary images and dividing the count by the target frame area to obtain the final energy value; if the energy value is greater than the set energy threshold, the target frame is valid, otherwise it is discarded as a static-object false alarm.
2. A MaskRCNN-based video fire identification system, characterized by comprising:
the target MaskRCNN network construction module, which trains a MaskRCNN network to obtain a target MaskRCNN network;
the image classification dataset construction module, which transmits images in the video stream to the target MaskRCNN network and extracts the feature information in the images to obtain an image classification dataset;
the target EfficientNet network construction module, which constructs an EfficientNet network, initializes its weights, and trains it with the image classification dataset built by the image classification dataset construction module to obtain a target EfficientNet network;
the smoke and flame target detection module, which detects smoke and flame targets in the video with the target MaskRCNN network to obtain the target frame coordinates, probability values and categories of suspected smoke and flame;
the frame-difference energy judgment module, which performs frame-difference energy judgment on the target frames and filters out static-object false alarms;
and the classification network final decision module, which classifies the target frames retained after the frame-difference energy judgment with the EfficientNet network;
wherein training the MaskRCNN network in the target MaskRCNN network construction module specifically comprises: collecting fire image samples, labeling the samples, constructing a training dataset by marking the coordinates and categories of smoke and flame targets in the images, and applying data enhancement to the dataset; constructing a MaskRCNN network, initializing its weights, and training the MaskRCNN network with the constructed training set to obtain the target MaskRCNN network; sample labeling is as follows: the upper-left (x1, y1) and lower-right (x2, y2) corner coordinates of each smoke or flame target are marked, and the data enhancement operations on the samples in the image dataset include random cropping, random brightness jitter, random saturation jitter, random contrast jitter, random hue jitter and mixup; mixup is computed as follows:
x_mix = λ·x_i + (1 - λ)·x_j
y_mix = λ·y_i + (1 - λ)·y_j
wherein λ ~ Beta(1.5, 1.5);
wherein x_i denotes image i to be fused, x_j denotes image j to be fused, and y_i and y_j denote the labeling information of image i and image j, respectively;
the MaskRCNN network is trained as follows:
Step S011: initializing the parameters of the MaskRCNN network with network parameters pre-trained on the COCO dataset;
Step S012: scaling the image samples in the training dataset to 1024x1024, then extracting the overall feature map of each training sample image with the ResNet101+FPN network in MaskRCNN;
Step S013: inputting the overall feature map into the RPN network to predict regions of interest (ROIs), and selecting positive and negative samples according to the overlap ratio between candidate-region target frames and labeled target frames;
Step S014: performing ROIAlign pooling on the ROI regions of the feature maps corresponding to the positive and negative samples to obtain fixed-size candidate-region feature maps;
During ROIAlign pooling, the ROI target frame is first mapped onto the feature map, the feature-map ROI region is then obtained by a minimum-bounding-rectangle calculation, the ROI region is divided into an m x m grid, each grid cell samples 4 points on the feature map by bilinear interpolation, and a feature map of size m x m is finally obtained;
Step S015: classifying the ROI feature maps computed by ROIAlign and performing regression on the target frames;
Step S016: computing the MaskRCNN loss function, computing its gradients by stochastic gradient descent, and updating the MaskRCNN network weights;
Step S017: repeating steps S012 to S016 until the preset number of iterations is reached, then stopping training and saving the MaskRCNN network;
the smoke and flame target detection module specifically: acquires images from the video stream, scales each video image to 1024x1024, inputs it into the target MaskRCNN network, and predicts the target frame coordinates, probability values and categories of suspected smoke and flame; NMS non-maximum suppression is performed on all predicted targets, filtering out overlapping invalid target frames;
the frame-difference energy judgment module specifically: for each target frame, crops the corresponding image region from each of the most recent N adjacent frames, where N > 2; for the N images obtained for each target frame, computes the frame difference between each pair of adjacent images and binarizes it with a threshold T to obtain N-1 binary images; counts the non-zero pixels across all binary images and divides the count by the target frame area to obtain the final energy value; if the energy value is greater than the set energy threshold, the target frame is valid, otherwise it is discarded as a static-object false alarm.
CN202010931021.0A 2020-09-07 2020-09-07 MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system Active CN112052797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010931021.0A CN112052797B (en) 2020-09-07 2020-09-07 MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010931021.0A CN112052797B (en) 2020-09-07 2020-09-07 MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system

Publications (2)

Publication Number Publication Date
CN112052797A CN112052797A (en) 2020-12-08
CN112052797B true CN112052797B (en) 2024-07-16

Family

ID=73609865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010931021.0A Active CN112052797B (en) 2020-09-07 2020-09-07 MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system

Country Status (1)

Country Link
CN (1) CN112052797B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633231B (en) * 2020-12-30 2022-08-02 珠海大横琴科技发展有限公司 Fire disaster identification method and device
CN112699801B (en) * 2020-12-30 2022-11-11 上海船舶电子设备研究所(中国船舶重工集团公司第七二六研究所) Fire identification method and system based on video image
CN112686190A (en) * 2021-01-05 2021-04-20 北京林业大学 Forest fire smoke automatic identification method based on self-adaptive target detection
CN112861635B (en) * 2021-01-11 2024-05-14 西北工业大学 Fire disaster and smoke real-time detection method based on deep learning
CN112907885B (en) * 2021-01-12 2022-08-16 中国计量大学 Distributed centralized household image fire alarm system and method based on SCNN
CN112800929B (en) * 2021-01-25 2022-05-31 安徽农业大学 Bamboo shoot quantity and high growth rate online monitoring method based on deep learning
CN113762314B (en) * 2021-02-02 2023-11-03 北京京东振世信息技术有限公司 Firework detection method and device
CN112907886A (en) * 2021-02-07 2021-06-04 中国石油化工股份有限公司 Refinery plant fire identification method based on convolutional neural network
CN112819001B (en) * 2021-03-05 2024-02-23 浙江中烟工业有限责任公司 Complex scene cigarette packet recognition method and device based on deep learning
CN113033553B (en) * 2021-03-22 2023-05-12 深圳市安软科技股份有限公司 Multi-mode fusion fire detection method, device, related equipment and storage medium
CN113192038B (en) * 2021-05-07 2022-08-19 北京科技大学 Method for recognizing and monitoring abnormal smoke and fire in existing flame environment based on deep learning
CN113409923B (en) * 2021-05-25 2022-03-04 济南大学 Error correction method and system in bone marrow image individual cell automatic marking
CN113553985A (en) * 2021-08-02 2021-10-26 中再云图技术有限公司 High-altitude smoke detection and identification method based on artificial intelligence, storage device and server
CN113657233A (en) * 2021-08-10 2021-11-16 东华大学 Unmanned aerial vehicle forest fire smoke detection method based on computer vision
CN113657250A (en) * 2021-08-16 2021-11-16 南京图菱视频科技有限公司 Flame detection method and system based on monitoring video
CN113792684B (en) * 2021-09-17 2024-03-29 中国科学技术大学 Multi-mode visual flame detection method for fire-fighting robot under weak alignment condition
CN113947744A (en) * 2021-10-26 2022-01-18 华能盐城大丰新能源发电有限责任公司 Fire image detection method, system, equipment and storage medium based on video
CN114120171A (en) * 2021-10-28 2022-03-01 华能盐城大丰新能源发电有限责任公司 Fire smoke detection method, device and equipment based on video frame and storage medium
CN114120208A (en) * 2022-01-27 2022-03-01 青岛海尔工业智能研究院有限公司 Flame detection method, device, equipment and storage medium
CN114664047A (en) * 2022-05-26 2022-06-24 长沙海信智能系统研究院有限公司 Expressway fire identification method and device and electronic equipment
CN115170894B (en) * 2022-09-05 2023-07-25 深圳比特微电子科技有限公司 Method and device for detecting smoke and fire
CN115862258B (en) * 2022-11-22 2023-09-22 中国科学院合肥物质科学研究院 Fire monitoring and disposing system, method, equipment and storage medium
CN115861898A (en) * 2022-12-27 2023-03-28 浙江创悦诚科技有限公司 Flame smoke identification method applied to gas field station
CN117011785B (en) * 2023-07-06 2024-04-05 华新水泥股份有限公司 Firework detection method, device and system based on space-time correlation and Gaussian heat map

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232380A (en) * 2019-06-13 2019-09-13 应急管理部天津消防研究所 Fire night scenes restored method based on Mask R-CNN neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936969B2 (en) * 2016-09-26 2021-03-02 Shabaz Basheer Patel Method and system for an end-to-end artificial intelligence workflow
CN109977790A (en) * 2019-03-04 2019-07-05 浙江工业大学 A kind of video smoke detection and recognition methods based on transfer learning
CN109903507A (en) * 2019-03-04 2019-06-18 上海海事大学 A kind of fire disaster intelligent monitor system and method based on deep learning
CN111539265B (en) * 2020-04-02 2024-01-09 申龙电梯股份有限公司 Method for detecting abnormal behavior in elevator car

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232380A (en) * 2019-06-13 2019-09-13 应急管理部天津消防研究所 Fire night scenes restored method based on Mask R-CNN neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Yuxin. Research on a lightweight key-frame-based behavior recognition method. Chinese Journal of Scientific Instrument, 2020, 41(7); page 3, right column, fourth paragraph from the bottom, to page 5, right column, third paragraph. *

Also Published As

Publication number Publication date
CN112052797A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
CN112052797B (en) MaskRCNN-based video fire disaster identification method and MaskRCNN-based video fire disaster identification system
CN105404847B (en) A kind of residue real-time detection method
CN105844295B (en) A kind of video smoke sophisticated category method based on color model and motion feature
CN112001339A (en) Pedestrian social distance real-time monitoring method based on YOLO v4
CN109460754B (en) A kind of water surface foreign matter detecting method, device, equipment and storage medium
CN110298297B (en) Flame identification method and device
CN110490043A (en) A kind of forest rocket detection method based on region division and feature extraction
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN105654508B (en) Monitor video method for tracking moving target and system based on adaptive background segmentation
CN109389185B (en) Video smoke identification method using three-dimensional convolutional neural network
CN111914665B (en) Face shielding detection method, device, equipment and storage medium
CN112364740B (en) Unmanned aerial vehicle room monitoring method and system based on computer vision
CN109242826B (en) Mobile equipment end stick-shaped object root counting method and system based on target detection
CN111062974A (en) Method and system for extracting foreground target by removing ghost
CN114639075B (en) Method and system for identifying falling object of high altitude parabola and computer readable medium
CN112115878B (en) Forest fire smoke root node detection method based on smoke area density
CN106650638A (en) Abandoned object detection method
CN112417955A (en) Patrol video stream processing method and device
CN112435257A (en) Smoke detection method and system based on multispectral imaging
CN115311623A (en) Equipment oil leakage detection method and system based on infrared thermal imaging
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN110991245A (en) Real-time smoke detection method based on deep learning and optical flow method
CN113177439B (en) Pedestrian crossing road guardrail detection method
CN113657250A (en) Flame detection method and system based on monitoring video
CN117557937A (en) Monitoring camera image anomaly detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant