CN109034033B - Smoke discharge video detection method based on improved VGG16 convolutional network - Google Patents
Smoke discharge video detection method based on improved VGG16 convolutional network
- Publication number
- CN109034033B (application CN201810787200.4A)
- Authority
- CN
- China
- Prior art keywords
- training
- chimney
- convolutional network
- network
- default box
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of target detection, and particularly relates to a smoke discharge video detection method based on an improved VGG16 convolutional network, comprising the following steps. Step 1: generate a chimney smoke discharge image data set. Step 2: feed the training set into the improved VGG16 convolutional network for training to obtain a plurality of weight models. The backbone of the improved VGG16 convolutional network is VGG16; the last two fully connected layers are replaced by two convolutional layers used to extract image features of the chimney at multiple scales, the convolutional layers are connected to a global mean pooling layer, the generated matrices are used for result output, and finally the matrices are fed into a loss function for classification, constructing the complete network structure. The detection speed of the invention is greatly improved, and default boxes with different aspect ratios are used, so that the default boxes can adapt to objects of different shapes and sizes.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a smoke discharge video detection method based on an improved VGG16 convolutional network.
Background
In today's rapidly developing society, "safe, fast and convenient" has become a watchword, and every aspect of life makes use of science and technology. Computer vision is the science of how to make machines see: it uses cameras and computers in place of human eyes to identify, track and measure targets, and further processes the resulting images so that they are better suited to human observation or instrument inspection. Computer vision is now a challenging, important and popular research area.
Chemical plants in China are developing rapidly, chimney smoke discharge is complex, and air pollution is serious. As national environmental protection regulations become increasingly strict, monitoring the smoke discharge of chemical plants grows more important. Air pollution is a major problem that must be addressed for sustainable human development. Although cameras are already used to monitor smoke discharge in some scenarios, people must still watch the monitoring screens, which is time-consuming and labor-intensive and cannot guarantee an accurate, real-time response to the smoke discharge condition.
To meet the requirements of practical application and address the various shortcomings of existing chimney smoke discharge supervision, this invention studies intelligent detection of the smoke discharge condition. With the continuous development of deep learning and the expansion of its application fields, automatic, intelligent, real-time monitoring and detection of chimney smoke discharge has become possible; hence the real-time chimney smoke discharge video detection method based on an improved VGG16 convolutional network is designed.
Disclosure of Invention
The invention applies an improved VGG16 convolutional network to detect smoke discharge in chimney videos and images, and can effectively meet both real-time and accuracy requirements.
The technical scheme of the invention is as follows:
a method for detecting smoke discharge video based on an improved VGG16 convolutional network comprises the following steps:
step 1: generating a chimney emission image dataset;
Step 1.1 Crawling chimney smoke discharge images
Chimney smoke discharge videos are downloaded from web pages and picture frames are captured; the crawled chimney smoke discharge images and the captured frames are integrated to generate a data set.
Step 1.2 data enhancement
And performing data enhancement on the data set generated by the integration in the step 1.1.
The data enhancement comprises rotation, reflection transformation, turning transformation, scaling transformation, translation transformation, scale transformation, contrast transformation, noise disturbance and color transformation. Data enhancement increases the number of images in a data set, improves the recognition capability and generalization capability of a neural network, and thus improves the training precision.
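The transformations listed above can be sketched as follows. This is an illustrative sketch only (the patent specifies no implementation), and the parameter ranges are assumptions:

```python
import numpy as np

def augment(image, rng):
    """Apply a random subset of the augmentations named in step 1.2
    (flipping, rotation, contrast transformation, noise disturbance).
    `image` is an H x W x 3 uint8 array; returns a new uint8 array."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:                           # turning (horizontal flip)
        out = out[:, ::-1, :]
    out = np.rot90(out, k=int(rng.integers(0, 4)))   # rotation by a multiple of 90 degrees
    out = out * rng.uniform(0.8, 1.2)                # contrast transformation
    out = out + rng.normal(0.0, 5.0, out.shape)      # noise disturbance (Gaussian)
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

Each call yields a new random variant, so applying it several times per image enlarges the data set as described.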
Step 1.3 feature tagging
The smoke portion of each smoke discharge image in the enhanced data set is labeled with a rectangular frame to obtain the frame's coordinate information (x, y, w, h), where (x, y) is the center coordinate of the rectangular frame and (w, h) is its width and height. The images with rectangular frame coordinate information form a new data set A, which is used to train the improved VGG16 convolutional network.
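Center-format labels of this kind are typically converted to corner coordinates before computing overlaps later in the pipeline; a minimal illustrative helper (not from the patent):

```python
def center_to_corners(box):
    """Convert a label (x, y, w, h) -- (x, y) the rectangle center,
    (w, h) its width and height -- to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = box
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)
```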
Step 2: obtaining an optimal weight model;
the labeled images in data set a are divided into a training set Q1, a validation set Q2, and a test set Q3.
Further, the training set Q1 is set to 60% of data set A, the validation set Q2 to 20%, and the test set Q3 to 20%.
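The 60/20/20 split can be sketched as follows (an illustrative helper; the patent does not prescribe an implementation):

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle data set A and split it 60/20/20 into training set Q1,
    validation set Q2 and test set Q3, as in step 2."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(0.6 * len(items))
    n_val = int(0.2 * len(items))
    q1 = items[:n_train]
    q2 = items[n_train:n_train + n_val]
    q3 = items[n_train + n_val:]
    return q1, q2, q3
```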
Step 2.1 training Generation of multiple weight models
And (3) sending the training set Q1 into an improved VGG16 convolutional network for training to obtain a plurality of weight models.
The backbone of the improved VGG16 convolutional network is VGG16. The last two fully connected layers are changed into two convolutional layers used to extract image features of the chimney at multiple scales; the convolutional layers are then connected to a global mean pooling layer, generating a 1 × N matrix used for result output; finally the matrix is fed into the softmax loss function for classification, constructing the complete network structure. The global mean pooling layer averages each whole feature map, which greatly reduces the total number of model parameters and improves detection speed.
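The claim that replacing the fully connected layers with convolutions plus global mean pooling sharply reduces the parameter count can be sanity-checked with a rough count. The fc6/fc7 sizes below are standard VGG16; the replacement convolution sizes are assumptions for illustration only:

```python
def fc_params(in_features, out_features):
    # weights + biases of a fully connected layer
    return in_features * out_features + out_features

def conv_params(in_ch, out_ch, k):
    # weights + biases of a k x k convolutional layer
    return in_ch * out_ch * k * k + out_ch

# VGG16's original fc6 (7*7*512 -> 4096) and fc7 (4096 -> 4096)
fc_total = fc_params(7 * 7 * 512, 4096) + fc_params(4096, 4096)

# Illustrative replacement: two 3x3 convolutions (512 -> 1024 -> 1024);
# the global mean pooling layer itself has no parameters.
conv_total = conv_params(512, 1024, 3) + conv_params(1024, 1024, 3)

print(fc_total, conv_total)  # the convolutional branch is far smaller
```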
The specific steps of training the training set Q1 with the improved VGG16 convolutional network in step 2.1 are as follows:
step 2.1.1 obtaining positive and negative samples for new network training
The default box is the box automatically generated by the improved VGG16 convolutional network to mark the chimney. The area S_k of the default box is as follows:

S_k = S_min + (S_max − S_min)(k − 1)/(m − 1), k ∈ [1, m]

where S_min represents the area of the bottommost layer and takes the value 0.2; S_max represents the area of the topmost layer and takes the value 0.9; m is the number of feature layers.

The width of the default box is:

w_k = S_k·√(a_r) + Δw

where Δw is the width offset.

The height of the default box is:

h_k = S_k/√(a_r) + Δh

where Δh is the height offset.
The default box has 5 aspect ratios a_r: a_r = {1, 4/3, 16/9, 3/4, 9/16}.
The default box has different sizes in different characteristic layers, and has different length-width ratios in the same characteristic layer, so that the default box can cover smoke with various shapes and sizes in the input smoke discharge image.
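The scale and shape rules above can be sketched as follows. This assumes the SSD-style scale formula implied by S_min = 0.2 and S_max = 0.9, with the offsets Δw, Δh defaulting to zero for illustration:

```python
import math

def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Scale of each of m feature layers:
    S_k = S_min + (S_max - S_min) * (k - 1) / (m - 1), k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def default_box_shapes(s_k, aspect_ratios=(1, 4/3, 16/9, 3/4, 9/16),
                       dw=0.0, dh=0.0):
    """Width/height of every default box at scale s_k:
    w = S_k * sqrt(a_r) + dw, h = S_k / sqrt(a_r) + dh."""
    return [(s_k * math.sqrt(a) + dw, s_k / math.sqrt(a) + dh)
            for a in aspect_ratios]
```

With three feature layers this gives scales of roughly 0.2, 0.55 and 0.9, and five box shapes per scale.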
The default boxes are compared with the labeled rectangular frames in the training set Q1: when the overlap between a default box and a labeled rectangular frame exceeds 0.5, it is marked as a positive sample; when the overlap is below 0.3, it is marked as a negative sample. The generated positive and negative samples are kept at a ratio of 1:3.
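A minimal sketch of this matching step, treating "overlap" as intersection over union and truncating negatives to reach the stated 1:3 ratio (both interpretations are assumptions where the text is terse):

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_default_boxes(default_boxes, gt_box, pos_thr=0.5, neg_thr=0.3):
    """Mark default boxes positive (overlap > 0.5 with the labeled
    rectangle) or negative (overlap < 0.3); boxes in between are ignored.
    Negatives are truncated to keep positives:negatives at 1:3."""
    pos, neg = [], []
    for i, db in enumerate(default_boxes):
        o = iou(db, gt_box)
        if o > pos_thr:
            pos.append(i)
        elif o < neg_thr:
            neg.append(i)
    return pos, neg[:3 * len(pos)]
```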
step 2.1.2 obtaining confidence coefficient and coordinate value of the category of the default box
The positive and negative samples obtained in step 2.1.1 are fed into the improved VGG16 convolutional network for training, yielding three feature layers. Each of the three feature layers is convolved with two different 3 × 3 convolution kernels to obtain two 1 × N matrix outputs: one is the classification confidence, i.e. the confidence of the category of each default box; the other is the regression position, i.e. the coordinate values (x, y, w, h) of each default box.
Step 2.1.3 model adjustment
The three feature layers and the training set Q1 are fed to the improved VGG16 convolutional network, fitted with the Smooth L1 function, and a coordinate vector is output.
And combining the output coordinate vector, the confidence coefficient of the type of the default box obtained in the step 2.1.2 and the coordinate value of the default box by using a Concat method, and then performing model adjustment by using a loss function softmax to obtain a plurality of default boxes.
Step 2.1.4 Generation of multiple weight models
And (4) screening the plurality of default boxes in the step 2.1.3 by using a non-maximum suppression algorithm to obtain a default box A, and generating a plurality of weight models by using the default box A.
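The non-maximum suppression screening can be sketched as a greedy loop. The 0.45 overlap threshold is an illustrative assumption; the patent does not state one:

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thr=0.45):
    """Non-maximum suppression: greedily keep the highest-scoring box
    and drop every remaining box overlapping it by more than iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```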
Step 2.2 Parameter tuning on the validation set
The validation set Q2 is used to tune model parameters. After the weight models have been trained on the training set Q1, each weight model is used to predict on the validation set Q2 and its accuracy is recorded, in order to find the most effective weight model. The parameters corresponding to the weight model with the highest accuracy are selected and used to generate the optimal weight model.
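Step 2.2 reduces to a model-selection loop; a minimal sketch, in which `weight_models` and `evaluate` (mapping a model to its validation accuracy on Q2) are hypothetical names:

```python
def select_best_model(weight_models, evaluate):
    """Evaluate every trained weight model on the validation set and
    return the name and accuracy of the most accurate one."""
    accuracies = {name: evaluate(model) for name, model in weight_models.items()}
    best = max(accuracies, key=accuracies.get)
    return best, accuracies[best]
```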
Step 2.3 Measuring network performance
The optimal weight model obtained in step 2.2 is used to predict on the test set Q3, measuring the performance of the improved VGG16 convolutional network.
And step 3: testing the chimney emission video and image;
The video monitoring equipment is connected to a computer, and the monitored chimney video or image is detected with the optimal weight model generated in step 2; the smoke discharge condition of the chimney is monitored in real time, and the smoke discharge is marked with a rectangular frame and displayed on the computer.
The invention has the beneficial effects that:
(1) the improved VGG16 convolutional network extracts a default box under a multi-scale feature map, so that the detection accuracy is high.
(2) The improved VGG16 convolutional network changes the traditional full-connection layer into a network structure of a convolutional layer and a global mean pooling layer, so that the detection speed is greatly improved.
(3) The default boxes with different length-width ratios are used, so that a plurality of default boxes can adapt to different shapes and sizes of different objects.
Drawings
FIG. 1 is a flow chart of a method provided by the present invention.
Fig. 2 is a network framework of the improved VGG16 convolutional network provided by the present invention.
Fig. 3 is a flow chart of an improved VGG16 convolutional network provided by the present invention.
Detailed Description
The following detailed description of specific embodiments of the present invention is provided in connection with the accompanying drawings.
A method for detecting smoke discharge video based on an improved VGG16 convolutional network comprises the following steps:
step 1: generating a chimney emission image dataset;
Step 1.1 Crawling chimney smoke discharge images
Chimney smoke discharge videos are downloaded from web pages via a Python script and picture frames are captured; the crawled chimney smoke discharge images and the captured frames are integrated to generate a data set.
Step 1.2 data enhancement
And performing data enhancement on the data set generated by the integration in the step 1.1.
The data enhancement comprises rotation, reflection transformation, turning transformation, scaling transformation, translation transformation, scale transformation, contrast transformation, noise disturbance and color transformation. Data enhancement increases the number of images in a data set, improves the recognition capability and generalization capability of a neural network, and thus improves the training precision.
Step 1.3 feature tagging
The smoke portion of each smoke discharge image in the enhanced data set is labeled with a rectangular frame to obtain the frame's coordinate information (x, y, w, h), where (x, y) is the center coordinate of the rectangular frame and (w, h) is its width and height. The images with rectangular frame coordinate information form a new data set A, which is used to train the improved VGG16 convolutional network.
Step 2: obtaining an optimal weight model;
the labeled images in data set a are divided into a training set Q1, a validation set Q2, and a test set Q3.
Further, the training set Q1 is set to 60% of data set A, the validation set Q2 to 20%, and the test set Q3 to 20%.
Step 2.1 training Generation of multiple weight models
And (3) sending the training set Q1 into an improved VGG16 convolutional network for training to obtain a plurality of weight models.
The backbone of the improved VGG16 convolutional network is VGG16. The last two fully connected layers are changed into two convolutional layers used to extract image features of the chimney at multiple scales; the convolutional layers are then connected to a global mean pooling layer, generating a 1 × N matrix used for result output; finally the matrix is fed into the softmax loss function for classification, constructing the complete network structure. The global mean pooling layer averages each whole feature map, which greatly reduces the total number of model parameters and improves detection speed.
Further, the specific steps of training the training set Q1 with the improved VGG16 convolutional network in step 2.1 are as follows:
step 2.1.1 obtaining positive and negative samples for new network training
The default box is the box automatically generated by the improved VGG16 convolutional network to mark the chimney. The area S_k of the default box is as follows:

S_k = S_min + (S_max − S_min)(k − 1)/(m − 1), k ∈ [1, m]

where S_min represents the area of the bottommost layer and takes the value 0.2; S_max represents the area of the topmost layer and takes the value 0.9; m is the number of feature layers.

The width of the default box is:

w_k = S_k·√(a_r) + Δw

where Δw is the width offset.

The height of the default box is:

h_k = S_k/√(a_r) + Δh

where Δh is the height offset.
The default box has 5 aspect ratios a_r: a_r = {1, 4/3, 16/9, 3/4, 9/16}.
The default box has different sizes in different characteristic layers, and has different length-width ratios in the same characteristic layer, so that the default box can cover smoke with various shapes and sizes in the input smoke discharge image.
The default boxes are compared with the labeled rectangular frames in the training set Q1: when the overlap between a default box and a labeled rectangular frame exceeds 0.5, it is marked as a positive sample; when the overlap is below 0.3, it is marked as a negative sample. The generated positive and negative samples are kept at a ratio of 1:3.
step 2.1.2 obtaining confidence coefficient and coordinate value of the category of the default box
The positive and negative samples obtained in step 2.1.1 are fed into the improved VGG16 convolutional network for training, yielding three feature layers. Each of the three feature layers is convolved with two different 3 × 3 convolution kernels to obtain two 1 × N matrix outputs: one is the classification confidence, i.e. the confidence of the category of each default box; the other is the regression position, i.e. the coordinate values (x, y, w, h) of each default box.
Step 2.1.3 model adjustment
The three feature layers and the training set Q1 are fed to the improved VGG16 convolutional network, fitted with the Smooth L1 function, and a coordinate vector is output.
And combining the output coordinate vector, the confidence coefficient of the type of the default box obtained in the step 2.1.2 and the coordinate value of the default box by using a Concat method, and then performing model adjustment by using a loss function softmax to obtain a plurality of default boxes.
Step 2.1.4 Generation of multiple weight models
And (4) screening the plurality of default boxes in the step 2.1.3 by using a non-maximum suppression algorithm to obtain a default box A, and generating a plurality of weight models by using the default box A.
Step 2.2 Parameter tuning on the validation set
The validation set Q2 is used to tune model parameters. After the weight models have been trained on the training set Q1, each weight model is used to predict on the validation set Q2 and its accuracy is recorded, in order to find the most effective weight model. The parameters corresponding to the weight model with the highest accuracy are selected and used to generate the optimal weight model.
Step 2.3 Measuring network performance
The optimal weight model obtained in step 2.2 is used to predict on the test set Q3, measuring the performance of the improved VGG16 convolutional network.
And step 3: testing the chimney emission video and image;
The video monitoring equipment is connected to a computer, and the monitored chimney video or image is detected with the optimal weight model generated in step 2; the smoke discharge condition of the chimney is monitored in real time, and the smoke discharge is marked with a rectangular frame and displayed on the computer.
Claims (5)
1. A method for detecting smoke discharge video based on an improved VGG16 convolutional network is characterized by comprising the following steps:
step 1: generating a chimney emission image dataset;
step 1.1 crawling chimney smoke discharge images
Downloading a chimney smoke exhaust video from a webpage, and intercepting picture frames; integrating the chimney smoke exhaust image and the picture frame to generate a data set;
step 1.2 data enhancement
Performing data enhancement on the data set generated by integration in the step 1.1;
step 1.3 feature tagging
Carrying out feature labeling on the smoke portion of each smoke discharge image in the enhanced data set with a rectangular frame to obtain coordinate information (x, y, w, h) of the rectangular frame, wherein (x, y) is the center coordinate of the rectangular frame and (w, h) is its width and height, and generating a new data set A from the images with rectangular frame coordinate information, the data set A being used for training the improved VGG16 convolutional network;
step 2: obtaining an optimal weight model;
dividing the marked images in the data set A into a training set Q1, a verification set Q2 and a test set Q3;
step 2.1 training Generation of multiple weight models
Sending the training set Q1 into an improved VGG16 convolutional network for training to obtain a plurality of weight models;
the main network structure of the improved VGG16 convolutional network is VGG16, the last two full-connection layers are changed into two convolutional layers and used for extracting image characteristics of a chimney in a multi-scale mode, then the convolutional layers are connected with a global mean pooling layer, a matrix of 1 multiplied by N is generated and used for result output, and finally the matrix is input into a loss function softmax for classification to construct a complete network structure; the global mean pooling layer is used for averaging the whole feature map, so that the total parameter number of the model can be greatly reduced, and the detection speed is improved;
the specific steps of training the training set Q1 with the improved VGG16 convolutional network in step 2.1 are as follows:
step 2.1.1 obtaining positive and negative samples for new network training
The default box is the box automatically generated by the improved VGG16 convolutional network to mark the chimney, and the area S_k of the default box is as follows:

S_k = S_min + (S_max − S_min)(k − 1)/(m − 1), k ∈ [1, m]

wherein S_min represents the area of the bottommost layer; S_max represents the area of the topmost layer; m is the number of feature layers;

the width of the default box is:

w_k = S_k·√(a_r) + Δw

wherein Δw is the width offset;

the height of the default box is:

h_k = S_k/√(a_r) + Δh

wherein Δh is the height offset;
the default box has 5 aspect ratios a_r: a_r = {1, 4/3, 16/9, 3/4, 9/16};
Comparing the default boxes with the labeled rectangular frames in the training set Q1: when the overlap between a default box and a labeled rectangular frame exceeds 0.5, it is marked as a positive sample; when the overlap is below 0.3, it is marked as a negative sample; the ratio of generated positive to negative samples is 1:3;
step 2.1.2 obtaining confidence coefficient and coordinate value of the category of the default box
Putting the positive and negative samples obtained in step 2.1.1 into the improved VGG16 convolutional network for training, yielding three feature layers; each of the three feature layers is convolved with two different 3 × 3 convolution kernels to obtain two 1 × N matrix outputs: one is the classification confidence, i.e. the confidence of the category of each default box; the other is the regression position, i.e. the coordinate values (x, y, w, h) of each default box;
step 2.1.3 model adjustment
Sending the three characteristic layers and a training set Q1 to an improved VGG16 convolutional network, fitting by using a SmoothL1 function, and outputting a coordinate vector;
merging the output coordinate vector, the confidence coefficient of the type of the default box obtained in the step 2.1.2 and the coordinate value of the default box by using a Concat method, and then performing model adjustment by using a loss function softmax to obtain a plurality of default boxes;
step 2.1.4 Generation of multiple weight models
Screening the plurality of default boxes in the step 2.1.3 by using a non-maximum suppression algorithm to obtain a default box A, and generating a plurality of weight models by the default box A;
step 2.2 verification of the set of parameters
Predicting the verification set Q2 by using each weight model, and recording the accuracy of the weight model; selecting a parameter corresponding to the weight model with the highest accuracy, and generating an optimal weight model by using the parameter;
step 2.3 measuring network Performance
Predicting on the test set Q3 with the optimal weight model obtained in step 2.2, and measuring the performance of the improved VGG16 convolutional network;
and step 3: testing the chimney emission video and image;
and connecting the video monitoring equipment to a computer, detecting the monitored chimney video or image with the optimal weight model generated in step 2, monitoring the smoke discharge condition of the chimney in real time, and marking the smoke discharge with a rectangular frame displayed on the computer.
2. The method for detecting the smoke discharge video based on the improved VGG16 convolutional network as claimed in claim 1, wherein in the step 1.2, the data enhancement comprises rotation, reflection transformation, flipping transformation, scaling transformation, translation transformation, scale transformation, contrast transformation, noise disturbance and color transformation.
3. The method for detecting the smoke discharge video based on the improved VGG16 convolutional network as claimed in claim 1 or 2, wherein in the step 2, a training set Q1 is set to be 60% of a data set A; the validation set Q2 accounts for 20% of the data set a; test set Q3 accounted for 20% of data set a.
4. The method for detecting smoke discharge video based on the improved VGG16 convolutional network as claimed in claim 1 or 2, wherein in step 2.1.1, S_min takes the value 0.2 and S_max takes the value 0.9.
5. The method for detecting smoke discharge video based on the improved VGG16 convolutional network as claimed in claim 3, wherein in step 2.1.1, S_min takes the value 0.2 and S_max takes the value 0.9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810787200.4A CN109034033B (en) | 2018-07-16 | 2018-07-16 | Smoke discharge video detection method based on improved VGG16 convolutional network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810787200.4A CN109034033B (en) | 2018-07-16 | 2018-07-16 | Smoke discharge video detection method based on improved VGG16 convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109034033A CN109034033A (en) | 2018-12-18 |
CN109034033B true CN109034033B (en) | 2021-05-14 |
Family
ID=64643109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810787200.4A Active CN109034033B (en) | 2018-07-16 | 2018-07-16 | Smoke discharge video detection method based on improved VGG16 convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109034033B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084201B (en) * | 2019-04-29 | 2022-09-13 | 福州大学 | Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene |
CN110991243A (en) * | 2019-11-01 | 2020-04-10 | 武汉纺织大学 | Straw combustion identification method based on combination of color channel HSV and convolutional neural network |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10373312B2 (en) * | 2016-11-06 | 2019-08-06 | International Business Machines Corporation | Automated skin lesion segmentation using deep side layers |
CN108256400A (en) * | 2016-12-29 | 2018-07-06 | 上海玄彩美科网络科技有限公司 | The method for detecting human face of SSD based on deep learning |
CN107123123B (en) * | 2017-05-02 | 2019-06-25 | 电子科技大学 | Image segmentation quality evaluating method based on convolutional neural networks |
CN108009526A (en) * | 2017-12-25 | 2018-05-08 | 西北工业大学 | A kind of vehicle identification and detection method based on convolutional neural networks |
- 2018-07-16: application CN201810787200.4A filed in China; granted as patent CN109034033B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN109034033A (en) | 2018-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |