CN111507416A - Smoking behavior real-time detection method based on deep learning - Google Patents

Info

Publication number
CN111507416A
Authority
CN
China
Prior art keywords
smoking
data set
picture
data
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010314703.7A
Other languages
Chinese (zh)
Other versions
CN111507416B (en
Inventor
莫益军
刘金阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Mastop Technology Co ltd
Original Assignee
Hubei Mastop Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Mastop Technology Co ltd filed Critical Hubei Mastop Technology Co ltd
Priority to CN202010314703.7A priority Critical patent/CN111507416B/en
Publication of CN111507416A publication Critical patent/CN111507416A/en
Application granted granted Critical
Publication of CN111507416B publication Critical patent/CN111507416B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention is suitable for the technical field of artificial intelligence and behavior detection, and provides a smoking behavior real-time detection method based on deep learning.

Description

Smoking behavior real-time detection method based on deep learning
Technical Field
The invention belongs to the technical field of artificial intelligence and behavior detection, and particularly relates to a smoking behavior real-time detection method based on deep learning.
Background
Human behavior recognition is a basic task in computer vision. Vision-based human behavior recognition is the process of labeling an image sequence with action tags, and reliable solutions to this problem are used in video surveillance, video retrieval, human-computer interaction, video understanding and similar fields. For example, surveillance based on behavior recognition can be applied to elderly care in smart homes, such as detecting falls; video retrieval can find a slide tackle in a football match, a handshake in a news clip, or a typical dance move in a music video; interactive applications include human-computer interaction and in-game applications. The task is very challenging because of variations in how actions are performed, in recording settings, and between individuals.
Smoking behavior detection is one category of human behavior detection, and it has been studied to some extent. One approach uses a feature extraction algorithm to extract features from a video and judges from those features whether smoking behavior is present. It has three main steps: identifying moving regions within a frame of the video stream, for example with a frame-difference algorithm; searching each identified region for smoke features, for example obtaining smoke features from a color histogram or obtaining the pixel features of smoke through a pixel analysis algorithm; and inferring the presence of smoke from the extracted features, for example by building a Gaussian model or hidden Markov model to match smoking behavior, or by classifying the features with an SVM classifier. Algorithms of this kind extract only a narrow set of features, so they generally generalize poorly and recognize smoking behavior with low accuracy in complex environments; the models are also complex to build, and real-time performance in practice is poor. The other approach extracts features from video stream images with a convolutional neural network, locates the smoking position in the image, and judges the behavior category. This technique can extract rich smoking behavior features and is relatively accurate, and deep learning algorithms generally offer very good real-time performance. However, owing to the lack of relevant data sets, the research is not deep enough and trained models do not perform well in practical applications; moreover, existing networks are simply applied as-is, and no dedicated network has been constructed for smoking behavior detection according to smoking characteristics.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method for detecting smoking behavior in real time based on deep learning, and to solve the above problems.
The invention adopts the following technical scheme:
the smoking behavior real-time detection method based on deep learning comprises the following steps:
acquiring and labeling smoking behavior data to obtain a data set, wherein the data set comprises a training sample set, a verification sample set and a test sample set, and each sample set is divided into a positive sample and a negative sample;
preprocessing the data set according to the characteristics of the smoking behavior data set, wherein the preprocessing comprises data augmentation, data set standardization and data set normalization;
combining deep learning with the characteristics of smoking behavior to construct a smoking convolutional neural network for detecting smoking behavior;
setting the network training parameters of the smoking convolutional neural network, training it, and after obtaining the network model, calculating the mean average precision (mAP) on the test sample set to verify the model;
and generating and storing the available model as required, and applying the stored model to real-world smoking behavior detection scenarios.
Further, the data augmentation processing specifically includes the following steps:
randomly flipping each picture in the data set to increase the number of pictures in the data set, wherein the random flipping comprises horizontal flipping, vertical flipping and horizontal and vertical flipping;
applying a gamma transformation to the randomly flipped data set pictures to enhance picture color;
and augmenting the positive samples in the gamma-transformed data set through the SMOTE algorithm.
Further, the input of the smoking convolutional neural network is a sample RGB picture. The network begins with a convolutional network of 2 convolutional layers, followed by a plurality of residual modules, with a down-sampling layer consisting of 1 convolutional layer between adjacent residual modules; 3 convolutional layers then serve as fully connected layers, and classification is finally performed by softmax.
Furthermore, each of the first two convolutional layers of the smoking convolutional neural network is designed with convolution kernels of four sizes, and each convolution path in these layers is further processed by batch normalization and a ReLU activation function, where the batch normalization formula is:
$$r = \phi \cdot \frac{t - E[t]}{\sqrt{Var[t] + \epsilon}} + \beta$$
where t is the data of each training batch, E[t] is the mean of the data t of each training batch, Var[t] is the variance of the data t of each training batch, ε is a tiny positive number used to avoid a divisor of 0, φ is the scale factor, and β is the translation factor.
Further, the residual module includes two convolutional layers. Let u be the input of the residual module; after two layers of convolution, each followed by batch normalization and ReLU activation, with the two-layer convolution denoted F(·), the output of the residual module is O(u) = u + F(u).
Further, the network parameters are updated by minimizing a loss function, wherein the loss function is calculated as follows:
Figure BDA0002458982690000032
where $(x_i, y_i, w_i, h_i)$ are the position and size in the picture of the ground truth box labeled in the data set, and $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ is the prediction of the smoking convolutional neural network for the target position; $p_i(c)$ is the class label of the ground truth, and $\hat{p}_i(c)$ is the prediction of the smoking convolutional neural network for the target class label; $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weights of the three parts. The loss function is minimized to update the network parameters; that is, the weights are updated continuously so that the loss keeps decreasing and the predictions become progressively more accurate.
Further, there are three ways to store an available model: saving the whole model, saving the model structure and weights separately, and saving the model graph.
The beneficial effects of the invention are as follows. By collecting and labeling smoking behavior, the method improves the process of building a smoking behavior detection data set and enriches its variety and size. By designing a smoking convolutional neural network specifically for smoking behavior detection, it extracts the features of the detection target more accurately. Combining batch normalization with ReLU activation helps prevent the network from overfitting, and the residual module structure suppresses vanishing and exploding gradients during training, improving network performance. Data preprocessing increases the size of the data set, processes it uniformly, and removes a large amount of noise, improving the training effect.
Drawings
FIG. 1 is a flowchart of a method for detecting smoking behavior in real time based on deep learning according to an embodiment of the present invention;
FIG. 2 is a block diagram of the constructed smoking convolutional neural network;
FIG. 3 is a block diagram of the first two convolutional layers of the smoking convolutional neural network;
FIG. 4 is a structural diagram of a residual module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
As shown in FIG. 1, the method for detecting smoking behavior in real time based on deep learning provided in this embodiment includes the following steps:
Step S1: acquire and label smoking behavior data to obtain a data set, wherein the data set comprises a training sample set, a verification sample set and a test sample set, and each sample set is divided into positive samples and negative samples.
This step constructs the smoking behavior data set: by collecting and labeling smoking data, it obtains the training sample set and verification sample set for the training stage and the test sample set for the testing stage. Each sample set is divided into positive and negative samples, where a positive sample is a picture containing smoking behavior and a negative sample is a picture without smoking behavior.
Positive samples are pictures containing smoking behavior. The data must be diverse so that the trained model is more accurate. In this embodiment, the positive sample pictures have the following characteristics:
a) The pictures should span different times of day, since the time of day affects picture quality;
T = {t1, t2}, t1 ∈ daytime, t2 ∈ nighttime.
b) The pictures should cover various kinds of weather, since weather conditions affect the lighting of the picture;
W = {w1, w2, w3, w4}
w1: sunny, w2: rainy, w3: snowy, w4: cloudy.
c) The pictures should cover a range of crowd densities, from few people to many people;
Q = {q1, q2, q3}
Q ≤ q1: low crowd density picture, Q ≤ q2: medium crowd density picture, Q ≤ q3: high crowd density picture.
d) The pictures should cover different geographical locations so that they have rich backgrounds;
L = {l1, l2}, l1 ∈ countryside, l2 ∈ city.
e) The distance of the target object should vary across pictures, including far, middle and near distances;
D = {d1, d2, d3}
D ≤ d1: close-range picture, D ≤ d2: middle-distance picture, D ≤ d3: long-distance picture.
f) The pictures should be clear, not damaged or blurred;
g) The pictures should be authentic and should not contain unrealistic content such as cartoons or comics;
h) The pictures should be of moderate size, about 416 × 416 or 512 × 512;
i) The number of pictures should be of the same order of magnitude as existing large data sets; the number directly affects the generalization ability of the algorithm, so the positive samples should include a large number of pictures.
Positive sample data are acquired mainly by collecting pictures or videos of smoking behavior from the Internet, for example by keyword search in a search engine. Keywords such as cigarette, smoke, tobacco and smoking can be used, and the pictures returned for these keywords can be crawled from the web pages with crawler software. Online smoking videos can also be recorded from channels such as video players. In addition, natural smoking behavior can be filmed outdoors with a mobile phone or camera, actors can be filmed performing smoking behavior, and footage can be copied from video recordings made in places such as smoking rooms. If conditions allow, pictures or videos can also be taken directly from existing smoking behavior data sets.
After the smoking videos and pictures are obtained as described above, they are labeled with an annotation tool. A picture annotation tool can be developed with OpenCV (an open-source computer vision library). The tool provides a candidate box whose size can be adjusted manually; the coordinates of the upper-left and lower-right corners of the target in the picture are marked and saved to the local disk. Because the input of the convolutional neural network designed in this embodiment is a square picture, the candidate box of the tool should be square, and all saved training pictures have an aspect ratio of 1:1. This finally yields D1 positive samples of smoking behavior.
Negative samples are pictures whose background is similar to that of the positive samples but which contain no cigarette target. In this embodiment, the negative samples have the following characteristics:
1. The pictures should cover a wide, varied range of backgrounds that are similar to the positive sample backgrounds;
2. The pictures should be clear, not damaged or blurred;
3. The pictures should be authentic and should not contain unrealistic content such as cartoons or comics;
4. The pictures should be of moderate size, about 416 × 416 or 512 × 512;
5. The pictures should be RGB images, not black-and-white or CMYK pictures;
6. The number of pictures should be sufficient, with the ratio of positive to negative samples kept at about 1:3.
Negative samples can be obtained by extracting frames from film and television works or online videos, by searching websites, or by integrating existing data sets. In addition, non-target pictures produced while making the positive samples can be saved as negative samples. The negative samples contain no smoking behavior but resemble the positive samples, and D2 negative samples are obtained in this way.
After the smoking data are acquired and labeled according to the above requirements, the training sample set and verification sample set for the training stage and the test sample set for the testing stage are obtained. The test sample set contains both positive and negative samples. Pictures are randomly assigned to the training stage and the testing stage in a ratio of 4:1, and within the training stage, training samples and verification samples are randomly assigned in a ratio of 3:1. That is, the total number of samples in the data set is D = D1 + D2, and D is divided into three parts: the training sample set with 0.6 × D samples, the verification sample set with 0.2 × D samples, and the test sample set with 0.2 × D samples.
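The two-stage random split described above can be sketched as follows. This is a minimal illustration, not the embodiment's actual tooling: the function name `split_dataset` and the fixed seed are assumptions introduced here.

```python
import random

def split_dataset(samples, seed=0):
    """Split samples into (train, val, test): 4:1 first, then 3:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # random assignment of pictures
    n_test = len(items) // 5             # 4:1 -> one fifth goes to testing
    test, training_stage = items[:n_test], items[n_test:]
    n_val = len(training_stage) // 4     # 3:1 -> one quarter of the training stage
    val, train = training_stage[:n_val], training_stage[n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))   # 600 200 200
```

With 1000 samples this gives the 0.6 × D, 0.2 × D and 0.2 × D proportions stated in the text.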
Step S2: preprocess the data set according to the characteristics of the smoking behavior data set, including data augmentation, data set standardization and data set normalization.
Data augmentation refers to expanding the set of pictures in the data set. Specifically, the data augmentation process includes the following steps:
S21: random flipping. Each picture in the data set is randomly flipped to expand the number of pictures in the data set, where random flipping comprises horizontal flipping, vertical flipping, and horizontal-vertical flipping.
This embodiment uses random flipping to expand the number of pictures in the data set. Each picture in the data set D is flipped in one of three forms, each with probability 1/3: horizontal flipping, vertical flipping, and horizontal-vertical flipping.
Horizontal flipping swaps pixels symmetrically left and right about the vertical center line of the picture. Specifically, for an RGB picture, the pixel matrix on each channel changes under horizontal flipping as follows:
A(x,y)=A(x,w-y)
where w is the picture width.
Vertical flipping swaps pixels symmetrically up and down about the horizontal center line of the picture. For an RGB picture, the pixel matrix on each channel changes under vertical flipping as follows:
A(x,y)=A(h-x,y)
where h is the height of the picture.
Horizontal-vertical flipping first swaps pixels symmetrically left and right about the vertical center line and then swaps them symmetrically up and down about the horizontal center line, producing a new picture. Specifically, for an RGB picture, the pixel matrix on each channel changes under horizontal-vertical flipping as follows:
A(x,y)=A(h-x,w-y)
Each picture is randomly flipped in one of the above 3 ways with equal probability to generate a new picture.
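The three flips can be sketched in plain Python as below. The representation (a picture as a list of rows, each row a list of pixel values) and the function names are assumptions made for illustration; note that with zero-based indices the mirrored column is w − 1 − y and the mirrored row is h − 1 − x.

```python
import random

def flip_horizontal(img):
    """Mirror left-right: reverse every row."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror up-down: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]

def flip_both(img):
    """Horizontal flip followed by vertical flip."""
    return flip_vertical(flip_horizontal(img))

def random_flip(img, rng=random):
    """Apply one of the three flips, each with probability 1/3."""
    return rng.choice([flip_horizontal, flip_vertical, flip_both])(img)

img = [[1, 2, 3],
       [4, 5, 6]]
print(flip_horizontal(img))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(img))    # [[4, 5, 6], [1, 2, 3]]
print(flip_both(img))        # [[6, 5, 4], [3, 2, 1]]
```

For an RGB picture the same functions work unchanged if each pixel is an (R, G, B) tuple, since only pixel positions move.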
S22: color enhancement. A gamma transformation is applied to the randomly flipped data set pictures to enhance picture color.
This step uses a gamma transformation to change the brightness of the picture. The RGB picture is first converted into a grayscale image as follows:
$$B_{grey}(x,y) = 0.2989\,A_R(x,y) + 0.5870\,A_G(x,y) + 0.1140\,A_B(x,y)$$
where $A_R(x,y)$ is the pixel matrix on the R channel, $A_G(x,y)$ is the pixel matrix on the G channel, and $A_B(x,y)$ is the pixel matrix on the B channel; the calculation yields a single-channel two-dimensional matrix $B_{grey}(x,y)$ holding the gray values of the picture.
The pixels of the grayscale image are normalized to the range [0, 1], and the following gamma transformation is applied:
$$C_{gamma}(x,y) = B_{grey}(x,y)^{\gamma}$$
When γ > 1, the whole image becomes darker; when γ < 1, the whole image becomes brighter. Pictures of different brightness can be obtained by varying γ.
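A minimal sketch of the grayscale conversion and gamma transformation follows. It assumes 8-bit pixel values (0 to 255), so scaling to [0, 1] is done by dividing by 255; that scaling and the function names are assumptions, not details given in the text.

```python
def to_gray(rgb):
    """rgb: rows of (R, G, B) tuples in 0..255 -> rows of gray values."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb]

def gamma_transform(gray, gamma):
    """Scale gray values to [0, 1] (assumes 8-bit input), then raise to the power gamma."""
    return [[(v / 255.0) ** gamma for v in row] for row in gray]

gray = to_gray([[(128, 128, 128)]])
darker = gamma_transform(gray, 2.0)    # gamma > 1 darkens
brighter = gamma_transform(gray, 0.5)  # gamma < 1 brightens
print(darker[0][0] < gray[0][0] / 255.0 < brighter[0][0])  # True
```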
S23: increasing the number of pictures through the SMOTE algorithm. The positive samples in the data set are augmented with the SMOTE algorithm. The specific process is as follows:
1) Randomly take a number of samples from the positive samples to form a sub-sample set, and compute the distance from each sample x in the sub-sample set to all other samples in the sub-sample set by the following formula:
Figure BDA0002458982690000081
where $(x_{i1}, y_{i1})$ are the pixel values of sample x on the R, G and B channels, and $(x_{i2}, y_{i2})$ are the pixel values of the other sample on the R, G and B channels. The k samples nearest to sample x are selected, and n samples are randomly chosen from these k neighbors.
2) For each of the n randomly chosen neighbor samples, a new sample is constructed together with the original sample x according to the following formula and added to the data set:
$$y = x + rand(0,1) \cdot \lVert x - x_n \rVert$$
where rand(0, 1) is a random number between 0 and 1, $x_n$ is the sample currently chosen at random from the k neighboring samples, and y is the newly constructed sample.
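The procedure can be sketched as follows under several stated assumptions: each sample is flattened to a numeric vector, the distance is taken to be Euclidean (the distance formula image is not reproduced in this text), and the new sample is built with the standard SMOTE interpolation x + r · (x_n − x), so that it lies on the segment between x and its neighbor. The function name and default parameters are illustrative only.

```python
import math
import random

def smote(samples, k=3, n=1, rng=None):
    """Return synthetic samples; samples are equal-length numeric vectors."""
    rng = rng or random.Random(0)
    new_samples = []
    for i, x in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        # k nearest neighbours of x by Euclidean distance (assumed metric)
        neighbours = sorted(others, key=lambda s: math.dist(x, s))[:k]
        # pick n of the k neighbours at random
        for x_n in rng.sample(neighbours, min(n, len(neighbours))):
            r = rng.random()  # rand(0, 1)
            # standard SMOTE interpolation between x and x_n
            new_samples.append([a + r * (b - a) for a, b in zip(x, x_n)])
    return new_samples

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
synthetic = smote(points, k=2, n=1)
print(len(synthetic))  # 4
```

Each synthetic sample stays inside the range spanned by the original sample and its chosen neighbor, which is why the method adds variety without leaving the region occupied by positive samples.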
After data augmentation, the data set has been expanded by a certain amount, giving a new data set M; the data set M is then standardized and normalized.
The data set M is standardized using the following formula:
Figure BDA0002458982690000091
where X denotes a picture matrix, μ is a mean of the picture, σ is a standard deviation of the picture, and N is the number of pixels of the picture X.
The data set M is normalized using the following formula:
Figure BDA0002458982690000092
where $X_i$ denotes a pixel value of the picture.
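Because the two formula images are not reproduced in this text, the sketch below only illustrates the general idea with two common choices, both of which are assumptions: z-score standardization, (X − μ) / σ, and min-max normalization to [0, 1]. The patent's exact formulas may differ, for example in how N enters the standardization.

```python
import statistics

def standardize(pixels):
    """Z-score standardization of a flat list of pixel values (assumed form)."""
    mu = statistics.fmean(pixels)
    sigma = statistics.pstdev(pixels)          # population standard deviation
    return [(p - mu) / sigma for p in pixels]  # assumes sigma > 0

def min_max_normalize(pixels):
    """Min-max normalization to [0, 1] (assumed form)."""
    lo, hi = min(pixels), max(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]  # assumes hi > lo

print(min_max_normalize([0, 50, 100]))  # [0.0, 0.5, 1.0]
```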
Step S3: combining deep learning with the characteristics of smoking behavior, construct a smoking convolutional neural network for detecting smoking behavior.
A smoking convolutional neural network, SmokingNet, is constructed for smoking behavior detection by combining deep learning methods with the characteristics of smoking behavior. As shown in FIG. 2, the input of SmokingNet in this embodiment is a sample RGB picture of 416 × 416 pixels. The network begins with a convolutional network of 2 convolutional layers. These are followed by 5 residual modules (Residual Blocks), whose lengths are respectively 1, 2, 4, 2. Between adjacent residual modules there is 1 convolutional layer, which acts as a down-sampling layer that reduces the feature dimension, giving 4 down-sampling layers in total. Then 3 convolutional layers serve as fully connected layers, where the number of feature maps of the last fully connected layer is:
Feature maps = (classes + 1 + coords) * anchors_num
where classes is the number of object classes, coords has the value 4, and anchors_num has the value 3.
Finally, classification is performed by softmax.
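As a worked example of the feature-map formula above, the sketch below evaluates it for a single object class. The choice classes = 1 is an assumption made only for illustration; the text does not state the number of classes.

```python
def last_layer_feature_maps(classes, coords=4, anchors_num=3):
    """Feature maps = (classes + 1 + coords) * anchors_num."""
    return (classes + 1 + coords) * anchors_num

print(last_layer_feature_maps(classes=1))  # 18
```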
Referring to FIG. 2 and FIG. 3, the first two layers of SmokingNet are two convolutional layers, whose convolution kernels extract local features of a given image. To improve the detection of smoking behavior, a convolutional neural network with multiple convolution kernels is adopted. Classic convolution kernels are square, whereas a cigarette is a long strip, so this embodiment uses elongated convolution kernels matched to the shape of a cigarette. The first convolutional layer of SmokingNet is designed with convolution kernels of four sizes, as shown in FIG. 2 and FIG. 3: a small 3 × 3 kernel, a large 5 × 5 kernel, a long 7 × 3 kernel and a long 3 × 7 kernel, and the convolution is split into 4 paths. Specifically, the first path is a 5 × 5 convolution with 8 filters, 2 × 2 padding and a stride of 1 × 1; the second path is a 3 × 7 convolution with 8 filters, 1 × 3 padding and a stride of 1 × 1; the third path is a 3 × 3 convolution with 8 filters, 1 × 1 padding and a stride of 1 × 1; the fourth path is a 7 × 3 convolution with 8 filters, 3 × 1 padding and a stride of 1 × 1. Finally, the feature maps from the four paths are concatenated to give a 416 × 416 × 32 output.
The second convolutional layer of SmokingNet is likewise designed with convolution kernels of four sizes: a small 2 × 2 kernel, a large 6 × 6 kernel, a long 6 × 2 kernel and a long 2 × 6 kernel, and the convolution is split into 4 paths. Specifically, the first path is a 6 × 6 convolution with 12 filters, 2 × 2 padding and a stride of 2 × 2; the second path is a 2 × 6 convolution with 12 filters, 0 × 2 padding and a stride of 2 × 2; the third path is a 2 × 2 convolution with 12 filters, 0 × 0 padding and a stride of 2 × 2; the fourth path is a 6 × 2 convolution with 12 filters, 2 × 0 padding and a stride of 2 × 2. Finally, the feature maps from the four paths are concatenated to give a 208 × 208 × 64 output.
Further preferably, as shown in FIG. 3, BN (batch normalization) and ReLU activation functions are also added to the first two convolutional layers; that is, each convolution path is further processed by BN + ReLU.
The batch normalization algorithm is computed as follows:
$$r = \phi \cdot \frac{t - E[t]}{\sqrt{Var[t] + \epsilon}} + \beta$$
The algorithm has 4 main steps:
(1) compute the mean of each training batch of data t, namely E[t];
(2) compute the variance of each training batch of data t, namely Var[t];
(3) normalize the training data of the batch with the obtained mean and variance so that it has zero mean and unit variance, where ε is a tiny positive number used to avoid a divisor of 0;
(4) scale and shift: multiply the normalized t by φ to adjust its magnitude, then add the offset β to obtain r, where φ is the scale factor and β is the translation factor. Because the normalized data is essentially constrained to a standard normal distribution, the expressive capacity of the network is reduced; to solve this problem, two new parameters, φ and β, are introduced. φ and β are learned by the network itself during training.
After batch normalization, the ReLU activation function applies a nonlinear transformation. ReLU is an existing function and is not described in detail here.
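The four batch normalization steps followed by ReLU can be sketched on a one-dimensional batch as below. The parameter names `beta` (translation factor) and `eps` (the tiny positive number) follow common convention and are assumptions here, as are the default values; in a real network φ and β would be learned rather than fixed.

```python
import math

def batch_norm(batch, phi=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean / unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)                         # step (1): E[t]
    var = sum((t - mean) ** 2 for t in batch) / len(batch) # step (2): Var[t]
    return [phi * (t - mean) / math.sqrt(var + eps) + beta # steps (3) and (4)
            for t in batch]

def relu(values):
    """ReLU: keep positive values, replace negatives with 0."""
    return [max(0.0, v) for v in values]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
activated = relu(normalized)
print(all(v >= 0.0 for v in activated))  # True
```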
In this embodiment, a residual network is further introduced into SmokingNet to form residual modules (Residual Blocks). As shown in FIG. 4, a residual module contains two convolutional layers. Let the input of the residual module be u; it passes through two layers of convolution, each followed by BN + ReLU, and the two-layer convolution is denoted F(·). The output of the Residual Block is then:
O(u)=u+F(u)
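The skip connection can be illustrated with a small sketch in which F is passed in as an arbitrary function on vectors. The stand-in functions used below are hypothetical placeholders for the two convolution + BN + ReLU layers, which are not implemented here.

```python
def residual_block(u, f):
    """Return O(u) = u + F(u), computed element-wise."""
    return [ui + fi for ui, fi in zip(u, f(u))]

# If F contributes nothing, the block reduces to the identity mapping.
print(residual_block([1, 2, 3], lambda u: [0] * len(u)))    # [1, 2, 3]
# A non-trivial stand-in for F: double every element.
print(residual_block([1, 2, 3], lambda u: [2 * x for x in u]))  # [3, 6, 9]
```

The first call shows why the structure is easy to train: the input always reaches the output directly, whatever F has learned.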
Between adjacent residual modules there is 1 convolutional layer that halves the size of the feature map. In this embodiment it uses a 3 × 3 convolution kernel with a stride of 2 × 2 and a padding of 1 × 1. If the size of the input feature map is denoted h, the kernel size k, the stride s and the padding p, the size of the output feature map is:
$$h_{out} = \left\lfloor \frac{h + 2p - k}{s} \right\rfloor + 1$$
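The output-size relation can be checked against the layer settings given in this embodiment. The sketch below uses the standard convolution output-size formula with floor division, applied to one spatial dimension at a time; the function name is an assumption.

```python
def conv_output_size(h, k, s, p):
    """Output size along one dimension: floor((h + 2p - k) / s) + 1."""
    return (h + 2 * p - k) // s + 1

# First layer, 5 x 5 path: stride 1, padding 2 keeps the size.
print(conv_output_size(416, k=5, s=1, p=2))  # 416
# Second layer, 6 x 6 path: stride 2, padding 2 halves the size.
print(conv_output_size(416, k=6, s=2, p=2))  # 208
# Down-sampling layer between residual modules: 3 x 3, stride 2, padding 1.
print(conv_output_size(208, k=3, s=2, p=1))  # 104
```

For the elongated kernels the same function is applied separately to height and width, for example k = 3, p = 1 along one axis and k = 7, p = 3 along the other, both of which keep 416 at stride 1.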
Step S4: set the network training parameters of the smoking convolutional neural network, train it, and after obtaining the network model, calculate the mean average precision (mAP) on the test sample set to verify the model.
The smoking convolutional neural network can extract features from the picture and acquire the position of a detection target according to the features, and a loss function is calculated according to the following formula:
Figure BDA0002458982690000112
where (x_i, y_i, w_i, h_i) is the position and size in the picture of the ground truth annotated in the data set,
Figure BDA0002458982690000113
is the smoking convolutional neural network's prediction of the target position; p_i(c) is the class label of the ground truth itself,
Figure BDA0002458982690000114
is the smoking convolutional neural network's prediction of the target class label; and λ1, λ2, λ3 are the weights of the three parts. The network parameters are updated by minimizing the loss function: the weights are updated continually so that the loss keeps decreasing and the prediction results become progressively more accurate.
For example, the input pictures are uniformly resized to 416 × 416, and the training samples are fed to the smoking convolutional neural network in batches. One batch contains 64 pictures and is sent into the network in 8 passes of 8 pictures each; the weight parameters are updated once each time a batch is completed. In each update, the learned parameters are decayed at a rate of 0.0005 (weight decay). The weights are updated with the momentum method, with momentum set to 0.9 and the initial learning rate set to 0.001. The total number of training iterations is 100000; the learning rate is reduced by a factor of ten at iteration 80000 and by a further factor of ten at iteration 90000.
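The training settings in this example can be collected into a short sketch. The numeric values come from the text; the function names and the exact form of the momentum update with weight decay are assumptions, since the text gives the hyperparameters but not the update rule.

```python
from typing import List, Sequence, Tuple

BATCH_SIZE = 64          # pictures per batch
SUBDIVISIONS = 8         # passes per batch, 8 pictures each
BASE_LR = 0.001          # initial learning rate
LR_STEPS = (80000, 90000)  # iterations at which the rate drops tenfold
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005
MAX_ITERATIONS = 100000


def split_batch(batch: Sequence, subdivisions: int = SUBDIVISIONS) -> List[Sequence]:
    """Split one batch into equally sized chunks that are fed one at a time."""
    chunk = len(batch) // subdivisions
    return [batch[i * chunk:(i + 1) * chunk] for i in range(subdivisions)]


def learning_rate(iteration: int) -> float:
    """Step schedule: divide the rate by ten at each step that has been reached."""
    decays = sum(1 for step in LR_STEPS if iteration >= step)
    return BASE_LR / (10 ** decays)


def momentum_step(weight: float, grad: float, velocity: float,
                  lr: float) -> Tuple[float, float]:
    """One momentum update with L2 weight decay (assumed form of the update)."""
    velocity = MOMENTUM * velocity - lr * (grad + WEIGHT_DECAY * weight)
    return weight + velocity, velocity


if __name__ == "__main__":
    for it in (0, 80000, 90000):
        print(it, learning_rate(it))
```

Splitting a batch into 8 passes keeps memory use at 8 pictures at a time while the weights still change only once per 64 pictures.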
After training yields a network model, the mean average precision (mAP) on the test sample set is calculated to verify the model result. The specific calculation steps are as follows:
1. For each position predicted by the network model for a target in a picture, calculate the IoU against the annotation boxes in the test sample set. With the threshold Thred set to 0.6, a prediction with IoU > Thred is a TP (true positive) and one with IoU < Thred is an FP (false positive); an annotation box that is not detected is an FN (false negative). Precision and Recall are then calculated according to the following formulas:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{Num_{gt}}$$
where Num_gt is the total number of ground-truth boxes.
2. Draw the precision-recall curve of each category; the area under the curve is the AP value of that category. The mAP is then calculated according to the following formula:
Figure BDA0002458982690000123
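The two evaluation steps can be sketched as follows. Several details are assumptions made for illustration: boxes are given as corner coordinates (x_min, y_min, x_max, y_max) rather than centre and size, each detection is represented by its best IoU against the annotations, the area under the curve is taken with the trapezoid rule, and mAP is taken as the plain mean of the per-category AP values.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(ious: Sequence[float], num_gt: int,
                     thred: float = 0.6) -> Tuple[float, float]:
    """Count TP (IoU > thred) and FP among detections, then derive both metrics."""
    tp = sum(1 for v in ious if v > thred)
    fp = len(ious) - tp
    precision = tp / (tp + fp) if ious else 0.0
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under the precision-recall curve (trapezoid rule)."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(aps: List[float]) -> float:
    """mAP as the mean of the per-category AP values."""
    return sum(aps) / len(aps)


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
    print(precision_recall([0.9, 0.7, 0.3], num_gt=4))
```

A full evaluation would also rank detections by confidence and ensure each annotation box is matched at most once; those steps are left out to keep the sketch short.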
S5: Generate and save usable models as required, and apply the saved model to real-world smoking behavior detection scenes.
After training is complete, the model can be saved in several ways:
a) Save the entire model
Use the model.save API to save the entire model. The Keras model and its weights are stored in a single HDF5 file that contains the structure of the model, the parameters of the model, and the optimizer parameters.
b) Save the structure and the weights of the model separately
Save only the structure of the model: use the to_json or to_yaml API to save the model structure to a json or yml file.
Save only the weights of the model: use the save_weights API; this can also be done through a checkpoint setting.
c) Save the model map
Summary information of the model can be printed through model.summary().
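The saving options above can be gathered into one helper. This is a sketch under stated assumptions: `model` is any Keras-style object exposing `save`, `to_json`, `save_weights` and `summary`, and the helper name and file names are hypothetical. YAML export is left out because recent TensorFlow releases no longer support it.

```python
import os
from typing import Dict


def save_model_variants(model, out_dir: str) -> Dict[str, str]:
    """Save a Keras-style model in the ways listed above and return the paths."""
    os.makedirs(out_dir, exist_ok=True)

    # a) Whole model (structure, weights, optimizer state) in one HDF5 file.
    full_path = os.path.join(out_dir, "smokingnet_full.h5")
    model.save(full_path)

    # b) Structure only, as JSON text.
    json_path = os.path.join(out_dir, "smokingnet_structure.json")
    with open(json_path, "w", encoding="utf-8") as fh:
        fh.write(model.to_json())

    # b) Weights only.
    weights_path = os.path.join(out_dir, "smokingnet.weights.h5")
    model.save_weights(weights_path)

    # c) Print a layer-by-layer summary of the model.
    model.summary()

    return {"full": full_path, "structure": json_path, "weights": weights_path}
```

Saving structure and weights separately keeps the files small and lets the same architecture be reloaded with different weight snapshots, while the single-file form is the simplest to hand over for deployment.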
The model with the highest mAP value is selected for application. Specifically, frames from the camera's video stream are fed into the network for target localization and prediction; when a smoking behavior target is detected in several frames, smoking behavior is judged to be occurring within the camera's field of view at that moment.
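The multi-frame decision can be sketched as a sliding-window vote over per-frame detections. The class name, the window length, and the required number of positive frames are hypothetical; the text only states that detections in several pictures trigger the judgment.

```python
from collections import deque


class SmokingAlarm:
    """Raise an alarm when enough recent frames contain a smoking detection."""

    def __init__(self, window: int = 10, min_hits: int = 5) -> None:
        self.history = deque(maxlen=window)  # most recent per-frame results
        self.min_hits = min_hits

    def update(self, detected: bool) -> bool:
        """Record one frame's result and report whether smoking is judged present."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.min_hits


if __name__ == "__main__":
    alarm = SmokingAlarm(window=5, min_hits=3)
    for frame_result in [True, False, True, True, False, False]:
        print(alarm.update(frame_result))
```

Requiring several positive frames suppresses single-frame false alarms at the cost of a short delay before the behavior is reported.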
In summary, the invention provides a real-time detection method based on deep learning. A smoking behavior data set is first collected and labeled, and the pictures are then preprocessed according to the characteristics of that data set, which includes data augmentation and uniform standardization and normalization of the data set; the data augmentation uses random flipping, color enhancement, and the SMOTE algorithm. Next, a smoking convolutional neural network, SmokingNet, with convolution kernels of multiple sizes is designed for the characteristics of the detection object, so that features of the detection object can be extracted from multiple aspects, and the network incorporates BN and Residual Blocks to improve its performance. The training parameters are then set and the network is trained. Finally, the model is saved as required, and the saved model can be applied to real scenes. The method makes it possible to build a smoking behavior detection data set effectively and to generate a smoking behavior detection model with high precision and high real-time performance, and it has broad application prospects.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A smoking behavior real-time detection method based on deep learning is characterized in that: the method comprises the following steps:
acquiring and labeling smoking behavior data to obtain a data set, wherein the data set comprises a training sample set, a verification sample set and a test sample set, and each sample set is divided into a positive sample and a negative sample;
preprocessing a data set according to the characteristics of the smoking behavior data set, wherein the preprocessing comprises data amplification processing, data set standardization processing and data set normalization processing;
combining deep learning and according to smoking behavior characteristics, constructing a smoking convolutional neural network for detecting smoking behaviors;
setting network training parameters of the smoking convolutional neural network, training, calculating the average accuracy of the test sample set after obtaining a network model, and verifying the model result;
and generating and storing the available model as required, and applying the stored available model to a smoking behavior detection reality scene.
2. The deep learning-based smoking behavior real-time detection method according to claim 1, wherein: the data augmentation processing specifically comprises the following steps:
randomly flipping each picture in the data set to increase the number of pictures in the data set, wherein the random flipping comprises horizontal flipping, vertical flipping and horizontal and vertical flipping;
carrying out gamma conversion processing on the data set picture subjected to random overturning processing to realize picture color enhancement;
and performing data amplification on positive samples in the data set after gamma conversion processing through an SMOTE algorithm.
3. The deep learning-based smoking behavior real-time detection method according to claim 2, wherein: the input of the smoking convolutional neural network is a sample RGB picture; the network begins with 2 convolutional layers, followed by a plurality of residual modules, with a down-sampling layer comprising 1 convolutional layer between adjacent residual modules; 3 convolutional layers then serve as fully connected layers, and classification is finally performed through softmax.
4. The deep learning-based smoking behavior real-time detection method according to claim 3, wherein the first two convolutional layers of the smoking convolutional neural network are each designed with convolution kernels of four sizes, and each convolution path in these layers is further processed by batch normalization and a ReLU activation function, wherein the batch normalization formula is as follows:
Figure FDA0002458982680000021
where t is the data of each training batch, E[t] is the mean of the data t of each training batch, Var[t] is the variance of the data t of each training batch, a small positive number is added to avoid a divisor of 0, φ is the scale factor, and the added offset is the translation factor.
5. The deep learning-based smoking behavior real-time detection method according to claim 4, wherein the residual module comprises two convolutional layers; the input of the residual module is u, which passes through two layers of convolution, each followed by batch normalization and ReLU activation; and, with the two-layer convolution denoted as F(·), the output of the residual module is O(u) = u + F(u).
6. The deep learning-based smoking behavior real-time detection method according to claim 5, wherein: updating the network parameters by minimizing a loss function, wherein the loss function is calculated as follows:
Figure FDA0002458982680000022
where (x_i, y_i, w_i, h_i) is the position and size in the picture of the ground truth annotated in the data set,
Figure FDA0002458982680000023
is the smoking convolutional neural network's prediction of the target position; p_i(c) is the class label of the ground truth itself,
Figure FDA0002458982680000024
is the smoking convolutional neural network's prediction of the target class label; and λ1, λ2, λ3 are the weights of the three parts. The network parameters are updated by minimizing the loss function: the weights are updated continually so that the loss keeps decreasing and the prediction results become progressively more accurate.
7. The deep learning-based smoking behavior real-time detection method according to claim 6, wherein: there are three storage modes for storing available models, namely, completely storing the whole model, respectively storing the structure and weight of the model and storing the model map.
CN202010314703.7A 2020-04-21 2020-04-21 Smoking behavior real-time detection method based on deep learning Active CN111507416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010314703.7A CN111507416B (en) 2020-04-21 2020-04-21 Smoking behavior real-time detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010314703.7A CN111507416B (en) 2020-04-21 2020-04-21 Smoking behavior real-time detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN111507416A true CN111507416A (en) 2020-08-07
CN111507416B CN111507416B (en) 2023-08-04

Family

ID=71874343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010314703.7A Active CN111507416B (en) 2020-04-21 2020-04-21 Smoking behavior real-time detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN111507416B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528960A (en) * 2020-12-29 2021-03-19 之江实验室 Smoking behavior detection method based on human body posture estimation and image classification
CN112613097A (en) * 2020-12-15 2021-04-06 中铁二十四局集团江苏工程有限公司 BIM rapid modeling method based on computer vision
CN113139979A (en) * 2021-04-21 2021-07-20 广州大学 Edge identification method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180292910A1 (en) * 2017-04-07 2018-10-11 University Of South Carolina Wearable Computing Device Featuring Machine-Learning-Based Smoking Detection
CN110390673A (en) * 2019-07-22 2019-10-29 福州大学 Cigarette automatic testing method based on deep learning under a kind of monitoring scene
CN110399766A (en) * 2019-01-28 2019-11-01 浙江浩腾电子科技股份有限公司 Smoking testing and analysis system based on deep learning
CN110490098A (en) * 2019-07-31 2019-11-22 恒大智慧科技有限公司 Smoking behavior automatic testing method, equipment and the readable storage medium storing program for executing of community user
CN110604597A (en) * 2019-09-09 2019-12-24 李胜利 Method for intelligently acquiring fetal cardiac cycle images based on ultrasonic four-cavity cardiac section
CN110909672A (en) * 2019-11-21 2020-03-24 江苏德劭信息科技有限公司 Smoking action recognition method based on double-current convolutional neural network and SVM
US20200097775A1 (en) * 2018-09-20 2020-03-26 Cable Television Laboratories, Inc. Systems and methods for detecting and classifying anomalous features in one-dimensional data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
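D. ZHANG et al.: "Smoking Image Detection Based on Convolutional Neural Networks", pages 1509 - 1515 *
REDMON J et al.: "Yolov3: An incremental improvement" *
Cheng Guangtao et al.: "Smoke recognition based on modular deep convolutional neural networks", Software Guide (软件导刊), vol. 19, no. 03, pages 83 - 86 *
Han Guijin et al.: "Fast smoking detection algorithm based on Faster R-CNN", Journal of Xi'an University of Posts and Telecommunications (西安邮电大学学报), vol. 25, no. 02, pages 85 - 91 *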

Also Published As

Publication number Publication date
CN111507416B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
Mnih et al. Learning to label aerial images from noisy data
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN109670405B (en) Complex background pedestrian detection method based on deep learning
CN111507416B (en) Smoking behavior real-time detection method based on deep learning
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN115223017B (en) Multi-scale feature fusion bridge detection method based on depth separable convolution
CN115719463A (en) Smoke and fire detection method based on super-resolution reconstruction and adaptive extrusion excitation
Jiang et al. Arbitrary-shaped building boundary-aware detection with pixel aggregation network
CN113011359B (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
Yang et al. Robust visual tracking using adaptive local appearance model for smart transportation
CN110287798A (en) Vector network pedestrian detection method based on characteristic module and context fusion
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
CN110533074B (en) Automatic image category labeling method and system based on double-depth neural network
CN111582057A (en) Face verification method based on local receptive field
CN115019039B (en) Instance segmentation method and system combining self-supervision and global information enhancement
CN116434010A (en) Multi-view pedestrian attribute identification method
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN114926826A (en) Scene text detection system
CN115115552A (en) Image correction model training method, image correction device and computer equipment
Li et al. Intelligent terminal face spoofing detection algorithm based on deep belief network
Wang et al. Insulator defect detection based on improved you-only-look-once v4 in complex scenarios
Li et al. A method of inpainting moles and acne on the high‐resolution face photos
Ding et al. Application Analysis of Image Enhancement Method in Deep Learning Image Recognition Scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant