CN111507416A

CN111507416A - Smoking behavior real-time detection method based on deep learning

Info

Publication number: CN111507416A
Application number: CN202010314703.7A
Authority: CN
Inventors: 莫益军; 刘金阳
Original assignee: Hubei Mastop Technology Co ltd
Current assignee: Hubei Mastop Technology Co ltd
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2020-08-07
Anticipated expiration: 2040-04-21
Also published as: CN111507416B

Abstract

The invention is suitable for the technical field of artificial intelligence and behavior detection, and provides a smoking behavior real-time detection method based on deep learning.

Description

Smoking behavior real-time detection method based on deep learning

Technical Field

The invention belongs to the technical field of artificial intelligence and behavior detection, and particularly relates to a smoking behavior real-time detection method based on deep learning.

Background

Human behavior recognition is a basic task in computer vision. Vision-based human behavior recognition is the process of labeling an image sequence with action tags, and reliable solutions to this problem are applied in fields such as video surveillance, video retrieval, human-computer interaction and video understanding. For example, video surveillance based on behavior recognition can be applied to smart homes and to elderly care, such as detecting falls of elderly people. Video retrieval can locate a sliding tackle in a football match, a handshake in a news clip, or a typical dance move in a music video. Interactive applications include human-computer interaction and in-game applications. The task is very challenging because of variations in how actions are performed, in recording settings, and between individuals.

Smoking behavior detection is one category of human behavior detection, and some research on it already exists. One approach uses a feature extraction algorithm to extract features from a video and judges from those features whether smoking behavior is present. This approach has three main steps: identifying moving regions within the frames of the video stream, for example with a frame difference algorithm; searching each identified region for smoke features, for example obtaining smoke features through a color histogram or obtaining the pixel features of smoke through a pixel analysis algorithm; and inferring the presence of smoke from the extracted features, for example by building a Gaussian model or a hidden Markov model to match smoking behavior, or by classifying the features with an SVM classifier. Because algorithms of this kind extract only a narrow set of features, they generally generalize poorly and recognize smoking behavior with low accuracy in complex environments; the models are also complex to build, and real-time performance in practical applications is low. The other approach uses a convolutional neural network to extract features from video stream images, locate the smoking position in the image and judge the behavior category. This technique can extract rich smoking behavior features and achieves relatively good accuracy, and deep learning algorithms generally offer very good real-time performance. However, because relevant data sets are lacking, the research is not yet deep enough and trained models do not have good practical applicability; moreover, existing work merely applies off-the-shelf networks and does not construct a dedicated network for smoking behavior detection based on the characteristics of smoking.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method for detecting smoking behavior in real time based on deep learning, and to solve the above problems.

The invention adopts the following technical scheme:

the smoking behavior real-time detection method based on deep learning comprises the following steps:

acquiring and labeling smoking behavior data to obtain a data set, wherein the data set comprises a training sample set, a verification sample set and a test sample set, and each sample set is divided into a positive sample and a negative sample;

preprocessing the data set according to the characteristics of the smoking behavior data set, wherein the preprocessing comprises data augmentation, data set standardization and data set normalization;

combining deep learning and according to smoking behavior characteristics, constructing a smoking convolutional neural network for detecting smoking behaviors;

setting network training parameters of the smoking convolutional neural network, training the network, calculating the mean average precision on the test sample set after obtaining a network model, and verifying the model result;

and generating and saving a usable model as required, and applying the saved model to real-world smoking behavior detection scenarios.

Further, the data augmentation processing specifically includes the following steps:

randomly flipping each picture in the data set to increase the number of pictures in the data set, wherein the random flipping comprises horizontal flipping, vertical flipping and horizontal and vertical flipping;

applying a gamma transformation to the randomly flipped pictures of the data set to enhance picture color;

and performing data augmentation on the positive samples in the gamma-transformed data set by means of the SMOTE algorithm.

Further, the input of the smoking convolutional neural network is a sample RGB picture. The network first comprises 2 convolutional layers, followed by a plurality of residual modules, with a down-sampling layer comprising 1 convolutional layer arranged between adjacent residual modules; 3 convolutional layers then serve as fully connected layers, and classification is finally performed through softmax.

Furthermore, the first two convolutional layers of the smoking convolutional neural network are each designed with convolution kernels of four sizes, and each convolution path in these layers is further processed by batch normalization and a ReLU activation function, wherein the batch normalization formula is:

$$r = \varphi \cdot \frac{t - E[t]}{\sqrt{\mathrm{Var}[t] + \varepsilon}} + \beta$$

where t is the data of each training batch, E[t] is the mean of the data t of each training batch, Var[t] is the variance of the data t of each training batch, ε is a tiny positive number used to avoid a zero divisor, φ is the scale factor and β is the translation factor.

Further, the residual module comprises two convolutional layers. With u as the input of the residual module, u passes through two layers of convolution, each of which is followed by batch normalization and a ReLU activation function; denoting the two convolutional layers as F(·), the output of the residual module is O(u) = u + F(u).

Further, the network parameters are updated by minimizing a loss function, wherein the loss function is calculated as follows:

where (x_i, y_i, w_i, h_i) are the position and size in the picture of the ground-truth box annotated in the data set, and the corresponding predicted quantities are the smoking convolutional neural network's prediction of the target position; p_i(c) is the class label of the ground truth itself, and its predicted counterpart is the network's prediction of the target class label; λ₁, λ₂ and λ₃ are the weights of the three parts of the loss. Minimizing the loss function updates the network parameters: the weights are updated continuously so that the loss keeps decreasing and the predictions become progressively more accurate.

Further, there are three ways of saving a usable model: saving the whole model in its entirety, saving the structure and the weights of the model separately, and saving the model graph.

The beneficial effects of the invention are as follows. By collecting and labeling smoking behaviors, the method improves the process of building a smoking behavior detection data set and enriches the variety and quantity of such data. By designing a smoking convolutional neural network specifically for smoking behavior detection, the features of the detected object can be extracted more accurately. Combining batch normalization with the ReLU activation function helps prevent the network from overfitting. Adopting a residual module structure suppresses vanishing and exploding gradients during network training and improves network performance. Data preprocessing increases the size of the data set, processes the data set uniformly, removes a large amount of noise and improves the training result.

Drawings

FIG. 1 is a flowchart of a method for detecting smoking behavior in real time based on deep learning according to an embodiment of the present invention;

FIG. 2 is a structural diagram of the constructed smoking convolutional neural network;

FIG. 3 is a structural diagram of the first two convolutional layers of the smoking convolutional neural network;

FIG. 4 is a structural diagram of a residual module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

As shown in fig. 1, the method for detecting smoking behavior in real time based on deep learning provided in this embodiment includes the following steps:

Step S1: acquiring and labeling smoking behavior data to obtain a data set, wherein the data set comprises a training sample set, a verification sample set and a test sample set, and each sample set is divided into positive samples and negative samples.

This step constructs the smoking behavior data set: by collecting and labeling smoking data, it obtains the training sample set and verification sample set used in the training stage and the test sample set used in the testing stage. Each sample set is divided into positive and negative samples, where a positive sample is a picture containing smoking behavior and a negative sample is a picture without smoking behavior.

To build the positive sample data, a positive sample is a picture containing smoking behavior. The data should be diverse so that the trained model is more accurate. In this embodiment, the positive sample pictures have the following characteristics:

a) The pictures should span different times of day, since the time of day affects picture quality;

T = {t₁, t₂}, t₁ ∈ daytime, t₂ ∈ nighttime

b) The pictures should cover various weather conditions, since weather affects the lighting of the pictures;

W = {w₁, w₂, w₃, w₄}

w₁: sunny, w₂: rainy, w₃: snowy, w₄: cloudy.

c) The pictures should cover a wide range of crowd densities, from few people to many people;

Q = {q₁, q₂, q₃}

Q ≤ q₁: low crowd density picture, Q ≤ q₂: medium crowd density picture, Q ≤ q₃: high crowd density picture.

d) The pictures should come from different geographical locations so that they have rich backgrounds;

L = {l₁, l₂}, l₁ ∈ countryside, l₂ ∈ city

e) The distance to the target object should vary across pictures, covering far, middle and near distances;

D = {d₁, d₂, d₃}

D ≤ d₁: close-range picture, D ≤ d₂: middle-distance picture, D ≤ d₃: long-distance picture.

f) The pictures should be clear, not damaged or blurred;

g) The pictures should be authentic and should not contain unrealistic content such as cartoons or animation;

h) The pictures should be of moderate size, about 416 × 416 or 512 × 512;

i) The number of pictures should be of the same order of magnitude as existing large data sets, since the quantity directly affects the generalization ability of the algorithm; the positive samples should therefore be plentiful.

The positive sample data is mainly collected as pictures or videos of smoking behavior from the internet, for example by keyword search in a search engine. Keywords such as cigarette, smoking and tobacco can be used, and crawler software can crawl the pictures returned for these keywords from web pages. Online smoking videos can also be recorded from channels such as video players. In addition, natural smoking behavior can be recorded outdoors with a mobile phone or camera, actors can be filmed performing smoking behavior, and footage can be copied from video recordings made in places such as smoking rooms. Of course, if conditions allow, pictures or videos may be taken directly from existing smoking behavior data sets.

After the smoking videos and pictures are obtained in this way, they are labeled with an annotation tool. A picture annotation tool can be developed with OpenCV (an open-source computer vision library); the tool provides a candidate box whose size can be adjusted manually, marks the coordinates of the upper-left and lower-right corners of the target in the picture, and saves the coordinates to the local disk. Because the input of the convolutional neural network designed in this embodiment is a square picture, the candidate box of the screenshot tool in this embodiment should be square, and the saved training pictures all have an aspect ratio of 1:1. This finally yields D₁ positive samples of smoking behavior.

To build the negative sample data, a negative sample is a picture whose background is similar to that of the positive samples but which contains no cigarette target. In this embodiment, the negative samples have the following characteristics:

1. The pictures should cover a wide, varied range of backgrounds that are similar to the positive sample backgrounds;

2. The pictures should be clear, not damaged or blurred;

3. The pictures should be authentic and should not contain unrealistic content such as cartoons or animation;

4. The pictures should be of moderate size, about 416 × 416 or 512 × 512;

5. The pictures should be RGB images, not black-and-white or CMYK pictures;

6. The number of pictures should be sufficient, with the ratio of positive to negative samples controlled at about 1:3.

Negative samples can be obtained by extracting frames from film and television works or online videos, by searching websites, or by integrating existing data sets. In addition, non-target pictures encountered while building the positive samples are saved as negative samples. The negative samples contain no smoking behavior but resemble the positive samples, and this yields D₂ negative samples.

After the smoking data has been acquired and labeled according to these requirements, the training sample set and verification sample set for the training stage and the test sample set for the testing stage are obtained. The test sample set contains both positive and negative samples. Pictures are randomly assigned to the training stage and the testing stage in a ratio of 4:1, and within the training stage they are randomly assigned to training samples and verification samples in a ratio of 3:1. That is, the total number of samples in the data set is D = D₁ + D₂, and D is divided into three parts: the training sample set contains 0.6 × D samples, the verification sample set contains 0.2 × D samples, and the test sample set contains 0.2 × D samples.
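The 4:1 and 3:1 random assignment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_dataset` and the `seed` parameter are hypothetical, and the samples can be any list of picture identifiers.

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split samples into training, verification and test sets (about 60% / 20% / 20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seed is an illustrative assumption
    n = len(shuffled)
    n_test = n // 5                         # training stage : testing stage = 4 : 1
    n_val = (n - n_test) // 4               # training : verification = 3 : 1
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

With D = 1000 samples this gives 0.6 × D, 0.2 × D and 0.2 × D samples for the three sets, matching the proportions stated in the text.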

Step S2: preprocessing the data set according to the characteristics of the smoking behavior data set, including data augmentation, data set standardization and data set normalization.

Data augmentation enlarges the set of pictures in the data set. Specifically, the data augmentation process includes the following steps:

Step S21: random flipping. Each picture in the data set is randomly flipped to expand the number of pictures in the data set, where the random flipping comprises horizontal flipping, vertical flipping, and horizontal-and-vertical flipping.

This embodiment uses random flipping to expand the number of pictures in the data set. Each picture in the data set D is flipped in one of three ways, each chosen with probability 1/3: horizontal flipping, vertical flipping, or horizontal-and-vertical flipping.

Horizontal flipping swaps pixels symmetrically left and right about the vertical center line of the picture. Specifically, for an RGB picture, the pixel matrix on a given channel changes under horizontal flipping as follows:

A(x, y) = A(x, w − y)

where w is the picture width.

Vertical flipping swaps pixels symmetrically up and down about the horizontal center line of the picture. For an RGB picture, the pixel matrix on a given channel changes under vertical flipping as follows:

A(x, y) = A(h − x, y)

where h is the picture height.

Horizontal-and-vertical flipping first swaps pixels left and right about the vertical center line and then swaps them up and down about the horizontal center line to generate a new picture. Specifically, for an RGB picture, the pixel matrix on a given channel changes as follows:

A(x, y) = A(h − x, w − y)

Each picture is flipped in one of these 3 ways with equal probability to generate a new picture.
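The three flips can be sketched on a single channel stored as a list of rows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and with 0-based indices the mirrored column is w − 1 − y and the mirrored row is h − 1 − x.

```python
import random

def flip_horizontal(channel):
    """Mirror each row left to right: out[x][y] = in[x][w - 1 - y]."""
    return [row[::-1] for row in channel]

def flip_vertical(channel):
    """Mirror the rows top to bottom: out[x][y] = in[h - 1 - x][y]."""
    return [row[:] for row in channel[::-1]]

def flip_both(channel):
    """Horizontal flip followed by vertical flip."""
    return flip_vertical(flip_horizontal(channel))

def random_flip(channel, rng=random):
    """Apply one of the three flips, each chosen with probability 1/3."""
    return rng.choice([flip_horizontal, flip_vertical, flip_both])(channel)

channel = [[1, 2, 3],
           [4, 5, 6]]
print(flip_horizontal(channel))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(channel))    # [[4, 5, 6], [1, 2, 3]]
print(flip_both(channel))        # [[6, 5, 4], [3, 2, 1]]
```

For an RGB picture the same function would be applied to each of the three channels.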

Step S22: color enhancement. A gamma transformation is applied to the randomly flipped pictures of the data set to enhance picture color.

This step uses a gamma transformation to change the brightness of the picture. An RGB picture is first converted into a gray image as follows:

B_grey(x, y) = 0.2989 * A_R(x, y) + 0.5870 * A_G(x, y) + 0.1140 * A_B(x, y)

where A_R(x, y) is the pixel matrix on the R channel, A_G(x, y) is the pixel matrix on the G channel, and A_B(x, y) is the pixel matrix on the B channel; the calculation yields a single-channel two-dimensional matrix B_grey(x, y) holding the gray level values of the picture.

The pixels of the gray image are normalized to the range [0, 1], and the following gamma transformation formula is then applied:

C_gamma(x, y) = B_grey(x, y)^γ

When γ is greater than 1, the whole image becomes darker; when γ is less than 1, the whole image becomes brighter. Pictures with different brightness can be obtained by varying γ.
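The gray conversion and gamma transformation can be sketched as below. The sketch assumes 8-bit pixel values, so dividing by 255 is the normalization to [0, 1]; that divisor and the function names are assumptions for illustration only.

```python
def to_gray(r, g, b):
    """Weighted sum of the R, G and B values of one pixel, as in the formula above."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def gamma_transform(gray, gamma, max_value=255.0):
    """Normalize a 2-D gray image to [0, 1] (assumes 8-bit input) and raise it to the power gamma."""
    return [[(v / max_value) ** gamma for v in row] for row in gray]

gray = [[0.0, 127.5, 255.0]]
darker = gamma_transform(gray, 2.0)    # gamma > 1 darkens mid-tones
brighter = gamma_transform(gray, 0.5)  # gamma < 1 brightens mid-tones
print(darker[0][1], brighter[0][1])
```

The end points 0 and 1 are unchanged by the transformation; only the values in between move, which is why varying γ changes the overall brightness without clipping.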

Step S23: increasing the number of pictures through the SMOTE algorithm. The positive samples in the data set are augmented with the SMOTE algorithm. The specific process is as follows:

1) Randomly take a number of samples from the positive samples to form a sub-sample set, and calculate the distance from each sample x in the sub-sample set to all other samples in the sub-sample set by the following formula:

where (x_i1, y_i1) are the pixel values of sample x on the R, G and B channels, and (x_i2, y_i2) are the pixel values of the other sample on the R, G and B channels. Select the k samples nearest to sample x, and randomly select n samples from these k neighbors.

2) For each of the n randomly selected neighbor samples, construct a new sample from it and the original sample x according to the following formula, and add the new sample to the data set:

y = x + rand(0, 1) * ||x − x_n||

where rand(0, 1) is a random number between 0 and 1, x_n is the currently selected sample among the k neighboring samples, and y is the constructed new sample.
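The two SMOTE steps can be sketched as follows. Two points are assumptions rather than statements of the patent: the distance is taken to be Euclidean over the flattened pixel values, and the new sample uses the commonly published SMOTE interpolation y = x + rand(0, 1) · (x_n − x), which places y on the segment between x and its neighbor. The function names and default values of k and n are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two flattened samples (assumed distance measure)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def smote(samples, k=3, n=2, rng=None):
    """Create synthetic samples by interpolating each sample toward n of its k nearest neighbors."""
    rng = rng or random.Random(0)
    synthetic = []
    for i, x in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(x, s))[:k]
        for x_n in rng.sample(neighbours, min(n, len(neighbours))):
            r = rng.random()  # rand(0, 1)
            synthetic.append([p + r * (q - p) for p, q in zip(x, x_n)])
    return synthetic

positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(positives, k=2, n=1)
print(len(new_samples))  # 4
```

In practice each sample would be a flattened picture rather than a two-element list, and the synthetic samples would be appended to the positive samples of the data set.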

After data augmentation, the data set has been enlarged by a certain amount, giving a new data set M, which is then standardized and normalized.

The data set M is standardized using the following formula:

where X denotes a picture matrix, μ is the mean of the picture, σ is the standard deviation of the picture, and N is the number of pixels of the picture X.

The data set M is normalized using the following formula:

where X_i represents a picture pixel value.
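One common way to realize per-picture standardization and normalization is sketched below. It is an assumption, not a reconstruction of the patent's formulas: standardization subtracts the mean μ and divides by max(σ, 1/√N), a guard that keeps the divisor non-zero for uniform pictures, and normalization is taken to be min-max scaling to [0, 1]. All function names are hypothetical.

```python
import math

def standardize(pixels):
    """Zero-mean, unit-variance scaling of one flattened picture (assumed form)."""
    n = len(pixels)
    mu = sum(pixels) / n
    sigma = math.sqrt(sum((p - mu) ** 2 for p in pixels) / n)
    adjusted = max(sigma, 1.0 / math.sqrt(n))  # avoids division by zero for uniform pictures
    return [(p - mu) / adjusted for p in pixels]

def min_max_normalize(pixels):
    """Scale pixel values linearly to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(standardize([5.0, 5.0, 5.0]))        # [0.0, 0.0, 0.0]
```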

Step S3: combining deep learning and the characteristics of smoking behavior, constructing a smoking convolutional neural network for detecting smoking behaviors.

A smoking convolutional neural network, SmokingNet, is constructed for smoking behavior detection by combining deep learning methods with the characteristics of smoking behavior. As shown in FIG. 2, the input of SmokingNet in this embodiment is a sample RGB picture of 416 × 416. The network first comprises 2 convolutional layers; these are followed by 5 residual blocks (Residual Block), whose lengths are respectively 1, 2, 4, 2; between adjacent residual modules there is 1 convolutional layer, which can be regarded as a down-sampling layer that reduces the feature dimension, giving 4 down-sampling layers in total; 3 convolutional layers then serve as fully connected layers, where the number of feature maps of the last fully connected layer is:

Feature maps = (classes + 1 + coords) * anchors_num

where classes is the number of object classes, the value of coords is 4, and the value of anchors_num is 3.
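The feature-map count can be checked with a one-line function. The value classes = 1 (a single smoking class) used in the example is a hypothetical choice for illustration; the text does not state the number of classes.

```python
def last_layer_feature_maps(classes, coords=4, anchors_num=3):
    """Feature maps = (classes + 1 + coords) * anchors_num."""
    return (classes + 1 + coords) * anchors_num

print(last_layer_feature_maps(classes=1))  # 18
```

Each of the 3 anchors thus contributes the class scores, one additional score, and 4 box coordinates.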

Finally, classification is performed by softmax.

Referring to FIG. 2 and FIG. 3, the first two layers of SmokingNet are convolutional layers, whose convolution kernels extract local features of a given image. To improve the detection of smoking behavior, the network uses multiple convolution kernels. The classic convolution kernel is square, whereas a cigarette is strip-shaped, so this embodiment adds long convolution kernels that match the shape of a cigarette. The first convolutional layer of SmokingNet is designed with convolution kernels of four sizes, as shown in FIG. 2 and FIG. 3: a small kernel of 3 × 3 pixels, a large kernel of 5 × 5 pixels, a long kernel of 7 × 3 and a long kernel of 3 × 7, and the convolution is split into 4 paths. Specifically, the first path is a 5 × 5 convolution with 8 filters, 2 × 2 padding and a stride of 1 × 1; the second path is a 3 × 7 convolution with 8 filters, 1 × 3 padding and a stride of 1 × 1; the third path is a 3 × 3 convolution with 8 filters, 1 × 1 padding and a stride of 1 × 1; the fourth path is a 7 × 3 convolution with 8 filters, 3 × 1 padding and a stride of 1 × 1. Finally, the feature maps produced by the four paths are concatenated to obtain a 416 × 416 × 32 output.

The second convolutional layer of SmokingNet is likewise designed with convolution kernels of four sizes: a small kernel of 2 × 2 pixels, a large kernel of 6 × 6 pixels, a long kernel of 6 × 2 and a long kernel of 2 × 6, again split into 4 paths. Specifically, the first path is a 6 × 6 convolution with 12 filters, 2 × 2 padding and a stride of 2 × 2; the second path is a 2 × 6 convolution with 12 filters, 0 × 2 padding and a stride of 2 × 2; the third path is a 2 × 2 convolution with 12 filters, 0 × 0 padding and a stride of 2 × 2; the fourth path is a 6 × 2 convolution with 12 filters, 2 × 0 padding and a stride of 2 × 2. Finally, the feature maps produced by the four paths are concatenated to obtain a 208 × 208 × 64 output.
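The padding chosen for each path keeps the four outputs the same spatial size so that they can be concatenated along the channel axis. The sketch below checks this with the standard convolution output-size rule; the path tables are transcribed from the text, reading each "a × b" as (height, width), which is an assumption. The names `conv_out`, `LAYER1`, `LAYER2` and `layer_output` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# Each path: (kernel_h, kernel_w), (pad_h, pad_w), stride, number of filters
LAYER1 = [((5, 5), (2, 2), 1, 8), ((3, 7), (1, 3), 1, 8),
          ((3, 3), (1, 1), 1, 8), ((7, 3), (3, 1), 1, 8)]
LAYER2 = [((6, 6), (2, 2), 2, 12), ((2, 6), (0, 2), 2, 12),
          ((2, 2), (0, 0), 2, 12), ((6, 2), (2, 0), 2, 12)]

def layer_output(size, paths):
    """Return (height, width, channels) after concatenating all paths of one layer."""
    shapes = {(conv_out(size, kh, s, ph), conv_out(size, kw, s, pw))
              for (kh, kw), (ph, pw), s, _ in paths}
    if len(shapes) != 1:
        raise ValueError("paths produce different spatial sizes and cannot be concatenated")
    height, width = next(iter(shapes))
    channels = sum(path[3] for path in paths)
    return height, width, channels

print(layer_output(416, LAYER1))  # (416, 416, 32)
print(layer_output(416, LAYER2))  # (208, 208, 48)
```

With the listed filter counts, the first layer yields 32 channels as stated. For the second layer, four paths of 12 filters give 48 channels, whereas the text states 64; the sketch simply reports what the listed filter counts imply.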

Further preferably, as shown in FIG. 3, BN (batch normalization) and ReLU activation functions are also added to the first two convolutional layers, that is, each convolution path is further processed by BN + ReLU.

The batch normalization algorithm is calculated as follows:

$$r = \varphi \cdot \frac{t - E[t]}{\sqrt{\mathrm{Var}[t] + \varepsilon}} + \beta$$

The algorithm mainly comprises 4 steps:

(1) calculate the mean of the data t of each training batch, namely E[t];

(2) calculate the variance of the data t of each training batch, namely Var[t];

(3) normalize the training data of the batch using the obtained mean and variance, giving a distribution with zero mean and unit variance, where ε is a tiny positive number used to avoid a zero divisor;

(4) scale and shift: multiply the normalized t by φ to adjust its magnitude and add the offset β to obtain r, where φ is the scale factor and β is the translation factor. Because the normalized data is essentially confined to a standard normal distribution, the expressive capacity of the network is reduced; to solve this problem, two new parameters, φ and β, are introduced. φ and β are learned by the network itself during training.

After batch normalization, the ReLU activation function applies a nonlinear transformation. ReLU is an existing function and is not described in detail here.
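The four batch normalization steps followed by ReLU can be sketched on a one-dimensional batch. This is a minimal sketch: the parameter names `phi`, `beta` and `eps` stand for the scale factor, the translation factor and the small positive constant, and their default values are illustrative assumptions (in training, the scale and translation factors would be learned).

```python
import math

def batch_norm(batch, phi=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale by phi and shift by beta."""
    mean = sum(batch) / len(batch)                          # step (1): E[t]
    var = sum((t - mean) ** 2 for t in batch) / len(batch)  # step (2): Var[t]
    normalized = [(t - mean) / math.sqrt(var + eps) for t in batch]  # step (3)
    return [phi * t + beta for t in normalized]             # step (4)

def relu(values):
    """ReLU activation: negative values become 0, positive values pass through."""
    return [max(0.0, v) for v in values]

activations = relu(batch_norm([1.0, 2.0, 3.0]))
print(activations)
```

In a convolutional layer the mean and variance would be computed per channel over the whole batch; the one-dimensional list here only illustrates the arithmetic.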

In this embodiment, a residual network is further introduced into SmokingNet to form residual modules (Residual Block). As shown in FIG. 4, a residual module comprises two convolutional layers. Let the input of the residual module be u; u passes through two layers of convolution, each followed by BN + ReLU, and the two convolutional layers are denoted F(·). The output of the Residual Block is then:

O(u) = u + F(u)
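The shortcut connection can be sketched independently of the convolutions. In this minimal sketch the branch F is passed in as an arbitrary function `f` that preserves the shape of its input; the function name `residual_block` and the flat-list representation are assumptions for illustration.

```python
def residual_block(u, f):
    """Return O(u) = u + F(u), element-wise, for a branch f that keeps the shape of u."""
    fu = f(u)
    if len(fu) != len(u):
        raise ValueError("F(u) must have the same shape as u for the shortcut addition")
    return [a + b for a, b in zip(u, fu)]

# If the branch outputs zeros, the block reduces to the identity mapping.
print(residual_block([1.0, 2.0], lambda u: [0.0 for _ in u]))  # [1.0, 2.0]
```

Because the input is added back unchanged, gradients can flow through the shortcut even when the branch F contributes little, which is the property the text relies on to ease training of deeper networks.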

Between adjacent residual modules there is 1 convolutional layer that halves the size of the feature map. In this embodiment it is a convolution with 3 × 3 kernels, a stride of 2 × 2 and a padding of 1 × 1. Denoting the size of the input feature map as h, the size of the convolution kernel as k, the stride as s and the padding as p, the size of the output feature map is:

$$h_{out} = \left\lfloor \frac{h + 2p - k}{s} \right\rfloor + 1$$

Step S4: setting network training parameters of the smoking convolutional neural network, training the network, calculating the mean average precision on the test sample set after obtaining a network model, and verifying the model result.

The smoking convolutional neural network extracts features from the picture and obtains the position of the detection target from those features. The loss function is calculated according to the following formula:

Here the predicted class term is the smoking convolutional neural network's prediction of the target class label, and λ₁, λ₂ and λ₃ are the weights of the three parts of the loss. Minimizing the loss function updates the network parameters: the weights are updated continuously so that the loss keeps decreasing and the predictions become progressively more accurate.

For example, the input pictures are uniformly resized to 416 × 416, and the allocated training samples are trained in batches in the smoking convolutional neural network. One batch contains 64 pictures, which are fed into the network in 8 passes of 8 pictures each; the weight parameters are updated once each time a batch has been trained, and in each learning step the learned parameters are decayed by a factor of 0.0005 (weight decay). The weights are updated with the momentum method, with momentum set to 0.9; the initial learning rate is 0.001 and the total number of training iterations is 100000. The learning rate is reduced by a factor of ten at iteration 80000 and by a further factor of ten at iteration 90000.
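The training settings and the step-wise learning rate decay can be collected in a small configuration sketch. The dictionary keys and the function name `learning_rate` are hypothetical; the values are those given in the text.

```python
TRAINING = {
    "input_size": 416,
    "batch": 64,             # pictures per weight update
    "subdivisions": 8,       # a batch is fed in 8 passes of 8 pictures
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "base_lr": 0.001,
    "max_iterations": 100000,
    "lr_steps": (80000, 90000),
    "lr_scale": 0.1,         # learning rate is divided by ten at each step
}

def learning_rate(iteration, cfg=TRAINING):
    """Learning rate in effect at a given training iteration."""
    lr = cfg["base_lr"]
    for step in cfg["lr_steps"]:
        if iteration >= step:
            lr *= cfg["lr_scale"]
    return lr

for it in (0, 80000, 90000):
    print(it, learning_rate(it))
```

The rate stays at 0.001 until iteration 80000, drops to roughly 0.0001, and drops again to roughly 0.00001 at iteration 90000.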

After training yields the network model, the mean average precision (mAP) on the test sample set is calculated to verify the model result. The specific calculation steps are as follows:

1. For each position predicted by the network model for a target in a picture, calculate the IoU against the annotation boxes in the test sample set, with the threshold Thred set to 0.6. A prediction with IoU > Thred is a TP (true positive), a prediction with IoU < Thred is an FP (false positive), and an annotation box that is not detected is an FN (false negative). The precision (Precision) and the recall (Recall) are calculated according to the following formulas:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{Num_{gt}}$$

where Num_gt is the total number of ground-truth boxes.

2. Draw the precision-recall curve of each category; the area under the curve is the AP value of that category. The mAP is then calculated according to the following formula:

$$mAP = \frac{1}{C} \sum_{c=1}^{C} AP_c$$

where C is the number of categories and AP_c is the AP value of category c.
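The evaluation steps can be sketched as follows, using boxes given by their upper-left and lower-right corners as in the annotation step. Several details are assumptions for illustration: each prediction is greedily matched to the unmatched ground-truth box with the highest IoU, the AP values are supplied directly rather than integrated from a precision-recall curve, and all function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truths, thred=0.6):
    """Count TP and FP with greedy one-to-one matching (assumed rule), then derive both metrics."""
    matched = set()
    tp = fp = 0
    for pred in predictions:
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(ground_truths):
            if j not in matched and iou(pred, gt) > best_iou:
                best_j, best_iou = j, iou(pred, gt)
        if best_j is not None and best_iou > thred:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    return precision, recall

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)

preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(precision_recall(preds, gts))  # (0.5, 0.5)
```

In the example, one prediction coincides with a ground-truth box (TP), one prediction overlaps nothing (FP), and one ground-truth box is never detected (FN), so both precision and recall are 0.5.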

Step S5: generating and saving a usable model as required, and applying the saved model to real-world smoking behavior detection scenarios.

After training is complete, the model can be saved in several ways:

a) Saving the entire model

The entire model is saved with the model.save API, which stores the Keras model and weights in an HDF5 file containing the structure of the model, the parameters of the model and the optimizer parameters.

b) Saving the structure and the weights of the model separately

Saving only the structure of the model: the model structure is saved to a json file or a yml file using the to_json API or the to_yaml API.

Saving only the weights of the model: the weights alone can be saved through the save_weights API, or through a checkpoint setting.

c) Saving the model graph

Summary information of a model can be printed through a model.
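The first two saving modes can be sketched with a helper that accepts any Keras-style model object. The helper only relies on the `save`, `to_json` and `save_weights` methods named in the text; the function name `save_model_variants` and the file names are hypothetical, and no specific Keras version is assumed.

```python
import os

def save_model_variants(model, out_dir):
    """Save a Keras-style model as a whole, and its structure and weights separately."""
    os.makedirs(out_dir, exist_ok=True)
    full_path = os.path.join(out_dir, "model_full.h5")       # hypothetical file names
    json_path = os.path.join(out_dir, "model_structure.json")
    weights_path = os.path.join(out_dir, "model.weights.h5")

    model.save(full_path)                  # mode a): whole model in one HDF5 file
    with open(json_path, "w", encoding="utf-8") as fh:
        fh.write(model.to_json())          # mode b): structure only
    model.save_weights(weights_path)       # mode b): weights only
    return full_path, json_path, weights_path
```

Saving the structure and the weights separately keeps the architecture readable as text and lets the same structure be reloaded with different weight files, for example to compare checkpoints by their mAP values.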

The model with the largest mAP value is selected for deployment. Specifically, frames from the camera's video stream are fed into the network for target localization and prediction, and when a smoking behavior target is detected in several frames, smoking behavior is judged to be occurring within the camera's field of view at that moment.

In summary, the invention provides a real-time detection method based on deep learning. Smoking behavior data is first collected and labeled to build a data set, and the pictures are then preprocessed according to the characteristics of the smoking behavior data set, including data augmentation and unified standardization and normalization of the data set, where the data augmentation uses random flipping, color enhancement and the SMOTE algorithm. Next, a smoking convolutional neural network with multiple convolution kernels, SmokingNet, is designed for the characteristics of the detected object; it extracts features of the detected object from several aspects and combines the BN technique with Residual Blocks to improve network performance. The parameters of the network training process are then set and the network is trained. Finally, the model is saved as required, and the saved model can be applied to real scenes. The method can effectively build a smoking behavior detection data set and generate a smoking behavior detection model with high precision and high real-time performance, and it has broad application prospects.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A smoking behavior real-time detection method based on deep learning is characterized in that: the method comprises the following steps:

2. The deep learning-based smoking behavior real-time detection method according to claim 1, wherein: the data augmentation processing specifically comprises the following steps:

3. The deep learning-based smoking behavior real-time detection method according to claim 2, wherein: the input of the smoking convolutional neural network is a sample RGB picture; the smoking convolutional neural network first comprises 2 convolutional layers, followed by a plurality of residual modules, with a down-sampling layer comprising 1 convolutional layer arranged between adjacent residual modules; 3 convolutional layers then serve as fully connected layers, and classification is finally performed through softmax.

4. The deep learning-based smoking behavior real-time detection method of claim 3, wherein the first two convolutional layers of the smoking convolutional neural network are each designed with convolution kernels of four sizes, and each convolution path in these layers is further processed by batch normalization and a ReLU activation function, wherein the batch normalization formula is as follows:

5. The deep learning-based smoking behavior real-time detection method according to claim 4, wherein the residual module comprises two convolutional layers; with u as the input of the residual module, u passes through two layers of convolution, each followed by batch normalization and a ReLU activation function; the two convolutional layers are denoted F(·), and the output of the residual module is O(u) = u + F(u).

6. The deep learning-based smoking behavior real-time detection method according to claim 5, wherein: updating the network parameters by minimizing a loss function, wherein the loss function is calculated as follows:

7. The deep learning-based smoking behavior real-time detection method according to claim 6, wherein: there are three ways of saving a usable model, namely saving the whole model in its entirety, saving the structure and the weights of the model separately, and saving the model graph.