CN115690752A - Driver behavior detection method and device

Driver behavior detection method and device

Info

Publication number
CN115690752A
CN115690752A (application CN202211371954.4A)
Authority
CN
China
Prior art keywords
network structure
driver
resnet50 network
image
module
Prior art date
Legal status
Pending
Application number
CN202211371954.4A
Other languages
Chinese (zh)
Inventor
杨云飞
李环莹
程起敏
凌嘉骏
Current Assignee
Beijing Itarge Technology Co ltd
Huazhong University of Science and Technology
Original Assignee
Beijing Itarge Technology Co ltd
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Beijing Itarge Technology Co ltd and Huazhong University of Science and Technology
Priority to CN202211371954.4A
Publication of CN115690752A


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a driver behavior detection method and device applied to an intelligent traffic monitoring system. The method comprises the following steps: driver behavior data collected in the traffic monitoring environment are manually annotated to establish a driver abnormal behavior detection data set; then, according to the characteristics of the data set, a lightweight detection model that can quickly and accurately identify abnormal driver behavior is constructed using an attention mechanism and an improved deep residual neural network; the constructed data set is preprocessed and used for training; finally, the trained model detects driver images to identify abnormal driver behavior. The method effectively improves the accuracy of driver behavior detection and offers good real-time performance and transferability.

Description

Driver behavior detection method and device
Technical Field
The invention belongs to the field of behavior detection, and particularly relates to a driver behavior detection method and device.
Background
With the rapid development of the transportation industry, the numbers of motor vehicles and motor vehicle drivers on the road are growing rapidly. While vehicles bring convenience, traffic accidents are also increasing daily and seriously threaten people's lives and property; illegal driving behaviors such as not wearing a seat belt or making phone calls while driving are among the main causes of traffic accidents. Against this background, and facing an increasingly severe traffic safety situation, the development of intelligent traffic monitoring systems has attracted wide attention from researchers.
Traditional intelligent traffic equipment uses various detection and monitoring devices to obtain road, vehicle, and driver information for traffic management. However, this approach requires extensive manual operation and analysis: traffic video data must be supervised around the clock by professional personnel, which consumes considerable manpower and material resources. As traffic monitoring systems become more intelligent, introducing computer vision technology can greatly reduce the required manual effort.
Convolutional neural networks from deep learning are widely used in image classification and detection. However, most existing convolutional-neural-network-based methods for detecting abnormal driver behavior simply feed the driver behavior image into a pre-trained network via transfer learning and treat the different channels of the input image and the different positions within the image uniformly. Such methods struggle to capture the differences between channel and positional features, resulting in low detection accuracy. Moreover, directly using a conventional convolutional neural network for detection leads to slow model inference and poor real-time performance.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a driver behavior detection method and device that solve the problems of low accuracy and insufficient real-time performance of driver behavior detection in the prior art.
In order to achieve the above object, in a first aspect, the present invention provides a driver behavior detection method applied to an intelligent traffic monitoring system, the method including the steps of:
determining an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting driver behavior based on a driver image; in the improved ResNet50 network structure, the ordinary convolution of the second layer of each convolution residual module of the classical ResNet50 network structure is replaced with a depthwise separable convolution, the ordinary convolutions of the first and second layers of each identity residual module are replaced with depthwise separable convolutions and the third layer is replaced with a grouped convolution, the number of identity residual modules of the classical ResNet50 network structure is reduced, and the number of neurons in the last fully connected layer of the classical ResNet50 network structure is reduced, so as to lighten the classical ResNet50 network structure, reduce the number of model parameters, and increase the image analysis speed of the ResNet50 network structure; meanwhile, in the improved ResNet50 network structure a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to different feature channels and different positions of the image, improve the ability of the ResNet50 network structure to express the overall features of the image, and ensure detection accuracy;
and inputting the image of the driver into a trained improved ResNet50 network structure, detecting the behavior of the driver in the image, and outputting a detection result.
In one possible example, reducing the number of identity residual modules of the classical ResNet50 network structure specifically comprises:
removing 1 identity residual module from the third stage and 2 identity residual modules from the fourth stage of the classical ResNet50 network structure.
In one possible example, reducing the number of neurons in the last fully connected layer of the classical ResNet50 network structure specifically comprises:
removing the last fully connected layer of the classical ResNet50 network structure and resetting the number of neurons in the fully connected layer to 256, thereby reducing the number of parameters of the fully connected layer of the classical ResNet50 network structure.
In one possible example, a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to the importance of different feature channels and different positions of the image and improve the ability of the network structure to express image depth features, specifically:
the method comprises the following steps of (1) assuming that a characteristic diagram obtained after a certain convolution residual module or identity residual module is I, the number of channels is C, and the size is H multiplied by W; the output of the channel attention mechanism module is M c The number of channels is C, and the size is 1 multiplied by 1; the output of the space attention mechanism module is M s If the number of channels is 1 and the size is H × W, the operations of the entire channel attention mechanism module and the spatial attention mechanism module can be described as follows:
Figure BDA0003925270970000031
Figure BDA0003925270970000032
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003925270970000033
corresponding elements representing vectors are multiplied, and in the multiplication process, attention mechanism weights of two scales are mutually transmitted and fused together;x is a characteristic diagram output by the channel attention mechanism module, and Y is a characteristic diagram finally obtained by the convolution residual error module or the identity residual error module.
In one possible example, the improved ResNet50 network structure is trained by:
acquiring a large number of driver images collected by the traffic monitoring system, screening out invalid images, and retaining valid images, where a valid image is one in which the driver shows obvious phone-call behavior or obvious unbelted behavior;
preprocessing the valid images, the preprocessing comprising at least one of the following operations: rotation, translation, scaling, horizontal flipping, Gaussian blurring, and edge enhancement;
determining the true label of each valid image, and introducing random noise into the true labels using a label smoothing technique, where the true label is the name of the driver's behavior in the image;
training the improved ResNet50 network structure with the preprocessed images, so that the trained improved ResNet50 network structure can detect driver behavior from a driver image; the driver behaviors that the improved ResNet50 network structure can detect include: making a phone call and not wearing a seat belt.
In a second aspect, the present invention provides a driver behavior detection device, applied to an intelligent traffic monitoring system, the device comprising:
a network structure determining unit, configured to determine an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting driver behavior based on a driver image; in the improved ResNet50 network structure, the ordinary convolution of the second layer of each convolution residual module of the classical ResNet50 network structure is replaced with a depthwise separable convolution, the ordinary convolutions of the first and second layers of each identity residual module are replaced with depthwise separable convolutions and the third layer is replaced with a grouped convolution, the number of identity residual modules of the classical ResNet50 network structure is reduced, and the number of neurons in the last fully connected layer of the classical ResNet50 network structure is reduced, so as to lighten the classical ResNet50 network structure, reduce the number of model parameters, and increase the image analysis speed of the ResNet50 network structure; meanwhile, in the improved ResNet50 network structure a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to different feature channels and different positions of the image, improve the ability of the ResNet50 network structure to express the overall features of the image, and ensure detection accuracy;
and the behavior detection unit is used for inputting the image of the driver into the trained improved ResNet50 network structure, detecting the behavior of the driver in the image and outputting a detection result.
In one possible example, the improved ResNet50 network structure removes 1 identity residual module from the third stage and 2 identity residual modules from the fourth stage of the classical ResNet50 network structure.
In one possible example, the improved ResNet50 network structure removes the last fully-connected layer in the classical ResNet50 network structure, resets the number of neurons in the fully-connected layer to 256, and reduces the number of parameters in the fully-connected layer of the classical ResNet50 network structure.
In one possible example, the feature map obtained after a convolution residual module or identity residual module of the improved ResNet50 network structure is assumed to be I, with C channels and size H × W; the output of the channel attention module is M_c, with C channels and size 1 × 1; and the output of the spatial attention module is M_s, with 1 channel and size H × W; the operations of the whole channel attention module and spatial attention module can then be described as:
X = M_c(I) ⊗ I
Y = M_s(X) ⊗ X
where ⊗ denotes element-wise multiplication of the corresponding vector elements; during this multiplication the attention weights of the two scales are propagated to and fused with each other; X is the feature map output by the channel attention module, and Y is the feature map finally obtained by the convolution residual module or identity residual module.
In one possible example, the apparatus further comprises:
the network structure training unit is used for acquiring a large number of driver images collected by the traffic monitoring system, screening out invalid images and reserving the valid images; wherein the effective image refers to the fact that the driver has obvious calling behavior or non-belted behavior in the image; preprocessing the effective image; the pre-treatment comprises at least one of the following operations: rotation, translation, scaling, horizontal flipping, gaussian blurring and edge enhancement; determining a real label of each effective image, and introducing random noise into the real label by using a label smoothing technology; the real label refers to the behavior name of the driver in the image; training the improved ResNet50 network structure by adopting the preprocessed image, so that the trained improved ResNet50 network structure can detect the behavior of a driver based on the image of the driver; the driver behaviors which can be detected by improving the ResNet50 network structure comprise: make a call and unbelt.
Generally, compared with the prior art, the technical scheme conceived by the invention has the following beneficial effects:
the invention provides a driver behavior detection method and a driver behavior detection device, wherein a classical deep residual error neural network ResNet50 is improved, and the method comprises the steps of replacing common convolution in a residual error block with deep separable convolution and grouped convolution and adjusting the number of network layers to reduce the parameter number of a model and improve the reasoning speed of the model; meanwhile, the channel attention mechanism strengthens channels containing important information according to global features in different channels, weakens channels containing redundant information, and enables a network to pay attention to the importance of different positions of an image so as to achieve the effect of improving the overall feature expression of the model.
The driver behavior detection method based on the improved deep residual neural network integrated with an attention mechanism can improve the accuracy of identifying abnormal driver behavior in a traffic monitoring environment, reduce the amount of computation, increase the running speed of the model, and achieve accurate and rapid detection and judgment.
Drawings
FIG. 1 is a flow chart of a driver behavior detection method provided by an embodiment of the invention;
FIG. 2 is a schematic flow chart of a method for detecting abnormal driver behavior based on an attention mechanism and an improved deep residual error network according to an embodiment of the present invention;
FIG. 3 is a structural diagram of a detection architecture based on an attention mechanism and an improved depth residual error network according to an embodiment of the present invention;
FIG. 4 is a block diagram of the improved Identity Block (DPIB) and Convolution Block (DPCB) modules of the present invention;
FIG. 5 is a comparison histogram of accuracy and run time of a model provided by an embodiment of the invention;
FIG. 6 shows driver images with blurred backgrounds that are prone to confusion, provided by an embodiment of the present invention;
fig. 7 is a structural diagram of a driver behavior detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The invention discloses a real-time method for detecting abnormal driver behavior based on an attention mechanism and an improved deep residual neural network, belonging to the field of abnormal driver behavior detection. The method comprises the following steps: driver behavior data collected in the traffic monitoring environment are manually labeled to establish a driver abnormal behavior detection data set; then, according to the characteristics of the data set, a lightweight detection model that can quickly and accurately identify abnormal driver behavior is constructed using an attention mechanism and an improved deep residual neural network; the constructed data set is preprocessed and used for training; finally, the trained model is used to detect driver images and thus identify abnormal driver behavior. The method effectively improves the accuracy of driver behavior detection and offers good real-time performance and transferability.
Fig. 1 is a flowchart of a driver behavior detection method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
s101, determining an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting the behavior of a driver based on a driver image, and replaces the common convolution of the second layer of a convolution residual module in the classic ResNet50 network structure with a depth separable convolution, replaces the common convolution of the first layer and the second layer of an identity residual module in the classic ResNet50 network structure with a depth separable convolution, replaces the third layer with a grouping convolution, reduces the number of identity residual modules of the classic ResNet50 network structure, reduces the neuron number of the last layer of a full connection layer of the classic ResNet50 network structure, and lightens the classic ResNet50 network structure, reduces the number of model parameters, and improves the image analysis speed of the ResNet50 network structure; meanwhile, the improved ResNet50 network structure is embedded with a channel attention mechanism module and a space attention mechanism module in each convolution residual module and each identity residual module so as to focus on different characteristic channels and different positions of an image, improve the expression capability of the ResNet50 network structure on the overall characteristics of the image and ensure the detection accuracy;
and S102, inputting the image of the driver into the trained improved ResNet50 network structure, detecting the behavior of the driver in the image, and outputting a detection result.
An embodiment of the invention provides a driver behavior detection method based on an attention mechanism and an improved deep residual neural network. With reference to FIG. 2, the method comprises the following steps: first, image data are acquired with a traffic monitoring camera, and a driver abnormal behavior detection data set is constructed by screening and labeling the original pictures; then, according to the characteristics of the data set, a lightweight detection method based on an attention mechanism and an improved deep residual neural network is proposed; the constructed data set is preprocessed and then used for training; finally, the trained model is used to detect images and thus identify abnormal driver behavior.
S1, acquiring image data by using a traffic monitoring camera, and constructing a driver abnormal behavior detection data set by screening and labeling original images.
A checkpoint camera is set up in the traffic monitoring environment to collect driver images; after invalid images are screened out, the remaining images are manually labeled, and more than 1 million qualified driver images are collected, of which 80% are used as the training set and 20% as the test set. To obtain a better training effect, the numbers of "unbelted" and "phone call" samples are balanced. Specifically, as shown in table 1:
table 1: constitution of driver abnormal behavior detection data set
(The composition of the data set is provided as an image in the original document; the per-class sample counts are not reproduced here.)
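As a concrete illustration of the 80%/20% split described above, the following minimal sketch divides a labeled image list into a training set and a test set; the CSV file name and its columns are illustrative assumptions, not part of the original filing.

```python
# Minimal sketch of the 80%/20% split described above. The CSV file name and its
# columns (image_path, call, belt) are illustrative assumptions.
import csv
import random

def split_dataset(label_csv, train_ratio=0.8, seed=42):
    with open(label_csv, newline="") as f:
        rows = list(csv.DictReader(f))      # one row per labeled driver image
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]           # training set, test set

if __name__ == "__main__":
    train_rows, test_rows = split_dataset("driver_labels.csv")
    print(len(train_rows), len(test_rows))
```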
S2, constructing a lightweight detection method based on an attention mechanism and an improved deep residual neural network according to the characteristics of the driver abnormal behavior detection data set.
Specifically, the network used to extract features from the driver images is a convolutional neural network, which may be any one of AlexNet, LeNet, GoogLeNet, VGG, Inception, ResNet, EfficientNet, and NASNet, and is used to extract the local and global features of each driver image.
Preferably, the convolutional neural network is an improved deep residual neural network with ResNet50 as the backbone. The main contribution of the ResNet architecture is to solve the problem that the classification accuracy of traditional convolutional neural networks decreases as the network depth increases; the proposed residual learning idea accelerates the training of convolutional neural networks and effectively avoids vanishing and exploding gradients. A ResNet network is formed by stacking convolution residual modules (Convolution Blocks) and identity residual modules (Identity Blocks). Each identity module contains three convolution layers: the first uses 1 × 1 convolution kernels to reduce the number of feature channels, the second uses 3 × 3 convolution kernels to extract features while limiting the number of network parameters, and the third uses 1 × 1 convolution kernels to make the number of output channels equal to the number of input channels. The convolution residual module differs from the identity residual module in that a 1 × 1 convolution layer is added on its shortcut branch to adjust the size of the input features; the identity residual module has identical input and output sizes and is used to deepen the network.
Further, as shown in FIG. 3 and FIG. 4, the classical ResNet50 model is modified. The 3 × 3 ordinary convolution of the second layer in the convolution residual module is replaced by a depthwise separable convolution, yielding the Depth Separable Convolution Block (DSCB); the ordinary convolutions of the first and second layers in the identity residual module are replaced by depthwise separable convolutions and the third layer by a grouped convolution, yielding the Depth Separable Identity Block (DSIB); the rest is unchanged. Assuming the input size is H × W × C, the convolution kernels are of size k × k with N kernels in total, and the input feature maps are divided into M groups, then each group has C/M input feature maps and N/M kernels, and the parameter counts of the ordinary convolution and the grouped convolution are, respectively:
n1 = k × k × N × C
n2 = k × k × (N/M) × (C/M) × M = (k × k × N × C)/M
as can be seen from the above equation, the total number of parameters for the grouped convolution can be reduced to 1/M of the normal convolution. The depth separable convolution is divided into two steps, depth convolution and point-by-point convolution, and the space and channel regions are considered separately in the convolution operation. Assuming that the input size is H multiplied by W multiplied by C, the sizes of convolution kernels are k multiplied by 1, the number of the convolution kernels is N, the deep convolution divides the input into C groups, each convolution kernel only carries out convolution operation on a corresponding channel, the point-by-point convolution carries out N1 multiplied by 1 convolutions on the features after the deep convolution, and finally, a feature map with the output of H multiplied by W multiplied by N is obtained. The parameter quantities corresponding to the normal convolution and the depth separable convolution are respectively:
n1 = H × W × C × k × k × N
n2 = H × W × C × (k × k + N)
The computational cost of the depthwise separable convolution is thus compressed, relative to the ordinary convolution, by:
n2/n1 = (k × k + N)/(k × k × N) = 1/N + 1/(k × k)
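For illustration, the sketch below (PyTorch) shows one way the depthwise separable convolution and the grouped convolution discussed above could be written; the channel sizes, normalization and activation placement, and module names are assumptions for this sketch rather than the exact filed blocks, and the printed parameter counts simply confirm the reduction derived above for k = 3 and C = N = 256.

```python
# Sketch (PyTorch) of the two lightweight building blocks: a 3x3 depthwise separable
# convolution and a grouped convolution. Channel sizes, BatchNorm/ReLU placement, and
# names are assumptions for illustration, not the exact filed modules.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride=stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

def grouped_conv(in_ch, out_ch, k=1, groups=4):
    """Ordinary convolution split into `groups` groups: its parameters shrink to 1/groups."""
    return nn.Conv2d(in_ch, out_ch, k, padding=k // 2, groups=groups, bias=False)

# Parameter counts for a 3 x 3 layer with 256 -> 256 channels:
plain = nn.Conv2d(256, 256, 3, padding=1, bias=False)
dsc = DepthwiseSeparableConv(256, 256)
print(sum(p.numel() for p in plain.parameters()))  # 589,824
print(sum(p.numel() for p in dsc.parameters()))    # 68,352 (2,304 + 65,536 + BN)
```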
further, the number of layers of the classical ResNet50 network is adjusted, the number of residual blocks in the second stage, the third stage, the fourth stage and the fifth stage of the ResNet50 is modified from 3, 4, 6 and 3 to 3, 4 and 3, specifically, 1 and 2 identical residual blocks in the third stage and the fourth stage are respectively removed, and the number of model parameters is reduced. Meanwhile, the last full connection layer in the ResNet50 network is removed, and the number of the neurons in the full connection layer is reset to 256.
Further, a channel attention module and a spatial attention module are added to each residual block of the modified ResNet50, 13 channel attention modules and 13 spatial attention modules in total, so that the importance of different feature channels and different image positions is attended to and the expressive power of the depth features is improved. Specifically:
the method comprises the steps that a characteristic diagram obtained after a certain residual block is assumed to be I, the number of channels is C, and the size is H multiplied by W; the output of the channel attention submodule is M c The number of channels is C, and the size is 1 multiplied by 1; the spatial attention submodule outputs M s If the number of channels is 1 and the size is H × W, the operations of the entire channel attention mechanism module and the spatial attention mechanism module can be described as follows:
Figure BDA0003925270970000101
Figure BDA0003925270970000102
wherein the content of the first and second substances,
Figure BDA0003925270970000103
the corresponding elements of the representation vector are multiplied, and Y is the feature map finally obtained by the module.
Specifically, assume the input to the channel attention module is a feature map I of size H × W × C. The feature map is reduced to size 1 × 1 × C by parallel global average pooling and global max pooling and then passed through a weight-sharing multilayer perceptron (Shared MLP). To reduce the number of network parameters, the hidden layer is set to C/r neurons, where r is a scaling factor. The features processed by the MLP are added, mapped by a Sigmoid activation function to obtain the channel weights M_c, and multiplied with the input feature I to complete the channel attention operation. The channel attention is formulated as:
M_c(I) = σ(MLP(GAP(I)) + MLP(GMP(I)))
X = M_c(I) ⊗ I
inputting the features X obtained by the channel attention module into a space attention module, respectively obtaining H multiplied by W multiplied by 1 features through global average pooling and global maximum pooling, splicing the H multiplied by W multiplied by 1 features together, changing the number of channels into 1 through 7 multiplied by 7 convolution operation, and obtaining a space weight coefficient M through a Sigmoid activation function s The obtained spatial weight M s Multiplying the input feature X to finish the operation processing of space attention, specifically:
M s (I)=σ(conv([GAP(I);GMP(I)]))
Figure BDA0003925270970000111
wherein, σ is a Sigmoid activation function; conv is the convolution operation with a convolution kernel of 7 × 7; GAP is global average pooling; GMP is global maximum pooling;
Figure BDA0003925270970000112
is element-by-element multiplication.
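The following PyTorch sketch mirrors the channel and spatial attention formulas above; the reduction ratio r = 16 and the way the two modules are chained inside a residual block are assumptions for illustration rather than the exact filed implementation.

```python
# Sketch (PyTorch) of the channel and spatial attention modules described above.
# The reduction ratio r and the chaining shown at the end are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP with a C/r hidden layer
            nn.Conv2d(channels, channels // r, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))    # GAP branch
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))     # GMP branch
        return torch.sigmoid(avg + mx) * x             # X = M_c(I) ⊗ I

class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)              # H x W average map
        mx, _ = x.max(dim=1, keepdim=True)             # H x W max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return w * x                                   # Y = M_s(X) ⊗ X

# Chained inside a residual block: refined = SpatialAttention()(ChannelAttention(256)(feat))
```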
And S3, preprocessing the constructed data set, and then training the preprocessed data set.
Further, because the vehicle body is prone to bouncing, insufficient light, overexposure, and the like while driving, images captured in a traffic monitoring scene may suffer from shake and blur, lack the features of the corresponding targets, and thus reduce the quality of the image data.
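To counteract such degradations, the preprocessing operations named in the disclosure (rotation, translation, scaling, horizontal flipping, Gaussian blurring, and edge enhancement) can be applied as data augmentation; the sketch below uses torchvision and PIL, and the particular parameter ranges and probabilities are assumptions, while the 128 × 128 size matches the experimental setup reported later.

```python
# Sketch of the augmentation pipeline using the operations listed in the disclosure;
# parameter ranges and probabilities are illustrative assumptions.
from PIL import ImageFilter
from torchvision import transforms

edge_enhance = transforms.Lambda(lambda im: im.filter(ImageFilter.EDGE_ENHANCE))

train_transform = transforms.Compose([
    transforms.RandomApply([edge_enhance], p=0.3),                         # edge enhancement
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.3),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
```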
Further, a label smoothing technique is used to introduce random noise into the true labels:
y'_k = (1 − α) y_k + α/K
where y_k denotes the original true label, y'_k denotes the label generated after label smoothing, α is a small hyperparameter set to 0.02, and K is the number of classes.
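A minimal sketch of this smoothing rule, with α = 0.02 and K = 2 as above, is given below; the two-element [call, belt] label layout is an assumption for illustration.

```python
# Sketch of the label smoothing rule y'_k = (1 - alpha) * y_k + alpha / K with
# alpha = 0.02; the two-element [call, belt] label layout is an assumption.
import torch

def smooth_labels(y, alpha=0.02, num_classes=2):
    return y * (1.0 - alpha) + alpha / num_classes

print(smooth_labels(torch.tensor([1.0, 0.0])))   # tensor([0.9900, 0.0100])
```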
Further, the above-described driver detection model is trained using a cross entropy loss function L:
L = −[y log(g(x)) + (1 − y) log(1 − g(x))]
where y is the true label and g denotes the Sigmoid function, i.e. g(x) = 1/(1 + e^(−x)).
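A minimal sketch of this loss for the two binary attributes is shown below; the use of BCEWithLogitsLoss (which folds the Sigmoid g into the loss) and the two-output layout are assumptions consistent with, but not prescribed by, the description.

```python
# Sketch of the binary cross-entropy loss L = -[y log g(x) + (1 - y) log(1 - g(x))]
# applied to the two attributes; BCEWithLogitsLoss applies the Sigmoid g internally.
# The example logits and the smoothed targets (alpha = 0.02, K = 2) are illustrative.
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([[2.1, -1.3]])        # raw model outputs for [call, belt]
targets = torch.tensor([[0.99, 0.01]])      # smoothed versions of the labels [1, 0]
print(criterion(logits, targets).item())
```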
And S4, detecting the image by using the trained model to realize the detection of the abnormal behavior of the driver.
The effectiveness of the invention is demonstrated experimentally below:
the performance test is carried out through the driver behavior data set collected on the traffic road bayonet. Each picture is distinguished by 'whether to fasten a safety belt' and 'whether to make a call', and the size of each image after data set preprocessing is 128 x 128. In the deep learning, the classification evaluation criteria frequently used include Accuracy (Accuracy), fusion matrix (Confusion matrix), F1-score (F1 value), etc., and this experiment employs the Accuracy as the evaluation criterion. The accuracy is the number of all correctly classified samples divided by the total number of samples:
Accuracy = (TP + TN)/(P + N)
where TP is the number of correctly classified positive samples, TN is the number of correctly classified negative samples, and P and N are the total numbers of positive and negative samples, respectively.
The convolutional neural network model adopted in the experiment is the attention-based improved deep residual network model proposed by the invention. When training the model, the initial learning rate is set to 0.0001 and the learning rate is decayed to 0.316 times its current value every 10 iterations; the batch size is set to 64, the optimizer is Adam, and the number of epochs is set to 100.
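The reported schedule can be reproduced roughly as follows; the stand-in model and random data only make the sketch self-contained, and the reading of "every 10 iterations" as every 10 epochs is an assumption.

```python
# Sketch of the reported training configuration: Adam, initial learning rate 1e-4,
# decay by a factor of 0.316 every 10 iterations (read here as every 10 epochs),
# batch size 64, 100 epochs. Substitute the improved ResNet50 and the real data set.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 128 * 128, 2))
train_dataset = TensorDataset(torch.randn(256, 3, 128, 128),
                              torch.randint(0, 2, (256, 2)).float())

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.316)
loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

for epoch in range(100):
    for images, targets in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()    # multiplies the learning rate by 0.316 every 10 epochs
```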
FIG. 5 shows comparison histograms of the accuracy and running time of the models provided by an embodiment of the invention, where (a) compares accuracy and (b) compares running time. It can be seen that, except for InceptionV3, the classical ResNet50 network achieves the highest accuracy among the compared convolutional neural networks; after the convolution module and the identity module are modified and the spatial and channel attention mechanisms are added, the accuracy of the improved ResNet50 is further improved to 93.42%, surpassing the other networks. Meanwhile, the inference speed of ResNet50 with depthwise separable convolution introduced (SM_ResNet50) is improved by 47%, while the inference speed after adding the channel and spatial attention mechanisms (SM_CBAM_ResNet50) is slightly reduced but is still 43% faster than the classical ResNet50.
To analyze the influence of the improved deep residual network, the channel attention mechanism, and the spatial attention mechanism introduced in the proposed driver behavior detection method, ablation experiments are carried out on the driver behavior data set with the classical ResNet50 network, the method that introduces grouped convolution and depthwise separable convolution into the residual network (SM_ResNet50), and the method that embeds the channel and spatial attention mechanisms into SM_ResNet50 (SM_CBAM_Res50).
As can be seen from Table 2, directly using the classical ResNet50 network without any modification gives the lowest accuracy. After the classical ResNet50 network is improved by reducing the number of layers and introducing depthwise separable convolution and grouped convolution, the number of parameters and the amount of computation are reduced, the model inference speed is improved by 47%, and the average accuracy reaches 92.51%, 1.05 percentage points higher than the classical ResNet50 network. After the channel and spatial attention mechanisms are embedded into the SM_ResNet50 model to obtain SM_CBAM_Res50, the accuracy is further improved to 93.42% with no obvious reduction in inference speed, showing that introducing the channel and spatial attention mechanisms effectively improves model performance.
Table 2: impact of different network structures on recognition performance
Model            Accuracy/%    Time/ms
ResNet50         91.46         51
SM_ResNet50      92.51         27
SM_CBAM_Res50    93.42         29
To further verify the effectiveness of the invention, the proposed method is compared with methods commonly used in recent years for detecting abnormal driver behavior; the classification accuracy of each method is shown in Table 3. As can be seen from Table 3, the method of the invention improves accuracy relative to the traditional deep-convolutional-network-based VGG16, GoogleNet, InceptionV3, and ResNet18.
Table 3: comparison of different model results on driver abnormal behavior detection data set
Model            Accuracy/%
VGG16            88.64
GoogleNet        87.32
InceptionV3      91.89
ResNet18         89.15
SM_CBAM_Res50    93.42
For the easily confused driver images with blurred backgrounds shown in FIG. 6 (labeled (a)–(d) from left to right and top to bottom), the detection results of the method of the invention are shown in Table 4. It can be seen that, compared with the classical ResNet50 network, the proposed SM_CBAM_Res50 obtains correct detection results ("call" means making a phone call, "nocall" means not making a phone call, "belt" means wearing a seat belt, and "nobelt" means not wearing a seat belt).
Table 4: Partial image detection result examples
Image          ResNet50                     SM_CBAM_Res50
Picture (a)    call(0.97), belt(0.99)       nocall(0.99), belt(0.99)
Picture (b)    call(0.98), belt(0.99)       nocall(0.99), belt(0.99)
Picture (c)    call(0.98), belt(0.99)       nocall(0.99), belt(0.99)
Picture (d)    nocall(0.98), nobelt(0.99)   nocall(0.99), belt(0.99)
Fig. 7 is an architecture diagram of a driver behavior detection apparatus according to an embodiment of the present invention, as shown in fig. 7, including:
a network structure determining unit 710, configured to determine an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting driver behavior based on a driver image; in the improved ResNet50 network structure, the ordinary convolution of the second layer of each convolution residual module of the classical ResNet50 network structure is replaced with a depthwise separable convolution, the ordinary convolutions of the first and second layers of each identity residual module are replaced with depthwise separable convolutions and the third layer is replaced with a grouped convolution, the number of identity residual modules of the classical ResNet50 network structure is reduced, and the number of neurons in the last fully connected layer of the classical ResNet50 network structure is reduced, so as to lighten the classical ResNet50 network structure, reduce the number of model parameters, and increase the image analysis speed of the ResNet50 network structure; meanwhile, in the improved ResNet50 network structure a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to different feature channels and different positions of the image, improve the ability of the ResNet50 network structure to express the overall features of the image, and ensure detection accuracy;
and a behavior detection unit 720, configured to input the image of the driver into the trained improved ResNet50 network structure, detect the behavior of the driver in the image, and output a detection result.
The network structure training unit 730 is configured to acquire a large number of driver images collected by the traffic monitoring system, screen out invalid images, and retain the valid images, where a valid image is one in which the driver shows obvious phone-call behavior or obvious unbelted behavior; preprocess the valid images, the preprocessing comprising at least one of the following operations: rotation, translation, scaling, horizontal flipping, Gaussian blurring, and edge enhancement; determine the true label of each valid image and introduce random noise into the true labels using a label smoothing technique, where the true label is the name of the driver's behavior in the image; and train the improved ResNet50 network structure with the preprocessed images, so that the trained improved ResNet50 network structure can detect driver behavior from a driver image; the driver behaviors that the improved ResNet50 network structure can detect include: making a phone call and not wearing a seat belt.
It is understood that detailed functional implementation of each unit described above can refer to the description in the foregoing method embodiment, and is not described herein again.
In addition, an embodiment of the present invention provides another driver behavior detection apparatus, including: a memory and a processor; the memory for storing a computer program; the processor is configured to implement the method in the above embodiments when executing the computer program.
Furthermore, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method in the above-described embodiments.
Based on the methods in the above embodiments, an embodiment of the present invention provides a computer program product, which, when run on a processor, causes the processor to execute the methods in the above embodiments.
Based on the method in the foregoing embodiment, an embodiment of the present invention further provides a chip including one or more processors and an interface circuit. Optionally, the chip may also contain a bus. Wherein:
the processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The methods, steps disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The interface circuit can be used for sending or receiving data, instructions or information, the processor can process the data, the instructions or other information received by the interface circuit, and the processing completion information can be sent out through the interface circuit.
Optionally, the chip further comprises a memory, which may include read only memory and random access memory, and provides operating instructions and data to the processor. The portion of memory may also include non-volatile random access memory (NVRAM). Optionally, the memory stores executable software modules or data structures, and the processor may perform corresponding operations by calling the operation instructions stored in the memory (the operation instructions may be stored in an operating system). Optionally, the interface circuit may be configured to output the result of the execution by the processor.
It should be noted that the respective corresponding functions of the processor and the interface circuit may be implemented by hardware design, software design, or a combination of hardware and software, which is not limited herein. It will be appreciated that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in a processor.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. In addition, in some possible implementation manners, each step in the above embodiments may be selectively executed according to an actual situation, may be partially executed, or may be completely executed, and is not limited herein.
It is understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. The general purpose processor may be a microprocessor, but may be any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in Random Access Memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A driver behavior detection method, applied to an intelligent traffic monitoring system, the method comprising the following steps:
determining an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting driver behavior based on a driver image; in the improved ResNet50 network structure, the ordinary convolution of the second layer of each convolution residual module of the classical ResNet50 network structure is replaced with a depthwise separable convolution, the ordinary convolutions of the first and second layers of each identity residual module are replaced with depthwise separable convolutions and the third layer is replaced with a grouped convolution, the number of identity residual modules of the classical ResNet50 network structure is reduced, and the number of neurons in the last fully connected layer of the classical ResNet50 network structure is reduced, so as to lighten the classical ResNet50 network structure, reduce the number of model parameters, and increase the image analysis speed of the ResNet50 network structure; meanwhile, in the improved ResNet50 network structure a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to different feature channels and different positions of the image, improve the ability of the ResNet50 network structure to express the overall features of the image, and ensure detection accuracy;
and inputting the image of the driver into a trained improved ResNet50 network structure, detecting the behavior of the driver in the image, and outputting a detection result.
2. The method according to claim 1, wherein reducing the number of identity residual modules of the classical ResNet50 network structure specifically comprises:
removing 1 identity residual module from the third stage and 2 identity residual modules from the fourth stage of the classical ResNet50 network structure.
3. The method according to claim 1, wherein reducing the number of neurons in the last fully connected layer of the classical ResNet50 network structure specifically comprises:
removing the last fully connected layer of the classical ResNet50 network structure and resetting the number of neurons in the fully connected layer to 256, thereby reducing the number of parameters of the fully connected layer of the classical ResNet50 network structure.
4. The method according to any one of claims 1 to 3, wherein a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to the importance of different feature channels and different positions of the image and improve the ability of the network structure to express image depth features, specifically:
assume that the feature map obtained after a convolution residual module or identity residual module is I, with C channels and size H × W; the output of the channel attention module is M_c, with C channels and size 1 × 1; and the output of the spatial attention module is M_s, with 1 channel and size H × W; the operations of the whole channel attention module and spatial attention module can then be described as:
X = M_c(I) ⊗ I
Y = M_s(X) ⊗ X
where ⊗ denotes element-wise multiplication of the corresponding vector elements; during this multiplication the attention weights of the two scales are propagated to and fused with each other; X is the feature map output by the channel attention module, and Y is the feature map finally obtained by the convolution residual module or identity residual module.
5. A method according to any one of claims 1 to 3, wherein the improved ResNet50 network structure is trained by:
acquiring a large number of driver images collected by the traffic monitoring system, screening out invalid images, and retaining valid images, where a valid image is one in which the driver shows obvious phone-call behavior or obvious unbelted behavior;
preprocessing the valid images, the preprocessing comprising at least one of the following operations: rotation, translation, scaling, horizontal flipping, Gaussian blurring, and edge enhancement;
determining the true label of each valid image, and introducing random noise into the true labels using a label smoothing technique, where the true label is the name of the driver's behavior in the image;
training the improved ResNet50 network structure with the preprocessed images, so that the trained improved ResNet50 network structure can detect driver behavior from a driver image; the driver behaviors that the improved ResNet50 network structure can detect include: making a phone call and not wearing a seat belt.
6. A driver behavior detection device, which is applied to an intelligent traffic monitoring system, the device comprising:
a network structure determining unit, configured to determine an improved ResNet50 network structure; the improved ResNet50 network structure is used for detecting driver behavior based on a driver image; in the improved ResNet50 network structure, the ordinary convolution of the second layer of each convolution residual module of the classical ResNet50 network structure is replaced with a depthwise separable convolution, the ordinary convolutions of the first and second layers of each identity residual module are replaced with depthwise separable convolutions and the third layer is replaced with a grouped convolution, the number of identity residual modules of the classical ResNet50 network structure is reduced, and the number of neurons in the last fully connected layer of the classical ResNet50 network structure is reduced, so as to lighten the classical ResNet50 network structure, reduce the number of model parameters, and increase the image analysis speed of the ResNet50 network structure; meanwhile, in the improved ResNet50 network structure a channel attention module and a spatial attention module are embedded in each convolution residual module and each identity residual module, so as to attend to different feature channels and different positions of the image, improve the ability of the ResNet50 network structure to express the overall features of the image, and ensure detection accuracy;
and the behavior detection unit is used for inputting the image of the driver into the trained improved ResNet50 network structure, detecting the behavior of the driver in the image and outputting a detection result.
7. The apparatus of claim 6, wherein the improved ResNet50 network structure removes 1 identity residual module from the third stage and 2 identity residual modules from the fourth stage of the classical ResNet50 network structure.
8. The apparatus of claim 6, wherein the improved ResNet50 network structure eliminates a last fully connected layer in the classic ResNet50 network structure, resets the number of neurons in the fully connected layer to 256, and reduces the parameters of the fully connected layer in the classic ResNet50 network structure.
9. The device according to any one of claims 6 to 8, wherein the feature map obtained after a convolution residual module or identity residual module of the improved ResNet50 network structure is assumed to be I, with C channels and size H × W; the output of the channel attention module is M_c, with C channels and size 1 × 1; and the output of the spatial attention module is M_s, with 1 channel and size H × W; the operations of the whole channel attention module and spatial attention module can then be described as:
X = M_c(I) ⊗ I
Y = M_s(X) ⊗ X
where ⊗ denotes element-wise multiplication of the corresponding vector elements; during this multiplication the attention weights of the two scales are propagated to and fused with each other; X is the feature map output by the channel attention module, and Y is the feature map finally obtained by the convolution residual module or identity residual module.
10. The apparatus of any one of claims 6 to 8, further comprising:
the network structure training unit is configured to acquire a large number of driver images collected by the traffic monitoring system, screen out invalid images, and retain the valid images, where a valid image is one in which the driver shows obvious phone-call behavior or obvious unbelted behavior; preprocess the valid images, the preprocessing comprising at least one of the following operations: rotation, translation, scaling, horizontal flipping, Gaussian blurring, and edge enhancement; determine the true label of each valid image and introduce random noise into the true labels using a label smoothing technique, where the true label is the name of the driver's behavior in the image; and train the improved ResNet50 network structure with the preprocessed images, so that the trained improved ResNet50 network structure can detect driver behavior from a driver image; the driver behaviors that the improved ResNet50 network structure can detect include: making a phone call and not wearing a seat belt.
CN202211371954.4A · Priority/filing date 2022-11-03 · Driver behavior detection method and device · Pending · CN115690752A (en)

Priority Applications (1)

Application Number: CN202211371954.4A · Priority/filing date: 2022-11-03 · Title: Driver behavior detection method and device · Publication: CN115690752A (en)


Publications (1)

Publication Number Publication Date
CN115690752A (en)  2023-02-03

Family

ID=85048257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211371954.4A Pending CN115690752A (en) 2022-11-03 2022-11-03 Driver behavior detection method and device

Country Status (1)

Country Link
CN (1) CN115690752A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391177A (en) * 2023-12-11 2024-01-12 华中科技大学 Construction method and application of driver behavior detection model
CN117391177B (en) * 2023-12-11 2024-02-20 华中科技大学 Construction method and application of driver behavior detection model

Similar Documents

Publication Publication Date Title
CN112528878B (en) Method and device for detecting lane line, terminal equipment and readable storage medium
CN111967480A (en) Multi-scale self-attention target detection method based on weight sharing
CN110738160A (en) human face quality evaluation method combining with human face detection
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN115546705B (en) Target identification method, terminal device and storage medium
CN115690752A (en) Driver behavior detection method and device
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN115131503A (en) Health monitoring method and system for iris three-dimensional recognition
CN114022727B (en) Depth convolution neural network self-distillation method based on image knowledge review
CN114764789A (en) Pathological cell quantification method, system, device and storage medium
CN113033371A (en) CSP model-based multi-level feature fusion pedestrian detection method
CN115953746B (en) Ship monitoring method and device
CN113221731B (en) Multi-scale remote sensing image target detection method and system
CN115761667A (en) Unmanned vehicle carried camera target detection method based on improved FCOS algorithm
CN111126271B (en) Bayonet snap image vehicle detection method, computer storage medium and electronic equipment
CN114742204A (en) Method and device for detecting straw coverage rate
CN113505648B (en) Pedestrian detection method, device, terminal equipment and storage medium
CN117197592B (en) Target detection model training method and device, electronic equipment and medium
CN116310361A (en) Multi-scale target detection method based on improved YOLOv4-tiny
CN112991280B (en) Visual detection method, visual detection system and electronic equipment
CN116385443B (en) Image-based sample quality determination method and device
Liu et al. Efficient dense attention fusion network with channel correlation loss for road damage detection
CN110991460B (en) Image recognition processing method, device, equipment and storage medium
CN116110010A (en) Expressway shielding pedestrian detection method based on suggestion frame generation enhancement algorithm
Alfinnur Charisma Modified YOLOv5-based License Plate Detection: an Efficient Approach for Automatic Vehicle Identification

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination