CN113283338A - Method, device and equipment for identifying driving behavior of driver and readable storage medium - Google Patents

Method, device and equipment for identifying driving behavior of driver and readable storage medium Download PDF

Info

Publication number
CN113283338A
CN113283338A
Authority
CN
China
Prior art keywords
driver
pooling
attention module
image data
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110569233.3A
Other languages
Chinese (zh)
Inventor
肖卫初
刘宏立
马子骥
陈伟宏
孙长亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Hunan City University
Original Assignee
Hunan University
Hunan City University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University, Hunan City University filed Critical Hunan University
Priority to CN202110569233.3A priority Critical patent/CN113283338A/en
Publication of CN113283338A publication Critical patent/CN113283338A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for identifying the driving behavior of a driver, wherein the method comprises the following steps: acquiring a driver image containing the driving behavior of the driver; processing the driver image with a random-cropping data enhancement technique to obtain first image data in a three-dimensional tensor format; processing the first image data in the three-dimensional tensor format with a convolutional neural network to generate second image data; inputting the second image data into a constructed CS_ResNet model, in which a channel attention module and a spatial attention module are serially connected and embedded in a residual network; fusing the local features output by the residual network model through a fully connected layer to form a global feature, and then calculating the score of each category with a classifier; and obtaining a driver behavior recognition result according to the score of each category. The method reduces model computational complexity and improves the recognition accuracy of the system.

Description

Method, device and equipment for identifying driving behavior of driver and readable storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method, a device and equipment for identifying driving behaviors of a driver and a readable storage medium.
Background
With rising living standards, the automobile has become the most common means of transportation. While cars bring convenience to people's lives, they also bring a series of problems, such as road traffic accidents and environmental pollution; among these, road traffic accidents directly concern people's lives and safety and have therefore attracted wide attention.
Road traffic accidents are attributed to driver distraction, vehicle malfunction, bad weather, and the like, with more than 80% of accidents caused by driver distraction. Driver distraction refers to any behavior that diverts the driver's attention from driving, such as dozing, making phone calls, sending text messages, or smoking. Since driver behavior is a major factor affecting driving safety, driver distraction has become a research hotspot in safe driving.
The national center for disease prevention and control divides driver distraction into cognitive, visual, and manual distraction. Cognitive distraction means the driver's thoughts drift away from driving. Visual distraction means the driver's eyes leave the road while driving, for example when dozing. Manual distraction covers activities in which the driver's body moves away from the driving controls: when making a call with the left or right hand, the driver's hand leaves the steering wheel; when turning to talk to a passenger, the driver's head turns away from the front of the vehicle. To recognize these distracting behaviors, the driver's state information, such as hand movements, eye gaze, head pose, and foot dynamics, must be captured. Driver behavior recognition is studied from the aspects of data sets, models, and algorithms to improve accuracy. Most existing methods extract specific features from the original image in advance: for example, recognizing radio adjustment requires attention to the gaze direction of the eyes, while recognizing phone calls focuses on the position and shape of the hand holding the handset. However, such features are not always readily available.
Driver distraction recognition has been studied extensively over the last 20 years. Identifying the characteristic features of an action is the key to understanding driver behavior. From the viewpoint of model construction, driver behavior recognition methods can be classified into traditional methods, shallow machine learning methods, and deep learning methods.
In the study of traditional methods, researchers focus primarily on manually captured features, including head pose, eye gaze, facial expressions, foot dynamics, and hand motion. Such systems use physiological sensors to detect the driver's physiological signals, such as electroencephalograms (EEG), electrocardiograms, and electrooculograms. These features can be designed by domain experts and selectively extracted for specific tasks, because the driver's state information and behavioral characteristics contain important clues for behavior recognition with traditional methods. Electroencephalograms, for instance, correlate with driver behavior and can be used to identify it. Among traditional methods, the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG) are well-known two-dimensional feature descriptors for image classification; for behavior recognition, they can be extended to extract features from three-dimensional data, denoted SIFT-3D and HOG3D, respectively. Although behavior recognition can be achieved with traditional methods, performance is limited by the difficulty of manually extracting features under variable appearance and pose.
Shallow machine learning methods automatically extract data features to improve the accuracy of behavior recognition. For example, a random forest (RF) classifier can classify driver behavior: one method extracts features with a contour transform and adopts an RF classifier, which performs better than the linear perceptron, K-nearest neighbors, and the multilayer perceptron (MLP). Berri et al. propose a support vector machine model that detects the position of the face and hands to identify whether the driver is using a cell phone. Craye et al. use AdaBoost to classify driver distraction, with input images captured by a Kinect sensor; under changing lighting conditions, a method combining HOG features with an AdaBoost classifier classifies mobile phone use. Chiou et al. propose a hierarchical driver monitoring system (HDMS) that uses a sparse representation based on partial temporal face descriptors: the first layer of HDMS detects the driver's normal and abnormal behavior during driving, and the second layer determines whether the abnormal driving behavior is drowsiness or distraction. Techniques of stacking and combining learners with aggregation and combination rules in distracted-driver detection systems have also achieved good results.
In recent years, with the successful application of deep learning in computer vision, deep learning methods for driver behavior recognition have developed rapidly. For example, Xing et al. propose a deep CNN model for driver activity recognition that is capable of recognizing seven tasks; based on the image segmentation results of a Gaussian mixture model, it detects driver behavior with deep models such as AlexNet, GoogLeNet, and ResNet. Yang et al. propose a feed-forward neural network (FFNN) to identify seven driver behaviors; the FFNN uses RF and maximal information coefficient methods to assess the importance of each driver feature for behavior recognition. Eraqi et al. design an ensemble system in which convolutional neural networks are genetically weighted; the system is robust in recognizing distracted-driver postures, and the genetic-algorithm-based classifier achieves good classification accuracy. Chen et al. propose a driver behavior analysis system that uses one ConvNet to obtain spatial features and another ConvNet to obtain driver motion information, with the modal features classified by a fusion network.
The core idea of the attention mechanism is to let the system learn to attend, i.e., to ignore irrelevant information and focus on key information. Attention mechanisms were first proposed in the field of visual images and were inspired by the human attention mechanism. When people look at an image, they do not take in every pixel of the entire image at once; rather, they focus their attention on a particular portion of the image according to their needs. Furthermore, humans learn from images they have seen before where they should focus their attention in the future. According to the attended region, attention can be divided into the spatial domain, the channel domain, and the hybrid domain.
In recent years, deep learning methods with visual attention mechanisms have developed. For example, Mnih et al. propose a recurrent neural network model with an attention mechanism for image classification, in which a new weighting layer is added to identify key features of the image; through learning and training, the network learns the regions of interest in each new image. Wang et al. propose a residual attention network with multiple attention modules, in which an attention module comprising a mask branch and a trunk branch is built on ResNet and Inception. Hu et al. propose a squeeze-and-excitation network for image classification that adaptively recalibrates channel-wise feature responses.
In the course of implementing the invention, the inventors found that the above methods are insufficiently accurate for driver distraction classification tasks, for the following reasons:
1) some methods use global information but are weak at selectively emphasizing informative features;
2) some methods capture the spatial correlation between features through CNNs and Gaussian mixture models, obtaining a global receptive field over the driver's body; in driver behavior recognition, however, usually only the most salient image attributes are of interest, a consideration most approaches ignore.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method, a device, equipment and a readable storage medium for identifying the driving behavior of a driver, so as to improve the accuracy of driving behavior classification.
In order to achieve the purpose, the invention provides the following technical scheme:
a driver driving behavior recognition method, comprising:
S101, acquiring a driver image containing the driving behavior of a driver;
S102, processing the driver image with a random-cropping data enhancement technique to reduce its resolution, thereby obtaining first image data in a three-dimensional tensor format;
S103, processing the first image data in the three-dimensional tensor format with a convolutional neural network to generate second image data; the convolution operation employs a tensor-based three-dimensional convolution to reduce the dimensionality of the first image data and create invariance to small distortions and shifts;
S104, inputting the second image data into the constructed CS_ResNet model; in the CS_ResNet model, a channel attention module and a spatial attention module are serially connected and embedded in a residual network, both modules using maximum pooling and average pooling to reduce model computational complexity and improve recognition accuracy;
S105, fusing the local features output by the residual network model through a fully connected layer to form a global feature, and then calculating the score of each category with a classifier;
and S106, obtaining a driver behavior recognition result according to the score of each category.
Preferably, the driver behaviors include 10 types: C0: safe driving; C1: texting with the right hand; C2: texting with the left hand; C3: making a phone call with the right hand; C4: making a phone call with the left hand; C5: adjusting the radio; C6: drinking; C7: reaching behind; C8: touching hair/applying makeup; C9: talking to a passenger.
Preferably, in step S102, the 1920 × 1080 × 3 original image is reduced to the 224 × 224 × 3 first image by random cropping.
Preferably, the convolutional neural network consists of a three-dimensional convolutional layer, a ReLU activation function, and a pooling layer; wherein:
in the three-dimensional convolutional layer of the convolutional neural network, each unit is connected to a local patch in the feature map of the previous layer by a set of weights called a filter bank; in the jth feature map of the ith layer, the convolution value of the unit at position (x, y) is calculated as:
v_{i,j}(x, y) = b_{i,j} + Σ_m Σ_{p=0}^{P_i−1} Σ_{q=0}^{Q_i−1} w_{i,j,m}(p, q) · v_{i−1,m}(x+p, y+q)
wherein b_{i,j} is the bias of the jth feature map in the ith layer, w_{i,j,m}(p, q) is the weight at position (p, q) of the filter connecting the mth feature map of the previous layer to the jth feature map of the ith layer, v_{i−1,m}(x+p, y+q) is the value of the previous feature map at location (x+p, y+q), and P_i and Q_i are the height and width of the kernel, respectively;
then, the result of the convolution operation is passed through a nonlinear transformation such as ReLU or Sigmoid; ReLU is the rectified linear unit, defined as follows:
f(x)=max(0,x)
wherein x is the input of the nonlinear function; different feature maps in a layer use different filter banks, while all units within one feature map share the same filter bank;
the pooling layer is used to fuse similar features so as to reliably detect patterns; maximum pooling and average pooling are two typical pooling methods: maximum pooling computes the maximum of a local block of units in the feature map, while average pooling computes their average; a block can be shifted by several rows or columns and used as input to adjacent pooling units, thereby reducing the dimensionality of the data and creating invariance to small distortions and shifts.
Preferably, in step S104, in the CS_ResNet model, two methods, average pooling and maximum pooling, are used to calculate the channel attention;
average pooling is used to compress the spatial dimension of the input and learn the extent of the target object;
maximum pooling is used to collect clues about distinctive object features;
for a given intermediate feature map X_c ∈ R^{C×H×W} as input to the channel attention module, the average pooling Z_a and the maximum pooling Z_m are calculated as follows:
Z_a = (1/(H·W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_c(h, w)
Z_m = max{X_c(1,1), ..., X_c(1,W); X_c(2,1), ..., X_c(2,W); ...; X_c(H,1), ..., X_c(H,W)}
wherein Z_a and Z_m denote the average-pooled output and the maximum-pooled output, respectively;
in order to fully capture the dependencies between channels, the outputs of the average pooling and maximum pooling operations are passed in turn into a convolutional layer and a nonlinear transformation layer, and the results of the two layers are then fused through a fusion module; the fusion module is composed of a multilayer perceptron and generates the CA map Y_c ∈ R^{C/r×1×1}, wherein the perceptron is designed as a fully connected layer with a dimensionality reduction ratio r; a Sigmoid excitation mechanism is adopted to give the model flexibility;
the channel attention module is described as follows:
Y_c = g_s(P_avg(X_c) + P_max(X_c))
wherein g_s denotes Sigmoid activation, + denotes fusion by the fully connected perceptron, and P_avg and P_max are average pooling and maximum pooling, respectively.
Preferably, step S104 further includes, in CS_ResNet:
introducing a spatial attention module into the constructed model to highlight attention to valuable regions; for a feature map X_s ∈ R^{C×H×W}, an effective spatial attention module exploits the spatial relationships of the features and corresponds to Y_s ∈ R^{1×H×W}; given the input X_s of the spatial attention module, it passes sequentially through average pooling, maximum pooling, a convolution operation, and a nonlinear transformation to obtain the output Y_s of the spatial attention module; a feature map of size 1 × H × W is thus obtained by using average pooling and maximum pooling;
specifically, to generate a 2D spatial attention map across channels, given the input X_s, the output of the spatial attention module is calculated as follows:
Y_s = g_s(Cat(P_max(X_s), P_avg(X_s)))
wherein g_s denotes the Sigmoid activation function, Cat denotes the concatenation operation, P_avg is average pooling, and P_max is maximum pooling;
for the CS_ResNet model, which mixes channel attention and spatial attention: if channel attention teaches the model what to attend to, spatial attention lets the model know where to attend;
assuming the input of CS_ResNet is X_r, the output is calculated as follows:
Y_r = g_r((((g_r(X_r * k) * k) × Y_c) × Y_s) + X_r)
wherein g_r denotes the ReLU activation function, k denotes a convolution kernel, and *, ×, and + are the convolution, multiplication, and addition operations, respectively; Y_c is the output of the channel attention module, and Y_s is the output of the spatial attention module.
Preferably, in step S105, the global feature is formed by fusing the local features through the fully connected layer, and the softmax classifier is then used to calculate the score of each category, thereby obtaining the driver behavior recognition result.
The embodiment of the invention also provides a device for identifying the driving behavior of the driver, which comprises:
the image acquisition unit is used for acquiring a driver image containing the driving behavior of the driver;
a random cropping unit, configured to process the driver image by using a data enhancement technique of random cropping to reduce a resolution of the driver image, thereby obtaining first image data in a three-dimensional tensor format;
a convolution operation unit, configured to process the first image data in the three-dimensional tensor format using a convolutional neural network to generate second image data; the convolution operation employs a tensor-based three-dimensional convolution to reduce the dimensionality of the first image data and create invariance to small distortions and shifts;
an input unit, configured to input the second image data into the constructed CS_ResNet model; in the CS_ResNet model, a channel attention module and a spatial attention module are serially connected and embedded in a residual network, both modules using maximum pooling and average pooling to reduce model computational complexity and improve recognition accuracy;
a classification unit, configured to fuse the local features output by the residual network model through a fully connected layer to form a global feature, and then calculate the score of each category using a classifier;
and a recognition unit, configured to obtain a driver behavior recognition result according to the score of each category.
The embodiment of the present invention further provides a driver driving behavior recognition device, which includes a memory and a processor, where the memory stores a computer program, and the computer program can be executed by the processor to implement the above driver driving behavior recognition method.
Embodiments of the present invention further provide a computer-readable storage medium, which stores a computer program, where the computer program is executable by a processor of a device on which the storage medium is located, so as to implement the above-mentioned method for identifying driving behavior of a driver.
In summary, in this embodiment, the channel attention module and the spatial attention module are serially connected and embedded into the residual network: the channel attention module fully captures the dependencies between channels and teaches the model which features to attend to, while the spatial attention module indicates where to attend. Combined, the two modules realize adaptive feature extraction, thereby reducing model computational complexity and improving recognition accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic flow chart of a method for identifying driving behavior of a driver according to a first embodiment of the present invention.
Fig. 2 is a working schematic diagram of a driving behavior recognition method for a driver according to a first embodiment of the present invention.
Fig. 3 is a schematic view of different driving behaviors.
Fig. 4 is an architecture diagram of the CS_ResNet model according to the first embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a driving behavior recognition apparatus for a driver according to a second embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments. These embodiments are merely examples of the present invention and do not cover all possible embodiments.
Referring to fig. 1 and fig. 2, a first embodiment of the present invention provides a method for identifying driving behavior of a driver, which includes:
S101, obtaining a driver image containing the driving behavior of the driver.
In this embodiment, for example, a driver behavior video may be recorded by a built-in vehicle-mounted camera and split frame by frame into images of size 1920 × 1080 to obtain a driver data set. Considering the needs of model training and validation, in one possible implementation of this embodiment the driver data set contains 17308 images in total, among them 12978 training images and 4331 test images.
The data set contains 10 classes of driver behavior: C0: safe driving; C1: texting with the right hand; C2: texting with the left hand; C3: making a phone call with the right hand; C4: making a phone call with the left hand; C5: adjusting the radio; C6: drinking; C7: reaching behind; C8: touching hair/applying makeup; C9: talking to a passenger, as shown in Fig. 3.
Of course, it should be noted that in other embodiments of the present invention, different driver behaviors may be defined according to actual needs; such schemes all fall within the protection scope of the present invention.
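As an illustration, such a data set can be assembled with standard tooling. The following is a minimal sketch assuming the extracted frames are stored one folder per class (C0-C9) under a hypothetical "driver_frames" directory; the folder layout and the 75/25 split ratio (which approximates the 12978/4331 division above) are assumptions, not part of the embodiment.

```python
# A minimal sketch of assembling the driver data set; the "driver_frames"
# directory with one subfolder per class C0-C9 is a hypothetical layout.
import torch
from torchvision import datasets

dataset = datasets.ImageFolder("driver_frames")   # 10 class folders, C0-C9
n_train = int(0.75 * len(dataset))                # approximates 12978 of 17308
train_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, len(dataset) - n_train]
)
```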
S102, processing the driver image by adopting a data enhancement technology of random cutting to reduce the resolution of the driver image, thereby obtaining first image data in a three-dimensional tensor format.
For example, in this embodiment, the input original image data is reduced from (1920 × 1080 × 3) to first image data of (224 × 224 × 3) by processing the driver image data with the random-cropping image enhancement technique. Random cropping changes the size of a sample and improves the quality of the training data set, making the training data as close as possible to the test data and thereby improving model performance.
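For concreteness, this step can be sketched with standard image transforms; the following assumes a torchvision pipeline, where RandomResizedCrop is an assumed stand-in for the random cropping described above.

```python
# A minimal sketch of step S102, assuming a torchvision pipeline;
# RandomResizedCrop is an assumed stand-in for the random-cropping data
# enhancement that reduces a 1920x1080x3 frame to 224x224x3.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224),  # random crop of the frame, rescaled to 224x224
    T.ToTensor(),              # PIL image -> 3x224x224 float tensor in [0, 1]
])
# first_image = augment(pil_frame)  # the first image data in tensor format
```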
S103, processing the first image data in the three-dimensional tensor format by using a convolutional neural network to generate second image data; wherein the convolution operation employs a tensor-based three-dimensional convolution operation to reduce dimensionality of the first image data and create invariance to small distortions and shifts.
In this embodiment, the first image data in the three-dimensional tensor format from step S102 is processed using a convolutional neural network (CNN) composed of a convolutional layer, a ReLU activation function, and a pooling layer.
The input to the CNN takes the form of three two-dimensional arrays corresponding to the RGB channels. In the convolutional layer of the CNN, each unit is connected to a local patch in the feature map of the previous layer by a set of weights called a filter bank. In the jth feature map of the ith layer, the convolution value of the unit at position (x, y) is:
v_{i,j}(x, y) = b_{i,j} + Σ_m Σ_{p=0}^{P_i−1} Σ_{q=0}^{Q_i−1} w_{i,j,m}(p, q) · v_{i−1,m}(x+p, y+q)
wherein b_{i,j} is the bias of the jth feature map in the ith layer, w_{i,j,m}(p, q) is the weight at position (p, q) of the filter connecting the mth feature map of the previous layer to the jth feature map of the ith layer, v_{i−1,m}(x+p, y+q) is the value of the previous feature map at location (x+p, y+q), and P_i and Q_i are the height and width of the kernel, respectively. The output of the convolution operation is then passed through a nonlinear transformation (e.g., ReLU or Sigmoid). ReLU is the rectified linear unit, defined as follows:
f(x)=max(0,x)
where x is the input to the nonlinear function. Different feature maps in a layer use different filter banks, while all units within one feature map share the same filter bank. The reason is that local groups of values in image data are typically highly correlated, so local patterns can be detected easily. In the CNN model, convolutional layers detect local conjunctions of features from the previous layer, and pooling layers fuse similar features to reliably detect patterns. Maximum pooling and average pooling are two typical pooling methods: maximum pooling computes the maximum of a local block of units in the feature map, while average pooling computes their average. A block can be shifted by several rows or columns and used as input to adjacent pooling units; the dimensionality of the data is thus reduced and invariance to small distortions and shifts is created. Stages of convolution, nonlinearity, and pooling are stacked, and gradients are backpropagated through the entire deep network, which allows the weights in all filter banks to be trained.
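For concreteness, one convolution / ReLU / pooling stage as described above can be sketched in PyTorch as follows; the channel count, kernel sizes, and strides are illustrative assumptions rather than values fixed by this embodiment.

```python
# A minimal sketch of one convolution / ReLU / pooling stage; layer sizes
# are illustrative assumptions.
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # filter bank w_{i,j,m}
    nn.ReLU(inplace=True),                                 # f(x) = max(0, x)
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),      # max over local blocks
)

x = torch.randn(1, 3, 224, 224)  # batched 224x224x3 first image data
print(stem(x).shape)             # torch.Size([1, 64, 56, 56])
```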
S104, inputting the second image data into the constructed CS _ ResNet model; in the CS _ ResNet model, a channel attention module and a space attention module are serially connected and embedded in a residual error network, and the channel attention module and the space attention module utilize maximum pooling and average pooling to reduce model calculation complexity and improve system identification accuracy.
In this embodiment, the CS_ResNet model consists of convolution, pooling, activation functions, channel attention, and spatial attention, where the channel attention module (CA) and the spatial attention module (SA) are serially connected and embedded in the residual network.
In this embodiment, the output of step S103 serves as the input to CS_ResNet; the channel attention and spatial attention use maximum pooling and average pooling to reduce the computational complexity of the model, which effectively alleviates the degradation problem and improves recognition accuracy.
The method for constructing the CS_ResNet model specifically comprises the following steps:
first, two methods, average pooling and maximum pooling, are employed to efficiently calculate channel attention.
Average pooling compresses the spatial dimension of the input and learns the extent of the target object, while maximum pooling collects clues about distinctive object features. This embodiment combines average pooling and maximum pooling to infer channel attention. Given an intermediate feature map X_c ∈ R^{C×H×W} as the input of CA, the average pooling Z_a and the maximum pooling Z_m are calculated as follows:
Z_a = (1/(H·W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_c(h, w)
Z_m = max{X_c(1,1), ..., X_c(1,W); X_c(2,1), ..., X_c(2,W); ...; X_c(H,1), ..., X_c(H,W)}
wherein Z_a and Z_m denote the average-pooled output and the maximum-pooled output, respectively.
In order to fully capture the dependencies between channels, the outputs of the average pooling and maximum pooling operations are passed in turn into a convolutional layer and a nonlinear transformation layer, and the results of the two layers are then fused through a fusion module. The fusion module is composed of a multilayer perceptron and generates the CA map Y_c ∈ R^{C/r×1×1}. To lower the model complexity, the perceptron is designed as a fully connected layer with a dimensionality reduction ratio r, and a simple Sigmoid excitation mechanism is adopted to give the model flexibility. In summary, CA is described as follows:
Y_c = g_s(P_avg(X_c) + P_max(X_c))
wherein g_s denotes Sigmoid activation, + denotes fusion by the fully connected perceptron, and P_avg and P_max are average pooling and maximum pooling, respectively.
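As an illustration, the channel attention just described can be sketched in PyTorch as follows; the reduction ratio r = 16 and the use of 1×1 convolutions to realize the shared fully connected perceptron are assumptions.

```python
# A minimal CBAM-style channel attention sketch: average pooling and maximum
# pooling fused by a shared perceptron with reduction ratio r, then Sigmoid.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # Z_a: C x 1 x 1
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # Z_m: C x 1 x 1
        self.mlp = nn.Sequential(                # shared perceptron, reduction r
            nn.Conv2d(channels, channels // r, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()              # g_s

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Y_c = g_s(MLP(P_avg(X_c)) + MLP(P_max(X_c)))
        return self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
```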
The SA module is then introduced into the constructed model to highlight attention to valuable regions. For a feature map X_s ∈ R^{C×H×W}, an effective SA exploits the spatial relationships of the features and corresponds to Y_s ∈ R^{1×H×W}. Given the input X_s of SA, it passes sequentially through average pooling, maximum pooling, a convolution operation, and a nonlinear transformation to obtain the SA output Y_s.
A feature map of size 1 × H × W can be obtained by using the two pooling operations. Specifically, a 2D SA map across channels is generated. Given the input X_s, the output of SA is calculated as follows:
Y_s = g_s(Cat(P_max(X_s), P_avg(X_s)))
wherein g_s denotes the Sigmoid activation function, Cat denotes the concatenation operation, P_avg is average pooling, and P_max is maximum pooling.
For the CS_ResNet model, in which CA and SA are mixed: if CA teaches the model what to attend to, SA lets the model know where to attend; SA is thus a complement to CA. Owing to their lightweight computation, the attention modules can be integrated into a DNN model such as a residual network. Fig. 4 depicts the CS_ResNet model framework, in which the serially connected CA and SA are embedded in the residual network. Assuming the input of CS_ResNet is X_r, the output is calculated as follows:
Y_r = g_r((((g_r(X_r * k) * k) × Y_c) × Y_s) + X_r)
wherein g_r denotes the ReLU activation function, k denotes a convolution kernel, and *, ×, and + are the convolution, multiplication, and addition operations, respectively; Y_c is the output of CA, and Y_s is the output of SA. In this sequential arrangement, the CS_ResNet model is effective for driver behavior recognition.
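Correspondingly, the spatial attention module and the residual block of Fig. 4 can be sketched as follows, reusing the ChannelAttention class from the previous sketch; the two 3×3 convolutions standing in for the kernel k and the 7×7 spatial-attention kernel are assumptions.

```python
# A sketch of the spatial attention module and the CS_ResNet residual block
# Y_r = g_r((((g_r(X_r*k)*k) x Y_c) x Y_s) + X_r); kernel sizes are assumed.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 2-channel input: channel-wise max-pooled and average-pooled maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()                # g_s

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z_max = x.max(dim=1, keepdim=True).values  # P_max over channels: 1 x H x W
        z_avg = x.mean(dim=1, keepdim=True)        # P_avg over channels: 1 x H x W
        # Y_s = g_s(conv(Cat(P_max(X_s), P_avg(X_s))))
        return self.sigmoid(self.conv(torch.cat([z_max, z_avg], dim=1)))

class CSResNetBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.ca = ChannelAttention(channels)       # from the previous sketch
        self.sa = SpatialAttention()
        self.relu = nn.ReLU(inplace=True)          # g_r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.relu(self.conv1(x))) # (g_r(X_r * k)) * k
        out = out * self.ca(out)                   # x Y_c : channel attention
        out = out * self.sa(out)                   # x Y_s : spatial attention
        return self.relu(out + x)                  # g_r(... + X_r)
```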
S105, fusing the local features output by the residual network model through a fully connected layer to form a global feature, and then calculating the score of each category with a classifier.
S106, obtaining a driver behavior recognition result according to the score of each category.
In this embodiment, the global feature is formed by fusing the local features through the fully connected layer, and the softmax classifier is then used to calculate the score of each category, thereby obtaining the driver behavior recognition result.
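Steps S105-S106 can be sketched as follows; global average pooling feeding a fully connected layer is one common realization of fusing local features into a global feature, and the 512-channel feature width is an assumption.

```python
# A minimal sketch of steps S105-S106: fuse local features into a global
# feature, score the 10 behavior classes C0-C9, and take the arg-max as the
# recognition result. The 512-channel feature width is an assumption.
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # fuse local feature maps into a global descriptor
    nn.Flatten(),
    nn.Linear(512, 10),       # fully connected layer: scores for 10 classes
)

features = torch.randn(1, 512, 7, 7)  # CS_ResNet output (assumed shape)
scores = head(features)
probs = torch.softmax(scores, dim=1)  # softmax classifier
pred = probs.argmax(dim=1)            # driver behavior recognition result
print(pred)
```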
In summary, in this embodiment, the channel attention module and the spatial attention module are serially connected and embedded into the residual network: the channel attention module fully captures the dependencies between channels and teaches the model which features to attend to, while the spatial attention module indicates where to attend. Combined, the two modules realize adaptive feature extraction, thereby reducing model computational complexity and improving recognition accuracy.
Referring to fig. 5, a second embodiment of the present invention further provides a driving behavior recognition apparatus for a driver, including:
an image acquisition unit 210 for acquiring a driver image including a driving behavior of the driver;
a random cropping unit 220, configured to process the driver image using a random-cropping data enhancement technique to reduce its resolution, thereby obtaining first image data in a three-dimensional tensor format;
a convolution operation unit 230, configured to process the first image data in the three-dimensional tensor format using a convolutional neural network to generate second image data; the convolution operation employs a tensor-based three-dimensional convolution to reduce the dimensionality of the first image data and create invariance to small distortions and shifts;
an input unit 240, configured to input the second image data into the constructed CS_ResNet model; in the CS_ResNet model, a channel attention module and a spatial attention module are serially connected and embedded in a residual network, both modules using maximum pooling and average pooling to reduce model computational complexity and improve recognition accuracy;
a classification unit 250, configured to fuse the local features output by the residual network model through a fully connected layer to form a global feature, and then calculate the score of each category using a classifier;
and a recognition unit 260, configured to obtain the driver behavior recognition result according to the score of each category.
The third embodiment of the present invention also provides a driver driving behavior recognition apparatus, which includes a memory and a processor, wherein the memory stores a computer program, and the computer program can be executed by the processor to realize the driver driving behavior recognition method.
The fourth embodiment of the present invention also provides a computer-readable storage medium storing a computer program executable by a processor of an apparatus on which the storage medium is located to implement the above-described driver's driving behavior recognition method.
Illustratively, the computer programs described in the above embodiments may be partitioned into one or more modules, which are stored in the memory and executed by the processor to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing particular functions.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of this embodiment by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or a text conversion function), and the data storage area may store data created according to the use of the device (such as audio data or text message data). In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
If the modules are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunication signals, a software distribution medium, and the like. It should be noted that the content of the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that the above-described device embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, the connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A driver driving behavior recognition method, characterized by comprising:
S101, acquiring a driver image containing the driving behavior of a driver;
S102, processing the driver image with a random-cropping data enhancement technique to reduce its resolution, thereby obtaining first image data in a three-dimensional tensor format;
S103, processing the first image data in the three-dimensional tensor format with a convolutional neural network to generate second image data; the convolution operation employs a tensor-based three-dimensional convolution to reduce the dimensionality of the first image data and create invariance to small distortions and shifts;
S104, inputting the second image data into the constructed CS_ResNet model; in the CS_ResNet model, a channel attention module and a spatial attention module are serially connected and embedded in a residual network, both modules using maximum pooling and average pooling to reduce model computational complexity and improve recognition accuracy;
S105, fusing the local features output by the residual network model through a fully connected layer to form a global feature, and then calculating the score of each category with a classifier;
and S106, obtaining a driver behavior recognition result according to the score of each category.
2. The driver driving behavior recognition method according to claim 1, wherein the driver behaviors include 10 different types: C0: safe driving; C1: texting with the right hand; C2: texting with the left hand; C3: making a phone call with the right hand; C4: making a phone call with the left hand; C5: adjusting the radio; C6: drinking; C7: reaching behind; C8: touching hair/applying makeup; C9: talking to a passenger.
3. The driver driving behavior recognition method according to claim 1, wherein in step S102, the 1920 × 1080 × 3 original image is reduced to the 224 × 224 × 3 first image by random cropping.
4. The driver driving behavior recognition method according to claim 1, wherein the convolutional neural network is composed of a three-dimensional convolutional layer, a ReLU activation function, and a pooling layer; wherein:
in the three-dimensional convolutional layer of the convolutional neural network, each unit is connected to a local patch in the feature map of the previous layer by a set of weights called a filter bank; in the jth feature map of the ith layer, the convolution value of the unit at position (x, y) is calculated as:
v_{i,j}(x, y) = b_{i,j} + Σ_m Σ_{p=0}^{P_i−1} Σ_{q=0}^{Q_i−1} w_{i,j,m}(p, q) · v_{i−1,m}(x+p, y+q)
wherein b_{i,j} is the bias of the jth feature map in the ith layer, w_{i,j,m}(p, q) is the weight at position (p, q) of the filter connecting the mth feature map of the previous layer to the jth feature map of the ith layer, v_{i−1,m}(x+p, y+q) is the value of the previous feature map at location (x+p, y+q), and P_i and Q_i are the height and width of the kernel, respectively;
then, the result of the convolution operation is passed through a nonlinear transformation such as ReLU or Sigmoid; ReLU is the rectified linear unit, defined as follows:
f(x)=max(0,x)
wherein x is the input of the nonlinear function; different feature maps in a layer use different filter banks, while all units within one feature map share the same filter bank;
the pooling layer is used to fuse similar features so as to reliably detect patterns; maximum pooling and average pooling are two typical pooling methods: maximum pooling computes the maximum of a local block of units in the feature map, while average pooling computes their average; a block can be shifted by several rows or columns and used as input to adjacent pooling units.
5. The driver driving behavior recognition method according to claim 4, wherein in step S104, in the CS_ResNet model, two methods, average pooling and maximum pooling, are employed to calculate the channel attention;
average pooling is used to compress the spatial dimension of the input and learn the extent of the target object;
maximum pooling is used to collect clues about distinctive object features;
for a given intermediate feature map X_c ∈ R^{C×H×W} as input to the channel attention module, the average pooling Z_a and the maximum pooling Z_m are calculated as follows:
Z_a = (1/(H·W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_c(h, w)
Z_m = max{X_c(1,1), ..., X_c(1,W); X_c(2,1), ..., X_c(2,W); ...; X_c(H,1), ..., X_c(H,W)}
wherein Z_a and Z_m denote the average-pooled output and the maximum-pooled output, respectively;
in order to fully capture the dependencies between channels, the outputs of the average pooling and maximum pooling operations are passed in turn into a convolutional layer and a nonlinear transformation layer, and the results of the two layers are then fused through a fusion module; the fusion module is composed of a multilayer perceptron and generates the CA map Y_c ∈ R^{C/r×1×1}, wherein the perceptron is designed as a fully connected layer with a dimensionality reduction ratio r; a Sigmoid excitation mechanism is adopted to give the model flexibility;
the channel attention module is described as follows:
Y_c = g_s(P_avg(X_c) + P_max(X_c))
wherein g_s denotes Sigmoid activation, + denotes fusion by the fully connected perceptron, and P_avg and P_max are average pooling and maximum pooling, respectively.
6. The driver driving behavior recognition method according to claim 5, wherein step S104, in CS_ResNet, further comprises:
introducing a spatial attention module into the constructed model to highlight attention to valuable regions; for a feature map X_s ∈ R^{C×H×W}, an effective spatial attention module exploits the spatial relationships of the features and corresponds to Y_s ∈ R^{1×H×W}; given the input X_s of the spatial attention module, it passes sequentially through average pooling, maximum pooling, a convolution operation, and a nonlinear transformation to obtain the output Y_s of the spatial attention module; a feature map of size 1 × H × W is thus obtained by using average pooling and maximum pooling;
specifically, to generate a 2D spatial attention map across channels, given the input X_s, the output of the spatial attention module is calculated as follows:
Y_s = g_s(Cat(P_max(X_s), P_avg(X_s)))
wherein g_s denotes the Sigmoid activation function, Cat denotes the concatenation operation, P_avg is average pooling, and P_max is maximum pooling;
assuming the input of CS_ResNet is X_r, the output is calculated as follows:
Y_r = g_r((((g_r(X_r * k) * k) × Y_c) × Y_s) + X_r)
wherein g_r denotes the ReLU activation function, k denotes a convolution kernel, and *, ×, and + are the convolution, multiplication, and addition operations, respectively; Y_c is the output of the channel attention module, and Y_s is the output of the spatial attention module.
7. The driver driving behavior recognition method according to claim 1, wherein in step S105, the global feature is formed by fusing the local features through the fully connected layer, and the softmax classifier is then used to calculate the score of each category, thereby obtaining the driving behavior recognition result.
8. A driver driving behavior recognition apparatus characterized by comprising:
the image acquisition unit is used for acquiring a driver image containing the driving behavior of the driver;
a random cropping unit, configured to process the driver image using a random-cropping data enhancement technique to reduce its resolution, thereby obtaining first image data in a three-dimensional tensor format;
a convolution operation unit, configured to process the first image data in the three-dimensional tensor format using a convolutional neural network to generate second image data; the convolution operation employs a tensor-based three-dimensional convolution to reduce the dimensionality of the first image data and create invariance to small distortions and shifts;
an input unit, configured to input the second image data into the constructed CS_ResNet model; in the CS_ResNet model, a channel attention module and a spatial attention module are serially connected and embedded in a residual network, both modules using maximum pooling and average pooling to reduce model computational complexity and improve recognition accuracy;
and a classification unit, configured to fuse the local features output by the residual network model through a fully connected layer to form a global feature, and then calculate the score of each category using a classifier.
CN202110569233.3A 2021-05-25 2021-05-25 Method, device and equipment for identifying driving behavior of driver and readable storage medium Pending CN113283338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110569233.3A CN113283338A (en) 2021-05-25 2021-05-25 Method, device and equipment for identifying driving behavior of driver and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110569233.3A CN113283338A (en) 2021-05-25 2021-05-25 Method, device and equipment for identifying driving behavior of driver and readable storage medium

Publications (1)

Publication Number Publication Date
CN113283338A true CN113283338A (en) 2021-08-20

Family

ID=77281276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110569233.3A Pending CN113283338A (en) 2021-05-25 2021-05-25 Method, device and equipment for identifying driving behavior of driver and readable storage medium

Country Status (1)

Country Link
CN (1) CN113283338A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241454A (en) * 2021-12-20 2022-03-25 东南大学 Method for recognizing distracted driving by using remapping attention
CN114241453A (en) * 2021-12-20 2022-03-25 东南大学 Driver distraction monitoring method utilizing key point attention
CN114343640A (en) * 2022-01-07 2022-04-15 北京师范大学 Attention assessment method and electronic equipment
CN117842923A (en) * 2024-02-06 2024-04-09 浙江驿公里智能科技有限公司 Control system and method of intelligent full-automatic oiling robot

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871777A (en) * 2019-01-23 2019-06-11 广州智慧城市发展研究院 A kind of Activity recognition system based on attention mechanism
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN111428699A (en) * 2020-06-10 2020-07-17 南京理工大学 Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism
CN112016499A (en) * 2020-09-04 2020-12-01 山东大学 Traffic scene risk assessment method and system based on multi-branch convolutional neural network
CN112149504A (en) * 2020-08-21 2020-12-29 浙江理工大学 Motion video identification method combining residual error network and attention of mixed convolution

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871777A (en) * 2019-01-23 2019-06-11 广州智慧城市发展研究院 A kind of Activity recognition system based on attention mechanism
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN111428699A (en) * 2020-06-10 2020-07-17 南京理工大学 Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism
CN112149504A (en) * 2020-08-21 2020-12-29 浙江理工大学 Motion video identification method combining residual error network and attention of mixed convolution
CN112016499A (en) * 2020-09-04 2020-12-01 山东大学 Traffic scene risk assessment method and system based on multi-branch convolutional neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEI ZHAO ET AL.: "Driver behavior detection via adaptive spatial attention mechanism", 《ADVANCED ENGINEERING INFORMATICS 》 *
S. JI ET AL.: "3D Convolutional Neural Networks for Human Action Recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
WOO S. ET AL.: "CBAM: Convolutional Block Attention Module", 《ECCV 2018. LECTURE NOTES IN COMPUTER SCIENCE》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241454A (en) * 2021-12-20 2022-03-25 东南大学 Method for recognizing distracted driving by using remapping attention
CN114241453A (en) * 2021-12-20 2022-03-25 东南大学 Driver distraction monitoring method utilizing key point attention
CN114241453B (en) * 2021-12-20 2024-03-12 东南大学 Driver distraction driving monitoring method utilizing key point attention
CN114241454B (en) * 2021-12-20 2024-04-23 东南大学 Method for identifying distraction driving by using remapping attention
CN114343640A (en) * 2022-01-07 2022-04-15 北京师范大学 Attention assessment method and electronic equipment
CN114343640B (en) * 2022-01-07 2023-10-13 北京师范大学 Attention assessment method and electronic equipment
CN117842923A (en) * 2024-02-06 2024-04-09 浙江驿公里智能科技有限公司 Control system and method of intelligent full-automatic oiling robot

Similar Documents

Publication Publication Date Title
CN110059582B (en) Driver behavior identification method based on multi-scale attention convolution neural network
CN113283338A (en) Method, device and equipment for identifying driving behavior of driver and readable storage medium
US11783601B2 (en) Driver fatigue detection method and system based on combining a pseudo-3D convolutional neural network and an attention mechanism
Kopuklu et al. Driver anomaly detection: A dataset and contrastive learning approach
CN108388888B (en) Vehicle identification method and device and storage medium
US20220277558A1 (en) Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium
Moslemi et al. Driver distraction recognition using 3d convolutional neural networks
CN115082698B (en) Distraction driving behavior detection method based on multi-scale attention module
Xiao et al. Attention-based deep neural network for driver behavior recognition
CN106709475A (en) Obstacle recognition method and device, computer equipment and readable storage medium
CN205230272U (en) Driver drive state monitoring system
CN111967319B (en) Living body detection method, device, equipment and storage medium based on infrared and visible light
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN111860316A (en) Driving behavior recognition method and device and storage medium
CN111950362B (en) Golden monkey face image recognition method, device, equipment and storage medium
CN115331205A (en) Driver fatigue detection system with cloud edge cooperation
He et al. A lightweight architecture for driver status monitoring via convolutional neural networks
Wagner et al. Vision based detection of driver cell phone usage and food consumption
Suresh et al. Driver drowsiness detection using deep learning
CN114360073A (en) Image identification method and related device
CN114492634A (en) Fine-grained equipment image classification and identification method and system
Pandey et al. Dumodds: Dual modeling approach for drowsiness detection based on spatial and spatio-temporal features
CN115205923A (en) Micro-expression recognition method based on macro-expression state migration and mixed attention constraint
CN114937300A (en) Method and system for identifying shielded face
CN113537176A (en) Method, device and equipment for determining fatigue state of driver

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination