CN110059582A - Driving behavior recognition method based on a multi-scale attention convolutional neural network - Google Patents
Driving behavior recognition method based on a multi-scale attention convolutional neural network
- Publication number
- CN110059582A CN110059582A CN201910242262.1A CN201910242262A CN110059582A CN 110059582 A CN110059582 A CN 110059582A CN 201910242262 A CN201910242262 A CN 201910242262A CN 110059582 A CN110059582 A CN 110059582A
- Authority
- CN
- China
- Prior art keywords
- attention
- multi-scale
- convolution
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a driving behavior recognition method based on a multi-scale attention convolutional neural network, comprising the following steps: (1) capture an image data set for driving behavior recognition; (2) apply data augmentation to the captured driving behavior data set and include the augmented samples in the training data; (3) construct a neural network model consisting of three modules: a multi-scale convolution module, an attention module, and a classification module; (4) train the multi-scale attention convolutional neural network, building the model with the open-source PyTorch toolkit and optimizing the network parameters with stochastic gradient descent; (5) test the trained network. By introducing a multi-scale model and an attention mechanism into the driving behavior recognition task to extract discriminative fine-grained behavior features, the invention further improves driving behavior recognition accuracy.
Description
Technical field
The present invention relates to image processing and pattern recognition technology, and in particular to a driving behavior recognition method based on a multi-scale attention convolutional neural network.
Background art
In recent years, with rising levels of technology and living standards, the automobile has entered countless households; domestic car ownership has reached 325 million, second only to the United States. While the popularization of the automobile brings great convenience to people's travel, it also poses potential hazards to traffic safety. According to statistics from the Ministry of Transport of China, 212,846 traffic accidents occurred nationwide in 2017, causing 63,093 deaths, and more than 80% of those accidents were closely related to drivers' rule-violating behavior. Owing to weak traffic-law awareness, unsafe driving behaviors such as using a mobile phone or smoking while driving are relatively common. In real life, unsafe driving behavior seriously distracts the driver and slows reaction and movement; in mild cases it causes traffic jams, and in severe cases traffic accidents. Research on driving behavior recognition algorithms is therefore of great significance to road safety management and intelligent transportation.
In advanced driver assistance systems (Advanced Driver Assistance Systems, ADAS), a built-in in-vehicle camera can capture the driver's behavioral state. However, the accuracy of vision-based automatic driving behavior recognition is still relatively low, and a series of challenges remain:
(1) Different driving behaviors, such as normal driving, driving with hands off the steering wheel, and smoking while driving, all belong to the same broad category of driving; the inter-class variance of these sub-classes at the image level is very small and their global features are highly similar, differing only in certain local details.
(2) Different drivers have different driving habits; for example, the way they hold the steering wheel varies significantly. As a result, driver postures show large intra-class variance in images, and illumination and occlusion add further difficulty to accurate recognition.
Summary of the invention
The technical problem to be solved by the present invention is to provide a driving behavior recognition method based on a multi-scale attention convolutional neural network. A multi-scale model and an attention mechanism are introduced into the driving behavior recognition task to extract discriminative fine-grained behavior features, further improving driving behavior recognition accuracy.
To solve the above technical problem, the present invention provides a driving behavior recognition method based on a multi-scale attention convolutional neural network, comprising the following steps:
(1) capture an image data set for driving behavior recognition;
(2) apply data augmentation to the captured driving behavior data set and include the augmented samples in the training data;
(3) construct a neural network model consisting of three modules: a multi-scale convolution module, an attention module, and a classification module;
(4) train the multi-scale attention convolutional neural network: build the network model with the open-source PyTorch toolkit and optimize the network parameters with stochastic gradient descent;
(5) test the trained network.
Preferably, in step (1), the data cover 6 different driving behaviors: C0: safe driving; C1: driving with hands off the steering wheel; C2: talking on the phone while driving; C3: looking down at a mobile phone; C4: smoking while driving; C5: talking with a passenger.
Preferably, in step (2), applying data augmentation to the captured driving behavior data set and including the augmented samples in the training data specifically comprises the following steps:
(21) use random cropping for augmentation: normalize each input image to 256 × 256 and randomly select 224 × 224 image patches as training samples;
(22) use content-based augmentation, including small-angle rotation, mirroring, noise injection, and Gaussian smoothing;
(23) if the training set contains K training samples, denote it X = {χ_1, χ_2, ..., χ_K}; the k-th sample in the training set is χ_k = {I_k, l_k}, where I_k is the k-th three-channel image, of size 224 × 224 × 3, and l_k is its class label.
Preferably, in step (3), the multi-scale convolution module takes the original image as input, filters it layer by layer with convolution kernels of different scales, and uses a maximum-selection unit as the activation of each multi-scale convolution block, adaptively fusing multi-scale information at each layer to extract preliminary behavior features. The attention module refines the behavior features: it learns a pixel-level weight matrix and a channel-level weight matrix to obtain the pixel-level and channel-level saliency of the behavior features, and refines them with a soft-attention strategy. The classification module classifies the driving behavior through a fully connected layer and a softmax layer.
Preferably, in step (3), constructing the neural network model specifically comprises the following steps:
(31) The designed network takes a 224 × 224 × 3 original image as input. The first layer is a basic convolutional layer that filters the original image with 64 convolution kernels of size 7 × 7 × 3; a max-pooling layer then reduces the input to a 56 × 56 × 64 feature map:
x_bc = σ(I * W + b)   (1)
F_bc = down(x_bc)   (2)
where * denotes convolution, θ_bc = {W, b} are the weights and biases of the basic convolutional layer, σ(·) is the ReLU activation, down(·) denotes max pooling, and F_bc is the output feature map of the basic convolutional layer.
The remaining convolutional layers are stacked from 8 multi-scale convolution blocks. Each block is a parallel combination of filter kernels of 4 different scales (1 × 1, 3 × 3, 5 × 5, 7 × 7); a maximum-selection unit adaptively fuses the multi-scale information within each block, and residual learning suppresses gradient explosion and vanishing.
The l-th multi-scale convolution block convolves the feature map output by the previous block:
x^(l) = F^(l-1) * W^(l) + b^(l),   l = 1, 2, ..., 8   (3)
where θ^(l) = {W^(l), b^(l)} are the weights and biases of the l-th multi-scale convolution block, F^(l-1) is the output of the previous block, and x^(l) is the multi-scale convolution feature map of the l-th block; the input of the first block is the output feature map of the basic convolution.
For a given mini-batch, the outputs of the l-th block can be written {x_k^(l)}, k = 1, ..., K, and the mean and variance of the batch are
μ^(l) = E(x^(l)) = (1/K) Σ_k x_k^(l)   (4)
(σ^(l))² = Var(x^(l)) = (1/K) Σ_k (x_k^(l) − μ^(l))²   (5)
where K is the batch size, x_k^(l) is the multi-scale convolution output of the l-th block on the k-th sample, and E(·) and Var(·) denote the batch mean and variance.
The batch-normalized feature is
x̂^(l) = α (x^(l) − μ^(l)) / √((σ^(l))² + ε) + β   (6)
where ε is a small positive constant close to 0 for numerical stability, α and β are scale and shift parameters, and x̂^(l) is the normalized feature.
The maximum-selection unit adaptively fuses the multi-scale convolution feature maps. Writing the normalized feature of the l-th block as x̂_scale^(l)(c, i, j), where (c, i, j) indexes the channel and spatial coordinates and scale records the corresponding kernel size (1 × 1, 3 × 3, 5 × 5, 7 × 7), the output of the maximum-selection unit is
y^(l)(c, i, j) = max_scale x̂_scale^(l)(c, i, j)   (7)
i.e. the value of y^(l) at (c, i, j) is the maximum over the feature maps of the different scales at that position.
The output of the multi-scale convolution block is
F^(l) = σ(F^(l-1) + y^(l))   (8)
where F^(l-1) and F^(l) are the outputs of the previous and the l-th block respectively, and σ(·) is the ReLU activation.
After the 8 multi-scale convolution blocks, the output of the multi-scale convolution module is F^(8), a feature map of size 7 × 7 × 512.
(32) The attention module takes the feature map F^(8) of the last multi-scale convolution block as input; the attention mechanism guides the network to focus on salient representations and thereby refines the features.
The model uses pixel-level and channel-level attention. The pixel attention layer takes the convolution feature map as input and learns a pixel weight matrix to weigh the importance of each pixel in the feature map:
α_p = tanh(W_pa U + b_pa)   (9)
where U is the input feature map reshaped into a two-dimensional matrix, θ_pa = {W_pa, b_pa} are weights and biases, tanh(·) is the hyperbolic tangent, and α_p is the computed pixel-level weight matrix, which reflects the importance of each pixel for behavior recognition.
The output pixel attention feature map is the product of the input convolution feature map and the pixel-level weight matrix:
F_pa = PA(F^(8)) = α_p ⊗ F^(8)   (10)–(11)
where ⊗ denotes matrix multiplication and PA(·) denotes the mapping from the input feature map to the output attention feature map; the final pixel attention feature map is F_pa.
The channel attention layer takes the convolution feature map as input and learns a channel weight matrix for the contribution of each channel of the feature map to behavior classification:
α_c = tanh(W_ca V + b_ca)   (12)
where V is the input feature map reshaped into a two-dimensional matrix, θ_ca = {W_ca, b_ca} are weights and biases, tanh(·) is the hyperbolic tangent, and α_c is the computed channel-level weight matrix, which reflects the importance of each channel of the feature map for behavior recognition.
The output channel attention feature map is the product of the input convolution feature map and the channel-level weight matrix:
F_ca = CA(F^(8)) = α_c ⊗ F^(8)   (13)–(14)
where ⊗ denotes matrix multiplication and CA(·) denotes the mapping from the input feature map to the output attention feature map; the final channel attention feature map is F_ca.
Pixel attention and channel attention are applied in parallel, and the final attention feature map output from the convolution feature map is the additive fusion of the two:
F_att = PA(F^(8)) + CA(F^(8))   (15)
where F^(8) is the input feature map of the last multi-scale convolution block, PA(·) and CA(·) denote pixel and channel attention respectively, and F_att is the final attention feature map.
(33) The classification module consists of one fully connected layer and one softmax layer; it takes the attention feature map F_att as input, and its final output is the probability of each driving behavior class.
The fully connected layer reduces the 7 × 7 × 512 attention feature map to a 1000-dimensional feature vector:
f = W_fc F_att + b_fc   (16)
where θ_fc = {W_fc, b_fc} are the weights and biases of the fully connected layer and f is the output 1000-dimensional feature vector.
In the softmax layer, the number of output units equals the number of behavior classes, and the outputs are the class probabilities computed by the softmax classifier:
P(j) = exp(s_j) / Σ_n exp(s_n),   with s = W_cls f + b_cls   (17)
where P(j) is the posterior probability that feature f belongs to class j, θ_cls = {W_cls, b_cls} are weights and biases, and Score = {s_1, s_2, ..., s_n} are the scores from which the softmax layer outputs the probability distribution over the different behavior classes.
Preferably, in step (4), the multi-scale attention convolutional neural network is trained: the network model is built with the open-source PyTorch toolkit and the network parameters are optimized with stochastic gradient descent. The cross-entropy loss function measures the distance between the true label and the prediction:
L = −log P(l)   (18)
where l is the ground-truth class label and P(j), the output of the softmax layer, is the posterior probability of class j.
For a mini-batch, the parameters of the whole network are optimized under the supervision of the softmax loss:
L = −(1/K) Σ_k log P(l_k) + λ ||θ||   (19)
where ||θ|| is the regularization term of the loss function, which mitigates overfitting that may occur during network training.
Preferably, in step (5), the testing is specifically: given a driver image to be recognized, the test image is normalized to a size of 224 × 224 as the input of the network, and the behavior recognition result of the test image is obtained by forward propagation.
The beneficial effects of the invention are: (1) the invention uses a multi-scale convolution module to filter the original image, with a maximum-selection unit adaptively fusing the multi-scale features of each convolution block; (2) the invention uses an attention mechanism to weigh the channel saliency and pixel saliency of the feature maps for feature refinement and fine-grained behavior feature representation.
Brief description of the drawings
Fig. 1 shows samples of the different driving behaviors in the present invention.
Fig. 2 illustrates data augmentation in the present invention.
Fig. 3 shows the architecture of the multi-scale attention convolutional neural network model in the present invention.
Fig. 4 shows the multi-scale convolution block of the present invention.
Fig. 5 illustrates the attention mechanism in the present invention.
Specific embodiments
A driving behavior recognition method based on a multi-scale attention convolutional neural network comprises the following steps:
Step 1: capture an image data set for driving behavior recognition. All images are recorded by a built-in in-vehicle camera at different angles and under different lighting conditions. The driving behavior data set contains 42,816 pictures in total and covers 6 different driving behaviors, as shown in Fig. 1:
C0: safe driving;
C1: driving with hands off the steering wheel;
C2: talking on the phone while driving;
C3: looking down at a mobile phone;
C4: smoking while driving;
C5: talking with a passenger.
The captured image data set is divided into a training set of 17,087 pictures and a test set of 25,729 pictures.
Step 2: apply data augmentation to the captured driving behavior data set and include the augmented samples in the training data. Two augmentation methods are used, as follows:
Step 201: random cropping: normalize each input image to 256 × 256 and randomly select 224 × 224 image patches as training samples.
Step 202: content-based transformations, including small-angle rotation, mirroring, noise injection, and Gaussian smoothing, as shown in Fig. 2. Adding these augmented samples improves the noise resistance of the algorithm and effectively strengthens the robustness of the deep neural network.
Step 203: if the training set contains K training samples, denote it X = {χ_1, χ_2, ..., χ_K}; the k-th sample in the training set is χ_k = {I_k, l_k}, where I_k is the k-th three-channel image, of size 224 × 224 × 3, and l_k is its class label.
Step 3: construct the neural network model. The designed model consists of three modules: a multi-scale convolution module, an attention module, and a classification module. The structure of the network is shown in Fig. 3. The multi-scale convolution module takes the original image as input and filters it layer by layer with convolution kernels of different scales; a maximum-selection unit serves as the activation of each multi-scale convolution block, adaptively fusing the multi-scale information of each layer to extract preliminary behavior features. The attention module refines the behavior features: it learns a pixel-level weight matrix and a channel-level weight matrix to obtain the pixel-level and channel-level saliency of the behavior features, and refines them with a soft-attention strategy. The classification module classifies the driving behavior through a fully connected layer and a softmax layer. The details are as follows:
Step 301: the designed network takes a 224 × 224 × 3 original image as input. The first layer is a basic convolutional layer that filters the original image with 64 convolution kernels of size 7 × 7 × 3; a max-pooling layer then reduces the input to a 56 × 56 × 64 feature map:
x_bc = σ(I * W + b)   (1)
F_bc = down(x_bc)   (2)
where * denotes convolution, θ_bc = {W, b} are the weights and biases of the basic convolutional layer, σ(·) is the ReLU activation, down(·) denotes max pooling, and F_bc is the output feature map of the basic convolutional layer.
The remaining convolutional layers are stacked from 8 multi-scale convolution blocks. Each block is a parallel combination of filter kernels of 4 different scales (1 × 1, 3 × 3, 5 × 5, 7 × 7); a maximum-selection unit adaptively fuses the multi-scale information within each block, and residual learning suppresses gradient explosion and vanishing. The structure of the multi-scale convolution block is shown in Fig. 4.
Specifically, the l-th multi-scale convolution block convolves the feature map output by the previous block:
x^(l) = F^(l-1) * W^(l) + b^(l),   l = 1, 2, ..., 8   (3)
where θ^(l) = {W^(l), b^(l)} are the weights and biases of the l-th multi-scale convolution block, F^(l-1) is the output of the previous block, and x^(l) is the multi-scale convolution feature map of the l-th block. In particular, the input of the first block is the output feature map of the basic convolution.
Batch normalization follows each convolution operation to improve the generalization of network learning. For a given mini-batch, the outputs of the l-th block can be written {x_k^(l)}, k = 1, ..., K, and the mean and variance of the batch are
μ^(l) = E(x^(l)) = (1/K) Σ_k x_k^(l)   (4)
(σ^(l))² = Var(x^(l)) = (1/K) Σ_k (x_k^(l) − μ^(l))²   (5)
where K is the batch size, x_k^(l) is the multi-scale convolution output of the l-th block on the k-th sample, and E(·) and Var(·) denote the batch mean and variance.
The batch-normalized feature is
x̂^(l) = α (x^(l) − μ^(l)) / √((σ^(l))² + ε) + β   (6)
where ε is a small positive constant close to 0 for numerical stability, α and β are scale and shift parameters, and x̂^(l) is the normalized feature.
The maximum-selection unit adaptively fuses the multi-scale convolution feature maps. Writing the normalized feature of the l-th block as x̂_scale^(l)(c, i, j), where (c, i, j) indexes the channel and spatial coordinates and scale records the corresponding kernel size (1 × 1, 3 × 3, 5 × 5, 7 × 7), the output of the maximum-selection unit is
y^(l)(c, i, j) = max_scale x̂_scale^(l)(c, i, j)   (7)
i.e. the value of y^(l) at (c, i, j) is the maximum over the feature maps of the different scales at that position.
Residual learning is introduced in the multi-scale convolution blocks to improve the convergence of the network: through a shortcut connection, an identity mapping of the input is added to the output of the residual unit. The output of a multi-scale convolution block is therefore
F^(l) = σ(F^(l-1) + y^(l))   (8)
where F^(l-1) and F^(l) are the outputs of the previous and the l-th block respectively, and σ(·) is the ReLU activation.
After the 8 multi-scale convolution blocks, the output of the multi-scale convolution module is F^(8), a feature map of size 7 × 7 × 512.
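One multi-scale convolution block can be sketched in PyTorch roughly as follows. The channel count and the assumption that each block keeps its channel count unchanged are illustrative; the parallel 1/3/5/7 branches, batch normalization, per-position maximum selection, and residual ReLU follow the description above:

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Sketch of one multi-scale convolution block (Eqs. 3-8)."""

    def __init__(self, channels=512):
        super().__init__()
        # Four parallel branches with kernel sizes 1, 3, 5, 7; same-padding
        # keeps the spatial size so the scales stay aligned. Batch
        # normalization follows each convolution (Eqs. 4-6).
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2),
                nn.BatchNorm2d(channels),
            )
            for k in (1, 3, 5, 7)
        )

    def forward(self, f_prev):
        # Maximum-selection unit (Eq. 7): per-position max over the scales.
        y = torch.stack([b(f_prev) for b in self.branches]).amax(dim=0)
        # Residual shortcut plus ReLU (Eq. 8).
        return torch.relu(f_prev + y)
```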
Step 302: the attention module takes the feature map F^(8) of the last multi-scale convolution block as input. The attention mechanism guides the network to focus on salient representations and thereby refines the features: the attention model automatically emphasizes local detail information and suppresses redundant global context. The attention model used in this design is shown in Fig. 5.
The model uses pixel-level and channel-level attention. The pixel attention layer takes the convolution feature map as input and learns a pixel weight matrix to weigh the importance of each pixel in the feature map:
α_p = tanh(W_pa U + b_pa)   (9)
where U is the input feature map reshaped into a two-dimensional matrix, θ_pa = {W_pa, b_pa} are weights and biases, tanh(·) is the hyperbolic tangent, and α_p is the computed pixel-level weight matrix, which reflects the importance of each pixel for behavior recognition.
The output pixel attention feature map is the product of the input convolution feature map and the pixel-level weight matrix:
F_pa = PA(F^(8)) = α_p ⊗ F^(8)   (10)–(11)
where ⊗ denotes matrix multiplication and PA(·) denotes the mapping from the input feature map to the output attention feature map; the final pixel attention feature map is F_pa.
Similarly, the channel attention layer takes the convolution feature map as input and learns a channel weight matrix for the contribution of each channel of the feature map to behavior classification:
α_c = tanh(W_ca V + b_ca)   (12)
where V is the input feature map reshaped into a two-dimensional matrix, θ_ca = {W_ca, b_ca} are weights and biases, tanh(·) is the hyperbolic tangent, and α_c is the computed channel-level weight matrix, which reflects the importance of each channel of the feature map for behavior recognition.
The output channel attention feature map is the product of the input convolution feature map and the channel-level weight matrix:
F_ca = CA(F^(8)) = α_c ⊗ F^(8)   (13)–(14)
where ⊗ denotes matrix multiplication and CA(·) denotes the mapping from the input feature map to the output attention feature map; the final channel attention feature map is F_ca.
Pixel attention and channel attention are applied in parallel, and the final attention feature map output from the convolution feature map is the additive fusion of the two:
F_att = PA(F^(8)) + CA(F^(8))   (15)
where F^(8) is the input feature map of the last multi-scale convolution block, PA(·) and CA(·) denote pixel and channel attention respectively, and F_att is the final attention feature map.
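One plausible PyTorch reading of the parallel pixel-level and channel-level attention is sketched below. The patent does not fully specify the shapes of the weight matrices, so here each tanh score gates a single spatial position or a single channel; treat this as an interpretation, not the definitive implementation:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Sketch of parallel pixel and channel attention (Eqs. 9-15)."""

    def __init__(self, channels=512, hw=7 * 7):
        super().__init__()
        self.pixel = nn.Linear(channels, 1)  # produces alpha_p (Eq. 9)
        self.chan = nn.Linear(hw, 1)         # produces alpha_c (Eq. 12)

    def forward(self, f):                    # f: (B, C, H, W), e.g. F^(8)
        b, c, h, w = f.shape
        flat = f.view(b, c, h * w)
        # Pixel attention: one tanh weight per spatial position.
        a_p = torch.tanh(self.pixel(flat.transpose(1, 2)))  # (B, HW, 1)
        f_pa = (flat * a_p.transpose(1, 2)).view(b, c, h, w)
        # Channel attention: one tanh weight per channel.
        a_c = torch.tanh(self.chan(flat))                   # (B, C, 1)
        f_ca = (flat * a_c).view(b, c, h, w)
        # Eq. (15): additive fusion F_att = PA(F) + CA(F).
        return f_pa + f_ca
```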
Step 303: the classification module consists of one fully connected layer and one softmax layer. It takes the attention feature map F_att as input, and its final output is the probability of each driving behavior class.
Specifically, the fully connected layer reduces the 7 × 7 × 512 attention feature map to a 1000-dimensional feature vector:
f = W_fc F_att + b_fc   (16)
where θ_fc = {W_fc, b_fc} are the weights and biases of the fully connected layer and f is the output 1000-dimensional feature vector.
In the softmax layer, the number of output units equals the number of behavior classes, and the outputs are the class probabilities computed by the softmax classifier:
P(j) = exp(s_j) / Σ_n exp(s_n),   with s = W_cls f + b_cls   (17)
where P(j) is the posterior probability that feature f belongs to class j, θ_cls = {W_cls, b_cls} are weights and biases, and Score = {s_1, s_2, ..., s_n} are the scores from which the softmax layer outputs the probability distribution over the different behavior classes.
Step 4: train the multi-scale attention convolutional neural network. The network model is built with the PyTorch open-source toolkit, and the network parameters are optimized by stochastic gradient descent.
The cross-entropy loss function is used to measure the distance between the true label and the prediction result, which may be specifically expressed as:
L(χ_k) = -log P(l_k)    (18)
where l denotes the ground-truth class label and P(j), i.e. the output of the softmax layer, denotes the posterior probability of belonging to the j-th class.
For a batch of data, the parameters of the whole network can be optimized under the supervision of the softmax loss, which may be specifically expressed as:
L = -(1/K) Σ_{k=1}^{K} log P(l_k) + λ‖θ‖    (19)
where ‖θ‖ denotes the regularization term of the loss function, used to mitigate the overfitting that may occur during network training.
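The training step above can be sketched with PyTorch's stock components: `CrossEntropyLoss` supplies the softmax cross-entropy of Eq. (18), and the optimizer's `weight_decay` plays the role of the regularization term ‖θ‖ in Eq. (19). The tiny linear model, learning rate, and batch size here are placeholders, not the patent's settings.

```python
import torch
import torch.nn as nn

# Step 4 sketch: stochastic gradient descent on a softmax cross-entropy
# loss with L2 regularisation (weight_decay ~ the ||theta|| term).
torch.manual_seed(0)
model = nn.Linear(10, 6)              # placeholder for the full network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()     # L = -log P(l), Eq. (18)

x = torch.randn(8, 10)                # a batch of K = 8 feature vectors
labels = torch.randint(0, 6, (8,))    # ground-truth labels l_k
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(model(x), labels)   # batch cross-entropy
    loss.backward()
    optimizer.step()                     # SGD parameter update
```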
Step 5: test the multi-column fusion convolutional neural network. Given a driver image to be recognized, the test image is normalized to a size of 224 × 224 as the input of the multi-column fusion convolutional neural network, and the behavior recognition result of the test image is obtained through the forward propagation of the multi-column fusion network.
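A sketch of this test-time pipeline: an image of arbitrary size is resized to 224 × 224 and classified in a single forward pass. `net` below is a small stand-in for the trained network; a real run would load the trained weights instead of this placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step 5 sketch: normalise the test image to 224x224, then one forward
# pass yields the behaviour recognition result (argmax over classes).
net = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224 * 3, 6))  # stand-in

image = torch.randn(1, 3, 480, 640)                 # raw driver image
image = F.interpolate(image, size=(224, 224),       # normalise to 224x224
                      mode='bilinear', align_corners=False)
pred = net(image).argmax(dim=1).item()              # behaviour class C0..C5
```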
Claims (7)
1. A driving behavior recognition method based on a multi-scale attention convolutional neural network, characterized by comprising the following steps:
(1) capturing an image data set for driving behavior recognition;
(2) performing data augmentation on the captured driving behavior data set and including the augmented samples in the training data;
(3) constructing a neural network model comprising three modules: a multi-scale convolution module, an attention module, and a classification module;
(4) training the multi-scale attention convolutional neural network, wherein the network model is built with the PyTorch open-source toolkit and the network parameters are optimized by stochastic gradient descent;
(5) testing the multi-column fusion convolutional neural network.
2. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (1), the driving behavior covers 6 different driving behaviors: C0: safe driving; C1: driving with hands off the steering wheel; C2: driving while making a phone call; C3: looking down at a mobile phone; C4: driving while smoking; C5: talking with a passenger.
3. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (2), performing data augmentation on the captured driving behavior data set and including the augmented samples in the training data specifically comprises the following steps:
(21) using a random-cropping data augmentation method: the input image is normalized to 256 × 256, and 224 × 224 image blocks are randomly selected as training samples;
(22) using content-based image transformation methods for data augmentation, including small-angle rotation, mirroring, noise addition, and Gaussian smoothing;
(23) if the training set contains N training samples, it is denoted X = {χ_1, χ_2, ..., χ_N}, and the k-th sample in the training set is expressed as χ_k = {I_k, l_k}, where I_k denotes the k-th three-channel image of size 224 × 224 × 3 and l_k denotes its corresponding class label.
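The augmentations of steps (21)–(22) can be sketched directly in torch. Mirroring and random cropping follow the claim; small-angle rotation and Gaussian smoothing are omitted for brevity, and the noise level 0.01 is an illustrative assumption, not a value from the patent.

```python
import torch

# Sketch of claim 3: normalise the input to 256x256, randomly crop a
# 224x224 block, then apply content transforms (mirror, Gaussian noise).
def augment(img_256: torch.Tensor) -> torch.Tensor:
    """img_256: (3, 256, 256) -> one augmented (3, 224, 224) sample."""
    top = int(torch.randint(0, 256 - 224 + 1, (1,)))
    left = int(torch.randint(0, 256 - 224 + 1, (1,)))
    crop = img_256[:, top:top + 224, left:left + 224]  # random 224x224 crop
    if torch.rand(1).item() < 0.5:
        crop = torch.flip(crop, dims=[2])              # horizontal mirror
    return crop + 0.01 * torch.randn_like(crop)        # additive Gaussian noise

sample = augment(torch.rand(3, 256, 256))
```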
4. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (3), the multi-scale convolution module takes the original image as input and filters the image layer by layer with convolution kernels of different scales, with a maximum-selection unit as the excitation function of each multi-scale convolution block, so as to adaptively fuse the multi-scale information layer by layer and preliminarily extract behavior features; the attention module refines the behavior features: it obtains the pixel-level saliency and channel-level saliency of the behavior features by learning a pixel-level weight matrix and a channel-level weight matrix, and refines the behavior features with a soft-attention strategy; the classification module classifies the driving behavior through a fully connected layer and a softmax layer.
5. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (3), constructing the neural network model specifically comprises the following steps:
(31) the designed network framework takes a 224 × 224 × 3 original image as input; the first layer is a basic convolutional layer, which filters the original image with 64 convolution kernels of size 7 × 7 × 3, and a maximum pooling layer reduces the input to a 56 × 56 × 64 feature map, which is specifically expressed as:
x_bc = σ(I * W + b)    (1)
F_bc = down(x_bc)    (2)
where * denotes the convolution operation, θ_bc = {W, b} denotes the weight and threshold parameters of the basic convolutional layer, σ(·) denotes the ReLU excitation function, down(·) denotes the maximum pooling operation, and F_bc denotes the output feature map of the basic convolutional layer;
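Eqs. (1)–(2) map onto standard PyTorch layers as below. The stride and padding values are assumptions chosen so that the 224 × 224 × 3 input comes out as the stated 56 × 56 × 64 feature map; the patent does not state them explicitly.

```python
import torch
import torch.nn as nn

# Basic convolutional layer sketch: 64 kernels of size 7x7x3 filter the
# input (Eq. (1)), ReLU excites, and max pooling reduces the result to
# 56x56x64 (Eq. (2)). Stride/padding are assumed to match those sizes.
basic_conv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # x_bc = I * W + b
    nn.ReLU(),                                             # sigma(.)
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),      # down(.)
)
f_bc = basic_conv(torch.randn(1, 3, 224, 224))             # (1, 64, 56, 56)
```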
the remaining convolutional layers are stacked from 8 multi-scale convolution blocks; each multi-scale convolution block is formed by the parallel combination of filtering kernels of 4 different scales (1 × 1, 3 × 3, 5 × 5, 7 × 7), realizes adaptive multi-scale information fusion through a maximum-selection unit, and uses residual learning to suppress gradient explosion and gradient vanishing;
the l-th multi-scale convolution block convolves the feature map output by the previous block, expressed as:
x^(l) = F^(l-1) * W^(l) + b^(l),  l = 1, 2, ..., 8    (3)
where θ^(l) = {W^(l), b^(l)} denotes the weight and threshold parameters of the l-th multi-scale convolution block, F^(l-1) denotes the output of the previous multi-scale convolution block, and x^(l) denotes the multi-scale convolution feature map of the l-th block; the input of the first multi-scale convolution block is the output feature map of the basic convolution;
for a given batch of samples, the convolution output of the l-th block is denoted {x_k^(l)}, k = 1, ..., K, and the expectation and variance of the batch data are denoted as:
E(x^(l)) = (1/K) Σ_{k=1}^{K} x_k^(l)    (4)
Var(x^(l)) = (1/K) Σ_{k=1}^{K} (x_k^(l) - E(x^(l)))^2    (5)
where K denotes the number of samples in the batch, x_k^(l) denotes the multi-scale convolution output of the k-th sample on the l-th block, and E(·) and Var(·) denote the expectation and variance of the batch samples respectively;
the feature after batch standardization is expressed as:
x̂^(l) = α · (x^(l) - E(x^(l))) / √(Var(x^(l)) + ε) + β    (6)
where ε takes a positive constant close to 0 to improve the generalization ability of the feature standardization, α and β denote the scale and offset transformation parameters, and x̂^(l) denotes the standardized feature;
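Eqs. (4)–(6) are exactly batch standardization, and can be reproduced with a few tensor operations (the batch size, channel count, and α = 1, β = 0 below are illustrative values):

```python
import torch

# Batch standardisation sketch: per-channel batch mean and variance
# normalise the multi-scale convolution output (Eqs. (4)-(5)), then the
# learnable scale alpha and offset beta transform it back (Eq. (6)).
x = torch.randn(8, 16, 7, 7)                    # a batch of K = 8 feature maps
mean = x.mean(dim=(0, 2, 3), keepdim=True)      # E(x^(l)), Eq. (4)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # Var(x^(l)), Eq. (5)
eps, alpha, beta = 1e-5, 1.0, 0.0               # eps: small positive constant
x_hat = alpha * (x - mean) / torch.sqrt(var + eps) + beta  # Eq. (6)
```

In practice this is what `nn.BatchNorm2d` computes, with α and β as its learnable `weight` and `bias`.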
a maximum-selection unit is used to adaptively fuse the multi-scale convolution feature maps; the standardized feature value of the l-th block is expressed as x̂_scale^(l)(c, i, j), where (c, i, j) denotes the channel and coordinates of the standardized feature, and scale records the corresponding convolution kernel size (1 × 1, 3 × 3, 5 × 5, 7 × 7); the output of the maximum-selection unit is expressed as:
y^(l)(c, i, j) = max_scale x̂_scale^(l)(c, i, j)    (7)
where the value of the output y^(l) of the maximum-selection unit at (c, i, j) is the maximum value over the feature maps of different scales at position (c, i, j);
the output of the multi-scale convolution block is expressed as:
F^(l) = σ(F^(l-1) + y^(l))    (8)
where F^(l-1) and F^(l) denote the output of the previous block and the output of the l-th block respectively, and σ(·) denotes the ReLU excitation function; after the 8 multi-scale convolution blocks, the output of the multi-scale convolution module is denoted F^(8), and the size of the feature map is 7 × 7 × 512;
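One such multi-scale convolution block can be sketched as follows; this is a minimal reading of Eqs. (3)–(8), assuming "same" padding for each kernel size and equal channel counts across branches (neither is stated in the text), with small tensor sizes for the demonstration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Sketch of one multi-scale convolution block: four parallel kernels
    (1x1, 3x3, 5x5, 7x7) filter the input (Eq. (3)), batch standardisation
    follows (Eqs. (4)-(6)), a maximum-selection unit keeps the largest
    response at every (c, i, j) (Eq. (7)), and a residual connection with
    ReLU gives the block output (Eq. (8))."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2),  # Eq. (3)
                nn.BatchNorm2d(channels),                          # Eqs. (4)-(6)
            ) for k in (1, 3, 5, 7))

    def forward(self, f_prev):
        scales = torch.stack([b(f_prev) for b in self.branches])   # (4,B,C,H,W)
        y = scales.max(dim=0).values                               # Eq. (7)
        return torch.relu(f_prev + y)                              # Eq. (8)

out = MultiScaleBlock(16)(torch.randn(2, 16, 28, 28))
```

Because every branch preserves the spatial size, the element-wise maximum and the residual addition are well defined, and 8 such blocks can be stacked directly.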
(32) the attention module takes the feature map F^(8) of the last multi-scale convolution block as input; the attention mechanism guides the network to attend to salient characterizations so as to realize feature refinement;
a pixel-level attention mechanism and a channel-level attention mechanism are used in the model; the pixel attention layer takes the convolutional feature map as input and weighs the importance of each pixel in the feature map by learning a pixel weight matrix, expressed as:
α_p = tanh(W_pa U + b_pa)    (9)
where U denotes the two-dimensional matrix form of the input feature map, θ_pa = {W_pa, b_pa} denotes the weight and threshold parameters, tanh(·) denotes the hyperbolic tangent function, and α_p denotes the computed pixel-level weight matrix, which reflects the importance of each pixel for behavior recognition;
the pixel attention feature map finally output is the matrix product of the input convolutional feature map and the pixel-level weights, specifically expressed as:
U_att = α_p ⊗ U    (10)
PA(F^(8)) = U_att    (11)
where ⊗ denotes matrix multiplication, PA(·|·) denotes a mapping from the input feature map to the output attention feature map, and U_att denotes the pixel attention feature map finally output;
the channel attention layer takes the convolutional feature map as input and learns, through a channel weight matrix, the contribution of each channel of the feature map to behavior classification, expressed as:
α_c = tanh(W_ca V + b_ca)    (12)
where V denotes the two-dimensional matrix form of the input feature map, θ_ca = {W_ca, b_ca} denotes the weight and threshold parameters, tanh(·) denotes the hyperbolic tangent function, and α_c denotes the computed channel-level weight matrix, which reflects the importance of each channel of the feature map for behavior recognition;
the channel attention feature map finally output is the matrix product of the input convolutional feature map and the channel-level weights, specifically expressed as:
V_att = α_c ⊗ V    (13)
CA(F^(8)) = V_att    (14)
where ⊗ denotes matrix multiplication, CA(·|·) denotes a mapping from the input feature map to the output attention feature map, and V_att denotes the channel attention feature map finally output;
pixel attention and channel attention are applied to the convolutional feature map in parallel, and the attention feature map finally output is the additive fusion of the two, expressed as:
F_att = PA(F^(8)) + CA(F^(8))    (15)
where F^(8) denotes the feature map of the last multi-scale convolution block input to the attention module, PA(·) and CA(·) denote pixel attention and channel attention respectively, and F_att denotes the finally output attention feature map;
(33) the classification module is composed of a fully connected layer and a softmax layer; the module takes the attention feature map F_att as input, and its final output is the probability of each driving behavior category;
the fully connected layer reduces the 7 × 7 × 512 attention feature map to a 1000-dimensional feature vector, which is specifically expressed as:
f = W_fc F_att + b_fc    (16)
where θ_fc = {W_fc, b_fc} denotes the weight and threshold parameters of the fully connected layer and f denotes the output 1000-dimensional feature vector;
in the softmax layer, the number of output units is identical to the number of behavior categories, and the output values are the category probabilities computed by the softmax classifier, which is specifically expressed as:
P(j) = exp(s_j) / Σ_{i=1}^{n} exp(s_i)    (17)
where P(j) denotes the posterior probability that the feature f belongs to the j-th class, θ_cls = {W_cls, b_cls} denotes the weight and threshold parameters of the classifier, and score = {s_1, s_2, ..., s_n} denotes the class scores from which the softmax layer outputs the probability distribution over the behavior categories.
6. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (4), training the multi-scale attention convolutional neural network specifically comprises: building the network model with the PyTorch open-source toolkit and optimizing the network parameters by stochastic gradient descent; the cross-entropy loss function is used to measure the distance between the true label and the prediction result, which is specifically expressed as:
L(χ_k) = -log P(l_k)    (18)
where l denotes the ground-truth class label and P(j), i.e. the output of the softmax layer, denotes the posterior probability of belonging to the j-th class;
for a batch of data, the parameters of the whole network are optimized under the supervision of the softmax loss, which is specifically expressed as:
L = -(1/K) Σ_{k=1}^{K} log P(l_k) + λ‖θ‖    (19)
where ‖θ‖ denotes the regularization term of the loss function, used to mitigate the overfitting that may occur during network training.
7. The driving behavior recognition method based on a multi-scale attention convolutional neural network as claimed in claim 1, characterized in that in step (5), testing the multi-column fusion convolutional neural network specifically comprises: given a driver image to be recognized, normalizing the test image to a size of 224 × 224 as the input of the multi-column fusion convolutional neural network, and obtaining the behavior recognition result of the test image through the forward propagation of the multi-column fusion network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242262.1A CN110059582B (en) | 2019-03-28 | 2019-03-28 | Driver behavior identification method based on multi-scale attention convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110059582A true CN110059582A (en) | 2019-07-26 |
CN110059582B CN110059582B (en) | 2023-04-07 |
Family
ID=67317536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910242262.1A Active CN110059582B (en) | 2019-03-28 | 2019-03-28 | Driver behavior identification method based on multi-scale attention convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059582B (en) |
Cited By (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110393519A (en) * | 2019-08-19 | 2019-11-01 | 广州视源电子科技股份有限公司 | Analysis method, device, storage medium and the processor of electrocardiosignal |
CN110688986A (en) * | 2019-10-16 | 2020-01-14 | 南京林业大学 | Attention branch guided 3D convolution behavior recognition network method |
CN110728219A (en) * | 2019-09-29 | 2020-01-24 | 天津大学 | 3D face generation method based on multi-column multi-scale graph convolution neural network |
CN110751957A (en) * | 2019-09-25 | 2020-02-04 | 电子科技大学 | Speech enhancement method using stacked multi-scale modules |
CN110781814A (en) * | 2019-10-24 | 2020-02-11 | 中国民用航空总局第二研究所 | Signal classification method, device and medium based on Gaussian mixture neural network model |
CN110796109A (en) * | 2019-11-05 | 2020-02-14 | 哈尔滨理工大学 | Driver distraction behavior identification method based on model fusion |
CN110807734A (en) * | 2019-10-30 | 2020-02-18 | 河南大学 | SAR image super-resolution reconstruction method |
CN110991219A (en) * | 2019-10-11 | 2020-04-10 | 东南大学 | Behavior identification method based on two-way 3D convolutional network |
CN111046962A (en) * | 2019-12-16 | 2020-04-21 | 中国人民解放军战略支援部队信息工程大学 | Sparse attention-based feature visualization method and system for convolutional neural network model |
CN111079795A (en) * | 2019-11-21 | 2020-04-28 | 西安工程大学 | Image classification method based on CNN (content-centric networking) fragment multi-scale feature fusion |
CN111178304A (en) * | 2019-12-31 | 2020-05-19 | 江苏省测绘研究所 | High-resolution remote sensing image pixel level interpretation method based on full convolution neural network |
CN111208821A (en) * | 2020-02-17 | 2020-05-29 | 李华兰 | Automobile automatic driving control method and device, automatic driving device and system |
CN111242168A (en) * | 2019-12-31 | 2020-06-05 | 浙江工业大学 | Human skin image lesion classification method based on multi-scale attention features |
CN111402274A (en) * | 2020-04-14 | 2020-07-10 | 上海交通大学医学院附属上海儿童医学中心 | Processing method, model and training method for magnetic resonance left ventricle image segmentation |
CN111414932A (en) * | 2020-01-07 | 2020-07-14 | 北京航空航天大学 | Classification identification and fault detection method for multi-scale signals of aircraft |
CN111461039A (en) * | 2020-04-07 | 2020-07-28 | 电子科技大学 | Landmark identification method based on multi-scale feature fusion |
CN111460892A (en) * | 2020-03-02 | 2020-07-28 | 五邑大学 | Electroencephalogram mode classification model training method, classification method and system |
CN111507281A (en) * | 2020-04-21 | 2020-08-07 | 中山大学中山眼科中心 | Behavior recognition system, device and method based on head movement and gaze behavior data |
CN111553500A (en) * | 2020-05-11 | 2020-08-18 | 北京航空航天大学 | Railway traffic contact net inspection method based on attention mechanism full convolution network |
CN111563468A (en) * | 2020-05-13 | 2020-08-21 | 电子科技大学 | Driver abnormal behavior detection method based on attention of neural network |
CN111582044A (en) * | 2020-04-15 | 2020-08-25 | 华南理工大学 | Face recognition method based on convolutional neural network and attention model |
CN111860427A (en) * | 2020-07-30 | 2020-10-30 | 重庆邮电大学 | Driving distraction identification method based on lightweight class eight-dimensional convolutional neural network |
CN112215241A (en) * | 2020-10-20 | 2021-01-12 | 西安交通大学 | Image feature extraction device based on small sample learning |
CN112257601A (en) * | 2020-10-22 | 2021-01-22 | 福州大学 | Fine-grained vehicle identification method based on data enhancement network of weak supervised learning |
CN112307847A (en) * | 2019-08-01 | 2021-02-02 | 复旦大学 | Multi-scale attention pedestrian re-recognition deep learning system based on guidance |
CN112347977A (en) * | 2020-11-23 | 2021-02-09 | 深圳大学 | Automatic detection method, storage medium and device for induced pluripotent stem cells |
CN112464765A (en) * | 2020-09-10 | 2021-03-09 | 天津师范大学 | Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof |
CN112527915A (en) * | 2020-11-17 | 2021-03-19 | 北京科技大学 | Linear cultural heritage knowledge graph construction method, system, computing device and medium |
CN112613405A (en) * | 2020-12-23 | 2021-04-06 | 电子科技大学 | Method for recognizing actions at any visual angle |
CN112733652A (en) * | 2020-12-31 | 2021-04-30 | 深圳赛安特技术服务有限公司 | Image target identification method and device, computer equipment and readable storage medium |
CN112800834A (en) * | 2020-12-25 | 2021-05-14 | 温州晶彩光电有限公司 | Method and system for positioning colorful spot light based on kneeling behavior identification |
CN112800871A (en) * | 2021-01-13 | 2021-05-14 | 南京邮电大学 | Automatic driving image recognition method based on attention mechanism and relation network |
CN112837360A (en) * | 2021-01-07 | 2021-05-25 | 北京百度网讯科技有限公司 | Depth information processing method, apparatus, device, storage medium, and program product |
CN113033448A (en) * | 2021-04-02 | 2021-06-25 | 东北林业大学 | Remote sensing image cloud-removing residual error neural network system, method and equipment based on multi-scale convolution and attention and storage medium |
CN113029155A (en) * | 2021-04-02 | 2021-06-25 | 杭州申昊科技股份有限公司 | Robot automatic navigation method and device, electronic equipment and storage medium |
CN113095479A (en) * | 2021-03-22 | 2021-07-09 | 北京工业大学 | Method for extracting ice-below-layer structure based on multi-scale attention mechanism |
CN113283338A (en) * | 2021-05-25 | 2021-08-20 | 湖南大学 | Method, device and equipment for identifying driving behavior of driver and readable storage medium |
CN113281029A (en) * | 2021-06-09 | 2021-08-20 | 重庆大学 | Rotating machinery fault diagnosis method and system based on multi-scale network structure |
CN113516028A (en) * | 2021-04-28 | 2021-10-19 | 南通大学 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
CN113537003A (en) * | 2021-07-02 | 2021-10-22 | 安阳工学院 | Method and device for visually detecting external environment of vehicle for assisting driving |
CN113642571A (en) * | 2021-07-12 | 2021-11-12 | 中国海洋大学 | Fine-grained image identification method based on saliency attention mechanism |
CN113642634A (en) * | 2021-08-12 | 2021-11-12 | 南京邮电大学 | Shadow detection method based on mixed attention |
CN113657534A (en) * | 2021-08-24 | 2021-11-16 | 北京经纬恒润科技股份有限公司 | Classification method and device based on attention mechanism |
CN113762251A (en) * | 2021-08-17 | 2021-12-07 | 慧影医疗科技(北京)有限公司 | Target classification method and system based on attention mechanism |
CN113780385A (en) * | 2021-08-30 | 2021-12-10 | 武汉理工大学 | Driving risk monitoring method based on attention mechanism |
CN113989862A (en) * | 2021-10-12 | 2022-01-28 | 天津大学 | Texture recognition platform based on embedded system |
CN114419558A (en) * | 2022-03-31 | 2022-04-29 | 华南理工大学 | Fire video image identification method, fire video image identification system, computer equipment and storage medium |
CN114639169A (en) * | 2022-03-28 | 2022-06-17 | 合肥工业大学 | Human body action recognition system based on attention mechanism feature fusion and position independence |
CN114782931A (en) * | 2022-04-22 | 2022-07-22 | 电子科技大学 | Driving behavior classification method for improved MobileNetv2 network |
CN114818991A (en) * | 2022-06-22 | 2022-07-29 | 西南石油大学 | Running behavior identification method based on convolutional neural network and acceleration sensor |
CN115082698A (en) * | 2022-06-28 | 2022-09-20 | 华南理工大学 | Distracted driving behavior detection method based on multi-scale attention module |
CN115432331A (en) * | 2022-10-10 | 2022-12-06 | 浙江绿达智能科技有限公司 | Intelligent classification dustbin |
CN115964360A (en) * | 2023-03-14 | 2023-04-14 | 山东省地震工程研究院 | Earthquake safety evaluation database construction method and system |
CN116758631A (en) * | 2023-06-13 | 2023-09-15 | 孟冠宇 | Big data driven behavior intelligent analysis method and system |
CN116977969A (en) * | 2023-08-11 | 2023-10-31 | 中国矿业大学 | Driver two-point pre-aiming identification method based on convolutional neural network |
CN117173422A (en) * | 2023-08-07 | 2023-12-05 | 广东第二师范学院 | Fine granularity image recognition method based on graph fusion multi-scale feature learning |
US11886199B2 (en) | 2021-10-13 | 2024-01-30 | Toyota Motor Engineering & Manufacturing North America, Inc. | Multi-scale driving environment prediction with hierarchical spatial temporal attention |
CN117576666A (en) * | 2023-11-17 | 2024-02-20 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875674A (en) * | 2018-06-29 | 2018-11-23 | 东南大学 | A kind of driving behavior recognition methods based on multiple row fusion convolutional neural networks |
CN109284670A (en) * | 2018-08-01 | 2019-01-29 | 清华大学 | A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism |
Cited By (94)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112307847A (en) * | 2019-08-01 | 2021-02-02 | 复旦大学 | Multi-scale attention pedestrian re-recognition deep learning system based on guidance |
CN110393519A (en) * | 2019-08-19 | 2019-11-01 | 广州视源电子科技股份有限公司 | Analysis method, device, storage medium and the processor of electrocardiosignal |
CN110393519B (en) * | 2019-08-19 | 2022-06-24 | 广州视源电子科技股份有限公司 | Electrocardiosignal analysis method and device, storage medium and processor |
CN110751957A (en) * | 2019-09-25 | 2020-02-04 | 电子科技大学 | Speech enhancement method using stacked multi-scale modules |
CN110728219A (en) * | 2019-09-29 | 2020-01-24 | 天津大学 | 3D face generation method based on multi-column multi-scale graph convolution neural network |
CN110728219B (en) * | 2019-09-29 | 2023-09-26 | 天津大学 | 3D face generation method based on multi-column multi-scale graph convolution neural network |
CN110991219B (en) * | 2019-10-11 | 2024-02-06 | 东南大学 | Behavior identification method based on two-way 3D convolution network |
CN110991219A (en) * | 2019-10-11 | 2020-04-10 | 东南大学 | Behavior identification method based on two-way 3D convolutional network |
CN110688986B (en) * | 2019-10-16 | 2023-06-23 | 南京林业大学 | 3D convolution behavior recognition network method guided by attention branches |
CN110688986A (en) * | 2019-10-16 | 2020-01-14 | 南京林业大学 | Attention branch guided 3D convolution behavior recognition network method |
CN110781814A (en) * | 2019-10-24 | 2020-02-11 | 中国民用航空总局第二研究所 | Signal classification method, device and medium based on Gaussian mixture neural network model |
CN110807734A (en) * | 2019-10-30 | 2020-02-18 | 河南大学 | SAR image super-resolution reconstruction method |
CN110796109A (en) * | 2019-11-05 | 2020-02-14 | 哈尔滨理工大学 | Driver distraction behavior identification method based on model fusion |
CN111079795A (en) * | 2019-11-21 | 2020-04-28 | 西安工程大学 | Image classification method based on CNN (content-centric networking) fragment multi-scale feature fusion |
CN111046962A (en) * | 2019-12-16 | 2020-04-21 | 中国人民解放军战略支援部队信息工程大学 | Sparse attention-based feature visualization method and system for convolutional neural network model |
CN111242168B (en) * | 2019-12-31 | 2023-07-21 | 浙江工业大学 | Human skin image lesion classification method based on multi-scale attention features |
CN111242168A (en) * | 2019-12-31 | 2020-06-05 | 浙江工业大学 | Human skin image lesion classification method based on multi-scale attention features |
CN111178304A (en) * | 2019-12-31 | 2020-05-19 | 江苏省测绘研究所 | High-resolution remote sensing image pixel level interpretation method based on full convolution neural network |
CN111414932B (en) * | 2020-01-07 | 2022-05-31 | 北京航空航天大学 | Classification identification and fault detection method for multi-scale signals of aircraft |
CN111414932A (en) * | 2020-01-07 | 2020-07-14 | 北京航空航天大学 | Classification identification and fault detection method for multi-scale signals of aircraft |
CN111208821A (en) * | 2020-02-17 | 2020-05-29 | 李华兰 | Automobile automatic driving control method and device, automatic driving device and system |
WO2021174618A1 (en) * | 2020-03-02 | 2021-09-10 | 五邑大学 | Training method for electroencephalography mode classification model, classification method and system |
CN111460892A (en) * | 2020-03-02 | 2020-07-28 | 五邑大学 | Electroencephalogram mode classification model training method, classification method and system |
CN111461039B (en) * | 2020-04-07 | 2022-03-25 | 电子科技大学 | Landmark identification method based on multi-scale feature fusion |
CN111461039A (en) * | 2020-04-07 | 2020-07-28 | 电子科技大学 | Landmark identification method based on multi-scale feature fusion |
CN111402274A (en) * | 2020-04-14 | 2020-07-10 | 上海交通大学医学院附属上海儿童医学中心 | Processing method, model and training method for magnetic resonance left ventricle image segmentation |
CN111402274B (en) * | 2020-04-14 | 2023-05-26 | 上海交通大学医学院附属上海儿童医学中心 | Processing method, model and training method for segmentation of magnetic resonance left ventricle image |
CN111582044A (en) * | 2020-04-15 | 2020-08-25 | 华南理工大学 | Face recognition method based on convolutional neural network and attention model |
CN111582044B (en) * | 2020-04-15 | 2023-06-20 | 华南理工大学 | Face recognition method based on convolutional neural network and attention model |
CN111507281A (en) * | 2020-04-21 | 2020-08-07 | 中山大学中山眼科中心 | Behavior recognition system, device and method based on head movement and gaze behavior data |
CN111553500A (en) * | 2020-05-11 | 2020-08-18 | 北京航空航天大学 | Railway traffic contact net inspection method based on attention mechanism full convolution network |
CN111563468A (en) * | 2020-05-13 | 2020-08-21 | 电子科技大学 | Driver abnormal behavior detection method based on attention of neural network |
CN111563468B (en) * | 2020-05-13 | 2023-04-07 | 电子科技大学 | Driver abnormal behavior detection method based on attention of neural network |
CN111860427A (en) * | 2020-07-30 | 2020-10-30 | 重庆邮电大学 | Driving distraction identification method based on lightweight class eight-dimensional convolutional neural network |
CN111860427B (en) * | 2020-07-30 | 2022-07-01 | 重庆邮电大学 | Driving distraction identification method based on lightweight class eight-dimensional convolutional neural network |
CN112464765B (en) * | 2020-09-10 | 2022-09-23 | 天津师范大学 | Safety helmet detection method based on single-pixel characteristic amplification and application thereof |
CN112464765A (en) * | 2020-09-10 | 2021-03-09 | 天津师范大学 | Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof |
CN112215241A (en) * | 2020-10-20 | 2021-01-12 | 西安交通大学 | Image feature extraction device based on small sample learning |
CN112257601A (en) * | 2020-10-22 | 2021-01-22 | 福州大学 | Fine-grained vehicle identification method based on data enhancement network of weak supervised learning |
CN112527915B (en) * | 2020-11-17 | 2021-08-27 | 北京科技大学 | Linear cultural heritage knowledge graph construction method, system, computing device and medium |
CN112527915A (en) * | 2020-11-17 | 2021-03-19 | 北京科技大学 | Linear cultural heritage knowledge graph construction method, system, computing device and medium |
CN112347977B (en) * | 2020-11-23 | 2021-07-20 | 深圳大学 | Automatic detection method, storage medium and device for induced pluripotent stem cells |
CN112347977A (en) * | 2020-11-23 | 2021-02-09 | 深圳大学 | Automatic detection method, storage medium and device for induced pluripotent stem cells |
CN112613405A (en) * | 2020-12-23 | 2021-04-06 | 电子科技大学 | Method for recognizing actions at any visual angle |
CN112613405B (en) * | 2020-12-23 | 2022-03-25 | 电子科技大学 | Method for recognizing actions at any visual angle |
CN112800834A (en) * | 2020-12-25 | 2021-05-14 | 温州晶彩光电有限公司 | Method and system for positioning colorful spot light based on kneeling behavior identification |
CN112800834B (en) * | 2020-12-25 | 2022-08-12 | 温州晶彩光电有限公司 | Method and system for positioning colorful spot light based on kneeling behavior identification |
CN112733652B (en) * | 2020-12-31 | 2024-04-19 | 深圳赛安特技术服务有限公司 | Image target recognition method, device, computer equipment and readable storage medium |
CN112733652A (en) * | 2020-12-31 | 2021-04-30 | 深圳赛安特技术服务有限公司 | Image target identification method and device, computer equipment and readable storage medium |
CN112837360B (en) * | 2021-01-07 | 2023-08-11 | 北京百度网讯科技有限公司 | Depth information processing method, apparatus, device, storage medium, and program product |
CN112837360A (en) * | 2021-01-07 | 2021-05-25 | 北京百度网讯科技有限公司 | Depth information processing method, apparatus, device, storage medium, and program product |
CN112800871A (en) * | 2021-01-13 | 2021-05-14 | 南京邮电大学 | Automatic driving image recognition method based on attention mechanism and relation network |
CN112800871B (en) * | 2021-01-13 | 2022-08-26 | 南京邮电大学 | Automatic driving image recognition method based on attention mechanism and relation network |
CN113095479A (en) * | 2021-03-22 | 2021-07-09 | 北京工业大学 | Method for extracting ice-below-layer structure based on multi-scale attention mechanism |
CN113095479B (en) * | 2021-03-22 | 2024-03-12 | 北京工业大学 | Multi-scale attention mechanism-based extraction method for ice underlying structure |
CN113029155A (en) * | 2021-04-02 | 2021-06-25 | 杭州申昊科技股份有限公司 | Robot automatic navigation method and device, electronic equipment and storage medium |
CN113033448A (en) * | 2021-04-02 | 2021-06-25 | 东北林业大学 | Remote sensing image cloud-removing residual error neural network system, method and equipment based on multi-scale convolution and attention and storage medium |
CN113516028A (en) * | 2021-04-28 | 2021-10-19 | 南通大学 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
CN113516028B (en) * | 2021-04-28 | 2024-01-19 | 南通大学 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
CN113283338A (en) * | 2021-05-25 | 2021-08-20 | 湖南大学 | Method, device and equipment for identifying driving behavior of driver and readable storage medium |
CN113281029A (en) * | 2021-06-09 | 2021-08-20 | 重庆大学 | Rotating machinery fault diagnosis method and system based on multi-scale network structure |
CN113537003B (en) * | 2021-07-02 | 2022-10-21 | 安阳工学院 | External environment visual detection method and device for driving assistance |
CN113537003A (en) * | 2021-07-02 | 2021-10-22 | 安阳工学院 | Method and device for visually detecting external environment of vehicle for assisting driving |
CN113642571A (en) * | 2021-07-12 | 2021-11-12 | 中国海洋大学 | Fine-grained image identification method based on saliency attention mechanism |
CN113642571B (en) * | 2021-07-12 | 2023-10-10 | 中国海洋大学 | Fine-grained image recognition method based on saliency attention mechanism |
CN113642634A (en) * | 2021-08-12 | 2021-11-12 | 南京邮电大学 | Shadow detection method based on mixed attention |
CN113762251B (en) * | 2021-08-17 | 2024-05-10 | 慧影医疗科技(北京)股份有限公司 | Attention mechanism-based target classification method and system |
CN113762251A (en) * | 2021-08-17 | 2021-12-07 | 慧影医疗科技(北京)有限公司 | Target classification method and system based on attention mechanism |
CN113657534A (en) * | 2021-08-24 | 2021-11-16 | 北京经纬恒润科技股份有限公司 | Classification method and device based on attention mechanism |
CN113780385A (en) * | 2021-08-30 | 2021-12-10 | 武汉理工大学 | Driving risk monitoring method based on attention mechanism |
CN113989862A (en) * | 2021-10-12 | 2022-01-28 | 天津大学 | Texture recognition platform based on embedded system |
CN113989862B (en) * | 2021-10-12 | 2024-05-14 | 天津大学 | Texture recognition platform based on embedded system |
US11886199B2 (en) | 2021-10-13 | 2024-01-30 | Toyota Motor Engineering & Manufacturing North America, Inc. | Multi-scale driving environment prediction with hierarchical spatial temporal attention |
CN114639169B (en) * | 2022-03-28 | 2024-02-20 | 合肥工业大学 | Position-independent human motion recognition system based on attention mechanism feature fusion |
CN114639169A (en) * | 2022-03-28 | 2022-06-17 | 合肥工业大学 | Human body action recognition system based on attention mechanism feature fusion and position independence |
CN114419558B (en) * | 2022-03-31 | 2022-07-05 | 华南理工大学 | Fire video image identification method, fire video image identification system, computer equipment and storage medium |
CN114419558A (en) * | 2022-03-31 | 2022-04-29 | 华南理工大学 | Fire video image identification method, fire video image identification system, computer equipment and storage medium |
CN114782931B (en) * | 2022-04-22 | 2023-09-29 | 电子科技大学 | Driving behavior classification method based on an improved MobileNetv2 network |
CN114782931A (en) * | 2022-04-22 | 2022-07-22 | 电子科技大学 | Driving behavior classification method for improved MobileNetv2 network |
CN114818991B (en) * | 2022-06-22 | 2022-09-27 | 西南石油大学 | Running behavior identification method based on convolutional neural network and acceleration sensor |
CN114818991A (en) * | 2022-06-22 | 2022-07-29 | 西南石油大学 | Running behavior identification method based on convolutional neural network and acceleration sensor |
CN115082698A (en) * | 2022-06-28 | 2022-09-20 | 华南理工大学 | Distracted driving behavior detection method based on multi-scale attention module |
CN115082698B (en) * | 2022-06-28 | 2024-04-16 | 华南理工大学 | Distracted driving behavior detection method based on multi-scale attention module |
CN115432331A (en) * | 2022-10-10 | 2022-12-06 | 浙江绿达智能科技有限公司 | Intelligent classification dustbin |
CN115964360B (en) * | 2023-03-14 | 2023-05-16 | 山东省地震工程研究院 | Method and system for building earthquake safety evaluation database |
CN115964360A (en) * | 2023-03-14 | 2023-04-14 | 山东省地震工程研究院 | Earthquake safety evaluation database construction method and system |
CN116758631B (en) * | 2023-06-13 | 2023-12-22 | 杭州追形视频科技有限公司 | Big data driven behavior intelligent analysis method and system |
CN116758631A (en) * | 2023-06-13 | 2023-09-15 | 孟冠宇 | Big data driven behavior intelligent analysis method and system |
CN117173422B (en) * | 2023-08-07 | 2024-02-13 | 广东第二师范学院 | Fine-grained image recognition method based on graph fusion multi-scale feature learning |
CN117173422A (en) * | 2023-08-07 | 2023-12-05 | 广东第二师范学院 | Fine-grained image recognition method based on graph fusion multi-scale feature learning |
CN116977969B (en) * | 2023-08-11 | 2023-12-26 | 中国矿业大学 | Driver two-point preview recognition method based on convolutional neural network |
CN116977969A (en) * | 2023-08-11 | 2023-10-31 | 中国矿业大学 | Driver two-point preview recognition method based on convolutional neural network |
CN117576666A (en) * | 2023-11-17 | 2024-02-20 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
CN117576666B (en) * | 2023-11-17 | 2024-05-10 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
Also Published As
Publication number | Publication date |
---|---|
CN110059582B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110059582A (en) | Driving behavior recognition method based on multi-scale attention convolutional neural networks | |
CN110619369B (en) | Fine-grained image classification method based on feature pyramid and global average pooling | |
Lu et al. | Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals | |
CN108875674B (en) | Driver behavior identification method based on multi-column fusion convolutional neural network | |
CN104408440B (en) | Facial expression recognition method based on two-step dimensionality reduction and parallel feature fusion | |
WO2022083784A1 (en) | Road detection method based on internet of vehicles | |
CN103258204B (en) | Automatic micro-expression recognition method based on Gabor and EOH features | |
CN101944174B (en) | License plate character recognition method | |
CN105354986A (en) | Driving state monitoring system and method for automobile driver | |
CN111401148A (en) | Road multi-target detection method based on improved multilevel YOLOv3 | |
CN111460919B (en) | Monocular vision road target detection and distance estimation method based on improved YOLOv3 | |
CN110097109A (en) | Road environment obstacle detection system and method based on deep learning | |
CN104866810A (en) | Face recognition method based on deep convolutional neural network | |
CN205230272U (en) | Driver drive state monitoring system | |
CN106446954A (en) | Character recognition method based on deep learning | |
CN109670392A (en) | Road image semantic segmentation method based on hybrid autoencoder | |
CN106845387A (en) | Pedestrian detection method based on self-learning | |
CN110363093A (en) | Driver action recognition method and device | |
CN105787466A (en) | Fine-grained vehicle type recognition method and system | |
CN110796109A (en) | Driver distraction behavior identification method based on model fusion | |
CN113283338A (en) | Method, device and equipment for identifying driving behavior of driver and readable storage medium | |
CN112036520A (en) | Panda age identification method and device based on deep learning and storage medium | |
CN112052829B (en) | Pilot behavior monitoring method based on deep learning | |
Mammeri et al. | Design of a semi-supervised learning strategy based on convolutional neural network for vehicle maneuver classification | |
CN108960005A (en) | Method and system for establishing and displaying visual labels of objects in an intelligent vision Internet of Things | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||