CN110084182A

CN110084182A - A distracted-driving recognition method based on 3D convolutional neural networks

Info

Publication number: CN110084182A
Application number: CN201910335667.XA
Authority: CN
Inventors: 曾凯; 张曼
Original assignee: Guizhou Institute of Technology
Current assignee: Guizhou Institute of Technology
Priority date: 2019-04-24
Filing date: 2019-04-24
Publication date: 2019-08-02

Abstract

The invention discloses a distracted-driving recognition method based on 3D convolutional neural networks. It belongs to the fields of deep learning and pattern recognition, and relates in particular to distracted-driving recognition based on deep convolutional neural networks. Driving-posture pictures are stacked to build the input layer. A convolution is first applied to the picture cube, giving convolutional layer C1. Convolutions are then computed by two parallel branches with different kernels, each followed by max pooling; this is repeated four times in succession, giving convolutional layers C2, C3, C4 and C5 of the network. Finally, the two groups of feature maps output by C5 are merged and passed in turn through two fully connected layers, F1 and F2, and a softmax is computed to output the corresponding distracted-driving class. The invention stacks 2D pictures into a 3D input and extracts features with two branches of different kernels, giving the network strong generalization ability and high recognition accuracy.

Description

A distracted-driving recognition method based on 3D convolutional neural networks

Technical field

The invention belongs to the fields of deep learning and pattern recognition, and relates in particular to a distracted-driving recognition method based on deep convolutional neural networks.

Background art

According to the definition of the International Organization for Standardization, distracted driving is a phenomenon in which the driver's attention is directed, while driving, to activities unrelated to normal driving, so that the driver's ability to operate the vehicle declines. Common forms of distracted driving include making phone calls, using a mobile phone, drinking water and chatting with passengers while driving. Taking mobile-phone use as an example, when a notification arrives the driver usually shifts their gaze automatically from the road to the phone screen. A glance at the phone usually takes 3 seconds; at 60 km/h, the vehicle then travels 50 meters completely blind within those 3 seconds, which is very dangerous if an emergency arises. China's Regulations for the Implementation of the Road Traffic Safety Law stipulate that, when driving a motor vehicle, a driver must not dial or answer a hand-held phone, watch television, or engage in other behavior that interferes with safe driving.
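The 50-meter figure is a direct unit conversion of the stated 60 km/h over the 3-second glance:

```latex
d = v\,t = \frac{60\,000\ \text{m}}{3600\ \text{s}} \times 3\ \text{s} = 50\ \text{m}
```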

At present, intelligent analysis of driving behavior is mostly based on traditional image-processing methods, with image classifiers built using support vector machines. Recent research shows that deep-learning methods can greatly improve the accuracy of image classification and prediction. The present invention uses a 3D deep neural network to identify distracted-driving behavior in advance, which can help regulate driving behavior and improve road-traffic safety.

Summary of the invention

The purpose of the present invention is to propose a distracted-driving recognition method based on 3D convolutional neural networks.

Technical solution of the present invention: the method stacks driving-posture pictures to build the input layer. A convolution is first applied to the picture cube, giving convolutional layer C1. Convolutions are then computed by two branches with different kernels, followed by max pooling; this is repeated four times in succession, giving convolutional layers C2, C3, C4 and C5 of the network. Finally, the two groups of feature maps output by C5 are merged and passed in turn through two fully connected layers, F1 and F2, and a softmax is computed to output the corresponding distracted-driving class.

The specific steps are as follows:

Step 1: Define n classes of distracted-driving behavior. Scale all pictures uniformly to 300*200, then stack pictures of the same distracted-driving class, converting the 2D picture input into a 3D input.

Step 2: Train the 3D convolutional neural network with the training samples obtained in Step 1.

Step 2.1: The input cube is convolved, giving convolutional layer C1.

Step 2.2: The feature-map cube output by Step 2.1 is convolved by two branches with different kernels, followed by max pooling. This is repeated four times in succession, giving convolutional layers C2, C3, C4 and C5 of the network.

Step 2.3: The two feature-map cubes output by Step 2.2 are merged into one feature-map cube.

Step 2.4: Two successive fully connected computations are applied to the feature-map cube output by Step 2.3, giving fully connected layers F1 and F2.

Step 2.5: Compute the softmax and the loss from the output of Step 2.4, and adjust the network parameters by backpropagating the loss. Repeat Steps 2.1 to 2.5 until the loss converges.

Step 3: Duplicate and stack each test picture to construct a 3D cube, and obtain the classification result with the 3D convolutional neural network trained in Step 2.

The two-branch kernels used in Step 2.2 above have the following sizes: C2, 64@8*8*3 and 64@6*6*2; C3, 128@5*3*2 and 128@7*3*3; C4, 256@6*3*3 and 256@5*3*1; C5, 512@3*3*3 and 512@6*5*3.

Beneficial effects of the present invention:

1. Network structure: features are extracted by two branches with differently sized kernels, which improves the network's generalization ability.

2. Model adaptability: because the input layer is built by stacking pictures, the method is equally compatible with video input; it suffices to select several frames from the video to build the data cube.

3. Practical deployment: the method does not interfere with normal driving; distracted-driving behavior can be recognized using traffic-control monitoring cameras or in-vehicle cameras.
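The text does not say how frames are chosen from a video. One plausible choice, assumed here, is to sample 30 frames uniformly across the clip; the function name `select_frame_indices` is hypothetical.

```python
def select_frame_indices(num_frames: int, depth: int = 30) -> list[int]:
    """Pick `depth` frame indices spread evenly over a clip of `num_frames` frames.

    Assumption: uniform sampling (the centre of each of `depth` equal bins).
    If the clip is shorter than `depth`, indices repeat, so the cube depth
    stays fixed at `depth`.
    """
    if num_frames <= 0 or depth <= 0:
        raise ValueError("num_frames and depth must be positive")
    return [int((i + 0.5) * num_frames / depth) for i in range(depth)]


print(select_frame_indices(300)[:3])  # [5, 15, 25]
```

Sampling bin centres keeps every index strictly below `num_frames`, so no bounds check is needed when the frames are read.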

Description of the drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a structural diagram of the 3D convolutional neural network of the present invention.

Detailed description of the embodiments

A distracted-driving recognition method based on 3D convolutional neural networks is implemented by the following steps:

Step 1: Assume there are n classes of distracted-driving behavior. Scale all pictures uniformly to 300*200 and stack 30 pictures of the same distracted-driving class. The resulting cube has size 3@300*200*30, where 30 is the number of stacked pictures, 300*200 is the spatial size, and 3 is the number of channels.
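A minimal sketch of the Step 1 stacking, in plain Python with nested lists so that it is self-contained (a real pipeline would use an array library). The frame layout `[channel][row][col]` and the function names are assumptions; resizing to 300*200 is not shown. The output axes follow the 3@300*200*30 notation: channels, the two spatial dimensions, then the stacked pictures.

```python
def stack_frames(frames):
    """Stack T same-size images into a C x H x W x T cube (nested lists).

    Each frame is indexed [channel][row][col]; all frames must share one
    shape. The last axis of the result indexes the stacked pictures.
    """
    if not frames:
        raise ValueError("need at least one frame")
    c, h, w = len(frames[0]), len(frames[0][0]), len(frames[0][0][0])
    for f in frames:
        if len(f) != c or any(len(ch) != h or any(len(row) != w for row in ch) for ch in f):
            raise ValueError("all frames must have the same C x H x W shape")
    return [[[[f[ci][y][x] for f in frames] for x in range(w)]
             for y in range(h)] for ci in range(c)]


def cube_shape(cube):
    """Return (C, H, W, T) of a nested-list cube."""
    return (len(cube), len(cube[0]), len(cube[0][0]), len(cube[0][0][0]))


# 30 tiny 3 x 4 x 5 "pictures", picture t filled with the value t.
frames = [[[[t] * 5 for _ in range(4)] for _ in range(3)] for t in range(30)]
print(cube_shape(stack_frames(frames)))  # (3, 4, 5, 30)
```

With real 300*200 RGB pictures the same call yields a cube of shape (3, 300, 200, 30).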

Step 2.1: C1, the first layer of the network, applies convolution and max pooling to the input cube. The kernel size is 32@11*7*2: 2 is the temporal extent, 11*7 is the spatial extent, and there are 32 kernels; the stride is (1,1,1). After convolution the feature map has size 32@(300-11+1)*(200-7+1)*(30-2+1) = 32@290*194*29. Max pooling follows with a window of 2*2*1, where 1 is the temporal length and 2*2 is the spatial size. After pooling the feature map has size 32@(290/2)*(194/2)*(29/1) = 32@145*97*29, which is the final output of layer C1. The convolution formula is as follows:
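The formula referenced above is not reproduced in this text. A standard form of stride-1, unpadded 3D convolution, assumed here, is given below: u_m is the m-th of M input feature cubes, w_jm(p,q,r) are the weights of kernel j on input m with extent P*Q*R (R along the time axis), b_j is a bias, and f is an activation the text does not specify. Each dimension then shrinks from D_in to D_in - K + 1, and non-overlapping pooling with window S divides it by S, which reproduces the worked sizes (for C1, 300 - 11 + 1 = 290 and 290 / 2 = 145).

```latex
v_{j}(x,y,z) = f\!\left( b_{j} + \sum_{m=1}^{M}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1}\sum_{r=0}^{R-1} w_{jm}(p,q,r)\, u_{m}(x+p,\; y+q,\; z+r) \right)

D_{\mathrm{conv}} = D_{\mathrm{in}} - K + 1, \qquad D_{\mathrm{pool}} = D_{\mathrm{conv}} / S
```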

Step 2.2:

C2, the second layer, convolves the feature-map cube with two branches of different kernels. The upper-branch kernel size is 64@8*8*3 with stride (1,1,1); after convolution the feature map has size 64@(145-8+1)*(97-8+1)*(29-3+1) = 64@138*90*27. The upper-branch pooling window of C2 is 3*3*1, and after pooling the feature map has size 64@(138/3)*(90/3)*(27/1) = 64@46*30*27. The lower-branch kernel size is 64@6*6*2; after convolution the feature map has size 64@(145-6+1)*(97-6+1)*(29-2+1) = 64@140*92*28. The lower-branch pooling window of C2 is 2*2*2, and after pooling the feature map has size 64@(140/2)*(92/2)*(28/2) = 64@70*46*14.
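To illustrate what one branch computes, here is a single-channel, single-kernel sketch of a valid (unpadded, stride-1) 3D convolution followed by non-overlapping max pooling, where the pooling window also serves as the stride, as the divisions above imply. The real layers use many kernels over multi-channel inputs; the function names and the toy 12*12*7 volume are hypothetical. Like most deep-learning frameworks, the sketch computes cross-correlation (no kernel flip). Applying two differently sized kernels and windows to the same input mirrors the upper and lower branches of C2.

```python
def conv3d_valid(volume, kernel, bias=0.0):
    """Single-channel 3D convolution, stride 1, no padding.

    `volume` and `kernel` are nested lists indexed [x][y][z] (z = time axis).
    Output size per axis is D - K + 1.
    """
    X, Y, Z = len(volume), len(volume[0]), len(volume[0][0])
    P, Q, R = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for x in range(X - P + 1):
        plane = []
        for y in range(Y - Q + 1):
            row = []
            for z in range(Z - R + 1):
                s = bias
                for p in range(P):
                    for q in range(Q):
                        for r in range(R):
                            s += kernel[p][q][r] * volume[x + p][y + q][z + r]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def max_pool3d(volume, window):
    """Non-overlapping max pooling; the window (a, b, c) is also the stride."""
    a, b, c = window
    X, Y, Z = len(volume) // a, len(volume[0]) // b, len(volume[0][0]) // c
    return [[[max(volume[x * a + i][y * b + j][z * c + k]
                  for i in range(a) for j in range(b) for k in range(c))
              for z in range(Z)] for y in range(Y)] for x in range(X)]


def shape(v):
    return (len(v), len(v[0]), len(v[0][0]))


vol = [[[1.0] * 7 for _ in range(12)] for _ in range(12)]
upper = max_pool3d(conv3d_valid(vol, [[[1.0] * 3 for _ in range(4)] for _ in range(4)]), (3, 3, 1))
lower = max_pool3d(conv3d_valid(vol, [[[1.0] * 2 for _ in range(3)] for _ in range(3)]), (2, 2, 2))
print(shape(upper), shape(lower))  # (3, 3, 5) (5, 5, 3)
```

The shapes follow the same D - K + 1 and D / S rules as the C2 arithmetic: 12 - 4 + 1 = 9 and 9 / 3 = 3 in the upper branch, 12 - 3 + 1 = 10 and 10 / 2 = 5 in the lower one.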

C3, the third layer, applies convolution and pooling separately to each of the two feature maps output by C2. The upper-branch kernel size is 128@5*3*2; after convolution the feature map has size 128@(46-5+1)*(30-3+1)*(27-2+1) = 128@42*28*26. The upper-branch pooling window of C3 is 2*2*1, and after pooling the feature map has size 128@(42/2)*(28/2)*(26/1) = 128@21*14*26. The lower-branch kernel size is 128@7*3*3; after convolution the feature map has size 128@(70-7+1)*(46-3+1)*(14-3+1) = 128@64*44*12. The lower-branch pooling window of C3 is 2*2*1, and after pooling the feature map has size 128@(64/2)*(44/2)*(12/1) = 128@32*22*12.

C4, the fourth layer, again applies separate convolution and pooling to the upper and lower feature maps. The upper-branch kernel size is 256@6*3*3; after convolution the feature map has size 256@(21-6+1)*(14-3+1)*(26-3+1) = 256@16*12*24. The upper-branch pooling window of C4 is 2*2*2, and after pooling the feature map has size 256@(16/2)*(12/2)*(24/2) = 256@8*6*12. The lower-branch kernel size is 256@5*3*1; after convolution the feature map has size 256@(32-5+1)*(22-3+1)*(12-1+1) = 256@28*20*12. The lower-branch pooling window of C4 is 2*2*1, and after pooling the feature map has size 256@(28/2)*(20/2)*(12/1) = 256@14*10*12.

C5, the fifth layer, again processes the upper and lower feature maps separately. The upper-branch kernel size is 512@3*3*3, i.e., 512 kernels; after convolution the feature map has size 512@(8-3+1)*(6-3+1)*(12-3+1) = 512@6*4*10. The upper-branch pooling window of C5 is 2*2*2, and after pooling the feature map has size 512@(6/2)*(4/2)*(10/2) = 512@3*2*5. The lower-branch kernel size is 512@6*5*3; after convolution the feature map has size 512@(14-6+1)*(10-5+1)*(12-3+1) = 512@9*6*10. The lower-branch pooling window of C5 is 3*3*2, and after pooling the feature map has size 512@(9/3)*(6/3)*(10/2) = 512@3*2*5.

Step 2.3: The two equally sized feature maps from the upper and lower branches of C5 are concatenated along the channel dimension into one feature map. Only the channel count changes and the other dimensions are unchanged, so the merged feature map has size 1024@3*2*5.
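The layer-by-layer sizes in Steps 2.1 to 2.3 can be checked mechanically. This sketch encodes each layer as (kernels, kernel size, pooling window) with sizes ordered (spatial, spatial, temporal), applies the valid-convolution and non-overlapping-pooling rules, and rejects any window that does not divide the convolved size. The tables and function names are hypothetical. The upper C4 kernel is taken as 6*3*3, the value the worked sizes in this section use; a temporal extent of 2 would leave 25 frames, which the 2*2*2 pooling window does not divide evenly.

```python
# Layer specs: (kernels, kernel size, pooling window); axes (spatial, spatial, temporal).
C1 = (32, (11, 7, 2), (2, 2, 1))
UPPER = [(64, (8, 8, 3), (3, 3, 1)), (128, (5, 3, 2), (2, 2, 1)),
         (256, (6, 3, 3), (2, 2, 2)), (512, (3, 3, 3), (2, 2, 2))]
LOWER = [(64, (6, 6, 2), (2, 2, 2)), (128, (7, 3, 3), (2, 2, 1)),
         (256, (5, 3, 1), (2, 2, 1)), (512, (6, 5, 3), (3, 3, 2))]


def conv_pool(size, kernel, window):
    """Valid (unpadded) stride-1 convolution, then non-overlapping pooling."""
    out = []
    for d, k, w in zip(size, kernel, window):
        conv = d - k + 1
        if conv <= 0 or conv % w != 0:
            raise ValueError(f"dim {d}, kernel {k}, window {w}: sizes do not fit")
        out.append(conv // w)
    return tuple(out)


def run_branch(size, layers):
    """Apply C2..C5 of one branch; return (channels, size) of its output."""
    for _, kernel, window in layers:
        size = conv_pool(size, kernel, window)
    return layers[-1][0], size


def trace(input_size=(300, 200, 30)):
    """Return the C1 output size, both C5 outputs and the merged (Step 2.3) map."""
    c1 = conv_pool(input_size, C1[1], C1[2])
    upper = run_branch(c1, UPPER)
    lower = run_branch(c1, LOWER)
    if upper[1] != lower[1]:
        raise ValueError("branch outputs must have equal sizes to be merged")
    merged = (upper[0] + lower[0], upper[1])
    return c1, upper, lower, merged


c1, upper, lower, merged = trace()
print(c1, upper, lower, merged)
# (145, 97, 29) (512, (3, 2, 5)) (512, (3, 2, 5)) (1024, (3, 2, 5))
```

Both branches must end at the same spatial and temporal size for the channel-wise merge to be possible, which is why the trace raises if they differ.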

Step 2.4: F1, the first fully connected layer, has 4096 neurons, each with input dimension 1024*3*2*5; its output is 1*1*1*4096. F2, the second fully connected layer, is the output layer of the network: it uses n neurons, each of dimension 1*1*1*4096, fully connected to the output of F1, so the output of F2 has size 1*1*1*n.
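A minimal sketch of Step 2.4: flattening the merged map and applying a dense layer, plus the resulting weight counts. The text does not state whether the layers use biases or which activation F1 applies, so the sketch assumes biases and omits the activation; the function names and the class count n = 10 are hypothetical.

```python
def flatten(cube):
    """Flatten a nested-list feature map [channel][d1][d2][t] into one vector."""
    return [v for ch in cube for plane in ch for row in plane for v in row]


def dense(x, weights, biases):
    """Fully connected layer, no activation: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def dense_params(in_features, out_features, bias=True):
    """Number of trainable weights (plus biases, if used) in a dense layer."""
    return in_features * out_features + (out_features if bias else 0)


F1_IN = 1024 * 3 * 2 * 5                 # merged C5 map, flattened
print(F1_IN, dense_params(F1_IN, 4096))  # 30720 125833216
print(dense_params(4096, 10))            # F2 with a hypothetical n = 10: 40970
```

F1 alone holds roughly 126 million weights, far more than all the convolutional layers, which is typical when a large flattened map feeds a wide dense layer.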

The fully connected computation formula is as follows:
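The fully connected formula is likewise missing from this text. The standard form, assumed here, is shown below, where x is the flattened input of length N, w_ji and b_j are the weights and bias of neuron j, and f is an activation the text leaves unspecified. For F1, N = 1024*3*2*5 = 30720 and J = 4096; for F2, N = 4096 and J = n, with F2's output passed to the softmax.

```latex
y_{j} = f\!\left( \sum_{i=1}^{N} w_{ji}\, x_{i} + b_{j} \right), \qquad j = 1, \dots, J
```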

Step 2.5: Compute the softmax and the loss from the output of Step 2.4, and adjust the network parameters by backpropagating the loss. Repeat Steps 2.1 to 2.5 until the loss converges. The softmax and loss formulas are as follows:
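The softmax and loss formulas do not survive in this text. The softmax over the n outputs z of F2 is p_k = exp(z_k) / sum_j exp(z_j). The loss is not named; cross-entropy, the usual pairing with softmax, is assumed here, i.e. L = -log p_y for true class y. A minimal, numerically stable sketch (function names hypothetical):

```python
import math


def softmax(logits):
    """Numerically stable softmax over the n class scores output by F2."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss -log softmax(logits)[label], via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


print(softmax([0.0, 0.0]))           # [0.5, 0.5]
print(cross_entropy([0.0, 0.0], 1))  # log(2), about 0.693
```

Computing the loss directly from the logits via log-sum-exp avoids taking the log of a probability that has underflowed to zero.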

Step 3: Make 30 copies of each test picture and stack them to build the 3D cube input. Obtain the classification result with the 3D convolutional neural network trained in Step 2.
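Step 3 turns a single test picture into a cube by copying it 30 times along the frame axis. A minimal nested-list sketch, using an assumed `[channel][row][col]` picture layout; `replicate_to_cube` and `predict_class` are hypothetical names, and the latter simply takes the index of the largest softmax probability as the predicted class.

```python
def replicate_to_cube(frame, depth=30):
    """Copy one picture `depth` times along a new last (frame) axis.

    `frame` is a nested list [channel][row][col]; the result is
    [channel][row][col][t], the same layout as a stacked training cube.
    """
    return [[[[v] * depth for v in row] for row in ch] for ch in frame]


def predict_class(probabilities):
    """Index of the most probable distracted-driving class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


cube = replicate_to_cube([[[1, 2], [3, 4]]], depth=3)
print(cube)                            # [[[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]]
print(predict_class([0.1, 0.7, 0.2]))  # 1
```

Replicating the picture gives the test input the same 30-deep shape the network was trained on, so no change to the network is needed at test time.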

Claims

1. A distracted-driving recognition method based on 3D convolutional neural networks, characterized in that the method comprises:

Step 1: defining distracted-driving behavior as n classes; preprocessing the original driving pictures, converting the 2D picture input into a 3D input;

Step 2: training a 3D convolutional neural network with the training samples obtained in Step 1;

Step 2.1: the input cube is convolved, giving convolutional layer C1;

Step 2.2: the feature-map cube output by Step 2.1 is convolved by two branches with different kernels, followed by max pooling, repeated four times in succession, giving convolutional layers C2, C3, C4 and C5 of the network;

Step 2.3: the two feature-map cubes output by Step 2.2 are merged into one feature-map cube;

Step 2.4: two successive fully connected computations are applied to the feature-map cube output by Step 2.3, giving fully connected layers F1 and F2;

Step 2.5: computing the softmax and the loss from the output of Step 2.4, and adjusting the network parameters by backpropagating the loss; repeating Steps 2.1 to 2.5 until the loss converges;

Step 3: duplicating and stacking the test picture to construct a 3D cube, and obtaining the classification result with the 3D convolutional neural network trained in Step 2.

2. The distracted-driving recognition method based on 3D convolutional neural networks according to claim 1, characterized in that converting the 2D picture input into a 3D input in Step 1 comprises: first scaling the pictures to a uniform size, then stacking pictures of the same distracted-driving class and feeding them to the 3D network.

3. The distracted-driving recognition method based on 3D convolutional neural networks according to claim 1, wherein the two-branch kernels used in Step 2.2 are characterized in that: the C2 kernels are 64@8*8*3 and 64@6*6*2; the C3 kernels are 128@5*3*2 and 128@7*3*3; the C4 kernels are 256@6*3*3 and 256@5*3*1; and the C5 kernels are 512@3*3*3 and 512@6*5*3.