CN109325439A

CN109325439A - CNN-based method for recognizing driver violation behavior

Info

Publication number: CN109325439A
Application number: CN201811087757.3A
Authority: CN
Inventors: 刘宏基
Original assignee: Chengdu Information Technology Co Ltd of CAS
Current assignee: Chengdu Information Technology Co Ltd of CAS
Priority date: 2018-09-18
Filing date: 2018-09-18
Publication date: 2019-02-12
Anticipated expiration: 2038-09-18
Also published as: CN109325439B

Abstract

The invention discloses a CNN-based method for recognizing driver violation behavior. Training parameters are set in an independently developed artificial intelligence framework, a CNN is built for each layer, and image recognition training is carried out layer by layer. This stage is iterative. Each layer network is a relatively independent training task, so the training effect of every layer can be controlled and weak links can be tuned conveniently. Finally, the three trained networks are chained together, which completes recognition from the raw surveillance video to the driver's violation behavior. The method overcomes the limitations of traditional open frameworks.

Description

CNN-based method for recognizing driver violation behavior

Technical field

The present invention relates to the field of video recognition, and in particular to a CNN-based method for recognizing driver violation behavior.

Background art

In China, most road traffic accidents are caused by driver operating errors and fatigue driving, and even skilled, experienced drivers cannot necessarily maintain a good driving state for long periods. Monitoring driver behavior so that violations during driving are identified quickly and accurately is therefore important for fundamentally reducing traffic accidents.

Research on monitoring driver behavior has already produced some results in China and abroad. The approaches fall into two broad categories. The first is hardware-based: devices check whether the driver is physiologically in a normal state and thereby evaluate the driving state, for example by judging drink driving from the alcohol content of the driver's breath, judging fatigue from the relative reflection of the driver's eyelids and eyeballs, or judging fatigue from the driver's brain waves or electrocardiogram. The second is based on computer image processing and pattern recognition, and judges the driver's behavior and mental state from head movement and changes in facial features (such as the eyes and face). Because of technical limitations, the second approach has so far made little use of artificial intelligence; its recognition accuracy is low, its speed is slow, and it often needs manual assistance. With tens of thousands of surveillance videos, semi-manual monitoring of driving violations is also impractical. A method that uses artificial intelligence to let computers recognize violation driving behavior with high accuracy is therefore needed.

In recent years, feature learning methods based on CNNs (Convolutional Neural Networks) have achieved great success in image classification and attracted wide attention in computer vision.

Limitations of traditional open frameworks. All of today's well-known artificial intelligence learning frameworks are released by US companies and institutions, and the functions, features, and orientation of these frameworks are controlled by the releasing organizations. This is one of the main reasons the AI industry finds it hard to commercialize in different fields: small and medium-sized enterprises can do little more than build upper-layer, wrapper-style products on these public frameworks, are limited in technical capability, and lack the ability to customize an artificial intelligence framework in depth for their real needs and applications.

Summary of the invention

The object of the invention is to provide a CNN-based method for recognizing driver violation behavior that overcomes the limitations of traditional open frameworks.

The technical solution adopted by the invention is as follows:

A CNN-based method for recognizing driver violation behavior, comprising the following steps:

S1. Establish a multilayer driving-behavior CNN recognition network comprising three layer networks;

S2. Capture surveillance video images;

S3. Feed the surveillance video images captured in step S2 into the first layer of the CNN recognition network established in step S1. The first layer identifies whether each video image shows an in-vehicle scene or an out-of-vehicle scene; images of in-vehicle scenes are passed to step S4, and images of out-of-vehicle scenes are discarded;

S4. The second layer of the CNN recognition network recognizes the video images passed from step S3, extracts the driver's head region from each image, and passes it to step S5;

S5. The third layer of the CNN recognition network recognizes the images passed from step S4 and judges whether the driver's behavior is a violation.

Training parameters are set in an independently developed artificial intelligence framework, a CNN is built for each layer, and each layer is trained for image recognition. This stage is iterative. Each layer network is a relatively independent training task, so the training effect of every layer can be controlled and weak links can be tuned conveniently. Finally, the three trained networks are chained together, which completes recognition from the raw surveillance video to the driver's violation behavior.
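The three-stage cascade of steps S2 to S5 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the three trained networks are replaced by hypothetical callables (`scene_clf`, `head_locator`, `behavior_clfs`), and `head_locator` is assumed to return the top-left corner of a 64*64 head window (for example by scanning 32*32 candidates with the second-layer classifier), which the patent does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np

# Hypothetical stage interfaces (not named in the patent):
SceneClassifier = Callable[[np.ndarray], bool]                   # layer 1: True = in-vehicle
HeadLocator = Callable[[np.ndarray], Optional[Tuple[int, int]]]  # layer 2: head window corner
BehaviorClassifier = Callable[[np.ndarray], bool]                # layer 3: True = violation


@dataclass
class CascadeResult:
    in_vehicle: bool
    head_corner: Optional[Tuple[int, int]] = None
    violations: List[str] = field(default_factory=list)


def run_cascade(frame: np.ndarray,
                scene_clf: SceneClassifier,
                head_locator: HeadLocator,
                behavior_clfs: Dict[str, BehaviorClassifier],
                crop: int = 64) -> CascadeResult:
    """Pass one captured frame (S2) through the three layers (S3 to S5)."""
    # S3: discard frames that do not show the in-vehicle scene.
    if not scene_clf(frame):
        return CascadeResult(in_vehicle=False)
    # S4: locate the driver's head; stop if no head is found.
    corner = head_locator(frame)
    if corner is None:
        return CascadeResult(in_vehicle=True)
    top, left = corner
    head = frame[top:top + crop, left:left + crop]
    # S5: run each behaviour classifier on the head region.
    found = [name for name, clf in behavior_clfs.items() if clf(head)]
    return CascadeResult(in_vehicle=True, head_corner=corner, violations=found)
```

Frames rejected by an earlier stage never reach the later, more specific classifiers, and each behaviour classifier only sees the small head region rather than the whole frame.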

Further, establishing the multilayer driving-behavior CNN recognition network in step S1 comprises the following steps:

S101. Establish the first-layer CNN recognition network;

S102. Establish the second-layer CNN recognition network;

S103. Establish the third-layer CNN recognition network;

S104. Connect the three CNN recognition networks established in S101, S102, and S103 in series to form the final network structure.

Multilayer network training method. A complex recognition task is split into steps, each handled by one network of a multilayer structure, and each layer network is trained separately. This preserves recognition speed, greatly improves accuracy when samples are few, and allows adjustment according to each layer's training results. The present invention splits the recognition of phone calls, smoking, eating, and fatigue driving into three parts: recognizing in-vehicle versus out-of-vehicle scenes, recognizing the driver's head, and recognizing the specific behavior.

Further, establishing each layer's CNN recognition network comprises the following steps:

S105. Prepare samples and divide them into a training set and a test set;

S106. Construct the CNN network structure of the corresponding layer;

S107. Feed the training-set images into the CNN network structure established in step S106 and train it.

S108. When training is complete, the CNN network of the corresponding layer is obtained.
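Step S105 divides each layer's samples into a training set and a test set without giving a ratio. The sketch below assumes a stratified 80/20 split with a fixed seed and a hypothetical `(path, label)` record format, so that both classes of each binary task appear in both sets in the same proportion.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (image path, class label); hypothetical record format


def stratified_split(samples: Sequence[Sample], test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split one layer's samples into (train, test), keeping the class balance."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must lie strictly between 0 and 1")
    by_label: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)          # fixed seed -> reproducible split
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = int(round(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```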

Further, before the training-set images are fed into the CNN network structure established in step S106 for training in step S107, the method also includes transforming the training-set images. The image transformation function can be switched on or off; when it is on, the training-set samples are transformed again in every training round, which directly enlarges the sample size. Expanding the dataset in this way makes industrial application feasible.

Further, the image transformation methods include horizontal translation, vertical translation, rotation, changing image contrast, changing brightness, setting the blur region and blur strength, and adjusting the noise level, and the number of transformed images can be controlled in detail for each type of transformation.

Further, the training framework that performs recognition training in step S107 uses a dynamic learning-rate algorithm and an automatic convergence-detection algorithm. This enables intelligent, unattended ("hands-off") training. As the number of training rounds grows, the learning rate is adjusted dynamically according to the change of the gradients in the backpropagation algorithm and is gradually reduced to a preset value. When the change of the gradients within a certain period is smaller than a threshold, the system stops training on its own and marks training as complete. After training, a test program uses the resulting network file to classify an unknown sample set; with a little manual correction, this makes it easy to expand the training dataset and train iteratively. The network classification accuracy finally achieved can reach 99.8%.
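The model-assisted dataset expansion described above (classify an unknown sample set with the trained network, then correct it by hand) might look like the following sketch. The confidence threshold that decides which predictions a person reviews is an assumption; the patent only says the output is lightly corrected by hand. `predict` and `review` are hypothetical callables.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical interfaces: predict(x) -> (label, confidence);
# review(x, proposed_label) -> label confirmed or corrected by a person.
Predict = Callable[[Any], Tuple[int, float]]
Review = Callable[[Any, int], int]


def label_with_review(unlabeled: Iterable[Any], predict: Predict, review: Review,
                      min_confidence: float = 0.9) -> Tuple[List[Tuple[Any, int]], int]:
    """One round of model-assisted labelling of an unknown sample set.

    Predictions below `min_confidence` go to a human reviewer; the rest are
    accepted as-is. Returns the new (sample, label) pairs and the number of
    samples that needed review.
    """
    labelled: List[Tuple[Any, int]] = []
    n_reviewed = 0
    for x in unlabeled:
        label, confidence = predict(x)
        if confidence < min_confidence:
            label = review(x, label)     # manual correction step
            n_reviewed += 1
        labelled.append((x, label))
    return labelled, n_reviewed
```

The returned pairs would be appended to the training set before the next training iteration; the review count gives a rough measure of how much manual work each round still needs.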

Further, the samples prepared for each layer are:

First-layer training: pictures of in-vehicle scenes and pictures of out-of-vehicle scenes;

Second-layer training: pictures from in-vehicle scenes that show the driver's head and pictures that do not;

Third-layer training: driver-head pictures showing a violation and driver-head pictures showing no violation.

In conclusion by adopting the above-described technical solution, the beneficial effects of the present invention are:

1. a kind of recognition methods of the driving unlawful practice based on CNN of the present invention, is fully automated, using artificial intelligence Mode is installed machines to replace manual labor.

2. a kind of recognition methods of the driving unlawful practice based on CNN of the present invention, recognition speed is fast, and accuracy of identification is high, solution It has determined the confinement problems of traditional Open Framework.

Description of the drawings

Embodiments of the present invention will be described by way of example with reference to the accompanying drawings, in which:

Fig. 1 is a diagram of the multilayer network implementation process used in the present invention;

Fig. 2 is a diagram of the iterative training process of the present invention;

Fig. 3 is the CNN structure diagram of the first-layer network of the present invention;

Fig. 4 is the CNN structure diagram of the second-layer network of the present invention;

Fig. 5 is the CNN structure diagram of the third-layer network of the present invention.

Detailed description of the embodiments

All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.

The present invention is described in detail below with reference to Fig. 1 to Fig. 5.

Embodiment 1

S2. Capture surveillance video images;

Training parameters are then set in the independently developed artificial intelligence framework, a CNN is built for each layer, and each layer is trained for image recognition. This stage is iterative. Each layer network is a relatively independent training task, so the training effect of every layer can be controlled and weak links can be tuned conveniently. Finally, the three trained networks are chained together, which completes recognition from the raw surveillance video to the driver's violation behavior.

Embodiment 2

This embodiment differs from Embodiment 1 in that establishing the multilayer driving-behavior CNN recognition network in step S1 further comprises the following steps:

S101. Establish the first-layer CNN recognition network;

S102. Establish the second-layer CNN recognition network;

S103. Establish the third-layer CNN recognition network;

S105. Prepare samples and divide them into a training set and a test set;

S106. Construct the CNN network structure of the corresponding layer;

Further, the samples prepared for each layer are:

Embodiment 3

Referring to Fig. 1, the whole driver violation recognition method is implemented by a three-layer network.

The first layer identifies whether the video shows an in-vehicle or an out-of-vehicle scene;

The second layer then identifies, within the in-vehicle scenes sorted out by the first layer, the region containing the driver's head;

The third layer, for the in-vehicle scenes in which the second layer found the driver's head, identifies separately whether the driver is making a phone call, smoking, eating, driving while fatigued, or engaging in similar behavior.

Referring to Fig. 2, the training process for driver violation recognition comprises the following specific steps:

S105. Prepare training samples

The sample data of the invention are generated from the on-board videos of passenger and freight vehicles on a third-party monitoring platform; the collected surveillance video samples generally have a resolution of 352*288. The samples can of course be obtained in other ways, and the resolution is not limited to 352*288. Key frames are captured from these videos.

Because the invention uses a three-layer network, three types of training samples are prepared.

First layer: in-vehicle versus out-of-vehicle scenes. The original 352*288 images are compressed to 72*88.

Second layer: driver head recognition. Samples of size 32*32 are prepared and divided into a driver-head class and a non-driver-head class. These samples are cut from the original 352*288 images at a size of 64*64, the coordinates of the upper-left corner are recorded, and the patches are then compressed to 32*32.

Third layer: recognition of driver phone calls and similar behavior. From the second-layer training samples, the pictures that show the driver's head are selected and divided into positive and negative classes according to whether the driver is making a phone call, smoking, eating, and so on. The size is 64*64.
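The sample geometry above can be checked with a short NumPy sketch. A CIF frame is 352 pixels wide and 288 high, so compressing it to 72*88 (rows x columns) is a 4x reduction in each direction, and a 64*64 head patch compressed to 32*32 is a 2x reduction. The patent does not name the interpolation method; block averaging is used here as an assumption, and the function names are hypothetical.

```python
import numpy as np


def block_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Shrink a 2-D grayscale image by an integer factor by averaging factor x factor blocks."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def scene_sample(frame: np.ndarray) -> np.ndarray:
    """Layer-1 input: a 288-row x 352-column CIF frame -> 72 x 88."""
    return block_downscale(frame, 4)


def head_sample(frame: np.ndarray, top: int, left: int, crop: int = 64):
    """Layer-2 sample: cut a crop x crop patch, keep its top-left corner, shrink to 32 x 32."""
    patch = frame[top:top + crop, left:left + crop]
    if patch.shape != (crop, crop):
        raise ValueError("crop window extends past the frame border")
    return block_downscale(patch, crop // 32), (top, left)
```

The third-layer samples keep the 64*64 patch itself, so they need only the cropping part of `head_sample`, without the final reduction.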

After the samples have been classified, image preprocessing is performed. The OpenCV library is used to convert the RGB (0-255) color images into grayscale images. The resulting grayscale pixel values lie between 0 and 255; to further improve efficiency, they are normalized to the interval 0-1.
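A NumPy equivalent of the grayscale conversion and 0-1 normalization is sketched below, assuming an H x W x 3 uint8 RGB array. With OpenCV itself the conversion would be `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` for images loaded by `cv2.imread`, which returns channels in BGR order; the weights below are the BT.601 luma coefficients that OpenCV documents for this conversion.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B; OpenCV documents the same weights
# for its RGB/BGR -> GRAY conversion.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_unit_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to H x W float32 grayscale in [0, 1]."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    gray = rgb.astype(np.float64) @ LUMA_WEIGHTS   # weighted sum, still in [0, 255]
    return (gray / 255.0).astype(np.float32)       # normalise to [0, 1]
```

Unlike OpenCV's uint8 output, this version keeps the fractional part of the weighted sum before dividing by 255.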

S106. Construct the CNN network structure in the independently developed neural network framework.

As shown in Fig. 3, the first layer of the invention, which recognizes in-vehicle versus out-of-vehicle scenes, uses the following CNN structure:

This training network has ten layers: one input layer, five convolutional layers, three pooling layers, and one output layer.

The first layer is the input layer; its input is a single 72*88 map.

The second layer is a convolutional layer using the ReLU activation function and padded ("same") convolution, with 6 convolution kernels of size 3x3 and a stride of 1. The input size is 72*88 and the output size is 72*88.

The third layer is a max-pooling layer with 6 non-overlapping 2x2 kernels and a stride of 2. The input size is 72*88 and the output size is 36*44.

The fourth layer is a convolutional layer using the ReLU activation function and padded convolution, with 12 convolution kernels of size 3x3 and a stride of 1. The input size is 36*44 and the output size is 36*44.

The fifth layer is a max-pooling layer with 12 non-overlapping 2x2 kernels and a stride of 2. The input size is 36*44 and the output size is 18*22.

The sixth layer is a convolutional layer using the ReLU activation function and padded convolution, with 24 convolution kernels of size 3x3 and a stride of 1. The input size is 18*22 and the output size is 18*22.

The seventh layer is a convolutional layer using the ReLU activation function and padded convolution, with 24 convolution kernels of size 3x3 and a stride of 1. The input size is 18*22 and the output size is 18*22.

The eighth layer is a convolutional layer using the ReLU activation function and padded convolution, with 24 convolution kernels of size 3x3 and a stride of 1. The input size is 18*22 and the output size is 18*22.

The ninth layer is a pooling layer using the ReLU activation function, with 24 non-overlapping 2x2 kernels and a stride of 2. The input size is 18*22 and the output size is 9*11.

The tenth layer is the output layer, using the Softmax activation function, with 2 output classes.
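To check that the first-layer dimensions are consistent, the following pure-Python sketch traces the feature-map size through the layer list, assuming a single-channel (grayscale) input, padded 3x3 convolutions with one bias per kernel, and 2x2 stride-2 pooling. The names `SCENE_NET` and `trace_shapes` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# Layer list transcribed from the first-layer description (input layer excluded):
#   ("conv", n)    = n kernels of 3x3, stride 1, "same" padding
#   ("pool", None) = non-overlapping 2x2 window, stride 2
SCENE_NET = [("conv", 6), ("pool", None), ("conv", 12), ("pool", None),
             ("conv", 24), ("conv", 24), ("conv", 24), ("pool", None)]


def trace_shapes(h: int, w: int, c: int,
                 layers: Sequence[Tuple[str, Optional[int]]], k: int = 3):
    """Return the (h, w, channels) after each layer and the conv parameter count."""
    shapes: List[Tuple[int, int, int]] = []
    conv_params = 0
    for kind, n in layers:
        if kind == "conv":
            conv_params += (k * k * c + 1) * n   # k*k*c weights + 1 bias per kernel
            c = n                                # "same" padding keeps h and w
        elif kind == "pool":
            h, w = h // 2, w // 2                # 2x2 window, stride 2
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((h, w, c))
    return shapes, conv_params
```

`trace_shapes(72, 88, 1, SCENE_NET)` ends at (9, 11, 24), matching the text, with 13,752 convolution parameters. If the output layer is assumed to be fully connected from the 9*11*24 = 2,376 features to the 2 classes, it adds (2,376 + 1) * 2 = 4,754 more; the patent describes the output layer only by its Softmax activation and class count.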

As shown in Fig. 4, the second layer of the invention, which recognizes the driver's head, uses the following CNN structure:

This training network has seven layers: one input layer, four convolutional layers, one pooling layer, and one output layer.

The first layer is the input layer; its input is a single 32*32 map.

The second layer is a convolutional layer using the ReLU activation function and padded ("same") convolution, with 16 convolution kernels of size 3x3 and a stride of 1. The input size is 32*32 and the output size is 32*32.

The third layer is a max-pooling layer with 16 non-overlapping 2x2 kernels and a stride of 2. The input size is 32*32 and the output size is 16*16.

The fourth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 16*16 and the output size is 16*16.

The fifth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 16*16 and the output size is 16*16.

The sixth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 16*16 and the output size is 16*16.

The seventh layer is the output layer, using the Softmax activation function, with 2 output classes.
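The two building blocks used throughout these networks, a padded 3x3 convolution with ReLU and non-overlapping 2x2 max pooling, can be written directly in NumPy. This is an illustrative sketch (a loop over the nine kernel offsets, channels-last layout, zero padding assumed), not the independently developed framework the patent refers to. Applied to a 32*32 input with 16 kernels, it reproduces the 32*32 to 16*16 step of the second-layer network.

```python
import numpy as np


def conv3x3_same_relu(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """3x3 convolution, stride 1, zero padding of 1 (output keeps H x W), then ReLU.

    x: H x W x C_in, weights: 3 x 3 x C_in x C_out, bias: C_out.
    Like most CNN frameworks this computes cross-correlation (no kernel flip).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))      # zero border of width 1
    out = np.zeros((h, w, weights.shape[3]))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w, :]   # H x W x C_in
            out += window @ weights[dy, dx]            # -> H x W x C_out
    return np.maximum(out + bias, 0.0)


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling with stride 2 (an odd last row/column is dropped)."""
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2, :]
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))
```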

As shown in Fig. 5, the third layer of the invention, which recognizes the driver's behavior, uses the following CNN structure:

This training network has eight layers: one input layer, four convolutional layers, two pooling layers, and one output layer.

The first layer is the input layer; its input is a single 64*64 map.

The second layer is a convolutional layer using the ReLU activation function and padded ("same") convolution, with 16 convolution kernels of size 3x3 and a stride of 1. The input size is 64*64 and the output size is 64*64.

The third layer is a max-pooling layer with 16 non-overlapping 2x2 kernels and a stride of 2. The input size is 64*64 and the output size is 32*32.

The fourth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 32*32 and the output size is 32*32.

The fifth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 32*32 and the output size is 32*32.

The sixth layer is a convolutional layer using the ReLU activation function and padded convolution, with 32 convolution kernels of size 3x3 and a stride of 1. The input size is 32*32 and the output size is 32*32.

The seventh layer is a max-pooling layer with 16 non-overlapping 2x2 kernels and a stride of 2. The input size is 32*32 and the output size is 16*16.

The eighth layer is the output layer, using the Softmax activation function, with 2 output classes.
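The output layers are specified only by a Softmax activation over 2 classes. The sketch below assumes the final feature map is flattened and passed through one fully connected layer; that layer is an assumption, since the patent does not say how the feature map reaches the Softmax. The flattened length is taken from the input, so the sketch works for whichever channel count the last pooling layer actually produces.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - np.max(z))   # shifting by the max avoids overflow in exp
    return e / e.sum()


def output_layer(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten the last feature map and map it to two class probabilities.

    features: any shape with N elements; weights: N x 2; bias: 2.
    """
    logits = features.reshape(-1) @ weights + bias
    return softmax(logits)
```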

S109. Transform the training-set images.

The image transformation methods are: horizontal and vertical translation, rotation, changing image contrast, changing brightness, setting the blur region and blur strength, and adjusting the noise level.

The number of transformed images can be controlled in detail for each type of transformation. The image transformation function can be switched on or off; when it is on, the training-set samples are transformed again in every training round. The dataset can thus be expanded as needed, which makes small-sample training genuinely possible and enables industrial application.
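The transformations of step S109, with a per-transform count and an on/off switch applied every training round, are sketched below in NumPy. The parameter ranges (shifts of up to 4 pixels, rotations of up to 10 degrees, and so on), the nearest-neighbour rotation, and the box blur are illustrative assumptions; the patent lists the transformation types but not their settings. Images are assumed to be 2-D grayscale arrays already normalized to [0, 1].

```python
import numpy as np


def shift(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels; uncovered pixels become 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out
    out[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = \
        img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    return out


def rotate(img, degrees):
    """Rotate about the image centre (nearest-neighbour, inverse mapping, 0 fill)."""
    h, w = img.shape
    t = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # For every output pixel, find the source pixel it comes from.
    sy = np.rint(np.cos(t) * (yy - cy) + np.sin(t) * (xx - cx) + cy).astype(int)
    sx = np.rint(-np.sin(t) * (yy - cy) + np.cos(t) * (xx - cx) + cx).astype(int)
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def adjust(img, contrast=1.0, brightness=0.0):
    """Scale around the mean (contrast), add an offset (brightness), clip to [0, 1]."""
    m = img.mean()
    return np.clip((img - m) * contrast + m + brightness, 0.0, 1.0)


def blur_region(img, top, left, size, k):
    """Box-blur a size x size region with an odd k x k kernel; larger k = stronger blur."""
    out = img.astype(np.float64)                 # astype returns a copy
    region = out[top:top + size, left:left + size]
    rh, rw = region.shape
    padded = np.pad(region, k // 2, mode="edge")
    acc = np.zeros((rh, rw))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + rh, dx:dx + rw]
    out[top:top + rh, left:left + rw] = acc / (k * k)
    return out


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, clip to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def augment_epoch(images, counts, enabled, rng):
    """Return the originals plus counts[name] random variants of each image per transform.

    Meant to be called once per training round; with enabled=False the set is
    returned unchanged. Parameter ranges are illustrative, not from the patent.
    """
    if not enabled:
        return list(images)

    def blur(im):
        size = min(im.shape) // 2
        top = int(rng.integers(0, im.shape[0] - size + 1))
        left = int(rng.integers(0, im.shape[1] - size + 1))
        return blur_region(im, top, left, size, k=5)

    ops = {
        "shift": lambda im: shift(im, int(rng.integers(-4, 5)), int(rng.integers(-4, 5))),
        "rotate": lambda im: rotate(im, float(rng.uniform(-10.0, 10.0))),
        "contrast": lambda im: adjust(im, contrast=float(rng.uniform(0.7, 1.3))),
        "brightness": lambda im: adjust(im, brightness=float(rng.uniform(-0.2, 0.2))),
        "blur": blur,
        "noise": lambda im: add_noise(im, 0.03, rng),
    }
    out = list(images)
    for im in images:
        for name, n in counts.items():
            out.extend(ops[name](im) for _ in range(n))
    return out
```

Calling `augment_epoch` again at the start of each round draws fresh random variants, so the network sees a different expanded set every round rather than one fixed enlarged copy.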

S107. Training.

CNN training starts once the image transformation is complete. To enable intelligent, unattended ("hands-off") training, this CNN training framework uses a dynamic learning-rate algorithm and an automatic convergence-detection algorithm. As the number of training rounds grows, the learning rate is adjusted dynamically according to the change of the gradients in the backpropagation algorithm and is gradually reduced to a preset value. When the change of the gradients within a certain period is smaller than a threshold, the system stops training on its own and marks training as complete.
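The patent does not disclose the exact learning-rate rule or convergence test. One plausible reading is sketched below as a hypothetical controller: it shrinks the learning rate toward a preset floor whenever the gradient norm changes only slightly between rounds, and flags convergence once the gradient norm stays within a threshold over a window of rounds. All parameter names and default values are assumptions.

```python
from collections import deque


class AutoTrainingController:
    """Hypothetical dynamic-learning-rate + automatic-convergence controller.

    Call step() after each training round with that round's gradient norm.
    If the norm changed by less than `slow_ratio` relative to the previous
    round, the learning rate is multiplied by `decay`, but never below
    `lr_floor` (the preset value). Training is flagged as converged once the
    last `window + 1` gradient norms differ by less than `threshold`.
    """

    def __init__(self, lr=0.01, lr_floor=1e-4, decay=0.5,
                 slow_ratio=0.05, window=5, threshold=1e-3):
        self.lr, self.lr_floor, self.decay = lr, lr_floor, decay
        self.slow_ratio, self.threshold = slow_ratio, threshold
        self.history = deque(maxlen=window + 1)
        self.converged = False

    def step(self, grad_norm: float) -> float:
        """Record one round's gradient norm and return the learning rate for the next round."""
        if self.history:
            prev = self.history[-1]
            rel_change = abs(grad_norm - prev) / max(prev, 1e-12)
            if rel_change < self.slow_ratio:       # progress is slowing: shrink the step
                self.lr = max(self.lr_floor, self.lr * self.decay)
        self.history.append(grad_norm)
        if (len(self.history) == self.history.maxlen
                and max(self.history) - min(self.history) < self.threshold):
            self.converged = True                  # mark training as complete
        return self.lr
```

In a training loop, calling `lr = controller.step(grad_norm)` after each round and breaking when `controller.converged` is true gives the unattended behaviour described above. Comparable off-the-shelf schedulers, such as PyTorch's `ReduceLROnPlateau`, watch a loss or metric rather than the gradient.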

In the final network, recognition accuracy reaches 97% for phone calls, 98% for smoking, and 97% for eating.

Embodiment 4

The invention is deeply optimized for cross-operating-system server hardware and runs on low-end hardware. Taking the computing performance of a 4-core 2 GHz x86 CPU as the baseline, for CIF-resolution on-board video with five images captured per 60 s, the recognition time for each behavior is under 100 ms. Recognition accuracy is high: 97% for phone calls, 98% for smoking, and 97% for eating.
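The timing figures in this embodiment translate into a simple sampling schedule: five frames per 60 s means one frame every 12 s. The sketch below picks the centre frame of each 12 s slice, assuming a frame rate of 25 frames per second (not stated in the patent), and checks measured per-behaviour recognition times against the 100 ms figure. Both function names are hypothetical.

```python
from typing import Iterable, List


def sample_frame_indices(duration_s: float = 60.0, n_frames: int = 5,
                         fps: float = 25.0) -> List[int]:
    """Indices of n_frames evenly spaced frames (centre of each equal time slice)."""
    if n_frames <= 0 or duration_s <= 0 or fps <= 0:
        raise ValueError("duration, frame count and fps must be positive")
    slice_s = duration_s / n_frames            # 60 s / 5 frames = 12 s per frame
    return [int((i + 0.5) * slice_s * fps) for i in range(n_frames)]


def within_budget(latencies_ms: Iterable[float], budget_ms: float = 100.0) -> bool:
    """True if every per-behaviour recognition time is under the budget."""
    return all(t < budget_ms for t in latencies_ms)
```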

The above are only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could conceive without creative work within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection defined by the claims.

Claims

1. A CNN-based method for recognizing driver violation behavior, characterized by comprising the following steps:

S2. Capturing surveillance video images;

S3. Feeding the surveillance video images captured in step S2 into the first layer of the CNN recognition network established in step S1, wherein the first layer identifies whether each video image shows an in-vehicle scene or an out-of-vehicle scene, images of in-vehicle scenes are passed to step S4, and images of out-of-vehicle scenes are discarded;

S4. Recognizing, by the second layer of the CNN recognition network, the video images passed from step S3, extracting the driver's head region from each image, and passing it to step S5;

S5. Recognizing, by the third layer of the CNN recognition network, the images passed from step S4, and judging whether the driver's behavior is a violation.

2. The CNN-based method for recognizing driver violation behavior according to claim 1, characterized in that establishing the multilayer driving-behavior CNN recognition network in step S1 comprises the following steps:

S101. Establishing the first-layer CNN recognition network;

S102. Establishing the second-layer CNN recognition network;

S103. Establishing the third-layer CNN recognition network;

3. The CNN-based method for recognizing driver violation behavior according to claim 2, characterized in that establishing each layer's CNN recognition network comprises the following steps:

S105. Preparing samples and dividing them into a training set and a test set;

S106. Constructing the CNN network structure of the corresponding layer;

4. The CNN-based method for recognizing driver violation behavior according to claim 3, characterized in that, before the training-set images are fed into the CNN network structure established in step S106 for training in step S107, the method further comprises transforming the training-set images.

5. The CNN-based method for recognizing driver violation behavior according to claim 4, characterized in that the image transformation methods include horizontal translation, vertical translation, rotation, changing image contrast, changing brightness, setting the blur region and blur strength, and adjusting the noise level, and the number of transformed images can be controlled in detail for each type of transformation.

6. The CNN-based method for recognizing driver violation behavior according to claim 3, characterized in that the training framework that performs recognition training in step S107 uses a dynamic learning-rate algorithm and an automatic convergence-detection algorithm.

7. The CNN-based method for recognizing driver violation behavior according to claim 3, characterized in that the sample preparation for each layer comprises