CN110309760A - Method for detecting the driving behavior of a driver - Google Patents
Method for detecting the driving behavior of a driver
- Publication number
- CN110309760A (application CN201910561203.0A)
- Authority
- CN
- China
- Prior art keywords
- state
- convolutional layer
- layer
- yawn
- convolutional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The present invention provides a method for detecting the driving behavior of a driver, comprising: an acquisition step; a detection step that can simultaneously detect the eye state, mouth state, cigarette state, and phone state in each frame of an image; a judgment step that can simultaneously perform eye-closure fatigue judgment, yawn fatigue judgment, smoking-violation judgment, and phone-call-violation judgment; and an alarm step that emits a sound or light and sends video or images to an external device. The detection step uses an algorithm based on a convolutional neural network comprising at least one convolutional layer, at least one residual layer, a global pooling layer, and a fully connected layer, wherein a BN layer and a LeakyReLU layer are arranged after each convolutional layer.
Description
Technical field
The present invention relates to the field of visual inspection, and more particularly to a method for detecting the driving behavior of a driver.
Background art
Driving-behavior detection is generally divided into fatigue-driving detection and violation-action detection. Early fatigue-driving detection approached the problem mainly from a medical angle: physiological characteristics were measured with medical devices to study the causes and triggers of fatigue and drowsiness, in search of methods to monitor or avoid fatigued driving. One monitoring method uses an intelligent alarm system with infrared signal processing to judge whether the driver is dozing or falling asleep. A typical fatigue-detection scheme, for example, uses special infrared LED devices: exploiting the physiological fact that the human retina reflects infrared light of different wavelengths differently, it captures two slightly different eye images at the same instant using infrared sources of 850 nm and 950 nm wavelength, subtracts the two images, and thereby extracts the position and size of the pupil. The PERCLOS rule is then applied to the degree of eye closure to judge the level of fatigue. Later, computer vision replaced the infrared-LED scheme: a CCD camera mounted in the vehicle monitors the driver's eye state (including eyelids, pupil changes, and blink frequency), a fast and simple algorithm determines the precise location of the driver's eyes in the face image, and multiple frames are tracked to monitor whether the driver is driving fatigued.
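The PERCLOS rule mentioned above scores drowsiness as the fraction of time the eyes are (nearly) closed over a measurement window. The patent gives no formula, so the 80% closure threshold below is an assumption taken from the common P80 variant of PERCLOS; this is only an illustrative sketch:

```python
def perclos(closure_ratios, threshold=0.8):
    """Fraction of frames whose eyelid-closure ratio is at least `threshold`
    (the assumed P80 variant: eyes count as closed when >= 80% shut)."""
    if not closure_ratios:
        return 0.0
    closed = sum(1 for c in closure_ratios if c >= threshold)
    return closed / len(closure_ratios)

# Over a 10-frame window, 3 frames with the eyes >= 80% closed -> PERCLOS 0.3.
window = [0.1, 0.2, 0.9, 0.95, 0.85, 0.3, 0.1, 0.0, 0.2, 0.1]
assert abs(perclos(window) - 0.3) < 1e-9
```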
A violation action is an action or behavior by the driver during driving that affects vehicle safety, such as making phone calls, playing with a mobile phone, smoking, or chatting. Violation-action detection is generally solved by means of computer vision. The prior art adopts a different algorithm for each action to be detected. Because these algorithms differ greatly, deploying all of them on a mobile terminal is too bulky and redundant, which hinders their application there.
Surveying the state of the art at home and abroad, achieving real-time, effective, and simple detection of driver fatigue is a current focus and hot spot of research, but no highly mature product has yet been put to practical use on the market. The main difficulties are as follows. First, as described above, product costs are relatively high, running speeds are relatively slow (too much redundancy), and accuracy is insufficient, so commercialization cannot proceed well. Second, methods such as EEG, the "sobering belt", and monitoring glasses are effective, but being contact devices they severely restrict the driver's freedom of movement. Finally, they are also affected differently by individual and environmental differences (gender, glasses, lighting conditions, road conditions, etc.).
Summary of the invention
Addressing the high cost of existing sensor-based schemes and the redundancy and complexity of traditional image-recognition schemes, the present invention proposes an end-to-end computer-vision detection scheme based on a neural network that solves fatigue-driving, smoking, phone-call, and similar action detection in an integrated way. The invention uses a convolutional neural network to extract features and detect the target body state at each instant (eyes open/closed, mouth open/closed, making a phone call, smoking), comprehensively judges the target's fatigue and driving state over a period of time, and issues an alarm signal when the driver drives fatigued or performs a violation action.
To remedy the deficiencies of existing driving-behavior detection methods, the present invention provides a method for detecting the driving behavior of a driver, comprising: an acquisition step; a detection step that can simultaneously detect the eye state, mouth state, cigarette state, and phone state in each frame of an image; a judgment step that can simultaneously perform eye-closure fatigue judgment, yawn fatigue judgment, smoking-violation judgment, and phone-call-violation judgment; and an alarm step that emits a sound or light and sends video or images to an external device. The detection step uses an algorithm based on a convolutional neural network comprising at least one convolutional layer, at least one residual layer, a global pooling layer, and a fully connected layer, wherein a BN layer and a LeakyReLU layer are arranged after each convolutional layer.
The present invention provides an end-to-end, neural-network-based mode of detecting driver behavior, solving problems such as the high cost of traditional detection modes and the redundant, complex computation of traditional image processing: a single network uniformly outputs all judgment results in one pass. Compared with traditional image-detection means, the deep-learning approach of the present invention needs no image-enhancement preprocessing, adapts to extreme environments such as uneven illumination, diversified target features, and complex backgrounds, and supports incremental training for specific scenes: in actual use, training samples are calibrated through appropriate periodic manual intervention, improving accuracy in the dedicated scene.
Other features of the invention will become apparent from the following description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of a detection process according to an aspect of the invention.
Fig. 2 is a structural schematic diagram of a convolutional neural network according to an aspect of the invention.
Fig. 3 is a detection flowchart of fatigue or violation states according to an aspect of the invention.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the drawings. It should be noted that, unless otherwise stated, the relative arrangement of components, numerical expressions, and values described in these embodiments do not limit the scope of the invention. It should also be noted that the following embodiments are not intended to limit the scope of the invention as recited in the claims, and not all combinations of features described in these embodiments are essential to the invention.
To solve the above problems, the present invention provides a method for detecting the driving behavior of a driver.
<Acquisition of images>
A camera or image-capturing device (for example, a video camera or still camera) is installed in the cab, preferably facing the driver. Throughout the driving process, the device continuously records the driver's behavior, forming a video of multiple frames. The frames can be sent to the image-processing unit described below for feature extraction and computation. Alternatively, only frames meeting a predetermined condition (for example, the 5th, 10th, 15th, 20th frame, and so on) may be sent to the image-processing unit for feature extraction and computation. Of course, the capturing device may also be set up to record the behavior of other personnel in the cab.
<Processing of images>
As shown in Fig. 1, the method, apparatus, system, and computer-executable storage medium of the invention extract features from the video or images through a convolutional neural network and classify them, detecting the position and state of each body part in the image, as well as the presence and respective positions of a cigarette and a phone. The open/closed state of the eyes and mouth, the smoking state, the phone-call (talking) state, and so on are then obtained directly from the convolutional neural network. The continuous duration or count of each state can be calculated. If continuous eye closure or yawning is detected, the driver can be determined to be in a fatigue state, and the system can output a corresponding warning signal; if smoking or a phone call is detected, the driver can be determined to be in a violation state, and a corresponding warning message can likewise be output, or output directly as an alarm.
<<Overview of the convolutional neural network architecture>>
First, as an overview, the convolutional neural network of the invention is composed of a series of 1*1 and 3*3 convolutional layers, each followed by a BN layer and a LeakyReLU layer. To counteract the performance degradation caused by increasing network depth, residual layers are also introduced. Finally, a global pooling layer and a fully connected layer are added at the end of the network, and softmax is used for classification. The convolution stride ("strides") defaults to (1, 1), and "padding" (whether boundary pixels are dropped during convolution) defaults to "same", i.e., a padding side length of 1 filled with zeros (before the convolution, a ring of zeros is added around the image, and the convolution is then performed). Preferably, the network of the invention always uses "same" padding.
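The sizing rule just described ("same" zero padding, stride (1, 1) by default) can be sketched as simple arithmetic. The `conv_out_size` helper and its "valid" branch are illustrative assumptions for contrast, not part of the patent:

```python
import math

def conv_out_size(size, stride, padding="same", kernel=3):
    """Spatial output side length of a square convolution.
    With 'same' zero padding only the stride shrinks the feature map;
    with 'valid' the boundary pixels are dropped."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

assert conv_out_size(256, 1) == 256   # 3*3 stride-1 'same' conv keeps the size
assert conv_out_size(256, 2) == 128   # a 3*3/2 downsampling layer halves it
assert conv_out_size(256, 1, padding="valid") == 254
```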
<<First embodiment of the convolutional neural network structure>>
Hereinafter, the specific structure of the convolutional neural network of the present invention will be described in detail with reference to Fig. 2.
First, the video or series of images captured by the camera is input to the 1st convolutional layer ("Convolutional") for preliminary feature extraction. Here, an image of size 256*256 is used as an example; those skilled in the art will understand that images of other sizes may also be input. Of course, the subsequent convolutional structure will change accordingly with the input image size, for example by adding or removing convolutional layers, increasing or decreasing the size and number of convolution kernels, increasing or decreasing the number of residual layers, or changing the positions of the residual layers in the network. As an example, the kernel size of the 1st convolutional layer is configured as 3*3, and the number of kernels is set to 32. After this layer, the output image size is 256*256.
Next, the output of the 1st convolutional layer enters the 2nd convolutional layer, which downsamples the image to reduce its size. Its kernel size is configured as 3*3/2 (stride 2), and the number of kernels is set to 64. The 2nd convolutional layer reduces the image to 128*128 and outputs it.
The output of the 2nd convolutional layer then enters the 3rd combination layer, which extracts features and increases network depth. The 3rd combination layer comprises the 31st convolutional layer, the 32nd convolutional layer, and a residual layer. The kernel size of the 31st convolutional layer is configured as 1*1 with 32 kernels; that of the 32nd is configured as 3*3 with 64 kernels. After this layer, the output is still a 128*128 image.
The output of the 3rd combination layer then enters the 4th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 128. After this layer, the output image size is 64*64.
The output of the 4th convolutional layer then passes sequentially through 2 (2x) 5th combination layers, which extract features and increase network depth. Each 5th combination layer comprises the 51st convolutional layer, the 52nd convolutional layer, and a residual layer. The kernel size of the 51st convolutional layer is configured as 1*1 with 64 kernels; that of the 52nd is configured as 3*3 with 128 kernels. After the 2 combination layers, the output is still a 64*64 image.
Next, the output of the 5th combination layers enters the 6th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 256. After this layer, the output image size is 32*32.
The output of the 6th convolutional layer then passes sequentially through 4 7th combination layers, which extract features and increase network depth. Each 7th combination layer comprises the 71st convolutional layer, the 72nd convolutional layer, and a residual layer. The kernel size of the 71st convolutional layer is configured as 1*1 with 128 kernels; that of the 72nd is configured as 3*3 with 256 kernels. After these layers, the output is still a 32*32 image.
The output of the 7th combination layers then enters the 8th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 512. After this layer, the output image size is 16*16.
Subsequently, the output of the 8th convolutional layer passes sequentially through 4 9th combination layers, which extract features and increase network depth. Each 9th combination layer comprises the 91st convolutional layer, the 92nd convolutional layer, and a residual layer. The kernel size of the 91st convolutional layer is configured as 1*1 with 256 kernels; that of the 92nd is configured as 3*3 with 512 kernels. After these layers, the output is still a 16*16 image.
The output of the 9th combination layers then enters the 10th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 1024. After this layer, the output image size is 8*8.
The output of the 10th convolutional layer then passes sequentially through 2 11th combination layers, which extract features and increase network depth. Each 11th combination layer comprises the 111th convolutional layer, the 112th convolutional layer, and a residual layer. The kernel size of the 111th convolutional layer is configured as 1*1 with 512 kernels; that of the 112th is configured as 3*3 with 1024 kernels. After these layers, the output image size is 8*8.
Finally, the output of the 11th combination layers passes sequentially through the global pooling layer and the fully connected layer for classification. The global pooling layer performs global pooling on the resulting 8*8 feature map, obtaining one feature point. In the fully connected part, a two-layer neural network with input dimension 256 and output dimension 2 processes the feature point: the first layer passes through a TanH activation function, and the second layer is followed by a softmax function.
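Assuming, as described above, that every 3*3/2 layer halves the feature map and every combination layer preserves it, the size chain of the first embodiment (256 → 128 → 64 → 32 → 16 → 8) can be checked mechanically. The short layer names below are shorthand for the layers just described, not identifiers from the patent:

```python
def first_embodiment_sizes(input_size=256):
    """Trace the feature-map side length through the layers of Fig. 2.
    Stride-2 layers halve the map; combination layers (1*1 + 3*3 + residual)
    preserve it, since 'same' padding is always used."""
    plan = [
        ("conv1 3*3",    1),
        ("conv2 3*3/2",  2),
        ("comb3",        1),
        ("conv4 3*3/2",  2),
        ("comb5 x2",     1),
        ("conv6 3*3/2",  2),
        ("comb7 x4",     1),
        ("conv8 3*3/2",  2),
        ("comb9 x4",     1),
        ("conv10 3*3/2", 2),
        ("comb11 x2",    1),
    ]
    size, trace = input_size, []
    for name, stride in plan:
        size //= stride
        trace.append((name, size))
    return trace

sizes = dict(first_embodiment_sizes())
assert sizes["conv2 3*3/2"] == 128
assert sizes["conv8 3*3/2"] == 16
assert sizes["comb11 x2"] == 8    # the 8*8 map fed to global pooling
```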
<<Second embodiment of the convolutional neural network structure>>
To reduce the parameters and computation of the network, one may appropriately reduce the network's parameters on one hand, and remove part of the network layers on the other, without significantly affecting the accuracy of the neural network. For example, a second embodiment can be obtained by slightly deforming the first. Parameter settings and arrangements of the convolutional and combination layers that are identical to the first embodiment are not described again here. The differences are two. First, the second embodiment has no 7th combination layers; that is, the output of the 6th convolutional layer enters the 8th convolutional layer directly. Second, after the 11th combination layers, the second embodiment adds a 12th convolutional layer and a 13th combination layer.
As an example, the kernel size of the 12th convolutional layer is configured as 3*3/2, and the number of kernels is set to 1024. After this layer, the output image size is 8*8. As an example, the 13th combination layer comprises the 131st convolutional layer, the 132nd convolutional layer, and a residual layer. The kernel size of the 131st convolutional layer is configured as 1*1 with 512 kernels; that of the 132nd is configured as 3*3 with 1024 kernels. After this layer, the output is still an 8*8 image. The image then enters the global pooling layer.
<<Training method and parameters of the convolutional neural network>>
The convolution kernels in the convolutional layers and the fully connected layer are initialized with Gaussian random numbers of mean 0 and standard deviation 0.1; the bias terms are initialized with uniform random numbers on the interval [0, 1]. In the batch-normalization layers, the momentum is set to 0.95 and the constant to 0.01. The weights are trained with the AdaDelta gradient-descent algorithm, with a batch size of 64.
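A minimal sketch of the initialization rule above, using only Python's standard library. The fixed seed and the list-of-lists weight layout are assumptions for illustration, not part of the patent:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible (an assumption)

def init_conv_layer(kernel_size, in_ch, out_ch):
    """One layer's parameters as described above:
    weights ~ N(mean 0, std 0.1), biases ~ U[0, 1]."""
    weights = [[[[random.gauss(0.0, 0.1) for _ in range(kernel_size)]
                 for _ in range(kernel_size)]
                for _ in range(in_ch)]
               for _ in range(out_ch)]
    biases = [random.uniform(0.0, 1.0) for _ in range(out_ch)]
    return weights, biases

# The 1st convolutional layer of the first embodiment: 32 kernels of 3*3.
w, b = init_conv_layer(3, 3, 32)
assert len(w) == 32 and len(w[0]) == 3 and len(w[0][0]) == 3
assert all(0.0 <= x <= 1.0 for x in b)
```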
Training, validation, and test sets are arranged in a certain proportion. After 20 generations of training, each generation is evaluated on the validation set; the generation with the best result is saved as the trained model and used for the test-set evaluation, whose result is the result of the whole learning. The total training cycle over the full data is set to 100 generations. During training, the positive-to-negative sample ratio in the training set is 10:1; in each generation, 20% of the negative samples, shuffled, are trained together with all positive samples, until all negative samples have been trained through one complete cycle.
The above experimental methods and parameters were obtained through many experiments on the basis of scientific research. They are well suited to the driver environment of the present invention, and are especially effective when detecting the eye state, mouth state, smoking state, and phone-call state.
<Judgment of fatigue or violation states>
The video or images pass through the convolutional neural network for feature extraction, and each image is divided in advance into an 11*11 grid of cells; centered on each cell, 5 random candidate boxes are generated, and each candidate box is classified in the last fully connected layer, thereby obtaining the classification result and position of each candidate box. In network training, the following states are defined: the position and open/closed state of the driver's eyes and mouth in the image, whether the driver holds a mobile phone against the face, the position of the phone, and the position of the cigarette. The state judgments and alarm conditions are as follows:
Fatigue state: a closed eye is the eye-fatigue characterization; if the continuous duration of eye closure exceeds 3 s (that is, the eye-closure predetermined duration, e.g. 3 s, 5 s, 10 s, etc.), an eye-closure fatigue state is identified. A wide-open mouth is the mouth-fatigue characterization; if the continuous duration of the wide-open mouth exceeds 1 s (that is, the yawn predetermined duration, e.g. 2 s) and this is detected 3 or more times within the yawn setting time (for example, at least 60 s, 100 s, 120 s, etc.), a yawn fatigue state is identified. The eye-closure fatigue state and the yawn fatigue state are collectively called the fatigue state.
Smoking state: whenever a cigarette is detected and the cigarette is close to the mouth, a smoking state is defined. If this state occurs 3, 4, or 5 times (the smoking predetermined count) within the smoking setting time (e.g. 5 s, 10 s, 20 s, etc.), the driver can be judged to be smoking in violation.
Phone-call state: the driver holding a mobile phone against the face is defined as the phone-call state; if this state lasts continuously for, e.g., 5 s or more (that is, the call predetermined duration, e.g. 6 s, 8 s, 15 s, etc.), the driver can be judged to be making a call in violation.
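The judgment rules above can be summarized as thresholds plus a pure decision function. The comparison operators are one reading of the text (the yawn rule is stated here as "3 or more times", while the worked example later fires at the 4th yawn); treat this as a hedged sketch, not the claimed method:

```python
# Example thresholds from the rules above; the text explicitly allows others.
EYE_CLOSURE_S   = 3.0  # continuous eye closure beyond this -> eye-closure fatigue
YAWN_COUNT      = 3    # yawns within the setting window -> yawn fatigue
SMOKING_COUNT   = 3    # cigarette-to-mouth events within the window -> violation
CALL_DURATION_S = 5.0  # phone held to the face at least this long -> violation

def judge(eye_closed_s, yawns_in_window, smokes_in_window, call_s):
    """Return the set of alarm states implied by the aggregated measurements."""
    alarms = set()
    if eye_closed_s > EYE_CLOSURE_S:
        alarms.add("eye-closure fatigue")
    if yawns_in_window >= YAWN_COUNT:     # reading: "detected 3 or more times"
        alarms.add("yawn fatigue")
    if smokes_in_window >= SMOKING_COUNT:
        alarms.add("smoking violation")
    if call_s >= CALL_DURATION_S:
        alarms.add("call violation")
    return alarms

assert judge(5.0, 0, 0, 0.0) == {"eye-closure fatigue"}
assert judge(0.0, 4, 3, 6.0) == {"yawn fatigue", "smoking violation",
                                 "call violation"}
assert judge(1.0, 2, 1, 2.0) == set()
```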
As a detection example of the eye-closure fatigue state: during video-stream detection, when a closed eye is first detected, the current time (e.g. 10:10:10) is recorded and/or the number of the current frame is recorded (i.e., a time or a frame count; likewise below). During subsequent continuous detection, if the state keeps being detected, a counter variable is incremented; if several consecutive frames, or the immediately following frame, no longer show the state, the eyes have opened and the counting is interrupted. The counter value over this span (unit: frames), or the time difference between the recorded start and end times (unit: seconds (s)), is the continuous duration of the closed-eye state. The present invention sets the maximum continuous eye-closure time (that is, the eye-closure predetermined duration) to 3 s; as those skilled in the art know, other times such as 4 s or 5 s may also be set.
As an example, if no closed eye is detected in frames 1-10, both the eye-closure start time and the continuous eye-closure duration are set to 0. If a closed eye is detected at frame 11, the current time, e.g. 10:10:10, is recorded and set as the eye-closure start time. If the eye remains closed through frame 20, the current time is continuously refreshed up to the time of frame 20, e.g. 10:10:11; the continuous eye-closure duration is then 1 s, which does not reach the eye-closure predetermined duration, so the driver cannot yet be determined to be in the eye-closure fatigue state. If an open eye is detected at frame 21, the driver is not in a state of continuous eye closure and the possibility of fatigued driving is excluded; the eye-closure start time and continuous duration are then reset to 0. Alternatively, if the eye remains closed in consecutive images from frame 11 through frame 20 and from frame 21 through frame 60, the recorded current time is continuously refreshed (recording from the time of frame 12 and refreshing through the time of frame 60) up to the time of frame 60, e.g. 10:10:15, and the continuous eye-closure duration is updated to 5 s. At this point, since the continuous eye-closure duration reaches (in this embodiment, exceeds) the eye-closure predetermined duration (e.g. 3 s), the driver is identified as sleeping or dozing; the alarm module is triggered to emit a sound or light alarm, and the relevant images or video are transmitted to an external device (e.g. a console). After the alarm, the eye-closure start time and continuous eye-closure duration are reset to 0, and detection enters the next round.
As a detection example of the yawn fatigue state: during video-stream detection, when a wide-open mouth is first detected, the current time (e.g. 10:10:10) is recorded and/or the number of the current frame is recorded. During subsequent continuous detection, if the state keeps being detected, a counter variable is incremented; if several consecutive frames, or the immediately following frame, no longer show the state, the counting is interrupted. The counter value over this span (unit: frames), or the time difference between the recorded start and end times (unit: seconds (s)), is the continuous duration of the yawn state. The present invention sets the maximum continuous yawn time (that is, the yawn predetermined duration) to 1 s; as those skilled in the art know, other times may also be set.
As an example, if 1-10 frame is not detected mouth and is in a state greatly, by yawn initial time and yawn
Continuous duration is disposed as 0.If detecting that mouth is in a state greatly in the 11st frame, current time, for example, 10 are recorded:
10:10, and yawn initial time is set by the time.If still detecting that mouth is constantly in a shape greatly until the 15th frame
State then records current time, for example, 10:10:10 ' 30, then a length of 0.5s of yawn consecutive hours.At this point, not up to yawn is pre-
Timing is long (the present embodiment 1s), therefore not can determine that driver is in yawn fatigue state.If since the 11st frame until
the 40th frame the mouth is continuously detected to be wide open, the recorded current time is continuously refreshed (recording starts from the time of the 12th frame and is updated through the 40th frame) up to the time of the 40th frame, for example 10:10:12, and the yawn continuous duration is updated to 2 s. At this point, because the yawn continuous duration has reached the yawn predetermined duration (e.g., 1 s in the present embodiment), the yawn count is updated from 0 to 1, indicating that the driver has yawned once; meanwhile, the yawn start time and the yawn continuous duration are both reset to 0. Detection then continues. If the mouth is not detected to be wide open again until the 100th frame, the current time at the 100th frame, for example 10:10:16, is recorded and set as the yawn start time. If from the 100th frame through the 140th frame the mouth is continuously detected to be wide open, the recorded current time is continuously refreshed (from the time of the 101st frame through the 140th frame) up to the time of the 140th frame, for example 10:10:18, and the yawn continuous duration is updated to 2 s. Because the yawn continuous duration again reaches the yawn predetermined duration (e.g., 1 s), the yawn count is updated from 1 to 2, indicating that the driver has yawned twice; the yawn start time and the yawn continuous duration are again reset to 0. And so on. If, within the setting period that begins at the yawn start time of the first yawn (for example 30 s, 40 s, or 50 s), the yawn count (4 times, or 5 or 6 times) exceeds the yawn predetermined number of 3, the driver is determined to be in the yawn fatigue state. The alarm module is then triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the yawn start time, yawn continuous duration, and yawn count are all reset to 0, and the next round of detection begins.
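The per-frame bookkeeping described above can be sketched as a small state machine. This is an illustrative Python sketch, not part of the claimed method; the class and field names are invented, and the thresholds mirror the example values in the text (1 s yawn predetermined duration, 3 yawns within a 30 s setting period). Expiry of the setting period without an alarm is not handled in this sketch.

```python
class YawnTracker:
    """Per-frame yawn bookkeeping (illustrative sketch).

    Counts a yawn whenever the mouth stays wide open for at least
    `min_duration` seconds, and raises the fatigue flag when the count
    reaches `max_yawns` within `window` seconds of the first yawn.
    """

    def __init__(self, min_duration=1.0, max_yawns=3, window=30.0):
        self.min_duration = min_duration
        self.max_yawns = max_yawns
        self.window = window
        self.start = None   # yawn start time of the current open-mouth run
        self.first = None   # yawn start time of the first yawn in the window
        self.count = 0      # yawn count

    def update(self, t, mouth_wide_open):
        """Feed one frame; returns True when the fatigue alarm should fire."""
        if not mouth_wide_open:
            self.start = None          # yawn start time and duration reset
            return False
        if self.start is None:
            self.start = t             # first frame of this wide-open run
            if self.first is None:
                self.first = t         # the setting period starts here
            return False
        if t - self.start >= self.min_duration:
            self.count += 1            # one completed yawn
            self.start = None          # reset for the next yawn
        if self.count >= self.max_yawns and t - self.first <= self.window:
            self.count = 0             # alarm; reset for the next round
            self.first = None
            return True
        return False
```

In use, `update` would be called once per sampled video frame with the frame's timestamp and the network's "mouth wide open" classification.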
As an example of smoking violation state detection: during video-stream detection, when a cigarette is detected and is detected near the mouth for the first time, the smoking count is set to 1. If this state is detected again during subsequent continuous detection, the counter variable is incremented. In the present invention, the smoking maximum number (that is, the smoking predetermined number) is set to 3 times. Those skilled in the art will appreciate that other numbers, such as 4 or 5, may also be set as the smoking predetermined number.
As an example, if no cigarette is detected in frames 1-10, the smoking count is set to 0. If a cigarette is detected near the mouth at the 11th frame and remains there until the 20th frame, when the cigarette moves away from the mouth, the smoking count is incremented by 1. If the cigarette is again detected near the mouth from the 50th frame until the 60th frame, when it moves away, the smoking count is incremented again to 2, and so on. If during the smoking setting period (for example 10 s, 20 s, 60 s, 90 s, or 120 s) the smoking count increases to 3 times (or 4 times, 5 times, etc.), the driver is determined to be in the smoking violation state; the alarm module is triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the smoking count is reset to 0, and the next round of detection begins.
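The smoking counter above counts one event per contiguous "cigarette near the mouth" run, i.e., on each rising edge of that state. A minimal Python sketch, with invented names and the example thresholds (3 events within a 60 s setting period):

```python
class SmokeCounter:
    """Counts discrete smoking events (illustrative sketch).

    The count is incremented on each rising edge of "cigarette near the
    mouth"; the alarm fires when it reaches `max_events` within `window`
    seconds of the first event.
    """

    def __init__(self, max_events=3, window=60.0):
        self.max_events = max_events
        self.window = window
        self.count = 0
        self.first = None   # time of the first event in the window
        self.near = False   # was the cigarette near the mouth last frame?

    def update(self, t, cigarette_near_mouth):
        """Feed one frame; returns True when the violation alarm should fire."""
        if cigarette_near_mouth and not self.near:
            self.count += 1            # a new approach of cigarette to mouth
            if self.first is None:
                self.first = t
        self.near = cigarette_near_mouth
        if self.first is not None and t - self.first > self.window:
            self.count = 0             # setting period elapsed without alarm
            self.first = None
        elif self.count >= self.max_events:
            self.count = 0             # alarm; reset for the next round
            self.first = None
            return True
        return False
```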
As an example of call violation state detection: during video-stream detection, when a phone is detected near the mouth for the first time, the current time (e.g., 10:10:10) is recorded and/or the number of the current frame is recorded. If this state is continuously detected thereafter, the counter variable keeps accumulating; as soon as the state is not detected for several consecutive frames, or even for the very next frame, the count is interrupted. The accumulated variable value (unit: frames), or the time difference between the start and end of recording (unit: seconds (s)), is the continuous duration of the calling state. In the present invention, the maximum continuous calling time (that is, the call predetermined duration) is set to 5 s. As is known to those skilled in the art, other times such as 10 s may also be set as the maximum continuous calling time.
As an example, if no phone is detected near the mouth in frames 1-10, the call start time and the call continuous duration are both set to 0. If a phone is detected near the mouth at the 11th frame, the current time, for example 10:10:10, is recorded and set as the call start time. If the phone is continuously detected near the mouth through the 20th frame, the current time is continuously updated up to the time of the 20th frame, for example 10:10:11; the call continuous duration is then 1 s, which does not reach the call predetermined duration, so the driver cannot yet be determined to be in the call violation state. If the phone is detected leaving the mouth at the 21st frame, the driver is determined not to be in the calling state, the possibility of a violation is excluded, and the call start time and call continuous duration are both reset to 0. Alternatively, if the phone is continuously detected near the mouth from the 11th frame through the 20th frame and in the consecutive images from the 21st frame through the 60th frame, the recorded current time is continuously refreshed (recording from the time of the 12th frame onward) up to the time of the 60th frame, for example 10:10:15, and the call continuous duration is updated to 5 s. At this point, because the call continuous duration reaches the call predetermined duration (e.g., 5 s), the driver is determined to be in the call violation state; the alarm module is triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the call start time and call continuous duration are reset to 0, and the next round of detection begins.
The above embodiments of the present invention are merely exemplary. The selection of video frames may be at fixed or at variable intervals, and is not limited herein. For example, one video frame may be captured every 10 milliseconds or every 0.5 seconds; or the first 100 frames may be captured at 10-millisecond intervals and the following 100 frames at 5-millisecond intervals. For instance, the 1st frame image may be captured at 10:10:10, the 10th frame image at 10:10:11, and the 100th frame image at 10:10:15. In addition, the above examples record times in order to judge durations, counts, and so on; those skilled in the art may equally make such judgments by recording the numbers of the current frames, and this is not a limitation of the present invention.
Fig. 3 is a schematic flow diagram of the method of the present invention for detecting the driving behavior of a driver. In step S301, the eye state, mouth state, cigarette state, and phone state of the driver during driving are obtained. Then, in step S302, it is simultaneously detected whether the eye state meets the eye-closed state, whether the mouth state meets the yawn state, whether the cigarette state meets the smoking state, and whether the phone state meets the calling state, and it is further determined whether the driver is in the eye-closing fatigue state, the yawn fatigue state, the smoking violation state, or the call violation state. Finally, in step S303, if any of the above states meets a fatigue or violation condition, an alarm is issued. Preferably, the captured video or images showing that the driver is in a fatigue or violation state may be sent to an external device, such as a central control room or a security room.
The present invention provides an end-to-end, neural-network-based mode of detecting driver driving behavior, solving problems such as the high cost of traditional detection modes and the complex, redundant computation of traditional image processing: all judgment results are output uniformly by a single network in one pass. Compared with traditional image-detection means, the present invention uses deep learning without any image-enhancement preprocessing, adapts to extreme environments such as uneven illumination, diversified target features, and complex backgrounds, and supports incremental training for specific scenes: in actual use, training samples are calibrated through appropriate periodic manual intervention, improving accuracy in dedicated scenes.
Each convolutional layer may be followed by a BN layer and a LeakyReLU layer, and Residual layers are introduced to solve the degradation problem caused by network depth; the training method and parameters are preferred techniques and parameters demonstrated by a large number of experimental results. Application: convolution operations are applied to driver driving-behavior detection, solving the detection problem directly end to end and simplifying traditional complex, redundant detection means.
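The Conv-BN-LeakyReLU pattern with an identity shortcut can be illustrated in NumPy as follows. This is a minimal sketch of the building block, not the patented network itself: the convolution is passed in as a callable for brevity, and the LeakyReLU slope of 0.1 is an assumed value (the document does not specify it).

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """LeakyReLU: keeps a small slope for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the (batch, height, width) axes."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_block(x, conv1, conv2):
    """Two Conv-BN-LeakyReLU stages plus an identity shortcut.

    The shortcut addition requires the output shape to match the input
    shape, which is why the combination layers preserve the image size.
    """
    y = leaky_relu(batch_norm(conv1(x)))
    y = leaky_relu(batch_norm(conv2(y)))
    return x + y
```

The identity shortcut is what lets gradients bypass the two convolutions, mitigating the degradation problem mentioned above as depth grows.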
The detection method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present invention; the above embodiments are only intended to help understand the method of the present invention and its core concept. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (6)
1. A method for detecting the driving behavior of a driver, characterized by comprising:
an obtaining step: shooting a video of the driver during driving, so as to simultaneously obtain a plurality of frame images for each of the driver's eye state, mouth state, cigarette state, and phone state; and
a detecting step, in which the following steps can be performed simultaneously on each frame image:
detecting whether the eye state meets the eye-closed state: if the eye state is detected not to meet the eye-closed state, the eye-closing start time and the eye-closing continuous duration are both set to 0; if the eye state is detected to meet the eye-closed state for the first time, the current time is set as the eye-closing start time; if a previous frame image and the next frame image continuous with it are both detected to meet the eye-closed state, the duration between the current time of that next frame image and the eye-closing start time is set as the eye-closing continuous duration;
detecting whether the mouth state meets the yawn state: if the mouth state is detected not to meet the yawn state, the yawn start time, the yawn continuous duration, and the yawn count are all set to 0; if the mouth state is detected to meet the yawn state for the first time, the current time is set as the yawn start time; if a previous frame image and the next frame image continuous with it are both detected to meet the yawn state, the duration between the current time of that next frame image and the yawn start time is set as the yawn continuous duration;
detecting whether the cigarette state meets the smoking state: if the cigarette state is detected not to meet the smoking state, the smoking count is initialized to 0; if the cigarette state is detected to meet the smoking state, the smoking count is incremented by 1, and after the smoking setting period the smoking count is reset to 0;
detecting whether the phone state meets the calling state: if the phone state is detected not to meet the calling state, the call start time and the call continuous duration are both set to 0; if the phone state is detected to meet the calling state for the first time, the current time is recorded as the call start time; if a previous frame image and the next frame image continuous with it are both detected to meet the calling state, the duration between the current time of that next frame image and the call start time is set as the call continuous duration;
a judgment step, in which the following eye-closing fatigue state judgment, yawn fatigue state judgment, smoking violation state judgment, and call violation state judgment can be performed simultaneously:
in the eye-closing fatigue state judgment, judging whether the eye-closing continuous duration reaches the eye-closing predetermined duration; if so, the driver is in the eye-closing fatigue state;
in the yawn fatigue state judgment, judging whether the yawn continuous duration reaches the yawn predetermined duration; if it does, the yawn count is incremented by 1, and if the yawn count reaches the yawn predetermined number during the yawn setting period, the driver is in the yawn fatigue state;
in the smoking violation state judgment, judging whether the smoking count reaches the smoking predetermined number during the smoking setting period; if so, the driver is in the smoking violation state;
in the call violation state judgment, judging whether the call continuous duration reaches the call predetermined duration; if so, the driver is in the call violation state; and
an alarming step: when the driver is in at least one of the eye-closing fatigue state, the yawn fatigue state, the smoking violation state, and the call violation state, emitting a sound or light and sending the video or images to an external device,
wherein the detecting step uses an algorithm based on a convolutional neural network, the convolutional neural network comprising at least one convolutional layer, at least one Residual layer, one global pooling layer, and one fully connected layer, and each convolutional layer is provided with one BN layer and one LeakyReLU layer.
2. The method according to claim 1, characterized in that the convolutional neural network comprises the following layers connected in sequence:
a 1st convolutional layer, into which the image is directly input,
a 2nd convolutional layer,
one 3rd combination layer, comprising a 31st convolutional layer, a 32nd convolutional layer, and a Residual layer,
a 4th convolutional layer,
two 5th combination layers, each comprising a 51st convolutional layer, a 52nd convolutional layer, and a Residual layer,
a 6th convolutional layer,
four 7th combination layers, each comprising a 71st convolutional layer, a 72nd convolutional layer, and a Residual layer,
an 8th convolutional layer,
four 9th combination layers, each comprising a 91st convolutional layer, a 92nd convolutional layer, and a Residual layer,
a 10th convolutional layer,
two 11th combination layers, each comprising a 111th convolutional layer, a 112th convolutional layer, and a Residual layer,
a global pooling layer, and
a fully connected layer.
3. The method according to claim 1, characterized in that the convolutional neural network comprises the following layers connected in sequence:
a 1st convolutional layer, into which the image is directly input,
a 2nd convolutional layer,
one 3rd combination layer, comprising a 31st convolutional layer, a 32nd convolutional layer, and a Residual layer,
a 4th convolutional layer,
two 5th combination layers, each comprising a 51st convolutional layer, a 52nd convolutional layer, and a Residual layer,
a 6th convolutional layer,
an 8th convolutional layer,
four 9th combination layers, each comprising a 91st convolutional layer, a 92nd convolutional layer, and a Residual layer,
a 10th convolutional layer,
two 11th combination layers, each comprising a 111th convolutional layer, a 112th convolutional layer, and a Residual layer,
a 12th convolutional layer,
one 13th combination layer, comprising a 131st convolutional layer, a 132nd convolutional layer, and a Residual layer,
a global pooling layer, and
a fully connected layer.
4. The method according to claim 2 or 3, characterized in that:
the 1st convolutional layer has a kernel size of 3*3 and 32 kernels, and outputs an image size of 256*256;
the 2nd convolutional layer has a kernel size of 3*3/2 and 64 kernels, and outputs an image size of 128*128;
the 31st convolutional layer has a kernel size of 1*1 and 32 kernels, the 32nd convolutional layer has a kernel size of 3*3 and 64 kernels, and the 3rd combination layer outputs an image size of 128*128;
the 4th convolutional layer has a kernel size of 3*3/2 and 128 kernels, and outputs an image size of 64*64;
the 51st convolutional layer has a kernel size of 1*1 and 64 kernels, the 52nd convolutional layer has a kernel size of 3*3 and 128 kernels, and the 5th combination layer outputs an image size of 64*64;
the 6th convolutional layer has a kernel size of 3*3/2 and 256 kernels, and outputs an image size of 32*32;
the 71st convolutional layer has a kernel size of 1*1 and 128 kernels, the 72nd convolutional layer has a kernel size of 3*3 and 256 kernels, and the 7th combination layer outputs an image size of 32*32;
the 8th convolutional layer has a kernel size of 3*3/2 and 512 kernels, and outputs an image size of 16*16;
the 91st convolutional layer has a kernel size of 1*1 and 256 kernels, the 92nd convolutional layer has a kernel size of 3*3 and 512 kernels, and the 9th combination layer outputs an image size of 16*16;
the 10th convolutional layer has a kernel size of 3*3/2 and 1024 kernels, and outputs an image size of 8*8;
the 111th convolutional layer has a kernel size of 1*1 and 512 kernels, the 112th convolutional layer has a kernel size of 3*3 and 1024 kernels, and the 11th combination layer outputs an image size of 8*8;
the 12th convolutional layer has a kernel size of 3*3/2 and 1024 kernels, and outputs an image size of 8*8;
the 131st convolutional layer has a kernel size of 1*1 and 512 kernels, the 132nd convolutional layer has a kernel size of 3*3 and 1024 kernels, and the 13th combination layer outputs an image size of 8*8.
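The output image sizes recited in claim 4 follow from the strides: a "3*3/2" convolutional layer (a 3*3 kernel with stride 2) halves the spatial size, while the combination layers preserve it. A small illustrative Python check of the downsampling chain for the 256*256 input (not part of the claim; the layer names are labels for the calculation):

```python
def halve(size):
    """A same-padded 3*3 convolution with stride 2 ('3*3/2') halves the size."""
    return size // 2

size = 256                       # output of the 1st convolutional layer
sizes = {"conv1": size}
for layer in ["conv2", "conv4", "conv6", "conv8", "conv10"]:
    size = halve(size)           # each stride-2 layer downsamples by 2
    sizes[layer] = size
# The residual combination layers (3rd, 5th, 7th, 9th, 11th) keep the
# spatial size, so the 11th combination layer also outputs 8*8 maps.
```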
5. The method according to claim 1, characterized in that the eye-closing fatigue state refers to a state in which the eyes are closed, the yawn fatigue state refers to the mouth being wide open, the smoking violation state refers to a cigarette being near the mouth, and the call violation state refers to a mobile phone being held near the face;
the eye-closing predetermined duration is set to at least 3 seconds,
the yawn predetermined duration is set to at least 1 second,
the yawn predetermined number is set to at least 3 times,
the yawn setting period is at least 30 seconds,
the smoking predetermined number is set to at least 3 times,
the smoking setting period is at least 10 seconds, and
the call predetermined duration is set to at least 5 seconds.
6. The method according to claim 1, characterized in that the training method and parameters of the convolutional neural network are constructed as follows:
the convolution kernels of the convolutional layers and the fully connected layer are initialized with Gaussian random numbers with mean 0 and standard deviation 0.1, and the bias terms are initialized with uniform random numbers on the interval [0, 1];
in the batch normalization layers, the momentum is set to 0.95 and the constant is set to 0.01;
the weights are trained with the AdaDelta gradient descent algorithm, with the batch size set to 64;
the data are divided into a training set, a validation set, and a test set according to a predetermined ratio; after 20 generations of training, each generation is tested against the validation set, and the generation with the best result is saved as the training model and used for the test-set evaluation, whose result serves as the result of the entire learning;
the total number of training generations is set to at least 100; during training, the ratio of positive to negative samples in the training set is 10-15:1, and in each generation 10%-30% of the negative samples, shuffled, are trained together with all of the positive samples, until all negative samples have been trained, completing one training cycle.
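The initialization recited in claim 6 can be sketched in NumPy as follows. This is an illustrative sketch only: the tensor shapes and the random seed are arbitrary choices for the example, not values from the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape):
    """Gaussian init, mean 0 and standard deviation 0.1
    (convolution kernels and fully connected weights)."""
    return rng.normal(loc=0.0, scale=0.1, size=shape)

def init_bias(shape):
    """Uniform init on the interval [0, 1] (bias terms)."""
    return rng.uniform(low=0.0, high=1.0, size=shape)

w = init_weights((32, 3, 3, 3))   # 32 kernels of 3*3 over 3 input channels
b = init_bias((32,))
```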
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561203.0A CN110309760A (en) | 2019-06-26 | 2019-06-26 | The method that the driving behavior of driver is detected |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110309760A true CN110309760A (en) | 2019-10-08 |
Family
ID=68076806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910561203.0A Pending CN110309760A (en) | 2019-06-26 | 2019-06-26 | The method that the driving behavior of driver is detected |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309760A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110837815A (en) * | 2019-11-15 | 2020-02-25 | 济宁学院 | Driver state monitoring method based on convolutional neural network |
CN111784973A (en) * | 2020-07-30 | 2020-10-16 | 广州敏视数码科技有限公司 | MDVR equipment integration fatigue detection method of fleet management platform |
CN112947137A (en) * | 2021-01-20 | 2021-06-11 | 神华新能源有限责任公司 | Hydrogen energy automobile control method, hydrogen energy automobile and Internet of things system |
CN113392800A (en) * | 2021-06-30 | 2021-09-14 | 浙江商汤科技开发有限公司 | Behavior detection method and device, computer equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809445A (en) * | 2015-05-07 | 2015-07-29 | 吉林大学 | Fatigue driving detection method based on eye and mouth states |
CN107657236A (en) * | 2017-09-29 | 2018-02-02 | 厦门知晓物联技术服务有限公司 | Vehicle security drive method for early warning and vehicle-mounted early warning system |
CN108764034A (en) * | 2018-04-18 | 2018-11-06 | 浙江零跑科技有限公司 | A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera |
CN108960065A (en) * | 2018-06-01 | 2018-12-07 | 浙江零跑科技有限公司 | A kind of driving behavior detection method of view-based access control model |
US20190019068A1 (en) * | 2017-07-12 | 2019-01-17 | Futurewei Technologies, Inc. | Integrated system for detection of driver condition |
CN109552332A (en) * | 2018-12-06 | 2019-04-02 | 电子科技大学 | A kind of automatic driving mode intelligent switching system based on driver status monitoring |
Non-Patent Citations (4)
Title |
---|
2014WZY: "Deep learning initialization methods" (in Chinese), https://blog.csdn.net/u014696921/article/details/53819512, 22 December 2016, page 5 |
JOSEPH REDMON AND ALI FARHADI: "YOLOv3: An Incremental Improvement", arXiv:1804.0276v1, 8 April 2018, pages 1-6, XP080868709 |
YINGYU JI ET AL.: "Fatigue State Detection Based on Multi-Index Fusion and State Recognition Network", IEEE ACCESS, 30 May 2019 |
TIAN XUAN ET AL.: "Deep-Learning-Based Image Semantic Segmentation Technology" (in Chinese), 31 May 2019, page 85 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |