CN110309760A - Method for detecting the driving behavior of a driver - Google Patents
Method for detecting the driving behavior of a driver
- Publication number
- CN110309760A (application CN201910561203.0A)
- Authority
- CN
- China
- Prior art keywords
- state
- convolutional layer
- layer
- yawn
- convolutional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The present invention provides a method for detecting the driving behavior of a driver, comprising: an acquisition step; a detection step that can simultaneously detect the eye state, mouth state, cigarette state, and phone state in each frame of an image; a judgment step that can simultaneously perform eye-closure fatigue judgment, yawn fatigue judgment, smoking-violation judgment, and phone-call-violation judgment; and an alarm step that emits a sound or light and sends video or images to an external device. The detection step uses an algorithm based on a convolutional neural network comprising at least one convolutional layer, at least one residual layer, a global pooling layer, and a fully connected layer, wherein a BN layer and a LeakyReLU layer are arranged after each convolutional layer.
Description
Technical field
The present invention relates to the field of visual inspection, and more particularly to a method for detecting the driving behavior of a driver.
Background art
Driving-behavior detection is generally divided into fatigue-driving detection and violation-action detection. Early fatigue-driving detection approached the problem mainly from a medical angle: physiological characteristics were measured with medical devices to study the causes and triggers of fatigue and drowsiness, in search of methods to monitor or avoid fatigued driving. One monitoring method uses an intelligent alarm system with infrared signal processing to judge whether the driver is dozing or falling asleep. A typical fatigue-detection scheme, for example, uses special infrared LED devices: exploiting the physiological fact that the human retina reflects infrared light of different wavelengths differently, it captures two slightly different eye images at the same instant using infrared sources of 850 nm and 950 nm wavelength, subtracts the two images, and thereby extracts the position and size of the pupil. The PERCLOS rule is then applied to the degree of eye closure to judge the level of fatigue. Later, computer vision replaced the infrared-LED scheme: a CCD camera mounted in the vehicle monitors the driver's eye state (including eyelids, pupil changes, and blink frequency), a fast and simple algorithm determines the precise location of the driver's eyes in the face image, and multiple frames are tracked to monitor whether the driver is driving fatigued.
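The PERCLOS rule mentioned above scores drowsiness as the fraction of time the eyes are (nearly) closed over a measurement window. The patent gives no formula, so the 80% closure threshold below is an assumption taken from the common P80 variant of PERCLOS; this is only an illustrative sketch:

```python
def perclos(closure_ratios, threshold=0.8):
    """Fraction of frames whose eyelid-closure ratio is at least `threshold`
    (the assumed P80 variant: eyes count as closed when >= 80% shut)."""
    if not closure_ratios:
        return 0.0
    closed = sum(1 for c in closure_ratios if c >= threshold)
    return closed / len(closure_ratios)

# Over a 10-frame window, 3 frames with the eyes >= 80% closed -> PERCLOS 0.3.
window = [0.1, 0.2, 0.9, 0.95, 0.85, 0.3, 0.1, 0.0, 0.2, 0.1]
assert abs(perclos(window) - 0.3) < 1e-9
```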
A violation action is an action or behavior by the driver during driving that affects vehicle safety, such as making phone calls, playing with a mobile phone, smoking, or chatting. Violation-action detection is generally solved by means of computer vision. The prior art adopts a different algorithm for each action to be detected. Because these algorithms differ greatly, deploying all of them on a mobile terminal is too bulky and redundant, which hinders their application there.
Surveying the state of the art at home and abroad, achieving real-time, effective, and simple detection of driver fatigue is a current focus and hot spot of research, but no highly mature product has yet been put to practical use on the market. The main difficulties are as follows. First, as described above, product costs are relatively high, running speeds are relatively slow (too much redundancy), and accuracy is insufficient, so commercialization cannot proceed well. Second, methods such as EEG, the "sobering belt", and monitoring glasses are effective, but being contact devices they severely restrict the driver's freedom of movement. Finally, they are also affected differently by individual and environmental differences (gender, glasses, lighting conditions, road conditions, etc.).
Summary of the invention
Addressing the high cost of existing sensor-based schemes and the redundancy and complexity of traditional image-recognition schemes, the present invention proposes an end-to-end computer-vision detection scheme based on a neural network that solves fatigue-driving, smoking, phone-call, and similar action detection in an integrated way. The invention uses a convolutional neural network to extract features and detect the target body state at each instant (eyes open/closed, mouth open/closed, making a phone call, smoking), comprehensively judges the target's fatigue and driving state over a period of time, and issues an alarm signal when the driver drives fatigued or performs a violation action.
To remedy the deficiencies of existing driving-behavior detection methods, the present invention provides a method for detecting the driving behavior of a driver, comprising: an acquisition step; a detection step that can simultaneously detect the eye state, mouth state, cigarette state, and phone state in each frame of an image; a judgment step that can simultaneously perform eye-closure fatigue judgment, yawn fatigue judgment, smoking-violation judgment, and phone-call-violation judgment; and an alarm step that emits a sound or light and sends video or images to an external device. The detection step uses an algorithm based on a convolutional neural network comprising at least one convolutional layer, at least one residual layer, a global pooling layer, and a fully connected layer, wherein a BN layer and a LeakyReLU layer are arranged after each convolutional layer.
The present invention provides an end-to-end, neural-network-based mode of detecting driver behavior, solving problems such as the high cost of traditional detection modes and the redundant, complex computation of traditional image processing: a single network uniformly outputs all judgment results in one pass. Compared with traditional image-detection means, the deep-learning approach of the present invention needs no image-enhancement preprocessing, adapts to extreme environments such as uneven illumination, diversified target features, and complex backgrounds, and supports incremental training for specific scenes: in actual use, training samples are calibrated through appropriate periodic manual intervention, improving accuracy in the dedicated scene.
Other features of the invention will become apparent from the following description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of a detection process according to an aspect of the invention.
Fig. 2 is a structural schematic diagram of a convolutional neural network according to an aspect of the invention.
Fig. 3 is a detection flowchart of fatigue or violation states according to an aspect of the invention.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the drawings. It should be noted that, unless otherwise stated, the relative arrangement of components, numerical expressions, and values described in these embodiments do not limit the scope of the invention. It should also be noted that the following embodiments are not intended to limit the scope of the invention as recited in the claims, and not all combinations of features described in these embodiments are essential to the invention.
To solve the above problems, the present invention provides a method for detecting the driving behavior of a driver.
<Acquisition of images>
A camera or image-capturing device (for example, a video camera or still camera) is installed in the cab, preferably facing the driver. Throughout the driving process, the device continuously records the driver's behavior, forming a video of multiple frames. The frames can be sent to the image-processing unit described below for feature extraction and computation. Alternatively, only frames meeting a predetermined condition (for example, the 5th, 10th, 15th, 20th frame, and so on) may be sent to the image-processing unit for feature extraction and computation. Of course, the capturing device may also be set up to record the behavior of other personnel in the cab.
<Processing of images>
As shown in Fig. 1, the method, apparatus, system, and computer-executable storage medium of the invention extract features from the video or images through a convolutional neural network and classify them, detecting the position and state of each body part in the image, as well as the presence and respective positions of a cigarette and a phone. The open/closed state of the eyes and mouth, the smoking state, the phone-call (talking) state, and so on are then obtained directly from the convolutional neural network. The continuous duration or count of each state can be calculated. If continuous eye closure or yawning is detected, the driver can be determined to be in a fatigue state, and the system can output a corresponding warning signal; if smoking or a phone call is detected, the driver can be determined to be in a violation state, and a corresponding warning message can likewise be output, or output directly as an alarm.
<<Overview of the convolutional neural network architecture>>
First, as an overview, the convolutional neural network of the invention is composed of a series of 1*1 and 3*3 convolutional layers, each followed by a BN layer and a LeakyReLU layer. To counteract the performance degradation caused by increasing network depth, residual layers are also introduced. Finally, a global pooling layer and a fully connected layer are added at the end of the network, and softmax is used for classification. The convolution stride ("strides") defaults to (1, 1), and "padding" (whether boundary pixels are dropped during convolution) defaults to "same", i.e., a padding side length of 1 filled with zeros (before the convolution, a ring of zeros is added around the image, and the convolution is then performed). Preferably, the network of the invention always uses "same" padding.
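The sizing rule just described ("same" zero padding, stride (1, 1) by default) can be sketched as simple arithmetic. The `conv_out_size` helper and its "valid" branch are illustrative assumptions for contrast, not part of the patent:

```python
import math

def conv_out_size(size, stride, padding="same", kernel=3):
    """Spatial output side length of a square convolution.
    With 'same' zero padding only the stride shrinks the feature map;
    with 'valid' the boundary pixels are dropped."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

assert conv_out_size(256, 1) == 256   # 3*3 stride-1 'same' conv keeps the size
assert conv_out_size(256, 2) == 128   # a 3*3/2 downsampling layer halves it
assert conv_out_size(256, 1, padding="valid") == 254
```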
<<First embodiment of the convolutional neural network structure>>
Hereinafter, the specific structure of the convolutional neural network of the present invention will be described in detail with reference to Fig. 2.
First, the video or series of images captured by the camera is input to the 1st convolutional layer ("Convolutional") for preliminary feature extraction. Here, an image of size 256*256 is used as an example; those skilled in the art will understand that images of other sizes may also be input. Of course, the subsequent convolutional structure will change accordingly with the input image size, for example by adding or removing convolutional layers, increasing or decreasing the size and number of convolution kernels, increasing or decreasing the number of residual layers, or changing the positions of the residual layers in the network. As an example, the kernel size of the 1st convolutional layer is configured as 3*3, and the number of kernels is set to 32. After this layer, the output image size is 256*256.
Next, the output of the 1st convolutional layer enters the 2nd convolutional layer, which downsamples the image to reduce its size. Its kernel size is configured as 3*3/2 (stride 2), and the number of kernels is set to 64. The 2nd convolutional layer reduces the image to 128*128 and outputs it.
The output of the 2nd convolutional layer then enters the 3rd combination layer, which extracts features and increases network depth. The 3rd combination layer comprises the 31st convolutional layer, the 32nd convolutional layer, and a residual layer. The kernel size of the 31st convolutional layer is configured as 1*1 with 32 kernels; that of the 32nd is configured as 3*3 with 64 kernels. After this layer, the output is still a 128*128 image.
The output of the 3rd combination layer then enters the 4th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 128. After this layer, the output image size is 64*64.
The output of the 4th convolutional layer then passes sequentially through 2 (2x) 5th combination layers, which extract features and increase network depth. Each 5th combination layer comprises the 51st convolutional layer, the 52nd convolutional layer, and a residual layer. The kernel size of the 51st convolutional layer is configured as 1*1 with 64 kernels; that of the 52nd is configured as 3*3 with 128 kernels. After the 2 combination layers, the output is still a 64*64 image.
Next, the output of the 5th combination layers enters the 6th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 256. After this layer, the output image size is 32*32.
The output of the 6th convolutional layer then passes sequentially through 4 7th combination layers, which extract features and increase network depth. Each 7th combination layer comprises the 71st convolutional layer, the 72nd convolutional layer, and a residual layer. The kernel size of the 71st convolutional layer is configured as 1*1 with 128 kernels; that of the 72nd is configured as 3*3 with 256 kernels. After these layers, the output is still a 32*32 image.
The output of the 7th combination layers then enters the 8th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 512. After this layer, the output image size is 16*16.
Subsequently, the output of the 8th convolutional layer passes sequentially through 4 9th combination layers, which extract features and increase network depth. Each 9th combination layer comprises the 91st convolutional layer, the 92nd convolutional layer, and a residual layer. The kernel size of the 91st convolutional layer is configured as 1*1 with 256 kernels; that of the 92nd is configured as 3*3 with 512 kernels. After these layers, the output is still a 16*16 image.
The output of the 9th combination layers then enters the 10th convolutional layer for downsampling. Its kernel size is configured as 3*3/2, and the number of kernels is set to 1024. After this layer, the output image size is 8*8.
The output of the 10th convolutional layer then passes sequentially through 2 11th combination layers, which extract features and increase network depth. Each 11th combination layer comprises the 111th convolutional layer, the 112th convolutional layer, and a residual layer. The kernel size of the 111th convolutional layer is configured as 1*1 with 512 kernels; that of the 112th is configured as 3*3 with 1024 kernels. After these layers, the output image size is 8*8.
Finally, the output of the 11th combination layers passes sequentially through the global pooling layer and the fully connected layer for classification. The global pooling layer performs global pooling on the resulting 8*8 feature map, obtaining one feature point. In the fully connected part, a two-layer neural network with input dimension 256 and output dimension 2 processes the feature point: the first layer passes through a TanH activation function, and the second layer is followed by a softmax function.
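Assuming, as described above, that every 3*3/2 layer halves the feature map and every combination layer preserves it, the size chain of the first embodiment (256 → 128 → 64 → 32 → 16 → 8) can be checked mechanically. The short layer names below are shorthand for the layers just described, not identifiers from the patent:

```python
def first_embodiment_sizes(input_size=256):
    """Trace the feature-map side length through the layers of Fig. 2.
    Stride-2 layers halve the map; combination layers (1*1 + 3*3 + residual)
    preserve it, since 'same' padding is always used."""
    plan = [
        ("conv1 3*3",    1),
        ("conv2 3*3/2",  2),
        ("comb3",        1),
        ("conv4 3*3/2",  2),
        ("comb5 x2",     1),
        ("conv6 3*3/2",  2),
        ("comb7 x4",     1),
        ("conv8 3*3/2",  2),
        ("comb9 x4",     1),
        ("conv10 3*3/2", 2),
        ("comb11 x2",    1),
    ]
    size, trace = input_size, []
    for name, stride in plan:
        size //= stride
        trace.append((name, size))
    return trace

sizes = dict(first_embodiment_sizes())
assert sizes["conv2 3*3/2"] == 128
assert sizes["conv8 3*3/2"] == 16
assert sizes["comb11 x2"] == 8    # the 8*8 map fed to global pooling
```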
<<Second embodiment of the convolutional neural network structure>>
To reduce the parameters and computation of the network, one may appropriately reduce the network's parameters on one hand, and remove part of the network layers on the other, without significantly affecting the accuracy of the neural network. For example, a second embodiment can be obtained by slightly deforming the first. Parameter settings and arrangements of the convolutional and combination layers that are identical to the first embodiment are not described again here. The differences are two. First, the second embodiment has no 7th combination layers; that is, the output of the 6th convolutional layer enters the 8th convolutional layer directly. Second, after the 11th combination layers, the second embodiment adds a 12th convolutional layer and a 13th combination layer.
As an example, the kernel size of the 12th convolutional layer is configured as 3*3/2, and the number of kernels is set to 1024. After this layer, the output image size is 8*8. As an example, the 13th combination layer comprises the 131st convolutional layer, the 132nd convolutional layer, and a residual layer. The kernel size of the 131st convolutional layer is configured as 1*1 with 512 kernels; that of the 132nd is configured as 3*3 with 1024 kernels. After this layer, the output is still an 8*8 image. The image then enters the global pooling layer.
<<Training method and parameters of the convolutional neural network>>
The convolution kernels in the convolutional layers and the fully connected layer are initialized with Gaussian random numbers of mean 0 and standard deviation 0.1; the bias terms are initialized with uniform random numbers on the interval [0, 1]. In the batch-normalization layers, the momentum is set to 0.95 and the constant to 0.01. The weights are trained with the AdaDelta gradient-descent algorithm, with a batch size of 64.
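A minimal sketch of the initialization rule above, using only Python's standard library. The fixed seed and the list-of-lists weight layout are assumptions for illustration, not part of the patent:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible (an assumption)

def init_conv_layer(kernel_size, in_ch, out_ch):
    """One layer's parameters as described above:
    weights ~ N(mean 0, std 0.1), biases ~ U[0, 1]."""
    weights = [[[[random.gauss(0.0, 0.1) for _ in range(kernel_size)]
                 for _ in range(kernel_size)]
                for _ in range(in_ch)]
               for _ in range(out_ch)]
    biases = [random.uniform(0.0, 1.0) for _ in range(out_ch)]
    return weights, biases

# The 1st convolutional layer of the first embodiment: 32 kernels of 3*3.
w, b = init_conv_layer(3, 3, 32)
assert len(w) == 32 and len(w[0]) == 3 and len(w[0][0]) == 3
assert all(0.0 <= x <= 1.0 for x in b)
```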
Training, validation, and test sets are arranged in a certain proportion. After 20 generations of training, each generation is evaluated on the validation set; the generation with the best result is saved as the trained model and used for the test-set evaluation, whose result is the result of the whole learning. The total training cycle over the full data is set to 100 generations. During training, the positive-to-negative sample ratio in the training set is 10:1; in each generation, 20% of the negative samples, shuffled, are trained together with all positive samples, until all negative samples have been trained through one complete cycle.
The above experimental methods and parameters were obtained through many experiments on the basis of scientific research. They are well suited to the driver environment of the present invention, and are especially effective when detecting the eye state, mouth state, smoking state, and phone-call state.
<Judgment of fatigue or violation states>
The video or images pass through the convolutional neural network for feature extraction, and each image is divided in advance into an 11*11 grid of cells; centered on each cell, 5 random candidate boxes are generated, and each candidate box is classified in the last fully connected layer, thereby obtaining the classification result and position of each candidate box. In network training, the following states are defined: the position and open/closed state of the driver's eyes and mouth in the image, whether the driver holds a mobile phone against the face, the position of the phone, and the position of the cigarette. The state judgments and alarm conditions are as follows:
Fatigue state: a closed eye is the eye-fatigue characterization; if the continuous duration of eye closure exceeds 3 s (that is, the eye-closure predetermined duration, e.g. 3 s, 5 s, 10 s, etc.), an eye-closure fatigue state is identified. A wide-open mouth is the mouth-fatigue characterization; if the continuous duration of the wide-open mouth exceeds 1 s (that is, the yawn predetermined duration, e.g. 2 s) and this is detected 3 or more times within the yawn setting time (for example, at least 60 s, 100 s, 120 s, etc.), a yawn fatigue state is identified. The eye-closure fatigue state and the yawn fatigue state are collectively called the fatigue state.
Smoking state: whenever a cigarette is detected and the cigarette is close to the mouth, a smoking state is defined. If this state occurs 3, 4, or 5 times (the smoking predetermined count) within the smoking setting time (e.g. 5 s, 10 s, 20 s, etc.), the driver can be judged to be smoking in violation.
Phone-call state: the driver holding a mobile phone against the face is defined as the phone-call state; if this state lasts continuously for, e.g., 5 s or more (that is, the call predetermined duration, e.g. 6 s, 8 s, 15 s, etc.), the driver can be judged to be making a call in violation.
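The judgment rules above can be summarized as thresholds plus a pure decision function. The comparison operators are one reading of the text (the yawn rule is stated here as "3 or more times", while the worked example later fires at the 4th yawn); treat this as a hedged sketch, not the claimed method:

```python
# Example thresholds from the rules above; the text explicitly allows others.
EYE_CLOSURE_S   = 3.0  # continuous eye closure beyond this -> eye-closure fatigue
YAWN_COUNT      = 3    # yawns within the setting window -> yawn fatigue
SMOKING_COUNT   = 3    # cigarette-to-mouth events within the window -> violation
CALL_DURATION_S = 5.0  # phone held to the face at least this long -> violation

def judge(eye_closed_s, yawns_in_window, smokes_in_window, call_s):
    """Return the set of alarm states implied by the aggregated measurements."""
    alarms = set()
    if eye_closed_s > EYE_CLOSURE_S:
        alarms.add("eye-closure fatigue")
    if yawns_in_window >= YAWN_COUNT:     # reading: "detected 3 or more times"
        alarms.add("yawn fatigue")
    if smokes_in_window >= SMOKING_COUNT:
        alarms.add("smoking violation")
    if call_s >= CALL_DURATION_S:
        alarms.add("call violation")
    return alarms

assert judge(5.0, 0, 0, 0.0) == {"eye-closure fatigue"}
assert judge(0.0, 4, 3, 6.0) == {"yawn fatigue", "smoking violation",
                                 "call violation"}
assert judge(1.0, 2, 1, 2.0) == set()
```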
As a detection example of the eye-closure fatigue state: during video-stream detection, when a closed eye is first detected, the current time (e.g. 10:10:10) is recorded and/or the number of the current frame is recorded (i.e., a time or a frame count; likewise below). During subsequent continuous detection, if the state keeps being detected, a counter variable is incremented; if several consecutive frames, or the immediately following frame, no longer show the state, the eyes have opened and the counting is interrupted. The counter value over this span (unit: frames), or the time difference between the recorded start and end times (unit: seconds (s)), is the continuous duration of the closed-eye state. The present invention sets the maximum continuous eye-closure time (that is, the eye-closure predetermined duration) to 3 s; as those skilled in the art know, other times such as 4 s or 5 s may also be set.
As an example, if no closed eye is detected in frames 1-10, both the eye-closure start time and the continuous eye-closure duration are set to 0. If a closed eye is detected at frame 11, the current time, e.g. 10:10:10, is recorded and set as the eye-closure start time. If the eye remains closed through frame 20, the current time is continuously refreshed up to the time of frame 20, e.g. 10:10:11; the continuous eye-closure duration is then 1 s, which does not reach the eye-closure predetermined duration, so the driver cannot yet be determined to be in the eye-closure fatigue state. If an open eye is detected at frame 21, the driver is not in a state of continuous eye closure and the possibility of fatigued driving is excluded; the eye-closure start time and continuous duration are then reset to 0. Alternatively, if the eye remains closed in consecutive images from frame 11 through frame 20 and from frame 21 through frame 60, the recorded current time is continuously refreshed (recording from the time of frame 12 and refreshing through the time of frame 60) up to the time of frame 60, e.g. 10:10:15, and the continuous eye-closure duration is updated to 5 s. At this point, since the continuous eye-closure duration reaches (in this embodiment, exceeds) the eye-closure predetermined duration (e.g. 3 s), the driver is identified as sleeping or dozing; the alarm module is triggered to emit a sound or light alarm, and the relevant images or video are transmitted to an external device (e.g. a console). After the alarm, the eye-closure start time and continuous eye-closure duration are reset to 0, and detection enters the next round.
As a detection example of the yawn fatigue state: during video-stream detection, when a wide-open mouth is first detected, the current time (e.g. 10:10:10) is recorded and/or the number of the current frame is recorded. During subsequent continuous detection, if the state keeps being detected, a counter variable is incremented; if several consecutive frames, or the immediately following frame, no longer show the state, the counting is interrupted. The counter value over this span (unit: frames), or the time difference between the recorded start and end times (unit: seconds (s)), is the continuous duration of the yawn state. The present invention sets the maximum continuous yawn time (that is, the yawn predetermined duration) to 1 s; as those skilled in the art know, other times may also be set.
As an example, if 1-10 frame is not detected mouth and is in a state greatly, by yawn initial time and yawn
Continuous duration is disposed as 0.If detecting that mouth is in a state greatly in the 11st frame, current time, for example, 10 are recorded:
10:10, and yawn initial time is set by the time.If still detecting that mouth is constantly in a shape greatly until the 15th frame
State then records current time, for example, 10:10:10 ' 30, then a length of 0.5s of yawn consecutive hours.At this point, not up to yawn is pre-
Timing is long (the present embodiment 1s), therefore not can determine that driver is in yawn fatigue state.If since the 11st frame until
the 40th frame the mouth is continuously detected to be wide open, the recorded current time is continuously refreshed (recording starts from the time of the 12th frame and is updated through the 40th frame) up to the time of the 40th frame, for example 10:10:12, and the yawn continuous duration is updated to 2 s. At this point, because the yawn continuous duration has reached the yawn predetermined duration (e.g., 1 s in the present embodiment), the yawn count is updated from 0 to 1, indicating that the driver has yawned once; meanwhile, the yawn start time and the yawn continuous duration are both reset to 0. Detection then continues. If the mouth is not detected to be wide open again until the 100th frame, the current time at the 100th frame, for example 10:10:16, is recorded and set as the yawn start time. If from the 100th frame through the 140th frame the mouth is continuously detected to be wide open, the recorded current time is continuously refreshed (from the time of the 101st frame through the 140th frame) up to the time of the 140th frame, for example 10:10:18, and the yawn continuous duration is updated to 2 s. Because the yawn continuous duration again reaches the yawn predetermined duration (e.g., 1 s), the yawn count is updated from 1 to 2, indicating that the driver has yawned twice; the yawn start time and the yawn continuous duration are again reset to 0. And so on. If, within the setting period that begins at the yawn start time of the first yawn (for example 30 s, 40 s, or 50 s), the yawn count (4 times, or 5 or 6 times) exceeds the yawn predetermined number of 3, the driver is determined to be in the yawn fatigue state. The alarm module is then triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the yawn start time, yawn continuous duration, and yawn count are all reset to 0, and the next round of detection begins.
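The per-frame bookkeeping described above can be sketched as a small state machine. This is an illustrative Python sketch, not part of the claimed method; the class and field names are invented, and the thresholds mirror the example values in the text (1 s yawn predetermined duration, 3 yawns within a 30 s setting period). Expiry of the setting period without an alarm is not handled in this sketch.

```python
class YawnTracker:
    """Per-frame yawn bookkeeping (illustrative sketch).

    Counts a yawn whenever the mouth stays wide open for at least
    `min_duration` seconds, and raises the fatigue flag when the count
    reaches `max_yawns` within `window` seconds of the first yawn.
    """

    def __init__(self, min_duration=1.0, max_yawns=3, window=30.0):
        self.min_duration = min_duration
        self.max_yawns = max_yawns
        self.window = window
        self.start = None   # yawn start time of the current open-mouth run
        self.first = None   # yawn start time of the first yawn in the window
        self.count = 0      # yawn count

    def update(self, t, mouth_wide_open):
        """Feed one frame; returns True when the fatigue alarm should fire."""
        if not mouth_wide_open:
            self.start = None          # yawn start time and duration reset
            return False
        if self.start is None:
            self.start = t             # first frame of this wide-open run
            if self.first is None:
                self.first = t         # the setting period starts here
            return False
        if t - self.start >= self.min_duration:
            self.count += 1            # one completed yawn
            self.start = None          # reset for the next yawn
        if self.count >= self.max_yawns and t - self.first <= self.window:
            self.count = 0             # alarm; reset for the next round
            self.first = None
            return True
        return False
```

In use, `update` would be called once per sampled video frame with the frame's timestamp and the network's "mouth wide open" classification.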
As an example of smoking violation state detection: during video-stream detection, when a cigarette is detected and is detected near the mouth for the first time, the smoking count is set to 1. If this state is detected again during subsequent continuous detection, the counter variable is incremented. In the present invention, the smoking maximum number (that is, the smoking predetermined number) is set to 3 times. Those skilled in the art will appreciate that other numbers, such as 4 or 5, may also be set as the smoking predetermined number.
As an example, if no cigarette is detected in frames 1-10, the smoking count is set to 0. If a cigarette is detected near the mouth at the 11th frame and remains there until the 20th frame, when the cigarette moves away from the mouth, the smoking count is incremented by 1. If the cigarette is again detected near the mouth from the 50th frame until the 60th frame, when it moves away, the smoking count is incremented again to 2, and so on. If during the smoking setting period (for example 10 s, 20 s, 60 s, 90 s, or 120 s) the smoking count increases to 3 times (or 4 times, 5 times, etc.), the driver is determined to be in the smoking violation state; the alarm module is triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the smoking count is reset to 0, and the next round of detection begins.
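The smoking counter above counts one event per contiguous "cigarette near the mouth" run, i.e., on each rising edge of that state. A minimal Python sketch, with invented names and the example thresholds (3 events within a 60 s setting period):

```python
class SmokeCounter:
    """Counts discrete smoking events (illustrative sketch).

    The count is incremented on each rising edge of "cigarette near the
    mouth"; the alarm fires when it reaches `max_events` within `window`
    seconds of the first event.
    """

    def __init__(self, max_events=3, window=60.0):
        self.max_events = max_events
        self.window = window
        self.count = 0
        self.first = None   # time of the first event in the window
        self.near = False   # was the cigarette near the mouth last frame?

    def update(self, t, cigarette_near_mouth):
        """Feed one frame; returns True when the violation alarm should fire."""
        if cigarette_near_mouth and not self.near:
            self.count += 1            # a new approach of cigarette to mouth
            if self.first is None:
                self.first = t
        self.near = cigarette_near_mouth
        if self.first is not None and t - self.first > self.window:
            self.count = 0             # setting period elapsed without alarm
            self.first = None
        elif self.count >= self.max_events:
            self.count = 0             # alarm; reset for the next round
            self.first = None
            return True
        return False
```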
As an example of call violation state detection: during video-stream detection, when a phone is detected near the mouth for the first time, the current time (e.g., 10:10:10) is recorded and/or the number of the current frame is recorded. If this state is continuously detected thereafter, the counter variable keeps accumulating; as soon as the state is not detected for several consecutive frames, or even for the very next frame, the count is interrupted. The accumulated variable value (unit: frames), or the time difference between the start and end of recording (unit: seconds (s)), is the continuous duration of the calling state. In the present invention, the maximum continuous calling time (that is, the call predetermined duration) is set to 5 s. As is known to those skilled in the art, other times such as 10 s may also be set as the maximum continuous calling time.
As an example, if no phone is detected near the mouth in frames 1-10, the call start time and the call continuous duration are both set to 0. If a phone is detected near the mouth at the 11th frame, the current time, for example 10:10:10, is recorded and set as the call start time. If the phone is continuously detected near the mouth through the 20th frame, the current time is continuously updated up to the time of the 20th frame, for example 10:10:11; the call continuous duration is then 1 s, which does not reach the call predetermined duration, so the driver cannot yet be determined to be in the call violation state. If the phone is detected leaving the mouth at the 21st frame, the driver is determined not to be in the calling state, the possibility of a violation is excluded, and the call start time and call continuous duration are both reset to 0. Alternatively, if the phone is continuously detected near the mouth from the 11th frame through the 20th frame and in the consecutive images from the 21st frame through the 60th frame, the recorded current time is continuously refreshed (recording from the time of the 12th frame onward) up to the time of the 60th frame, for example 10:10:15, and the call continuous duration is updated to 5 s. At this point, because the call continuous duration reaches the call predetermined duration (e.g., 5 s), the driver is determined to be in the call violation state; the alarm module is triggered to emit a sound or light alarm, and the associated image or video is transmitted to an external device (such as a console). After the alarm, the call start time and call continuous duration are reset to 0, and the next round of detection begins.
The above embodiments of the present invention are merely exemplary. The selection of video frames may be at fixed or at variable intervals, and is not limited herein. For example, one video frame may be captured every 10 milliseconds or every 0.5 seconds; or the first 100 frames may be captured at 10-millisecond intervals and the following 100 frames at 5-millisecond intervals. For instance, the 1st frame image may be captured at 10:10:10, the 10th frame image at 10:10:11, and the 100th frame image at 10:10:15. In addition, the above examples record times in order to judge durations, counts, and so on; those skilled in the art may equally make such judgments by recording the numbers of the current frames, and this is not a limitation of the present invention.
Fig. 3 is a schematic flow diagram of the method of the present invention for detecting the driving behavior of a driver. In step S301, the eye state, mouth state, cigarette state, and phone state of the driver during driving are obtained. Then, in step S302, it is simultaneously detected whether the eye state meets the eye-closed state, whether the mouth state meets the yawn state, whether the cigarette state meets the smoking state, and whether the phone state meets the calling state, and it is further determined whether the driver is in the eye-closing fatigue state, the yawn fatigue state, the smoking violation state, or the call violation state. Finally, in step S303, if any of the above states meets a fatigue or violation condition, an alarm is issued. Preferably, the captured video or images showing that the driver is in a fatigue or violation state may be sent to an external device, such as a central control room or a security room.
The present invention provides an end-to-end, neural-network-based mode of detecting driver driving behavior, solving problems such as the high cost of traditional detection modes and the complex, redundant computation of traditional image processing: all judgment results are output uniformly by a single network in one pass. Compared with traditional image-detection means, the present invention uses deep learning without any image-enhancement preprocessing, adapts to extreme environments such as uneven illumination, diversified target features, and complex backgrounds, and supports incremental training for specific scenes: in actual use, training samples are calibrated through appropriate periodic manual intervention, improving accuracy in dedicated scenes.
Each convolutional layer may be followed by a BN layer and a LeakyReLU layer, and Residual layers are introduced to solve the degradation problem caused by network depth; the training method and parameters are preferred techniques and parameters demonstrated by a large number of experimental results. Application: convolution operations are applied to driver driving-behavior detection, solving the detection problem directly end to end and simplifying traditional complex, redundant detection means.
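The Conv-BN-LeakyReLU pattern with an identity shortcut can be illustrated in NumPy as follows. This is a minimal sketch of the building block, not the patented network itself: the convolution is passed in as a callable for brevity, and the LeakyReLU slope of 0.1 is an assumed value (the document does not specify it).

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """LeakyReLU: keeps a small slope for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the (batch, height, width) axes."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_block(x, conv1, conv2):
    """Two Conv-BN-LeakyReLU stages plus an identity shortcut.

    The shortcut addition requires the output shape to match the input
    shape, which is why the combination layers preserve the image size.
    """
    y = leaky_relu(batch_norm(conv1(x)))
    y = leaky_relu(batch_norm(conv2(y)))
    return x + y
```

The identity shortcut is what lets gradients bypass the two convolutions, mitigating the degradation problem mentioned above as depth grows.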
The detection method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present invention; the above embodiments are only intended to help understand the method of the present invention and its core concept. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (6)
1. A method for detecting the driving behavior of a driver, characterized by comprising:
an obtaining step: shooting a video of the driver during driving, so as to simultaneously obtain a plurality of frame images for each of the driver's eye state, mouth state, cigarette state, and phone state; and
a detecting step, in which the following steps can be performed simultaneously on each frame image:
detecting whether the eye state meets the eye-closed state: if the eye state is detected not to meet the eye-closed state, the eye-closing start time and the eye-closing continuous duration are both set to 0; if the eye state is detected to meet the eye-closed state for the first time, the current time is set as the eye-closing start time; if a previous frame image and the next frame image continuous with it are both detected to meet the eye-closed state, the duration between the current time of that next frame image and the eye-closing start time is set as the eye-closing continuous duration;
detecting whether the mouth state meets the yawn state: if the mouth state is detected not to meet the yawn state, the yawn start time, the yawn continuous duration, and the yawn count are all set to 0; if the mouth state is detected to meet the yawn state for the first time, the current time is set as the yawn start time; if a previous frame image and the next frame image continuous with it are both detected to meet the yawn state, the duration between the current time of that next frame image and the yawn start time is set as the yawn continuous duration;
detecting whether the cigarette state meets the smoking state: if the cigarette state is detected not to meet the smoking state, the smoking count is initialized to 0; if the cigarette state is detected to meet the smoking state, the smoking count is incremented by 1, and after the smoking setting period the smoking count is reset to 0;
detecting whether the phone state meets the calling state: if the phone state is detected not to meet the calling state, the call start time and the call continuous duration are both set to 0; if the phone state is detected to meet the calling state for the first time, the current time is recorded as the call start time; if a previous frame image and the next frame image continuous with it are both detected to meet the calling state, the duration between the current time of that next frame image and the call start time is set as the call continuous duration;
a judgment step, in which the following eye-closing fatigue state judgment, yawn fatigue state judgment, smoking violation state judgment, and call violation state judgment can be performed simultaneously:
in the eye-closing fatigue state judgment, judging whether the eye-closing continuous duration reaches the eye-closing predetermined duration; if so, the driver is in the eye-closing fatigue state;
in the yawn fatigue state judgment, judging whether the yawn continuous duration reaches the yawn predetermined duration; if it does, the yawn count is incremented by 1, and if the yawn count reaches the yawn predetermined number during the yawn setting period, the driver is in the yawn fatigue state;
in the smoking violation state judgment, judging whether the smoking count reaches the smoking predetermined number during the smoking setting period; if so, the driver is in the smoking violation state;
in the call violation state judgment, judging whether the call continuous duration reaches the call predetermined duration; if so, the driver is in the call violation state; and
an alarming step: when the driver is in at least one of the eye-closing fatigue state, the yawn fatigue state, the smoking violation state, and the call violation state, emitting a sound or light and sending the video or images to an external device,
wherein the detecting step uses an algorithm based on a convolutional neural network, the convolutional neural network comprising at least one convolutional layer, at least one Residual layer, one global pooling layer, and one fully connected layer, and each convolutional layer is provided with one BN layer and one LeakyReLU layer.
2. The method according to claim 1, characterized in that the convolutional neural network comprises the following layers connected in sequence:
a 1st convolutional layer, into which the image is directly input,
a 2nd convolutional layer,
one 3rd combination layer, comprising a 31st convolutional layer, a 32nd convolutional layer, and a Residual layer,
a 4th convolutional layer,
two 5th combination layers, each comprising a 51st convolutional layer, a 52nd convolutional layer, and a Residual layer,
a 6th convolutional layer,
four 7th combination layers, each comprising a 71st convolutional layer, a 72nd convolutional layer, and a Residual layer,
an 8th convolutional layer,
four 9th combination layers, each comprising a 91st convolutional layer, a 92nd convolutional layer, and a Residual layer,
a 10th convolutional layer,
two 11th combination layers, each comprising a 111th convolutional layer, a 112th convolutional layer, and a Residual layer,
a global pooling layer, and
a fully connected layer.
3. The method according to claim 1, characterized in that the convolutional neural network comprises the following layers connected in sequence:
a 1st convolutional layer, into which the image is directly input,
a 2nd convolutional layer,
one 3rd combination layer, comprising a 31st convolutional layer, a 32nd convolutional layer, and a Residual layer,
a 4th convolutional layer,
two 5th combination layers, each comprising a 51st convolutional layer, a 52nd convolutional layer, and a Residual layer,
a 6th convolutional layer,
an 8th convolutional layer,
four 9th combination layers, each comprising a 91st convolutional layer, a 92nd convolutional layer, and a Residual layer,
a 10th convolutional layer,
two 11th combination layers, each comprising a 111th convolutional layer, a 112th convolutional layer, and a Residual layer,
a 12th convolutional layer,
one 13th combination layer, comprising a 131st convolutional layer, a 132nd convolutional layer, and a Residual layer,
a global pooling layer, and
a fully connected layer.
4. The method according to claim 2 or 3, characterized in that:
the 1st convolutional layer has a kernel size of 3*3 and 32 kernels, and outputs an image size of 256*256;
the 2nd convolutional layer has a kernel size of 3*3/2 and 64 kernels, and outputs an image size of 128*128;
the 31st convolutional layer has a kernel size of 1*1 and 32 kernels, the 32nd convolutional layer has a kernel size of 3*3 and 64 kernels, and the 3rd combination layer outputs an image size of 128*128;
the 4th convolutional layer has a kernel size of 3*3/2 and 128 kernels, and outputs an image size of 64*64;
the 51st convolutional layer has a kernel size of 1*1 and 64 kernels, the 52nd convolutional layer has a kernel size of 3*3 and 128 kernels, and the 5th combination layer outputs an image size of 64*64;
the 6th convolutional layer has a kernel size of 3*3/2 and 256 kernels, and outputs an image size of 32*32;
the 71st convolutional layer has a kernel size of 1*1 and 128 kernels, the 72nd convolutional layer has a kernel size of 3*3 and 256 kernels, and the 7th combination layer outputs an image size of 32*32;
the 8th convolutional layer has a kernel size of 3*3/2 and 512 kernels, and outputs an image size of 16*16;
the 91st convolutional layer has a kernel size of 1*1 and 256 kernels, the 92nd convolutional layer has a kernel size of 3*3 and 512 kernels, and the 9th combination layer outputs an image size of 16*16;
the 10th convolutional layer has a kernel size of 3*3/2 and 1024 kernels, and outputs an image size of 8*8;
the 111th convolutional layer has a kernel size of 1*1 and 512 kernels, the 112th convolutional layer has a kernel size of 3*3 and 1024 kernels, and the 11th combination layer outputs an image size of 8*8;
the 12th convolutional layer has a kernel size of 3*3/2 and 1024 kernels, and outputs an image size of 8*8;
the 131st convolutional layer has a kernel size of 1*1 and 512 kernels, the 132nd convolutional layer has a kernel size of 3*3 and 1024 kernels, and the 13th combination layer outputs an image size of 8*8.
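The output image sizes recited in claim 4 follow from the strides: a "3*3/2" convolutional layer (a 3*3 kernel with stride 2) halves the spatial size, while the combination layers preserve it. A small illustrative Python check of the downsampling chain for the 256*256 input (not part of the claim; the layer names are labels for the calculation):

```python
def halve(size):
    """A same-padded 3*3 convolution with stride 2 ('3*3/2') halves the size."""
    return size // 2

size = 256                       # output of the 1st convolutional layer
sizes = {"conv1": size}
for layer in ["conv2", "conv4", "conv6", "conv8", "conv10"]:
    size = halve(size)           # each stride-2 layer downsamples by 2
    sizes[layer] = size
# The residual combination layers (3rd, 5th, 7th, 9th, 11th) keep the
# spatial size, so the 11th combination layer also outputs 8*8 maps.
```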
5. The method according to claim 1, characterized in that the eye-closing fatigue state refers to a state in which the eyes are closed, the yawn fatigue state refers to the mouth being wide open, the smoking violation state refers to a cigarette being near the mouth, and the call violation state refers to a mobile phone being held near the face;
the eye-closing predetermined duration is set to at least 3 seconds,
the yawn predetermined duration is set to at least 1 second,
the yawn predetermined number is set to at least 3 times,
the yawn setting period is at least 30 seconds,
the smoking predetermined number is set to at least 3 times,
the smoking setting period is at least 10 seconds, and
the call predetermined duration is set to at least 5 seconds.
6. The method according to claim 1, characterized in that the training method and parameters of the convolutional neural network are constructed as follows:
the convolution kernels of the convolutional layers and the fully connected layer are initialized with Gaussian random numbers with mean 0 and standard deviation 0.1, and the bias terms are initialized with uniform random numbers on the interval [0, 1];
in the batch normalization layers, the momentum is set to 0.95 and the constant is set to 0.01;
the weights are trained with the AdaDelta gradient descent algorithm, with the batch size set to 64;
the data are divided into a training set, a validation set, and a test set according to a predetermined ratio; after 20 generations of training, each generation is tested against the validation set, and the generation with the best result is saved as the training model and used for the test-set evaluation, whose result serves as the result of the entire learning;
the total number of training generations is set to at least 100; during training, the ratio of positive to negative samples in the training set is 10-15:1, and in each generation 10%-30% of the negative samples, shuffled, are trained together with all of the positive samples, until all negative samples have been trained, completing one training cycle.
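The initialization recited in claim 6 can be sketched in NumPy as follows. This is an illustrative sketch only: the tensor shapes and the random seed are arbitrary choices for the example, not values from the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape):
    """Gaussian init, mean 0 and standard deviation 0.1
    (convolution kernels and fully connected weights)."""
    return rng.normal(loc=0.0, scale=0.1, size=shape)

def init_bias(shape):
    """Uniform init on the interval [0, 1] (bias terms)."""
    return rng.uniform(low=0.0, high=1.0, size=shape)

w = init_weights((32, 3, 3, 3))   # 32 kernels of 3*3 over 3 input channels
b = init_bias((32,))
```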
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561203.0A CN110309760A (en) | 2019-06-26 | 2019-06-26 | The method that the driving behavior of driver is detected |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110309760A true CN110309760A (en) | 2019-10-08 |
Family
ID=68076806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910561203.0A Pending CN110309760A (en) | 2019-06-26 | 2019-06-26 | The method that the driving behavior of driver is detected |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309760A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110837815A (en) * | 2019-11-15 | 2020-02-25 | 济宁学院 | Driver state monitoring method based on convolutional neural network |
CN111784973A (en) * | 2020-07-30 | 2020-10-16 | 广州敏视数码科技有限公司 | MDVR equipment integration fatigue detection method of fleet management platform |
CN112947137A (en) * | 2021-01-20 | 2021-06-11 | 神华新能源有限责任公司 | Hydrogen energy automobile control method, hydrogen energy automobile and Internet of things system |
CN113392800A (en) * | 2021-06-30 | 2021-09-14 | 浙江商汤科技开发有限公司 | Behavior detection method and device, computer equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809445A (en) * | 2015-05-07 | 2015-07-29 | 吉林大学 | Fatigue driving detection method based on eye and mouth states |
CN107657236A (en) * | 2017-09-29 | 2018-02-02 | 厦门知晓物联技术服务有限公司 | Vehicle security drive method for early warning and vehicle-mounted early warning system |
CN108764034A (en) * | 2018-04-18 | 2018-11-06 | 浙江零跑科技有限公司 | A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera |
CN108960065A (en) * | 2018-06-01 | 2018-12-07 | 浙江零跑科技有限公司 | A kind of driving behavior detection method of view-based access control model |
US20190019068A1 (en) * | 2017-07-12 | 2019-01-17 | Futurewei Technologies, Inc. | Integrated system for detection of driver condition |
CN109552332A (en) * | 2018-12-06 | 2019-04-02 | 电子科技大学 | A kind of automatic driving mode intelligent switching system based on driver status monitoring |
Non-Patent Citations (4)
Title |
---|
2014WZY: "Deep learning initialization methods" (in Chinese), https://blog.csdn.net/u014696921/article/details/53819512, 22 December 2016, page 5 |
JOSEPH REDMON AND ALI FARHADI: "YOLOv3: An Incremental Improvement", arXiv:1804.0276v1, 8 April 2018, pages 1-6, XP080868709 |
YINGYU JI ET AL.: "Fatigue State Detection Based on Multi-Index Fusion and State Recognition Network", IEEE ACCESS, 30 May 2019 |
TIAN XUAN ET AL.: "Deep-Learning-Based Image Semantic Segmentation Technology" (in Chinese), 31 May 2019, page 85 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |