CN110502995A - Driver yawning detection method based on subtle facial action recognition - Google Patents
Driver yawning detection method based on subtle facial action recognition
- Publication number
- CN110502995A (application CN201910658690.2A)
- Authority
- CN
- China
- Prior art keywords
- frame
- key
- driver
- frames
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Abstract
The present invention provides a driver yawning detection method based on subtle facial action recognition, comprising the following steps: Step 1, pre-process the driver driving video captured by the on-board camera, performing face detection and segmentation, image size normalization, and denoising; Step 2, propose a key-frame extraction algorithm that combines picture-histogram similarity-threshold screening with outlier-similarity picture rejection to extract the key frames of a subtle action sequence; Step 3, from the selected key frames, establish a 3D deep learning network with low temporal sampling (3D-LTS) to detect various yawning behaviors. The present invention extracts the key frames of subtle actions with the key-frame extraction algorithm, then extracts spatio-temporal features with the established 3D-LTS network to detect various subtle facial actions. It outperforms existing methods in recognition rate and overall performance, can effectively distinguish yawning from other subtle facial actions, and effectively reduces the false detection rate of driver yawning behavior.
Description
Technical field
The present invention relates to the technical field of computer vision, and specifically to a driver yawning detection method based on subtle facial action recognition.
Background technique
Intelligent driving, which includes providing pre-warning signals and monitoring and assisting vehicle control, has been a hot research topic for improving road safety in recent years. Every year, thousands of people die or suffer major injuries because drivers fall asleep at the wheel. Driver fatigue is a serious threat to road safety. A survey by the National Highway Traffic Safety Administration showed that more than one third of respondents acknowledged having experienced fatigue while driving. Among drivers involved in fatigue-related accidents, 10% acknowledged that such an accident had happened to them in the past month or the past year. Researchers have found that driver fatigue causes 22% of traffic accidents. Without any warning, driving fatigue makes a collision or near-collision six times more likely than normal driving. Therefore, research on methods for identifying driver fatigue is extremely important for improving road safety. Over the past few decades, many driver fatigue detection methods have been proposed to help drivers drive safely and to improve traffic safety. The behavioral characteristics of a fatigued driver include blinking, nodding, eye closing, and yawning. Among these behaviors, yawning is one of the principal manifestations of fatigue, and researchers have therefore devoted extensive study to yawning detection. Compared with the actions targeted by conventional recognition, these facial actions can be regarded as subtle facial actions.
Although many researchers have proposed different methods to detect yawning, these methods still face enormous challenges. Because of the complicated facial actions and expressions of drivers in real driving environments, existing methods find it difficult to detect yawning accurately and stably; in particular, when some facial actions and expressions produce mouth deformations similar to yawning, false detections occur easily. Therefore, in the face of the new features and new challenges of the driving environment, how to detect driver yawning behavior quickly and accurately is a problem that needs to be studied.
Summary of the invention
The object of the present invention is to solve the problem that existing driver yawning detection algorithms cannot effectively distinguish certain special yawning behaviors from yawn-like behaviors, such as yawning while singing (a special yawning behavior) and shouting (a yawn-like behavior). To this end, a driver yawning detection method based on subtle facial action recognition is proposed.
To achieve the above object, the present invention provides the following technical scheme: a driver yawning detection method based on subtle facial action recognition, comprising the following steps:
Step 1, pre-process the driver driving video captured by the on-board camera, performing face detection and segmentation, image size normalization, and denoising;
Step 2, propose a key-frame extraction algorithm that combines picture-histogram similarity-threshold screening with outlier-similarity picture rejection to extract the key frames of a subtle action sequence;
Step 3, from the selected key frames, establish a 3D deep learning network with low temporal sampling (3D-LTS) to detect various yawning behaviors.
Further, pre-processing the driver driving video captured by the on-board camera includes: using the Viola-Jones face detection algorithm to detect the driver's face region, segmenting out the driver's facial area, and denoising with a fast median filtering method.
Further, the key-frame extraction algorithm extracts a series of key frames K = {K_i, i = 1, …, M} from a series of original video frames F = {F_j, j = 1, …, N}, where M denotes the number of key frames selected from the original frames and N denotes the number of original frames. The key-frame extraction algorithm includes two selection stages:
In the first selection stage, the RGB color histogram of each video frame is calculated; then the similarity between the color histograms γ_j and γ_{j+1} of two successive frames is calculated using the Euclidean distance:
S_j = sqrt( Σ_{k=1}^{n} (γ_j(k) − γ_{j+1}(k))² )   (1)
where 1 ≤ j ≤ N−1 and n is the dimension of the picture color histogram.
The similarity threshold T_s is calculated by formula (2):
T_s = μ_s   (2)
where μ_s = Mean(S) and S is the set of the S_j. When S_j > T_s, F_j and F_{j+1} are considered to have low similarity, and F_j is added to the candidate key-frame queue.
In the second selection stage, those candidate key frames with outlier features are rejected to obtain the final key frames. Two image similarity measures are used: the Euclidean distance (ED) and the root-mean-square error (RMSE). The median absolute deviation (MAD) is used to detect frames with outlier features; MAD is calculated according to formula (3):
MAD = median(|X_i − median(X)|)   (3)
Two consecutive candidate key frames are denoted K_{i,i+1}. For all K_{i,i+1}, their RMSE and ED values are calculated; for each calculated RMSE(K_{i,i+1}) and ED(K_{i,i+1}), the MAD values are computed and denoted α = MAD(RMSE) and β = MAD(ED). RMSE(K_{i,i+1}) is calculated by formula (4) and ED(K_{i,i+1}) by formula (5). When RMSE(K_{i,i+1}) is less than α and ED(K_{i,i+1}) is less than β, K_i is considered a candidate key frame with outlier features, and K_i is removed from the candidate key frames.
RMSE(K_{i,i+1}) = sqrt( (1/n) Σ_{p=1}^{n} (K_i(p) − K_{i+1}(p))² )   (4)
where n represents the size of K_i.
ED(K_{i,i+1}) = sqrt( Σ_{k=1}^{m} (H_i(k) − H_{i+1}(k))² )   (5)
where m represents the picture color histogram dimension of the candidate key frames, and H_i denotes the color histogram of K_i.
Further, the 3D-LTS network is used for spatio-temporal feature extraction and subtle action recognition. The 3D-LTS network takes 8 non-overlapping frames as input and extracts spatio-temporal features from the successive frames with four 3D convolutional layers. All convolution filters are 3 × 3 × 3 with a stride of 1 × 1 × 1, and all pooling layers use max pooling. The kernel size of the first and second pooling layers is 1 × 2 × 2, the filter counts of the four convolutional layers are 32, 64, 128, and 256 respectively, and the kernel of the third pooling layer is 2 × 4 × 4. The convolutional layers are followed by fully connected layers used to map the features; a fully connected layer with 1024 outputs is used to integrate the feature distribution.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention proposes a driver yawning detection method based on subtle facial action recognition. First, a two-stage key-frame extraction algorithm is proposed; the algorithm has the advantages of fast computation and the ability to effectively extract the key frames of subtle actions from the original frame sequence. Second, the invention also provides a subtle action recognition network based on 3D convolution, used to extract spatio-temporal features and detect various subtle facial actions. The proposed method outperforms existing methods in recognition rate and overall performance, can effectively distinguish yawning from other subtle facial actions, and effectively reduces the false detection rate of driver yawning behavior.
Detailed description of the invention
Fig. 1 is the framework diagram of the driver yawning detection method based on subtle facial action recognition of the present invention;
Fig. 2 is a demonstration of the key-frame extraction results of the present invention;
Fig. 3 is a comparison of 2D convolution and 3D convolution;
Fig. 4 is the structure diagram of the 3D-LTS network proposed by the present invention;
Fig. 5 shows some frame samples from the YawDDR data set;
Fig. 6 shows image sequences of two actions in the YawDDR data set: (a) talking, (b) yawning;
Fig. 7 shows the positions of the high-definition camera and the driver in the present invention;
Fig. 8 shows image sequences of three facial actions in the MFAY data set of the present invention: (a) yawning, (b) singing, (c) shouting;
Fig. 9 shows the number of video sequences in the MFAY data set of the present invention;
Fig. 10 shows the detection results of the method of the present invention and four state-of-the-art methods on the MFAY data set.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the technical solution of the present invention and do not limit the present invention.
The present invention provides a technical solution: a driver yawning detection method based on subtle facial action recognition, whose general framework is shown in Fig. 1, comprising the following steps:
Step 1, pre-process the driver driving video captured by the on-board camera, performing face detection and segmentation, image size normalization, and denoising;
Step 2, propose a key-frame extraction algorithm that combines picture-histogram similarity-threshold screening with outlier-similarity picture rejection to extract the key frames of a subtle action sequence;
Step 3, from the selected key frames, establish a 3D deep learning network with low temporal sampling (3D-LTS) to detect various yawning behaviors.
Video pre-processing is an essential step in this work. Since driver yawning detection concerns fatigued driving, it must satisfy a real-time requirement, so faster and better video processing techniques must be applied to the video recorded by the on-board camera. The video is first split into frames. The frames contain much redundant information such as background, which is of no use for the subsequent classification and instead causes great interference. Since the goal is to classify the driver's facial actions, the region of interest should be the driver's facial region. The Viola-Jones face detection algorithm is used to detect the driver's face region. The Viola-Jones algorithm is fast, stable, and accurate, and is one of the most widely used face detection algorithms, although its detection performance degrades for faces rotated beyond a certain angle. In the present invention the camera directly faces the driver's face, which ensures a 100% face recall rate. After the face region is detected, the sizes of the successive frames are unified to 200 × 200.
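The size-normalization step above can be sketched as follows. This is a minimal nearest-neighbour resize in plain NumPy, shown only for illustration; the interpolation method and all names here are assumptions, not taken from the patent.

```python
import numpy as np

def resize_nn(img, size=(200, 200)):
    """Nearest-neighbour resize of an H x W x C frame (illustrative only)."""
    h, w = img.shape[:2]
    ys = np.arange(size[0]) * h // size[0]   # source row for each output row
    xs = np.arange(size[1]) * w // size[1]   # source column for each output column
    return img[ys[:, None], xs]

# A dummy segmented face region of arbitrary size is unified to 200 x 200.
face = np.zeros((120, 160, 3), dtype=np.uint8)
normalized = resize_nn(face)
print(normalized.shape)  # (200, 200, 3)
```

In practice a library resize with smoother interpolation would be used; the point is only that every segmented face region reaches the network at a fixed 200 × 200 size.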
A real driving environment involves vibration caused by the motion of the vehicle, so the on-board camera produces noise and interference during shooting. To reduce this noise-induced interference as much as possible, denoising is performed with a fast median filtering algorithm, which is a GPU-accelerated version of the median filter. The median filter effectively removes shot noise and salt-and-pepper noise, and the noise generated by vibration mostly belongs to these two kinds, so the fast median filtering algorithm achieves the best denoising effect and minimizes the interference caused by noise.
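A minimal sketch of the median-filter denoising step, in plain NumPy. The patent uses a GPU-accelerated fast median filter; the straightforward sliding-window version below is only meant to show why a median filter removes salt-and-pepper specks, and all names are illustrative assumptions.

```python
import numpy as np

def median_filter(img, k=3):
    """Naive k x k median filter over a single-channel image (illustrative)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# A smooth gradient with one impulse speck, like vibration-induced noise.
img = np.tile(np.arange(8, dtype=float), (8, 1))
noisy = img.copy()
noisy[2, 3] = 255.0                 # impulse ("salt") noise
denoised = median_filter(noisy)
print(denoised[2, 3])               # the speck is replaced by the local median
```

Because an impulse is almost never the median of its neighbourhood, the speck is overwritten by a plausible local value while edges of the gradient are preserved, which is exactly the property the patent relies on for vibration noise.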
In the pre-processing stage, the video is split into a frame sequence at 30 frames per second. Because the information difference between consecutive frames of the original sequence is very small, there are a large number of redundant frames, and these redundant frames reduce the accuracy of action classification, especially for subtle actions with small motion amplitudes. To solve these problems, the present invention proposes an effective real-time key-frame extraction algorithm based on picture similarity-threshold screening and outlier-similarity rejection. Fig. 2 shows a key-frame sequence selected by the algorithm.
The proposed key-frame extraction algorithm extracts a series of key frames K = {K_i, i = 1, …, M} from a series of original video frames F = {F_j, j = 1, …, N}, where M denotes the number of key frames selected from the original frames and N denotes the number of original frames. The present invention combines threshold-based histogram similarity filtering with outlier detection. Picture histograms have the advantage of low computational cost, and compared with local features, global features such as image distances and histograms can effectively reduce false alarms in classification.
The proposed key-frame extraction algorithm includes two selection stages:
In the first selection stage, the RGB color histogram of each video frame is calculated; then the similarity between the color histograms γ_j and γ_{j+1} of two successive frames is calculated using the Euclidean distance:
S_j = sqrt( Σ_{k=1}^{n} (γ_j(k) − γ_{j+1}(k))² )   (1)
where 1 ≤ j ≤ N−1 and n is the dimension of the picture color histogram. Through this calculation we obtain a set S containing the similarities of F_j and F_{j+1}. A similarity threshold T_s must then be determined to select the key frames; this threshold should represent the average level of inter-frame similarity. Two threshold calculation methods were considered: half of the sum of the maximum and minimum similarities, and the average similarity. Key frames were selected with both thresholds from a self-collected data set and from the processed YawDD benchmark data set; the network was trained on the self-collected data set and tested on the processed YawDD data set. The results are shown in Table 1, where s denotes the similarity set and YT is the abbreviation for yawning while talking. From the results it can be seen that using the average similarity as the metric threshold allows the yawning detection method to achieve the best overall results. Half of the sum of the maximum and minimum frame similarities fuses only two extreme similarities and cannot represent the average similarity of these facial actions, whereas the average similarity as threshold selects the most representative key frames.
Table 1: Experimental results under the two thresholds (unit: %)
The similarity threshold T_s is calculated by formula (2):
T_s = μ_s   (2)
where μ_s = Mean(S) and S is the set of the S_j. When S_j > T_s, F_j and F_{j+1} are considered to have low similarity, and F_j is added to the candidate key-frame queue.
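The first selection stage, formulas (1) and (2), can be sketched in NumPy as follows. The histogram bin count and the toy frames are illustrative assumptions, since the patent does not specify the histogram dimension n.

```python
import numpy as np

def rgb_histogram(frame, bins=8):
    """Concatenated per-channel colour histogram, normalised to sum to 1."""
    h = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    h = np.concatenate(h).astype(float)
    return h / h.sum()

def select_candidates(frames):
    hists = [rgb_histogram(f) for f in frames]
    # formula (1): Euclidean distance between successive colour histograms
    S = np.array([np.linalg.norm(hists[j] - hists[j + 1])
                  for j in range(len(frames) - 1)])
    Ts = S.mean()                     # formula (2): Ts is the mean of the set S
    # frames that differ strongly from their successor become candidates
    return [frames[j] for j in range(len(S)) if S[j] > Ts]

# Toy sequence: two identical dark frames, then two identical bright frames;
# only the frame at the dark/bright transition exceeds the mean similarity.
dark = np.zeros((4, 4, 3), dtype=np.uint8)
bright = np.full((4, 4, 3), 255, dtype=np.uint8)
candidates = select_candidates([dark, dark, bright, bright])
print(len(candidates))  # 1
```

The redundant duplicate frames yield S_j = 0 and are dropped, while the single transition frame survives as a candidate, matching the intent of the first stage.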
In the second selection stage, those candidate key frames with outlier features are rejected to obtain the final key frames. Two image similarity measures are used: the Euclidean distance (ED) and the root-mean-square error (RMSE). The median absolute deviation (MAD) is used to detect frames with outlier features; MAD is calculated according to formula (3):
MAD = median(|X_i − median(X)|)   (3)
Two consecutive candidate key frames are denoted K_{i,i+1}. For all K_{i,i+1}, their RMSE and ED values are calculated; for each calculated RMSE(K_{i,i+1}) and ED(K_{i,i+1}), the MAD values are computed and denoted α = MAD(RMSE) and β = MAD(ED). RMSE(K_{i,i+1}) is calculated by formula (4) and ED(K_{i,i+1}) by formula (5). When RMSE(K_{i,i+1}) is less than α and ED(K_{i,i+1}) is less than β, K_i is considered a candidate key frame with outlier features, and K_i is removed from the candidate key frames.
RMSE(K_{i,i+1}) = sqrt( (1/n) Σ_{p=1}^{n} (K_i(p) − K_{i+1}(p))² )   (4)
where n represents the size of K_i.
ED(K_{i,i+1}) = sqrt( Σ_{k=1}^{m} (H_i(k) − H_{i+1}(k))² )   (5)
where m represents the picture color histogram dimension of the candidate key frames, and H_i denotes the color histogram of K_i.
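The second selection stage, formulas (3) to (5), can be sketched as below. One simplification is made for clarity: ED is computed here on the flattened frames directly, whereas formula (5) in the patent computes it over the m-dimensional colour histograms of the candidate frames. The frame values and sizes are illustrative assumptions.

```python
import numpy as np

def mad(x):
    """Formula (3): median absolute deviation."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def rmse(a, b):
    """Formula (4): root-mean-square error over the n pixels of a frame."""
    return np.sqrt(np.mean((a - b) ** 2))

def euclid(a, b):
    """Formula (5), simplified: Euclidean distance on flattened frames."""
    return np.linalg.norm((a - b).ravel())

def reject_outliers(cands):
    r = np.array([rmse(cands[i], cands[i + 1]) for i in range(len(cands) - 1)])
    e = np.array([euclid(cands[i], cands[i + 1]) for i in range(len(cands) - 1)])
    alpha, beta = mad(r), mad(e)          # alpha = MAD(RMSE), beta = MAD(ED)
    # K_i is an outlier when RMSE(K_{i,i+1}) < alpha and ED(K_{i,i+1}) < beta
    keep = [i for i in range(len(cands) - 1)
            if not (r[i] < alpha and e[i] < beta)]
    return [cands[i] for i in keep] + [cands[-1]]

# Five constant toy frames; the first two are nearly identical, so the first
# frame of that pair is rejected as an outlier candidate key frame.
vals = [0.0, 1.0, 30.0, 60.0, 100.0]
cands = [np.full((4, 4, 3), v) for v in vals]
kept = reject_outliers(cands)
print(len(kept))  # 4
```

A pair that is far more similar than is typical for the candidate set falls below both MAD thresholds, so the near-duplicate is pruned while genuinely distinct candidates survive.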
Another important contribution of the present invention is the introduction of an action recognition mechanism into yawning detection. In recent years action recognition has made great progress in both accuracy and speed, and researchers have proposed various networks to recognize actions. The widely used action recognition frameworks are two-stream fusion networks and 3D convolutional networks.
3D convolutional networks have attracted much attention in action recognition, scene and object recognition, and action similarity analysis. Compared with other spatio-temporal feature extraction methods based on two-stream networks, 3D convolutional networks have the advantages of fast computation and high accuracy. Some researchers have tried to stack consecutive 2D-convolution feature maps to classify video actions, but temporal information is lost in the 2D convolution process. In contrast, a 3D convolutional network takes multiple successive video frames as input, as shown in Fig. 3, and achieves better temporal modeling through 3D convolution and 3D pooling operations. Experiments have found that 3D convolution kernels of size 3 × 3 × 3 extract the most representative spatio-temporal features.
Based on 3D convolution, the present invention proposes a 3D-LTS network with low temporal sampling for spatio-temporal feature extraction and subtle action recognition. The 3D-LTS network uses 3D convolution to extract spatio-temporal features and a softmax layer for classification. After data pre-processing and key-frame selection, it is extremely important to determine how many frames to use as the input of the 3D-LTS network to obtain the best recognition performance. We compared the results of the 3D-LTS network with different numbers of input frames; the network was trained on the self-collected data set and tested on the processed YawDD data set. The experimental results are shown in Table 2. From the overall recognition results, the 3D-LTS network is not very sensitive to the number of input frames, and using 8 non-overlapping frames as input shows better performance.
3D-LTS extracts spatio-temporal features from the successive frames with four 3D convolutional layers. The structure of 3D-LTS is shown in Fig. 4. As can be seen from the structure diagram, all convolution filters are 3 × 3 × 3 with a stride of 1 × 1 × 1, and all pooling layers use max pooling. If the pooling rate of the shallow pooling layers is slowed down along the temporal dimension, the deep convolutional layers can extract more representative temporal features from the shallow convolutional layers, which is extremely important for recognizing subtle behaviors. Based on this analysis, the kernel size of the first and second pooling layers in 3D-LTS is 1 × 2 × 2. The filter counts of the four convolutional layers are 32, 64, 128, and 256 respectively, and the kernel of the third pooling layer is 2 × 4 × 4. The convolutional layers are followed by fully connected layers used to map the features; a fully connected layer with 1024 outputs is used to integrate the feature distribution. We found that 3D-LTS obtains the best recognition performance when the convolutional part is followed by a single such fully connected layer.
Table 2: Experimental results of the network with different numbers of input frames (unit: %)
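As a sanity check on the layer sizes above, the feature-volume shapes through 3D-LTS can be traced in a few lines of plain Python. "Same" padding for the 3 × 3 × 3 convolutions and floor division for pooling are assumptions here, since the patent does not state the padding scheme; with an 8-frame 200 × 200 input this gives the volume entering the fully connected layers.

```python
def pool3d(shape, kernel):
    """Max pooling only divides each dimension by its kernel size (floored)."""
    return tuple(s // k for s, k in zip(shape, kernel))

shape = (8, 200, 200)                      # 8 input frames of 200 x 200 pixels
filters = [32, 64, 128, 256]               # the four 3D convolutional layers
pools = [(1, 2, 2), (1, 2, 2), (2, 4, 4)]  # pooling kernels after convs 1-3

for i, f in enumerate(filters):
    # 3x3x3 conv, stride 1x1x1, assumed 'same' padding: shape is unchanged
    if i < len(pools):
        shape = pool3d(shape, pools[i])

print(shape, filters[-1])  # temporal x spatial volume and final channel count
```

Note how the 1 × 2 × 2 kernels of the first two pooling layers shrink only the spatial dimensions, leaving all 8 temporal steps for the deeper convolutions, which is exactly the "slowed temporal pooling" the text motivates; the flattened volume then feeds the fully connected mapping layers, including the 1024-output layer.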
In the experiments of the present invention, the system was first tested with a standard yawning detection data set, the YawDD data set. YawDD is a public yawning detection data set that can be used to verify face detection, facial feature extraction, yawning detection, and other algorithms. The data set collects a series of action videos from volunteers of different genders, ages, countries, and ethnicities, and includes 351 videos. Three to four videos were recorded for every driver, covering different mouth conditions such as talking, yawning, and yawning while talking.
Since most video clips in the YawDD data set are longer than one minute and contain multiple facial actions, the video clips in the YawDD data set need to be divided into clips containing only a single action. In this way, the YawDDR data set was constructed on the basis of the YawDD data set. The video length in the YawDDR data set is about 8 seconds. There are three kinds of actions in this data set: talking (T), yawning (Y), and yawning while talking (YT). 486 image sequences were collected in YawDDR. Some examples from the data set (before and after face segmentation) are shown in Figs. 5 and 6. This data set is used to verify the validity of the proposed method.
Many face data sets are used for identification, facial expression recognition, and face detection. However, no public driver yawning detection data set contains a variety of facial actions. The purpose of collecting a new data set is to verify the efficiency of the proposed method in yawning detection across various facial actions. Therefore, the MFAY data set was built with a high-definition camera in a practical driving environment. The various facial actions that may occur during driving were divided into six categories: talking (T); yawning while talking (YT); yawning (Y); singing (S); yawning while singing (YS); and shouting (ST). In view of the danger of fatigued driving, the collection site was a broad road with few pedestrians. Without affecting driving, a mini high-definition camera was installed facing the driver to capture the facial actions. During the experiment, the drivers drove under different illumination and road conditions. A researcher in the front passenger seat continuously monitored the facial action changes of each subject to annotate the ground truth of each facial action. The positions of the high-definition camera and the driver are shown in Fig. 7.
Facial videos of 20 testers (ranging in age from 20 to 46) were obtained under different conditions while the vehicle was in motion. Sample images of the MFAY data set are shown in Fig. 8. All videos were converted to the Audio Video Interleave (AVI) format with a frame rate of 30 fps. Finally, as shown in Fig. 9, 347 image sequences (53652 images) were extracted from the obtained videos; the length of each image sequence is about 5 seconds (150 frames).
Based on the YawDDR and MFAY data sets, the present invention carried out the following three experiments:
Experiment 1: To prove that the proposed key-frame extraction algorithm can effectively select the key frames in driving video frame sequences, the following experiment was carried out on the YawDDR and MFAY data sets. First, the picture histogram is used to delete frames with very small differences and select candidate key frames; this operation is recorded as stage one. To verify that the algorithm can effectively improve the recognition rate of various facial actions, recognition results without any key-frame extraction are also provided; this case is recorded as "not used". The results are shown in Table 3. After stage one, the accuracy is improved. On the basis of stage one, candidate key frames with outlier features are rejected using MAD; after this processing, the required key frames are obtained. As can be seen from Table 3, compared with stage one alone and with no key-frame extraction, the two-stage key-frame extraction algorithm achieves the best recognition performance, demonstrating its validity.
Table 3: Experimental results of the different key-frame selection stages (unit: %)
Experiment 2: This experiment focuses on comparisons between the proposed method and some other existing image-based methods. The proposed method is compared with the method based on kernelized fuzzy rough sets proposed by Du Y et al., the dual agent expert system algorithm proposed by Anitha C et al., and two methods based on convolutional neural networks. To verify the validity of the proposed method, the following model training and testing strategy is used: the training set consists of video clips randomly extracted from the MFAY and YawDDR data sets according to their classes, and the remaining video clips are used to test the models. All video clips are processed by the proposed key-frame extraction algorithm, and the selected key frames are used to train and test the network models. Since the image-based methods cannot effectively detect actions such as YT, the experimental results in these cases are not recorded in the table and figure. As shown in Table 4 and Fig. 10, compared with the other methods, the recognition rate of the yawning detection method based on subtle facial movements and video key frames is significantly improved. The proposed method of recognizing various facial actions outperforms existing methods and effectively reduces false detections. The video-based method can efficiently extract sufficient spatio-temporal motion features and realize dynamic yawning detection, which further demonstrates the robustness of the proposed method.
Table 4: Detection results of the proposed method and four state-of-the-art methods on the YawDDR data set (unit: %)
Experiment 3: This experiment compares image-based methods with video-based methods. The proposed method uses successive frames as input and is a video-based method. For the image-based methods, frame images in the YawDDR and MFAY data sets are used for training and testing: some frames are uniformly extracted from the two data sets and assigned labels according to their classes. The data processing steps and verification algorithms of these experiments are identical. The experimental results are shown in Table 5. The results show that the video-based method performs better than the image-based methods, because yawning is a continuous action rather than a static one. The video-based method can detect yawning under various facial conditions. If only one frame is used for recognition, the critical inter-frame temporal action information is lost, and the features representing yawning may be confused with those of actions such as singing or shouting. In contrast, the video-based method provides enough spatio-temporal action information and can classify actions through frame sequences of motion. Treating yawning as a motion rather than a static state for detection significantly alleviates the large number of false detections present in methods based on still images.
Table 5: Experimental results of the image-based and video-based detection methods (unit: %)
Experiments show that the method proposed by the present invention outperforms existing methods in recognition rate and overall performance, can effectively distinguish yawning from other subtle facial actions, and effectively reduces the false detection rate of driver yawning behavior.
The above only expresses the preferred embodiments of the present invention, and the description is relatively specific and detailed, but it cannot therefore be interpreted as limiting the patent scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several deformations, improvements, and substitutions can also be made without departing from the concept of the present invention, and these belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.
Claims (4)
- The detection method 1. driver based on subtle facial action recognition yawns, it is characterised in that: the following steps are included:Step 1, driver's driving video that vehicle-mounted vidicon captures is pre-processed, carries out Face datection and segmentation, figure As size normalization and denoising;Step 2, it proposes Key-frame Extraction Algorithm, is picked by the screening of picture histogram similarity threshold and the similarity picture that peels off Except the method combined, to extract the key frame in deliberate action sequence;Step 3, according to the key frame of selection, establishing has the 3D deep learning network (3D-LTS) of low Temporal sampling to detect Various behaviors of yawning.
- 2. The driver yawning detection method based on subtle facial action recognition according to claim 1, characterized in that preprocessing the driver driving video captured by the on-board camera comprises: detecting the driver's face region with the Viola-Jones face detection algorithm, segmenting out the driver's facial area, and denoising with a fast median filtering algorithm.
- 3. The driver yawning detection method based on subtle facial action recognition according to claim 1, characterized in that: the key-frame extraction algorithm extracts a series of key frames K = {K_i, i = 1, ..., M} from a series of original video frames F = {F_j, j = 1, ..., N}, where M is the number of key frames selected from the original frames and N is the number of original frames; the key-frame extraction algorithm comprises two selection stages.
In the first selection stage, the RGB color histogram of each video frame is computed; then the similarity between the color histograms γ_j and γ_{j+1} of two consecutive frames is computed as their Euclidean distance:
S_j = sqrt( Σ_{k=1}^{n} (γ_j(k) − γ_{j+1}(k))² )    (1)
where 1 ≤ j ≤ N − 1 and n is the dimension of the picture color histogram. The similarity threshold T_s is computed by formula (2):
T_s = μ_s    (2)
where μ_s = mean(S) and S is the set of the S_j. When S_j > T_s, F_j and F_{j+1} are considered dissimilar, and F_j is added to the candidate key-frame queue.
In the second selection stage, the candidate key frames with outlier features are rejected to obtain the final key frames. Two image-similarity measures are used, Euclidean distance (ED) and root-mean-square error (RMSE), and the median absolute deviation (MAD) is used to detect frames with outlier features; MAD is computed according to formula (3):
MAD = median( |X_i − median(X)| )    (3)
Two consecutive candidate key frames are denoted K_{i,i+1}. For all K_{i,i+1}, their RMSE and ED values are computed; over the computed RMSE(K_{i,i+1}) and ED(K_{i,i+1}) values, the MAD values are computed and denoted α = MAD(RMSE) and β = MAD(ED). RMSE(K_{i,i+1}) is computed as in formula (4) and ED(K_{i,i+1}) as in formula (5):
RMSE(K_{i,i+1}) = sqrt( (1/n) Σ_{k=1}^{n} (K_i(k) − K_{i+1}(k))² )    (4)
ED(K_{i,i+1}) = sqrt( Σ_{k=1}^{m} (γ_i(k) − γ_{i+1}(k))² )    (5)
When RMSE(K_{i,i+1}) is less than α and ED(K_{i,i+1}) is less than β, K_i is considered a candidate key frame with outlier features and is removed from the candidate key frames. Here n is the size of K_i and m is the picture color-histogram dimension of the candidate key frames.
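The two-stage selection in claim 3 can be sketched in NumPy as follows. This is a minimal sketch, not the patented implementation: the histogram bin count, the handling of the last candidate, and the per-channel histogram layout are my assumptions.

```python
# Two-stage key-frame selection sketch: (1) keep frames whose histogram
# distance to the next frame exceeds the mean distance T_s; (2) drop
# candidates flagged as outliers by a MAD test over RMSE and ED.
import numpy as np

def color_histogram(frame, bins=8):
    """Concatenated per-channel RGB histogram, normalized to sum to 1."""
    h = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    h = np.concatenate(h).astype(float)
    return h / h.sum()

def select_key_frames(frames):
    hists = [color_histogram(f) for f in frames]
    # Stage 1: Euclidean distance between consecutive histograms (formula 1).
    S = np.array([np.linalg.norm(hists[j] - hists[j + 1])
                  for j in range(len(frames) - 1)])
    Ts = S.mean()                              # threshold T_s = mu_s (formula 2)
    cand = [j for j in range(len(S)) if S[j] > Ts]   # dissimilar pairs
    if len(cand) < 2:
        return cand
    # Stage 2: RMSE and ED between consecutive candidates (formulas 4, 5).
    rmse = np.array([np.sqrt(np.mean((frames[cand[i]].astype(float) -
                                      frames[cand[i + 1]].astype(float)) ** 2))
                     for i in range(len(cand) - 1)])
    ed = np.array([np.linalg.norm(hists[cand[i]] - hists[cand[i + 1]])
                   for i in range(len(cand) - 1)])
    mad = lambda x: np.median(np.abs(x - np.median(x)))  # formula 3
    alpha, beta = mad(rmse), mad(ed)
    # Remove K_i when both its RMSE and ED fall below the MAD thresholds.
    keep = [cand[i] for i in range(len(cand) - 1)
            if not (rmse[i] < alpha and ed[i] < beta)]
    return keep + [cand[-1]]
```

With identical input frames every pairwise distance is zero, so no candidate clears the threshold and the function returns an empty list.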
- 4. The driver yawning detection method based on subtle facial action recognition according to claim 1, characterized in that: the 3D-LTS network is used for spatio-temporal feature extraction and subtle action recognition. The 3D-LTS network takes 8 frames sampled from the successive frames as input and extracts spatio-temporal features with four 3D convolutional layers. All convolution filters are 3 × 3 × 3 with stride 1 × 1 × 1, and all pooling layers use max pooling; the kernel size of the first and second pooling layers is 1 × 2 × 2, and that of the third pooling layer is 2 × 4 × 4. The numbers of filters in the four convolutional layers are 32, 64, 128, and 256, respectively. The convolutional layers are followed by a fully connected layer for mapping features and a fully connected layer with 1024 outputs for integrating the feature distribution.
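The layer dimensions in claim 4 can be checked with a shape walk-through. This is a sketch under assumptions: the claim does not fix the input resolution (64 × 64 here), the exact placement of the pooling layers between the four convolutions, or the padding scheme ("same" padding assumed, so stride-1 convolution preserves spatial shape):

```python
# Shape propagation through the 3D-LTS backbone of claim 4:
# 8 input frames, four 3x3x3 conv layers (stride 1, "same" padding)
# with 32/64/128/256 filters, max pools of 1x2x2, 1x2x2, and 2x4x4.

def conv3d_same(shape, filters):
    """3x3x3 conv, stride 1, 'same' padding: only the channel count changes."""
    t, h, w, _ = shape
    return (t, h, w, filters)

def maxpool3d(shape, kernel):
    """Max pooling divides time/height/width by the kernel dimensions."""
    t, h, w, c = shape
    kt, kh, kw = kernel
    return (t // kt, h // kh, w // kw, c)

shape = (8, 64, 64, 3)                   # 8 RGB frames, assumed 64x64
shape = conv3d_same(shape, 32)
shape = maxpool3d(shape, (1, 2, 2))      # first pool: 1x2x2
shape = conv3d_same(shape, 64)
shape = maxpool3d(shape, (1, 2, 2))      # second pool: 1x2x2
shape = conv3d_same(shape, 128)
shape = conv3d_same(shape, 256)
shape = maxpool3d(shape, (2, 4, 4))      # third pool: 2x4x4
print(shape)                             # feature volume fed to the FC layers
```

Under these assumptions the volume entering the fully connected layers is (4, 4, 4, 256); the 1 × 2 × 2 pools leave the 8-frame temporal axis untouched until the final 2 × 4 × 4 pool, which matches the "low temporal sampling" idea.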
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910658690.2A CN110502995B (en) | 2019-07-19 | 2019-07-19 | Driver yawning detection method based on fine facial action recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110502995A true CN110502995A (en) | 2019-11-26 |
CN110502995B CN110502995B (en) | 2023-03-14 |
Family
ID=68586658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910658690.2A Active CN110502995B (en) | 2019-07-19 | 2019-07-19 | Driver yawning detection method based on fine facial action recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110502995B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN107087211A (en) * | 2017-03-30 | 2017-08-22 | 北京奇艺世纪科技有限公司 | A kind of anchor shots detection method and device |
US20170340292A1 (en) * | 2016-05-31 | 2017-11-30 | Stmicroelectronics S.R.L. | Method for the detecting electrocardiogram anomalies and corresponding system |
CN107451552A (en) * | 2017-07-25 | 2017-12-08 | 北京联合大学 | A kind of gesture identification method based on 3D CNN and convolution LSTM |
CN108399380A (en) * | 2018-02-12 | 2018-08-14 | 北京工业大学 | A kind of video actions detection method based on Three dimensional convolution and Faster RCNN |
CN108830157A (en) * | 2018-05-15 | 2018-11-16 | 华北电力大学(保定) | Human bodys' response method based on attention mechanism and 3D convolutional neural networks |
CN108875610A (en) * | 2018-06-05 | 2018-11-23 | 北京大学深圳研究生院 | A method of positioning for actuation time axis in video based on border searching |
CN109145823A (en) * | 2018-08-22 | 2019-01-04 | 佛山铮荣科技有限公司 | A kind of market monitoring device |
CN109697434A (en) * | 2019-01-07 | 2019-04-30 | 腾讯科技(深圳)有限公司 | A kind of Activity recognition method, apparatus and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111445548A (en) * | 2020-03-21 | 2020-07-24 | 南昌大学 | Multi-view face image generation method based on non-paired images |
CN111445548B (en) * | 2020-03-21 | 2022-08-09 | 南昌大学 | Multi-view face image generation method based on non-paired images |
CN113724211A (en) * | 2021-08-13 | 2021-11-30 | 扬州美德莱医疗用品有限公司 | Fault automatic identification method and system based on state induction |
Also Published As
Publication number | Publication date |
---|---|
CN110502995B (en) | 2023-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Driver yawning detection based on subtle facial action recognition | |
CN106778595B (en) | Method for detecting abnormal behaviors in crowd based on Gaussian mixture model | |
Choi et al. | Facial micro-expression recognition using two-dimensional landmark feature maps | |
CN111914664A (en) | Vehicle multi-target detection and track tracking method based on re-identification | |
CN110414371A (en) | A kind of real-time face expression recognition method based on multiple dimensioned nuclear convolution neural network | |
CN110263712B (en) | Coarse and fine pedestrian detection method based on region candidates | |
Yao et al. | When, where, and what? A new dataset for anomaly detection in driving videos | |
Zhang et al. | A dense u-net with cross-layer intersection for detection and localization of image forgery | |
CN108280421B (en) | Human behavior recognition method based on multi-feature depth motion map | |
CN110378233B (en) | Double-branch anomaly detection method based on crowd behavior prior knowledge | |
Wang et al. | Improving human action recognition by non-action classification | |
Yang et al. | Single shot multibox detector with kalman filter for online pedestrian detection in video | |
CN114333070A (en) | Examinee abnormal behavior detection method based on deep learning | |
Lyu et al. | Small object recognition algorithm of grain pests based on SSD feature fusion | |
CN108416780A (en) | A kind of object detection and matching process based on twin-area-of-interest pond model | |
CN111738218B (en) | Human body abnormal behavior recognition system and method | |
CN105701466A (en) | Rapid all angle face tracking method | |
Yang et al. | Selective spatio-temporal aggregation based pose refinement system: Towards understanding human activities in real-world videos | |
CN113378675A (en) | Face recognition method for simultaneous detection and feature extraction | |
CN110502995A (en) | Driver based on subtle facial action recognition yawns detection method | |
He et al. | Occluded pedestrian detection via distribution-based mutual-supervised feature learning | |
Saypadith et al. | An approach to detect anomaly in video using deep generative network | |
Chen et al. | Dlfmnet: End-to-end detection and localization of face manipulation using multi-domain features | |
CN109165542A (en) | Based on the pedestrian detection method for simplifying convolutional neural networks | |
Li et al. | MobileNetV3-CenterNet: A target recognition method for avoiding missed detection effectively based on a lightweight network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||