CN111582086A - Fatigue driving identification method and system based on multiple characteristics

Publication number: CN111582086A
Application number: CN202010338222.XA
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Inventors: 胡峰松, 彭清舟, 徐蓉, 程哲坤
Applicant/Assignee: Hunan University; CERNET Corp
Prior art keywords: fatigue, eye, state, image, value

Classifications

    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 40/168: Human faces: feature extraction; face representation
    • G06V 40/172: Human faces: classification, e.g. identification


Abstract

The invention discloses a fatigue driving identification method and system based on multiple characteristics. The method and system preprocess the images, which not only filters out noise but also avoids the poor image quality and low detection precision caused by external environmental factors; the AdaBoost algorithm detects the human face stably, quickly and efficiently, reducing the complexity of face detection; the scale-space-based facial target tracking algorithm adopts an adaptive high-confidence update strategy, so that when an error occurs in the target tracking stage the confidence of the target detection is low and the model is not updated, which effectively reduces the risk of tracker drift and improves tracking precision; and eye state recognition with an SVM classifier improves the accuracy of eye state recognition. The method therefore has high recognition accuracy and strong adaptability to the environment.

Description

Fatigue driving identification method and system based on multiple characteristics
Technical Field
The invention belongs to the technical field of driving safety, and particularly relates to a fatigue driving identification method and system based on multiple characteristics.
Background
Driver fatigue detection technology has become increasingly mature, and fatigue detection methods can be divided into three main types:
The first type is vehicle-based detection, which judges the fatigue state mainly by collecting vehicle driving parameters and analyzing abnormal fluctuations in those parameters. Such methods include detecting the steering-wheel rotation angle, the steering-wheel grip force, the vehicle speed, lane deviation, the brake-pedal force and the accelerator-pedal force. Most current vehicles are equipped with various sensors that collect real-time parameters such as driving speed, steering-wheel angle, fuel consumption and engine speed, and the driver's fatigue state can be inferred indirectly by analyzing these data individually or jointly. However, the analysis results are easily affected by personal driving habits and by external factors such as weather, vehicle characteristics and road conditions, so the approach has weak robustness and low recognition accuracy. Moreover, an abnormality can only be detected when the driver is already close to having a traffic accident, so no early warning can be given. The results of this approach are therefore better used as auxiliary rather than primary detection indicators.
The second type is driver-based detection, which can be further divided into methods based on the driver's physiological parameters and methods based on the driver's behavioral characteristics. Research shows that when a driver is fatigued, physiological responses slow down, the body's reaction to external stimuli is delayed, and physiological indicators deviate from their normal values. Physiological parameters collected by sensors can therefore be used to judge whether the driver is fatigued; detection is mainly based on electroencephalogram (EEG), electrocardiogram (ECG) and electromyogram (EMG) signals. In practical fatigue detection, however, physiological parameters vary greatly between individuals and are easily influenced by the driver's sex, age and body type, which makes it difficult to apply a unified standard and limits practical application. When a driver is drowsy, his or her facial features differ from those in the awake state, so analyzing the driver's facial feature data with computer vision technology is an effective way to detect fatigue driving in real time. The characteristic parameters extracted by such methods mainly include eye-movement features (blink frequency, PERCLOS, degree of eye opening and closing, gaze direction, etc.), mouth state (yawning frequency, etc.) and head position. Because changes in the head and facial features are relatively significant, they are easy to detect; however, the feature extraction, and hence the detection result, is susceptible to occlusion, illumination and other factors, which lowers recognition accuracy.
The third type is detection based on information fusion, which integrates multiple fatigue features. Compared with fatigue detection based on a single type of feature information, its detection precision and reliability are improved; however, extracting multiple features and building the model with the prior art remains very challenging, and the resulting fatigue detection models adapt poorly to complex environments.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a fatigue driving identification method and system based on multiple characteristics, and aims to solve the problems of low identification accuracy and poor adaptability of the existing detection method.
The invention solves the technical problems through the following technical scheme: a fatigue driving identification method based on multiple characteristics comprises the following steps:
step 1: acquiring a video single-frame image in real time, and preprocessing the video single-frame image;
step 2: performing face detection on the preprocessed video image by adopting an AdaBoost algorithm based on Haar-like characteristics, and tracking the detected face in real time by adopting a target tracking algorithm based on a scale space;
step 3: locating the feature points of the human face, locating the eye region and the mouth region according to the located feature points, identifying the eye state with an SVM (support vector machine) classifier, and identifying the mouth state by calculating the mouth aspect ratio;
step 4: calculating eye fatigue parameters and mouth fatigue parameters from the eye state and the mouth state respectively, and calculating head fatigue parameters from the position information of the located feature points;
step 5: identifying the fatigue state of the driver, and giving an early warning, according to the eye fatigue parameters, mouth fatigue parameters and head fatigue parameters.
The method of the invention preprocesses the image, which not only filters out noise but also avoids the poor image quality and low detection precision caused by external environmental factors; the AdaBoost algorithm detects the face stably, quickly and efficiently, reducing the complexity of face detection; the scale-space-based facial target tracking algorithm adopts an adaptive high-confidence update strategy, so that when an error occurs in the target tracking stage the confidence of the target detection is low and the model is not updated, which effectively reduces the risk of tracker drift and improves tracking precision; and eye state recognition with an SVM classifier improves the accuracy of eye state recognition. The method therefore has high recognition accuracy and strong adaptability to the environment.
Further, in step 1, the video single-frame image preprocessing process includes:
step 1.1: carrying out smooth denoising processing on a video single-frame image;
step 1.2: and carrying out illumination compensation processing on the video image subjected to the smooth denoising processing.
By preprocessing the video image, noise interference in the image can be filtered, the image is prevented from being influenced by external environment factors, the quality of the image is improved, and the accuracy of subsequent detection and analysis is improved.
Further, in the step 1.1, a smooth denoising process is performed on the video image by using adaptive median filtering.
Adaptive median filtering balances denoising and the retention of image detail even when the noise density is high; it effectively filters the noise present in the original image, preserves the useful information in the image while improving its quality, raises the signal-to-noise ratio, and makes the image better suited to the application scenario.
Further, in step 1.2, an illumination equalization algorithm based on a dynamic threshold is used to perform illumination compensation on video images of differing illumination brightness.
This avoids the problem that the face cannot be accurately detected, and its features cannot be extracted, when the image is unevenly illuminated, and prevents the image from being affected by factors such as the illumination intensity and the color and position of the light source.
Further, in step 2, the specific operation steps of face detection with the AdaBoost algorithm are as follows:
step 2.11: calculating the Haar-like features of the image using the integral image;
step 2.12: for the Haar-like features, selecting optimal weak classifiers through training iterations and combining the weak classifiers into strong classifiers by weighted voting;
step 2.13: connecting the strong classifiers obtained by training in series to form a cascade classifier;
step 2.14: performing face detection on the image with the cascade classifier.
Further, in step 2, a scale-space-based target tracking algorithm is used to track the detected face in real time; the specific operation steps are as follows:
Step 2.21: taking the face region and scale obtained by face detection as the initial position P_1 and scale S_1 of the target, and training a position correlation filter and a scale correlation filter on the face region to obtain a position model and a scale model;
Step 2.22: according to the target position P_{t-1} and scale S_{t-1} of the previous frame I_{t-1}, collecting from the current frame I_t a feature sample whose size is twice that of the previous frame's target, and using this feature sample together with the position model of the previous frame to calculate the maximum response value of the position correlation filter, which gives the new target position P_t;
Step 2.23: with the new target position P_t as the center point, using the one-dimensional scale correlation filter to obtain S candidate samples of different scales according to the scaling rule, and extracting d-dimensional features from each candidate sample to obtain the feature sample of the current frame; then, using this feature sample and the scale model, calculating the response values of the 1 × S-dimensional scale correlation filter, where the scale corresponding to the maximum response value is the final target scale S_t;
Step 2.24: if the maximum response value and the average peak-to-correlation energy of the correlation filters in the current frame both satisfy the update-strategy condition, extracting the features f_t^trans and f_t^scale from the current frame I_t according to the position P_t and scale S_t and updating the position model and the scale model; otherwise performing face detection again on the current frame I_t;
the update-strategy condition is that the maximum response value and the average peak-to-correlation energy are greater than the ratios β_1 and β_2 respectively, where β_1 is 0.7 and β_2 is 0.45.
Preferably, the response value of the position or scale correlation filter is calculated as

y_t = F^{-1}{ Σ_{l=1}^{d} Ā_{t-1}^l · Z_t^l / ( B_{t-1} + λ ) }

where F^{-1}(·) denotes the inverse discrete Fourier transform (DFT) and y_t is the resulting response value; d-dimensional features are extracted from each pixel of the feature sample, the feature map of the l-th dimension being denoted f^l with l = 1, 2, ..., d; λ is the coefficient of the regularization term; A_{t-1}^l and B_{t-1} are the numerator and denominator of the filter updated in the previous frame; and Z_t^l is the two-dimensional DFT of each dimension of the feature map of the current frame image.
Further, in step 3, a cascade regression tree algorithm is adopted to locate the feature points of the human face, where the feature points of the human face include eye feature points and mouth feature points.
Further, in step 3, the specific operation of identifying the eye state with the SVM classifier is as follows:
the SVM classifier is trained with the aspect ratio of the human eye and the accumulated difference of black pixels in the binarized eye region as its input features, and the trained SVM classifier is then used to classify and identify the eye state, which improves the accuracy of eye state recognition. The accumulated black-pixel difference F_black of the binarized eye region is obtained by accumulating the frame-to-frame differences in the number of black pixels against an adaptive threshold

T(t) = α · |D(t-1)|,   α ∈ [0, 1]

where n(t) is the number of black pixels in the t-th frame, Δn(t) is the difference in the number of black pixels between the t-th frame and the (t-1)-th frame, D(t-1) is the accumulated difference in the number of black pixels up to frame t-1 in "state 1", and α is a constant between 0 and 1.
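As an illustration of how these eye features could be computed per frame, the following Python sketch counts the black pixels of a binarized eye region and maintains the accumulated difference with the adaptive threshold T(t) = α·|D(t-1)|. The Otsu binarization and the exact accumulation rule are assumptions, since the patent gives the accumulation formula only as an image; the resulting value would be fed, together with the eye aspect ratio, into the SVM classifier.

```python
import cv2
import numpy as np

def black_pixel_count(eye_roi_gray):
    # Binarize the cropped eye region (Otsu threshold assumed) and count black pixels n(t).
    _, binary = cv2.threshold(eye_roi_gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return int(np.count_nonzero(binary == 0))

class BlackPixelAccumulator:
    """Tracks n(t), delta_n(t) and the accumulated difference D(t) with the
    adaptive threshold T(t) = alpha * |D(t-1)| described above."""
    def __init__(self, alpha=0.5):          # alpha is a constant in [0, 1]
        self.alpha = alpha
        self.prev_n = None                  # n(t-1)
        self.d = 0.0                        # accumulated difference D(t)

    def update(self, n_t):
        if self.prev_n is None:             # first frame: nothing to difference against
            self.prev_n = n_t
            return self.d
        delta_n = n_t - self.prev_n         # delta_n(t)
        threshold = self.alpha * abs(self.d)
        # Assumed rule: only frame-to-frame changes larger than the adaptive
        # threshold are accumulated; smaller changes reset the accumulator.
        self.d = self.d + delta_n if abs(delta_n) > threshold else 0.0
        self.prev_n = n_t
        return self.d
```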
Further, in step 3, the mouth aspect ratio is denoted MAR; when MAR ≤ 0.4 the mouth is in the closed state; when 0.4 < MAR ≤ 0.8 the mouth is in the normal speaking state; when MAR > 0.8 the mouth is in the yawning state.
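A minimal sketch of this threshold rule, with the MAR value assumed to have been computed already from the mouth feature points:

```python
def mouth_state_from_mar(mar: float) -> str:
    # Thresholds from the description above: <= 0.4 closed,
    # 0.4-0.8 normal speaking, > 0.8 yawning.
    if mar <= 0.4:
        return "closed"
    if mar <= 0.8:
        return "speaking"
    return "yawning"
```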
Further, in step 4, the eye fatigue parameters include the eye-closure frame ratio, the blink frequency and the longest continuous eye-closure time; the mouth fatigue parameters include the yawning frequency; and the head fatigue parameters include the nodding frequency. Preferably, the fatigue state is identified by a weighted summation of the eye fatigue parameters, the mouth fatigue parameters and the head fatigue parameters, with the weighted sum expressed as:
E_fatigue = V_ECR × W_1 + V_MECT × W_2 + V_BF × W_3 + V_NF × W_4 + V_YF × W_5
where E_fatigue is the weighted fatigue value, V_ECR is the eye-closure frame ratio, V_MECT is the longest continuous eye-closure time, V_BF is the blink frequency, V_NF is the nodding frequency, V_YF is the yawning frequency, and W_i are the weights corresponding to the different parameters.
Preferably, when the weighted fatigue value is less than 0.3 the driver is in the awake state; when the weighted fatigue value is greater than or equal to 0.3 and less than 0.7 the driver is in the fatigue state; and when the weighted fatigue value is greater than or equal to 0.7 the driver is in the severe fatigue state.
The invention also provides a fatigue driving recognition system based on multiple characteristics, which comprises:
the image acquisition and processing unit is used for acquiring a video single-frame image in real time and preprocessing the video single-frame image;
the face detection and tracking unit is used for carrying out face detection on the preprocessed video image by adopting an AdaBoost algorithm based on Haar-like characteristics and tracking the detected face in real time by adopting a target tracking algorithm based on a scale space;
the positioning and state recognition unit is used for positioning the feature points of the human face, respectively positioning the eye region and the mouth region according to the positioned feature points, recognizing the eye state by adopting an SVM classifier, and recognizing the mouth state by calculating the aspect ratio of the mouth;
the parameter calculation unit is used for calculating eye fatigue parameters and mouth fatigue parameters according to the eye state and the mouth state respectively and calculating head fatigue parameters according to the positioned feature point position information;
and the fatigue state identification unit is used for identifying and early warning the fatigue state of the driver according to the eye fatigue parameter, the mouth fatigue parameter and the head fatigue parameter.
Advantageous effects
Compared with the prior art, the fatigue driving identification method and system based on multiple characteristics provided by the invention preprocess the image, which not only filters out noise but also avoids the poor image quality and low detection precision caused by external environmental factors; the AdaBoost algorithm detects the face stably, quickly and efficiently, reducing the complexity of face detection; the scale-space-based facial target tracking algorithm adopts an adaptive high-confidence update strategy, so that when an error occurs in the target tracking stage the confidence of the target detection is low and the model is not updated, which effectively reduces the risk of tracker drift and improves tracking precision; and eye state recognition with an SVM classifier improves the accuracy of eye state recognition. The method therefore has high recognition accuracy and strong adaptability to the environment.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only one embodiment of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flow chart of a method of identifying fatigue driving in an embodiment of the present invention;
FIG. 2 is a flow chart of face detection and facial target tracking according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the pixels of rectangular region D_0 and the corresponding integral-image computation in an embodiment of the present invention;
FIG. 4 is a flowchart of target position estimation in the flow of facial target tracking in an embodiment of the present invention;
FIG. 5 is a sample of a scale filter in an embodiment of the invention;
fig. 6 is a target size estimation flow chart in the face target tracking flow in the embodiment of the present invention;
FIG. 7 is a face feature point model in an embodiment of the invention;
FIG. 8 is a diagram illustrating the detection results of facial feature points from different angles according to an embodiment of the present invention;
fig. 9 is a schematic diagram of eye positioning based on feature points in an embodiment of the invention, where fig. 9(a) is a human face feature point model, and fig. 9(b) is a schematic diagram of eye positioning;
FIG. 10 is a schematic diagram of six key points of a human eye in an embodiment of the present invention, with FIG. 10(a) in an open-eye state and FIG. 10(b) in a closed-eye state;
FIG. 11 is a graph of EAR mean results in an embodiment of the present invention;
FIG. 12 shows the number of black pixels in the process of opening and closing the eyes of the human eye according to the embodiment of the present invention;
FIG. 13 is a diagram illustrating the difference between the number of black pixels in two consecutive frames;
FIG. 14 is a cumulative difference of the number of black pixels for the human eye in an embodiment of the invention;
FIG. 15 is a cumulative difference of the number of black pixels for an adaptive threshold human eye in an embodiment of the invention;
FIG. 16 is a schematic diagram of the 10 key points of the mouth in an embodiment of the present invention;
FIG. 17 is a graph showing the results of mouth MAR detection in the embodiment of the present invention;
FIG. 18 is a schematic illustration of the opening and closing process of an embodiment of the present invention;
FIG. 19 is a graph of the optimization results for the EAR threshold and the frame-number threshold K_eye in an embodiment of the present invention;
fig. 20 is a schematic view of a state of a mouth in an embodiment of the present invention;
fig. 21 is a diagram of head motion analysis in an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the fatigue driving identification method based on multiple features provided by the present invention includes:
1. Acquire a single video frame in real time and preprocess it; the preprocessing comprises the following specific steps:
and (1.1) performing smooth denoising processing on the video single-frame image by adopting self-adaptive median filtering.
The purpose of denoising the video image is to improve the quality of the captured image while retaining the useful information carried in the original image. Filtering effectively solves the problem of image quality degraded by noise, increases the signal-to-noise ratio, and makes the image better suited to the application scenario. Adaptive median filtering dynamically changes the size of the filtering template starting from a preset template and judges whether the current pixel is noise; if it is, the current pixel value is replaced by the neighborhood median. The processing comprises two steps:

Step A: let A_1 = Z_med - Z_min and A_2 = Z_med - Z_max. If A_1 > 0 and A_2 < 0, go to step B; otherwise increase the size of the filtering template and denote the enlarged size by S_template. If S_template ≤ S_template_max, repeat step A; otherwise let Z_xy = Z_med and output Z_xy.

Step B: let B_1 = Z_xy - Z_min and B_2 = Z_xy - Z_max. If B_1 > 0 and B_2 < 0, output Z_xy; otherwise output Z_med.

Here S_template is the size of the filtering template matrix, the point (x, y) is the center of the filtering template matrix, S_xy denotes the filtering region centered on (x, y), S_template_max is the maximum window size allowed for the filtering template (filter window), Z_min is the minimum pixel value in the filter window, Z_max is the maximum pixel value in the filter window, Z_med is the median of the pixel values in the filter window, and Z_xy is the pixel value at the point (x, y).
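A direct (unoptimized) Python sketch of the adaptive median filter described by steps A and B, assuming a single-channel grayscale image; border pixels are handled by edge padding, which is an implementation choice not specified in the text.

```python
import numpy as np

def adaptive_median_filter(img, s_max=7):
    """Adaptive median filtering following steps A and B above (grayscale image)."""
    pad = s_max // 2
    padded = np.pad(img, pad, mode="edge")
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            s = 3                                  # initial template size
            while True:
                r = s // 2
                window = padded[y + pad - r:y + pad + r + 1,
                                x + pad - r:x + pad + r + 1]
                z_min, z_max = window.min(), window.max()
                z_med = np.median(window)
                z_xy = img[y, x]
                if z_min < z_med < z_max:          # step A satisfied
                    # step B: keep the pixel unless it is an impulse (min/max)
                    out[y, x] = z_xy if z_min < z_xy < z_max else z_med
                    break
                s += 2                             # enlarge the template
                if s > s_max:                      # template limit reached
                    out[y, x] = z_med
                    break
    return out
```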
In order to compare the smooth denoising effects of different filtering methods on images, salt and pepper noise with the intensity of 0.1 and Gaussian noise with the average value of 0.1 and the variance of 20 are respectively added to a test image, then the salt and pepper noise and the Gaussian noise images are respectively subjected to smooth denoising processing by using different filtering methods, and the processing results of different methods are compared and analyzed. The comparison shows that the denoising effect of the median filtering and the self-adaptive median filtering on the salt-pepper noise is obviously better than that of the other two methods, and the mean filtering has a better denoising effect on the Gaussian noise.
Table 1: comparison of algorithm indices for salt-and-pepper noise
In order to more objectively verify the denoising effect of each method, the mean square error MSE and the peak signal-to-noise ratio PSNR before and after image processing and the algorithm running time T of each filtering method are respectively calculated, the calculation formulas of MSE and PSNR are shown in formulas (1) and (2), and the result analysis is shown in tables 1 and 2.
MSE = (1 / (M × N)) Σ_{x=1}^{M} Σ_{y=1}^{N} [ f(x, y) - f*(x, y) ]²        (1)
Where f (x, y) represents a noisy image of size M × N, and f*(x, y) represents the filtered denoised image.
PSNR = 10 · log_10( MAX² / MSE )        (2)
where MAX is the maximum possible pixel value of the image.
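A short Python sketch of these two quality measures, assuming single-channel images of identical size:

```python
import numpy as np

def mse_psnr(original, denoised, max_val=255.0):
    """Mean squared error and peak signal-to-noise ratio, equations (1) and (2)."""
    diff = original.astype(np.float64) - denoised.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    psnr = 10.0 * np.log10((max_val ** 2) / mse) if mse > 0 else float("inf")
    return mse, psnr
```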
Table 2: comparison of algorithm indices for Gaussian noise
Through comprehensive analysis, the three filtering methods can achieve a certain degree of denoising effect, but a filtering template needs to be set in advance, edge details and contours in an image are blurred when the image is filtered and denoised, and the image needs to be sharpened at a later stage to highlight edge information of the image. Therefore, in order to retain image detail information to the maximum extent while smoothing denoising, the invention uses the adaptive filtering algorithm to improve the image denoising capability.
(1.2) Perform illumination compensation on video images of differing illumination brightness using an illumination equalization algorithm based on a dynamic threshold.
The color information of the captured video image is easily affected by factors such as the illumination brightness and the color and position of the light source, which leaves the image unevenly illuminated. To detect the face accurately and extract its features from images captured under such uneven illumination, the image must first undergo illumination equalization. The image is converted from the RGB color space to the YCbCr color space using the standard conversion of equation (3), and the illumination equalization is performed in that space.
The processing is divided into two steps: detecting reference white points based on a dynamic threshold, and adjusting the image pixels. Selection of the reference white points: the image is first divided into M blocks of an appropriate aspect ratio (block size), and the averages M_b and M_r of Cb and Cr are calculated for each block; the average absolute differences D_b and D_r are then calculated according to equation (4):

D_b = (1/N) Σ_{i,j} | C_b(i, j) - M_b |,   D_r = (1/N) Σ_{i,j} | C_r(i, j) - M_r |        (4)

where N is the total number of pixels in the image block and C_b(i, j), C_r(i, j) are the Cb and Cr (chroma) values of pixel (i, j). For each block, if D_b and D_r are too small, the color distribution of that block is relatively uniform and it needs no further processing. The M_b, M_r, D_b and D_r values of the blocks to be processed are then summed and averaged to give M_b, M_r, D_b and D_r for the whole image, and the pixels whose chroma values satisfy relation (5), i.e. whose Cb and Cr values lie within the near-white ranges determined by M_b, D_b and M_r, D_r, form the set of candidate points in the near-white region of the image. Among these near-white pixels, the top 10% by brightness value (Y value) are selected as the reference white points.
Adjustment of the image: to keep the brightness of the whole image consistent, the gains R_gain, G_gain and B_gain of the three channels are obtained from the averages of the reference white points in the R, G and B channels and the maximum brightness value of the whole image (the maximum Y value):

R_gain = Y_max / R_avg,   G_gain = Y_max / G_avg,   B_gain = Y_max / B_avg        (6)

where R_avg, G_avg and B_avg are the averages of the reference white points in the R, G and B channels and Y_max is the maximum brightness value of the pixels in the image. The value of every pixel in the image is then adjusted as

R′ = R · R_gain,   G′ = G · G_gain,   B′ = B · B_gain        (7)

where R, G and B are the original pixel values of the image and R′, G′ and B′ are the adjusted pixel values.
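A simplified Python sketch of this dynamic-threshold illumination equalization is given below. For brevity it skips the per-block M_b/M_r/D_b/D_r screening and simply takes the brightest fraction of pixels as the reference white points before applying the channel gains of equations (6) and (7); the quantile selection and the use of OpenCV's YCrCb conversion are assumptions.

```python
import cv2
import numpy as np

def illumination_equalize(bgr, white_ratio=0.1):
    """Scale each channel by gain = Y_max / channel_mean of the reference white points."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y = ycrcb[:, :, 0].astype(np.float64)
    # Take the brightest `white_ratio` fraction of pixels as reference white points.
    mask = y >= np.quantile(y, 1.0 - white_ratio)
    b_avg, g_avg, r_avg = (bgr[:, :, c][mask].mean() for c in range(3))
    y_max = y.max()
    gains = np.array([y_max / b_avg, y_max / g_avg, y_max / r_avg])  # BGR order
    return np.clip(bgr.astype(np.float64) * gains, 0, 255).astype(np.uint8)
```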
2. The preprocessed video image is subjected to face detection by adopting an AdaBoost algorithm based on Haar-like characteristics, and the detected face is tracked in real time by adopting a target tracking algorithm based on scale space, as shown in FIG. 2.
Haar-like features can be divided into several categories: linear features, edge features, point (center-surround) features and diagonal features. A Haar-like feature value is the difference between the sum of the gray values of all pixels in the white rectangle and the sum of the gray values of all pixels in the black rectangle, which reflects the gray-level variation of the image. Haar-like features can effectively extract the texture features of the image, and feature values at different positions and scales are obtained by translating and scaling the template.
Because the category, size and position of the Haar-like rectangular template all vary, even a small detection template or window contains a very large number of rectangular feature values: once the form of the features is fixed, the number of rectangular features inside a detection window of size 24 × 24 can reach the hundreds of thousands. Given this large number of features, computing them quickly is essential.
The integral graph algorithm can calculate the pixel sum of any rectangular area in the image only by traversing the image once, and the calculation efficiency of the image characteristic value is improved to a great extent. The main idea is as follows: the sum of pixels from the starting point to each point of each rectangular region of the image is calculated, the value of each region is calculated and is stored in an array as an element, when the pixel sum of a certain region needs to be calculated subsequently, the value of a target region can be obtained by directly using an array index, recalculation is not needed, and calculation is accelerated.
The value at any point (i, j) of the integral image is the sum of the gray values of all pixels in the rectangular area enclosed by the upper-left corner of the gray image and the current point. The integral image is computed as in equation (8):

I′(i, j) = Σ_{x ≤ i, y ≤ j} I(x, y)        (8)

where I(x, y) is the gray value at point (x, y). The integral image can also be computed by the iterative simplification

I′(i, j) = I′(i, j-1) + I′(i-1, j) - I′(i-1, j-1) + I(i, j)        (9)

with the boundary conditions I′(i, -1) = 0, I′(-1, j) = 0 and I′(-1, -1) = 0.
Once the integral image has been obtained, the feature value of a rectangular region depends only on the integral-image values at the corner points of the feature rectangle, so the time needed to compute a feature value is constant regardless of the scale of the rectangle. The difference between the pixel sums of two rectangular areas is obtained with simple additions and subtractions of the integral-image values at the corner points, so the feature value of any rectangular region can be computed quickly.
Take region D_0 in FIG. 3 as an example to illustrate the integral image algorithm:

integral at corner 1: I′_1 = Sum(A_0);  integral at corner 2: I′_2 = Sum(A_0) + Sum(B_0);
integral at corner 3: I′_3 = Sum(A_0) + Sum(C_0);  integral at corner 4: I′_4 = Sum(A_0) + Sum(B_0) + Sum(C_0) + Sum(D_0);

where Sum(N_1) denotes the sum of all pixels in region N_1. The sum of all pixels of region D_0 is therefore

Sum(D_0) = I′_1 + I′_4 - I′_2 - I′_3        (10)
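A small Python sketch of the integral image and the four-corner rectangle sum of equation (10); the cumulative-sum construction is equivalent to the iteration of equation (9).

```python
import numpy as np

def integral_image(gray):
    # I'(i, j) = sum of all pixels above and to the left of (i, j), inclusive.
    return gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Pixel sum of the inclusive rectangle [top..bottom, left..right]
    using the four corner values, as in equation (10)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)
```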
The AdaBoost algorithm is a classifier algorithm. Its principle is as follows: the Haar-like features of an image are computed quickly using the integral image; optimal weak classifiers are selected through training iterations and combined into strong classifiers by weighted voting; the strong classifiers obtained from training are then connected in series into a cascade classifier, which improves both the detection speed and the accuracy of the classifier. The algorithm trains a number of weak classifiers from the probability distribution of the positive and negative sample sets, updating the sample weights once per round; after T rounds, T weak classifiers have been obtained, and the strong classifier is finally obtained by weighted combination.
Given a training data set

T = {(x_i, y_i)},  i = 1, 2, ..., N_T        (11)

where x_i is a training image and y_i ∈ {-1, +1} is the correct class label of x_i: y_i = 1 means the image is a positive sample, i.e. a face image, and y_i = -1 means the image is a negative sample, i.e. it contains no face. The training procedure for the samples is as follows:

① Initialize the weight distribution of the training data so that every sample has the same weight:

D_1 = (w_11, ..., w_1i, ..., w_1N),   w_1i = 1 / N_T

where D_1 denotes the first iteration and w_1i is the weight of the i-th sample in the first iteration.

② For m = 1, 2, ..., T, where m is the iteration index, learn on the data with weight distribution D_m to obtain the weak classifier H_m(x): x → {-1, +1} with the lowest error; its classification error rate is

e_m = Σ_{i=1}^{N_T} w_mi · I( H_m(x_i) ≠ y_i )

③ The weight coefficient of the weak classifier in each iteration is

α_m = (1/2) · ln( (1 - e_m) / e_m )

④ Update the weight distribution of the training set:

D_{m+1} = (w_{m+1,1}, ..., w_{m+1,i}, ..., w_{m+1,N})

w_{m+1,i} = ( w_mi / Z_m ) · exp( -α_m y_i H_m(x_i) )

where Z_m is the normalization factor, Z_m = Σ_{i=1}^{N_T} w_mi · exp( -α_m y_i H_m(x_i) ).

⑤ By iterating in this way, the weak classifiers are combined to finally obtain the strong classifier:

H(x) = sign( Σ_{m=1}^{T} α_m H_m(x) )
because an AdaBoost face detection algorithm based on haar-like characteristics is packaged in an open source library OpenCV, the invention carries out face detection by utilizing a haarcascade _ front _ default.xml classifier file which is self-trained in OpenCV, and a CascadeClassifier is a cascade classifier class defined by the OpenCV, wherein a multi-scale detection method is packaged, an image to be detected is input, the face detection is carried out on the image to be detected by loading the xml classifier file for detecting the face, and a possible face area rectangular frame is output.
Table 3 shows the face-detection accuracy of the AdaBoost algorithm and of a threshold skin-color model, with and without interference from backgrounds of similar skin color. Analysis of Table 3 shows that when the video image contains a background of similar skin color, or other parts of the body with similar skin color, this interference makes the detection range of a face-detection algorithm based on a threshold skin-color model insufficiently accurate and can cause false detections. AdaBoost performs face classification and detection mainly from Haar features, so it can exclude the interference of similar skin colors; it has high computational efficiency and accuracy and can detect the face quickly without feature screening, which is why the AdaBoost algorithm is used for face detection.
Table 3: comparison of face-detection accuracy between the AdaBoost algorithm and the threshold skin-color model
Considering that the variation range of the face position of the driver is small in the actual driving process, if the face detection and positioning is performed on each frame of the video image, not only the time complexity is increased, but also the interrelation between the continuous frames cannot be fully utilized. Therefore, in order to better position the face in the subsequent video image and improve the accuracy and robustness of detection, after the face is detected for the first time, the detected face is tracked in real time by adopting a target tracking algorithm based on a scale space.
The Discriminative Scale Space Tracker (DSST) algorithm is an improvement on the MOSSE algorithm. Although MOSSE improves tracking accuracy while reducing computational complexity, greatly improving the performance of correlation-filter tracking, its input when solving the filter is the grayscale feature of the image, and such a low feature dimension cannot reflect characteristics such as the texture and edges of the target well. It also only estimates the translational motion of the center of the target region between frames and does not consider the scale change of the target during motion, so it cannot track the target well when its scale changes. To address these shortcomings of MOSSE, M. Danelljan, G. Häger, F. Khan et al. proposed a joint translation-scale tracking method using correlation filters in a three-dimensional scale space. DSST replaces the original grayscale feature with the HOG feature, so that the target can be described better. In addition, to better adapt to scale changes of the tracked target, a scale correlation filter is added, and the position change and the scale change are tracked by the two filters separately: a two-dimensional position filter (translation filter) evaluates the change in target position, a one-dimensional scale filter performs the target scale estimation, and the three-dimensional joint position-and-scale (translation-scale) filtering is used for target localization. The two filters are relatively independent and can therefore be trained and tested with different features and different ways of computing them.
(1) Position dependent filter
Filter training
A sample twice the size of the target is collected, d-dimensional features are extracted from each pixel of the sample, and the feature map is denoted f^l, l = 1, 2, ..., d. To construct the optimal correlation filter h, the following objective function is minimized over the feature dimensions l:

ε = ‖ Σ_{l=1}^{d} h^l ★ f^l - g ‖² + λ Σ_{l=1}^{d} ‖ h^l ‖²        (18)

where ★ denotes circular correlation, l indexes a feature dimension, and λ is the coefficient of the regularization term, set to 0.01; the λ term avoids a zero denominator when solving for the frequency-domain parameters of the filter and also controls the range of variation of the filter parameters (the smaller λ is, the larger that range). The desired correlation output g is a Gaussian function with parameterized standard deviation, and f^l, h^l and g all have the same dimensions and size.

Taking the Fourier transform of equation (18), setting the partial derivative to zero and solving gives the filter

H^l = ( Ḡ · F^l ) / ( Σ_{k=1}^{d} F̄^k · F^k + λ )        (19)

where capital letters denote the corresponding values after the discrete Fourier transform (DFT); that is, F^l is obtained by taking the two-dimensional DFT of each feature dimension of f, and G by taking the two-dimensional DFT of g.
For all training samples f_1, f_2, ..., f_t, to simplify the computation of equation (19), the numerator A_t^l and denominator B_t of the filter H_t^l are updated separately as

A_t^l = (1 - η) · A_{t-1}^l + η · Ḡ_t · F_t^l
B_t = (1 - η) · B_{t-1} + η · Σ_{k=1}^{d} F̄_t^k · F_t^k        (20)

where η is the learning rate (η = 0.025) and t is the number of samples. Substituting G and F into these equations gives the value of the filter template H; the simplified form of equation (19) is

H_t^l = A_t^l / ( B_t + λ )        (21)
estimation of target position
The target position estimation process is shown in FIG. 4. For the feature map z_t of the t-th frame image, the two-dimensional DFT of each dimension z^l is likewise computed to obtain Z_t^l, and the new target position is determined by the maximum of the correlation-filter response y_t obtained via the inverse DFT:

y_t = F^{-1}{ Σ_{l=1}^{d} Ā_{t-1}^l · Z_t^l / ( B_{t-1} + λ ) }        (22)

where A_{t-1}^l and B_{t-1} are the numerator and denominator of the filter updated in the previous frame.
(2) Scale filter
The model updating and the filter response solving process in the training process of the scale filter are consistent with the position filter.
Filter training
As shown in FIG. 5, scale sampling is performed with the target position as the center, and the scales are selected according to

a^n · P × a^n · R,   n ∈ { -⌊(S-1)/2⌋, ..., ⌊(S-1)/2⌋ }        (23)

where P × R is the target size in the current frame, a is the scale factor (a = 1.02) and S is the size of the scale filter (S = 33).

The target image is scaled according to equation (23), S samples of different scales are selected, and d-dimensional HOG features are extracted from each sample to form a pyramid with S layers. Taking these features as the training sample, the feature f^l of each dimension is a 1 × S vector; a one-dimensional DFT of each feature dimension of f gives F^l, and a one-dimensional DFT of g gives G, where g is the desired output response constructed from a Gaussian function, of size 1 × S. The correlation filter H is then obtained according to equation (21) and used to predict the output scale.
Size estimation
As shown in FIG. 6, in a new frame the two-dimensional position correlation filter is first used to determine the new candidate position of the target. The one-dimensional scale correlation filter then takes the current center position as the center point and obtains S candidate blocks of different scales; d-dimensional features are extracted from each block to form a new feature map z, and the DFT of each dimension gives Z^l. The response y is then obtained from equation (22); y is a vector of dimension 1 × S, and the scale corresponding to the maximum value in y is the final target scale.
Because the DSST algorithm requires the position in the initial frame to be marked manually and cannot track well when the target is occluded by a foreign object or lost, feedback from the tracking result must be used during target detection to decide whether to update the model. The peak and the fluctuation of the response map reveal, to some extent, the confidence of the tracking result, so two confidence indicators are introduced: the maximum response value F_max and the average peak-to-correlation energy (APCE). In general, the larger F_max is, the better the tracking result; APCE reflects the degree of fluctuation of the response map and the confidence of the detected target:

APCE = | F_max - F_min |² / mean( Σ_{w,h} ( F_{w,h} - F_min )² )

where F_max and F_min are the maximum and minimum values of the response and F_{w,h} is the value of the response map at position (w, h). When the detected target closely matches the correct target, the response map has a single sharp peak and is smooth in all other regions; the sharper the correlation peak, the larger APCE becomes and the higher the localization accuracy. If the target is occluded or lost, APCE drops significantly. If F_max and APCE of the current frame are both greater than the ratios β_1 and β_2 (β_1 = 0.7, β_2 = 0.45), the tracking result of the current frame is considered highly reliable and the model is updated; otherwise face detection must be performed on the current frame again.
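The following Python sketch computes APCE and applies a high-confidence update check. The use of the historical averages of F_max and APCE as the reference values for the ratios β_1 and β_2 is an assumption made for illustration; the text above only states that both quantities must exceed the corresponding ratios.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a correlation-filter response map."""
    f_max, f_min = float(response.max()), float(response.min())
    return (f_max - f_min) ** 2 / float(np.mean((response - f_min) ** 2))

def should_update(response, fmax_hist, apce_hist, beta1=0.7, beta2=0.45):
    # Decide whether the tracking result is reliable enough to update the model.
    f_max, a = float(response.max()), apce(response)
    if fmax_hist and apce_hist:
        ok = (f_max >= beta1 * np.mean(fmax_hist)
              and a >= beta2 * np.mean(apce_hist))
    else:
        ok = True                      # no history yet: accept the first result
    fmax_hist.append(f_max)
    apce_hist.append(a)
    return ok
```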
As shown in FIG. 2, the specific operation steps are as follows:

Step 2.21: take the face region and scale obtained by face detection as the initial position P_1 and scale S_1 of the target, and train the position correlation filter and the scale correlation filter on the face region to obtain the position model and the scale model.

Step 2.22: according to the target position P_{t-1} and scale S_{t-1} of the previous frame I_{t-1}, collect from the current frame I_t a feature sample whose size is twice that of the previous frame's target; using this feature sample and the position model of the previous frame, calculate the maximum response value of the position correlation filter according to equation (22) to obtain the new target position P_t.

Step 2.23: with the new target position P_t as the center point, use the one-dimensional scale correlation filter to obtain S candidate samples of different scales according to the scaling rule, and extract d-dimensional features from each candidate sample to obtain the feature sample of the current frame; then, using this feature sample and the scale model, calculate the response values of the 1 × S-dimensional scale correlation filter according to equation (22); the scale corresponding to the maximum response value is the final target scale S_t.

Step 2.24: if the maximum response value and the average peak-to-correlation energy of the correlation filters in the current frame both satisfy the update-strategy condition, extract the features f_t^trans and f_t^scale from the current frame I_t according to the position P_t and scale S_t and update the position model and the scale model according to equation (20); otherwise perform face detection again on the current frame I_t.
3. Locate the feature points of the human face with a cascaded regression tree algorithm, locate the eye region and the mouth region from the located feature points, identify the eye state with an SVM (support vector machine) classifier, and identify the mouth state by calculating the mouth aspect ratio.
The face key-point detection method based on the cascaded regression trees (ERT) algorithm learns local features of each key point, combines those features and detects the key points with linear regression. The ERT algorithm is a cascaded-regression-tree face key-point localization algorithm proposed by Kazemi and Sullivan; it uses a model of 68 labeled facial key points, as shown in FIG. 7, and provides a general framework based on gradient boosting for learning a cascade of regression trees that estimates the facial landmark positions directly from a sparse subset of pixel intensities. The algorithm comprises two stages: building the model through training, and fitting the model.
Firstly, establishing a model
The algorithm uses two layers of regression to build the mathematical model. The first-layer regression iterates as

S^(t+1) = S^(t) + r_t( I, S^(t) )

where the shape vector S = (x_1^T, x_2^T, ..., x_p^T)^T represents the coordinates of all p facial landmarks in image I, and x_i ∈ R² is the (x, y) coordinate of the i-th facial landmark in image I. S^(t) is the shape vector of feature-point coordinates predicted at the t-th iteration and S^(t+1) is the prediction of the (t+1)-th iteration. Each regressor r_t in the cascade predicts an update vector from the image: its input is the current training image and shape vector, and its output is the position update for all key points. In this layer of the cascade, every pass through a first-level regressor updates the positions of all key points once, giving more accurate positions.
The second layer of regression is the iteration inside the regressor r_t. Assume a training data set {(I_1, S_1), ..., (I_n, S_n)}, where n is the number of samples, I_i is a face image and S_i is the shape vector of the face key-point positions of image I_i. To learn the regression functions r_t of the cascade, triplets (I_{π_i}, Ŝ_i^(t), ΔS_i^(t)) are created from the training data, where I_{π_i} is a face image from the data set, Ŝ_i^(t) is the key-point shape vector predicted for the i-th triplet at the t-th iteration of the first-level cascaded regression, and ΔS_i^(t) is the difference between the true value and the prediction:

Ŝ_i^(t+1) = Ŝ_i^(t) + r_t( I_{π_i}, Ŝ_i^(t) )
ΔS_i^(t+1) = S_{π_i} - Ŝ_i^(t+1)

This process is iterated through the above equations until the cascade of T regressors r_0, r_1, ..., r_{T-1} has been learned.
For the training data {(I_{π_i}, Ŝ_i^(t), ΔS_i^(t))} and a learning rate 0 < ν < 1, the regression function r_t is learned with a gradient tree boosting algorithm using a sum-of-squared-error loss, as follows:

(1) Initialize

f_0( I, Ŝ ) = argmin_γ Σ_{i=1}^{N} ‖ ΔS_i^(t) - γ ‖²

and then iterate over k = 1, ..., K:

(2) Fit a regression tree to the residuals r_ik to obtain the weak regression function g_k( I, Ŝ ), where, for i = 1, ..., N, the residual r_ik is

r_ik = ΔS_i^(t) - f_{k-1}( I_{π_i}, Ŝ_i^(t) )

(3) Update with the obtained weak regression function:

f_k( I, Ŝ ) = f_{k-1}( I, Ŝ ) + ν · g_k( I, Ŝ )

(4) Repeat steps (2) and (3) until K iterations have been performed, giving f_K( I, Ŝ ).

(5) The learned regression function is r_t( I, Ŝ ) = f_K( I, Ŝ ).
Model fitting
Obtaining a regression model through K iterations, wherein the specific steps of model fitting are as follows:
(1) and initializing a feature point shape vector of each face image, wherein the initial shapes of all the images are the same.
(2) And establishing a feature pool, randomly selecting two points in the feature pool, and calculating the pixel difference of each image at the two points according to the shape of the feature points of the image.
(3) Construct the regression tree. A splitting threshold is generated randomly; if the pixel difference of an image is smaller than the threshold it is split to the left, otherwise to the right, and all images are split in this way into a left part and a right part. The process is repeated several times, and the optimal node θ is obtained by minimizing the squared error with the objective function

E( Q, θ ) = Σ_{s ∈ {l, r}} Σ_{i ∈ Q_{θ,s}} ‖ r_i - μ_{θ,s} ‖²

where θ is the candidate node, l and r denote the left and right subtrees, and μ_{θ,s} is the mean of the residuals produced by the current partition. After the optimal node is obtained, the coordinate values of the two feature points and the splitting threshold are stored. This step is then repeated for each node split until a leaf node is reached.
(4) Compute the residual of each leaf node. For every image the difference between its current shape and its real shape is calculated; the differences of all images falling into the same leaf node are averaged and the resulting residual is stored in that leaf node.
(5) Update the shape of each image. The current shape S is updated to the current shape plus the residual, i.e. S ← S + ΔS.
(6) Repeat steps (2) to (4) until the finally obtained feature-point shape vector represents the real shape.
Dlib is a cross-platform open-source library that provides implementations of many machine learning, deep learning and image processing algorithms. Because Dlib implements the ERT algorithm and ships a face key-point detector trained on the iBUG 300-W data set that can locate the 68 feature points on any face, the invention uses the Dlib implementation for face key-point detection. The experimental result is shown in fig. 8; it shows that the ERT algorithm is robust to different facial expressions and head orientations and locates the facial feature points well at different angles.
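As an illustration of this usage, the sketch below runs Dlib's pre-trained 68-point predictor on a single video frame. The model file name is the one commonly distributed with Dlib, and the use of OpenCV for the grayscale conversion is an assumption; the patent itself only states that the Dlib implementation is used.

# Sketch: locating the 68 facial landmarks with Dlib's ERT-based predictor.
# The model file name is the one distributed with Dlib (an assumption here).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_landmarks(frame):
    """Return a list of 68 (x, y) landmark tuples for the first detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)          # upsample 0 times for speed
    if not faces:
        return None
    shape = predictor(gray, faces[0])  # fit the ERT cascade inside the face box
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]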
To locate the eyes simply and quickly, the invention locates the eye region from the positions of the eye feature points obtained by the facial key-point detection.
As shown in the face feature-point model of fig. 9 a), the position of each feature point is identified by its serial number; for example, the left eye corresponds to serial numbers 36-41 and the right eye to serial numbers 42-47. The left and right eye regions extracted from the serial numbers of the eye feature points are the rectangular regions shown in fig. 9 b). The positioning calculation rule is as follows:
The width W and height H of the localized eye region are computed from W_e and H_e according to equation (32), where W_e is the horizontal distance between eye feature points 36 and 39, and H_e is the average of the vertical distances between feature points 37 and 41 and between feature points 38 and 40.
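A minimal sketch of this eye-region localization for the left eye is shown below. Because the exact scaling coefficients of equation (32) are not reproduced in the text, the margin factors k_w and k_h are purely illustrative assumptions.

# Sketch: cropping the left-eye region from landmarks 36-41.
# The exact width/height scaling of equation (32) is not reproduced in the text,
# so the margin factors below are illustrative assumptions.
def left_eye_box(pts, k_w=1.4, k_h=2.0):
    """pts: list of 68 (x, y) landmarks. Returns (x, y, w, h) of the eye rectangle."""
    w_e = abs(pts[39][0] - pts[36][0])                     # horizontal span, points 36-39
    h_e = (abs(pts[41][1] - pts[37][1]) +
           abs(pts[40][1] - pts[38][1])) / 2.0             # mean vertical span, 37-41 and 38-40
    w, h = k_w * w_e, k_h * h_e                            # assumed scaling of W_e, H_e
    cx = (pts[36][0] + pts[39][0]) / 2.0
    cy = (pts[37][1] + pts[38][1] + pts[40][1] + pts[41][1]) / 4.0
    return int(cx - w / 2), int(cy - h / 2), int(w), int(h)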
In order to recognize the open/closed state of the eyes accurately and quickly, the eye aspect ratio (EAR) is calculated. The EAR varies little between individuals while the eyes are open, and it is invariant to uniform scaling of the image and to rotation of the face. For the 6 key points (P1-P6) detected on the left eye in the open and closed states shown in fig. 10, the eye aspect ratio is calculated as:
EAR = (||P2 − P6|| + ||P3 − P5||) / (2 · ||P1 − P4||)
wherein the numerator represents the euclidean distance between the eye vertical feature points and the denominator is the euclidean distance between the eye horizontal feature points.
Taking the left eye as an example, according to the six feature points, the euclidean distances between the vertical key points and between the horizontal key points can be calculated, and the calculation formula of the euclidean distances between the two points is as follows:
Dis(P_a, P_b) = sqrt((P_a.x − P_b.x)^2 + (P_a.y − P_b.y)^2)
where P_a.x and P_a.y are the x and y coordinates of point P_a. The horizontal and vertical Euclidean distances of the eye can then be expressed as
Eye_h = Dis(P1, P4) (35)
Eye_v = Mean(Dis(P2, P6), Dis(P3, P5)) (36)
Where Mean (A, B) represents taking the average of A and B. The aspect ratio of the eye at this time can be expressed as:
EAR = Eye_v / Eye_h (37)
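As a brief illustration, the EAR of one eye can be computed from its six landmarks as follows, with the point ordering assumed to follow fig. 10.

# Sketch: eye aspect ratio from the six eye landmarks P1..P6.
from math import dist  # Python 3.8+

def eye_aspect_ratio(p):
    """p: list of six (x, y) points P1..P6 ordered as in fig. 10."""
    eye_v = (dist(p[1], p[5]) + dist(p[2], p[4])) / 2.0   # mean vertical distance
    eye_h = dist(p[0], p[3])                              # horizontal distance
    return eye_v / eye_h

The per-frame feature of equation (38) is then simply the mean of the left-eye and right-eye values.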
According to equation (37), the aspect ratios of the left and right eyes were calculated for 200 consecutive video frames. The EAR value remains substantially constant while the eyes are open and drops to approximately zero when the eyes are closed. Since both eyes open and close essentially synchronously, the average EAR of the two eyes is taken as the feature for open/closed-eye recognition in order to identify the eye state more accurately:
EAR = Mean(EAR_left, EAR_right) (38)
Eye-state recognition is performed according to the above formula by calculating the mean EAR of the two eyes; the result is shown in fig. 11. When a blink occurs, the EAR value decreases rapidly to near 0 and then rises slowly back towards the normal open-eye EAR value. Based on this behaviour, the EAR value can be used as a feature for identifying the open/closed eye state, and blink detection can also be performed from it.
After the eye region has been located, a locally adaptive threshold algorithm is used to binarize the eye image; after a morphological opening operation and median filtering, the contour and details of the eye are presented more clearly. When the eye is closed, the large dark pupil region disappears, although dark regions such as eyelashes and eyelids may still remain. The number of black pixels in the binary image therefore drops sharply when the eyes are closed compared with when they are open. However, the number of black pixels also varies with the distance between the eyes and the camera: as the distance increases, the eye occupies a smaller area in the image and the number of black pixels decreases. Fig. 12 shows the number of black pixels in the eye region while the right eye opens and closes. A threshold could be set to distinguish open from closed eyes from the 57th frame onwards, but from the 109th frame, when the eye moves farther from the camera, the number of black pixels decreases regardless of whether the eye is open or closed, and the open/closed state can no longer be judged from such a threshold.
To reduce the influence of the eye-to-camera distance, the eye images are first normalized to the same size and the difference in the number of black pixels between two consecutive frames is calculated. An eye-closing action is normally observed over more than two consecutive frames, so when the difference remains below 0 for more than two frames, the consecutive differences are accumulated and a threshold on the accumulated difference is set to judge the open/closed state. However, as can be seen from fig. 13 and fig. 14, at frame 54 the difference is greater than 0 and is not accumulated, so the frame is wrongly recognized as an open-eye state.
Therefore, to solve this problem, the present invention accumulates the difference using an adaptive threshold method. Two states of "state 0" and "state 1" are defined, and when the difference value of the black pixels of the binarized image of the human eye region is smaller than 0, the state is changed from "state 0" to "state 1". In the state 1, if the difference is smaller than a threshold value T (t), accumulating the difference and keeping the state unchanged; if the difference is greater than the threshold T (t), no difference is accumulated and the state changes to "state 0".
The accumulated black-pixel difference F_black of the binarized eye-region image under the adaptive threshold is computed, for the t-th frame in "state 1", as
F_black(t) = D(t−1) + Δn(t), if Δn(t) < T(t); F_black(t) = 0 and the state returns to "state 0" otherwise, (39)
where n(t) is the number of black pixels of the t-th frame, Δn(t) is the difference in the number of black pixels between the t-th frame and the (t−1)-th frame, D(t−1) is the accumulated black-pixel difference of the (t−1)-th frame in "state 1", and T(t) is the adaptive threshold derived from D(t−1) and a constant α between 0 and 1, whose optimal value is determined by the accuracy of open/closed-eye detection.
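A minimal sketch of this two-state accumulation is given below. Since the exact form of the adaptive threshold T(t) in equation (39) is not reproduced in the text, T(t) = α · |D(t−1)| is used here purely as an illustrative assumption.

# Sketch of the two-state accumulation of black-pixel differences.
# The exact adaptive threshold T(t) of equation (39) is not reproduced in the text;
# T(t) = alpha * abs(D_prev) is an assumption used here for illustration.
class BlackPixelAccumulator:
    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = 0        # "state 0": nothing accumulated
        self.d_prev = 0.0     # accumulated difference D(t-1)

    def update(self, delta_n):
        """delta_n: black-pixel difference between frame t and t-1. Returns F_black(t)."""
        if self.state == 0:
            if delta_n < 0:                       # black pixels start dropping: enter "state 1"
                self.state = 1
                self.d_prev = delta_n
            else:
                self.d_prev = 0.0
        else:
            t_adaptive = self.alpha * abs(self.d_prev)   # assumed adaptive threshold T(t)
            if delta_n < t_adaptive:
                self.d_prev += delta_n            # keep accumulating in "state 1"
            else:
                self.state = 0                    # large positive jump: back to "state 0"
                self.d_prev = 0.0
        return self.d_prev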
Frame 54 can then be correctly identified as a closed-eye frame, because the adaptive threshold T(t) changes with the accumulated difference at frame t−1. Fig. 15 shows the accumulated black-pixel difference of the binary eye image computed with the adaptive threshold; it can be seen that this method identifies the closed-eye state well.
To recognize the open and closed states of the eyes more accurately, the eye aspect ratio and the accumulated black-pixel difference are used as input parameters of an SVM classifier, and the trained classifier is used to recognize the eye state in the image. The SVM is a supervised machine learning algorithm for two-class problems; its essence is to find the separating hyperplane with the largest margin from the classification sample points, so that the margin between the positive and negative training samples is maximized. The algorithm can be used for classification and regression analysis and copes well with small-sample, nonlinear and high-dimensional problems.
The method uses the SVM classifier for binary classification and mainly comprises five parts: data selection, data processing, feature parameter normalization, model training and testing.
(1) Data selection
2000 open-eye samples and 1000 closed-eye samples were selected from the 80 videos of the ZJU blink video data set; 2000 open-eye samples and 1000 closed-eye samples were selected from the NTHU driver fatigue detection video data set; and 2000 open-eye samples and 4000 closed-eye samples were collected by the authors, giving a total of 6000 open-eye and 6000 closed-eye sample images, with and without glasses, each sample containing a human face.
(2) Data processing
First, the face key points are located in each sample; then the eye aspect ratio and the accumulated black-pixel difference are calculated, i.e. two feature values are extracted from each sample.
Computing characteristic value EAR
Since the eye aspect ratio EAR is completely invariant to uniform scaling and rotation of the image, for each sample the mean aspect ratio of the two eyes is calculated directly according to equation (38), after the eye key points have been located, and is taken as the first feature value F_1 of the sample.
Calculating the cumulative difference of human eye black pixels
Because the number of black pixels in the eye region changes with the distance between the eyes and the camera, for each sample the right-eye region is located according to equation (32), the eye region is scaled to the same size, and the black-pixel count of the eye region is then calculated. For the sample data of different experimenters, the right-eye black-pixel count in the half-open-eye state of each experimenter is used as the reference value for the first frame: the accumulated black-pixel difference of the experimenter's first frame is its black-pixel count minus the black-pixel count of the half-open-eye state, and the accumulated black-pixel differences of the experimenter's remaining sample data are accumulated according to equation (39). The accumulated black-pixel difference is taken as the second feature value F_2 of each sample.
The open-eye and closed-eye samples are processed separately. For each closed-eye sample image, the two feature values obtained as above are written to a corresponding text file, with one sample per row and one feature value per column; each open-eye sample image is processed in the same way as the closed-eye samples.
(3) Feature parameter normalization
Because the two types of feature parameters extracted from each sample differ in magnitude, the parameter with the smaller values would contribute little during model training. To balance the weight of each feature parameter in training, the data of the two feature parameters are normalized:
y_i = 2(x_i − x_min) / (x_max − x_min) − 1 (40)
where y_i is the normalized value, lying in the interval [−1, 1]; x_i is the original feature value; x_max and x_min are respectively the maximum and minimum of the x_i; and N is the number of training samples.
After the text data of the sample feature values has been obtained in step (2), the feature values in the two files are read into a two-dimensional array in which each row is a sample and each column is a feature value, and the class label of each sample is stored in a label array. The maximum x_max and minimum x_min of each column of the array are computed, and for every column the normalized value y_i of each feature value x_i is calculated by equation (40); when the whole array has been processed, the resulting two-dimensional array contains the normalized values of all sample feature values.
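A short sketch of this column-wise normalization, under the reconstruction of equation (40) given above, might look as follows.

# Sketch: column-wise min-max normalisation of the feature columns to [-1, 1].
import numpy as np

def normalise_features(x):
    """x: (N, 2) array of raw feature values; returns the normalised array."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0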
(4) Model training and parameter optimization
The SVM classifier can be expressed as:
f(x) = sgn( Σ_(i=1)^(N) α_i y_i K(x, x_i) + b )
where N is the number of training samples; y_i ∈ {−1, 1} is the class label of training sample i, with 1 denoting closed eyes and −1 denoting open eyes; K(x, x_i) is the kernel function; the constant b is the bias term; and the coefficients α_i are obtained by solving a quadratic programming problem with linear constraints.
SVMs commonly use four kernel functions: the linear kernel (LINEAR), the polynomial kernel (POLY), the radial basis function kernel (RBF) and the sigmoid kernel. A suitable kernel must be chosen before classifier training; since the RBF kernel can handle a nonlinear relationship between the features and the class labels, the RBF kernel is adopted for model training. The RBF kernel involves two undetermined variables: the penalty coefficient C of the loss function and the kernel parameter γ, which controls the linear separability after the nonlinear problem has been mapped to a high-dimensional space. The choice of these two variables has a decisive effect on prediction accuracy.
To search for the optimal penalty coefficient C and kernel parameter γ and improve prediction accuracy, the parameters C and γ are optimized with K-fold cross-validation (K-CV). Of the 12000 groups of collected feature values, 8000 groups are divided evenly into 10 folds; each time, 9 folds are used as the training set and the remaining fold as the validation set. The feature values of the training and validation sets are normalized as in step (3) and the class labels are stored in the corresponding label arrays. The optimization finds that the model predicts and classifies best when C = 2.04 and γ = 0.9.
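The sketch below illustrates this training and parameter search. The patent does not name an SVM library, so the use of scikit-learn, the grid of candidate C and γ values and the accuracy scoring are all assumptions; only the RBF kernel and the 10-fold cross-validation come from the text.

# Sketch: training the RBF-kernel SVM and tuning C and gamma with 10-fold
# cross-validation. scikit-learn is an assumption; the patent does not name a library.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def train_eye_classifier(features, labels):
    """features: (N, 2) normalised EAR / black-pixel features; labels: 1 closed, -1 open."""
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": np.logspace(-2, 4, 13), "gamma": np.logspace(-3, 2, 11)},
        cv=10,                      # K-CV with 10 folds, as in the text
        scoring="accuracy",
    )
    grid.fit(features, labels)
    print("best C =", grid.best_params_["C"], "best gamma =", grid.best_params_["gamma"])
    return grid.best_estimator_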
(5) Experimental detection
(ii) evaluation of parameters in an experiment
In order to evaluate the performance of the training model for predicting the eye opening and closing state, the Accuracy (Accuracy), Precision (Precision) and Recall (Recall) are selected as evaluation parameters. For each sample of the test set, the results of the identification may appear as follows:
TP (True Positive): the test sample is predicted to be in the closed-eye state and is actually in the closed-eye state.
FP (False Positive): the test sample is predicted to be in the closed-eye state but is actually in the open-eye state.
TN (True Negative): the test sample is predicted to be in the open-eye state and is actually in the open-eye state.
FN (False Negative): the test sample is predicted to be in the open-eye state but is actually in the closed-eye state.
The three evaluation parameters were calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
experimental results and analysis
The remaining 4000 sets of data from the sample data were selected for testing the open-closed eye status, and the test results are shown in the following table.
Table 4 open/close eye state detection results
As can be seen from table 4, the proposed method identifies the open/closed eye state with high accuracy. Table 5 compares the recognition results of different algorithms; the experiments show that training the classifier on the fused features gives a higher recognition accuracy for the open/closed eye state than single-feature eye-state recognition methods.
TABLE 5 comparison of recognition results of different algorithms
According to the face feature-point positioning, the mouth feature points have serial numbers 48-67, so the mouth can be located and its state identified from these serial numbers, as shown in fig. 16.
The mouth state is judged by calculating the mouth aspect ratio (MAR). To make the MAR value more accurate, the 10 feature points marked P1-P10 in fig. 16 are used to calculate the MAR; the Euclidean distance is calculated as in expression (43).
The MAR is computed from P1-P10 as the ratio of the vertical distances between the upper- and lower-lip feature points to the horizontal distance between the mouth corners.
Under normal driving conditions the mouth is closed; when the driver talks to someone the lips open and close continuously with a small opening amplitude; when the driver yawns from fatigue the mouth opens widely and for a long time. To distinguish mouth states such as speaking and yawning, the states were simulated using the aspect-ratio-based method; the detection result is shown in fig. 17. From fig. 17, the mouth is closed when MAR ≤ 0.4, in a normal speaking state when 0.4 < MAR ≤ 0.8, and in a yawning state when MAR > 0.8. From this analysis, the MAR can be used as a feature to identify the mouth state.
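These thresholds translate directly into a small decision rule, sketched below.

# Sketch: mouth state from the mouth aspect ratio, using the thresholds of fig. 17.
def mouth_state(mar):
    if mar <= 0.4:
        return "closed"
    elif mar <= 0.8:
        return "speaking"
    else:
        return "yawning"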
4. Eye fatigue parameters and mouth fatigue parameters are calculated from the eye state and the mouth state respectively, and head fatigue parameters are calculated from the position information of the located feature points.
Fatigue parameters are extracted from the states of the eyes, mouth and head and then combined to establish a fatigue-state recognition model; the driver's fatigue state is judged from a multi-feature weighted sum. The main extracted parameters include the eye-closure frame ratio (ECR), blink frequency (BF), maximum continuous eye-closure time (MECT), yawning frequency (YF) and nodding frequency (NF).
4.1 extraction of eye fatigue information
When a person is fatigued, the blink frequency increases, the eye-closure time lengthens and yawning appears; in severe cases the person may doze off. According to research, a person normally blinks 10 to 25 times per minute and each eye closure lasts about 0.2 s. Based on these phenomena, the invention selects ECR based on the PERCLOS criterion, MECT and BF, the three eye indexes that best reflect the fatigue state, as the eye fatigue feature parameters.
(1) ECR based on PERCLOS criterion
The PERCLOS criterion is recognized as the most effective and reliable criterion for fatigue-driving detection; it calculates the percentage of time within a period during which the eyes are closed. Depending on how eye closure is defined, the criterion has three judgment standards: EM, P70 and P80. Among them P80, the proportion of time during which the eyelid covers more than 80% of the pupil area, is the most suitable for identifying fatigued driving. Because it is difficult to measure the eyelid coverage of the pupil accurately in practice, and the closed-eye state is already judged reliably as described above, the percentage of closed-eye frames in the total number of frames of the period (the eye-closure frame ratio, ECR) is taken as the eye feature parameter:
ECR = n_T / N_T × 100%
where n_T is the number of closed-eye frames in the time period and N_T is the total number of frames in the time period.
(2) Maximum duration of eye closure
Maximum eye-closure time (Max Eye Close Time, MECT): the duration from the moment the eyes are completely closed until they are completely open again, i.e. the time elapsed from t2 to t4 in fig. 18. In a fatigued state the eye-closure time often exceeds 1.5 s. If the video runs at f frames per second and the number of consecutive closed-eye frames in the time period is K_c, the continuous eye-closure time within one time period is:
t_close = K_c / f
if the continuous eye closing time in the time period exceeds the threshold value, the characteristic parameter is regarded as a fatigue state.
(3) Blink frequency
Blink frequency (BF): the number of blinks per unit time. One blink lasts from t1 to t4 in fig. 18. An awake person blinks on average about 10-25 times per minute; the number of blinks increases with fatigue but decreases with distraction or severe fatigue. The number of blinks in the time period is therefore counted, and if it falls outside the normal range this feature parameter indicates a fatigue state.
Blink detection can be performed from the EAR value. The EAR calculation results show that during one blink the EAR decreases until it approaches zero and then gradually rises back to the normal open-eye value. Let E_eye be the EAR threshold and K_eye the threshold on the number of consecutive frames with EAR < E_eye that counts as one blink. When the EAR drops below E_eye the eye begins to close, and when it rises back above E_eye towards the normal open-eye value the eye is fully open again. The number F_eye of consecutive frames with EAR < E_eye is counted; when the EAR returns to a value not less than E_eye and F_eye is greater than the set frame-count threshold K_eye, one blink is recorded.
To find the optimal thresholds E_eye and K_eye, experiments were carried out on the ZJU blink data set. The 80 videos in ZJU cover four types of clips: frontal video without glasses, frontal video with thin-framed glasses, frontal video with black-framed glasses, and upward-looking video without glasses, with 20 groups of videos of each type and one to six blinks per video, for a total of 255 blinks in the data set.
Based on the results in fig. 19, when extracting the blink-frequency eye fatigue parameter, one blink is recorded when the number of consecutive frames with EAR below the threshold E_eye exceeds the threshold K_eye and the EAR then rises back above the threshold; the number of blinks counted in the time period is the blink frequency.
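A minimal sketch of this blink-counting rule is given below; the specific values of E_eye and K_eye are placeholder assumptions, since the text only states that they were tuned on the ZJU data set.

# Sketch of EAR-based blink counting. E_eye and K_eye are the thresholds tuned on the
# ZJU data set in the text; the values used below are placeholder assumptions.
def count_blinks(ear_values, e_eye=0.2, k_eye=2):
    blinks = 0
    consecutive = 0                     # F_eye: consecutive frames with EAR < E_eye
    for ear in ear_values:
        if ear < e_eye:
            consecutive += 1
        else:
            if consecutive > k_eye:     # eye reopened after a long enough closure
                blinks += 1
            consecutive = 0
    return blinks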
Taking 60 s as a time period, the eye state within the period is statistically analysed to obtain the statistics of the eye fatigue features. The awake state is denoted by 0 and the fatigue state by 1; the maximum eye-closure time is denoted mect, the eye-closure frame ratio ecr and the number of blinks bf. The fatigue thresholds of the three eye fatigue feature values, obtained from experiments and the related literature, are shown in table 6 below:
TABLE 6 evaluation conditions for eye fatigue
4.2 mouth fatigue parameter extraction
When the driver is drowsy, he yawns repeatedly and the mouth stays open for about 6 seconds each time; at this point the driver should stop and rest and is not fit to continue driving. Based on this phenomenon, the number of yawns within a time period can be detected to assess whether the driver is fatigued. From the foregoing, one yawn is counted when the mouth aspect ratio MAR exceeds 0.7 for 15 consecutive frames. As shown in fig. 20, the time difference from t1 to t4 is the duration of one yawn, and whether a yawn occurs is detected when the mouth opening exceeds the threshold. With 0 denoting the normal state and 1 the fatigue state, the value of the mouth fatigue state is determined as follows:
V_YF = 1, if the number of yawns yf in the time period reaches N and the duration yt of a single yawn reaches t; V_YF = 0, otherwise,
where yf is the number of yawns, yt is the duration of one yawn, N is 3 and t is 4 s.
4.3 head fatigue parameter extraction
When a person is drowsy, his reactions slow down and his control of the head weakens, causing the head to droop. To stay awake he repeatedly raises his head again, so the head moves down and up repeatedly. When this occurs frequently the driver is fatigued and a traffic accident may happen at any time; detecting the nodding frequency during driving is therefore central to head-motion analysis and an important factor in fatigue-driving detection. The driver can be considered fatigued when the nodding frequency within a time period exceeds a certain threshold.
Based on the position information of the eye feature points, and considering both real-time performance and accuracy, the midpoint of the line connecting the centres of the two eyes is taken as the head-position detection point, and the nodding frequency in a time period is calculated from the change of the detection point's vertical coordinate y over time. Fig. 21 shows the relationship between the y value and the frame number while the driver is dozing.
The algorithm proceeds as follows: when the number of video frames is large, the y-coordinate sequence can be approximately fitted by a curve and its extreme points computed; the extreme points divide the curve into several monotone segments. The number of minimum points of the monotonically decreasing segments whose y value exceeds the initial position by more than 50 pixels within the time period, a value determined experimentally, is counted as the number of nods nf; if the curve has no minimum point, it is checked whether the curve decreases monotonically, in which case nf is set to 1, and otherwise to 0. The value of NF is given by formula (47):
V_NF = 1, if nf > N; V_NF = 0, otherwise (47)
If the number of nods nf in the time period is greater than the threshold N, the NF fatigue feature value is 1; otherwise it is 0. Experiments show that the fatigue-state detection accuracy is highest with N = 8.
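A simple sketch of this nod counting is shown below. A discrete local-minimum search over the per-frame y values stands in for the curve fitting described in the text; the 50-pixel offset and the threshold N = 8 come from the description above.

# Sketch: counting nods from the per-frame y coordinate of the head detection point.
def count_nods(y, offset=50, n_threshold=8):
    y0 = y[0]
    minima = [i for i in range(1, len(y) - 1)
              if y[i] < y[i - 1] and y[i] < y[i + 1] and y[i] > y0 + offset]
    if minima:
        nf = len(minima)
    else:
        # no local minimum: count one nod only if the whole curve decreases monotonically
        nf = 1 if all(b <= a for a, b in zip(y, y[1:])) else 0
    return nf, int(nf > n_threshold)    # (nod count, V_NF fatigue value)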
5. The driver's fatigue state is identified, and an early warning is given, according to the eye fatigue parameters, the mouth fatigue parameters and the head fatigue parameters.
According to the fatigue feature indexes of the eyes, mouth and head, weights are assigned according to their accuracy for fatigue judgment, and the weighted sum of the feature parameters is calculated as:
E_fatigue = V_ECR × W_1 + V_MECT × W_2 + V_BF × W_3 + V_NF × W_4 + V_YF × W_5 (48)
where E_fatigue is the weighted fatigue value, V_ECR is the fatigue value of the eye-closure frame ratio, V_MECT the fatigue value of the maximum continuous eye-closure time, V_BF the fatigue value of the blink frequency, V_NF the fatigue value of the nodding frequency, V_YF the fatigue value of the yawning frequency, and W_i are the weights corresponding to the different parameters, which satisfy
W_1 + W_2 + W_3 + W_4 + W_5 = 1.
carrying out experiment optimization through simulating fatigue, determining respective weight values of five fatigue characteristic parameters of eyes, mouths and heads, wherein the weight values of the corresponding characteristics are as follows: w1=0.2,W2=0.1,W3=0.2,W4=0.2,W5=0.3。
According to the weighted fatigue value, the state is divided into three grades: awake, fatigued and severely fatigued. The weighted values of the fatigue feature parameters are mapped to the fatigue grades, and the driver's driving state is judged from this correspondence, which is shown in table 7:
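The weighted fusion and grade mapping can be sketched as follows, using the optimal weights of equation (48) and the grade thresholds of 0.3 and 0.7 stated in claim 9.

# Sketch: weighted fusion of the five 0/1 fatigue values (equation (48)) and the
# grade mapping (awake < 0.3 <= fatigue < 0.7 <= severe fatigue, as in claim 9).
WEIGHTS = {"ECR": 0.2, "MECT": 0.1, "BF": 0.2, "NF": 0.2, "YF": 0.3}

def fatigue_grade(v):
    """v: dict with 0/1 fatigue values for keys ECR, MECT, BF, NF, YF."""
    e_fatigue = sum(WEIGHTS[k] * v[k] for k in WEIGHTS)
    if e_fatigue < 0.3:
        return e_fatigue, "awake"
    elif e_fatigue < 0.7:
        return e_fatigue, "fatigue"
    return e_fatigue, "severe fatigue"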
TABLE 7 fatigue value and fatigue grade corresponding relation table
To verify the performance of the method, verification experiments were carried out on a PC with a 64-bit operating system, using the Python programming language together with the OpenCV 2.4.13 and Dlib 18.17 libraries. The test data come from the NTHU driver fatigue detection video data set, which covers 5 different scenarios: daytime with glasses, daytime with sunglasses, daytime without glasses, night with glasses and night without glasses. Each scenario contains 16 groups of data, and each group contains awake, fatigued and severely fatigued states.
The driver's fatigue state is detected over 60 s periods. A total of 165 groups of data drawn from the 5 scenarios were used to find the optimal weight of each fatigue index, with each weight varied between 0.1 and 0.6. Table 8 shows the effect of different weight choices on the fatigue-state recognition accuracy; the data in table 8 show that the fatigue recognition rate is highest when the fatigue-index weights take the optimal values of formula (48). The fatigue-grade recognition accuracy is calculated as follows:
Accuracy = (number of videos whose fatigue grade is correctly recognized / total number of test videos) × 100%
TABLE 8 fatigue index weight optimization
With the optimal weights of the fatigue indexes selected, the fatigue state of the 75 videos in the remaining 5 groups of data of each scenario was recognized. Table 9 gives the fatigue recognition results under the different environments; table 10 gives the detailed feature-parameter values, fatigue values and corresponding recognition results for the 15 daytime-with-glasses videos.
TABLE 9 fatigue recognition results under different environments
TABLE 10 fatigue recognition results for wearing glasses in daytime
As can be seen from the tables above, the proposed fatigue recognition method is more accurate in the daytime than at night and less accurate when sunglasses are worn, but its overall recognition performance is good.
Table 11 lists the average per-frame running time of each module of the method. From table 11, the total running time is 159.5903 ms per frame, and once the face has been detected the running time is about 17.1003 ms per frame. When a face is falsely detected or the tracked target is lost, detection is immediately performed again on the next frame; even if misdetection occurs for 3-5 seconds within a time period, a processing speed of more than 30 frames per second can still be maintained, so the fatigue recognition method has good real-time performance.
TABLE 11 average run time of modules
The invention also provides a fatigue driving recognition system based on multiple characteristics, which comprises:
the image acquisition and processing unit is used for acquiring a video single-frame image in real time and preprocessing the video single-frame image; the face detection and tracking unit is used for carrying out face detection on the preprocessed video image by adopting an AdaBoost algorithm based on Haar-like characteristics and tracking the detected face in real time by adopting a target tracking algorithm based on a scale space; the positioning and state recognition unit is used for positioning the feature points of the human face, respectively positioning the eye region and the mouth region according to the positioned feature points, recognizing the eye state by adopting an SVM classifier, and recognizing the mouth state by calculating the aspect ratio of the mouth; the parameter calculation unit is used for calculating eye fatigue parameters and mouth fatigue parameters according to the eye state and the mouth state respectively and calculating head fatigue parameters according to the positioned feature point position information; and the fatigue state identification unit is used for identifying and early warning the fatigue state of the driver according to the eye fatigue parameter, the mouth fatigue parameter and the head fatigue parameter.
The above disclosure is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or modifications within the technical scope of the present invention, and shall be covered by the scope of the present invention.

Claims (10)

1. A fatigue driving identification method based on multiple characteristics is characterized by comprising the following steps:
step 1: acquiring a video single-frame image in real time, and preprocessing the video single-frame image;
step 2: performing face detection on the preprocessed video image by adopting an AdaBoost algorithm based on Haar-like characteristics, and tracking the detected face in real time by adopting a target tracking algorithm based on a scale space;
step 3: positioning the feature points of the human face, respectively positioning an eye region and a mouth region according to the positioned feature points, identifying the eye state by adopting an SVM (support vector machine) classifier, and identifying the mouth state by calculating the aspect ratio of the mouth;
step 4: respectively calculating eye fatigue parameters and mouth fatigue parameters according to the eye state and the mouth state, and calculating head fatigue parameters according to the position information of the positioned feature points;
step 5: identifying and early warning the fatigue state of the driver according to the eye fatigue parameters, the mouth fatigue parameters and the head fatigue parameters.
2. The fatigue driving identification method according to claim 1, wherein in the step 1, the video single-frame image preprocessing process comprises:
step 1.1: carrying out smooth denoising processing on a video single-frame image;
step 1.2: and carrying out illumination compensation processing on the video image subjected to the smooth denoising processing.
3. The fatigue driving identification method according to claim 2, wherein in the step 1.1, the video image is subjected to smoothing and denoising processing by using adaptive median filtering.
4. The fatigue driving identification method according to claim 2, wherein in step 1.2, an illumination equalization algorithm based on a dynamic threshold is adopted to perform illumination compensation processing on the video images with different illumination shades.
5. The fatigue driving recognition method according to claim 1, wherein in the step 2, the specific operation steps of the AdaBoost algorithm for face detection are as follows:
step 2.11: calculating Haar-like characteristics of the image by using the integral graph;
step 2.12: for the Haar-like characteristics, selecting an optimal weak classifier through training iteration, and constructing the weak classifier into a strong classifier according to a weighted voting mode;
step 2.13: then connecting a plurality of strong classifiers obtained by training in series to form a cascade classifier with a cascade structure;
step 2.14: carrying out face detection on the image by adopting the cascade classifier.
6. The fatigue driving recognition method according to claim 1 or 5, wherein in the step 2, a target tracking algorithm based on a scale space is adopted to track the detected face in real time, and the specific operation steps are as follows:
step 2.21: taking the face region and scale obtained by the face detection as the initial target position P_1 and scale S_1, and training a position correlation filter and a scale correlation filter on the face region to obtain a position model and a scale model;
step 2.22: according to the target position P_(t-1) and scale S_(t-1) of the previous frame I_(t-1), collecting in the current frame I_t a feature sample of twice the size of the previous frame's target, and calculating the maximum response value of the position correlation filter from this feature sample and the position model of the previous frame to obtain the new target position P_t;
step 2.23: taking the determined new target position P_t as the centre point, obtaining S candidate samples of different scales according to a scaling rule with a one-dimensional scale correlation filter, extracting d-dimensional features from each candidate sample to obtain the feature sample of the current frame, and then calculating the response value of the 1 × S-dimensional scale correlation filter from this feature sample and the scale model, the scale corresponding to the maximum response value being the final target scale S_t;
step 2.24: if the maximum response value and the average peak correlation energy of the correlation filter in the current frame both satisfy the update-strategy condition, extracting the features f_t^trans and f_t^scale in the current frame I_t according to the position P_t and scale S_t and updating the position model and the scale model; otherwise, performing face detection again in the current frame I_t;
the update-strategy condition is that the maximum response value and the average peak correlation energy are respectively greater than the ratios β_1 and β_2, with β_1 being 0.7 and β_2 being 0.45;
preferably, the response value of the position or scale correlation filter is calculated as
y_t = F^(-1)( Σ_(l=1)^(d) A^l · Z^l / (B + λ) )
wherein F^(-1)(·) is the inverse discrete Fourier transform (DFT), y_t is the obtained response value, d-dimensional features are extracted from each pixel of the feature sample and the feature map of the l-th dimension is denoted f^l with l = 1, 2, ..., d, λ is the coefficient of the regularization term, A^l and B are respectively the numerator and the denominator of the filter updated in the previous frame, and Z^l is the two-dimensional DFT of each dimension of the feature map of the current frame image.
7. The fatigue driving recognition method according to claim 1, wherein in the step 3, a cascade regression tree-based algorithm is adopted to locate the feature points of the human face, wherein the feature points of the human face include eye feature points and mouth feature points.
8. The fatigue driving recognition method according to claim 1 or 7, wherein in the step 3, the specific operation of recognizing the eye state by using the SVM classifier is:
training the SVM classifier by taking the eye aspect ratio and the accumulated black-pixel difference of the binarized eye-region image as its input features, and then classifying and recognizing the eye state with the trained classifier; the accumulated black-pixel difference F_black of the binarized eye-region image is computed, for the t-th frame in "state 1", as F_black(t) = D(t−1) + Δn(t) when Δn(t) is smaller than the adaptive threshold T(t), and is reset to 0 otherwise,
where n(t) is the number of black pixels of the t-th frame, Δn(t) is the difference in the number of black pixels between the t-th frame and the (t−1)-th frame, D(t−1) is the accumulated black-pixel difference of the (t−1)-th frame in "state 1", and α is a constant between 0 and 1.
9. The fatigue driving identification method according to claim 1, wherein in the step 4, the eye fatigue parameters include a ratio of eye closure frame number, a blinking frequency, and a maximum duration eye closure time, the mouth fatigue parameters include a yawning frequency, and the head fatigue parameters include a nodding frequency; preferably, the fatigue state is identified by performing weighted summation on the eye fatigue parameter, the mouth fatigue parameter and the head fatigue parameter, and the specific weighted summation expression is as follows:
E_fatigue = V_ECR × W_1 + V_MECT × W_2 + V_BF × W_3 + V_NF × W_4 + V_YF × W_5
wherein E_fatigue is the weighted fatigue value, V_ECR is the fatigue value of the eye-closure frame ratio, V_MECT the fatigue value of the maximum continuous eye-closure time, V_BF the fatigue value of the blink frequency, V_NF the fatigue value of the nodding frequency, V_YF the fatigue value of the yawning frequency, and W_i are the weights corresponding to the different parameters, which satisfy W_1 + W_2 + W_3 + W_4 + W_5 = 1;
preferably, when the weighted fatigue value is less than 0.3, the state is the waking state; when the weighted fatigue value is more than or equal to 0.3 and less than 0.7, the state is a fatigue state; when the weighted fatigue value is 0.7 or more, the fatigue state is severe.
10. A multi-feature based fatigue driving recognition system, comprising:
the image acquisition and processing unit is used for acquiring a video single-frame image in real time and preprocessing the video single-frame image;
the face detection and tracking unit is used for carrying out face detection on the preprocessed video image by adopting an AdaBoost algorithm based on Haar-like characteristics and tracking the detected face in real time by adopting a target tracking algorithm based on a scale space;
the positioning and state recognition unit is used for positioning the feature points of the human face, respectively positioning the eye region and the mouth region according to the positioned feature points, recognizing the eye state by adopting an SVM classifier, and recognizing the mouth state by calculating the aspect ratio of the mouth;
the parameter calculation unit is used for calculating eye fatigue parameters and mouth fatigue parameters according to the eye state and the mouth state respectively and calculating head fatigue parameters according to the positioned feature point position information;
and the fatigue state identification unit is used for identifying and early warning the fatigue state of the driver according to the eye fatigue parameter, the mouth fatigue parameter and the head fatigue parameter.
CN202010338222.XA 2020-04-26 2020-04-26 Fatigue driving identification method and system based on multiple characteristics Pending CN111582086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010338222.XA CN111582086A (en) 2020-04-26 2020-04-26 Fatigue driving identification method and system based on multiple characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010338222.XA CN111582086A (en) 2020-04-26 2020-04-26 Fatigue driving identification method and system based on multiple characteristics

Publications (1)

Publication Number Publication Date
CN111582086A true CN111582086A (en) 2020-08-25

Family

ID=72114102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010338222.XA Pending CN111582086A (en) 2020-04-26 2020-04-26 Fatigue driving identification method and system based on multiple characteristics

Country Status (1)

Country Link
CN (1) CN111582086A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986443A (en) * 2020-08-31 2020-11-24 上海博泰悦臻网络技术服务有限公司 Fatigue driving monitoring device and method
CN112069359A (en) * 2020-09-01 2020-12-11 上海熙菱信息技术有限公司 Method for dynamically filtering abnormal data of comparison result of snapshot object
CN112183220A (en) * 2020-09-04 2021-01-05 广州汽车集团股份有限公司 Driver fatigue detection method and system and computer storage medium
CN112183215A (en) * 2020-09-02 2021-01-05 重庆利龙科技产业(集团)有限公司 Human eye positioning method and system combining multi-feature cascade SVM and human eye template
CN112528792A (en) * 2020-12-03 2021-03-19 深圳地平线机器人科技有限公司 Fatigue state detection method, fatigue state detection device, fatigue state detection medium, and electronic device
CN112528767A (en) * 2020-11-26 2021-03-19 天津大学 Machine vision-based construction machinery operator fatigue operation detection system and method
CN112528843A (en) * 2020-12-07 2021-03-19 湖南警察学院 Motor vehicle driver fatigue detection method fusing facial features
CN113040757A (en) * 2021-03-02 2021-06-29 江西台德智慧科技有限公司 Head posture monitoring method and device, head intelligent wearable device and storage medium
CN113076884A (en) * 2021-04-08 2021-07-06 华南理工大学 Cross-mode eye state identification method from near infrared light to visible light
CN113197573A (en) * 2021-05-19 2021-08-03 哈尔滨工业大学 Film watching impression detection method based on expression recognition and electroencephalogram fusion
CN113240885A (en) * 2021-04-27 2021-08-10 宁波职业技术学院 Method for detecting fatigue of vehicle-mounted driver
CN113780164A (en) * 2021-09-09 2021-12-10 福建天泉教育科技有限公司 Head posture recognition method and terminal
CN113838265A (en) * 2021-09-27 2021-12-24 科大讯飞股份有限公司 Fatigue driving early warning method and device and electronic equipment
CN113978475A (en) * 2021-09-22 2022-01-28 东风汽车集团股份有限公司 Control method and system for automatic driving intervention during fatigue driving of driver
CN115641542A (en) * 2022-12-23 2023-01-24 腾讯科技(深圳)有限公司 Data processing method and device and storage medium
CN117523521A (en) * 2024-01-04 2024-02-06 山东科技大学 Vehicle detection method based on Haar features and improved HOG features

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622600A (en) * 2012-02-02 2012-08-01 西南交通大学 High-speed train driver alertness detecting method based on face image and eye movement analysis
CN106372621A (en) * 2016-09-30 2017-02-01 防城港市港口区高创信息技术有限公司 Face recognition-based fatigue driving detection method
CN107578008A (en) * 2017-09-02 2018-01-12 吉林大学 Fatigue state detection method based on blocking characteristic matrix algorithm and SVM
CN110210382A (en) * 2019-05-30 2019-09-06 上海工程技术大学 A kind of face method for detecting fatigue driving and device based on space-time characteristic identification
CN110334600A (en) * 2019-06-03 2019-10-15 武汉工程大学 A kind of multiple features fusion driver exception expression recognition method
CN110532887A (en) * 2019-07-31 2019-12-03 郑州大学 A kind of method for detecting fatigue driving and system based on facial characteristics fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622600A (en) * 2012-02-02 2012-08-01 西南交通大学 High-speed train driver alertness detecting method based on face image and eye movement analysis
CN106372621A (en) * 2016-09-30 2017-02-01 防城港市港口区高创信息技术有限公司 Face recognition-based fatigue driving detection method
CN107578008A (en) * 2017-09-02 2018-01-12 吉林大学 Fatigue state detection method based on blocking characteristic matrix algorithm and SVM
CN110210382A (en) * 2019-05-30 2019-09-06 上海工程技术大学 A kind of face method for detecting fatigue driving and device based on space-time characteristic identification
CN110334600A (en) * 2019-06-03 2019-10-15 武汉工程大学 A kind of multiple features fusion driver exception expression recognition method
CN110532887A (en) * 2019-07-31 2019-12-03 郑州大学 A kind of method for detecting fatigue driving and system based on facial characteristics fusion

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
WON OH LEE等: "Blink detection robust to various facial poses", vol. 193, no. 2, pages 356 - 372, XP027452186, DOI: 10.1016/j.jneumeth.2010.08.034 *
刘明周等: "基于面部几何特征及手部运动特征的驾驶员疲劳检测", vol. 55, no. 2, pages 19 - 26 *
周海英等: "改进的核相关自适应目标跟踪算法及其实验验证", 《科学技术与工程》, vol. 18, no. 14 *
居超等: "一种抗遮挡尺度自适应核相关滤波器跟踪算法", 《上 海 理 工 大 学 学 报》, vol. 40, no. 5 *
张国山等: "基于位置修正机制和模型更新策略的跟踪算法", 《信息与控制》, vol. 49, no. 2 *
陈忠等: "列车司机疲劳驾驶监测中的人脸定位方法研究", 《铁道科学与工程学报》, vol. 16, no. 12 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986443A (en) * 2020-08-31 2020-11-24 上海博泰悦臻网络技术服务有限公司 Fatigue driving monitoring device and method
CN112069359A (en) * 2020-09-01 2020-12-11 上海熙菱信息技术有限公司 Method for dynamically filtering abnormal data of comparison result of snapshot object
CN112069359B (en) * 2020-09-01 2024-03-19 上海熙菱信息技术有限公司 Method for dynamically filtering abnormal data of snapshot object comparison result
CN112183215A (en) * 2020-09-02 2021-01-05 重庆利龙科技产业(集团)有限公司 Human eye positioning method and system combining multi-feature cascade SVM and human eye template
CN112183220A (en) * 2020-09-04 2021-01-05 广州汽车集团股份有限公司 Driver fatigue detection method and system and computer storage medium
CN112183220B (en) * 2020-09-04 2024-05-24 广州汽车集团股份有限公司 Driver fatigue detection method and system and computer storage medium thereof
CN112528767A (en) * 2020-11-26 2021-03-19 天津大学 Machine vision-based construction machinery operator fatigue operation detection system and method
CN112528792B (en) * 2020-12-03 2024-05-31 深圳地平线机器人科技有限公司 Fatigue state detection method, device, medium and electronic equipment
CN112528792A (en) * 2020-12-03 2021-03-19 深圳地平线机器人科技有限公司 Fatigue state detection method, fatigue state detection device, fatigue state detection medium, and electronic device
CN112528843A (en) * 2020-12-07 2021-03-19 湖南警察学院 Motor vehicle driver fatigue detection method fusing facial features
CN113040757B (en) * 2021-03-02 2022-12-20 江西台德智慧科技有限公司 Head posture monitoring method and device, head intelligent wearable device and storage medium
CN113040757A (en) * 2021-03-02 2021-06-29 江西台德智慧科技有限公司 Head posture monitoring method and device, head intelligent wearable device and storage medium
CN113076884A (en) * 2021-04-08 2021-07-06 华南理工大学 Cross-mode eye state identification method from near infrared light to visible light
CN113240885A (en) * 2021-04-27 2021-08-10 宁波职业技术学院 Method for detecting fatigue of vehicle-mounted driver
CN113197573A (en) * 2021-05-19 2021-08-03 哈尔滨工业大学 Film watching impression detection method based on expression recognition and electroencephalogram fusion
CN113780164B (en) * 2021-09-09 2023-04-28 福建天泉教育科技有限公司 Head gesture recognition method and terminal
CN113780164A (en) * 2021-09-09 2021-12-10 福建天泉教育科技有限公司 Head posture recognition method and terminal
CN113978475A (en) * 2021-09-22 2022-01-28 东风汽车集团股份有限公司 Control method and system for automatic driving intervention during fatigue driving of driver
CN113838265B (en) * 2021-09-27 2023-05-30 科大讯飞股份有限公司 Fatigue driving early warning method and device and electronic equipment
CN113838265A (en) * 2021-09-27 2021-12-24 科大讯飞股份有限公司 Fatigue driving early warning method and device and electronic equipment
CN115641542A (en) * 2022-12-23 2023-01-24 腾讯科技(深圳)有限公司 Data processing method and device and storage medium
CN117523521A (en) * 2024-01-04 2024-02-06 山东科技大学 Vehicle detection method based on Haar features and improved HOG features
CN117523521B (en) * 2024-01-04 2024-04-02 山东科技大学 Vehicle detection method based on Haar features and improved HOG features

Similar Documents

Publication Publication Date Title
CN111582086A (en) Fatigue driving identification method and system based on multiple characteristics
Zhang et al. Driver fatigue detection based on eye state recognition
CN106682578B (en) Weak light face recognition method based on blink detection
KR101653278B1 (en) Face tracking system using colar-based face detection method
Han et al. Driver drowsiness detection based on novel eye openness recognition method and unsupervised feature learning
Salve et al. Iris recognition using SVM and ANN
CN111460950A (en) Cognitive distraction method based on head-eye evidence fusion in natural driving conversation behavior
CN106599785A (en) Method and device for building human body 3D feature identity information database
Shakya et al. Human behavior prediction using facial expression analysis
Naz et al. Driver fatigue detection using mean intensity, SVM, and SIFT
Rajevenceltha et al. A novel approach for drowsiness detection using local binary patterns and histogram of gradients
Sadeghi et al. Modelling and segmentation of lip area in face images
Faraji et al. Drowsiness detection based on driver temporal behavior using a new developed dataset
D'orazio et al. A neural system for eye detection in a driver vigilance application
Panicker et al. Open-eye detection using iris–sclera pattern analysis for driver drowsiness detection
Monwar et al. Eigenimage based pain expression recognition
Ananthakumar Efficient face and gesture recognition for time sensitive application
Campadelli et al. Localization of facial features and fiducial points
Karungaru et al. Face recognition in colour images using neural networks and genetic algorithms
CN115100704A (en) Face recognition device and method for resisting spoofing attack by combining thermal infrared and visible light
Sheikh Robust recognition of facial expressions on noise degraded facial images
Dornaika et al. Driver drowsiness detection in facial images
CN114757967A (en) Multi-scale anti-occlusion target tracking method based on manual feature fusion
Akinci et al. A video-based eye pupil detection system for diagnosing bipolar disorder
CN113408389A (en) Method for intelligently recognizing drowsiness action of driver

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination