CN205230272U - Driver drive state monitoring system - Google Patents

Driver drive state monitoring system

Info

Publication number: CN205230272U
Application number: CN201520900920.9U
Authority: CN (China)
Prior art keywords: module, main control unit, model, driver
Other languages: Chinese (zh)
Inventor: 王海
Original Assignees: 熊强, 王海
Current Assignee: Dream Innovation Technology (Shenzhen) Co., Ltd.
Filing date: 2015-11-12
Legal status: Active (granted)

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The utility model provides a driver driving state monitoring system comprising a camera module, a processor, a storage module, and a speaker. The camera module acquires image information of the automobile driver's seat. The storage module stores algorithm model files, parameters, and user configuration files. The speaker announces alarm information to the user. The processor comprises an arithmetic unit, a main control unit, an internal memory unit, and a system bus. The arithmetic unit processes data read from the internal memory unit on the orders of the main control unit and outputs the results to the main control unit. The main control unit handles the logical decisions made while the system runs, controls the connection and startup of the hardware modules, and drives the speaker to issue alarm and notification sounds. The internal memory unit provides memory support for the arithmetic unit and the main control unit. The utility model can use computer vision technology to monitor the driving state of ordinary drivers and to prompt and regulate driver behavior.

Description

Driver driving state monitoring system
Technical field
The utility model relates to the technical fields of computer vision and automotive safety, and in particular to a driver driving state monitoring system.
Background technology
At present there are more and more automobiles, traffic is ever denser, and people drive far more often than before. Although the number of trips has increased, people's safety precautions have not strengthened accordingly, so ensuring driving safety has become a very important problem. Among all driving accidents, fatigue driving is one of the largest accident-inducing factors.
In view of this, the industry has developed devices for monitoring fatigue driving, but most of these devices rest on very simple assumptions, for example using a gravity sensor to detect up-and-down head movement as a sign of drowsiness. Such devices have a high false-alarm rate and poor usability. There are also technical schemes that detect driver drowsiness by analyzing images of the driver's face, but their image analysis takes a long time, the number of states they can handle is limited, their practicality is poor, and they easily block the driver's line of sight.
Summary of the invention
The purpose of the utility model is to provide a driver driving state monitoring system that can identify abnormal activity of the automobile driver in time.
To achieve this goal, the utility model provides the following technical scheme:
A driver driving state monitoring system, characterized in that it comprises a processor, a photographing module, a memory module, and a loudspeaker, wherein:
The processor comprises a main control unit, an arithmetic unit, an internal memory unit, and a system bus;
The photographing module is connected with the processor and obtains image information of the car driver's seat, which is sent to the internal memory unit via the system bus;
The memory module stores algorithm model files, parameters, and user profiles; it is connected with the processor, and the processor can read and modify the data stored in the memory module;
The loudspeaker is connected with the processor and announces warning messages to the user on the processor's instruction;
The main control unit handles the logical judgments made during system operation and also controls the connection and startup of the hardware modules; the arithmetic unit processes the data read from the internal memory unit on the orders of the main control unit and outputs the results to the main control unit; the internal memory unit provides memory support for the arithmetic unit and the main control unit;
The driver driving state monitoring system further comprises:
A face detection module, which detects whether the current driver is within detectable range; if no face is detected, the driver is prompted to adjust the position of the camera until a face can be detected; once a face is detected, the main control unit makes the loudspeaker play a prompt that the system has started working;
A mouth localization and tracking module, which continually obtains the image information from the camera and performs mouth detection, localization, and tracking;
An eye localization and blink classification module, which locates the eyes and analyzes the distance between the upper and lower eyelids until the number of analyzed frames exceeds a set amount, initializes the blink classification threshold according to that distance, and then performs blink classification judgments; and
A facial expression recognition module, which continually obtains the image information from the camera, analyzes it to detect the driver's expression in real time, and comprehensively analyzes the driver's driving state.
Preferably, the photographing module adopts an infrared camera, with an added infrared fill light and optical filter.
Preferably, the system also comprises a general database and an infrared image database, and the face detection module, the mouth localization and tracking module, the eye localization and blink classification module, and the facial expression recognition module all obtain their models by transfer learning, transferring the models trained on the general database to the infrared image database.
Preferably, the system also possesses a Bluetooth module, and the processor connects with a handheld terminal or vehicle-mounted unit through the Bluetooth module.
Preferably, the system also possesses a wireless communication module, and the processor connects with a handheld terminal or vehicle-mounted unit through the wireless communication module.
Preferably, the processor is a Blackfin53x digital signal processor.
The utility model provides an intelligent system that uses computer vision technology to monitor the driving state of ordinary drivers. When the driver drives while fatigued, distracted (glancing left and right, raising or lowering the head, for example to play with a mobile phone), or emotionally unstable, the system takes appropriate measures, such as prompting and regulating the driver's behavior with audible alarms or vibration, while also recording, synchronizing, and displaying the driver's driving habits through software.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system of the utility model embodiment;
Fig. 2 is the system workflow diagram of the utility model embodiment;
Fig. 3 is the comprehensive result analysis flowchart of the utility model embodiment;
Fig. 4 is the mobile phone operation flowchart of the utility model embodiment;
Fig. 5 is the workflow diagram of each functional module in the utility model embodiment;
Fig. 6 is the LGP feature extraction diagram of the utility model embodiment;
Fig. 7 is the MCT feature extraction diagram of the utility model embodiment;
Fig. 8 is the face mask of the utility model embodiment;
Fig. 9 is the plot of the ReLU function of the utility model embodiment;
Fig. 10 is the MB-MCT feature extraction diagram of the utility model embodiment;
Fig. 11 is the eye detection region diagram of the utility model embodiment;
Fig. 12 is the VectorBoost training process diagram of the utility model embodiment;
Fig. 13 is the cascade classifier schematic diagram of the utility model embodiment;
Fig. 14 is the CNN architecture diagram of the utility model embodiment.
Embodiment
The utility model is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the system architecture of the utility model embodiment comprises a processor, a photographing module, a loudspeaker, a Bluetooth module, a memory module, and synchronous DRAM (Synchronous Dynamic Random Access Memory, hereinafter SDRAM), where the processor comprises an arithmetic unit, a main control unit, an internal memory unit, and a system bus.

The photographing module adopts an infrared camera for obtaining infrared image information, especially face image information. It is connected with the processor, and the acquired infrared image of the car driver's seat is sent to the SDRAM via the system bus, to be read through the system bus by the processor's arithmetic unit and main control unit. The memory module stores algorithm model files, parameters, and user profiles; it is connected with the processor, which can read and modify the data stored in it. While the system runs, the main control unit sends instructions over the system bus to call related data from the memory module, which then stores that data into the SDRAM over the system bus for the arithmetic unit and main control unit to read.

The internal memory unit provides the necessary memory support for the arithmetic unit and the main control unit. Although the processor's built-in internal memory responds quickly, it cannot meet the demand when the data are too large, so the SDRAM serves as an expansion of the internal memory unit. The loudspeaker is connected with the processor and announces warning messages to the user on the processor's instruction. Within the processor, the main control unit mainly handles the various logical judgments arising while the system runs, controls device startup, is responsible for connecting the various hardware modules together, and drives the loudspeaker to issue alarms and voice prompts. The arithmetic unit processes the data read from the internal memory unit on the orders of the main control unit and outputs the results to the main control unit; essentially, it carries out the computation of the various computer vision algorithms of the utility model. The amount of computation these algorithms require is large, and relying on the main control chip alone would make the computation slow. The internal memory unit provides memory support for the running algorithms.
The camera of the utility model is an infrared camera, with an added infrared fill light and optical filter. The fill light and filter are optional: if the system only needs to run in environments with enough visible light, an ordinary camera can be chosen; if it must also work in dark environments, an infrared fill light must be added and an infrared camera chosen; and to eliminate optical noise such as strong light and backlight, an optical filter must be added. The camera is connected with the main control unit by a data bus, so the picture information it acquires is sent over the data bus into the internal memory unit. The computer vision algorithms then obtain the image information from the internal memory unit, carry out their computation in the digital signal processing (DSP) arithmetic unit, calculate the various pieces of information, and feed the results back to the main control unit, which controls the loudspeaker module and the Bluetooth module to make the corresponding reaction. The arithmetic unit and the main control unit are also connected by the internal memory data bus.
Preferably, the utility model can also connect with a handheld terminal such as a mobile phone, or with a vehicle-mounted unit. Installing the corresponding application can be chosen if the user wants to collect statistics on driving habits; whether or not the application is installed does not affect the operation of the hardware device.
Preferably, an ADI Blackfin53x is used as the DSP processor: a maximum core frequency of 600 MHz, up to 4 GB of address space, up to 80 KB of SRAM L1 instruction memory and two 32 KB SRAM L1 data memories, and a rich set of integrated peripherals and interfaces. Connected to the Blackfin53x DSP are a 16 MB flash memory module (expandable to 32 MB or 64 MB) and 32 MB of off-chip SDRAM (expandable to 64 MB). The memory module stores the audio files and configuration files the system needs, while the SDRAM and SRAM provide the memory required for the whole system to run. The other peripheral modules are the infrared camera, the infrared fill light, the Bluetooth module, and the loudspeaker. For the optics, a narrow-spectrum infrared fill light invisible to the human eye is adopted, and an infrared filter is placed in front of the fill light and the infrared camera to eliminate the optical interference and noise of non-infrared light in the external environment. The camera can thus only gather images using the system's own fill light, so stable and clear images can be collected in any environment (day or night, front light or backlight, with or without the interference of oncoming headlights or other optical interference).
From the above it can be seen that the basic principle of the utility model is to use computer vision technology, mainly face analysis, to analyze the driver's state in real time, and finally to derive the driver's driving state comprehensively, including driving distraction, fatigue driving, and emotional state while driving. The analyzed driving state is compared with experimentally obtained thresholds; if a threshold is exceeded, the corresponding alarm is triggered to regulate the driver's behavior.
When the utility model is implemented, as shown in Fig. 2, its operating procedure is as follows (a sketch of the loop follows the list):
(1) The system powers up and self-checks; if the hardware has no fault, it continues.
(2) The face detection module is called to detect whether the current driver is within detectable range; if no face is detected, the driver is prompted to adjust the device position until the device can detect a face.
(3) Once a face is detected, the loudspeaker plays a prompt that the system has started working.
(4) Image information is continually obtained from the camera, face detection and mouth detection are carried out, the mouth tracking module is initialized, the eye contour localization algorithm is called, and the distance between the user's upper and lower eyelids is analyzed until the number of analyzed frames exceeds a set amount.
(5) According to the user's upper-lower eyelid distance, the threshold of the blink judgment algorithm is initialized.
(6) Image information is continually obtained from the camera, the algorithm modules are called to analyze each image, and the comprehensive result analysis module is called to analyze the driver's driving state.
(7) If a user shutdown signal is received, memory is released, Bluetooth is closed, and the loop exits.
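The loop below is a minimal runnable sketch of this procedure. All module calls are hypothetical placeholders with stub bodies: the utility model names the modules but does not define a software API, so the names, the calibration count, and the threshold rule are assumptions for illustration only.

```python
# Sketch of the operating procedure above; stubs stand in for the real modules.

def detect_face(frame):        # placeholder for the face detection module
    return "face-region"

def eyelid_distance(frame):    # placeholder for eye localization / eyelid analysis
    return 5.0

def analyze(frame, threshold): # placeholder for the per-frame analysis modules
    return {"fatigue": False}

def monitoring_loop(frames, calibration_frames=30):
    distances, blink_threshold = [], None
    for frame in frames:                         # steps (4)/(6): read frames
        if detect_face(frame) is None:           # step (2): face detection
            print("please adjust the camera position")
            continue
        if blink_threshold is None:              # steps (4)-(5): calibration
            distances.append(eyelid_distance(frame))
            if len(distances) >= calibration_frames:
                # assumed rule: half the mean eyelid distance
                blink_threshold = sum(distances) / len(distances) / 2
            continue
        result = analyze(frame, blink_threshold) # step (6): per-frame analysis
        if result["fatigue"]:
            print("fatigue warning")             # speaker alarm

monitoring_loop(range(100))
```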
As the operating procedure above shows, the utility model mainly comprises two groups of functional modules. The first is the computer vision modules, used to analyze every frame and obtain raw information such as the face region, eye positions, blinks/eye closures, and head pose. The second is the comprehensive result analysis and judgment module, whose main purpose is to judge the driver's current driving state from the various raw analysis data, for example whether the driver is in a state of deep fatigue. Specifically, computer vision algorithms analyze the infrared images obtained by the camera and extract raw information, mainly the positions of faces (there may be multiple face regions), the positions of the eyes, and the positions of the key points near the eye contour (upper and lower orbits). Using this raw information plus the statistical analysis of the comprehensive result analysis module, the system judges whether the driver is currently dozing, the driving mood, and how focused the driving is. Finally, according to the comprehensive analysis results, the corresponding hardware modules are called to issue reminders, and data are sent via Bluetooth to the mobile phone software for storage and display, so the user can review his or her own driving behavior.
Specifically, the computer vision algorithms of the utility model embodiment are: the face detection algorithm, the eye localization algorithm, the blink classification algorithm, the mouth localization and tracking algorithm, the facial expression recognition algorithm, and the frontal face classification algorithm. They are described in turn below:
For the face detection algorithm, vector boosting (VectorBoosting) and transfer learning are used as the classifier framework to train the whole algorithm, and the modified census transform (MCT) and local gradient pattern (LGP) are used to extract image features. At the same time, differing from the original VectorBoosting algorithm, after a general face detection model is obtained on a general database, a set of infrared training images is gathered and the general model is transferred onto those infrared images by transfer learning, so that the resulting face detection model performs better and is more targeted than the general one.
The concrete steps are as follows: (1) Feature extraction phase: the MCT and LGP features of all face and non-face images (from both the infrared and general databases) are extracted; the concrete MCT and LGP feature extraction schemes are shown in Figs. 6 and 7.
(2) Positive/negative sample construction phase: the images in the general database and the infrared image database are sorted into face images and non-face images, the face region images are scaled to 40 × 40 pixels, and each face is assigned to a different subset according to its pose.
(3) Original training phase: the traditional VectorBoosting algorithm builds a cascade classifier on the general database; the classifier features are the combination of MCT and LGP features.
(4) Transfer learning phase: the VectorBoosting algorithm builds a cascade classifier on the infrared database while also taking into account the model obtained on the general database, optimizing a specific training objective so that the resulting model has the characteristics of both the general model and the infrared image data, overcoming the shortage of infrared image data.
(5) Detection phase: the VectorBoosting model obtained in the transfer learning phase detects face regions on infrared images using the vector-tree model structure.
In the feature extraction phase, the MCT feature is computed as:

$$\Gamma(x) = \bigotimes_{y \in N'(x)} C\big(\bar I(x), I(y)\big), \qquad C\big(\bar I(x), I(y)\big) = \begin{cases} 1 & \text{if } \bar I(x) < I(y) \\ 0 & \text{otherwise} \end{cases}$$

where $N'(x) = N(x) \cup \{x\}$ is the local spatial neighborhood of $x$ including $x$ itself, $I(y)$ is the gray value of pixel $y$, $\bar I(x)$ is the average gray value over the neighborhood of $x$, and $\bigotimes$ is the concatenation operator that strings the bits $C$ together.
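As an illustration of the MCT computation above, the following sketch evaluates the transform over a 3 × 3 neighborhood with NumPy. This is an assumed reference formulation, not the fixed-point DSP code of the embodiment.

```python
# Modified census transform over 3x3 neighborhoods of a grayscale image.
import numpy as np

def mct_3x3(img):
    img = img.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int64)
    # the 9 shifted views of the 3x3 neighborhood, center included (N'(x))
    shifts = [img[i:h - 2 + i, j:w - 2 + j] for i in range(3) for j in range(3)]
    mean = sum(shifts) / 9.0                 # mean gray value over N'(x)
    for bit, view in enumerate(shifts):
        # one bit per neighbor: C(mean, I(y)) = 1 iff mean < I(y)
        out |= ((mean < view).astype(np.int64) << bit)
    return out                               # 9-bit MCT code per pixel

codes = mct_3x3(np.random.randint(0, 256, (40, 40)))
```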
The LGP feature is computed as:

$$LGP_{p,r}(x_c, y_c) = \sum_{n=0}^{p-1} s(g_n - \bar g)\, 2^n, \qquad s(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{otherwise} \end{cases}$$

where $(x_c, y_c)$ is the center pixel, $g_n = |i_n - i_c|$ is the gray-value difference between the center point $i_c$ and its $n$-th neighbor $i_n$, and $\bar g$ is the average of the $g_n$.
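A corresponding sketch of the LGP code for p = 8 neighbors at radius 1 follows; again the NumPy formulation is an assumption for illustration.

```python
# Local gradient pattern with 8 neighbors at radius 1.
import numpy as np

def lgp_8(img):
    img = img.astype(np.float64)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    # the 8 border cells of each 3x3 neighborhood, clockwise from top-left
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    g = [np.abs(img[i:h - 2 + i, j:w - 2 + j] - center) for i, j in offsets]
    g_mean = sum(g) / 8.0                    # mean gradient magnitude (g bar)
    code = np.zeros_like(center, dtype=np.int64)
    for n, gn in enumerate(g):
        # s(g_n - g_mean) * 2^n: bit set iff g_n >= mean
        code |= ((gn >= g_mean).astype(np.int64) << n)
    return code                              # 8-bit LGP code per pixel

codes = lgp_8(np.random.randint(0, 256, (40, 40)))
```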
Unlike other algorithms, the MCT and LGP features complement each other, and combining them improves the stability of the whole algorithm: at the same false detection rate the correct detection rate rises greatly, while the total feature extraction time stays low.
In the positive/negative sample construction phase, many extra non-face pictures are built from the numerous non-face regions, deliberately making the proportion of positive and negative samples in the whole picture set unbalanced; absorbing a large number of negative samples reduces the false alarm rate of the whole face detection model.
In the original training phase, the original VectorBoosting training scheme is used directly: each round selects a subset of features from the high-dimensional LGP and MCT feature values and gives each weak classifier a certain weight; then, combining the results of the current classifier, the weights of the images are redistributed, with misclassified images given larger weights and correctly classified ones smaller weights. The concrete training process is shown in Fig. 12. The formula for choosing a weak classifier is:
$$f_t(x) = \arg\min_f \left\{ \sum_{i=1}^m w_i^{t-1} \exp\big(-v_i f(x_i)\big) \right\}$$
where $\exp$ is the exponential function, $f(x_i)$ is the weak classifier, $v_i$ is the current class label, and $w_i^{t-1}$ is the weight of sample $i$ at iteration $t$. In the transfer learning phase, the input is the model trained on the general database and the output is that model transferred onto the infrared images. To measure the gap between the models while transferring as many of the general-database model parameters as possible onto the infrared images, the KL distance is used to weigh the difference between the models. The concrete optimization formulas are as follows:
$$loss(F(x)) = \frac{1}{m} \sum_{i=1}^m \exp\big(-v_i F(x_i)\big) + \lambda\, KL\big(F(x), \bar F(x)\big)$$

$$KL(p, q) = \sum_{i=1}^N p_i \log \frac{p_i}{q_i}$$
In these formulas, $\lambda$ can take different values; we finally choose the $\lambda$ that minimizes the test error rate. $\bar F(x)$ is the model trained on the general database, $p, q$ are two probability distributions, and $p_i, q_i$ are the probabilities of the $i$-th instance under the two distributions.
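To make the objective concrete, a rough sketch of evaluating the transfer loss for a candidate λ is given below. Representing the two models by output distributions is a simplifying assumption; the patent specifies the objective, not the optimizer, and λ would be chosen by scanning candidates and keeping the one with the lowest held-out error rate.

```python
# Transfer-learning objective: exponential loss on the infrared data plus a
# KL penalty tying the new model to the general-database model (illustrative).
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def transfer_loss(scores_new, labels, dist_new, dist_general, lam):
    # (1/m) sum_i exp(-v_i F(x_i))  +  lambda * KL(F, F_bar)
    exp_loss = np.mean(np.exp(-labels * scores_new))
    return exp_loss + lam * kl(dist_new, dist_general)

scores = np.array([0.8, -0.3, 1.2])      # stand-in classifier scores F(x_i)
labels = np.array([1.0, -1.0, 1.0])      # labels v_i
print(transfer_loss(scores, labels, [0.4, 0.6], [0.5, 0.5], lam=0.1))
```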
At the test phase, the final strong classifier from transfer learning has the same form as a classifier learned without transfer, only with different parameters, so the traditional waterfall cascade classifier is used for face detection: each frame is pyramid-scaled, face detection is performed at each zoom scale, and the detection results are scaled back to the original image size. To speed up the computation when scaling, the images at the different scales are processed in parallel, with feature values and integral images computed concurrently; the concrete detection flow is shown in Fig. 13.
Because the VectorBoosting algorithm is used, the robustness is stronger; faces of different poses can be handled, giving face detection a larger range. Compared with a model trained directly on infrared images, using the many face images of different poses available on the internet strengthens the model's robustness; compared with a general face detection model trained only on internet pictures, adding the infrared image information makes the final model more targeted, so it outperforms the general model on infrared images. Furthermore, unlike the common Haar feature extraction method, the combination of the modified census transform (MCT) and the local gradient pattern (LGP) makes the algorithm very insensitive to illumination changes, so the detection results are better. Once the model parameters of the whole algorithm are obtained, for a given new image the MCT and LGP features are first extracted and the integral image computed, then a window (40 × 40 pixels) slides over all positions of the image and the waterfall cascade model evaluates each window to judge whether it is a face region.
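A minimal sketch of this pyramid sliding-window detection stage follows; the cascade scorer is a stub, and the scales, stride, and threshold are illustrative assumptions.

```python
# Pyramid-scale the frame, slide a 40x40 window, score each window with the
# cascade, and map hits back to original-image coordinates.
import numpy as np

def cascade_score(window):    # placeholder: the real scorer is the trained cascade
    return float(window.mean() > 128)

def detect_faces(img, scales=(1.0, 0.8, 0.64), win=40, step=8, thr=0.5):
    hits = []
    for s in scales:
        h, w = int(img.shape[0] * s), int(img.shape[1] * s)
        # nearest-neighbor resize keeps the sketch dependency-free
        ys = (np.arange(h) / s).astype(int)
        xs = (np.arange(w) / s).astype(int)
        small = img[ys][:, xs]
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                if cascade_score(small[y:y + win, x:x + win]) > thr:
                    # scale the detection back to the original image
                    hits.append((int(x / s), int(y / s), int(win / s)))
    return hits

boxes = detect_faces(np.random.randint(0, 256, (120, 160)).astype(np.uint8))
```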
For the eye localization algorithm, FloatBoosting is used as the classifier model; apart from the classifier model and the feature extraction, the whole detection procedure is essentially the same as face detection.
The concrete steps are as follows:
(1) Feature extraction phase: the multi-block modified census transform (MB-MCT) extracts the image features of all eye-region and non-eye-region images (from both the infrared and general databases); the concrete MB-MCT scheme is shown in Fig. 10.
(2) Positive/negative sample construction phase: the images in the general database and the infrared image database are sorted into eye-region images and non-eye-region images, and the eye-region images are scaled to 15 × 15 pixels.
(3) Original training phase: the traditional FloatBoosting algorithm builds a strong classifier on the general database; the classifier feature is MB-MCT.
(4) Transfer learning phase: the FloatBoosting algorithm builds a strong classifier on the infrared database while also taking into account the model obtained on the general database, optimizing a specific training objective so that the resulting model has the characteristics of both the general model and the infrared image data, overcoming the shortage of infrared image data.
(5) Detection phase: the FloatBoosting model obtained in the transfer learning phase detects eye regions on infrared images. Finally, the multiple potential eye rectangles obtained are averaged, and the center point of the rectangle is taken as the eyeball position.
In the feature extraction phase, the MB-MCT feature is computed as:

$$T(x', y') = \begin{cases} 1 & \text{if } R(x', y') > \bar R \\ 0 & \text{otherwise} \end{cases}, \qquad \bar R = \frac{1}{9} \sum_{x'=x-1}^{x+1} \sum_{y'=y-1}^{y+1} R(x', y')$$

where $R(x, y) = ii(x+1, y+1) - ii(x, y+1) - ii(x+1, y) + ii(x, y)$ and $ii(x, y)$ is the integral image.
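The integral image makes each block response R an O(1) lookup; the sketch below illustrates this and the block comparison T, with the zero-padding convention assumed.

```python
# Integral image, O(1) block sums, and the MB-MCT block comparison.
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; one extra row/column of zeros
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(0).cumsum(1)
    return ii

def block_sum(ii, y, x, size):
    # response R of a size x size block with top-left corner (y, x)
    return ii[y + size, x + size] - ii[y, x + size] - ii[y + size, x] + ii[y, x]

def mbmct_bit(ii, y, x, size):
    # T: 1 iff the block's response exceeds the mean over its 3x3 block grid
    r = [block_sum(ii, y + dy * size, x + dx * size, size)
         for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return int(block_sum(ii, y, x, size) > sum(r) / 9.0)

img = np.random.randint(0, 256, (15, 15))
ii = integral_image(img)
assert block_sum(ii, 2, 3, 4) == img[2:6, 3:7].sum()
print(mbmct_bit(ii, 4, 4, 2))
```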
Because MB-MCT can reuse the MCT feature extraction results from face detection, repeated feature extraction is avoided and the processing time required on the digital signal processor drops significantly.
The FloatBoosting objective used in the original training phase is:

$$Loss(H_M) = \sum_i \exp\big(-y_i H_M(x_i)\big), \qquad h_m = \arg\min_h Loss\big(H_{M-1}(x) + h(x)\big)$$
where $y_i$ is the label of the $i$-th instance, $H_M(x_i)$ is the classification result of the strong classifier (after $M$ iterations) on instance $x_i$, and $h(x)$ is the new weak classifier. The optimization used for transfer learning is:
$$J = \sum_i \exp\big(-y_i H_M(x_i)\big) + \lambda\, KL\big(H_M, \bar H_M\big)$$
where $\bar H_M$ is the strong classifier learned on the general data and $\lambda$ is a balance parameter.
After the algorithm model parameters for infrared images are obtained, given any new face region (assuming the face region has been detected), the algorithm pyramid-scales the face image and slides a 15 × 15 pixel window over all positions in each scaled face region, evaluating each window to judge whether it contains an eye region. To shorten the processing time on the embedded platform, the face region can be divided into 4 blocks, with eye detection carried out only in the upper two regions, see Fig. 11. After all regions that may contain eyes are obtained, they are averaged to obtain the eye region position, and the center of that region is taken as the eye position. Unlike the common eye detection methods based on the adaptive boosting algorithm (AdaBoost) or the DCT transform, this method takes FloatBoosting as the classifier and MB-MCT as the feature extraction method; machine test results show that MB-MCT is more stable and more discriminative than the DCT transform.
For the blink classification algorithm, after the exact eye positions are obtained, the area near the eyes is extracted, its multi-block modified census transform (MB-MCT) feature values are computed, and the current feature vector is classified. Because blink classification is a two-class problem with a high feature dimension, a nonlinear support vector machine (SVM) with an RBF kernel is adopted. The formulation of the RBF-kernel SVM is:
$$\tilde L(\alpha) = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j)$$

$$\text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^n \alpha_i y_i = 0$$
where $\alpha_i$ is the hidden variable of the $i$-th instance, $y_i$ is its label, $x_i$ is its input feature vector, $k(x_i, x_j)$ is the similarity between the $i$-th and $j$-th instances (realized by the kernel $k$; here the RBF kernel is adopted), and $C$ is the penalty parameter. The workflow of the whole blink classification algorithm is as follows:
Feature extraction phases: revise central transformation (MBMCT) eigenwert by multimode and extract all non-eye closing images and eye closing characteristics of image (comprising infrared and Universal Database).Eye areas image size is 15 × 15 pixels.
The original training stage: the eigenwert of the eye closing image obtained above Universal Database and non-eye closing image is input in SVM classifier and trains, and specify series of parameters, such as punish parameter, nuclear parameter, optimized algorithm.
The shift learning stage: adopt SVM algorithm to carry out the structure of new sorter on infrared data storehouse, and take into account the model obtained above Universal Database simultaneously, go to optimize new training objective equation, make obtained model not only have the feature of universal model but also have the feature of infrared picture data, overcome infrared image training data quantity not sufficient problem.The formula of concrete shift learning is similar to Face datection, is all to adopt KL distance to judge.
Sorting phase: utilize the SVM model that the shift learning stage obtains, suppose that eye areas detects, the general partial binary feature of multimode is extracted to the eye areas detected, eigenwert is input in the SVM model trained and goes, obtain classification results, also can obtain the degree of belief of classification results simultaneously.
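As an illustration of the classification stage, the sketch below trains an RBF-kernel SVM with scikit-learn; the library choice and the synthetic stand-in features are assumptions, as the patent specifies the model, not an implementation.

```python
# RBF-kernel SVM for two-class blink classification on stand-in features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_open = rng.normal(0.0, 1.0, (200, 64))    # stand-ins for MB-MCT features
X_closed = rng.normal(1.5, 1.0, (200, 64))  # of open-eye / closed-eye windows
X = np.vstack([X_open, X_closed])
y = np.array([0] * 200 + [1] * 200)         # 0 = open, 1 = closed (blink)

clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X, y)

# classification plus a confidence ("degree of belief") for a new window
feat = rng.normal(1.5, 1.0, (1, 64))
print(clf.predict(feat), clf.predict_proba(feat))
```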
Compared with other image processing techniques, such as detecting the upper and lower eyelids with an erosion algorithm and then judging the eye state from the eyelid distance, the MB-MCT-based feature extraction is less sensitive to illumination and more robust to image noise, and experiments prove its features are more separable. Relative to the plain local binary feature extraction method, the multi-block variant was found to suit blink classification better, and computer experiment results show that the RBF-kernel SVM reaches a higher classification accuracy than the linear SVM algorithm. Unlike those methods, the algorithm judges the state of each eye independently and then integrates the states of the two eyes to obtain the final result. Meanwhile, to strengthen the accuracy of the algorithm, an interface is provided for the user to revise the threshold of the classification results.
For the mouth localization and tracking algorithm, as with eye localization, a Boosting model and MB-MCT feature extraction are adopted; the differences are the training images (mouth regions and concrete mouth key-point positions), the sliding window size used for mouth detection, and the starting position of the detection. Unlike the eye detection module, a tracking function is added for the mouth, so the mouth position can still be found when the localization function does not work properly. Because the working principle of mouth localization is the same as eye localization, the detection process is not described again.
The tracking algorithm assumes the driver's head cannot move very far between any few consecutive frames. Therefore, even when no face is detected, it can still be assumed that a face exists in the current frame near the face region of the previous frame, so the tracking algorithm searches and matches over all currently potential areas, finds the most similar mouth region, and judges its similarity. If the similarity is greater than a certain threshold, the tracking succeeds; otherwise it fails, and it can be judged that no face exists in the current image, i.e. the driver is not currently in the driver's seat.
Differing from common tracking modules, the utility model embodiment adopts the speeded-up robust features algorithm (SURF) as the feature extraction method; SURF is fast, giving it a speed advantage over other feature extractions when tracking. In addition, histogram intersection is used to weigh the similarity between two vectors. Unlike the cosine similarity, the histogram intersection measure is extremely fast, involves no multiplication or floating-point operations, and is close to cosine similarity in measurement effect. The histogram intersection similarity formula is:
$$K_\cap(a, b) = \frac{1}{2} \sum_{i=1}^n \big(a_i + b_i - |a_i - b_i|\big)$$
where $a, b$ are two statistical histograms and $a_i, b_i$ are the $i$-th frequency values of $a$ and $b$.
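The identity $(a_i + b_i - |a_i - b_i|)/2 = \min(a_i, b_i)$ is what lets the formula avoid multiplications; the plain-NumPy sketch below checks it, as an illustration.

```python
# Histogram intersection similarity without multiplications.
import numpy as np

def hist_intersection(a, b):
    a, b = np.asarray(a, dtype=np.int64), np.asarray(b, dtype=np.int64)
    # (a + b - |a - b|) / 2 equals min(a, b) elementwise
    return int(((a + b - np.abs(a - b)) // 2).sum())

a = np.array([3, 0, 5, 2])
b = np.array([1, 4, 5, 0])
assert hist_intersection(a, b) == np.minimum(a, b).sum()  # = 1 + 0 + 5 + 0 = 6
```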
For facial expression recognition, after the face is detected and the eyes are located, the face region is normalized by linear interpolation and affine transformation into a small 40 × 50 pixel image, and a binary mask shields some regions of the image (the forehead, chin, and other regions with little relation to expression recognition). The multi-block modified census transform (MB-MCT) feature values of the new image are then extracted and fed into a neural network for classification. The final classification result is the facial expression; the algorithm currently comprises 4 expression classes, neutral, happy, angry, and surprised, though more expressions can of course be added as needed.
The binary mask used by the utility model embodiment is shown in Fig. 8.
The neural network used by the utility model embodiment is a 4-layer convolutional neural network (CNN); its architecture is shown in Fig. 14. The first layer's input is the feature values extracted from the image, the second layer is a convolutional layer, the third layer is a hidden layer, and the fourth layer is the classification layer. The filters of the convolutional layer are 10 × 10 pixels with a stride of 2 pixels; the hidden layer is fully connected to the classification layer, and its activation function is the ReLU function, computed as:
$$f(x) = \max(0, x)$$
Its shape is shown in Fig. 9. Compared with the common Sigmoid or Tanh activation functions, ReLU suffers neither vanishing nor exploding derivatives during training, which makes the whole training process more stable; moreover the ReLU computation is simpler and needs no floating-point operations, greatly reducing the processing time on the embedded platform. Compared with common neural-network-based expression recognition algorithms, the network architecture of this embodiment differs: a binary mask shields the image regions with little relation to expression recognition, and a convolutional layer is used, making the algorithm less sensitive to subtle image changes and the final accuracy higher.
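The sketch below reproduces the described architecture in PyTorch under stated assumptions: the library choice and the channel and hidden-layer widths are not given in the patent and are placeholders, while the 40 × 50 single-channel input, 10 × 10 filters with stride 2, ReLU, and 4-way output follow the text.

```python
# 4-layer expression classifier: input, convolution, hidden layer, output.
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    def __init__(self, channels=8, hidden=128, classes=4):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, kernel_size=10, stride=2)
        # 40x50 input -> feature map of (40-10)/2+1 = 16 by (50-10)/2+1 = 21
        self.fc = nn.Linear(channels * 16 * 21, hidden)
        self.out = nn.Linear(hidden, classes)   # neutral/happy/angry/surprised
        self.relu = nn.ReLU()                   # f(x) = max(0, x), as above

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = self.relu(self.fc(x.flatten(1)))
        return self.out(x)

net = ExpressionNet()
logits = net(torch.randn(1, 1, 40, 50))   # batch of one normalized face map
print(logits.shape)                       # torch.Size([1, 4])
```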
Meanwhile, the MB-MCT transform is adopted; compared with expression feature extraction based on the DCT transform or on templates, the MB-MCT features are more discriminative.
Because most computer vision algorithms need training and different training data can affect the final result, the training data details of the modules that need training are, for reasonably good results, as follows:
Face detection: to reduce the false alarm rate, we train our model with the many face images on the internet. The training data are huge and contain a large number of negative samples (non-face images); the whole training sample comprises face images of different races, skin colors, ages, sexes, and poses.
Eye detection: 10,000 pictures containing eye regions and 10,000 pictures not containing eye regions were collected; these 20,000 pictures cover various skin colors, ages, sexes, and so on. Training is first carried out on these pictures to obtain a general eye-region detection model; afterwards we gather additional infrared images containing eye regions and revise the model, so that the resulting model better fits infrared images.
After training completes, every module retains only the test procedure and the trained model.
Because each of the algorithms above must run on the embedded platform, a fixed-point implementation is adopted to avoid floating-point operations, which greatly increases the running speed of the whole system.
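A minimal sketch of fixed-point arithmetic of the kind implied here follows; the Q15 format is an assumption, since the patent only states that a fixed-point implementation replaces floating point.

```python
# Q15 fixed-point arithmetic: reals in [-1, 1) as integers with 15
# fractional bits; multiply with integers only, then renormalize.

Q = 15
ONE = 1 << Q

def to_q15(x):               # encode a real value in [-1, 1)
    return int(round(x * ONE))

def q15_mul(a, b):           # integer multiply, shift back to Q15
    return (a * b) >> Q

a, b = to_q15(0.5), to_q15(0.25)
print(q15_mul(a, b) / ONE)   # ~0.125, computed without floating point
```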
Specifically, given a frame of image, as shown in Fig. 5, the analysis flow is as follows:
(1) External image information is obtained through the camera.
(2) A face is detected by the real-time face detection module.
(2-a) If no face is detected, the mouth tracking module is started to see whether the mouth can be tracked. If it can, the whole flow exits, marking the current result as coming from the tracking module; otherwise, it is directly judged that the current driver is not in the driver's seat, and the whole flow exits.
(2-b) If faces are detected, the sizes of all detected face regions are compared, and the largest face is chosen for subsequent analysis.
(3) Eye detection is carried out on the largest face region obtained.
(3-a) If the eyes are detected, their positions are saved, their state is judged, and the eye state result is saved.
(3-b) If the eyes cannot be detected, the absence of eye information is saved, and the eye state is treated as normal.
(4) Mouth detection is carried out on the largest face region obtained.
(5) If the mouth is detected, its position is saved; otherwise the mouth tracking module is started and the mouth position is obtained by tracking.
(6) If the eye positions were detected, and the mouth position was detected or tracked, the head pose judgment module is started, and the head pose information is analyzed and saved.
(7) If the eyes were detected, the largest face region is normalized, the emotion recognition module is started, and the emotion recognition result is saved.
When the comprehensive result analysis is carried out, the initial analysis results provided by each analysis module are summarized and processed; according to the analysis, the driver is judged to be in one of the different driving states, and the corresponding prompt is supplied to the driver. Furthermore, while the device runs, the analysis results are continually sent by Bluetooth to the mobile phone application. The prompts currently available cover: distraction (glancing left and right, raising or lowering the head, and so on), divided into slight and severe; fatigue driving (number of eye closures), divided into deep and shallow fatigue; and the emotional state while driving (neutral, angry, happy, surprised, etc.).
The hardware of the utility model can process at least 3 frames per second; a comprehensive analysis judgment is made after every 10 frames analyzed, and after every 20 frames a summary is made and the information is sent to the mobile phone application.
Distraction mainly comprises slight distraction and severe distraction. Slight distraction is mainly reflected in a head pose deviation that is not very large; severe distraction is reflected in a very large head pose deviation, where the driver is still in the driver's seat but no face can be detected (the head deflection is too large).
Fatigue degree: the fatigue degree is judged comprehensively from the blink frequency together with the distance between the upper and lower eyelids in the eye contour. If the blink frequency is too high and the eyelid distance too low, the driver can be judged severely drowsy; if the blink frequency is moderate and the eyelid distance is low, the driver is judged mildly drowsy; if the blink frequency is small and within the normal range, the driver's state is judged normal.
Mood analysis: while driving, an angry expression is more dangerous. Multiple frames are analyzed and weighted statistics computed; if too many angry expressions are detected, the loudspeaker warns the driver to take care to stay calm.
After each judgment, according to the analysis results, the corresponding alarm can be triggered and the corresponding data pushed by Bluetooth to the handheld terminal (such as a mobile phone) or vehicle-mounted unit program.
As shown in Fig. 3, the concrete flow of the above analysis is as follows:
(1) The various data structures and arrays are initialized.
(2) The current image information is obtained from the external camera.
(3) The analysis modules are called to analyze the current image and obtain the analysis results.
(4) If the number of analyzed frames has reached 10, go to step (5); otherwise go to step (2) and continue analyzing the current image.
(5) The results of the current 10 frames are tallied, and the degrees of distraction, fatigue, and mood are judged.
(6) If the driver is distracted, the alarm module is called to play the distraction prompt; go to step (9).
(7) If the fatigue degree is large, the alarm module is called to play the fatigue driving prompt; go to step (9).
(8) If the proportion of anger is large, the alarm module is called to play the please-stay-calm prompt; go to step (9).
(9) The currently saved data are reset to empty; go to step (2) and start the next round of analysis.
(10) If the number of analyzed frames has reached 20, go to step (11); otherwise go to step (2) and continue analyzing the current image.
(11) Statistics are computed over the current 20 frames' analysis results, covering the durations of slight fatigue driving, severe fatigue driving, slight glancing around, severe glancing around, normal driving, and the neutral, happy, surprised, and angry expressions.
(12) The Bluetooth module is called, and the results of step (11) are sent by Bluetooth to the handheld terminal (such as a mobile phone) or vehicle-mounted unit application.
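The sketch below is a loose, runnable rendering of this flow; the per-window thresholds and the frame-result fields are illustrative assumptions.

```python
# Accumulate per-frame results, judge every 10 frames, summarize every 20.

def send_to_phone(stats):                             # placeholder transport
    print(f"pushed {len(stats)} frame results over Bluetooth")

def comprehensive_analysis(frame_results):
    window = []
    for n, r in enumerate(frame_results, 1):
        window.append(r)
        if n % 10 == 0:                               # steps (4)-(9)
            closed = sum(x["eyes_closed"] for x in window[-10:])
            angry = sum(x["expression"] == "angry" for x in window[-10:])
            away = sum(x["head_away"] for x in window[-10:])
            if away >= 5:
                print("warning: please keep your attention on the road")
            elif closed >= 5:
                print("warning: fatigue driving detected")
            elif angry >= 5:
                print("warning: please keep a calm mood")
        if n % 20 == 0:                               # steps (10)-(12)
            send_to_phone(window)
            window = []                               # reset saved data

frames = [{"eyes_closed": i % 3 == 0, "expression": "neutral",
           "head_away": False} for i in range(40)]
comprehensive_analysis(frames)
```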
With the popularity of products such as the Apple Watch and Fitbit, smart hardware is now everywhere, connecting with mobile phones and the like; the handheld-terminal or vehicle-mounted application is mainly used to tally and display the analysis results for the user to consult. Compared with other fatigue driving detection systems, besides the improvements in the algorithms and in the hardware and appearance, the utility model can also connect with mobile terminals such as mobile phones.
The current mobile phone (or other handheld terminal or vehicle-mounted unit) application mainly comprises the following modules: the data reception/storage module, the data statistics module, the data display module, and the smart-bracelet and watch (such as Apple Watch) push module. Specifically:
Data reception/storage module: this module mainly handles the reception and storage of data. When receiving data it pairs with the hardware device by Bluetooth; once pairing succeeds, it starts to accept data. Storage mainly uses the SQLite embedded database interface, depositing the received data into SQLite in real time. This module also queries the SQLite database according to specific conditions.
Data statistics module: this mainly tallies the data in different ways according to the user's requests, for example by week or by hour. In addition, it contains different statistics strategies, for example whether, when finally assessing the user's total safe-driving-habit score, all non-standard driving behaviors are given the same weight or different non-standard behaviors are given different weights.
Data display module: this module mainly displays the statistics. The display can be aggregated by week or by month, and the user can choose between different display modes such as pie and bar charts. The displayed data comprise the durations of deep drowsiness, mild drowsiness, slight glancing around, severe glancing around, the various emotional states, and normal driving. After the basic score information is displayed, if the user clicks any sub-score the software enters the corresponding display interface to show more detailed statistics.
Apple Watch push module: if an Apple Watch can be connected, then after deep drowsiness or severe glancing around is analyzed, a push with vibration can be sent to the Apple Watch to wake the driver up.
It should be emphasized that the utility model is not limited to the iOS system; it is equally applicable to the Android system.
As shown in Fig. 4, the workflow of the mobile phone (or other handheld terminal or vehicle-mounted unit) application is as follows:
(1) The application on the handheld terminal (such as a mobile phone) or vehicle-mounted unit is opened.
(2) The user is prompted to turn on Bluetooth on the handheld terminal or vehicle-mounted unit.
(3) The application pairs with the hardware device over Bluetooth; if pairing succeeds, go to step (4), otherwise keep attempting to pair, repeating step (3).
(4) The Bluetooth port is monitored continually to see whether data arrive from the hardware device; if so, go to step (5), otherwise keep monitoring the Bluetooth port, repeating step (4).
(5) The data are accepted and deposited into the SQLite database; go to step (4).
(6) If the user chooses to display data, the corresponding analysis and statistics module is called.
(7) The data are displayed according to the user's needs.
To sum up, the utility model can detect the automobile driver's driving state in real time and improve traffic safety; combining computer vision and infrared imaging technology, it detects many kinds of parameters and provides a safe-driving reference for the driver. For example, face detection technology detects the range of activity of the driver's face, the eye localization algorithm locates the driver's eyes, blink classification judges the closure state of the driver's eyes, head pose estimation judges the driver's 3D head pose in real time, and the expression recognition algorithm detects the driver's expression (neutral, happy, surprised, angry, etc.) in real time. This information is then classified comprehensively to identify the driver's various non-standard driving behaviors, including glancing around while driving, feeling bad, fatigue driving, and the driver not being in the driver's seat. In addition, the utility model can receive, store, analyze, and display the data in real time on a handheld terminal such as a mobile phone or on a vehicle-mounted unit, so the driver can browse and regulate his or her own driving habits. In general, the utility model has the advantages that the algorithms are more accurate and more robust, more extreme cases can be handled, a narrower-spectrum infrared fill light can be selected, and a high-quality infrared filter can be selected to completely eliminate the various external optical interferences and noise. The monitoring system provided by the utility model can also achieve a small volume at a low, controllable hardware cost, making it easy to popularize widely.
As those skilled in the art will know, the utility model can be realized by other embodiments without departing from its spirit or essential features. Accordingly, the embodiments disclosed above are in every respect illustrative, not exclusive. All changes within the scope of the utility model or within a scope equivalent to that of the utility model are included in the utility model.

Claims (6)

1. A driver driving state monitoring system, characterized in that it comprises a processor, a photographing module, a memory module, and a loudspeaker, wherein:
the processor comprises a main control unit, an arithmetic unit, an internal memory unit, and a system bus;
the photographing module is connected with the processor and obtains image information of the car driver's seat, which is sent to the internal memory unit via the system bus;
the memory module stores algorithm model files, parameters, and user profiles; it is connected with the processor, and the processor can read and modify the data stored in the memory module;
the loudspeaker is connected with the processor and announces warning messages to the user on the processor's instruction;
the main control unit handles the logical judgments made during system operation and also controls the connection and startup of the hardware modules; the arithmetic unit processes the data read from the internal memory unit on the orders of the main control unit and outputs the results to the main control unit; the internal memory unit provides memory support for the arithmetic unit and the main control unit;
the driver driving state monitoring system further comprises:
a face detection module, which detects whether the current driver is within detectable range; if no face is detected, the driver is prompted to adjust the position of the camera until a face can be detected; once a face is detected, the main control unit makes the loudspeaker play a prompt that the system has started working;
a mouth localization and tracking module, which continually obtains the image information from the camera and performs mouth detection, localization, and tracking;
an eye localization and blink classification module, which locates the eyes and analyzes the distance between the upper and lower eyelids until the number of analyzed frames exceeds a set amount, initializes the blink classification threshold according to that distance, and then performs blink classification judgments; and
a facial expression recognition module, which continually obtains the image information from the camera, analyzes it to detect the driver's expression in real time, and comprehensively analyzes the driver's driving state.
2. The driver driving state monitoring system according to claim 1, characterized in that the photographing module adopts an infrared camera, with an added infrared fill light and optical filter.
3. The driver driving state monitoring system according to claim 2, characterized in that it also comprises a general database and an infrared image database, and the face detection module, the mouth localization and tracking module, the eye localization and blink classification module, and the facial expression recognition module all obtain their models by transfer learning, transferring the models trained on the general database to the infrared image database.
4. The driver driving state monitoring system according to claim 1, characterized in that it also possesses a Bluetooth module, and the processor connects with a handheld terminal or vehicle-mounted unit through the Bluetooth module.
5. The driver driving state monitoring system according to claim 1, characterized in that it also possesses a wireless communication module, and the processor connects with a handheld terminal or vehicle-mounted unit through the wireless communication module.
6. The driver driving state monitoring system according to claim 1, characterized in that the processor is a Blackfin53x digital signal processor.
CN201520900920.9U 2015-11-12 2015-11-12 Driver drive state monitoring system Active CN205230272U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201520900920.9U CN205230272U (en) 2015-11-12 2015-11-12 Driver drive state monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201520900920.9U CN205230272U (en) 2015-11-12 2015-11-12 Driver drive state monitoring system

Publications (1)

Publication Number Publication Date
CN205230272U true CN205230272U (en) 2016-05-11

Family

ID=55905320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201520900920.9U Active CN205230272U (en) 2015-11-12 2015-11-12 Driver drive state monitoring system

Country Status (1)

Country Link
CN (1) CN205230272U (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354986A (en) * 2015-11-12 2016-02-24 熊强 Driving state monitoring system and method for automobile driver
CN105354986B (en) * 2015-11-12 2017-12-01 熊强 Driver's driving condition supervision system and method
CN107194346A (en) * 2017-05-19 2017-09-22 福建师范大学 A kind of fatigue drive of car Forecasting Methodology
CN107714057A (en) * 2017-10-01 2018-02-23 南京邮电大学盐城大数据研究院有限公司 A kind of three classification Emotion identification model methods based on convolutional neural networks
CN108597183A (en) * 2018-03-28 2018-09-28 佛山正能光电有限公司 A kind of fatigue alarming method and device
CN110363093A (en) * 2019-06-19 2019-10-22 深圳大学 A kind of driver's action identification method and device
CN110796838A (en) * 2019-12-03 2020-02-14 吉林大学 Automatic positioning and recognition system for facial expressions of driver
CN112298172A (en) * 2020-10-20 2021-02-02 江苏亚楠电子科技有限公司 Vehicle-mounted active safety system, method and storage medium

Similar Documents

Publication Publication Date Title
CN105354986A (en) Driving state monitoring system and method for automobile driver
CN205230272U (en) Driver drive state monitoring system
Lu et al. Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals
US10176405B1 (en) Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations
CN106682602B (en) Driver behavior identification method and terminal
Seshadri et al. Driver cell phone usage detection on strategic highway research program (SHRP2) face view videos
Shi et al. Real-time traffic light detection with adaptive background suppression filter
Zhang et al. Driver yawning detection based on long short term memory networks
Qin et al. Distracted driver detection based on a CNN with decreasing filter size
CN101950355A (en) Method for detecting fatigue state of driver based on digital video
US20220277558A1 (en) Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium
Dua et al. AutoRate: How attentive is the driver?
Ragab et al. A visual-based driver distraction recognition and detection using random forest
Mafeni Mase et al. Benchmarking deep learning models for driver distraction detection
CN110135327A (en) A kind of driving behavior recognition methods based on multi-region feature learning model
Zhang et al. Hand gesture recognition based on HOG-LBP feature
Dipu et al. Real-time driver drowsiness detection using deep learning
Lu et al. Dilated Light-Head R-CNN using tri-center loss for driving behavior recognition
Wang et al. Driver action recognition based on attention mechanism
Liu et al. Eye state detection based on weight binarization convolution neural network and transfer learning
Wang et al. FPT: fine-grained detection of driver distraction based on the feature pyramid vision transformer
Liu et al. 3DCNN-based real-time driver fatigue behavior detection in urban rail transit
Pandey et al. Dumodds: Dual modeling approach for drowsiness detection based on spatial and spatio-temporal features
Poon et al. Driver distracted behavior detection technology with YOLO-based deep learning networks
Bekka et al. Distraction detection to predict vehicle crashes: a deep learning approach

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191224

Address after: Bantian Street, Longgang District, Shenzhen City, Guangdong Province 518129

Patentee after: Dream Innovation Technology (Shenzhen) Co., Ltd.

Address before: Room 616, Block B, Hancon Center, No. 21 Dengliang Road, Nanshan District, Shenzhen City, Guangdong Province 518054

Co-patentee before: Wang Hai

Patentee before: Xiong Qiang

TR01 Transfer of patent right