CN108216254B - Road anger emotion recognition method based on fusion of facial image and pulse information


Info

Publication number
CN108216254B
Authority
CN
China
Prior art keywords
driver
pulse
road rage
features
emotion
Prior art date
Legal status
Active
Application number
CN201810022416.1A
Other languages
Chinese (zh)
Other versions
CN108216254A (en)
Inventor
杨立才
于申浩
张成昱
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University
Priority to CN201810022416.1A
Publication of CN108216254A
Application granted
Publication of CN108216254B


Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a road rage emotion recognition method based on the fusion of facial images and pulse information. The method collects a driver's facial images and pulse information, preprocesses each of them, extracts from each the feature quantities that reflect the driver's road rage emotion, fuses the two sets of extracted features, establishes a driver road rage emotion recognition model using algorithms and techniques from data mining and machine learning, monitors the driver's emotional state in real time, and judges it promptly. The method realizes real-time monitoring and discrimination of a driver's road rage emotion and can effectively reduce the adverse effect of road rage on traffic safety.

Description

Road anger emotion recognition method based on fusion of facial image and pulse information
Technical Field
The invention belongs to the field of road traffic driving safety, and particularly relates to a road rage emotion recognition method based on the fusion of facial images and pulse information.
Background
With the continuous improvement of living standards and the rapid development of the transportation industry, the numbers of private cars, taxis, buses, and trucks of all kinds grow year by year. This brings great convenience to daily travel and life, but it also makes road traffic conditions increasingly complicated. In addition, the fast pace of life and work leaves people under more and more stress; when that stress is carried onto the road, it causes various adverse effects, one of which is road rage.
"road rage" refers to an offensive or angry behavior of a vehicle driver, which is mainly manifested by stigmatizing words or gestures, intentionally using a safety-threatening way to drive a vehicle, implementing behaviors that threaten the life and health of others, and the like. Road traffic safety is an important component of social public safety, and with the increase of vehicles on the road and the increasingly complex road traffic conditions, the influence of road rage on road traffic safety is increasingly prominent. Data published in 2015 as "traffic safety date" shows that 1733 thousands of road rage events are investigated in the whole country, and aggressive driving behaviors caused by the road rage have seriously influenced urban road traffic order.
At present, domestic and foreign research on road rage mainly covers its generation mechanism and influencing factors, the behavioral manifestations of angry driving, the influence of angry driving on traffic safety, and the recognition of road rage while driving. Research on the generation mechanism, behavioral manifestations, and adverse effects of angry driving is relatively mature, but research on road rage emotion recognition is scarce; the driver's road rage emotion is judged mainly by subjective survey methods such as interviews, questionnaires, and observation. Although subjective surveys are direct and simple to conduct, they are easily influenced by the driver's subjective factors and suffer from time lag, so they cannot provide timely and effective help or early warning against unsafe driving behavior. Objective analysis is the most effective approach to road rage recognition: objective information such as the driver's voice, facial images, and physiological parameters is collected, and the driver's road rage emotion is discriminated objectively, truthfully, and in real time through data analysis. Limited by traditional information acquisition and data processing technology, however, existing objective methods mainly use facial images or voice alone as the information source, so the source is single and the recognition rate falls short of practical application. Exploring road rage emotion recognition based on multi-source information fusion, using modern data acquisition and information fusion technology together with data mining, machine learning, and related algorithms, aims to recognize the driver's road rage emotion accurately, thereby reducing or even avoiding its adverse effects on road traffic safety; this has important theoretical significance and application value.
Disclosure of Invention
The invention aims to provide a road rage emotion recognition method based on the fusion of facial images and pulse information, which accurately recognizes and gives early warning of the driver's road rage emotion based on multi-source information and data fusion technologies, thereby preventing adverse consequences such as traffic accidents caused by road rage.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a road anger emotion recognition method based on fusion of facial images and pulse information comprises the following steps:
step (1): data acquisition:
acquiring facial video image information of a driver through an infrared high-speed camera arranged on an instrument panel in the vehicle;
acquiring pulse information of a driver through a wrist strap type wireless pulse acquisition terminal worn on the wrist of the driver;
step (2): data preprocessing:
acquiring a face image of the driver from the face video image information, and extracting facial features from the face image;
extracting pulse features from the pulse information;
carrying out feature fusion and dimension reduction processing on the facial features and the pulse features to obtain road rage emotion features of a driver;
step (3): training on the driver's road rage emotion features using a support vector machine (SVM) to establish a driver road rage emotion recognition model;
step (4): monitoring the driver's emotional state in real time with the trained road rage emotion recognition model and judging the driver's road rage emotional state.
Further, in the step (1), the infrared high-speed camera firstly positions the driver and then collects the face video image; the wrist strap type wireless pulse acquisition terminal adopts a photoelectric pulse sensor to acquire pulse information of a driver.
Further, in the step (2), the facial features are extracted as follows:
a face image of the driver is obtained from the facial video information, and facial features are extracted from it by these steps:
a set time period is selected from the video image information and several consecutive frames are extracted and denoised; the face region in each image is identified and segmented with an adaptive skin-color segmentation method based on the YCbCr color space; facial features are then extracted from the face region based on gray-level integral projection, the face region comprising the eye, eyebrow, and mouth regions.
In the eye region, the extracted features include the number of blinks, blink frequency, degree of eye opening and closing, duration of abnormal opening and closing per unit time, and average blink time; if the duration of abnormal opening and closing per unit time exceeds a set threshold, the driver is judged to be in an angry state;
in the eyebrow region, the extracted features include the upward offset of the eyebrows and the relative offset of the left and right eyebrows; if the relative offset of the left and right eyebrows exceeds a set threshold, the driver is judged to be in an angry state;
in the mouth region, the extracted features include the degree of abnormal mouth opening; if it exceeds a set threshold, the driver is judged to be in an angry state;
the features extracted from the eye, eyebrow, and mouth regions are fused, and the driver's anger state is judged comprehensively, which improves the recognition rate.
Further, in the step (2), the pulse features are extracted as follows:
the pulse signal is filtered and denoised based on the wavelet transform, and pulse features, including linear and nonlinear features, are extracted from the time domain and the frequency domain respectively.
Time-domain pulse features include the mean, standard deviation, root mean square, first-order difference, or second-order difference of the intervals between adjacent pulse dominant-wave peaks;
frequency-domain pulse features include the low-frequency power spectrum, the high-to-low frequency ratio, the power spectrum peak, or the peak frequency;
nonlinear features include the heart rate variability (HRV) correlation index and the Lyapunov exponent.
Further, in the step (2), the feature fusion and dimensionality reduction proceed as follows:
feature fusion and dimensionality reduction are applied to the facial and pulse features to obtain the driver's road rage emotion features:
the facial features and pulse features are fused into a multi-dimensional feature space, which is then reduced in dimension to yield the driver's road rage emotion features.
Further, in the step (3), part of the experimental samples are drawn from the emotion recognition feature vector matrix to form a training set; the training set serves as the input vectors of the support vector machine (SVM), which is trained to improve its generalization ability and robustness, finally yielding the driver road rage emotion recognition model.
The step (3) of establishing the driver road rage emotion recognition model comprises the following steps:
step (31): randomly dividing road rage emotion characteristics of the driver subjected to PCA dimension reduction into a training set and a testing set;
step (32): selecting a radial basis function (RBF) kernel and determining the penalty factor C and the number of cross-validation folds;
step (33): training a recognition model by using the training set data, and calculating the recognition rate by using the test set;
when the recognition rate meets the expected requirement, model training is finished;
when it does not, the penalty factor C is optimized and the model is retrained on the training set until the recognition rate meets the expected requirement.
The conversion between the YCbCr and RGB color spaces is:

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$

where Y represents luminance, i.e., the gray value; Cb reflects the difference between the blue component of the RGB input signal and its luminance; Cr reflects the difference between the red component and its luminance.
After color space conversion, continuous data are formed by computing a probability value for each pixel, yielding a skin-color probability map; binarization then gives a binary skin-color image of the face, and the minimum bounding rectangle of the binary image is taken as the face region.
Facial features, covering the eye, eyebrow, and mouth regions, are extracted from the face region based on gray-level integral projection. The integral projection formulas are:

horizontal direction:

$$H(y) = \int_{x_1}^{x_2} I(x, y)\, dx$$

vertical direction:

$$V(x) = \int_{y_1}^{y_2} I(x, y)\, dy$$

where I(x, y) is the gray value at point (x, y) in the image.
The wavelet transform formula is:

$$WT_f(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt$$

where f(t) is the signal to be processed; ψ(t) is the wavelet function; a is the scale of the wavelet transform, controlling the dilation of the wavelet function and corresponding to frequency; τ is the offset, controlling the translation of the wavelet function and corresponding to time.
The dimensionality reduction uses principal component analysis (PCA), whose principal components are computed as:

$$P_i = (X_1, X_2, X_3, \ldots, X_p)(L_1, L_2, L_3, \ldots, L_p)^T$$

where X_j (j = 1, 2, 3, …, p) are the column vectors of the feature matrix and (L_1, L_2, …, L_p) is the i-th loading vector of the principal component loading matrix L.
The radial basis kernel function formula is:

$$K(x_i, x_j) = \exp\left(-\gamma \left\| x_i - x_j \right\|^2\right)$$

where γ > 0; its default value is 1/k, with k the number of categories.
Compared with the prior art, the invention has the following innovations:
1. Compared with traditional single-image-source methods, the method jointly uses the driver's pulse information and facial images, realizing road rage emotion recognition based on multi-source feature fusion;
2. when extracting facial image features, key features are taken from the eye, eyebrow, and mouth regions separately, and combining these features improves the road rage recognition rate;
3. the road rage emotion recognition model is established with a principal component analysis (PCA) method and a support vector machine (SVM) algorithm;
4. a facial feature extraction method combining skin-color segmentation with gray-level integral projection is proposed, which quickly and accurately extracts the driver's road rage emotion features from the face region.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of a method for identifying road rage emotion of a driver;
FIG. 2 is a schematic view of an installation of the information acquisition device;
FIG. 3 is a pulse waveform diagram;
FIG. 4 is a flow chart for establishing an SVM road rage emotion classification recognition model.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
FIG. 1 shows the flow of the driver road rage emotion recognition method. The whole process comprises four parts: acquisition and preprocessing of the driver's raw information, feature extraction from the pulse and facial image information, feature fusion and dimensionality reduction, and establishment of the road rage emotion recognition model. The information acquisition devices comprise an infrared high-speed camera and a wristband wireless pulse acquisition terminal.
Fig. 2 shows the position of each acquisition device. In fig. 2, the infrared high-speed camera sits on the instrument panel directly in front of the driver and serves the dual functions of positioning and information acquisition: once the driver is in a reasonable acquisition position, the camera begins collecting video of the driver;
the wristband wireless pulse acquisition terminal uses a photoelectric pulse sensor to collect the driver's pulse information and transmits the data over wireless Bluetooth;
after the driver's raw video and pulse data are acquired, facial images must be extracted from the video, and both information channels must be preprocessed. The specific methods are:
(1) A key segment is selected from the video and several consecutive frames are extracted and denoised; the face region in each image is then identified and segmented with an adaptive skin-color segmentation method based on the YCbCr color space. The conversion between the YCbCr and RGB color spaces is:

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$

where Y represents luminance, i.e., the gray value; Cb reflects the difference between the blue component of the RGB input signal and its luminance; Cr reflects the difference between the red component and its luminance.
After color space conversion, continuous data are formed by computing a probability value for each pixel, yielding a skin-color probability map. The skin-color probability formula is:

$$p(x) = \exp\left[-\frac{1}{2}(x - m)^T C^{-1} (x - m)\right]$$

where x is the (Cb, Cr) value in the YCbCr color space, m is the sample mean in the YCbCr color space, and C is the covariance matrix:

$$C = E\left\{(x - m)(x - m)^T\right\}$$
and then carrying out binarization to obtain a face complexion binary image, and further obtaining a minimum circumscribed rectangle of the binary image, wherein the part is a face region.
Facial features, covering the eye, eyebrow, and mouth regions, are extracted from the face region based on gray-level integral projection. The integral projection formulas are:

horizontal direction:

$$H(y) = \int_{x_1}^{x_2} I(x, y)\, dx$$

vertical direction:

$$V(x) = \int_{y_1}^{y_2} I(x, y)\, dy$$

where I(x, y) is the gray value at point (x, y) in the image.
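In discrete form these projections are simple row and column sums; the sketch below (NumPy, illustrative) computes both for a segmented face region.

```python
import numpy as np

def integral_projections(gray):
    """Discrete gray-level integral projections of a face region.

    gray : 2-D array indexed [y, x] holding the gray values I(x, y).
    The horizontal projection H(y) sums each row; the vertical
    projection V(x) sums each column. Valleys of H(y) locate the
    eyebrow, eye, and mouth rows; valleys of V(x) separate the eyes.
    """
    H = gray.sum(axis=1)  # one value per row y
    V = gray.sum(axis=0)  # one value per column x
    return H, V
```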
(2) The pulse signal is filtered and denoised based on the wavelet transform. The wavelet transform is an inner product between the signal to be processed and wavelet bases at different scales and shifts:

$$WT_f(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt$$

where f(t) is the signal to be processed; ψ(t) is the wavelet function; a is the scale of the wavelet transform, controlling the dilation of the wavelet function and corresponding to frequency; τ is the offset, controlling the translation of the wavelet function and corresponding to time.
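A minimal denoising sketch along these lines follows, using the PyWavelets package; the wavelet basis (db4), decomposition level, and universal soft threshold are illustrative choices, since the text does not fix them.

```python
import numpy as np
import pywt

def denoise_pulse(pulse, wavelet='db4', level=4):
    """Wavelet-threshold denoising of a 1-D pulse signal.

    Decompose with a discrete wavelet transform, soft-threshold the
    detail coefficients with the universal threshold, and reconstruct.
    """
    coeffs = pywt.wavedec(pulse, wavelet, level=level)
    # Noise scale estimated from the finest-level detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(pulse)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(pulse)]
```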
After preprocessing, features are extracted from the resulting facial image and pulse information respectively.
Preprocessing the facial image yields three key regions: the eyes, eyebrows, and mouth. When anger appears, the facial muscles change dramatically within a short time, driving obvious changes in these three regions, so extracting features from them for recognition improves the road rage recognition rate. The specific features are as follows:
in the human eye region, the extracted features include the number of blinks, the frequency of blinks, the degree of eye openness, the duration of abnormal opening per unit time (i.e., opening greater than 110% in the normal state), and the average blink time. The eye opening degree of the driver in the normal state is taken as a reference standard, and a person is usually accompanied by the obvious characteristic of 'glaring' under the angry condition, so that when the eye opening degree of the driver exceeds the normal state, abnormal opening and closing is judged, and if the duration time exceeds 5s, the driver can be preliminarily judged to be in an angry state;
in the eyebrow region, the extracted features include the upward offset of the eyebrows and the relative offset of the left and right eyebrows. When a driver becomes angry, the eyebrows move together with the glaring action, either rising markedly or furrowing, which displaces the left and right eyebrows relative to each other. The eyebrow-region features are judged in combination with the eye-region features;
in the mouth region, the extracted features include the degree of mouth opening, the degree of abnormal mouth opening (opening greater than normal), the number of mouth openings and closings, and their frequency. An angry person usually vents emotion vocally, most notably by shouting, so the driver's maximum mouth opening during normal speech is the reference standard, and an opening beyond it is judged abnormal. With the eye- and eyebrow-region features as the main criteria for the driver's road rage state, introducing the mouth-region features effectively reduces misrecognition and further improves accuracy.
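The three decision rules can be summarized in a short sketch. The helper below is hypothetical (its name and parameters are ours, and the per-frame measurements are assumed to come from the projection analysis above); the 110 % and 5 s figures are the ones given in the text.

```python
def anger_flags(eye_open, brow_offset, mouth_open, fps,
                eye_base, mouth_base, brow_thr):
    """Threshold the eye, eyebrow, and mouth cues described above.

    eye_open, brow_offset, mouth_open : per-frame measurements
    eye_base   : eye opening in the calm state (abnormal if > 110 %)
    mouth_base : maximum mouth opening during normal speech
    brow_thr   : threshold on the relative left/right eyebrow offset
    """
    abnormal_eye = [e > 1.10 * eye_base for e in eye_open]
    # Longest run of abnormal eye opening, converted to seconds
    run = longest = 0
    for a in abnormal_eye:
        run = run + 1 if a else 0
        longest = max(longest, run)
    eye_flag = longest / fps > 5.0             # "glaring" for more than 5 s
    brow_flag = max(brow_offset) > brow_thr    # relative eyebrow displacement
    mouth_flag = max(mouth_open) > mouth_base  # abnormal mouth opening
    return eye_flag, brow_flag, mouth_flag
```

The three flags are then judged comprehensively, as in the text; a conservative rule might, for example, require the eye flag together with at least one of the other two.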
Pulse features, including linear and nonlinear features, are extracted from the time and frequency domains of the processed pulse information. Fig. 3 shows a typical pulse waveform. A single-cycle pulse signal generally contains the following components: the dominant wave (ascending and descending branches), the dicrotic notch, the dicrotic wave, the pre-dicrotic wave, and so on. Points A and G in the figure are aortic valve opening points, marking one pulse period; point B is the dominant wave peak R, marking the maxima of arterial blood volume and blood pressure; point D is the peak of the pre-dicrotic wave, formed by reflection as the artery dilates and intravascular pressure falls; point E is the dicrotic notch; point F is the dicrotic wave peak.
Linear features are extracted from the time and frequency domains of the pulse information. The time-domain statistical features comprise the mean, standard deviation, root mean square, first-order difference, and second-order difference of the R-R intervals; the frequency-domain features comprise the low-frequency power spectrum, the high-to-low frequency ratio, the power spectrum peak, and the peak frequency. The nonlinear features comprise the heart rate variability (HRV) correlation index and the Lyapunov exponent.
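A sketch of these features, assuming SciPy and NumPy, follows. The peak-detection spacing, the statistics taken over the difference series, and the LF/HF band limits (0.04-0.15 Hz and 0.15-0.40 Hz, the conventional HRV bands) are illustrative; in HRV practice the spectrum is often computed from the interpolated R-R series rather than the raw pulse.

```python
import numpy as np
from scipy.signal import find_peaks, welch

def pulse_features(pulse, fs):
    """Time- and frequency-domain pulse features (illustrative settings).

    pulse : denoised 1-D pulse signal; fs : sampling rate in Hz.
    """
    # Dominant-wave peaks; 0.4 s minimum spacing assumes < 150 bpm
    peaks, _ = find_peaks(pulse, distance=int(0.4 * fs))
    rr = np.diff(peaks) / fs                         # R-R intervals (s)
    feats = {
        'rr_mean': rr.mean(),
        'rr_std': rr.std(ddof=1),
        'rr_rms': np.sqrt(np.mean(rr ** 2)),
        'rr_diff1_std': np.diff(rr, 1).std(ddof=1),  # 1st-order differences
        'rr_diff2_std': np.diff(rr, 2).std(ddof=1),  # 2nd-order differences
    }
    # Frequency domain via Welch's power spectral density estimate
    f, pxx = welch(pulse, fs=fs, nperseg=min(len(pulse), 1024))
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum()         # low-frequency power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum()         # high-frequency power
    feats.update({'lf_power': lf,
                  'hf_lf_ratio': hf / lf,            # high-to-low ratio
                  'psd_peak': pxx.max(),
                  'psd_peak_freq': f[np.argmax(pxx)]})
    return feats
```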
Principal component analysis (PCA) is a dimensionality reduction method widely used in image analysis, data mining, and related fields. Its main idea is to convert the many variables of a multi-dimensional feature vector into a smaller number of principal components, each a linear combination of the initial features and linearly uncorrelated with the others; that is, n-dimensional features are mapped to k new orthogonal dimensions (k < n), achieving dimensionality reduction. The principal components are computed as:

$$P_i = (X_1, X_2, X_3, \ldots, X_p)(L_1, L_2, L_3, \ldots, L_p)^T$$

where X_j (j = 1, 2, 3, …, p) are the column vectors of the feature matrix and (L_1, L_2, …, L_p) is the i-th loading vector of the principal component loading matrix L; the principal components P_i serve as the feature quantities for anger emotion recognition.
and fusing the two extracted features to form a multi-dimensional feature space, then calculating the principal components of the multi-dimensional feature space based on a calculation formula of the principal component analysis method, and taking each principal component with the accumulated contribution rate of 95% or more as a new feature vector matrix so as to achieve the purpose of reducing the dimension.
A support vector machine (SVM) algorithm from machine learning is selected for the driver road rage emotion recognition model. The SVM is a binary pattern recognition algorithm: intuitively, it seeks the linear classifier with the maximum margin in feature space, and its learning strategy of maximizing the distance between the two classes turns the solution into a convex quadratic programming problem. The core strategy of the SVM is to find the separating hyperplane that maximizes the classification margin.
A linearly inseparable problem can be made linearly separable by mapping it into a higher-dimensional space. The SVM overcomes the difficulty of inner product computation in that high-dimensional space by introducing a kernel function, making classification of nonlinear models feasible: the kernel takes two vectors from the low-dimensional space and returns their inner product in the transformed high-dimensional space, from which the classifier is solved.
Part of the experimental samples are drawn from the fused new feature vector matrix as training samples for the input vectors, and the decision maker is trained on them, building a road rage emotion decision maker with high recognition capability and improving its later generalization ability and robustness; this finally yields the driver road rage emotion recognition model.
FIG. 4 is a flow chart for establishing the SVM road rage emotion classification recognition model (a minimal training sketch follows the flow below), and its steps are:
step (1): randomly dividing road rage emotion characteristics of the driver subjected to PCA dimension reduction into a training set and a testing set;
step (2): selecting the kernel function type. The method selects a radial basis function (RBF) kernel because it maps samples nonlinearly, handling nonlinear relations between class labels and attributes, while having lower numerical complexity than other kernels. The radial basis kernel function formula is:
$$K(x_i, x_j) = \exp\left(-\gamma \left\| x_i - x_j \right\|^2\right)$$

where γ > 0; its default value is 1/k, with k the number of categories.
The penalty factor C and the number of cross-validation folds are then determined. A larger C penalizes misclassification more heavily, but too large a C causes overfitting, so a suitable C is important for classification accuracy;
step (3): training the recognition model with the training set data and computing the recognition rate on the test set; when the recognition rate meets the expected requirement, the road rage emotion classification recognition model is judged usable; when it does not, parameters such as the penalty factor C are optimized and the model is retrained on the training set until the recognition rate meets the expected requirement;
step (4): after new data are input, the road rage emotion recognition model produces a road rage emotion recognition result;
when the result is that the driver is angry, the vehicle-mounted alarm device can remind or soothe the driver in time, so that the driver adjusts his or her own emotion promptly, avoiding bad driving consequences and reducing the adverse influence of road rage on road traffic safety.
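A training sketch matching this flow, assuming scikit-learn, is given below; the split ratio, the C grid, and the fold count are illustrative choices, and γ is fixed at 1/k with k = 2 classes, the default stated in the text.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

def train_rage_model(X, y, n_folds=5):
    """Train the RBF-SVM road rage classifier following the Fig. 4 flow.

    X : PCA-reduced road rage features; y : angry / calm labels.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    # Optimize the penalty factor C by cross-validation on the training
    # set; gamma = 1/k with k = 2 classes, per the text's default
    grid = GridSearchCV(SVC(kernel='rbf', gamma=0.5),
                        param_grid={'C': [0.1, 1, 10, 100]}, cv=n_folds)
    grid.fit(X_tr, y_tr)
    model = grid.best_estimator_
    print('test-set recognition rate:', model.score(X_te, y_te))
    return model
```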
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (5)

1. A road rage emotion recognition method based on fusion of facial images and pulse information, characterized in that the driver's road rage emotion is accurately recognized and early-warned based on multi-source information and data fusion technology, comprising the following steps:
step (1): data acquisition:
acquiring facial video image information of a driver through an infrared high-speed camera arranged on an instrument panel in the vehicle;
acquiring pulse information of a driver through a wrist strap type wireless pulse acquisition terminal worn on the wrist of the driver;
step (2): data preprocessing:
acquiring a face image of the driver from the face video image information, and extracting facial features from the face image;
extracting pulse features from the pulse information;
carrying out feature fusion on the facial features and the pulse features, and then carrying out dimension reduction processing based on a principal component analysis method to obtain road rage emotion features of a driver;
step (3): training on the driver's road rage emotion features using a support vector machine (SVM) to establish a driver road rage emotion recognition model;
step (4): monitoring the driver's emotional state in real time with the trained road rage emotion recognition model and judging the driver's road rage emotional state;
in the step (2), the facial features are extracted as follows:
a face image of the driver is obtained from the facial video information, and facial features are extracted from it by these steps:
a set time period is selected from the video image information and several consecutive frames are extracted and denoised; the face region in each image is identified and segmented with an adaptive skin-color segmentation method based on the YCbCr color space; facial features are then extracted from the face region based on gray-level integral projection, the face region comprising the eye, eyebrow, and mouth regions;
in the eye region, the extracted features include the number of blinks, blink frequency, degree of eye opening and closing, duration of abnormal opening and closing per unit time, and average blink time; if the duration of abnormal opening and closing per unit time exceeds a set threshold, the driver is judged to be in an angry state;
in the eyebrow region, the extracted features include the upward offset of the eyebrows and the relative offset of the left and right eyebrows; if the relative offset exceeds a set threshold, the driver is judged to be in an angry state;
in the mouth region, the extracted features include the degree of abnormal mouth opening; if it exceeds a set threshold, the driver is judged to be in an angry state;
the features extracted from the eye, eyebrow, and mouth regions are fused, and the driver's anger state is judged comprehensively, which improves the recognition rate;
in the step (2), the feature fusion and dimensionality reduction proceed as follows:
feature fusion and dimensionality reduction are applied to the facial and pulse features to obtain the driver's road rage emotion features:
the facial features and pulse features are fused into a multi-dimensional feature space, which is then reduced in dimension to yield the driver's road rage emotion features.
2. The road rage emotion recognition method based on fusion of facial image and pulse information as claimed in claim 1, wherein in the step (1), the infrared high-speed camera first positions the driver and then collects the facial video images; the wristband wireless pulse acquisition terminal uses a photoelectric pulse sensor to collect the driver's pulse information.
3. The road rage emotion recognition method based on fusion of facial image and pulse information as claimed in claim 1, wherein in the step (2), the pulse features are extracted as follows:
the pulse signal is filtered and denoised based on the wavelet transform, and pulse features, including linear and nonlinear features, are extracted from the time domain and the frequency domain respectively;
time-domain pulse features include the mean, standard deviation, root mean square, first-order difference, or second-order difference of the intervals between adjacent pulse dominant-wave peaks;
frequency-domain pulse features include the low-frequency power spectrum, the high-to-low frequency ratio, the power spectrum peak, or the peak frequency;
nonlinear features include the heart rate variability (HRV) correlation index and the Lyapunov exponent.
4. The road rage emotion recognition method based on fusion of facial image and pulse information as claimed in claim 1, wherein in the step (3), part of the experimental samples are drawn from the emotion recognition feature vector matrix to form a training set; the training set serves as the input vectors of the support vector machine (SVM), which is trained to improve its generalization ability and robustness, finally yielding the driver road rage emotion recognition model.
5. The road rage emotion recognition method based on fusion of facial image and pulse information as claimed in claim 1, wherein the driver road rage emotion recognition model building step of step (3) is:
step (31): randomly dividing road rage emotion characteristics of the driver subjected to PCA dimension reduction into a training set and a testing set;
step (32): selecting a radial basis function (RBF) kernel and determining the penalty factor C and the number of cross-validation folds;
step (33): training a recognition model by using the training set data, and calculating the recognition rate by using the test set;
when the recognition rate meets the expected requirement, model training is finished;
when it does not, the penalty factor C is optimized and the model is retrained on the training set until the recognition rate reaches the expected value.
CN201810022416.1A 2018-01-10 2018-01-10 Road anger emotion recognition method based on fusion of facial image and pulse information Active CN108216254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022416.1A CN108216254B (en) 2018-01-10 2018-01-10 Road anger emotion recognition method based on fusion of facial image and pulse information


Publications (2)

Publication Number Publication Date
CN108216254A CN108216254A (en) 2018-06-29
CN108216254B 2020-03-10

Family

ID=62640810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022416.1A Active CN108216254B (en) 2018-01-10 2018-01-10 Road anger emotion recognition method based on fusion of facial image and pulse information

Country Status (1)

Country Link
CN (1) CN108216254B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109017797B (en) * 2018-08-17 2021-08-24 大陆投资(中国)有限公司 Driver emotion recognition method and vehicle-mounted control unit implementing same
CN109480808A (en) * 2018-09-27 2019-03-19 深圳市君利信达科技有限公司 A kind of heart rate detection method based on PPG, system, equipment and storage medium
CN109549624A (en) * 2018-11-04 2019-04-02 南京云思创智信息科技有限公司 A kind of real-time video sentiment analysis method and system based on deep learning
CN109389806B (en) * 2018-11-08 2020-07-24 山东大学 Fatigue driving detection early warning method, system and medium based on multi-information fusion
CN109730701B (en) * 2019-01-03 2022-07-26 中国电子科技集团公司电子科学研究院 Emotion data acquisition method and device
CN109498041B (en) * 2019-01-15 2021-04-16 吉林大学 Driver road rage state identification method based on electroencephalogram and pulse information
CN109800734A (en) * 2019-01-30 2019-05-24 北京津发科技股份有限公司 Human facial expression recognition method and device
CN109816935B (en) * 2019-03-18 2020-10-30 深圳市道格恒通科技有限公司 Drowning prevention prompting method and device and smart phone
CN109993093B (en) * 2019-03-25 2022-10-25 山东大学 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics
CN110406540B (en) * 2019-06-28 2020-09-22 天津大学 Bus driver safety monitoring system
CN110393540A (en) * 2019-07-22 2019-11-01 浙江鸿泉电子科技有限公司 Method and device, system based on the prompt safe driving that driver is emotionally stable
CN110435665B (en) * 2019-08-14 2024-07-02 东风汽车有限公司 Driver detection device and car
CN110598608B (en) * 2019-09-02 2022-01-14 中国航天员科研训练中心 Non-contact and contact cooperative psychological and physiological state intelligent monitoring system
CN110598607B (en) * 2019-09-02 2022-04-15 中国航天员科研训练中心 Non-contact and contact cooperative real-time emotion intelligent monitoring system
CN110516658A (en) * 2019-09-06 2019-11-29 山东理工大学 A kind of recognizer design of driver's mood based on face-image and vehicle operating information
CN110619301B (en) * 2019-09-13 2023-04-18 道和安邦(天津)安防科技有限公司 Emotion automatic identification method based on bimodal signals
CN110626352A (en) * 2019-10-08 2019-12-31 昆山聚创新能源科技有限公司 Vehicle and method and device for detecting anxiety condition of driver and passenger thereof
CN111603185B (en) * 2020-06-03 2021-06-01 彭娜 Driving habit big data analysis system for new energy automobile
CN111991012B (en) * 2020-09-04 2022-12-06 北京中科心研科技有限公司 Method and device for monitoring driving road rage state
CN112043252B (en) * 2020-10-10 2021-09-28 山东大学 Emotion recognition system and method based on respiratory component in pulse signal
CN112370058A (en) * 2020-11-11 2021-02-19 西北工业大学 Method for identifying and monitoring emotion of user based on mobile terminal
CN112572456A (en) * 2020-12-28 2021-03-30 奇瑞汽车股份有限公司 Driver driving behavior reminding system and method and automobile
CN114767112A (en) * 2021-01-22 2022-07-22 中国移动通信有限公司研究院 Emotion recognition method and device and electronic equipment
CN113191212B (en) * 2021-04-12 2022-06-07 合肥中聚源智能科技有限公司 Driver road rage risk early warning method and system
CN113469022B (en) * 2021-06-29 2024-05-14 江苏大学 Device and method for capturing emotion change of driver
CN113415286B (en) * 2021-07-14 2022-09-16 重庆金康赛力斯新能源汽车设计院有限公司 Road rage detection method and equipment
CN113997939A (en) * 2021-11-08 2022-02-01 清华大学 Road rage detection method and device for driver
TWI800975B (en) * 2021-11-09 2023-05-01 瑞昱半導體股份有限公司 Image recognition system and training method thereof
CN113837153B (en) * 2021-11-25 2022-03-18 之江实验室 Real-time emotion recognition method and system integrating pupil data and facial expressions

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103956028A (en) * 2014-04-23 2014-07-30 山东大学 Automobile multielement driving safety protection method
CN106096641A (en) * 2016-06-07 2016-11-09 南京邮电大学 A kind of multi-modal affective characteristics fusion method based on genetic algorithm
CN106127139A (en) * 2016-06-21 2016-11-16 东北大学 A kind of dynamic identifying method of MOOC course middle school student's facial expression

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100096739A (en) * 2009-02-25 2010-09-02 오리엔탈종합전자(주) Class discriminating feature vector-based support vector machine and face membership authentication based on it
CN102193620B (en) * 2010-03-02 2013-01-23 三星电子(中国)研发中心 Input method based on facial expression recognition
KR101386823B1 (en) * 2013-10-29 2014-04-17 김재철 2 level drowsy driving prevention apparatus through motion, face, eye,and mouth recognition
CN107292218A (en) * 2016-04-01 2017-10-24 中兴通讯股份有限公司 A kind of expression recognition method and device
CN107491740B (en) * 2017-07-28 2020-03-17 北京科技大学 Newborn pain recognition method based on facial expression analysis


Also Published As

Publication number Publication date
CN108216254A (en) 2018-06-29


Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant