CN109993093B - Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics - Google Patents


Info

Publication number
CN109993093B
Authority
CN
China
Prior art keywords: image, respiratory, facial, features, extracting
Prior art date
Legal status
Active
Application number
CN201910228205.8A
Other languages
Chinese (zh)
Other versions
CN109993093A (en)
Inventor
杨立才
张成昱
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201910228205.8A priority Critical patent/CN109993093B/en
Publication of CN109993093A publication Critical patent/CN109993093A/en
Application granted granted Critical
Publication of CN109993093B publication Critical patent/CN109993093B/en

Classifications

    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06V10/507 Extraction of image or video features; Summing image-intensity values; Histogram projection analysis
    • G06V20/597 Scenes; Context or environment of the image inside of a vehicle; Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V40/168 Human faces; Feature extraction; Face representation
    • G06V40/20 Recognition of biometric, human-related or animal-related patterns; Movements or behaviour, e.g. gesture recognition


Abstract

The invention discloses a road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics. Facial images and respiratory information of a driver are collected and preprocessed separately, and features that reflect the driver's road rage emotion are extracted. The two types of extracted features are fused, and a driver road rage emotion recognition model is then established based on a machine learning method. The model judges whether the driver is in a road rage state and can adjust the driver's road rage emotion according to the result. Because the invention uses image and respiratory information that are easy to collect, it can detect the driver's emotional state without affecting normal driving, and when the driver is in a road rage state it can remind the driver through an audio device to give a warning and adjust the emotion.

Description

Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics
Technical Field
The present disclosure relates to road rage monitoring methods, systems, devices, and media based on features of facial images and respiratory information.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
"road irritability" refers to the behavior of drivers of automobiles or other vehicles that is offensive or angry during driving, such as gross gestures, verbal insults, intentional driving of the vehicle in unsafe or safety-threatening ways, and the like. Studies have shown that road irritability affects the normal driving of the driver. The behaviors of aggressive driving, dangerous driving, wrong driving and the like are in positive correlation with the road rage. At present, road rage driving becomes an important cause of traffic accidents, so that it is necessary to identify road rage emotion of a driver, perform safety warning and adjust emotion.
Most driver road rage emotion recognition methods judge the driver's emotion by fusing facial images with physiological signals. At present, electroencephalogram (EEG) signals and pulse signals are the most commonly used physiological signals and can be used to identify road rage emotion. However, with current sensor technology, the acquisition devices for EEG and pulse signals affect the driver's normal driving behavior to some extent: to obtain an EEG signal, the driver needs to wear an electroencephalogram cap on the head; to acquire a pulse signal, a sensor needs to be worn on the wrist or finger. These signal acquisition devices increase the burden on the driver, may cause discomfort, and may affect the driver's normal driving behavior.
Disclosure of Invention
In order to solve the defects of the prior art, the disclosure provides a road rage monitoring method, a system, equipment and a medium based on facial and respiratory characteristics, aiming at identifying the road rage emotion of a driver, giving out a warning and adjusting the emotion of the driver on the premise of not influencing the normal driving of the driver.
In a first aspect, the present disclosure provides a road rage monitoring method based on facial and respiratory characteristics;
road rage monitoring method based on facial and respiratory characteristics comprises the following steps:
acquiring a face video and breathing data of a driver;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
extracting respiratory characteristics from the acquired respiratory data;
carrying out feature fusion on the collected facial features and the collected respiratory features;
and inputting the fused features into a trained deep learning model, and outputting the monitoring state of road rage.
In a second aspect, the present disclosure provides a road rage monitoring system based on facial and respiratory characteristics;
The road rage monitoring system based on facial and respiratory characteristics includes:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial region image from the facial video and extracting facial features from the acquired facial region image;
the respiratory feature extraction module is used for extracting respiratory features from the acquired respiratory data;
the feature fusion module is used for carrying out feature fusion on the collected facial features and the collected respiratory features;
and the road rage state monitoring module is used for inputting the fused features into the trained deep learning model and outputting the road rage monitoring state.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, which, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. The method based on HOG feature extraction and an image pyramid is used to extract the face image part from the video, so the face image can be extracted quickly and effectively.
2. The facial image and the respiratory information are used for feature fusion, and related convolutional neural network methods in machine learning are used, so the trained road rage recognition model has high reliability.
3. The breathing information is used as a signal source, and the abdominal belt type breathing acquisition terminal is integrated on the safety belt, so that the breathing signal can be stably acquired.
4. The camera in the information acquisition is arranged in front of the driver, and the abdominal belt type respiration acquisition terminal is integrated on the safety belt, so that the normal driving operation of the driver is not influenced.
5. If the breathing signal is used as the signal source, the abdominal belt type breathing acquisition terminal can be arranged on the safety belt. Drivers must wear a safety belt while driving, so integrating the abdominal-belt respiration acquisition device into the safety belt makes it possible to acquire an effective breathing signal without affecting the driver's normal driving.
6. The convolutional neural network is a common algorithm in deep learning and one of the core algorithms in the field of image recognition, and it performs well on image processing and classification recognition problems. Training the road rage emotion recognition model with a convolutional neural network and fusing the facial information with the respiratory information can effectively improve the recognition rate and robustness of the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the hardware system of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
FIG. 4 is a diagram of a facial image extraction step of the present invention;
fig. 5 is a diagram of steps of training a road rage emotion recognition model of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Full English name of HOG: Histogram of Oriented Gradients (HOG).
The first embodiment is as follows:
as shown in fig. 1, the road rage monitoring method based on facial and respiratory characteristics includes:
acquiring a face video and breathing data of a driver;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
extracting respiratory characteristics from the acquired respiratory data;
carrying out feature fusion on the collected facial features and the collected respiratory features;
and inputting the fused features into a trained deep learning model, and outputting the monitoring state of road rage.
As an embodiment, the specific steps of extracting the face region image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
and extracting the facial area images of the driver from the plurality of frames of images subjected to denoising processing by adopting a mode based on HOG feature extraction and an image pyramid.
As shown in fig. 4, further, the specific steps of extracting the facial region image of the driver from the denoised frames of images in a way based on HOG feature extraction and image pyramid include:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
HOG characteristic vectors are extracted from each layer of sub-image of each image pyramid, and the extracted HOG characteristic vectors are subjected to standardization processing;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained through pre-training, reserving a face region part of the image, and deleting a non-face region to obtain a face region image of the current frame image.
And finally, based on a bilinear interpolation method, image size normalization is carried out on the extracted face region images of all the frame images so that features can be extracted subsequently.
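By way of illustration only, the face region extraction described above (image pyramid, per-layer HOG features, SVM decision and bilinear resizing) could be sketched in Python roughly as follows. This is a minimal sliding-window interpretation, assuming OpenCV, scikit-image and a pre-trained scikit-learn SVM are available; the window size, step and all function names are illustrative assumptions rather than part of the embodiment.

import cv2
import numpy as np
from skimage.feature import hog
from skimage.transform import pyramid_gaussian

def hog_pyramid_features(gray, levels=3):
    # Build an image pyramid by repeated sub-sampling and concatenate per-layer HOG vectors.
    feats = []
    for layer in pyramid_gaussian(gray, max_layer=levels - 1, downscale=2):
        v = hog(layer, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2), block_norm='L2')   # L2-norm block standardization
        feats.append(v)
    return np.concatenate(feats)

def extract_face_region(frame, svm_model, window=(96, 96), step=24, out_size=(64, 64)):
    # Keep the window that the pre-trained SVM scores highest as the face region,
    # ignore non-face regions, and resize the result with bilinear interpolation.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    best, best_score = None, -np.inf
    for y in range(0, h - window[1], step):
        for x in range(0, w - window[0], step):
            patch = gray[y:y + window[1], x:x + window[0]]
            score = svm_model.decision_function([hog_pyramid_features(patch)])[0]
            if score > best_score:
                best, best_score = patch, score
    if best is None or best_score <= 0:
        return None                                         # no face region found in this frame
    return cv2.resize(best, out_size, interpolation=cv2.INTER_LINEAR)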
It should be understood that the support vector machine SVM face region detection model obtained by pre-training is specifically trained as follows:
constructing a Support Vector Machine (SVM) model;
training a Support Vector Machine (SVM) model by using HOG pyramid characteristics of a historical driver face image with a face region label and a non-face region label;
and obtaining a trained face region detection model of the support vector machine SVM.
The well-trained standard of the support vector machine SVM is that the classification accuracy exceeds a set threshold value.
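A minimal sketch of this SVM training procedure, assuming the HOG pyramid features of historical driver images are already available as the arrays face_samples and non_face_samples (illustrative names) and using scikit-learn:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# HOG pyramid feature vectors with face-region / non-face-region labels (1 = face, 0 = non-face)
X = np.vstack([face_samples, non_face_samples])
y = np.hstack([np.ones(len(face_samples)), np.zeros(len(non_face_samples))])

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

svm_model = LinearSVC(C=1.0)
svm_model.fit(X_train, y_train)

# "Well-trained" criterion: classification accuracy exceeds a set threshold (0.9 assumed here)
accuracy = svm_model.score(X_val, y_val)
print(f"validation accuracy: {accuracy:.3f}")
assert accuracy > 0.9, "accuracy below the set threshold; adjust parameters and retrain"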
Further, the specific step of extracting the HOG feature vector for each layer of sub-image of the image pyramid is as follows:
calculating the gradient g of each pixel point (x, y) of the image x 、g y Gradient magnitude g and direction θ.
g x =f(x+1,y)-f(x-1,y);
g y =f(x,y+1)-f(x,y-1);
Figure BDA0002005894290000061
Figure BDA0002005894290000062
Dividing a single image into a set number of equally sized region blocks, and dividing each region block into a set number of equally sized unit cells;
respectively counting the gradient direction histogram of each cell based on the gradient amplitude g and direction θ of each pixel point, connecting the gradient histograms of the cells in the same region block into a region block histogram, carrying out L2-norm standardization on each region block histogram, and finally cascading the feature vectors of all the region blocks to obtain the HOG feature vector of the whole image.
Further, the extracted HOG feature vector is normalized as follows:
x' = x / √(‖x‖₂² + ε²)
wherein x is the HOG feature vector, ‖x‖₂ is the 2-norm of x, and ε is a constant.
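For illustration, the per-pixel gradients, cell histograms and L2-norm block standardization described above could be computed for a single image roughly as follows (cell size, block size and bin count are assumed example values):

import numpy as np

def hog_descriptor(img, cell=8, block=2, bins=9, eps=1e-5):
    # Compute a HOG feature vector following the gradient and normalization formulas above.
    img = img.astype(np.float64)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # g_x = f(x+1, y) - f(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # g_y = f(x, y+1) - f(x, y-1)
    g = np.hypot(gx, gy)                          # gradient magnitude g
    theta = np.rad2deg(np.arctan2(gy, gx)) % 180  # direction θ mapped to [0, 180)

    # Gradient-direction histogram of each cell, weighted by the gradient magnitude
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = g[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            t = theta[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j], _ = np.histogram(t, bins=bins, range=(0, 180), weights=m)

    # Connect the cell histograms of each region block and apply L2-norm standardization
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            feats.append(v / np.sqrt(np.sum(v**2) + eps**2))
    return np.concatenate(feats)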
As an embodiment, the specific steps of extracting facial features from the acquired facial image are as follows:
performing facial feature extraction on the preprocessed facial image of the driver, and extracting facial features by using a convolutional neural network;
feature extraction of the face image is performed using a convolutional neural network. The convolutional neural network is a representative algorithm commonly used in deep learning, and is a type of neural network which contains convolution calculation and has a deep structure. The method can extract the distinguishing characteristics of the images in the fine classification and identification of the images so as to be used for other classifiers to learn. As shown in FIG. 3, a convolutional neural network generally consists of an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. The input layer is input data, here an image of the face of the driver; the convolution layer performs traversal processing on data through convolution kernel, extracts the characteristics of the input layer, the convolution kernel is usually a 3 × 3 or 5 × 5 weight matrix, performs matrix multiplication on input elements, and outputs a single element value. After the feature processing is performed on the convolutional layer, the feature mapping is performed by using an activation function to reduce the number of features, the activation function can introduce nonlinear elements into the neural network, and common activation functions include a Sigmoid function, a Tanh function, a ReLU function and the like. The full-connection layer expands the extracted multi-dimensional features into feature vectors and transmits the feature vectors to the output layer through the excitation function. The output layer can process the feature vectors by using a classification function for the classification problem and output a classification label. Where a convolutional neural network may have multiple convolutional and pooling layers. Here, feature vectors of the face image are extracted first, and output classification is not performed.
As an example, the acquired respiratory data may need to be preprocessed before the step of extracting respiratory characteristics from it.
Further, the specific steps of preprocessing the acquired respiratory data are as follows:
and denoising and filtering the respiratory data based on an empirical mode decomposition method.
It should be understood that the formula for decomposing the signal x(t) using empirical mode decomposition is as follows:
x(t) = Σ imf_i(t) + RES (summation over i = 1, …, n)
wherein imf_i(t) is the i-th IMF component, and RES represents the residual.
Empirical mode decomposition EMD decomposes the signal into finite IMF components and a residual RES.
Each IMF component must satisfy two conditions:
(1) The difference value between the number of extreme points and the number of zero-crossing points of the signal is required to be less than or equal to 1;
(2) The mean of the upper and lower envelopes at any point of the signal is zero.
After each IMF component is extracted, it is judged whether another IMF component satisfying the above conditions can still be decomposed from the residual RES; if so, the decomposition continues, and if not, it ends.
Beneficial effects: after the signal is decomposed by the EMD method, noise and unnecessary signal components can be removed. The acquired respiratory signal is filtered and denoised based on the empirical mode decomposition (EMD) method. Empirical mode decomposition can adaptively decompose a signal without requiring a preset basis function or decomposition function, and without considering characteristics such as the sparsity of the signal in advance.
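A sketch of this EMD-based denoising, assuming the PyEMD (EMD-signal) package is available; how many of the highest-frequency IMF components to discard as noise is an assumption that would be tuned in practice:

import numpy as np
from PyEMD import EMD

def emd_denoise(resp_signal, drop_first=1):
    # Decompose the respiratory signal into IMF components and a residual RES,
    # discard the first (highest-frequency, noise-dominated) IMFs, and reconstruct.
    emd = EMD()
    emd.emd(np.asarray(resp_signal, dtype=float))
    imfs, res = emd.get_imfs_and_residue()
    return imfs[drop_first:].sum(axis=0) + res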
As an embodiment, the specific steps of extracting the respiratory characteristics from the acquired respiratory data are as follows:
extracting respiratory characteristics of the respiratory data obtained by preprocessing, and extracting time domain characteristics, frequency domain characteristics and nonlinear characteristics;
the time domain features include: mean, standard deviation, skewness value and kurtosis value;
the calculation formulas of the skewness value s and the kurtosis value k are as follows:
s = (1/N) Σ [(x_i − x̄) / σ]³ (summation over i = 1, …, N)
k = (1/N) Σ [(x_i − x̄) / σ]⁴ (summation over i = 1, …, N)
wherein x̄ is the mean value of the respiratory signal, σ is the standard deviation of the respiratory signal, and N is the number of sample points; the skewness value represents the degree of symmetry of the signal about its center, and the kurtosis value represents the steepness of the distribution form of the signal.
The frequency domain characteristic is the sum of the power of the respiratory signal in each frequency band, and the frequency bands comprise: 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz or 0.4-1 Hz;
the nonlinear characteristic comprises: multi-scale entropy, approximate entropy, or heart rate variability;
the multi-scale entropy algorithm consists of a coarse graining process and sample entropy calculation, and the complexity of a time sequence is evaluated by calculating the sample entropy on a plurality of time scales. When a driver is in an angry state, breathing becomes relatively tense and jerky, the complexity of the breathing signal time sequence is increased, and the value of the multi-scale entropy can be greatly changed compared with the quiet state.
Approximate entropy is a non-linear kinetic parameter used to quantify regularity and irregularity of time series fluctuations. It reflects the probability of new information in the time series, and the more irregular the time series, the larger the corresponding approximate entropy. When a driver is in an angry state, the breathing signal fluctuates relatively to the breathing signal in a quiet state, the irregularity of the time series is large, and the approximate entropy is large.
The heart rate variability refers to the change of the difference of successive heart cycles, and contains information capable of reflecting part of cardiovascular diseases and also capable of reflecting the emotion of a human. When the driver is in an angry state, the periodic difference of the breathing signals can also be changed, and the heart rate variability correlation index characteristic can be used for judging the emotional state of the driver.
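The respiratory feature extraction described above (time-domain statistics, per-band power and multi-scale sample entropy) could be sketched as follows; the sampling rate, entropy parameters and number of scales are assumed example values, and heart-rate-variability indices are omitted for brevity:

import numpy as np
from scipy import signal, stats

def sample_entropy(u, m=2, r=None):
    # Sample entropy of a 1-D series; the building block of multi-scale entropy.
    u = np.asarray(u, dtype=float)
    r = 0.2 * np.std(u) if r is None else r
    def pairs(mm):
        t = np.array([u[i:i + mm] for i in range(len(u) - mm + 1)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)   # Chebyshev distances
        return np.sum(d <= r) - len(t)                        # matched pairs, excluding self-matches
    b, a = pairs(m), pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def coarse_grain(u, scale):
    # Non-overlapping averaging used by the multi-scale entropy algorithm.
    n = len(u) // scale
    return np.asarray(u[:n * scale], dtype=float).reshape(n, scale).mean(axis=1)

def respiratory_features(x, fs=25.0, scales=(1, 2, 3)):
    # x: preprocessed respiratory signal; fs: sampling rate in Hz (assumed value).
    x = np.asarray(x, dtype=float)
    time_feats = [x.mean(), x.std(), stats.skew(x), stats.kurtosis(x, fisher=False)]
    f, pxx = signal.welch(x, fs=fs, nperseg=min(len(x), 1024))        # Welch power spectral density
    bands = [(0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 1.0)]
    band_power = [pxx[(f >= lo) & (f < hi)].sum() for lo, hi in bands]
    mse = [sample_entropy(coarse_grain(x, s)) for s in scales]        # multi-scale entropy values
    return np.array(time_feats + band_power + mse)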
As an embodiment, the specific steps of feature fusion of the collected facial features and respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method;
and performing feature fusion on the facial features and the respiratory features obtained by the normalization processing in a weighting mode to obtain a fused feature vector.
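By way of illustration, this normalization and weighted fusion could be sketched as follows; the weights are assumed example values, and in practice the minimum and maximum of each feature would typically be taken over the training set rather than within a single vector:

import numpy as np

def min_max_normalize(v, eps=1e-8):
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + eps)

def fuse_features(facial_feats, resp_feats, w_face=0.6, w_resp=0.4):
    # Normalize each feature vector, weight it, and concatenate into one fused vector.
    return np.concatenate([w_face * min_max_normalize(facial_feats),
                           w_resp * min_max_normalize(resp_feats)])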
As an embodiment, the deep learning model specifically includes: a convolutional neural network model.
As an embodiment, as shown in fig. 5, the training steps of the trained deep learning model are as follows:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vectors;
dividing the labeled fused feature vectors into a training set and a test set;
constructing a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training;
and then inputting the test set into the preliminarily trained convolutional neural network model, testing the preliminarily trained convolutional neural network, if the test classification accuracy is higher than a set threshold value, obtaining the trained convolutional neural network model, otherwise, optimizing parameters of the convolutional neural network, updating the training set, re-training until the trained convolutional neural network model is obtained, and ending.
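A sketch of this training and testing procedure, assuming the fused feature vectors and their road rage / non-road rage labels are already available as NumPy arrays; the 1-D convolutional architecture, accuracy thresholds and optimizer settings are assumed example values:

import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

def train_road_rage_model(fused_feats, labels, acc_threshold=0.9, epochs=200):
    # fused_feats: (N, D) fused feature vectors; labels: (N,) with 1 = road rage, 0 = non-road rage.
    X_tr, X_te, y_tr, y_te = train_test_split(fused_feats, labels, test_size=0.2, random_state=0)
    X_tr, y_tr = torch.tensor(X_tr, dtype=torch.float32), torch.tensor(y_tr, dtype=torch.long)
    X_te, y_te = torch.tensor(X_te, dtype=torch.float32), torch.tensor(y_te, dtype=torch.long)

    d = X_tr.shape[1]
    model = nn.Sequential(                      # a small 1-D convolutional classifier
        nn.Unflatten(1, (1, d)),
        nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        nn.Flatten(),
        nn.Linear(16 * (d // 2), 2),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)
        loss.backward()
        opt.step()
        with torch.no_grad():
            train_acc = (model(X_tr).argmax(1) == y_tr).float().mean().item()
        if train_acc >= acc_threshold:          # recognition rate reached the set threshold
            break

    with torch.no_grad():
        test_acc = (model(X_te).argmax(1) == y_te).float().mean().item()
    # If test_acc is below the set threshold, parameters would be tuned and training repeated.
    return model, test_acc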
As an embodiment, the monitoring state of road rage includes: road rage or no road rage.
As an example, the facial image is acquired by an infrared high-speed camera; the respiration data is acquired through the abdominal belt type respiration acquisition terminal.
Optionally, the infrared high-speed camera is fixedly arranged on an instrument panel right in front of the main driver seat, and the infrared high-speed camera is connected with the controller; and uploading the acquired facial image or video to the controller.
It should be understood that the infrared high speed camera is capable of rotational movement within a range so as to acquire a frontal face image of the driver.
Further, the infrared high-speed camera can collect face images of drivers at night.
The beneficial effect of the above technical scheme is that an all-round facial image of the driver can be acquired, avoiding the situation where only the left side or only the right side of the face is captured.
Optionally, the abdominal belt type respiration acquisition terminal is a pressure sensor, the pressure sensor is connected with the controller, and the acquired respiration data is uploaded to the controller; the pressure sensor is arranged on a safety belt of the main driver seat, and is positioned in the middle of the abdomen of a driver after the driver fastens the safety belt, and the pressure sensor is arranged in the safety belt; the pressure sensor is responsible for collecting abdominal pressure data of the driver, which is considered as breathing data.
The beneficial effect of placing the pressure sensor in the safety belt is that the driver can still move flexibly while driving: it avoids the restriction on the driver's range of motion that would result from placing the pressure sensor elsewhere, and it does not constrain the driver's behavior.
As shown in fig. 2, the controller respectively performs preprocessing and feature extraction on the acquired facial image and respiratory data, performs feature fusion on the extracted facial features and respiratory features, and judges the road rage result from the fused features; if the driver is in the road rage state, the controller sends a control instruction to the audio device to remind the driver, through the audio device, to adjust the emotion. The audio device includes a microphone.
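The controller flow of fig. 2 could be sketched as a simple monitoring loop; the camera, pressure-sensor, audio and feature-extraction helpers below are hypothetical placeholders standing in for the components described above, not a prescribed implementation:

import time

ROAD_RAGE = 1  # assumed label value for the road rage state

def monitoring_loop(camera, pressure_sensor, model, audio_device, interval_s=10):
    # Periodically acquire data, judge the road rage state, and warn via the audio device.
    while True:
        frames = camera.capture_video(duration_s=interval_s)      # infrared face video (placeholder API)
        resp = pressure_sensor.read(duration_s=interval_s)        # abdominal-belt respiration data (placeholder API)
        face_feats = extract_facial_features(frames)              # face region + CNN features (placeholder helper)
        resp_feats = respiratory_features(emd_denoise(resp))      # EMD denoising + respiratory features
        fused = fuse_features(face_feats, resp_feats)
        if model.predict(fused) == ROAD_RAGE:                     # placeholder decision call of the trained model
            audio_device.play_warning("Please calm down and drive safely.")
        time.sleep(interval_s)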
The second embodiment: a road rage monitoring system based on facial and respiratory characteristics is provided;
The road rage monitoring system based on facial and respiratory characteristics includes:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial region image from the facial video and extracting facial features from the acquired facial region image;
the breath characteristic extraction module is used for extracting breath characteristics from the acquired breath data;
the feature fusion module is used for carrying out feature fusion on the collected facial features and the collected respiratory features;
and the road rage state monitoring module is used for inputting the fused features into the trained deep learning model and outputting the road rage monitoring state.
Example three: the embodiment also provides an electronic device, which includes a memory, a processor, and a computer instruction stored in the memory and running on the processor, where the computer instruction completes each operation in the method when being run by the processor, and for brevity, details are not described here again.
The electronic device may be a mobile terminal or a non-mobile terminal; the non-mobile terminal includes a desktop computer, and the mobile terminal includes a smart phone (such as an Android phone or an iOS phone), smart glasses, a smart watch, a smart band, a tablet computer, a notebook computer, a personal digital assistant, and other mobile internet devices capable of wireless communication.
It should be understood that in the present disclosure, the processor may be a central processing unit CPU, but may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here. Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (4)

1. Road rage monitoring method based on facial and respiratory characteristics is characterized by comprising the following steps:
acquiring a face video and breathing data of a driver;
the face video is acquired through an infrared high-speed camera; the respiration data is acquired through a belly belt type respiration acquisition terminal;
the abdominal belt type respiration acquisition terminal is a pressure sensor, the pressure sensor is connected with the controller, and the acquired respiration data are uploaded to the controller; the pressure sensor is arranged on a safety belt of the main driver seat, and is positioned in the middle of the abdomen of a driver after the driver fastens the safety belt, and the pressure sensor is arranged in the safety belt; the pressure sensor is used for collecting abdominal pressure data of the driver, and the abdominal pressure data is regarded as breathing data;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
the specific steps of extracting the face region image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
extracting a facial area image of the driver from the denoised frames of images by adopting a mode based on HOG feature extraction and an image pyramid;
the specific steps of extracting the facial region image of the driver from the plurality of frames of images subjected to denoising processing in a mode based on HOG feature extraction and image pyramid are as follows:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
extracting HOG characteristic vectors from each layer of sub-images of each image pyramid, and performing standardization processing on the extracted HOG characteristic vectors;
the specific steps of extracting the HOG characteristic vector from each layer of sub-image of each image pyramid are as follows:
calculating the horizontal gradient g_x, the vertical gradient g_y, the gradient magnitude g and the direction θ of each pixel point (x, y) of the image;
g_x = f(x+1, y) - f(x-1, y);
g_y = f(x, y+1) - f(x, y-1);
g = √(g_x² + g_y²);
θ = arctan(g_y / g_x);
dividing a single image into a set number of equally sized region blocks, and dividing each region block into a set number of equally sized unit cells;
respectively counting the gradient direction histogram of each cell based on the gradient amplitude g and direction θ of each pixel point, connecting the gradient histograms of the cells in the same region block into a region block histogram, carrying out L2-norm standardization on each region block histogram, and finally cascading the feature vectors of all the region blocks to obtain the HOG feature vector of the whole image;
the specific steps of carrying out standardization processing on the extracted HOG characteristic vector are as follows:
a' = a / √(‖a‖₂² + ε²)
wherein a is the HOG feature vector, ‖a‖₂ is the 2-norm of a, and ε is a constant;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained by pre-training, reserving a face region part of an image, and deleting a non-face region to obtain a face region image of a current frame image;
the method comprises the following specific training process of pre-training an obtained Support Vector Machine (SVM) face region detection model:
constructing a Support Vector Machine (SVM) model;
training a Support Vector Machine (SVM) model by using HOG pyramid characteristics of a historical driver face image with a face region label and a non-face region label;
obtaining a trained SVM face region detection model;
extracting respiratory characteristics from the acquired respiratory data;
before the step of extracting respiratory characteristics from the acquired respiratory data, the acquired respiratory data needs to be preprocessed;
the specific steps of preprocessing the acquired respiratory data are as follows:
denoising and filtering the respiratory data based on an empirical mode decomposition method;
the formula for decomposing the signal x (t) by using the empirical mode decomposition method is as follows:
x(t) = Σ imf_i(t) + RES (summation over i = 1, …, n)
wherein imf_i(t) is the i-th IMF component, and RES represents the residual;
an empirical mode decomposition method EMD decomposes a signal into a plurality of finite IMF components and a residual amount RES;
each IMF component must satisfy two conditions:
(1) The difference value between the number of extreme points and the number of zero-crossing points of the signal is required to be less than or equal to 1;
(2) The mean value of the upper envelope and the lower envelope of any point of the signal is zero;
after each IMF component is extracted, judging whether another IMF component meeting the above conditions can still be decomposed from the residual RES; if so, continuing the decomposition, and if not, ending;
the specific steps of extracting the respiratory characteristics from the acquired respiratory data are as follows:
extracting respiratory characteristics of the respiratory data obtained by preprocessing, and extracting time domain characteristics, frequency domain characteristics and nonlinear characteristics;
the time domain features include: mean, standard deviation, skewness value and kurtosis value;
the calculation formulas of the skewness value s and the kurtosis value k are as follows:
s = (1/N) Σ [(x_i − x̄) / σ]³ (summation over i = 1, …, N)
k = (1/N) Σ [(x_i − x̄) / σ]⁴ (summation over i = 1, …, N)
wherein x̄ is the mean value of the respiratory signal, σ is the standard deviation of the respiratory signal, and N is the number of sample points; the skewness value represents the degree of symmetry of the signal about its center, and the kurtosis value represents the steepness of the distribution form of the signal;
the frequency domain characteristic is the sum of the power of the respiratory signal in each frequency band, and the frequency bands comprise: 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz or 0.4-1 Hz;
the nonlinear features include: multi-scale entropy, approximate entropy, or heart rate variability;
the multi-scale entropy algorithm consists of a coarse graining process and sample entropy calculation, and the complexity of a time sequence is evaluated by calculating the sample entropy on a plurality of time scales;
approximate entropy is a non-linear kinetic parameter used to quantify regularity and irregularity of time series fluctuations; the probability of new information in the time sequence is reflected, and the more irregular the time sequence is, the larger the corresponding approximate entropy is;
the heart rate variability refers to the change condition of the difference of successive heartbeat cycles, contains information capable of reflecting part of cardiovascular diseases and can also reflect the emotion of a person; when the driver is in an angry state, the periodic difference of the breathing signals can also change, and the heart rate variability correlation index characteristic can be used for judging the emotional state of the driver;
carrying out feature fusion on the collected facial features and the collected respiratory features;
the specific steps of carrying out feature fusion on the collected facial features and the respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method; performing feature fusion on the facial features and the respiratory features obtained by normalization processing in a weighting mode to obtain fused feature vectors;
inputting the fused features into a trained deep learning model, and outputting a road rage monitoring state;
the training step of the trained deep learning model comprises the following steps:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vector;
dividing the fusion characteristic vector of the label into a training set and a test set;
building a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training;
inputting the test set into the preliminarily trained convolutional neural network model, testing the preliminarily trained convolutional neural network, obtaining the trained convolutional neural network model if the test classification accuracy is higher than a set threshold, otherwise, optimizing parameters of the convolutional neural network, updating the training set, re-training until the trained convolutional neural network model is obtained, and ending.
2. The system for monitoring road rage based on facial and respiratory characteristics, which adopts the method for monitoring road rage based on facial and respiratory characteristics as claimed in claim 1, is characterized by comprising:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial area image from the facial video and extracting facial features from the acquired facial area image;
the specific steps of extracting the face area image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
extracting a face area image of a driver from a plurality of frames of images subjected to denoising processing by adopting a mode based on HOG feature extraction and an image pyramid, wherein the method comprises the following specific steps:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
extracting HOG characteristic vectors from each layer of sub-images of each image pyramid, and performing standardization processing on the extracted HOG characteristic vectors;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained by pre-training, reserving a face region part of an image, and deleting a non-face region to obtain a face region image of a current frame image;
the respiratory feature extraction module is used for extracting respiratory features from the acquired respiratory data;
the feature fusion module is used for carrying out feature fusion on the acquired facial features and the respiratory features;
the specific steps of carrying out feature fusion on the collected facial features and the respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method; performing feature fusion on the facial features and the respiratory features obtained by normalization processing in a weighting mode to obtain fused feature vectors;
the road rage state monitoring module is used for inputting the fused features into a trained deep learning model and outputting a road rage monitoring state;
inputting the fused features into a trained deep learning model, and outputting a road rage monitoring state;
the training step of the trained deep learning model comprises the following steps:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vector;
dividing the fusion characteristic vector of the label into a training set and a test set;
building a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training.
3. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the method of claim 1.
4. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of claim 1.
CN201910228205.8A 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics Active CN109993093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910228205.8A CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910228205.8A CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Publications (2)

Publication Number Publication Date
CN109993093A CN109993093A (en) 2019-07-09
CN109993093B true CN109993093B (en) 2022-10-25

Family

ID=67131402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910228205.8A Active CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Country Status (1)

Country Link
CN (1) CN109993093B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781719A (en) * 2019-09-02 2020-02-11 中国航天员科研训练中心 Non-contact and contact cooperative mental state intelligent monitoring system
CN110751015B (en) * 2019-09-02 2023-04-11 合肥工业大学 Perfusion optimization and artificial intelligence emotion monitoring method for facial infrared heat map
CN110693508A (en) * 2019-09-02 2020-01-17 中国航天员科研训练中心 Multi-channel cooperative psychophysiological active sensing method and service robot
CN110751381A (en) * 2019-09-30 2020-02-04 东南大学 Road rage vehicle risk assessment and prevention and control method
CN111027391A (en) * 2019-11-12 2020-04-17 湖南大学 Fatigue state identification method based on CNN pyramid characteristics and LSTM
CN110991428A (en) * 2019-12-30 2020-04-10 山东大学 Breathing signal emotion recognition method and system based on multi-scale entropy
CN111127117A (en) * 2019-12-31 2020-05-08 上海能塔智能科技有限公司 Vehicle operation and use satisfaction identification processing method and device and electronic equipment
CN111626186A (en) * 2020-05-25 2020-09-04 宁波大学 Driver distraction detection method
CN111991012B (en) * 2020-09-04 2022-12-06 北京中科心研科技有限公司 Method and device for monitoring driving road rage state
CN112043252B (en) * 2020-10-10 2021-09-28 山东大学 Emotion recognition system and method based on respiratory component in pulse signal
CN112699774B (en) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 Emotion recognition method and device for characters in video, computer equipment and medium
CN112712022B (en) * 2020-12-29 2023-05-23 华南理工大学 Pressure detection method, system, device and storage medium based on image recognition
CN113191212B (en) * 2021-04-12 2022-06-07 合肥中聚源智能科技有限公司 Driver road rage risk early warning method and system
CN113191283B (en) * 2021-05-08 2022-09-23 河北工业大学 Driving path decision method based on emotion change of on-road travelers
CN114312997B (en) * 2021-12-09 2023-04-07 科大讯飞股份有限公司 Vehicle steering control method, device and system and storage medium


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1768701A (en) * 2005-07-21 2006-05-10 高春平 Integrated intelligent type physiological signal sensor
CN101699470A (en) * 2009-10-30 2010-04-28 华南理工大学 Extracting method for smiling face identification on picture of human face
DE102013018663B4 (en) * 2013-11-07 2017-05-24 Dräger Safety AG & Co. KGaA Device and a method for measuring an alcohol or Rauschmittelanteils in the breath of a driver
CN116389554A (en) * 2017-03-08 2023-07-04 理查德.A.罗思柴尔德 System for improving user's performance in athletic activities and method thereof
CN108053615B (en) * 2018-01-10 2020-12-25 山东大学 Method for detecting fatigue driving state of driver based on micro-expression
CN108216254B (en) * 2018-01-10 2020-03-10 山东大学 Road anger emotion recognition method based on fusion of facial image and pulse information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107235045A (en) * 2017-06-29 2017-10-10 吉林大学 Consider physiology and the vehicle-mounted identification interactive system of driver road anger state of manipulation information
CN109498041A (en) * 2019-01-15 2019-03-22 吉林大学 Driver road anger state identification method based on brain electricity and pulse information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Toward Emotion Recognition in Car-Racing Drivers: A Biosignal Processing Approach; Christos D. Katsis et al.; IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans; 2008-03-31; Vol. 38, No. 3, pp. 502-512 *

Also Published As

Publication number Publication date
CN109993093A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
CN109993093B (en) Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics
Zhang et al. Driver fatigue detection based on eye state recognition
CN108216254B (en) Road anger emotion recognition method based on fusion of facial image and pulse information
Ngxande et al. Driver drowsiness detection using behavioral measures and machine learning techniques: A review of state-of-art techniques
CN107273845B (en) Facial expression recognition method based on confidence region and multi-feature weighted fusion
CN107924472B (en) Image classification method and system based on brain computer interface
Zhao et al. Intelligent recognition of fatigue and sleepiness based on inceptionV3-LSTM via multi-feature fusion
Rajamohana et al. Driver drowsiness detection system using hybrid approach of convolutional neural network and bidirectional long short term memory (CNN_BILSTM)
Ma et al. Wearable driver drowsiness detection using electrooculography signal
Liu et al. Real time detection of driver fatigue based on CNN‐LSTM
CN111178130A (en) Face recognition method, system and readable storage medium based on deep learning
CN112932501B (en) Method for automatically identifying insomnia based on one-dimensional convolutional neural network
CN109002774A (en) A kind of fatigue monitoring device and method based on convolutional neural networks
US11954922B2 (en) Method of processing signals indicative of a level of attention of a human individual, corresponding system, vehicle and computer program product
Alharbey et al. Fatigue state detection for tired persons in presence of driving periods
Zhao et al. Deep convolutional neural network for drowsy student state detection
CN112949560A (en) Method for identifying continuous expression change of long video expression interval under two-channel feature fusion
Walizad et al. Driver drowsiness detection system using convolutional neural network
Ukwuoma et al. Deep learning review on drivers drowsiness detection
CN111723869A (en) Special personnel-oriented intelligent behavior risk early warning method and system
CN106446822A (en) Blink detection method based on circle fitting
Faisal et al. Systematic development of real-time driver drowsiness detection system using deep learning
CN106384096B (en) A kind of fatigue driving monitoring method based on blink detection
CN111444863B (en) Driver emotion recognition method based on camera and adopting 5G vehicle-mounted network cloud assistance
Nissimagoudar et al. Driver alertness detection using CNN-BiLSTM and implementation on ARM-based SBC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant