CN109993093B - Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics - Google Patents


Info

Publication number
CN109993093B
Authority
CN
China
Prior art keywords: image, respiratory, facial, features, extracting
Prior art date
Legal status
Active
Application number
CN201910228205.8A
Other languages
Chinese (zh)
Other versions
CN109993093A (en)
Inventor
杨立才
张成昱
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201910228205.8A priority Critical patent/CN109993093B/en
Publication of CN109993093A publication Critical patent/CN109993093A/en
Application granted granted Critical
Publication of CN109993093B publication Critical patent/CN109993093B/en

Classifications

    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06V10/507 Extraction of image or video features; Summing image-intensity values; Histogram projection analysis
    • G06V20/597 Scenes; Context or environment of the image inside of a vehicle; Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V40/168 Human faces; Feature extraction; Face representation
    • G06V40/20 Recognition of biometric, human-related or animal-related patterns; Movements or behaviour, e.g. gesture recognition


Abstract

The invention discloses a road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics. Facial images and respiratory information of a driver are collected and preprocessed separately, and features that reflect the driver's road rage emotion are extracted. The two types of extracted features are fused, and a driver road rage emotion recognition model is then established based on a machine learning method. The model judges whether the driver is in a road rage state and can adjust the driver's road rage emotion according to the result. Because the invention uses image and respiratory information that are easy to collect, it can detect the driver's emotional state without affecting normal driving, and when the driver is in a road rage state it can remind the driver through an audio device to give a warning and adjust the emotion.

Description

Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics
Technical Field
The present disclosure relates to road rage monitoring methods, systems, devices, and media based on features of facial images and respiratory information.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
"road irritability" refers to the behavior of drivers of automobiles or other vehicles that is offensive or angry during driving, such as gross gestures, verbal insults, intentional driving of the vehicle in unsafe or safety-threatening ways, and the like. Studies have shown that road irritability affects the normal driving of the driver. The behaviors of aggressive driving, dangerous driving, wrong driving and the like are in positive correlation with the road rage. At present, road rage driving becomes an important cause of traffic accidents, so that it is necessary to identify road rage emotion of a driver, perform safety warning and adjust emotion.
Most driver road rage emotion recognition methods judge the driver's emotion by fusing facial images with physiological signals. At present, electroencephalogram (EEG) signals and pulse signals are the most commonly used physiological signals and can be used to identify road rage emotion. However, with current sensor technology, the acquisition devices for EEG and pulse signals affect the driver's normal driving behavior to some extent: to obtain an EEG signal, the driver needs to wear an electroencephalogram cap on the head; to acquire a pulse signal, a sensor needs to be worn on the wrist or finger. These signal acquisition devices increase the burden on the driver, may cause discomfort, and may affect the driver's normal driving behavior.
Disclosure of Invention
In order to solve the defects of the prior art, the disclosure provides a road rage monitoring method, a system, equipment and a medium based on facial and respiratory characteristics, aiming at identifying the road rage emotion of a driver, giving out a warning and adjusting the emotion of the driver on the premise of not influencing the normal driving of the driver.
In a first aspect, the present disclosure provides a road rage monitoring method based on facial and respiratory characteristics;
road rage monitoring method based on facial and respiratory characteristics comprises the following steps:
acquiring a face video and breathing data of a driver;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
extracting respiratory characteristics from the acquired respiratory data;
carrying out feature fusion on the collected facial features and the collected respiratory features;
and inputting the fused features into a trained deep learning model, and outputting the monitoring state of road rage.
In a second aspect, the present disclosure provides a road rage monitoring system based on facial and respiratory characteristics;
The road rage monitoring system based on facial and respiratory characteristics includes:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial region image from the facial video and extracting facial features from the acquired facial region image;
the respiratory feature extraction module is used for extracting respiratory features from the acquired respiratory data;
the feature fusion module is used for carrying out feature fusion on the collected facial features and the collected respiratory features;
and the road rage state monitoring module is used for inputting the fused features into the trained deep learning model and outputting the road rage monitoring state.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, which, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. The method based on HOG feature extraction and an image pyramid is used to extract the face image part from the video, so the face image can be extracted quickly and effectively.
2. The facial image and the respiratory information are used for feature fusion, and related convolutional neural network methods in machine learning are used, so the trained road rage recognition model has high reliability.
3. The breathing information is used as a signal source, and the abdominal belt type breathing acquisition terminal is integrated on the safety belt, so that the breathing signal can be stably acquired.
4. The camera in the information acquisition is arranged in front of the driver, and the abdominal belt type respiration acquisition terminal is integrated on the safety belt, so that the normal driving operation of the driver is not influenced.
5. If the breathing signal is used as the signal source, the abdominal belt type breathing acquisition terminal can be arranged on the safety belt. Drivers must wear a safety belt while driving, so integrating the abdominal-belt respiration acquisition device into the safety belt makes it possible to acquire an effective breathing signal without affecting the driver's normal driving.
6. The convolutional neural network is a common algorithm in deep learning and one of the core algorithms in the field of image recognition, and it performs well on image processing and classification recognition problems. Training the road rage emotion recognition model with a convolutional neural network and fusing the facial information with the respiratory information can effectively improve the recognition rate and robustness of the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the hardware system of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
FIG. 4 is a diagram of a facial image extraction step of the present invention;
fig. 5 is a diagram of steps of training a road rage emotion recognition model of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Full English name of HOG: Histogram of Oriented Gradients (HOG).
The first embodiment is as follows:
as shown in fig. 1, the road rage monitoring method based on facial and respiratory characteristics includes:
acquiring a face video and breathing data of a driver;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
extracting respiratory characteristics from the acquired respiratory data;
carrying out feature fusion on the collected facial features and the collected respiratory features;
and inputting the fused features into a trained deep learning model, and outputting the monitoring state of road rage.
As an embodiment, the specific steps of extracting the face region image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
and extracting the facial area images of the driver from the plurality of frames of images subjected to denoising processing by adopting a mode based on HOG feature extraction and an image pyramid.
As shown in fig. 4, further, the specific steps of extracting the facial region image of the driver from the denoised frames of images in a way based on HOG feature extraction and image pyramid include:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
HOG characteristic vectors are extracted from each layer of sub-image of each image pyramid, and the extracted HOG characteristic vectors are subjected to standardization processing;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained through pre-training, reserving a face region part of the image, and deleting a non-face region to obtain a face region image of the current frame image.
And finally, based on a bilinear interpolation method, image size normalization is carried out on the extracted face region images of all the frame images so that features can be extracted subsequently.
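By way of illustration only, the face region extraction described above (image pyramid, per-layer HOG features, SVM decision and bilinear resizing) could be sketched in Python roughly as follows. This is a minimal sliding-window interpretation, assuming OpenCV, scikit-image and a pre-trained scikit-learn SVM are available; the window size, step and all function names are illustrative assumptions rather than part of the embodiment.

import cv2
import numpy as np
from skimage.feature import hog
from skimage.transform import pyramid_gaussian

def hog_pyramid_features(gray, levels=3):
    # Build an image pyramid by repeated sub-sampling and concatenate per-layer HOG vectors.
    feats = []
    for layer in pyramid_gaussian(gray, max_layer=levels - 1, downscale=2):
        v = hog(layer, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2), block_norm='L2')   # L2-norm block standardization
        feats.append(v)
    return np.concatenate(feats)

def extract_face_region(frame, svm_model, window=(96, 96), step=24, out_size=(64, 64)):
    # Keep the window that the pre-trained SVM scores highest as the face region,
    # ignore non-face regions, and resize the result with bilinear interpolation.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    best, best_score = None, -np.inf
    for y in range(0, h - window[1], step):
        for x in range(0, w - window[0], step):
            patch = gray[y:y + window[1], x:x + window[0]]
            score = svm_model.decision_function([hog_pyramid_features(patch)])[0]
            if score > best_score:
                best, best_score = patch, score
    if best is None or best_score <= 0:
        return None                                         # no face region found in this frame
    return cv2.resize(best, out_size, interpolation=cv2.INTER_LINEAR)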
It should be understood that the support vector machine SVM face region detection model obtained by pre-training is specifically trained as follows:
constructing a Support Vector Machine (SVM) model;
training a Support Vector Machine (SVM) model by using HOG pyramid characteristics of a historical driver face image with a face region label and a non-face region label;
and obtaining a trained face region detection model of the support vector machine SVM.
The well-trained standard of the support vector machine SVM is that the classification accuracy exceeds a set threshold value.
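A minimal sketch of this SVM training procedure, assuming the HOG pyramid features of historical driver images are already available as the arrays face_samples and non_face_samples (illustrative names) and using scikit-learn:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# HOG pyramid feature vectors with face-region / non-face-region labels (1 = face, 0 = non-face)
X = np.vstack([face_samples, non_face_samples])
y = np.hstack([np.ones(len(face_samples)), np.zeros(len(non_face_samples))])

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

svm_model = LinearSVC(C=1.0)
svm_model.fit(X_train, y_train)

# "Well-trained" criterion: classification accuracy exceeds a set threshold (0.9 assumed here)
accuracy = svm_model.score(X_val, y_val)
print(f"validation accuracy: {accuracy:.3f}")
assert accuracy > 0.9, "accuracy below the set threshold; adjust parameters and retrain"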
Further, the specific step of extracting the HOG feature vector for each layer of sub-image of the image pyramid is as follows:
calculating the gradient g of each pixel point (x, y) of the image x 、g y Gradient magnitude g and direction θ.
g x =f(x+1,y)-f(x-1,y);
g y =f(x,y+1)-f(x,y-1);
Figure BDA0002005894290000061
Figure BDA0002005894290000062
Dividing a single image into a set number of equally sized region blocks, and dividing each region block into a set number of equally sized unit cells;
respectively counting the gradient direction histogram of each cell based on the gradient amplitude g and direction θ of each pixel point, connecting the gradient histograms of the cells in the same region block into a region block histogram, carrying out L2-norm standardization on each region block histogram, and finally cascading the feature vectors of all the region blocks to obtain the HOG feature vector of the whole image.
Further, the extracted HOG feature vector is normalized as follows:
x' = x / √(‖x‖₂² + ε²)
wherein x is the HOG feature vector, ‖x‖₂ is the 2-norm of x, and ε is a constant.
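For illustration, the per-pixel gradients, cell histograms and L2-norm block standardization described above could be computed for a single image roughly as follows (cell size, block size and bin count are assumed example values):

import numpy as np

def hog_descriptor(img, cell=8, block=2, bins=9, eps=1e-5):
    # Compute a HOG feature vector following the gradient and normalization formulas above.
    img = img.astype(np.float64)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # g_x = f(x+1, y) - f(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # g_y = f(x, y+1) - f(x, y-1)
    g = np.hypot(gx, gy)                          # gradient magnitude g
    theta = np.rad2deg(np.arctan2(gy, gx)) % 180  # direction θ mapped to [0, 180)

    # Gradient-direction histogram of each cell, weighted by the gradient magnitude
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = g[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            t = theta[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j], _ = np.histogram(t, bins=bins, range=(0, 180), weights=m)

    # Connect the cell histograms of each region block and apply L2-norm standardization
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            feats.append(v / np.sqrt(np.sum(v**2) + eps**2))
    return np.concatenate(feats)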
As an embodiment, the specific steps of extracting facial features from the acquired facial image are as follows:
performing facial feature extraction on the preprocessed facial image of the driver, and extracting facial features by using a convolutional neural network;
feature extraction of the face image is performed using a convolutional neural network. The convolutional neural network is a representative algorithm commonly used in deep learning, and is a type of neural network which contains convolution calculation and has a deep structure. The method can extract the distinguishing characteristics of the images in the fine classification and identification of the images so as to be used for other classifiers to learn. As shown in FIG. 3, a convolutional neural network generally consists of an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. The input layer is input data, here an image of the face of the driver; the convolution layer performs traversal processing on data through convolution kernel, extracts the characteristics of the input layer, the convolution kernel is usually a 3 × 3 or 5 × 5 weight matrix, performs matrix multiplication on input elements, and outputs a single element value. After the feature processing is performed on the convolutional layer, the feature mapping is performed by using an activation function to reduce the number of features, the activation function can introduce nonlinear elements into the neural network, and common activation functions include a Sigmoid function, a Tanh function, a ReLU function and the like. The full-connection layer expands the extracted multi-dimensional features into feature vectors and transmits the feature vectors to the output layer through the excitation function. The output layer can process the feature vectors by using a classification function for the classification problem and output a classification label. Where a convolutional neural network may have multiple convolutional and pooling layers. Here, feature vectors of the face image are extracted first, and output classification is not performed.
As an example, the acquired respiratory data may need to be preprocessed before the step of extracting respiratory characteristics from it.
Further, the specific steps of preprocessing the acquired respiratory data are as follows:
and denoising and filtering the respiratory data based on an empirical mode decomposition method.
It should be understood that the formula for decomposing the signal x(t) using empirical mode decomposition is as follows:
x(t) = Σ imf_i(t) + RES (summation over i = 1, …, n)
wherein imf_i(t) is the i-th IMF component, and RES represents the residual.
Empirical mode decomposition EMD decomposes the signal into finite IMF components and a residual RES.
Each IMF component must satisfy two conditions:
(1) The difference value between the number of extreme points and the number of zero-crossing points of the signal is required to be less than or equal to 1;
(2) The mean of the upper and lower envelopes at any point of the signal is zero.
After each IMF component is extracted, it is judged whether another IMF component satisfying the above conditions can still be decomposed from the residual RES; if so, the decomposition continues, and if not, it ends.
Beneficial effects: after the signal is decomposed by the EMD method, noise and unnecessary signal components can be removed. The acquired respiratory signal is filtered and denoised based on the empirical mode decomposition (EMD) method. Empirical mode decomposition can adaptively decompose a signal without requiring a preset basis function or decomposition function, and without considering characteristics such as the sparsity of the signal in advance.
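A sketch of this EMD-based denoising, assuming the PyEMD (EMD-signal) package is available; how many of the highest-frequency IMF components to discard as noise is an assumption that would be tuned in practice:

import numpy as np
from PyEMD import EMD

def emd_denoise(resp_signal, drop_first=1):
    # Decompose the respiratory signal into IMF components and a residual RES,
    # discard the first (highest-frequency, noise-dominated) IMFs, and reconstruct.
    emd = EMD()
    emd.emd(np.asarray(resp_signal, dtype=float))
    imfs, res = emd.get_imfs_and_residue()
    return imfs[drop_first:].sum(axis=0) + res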
As an embodiment, the specific steps of extracting the respiratory characteristics from the acquired respiratory data are as follows:
extracting respiratory characteristics of the respiratory data obtained by preprocessing, and extracting time domain characteristics, frequency domain characteristics and nonlinear characteristics;
the time domain features include: mean, standard deviation, skewness value and kurtosis value;
the calculation formulas of the skewness value s and the kurtosis value k are as follows:
s = (1/N) Σ [(x_i − x̄) / σ]³ (summation over i = 1, …, N)
k = (1/N) Σ [(x_i − x̄) / σ]⁴ (summation over i = 1, …, N)
wherein x̄ is the mean value of the respiratory signal, σ is the standard deviation of the respiratory signal, and N is the number of sample points; the skewness value represents the degree of symmetry of the signal about its center, and the kurtosis value represents the steepness of the distribution form of the signal.
The frequency domain characteristic is the sum of the power of the respiratory signal in each frequency band, and the frequency bands comprise: 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz or 0.4-1 Hz;
the nonlinear characteristic comprises: multi-scale entropy, approximate entropy, or heart rate variability;
the multi-scale entropy algorithm consists of a coarse graining process and sample entropy calculation, and the complexity of a time sequence is evaluated by calculating the sample entropy on a plurality of time scales. When a driver is in an angry state, breathing becomes relatively tense and jerky, the complexity of the breathing signal time sequence is increased, and the value of the multi-scale entropy can be greatly changed compared with the quiet state.
Approximate entropy is a non-linear kinetic parameter used to quantify regularity and irregularity of time series fluctuations. It reflects the probability of new information in the time series, and the more irregular the time series, the larger the corresponding approximate entropy. When a driver is in an angry state, the breathing signal fluctuates relatively to the breathing signal in a quiet state, the irregularity of the time series is large, and the approximate entropy is large.
The heart rate variability refers to the change of the difference of successive heart cycles, and contains information capable of reflecting part of cardiovascular diseases and also capable of reflecting the emotion of a human. When the driver is in an angry state, the periodic difference of the breathing signals can also be changed, and the heart rate variability correlation index characteristic can be used for judging the emotional state of the driver.
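The respiratory feature extraction described above (time-domain statistics, per-band power and multi-scale sample entropy) could be sketched as follows; the sampling rate, entropy parameters and number of scales are assumed example values, and heart-rate-variability indices are omitted for brevity:

import numpy as np
from scipy import signal, stats

def sample_entropy(u, m=2, r=None):
    # Sample entropy of a 1-D series; the building block of multi-scale entropy.
    u = np.asarray(u, dtype=float)
    r = 0.2 * np.std(u) if r is None else r
    def pairs(mm):
        t = np.array([u[i:i + mm] for i in range(len(u) - mm + 1)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)   # Chebyshev distances
        return np.sum(d <= r) - len(t)                        # matched pairs, excluding self-matches
    b, a = pairs(m), pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def coarse_grain(u, scale):
    # Non-overlapping averaging used by the multi-scale entropy algorithm.
    n = len(u) // scale
    return np.asarray(u[:n * scale], dtype=float).reshape(n, scale).mean(axis=1)

def respiratory_features(x, fs=25.0, scales=(1, 2, 3)):
    # x: preprocessed respiratory signal; fs: sampling rate in Hz (assumed value).
    x = np.asarray(x, dtype=float)
    time_feats = [x.mean(), x.std(), stats.skew(x), stats.kurtosis(x, fisher=False)]
    f, pxx = signal.welch(x, fs=fs, nperseg=min(len(x), 1024))        # Welch power spectral density
    bands = [(0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 1.0)]
    band_power = [pxx[(f >= lo) & (f < hi)].sum() for lo, hi in bands]
    mse = [sample_entropy(coarse_grain(x, s)) for s in scales]        # multi-scale entropy values
    return np.array(time_feats + band_power + mse)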
As an embodiment, the specific steps of feature fusion of the collected facial features and respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method;
and performing feature fusion on the facial features and the respiratory features obtained by the normalization processing in a weighting mode to obtain a fused feature vector.
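By way of illustration, this normalization and weighted fusion could be sketched as follows; the weights are assumed example values, and in practice the minimum and maximum of each feature would typically be taken over the training set rather than within a single vector:

import numpy as np

def min_max_normalize(v, eps=1e-8):
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + eps)

def fuse_features(facial_feats, resp_feats, w_face=0.6, w_resp=0.4):
    # Normalize each feature vector, weight it, and concatenate into one fused vector.
    return np.concatenate([w_face * min_max_normalize(facial_feats),
                           w_resp * min_max_normalize(resp_feats)])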
As an embodiment, the deep learning model specifically includes: a convolutional neural network model.
As an embodiment, as shown in fig. 5, the training steps of the trained deep learning model are as follows:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vectors;
dividing the labeled fused feature vectors into a training set and a test set;
constructing a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training;
and then inputting the test set into the preliminarily trained convolutional neural network model, testing the preliminarily trained convolutional neural network, if the test classification accuracy is higher than a set threshold value, obtaining the trained convolutional neural network model, otherwise, optimizing parameters of the convolutional neural network, updating the training set, re-training until the trained convolutional neural network model is obtained, and ending.
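A sketch of this training and testing procedure, assuming the fused feature vectors and their road rage / non-road rage labels are already available as NumPy arrays; the 1-D convolutional architecture, accuracy thresholds and optimizer settings are assumed example values:

import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

def train_road_rage_model(fused_feats, labels, acc_threshold=0.9, epochs=200):
    # fused_feats: (N, D) fused feature vectors; labels: (N,) with 1 = road rage, 0 = non-road rage.
    X_tr, X_te, y_tr, y_te = train_test_split(fused_feats, labels, test_size=0.2, random_state=0)
    X_tr, y_tr = torch.tensor(X_tr, dtype=torch.float32), torch.tensor(y_tr, dtype=torch.long)
    X_te, y_te = torch.tensor(X_te, dtype=torch.float32), torch.tensor(y_te, dtype=torch.long)

    d = X_tr.shape[1]
    model = nn.Sequential(                      # a small 1-D convolutional classifier
        nn.Unflatten(1, (1, d)),
        nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        nn.Flatten(),
        nn.Linear(16 * (d // 2), 2),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)
        loss.backward()
        opt.step()
        with torch.no_grad():
            train_acc = (model(X_tr).argmax(1) == y_tr).float().mean().item()
        if train_acc >= acc_threshold:          # recognition rate reached the set threshold
            break

    with torch.no_grad():
        test_acc = (model(X_te).argmax(1) == y_te).float().mean().item()
    # If test_acc is below the set threshold, parameters would be tuned and training repeated.
    return model, test_acc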
As an embodiment, the monitoring state of road rage includes: road rage or no road rage.
As an example, the facial image is acquired by an infrared high-speed camera; the respiration data is acquired through the abdominal belt type respiration acquisition terminal.
Optionally, the infrared high-speed camera is fixedly arranged on an instrument panel right in front of the main driver seat, and the infrared high-speed camera is connected with the controller; and uploading the acquired facial image or video to the controller.
It should be understood that the infrared high speed camera is capable of rotational movement within a range so as to acquire a frontal face image of the driver.
Further, the infrared high-speed camera can collect face images of drivers at night.
The beneficial effect of the above technical scheme is that an all-round facial image of the driver can be acquired, avoiding the situation where only the left side or only the right side of the face is captured.
Optionally, the abdominal belt type respiration acquisition terminal is a pressure sensor, the pressure sensor is connected with the controller, and the acquired respiration data is uploaded to the controller; the pressure sensor is arranged on a safety belt of the main driver seat, and is positioned in the middle of the abdomen of a driver after the driver fastens the safety belt, and the pressure sensor is arranged in the safety belt; the pressure sensor is responsible for collecting abdominal pressure data of the driver, which is considered as breathing data.
The beneficial effect of placing the pressure sensor in the safety belt is that the driver can still move flexibly while driving: it avoids the restriction on the driver's range of motion that would result from placing the pressure sensor elsewhere, and it does not constrain the driver's behavior.
As shown in fig. 2, the controller respectively performs preprocessing and feature extraction on the acquired facial image and respiratory data, performs feature fusion on the extracted facial features and respiratory features, and judges the road rage result from the fused features; if the driver is in the road rage state, the controller sends a control instruction to the audio device to remind the driver, through the audio device, to adjust the emotion. The audio device includes a microphone.
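The controller flow of fig. 2 could be sketched as a simple monitoring loop; the camera, pressure-sensor, audio and feature-extraction helpers below are hypothetical placeholders standing in for the components described above, not a prescribed implementation:

import time

ROAD_RAGE = 1  # assumed label value for the road rage state

def monitoring_loop(camera, pressure_sensor, model, audio_device, interval_s=10):
    # Periodically acquire data, judge the road rage state, and warn via the audio device.
    while True:
        frames = camera.capture_video(duration_s=interval_s)      # infrared face video (placeholder API)
        resp = pressure_sensor.read(duration_s=interval_s)        # abdominal-belt respiration data (placeholder API)
        face_feats = extract_facial_features(frames)              # face region + CNN features (placeholder helper)
        resp_feats = respiratory_features(emd_denoise(resp))      # EMD denoising + respiratory features
        fused = fuse_features(face_feats, resp_feats)
        if model.predict(fused) == ROAD_RAGE:                     # placeholder decision call of the trained model
            audio_device.play_warning("Please calm down and drive safely.")
        time.sleep(interval_s)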
The second embodiment: a road rage monitoring system based on facial and respiratory characteristics is provided;
The road rage monitoring system based on facial and respiratory characteristics includes:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial region image from the facial video and extracting facial features from the acquired facial region image;
the breath characteristic extraction module is used for extracting breath characteristics from the acquired breath data;
the feature fusion module is used for carrying out feature fusion on the collected facial features and the collected respiratory features;
and the road rage state monitoring module is used for inputting the fused features into the trained deep learning model and outputting the road rage monitoring state.
Example three: the embodiment also provides an electronic device, which includes a memory, a processor, and a computer instruction stored in the memory and running on the processor, where the computer instruction completes each operation in the method when being run by the processor, and for brevity, details are not described here again.
The electronic device may be a mobile terminal or a non-mobile terminal; the non-mobile terminal includes a desktop computer, and the mobile terminal includes a smart phone (such as an Android phone or an iOS phone), smart glasses, a smart watch, a smart band, a tablet computer, a notebook computer, a personal digital assistant, and other mobile internet devices capable of wireless communication.
It should be understood that in the present disclosure, the processor may be a central processing unit CPU, but may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here. Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (4)

1. Road rage monitoring method based on facial and respiratory characteristics is characterized by comprising the following steps:
acquiring a face video and breathing data of a driver;
the face video is acquired through an infrared high-speed camera; the respiration data is acquired through a belly belt type respiration acquisition terminal;
the abdominal belt type respiration acquisition terminal is a pressure sensor, the pressure sensor is connected with the controller, and the acquired respiration data are uploaded to the controller; the pressure sensor is arranged on a safety belt of the main driver seat, and is positioned in the middle of the abdomen of a driver after the driver fastens the safety belt, and the pressure sensor is arranged in the safety belt; the pressure sensor is used for collecting abdominal pressure data of the driver, and the abdominal pressure data is regarded as breathing data;
extracting a face area image from the face video, and extracting facial features from the obtained face area image;
the specific steps of extracting the face region image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
extracting a facial area image of the driver from the denoised frames of images by adopting a mode based on HOG feature extraction and an image pyramid;
the specific steps of extracting the facial region image of the driver from the plurality of frames of images subjected to denoising processing in a mode based on HOG feature extraction and image pyramid are as follows:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
extracting HOG characteristic vectors from each layer of sub-images of each image pyramid, and performing standardization processing on the extracted HOG characteristic vectors;
the specific steps of extracting the HOG characteristic vector from each layer of sub-image of each image pyramid are as follows:
calculating the horizontal gradient g_x, the vertical gradient g_y, the gradient magnitude g and the direction θ of each pixel point (x, y) of the image;
g_x = f(x+1, y) - f(x-1, y);
g_y = f(x, y+1) - f(x, y-1);
g = √(g_x² + g_y²);
θ = arctan(g_y / g_x);
dividing a single image into a set number of equally sized region blocks, and dividing each region block into a set number of equally sized unit cells;
respectively counting the gradient direction histogram of each cell based on the gradient amplitude g and direction θ of each pixel point, connecting the gradient histograms of the cells in the same region block into a region block histogram, carrying out L2-norm standardization on each region block histogram, and finally cascading the feature vectors of all the region blocks to obtain the HOG feature vector of the whole image;
the specific steps of carrying out standardization processing on the extracted HOG characteristic vector are as follows:
a' = a / √(‖a‖₂² + ε²)
wherein a is the HOG feature vector, ‖a‖₂ is the 2-norm of a, and ε is a constant;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained by pre-training, reserving a face region part of an image, and deleting a non-face region to obtain a face region image of a current frame image;
the method comprises the following specific training process of pre-training an obtained Support Vector Machine (SVM) face region detection model:
constructing a Support Vector Machine (SVM) model;
training a Support Vector Machine (SVM) model by using HOG pyramid characteristics of a historical driver face image with a face region label and a non-face region label;
obtaining a trained SVM face region detection model;
extracting respiratory characteristics from the acquired respiratory data;
before the step of extracting respiratory characteristics from the acquired respiratory data, the acquired respiratory data needs to be preprocessed;
the specific steps of preprocessing the acquired respiratory data are as follows:
denoising and filtering the respiratory data based on an empirical mode decomposition method;
the formula for decomposing the signal x (t) by using the empirical mode decomposition method is as follows:
x(t) = Σ imf_i(t) + RES (summation over i = 1, …, n)
wherein imf_i(t) is the i-th IMF component, and RES represents the residual;
an empirical mode decomposition method EMD decomposes a signal into a plurality of finite IMF components and a residual amount RES;
each IMF component must satisfy two conditions:
(1) The difference value between the number of extreme points and the number of zero-crossing points of the signal is required to be less than or equal to 1;
(2) The mean value of the upper envelope and the lower envelope of any point of the signal is zero;
after each IMF component is extracted, judging whether another IMF component meeting the above conditions can still be decomposed from the residual RES; if so, continuing the decomposition, and if not, ending;
the specific steps of extracting the respiratory characteristics from the acquired respiratory data are as follows:
extracting respiratory characteristics of the respiratory data obtained by preprocessing, and extracting time domain characteristics, frequency domain characteristics and nonlinear characteristics;
the time domain features include: mean, standard deviation, skewness value and kurtosis value;
the calculation formulas of the skewness value s and the kurtosis value k are as follows:
s = (1/N) Σ [(x_i − x̄) / σ]³ (summation over i = 1, …, N)
k = (1/N) Σ [(x_i − x̄) / σ]⁴ (summation over i = 1, …, N)
wherein x̄ is the mean value of the respiratory signal, σ is the standard deviation of the respiratory signal, and N is the number of sample points; the skewness value represents the degree of symmetry of the signal about its center, and the kurtosis value represents the steepness of the distribution form of the signal;
the frequency domain characteristic is the sum of the power of the respiratory signal in each frequency band, and the frequency bands comprise: 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz or 0.4-1 Hz;
the nonlinear features include: multi-scale entropy, approximate entropy, or heart rate variability;
the multi-scale entropy algorithm consists of a coarse graining process and sample entropy calculation, and the complexity of a time sequence is evaluated by calculating the sample entropy on a plurality of time scales;
approximate entropy is a non-linear kinetic parameter used to quantify regularity and irregularity of time series fluctuations; the probability of new information in the time sequence is reflected, and the more irregular the time sequence is, the larger the corresponding approximate entropy is;
the heart rate variability refers to the change condition of the difference of successive heartbeat cycles, contains information capable of reflecting part of cardiovascular diseases and can also reflect the emotion of a person; when the driver is in an angry state, the periodic difference of the breathing signals can also change, and the heart rate variability correlation index characteristic can be used for judging the emotional state of the driver;
carrying out feature fusion on the collected facial features and the collected respiratory features;
the specific steps of carrying out feature fusion on the collected facial features and the respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method; performing feature fusion on the facial features and the respiratory features obtained by normalization processing in a weighting mode to obtain fused feature vectors;
inputting the fused features into a trained deep learning model, and outputting a road rage monitoring state;
the training step of the trained deep learning model comprises the following steps:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vector;
dividing the fusion characteristic vector of the label into a training set and a test set;
building a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training;
inputting the test set into the preliminarily trained convolutional neural network model, testing the preliminarily trained convolutional neural network, obtaining the trained convolutional neural network model if the test classification accuracy is higher than a set threshold, otherwise, optimizing parameters of the convolutional neural network, updating the training set, re-training until the trained convolutional neural network model is obtained, and ending.
2. The system for monitoring road rage based on facial and respiratory characteristics, which adopts the method for monitoring road rage based on facial and respiratory characteristics as claimed in claim 1, is characterized by comprising:
the acquisition module is used for acquiring facial videos and breathing data of a driver;
the facial feature extraction module is used for extracting a facial area image from the facial video and extracting facial features from the acquired facial area image;
the specific steps of extracting the face area image from the face video are as follows:
selecting a segment of the facial video of a set duration, extracting one frame of facial image at set time intervals so that a plurality of frames of facial images are extracted in total, and performing smoothing processing and denoising processing on each extracted frame of facial image;
extracting a face area image of a driver from a plurality of frames of images subjected to denoising processing by adopting a mode based on HOG feature extraction and an image pyramid, wherein the method comprises the following specific steps:
sub-sampling each frame of image subjected to denoising processing respectively, and constructing an image pyramid for each frame of image;
extracting HOG characteristic vectors from each layer of sub-images of each image pyramid, and performing standardization processing on the extracted HOG characteristic vectors;
finally, cascading HOG feature vectors of all layers in each image pyramid to obtain HOG pyramid features;
inputting the HOG pyramid characteristics into a Support Vector Machine (SVM) face region detection model obtained by pre-training, reserving a face region part of an image, and deleting a non-face region to obtain a face region image of a current frame image;
the respiratory feature extraction module is used for extracting respiratory features from the acquired respiratory data;
the feature fusion module is used for carrying out feature fusion on the acquired facial features and the respiratory features;
the specific steps of carrying out feature fusion on the collected facial features and the respiratory features are as follows:
normalizing the collected facial features and respiratory features by adopting a maximum and minimum normalization method; performing feature fusion on the facial features and the respiratory features obtained by normalization processing in a weighting mode to obtain fused feature vectors;
the road rage state monitoring module is used for inputting the fused features into a trained deep learning model and outputting a road rage monitoring state;
inputting the fused features into a trained deep learning model, and outputting a road rage monitoring state;
the training step of the trained deep learning model comprises the following steps:
acquiring a face image and breathing data of a driver;
extracting facial features from the acquired facial image, and extracting respiratory features from the acquired respiratory data;
performing feature fusion on the collected facial features and the collected respiratory features; obtaining a fused feature vector;
labeling a road rage label and a non-road rage label for the fused feature vector;
dividing the fusion characteristic vector of the label into a training set and a test set;
building a convolutional neural network model, inputting a training set into the convolutional neural network model, training the convolutional neural network model, and obtaining a preliminarily trained convolutional neural network model when the recognition rate reaches a set threshold value; otherwise, continuing training.
3. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the method of claim 1.
4. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of claim 1.
CN201910228205.8A 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics Active CN109993093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910228205.8A CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910228205.8A CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Publications (2)

Publication Number Publication Date
CN109993093A CN109993093A (en) 2019-07-09
CN109993093B true CN109993093B (en) 2022-10-25

Family

ID=67131402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910228205.8A Active CN109993093B (en) 2019-03-25 2019-03-25 Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics

Country Status (1)

Country Link
CN (1) CN109993093B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781719A (en) * 2019-09-02 2020-02-11 中国航天员科研训练中心 Non-contact and contact cooperative mental state intelligent monitoring system
CN110751015B (en) * 2019-09-02 2023-04-11 合肥工业大学 Perfusion optimization and artificial intelligence emotion monitoring method for facial infrared heat map
CN110693508A (en) * 2019-09-02 2020-01-17 中国航天员科研训练中心 Multi-channel cooperative psychophysiological active sensing method and service robot
CN110751381A (en) * 2019-09-30 2020-02-04 东南大学 Road rage vehicle risk assessment and prevention and control method
CN111027391A (en) * 2019-11-12 2020-04-17 湖南大学 Fatigue state identification method based on CNN pyramid characteristics and LSTM
CN110991428A (en) * 2019-12-30 2020-04-10 山东大学 Breathing signal emotion recognition method and system based on multi-scale entropy
CN111127117A (en) * 2019-12-31 2020-05-08 上海能塔智能科技有限公司 Vehicle operation and use satisfaction identification processing method and device and electronic equipment
CN111626186A (en) * 2020-05-25 2020-09-04 宁波大学 Driver distraction detection method
CN111991012B (en) * 2020-09-04 2022-12-06 北京中科心研科技有限公司 Method and device for monitoring driving road rage state
CN112043252B (en) * 2020-10-10 2021-09-28 山东大学 Emotion recognition system and method based on respiratory component in pulse signal
CN112699774B (en) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 Emotion recognition method and device for characters in video, computer equipment and medium
CN112712022B (en) * 2020-12-29 2023-05-23 华南理工大学 Pressure detection method, system, device and storage medium based on image recognition
CN113191212B (en) * 2021-04-12 2022-06-07 合肥中聚源智能科技有限公司 Driver road rage risk early warning method and system
CN113191283B (en) * 2021-05-08 2022-09-23 河北工业大学 Driving path decision method based on emotion change of on-road travelers
CN114312997B (en) * 2021-12-09 2023-04-07 科大讯飞股份有限公司 Vehicle steering control method, device and system and storage medium


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1768701A (en) * 2005-07-21 2006-05-10 高春平 Integrated intelligent type physiological signal sensor
CN101699470A (en) * 2009-10-30 2010-04-28 华南理工大学 Extracting method for smiling face identification on picture of human face
DE102013018663B4 (en) * 2013-11-07 2017-05-24 Dräger Safety AG & Co. KGaA Device and a method for measuring an alcohol or Rauschmittelanteils in the breath of a driver
CN116389554A (en) * 2017-03-08 2023-07-04 理查德.A.罗思柴尔德 System for improving user's performance in athletic activities and method thereof
CN108053615B (en) * 2018-01-10 2020-12-25 山东大学 Method for detecting fatigue driving state of driver based on micro-expression
CN108216254B (en) * 2018-01-10 2020-03-10 山东大学 Road anger emotion recognition method based on fusion of facial image and pulse information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107235045A (en) * 2017-06-29 2017-10-10 吉林大学 Consider physiology and the vehicle-mounted identification interactive system of driver road anger state of manipulation information
CN109498041A (en) * 2019-01-15 2019-03-22 吉林大学 Driver road anger state identification method based on brain electricity and pulse information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Toward Emotion Recognition in Car-Racing Drivers: A Biosignal Processing Approach; Christos D. Katsis et al.; IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans; 2008-03-31; Vol. 38, No. 3, pp. 502-512 *

Also Published As

Publication number Publication date
CN109993093A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
CN109993093B (en) Road rage monitoring method, system, equipment and medium based on facial and respiratory characteristics
Zhang et al. Driver fatigue detection based on eye state recognition
CN108216254B (en) Road anger emotion recognition method based on fusion of facial image and pulse information
Ngxande et al. Driver drowsiness detection using behavioral measures and machine learning techniques: A review of state-of-art techniques
CN107273845B (en) Facial expression recognition method based on confidence region and multi-feature weighted fusion
CN107924472B (en) Image classification method and system based on brain computer interface
Zhao et al. Intelligent recognition of fatigue and sleepiness based on inceptionV3-LSTM via multi-feature fusion
Rajamohana et al. Driver drowsiness detection system using hybrid approach of convolutional neural network and bidirectional long short term memory (CNN_BILSTM)
Ma et al. Wearable driver drowsiness detection using electrooculography signal
Liu et al. Real time detection of driver fatigue based on CNN‐LSTM
CN111178130A (en) Face recognition method, system and readable storage medium based on deep learning
CN112932501B (en) Method for automatically identifying insomnia based on one-dimensional convolutional neural network
CN109002774A (en) A kind of fatigue monitoring device and method based on convolutional neural networks
US11954922B2 (en) Method of processing signals indicative of a level of attention of a human individual, corresponding system, vehicle and computer program product
Alharbey et al. Fatigue state detection for tired persons in presence of driving periods
Zhao et al. Deep convolutional neural network for drowsy student state detection
CN112949560A (en) Method for identifying continuous expression change of long video expression interval under two-channel feature fusion
Walizad et al. Driver drowsiness detection system using convolutional neural network
Ukwuoma et al. Deep learning review on drivers drowsiness detection
CN111723869A (en) Special personnel-oriented intelligent behavior risk early warning method and system
CN106446822A (en) Blink detection method based on circle fitting
Faisal et al. Systematic development of real-time driver drowsiness detection system using deep learning
CN106384096B (en) A kind of fatigue driving monitoring method based on blink detection
CN111444863B (en) Driver emotion recognition method based on camera and adopting 5G vehicle-mounted network cloud assistance
Nissimagoudar et al. Driver alertness detection using CNN-BiLSTM and implementation on ARM-based SBC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant