CN114170585B - Dangerous driving behavior recognition method and device, electronic equipment and storage medium - Google Patents

Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Info

Publication number
CN114170585B
CN114170585B · CN202111358472.0A
Authority
CN
China
Prior art keywords
features
driving behavior
video data
user
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111358472.0A
Other languages
Chinese (zh)
Other versions
CN114170585A (en)
Inventor
Zheng Peng
Liu Zhihui
Zhou Dong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Original Assignee
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority to CN202111358472.0A
Publication of CN114170585A
Application granted
Publication of CN114170585B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a dangerous driving behavior identification method and device, an electronic device, and a storage medium. Video data and audio data of a user in a cab are acquired so that the user is monitored in real time. Feature extraction is performed on the video data and the audio data to obtain a plurality of single-mode features, namely the user's micro-expression features, action features, and sound features, so that modal fusion analysis can be carried out on both the auditory and visual sides. Compared with traditional single-mode processing, the method acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt the user to drive safely.

Description

Dangerous driving behavior recognition method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of safe driving, and in particular, to a method and an apparatus for identifying dangerous driving behavior, an electronic device, and a storage medium.
Background
Drunk driving is a dangerous driving behavior that seriously endangers people's lives and health and brings unstable factors to society. At present, traffic police intercept vehicles by observing the driver's complexion and the vehicle's travel route, and use an alcohol detector to measure the alcohol content in the driver's exhaled breath to judge whether the driver is driving drunk. However, offenders easily slip through this manual interception and detection approach, so a more intelligent detection method is needed to improve traffic safety.
In the related art, an alcohol concentration detector is installed in the cab; when the detected alcohol concentration exceeds a preset value, the user is warned to stop driving and the vehicle's ignition system is controlled. However, alcohol may be present in some foods, or the amount of alcohol vapor released in the cab may be too small for the detector to sense, so the alcohol concentration detector is prone to false positives and missed detections.
Disclosure of Invention
The application provides a dangerous driving behavior identification method and device, an electronic device, and a storage medium, aiming to solve the technical problem that existing dangerous driving behavior detection methods have low detection accuracy.
In order to solve the technical problem, in a first aspect, an embodiment of the present application provides a method for identifying dangerous driving behaviors, including:
acquiring video data and audio data of a user in a cab;
performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
performing feature fusion on the plurality of single-mode features to obtain fusion features;
performing binary classification on the fusion features to obtain a driving behavior result of the user;
and sending warning information to the user if the driving behavior result belongs to dangerous driving behavior.
This embodiment monitors the user in real time by acquiring the video data and audio data of the user in the cab. Feature extraction is performed on the video data and the audio data to obtain a plurality of single-mode features, extracting the user's micro-expression features, action features, and sound features so that modal fusion analysis can be performed on both the auditory and visual sides. Compared with traditional single-mode processing, this acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt safe driving.
In one embodiment, performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain micro expression features;
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
In the embodiment, through different networks or tools, the single-mode features of the video data and the audio data are respectively extracted, so that comprehensive user feature information can be conveniently acquired, and the identification accuracy of the subsequent identification process is improved.
In a preferred embodiment, the method for extracting features of a video image in video data by using a preset first multi-layer perceptron to obtain micro-expression features includes:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by using the first multilayer perceptron to determine the micro-expression features corresponding to the facial features.
In the embodiment, the facial features are compared with the preset micro-expression features to integrate the feature details, so that the identification accuracy and the reliability are improved.
In a preferred embodiment, the method for extracting features of video data by using a preset 3D convolutional neural network to obtain motion features includes:
inputting the number of channels and the number of frames of video data and the height and width of each frame of video image into a 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in a 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full connection operation on the convolution result data to obtain action characteristics.
In the embodiment, the feature extraction is carried out through the 3D convolutional neural network, and the inter-frame motion information with time dimension can be obtained, so that the action features in the video data can be better captured in the time dimension and the space dimension, the extracted action features are more comprehensive, and the identification accuracy and the reliability are further improved.
In a preferred embodiment, the extracting the features of the audio data by using a preset openSMILE tool to obtain the sound features includes:
removing background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using an openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
Through denoising, standardization, feature extraction, and perceptron-based dimension reduction, this embodiment avoids noise interference, improves the performance of the feature extraction process, and makes the overall processing more efficient.
In one embodiment, performing feature fusion on a plurality of single-mode features to obtain a fused feature includes:
and performing feature splicing on the micro-expression features, the action features, and the sound features in a concatenation fusion mode to obtain fusion features.
In an embodiment, if the driving behavior result belongs to dangerous driving behavior, after the warning message is sent to the user, the method further includes:
carrying out time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
In this embodiment, the video data and the audio data are combined into complete audio-video data, and the combined data and the driving behavior result are sent to the supervision equipment, so that relevant supervisory personnel can handle the driving behavior result in time and prevent traffic accidents caused by dangerous driving behaviors.
In a second aspect, an embodiment of the present application provides an apparatus for identifying dangerous driving behavior, including:
the acquisition module is used for acquiring video data and audio data of a user in a cab;
the extraction module is used for carrying out feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
the fusion module is used for carrying out feature fusion on the plurality of single-mode features to obtain fusion features;
the classification module is used for performing binary classification on the fusion features to obtain a driving behavior result of the user;
and the sending module is used for sending warning information to the user if the driving behavior result belongs to dangerous driving behaviors.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the computer program, when executed by the processor, implements the method for identifying dangerous driving behavior according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the method for identifying dangerous driving behavior according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second aspect to the fourth aspect, which are not described herein again.
Drawings
Fig. 1 is a schematic flowchart of a dangerous driving behavior recognition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a dangerous driving behavior recognition device provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, an alcohol concentration detector installed in the cab warns the user to stop driving and controls the vehicle's ignition system when the detected alcohol concentration exceeds a preset value. However, alcohol may be present in some foods, or the amount of alcohol vapor released in the cab may be too small for the detector to sense, so the alcohol concentration detector is prone to false positives and missed detections.
Therefore, the embodiments of the application provide a dangerous driving behavior identification method and device, an electronic device, and a storage medium. The method acquires video data and audio data of a user in a cab to monitor the user in real time; performs feature extraction on the video data and the audio data to obtain a plurality of single-mode features, extracting the user's micro-expression features, action features, and sound features so that modal fusion analysis can be performed on both the auditory and visual sides; and, compared with traditional single-mode processing, acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt safe driving.
Referring to fig. 1, fig. 1 is a schematic flowchart of a method for identifying dangerous driving behaviors provided in an embodiment of the present application. The dangerous driving behavior identification method can be applied to electronic equipment, and the electronic equipment comprises but is not limited to computing equipment such as a smart phone, a tablet computer, a personal digital assistant and a vehicle-mounted terminal which are installed in a cab. As shown in fig. 1, the method for identifying dangerous driving behavior includes steps S101 to S105, which are detailed as follows:
step S101, video data and audio data of a user in the cab are acquired.
In this step, video data is collected by a camera of the electronic device, and audio data is collected by a microphone of the electronic device. It will be appreciated that alcohol concentration data may also be collected by an alcohol concentration detector.
Step S102, performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features.
In this step, the micro-expression features are micro-expression features of the user, which include but are not limited to micro-expression features for fatigue, anger, crying, drowsiness and intoxication; the action characteristics are limb action characteristics of the user; the voice characteristics are voice characteristics of the user when speaking.
A model best suited to each data type is constructed to extract features from the various information sources; features extracted from one source can be regarded as independent of those extracted from another. For example, in drunk-driving image recognition, features extracted from the image take the form of fine details such as edges and contours, while the corresponding features extracted from the audio take the form of tokens. After all features important for prediction have been extracted from at least two data sources (e.g., video and audio), the different features are combined into one shared representation.
And S103, performing feature fusion on the plurality of single-mode features to obtain fusion features.
In this step, the single-mode features are fused. Compared with traditional single-mode processing, more comprehensive information can be obtained and heterogeneous information can be used more fully, so dangerous driving behaviors such as drunk driving can be detected and analyzed with higher credibility and accuracy.
And step S104, performing binary classification on the fusion features to obtain the driving behavior result of the user.
In this step, the binary classification can be implemented based on a support vector machine, and the driving behavior results comprise non-dangerous driving behavior and dangerous driving behavior.
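As an illustration only, a minimal sketch of this binary classification step is given below, assuming a standard SVM library; the RBF kernel, the placeholder training data, and the 605-dimensional fused feature size (taken from the fusion embodiment later in this description) are assumptions, since the patent does not disclose training details.

```python
# Hypothetical sketch of the binary classification in step S104.
# The 605-dim fused feature zf and the two labels follow the embodiments
# below; the RBF kernel and the training data are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 605))    # placeholder fused features zf
y_train = rng.integers(0, 2, size=200)   # 0 = non-dangerous, 1 = dangerous

clf = SVC(kernel="rbf")                  # kernel choice is an assumption
clf.fit(X_train, y_train)

zf = rng.normal(size=(1, 605))           # one fused feature at inference
print("dangerous" if clf.predict(zf)[0] == 1 else "non-dangerous")
```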
And step S105, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user.
In this step, for example, a voice warning may be issued to the user through the in-vehicle device, or a flashing warning may be issued to the user through an indicator lamp.
In an embodiment, on the basis of the embodiment in fig. 1, the step S102 includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain the micro-expression features;
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
In this embodiment, the 3D-CNN (3D convolutional neural network) stacks multiple consecutive frames into a cube and then applies a 3D convolution kernel within the cube. In this structure, each feature map in a convolutional layer is connected to multiple adjacent consecutive frames of the previous layer, thereby capturing motion information and better capturing the temporal and spatial feature information in the video.
The openSMILE tool is a command-line tool that extracts audio features according to a configuration (config) file.
Optionally, the performing, by using a preset first multilayer perceptron, feature extraction on a video image in the video data to obtain the micro-expression feature includes:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by using the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features.
In this embodiment, five facial micro-expressions (fatigue, drowsiness, intoxication, anger, and crying) are converted into 5-dimensional vector codes and stored in a feature vector database for comparison with subsequently input driver micro-expressions. The driver's micro-expression image captured by the camera is compared with the expression data stored in the feature vector library (such as intoxication, fatigue, and drowsiness), and the micro-expression feature wf is obtained after comparison. The advantage is that feature details are incorporated, making the driving behavior detection more convincing.
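The following is a hedged sketch of this comparison step, assuming PyTorch; the facial-feature dimension (128), the perceptron layer sizes, and the use of cosine similarity against the stored codes are assumptions, as the embodiment only specifies a first multilayer perceptron and 5-dimensional vector codes in a feature vector database.

```python
# Hedged sketch of the micro-expression comparison; layer sizes, the
# 128-dim facial feature and the cosine-similarity comparison are
# assumptions -- the patent only states that facial features are compared
# with 5-dimensional vector codes stored in a feature vector database.
import torch
import torch.nn as nn
import torch.nn.functional as F

EXPRESSIONS = ["fatigue", "drowsiness", "intoxication", "anger", "crying"]

mlp = nn.Sequential(              # "first multilayer perceptron"
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),             # 5-dim code, one per micro-expression
)

# Feature vector database: one preset 5-dim code per expression
# (identity codes here purely for illustration).
database = torch.eye(5)

face_feat = torch.randn(1, 128)   # facial features from one video frame
code = mlp(face_feat)             # 5-dim code for the input face
sims = F.cosine_similarity(code, database)   # compare with preset codes
wf = F.one_hot(sims.argmax(), 5).float()     # micro-expression feature wf
print(EXPRESSIONS[int(sims.argmax())], wf)
```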
Optionally, the performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the motion feature includes:
inputting the number of channels and the number of frames of the video data and the height and width of each frame of video image into the 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
In this embodiment, illustratively, the video is v with input dimension (c, f, h, w), where c is the number of channels, f is the number of frames, and h and w are the height and width of each frame; the 3D filter fl has dimension (fm, c, fd, fh, fw), where fm is the number of feature maps, c is the number of channels, fd is the number of frames (the depth of the convolution kernel), and fh and fw are the height and width of the filter.
Using the 3D filter, a convolution operation is performed on the video data; the output dimension after convolution is (fm, f-fd+1, h-fh+1, w-fw+1). Max pooling with kernel (mp, mp, mp) is then applied, and the pooling result is fed into a fully connected layer of size df and a softmax layer, which output the action feature vf.
Here the video data has three RGB channels, so c = 3. Using 32 feature maps and a 3D convolution kernel with fd = fh = fw = 5, the fl dimension is 32 × 3 × 5 × 5 × 5, and the max-pooling size mp is 3. The resulting feature vector vf has dimension 300. By extending the reach of the convolution kernel into the time domain, the 3D-CNN is more flexible than 2D convolution; it extracts not only the features of each video frame but also the features of the whole video in both the temporal and spatial domains, so more motion information can be learned.
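A minimal PyTorch sketch of this 3D-CNN embodiment follows; the clip length (16 frames) and frame size (112 × 112) are assumptions, while c = 3, the 32 feature maps, the 5 × 5 × 5 kernel, the pooling size 3, and the 300-dimensional output follow the figures above.

```python
# Sketch of the 3D-CNN embodiment; frame count (16) and frame size
# (112x112) are assumptions -- the patent fixes only c=3, fm=32,
# fd=fh=fw=5, mp=3 and the 300-dim output vf.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=5),  # fl: 32 x 3 x 5 x 5 x 5
    nn.ReLU(),
    nn.MaxPool3d(3),                  # max pooling (mp, mp, mp) = (3, 3, 3)
    nn.Flatten(),
    nn.LazyLinear(300),               # fully connected layer, size df = 300
    nn.Softmax(dim=1),                # softmax layer of the embodiment
)

video = torch.randn(1, 3, 16, 112, 112)  # (batch, c, f, h, w)
vf = net(video)                           # action feature vf
print(vf.shape)                           # torch.Size([1, 300])
```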
Optionally, the performing, by using a preset openSMILE tool, feature extraction on the audio data to obtain the sound feature includes:
removing the background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
In this example, the audio tool SoX is used to remove the background noise, and the denoised audio data is standardized by Z-normalization. After standardization, the audio is sent to openSMILE for feature extraction, yielding a 6373-dimensional audio feature vector; a multilayer perceptron finally reduces it to a 300-dimensional feature vector af. The advantage is higher audio processing efficiency.
The SoX tool can read and write audio files in common formats and optionally apply sound effects during processing. It can combine several input sources and synthesized sound effects, and can serve as an audio player or multitrack recorder on many systems. The basic SoX processing flow comprises input(s), combiner, effects, and output(s). All functions of the SoX tool are available through the simple sox command and its options; SoX also provides a play command for playing audio files, a rec command for recording audio, and a soxi command for reading the information contained in an audio file's header.
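A hedged sketch of this audio branch is shown below. The SoX noise-profile step, the file names, and the perceptron hidden size are assumptions; the 6373-dimensional ComParE feature set matches the dimension cited above, and the sketch assumes the sox binary plus the soundfile and opensmile Python packages.

```python
# Hedged sketch: SoX denoising -> Z-normalization -> openSMILE -> MLP.
# File names and MLP sizes are assumptions; ComParE_2016 yields the
# 6373-dim feature vector cited in this embodiment.
import subprocess

import soundfile as sf
import opensmile
import torch
import torch.nn as nn

# 1) Remove background noise with SoX: build a noise profile from a
#    noise-only clip, then apply noise reduction (file names assumed).
subprocess.run(["sox", "noise_only.wav", "-n", "noiseprof", "bg.prof"], check=True)
subprocess.run(["sox", "cab_audio.wav", "denoised.wav",
                "noisered", "bg.prof", "0.2"], check=True)

# 2) Z-normalize the denoised waveform (zero mean, unit variance).
x, sr = sf.read("denoised.wav")
x = (x - x.mean()) / (x.std() + 1e-8)
sf.write("normalized.wav", x, sr)

# 3) Extract the 6373-dimensional audio feature vector with openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
feats = torch.tensor(smile.process_file("normalized.wav").values,
                     dtype=torch.float32)

# 4) Reduce to the 300-dim sound feature af with the "second multilayer
#    perceptron" (hidden size 1024 is an assumption).
mlp2 = nn.Sequential(nn.Linear(6373, 1024), nn.ReLU(), nn.Linear(1024, 300))
af = mlp2(feats)
print(af.shape)  # torch.Size([1, 300])
```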
In an embodiment, the performing feature fusion on a plurality of the single-mode features to obtain a fused feature includes:
and performing feature splicing on the micro-expression feature, the action feature, and the sound feature in a concatenation fusion mode to obtain the fusion feature.
In this embodiment, driving behavior is predicted by multimodal fusion processing, which combines information from two or more modalities. The feature vectors vf, af, and wf are directly spliced in a concatenation fusion manner to obtain the 605-dimensional vector zf = [vf; af; wf] (300 + 300 + 5 dimensions). Compared with traditional single-mode processing, this acquires more comprehensive information, makes fuller use of heterogeneous information, and gives the detection and analysis of dangerous driving behaviors higher credibility and accuracy.
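A minimal sketch of this concatenation step follows; the placeholder tensors stand in for the branch outputs described above.

```python
# Minimal sketch of the concatenation ("splicing") fusion step.
import torch

vf = torch.randn(300)   # action feature from the 3D-CNN
af = torch.randn(300)   # sound feature from openSMILE + MLP
wf = torch.randn(5)     # micro-expression feature
zf = torch.cat([vf, af, wf])  # fused feature zf = [vf; af; wf]
print(zf.shape)         # torch.Size([605]) -> 300 + 300 + 5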
In an embodiment, after sending warning information to the user if the driving behavior result belongs to dangerous driving behavior, the method further includes:
performing time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
In this embodiment, the supervision device may be a device of the traffic management department. The video data and the audio data are combined after time frame alignment, so that the resulting audio-video data is continuous. The driver's driving condition is then reported over the network to the relevant department for handling. This provides humanized, intelligent reminders to the driver and gives the relevant department early notice of the driver's dangerous behavior, so that personnel can be arranged in advance and dangerous accidents such as drunk driving can be prevented.
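A hedged sketch of this reporting step follows; ffmpeg is one common way to merge the aligned streams, and the supervision endpoint URL and payload format are hypothetical.

```python
# Hedged sketch: merge the recorded streams, then report the result.
# The ffmpeg flags are standard; the endpoint and payload are hypothetical.
import subprocess
import requests

# Merge video and audio into one file; -shortest trims to the shorter
# stream so the time frames stay aligned.
subprocess.run([
    "ffmpeg", "-y", "-i", "cab_video.mp4", "-i", "cab_audio.wav",
    "-c:v", "copy", "-c:a", "aac", "-shortest", "combined.mp4",
], check=True)

# Report the combined recording and the classification result.
with open("combined.mp4", "rb") as f:
    requests.post(
        "https://supervision.example/api/report",   # hypothetical endpoint
        files={"video": f},
        data={"driving_behavior": "dangerous"},
        timeout=30,
    )
```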
An embodiment of the present application further provides a dangerous driving behavior recognition apparatus, which executes the dangerous driving behavior identification method of the above method embodiment and achieves the corresponding functions and technical effects. Referring to fig. 2, fig. 2 is a block diagram illustrating the structure of the dangerous driving behavior recognition apparatus provided in an embodiment of the present application. For convenience of explanation, only the portions related to the present embodiment are shown. The apparatus includes:
an obtaining module 201, configured to obtain video data and audio data of a user in a cab;
an extraction module 202, configured to perform feature extraction on the video data and the audio data to obtain a plurality of single-modality features, where the single-modality features include a micro-expression feature, an action feature, and a sound feature;
a fusion module 203, configured to perform feature fusion on the single-mode features to obtain a fusion feature;
the classification module 204 is configured to perform binary classification on the fusion features to obtain a driving behavior result of the user;
a sending module 205, configured to send warning information to the user if the driving behavior result belongs to dangerous driving behavior.
In one embodiment, the extraction module 202 includes:
the first extraction unit is used for extracting the characteristics of the video image in the video data by utilizing a preset first multilayer perceptron to obtain the micro-expression characteristics;
the second extraction unit is used for extracting the characteristics of the video data by using a preset 3D convolutional neural network to obtain the action characteristics;
and the third extraction unit is used for extracting the characteristics of the audio data by using a preset openSMILE tool to obtain the sound characteristics.
In a preferred embodiment, the first extraction unit includes:
the first extraction subunit is used for extracting facial features of a user in each frame of video image of the video data;
and the comparison subunit is used for comparing the facial features with preset micro-expression features by using the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features.
In a preferred embodiment, the second extraction unit includes:
the input subunit is used for inputting the channel number, the frame number, the height and the width of each frame of video image of the video data into the 3D convolutional neural network;
the convolution subunit is configured to perform convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and the first output subunit is used for performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
In a preferred embodiment, the third extraction unit includes:
the removing subunit is used for removing the background noise of the audio data and standardizing the audio data after the background noise is removed to obtain target audio data;
the second extraction subunit is used for extracting the features of the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and the second output subunit is used for inputting the high-dimensional audio features to a preset second multilayer perceptron and outputting the sound features.
In one embodiment, the fusion module 203 includes:
and the splicing unit is used for performing feature splicing on the micro-expression feature, the action feature, and the sound feature in a concatenation fusion mode to obtain the fusion feature.
In an embodiment, the identification apparatus further includes:
the combination module is used for carrying out time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and the sending unit is used for sending the combined audio-video data and the driving behavior result to preset supervision equipment.
The dangerous driving behavior recognition apparatus can implement the dangerous driving behavior identification method of the above method embodiment. The alternatives in the above method embodiment are also applicable to this embodiment and are not described in detail here; for the rest, reference may be made to the contents of the above method embodiment.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the method embodiments described above when executing the computer program 32.
The electronic device 3 may be a computing device such as a smart phone, a tablet computer, or a vehicle-mounted terminal. The electronic device may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that fig. 3 is only an example of the electronic device 3 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, such as an input-output device and a network access device.
The processor 30 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 31 may in some embodiments be an internal storage unit of the electronic device 3, such as a hard disk or a memory of the electronic device 3. The memory 31 may also be an external storage device of the electronic device 3 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a terminal device to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (9)

1. A method for identifying dangerous driving behavior, comprising:
acquiring video data and audio data of a user in a cab;
performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
performing feature fusion on the single-mode features to obtain fusion features;
performing binary classification on the fusion features to obtain a driving behavior result of the user;
sending warning information to the user if the driving behavior result belongs to dangerous driving behavior;
the performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain the micro-expression features;
the method for extracting the characteristics of the video image in the video data by using the preset first multilayer perceptron to obtain the micro-expression characteristics comprises the following steps:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by utilizing the first multilayer perceptron to determine the micro-expression features corresponding to the facial features, wherein the preset micro-expression features are pre-stored in a feature vector database in a vector coding mode.
2. The method for identifying dangerous driving behavior according to claim 1, wherein said extracting features from said video data and said audio data to obtain a plurality of single-mode features further comprises:
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
3. The method for identifying dangerous driving behaviors as claimed in claim 2, wherein said performing feature extraction on said video data by using a preset 3D convolutional neural network to obtain said motion features comprises:
inputting the number of channels and the number of frames of the video data and the height and width of each frame of video image into the 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
4. The method for recognizing dangerous driving behavior according to claim 2, wherein the performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features comprises:
removing the background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
5. The method for identifying dangerous driving behavior according to claim 1, wherein said performing feature fusion on a plurality of said single-modal features to obtain a fused feature comprises:
and performing feature splicing on the micro-expression features, the action features, and the sound features in a concatenation fusion mode to obtain the fusion features.
6. The method for identifying dangerous driving behavior according to claim 1, wherein after the step of sending warning information to the user if the driving behavior result belongs to dangerous driving behavior, the method further comprises:
performing time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
7. An apparatus for recognizing dangerous driving behavior, comprising:
the acquisition module is used for acquiring video data and audio data of a user in a cab;
the extraction module is used for carrying out feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
the fusion module is used for carrying out feature fusion on the single-mode features to obtain fusion features;
the classification module is used for performing binary classification on the fusion features to obtain a driving behavior result of the user;
the sending module is used for sending warning information to the user if the driving behavior result belongs to dangerous driving behavior;
the extraction module comprises:
the first extraction unit is used for extracting the characteristics of the video image in the video data by utilizing a preset first multilayer perceptron to obtain the micro-expression characteristics;
the first extraction unit includes:
the first extraction subunit is used for extracting facial features of a user in each frame of video image of the video data;
and the comparison subunit is used for comparing the facial features with preset micro-expression features by utilizing the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features, wherein the preset micro-expression features are pre-stored in a feature vector database in a vector coding mode.
8. An electronic device, characterized in that it comprises a processor and a memory for storing a computer program which, when executed by the processor, implements a method for identifying dangerous driving behavior as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, implements the method of identifying dangerous driving behavior of any one of claims 1 to 6.
CN202111358472.0A 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium Active CN114170585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358472.0A CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358472.0A CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114170585A (en) 2022-03-11
CN114170585B (en) 2023-03-24

Family

Family ID: 80479331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358472.0A Active CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114170585B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641570B (en) * 2022-12-26 2023-06-23 中国汽车技术研究中心有限公司 Driving behavior determination method, driving behavior determination device, electronic equipment and storage medium
CN117152308B (en) * 2023-09-05 2024-03-22 江苏八点八智能科技有限公司 Virtual person action expression optimization method and system
CN118457218B (en) * 2024-07-12 2024-09-17 陕西三航科技有限公司 Drunk driving detection method and system based on sensor and machine vision

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858379A (en) * 2019-01-03 2019-06-07 深圳壹账通智能科技有限公司 Smile's sincerity degree detection method, device, storage medium and electronic equipment
CN110110662A (en) * 2019-05-07 2019-08-09 济南大学 Driver eye movement behavioral value method, system, medium and equipment under Driving Scene

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
CN110555346A (en) * 2018-06-01 2019-12-10 杭州海康威视数字技术股份有限公司 Driver emotion detection method and device, electronic equipment and storage medium
CN110399793A (en) * 2019-06-19 2019-11-01 深圳壹账通智能科技有限公司 Driving behavior method for early warning, device and computer equipment based on image recognition
CN111723752A (en) * 2020-06-23 2020-09-29 深圳壹账通智能科技有限公司 Method and device for detecting on-duty driving of driver based on emotion recognition
CN113591525B (en) * 2020-10-27 2024-03-01 蓝海(福建)信息科技有限公司 Driver road anger recognition method by deeply fusing facial expression and voice
CN113408385B (en) * 2021-06-10 2022-06-14 华南理工大学 Audio and video multi-mode emotion classification method and system
CN113469153B (en) * 2021-09-03 2022-01-11 中国科学院自动化研究所 Multi-modal emotion recognition method based on micro-expressions, limb actions and voice

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858379A (en) * 2019-01-03 2019-06-07 深圳壹账通智能科技有限公司 Smile's sincerity degree detection method, device, storage medium and electronic equipment
CN110110662A (en) * 2019-05-07 2019-08-09 济南大学 Driver eye movement behavioral value method, system, medium and equipment under Driving Scene

Also Published As

Publication number Publication date
CN114170585A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN114170585B (en) Dangerous driving behavior recognition method and device, electronic equipment and storage medium
CN110390262B (en) Video analysis method, device, server and storage medium
CN109584507B (en) Driving behavior monitoring method, device, system, vehicle and storage medium
CN108351968B (en) Alarming method, device, storage medium and server for criminal activities
WO2020024457A1 (en) Liability cognizance method and device of traffic accident and computer readable storage medium
CN110866427A (en) Vehicle behavior detection method and device
CN113052029A (en) Abnormal behavior supervision method and device based on action recognition and storage medium
CN103366506A (en) Device and method for automatically monitoring telephone call behavior of driver when driving
CN110580808B (en) Information processing method and device, electronic equipment and intelligent traffic system
WO2022213336A1 (en) Vehicle driving environment abnormality monitoring method and apparatus, electronic device, and storage medium
CN112507860A (en) Video annotation method, device, equipment and storage medium
Kumtepe et al. Driver aggressiveness detection via multisensory data fusion
CN114373189A (en) Behavior detection method and apparatus, terminal device and storage medium
CN111985304A (en) Patrol alarm method, system, terminal equipment and storage medium
CN112464755A (en) Monitoring method and device, electronic equipment and storage medium
CN118135800B (en) Abnormal traffic event accurate identification warning method based on deep learning
CN117011830B (en) Image recognition method, device, computer equipment and storage medium
CN111241918B (en) Vehicle tracking prevention method and system based on face recognition
CN111275008B (en) Method and device for detecting abnormality of target vehicle, storage medium and electronic device
CN113076852A (en) Vehicle-mounted snapshot processing system occupying bus lane based on 5G communication
CN110502995B (en) Driver yawning detection method based on fine facial action recognition
CN115620110B (en) Video event positioning and identifying method, device and storage medium
CN112597924B (en) Electric bicycle track tracking method, camera device and server
CN112016423B (en) Method, device and equipment for identifying vehicle door state and computer storage medium
CN112686136B (en) Object detection method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant