CN114170585B - Dangerous driving behavior recognition method and device, electronic equipment and storage medium - Google Patents

Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Info

Publication number
CN114170585B
CN114170585B · CN202111358472.0A
Authority
CN
China
Prior art keywords
features
driving behavior
video data
user
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111358472.0A
Other languages
Chinese (zh)
Other versions
CN114170585A (en)
Inventor
Zheng Peng
Liu Zhihui
Zhou Dong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Original Assignee
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority to CN202111358472.0A
Publication of CN114170585A
Application granted
Publication of CN114170585B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a dangerous driving behavior identification method and device, an electronic device, and a storage medium. Video data and audio data of a user in a cab are acquired so that the user is monitored in real time. Feature extraction is performed on the video data and the audio data to obtain a plurality of single-mode features, namely the user's micro-expression features, action features, and sound features, so that modal fusion analysis can be carried out on both the auditory and visual sides. Compared with traditional single-mode processing, the method acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt the user to drive safely.

Description

Dangerous driving behavior recognition method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of safe driving, and in particular, to a method and an apparatus for identifying dangerous driving behavior, an electronic device, and a storage medium.
Background
Drunk driving is a dangerous driving behavior that seriously endangers people's lives and health and brings unstable factors to society. At present, traffic police intercept vehicles by observing the driver's complexion and the vehicle's travel route, and use an alcohol detector to measure the alcohol content in the driver's exhaled breath to judge whether the driver is driving drunk. However, offenders easily slip through this manual interception and detection approach, so a more intelligent detection method is needed to improve traffic safety.
In the related art, an alcohol concentration detector is installed in the cab; when the detected alcohol concentration exceeds a preset value, the user is warned to stop driving and the vehicle's ignition system is controlled. However, alcohol may be present in some foods, or the amount of alcohol vapor released in the cab may be too small for the detector to sense, so the alcohol concentration detector is prone to false positives and missed detections.
Disclosure of Invention
The application provides a dangerous driving behavior identification method and device, an electronic device, and a storage medium, aiming to solve the technical problem that existing dangerous driving behavior detection methods have low detection accuracy.
In order to solve the technical problem, in a first aspect, an embodiment of the present application provides a method for identifying dangerous driving behaviors, including:
acquiring video data and audio data of a user in a cab;
performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
performing feature fusion on the plurality of single-mode features to obtain fusion features;
performing binary classification on the fusion features to obtain a driving behavior result of the user;
and sending warning information to the user if the driving behavior result belongs to dangerous driving behavior.
This embodiment monitors the user in real time by acquiring the video data and audio data of the user in the cab. Feature extraction is performed on the video data and the audio data to obtain a plurality of single-mode features, extracting the user's micro-expression features, action features, and sound features so that modal fusion analysis can be performed on both the auditory and visual sides. Compared with traditional single-mode processing, this acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt safe driving.
In one embodiment, performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain micro expression features;
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
In the embodiment, through different networks or tools, the single-mode features of the video data and the audio data are respectively extracted, so that comprehensive user feature information can be conveniently acquired, and the identification accuracy of the subsequent identification process is improved.
In a preferred embodiment, the method for extracting features of a video image in video data by using a preset first multi-layer perceptron to obtain micro-expression features includes:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by using the first multilayer perceptron to determine the micro-expression features corresponding to the facial features.
In the embodiment, the facial features are compared with the preset micro-expression features to integrate the feature details, so that the identification accuracy and the reliability are improved.
In a preferred embodiment, the method for extracting features of video data by using a preset 3D convolutional neural network to obtain motion features includes:
inputting the number of channels and the number of frames of video data and the height and width of each frame of video image into a 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in a 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full connection operation on the convolution result data to obtain action characteristics.
In the embodiment, the feature extraction is carried out through the 3D convolutional neural network, and the inter-frame motion information with time dimension can be obtained, so that the action features in the video data can be better captured in the time dimension and the space dimension, the extracted action features are more comprehensive, and the identification accuracy and the reliability are further improved.
In a preferred embodiment, the extracting the features of the audio data by using a preset openSMILE tool to obtain the sound features includes:
removing background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using an openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
Through denoising, standardization, feature extraction, and perceptron-based dimension reduction, this embodiment avoids noise interference, improves the performance of the feature extraction process, and makes the overall processing more efficient.
In one embodiment, performing feature fusion on a plurality of single-mode features to obtain a fused feature includes:
and performing feature splicing on the micro-expression features, the action features, and the sound features in a concatenation fusion mode to obtain fusion features.
In an embodiment, if the driving behavior result belongs to dangerous driving behavior, after the warning message is sent to the user, the method further includes:
carrying out time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
In this embodiment, the video data and the audio data are combined into complete audio-video data, and the combined data and the driving behavior result are sent to the supervision equipment, so that relevant supervisory personnel can handle the driving behavior result in time and prevent traffic accidents caused by dangerous driving behaviors.
In a second aspect, an embodiment of the present application provides an apparatus for identifying dangerous driving behavior, including:
the acquisition module is used for acquiring video data and audio data of a user in a cab;
the extraction module is used for carrying out feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
the fusion module is used for carrying out feature fusion on the plurality of single-mode features to obtain fusion features;
the classification module is used for performing binary classification on the fusion features to obtain a driving behavior result of the user;
and the sending module is used for sending warning information to the user if the driving behavior result belongs to dangerous driving behaviors.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the computer program, when executed by the processor, implements the method for identifying dangerous driving behavior according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the method for identifying dangerous driving behavior according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second aspect to the fourth aspect, which are not described herein again.
Drawings
Fig. 1 is a schematic flowchart of a dangerous driving behavior recognition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a dangerous driving behavior recognition device provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, an alcohol concentration detector installed in the cab warns the user to stop driving and controls the vehicle's ignition system when the detected alcohol concentration exceeds a preset value. However, alcohol may be present in some foods, or the amount of alcohol vapor released in the cab may be too small for the detector to sense, so the alcohol concentration detector is prone to false positives and missed detections.
Therefore, the embodiments of the application provide a dangerous driving behavior identification method and device, an electronic device, and a storage medium. The method acquires video data and audio data of a user in a cab to monitor the user in real time; performs feature extraction on the video data and the audio data to obtain a plurality of single-mode features, extracting the user's micro-expression features, action features, and sound features so that modal fusion analysis can be performed on both the auditory and visual sides; and, compared with traditional single-mode processing, acquires more comprehensive user information and makes fuller use of heterogeneous information, so the driving behavior recognition result is more reliable and accurate. Finally, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user to prompt safe driving.
Referring to fig. 1, fig. 1 is a schematic flowchart of a method for identifying dangerous driving behaviors provided in an embodiment of the present application. The dangerous driving behavior identification method can be applied to electronic equipment, and the electronic equipment comprises but is not limited to computing equipment such as a smart phone, a tablet computer, a personal digital assistant and a vehicle-mounted terminal which are installed in a cab. As shown in fig. 1, the method for identifying dangerous driving behavior includes steps S101 to S105, which are detailed as follows:
step S101, video data and audio data of a user in the cab are acquired.
In this step, video data is collected by a camera of the electronic device, and audio data is collected by a microphone of the electronic device. It will be appreciated that alcohol concentration data may also be collected by an alcohol concentration detector.
Step S102, performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features.
In this step, the micro-expression features are micro-expression features of the user, which include but are not limited to micro-expression features for fatigue, anger, crying, drowsiness and intoxication; the action characteristics are limb action characteristics of the user; the voice characteristics are voice characteristics of the user when speaking.
A model best suited to each data type is constructed to extract features from the various information sources; features extracted from one source can be regarded as independent of those extracted from another. For example, in drunk-driving image recognition, features extracted from the image take the form of fine details such as edges and contours, while the corresponding features extracted from the audio take the form of tokens. After all features important for prediction have been extracted from at least two data sources (e.g., video and audio), the different features are combined into one shared representation.
And S103, performing feature fusion on the plurality of single-mode features to obtain fusion features.
In this step, the single-mode features are fused. Compared with traditional single-mode processing, more comprehensive information can be obtained and heterogeneous information can be used more fully, so dangerous driving behaviors such as drunk driving can be detected and analyzed with higher credibility and accuracy.
And step S104, performing binary classification on the fusion features to obtain the driving behavior result of the user.
In this step, the binary classification can be implemented based on a support vector machine, and the driving behavior results comprise non-dangerous driving behavior and dangerous driving behavior.
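As an illustration only, a minimal sketch of this binary classification step is given below, assuming a standard SVM library; the RBF kernel, the placeholder training data, and the 605-dimensional fused feature size (taken from the fusion embodiment later in this description) are assumptions, since the patent does not disclose training details.

```python
# Hypothetical sketch of the binary classification in step S104.
# The 605-dim fused feature zf and the two labels follow the embodiments
# below; the RBF kernel and the training data are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 605))    # placeholder fused features zf
y_train = rng.integers(0, 2, size=200)   # 0 = non-dangerous, 1 = dangerous

clf = SVC(kernel="rbf")                  # kernel choice is an assumption
clf.fit(X_train, y_train)

zf = rng.normal(size=(1, 605))           # one fused feature at inference
print("dangerous" if clf.predict(zf)[0] == 1 else "non-dangerous")
```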
And step S105, if the driving behavior result belongs to dangerous driving behavior, warning information is sent to the user.
In this step, for example, a voice warning may be issued to the user through the in-vehicle device, or a flashing warning may be issued to the user through an indicator lamp.
In an embodiment, on the basis of the embodiment in fig. 1, the step S102 includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain the micro-expression features;
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
In this embodiment, the 3D-CNN (3D convolutional neural network) stacks multiple consecutive frames into a cube and then applies a 3D convolution kernel within the cube. In this structure, each feature map in a convolutional layer is connected to multiple adjacent consecutive frames of the previous layer, thereby capturing motion information and better capturing the temporal and spatial feature information in the video.
The openSMILE tool is a command-line tool that extracts audio features according to a configuration (config) file.
Optionally, the performing, by using a preset first multilayer perceptron, feature extraction on a video image in the video data to obtain the micro-expression feature includes:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by using the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features.
In this embodiment, five facial micro-expressions (fatigue, drowsiness, intoxication, anger, and crying) are converted into 5-dimensional vector codes and stored in a feature vector database for comparison with subsequently input driver micro-expressions. The driver's micro-expression image captured by the camera is compared with the expression data stored in the feature vector library (such as intoxication, fatigue, and drowsiness), and the micro-expression feature wf is obtained after comparison. The advantage is that feature details are incorporated, making the driving behavior detection more convincing.
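The following is a hedged sketch of this comparison step, assuming PyTorch; the facial-feature dimension (128), the perceptron layer sizes, and the use of cosine similarity against the stored codes are assumptions, as the embodiment only specifies a first multilayer perceptron and 5-dimensional vector codes in a feature vector database.

```python
# Hedged sketch of the micro-expression comparison; layer sizes, the
# 128-dim facial feature and the cosine-similarity comparison are
# assumptions -- the patent only states that facial features are compared
# with 5-dimensional vector codes stored in a feature vector database.
import torch
import torch.nn as nn
import torch.nn.functional as F

EXPRESSIONS = ["fatigue", "drowsiness", "intoxication", "anger", "crying"]

mlp = nn.Sequential(              # "first multilayer perceptron"
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),             # 5-dim code, one per micro-expression
)

# Feature vector database: one preset 5-dim code per expression
# (identity codes here purely for illustration).
database = torch.eye(5)

face_feat = torch.randn(1, 128)   # facial features from one video frame
code = mlp(face_feat)             # 5-dim code for the input face
sims = F.cosine_similarity(code, database)   # compare with preset codes
wf = F.one_hot(sims.argmax(), 5).float()     # micro-expression feature wf
print(EXPRESSIONS[int(sims.argmax())], wf)
```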
Optionally, the performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the motion feature includes:
inputting the number of channels and the number of frames of the video data and the height and width of each frame of video image into the 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
In this embodiment, illustratively, the video is v with input dimension (c, f, h, w), where c is the number of channels, f is the number of frames, and h and w are the height and width of each frame; the 3D filter fl has dimension (fm, c, fd, fh, fw), where fm is the number of feature maps, c is the number of channels, fd is the number of frames (the depth of the convolution kernel), and fh and fw are the height and width of the filter.
Using the 3D filter, a convolution operation is performed on the video data; the output dimension after convolution is (fm, f-fd+1, h-fh+1, w-fw+1). Max pooling with kernel (mp, mp, mp) is then applied, and the pooling result is fed into a fully connected layer of size df and a softmax layer, which output the action feature vf.
Here the video data has three RGB channels, so c = 3. Using 32 feature maps and a 3D convolution kernel with fd = fh = fw = 5, the fl dimension is 32 × 3 × 5 × 5 × 5, and the max-pooling size mp is 3. The resulting feature vector vf has dimension 300. By extending the reach of the convolution kernel into the time domain, the 3D-CNN is more flexible than 2D convolution; it extracts not only the features of each video frame but also the features of the whole video in both the temporal and spatial domains, so more motion information can be learned.
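A minimal PyTorch sketch of this 3D-CNN embodiment follows; the clip length (16 frames) and frame size (112 × 112) are assumptions, while c = 3, the 32 feature maps, the 5 × 5 × 5 kernel, the pooling size 3, and the 300-dimensional output follow the figures above.

```python
# Sketch of the 3D-CNN embodiment; frame count (16) and frame size
# (112x112) are assumptions -- the patent fixes only c=3, fm=32,
# fd=fh=fw=5, mp=3 and the 300-dim output vf.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=5),  # fl: 32 x 3 x 5 x 5 x 5
    nn.ReLU(),
    nn.MaxPool3d(3),                  # max pooling (mp, mp, mp) = (3, 3, 3)
    nn.Flatten(),
    nn.LazyLinear(300),               # fully connected layer, size df = 300
    nn.Softmax(dim=1),                # softmax layer of the embodiment
)

video = torch.randn(1, 3, 16, 112, 112)  # (batch, c, f, h, w)
vf = net(video)                           # action feature vf
print(vf.shape)                           # torch.Size([1, 300])
```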
Optionally, the performing, by using a preset openSMILE tool, feature extraction on the audio data to obtain the sound feature includes:
removing the background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
In this example, the audio tool SoX is used to remove the background noise, and the denoised audio data is standardized by Z-normalization. After standardization, the audio is sent to openSMILE for feature extraction, yielding a 6373-dimensional audio feature vector; a multilayer perceptron finally reduces it to a 300-dimensional feature vector af. The advantage is higher audio processing efficiency.
The SoX tool can read and write audio files in common formats and optionally apply sound effects during processing. It can combine several input sources and synthesized sound effects, and can serve as an audio player or multitrack recorder on many systems. The basic SoX processing flow comprises input(s), combiner, effects, and output(s). All functions of the SoX tool are available through the simple sox command and its options; SoX also provides a play command for playing audio files, a rec command for recording audio, and a soxi command for reading the information contained in an audio file's header.
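A hedged sketch of this audio branch is shown below. The SoX noise-profile step, the file names, and the perceptron hidden size are assumptions; the 6373-dimensional ComParE feature set matches the dimension cited above, and the sketch assumes the sox binary plus the soundfile and opensmile Python packages.

```python
# Hedged sketch: SoX denoising -> Z-normalization -> openSMILE -> MLP.
# File names and MLP sizes are assumptions; ComParE_2016 yields the
# 6373-dim feature vector cited in this embodiment.
import subprocess

import soundfile as sf
import opensmile
import torch
import torch.nn as nn

# 1) Remove background noise with SoX: build a noise profile from a
#    noise-only clip, then apply noise reduction (file names assumed).
subprocess.run(["sox", "noise_only.wav", "-n", "noiseprof", "bg.prof"], check=True)
subprocess.run(["sox", "cab_audio.wav", "denoised.wav",
                "noisered", "bg.prof", "0.2"], check=True)

# 2) Z-normalize the denoised waveform (zero mean, unit variance).
x, sr = sf.read("denoised.wav")
x = (x - x.mean()) / (x.std() + 1e-8)
sf.write("normalized.wav", x, sr)

# 3) Extract the 6373-dimensional audio feature vector with openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
feats = torch.tensor(smile.process_file("normalized.wav").values,
                     dtype=torch.float32)

# 4) Reduce to the 300-dim sound feature af with the "second multilayer
#    perceptron" (hidden size 1024 is an assumption).
mlp2 = nn.Sequential(nn.Linear(6373, 1024), nn.ReLU(), nn.Linear(1024, 300))
af = mlp2(feats)
print(af.shape)  # torch.Size([1, 300])
```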
In an embodiment, the performing feature fusion on a plurality of the single-mode features to obtain a fused feature includes:
and performing feature splicing on the micro-expression feature, the action feature, and the sound feature in a concatenation fusion mode to obtain the fusion feature.
In this embodiment, driving behavior is predicted by multimodal fusion processing, which combines information from two or more modalities. The feature vectors vf, af, and wf are directly spliced in a concatenation fusion manner to obtain the 605-dimensional vector zf = [vf; af; wf] (300 + 300 + 5 dimensions). Compared with traditional single-mode processing, this acquires more comprehensive information, makes fuller use of heterogeneous information, and gives the detection and analysis of dangerous driving behaviors higher credibility and accuracy.
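A minimal sketch of this concatenation step follows; the placeholder tensors stand in for the branch outputs described above.

```python
# Minimal sketch of the concatenation ("splicing") fusion step.
import torch

vf = torch.randn(300)   # action feature from the 3D-CNN
af = torch.randn(300)   # sound feature from openSMILE + MLP
wf = torch.randn(5)     # micro-expression feature
zf = torch.cat([vf, af, wf])  # fused feature zf = [vf; af; wf]
print(zf.shape)         # torch.Size([605]) -> 300 + 300 + 5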
In an embodiment, after sending warning information to the user if the driving behavior result belongs to dangerous driving behavior, the method further includes:
performing time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
In this embodiment, the supervision device may be a device of the traffic management department. The video data and the audio data are combined after time frame alignment, so that the resulting audio-video data is continuous. The driver's driving condition is then reported over the network to the relevant department for handling. This provides humanized, intelligent reminders to the driver and gives the relevant department early notice of the driver's dangerous behavior, so that personnel can be arranged in advance and dangerous accidents such as drunk driving can be prevented.
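A hedged sketch of this reporting step follows; ffmpeg is one common way to merge the aligned streams, and the supervision endpoint URL and payload format are hypothetical.

```python
# Hedged sketch: merge the recorded streams, then report the result.
# The ffmpeg flags are standard; the endpoint and payload are hypothetical.
import subprocess
import requests

# Merge video and audio into one file; -shortest trims to the shorter
# stream so the time frames stay aligned.
subprocess.run([
    "ffmpeg", "-y", "-i", "cab_video.mp4", "-i", "cab_audio.wav",
    "-c:v", "copy", "-c:a", "aac", "-shortest", "combined.mp4",
], check=True)

# Report the combined recording and the classification result.
with open("combined.mp4", "rb") as f:
    requests.post(
        "https://supervision.example/api/report",   # hypothetical endpoint
        files={"video": f},
        data={"driving_behavior": "dangerous"},
        timeout=30,
    )
```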
An embodiment of the present application further provides a dangerous driving behavior recognition apparatus, which executes the dangerous driving behavior identification method of the above method embodiment and achieves the corresponding functions and technical effects. Referring to fig. 2, fig. 2 is a block diagram illustrating the structure of the dangerous driving behavior recognition apparatus provided in an embodiment of the present application. For convenience of explanation, only the portions related to the present embodiment are shown. The apparatus includes:
an obtaining module 201, configured to obtain video data and audio data of a user in a cab;
an extraction module 202, configured to perform feature extraction on the video data and the audio data to obtain a plurality of single-modality features, where the single-modality features include a micro-expression feature, an action feature, and a sound feature;
a fusion module 203, configured to perform feature fusion on the single-mode features to obtain a fusion feature;
the classification module 204 is configured to perform binary classification on the fusion features to obtain a driving behavior result of the user;
a sending module 205, configured to send warning information to the user if the driving behavior result belongs to dangerous driving behavior.
In one embodiment, the extraction module 202 includes:
the first extraction unit is used for extracting the characteristics of the video image in the video data by utilizing a preset first multilayer perceptron to obtain the micro-expression characteristics;
the second extraction unit is used for extracting the characteristics of the video data by using a preset 3D convolutional neural network to obtain the action characteristics;
and the third extraction unit is used for extracting the characteristics of the audio data by using a preset openSMILE tool to obtain the sound characteristics.
In a preferred embodiment, the first extraction unit includes:
the first extraction subunit is used for extracting facial features of a user in each frame of video image of the video data;
and the comparison subunit is used for comparing the facial features with preset micro-expression features by using the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features.
In a preferred embodiment, the second extraction unit includes:
the input subunit is used for inputting the channel number, the frame number, the height and the width of each frame of video image of the video data into the 3D convolutional neural network;
the convolution subunit is configured to perform convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and the first output subunit is used for performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
In a preferred embodiment, the third extraction unit includes:
the removing subunit is used for removing the background noise of the audio data and standardizing the audio data after the background noise is removed to obtain target audio data;
the second extraction subunit is used for extracting the features of the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and the second output subunit is used for inputting the high-dimensional audio features to a preset second multilayer perceptron and outputting the sound features.
In one embodiment, the fusion module 203 includes:
and the splicing unit is used for performing feature splicing on the micro-expression feature, the action feature, and the sound feature in a concatenation fusion mode to obtain the fusion feature.
In an embodiment, the identification apparatus further includes:
the combination module is used for carrying out time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and the sending unit is used for sending the combined audio-video data and the driving behavior result to preset supervision equipment.
The dangerous driving behavior recognition apparatus can implement the dangerous driving behavior identification method of the above method embodiment. The alternatives in the above method embodiment are also applicable to this embodiment and are not described in detail here; for the rest, reference may be made to the contents of the above method embodiment.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the method embodiments described above when executing the computer program 32.
The electronic device 3 may be a computing device such as a smart phone, a tablet computer, or a vehicle-mounted terminal. The electronic device may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that fig. 3 is only an example of the electronic device 3 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, such as an input-output device and a network access device.
The processor 30 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 31 may in some embodiments be an internal storage unit of the electronic device 3, such as a hard disk or a memory of the electronic device 3. The memory 31 may also be an external storage device of the electronic device 3 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a terminal device to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (9)

1. A method for identifying dangerous driving behavior, comprising:
acquiring video data and audio data of a user in a cab;
performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
performing feature fusion on the single-mode features to obtain fusion features;
performing binary classification on the fusion features to obtain a driving behavior result of the user;
sending warning information to the user if the driving behavior result belongs to dangerous driving behavior;
the performing feature extraction on the video data and the audio data to obtain a plurality of single-mode features includes:
performing feature extraction on a video image in the video data by using a preset first multilayer perceptron to obtain the micro-expression features;
the method for extracting the characteristics of the video image in the video data by using the preset first multilayer perceptron to obtain the micro-expression characteristics comprises the following steps:
extracting facial features of a user in each frame of video image of the video data;
and comparing the facial features with preset micro-expression features by utilizing the first multilayer perceptron to determine the micro-expression features corresponding to the facial features, wherein the preset micro-expression features are pre-stored in a feature vector database in a vector coding mode.
2. The method for identifying dangerous driving behavior according to claim 1, wherein said extracting features from said video data and said audio data to obtain a plurality of single-mode features further comprises:
performing feature extraction on the video data by using a preset 3D convolutional neural network to obtain the action features;
and performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features.
3. The method for identifying dangerous driving behaviors as claimed in claim 2, wherein said performing feature extraction on said video data by using a preset 3D convolutional neural network to obtain said motion features comprises:
inputting the number of channels and the number of frames of the video data and the height and width of each frame of video image into the 3D convolutional neural network;
performing convolution operation on the video data by using a 3D filter in the 3D convolution neural network to obtain convolution result data;
and performing pooling operation and full-connection operation on the convolution result data to obtain the action characteristics.
4. The method for recognizing dangerous driving behavior according to claim 2, wherein the performing feature extraction on the audio data by using a preset openSMILE tool to obtain the sound features comprises:
removing the background noise of the audio data, and standardizing the audio data after the background noise is removed to obtain target audio data;
performing feature extraction on the target audio data by using the openSMILE tool to obtain high-dimensional audio features;
and inputting the high-dimensional audio features into a preset second multilayer perceptron, and outputting the sound features.
5. The method for identifying dangerous driving behavior according to claim 1, wherein said performing feature fusion on a plurality of said single-modal features to obtain a fused feature comprises:
and performing feature splicing on the micro-expression features, the action features, and the sound features in a concatenation fusion mode to obtain the fusion features.
6. The method for identifying dangerous driving behavior according to claim 1, wherein after the step of sending warning information to the user if the driving behavior result belongs to dangerous driving behavior, the method further comprises:
performing time frame alignment and combination on the video data and the audio data to obtain combined audio-video data;
and sending the combined audio-video data and the driving behavior result to preset supervision equipment.
7. An apparatus for recognizing dangerous driving behavior, comprising:
the acquisition module is used for acquiring video data and audio data of a user in a cab;
the extraction module is used for carrying out feature extraction on the video data and the audio data to obtain a plurality of single-mode features, wherein the single-mode features comprise micro-expression features, action features and sound features;
the fusion module is used for carrying out feature fusion on the single-mode features to obtain fusion features;
the classification module is used for performing binary classification on the fusion features to obtain a driving behavior result of the user;
the sending module is used for sending warning information to the user if the driving behavior result belongs to dangerous driving behavior;
the extraction module comprises:
the first extraction unit is used for extracting the characteristics of the video image in the video data by utilizing a preset first multilayer perceptron to obtain the micro-expression characteristics;
the first extraction unit includes:
the first extraction subunit is used for extracting facial features of a user in each frame of video image of the video data;
and the comparison subunit is used for comparing the facial features with preset micro-expression features by utilizing the first multilayer perceptron, and determining the micro-expression features corresponding to the facial features, wherein the preset micro-expression features are pre-stored in a feature vector database in a vector coding mode.
8. An electronic device, characterized in that it comprises a processor and a memory for storing a computer program which, when executed by the processor, implements a method for identifying dangerous driving behavior as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, implements the method of identifying dangerous driving behavior of any one of claims 1 to 6.
CN202111358472.0A 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium Active CN114170585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358472.0A CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358472.0A CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114170585A (en) 2022-03-11
CN114170585B (en) 2023-03-24

Family

Family ID: 80479331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358472.0A Active CN114170585B (en) 2021-11-16 2021-11-16 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114170585B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641570B (en) * 2022-12-26 2023-06-23 中国汽车技术研究中心有限公司 Driving behavior determination method, driving behavior determination device, electronic equipment and storage medium
CN117152308B (en) * 2023-09-05 2024-03-22 江苏八点八智能科技有限公司 Virtual person action expression optimization method and system
CN118457218B (en) * 2024-07-12 2024-09-17 陕西三航科技有限公司 Drunk driving detection method and system based on sensor and machine vision

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858379A (en) * 2019-01-03 2019-06-07 深圳壹账通智能科技有限公司 Smile's sincerity degree detection method, device, storage medium and electronic equipment
CN110110662A (en) * 2019-05-07 2019-08-09 济南大学 Driver eye movement behavioral value method, system, medium and equipment under Driving Scene

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
CN110555346A (en) * 2018-06-01 2019-12-10 杭州海康威视数字技术股份有限公司 Driver emotion detection method and device, electronic equipment and storage medium
CN110399793A (en) * 2019-06-19 2019-11-01 深圳壹账通智能科技有限公司 Driving behavior method for early warning, device and computer equipment based on image recognition
CN111723752A (en) * 2020-06-23 2020-09-29 深圳壹账通智能科技有限公司 Method and device for detecting on-duty driving of driver based on emotion recognition
CN113591525B (en) * 2020-10-27 2024-03-01 蓝海(福建)信息科技有限公司 Driver road anger recognition method by deeply fusing facial expression and voice
CN113408385B (en) * 2021-06-10 2022-06-14 华南理工大学 Audio and video multi-mode emotion classification method and system
CN113469153B (en) * 2021-09-03 2022-01-11 中国科学院自动化研究所 Multi-modal emotion recognition method based on micro-expressions, limb actions and voice

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858379A (en) * 2019-01-03 2019-06-07 深圳壹账通智能科技有限公司 Smile's sincerity degree detection method, device, storage medium and electronic equipment
CN110110662A (en) * 2019-05-07 2019-08-09 济南大学 Driver eye movement behavioral value method, system, medium and equipment under Driving Scene

Also Published As

Publication number Publication date
CN114170585A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN114170585B (en) Dangerous driving behavior recognition method and device, electronic equipment and storage medium
CN110390262B (en) Video analysis method, device, server and storage medium
CN109584507B (en) Driving behavior monitoring method, device, system, vehicle and storage medium
CN108351968B (en) Alarming method, device, storage medium and server for criminal activities
WO2020024457A1 (en) Liability cognizance method and device of traffic accident and computer readable storage medium
CN110866427A (en) Vehicle behavior detection method and device
CN113052029A (en) Abnormal behavior supervision method and device based on action recognition and storage medium
CN103366506A (en) Device and method for automatically monitoring telephone call behavior of driver when driving
CN110580808B (en) Information processing method and device, electronic equipment and intelligent traffic system
WO2022213336A1 (en) Vehicle driving environment abnormality monitoring method and apparatus, electronic device, and storage medium
CN112507860A (en) Video annotation method, device, equipment and storage medium
Kumtepe et al. Driver aggressiveness detection via multisensory data fusion
CN114373189A (en) Behavior detection method and apparatus, terminal device and storage medium
CN111985304A (en) Patrol alarm method, system, terminal equipment and storage medium
CN112464755A (en) Monitoring method and device, electronic equipment and storage medium
CN118135800B (en) Abnormal traffic event accurate identification warning method based on deep learning
CN117011830B (en) Image recognition method, device, computer equipment and storage medium
CN111241918B (en) Vehicle tracking prevention method and system based on face recognition
CN111275008B (en) Method and device for detecting abnormality of target vehicle, storage medium and electronic device
CN113076852A (en) Vehicle-mounted snapshot processing system occupying bus lane based on 5G communication
CN110502995B (en) Driver yawning detection method based on fine facial action recognition
CN115620110B (en) Video event positioning and identifying method, device and storage medium
CN112597924B (en) Electric bicycle track tracking method, camera device and server
CN112016423B (en) Method, device and equipment for identifying vehicle door state and computer storage medium
CN112686136B (en) Object detection method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant