CN117576666A - Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting - Google Patents


Info

Publication number
CN117576666A
CN117576666A (application CN202311538093.9A)
Authority
CN
China
Prior art keywords
dynamic convolution
driving behavior
dangerous driving
dckf
convolution
Prior art date
Legal status
Granted
Application number
CN202311538093.9A
Other languages
Chinese (zh)
Other versions
CN117576666B (en)
Inventor
李自强
吴克伟
纪松
谢昭
程明
徐浩
王键钊
张沛錡
谭昊
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311538093.9A priority Critical patent/CN117576666B/en
Publication of CN117576666A publication Critical patent/CN117576666A/en
Application granted granted Critical
Publication of CN117576666B publication Critical patent/CN117576666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks


Abstract

The invention discloses a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting. Because existing target detection models have difficulty distinguishing different types of dangerous driving behaviors, the method learns dynamic convolution kernels from the features of the different behaviors. To improve recognition accuracy for dangerous driving behaviors at different resolutions, the method considers dynamic convolution kernels of different scales in the monitoring environment. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Adding the multi-scale dynamic convolution module and the attention weighting module to an existing target detection model improves the accuracy of dangerous driving behavior detection; the method can be applied in vehicle safety systems to safeguard driving safety.

Description

Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Technical Field
The invention relates to the technical field of multi-scale dynamic convolution attention weighting, in particular to a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
Background
In order to ensure good traffic order and the safety of people's lives and property, it is necessary to monitor the dangerous driving behavior of drivers. With the rapid development of multi-scale dynamic convolution attention weighting modules, dangerous driving behavior detection methods based on multi-scale dynamic convolution attention weighting have gradually attracted industry attention.
Chinese patent application publication No. CN114005093A, "Video analysis-based driving behavior warning method, device, equipment and medium," proposes a driving behavior warning method based on video analysis. The method identifies images of dangerous driving behavior of a target vehicle by marking the position, speed and trajectory information of the target vehicle and of other vehicles in the image, together with a number of dangerous driving characteristics acquired in advance. When the number of dangerous-driving images within a preset unit time exceeds a preset threshold, the driver of the target vehicle is alerted. However, this method detects only from information external to the vehicle and cannot sufficiently incorporate the driver's state, so it is difficult to achieve early warning. Chinese patent application publication No. CN113033261A, "A dangerous driving identification and early warning method," proposes a dangerous driving identification and early warning method comprising the following steps: 1. acquire the current driving data of the vehicle; 2. extract driving behavior features from the driving data; 3. identify the driving behavior features with a fuzzy convolutional neural network; 4. issue an early warning signal when dangerous driving is identified. The method is simple and easy to implement; by identifying driving behavior features with a fuzzy convolutional neural network it achieves high identification accuracy, can effectively judge a driver's dangerous driving behavior and issue a timely warning signal, and has potential for wide application.
Gong Jian-Qiang and Wang Yi-ying, in "Research on Online Identification Algorithm of Dangerous Driving Behavior," propose an algorithm that identifies dangerous driving behaviors using a variance Bayesian network; their results indicate that the model can identify two dangerous driving behaviors and has better generalization performance than a single variance model. Zhe Ma, Xiaohui Yang and Haora Zhang, in "Dangerous Driving Behavior Recognition using CA-CenterNet," propose a dangerous driving behavior recognition method using CA-CenterNet. In that study, dangerous driving behaviors are identified from the driver's hand behaviors, and a large amount of driver video was captured to build a hand-detection-based dangerous driving behavior data set. Evaluations and comparisons against other network models show that the method improves the accuracy of behavior recognition.
However, for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting, we must fully consider that background clutter and fragments of ambiguous behavior in the driving scene are difficult to classify. Key information such as the driver's face is acquired by the multi-scale dynamic convolution technique. Using the image data acquired by the camera, combined with the attention-weighted model, we can accurately extract the position and pose of the points of interest. In this way, the driver's facial condition can be monitored more accurately, giving stronger representational power when detecting the driver's dangerous driving behaviors.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art by providing a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
The invention is realized by the following technical scheme:
A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting comprises the following steps:
S1: construct a dangerous driving behavior data set;
S2: construct multi-scale dynamic convolution features;
S3: fuse the multi-scale dynamic convolution features based on attention weighting;
S4: train the multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: test the multi-scale dynamic convolution model and use it for dangerous driving behavior detection;
The dangerous driving behavior training set is constructed in step S1; the specific steps are as follows:
S1-1: input dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: input the dangerous driving behavior label ground truths L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: divide each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sample each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocess the video frames;
S1-5-1: input a video frame F_i;
S1-5-2: randomly crop the video frame F_i;
S1-5-3: randomly flip the video frame F_i horizontally;
S1-5-4: standardize the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtain the preprocessed video frame FP_i;
S1-6: repeat step S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
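The preprocessing in steps S1-5-1 to S1-5-5 can be sketched as follows. This is a minimal pure-Python illustration, not the patent's implementation: frames are C×H×W nested lists, and the crop size, flip probability and helper names (`random_crop`, `random_hflip`, `normalize`) are illustrative assumptions.

```python
import random

# Hypothetical sketch of steps S1-5-2..S1-5-4; a frame is a
# C x H x W nested list of floats in [0, 1].
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def random_crop(frame, out_h, out_w, rng=random):
    """S1-5-2: randomly crop an out_h x out_w window from every channel."""
    h, w = len(frame[0]), len(frame[0][0])
    top = rng.randrange(h - out_h + 1)
    left = rng.randrange(w - out_w + 1)
    return [[row[left:left + out_w] for row in ch[top:top + out_h]] for ch in frame]

def random_hflip(frame, p=0.5, rng=random):
    """S1-5-3: flip the frame horizontally with probability p."""
    return [[row[::-1] for row in ch] for ch in frame] if rng.random() < p else frame

def normalize(frame):
    """S1-5-4: per-channel standardization with the stated mean/std."""
    return [[[(px - MEAN[c]) / STD[c] for px in row] for row in ch]
            for c, ch in enumerate(frame)]
```

Chaining the three helpers over every sampled frame yields the preprocessed frames FP_i of step S1-5-5.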
The construction of the multi-scale dynamic convolution features in step S2 comprises the following specific steps:
S2-1: input the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: input the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: construct the 3×3 dynamic convolution feature;
S2-3-1: input the feature Headf ∈ R^{C×H×W};
S2-3-2: determine a center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: with neighborhood radius 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the scale-3 dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3};
S2-3-8: apply the dynamic convolution kernel DCK_{3,h,w} to the neighborhood NS_3 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
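The attention-style construction of a 3×3 dynamic kernel (steps S2-3-3 to S2-3-8) can be sketched as below. Several details the patent leaves open are filled with assumptions: the "·" products with W_K and W_Q are read as elementwise scaling, and the neighborhood response that the kernel weights aggregate is taken as the channel mean; `dynamic_conv_at` and the toy data layout are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_conv_at(feat, h, w, w_k, w_q, radius=1):
    """Attention-derived dynamic kernel at center (h, w), radius 1 -> 3x3.

    feat: dict (h, w) -> length-C feature vector; w_k, w_q: length-C weights
    (elementwise reading of the patent's W_K, W_Q products, an assumption).
    Returns (kernel_weights, dckf): DCK_{3,h,w} flattened, and dckf_{h,w}.
    """
    C = len(w_k)
    key = [a * b for a, b in zip(w_k, feat[(h, w)])]          # Key_{h,w}
    neigh = [(u, v) for u in range(h - radius, h + radius + 1)
                    for v in range(w - radius, w + radius + 1)]
    scores = []
    for (u, v) in neigh:
        query = [a * b for a, b in zip(w_q, feat[(u, v)])]    # Query_{u,v}
        scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(C))
    kernel = softmax(scores)                                   # dck weights
    # "Convolve": weighted sum of channel-mean neighborhood responses.
    dckf = sum(k * (sum(feat[p]) / C) for k, p in zip(kernel, neigh))
    return kernel, dckf
```

With a uniform feature map, all nine kernel weights collapse to 1/9 and dckf reduces to the local mean, which is a quick sanity check on the softmax normalization.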
S2-4: construct the 5×5 dynamic convolution feature;
S2-4-1: input the feature Headf ∈ R^{C×H×W};
S2-4-2: determine a center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: with neighborhood radius 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the scale-5 dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5};
S2-4-8: apply the dynamic convolution kernel DCK_{5,h,w} to the neighborhood NS_5 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: construct the 7×7 dynamic convolution feature;
S2-5-1: input the feature Headf ∈ R^{C×H×W};
S2-5-2: determine a center pixel (h, w), where h ∈ [4, H-4], w ∈ [4, W-4];
S2-5-3: determine the neighborhood range: with neighborhood radius 3, the range is [h-3, h+3] × [w-3, w+3], and the neighborhood is denoted NS_7 ∈ R^{7×7};
S2-5-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-5-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-5-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-5-7: repeat S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood to obtain the scale-7 dynamic convolution kernel DCK_{7,h,w} ∈ R^{7×7};
S2-5-8: apply the dynamic convolution kernel DCK_{7,h,w} to the neighborhood NS_7 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{7×7}(NS_7, DCK_{7,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{7×7} is a 7×7 convolution operation;
S2-5-9: repeat S2-5-2 to S2-5-8 for all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_3;
The step S3 of fusing the multi-scale dynamic convolution characteristics based on the attention weighting specifically comprises the following steps:
S3-1: input the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: apply global average pooling to the multi-scale dynamic convolution features DCKF_1, DCKF_2 and DCKF_3 to obtain the global average pooling feature (Global Average Pooling Feature), denoted GAPF, as follows:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
where GAP is the global average pooling operation, a common pooling method;
S3-3: apply a 1×1 convolution layer to the global average pooling feature GAPF to obtain the convolution feature CF_0 as follows:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C′×1}
where R^{C′×1} is the real vector space of C′-dimensional column vectors;
S3-4: apply three separate 1×1 convolution layers to the convolution feature CF_0 to obtain the convolution features CF_1, CF_2 and CF_3 as follows:
CF_k = Conv_{1×1}^{(k)}(CF_0), k = 1, 2, 3
where CF_1, CF_2 and CF_3 are the new feature vectors obtained from CF_0 by the three convolution operations;
S3-5: normalize the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution weights CF_{1,c}, CF_{2,c} and CF_{3,c} as follows:
[CF_{1,c}, CF_{2,c}, CF_{3,c}] = Softmax([CF_1, CF_2, CF_3])
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight, i.e. the importance, of the corresponding features;
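A hedged sketch of the attention-weighted fusion in steps S3-1 to S3-5: global average pooling per scale, a stand-in scalar weight `proj` in place of the 1×1 convolution layers, and a softmax over the three scale logits. The function name and the reduction of each DCKF to a flat list are illustrative assumptions, not the patent's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fuse_scales(dckf1, dckf2, dckf3, proj):
    """S3 sketch: each dckfK is a flat list of dynamic-conv responses.

    proj: three scalars standing in for the 1x1 convolution layers of
    S3-3/S3-4 (an assumption). Returns (attention_weights, fused_feature).
    """
    feats = [dckf1, dckf2, dckf3]
    gap = [sum(f) / len(f) for f in feats]          # S3-2: GAPF per scale
    logits = [proj[i] * gap[i] for i in range(3)]   # S3-3/S3-4: CF_1..CF_3
    weights = softmax(logits)                        # S3-5: CF_{1,c}..CF_{3,c}
    # Attention-weighted sum of the three scales, position by position.
    fused = [sum(w * x for w, x in zip(weights, col)) for col in zip(*feats)]
    return weights, fused
```

When the three scales carry identical features, the softmax assigns each scale weight 1/3 and the fused feature equals the input, a useful degenerate check.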
The multi-scale dynamic convolution model is trained for dangerous driving behavior detection in step S4; the specific steps are as follows:
S4-1: call step S1 to construct the dangerous driving behavior data set: perform video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and label ground truths to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features: extract multi-scale dynamic convolution features from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model, obtaining the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting: perform attention-weighted fusion on DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector SV from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3;
S4-5: for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model, repeat steps S4-2 to S4-4 to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the final score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the dangerous driving behavior label ground truth to obtain the loss;
S4-8: update all parameters by back propagation to obtain the multi-scale dynamic convolution model;
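The loss and prediction of step S4-7 can be illustrated with a small stand-in: softmax cross-entropy of the final score vector FSV against the true label, and maximum-value extraction for the predicted class. The four-class layout follows step S1-2; the function names are hypothetical.

```python
import math

def cross_entropy(fsv, label):
    """Softmax cross-entropy of the final score vector FSV against the
    true behavior label in {0, 1, 2, 3} (a stand-in for step S4-7)."""
    m = max(fsv)
    log_z = m + math.log(sum(math.exp(s - m) for s in fsv))  # log-sum-exp
    return log_z - fsv[label]

def predict(fsv):
    """Maximum-value extraction over FSV (steps S4-7 / S5-7)."""
    return max(range(len(fsv)), key=lambda k: fsv[k])
```

A uniform score vector gives the expected loss of log 4 for four classes, and the loss shrinks toward zero as the true class dominates FSV.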
The multi-scale dynamic convolution model is tested in step S5; the specific steps are as follows:
S5-1: call step S1 to construct the dangerous driving behavior data set: perform video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and label ground truths to obtain the dangerous driving behavior data set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features on the data set DS obtained in step S5-1, obtaining the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting: perform attention-weighted fusion on DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector SV from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3;
S5-5: for the data set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model, repeat steps S5-2 to S5-4 to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the final score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 with the multi-scale dynamic convolution model to obtain the dangerous driving behavior score DBS;
S5-9: apply threshold judgment to the dangerous driving behavior score DBS obtained in step S5-8; if it exceeds the threshold, issue a corresponding warning.
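Steps S5-7 and S5-9 can be sketched together: pick the highest-scoring behavior label from FSV and compare the dangerous driving behavior score DBS against a threshold. The threshold value 0.5 and the helper names are illustrative assumptions; the label mapping follows step S1-2.

```python
# Label mapping from step S1-2 of the data set construction.
LABELS = {0: "normal driving", 1: "using a cell phone",
          2: "drinking water", 3: "talking with a passenger"}

def danger_alert(dbs, threshold=0.5):
    """S5-9 sketch: alert when DBS exceeds the threshold (0.5 is an
    illustrative value, not taken from the patent)."""
    return dbs > threshold

def report(fsv, dbs, threshold=0.5):
    """Combine label prediction (S5-7) with the alert decision (S5-9)."""
    pred = max(range(len(fsv)), key=lambda k: fsv[k])  # max-value extraction
    return LABELS[pred], danger_alert(dbs, threshold)
```

For example, `report([0.1, 0.9, 0.2, 0.3], 0.8)` would both name the detected behavior and raise the alert, matching the reminder described in S5-9.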
The advantages of the invention are as follows. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting improves driving safety. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. The driver's facial information is acquired from video frames, and features such as facial expression and head pose are analyzed to detect whether dangerous driving behaviors are present. To detect dangerous driving behaviors accurately, the method feeds the video frame sequence into the multi-scale dynamic convolution model to capture key moments and actions, obtaining a series of driving behavior representations. When detecting dangerous driving behaviors, the method combines facial features and driving behaviors for a comprehensive judgment: for example, dangerous behaviors such as fatigue driving and distracted driving can be detected by analyzing the driver's facial expression and driving behavior. Meanwhile, combining the learning result of the attention weighting module yields more accurate facial information for assessing the driver's attention level and driving behavior. By detecting dangerous driving behaviors with multi-scale dynamic convolution attention weighting and combining facial features with driving behaviors, the invention improves the accuracy and reliability of driver behavior detection, which helps to promote driving safety and prevent accidents.
Drawings
FIG. 1 is a flow chart for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting;
FIG. 2 is a schematic diagram of constructing a dangerous driving behavior training set;
FIG. 3 is a diagram of steps for constructing a multi-scale dynamic convolution feature;
FIG. 4 is a schematic diagram of a multi-scale dynamic convolution feature based on attention-weighted fusion;
FIG. 5 is a schematic diagram of a test multiscale dynamic convolution model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The invention relates to dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting; the specific flow is shown in fig. 1, and the implementation comprises the following steps:
S1: construct the dangerous driving behavior training set, as shown in fig. 2;
S1-1: input dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: input the dangerous driving behavior label ground truths L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: divide each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sample each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocess the video frames;
S1-5-1: input a video frame F_i;
S1-5-2: randomly crop the video frame F_i;
S1-5-3: randomly flip the video frame F_i horizontally;
S1-5-4: standardize the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtain the preprocessed video frame FP_i;
S1-6: repeat step S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
S2: construct the multi-scale dynamic convolution features, as shown in fig. 3;
S2-1: input the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: input the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: construct the 3×3 dynamic convolution feature;
S2-3-1: input the feature Headf ∈ R^{C×H×W};
S2-3-2: determine a center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: with neighborhood radius 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the scale-3 dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3};
S2-3-8: apply the dynamic convolution kernel DCK_{3,h,w} to the neighborhood NS_3 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
S2-4: construct the 5×5 dynamic convolution feature;
S2-4-1: input the feature Headf ∈ R^{C×H×W};
S2-4-2: determine a center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: with neighborhood radius 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the scale-5 dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5};
S2-4-8: apply the dynamic convolution kernel DCK_{5,h,w} to the neighborhood NS_5 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: constructing 7×7 dynamic convolution characteristics;
s2-5-1: input feature Headf ε R C×H×W
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
s2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, and the neighborhood range is [ h-3, h+3 ]]×[w-3,w+3]Neighborhood is denoted as NS 7 ∈R 7×7
S2-5-4: given W K ∈R C×1 And Headf h,w ∈R C×1 Calculating Key feature Key of center pixel (h, w) h,w The formula is as follows:
Key h,w =W K ·Headf h,w ∈R C×1
wherein Key is h,w Representing the key vector at position (h, W), W K Representing a weight matrix;
s2-5-5: given W Q ∈R C×1 And Headf u,v ∈R C×1 Computing Query feature Query for pixel (u, v) in the domain u,v The formula is as follows:
Query u,v =W Q ·Headf u,v ∈R C×1
wherein Query is u,v Representing a query vector, W, at a location (u, v) Q Representing a weight matrix;
s2-5-6: calculating dynamic convolution kernel weights dck h,w,u,v The formula is as follows:
wherein the softmax function is used for normalizing the similarity, and Trans is a transpose operation, d r =C;
S2-5-7: repeating S2-7-3 to S2-7-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK with the scale of 7 7,h,w ∈R 7×7
S2-5-8: for neighborhood NS 7 Using dynamic convolution kernel DCK 7,h,w Performing convolution operation to obtain dynamic convolution characteristic dckf h,w The formula is as follows:
dckf h,w =Conv 7×7 (NS 7 ,DCK 7,h,w )∈R 1×1
wherein dckf h,w Representing the final output at position (h, w), conv 7×7 Is a 7x7 convolution operation;
s2-5-9: repeating S2-7-2 to S2-7-8 for all center pixels (h, w) in the Headf to obtain a dynamic convolution characteristic DCKF 7
S3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
s3-1: input 3×3 dynamic convolution characteristic DCKF 1 5×5 dynamic convolution characteristic DCKF 2 7×7 dynamic convolution characteristic DCKF 3
S3-2: for multi-scale dynamic convolution feature DCKF 1 、DCKF 2 、DCKF 3 Global average pooling is performed using a global average pooling layer to obtain global average pooling features Global Average Pooling Feature, denoted GAPF, formulated as follows:
GAPF=GAP(DCKF 1 ,DCKF 2 ,DCKF 3 )∈R C×1
wherein GAP is an abbreviation for global average pooling (Global Average Pooling), which is a common pooling operation method;
s3-3: the global average pooling feature GAPF is subjected to convolution operation by using a 1 multiplied by 1 convolution layer to obtain a convolution feature CF 0 The formula is as follows:
CF 0 =Conv 1×1 (GAPF)∈R C′×1
wherein R is C′×1 A column vector representing a C' dimension is a real vector space;
s3-4: for convolution characteristics CF 0 The three channels are respectively convolved by using three 1 multiplied by 1 convolution layers to obtain convolution characteristics CF 1 、CF 2 And CF (compact F) 3 As shown in fig. 4, the formula is as follows:
wherein CF is 1 、CF 2 、CF 3 These three vectors correspond to CF respectively 0 Representing new characteristic information obtained after convolution operation;
s3-5: for convolution characteristics CF 1 、CF 2 And CF (compact F) 3 Normalization operation is carried out by using a Softmax layer to obtain a fused multi-scale dynamic convolution feature CF 1,c ,CF 2,c And CF (compact F) 3,c The formula is as follows:
wherein CF is 1,c 、CF 2,c And CF (compact F) 3,c The value of (2) represents the weight or importance of the corresponding feature;
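The S3 pipeline can be sketched in NumPy as follows, reading GAP over the three scale features as the mean of their pooled vectors and replacing the 1×1 convolutions with matrix/vector products; all weight shapes here are illustrative assumptions, not values from the patent:

```python
import numpy as np

def fuse_scales(dckf1, dckf2, dckf3, w0, w1, w2, w3):
    """Attention-weighted fusion of three C x H x W dynamic convolution features.

    w0: C' x C matrix standing in for the shared 1x1 conv (S3-3);
    w1, w2, w3: length-C' vectors standing in for the three branch 1x1 convs (S3-4).
    """
    pooled = [f.mean(axis=(1, 2)) for f in (dckf1, dckf2, dckf3)]
    gapf = np.mean(pooled, axis=0)                        # S3-2: GAPF in R^C
    cf0 = w0 @ gapf                                       # S3-3: CF_0 in R^{C'}
    logits = np.array([w1 @ cf0, w2 @ cf0, w3 @ cf0])     # S3-4: CF_1, CF_2, CF_3
    e = np.exp(logits - logits.max())
    weights = e / e.sum()                                 # S3-5: CF_{1,c}..CF_{3,c}
    fused = weights[0] * dckf1 + weights[1] * dckf2 + weights[2] * dckf3
    return fused, weights
```

The three softmax weights sum to 1 and rank the contribution of each scale to the fused feature.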
S4: train the multi-scale dynamic convolution model for dangerous driving behavior detection;
S4-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3 to obtain the score vector SV;
S4-5: repeat steps S4-2 to S4-4 for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the ground-truth dangerous driving behavior labels to obtain the loss;
S4-8: back-propagate and update all parameters to obtain the multi-scale dynamic convolution model;
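Steps S4-6 and S4-7 reduce to summing the three head scores and taking a softmax cross-entropy. A sketch, with the four-class layout of step S1-2:

```python
import numpy as np

def final_score_and_loss(sv1, sv2, sv3, label):
    """S4-6: FSV = SV_1 + SV_2 + SV_3; S4-7: cross-entropy against the true label.

    sv1..sv3: length-4 score vectors (classes: 0 normal driving, 1 cell phone,
    2 drinking water, 3 talking with passenger); label: ground-truth class index.
    """
    fsv = sv1 + sv2 + sv3                    # S4-6: final score vector
    z = fsv - fsv.max()                      # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    loss = -log_probs[label]                 # cross-entropy loss for S4-7
    return fsv, loss
```

With all-zero scores the loss is ln 4, the expected value for a uniform four-class prediction.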
S5: test the multi-scale dynamic convolution model, as shown in fig. 5;
S5-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior test set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the test set DS obtained in step S5-1 to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the score vector SV;
S5-5: repeat steps S5-2 to S5-4 for the test set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict with the multi-scale dynamic convolution model on the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the dangerous driving behavior score DBS;
S5-9: call threshold judgment: compare the dangerous driving behavior score DBS obtained in step S5-8 with a threshold, and issue the corresponding alert if the threshold is exceeded.

Claims (10)

1. A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting, characterized in that the method comprises the following steps:
S1: constructing a dangerous driving behavior data set;
S2: constructing multi-scale dynamic convolution features;
S3: fusing the multi-scale dynamic convolution features based on attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: testing the multi-scale dynamic convolution model for dangerous driving behavior detection.
2. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, characterized in that step S1 of constructing the dangerous driving behavior data set specifically comprises the following steps:
S1-1: inputting the dangerous driving behavior videos V_n, where n = 1, 2, ..., N and N is the number of videos;
S1-2: inputting the ground-truth dangerous driving behavior labels L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: dividing each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, ..., M and M is the number of segments;
S1-4: randomly sampling each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, ..., I and I is the number of video frames;
S1-5: preprocessing the video frames;
S1-6: repeating S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS.
3. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 2, characterized in that the video frame preprocessing in step S1-5 specifically comprises the following steps:
S1-5-1: inputting a video frame F_i;
S1-5-2: randomly cropping the video frame F_i;
S1-5-3: randomly horizontally flipping the video frame F_i;
S1-5-4: standardizing the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtaining the preprocessed video frame FP_i.
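The preprocessing of S1-5 can be sketched in NumPy, assuming frames arrive as H × W × 3 arrays scaled to [0, 1]; the crop size 224 is an illustrative choice, and the statistics are the standard ImageNet channel mean and standard deviation:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess(frame, crop=224, rng=None):
    """S1-5: random crop, random horizontal flip, per-channel standardization."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = frame.shape
    top = rng.integers(0, h - crop + 1)           # S1-5-2: random crop position
    left = rng.integers(0, w - crop + 1)
    out = frame[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                        # S1-5-3: random horizontal flip
        out = out[:, ::-1]
    return (out - MEAN) / STD                     # S1-5-4: standardization
```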
4. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 3, characterized in that step S2 of constructing the multi-scale dynamic convolution features specifically comprises the following steps:
S2-1: inputting the preprocessed video frames FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-4: constructing the 5×5 dynamic convolution feature;
S2-5: constructing the 7×7 dynamic convolution feature.
5. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 4, characterized in that the 3×3 dynamic convolution feature of step S2-3 is constructed as follows:
S2-3-1: input feature Headf ∈ R^{C×H×W};
S2-3-2: determine the center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: the neighborhood size is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} of scale 3;
S2-3-8: convolve the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1.
6. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 5, characterized in that the 5×5 dynamic convolution feature of step S2-4 is constructed as follows:
S2-4-1: input feature Headf ∈ R^{C×H×W};
S2-4-2: determine the center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: the neighborhood size is 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5} of scale 5;
S2-4-8: convolve the neighborhood NS_5 with the dynamic convolution kernel DCK_{5,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2.
7. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 6, characterized in that the 7×7 dynamic convolution feature of step S2-5 is constructed as follows:
S2-5-1: input feature Headf ∈ R^{C×H×W};
S2-5-2: determine the center pixel (h, w), where h ∈ [4, H-4], w ∈ [4, W-4];
S2-5-3: determine the neighborhood range: the neighborhood size is 3, the range is [h-3, h+3] × [w-3, w+3], and the neighborhood is denoted NS_7 ∈ R^{7×7};
S2-5-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-5-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-5-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-5-7: repeat S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{7,h,w} ∈ R^{7×7} of scale 7;
S2-5-8: convolve the neighborhood NS_7 with the dynamic convolution kernel DCK_{7,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{7×7}(NS_7, DCK_{7,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{7×7} is a 7×7 convolution operation;
S2-5-9: repeat S2-5-2 to S2-5-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_3.
8. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 7, characterized in that the attention-based weighted fusion of the multi-scale dynamic convolution features in step S3 specifically comprises the following steps:
S3-1: input the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: apply global average pooling to the multi-scale dynamic convolution features DCKF_1, DCKF_2, DCKF_3 to obtain the global average pooling feature (Global Average Pooling Feature), denoted GAPF:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
S3-3: apply a 1×1 convolution layer to the global average pooling feature GAPF to obtain the convolution feature CF_0:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C′×1}
where R^{C′×1} denotes the space of C′-dimensional real column vectors;
S3-4: apply three separate 1×1 convolution layers to the convolution feature CF_0 to obtain the convolution features CF_1, CF_2 and CF_3:
CF_k = Conv_{1×1}^k(CF_0), k = 1, 2, 3
where CF_1, CF_2 and CF_3 are the new feature vectors obtained from CF_0 by the three convolution operations;
S3-5: normalize the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c}:
CF_{k,c} = exp(CF_k) / (exp(CF_1) + exp(CF_2) + exp(CF_3)), k = 1, 2, 3
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight, i.e. the importance, of the corresponding scale features.
9. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 8, characterized in that step S4 of training the multi-scale dynamic convolution model for dangerous driving behavior detection specifically comprises the following steps:
S4-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3 to obtain the score vector SV;
S4-5: repeat steps S4-2 to S4-4 for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the ground-truth dangerous driving behavior labels to obtain the loss;
S4-8: back-propagate and update all parameters to obtain the multi-scale dynamic convolution model.
10. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 9, characterized in that step S5 of testing the multi-scale dynamic convolution model specifically comprises the following steps:
S5-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior test set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the test set DS obtained in step S5-1 to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the score vector SV;
S5-5: repeat steps S5-2 to S5-4 for the test set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict with the multi-scale dynamic convolution model on the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the dangerous driving behavior score DBS;
S5-9: call threshold judgment: compare the dangerous driving behavior score DBS obtained in step S5-8 with a threshold, and issue the corresponding alert if the threshold is exceeded.
CN202311538093.9A 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting Active CN117576666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Publications (2)

Publication Number Publication Date
CN117576666A true CN117576666A (en) 2024-02-20
CN117576666B CN117576666B (en) 2024-05-10

Family

ID=89894792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311538093.9A Active CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Country Status (1)

Country Link
CN (1) CN117576666B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
WO2021248687A1 (en) * 2020-06-10 2021-12-16 南京理工大学 Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism
CN114241210A (en) * 2021-11-22 2022-03-25 中国海洋大学 Multi-task learning method and system based on dynamic convolution
CN114241456A (en) * 2021-12-20 2022-03-25 东南大学 Safe driving monitoring method using feature adaptive weighting
US20230260247A1 (en) * 2022-02-17 2023-08-17 Samsung Electronics Co., Ltd. System and method for dual-value attention and instance boundary aware regression in computer vision system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wan Siyu: "3D vehicle detection algorithm based on attention mechanism", Computer Engineering & Science, no. 01, 15 January 2020 (2020-01-15) *
Long Jinyi; Zhou Hua: "Dangerous driving behavior detection system based on embedded neural network", Intelligent Computer and Applications, no. 03, 1 March 2020 (2020-03-01) *

Also Published As

Publication number Publication date
CN117576666B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
Omerustaoglu et al. Distracted driver detection by combining in-vehicle and image data using deep learning
Weng et al. Driver drowsiness detection via a hierarchical temporal deep belief network
Yuan Video-based smoke detection with histogram sequence of LBP and LBPV pyramids
Li et al. Visual saliency based on conditional entropy
Yan et al. Driving posture recognition by joint application of motion history image and pyramid histogram of oriented gradients
Hossain et al. Automatic driver distraction detection using deep convolutional neural networks
CN109460787B (en) Intrusion detection model establishing method and device and data processing equipment
CN110427871B (en) Fatigue driving detection method based on computer vision
CN108416780B (en) Object detection and matching method based on twin-region-of-interest pooling model
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
Ganokratanaa et al. Video anomaly detection using deep residual-spatiotemporal translation network
Li et al. Fall detection based on fused saliency maps
Kassem et al. Yawn based driver fatigue level prediction
Xu et al. Concrete crack segmentation based on convolution–deconvolution feature fusion with holistically nested networks
Uppal et al. Emotion recognition and drowsiness detection using Python
Jegham et al. Deep learning-based hard spatial attention for driver in-vehicle action monitoring
Dhawan et al. Identification of traffic signs for advanced driving assistance systems in smart cities using deep learning
CN112528903B (en) Face image acquisition method and device, electronic equipment and medium
CN117576666B (en) Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Sirisha et al. Utilizing a Hybrid Model for Human Injury Severity Analysis in Traffic Accidents.
Gopikrishnan et al. DriveCare: a real-time vision based driver drowsiness detection using multiple convolutional neural networks with kernelized correlation filters (MCNN-KCF)
Xu et al. An intra-frame classification network for video anomaly detection and localization
CN112329566A (en) Visual perception system for accurately perceiving head movements of motor vehicle driver
Chai et al. Driver head pose detection from naturalistic driving data
QU et al. Multi-Attention Fusion Drowsy Driving Detection Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant