CN117576666A - Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting - Google Patents


Info

Publication number
CN117576666A
CN117576666A (application CN202311538093.9A)
Authority
CN
China
Prior art keywords
dynamic convolution
driving behavior
dangerous driving
dckf
convolution
Prior art date
Legal status
Granted
Application number
CN202311538093.9A
Other languages
Chinese (zh)
Other versions
CN117576666B (en)
Inventor
李自强
吴克伟
纪松
谢昭
程明
徐浩
王键钊
张沛錡
谭昊
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311538093.9A priority Critical patent/CN117576666B/en
Publication of CN117576666A publication Critical patent/CN117576666A/en
Application granted granted Critical
Publication of CN117576666B publication Critical patent/CN117576666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks


Abstract

The invention discloses a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting. Because existing target detection models have difficulty distinguishing different types of dangerous driving behaviors, the method learns dynamic convolution kernels from the features of the different behaviors. To improve recognition accuracy for dangerous driving behaviors at different resolutions, the method considers dynamic convolution kernels of different scales in the monitoring environment. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Adding the multi-scale dynamic convolution module and the attention weighting module to an existing target detection model improves the accuracy of dangerous driving behavior detection; the method can be applied in vehicle safety systems to safeguard driving safety.

Description

Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Technical Field
The invention relates to the technical field of multi-scale dynamic convolution attention weighting, in particular to a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
Background
In order to ensure good traffic order and the safety of people's lives and property, it is necessary to monitor the dangerous driving behavior of drivers. With the rapid development of multi-scale dynamic convolution attention weighting modules, dangerous driving behavior detection methods based on multi-scale dynamic convolution attention weighting have gradually attracted industry attention.
Chinese patent application publication No. CN114005093A, "Video analysis-based driving behavior warning method, device, equipment and medium," proposes a driving behavior warning method based on video analysis. The method identifies images of dangerous driving behavior of a target vehicle by marking the position, speed and trajectory information of the target vehicle and of other vehicles in the image, together with a number of dangerous driving characteristics acquired in advance. When the number of dangerous-driving images within a preset unit time exceeds a preset threshold, the driver of the target vehicle is alerted. However, this method detects only from information external to the vehicle and cannot sufficiently incorporate the driver's state, so it is difficult to achieve early warning. Chinese patent application publication No. CN113033261A, "A dangerous driving identification and early warning method," proposes a dangerous driving identification and early warning method comprising the following steps: 1. acquire the current driving data of the vehicle; 2. extract driving behavior features from the driving data; 3. identify the driving behavior features with a fuzzy convolutional neural network; 4. issue an early warning signal when dangerous driving is identified. The method is simple and easy to implement; by identifying driving behavior features with a fuzzy convolutional neural network it achieves high identification accuracy, can effectively judge a driver's dangerous driving behavior and issue a timely warning signal, and has potential for wide application.
Gong Jian-Qiang and Wang Yi-ying, in "Research on Online Identification Algorithm of Dangerous Driving Behavior," propose an algorithm that identifies dangerous driving behaviors using a variance Bayesian network; their results indicate that the model can identify two dangerous driving behaviors and has better generalization performance than a single variance model. Zhe Ma, Xiaohui Yang and Haora Zhang, in "Dangerous Driving Behavior Recognition using CA-CenterNet," propose a dangerous driving behavior recognition method using CA-CenterNet. In that study, dangerous driving behaviors are identified from the driver's hand behaviors, and a large amount of driver video was captured to build a hand-detection-based dangerous driving behavior data set. Evaluations and comparisons against other network models show that the method improves the accuracy of behavior recognition.
However, for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting, we must fully consider that background clutter and fragments of ambiguous behavior in the driving scene are difficult to classify. Key information such as the driver's face is acquired by the multi-scale dynamic convolution technique. Using the image data acquired by the camera, combined with the attention-weighted model, we can accurately extract the position and pose of the points of interest. In this way, the driver's facial condition can be monitored more accurately, giving stronger representational power when detecting the driver's dangerous driving behaviors.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art by providing a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
The invention is realized by the following technical scheme:
A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting comprises the following steps:
S1: construct a dangerous driving behavior data set;
S2: construct multi-scale dynamic convolution features;
S3: fuse the multi-scale dynamic convolution features based on attention weighting;
S4: train the multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: test the multi-scale dynamic convolution model and use it for dangerous driving behavior detection;
The dangerous driving behavior training set is constructed in step S1; the specific steps are as follows:
S1-1: input dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: input the dangerous driving behavior label ground truths L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: divide each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sample each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocess the video frames;
S1-5-1: input a video frame F_i;
S1-5-2: randomly crop the video frame F_i;
S1-5-3: randomly flip the video frame F_i horizontally;
S1-5-4: standardize the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtain the preprocessed video frame FP_i;
S1-6: repeat step S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
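The preprocessing in steps S1-5-1 to S1-5-5 can be sketched as follows. This is a minimal pure-Python illustration, not the patent's implementation: frames are C×H×W nested lists, and the crop size, flip probability and helper names (`random_crop`, `random_hflip`, `normalize`) are illustrative assumptions.

```python
import random

# Hypothetical sketch of steps S1-5-2..S1-5-4; a frame is a
# C x H x W nested list of floats in [0, 1].
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def random_crop(frame, out_h, out_w, rng=random):
    """S1-5-2: randomly crop an out_h x out_w window from every channel."""
    h, w = len(frame[0]), len(frame[0][0])
    top = rng.randrange(h - out_h + 1)
    left = rng.randrange(w - out_w + 1)
    return [[row[left:left + out_w] for row in ch[top:top + out_h]] for ch in frame]

def random_hflip(frame, p=0.5, rng=random):
    """S1-5-3: flip the frame horizontally with probability p."""
    return [[row[::-1] for row in ch] for ch in frame] if rng.random() < p else frame

def normalize(frame):
    """S1-5-4: per-channel standardization with the stated mean/std."""
    return [[[(px - MEAN[c]) / STD[c] for px in row] for row in ch]
            for c, ch in enumerate(frame)]
```

Chaining the three helpers over every sampled frame yields the preprocessed frames FP_i of step S1-5-5.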
The construction of the multi-scale dynamic convolution features in step S2 comprises the following specific steps:
S2-1: input the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: input the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: construct the 3×3 dynamic convolution feature;
S2-3-1: input the feature Headf ∈ R^{C×H×W};
S2-3-2: determine a center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: with neighborhood radius 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the scale-3 dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3};
S2-3-8: apply the dynamic convolution kernel DCK_{3,h,w} to the neighborhood NS_3 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
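The attention-style construction of a 3×3 dynamic kernel (steps S2-3-3 to S2-3-8) can be sketched as below. Several details the patent leaves open are filled with assumptions: the "·" products with W_K and W_Q are read as elementwise scaling, and the neighborhood response that the kernel weights aggregate is taken as the channel mean; `dynamic_conv_at` and the toy data layout are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_conv_at(feat, h, w, w_k, w_q, radius=1):
    """Attention-derived dynamic kernel at center (h, w), radius 1 -> 3x3.

    feat: dict (h, w) -> length-C feature vector; w_k, w_q: length-C weights
    (elementwise reading of the patent's W_K, W_Q products, an assumption).
    Returns (kernel_weights, dckf): DCK_{3,h,w} flattened, and dckf_{h,w}.
    """
    C = len(w_k)
    key = [a * b for a, b in zip(w_k, feat[(h, w)])]          # Key_{h,w}
    neigh = [(u, v) for u in range(h - radius, h + radius + 1)
                    for v in range(w - radius, w + radius + 1)]
    scores = []
    for (u, v) in neigh:
        query = [a * b for a, b in zip(w_q, feat[(u, v)])]    # Query_{u,v}
        scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(C))
    kernel = softmax(scores)                                   # dck weights
    # "Convolve": weighted sum of channel-mean neighborhood responses.
    dckf = sum(k * (sum(feat[p]) / C) for k, p in zip(kernel, neigh))
    return kernel, dckf
```

With a uniform feature map, all nine kernel weights collapse to 1/9 and dckf reduces to the local mean, which is a quick sanity check on the softmax normalization.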
S2-4: construct the 5×5 dynamic convolution feature;
S2-4-1: input the feature Headf ∈ R^{C×H×W};
S2-4-2: determine a center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: with neighborhood radius 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the scale-5 dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5};
S2-4-8: apply the dynamic convolution kernel DCK_{5,h,w} to the neighborhood NS_5 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: construct the 7×7 dynamic convolution feature;
S2-5-1: input the feature Headf ∈ R^{C×H×W};
S2-5-2: determine a center pixel (h, w), where h ∈ [4, H-4], w ∈ [4, W-4];
S2-5-3: determine the neighborhood range: with neighborhood radius 3, the range is [h-3, h+3] × [w-3, w+3], and the neighborhood is denoted NS_7 ∈ R^{7×7};
S2-5-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-5-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-5-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-5-7: repeat S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood to obtain the scale-7 dynamic convolution kernel DCK_{7,h,w} ∈ R^{7×7};
S2-5-8: apply the dynamic convolution kernel DCK_{7,h,w} to the neighborhood NS_7 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{7×7}(NS_7, DCK_{7,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{7×7} is a 7×7 convolution operation;
S2-5-9: repeat S2-5-2 to S2-5-8 for all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_3;
The step S3 of fusing the multi-scale dynamic convolution characteristics based on the attention weighting specifically comprises the following steps:
S3-1: input the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: apply global average pooling to the multi-scale dynamic convolution features DCKF_1, DCKF_2 and DCKF_3 to obtain the global average pooling feature (Global Average Pooling Feature), denoted GAPF, as follows:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
where GAP is the global average pooling operation, a common pooling method;
S3-3: apply a 1×1 convolution layer to the global average pooling feature GAPF to obtain the convolution feature CF_0 as follows:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C′×1}
where R^{C′×1} is the real vector space of C′-dimensional column vectors;
S3-4: apply three separate 1×1 convolution layers to the convolution feature CF_0 to obtain the convolution features CF_1, CF_2 and CF_3 as follows:
CF_k = Conv_{1×1}^{(k)}(CF_0), k = 1, 2, 3
where CF_1, CF_2 and CF_3 are the new feature vectors obtained from CF_0 by the three convolution operations;
S3-5: normalize the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution weights CF_{1,c}, CF_{2,c} and CF_{3,c} as follows:
[CF_{1,c}, CF_{2,c}, CF_{3,c}] = Softmax([CF_1, CF_2, CF_3])
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight, i.e. the importance, of the corresponding features;
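A hedged sketch of the attention-weighted fusion in steps S3-1 to S3-5: global average pooling per scale, a stand-in scalar weight `proj` in place of the 1×1 convolution layers, and a softmax over the three scale logits. The function name and the reduction of each DCKF to a flat list are illustrative assumptions, not the patent's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fuse_scales(dckf1, dckf2, dckf3, proj):
    """S3 sketch: each dckfK is a flat list of dynamic-conv responses.

    proj: three scalars standing in for the 1x1 convolution layers of
    S3-3/S3-4 (an assumption). Returns (attention_weights, fused_feature).
    """
    feats = [dckf1, dckf2, dckf3]
    gap = [sum(f) / len(f) for f in feats]          # S3-2: GAPF per scale
    logits = [proj[i] * gap[i] for i in range(3)]   # S3-3/S3-4: CF_1..CF_3
    weights = softmax(logits)                        # S3-5: CF_{1,c}..CF_{3,c}
    # Attention-weighted sum of the three scales, position by position.
    fused = [sum(w * x for w, x in zip(weights, col)) for col in zip(*feats)]
    return weights, fused
```

When the three scales carry identical features, the softmax assigns each scale weight 1/3 and the fused feature equals the input, a useful degenerate check.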
The multi-scale dynamic convolution model is trained for dangerous driving behavior detection in step S4; the specific steps are as follows:
S4-1: call step S1 to construct the dangerous driving behavior data set: perform video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and label ground truths to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features: extract multi-scale dynamic convolution features from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model, obtaining the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting: perform attention-weighted fusion on DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector SV from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3;
S4-5: for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model, repeat steps S4-2 to S4-4 to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the final score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the dangerous driving behavior label ground truth to obtain the loss;
S4-8: update all parameters by back propagation to obtain the multi-scale dynamic convolution model;
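The loss and prediction of step S4-7 can be illustrated with a small stand-in: softmax cross-entropy of the final score vector FSV against the true label, and maximum-value extraction for the predicted class. The four-class layout follows step S1-2; the function names are hypothetical.

```python
import math

def cross_entropy(fsv, label):
    """Softmax cross-entropy of the final score vector FSV against the
    true behavior label in {0, 1, 2, 3} (a stand-in for step S4-7)."""
    m = max(fsv)
    log_z = m + math.log(sum(math.exp(s - m) for s in fsv))  # log-sum-exp
    return log_z - fsv[label]

def predict(fsv):
    """Maximum-value extraction over FSV (steps S4-7 / S5-7)."""
    return max(range(len(fsv)), key=lambda k: fsv[k])
```

A uniform score vector gives the expected loss of log 4 for four classes, and the loss shrinks toward zero as the true class dominates FSV.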
The multi-scale dynamic convolution model is tested in step S5; the specific steps are as follows:
S5-1: call step S1 to construct the dangerous driving behavior data set: perform video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and label ground truths to obtain the dangerous driving behavior data set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features on the data set DS obtained in step S5-1, obtaining the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting: perform attention-weighted fusion on DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector SV from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3;
S5-5: for the data set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model, repeat steps S5-2 to S5-4 to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the final score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 with the multi-scale dynamic convolution model to obtain the dangerous driving behavior score DBS;
S5-9: apply threshold judgment to the dangerous driving behavior score DBS obtained in step S5-8; if it exceeds the threshold, issue a corresponding warning.
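Steps S5-7 and S5-9 can be sketched together: pick the highest-scoring behavior label from FSV and compare the dangerous driving behavior score DBS against a threshold. The threshold value 0.5 and the helper names are illustrative assumptions; the label mapping follows step S1-2.

```python
# Label mapping from step S1-2 of the data set construction.
LABELS = {0: "normal driving", 1: "using a cell phone",
          2: "drinking water", 3: "talking with a passenger"}

def danger_alert(dbs, threshold=0.5):
    """S5-9 sketch: alert when DBS exceeds the threshold (0.5 is an
    illustrative value, not taken from the patent)."""
    return dbs > threshold

def report(fsv, dbs, threshold=0.5):
    """Combine label prediction (S5-7) with the alert decision (S5-9)."""
    pred = max(range(len(fsv)), key=lambda k: fsv[k])  # max-value extraction
    return LABELS[pred], danger_alert(dbs, threshold)
```

For example, `report([0.1, 0.9, 0.2, 0.3], 0.8)` would both name the detected behavior and raise the alert, matching the reminder described in S5-9.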
The advantages of the invention are as follows. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting improves driving safety. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. The driver's facial information is acquired from video frames, and features such as facial expression and head pose are analyzed to detect whether dangerous driving behaviors are present. To detect dangerous driving behaviors accurately, the method feeds the video frame sequence into the multi-scale dynamic convolution model to capture key moments and actions, obtaining a series of driving behavior representations. When detecting dangerous driving behaviors, the method combines facial features and driving behaviors for a comprehensive judgment: for example, dangerous behaviors such as fatigue driving and distracted driving can be detected by analyzing the driver's facial expression and driving behavior. Meanwhile, combining the learning result of the attention weighting module yields more accurate facial information for assessing the driver's attention level and driving behavior. By detecting dangerous driving behaviors with multi-scale dynamic convolution attention weighting and combining facial features with driving behaviors, the invention improves the accuracy and reliability of driver behavior detection, which helps to promote driving safety and prevent accidents.
Drawings
FIG. 1 is a flow chart for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting;
FIG. 2 is a schematic diagram of constructing a dangerous driving behavior training set;
FIG. 3 is a diagram of steps for constructing a multi-scale dynamic convolution feature;
FIG. 4 is a schematic diagram of a multi-scale dynamic convolution feature based on attention-weighted fusion;
FIG. 5 is a schematic diagram of a test multiscale dynamic convolution model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The invention relates to dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting; the specific flow is shown in fig. 1, and the implementation comprises the following steps:
S1: construct the dangerous driving behavior training set, as shown in fig. 2;
S1-1: input dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: input the dangerous driving behavior label ground truths L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: divide each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sample each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocess the video frames;
S1-5-1: input a video frame F_i;
S1-5-2: randomly crop the video frame F_i;
S1-5-3: randomly flip the video frame F_i horizontally;
S1-5-4: standardize the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtain the preprocessed video frame FP_i;
S1-6: repeat step S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
S2: construct the multi-scale dynamic convolution features, as shown in fig. 3;
S2-1: input the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: input the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: construct the 3×3 dynamic convolution feature;
S2-3-1: input the feature Headf ∈ R^{C×H×W};
S2-3-2: determine a center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: with neighborhood radius 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the scale-3 dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3};
S2-3-8: apply the dynamic convolution kernel DCK_{3,h,w} to the neighborhood NS_3 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
S2-4: construct the 5×5 dynamic convolution feature;
S2-4-1: input the feature Headf ∈ R^{C×H×W};
S2-4-2: determine a center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: with neighborhood radius 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w) as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of a pixel (u, v) in the neighborhood as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v} as follows:
dck_{h,w,u,v} = softmax(Trans(Query_{u,v}) · Key_{h,w} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the scale-5 dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5};
S2-4-8: apply the dynamic convolution kernel DCK_{5,h,w} to the neighborhood NS_5 to obtain the dynamic convolution feature dckf_{h,w} as follows:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: constructing 7×7 dynamic convolution characteristics;
s2-5-1: input feature Headf ε R C×H×W
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
s2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, and the neighborhood range is [ h-3, h+3 ]]×[w-3,w+3]Neighborhood is denoted as NS 7 ∈R 7×7
S2-5-4: given W K ∈R C×1 And Headf h,w ∈R C×1 Calculating Key feature Key of center pixel (h, w) h,w The formula is as follows:
Key h,w =W K ·Headf h,w ∈R C×1
wherein Key is h,w Representing the key vector at position (h, W), W K Representing a weight matrix;
s2-5-5: given W Q ∈R C×1 And Headf u,v ∈R C×1 Computing Query feature Query for pixel (u, v) in the domain u,v The formula is as follows:
Query u,v =W Q ·Headf u,v ∈R C×1
wherein Query is u,v Representing a query vector, W, at a location (u, v) Q Representing a weight matrix;
s2-5-6: calculating dynamic convolution kernel weights dck h,w,u,v The formula is as follows:
wherein the softmax function is used for normalizing the similarity, and Trans is a transpose operation, d r =C;
S2-5-7: repeating S2-7-3 to S2-7-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK with the scale of 7 7,h,w ∈R 7×7
S2-5-8: for neighborhood NS 7 Using dynamic convolution kernel DCK 7,h,w Performing convolution operation to obtain dynamic convolution characteristic dckf h,w The formula is as follows:
dckf h,w =Conv 7×7 (NS 7 ,DCK 7,h,w )∈R 1×1
wherein dckf h,w Representing the final output at position (h, w), conv 7×7 Is a 7x7 convolution operation;
s2-5-9: repeating S2-7-2 to S2-7-8 for all center pixels (h, w) in the Headf to obtain a dynamic convolution characteristic DCKF 7
S3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
s3-1: input 3×3 dynamic convolution characteristic DCKF 1 5×5 dynamic convolution characteristic DCKF 2 7×7 dynamic convolution characteristic DCKF 3
S3-2: for multi-scale dynamic convolution feature DCKF 1 、DCKF 2 、DCKF 3 Global average pooling is performed using a global average pooling layer to obtain global average pooling features Global Average Pooling Feature, denoted GAPF, formulated as follows:
GAPF=GAP(DCKF 1 ,DCKF 2 ,DCKF 3 )∈R C×1
wherein GAP is an abbreviation for global average pooling (Global Average Pooling), which is a common pooling operation method;
s3-3: the global average pooling feature GAPF is subjected to convolution operation by using a 1 multiplied by 1 convolution layer to obtain a convolution feature CF 0 The formula is as follows:
CF 0 =Conv 1×1 (GAPF)∈R C′×1
wherein R is C′×1 A column vector representing a C' dimension is a real vector space;
s3-4: for convolution characteristics CF 0 The three channels are respectively convolved by using three 1 multiplied by 1 convolution layers to obtain convolution characteristics CF 1 、CF 2 And CF (compact F) 3 As shown in fig. 4, the formula is as follows:
wherein CF is 1 、CF 2 、CF 3 These three vectors correspond to CF respectively 0 Representing new characteristic information obtained after convolution operation;
s3-5: for convolution characteristics CF 1 、CF 2 And CF (compact F) 3 Normalization operation is carried out by using a Softmax layer to obtain a fused multi-scale dynamic convolution feature CF 1,c ,CF 2,c And CF (compact F) 3,c The formula is as follows:
wherein CF is 1,c 、CF 2,c And CF (compact F) 3,c The value of (2) represents the weight or importance of the corresponding feature;
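The S3 pipeline can be sketched in NumPy as follows, reading GAP over the three scale features as the mean of their pooled vectors and replacing the 1×1 convolutions with matrix/vector products; all weight shapes here are illustrative assumptions, not values from the patent:

```python
import numpy as np

def fuse_scales(dckf1, dckf2, dckf3, w0, w1, w2, w3):
    """Attention-weighted fusion of three C x H x W dynamic convolution features.

    w0: C' x C matrix standing in for the shared 1x1 conv (S3-3);
    w1, w2, w3: length-C' vectors standing in for the three branch 1x1 convs (S3-4).
    """
    pooled = [f.mean(axis=(1, 2)) for f in (dckf1, dckf2, dckf3)]
    gapf = np.mean(pooled, axis=0)                        # S3-2: GAPF in R^C
    cf0 = w0 @ gapf                                       # S3-3: CF_0 in R^{C'}
    logits = np.array([w1 @ cf0, w2 @ cf0, w3 @ cf0])     # S3-4: CF_1, CF_2, CF_3
    e = np.exp(logits - logits.max())
    weights = e / e.sum()                                 # S3-5: CF_{1,c}..CF_{3,c}
    fused = weights[0] * dckf1 + weights[1] * dckf2 + weights[2] * dckf3
    return fused, weights
```

The three softmax weights sum to 1 and rank the contribution of each scale to the fused feature.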
S4: train the multi-scale dynamic convolution model for dangerous driving behavior detection;
S4-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3 to obtain the score vector SV;
S4-5: repeat steps S4-2 to S4-4 for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the ground-truth dangerous driving behavior labels to obtain the loss;
S4-8: back-propagate and update all parameters to obtain the multi-scale dynamic convolution model;
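Steps S4-6 and S4-7 reduce to summing the three head scores and taking a softmax cross-entropy. A sketch, with the four-class layout of step S1-2:

```python
import numpy as np

def final_score_and_loss(sv1, sv2, sv3, label):
    """S4-6: FSV = SV_1 + SV_2 + SV_3; S4-7: cross-entropy against the true label.

    sv1..sv3: length-4 score vectors (classes: 0 normal driving, 1 cell phone,
    2 drinking water, 3 talking with passenger); label: ground-truth class index.
    """
    fsv = sv1 + sv2 + sv3                    # S4-6: final score vector
    z = fsv - fsv.max()                      # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    loss = -log_probs[label]                 # cross-entropy loss for S4-7
    return fsv, loss
```

With all-zero scores the loss is ln 4, the expected value for a uniform four-class prediction.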
S5: test the multi-scale dynamic convolution model, as shown in fig. 5;
S5-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior test set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the test set DS obtained in step S5-1 to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the score vector SV;
S5-5: repeat steps S5-2 to S5-4 for the test set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict with the multi-scale dynamic convolution model on the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the dangerous driving behavior score DBS;
S5-9: call threshold judgment: compare the dangerous driving behavior score DBS obtained in step S5-8 with a threshold, and issue the corresponding alert if the threshold is exceeded.

Claims (10)

1. A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting, characterized in that the method comprises the following steps:
S1: constructing a dangerous driving behavior data set;
S2: constructing multi-scale dynamic convolution features;
S3: fusing the multi-scale dynamic convolution features based on attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: testing the multi-scale dynamic convolution model for dangerous driving behavior detection.
2. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, characterized in that step S1 of constructing the dangerous driving behavior data set specifically comprises the following steps:
S1-1: inputting the dangerous driving behavior videos V_n, where n = 1, 2, ..., N and N is the number of videos;
S1-2: inputting the ground-truth dangerous driving behavior labels L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a cell phone, 2 represents drinking water, and 3 represents talking with a passenger;
S1-3: dividing each video V_n obtained in step S1-1 into non-overlapping multi-frame segments Snippet_{n,m}, where m = 1, 2, ..., M and M is the number of segments;
S1-4: randomly sampling each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, ..., I and I is the number of video frames;
S1-5: preprocessing the video frames;
S1-6: repeating S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS.
3. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 2, characterized in that the video frame preprocessing in step S1-5 specifically comprises the following steps:
S1-5-1: inputting a video frame F_i;
S1-5-2: randomly cropping the video frame F_i;
S1-5-3: randomly horizontally flipping the video frame F_i;
S1-5-4: standardizing the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtaining the preprocessed video frame FP_i.
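The preprocessing of S1-5 can be sketched in NumPy, assuming frames arrive as H × W × 3 arrays scaled to [0, 1]; the crop size 224 is an illustrative choice, and the statistics are the standard ImageNet channel mean and standard deviation:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess(frame, crop=224, rng=None):
    """S1-5: random crop, random horizontal flip, per-channel standardization."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = frame.shape
    top = rng.integers(0, h - crop + 1)           # S1-5-2: random crop position
    left = rng.integers(0, w - crop + 1)
    out = frame[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                        # S1-5-3: random horizontal flip
        out = out[:, ::-1]
    return (out - MEAN) / STD                     # S1-5-4: standardization
```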
4. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 3, characterized in that step S2 of constructing the multi-scale dynamic convolution features specifically comprises the following steps:
S2-1: inputting the preprocessed video frames FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv3 model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-4: constructing the 5×5 dynamic convolution feature;
S2-5: constructing the 7×7 dynamic convolution feature.
5. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 4, characterized in that the 3×3 dynamic convolution feature of step S2-3 is constructed as follows:
S2-3-1: input feature Headf ∈ R^{C×H×W};
S2-3-2: determine the center pixel (h, w), where h ∈ [2, H-2], w ∈ [2, W-2];
S2-3-3: determine the neighborhood range: the neighborhood size is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-3-7: repeat S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} of scale 3;
S2-3-8: convolve the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeat S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1.
6. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 5, characterized in that the 5×5 dynamic convolution feature of step S2-4 is constructed as follows:
S2-4-1: input feature Headf ∈ R^{C×H×W};
S2-4-2: determine the center pixel (h, w), where h ∈ [3, H-3], w ∈ [3, W-3];
S2-4-3: determine the neighborhood range: the neighborhood size is 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-4-7: repeat S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5} of scale 5;
S2-4-8: convolve the neighborhood NS_5 with the dynamic convolution kernel DCK_{5,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeat S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2.
7. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 6, characterized in that the 7×7 dynamic convolution feature of step S2-5 is constructed as follows:
S2-5-1: input feature Headf ∈ R^{C×H×W};
S2-5-2: determine the center pixel (h, w), where h ∈ [4, H-4], w ∈ [4, W-4];
S2-5-3: determine the neighborhood range: the neighborhood size is 3, the range is [h-3, h+3] × [w-3, w+3], and the neighborhood is denoted NS_7 ∈ R^{7×7};
S2-5-4: given W_K ∈ R^{C×1} and Headf_{h,w} ∈ R^{C×1}, compute the Key feature Key_{h,w} of the center pixel (h, w):
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-5-5: given W_Q ∈ R^{C×1} and Headf_{u,v} ∈ R^{C×1}, compute the Query feature Query_{u,v} of each pixel (u, v) in the neighborhood:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-5-6: compute the dynamic convolution kernel weight dck_{h,w,u,v}:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity, Trans is the transpose operation, and d_r = C;
S2-5-7: repeat S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{7,h,w} ∈ R^{7×7} of scale 7;
S2-5-8: convolve the neighborhood NS_7 with the dynamic convolution kernel DCK_{7,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{7×7}(NS_7, DCK_{7,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the output at position (h, w) and Conv_{7×7} is a 7×7 convolution operation;
S2-5-9: repeat S2-5-2 to S2-5-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_3.
8. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 7, characterized in that the attention-based weighted fusion of the multi-scale dynamic convolution features in step S3 specifically comprises the following steps:
S3-1: input the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: apply global average pooling to the multi-scale dynamic convolution features DCKF_1, DCKF_2, DCKF_3 to obtain the global average pooling feature (Global Average Pooling Feature), denoted GAPF:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
S3-3: apply a 1×1 convolution layer to the global average pooling feature GAPF to obtain the convolution feature CF_0:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C′×1}
where R^{C′×1} denotes the space of C′-dimensional real column vectors;
S3-4: apply three separate 1×1 convolution layers to the convolution feature CF_0 to obtain the convolution features CF_1, CF_2 and CF_3:
CF_k = Conv_{1×1}^k(CF_0), k = 1, 2, 3
where CF_1, CF_2 and CF_3 are the new feature vectors obtained from CF_0 by the three convolution operations;
S3-5: normalize the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c}:
CF_{k,c} = exp(CF_k) / (exp(CF_1) + exp(CF_2) + exp(CF_3)), k = 1, 2, 3
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight, i.e. the importance, of the corresponding scale features.
9. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 8, characterized in that step S4 of training the multi-scale dynamic convolution model for dangerous driving behavior detection specifically comprises the following steps:
S4-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior training set DS;
S4-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the training set DS obtained in step S4-1 and the feature Headf extracted by the YOLOv3 model to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S4-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3 to obtain the score vector SV;
S4-5: repeat steps S4-2 to S4-4 for the training set DS obtained in step S4-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S4-6 and compute the cross-entropy loss against the ground-truth dangerous driving behavior labels to obtain the loss;
S4-8: back-propagate and update all parameters to obtain the multi-scale dynamic convolution model.
10. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 9, characterized in that step S5 of testing the multi-scale dynamic convolution model specifically comprises the following steps:
S5-1: call step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal flipping and image standardization on the input dangerous driving behavior videos and ground-truth dangerous driving behavior labels to obtain the dangerous driving behavior test set DS;
S5-2: call step S2 to construct the multi-scale dynamic convolution features, extracting them from the test set DS obtained in step S5-1 to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: call step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, applying attention-based weighted fusion to the features DCKF_1, DCKF_2 and DCKF_3 obtained in step S5-2 to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: compute a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the score vector SV;
S5-5: repeat steps S5-2 to S5-4 for the test set DS obtained in step S5-1 and the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv3 model to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: add the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: apply a maximum-value extraction operation to the score vector FSV obtained in step S5-6 to obtain the predicted dangerous driving behavior label;
S5-8: predict with the multi-scale dynamic convolution model on the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 to obtain the dangerous driving behavior score DBS;
S5-9: call threshold judgment: compare the dangerous driving behavior score DBS obtained in step S5-8 with a threshold, and issue the corresponding alert if the threshold is exceeded.
CN202311538093.9A 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting Active CN117576666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Publications (2)

Publication Number Publication Date
CN117576666A true CN117576666A (en) 2024-02-20
CN117576666B CN117576666B (en) 2024-05-10

Family

ID=89894792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311538093.9A Active CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Country Status (1)

Country Link
CN (1) CN117576666B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
WO2021248687A1 (en) * 2020-06-10 2021-12-16 南京理工大学 Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism
CN114241210A (en) * 2021-11-22 2022-03-25 中国海洋大学 Multi-task learning method and system based on dynamic convolution
CN114241456A (en) * 2021-12-20 2022-03-25 东南大学 Safe driving monitoring method using feature adaptive weighting
US20230260247A1 (en) * 2022-02-17 2023-08-17 Samsung Electronics Co., Ltd. System and method for dual-value attention and instance boundary aware regression in computer vision system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wan Siyu: "3D vehicle detection algorithm based on attention mechanism", Computer Engineering & Science, no. 01, 15 January 2020 (2020-01-15) *
Long Jinyi; Zhou Hua: "Dangerous driving behavior detection system based on embedded neural network", Intelligent Computer and Applications, no. 03, 1 March 2020 (2020-03-01) *

Also Published As

Publication number Publication date
CN117576666B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
Omerustaoglu et al. Distracted driver detection by combining in-vehicle and image data using deep learning
Weng et al. Driver drowsiness detection via a hierarchical temporal deep belief network
Yuan Video-based smoke detection with histogram sequence of LBP and LBPV pyramids
Li et al. Visual saliency based on conditional entropy
Yan et al. Driving posture recognition by joint application of motion history image and pyramid histogram of oriented gradients
Hossain et al. Automatic driver distraction detection using deep convolutional neural networks
CN109460787B (en) Intrusion detection model establishing method and device and data processing equipment
CN110427871B (en) Fatigue driving detection method based on computer vision
CN108416780B (en) Object detection and matching method based on twin-region-of-interest pooling model
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
Ganokratanaa et al. Video anomaly detection using deep residual-spatiotemporal translation network
Li et al. Fall detection based on fused saliency maps
Kassem et al. Yawn based driver fatigue level prediction
Xu et al. Concrete crack segmentation based on convolution–deconvolution feature fusion with holistically nested networks
Uppal et al. Emotion recognition and drowsiness detection using Python
Jegham et al. Deep learning-based hard spatial attention for driver in-vehicle action monitoring
Dhawan et al. Identification of traffic signs for advanced driving assistance systems in smart cities using deep learning
CN112528903B (en) Face image acquisition method and device, electronic equipment and medium
CN117576666B (en) Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Sirisha et al. Utilizing a Hybrid Model for Human Injury Severity Analysis in Traffic Accidents.
Gopikrishnan et al. DriveCare: a real-time vision based driver drowsiness detection using multiple convolutional neural networks with kernelized correlation filters (MCNN-KCF)
Xu et al. An intra-frame classification network for video anomaly detection and localization
CN112329566A (en) Visual perception system for accurately perceiving head movements of motor vehicle driver
Chai et al. Driver head pose detection from naturalistic driving data
QU et al. Multi-Attention Fusion Drowsy Driving Detection Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant