CN117576666B - Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting


Info

Publication number: CN117576666B
Application number: CN202311538093.9A
Authority: CN (China)
Prior art keywords: dynamic convolution; driving behavior; dangerous driving; dckf; convolution
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117576666A
Inventors: 李自强, 吴克伟, 纪松, 谢昭, 程明, 徐浩, 王键钊, 张沛錡, 谭昊
Current and original assignee: Hefei University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Hefei University of Technology; priority to CN202311538093.9A
Published as application CN117576666A; granted and published as CN117576666B

Classifications

    • G06V20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting. Because existing target detection models struggle to distinguish different types of dangerous driving behavior, the method learns a dynamic convolution kernel from the features of each behavior. To improve recognition accuracy for dangerous driving behaviors captured at different resolutions, the method considers dynamic convolution kernels of different scales in the monitoring environment. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Adding the multi-scale dynamic convolution module and the attention weighting module to an existing target detection model improves the accuracy of dangerous driving behavior detection; the method can be applied in vehicle safety systems to safeguard driving safety.

Description

Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Technical Field
The invention relates to the technical field of dangerous driving behavior detection, and in particular to a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
Background
In order to maintain good traffic order and protect people's lives and property, it is necessary to monitor the dangerous driving behavior of drivers. With the rapid development of multi-scale dynamic convolution attention weighting modules, dangerous driving behavior detection methods based on them have gradually attracted industry attention.
Chinese patent application publication No. CN114005093A, "Video-analysis-based driving behavior warning method, device, equipment and medium", provides a driving behavior warning method based on video analysis. The method identifies images of dangerous driving behavior of a target vehicle from the marked position, speed and trajectory information of the target vehicle and other vehicles in the image, together with a plurality of dangerous driving characteristics acquired in advance. When the number of images showing dangerous driving behavior within a preset unit time exceeds a preset threshold, the driver of the target vehicle is alerted. However, this method detects only from information external to the vehicle and cannot adequately incorporate the driver's driving state, so it is difficult to achieve an early-warning effect. Chinese patent application publication No. CN113033261A, "Dangerous driving identification and early warning method", proposes a dangerous driving identification and early warning method comprising the following steps: 1) acquire the current driving data of the vehicle; 2) extract driving behavior features from the driving data; 3) identify the driving behavior features with a fuzzy convolutional neural network; 4) issue an early-warning signal when dangerous driving is identified. The method has simple steps and is easy to implement; identifying driving behavior features with a fuzzy convolutional neural network gives high identification accuracy, can effectively judge the driver's dangerous driving behavior, and can issue early-warning signals in time, so it has a clear effect and potential for popularization and application.
In "Research on Online Identification Algorithm of Dangerous Driving Behavior", Gong Jian-Qiang and Wang Yi-ying propose an algorithm for identifying dangerous driving behaviors using a variance Bayesian network; their results indicate that the model can identify two dangerous driving behaviors and generalizes better than a single variance model. In "Dangerous Driving Behavior Recognition using CA-CENTERNET", Zhe Ma, Xiaohui Yang and Haoran Zhang propose a dangerous driving behavior recognition method using CA-CENTERNET. In that study, they identify dangerous driving behaviors from the driver's hand movements and capture a large amount of driver video to build a hand-detection-based dangerous driving behavior data set. Evaluation and comparison against other network models show that the method improves the accuracy of behavior recognition.
However, dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting must fully account for background clutter and ambiguous behavior fragments in the driving scene that are difficult to classify. Key information such as the driver's face is acquired with the multi-scale dynamic convolution technique. Using the image data acquired by the camera, combined with the attention-weighted model, the position and pose of points of interest can be extracted accurately. In this way the driver's facial state can be monitored more precisely, giving a stronger characterization ability when detecting the driver's dangerous driving behavior.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art and provides a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
The invention is realized by the following technical scheme:
A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting specifically comprises the following steps:
S1: constructing a dangerous driving behavior data set;
S2: constructing multi-scale dynamic convolution features;
S3: fusing the multi-scale dynamic convolution features based on attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: testing the multi-scale dynamic convolution model and using it for dangerous driving behavior detection;
the dangerous driving behavior training set is constructed in the step S1, and the specific steps are as follows:
S1-1: inputting dangerous driving behavior videos V n, wherein n=1, 2, …, N and N are the video numbers;
s1-2: inputting a dangerous driving behavior tag true value L n epsilon {0,1,2,3}, wherein 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V n obtained in step S1-1 into a plurality of non-overlapping segments Snippetn m, wherein m=1, 2, …, M is the number of segments;
S1-4: randomly sampling each segment Snippetn m to obtain a video frame F i, where i=1, 2, …, I is the number of video frames;
s1-5: preprocessing video frames;
s1-5-1: input video frame F i;
S1-5-2: randomly clipping the video frame F i;
S1-5-3: performing random horizontal overturn on the video frame F i;
S1-5-4: video frame F i is normalized, the mean {0.485,0.456,0.406} of the three channels is normalized, and the standard deviation {0.299,0.224,0.225};
S1-5-5: obtaining a preprocessed video frame FP i;
s1-6: repeating S1-5 for all video frames F i to obtain a dangerous driving behavior dataset DS;
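The preprocessing chain of S1-5 can be sketched as follows. This is an illustrative sketch only: the 224×224 crop size and the RNG seeding are hypothetical choices not specified by the patent, and the per-channel statistics follow the values listed in S1-5-4.

```python
import numpy as np

# Per-channel mean and standard deviation from S1-5-4 (shape (3, 1, 1) so
# they broadcast over (C, H, W) frames).
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def preprocess_frame(frame, crop_hw=(224, 224), rng=None):
    """frame: float array in [0, 1] with shape (3, H, W); returns FP_i."""
    rng = rng if rng is not None else np.random.default_rng(0)
    _, h, w = frame.shape
    ch, cw = crop_hw
    # S1-5-2: random crop to crop_hw
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = frame[:, top:top + ch, left:left + cw]
    # S1-5-3: random horizontal flip with probability 0.5
    if rng.random() < 0.5:
        out = out[:, :, ::-1]
    # S1-5-4: per-channel standardization
    return (out - MEAN) / STD

fp = preprocess_frame(np.random.default_rng(1).random((3, 256, 320)))
```

Each frame F_i passed through `preprocess_frame` yields a preprocessed frame FP_i of fixed spatial size, as required by S1-5-5.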
The multi-scale dynamic convolution features are constructed in step S2 through the following specific steps:
S2-1: inputting the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-3-1: inputting the feature Headf ∈ R^{C×H×W};
S2-3-2: determining a center pixel (h, w), where h ∈ [2, H-2] and w ∈ [2, W-2];
S2-3-3: determining the neighborhood range: the neighborhood radius is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} at scale 3;
S2-3-8: convolving the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeating S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
S2-4: constructing the 5×5 dynamic convolution feature;
S2-4-1: inputting the feature Headf ∈ R^{C×H×W};
S2-4-2: determining a center pixel (h, w), where h ∈ [3, H-3] and w ∈ [3, W-3];
S2-4-3: determining the neighborhood range: the neighborhood radius is 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-4-7: repeating S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5} at scale 5;
S2-4-8: convolving the neighborhood NS_5 with the dynamic convolution kernel DCK_{5,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeating S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: constructing 7×7 dynamic convolution characteristics;
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-7-3 to S2-7-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: repeating S2-7-2 to S2-7-8 for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 7;
The multi-scale dynamic convolution features are fused based on attention weighting in step S3 through the following specific steps:
S3-1: inputting the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: performing global average pooling on the multi-scale dynamic convolution features DCKF_1, DCKF_2 and DCKF_3 with a global average pooling layer to obtain the global average pooled feature, denoted GAPF:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
where GAP is the abbreviation of Global Average Pooling, a common pooling operation;
S3-3: convolving the global average pooled feature GAPF with a 1×1 convolution layer to obtain the convolution feature CF_0:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C'×1}
where R^{C'×1} is the real vector space of C'-dimensional column vectors;
S3-4: convolving the convolution feature CF_0 on its three channels with three 1×1 convolution layers to obtain the convolution features CF_1, CF_2 and CF_3:
CF_k = Conv_{1×1}^{(k)}(CF_0), k = 1, 2, 3
where the three vectors CF_1, CF_2 and CF_3 correspond respectively to the three channels of CF_0 and represent the new feature information obtained after the convolution operation;
S3-5: normalizing the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c}:
CF_{k,c} = exp(CF_k) / (exp(CF_1) + exp(CF_2) + exp(CF_3)), k = 1, 2, 3
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight or importance of the corresponding features;
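The S3 fusion path (global average pooling, 1×1 convolutions, softmax weighting) can be sketched as below. The random `W0` and `Wk` matrices stand in for the learned 1×1 convolution layers, and the reduced width C' (`reduced=4`) is a hypothetical choice; both are assumptions, not values from the patent.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_scales(dckf1, dckf2, dckf3, reduced=4, rng=None):
    """Attention-weighted fusion of three (C, H, W) scale features (S3)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    feats = (dckf1, dckf2, dckf3)
    C = dckf1.shape[0]
    # S3-2: global average pooling over the three scale features -> GAPF in R^C
    gapf = sum(f.mean(axis=(1, 2)) for f in feats) / 3.0
    # S3-3: 1x1 convolution on a pooled vector reduces it to R^{C'}
    W0 = rng.standard_normal((reduced, C)) / np.sqrt(C)
    cf0 = W0 @ gapf
    # S3-4: three 1x1 convolutions give one score per scale (CF_1, CF_2, CF_3)
    Wk = rng.standard_normal((3, reduced)) / np.sqrt(reduced)
    scores = Wk @ cf0
    # S3-5: softmax turns the scores into attention weights CF_{k,c}
    weights = softmax(scores)
    fused = sum(wgt * f for wgt, f in zip(weights, feats))
    return weights, fused

r = np.random.default_rng(3)
w, fused = fuse_scales(r.random((8, 6, 6)), r.random((8, 6, 6)), r.random((8, 6, 6)))
```

The softmax guarantees the three scale weights sum to one, so the fused map is a convex combination of DCKF_1, DCKF_2 and DCKF_3.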
The multi-scale dynamic convolution model is trained for dangerous driving behavior detection in step S4 through the following specific steps:
S4-1: invoking step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal image flipping and image standardization on the input dangerous driving behavior videos and label true values, to obtain the dangerous driving behavior training set DS;
S4-2: invoking step S2 to construct the multi-scale dynamic convolution features, performing multi-scale dynamic convolution feature extraction on the feature Headf extracted by the YOLOv model from the dangerous driving behavior training set DS obtained in step S4-1, to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, performing attention-weighted fusion of the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3 obtained in step S4-2, to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: calculating a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3, to obtain the score vector SV;
S4-5: repeating steps S4-2 to S4-4 for the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv model from the dangerous driving behavior training set DS obtained in step S4-1, to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: adding and combining the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: performing maximum-value extraction on the score vector FSV obtained in step S4-6 and calculating the cross-entropy loss against the dangerous driving behavior label true values to obtain the loss;
S4-8: updating all parameters by back propagation to obtain the multi-scale dynamic convolution model;
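Steps S4-6 and S4-7 (additive score combination, maximum-value extraction, cross-entropy loss) can be sketched as follows; the numeric score vectors SV_1, SV_2 and SV_3 below are hypothetical illustrative values over the four behavior classes of S1-2.

```python
import numpy as np

def cross_entropy(fsv, label):
    """S4-7: softmax cross-entropy of the final score vector FSV against
    the true behavior label (0 normal, 1 phone, 2 drinking, 3 talking)."""
    p = np.exp(fsv - fsv.max())
    p = p / p.sum()
    return -float(np.log(p[label]))

# Hypothetical per-head score vectors SV_1, SV_2, SV_3 over the 4 classes
sv1 = np.array([2.0, 0.5, 0.1, 0.1])
sv2 = np.array([1.5, 0.3, 0.2, 0.1])
sv3 = np.array([1.8, 0.4, 0.1, 0.2])
fsv = sv1 + sv2 + sv3               # S4-6: additive combination -> FSV
pred = int(np.argmax(fsv))          # S4-7: maximum-value extraction
loss = cross_entropy(fsv, label=0)  # loss back-propagated in S4-8
```

In training, `loss` would be back-propagated through the attention weighting and dynamic convolution modules to update all parameters, as in S4-8.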
The multi-scale dynamic convolution model is tested in step S5 through the following specific steps:
S5-1: invoking step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal image flipping and image standardization on the input dangerous driving behavior videos and label true values, to obtain the dangerous driving behavior data set DS;
S5-2: invoking step S2 to construct the multi-scale dynamic convolution features, performing multi-scale dynamic convolution feature extraction on the dangerous driving behavior data set DS obtained in step S5-1, to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, performing attention-weighted fusion of the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3 obtained in step S5-2, to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: calculating a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3, to obtain the score vector SV;
S5-5: repeating steps S5-2 to S5-4 for the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv model from the dangerous driving behavior data set DS obtained in step S5-1, to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: adding and combining the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: performing maximum-value extraction on the score vector FSV obtained in step S5-6 to obtain the dangerous driving behavior label predicted value;
S5-8: predicting from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 with the multi-scale dynamic convolution model to obtain the dangerous driving behavior score DBS;
S5-9: invoking threshold judgment on the dangerous driving behavior score DBS obtained in step S5-8, and issuing a corresponding reminder if the threshold is exceeded.
The invention has the advantages that the dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting improves driving safety. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Facial information of the driver is acquired from the video frames, and dangerous driving behavior is then detected by analyzing features such as facial expression and head pose. To detect dangerous driving behavior accurately, the method adopts the multi-scale dynamic convolution attention-weighted detection technique: the video frame sequence is input into the multi-scale dynamic convolution model to capture key moments and actions, yielding a series of driving behavior representations. When detecting dangerous driving behavior, the method combines facial features and driving behavior for a comprehensive judgment; for example, dangerous behaviors such as fatigued driving and distracted driving can be detected by analyzing the driver's facial expression and driving behavior. Meanwhile, combined with the learning result of the attention weighting module, more accurate facial information can be acquired for assessing the driver's concentration, driving behavior and so on. By detecting dangerous driving behavior with multi-scale dynamic convolution attention weighting and combining facial features with driving behavior, the invention improves the accuracy and reliability of driver behavior detection, which helps promote driving safety and prevent accidents.
Drawings
FIG. 1 is a flow chart for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting;
FIG. 2 is a schematic diagram of constructing a dangerous driving behavior training set;
FIG. 3 is a diagram of steps for constructing a multi-scale dynamic convolution feature;
FIG. 4 is a schematic diagram of a multi-scale dynamic convolution feature based on attention-weighted fusion;
FIG. 5 is a schematic diagram of testing the multi-scale dynamic convolution model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings. The invention relates to dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting; the overall flow is shown in Fig. 1, and the implementation comprises the following steps:
S1: constructing the dangerous driving behavior training set, as shown in Fig. 2;
S1-1: inputting dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: inputting dangerous driving behavior label true values L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V_n obtained in step S1-1 into a plurality of non-overlapping segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sampling each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocessing the video frames;
S1-5-1: inputting a video frame F_i;
S1-5-2: randomly cropping the video frame F_i;
S1-5-3: randomly horizontally flipping the video frame F_i;
S1-5-4: standardizing the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtaining the preprocessed video frame FP_i;
S1-6: repeating S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
S2: constructing the multi-scale dynamic convolution features, as shown in Fig. 3;
S2-1: inputting the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-3-1: inputting the feature Headf ∈ R^{C×H×W};
S2-3-2: determining a center pixel (h, w), where h ∈ [2, H-2] and w ∈ [2, W-2];
S2-3-3: determining the neighborhood range: the neighborhood radius is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} at scale 3;
S2-3-8: convolving the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeating S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
s2-4: constructing a 5×5 dynamic convolution feature;
s2-4-1: input features Headf e R C×H×W;
S2-4-2: determining a center pixel (h, w), wherein h E [3, H-3], w E [3,W-3];
s2-4-3: determining a neighborhood range, wherein the neighborhood size is 2, the neighborhood range is [ h-2, h+2] × [ w-2, w+2], and the neighborhood is designated as NS 5∈R5×5;
S2-4-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-4-5: given W Q∈RC×1 and headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
S2-4-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 5,h,w under the condition of a scale of 5;
S2-4-8: and carrying out convolution operation on the neighborhood NS 5 by using a dynamic convolution kernel DCK 5,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv5×5(NS5,DCK5,h,w)∈R1×1
wherein dckf h,w represents the final output at position (h, w), Conv 5×5 is a 5x5 convolution operation;
S2-4-9: repeating S2-5-2 to S2-5-8 using a sliding window for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 2;
S2-5: constructing 7×7 dynamic convolution characteristics;
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: repeating S2-5-2 to S2-5-8 using a sliding window for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 3;
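Sections S2-3, S2-4 and S2-5 differ only in the scale k and hence in the valid center-pixel range. A small helper, mirroring the patent's 1-based ranges ([2, H-2] for k=3, [3, H-3] for k=5, [4, H-4] for k=7), makes the index bookkeeping explicit:

```python
def valid_centers(H, W, k):
    """1-based inclusive range of center pixels (h, w) for a k x k
    neighborhood, following the ranges stated in S2-3-2 / S2-4-2 / S2-5-2."""
    r = k // 2  # the neighborhood "size" in the patent's wording (1, 2 or 3)
    return (r + 1, H - r - 1), (r + 1, W - r - 1)
```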
s3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
S3-1: the 3×3 dynamic convolution feature DCKF 1, 5×5 dynamic convolution feature DCKF 2, 7×7 dynamic convolution feature DCKF 3 are input;
S3-2: for the multi-scale dynamic convolution features DCKF 1, DCKF 2, DCKF 3, a global average pooling operation is performed by using a global average pooling layer to obtain a global average pooling feature (Global Average Pooling Feature), denoted GAPF, with the following formula:
GAPF=GAP(DCKF1,DCKF2,DCKF3)∈RC×1
wherein GAP is the abbreviation of global average pooling (Global Average Pooling), a commonly used pooling operation;
s3-3: the global average pooling feature GAPF is convolved with a 1 x 1 convolution layer to obtain a convolution feature CF 0, with the following formula:
CF0=Conv1×1(GAPF)∈RC′×1
Wherein R C′×1 represents a column vector of C' dimension, which is a real vector space;
s3-4: for the convolution feature CF 0, the three channels are respectively convolved with three 1×1 convolution layers to obtain convolution features CF 1、CF2 and CF 3, as shown in fig. 4, with the following formula:
CFk=Conv1×1,k(CF0)∈R1×1, k=1,2,3
wherein the three vectors CF 1, CF 2, CF 3 correspond to the three channels in CF 0 respectively and represent the new feature information obtained after the convolution operation;
S3-5: normalized operation is performed on the convolution features CF 1、CF2 and CF 3 using a Softmax layer to obtain fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c, with the following formula:
CFk,c=exp(CFk)/(exp(CF1)+exp(CF2)+exp(CF3)), k=1,2,3
wherein the values of CF 1,c, CF 2,c and CF 3,c represent the weight or importance of the corresponding features;
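The fusion of S3-2 to S3-5 reduces each scale to one pooled value, projects through 1×1 convolutions, and normalizes with a softmax. A minimal sketch in which the weight shapes w0 (C'×3 matrix) and w1..w3 (C'-vectors) are assumptions standing in for the 1×1 convolution layers:

```python
import numpy as np

def fuse_scales(dckf1, dckf2, dckf3, w0, w1, w2, w3):
    """Attention-weighted fusion sketch (S3-2..S3-5).
    The shapes of w0..w3 are assumptions; the patent leaves them implicit."""
    gapf = np.array([dckf1.mean(), dckf2.mean(), dckf3.mean()])  # GAPF via GAP
    cf0 = w0 @ gapf                                   # CF0 = Conv1x1(GAPF)
    scores = np.array([w1 @ cf0, w2 @ cf0, w3 @ cf0])  # CF1, CF2, CF3
    e = np.exp(scores - scores.max())
    return e / e.sum()                                # CF_{1,c}, CF_{2,c}, CF_{3,c}
```

The returned weights are positive and sum to 1, so they act as per-scale importance scores for the three dynamic convolution features.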
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S4-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
s4-2: invoking step S2 to construct multi-scale dynamic convolution features: multi-scale dynamic convolution feature extraction is performed on the feature Headf extracted by the YOLOv3 model from the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3×3 dynamic convolution feature DCKF 1, a 5×5 dynamic convolution feature DCKF 2 and a 7×7 dynamic convolution feature DCKF 3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S4-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
S4-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S4-3 to obtain a score vector SV;
S4-5: repeating the steps S4-2 to S4-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S4-1 to obtain score vectors SV 1、SV2 and SV 3;
s4-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S4-5 to obtain a final score vector FSV;
S4-7: performing maximum value extraction operation on the score vector FSV obtained in the step S4-6, and calculating a cross entropy loss function by using the real value of the dangerous driving behavior label to obtain loss;
s4-8: carrying out back propagation update on all parameters to obtain a multi-scale dynamic convolution model;
s5: testing a multi-scale dynamic convolution model, as shown in fig. 5;
S5-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
S5-2: step S2 is called to construct a multi-scale dynamic convolution feature, and multi-scale dynamic convolution feature extraction is carried out on the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3X 3 dynamic convolution feature DCKF 1, a 5X 5 dynamic convolution kernel DCKF 2 and a 7X 7 dynamic convolution kernel DCKF 3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S5-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
s5-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 to obtain a score vector SV;
S5-5: repeating the steps S5-2 to S5-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S5-1 to obtain score vectors SV 1、SV2 and SV 3;
S5-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S5-5 to obtain a final score vector FSV;
s5-7: performing maximum value extraction operation on the score vector FSV obtained in the step S5-6 to obtain dangerous driving behavior label predicted values;
S5-8: predicting the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 by using a multi-scale dynamic convolution model to obtain a dangerous driving behavior score DBS;
S5-9: and (5) invoking threshold judgment, namely performing threshold judgment on the dangerous driving behavior score DBS obtained in the step (S5-8), and performing corresponding reminding if the threshold is exceeded.

Claims (8)

1. A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting, characterized in that the method specifically comprises the following steps:
s1: constructing a dangerous driving behavior data set;
S2: constructing a multi-scale dynamic convolution characteristic;
s3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
s5: the multi-scale dynamic convolution model is tested and used for dangerous driving behavior detection;
The step S2 of constructing the multi-scale dynamic convolution feature specifically comprises the following steps:
S2-1: inputting a preprocessing video frame FP i∈RC×H×W, wherein C represents the number of channels, H represents the picture height, and W represents the picture width;
S2-2: inputting a YOLOv3 model extracted feature Headf epsilon R C×H×W, wherein C represents the number of channels and H multiplied by W represents the feature size;
S2-3: constructing a 3×3 dynamic convolution feature;
s2-4: constructing a 5×5 dynamic convolution feature;
S2-5: constructing 7×7 dynamic convolution characteristics;
The attention-based weighted fusion multi-scale dynamic convolution feature described in the step S3 specifically comprises the following steps:
S3-1: the 3×3 dynamic convolution feature DCKF 1, 5×5 dynamic convolution feature DCKF 2, 7×7 dynamic convolution feature DCKF 3 are input;
S3-2: for the multi-scale dynamic convolution features DCKF 1, DCKF 2, DCKF 3, a global average pooling operation is performed by using a global average pooling layer to obtain a global average pooling feature (Global Average Pooling Feature), denoted GAPF, with the following formula:
GAPF=GAP(DCKF1,DCKF2,DCKF3)∈RC×1
s3-3: the global average pooling feature GAPF is convolved with a 1 x 1 convolution layer to obtain a convolution feature CF 0, with the following formula:
CF0=Conv1×1(GAPF)∈RC′×1
Wherein R C′×1 represents a column vector of C' dimension, which is a real vector space;
s3-4: for the convolution feature CF 0, the convolution operation is performed on the three channels by using three 1×1 convolution layers, so as to obtain convolution features CF 1、CF2 and CF 3, and the formula is as follows:
CFk=Conv1×1,k(CF0)∈R1×1, k=1,2,3
wherein the three vectors CF 1, CF 2, CF 3 correspond to the three channels in CF 0 respectively and represent the new feature information obtained after the convolution operation;
S3-5: normalized operation is performed on the convolution features CF 1、CF2 and CF 3 using a Softmax layer to obtain fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c, with the following formula:
CFk,c=exp(CFk)/(exp(CF1)+exp(CF2)+exp(CF3)), k=1,2,3
where the values of CF 1,c, CF 2,c and CF 3,c represent the weight or importance of the corresponding features.
2. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the step S1 of constructing the dangerous driving behavior data set to obtain the dangerous driving behavior data set specifically comprises the following steps:
S1-1: inputting dangerous driving behavior videos V n, wherein n=1, 2, …, N and N are the video numbers;
s1-2: inputting a dangerous driving behavior tag true value L n epsilon {0,1,2,3}, wherein 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V n obtained in step S1-1 into a plurality of non-overlapping segments Snippetn m, wherein m=1, 2, …, M is the number of segments;
S1-4: randomly sampling each segment Snippetn m to obtain a video frame F i, where i=1, 2, …, I is the number of video frames;
s1-5: preprocessing video frames;
S1-6: and repeating S1-5 for all video frames F i to obtain a dangerous driving behavior dataset DS.
3. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 2, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the video frame preprocessing described in step S1-5 specifically comprises the following steps:
s1-5-1: input video frame F i;
S1-5-2: randomly clipping the video frame F i;
S1-5-3: performing random horizontal overturn on the video frame F i;
S1-5-4: video frame F i is normalized, the mean {0.485,0.456,0.406} of the three channels is normalized, and the standard deviation {0.299,0.224,0.225};
S1-5-5: a preprocessed video frame FP i is obtained.
4. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the 3×3 dynamic convolution feature is constructed as described in step S2-3, and is specifically as follows:
s2-3-1: input features Headf e R C×H×W;
S2-3-2: determining a center pixel (h, w), where h E [2, H-2], w E [2, W-2];
S2-3-3: determining a neighborhood range, wherein the size of the neighborhood is 1, the neighborhood range is [ h-1, h+1] × [ w-1, w+1], and the neighborhood is designated as NS 3∈R3×3;
S2-3-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-3-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
s2-3-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
dckh,w,u,v=softmax(Trans(Keyh,w)·Queryu,v/√dr)∈R1×1
wherein the softmax function is used for normalizing the similarity over the neighborhood, Trans is a transposition operation, and d r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 3,h,w∈R3×3 under the condition of 3 scale;
S2-3-8: and carrying out convolution operation on the neighborhood NS 3 by using a dynamic convolution kernel DCK 3,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv3×3(NS3,DCK3,h,w)∈R1×1
wherein dckf h,w denotes the final output at position (h, w), conv 3×3 is a 3x3 convolution operation;
S2-3-9: the dynamic convolution feature DCKF 1 is obtained by repeating S2-3-2 through S2-3-8 using a sliding window for all center pixels (h, w) within Headf.
5. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 4, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the construction of the 5×5 dynamic convolution feature described in step S2-4 is specifically as follows:
s2-4-1: input features Headf e R C×H×W;
S2-4-2: determining a center pixel (h, w), wherein h E [3, H-3], w E [3,W-3];
s2-4-3: determining a neighborhood range, wherein the neighborhood size is 2, the neighborhood range is [ h-2, h+2] × [ w-2, w+2], and the neighborhood is designated as NS 5∈R5×5;
S2-4-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-4-5: given W Q∈RC×1 and headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈EC×1
Where Query u,v represents the Query vector at position (u, v) and W O represents a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
S2-4-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 5,h,w under the condition of a scale of 5;
S2-4-8: and carrying out convolution operation on the neighborhood NS 5 by using a dynamic convolution kernel DCK 5,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv5×5(NS5,DCK5,h,w)∈R1×1
wherein dckf h,w represents the final output at position (h, w), conv 5×5 is a 5x5 convolution operation;
S2-4-9: the dynamic convolution feature DCKF 2 is obtained by repeating S2-5-2 through S2-5-8 using a sliding window for all center pixels (h, w) within Headf.
6. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 5, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the 7×7 dynamic convolution feature is constructed as described in step S2-5, and is specifically as follows:
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: the dynamic convolution feature DCKF 7 is obtained by repeating S2-7-2 through S2-7-8 for all center pixels (h, w) within Headf.
7. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the training multi-scale dynamic convolution model in the step S4 is used for dangerous driving behavior detection, and specifically comprises the following steps:
S4-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
s4-2: invoking step S2 to construct multi-scale dynamic convolution features: multi-scale dynamic convolution feature extraction is performed on the feature Headf extracted by the YOLOv3 model from the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3×3 dynamic convolution feature DCKF 1, a 5×5 dynamic convolution feature DCKF 2 and a 7×7 dynamic convolution feature DCKF 3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S4-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
S4-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S4-3 to obtain a score vector SV;
S4-5: repeating the steps S4-2 to S4-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S4-1 to obtain score vectors SV 1、SV2 and SV 3;
s4-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S4-5 to obtain a final score vector FSV;
S4-7: performing maximum value extraction operation on the score vector FSV obtained in the step S4-6, and calculating a cross entropy loss function by using the real value of the dangerous driving behavior label to obtain loss;
s4-8: and carrying out back propagation updating on all parameters to obtain the multi-scale dynamic convolution model.
8. The method for detecting dangerous driving behavior based on multi-scale dynamic convolution attention weighting according to claim 7, wherein the method comprises the following steps: the step S5 of testing the multi-scale dynamic convolution model specifically comprises the following steps:
S5-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
S5-2: step S2 is called to construct a multi-scale dynamic convolution feature, and multi-scale dynamic convolution feature extraction is carried out on the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3X 3 dynamic convolution feature DCKF 1, a 5X 5 dynamic convolution kernel DCKF 2 and a 7X 7 dynamic convolution kernel DCKF 3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S5-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
s5-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 to obtain a score vector SV;
S5-5: repeating the steps S5-2 to S5-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S5-1 to obtain score vectors SV 1、SV2 and SV 3;
S5-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S5-5 to obtain a final score vector FSV;
s5-7: performing maximum value extraction operation on the score vector FSV obtained in the step S5-6 to obtain dangerous driving behavior label predicted values;
S5-8: predicting the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 by using a multi-scale dynamic convolution model to obtain a dangerous driving behavior score DBS;
S5-9: and (5) invoking threshold judgment, namely performing threshold judgment on the dangerous driving behavior score DBS obtained in the step (S5-8), and performing corresponding reminding if the threshold is exceeded.
CN202311538093.9A 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting Active CN117576666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting


Publications (2)

Publication Number Publication Date
CN117576666A CN117576666A (en) 2024-02-20
CN117576666B true CN117576666B (en) 2024-05-10

Family

ID=89894792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311538093.9A Active CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Country Status (1)

Country Link
CN (1) CN117576666B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
WO2021248687A1 (en) * 2020-06-10 2021-12-16 南京理工大学 Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism
CN114241210A (en) * 2021-11-22 2022-03-25 中国海洋大学 Multi-task learning method and system based on dynamic convolution
CN114241456A (en) * 2021-12-20 2022-03-25 东南大学 Safe driving monitoring method using feature adaptive weighting

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230260247A1 (en) * 2022-02-17 2023-08-17 Samsung Electronics Co., Ltd. System and method for dual-value attention and instance boundary aware regression in computer vision system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
万思宇. 3D vehicle detection algorithm based on attention mechanism. Computer Engineering & Science. 2020, (No. 01), full text. *
龙劲峄; 周骅. Dangerous driving behavior detection system based on embedded neural network. Intelligent Computer and Applications. 2020, (No. 03), full text. *

Also Published As

Publication number Publication date
CN117576666A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
Omerustaoglu et al. Distracted driver detection by combining in-vehicle and image data using deep learning
Weng et al. Driver drowsiness detection via a hierarchical temporal deep belief network
Zhang et al. Too far to see? Not really!—Pedestrian detection with scale-aware localization policy
Yuan Video-based smoke detection with histogram sequence of LBP and LBPV pyramids
CN111274881A (en) Driving safety monitoring method and device, computer equipment and storage medium
CN111008600B (en) Lane line detection method
CN110427871B (en) Fatigue driving detection method based on computer vision
CN109460787B (en) Intrusion detection model establishing method and device and data processing equipment
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
Ganokratanaa et al. Video anomaly detection using deep residual-spatiotemporal translation network
CN116311214B (en) License plate recognition method and device
Li et al. Fall detection based on fused saliency maps
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
Uppal et al. Emotion recognition and drowsiness detection using Python
Jegham et al. Deep learning-based hard spatial attention for driver in-vehicle action monitoring
Dhawan et al. Identification of traffic signs for advanced driving assistance systems in smart cities using deep learning
CN112528903B (en) Face image acquisition method and device, electronic equipment and medium
CN117576666B (en) Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Batapati et al. Video analysis for traffic anomaly detection using support vector machines
Thakare et al. Object interaction-based localization and description of road accident events using deep learning
Kommanduri et al. DAST-Net: Dense visual attention augmented spatio-temporal network for unsupervised video anomaly detection
CN114792437A (en) Method and system for analyzing safe driving behavior based on facial features
Tayo et al. Vehicle license plate recognition using edge detection and neural network
Sirisha et al. Utilizing a Hybrid Model for Human Injury Severity Analysis in Traffic Accidents.
Gopikrishnan et al. DriveCare: a real-time vision based driver drowsiness detection using multiple convolutional neural networks with kernelized correlation filters (MCNN-KCF)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant