CN117576666B - Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting


Info

Publication number: CN117576666B
Application number: CN202311538093.9A
Authority: CN (China)
Prior art keywords: dynamic convolution; driving behavior; dangerous driving; dckf; convolution
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117576666A
Inventors: 李自强, 吴克伟, 纪松, 谢昭, 程明, 徐浩, 王键钊, 张沛錡, 谭昊
Current and original assignee: Hefei University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Hefei University of Technology; priority to CN202311538093.9A
Published as application CN117576666A; granted and published as CN117576666B

Classifications

    • G06V20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting. Because existing target detection models struggle to distinguish different types of dangerous driving behavior, the method learns a dynamic convolution kernel from the features of each behavior. To improve recognition accuracy for dangerous driving behaviors captured at different resolutions, the method considers dynamic convolution kernels of different scales in the monitoring environment. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Adding the multi-scale dynamic convolution module and the attention weighting module to an existing target detection model improves the accuracy of dangerous driving behavior detection; the method can be applied in vehicle safety systems to safeguard driving safety.

Description

Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Technical Field
The invention relates to the technical field of dangerous driving behavior detection, and in particular to a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
Background
In order to maintain good traffic order and protect people's lives and property, it is necessary to monitor the dangerous driving behavior of drivers. With the rapid development of multi-scale dynamic convolution attention weighting modules, dangerous driving behavior detection methods based on them have gradually attracted industry attention.
Chinese patent application publication No. CN114005093A, "Video-analysis-based driving behavior warning method, device, equipment and medium", provides a driving behavior warning method based on video analysis. The method identifies images of dangerous driving behavior of a target vehicle from the marked position, speed and trajectory information of the target vehicle and other vehicles in the image, together with a plurality of dangerous driving characteristics acquired in advance. When the number of images showing dangerous driving behavior within a preset unit time exceeds a preset threshold, the driver of the target vehicle is alerted. However, this method detects only from information external to the vehicle and cannot adequately incorporate the driver's driving state, so it is difficult to achieve an early-warning effect. Chinese patent application publication No. CN113033261A, "Dangerous driving identification and early warning method", proposes a dangerous driving identification and early warning method comprising the following steps: 1) acquire the current driving data of the vehicle; 2) extract driving behavior features from the driving data; 3) identify the driving behavior features with a fuzzy convolutional neural network; 4) issue an early-warning signal when dangerous driving is identified. The method has simple steps and is easy to implement; identifying driving behavior features with a fuzzy convolutional neural network gives high identification accuracy, can effectively judge the driver's dangerous driving behavior, and can issue early-warning signals in time, so it has a clear effect and potential for popularization and application.
In "Research on Online Identification Algorithm of Dangerous Driving Behavior", Gong Jian-Qiang and Wang Yi-ying propose an algorithm for identifying dangerous driving behaviors using a variance Bayesian network; their results indicate that the model can identify two dangerous driving behaviors and generalizes better than a single variance model. In "Dangerous Driving Behavior Recognition using CA-CENTERNET", Zhe Ma, Xiaohui Yang and Haoran Zhang propose a dangerous driving behavior recognition method using CA-CENTERNET. In that study, they identify dangerous driving behaviors from the driver's hand movements and capture a large amount of driver video to build a hand-detection-based dangerous driving behavior data set. Evaluation and comparison against other network models show that the method improves the accuracy of behavior recognition.
However, dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting must fully account for background clutter and ambiguous behavior fragments in the driving scene that are difficult to classify. Key information such as the driver's face is acquired with the multi-scale dynamic convolution technique. Using the image data acquired by the camera, combined with the attention-weighted model, the position and pose of points of interest can be extracted accurately. In this way the driver's facial state can be monitored more precisely, giving a stronger characterization ability when detecting the driver's dangerous driving behavior.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art and provides a dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting.
The invention is realized by the following technical scheme:
A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting specifically comprises the following steps:
S1: constructing a dangerous driving behavior data set;
S2: constructing multi-scale dynamic convolution features;
S3: fusing the multi-scale dynamic convolution features based on attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S5: testing the multi-scale dynamic convolution model and using it for dangerous driving behavior detection;
the dangerous driving behavior training set is constructed in the step S1, and the specific steps are as follows:
S1-1: inputting dangerous driving behavior videos V n, wherein n=1, 2, …, N and N are the video numbers;
s1-2: inputting a dangerous driving behavior tag true value L n epsilon {0,1,2,3}, wherein 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V n obtained in step S1-1 into a plurality of non-overlapping segments Snippetn m, wherein m=1, 2, …, M is the number of segments;
S1-4: randomly sampling each segment Snippetn m to obtain a video frame F i, where i=1, 2, …, I is the number of video frames;
s1-5: preprocessing video frames;
s1-5-1: input video frame F i;
S1-5-2: randomly clipping the video frame F i;
S1-5-3: performing random horizontal overturn on the video frame F i;
S1-5-4: video frame F i is normalized, the mean {0.485,0.456,0.406} of the three channels is normalized, and the standard deviation {0.299,0.224,0.225};
S1-5-5: obtaining a preprocessed video frame FP i;
s1-6: repeating S1-5 for all video frames F i to obtain a dangerous driving behavior dataset DS;
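The preprocessing chain of S1-5 can be sketched as follows. This is an illustrative sketch only: the 224×224 crop size and the RNG seeding are hypothetical choices not specified by the patent, and the per-channel statistics follow the values listed in S1-5-4.

```python
import numpy as np

# Per-channel mean and standard deviation from S1-5-4 (shape (3, 1, 1) so
# they broadcast over (C, H, W) frames).
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def preprocess_frame(frame, crop_hw=(224, 224), rng=None):
    """frame: float array in [0, 1] with shape (3, H, W); returns FP_i."""
    rng = rng if rng is not None else np.random.default_rng(0)
    _, h, w = frame.shape
    ch, cw = crop_hw
    # S1-5-2: random crop to crop_hw
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = frame[:, top:top + ch, left:left + cw]
    # S1-5-3: random horizontal flip with probability 0.5
    if rng.random() < 0.5:
        out = out[:, :, ::-1]
    # S1-5-4: per-channel standardization
    return (out - MEAN) / STD

fp = preprocess_frame(np.random.default_rng(1).random((3, 256, 320)))
```

Each frame F_i passed through `preprocess_frame` yields a preprocessed frame FP_i of fixed spatial size, as required by S1-5-5.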
The multi-scale dynamic convolution features are constructed in step S2 through the following specific steps:
S2-1: inputting the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-3-1: inputting the feature Headf ∈ R^{C×H×W};
S2-3-2: determining a center pixel (h, w), where h ∈ [2, H-2] and w ∈ [2, W-2];
S2-3-3: determining the neighborhood range: the neighborhood radius is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} at scale 3;
S2-3-8: convolving the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeating S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
S2-4: constructing the 5×5 dynamic convolution feature;
S2-4-1: inputting the feature Headf ∈ R^{C×H×W};
S2-4-2: determining a center pixel (h, w), where h ∈ [3, H-3] and w ∈ [3, W-3];
S2-4-3: determining the neighborhood range: the neighborhood radius is 2, the range is [h-2, h+2] × [w-2, w+2], and the neighborhood is denoted NS_5 ∈ R^{5×5};
S2-4-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-4-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-4-7: repeating S2-4-3 to S2-4-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{5,h,w} ∈ R^{5×5} at scale 5;
S2-4-8: convolving the neighborhood NS_5 with the dynamic convolution kernel DCK_{5,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{5×5}(NS_5, DCK_{5,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{5×5} is a 5×5 convolution operation;
S2-4-9: repeating S2-4-2 to S2-4-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_2;
S2-5: constructing 7×7 dynamic convolution characteristics;
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-7-3 to S2-7-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: repeating S2-7-2 to S2-7-8 for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 7;
The multi-scale dynamic convolution features are fused based on attention weighting in step S3 through the following specific steps:
S3-1: inputting the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S3-2: performing global average pooling on the multi-scale dynamic convolution features DCKF_1, DCKF_2 and DCKF_3 with a global average pooling layer to obtain the global average pooled feature, denoted GAPF:
GAPF = GAP(DCKF_1, DCKF_2, DCKF_3) ∈ R^{C×1}
where GAP is the abbreviation of Global Average Pooling, a common pooling operation;
S3-3: convolving the global average pooled feature GAPF with a 1×1 convolution layer to obtain the convolution feature CF_0:
CF_0 = Conv_{1×1}(GAPF) ∈ R^{C'×1}
where R^{C'×1} is the real vector space of C'-dimensional column vectors;
S3-4: convolving the convolution feature CF_0 on its three channels with three 1×1 convolution layers to obtain the convolution features CF_1, CF_2 and CF_3:
CF_k = Conv_{1×1}^{(k)}(CF_0), k = 1, 2, 3
where the three vectors CF_1, CF_2 and CF_3 correspond respectively to the three channels of CF_0 and represent the new feature information obtained after the convolution operation;
S3-5: normalizing the convolution features CF_1, CF_2 and CF_3 with a Softmax layer to obtain the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c}:
CF_{k,c} = exp(CF_k) / (exp(CF_1) + exp(CF_2) + exp(CF_3)), k = 1, 2, 3
where the values of CF_{1,c}, CF_{2,c} and CF_{3,c} represent the weight or importance of the corresponding features;
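The S3 fusion path (global average pooling, 1×1 convolutions, softmax weighting) can be sketched as below. The random `W0` and `Wk` matrices stand in for the learned 1×1 convolution layers, and the reduced width C' (`reduced=4`) is a hypothetical choice; both are assumptions, not values from the patent.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_scales(dckf1, dckf2, dckf3, reduced=4, rng=None):
    """Attention-weighted fusion of three (C, H, W) scale features (S3)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    feats = (dckf1, dckf2, dckf3)
    C = dckf1.shape[0]
    # S3-2: global average pooling over the three scale features -> GAPF in R^C
    gapf = sum(f.mean(axis=(1, 2)) for f in feats) / 3.0
    # S3-3: 1x1 convolution on a pooled vector reduces it to R^{C'}
    W0 = rng.standard_normal((reduced, C)) / np.sqrt(C)
    cf0 = W0 @ gapf
    # S3-4: three 1x1 convolutions give one score per scale (CF_1, CF_2, CF_3)
    Wk = rng.standard_normal((3, reduced)) / np.sqrt(reduced)
    scores = Wk @ cf0
    # S3-5: softmax turns the scores into attention weights CF_{k,c}
    weights = softmax(scores)
    fused = sum(wgt * f for wgt, f in zip(weights, feats))
    return weights, fused

r = np.random.default_rng(3)
w, fused = fuse_scales(r.random((8, 6, 6)), r.random((8, 6, 6)), r.random((8, 6, 6)))
```

The softmax guarantees the three scale weights sum to one, so the fused map is a convex combination of DCKF_1, DCKF_2 and DCKF_3.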
The multi-scale dynamic convolution model is trained for dangerous driving behavior detection in step S4 through the following specific steps:
S4-1: invoking step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal image flipping and image standardization on the input dangerous driving behavior videos and label true values, to obtain the dangerous driving behavior training set DS;
S4-2: invoking step S2 to construct the multi-scale dynamic convolution features, performing multi-scale dynamic convolution feature extraction on the feature Headf extracted by the YOLOv model from the dangerous driving behavior training set DS obtained in step S4-1, to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, performing attention-weighted fusion of the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3 obtained in step S4-2, to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S4-4: calculating a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S4-3, to obtain the score vector SV;
S4-5: repeating steps S4-2 to S4-4 for the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv model from the dangerous driving behavior training set DS obtained in step S4-1, to obtain the score vectors SV_1, SV_2 and SV_3;
S4-6: adding and combining the score vectors SV_1, SV_2 and SV_3 obtained in step S4-5 to obtain the final score vector FSV;
S4-7: performing maximum-value extraction on the score vector FSV obtained in step S4-6 and calculating the cross-entropy loss against the dangerous driving behavior label true values to obtain the loss;
S4-8: updating all parameters by back propagation to obtain the multi-scale dynamic convolution model;
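Steps S4-6 and S4-7 (additive score combination, maximum-value extraction, cross-entropy loss) can be sketched as follows; the numeric score vectors SV_1, SV_2 and SV_3 below are hypothetical illustrative values over the four behavior classes of S1-2.

```python
import numpy as np

def cross_entropy(fsv, label):
    """S4-7: softmax cross-entropy of the final score vector FSV against
    the true behavior label (0 normal, 1 phone, 2 drinking, 3 talking)."""
    p = np.exp(fsv - fsv.max())
    p = p / p.sum()
    return -float(np.log(p[label]))

# Hypothetical per-head score vectors SV_1, SV_2, SV_3 over the 4 classes
sv1 = np.array([2.0, 0.5, 0.1, 0.1])
sv2 = np.array([1.5, 0.3, 0.2, 0.1])
sv3 = np.array([1.8, 0.4, 0.1, 0.2])
fsv = sv1 + sv2 + sv3               # S4-6: additive combination -> FSV
pred = int(np.argmax(fsv))          # S4-7: maximum-value extraction
loss = cross_entropy(fsv, label=0)  # loss back-propagated in S4-8
```

In training, `loss` would be back-propagated through the attention weighting and dynamic convolution modules to update all parameters, as in S4-8.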
The multi-scale dynamic convolution model is tested in step S5 through the following specific steps:
S5-1: invoking step S1 to construct the dangerous driving behavior data set, performing video frame extraction, random image cropping, random horizontal image flipping and image standardization on the input dangerous driving behavior videos and label true values, to obtain the dangerous driving behavior data set DS;
S5-2: invoking step S2 to construct the multi-scale dynamic convolution features, performing multi-scale dynamic convolution feature extraction on the dangerous driving behavior data set DS obtained in step S5-1, to obtain the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution features based on attention weighting, performing attention-weighted fusion of the 3×3 dynamic convolution feature DCKF_1, the 5×5 dynamic convolution feature DCKF_2 and the 7×7 dynamic convolution feature DCKF_3 obtained in step S5-2, to obtain the fused multi-scale dynamic convolution features CF_{1,c} ∈ R^{1×1}, CF_{2,c} ∈ R^{1×1} and CF_{3,c} ∈ R^{1×1};
S5-4: calculating a score vector from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3, to obtain the score vector SV;
S5-5: repeating steps S5-2 to S5-4 for the features Headf_1, Headf_2 and Headf_3 extracted by the YOLOv model from the dangerous driving behavior data set DS obtained in step S5-1, to obtain the score vectors SV_1, SV_2 and SV_3;
S5-6: adding and combining the score vectors SV_1, SV_2 and SV_3 obtained in step S5-5 to obtain the final score vector FSV;
S5-7: performing maximum-value extraction on the score vector FSV obtained in step S5-6 to obtain the dangerous driving behavior label predicted value;
S5-8: predicting from the fused multi-scale dynamic convolution features CF_{1,c}, CF_{2,c} and CF_{3,c} obtained in step S5-3 with the multi-scale dynamic convolution model to obtain the dangerous driving behavior score DBS;
S5-9: invoking threshold judgment on the dangerous driving behavior score DBS obtained in step S5-8, and issuing a corresponding reminder if the threshold is exceeded.
The invention has the advantages that the dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting improves driving safety. To fuse the multi-scale dynamic convolution features effectively, the method analyzes the relations among the scale features, learns an attention weight for each scale, and realizes multi-scale feature fusion. Facial information of the driver is acquired from the video frames, and dangerous driving behavior is then detected by analyzing features such as facial expression and head pose. To detect dangerous driving behavior accurately, the method adopts the multi-scale dynamic convolution attention-weighted detection technique: the video frame sequence is input into the multi-scale dynamic convolution model to capture key moments and actions, yielding a series of driving behavior representations. When detecting dangerous driving behavior, the method combines facial features and driving behavior for a comprehensive judgment; for example, dangerous behaviors such as fatigued driving and distracted driving can be detected by analyzing the driver's facial expression and driving behavior. Meanwhile, combined with the learning result of the attention weighting module, more accurate facial information can be acquired for assessing the driver's concentration, driving behavior and so on. By detecting dangerous driving behavior with multi-scale dynamic convolution attention weighting and combining facial features with driving behavior, the invention improves the accuracy and reliability of driver behavior detection, which helps promote driving safety and prevent accidents.
Drawings
FIG. 1 is a flow chart for dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting;
FIG. 2 is a schematic diagram of constructing a dangerous driving behavior training set;
FIG. 3 is a diagram of steps for constructing a multi-scale dynamic convolution feature;
FIG. 4 is a schematic diagram of a multi-scale dynamic convolution feature based on attention-weighted fusion;
FIG. 5 is a schematic diagram of testing the multi-scale dynamic convolution model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings. The invention relates to dangerous driving behavior detection based on multi-scale dynamic convolution attention weighting; the overall flow is shown in Fig. 1, and the implementation comprises the following steps:
S1: constructing the dangerous driving behavior training set, as shown in Fig. 2;
S1-1: inputting dangerous driving behavior videos V_n, where n = 1, 2, …, N, and N is the number of videos;
S1-2: inputting dangerous driving behavior label true values L_n ∈ {0, 1, 2, 3}, where 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V_n obtained in step S1-1 into a plurality of non-overlapping segments Snippet_{n,m}, where m = 1, 2, …, M, and M is the number of segments;
S1-4: randomly sampling each segment Snippet_{n,m} to obtain video frames F_i, where i = 1, 2, …, I, and I is the number of video frames;
S1-5: preprocessing the video frames;
S1-5-1: inputting a video frame F_i;
S1-5-2: randomly cropping the video frame F_i;
S1-5-3: randomly horizontally flipping the video frame F_i;
S1-5-4: standardizing the video frame F_i with the per-channel means {0.485, 0.456, 0.406} and standard deviations {0.229, 0.224, 0.225};
S1-5-5: obtaining the preprocessed video frame FP_i;
S1-6: repeating S1-5 for all video frames F_i to obtain the dangerous driving behavior data set DS;
S2: constructing the multi-scale dynamic convolution features, as shown in Fig. 3;
S2-1: inputting the preprocessed video frame FP_i ∈ R^{C×H×W}, where C is the number of channels, H the picture height and W the picture width;
S2-2: inputting the feature Headf ∈ R^{C×H×W} extracted by the YOLOv model, where C is the number of channels and H×W the feature size;
S2-3: constructing the 3×3 dynamic convolution feature;
S2-3-1: inputting the feature Headf ∈ R^{C×H×W};
S2-3-2: determining a center pixel (h, w), where h ∈ [2, H-2] and w ∈ [2, W-2];
S2-3-3: determining the neighborhood range: the neighborhood radius is 1, the range is [h-1, h+1] × [w-1, w+1], and the neighborhood is denoted NS_3 ∈ R^{3×3};
S2-3-4: given the weight matrix W_K ∈ R^{C×C} and Headf_{h,w} ∈ R^{C×1}, the key feature Key_{h,w} of the center pixel (h, w) is calculated as follows:
Key_{h,w} = W_K · Headf_{h,w} ∈ R^{C×1}
where Key_{h,w} is the key vector at position (h, w) and W_K is a weight matrix;
S2-3-5: given the weight matrix W_Q ∈ R^{C×C} and Headf_{u,v} ∈ R^{C×1}, the query feature Query_{u,v} of a pixel (u, v) within the neighborhood is calculated as follows:
Query_{u,v} = W_Q · Headf_{u,v} ∈ R^{C×1}
where Query_{u,v} is the query vector at position (u, v) and W_Q is a weight matrix;
S2-3-6: the dynamic convolution kernel weight dck_{h,w,u,v} is calculated as follows:
dck_{h,w,u,v} = softmax(Trans(Key_{h,w}) · Query_{u,v} / √d_r)
where the softmax function normalizes the similarity over the neighborhood, Trans is the transposition operation, and d_r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood to obtain the dynamic convolution kernel DCK_{3,h,w} ∈ R^{3×3} at scale 3;
S2-3-8: convolving the neighborhood NS_3 with the dynamic convolution kernel DCK_{3,h,w} to obtain the dynamic convolution feature dckf_{h,w}:
dckf_{h,w} = Conv_{3×3}(NS_3, DCK_{3,h,w}) ∈ R^{1×1}
where dckf_{h,w} is the final output at position (h, w) and Conv_{3×3} is a 3×3 convolution operation;
S2-3-9: repeating S2-3-2 to S2-3-8 with a sliding window over all center pixels (h, w) in Headf to obtain the dynamic convolution feature DCKF_1;
s2-4: constructing a 5×5 dynamic convolution feature;
s2-4-1: input features Headf e R C×H×W;
S2-4-2: determining a center pixel (h, w), wherein h E [3, H-3], w E [3,W-3];
s2-4-3: determining a neighborhood range, wherein the neighborhood size is 2, the neighborhood range is [ h-2, h+2] × [ w-2, w+2], and the neighborhood is designated as NS 5∈R5×5;
S2-4-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-4-5: given W Q∈RC×1 and headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
S2-4-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 5,h,w under the condition of a scale of 5;
S2-4-8: and carrying out convolution operation on the neighborhood NS 5 by using a dynamic convolution kernel DCK 5,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv5×5(NS5,DCK5,h,w)∈R1×1
wherein dckf h,w represents the final output at position (h, w), Conv 5×5 is a 5x5 convolution operation;
S2-4-9: repeating S2-5-2 to S2-5-8 using a sliding window for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 2;
S2-5: constructing 7×7 dynamic convolution characteristics;
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: repeating S2-5-2 to S2-5-8 using a sliding window for all center pixels (h, w) within Headf to obtain a dynamic convolution feature DCKF 3;
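Sections S2-3, S2-4 and S2-5 differ only in the scale k and hence in the valid center-pixel range. A small helper, mirroring the patent's 1-based ranges ([2, H-2] for k=3, [3, H-3] for k=5, [4, H-4] for k=7), makes the index bookkeeping explicit:

```python
def valid_centers(H, W, k):
    """1-based inclusive range of center pixels (h, w) for a k x k
    neighborhood, following the ranges stated in S2-3-2 / S2-4-2 / S2-5-2."""
    r = k // 2  # the neighborhood "size" in the patent's wording (1, 2 or 3)
    return (r + 1, H - r - 1), (r + 1, W - r - 1)
```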
s3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
S3-1: the 3×3 dynamic convolution feature DCKF 1, 5×5 dynamic convolution feature DCKF 2, 7×7 dynamic convolution feature DCKF 3 are input;
S3-2: for the multi-scale dynamic convolution features DCKF 1, DCKF 2, DCKF 3, a global average pooling operation is performed by using a global average pooling layer to obtain a global average pooling feature (Global Average Pooling Feature), denoted GAPF, with the following formula:
GAPF=GAP(DCKF1,DCKF2,DCKF3)∈RC×1
wherein GAP is the abbreviation of global average pooling (Global Average Pooling), a commonly used pooling operation;
s3-3: the global average pooling feature GAPF is convolved with a 1 x 1 convolution layer to obtain a convolution feature CF 0, with the following formula:
CF0=Conv1×1(GAPF)∈RC′×1
Wherein R C′×1 represents a column vector of C' dimension, which is a real vector space;
s3-4: for the convolution feature CF 0, the three channels are respectively convolved with three 1×1 convolution layers to obtain convolution features CF 1、CF2 and CF 3, as shown in fig. 4, with the following formula:
CFk=Conv1×1,k(CF0)∈R1×1, k=1,2,3
wherein the three vectors CF 1, CF 2, CF 3 correspond to the three channels in CF 0 respectively and represent the new feature information obtained after the convolution operation;
S3-5: normalized operation is performed on the convolution features CF 1、CF2 and CF 3 using a Softmax layer to obtain fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c, with the following formula:
CFk,c=exp(CFk)/(exp(CF1)+exp(CF2)+exp(CF3)), k=1,2,3
wherein the values of CF 1,c, CF 2,c and CF 3,c represent the weight or importance of the corresponding features;
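The fusion of S3-2 to S3-5 reduces each scale to one pooled value, projects through 1×1 convolutions, and normalizes with a softmax. A minimal sketch in which the weight shapes w0 (C'×3 matrix) and w1..w3 (C'-vectors) are assumptions standing in for the 1×1 convolution layers:

```python
import numpy as np

def fuse_scales(dckf1, dckf2, dckf3, w0, w1, w2, w3):
    """Attention-weighted fusion sketch (S3-2..S3-5).
    The shapes of w0..w3 are assumptions; the patent leaves them implicit."""
    gapf = np.array([dckf1.mean(), dckf2.mean(), dckf3.mean()])  # GAPF via GAP
    cf0 = w0 @ gapf                                   # CF0 = Conv1x1(GAPF)
    scores = np.array([w1 @ cf0, w2 @ cf0, w3 @ cf0])  # CF1, CF2, CF3
    e = np.exp(scores - scores.max())
    return e / e.sum()                                # CF_{1,c}, CF_{2,c}, CF_{3,c}
```

The returned weights are positive and sum to 1, so they act as per-scale importance scores for the three dynamic convolution features.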
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
S4-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
s4-2: invoking step S2 to construct multi-scale dynamic convolution features: multi-scale dynamic convolution feature extraction is performed on the feature Headf extracted by the YOLOv3 model from the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3×3 dynamic convolution feature DCKF 1, a 5×5 dynamic convolution feature DCKF 2 and a 7×7 dynamic convolution feature DCKF 3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S4-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
S4-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S4-3 to obtain a score vector SV;
S4-5: repeating the steps S4-2 to S4-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S4-1 to obtain score vectors SV 1、SV2 and SV 3;
s4-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S4-5 to obtain a final score vector FSV;
S4-7: performing maximum value extraction operation on the score vector FSV obtained in the step S4-6, and calculating a cross entropy loss function by using the real value of the dangerous driving behavior label to obtain loss;
s4-8: carrying out back propagation update on all parameters to obtain a multi-scale dynamic convolution model;
s5: testing a multi-scale dynamic convolution model, as shown in fig. 5;
S5-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
S5-2: step S2 is called to construct a multi-scale dynamic convolution feature, and multi-scale dynamic convolution feature extraction is carried out on the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3X 3 dynamic convolution feature DCKF 1, a 5X 5 dynamic convolution kernel DCKF 2 and a 7X 7 dynamic convolution kernel DCKF 3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S5-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
s5-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 to obtain a score vector SV;
S5-5: repeating the steps S5-2 to S5-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S5-1 to obtain score vectors SV 1、SV2 and SV 3;
S5-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S5-5 to obtain a final score vector FSV;
s5-7: performing maximum value extraction operation on the score vector FSV obtained in the step S5-6 to obtain dangerous driving behavior label predicted values;
S5-8: predicting the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 by using a multi-scale dynamic convolution model to obtain a dangerous driving behavior score DBS;
S5-9: and (5) invoking threshold judgment, namely performing threshold judgment on the dangerous driving behavior score DBS obtained in the step (S5-8), and performing corresponding reminding if the threshold is exceeded.

Claims (8)

1. A dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting, characterized in that the method specifically comprises the following steps:
s1: constructing a dangerous driving behavior data set;
S2: constructing a multi-scale dynamic convolution characteristic;
s3: fusing the multi-scale dynamic convolution characteristics based on the attention weighting;
S4: training a multi-scale dynamic convolution model for dangerous driving behavior detection;
s5: the multi-scale dynamic convolution model is tested and used for dangerous driving behavior detection;
The step S2 of constructing the multi-scale dynamic convolution feature specifically comprises the following steps:
S2-1: inputting a preprocessing video frame FP i∈RC×H×W, wherein C represents the number of channels, H represents the picture height, and W represents the picture width;
S2-2: inputting a YOLOv3 model extracted feature Headf epsilon R C×H×W, wherein C represents the number of channels and H multiplied by W represents the feature size;
S2-3: constructing a 3×3 dynamic convolution feature;
s2-4: constructing a 5×5 dynamic convolution feature;
S2-5: constructing 7×7 dynamic convolution characteristics;
The attention-based weighted fusion multi-scale dynamic convolution feature described in the step S3 specifically comprises the following steps:
S3-1: the 3×3 dynamic convolution feature DCKF 1, 5×5 dynamic convolution feature DCKF 2, 7×7 dynamic convolution feature DCKF 3 are input;
S3-2: for the multi-scale dynamic convolution features DCKF 1, DCKF 2, DCKF 3, a global average pooling operation is performed by using a global average pooling layer to obtain a global average pooling feature (Global Average Pooling Feature), denoted GAPF, with the following formula:
GAPF=GAP(DCKF1,DCKF2,DCKF3)∈RC×1
s3-3: the global average pooling feature GAPF is convolved with a 1 x 1 convolution layer to obtain a convolution feature CF 0, with the following formula:
CF0=Conv1×1(GAPF)∈RC′×1
Wherein R C′×1 represents a column vector of C' dimension, which is a real vector space;
s3-4: for the convolution feature CF 0, the convolution operation is performed on the three channels by using three 1×1 convolution layers, so as to obtain convolution features CF 1、CF2 and CF 3, and the formula is as follows:
CFk=Conv1×1,k(CF0)∈R1×1, k=1,2,3
wherein the three vectors CF 1, CF 2, CF 3 correspond to the three channels in CF 0 respectively and represent the new feature information obtained after the convolution operation;
S3-5: normalized operation is performed on the convolution features CF 1、CF2 and CF 3 using a Softmax layer to obtain fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c, with the following formula:
CFk,c=exp(CFk)/(exp(CF1)+exp(CF2)+exp(CF3)), k=1,2,3
where the values of CF 1,c, CF 2,c and CF 3,c represent the weight or importance of the corresponding features.
2. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the step S1 of constructing the dangerous driving behavior data set to obtain the dangerous driving behavior data set specifically comprises the following steps:
S1-1: inputting dangerous driving behavior videos V n, wherein n=1, 2, …, N and N are the video numbers;
s1-2: inputting a dangerous driving behavior tag true value L n epsilon {0,1,2,3}, wherein 0 represents normal driving, 1 represents using a mobile phone, 2 represents drinking water, and 3 represents communicating with passengers;
S1-3: dividing each video V n obtained in step S1-1 into a plurality of non-overlapping segments Snippetn m, wherein m=1, 2, …, M is the number of segments;
S1-4: randomly sampling each segment Snippetn m to obtain a video frame F i, where i=1, 2, …, I is the number of video frames;
s1-5: preprocessing video frames;
S1-6: and repeating S1-5 for all video frames F i to obtain a dangerous driving behavior dataset DS.
3. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 2, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the video frame preprocessing described in step S1-5 specifically comprises the following steps:
s1-5-1: input video frame F i;
S1-5-2: randomly clipping the video frame F i;
S1-5-3: performing random horizontal overturn on the video frame F i;
S1-5-4: video frame F i is normalized, the mean {0.485,0.456,0.406} of the three channels is normalized, and the standard deviation {0.299,0.224,0.225};
S1-5-5: a preprocessed video frame FP i is obtained.
4. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the 3×3 dynamic convolution feature is constructed as described in step S2-3, and is specifically as follows:
s2-3-1: input features Headf e R C×H×W;
S2-3-2: determining a center pixel (h, w), where h E [2, H-2], w E [2, W-2];
S2-3-3: determining a neighborhood range, wherein the size of the neighborhood is 1, the neighborhood range is [ h-1, h+1] × [ w-1, w+1], and the neighborhood is designated as NS 3∈R3×3;
S2-3-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-3-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
s2-3-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
dckh,w,u,v=softmax(Trans(Keyh,w)·Queryu,v/√dr)∈R1×1
wherein the softmax function is used for normalizing the similarity over the neighborhood, Trans is a transposition operation, and d r = C;
S2-3-7: repeating S2-3-3 to S2-3-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 3,h,w∈R3×3 under the condition of 3 scale;
S2-3-8: and carrying out convolution operation on the neighborhood NS 3 by using a dynamic convolution kernel DCK 3,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv3×3(NS3,DCK3,h,w)∈R1×1
wherein dckf h,w denotes the final output at position (h, w), conv 3×3 is a 3x3 convolution operation;
S2-3-9: the dynamic convolution feature DCKF 1 is obtained by repeating S2-3-2 through S2-3-8 using a sliding window for all center pixels (h, w) within Headf.
5. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 4, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the construction of the 5×5 dynamic convolution feature described in step S2-4 is specifically as follows:
s2-4-1: input features Headf e R C×H×W;
S2-4-2: determining a center pixel (h, w), wherein h E [3, H-3], w E [3,W-3];
s2-4-3: determining a neighborhood range, wherein the neighborhood size is 2, the neighborhood range is [ h-2, h+2] × [ w-2, w+2], and the neighborhood is designated as NS 5∈R5×5;
S2-4-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
s2-4-5: given W Q∈RC×1 and headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈EC×1
Where Query u,v represents the Query vector at position (u, v) and W O represents a weight matrix;
S2-4-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
S2-4-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 5,h,w under the condition of a scale of 5;
S2-4-8: and carrying out convolution operation on the neighborhood NS 5 by using a dynamic convolution kernel DCK 5,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv5×5(NS5,DCK5,h,w)∈R1×1
wherein dckf h,w represents the final output at position (h, w), conv 5×5 is a 5x5 convolution operation;
S2-4-9: the dynamic convolution feature DCKF 2 is obtained by repeating S2-5-2 through S2-5-8 using a sliding window for all center pixels (h, w) within Headf.
6. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 5, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the 7×7 dynamic convolution feature is constructed as described in step S2-5, and is specifically as follows:
S2-5-1: input features Headf e R C×H×W;
S2-5-2: determining a center pixel (h, w), where h E [4, H-4], w E [4, W-4];
S2-5-3: determining a neighborhood range, wherein the neighborhood size is 3, the neighborhood range is [ h-3, h+3] × [ w-3, w+3], and the neighborhood is designated as NS 7∈R7×7;
s2-5-4: given W K∈RC×1 and Headf h,w∈RC×1, the Key feature Key h,w for the center pixel (h, W) is calculated as follows:
Keyh,w=WK·Headfh,w∈RC×1
where Key h,w represents the Key vector at position (h, W) and W K represents a weight matrix;
S2-5-5: given W Q∈RC×1 and Headf u,v∈RC×1, a Query feature Query u,v for a pixel (u, v) within the computational domain is formulated as follows:
Queryu,v=WQ·Headfu,v∈RC×1
Where Query u,v represents the Query vector at position (u, v) and W Q represents a weight matrix;
S2-5-6: the dynamic convolution kernel weight dck h,w,u,v is calculated as follows:
The softmax function is used for normalizing the similarity, the Trans is a transposition operation, and d r =c;
s2-5-7: repeating S2-5-3 to S2-5-6 for all pixels (u, v) in the neighborhood range to obtain a dynamic convolution kernel DCK 7,h,w∈R7×7 under the condition of the scale of 7;
S2-5-8: and carrying out convolution operation on the neighborhood NS 7 by using a dynamic convolution kernel DCK 7,h,w to obtain a dynamic convolution characteristic dckf h,w, wherein the formula is as follows:
dckfh,w=Conv7×7(NS7,DCK7,h,w)∈R1×1
Wherein dckf h,w denotes the final output at position (h, w), conv 7×7 is a 7x7 convolution operation;
S2-5-9: the dynamic convolution feature DCKF 7 is obtained by repeating S2-7-2 through S2-7-8 for all center pixels (h, w) within Headf.
7. The dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting according to claim 1, wherein the dangerous driving behavior detection method is characterized by comprising the following steps of: the training multi-scale dynamic convolution model in the step S4 is used for dangerous driving behavior detection, and specifically comprises the following steps:
S4-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
s4-2: invoking step S2 to construct multi-scale dynamic convolution features: multi-scale dynamic convolution feature extraction is performed on the feature Headf extracted by the YOLOv3 model from the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3×3 dynamic convolution feature DCKF 1, a 5×5 dynamic convolution feature DCKF 2 and a 7×7 dynamic convolution feature DCKF 3;
S4-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S4-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
S4-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S4-3 to obtain a score vector SV;
S4-5: repeating the steps S4-2 to S4-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S4-1 to obtain score vectors SV 1、SV2 and SV 3;
s4-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S4-5 to obtain a final score vector FSV;
S4-7: performing maximum value extraction operation on the score vector FSV obtained in the step S4-6, and calculating a cross entropy loss function by using the real value of the dangerous driving behavior label to obtain loss;
s4-8: and carrying out back propagation updating on all parameters to obtain the multi-scale dynamic convolution model.
8. The method for detecting dangerous driving behavior based on multi-scale dynamic convolution attention weighting according to claim 7, wherein the method comprises the following steps: the step S5 of testing the multi-scale dynamic convolution model specifically comprises the following steps:
S5-1: step S1, a dangerous driving behavior data set is constructed, and video frame extraction, image random cutting, image random horizontal turning and image standardization are carried out on input dangerous driving behavior videos and dangerous driving behavior label real values to obtain a dangerous driving behavior training set DS;
S5-2: step S2 is called to construct a multi-scale dynamic convolution feature, and multi-scale dynamic convolution feature extraction is carried out on the dangerous driving behavior training set DS obtained in step S4-1 to obtain a 3X 3 dynamic convolution feature DCKF 1, a 5X 5 dynamic convolution kernel DCKF 2 and a 7X 7 dynamic convolution kernel DCKF 3;
S5-3: invoking step S3 to fuse the multi-scale dynamic convolution characteristics based on the attention weighting, and performing the attention-based weighted fusion on the 3X 3 dynamic convolution characteristics DCKF 1, the 5X 5 dynamic convolution characteristics DCKF 2 and the 7X 7 dynamic convolution characteristics DCKF 3 obtained in the step S5-2 to obtain fused multi-scale dynamic convolution characteristics CF 1,c∈R1×1,CF2,c∈R1×1 and CF 3,c∈R1×1;
s5-4: carrying out score vector calculation on the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 to obtain a score vector SV;
S5-5: repeating the steps S5-2 to S5-4 for the features Headf 1、Headf2 and Headf 3 extracted from the dangerous driving behavior training set DS and YOLOv model obtained in the step S5-1 to obtain score vectors SV 1、SV2 and SV 3;
S5-6: performing addition and combination operation on the score vectors SV 1、SV2 and SV 3 obtained in the step S5-5 to obtain a final score vector FSV;
s5-7: performing maximum value extraction operation on the score vector FSV obtained in the step S5-6 to obtain dangerous driving behavior label predicted values;
S5-8: predicting the fused multi-scale dynamic convolution features CF 1,c,CF2,c and CF 3,c obtained in the step S5-3 by using a multi-scale dynamic convolution model to obtain a dangerous driving behavior score DBS;
S5-9: and (5) invoking threshold judgment, namely performing threshold judgment on the dangerous driving behavior score DBS obtained in the step (S5-8), and performing corresponding reminding if the threshold is exceeded.
CN202311538093.9A 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting Active CN117576666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311538093.9A CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting


Publications (2)

Publication Number Publication Date
CN117576666A CN117576666A (en) 2024-02-20
CN117576666B true CN117576666B (en) 2024-05-10

Family

ID=89894792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311538093.9A Active CN117576666B (en) 2023-11-17 2023-11-17 Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting

Country Status (1)

Country Link
CN (1) CN117576666B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
WO2021248687A1 (en) * 2020-06-10 2021-12-16 南京理工大学 Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism
CN114241210A (en) * 2021-11-22 2022-03-25 中国海洋大学 Multi-task learning method and system based on dynamic convolution
CN114241456A (en) * 2021-12-20 2022-03-25 东南大学 Safe driving monitoring method using feature adaptive weighting

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230260247A1 (en) * 2022-02-17 2023-08-17 Samsung Electronics Co., Ltd. System and method for dual-value attention and instance boundary aware regression in computer vision system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
万思宇. 3D vehicle detection algorithm based on attention mechanism. Computer Engineering & Science. 2020, (No. 01), full text. *
龙劲峄; 周骅. Dangerous driving behavior detection system based on embedded neural network. Intelligent Computer and Applications. 2020, (No. 03), full text. *

Also Published As

Publication number Publication date
CN117576666A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
Omerustaoglu et al. Distracted driver detection by combining in-vehicle and image data using deep learning
Weng et al. Driver drowsiness detection via a hierarchical temporal deep belief network
Zhang et al. Too far to see? Not really!—Pedestrian detection with scale-aware localization policy
Yuan Video-based smoke detection with histogram sequence of LBP and LBPV pyramids
CN111274881A (en) Driving safety monitoring method and device, computer equipment and storage medium
CN111008600B (en) Lane line detection method
CN110427871B (en) Fatigue driving detection method based on computer vision
CN109460787B (en) Intrusion detection model establishing method and device and data processing equipment
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
Ganokratanaa et al. Video anomaly detection using deep residual-spatiotemporal translation network
CN116311214B (en) License plate recognition method and device
Li et al. Fall detection based on fused saliency maps
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
Uppal et al. Emotion recognition and drowsiness detection using Python
Jegham et al. Deep learning-based hard spatial attention for driver in-vehicle action monitoring
Dhawan et al. Identification of traffic signs for advanced driving assistance systems in smart cities using deep learning
CN112528903B (en) Face image acquisition method and device, electronic equipment and medium
CN117576666B (en) Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting
Batapati et al. Video analysis for traffic anomaly detection using support vector machines
Thakare et al. Object interaction-based localization and description of road accident events using deep learning
Kommanduri et al. DAST-Net: Dense visual attention augmented spatio-temporal network for unsupervised video anomaly detection
CN114792437A (en) Method and system for analyzing safe driving behavior based on facial features
Tayo et al. Vehicle license plate recognition using edge detection and neural network
Sirisha et al. Utilizing a Hybrid Model for Human Injury Severity Analysis in Traffic Accidents.
Gopikrishnan et al. DriveCare: a real-time vision based driver drowsiness detection using multiple convolutional neural networks with kernelized correlation filters (MCNN-KCF)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant