CN114241456A - Safe driving monitoring method using feature adaptive weighting - Google Patents
- Publication number
- CN114241456A (application CN202111564304.7A)
- Authority
- CN
- China
- Prior art keywords
- features
- convolution
- input
- adaptive weighting
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Classification techniques
- G06F18/253 — Fusion techniques of extracted features
- G06N3/042 — Knowledge-based neural networks; Logical representations of neural networks
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
Abstract
The invention discloses a safe driving monitoring method using feature adaptive weighting. It studies a strategy for fusing global features and keypoint features at different scales. To address attention during fusion, the invention does not simply concatenate the global and keypoint features; instead, it proposes a pose-based feature fusion module for them. Because different categories of driver behavior manifest in different image regions, the model must attend to different areas of each input image. An adaptive weighting module is therefore proposed that learns a set of input-specific expert weights to select the convolution kernels used for computation. This offers a new direction for driver action recognition and further improves the accuracy of driver behavior recognition. The invention has important application value in the field of traffic safety.
Description
Technical Field
The invention belongs to the field of image processing and pattern recognition, and particularly relates to a safe driving monitoring method using feature adaptive weighting.
Background
Despite improvements in road and vehicle design, the total number of fatal accidents is still increasing. The World Health Organization (WHO) 2017 global status report estimates that road traffic accidents cause about 1.25 million deaths worldwide each year and up to 50 million non-fatal injuries. In addition, road traffic accidents cause enormous property damage, and the number of accidents due to distracted driving is rising steadily, making driver behavior recognition an important but challenging research task for road safety.
Disclosure of Invention
In order to solve the above problems, the invention discloses a safe driving monitoring method using feature adaptive weighting; the feature fusion and dynamic convolution techniques used in the invention improve the accuracy of driver behavior identification at test time.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a safe driving monitoring method using feature adaptive weighting includes the following steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature self-adaptive weighting model, wherein ResNet is used as a network global feature extractor, a gesture estimation model is used for capturing key point level semantics, a gesture-based feature fusion module is used for fusing global features and key point features, the module adopts a multi-branch structure, one branch uses a global average pooling layer to extract the attention of the global features, and the other branch directly uses point-by-point convolution to extract the channel attention of the key point features; then, feeding the fused features into an adaptive weighting module to dynamically adjust the convolutional neural network; dynamically generating a set of input-dependent weights using the fully-connected layer and combining them with corresponding convolution parameters to generate a new convolution kernel; finally, the output characteristics of the module are input to a classifier;
step 201: for an input driving image, extracting global features by adopting ResNet as a model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
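Step 2 and step 203 together suggest a two-branch attention followed by channel-wise concatenation. A minimal sketch under that reading (the module name, channel sizes, and the exact attention form are assumptions; the patent does not give them):

```python
import torch
import torch.nn as nn

class PoseFeatureFusion(nn.Module):
    """Two-branch fusion sketch: GAP-based attention on the global feature and
    pointwise-convolution channel attention on the keypoint features, followed
    by concatenation (the cascade of step 203)."""

    def __init__(self, g_ch, k_ch):
        super().__init__()
        # branch 1: global average pooling -> 1x1 conv -> sigmoid gate
        self.g_att = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Conv2d(g_ch, g_ch, 1), nn.Sigmoid())
        # branch 2: point-by-point (1x1) convolution -> sigmoid gate
        self.k_att = nn.Sequential(nn.Conv2d(k_ch, k_ch, 1), nn.Sigmoid())

    def forward(self, g, k):
        g = g * self.g_att(g)              # reweight global channels
        k = k * self.k_att(k)              # reweight keypoint channels
        return torch.cat([g, k], dim=1)    # cascade along the channel axis
```

The concatenated output then feeds the adaptive weighting module described in step 204.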
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; the adaptive weighting module transforms the fused features before they are passed to the classifier; each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet;
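Equations (5) and (6) describe a dynamic convolution in the CondConv style: a per-sample mixture of n expert kernels. A minimal sketch, assuming 3×3 kernels, n = 4 experts, and ReLU for σ (the patent fixes none of these):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightingConv(nn.Module):
    """Sketch of the adaptive weighting module: expert weights
    alpha = Sigmoid(FC(GAP(f))) (Eq. 5) mix a bank of n expert kernels,
    and the mixed kernel is applied to the input, f_ID = sigma(...) (Eq. 6)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, n_experts=4):
        super().__init__()
        # n expert kernels W_1..W_n stored as one tensor (n, out, in, k, k)
        self.weight = nn.Parameter(
            torch.randn(n_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.router = nn.Linear(in_ch, n_experts)  # the FC in Eq. (5)
        self.padding = kernel_size // 2

    def forward(self, f):
        b, c, h, w = f.shape
        # Eq. (5): alpha = S(FC(GAP(f))), one weight vector per sample
        alpha = torch.sigmoid(self.router(f.mean(dim=(2, 3))))  # (b, n)
        # per-sample mixed kernel: sum_i alpha_i * W_i -> (b, out, in, k, k)
        mixed = torch.einsum('bn,noikl->boikl', alpha, self.weight)
        # apply each sample's own kernel via a grouped convolution over the batch
        out = F.conv2d(f.reshape(1, b * c, h, w),
                       mixed.reshape(-1, c, *mixed.shape[-2:]),
                       padding=self.padding, groups=b)
        out = out.reshape(b, -1, h, w)
        return F.relu(out)  # sigma in Eq. (6), assumed ReLU here
```

Because the per-sample kernel is just a weighted sum of the stored experts, inference adds only the small routing FC, matching the low-cost claim above.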
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
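The training of step 3 might be sketched as follows; the learning rate, momentum, weight decay, and epoch count are assumptions, since the patent specifies only PyTorch and the SGD optimizer:

```python
import torch
from torch import nn, optim

def train(model, loader, epochs=30, lr=1e-2, device='cpu'):
    """Minimal SGD training loop for the feature adaptive weighting model.
    All hyperparameters here are illustrative, not taken from the patent."""
    model.to(device).train()
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    loss_fn = nn.CrossEntropyLoss()  # driver-behavior classification loss
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```

`loader` would iterate (image, label) batches from the StateFarm distraction dataset of step 1.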
The invention has the beneficial effects that:
(1) The invention proposes a pose-based feature fusion module to fuse the global and keypoint features, enhancing channel attention and fusing multi-scale feature context.
(2) The invention proposes an adaptive weighting module that customizes an independent convolution layer for each input sample and adjusts it dynamically.
(3) The method further improves the accuracy of driver behavior recognition and has important application value in the field of traffic safety.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows sample pictures of different driving behaviors in the present invention;
FIG. 3 is a schematic diagram of the structure of the feature adaptive weighting model in the present invention;
FIG. 4 is a schematic structural diagram of the adaptive weighting module of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which should be understood as merely illustrating, not limiting, the scope of the invention.
The invention relates to a safe driving monitoring method using characteristic self-adaptive weighting, which comprises the following specific implementation steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature adaptive weighting model, the structure of which is shown schematically in FIG. 3; the method uses ResNet as the network's global feature extractor, uses a pose estimation model to capture keypoint-level semantics, and fuses the global features with the keypoint features through a pose-based feature fusion module; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier;
step 201: for an input driving image, extracting global features by adopting ResNet as a model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; in FIG. 3, the adaptive weighting module transforms the fused features before they are passed to the classifier; as shown in FIG. 4, each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet;
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
It should be noted that the above-mentioned contents only illustrate the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and it is obvious to those skilled in the art that several modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations fall within the protection scope of the claims of the present invention.
Claims (2)
1. A safe driving monitoring method using feature adaptive weighting is characterized by comprising the following steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature adaptive weighting model; ResNet is used as the network's global feature extractor, a pose estimation model captures keypoint-level semantics, and a pose-based feature fusion module fuses the global features with the keypoint features; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier;
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
2. The safe driving monitoring method using feature adaptive weighting according to claim 1, wherein step 2 constructs the feature adaptive weighting model as follows: ResNet is used as the network's global feature extractor, a pose estimation model captures the keypoint-level semantics, and a pose-based feature fusion module fuses the global features with the keypoint features; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier; the specific steps are as follows:
step 201: for an input driving image, extracting the global features using ResNet as the model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; the adaptive weighting module transforms the fused features before they are passed to the classifier; each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111564304.7A CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111564304.7A CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114241456A true CN114241456A (en) | 2022-03-25 |
Family
ID=80759466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111564304.7A Pending CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114241456A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272992A (en) * | 2022-09-30 | 2022-11-01 | 松立控股集团股份有限公司 | Vehicle attitude estimation method |
CN117576666A (en) * | 2023-11-17 | 2024-02-20 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
CN117576666B (en) * | 2023-11-17 | 2024-05-10 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
Similar Documents
Publication | Title |
---|---|
Guo et al. | Driver drowsiness detection using hybrid convolutional neural network and long short-term memory |
CN108133188B | Behavior identification method based on motion history image and convolutional neural network |
CN110276765B | Image panorama segmentation method based on multitask learning deep neural network |
Neil et al. | Learning to be efficient: Algorithms for training low-latency, low-compute deep spiking neural networks |
CN110516536B | Weak supervision video behavior detection method based on time sequence class activation graph complementation |
CN110110689B | Pedestrian re-identification method |
CN109034264B | CSP-CNN model for predicting severity of traffic accident and modeling method thereof |
CN109145712B | Text information fused GIF short video emotion recognition method and system |
CN110276248B | Facial expression recognition method based on sample weight distribution and deep learning |
CN114241456A | Safe driving monitoring method using feature adaptive weighting |
CN115082698B | Distraction driving behavior detection method based on multi-scale attention module |
CN112861945B | Multi-mode fusion lie detection method |
Pratama et al. | Deep convolutional neural network for hand sign language recognition using model E |
CN115280373A | Managing occlusions in twin network tracking using structured dropping |
CN110633689B | Face recognition model based on semi-supervised attention network |
CN114241458B | Driver behavior recognition method based on attitude estimation feature fusion |
CN111797705A | Action recognition method based on character relation modeling |
Siddiqi | Fruit-classification model resilience under adversarial attack |
CN109190471B | Attention model method for video monitoring pedestrian search based on natural language description |
CN114492634A | Fine-grained equipment image classification and identification method and system |
CN113221683A | Expression recognition method based on CNN model in teaching scene |
CN113361466A | Multi-modal cross-directed learning-based multi-spectral target detection method |
CN113205044B | Deep fake video detection method based on characterization contrast prediction learning |
Singh | Image spam classification using deep learning |
Karthigayan et al. | Genetic algorithm and neural network for face emotion recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |