CN114241456A - Safe driving monitoring method using feature adaptive weighting - Google Patents
- Publication number
- CN114241456A (application CN202111564304.7A)
- Authority
- CN
- China
- Prior art keywords
- features
- convolution
- input
- adaptive weighting
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Classification techniques
- G06F18/253 — Fusion techniques of extracted features
- G06N3/042 — Knowledge-based neural networks; Logical representations of neural networks
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
Abstract
The invention discloses a safe driving monitoring method using feature adaptive weighting. It studies a strategy for fusing global features and keypoint features at different scales. To address attention during fusion, the invention does not simply concatenate the global and keypoint features; instead, it proposes a pose-based feature fusion module for them. Because different categories of driver behavior manifest in different image regions, the model must attend to different areas of each input image. An adaptive weighting module is therefore proposed that learns a set of input-specific expert weights to select the convolution kernels used for computation. This offers a new direction for driver action recognition and further improves the accuracy of driver behavior recognition. The invention has important application value in the field of traffic safety.
Description
Technical Field
The invention belongs to the field of image processing and pattern recognition, and particularly relates to a safe driving monitoring method using feature adaptive weighting.
Background
Despite improvements in road and vehicle design, the total number of fatal accidents is still increasing. The World Health Organization (WHO) 2017 global status report estimates that road traffic accidents cause about 1.25 million deaths worldwide each year and up to 50 million non-fatal injuries. In addition, road traffic accidents cause enormous property damage, and the number of accidents due to distracted driving is rising steadily, making driver behavior recognition an important but challenging research task for road safety.
Disclosure of Invention
In order to solve the above problems, the invention discloses a safe driving monitoring method using feature adaptive weighting; the feature fusion and dynamic convolution techniques used in the invention improve the accuracy of driver behavior identification at test time.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a safe driving monitoring method using feature adaptive weighting includes the following steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature self-adaptive weighting model, wherein ResNet is used as a network global feature extractor, a gesture estimation model is used for capturing key point level semantics, a gesture-based feature fusion module is used for fusing global features and key point features, the module adopts a multi-branch structure, one branch uses a global average pooling layer to extract the attention of the global features, and the other branch directly uses point-by-point convolution to extract the channel attention of the key point features; then, feeding the fused features into an adaptive weighting module to dynamically adjust the convolutional neural network; dynamically generating a set of input-dependent weights using the fully-connected layer and combining them with corresponding convolution parameters to generate a new convolution kernel; finally, the output characteristics of the module are input to a classifier;
step 201: for an input driving image, extracting global features by adopting ResNet as a model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
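Step 2 and step 203 together suggest a two-branch attention followed by channel-wise concatenation. A minimal sketch under that reading (the module name, channel sizes, and the exact attention form are assumptions; the patent does not give them):

```python
import torch
import torch.nn as nn

class PoseFeatureFusion(nn.Module):
    """Two-branch fusion sketch: GAP-based attention on the global feature and
    pointwise-convolution channel attention on the keypoint features, followed
    by concatenation (the cascade of step 203)."""

    def __init__(self, g_ch, k_ch):
        super().__init__()
        # branch 1: global average pooling -> 1x1 conv -> sigmoid gate
        self.g_att = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Conv2d(g_ch, g_ch, 1), nn.Sigmoid())
        # branch 2: point-by-point (1x1) convolution -> sigmoid gate
        self.k_att = nn.Sequential(nn.Conv2d(k_ch, k_ch, 1), nn.Sigmoid())

    def forward(self, g, k):
        g = g * self.g_att(g)              # reweight global channels
        k = k * self.k_att(k)              # reweight keypoint channels
        return torch.cat([g, k], dim=1)    # cascade along the channel axis
```

The concatenated output then feeds the adaptive weighting module described in step 204.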
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; the adaptive weighting module transforms the fused features before they are passed to the classifier; each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet;
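Equations (5) and (6) describe a dynamic convolution in the CondConv style: a per-sample mixture of n expert kernels. A minimal sketch, assuming 3×3 kernels, n = 4 experts, and ReLU for σ (the patent fixes none of these):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightingConv(nn.Module):
    """Sketch of the adaptive weighting module: expert weights
    alpha = Sigmoid(FC(GAP(f))) (Eq. 5) mix a bank of n expert kernels,
    and the mixed kernel is applied to the input, f_ID = sigma(...) (Eq. 6)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, n_experts=4):
        super().__init__()
        # n expert kernels W_1..W_n stored as one tensor (n, out, in, k, k)
        self.weight = nn.Parameter(
            torch.randn(n_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.router = nn.Linear(in_ch, n_experts)  # the FC in Eq. (5)
        self.padding = kernel_size // 2

    def forward(self, f):
        b, c, h, w = f.shape
        # Eq. (5): alpha = S(FC(GAP(f))), one weight vector per sample
        alpha = torch.sigmoid(self.router(f.mean(dim=(2, 3))))  # (b, n)
        # per-sample mixed kernel: sum_i alpha_i * W_i -> (b, out, in, k, k)
        mixed = torch.einsum('bn,noikl->boikl', alpha, self.weight)
        # apply each sample's own kernel via a grouped convolution over the batch
        out = F.conv2d(f.reshape(1, b * c, h, w),
                       mixed.reshape(-1, c, *mixed.shape[-2:]),
                       padding=self.padding, groups=b)
        out = out.reshape(b, -1, h, w)
        return F.relu(out)  # sigma in Eq. (6), assumed ReLU here
```

Because the per-sample kernel is just a weighted sum of the stored experts, inference adds only the small routing FC, matching the low-cost claim above.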
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
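The training of step 3 might be sketched as follows; the learning rate, momentum, weight decay, and epoch count are assumptions, since the patent specifies only PyTorch and the SGD optimizer:

```python
import torch
from torch import nn, optim

def train(model, loader, epochs=30, lr=1e-2, device='cpu'):
    """Minimal SGD training loop for the feature adaptive weighting model.
    All hyperparameters here are illustrative, not taken from the patent."""
    model.to(device).train()
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    loss_fn = nn.CrossEntropyLoss()  # driver-behavior classification loss
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```

`loader` would iterate (image, label) batches from the StateFarm distraction dataset of step 1.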
The invention has the beneficial effects that:
(1) The invention proposes a pose-based feature fusion module to fuse the global and keypoint features, enhancing channel attention and fusing multi-scale feature context.
(2) The invention proposes an adaptive weighting module that customizes an independent convolution layer for each input sample and adjusts it dynamically.
(3) The method further improves the accuracy of driver behavior recognition and has important application value in the field of traffic safety.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows sample pictures of different driving behaviors in the present invention;
FIG. 3 is a schematic diagram of the structure of the feature adaptive weighting model in the present invention;
FIG. 4 is a schematic structural diagram of the adaptive weighting module of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which should be understood as merely illustrating, not limiting, the scope of the invention.
The invention relates to a safe driving monitoring method using characteristic self-adaptive weighting, which comprises the following specific implementation steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature adaptive weighting model, the structure of which is shown schematically in FIG. 3; the method uses ResNet as the network's global feature extractor, uses a pose estimation model to capture keypoint-level semantics, and fuses the global features with the keypoint features through a pose-based feature fusion module; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier;
step 201: for an input driving image, extracting global features by adopting ResNet as a model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; in FIG. 3, the adaptive weighting module transforms the fused features before they are passed to the classifier; as shown in FIG. 4, each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet;
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
It should be noted that the above-mentioned contents only illustrate the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and it is obvious to those skilled in the art that several modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations fall within the protection scope of the claims of the present invention.
Claims (2)
1. A safe driving monitoring method using feature adaptive weighting is characterized by comprising the following steps:
step 1: using the existing StateFarm distraction dataset as an experimental dataset;
step 2: constructing a feature adaptive weighting model; ResNet is used as the network's global feature extractor, a pose estimation model captures keypoint-level semantics, and a pose-based feature fusion module fuses the global features with the keypoint features; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier;
step 3: training the feature adaptive weighting model on the open-source platform PyTorch using an SGD optimizer;
step 4: testing the feature adaptive weighting model.
2. The safe driving monitoring method using feature adaptive weighting according to claim 1, wherein step 2 constructs the feature adaptive weighting model as follows: ResNet is used as the network's global feature extractor, a pose estimation model captures the keypoint-level semantics, and a pose-based feature fusion module fuses the global features with the keypoint features; the module adopts a multi-branch structure in which one branch uses a global average pooling layer to extract attention over the global features and the other branch directly uses point-by-point convolution to extract channel attention over the keypoint features; the fused features are then fed into an adaptive weighting module that dynamically adjusts the convolutional neural network: a fully-connected layer generates a set of input-dependent weights, which are combined with the corresponding convolution parameters to produce a new convolution kernel; finally, the module's output features are passed to a classifier; the specific steps are as follows:
step 201: for an input driving image, extracting the global features using ResNet as the model backbone;
step 202: detecting the driver's keypoints with a pose estimation model, generating a bounding box for each keypoint through post-processing, and extracting and modeling the keypoint features through RoI Align; because keypoints may go undetected due to occlusion and similar causes, a threshold is set on the keypoint response; keypoints whose response falls below the threshold do not participate in subsequent calculations; the network generates a separate feature map for each keypoint;
step 203: the category differences between driver actions are mainly reflected in keypoint details, so the global features and the keypoint features are fused in a cascading (concatenation) manner;
step 204: in order to enhance the representation of useful keypoint feature channels and suppress irrelevant ones, an adaptive weighting module is proposed to recalibrate the activation strengths of the different keypoint feature channels; the adaptive weighting module transforms the fused features before they are passed to the classifier; each input is treated as a linear combination of n experts when computing the convolution kernel; a detailed description of the module follows:
the adaptive weighting module places multiple convolution kernels in the convolution layer; the weight of each kernel is determined from the layer's input through a fully-connected layer; a set of kernels customized for the input is then obtained by weighted summation, realizing a single convolution; the fused global and keypoint features serve as the module's input; the fused feature reveals the driver action category and guides the different experts to focus on the inputs they are interested in; the n expert weights in the convolution layer are determined by the fused features; in other words, the n expert weights differ across samples, so each input is processed with its own weights; specifically, the expert weights α = r(f) are dynamically generated and combined with the corresponding original parameters to generate a new convolution kernel:

α = r(f) = S(FC(GAP(f)))    (5)

where S denotes the Sigmoid activation function, FC the fully-connected layer, and GAP the global average pooling layer; the fully-connected layer maps the processed fused features to the n expert weights; the convolution kernel associated with the input is computed as a function of the input sample and parameterized as (α1·W1 + … + αn·Wn); the generated convolution feature f_ID is calculated as:

f_ID = σ((α1·W1 + … + αn·Wn) * f)    (6)

where f denotes the input features, each αi = ri(f) is a scalar weight that depends on the input, n is the number of experts, and σ is an activation function; for different inputs, the model capacity grows with the number of experts; only a small inference cost is incurred, because the convolution kernel is computed as a linear sum of the n expert kernels rather than by increasing the kernel parameters or channel count of the convolution layer;
the standard convolution layer is replaced with the adaptive weighting module to construct an adaptive block that can learn specific convolution kernel parameters for each input driving image; to avoid the severe overfitting caused by too deep a network, and because the expert weights in the adaptive weighting module are more class-specific in the deeper layers of the network, the adaptive block is used only in the last convolution group of ResNet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111564304.7A CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111564304.7A CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114241456A true CN114241456A (en) | 2022-03-25 |
Family
ID=80759466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111564304.7A Pending CN114241456A (en) | 2021-12-20 | 2021-12-20 | Safe driving monitoring method using feature adaptive weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114241456A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272992A (en) * | 2022-09-30 | 2022-11-01 | 松立控股集团股份有限公司 | Vehicle attitude estimation method |
CN117576666A (en) * | 2023-11-17 | 2024-02-20 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
CN117576666B (en) * | 2023-11-17 | 2024-05-10 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
Similar Documents
Publication | Title |
---|---|
Guo et al. | Driver drowsiness detection using hybrid convolutional neural network and long short-term memory |
CN108133188B | Behavior identification method based on motion history image and convolutional neural network |
CN110276765B | Image panorama segmentation method based on multitask learning deep neural network |
Neil et al. | Learning to be efficient: Algorithms for training low-latency, low-compute deep spiking neural networks |
CN110516536B | Weak supervision video behavior detection method based on time sequence class activation graph complementation |
CN110110689B | Pedestrian re-identification method |
CN109034264B | CSP-CNN model for predicting severity of traffic accident and modeling method thereof |
CN109145712B | Text information fused GIF short video emotion recognition method and system |
CN110276248B | Facial expression recognition method based on sample weight distribution and deep learning |
CN114241456A | Safe driving monitoring method using feature adaptive weighting |
CN115082698B | Distraction driving behavior detection method based on multi-scale attention module |
CN112861945B | Multi-mode fusion lie detection method |
Pratama et al. | Deep convolutional neural network for hand sign language recognition using model E |
CN115280373A | Managing occlusions in twin network tracking using structured dropping |
CN110633689B | Face recognition model based on semi-supervised attention network |
CN114241458B | Driver behavior recognition method based on attitude estimation feature fusion |
CN111797705A | Action recognition method based on character relation modeling |
Siddiqi | Fruit-classification model resilience under adversarial attack |
CN109190471B | Attention model method for video monitoring pedestrian search based on natural language description |
CN114492634A | Fine-grained equipment image classification and identification method and system |
CN113221683A | Expression recognition method based on CNN model in teaching scene |
CN113361466A | Multi-modal cross-directed learning-based multi-spectral target detection method |
CN113205044B | Deep fake video detection method based on characterization contrast prediction learning |
Singh | Image spam classification using deep learning |
Karthigayan et al. | Genetic algorithm and neural network for face emotion recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |