CN110378484A

CN110378484A - An attention-based atrous spatial pyramid pooling context learning method

Info

Publication number: CN110378484A
Application number: CN201910351669.8A
Authority: CN
Inventors: 王吴凡; 朱纪洪; 匡敏驰; 陈吕劼; 闫星辉
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-10-25

Abstract

The invention discloses a context learning method for atrous spatial pyramid pooling (ASPP) based on an attention mechanism, comprising an ASPP model and an attention model. The ASPP model consists of multiple parallel atrous convolution paths with different dilation rates and is used to extract multi-scale contextual information. The attention model uses a nonlinear function to characterize the relationships among the contextual information in different channels, and then assigns a weight to the multi-scale information in each channel. The method enhances the contextual feature learning ability of the ASPP model and can be flexibly embedded in neural network models. It is suited to tasks such as semantic image segmentation, object detection, and image classification, and is suitable for wide application.

Description

An attention-based atrous spatial pyramid pooling context learning method

Technical field

The invention belongs to the field of deep learning, and in particular relates to a context learning method for atrous spatial pyramid pooling (ASPP) based on an attention mechanism.

Background art

The atrous spatial pyramid pooling (ASPP) model extracts multi-scale contextual information through multiple parallel atrous convolutions with different dilation rates, and then fuses the channels linearly with a 1 × 1 convolution. However, multi-scale contextual information usually lies on a nonlinear manifold, so a linear function alone is insufficient to capture the nonlinear relationships among the scales. As a result, the ASPP model cannot extract multi-scale contextual information effectively.
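The linear fusion criticized above can be illustrated with a minimal NumPy sketch. A 1 × 1 convolution is a per-pixel matrix product over the channel axis, so its mixing weights are fixed and do not depend on the input. The function name `pointwise_conv`, the tensor shapes, and the random data are assumptions made for illustration only; they are not taken from the patent.

```python
import numpy as np

def pointwise_conv(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    # Every output channel is the same linear combination of input channels at each pixel.
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4, 4))      # hypothetical concatenated multi-scale features
weight = rng.standard_normal((3, 6))    # hypothetical fusion weights
bias = np.zeros(3)

fused = pointwise_conv(x, weight, bias)

# With zero bias the fusion is strictly linear: doubling the input doubles the output.
fused_doubled = pointwise_conv(2.0 * x, weight, bias)
```

Because the same `weight` is applied whatever the input looks like, this fusion cannot reweight scales per input, which is the limitation the attention model is meant to address.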

Summary of the invention

To overcome the difficulty that the above atrous spatial pyramid pooling (ASPP) model has in characterizing the nonlinear relationships among different channels, the invention provides a context learning method for ASPP based on an attention mechanism.

The method belongs to the field of deep learning and is characterized by comprising an ASPP model and an attention model. The ASPP model consists of multiple parallel atrous convolution paths with different dilation rates and is used to extract multi-scale contextual information. The attention model uses a nonlinear function to characterize the relationships among the contextual information in different channels and then assigns weights to the multi-scale contextual information, which increases the ability of the ASPP model to learn multi-scale contextual information.
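To see why paths with different dilation rates capture context at different scales, note that a k × k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side. The short sketch below evaluates that span. The 3 × 3 kernel size and the rates 1, 6, 12, 18 are assumptions chosen for illustration; the text above does not fix them.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length, in pixels, covered by a dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Hypothetical dilation rates, one per parallel path.
rates = [1, 6, 12, 18]
extents = {d: effective_kernel_size(3, d) for d in rates}

for d, extent in extents.items():
    print(f"dilation {d:2d}: 3x3 kernel covers {extent}x{extent} pixels")
```

Each path uses the same number of weights, but the larger rates sample a wider neighbourhood, which is what makes the concatenated output multi-scale.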

The atrous spatial pyramid pooling (ASPP) model is characterized in that a single path of the model can be formalized as:

$$u^{d}_{p} = \sum_{c} \sum_{(i,j) \in G} w_{c,(i,j)} \, x_{c,\,p+d(i,j)} + b$$

where $p$ is the position index of the pixel corresponding to the center of the convolution kernel, $c$ is the channel index of the input, $d$ is the dilation rate, $w_{c,(i,j)}$ is the convolution kernel weight for the given channel and position, $x_{c,\,p+d(i,j)}$ is the pixel value for the given channel and position, $G$ is the sampling grid, and $b$ is the bias term. The symbol $u^{d}_{p}$ for the path output is notation adopted here, since the formula is not reproduced in this text. The input feature map $x$ is processed by the ASPP model to form multi-scale feature maps, one for each path, where $u^{d_n}$ is the output of the path with dilation rate $d_n$. The multi-scale feature maps are concatenated to form the input $x_{ASPP}$ of the attention model.
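The single-path formula and the concatenation step can be sketched in NumPy as follows. This is a minimal illustration rather than the patented implementation: each path produces one output channel to match the scalar bias in the formula, zero padding at the borders is an assumption, and the names `dilated_conv_path` and `aspp_features`, the shapes, and the rates are all hypothetical.

```python
import numpy as np

def dilated_conv_path(x, w, b, d):
    """One atrous path. x: (C, H, W), w: (C, k, k), b: scalar, d: dilation rate.

    Returns an (H, W) map; zero padding (an assumption) keeps the spatial size.
    """
    C, H, W = x.shape
    k = w.shape[-1]
    r = k // 2                    # sampling grid G = {-r, ..., r} x {-r, ..., r}
    pad = r * d
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.full((H, W), float(b))
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            # Window holding x_{c, p + d(i, j)} for every position p at once.
            window = xp[:, pad + d * i: pad + d * i + H,
                           pad + d * j: pad + d * j + W]
            # Sum over channels c of w_{c,(i,j)} * x_{c, p + d(i, j)}.
            out += np.einsum("c,chw->hw", w[:, i + r, j + r], window)
    return out

def aspp_features(x, weights, biases, rates):
    """Run every path and stack the results into the attention input x_ASPP."""
    return np.stack([dilated_conv_path(x, w, b, d)
                     for w, b, d in zip(weights, biases, rates)])

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))            # hypothetical input feature map
rates = [1, 2, 4]                               # hypothetical dilation rates
weights = [rng.standard_normal((4, 3, 3)) for _ in rates]
biases = [0.0, 0.0, 0.0]

x_aspp = aspp_features(x, weights, biases, rates)   # shape (3, 16, 16)
```

The explicit double loop mirrors the sum over the grid G in the formula; a production model would normally use a framework's built-in dilated convolution with several output channels per path.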

The input of the attention model passes through global pooling and then through two fully connected layers, each followed by an activation function, to obtain the weights $z$ of the different channels:

$$z = \delta_2\left(W_2\,\delta_1\left(W_1 y\right)\right)$$

where $y_c$ is the global pooling value of channel $c$, computed from the values of the attention model input $x_{ASPP}$ at every position $(h, w)$ in channel $c$; $y$ is the tensor obtained by concatenating the pooling values of all channels; $\delta_1$ and $\delta_2$ are activation functions; and $W_1$ and $W_2$ are the weights of the fully connected layers. The input $x_{ASPP}$ of the attention model is multiplied by the weights $z$ to obtain the multi-scale contextual feature map $X$.
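The attention step can be sketched as below. Several choices are assumptions not stated in the text: global average pooling as the pooling operation, ReLU for the first activation, sigmoid for the second, no bias terms in the fully connected layers, and a hidden width of 2. The function name `channel_attention` and the random weights are likewise hypothetical.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x_aspp, W1, W2):
    """x_aspp: (C, H, W); W1: (hidden, C); W2: (C, hidden). Returns (X, z)."""
    # Global pooling: one value y_c per channel (average pooling is an assumption).
    y = x_aspp.mean(axis=(1, 2))
    # z = delta_2(W2 delta_1(W1 y)); ReLU and sigmoid are assumed activations.
    z = sigmoid(W2 @ relu(W1 @ y))
    # Scale every channel of the input by its weight.
    X = z[:, None, None] * x_aspp
    return X, z

rng = np.random.default_rng(0)
x_aspp = rng.standard_normal((8, 16, 16))   # hypothetical concatenated ASPP output
W1 = rng.standard_normal((2, 8))            # hypothetical first fully connected layer
W2 = rng.standard_normal((8, 2))            # hypothetical second fully connected layer

X, z = channel_attention(x_aspp, W1, W2)
```

Unlike a fixed 1 × 1 convolution, the weights `z` are recomputed from each input through a nonlinear function, so the emphasis placed on each scale can change from one input to the next.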

Brief description of the drawings

Fig. 1 is a schematic diagram of the attention-based atrous spatial pyramid pooling context learning method of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawing and an embodiment. The drawing described here is provided for further understanding of the invention and forms part of this application; it does not limit the invention.

A schematic diagram of the attention-based atrous spatial pyramid pooling (ASPP) context learning method is shown in Fig. 1. The method is characterized by comprising an ASPP model and an attention model. The ASPP model consists of multiple parallel atrous convolution paths with different dilation rates and is used to extract multi-scale contextual information. The attention model uses a nonlinear function to characterize the relationships among the contextual information in different channels and then assigns weights to the multi-scale contextual information, which increases the ability of the ASPP model to learn multi-scale contextual information. The channel weights $z$ are computed as:

$$z = \delta_2\left(W_2\,\delta_1\left(W_1 y\right)\right)$$

The specific embodiment described above further explains the purpose, technical solution, and beneficial effects of the invention in detail. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A context learning method for atrous spatial pyramid pooling (ASPP) based on an attention mechanism, characterized by comprising an ASPP model and an attention model, wherein the ASPP model consists of multiple parallel atrous convolution paths with different dilation rates and is used to extract multi-scale contextual information, and the attention model uses a nonlinear function to characterize the relationships among the contextual information in different channels and then assigns weights to the multi-scale contextual information, increasing the ability of the ASPP model to learn multi-scale contextual information.

2. The atrous spatial pyramid pooling (ASPP) model according to claim 1, characterized in that a single path of the ASPP model can be formalized as:

$$u^{d}_{p} = \sum_{c} \sum_{(i,j) \in G} w_{c,(i,j)} \, x_{c,\,p+d(i,j)} + b$$

where $p$ is the position index of the pixel corresponding to the center of the convolution kernel, $c$ is the channel index of the input, $d$ is the dilation rate, $w_{c,(i,j)}$ is the convolution kernel weight for the given channel and position, $x_{c,\,p+d(i,j)}$ is the pixel value for the given channel and position, $G$ is the sampling grid, and $b$ is the bias term. The symbol $u^{d}_{p}$ for the path output is notation adopted here, since the formula is not reproduced in this text. The input feature map $x$ is processed by the ASPP model to form multi-scale feature maps, one for each path, where $u^{d_n}$ is the output of the path with dilation rate $d_n$. The multi-scale feature maps are concatenated to form the input $x_{ASPP}$ of the attention model.

The input of the attention model passes through global pooling and then through two fully connected layers, each followed by an activation function, to obtain the weights $z$ of the different channels:

$$z = \delta_2\left(W_2\,\delta_1\left(W_1 y\right)\right)$$

where $y_c$ is the global pooling value of channel $c$, computed from the values of the attention model input $x_{ASPP}$ at every position $(h, w)$ in channel $c$; $y$ is the tensor obtained by concatenating the pooling values of all channels; $\delta_1$ and $\delta_2$ are activation functions; and $W_1$ and $W_2$ are the weights of the fully connected layers. The input $x_{ASPP}$ of the attention model is multiplied by the weights $z$ to obtain the multi-scale contextual feature map $X$.