CN113298235A - Neural network architecture of multi-branch depth self-attention transformation network and implementation method - Google Patents

Neural network architecture of multi-branch depth self-attention transformation network and implementation method

Info

Publication number
CN113298235A
Authority
CN
China
Prior art keywords
branch
layer
channel
characteristic
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110648214.XA
Other languages
Chinese (zh)
Inventor
李云响 (Li Yunxiang)
王亚奇 (Wang Yaqi)
章一帆 (Zhang Yifan)
夏能 (Xia Neng)
彭睿孜 (Peng Ruizi)
唐凯 (Tang Kai)
俞定国 (Yu Dingguo)
张随雨 (Zhang Suiyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Media and Communications
Original Assignee
Zhejiang University of Media and Communications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Media and Communications filed Critical Zhejiang University of Media and Communications
Priority to CN202110648214.XA priority Critical patent/CN113298235A/en
Publication of CN113298235A publication Critical patent/CN113298235A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network architecture of a multi-branch deep self-attention transformation network and an implementation method. The architecture comprises the first 4 stages of ResNeXt, two branches, and a branch fusion module, where the two branches are a local feature branch and a global feature branch. An input image first passes through the first 4 basic stages of ResNeXt; the feature layers obtained from the two branches are then merged and passed through the branch fusion module to obtain a final feature layer, which is classified by a fully connected layer. For an image, the network architecture attends to the information of the current state of the network in the local feature branch and extracts global information of the network in the global feature branch. The multi-branch structure greatly improves the network's ability to extract information from the image, adding channel weight information in the branch fusion module improves the accuracy of the network, and the simple network structure is easy to customize and modify, increasing robustness on related image tasks.

Description

Neural network architecture of multi-branch depth self-attention transformation network and implementation method
Technical Field
The invention relates to the field of deep learning network architectures, in particular to a neural network architecture of a multi-branch deep self-attention transformation network and an implementation method.
Background
In the past few years, convolutional neural networks (CNNs) have become the predominant machine learning method for various tasks in the field of computer vision, including image recognition, object detection and semantic segmentation. A good neural network framework can generally maintain or even improve the accuracy of the results while reducing computational effort.
The deep learning network architecture is one of the main research topics for solving problems such as image classification, object detection, semantic segmentation and human pose estimation in the computer field. In the present invention, the proposed portable block is part of a deep learning network architecture.
The invention and optimization of deep learning network architectures is one of the research hotspots of current deep learning, with applications in fields such as medical treatment, autonomous vehicles and speech recognition. A good deep learning network framework helps to improve detection accuracy, accelerate network operation, and so on.
Disclosure of Invention
The network architecture provided by the invention fully extracts local information and global information in the network through a multi-branch structure, improving the information extraction capability of the network, and adds a weight representation to the channels of the feature layer to enrich its information expression capability.
The invention is mainly directed at further improving and optimizing the deep learning network architecture model, and provides a neural network architecture of a multi-branch deep self-attention transformation network and an implementation method. The backbone of the network architecture is a ResNeXt network, and the global feature branch and local feature branch of the multi-branch structure effectively optimize the network without damaging the mainstream network architecture, thereby improving it.
A neural network architecture for a multi-branch deep self-attention transformation network, comprising:
a convolution information extraction structure that receives an image;
a local feature extraction branch and a global feature extraction branch that receive the output of the convolution information extraction structure, wherein the local feature extraction branch and the global feature extraction branch are arranged in parallel;
a branch fusion module that receives the outputs of the local feature extraction branch and the global feature extraction branch;
and a fully connected layer connected with the branch fusion module.
In the invention, the convolution information extraction structure adopts the first 4 stages of the ResNeXt network.
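As an illustration, the first 4 stages can be taken from a standard ResNeXt implementation. The sketch below is a minimal example assuming torchvision's resnext50_32x4d as the variant (the patent does not name one) and cutting after layer3, which outputs the 1024-channel feature layer described later.

```python
# A minimal sketch, assuming torchvision's resnext50_32x4d as the ResNeXt
# variant (the patent does not name one). The first 4 stages are taken to be
# the stem plus layer1..layer3, since layer3 outputs the 1024 channels the
# text describes.
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

backbone = resnext50_32x4d(weights=None)
first4_stages = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,  # stage 1 (stem)
    backbone.layer1,  # stage 2
    backbone.layer2,  # stage 3
    backbone.layer3,  # stage 4 -> 1024 channels
)

image = torch.randn(1, 3, 512, 512)   # illustrative input size
feature_layer = first4_stages(image)  # -> torch.Size([1, 1024, 32, 32])
print(feature_layer.shape)
```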
The local feature extraction branch comprises grouped convolution single-channel modules. Each grouped convolution single-channel module comprises a plurality of local information extraction units, where each local information extraction unit consists of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer. There are 32 grouped convolution single-channel modules connected in parallel. The grouped convolution single-channel modules effectively reduce the parameter count of the network and give each group a learnable weight. The local feature extraction branch mainly attends to and extracts from the current feature layer, extracting new features from the previous feature layer and building a new feature layer. A sketch of this branch follows.
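A minimal sketch under the description above: 32 parallel single-channel modules, each stacking three 1 × 1 / 3 × 3 / 1 × 1 units, merged with one learnable weight per group. The per-group width of 32 channels follows from the 1024-channel input stated later; batch-normalization and ReLU placement are illustrative assumptions.

```python
# A minimal sketch of the local feature branch: 32 parallel grouped-convolution
# single-channel modules, each stacking three 1x1 -> 3x3 -> 1x1 local
# information extraction units, merged with one learnable weight per group.
import torch
import torch.nn as nn

def extraction_unit(ch: int) -> nn.Sequential:
    """One local information extraction unit: 1x1, 3x3, 1x1 convolutions."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch),
    )

class LocalFeatureBranch(nn.Module):
    def __init__(self, in_ch: int = 1024, groups: int = 32, units: int = 3):
        super().__init__()
        ch = in_ch // groups  # 32 channels per group, as stated in the text
        self.groups = groups
        self.paths = nn.ModuleList(
            [nn.Sequential(*[extraction_unit(ch) for _ in range(units)])
             for _ in range(groups)])
        self.weights = nn.Parameter(torch.ones(groups))  # learnable per-group weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(self.groups, dim=1)  # split channels evenly over the groups
        outs = [w * path(c) for w, path, c in zip(self.weights, self.paths, chunks)]
        return torch.cat(outs, dim=1)         # weighted combination back to in_ch

feature_layer = torch.randn(1, 1024, 32, 32)
print(LocalFeatureBranch()(feature_layer).shape)  # -> torch.Size([1, 1024, 32, 32])
```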
The global feature extraction branch comprises:
a downsampling convolutional layer connected with the output end of the convolutional information extraction structure;
a global feature extraction unit connected with the downsampling convolution layer;
and the up-sampling module is connected with the global feature extraction unit.
The global feature extraction unit comprises a plurality of bottleneck depth self-attention transformation modules, each comprising a 1 × 1 convolutional layer, a multi-head self-attention module (MHSA) and a 1 × 1 convolutional layer connected in this order. The bottleneck depth self-attention transformation module can produce a more interpretable model, and each attention head can learn to perform a different task. The global feature branch can model interactions between remote information in the network, improving the network's attention to and extraction of global information.
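A minimal sketch of one such module is given below, with PyTorch's nn.MultiheadAttention standing in for the relative-position MHSA detailed later in this document; the channel widths and head count are illustrative assumptions, and the residual shortcut reflects the statement below that the ResNeXt shortcut structure is maintained.

```python
# A minimal sketch of one bottleneck depth self-attention transformation
# module: 1x1 conv -> MHSA -> 1x1 conv, with a residual shortcut.
# nn.MultiheadAttention is a stand-in for the relative-position MHSA
# described later; widths and head count are assumptions.
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    def __init__(self, ch: int = 1024, mid_ch: int = 512, heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(ch, mid_ch, 1)   # first 1x1 convolutional layer
        self.attn = nn.MultiheadAttention(mid_ch, heads, batch_first=True)
        self.expand = nn.Conv2d(mid_ch, ch, 1)   # second 1x1 convolutional layer
        self.norm = nn.BatchNorm2d(ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.reduce(x)
        tokens = y.flatten(2).transpose(1, 2)          # [B, H*W, mid_ch] sequence
        tokens, _ = self.attn(tokens, tokens, tokens)  # multi-head self-attention
        y = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return torch.relu(self.norm(x + self.expand(y)))  # residual shortcut

x = torch.randn(1, 1024, 16, 16)
print(BottleneckSelfAttention()(x).shape)  # -> torch.Size([1, 1024, 16, 16])
```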
The branch fusion module comprises:
a branch characteristic connection module;
the channel relation learning branch and the reference branch are connected with the branch characteristic connection module;
the channel reweighting module is connected with the channel relation learning branch and the reference branch;
and the channel probability discarding layer is connected with the channel re-weighting module.
The channel relation learning branch comprises a pooling layer, a 1 × 1 convolutional layer, a ReLU layer, a 1 × 1 convolutional layer and a Sigmoid layer connected in sequence. The branch fusion module combines two feature layers into one, and through these layers each channel of the feature layer is given a learnable weight, optimizing the network's attention to the channels of the feature layer.
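Below is a minimal sketch of the channel relation learning branch in that layer order, applied as the channel re-weighting step; the reduction ratio (16 here) is an assumption, since the text fixes only the sequence of layers.

```python
# A minimal sketch of the channel relation learning branch: pooling, 1x1 conv,
# ReLU, 1x1 conv, Sigmoid, followed by channel re-weighting. The reduction
# ratio of 16 is an assumption.
import torch
import torch.nn as nn

class ChannelRelationBranch(nn.Module):
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.learn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # pooling layer -> [B, C, 1, 1]
            nn.Conv2d(ch, ch // reduction, 1),  # 1x1 convolutional layer
            nn.ReLU(inplace=True),              # ReLU layer
            nn.Conv2d(ch // reduction, ch, 1),  # 1x1 convolutional layer
            nn.Sigmoid(),                       # Sigmoid layer -> weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # reference branch: the feature layer itself, re-weighted per channel
        return x * self.learn(x)

fused = torch.randn(1, 4096, 32, 32)
print(ChannelRelationBranch(4096)(fused).shape)  # -> torch.Size([1, 4096, 32, 32])
```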
A method for realizing a neural network architecture of a multi-branch depth self-attention transformation network comprises the following steps:
S1, inputting the image into the first 4 stages of ResNeXt to obtain a feature layer;
S2, down-sampling the feature layer obtained in the step S1, and carrying out a batch normalization operation on the obtained feature layer;
S3, passing the new feature layer obtained in the step S2 through three bottleneck depth self-attention transformation modules, wherein each bottleneck depth self-attention transformation module comprises a 1 × 1 convolutional layer, a multi-head self-attention module and a 1 × 1 convolutional layer;
S4, up-sampling the new feature layer obtained in the step S3 to obtain the feature layer of the global feature branch;
S5, passing the feature layer obtained in the step S1 through the 5th stage of the ResNeXt network, evenly dividing the channels of the feature layer among 32 parallel grouped convolution single-channel modules;
in step S5, since the total number of channels of the feature layer is 1024, the number of channels of each new feature layer is 32.
The grouped convolution single-channel module comprises a plurality of local information extraction units, where each local information extraction unit consists of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer.
S6, weighting and combining the 32 feature layers obtained in the step S5 to obtain the feature layer of the local feature branch.
S7, merging the feature layer of the global feature branch obtained in the step S4 with the feature layer of the local feature branch obtained in the step S6 to obtain a new feature layer.
S8, passing the feature layer obtained in the step S7 through the branch fusion module so that each channel in the feature layer has a different weight, and finally through the channel probability discarding layer (dropout layer) at the end of the module to obtain a new feature layer.
S9, passing the feature layer obtained in the step S8 through a fully connected layer to obtain the result.
In step S2, the convolution kernel used has a step size of 2, a size of 3 × 3, a filling pattern of (1,1) and a kernel count of 1024. The feature layer obtained after this convolution is 32 × 32 × 1024, where 1024 is the number of channels and 32 is the length and the width, respectively.
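A minimal sketch of this down-sampling convolution plus batch normalization follows; the 64 × 64 input spatial size is an illustrative assumption chosen so that the output matches the 32 × 32 × 1024 feature layer above.

```python
# A minimal sketch of the step S2 down-sampling convolution plus batch
# normalization. The 64 x 64 input size is an illustrative assumption.
import torch
import torch.nn as nn

downsample = nn.Sequential(
    nn.Conv2d(1024, 1024, kernel_size=3, stride=2, padding=(1, 1)),
    nn.BatchNorm2d(1024),
)

feat = torch.randn(1, 1024, 64, 64)
print(downsample(feat).shape)  # halves length and width -> torch.Size([1, 1024, 32, 32])
```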
The stage-5 group convolution of the ResNeXt network participates in the multi-branch structure as the local feature branch. A branch with the same network structure, but with the 3 × 3 convolutional layer replaced by a multi-head self-attention layer (MHSA) to form a new bottleneck depth self-attention network, and with down-sampling and up-sampling operations added while everything else remains unchanged, is called the global feature branch. Because the multi-head self-attention branch can extract information across the whole network, the extent of image information extraction is improved compared with the local information extracted by the local feature branch alone.
In step S3, the obtained feature layer is taken as input; a 1 × 1 convolution changes the number of channels, and further 1 × 1 convolutions then produce the query feature layer (q), the key feature layer (k) and the value feature layer (v), respectively.
Firstly, relative position coding is carried out in the two-dimensional space, yielding a relative position coding layer with the same size and channel number as the query, key and value feature layers.
Secondly, the query feature layer and the key feature layer are dot-multiplied to obtain qk^T (k^T is the transpose of k); to prevent the softmax operation from over-amplifying keys with larger values, qk^T is divided by √C. The query feature layer is dot-multiplied with the relative position coding layer to obtain qr^T (r^T is the transpose of r); the two results are added as matrices, and a softmax operation is then performed.
Finally, the resulting feature layer is dot-multiplied with the value feature layer to obtain an output feature layer (z) with the same size as the input feature layer.
Because of the multi-head self-attention mechanism, an input feature layer goes through the above steps several times with different parameters. The resulting multiple z are combined into one feature layer, and a convolution operation keeps the obtained feature layer the same size as the input feature layer.
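A minimal single-head sketch of this computation is given below: q, k and v come from 1 × 1 convolutions, a learned position layer r matches their size, and the output is softmax(qk^T/√C + qr^T)·v with the scaling placed on qk^T as the text specifies. Using one dense learned r, rather than separate height and width embeddings as in Bottleneck Transformer implementations, is a simplifying assumption.

```python
# A minimal single-head sketch of the attention computation described above:
# z = softmax(q.k^T / sqrt(C) + q.r^T) . v, where r is a learned 2D relative
# position coding layer with the same size and channel count as q, k and v.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosSelfAttention2d(nn.Module):
    def __init__(self, ch: int, h: int, w: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch, 1)
        self.k = nn.Conv2d(ch, ch, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        # relative position coding layer, same channel count and size as q/k/v
        self.r = nn.Parameter(torch.randn(1, ch, h, w))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2)  # [B, C, N], N = H*W positions
        k = self.k(x).flatten(2)
        v = self.v(x).flatten(2)
        r = self.r.flatten(2)     # [1, C, N]
        content = torch.einsum('bcn,bcm->bnm', q, k) / math.sqrt(c)  # q.k^T / sqrt(C)
        position = torch.einsum('bcn,xcm->bnm', q, r)                # q.r^T
        attn = F.softmax(content + position, dim=-1)                 # feature key
        z = torch.einsum('bnm,bcm->bcn', attn, v)  # weight the value feature layer
        return z.reshape(b, c, h, w)               # same size as the input

x = torch.randn(2, 512, 32, 32)
print(RelPosSelfAttention2d(512, 32, 32)(x).shape)  # -> torch.Size([2, 512, 32, 32])
```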
In step S3, since the self-attention mechanism cannot itself perform strided down-sampling, a mean pooling layer with a step size of 2 and a size of 2 × 2 is used.
In step S4, the up-sampling uses a bilinear interpolation algorithm so that the new feature layer has the same length and width as the feature layer of the other branch with which it is merged.
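As a minimal sketch of this step, bilinear up-sampling can be done with F.interpolate; the tensor sizes here are illustrative assumptions.

```python
# A minimal sketch of the step S4 bilinear up-sampling; sizes are assumptions.
import torch
import torch.nn.functional as F

global_feat = torch.randn(1, 2048, 16, 16)  # after down-sampling + attention blocks
target_hw = (32, 32)                        # spatial size of the local-branch feature layer
upsampled = F.interpolate(global_feat, size=target_hw,
                          mode='bilinear', align_corners=False)
print(upsampled.shape)  # -> torch.Size([1, 2048, 32, 32])
```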
In steps S2 through S6, the residual (shortcut) structure of the ResNeXt network is still maintained.
In step S8, the branch fusion module includes:
a branch characteristic connection module;
the channel relation learning branch and the reference branch are connected with the branch characteristic connection module;
the channel reweighting module is connected with the channel relation learning branch and the reference branch;
and the channel probability discarding layer is connected with the channel re-weighting module.
The channel relation learning branch comprises a pooling layer, a 1 × 1 convolution layer, a ReLU layer, a 1 × 1 convolution layer and a Sigmoid layer which are sequentially connected.
The branch fusion module comprises: a global average pooling layer with an output size of 1 × 1 and 4096 channels; a fully connected layer with 256 output channels; a ReLU activation function; a fully connected layer with 4096 output channels; and a Sigmoid activation function. These yield the weight of each corresponding channel of the feature layer, and a new feature layer is finally obtained through the dropout layer with a random probability of 0.5.
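The following minimal sketch wires the whole branch fusion module together with the concrete sizes above (concatenation to 4096 channels, fully connected layers 4096 → 256 → 4096 with ReLU then Sigmoid, channel re-weighting, dropout 0.5); treating the concatenated tensor itself as the reference branch for re-weighting is an assumption.

```python
# A minimal sketch of the branch fusion module: branch feature connection,
# global average pooling, FC 4096 -> 256 -> 4096 with ReLU then Sigmoid,
# channel re-weighting, and a dropout layer with probability 0.5.
import torch
import torch.nn as nn

class BranchFusion(nn.Module):
    def __init__(self, ch: int = 4096, hidden: int = 256, p: float = 0.5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling, output 1x1
        self.fc = nn.Sequential(
            nn.Linear(ch, hidden), nn.ReLU(inplace=True),  # 4096 -> 256
            nn.Linear(hidden, ch), nn.Sigmoid(),           # 256 -> 4096 weights
        )
        self.drop = nn.Dropout(p)            # channel probability discarding layer

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([local_feat, global_feat], dim=1)  # branch feature connection
        w = self.fc(self.pool(x).flatten(1))             # per-channel weights
        x = x * w.unsqueeze(-1).unsqueeze(-1)            # channel re-weighting
        return self.drop(x)

local_feat = torch.randn(1, 2048, 32, 32)
global_feat = torch.randn(1, 2048, 32, 32)
print(BranchFusion()(local_feat, global_feat).shape)  # -> torch.Size([1, 4096, 32, 32])
```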
The method comprises the following steps: 1) the first 4 stages of ResNeXt, two branches and a branch fusion module, where the two branches are the local feature branch and the global feature branch, respectively. 2) The local feature branch in the step 1) is built on the group convolution structure in ResNeXt and comprises 32 convolution channels; each convolution channel comprises three identical bottleneck convolution blocks, and each bottleneck convolution block comprises three convolutional layers; the feature layers obtained by the convolution channels are finally weighted and combined to obtain the feature layer of the local feature branch. 3) The global feature branch in the step 1) is constructed on the basis of the Bottleneck Transformer network and comprises a first down-sampling, three bottleneck depth self-attention transformation network blocks, and a final up-sampling, yielding the feature layer of the global feature branch. 4) The branch fusion module in the step 1) is designed on the basis of the Squeeze-and-Excitation block; its structure gives each channel in the feature layer a different weight, and a dropout layer is added at the end of the module to reduce the calculation amount of the network. 5) The image is first input into the first 4 basic stages of ResNeXt in the step 1); the feature layers obtained in the step 2) and the step 3) are then merged, the final feature layer is obtained through the branch fusion module, and classification is performed through the fully connected layer.
Compared with the prior art, the invention has the following advantages:
the invention optimizes the information extraction capability of the network through a multi-branch structure, focuses on global network information and relatively balances local network information, and strengthens the channel information expression of the characteristic layer by adding weight to the channel of the characteristic layer.
The network architecture of the invention focuses on the information of the current state of the network in the local characteristic branch aiming at the image and extracts the global information of the network in the global characteristic branch, the multi-branch structure greatly improves the information extraction capability of the network to the image, the accuracy of the network is improved by adding channel weight information in the branch fusion module, and the simple network structure is easy to self-define and modify, and the robustness to related image tasks is increased.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram of a network architecture according to the present invention;
FIG. 3 is a schematic diagram of the multi-head self-attention mechanism according to the present invention;
fig. 4 is a comparison of the results of the inventive network and other classical networks.
Detailed Description
As shown in fig. 1, an implementation method of a neural network architecture of a multi-branch deep self-attention transformation network includes the following steps:
and S1, inputting the image into the first 4 stages of ResNeXt to obtain the characteristic layer.
S2, the feature layer obtained in step S1 is down-sampled, and the obtained feature layer is subjected to batch normalization.
And S3, passing the new feature layer obtained in the step S2 through three bottleneck depth self-attention transformation network blocks, wherein each bottleneck depth self-attention transformation network block comprises a 1 × 1 convolutional layer, a multi-head self-attention layer and a 1 × 1 convolutional layer.
And S4, upsampling the new feature layer obtained in the step S3 to obtain a feature layer of a global feature branch.
S5, passing the feature layer obtained in step S1 through stage 5 of the resenext network, i.e. dividing the channels of the feature layer into 32 channels, because the total channel number of the feature layer is 1024, the channel number of each new feature layer is 32.
And S6, passing the feature layer obtained in the step S4 through a convolution channel, wherein the convolution channel comprises three identical bottleneck convolution blocks, and each bottleneck convolution block comprises three convolution layers.
And S7, weighting and combining the 32 characteristic layers obtained in the step S5 to obtain the characteristic layer of the local characteristic branch.
And S8, merging the feature layer of the global feature branch obtained in the steps S4 and S7 with the feature layer of the local feature branch to obtain a new feature layer.
And S9, passing the feature layer obtained in the step S8 through a branch fusion module, wherein the branch fusion module comprises a pooling layer, a full connection layer, a ReLU layer, a full connection layer and a sigmoid layer, so that each channel in the feature layer has different weights, and finally passing through a dropout layer at the tail end of the module to obtain a new feature layer.
And S10, passing the characteristic layer obtained in the step S9 through a full connection layer to obtain a result.
In step 2), the convolution kernel used has a step size of 2, a size of 3 × 3, a filling pattern of (1,1) and a kernel count of 1024. The feature layer obtained after this convolution is 32 × 32 × 1024, where 1024 is the number of channels and 32 is the length and the width, respectively.
In step 3), the obtained feature layer is taken as input; a 1 × 1 convolution changes the number of channels, and further 1 × 1 convolutions then produce the query feature layer (q), the key feature layer (k) and the value feature layer (v), respectively.
Firstly, relative position coding is carried out in the two-dimensional space, yielding a relative position coding layer with the same size and channel number as the query, key and value feature layers.
Secondly, the query feature layer and the key feature layer are dot-multiplied to obtain qk^T (k^T is the transpose of k); to prevent the softmax operation from over-amplifying keys with larger values, qk^T is divided by √C. The query feature layer is dot-multiplied with the relative position coding layer to obtain qr^T (r^T is the transpose of r); the two results are added as matrices, and a softmax operation is then performed.
Finally, the resulting feature layer is dot-multiplied with the value feature layer to obtain an output feature layer (z) with the same size as the input feature layer.
Because of the multi-head self-attention mechanism, an input feature layer goes through the above steps several times with different parameters. The resulting multiple z are combined into one feature layer, and a convolution operation keeps the obtained feature layer the same size as the input feature layer.
In step 3), since the self-attention mechanism cannot itself perform strided down-sampling, a mean pooling layer with a step size of 2 and a size of 2 × 2 is used.
The stage-5 group convolution of the ResNeXt network participates in the multi-branch structure as the local feature branch. A branch with the same network structure, but with the 3 × 3 convolutional layer replaced by a multi-head self-attention layer (MHSA) to form a new bottleneck depth self-attention network, and with down-sampling and up-sampling operations added while everything else remains unchanged, is called the global feature branch. Because the multi-head self-attention branch can extract information across the whole network, the extent of image information extraction is improved compared with the local information extracted by the local feature branch alone.
In step 4), the up-sampling uses a bilinear interpolation algorithm so that the new feature layer has the same length and width as the feature layer of the other branch with which it is merged.
In steps 2) to 7), the residual (shortcut) structure of the ResNeXt network is still maintained.
In step 9), the feature layer obtained in step 8) passes through the branch fusion module, which comprises a pooling layer, a fully connected layer, a ReLU layer, a fully connected layer and a Sigmoid layer, so that each channel in the feature layer has a different weight; a new feature layer is finally obtained through the dropout layer at the end of the module.
The branch fusion module comprises: a global average pooling layer with an output size of 1 × 1 and 4096 channels; a fully connected layer with 256 output channels; a ReLU activation function; a fully connected layer with 4096 output channels; and a Sigmoid activation function. These yield the weight of each corresponding channel of the feature layer, and a new feature layer is finally obtained through the dropout layer with a random probability of 0.5.
As shown in fig. 2, the network block is implemented specifically as follows:
1) ResNeXt serves as the basic backbone. ResNeXt usually has 5 block groups; the first 4 block groups are left unchanged, and the multi-branch structure is added at the 5th block group.
2) The multi-branch structure is divided into a local feature branch and a global feature branch. The local feature branch is the 5th block group of ResNeXt, whose group convolution is left unchanged. The size of the feature layer finally output by the 4th block group is (1024, 1024), where the first 1024 is the product of the length and the width of the feature layer and the second 1024 is the number of channels. In the global feature branch, the feature layer is first down-sampled: the convolution kernel used has a step size of 2, a size of 3 × 3, a kernel count of 1024 and a filling mode of (1,1); the size of the new feature layer obtained after convolution is (1024, 1024).
3) The new feature layer is convolved with kernels of step size 1, size 1 × 1 and count 512, and normalized after convolution; the size of the resulting new feature layer is (1024, 512).
4) Groups of feature layers are obtained through 3a convolutions (a is the number of heads) with kernel size 1 × 1, step length 1 and a kernel count of C; each group contains a query feature layer (q), a key feature layer (k) and a value feature layer (v), each of size (1024, C).
Secondly, the query feature layer and the key feature layer are dot-multiplied to obtain qk^T of size 32 × 32; to prevent the softmax operation from over-amplifying keys with larger values, qk^T is divided by √C. The query feature layer is dot-multiplied with the relative position coding layer to obtain qr^T (r^T is the transpose of r); the two are added as matrices to obtain the feature key, and a softmax operation is then performed.
Finally, the resulting feature layer is dot-multiplied with the value feature layer to obtain an output feature layer (z) with the same size as the input feature layer. Since the self-attention mechanism cannot itself perform strided down-sampling, an average pooling layer of size 2 × 2 with a step size of 2 × 2 is used. Since a heads are used, a numerically different feature layers z are generated. The a feature layers are combined into one feature layer with a × 512 channels, and a convolution with 512 kernels reduces the channel number of the feature layer back to 512.
5) The feature layer is convolved with kernels of step size 1, size 1 × 1 and count 2048, and normalized after convolution; the size of the resulting new feature layer is (512, 2048).
6) Steps 3) to 5) form one block, and the operation is repeated 2 more times to obtain a new feature layer.
7) The new feature layer is up-sampled so that it has the same length and width as the feature layer it joins in the original network. Since this cannot be accomplished by an ordinary convolution operation, a bilinear interpolation algorithm is used to fill in the linear relationships among pixels and obtain the new feature layer.
8) The feature layer obtained in step 7) and the feature layer from the local feature branch enter the branch fusion module. They are first merged to obtain a feature layer of size (1024, 4096); a global average pooling layer yields a feature layer of size (1, 4096); a fully connected layer yields a feature layer of size (1, 256); after a ReLU activation function, a fully connected layer yields a feature layer of size (1, 4096), and a Sigmoid activation function then yields the channel weights. Because the oversized feature layer increases the amount of calculation, a dropout layer with a random probability of 0.5 follows, reducing the calculation amount and producing the new feature layer.
9) The new feature layer finally passes through the fully connected layer to obtain the required result.
Fig. 4 shows the results of the network of the present invention compared with other classical networks, where AGMB-Transformer (Anatomy-Guided Multi-Branch Transformer) is the abbreviated name of the multi-branch depth self-attention transformation network. The network models compared are ResNet50, SE-ResNet50, SE-ResNeXt50, InceptionV3, ViT and the model of the invention (AGMB-Transformer). The evaluation criteria are accuracy (ACC), area under the receiver operating characteristic curve (AUC), sensitivity (SEN), specificity (SPC) and F1 score. The training and testing data set is derived from a root canal treatment data set with a total of 245 root canal images. It can be seen from the figure that all scores of the model of the present invention are greater or significantly greater than those of the other classical networks.
The above-described embodiments of the present invention do not limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A neural network architecture for a multi-branch deep self-attention transformation network, comprising:
a convolution information extraction structure that receives an image;
a local feature extraction branch and a global feature extraction branch that receive the output of the convolution information extraction structure, wherein the local feature extraction branch and the global feature extraction branch are arranged in parallel;
a branch fusion module that receives the outputs of the local feature extraction branch and the global feature extraction branch;
and a fully connected layer connected with the branch fusion module.
2. The neural network architecture of the multi-branch deep self-attention transforming network of claim 1, wherein the convolution information extraction structure employs the first 4 stages of the ResNeXt network.
3. The neural network architecture of the multi-branch deep self-attention transforming network according to claim 1, wherein the local feature extraction branch comprises grouped convolution single-channel modules, each grouped convolution single-channel module comprising a plurality of local information extraction units, wherein each local information extraction unit consists of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer.
4. The neural network architecture of the multi-branch deep self-attention transform network of claim 3, wherein the number of the grouped convolution single channel modules is 32 connected in parallel.
5. The neural network architecture of the multi-branch deep self-attention transforming network according to claim 1, wherein the global feature extraction branch comprises:
a downsampling convolutional layer connected with the output end of the convolutional information extraction structure;
a global feature extraction unit connected with the downsampling convolution layer;
and the up-sampling module is connected with the global feature extraction unit.
6. The neural network architecture of the multi-branch deep self-attention transforming network according to claim 5, wherein the global feature extraction unit comprises a plurality of bottleneck depth self-attention transformation modules, each bottleneck depth self-attention transformation module comprising a 1 × 1 convolutional layer, a multi-head self-attention module and a 1 × 1 convolutional layer which are connected in sequence.
7. The neural network architecture of the multi-branch deep self-attention transforming network according to claim 1, wherein the branch fusion module comprises:
a branch characteristic connection module;
the channel relation learning branch and the reference branch are connected with the branch characteristic connection module;
the channel reweighting module is connected with the channel relation learning branch and the reference branch;
and the channel probability discarding layer is connected with the channel re-weighting module.
8. The neural network architecture of the multi-branch deep self-attention transforming network of claim 7, wherein the channel relationship learning branch comprises a pooling layer, a 1 x 1 convolutional layer, a ReLU layer, a 1 x 1 convolutional layer, and a Sigmoid layer, which are connected in sequence.
9. The method for implementing the neural network architecture of the multi-branch deep self-attention transformation network according to any one of claims 1 to 8, comprising the following steps:
S1, inputting the image into the first 4 stages of ResNeXt to obtain a feature layer;
S2, down-sampling the feature layer obtained in the step S1, and carrying out a batch normalization operation on the obtained feature layer;
S3, passing the new feature layer obtained in the step S2 through three bottleneck depth self-attention transformation modules;
S4, up-sampling the new feature layer obtained in the step S3 to obtain the feature layer of the global feature branch;
S5, passing the feature layer obtained in the step S1 through the 5th stage of the ResNeXt network, evenly dividing the channels of the feature layer among 32 parallel grouped convolution single-channel modules;
S6, weighting and combining the 32 feature layers obtained in the step S5 to obtain the feature layer of the local feature branch;
S7, merging the feature layer of the global feature branch obtained in the step S4 with the feature layer of the local feature branch obtained in the step S6 to obtain a new feature layer;
S8, passing the feature layer obtained in the step S7 through the branch fusion module so that each channel in the feature layer has a different weight, and finally through the channel probability discarding layer at the end of the module to obtain a new feature layer;
and S9, passing the feature layer obtained in the step S8 through a fully connected layer to obtain the result.
10. The method according to claim 9, wherein in the step S3, each bottleneck depth self-attention transformation module comprises a 1 × 1 convolutional layer, a multi-head self-attention module and a 1 × 1 convolutional layer;
in the step S5, each grouped convolution single-channel module comprises a plurality of local information extraction units, wherein each local information extraction unit consists of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 1 × 1 convolutional layer;
in step S8, the branch fusion module includes:
a branch characteristic connection module;
the channel relation learning branch and the reference branch are connected with the branch characteristic connection module;
the channel reweighting module is connected with the channel relation learning branch and the reference branch;
a channel probability discarding layer connected to the channel re-weighting module;
the channel relation learning branch comprises a pooling layer, a 1 × 1 convolution layer, a ReLU layer, a 1 × 1 convolution layer and a Sigmoid layer which are sequentially connected.
CN202110648214.XA 2021-06-10 2021-06-10 Neural network architecture of multi-branch depth self-attention transformation network and implementation method Pending CN113298235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110648214.XA CN113298235A (en) 2021-06-10 2021-06-10 Neural network architecture of multi-branch depth self-attention transformation network and implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110648214.XA CN113298235A (en) 2021-06-10 2021-06-10 Neural network architecture of multi-branch depth self-attention transformation network and implementation method

Publications (1)

Publication Number Publication Date
CN113298235A true CN113298235A (en) 2021-08-24

Family

ID=77327903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110648214.XA Pending CN113298235A (en) 2021-06-10 2021-06-10 Neural network architecture of multi-branch depth self-attention transformation network and implementation method

Country Status (1)

Country Link
CN (1) CN113298235A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710826A (en) * 2018-04-13 2018-10-26 燕山大学 A kind of traffic sign deep learning mode identification method
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN112447265A (en) * 2020-11-25 2021-03-05 太原理工大学 Lysine acetylation site prediction method based on modular dense convolutional network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710826A (en) * 2018-04-13 2018-10-26 燕山大学 A kind of traffic sign deep learning mode identification method
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN112447265A (en) * 2020-11-25 2021-03-05 太原理工大学 Lysine acetylation site prediction method based on modular dense convolutional network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHANGQIAN YU ET AL.: "BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation", 《ARXIV:1808.00897》, pages 1 - 17 *
SAINING XIE ET AL.: "Aggregated Residual Transformations for Deep Neural Networks", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, pages 5987 - 5995 *
YUNXIANG LI ET AL.: "AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS 》, vol. 26, no. 4, pages 1684, XP011906040, DOI: 10.1109/JBHI.2021.3129245 *
YUNXIANG LI ET AL.: "Anatomy-Guided Parallel Bottleneck Transformer Network for Automated Evaluation of Root Canal Therapy", 《ARXIV:2105.00381V1》, pages 1 - 15 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299542A (en) * 2021-12-29 2022-04-08 北京航空航天大学 Video pedestrian re-identification method based on multi-scale feature fusion
CN114092833A (en) * 2022-01-24 2022-02-25 长沙理工大学 Remote sensing image classification method and device, computer equipment and storage medium
WO2024055952A1 (en) * 2022-09-16 2024-03-21 华为技术有限公司 Data processing method and apparatus thereof
CN116704328A (en) * 2023-04-24 2023-09-05 中国科学院空天信息创新研究院 Ground object classification method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111767979B (en) Training method, image processing method and image processing device for neural network
CN112052886B (en) Intelligent human body action posture estimation method and device based on convolutional neural network
Gao et al. Global second-order pooling convolutional networks
CN113298235A (en) Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN112580694B (en) Small sample image target recognition method and system based on joint attention mechanism
CN111696027A (en) Multi-modal image style migration method based on adaptive attention mechanism
CN111401294B (en) Multi-task face attribute classification method and system based on adaptive feature fusion
CN112613479B (en) Expression recognition method based on light-weight streaming network and attention mechanism
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN112330719A (en) Deep learning target tracking method based on feature map segmentation and adaptive fusion
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN113516133A (en) Multi-modal image classification method and system
CN110956575A (en) Method and device for converting image style and convolution neural network processor
CN112733716A (en) SROCRN network-based low-resolution text image identification method
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN114519383A (en) Image target detection method and system
Dogan A new global pooling method for deep neural networks: Global average of top-k max-pooling
CN112801029A (en) Multi-task learning method based on attention mechanism
CN115171052B (en) Crowded crowd attitude estimation method based on high-resolution context network
CN117033985A (en) Motor imagery electroencephalogram classification method based on ResCNN-BiGRU
CN116246110A (en) Image classification method based on improved capsule network
CN116758415A (en) Lightweight pest identification method based on two-dimensional discrete wavelet transformation
CN113688946B (en) Multi-label image recognition method based on spatial correlation
CN115861841A (en) SAR image target detection method combined with lightweight large convolution kernel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination