CN114067153A - Image classification method and system based on parallel dual-attention lightweight residual network - Google Patents

Image classification method and system based on parallel dual-attention lightweight residual network

Info

Publication number
CN114067153A
CN114067153A (application CN202111290845.5A)
Authority
CN
China
Prior art keywords
characteristic information
information matrix
attention
input
output
Prior art date
Legal status
Granted
Application number
CN202111290845.5A
Other languages
Chinese (zh)
Other versions
CN114067153B (en)
Inventor
骆爱文
路畅
黄蓓蓓
李媛
王芮
Current Assignee
Jinan University
Original Assignee
Jinan University
Priority date
Filing date
Publication date
Application filed by Jinan University
Priority to CN202111290845.5A
Publication of CN114067153A
Application granted
Publication of CN114067153B
Legal status: Active
Anticipated expiration


Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/048: Neural networks; architecture; activation functions
    • G06N3/08: Neural networks; learning methods


Abstract

The invention provides an image classification method and system based on a parallel dual-attention lightweight residual network. The method optimizes the structure of the residual network's convolution kernels and extracts a feature information matrix from the input image; applies global average pooling to the feature information matrix output by the parallel dual-attention lightweight residual structure, integrating spatial information across all layers and converting it into a one-dimensional feature information matrix; and inputs this one-dimensional matrix into a fully-connected layer to obtain a matrix whose size equals the number of classes in the classification task, from which the image classification result is output. By compressing the residual network and adopting a dual-branch spatial-channel attention mechanism, the invention reduces parameters and computation and raises processing speed while preserving accuracy, thereby improving the overall efficiency of deep-neural-network-based target classification and recognition.

Description

Image classification method and system based on parallel dual-attention lightweight residual network
Technical Field
The invention relates to the field of image processing, and in particular to an image classification method and system based on a parallel dual-attention lightweight residual network.
Background
The purpose of image object classification is to locate objects against a background image and output their corresponding categories. The ResNet residual network is a deep neural network widely used today in image classification. By means of its residual layers, ResNet effectively alleviates the accuracy degradation caused by deepening the network, making the training of deep neural networks feasible, so that deeper networks can be used to obtain better recognition results. Residual layers also allow neural networks to easily exceed hundreds or even thousands of layers, yielding stronger image feature expression and target classification and recognition capability.
A method for classifying chip-defect images based on a ResNet network is disclosed in publication CN113076989A (published 2021-07-06). Classification is carried out through a ResNet network as follows: the obtained sample data is divided into a training set, a validation set and a test set; the samples are preprocessed; the training and validation sets of the processed sample images are used to train the constructed network model; the trained model is then used as the test model, the remaining test set is fed through the test network, and the classification result is finally output through an activation function. Preprocessing avoids the heavy computation and time cost of processing the full-size original sample images; data augmentation reduces the risk of over-fitting; and adding more feature information improves classification performance. The ResNet residual block addresses gradient vanishing, gradient explosion and the degradation of learning efficiency.
This method classifies images with a ResNet network; however, the parameters and floating-point operations (FLOPs) of ResNet remain high, which entails a large amount of computation and leads to a low computation speed as measured by frame rate (fps). Moreover, once a ResNet network becomes very deep, further gains in classification accuracy are very limited, which cannot meet the development requirements of edge machine-vision applications.
Disclosure of Invention
To overcome the drawbacks of the existing ResNet residual network, namely its large number of parameters and heavy computation, which slow down inference, and the loss of recognition accuracy when the network is compressed, the invention provides an image classification method and system based on a parallel dual-attention lightweight residual network.
In order to solve the technical problems, the technical scheme of the invention is as follows:
In a first aspect, the invention provides an image classification method based on a parallel dual-attention lightweight residual network, comprising the following steps:
S1: input the image into the residual network and preprocess it.
S2: optimize the structure of the residual network's convolution kernels and extract a feature information matrix from the input image.
S3: apply batch normalization to the feature information matrix and perform nonlinear activation.
S4: process the feature information matrix obtained in S3 with channel attention and spatial attention in parallel, and output a new feature information matrix.
S5: apply global average pooling to the feature information matrix obtained in S4, integrating spatial information across all layers and converting it into a one-dimensional feature information matrix.
S6: input the one-dimensional feature information matrix into the fully-connected layer to obtain a matrix whose size equals the number of classes in the classification task, and output the image classification result.
Preferably, the preprocessing in S1 includes resizing the image to a uniform size by padding or cropping.
Preferably, S2 specifically includes: splitting an A × A large convolution kernel at the residual network input layer into several serially connected layers of small symmetric convolution kernels, then feeding the preprocessed input image through these layers in sequence to extract the feature information matrix of the input image; each small symmetric convolution kernel has size B × B, with A > B ≥ 1.
Preferably, S2 specifically includes: decomposing the A × A large convolution kernel at the residual network input layer into one layer of A × 1 and one layer of 1 × A asymmetric convolution kernels connected in sequence, then feeding the preprocessed input image through them in turn to extract the feature information matrix of the input image; A is a positive integer greater than 1.
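A back-of-envelope comparison shows why both decompositions in S2 save weights. Assuming C input and C output channels throughout (C = 48 is an assumed width for illustration, and biases are ignored):

```python
def conv_params(kh, kw, in_ch, out_ch):
    """Weight count of one convolution layer with a kh x kw kernel."""
    return kh * kw * in_ch * out_ch

C = 48
large = conv_params(7, 7, C, C)                            # one 7x7 kernel:    49*C*C
stacked = 3 * conv_params(3, 3, C, C)                      # three 3x3 layers:  27*C*C
asym = conv_params(7, 1, C, C) + conv_params(1, 7, C, C)   # 7x1 then 1x7:      14*C*C
print(large, stacked, asym)  # 112896 62208 32256
```

Under these assumptions the asymmetric 7 × 1 plus 1 × 7 pair uses the fewest weights, consistent with the patent's preference for it.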
Preferably, S3 specifically includes the following:
s3.1: the characteristic information matrix is subjected to batch normalization processing, and the calculation formula is as follows:
Figure BDA0003334698350000021
wherein, FoutputAn output characteristic information matrix representing batch normalization processing; finputAn input characteristic information matrix representing batch normalization processing; mean () represents the Mean calculation; var (·) denotes variance calculation; eps indicates the introduction of errors and avoids the denominator being zero; gamma represents a scaling factor; β represents a characteristic translation factor;
s3.2: output characteristic information matrix F for batch normalization processing by ReLU functionoutputAnd carrying out nonlinear activation, wherein the output size of the nonlinear activation is kept unchanged, and the formula is as follows:
Figure BDA0003334698350000031
s3.3: down-sampling the characteristic information matrix subjected to nonlinear activation through maximum pooling operation, and changing the output size of the characteristic information matrix;
s3.4: and (4) performing 1 × 1 convolution operation on the matrix obtained in the step (S3.3), performing batch normalization processing, and activating by using a ReLU function to obtain a new characteristic information matrix.
Preferably, S4 specifically includes the following:
s4.1: the characteristic information matrix is divided into two parts according to the equal channel number, and the two parts respectively enter two parallel characteristic screening branches.
S4.2: and splicing the characteristic information matrixes respectively output by the two parallel characteristic screening branches, and performing batch normalization processing on the spliced characteristic information matrixes and activating by utilizing a ReLU function.
S4.3: and performing parallel processing of channel attention and space attention on the characteristic information matrix obtained in the step S4.2, and adding attention to the characteristic information matrix.
S4.4: and adding the characteristic information matrix obtained in the step S3.4 and the characteristic information matrix added with attention in the step S4.3 to output a new characteristic information matrix.
Preferably, in S4.1, each of the two parallel feature screening branches consists of a 1 × 1 point convolution, a 3 × 3 depth-separable convolution and a 1 × 1 point convolution, connected in sequence.
Preferably, in S4.1, the parallel feature screening branches support a variable-size processing operation and an invariant-size processing operation:
in the variable-size operation, after the feature information matrix passes through the 3 × 3 depth-separable convolution, its height and width are halved and its channel count is doubled;
in the invariant-size operation, the size of the feature information matrix is unchanged after the 3 × 3 depth-separable convolution.
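The shape bookkeeping for the two branch modes is simple; a minimal sketch (the starting shape 56 × 56 × 48 matches the size trace given elsewhere in the embodiment):

```python
def branch_output_shape(h, w, c, variable_size):
    """Output (H, W, C) of a feature screening branch for the two modes above."""
    if variable_size:
        # variable-size mode: the 3x3 depth-separable conv downsamples,
        # halving height and width and doubling the channel count
        return h // 2, w // 2, 2 * c
    # invariant-size mode: the shape passes through unchanged
    return h, w, c

print(branch_output_shape(56, 56, 48, variable_size=True))   # (28, 28, 96)
print(branch_output_shape(56, 56, 48, variable_size=False))  # (56, 56, 48)
```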
Preferably, S4.3 specifically includes the following:
s4.3.1: compressing the characteristic information matrix in the spatial dimension, and performing global average pooling, multi-layer perceptron and Sigmoid activation function processing to obtain a multi-layer information matrix with the size of 1 × 1 and unchanged channels, wherein the multi-layer information matrix is a weight matrix of the characteristic information in the channel dimension, namely a channel attention Foutput_CThe formula is as follows:
Foutput_C=σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))
wherein the first perception operation MLP1(·) and the second perception operation MLP2(·) are a channel-reducing and a channel-restoring 1 × 1 convolution respectively, i.e. MLP1(·) = Conv(inchannel, inchannel/r, 1) and MLP2(·) = Conv(inchannel/r, inchannel, 1) for a channel reduction ratio r;
inchannel is the number of channels of the feature information matrix output by S4.2; Conv(in, out, kernel_size) denotes a convolution operation with in input channels, out output channels and a kernel of size kernel_size; ReLU(·) is the ReLU function; σ(·) denotes the Sigmoid activation function,
σ(x) = 1 / (1 + e^(−x));
AvgPool(·) denotes the average pooling operation; and F′input is the feature information matrix obtained in S4.2;
compress the feature information matrix in the channel dimension through channel averaging and a Sigmoid activation function, yielding a single-channel information matrix of unchanged size. This matrix is the weight matrix of the feature information in the spatial dimension, i.e. the spatial attention Foutput_S. The formula is:
Foutput_S=σ(Mean(F′input))
where Mean(·) denotes the mean over the channel dimension;
s4.3.2: attention to the channel Foutput_CAnd spatial attention Foutput_SThe feature information matrix F 'obtained in S4.2 is added in a multiplication mode'inputAnd merge the input features FinputNamely, the input characteristics, the channel attention and the space attention are fused in parallel to obtain a characteristic information matrix Foutput_dualThe formula is as follows:
Foutput_dual=Finput*σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))*σ(Mean(F′input))。
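A simplified pure-Python sketch of S4.3 follows: channel and spatial attention weights are computed from F′ and applied multiplicatively to the input features F. To keep the example short, the perceptron MLP2(ReLU(MLP1(·))) is omitted (treated as identity), which is an assumption, not the patent's exact pipeline.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(F):
    """[C][H][W] -> [C]: global average pool each channel, then Sigmoid."""
    return [sigmoid(sum(sum(r) for r in ch) / (len(ch) * len(ch[0]))) for ch in F]

def spatial_attention(F):
    """[C][H][W] -> [H][W]: channel mean at each position, then Sigmoid."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    return [[sigmoid(sum(F[c][i][j] for c in range(C)) / C) for j in range(W)]
            for i in range(H)]

def dual_attention(F_in, F_prime):
    """Fuse input features with both attention weights multiplicatively."""
    ca, sa = channel_attention(F_prime), spatial_attention(F_prime)
    return [[[F_in[c][i][j] * ca[c] * sa[i][j]
              for j in range(len(F_in[0][0]))]
             for i in range(len(F_in[0]))]
            for c in range(len(F_in))]

F = [[[1.0, 1.0], [1.0, 1.0]]]   # one channel, 2x2, all ones
out = dual_attention(F, F)
```

With an all-ones input, every attention weight equals σ(1), so each output element is σ(1)² ≈ 0.534, illustrating how the two weight matrices jointly rescale the features.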
In a second aspect, the invention further provides an image classification system based on a parallel dual-attention lightweight residual network, applying the image classification method of any of the above solutions, and comprising:
a preprocessing module for preprocessing an input image;
a feature information extraction module for optimizing the structure of the residual network's convolution kernels and extracting a feature information matrix from the input image;
a feature information processing module for processing the feature information matrix with the parallel dual-attention lightweight residual structure in the residual network, obtaining a one-dimensional feature information matrix containing accurate feature information;
and an image classification module for inputting the one-dimensional feature information matrix into the fully-connected layer of the residual network, obtaining a matrix whose size equals the number of classes in the classification task, and outputting the image classification result.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) The invention reduces network parameters and hardware memory usage through convolution kernel decomposition, channel separation, depth-separable convolution and network width adjustment, thereby lowering the network's computational load, accelerating its computation and achieving a lightweight model.
(2) The invention further adopts a dual-branch spatial-channel attention mechanism (DBSC) to improve recognition capability, raising the accuracy of the whole network on top of the lightweight model and achieving a better classification effect.
While preserving accuracy, the invention compresses the parameters and computation and raises the processing speed, thereby improving the overall efficiency of deep-neural-network-based target classification and recognition.
Drawings
FIG. 1 is a flowchart of the image classification method based on a parallel dual-attention lightweight residual network in embodiment 1.
Fig. 2 is an overall framework diagram of the parallel dual-attention lightweight residual network model in embodiment 1.
FIG. 3 is a comparison of three different convolution kernels in embodiment 1.
Fig. 4 is a flowchart of the parameter weight reduction applied to the original Bottleneck residual structure by channel-separated parallel computation in embodiment 1.
Fig. 5 is a schematic diagram of adjusting the network width in the Bottleneck residual structure in embodiment 1.
Fig. 6 is a schematic diagram of the operation of the dual-branch spatial-channel attention mechanism in embodiment 1.
Fig. 7 is a diagram of the overall architecture of the four Bottleneck residual structures in embodiment 2.
FIG. 8 shows the Top-1 error evolution of the four residual structures evaluated on the Animals-10 and CIFAR-10 datasets in embodiment 2.
Fig. 9 is a schematic diagram of the image classification system based on a parallel dual-attention lightweight residual network.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
Referring to fig. 1 to fig. 6, the present embodiment provides an image classification method based on a parallel dual attention lightweight residual error network, including the following steps:
s1: the image is input to a residual network and preprocessed.
In this embodiment, the Animals-10 image dataset and the CIFAR-10 dataset are used as input images. Both Animals-10 and CIFAR-10 contain 10 classes of images. For the Animals-10 dataset, this embodiment uses 25,000 images as the training set, 1,000 as the validation set and 1,000 as the test set. For the CIFAR-10 dataset, 50,000 images are used as the training set, 5,000 as the validation set and 5,000 as the test set.
In this embodiment, the input image is resized to a uniform 224 × 224 × 3 by padding or cropping. As shown in fig. 2, which is an overall framework diagram of the parallel dual-attention lightweight residual network model, the number of input channels of the first layer is 3 and the number of output channels is aN, where a is a channel width factor and N is the number of output channels of the original residual network ResNet.
In this embodiment, the number of input and output channels of each residual-structure layer in the residual network can be set or modified, via the channel dimension of each convolution kernel, according to the convolution operation executed by that layer.
S2: and carrying out structure optimization on the convolution kernel of the residual error network, and extracting a characteristic information matrix of the input image.
The input layer of the existing residual network convolves the input with a large 7 × 7 kernel to strengthen the correlation between the input image and the first-layer feature map. However, the accuracy gain obtained by large 7 × 7 or 5 × 5 kernels is not proportional to their resource consumption: computing-power consumption grows much faster, making the computation very expensive.
In this embodiment, two schemes are proposed to replace the large 7 × 7 kernel for extracting the feature information matrix of the input image. One splits the 7 × 7 kernel of the residual network input layer into several serially connected layers of small symmetric kernels, preferably three consecutive 3 × 3 layers, which are then used to extract the feature information matrix of the input image. The other decomposes the 7 × 7 kernel into one layer of 7 × 1 and one layer of 1 × 7 asymmetric kernels, which extract the feature information matrix in turn; this transformation is called spatial decomposition into asymmetric convolutions. As shown in fig. 3, the structures of the three different convolution kernels designed in this embodiment are compared.
From the experimental results in table 1, three consecutive layers of 3 × 3 symmetric small kernels improve computational efficiency, but the improvement is rather limited and comes at the expense of reduced expressive ability. The one-layer 7 × 1 plus one-layer 1 × 7 asymmetric kernels keep the same input size and output depth while producing fewer model parameters and a faster computation speed (fps), with the recognition accuracy of the algorithm remaining almost unchanged.
TABLE 1 comparison of training time and accuracy before and after model compression
(Table 1 appears as an image in the original document.)
As can be seen from table 1, both the small convolution kernels and the asymmetric convolution kernels effectively reduce the network parameters and improve memory utilization: the multi-layer consecutive symmetric small kernels reduce training time by 22.3%, and the asymmetric kernels reduce it by about 29.0%. However, during inference the frame rate of the multi-layer small-kernel operation is similar to that of the large kernel, while the asymmetric convolution greatly improves the frame recognition rate (fps) and accelerates inference; the invention therefore preferentially selects the asymmetric kernels to optimize the large-kernel operation of the network input layer. When the feature information matrix is extracted with one layer of 7 × 1 and one layer of 1 × 7 asymmetric kernels, its size changes as 224 × 224 × 3 → 112 × 224 × 3 → 112 × 112 × 48.
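The spatial part of the size trace above can be checked with the standard convolution output-size formula, assuming "same"-style padding (pad 3 on the 7-tap axis) and stride 2 along that axis; these settings are plausible inferences, not spelled out in the patent text.

```python
def conv_out(size, kernel, stride, pad):
    """Output size of a 1-D convolution axis: floor((size + 2*pad - kernel)/stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

h = conv_out(224, 7, 2, 3)   # 7x1 kernel, stride 2 along height: 224 -> 112
w = conv_out(224, 7, 2, 3)   # 1x7 kernel, stride 2 along width:  224 -> 112
print(h, w)  # 112 112
```

With stride 1 the same padding would leave the axis at 224, matching the intermediate 112 × 224 × 3 stage where only the height has been reduced.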
S3: and carrying out batch normalization processing on the characteristic information matrix and carrying out nonlinear activation. When a deep neural network model is constructed, the perception effect of the middle layer (hidden layer) of the neural network is almost disappeared because the linear function only generates linear change to the input signal, namely the output signal is always the linear combination of the input signal. Therefore, it is necessary to introduce a nonlinear function to approximate the input signal of the corresponding layer, so that each layer of the neural network can generate corresponding spatial mapping or transformation in the nonlinear function, and thus the neural network generates multi-layer perception through approximation of the nonlinear function. In the present embodiment, the ReLU function is preferably used to perform nonlinear activation on the neural network model.
Specifically, the step of S3 includes the following:
s3.1: the characteristic information matrix is subjected to batch normalization processing, and the calculation formula is as follows:
Figure BDA0003334698350000072
wherein, FoutputAn output characteristic information matrix representing batch normalization processing; finputAn input characteristic information matrix representing batch normalization processing; mean () means allCalculating a value; var (·) denotes variance calculation; eps is an introduced error, and the default value is 1 e-5; gamma represents a scaling factor, with a default value of 1; beta represents a characteristic translation factor, and the default value is 0;
s3.2: output characteristic information matrix F for batch normalization processing by ReLU functionoutputAnd carrying out nonlinear activation, wherein the output size of the nonlinear activation is kept unchanged, and the formula is as follows:
Figure BDA0003334698350000081
s3.3: the non-linearly activated feature information matrix is down-sampled by a maximum pooling operation with a step size of 1,3 × 3 so that the output size of the feature information matrix becomes 56 × 56 × 48.
S3.4: and (4) performing 1 × 1 convolution operation on the matrix obtained in the step (S3.4), performing batch normalization processing, and activating by using a ReLU function to obtain a new characteristic information matrix.
S4: the parallel processing of the channel attention and the spatial attention is performed on the feature information matrix obtained in S3, and a new feature information matrix is output, which specifically includes the following steps:
s4.1: dividing the characteristic information matrix into two parts according to the equal channel number, and respectively entering two parallel characteristic branches; wherein, the two parallel characteristic branches respectively comprise a 1 × 1 point convolution, a 3 × 3 depth separable convolution and a 1 × 1 point convolution which are connected in sequence.
In this embodiment, the parallel feature screening branch is provided with a variable size processing operation and a non-variable size processing operation: after the characteristic information matrix is subjected to separable convolution with the depth of 3 multiplied by 3 in the variable-size operation, the height and the width are halved, and the number of channels is doubled; the characteristic information matrix in the operation of the unchanged size has no change in size after being subjected to the depth separable convolution of 3 multiplied by 3. In the implementation process, when the 3 × 3 deep separable convolution is performed in each layer network, the modification of the size of the feature map matrix can be realized by changing the number of channels of the 3 × 3 convolution kernel and the convolution operation.
Because the middle hidden layers of the neural network in the invention are mainly a stack of residual structures, each layer's residual structure adopts either the variable-size or the invariant-size operation, determining which feature map information is retained in that layer and passed downward. The feature map information to be retained is set according to the result of the previous layer (for example, a matching feature map size) and the relative quality of the experimental test results.
In this embodiment, channel separation is performed at the input layer: the feature information matrix entering the residual structure is divided into two parts with equal channel counts, enabling parallel computation over multiple convolution kernels, improving convolution efficiency and reducing the parameter count of the network. The channel separation technique compresses the parameters of the Bottleneck residual structure that forms the core backbone of the residual network.
In the implementation, the input feature information of each Bottleneck module is evenly divided into C groups. Assuming the feature map entering the current residual module has N input channels, the feature map of each convolution group (a 1 × 1 point convolution, a 3 × 3 depth-separable convolution and a 1 × 1 point convolution in sequence) has N/C channels; that is, the N channels of the original feature map are divided evenly so that each parallel convolution group handles N/C channels. Each divided group of feature channels is processed and computed by an independent convolution group, realizing multi-branch parallel computation; C = 2 is preferred in this embodiment.
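The channel-separation step amounts to an even partition of the N channel indices into C groups (C = 2 being the preferred value here); a minimal sketch:

```python
def channel_split(n_channels, groups=2):
    """Divide n_channels channel indices evenly into `groups` parallel groups."""
    assert n_channels % groups == 0, "channels must divide evenly into groups"
    per_group = n_channels // groups
    return [list(range(g * per_group, (g + 1) * per_group)) for g in range(groups)]

print(channel_split(8, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each returned group would then be fed to its own 1 × 1, 3 × 3 depth-separable, 1 × 1 convolution sequence, so the branches can run in parallel.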
In this embodiment, the possibility that parameter optimization can be performed is continuously searched for inside a convolution group including 1 × 1-point convolution, 3 × 3-depth separable convolution, and 1 × 1-point convolution in this order. The 3 × 3 Convolution operation in the original ResNet residual error network is a standard Convolution operation with high calculation power consumption, and in order to further improve the calculation speed and reduce the parameter quantity and the calculation quantity, the invention adopts the depth Separable Convolution (Depthwise Separable Convolition) with higher calculation efficiency to replace the standard 3 × 3 Convolution operation in the Bottleneck residual error structure. The main idea of the depth separable convolution is to combine the depth convolution with a 1 x 1 point-by-point convolution instead of the standard convolution.
For a standard convolution, each kernel must be applied across all N input channels, which requires many computational operations and correspondingly more energy. Since the 3 × 3 standard convolution kernel is the main computational cost of the Bottleneck residual structure, a depthwise convolution is first used here to apply a single 3 × 3 kernel to each input channel, reducing the computational complexity; a point-by-point convolution, implemented as a simple 1 × 1 convolution, then creates a linear combination of the depthwise outputs, and its depth can be flexibly controlled to map the features to higher dimensions. In addition, Batch Normalization (BN) and ReLU nonlinear activation are applied after both stages.
The depthwise convolution splits the 3 × 3 convolution into per-channel kernels: each input channel is convolved only with its own single-layer 3 × 3 kernel, yielding one output channel of information per input channel, and these per-channel outputs are then integrated by the 1 × 1 point-by-point convolution to obtain the complete output features. The depthwise separable convolution thus reduces the computation of the convolution and compresses the network parameter amount, at the cost of sacrificing some correlation between channels.
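The parameter saving of the depthwise separable replacement can be made concrete with a short sketch (PyTorch assumed; the 64-channel sizes are illustrative, not taken from the patent). A standard 3 × 3 convolution needs Cin × Cout × 9 weights, while depthwise (Cin × 9) plus 1 × 1 pointwise (Cin × Cout) is far smaller:

```python
import torch
import torch.nn as nn

cin, cout = 64, 64

# Standard 3x3 convolution: every kernel spans all input channels
standard = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)

# Depthwise: groups=cin gives one single-channel 3x3 kernel per input channel
depthwise = nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin, bias=False)
# Pointwise: 1x1 convolution linearly combines the depthwise outputs
pointwise = nn.Conv2d(cin, cout, kernel_size=1, bias=False)

n_std = sum(p.numel() for p in standard.parameters())        # 64*64*3*3 = 36864
n_sep = (sum(p.numel() for p in depthwise.parameters())
         + sum(p.numel() for p in pointwise.parameters()))   # 64*3*3 + 64*64 = 4672

x = torch.randn(1, cin, 32, 32)
y = pointwise(depthwise(x))  # same output shape as standard(x)
print(n_std, n_sep, y.shape)
```

Here the separable variant uses roughly one eighth of the standard convolution's weights while producing a feature map of identical shape.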
The depthwise convolution in this embodiment also supports network width adjustment: a channel parameter α (0 < α ≤ 1) is introduced, and the number of channels of the feature map — the convolution channel depth of the Bottleneck residual structure — is adjusted according to the formula M′ = αM, where M represents the initial channel depth and M′ the modified channel depth. As shown in fig. 5, the network width is reduced according to the size of α, reducing the computation amount and compressing the network parameters, and an optimal α balancing accuracy and lightness is obtained from the experimental results.
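The width adjustment M′ = αM can be sketched as a small helper (illustrative only; the rounding rule and the floor of one channel are assumptions, since the patent does not state how fractional channel counts are handled):

```python
def adjust_width(m: int, alpha: float) -> int:
    """Scale a channel depth M by the width multiplier alpha (0 < alpha <= 1),
    truncating to an integer and keeping at least one channel (assumed rule)."""
    assert 0 < alpha <= 1
    return max(1, int(m * alpha))

print(adjust_width(256, 0.5))   # 128
print(adjust_width(256, 1.0))   # 256
```

Applying the same α to every Bottleneck shrinks both the depthwise and pointwise weight tensors, which is why the multiplier compresses the whole network rather than a single layer.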
S4.2: splicing (Concat) the characteristic information matrixes respectively output by the two parallel characteristic branches, and performing batch normalization processing on the spliced characteristic information matrixes and activating by utilizing a ReLU function;
s4.3: and performing parallel processing of channel attention and space attention on the characteristic information matrix obtained in the step S4.2, and adding attention to the characteristic information matrix.
The attention mechanism mainly enables the model to learn, during training, to ignore relatively irrelevant information in an image and pay more attention to the information of interest, recovering accuracy lost through the lightweight transformation; it is essentially a weighting operation. The feature information of the Bottleneck input features is processed by channel averaging and convolution averaging respectively to obtain the channel attention and the spatial attention, i.e. the weights; the two kinds of weights are then simultaneously multiplied with the Bottleneck output features, assigning the weights to the output features, increasing their feature differences and improving recognition accuracy. The working principle is shown in fig. 6, and specifically comprises the following steps:
s4.3.1: compressing the characteristic information matrix in the spatial dimension, and performing global average pooling, multi-layer perceptron and Sigmoid activation function processing to obtain a multi-layer information matrix with the size of 1 × 1 and unchanged channels, wherein the multi-layer information matrix is a weight matrix of the characteristic information in the channel dimension, namely a channel attention Foutput_CThe formula is as follows:
Foutput_C=σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))
wherein the first perception operation MLP1(·) and the second perception operation MLP2(·) are convolution operations whose channel mappings are given as formula images in the original, in the Conv(in, out, kernel_size) notation defined below;
wherein inchannel is the number of channels of the characteristic information matrix output by S4.2; Conv(in, out, kernel_size) represents a convolution operation, where in is the number of input channels, out is the number of output channels, and kernel_size is the size of the convolution kernel; ReLU(·) is the ReLU function; σ(·) denotes the Sigmoid activation function, and
σ(x) = 1/(1 + e^(−x));
AvgPool(·) represents the average pooling operation; F′input is the characteristic information matrix obtained in S4.2;
compressing the characteristic information matrix in the channel dimension, and performing channel averaging and Sigmoid activation function processing, yields a single-channel information matrix of unchanged spatial size; this matrix is the weight matrix of the characteristic information in the spatial dimension, i.e. the spatial attention Foutput_S, with the formula:
Foutput_S=σ(Mean(F′input))
wherein Mean (·) represents the Mean calculation;
s4.3.2: attention to the channel Foutput_CAnd spatial attention Foutput_SThe feature information matrix F 'obtained in S4.2 is added in a multiplication mode'inputThe formula is as follows:
Foutput_dual=Finput*σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))*σ(Mean(F′input)).
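The parallel channel/spatial weighting above can be sketched as a small module (PyTorch assumed; implementing MLP1/MLP2 as 1 × 1 convolutions with a reduction ratio r is a common convention and an assumption here, since the patent gives their definitions only as formula images):

```python
import torch
import torch.nn as nn

class DBSCAttention(nn.Module):
    """Sketch of the parallel dual-branch spatial-channel attention:
    channel branch sigma(MLP2(ReLU(MLP1(AvgPool(F'))))), spatial branch
    sigma(Mean(F')), both multiplied onto the input features."""

    def __init__(self, channels: int, r: int = 16):  # r is an assumed ratio
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.mlp1 = nn.Conv2d(channels, channels // r, kernel_size=1)
        self.mlp2 = nn.Conv2d(channels // r, channels, kernel_size=1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_in: torch.Tensor, f_prime: torch.Tensor) -> torch.Tensor:
        # Channel attention: spatial compression to a 1x1 map per channel
        ca = self.sigmoid(self.mlp2(self.relu(self.mlp1(self.avg_pool(f_prime)))))
        # Spatial attention: channel averaging to a single-channel map
        sa = self.sigmoid(f_prime.mean(dim=1, keepdim=True))
        # Parallel fusion: F_input * CA * SA, broadcast over H, W and channels
        return f_in * ca * sa

att = DBSCAttention(64)
f = torch.randn(2, 64, 16, 16)
out = att(f, f)
print(out.shape)
```

Both attention maps broadcast against the full feature tensor, so the output keeps the input's shape while each position is reweighted by its channel and spatial saliency.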
Since a single attention mechanism can usually only focus on key information in its own one-dimensional feature space, different attention architectures produce distinctly different results in deep convolutional neural networks. Unlike many previous network frameworks that use only spatial attention in low-level feature extraction and apply channel attention in high-level feature extraction, the present invention combines spatial attention and channel attention into a dual-branch spatial-channel (DBSC) attention mechanism, i.e. the channel attention processing and the spatial attention processing are performed in parallel. At the output of the residual structure, the channel attention Foutput_C and the spatial attention Foutput_S are multiplied and simultaneously applied to the feature information matrix F′input, and the input features Finput are merged; that is, the input features, the channel attention and the spatial attention are fused in parallel to obtain the feature information matrix Foutput_dual.
The advantages of the double-branch space channel are: on the one hand, the salient position information of each feature map can be emphasized through spatial attention; on the other hand, a significant region present in some feature maps may be captured by the channel attention in another branch.
S4.4: and adding the characteristic information matrix obtained in the step S3.4 and the characteristic information matrix added with attention in the step S4.3 to output a new characteristic information matrix, wherein the matrix can retain the characteristic information of the original image to the greatest extent.
Because the neural networks with different depths can be realized by superposing different numbers of network layers, the depth of the whole residual error network is changed by superposing S residual error structures. In the embodiment, when the performance of the network architecture is tested, it is determined through experimental data that S is set to 16, and the "precision-speed" efficiency ratio is the highest at this time. The number and size of input/output channels of the residual error structure of each network layer are set by a convolution kernel adopted in convolution operation to be executed by the network layer according to the information of the connected upper and lower layer characteristic graphs, and final characteristic information is obtained after transformation of an S layer residual error module.
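The depth adjustment by stacking S residual structures can be sketched as follows (PyTorch assumed; `Block` is a simplified stand-in for the actual Bottleneck residual structure, and S = 16 follows the embodiment's choice):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Simplified residual block with an identity shortcut (illustrative only;
    the real structure uses the split/depthwise-separable Bottleneck)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.bn(self.conv(x)))  # shortcut addition

def make_backbone(channels: int, s: int = 16) -> nn.Sequential:
    """Stack S residual structures to set the overall network depth."""
    return nn.Sequential(*[Block(channels) for _ in range(s)])

backbone = make_backbone(64, s=16)
y = backbone(torch.randn(1, 64, 8, 8))
print(len(backbone), y.shape)
```

Varying `s` changes the network depth without touching any individual block, which is the "superposition" mechanism the paragraph describes.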
S5: performing global average pooling on the characteristic information matrix output by the residual error structure, integrating full-layer spatial information, and converting the characteristic information matrix into a one-dimensional characteristic information matrix through a flatten operation;
s6: and inputting the one-dimensional characteristic information matrix into the full-connection layer to obtain a matrix of the number of classes corresponding to the classification task, and realizing image classification.
Replacing the large-kernel convolution of the input layer with multiple layers of small convolution kernels or with asymmetric convolution reduces the input parameter amount and improves the computation speed (fps), while also allowing the network depth to be increased to raise network capacity and obtain sufficient image feature information to keep the recognition accuracy stable. By optimizing the residual structure and applying network compression techniques such as replacing standard convolution with depthwise separable convolution, the parameter amount of the residual structure is again greatly reduced, and after the multi-layer residual structures are stacked the overall model is lighter. The image classification method based on the parallel dual-attention lightweight residual network can therefore perform fast, high-precision image target classification and recognition locally (at the network edge), reducing hardware resource consumption and energy consumption. The method has high application value for many devices, such as high-definition televisions, computer monitors, cameras, smartphones and tablet computers. It can also be applied in various computer vision fields, such as object detection, medical imaging, security and surveillance imaging, face recognition and remote sensing. By selecting a suitable computing platform, the deep residual network model trained by the invention can be deployed at key nodes of new-generation information technologies such as big data, the Internet of Things and cloud services, i.e. at edge device terminals.
Example 2
Referring to fig. 7-8, this embodiment provides an image classification method based on a parallel dual-attention lightweight residual network, which further includes various improved designs of the overall Bottleneck residual structure: a Bottleneck residual structure PAResNet with only the dual-branch spatial-channel attention mechanism, a Bottleneck residual structure LightResNet combining channel splitting and depthwise separable convolution, and a Bottleneck residual structure ALResNet combining channel splitting, depthwise separable convolution and the parallel dual-branch spatial-channel attention mechanism.
The lightweight residual error bottleneck structure is important for constructing a lightweight residual error network. Therefore, different residual networks can be formed by overlapping according to different residual structures in fig. 7, as shown in table 3.
TABLE 3 comparison of four residual network architectures
(table provided as an image in the original document)
Wherein the second residual network PAResNet is constructed by stacking the Bottleneck residual structures shown in fig. 7 (b); the third residual network LightResNet is constructed by stacking the channel-split Bottleneck residual structures shown in fig. 7 (c); and the attention-driven ALResNet lightweight residual network is constructed by combining the parallel dual-branch spatial-channel attention mechanism with the lightweight-oriented techniques, as shown in fig. 7 (d). Asymmetric convolution is applied at the inputs of LightResNet and ALResNet instead of a 7 × 7 convolution kernel. Unlike previous work that uses separate attention strategies at different levels of feature extraction, the present invention uses spatial attention and channel attention in parallel for low-, medium- and high-level feature extraction. Since ALResNet fuses the advantages of model compression and attention mechanisms, it is expected to obtain better accuracy with fewer parameters and less computational cost. More specifically, the input of the dual-branch spatial-channel attention module is connected to the output of the convolutional layers in the Bottleneck, and the identity-mapping branch of the Bottleneck is connected to the output of the attention module, as illustrated in fig. 7. However, stacking the attention module directly leads to significant performance degradation. Although the dual-branch spatial-channel attention mechanism can be integrated into the residual Bottleneck in different ways to obtain more accurate feature information, it also brings more model parameters and computation; therefore, when optimizing the Bottleneck residual structure, the invention combines the dual-branch spatial-channel attention mechanism with the lightweight network compression techniques to form the Bottleneck residual structure shown in fig. 7(d), balancing parameter count and network precision.
Furthermore, the dual branch spatial channel attention mechanism of the present invention may also be applied to other types of layers, blocks, or networks.
This example performed a number of experiments on the Animals-10 dataset and the CIFAR-10 dataset, the results of which are shown in Table 4, Table 5, Table 6, Table 7 and FIG. 8.
TABLE 4 Experimental results for four residual structures estimated on the Animals-10 dataset
(table provided as an image in the original document)
TABLE 5 Experimental results for four residual structures estimated on CIFAR-10 dataset
(table provided as an image in the original document)
TABLE 6 evaluation of the comparison of Performance of network models formed by superposition of different residual structures on the Animals-10 dataset
(table provided as an image in the original document)
TABLE 7 evaluation of the comparison of the Performance of a network model formed by the superposition of different residual structures on the CIFAR-10 dataset
(table provided as an image in the original document)
From the experimental results it can be seen that the model size of the residual network with the added attention mechanism increases only slightly, by about 4.95%, with the parameter amount reaching 24.82M, but considerable improvement is achieved in validation accuracy (up to 97.4% on Animals-10 and up to 92.5% on CIFAR-10) and in test accuracy (up to 95.2% on Animals-10 and up to 92.6% on CIFAR-10). According to the experimental results, the accuracy of the different attention mechanisms follows: channel attention > spatial attention; fused attention > single attention; parallel attention > serial attention. In particular, PAResNet integrated with parallel DBSC attention achieves the best accuracy. Nevertheless, the model size of PAResNet is still far from satisfactory for edge computation.
In this embodiment, ALResNet with S = 16 stacked residual structures has a parameter amount of 4.77M, only one fifth of that of the original ResNet-50, with an inference speed of up to 14.90 fps on Animals-10 and up to 16.21 fps on CIFAR-10. In addition, ALResNet achieves 92.1% top-1 test accuracy on Animals-10 and 89.4% top-1 test accuracy on CIFAR-10, at a computational cost of 736.82 MFLOPs. These results demonstrate the effectiveness of the spatial-channel attention mechanism and the lightweight-oriented network compression techniques. Compared with the most advanced studies, the proposed ALResNet achieves a good trade-off between accuracy and computational efficiency for fast inference on resource-limited mobile devices in vision-based tasks.
This embodiment further studies the performance of the lightweight residual network LightResNet, which involves only lightweight-oriented compression techniques. Its impact on model size, computational efficiency and recognition accuracy is evaluated separately. The experimental results on Animals-10 and CIFAR-10 are summarized in tables 6 and 7, respectively. According to these results, PAResNet with uncompressed network scale achieves the best accuracy, but its parameter amount and inference speed need improvement. In contrast, LightResNet, which involves no attention-driven layers, substantially reduces the parameter amount to 4.08M and achieves an inference speed two times higher than ResNet-50; using model compression techniques to improve computational efficiency is therefore feasible. However, LightResNet achieves the worst accuracy on both datasets. The model error-epoch curves shown in fig. 8 likewise demonstrate a non-negligible error for the lightweight LightResNet. In other words, the recognition capability of LightResNet is diminished by the loss of feature information during network compression.
The three networks each have their own advantages: the lightweight LightResNet has the fastest inference speed and the smallest parameter count, but also the highest error rate; PAResNet, which only adds the attention mechanism, shows better recognition accuracy, but also has the highest parameter count; and ALResNet, which combines both the attention mechanism and the network compression techniques, achieves the best trade-off between accuracy and speed.
Example 3
Referring to fig. 9, the present embodiment further provides an image classification system based on a parallel dual attention lightweight residual error network, which is applied to the image classification method based on a parallel dual attention lightweight residual error network in the foregoing embodiment, and includes:
the device comprises a preprocessing module, a characteristic information extraction module, a characteristic information processing module and an image classification module.
In the specific implementation process, the preprocessing module performs uniform-size preprocessing on the input image; the feature information extraction module performs structural optimization on the convolution kernels of the residual network and extracts the feature information matrix of the input image; the feature information processing module processes the feature information matrix using the parallel dual-attention lightweight residual structure to obtain a one-dimensional feature information matrix containing accurate feature information; and the image classification module inputs the one-dimensional feature information matrix into the fully connected layer of the residual network to obtain a matrix of the number of classes corresponding to the classification task, realizing image classification.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. The image classification method based on the parallel double-attention light-weight residual error network is characterized by comprising the following steps of:
s1: inputting the image into a residual error network and preprocessing the image;
s2: performing structural optimization on a convolution kernel of the residual error network, and extracting a characteristic information matrix of an input image;
s3: carrying out batch normalization processing on the characteristic information matrix and carrying out nonlinear activation;
s4: performing parallel processing of channel attention and space attention on the characteristic information matrix obtained in the step S3, and outputting a new characteristic information matrix;
s5: performing global average pooling on the characteristic information matrix obtained in the step S4, integrating full-layer spatial information, and converting the characteristic information matrix into a one-dimensional characteristic information matrix;
s6: and inputting the one-dimensional characteristic information matrix into the full-connection layer to obtain a matrix of the number of classes corresponding to the classification task, and outputting an image classification result.
2. The method of image classification based on parallel dual attention lightweight residual network according to claim 1, characterized in that the preprocessing of the image in S1 includes a unified modification of the size of the image by means of supplementation or cropping.
3. The method for image classification based on the parallel dual-attention lightweight residual network according to claim 1, wherein S2 specifically comprises: dividing an A × A large convolution kernel arranged at the residual network input layer into a plurality of layers of serially connected small symmetric convolution kernels, then sequentially inputting the preprocessed input image into the serially connected small symmetric convolution kernels, and extracting the characteristic information matrix of the input image; wherein the size of any one of the small symmetric convolution kernels is B × B, and A > B ≥ 1.
4. The method for image classification based on the parallel dual-attention lightweight residual network according to claim 1, wherein S2 specifically comprises: decomposing the A multiplied by A large convolution kernel arranged on the residual network input layer into a layer of A multiplied by 1 and a layer of 1 multiplied by A asymmetric convolution kernels which are connected in sequence, then sequentially inputting the preprocessed input image into the layer of A multiplied by 1 and the layer of 1 multiplied by A asymmetric convolution kernels, and extracting to obtain a characteristic information matrix of the input image; wherein A is a positive integer greater than 1.
5. The method for image classification based on the parallel dual attention lightweight residual network according to claim 1, wherein S3 specifically comprises the following:
s3.1: the characteristic information matrix is subjected to batch normalization processing, and the calculation formula is as follows:
Foutput = γ·(Finput − Mean(Finput))/√(Var(Finput) + eps) + β
wherein, FoutputAn output characteristic information matrix representing batch normalization processing; finputAn input characteristic information matrix representing batch normalization processing; mean () represents the Mean calculation; var (·) denotes variance calculation; eps is an introduced error; gamma represents a scaling factor; β represents a characteristic translation factor;
s3.2: output characteristic information matrix F for batch normalization processing by ReLU functionoutputAnd carrying out nonlinear activation, wherein the output size of the nonlinear activation is kept unchanged, and the formula is as follows:
ReLU(x) = max(0, x)
s3.3: down-sampling the characteristic information matrix subjected to nonlinear activation through maximum pooling operation, and changing the output size of the characteristic information matrix;
s3.4: and (4) performing 1 × 1 convolution operation on the matrix obtained in the step (S3.3), performing batch normalization processing, and activating by using a ReLU function to obtain a new characteristic information matrix.
6. The method for image classification based on the parallel dual attention lightweight residual network according to claim 5, wherein S4 specifically comprises the following steps:
s4.1: dividing the characteristic information matrix into two parts according to the equal channel number, and respectively entering two parallel characteristic screening branches;
s4.2: splicing the characteristic information matrixes respectively output by the two parallel characteristic screening branches, and performing batch normalization processing on the spliced characteristic information matrixes and activating by utilizing a ReLU function;
s4.3: performing parallel processing of channel attention and space attention on the characteristic information matrix obtained in the S4.2, and adding attention to the characteristic information matrix;
s4.4: and adding the characteristic information matrix obtained in the step S3.4 and the characteristic information matrix added with attention in the step S4.3 to output a new characteristic information matrix.
7. The method for image classification based on the parallel double-attention light-weight residual error network according to the claim 6, wherein in S4.1, two parallel feature screening branches respectively comprise a 1 x 1 point convolution, a 3 x 3 depth separable convolution and a 1 x 1 point convolution which are connected in sequence.
8. The parallel dual-attention light-weight residual error network-based image classification method according to claim 7, characterized in that in S4.1, the parallel feature screening branch is provided with variable-size processing operation and invariable-size processing operation:
in the variable-size processing operation, after the characteristic information matrix is subjected to 3 multiplied by 3 depth separable convolution, the height and the width are halved, and the number of channels is doubled;
in the invariant size processing operation, the feature information matrix is not changed in size after being subjected to 3 × 3 depth separable convolution.
9. The method for image classification based on the parallel dual attention lightweight residual network according to claim 6, wherein S4.3 specifically comprises the following:
s4.3.1: compressing the characteristic information matrix in the spatial dimension, and performing global average pooling, multi-layer perceptron and Sigmoid activation function processing to obtain a multi-layer information matrix with the size of 1 × 1 and unchanged channels, wherein the multi-layer information matrix is a weight matrix of the characteristic information in the channel dimension, namely a channel attention Foutput_CThe formula is as follows:
Foutput_C=σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))
wherein the first perception operation MLP1(·) and the second perception operation MLP2(·) are convolution operations whose channel mappings are given as formula images in the original, in the Conv(in, out, kernel_size) notation defined below;
wherein inchannel is the number of channels of the characteristic information matrix output by S4.2; Conv(in, out, kernel_size) represents a convolution operation, where in is the number of input channels, out is the number of output channels, and kernel_size is the size of the convolution kernel; ReLU(·) is the ReLU function; σ(·) denotes the Sigmoid activation function, and
σ(x) = 1/(1 + e^(−x));
AvgPool(·) represents the average pooling operation; F′input is the characteristic information matrix obtained in S4.2;
compressing the characteristic information matrix in the channel dimension, and performing channel averaging and Sigmoid activation function processing to obtain a size-invariant single-channel information matrix, which is the weight matrix of the characteristic information in the spatial dimension, i.e. the spatial attention Foutput_S, with the formula:
Foutput_S=σ(Mean(F′input))
wherein Mean (·) represents the Mean calculation;
s4.3.2: attention to the channel Foutput_CAnd spatial attention Foutput_SThe feature information matrix F 'obtained in S4.2 is added in a multiplication mode'inputAnd merge the input features FinputNamely, the input characteristics, the channel attention and the space attention are fused in parallel to obtain a characteristic information matrix Foutput_dualThe formula is as follows:
Foutput_dual=Finput*σ(MLP2(ReLU(MLP1(AvgPool(F′input)))))*σ(Mean(F′input))。
10. An image classification system based on the parallel double-attention light-weight residual error network, applied to the above image classification method based on the parallel double-attention light-weight residual error network, characterized by comprising:
the preprocessing module is used for preprocessing an input image;
the characteristic information extraction module is used for carrying out structural optimization on a convolution kernel of the residual error network and extracting a characteristic information matrix of the input image;
the characteristic information processing module is used for processing a characteristic information matrix by utilizing a parallel double-attention light-weight residual error structure in a residual error network to obtain a one-dimensional characteristic information matrix comprising accurate characteristic information;
and the image classification module is used for inputting the one-dimensional characteristic information matrix into the full connection layer of the residual error network to obtain a matrix of the number of classes corresponding to the classification task and outputting an image classification result.
CN202111290845.5A 2021-11-02 2021-11-02 Image classification method and system based on parallel double-attention light-weight residual error network Active CN114067153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111290845.5A CN114067153B (en) 2021-11-02 2021-11-02 Image classification method and system based on parallel double-attention light-weight residual error network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111290845.5A CN114067153B (en) 2021-11-02 2021-11-02 Image classification method and system based on parallel double-attention light-weight residual error network

Publications (2)

Publication Number Publication Date
CN114067153A true CN114067153A (en) 2022-02-18
CN114067153B CN114067153B (en) 2022-07-12

Family

ID=80236549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111290845.5A Active CN114067153B (en) 2021-11-02 2021-11-02 Image classification method and system based on parallel double-attention light-weight residual error network

Country Status (1)

Country Link
CN (1) CN114067153B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949297A (en) * 2019-03-20 2019-06-28 天津工业大学 Pulmonary nodule detection method based on Reception and Faster R-CNN
CN110929602A (en) * 2019-11-09 2020-03-27 北京工业大学 Foundation cloud picture cloud shape identification method based on convolutional neural network
CN111191737A (en) * 2020-01-05 2020-05-22 天津大学 Fine-grained image classification method based on multi-scale repeated attention mechanism
CN111274869A (en) * 2020-01-07 2020-06-12 中国地质大学(武汉) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
WO2021177628A1 (en) * 2020-03-04 2021-09-10 Samsung Electronics Co., Ltd. Method and apparatus for action recognition
CN111598939A (en) * 2020-05-22 2020-08-28 中原工学院 Human body circumference measuring method based on multi-vision system
CN111523521A (en) * 2020-06-18 2020-08-11 西安电子科技大学 Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN111985370A (en) * 2020-08-10 2020-11-24 华南农业大学 Crop pest and disease fine-grained identification method based on improved mixed attention module
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN112101318A (en) * 2020-11-17 2020-12-18 深圳市优必选科技股份有限公司 Image processing method, device, equipment and medium based on neural network model
CN112733774A (en) * 2021-01-18 2021-04-30 大连海事大学 Light-weight ECG classification method based on combination of BiLSTM and serial-parallel multi-scale CNN
CN112990391A (en) * 2021-05-20 2021-06-18 四川大学 Feature fusion based defect classification and identification system of convolutional neural network
CN113343799A (en) * 2021-05-25 2021-09-03 山东师范大学 Method and system for realizing automatic classification of white blood cells based on mixed attention residual error network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FEI WANG et al.: "Residual Attention Network for Image Classification", arXiv *
QIAO Sibo et al.: "Convolutional neural network model for brain CT image classification based on a residual hybrid attention mechanism", Acta Electronica Sinica *
NING Shangming et al.: "Entity relation extraction from electronic medical records based on a multi-channel self-attention mechanism", Chinese Journal of Computers *
SONG Tainian et al.: "Improved dual-channel attention mechanism image classification method for lightweight networks", Aero Weaponry *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722928A (en) * 2022-03-29 2022-07-08 河海大学 Blue-green algae image identification method based on deep learning
CN114722928B (en) * 2022-03-29 2024-04-16 河海大学 Blue algae image recognition method based on deep learning
CN114898463A (en) * 2022-05-09 2022-08-12 河海大学 Sitting posture identification method based on improved depth residual error network
CN114898463B (en) * 2022-05-09 2024-05-14 河海大学 Sitting posture identification method based on improved depth residual error network
CN114612694B (en) * 2022-05-11 2022-07-29 合肥高维数据技术有限公司 Picture invisible watermark detection method based on two-channel differential convolutional network
CN114612694A (en) * 2022-05-11 2022-06-10 合肥高维数据技术有限公司 Picture invisible watermark detection method based on two-channel differential convolutional network
CN115082928A (en) * 2022-06-21 2022-09-20 电子科技大学 Method for asymmetric double-branch real-time semantic segmentation of network for complex scene
CN115082928B (en) * 2022-06-21 2024-04-30 电子科技大学 Method for asymmetric double-branch real-time semantic segmentation network facing complex scene
CN115348215B (en) * 2022-07-25 2023-11-24 南京信息工程大学 Encryption network traffic classification method based on space-time attention mechanism
CN115348215A (en) * 2022-07-25 2022-11-15 南京信息工程大学 Encrypted network flow classification method based on space-time attention mechanism
CN115577242A (en) * 2022-10-14 2023-01-06 成都信息工程大学 Electroencephalogram signal classification method based on attention mechanism and neural network
CN115607170B (en) * 2022-11-18 2023-04-25 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signals and application
CN115607170A (en) * 2022-11-18 2023-01-17 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signal and application
CN116186593B (en) * 2023-03-10 2023-10-03 山东省人工智能研究院 Electrocardiosignal detection method based on separable convolution and attention mechanism
CN116186593A (en) * 2023-03-10 2023-05-30 山东省人工智能研究院 Electrocardiosignal detection method based on separable convolution and attention mechanism

Also Published As

Publication number Publication date
CN114067153B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN114067153B (en) Image classification method and system based on a parallel dual-attention lightweight residual network
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
US10983754B2 (en) Accelerated quantized multiply-and-add operations
CN111639692B (en) Shadow detection method based on attention mechanism
WO2021018163A1 (en) Neural network search method and apparatus
Liu et al. FDDWNet: a lightweight convolutional neural network for real-time semantic segmentation
CN112446476A (en) Neural network model compression method, device, storage medium and chip
Li et al. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes
US11216913B2 (en) Convolutional neural network processor, image processing method and electronic device
Chen et al. StereoEngine: An FPGA-based accelerator for real-time high-quality stereo estimation with binary neural network
Zhang et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation
CN110738241A (en) Binocular stereo vision matching method based on a neural network, and its computation framework
CN115081588A (en) Neural network parameter quantification method and device
CN116012722A (en) Remote sensing image scene classification method
CN113297959A (en) Target tracking method and system based on corner attention twin network
Xu et al. Faster BiSeNet: A faster bilateral segmentation network for real-time semantic segmentation
Ujiie et al. Approximated prediction strategy for reducing power consumption of convolutional neural network processor
US11948090B2 (en) Method and apparatus for video coding
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
Li et al. Holoparser: Holistic visual parsing for real-time semantic segmentation in autonomous driving
Gao et al. Multi-branch aware module with channel shuffle pixel-wise attention for lightweight image super-resolution
EP4075343A1 (en) Device and method for realizing data synchronization in neural network inference
Gong et al. Research on mobile traffic data augmentation methods based on SA-ACGAN-GN
Feng et al. Real-time object detection method based on YOLOv5 and efficient mobile network
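The headline technique of this patent, a parallel dual-attention lightweight residual block, can be illustrated with a minimal numeric sketch: a channel-attention branch and a spatial-attention branch are computed independently from the same input, their re-weighted feature maps are fused, and the result is added back to the input through a residual shortcut. Everything in the sketch below is an assumption for illustration only — the function name, the mean-based descriptors, the sigmoid gates, and the additive fusion are not taken from the patent, whose claimed block uses learned convolutional layers rather than these fixed operations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parallel_dual_attention_block(x):
    """Illustrative parallel dual-attention residual block (assumed form).

    x: feature map of shape (C, H, W). Both attention branches read the
    same input (hence "parallel"), and a residual shortcut preserves it.
    """
    # Channel attention: global average pooling -> one gate per channel.
    channel_desc = x.mean(axis=(1, 2))              # shape (C,)
    channel_gate = sigmoid(channel_desc)
    channel_out = x * channel_gate[:, None, None]   # re-weight channels

    # Spatial attention: cross-channel mean -> one gate per position.
    spatial_desc = x.mean(axis=0)                   # shape (H, W)
    spatial_gate = sigmoid(spatial_desc)
    spatial_out = x * spatial_gate[None, :, :]      # re-weight positions

    # Fuse the two branches and close the residual shortcut.
    return x + 0.5 * (channel_out + spatial_out)
```

In a real lightweight network, the fixed means and sigmoids above would be replaced by small learned layers (e.g. pointwise convolutions), but the data flow — two attention branches in parallel over one shared input, fused and added to a residual path — is the pattern the title describes.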

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant