CN114565792B - Image classification method and device based on lightweight convolutional neural network - Google Patents


Info

Publication number
CN114565792B
CN114565792B (application CN202210189921.1A)
Authority
CN
China
Prior art keywords
layer
downsampling
neural network
feature
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210189921.1A
Other languages
Chinese (zh)
Other versions
CN114565792A (en)
Inventor
王天江
张量奇
沈海波
罗逸豪
曹翔
潘蕾西兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210189921.1A
Publication of CN114565792A
Application granted
Publication of CN114565792B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining


Abstract

The invention discloses an image classification method and device based on a lightweight convolutional neural network, belonging to the field of image classification in deep learning. The invention comprises the following steps. S1: constructing a lightweight convolutional neural network model comprising, connected in sequence, a standard convolution layer, a plurality of sampling-and-concatenation units, a global pooling layer, and a fully connected layer, where each sampling-and-concatenation unit comprises, connected in sequence, a downsampling layer, a plurality of general layers, and a concatenation layer. S2: inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result. By first constructing a lightweight model with few parameters, low computational cost, and fast inference, and then using it for image classification, the invention greatly reduces the parameter count and greatly improves classification speed compared with existing lightweight convolutional neural network models at similar classification accuracy.

Description

Image classification method and device based on a lightweight convolutional neural network

Technical Field

The present invention belongs to the field of deep-learning image classification, and more specifically relates to an image classification method and device based on a lightweight convolutional neural network.

Background Art

In recent years, convolutional neural networks (CNNs) have been widely used in computer vision tasks such as image classification. To improve classification accuracy, the depth and width of CNN models have grown rapidly, bringing a rapid increase in parameter count and computational cost, which hinders the deployment of CNNs on devices with limited computing power.

For deployment on mobile or embedded devices, model lightweighting is an important approach. Methods that build models from depthwise separable convolutions (DSCs), each consisting of a depthwise convolution followed by a pointwise convolution, and then use a neural architecture search (NAS) algorithm to find the best architecture have achieved notable success, for example the MobileNet and EfficientNet families. These models all replace standard convolutions with depthwise separable convolutions, which require several times fewer parameters and operations than standard convolutions; after the backbone of the architecture is fixed, a NAS algorithm searches for the optimal model details, such as the width of each layer, preserving classification accuracy while reducing parameters and computation.

Such methods greatly reduce parameters and computation, but inference speed on GPUs does not improve and can even fall below that of classic networks such as ResNet. The reason is that depthwise separable convolution does not utilize GPU resources well, and these networks use more layers than classic networks of the same accuracy, including more nonlinear activation layers and batch-normalization layers, all of which slow down model inference.

Summary of the Invention

In view of the above defects or improvement needs of the prior art, the present invention provides an image classification method and device based on a lightweight convolutional neural network. Its purpose is to design the lightweight neural network model to comprise, connected in sequence, a standard convolution layer, a plurality of sampling-and-concatenation units, a global pooling layer, and a fully connected layer, where each sampling-and-concatenation unit comprises, connected in sequence, a downsampling layer, a plurality of general layers, and a concatenation layer; the image to be classified is input into the lightweight convolutional neural network model to obtain a classification result, thereby reducing the parameter count of the convolutional neural network model while improving its inference speed.

To achieve the above object, according to one aspect of the present invention, a method for designing a lightweight convolutional neural network architecture is provided, comprising:

S1: constructing a lightweight convolutional neural network model, the lightweight neural network model comprising, connected in sequence: a standard convolution layer, a plurality of sampling-and-concatenation units, a global pooling layer, and a fully connected layer, each sampling-and-concatenation unit comprising, connected in sequence: a downsampling layer, a plurality of general layers, and a concatenation layer;

S2: inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, comprising:

S21: using the standard convolution layer to expand the channels of the image to be classified to a specified number of channels, thereby obtaining an original feature map;

S22: using the downsampling layer of the first sampling-and-concatenation unit to downsample the original feature map into two groups of first feature maps; then using the general layers of that unit to extract features from the two groups of first feature maps respectively, obtaining corresponding second feature maps; then using the concatenation layer of that unit to concatenate the two groups of second feature maps into a first target feature map; inputting the first target feature map into the adjacent second sampling-and-concatenation unit so that it downsamples, extracts features from, and concatenates the first target feature map to obtain a second target feature map; then inputting the second target feature map into the adjacent third sampling-and-concatenation unit, and so on, until the last sampling-and-concatenation unit outputs the final target feature map;

S23: inputting the final target feature map output by the last sampling-and-concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image.

In one embodiment, the downsampling layer comprises, connected in sequence: a Gaussian downsampling layer, a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer; the downsampling layer outputs two groups of output feature maps:

the input feature map passes through the Gaussian downsampling layer to obtain one group of output feature maps;

the input feature map passes in sequence through the Gaussian downsampling layer, pointwise convolution layer, nonlinear activation layer, and batch-normalization layer of the downsampling layer to obtain the other group of output feature maps.

In one embodiment, the Gaussian downsampling layer of the downsampling layer convolves the feature map output by the previous layer, and the resolution of its output feature map is half that of the input.

In one embodiment, the pointwise convolution layer of the downsampling layer expands or contracts the channels of the input feature map according to the numbers of input and output feature channels.

In one embodiment, ReLU is used as the activation function in the nonlinear activation layer of the downsampling layer.

In one embodiment, the general layer comprises, connected in sequence: a depthwise convolution layer, a concatenation layer, a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer;

each general layer takes group-a feature maps and group-b feature maps as input; the group-b feature maps are fed into the depthwise convolution layer to obtain group-c feature maps by convolution; the group-a and group-c feature maps then pass through the concatenation layer, pointwise convolution layer, nonlinear activation layer, and batch-normalization layer of the general layer to obtain new group-b feature maps;

the input group-b feature maps, serving as the new group-a feature maps, together with the new group-b feature maps form the input of the next adjacent general layer.

According to another aspect of the present invention, an image classification device based on a lightweight convolutional neural network is provided, comprising:

a construction module for constructing a lightweight convolutional neural network model, the lightweight neural network model comprising, connected in sequence: a standard convolution layer, a plurality of sampling-and-concatenation units, a global pooling layer, and a fully connected layer, each sampling-and-concatenation unit comprising, connected in sequence: a downsampling layer, a plurality of general layers, and a concatenation layer;

a classification module for inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, specifically comprising:

using the standard convolution layer to expand the channels of the image to be classified to a specified number of channels, thereby obtaining an original feature map;

using the downsampling layer of the first sampling-and-concatenation unit to downsample the original feature map into two groups of first feature maps; then using the general layers of that unit to extract features from the two groups of first feature maps respectively, obtaining corresponding second feature maps; then using the concatenation layer of that unit to concatenate the two groups of second feature maps into a first target feature map; inputting the first target feature map into the adjacent second sampling-and-concatenation unit so that it downsamples, extracts features from, and concatenates the first target feature map to obtain a second target feature map; then inputting the second target feature map into the adjacent third sampling-and-concatenation unit, and so on, until the last sampling-and-concatenation unit outputs the final target feature map;

inputting the final target feature map output by the last sampling-and-concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image.

In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:

By modifying the network unit structure, the present invention constructs a lightweight convolutional neural network model with few parameters, low computational cost, and fast inference, and then uses this model for image classification. Because the model has a low parameter count and computational cost and achieves faster inference on GPUs, using it for image classification improves classification efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of some convolution kernels of the MobileNetV2 downsampling layer, visualized in one embodiment of the present invention; the kernel size is 1×3×3;

FIG. 2 is a schematic diagram of some convolution kernels of an EfficientNet-B0 non-downsampling layer, visualized in one embodiment of the present invention; the kernel size is 1×5×5;

FIG. 3 shows some similar feature maps of adjacent layers 5 and 7 of RegNetX-400MF, visualized in one embodiment of the present invention;

FIG. 4 is a schematic diagram of the downsampling layer constructed in one embodiment of the present invention;

FIG. 5 is a schematic diagram of the general layer constructed in one embodiment of the present invention;

FIG. 6 is a flowchart of the image classification method based on a lightweight convolutional neural network in one embodiment of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with one another as long as they do not conflict.

The present invention derives the design bases for the lightweight convolutional neural network architecture by visualizing feature maps and convolution kernels. It should be noted that the visualized network can be any depthwise separable convolutional neural network whose parameters have already been trained; during visualization, the feature maps and convolution kernels are normalized, and if the center value of a convolution kernel is negative, the entire kernel is multiplied by -1.

Basis 1: as shown in FIG. 1, most of the convolution kernels in the downsampling layer (a depthwise convolution layer with stride 2) approximate Gaussian blur kernels, and blurring before downsampling conforms to the sampling theorem; therefore, in the network we construct, the convolution kernels of the downsampling layer are replaced by Gaussian blur kernels. Note that FIG. 1 consists of multiple 3×3 convolution kernels (each smallest square represents one pixel, nine pixels per kernel); the darker a pixel, the larger its value. Many of these 3×3 kernels resemble Gaussian kernels, i.e., Gaussian kernels of differing variance, so Gaussian blur kernels can be substituted for them.
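The Gaussian-blur downsampling of Basis 1 can be sketched in NumPy as follows. This is a minimal illustration only: the sigma of 1.0, the 3×3 kernel size, and the 8-channel 32×32 input are assumed values for demonstration, not parameters fixed by the patent.

```python
import numpy as np

def gaussian_kernel3(sigma=1.0):
    """Normalized 3x3 Gaussian blur kernel (sigma is an illustrative choice)."""
    ax = np.arange(-1, 2)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def depthwise_conv_stride2(x, kernel):
    """Stride-2 depthwise convolution of a (C, H, W) tensor with one shared
    (3, 3) kernel and 'same' padding, so H and W are halved."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            patch = xp[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
    return out

x = np.random.rand(8, 32, 32)            # 8-channel 32x32 feature map
y = depthwise_conv_stride2(x, gaussian_kernel3())
print(y.shape)                           # resolution halved: (8, 16, 16)
```

Because the Gaussian kernel is fixed rather than learned, this layer contributes no trainable parameters, consistent with the lightweighting goal.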

Basis 2: as shown in FIG. 2, a large fraction of the convolution kernels in the depthwise convolution layers other than the downsampling layers approximate the identity kernel (a kernel whose only nonzero value is at the center); therefore, in the network we construct, identity kernels replace some of the depthwise kernels, which is equivalent to directly removing those kernels from the depthwise convolution layers. Note that, to improve accuracy and computational efficiency, no nonlinear activation layer or batch-normalization layer follows the depthwise convolution layer. FIG. 2 likewise consists of multiple 3×3 convolution kernels, many of which resemble the 3×3 identity kernel (center value 1, all other values 0); identity kernels can be removed outright, reducing computation.
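The equivalence claimed in Basis 2 (convolving with an identity kernel returns the input unchanged, so such a kernel can simply be deleted) can be checked numerically. A minimal sketch, with an assumed 4-channel 8×8 input:

```python
import numpy as np

# 3x3 identity kernel: center value 1, all other values 0.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0

def depthwise_conv_same(x, kernel):
    """Stride-1 'same' depthwise convolution of a (C, H, W) tensor with one
    shared (3, 3) kernel."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = (xp[:, i:i + 3, j:j + 3] * kernel).sum(axis=(1, 2))
    return out

x = np.random.rand(4, 8, 8)
# The convolution is a no-op, so the kernel (and its multiply-adds) can be
# removed from the network without changing its output.
assert np.allclose(depthwise_conv_same(x, identity), x)
```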

Basis 3: as shown in FIG. 3, a large fraction of the feature maps output by adjacent layers are similar, repeated feature maps; therefore, we reuse some feature maps across nearby layers through identity mappings, and the reused feature maps do not participate in the next depthwise convolution. This keeps the network width unchanged while halving the parameter count and computation.

The present invention provides an image classification method based on a lightweight convolutional neural network, comprising:

S1: constructing a lightweight convolutional neural network model, the lightweight neural network model comprising, connected in sequence: a standard convolution layer, a plurality of sampling-and-concatenation units, a global pooling layer, and a fully connected layer, each sampling-and-concatenation unit comprising, connected in sequence: a downsampling layer, a plurality of general layers, and a concatenation layer;

S2: inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, comprising:

S21: using the standard convolution layer to expand the channels of the image to be classified to a specified number of channels, thereby obtaining an original feature map;

S22: using the downsampling layer of the first sampling-and-concatenation unit to downsample the original feature map into two groups of first feature maps; then using the general layers of that unit to extract features from the two groups of first feature maps respectively, obtaining corresponding second feature maps; then using the concatenation layer of that unit to concatenate the two groups of second feature maps into a first target feature map; inputting the first target feature map into the adjacent second sampling-and-concatenation unit so that it downsamples, extracts features from, and concatenates the first target feature map to obtain a second target feature map; then inputting the second target feature map into the adjacent third sampling-and-concatenation unit, and so on, until the last sampling-and-concatenation unit outputs the final target feature map;

S23: inputting the final target feature map output by the last sampling-and-concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image.

In one embodiment, the downsampling layer comprises, connected in sequence: a Gaussian downsampling layer, a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer; the downsampling layer outputs two groups of output feature maps:

the input feature map passes through the Gaussian downsampling layer to obtain one group of output feature maps;

the input feature map passes in sequence through the Gaussian downsampling layer, pointwise convolution layer, nonlinear activation layer, and batch-normalization layer of the downsampling layer to obtain the other group of output feature maps.

As shown in FIG. 4, the downsampling layer provided by the present application consists of four layers: a Gaussian downsampling layer (a stride-2 depthwise convolution layer whose kernels are replaced with Gaussian kernels), a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer. The Gaussian downsampling layer convolves the feature map output by the previous layer, so the resolution of its output feature map is half that of the input; the pointwise convolution layer expands or contracts the input feature map according to the numbers of input and output feature channels; the nonlinear activation layer and batch-normalization layer respectively apply nonlinear activation and normalization to the output of the previous layer. Finally, the output feature map of the Gaussian downsampling layer and the output feature map of the batch-normalization layer together form the output of this unit. In particular, ReLU is used as the activation function of the nonlinear activation layer.
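The four-layer downsampling unit described above can be sketched as follows. This is a shape-level NumPy illustration under stated assumptions: sigma = 1.0 for the Gaussian kernel, an 8-channel 32×32 input expanded to 16 channels, and a simple per-channel normalization standing in for trained batch normalization.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_downsample(x):
    """Stride-2 depthwise conv of a (C, H, W) tensor with a fixed 3x3
    Gaussian blur kernel (sigma = 1, an illustrative choice)."""
    ax = np.arange(-1, 2)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / 2.0)
    k /= k.sum()
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            out[:, i, j] = (xp[:, 2*i:2*i+3, 2*j:2*j+3] * k).sum(axis=(1, 2))
    return out

def down_block(x, w_pw):
    """DownBlock sketch: returns two groups of feature maps.
    Group 1: the Gaussian-downsampled input.
    Group 2: group 1 -> pointwise (1x1) conv -> ReLU -> normalization."""
    g1 = gaussian_downsample(x)                 # (C_in, H/2, W/2)
    g2 = np.einsum('oc,chw->ohw', w_pw, g1)     # pointwise convolution
    g2 = np.maximum(g2, 0.0)                    # ReLU
    mu = g2.mean(axis=(1, 2), keepdims=True)    # stand-in for batch norm
    sd = g2.std(axis=(1, 2), keepdims=True) + 1e-5
    return g1, (g2 - mu) / sd

x = rng.random((8, 32, 32))
w = rng.standard_normal((16, 8))                # expand 8 -> 16 channels
g1, g2 = down_block(x, w)
print(g1.shape, g2.shape)                       # (8, 16, 16) (16, 16, 16)
```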

In one embodiment, the Gaussian downsampling layer of the downsampling layer convolves the feature map output by the previous layer, and the resolution of its output feature map is half that of the input.

In one embodiment, the pointwise convolution layer of the downsampling layer expands or contracts the channels of the input features according to the numbers of input and output feature channels.

In one embodiment, ReLU is used as the activation function in the nonlinear activation layer of the downsampling layer.

In one embodiment, the general layer comprises, connected in sequence: a depthwise convolution layer, a concatenation layer, a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer;

each general layer takes group-a feature maps and group-b feature maps as input; the group-b feature maps are fed into the depthwise convolution layer to obtain group-c feature maps by convolution; the group-a and group-c feature maps then pass through the concatenation layer, pointwise convolution layer, nonlinear activation layer, and batch-normalization layer of the general layer to obtain new group-b feature maps;

the input group-b feature maps, serving as the group-a feature maps of the next round, together with the new group-b feature maps form the input of the next adjacent general layer.

As shown in FIG. 5, the general layer consists of five layers: a depthwise convolution layer, a concatenation layer, a pointwise convolution layer, a nonlinear activation layer, and a batch-normalization layer. According to Basis 2 in step one, half of the kernels of the depthwise convolution layer are removed, i.e., the depthwise convolution layer convolves only the second input group of feature maps; the concatenation layer concatenates the first input group of feature maps with the output of the depthwise convolution layer into one group; according to Basis 3 in step one, in order to reuse feature maps and reduce the parameter count, the pointwise convolution layer processes the output feature maps of the concatenation layer and halves the number of channels, after which the nonlinear activation layer and batch-normalization layer are applied. Finally, the unit outputs the second input group of feature maps as its first output group, and the output of the pointwise convolution layer as its second output group.
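The general layer (HalfConvBlock) described above can be sketched as follows. A minimal NumPy illustration under stated assumptions: 8 channels per group, 16×16 maps, random weights, and a per-channel normalization standing in for trained batch normalization.

```python
import numpy as np

rng = np.random.default_rng(1)

def depthwise3x3(x, k):
    """Stride-1 'same' depthwise conv of (C, H, W) with per-channel 3x3 kernels."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = (xp[:, i:i+3, j:j+3] * k).sum(axis=(1, 2))
    return out

def half_conv_block(a, b, k_dw, w_pw):
    """HalfConvBlock sketch. Only group b is convolved (half the depthwise
    kernels are removed); [a, depthwise(b)] is concatenated, then a pointwise
    conv halves the channel count, followed by ReLU and normalization.
    Returns (b, new_b): the old group b is reused unchanged as the new group a."""
    c = depthwise3x3(b, k_dw)
    cat = np.concatenate([a, c], axis=0)          # 2n channels
    new_b = np.einsum('oc,chw->ohw', w_pw, cat)   # pointwise: back to n channels
    new_b = np.maximum(new_b, 0.0)
    mu = new_b.mean(axis=(1, 2), keepdims=True)
    sd = new_b.std(axis=(1, 2), keepdims=True) + 1e-5
    return b, (new_b - mu) / sd

n = 8
a = rng.random((n, 16, 16))
b = rng.random((n, 16, 16))
k_dw = rng.standard_normal((n, 3, 3))
w_pw = rng.standard_normal((n, 2 * n))
new_a, new_b = half_conv_block(a, b, k_dw, w_pw)
print(new_a.shape, new_b.shape)                   # (8, 16, 16) (8, 16, 16)
```

Note that `new_a is b`: the reused group passes through by identity mapping and skips the next depthwise convolution, which is what halves the parameters and computation while keeping the width (2n channels) unchanged.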

It should be noted that the present invention builds the lightweight neural network from downsampling layers (DownBlock) and universal layers (HalfConvBlock), as shown in Figure 6:

1. Use a standard convolution to expand the input image to the specified number of channels;

2. Use a DownBlock to downsample and expand the channel count, producing two groups of feature maps;

3. Apply HalfConvBlock repeatedly to extract features, with both input and output being two groups of feature maps;

4. Concatenate the two groups of feature maps into one group;

5. Repeat steps 2, 3, and 4 three more times;

6. Use the fully connected layer to output the final classification result.
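The shape flow implied by the steps above can be sketched as follows. The initial channel count, the channel-doubling schedule, and the four stages are illustrative assumptions; the patent leaves the concrete schedule to Table 1.

```python
def feature_map_shapes(h=224, w=224, c=24, stages=4):
    """Trace (channels, height, width) through the network, assuming each
    DownBlock halves the spatial resolution and doubles the channel count
    while HalfConvBlocks preserve shapes (hypothetical schedule)."""
    shapes = [(c, h, w)]          # after the initial standard convolution
    for _ in range(stages):
        h, w = h // 2, w // 2     # DownBlock halves spatial resolution
        c *= 2                    # and expands the channel count
        shapes.append((c, h, w))  # shape after concatenating the 2 groups
    return shapes
```

Under these assumptions, a 224×224 input expanded to 24 channels ends at a 384×14×14 feature map after four stages, which global pooling reduces to a 384-dimensional vector for the fully connected classifier.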

It should be noted that the number of layers of each type in the final model can be set according to specific requirements (such as the parameter count or computational cost). For example, the details of a network built with 1.5M parameters are given in Table 1 (the concatenation layer after each group of universal layers is omitted from the table).

Table 1
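To make the budget-driven sizing concrete, here is a hypothetical per-stage parameter estimator; the DownBlock accounting, the treatment of the Gaussian kernel as unlearned, and all names are assumptions, and the real layer counts come from Table 1.

```python
def stage_params(c_in, c_out, n_blocks, k=3):
    """Rough learned-parameter count of one sampling-splicing unit:
    a DownBlock from c_in to c_out total channels followed by n_blocks
    HalfConvBlocks (hypothetical accounting)."""
    g = c_out // 2                   # channels per group after the DownBlock
    # DownBlock: the Gaussian downsampling kernel is fixed (no learned
    # weights); the learned part is a 1x1 conv c_in -> g plus batch norm.
    down = c_in * g + 2 * g
    # Each HalfConvBlock: depthwise conv on one group of g channels,
    # a 1x1 conv from the concatenated 2g channels back to g, and BN.
    block = g * k * k + (2 * g) * g + 2 * g
    return down + n_blocks * block
```

Summing such estimates over the stages (plus the stem convolution and the fully connected layer) is one way a designer could tune channel widths and block counts toward a target budget such as 1.5M parameters.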

When the parameter budget is limited to 1.5M or 2.5M and accuracy comparable to other models is reached, the proposed model leads by a wide margin in both parameter count and inference speed, as shown in Table 2 (accuracy of the proposed model on ImageNet with 1.5M parameters; GPU inference speed measured on an RTX 6000) and Table 3. In addition, the embodiments built with 1.5M and 2.5M parameters merely illustrate the technical solution of the present invention; those skilled in the art should understand that modifications or equivalent substitutions of the technical solution, in particular changing only the number of network layers and channels, do not depart from the spirit and scope of this technical solution.

Table 2

Table 3

According to another aspect of the present invention, an image classification device based on a lightweight convolutional neural network is provided, comprising:

a construction module for constructing a lightweight convolutional neural network model, the model comprising, connected in sequence: a standard convolution layer, a plurality of sampling-splicing units, and a fully connected layer, each sampling-splicing unit comprising, connected in sequence: a downsampling layer, a plurality of universal layers, and a concatenation layer;

a classification module for inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, specifically comprising:

expanding the channels of the image to be classified to the specified number of channels using the standard convolution layer, thereby obtaining an original feature map;

downsampling the original feature map using the downsampling layer of the first sampling-splicing unit to obtain two groups of first feature maps; extracting features from the two groups of first feature maps using the universal layers of that unit to obtain corresponding second feature maps; concatenating the two groups of second feature maps using the concatenation layer of that unit to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling-splicing unit, which downsamples it, extracts features, and concatenates them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling-splicing unit, and so on, until the last sampling-splicing unit outputs a final target feature map;

inputting the final target feature map output by the last sampling-splicing unit into the global pooling layer for dimensionality reduction and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image to be classified.

Those skilled in the art will readily understand that the above is merely a preferred embodiment of the present invention and is not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. An image classification method based on a lightweight convolutional neural network, characterized by comprising the following steps:
S1: constructing a lightweight convolutional neural network model, wherein the lightweight neural network model comprises, connected in sequence: a standard convolution layer, a plurality of sampling-splicing units, and a fully connected layer, each sampling-splicing unit comprising, connected in sequence: a downsampling layer, a plurality of universal layers, and a concatenation layer;
S2: inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, which comprises:
S21: expanding the channels of the image to be classified to a specified number of channels using the standard convolution layer, thereby obtaining an original feature map;
S22: downsampling the original feature map using the downsampling layer of the first sampling-splicing unit to obtain two groups of first feature maps; extracting features from the two groups of first feature maps using the universal layers of that unit to obtain corresponding second feature maps; concatenating the two groups of second feature maps using the concatenation layer of that unit to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling-splicing unit, which downsamples it, extracts features, and concatenates them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling-splicing unit, and so on, until the last sampling-splicing unit outputs a final target feature map;
S23: inputting the final target feature map output by the last sampling-splicing unit into a global pooling layer for dimensionality reduction and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image to be classified;
wherein each universal layer comprises, connected in sequence: a depthwise convolution layer, a concatenation layer, a pointwise convolution layer, a nonlinear activation layer, and a batch normalization layer; each universal layer takes feature-map group a and feature-map group b as input; group b is input into the depthwise convolution layer and convolved to obtain group c; groups a and c are input into the concatenation layer, the pointwise convolution layer, the nonlinear activation layer, and the batch normalization layer of the universal layer to obtain a new group b; the current input group b then serves as group a of the next adjacent universal layer, and the new group b serves as its group b.
2. The image classification method based on a lightweight convolutional neural network according to claim 1, wherein
the downsampling layer comprises: a Gaussian downsampling layer, a pointwise convolution layer, a nonlinear activation layer, and a batch normalization layer, and outputs two groups of feature maps;
the input feature map passes through the Gaussian downsampling layer alone to obtain one group of output feature maps;
the input feature map passes sequentially through the Gaussian downsampling layer, the pointwise convolution layer, the nonlinear activation layer, and the batch normalization layer to obtain the other group of output feature maps.
3. The image classification method based on a lightweight convolutional neural network according to claim 2, wherein the Gaussian downsampling layer performs a convolution operation on the feature map output by the preceding layer, and the resolution of its output feature map is half that of its input feature map.
4. The image classification method based on a lightweight convolutional neural network according to claim 2, wherein the pointwise convolution layer in the downsampling layer expands or contracts the channels of the input features according to the numbers of input and output feature channels.
5. The image classification method based on a lightweight convolutional neural network according to claim 2, wherein the nonlinear activation layer in the downsampling layer uses ReLU as the activation function.
6. An image classification device based on a lightweight convolutional neural network, for performing the method of any one of claims 1-5, comprising:
a construction module for constructing a lightweight convolutional neural network model, the model comprising, connected in sequence: a standard convolution layer, a plurality of sampling-splicing units, and a fully connected layer, each sampling-splicing unit comprising, connected in sequence: a downsampling layer, a plurality of universal layers, and a concatenation layer;
a classification module for inputting the image to be classified into the lightweight convolutional neural network model to obtain a classification result, specifically comprising:
expanding the channels of the image to be classified to the specified number of channels using the standard convolution layer, thereby obtaining an original feature map;
downsampling the original feature map using the downsampling layer of the first sampling-splicing unit to obtain two groups of first feature maps; extracting features from the two groups of first feature maps using the universal layers of that unit to obtain corresponding second feature maps; concatenating the two groups of second feature maps using the concatenation layer of that unit to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling-splicing unit, which downsamples it, extracts features, and concatenates them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling-splicing unit, and so on, until the last sampling-splicing unit outputs a final target feature map;
inputting the final target feature map output by the last sampling-splicing unit into the global pooling layer for dimensionality reduction and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the image to be classified.
CN202210189921.1A 2022-02-28 2022-02-28 Image classification method and device based on lightweight convolutional neural network Active CN114565792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210189921.1A CN114565792B (en) 2022-02-28 2022-02-28 Image classification method and device based on lightweight convolutional neural network

Publications (2)

Publication Number Publication Date
CN114565792A CN114565792A (en) 2022-05-31
CN114565792B true CN114565792B (en) 2024-07-19

Family

ID=81715751


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651438A (en) * 2020-12-24 2021-04-13 世纪龙信息网络有限责任公司 Multi-class image classification method and device, terminal equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977793B (en) * 2019-03-04 2022-03-04 东南大学 Pedestrian segmentation method on roadside images based on variable-scale multi-feature fusion convolutional network
CN110110808B (en) * 2019-05-16 2022-04-15 京东方科技集团股份有限公司 Method and device for performing target labeling on image and computer recording medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant